Bodleian Digitisation Research project: testing ways to digitise our collections at scale

About this project

A Victorian black and white printed leaflet next to a ruler and colour grade chart

The Bodleian Libraries began a pilot research project in February 2025 to investigate how to digitise parts of our collection at scale. OpenAI has funded the project as part of their 5-year partnership with the University of Oxford and as part of their NextGenAI consortium of Higher Education Institutions.

We are asking the following research questions:

Could the Bodleian scale up digitisation throughput in its imaging studio? (Workstream 1)
Can generative AI be used to enhance metadata and full text records for areas of our collection? (Workstream 2)
How much of our collection could be digitised at scale and how would we prioritise digitisation activities? (Workstream 3)
Could we scale up digitisation outside of our imaging studio? (Workstream 4)
What equipment and workflows are needed to effectively digitise at scale at the Bodleian? (Workstreams 1 and 4)
What are other libraries doing with AI? (Workstreams 2 and 5)
What impact might AI have on search and discovery of our collections? (Workstream 5)

The project is divided into 5 workstreams. We will publish the findings from each workstream as open access reports on the Oxford University Research Archive (ORA) in early 2026. Our team will present early findings from the research project at an online event on 28 November 2025. Find out more and book here.

Why are we researching our digitisation processes?

The Bodleian Libraries has been digitising parts of its collections for decades, and we continue to digitise so that our readers, whether they are researchers, students, or members of the public, can access the knowledge and information in our collections.

The core work of a library – to collect, curate, preserve and share the world’s knowledge – has been the same since we were founded in 1602 by Sir Thomas Bodley, and continues to drive us today. The fact that we can digitise and make available our unique collections to readers beyond the physical bounds of our libraries, the city of Oxford, or indeed the UK, gives us new ways to connect our readers to information.

But our collections are vast. We have approximately 14 million printed items, and each item might have hundreds of pages. So we are looking for ways to scale up the digitisation of millions and millions of pages and make our collections more widely available.

Workstream 1: Scaled digitisation in our Imaging Studio

Digitisation equipment at the Imaging Studio in the Weston Library, Oxford

The Imaging Studio at the Bodleian Libraries is home to a team of skilled photographers and digitisation professionals who have digitised millions of pages of our most precious collection items, which are published on Digital Bodleian.

In this project, the team is trialling new equipment to digitise books at scale, using a collection called Global Dissertations as a sample. This collection of European and American PhD theses from the nineteenth and twentieth centuries is also the base material for testing new approaches to enhancing metadata and transcriptions in Workstream 2.

Around 125,000 images of our Global Dissertations have been captured for digitisation. The team is also digitising the handwritten card catalogues that are used as finding aids for the Global Dissertations. A further 429,000 card files have been created using the new scanning equipment.

Workstream 2: Metadata and transcriptions

A catalogue card with data about a book from our Global Dissertations collection

The team is investigating how optical character recognition (OCR) and handwritten text recognition (HTR) technologies perform when extracting text from scanned card catalogues and full texts. They are interested in how ‘traditional’ OCR performs compared to the new AI-enhanced OCR models offered through ChatGPT and by other generative AI providers.
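One common way to compare the output of different OCR or HTR tools is the character error rate (CER): the edit distance between a trusted transcription and the tool's output, divided by the length of the trusted transcription. The sketch below is illustrative only and is not the project's evaluation method; the function names and the sample strings are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: a human transcription and two invented tool outputs.
    reference = "Dissertatio inauguralis"
    outputs = {
        "tool_a": "Dissertatio inaugura1is",
        "tool_b": "Dissertatio inauguralis",
    }
    for name, text in outputs.items():
        print(name, round(character_error_rate(reference, text), 3))
```

A lower CER means the output is closer to the reference. A measure like this only shows how close two strings are; it does not show whether the remaining errors fall in fields that matter most to cataloguers.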

The team is also looking at whether the data created and stored for decades in card catalogues can be brought into the digital age by using AI to extract and classify elements of catalogue data, and the extent to which AI can create catalogue records from documents directly. The team is asking key questions about:

The reliability of data extraction.
The accuracy of any data produced.
What workflows would be needed.
How to ensure specialist librarians with expertise in resource description and cataloguing are ‘humans in the loop’ of any automated or semi-automated processes.
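As a rough illustration of what a 'human in the loop' step could look like, the sketch below routes automatically extracted catalogue fields either to an accepted list or to a review queue for a cataloguer. Everything in it is hypothetical: the `ExtractedField` structure, the confidence score, the 0.9 threshold and the sample card are assumptions for this example, not a description of the project's workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    """One element of catalogue data pulled from a scanned card (hypothetical)."""
    name: str          # e.g. "author", "title", "year"
    value: str         # text produced by the extraction step
    confidence: float  # assumed score between 0.0 and 1.0


def split_for_review(fields, threshold=0.9):
    """Separate fields that can be accepted from those a cataloguer should check.

    A field goes to review if its value is empty or its confidence is
    below `threshold` (an assumed, tunable cut-off).
    """
    accepted, needs_review = [], []
    for field in fields:
        if field.value.strip() and field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


if __name__ == "__main__":
    # Invented sample data for one card.
    card = [
        ExtractedField("author", "Example, A.", 0.97),
        ExtractedField("title", "De exemplo quodam", 0.82),
        ExtractedField("year", "", 0.99),
    ]
    accepted, needs_review = split_for_review(card)
    print("accepted:", [f.name for f in accepted])
    print("needs review:", [f.name for f in needs_review])
```

In a sketch like this the threshold controls how much work reaches specialist staff: a higher value sends more fields to review, a lower value accepts more automated output unchecked.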

In the final element of this workstream, one of our researchers is carrying out a sector review of how AI is being used in research libraries. The review comprises desk-based research into AI projects in libraries and an international, sector-wide survey taking the pulse of where libraries are on their AI journeys.

Workstream 3: Collections audit for digitisation

The Bodleian Libraries has good information in our library catalogue and related finding aids, but we do not know specifically how many items in our collection would be suitable for digitisation. This workstream asks: which items in our collection can be digitised, and how might we create workflows to digitise these items at scale?

There are key limitations on what we might want to digitise at scale. Very old books are excluded, as they are better digitised at a slower pace by experts in our Imaging Studio. More modern books that are still in copyright cannot be digitised and published.

We are working with librarians at all 23 of our site libraries and the Collections Storage Facility (CSF) to inform the audit. Their specialist knowledge of their collections is helping us to systematically review the items that may be in scope, as indicated by our catalogue data.

Workstream 4: Scaled digitisation at the Collections Storage Facility

Books in our Collections Storage Facility in Swindon

Our Collections Storage Facility (CSF) in Swindon provides specialist storage for lower-usage items from the Bodleian Libraries’ collections. Items at the CSF include books, maps, manuscripts, microfilms, periodicals, and newspapers, totalling over 10 million items.

The team has set up a mini digitisation studio at the CSF to trial new approaches to digitising our books, card catalogues, and microfilm collections, using new equipment funded by the research project.

The team is producing a service description, operating procedure, and workflows as a result of trialling these new approaches. In the process, thousands of books that are stored at the CSF are also being digitised.

Workstream 5: AI for collections search and discovery

Ultimately, the items that we are capturing for digitisation need to be made discoverable for research purposes.

In May 2025, staff from the project team attended a workshop at Yale University on AI for Cross Collections Discovery. The results from this workshop will inform the creation of an international network of cultural heritage professionals and AI researchers interested in the future of cultural heritage and library search and discovery.

Further information

For further information about the project, contact us at: strategy@bodleian.ox.ac.uk
