Fantastic Futures 2025
This meeting has been held annually since 2018 and brings together heritage institutions seeking to improve access to their digitised collections, and the quality of their metadata, through artificial intelligence tools. The following projects are particularly relevant to the implementation of the 2029 action plan at the ISSN International Centre and, more broadly, to the necessary familiarisation with these tools.
The French Ministry of Culture has launched the Comparia website (https://comparia.beta.gouv.fr/), which compares the performance of several generative AI models. A user submits a query, which is put to two models chosen at random by the site; each model produces an answer. The user then evaluates the two answers and finally receives an estimate of the energy each model consumed in producing its answer.
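Comparia's internals are not published here; the following is only a minimal sketch of the blind pairwise-comparison pattern the site implements, with a hypothetical model pool and a placeholder query function.

```python
import random

# Hypothetical model pool; Comparia's actual roster differs.
MODELS = ["model-a", "model-b", "model-c", "model-d"]

def ask(model: str, query: str) -> str:
    """Placeholder: send `query` to `model` and return its answer."""
    return f"[answer from {model}]"

def blind_comparison(query: str) -> dict:
    # Draw two distinct models at random, as the site does.
    left, right = random.sample(MODELS, 2)
    answers = {"A": ask(left, query), "B": ask(right, query)}
    # The user sees the answers anonymised as A and B, votes, and only
    # then learns which model produced which answer.
    vote = input(f"A: {answers['A']}\nB: {answers['B']}\nBetter answer (A/B)? ")
    winner = left if vote.strip().upper() == "A" else right
    return {"pair": (left, right), "winner": winner}
```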
As part of its mission to preserve the memory of government institutions, the UK National Archives has developed a tool, based on Apache Tika, to create metadata from the vast quantity of documents it processes. According to the Apache website, the Tika™ toolkit can detect and extract metadata and text from over a thousand different file types, such as PPT, XLS and PDF, all through a single interface, making Tika useful for search-engine indexing, content analysis, translation and much more. This tool could be used to support the processing of publishers’ requests.
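As an illustrative sketch (the archives' own tool is not public here), the tika-python wrapper exposes that single interface; it assumes a Java runtime, starts a local Tika server automatically on first use, and the file name below is hypothetical.

```python
# pip install tika  (requires a Java runtime; the wrapper downloads and
# starts a local Tika server automatically on first use)
from tika import parser

# The same call handles PDF, PPT, XLS and hundreds of other formats.
parsed = parser.from_file("deposit/report.pdf")  # hypothetical file

metadata = parsed["metadata"]        # dict of extracted metadata fields
text = parsed.get("content") or ""   # extracted full text (may be None)

print(metadata.get("Content-Type"), len(text))
```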
The Library of Congress has implemented an AI tool, managed by Digirati (https://digirati.com/), that automatically creates metadata for digital or digitised monographs. Similarly, the Harvard Library uses Apache Airflow to automate the ingestion of resources with metadata extraction; the extracted metadata are then compared, via Elasticsearch, with the indexing of resources already described in the library catalogue. Harvard also uses Better Binary Quantisation (BBQ) to store data in vector form. Elastic describes BBQ as ‘a leap forward in quantisation for Lucene and Elasticsearch, reducing float32 dimensions to bits and delivering ~95% memory reduction while maintaining high ranking quality.’ It outperforms traditional approaches such as product quantisation (PQ) in indexing speed (20–30 times faster) and query speed (2–5 times faster), with no loss of accuracy.
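Harvard's pipeline was not shown in detail; the following is a minimal sketch of the pattern described, assuming an Airflow 2.x TaskFlow DAG and hypothetical index and field names.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def ingest_resources():
    @task
    def extract_metadata() -> list[dict]:
        # Placeholder: gather newly deposited files and derive metadata.
        return [{"id": "rec-001", "title": "Example resource"}]

    @task
    def match_against_catalogue(records: list[dict]) -> None:
        # Compare each record with resources already described in the
        # catalogue; "catalogue" and "title" are hypothetical names.
        from elasticsearch import Elasticsearch
        es = Elasticsearch("http://localhost:9200")
        for rec in records:
            hits = es.search(index="catalogue",
                             query={"match": {"title": rec["title"]}})
            print(rec["id"], hits["hits"]["total"])

    match_against_catalogue(extract_metadata())

ingest_resources()
```

On the storage side, BBQ is enabled in recent Elasticsearch versions by mapping a dense_vector field with index_options.type set to bbq_hnsw (or bbq_flat); the quantisation then happens transparently at indexing time.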
The Yale Library, like the National Library of Luxembourg, is also producing metadata with AI. The latter had to process a backlog of around 75,000 deposited digital files: ChatGPT (GPT-4) was initially used to generate metadata, but the results were disappointing for subject indexing, so ANNIF (https://annif.org/) was preferred. The national libraries of Sweden and Germany are engaged in similar projects. The German National Library (DNB) has undertaken a project to improve the performance of generative AI in German: seventeen million digital publications were selected, including thirteen million periodicals, then reworked to anonymise the texts and remove them from the scope of copyright. These texts have been tokenised and will be used to train AI models in German.
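ANNIF offers both a CLI and a REST API for automated subject indexing. A hedged sketch of its suggest endpoint follows; the instance URL, project ID and sample text are assumptions, while the response shape follows ANNIF's published API.

```python
import requests

# Hypothetical local ANNIF instance and project ID.
ANNIF_URL = "http://localhost:5000/v1"
PROJECT = "yso-en"  # example project; adjust to your own setup

text = "A study of plankton populations in the North Atlantic."
resp = requests.post(f"{ANNIF_URL}/projects/{PROJECT}/suggest",
                     data={"text": text, "limit": 5})
resp.raise_for_status()

# Each suggestion carries a subject URI, a label and a confidence score.
for s in resp.json()["results"]:
    print(f"{s['score']:.3f}  {s['label']}  {s['uri']}")
```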
The KB, nationale bibliotheek (formerly the Koninklijke Bibliotheek of the Netherlands), has published an AI statement (https://www.kb.nl/en/ai-statement) and is taking technical measures to limit the use of its digital collections, in particular the Delpher site of digitised continuing resources (https://www.delpher.nl), by commercial companies training generative AI.
Finally, the Stanford Library presented a project to digitise typewritten cards containing marine biological observations. The cards were processed by AI to generate JSON metadata files, and the presenters emphasised the importance of giving the AI very detailed prompts.
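Stanford's actual prompts were not reproduced; the sketch below illustrates the general pattern of turning an OCR'd card into structured JSON with a detailed prompt, using the OpenAI Python client (the model name, card text and field list are all assumptions).

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Invented example of an OCR'd observation card.
card_text = """Sta. 42. 1931-07-14. Monterey Bay.
Pycnopodia helianthoides, 3 specimens, depth 12 m."""

# A detailed prompt pinning down the expected keys and formats, in the
# spirit of the presenters' advice. Field names are illustrative.
prompt = (
    "Extract the following fields from this typewritten observation card "
    "and answer with JSON only, using exactly these keys: station, date "
    "(ISO 8601), locality, species, specimen_count, depth_m. "
    "Use null for anything absent.\n\n" + card_text
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any JSON-capable model works
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))
```

Spelling out the exact keys, date format and null-handling in the prompt is, in practice, what the presenters' advice about detail amounts to.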
