Aggregation
DPLA’s aggregation provides access to more than 47 million images, documents, videos, and other cultural heritage artifacts from more than 5,000 libraries, archives, and museums across the United States.
What is aggregation?
Aggregation is the process of taking records from more than one catalog, converting them to a common format, standardizing the data so that it follows the same rules, and then making the resulting collection searchable. The goal of our work is to let users find all the records in one place, rather than having to visit many different search interfaces. This practice can be seen as the opposite of “federated search,” which sends the same query to multiple search interfaces. Aggregation is generally considered superior to federated search because it is difficult to merge results from a variety of search engines into a single list that is properly ordered by relevance.
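As a rough illustration of the idea, the Python sketch below converts records from two catalogs into one common shape and searches them with a single relevance function, so every result is ranked on the same scale. The catalogs, field names, and scoring rule are all invented for this example and are not DPLA's actual formats or ranking method.

```python
# Hypothetical records from two catalogs, each with its own field names.
catalog_a = [
    {"title": "Map of Boston harbor", "creator": "Smith"},
    {"title": "Boston street scene", "creator": "Jones"},
]
catalog_b = [
    {"dc_title": "Harbor map, Boston", "dc_creator": "Lee"},
]


def from_catalog_a(rec):
    """Convert a catalog A record to the common (illustrative) format."""
    return {"title": rec["title"], "creator": rec["creator"], "source": "A"}


def from_catalog_b(rec):
    """Convert a catalog B record to the common (illustrative) format."""
    return {"title": rec["dc_title"], "creator": rec["dc_creator"], "source": "B"}


def score(record, query):
    """Toy relevance: the fraction of query words found in the title."""
    words = query.lower().split()
    title_words = set(record["title"].lower().replace(",", " ").split())
    return sum(w in title_words for w in words) / len(words)


def search(collection, query):
    """Rank every matching record with the same scoring function."""
    scored = [(score(r, query), r) for r in collection]
    hits = [(s, r) for s, r in scored if s > 0]
    return [r for s, r in sorted(hits, key=lambda pair: pair[0], reverse=True)]


# Aggregate: one collection, one format, one search.
aggregated = [from_catalog_a(r) for r in catalog_a] + [
    from_catalog_b(r) for r in catalog_b
]
results = search(aggregated, "boston harbor map")
for r in results:
    print(r["source"], r["title"])
```

Because both catalogs are scored by the same function, the two harbor maps rank above the street scene regardless of which catalog they came from. A federated search would instead have to reconcile rankings produced separately by each catalog's own search engine.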
How did DPLA come to do aggregation?
DPLA’s aggregation work can be seen as mirroring the work established by Europeana. Originally, DPLA’s mission was to aggregate more than just cultural heritage materials; other types of information, such as ebooks and newspapers, were meant to be included. This was seen as a way for libraries and similar institutions to establish a robust footprint on the web in the wake of the Google Books settlement, and in the absence of an entity coordinating information resources at the national level in the United States.
What tasks are involved in DPLA’s aggregation work?
To establish a new hub, DPLA first works with the hub to define a metadata standard and a means of transmitting records to DPLA. DPLA then builds a “harvester,” a program that accepts the records from the hub and stores them in DPLA’s system. Next, a program called a “mapper” is written to convert the records from their native format to DPLA’s metadata standard, DPLA MAP. Finally, generic programs attempt to clean and validate the metadata and write it to DPLA’s search index. These programs are typically run multiple times a year for every hub to keep DPLA’s copy of the hub’s metadata accurate, as institutions regularly add, remove, and change records.
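The stages described above (harvest, map, clean and validate, index) can be sketched in a few lines of Python. This is a simplified illustration only: the feed format, the field names, the validation rules, and the in-memory index are assumptions made for the example, and the target record shape is not the actual DPLA MAP.

```python
import json

# Hypothetical feed transmitted by a hub; the field names are invented.
HUB_FEED = json.dumps([
    {"id": "1", "name": "  Harbor photograph ", "link": "https://example.org/items/1"},
    {"id": "2", "name": "", "link": ""},
])


def harvest(feed):
    """Harvester: accept the hub's records in their native format."""
    return json.loads(feed)


def map_record(native):
    """Mapper: convert a native record to a common shape.

    The target fields are illustrative, not the real DPLA MAP.
    """
    return {"id": native["id"], "title": native["name"], "url": native["link"]}


def clean(record):
    """Generic cleanup: trim stray whitespace from string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def is_valid(record):
    """Generic validation (assumed rules): a title and an http(s) URL."""
    return bool(record["title"]) and record["url"].startswith("http")


def run_pipeline(feed):
    """Run all stages and build a fresh index keyed by record id."""
    index = {}
    rejected = []
    for native in harvest(feed):
        record = clean(map_record(native))
        if is_valid(record):
            index[record["id"]] = record
        else:
            rejected.append(record["id"])
    return index, rejected


index, rejected = run_pipeline(HUB_FEED)
print(index)
print(rejected)
```

In this sketch the index is rebuilt from scratch on every run, which is one simple way to pick up records that a hub has added, removed, or changed since the previous harvest.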
How do hubs aggregate?
Some hubs don’t have to do any extra work to enable aggregation because their records are already in a single system. Other hubs do the equivalent of what DPLA does with a variety of contributing institutions. Some of these hubs would aggregate content from their contributing institutions even if they didn’t then share this content with DPLA. Generally, these hubs have their own portal or web presence. There are also hubs that aggregate purely as a part of participating in DPLA; these hubs generally don’t have their own web presence.
If you have questions about aggregation, or how your institution might contribute to DPLA’s aggregation, please get in touch.