Collection Development Guidelines

The Digital Public Library of America’s mission is to connect “people to the riches held within America’s libraries, archives, museums, and other cultural heritage institutions.” To that end, we aggregate records for free and immediately available digital resources. Bibliographic records, that is, records that do not resolve to a digital resource that is freely and openly available on the web, are out of DPLA’s collecting scope.

In order to provide the best experience for our users and to create a well-rounded yet coherent collection, we have adopted some internal guidelines about materials to include in our aggregation. These guidelines generally rely on principles of inclusion: include as much as we can, with exceptions as needed. The key factor for decision-making is whether the content will be discoverable and make sense within the context of DPLA. For example, since DPLA initially chose to focus on metadata and not full text in our search, items that are best found through full-text discovery would likely not be well served in DPLA and were therefore discouraged.

Some types of content that typically require review and decision-making are:

  • Serials
    Serially produced titles like newspapers, journals, and volumes in monographic series are not, as a rule, excluded from DPLA aggregation. However, we ask partners to think about what level of description best suits DPLA’s metadata-only index. Issue-level records that contain little unique metadata other than a date usually do not integrate well into search, and we try to avoid them if possible. In such cases we would be comfortable including only a title-level record for the entire series.
  • Scholarly Materials
    Scholarly content in libraries and archives is usually very heterogeneous and difficult to categorize. DPLA is generally interested in adding to the collection materials that have been published or that contain textual descriptions of research, such as article pre- and post-prints, electronic theses and dissertations, and reports or whitepapers of committees or research groups. Other materials that are more informal or non-textual, such as presentations, data sets, faculty CVs, syllabi, or other course materials, are generally discouraged. Another consideration is the metadata and rights status of the content. Since many libraries rely on faculty or student self-submission for these materials, description and assignment of rights can be ambiguous. When repositories can separate content in the first category from content in the latter, DPLA is interested in collecting it.
  • Born-Digital Materials
    Born-digital materials that are easily accessible are the easiest for DPLA to collect. Formats that adhere to traditional types of content, such as texts and images, are typically fine. Problems arise with materials that require special software or other tools for access or interpretation, such as web archive files or emulated interactive resources. Since DPLA’s primary mission is to provide access to content, materials that require these special forms of access are avoided.
  • Archival Content
    Most archival content is a great fit for DPLA and makes up a large portion of the collection. Issues arise, however, when the level of description doesn’t fit DPLA’s model of one record for one digital object. Most archival description relies on the categorization of groups of related content rather than individual items. A DPLA working group investigated this question in depth in 2016 and wrote the whitepaper Aggregating and Representing Collections in the Digital Public Library of America in response. That group recommended that records for aggregations of archival content, such as folders or boxes, should be collected by DPLA if the level of description is clear and the record corresponds to an aggregated object and not a full collection landing page. Records for individual items that are minimal, with only identifiers as titles and little to no other metadata, or that repeat metadata applied to the aggregation but not the individual item, should be avoided. Partners with questions about this type of content should read the whitepaper linked above for more details.
  • Audio/Visual Content
    In general, audio/visual content is actively encouraged by DPLA, as it is a relatively small portion of the existing collection. Decisions for this type of content, as with others, primarily relate to the level of description. Full-text transcriptions of spoken content are not indexed, so robust metadata records are required. In addition, the ultimate ownership and hosting of content may need to be reviewed. Objects that are hosted on external, public platforms like YouTube or Vimeo are acceptable, but should be free of advertising and should include a clear statement of ownership.
  • Scientific or Other Specialized Material
    Materials that are indexed or described using highly specialized language may be discouraged from inclusion in DPLA. Plant, animal or other scientific specimens that are only described by scientific names or terminology do not integrate well with the rest of the collection, which uses more common language. Records that include both scientific and common names are acceptable.

This information is intended only as a guideline. Final decisions are made on a case-by-case basis, and they may change as our indexing and search change. For example, we are considering the addition of full-text indexing to our search, which may considerably affect decisions around serial titles.