Web Archiving

Background

  • Library and Archives Canada (LAC) conducts web archiving under section 8(2) of the Library and Archives of Canada Act, which provides for sampling from the Internet for digital preservation purposes. LAC’s latest policy instruments recognize web-based resources as unique, born-digital documentary heritage. Collecting and preserving these resources ensures future access and research use.
  • The Web Archiving Program began at LAC in December 2005 and has been an ongoing operational activity since 2013.
  • Web archiving is a digital preservation discipline practiced by more than 50 memory institutions worldwide, mostly national libraries. The field is advanced primarily by the International Internet Preservation Consortium (IIPC), of which LAC is a founding member. As of 2019, Sylvain Bélanger serves as the IIPC’s Treasurer and as a member of its Steering Committee.
  • LAC employs a robust methodology for collecting web resources and social media. This includes comprehensive crawls of the Government of Canada (GC) web presence; curating thematic research collections (e.g., Centenary of the First World War, Canada 150, Federal Elections, Olympic and Paralympic Games); documenting important events in Canadian history as they unfold (e.g., the Humboldt Broncos junior hockey team bus accident, forest fires in western Canada); engaging in “rescue” or preservation archiving of resources at known risk (e.g., the website of the National Inquiry into Missing and Murdered Indigenous Women and Girls); and supplementing other library and archival collections with web holdings, in collaboration with internal and external experts (e.g., the Truth and Reconciliation Web Archive).

Considerations

  • Currently, there is no public access to LAC’s non-federal web holdings, which make up 50% of the total collection (30 terabytes). Funding to develop additional services and a comprehensive access portal is being sought through the Central Agency Funding Request for Digital Optimization.

Key Public Messages

  • LAC’s web archiving methodology includes five main activities: (1) domain crawls of the GC; (2) curation of thematic web and social media collections; (3) event-based crawling; (4) preservation archiving of resources at known risk; and (5) supplementing library collections or archival fonds with web holdings.
  • The collection currently comprises nearly 1.5 billion digital objects and 60 terabytes of data. As of 2016, web archival holdings were growing by at least 13 terabytes per fiscal year.

SME:

Tom Smyth, Manager, Digital Preservation and Migration Division
Email: tom.smyth@canada.ca
Tel: 613-668-0674