Documenting the COVID-19 Pandemic

Web and Social Media Preservation Program

Coronavirus/COVID-19 Collection (Feb 2020–)

Since 2020, web archiving has emerged internationally, more than ever before, as a rapid-response means of documenting a crisis. The COVID-19 pandemic demonstrated that web archiving is one of the few immediate actions that information professionals, such as digital librarians and archivists, can take to preserve a historical timeline and the primary sources of an extended crisis.

From the beginning of the COVID-19 pandemic in early 2020, the Web and Social Media Preservation Program team at Library and Archives Canada (LAC) was fully engaged in documenting the evolution of the situation and its effects on Canadian society. The team curated a diverse collection that includes websites from government and non-government sources, as well as social media relating to the pandemic's impact on life in Canada.

COVID-19 collection scope, priorities and highlights:

  • French and English news media (daily newspaper crawls and targeted content)
  • Public health information from all levels of government (federal, provincial and territorial government resources with a focus on public health communications)
  • Impact on business and the economy (for example, corporate sites for affected industries)
  • Health, science and medicine (for example, information about research efforts)
  • Sites focused on social and cultural aspects, including religion, artistic and cultural expression, and impacts on families, children and education
  • Curated social media related to COVID-19 (for example, Twitter communications from public health officials, and ongoing capture of tweets with hashtags related to COVID-19 in Canada)

This work collected digital information that will serve as historical primary sources on COVID-19 for future research. It will also help tomorrow's Canadians understand what it was like to live through this crisis, and it will give future leaders important background, data and experiences to help guide their decisions.

Collection overview

This report summarizes archiving activity at LAC related to COVID-19 for the period from February 1, 2020, to March 1, 2022. In January and February 2022, we increased our crawling activity to document the convoy protests in Ottawa and across the country. This effort added more than three million tweets to the Twitter dataset and 78 web resources to the collection; both are reflected in the totals below.

Summary statistics for COVID-19 (February 1, 2020, to March 1, 2022)

  • Total news/media websites crawled daily: 34
  • Total non-media web resources selected daily: 1,929
  • Total digital assets collected: 453,954,388
  • Total data collected (including news media): 14.67 TB
  • Tweets captured for COVID-19 (hashtags #COVIDCanada, #COVID19Canada, #CanadaLockdown, #CanadaCOVID19, #MaskUpCanada, #Masks4Canada, #LightUpLive, #Eclaironslesscenes, #COVIDBC, #COVIDAB, #COVIDSK, #COVIDMB, #COVIDON, #COVIDQC, #COVIDNB, #COVIDNS, #COVIDPEI, #COVIDNFLD, #Convoidelaliberte, #FreedomConvoyCanada, #FreedomConvoy2022, #TruckerConvoy2022, #FreedomConvoy, #TruckersForFreedom, #TruckersForFreedom2022, #ConvoyForFreedom, #ConvoyForFreedom2022): 3,833,076

This graph shows the distribution of all resources collected in the LAC COVID-19 collection by publisher/data origin.

Figure 1: Resource distribution (text version)
  • Government of Canada: 5%
  • Provincial and territorial governments: 5%
  • Non-governmental: 90%

This graph shows the distribution of all resources collected in the LAC COVID-19 collection by language.

Figure 2: Resource language (text version)
  • English: 57%
  • French: 43%

This graph shows the distribution of all resources collected in the LAC COVID-19 collection by type of resource.

Figure 3: Resource type (text version)
  • Article: 11%
  • Website: 41%
  • Website-partial: 35%
  • Social media: 7%
  • Other (data repository, podcast, Wikipedia, etc.): 6%