WE1S developers first collect (scrape and clean) materials on their local workstations, then upload the results to the WE1S Google Drive. There we are accumulating a corpus of both working_data (e.g., aggregate text files and JSON files representing stages of the collection process) and final data (plain-text and cleaned-text files for each publication, organized by year).

On the shared Google Drive, each collected resource follows a standard folder structure within the data_archive. For example, the folder tree for the New York Times is as follows:

Here,

Instructions for Uploading Collection Results to Google Drive