


Collection Rehearsal Workshop (2015-04-01)

Page history last edited by Alan Liu 7 years, 8 months ago

1. Orientation Rehearsal (Collection)

(Alan to show rehearsals on his screen, with other workshop participants reproducing steps on their computers)


(A) Quick group run-through of the NY Times collection workflow (articles mentioning "humanities" in Jan-Feb 2014):


(B) Quick run-through of the Wall Street Journal collection workflow (articles mentioning "humanities" in 1995 and 2014):

  • WSJ collection workflow
    • Concise _instructions.txt
    • Data storage: [based on the New York Times data storage tree above, with any variations needed for the ProQuest workflow used for WSJ and any tweaks discovered during the New York Times run-through]


2. Production-level Rehearsals (Collection)


  • Production Runs:
    • Repetitions of the "orientation rehearsals" above, for NYT (2014, 2015) and WSJ (1995, 2014), by individual workshop participants using assigned lab machines (or laptops).
    • Creating the manifest.yaml files for each collection run.

  • Discussion of Results:
    • Quality-control analysis and discussion of our collection process, focusing on issues such as:
      • Problems in downloading files (i.e., skipped files)
      • "Continued" article pages in NY Times
      • Wall St. Journal full text for early years hidden in a Flash thingie.
      • Problems arising in data storage organization (if any). 
    • Discussion of issues related to data storage organization.
    • Discussion of issues related to the manifest schema.

