Meeting Outcomes (to-dos in red)
- Next full-team meeting: Thursday, Sept. 15, 1pm (Pacific). We will brainstorm the topic model interpretation process in that meeting.
- Corpus Metadata Finalization:
- RAs to finish current corpus by c. Sept. 12th
- Nathalie Popa to collect Globe & Mail "liberal arts" (exporting the article bodies manually from the sheets is no longer necessary, however).
- Variations among individual RAs' CSV files to be fixed/standardized
- Jamal to provide Tyler with sample CSVs
- Producing the Corpus:
- Tyler to make a script that does the following: [P.S. Scott indicated at the meeting he had quickly started on this]
- Export the article bodies as plain-text files named by the values of the ID column (e.g., "nyt-2012-h-14")
- Store them in the appropriate folders in the tree on the filestation (or in the MongoDB/Manifest storage system, if it is ready)
- Lindsay to work on producing a quick-and-dirty "random" corpus on a relatively small scale (pending future decision on whether we need to improve the random corpus).
- We will use the script to export plain-text files for the whole corpus.
- Back-end Development:
- Tyler to continue working on MongoDB/Manifest (and file uploading) in collaboration with Scott. Tyler to document his work/code as appropriate.
- Tyler to consult with Jeremy (in a meeting?) on ideas for possibly simplifying the WE1S workflow.
- Jeremy to work on the total WE1S workflow system on the filestation, MongoDB, and virtual machine (as indicated in his diagram).
- Jeremy to work on the de-duping part of the workflow (as in his diagram)
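
For reference, the export step listed under "Producing the Corpus" might look roughly like the sketch below. It is only an illustration, not the script Tyler or Scott started: it assumes each RA's CSV has an "ID" column (as in the notes) and that the article text is in a column called "body", which is a hypothetical name to be adjusted once the CSVs are standardized. The function name export_articles and the output folder are also placeholders.

```python
import csv
from pathlib import Path


def export_articles(csv_path, out_dir, id_col="ID", body_col="body"):
    """Write one plain-text file per CSV row, named by the row's ID value.

    id_col and body_col are assumptions about the CSV layout; change them
    to match the standardized column names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            article_id = (row.get(id_col) or "").strip()
            if not article_id:
                continue  # skip rows that have no ID value
            target = out_dir / f"{article_id}.txt"
            target.write_text(row.get(body_col) or "", encoding="utf-8")
            written.append(target)
    return written
```

Calling export_articles("sample.csv", "corpus/nyt") would then produce files such as corpus/nyt/nyt-2012-h-14.txt. Placing files into the right folders of the filestation tree, or into MongoDB/Manifest, is left out because that layout is still being decided.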