
Meeting 12 (2015-06-03)

Page history last edited by Alan Liu 9 years, 7 months ago






(1) Summer Development Plans 

  • Funding:
    • Currently, Alan projects he will have about $3,000 of his UCSB Academic Senate research grant left by the beginning of June (or about 200 hours at the RA rate of $15/hr)
    • Alan has applied for another Academic Senate research grant to continue the project next year (should hear this month)
  • Planning for Collection (Scraping) Runs in June-July:
    • Alan has recruited RAs for collection runs of New York Times, Wall Street Journal, USA Today, The Guardian, NPR, etc.:
      • Jonathan Callies (English)
      • Ashley Champagne (English)
      • Phillip Cortes (English)
      • Zach Horton (English)
      • (Alex Kulick -- possible, depending on his summer travels) (Sociology)
      • Patrick Mooney (English)
      • Christopher Walker (back after July 7th) (English)
    • RA orientation meeting: June 29th, 1 pm
    • Collection runs to be scheduled in July
      • Pre-planning and preparation:
        • Prepping the workstations in Transcriptions lab with full "kit" of tools, workspace folders, manifest forms
          • (also: prepping personal laptop computers with same)
        • Hard-copy instructions
        • Pre-organizing storage site for corpora
        • RAs to request API keys for various sources
      • Coordinating with Lindsay?
    • If possible, we will also start pre-processing (scrubbing, and if necessary also chunking), e.g. using Lexos.
      • Pre-planning and preparation:
        • Install Lexos locally on the workstations if possible.
        • Prepare stoplist(s).
    • If possible, we will also do some topic modeling.
      • Pre-planning and preparation:
        • Ensure we have latest version of Mallet installed on the workstations and on our personal computers.
        • Work out the kinks in local installations of Mallet
        • Decide on number of topics
        • Pre-organize storage scheme and manifest tracking forms
  • Manifest system Web form development:
    • If Alan gets another round of funding from the UCSB Academic Senate, he can try to recruit MAT grad students to help with developing the Web form system for manifests.



(1) Data Storage & CMS


  • Latest steps in development work: Scott's MongoDB Manifest Schema
    • Scott (from last meeting): documentation of the MongoDB model: https://github.com/scottkleinman/WE1S. ("Read the README first, and then move on to the Draft Schema document. Apologies that neither document is particularly short.")
    • Current progress toward the following goal (from the to-dos that came out of our last meeting)?: "Scott, Jeremy, and Chris will convene as a subcommittee to discuss the MongoDB-based schema that Scott has started, together with server, CMS, and related issues. The goal, if possible, is to have a solution we can use for the June-July collecting runs. If that is impractical, then the collection runs can just create doc versions of the manifest schema we have already created (and we will migrate the data to the new system later)."



(2) Latest Scraping Workflows (and implications for Manifest system)




(3) Our Earlier Discussions of the Manifest Schema, Controlled Vocabularies, etc.


