Meeting 23 (2015-10-2)


Spotlight

Siberia!

 


 

Scheduling of Meetings During Fall


Scraping Work

 


Topic Modeling Work

Developer Task Assignments (Topic Modeling)

    1. Accessing and working locally with "flattened" collections of files. (Jeremy?)
    2. Deduplication of "humanities," "liberal arts," and "the arts" files. (Jeremy or Alan?)
    3. Scrubbing (current files are kept on Google Drive: we1s-2 > stopwords_and_scrubbing_list) (Alan, Lindsay, Scott?) (Need to put these files on GitHub? Run an orientation meeting on GitHub?)
      1. Extra Stopwords list (current version created by Alan)
      2. config.py for Scott's Python scrubbing script (current version extended by Lindsay and then by Alan)
    4. Creating/forking deduplicated and scrubbed working corpora. (Long-range goal: through query?) (Short-range goal: manually created versions of our corpus for topic modeling experiments):
      1. all files
      2. sub-corpora by publication(s)
      3. sub-corpora by year(s)
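
As a starting point for discussing tasks 2 through 4, here is a minimal sketch of deduplicating, scrubbing, and splitting a small in-memory collection. It is not Scott's scrubbing script and does not read config.py or the actual Extra Stopwords list. The sample documents, the field names (`name`, `publication`, `year`, `text`), the placeholder stopword set, and the helper names `deduplicate`, `scrub`, and `split_by` are all assumptions made for illustration. Deduplication here is exact-match on whitespace- and case-normalized text, which may be too strict or too loose for our files.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical stand-in for a "flattened" collection of files.
DOCS = [
    {"name": "a.txt", "publication": "pub-a", "year": 2014,
     "text": "The humanities are in crisis."},
    {"name": "b.txt", "publication": "pub-b", "year": 2014,
     "text": "the  humanities are in CRISIS."},
    {"name": "c.txt", "publication": "pub-a", "year": 2015,
     "text": "Liberal arts and the arts."},
]

# Placeholder only; the real Extra Stopwords list lives on Google Drive.
EXTRA_STOPWORDS = {"the", "are", "in", "and"}


def normalize(text):
    """Lowercase and collapse runs of whitespace before hashing."""
    return " ".join(text.lower().split())


def deduplicate(docs):
    """Keep the first document for each distinct normalized text."""
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def scrub(text, stopwords):
    """Tokenize on letters/apostrophes and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return " ".join(t for t in tokens if t not in stopwords)


def split_by(docs, key):
    """Group documents into sub-corpora by a metadata field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc)
    return dict(groups)


# Build a working corpus: deduplicate, then scrub each text.
working = [
    {**doc, "text": scrub(doc["text"], EXTRA_STOPWORDS)}
    for doc in deduplicate(DOCS)
]
by_year = split_by(working, "year")
by_publication = split_by(working, "publication")
```

With this layout, `by_year` and `by_publication` correspond to the manually created sub-corpora in the short-range goal; a query-driven version would replace `split_by` with a database lookup.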

 


Manifest scheme, Database system, Backup system

 

Scott created a demo of web form access to a MongoDB database, and I have built a system to serve it out of containers (virtual machines). An early form example and a more recent database-connected example are hosted here:

 

    1. WE1S flask+deform  

    http://mirrormask.english.ucsb.edu:8500/

 

    2. WE1S flask+alpaca (+pymongo)  

    http://mirrormask.english.ucsb.edu:8501/

 

(NOTE: as always, you may need to connect to the campus VPN in order to access these URLs)