Meeting (2020-03-01) - Project Management Planning Meeting 3

Page history last edited by Alan Liu 2 years, 8 months ago

 

Meeting Time:       March 1, 2020, 3-5 pm (Pacific)

Meeting Location: Hollister Brewing Company

Meeting Zoom:     We'll use Alan's "instant" Zoom ID (our default meeting Zoom):  https://ucsb.zoom.us/j/760-021-1662

 

Red font = notes from the meeting


 

1. Dataset Releases

 

  • Primary public release: on Web site or GitHub?
    • Large dataset releases as organized by keyword/conceptual unit in MongoDB (e.g., humanities, science, comparison data, Reddit) 
      • Flat file datasets
        • Accompanied by Manifests


      • Queryable access via Manager? 
      • Notes from our discussion:
        • Datasets will be large batches of zipped JSON files (of "bags of tokens"), with a manifest, in a Frictionless Data package
        • Those batches will need to be sectioned and otherwise made useable for the laptop-based user (who will otherwise find it impossible to download, open, and work with millions of files)
        • We will also include utilities with the datasets that convert the JSONs to other common formats:
          • Scott's notebook for converting JSONs to CSV; and also a tool for converting to plain text
        • Datasets will be deposited in GitHub and Zenodo.
        • We intend to do a demo by the time we meet with Greg in early April. 
        • In regard to queryable access:
          • Manager will not be available for that within the scope of this grant.
          • But see below on hosting our topic models on our site 
    • "Collection" dataset releases--for each collection:
      • JSONs of "bags of tokens" with citations
      • Mallet files
      • Visualization files 
      • Notes from our discussion:
        • These collections, models, and visualizations will be packaged like the large-batch datasets above for downloading
        • However, we will also host the models live from our web server.
        • And we will also make available containers for each collection (able to serve up the live models). 
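
Illustrative sketch (not from the meeting): the notes above mention utilities that convert the "bags of tokens" JSONs to CSV and plain text, and the need to section large batches for laptop-based users. The Python below shows roughly what such helpers might look like. The JSON layout (a "name" field and a "bag_of_tokens" mapping of token to count) and the function names bag_to_csv, bag_to_text, and section are assumptions made for illustration; they are not taken from Scott's notebook or from the project's actual schema.

```python
import csv
import io
import json


def bag_to_csv(doc: dict) -> str:
    """Return 'token,count' CSV text for one document's bag of tokens.

    Assumes a hypothetical layout: doc["bag_of_tokens"] maps token -> count.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["token", "count"])
    for token, count in sorted(doc["bag_of_tokens"].items()):
        writer.writerow([token, count])
    return buffer.getvalue()


def bag_to_text(doc: dict) -> str:
    """Return plain text with each token repeated 'count' times.

    A bag of tokens has no word order, so the output is sorted
    alphabetically and is only suitable for order-free analysis.
    """
    words = []
    for token, count in sorted(doc["bag_of_tokens"].items()):
        words.extend([token] * count)
    return " ".join(words)


def section(filenames: list, size: int) -> list:
    """Split a long list of file names into batches of at most 'size'."""
    return [filenames[i:i + size] for i in range(0, len(filenames), size)]


# Hypothetical sample record, for illustration only.
sample = json.loads(
    '{"name": "doc-001", "bag_of_tokens": {"humanities": 2, "science": 1}}'
)

if __name__ == "__main__":
    print(bag_to_csv(sample))
    print(bag_to_text(sample))
    print(section(["a.json", "b.json", "c.json"], 2))
```

Because a bag of tokens discards word order, the plain-text output here cannot reproduce the original document; it only repeats each token according to its count.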

 

 

 

 

2. Social media datasets

 

  • Reddit
  • Twitter 

 

 

3.

 

  •  

 

 

 

4.

 

  •  

 

 

 

5.

 

  •  

 

 

 

 

Planning for Future Meetings

 

 

 

 

 

 

 
