

Meeting (2020-03-01) - Project Management Planning Meeting 3

Page history last edited by Alan Liu 4 years, 4 months ago


Meeting Time:       March 1, 2020, 3-5 pm (Pacific)

Meeting Location: Hollister Brewing Company

Meeting Zoom:     We'll use Alan's "instant" Zoom ID (our default meeting Zoom):  https://ucsb.zoom.us/j/760-021-1662


Red font = notes from the meeting


1. Dataset Releases


  • Primary public release: on Web site or GitHub?
    • Large dataset releases as organized by keyword/conceptual unit in MongoDB (e.g., humanities, science, comparison data, Reddit) 
      • Flat file datasets
        • Accompanied by Manifests

      • Queryable access via Manager? 
      • Notes from our Discussion:
        • Datasets will be large batches of zipped JSON files (of "bags of tokens"), with a manifest, in a Frictionless Data package
        • Those batches will need to be sectioned and otherwise made usable for laptop-based users (who would otherwise find it impossible to download, open, and work with millions of files)
        • We will also include with the datasets utilities that make the JSONs useable in other common formats:
          • Scott's notebook for converting JSONs to CSV; and also a tool for converting to plain text
        • Datasets will be deposited in GitHub and Zenodo.
        • We intend to do a demo by the time we meet with Greg in early April. 
        • In regard to queryable access:
          • Manager will not be available for that within the scope of this grant.
          • But see below on hosting our topic models on our site 
    • "Collection" dataset releases--for each collection:
      • JSONs of "bags of tokens" with citations
      • Mallet files
      • Visualization files 
      • Notes from our discussion:
        • These collections, models, and visualizations will be packaged like the large-batch datasets above for downloading
        • However, we will also host the models live from our web server.
        • And we will also make available containers for each collection (able to serve up the live models). 
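
The packaging discussed under Dataset Releases (sectioned batches of zipped JSON files with a manifest, plus a JSON-to-CSV conversion utility) could look roughly like the sketch below. It is an illustration only, not the project's actual tooling or Scott's notebook. The document fields `name` and `bag_of_words`, the batch file names, the package name `example-dataset`, and the default batch size are all hypothetical. The manifest only follows the general shape of a Frictionless Data Package descriptor (a `datapackage.json` with a `resources` list) and should be checked against the specification before use.

```python
import csv
import io
import json
import zipfile
from pathlib import Path


def section_into_batches(docs, batch_size):
    """Split a list of documents into consecutive batches of at most batch_size."""
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


def write_batches(docs, out_dir, batch_size=1000):
    """Write each batch as a zip of JSON files, plus a datapackage.json manifest.

    Assumes every document is a dict with a unique "name" key (hypothetical schema).
    Returns the manifest as a dict.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    resources = []
    for n, batch in enumerate(section_into_batches(docs, batch_size), start=1):
        batch_name = f"batch-{n:04d}"
        zip_name = f"{batch_name}.zip"
        with zipfile.ZipFile(out_dir / zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
            for doc in batch:
                zf.writestr(f"{doc['name']}.json", json.dumps(doc))
        resources.append({"name": batch_name, "path": zip_name, "format": "zip"})
    # Minimal descriptor: a package name and one resource entry per zipped batch.
    manifest = {"name": "example-dataset", "resources": resources}
    (out_dir / "datapackage.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
    return manifest


def bags_to_csv(docs):
    """Flatten bags of tokens into CSV text, one (document, token, count) row each."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["document", "token", "count"])
    for doc in docs:
        # Sort tokens so the output is stable across runs.
        for token, count in sorted(doc["bag_of_words"].items()):
            writer.writerow([doc["name"], token, count])
    return buf.getvalue()
```

A laptop-based user could then download a single `batch-0001.zip` rather than the full release, and call `bags_to_csv` on the documents it contains to get a table that opens in a spreadsheet or loads into a dataframe.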





2. Social media datasets


  • Reddit
  • Twitter 






















Planning for Future Meetings







