

Meeting (2018-12-04)

Page history last edited by Alan Liu 5 years, 7 months ago


Meeting Time:       Tuesday, December 4, 2018, 9-11 am (Pacific)

Meeting Location: DAHC (Digital Arts & Humanities Commons) (directions)

Meeting Zoom:     We'll use Alan's "instant" Zoom ID (our default meeting Zoom):  https://ucsb.zoom.us/j/760-021-1662




Preliminary Business


  • Upcoming Doodle poll to find possible "all-hands" meeting times in UCSB's Winter quarter (Jan-end of March)
  • UC Path issues: be vigilant!




Purpose of Today's Meeting


  1. Planning for work in Dec. and Jan. to prepare for next cycle of topic-model interpretation workshops beginning in Feb.

    1. Big-picture roadmap of WE1S project timeline of tasks for project years 2 & 3 (excerpted)

    2. Roadmap for rest of current academic year:

      • Ongoing collection work
      • Interpretation rehearsals (of parts or all of our interpretation protocol) in February and March.
      • "Real" interpretation of our corpus (including subcorpora) to address one to three specific research questions in spring quarter--e.g.,
        • What is the similarity/difference between mainstream news media and student discourse on the humanities? (Or a comparison between our main corpus and any subcorpus)
        • How do the humanities and sciences compare in news media?
        • How does the "cosmic background radiation" of the humanities compare in the U.S. West vs. East vs. Midwest vs. South?
        • How does our corpus compare to a "random" corpus? 
      • The goal is to position ourselves by the end of the academic year to embark on actual analytical and reporting work on our corpus.  
  2. Breakout meetings of teams


1. Status Updates


  • (Team 1) Primary Corpus Collection & Analysis Team -- (Led by: Lindsay Thomas, with the other PIs; Members: Tyler, Leila, Tarika, Rebecca, Jamal, Ryan, Sean)
  • (Team 2) Students and the Humanities Team (Led by Abigail Droge; Members: Jessica, Avery, Leila, Rebecca, Aleah, Tyler)
  • (Team 3) "Gender, Ethnic, and Racial Diversity and Interests" Team (Led by Giorgina Paiella; Members: Jamal, Aleah, Su)
  • (Team 4) Interpretation Lab Team (Led by Dan Baciu; Members: Su, Ryan, Sean, Cindy, Sihwa) 
    • Team 4 Google Drive folder 
    • Interpretation Lab Work Log   
    • Interpretation Protocol development (led by Dan)
    • Word Embedding (Fabian Offert, Teddy Roland, Isaac, Su, Ryan)
    • Data Visualization (Sihwa Park, Cindy Kang) 

    • "Random" Corpus (Led by Lindsay Thomas; Members: U. Miami RAs, Ray) 

      • Note: This group may be run in collaboration between Team #1 and Team #4



2. Tasks for Dec. and Jan. to prepare for next cycle of topic-model interpretation workshops beginning in Feb.



Fuller Interpretation Protocol (red = what we have workshopped so far)

  1. Preliminary steps:

    1. Method for sampling and normalizing materials from our corpus for modeling

    2. Method for assessing quality of a topic model

    3. Method for choosing granularity level to work with

    4. Scrubbing 

    5. Rules-based (and algorithmically assisted) method of labeling topics 

      • Eventually, we may be able to consolidate clusters of labeled topics under a controlled vocabulary for a topic (e.g., "economics" or "politics") that would be like a "codebook." 
    6. [Include initial "human topic modeling" to help assess topic model and also provide a preliminary overview of topics?] See Team 2's recommendations (Google Doc)

    7. Method for finding and reading original articles (i.e., when our articles are eventually "bagged" and not available as JSON text) 

  2. Method for macroanalysis of a topic model

    1. Identifying topics of interest

    2. Describing the structure of the model and relation of main topics

  3. Method for close analysis of a topic

  4. Method for comparative analysis of two topics

  5. Method for comparing two corpora or subcorpora (or sets of sources within a corpus) 

    1. Method for including social media in our analysis. (E.g., see Ray's topic model of Reddit)

  6. Method for longitudinal analysis of topics 

  7. Method for reporting on the interpretation of a topic model 



3. Breakout meetings


    • Team meetings
    • PIs to meet. PI topics include--
      • TBD




From Lindsay's Ryver Post to Team 1 (Primary Corpus Team) of Nov. 24, 2018:


First, I've finally managed to download data for and produce models of articles from the NYT from 2012-2017 containing the other keywords we agreed on last month.... I've produced 11 models based on different keywords, and one comparison model (articles from the NYT from 2012-2017 containing the word "humanities"), all of which you'll find on harbor 10000 in our team folder: http://harbor.english.ucsb.edu:10000/tree/write/projects/teams/2018-19-1-primary-corpus.


The data for all models is taken from the NYT from 2012-2017. All models were produced on harbor 10000 with the Python-only workflow, which means that the "Scaled" view in dfr-browser doesn't work, and neither does the topic bubbles visualization. (There's a lot more to say about why we're gradually going to switch to a Python-only workflow for modeling, and what problems that will solve and cause, and maybe we'll talk some about it on Tuesday. For now, it's enough to know that there are now two ways to make a dfr-browser: using notebook 4, which is an R notebook, or using only notebook 1, which contains a Python-only way to create a browser. The Python-only workflow works better if you want to create diagnostics files, or if you have a large number of articles [above 10,000].) Where possible, I've normalized the number of articles in each model to ~1200, which is the size of the comparison model (the NYT 2012-2017 "humanities" model), so that we are comparing models of roughly the same size. I did this in the notebook workflow by randomly selecting articles for inclusion in the model. Models with ~1200 articles have 100 topics; models with <500 articles have 25 topics. I chose these numbers quickly and without a ton of thought, so it's very likely the models aren't as good as they could be. But I hope they will suit our purposes. Links to each model are in the worksheets I've created (see below).
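As a rough illustration of the normalization step described above (randomly selecting articles so each model has roughly the same size, then picking a topic count by model size), here is a minimal Python sketch. It is not the code from the WE1S notebooks: the function names `sample_articles` and `choose_num_topics`, the fixed random seed, and the choice of 100 topics for any model with 500 or more articles are assumptions made for the example. The post only gives the two cases of ~1200 articles (100 topics) and <500 articles (25 topics).

```python
import random

# Approximate size of the comparison ("humanities") model, per the post.
TARGET_SIZE = 1200


def sample_articles(article_ids, target=TARGET_SIZE, seed=0):
    """Return a random subset of at most `target` article ids.

    If there are already `target` or fewer articles, all of them are kept.
    The seed is a hypothetical addition so the selection can be reproduced.
    """
    ids = list(article_ids)
    if len(ids) <= target:
        return ids
    rng = random.Random(seed)
    return rng.sample(ids, target)


def choose_num_topics(n_articles):
    """Pick a topic count from the model size.

    The post states 25 topics for models with fewer than 500 articles and
    100 topics for models with ~1200 articles. Sizes in between are not
    covered by the post; treating them as 100 is an assumption.
    """
    return 25 if n_articles < 500 else 100


if __name__ == "__main__":
    # Hypothetical search result with 5,000 matching article ids.
    hits = range(5000)
    selected = sample_articles(hits)
    print(len(selected), choose_num_topics(len(selected)))
```

Fixing the seed means that rerunning the sampling yields the same subset, which may help when several people are interpreting the same model from different worksheets.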

The models are as follows, in no particular order:

  • "art history", 100 topics
  • "english literature", 25 topics
  • "history", 100 topics
  • "humanities" AND "funding", 25 topics
  • "humanities" AND "science", 25 topics
  • "liberal arts", 100 topics
  • "literature", 100 topics
  • "nea", 25 topics
  • "philosophy", 100 topics
  • "science", 100 topics
  • "stem", 100 topics
  • comparison model: "humanities", 100 topics
  • The search for articles containing the keyword "neh" only returned 32 articles from 2012-2017, so I did not do a model for that keyword.

Here's what I'm thinking for Tuesday's meeting:

  1. Once we get to the breakout sessions part of the meeting, those of us in team 1 choose 1 model to focus on and interpret. In our team Google Drive folder (let me know if you don't have access to this), you will find a folder called "Topic model worksheets." This folder contains 12 worksheets I've created for interpreting the models. These worksheets are based on the interpretation protocol we went through during our last two meetings, with some revisions based on Abigail's feedback from her group and the new diagnostics files we are now saving via MALLET. Choose the model you want to focus on, click on that worksheet, and add your name to the top where it says "Name." If someone has already taken a model you wanted to work on, please just choose another one. The worksheets contain links to each model's browser and diagnostics file, as well as basic information about each model (the number of articles, the keyword, etc.). (If you want, obviously, you can choose your model ahead of the meeting.)
    • I will focus on the comparison model (the "humanities" 100-topic model), and will complete my worksheet by the meeting on Tuesday. During the meeting's breakout session, I actually have to meet with the other PIs about the interim report due to the Mellon. So I won't be able to meet with our team then. That's why I'm thinking we'll use the meeting time for individual work (see below).
  2. Use the breakout session time of the meeting to work on your worksheet. There are more models than there are members of our group, so if you are feeling ambitious, feel free to take on more than 1 model. Just remember to add your name to the top of the worksheet for any model you are working on or plan to work on.
  3. Finally, we schedule an actual team meeting for Thursday, December 6 from 10 am - 11 am PST to discuss our models. I'm shooting for Thursday just in case Alan wants to schedule another all-hands meeting for that Tuesday. Does Thurs, Dec 6 from 10 - 11 am PST work for everyone for a team meeting?


Planning for Future Meetings







