
Meeting (2019-02-14) UM Comparison Corpus Meeting

Page history last edited by Lindsay Thomas 2 years, 5 months ago



Meeting Time:       Thursday, February 14, 2 pm EST

Meeting Location: Library, Faculty Exploratory (room 305)



0. Preliminary Business


  • March timesheets due March 18 
  • Planning for future meetings (see bottom of agenda) 


Purpose of today's meeting

  • Exploration and comparison of humanities model and "not humanities" model, both composed of articles from top 10 US newspapers by circulation



1. Models for today


Sources used in each model -- top 10 US newspapers by circulation (that LexisNexis stores):

  • USA Today
  • The New York Times
  • Los Angeles Times
  • New York Post
  • Chicago Tribune
  • The Washington Post
  • Newsday
  • Daily News (New York)


Top 10 US newspapers NOT stored in LexisNexis:

  • The Wall Street Journal
  • AM New York 


  1. Humanities model: 75-topic model of articles containing the word "humanities," latest 2 years for each source (2014-2017), 1057 articles
    1. Dfrbrowser 
    2. PyLDAvis 
    3. Topic Bubbles 
    4. Project directory on harbor 
  2. Not humanities model: 75-topic model of articles NOT containing the word "humanities" (from comparison corpus, randomly sampled), 2017, 1184 articles
    1. Dfrbrowser 
    2. PyLDAvis 
    3. Topic Bubbles
    4. Project directory on harbor 


Something about the models to keep in mind:

  • In the Humanities model, The New York Times and The Washington Post are overrepresented, as these sources return by far the most results for articles containing the word "humanities."
    • NYT articles in Humanities model: 353 of 1057 (33% of articles)
    • Washington Post articles in Humanities model: 415 of 1057 (39% of articles)
    • Combined NYT and WP total: 768 of 1057 articles (72.6% of articles)
    • Other sources:
      • Chicago Tribune: 171 
      • Newsday: 30  
      • Daily News: 12  
      • USA Today: 6
      • New York Post: 3  
      • LA Times: 0  
  • Not Humanities model comparisons:
    • Chicago Tribune: 220 
    • NYT articles: 210 of 1184 (18% of articles)
    • LA Times: 170 
    • New York Post: 149  
    • Newsday: 138 
    • Daily News: 115  
    • Washington Post articles: 100 of 1184 (8% of articles)
    • USA Today: 82 articles   
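
The shares quoted above can be rechecked with a short script. This is only a sketch: the counts are copied from this agenda, and the variable and function names are made up for illustration. Note that the per-source counts listed for the Humanities model sum to 990, not 1057, so the shares below are computed against the reported totals.

```python
# Per-source article counts copied from the agenda above.
HUMANITIES_TOTAL = 1057      # reported size of the Humanities model
NOT_HUMANITIES_TOTAL = 1184  # reported size of the Not Humanities model

humanities_counts = {
    "The Washington Post": 415,
    "The New York Times": 353,
    "Chicago Tribune": 171,
    "Newsday": 30,
    "Daily News": 12,
    "USA Today": 6,
    "New York Post": 3,
    "Los Angeles Times": 0,
}

not_humanities_counts = {
    "Chicago Tribune": 220,
    "The New York Times": 210,
    "Los Angeles Times": 170,
    "New York Post": 149,
    "Newsday": 138,
    "Daily News": 115,
    "The Washington Post": 100,
    "USA Today": 82,
}


def source_shares(counts, total):
    """Return each source's share of `total` as a percentage (one decimal)."""
    return {source: round(100 * n / total, 1) for source, n in counts.items()}


humanities_shares = source_shares(humanities_counts, HUMANITIES_TOTAL)
not_humanities_shares = source_shares(not_humanities_counts, NOT_HUMANITIES_TOTAL)

# Print the Humanities model shares, largest first.
for source, share in sorted(humanities_shares.items(), key=lambda kv: -kv[1]):
    print(f"{source}: {share}%")
```

Running the same loop over not_humanities_shares gives the comparison figures for the second model.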


  • Talk about PyLDAvis example:
    • Humanities model: topic 2 in dfrbrowser vs. topic 2 in PyLDAvis
      • Relevance metric (λ) in PyLDAvis at 1.0 vs. 0.6
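
For reference, the relevance setting in PyLDAvis ranks a topic's terms by λ · log p(w|t) + (1 − λ) · log(p(w|t) / p(w)). At λ = 1.0 terms are ordered purely by their probability within the topic; lowering λ favors terms that are distinctive to the topic. The sketch below illustrates this with two made-up terms and made-up probabilities (they are not taken from our models), which may help explain why the same topic can show different top words at 1.0 and 0.6.

```python
import math


def relevance(p_w_given_t, p_w, lam):
    """Relevance of a term for a topic at weight `lam` (between 0 and 1)."""
    return lam * math.log(p_w_given_t) + (1 - lam) * math.log(p_w_given_t / p_w)


# Hypothetical toy values: term -> (p(w|t), p(w)). Not from our models.
toy_terms = {
    "said": (0.05, 0.04),         # frequent in the topic, but frequent everywhere
    "curriculum": (0.03, 0.002),  # less frequent in the topic, but distinctive
}


def rank_terms(terms, lam):
    """Return term names ordered from most to least relevant at `lam`."""
    return sorted(
        terms,
        key=lambda w: relevance(terms[w][0], terms[w][1], lam),
        reverse=True,
    )


print(rank_terms(toy_terms, 1.0))  # ordered by within-topic probability only
print(rank_terms(toy_terms, 0.6))  # distinctive terms get a boost
```

With these toy numbers the common term ranks first at 1.0 and the distinctive term ranks first at 0.6.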




2. Model interpretation


  • Google form for model interpretation (2.14.19 form) 
    • This form is stored in our team folder on our shared Google drive (Academic Year 2018-19 > Project Teams > Comparison Corpus Team)
    • Get as far as you can: aim to fill out through step 2d 
  • Results spreadsheet (2.14.19 form)
    • Also in our team folder on GDrive 


  • Humanities model first, ~30-45 min
  • Comparison model second, ~25-30 min 



3. Discussion 




4. Finding articles that are about the humanities


  • Use Humanities model from today
    1. Dfrbrowser 
    2. PyLDAvis 
    3. Topic Bubbles 
    4. Project directory on harbor 
  • Use Bibliography view in dfrbrowser
    • 2014: Ruth
    • 2015: Suchi
    • 2016: Ashley
    • 2017: Dieyun
    • 2018: Tarika
  • Skim! Do a quick first-pass read to whittle the total number of publications down to a more manageable number.
  • Record findings using Google form Ashley created in team Google drive (you should all be added into this folder; a link to it is also on Ryver in our team forum, in the top sticky post called "Links to get us started")
    • Google form for data entry (use the same form for both kinds of humanities articles: articles explicitly about the humanities as such, and articles about humanities subjects or disciplines but that aren't about the humanities writ large)
  • At our next meeting (March 5), we will discuss the first-pass process and begin selecting articles that are about humanities subjects or disciplines but that don't include the keyword "humanities."






Planning for Future Meetings


  • Next meetings: 
    • All-hands meeting: Feb. 28, Faculty Exploratory, 1 pm EST
      • UM team meeting: Feb. 28, Faculty Exploratory, 3 pm EST (canceled due to English Dept faculty meeting)
    • UM team meeting March 5? This is a Tuesday, but would 2 pm work? (I am out of town March 7)
    • All-hands meeting: Mar. 14 (UM spring break) (remote meeting via Zoom -- you are not required/expected to attend, but it is paid time if you want to attend the meeting)
    • UM team meeting March 19? This is a Tuesday, but would 2 pm work? (I am out of town March 21)






