Meeting (2019-03-05) UM Comparison Corpus Meeting

Page history last edited by Lindsay Thomas 6 years, 2 months ago

 

 

Meeting Time:       Tuesday, March 5, 2 pm EST

Meeting Location: Faculty Exploratory (library room 305)

 


 

0. Preliminary Business

 

  • March timesheets due March 18 at the latest 
  • All-hands meeting next week, 1 pm EST (10 am PST): can attend remotely 

 

Purpose of today's meeting

  • Check in about article classification; begin the process for classifying articles NOT about the humanities 

 

 

1. Hand classification of humanities articles discussion

 

  • 140ish marked as explicitly about the humanities 
  • Why we are doing this: produce data set of known classifications for training
  • Process:
    1. Train model on known data, A
      1. Set 1: articles about the humanities (explicit)
      2. Set 2: articles not about the humanities 
      3. Feature set:
        1. Tf-idf: a word's weight increases in proportion to the number of times it appears in the document and is offset by the number of documents in the corpus that contain it. 
        2. Possibly also using LIWC 
    2. Train model on known data, B
      1. Set 1: articles about the humanities (explicit and implicit)
      2. Set 2: articles not about the humanities 
      3. Feature set:
        1. Tf-idf: a word's weight increases in proportion to the number of times it appears in the document and is offset by the number of documents in the corpus that contain it. 
        2. Possibly also using LIWC 
    3. Test models A and B on unknown data
      1. Set 1: articles containing the word humanities (that the model hasn't seen before)
      2. Set 2: articles not containing the word humanities (that the model hasn't seen before)
  • The logic of steps 1 and 2 is to see whether, and how well, these classifiers can predict the class of an article from the presence or absence of a search term alone. 
  • Questions? 
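
For anyone who wants to see the tf-idf feature in concrete terms, here is a minimal sketch in plain Python. It is not the project's classification notebook. The function name `tfidf`, the whitespace tokenizer, and the three toy articles are all assumptions made for illustration, and the unsmoothed formula shown is only one common variant (libraries such as scikit-learn apply smoothing and normalization by default, so their numbers will differ).

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: weight} dict per document (hypothetical helper).

    tf  = count of the word in the document / document length
    idf = log(number of documents / number of documents containing the word)
    """
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical toy articles, not drawn from the comparison corpus.
articles = [
    "the humanities teach critical thinking",
    "the budget cuts hit the humanities",
    "the team won the game",
]
scores = tfidf(articles)
```

In this toy example, "the" appears in every article, so its weight is zero everywhere. "humanities" appears in two of the three articles and gets a small positive weight, while a word found in only one article, such as "game", is weighted more heavily. These per-word weights are the kind of features a classifier would be trained on.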

 

 

 

 

2. Read articles explicitly about the humanities

 

  • Google form results spreadsheet (NOT orange!)
  • Process:
    1. For those articles marked as being explicitly about the humanities:
      1. Using the link in the results spreadsheet, read the article.
      2. If it is indeed explicitly about the humanities as a concept/theme, highlight the results spreadsheet row in yellow (see example row 2 in spreadsheet).
  • 3 Readers:
    • Ashley
    • Suchi 
    • Tarika 

  

 

 

3. Classification of articles NOT about the humanities

 

  • New Google form (the "NOT about the humanities" form is orange, and so is the top row of its results spreadsheet!)
  • Working from 75-topic comparison corpus model 
  • Only need about 150 articles 
  • If article is somehow about the humanities (without containing the word humanities), enter it into the humanities form we worked with previously: WE1S Comparison Corpus Top 10 News Sources 2014-2017 
    • Mark as "does not contain the word humanities" on that form (this is a new edit to the form) 
  • Volunteers:
    • 2017, A-M (bibliography view): Ruth
    • 2017, N-Z (bibliography view): Dieyun

 

 

 

4. Goals

 

  • Next UM team meeting is March 19
    • Finish classification of articles NOT about the humanities by then
    • Finish reading/double-checking articles marked as explicitly about the humanities 
    • My goal: Get classification notebook written and in working order 

 

 

 

 

 

Planning for Future Meetings

 

  • All-hands meeting: Mar. 14 (UM spring break) (remote meeting via Zoom -- you are not required/expected to attend, but it is paid time if you want to attend the meeting)
  • UM team meeting: March 19, 2 pm, Faculty Exploratory 

 

 

 

 

 

 

 
