Meeting 2014-05-16

Page history last edited by Alan Liu 9 years, 10 months ago

4Humanities "WhatEvery1Says" Project

 

Project Idea | Current State of Project | Future Steps and Need for More Collaborators

 

WhatEvery1Says Project Idea

 

Research Material:

  • WhatEvery1Says Corpus #1 (collected manually)
  • WhatEvery1Says Corpus #2 (extended corpus collected systematically or algorithmically)

Research Questions:

(Example of an attempt to rebut clichéd premises in discussions of education: Valerie Strauss, "Five Bad Education Assumptions the Media Keeps Recycling")
Our hypothesis is that digital methods can help us learn new things about how media pundits, politicians, business leaders, administrators, scholars, students, artists, and others are actually thinking about the humanities. For example, are there sub-themes beneath the familiar dominant clichés and memes? Are there hidden connections or mismatches between the “frames” (premises, metaphors, and narratives) of those arguing for and against the humanities? How do different parts of the world or different kinds of speakers compare in the way they think about the humanities? Instead of concentrating on set debates and well-worn arguments, can we exploit new approaches or surprising commonalities to advocate for the humanities in the 21st century?

 

Specific research questions:
    • What are the common "themes" (ideas, theses, evidence, metaphors, etc.) that divide or join people discussing the humanities?
    • What are the lower-level or latent themes beneath those everyone "knows"?
    • What are the outlier themes?
    • What are the patterns of connection between themes, between spokespersons, and between media outlets?
    • How do themes compare across time?
    • How are themes differentiated by nation, region, gender, age, etc.?
    • Other questions ... 

Research Method: Topic Modeling

 

Other Possible Analytical Goals

 


Initial Proof of Concept

 


Intended Outcomes
  • Creation of an interactive site for exploring the topic model of WhatEvery1Says. (Cf. DFR-Browser, a browser-based visualization interface created by Andrew Goldstone for exploring his topic model of JSTOR articles.)
  • Co-authored research report or article on outcomes.
  • Workshop to brainstorm ways we can apply the outcomes in facilitating, guiding, or creating advocacy arguments and materials.

 

 

 

Current State of Project

 

Stage 1 Transformation of Corpus
(documents from raw corpus archived and extracted as plain text)

 

 

 

Stage 2 Transformation of Corpus
(plain text files cleaned and prepared for topic modeling)

We are currently working on specific components of the following set of processes. Ideally these should be explored iteratively alongside initial topic modeling runs, and ultimately they should be stitched together and automated as a single workflow:

 

  • Perform initial text cleaning, punctuation-stripping, and low-level prepping work -- automate using Lexos or other text-preparation tools?
  • Identify bigrams (e.g., "social sciences") that need to be converted to unigrams
  • Build a stop list (Jeremy Douglass)
  • Use named-entity parsers to identify proper names, etc., that can either be put in the stop list or set aside for social-network analysis separate from the topic modeling (Zach Horton and Liz Shayne)
  • Use part-of-speech taggers to allow us to experiment with subtracting verbs, etc., to improve the usefulness of topic modeling (Priscilla)
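
As a rough illustration of how the first three steps in this list might be chained into one workflow, here is a minimal Python sketch using only the standard library. It is not the project's actual pipeline: the bigram list, the stop list, and the function names (tokenize, join_bigrams, remove_stop_words, prepare) are all hypothetical placeholders, and the tokenizer is deliberately crude (it keeps only runs of the letters a-z, so apostrophes, digits, and accented characters are dropped).

```python
import re

# Hypothetical example values; the project's real bigram list and stop list
# are not given in these notes.
BIGRAMS = {
    ("social", "sciences"): "social_sciences",
    ("liberal", "arts"): "liberal_arts",
}
STOP_WORDS = {"the", "and", "of", "a", "in", "to", "is", "are"}


def tokenize(text):
    """Lowercase the text and strip punctuation, keeping only runs of a-z."""
    return re.findall(r"[a-z]+", text.lower())


def join_bigrams(tokens, bigrams=BIGRAMS):
    """Replace each listed two-word phrase with a single joined token."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in bigrams:
            out.append(bigrams[pair])
            i += 2  # skip both words of the matched phrase
        else:
            out.append(tokens[i])
            i += 1
    return out


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in stop_words]


def prepare(text):
    """Run cleaning, bigram joining, and stop-word removal in sequence."""
    return remove_stop_words(join_bigrams(tokenize(text)))
```

Bigrams are joined before stop words are removed so that a phrase containing a stop word could still be matched; whether that ordering suits the corpus is something to test against early topic modeling runs.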

 

Early Topic Model Run on the 61 Documents in the Stage 1 Transform Sample Corpus:

 

 

 

 

Future Steps and Need for More Collaborators

 

Major Tasks (some task groups could be the projects of other SoCal 4Humanities chapters / digital humanists):

  1. Continue advancing and experimenting with Stage 2 Transformation of WhatEvery1Says corpus.
  2. Iterative work on running and tweaking topic models of the corpus.
  3. Develop methods and scripts for automated, systematic identification of relevant documents for inclusion in raw WhatEvery1Says corpus:
    1. Identify available full-text corpora (e.g., newspaper and magazine online archives)
    2. Develop methods of searching and relevancy identification.
    3. Collect documents for Stage 1 transformation.
    4. Extend collection backward in time to selected sample decades.
  4. Develop (or borrow) methods of facilitating the interpretation of topic models:
    1. Create visualizations and other methods of "grokking" topic models
    2. Develop or adapt front-end interfaces for topic models.  Examples:
      1. Andrew Goldstone's interface
      2. Jeffrey M. Binder and Collin Jennings's interface
  5. Use the WhatEvery1Says corpus for other kinds of analysis:
    1. Social network analysis
    2. Other kinds of text analysis, or clustering analysis
  6. Possible future co-authoring of article(s)
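
For task 3 (searching and relevancy identification), one very simple starting point would be keyword counting. The Python sketch below is only an assumption about how such a filter could look: the search terms, the min_hits threshold, and the function names are hypothetical, and a real method would likely need to weigh document length and context as well.

```python
import re

# Hypothetical search terms; these notes do not specify the project's
# actual selection criteria.
KEY_TERMS = ("humanities", "liberal arts")


def relevance_score(text, terms=KEY_TERMS):
    """Count case-insensitive, whole-phrase occurrences of the search terms."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    )


def select_relevant(documents, min_hits=2):
    """Return names of documents scoring at least min_hits, best first.

    `documents` maps a document name to its plain text.
    """
    scored = {name: relevance_score(text) for name, text in documents.items()}
    keep = [name for name, score in scored.items() if score >= min_hits]
    # Sort by descending score, breaking ties alphabetically by name.
    return sorted(keep, key=lambda name: (-scored[name], name))
```

A threshold of two hits is arbitrary; it is there only to show how passing mentions might be separated from articles that are actually about the humanities.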
