Meeting (2018-11-13) Topic Model Interpretation Workshop (II)


 

Meeting Time:       Tuesday, November 13, 2018, 9-11 am (Pacific)

Meeting Location: DAHC (Digital Arts & Humanities Commons) (directions)

Meeting Zoom:     We'll use Alan's "instant" Zoom ID (our default meeting Zoom):  https://ucsb.zoom.us/j/760-021-1662

 

 


 

Purpose of today's meeting

 

Preliminary Business

 

 

 

 

1. Topic Model Interpretation Workshop (continued)

 

 

 

 

 

2. Discussion by all hands (meant to help further develop topic modeling settings and our interpretation protocol)

    1. Do we have enough consensus among all hands about the definitions of the topics?

      1. What is the right form/format for a definition of a topic? (list of the senses ["attributes"] gathered last Tuesday)

      2. Comment by Alan on Ryver to the project management group yesterday:

        "
        My initial thought is that a definition of a topic that is both functional (can be parsed, sorted, and labeled efficiently) and satisfyingly meaningful would have these components:

         

        * A label of two to three words (probably including at least one of the top words in the topic). Example: "humanities and ethics". We may need rules for how to form labels.

        * A gloss of the label in the form of a syntactically complete sentence. (I find the definitions that some groups wrote as sentence fragments or lists of concepts hard to grok.) It's unclear what kind of sentence we want, however. One option is a descriptive "about" sentence (e.g., "This topic is about the humanities and ethics."). Another is a thesis statement (e.g., "The humanities are integrally related to ethics."). The advantage of the descriptive format is that it does not try to force a topic or theme into an argument. Another advantage is that we could add to descriptions the "attributes" we discover (e.g., "This topic is about the humanities and ethics, and includes subtopics of . . . ")"

         

    2. Do the definitions derived from the top topics and related topics we examined help us better understand the themes, values, and connotations associated with the humanities in public discourse? Did the exercise help us discover a richer stock of humanities narratives?

    3. What improvements do we need to make in existing steps of interpretation?

    4. What additional steps or kinds of interpretation do we need protocols for? (Edited after discussion at the meeting; red = what we have tried so far)

      1. Method for assessing quality of a topic model

      2. Method for choosing granularity level to work with

      3. Rules-based (and algorithmically assisted) method of labeling topics 

        • Eventually, we may be able to consolidate clusters of labeled topics under a controlled vocabulary for a topic (e.g., "economics" or "politics") that would be like a "codebook"
      4. Method for identifying topics of interest

        • Method for identifying topics of interest for a particular research question 
      5. Method for analyzing a topic

        • After identifying topics of interest, we could invest the effort and time in studying them according to our interpretation protocol--in essence, unpacking the label for the topic into a definition. 
      6. Method for comparative analysis of two topics

      7. Method for macro-analysis of a topic model 

        1. Describing the relation of topics

        2. Describing the relative importance of topics 

      8. Method for comparing two corpora (or sets of sources within a corpus)

      9. Method for longitudinal analysis of topics 

    5. Future tools and methods:

      1. Topic Bubbles (example)

      2. LDAvis

      3. Word embedding
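
As one way to make the proposed topic-definition format concrete (a label of two to three words, probably containing a top word, plus a descriptive "about" gloss that can absorb discovered attributes), here is a minimal Python sketch. The class name `TopicDefinition`, its fields, and the two checks are hypothetical illustrations, not an agreed project format; the label rules in particular are only a guess at what "rules for how to form labels" might look like.

```python
from dataclasses import dataclass, field


@dataclass
class TopicDefinition:
    """Hypothetical record for one topic definition (label, top words, attributes)."""
    label: str                 # e.g. "humanities and ethics"
    top_words: list[str]       # top words of the topic, taken from the model
    attributes: list[str] = field(default_factory=list)  # optional subtopics

    def label_problems(self) -> list[str]:
        """Return rule violations for the label; an empty list means it passes."""
        problems = []
        words = self.label.lower().split()
        # Assumed rule 1: a label is two to three words long.
        if not 2 <= len(words) <= 3:
            problems.append("label should be two to three words")
        # Assumed rule 2: a label shares at least one word with the top words.
        if not set(words) & {w.lower() for w in self.top_words}:
            problems.append("label should include at least one top word")
        return problems

    def gloss(self) -> str:
        """Build the descriptive "about" sentence, appending attributes if any."""
        sentence = f"This topic is about {self.label}"
        if self.attributes:
            sentence += ", and includes subtopics of " + ", ".join(self.attributes)
        return sentence + "."
```

A definition built this way could be sorted and checked mechanically, for example `TopicDefinition("humanities and ethics", ["humanities", "ethics"]).gloss()`, while the thesis-statement style of gloss would still have to be written by hand.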