
Meeting (2017-08-03)

Page history last edited by Alan Liu 6 years, 9 months ago

 Meeting Outcomes:

(jump to notebook added after meeting at bottom of page)



 New UCSB "Kronos" timekeeping system (email)

 Next meeting (PIs + RAs): Thursday, August 17th, 1pm (?)



1. Scoping Paradigm Research Reports




2. Break-out tasks that emerged from last meeting


  • Alan to create a page where our group can suggest the kinds of research questions we think we would like WE1S's expanded corpus to allow us to address. We will prioritize these questions, and use them to guide us in prioritizing what materials our expanded corpus must include. In progress: page here
  • Sage & Ryan to study and use "lists" of journalism sources we already know about to draft a suggested list of top sources for us to gather. In progress: page here (see also Lindsay's Google doc)
  • Alan to consult with colleagues in the Communication Dept. to see if there are existing lists (representative, most impactful, etc.) of journalistic resources. In progress: letter sent (text below)
    Dear Andrew and Miriam,

    I thought I'd seek your advice, or perhaps a recommendation of anyone else in Comm. or on campus who might know:

    The team for a project called "WhatEvery1Says" that I direct has been using topic modeling to study a corpus of newspaper articles related to the humanities (with the goal of researching the state of public discussion of the humanities and arts). We've so far been working with a restricted set of publications--e.g., NY Times, Wash. Post, Wall St. Journal, LA Times--since the advent of fully digital articles c. 1980-90.

    Now it looks like we will be getting a Mellon grant to scale up the project considerably (we'll know for sure in Sept.). My team and I are studying how to expand the number of English-language publications we sample in a meaningful way to accommodate greater geopolitical, sociocultural, and other aspects of diversity (including balancing local and national news), while also paying attention to available criteria of media impact (circulation, etc.). We're also investigating widening to include born-digital journalism and audiovisual journalism available in transcript form.

    Do you know if there are any existing lists of newspapers, magazines, and/or online journalism sources (domestic and/or international) that Comm researchers or political/sociological researchers use as standard representative publications?

    We're trying not to reinvent the wheel, in other words.

    Any leads you have, or suggestions of other doors I should knock on, would be great.
  • Alanna to consult with subject librarians for the same purpose. Alanna's contact ideas:
    1. Leahkim Gannett who is the subject librarian for Communications/Dance/Film & Media Studies/Theatre;

    2. Rick Caldwell, the subject librarian for History (world)/Political Science/Economics (also he's married to Jan Caldwell, and is a lovely person);

    3. Jane Faulkner (English/ French);

    4. Ryan Lynch (Global and International Studies, Global Peace and Security, as well as being the subject librarian for the Iberian studies/Spanish/Portuguese, which may be helpful if we do eventually make a Spanish language corpus);

    5. Shari Laster (Government Information/Law and Legal Information);

    6. Chizu Morihara (Library Sciences); and

    7. Gary Colmenar (Linguistics).
  • Teddy to think about how to implement a way of tracking selection criteria for particular sources that would allow us to "score" them for inclusion in our corpus. In progress:
    Alan's synopsis of Teddy's slides in the form of a response to him this morning. His slides are here.

    1. To help us cognitively track and understand the shape of the corpus we want (essentially, to model it), we create a visualization platform--perhaps as simple as a set of slides--in which we insert publications in a tree organization. Elaborating on your idea:
    * we could evolve or change the tree as we go;
    * we could talk to Scott about how to express the corpus model in his Manifest schema (in JSON). This would logically treat each visual box as an "object" that in the future might be manipulated programmatically (e.g., via JavaScript) to generate the visual tree;
    * it would be possible to visualize and compare several versions of the tree. (This may be the way to deal with overlapping or difficult ontology structures that don't cleanly nest in each other).
    --[cf. Battle of Britain War Room at Uxbridge]
    --cf. WE1S "Manifest" system
            Scott's Manifest schema
            Form for a publication manifest in Scott's Manifest system

    2. After selecting the publications to collect from and doing the actual collection work, we set up what amounts to an analysis logic that:
    * prepares the corpus for a particular research question by choosing the right boxes from the tree;
    * then samples the publications in those boxes for an equal number of articles (2,000 articles from each publication).

    This makes topic modeling a logical process of assessing an overall sample in which each source carries equal weight. (But it also raises interesting issues about how to sample sources that differ in their total number of articles and in the density, length, and kinds of their articles.)
  • Sage & Ryan to use the data sheet as a means of modeling our corpus. (It is probably best to use a separate sheet of individual sources for this purpose, rather than reorganize the existing sheet of databases as Alan had mistakenly suggested during the meeting.)
    • To begin with, as an empty "wire frame" for a corpus model, we can create a separate band of rows in the sheet for each kind of source (e.g., newspaper, magazine, online publication, etc.)
    • Then we can begin adding particular candidate sources.
    • Later, we can "score" the candidate sources based on their "representativeness" criteria and review the balance of the overall corpus. 
    • Finally, we can map the sheet onto the Manifest schema and onto visualizations (Teddy's tree graph, Bourdieu-like XY graphs).
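Teddy's synopsis above (a corpus-model tree whose boxes are JSON objects, plus an analysis step that picks boxes for a research question and then samples an equal number of articles from each publication) can be sketched in Python. This is a minimal, hypothetical sketch: the keys ("name", "children", "publications"), the placeholder publication names, and all function names are illustrative assumptions, not Scott's actual Manifest schema. The indented text outline stands in for the visual tree that JavaScript might eventually draw.

```python
import json
import random

# Hypothetical corpus-model tree. Each "box" is a JSON object; the keys
# ("name", "children", "publications") and publication names are placeholders,
# not Scott's Manifest schema.
CORPUS_TREE_JSON = """
{"name": "corpus", "children": [
  {"name": "newspapers", "children": [
    {"name": "national", "publications": ["pub_a", "pub_b"]},
    {"name": "local", "publications": ["pub_c"]}
  ]},
  {"name": "magazines", "publications": ["pub_d"]}
]}
"""


def render_tree(box, depth=0):
    """Return an indented text outline of the tree (a stand-in for a visual tree)."""
    lines = ["  " * depth + box["name"]]
    for pub in box.get("publications", []):
        lines.append("  " * (depth + 1) + "- " + pub)
    for child in box.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines


def all_publications(box):
    """Every publication in a box and in all of its sub-boxes."""
    pubs = list(box.get("publications", []))
    for child in box.get("children", []):
        pubs.extend(all_publications(child))
    return pubs


def publications_in(box, wanted):
    """Publications under every box whose name is in `wanted` (the boxes chosen for a research question)."""
    if box["name"] in wanted:
        return all_publications(box)
    found = []
    for child in box.get("children", []):
        found.extend(publications_in(child, wanted))
    return found


def sample_equally(articles_by_pub, pubs, n=2000, seed=0):
    """Draw n articles per publication; publications with fewer than n are kept whole and flagged."""
    rng = random.Random(seed)
    sample, short = {}, []
    for pub in pubs:
        articles = articles_by_pub.get(pub, [])
        if len(articles) < n:
            short.append(pub)  # unequal totals: keep everything, but flag it
            sample[pub] = list(articles)
        else:
            sample[pub] = rng.sample(articles, n)
    return sample, short


if __name__ == "__main__":
    tree = json.loads(CORPUS_TREE_JSON)
    print("\n".join(render_tree(tree)))
    print(publications_in(tree, {"newspapers"}))
```

Returning short publications whole and flagging them is only one way to surface the unequal-totals problem noted in the synopsis; downsampling everything to the smallest publication would be another.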
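The wire-frame-then-score workflow for the data sheet (bands of rows per kind of source, candidate sources added, then scored and reviewed for balance) could work roughly as follows. A minimal sketch, assuming each candidate row carries a medium band and a 0-to-1 score for each facet; the facet names, weights, and 0.5 cutoff are hypothetical placeholders, not criteria the group has agreed on.

```python
from collections import defaultdict

# Hypothetical facets and weights (each facet scored 0-1). The group's real
# "representativeness" criteria (see Alan's whiteboard) may differ.
FACET_WEIGHTS = {"circulation": 0.4, "geographic_reach": 0.3, "audience_diversity": 0.3}


def score(source):
    """Weighted sum of a source's facet scores; a missing facet counts as 0."""
    return sum(w * source["facets"].get(f, 0.0) for f, w in FACET_WEIGHTS.items())


def balance_by_medium(sources, threshold=0.5):
    """Group candidates into medium bands (the wire frame) and count how many clear the threshold."""
    bands = defaultdict(lambda: {"candidates": 0, "selected": 0})
    for src in sources:
        band = bands[src["medium"]]
        band["candidates"] += 1
        if score(src) >= threshold:
            band["selected"] += 1
    return dict(bands)


if __name__ == "__main__":
    # Placeholder candidates, not real scoring data.
    candidates = [
        {"name": "Paper A", "medium": "newspaper", "facets": {"circulation": 1.0, "geographic_reach": 1.0}},
        {"name": "Paper B", "medium": "newspaper", "facets": {"circulation": 0.5}},
        {"name": "Magazine C", "medium": "magazine", "facets": {"circulation": 1.0, "audience_diversity": 1.0}},
    ]
    for c in candidates:
        print(c["name"], round(score(c), 2))
    print(balance_by_medium(candidates))
```

A weighted sum is just the simplest scoring rule to illustrate; the per-band counts are what would let us review the balance of the overall corpus.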



3. Source Research Project (source research project home page)

  • Data Sheet for WE1S Sources (Google spreadsheet) -- (Note to curators of the data sheet: download it periodically as an Excel file for backup.)
  • Task A (Database research)
    • RAs to be assigned as lead researchers for each existing database source in the Data Sheet for WE1S Sources.
      • Besides filling out the columns in the data sheet, the lead researcher for each source should produce a sample search results page and, if the licensing terms of the source allow, save it in the "Sample Search Results From Sources" folder (linked as a persistent link if possible; or copied onto a page or as a PDF or something).
    • But all RAs are asked to familiarize themselves with the following, clearly important database sources (list may grow):
      • LexisNexis
      • ProQuest
    • All RAs are also asked to keep their eye out for new sources during the course of their research (to be reported to Task B team or investigated directly)
    • Sage/Ryan to mark completed rows in the database sheet where we have done the initial research on that database, then assign the rows not yet researched (asking for volunteers).
    • Summary assessment of database sources to be conducted at our Aug. 17th meeting.
  • Task B (Scouting new databases and related sources): supervised by Lindsay, with Alanna, Jamal, and Giorgina
    • Three RAs to be assigned to work with Lindsay on finding other sources (not necessarily just databases) we haven't thought of yet. For example, the task group could canvass library resources, peruse bibliographies of research in relevant fields, etc. The task group would discuss/brainstorm possible sources; collate/combine sources; and then add the new sources to the existing data sheet.
    • Additional sources suggested by Lindsay:
      "I put together a Google doc of potential sources to include/add in the main corpus, which you can find here (well it’s really a list of lists of potential sources): https://docs.google.com/document/d/1yh8BAw_SXHFA8QFJ7I15uJwiyLG33HAC7lq5DnWO3sw/edit?usp=sharing (I also posted this on Ryver). I organized the sources in this document by circulation numbers/rankings, which I was thinking could be a useful metric for us as we’re thinking about what sources to include in the main corpus (something like as many of the top-25 or top-50 newspapers by circulation as we can get, as many of the top-25 magazines by circulation as we can get, etc.). This is a slightly different way of thinking about what sources to include, but it doesn’t preclude including other sources we decide we want to include in order to answer our specific research questions, like newspapers by and for different ethnic groups, for example. And many of the sources listed on these lists might already be included in our main corpus. But thinking about the composition of our main corpus in this way does have the advantage of being concrete. Lists like the ones I’ve started to include in the Google doc provide us with a kind of pre-existing logic that we can build on to."
  • Timeline:
    • Tasks A and B to proceed into mid August
    • Summit meeting on sources on Aug. 17th
    • Then, using the results of our scoping and source research projects together, from late August into September we will begin producing the actual list of publications for WE1S to collect (i.e., populating the data sheet with candidate publications, annotated with "representativeness" facets/criteria).
  • Teddy's counts of sources in LexisNexis: LNAUSA_A_to_Z.csv
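Lindsay's suggestion (as many of the top-25 or top-50 sources per medium by circulation as we can get) amounts to a simple filter over each ranked list. A hypothetical sketch: the publication names and circulation figures below are placeholders, not real data, and "available" stands for whatever sources our databases actually give us access to. Sources already in the main corpus are reported separately rather than dropped.

```python
# Placeholder rankings standing in for the circulation-ranked lists in
# Lindsay's Google doc; every name and figure here is hypothetical.
RANKED_LISTS = {
    "newspaper": [("Paper A", 900_000), ("Paper B", 750_000), ("Paper C", 400_000)],
    "magazine": [("Magazine X", 2_000_000), ("Magazine Y", 1_500_000)],
}


def plan_from_rankings(ranked, available, in_corpus, n=25):
    """For each medium, take the top n sources by circulation and sort them into
    already-in-corpus, collectable, and unavailable groups."""
    plan = {}
    for medium, rows in ranked.items():
        top = [name for name, _ in sorted(rows, key=lambda r: r[1], reverse=True)[:n]]
        plan[medium] = {
            "already_in_corpus": [s for s in top if s in in_corpus],
            "to_collect": [s for s in top if s not in in_corpus and s in available],
            "unavailable": [s for s in top if s not in in_corpus and s not in available],
        }
    return plan


if __name__ == "__main__":
    plan = plan_from_rankings(
        RANKED_LISTS,
        available={"Paper A", "Paper C", "Magazine Y"},
        in_corpus={"Paper B"},
        n=2,
    )
    for medium, groups in plan.items():
        print(medium, groups)
```

Using n=2 in the example just keeps the illustration small: Paper C falls outside the top 2 and is not considered at all, even though it is available.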

Meeting Outcomes -- To-Dos

  1. Alanna and Jamal to complete their scoping paradigm research reports.
  2. Everyone to add questions to the research questions page.
  3. Ryan & Sage to assess the "lists" of journalism sources and keep a scratch sheet of what appear to be the most important and useful metrics these lists will allow us to use in assessing candidate publications for our corpus.
  4. Alanna to consult with subject librarian(s).
  5. Sage & Ryan & Teddy to add columns for "representativeness" facets on the "Individual Sources By Medium Sheet" in our data sheet. (See Alan's whiteboard from last week's meeting for the facets/criteria we were considering.)
  6. [To be discussed in future: how to express our "representativeness" facets equivalently in a spreadsheet, the Manifest schema, and Graphviz so that we can:
  7. Ryan & Sage to mark completed rows in the "Database Sources" sheet of the datasheet, and to assign RAs to rows that still have to be researched.
  8. Task B group to begin populating the "Task B Brainstorming" sheet in the datasheet with suggestions for additional database and other resources to research
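For item 6, one way to keep the "representativeness" facets equivalent across a spreadsheet, the Manifest schema, and Graphviz is to store each source once and generate all three forms from it. A sketch under stated assumptions: the facet columns and the JSON layout are hypothetical and do not reflect the actual data sheet or Scott's Manifest schema. The DOT output is plain text that Graphviz's `dot` tool can render, here as a path from the corpus root through each facet value to the source.

```python
import csv
import io
import json

# One hypothetical candidate source with hypothetical facet columns.
# Field names are illustrative, not the real data-sheet columns or Manifest fields.
SOURCE = {"name": "Paper A", "medium": "newspaper", "region": "national", "language": "English"}
FACETS = ["medium", "region", "language"]


def to_csv_row(src):
    """Spreadsheet form: a header line plus one row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name"] + FACETS)
    writer.writeheader()
    writer.writerow(src)
    return buf.getvalue()


def to_manifest_json(src):
    """JSON form: the same facets nested under a "facets" object."""
    return json.dumps({"name": src["name"], "facets": {f: src[f] for f in FACETS}}, indent=2)


def to_dot(src):
    """Graphviz form: corpus -> facet values (in FACETS order) -> source."""
    lines = ["digraph corpus {"]
    parent = "corpus"
    for f in FACETS:
        node = f"{f}={src[f]}"
        lines.append(f'  "{parent}" -> "{node}";')
        parent = node
    lines.append(f'  "{parent}" -> "{src["name"]}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_csv_row(SOURCE))
    print(to_manifest_json(SOURCE))
    print(to_dot(SOURCE))
```

Keeping the facet order in one list (FACETS) is what keeps the three forms consistent; changing the order changes the shape of the Graphviz tree but not the spreadsheet or JSON content.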

Goals for Upcoming Aug. 17th "summit meeting" on sources research:

  • Assess and prioritize database sources to concentrate on based on research completed on the "Database Sources" sheet of the datasheet.
  • Discuss how to express our "representativeness facets" in the Manifest schema and Graphviz.
  • Discuss Teddy's idea for normalizing the number of articles to include in each category and publication when topic modeling (we may defer this issue because Teddy can't make the Aug. 17th meeting).
  • Set up for the main task of the rest of the summer (late August through September): populating the "Individual Sources By Medium Sheet" in our data sheet with candidate publications to be collected for the WE1S corpus.
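Teddy's normalization idea could be read in more than one way. One possible reading is to split a total article budget equally across categories, then equally across the publications within each category. A minimal sketch of that reading only; the function name, budget, and category contents are hypothetical.

```python
def allocate(total, categories):
    """Split `total` articles equally across categories, then equally across the
    publications in each category. Integer division: leftover articles within a
    category go to its first publications; any leftover from the category split is dropped."""
    per_category = total // len(categories)
    quotas = {}
    for pubs in categories.values():
        base, extra = divmod(per_category, len(pubs))
        for i, pub in enumerate(pubs):
            quotas[pub] = base + (1 if i < extra else 0)
    return quotas


if __name__ == "__main__":
    # Hypothetical budget and categories, for illustration only.
    print(allocate(12_000, {"newspapers": ["paper_a", "paper_b", "paper_c"], "magazines": ["magazine_d"]}))
```

With 12,000 articles split between a category of three newspapers and a category of one magazine, each newspaper gets 2,000 articles and the magazine gets 6,000. That is the tradeoff to discuss: equal weight per category is not the same as equal weight per publication.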
