Meeting (2018-01-18) RAs meeting

Page history last edited by Samina Gul Ali 6 years ago Saved with comment

 Meeting Outcomes:

(jump to notebook added after meeting at bottom of page)


Meeting time: Thursday, January 18, 2018, 10:00 a.m. Pacific. Meeting Zoom URL: https://ucsb.zoom.us/j/877286911
  • PIs: Alan Liu; co-PIs: Jeremy Douglass, Scott Kleinman, Lindsay Thomas
  • Project Manager: Samina Ali



  • UCSB RAs:
    • Rebecca Baker (English)
    • Nazanin Keynejad (Comp. Lit.)
    • Giorgina Paiella (English)
    • Aili Peeker (English)
    • Jamal Russell (English)
    • Tyler Shoemaker (English)



  • Text-Analysis Hacker Group:
    • Faculty
      • Fermín Moscoso del Prado Martín, Assistant Professor of Linguistics, UC Santa Barbara
    • Graduate Students
      • Sandra Auderset(Linguistics, UCSB)
      • Devin Cornell (Sociology, UCSB)
      • Nicholas Lester(Linguistics, UCSB)
      • Fabian Offert (Media Arts & Technology, UCSB)
      • Teddy Roland (English, UCSB)
      • Chloe Willis (Linguistics, UCSB) 


  •  U. Miami RAs:
    • Samina Ali (English, U. Miami)
    • Tarika Sankar (English, U. Miami)
    • Annie Schmalstig (English, U. Miami)
  •  CSUN RA: Sandra Fernandez 

Next meeting (?):  




Administrative Issues

  • Workshop on Markdown/Github now moved to Dec. 26th, 12:30-3:30 Pacific.
  • Alerts for the future:
    • WE1S workflow for ingest of articles for analysis to be ready (hopefully) by end of Feb. 
    • WE1S summer research camp & Advisory Board meeting: Planning Document 



1. Purpose of today's meeting


 From Lindsay's Ryver post of Jan. 10:

"At this meeting, we will be discussing the current state of our corpus and its representativeness. In order to help us do this, we are asking each of you to please come to the meeting prepared to discuss the progress you've made researching your area of focus. Please be ready to share the following things with the group:

  • An estimate of the number of sources you have been able to find in that area of focus (even if you haven't yet entered data about that source into the Google form).
  • How you have been making decisions about what counts as "representative" for your area of focus. Is there a particular metric you have been using? Are you working from some kind of list? Are you going for geographical coverage? Political spectrum coverage? Major outlet coverage? Etc.

Alan and I will also be assessing the overall state of the corpus before the meeting. After reports from everyone, we will discuss the current state of the corpus, where we seem to have good coverage and why, what areas of focus we might be missing sources from and why, etc."


2. Database Sheets/Forms For Corpus Collection Scoping Work 


 WE1S Corpus Collection Form

WE1S Corpus Collection List Form  

WE1S Corpus Collection List (current)  
  Deprecated version of corpus collection list  
Trello Board for current tasks  
Areas of Focus  



3. Reports by RAs 

--Ongoing work

--Estimate of the number of sources you have been able to find in your area of focus (even if you haven't yet entered data about that source into the Google form).

--How you have been making decisions about what counts as "representative" for your area of focus. Is there a particular metric you have been using? Are you working from some kind of list? Are you going for geographical coverage? Political spectrum coverage? Major outlet coverage? Etc.


  • Samina Ali
  • Rebecca Baker
  • Naz Keynejad 
  • Giorgina Paiella 
  • Aili Peeker 
  • Jamal Russell
  • Tarika Sankar 
  • Annie Schmalstig
  • Tyler Shoemaker




4. Assessing Current "Representativeness" of Collection List


(Using Copy of data sheet for counting and analysis)


Useful formulas for analyzing Google sheets (examples)

    =countif(E3:E365, "search-term")  (to count numbers of specific items in a range)

    =countif(E3:E365, "<>")  (to count numbers of non-empty cells in a range)
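
The same two counts can also be reproduced outside Google Sheets, for example on a column exported from the data sheet. The following is a minimal sketch in Python; the function names and the sample column are hypothetical and only illustrate the idea. Like `countif` with a plain text criterion, the match ignores case, but this sketch does not handle wildcards.

```python
def count_matching(cells, term):
    """Rough equivalent of =countif(range, "search-term").

    Counts cells whose text equals `term`, ignoring case.
    Wildcards (* and ?) are not supported in this sketch.
    """
    wanted = term.lower()
    return sum(1 for cell in cells if cell is not None and str(cell).lower() == wanted)


def count_non_empty(cells):
    """Rough equivalent of =countif(range, "<>"): counts non-empty cells."""
    return sum(1 for cell in cells if cell is not None and str(cell) != "")


# Hypothetical sample standing in for a column such as Col. E (Distribution Method).
sample_column = ["Print", "Online", "", "print", "Broadcast", None]

print(count_matching(sample_column, "Print"))  # cells equal to "Print", ignoring case
print(count_non_empty(sample_column))          # cells that contain anything at all
```

Comparing the non-empty count against the total number of rows is a quick way to see how many sources still lack a value in a given column.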


  • Current total of sources -- 362
  • Col. C-D (Ownership / Corporate) -- 85 in "Corporate" column
  • Col. E (Distribution Method) 
    • Print -- 277 
    • Broadcast -- 33 
    • Online -- 47 
  • Col. F (Medium)
    • Newspaper -- 230
    • Magazine --  54
    • Web news --  36 
    • TV -- 22
    • Radio --  10
  • Col. G (Format)
    • Broadsheet -- 137
    • Tabloid --  36
  • Col. H (Nationality), red = more than 3 items
    • US --  221
    • AL (Albania)  --  1
    • AT (Austria) --  1
    • AU (Australia) -- 13
    • AZE (Bahrain) --  1
    • BA (Bosnia and Herzegovina) --  1
    • BB (Barbados) -- 1
    • BG (Bulgaria) --  1
    • BW (Botswana) --  1
    • CA (Canada) -- 6 
    • CD (Republic of Congo) --  1
    • CH (Switzerland) --  1
    • CZ (Czech Republic) --  3
    • CM (Cameroon) --  2
    • CY (Cyprus) --  2
    • DE (Germany) --  4
    • DZ  (Algeria) --  1
    • EG (Egypt) --  1
    • ES (Spain) --  2
    • FJ (Fiji) --  1
    • FI (Finland) -- 1
    • FR (France) --  1
    • GE (Georgia) --  1
    • GH (Ghana) --  2
    • GR (Greece) --  1
    • HU (Hungary) --  2
    • IE (Ireland) --  3
    • IL (Israel) -- 1 
    • IN (India) -- 5 
    • IS (Iceland)  --  1
    • IT (Italy) --  1
    • JP (Japan) --  1
    • LV (Latvia)  --  1
    • MDA (Moldova) --  1
    • MX (Mexico) --  10
    • MT (Malta)  --  2
    • NG (Nigeria) --  3
    • NL (Netherlands) --  2
    • NO (Norway) --  2
    • PL (Poland) --  2
    • PT (Portugal) --  1
    • RO (Romania) --  1
    • RU (Russia) --  4
    • RW (Rwanda) --  1
    • SA (Saudi Arabia) --  2
    • SE (Sweden) --  1
    • SI (Slovenia) -- 1 
    • SK (Slovakia) -- 1
    • SN (Senegal) --  1
    • TH (Thailand) -- 1
    • UAE -- 1 
    • UG (Uganda) --  2
    • UK --  22
    • ZA (South Africa) -- 6 
    • ZW (Zimbabwe) --  2
  • Col. K (Press Freedom Ranking) 
  • Col. L (Circulation) (updated 2/9/2018, 11 AM EST)
    • DAILY:
      • Under 50,000: 94
      • 50,000 – 100,000: 32
      • 100,000 – 200,000: 44
      • 200,000 -  300,000: 24
      • 300,000 – 400,000: 9
      • 400,000 – 500,000: 11
      • 500,000 – 1 million: 8
      • 1 – 5 million: 16
      • 5 – 10 million: 4
      • 10 – 20 million: 3
      • 20 – 30 million: 1

    • WEEKLY:
      • Under 100,000: 7
      • 1 - 10 million: 2
      • 60– 70 million: 1


    • Under 500,000: 4
    • 500,000 – 1 million: 1
    • 1 - 5 million: 1
    • 5 - 10 million: 1
    • 10 - 15 million: 1
  • UNKNOWN: 207


  • Col. S (Geographical Coverage) 
  • Col. T (Cultural Class)
    • High brow -- 36
    • Middle brow -- 142
    • Low brow --  18
  • Col. AA (Political Orientation)
    • Libertarian -- 1
    • Liberal --  26
    • Progressive --  14
    • Center-Left --  23
    • Centrist --  23
    • Center-Right --  10
    • Conservative --  31
    • Independent  -- 1 
    • Neutral --  1
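
One way to read the Col. AA tallies is to group the labels into broad leanings and compare totals. The sketch below uses the counts listed above. The grouping of labels into "left", "center", and "right" is an assumption made for illustration, not a WE1S decision, and the function names are hypothetical.

```python
# Counts copied from the Col. AA (Political Orientation) tallies above.
orientation_counts = {
    "Libertarian": 1,
    "Liberal": 26,
    "Progressive": 14,
    "Center-Left": 23,
    "Centrist": 23,
    "Center-Right": 10,
    "Conservative": 31,
    "Independent": 1,
    "Neutral": 1,
}

# Assumption: an illustrative grouping of labels into broad leanings.
# Labels not listed here fall into "other".
LEANING = {
    "Liberal": "left",
    "Progressive": "left",
    "Center-Left": "left",
    "Centrist": "center",
    "Center-Right": "right",
    "Conservative": "right",
}


def leaning_totals(counts, leaning=LEANING):
    """Sum the per-label counts into broad leaning groups."""
    totals = {}
    for label, n in counts.items():
        group = leaning.get(label, "other")
        totals[group] = totals.get(group, 0) + n
    return totals


def shares(totals):
    """Express each group as a percentage of all classified sources."""
    total = sum(totals.values())
    return {group: round(100 * n / total, 1) for group, n in totals.items()}


totals = leaning_totals(orientation_counts)
print(totals)
print(shares(totals))
```

Under this grouping, 63 of the 130 sources with a recorded orientation fall on the left and 41 on the right. Note that only 130 of the 362 sources have an orientation recorded at all, so these shares describe the classified subset rather than the whole list.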



5. Next Steps


  • (a) Analysis of current list and recommendations for high-priority collection:
    • Assign RA to analyze Col. H (Nationality)
      • analyze by region
    • Assign RA to analyze by Col. K (Press Freedom Ranking)
    • Assign RA to analyze by Col. L (Circulation) 
    • Assign RA to analyze by Col. AA (Political Orientation) 
  • (b) Assess quality/value of articles in sources for the WE1S mission
    • Sample articles, including those centering on humanities, and those in which humanities is a peripheral mention 
  • (c) Initial trial runs of searching/downloading from sources.
    • Create searching/downloading recipes for database and other sources .
    • Comment on problems. 



Meeting Outcomes


1. Group To Do’s

  • Github workshop: Everyone must sign up for a Github account before next week’s workshop
  • Collection work and related research on our areas of focus:
    • As we come up against challenges and questions in our collection, we will document them (in both Giorgina’s document and Lindsay’s new section of the Google form).
    • We will also do a quick search to see if other scholars have addressed or researched issues relevant to our areas of focus. (At this point, the goal is to begin gathering citations and other material so that by the summer we will be in a position to judge whether there is enough material to merit writing a research report on a problem. The concrete example that came up in our meeting: has there been research, or are there ways of studying, what accounts historically for the left-leaning slant of independent zines, newspapers, etc.?)
    • Even if there is little info on a publication, for now we are entering it into the Google form. We will use the convention of adding “(?)”--without the quote marks--in sections of the form (that is, cells in the data sheet that the form reports to) where we currently have little confidence in the information.
    • As we move forward, we will determine if our running list serves as a place to keep track of everything, or just the “best” or most available sources
  • Initial searching of sources and inspection of articles found:
    • We will perform some test searches/downloads of articles containing the word “humanities” from databases or other sources of publications in our areas of focus
      • After we collect this initial sample and document anything significant or interesting, we can also do a secondary search with words like “literature,” “philosophy,” etc. 
    • We will do some reading of sample articles to get a feel for the material the sources offer.
    • We will add to or tweak the "recipes" page (or pages) on the PBWorks site that Samina will set up to document the exact steps for searching and downloading in specific database and other sources. 
      • Keep in mind that our PBworks page is public, so as we document comments and challenges about specific publications, be aware of your descriptions
  • Remember to keep track of all your work on Trello


2. Samina (Caribbean)

  • Collection:
    • Narrowed sources down to 32 major newspapers, but only 6 of them are available through academic databases (for now, she will enter all 32 into the Google form with the limited information available)
    • Samina will also go through our current corpus and begin checking/organizing for overall listings on circulation numbers
  • On Trello: will make a separate list of Spanish-language newspapers in the Caribbean (to possibly share with Tarika?)
  • On PBworks:
    • will put together a list of all our websites, documents, spreadsheets, etc; to be posted on PBworks
    • will set up a page where we can develop “recipes” for how to search and collect for news sources
      • We can also keep track of any problems/challenges/constraints we face when searching (for future RAs)
      • This page will also have links to our newspaper collection workflows from last year, so we have a sense of how to record our database workflows


3. Rebecca (Mexican Newspapers in English)

  • Issues of access—not much available in databases
  • Will be collecting translated newspapers as viable sources
  • Although the main priority is news sources, an additional low-priority sub-corpus collection will include publications on tourism, airline/cruise ship magazines, graduation speeches, academic discourse, etc.
    • As we explore these additional resources, start gathering information on any “terms and conditions”; keep an eye out for red flags that might hinder our use of the sources (add this in the “notes” section within the Google form)
  • Web news sources: there is currently over-representation of left-leaning sources; as we move forward we should be more mindful of our political representativeness in the corpus
  • Rebecca will go through our current corpus and begin checking/organizing for overall listings of Press Freedom Rankings


4. Naz (Middle East)

  • Currently avoiding government-sponsored news sources and publications that are purely focused on business (but Naz is keeping track of these on a side list)
  • Many Middle East newspapers are not actually published within the region; they come out of places like D.C. and Australia
  • Naz is adding countries like Pakistan, Afghanistan, Bangladesh, etc. to her list
    • She will also be in conversation with Aili about some Eastern European countries that are also considered part of the Middle East
  • Naz will also go through our current corpus and begin checking/organizing for overall listings of region/nation


5. Giorgina (Gender and Ethnic News)

  • Giorgina is currently narrowing the list based on “canonical” publications (Ms., Off Our Backs, etc.) as well as smaller publications that are mentioned by canonical sources
  • Is also considering how to flag or track publications that are likely to allow us to address the project's research questions related to diversity from a perspective addressed to or from specific groups (e.g., papers in ProQuest Ethnic Newswatch that serve a particular group, as opposed to broad-coverage newspapers that may also address ethnicity issues).
  • Is also making sure that we have coverage for all possible ethnic groups
  • Will mock up a google doc for us to record problems/challenges we face as we continue our collection (she has some notes from her summer research that may help us tackle certain issues)


6. Aili (Europe)

  • Aili is currently using a Guardian listing to determine “best” newspapers per country
    • This list is very thorough in terms of representation, but there are some questions about its categorization
  • Aili (like most of us) is not familiar with the politics of each nation, so it is a bit challenging to determine if representativeness covers the political spectrum
  • There is also the issue of number of newspapers per country (e.g. Russia vs. Luxembourg); for now, we are collecting everything deemed "useful" (using our best judgement), but as we move forward we will have to think about large countries and the number of sources included, compared to smaller countries (geographically and population-wise)


7. Jamal (political orientation)

  • Right now, we have an over-representation of left-leaning publications
  • There is also the issue of how we are determining a publication's political orientation (this changes based on region, historical context, cultural shifts, etc.)
    • As we move forward, Jamal will keep track of these details in the “notes” section of the Google Form, as well as his own notes
  • Jamal will also go through our current corpus and begin checking/organizing for overall political orientation


8. Tarika (Spanish Language Sources in US)

  • Organizing her sources by region/state
  • Will be including Puerto Rico in her collection
  • Not sure how her sources identify in terms of political orientation, brow, audience
    • Will contact a UM faculty member who might be able to provide insight on this issue (possibly Lillian Manzor?)


9. Annie (Canada)

  • Right now, there are simply too many: not sure how to narrow down sources
    • We are considering narrowing down based on circulation, but we don’t want to leave out any important perspectives
    • Annie has narrowed it down to 3 national newspapers, but also has 140 local/regional/provincial publications
    • This issue might be solved on its own later, when we determine what is available in academic databases
  • Next step is to contact two Canadian graduate students who might be able to provide insight on this issue: Ray Leonard (former UM student) and a UCSB student (former summer researcher)


10. Tyler (Africa)

  • Currently it is difficult to organize collection around region, state-owned, and political orientation
  • Tyler has determined two kinds of publications
    • Those connected to the continent’s colonial history (and sometimes published out of the former colonial nation)
    • National newspapers that focus on more local issues
  • Currently narrowing it down to two per nation (based on availability)
    • This is a bit challenging since we know very little about the politics of all these countries, but hopefully the details will come to light once we begin to analyze
      • Then we can determine if we need to improve our representativeness
  • We also want to keep in mind that this collection (and most of our international sources, really) might be a “shadow” of European colonialism


11. Lindsay

  • Will be adding a few sections to the google form
    • Region/Nation
    • Ethnicity/Gender (only to be considered if publication has a specific audience in mind)
  • Will bring up concerns about access and conglomerate ownership when speaking with Ryan Cordell (especially considering issues of “representativeness”)
  • Will email everyone instructions for Github download and log in
  • Will email UM students with information on completing the IP agreement


12. Things to consider as we continue:

  • An interactive political spectrum bar for publications
  • A world map for keeping track of how many articles we have per region
  • How to increase our knowledge of a publication when there is little info available online
    • Contact the publication HQ?
  • Since Xiuhe Zhang is no longer part of the team, we need to think about different ways to gain insight on English-language newspapers from East Asia
  • By the end of the academic year, we might have two corpora—what is available to us in academic databases and what is not available
    • This will help us with discussions about access and the challenges we have had with the project

