Scoping Project Home Page

Page history last edited by Alan Liu 3 years ago


This page serves as the home page for the WE1S scoping project, which researches strategies and resources for expanding the WE1S corpus of materials. (Last revised 7/12/17)

I. The Scoping Problem


A. Statement of Corpus Expansion Plans

[From grant proposal]: WE1S plans to devote research at the beginning of its timeline to determine which specific sources to target in these areas that will be most representative and useful for the project's goals. While the criteria for representativeness and usefulness will evolve iteratively as the project team begins its research on potential sources.... [Go to full statement on corpus expansion plans]


B. Statement of Plan for "Scoping Statement" at End of Project

[From grant proposal]: Finish collection work for the WE1S main corpus and sub-corpora, and create a "scoping statement" for the collection. Activities related to collecting and ingesting materials as datasets will be completed near the beginning of year 3 so that WE1S can concentrate on analysis and dissemination work. PI Liu and co-PI Thomas, with the assistance of RAs at their campuses and also in consultation with other co-PIs and postdocs, will take the lead in writing a scoping statement describing the nature, selection criteria, and organization of the project's gathered materials (with their associated manifests providing metadata on provenance and workflow) so that WE1S's public, humanities scholar and administrator, and digital humanities audiences will be able to understand what was gathered for study.


II. Seed Resources for Beginning to Think About the Scoping Problem


  1. Katherine Bode, "The Equivalence of “Close” and “Distant” Reading; or, Toward a New Object for Data-Rich Literary History," Modern Language Quarterly 78:1 (2017): 77-106. (paywalled article) (open-access preprint). Bode's article opens with a severe critique of "distant reading" in the digital humanities for its naive or non-transparent understanding of the corpora on which its studies are based. The part to focus on for our purposes runs from p. 95 to the end. Here, Bode articulates the idea of a corpus gathered for DH analysis that would be a "scholarly edition of a literary system" (i.e., a scholarly edition not just of a work but of a whole corpus of works).
     Essentially, WE1S wants to scope its selection of resources as a "scholarly edition of a media [not Bode's literary] system." WE1S will want to explain the rationale for its collected materials in a manner like that articulated by Bode, though with attention to contemporary media "impact."
  2. Anya Schiffrin and Ethan Zuckerman, "Can We Measure Media Impact? Surveying the Field," Stanford Social Innovation Review, Fall 2015. This overview of current approaches to assessing the "impact" of media sets out categories of assessment. Consensus on those categories, and the tools and data needed to implement them, have clearly not arrived. Nevertheless, just knowing what categories to consider might be useful for WE1S.


III. Paradigms to Investigate



IV. Format for Reports on Paradigms (reports to be assigned to RAs or teams of RAs)

(Cf. the "Research Reports" for the Transliteracies Project)


V. Criteria and Principles to Consider in Scoping (will evolve as scoping research continues)

  • "Representativeness"
    • Regional, national, local geographical representativeness
    • Political representativeness
    • High/middle/low-brow representativeness
    • Social representativeness
  • Circulation
  • "Impact"
    • Most cited
    • Most referred to from social media
    • Most influential

From Lindsay's email to Alan, 6/14/17:
"The Schiffrin and Zuckerman piece also seems helpful in terms of how it breaks down the conceptual category of “impact”: this includes not only reach, or circulation/traffic, but also the more-difficult-to-measure concepts of “influence” and “impact.” Impact seems very difficult to qualify or quantify, and I wonder how important it is for our research goals. If, for example, there was a highly influential article about the humanities — one cited or referred to many times by other articles — published in an outlet that isn’t eventually included in our corpus, is this important to us? Would not including this particular article in our corpus affect our corpus’s “representativeness” in a substantial way? My initial instinct is that it wouldn’t, not necessarily, since the decisions we are making about “representativeness” are being made at the level of the publication or outlet, not the individual article (i.e., "does this particular publication represent 'US public discourse' in some substantial way?" not “is this individual article representative of ‘US public discourse about the humanities’ in some substantial way?”). But perhaps this is the wrong scale to be thinking at? I’m not sure."


  • Syndication & Reprinting
    • See Ryan Cordell, "Reprinting, Circulation, and the Network Author in Antebellum Newspapers," American Literary History 27.3 (2015): 417-445.
    • Katherine Bode, "Fictional Systems: Network Analysis and Syndication Networks." Chapter 5 in her A World of Fiction: Mass-digitization, Nineteenth-century Australian Newspapers, and the Future of Literary History. Manuscript, 2017.
  • Chronology
    • Diachronic criteria for "representativeness"? (i.e., what is representative now, as opposed to 20 years ago?)
    • How to prioritize materials chronologically? (e.g., first collect the last five years, and then stage work so that we collect materials in batches going back in time?)
  • ?
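
One possible way to stage the chronological batches mentioned above, as a minimal sketch only. The function name `collection_batches`, the five-year batch size as a default, and the example years 2017 and 2000 are assumptions made for illustration; the project has not settled on any of them.

```python
def collection_batches(latest_year, earliest_year, batch_years=5):
    """Return (start, end) year ranges, most recent batch first.

    All parameters are hypothetical planning inputs, not project decisions.
    """
    if batch_years < 1:
        raise ValueError("batch_years must be at least 1")
    batches = []
    end = latest_year
    while end >= earliest_year:
        # Each batch covers batch_years years, clipped at the earliest year.
        start = max(end - batch_years + 1, earliest_year)
        batches.append((start, end))
        end = start - 1
    return batches


# Example: work backward from 2017 to an assumed cutoff of 2000.
print(collection_batches(2017, 2000))
# prints [(2013, 2017), (2008, 2012), (2003, 2007), (2000, 2002)]
```

The last batch is shorter when the span does not divide evenly, so no year before the cutoff is scheduled.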



IV.a Scoping Research Tracking Sheet


IV.b Scoping Research Graph?

Lindsay's idea (in email to Alan of 6/14/17):
"One thing that we might want to think more about are figures 3.1-3.3 from the “Canon/Archive” Lit Lab pamphlet you link to under Part IV of the Scoping Project Home Page (pg 4 from the pamphlet, https://litlab.stanford.edu/LiteraryLabPamphlet11.pdf). These figures describe a map of the “literary field,” which is based on Bourdieu’s famous diagram of the 19th-century French literary field. The x-axis represents popularity, and the y-axis represents prestige. I wonder if we could come up with a similar model for newspapers. For instance, the popularity axis could represent some combination of circulation figures and web traffic figures, and the prestige axis might include some metric of how “influential” certain publications are. For example, we could adopt/adapt Nate Silver’s methodology for calculating the “influence” of news outlets based on a representative (? this is arguable, I suppose) sample of their citation metrics (this is from the Forbes article you link to in Part IV of the Scoping Project Home Page: https://www.forbes.com/sites/jeffbercovici/2011/03/25/the-most-influential-news-orgs-according-to-google/#5d05b62541ae; Silver’s explanation of the method is here: https://fivethirtyeight.blogs.nytimes.com/2011/03/24/a-note-to-our-readers-on-the-times-pay-model-and-the-economics-of-reporting/?scp=2&sq=nate%20silver&st=cse&_r=0). We could then choose publications from this field that maximize both popularity and prestige, and that are technically feasible to scrape."

"One major problem with this approach, of course, is that it’s likely to miss some sources that we’ve already identified as high value for our particular research questions, like publications found in the Ethnic Newswatch database. We would need to come up with additional criteria describing the inclusion of these sources, and I’m not sure what such criteria would be. One idea would be to repeat a similar experiment to the one detailed above, but only including sources listed in databases like Ethnic Newswatch."
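
Lindsay's popularity/prestige idea could be prototyped along the following lines. This is only a rough sketch under several assumptions: the `Outlet` fields, the min-max rescaling, the equal weighting of circulation and web traffic, and the 0.5 thresholds are all invented here for illustration, and the sample figures are placeholders rather than real data for any publication. It does not reproduce Nate Silver's citation method or the Lit Lab figures.

```python
from dataclasses import dataclass


@dataclass
class Outlet:
    name: str
    circulation: float   # print circulation (hypothetical figure)
    web_traffic: float   # web visits over some period (hypothetical figure)
    citations: float     # times cited by other outlets (hypothetical figure)
    scrapable: bool      # whether collection is technically feasible


def min_max(values):
    """Rescale a list of numbers to the range 0..1 (all 0.0 if they are equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def place_in_field(outlets):
    """Return {name: (popularity, prestige)} with both axes scaled to 0..1."""
    circ = min_max([o.circulation for o in outlets])
    traffic = min_max([o.web_traffic for o in outlets])
    cites = min_max([o.citations for o in outlets])
    # Popularity: assumed equal-weight average of circulation and traffic.
    # Prestige: assumed to be the rescaled citation count alone.
    return {
        o.name: ((c + t) / 2, p)
        for o, c, t, p in zip(outlets, circ, traffic, cites)
    }


def select(outlets, min_popularity=0.5, min_prestige=0.5):
    """Keep scrapable outlets that clear both (assumed) thresholds."""
    field = place_in_field(outlets)
    return [
        o.name
        for o in outlets
        if o.scrapable
        and field[o.name][0] >= min_popularity
        and field[o.name][1] >= min_prestige
    ]


# Invented placeholder data, not real figures for any publication.
sample = [
    Outlet("Outlet A", 1000, 5000, 90, True),
    Outlet("Outlet B", 800, 4000, 100, False),
    Outlet("Outlet C", 200, 1000, 10, True),
]
print(select(sample))  # prints ['Outlet A']
```

Running the same functions on a list restricted to sources from a database such as Ethnic Newswatch would be one way to try the repeated experiment suggested above, since the rescaling is always relative to whatever list is passed in.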








