Developer Task Assignments (Scraping)

Page history last edited by Ashley Champagne 8 years, 2 months ago

 

 

 

 

 

New York Times collection work

Red = completed (queries for both "humanities" and "liberal arts").  Finished work here.

 

| Year | Collector | Started | Completed | Inspector (inspection protocol) | Inspection Notes | Inspection Completed | Inspection Sign-Off | "Reader" Scrubber | Scrubbing Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1981 (original bad scrape) | Alan Liu | 6/30/15 | 7/2/15 | Chris Walker | This year to be rescraped by Alan; about 20% of articles are missing "next pages" | | | | |
| 1981 (second scrape) | Alan Liu | 7/25/15 | 7/25/15 | Chris Walker | This is the second scrape from scratch of 1981. No detectable errors. | 7/31/15 | CW | Ashley | |
| 1982 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | In the humanities, there are more plain text files (316) than appear on the master spreadsheet (265). It appears that the chopping process produced a number of plain_text files with information only about the title, date, etc. I suggest a rescrape for humanities. Liberal arts completely clean. | 8/21/15 | | | |
| 1983 | Alan Liu | 7/3/15 | 7/5/15 | Chris Walker | Only minor errors | 7/24/15 | | Ashley | |
| 1984 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | In the humanities, there are more plain text files (302) than appear on the master spreadsheet (250). It appears that the chopping process produced a number of plain_text files with information only about the title, date, etc. I suggest a rescrape for humanities. Liberal arts completely clean. | 8/21/15 | | | |
| 1985 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Only minor errors | 8/21/15 | | | |
| 1986 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | No errors detected | 8/21/15 | CW | | |
| 1987 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Humanities folder is missing a significant number of plain text files--it only contains 148, but there are 230 listed on the spreadsheet. I suggest a rescrape. Liberal arts plain text folder had numerous duplicate files (corrected now). | 8/21/15 | | | |
| 1988 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Completely clean. | 8/21/15 | CW | | |
| 1989 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Many URLs missing (corrected). A few missing next pages. | 8/21/15 | | | |
| 1990 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | LA plain text missing file 31. Otherwise clean. | 8/21/15 | | | |
| 1991 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Completely clean. | 8/28/15 | CW | | |
| 1992 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Completely clean | 8/28/15 | CW | | |
| 1993 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Completely clean | 8/28/15 | CW | | |
| 1994 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Single missing "next page" in Humanities plain text files | 8/28/15 | | | |
| 1995 | Zach Horton | 7/3/15 | 7/28/15 | Chris Walker | Completely clean | 8/28/15 | CW | | |
| 1996 | Alan Liu | 7/9/15 | 7/10/15 | Chris Walker | Very clean | 7/24/15 | | Alan (Alan is also scraping "arts" in NYT for this year as an experiment) | |
| 1997 | Alan Liu | 7/10/15 | 7/22/15 | Lindsay Thomas | Clean | 7/28/15 | | | |
| 1998 | Phillip Cortes | 7/17/15 | 7/22/15 | Lindsay Thomas | Problems with aggregate-plain-text files for both "humanities" and "liberal arts" (didn't correspond to master spreadsheets). Re-copied and pasted master spreadsheets to create new aggregate-plain-text files for both search terms; re-split those aggregate files into individual text files for both "humanities" and "liberal arts" working_data and data folders. Individual plain text files should correspond to master spreadsheet now. | 7/28/15 | | | |
| 1999 | Jonathan Callies | 7/17/15 | 7/17/15 | Lindsay Thomas | Clean; renamed plain-text files for "liberal arts" to correspond to master spreadsheet file numbers (there was a blank first file). Missing "data" folders for 1999; created these folders and copied plain-text files for both "humanities" and "liberal arts" into proper folders. | 8/3/15 | | | |
| 2000 | Chris Walker | 7/17/15 | 7/22/15 | Lindsay Thomas | Clean; renamed plain-text files for "liberal arts" to correspond to master spreadsheet file numbers (there was a blank first file). | 8/3/15 | | | |
| 2001 | Ashley Champagne | 7/18/15 | 7/19/15 | Phillip Cortes | Clean, with a few "Corrections" articles of 50-70 words or less | 7/30/15 | | Lindsay | |
| 2002 | Jonathan Callies | 7/20/15 | 7/21/15 | Ashley Champagne | Clean, but no "data" folder. | 7/24/15 | | Lindsay | |
| 2003 | Jonathan Callies | 7/20/15 | 7/22/15 | Phillip Cortes | Had to reorder the plain text files to make them correspond with the master spreadsheets | 7/30/15 | | Lindsay | |
| 2004 | Lindsay Thomas | 7/21/15 | 7/22/15 | Ashley Champagne | Clean | 7/27/15 | | | |
| 2005 | Alan Liu | 7/24/15 | 7/25/15 | Chris Walker | Mostly clean. A few articles missing "next pages" are marked in red. | 7/29/15 | | | |
| 2006 | Alan Liu Patrick Mooney | 8/29/15 | | | | | | | |
| 2007 | Alan Liu | 7/5/15 | 7/7/15 | Chris Walker | Very clean. One article missing next page (marked in red). | 7/29/15 | | | |
| 2008 | Alan Liu | 7/7/15 | 7/9/15 | Chris Walker | JSON folder not in proper hierarchy. Several articles missing "next pages" (marked in red). | 7/29/15 | | | |
| 2009 | Alan Liu | 7/13/15 | 7/14/15 | Chris Walker | Liberal arts "Data Folder" has more plain text files (179) than are listed on the master spreadsheet (136). It seems there are several duplicate files in the data folder. A few long articles (and one short) are missing data (marked in red on the spreadsheet). | 7/29/15 | | | |
| 2010 | Alan Liu | 7/14/15 | 7/18/15 | Chris Walker | Mostly clean. I encountered a problem on the "humanities" master spreadsheet in which articles beginning with "http://query" have dead links listed in the URL column. Strangely, however, the "next page" URLs seem to work. I've marked these rows in red. | 7/31/15 | | | |
| 2011 | Lindsay Thomas | 7/17/15 | 7/20/15 | Chris Walker | Master spreadsheet for "Liberal arts" missing an "article #" column. It seems all plain text files begin with "@@@@@@@@@@@" (because of improper chopping?). One article incomplete. | 7/31/15 | | | |
| 2012 | Alan Liu | 7/26/15 | 7/26/15 | Chris Walker | It seems that the plain text files have two formats for the date (Sept. 11, 2012 v. 6/11/12). Not sure whether this needs to be corrected, or what caused it. Several articles missing "next pages". | 7/31/15 | | | |
| 2013 | Alan Liu | 7/18/15 | 7/19/15 | Chris Walker | Relatively clean. | 7/31/15 | | | |
| 2014 | Alan Liu | 7/19/15 | 7/20/15 | Chris Walker | Only minor problems--a few "next pages" missing. | | | | |
| 2015 | Alan Liu | | | | | | | | |
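
Several of the inspection notes above report the same kinds of problems: more plain text files than rows on the master spreadsheet (1982, 1984, 2009), files that hold only the title and date, and duplicate files. The following is a minimal sketch of how such a check might be automated. It is not the project's inspection protocol: the function name `audit_plain_text_folder`, the `.txt` extension, and the 200-character threshold for a "header-only" file are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def audit_plain_text_folder(folder, expected_count, min_body_chars=200):
    """Compare a folder of plain-text files with the master spreadsheet count.

    expected_count: number of article rows on the master spreadsheet.
    min_body_chars: hypothetical threshold; shorter files are treated as
    header-only (title, date, etc. but no article body).
    """
    files = sorted(Path(folder).glob("*.txt"))  # assumes a .txt extension
    seen = {}          # content hash -> first file name with that content
    duplicates = []    # (duplicate file name, file name it repeats)
    near_empty = []    # files too short to contain an article body

    for path in files:
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        if len(text) < min_body_chars:
            near_empty.append(path.name)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append((path.name, seen[digest]))
        else:
            seen[digest] = path.name

    return {
        "file_count": len(files),
        "expected_count": expected_count,
        "surplus": len(files) - expected_count,
        "near_empty": near_empty,
        "duplicates": duplicates,
    }
```

A positive `surplus` together with entries in `near_empty` or `duplicates` would point to the chopping problems described in the notes; a negative `surplus` would point to missing files, as in 1987.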
 
 
             
                     

 

 

Wall Street Journal collection work

Red = completed (queries for both "humanities" and "liberal arts"). Finished work here [TBD].

| Year | Collector | Started | Completed | Inspector | Inspection Notes | Inspection Completed | Inspection Sign-Off | "Reader" Scrubber | Scrubbing Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1984 | Alan Liu | 7/21/15 | 7/21/15 | Ashley Champagne | Clean | 8/11/15 | AC | | |
| 1985 | Jonathan Callies | 7/22/15 | 7/24/15 | Ashley Champagne | Clean | 8/10/15 | AC | | |
| 1986 | Jonathan Callies | 7/22/15 | 7/23/15 | Ashley Champagne | Pretty clean--just deleted some blank files in the plain text chopped folders. | 8/11/15 | AC | | |
| 1987 | Jonathan Callies | 7/23/15 | 7/23/15 | Ashley Champagne | Not all liberal arts articles have "liberal arts" in them, e.g. plain text 4, 30. | 8/19/15 | AC | | |
| 1988 | Jonathan Callies | 7/23/15 | 7/23/15 | Ashley Champagne | Master spreadsheet had 1987 also in it. This is corrected. | 8/19/15 | AC | | |
| 1989 | Alan Liu | 7/22/15 | 7/22/15 | Ashley Champagne | Clean | 8/28/15 | AC | | |
| 1990 | Jonathan Callies | 7/31/15 | 7/31/15 | Phillip Cortes | Clean | 7/31/15 | PC | | |
| 1991 | Jonathan Callies | 7/31/15 | 7/31/15 | Phillip Cortes | Clean | 7/31/15 | PC | | |
| 1992 | Jonathan Callies | 7/31/15 | 7/31/15 | Phillip Cortes | Clean | 7/31/15 | PC | | |
| 1993 | Jonathan Callies | 7/31/15 | 7/31/15 | Phillip Cortes | Need to redo for "humanities"--in the master spreadsheet many article bodies do not correspond with article titles. For example, Article A would have content for Article B, and B would have content for C, etc. No need to rescrape for "liberal arts." Rescraped the humanities. All the article bodies should correspond with the titles now. Thanks | 8/3/15 | PC | | |
| 1994 | Lindsay Thomas | 7/24/15 | 7/24/15 | Phillip Cortes | "Humanities" and "liberal arts" scrapes are clean. | 8/5/15 | PC | | |
| 1995 | Lindsay Thomas | 7/24/15 | 7/24/15 | Phillip Cortes | Clean, though there were duplicate files which were deleted. | 8/5/15 | PC | | |
| 1996 | Lindsay Thomas | 7/26/15 | 7/26/15 | Phillip Cortes | Clean. Duplicates deleted. | 8/19/15 | PC | | |
| 1997 | Lindsay Thomas | 7/26/15 | 7/26/15 | Ashley Champagne | Not all files in "Liberal Arts" folder have "liberal arts" in them, e.g. 11/17/1997 "Jazz: Swinging at 18th and Vine." My guess is that there are duplicate files in the liberal arts folder because the articles that don't have liberal arts have "humanities". Also: the url file is missing. | 8/5/2015 | AC | | |
| 1998 | Jonathan Callies | 7/23/15 | 7/23/15 | Ashley Champagne | There is no proper master spreadsheet for the humanities. Currently the master spreadsheets for the liberal arts and the humanities are the same, but the number of chopped files in the liberal arts corresponds to the master spreadsheet for the liberal arts. So, the humanities master spreadsheet is the missing one. Liberal arts contained the duplicate data. I rescraped liberal arts for 1998--the master spreadsheet and the chopped files should correspond now. | 8/5/2015 | AC | | |
| 1999 | Jonathan Callies | 7/23/15 | 7/23/15 | Ashley Champagne | No errors detected. | 8/5/2015 | AC | | |
| 2000 | Jonathan Callies | 7/24/15 | 7/24/15 | Ashley Champagne | Pretty clean--I deleted the first blank files in the "data" and "working data" folders. | 8/10/2015 | AC | | |
| 2001 | Jonathan Callies | 7/24/15 | 7/24/15 | Chris Walker | No errors detected. | 8/3/15 | CW | | |
| 2002 | Jonathan Callies | 7/24/15 | 7/24/15 | Chris Walker | No errors detected. | 8/3/15 | CW | | |
| 2003 | Jonathan Callies | 7/24/15 | 7/24/15 | Chris Walker | No errors detected. | 8/3/15 | CW | | |
| 2004 | Jonathan Callies | 7/24/15 | 7/24/15 | Chris Walker | No errors detected. | 8/3/15 | CW | | |
| 2005 | Jonathan Callies | 7/24/15 | 7/24/15 | Chris Walker | No errors detected. | 8/3/15 | CW | | |
| 2006 | Jonathan Callies | 7/24/15 | 7/27/15 | Chris Walker | No errors detected. | 8/3/15 | CW | | |
| 2007 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | No errors detected. | 8/5/15 | CW | | |
| 2008 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | Plain text folder contains what seem to be duplicates of the humanities plain text files. | 8/5/15 | | | |
| 2009 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | No errors detected. | 8/5/15 | CW | | |
| 2010 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | No errors detected. | 8/5/15 | CW | | |
| 2011 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | 2 duplicate plain text files were present in the humanities folder. | 8/5/15 | CW | | |
| 2012 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | No errors detected. | 8/14/15 | CW | | |
| 2013 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | A single error detected; an article over 5000 words was truncated. | 8/14/15 | | | |
| 2014 | Jonathan Callies | 7/27/15 | 7/27/15 | Chris Walker | Humanities completely clean. The master spreadsheet for Liberal arts is scrambled--urls don't match with the articles. Suggest a new scrape. Now clean. | 8/14/15 | CW | | |
| 2015 | | | | | | | | | |
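
The 1987 and 1997 notes above flag files in the "liberal arts" folder that never mention "liberal arts". A check of that kind could be sketched as follows. This is only an illustration, not a tool the project is known to have used: the function name `files_missing_phrase` and the `.txt` extension are assumptions.

```python
import re
from pathlib import Path


def files_missing_phrase(folder, phrase):
    """Return names of .txt files in `folder` that never mention `phrase`.

    Matching ignores case and accepts any run of whitespace (including a
    line break) between the words of the phrase.
    """
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    missing = []
    for path in sorted(Path(folder).glob("*.txt")):  # assumes .txt files
        text = path.read_text(encoding="utf-8", errors="replace")
        if not pattern.search(text):
            missing.append(path.name)
    return missing
```

Running it on a "liberal arts" folder with the phrase "liberal arts", and then again with "humanities", would show whether the unexpected files are humanities articles that ended up in the wrong folder, as the 1997 note suspects.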
                   
                     

 

 

Guardian collection work

Red = completed (queries for both "humanities" and "liberal arts"). Finished work here [TBD].

| Year | Collector | Started | Completed | "the arts" scraped (by, date) | Inspector | Inspection Notes | Inspection Completed | Inspection Sign-Off | "Reader" Scrubber | Scrubbing Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1994 | Jonathan Callies | 7/28/15 | 7/28/15 | Jonathan Callies, 8/5/15 | Phillip Cortes | Clean, though spreadsheet didn't input the author name. | 8/19/15 | PC | | |
| 1995 | Jonathan Callies | 7/28/15 | 7/28/15 | Jonathan Callies, 8/5/15 (no hits) | Phillip Cortes | No need to clean. | 8/19/15 | PC | | |
| 1996 | Jonathan Callies | 7/28/15 | 7/28/15 | Jonathan Callies, 8/5/15 | Phillip Cortes | Clean, though spreadsheet didn't input author name. | 8/19/15 | PC | | |
| 1997 | Jonathan Callies | 7/28/15 | 7/28/15 | Jonathan Callies, 8/5/15 | Phillip Cortes | Clean. | 8/19/15 | PC | | |
| 1998 | Jonathan Callies | 7/28/15 | 7/28/15 | Jonathan Callies, 8/5/15 | Phillip Cortes | Clean, though not all of the author names were included in the spreadsheet. | 8/27/15 | PC | | |
| 1999 | Jonathan Callies | 7/28/15 | 7/28/15 | Alan Liu, 8/2/15 | Phillip Cortes | Clean. There were no plain text files created for "the_arts", though, so they were added. | 8/27/15 | PC | | |
| 2000 | Jonathan Callies | 7/28/15 | 7/29/15 | Jonathan Callies, 8/6/15 | Ashley Champagne | The first plain text files in the "liberal arts" and "humanities" scrapes are empty, which leads to a mismatch in the number of files listed in the master spreadsheet and those in the folder (latter is increased by 1). Other minor errors. | 1/20/16 | AC | | |
| 2001 | Jonathan Callies | 7/29/15 | 7/29/15 | Jonathan Callies, 8/6/15 | Ashley Champagne | Many of the dates in the master spreadsheet are incorrect (especially for "the arts"). However, I think this is due to updates to some of the articles. I don't suggest a re-scrape. | 1/20/16 | AC | | |
| 2002 | Jonathan Callies | 7/29/15 | 7/29/15 | Jonathan Callies, 8/6/15 | Ashley Champagne | The first plain text files in the "liberal arts" and "humanities" scrapes are empty, which leads to a mismatch in the number of files listed in the master spreadsheet and those in the folder (latter is increased by 1). For "the arts," there is also a mismatch between the number of plain text files and the files listed in the master spreadsheet. I cannot locate the reason why. A few "next pages" missing. | 1/20/16 | AC | | |
| 2003 | Jonathan Callies | 7/29/15 | 7/29/15 | Jonathan Callies, 8/11/15 | Ashley Champagne | First file in "liberal arts" plain text is blank, so there is a mismatch with the number on the spreadsheet. In "the arts," mismatch by 2 (713 in plain text files and 711 in spreadsheet). | 1/20/16 | AC | | |
| 2004 | Alan Liu | 7/23/15 | 7/23/15 | Jonathan Callies, 8/11/15 | Ashley Champagne | Strange redundancy in humanities files. A few truncated files in "the arts." | 1/20/16 | AC | | |
| 2005 | Jonathan Callies | 7/29/15 | 7/29/15 | Jonathan Callies, 8/11/15 | Ashley Champagne | Mismatch in "liberal arts" plain text files and master spreadsheet--first file is blank. | 1/20/16 | AC | | |
| 2006 | Jonathan Callies | 7/29/15 | 7/29/15 | Jonathan Callies, 8/11/15 | Ashley Champagne | Several of the authors missing from the master spreadsheets, but otherwise clean. | 1/20/2016 | AC | | |
| 2007 | Jonathan Callies | 7/29/15 | 7/29/15 | Jonathan Callies, 8/11/15 | Ashley Champagne | | | | | |
| 2008 | Jonathan Callies | 7/30/15 | 7/30/15 | Jonathan Callies, 8/14/15 | Ashley Champagne | | | | | |
| 2009 | Jonathan Callies | 7/30/15 | 7/30/15 | Jonathan Callies, 8/14/15 | Ashley Champagne | | | | | |
| 2010 | Jonathan Callies | 7/30/15 | 7/30/15 | Jonathan Callies, 8/14/15 | Ashley Champagne | | | | | |
| 2011 | Jonathan Callies | 7/30/15 | 7/30/15 | Jonathan Callies, 8/14/15 | Ashley Champagne | | | | | |
| 2012 | Jonathan Callies | 7/30/15 | 7/30/15 | Jonathan Callies, 8/14/15 | Ashley Champagne | | | | | |
| 2013 | Jonathan Callies | 7/30/15 | 7/30/15 | Jonathan Callies, 8/14/15 | Ashley Champagne | | | | | |
| 2014 | Jonathan Callies | 7/30/15 | 7/30/15 | Alan Liu, 8/2/15 | Ashley Champagne | | | | | |
| 2015 | | | | | | | | | | |
                     
                       

 

 

NPR collection work

Red = completed (queries for both "humanities" and "liberal arts"). Finished work here [TBD].

| Year | Collector | Started | Completed | Inspector | Inspection Notes | Inspection Completed | Inspection Sign-Off | "Reader" Scrubber | Scrubbing Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2004 | Jonathan Callies | 8/3/15 | 8/3/15 | Ashley Champagne | | 8/12/2015 | AC | | |
| 2005 | Jonathan Callies | 8/3/15 | 8/3/15 | Ashley Champagne | There is only one article in "liberal arts" folder, but the article doesn't actually contain the phrase "liberal arts." Should we delete this year? | 8/12/2015 | AC | | |
| 2006 | Jonathan Callies | 8/3/15 | 8/3/15 | Ashley Champagne | | 8/12/2015 | AC | | |
| 2007 | Jonathan Callies | 8/3/15 | 8/3/15 | Ashley Champagne | | 8/12/2015 | AC | | |
| 2008 | Jonathan Callies | 8/3/15 | 8/3/15 | Ashley Champagne | | 8/14/2015 | AC | | |
| 2009 | Jonathan Callies | 8/4/15 | 8/5/15 | Ashley Champagne | | 8/14/2015 | AC | | |
| 2010 | Jonathan Callies | 8/5/15 | 8/5/15 | Ashley Champagne | An issue here that I can fix. Found an article (there may be more) for which the scrape didn't get the whole article. NPR articles sometimes have subtitles, and I think the scrape may treat the article as finished before it is, e.g. "Humanities" article "Google Book Tool". | | AC | | |
| 2011 | Jonathan Callies | 8/5/15 | 8/5/15 | Ashley Champagne | | 8/14/2015 | AC | | |
| 2012 | Jonathan Callies | 8/5/15 | 8/5/15 | Ashley Champagne | Some extra columns in the Excel spreadsheet (edition, source) | 9/28/2015 | AC | | |
| 2013 | Jonathan Callies | 8/5/15 | 8/5/15 | Ashley Champagne | Some extra columns in the Excel spreadsheet (edition, source) | 9/28/2015 | AC | | |
| 2014 | Jonathan Callies | 8/5/15 | 8/5/15 | Ashley Champagne | Some extra columns in the Excel spreadsheet (edition, source) | 9/28/2015 | AC | | |
| 2015 | | | | | | | | | |

 

 
 
           
   
 
 
 
 
   
 
 

 

 

The Washington Post collection work

Red = completed (queries for both "humanities" and "liberal arts"). Finished work here [TBD].

| Year | HTML file in shared drive (in aggregate_working_data folder for each search term) | Collector | Started | Completed | Inspector | Inspection Notes | Inspection Completed | Inspection Sign-Off | "Reader" Scrubber | Scrubbing Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1987 | X | Ashley Champagne | 9/7/2015 | 9/16/2015 | Jamal Russell | Liberal arts agg. working data folder missing agg. plain text documents. | 11/19/2015 | | | |
| 1988 | | Ashley Champagne | 9/30/2015 | 10/7/2015 | Jamal Russell | Two articles (File 63 in the LA folder; File 18 in the Humanities folder) have been cut off. | 11/19/2015 | | | |
| 1989 | | Ashley Champagne | 10/7/2015 | 10/7/2015 | Jamal Russell | Humanities master Excel spreadsheet in LA agg. working data folder. | 11/19/2015 | | | |
| 1990 | | Ashley Champagne | 10/7/2015 | 10/7/2015 | Jamal Russell | 1989 LA master Excel spreadsheet in Humanities agg. working data folder. LA agg. working data folder missing agg. plain text documents. Two Humanities files (2 and 30) have been cut off. | 11/19/2015 | | | |
| 1991 | | Ashley Champagne | 10/8/2015 | 10/8/2015 | Jamal Russell | File 15 in the Humanities folder is essentially blank (only the date, title, and author are in the plain text file). File 70 in the Humanities folder has been cut off. | 11/20/2015 | | | |
| 1992 | | Ashley Champagne | 10/8/2015 | 10/8/2015 | Jamal Russell | Humanities master spreadsheet in LA agg. working data folder. Files 96 and 135 in the Humanities folder are essentially blank (date and title only). File 27 in the LA folder has been cut off. | 11/20/2015 | | | |
| 1993 | | Ashley Champagne | 10/8/2015 | 10/8/2015 | Jamal Russell | File 149 in the Humanities folder is blank (date and title only). | 11/20/2015 | | | |
| 1994 | | Ashley Champagne | 10/8/2015 | 10/8/2015 | Jamal Russell | File 4 in the Humanities folder is blank (date, title, and author only). File 120 in the Humanities folder and file 3 in the LA folder have both been cut off. | 11/20/2015 | | | |
| 1995 | | Ashley Champagne | 10/8/2015 | 10/8/2015 | Jamal Russell | 1994 LA master Excel spreadsheet in Humanities agg. working data folder. | 11/20/2015 | | | |
| 1996 | | Ashley Champagne | 10/8/2015 | 10/11/2015 | Jamal Russell | Dec. 3-31 LA articles not scraped. File 32 in the LA folder has been cut off. | 11/20/2015 | | | |
| 1997 | | Ashley Champagne | 10/11/2015 | 10/11/2015 | Jamal Russell | File 15 in the Humanities folder did not get scraped. | 11/24/2015 | | | |
| 1998 | | Ashley Champagne | 10/11/2015 | 10/11/2015 | Jamal Russell | Files 2 and 145 in the Humanities folder and file 23 in the LA folder have been cut off. | 11/24/2015 | | | |
| 1999 | | Ashley Champagne | 10/11/2015 | 10/14/2015 | Jamal Russell | Files 90 and 153 in the Humanities folder and files 4 and 42 in the LA folder have been cut off. | 11/24/2015 | | | |
| 2000 | | Ashley Champagne | 10/14/2015 | 10/14/2015 | Jamal Russell | 1999 LA master Excel spreadsheet in Humanities agg. working data folder. File 107 in the Humanities folder and files 24 and 45 in the LA folder have been cut off. | 11/24/2015 | | | |
| 2001 | | Ashley Champagne | 10/14/2015 | 10/14/2015 | Jamal Russell | Files 158 and 224 in the Humanities folder and file 70 in the LA folder have been cut off. | 12/17/2015 | | | |
| 2002 | | Ashley Champagne | 10/14/2015 | 10/14/2015 | Jamal Russell | 2001 LA master spreadsheet in Humanities agg. working data folder. Files 64 and 78 in the LA folder have been cut off. | 12/17/2015 | | | |
| 2003 | | Ashley Champagne | 10/15/2015 | 10/15/2015 | Jamal Russell | | 12/17/2015 | | | |
| 2004 | | Ashley Champagne | 10/15/2015 | 10/15/2015 | Jamal Russell | File 149 in the Humanities folder and file 45 in the LA folder have been cut off. | 12/17/2015 | | | |
| 2005 | | Ashley Champagne | 10/15/2015 | 10/15/2015 | Jamal Russell | One article not scraped from Humanities articles. File 44 in the Humanities folder and file 74 have been cut off. | 12/17/2015 | | | |
| 2006 | | Ashley Champagne | 10/21/2015 | 10/21/2015 | Jamal Russell | File 2 in the Humanities folder has been cut off. | 12/18/2015 | | | |
| 2007 | | Ashley Champagne | 10/21/2015 | 10/21/2015 | Jamal Russell | The text of LA files 15 and 19 is located at external links on the ProQuest document and, thus, was not scraped. File 11 in the LA folder has been cut off. | 12/18/2015 | | | |
| 2008 | | Ashley Champagne | 10/21/2015 | 10/21/2015 | Jamal Russell | File 48 in the Humanities folder has been cut off. | 12/18/2015 | | | |
| 2009 | | Ashley Champagne | 11/03/2015 | 11/03/2015 | Jamal Russell | | 12/18/2015 | | | |
| 2010 | | Ashley Champagne | 11/03/2015 | 11/03/2015 | Jamal Russell | | 12/18/2015 | | | |
| 2011 | | Jamal Russell | 10/27/2015 | 10/27/2015 | Ashley Champagne | | | | | |
| 2012 | | Jamal Russell | 10/29/2015 | 10/29/2015 | Ashley Champagne | | | | | |
| 2013 | | Jamal Russell | 10/29/2015 | 11/03/2015 | Ashley Champagne | | | | | |
| 2014 | | Jamal Russell | 11/03/2015 | 11/05/2015 | Ashley Champagne | | | | | |
| 2015 | | | | | | | | | | |
 

 

 
 
             

 

 

LA Times collection work

Red = completed (queries for both "humanities" and "liberal arts"). Finished work here [TBD].

| Year | Collector | Started | Completed | Inspector | Inspection Notes | Inspection Completed | Inspection Sign-Off | "Reader" Scrubber | Scrubbing Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1996 | Jonathan Callies | 8/19/15 | 8/19/15 | CW | | | | | |
| 1997 | Jonathan Callies | 8/19/15 | 8/19/15 | CW | | | | | |
| 1998 | Jonathan Callies | 8/19/15 | 8/19/15 | CW | | | | | |
| 1999 | Jonathan Callies | 8/19/15 | 8/19/15 | CW | | | | | |
| 2000 | Jonathan Callies | 8/19/15 | 8/19/15 | CW | | | | | |
| 2001 | Jonathan Callies | 8/25/15 | 9/7/15 | Jamal Russell | Both folders missing master spreadsheet; file 95 in the Humanities folder has been cut off. | 12/27/2015 | | | |
| 2002 | Jonathan Callies | 9/2/15 | 9/7/15 | Jamal Russell | | 12/27/2015 | | | |
| 2003 | Jonathan Callies | 9/2/15 | 9/7/15 | Jamal Russell | File 31 in the Humanities folder has been cut off. | 12/27/2015 | | | |
| 2004 | Jonathan Callies | 9/2/15 | 9/7/15 | Jamal Russell | File 33 in the Liberal Arts folder has been cut off. | 12/27/2015 | | | |
| 2005 | Jonathan Callies | 9/2/15 | 9/7/15 | Jamal Russell | | 12/28/2015 | | | |
| 2006 | Jonathan Callies | 9/9/15 | 9/11/15 | Jamal Russell | File 56 in the Humanities folder has been cut off. | 12/28/2015 | | | |
| 2007 | Jonathan Callies | 9/9/15 | 9/11/15 | Jamal Russell | | 12/28/2015 | | | |
| 2008 | Jonathan Callies | 9/9/15 | 9/11/15 | Jamal Russell | | 12/28/2015 | | | |
| 2009 | Jonathan Callies | 9/9/15 | 9/11/15 | Jamal Russell | | 12/28/2015 | | | |
| 2010 | Jonathan Callies | 9/9/15 | 9/11/15 | Jamal Russell | | 12/29/2015 | | | |
| 2011 | Jonathan Callies | 9/9/15 | 9/11/15 | Jamal Russell | | 12/29/2015 | | | |
| 2012 | Jonathan Callies | 9/9/15 | 9/11/15 | Jamal Russell | | 12/29/2015 | | | |
| 2013 | Jonathan Callies | 9/9/15 | 9/14/15 | Jamal Russell | File 30 in the Liberal Arts folder was not fully scraped. | 12/29/2015 | | | |
| 2014 | Jonathan Callies | 9/9/15 | 9/14/15 | Jamal Russell | Found two blank plain text documents (34 and 35) in the Liberal Arts folder. | 12/29/2015 | | | |
| 2015 | | | | | | | | | |
 
   
 
 
 
 
     

 

New Yorker collection work

Red = completed (queries for both "humanities" and "liberal arts"). Finished work here [TBD].

| Year | Collector | Started | Completed | Inspector | Inspection Notes | Inspection Completed | Inspection Sign-Off | "Reader" Scrubber | Scrubbing Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2004 | Ashley Champagne | 8/20/2015 | 9/18/2015 | Phillip Cortes | Humanities master spreadsheet cannot input the total article body for one article because it exceeds the limit. | 9/23/2015 | PC | | |
| 2005 | Ashley Champagne | 8/20/2015 | 9/8/2015 | Phillip Cortes | Clean. | 9/23/2015 | PC | | |
| 2006 | Ashley Champagne | 9/3/2015 | 9/3/2015 | Phillip Cortes | Liberal Arts spreadsheet could not include total article body for one article due to word number limits. The only file in the humanities folder does not contain the word "humanities." | 9/23/2015 | PC | | |
| 2007 | Ashley Champagne | 9/1/2015 | 9/2/2015 | Phillip Cortes | No master spreadsheets or aggregate_plain_text.docx files for humanities and liberal arts. The docx files were added. I searched for the master spreadsheets and found them in the restored-sheets folder. I have added them, and made an Excel copy. Thanks. | 9/23/2015 | PC | | |
| 2008 | Ashley Champagne | 8/28/2015 | 8/31/2015 | Phillip Cortes | No master spreadsheets. I searched for the master spreadsheets and found them in the restored-sheets folder. I have added them, and made an Excel copy. Thanks. | 9/23/2015 | PC | | |
| 2009 | Ashley Champagne | 8/27/2015 | 8/27/2015 | Phillip Cortes | No master spreadsheets or docx files for humanities and liberal arts. Docx files were added. I searched for the master spreadsheets and found them in the restored-sheets folder. I have added them, and made an Excel copy. Thanks. | 9/23/2015 | PC | | |
| 2010 | Ashley Champagne | 9/18/2015 | 9/21/2015 | Ashley Champagne | Clean | 11/13/2015 | AC | | |
| 2011 | Ashley Champagne | 9/21/2015 | 9/21/2015 | Ashley Champagne | Two files didn't have the full articlebody in them (from the Excel spreadsheet cutting them off). I fixed this. | 11/15/2015 | AC | | |
| 2012 | Ashley Champagne | 10/1/2015 | 10/1/2015 | Ashley Champagne | Two files didn't have the full articlebody in them (from the Excel spreadsheet cutting them off). I fixed this. | 11/24/2015 | AC | | |
| 2013 | Ashley Champagne | 10/1/2015 | 10/1/2015 | Ashley Champagne | One file didn't have the full articlebody in it (from the Excel spreadsheet cutting it off). I fixed this. | 11/24/2015 | AC | | |
| 2014 | Ashley Champagne | 10/1/2015 | 10/7/2015 | Ashley Champagne | Clean | 11/24/2015 | AC | | |
| 2015 | | | | | | | | | |
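
Several New Yorker notes above describe article bodies being cut off by the spreadsheet. Assuming the limit in question is Excel's per-cell maximum of 32,767 characters (the notes do not name the limit, so this is a guess), long articles could be flagged before they are pasted into the master spreadsheet. The function name `find_truncation_risks` and the `articles` mapping are hypothetical.

```python
EXCEL_CELL_CHAR_LIMIT = 32767  # Excel's maximum characters per cell (assumed to be the limit meant)


def find_truncation_risks(articles, limit=EXCEL_CELL_CHAR_LIMIT):
    """Return (article_id, length) for bodies that reach or exceed `limit`.

    `articles` is a hypothetical mapping of article id -> full body text.
    A body longer than the limit cannot fit in one cell; a body of exactly
    `limit` characters may already be a truncated copy.
    """
    return [
        (article_id, len(body))
        for article_id, body in articles.items()
        if len(body) >= limit
    ]
```

Articles flagged this way would need their full text kept in the plain text files rather than relying on the spreadsheet cell. If a different tool or limit was in use, the `limit` argument can be changed accordingly.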
 
   
 
 
 
 
     

 

 
