Alan's synopsis of Teddy's slides, in the form of a response to him this morning. His slides are here.
1. To help us cognitively track and understand the shape of the corpus we want (essentially, to model it), we create a visualization platform--perhaps as simple as a set of slides--in which we insert publications in a tree organization. Elaborating on your idea:
* we could evolve or change the tree as we go;
* we could talk to Scott about how to express the corpus model in his Manifest schema (in JSON). This would logically treat each visual box as an "object" that in the future might be manipulated programmatically (e.g., via JavaScript) to generate the visual tree;
* it would be possible to visualize and compare several versions of the tree. (This may be the way to deal with overlapping or difficult ontology structures that don't nest cleanly in each other.)
[cf. Battle of Britain War Room at Uxbridge; cf. also WE1S "Manifest" system]
[Figure: Scott's Manifest schema]
[Figure: Form for a publication manifest in Scott's Manifest system]
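To make the "each visual box as an object" idea in item 1 concrete, here is a minimal Python sketch. The field names (`name`, `children`, `publications`) and the sample boxes are hypothetical placeholders for illustration only; they are not taken from Scott's actual Manifest schema, which would need to be checked with him. The sketch only shows that a tree stored as JSON can be walked by a program to generate a text rendering of the tree.

```python
import json

# Hypothetical corpus tree. The keys "name", "children", and "publications"
# are illustrative assumptions, NOT the real Manifest schema.
corpus_tree = {
    "name": "Corpus",
    "children": [
        {
            "name": "Newspapers",
            "publications": ["Publication A", "Publication B"],
            "children": [],
        },
        {
            "name": "Magazines",
            "publications": ["Publication C"],
            "children": [],
        },
    ],
}


def render_tree(node, depth=0):
    """Return the tree as indented text lines, one per box or publication."""
    indent = "  " * depth
    lines = [f"{indent}[{node['name']}]"]
    for pub in node.get("publications", []):
        lines.append(f"{indent}  - {pub}")
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines


def collect_publications(node):
    """Gather every publication at or below a given box, in tree order."""
    pubs = list(node.get("publications", []))
    for child in node.get("children", []):
        pubs.extend(collect_publications(child))
    return pubs


if __name__ == "__main__":
    # The same object can be stored as JSON and rendered on demand.
    print(json.dumps(corpus_tree, indent=2))
    print("\n".join(render_tree(corpus_tree)))
```

Keeping several versions of the tree would then amount to keeping several such JSON files and rendering them side by side.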
2. After selecting publications to collect from and then doing the actual collection work, we set up what amounts to an analysis logic that:
* prepares the corpus for a particular research question by choosing the right boxes from the tree;
* then samples an equal number of articles from each publication in those boxes (2,000 articles from each publication).
This makes topic modeling a logical process of assessing an overall sample in which each source carries equal weight. (But it also raises interesting issues about the sampling method for sources with unequal total numbers, density, lengths, and kinds of articles.)
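The sampling step in item 2 could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `sample_equal`, the dictionary layout of article IDs per publication, and the fixed random seed are all hypothetical. In particular, the choice to take every article when a publication has fewer than 2,000 is just one possible answer to the unequal-sources issue raised above, not a settled method.

```python
import random


def sample_equal(articles_by_publication, selected, n_per_publication=2000, seed=0):
    """Draw the same number of articles from each selected publication.

    articles_by_publication: dict mapping publication name -> list of article IDs
    selected: publication names chosen from the tree for a research question
    Assumption: a publication with fewer than n_per_publication articles
    contributes all of its articles, so its weight is lower than the others.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for pub in selected:
        articles = articles_by_publication[pub]
        k = min(n_per_publication, len(articles))
        sample[pub] = rng.sample(articles, k)  # sampling without replacement
    return sample


if __name__ == "__main__":
    # Toy data: made-up article IDs standing in for collected articles.
    toy = {
        "Publication A": [f"A-{i}" for i in range(5000)],
        "Publication B": [f"B-{i}" for i in range(1500)],
    }
    result = sample_equal(toy, ["Publication A", "Publication B"])
    for pub, ids in result.items():
        print(pub, len(ids))
```

Note that this weights publications equally by article count only; it does not address differences in article length, density, or kind.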