|9/4 - 9/6||Character Encodings + Philosophy||Technical Reading: Bits to Characters
|Programming 1: Character encodings|
|9/9 - 9/13||Tokenization and Counting||Technical:
|Programming 2: Tokenization, Counting|
|9/16 - 9/20||Sentiment analysis||Technical:
||Programming 3: Evaluate two sentiment lexicons; manually create a dictionary-based lexicon for an emotion.|
|9/23 - 9/27||Classification||Technical:
||Programming 4: What distinguishes History, Tragedy, and Comedy in Shakespeare's plays?|
|9/30 - 10/4||Measurements of uncertainty; Similarity and Divergence||Technical:
||Programming 5: Identifying change and variation over time by comparing documents in a sequence.|
|10/7 - 10/11||Similarity, Clustering, and Authorship||Technical:
Shared data project: Construct a collection of The Federalist Papers.
Programming 6: Apply similarity functions to The Federalist Papers and see how they imply different clusterings.
|10/16 - 10/18||Similarity, Clustering, and Authorship||Technical: No reading
Discussion: No Discussion!
Complete the data project on The Federalist Papers.
Programming 6 cont'd: Examine differences in keyword use; add clustering and tf-idf.
|10/21 - 10/25||Corpus-building, clustering||
||Programming 7: IDF weighting. Build a collection from Project Gutenberg texts. Extend similarity to clustering. Distinguish Horror from non-Horror.|
|10/28 - 11/1||Corpus-building, clustering continued||
Choose TWO (2) of the following articles about constructing a collection:
|Programming 7 cont'd: IDF weighting. Build a collection from Project Gutenberg texts. Extend similarity to clustering. Distinguish Horror from non-Horror.|
|11/4 - 11/8||Topic modeling||Technical:
||Programming 8: Training, analyzing, and evaluating topic models.|
|11/11 - 11/15||Word embeddings||Technical:
Mini-project is due Monday
Programming 9: Word embeddings, keywords in context, distance functions.
|11/18 - 11/22||Tools for Hypothesis Testing||Technical:
||How do we make the connection between text analytics and persuasive arguments? What methods can help us convince ourselves that we're not reporting random values, and how can we explain to others what we've done?|
||Do we need to think about significance differently when we're running many experiments than when we're running just one?|
|11/27 - 11/29||Thanksgiving, no class|