4. Document similarity: compare every article pair (doc-doc)


Because the DHQ article set is still small (110 articles), it doesn't yet support significant statements about the flow of knowledge among these articles, so I introduced an additional measure to these visualizations: document similarity. MITH's Travis Brown ran a topic modeling procedure that produced two datasets: one comparing the vocabulary of each DHQ article to that of other DHQ articles, and one exploring how well a given topic (think of a topic as a set of thematically related words) fits an article.

This visualization looks at the former (document-to-document) dataset: each pair of connected nodes represents two DHQ articles, and the color of the edge between them signifies how similar their topics are. The visualization isn't very useful as is; the only thing that sticks out visually is DHQ article 000032.xml (Raphael Finkel's "What Your Teacher Told You is True: Latin Verbs Have Four Principal Parts"), a quick skim of which shows that it is syntactically quite different from the bulk of DHQ articles. So, while this visualization needs some finessing to be visually useful, the dataset seems to be successfully identifying similarity and difference among the DHQ articles. Now, let's look closer at the document similarity measure...
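To make the idea of a colored doc-doc edge concrete, here is a minimal Python sketch of how such a pairwise dataset could be built. It is not the procedure Travis Brown ran. It assumes each article is summarized as a list of topic weights, uses cosine similarity as a stand-in for whatever score the real dataset contains, and maps that score to an edge color. The function names, file names, and numbers are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length topic-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_to_hex(score):
    """Map a score in [0, 1] to a grey-to-red hex color (higher = redder)."""
    score = max(0.0, min(1.0, score))
    red = round(128 + 127 * score)
    other = round(128 * (1 - score))
    return f"#{red:02x}{other:02x}{other:02x}"


def build_edges(topic_weights):
    """Return one (doc_a, doc_b, score, color) tuple per unordered article pair."""
    edges = []
    for doc_a, doc_b in combinations(sorted(topic_weights), 2):
        score = cosine_similarity(topic_weights[doc_a], topic_weights[doc_b])
        edges.append((doc_a, doc_b, score, similarity_to_hex(score)))
    return edges


# Hypothetical topic weights for three articles (invented numbers, not DHQ data).
example = {
    "doc_a.xml": [0.7, 0.2, 0.1],
    "doc_b.xml": [0.6, 0.3, 0.1],
    "doc_c.xml": [0.0, 0.1, 0.9],
}
edges = build_edges(example)
for doc_a, doc_b, score, color in edges:
    print(f"{doc_a} - {doc_b}: {score:.2f} {color}")
```

Comparing every article with every other article grows quickly: 110 articles give 110 × 109 / 2 = 5,995 pairs, which may be part of why an unfiltered doc-doc graph is hard to read.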