5. Document similarity, outlier removed



This visualization shows the ten pairs of DHQ articles that the topic modeling procedure identified as most similar (the outlier article mentioned in the previous visualization description, 000032.xml, was removed first). A quick skim of these articles reveals some obvious similarities, as well as some mysteries about how the articles are similar (the ID numbers are individual DHQ article IDs):

000005 - 000009
Reading Potential: The Oulipo and the Meaning of Algorithms
Somewhere Nearby is Colossal Cave: Examining Will Crowther's Original Adventure in Code and in Kentucky

000009 - 000044
Somewhere Nearby is Colossal Cave: Examining Will Crowther's Original Adventure in Code and in Kentucky
Mining Eighteenth Century Ontologies: Machine Learning and Knowledge Classification in the Encyclopédie

000042 - 000068
Vive la Différence! Text Mining Gender Difference in French Literature
Ontologies and Logic Reasoning as Tools in Humanities?

000009 - 000068
Somewhere Nearby is Colossal Cave: Examining Will Crowther's Original Adventure in Code and in Kentucky
Ontologies and Logic Reasoning as Tools in Humanities?

000042 - 000082
Vive la Différence! Text Mining Gender Difference in French Literature
Digital Encoding as a Hermeneutic and Semiotic Act: The Case of Valerio Magrelli

000044 - 000082
Mining Eighteenth Century Ontologies: Machine Learning and Knowledge Classification in the Encyclopédie
Digital Encoding as a Hermeneutic and Semiotic Act: The Case of Valerio Magrelli

000009 - 000070
Somewhere Nearby is Colossal Cave: Examining Will Crowther's Original Adventure in Code and in Kentucky
The Potential and Problems in using High Performance Computing in the Arts and Humanities: the Researching e-Science Analysis of Census Holdings (ReACH) Project

000070 - 000082
The Potential and Problems in using High Performance Computing in the Arts and Humanities: the Researching e-Science Analysis of Census Holdings (ReACH) Project
Digital Encoding as a Hermeneutic and Semiotic Act: The Case of Valerio Magrelli

000009 - 000082
Somewhere Nearby is Colossal Cave: Examining Will Crowther's Original Adventure in Code and in Kentucky
Digital Encoding as a Hermeneutic and Semiotic Act: The Case of Valerio Magrelli

000005 - 000070
Reading Potential: The Oulipo and the Meaning of Algorithms
The Potential and Problems in using High Performance Computing in the Arts and Humanities: the Researching e-Science Analysis of Census Holdings (ReACH) Project

Some similarities are obviously thematic: "Reading Potential: The Oulipo and the Meaning of Algorithms" and "Somewhere Nearby is Colossal Cave: Examining Will Crowther's Original Adventure in Code and in Kentucky" both discuss games and play. Other articles seem to discuss quite different issues while still employing a highly similar vocabulary. For example, "Reading Potential: The Oulipo and the Meaning of Algorithms" and "The Potential and Problems in using High Performance Computing in the Arts and Humanities: the Researching e-Science Analysis of Census Holdings (ReACH) Project" share a closeness to a topic containing fairly decontextualized words such as work, time, university, see, same, and make.
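For readers who want to see how a ranking like the one above could be produced, here is a minimal sketch. It assumes each document is represented by a vector of topic proportions and that similarity is measured with cosine similarity. Neither assumption is confirmed by the procedure described here, and the document IDs and numbers below are invented purely for illustration. The helper names `cosine` and `top_pairs` are likewise hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_pairs(doc_topics, exclude=(), n=10):
    """Return the n most similar document pairs, skipping excluded documents."""
    kept = {doc: vec for doc, vec in doc_topics.items() if doc not in exclude}
    scored = [
        (cosine(kept[a], kept[b]), a, b)
        for a, b in combinations(sorted(kept), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n]


# Invented topic proportions (three topics); not taken from the DHQ corpus.
doc_topics = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.65, 0.25, 0.10],
    "doc_c": [0.10, 0.80, 0.10],
    "doc_outlier": [0.00, 0.00, 1.00],
}

# Drop the outlier first, then list the closest pairs.
for score, a, b in top_pairs(doc_topics, exclude={"doc_outlier"}, n=2):
    print(f"{a} - {b}: {score:.3f}")
```

A pair can rank highly under this kind of measure simply because both documents load on the same generic topic, which is one way that articles on quite different subjects end up looking similar.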