If you’ve noticed an absence of posts in recent weeks, it’s because of the end-of-semester rush followed by three weeks of winter break. I’ve been doing a fair bit of reading and some basic programming practice, but not much worth writing about. To remedy that, I undertook a small project that could serve as a template for a contribution to my brother-in-law’s poetry project. Following the steps and code laid out on Dr. Healey’s text analytics page, I used Python to parse and analyze five poems (treating each poem as a “document”) and Gephi to visualize the resulting TF-IDF pairwise similarity network (using the “Expansion” layout):
This approach would probably fare better with a larger dataset, with the poems clustered into thematic nodes rather than each of five poems serving as its own node. Term frequencies are very low with such a small set, but it was a fun exercise to start from raw text and translate it into a visualization. I chose this methodology because it seemed like the visualization that would translate best to print. Next, I may try increasing the number of poems, and then play with sentiment analysis and visualizations of those results, again optimizing for print rather than an interactive web display.
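For anyone curious about the mechanics, here is a minimal sketch of the kind of pipeline described above: treat each poem as a document, weight its terms with TF-IDF, compute pairwise cosine similarity, and write an edge list that Gephi can import. This is not the code from Dr. Healey’s page. It uses only the Python standard library, the function names are my own, the TF-IDF formula is just one common variant, and the three short texts are made-up placeholders rather than the poems I actually used.

```python
import csv
import io
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and pull out alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document in docs ({name: text})."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        # Assumed variant: tf = relative frequency, idf = ln(N / df).
        vectors[name] = {
            term: (freq / total) * math.log(n_docs / df[term])
            for term, freq in c.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_edges(docs):
    """Return (source, target, similarity) for every pair of documents."""
    vectors = tfidf_vectors(docs)
    return [
        (a, b, cosine(vectors[a], vectors[b]))
        for a, b in combinations(docs, 2)
    ]


def edges_to_csv(edges):
    """Format edges as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Source", "Target", "Weight"])
    for a, b, weight in edges:
        writer.writerow([a, b, f"{weight:.4f}"])
    return buf.getvalue()


# Placeholder texts (hypothetical, not the poems from the project).
poems = {
    "poem_a": "the river runs to the sea",
    "poem_b": "the sea remembers the river",
    "poem_c": "snow falls on the quiet field",
}

edges = similarity_edges(poems)
print(edges_to_csv(edges))
```

Saving that output to a `.csv` file gives an edge list that can be loaded through Gephi’s spreadsheet import. It also shows why five poems is thin: a word that appears in every document gets an inverse document frequency of zero, so with only a handful of short texts many pairs end up with little or no weight connecting them.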