13 Feb 2015

I scraped the transcripts of the Dilbert comic strips from the last 25 years. Initially, the idea was to look at the most popular topics over the years, examining and illustrating how often certain words came up in conversation, or when certain words (“google”, “unfriend”, “tweet”, etc.) were first mentioned in the comic. But deciding which words were too common or uninteresting, and filtering them out, was a challenge. The other challenge is that the transcripts do not identify the speaker of each line of dialogue consistently, nor do they specify which dialogue belongs to which panel of the strip. One option is to use some type of OCR software to examine the strips’ images, recognize the dialogue, and separate it by panel. Then a computationally generated strip of disjoint panels could be assembled by topic or keyword. I’m not sure yet whether this will work, so I may have to change the source of my data for the visualization.
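To make the keyword idea concrete, here is a minimal sketch in Python of counting words per year and finding the year a word first appears. Everything in it is an assumption for illustration: the sample transcripts are made up, the stop-word list is far too short for real use, and the names `counts_by_year` and `first_mentions` are hypothetical. Note that speaker names end up counted as ordinary words, which is the same speaker-identification problem described above.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (year, transcript) pairs standing in for scraped strips.
SAMPLE = [
    (1995, "Dilbert: The boss wants a meeting about the meeting."),
    (2004, "Wally: I will google the answer after my coffee."),
    (2009, "Alice: Did you just tweet about the meeting?"),
]

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "about", "i", "will", "my", "did", "you", "just", "after"}


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def counts_by_year(strips, stop_words=STOP_WORDS):
    """Return {year: Counter of words}, skipping stop words."""
    counts = defaultdict(Counter)
    for year, text in strips:
        counts[year].update(w for w in tokenize(text) if w not in stop_words)
    return counts


def first_mentions(strips, keywords):
    """Return {keyword: earliest year it appears}; unseen keywords are omitted."""
    first = {}
    for year, text in sorted(strips, key=lambda s: s[0]):
        words = set(tokenize(text))
        for kw in keywords:
            if kw in words and kw not in first:
                first[kw] = year
    return first
```

Calling `counts_by_year(SAMPLE)[1995].most_common(3)` would give the top words for that year, and `first_mentions(SAMPLE, ["google", "tweet", "unfriend"])` would report only the words that actually occur. Choosing the stop words is still the hard part; the code only applies whatever list it is given.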

As for random strips, the webcomic Cyanide and Happiness has an option to generate random panels. The resulting strips are often nonsensical and uninteresting, but sometimes the dialogue, put together across panels, is quite funny. The Random Garfield Generator applies a similar concept to the long-running Garfield comic strip, but allows the user to fix one panel while the others change. I’ve not yet seen anything similar for Dilbert, and my “visualization”, if still deemed viable, would be to implement it. My only concern is that the generated strips may be interesting only in rare instances.
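A random strip with one panel held in place could be sketched roughly like this in Python. This is my own guess at the mechanics, not how either existing generator works; the panel identifiers and the `random_strip` function are hypothetical, and a real version would draw from panels cut out of the scraped strips.

```python
import random

# Hypothetical panel identifiers; in practice these might be image files
# cut from scraped strips.
PANELS = ["panel_a", "panel_b", "panel_c", "panel_d", "panel_e"]


def random_strip(panels, n_panels=3, fixed=None, rng=None):
    """Build a strip of n_panels distinct panels.

    fixed maps a slot index to a panel that must stay in that slot;
    every other slot is filled with a random panel from the pool.
    """
    if rng is None:
        rng = random.Random()
    fixed = fixed or {}
    # Exclude fixed panels from the pool so no panel appears twice.
    pool = [p for p in panels if p not in fixed.values()]
    free_slots = [i for i in range(n_panels) if i not in fixed]
    picks = rng.sample(pool, len(free_slots))
    strip = [None] * n_panels
    for slot, panel in fixed.items():
        strip[slot] = panel
    for slot, panel in zip(free_slots, picks):
        strip[slot] = panel
    return strip
```

For example, `random_strip(PANELS, fixed={1: "panel_c"})` keeps `panel_c` in the middle slot and re-rolls the other two on every call. Passing a seeded `random.Random` as `rng` makes a strip reproducible, which would help when saving the rare funny ones.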

The sketches below show a randomly generated strip and a plot of keyword use by year over the comic’s history.