Category Archives: 32-visualization

Zack Aman

12 May 2015

I think that, to a large extent, people are what they read. This project is a way for me to look back at my Kindle highlights, a historical mirror of what I have read and found important.

[Screenshot: Kindle highlights visualization, first view]

[Screenshot: Kindle highlights visualization, second view]

The Amazon Kindle makes it easy to highlight passages as you read, but it does not provide an easy, efficient way to go back through your highlights. Amazon does, however, provide all of your notes online, grouped by book. I wanted to take these and try to visualize them.

One of the main ways my reading habits changed when I got a Kindle was that I started reading many books at once, choosing whatever I felt like reading at the time rather than focusing on a single book until it was complete. I would love to be able to see how I read different books at different times, and which books are co-read with each other. Unfortunately, Amazon doesn’t seem to save (or at least doesn’t surface) when individual highlights were created, instead choosing to simply show when the last highlight was made.

While I would have preferred to look at individual highlights, or the number of highlights per book per month, it was still worthwhile to visualize what Amazon provides. What I ended up doing is still displaying it chronologically, just by last-edited date. For example, the large blue peak in the second image marks roughly when I finished the first volume of Anaïs Nin’s diary: there is a huge peak (for completion) but it sits far to the left (hasn’t been edited recently). I also ran the Kindle highlights through the Lynn Cherny NLP-Python gamut, but didn’t get great results because some of the books have so little text associated with them. The only strong relationship was that the different Anaïs Nin books are related to each other, which is no big surprise.
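Since Amazon only exposes each book's last-annotated date, the ordering above can be sketched in a few lines of Python. This is a minimal, hypothetical reconstruction (the book records and dates are made up for illustration, not the actual scraped data):

```python
from datetime import date

# Hypothetical export: one record per book, since Amazon surfaces only
# the last time each book was annotated, not per-highlight timestamps.
highlights = [
    {"book": "The Diary of Anais Nin, Vol. 1", "quotes": ["..."] * 42,
     "last_annotated": date(2014, 9, 3)},
    {"book": "Thinking, Fast and Slow", "quotes": ["..."] * 17,
     "last_annotated": date(2015, 4, 28)},
]

# Order books chronologically by last edit; bar height = highlight count.
timeline = sorted(highlights, key=lambda b: b["last_annotated"])
bars = [(b["book"], len(b["quotes"])) for b in timeline]
```

A book finished long ago with many highlights then shows up as a tall bar far to the left, which is exactly the Anaïs Nin pattern described above.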

In order to get at the notion of “you are what you read,” I set up the visualization to be interactive. By default, it picks a random quote and displays it, but you can also click on a bar and drag up and down to scrub through the different highlights for each book.

Code viewable on GitHub.

Zack Aman

12 May 2015

Twitch Dialects tells the story of what’s happening on a stream by scraping and visualizing chat emotes.

An hour of Arteezy’s Dota2 stream.

Full size examples viewable here, here, and here.

At its core, Twitch is a place for people to stream video games — what’s made it successful, however, are the viewers. The chat experience is integral to watching a stream, providing a constant, hive-mind commentary. In this project, the chat becomes the focus. In particular, I wanted to look at three things:

  1. Can you understand the story as it unfolds by only looking at the emotes used?
  2. How does the hive-mind work? How do people bandwagon together in spamming emotes?
  3. Do different channels have unique ways they use emotes to communicate?

I’m interested in these questions so I can better understand and design for participatory chat experiences. In Twitch Plays Pokemon, for example, the viewers collectively become the player. While this is hard to do in more complex games such as Dota2, there are ways people attempt to bridge the gap between game play and game spectation. For example, some streams are set up to speech-synthesize spectator-input messages when donations come in. My goal with this project is to create a tool to examine culture in spectation and look for ways to make watching more participatory.

In this image (larger here), you can see how Twitch chat spikes around certain events. At the highest peak, you can infer that Arteezy (the player that is streaming) made a poor play, resulting in viewers spamming the facepalm emote.
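The spike-detection idea boils down to bucketing scraped chat by time and counting emote tokens per bucket. A minimal sketch, assuming the chat has already been scraped into (seconds, message) pairs and using a small hand-picked emote set (the sample messages and emote list here are illustrative, not the real data set):

```python
from collections import Counter

# Hypothetical scraped chat log: (seconds into stream, message) pairs.
chat = [
    (62, "FailFish FailFish"),
    (65, "what was that FailFish"),
    (130, "Kreygasm"),
]

# Illustrative subset of emote names to count; everything else is ignored.
EMOTES = {"FailFish", "Kreygasm", "ResidentSleeper"}

def emote_counts_per_minute(chat):
    """Bucket chat by minute and count emote tokens, ignoring other words."""
    buckets = {}
    for t, msg in chat:
        counts = buckets.setdefault(t // 60, Counter())
        counts.update(w for w in msg.split() if w in EMOTES)
    return buckets

buckets = emote_counts_per_minute(chat)
```

A tall bar in one minute's bucket for the facepalm emote is the kind of spike that marks a poor play in the graph above.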

I would like to continue refining this — one of the key things I took away from critique on Tuesday was that the data set is solid, but it’s hard to make sense of the data in its present form. The real question I need to answer is, “What context do I need to provide so that people can understand a story?” To this end, I’d like to either accompany the graph with other graphs that chart a sort of sentiment (boredom is closely linked to the sleeping face), or provide a way to interactively look at different hour slices and filter the graph down for easier viewing.

Code viewable on GitHub here.


04 Mar 2015

Since this was going to be my secondary assignment, I thought I’d use it as an opportunity to collect and play around with some data I’d been interested in for a while.

Story Time

My interest stems from the tried-and-true notion that generating narrative with a computer is hard.  Historically, it’s been very difficult to write programs that create comprehensible, meaningful story (spoiler alert!  This does not change by the end of this post).  Creating characters, leading them through plot arcs and coming up with entertaining, meaningful action are very hard for computers to get right, and while we’ve certainly been making advances, no one seems convinced that some algorithm is going to replace the screenwriter any time soon.

This makes it sound as though humans coming up with stories have it easy.  While we’re probably better equipped for creative writing than NLTK, I can’t help but think of all the times I’ve been stuck in a script room for hours as we tried to figure out: What could possibly come next?  Does this make any sense at all?  How are we going to make this profound?

In the script room, we have our own algorithms for generating narrative.  On a good day, you start by getting a very detailed picture of your characters in your head, then place them in their setting and simply narrate the natural outcomes of their interaction.  When it works, this feels so obvious and organic — there’s only one way the story could possibly go.  Other times, however, this just isn’t going to work: you can’t write a scene that doesn’t feel totally contrived, it’s not clear what the big picture is going to be, nothing you say feels clever, authentic, or original. In other words, you get stuck.

Fortunately, the script room also had a lot of algorithms for getting unstuck.  Lots of writers across genres like to use the cutup technique, where you cut out words from the newspaper, stick them together at random, and try to make sense of the sentences it creates.  With theatre in particular, we liked putting iPods on shuffle, and using the sequence of songs to inspire some sort of story.

This works really well for drama.  Songs are sort of like micro-expressions that quickly establish a set of characters, along with their relationships, motivations and actions.  Stringing these together, combined with some amount of assistance and optimism, does help come up with characters, find a trajectory and eventually do something clever.

Data Collection

While the general processes for crafting narrative are creative, intuition-heavy and hard to model, it’s worth pointing out that, even in the human case, we sometimes turn to procedure and randomness when we want something new.  I was interested in collecting a dataset that might leverage this fact, making use of what computers do best to create, at the very least, a starting point for original narrative.

With the song cutout technique in mind, I scraped the information for the 2000 most recent songs from (inspired by Jon Mars’ data scraping plans), as well as the songs from a list of albums for top Broadway musicals.  From there, I used a YouTube Python library to find a YouTube video that played the music for each of these songs.  At the end of the day, I had a list of song titles, their artists, and YouTube videos playing each song.

I sent these entries to 250 workers via Mechanical Turk, asking them each to listen to a song and write a few sentences describing the story the song seems to tell.  For instance, a description for Baby Got Back might be “Man rebuts one woman’s claims that large butts are unattractive, citing his own preference for curvy women”.  I considered this to be a test run: I didn’t give the task many restrictions, because I wanted to see what folks would come up with.

5 hours and $30 later, I had my 250 entries.  Responses ranged from the literal:

A girl goes to Los Angeles for the first time. She goes to a club and gains attention because she doesn’t look like someone who lives there. Every time she gets uneasy she hears a song on the radio/stereo/etc. that makes her feel reassured and then she continues to party. The time comes close for her to go back home so she makes one last hurrah.

To the figurative:

The protagonist/singer is describing a difficult, tenuous, challenging and ultimately obsessive relationship, one that affords him both pride and shame. Despite the problems, including the fact that there are issues neither party is willing to address aloud, the singer does not want it to end. In fact, he is begging his partner to keep it alive, reminding that person that they have pulled through before, again and again, and that he, (the singer/protagonist,) is willing to be subservient, put his partner in the limelight, be the quiet ‘wind beneath the wings,’ as it were, fly under the radar, do whatever it takes, experience whatever he must, of shame and denial, to keep the relationship going.

On the whole, most of the responses had at least some trace of analysis, and generally tried to get to the core of the song — “this song is about”.  It was really interesting data, and it seemed like it might be a good set of building blocks for creating narrative.

Methods of Interpretation

At first, I was interested in writing a script to cluster songs that had similar ideas.  For instance, I had several songs about cheating (which basic IDF was generally able to pick up on): one where a man is upset about his girlfriend’s flirtation; one where a girl is feeling tempted to cheat; one where a man tempts a woman to cheat; one where a man is upset about a breakup in which his girlfriend cheated; one where a woman is relishing her new single life; one where a man moves on with a better partner, cursing the old one; and one where a woman recoils at the sight of her ex with a new girl.  If a computer were able to order that kind of progression from these word descriptions, we’d be in business.  But it’s not clear we have any good tools for doing such a thing, or that the sample size is big enough for some clever analysis.  I also considered randomly assembling the descriptions into a series of paragraphs to tell a story, but the course staff decided that this wasn’t a very interesting visualization.
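The IDF-based grouping mentioned above can be sketched without any ML library: weight each word by TF-IDF and compare descriptions by cosine similarity, so descriptions sharing rare words (like "cheating") land near each other. This is a minimal stand-in for the actual pipeline, with made-up toy descriptions:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute a sparse TF-IDF vector (dict) for each tokenized document."""
    n = len(docs)
    df = Counter()                      # document frequency per word
    for doc in docs:
        df.update(set(doc))
    return [{w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()}
            for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

descriptions = [
    "man upset about girlfriend cheating".split(),
    "woman tempted into cheating on boyfriend".split(),
    "girl goes partying in los angeles".split(),
]
vecs = tfidf_vectors(descriptions)
```

Here the two cheating descriptions come out more similar to each other than to the partying one, which is the progression-ordering signal the paragraph describes.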

If I had a ton of resources, I’d love to take a set of these song descriptions, ordered either at random or by turkers, and send them to paid writers to write a short script.  I think it would be very interesting to see if something particularly creative and plausible is able to come out of this sort of treatment.  Without resources, however, it’s hard to add creative intervention to this sort of thing.

It was noted that, in their original form, many of the descriptions, strung together, sound like an art review:

The singer is describing an infatuation, or maybe a lover, who is a recurring image in her dreams. Though the dreams are often pleasant she describes him as also being a nightmare – perhaps reflecting the confused feelings. Even still, she wants to see him every night when she drifts off to sleep.

There’s an interplay here between a man and woman outlining their abusive relationship which each other. The man is in love with the woman but can’t resist his violent and hateful impulses. The woman knows that she should leave, but is continually lured in by the promise of his love (which is a lie)

This song is about how a person should treat his significant other like they will never see them again. This is because like it says in the song, “every day is not guaranteed”. Anyone can die tomorrow.

However, without a visualization or some kind of interaction, this wasn’t going to be interesting.

As a last attempt at using my data to engage with the world around me, I turned to one of the final techniques in my memory of script room meetings, the kind that never results in anything good, for when you know you’ve lost and yet, have somehow deluded yourself into believing that your action is going to result in something positive and productive.

You start a fight.

I created a Gmail account under the name songpoet1996, along with a YouTube account.  I then wrote a script to find YouTube videos that played 50 of the songs and, for each, post its “description” as a comment on the video.  At face value, the descriptions are innocent attempts at summary and analysis.  As YouTube comments, however, they tended to take on a certain amount of pretension and obtuseness.  It looked like the kind of thing that might invite some strong words on the internet.


[Screenshots: replies to songpoet1996’s comments] I am now officially a tool-assisted asshole.


As for my other plans, I’m still interested in working with this kind of dataset, and would like to see what kind of potential songs-as-narrative-blocks has for creating new stories.  Until then, I’m going to resort to the traditional pastime of theatre artists who have no more good ideas — fucking with people and calling their backlash “art”.



Combined with the random montage bot, this is working out to be an interesting YouTube account.  I’m interested in seeing what happens next.


03 Mar 2015

Summary: Mapping a collection of memories of Pleasanton posted on the Facebook group “You Know You’re from Pleasanton if…”.

Explanation: For my visualization project I wanted to map the memories posted on the Facebook group “You know you’re from Pleasanton if…”, where people from my hometown reminisce about the places that used to be there, people they knew, activities, etc. I found this especially fascinating because I grew up right as Pleasanton was beginning to lose the “old”, “authentic” feeling that many of the people on this feed are reminiscing about. It’s interesting to think about which places were significant and why, and to imagine the discrepancies if I were to overlay my own memories on this map. I scraped the Facebook feed for the group and was able to plot a few hundred memories (for any given pin on the map below there are overlapping memories; each pin plots a cluster of memories that happened in the same spot).

When you hover over a pin you see what memory is associated with it. Here are some examples:

Here’s another version with circles and no pins. The circle color represents the number of memories posted about that place.



Step 1: Get the entire feed using the Facebook Graph API.

This is output as JSON and was pretty easy to use; the JSON contains an array of objects representing messages, with information such as date created, author, likes, message content, and corresponding comments. The information I focused on was the messages (“Does anyone remember sliding down hills in the late 70’s with… etc.”) and comments (where people respond with different anecdotes). These messages are really rich in spatial (“at the ice cream shop on main”) and temporal (“in the 60’s”) information. I focused on the spatial information for this project.
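Pulling the messages and comments out of that feed is a straightforward walk over the JSON. A minimal sketch, using a trimmed-down, hypothetical payload (the real Graph API response pages its results and includes more fields, such as likes and created_time):

```python
import json

# Trimmed-down, hypothetical Graph API feed payload for illustration.
feed = json.loads("""
{"data": [
  {"message": "Does anyone remember sliding down the hills in the late 70s?",
   "comments": {"data": [
     {"message": "Yes! At the ice cream shop on Main afterwards."}
   ]}}
]}
""")

def all_texts(feed):
    """Flatten a feed into a list of message and comment strings."""
    texts = []
    for post in feed["data"]:
        if "message" in post:
            texts.append(post["message"])
        for c in post.get("comments", {}).get("data", []):
            texts.append(c["message"])
    return texts

texts = all_texts(feed)
```

Each string in `texts` is then a candidate for the place-name matching in the next steps.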

Step 2: Build a dictionary of place names (and alternate names).

The main challenge for me to think about was how to match messages (strings with multiple sentences and informal references to places) with a geographic location. I first used a part-of-speech tagger to tag all words in the messages and return the nouns, to see what turned up. This turned out to be useful only in the initial stage, for seeing that many place names would not be caught by just getting proper (or even all) nouns; for example, ‘main’ in ‘main st’ would be tagged as an adjective. Also, most place names are phrases, not individual words. I put together a list of places from different sources, like old lists of Pleasanton businesses, roads, parks, and my own knowledge of the place. Then for each place I used a geocoder to translate the place name to longitude and latitude coordinates. Sometimes just appending the city, state, and zipcode to a place name would work as an input, but oftentimes I’d have to search for it manually (especially if the name is of a place that isn’t actually there anymore).
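The place dictionary described here can be sketched as a mapping from canonical names to alternate names plus coordinates. The entries and coordinates below are made up for illustration; in practice the coordinates would come from a geocoder (appending “Pleasanton, CA” to the name) or be entered by hand for places that no longer exist:

```python
# Hypothetical place dictionary: canonical name -> alternates + coordinates.
# Values are illustrative, not real geocoder output.
PLACES = {
    "Rod's Hickory Pit": {
        "aliases": ["rods", "the hickory pit"],
        "coords": (37.6604, -121.8758),
    },
    "Main St": {
        "aliases": ["main street", "downtown"],
        "coords": (37.6619, -121.8747),
    },
}

def lookup(name):
    """Resolve a canonical or alternate place name to its coordinates."""
    needle = name.lower()
    for canonical, info in PLACES.items():
        if needle == canonical.lower() or needle in info["aliases"]:
            return canonical, info["coords"]
    return None
```

Keeping the alternates alongside the canonical name is what lets informal references like “rods” resolve to the same pin.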

Step 3: Matching messages/comments to place names.

Instead of trying to get places by first breaking the messages down into individual words (and identifying certain parts of speech, etc.), I looked for phrases contained in the strings that approximately matched place names. I wanted to account for spelling errors and slight differences in the way people talk about places (e.g. Rod’s Hickory Pit might just be referred to as rods, or the hickory pit), so rather than searching for exact matches I “fuzzily searched” for phrases that were within some edit distance of the place names. This is a step in the right direction, and there is a library that lets you do this fuzzy search and control aspects like the threshold. It worked better than I had thought it would, but could definitely be improved for phrases like “pleasanton middle school”, which contains words that match many different places and might get matched to a different middle school just because of the overlap in terms. Once I had latitude and longitude information, I created a new CSV by pairing each message with its coordinates (and other information). To plot these more easily I converted this to GeoJSON format.
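The fuzzy phrase search and the GeoJSON conversion can both be sketched with the standard library. `SequenceMatcher` here is a stand-in for whatever fuzzy-search library was actually used; the message, place name, threshold, and coordinates are illustrative:

```python
from difflib import SequenceMatcher

def fuzzy_find(message, place, threshold=0.8):
    """Return True if some same-length phrase in the message approximately
    matches the place name (a stand-in for a fuzzy-search library)."""
    words = message.lower().split()
    n = len(place.split())
    for i in range(len(words) - n + 1):
        phrase = " ".join(words[i:i + n])
        if SequenceMatcher(None, phrase, place.lower()).ratio() >= threshold:
            return True
    return False

def to_feature(message, lon, lat):
    """Pair a matched message with its coordinates as a GeoJSON Feature."""
    return {"type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"message": message}}

msg = "anyone remember rodds hickory pit on first street?"
matched = fuzzy_find(msg, "Rod's Hickory Pit")   # tolerates the misspelling
feature = to_feature(msg, -121.8758, 37.6604)
```

Note that GeoJSON orders coordinates as [longitude, latitude], which is an easy thing to get backwards when converting from a CSV.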