Ben Gotow-SMS Visualization

by Ben Gotow @ 11:37 pm 25 January 2011

The Idea
What can you infer about someone’s social network from their text messaging activity?

A few months ago, I started working on an app that syncs text messages from an Android phone to a desktop client for the Mac. The idea was to decouple text messaging from the phone, enabling the user to have a conversation anywhere and seamlessly transition between messaging on the phone and messaging on a laptop or desktop.

While developing the application, I noticed that the thousands of messages synced by the app revealed interesting trends about my conversational patterns, and it seemed like a perfect data set for a visualization.

The Data
The application downloads all the user’s messages from their phone and stores them in an SQLite database. A text dump from this database formed the data used in the visualization. The format of the text dump is shown below:


The text is ‘:::’ delimited. The columns are as follows:

  1. The phone number messaged
  2. The display name of the user messaged
  3. The origin of the message (0 = your phone, 1 = theirs)
  4. The frame number on which the message should appear in the animation. This is calculated by taking the timestamp of when the message was sent or received, subtracting the timestamp of the first message in the animation and dividing by an acceleration factor.
  5. The length of the text content in the message.
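As a rough illustration of the format described above, here is a minimal Python sketch that splits one ‘:::’-delimited line into the five columns and reproduces the frame-number arithmetic. It is not the code used in the visualization (that was a Processing sketch); the names `Message`, `parse_line` and `frame_number`, the sample line, and the assumption that the frame number is stored as an integer are all hypothetical.

```python
from typing import NamedTuple

DELIMITER = ":::"


class Message(NamedTuple):
    phone_number: str
    display_name: str
    from_me: bool   # origin column: 0 = your phone, 1 = theirs
    frame: int      # frame on which the message appears
    length: int     # length of the text content


def parse_line(line: str) -> Message:
    """Split one line of the text dump into its five columns."""
    phone, name, origin, frame, length = line.rstrip("\n").split(DELIMITER)
    return Message(phone, name, origin == "0", int(frame), int(length))


def frame_number(timestamp: float, first_timestamp: float, acceleration: float) -> int:
    """Time elapsed since the first message, divided by the acceleration factor."""
    return int((timestamp - first_timestamp) / acceleration)


# Made-up sample line, not real data from the dump.
sample = "555-0100:::Alice:::0:::42:::17"
msg = parse_line(sample)
```

With an acceleration factor of 60, a message sent 600 time units after the first one would land on frame 10.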

The Visualization

The Discoveries
The biggest discovery along the way was that Processing is pretty cool and very easy to use. I have a lot of experience working with OpenGL and Mac OS X’s Quartz2D APIs, and Processing was a nice surprise. I was able to go from concept to an early working version of the visualization in one afternoon. My one big complaint is that there’s no built-in debugger whatsoever… Coming from a programming background, that’s pretty damn ghetto. I’ve heard you can use Eclipse somehow, so I’ll try that next time.

I was unsure of how to create a graph of interconnected nodes in Processing. I wanted to create one dynamically without advance knowledge of the number of nodes needed, and I didn’t want to write any code to do it. I thought that using some sort of spring physics model would allow the graph to be self-organizing. I did some searching and found the Traer Physics library, which I dropped into my Processing libraries folder and linked into my sketch by binding each contact object to a node in the physics simulation. That was it. There was much rejoicing.

Each node was added to the physics model as a solid body, and negative attractors were added between each of the nodes to cause them to spread evenly. Springs were added between each node and the center ring. This turned out to be a great solution because the resting length of the spring could be adjusted to move the nodes toward and away from the center. I’d wanted to do this the whole time, but using the springs allowed me to smoothly animate that part of the visualization, too.
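To make the idea concrete, here is a small self-contained Python sketch of the same kind of model: every node is tied to the center by a spring and pushed away from every other node. It does not use the Traer Physics API; the constants (`spring_k`, `repulsion`, `drag`) and the simple integration step are assumptions chosen only so the example settles.

```python
import math


class Node:
    def __init__(self, x, y, rest_length):
        self.x, self.y = x, y
        self.vx = self.vy = 0.0
        self.rest_length = rest_length  # preferred distance from the center


def step(nodes, spring_k=0.1, repulsion=500.0, drag=0.2):
    """Advance the simulation by one frame."""
    forces = []
    for n in nodes:
        fx = fy = 0.0
        # Spring between the node and the center at (0, 0).
        d = math.hypot(n.x, n.y)
        if d > 0:
            pull = -spring_k * (d - n.rest_length)
            fx += pull * n.x / d
            fy += pull * n.y / d
        # Repulsion (a "negative attractor") from every other node.
        for other in nodes:
            if other is n:
                continue
            dx, dy = n.x - other.x, n.y - other.y
            dist = math.hypot(dx, dy)
            if dist > 0:
                push = repulsion / (dist * dist)
                fx += push * dx / dist
                fy += push * dy / dist
        forces.append((fx, fy))
    # Apply forces, damp the velocity, then move.
    for n, (fx, fy) in zip(nodes, forces):
        n.vx = (n.vx + fx) * (1 - drag)
        n.vy = (n.vy + fy) * (1 - drag)
        n.x += n.vx
        n.y += n.vy


def settle(nodes, steps=500):
    for _ in range(steps):
        step(nodes)
```

Changing a node's `rest_length` and letting the simulation run is what moves it toward or away from the center; the damping turns that move into a smooth animation rather than a jump.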

My original idea was to represent each message sent or received as an arc between nodes. However, I wasn’t sure whether Processing would be able to handle drawing the number of curves required at a decent framerate. With a data set of over 2300 messages, I was pretty sure it would become unworkably slow. Big surprise, it’s Java, and it did. I had to add a shortcut to disable lines so I could rapidly test the visualization.

The Critique
Overall, I’m pretty happy with the visualization. I was able to animate it, and it achieved the initial goal of revealing the social network inferred by your messaging habits. There are a few things I’d like to explore in the data that the visualization doesn’t reveal, though. There’s a lot of data in the actual text contents of the messages that would be fun to look at. How often do people use emoticons? Do you use emoticons more frequently when the person you’re talking to also does? Is there a bimodal distribution in message length that implies that some messages are part of complex multi-message conversations while others are simple “pings?” Answering those questions would require other visualizations, I think–but I’m really curious.

The Code
Dependencies: The processing applet requires the Traer Physics library.
Example Code Used: The code below draws on a large amount of sample code, from and from the documentation of the Traer Physics Library. The Processing “Load File 2” example was particularly useful. The code for the wavestream was written from scratch (in a rather ghetto way.) I’m still looking for a good library that creates them!

The source code is available here

A note about source data: Unfortunately, the source data for this experiment contains sensitive data including people’s phone numbers and names. I’ll be releasing the SMS synchronizing app for Mac and Android soon and that will allow you to gather and format your own messaging history for visualization.

Charles Doomany- InfoViz: Project Update

by cdoomany @ 8:07 pm 24 January 2011

Currently I have a Processing sketch that can accommodate multiple realtime data feeds from Pachube and use the datastreams from those feeds to drive the recursive growth of a virtual tree.

The growth of the tree depends on two parameters: ambient light intensity and temperature. The recursive stages of growth consist of 5 divisions: the first division is represented by a single line (a sapling), and the fifth (last) division represents a fully matured tree. When the optimal environmental conditions for plant growth IRL are met (roughly 7000 lumens for the light reading and 65 degrees Fahrenheit for temperature), the tree is represented in its mature form (5th division).
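One possible way to turn two readings into a stage, sketched in Python rather than Processing. The optimal values come from the description above; everything else is an assumption made for illustration: the `closeness` scoring (which also penalizes readings above the optimum), taking the worse of the two scores, the two-way branching, and the angles and lengths. It is not the Boeykens-based code the sketch actually uses.

```python
import math

OPTIMAL_LIGHT = 7000.0   # light reading, from the post
OPTIMAL_TEMP_F = 65.0    # degrees Fahrenheit, from the post
MAX_STAGE = 5


def closeness(value, optimum):
    """1.0 at the optimum, falling linearly to 0.0 (assumed scoring)."""
    return max(0.0, 1.0 - abs(value - optimum) / optimum)


def growth_stage(light, temp_f):
    """Map two sensor readings to a stage from 1 (sapling) to 5 (mature)."""
    score = min(closeness(light, OPTIMAL_LIGHT), closeness(temp_f, OPTIMAL_TEMP_F))
    return 1 + round(score * (MAX_STAGE - 1))


def grow(x, y, angle, length, stage):
    """Return line segments for a tree that splits in two at each stage."""
    if stage == 0:
        return []
    x2 = x + length * math.cos(angle)
    y2 = y + length * math.sin(angle)
    segments = [((x, y), (x2, y2))]
    for turn in (-0.4, 0.4):  # assumed branch angles, in radians
        segments += grow(x2, y2, angle + turn, length * 0.7, stage - 1)
    return segments


tree = grow(0.0, 0.0, math.pi / 2, 100.0, growth_stage(7000, 65))
```

Under these assumptions a stage-1 tree is a single segment and a stage-5 tree has 31.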

My final version of the program will have three distinct geographic locations with their own corresponding tree. The stage of growth will serve as a rough representation of the ambient environmental data from that geographic location.

Here is a rough mock-up of what the final realtime animation will look like:

* The code for the tree itself is based on a project by Stefan Boeykens since I do not have prior experience with simulating recursive growth. The final version of the program will also include additional modification to the source code.

Alex Wolfe | Data Visualization | Update

by Alex Wolfe @ 8:10 am

As of right now I have the bare bones of the Processing sketch finished (a particle system that does what I want more or less, running on random numbers instead of the actual data). I’ve done several sketches of a background image that will contain the various objects that are jumped from.

I have also compiled my data into a .csv file, sorted by cause of death/place jumped from (ICD-10 number when available). I went through the entire ICD, picked out any codes related to falling, and then sorted those out of the 21st Century Mortality Compendium. I also added fatalities/survivors from BASE jumpers.

I’d actually like to have more survivors; the BASE jumpers (1500 recorded from all time, about 300 or so from the 21st-century period I’m looking at) are slightly outnumbered by the less fortunate. However, there aren’t many reliable records of intentional jumpers who survived besides them (since the jumping is usually illegal). Still, the balance isn’t half as terrible as I expected.

Data sources are in my previous post here

Meg Richards – Project 2 Update

by Meg Richards @ 5:52 am

I’ve processed the network traffic from the border and incorporated the free database of IP<->Country mappings to plot the source and destination IPs on a map of the earth. The sphere is given an earth-like texture from Marius Watz’s TileSaver lib that takes a rectangular image and applies it across a sphere. In this case I used the Blue Marble image of the earth provided by NASA.
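A rough Python sketch of the two steps described above: looking up a country for an IP address, then placing that country on a sphere. The address ranges are reserved documentation blocks, the country assignments are made up, and the centroids are approximate; none of it comes from the actual database. The axis convention in `to_sphere` is only one of several, so a textured globe may need its axes swapped or flipped to line up.

```python
import ipaddress
import math

# Hypothetical rows standing in for the IP<->Country database.
RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "JP"),
]

# Approximate country centroids as (latitude, longitude) in degrees.
CENTROIDS = {"US": (39.8, -98.6), "JP": (36.2, 138.3)}


def country_for(ip):
    """Return the country code whose range contains ip, or None."""
    address = ipaddress.ip_address(ip)
    for network, country in RANGES:
        if address in network:
            return country
    return None


def to_sphere(lat_deg, lon_deg, radius=1.0):
    """Convert latitude/longitude in degrees to x, y, z on a sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z


source = to_sphere(*CENTROIDS[country_for("192.0.2.7")], radius=200.0)
```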

The border network traffic is well-suited to being plotted on a map, but I still need to finalize the presentation of core traffic. With such a small set of IPs representing internal traffic (mostly 128.2 and 128.237 prefixes) it might be possible to represent the flow of internal traffic as a water simulation or as a set of edges and vertices where the vertex size denotes the amount of traffic it receives.

Ben Gotow-SMS Visualization Progress Report

by Ben Gotow @ 1:08 am

So I spent way too much time working on this over the weekend, but I got Processing to draw what I wanted! I pulled a data set of 2200 text messages off my Android phone, and I created a visualization that creates an animated graph of your social network. The bubbles represent individuals in your network, and their relative size and distance from the center are tied to the number of text messages you’ve been sending them and the “currentness” of your relationship, respectively. People you stop talking to drift out farther in the ring and become dim over time.

Video, just because:

Marynel Vázquez – Update

by Marynel Vázquez @ 12:37 am

I’ve collected a bunch of data about Wikipedia’s common misspellings. I’m working on the visualization now.

Here is an interesting way of using AppleScript to search phrases with Safari (keep in mind this is not fast!):

-- Access Safari and save page
on run argv
	-- check for input arguments or show help
	if (count of argv) is less than 2 then
		return "Not enough arguments were given!" & myHelp()
	end if
	-- join arguments from 2nd to last
	set theWords to item 2 of argv
	repeat with i from 3 to (number of items of argv)
		set theWords to theWords & " " & (item i of argv)
	end repeat
	-- output file comes from argument 1
	set sourceFileName to (item 1 of argv)
	set sourceFileName to POSIX file sourceFileName as string
	-- play with Safari
	my SearchWords(theWords, sourceFileName)
end run
-- Search a particular word or group of words
on SearchWords(queryText, sourceFileName)
	-- enclose query in quotes
	set queryText to "\"" & queryText & "\""
	tell application "Safari" to activate
	-- add some random delay
	delay (random number from 2 to 5)
	tell application "Safari"
		-- open new window and move to the search bar
		make new document
		tell application "System Events"
			tell process "Safari"
				keystroke "l" using {command down}
				delay 1
				keystroke tab
				delay 1
				keystroke queryText
				delay 1
				keystroke return
			end tell
		end tell
		delay 3
		-- wait until page loads
		set web_page_is_loaded to false
		set myCounter to 0
		set maxCounter to 50
		set my_delay to 2
		repeat until web_page_is_loaded is true
			-- change words here depending on home page
			if name of window 1 contains "Loading" or name of window 1 contains "Untitled" then
				-- still loading: wait and check again
				delay my_delay
			else
				set web_page_is_loaded to true
			end if
			-- give up after maxCounter checks
			set myCounter to myCounter + 1
			if myCounter is maxCounter then
				set web_page_is_loaded to true
			end if
		end repeat
		delay 1
		if name of window 1 contains "Loading" or name of window 1 contains "Untitled" or name of window 1 contains "http://www." then
			-- do nothing! results didn't load...
		else
			-- save results page
			save document 1 in file sourceFileName
		end if
	end tell
	-- close window
	tell application "System Events"
		tell process "Safari"
			keystroke "w" using {command down}
			delay 1
		end tell
	end tell
end SearchWords
-- Command usage
on myHelp()
	return "
You should pass the output file path and a list of words (at least one) through the command line.
this-applescript output-file word1 [word2] [word3] [...]"
end myHelp

The previous script can be called as follows:

arch -i386 osascript this-applescript-path output-path word1 [word2] ...

Relative paths gave me you-don’t-have-permission-kind-of-errors, so I recommend using full paths. I’m no expert here; I just thought it might be fun to use AppleScript ;)

Caitlin Boyle :: Infoviz Progress

by Caitlin Boyle @ 12:36 am

I ended up ditching the bat data; it was too widespread and difficult to prep. I want to try to pick it back up later in the semester. For now, I am working with North American predator population data from the NERC Center for Population Biology, plotting it over time (how far back depends on the state) and in space over a map, where silhouettes of the animals will grow and shrink depending on population size; mousing over will display the population estimate. The data will represent relations between black bear, coyote, fox and cougar habitats and populations in the United States over time. The data has been prepped and put into .csv documents, the visuals are planned out, and the code is maybe 35% done.

InfoViz update

by Ward Penney @ 12:09 am

I have a pretty good handle on using Google’s GeoCode to get lat/long from my cities. Next, I am plotting them on a map, which Ben Fry provides a pretty good example for. Then, add a date slider. Then, maybe a zoom thing where you can read the sighting text.

Susan Lin — InfoViz, Update

by susanlin @ 11:56 pm 23 January 2011

I have started to tackle my project. I see this in 2 parts:

1. Data Wrangling – Source: Comments RSS. Here is the simple way I am attempting to make sense of it and putting it in a form:

  • Indicate +/- words
  • Input: comment
  • Output: categories
  • Change radius based on number of words
  • Irregular shape: vector function, RNG  for shape, +/- for change

2. Form – I am starting here because it made more sense to me: the exact means of wrangling the data will depend on how I’d like things to look/interact. A bit backwards, but it works best for me.
Starting places: Bouncy Bubbles, Lava Lamp


by Samia @ 11:51 pm

The project is coming along. Currently, I have a working parser for my data, as well as the beginnings of the pieces of the visualization. I need to push forward with the interaction – building the framework for how the user views/clicks through/accesses the information/visualizations.

LeWei-Project 2 Update-InfoViz

by Le Wei @ 11:09 pm

I changed my project idea pretty drastically since last week. I’m now going to be working with a website that hosts images of a ton of coins from the UK, all the way back to 3-digit years. The UK has a tradition, like many places, of putting their current leader on the face of their coins. Because they have been a monarchy since forever, many of their Queens and Kings have images of them from the majority of their lives catalogued through these coins. My base project will be to create a sort of flipbook that shows the coins in chronological order, so we can see how their appearance changes over the years. If I have time, I might add a little information about the historical context of the coins. So far, I have a complete dataset of the images and the monarch and year of each coin. I was out of town this weekend so unfortunately not much got done on the coding front, but it shouldn’t be too hard to turn the data into an animation pretty quickly, so that is my next goal.

Mark Shuster – InfoViz Progress Report

by mshuster @ 9:46 pm

I spent a good portion of my weekend creating version one of my visualization.

The concept is to take the current week’s Top 40 Radio chart and mash it up with YouTube to see if the music that the recording industry is pushing on the airwaves is in sync with what people consume when they can self-select their music in video format. Thumbnails from the videos are displayed in a collage and their image sizes correspond to the total video views they have received. This means that a video with 100,000,000 views will have 50x the surface area of a video that has only 2,000,000 views, where a video with only a few thousand views will appear to be only a few pixels large.
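The sizing rule is easy to check with a few lines of Python. Because surface area scales with views, the side length of a thumbnail has to scale with the square root of the views. This is only a sketch of the arithmetic, not the project's code; `thumbnail_side` and the 400-pixel maximum are assumed for illustration.

```python
import math


def thumbnail_side(views, max_views, max_side_px=400.0):
    """Side length in pixels so that thumbnail area is proportional to views."""
    return max_side_px * math.sqrt(views / max_views)


big = thumbnail_side(100_000_000, 100_000_000)   # most-viewed video
small = thumbnail_side(2_000_000, 100_000_000)
tiny = thumbnail_side(5_000, 100_000_000)

area_ratio = (big * big) / (small * small)        # about 50
```

With these numbers the 2,000,000-view thumbnail is roughly 57 pixels on a side, and the 5,000-view one is under 3 pixels, which matches the "only a few pixels large" observation.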

I’m doing all the scraping (BeautifulSoup), data crunching and image processing (PIL) in Python and sending the result to HTML via a templating engine (Jinja2). Right now everything works locally. I’m going to work on the interaction and aesthetic and hopefully have a version that works on the web rather than just my machine, but that may not be feasible (each request requires ~40 API calls and a good deal of image processing).

Project 2: General Direction

by Asa Foster @ 8:26 pm 22 January 2011

The Age of Adz

After a productive brainstorming session with Golan, I have decided that I will be doing a study of people’s intangible connection to music. The specific piece – Sufjan Stevens’ 5-part, 25-minute-long suite “Impossible Soul” – will be played to a listener, who will be asked to define, in their head, some variable by which to rate the music, using a physical knob supplied to them. This self-defined variable could be anything from something as basic as “how much I’m liking it at the moment” to something more of the obtuse hippie psychobabble type, such as “how much the music is flowing with my inner karmic energy”. Using whatever rating scale they choose, the listener will be asked to track their response over time with the knob, and the information will be presented in graph form.

Charles Doomany- InfoViz: Revised Project Concept

by cdoomany @ 2:17 pm 20 January 2011

I gave the project a little more thought, and I will be making a Processing app that acquires realtime environmental data (e.g. barometric pressure, temperature, humidity) via a live feed from Pachube, and uses that realtime data to drive the recursive growth of virtual plant life. The realtime animation (the growth and form characteristics of the plant) will serve as a virtual representation of the environmental conditions at that specific remote location.

I am also considering taking multiple feeds (which will be individually represented by a different plant life or pseudo-species), so one could compare sets of environmental data from various international remote locations.

Currently I have a few live Pachube feeds that I can use which seem to be streaming reliable environmental data.

Here is a feed out of Ogaki, Japan that is streaming light intensity, humidity, temperature, etc.:


Alex Wolfe | Data Visualization | Idea Revision

by Alex Wolfe @ 8:03 am 19 January 2011

After the workshop on Monday, I decided to further explore the idea of flight vs. fall. I found a number of data sources on falling, a list of accidental falls, skydiving fatalities, suicide records, etc. I want to aggregate this information into one visualization.

I was thinking of creating a particle system that would begin in the top left side of the screen and would “jump” from the varying heights, building, bridge, plane, and would either halt before hitting the ground if the person survived or continue all the way to the bottom to some dramatic effect if the person was not so fortunate.

Data Sources

21st Century Mortality Dataset
A list of all of the deaths from 2007 – 2009 categorized by cause. Deciphered using the ICD-10, the World Health Organization Classification of Deaths Compendium. I specifically looked at all entries labeled from W00, Fall on same level involving ice and snow, to W19 Unspecified Fall.

BASE Jumping Fatality List
List of all reported BASE jumping deaths from inside the community with short snippets on how they died

BASE Numbers
Number of people awarded a ranking for jumping off of a building, antenna, span (bridge), earth (cliff)
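Picking the fall-related entries out of the mortality data comes down to keeping the ICD-10 codes from W00 to W19. A minimal Python sketch of that filter follows; the records and their counts are made up for illustration, and the real dataset's column layout is not known here.

```python
import re

# ICD-10 codes look like "W19" or "W13.0"; falls are W00 through W19.
_CODE = re.compile(r"W(\d{2})(?:\.\d+)?")


def is_fall_code(code):
    """True if the ICD-10 code lies in the W00-W19 range."""
    match = _CODE.fullmatch(code.strip().upper())
    return match is not None and 0 <= int(match.group(1)) <= 19


# Hypothetical (code, count) rows; the counts are invented.
records = [
    ("W00", 12),   # fall on same level involving ice and snow
    ("W13.0", 7),
    ("W19", 40),   # unspecified fall
    ("W20", 5),    # outside the fall range
    ("X80", 9),    # outside the fall range
]

falls = [row for row in records if is_fall_code(row[0])]
```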


by ppm @ 3:47 am

The idea is to render color palettes over time for a selection of movies. We could look for color trends by genre or year, or play a “match the movie to the palette” game like so:

Of course I don’t know what the rendered palettes will look like yet–I will have to try and see.

Susan Lin — InfoViz, Sketch

by susanlin @ 12:56 am

As encouraged, this is the simplest, most viable idea. Probably best if I’m not the strongest coder and running a cold for the week…

In the simplest form, I’d like my idea to visualize the growing or shrinking arguments on both sides (for or against the Chinese mother). The interface in mind is simple: like a video player. The user can drag between old and new and watch the blobs change.

When hovering over either blob at any given spot, the user will be given the most prevailing argument (as measured by the number of inline comments) and words (counting the frequency of each word in the area, maybe +/- 25 comments).

The blobs themselves will be driven by word counts as well… After figuring out which are the top 5 words in a 50 comment radius, the change in the count (positive or negative) will affect one point on the blob, dragging it in forms. (Haven’t looked into what language to use, but *something’s* gotta be able to support this, right?)
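The counting part can be sketched in a few lines of Python, whatever language the blobs end up in. The function names, the tokenizing regex and the sample comments are all made up for illustration, and common words like “the” are not filtered out, which a real version would probably need.

```python
import re
from collections import Counter


def window_counts(comments, center, radius):
    """Count words in the comments within `radius` of index `center`."""
    start = max(0, center - radius)
    end = min(len(comments), center + radius + 1)
    counts = Counter()
    for comment in comments[start:end]:
        counts.update(re.findall(r"[a-z']+", comment.lower()))
    return counts


def top_words(comments, center, radius=25, n=5):
    """The n most frequent words around one point in the comment stream."""
    return window_counts(comments, center, radius).most_common(n)


def count_changes(comments, earlier, later, radius=25, n=5):
    """How much each top word at `later` grew or shrank since `earlier`."""
    before = window_counts(comments, earlier, radius)
    return {word: count - before[word]
            for word, count in top_words(comments, later, radius, n)}


# Made-up comments, oldest first.
comments = ["strict strict strict", "strict kind", "kind kind", "kind kind kind"]
changes = count_changes(comments, earlier=0, later=3, radius=1, n=1)
```

Each positive or negative change could then push or pull one control point of the blob.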

This lava lamp-esque means of visualizing a fiery argument also intrigues me. If the blobs expand into each other, they will repel.

LeWei-Idea Revision and Big Questions-Infoviz

by Le Wei @ 11:00 pm 18 January 2011

My idea for the project has evolved into two separate but related sub-projects.

My initial idea was to create a family tree with fun facts and juicy details about the lives of some royal family. It would include not only relationships but also highlight the more interesting parts of history including scandals, mental illness, and mysterious deaths.

Jumping off from suggestions in class that I look at families other than the royalty whose lineages have already been mapped out, I thought about comparing their family tree with that of some regular Joes. However, I don’t have easy access to my own family’s data (much less in a computer-readable form) and I would like to avoid using some random stranger’s family tree from the internet, so I’ve decided to construct a ‘typical family tree’ from US family statistics throughout the years. Specifically, I will be using data for life expectancy, household size, marriage trends, and most popular names. This could probably be enough of an information visualization on its own, so I might narrow down my scope to just this portion if I run short on time.

Questions to consider: How do I accurately convert statistics into one “average” family? Is there an easier way to get royal family relationships than going through Wikipedia’s articles? Will comparing the two trees actually lead to any insight, or should I just concentrate on constructing one or the other?


by Samia @ 9:29 pm

I am working with a log of all of my actions for the fall and spring of my sophomore year.

Below are some sketches that I made, beginning to think about the form of my visualization. They are mostly straightforward, looking at bar charts and pie charts. The bigger question I’ve been asking is what I am comparing across — I think the interesting things will come out in looking at fall/spring or weekend/weekday, for example. I was also wrestling with the question of static/interactive. I’ve decided to work interactively, most likely in some kind of simple interface with a time line at the bottom, and the “visualization” at top. I am trying to figure out how to incorporate/build in a small viz of a single day to the overall viz of the entire semester (so how does this specific day compare to the rest of the semester, or the “average” Monday).

Next things next — typin’ up the data.

Meg Richards – Potential Data Sources

by Meg Richards @ 2:17 am 17 January 2011


Each student, staff member, and faculty member that comes to or works for CMU has some amount of personal info made available by default through the LDAP servers. This service also has a web front-end. The information includes a person’s associated departments, majors, student level, primary affiliation, name, Andrew ID, and email address. Analyzing this data might let us draw some conclusions about the probabilities of getting a minor with a certain major; the declaration of a minor by a certain student level in a particular department; or even the likelihood of someone in a certain major to forward their mail to Gmail.

2. Foursquare API

Foursquare is a framework and service for location-based social networking. The Foursquare API allows you to pull information about venues, users,  user checkins at a venue, and tips for a venue in addition to allowing you to perform actions like a checkin. Using that data, you could get an idea of where you and your friends typically go or what users who visit a particular venue are also likely to visit.

3. CMU Service Monitoring Data

Mail, OLR, Portal, LDAP, Calendar, Blackboard, KDC, WebISO, and a smattering of web servers that keep this university running each fall down occasionally. When that happens, the monitoring system leaps into action and alerts the poor sucker on coverage duty to go and fix whatever’s broken. The service, problem, and alert timestamp are all logged, and visualizing that data over a variable time granularity might lead to some interesting observations.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2021 Interactive Art & Computational Design / Spring 2011 | powered by WordPress with Barecity