Meg Richards – Project 2 Final

by Meg Richards @ 3:14 pm 4 February 2011

Network Usage Bubbles

The original incarnation of this project was inspired by the Good Morning! Twitter visualization created by Jer Thorp. A demonstration of CMU network traffic flows, it would show causal links for the available snapshot of the network traffic. All internal IP addresses had to be anonymized, making the internal traffic less meaningful. Focusing only on traffic with an endpoint outside of CMU was interesting, but distribution tended towards obeying the law of large numbers, albeit with a probability density function that favored Pittsburgh.

This forced me to consider what made network traffic interesting and valuable, and I settled on collecting my own HTTP traffic in real-time using tcpdump. I summarily rejected HTTPS traffic in order to be able to analyze the packet contents, from which I could extract the host, content type, and content length. Represented appropriately, those three items can provide an excellent picture of personal web traffic.


The visualization has two major components: collection and representation. Collection is performed by a bash script that calls tcpdump and pipes the output to sed and awk for parsing. Parsed data is inserted into a MySQL database. Representation is done in Processing, using its MySQL and JBox2D libraries.

Visualization Choices

Each bubble is a single burst of inbound traffic, e.g. an HTML, CSS, JavaScript, or image file. The size of the bubble is a function of the content size, to show how much of the tube it takes up relative to other site elements. Visiting a low-bandwidth site multiple times increases its number of bubbles, so the overall area of its bubbles will approach and potentially overtake the area of bubbles representing fewer visits to a bandwidth-intensive site. The bubbles are labeled by host and colored by the top-level domain of the host. In retrospect, a better coloring scheme would have been the content type of the traffic. The more recently an element was fetched, the closer its bubble sits to the center; elements decay as they approach the outer edge.

The example above shows visits to several sites in sequence.

Network Bubbles in Action

Code Snippets

Drawing Circles

Create a circle in the middle of the canvas (offset by a little bit of jitter on the x and y axes) with a radius that’s a function of the content length.

Body new_body = physics.createCircle(width/2+i, height/2+j,sqrt(msql.getInt(5)/PI) );
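The square root in the line above is what makes bubble area, not radius, track the content length. A minimal sketch of the same relationship in Python (the project itself uses Processing; the function name `bubble_radius` is hypothetical):

```python
import math


def bubble_radius(content_length: int) -> float:
    """Radius of a circle whose area equals content_length (arbitrary units)."""
    return math.sqrt(max(content_length, 0) / math.pi)


# A response four times larger gets a radius only twice as large,
# so the drawn area stays proportional to the number of bytes.
small = bubble_radius(10_000)
large = bubble_radius(40_000)
```

Scaling the radius directly by content length would instead make large files look quadratically bigger than they are.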

Host Label

If the radius of the circle is sufficiently large, label it with the hostname.

if (radius > 7.0f) {
    textFont(metaBold, 15);
    // ... draw the hostname at the bubble's position (rest of the snippet not shown)
}

tcpdump Processing

Feeding tcpdump output into sed

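tcpdump -i wlan0 -An 'tcp port 80' |
while read -r line; do
    # A "Host:" header marks the start of a new request; remember its hostname
    if [[ -n $(echo "$line" | sed -n '/Host:/p') ]]; then
        activehost=$(echo "$line" | awk '{print $2}' | strings)
    fi
    # ... remaining header parsing not shown in this snippet
done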
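To illustrate what the parsing stage has to pull out of the packet text, here is a rough Python sketch that collects the three fields mentioned earlier (host, content type, content length) from header-like lines. It is not the bash/sed/awk pipeline used in the project; the function name `parse_headers` and the sample lines are made up for illustration.

```python
def parse_headers(lines):
    """Collect Host, Content-Type and Content-Length from header-like lines."""
    wanted = {"host", "content-type", "content-length"}
    found = {}
    for line in lines:
        # Split at the first colon only, so "Host: example.com:8080" survives.
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in wanted:
            found[key] = value.strip()
    if "content-length" in found:
        found["content-length"] = int(found["content-length"])
    return found


# Illustrative input: a request line and header followed by response headers.
sample = [
    "GET /index.html HTTP/1.1",
    "Host: example.com",
    "HTTP/1.1 200 OK",
    "Content-Type: text/html; charset=UTF-8",
    "Content-Length: 5120",
]
record = parse_headers(sample)
```

In real traffic the Host header arrives in the request and the other two in the response, which is presumably why the script above keeps the most recent host in `activehost`.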

The full source

Project 2: The Globe

by huaishup @ 6:30 am 2 February 2011

1. Overall
When talking about data visualization, most people think of computer graphics. However, in my view, this is only one of the possible ways to do it. Why not try visualizing data in physical ways? People can not only see the visualization result, but can also touch and manipulate the visualization device, which could be really interesting.

In this project, I explore a physical/tangible way of visualizing data. Using a paper globe as the data medium, people can hear the language of a certain area by spinning the globe and adjusting the probe.

2. Material

Arduino x1

WaveShield with SD card x1

Button x1

Speaker x1

Variable resistor x2




3. Process

a. Prepare the paper globe
Use Google Images to download one LARGE-size world map. Download a Photoshop plugin called Flexify 2 to revise the map image. Here is the tutorial. Plot the revised image, then cut and glue.

b. Fix the variable resistor
Laser-cut 4 round pieces of wood to hold the shape of the paper globe. Extra timber may help here. Install one of the variable resistors at the bottom of the globe. See below.

c. Install all the other parts
Install the other variable resistor as the probe that points at the globe. Laser-cut a seat for the probe and the globe. Connect the two resistors to two different analog input pins on the Arduino.

d. Calculate the position
Spin the globe and move the probe. Each position gives a different pair of resistor readings, which can be mapped to a sound track. Calculate the position and map it to the sound track.
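The position-to-track mapping can be pictured as a lookup over rectangles in (spin, probe) space. The sketch below is in Python only to show the logic (the device itself runs Arduino code); the ranges and track names are invented for illustration and are not the calibrated values.

```python
# Each entry: (spin_min, spin_max, probe_min, probe_max, track).
# Ranges and track names are hypothetical, not the calibrated values.
REGIONS = [
    (0, 300, 100, 200, "REGION_A.WAV"),
    (301, 700, 0, 99, "REGION_B.WAV"),
]


def track_for(spin, probe):
    """Return the track for the first region containing the two readings."""
    for s_lo, s_hi, p_lo, p_hi, track in REGIONS:
        if s_lo <= spin <= s_hi and p_lo <= probe <= p_hi:
            return track
    return None  # the probe points at an unmapped area (e.g. ocean)
```

Because the first matching region wins, overlapping rectangles need to be listed in priority order.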

e. Prepare the sound
Download the language sounds from Google Translate and store them on the WaveShield’s SD card.

4. Video

5. Code

#include <FatReader.h>
#include <SdReader.h>
#include <avr/pgmspace.h>
#include "WaveUtil.h"
#include "WaveHC.h"

char decision = 0;
SdReader card;    // This object holds the information for the card
FatVolume vol;    // This holds the information for the partition on the card
FatReader root;   // This holds the information for the filesystem on the card
FatReader f;      // This holds the information for the file we're playing
WaveHC wave;      // This is the only wave (audio) object, since we will only play one at a time

#define DEBOUNCE 100  // button debouncer
int mySwitch = 7;

// this handy function will return the number of bytes currently free in RAM, great for debugging!
int freeRam(void)
{
  extern int  __bss_end;
  extern int  *__brkval;
  int free_memory;
  if ((int)__brkval == 0) {
    free_memory = ((int)&free_memory) - ((int)&__bss_end);
  }
  else {
    free_memory = ((int)&free_memory) - ((int)__brkval);
  }
  return free_memory;
}

void sdErrorCheck(void)
{
  if (!card.errorCode()) return;
  putstring("\n\rSD I/O error: ");
  Serial.print(card.errorCode(), HEX);
  putstring(", ");
  Serial.println(card.errorData(), HEX);
}

void setup() {
  // set up serial port
  Serial.begin(9600);
  putstring_nl("WaveHC with 6 buttons");

  putstring("Free RAM: ");        // This can help with debugging, running out of RAM is bad
  Serial.println(freeRam());      // if this is under 150 bytes it may spell trouble!

  //  if (!card.init(true)) { //play with 4 MHz spi if 8MHz isn't working for you
  if (!card.init()) {         //play with 8 MHz spi (default faster!)
    putstring_nl("Card init. failed!");  // Something went wrong, lets print out why
    sdErrorCheck();
    while(1);                            // then 'halt' - do nothing!
  }

  // enable optimize read - some cards may timeout. Disable if you're having problems
  card.partialBlockRead(true);

  // Now we will look for a FAT partition!
  uint8_t part;
  for (part = 0; part < 5; part++) {     // we have up to 5 slots to look in
    if (vol.init(card, part))
      break;                             // we found one, lets bail
  }
  if (part == 5) {                       // if we ended up not finding one  :(
    putstring_nl("No valid FAT partition!");
    sdErrorCheck();      // Something went wrong, lets print out why
    while(1);                            // then 'halt' - do nothing!
  }

  // Lets tell the user about what we found
  putstring("Using partition ");
  Serial.print(part, DEC);
  putstring(", type is FAT");
  Serial.println(vol.fatType(),DEC);     // FAT16 or FAT32?

  // Try to open the root directory
  if (!root.openRoot(vol)) {
    putstring_nl("Can't open root dir!"); // Something went wrong,
    while(1);                             // then 'halt' - do nothing!
  }

  // Whew! We got past the tough parts.
}

void loop() {
  //putstring(".");    // uncomment this to see if the loop isnt running
  if (digitalRead(mySwitch) == HIGH) {
    Serial.println("switch is ok");
    int spin = analogRead(5);    // rotation of the globe
    int probe = analogRead(2);   // angle of the probe

    // Each branch covers one region of the globe. The playcomplete() call
    // and track name for each region are not preserved in this listing.
    if (spin>=0 && spin<=576 && probe >=179 && probe <=276) {
    }
    else if (spin>=85 && spin<=313 && probe >=35 && probe <=160) {
    }
    else if (spin>=580 && spin<=780 && probe >=0 && probe <=142) {
    }
    else if (spin>=980 && spin<=1023 && probe >=7 && probe <=22) {
    }
    else if (spin>=980 && spin<=1023 && probe >=0 && probe <=7) {
    }
    else if (spin>=1023 && probe >=47 && probe <=288) {
    }
  }
}

// Plays a full file from beginning to end with no pause.
void playcomplete(char *name) {
  // call our helper to find and play this name
  playfile(name);
  while (wave.isplaying) {
    // do nothing while its playing
  }
  // now its done playing
}

void playfile(char *name) {
  // see if the wave object is currently doing something
  if (wave.isplaying) {// already playing something, so stop it!
    wave.stop(); // stop it
  }
  // look in the root directory and open the file
  if (!f.open(root, name)) {
    putstring("Couldn't open file "); Serial.print(name); return;
  }
  // OK read the file and turn it into a wave object
  if (!wave.create(f)) {
    putstring_nl("Not a valid WAV"); return;
  }
  // ok time to play! start playback
  wave.play();
}

Alex Wolfe | Data Visualization

by Alex Wolfe @ 8:34 am 31 January 2011


I’m personally terrified of falling. I used to be a big rock climber, and one time I was sort of shimmying my way up a chimney: it’s this narrow space and there are no handholds, so your back is wedged up against one wall and your feet against the other and you just try to walk your way up. But I was too short and my shoes were too slippery and I lost my grip, and my belayer wasn’t paying attention, so I just fell. I was pretty high up and it was probably only 100ft before he stopped the rope and grabbed me, but it felt like eons, and I was so scared and I kept thinking of that unpleasant wet *thwack* sound I’d make when I finally hit the bottom.

So I have a sort of morbid fascination for people who’d jump of their own free will. Initially when I started this project I had this idea of flight vs. fall, visually displaying all the people who jump each year, and showing who survived and who didn’t, seeing as I myself came so close to becoming a statistic. I really wanted to highlight the falling motion, and probably the dramatic splat I’d so envisioned.


I stumbled across the 21st Century Mortality dataset, a comprehensive list of everyone who’d died since 2001 in England, and exactly where and how they died. It was ridiculously huge, with over 62,000 entries, each storing multiple deaths. They used the ICD-10 International Classification of Diseases, which is brutally specific, to categorize them. Just looking for deaths related to falls turned up 17 different categories, ranging from meeting your demise by leaping off of a burning building to death by falling off the toilet. However, when I went digging around for survivors, there wasn’t anything half so comprehensive. BASE jumpers are assigned a number when they complete all 4 tasks, skydiving companies keep some vague handwavy statistics, and I found several lists of people who’d died trying. However, those crazy people who jump for fun are typically up to some crazy (illegal) stunts, such as underwater BASE jumping or walking out to the edge of a wind turbine, so there is no easy way to count/find/categorize them all with half the level of detail available for the less fortunate.

So I decided to focus on the dataset I had. I wrote a quick JavaScript script that essentially just trawled through the dataset, which was stored as a .csv file, pulled out any deaths filed under codes related to falling, and put them in a nice new .csv file.
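A rough sketch of that filtering step, written in Python rather than the JavaScript actually used. It assumes a column named `icd10` (hypothetical) and keeps only rows in the ICD-10 "Falls" block, W00 through W19; the 17 categories mentioned above may well include codes outside that block, so treat the test as a placeholder.

```python
import csv
import io


def is_fall_code(code: str) -> bool:
    """True for ICD-10 codes W00 through W19 (the 'Falls' block)."""
    code = code.strip().upper()
    if len(code) < 3 or code[0] != "W" or not code[1:3].isdigit():
        return False
    return 0 <= int(code[1:3]) <= 19


def filter_falls(src, dst, code_column="icd10"):
    """Copy rows whose code is fall-related from src to dst; return the count."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if is_fall_code(row[code_column]):
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory stand-in for the real file; column names are assumptions.
sample = io.StringIO("icd10,sex,age\nW13,M,34\nI21,F,70\nW18,F,81\n")
out = io.StringIO()
kept = filter_falls(sample, out)
```

With real files, `src` and `dst` would be handles from `open(..., newline="")` instead of `io.StringIO`.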

First Stab/Brainstorming

Since I had that jumping/falling effect in mind, I went through and made each person I had into his/her own particle. I based mass/radius on the age of the person who died and color on gender, and I stored any other information about them in the particle. I put in some basic physics and started playing around. I had this idea where I could simulate the “jump” with each particle starting from the height of the person’s final leap, and I could hand-draw a graphic for the side displaying the more common places.

Here was my initial attempt


Although interesting, it wasn’t particularly informative at all, so I abandoned the “jumping effect” and focused on other things I could get the particles to do. Ultimately I had them blob together based on gender, and then sort themselves into the ICD-10 categories of death, keeping hold of the “falling effect” during the in-between transitions. I wanted each stage to look like the particles just fell into place to create the visualization.


Although I love the freeform effect of the falling particles, and their transition from displaying each of the patterns, it doesn’t really do the data justice. I have so many juicy details stored in there that I just couldn’t display. With the number of particles, it was horribly slow if you did mouseover display, and each one was virtually unique as far as age, place, cause of death, and gender, so there weren’t any overarching trends for the really interesting stuff. I think I’m going to go back, make my best estimate for height, hardcode it in, and maybe add a state that attempts to display it, or at least a few more things, e.g. line up by age and have mouseover explain the ICD-10 code. I really want to think of a way to get the cause of death to be a prominent feature for each individual.

Chong Han Chua | App Store Visualization | 31 January 2011

by Chong Han Chua @ 8:31 am

This project explores the possibility of doing an interesting visualization of icons from the Apple iTunes App Store.

This project was conceived when I saw the visualization of flags by colours. The idea of a set of graphics that all share a similar form, such as the flags, amused me immensely. It then suddenly occurred to me that the icons on the iTunes App Store are of the same nature. Almost rectangular, with rounded corners, and usually vector graphics of some sort: they would be interesting to look at.

I guess in a way, this existed largely as a technical inquiry. This is the first time I wrote a crawler as well as a screen scraper. This is the first time I dealt with a large set of data that takes almost forever to do anything with. I could almost feel my heart beating when I ran the scraping script for the first time, half expecting Apple to boot me off their servers after a few hundred continuous queries. Thankfully, they didn’t.

There were a bunch of technical challenges in this inquiry, mainly:

1. Scraping large sets of data requires planning. My scraping code went through at least 3 different versions, not to mention playing with various languages. Originally, I wanted to use Scala as I was under the impression that the JVM would be more efficient as well as speedy. Unfortunately, the HTML returned by the iTunes App Store is malformed: one of the link tags is not properly closed, which choked Scala’s built-in XML parser.

After determining that using any random Java XML parser would be too much of a hassle, I turned to my favourite scripting language, JavaScript on node.js (using Google V8). After looking through a bunch of DOM selection solutions, I finally got jsdom and jquery to work, then I knew that I was in business.

The original plan was to crawl the website from first page to last page and create a database entry for every page in the website. There was only very basic crash recovery in the script, which basically recorded that the last scraped entry was a certain index n. Unfortunately for me, the links are not traversed in exactly the same order every time, so I ended up having duplicate entries in my database. Also, the script was largely single threaded, and it took over 10 hours to scrape 70,000+ pictures.

After realizing that a partial data set would not do me any good, I decided to refocus my efforts. I built in some redundancy in getting links and tested the database for existing entries before inserting. I also ran another script on top of the scraper script that restarts the scraper when it crashes on a bad response. Furthermore, I used 20 processes instead of 1 to expedite the process. I was half expecting to really get booted off this time round, or get a warning letter from CMU, but thankfully till now there is none. After 10 hours or so, I managed to collect 300,014 images. Finder certainly isn’t very happy about that.
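The "test for existing entries before inserting" idea can be sketched with a uniqueness constraint, so a re-run or a second worker never creates a duplicate row. This is a Python illustration using the standard `sqlite3` module, not the node.js scraper itself; the table name, column names, and URLs are all hypothetical.

```python
import sqlite3

# In-memory database for illustration; a real scraper would use a file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apps (app_id TEXT PRIMARY KEY, icon_url TEXT)")


def save_app(app_id, icon_url):
    """Insert a scraped app once; return True if it was new."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO apps (app_id, icon_url) VALUES (?, ?)",
        (app_id, icon_url),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when it was ignored.
    return cur.rowcount == 1


first = save_app("app-1", "http://example.com/1.png")
again = save_app("app-1", "http://example.com/1.png")
total = conn.execute("SELECT COUNT(*) FROM apps").fetchone()[0]
```

Letting the database enforce uniqueness also removes the dependence on links being traversed in the same order on every run.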

2. Working with large data sets requires planning. Overall, this is a pretty simple visualization; however, the scaffolding required to process the data consumes plenty of time. For one, there was a need to cache results so that it doesn’t take forever to debug anything. SQLite was immensely useful in this process. Working with large sets of data also means that when a long-running script crashes, most of the time the midpoint data is corrupted and has to be deleted. I pretty much ran through every iteration at least 2 to 3 times. I’m quite sure most of my data is in fact accurate, but the fact that a small portion of the data was corrupted (I think > 20 images) does not escape me.

I wouldn’t consider this a very successful inquiry. Technically it is ego stroking; on an intellectual-art level, there seem to be no very useful results from the data visualization. When constructing this visualization, I had a few goals in mind:

1. I don’t want to reduce all these rich data sets into simply aggregations of colours or statistics. I want to display the richness of the dataset.

2. I want to show the vastness of the data set.

As a result, I ended up with a pseudo spectrum display of every primary colour of every icon in the App Store that I scraped. It showed basically the primary colour distribution in something that looks like a HSB palette. The result was that it seems to be obvious that there are plenty of whitish or blackish icons, and the hue distribution of the middle saturation seems quite even. In fact, it says nothing at all. It’s just nice to look at.

There’s a couple of technical critiques on this: The 3D to 2D mapping algorithm sucks. What I used was a very simple binning and sorting via both the x and y axis. Due to the binning, the hue distribution was not equal for all bins. To further improve this visualization, the first step is to at least equalize the hue distribution across bins.
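One way to equalize the bins, sketched here in Python as an assumption about what "equalizing" could mean: sort all colours by hue first, then cut the sorted list into bins that hold the same number of icons, instead of using fixed-width hue ranges. The function name and the sample colours are illustrative only.

```python
import colorsys


def equal_count_bins(rgb_colors, n_bins):
    """Sort colors by hue, then cut into bins holding (nearly) equal counts."""
    def hue(color):
        r, g, b = (channel / 255 for channel in color)
        return colorsys.rgb_to_hsv(r, g, b)[0]

    by_hue = sorted(rgb_colors, key=hue)
    size, extra = divmod(len(by_hue), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional color each.
        end = start + size + (1 if i < extra else 0)
        bins.append(by_hue[start:end])
        start = end
    return bins


# Red, green, blue, yellow, cyan as stand-ins for icon primary colours.
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (0, 255, 255)]
bins = equal_count_bins(colors, 2)
```

Each bin could then be sorted along the second axis (for example by brightness) to fill one column of the display.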

I guess what I really wanted to do, if I had the time, was to have a bunch of controls that filter the icons that show up on the screen. I really wanted a timeline control where I could drag the slider across time and see the App Store icons populate, or a bunch of checkboxes with which I could show and hide the categories that the apps belong to. If I had more chops, I would attempt to sort the data into histograms for prices and ratings and have some sort of colour sorting algorithm. If I had more chops, I would make them animate from one to another.

I think there is a certain difficulty in working with big data sets: on the one hand there is no expectation of what trends will occur, since statistically speaking everything basically evens out, and on the other hand it just takes forever to get anything done. But it is fun, and satisfying.

If you want the data set, email me at johncch at cmu dot edu.

Code can be found at:

Data prep


by Samia @ 7:15 am

For my project, I wanted to explore the data of my life. For fall and spring semester of my sophomore year, I kept a detailed, fairly complete log of all of my actions in an analogue notebook. I wanted to see if there were any connections I could draw out of my everyday actions.

One of the biggest problems I ran into was simply getting the data into digital form. I had (and still have) to type it up personally because no one else can read my handwriting or understand the shorthand.

After I had a few days of data written, and a working parser, I began to run the data with a first, basic visualization of a pie chart. I mapped events categorized as sleep to a dark purple, school to blue, fun to pink, housekeeping to yellow, and wasting time to green. In the screenshots below, I also gave them a transparency with alpha. NOTE: because I am silly, these graphs start midnight at 0300, and then move through the day clockwise.

The image below is of maybe 3 or 4 days’ worth of data. Already patterns are emerging: a desaturated area where I sleep, a yellow/olive band of waking up and getting up in the morning, two blue peaks of morning and afternoon classes, and a late afternoon pink band of doing fun after-class stuff.

By this point, probably around 10 days of data, the patterns are pretty clear, especially the sharp line between black and yellow when I wake up every morning.

The pie chart re-oriented so that midnight sits at the 0000 (12 o’clock) position and midday at the 0600 (6 o’clock) position.
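The re-orientation comes down to a quarter-turn offset. In a screen coordinate system where angle 0 points right and angles grow clockwise (as in Processing), a 24-hour time can be mapped to an angle as in this Python sketch; the function name `clock_angle` is hypothetical.

```python
import math


def clock_angle(hhmm: str) -> float:
    """Angle in radians for a 24-hour 'HHMM' time, with midnight at the top.

    Angle 0 points right (3 o'clock) and angles grow clockwise, matching a
    screen coordinate system where y increases downward.
    """
    minutes = int(hhmm[:2]) * 60 + int(hhmm[2:])
    # One full day is one full turn; subtract a quarter turn so 0000 is on top.
    return (minutes / 1440) * 2 * math.pi - math.pi / 2
```

Dropping the `- math.pi / 2` term gives the earlier layout, where midnight lands at the 3 o'clock position.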

Below, I looked at the data sequentially, with each day as a horizontal band.

My final interaction looked like the images below: each of the bands is a day of the week. So Monday, for example, is the three days of semi-transparent Monday data graphed on top of one another. Patterns are pretty obvious here: the front of the school week has lots of blue for homework and school, the afternoons of Thursday and Friday are fairly homework-free, etc.

Clicking on a day allows you to compare the “typical” day with specific ones, as well as compare the events broken down by category (how many hours of school work vs. that day’s average).

All in all, I’m glad I got this to work in some capacity. I think the data would be more interesting if I had all of it. In terms of interaction and design there are lots of weak points — poor labelling, jarring color-coding, and non-intuitive buttons.

For concept, however, Golan hit the nail on the head. As I was transcribing the data, I was really excited to work with some of the specific things I tracked: for example, when I tried to get up in the morning and failed, or when I did my laundry, or how much time I spent doing homework versus in class, or what times of day I biked. I think I was so caught up in getting the “overview” of the project to work that I never got to those more interesting and telling points. In retrospect, my time may have been better spent digitizing the data about, perhaps, when I slept, and then just working with that, since it became obvious that I would not have time to put in the entire database. A smaller subset of the information might have conveyed a more understandable picture. For example, seeing that I’m biking home from campus at 2 in the morning might convey that I had a lot of work to do just as well as listing all the tasks of that day.

Caitlin Boyle :: Project 2 InfoViz

by Caitlin Boyle @ 6:35 am

My idea came from… various exercises in frustration. In a way, the hardest part about this project was just committing to an idea… Once my initial project fell through, my attack plan fell to pieces. I’m not used to having to think in terms of data, and I think I got too wrapped up in the implications of “data”. Really, data could have been anything… I could have taken pictures of my belongings and made a color map, or done the same for my clothing; but in my head, at the time, I had a very specific idea of what the dataset I was searching for was, what it meant, and what I was going to do once I found it. I think stumbling upon the ruins of the government’s bat database put me in too narrow a mindset for the rest of the project… for a week after I realized the batdata wasn’t going to fly, I went looking for other population datasets without really questioning why, or looking at the problem another way. It took me a little longer than it should have to come back around to movie subtitles, and I had to start looking at the data before I had any idea of what I wanted to visualize with it. My eventual idea stemmed from the fluctuation of word frequency across different genres; what can you infer about a genre’s maturity level, overarching plot, and tone by looking at a word map? Can anything really be taken from dialogue, or is everything in the visuals? The idea was helped along thanks to Golan Levin and two of his demos: subtitle parsing and word clouds in Processing.

Data was obtained… after I scraped it by hand from ‘s 50 Best/Worst charts for the genres Horror, Comedy, Action and Drama. The .srt files were also downloaded by hand because I’m a novice programmer (and apparently a glutton for menial tasks) and was uncomfortable poking my nose into scripting. I just wanted to focus on getting my program to perform semi-correctly.

Along the way, I realized… how crucial it is to come to a decision about content MUCH EARLIER to open up plenty of time for debugging, and how much I still have to learn about Processing. I used a hashtable for the first time, got better acquainted with classes, and realized how excruciatingly slow I am as a programmer. In terms of the dataset itself, I was fascinated by the paths that words like “brother, mother, father” and words like “fucking” took across different genres. Comedy returns a lot of family terms in high frequency, but barely uses expletives, letting us know that movies that excel in lewd humor (Judd Apatow flicks, Scary Movie, etc.) are not necessarily very popular on IMDb. On the other hand, the most recurring word in drama is “fucking”, letting us know right away that the dialogue in this genre is fueled by anger.
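The hashtable-of-word-counts idea can be sketched in a few lines. This is a Python illustration rather than the Processing code from the project; the stopword list is a tiny placeholder, and the subtitle text is made up.

```python
import re
from collections import Counter

# Placeholder filter; a real run would need a much longer stopword list.
STOPWORDS = {"the", "a", "i", "you", "to", "and", "is", "it"}


def word_counts(srt_text: str) -> Counter:
    """Count dialogue words in .srt text, skipping cue numbers and timestamps."""
    counts = Counter()
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


sample = """1
00:00:01,000 --> 00:00:03,000
Where is my brother?

2
00:00:04,000 --> 00:00:06,000
Your brother is with your mother.
"""
counts = word_counts(sample)
```

Summing the counters of every film in a genre gives the genre-level frequencies, and `counts.most_common(n)` picks the words for the cloud.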

All in all, I think I gave myself too little time to answer the question I wanted to answer. I am taking today’s critique into consideration and adding a few things to my project overnight; my filter was inadequate, leaving the results muddied and inconclusive. I don’t think you can get too much out of my project in terms of specific trends; the charm is in its wiki-like linking from genre cloud, to movie titles, to movie cloud, to movie titles, to movie cloud, for as long as you want to sit there and click through it. Personally, I really enjoy making little connections between different films that may not be apparent at first.

Subtitle Infoviz ver. 1 video

Pre-Critique Project

Post-Critique (coming soon) :: more screenshots/video/zip coming soon… making slight adjustments in response to critique, implementing labels and color, being more comprehensive when filtering out more common words. I plan to polish this project on my own time.

Project 2: Data Visualization – Mapping Our Intangible Connection to Music

by Asa Foster @ 4:28 am

General Concept

Music is an incredible trigger for human emotion. We use it for its specific emotional function a lot of the time, using music to cheer us up or calm us down, as a powerful contextual device in theater and film, and for the worship of our deities of choice. Although it is very easy for an average listener to make objective observations about tempo and level of intensity, it is harder to standardize responses to the more intangible scale of how we connect to the music emotionally. This study aims to gain some insight on that connection by forcing participants to convert those intangible emotional responses to a basic scale-of-1-to-10 input.

The goal of this project is to establish a completely open-ended set of guidelines for the participant in order to collect a completely open-ended set of data. Whether correlations in that data can be made (or whether any inference can be made based on those correlations) becomes somewhat irrelevant due to the oversimplification and sheer arbitrariness of the data.


An example of an application of a real-time system for audience analysis is the response graph at the bottom of the CNN screen during political debates. The reaction of the audience members, displayed by partisanship, is graphed to show the topic-by-topic approval level during the speech. By having a participant listen to a specific piece of music (in this case, Sufjan Stevens’ five-part piece Impossible Soul) and follow along using a program I created in Max/MSP to graph response over time, I can fashion a crude visual map of where the music took that person emotionally.

Data & Analysis

Data was gathered from a total of ten participants, and the graphs show some interesting connections. First off are the similarities within the opening movement of the piece; from talking with the participants there seemed to be a general sense of difficulty standardizing one’s own responses. This led to a general downward curve once the listener realized that there was a lot more breadth to the piece than the quiet opening lets on. Second is the somewhat obvious conclusion that the sweeping climax of the piece put everyone more or less towards the top of the spectrum. The third pattern is more interesting to consider: people were split down the middle with how to approach the song’s ending. To some it served as an appropriately minimalist conclusion to a very maximalist piece of music, to others it seemed forced and dry.

Areas of Difficulty & Learning Experiences

  • The song is 25 minutes long, far too long for most CMU students to remove their noses from their books.
  • As the original plan was to have a physical knob for the listener to use, I had an Arduino rig all set up to input to my patch when I fried my knob component and had to scale back to an on-screen knob. Nowhere near as cool.
  • A good bit of knowledge was exchanged for the brutal amount of time wasted on my initial attempt to do this using Processing.
  • I have become extremely familiar with the coll object in Max, a tool I was previously unaware of and that has proved EXTREMELY useful and necessary.


Download Max patches as .zip: DataVis

Susan Lin — InfoViz, Final

by susanlin @ 3:14 am

Visualizing a Flaming Thread
InfoViz based off of WSJ article “Why Chinese Mothers are Superior” comments thread.

This is a looong post, so here’s a ToC:

  • The Static Visualization
  • Everything Presented Monday
  • Process Beginnings
  • Pitfalls
  • Retrospective

The Static Visualization

As per the advice during the critique, I created an infographic based on the static variant of my project. I decided to keep the 10×100 grid layout for aesthetic reasons (not having 997 bubbles all in one row and thus making a huge horizontal graphic).

This alternative version of the above offers the same thing with the areas of interest highlighted.

Everything Presented Monday
Links for everything pre-critique.

Process Beginnings

As mentioned, the data source of interest was this WSJ article. The article sparked some serious debate, leading to threads such as these. Here is a particularly powerful answer from the Quora thread, from an anonymous user:

Drawing from personal experience, the reason why I don’t feel this works is because I’ve seen an outcome that Amy Chua, the author, fails to address or perhaps has yet to experience.

My big sister was what I used to jealously call “every Asian parent’s wet dream come true” [… shortened for conciseness …]
Her life summed up in one paragraph above.

Her death summed up in one paragraph below.
Committed suicide a month after her wedding at the age of 30 after hiding her depression for 2 years.

I thought the discussion around it, though full of flaming, was very rich with people on both ends of the spectrum chiming in. My original idea was to take apart the arguments and assemble it in a form which would really bring out the impact, similar to the excerpt from Quora.

I started off with the idea of having two growing+shrinking bubbles “battle.” More information can be read on this previous post.

This was the baseline visual I devised:

  • Green and Orange balls collide with each other.
  • Collision: green does not affect green; likewise, orange does not affect orange.
  • Colliding with opposition shortens your life span (indicated by opacity).
  • Touching an ally ups your life span.
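The life-span rules in the list above can be sketched as a single update applied whenever two balls touch. This is a Python illustration, not the Processing sketch built on Bouncy Balls; the class, field names, and the 0.1 step size are assumptions, and the bounce physics is left out.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    team: str          # "green" or "orange"
    life: float = 1.0  # 0.0 (faded out) to 1.0 (fully opaque)


def on_touch(a: Ball, b: Ball, step: float = 0.1) -> None:
    """Apply the touch rule: allies gain life, opponents lose it."""
    delta = step if a.team == b.team else -step
    for ball in (a, b):
        # Clamp so life can be used directly as an opacity value.
        ball.life = min(1.0, max(0.0, ball.life + delta))
```

Drawing each ball with an alpha of `life * 255` would then make embattled comments fade while supported ones stay solid.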

Giving credit where credit is due:
The code started with Bouncy Balls and was inspired by Lava Lamp.

Next, I wanted to see if I could work some words into the piece. Word Cloud was an inspiration point. In the final, I ended up using this as the means of picking out the words which charged comments usually contained: parent, Chinese, and children.

Cleaning up the data:

  • When I downloaded the RSS feed of the comments, it was all in one line of HTML (goody!).
  • With some help, I learned how to construct a Python script to organize it.
  • Basically, the code figures out where each time stamp and comment is relative to the mark-up patterns, and separates the one line out to many lines.
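import re

# Read the one-line RSS dump and split it into one block per comment.
text = ''
with open('comments.html', 'r') as f:
    for line in f:
        while True:
            # A comment runs from one "#comment" anchor to the next numbered one.
            m = re.search(r'#comment.*?#comment\d+', line)
            if m is None:
                break
            comment = line[:m.span()[1]]
            # The time stamp ends with "GMT"; split the header from the body there.
            n = comment.find("GMT") + 4
            text += comment[:n] + "\n"
            text += comment[n:] + "\n\n"
            # Continue with whatever is left of the line.
            line = line[m.span()[1]:]

# Write the reformatted comments out.
with open('comments-formatted.html', 'w') as f2:
    f2.write(text)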

Sources: comments.html, comments-formatted.html

More looking outwards:

While working, I often took a break by browsing Things Organized Neatly. It served as motivation, inspiration, and, admittedly, procrastination. Also, if I could revise my idea, something interesting to look at in a future project might be commonly used ingredients in recipes (inspired by the above photo taken from the blog).


The greatest downer of this project was discovering that language processing is actually quite hard for a novice coder to handle. Here are the possibilities I abandoned due to lack of coding prowess:

  • LingPipe Sentiment Analysis – It would have been really freaking cool to adapt this movie review polarity analysis into a ‘comment polarity’ analysis, but unfortunately, this stuff was way over my head.
  • Synesketch – Probably would have been a cool animation, but didn’t get to show two emotions at once like the original idea desired.
  • Stanford NLP – Again, admired this stuff, but way over my head.

In no particular order, here are some of the things I learned and discovered while doing this project.

  • Language processing is still a new-ish field, meaning it was hard to find a layman's explanation or adaptation. It would have been nice to do more sophisticated language processing on these comments, but language processing is a monster of its own to tackle.
  • Vim is impressive. I now adore Vim users. (Learned during the Python script HTML clean-up portion of this project.)
  • Mechanical Turk: This might have been an alternative after figuring out language processing was hard to wrangle. Though building a framework to harvest this data is unfamiliar territory as well (probably with its own set of challenges).
  • Another field: I really wanted to map the time stamp, especially after harvesting it, but it went unused. An animation with the time stamps normalized by comment frequency may have added another layer of interpretation. Addition: From the critique, though, it seems like more layers would actually hurt more than help. Still, I wonder if the time stamp could have added something to the static visualization.
  • All-in-all: I thought this was pared down to the simplest project for 2+ weeks… This clearly wasn’t the case. Lesson: Start stupidly simple next time!
  • As for things that went well: I forced myself to start coding things other than simple mark-up again, which is very pleasing when things come together and start working.
  • I am pleased with the combined chaos+order the project exudes (lava lamp on steroids?). The animation made for a poor visualization compared to the static version even though I spent 80% of my time getting the animation to work. On the bright side, I would have never found out without trying, so next time things will be different.

Charles Doomany- InfoVis: Final Post

by cdoomany @ 2:29 am

Digital Flora

This project acquires realtime environmental data (ambient light and temperature) from several distinct geographic locations and uses the data as a parameter for driving the recursive growth of a virtual tree. Each tree serves as a visual indicator of the environmental conditions of its respective geographic location. When the optimal conditions for plant growth are met (~7000 lumens / 18.3 °C), the animation displays a fully matured tree at its last stage of recursion.

I used Pachube to acquire the data and Processing to generate the tree animation.
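
One way to turn the readings into a growth stage can be sketched as follows. This is a Python illustration rather than the project's Processing code: the scoring formula, the tolerance values, the binary branching, and the function names are all assumptions. Only the optimal values (~7000 lumens, 18.3 °C) and the ten levels of recursion come from this post.

```python
OPTIMAL_LIGHT = 7000.0  # optimal light level stated in the post
OPTIMAL_TEMP_C = 18.3   # optimal temperature stated in the post
MAX_DEPTH = 10          # ten levels of recursive depth

# Assumed tolerances: how far a reading may stray before growth drops to zero.
LIGHT_TOLERANCE = 7000.0
TEMP_TOLERANCE_C = 20.0


def closeness(value, optimum, tolerance):
    """Return 1.0 at the optimum, falling linearly to 0.0 at +/- tolerance."""
    return max(0.0, 1.0 - abs(value - optimum) / tolerance)


def growth_depth(light, temp_c):
    """Map a light/temperature reading to a recursion depth in 0..MAX_DEPTH."""
    score = (closeness(light, OPTIMAL_LIGHT, LIGHT_TOLERANCE)
             * closeness(temp_c, OPTIMAL_TEMP_C, TEMP_TOLERANCE_C))
    return round(score * MAX_DEPTH)


def count_branches(depth):
    """Recursively count the branches of a tree that forks in two at each level."""
    if depth == 0:
        return 0
    return 1 + 2 * count_branches(depth - 1)
```

Multiplying the two closeness scores means that a poor reading on either sensor stunts the tree, which matches the idea that full growth requires both conditions to be met.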

Ideas for Improvement:

• Add more parameters for influencing growth (e.g., daily rainfall, soil pH, etc.)

• Increase the resolution of growth (currently only ten levels of recursive depth)

• Growth variation is not observable over short periods of time, but is only apparent over long term seasonal environmental changes

• The current animation appears fairly static; there is an opportunity to add more dynamic and transient animated elements that correspond with environmental conditions

• An ideal version of the program would have multiple instances of the animation running simultaneously, which would make it easy to compare environmental data from various geographic locations

• A viewable history of the realtime animation would be an interesting feature for accessing and observing environmental patterns

• More experience with recursively generated form and some aspects of OOP would certainly have helped me reach my initial goal

Timothy Sherman – Project 2 – ESRB tag cloud

by Timothy Sherman @ 1:17 am