Network Usage Bubbles

The original incarnation of this project was inspired by the Good Morning! Twitter visualization created by Jer Thorp. Conceived as a demonstration of CMU network traffic flows, it was meant to show causal links within the available snapshot of the network traffic. All internal IP addresses had to be anonymized, however, which made the internal traffic less meaningful. Focusing only on traffic with an endpoint outside of CMU was interesting, but the distribution tended towards obeying the law of large numbers, albeit with a probability density function that favored Pittsburgh.

This forced me to consider what made network traffic interesting and valuable, and I settled on collecting my own HTTP traffic in real time using tcpdump. I summarily rejected HTTPS traffic so that I could analyze the packet contents, from which I could extract the host, content type, and content length. Represented appropriately, those three items can provide an excellent picture of personal web traffic.

Implementation

The visualization has two major components: collection and representation. Collection is performed by a bash script that calls tcpdump and passes the output to sed and awk for parsing. Parsed data is inserted into a MySQL database. Representation is done in Processing, using its MySQL and JBox2D libraries.
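
To make the parsing step concrete, here is a minimal sketch of pulling the host, content type, and content length out of captured header lines. It is written in plain Java for illustration only; the actual collection is done by the bash script with sed and awk, and the class name HeaderSketch, the method parseHeaders, and the sample lines are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HeaderSketch {
    /**
     * Scans "Name: value" lines and keeps only the three fields the
     * visualization needs, keyed by lower-cased header name.
     */
    static Map<String, String> parseHeaders(String[] lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // request/status lines and payload have no leading "Name:"
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (name.equals("host")
                    || name.equals("content-type")
                    || name.equals("content-length")) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample of what a request/response pair might look like.
        String[] sample = {
            "GET /index.html HTTP/1.1",
            "Host: www.cs.cmu.edu",
            "HTTP/1.1 200 OK",
            "Content-Type: text/html",
            "Content-Length: 5120"
        };
        System.out.println(parseHeaders(sample));
    }
}
```

In a live capture the Host header arrives with the request and the other two with the response, so a real collector would also need to pair them up; this sketch does not attempt that.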

Visualization Choices

Each bubble is a single burst of inbound traffic, e.g. an HTML, CSS, JavaScript, or image file. The size of the bubble is a function of the content size, in order to show how much of the tube it takes up relative to other site elements. Visiting a low-bandwidth site multiple times will increase the number of bubbles, and thus the total area of its bubbles will approach and potentially exceed the area of bubbles representing fewer visits to a bandwidth-intensive site. The bubbles are labeled by host and colored by the top-level domain of the host. In retrospect, a better coloring scheme would have been the content type of the traffic. Bubble proximity to the center is directly proportional to how recently the element was fetched; elements decay as they approach the outer edge.
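
Coloring by top-level domain could look roughly like the sketch below. This is an assumption about one way to do it, not the code used in the project: the names TldColorSketch, topLevelDomain, and hueFor are hypothetical, and the hash-based hue is only a stand-in for whatever palette is actually used.

```java
import java.util.Locale;

class TldColorSketch {
    /** Returns the last dot-separated label of a hostname, e.g. "edu". */
    static String topLevelDomain(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int colon = h.indexOf(':');
        if (colon >= 0) {
            h = h.substring(0, colon); // drop a ":port" suffix if present
        }
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1); // drop a trailing root dot
        }
        int dot = h.lastIndexOf('.');
        return dot >= 0 ? h.substring(dot + 1) : h;
    }

    /**
     * Hypothetical palette: maps a TLD to a hue in [0, 360) so that the
     * same TLD always gets the same color.
     */
    static int hueFor(String tld) {
        return Math.floorMod(tld.hashCode(), 360);
    }

    public static void main(String[] args) {
        String[] hosts = {
            "www.cs.cmu.edu", "www.facebook.com",
            "static.ak.fbcdn.net", "www.activitiesboard.org"
        };
        for (String host : hosts) {
            String tld = topLevelDomain(host);
            System.out.println(host + " -> " + tld + " (hue " + hueFor(tld) + ")");
        }
    }
}
```

Switching to the content-type coloring suggested above would only mean feeding a different key (say, the part of the content type before the slash) into the same kind of mapping.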

The example above shows site visits to www.cs.cmu.edu, www.facebook.com (and by extension, static.ak.fbcdn.net), www.activitiesboard.org, and finally www.cmu.edu, in that order.

Network Bubbles in Action

Code Snippets

Drawing Circles

Create a circle in the middle of the canvas (offset by a little bit of jitter on the x and y axes) with a radius that’s a function of the content length.

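// Radius is sqrt(content length / PI), so the circle's area tracks the content length.
Body new_body = physics.createCircle(width/2 + i, height/2 + j, sqrt(msql.getInt(5) / PI));
// Attach the hostname to the body so it can be drawn as a label later.
new_body.setUserData(msql.getString(4));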
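
The radius formula in the snippet above, sqrt(length / PI), makes each bubble's area equal to its content length in the simulation's own units. A small worked sketch of that arithmetic, in plain Java with hypothetical names (BubbleSizeSketch, radiusFor, areaOf) and made-up byte counts:

```java
class BubbleSizeSketch {
    /** Radius whose circle area equals the given content length. */
    static double radiusFor(int contentLength) {
        return Math.sqrt(contentLength / Math.PI);
    }

    /** Area of a circle with the given radius. */
    static double areaOf(double radius) {
        return Math.PI * radius * radius;
    }

    public static void main(String[] args) {
        // One large fetch versus ten small ones adding up to the same byte count.
        double oneLarge = areaOf(radiusFor(100_000));
        double tenSmall = 10 * areaOf(radiusFor(10_000));
        System.out.printf("one 100,000-byte fetch: area %.1f%n", oneLarge);
        System.out.printf("ten 10,000-byte fetches: area %.1f%n", tenSmall);
    }
}
```

Both totals come out the same, which is why repeated visits to a light site can add up to as much bubble area as a single heavy fetch. Note that the radius itself grows only with the square root of the size: a file four times larger gets a bubble twice as wide.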

Host Label

If the radius of the circle is sufficiently large, label it with the hostname.

// Only label bubbles large enough to hold readable text.
if (radius > 7.0f) {
  textFont(metaBold, 15);
  text((String) body.getUserData(), wpoint.x - radius/2, wpoint.y);
}

tcpdump Processing

Feeding tcpdump output into sed

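#!/bin/bash
# Capture plain HTTP traffic: -A prints each packet in ASCII, -n skips name lookups.
tcpdump -i wlan0 -An 'tcp port 80' |
while read -r line
do
    # sed prints the line only if it contains a Host: header.
    if [[ $(echo "$line" | sed -n '/Host:/p') ]]; then
        # The second field is the hostname; strings drops non-printable bytes.
        activehost=$(echo "$line" | awk '{print $2}' | strings)
        ...
    fi
done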

The full source

1 Comment

Meg, please fill in the blog post ASAP with documentation of your project. Here are the comments from the PiratePads.

—————————————–
say content type on the big bubbles, so we can compare
nice physics on the bubbles

plz, slooooow down — can tell you are very passionate, but hard to take in the information if you go too fast >> especially those of us that are less internet-knowledgeable (me), i’m basically just seeing acronyms, urls and differently sized circles.

Good stuff good job. I think it’ll be more interesting if you recorded a session of people using it.

Watch your jargon! People are getting lost. Yes… too much jargon in the slides.

Man, you should have made the CMU data work! You’re lucky to have access to that.

The bubble simulation seems a little slow. I would ditch Processing and go lower level since you’re dealing with so much data.

What’s interesting about this data is the opportunity for voyeurism. Don’t wimp out! Spy on people! I Agree.

The visualization would be very interesting if we were seeing people’s traffic. but I GUESS it’s immoral. Even if it was a snapshot of people who agreed to use the App – I want to see real traffic. I want to see my own traffic – I want to see where I go, compare it to other people (even anonymously?) Overlay them, etc.

Great idea! I agree with the comment above. It would really be cool to see people’s traffic…maybe you could break it down by person.

Too bad the slides get cut.
What’s the duck to the right?
Don’t worry, sniff our packets! It’s easier to understand if you do and we have nothing to hide (yeah right). KB is maybe less relevant compared to amount of files?
I would like to see an individual’s patterns (anonymized).

i like the physics of the bubbles. is there a library for this or did you code that yourself?

>> Isn’t it obvious? ITS A TRAP ITS A TARP

I’d like to see my traffic and what is it that takes all the bandwidth when downloading :) nice application! The only thing I would change is the colors…

I like the simplicity of the visualization, but I’m not totally clear about what information is being displayed. The use of color works well. It would be interesting to see how this would evolve over time, and how it would look for different people.

Talk slower please [Agreed, you put a lot of work into the project, and you are selling it short by rushing through the critical processes of the visualization]

audience assumptions — need a little bit more explanation of what these networky things are. it would have been nice to see some of the failed attempts, if there were visuals.
nice! I think it needs a legend of some kind.

not entirely sure what is being talked about…
starting to make sense, the visuals are starting to clarify things for me. could still use a little direct explanation WHILE showing the visualization. But very cool, I really appreciate how robust it is.

I wonder why the bubbles don’t appear immediately after you press enter on the address bar. Does it always have to wait until the page has completely finished loading? I’m not sure how much more insight color-coding the top level domain gives, but it does make the visualization look prettier.

It was a little hard to follow you; talk slower, focus more on the why and less on the how; while it’s very technically impressive, I think that unless you are presenting to people you are 100% sure are on the same level as you, your presentations are going to have to be more about why things act and look like they do and less about how.
-I agree with the point of “why not how”. since the technical side isn’t a problem for you, you might try defining a design/art problem or scope and letting that drive the inquiry a little bit more.

I think the bubbles need more context. More than just where they came from.

Practice presenting to a non-technical audience … again, technically impressive-

Live demo was cool & performed well, though it would have been better if there were a way to leave it on the screen & let it populate rather than jumping back & forth.

A one-sentence overview at the beginning to remind us of the overall objective would have been helpful. Some folks (i.e. visitors to the class) are just tuning in. Also, your slides are cut off (on the right side) in a weird way; please double-check your laptop VGA-to-projector connections. Linux, it’s not always easy w/ projectors (ah)

I know the coloring is based on the TLD (edu, com, gov, etc), but it would be interesting if you could color the data according to the site’s colors.. Just thinking out loud, I know this wouldn’t be easy.

You have made some basic assumptions that you haven’t clearly articulated — like the idea that a viewer’s desire to understand their surfing activity is best monitored by measuring the amount of net traffic — as measured by kilobytes/megabytes. Best to mention this stuff clearly at the outset.

Please don’t call a project you worked on for some time and want to present to us a “thingy”
talking a bit fast
could this be running in realtime as we are surfing? that would be the most impressive way to demo it
could use an animation, demoing how it works over a longer period of time
maybe map size to requests as opposed to raw request size, you could also work on making a sort of network which builds up over time based on which requests go where and when

a lot of info with IP addresses here, plz assist with a slide graphic as to why this is crazy. missing slide text, but we have a swimming rubber duckey! Dude I dig the duck. Is that in her sidebar all the time? That’s a cpu activity graph .. depends how high the water is. What are the bubbles in the duck’s water?
The size of the bubble / data size correlation is awesome. That is a great way to visualize what takes up data in a request (oh its a single request, I thought it was part of a request.)
(it’s called wmbubble)

So the CMU-wide data-set turned out to be unusable? It’s cool to use live data, but it’s really unfortunate that the CMU data-set was bad. I agree with Golan about the perceptual bias given to large files. That banner ad looks like a really big deal.

Rather than colorizing them by the TLD type, I would have colorized them based on the content-type. It’d be cool to see images / binary / text separated.

Impressed by the tech behind this, could help less tech-savvy listeners if you slowed down a little while explaining. I want to see this merged w/ Riley’s project.

Comment by Golan Levin — 4 February 2011 @ 3:16 pm