Interactive Art & Computational Design / Spring 2011 » Mark Shuster – InfoViz

Mark Shuster – InfoViz – “Video Killed the Radio Star”

by mshuster @ 8:41 am 26 January 2011

Do people watch what they are told to listen to?

Even in the age of digital consumption, millions of people still pass their time listening to music on the radio. Stuck in their cars or cubicles, listeners tune in to hear whatever the DJ, the station, and ultimately the music industry want them to. When these people are given the chance to choose their own music and watch music videos on YouTube, will their choices mirror what the music industry has labeled as worthy of consumption? This visualization attempts to show whether radio play is congruent with video play, and what relationship, if any, exists between the two media.

Datasets and Implementation

Billboard.com publishes music charts covering many metrics on a weekly, and sometimes daily, basis. Conveniently, they also offer an API for their chart data. Less conveniently, they neglected to document which chart corresponds to which chartid. After many hours of searching, it became apparent that radio chart data was not going to be available through their API.

I turned back to Google to find another source of radio chart data and found a few options, none of them good and all of them statically published. The dataset I settled on was produced by Mediabase, a company that produces “charts and analysis based on the monitoring of 1,836 radio stations in the US and Canada.” While the data was not available via an API, it was human-readable.

To make use of the chart data, I wrote a Python script using the BeautifulSoup HTML parsing library, which made extracting the three columns I needed (rank, artist, and song) much easier. Once I had parsed that data, it was time to mash it up against YouTube.
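A minimal sketch of the parsing step might look like the following. It swaps in Python's standard-library `html.parser` for BeautifulSoup so it runs without extra dependencies, and it assumes, hypothetically, that the chart page is a well-formed HTML table whose rows hold rank, artist, and song in their first three cells; the real Mediabase markup may differ. The names `ChartTableParser`, `parse_chart`, and `SAMPLE` are illustrative, not from the original script.

```python
from html.parser import HTMLParser


class ChartTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Assumes well-formed markup: every <td> and <tr> is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>, <a>) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> stay empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_chart(html):
    """Return (rank, artist, song) tuples for rows whose first cell is a number."""
    parser = ChartTableParser()
    parser.feed(html)
    parser.close()
    return [
        (int(cells[0]), cells[1], cells[2])
        for cells in parser.rows
        if len(cells) >= 3 and cells[0].isdigit()
    ]


# Hypothetical sample markup, not the real Mediabase page.
SAMPLE = """
<table>
  <tr><th>Rank</th><th>Artist</th><th>Title</th></tr>
  <tr><td>1</td><td>Artist A</td><td>Song A</td></tr>
  <tr><td>2</td><td><b>Artist B</b></td><td>Song B</td></tr>
</table>
"""

print(parse_chart(SAMPLE))  # [(1, 'Artist A', 'Song A'), (2, 'Artist B', 'Song B')]
```

With BeautifulSoup the same idea is usually shorter, since it tolerates sloppy markup such as unclosed cells that this stdlib sketch does not handle.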

YouTube’s Search API returns clean results in JSON, with the “most relevant” result first. Each radio song is associated with that top video and its thumbnail. The thumbnails are then processed with the Python Imaging Library (PIL) so that each one is sized relative to the most-viewed video. This creates one of the visualization's key elements: a video with approximately 100,000,000 views has twice the pixel area of a video with 50,000,000 views.
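The area-proportional sizing can be sketched as follows. Because pixel area grows with the square of the side length, each side is scaled by the square root of the view ratio, so a video with half the views ends up about 0.71 times as wide, not half as wide. The function name `scaled_size` and the 480x360 source thumbnail in the example are assumptions for illustration, not the project's actual code.

```python
import math


def scaled_size(width, height, views, max_views):
    """Scale a thumbnail so its pixel area is proportional to its view count.

    The most-viewed video keeps its original size. Every other thumbnail has
    each side multiplied by sqrt(views / max_views), so its area is
    multiplied by views / max_views.
    """
    if max_views <= 0:
        raise ValueError("max_views must be positive")
    factor = math.sqrt(views / max_views)
    # Round to whole pixels and never collapse a thumbnail to zero size.
    return max(1, round(width * factor)), max(1, round(height * factor))


# Assumed 480x360 source thumbnails; view counts echo the example in the text.
full = scaled_size(480, 360, 100_000_000, 100_000_000)
half = scaled_size(480, 360, 50_000_000, 100_000_000)
print(full, half)  # (480, 360) (339, 255)
```

With PIL, the returned tuple could then be passed to `Image.resize` to produce the scaled thumbnail.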

The resulting data is arranged and passed to an HTML template rendered with the Jinja2 module, which makes quick work of assembling the web page. A jQuery script on the page controls the dynamic layout and layering of the thumbnails and lets users click and drag images around to compare their sizes.

The final product is hosted at markshuster.com/iacd/1/ and currently displays data from the week of Jan 17–24. The source code, in Python, HTML, and JavaScript, is available for download.

Findings

Looking at the data through the visualization, it is sometimes easy to spot songs that, while less popular on the radio, are more popular on YouTube. There is significant variability: the difference in frequency between #1 and #40 is about 40x on the radio chart and about 100x or more on YouTube. This variance creates some interesting blips that are easy to distinguish in the data. Unsurprisingly, many popular radio songs are also big hits on YouTube and appeal to very large audiences.
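One way to surface those blips without dragging thumbnails around is to rank the same songs by YouTube views and compare that against their radio rank, and to compute the #1-to-last spread as a simple ratio. This is a hypothetical sketch, not part of the original project; the function names and the sample data are made up for illustration and do not reflect the actual chart.

```python
def rank_gaps(songs):
    """Compare each song's radio rank with its rank by YouTube view count.

    `songs` is a list of (radio_rank, title, youtube_views) tuples; titles
    are assumed to be unique. Returns (title, radio_rank, youtube_rank, gap)
    tuples, largest gap first, where a positive gap means the song ranks
    higher on YouTube than on the radio.
    """
    by_views = sorted(songs, key=lambda s: s[2], reverse=True)
    youtube_rank = {title: i + 1 for i, (_, title, _) in enumerate(by_views)}
    gaps = [
        (title, radio_rank, youtube_rank[title], radio_rank - youtube_rank[title])
        for radio_rank, title, _ in songs
    ]
    return sorted(gaps, key=lambda g: g[3], reverse=True)


def spread(values):
    """Ratio of the largest to the smallest value, e.g. #1 versus #40."""
    return max(values) / min(values)


# Made-up data for illustration only; not the actual chart.
songs = [
    (1, "Song A", 300_000_000),
    (2, "Song B", 20_000_000),
    (3, "Song C", 150_000_000),
    (4, "Song D", 5_000_000),
]
print(rank_gaps(songs)[0])  # ('Song C', 3, 2, 1)
print(spread([views for _, _, views in songs]))  # 60.0
```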

Reflections

I think that the visualization is far from perfect. First, the current implementation isn’t “live”: it doesn’t dynamically update and query. This could be solved by hosting the script in a web-facing Python environment, which I don’t currently have. Second, there is a conceptual conflict between radio chart data and YouTube view data. While radio chart positions climb and fall from week to week, YouTube view counts accumulate over time. This mismatch can produce confusing results for songs that are new to the radio chart or old on YouTube. Third, one key element of the final visualization was the ability to play videos within the visualization itself. Although the code to do this is in the jQuery script, the videos’ publishers have disabled embedding on other pages.

Given a second iteration, I would look for more congruent datasets, or add the ability to look at the data over time to see trends. There could also be value in mapping the number of radio plays, or the number of video likes and dislikes, to find deeper trends in music palatability.


2 Comments

Comments from PiratePad A:

Beautiful Soup – nice work
you can make this live with PHP scraping
maybe change the visual layout, looks a bit dated in terms of web design, fonts, etc.

Don’t forget the $! Like that this is online. Does it update every day or week? A graph or something highlighting the ones where there is a large difference between YouTube and the Top 40 would be interesting.

Really cool concept. I think it’s cool to see how what you listen to on the radio matches up against YouTube videos.

Katy Perry: Cause, you know, fireworks shooting out of boobs is inspirational
http://www.youtube.com/watch?v=QGJuMBdaqIw

Ruby equivalent of Beautiful Soup: Nokogiri

It’s difficult to see the bigger ones because of the overlap. How could you organize this so you can get an idea of the whole set of options? A grid? How does the organization aid comprehension?

Visualizing payola is much more interesting. How could you get at this data?

Check out this cool software radio: http://zao.jp/radio/66RF/index_e.php
Maybe you could use it to do some sort of analysis of live radio. It can tune into any frequency (even taxi cab radio frequencies) and send the data to the computer.

What interests me about commercial radio: how they play different songs on an album. Usually they play the most popular song from an album to death, then play the number two song to death and so on.

Nice narrative presentation. There is a time-scale issue with the Billboard chart pertaining to last week, and YouTube pertaining to all-time views. It would be interesting to see all-time YouTube views against historical Billboard rankings and their travel.

Really good pecha-kucha timing! Nice communicative images. The interaction is really nice; you can really play around with the pictures.

Presentation content is great, makes me “lol” and is informative at the same time. Sorting options in addition to the drag and drop? (Sort by size, by number come to mind.)

Nice technical background. The photo collage is nice, but it would be useful to see things in a more structured fashion. Maybe having a plot sorted by size of the frame, or by the view count?

The GUI needs to be cleaned up, but this would work well as a built-in feature for YouTube, like a homepage-based aggregator of what people are watching, especially given the real-time/interactivity aspect.

Very cool. Great presentation. Live would be cool, but I don’t know if more than weekly updates are needed. I love the UI and the stacking. Taking in more data, maybe lines between the boxes showing position changes: red lines for a drop or green for a gain, etc.

I like the collage feature. I would like to have an option to automatically distribute the elements of the collage along the window, though. Maybe this would facilitate comparing results. It would be very cool to have this real-time :)

Very cool. One thing to note though, although the numbers scale in 1 dimension, the pictures scale in 2 dimensions.

I think it would be easier to see the comparison of radio/YouTube if the thumbnails were visually organized (even just in a column), as opposed to layered on top of each other. It would be nice if you built in some presets that would let me see some inferences. Like your caveats/improvements.

Good use of humor in your presentation! I like that as you explore and move things around new layers of information are revealed. It really invites you to explore the data. However, without this exploration the comparison you are making is not immediately clear.

>>> I am cringing, yet I am also jealous. +2 Upvote lol karma FUUUUUUUUUUUUUUUu

This is an interesting idea: If radio in effect is in control of what songs and artists become popular, why do we still go on YouTube and listen to the same songs when we are given that control?

Comment by Golan Levin — 26 January 2011 @ 3:14 pm
Comments from PiratePad B:

Mark, I’m really delighted with your comfort and agile use of the Pecha Kucha presentation format. The contextualization of the problem — both in its personal dimension (meaning to you) but also its social dimensions (remarks about how this music is made) — was very well done.

You’re a Swiss Army knife. Please be sure to detail the quick outline of the different technologies you used to create this: Python, jQuery, computer vision (Java?), etcetera. This is some very nicely executed scraping.

That said, I’m not 100% satisfied with the visual design. The typography (Stencil, Courier) doesn’t look great, and the image overlapping (plus those white edges) looks not-so-good. (What accounts for the spatial layout? Is there a radial/angular or X/Y mapping to the positions of the images?) You also haven’t done a great job explaining what question or questions the visualization actually answers.

Yes, from your presentation I was momentarily expecting that you would do some kind of social graph between the singers (Shakira, Kesha, Madonna, Gaga) and the people who actually write/produce the tracks.

This is beast. I was really liking where it was going when you mentioned that the same guy produced all of the songs. ((Agreed)) It’d be cool to see that tied into the final product somehow. Showing the hidden influence of someone like that would be a great direction for another project.

It’s true that a song with tons of YouTube hits COULD HAVE BEEN POPULAR and is now on the downward slope. The number of views isn’t a metric that shows current popularity the same way radio plays do. Using Last.fm data (which records how many times people actually listen to the songs in their own iTunes libraries day by day) might have been a fix for this.

I’m sure there are people who wouldn’t want this information to get out, but I’d love to know how many “one guys” it took to write the Top 40.

Way to kill it on the presentation, Mark. Enjoyed your representation of the data, i.e., the way your parameters mapped to the visuals. I found the jumble of images (when the webpage first loads) a little messy; people loading your page for the first time could benefit from a more organized initial layout. Yeah, thanks; now I have to go. :)

Really well presented.
Super cool interface.
It would be great to auto-sort in a tree-map or similar viz.
You should send it to Mashable.com and similar sites.

I think it could be interesting to have these images moving across the screen in some sort of meaningful way.

I think the visual aspect is a little jumbled in the end; maybe you could arrange it so everything isn’t overlapping. I think you could be a little slicker in your use of text and maybe use the numbers in a more interesting way; the actual # of views seems a little tacked on in the visuals when in theory it’s one of the most important aspects of the data. That said I love the idea and the way you went about it… maybe just polish the end result and you’ll have something really fantastic.

This is an interesting study, however I’m wondering if there is a better way of organizing your results to draw more clear conclusions. I also think that comparing radio and video is a little like apples and oranges when trying to answer your question. For instance, would it be better to compare the “top 40” of youtube music and the “top 40” of radio, instead of the numbers for the same 40 songs, because these songs are going to be popular just because of their production quality.

Comment by Golan Levin — 26 January 2011 @ 3:14 pm