Project 2-Infoviz-Nisha Kurani – Final

by nkurani @ 8:13 am 31 January 2011

Project Focus

What makes a movie a hit?  Do people watch movies with strong plots?  Is there a trend in which genres people enjoy watching?  I’ve always assumed that box office hits are the movies with the most amazing plots.  Why else would so many people spend their money to watch these movies?  Why would some people watch them multiple times?

Finding the data

For as long as I can remember, I’ve been consulting IMDB to see how people have rated a movie I’m about to watch.  Many times I use the site to guide my decision.  I found a list of the Top 474 Box Office Hits; however, when I went to download the data, I realized that it was going to be very challenging.  After struggling with it, I decided to find another source.  I ended up using a list from Box Office Mojo and extracted the data from the site into 4 text files.  Each file included 100 of the top 474 all-time box office hits.  In the end, I decided to focus on the top 100 since it was a more manageable data set.  I felt using more than 100 would clutter the screen.

While searching for more data, I brainstormed what I could provide in addition to the information from Box Office Mojo, which gave me each movie’s rank, title, studio, lifetime gross, and year.  I toyed with including the movie’s genre, rating, director, release date, lead actors and more.  To find the corresponding information for each movie, I had to do some digging.  I searched for an IMDB API, which led me to their “Alternative Interfaces.”  If you’re interested in the data, I acquired my set from:

Scraping the data

Scraping the data took the majority of my time.  I ended up downloading very large list files that included ratings and genre data for everything from TV shows to international movies!  The files I used (genres.list and ratings.list) were very messy.  It was challenging, but I eventually extracted the data for my movies and saved it into text files that I could access more conveniently later.  What was even more frustrating was figuring out ways to display the information.  Since I was new to Processing, I always had to do some research before executing an idea.
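To give a sense of what that extraction step involves, here is a minimal sketch in Python (not the Processing code from my zip file). It assumes each data line in ratings.list has four whitespace-separated parts: a 10-character vote distribution, a vote count, a rating, and a title ending in a year in parentheses. That layout, the function names, and the sample values in the comments are assumptions for illustration only, so check them against the actual file before relying on this.

```python
import re

# ASSUMED line layout (not confirmed by the post):
#   <10-char distribution>  <votes>  <rating>  <title> (<year>)
# e.g. "      0000001222   12345   7.5  Some Movie (1999)"   <- made-up values
LINE_RE = re.compile(
    r"^\s*(\S{10})\s+(\d+)\s+(\d+\.\d)\s+(.+?)\s+\((\d{4})\)\s*$"
)


def parse_rating_line(line):
    """Return a dict for one movie line, or None for headers, TV episodes, etc."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    _distribution, votes, rating, title, year = match.groups()
    return {
        "title": title,
        "year": int(year),
        "rating": float(rating),
        "votes": int(votes),
    }


def ratings_for(wanted, lines):
    """Keep only the (title, year) pairs we care about, e.g. the top 100 list."""
    wanted = set(wanted)
    found = {}
    for line in lines:
        record = parse_rating_line(line)
        if record is None:
            continue
        key = (record["title"], record["year"])
        if key in wanted:
            found[key] = record
    return found
```

Matching on both title and year helps with remakes that share a title. Lines that do not fit the assumed pattern are simply skipped, which is how the sketch ignores the TV and header lines.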

While figuring out ways to scrape the data, I found a few data visualizations about box office hits that I found useful.  Here they are:

Displaying the Data

I decided to plot the ratings on the y-axis and the timeline on the x-axis.  I then scaled the size of each circle according to how much money that movie made.  I also set the color of the circle based on the movie’s genre.  In the end, I found it a useful chart that gives its viewer a quick way to spot trends in box office movies.  I found it interesting that the movies ranked highest at the box office do not always get the best ratings.
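The mapping described above can be sketched roughly as follows, in Python rather than Processing. The canvas size, margins, axis ranges, genre colors, and every function name here are placeholders I made up for illustration. Scaling the radius by the square root of the gross (so that circle area, not radius, tracks the money) is one common design choice and not necessarily what my sketch does.

```python
import math

# Hypothetical palette; genres not listed fall back to "Other".
GENRE_COLORS = {
    "Action": (220, 60, 60),
    "Comedy": (240, 200, 40),
    "Other": (150, 150, 150),
}


def linear_map(value, in_lo, in_hi, out_lo, out_hi):
    """Rescale value from one range to another (same idea as Processing's map())."""
    return out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def bubble_for(movie, max_gross, year_range, rating_range=(0.0, 10.0),
               width=800, height=600, margin=60, max_radius=40.0):
    """Turn one movie record into (x, y, radius, color) for drawing."""
    # Year runs left to right along the x-axis.
    x = linear_map(movie["year"], year_range[0], year_range[1],
                   margin, width - margin)
    # Rating runs bottom to top, so the output range is flipped
    # (screen y grows downward).
    y = linear_map(movie["rating"], rating_range[0], rating_range[1],
                   height - margin, margin)
    # Area proportional to gross: radius grows with the square root.
    radius = max_radius * math.sqrt(movie["gross"] / max_gross)
    color = GENRE_COLORS.get(movie["genre"], GENRE_COLORS["Other"])
    return x, y, radius, color
```

With area scaling, a movie that made a quarter of the top gross gets half the top radius, which keeps the biggest hits from visually swamping everything else.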

I’ve included my code in a zip file in case you’d like to take a look at it.  Please let me know if you have any questions or suggestions regarding my infoviz.  I look forward to making some of the changes that were discussed in class.

Below is a screenshot of the latest version of my application.  When you roll over a bubble, it displays the movie name.  When you click on a bubble, you get additional information about the movie, including box office gross, year, genre, rating, and the number of votes it received.


After the critique, I realized I should have tested out more variations of displaying the data in Processing instead of just sketching and disregarding things.  Also, I could make the visualization stronger by providing information like release dates, directors, and a “star factor” that would list the number of big stars in the movie.  I plan on exploring these areas in the weeks to come.  Since the critique, I’ve redone the colors to make the differences in genre more visible and labeled my axes.  I tried to make it so that when you roll over the genre key to the right of the page, you only see the bubbles that are part of that genre; however, the code wasn’t functioning correctly, so I decided to leave it out for now.
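For reference, here is one way the genre rollover and the bubble rollover could be structured, sketched in Python rather than Processing. It is not the code I left out. The record fields ("x", "y", "r", "genre") and the function names are hypothetical.

```python
import math


def visible_bubbles(bubbles, hovered_genre):
    """Return the bubbles to draw this frame.

    hovered_genre is the genre under the cursor in the key, or None when
    the cursor is not over the key (in which case everything is shown).
    """
    if hovered_genre is None:
        return list(bubbles)
    return [b for b in bubbles if b["genre"] == hovered_genre]


def bubble_under_cursor(bubbles, mouse_x, mouse_y):
    """Return the topmost bubble containing the cursor, or None."""
    # Bubbles later in the list are drawn last (on top), so check them first.
    for b in reversed(bubbles):
        if math.hypot(mouse_x - b["x"], mouse_y - b["y"]) <= b["r"]:
            return b
    return None
```

Running the hit test only on the list returned by `visible_bubbles` keeps hidden bubbles from responding to the mouse while a genre is selected.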

This assignment was really fun.  It was the first time I’ve ever created a visualization with code.  I found it challenging at times, but it was definitely worth it.  Over the course of this project, I’ve developed a long list of visualizations I’d like to create, so I’m excited to test those out in the weeks to come.


1 Comment

  1. Hi Nisha, here are comments from the PiratePads.


    the journnnnnneeeeey of life >> like your brainstorming, good that you played with multiple ideas — ftp from imdb? u’s a hax0r >> share the link plz?

    are you really passionate about this data set? what do you hope to learn from this trending

    what is a TRUE facebook friend exactly? the ones that don’t poke “I invented poking” – Mark Zuckerberg “I invented the facebook” – Al Gore

    A little more energy in your voice.

    It’s really comedy hour today this is the cool place to be during IA&CD
    Pirate Pad A – it’s hopping
    They should make people type pirate language in Pirate Pad, Yarrrrrr

    LOTTA BUBBLES ALL UP IN THIS BITCH
    << You must have a ton of RAM. too many open apps <-- indeed... crazy... even XCode you mean even Chrome, Firefox AND Safari? Heh. Ran out of heap space.

    Interesting idea, but there doesn't seem to be a trend in this visualization. Maybe that's alright, and what we get is that there is no trend. (maybe because she grouped many into "other" losing some info and having most of the viz being that category). Even taking away genre, though, I can't find much of a trend between score and $$. (maybe because of the inflation issue). true.

    How do you memory manage Processing? I also had some issues with that. I increased the heap to several gigs but it crashed after a while of opening thousands of files (didn't know how to delete them from memory).
    >> If you remove all references to it then the garbage collector should take care of it.
    >> How do you remove references? Set the variable referencing it to null.
    >> Thanks. Will try that. They were PImages or something like that.

    ^^ You’re probably trying to open too many files at once.
    ^^you can increase your heap in processing. Pooowers of Two. It’s ridiculously small by default.

    I’m not sure if this tackles your idea of “good movie” though. Agree completely. Marketability and box office performance is a different beast entirely and rarely factors in. – the top 100 are user voted, what about YOUR top 100?

    Genre is a tough thing to separate data by; whereas a numerical rating is standardized, some genres overlap. Also, we need to have some stipulations for anyone using black backgrounds in their programs.

    I like your journey to your final concept. Agree with other comments about this solution maybe not communicating what makes a good movie.

    Perhaps a little TMI about your struggle to find a topic — mostly just because you spent valuable time describing your indecision when you could have jumped into your project — which is strong work.

    Good challenge getting the data.
    More Hans Rosling circles, hey?

    Have you looked at previous visualizations of films/IMDB? See Shneiderman’s Starfield Display from the early 1990s, etc.

    Is that adjusted for inflation? A dollar in 1975 means something different today.
    I’m concerned about how well this answers your question, “what makes a good/bad movie”.
    Label your axes!!

    I think some interaction allowing browsing by genre (on genre-rollover or something) would be useful… *as mentioned in presentation

    Perhaps the personal lessons aren’t as useful to share (though still valuable)

    I would be interested in seeing what other data is available in the dataset.

    Glad you finally settled on a dataset. Good context on the interaction with the circles.

    How does this expand on Lee Byron’s work?
    I feel like there have been a number of visualizations comparing ratings to box office gross before.

    You probably don’t need to commit as much time as you do to the problems you had; I think you should put more time on the project itself.

    Maybe move the y-axis over so Jaws isn’t leaking out of the graph. There’s a lot more datapoints in the present than in the past, is that just because there’s less documentation on movies from the past?

    wish the projector was brighter and the processing sketch wasn’t larger than the projector resolution
    the colors are not contrasting enough
    could boil the 11 genres down to maybe 3-4
    good to see a Lessons Learned slide
    I’d like to see a more fine grained time line (months) to see trends

    This is pretty cool – there are a lot of different metrics being displayed here in a way that’s pretty easy to understand. I like how you can hover over the bubbles and see more information about the individual movies.

    I like how you have more ideas for interactions that you could add to the visualization. Director and studio information would be pretty cool – especially if it reveals a cool trend in recent years (Pixar’s movies are ALL in the top 100 I think)

    Month that it was released would be pretty cool

    cool visualization, I like the idea of seeing movies stack up against each other.

    fyeah, bubbles! Perhaps a focus on 1 type of movie would yield easier to spot conclusions.

    Comment by Golan Levin — 4 February 2011 @ 3:29 pm

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2019 Interactive Art & Computational Design / Spring 2011