I wanted to visualize one of the museum image databases. I initially chose the Carnegie Museum of Art, but after going through several museum datasets and websites, I found Cooper Hewitt’s exhibits significantly more interesting and chose to work with them.

Cooper Hewitt has a well-documented API (Cooper Hewitt API) that provides access to the museum’s entire collection along with useful metadata. I wrote image scrapers in Python to download image sets by exhibition, object type, and tag, as well as random samples. To get an idea of the museum’s offerings, I ran t-SNE on a set of 500 random images and on a set of 500 images from a recently added Pixar exhibition. The output from the Pixar images is shown below.

t-SNE output for a Pixar exhibition
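
For reference, here is a minimal sketch of the t-SNE step, assuming scikit-learn and NumPy. It is not the exact code I ran: the function name `embed_2d`, the perplexity default, and the rescaling to the unit square are illustrative choices, and it assumes each image has already been turned into a fixed-length feature vector (for example, flattened downsampled pixels).

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(features, perplexity=30.0, seed=0):
    """Project an (n_images, n_features) array to 2-D with t-SNE.

    `features` is assumed to hold one fixed-length vector per image;
    how those vectors are produced is left to the caller.
    """
    features = np.asarray(features, dtype=np.float64)
    if features.ndim != 2:
        raise ValueError("features must be a 2-D array")

    n_samples = features.shape[0]
    # scikit-learn requires perplexity < n_samples, so cap it for small sets.
    perplexity = min(perplexity, max(1.0, (n_samples - 1) / 3.0))

    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="pca",
        random_state=seed,
    )
    coords = tsne.fit_transform(features)

    # Rescale each axis to [0, 1] so the points can be placed on a canvas.
    coords = coords - coords.min(axis=0)
    span = coords.max(axis=0)
    span[span == 0] = 1.0
    return coords / span
```

Each row of the result gives a position for one image, so thumbnails can be pasted onto a canvas at those coordinates to produce a layout like the one above.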

After browsing more collections, I narrowed my project down to images of posters. Cooper Hewitt has just over 3000 posters. I am currently running a scraper that downloads them by using the API to search by object type. In the meantime, I ran t-SNE on a smaller set of 172 posters from just one exhibition. The output and a few sample posters are shown below. I plan to explore the entire set of posters tonight and narrow my scope to a subset by tomorrow.

t-SNE output for a small poster exhibition
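
The poster scraper has to walk through many pages of search results, so the paging loop is worth sketching. This is a rough outline, not the actual scraper: `collect_objects`, `fetch_page`, and the `"objects"` and `"pages"` keys are hypothetical names chosen for illustration and are not taken from the Cooper Hewitt API documentation. The real request and response fields would need to be checked against the API docs.

```python
def collect_objects(fetch_page, max_pages=None):
    """Gather every record from a paginated search.

    `fetch_page(page_number)` is a caller-supplied function (hypothetical
    here) that performs one search request and returns a dict with:
      "objects": the list of records on that page
      "pages":   the total number of result pages
    """
    records = []
    page = 1
    while True:
        response = fetch_page(page)
        records.extend(response.get("objects", []))

        # If the page count is missing, treat the current page as the last.
        total_pages = int(response.get("pages", page))
        reached_end = page >= total_pages
        reached_cap = max_pages is not None and page >= max_pages
        if reached_end or reached_cap:
            break
        page += 1
    return records
```

Passing the request logic in as a function keeps the loop independent of any particular HTTP library and makes it easy to try out with a stub before pointing it at the live API. The `max_pages` cap is useful for pulling a small sample first.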