Skyscrapercity Building Image Classification
I built a program that scrapes images of under-construction buildings from Skyscrapercity and uses machine learning to classify them as night/day images and skyline/individual images. It is my first personal machine learning project, and I’m amazed by how well it performs!
Skyscrapercity is an architecture and urban development forum where users post updates about construction projects in their communities. Although the forum is a goldmine of information for geeks like me, it is very poorly organized and presented. My goal in doing these classifications was to bring some order to this messy dataset.
In the end, I was able to successfully classify images as night/day and skyline/individual. The night/day classification was nearly flawless: almost every single image was classified correctly. Surprisingly, the skyline/individual classification was fairly accurate as well, and I was blown away by the program’s ability to handle an image classification problem that feels so complex and so human.
I used Beautiful Soup, a Python library for parsing HTML, to scrape large datasets of images from the site. Then, I used Pillow (a fork of PIL, the Python Imaging Library) to extract the images’ features, notably the mean and standard deviation of the RGB pixel values. I created a GUI to label a sample of images, which became my training data. Finally, I trained an SVM classifier from the scikit-learn machine learning library on the training data and used it to classify new images.
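To make the pipeline concrete, here is a minimal sketch of the scraping, feature extraction, and training steps. It is an illustration under my own assumptions, not the project's actual code: the function names (image_urls, load_pixels, rgb_features, train_classifier) are hypothetical, the scraper simply collects every img tag rather than following the forum's real page structure, the 64x64 downscaling is only there to keep the statistics cheap, and the RBF kernel with feature scaling is one reasonable SVM setup among several.

```python
from statistics import fmean, pstdev

# Third-party libraries (bs4, Pillow, scikit-learn) are imported inside the
# functions that need them, so the feature math below can run on its own.


def image_urls(html):
    """Collect the src attribute of every <img> tag in a page (hypothetical helper)."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]


def load_pixels(path, max_size=(64, 64)):
    """Open an image with Pillow and return its pixels as (r, g, b) tuples."""
    from PIL import Image

    with Image.open(path) as img:
        rgb = img.convert("RGB")
        # Assumption: shrink the image first so the statistics stay cheap.
        rgb.thumbnail(max_size)
        raw = rgb.tobytes()  # flat bytes: r, g, b, r, g, b, ...
    return list(zip(raw[0::3], raw[1::3], raw[2::3]))


def rgb_features(pixels):
    """Return [mean_r, mean_g, mean_b, std_r, std_g, std_b] for a list of pixels."""
    channels = list(zip(*pixels))  # three tuples: all reds, all greens, all blues
    return [fmean(c) for c in channels] + [pstdev(c) for c in channels]


def train_classifier(features, labels):
    """Fit an SVM on hand-labeled feature vectors (kernel choice is an assumption)."""
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Scaling first keeps the means and standard deviations on comparable ranges.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(features, labels)
    return model
```

With a labeled sample in hand, each image becomes one six-number feature vector via rgb_features(load_pixels(path)), the list of vectors and their labels (for example "day" and "night") goes to train_classifier, and the returned model's predict method classifies new images. A separate model would be trained the same way for the skyline/individual labels.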