03 Mar 2015

Summary: Mapping a collection of memories of Pleasanton posted on the Facebook group “You Know You’re from Pleasanton if…”.

Explanation: For my visualization project I wanted to map the memories posted in the Facebook group “You Know You’re from Pleasanton if…”, where people from my hometown reminisce about the places that used to be there, the people they knew, the things they did, and so on. I found this especially fascinating because I grew up right as Pleasanton was beginning to lose the “old”, “authentic” feeling that many of the people on this feed are reminiscing about. It’s interesting to think about which places were significant and why, and to imagine the discrepancies if I were to overlay my own memories on this map. I scraped the Facebook feed for the group and was able to plot a few hundred memories. Each pin on the map below represents a cluster of overlapping memories that happened in the same spot.

When you hover over a pin you see what memory is associated with it. Here are some examples:

Here’s another version with circles and no pins. The circle color represents the number of memories posted about that place.



Step 1: Get the entire feed using the Facebook Graph API.

The feed comes back as JSON and was pretty easy to use; it contains an array of objects representing messages, each with information such as the date created, the author, likes, the message content and the corresponding comments. The parts I focused on were the messages (“Does anyone remember sliding down hills in the late 70’s with…etc.”) and the comments (where people respond with different anecdotes). These messages are really rich in spatial (“at the ice cream shop on main”) and temporal (“in the 60’s”) information. I focused on the spatial information for this project.
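As a rough sketch of this step, here is one way to pull the message and comment text out of a saved feed. The field names (`data`, `message`, `comments`) follow the general shape of a Graph API feed response as I understand it, and the sample comment text is made up for illustration, so treat both as assumptions rather than the exact data I worked with.

```python
import json

def extract_texts(feed):
    """Return (kind, text) pairs for every post message and comment in a feed dict."""
    texts = []
    for post in feed.get("data", []):
        message = post.get("message")
        if message:
            texts.append(("message", message))
        # Comments are assumed to be nested under each post as {"data": [...]}.
        for comment in post.get("comments", {}).get("data", []):
            text = comment.get("message")
            if text:
                texts.append(("comment", text))
    return texts

# Made-up sample in the assumed shape of a feed response.
sample = json.loads("""
{
  "data": [
    {
      "id": "1",
      "created_time": "2015-01-01T00:00:00+0000",
      "message": "Does anyone remember sliding down hills in the late 70's?",
      "comments": {"data": [{"message": "Yes, right by the ice cream shop on main"}]}
    },
    {"id": "2", "created_time": "2015-01-02T00:00:00+0000"}
  ]
}
""")

texts = extract_texts(sample)
for kind, text in texts:
    print(kind, "|", text)
```

Posts without a `message` (photo-only posts, for example) are skipped, which is why the second sample entry produces nothing.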

Step 2: Build a dictionary of place names (and alternate names).

The main challenge was how to match messages (strings with multiple sentences and informal references to places) with a geographic location. I first ran a part-of-speech tagger over all the words in the messages and pulled out the nouns to see what turned up. This was only useful as a first look: it showed that many place names would not be caught by taking just the proper nouns, or even all nouns. For example, ‘main’ in ‘main st’ would be tagged as an adjective. Also, most place names are phrases, not individual words. So I put together a list of places from different sources, like old lists of Pleasanton businesses, roads and parks, plus my own knowledge of the place. Then for each place I used a geocoder to translate the place name into longitude and latitude coordinates. Sometimes just appending the city, state and zip code to a place name would work as an input, but often I’d have to search for it manually (especially if the place isn’t actually there anymore).
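To make the idea of a place dictionary concrete, here is a minimal sketch. The structure, the function names (`build_alias_index`, `geocoder_query`) and the example entries are illustrative assumptions, not the actual list I built, and the geocoder call itself is left out since it depends on whichever service you use.

```python
# Hypothetical place dictionary: canonical name -> list of alternate names.
PLACES = {
    "Rod's Hickory Pit": ["rods", "the hickory pit"],
    "Main Street": ["main st", "main"],
}

def build_alias_index(places):
    """Map every lowercase name (canonical or alternate) to its canonical place."""
    index = {}
    for canonical, aliases in places.items():
        for name in [canonical] + aliases:
            index[name.lower()] = canonical
    return index

def geocoder_query(place_name, city="Pleasanton", state="CA", zipcode=None):
    """Build the text handed to a geocoder by appending city, state and zip code."""
    parts = [place_name, city, state]
    query = ", ".join(parts)
    if zipcode:
        query += " " + zipcode
    return query

alias_index = build_alias_index(PLACES)
queries = [geocoder_query(name) for name in PLACES]
print(alias_index["the hickory pit"])
print(queries)
```

Keeping the alternate names next to the canonical one means the coordinates only need to be looked up once per place, however many ways people refer to it.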

Step 3: Matching messages/comments to place names.

Instead of trying to get places by first breaking down the messages into individual words (and identifying certain parts of speech etc.), I looked for phrases contained in the strings that approximately matched place names. I wanted to account for spelling errors and slight differences in the way people talk about places (e.g. Rod’s Hickory Pit might be referred to as just rods, or the hickory pit), so rather than searching for exact matches I “fuzzily searched” for phrases that were within some range of edit distance from the place names. There is a library that lets you do this fuzzy search and control aspects like the threshold. It worked better than I had thought it would, but it is only a step in the right direction and could definitely be improved. A phrase like “pleasanton middle school”, for example, has words that match many different places and might get matched to a different middle school just because of the overlap in terms. Once I had latitude and longitude information, I created a new CSV by pairing each message with its coordinates (and other information). To plot these more easily I converted this to GeoJSON format.
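Here is a small sketch of the matching and conversion steps. It is not the library I used: it swaps in `difflib.SequenceMatcher` from the Python standard library, which scores similarity as a ratio between 0 and 1 rather than as an edit distance. The alias table, the 0.85 threshold and the coordinates are placeholder assumptions for illustration only.

```python
import json
import re
from difflib import SequenceMatcher

# Hypothetical alias table: lowercase name -> canonical place name.
ALIASES = {
    "rod's hickory pit": "Rod's Hickory Pit",
    "rods": "Rod's Hickory Pit",
    "the hickory pit": "Rod's Hickory Pit",
    "main st": "Main Street",
    "main street": "Main Street",
}

# Placeholder coordinates as (longitude, latitude); not the real location.
COORDS = {"Rod's Hickory Pit": (-121.87, 37.66)}

def word_ngrams(text, n):
    """All runs of n consecutive words in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def best_place(message, aliases, threshold=0.85):
    """Return the place whose alias best matches some phrase in message, or None."""
    best_name, best_score = None, threshold
    for alias, canonical in aliases.items():
        # Compare the alias only against phrases with the same number of words.
        for phrase in word_ngrams(message, len(alias.split())):
            score = SequenceMatcher(None, alias, phrase).ratio()
            if score >= best_score:
                best_name, best_score = canonical, score
    return best_name

def to_feature(message, place, lon, lat):
    """Build a GeoJSON Feature; GeoJSON orders coordinates as [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"place": place, "message": message},
    }

message = "We used to eat at the hikory pit every Friday"
place = best_place(message, ALIASES)
features = []
if place in COORDS:
    lon, lat = COORDS[place]
    features.append(to_feature(message, place, lon, lat))
collection = {"type": "FeatureCollection", "features": features}
print(json.dumps(collection, indent=2))
```

The misspelled “hikory” still lands on Rod’s Hickory Pit, while a message with no place-like phrase returns nothing. Lowering the threshold catches more variants but also makes the “pleasanton middle school” kind of mix-up more likely.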