Category Archives: 12-datascraping

mmontenegro

27 Jan 2015

For my data scraping assignment, I first collected the final results of two women’s tennis tournaments: the US Open and the Australian Open. With this data in hand, I ran a Twitter query for each match, trying to grab all the tweets from the day of the match that mentioned both the winner and the loser.

I want to find out which player in each match was more popular and which words were most associated with that player on the day of the match. I had some issues using Temboo to extract my data, so I had to use the native Twitter API instead. I will try Temboo again to collect more data.
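
As a rough sketch of the query step (not the code actually used for this post), each result line could be turned into a search string that asks for both surnames on the day of the match. The class name `MatchQuery` and its methods are made up for illustration, and the `since:` and `until:` operators are assumed to behave as in Twitter's standard search, with `until:` treated as exclusive.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns one result line into a Twitter search query string.
class MatchQuery {
    // Matches lines such as "Winner: Williams S. Loser: Wozniacki C. Date: 07/09/14"
    private static final Pattern LINE =
        Pattern.compile("Winner: (.+?) Loser: (.+?) Date: (\\d{2}/\\d{2}/\\d{2})");

    // The result lists use day/month/two-digit-year dates
    private static final DateTimeFormatter DAY_MONTH_YEAR =
        DateTimeFormatter.ofPattern("dd/MM/yy");

    // Drops the trailing initial: "Date Krumm K." becomes "Date Krumm"
    static String surname(String name) {
        int lastSpace = name.lastIndexOf(' ');
        return lastSpace < 0 ? name : name.substring(0, lastSpace);
    }

    // Builds a query for tweets naming both players on the match day only
    static String build(String resultLine) {
        Matcher m = LINE.matcher(resultLine.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized result line: " + resultLine);
        }
        LocalDate day = LocalDate.parse(m.group(3), DAY_MONTH_YEAR);
        return surname(m.group(1)) + " " + surname(m.group(2))
            + " since:" + day + " until:" + day.plusDays(1);
    }

    public static void main(String[] args) {
        // prints "Williams Wozniacki since:2014-09-07 until:2014-09-08"
        System.out.println(build("Winner: Williams S. Loser: Wozniacki C. Date: 07/09/14"));
    }
}
```

Searching by surname alone is a simplification: players who share a surname (the two Williams sisters, or the two Pliskovas) would need the initial or a Twitter handle to tell them apart.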

Tournament data:
US Open:
Winner: Cepelova J. Loser: Torro Flor M.T. Date: 25/08/14
Winner: Radwanska A. Loser: Fichman S. Date: 25/08/14
Winner: Kerber A. Loser: Pervak K. Date: 25/08/14
Winner: Petkovic A. Loser: Jabeur O. Date: 25/08/14
Winner: Nara K. Loser: Wozniak A. Date: 25/08/14
Winner: Bencic B. Loser: Wickmayer Y. Date: 25/08/14
Winner: Safarova L. Loser: Babos T. Date: 25/08/14
Winner: Zheng S. Loser: Voegele S. Date: 25/08/14
Winner: Halep S. Loser: Collins D.R. Date: 25/08/14
Winner: Peng S. Loser: Zheng J. Date: 25/08/14
Winner: Puig M. Loser: Smitkova T. Date: 25/08/14
Winner: Hantuchova D. Loser: Oprandi R. Date: 25/08/14
Winner: Kudryavtseva A. Loser: Duan Y.Y. Date: 25/08/14
Winner: Pironkova T. Loser: Knapp K. Date: 25/08/14
Winner: Williams V. Loser: Date Krumm K. Date: 25/08/14
Winner: Jankovic J. Loser: Jovanovski B. Date: 25/08/14
Winner: Larsson J. Loser: Razzano V. Date: 25/08/14
Winner: Rodionova An. Loser: Giorgi C. Date: 25/08/14
Winner: Bacsinszky T. Loser: Bertens K. Date: 25/08/14
Winner: Peer S. Loser: Konta J. Date: 25/08/14
Winner: Stephens S. Loser: Beck A. Date: 25/08/14
Winner: Sasnovich A. Loser: Schmiedlova A. Date: 25/08/14
Winner: Cornet A. Loser: Hesse A. Date: 25/08/14
Winner: Lucic M. Loser: Muguruza G. Date: 25/08/14
Winner: Vinci R. Loser: Ormaechea P. Date: 25/08/14
Winner: Dulgheru A. Loser: Pliskova Kr. Date: 25/08/14
Winner: Lisicki S. Loser: Abanda F. Date: 25/08/14
Winner: Brengle M. Loser: Glushko J. Date: 25/08/14
Winner: Begu I. Loser: Soler Espinosa S. Date: 25/08/14
Winner: Errani S. Loser: Flipkens K. Date: 26/08/14
Winner: Wozniacki C. Loser: Rybarikova M. Date: 26/08/14
Winner: Sharapova M. Loser: Kirilenko M. Date: 26/08/14
Winner: Barthel M. Loser: Zhang S. Date: 26/08/14
Winner: Cirstea S. Loser: Watson H. Date: 26/08/14
Winner: Dellacqua C. Loser: Mayr P. Date: 26/08/14
Winner: Wang Q. Loser: Kania P. Date: 26/08/14
Winner: Erakovic M. Loser: Kuznetsova S. Date: 26/08/14
Winner: Lepchenko V. Loser: Van Uytvanck A. Date: 26/08/14
Winner: Kanepi K. Loser: Parmentier P. Date: 26/08/14
Winner: Pliskova Ka. Loser: Meusburger Y. Date: 26/08/14
Winner: Stosur S. Loser: Davis L. Date: 26/08/14
Winner: Ivanovic A. Loser: Riske A. Date: 26/08/14
Winner: Vesnina E. Loser: Chan Y.J. Date: 26/08/14
Winner: Rogers S. Loser: Zanevska M. Date: 26/08/14
Winner: Zahlavova Strycova B. Loser: Barty A. Date: 26/08/14
Winner: Pennetta F. Loser: Goerges J. Date: 26/08/14
Winner: Hercog P. Loser: Svitolina E. Date: 26/08/14
Winner: Bouchard E. Loser: Govortsova O. Date: 26/08/14
Winner: Niculescu M. Loser: Shvedova Y. Date: 26/08/14
Winner: Mchale C. Loser: Scheepers C. Date: 26/08/14
Winner: Azarenka V. Loser: Doi M. Date: 26/08/14
Winner: Bellis C. Loser: Cibulkova D. Date: 26/08/14
Winner: Kvitova P. Loser: Mladenovic K. Date: 26/08/14
Winner: Cetkovska P. Loser: Koukalova K. Date: 26/08/14
Winner: Vandeweghe C. Loser: Vekic D. Date: 26/08/14
Winner: Diyas Z. Loser: Tsurenko L. Date: 26/08/14
Winner: Makarova E. Loser: Min G. Date: 26/08/14
Winner: King V. Loser: Schiavone F. Date: 26/08/14
Winner: Suarez Navarro C. Loser: Tomljanovic A. Date: 26/08/14
Winner: Gibbs N. Loser: Garcia C. Date: 26/08/14
Winner: Krunic A. Loser: Piter K. Date: 26/08/14
Winner: Keys M. Loser: Gajdosova J. Date: 26/08/14
Winner: Pavlyuchenkova A. Loser: Pereira T. Date: 27/08/14
Winner: Williams S. Loser: Townsend T. Date: 27/08/14
Winner: Peng S. Loser: Radwanska A. Date: 27/08/14
Winner: Bencic B. Loser: Nara K. Date: 27/08/14
Winner: Cornet A. Loser: Hantuchova D. Date: 27/08/14
Winner: Vinci R. Loser: Begu I. Date: 27/08/14
Winner: Larsson J. Loser: Stephens S. Date: 27/08/14
Winner: Kerber A. Loser: Kudryavtseva A. Date: 27/08/14
Winner: Safarova L. Loser: Zheng S. Date: 27/08/14
Winner: Halep S. Loser: Cepelova J. Date: 27/08/14
Winner: Jankovic J. Loser: Pironkova T. Date: 27/08/14
Winner: Petkovic A. Loser: Puig M. Date: 27/08/14
Winner: Sharapova M. Loser: Dulgheru A. Date: 27/08/14
Winner: Wozniacki C. Loser: Sasnovich A. Date: 27/08/14
Winner: Lucic M. Loser: Peer S. Date: 27/08/14
Winner: Lisicki S. Loser: Brengle M. Date: 27/08/14
Winner: Williams V. Loser: Bacsinszky T. Date: 28/08/14
Winner: Errani S. Loser: Rodionova An. Date: 28/08/14
Winner: Dellacqua C. Loser: Wang Q. Date: 28/08/14
Winner: Pliskova Ka. Loser: Ivanovic A. Date: 28/08/14
Winner: Pennetta F. Loser: Rogers S. Date: 28/08/14
Winner: Azarenka V. Loser: Mchale C. Date: 28/08/14
Winner: Vesnina E. Loser: Erakovic M. Date: 28/08/14
Winner: Zahlavova Strycova B. Loser: Niculescu M. Date: 28/08/14
Winner: Gibbs N. Loser: Pavlyuchenkova A. Date: 28/08/14
Winner: Lepchenko V. Loser: Barthel M. Date: 28/08/14
Winner: Williams S. Loser: King V. Date: 28/08/14
Winner: Krunic A. Loser: Keys M. Date: 28/08/14
Winner: Kvitova P. Loser: Cetkovska P. Date: 28/08/14
Winner: Suarez Navarro C. Loser: Vandeweghe C. Date: 28/08/14
Winner: Kanepi K. Loser: Stosur S. Date: 28/08/14
Winner: Makarova E. Loser: Hercog P. Date: 28/08/14
Winner: Diyas Z. Loser: Bellis C. Date: 29/08/14
Winner: Bouchard E. Loser: Cirstea S. Date: 29/08/14
Winner: Peng S. Loser: Vinci R. Date: 29/08/14
Winner: Jankovic J. Loser: Larsson J. Date: 29/08/14
Winner: Errani S. Loser: Williams V. Date: 29/08/14
Winner: Lucic M. Loser: Halep S. Date: 29/08/14
Winner: Bencic B. Loser: Kerber A. Date: 29/08/14
Winner: Safarova L. Loser: Cornet A. Date: 29/08/14
Winner: Wozniacki C. Loser: Petkovic A. Date: 29/08/14
Winner: Sharapova M. Loser: Lisicki S. Date: 30/08/14
Winner: Krunic A. Loser: Kvitova P. Date: 30/08/14
Winner: Pennetta F. Loser: Gibbs N. Date: 30/08/14
Winner: Dellacqua C. Loser: Pliskova Ka. Date: 30/08/14
Winner: Azarenka V. Loser: Vesnina E. Date: 30/08/14
Winner: Williams S. Loser: Lepchenko V. Date: 30/08/14
Winner: Kanepi K. Loser: Suarez Navarro C. Date: 30/08/14
Winner: Makarova E. Loser: Diyas Z. Date: 30/08/14
Winner: Bouchard E. Loser: Zahlavova Strycova B. Date: 31/08/14
Winner: Errani S. Loser: Lucic M. Date: 31/08/14
Winner: Wozniacki C. Loser: Sharapova M. Date: 31/08/14
Winner: Peng S. Loser: Safarova L. Date: 01/09/14
Winner: Bencic B. Loser: Jankovic J. Date: 01/09/14
Winner: Pennetta F. Loser: Dellacqua C. Date: 01/09/14
Winner: Williams S. Loser: Kanepi K. Date: 01/09/14
Winner: Makarova E. Loser: Bouchard E. Date: 01/09/14
Winner: Azarenka V. Loser: Krunic A. Date: 02/09/14
Winner: Peng S. Loser: Bencic B. Date: 02/09/14
Winner: Wozniacki C. Loser: Errani S. Date: 03/09/14
Winner: Makarova E. Loser: Azarenka V. Date: 03/09/14
Winner: Williams S. Loser: Pennetta F. Date: 04/09/14
Winner: Wozniacki C. Loser: Peng S. Date: 05/09/14
Winner: Williams S. Loser: Makarova E. Date: 05/09/14
Winner: Williams S. Loser: Wozniacki C. Date: 07/09/14

Australian Open:

Winner: Bencic B. Loser: Date Krumm K. Date: 13/01/14
Winner: Kudryavtseva A. Loser: Garcia C. Date: 13/01/14
Winner: Kerber A. Loser: Gajdosova J. Date: 13/01/14
Winner: Pliskova Ka. Loser: Parmentier P. Date: 13/01/14
Winner: Hantuchova D. Loser: Watson H. Date: 13/01/14
Winner: Makarova E. Loser: Williams V. Date: 13/01/14
Winner: Falconi I. Loser: Medina Garrigues A. Date: 13/01/14
Winner: Puig M. Loser: Tatishvili A. Date: 13/01/14
Winner: Flipkens K. Loser: Robson L. Date: 13/01/14
Winner: Dellacqua C. Loser: Zvonareva V. Date: 13/01/14
Winner: Li N. Loser: Konjuh A. Date: 13/01/14
Winner: Razzano V. Loser: Van Uytvanck A. Date: 13/01/14
Winner: Pennetta F. Loser: Cadantu A. Date: 13/01/14
Winner: Hradecka L. Loser: Vekic D. Date: 13/01/14
Winner: Ivanovic A. Loser: Bertens K. Date: 13/01/14
Winner: Bouchard E. Loser: Tang H.C. Date: 13/01/14
Winner: Keys M. Loser: Mayr P. Date: 13/01/14
Winner: Beck A. Loser: Martic P. Date: 13/01/14
Winner: Stosur S. Loser: Zakopalova K. Date: 13/01/14
Winner: Goerges J. Loser: Errani S. Date: 13/01/14
Winner: Safarova L. Loser: Glushko J. Date: 13/01/14
Winner: Barthel M. Loser: Zhang S. Date: 13/01/14
Winner: Dolonc V. Loser: Arruabarrena-Vecino L. Date: 13/01/14
Winner: Zheng J. Loser: Vinci R. Date: 13/01/14
Winner: Niculescu M. Loser: Peer S. Date: 13/01/14
Winner: Kumkhum L. Loser: Kvitova P. Date: 13/01/14
Winner: Riske A. Loser: Vesnina E. Date: 13/01/14
Winner: Lisicki S. Loser: Lucic M. Date: 13/01/14
Winner: Wickmayer Y. Loser: Pfizenmaier D. Date: 13/01/14
Winner: Pironkova T. Loser: Soler Espinosa S. Date: 13/01/14
Winner: Davis L. Loser: Vickery S. Date: 13/01/14
Winner: Williams S. Loser: Barty A. Date: 13/01/14
Winner: Cornet A. Loser: Hercog P. Date: 14/01/14
Winner: Wozniacki C. Loser: Dominguez Lino L. Date: 14/01/14
Winner: Zahlavova Strycova B. Loser: Hsieh S.W. Date: 14/01/14
Winner: Azarenka V. Loser: Larsson J. Date: 14/01/14
Winner: Mchale C. Loser: Chan Y.J. Date: 14/01/14
Winner: Halep S. Loser: Piter K. Date: 14/01/14
Winner: Suarez Navarro C. Loser: King V. Date: 14/01/14
Winner: Voskoboeva G. Loser: Begu I. Date: 14/01/14
Winner: Cibulkova D. Loser: Schiavone F. Date: 14/01/14
Winner: Giorgi C. Loser: Sanders S. Date: 14/01/14
Winner: Diyas Z. Loser: Siniakova K. Date: 14/01/14
Winner: Voegele S. Loser: Mladenovic K. Date: 14/01/14
Winner: Lepchenko V. Loser: Tsurenko L. Date: 14/01/14
Winner: Nara K. Loser: Peng S. Date: 14/01/14
Winner: Pavlyuchenkova A. Loser: Pereira T. Date: 14/01/14
Winner: Minella M. Loser: Witthoeft C. Date: 14/01/14
Winner: Jankovic J. Loser: Doi M. Date: 14/01/14
Winner: Radwanska A. Loser: Putintseva Y. Date: 14/01/14
Winner: Govortsova O. Loser: Duan Y.Y. Date: 14/01/14
Winner: Muguruza G. Loser: Kanepi K. Date: 14/01/14
Winner: Schmiedlova A. Loser: Babos T. Date: 14/01/14
Winner: Rogowska O. Loser: Duque Marino M. Date: 14/01/14
Winner: Morita A. Loser: Kichenok N. Date: 14/01/14
Winner: Tomljanovic A. Loser: Majeric T. Date: 14/01/14
Winner: Erakovic M. Loser: Cirstea S. Date: 14/01/14
Winner: Knapp K. Loser: Ormaechea P. Date: 14/01/14
Winner: Rybarikova M. Loser: Petkovic A. Date: 14/01/14
Winner: Stephens S. Loser: Shvedova Y. Date: 14/01/14
Winner: Jovanovski B. Loser: Cepelova J. Date: 14/01/14
Winner: Svitolina E. Loser: Kuznetsova S. Date: 14/01/14
Winner: Meusburger Y. Loser: Scheepers C. Date: 14/01/14
Winner: Sharapova M. Loser: Mattek-Sands B. Date: 14/01/14
Winner: Barthel M. Loser: Kumkhum L. Date: 15/01/14
Winner: Safarova L. Loser: Hradecka L. Date: 15/01/14
Winner: Li N. Loser: Bencic B. Date: 15/01/14
Winner: Makarova E. Loser: Falconi I. Date: 15/01/14
Winner: Niculescu M. Loser: Lisicki S. Date: 15/01/14
Winner: Williams S. Loser: Dolonc V. Date: 15/01/14
Winner: Pennetta F. Loser: Puig M. Date: 15/01/14
Winner: Hantuchova D. Loser: Pliskova Ka. Date: 15/01/14
Winner: Kerber A. Loser: Kudryavtseva A. Date: 15/01/14
Winner: Dellacqua C. Loser: Flipkens K. Date: 15/01/14
Winner: Riske A. Loser: Wickmayer Y. Date: 15/01/14
Winner: Zheng J. Loser: Keys M. Date: 15/01/14
Winner: Bouchard E. Loser: Razzano V. Date: 15/01/14
Winner: Stosur S. Loser: Pironkova T. Date: 15/01/14
Winner: Davis L. Loser: Goerges J. Date: 15/01/14
Winner: Ivanovic A. Loser: Beck A. Date: 15/01/14
Winner: Diyas Z. Loser: Erakovic M. Date: 16/01/14
Winner: Sharapova M. Loser: Knapp K. Date: 16/01/14
Winner: Cornet A. Loser: Giorgi C. Date: 16/01/14
Winner: Pavlyuchenkova A. Loser: Minella M. Date: 16/01/14
Winner: Svitolina E. Loser: Rogowska O. Date: 16/01/14
Winner: Suarez Navarro C. Loser: Voskoboeva G. Date: 16/01/14
Winner: Halep S. Loser: Lepchenko V. Date: 16/01/14
Winner: Cibulkova D. Loser: Voegele S. Date: 16/01/14
Winner: Muguruza G. Loser: Schmiedlova A. Date: 16/01/14
Winner: Wozniacki C. Loser: Mchale C. Date: 16/01/14
Winner: Radwanska A. Loser: Govortsova O. Date: 16/01/14
Winner: Stephens S. Loser: Tomljanovic A. Date: 16/01/14
Winner: Azarenka V. Loser: Zahlavova Strycova B. Date: 16/01/14
Winner: Jankovic J. Loser: Morita A. Date: 16/01/14
Winner: Nara K. Loser: Rybarikova M. Date: 16/01/14
Winner: Meusburger Y. Loser: Jovanovski B. Date: 16/01/14
Winner: Kerber A. Loser: Riske A. Date: 17/01/14
Winner: Pennetta F. Loser: Barthel M. Date: 17/01/14
Winner: Williams S. Loser: Hantuchova D. Date: 17/01/14
Winner: Li N. Loser: Safarova L. Date: 17/01/14
Winner: Makarova E. Loser: Niculescu M. Date: 17/01/14
Winner: Bouchard E. Loser: Davis L. Date: 17/01/14
Winner: Dellacqua C. Loser: Zheng J. Date: 17/01/14
Winner: Ivanovic A. Loser: Stosur S. Date: 17/01/14
Winner: Sharapova M. Loser: Cornet A. Date: 18/01/14
Winner: Halep S. Loser: Diyas Z. Date: 18/01/14
Winner: Jankovic J. Loser: Nara K. Date: 18/01/14
Winner: Cibulkova D. Loser: Suarez Navarro C. Date: 18/01/14
Winner: Stephens S. Loser: Svitolina E. Date: 18/01/14
Winner: Radwanska A. Loser: Pavlyuchenkova A. Date: 18/01/14
Winner: Muguruza G. Loser: Wozniacki C. Date: 18/01/14
Winner: Azarenka V. Loser: Meusburger Y. Date: 18/01/14
Winner: Pennetta F. Loser: Kerber A. Date: 19/01/14
Winner: Li N. Loser: Makarova E. Date: 19/01/14
Winner: Ivanovic A. Loser: Williams S. Date: 19/01/14
Winner: Bouchard E. Loser: Dellacqua C. Date: 19/01/14
Winner: Cibulkova D. Loser: Sharapova M. Date: 20/01/14
Winner: Halep S. Loser: Jankovic J. Date: 20/01/14
Winner: Azarenka V. Loser: Stephens S. Date: 20/01/14
Winner: Radwanska A. Loser: Muguruza G. Date: 20/01/14
Winner: Li N. Loser: Pennetta F. Date: 21/01/14
Winner: Bouchard E. Loser: Ivanovic A. Date: 21/01/14
Winner: Cibulkova D. Loser: Halep S. Date: 22/01/14
Winner: Radwanska A. Loser: Azarenka V. Date: 22/01/14
Winner: Li N. Loser: Bouchard E. Date: 23/01/14
Winner: Cibulkova D. Loser: Radwanska A. Date: 23/01/14
Winner: Li N. Loser: Cibulkova D. Date: 25/01/14

Some of my results from Twitter are:

{
      "metadata": {
        "iso_language_code": "en",
        "result_type": "recent"
      },
      "created_at": "Mon Jan 26 21:42:51 +0000 2015",
      "id": 559828842575958000,
      "id_str": "559828842575958017",
      "text": "RT @usopen: We're watching #usopen champ @serenawilliams vs. @GarbiMuguruza at @AustralianOpen on @ESPN3 live stream. Who do you think wins…",
      "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
      "truncated": false,
      "in_reply_to_status_id": null,
      "in_reply_to_status_id_str": null,
      "in_reply_to_user_id": null,
      "in_reply_to_user_id_str": null,
      "in_reply_to_screen_name": null,
      "user": {
        "id": 1051485312,
        "id_str": "1051485312",
        "name": "✩ ᏧḕᎯn ᑬ ᖇᎯᕍᎯ ᖴ ® ✈",
        "screen_name": "jeanrada1980",
        "location": "La Guaira",
        "profile_location": null,
        "description": "ρυє∂σ α∂мιяαя мαѕ иσ яєи∂ιя ςυℓтσ α αℓgυιєи. ρяєfιєяσ мσℓєѕтαя ςσи νєя∂α∂єѕ α ςσмρℓαςєя ςσи α∂υℓαиςιαѕ. ρяαgмÁтιςσ мαѕ иσ fαиÁтιςo.",
        "url": null,
        "entities": {
          "description": {
            "urls": []
          }
        },
        "protected": false,
        "followers_count": 1190,
        "friends_count": 1079,
        "listed_count": 6,
        "created_at": "Tue Jan 01 00:38:10 +0000 2013",
        "favourites_count": 50,
        "utc_offset": -16200,
        "time_zone": "Caracas",
        "geo_enabled": false,
        "verified": false,
        "statuses_count": 47022,
        "lang": "es",
        "contributors_enabled": false,
        "is_translator": false,
        "is_translation_enabled": false,
        "profile_background_color": "9AE4E8",
        "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/559172977749540864/_1rVOp_U.jpeg",
        "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/559172977749540864/_1rVOp_U.jpeg",
        "profile_background_tile": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/556241533045723137/a8JJeWVm_normal.jpeg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/556241533045723137/a8JJeWVm_normal.jpeg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/1051485312/1422153246",
        "profile_link_color": "0084B4",
        "profile_sidebar_border_color": "FFFFFF",
        "profile_sidebar_fill_color": "DDFFCC",
        "profile_text_color": "333333",
        "profile_use_background_image": true,
        "default_profile": false,
        "default_profile_image": false,
        "following": false,
        "follow_request_sent": false,
        "notifications": false
      },
      "geo": null,
      "coordinates": null,
      "place": null,
      "contributors": null,
      "retweeted_status": {
        "metadata": {
          "iso_language_code": "en",
          "result_type": "recent"
        },
        "created_at": "Mon Jan 26 02:56:35 +0000 2015",
        "id": 559545407731007500,
        "id_str": "559545407731007488",
        "text": "We're watching #usopen champ @serenawilliams vs. @GarbiMuguruza at @AustralianOpen on @ESPN3 live stream. Who do you think wins? #AusOpen",
        "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
        "truncated": false,
        "in_reply_to_status_id": null,
        "in_reply_to_status_id_str": null,
        "in_reply_to_user_id": null,
        "in_reply_to_user_id_str": null,
        "in_reply_to_screen_name": null,
        "user": {
          "id": 14836197,
          "id_str": "14836197",
          "name": "US Open Tennis",
          "screen_name": "usopen",
          "location": "Flushing Meadows, New York",
          "profile_location": null,
          "description": "Official Twitter of the US Open Tennis Championships | 2015 Dates: 8/31 - 9/13 | Terms of Use: http://t.co/MW1WgvIAcE | #usopen",
          "url": "http://t.co/4Ilr5i7adf",
          "entities": {
            "url": {
              "urls": [
                {
                  "url": "http://t.co/4Ilr5i7adf",
                  "expanded_url": "http://www.usopen.org",
                  "display_url": "usopen.org",
                  "indices": [
                    0,
                    22
                  ]
                }
              ]
            },
            "description": {
              "urls": [
                {
                  "url": "http://t.co/MW1WgvIAcE",
                  "expanded_url": "http://bit.ly/1pfzqkQ",
                  "display_url": "bit.ly/1pfzqkQ",
                  "indices": [
                    95,
                    117
                  ]
                }
              ]
            }
          },
          "protected": false,
          "followers_count": 741514,
          "friends_count": 2317,
          "listed_count": 7242,
          "created_at": "Mon May 19 18:35:03 +0000 2008",
          "favourites_count": 8339,
          "utc_offset": -18000,
          "time_zone": "Eastern Time (US & Canada)",
          "geo_enabled": true,
          "verified": true,
          "statuses_count": 15576,
          "lang": "en",
          "contributors_enabled": false,
          "is_translator": false,
          "is_translation_enabled": false,
          "profile_background_color": "004B85",
          "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/509127361405517824/McLvjvZA.jpeg",
          "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/509127361405517824/McLvjvZA.jpeg",
          "profile_background_tile": false,
          "profile_image_url": "http://pbs.twimg.com/profile_images/552160166930419712/GUtMNRwi_normal.png",
          "profile_image_url_https": "https://pbs.twimg.com/profile_images/552160166930419712/GUtMNRwi_normal.png",
          "profile_banner_url": "https://pbs.twimg.com/profile_banners/14836197/1420475864",
          "profile_link_color": "0090FF",
          "profile_sidebar_border_color": "FFFFFF",
          "profile_sidebar_fill_color": "252429",
          "profile_text_color": "666666",
          "profile_use_background_image": true,
          "default_profile": false,
          "default_profile_image": false,
          "following": false,
          "follow_request_sent": false,
          "notifications": false
        },

jackkoo

26 Jan 2015

My project was to find words that are good friends with other words.
This is what I did.

1. Query YouTube for a word.
2. Collect the descriptions of the returned videos.
3. Find and rank the most common words in those descriptions.
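
The ranking in step 3 can be sketched with a hash map that counts each word once per occurrence. This is only an illustration under the assumption that the descriptions have already been joined into one cleaned-up string; `WordRanker` is a made-up name and not part of the sketch posted further down.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts words and ranks them by frequency.
class WordRanker {
    // Returns (word, count) entries sorted by count, highest first
    static List<Map.Entry<String, Integer>> rank(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by leading whitespace
            }
            counts.merge(word, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> ranked =
            new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        // Ties are broken alphabetically so the output order is stable
        ranked.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Same "count = word" layout as the sample results
        for (Map.Entry<String, Integer> e : rank("History of science history times")) {
            System.out.println(e.getValue() + " = " + e.getKey());
        }
    }
}
```

A map lookup keeps the counting step linear in the number of words, whereas scanning a list of scores for every word grows quadratically.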

Some searches were interesting: the most common word when searching for “Sexy” was “Pranks”. For the word “Interaction”, we see poetic words such as “thought”, “learning”, “moments” and “eyes”.

“History” returns words such as “times”, “interesting”, “humble” and “science”. However, it is also fun to see that BREAKFAST was a good friend of history.

I think I’d like to display this information as individual searches, although some charts, such as one of “narcissist” words (words that rank themselves highly in their own results), may be fun. I also thought about a “one-sided relationships” chart that would display words that return a certain word that does not return them back.
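
Both chart ideas reduce to simple set checks once every searched word has a set of its top-ranked friends. The sketch below reads "one-sided" as: A's results contain B, but B's results do not contain A. That reading, the name `WordRelations`, and the sample data in `main` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: relationship checks over a map of word -> top friend words.
class WordRelations {
    // A "narcissist" word appears among its own top friends
    static boolean isNarcissist(Map<String, Set<String>> friends, String word) {
        Set<String> own = friends.get(word);
        return own != null && own.contains(word);
    }

    // One-sided: a returns b, but b does not return a.
    // Returns false if either word was never searched, since nothing is known then.
    static boolean isOneSided(Map<String, Set<String>> friends, String a, String b) {
        Set<String> friendsOfA = friends.get(a);
        Set<String> friendsOfB = friends.get(b);
        if (friendsOfA == null || friendsOfB == null) {
            return false;
        }
        return friendsOfA.contains(b) && !friendsOfB.contains(a);
    }

    public static void main(String[] args) {
        // Invented sample data, for illustration only
        Map<String, Set<String>> friends = new HashMap<String, Set<String>>();
        friends.put("history", new HashSet<String>(Arrays.asList("times", "science", "breakfast")));
        friends.put("breakfast", new HashSet<String>(Arrays.asList("breakfast", "eggs")));

        System.out.println(isNarcissist(friends, "breakfast"));
        System.out.println(isOneSided(friends, "history", "breakfast"));
    }
}
```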

[Image: IMG_004522]

Here are some sample results:

“History”

56 = the
55 = full
53 = times
53 = interesting
50 = set
48 = with
47 = genghis
46 = franklin
45 = humble
43 = science
42 = breakfast
41 = created
40 = nova
40 = discovery
39 = everything
39 = loewen
38 = of
38 = republic
37 = this
36 = an
35 = about
33 = quarter
32 = population
32 = extraordinary
31 = from
27 = video
27 = watching
27 = lies
27 = american
27 = critically

“Sexy”

25 = pranks
25 = wronghaving
24 = ass
23 = oppaibreast
22 = for
21 = killer
21 = 2014
20 = 2015
19 = breast
19 = check
19 = basketball
18 = twitter
17 = with
16 = ball
16 = and
16 = comparisons
16 = upton
16 = ???
16 = shirt
16 = ?
14 = seduction
14 = getting
14 = gems
14 = ?
14 = it
14 = originality
12 = new

And last but not least here is my code:

// BFF's (Words)

import com.temboo.core.*;
import com.temboo.Library.YouTube.Search.*;
import java.util.Collections;
// Create a session using your Temboo account application details
TembooSession session = new TembooSession("asveron", "myFirstApp", "abf52e76ec6b4e0f84c076b531180b3a");

void setup()
{
 
 // Run the ListSearchResults Choreo function
 String allFreinds = findAllFreinds("cartoon");
 
 String[] freindCandidatesArray = allFreinds.split(" ", -1);
 ArrayList<String> freindCandidates = new ArrayList<String>();
 
 for (int i = 0; i < freindCandidatesArray.length; i++)
 {
 freindCandidates.add(freindCandidatesArray[i]);
 }
 
 ArrayList<Score> closeFreinds = new ArrayList<Score>();
 
 // Find close freinds
 for (int i = 0; i < freindCandidates.size(); i++)
 {
 String currentString = freindCandidates.get(i);
 boolean foundFreind = false;
 
 for (int j = 0; j < closeFreinds.size(); j++)
 {
 // Compare against the words that already have a score,
 // not against the raw candidate list
 Score s = closeFreinds.get(j);
 
 if (currentString.equals(s.name))
 {
 foundFreind = true;
 s.score += 1;
 break;
 }
 }
 
 if (foundFreind != true)
 {
 // The first occurrence of a word counts as 1
 closeFreinds.add(new Score(1, currentString));
 }
 }
 
 // Sort the data
 Collections.sort(closeFreinds, Collections.reverseOrder());
 
 // Print the data
 for (int i = 0; i < closeFreinds.size(); i++)
 {
 Score s = closeFreinds.get(i);
 int score = s.score;
 String word= s.name;
 println(score + " = " + word);
 }
 
 frameRate(1);
}

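void draw()
{
 
}

// Minimal Score class, assumed here because the original listing does not include it.
// It pairs a word with its count and compares by count so Collections.sort() can rank it.
class Score implements Comparable<Score>
{
 int score;
 String name;
 
 Score(int score, String name)
 {
 this.score = score;
 this.name = name;
 }
 
 public int compareTo(Score other)
 {
 return score - other.score;
 }
}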


String findAllFreinds(String searchWord)
{
 // Create the Choreo object using your Temboo session
 ListSearchResults listSearchResultsChoreo = new ListSearchResults(session);

 // Set inputs
 listSearchResultsChoreo.setFields("items/snippet/description");
 listSearchResultsChoreo.setQuery(searchWord);
 listSearchResultsChoreo.setMaxResults("50");
 
 // Run the Choreo and store the results
 ListSearchResultsResultSet listSearchResultsResults = listSearchResultsChoreo.run();
 
 // Get description from Temboo query
 String descriptions = listSearchResultsResults.getResponse();
 
 // Change result to list of words
 String result = descriptionsToFreinds(descriptions);
 
 // Return the list of words, aka the freinds
 return result;
}

String descriptionsToFreinds(String descriptions)
{
 ArrayList<String> removeStrings = new ArrayList<String>();
 
 removeStrings.add("[");
 removeStrings.add("]");
 removeStrings.add("{");
 removeStrings.add("}");
 removeStrings.add("(");
 removeStrings.add(")");
 removeStrings.add(":");
 removeStrings.add(",");
 removeStrings.add(".");
 removeStrings.add("/");
 removeStrings.add("\"");
 removeStrings.add("?");
 removeStrings.add("-");
 
 removeStrings.add("items");
 removeStrings.add("snippet");
 removeStrings.add("description");
 
 // remove json stuff & symbols
 for (int i = 0; i < removeStrings.size(); i++)
 {
 descriptions = descriptions.replace(removeStrings.get(i), "");
 }
 
 // Collapse extra spaces, trim the first and last space, change everything to lower case
 descriptions = descriptions.replaceAll("\\s+", " ");
 descriptions = descriptions.trim();
 descriptions = descriptions.toLowerCase();
 
 // Now we have freinds
 return descriptions;
}
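
One side effect of the clean-up above is that `replace("items", "")` and its siblings also delete those letters wherever they occur in a real description. A possible alternative, sketched here in plain Java, is to pull out only the values that follow each "description" key. The class name `DescriptionExtractor` is invented, the response is assumed to be shaped like `{"items":[{"snippet":{"description":"..."}}]}`, and a proper JSON parser would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts description values without touching their contents.
class DescriptionExtractor {
    // Captures the string value after a "description" key, allowing escaped characters
    private static final Pattern DESCRIPTION =
        Pattern.compile("\"description\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the raw (still JSON-escaped) description strings in order of appearance
    static List<String> extract(String json) {
        List<String> descriptions = new ArrayList<String>();
        Matcher m = DESCRIPTION.matcher(json);
        while (m.find()) {
            descriptions.add(m.group(1));
        }
        return descriptions;
    }

    public static void main(String[] args) {
        String json = "{\"items\":[{\"snippet\":{\"description\":\"A history of items\"}}]}";
        // The word "items" inside the description survives
        System.out.println(extract(json));
    }
}
```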

dave

26 Jan 2015

I scraped data about the top 100 chain restaurants by sales, covering each of their many locations throughout the US. The list of the top 100 restaurants came from http://nrn.com/us-top-100/top-100-chains-us-sales. The data is loaded from Factual through Temboo. I tried to use CorpWatch first, but their API had problems answering my queries, so I switched to something that would produce a massive amount of data and give me more choices to work with. I learned to drop my ideas when APIs do not go my way.

[Image: tembooSketch]

 

Sample Data:

{"accessible_wheelchair":true,"address":"1912 Pike Pl","alcohol":false,"alcohol_bar":false,"alcohol_beer_wine":false,"alcohol_byob":false,"attire":"streetwear","category_ids":[342],"category_labels":[["Social","Food and Dining","Cafes, Coffee and Tea Houses"]],"chain_id":"ab4c54c0-d68a-012e-5619-003048cad9da","chain_name":"Starbucks","country":"us","cuisine":["Coffee","Tea","Cafe","American","Fast Food"],"email":"info@starbucks.com","factual_id":"df48ccb9-23cf-4a08-83ed-8d9b9176f6e3","hours":{"monday":[["6:00","21:00"]],"tuesday":[["6:00","21:00"]],"wednesday":[["6:00","21:00"]],"thursday":[["6:00","21:00"]],"friday":[["6:00","21:00"]],"saturday":[["6:00","21:00"]],"sunday":[["6:00","21:00"]]},"hours_display":"Open Daily 6:00 AM-9:00 PM","kids_goodfor":true,"kids_menu":true,"latitude":47.609987,"locality":"Seattle","longitude":-122.342521,"meal_breakfast":true,"meal_deliver":true,"meal_dinner":true,"meal_lunch":true,"meal_takeout":true,"name":"Starbucks","neighborhood":["Downtown","Pine Market","Pike Place Market"],"open_24hrs":false,"options_glutenfree":true,"options_healthy":true,"options_organic":true,"options_vegetarian":true,"parking":true,"parking_street":true,"payment_cashonly":false,"postcode":"98101","price":1,"rating":4,"region":"WA","reservations":false,"tel":"(206) 448-8762","website":"http://www.starbucks.com/","wifi":true},
{"accessible_wheelchair":true,"address":"177 Marshall St","alcohol":false,"alcohol_bar":false,"alcohol_beer_wine":false,"alcohol_byob":false,"attire":"streetwear","category_ids":[342],"category_labels":[["Social","Food and Dining","Cafes, Coffee and Tea Houses"]],"chain_id":"ab4c54c0-d68a-012e-5619-003048cad9da","chain_name":"Starbucks","country":"us","cuisine":["Coffee","Tea","Cafe","American","cafe"],"email":"ordersupport@starbuckscardb2b.com","factual_id":"67dc42f3-9948-4fda-8e82-26fb5d74dd0e","hours":{"monday":[["5:00","23:00"]],"tuesday":[["5:00","23:00"]],"wednesday":[["5:00","23:00"]],"thursday":[["5:00","23:00"]],"friday":[["5:00","23:00"]],"saturday":[["5:30","23:00"]],"sunday":[["6:00","22:00"]]},"hours_display":"Mon-Fri 5:00 AM-11:00 PM; Sat 5:30 AM-11:00 PM; Sun 6:00 AM-10:00 PM","kids_goodfor":true,"kids_menu":true,"latitude":43.041707,"locality":"Syracuse","longitude":-76.134737,"meal_breakfast":true,"meal_cater":true,"meal_deliver":false,"meal_lunch":true,"meal_takeout":true,"name":"Starbucks","neighborhood":["University Hill"],"open_24hrs":false,"options_glutenfree":true,"options_healthy":true,"options_organic":true,"options_vegetarian":true,"parking":true,"parking_street":true,"payment_cashonly":false,"postcode":"13210","price":1,"rating":3.5,"region":"NY","reservations":false,"smoking":false,"tel":"(315) 474-2863","website":"http://www.starbucks.com/","wifi":true},
{"accessible_wheelchair":true,"address":"867 Peachtree St NE","alcohol":false,"alcohol_bar":false,"alcohol_beer_wine":false,"alcohol_byob":false,"attire":"streetwear","category_ids":[342],"category_labels":[["Social","Food and Dining","Cafes, Coffee and Tea Houses"]],"chain_id":"ab4c54c0-d68a-012e-5619-003048cad9da","chain_name":"Starbucks","country":"us","cuisine":["Coffee","Tea","Cafe","American","cafe"],"email":"info@starbucks.com.gr","factual_id":"5d7119ee-ea41-44dc-8263-18fe560b70eb","hours":{"monday":[["5:00","23:30"]],"tuesday":[["5:00","23:30"]],"wednesday":[["5:00","23:30"]],"thursday":[["5:00","23:30"]],"friday":[["5:00","23:59"]],"saturday":[["5:30","23:59"]],"sunday":[["6:00","23:30"]]},"hours_display":"Mon-Thu 5:00 AM-11:30 PM; Fri 5:00 AM-11:59 PM; Sat 5:30 AM-11:59 PM; Sun 6:00 AM-11:30 PM","kids_goodfor":true,"kids_menu":true,"latitude":33.778356,"locality":"Atlanta","longitude":-84.384124,"meal_breakfast":true,"meal_deliver":false,"meal_lunch":true,"meal_takeout":true,"name":"Starbucks","neighborhood":["Midtown"],"open_24hrs":false,"options_glutenfree":true,"options_healthy":true,"options_organic":true,"options_vegetarian":true,"payment_cashonly":false,"postcode":"30308","price":1,"rating":2,"region":"GA","reservations":false,"tel":"(404) 876-7466","website":"http://www.starbucks.com/","wifi":true}

Bryce Summers

26 Jan 2015

Datascraping Assignment

For this project I scraped data about New York Times articles that are in some way related to various words associated with entertainment or leisure activities, including: “Board Game”, “Game”, “Computer Game”, “Gambling”, “Sport”, “Fun”, “Arcade”, “Competition”, “Amusement”, “Entertainment”, “Movie”, “Drinking”, “Beach”, “Reading”, and “Music”. For each of these words I scraped 1000 records. I learned how to write a programmatic data scraper (at least the beginner’s version, using Temboo) and had to think about what data I could get from the internet.

I started out thinking about trying to get a list of data, such as statistics about stuttering or the names of every board game in existence. I then realized that if I wanted such a list, I would either have to find a list file that someone else out in the wild had compiled, or compile my own list to be used as input terms for a public API. Either way, such a list was not to be found by data scraping, so I had to revise my approach. I decided to scrape data about entertainment activities to try to find a data set that would shed some light on how the media, specifically the New York Times, represents and perceives different forms of entertainment.
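
Collecting 1000 records per term means paging through the search results. The sketch below shows the arithmetic and the request list, assuming the article search returns 10 records per page and takes `q` and `page` parameters; the base URL is a placeholder, `ArticlePager` is a made-up name, and the actual scraper linked below goes through Temboo rather than raw URLs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: works out which pages to request for one search term.
class ArticlePager {
    // Assumption: the search API returns 10 articles per page
    static final int PAGE_SIZE = 10;

    // Rounds up so a partial last page is still requested
    static int pagesNeeded(int records) {
        return (records + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    // Builds one URL per page, starting at page 0
    static List<String> requestUrls(String baseUrl, String term, int records) {
        String encoded;
        try {
            encoded = URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
        List<String> urls = new ArrayList<String>();
        for (int page = 0; page < pagesNeeded(records); page++) {
            urls.add(baseUrl + "?q=" + encoded + "&page=" + page);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder base URL, not a real endpoint
        List<String> urls = requestUrls("https://example.com/articlesearch.json", "Board Game", 1000);
        System.out.println(urls.size() + " requests, first: " + urls.get(0));
    }
}
```

With 15 terms at 1000 records each, this comes to 1500 requests under the 10-per-page assumption, so any rate limit on the API would be worth checking before starting a run.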

Here is a link to the code that I used to scrape the data: https://github.com/Bryce-Summers/IACD-Datascraping

Amazingly Rough Sketch:

[Image: InfoVisSketch]

 

In the future information visualization project, I will attempt to display relationships between the data I received, which may or may not relate to the terms that I was looking for.

Here is some of the data that I have collected:

https://github.com/Bryce-Summers/IACD-Datascraping/tree/master/Sample%20of%20Data

I have decided to provide a link to the files instead of embedding the text in this webpage, because the JSON files are so verbose that they crash this blog post.

JohnMars—DataScraping

Popular music has the stigma of being overly duplicative — it seems like almost every song has the same chord progression. Is this really the case? Has it always been like this?

This project took place in two parts: scraping the Billboard Hot 100 for every year since 1940, and then using that data to get each song’s chord progression.

The Billboard information was gathered from this website with a custom Python scraper. I was lucky that the site had a consistent URL scheme for finding the pages, and all but a few of the pages had a consistent table layout from which to scrape the data. A couple of years were inconsistent, so I had to download those pages as HTML files, do some manual touch-up with multiple cursors in Sublime Text, and send them back through the scraper as strings instead of actual requests. By the end of that phase, my data structure looked like this, but with 5400 entries:

{
  "songs": [
    {
      "title": "White Christmas",
      "artist": "Bing Crosby",
      "rank": 1,
      "year": 1940
    },
    {
      "title": "The Christmas Song",
      "artist": "Nat \"King\" Cole",
      "rank": 2,
      "year": 1940
    },
    …
    {
      "title": "Somethin' Bad",
      "artist": "Miranda Lambert and Carrie Underwood",
      "rank": 99,
      "year": 2014
    },
    {
      "title": "Adore You",
      "artist": "Miley Cyrus",
      "rank": 100,
      "year": 2014
    }
  ]
}
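For illustration, here is a minimal sketch of how chart rows in an HTML table could be turned into entries of the shape shown above. It is not the scraper used for this project. The column order (rank, title, artist) is an assumption, and `parse_chart` is a hypothetical name. Because it takes the page as a string, the same function works on a downloaded response or on a hand-edited HTML file.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> cells stay empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_chart(html, year):
    """Turn one year's chart page (as a string) into a list of song dicts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    songs = []
    for row in parser.rows:
        # Assumed layout: rank, title, artist. Rows that do not fit are skipped.
        if len(row) < 3 or not row[0].isdigit():
            continue
        songs.append(
            {"title": row[1], "artist": row[2], "rank": int(row[0]), "year": year}
        )
    return songs
```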

Phase two was using that data to scrape chord information from a guitar tab website. Again, I was extremely lucky that they used a consistent URL naming scheme — unfortunately, not all of my data fit that scheme, so my final data has holes. It’s entirely possible that I could find the missing data points by traversing search results, but that’s not within the scope of this project at the moment. After stripping the chord information from the website, again with a custom scraper, my structure looked like this (the total amount of data is 200,679 lines):

{
  "songs": [
    {
      "chords": [
        "G",
        "C",
        "D",
        "D7",
        "C",
        "D",
        "G",
        "G",
        "B",
        "Em",
        "G",
        "G",
        "Em",
        "D",
        "G",
        "C",
        "D",
        "D7",
        "C",
        "D",
        "G",
        "C",
        "Em",
        "C",
        "G",
        "D",
        "G",
        "G",
        "C",
        "D",
        "D7",
        "C",
        "D",
        "G",
        "C",
        "Em",
        "C",
        "G",
        "D",
        "G"
      ],
      "chord_url": "http://tabs.ultimate-guitar.com/b/Bing_Crosby/White_Christmas_crd.htm",
      "title": "White Christmas",
      "rank": 1,
      "year": 1940,
      "artist": "Bing Crosby"
    },
    {
      "chords": "",
      "chord_url": "",
      "title": "The Christmas Song",
      "rank": 2,
      "year": 1940,
      "artist": "Nat \"King\" Cole"
    },
    {
      "chords": [
        "Ebmaj7",
        "Abmaj7",
        "Ebmaj7",
        "Abmaj7",
        "Eb7",
        "Eb7",
        "Abmaj7",
        "Ebmaj7",
        "F7",
        "C7",
        "Bb7",
        "Ebmaj7",
        "Abmaj7",
        "Ebmaj7",
        "Abmaj7",
        "Ebmaj7",
        "Abmaj7",
        "Eb7",
        "Eb7",
        "Abmaj7",
        "Ebmaj7",
        "F7",
        "C7",
        "Bb7",
        "Ebmaj7",
        "Abmaj7",
        "Ebmaj7",
        "Ab7",
        "G7",
        "D7",
        "G7",
        "C7",
        "Bb7",
        "Ebmaj7",
        "Abmaj7",
        "Ebmaj7",
        "Abmaj7",
        "Eb7",
        "Eb7",
        "Abmaj7",
        "Ebmaj7"
      ],
      "chord_url": "http://tabs.ultimate-guitar.com/b/Billie_Holiday/God_Bless_The_Child_crd.htm",
      "title": "God Bless The Child",
      "rank": 3,
      "year": 1940,
      "artist": "Billie Holiday"
    },
    {
      "chords": "",
      "chord_url": "",
      "title": "Take The \"A\" Train",
      "rank": 4,
      "year": 1940,
      "artist": "Duke Ellington"
    },
    …
    {
      "chords": [
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "Am",
        "C",
        "F",
        "Am",
        "F",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "F",
        "C",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C",
        "F",
        "Am",
        "C"
      ],
      "chord_url": "http://tabs.ultimate-guitar.com/m/Miley_Cyrus/Adore_You_crd.htm",
      "title": "Adore You",
      "rank": 100,
      "year": 2014,
      "artist": "Miley Cyrus"
    }
  ]
}
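The URL naming scheme can be read off the `chord_url` values in the sample above: the lowercased first character of the artist name, then the artist and the title with spaces replaced by underscores, then `_crd.htm`. Below is a small sketch of that pattern. It is inferred only from the three URLs shown here, `chord_url` is a hypothetical function name, and punctuation in names is deliberately left alone because the sample does not show how the site treats it.

```python
BASE_URL = "http://tabs.ultimate-guitar.com"


def chord_url(artist, title):
    """Guess the chord page URL for a song from its artist and title.

    Pattern inferred from the sample data only; names containing
    punctuation may well not map onto a real page.
    """
    artist_slug = "_".join(artist.split())
    title_slug = "_".join(title.split())
    # The first path segment is the lowercased first character of the artist.
    letter = artist_slug[0].lower()
    return f"{BASE_URL}/{letter}/{artist_slug}/{title_slug}_crd.htm"
```

For example, `chord_url("Bing Crosby", "White Christmas")` reproduces the first `chord_url` in the data. A page that does not exist under the guessed address would leave an entry with empty `chords` and `chord_url` fields, like the two in the sample.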

In terms of visualization/representation, obviously it has to be something sonic. Someone has already made an application that explores related chords in popular music, which is pretty cool; implementing something similar, except where time is a main contributor, might be a good avenue — sort of a “what does this era sound like” type of thing, or maybe album covers sortable by progression.
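As a first step towards a time-based view, here is a rough sketch of how the most common chord runs per decade could be counted from the structure above. The function names, the run length of four chords, and the choice to count each run once per song are all assumptions made for illustration. It compares raw chord names, so the same progression played in two different keys counts as two different runs.

```python
from collections import Counter, defaultdict


def progressions(chords, n=4):
    """Return every run of n consecutive chords as a tuple."""
    return [tuple(chords[i:i + n]) for i in range(len(chords) - n + 1)]


def top_progressions_by_decade(songs, n=4, top=3):
    """Map each decade to its most common n-chord runs."""
    counts = defaultdict(Counter)
    for song in songs:
        chords = song.get("chords") or []  # missing entries are stored as ""
        if not chords:
            continue
        decade = song["year"] // 10 * 10
        # Count each distinct run once per song so long songs do not dominate.
        counts[decade].update(set(progressions(chords, n)))
    return {decade: c.most_common(top) for decade, c in sorted(counts.items())}
```

Calling `top_progressions_by_decade(data["songs"])` on the loaded JSON would give a dictionary keyed by decade (1940, 1950, and so on), each holding a short list of `(progression, song_count)` pairs.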

One of the awesome things about this data set and the way I’ve constructed it is that it’s easily expandable. Want tempos for each song? Easy. Album Cover? Okay. Genre? Done.

The project source is available here, and the full JSON data is here (it’s only about 5MB).