Crowdfund-Mine

General experiment to mine social media data in the context of crowdfunding


Mining social media data in the context of crowdfunding data

- Part 1: Social Side

Figuring out how effective one's fundraising efforts are can be a pretty arduous task online (so I'm told), especially without a clear idea of what to measure. Beyond that, analyzing datasets like these can help individuals and organizations identify exactly who their audience is, who in that audience has the most potential to help launch, sustain, or grow campaigns, and how these things change over the lifetime of one or more campaigns.

The Challenge

This challenge came to me from a cousin of mine, Naji Barnes-McFarlane, who works for a non-profit in Houston, TX (Neighborhood Centers Inc.). Neighborhood Centers brings resources, education and connection to more than 400,000 people throughout Texas each year. For more than a century, Neighborhood Centers has offered comprehensive community-based programs for people at every stage of life, from infants to seniors. The organization works with residents of emerging communities in 60 service locations to help them discover the strengths and skills necessary to become productive, prosperous and self-sufficient. Building on the strengths of individuals and communities, Neighborhood Centers is transforming them. Given the work they do, he was wondering whether it is possible to measure how effective a crowdfunding campaign is over time, especially when you are trying to do work within a community. Crowdfunding today usually relies on sites like Kickstarter and Indiegogo, so I knew where I would start. For some examples of non-profits raising money through crowdfunding online, you can check out Indiegogo's list. If non-profit individuals and organizations had a tool at their disposal to help their efforts, not only would they benefit from being able to focus those efforts, but their audience would as well.

Getting Started

While getting started, my initial mindset was to figure out how to get social data without needing API access. API access is a drag on experiments like these because it takes away time you could spend coding up a solution, and because Information Wants To Be Free™. That pretty much immediately put Facebook out of the race (for now) and left me with Twitter. On Twitter, you can search for whatever you want publicly. Using this functionality, I opened up the console in Firefox and looked for HTTP requests that could simplify a future scrape...

Jackpot:

'https://twitter.com/i/search/timeline?q=%s&composed_count=1&include_available_features=1&include_entities=1&include_new_items_bar=true&interval=1&f=realtime' % (some_query_of_interest)

…which returns a bunch of tweets that include “some_query_of_interest” in JSON format.
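As a minimal sketch of how that URL could be built safely in Python 3, the helper below percent-encodes the query before substituting it into the template. The names SEARCH_TEMPLATE and build_search_url are made up for illustration and are not taken from social-mine.py; the endpoint itself is the one observed above and may change or disappear at any time.

```python
from urllib.parse import quote

# Template copied from the request observed in the browser console.
SEARCH_TEMPLATE = (
    "https://twitter.com/i/search/timeline?q=%s"
    "&composed_count=1&include_available_features=1&include_entities=1"
    "&include_new_items_bar=true&interval=1&f=realtime"
)


def build_search_url(query):
    """Return the search URL for `query` (hypothetical helper).

    The query is percent-encoded so spaces, '#', '&' and similar
    characters do not break the query string.
    """
    return SEARCH_TEMPLATE % quote(query, safe="")


print(build_search_url("kickstarter"))
print(build_search_url("indiegogo campaign"))
```

Encoding the query matters as soon as a search term contains a space or a hashtag, since an unescaped '#' or '&' would cut the query string short.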

What Data To Mine

Since I'm starting out with social data, I wanted to mine data that could signal the effectiveness of a crowdfunding campaign online (on Twitter). On Twitter, that means tweets that contain links (possibly to the crowdfunding site; tweets without any links are filtered out), who tweeted the original link and at what time, and how many retweets and favorites it got (at what time) and from whom. The 'who' in this case is a Twitter user id, which lets us go back later to obtain information such as how many followers they have. During this process I tried to minimize the number of requests I made to Twitter, though I suspect that most of the time is spent in processing, since some of the operations are O(n^2) and O(n^3).
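The link filter described above could look roughly like the sketch below. It assumes each tweet has already been reduced to a dict with a "text" key, which is an assumption for illustration; the actual script may pull links out of the returned markup differently.

```python
import re

# Deliberately simple pattern: anything starting with http:// or https://
# up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(text):
    """Return every http(s) link found in a tweet's text."""
    return URL_PATTERN.findall(text)


def tweets_with_links(tweets):
    """Keep only the tweets whose text contains at least one link."""
    return [tweet for tweet in tweets if extract_links(tweet["text"])]


# Hypothetical sample data, not real mined tweets.
sample = [
    {"text": "Back our campaign http://ow.ly/r3Za8 today"},
    {"text": "no link in this one"},
]
print(tweets_with_links(sample))
```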

To obtain this data for an individual tweet:

tweet url: 'https://twitter.com/i/status/%s' % (tweet_id)

retweet-data: 'https://twitter.com/i/activity/retweeted_popup?id=%s' % (tweet_id)

favorite-data: 'https://twitter.com/i/activity/favorited_popup?id=%s' % (tweet_id)
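The three per-tweet URLs above can be collected in one small helper. This is only a sketch; tweet_endpoints and the constant names are hypothetical and do not come from the repository.

```python
# Templates copied from the three URLs listed above.
TWEET_URL = "https://twitter.com/i/status/%s"
RETWEET_URL = "https://twitter.com/i/activity/retweeted_popup?id=%s"
FAVORITE_URL = "https://twitter.com/i/activity/favorited_popup?id=%s"


def tweet_endpoints(tweet_id):
    """Return the tweet, retweet-data and favorite-data URLs for one tweet id."""
    return {
        "tweet": TWEET_URL % tweet_id,
        "retweets": RETWEET_URL % tweet_id,
        "favorites": FAVORITE_URL % tweet_id,
    }


for name, url in tweet_endpoints(409472628903378940).items():
    print(name, url)
```

Each tweet then costs up to three requests, which is one reason to filter out uninteresting tweets before fetching retweet and favorite data.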

How To Mine For Yourself

I'm doing this in Python using virtualenv, but this can be done in any language of your choosing. Here are my steps:

1) $ git clone https://github.com/cinquemb/crowdfund-mine.git
2) $ cd ../path/to/crowdfund-mine/
3) $ virtualenv env --distribute
4) $ source env/bin/activate
5) $ mkdir mined_data
6) $ mkdir final_mined_data
7) $ pip install -r ops/requirements.txt
8) Edit line 25 in social-mine.py to set the queries that interest you for this topic. As of now it is set to: crowdsource_site_list = ['startsomegood', 'indiegogo', 'kickstarter']
9) $ python social-mine.py

The last command will generate a JSON file in the mined_data/ directory, where you can do what you want with it. The data in the generated files will look something like the following for every tweet:

{
    "query": "startsomegood",
    "user": 18086787,
    "time_created": 1386460857,
    "tweet_id": 409472628903378940,
    "urls": "['http://ow.ly/r3Za8'],['http://ow.ly/r3ZC5'],",
    "data": [
        {
            "metadata": [
                {
                    "retweets": 2
                },
                {
                    "favorites": 2
                },
                {
                    "retweet-metadata": [
                        {
                            "user": 17877757
                        },
                        {
                            "user": 15247690
                        }
                    ]
                },
                {
                    "favorite-metadata": [
                        {
                            "user": 17877757
                        },
                        {
                            "user": 1705181065
                        }
                    ]
                }
            ]
        }
    ]
}

Running the script has taken me anywhere from 44s to 57s (the duration is printed at the end of the script), and it can be run on an interval of your choosing (with something like cron).
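Because each record nests its counts inside a list of single-key objects and stores "urls" as a string, a little flattening makes the data easier to work with. The sketch below operates on one record shaped like the sample above; parse_urls and summarize are hypothetical helper names, and nothing here assumes how the records are laid out inside the generated file.

```python
import ast
import json


def parse_urls(raw):
    """Turn "['a'],['b']," into ['a', 'b'].

    Wrapping the string in brackets makes it a valid Python list literal
    (the trailing comma is allowed), so ast.literal_eval can parse it
    without executing anything.
    """
    nested = ast.literal_eval("[" + raw + "]")
    return [url for group in nested for url in group]


def summarize(record):
    """Flatten one mined record into a single flat dict."""
    merged = {}
    for block in record["data"]:
        for entry in block["metadata"]:
            # Each entry is a single-key dict; later keys overwrite earlier ones.
            merged.update(entry)
    return {
        "tweet_id": record["tweet_id"],
        "urls": parse_urls(record["urls"]),
        "retweets": merged.get("retweets", 0),
        "favorites": merged.get("favorites", 0),
        "retweeted_by": [u["user"] for u in merged.get("retweet-metadata", [])],
        "favorited_by": [u["user"] for u in merged.get("favorite-metadata", [])],
    }


# The sample record shown above, written compactly.
sample = json.loads("""
{
  "query": "startsomegood",
  "user": 18086787,
  "time_created": 1386460857,
  "tweet_id": 409472628903378940,
  "urls": "['http://ow.ly/r3Za8'],['http://ow.ly/r3ZC5'],",
  "data": [{"metadata": [
    {"retweets": 2},
    {"favorites": 2},
    {"retweet-metadata": [{"user": 17877757}, {"user": 15247690}]},
    {"favorite-metadata": [{"user": 17877757}, {"user": 1705181065}]}
  ]}]
}
""")

print(summarize(sample))
```

With records in this flat shape, questions like "which users both retweeted and favorited a tweet" reduce to simple set operations on retweeted_by and favorited_by.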

Take Away

My main goal for Part 1 was simply to find and retrieve relevant (meta)data from tweets that would help identify signals of possible campaigns. Judging from the sample of mined data above, I feel that I have accomplished that.

What's Next?

My next objective is to work on crowdfund-mine.py, which will follow the link(s) contained in the tweet to see if it is a crowdfunding site, scrape data from the page, and associate it with the tweet (and its data). If anyone has any suggestions/ideas about any of this, feel free to send me an email.
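One possible starting point for the "is it a crowdfunding site" check is to compare the link's hostname against the query list from social-mine.py. This is only a sketch of an idea, not the planned crowdfund-mine.py; crowdfunding_site is a hypothetical name, and it assumes shortened links (like the ow.ly ones in the sample data) have already been resolved to their final URL by following redirects, which is not shown here.

```python
from urllib.parse import urlparse

# Same names as the default crowdsource_site_list in social-mine.py.
CROWDFUNDING_SITES = ["startsomegood", "indiegogo", "kickstarter"]


def crowdfunding_site(url):
    """Return the matching site name for an already-resolved URL, else None."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    for site in CROWDFUNDING_SITES:
        # Match on a whole hostname label so "notkickstarter.com" does not count.
        if site in labels:
            return site
    return None


print(crowdfunding_site("https://www.kickstarter.com/projects/example"))
print(crowdfunding_site("http://ow.ly/r3Za8"))
```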

Author(s)/Contributor(s)

Me: @cinquemb

Contact

You can reach me at NSA's favorite email service with my GitHub username [at] gmail.com