I have little working knowledge of python. I know that there is something called a Twitter search API, but I'm not really sure what I'm doing. I know what I need to do:
I need point data for a class. I thought I would just pull up a map of the world in a GIS application, select cities that have x population or larger, then export those selections to a new table. That table would have a key and city name.
next i randomly select 100 of those cities. Then I perform a search of a certain term (in this case, Gaddafi) for each of those 100 cities. All I need to know is how many posts there were on a certain day (or over a few days depending on amount of tweets there were).
I just have a feeling there is something that already exsists that does this, and I'm having a hard time finding it. I've dowloaded and installed python-twitter but have no idea how to get this search done. Anyone know where I can find or how I can make this tool? Any suggestions would really help. Thanks!
A tweet itself comes with a geo tag. But it is a new feature and majority tweets do not have it. So it is not possible to search for all tweets containing "Gaddafi" from a city given the city name.
What you could do is the reverse, you search for "Gaddafi" first (regardless of geo location), using search api. Then, for each tweet, find the location of the poster (either thru the RESTful api or use some sort of web scraping).
so basically you can classify the tweets collected according to the location of the poster.
I think only tweepy have access to both twitter search API as well as RESTful API.
Related
I have a database of possibly infinitely many products.
Every single product has a Title and tags representing it.
I am trying to make a system where somebody can search for the product using the Title primarily, but helping to sort out the Top 100 products using tags.
Some people have suggested Django (Taggit) or just forming an API.
I am not sure how relevant that is and how needed it is for something simple like the task mentioned above.
I am also looking for something efficient.
Coding (python) newbie here.I am trying to make a small project where I gather news headlines from an API.The API I am trying to use is newsapi.org. Every time I enquire the API, It gives me a a long list of news articles,which I add to a database . But when I enquire again after some time gap , I get a new article(S) plus many, if not all the previous articles. How do I avoid adding duplicates to the database? I thought of iterating through the last few new entries and see if such articles already exist but somehow I feel that is not the way.Any help is appreciated.
Thank you
The API you want to use includes a from parameter that you can set to filter the articles returned and only get the ones that are more recent.
If you store the current time each time you gather data from the api, you can use it in the from parameter on the next api call you will do to only get what has been published in between the two calls.
After one month of learning and toying around with python and doing tons and tons of exercieses and samples I am still not really able to answer one important question to myself. If I generate data, for example with loading them from a xml or from scraping. How do I work with them? Right now I put each and every entrie directly into a sqlite db.
For exmaple:
I read a news feed, I put id, title, description, link, tags into one row in my sqlite db. I do some categorizing according to keywords within the the title or description. Each news entry goes into one row. After that I can read them row by row or at once.
I can filter them, sort them... But somehow I feel like there is a better way. Put them all with a dictionary into one big list and somehow work with this list. And then after I worked thru them sorted them or got even more infos with that data. Put them into the sqlite db. But none of my books or tutorial talk about this topic!? I try to make an example is a news article more important if a certain keyword comes up multiple times or in with an other keyword. Compare the news from that page with the news from another page with same keywords.
I guess u will be laughing and say. He is talking about ..... how can he not know that. But I am new to all of that. Sooorrry.
Thank you for your help guiding me in the right direction.
I used Tweepy, the Twitter library for Python, to look at Tweets based containing specific keywords, and applying a date and location range. This process was relatively simple.
Now I would like to use python to search for Facebook posts in a similar fashion, yet facebook's data and API don't seem to allow me to 1) search for posts 2) search with multiple criteria. Is there a way to do this, or is it simply a lost cause and I should stick to using Twitter?
Assembla provides a simple way to fetch all commits of an organisation using api.assembla.com/v1/activity.json and it takes to and from parameters allowing to get commits of selected date(from all the spaces(repos) the user is participating.
Is there any similar way in Github ?
I found these for Github:
/repos/:owner/:repo/commits
Accepts since and until parameters for getting commits of selected date. But, since I want commits from all repos, I have to loop over all those repos and fetch commits for each repo.
/users/:user/events
This shows the commits of a user. I dont have any problem looping over all the users in the org, but how can I get for a particular date ?
/orgs/:org/events
This shows commits of all users of all repos but dont know how to fetch for a particular date ?
The problem with using the /users/:user/events endpoint is that you just don't get the PushEvents and you would have to skip over non-commit events and perform more calls to the API. Assuming you're authenticated, you should be safe so long as your users aren't hyper active.
For /orgs/:org/events I don't think they accept parameters for anything, but I can check with the API designers.
And just in case you aren't familiar, these are all paginated results. So you can go back until the beginning with the Link headers. My library (github3.py) provides iterators to do this for you automatically. You can also tell it how many events you'd like. (Same with commits, etc). But yeah, I'll come back an edit after talking to the API guys at GitHub.
Edit: Conversation
You might want to check out the GitHub Archive project -- http://www.githubarchive.org/, and the ability to query the archive using Google's BigQuery. Sounds like it would be a perfect tool for the job -- I'm pretty sure you could get exactly what you want with a single query.
The other option is to call the GitHub API -- iterate over all events for the organization and filter out the ones that don't satisfy your date rage criteria and event type criteria (commits). But since you can't specify date ranges in the API call, you will probably do a lot of calls to get the the events that interest you. Notice that you don't have to iterate over every page starting from 0 to find the page that contains the first result in the date range -- just do a (variation of) binary search over page numbers to find any page that contains a commit in the date range, a then iterate in both directions until you break out of the date range. That should reduce the number of API calls you make.