I am looking to stream and save Twitter data based on a hashtag during an event. I don't pay for Twitter API access, so my account may be subject to rate limits. Assuming I have a twitter_credentials.py with acc_secret, acc_token, con_key, and con_secret, and the hashtag #hashtag, could someone please help me build this? I'd like it to end up as a JSON object that I can then convert to pandas DataFrames.
The search method lets you retrieve tweets that match a query. I think you will need it to retrieve the tweets that refer to a specific hashtag.
For more details, see https://docs.tweepy.org/en/v3.10.0/api.html?highlight=search#API.search
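As a rough sketch of the saving step: assuming each tweet is available as a plain dict (with tweepy v3.10 this could be the _json attribute of each status yielded by tweepy.Cursor(api.search, q="#hashtag").items()), the tweets can be written one JSON object per line. The helper names save_tweets_jsonl and load_tweets_jsonl are made up for illustration.

```python
import json


def save_tweets_jsonl(tweets, path):
    """Append each tweet dict to `path` as one JSON object per line.

    `tweets` is assumed to be an iterable of plain dicts.
    Returns the number of tweets written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_tweets_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A file written this way should load into a DataFrame with pandas.read_json(path, lines=True).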
Related
I am trying to pull the Category/Topic of Tweets for a school project.
I am not seeing it as one of the keys in public_tweets (below), and was wondering if it is located in a variable somewhere else. Thanks!
Example of a Twitter Topic/Category (would like to pull the “Lebron James” label):
Topics are known as “context annotations” in the Twitter API, and are only available in v2. The Tweet data you have here is from v1.1. You’ll need to update the way you are accessing the API and add parameters to get Tweet expansions and fields for context annotations.
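For illustration, assuming the v2 request asks for the context_annotations Tweet field (for example tweet.fields=context_annotations), each Tweet object should carry a list of domain/entity pairs. The sketch below pulls the entity names out of such a payload. The helper name topic_labels and the sample data are hypothetical.

```python
def topic_labels(tweet):
    """Return the unique entity names from a v2 Tweet dict, in order.

    Assumes `tweet` looks like a v2 Tweet object that may contain a
    "context_annotations" list of {"domain": {...}, "entity": {...}}.
    """
    labels = []
    for annotation in tweet.get("context_annotations", []):
        name = annotation.get("entity", {}).get("name")
        if name and name not in labels:
            labels.append(name)
    return labels


# Made-up sample payload in the v2 shape (the ids are placeholders).
sample_tweet = {
    "id": "0",
    "text": "What a game!",
    "context_annotations": [
        {"domain": {"id": "0", "name": "Person"},
         "entity": {"id": "0", "name": "LeBron James"}},
        {"domain": {"id": "0", "name": "Athlete"},
         "entity": {"id": "0", "name": "LeBron James"}},
    ],
}
labels = topic_labels(sample_tweet)
```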
I'm trying to pull tweets from a user's timeline in real-time. I then want to do some analysis on those tweets. Having read the docs it looks like I will need to use tweepy.Stream for this use case. I've done the following:
stream.filter(follow=['25073877'])
But Twitter's filter API documentation states that, for each user specified, the stream will contain the following:
Tweets created by the user.
Tweets which are retweeted by the user.
Replies to any Tweet created by the user.
Retweets of any Tweet created by the user.
Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
It seems that this will return a huge volume of tweets that aren't relevant to my use case. Do I have to use this approach and then filter by screen name to get only tweets by the real user? This doesn't seem right at all.
The alternative seems to be the api.user_timeline method, but that isn't a streaming API. Do I therefore use this API and hit it every second? I can't seem to find suitable examples of how best to accomplish my use case.
Yes, you'll need to filter either by screen_name or maybe you can check if it's a retweet or not.
I wouldn't recommend the second approach: you'd receive an even bigger number of tweets, since you'd have to filter out the ones you already received in previous requests, and you may hit the API rate limits if you don't time it properly.
This is the signature of the filter function:
def filter(self, follow=None, track=None, is_async=False, locations=None,
           stall_warnings=False, languages=None, encoding='utf8', filter_level=None):
It maps to this Twitter API request.
And here is the explanation of the parameters.
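A minimal sketch of the client-side filtering, assuming each streamed status is available as a v1.1-style dict (with tweepy 3.x this could be status._json inside a listener's on_status). The helper name is_own_tweet is hypothetical.

```python
def is_own_tweet(status, user_id, include_retweets=False):
    """Return True if `status` was posted by `user_id` themselves.

    Assumes a v1.1 status dict: the author is under status["user"],
    and a retweet carries a "retweeted_status" key.
    """
    author = (status.get("user") or {}).get("id_str")
    if author != str(user_id):
        # Replies to, or retweets of, the followed user by someone else.
        return False
    if not include_retweets and "retweeted_status" in status:
        # The followed user retweeting someone else's tweet.
        return False
    return True


# Made-up sample statuses for illustration.
own = {"id_str": "1", "user": {"id_str": "25073877"}, "text": "hello"}
reply_from_other = {"id_str": "2", "user": {"id_str": "999"},
                    "in_reply_to_user_id_str": "25073877", "text": "hi"}
retweet_by_user = {"id_str": "3", "user": {"id_str": "25073877"},
                   "retweeted_status": {"id_str": "4"}, "text": "RT"}
```

Calling is_own_tweet(status, '25073877') on each incoming status then keeps only the tweets written by the followed account.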
I have a list of tweets. For each tweet I have different attributes (user, date, text and tweet IDs).
To scrape that data, I’m using the project of Jefferson Henrique (https://github.com/Jefferson-Henrique/GetOldTweets-python).
In addition to that, I would like to know two geographical elements for each tweet:
where tweets were generated (location or long, lat)?
where the user resides?
Do you have any idea how to get these two pieces of information, either from the tweet IDs or some other way?
You might want to post the JSON of the tweet so I can locate where the information is.
From my experience, the location of the tweet may not always be available, depending on whether the user allows location sharing when tweeting.
For the user location, it is usually not in the tweet. Scrape the user profile and you should easily find it.
You could query the Twitter API directly using the tweet id. This would allow you to retrieve more data about the tweet, including the location if available.
According to the Twitter API documentation:
If the Tweet is geo-tagged, there will be a "place" object included.
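As a hedged sketch, assuming the tweet has been fetched as a v1.1 JSON dict: an exact point, when shared, sits under "coordinates" as a GeoJSON point in [longitude, latitude] order, a tagged place under "place", and the self-declared profile location under user["location"]. The helper name extract_locations and the sample data are made up.

```python
def extract_locations(tweet):
    """Collect the location hints present in a v1.1 tweet dict.

    Any of the fields may be missing or null, so every value in the
    returned dict can be None.
    """
    point = tweet.get("coordinates") or {}
    lon_lat = point.get("coordinates")  # GeoJSON order: [lon, lat]
    place = tweet.get("place") or {}
    user = tweet.get("user") or {}
    return {
        "longitude": lon_lat[0] if lon_lat else None,
        "latitude": lon_lat[1] if lon_lat else None,
        "place_name": place.get("full_name"),
        "user_location": user.get("location") or None,
    }


# Made-up sample tweets for illustration.
geo_tweet = {
    "id_str": "1",
    "coordinates": {"type": "Point", "coordinates": [2.35, 48.85]},
    "place": {"full_name": "Paris, France"},
    "user": {"location": "Lyon"},
}
plain_tweet = {"id_str": "2", "coordinates": None, "place": None,
               "user": {"location": ""}}
```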
I am trying to use the Twitter API with the Python wrapper Twython, and I want to retrieve all replies (the comments below a tweet) to a certain tweet found using some patterns.
At the moment, to achieve this, I search for a string, retrieve the screen_name field of the user field in the response for the original tweets, and then use the API again to search for the latest tweets directed at that user, putting the substring to:screen_name in the query.
Is there a better solution? The only questions related to this topic that I found were written in '14, and I hope that there have been some improvements in the meantime.
I need to retrieve specific data from Twitter.
I'd like to get all the reply tweets received by a specific user (who is not the authenticating user of the program). Is there a way to achieve this? Right now I'm thinking about using the search function and checking whether 'in_reply_to_user_id_str' matches the id of the user I want.
But this means that I need to filter a lot of data to find what I want.
Edit: I'm using the Python-Twitter-Tools
If it is the authenticating user, you can directly get the response tweets using the 'mentions timeline'. As the user is not the authenticating user, you have two options here.
Streaming API
Use the filter endpoint along with the 'follow' parameter. Pass the required 'user_id' to the follow parameter and it will return the items listed below. You will have to check 'in_reply_to_user_id_str' in order to isolate the replies (responses).
Tweets created by the user.
Tweets which are retweeted by the user.
Replies to any Tweet created by the user.
Retweets of any Tweet created by the user.
Manual replies, created without pressing a reply button.
Python-Twitter-Tools supports the Streaming API. The Streaming API is real-time and more complete than the Search API.
Search API
Every reply tweet contains the "@username" mention. You can search using "@username" and then filter the tweets using 'in_reply_to_user_id_str', as you have mentioned.
Considering the two options, the Streaming API will help you get what you need easily and reliably.
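With either option, the final filtering step could look like the sketch below, assuming the tweets arrive as v1.1-style dicts. The helper name replies_to_user and the sample data are hypothetical.

```python
def replies_to_user(tweets, user_id):
    """Keep only the tweets that are replies addressed to `user_id`.

    Assumes each tweet is a dict whose "in_reply_to_user_id_str" is
    the replied-to user's id as a string, or None for non-replies.
    """
    wanted = str(user_id)
    return [t for t in tweets if t.get("in_reply_to_user_id_str") == wanted]


# Made-up sample tweets for illustration.
sample_tweets = [
    {"id_str": "1", "in_reply_to_user_id_str": "42", "text": "I agree"},
    {"id_str": "2", "in_reply_to_user_id_str": None, "text": "a plain tweet"},
    {"id_str": "3", "in_reply_to_user_id_str": "7", "text": "to someone else"},
]
replies = replies_to_user(sample_tweets, 42)
```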