Tracking hashtag with geolocation - tweepy - python

I want to track all tweets containing a particular hashtag, but I need only the tweets that have geolocation.
This line does not work as expected: the results are all tweets with geolocation, regardless of hashtag.
stream.filter(track=["hashtag"],locations = GEOBOX_WORLD)
This works fine.
stream.filter(track=["hashtag"])
This works fine.
stream.filter(locations = GEOBOX_WORLD)
But combining track and locations doesn't work. Is there a solution (without checking tweet by tweet whether .geo != None)?
Thanks.

You can't filter on both at once: the streaming API combines track and locations with OR, not AND, so you need to choose one filter and then check the other yourself.
Here you can find a more detailed answer to a similar question: How to add a location filter to tweepy module
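A minimal sketch of the "choose one and then check the other" approach: let the stream filter by track, then keep only the statuses that carry a geotag. The helper names has_geotag and keep_geotagged are hypothetical, and the statuses are assumed to be dicts parsed from the streaming JSON; with tweepy status objects the same check would apply to the coordinates and place attributes.

```python
def has_geotag(status):
    """Return True if a tweet payload (a dict) carries a point or a place."""
    return status.get("coordinates") is not None or status.get("place") is not None


def keep_geotagged(statuses):
    """Keep only the geotagged statuses from an iterable of tweet dicts."""
    return [s for s in statuses if has_geotag(s)]


# Hypothetical sample payloads, as if already matched by track=["hashtag"].
sample = [
    {"text": "#hashtag no location", "coordinates": None, "place": None},
    {
        "text": "#hashtag with a point",
        "coordinates": {"type": "Point", "coordinates": [-3.7, 40.4]},
        "place": None,
    },
]
geotagged = keep_geotagged(sample)
```

The check still runs tweet by tweet, but it is a cheap test on data the stream has already delivered.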

Related

Twitter API: How to look up tweets in a specific location and remove retweets?

I'm working on a school project and I'm trying to retrieve tweets about robberies in my home city. I have already written most of the code, but I'm still struggling with some arguments, specifically the location and the repeated tweets that appear because of retweets (RT): a retweet that contains the words I'm looking for shows up as a duplicate.
I'm working in Python using tweepy. Here's the portion of my code that retrieves the tweets:
tweets = tweepy.Cursor(api.search_tweets,
q = keywords,
lang = "es",
tweet_mode = "extended",
count = 100,
result_type = "recent").items(limit)
I know that for the location I have to use the geocode argument, but I do not understand how to pass the coordinates and the radius.
I also don't know which argument to use to solve the RT problem.
I really hope someone can help me solve this problem, I've been struggling with it the last few days.
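A hedged sketch of how the two missing pieces could be built, assuming the standard search conventions: geocode is a single "latitude,longitude,radius" string with the radius in km or mi, and the -filter:retweets operator (used in another question on this page) excludes retweets. The helper names, the coordinates, and the keyword are placeholders, not values from the question.

```python
def build_geocode(lat, lon, radius_km):
    """Format a 'latitude,longitude,radius' string with the radius in km."""
    return f"{lat},{lon},{radius_km}km"


def exclude_retweets(query):
    """Append the search operator that drops retweets from the results."""
    return f"{query} -filter:retweets"


# Placeholder values for illustration only.
geocode = build_geocode(19.4326, -99.1332, 25)
query = exclude_retweets("robo")

# These would then be passed along with the existing arguments, e.g.
# tweepy.Cursor(api.search_tweets, q=query, geocode=geocode, ...)
```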

Collect tweets from multiple users at the same time with tweepy

Currently I am working on a project with tweepy to collect new tweets from users very quickly. So far, I have found that the fastest method to collect the newest tweet of a user is like so:
tweets = api.user_timeline(screen_name='user',count=1, include_rts = True,tweet_mode = 'extended')
status = tweets[0]
I was wondering if there is any way to get the most recent tweets of multiple users in one request. I tried using a streamer, but that ended up having about a 10-second delay between when a tweet was posted and when it showed up, which is far too slow for my application. Please let me know if you have any other ideas on how to fetch tweets quickly.
Thanks
I haven't used tweepy to search the posts by user, but I have used it to extract information on multiple hashtags before.
When I tried with multiple hashtags the code looked like this:
query = "(#nike OR #puma OR #adidas)"
rule = gen_rule_payload(query, results_per_call=100, from_date="2020-11-30", to_date="2020-12-02")
So for your query I would try:
users = "(User1 OR User2 OR User3)"
tweets = api.user_timeline(screen_name=users,count=1, include_rts = True,tweet_mode = 'extended')
status = tweets[0]
I'm not sure if that will work, but fingers crossed! The good thing about working with Twitter is that you can copy paste your search strings directly into the search box on a normal twitter webpage and see if your syntax is correct.
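If passing an OR expression as screen_name does not work, one alternative to try is a search query built from the from: operator, which is part of Twitter's search syntax. This is only a sketch: the helper name build_from_query is hypothetical, and whether search results arrive quickly enough for the asker's use case is not verified here.

```python
def build_from_query(screen_names):
    """Join several accounts into one search query using the from: operator."""
    return " OR ".join(f"from:{name}" for name in screen_names)


query = build_from_query(["User1", "User2", "User3"])
# query could then be pasted into the Twitter search box to check the syntax,
# or passed as q=query to a search call.
```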

Can't get more than 600 tweets using tweepy search API

noofitem = 1000
tweets = tweepy.Cursor(api.search,q=['#iphone11, -filter:retweets'],since='2019-11-14',lang='en',tweet_mode='extended',retweeted=False).items(noofitem)
i = [tweet.full_text for tweet in tweets] #Tweet text
I am trying to get about 1000 tweets using tweepy, but the most I get is around 600. Changing the date does not help. Any modification or other workaround would be helpful. Thanks.
Please note that Twitter’s search service and, by extension, the
Search API is not meant to be an exhaustive source of Tweets. Not all
Tweets will be indexed or made available via the search interface.
Please refer to this link for more information: http://docs.tweepy.org/en/latest/api.html#help-methods
You will probably need to set up a Stream to get the amount of data you need.

Tweepy not finding results that should be there

I am writing a script in Python, that uses tweepy to search for tweets with a given keyword. Here is the snippet:
for tweet in tweepy.Cursor(api.search, q=keyword, lang="en").items(10):
    print(tweet.id)
I have everything authenticated properly and the code works most of the time. However, when I try to search for some keywords (examples below) it doesn't return anything.
The keywords that cause trouble are "digitalkidz" (a tech conference) and "newtrendbg" (a Bulgarian company). If you do a quick search on Twitter for either of those you will see that there are results. However, tweepy doesn't find anything. Again, it does work for pretty much any other keyword I use.
Do you have any ideas what might be the problem and how to fix it?
Thank you
I believe you're forgetting an important aspect of the Twitter API: it's not exhaustive.
Taken from the API docs:
Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
Regardless of whether you're using the streaming or REST API, you're going to have issues with this if you're looking for specific tweets.
REST API
When looking for historical tweets, you unfortunately won't be able to obtain anything that is older than a week using api.search(). This is also shown in the docs.
Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.
There are other ways of getting older tweets; this post details those options.
Streaming API
While it doesn't sound like you're using twitter's streaming API, it should be noted that this only gives a small sample of twitter's current tweet traffic (~1-2%).
Hopefully this is helpful. Let me know if you have any questions.

How do I make sure a twitter bot doesn't retweet the same tweet multiple times?

I'm writing a simple Twitter bot in Python and was wondering if anybody could answer and explain the question for me.
I'm able to make Tweets, but I haven't had the bot retweet anyone yet. I'm afraid of tweeting a user's tweet multiple times. I plan to have my bot just run based on Windows Scheduled Tasks, so when the script is run (for example) the 3rd time, how do I get it so the script/bot doesn't retweet a tweet again?
To clarify my question:
Say that someone tweeted at 5:59pm "#computer". Now my twitter bot is supposed to retweet anything containing #computer. Say that when the bot runs at 6:03pm it finds that tweet and retweets it. But then when the bot runs again at 6:09pm it retweets that same tweet again. How do I make sure that it doesn't retweet duplicates?
Should I create a separate text file and add in the IDs of the tweets and read through them every time the bot runs? I haven't been able to find any answers regarding this and don't know an efficient way of checking.
You should store the timestamp of the latest tweet you processed somewhere. That way you won't go through the same tweets twice, and so won't retweet a tweet twice.
This should also make tweet processing faster (because you only process each tweet once).
I wrote a Twitter bot in Python a few months ago and this link helped a lot. I also used this GitHub repo which, although it is in Ruby, was quite helpful for the logic flow. The repo uses an approach similar to the one you mentioned: it creates a local datastore of previous retweets and compares each tweet against it.
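A minimal sketch of such a local datastore, assuming the bot keeps the IDs of tweets it has already retweeted in a JSON file between scheduled runs. The file name and the helper names load_seen, save_seen and select_new are hypothetical, and the tweet IDs are made up.

```python
import json
import os
import tempfile


def load_seen(path):
    """Load the set of already retweeted IDs, or an empty set on the first run."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_seen(path, seen):
    """Persist the IDs so the next scheduled run can skip them."""
    with open(path, "w") as f:
        json.dump(sorted(seen), f)


def select_new(candidate_ids, seen):
    """Return only the candidate IDs that have not been retweeted yet."""
    return [tweet_id for tweet_id in candidate_ids if tweet_id not in seen]


# Simulate two runs of the bot against a temporary store.
store = os.path.join(tempfile.mkdtemp(), "retweeted_ids.json")
seen = load_seen(store)
first_run = select_new([101, 102], seen)
seen.update(first_run)
save_seen(store, seen)

second_run = select_new([101, 102, 103], load_seen(store))
```

A set lookup keeps the check cheap even when the file grows; storing only the highest processed ID would be a lighter alternative if tweets are always handled in order.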
This is how I did it. I grabbed the list of things to retweet and a list of my feed. I cut the lists down to only posts within the past 24 hours. Then for each item in retweetable I check to see if it's in my feed list. If not I post RT #user retweet content.
I also wrote a function to chop the str down to 140 chars (137 + '...')
E.G.
TO_RT = 'a post to post'
MYTWT = ('old post', 'other old post')
if TO_RT not in MYTWT:
    Tweet(TO_RT)  # Tweet() stands for whatever function posts the tweet
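The truncation helper mentioned above (137 characters plus '...') is not shown in the answer; a sketch of what it might look like follows. The name truncate_tweet and the default limit parameter are assumptions.

```python
def truncate_tweet(text, limit=140):
    """Cut text to at most `limit` characters, ending in '...' when shortened."""
    if len(text) <= limit:
        return text
    return text[:limit - 3] + "..."


short = truncate_tweet("a post to post")
long_text = truncate_tweet("x" * 200)
```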
Twitter does not let you retweet the same thing more than once. If your bot tries to retweet a tweet it has already retweeted, the API responds with an Error 403. You can test this by reducing the time between runs of the script to about a minute; since the feed of tweets will not have changed, the second run will trigger the Error 403.
