I'm facing the following problem. I used tweepy to collect more than 10,000 tweets, then used NLTK naive Bayes classification to filter them down to a little over 5,000.
I want to generate a graph of user friendships from those 5,000 classified tweets. I can check each friendship with tweepy.api.show_friendship(), but it takes a very long time and sometimes ends in an endless rate-limit error.
Is there any way I can check the friendships more efficiently?
I don't know much about the limits with Tweepy, but you can always write a basic web scraper with urllib and BeautifulSoup to do so.
You could use a website such as www.doesfollow.com, which does what you are trying to do. (I'm not sure about request limits on this page, but there are dozens of other websites that do the same thing.) This website is interesting because the URL is very simple.
For example, in order to check if Google and Twitter are "friends" on Twitter, the link is simply www.doesfollow.com/google/twitter.
This would make it very easy for you to run through the users as you can just append the users to the url such as 'www.doesfollow.com/'+ user1 + '/' + user2
The results page of doesfollow has this tag if the users are friends on Twitter:
<div class="yup">yup</div>,
and this tag if the users are not friends on Twitter:
<div class="nope">nope</div>
So you could parse the page source code and search to find which of those tags exist to determine if the users are friends on Twitter.
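A minimal sketch of that check, assuming the result markup is exactly the two tags shown above. It uses the standard library's html.parser in place of BeautifulSoup so it runs without extra installs. The names build_check_url and parse_follow_result are hypothetical, and the https:// scheme is an assumption. Fetching the page (for example with urllib.request.urlopen) is left out because the site's request limits are unknown.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class _FollowResultParser(HTMLParser):
    """Records whether a <div class="yup"> or <div class="nope"> was seen."""

    def __init__(self):
        super().__init__()
        self.result = None  # True, False, or None if neither tag was found

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.result is not None:
            return
        classes = (dict(attrs).get("class") or "").split()
        if "yup" in classes:
            self.result = True
        elif "nope" in classes:
            self.result = False


def build_check_url(user1, user2):
    # Assumption: the site is reachable over https with this path layout.
    return "https://www.doesfollow.com/" + quote(user1) + "/" + quote(user2)


def parse_follow_result(page_html):
    """Return True for 'yup', False for 'nope', None if the page has neither."""
    parser = _FollowResultParser()
    parser.feed(page_html)
    return parser.result
```

Returning None when neither tag is present lets the caller tell "not friends" apart from an error page or a changed layout.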
This might not be the way that you wanted to approach the problem, but it's a possibility. I'm not entirely sure how to approach the graphing part of your question though. I'd have to look into that.
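One possible starting point for the graph part, as a rough sketch only: collect the results into a plain adjacency dictionary and hand that to whatever plotting tool you end up choosing. Here build_follow_graph is a hypothetical helper, and follows is a placeholder for any function that answers "does a follow b?" (a scraper, an API call, or a cache).

```python
from itertools import permutations


def build_follow_graph(users, follows):
    """Return {user: set of users they follow}, using follows(a, b) -> bool."""
    unique_users = list(dict.fromkeys(users))  # drop duplicates, keep order
    graph = {user: set() for user in unique_users}
    # Every ordered pair is checked because following is directional.
    for a, b in permutations(unique_users, 2):
        if follows(a, b):
            graph[a].add(b)
    return graph
```

Note that this makes n * (n - 1) checks for n users (9,900 checks for 100 users), so caching results and reducing the user list first matters a lot whichever lookup you use.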
I am currently trying to scrape Twitter for NLP research. I have already used snscrape to get tweets with the required filters; the issue is that we need tweets from users in a specific age range. I assume some Twitter profiles have their birthdate public, so maybe we can fetch that, perhaps by scraping it from the profile? Any ideas are welcome.
So far I have tried some web scraping methods but can't find anything concrete.
Twitter has a pretty well documented API that works very well with Python.
Try to make a simple crawler and see one of the JSONs that you get for a Tweet/User.
You will need to sign up and get some Access Tokens/Keys to use in your script, but other than that you are ready to go:
https://developer.twitter.com/en/docs/twitter-api
The age of Twitter users is not made available via the API (and the website may show birthday but not year). There are also a number of other factors you should read about in relation to analysis of Twitter user data.
I want to find/get username of all the Instagram influencers in the world who have more than 10k followers.
I have an idea in mind: search for a hashtag on Instagram and retrieve all the usernames that have posted to that tag. Then, for every unique username, check whether they have more than 10k followers.
Any suggestions for reaching this goal, please?
You can do this with Selenium, BeautifulSoup, and requests, or you can just use Instagram's APIs. I won't cover the API route; the official API documentation can help you with that. So let's go with scraping.
First, use Selenium to log in and search for the hashtag. Once you have the hashtag search results, you can scrape the user IDs with BeautifulSoup. With the user IDs in hand, do the main scraping: visit each user's profile and check whether their follower count is more than 10,000. If it is, save the user or do whatever you need; if not, move on to the next profile. I am not going to write any code; do your own research and write the code yourself. I am going to share some links that may help you solve your problem.
From this article, you can do a hashtag search on Instagram using Selenium and get user tags. Then use requests and BeautifulSoup to get the follower counts. This may help you with that. If going through the user profiles turns out to be slow, you can use threading or multiprocessing. But complete all the other steps first and only then move on to threading, because writing the scraping code is a bit tricky, and adding threading is much easier than the scraping itself.
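To illustrate only the last two steps (deduplicating usernames and checking follower counts in parallel threads), here is a hedged sketch. It does no scraping itself: get_follower_count is a hypothetical function you would write with requests and BeautifulSoup, and filter_by_followers, minimum, and workers are names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_by_followers(usernames, get_follower_count, minimum=10_000, workers=8):
    """Return the unique usernames whose follower count is above `minimum`.

    `get_follower_count(username) -> int` is supplied by the caller; it is
    where the actual profile scraping would happen.
    """
    unique = sorted(set(usernames))  # each profile is fetched only once
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as `unique`
        counts = list(pool.map(get_follower_count, unique))
    return [user for user, count in zip(unique, counts) if count > minimum]
```

Threads suit this step because each profile check mostly waits on the network. Keep workers small, since too many parallel requests is a quick way to get blocked.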
I have answered this based on your idea:
I want to find/get username of all the Instagram influencers in the world who have more than 10k followers
But doing this is much harder than you might think. There could be tens of thousands of users who have 10,000 followers, and even if you only want the usernames of accounts with 1M+ followers, it is still very hard. You could instead scrape a website that lists the top Instagram accounts by followers. That will make your task easier.
Don't forget to mark this as the answer if it helps solve your problem.
I'd like to build a Twitter bot in Python that automatically posts product info (product image (optional), product name, short description, shortened URL) defined in a CSV file or similar, at a set interval.
I've been researching how to do this but have had no luck so far.
I'd appreciate a link to a good reference site or any pointers on how to achieve this, if it's possible at all. Thanks!
I’m looking into web scraping/crawling possibilities and have been reading up on Scrapy. Does anyone know whether it’s possible to put instructions in the script so that, once it has visited the URL, it can choose pre-selected dates from a calendar on the website?
End result is for this to be used for price comparisons on sites such as Trivago. I’m hoping I can get the program to select certain criteria such as dates once on the website like a human would.
Thanks,
Alex
In theory, for a website like Trivago you can set the dates you want to query through the URL, but you will need to research user agents and proxies, because otherwise your IP will get blacklisted really fast.
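A small sketch of the "dates in the URL" idea, using only the standard library. The parameter names checkin and checkout and the example.com address are placeholders, not Trivago's real query format; you would need to inspect the real site's URLs in your browser to find the actual names.

```python
from datetime import date
from urllib.parse import urlencode


def build_search_url(base_url, check_in, check_out, **extra_params):
    """Build a search URL with the stay dates encoded as query parameters."""
    params = {
        "checkin": check_in.isoformat(),    # hypothetical parameter name
        "checkout": check_out.isoformat(),  # hypothetical parameter name
    }
    params.update(extra_params)
    return base_url + "?" + urlencode(params)
```

For example, build_search_url("https://example.com/search", date(2024, 1, 5), date(2024, 1, 7)) gives "https://example.com/search?checkin=2024-01-05&checkout=2024-01-07", which a Scrapy spider could then request directly instead of clicking through a calendar widget.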
My younger brother, who still lives in China, is a fan of Michael Phelps and wants to see his Twitter posts. They can't access Twitter behind the GFW, and setting up a VPN is too hard for my mom, so I want to write something that grabs the tweets and sends them to my mom's email.
I use Python as my main language and am familiar with tweepy, requests, and scrapy.
I have tried or thought about three ways of doing this:
1. Use the Twitter API and grab the user_timeline. However, this method loses all graphical data and produces a bunch of useless links that only make sense after proper rendering.
2. Scrape the web page and save the HTML content, then send the HTML file as an attachment. However, this method still loses some graphical content and is not that user friendly for someone in her 40s. In addition, it will be hard to tell how many tweets I have scraped and whether there are any updates.
3. Wrap the HTML content in the email body and rely on HTML rendering within the email. I haven't worked with this before, so I am not sure how it is going to work out.
I am aware that "what's the best way to do this" questions are usually downvoted on SO, but I believe this problem is specific enough to prompt meaningful Q&A. Any suggestions will be appreciated.
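Have you thought of using Selenium and taking screenshots of the browser window? Taking a screenshot with Selenium is as easy as
from selenium import webdriver
browser = webdriver.Firefox()  # assumption: Firefox and its driver are installed
browser.get('https://twitter.com')  # the URL needs a scheme
browser.get_screenshot_as_file('twitter_screenshot.png')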
You'd have to figure out a way to automate both watching for new tweets and running the Selenium script when a new tweet is found. In terms of preserving graphical content, though, taking screenshots with Selenium would be simple to implement.
Docs: http://selenium-python.readthedocs.io/api.html#selenium.webdriver.remote.webdriver.WebDriver.get_screenshot_as_file
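To connect the screenshot idea back to the email part of the question, here is a hedged sketch of putting a PNG inline in an HTML email with the standard library's email package. build_screenshot_email is a hypothetical helper, and the subject line and addresses are placeholders. Sending is left out; it would typically go through smtplib with your own mail provider's settings.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_screenshot_email(png_bytes, sender, recipient):
    """Return an EmailMessage showing the PNG inline, with a plain-text fallback."""
    msg = EmailMessage()
    msg["Subject"] = "New tweets"  # placeholder subject
    msg["From"] = sender
    msg["To"] = recipient
    # Fallback for mail clients that do not render HTML.
    msg.set_content("This message contains a screenshot; please view it as HTML.")
    image_cid = make_msgid()  # looks like "<...@host>"
    # The HTML part refers to the image by its Content-ID, without the <>.
    msg.add_alternative(
        '<html><body><img src="cid:{}"></body></html>'.format(image_cid[1:-1]),
        subtype="html",
    )
    # Attach the image to the HTML part so it is displayed inline.
    msg.get_payload()[1].add_related(png_bytes, "image", "png", cid=image_cid)
    return msg
```

Usage would be along the lines of reading twitter_screenshot.png in binary mode and passing the bytes in. Whether the inline image displays depends on the recipient's mail client, so test it with the one your mom actually uses.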