Retrieving data from Twitter with Python

I am trying to build an app where users can connect, enter a keyword to search for on Twitter, and have the results stored in a database. From the moment the user enters a keyword, I want to keep track of what is being said on Twitter. Those results will be further analyzed, and some statistics will be presented to the user.
So far I have used tweepy and the Twitter Streaming API to get the tweets. But I realized that I cannot have more than one open streaming connection (to search for multiple keywords in parallel).
I searched Stack Overflow and found solutions such as disconnecting, reconnecting, and then searching with a new keyword, but in that case I am going to lose data.
I also checked the Twitter API, which allows at most 450 search requests per 15 minutes:
https://dev.twitter.com/docs/rate-limiting/1.1/limits
Stream API:
- The public stream doesn't allow more than one connection
- Site streams don't offer search
The Firehose API is not an option since it is too expensive.
How can I solve this problem? I see many apps searching live for more than one keyword at a time. Has anyone run into this before?

You could use tweepy to collect all tweets from the sample or filter streaming endpoint and save them to a database. Then query the database to return only the tweets that match a given search term.
If you don't want tweets to persist for too long, you might have better results with a NoSQL database like Redis and an expiration timestamp, so the store doesn't grow indefinitely.
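As a rough illustration of that idea, the sketch below routes each incoming tweet to the keywords it mentions and keeps entries only for a limited time. It is not tied to tweepy or Redis: `matching_keywords` and `ExpiringTweetStore` are hypothetical names, the store is an in-memory stand-in for a database with expiring entries, and plain substring matching is an assumption that only approximates how the streaming API matches tracked terms. In a real app, the `add` call would sit inside whatever callback receives tweets from the single open stream.

```python
import time


def matching_keywords(text, keywords):
    """Return the tracked keywords that occur in the tweet text (case-insensitive)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


class ExpiringTweetStore:
    """In-memory stand-in for a store whose entries expire after a TTL."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._by_keyword = {}  # keyword -> list of (expires_at, text)

    def add(self, text, keywords, now=None):
        """File the tweet under every tracked keyword it mentions."""
        now = time.time() if now is None else now
        for kw in matching_keywords(text, keywords):
            entry = (now + self.ttl_seconds, text)
            self._by_keyword.setdefault(kw, []).append(entry)

    def get(self, keyword, now=None):
        """Return the unexpired tweets for one keyword, dropping expired ones."""
        now = time.time() if now is None else now
        live = [(exp, t) for exp, t in self._by_keyword.get(keyword, []) if exp > now]
        self._by_keyword[keyword] = live
        return [t for _, t in live]


# Example: one stream feeds two users' keywords at once.
store = ExpiringTweetStore(ttl_seconds=3600)
tracked = ["python", "django"]
store.add("Learning Python today", tracked, now=0)
store.add("Django and Python together", tracked, now=10)
print(store.get("python", now=20))    # ['Learning Python today', 'Django and Python together']
print(store.get("django", now=4000))  # [] (the entry expired after one hour)
```

Because all keywords share one connection, adding a keyword for a new user only changes the tracked list rather than opening a second stream.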

Related

Is there a way to programmatically check the status of Google Workspace apps via an API?

Google has a Google Workspace Status Dashboard where they indicate whether any of their core services are experiencing an outage or not.
Accordingly, I would like to be able to fetch the status for a particular service. For example, I would like to check whether Gmail has one of the following statuses:
Available
Service disruption
Service outage
I would like to make an API call in Python that would retrieve the status and allow me to perform an action according to the current status.
Is there a way I can achieve this?
I found some documentation here but I'm still trying to figure out how I can do it.
The problem with the API is that it does not work directly with the Dashboard. Instead, it works with information from the Google Workspace Alert Center, so you first need to set up an alert and then pull that alert's data through the API. The alert is only triggered when a service disruption or outage is reported in the Dashboard, so the API returns nothing while the service is Available; you only get data when there is an outage.
As mentioned by Bijay Regmi and in the official documentation, I think the best option would be to subscribe to the Status Dashboard RSS feed or use the JSON feeds.
With Python you could also build an RSS reader to pull that information more conveniently; this other Stack Overflow post can serve as a reference on how to build it.
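A minimal sketch of the JSON feed route might look like the following. Everything specific here is an assumption to verify against the Status Dashboard documentation: the feed URL, and the field names `service_name`, `status_impact` and `end`, as well as the impact values `SERVICE_DISRUPTION` and `SERVICE_OUTAGE`. The idea is simply that a service with no open incident is treated as Available.

```python
import json
import urllib.request

# Assumed feed location; confirm it in the Status Dashboard documentation.
INCIDENTS_URL = "https://www.google.com/appsstatus/dashboard/incidents.json"


def fetch_incidents(url=INCIDENTS_URL):
    """Download and parse the incidents feed (expected to be a JSON list)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def current_status(incidents, service_name):
    """Map the open incidents of one service to a dashboard-style status.

    Assumes each incident is a dict with 'service_name', 'status_impact'
    and an 'end' value that is missing or empty while the incident is open.
    """
    open_impacts = [
        inc.get("status_impact")
        for inc in incidents
        if inc.get("service_name") == service_name and not inc.get("end")
    ]
    if "SERVICE_OUTAGE" in open_impacts:
        return "Service outage"
    if "SERVICE_DISRUPTION" in open_impacts:
        return "Service disruption"
    return "Available"


# Offline example with made-up incidents in the assumed shape.
sample = [
    {"service_name": "Gmail", "status_impact": "SERVICE_DISRUPTION", "end": None},
    {"service_name": "Google Drive", "status_impact": "SERVICE_OUTAGE",
     "end": "2023-01-01T10:00:00+00:00"},
]
print(current_status(sample, "Gmail"))         # Service disruption
print(current_status(sample, "Google Drive"))  # Available
```

In a live script you would call `current_status(fetch_incidents(), "Gmail")` on a timer and branch on the returned string.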

Is there a way to retrieve Google Analytics 4 data on a schedule using Node.js?

This is what I want to achieve:
1. Ask the user to authorize the collection of their data on a Google Analytics 4 property (or Universal Analytics, but I would rather not)
2. Programmatically retrieve and store the data every n hours
I was able to do (1) client-side by asking for authorization with Google's OAuth2 and making a call to Reporting API v4 https://developers.google.com/analytics/devguides/reporting/core/v4 using gapi on the front-end.
However, I'm not sure how to do it on a schedule without user interaction. I've searched Google's API docs and I believe there's a way to do it in python https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py but I am currently limited to Node and the browser. I guess I could make a server in python that does the data fetching and connects with the Node application, but that's yet another layer of complications that I'm trying to avoid. Is there a way to do everything in Node?
GCP APIs are all documented in a way that allows anyone to generate client libraries in a variety of languages, including Node.js. The documentation for the Node.js client for Analytics Reporting is here.
As for scheduling this on GCP, I would recommend using Cloud Scheduler. It will hit an endpoint running on Cloud Run, which will do the actual work. Alternatively, if you already have a service running somewhere else, you can simply add the required endpoints there and point Cloud Scheduler to it.
The overall design I would suggest goes something like this:
- Build a site which takes the user through the OAuth2 login process, requesting the Google Analytics Reporting API scopes required to make the request.
- Store the obtained credentials in a user database (preferably Firestore in Datastore mode).
- Set up a Cloud Run service (or anything else) with two endpoints:
  - Iteration endpoint: iterates through the list of users and adds tasks to Cloud Tasks to hit the download endpoint for each one.
  - Download endpoint: takes a user ID (e.g. as a query parameter) and performs the download for this user. You will need to load the credentials for the user from the database and use them to access the Reporting API.
- Store the downloaded data in the desired location, e.g. Cloud Storage, Firestore, Cloud SQL, etc.
- Set up Cloud Scheduler to hit the iteration endpoint at the desired frequency.
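The split between the iteration and download endpoints described above can be sketched without any cloud services. The example below is only a shape to follow, written in Python to keep it short; the same structure carries over to Node. All names are hypothetical: a dict stands in for the user database, a deque for Cloud Tasks, and `fake_fetch_report` for the actual Reporting API call.

```python
from collections import deque

# Hypothetical stand-ins for the user database, the task queue and the storage.
user_credentials = {"user-1": "token-a", "user-2": "token-b"}
task_queue = deque()
downloaded = {}


def iteration_endpoint():
    """Triggered by the scheduler: enqueue one download task per user."""
    for user_id in user_credentials:
        task_queue.append(user_id)
    return len(task_queue)


def download_endpoint(user_id, fetch_report):
    """Triggered once per task: load credentials, fetch, store the result."""
    credentials = user_credentials[user_id]
    downloaded[user_id] = fetch_report(credentials)


def fake_fetch_report(credentials):
    """Placeholder for the real reporting call made with the user's credentials."""
    return {"fetched_with": credentials}


# One scheduler tick, followed by the queue delivering each task.
iteration_endpoint()
while task_queue:
    download_endpoint(task_queue.popleft(), fake_fetch_report)
print(sorted(downloaded))  # ['user-1', 'user-2']
```

Keeping the per-user work in its own endpoint means one slow or failing download does not block the others, and the queue can retry it independently.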
For the GCP services mentioned above (basically everything other than Analytics), you may use the "cloud" clients for Node.js, which are available here.
Note: The question you have asked is very broad, and this answer is just a suggestion. You may consider other designs, whichever works best for you.

Facebook's Graph API's request limit on a locally run program? How to get specific data in real time without reaching it?

I've been writing a program in Python which needs the number of likes of a specific Facebook page in real time. The program itself works, but it's based on a loop that constantly requests the number of likes and stores it in a variable, and I'm afraid that this way it will soon reach the API's request limit.
I read that Graph API's request limit per user for an application is 200 requests per hour. Is a locally run program like this one considered an application with one user, or something else?
Also, I read that some users say the API can handle 600 requests per 600 seconds without returning an error. Does this still apply? (Could I, for example, delay the loop for one second and still be able to make all the requests?) If not, is there a solution to get that info in real time in a local program? (I saw that Graph can send you updates with a POST to a specified URL, but is there a way to receive those updates without owning a URL? Maybe a way to renew the token or something?) I need to have this program running for almost a whole day, so not being rejected by the API is quite important.
Sorry if it sounds silly or anything, this is the first time I'm using the Graph API (and a web-based API in general).
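For reference, the two limits quoted in the question translate into very different polling delays. The sketch below only does that arithmetic and shows a paced loop; it does not confirm which limit actually applies. `min_interval_seconds` and `poll` are hypothetical helper names, and `fetch` stands for whatever function requests the like count.

```python
import time


def min_interval_seconds(max_requests, window_seconds):
    """Smallest delay between requests that stays within the given budget."""
    return window_seconds / max_requests


def poll(fetch, interval_seconds, iterations, sleep=time.sleep):
    """Call fetch() a fixed number of times, pausing between calls."""
    results = []
    for i in range(iterations):
        results.append(fetch())
        if i < iterations - 1:
            sleep(interval_seconds)
    return results


print(min_interval_seconds(200, 3600))  # 18.0 seconds for 200 requests per hour
print(min_interval_seconds(600, 600))   # 1.0 second for 600 requests per 600 seconds
```

Under the 200 per hour figure, a one-second delay would exhaust the budget in a little over three minutes, so the loop would need roughly an 18 second pause instead.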

Real-time timeline function like tweetdeck?

I'm creating an app using python/tweepy.
I'm able to use the StreamListener to get a real-time "timeline" when specifying a "topic" or #, $, etc.
Is there a way to have a continuous real-time timeline for a given user ID, similar to TweetDeck or an embedded widget for a website?
When using api.user_timeline I only receive the 20 most recent tweets.
Any thoughts?
Tweepy is a Python library, so there's really no way to use it directly on a website. You could send the data from the stream to a browser using WebSockets or AJAX long polling, however.
There's no way to have the Userstream endpoint send tweets to you any faster; all of that happens on Twitter's end. I'm not sure why TweetDeck would receive them faster, but if you're getting a slow speed, there's nothing that can be done except to contact Twitter.
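To make the long-polling option a bit more concrete, here is a small sketch of the server-side piece that sits between the stream and the browser. It assumes nothing about tweepy or any web framework: `TweetBuffer` is a hypothetical class, the stream callback would call `publish`, and the HTTP handler for the browser's poll request would call `wait_for_new` with the last sequence number the page has seen.

```python
import threading


class TweetBuffer:
    """Holds recent tweets and lets pollers block until something new arrives."""

    def __init__(self, max_items=200):
        self._items = []  # list of (sequence number, tweet text)
        self._next_seq = 1
        self._max_items = max_items
        self._cond = threading.Condition()

    def publish(self, text):
        """Called from the stream side for each incoming tweet."""
        with self._cond:
            self._items.append((self._next_seq, text))
            self._next_seq += 1
            del self._items[:-self._max_items]  # keep only the newest entries
            self._cond.notify_all()

    def wait_for_new(self, after_seq, timeout):
        """Block up to `timeout` seconds, then return tweets newer than after_seq."""
        with self._cond:
            self._cond.wait_for(
                lambda: bool(self._items) and self._items[-1][0] > after_seq,
                timeout=timeout,
            )
            return [(seq, text) for seq, text in self._items if seq > after_seq]


buffer = TweetBuffer()
buffer.publish("first tweet")
buffer.publish("second tweet")
print(buffer.wait_for_new(after_seq=0, timeout=0.1))  # [(1, 'first tweet'), (2, 'second tweet')]
print(buffer.wait_for_new(after_seq=2, timeout=0.1))  # [] after the timeout passes
```

The browser repeats the request as soon as each response returns, passing the highest sequence number it received, which gives a near real-time timeline without a WebSocket.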

How to subscribe to real-time XMPP RSS feeds with Superfeedr

I'm trying to subscribe to feeds with Superfeedr, and I've got a python wrapper for XMPP up and running, and I'm receiving the dummy.xml successfully.
However, I don't quite understand how to add more sources. I've tried adding a few superfeedr.com/track/ feeds, but I get no new entries from them (though I do seem to get a confirmation of subscription).
I'd like to add as many real-time (non-polled) feeds as possible, perhaps by using PubSubHubbub servers.
I'd really appreciate some help with this. Where do I find such feeds? Can I subscribe to the whole superfeedr.com real-time feed just by adding /track/? Or will that only filter the feeds I'm already subscribed to? Also, as I'm subscribing from my XMPP.py client on my Amazon server, what exactly is my Subscriber URL (callback)?
Where do I go from here?
I'll add more info if needed, just let me know.
Superfeedr is an API which will help you gather data from feeds that you're supposed to curate yourself. So the whole process starts with you collecting a list of feeds to which you want to subscribe.
The Track API does not help you find feeds, but rather helps you build virtual feeds that match given criteria. For example, if you want any mention of 'stackoverflow' in any feed, you could use track for that. Think of it as RSS feeds for search results, but in real time (forward looking).
Finally, if you use XMPP, you don't need a callback URL, as callback URLs are part of the PubSubHubbub API.
