Tweepy StreamingClient save json to file - python

I'd like to save Tweepy stream tweets to a .txt file in JSON format. According to the documentation it should be possible to set return_type=dict with StreamingClient.
With the following code I get: TypeError: Object of type Tweet is not JSON serializable. Maybe I would need to set the parameter return_type=dict in the superclass? After a lot of trying I haven't been able to make it work. I'd be very grateful for any help!
import tweepy
import json
from tweepy import StreamingClient, StreamRule

class TweetPrinter(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        with open("fetched_tweets.txt", "a") as f:
            f.write(json.dumps(tweet, indent=4))
        return True

printer = TweetPrinter(bearer_token=bearer_token, return_type=dict)  # I don't get dict as output.
rule = StreamRule(value="Python")
printer.add_rules(rule)
printer.filter(expansions=['author_id', 'geo.place_id'], tweet_fields="created_at")
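One way that seems to work (a sketch, assuming tweepy 4.x, where the Tweet object exposes its parsed payload as tweet.data and StreamingClient hands the raw JSON to on_data) is to serialize that underlying dict instead of the Tweet object itself:

import json
import tweepy

class TweetSaver(tweepy.StreamingClient):
    # Option 1: serialize the dict behind the Tweet object.
    def on_tweet(self, tweet):
        with open("fetched_tweets.txt", "a") as f:
            f.write(json.dumps(tweet.data, indent=4))
            f.write("\n")

    # Option 2: write the raw payload the stream delivers, untouched.
    # def on_data(self, raw_data):
    #     data = raw_data.decode("utf-8") if isinstance(raw_data, bytes) else raw_data
    #     with open("fetched_tweets.txt", "a") as f:
    #         f.write(data + "\n")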

Related

Python for loop API request

I am extracting data from this API
I was able to save the JSON file on my local machine.
I want to run the requests for several stocks.
How do I do it?
I tried to play with for loops but nothing good came out of it. I attached the code below.
The output is:
AAPL
[]
TSLA
[]
Thank you, Tal
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

import requests
import json
import time

def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.

    Parameters
    ----------
    url : str

    Returns
    -------
    dict
    """

stock_symbol = ["AAPL", "TSLA"]

for symbol in stock_symbol:
    print(symbol)
    # Sending the API request
    r = requests.get('https://financialmodelingprep.com/api/v3/income-statement/symbol={stock_symbol}?limit=120&apikey={removed by me}')
    packages_JSON = r.json()
    print(packages_JSON)

    # Exporting the data into a JSON file
    with open('stocks_data321.json', 'w', encoding='utf-8') as f:
        json.dump(packages_JSON, f, ensure_ascii=False, indent=4)
Querying multiple APIs iteratively will take a lot of time. Consider using threading or AsyncIO to do the requests simultaneously and speed up the process.
In a nutshell, you should do something like this for each API:
import threading

for provider in [...]:  # list of APIs to query
    t = threading.Thread(target=api_request_function, args=(provider, ...))
    t.start()
However, you'd better read this great article first to understand the whats and whys of the threading approach.
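For context, a minimal sketch of that idea applied to the stock loop above (the helper fetch_symbol, the path-style endpoint and the YOUR_API_KEY placeholder are assumptions here, not part of the original code):

import json
import threading
import requests

def fetch_symbol(symbol, results):
    # Hypothetical helper: fetch one symbol and stash the parsed JSON.
    url = f"https://financialmodelingprep.com/api/v3/income-statement/{symbol}?limit=120&apikey=YOUR_API_KEY"
    results[symbol] = requests.get(url).json()

results = {}
threads = []
for symbol in ["AAPL", "TSLA"]:
    t = threading.Thread(target=fetch_symbol, args=(symbol, results))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # wait for every request to finish before writing

with open("stocks_data321.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=4)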

TypeError when importing json data with pymongo

I am trying to import JSON data from a link containing valid JSON into MongoDB.
When I run the script I get the following error:
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
What am I missing here or doing wrong?
import pymongo
import urllib.parse
import requests
replay_url = "http://live.ksmobile.net/live/getreplayvideos?"
userid = 769630584166547456
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'
print(f"Replay url: {url2}")
raw_replay_data = requests.get(url2).json()
uri = 'mongodb://testuser:password@ds245687.mlab.com:45687/liveme'
client = pymongo.MongoClient(uri)
db = client.get_default_database()
replays = db['replays']
replays.insert_many(raw_replay_data)
client.close()
I see that you are getting the video information data for 22 videos.
You can use:
replays.insert_many(raw_replay_data['data']['video_info'])
to save them.
You can make one of the fields the _id for the MongoDB document.
Use the following lines before insert_many:
for i in raw_replay_data['data']['video_info']:
    i['_id'] = i['vid']
This will make the 'vid' field your '_id'. Just make sure that 'vid' is unique for all videos.
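Putting the two suggestions together, a minimal sketch (same URL, credentials and collection as in the question) could look like this:

import urllib.parse
import pymongo
import requests

replay_url = "http://live.ksmobile.net/live/getreplayvideos?"
userid = 769630584166547456
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'

raw_replay_data = requests.get(url2).json()
videos = raw_replay_data['data']['video_info']  # list of dicts, one per video

# Reuse each video's 'vid' as the MongoDB _id (assumes 'vid' is unique).
for video in videos:
    video['_id'] = video['vid']

client = pymongo.MongoClient('mongodb://testuser:password@ds245687.mlab.com:45687/liveme')
db = client.get_default_database()
db['replays'].insert_many(videos)
client.close()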

Python: Automatically updating Twitch moderator list using API

I'm trying to auto update a moderator list using this API:
https://tmi.twitch.tv/group/user/ice3lade/chatters
I am accessing and storing it with
from urllib.request import urlopen
response = urlopen('https://tmi.twitch.tv/group/user/ice3lade/chatters')
chatlist = response.read()
But attempting to simply use it as a dictionary, e.g.
print(chatlist("chatters"))
returns an error:
TypeError: 'bytes' object is not callable
I'm a total Python noob so any help is appreciated. How do I either access this as a dictionary directly from the API, or store the data I get from reading the API as a proper dictionary?
Made a fairly reasonable solution: chatlist gives the full dictionary, chatters gives all the keys and values within the chatters dictionary, and moderators gives the list of moderators.
from urllib.request import urlopen
from json import loads
response = urlopen('https://tmi.twitch.tv/group/user/xflixx_teampokerstars/chatters')
readable = response.read().decode('utf-8')
chatlist = loads(readable)
chatters = chatlist['chatters']
moderators = chatters['moderators']
Didn't know the json module was required to decode the API response.

JSON serialization Error in Python 3.2

I am using the JSON library and trying to import a page feed to a CSV file. I tried many ways to get the result, however every time the code executes it gives "JSON not serializable". Note: Facebook uses an auth code, which I have and used, so the connection string will change; however, if you use a page whose privacy is public you will still be able to get the result from the code below.
Following is the code:
import urllib3
import json
import requests
#from pprint import pprint
import csv
from urllib.request import urlopen

page_id = "abcd"  # username or id
api_endpoint = "https://graph.facebook.com"
fb_graph_url = api_endpoint + "/" + page_id

try:
    #api_request = urllib3.Requests(fb_graph_url)
    #http = urllib3.PoolManager()
    #api_response = http.request('GET', fb_graph_url)
    api_response = requests.get(fb_graph_url)
    try:
        #print (list.sort(json.loads(api_response.read())))
        obj = open('data', 'w')
        # write(json_dat)
        f = api_response.content
        obj.write(json.dumps(f))
        obj.close()
    except Exception as ee:
        print(ee)
except Exception as e:
    print(e)
Tried many approaches but was not successful. Hope someone can help.
api_response.content is the raw text content of the API response, not a Python object, so you won't be able to dump it.
Try either:
f = api_response.content
obj.write(f)
Or
f = api_response.json()
obj.write(json.dumps(f))
requests.get(fb_graph_url).content
is probably a string. Using json.dumps on it won't work. This function expects a list or a dictionary as the argument.
If the request already returns JSON, just write it to the file.
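In other words, a minimal sketch of that last suggestion (same fb_graph_url as in the question) would be to write the response text straight to the file:

api_response = requests.get(fb_graph_url)

# The body is already JSON text, so no json.dumps is needed.
with open('data', 'w') as obj:
    obj.write(api_response.text)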

urllib2.urlopen not getting all content

I am a beginner in Python trying to pull some data from reddit.com.
More precisely, I am trying to send a request to http://www.reddit.com/r/nba/.json to get the JSON content of the page and then parse it for entries about a specific team or player.
To automate the data gathering, I am requesting the page like this:
import urllib2
FH = urllib2.urlopen("http://www.reddit.com/r/nba/.json")
rnba = FH.readlines()
rnba = str(rnba[0])
FH.close()
I am also pulling the content like this on a copy of the script, just to be sure:
import requests
FH = requests.get("http://www.reddit.com/r/nba/.json", timeout=10)
rnba_json = FH.json()
FH.close()
However, I am not getting the full data that is presented when I manually go to
http://www.reddit.com/r/nba/.json with either method, in particular when I call
print len(rnba_json['data']['children']) # prints 20-something child stories
but when I load the copy-pasted JSON string like this:
import json
import urllib2
fh = r"""{"kind": "Listing", "data": {"modhash": ..."""# long JSON string
r_nba = json.loads(fh) #loads the json string from the site into json object
print len(r_nba['data']['children']) #prints upwards of 100 stories
I get more story links. I know about the timeout parameter but providing it did not resolve anything.
What am I doing wrong or what can I do to get all the content presented when I pull the page in the browser?
To get the max allowed, you'd use the API like: http://www.reddit.com/r/nba/.json?limit=100
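For example, a quick sketch with requests (the custom User-Agent is an assumption; reddit tends to throttle the default one):

import requests

# Ask the listing endpoint for up to 100 stories instead of the default page size.
resp = requests.get("http://www.reddit.com/r/nba/.json",
                    params={"limit": 100},
                    headers={"User-Agent": "my-script/0.1"},
                    timeout=10)
rnba_json = resp.json()
print(len(rnba_json["data"]["children"]))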
