Retrieving all comments from a thread on Reddit

I’m new to APIs and working with JSON and would love some help here.
I know everything I’m trying to accomplish can be done using the PRAW library, but I’m trying to figure it out without PRAW.
I have a for loop that pulls post titles from a specific subreddit, puts all the post titles into a pandas data frame, and, once the limit is reached, sets the ‘after’ parameter to the last post ID so the loop repeats with the next batch.
Everything worked perfectly, but when I tried the same technique on a specific thread to gather its comments, the ‘after’ parameter didn’t grab the next batch.
I’m assuming ‘after’ works differently with threads than with a subreddit’s posts. I saw ‘more’ entries in the JSON, each with a list of IDs. Do I need to use these somehow? When I looked at the JSON for the thread, ‘after’ was ‘none’ even with the updated parameters.
Any idea what I need to change here? It’s probably something simple.
Working code for getting the subreddit posts with limit 5:
params = {"t": "day", "limit": 5}
for i in range(2):
    response = requests.get('https://oauth.reddit.com/r/stocks/new',
                            headers=headers, params=params)
    response = response.json()
    for post in response['data']['children']:
        name = post['data']['name']
        print('name', name)
    params['after'] = name
    print(params)
Giving the output:
name t3_lifixn
name t3_lifg68
name t3_lif6u2
name t3_lif5o2
name t3_lif3cm
{'t': 'day', 'limit': 5, 'after': 't3_lif3cm'}
name t3_lif26d
name t3_lievhr
name t3_liev9i
name t3_liepud
name t3_lie41e
{'t': 'day', 'limit': 5, 'after': 't3_lie41e'}
Code for the Reddit thread with limit 10
params = {"limit": 10}
for i in range(2):
    response = requests.get('https://oauth.reddit.com/r/wallstreetbets/comments/lgrc39/',
                            params=params, headers=headers)
    response = response.json()
    for post in response[1]['data']['children']:
        name = post['data']['name']
        print(name)
    params['after'] = name
    print(params)
Giving the output:
t1_gmt20i4
t1_gmzo4xw
t1_gmzjofk
t1_gmzjkcy
t1_gmtotfl
{'limit': 10, 'after': 't1_gmtotfl'}
t1_gmt20i4
t1_gmzo4xw
t1_gmzjofk
t1_gmzjkcy
t1_gmtotfl
{'limit': 10, 'after': 't1_gmtotfl'}
Even though the limit was set to 10, it only gave 5 IDs before continuing the loop. Also, rather than using the updated 'after' parameter, it just started again from the first comment.

I ended up figuring out how to do it. According to the documentation for Reddit's API, to pull more comments from a thread you have to compile a list of the IDs from the 'more' sections in the JSON. The comments form a nested tree, and a 'more' entry looks like the following:
{'kind': 'more', 'data': {'count': 161, 'name': 't1_gmuram8', 'id': 'gmuram8', 'parent_id': 't1_gmt20i4', 'depth': 1, 'children': ['gmuram8', 'gmt6mf6', 'gmubxmr', 'gmt63gl', 'gmutw5j', 'gmtpitn', 'gmtoec3', 'gmtnel0', 'gmt4p79', 'gmupqhx', 'gmv70rm', 'gmtu2sj', 'gmt2vc7', 'gmtmjai', 'gmtje0b', 'gmtkzzj', 'gmt93n5', 'gmtvsqa', 'gmumhat', 'gmuj73q', 'gmtor7c', 'gmuqcwv', 'gmt3lxe', 'gmt4l78', 'gmum9cm', 'gmt857f', 'gmtjrz3', 'gmu0qcl', 'gmt9t9i', 'gmt8jc7', 'gmurron', 'gmt3ysv', 'gmt6neb', 'gmt4v3x', 'gmtoi6t']}}
When making the GET request, you would use the following URL and format:
requests.get('https://oauth.reddit.com/api/morechildren/.json?api_type=json&link_id=t3_lgrc39&children=gmt20i4,gmuram8....etc')
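As a rough illustration of the approach described above, here is a minimal sketch that walks the comment tree, gathers the IDs listed in the 'more' entries, and builds query parameters for the morechildren request. The helper names (collect_more_ids, morechildren_params), the sample tree, and the default batch size of 100 are my own assumptions for illustration; the sketch also assumes that a comment's 'replies' field is either an empty string or a nested listing.

```python
def collect_more_ids(children):
    """Walk a list of comment-tree nodes and gather IDs listed in 'more' entries."""
    ids = []
    for node in children:
        data = node.get("data", {})
        if node.get("kind") == "more":
            ids.extend(data.get("children", []))
        else:
            replies = data.get("replies")
            # Assumption: 'replies' is "" when a comment has no replies,
            # otherwise a nested listing dict.
            if isinstance(replies, dict):
                ids.extend(collect_more_ids(replies["data"]["children"]))
    return ids


def morechildren_params(link_id, ids, chunk_size=100):
    """Yield one params dict per batch of IDs (chunk_size is an assumed limit)."""
    for start in range(0, len(ids), chunk_size):
        yield {
            "api_type": "json",
            "link_id": link_id,
            "children": ",".join(ids[start:start + chunk_size]),
        }


# Hypothetical, heavily trimmed tree in the shape of response[1]['data']['children'].
sample_children = [
    {"kind": "t1", "data": {"id": "gmt20i4", "replies": {
        "kind": "Listing", "data": {"children": [
            {"kind": "more", "data": {"children": ["gmuram8", "gmt6mf6"]}},
        ]}}}},
    {"kind": "t1", "data": {"id": "gmzo4xw", "replies": ""}},
    {"kind": "more", "data": {"children": ["gmubxmr"]}},
]

more_ids = collect_more_ids(sample_children)
batches = list(morechildren_params("t3_lgrc39", more_ids, chunk_size=2))
print(more_ids)
print(batches)
```

Each params dict could then be passed as params= to requests.get together with the same headers used in the question, instead of building the query string by hand.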

Related

JSON from (RIOT) API Formatted Incorrectly

I am importing JSON data into Python from an API and ran into the following decode error:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Looking at online examples, it's immediately clear my JSON data has ' where others have ".
Ideally, I'd like to know why it's being downloaded in this way. It seems highly likely it's an error on my end, not theirs.
I decided it should be easy to correct the JSON format, but I have failed here too. Please see the code below for how I obtain the JSON data and my attempt at fixing it.
#----------------------------------------
#---Read this only if you want to download
#---the data yourself.
#----------------------------------------
#Built from 'Towards Data Science' guide
#https://towardsdatascience.com/how-to-use-riot-api-with-python-b93be82dbbd6
#Must first have installed riotwatcher
#Info in my example is made up, I can't supply a real API code or
#I get in trouble. Sorry about this. You could obtain one from their website
#but this would be a lot of faff for what is probably a simple StackOverflow
#question
#If you were to get/have a key you could use the following information:
#<EUW> for region
#<Agurin> for name
#----------------------------------------
#---Code
#----------------------------------------
#--->Set Variables
#Get installed riotwatcher module for
#Python
import riotwatcher
#Import riotwatcher tools.
from riotwatcher import LolWatcher, ApiError
#Import JSON (to read the JSON API file)
import json
# Global variables
# Get new API from
# https://developer.riotgames.com/
api_key = 'RGAPI-XXXXXXX-XXX-XXXX-XXXX-XXXX'
watcher = LolWatcher(api_key)
my_region = 'MiddleEarth'
#need to give path to where records
#are to be stored
records_dir = "/home/solebaysharp/Projects/Riot API/Records"
#--->Obtain initial data, setup new variables
#Use 'watcher' to get basic stats and setup my account as a variable (me)
me = watcher.summoner.by_name(my_region, "SolebaySharp")
# Setup retrieval of recent match info
my_matches = watcher.match.matchlist_by_account(my_region, me["accountId"])
print(my_matches)
#--->Download the recent match data
#Define where the JSON data is going to go
recent_matches_index_json = (records_dir + "/recent_matches_index.json")
#get that JSON data
print ("Downloading recent match history data")
file_handle = open(recent_matches_index_json,"w+")
file_handle.write(str(my_matches))
file_handle.close()
#convert it to python
file_handle = open(recent_matches_index_json,)
recent_matches_index = json.load(file_handle)
Except this gives the following error...
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
So instead to correct this I tried:
file_handle = open(recent_matches_index_json)
json_sanitised = json.loads(file_handle.replace("'", '"'))
This returns...
AttributeError: '_io.TextIOWrapper' object has no attribute 'replace'
For the sake of completeness, below is a sample of what the JSON looks like. I have added line breaks to enhance readability. It does not come this way.
{'matches': [
{'platformId': 'NA1',
'gameId': 5687555181,
'champion': 235,
'queue': 400,
'season': 13,
'timestamp': 1598243995076,
'role': 'DUO_SUPPORT',
'lane': 'BOTTOM'
},
{'platformId': 'NA1',
'gameId': 4965733458,
'champion': 235,
'queue': 400,
'season': 13,
'timestamp': 1598240780841,
'role': 'DUO_SUPPORT',
'lane': 'BOTTOM'
},
{'platformId': 'NA1',
'gameId': 4583215645,
'champion': 111,
'queue': 400,
'season': 13,
'timestamp': 1598236666162,
'role': 'DUO_SUPPORT',
'lane': 'BOTTOM'
}],
'startIndex': 0,
'endIndex': 100,
'totalGames': 186}

This is occurring because str() is converting the Python dictionary returned by the API wrapper into a string (str):
file_handle.write(str(my_matches))
str() produces Python's own representation of the dictionary, which uses ' by default, whereas JSON requires ".
We can stop this from happening by using json.dumps. This partially (certainly not fully) answers the second part of the question as well, as we're using a proper JSON function.
The above-mentioned line simply has to be replaced with this:
file_handle.write(json.dumps(my_matches))
This writes valid JSON to the file.
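To make the difference concrete, here is a small sketch using only the standard library. The my_matches value is a made-up, shortened stand-in for the real API result.

```python
import json

# Hypothetical, shortened stand-in for the dictionary returned by the API wrapper.
my_matches = {"matches": [{"platformId": "NA1", "champion": 235}], "totalGames": 186}

as_repr = str(my_matches)         # Python repr: keys and strings in single quotes
as_json = json.dumps(my_matches)  # JSON text: keys and strings in double quotes

try:
    json.loads(as_repr)
    repr_parsed = True
except json.JSONDecodeError:
    # The single-quoted repr is not valid JSON, so parsing fails here.
    repr_parsed = False

round_tripped = json.loads(as_json)
print(repr_parsed, round_tripped == my_matches)
```

When writing to a file, json.dump(my_matches, file_handle) does the same job without building the intermediate string.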

Facebook API return a list of object ID's

I am trying to get a list of IDs from the most recent posts on my profile so I can perform operations on each post individually elsewhere, but I am having issues isolating the IDs from the rest of the post data. It says that the posts object is a dictionary, but when I try to index it using posts['id'] I only get back myID, and when I loop with for p in posts and look at posts[p] I get back everything. I am relatively new to Python and am having some major issues. How can I get a list of just the IDs?
import facebook
import json
token = {'****'}
graph = facebook.GraphAPI(token)
page_ids = []
myID = graph.request('/me?fields=id')
print(myID['id'])
posts = graph.get_object(id=myID['id'],fields='posts.fields(object_id)')
print(posts)
{'object_id': '503243983830603', 'id': '2894021623958853_2890876384273377'}, {'object_id': '402662493728056', 'id': '2894021623958853_2890866360941046'}, {'id': '2894021623958853_2890842810943401'}], 'paging': {'previous': 'https://graph.facebook.com/v4.0/2894021623958853/posts?fields=object_id&since=1569687697&access_token=EAAKsaVifZCCYBAE7ofO8qRSheQsBhi2mYrZB39wzfhCZAJ2ejGoyZAi8hdKZAoEjIWUEk7Y1Y8nCb9yrU17JEXf2jGMF7E4SVpneE3EYbCV2zDRb1K8ZCkcY5tQP00DALPWYXNLimGxyGsugwK5GPOZC19suEItAszJM4RHFaTUMMZBXiyiHH0mnzpdUP0vl2MklfCct4mk6EwZDZD&limit=25&__paging_token=enc_AdCi4Pnb01TJi24NGPgqiXLRi3AuoPtBkgJfx43aXRQwPPzkfcJ0BHQuqNXebzs5Vm01uEHcSrvAtTiYcTKvJu5raYCKn4pftMqeyD60hDz0PgZDZD&__previous=1', 'next': 'https://graph.facebook.com/v4.0/2894021623958853/posts?fields=object_id&access_token=EAAKsaVifZCCYBAE7ofO8qRSheQsBhi2mYrZB39wzfhCZAJ2ejGoyZAi8hdKZAoEjIWUEk7Y1Y8nCb9yrU17JEXf2jGMF7E4SVpneE3EYbCV2zDRb1K8ZCkcY5tQP00DALPWYXNLimGxyGsugwK5GPOZC19suEItAszJM4RHFaTUMMZBXiyiHH0mnzpdUP0vl2MklfCct4mk6EwZDZD&limit=25&until=1569590626&__paging_token=enc_AdDfKEIsYp9uDTOhVdXvJ2KKarEfKqyiV3stKSk3ZBPh5rNWDDihjnBZC29jw2xQF1fmjViUe18SIlW8CW4CVV5sFsEtxdSFOnQZA63RNgWrk4ZAqQZDZD'}}, 'id': '2894021623958853'}
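One possible way to isolate the IDs, sketched under an assumption: the start of the printed output above is cut off, so the 'posts' and 'data' keys used here are guesses based on the usual Graph API response shape (a 'data' list alongside 'paging'), not something the output confirms. The sample dictionary reuses the IDs visible in the output.

```python
# Hypothetical reconstruction of the response; the 'posts' and 'data' keys
# are assumed because the beginning of the printed output is missing.
posts = {
    "posts": {
        "data": [
            {"object_id": "503243983830603", "id": "2894021623958853_2890876384273377"},
            {"object_id": "402662493728056", "id": "2894021623958853_2890866360941046"},
            {"id": "2894021623958853_2890842810943401"},
        ],
        "paging": {},
    },
    "id": "2894021623958853",
}

# The top-level 'id' is the profile ID; the post IDs live one level down.
post_ids = [post["id"] for post in posts["posts"]["data"]]
print(post_ids)
```

If the real structure differs, printing posts.keys() and then the keys of each nested dictionary should show where the list of posts actually sits.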

Why is the HITBTC V2 REST API returning a 2001 error (Incorrect Pair)?

I'm trying to place an order using V2 of the HITBTC API (docs here). I'm trying to place an order via a POST request, and everything is fine authorization wise, but upon placing the order, the following function returns what the server is sending back, which is the following JSON:
{'error': {'code': 2001, 'message': 'Symbol not found', 'description': 'Try get /api/2/public/symbol, to get list of all available symbols.'}}
My issue arises with the fact that I'm passing the pair I wish to order in the format that's specified by this call for the symbols, which returns JSON like the following:
{"id":"NOAHBTC","baseCurrency":"NOAH","quoteCurrency":"BTC","quantityIncrement":"1000","tickSize":"0.000000001","takeLiquidityRate":"0.001","provideLiquidityRate":"-0.0001","feeCurrency":"BTC"}
I'm passing a string formatted exactly as 'id' is formatted.
def HITBTCorder(pair, side, quantity, price, session):
    '''
    Creates an order on HITBTC, returns status (filled or not filled)
    Side: 'buy' or 'sell'
    '''
    orderData = json.dumps({'symbol': pair, 'side': side, 'quantity': quantity, 'price': price})
    print(orderData)
    response = session.post('https://api.hitbtc.com/api/2/order', data = orderData)
    responseDict = json.loads(response.text)
    return responseDict
The code I'm running looks like this:
session = requests.session()
session.auth = ('APIPUBLIC', 'APISECRET')
response = trade.HITBTCorder("NOAHBTC", 'buy', 1000, tickers.HITBTCprice("NOAHBTC"), session)
Any idea how to get this working?

You may replace
orderData = json.dumps({'symbol': pair, 'side': side, 'quantity': quantity, 'price': price})
with:
orderData = json.dumps({'symbol': pair.lower(), 'side': side, 'quantity': quantity, 'price': price})
because the symbol is required to be sent as lowercase.

The data needs to be URL-encoded in the request body (quantity=1&symbol=ETHBTC...), not JSON, to be accepted by the server. Hope it helps :)
import urllib.parse as parse
data = parse.urlencode(yourparamsasdict)
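As a quick sketch of what URL-encoding the order produces, using the field names from the question. The price value is a made-up placeholder, passed as a string so it is not rewritten in scientific notation.

```python
from urllib.parse import urlencode

# Field names come from the question; the price is a hypothetical placeholder.
order = {"symbol": "NOAHBTC", "side": "buy", "quantity": 1000, "price": "0.000000001"}

body = urlencode(order)
print(body)
```

With requests, passing the dictionary itself as data= (rather than a json.dumps string) should give the same form-encoded body, since requests form-encodes dictionaries passed that way.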

How to post with categories to Wordpress using WP REST API?

I created this small script to create WordPress posts using Basic Auth, and it works. The problem is when I try to assign multiple categories to a post.
The reference is pretty vague. It says that the categories field must be an array, but it doesn't specify whether it should be an array of category objects or an array of category IDs.
https://developer.wordpress.org/rest-api/reference/posts/#schema-categories
So I tried to make it fail so I could get more info from an exception message. The exception message says something like "categories[0] is not an integer", so I tried with a list of integers. And then it works, but only one category is assigned: the last category in the list.
So, how do I add more categories to a post?
N1: Categories with IDs 13 and 16 actually exist in my WordPress install.
N2: I know that I could create a draft, then make new requests to create categories, then use an update post endpoint to assign categories to posts... But in theory, it should be possible to pass multiple categories when creating the post, since it's in the reference xd
N3: I don't care about security. It is not a requirement.
import base64
import requests
r = requests.session()
wp_host = 'wphost.dev'
wp_username = 'FIXME'
wp_password = 'FIXME'
# BUILD BASIC AUTH STRING
basic_auth = str(
    base64.b64encode('{user}:{passwd}'.format(
        user=wp_username,
        passwd=wp_password
    ).encode()
), 'utf-8')
# PARAMETERS TO POST REQUEST
p = {
    'url': 'http://{wp_host}/wp-json/wp/v2/posts'.format(wp_host=wp_host),
    'headers': {'Authorization': 'Basic {basic_auth}'.format(basic_auth=basic_auth)},
    'data': {
        'title': 'My title',
        'content': 'My content',
        'categories': [13, 16],
        'status': 'publish',
    },
}
# THE REQUEST ITSELF
r = r.post(url=p['url'], headers=p['headers'], data=p['data'])
# Output
print(r.content)
# ... "categories":[16],"tags":[] ...

The WP API reference is misleading.
A comma-separated string of category IDs is actually expected:
data: {
...
categories: "162,224"
...
}
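Applied to the script from the question, this could look roughly like the sketch below, which only builds the form data. It follows the answer's claim about the expected format rather than anything I have verified against the WordPress reference. The list probably loses its first entry because a form-encoded list is sent as repeated categories fields, of which only the last one survives.

```python
# Category IDs from the question.
category_ids = [13, 16]

data = {
    "title": "My title",
    "content": "My content",
    # Join the IDs into one comma-separated string, as the answer suggests.
    "categories": ",".join(str(cid) for cid in category_ids),
    "status": "publish",
}
print(data["categories"])
```

This dictionary would replace the 'data' entry of p in the original script.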

PyXero Library Validation Exception

I'm trying to add a payment to Xero using the pyxero library for Python 3.
I'm able to add invoices and contacts, but payments always return a validation exception.
Here is the data I'm submitting:
payments.put([{'Amount': '20.00',
               'Date': datetime.date(2016, 5, 25),
               'AccountCode': 'abc123',
               'Reference': '8831_5213',
               'InvoiceID': '09ff0465-d1b0-4fb3-9e2e-3db4e83bb240'}])
And the xero response:
xero.exceptions.XeroBadRequest: ValidationException: A validation exception occurred

Please note: this solution became a hack inside pyxero to get the result I needed. This may not be the best solution for you.
The XML that pyxero generates for "payments.put" does not match the "PUT Payments" XML structure found in the xero documentation.
I first changed the structure of your dictionary so that the XML generated in basemanager.py was similar to the documentation's.
data = {
    'Invoice': {'InvoiceID': "09ff0465-d1b0-4fb3-9e2e-3db4e83bb240"},
    'Account': {"AccountID": "58F8AD72-1F2E-AFA2-416C-8F660DDD661B"},
    'Date': datetime.datetime.now(),
    'Amount': 30.00,
}
xero.payments.put(data)
The error still persisted though, so I was forced to start changing code inside pyxero's basemanager.py.
In basemanager.py on line 133, change the formatting of the date:
val = sub_data.strftime('%Y-%m-%dT%H:%M:%S')
to:
val = sub_data.strftime('%Y-%m-%d')
pyxero originally sends the time as well. This is supposed to be a date-only value; the docs stipulate the formatting.
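To show what the two format strings above produce, here is a small standard-library sketch. The sample timestamp is made up for illustration.

```python
import datetime

# Hypothetical timestamp; only the formatting matters here.
sample = datetime.datetime(2016, 5, 25, 14, 30, 0)

with_time = sample.strftime('%Y-%m-%dT%H:%M:%S')  # original pyxero format
date_only = sample.strftime('%Y-%m-%d')           # format after the change
print(with_time, date_only)
```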
Then, again in basemanager.py, on line 257, change the following:
body = {'xml': self._prepare_data_for_save(data)}
to:
if self.name == "Payments":
    body = {'xml': "<Payments>%s</Payments>" % self._prepare_data_for_save(data)}
else:
    body = {'xml': self._prepare_data_for_save(data)}
Please note that in order for you to be able to create a payment in the first place, the Invoice's "Status" must be set to "AUTHORISED".
Also, make sure the Payment's "Amount" is no greater than Invoice's "AmountDue" value.
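The two conditions above could be checked before submitting, along the lines of this sketch. The helper name payment_is_plausible is hypothetical and not part of pyxero, and the invoice is assumed to be a plain dictionary carrying the "Status" and "AmountDue" fields.

```python
from decimal import Decimal


def payment_is_plausible(invoice, amount):
    """Return True when the invoice is AUTHORISED and amount does not exceed AmountDue."""
    amount_due = Decimal(str(invoice.get("AmountDue", "0")))
    return invoice.get("Status") == "AUTHORISED" and Decimal(str(amount)) <= amount_due


# Hypothetical invoices for illustration.
ok = payment_is_plausible({"Status": "AUTHORISED", "AmountDue": "30.00"}, "20.00")
draft = payment_is_plausible({"Status": "DRAFT", "AmountDue": "30.00"}, "20.00")
too_much = payment_is_plausible({"Status": "AUTHORISED", "AmountDue": "30.00"}, "30.01")
print(ok, draft, too_much)
```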
