Python parsing JSON from url incomplete - python

I am trying to get a JSON from a URL I have succeeded in doing that but when I print the result it seems like I get only half of the content.
this is what I wrote:
with urllib.request.urlopen("https://financialmodelingprep.com/api/v3/company/stock/list") as url:
data = json.loads(url.read().decode())
print(data)
I won't print what I am getting because it's very long(if you look at the source and ctrl+f look up the sentence you will find that its in the middle of the page) but it starts with
ties Municipal Fund', 'price': 9.83,
thanks for the help.

I think that your output has been cut by your IDE.
When I run the same request and I write it into a file, you can see the that the data is fully written:
import requests
with requests.get("https://financialmodelingprep.com/api/v3/company/stock/list") as url:
data = url.content.decode()
with open("file.txt","w") as f:
f.write(data)

When the response is large, the first read may only return what is available at that time. If you want to be sure to have the full data, you should loop reading until you get nothing (an empty byte string here).
Here, it would probably be more robust to pass the response stream to json.load and let it read until the json string is completed:
with urllib.request.urlopen(
"https://financialmodelingprep.com/api/v3/company/stock/list") as url:
data = json.load(url)
print(data)
But I think that if the data was partial, json.loads should have choked on it. So the problem is likely elsewhere...

Related

How to use wiktextract

I am trying to extract a Wiktionary xml file from their dumps using the wiktextract python module. However their website does not give me enough information. I could not use the command line program that comes with it since it isn't a Windows executable, so I tried the programmatic way. The following code takes a while to run so it seems to be doing something but then I'm not sure what to do with the ctx variable. Can anyone help me?
import wiktextract
def word_cb(data):
print(data)
ctx = wiktextract.parse_wiktionary(
r'myfile.xml', word_cb,
languages=["English", "Translingual"])
You are on the right track, but don't have to worry too much about the ctx object.
As the documentation says:
The parse_wiktionary call will call word_cb(data) for words and redirects found in the
Wiktionary dump. data is information about a single word and part-of-speech as a dictionary (multiple senses of the same part-of-speech are combined into the same dictionary). It may also be a redirect (indicated by presence of a redirect key in the dictionary).
The output ctx object mostly contains summary information (the number of sections processed, etc; you can use dir(ctx) to see some of its fields.
The useful results are not the ones in the returned ctx object, but the ones passed to word_cb on a word-by-word basis. So you might just try something like the following to get a JSON dump from a wiktionary XML dump. Because the full dumps are many gigabytes, I put a small one on a server for convenience in this example.
import json
import wiktextract
import requests
xml_fn = 'enwiktionary-20190220-pages-articles-sample.xml'
print("Downloading XML dump to " + xml_fn)
response = requests.get('http://45.61.148.79/' + xml_fn, stream=True)
# Throw an error for bad status codes
response.raise_for_status()
with open(xml_fn, 'wb') as handle:
for block in response.iter_content(4096):
handle.write(block)
print("Downloaded XML dump, beginning processing...")
fh = open("output.json", "wb")
def word_cb(data):
fh.write(json.dumps(data))
ctx = wiktextract.parse_wiktionary(
r'enwiktionary-20190220-pages-articles-sample.xml', word_cb,
languages=["English", "Translingual"])
print("{} English entries processed.".format(ctx.language_counts["English"]))
print("{} bytes written to output.json".format(fh.tell()))
fh.close()
For me this produces:
Downloading XML dump to enwiktionary-20190220-pages-articles-sample.xml
Downloaded XML dump, beginning processing...
684 English entries processed.
326478 bytes written to output.json
with the small dump extract I placed on a server for convenience. It will take much longer to run on the full dump.

Retrieving a JSON valid string from a python requests.post()

I am extremely confused after trying a few possible solutions and getting various errors that just lead me in circles. I have a function that will grab a tweet, put it in a dictionary, then write that dictionary to a file using dumps like so:
jsonFile = {}
jsonFile["tweet"] = tweet
jsonFile["language"] = language
with open('jsonOutputfile.txt', 'w') as f:
json.dump(jsonFile, f)
I then have another python file that has a function that will return the value of this jsonOutputfile.txt if I want to use it elsewhere. I do that like so:
with open('jsonOutputfile.txt') as f:
jsonObject = json.load(f)
return jsonObject
This function sits on my localhost. The above two functions that have to do with saving and retrieving the JSON file are separate from the rest of my functions below, as I want them to be.
I have another function that will retrieve the values of the returned status using python requests, like so:
def grab_tweet():
return requests.post("http://gateway:8080/function/twittersend")
and then after grabbing the tweet I want to manipulate it, and I want to do so using the JSON that I should have received from this request.
r = grab_tweet()
data = json.dumps(r.text)
return data.get('tweet')
I want this function above to return just the value that is associated with the tweet key in the JSON that I received from when I saved and loaded it. However, I keep on getting the following error: AttributeError: 'str' object has no attribute 'get' which I am confused about because from my understanding using json.dumps() should create a JSON valid string that I can call get on. Is there an encoding error when I am transferring this to and from a file, or maybe when I am receiving my request?
Any help is appreciated.
EDIT:
Here is a sample of a response from my requests.post when I use r.text, it also looks like there is some Unicode in the response so I put an example at the end of the tweet section. (This also doesn't look like a JSON which is what my question is centered around. There should at least be double quotes and no U's right?):
{u'tweet': u'RT THIS IS THE TWEET BLAH BLAH\u2026', u'language': u'en'}
Use .json() in requests module to get response as JSON
Ex:
data = r.json()
return data.get('tweet')
Note: json.dumps convert your response to a string object
Edit as per comment - Try using the ast module.
Ex:
import ast
data = ast.literal_eval(r.text)
You will need to use the .json() method. See requests' documentation: JSON Response Content
Also, for future reference, rather than do
f.write(json.dumps(jsonFile))
You could simply use:
json.dump(jsonFile, f)
Same with using load instead of loads:
jsonObject = json.load(f)

Python reading responses and validating the result

currently I am stuck with being able to print out the result gotten from the API, but not being able to alter nor read them without parsing it into a text file.
Furthermore, I wouldn't need all of the information that the API provides and would be great if I can only have the match_id.
The response from the API:Result.
From the result I would only need the match_id and after I have gotten the match_id, I would compare it with a list of string e.g. 3238829394, 3238829395 and more, to check whether does any of the value are similar to mine, and if it's similar, the system would then alert me
I have found a way of doing it by passing the results into a text file, then comparing it with the list that I have.
The code for getting the response:
import dota2api
import json
import requests
api = dota2api.Initialise("[Value API][2]")
reponse = api.get_match_history_by_seq_num(start_at_match_seq_num=2829690055, matches_requested=1)
response = str(hist)
f = open('myfile.txt', 'w')
f.write(response)
f.close()
However I am hoping to find a faster and better way to do this process, as it is very time consuming and unstable. Thank you.
You are getting a JSON file back from that API. In python all data can be accessed directly without parsing it.
The response will be something like (sorry, but in that image I cannot copy paste to read the JSON properly):
for match in response['matches']:
if is_similar(match['match_id']):
do_something_cool_here
I think that should do what you need. If you give the answer as string I can help you building the code properly, but I guess you get the idea of what I am trying to say there :)
Hope it helps!
EDIT:
We talked by private and this works:
import dota2api
import requests
api = dota2api.Initialise("API_KEY")
response = api.get_match_history_by_seq_num(start_at_match_seq_num=SEQ_NUM, matches_requested=1)
match_id_check = MATCH_ID
for match in response['matches']:
if match_id_check == match['match_id']:
print(match)
with API_KEY, SEQ_NUM and MATCH_ID to configure

Python unable to read my csv due to extra carriage returns

Sorry if this is redundant but I've tried very hard to look for the answer to this but I've been unable to find one. I'm very new to this so please bear with me:
My objective is for a piece of code to read through a csv full of urls, and return a http status code. I have Python 2.7.5. The outcome of each row would give me the url and the status code, something like this: www.stackoverflow.com: 200.
My csv is a single column csv full of hundreds of urls, one per row. The code I am using is below and when I run this code, it gives me a /r separating two urls similar to this:
{http://www.stackoverflow.com/test\rhttp://www.stackoverflow.com/questions/': 404}
What I would like to see is the two urls separated, and each with their own http status code:
{'http://www.stackoverflow.com': 200, 'http://www.stackoverflow.com/questions/': 404}
But there seems to be an extra \r when Python reads the csv, so it doesn't read the urls correctly. I know folks have said that strip() isn't an all inclusive wiper-outer so any advice on what to do to make this work, would be very much appreciated.
import requests
def get_url_status(url):
try:
r = requests.head(url)
return url, r.status_code
except requests.ConnectionError:
print "failed to connect"
return url, 'error'
results = {}
with open('url2.csv', 'rb') as infile:
for url in infile:
url = url.strip() # "http://datafox.co"
url_status = get_url_status(url)
results[url_status[0]] = url_status[1]
print results
You probably need to figure out how your csv file is formatted, before feeding it to Python.
First, make sure it has consistent line endings. If it has newlines sometimes, and crlf others, that's probably a problem that needs to be corrected.
If you're on a *ix system, tr might be useful.

Python not parsing JSON API return prettily

I reviewed a handful of the questions related to mine and found this slightly unique. I'm using Python 2.7.1 on OS X 10.7. One more note: I'm more of a hacker than developer.
I snagged the syntax below from Python documentation to try to do a "Pretty Print:"
date = {}
data = urllib2.urlopen(url)
s = json.dumps(data.read(), sort_keys=True, indent=4)
print '\n'.join([l.rstrip() for l in s.splitlines()])
I expected using the rstrip / splitlines commands would expand out the calls like in the example.
Also, not sure if it's relevant, but when tring to pipe the output to python -mjson.tool the reply is No JSON object could be decoded
Here's a snippet of the cURL output I'm trying to parse:
{"data":[{"name":"Site Member","created_at":"2012-07-24T11:22:04-07:00","activity_id":"500ee7cbbaf02xxx8e011e2e",
And so on.
The main objective is to make this mess of data more legible so I can learn from it and start structuring some automatic scraping of data based on arguments. Any guidance to get me from green to successful is a huge help.
Thanks,
mjb
The output of the urllib2.urlopen().read() is a string and needs to be converted to an object first before you can call json.dumps() on it.
Modified code:
date = {}
data = urllib2.urlopen(url)
data_obj = json.loads(data.read())
s = json.dumps(data_obj, sort_keys=True, indent=4)
print s

Categories