struggling to parse an object using jsonlines - python

I'm having trouble parsing the body of a request using jsonlines. I'm using tornado as the server and this is happening inside a post() method.
My goal is to parse the request's body into separate JSON objects, then iterate over them with a jsonlines Reader, do some work on each one, and push them to a DB.
I solved this problem by dumping the UTF-8-decoded body into a file and then using:
with jsonlines.open("temp.txt") as reader:
That works for me. I can iterate over the entire file with
for obj in reader:
I just feel like this is an unnecessary overhead that can be reduced if I can understand what's keeping me from just using this bit of code instead:
log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as reader:
    for obj in reader:
the exception I get is this:
jsonlines.jsonlines.InvalidLineError: line contains invalid json:
Expecting property name enclosed in double quotes: line 1 column 2
(char 1) (line 1)
I've tried searching for this error here, and all I found were examples where people used incorrectly formatted JSON with single quotes instead of double quotes. That is not the case for me. I debugged the request and saw that the string returned from the decode method indeed has double quotes around both property names and values.
here is a sample of the body of the request I send (this is what it looks like in Postman):
{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}}
{"type":"event","timestamp":"2018-03-25 09:19:51.061","event":"ScreenShown","params":{"name":"SettingsScreen"}}
{"type":"event","timestamp":"2018-03-25 09:19:53.580","event":"ButtonClicked","params":{"screen":"SettingsScreen","button":"MissionsButton"}}
{"type":"event","timestamp":"2018-03-25 09:19:53.615","event":"ScreenShown","params":{"name":"MissionsScreen"}}
You can reproduce the exception by using this simple bit of code in a post method and sending the lines I provided through Postman:
log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as currentlog:
    for obj in currentlog:
        print(obj)
As a sidenote: Postman sends the data as text, not JSON.
If you need any more information to answer this question, please let me know.
One thing I did notice is that the string returned from the decode method starts and ends with a single quote. I guess this is because of the double quotes in the JSON objects themselves. Is that related in any way?
An example:
'{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}}'
Thanks for any help!

jsonlines.Reader accepts an iterable as its argument ("The first argument must be an iterable that yields JSON encoded strings"), not a single JSON-encoded string as in your example. After .decode("utf-8"), log is a string, which happens to support the iterable interface. So when the reader iterates over it under the hood, it gets the first item of the string, i.e. the character {, and tries to process that as a JSON line, which is obviously invalid. Try log = log.splitlines() before passing log to the Reader (a plain log.split() would also split on the spaces inside your timestamps).
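A minimal sketch of that fix inside the tornado post() method, assuming the body is newline-delimited exactly like the sample above:

import jsonlines

# inside the tornado post() method; splitlines() turns the body into an
# iterable of one JSON string per line, which is what jsonlines.Reader expects
log = self.request.body.decode("utf-8")
with jsonlines.Reader(log.splitlines()) as reader:
    for obj in reader:
        print(obj)  # do some work on each event, then push it to the DB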

Related

Issue with strings and dictionaries in python

I'm using the click package to get input for one or more variables which get loaded in as a combined dictionary. Each entry is then joined and the combined string is added to the end of a base URL and sent through the requests package to receive some xml data.
Earlier I had an issue with one of the variables that lets you search through a range, such as
[value1, value2]
Python added double quotes around it, so the search function didn't operate correctly. To deal with that I used
.replace('"', '')
on the joined string before combining it with the base URL, and that seemed to fix that problem. The issue now is that an individual input containing more than one word doesn't produce the same output as the actual search engine online. I have to use quotes when I enter the information to keep it as a single argument, but then the quotes get removed by the call above, and I believe that is what is causing the issue.
I think if I have a way to access individual entries of this dictionary and remove the double quotes from only certain entries then that should get the job done. But if I am overlooking something please let me know.
Help is appreciated.
Code added below:
import click
import requests
@click.command()
@click.option('--variable1')
@click.option('--variable2')
def search(variable1, variable2):
    query_list = [variable1, variable2]
    query = ''.join(query_list)
    base_url = "abc.com...."
    response = requests.get(base_url, params=query)

if __name__ == '__main__':
    search()
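A minimal sketch of the selective-stripping idea described above, assuming the goal is to remove the quotes only from range-style entries while keeping them around multi-word phrases (the keys and sample values here are hypothetical):

query_parts = {
    "range_field": '["value1", "value2"]',
    "phrase_field": '"two words"',
}

# strip quotes only from range-style entries; quoted phrases stay intact
cleaned = {
    key: value.replace('"', '') if value.startswith('[') else value
    for key, value in query_parts.items()
}

query = ''.join(cleaned.values())  # '[value1, value2]"two words"'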

Overflow error when reading json file

I am trying to read a json which includes a number of tweets, but I get the following error.
OverflowError: int too large to convert
The script filters multiple JSON files to get specific tweets, and it crashes when it reaches a particular one.
The line that causes the error is this one:
df_temp = pd.read_json(path_or_buf=json_path, lines=True)
Here is the error in the cmd
Just store the user id as a string, and treat it like one (this is actually what you should do when dealing with this kind of id). If you can't change the JSON input format, you can always treat it as a string before parsing it as a JSON object and add quotes around the id value, for instance with a regex: Regex in python.
I don't know which library you are using to parse the JSON, but implicit casting may also work: either try a "getString"-style accessor on the number instead of a "getInt"-style one, or force Python to treat the value as a string, with something like x = "" + json.getId().
Python is pretty loose on typing and may let you do it.
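A minimal sketch of the regex route for the pandas case above; the "id" field name and the file path are assumptions, so adjust them to the actual data:

import re
import json
import pandas as pd

json_path = "tweets.json"  # hypothetical path

records = []
with open(json_path, encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        # wrap the oversized numeric id in quotes so it is parsed as a string
        fixed = re.sub(r'"id":\s*(\d+)', r'"id": "\1"', line)
        records.append(json.loads(fixed))

df_temp = pd.DataFrame(records)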

Decode text response from API in Python 3.6

I am trying to extract data from the Mailchimp export API, which returns responses based on the following specification:
Returns:
Parameter - text
Description:
a plain text dump of JSON objects. The first row is a header row. Each additional row returned is an individual JSON object. Rows are delimited using a newline (\n) marker, so implementations can read in a single line at a time, handle it, and move on.
To get the data I am using:
response = requests.get(urldetails).text
If I use .json() it errors out with a JSON decode error. The output of the above is something along the lines of:
{data..}
{data...}
I am unsure whether each dict is on a separate row; however, I am under the impression it's actually just one continuous string, since many of my attempts to decode it ended up with an error like 'str' object cannot be...etc. I don't see the '\n' separators anywhere when I use the .text method.
What's the best way of going about making each dict a separate item in a list, or a row in a dataframe (which I can unpack later)?
Thanks
You can get all the data from the MailChimp export api using a simple approach. Please note that I am using f-strings, only available in Python 3.6+.
import requests
import json
apikey = '<your-api-key>'
id = "<list-id>"
URL = f"https://us10.api.mailchimp.com/export/1.0/campaignSubscriberActivity/?apikey={apikey}&id={id}"
json_data = [json.loads(s) for s in requests.get(URL).text.strip().split("\n")]
print(json_data[0]['<some-subscriber-email>'][0]['action'])
Provided that the text response isn't insanely badly formed json, you can use the json library. In particular, the loads() function.
import json
json_response = json.loads(response)
loads() loads JSON into a python dict from a string.
EDIT:
The Mailchimp API states that each JSON object is separated by a newline character. We can create a list of dicts with the following code:
# get response from GET request and load as a string
import json
json_resp = [json.loads(line) for line in response.split('\n')]
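If the response has a trailing newline or you want to split off the header row mentioned in the API description, a slightly more defensive variant of the same idea (the response variable is assumed from the snippets above) is:

import json

# 'response' is the .text of the GET request, as in the snippets above
lines = [ln for ln in response.splitlines() if ln.strip()]  # drop blank lines
header, *rows = (json.loads(ln) for ln in lines)            # first row is the header row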

Flask - Get_JSON changing delimiter

I post the following JSON to my flask server:
'{"on":false}'
I then use the following line of code to get the JSON back so I can HTTP PUT it onto another device on the network:
content = ("'" + str(request.get_json()) + "'").lower()
However, instead of returning the expected:
'{"on":false}'
It returns:
'{'on':false}'
This means the JSON is invalid and does not work. Is there something I can change in request.get_json(), or is there a different method I should use?
You are doing some strange things here.
get_json() automatically parses the incoming JSON string into a Python data structure. You then call str on it, converting it back not into JSON but into a representation of the Python structure.
Now, you could call json.dumps instead of str, but it would be better to avoid converting it from JSON in the first place. Instead of using request.get_json, use request.get_data; now Flask won't parse the content from JSON, and your quotes will be preserved.
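A minimal sketch of both options (the route path is hypothetical):

from flask import Flask, request
import json

app = Flask(__name__)

@app.route("/light", methods=["PUT", "POST"])  # hypothetical endpoint
def forward():
    # Option 1: parse the body, then re-serialize it with json.dumps,
    # which produces valid JSON (false, double quotes) rather than a Python repr
    content = json.dumps(request.get_json())

    # Option 2 (preferred above): don't parse at all and forward the raw body
    # content = request.get_data(as_text=True)

    return content  # '{"on": false}' remains valid JSON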

What is the difference between json.JSONDecoder().decode() and json.loads()

I am using urllib2 to grab the html of a url and then a regex to extract a JSON that I need from there. I want to get the usual "dictionary of dictionaries" Python object and both of the following work:
my_json  # a correctly formatted JSON string
json_dict1 = json.JSONDecoder().decode(my_json)
json_dict2 = json.loads(my_json)
What is the difference and which is better in what circumstances (besides mine, but that one in particular)?
json.loads() essentially creates a json.JSONDecoder() instance and calls decode on it. As such your first line is exactly the same thing as the second line. See the json.loads() source code.
The module offers you flexibility; a simple function API or a full OO API that you can subclass if needed.
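For example, the OO side lets you bake an object_hook into a reusable decoder subclass; a minimal sketch (the ShoutingDecoder name and its behaviour are made up purely for illustration):

import json

class ShoutingDecoder(json.JSONDecoder):
    """Decoder that upper-cases every string value it decodes."""
    def __init__(self, **kwargs):
        super().__init__(object_hook=self._shout, **kwargs)

    @staticmethod
    def _shout(obj):
        return {k: v.upper() if isinstance(v, str) else v for k, v in obj.items()}

my_json = '{"greeting": "hello"}'
print(json.loads(my_json))                       # {'greeting': 'hello'}
print(json.loads(my_json, cls=ShoutingDecoder))  # {'greeting': 'HELLO'}
print(ShoutingDecoder().decode(my_json))         # same result via the OO API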
