create valid json object in python

Each line is valid JSON, but I need the file as a whole to be valid JSON.
I have some data which is aggregated from a web service and dumped to a file, so it's JSON-esque but not valid JSON, and it can't be processed in the simple and intuitive way that JSON files can, thereby constituting a major pain in the neck. It looks (more or less) like this:
{"record":"value0","block":"0x79"}
{"record":"value1","block":"0x80"}
I've been trying to reinterpret it as valid JSON, my latest attempt looks like this:
with open('toy.json') as inpt:
    lines = []
    for line in inpt:
        if line.startswith('{'):  # block starts
            lines.append(line)
However, as you can likely deduce from the fact that I'm posing this question, that doesn't work. Any ideas about how I might tackle this problem?
EDIT:
Tried this:
with open('toy_two.json', 'rb') as inpt:
    lines = [json.loads(line) for line in inpt]
    print(lines['record'])
but got the following error:
Traceback (most recent call last):
  File "json-ifier.py", line 38, in <module>
    print(lines['record'])
TypeError: list indices must be integers, not str
Ideally I'd like to interact with it as I can with normal JSON, i.e. data['value']
EDIT II
with open('transactions000000000029.json', 'rb') as inpt:
    lines = [json.loads(line) for line in inpt]
    for line in lines:
        records = [item['hash'] for item in lines]
        for item in records:
            print item

This looks like NDJSON, which I've been working with recently. There is a specification for it, though I'm not sure of its usefulness. Does the following work?
import json

with open('the file.json', 'rb') as infile:
    data = infile.readlines()
    data = [json.loads(item.replace('\n', '')) for item in data]
This should give you a list of dictionaries.
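From there, each element supports normal dict access; for example, with the sample data above:

for item in data:
    print(item['record'])  # value0, value1, ...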

Each line looks like a valid JSON document.
That's "JSON Lines" format (http://jsonlines.org/)
Try to process each line independently (json.loads(line)) or use a specialized library (https://jsonlines.readthedocs.io/en/latest/).
import json

def process(oneline):
    # do what you want with each line
    print(oneline['record'])

with open('toy_two.json', 'rb') as inpt:
    for line in inpt:
        process(json.loads(line))
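If you go the library route instead, a minimal sketch with the jsonlines package mentioned above (assuming pip install jsonlines):

import jsonlines

# jsonlines.open() returns a reader; iterating it yields one parsed object per line
with jsonlines.open('toy_two.json') as reader:
    for obj in reader:
        process(obj)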

Related

Reading a text file of dictionaries stored in one line

Question
I have a text file that records metadata of research papers requested from the SemanticScholar API. However, when I wrote out the requested data, I forgot to add "\n" after each individual record. This results in something that looks like
{<metadata1>}{<metadata2>}{<metadata3>}...
whereas it should look like this if I had added "\n":
{<metadata1>}
{<metadata2>}
{<metadata3>}
...
Now I would like to read the data back. As all the metadata is stored on one line, I need to do some hacking:
First, I split the cluttered dicts on "{".
Then I try to convert each resulting string back to a dict. Note that I do account for the possibility that a line might not be in proper JSON format.
import json

with open("metadata.json", "r") as f:
    for line in f.readline().split("{"):
        print(json.loads("{" + line.replace("\'", "\"")))
However, I still get an error message:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I am wondering what I should do to recover all the metadata I collected.
MWE
Note: to generate the metadata.json file I use, run the following code; it should work out of the box.
import json
import urllib
import requests

baseURL = "https://api.semanticscholar.org/v1/paper/"
paperIDList = ["200794f9b353c1fe3b45c6b57e8ad954944b1e69",
               "b407a81019650fe8b0acf7e4f8f18451f9c803d5",
               "ff118a6a74d1e522f147a9aaf0df5877fd66e377"]

for paperID in paperIDList:
    response = requests.get(urllib.parse.urljoin(baseURL, paperID))
    metadata = response.json()
    record = dict()
    record["title"] = metadata["title"]
    record["abstract"] = metadata["abstract"]
    record["paperId"] = metadata["paperId"]
    record["year"] = metadata["year"]
    record["citations"] = [item["paperId"] for item in metadata["citations"] if item["paperId"]]
    record["references"] = [item["paperId"] for item in metadata["references"] if item["paperId"]]
    with open("metadata.json", "a") as fileObject:
        fileObject.write(json.dumps(record))
The problem is that when you do the split("{") you get a first item that is empty, corresponding to the opening {. Just ignore the first element and everything works fine (I also moved a misplaced parenthesis so that the quote replacement is applied to the line before json.loads parses it):
with open("metadata.json", "r") as f:
for line in f.readline().split("{")[1:]:
print(json.loads("{" + line).replace(r"\'", r"\""))
As suggested in the comments, I would actually recommend recreating the file, or saving a new version where you replace }{ with }\n{:
with open("metadata.json", "r") as f:
data = f.read()
data_lines = data.replace("}{","}\n{")
with open("metadata_mod.json", "w") as f:
f.write(data_lines)
That way you will have the metadata of one paper per line, as you want.
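An alternative that avoids string surgery altogether is json.JSONDecoder.raw_decode, which parses one object at a time and reports where it stopped. A sketch (the "title" field comes from the MWE above):

import json

def iter_objects(text):
    # yield successive JSON objects from a string of concatenated objects
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # skip any whitespace between objects
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns the parsed object and the index just past it
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

with open("metadata.json", "r") as f:
    for record in iter_objects(f.read()):
        print(record["title"])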

Read JSON file correctly

I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia
With json.load() you don't need to read each line; you can do either of these:
import json

def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)

data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can GET request the json from GitHub
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
These will both give the same output. The data variable will be a list of dictionaries that you can iterate over in a for loop for your further processing.
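For example, reusing the keys from the question's own code ("text" and "label" come from there, not from inspecting the dataset):

for item in data:
    text = item["text"]
    label = item.get("label")  # may be None if the key is absent
    print(text, label)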
Your code is reading one line at a time and parsing each line individually as JSON. Unless the creator of the file wrote it in that format (which, given it has a .json extension, is unlikely), that won't work, as JSON does not use line breaks to indicate the end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
        for item in data:
            text = item["text"]
The label appears to be buried in item["interaction"].
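If you also need the labels, something along these lines might work; note that the exact shape of "interaction" in the BioRelEx entries is an assumption here, not confirmed against the dataset:

for item in data:
    text = item["text"]
    interaction = item.get("interaction")  # assumed key; may be a dict or a list of dicts
    print(text, interaction)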

How do I loop through a multiple-delimited JSON file in Python?

I'm facing a problem looping over a multiple-delimited JSON file; the following is my JSON file content:
[{"Timestamp":"2019-05-17T18:00:00.19+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"3","Type":"MachineInfo"}}}]]
[{"Timestamp":"2019-05-17T18:00:10.502+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"1","Type":"MachineInfo"}}}]]
[{"Timestamp":"2019-05-17T18:00:05.814+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"9","Type":"MachineInfo"}}}]]
It doesn't work unless I manually add commas (,) after each row, as below:
[{"Timestamp":"2019-05-17T18:00:00.19+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"3","Type":"MachineInfo"}}}],
{"Timestamp":"2019-05-17T18:00:10.502+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"1","Type":"MachineInfo"}}}],
{"Timestamp":"2019-05-17T18:00:05.814+08:00","Items":[{"Name":"CurrentTaskSequence","Body":{"Status":"9","Type":"MachineInfo"}}}]]
def main():
    # Read json file
    f = open('/home/amirizzat/Desktop/data.json')
    data = json.load(f)
    f.close()

    # Print json
    print(data)

# call main
main()
So it appears that your file isn't exactly JSON; instead, it has lines, and the content of each line is JSON.
You could do something like
with open('/home/amirizzat/Desktop/data.json') as f:
    data = [json.loads(line) for line in f]

print(data)
That loops over the lines and deserializes the JSON for each one, putting the results in a list.
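If the file is large, a generator variant avoids holding all records in memory at once (a sketch using the same path):

import json

def iter_records(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

for record in iter_records('/home/amirizzat/Desktop/data.json'):
    print(record)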

Python: data to file then data from text file to list - TypeError: must be str, not bytes

I'm a beginner in programming and have decided to teach myself Python. After a few days, I've decided to code a little piece. It's pretty simple:
today's date
the page I am at (I'm reading a book)
how I feel
Then I add the data to a file; every time I launch the program, it adds a new line of data to the file. Then I extract the data to make a list of lists.
Truth is, I wanted to re-write my program in order to pickle a list and then unpickle the file. However, as I'm coping with an error I can't handle, I really really want to understand how to solve this. Therefore I hope you will be able to help me out :)
I've been struggling for the past hours with this apparently simple and stupid problem, but I can't find the solution. Here is the error and the code:
ERROR:
Traceback (most recent call last):
  File "dailyshot.py", line 25, in <module>
    SaveData(todaysline)
  File "dailyshot.py", line 11, in SaveData
    mon_pickler.dump(datatosave)
TypeError: must be str, not bytes
CODE:
import pickle
import datetime

def SaveData(datatosave):
    with open('journey.txt', 'wb') as thefile:
        my_pickler = pickle.Pickler(thefile)
        my_pickler.dump(datatosave)
        thefile.close()

todaylist = []
today = datetime.date.today()
todaylist.append(today)

page = input('Page Number?\n')
feel = input('How do you feel?\n')
todaysline = today.strftime('%d, %b %Y') + "; " + page + "; " + feel + "\n"

print('Thanks and Good Bye!')
SaveData(todaysline)

print('let\'s make a list now...')
thefile = open('journey.txt', 'rb')
thelist = [line.split(';') for line in thefile.readlines()]
thefile.close()
print(thelist)
Thanks a looot!
Ok so there are a few things to comment on here:
When you use a with statement, you don't have to explicitly close the file. Python will do that for you at the end of the with block (line 8).
You don't use todaylist for anything. You create it, add an element, and then just discard it, so it's probably useless :)
Why are you pickling a string object? If you have strings, just write them to the file as they are.
If you pickle data on write, you have to unpickle it on read. You shouldn't write pickled data and then read the file back as plain text.
Use 'a' for append when you are just adding items to the file; 'w' will overwrite your whole file.
What I would suggest is just writing a plain text file, where every line is one entry.
import datetime

def save(data):
    with open('journey.txt', 'a') as f:
        f.write(data + '\n')

today = datetime.date.today()
page = input('Page Number: ')
feel = input('How do you feel: ')
todaysline = ';'.join([today.strftime('%d, %b %Y'), page, feel])

print('Thanks and Good Bye!')
save(todaysline)

print('let\'s make a list now...')
with open('journey.txt', 'r') as f:
    for line in f:
        print(line.strip().split(';'))
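If you do want to stick with pickle, as the original post intended, the key is to unpickle on read exactly as you pickled on write. A minimal sketch (the journey.pkl filename is just an example):

import pickle

def save(data):
    # 'ab' appends; each pickle.dump() call writes one self-contained record
    with open('journey.pkl', 'ab') as f:
        pickle.dump(data, f)

def load_all():
    # read records back by calling pickle.load() until the file is exhausted
    records = []
    with open('journey.pkl', 'rb') as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:
                break
    return records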
Are you sure you posted the right code? That error can occur if you leave out the "b" when you open the file, e.g.
>>> with open('journey.txt', 'w') as thefile:
...     pickler = pickle.Pickler(thefile)
...     pickler.dump("some string")
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: must be str, not bytes
The file should be opened in binary mode
>>> with open('journey.txt', 'wb') as thefile:
...     pickler = pickle.Pickler(thefile)
...     pickler.dump("some string")
...
>>>

reading csv file without for

I need to read a CSV file in python.
Since I receive a 'NULL byte' error for the last row, I would like to avoid the for keyword and use while instead.
Do you know how to do that?
reader = csv.reader(file)
for row in reader:  # I have an error at this line
    pass  # do whatever with row
I want to substitute the for-loop with a while-loop so that I can check if the row is NULL or not.
What is the function for reading a single row in the CSV module?
Thanks
P.S. Below is the traceback:
Traceback (most recent call last):
  File "FetchNeuro_TodayTrades.py", line 189, in
    for row in reader:
_csv.Error: line contains NULL byte
Maybe you could catch the exception raised by the CSV reader. Something like this:
import csv
import sys

filename = "my.csv"
reader = csv.reader(open(filename))
try:
    for row in reader:
        print 'Row read with success!', row
except csv.Error, e:
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
Or you could use next():
while True:
    try:
        print reader.next()
    except csv.Error:
        print "Error"
    except StopIteration:
        print "Iteration End"
        break
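Note that reader.next() is Python 2 syntax; in Python 3 the same loop uses the built-in next():

while True:
    try:
        print(next(reader))       # fetch a single row from the reader
    except csv.Error:
        print("Error")
    except StopIteration:
        print("Iteration End")    # reader is exhausted
        break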
You need (always) to say EXACTLY what error message you got. Please edit your question.
Probably this:
>>> import csv; csv.reader("\x00").next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: line contains NULL byte
>>>
The csv module is not 8-bit clean; see the docs: """Also, there are currently some issues regarding ASCII NUL characters."""
The error message is itself in error: it should be "NUL", not "NULL" :-(
If the last line in the file is empty, you won't get an exception; you'll merely get row == [].
Assuming the problem is one or more NULs in your file(s), you'll need to (1) speak earnestly to the creator(s) of your file(s), or, (2) failing that, read the whole file in (mode="rb"), strip out the NUL(s), and feed fixed_text.splitlines() to the csv reader.
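In Python 3 terms, option (2) could be sketched like this (the filename is a placeholder):

import csv

with open("my.csv", "rb") as f:
    # read raw bytes, drop NULs, then decode to text
    fixed_text = f.read().replace(b"\x00", b"").decode("utf-8", errors="replace")

reader = csv.reader(fixed_text.splitlines())
for row in reader:
    print(row)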
The Django community has addressed Python CSV import issues, so it might be worth searching for CSV import there, or posting a question. Also, you could edit the offending line directly in the CSV file before trying the import.
You could try cleaning the file as you read it:
def nonull(stream):
    for line in stream:
        yield line.replace('\x00', '')

f = open(filename)
reader = csv.reader(nonull(f))
Assuming, of course, that simply ignoring NULL characters will work for you!
If your problem is specific to the last line being empty, you can use numpy.genfromtxt (or the old matplotlib.mlab.csv2rec)
$: cat >csv_file.txt
foo,bar,baz
yes,no,0
x,y,z
$:
$: ipython
>>> from numpy import genfromtxt
>>> genfromtxt("csv_file.txt", dtype=None, delimiter=',')
array([['foo', 'bar', 'baz'],
       ['yes', 'no', '0'],
       ['x', 'y', 'z']],
      dtype='|S3')
Not really sure what you mean, but you can always check for existence with if:
>>> reader = csv.reader("file")
>>> for r in reader:
...     if r: print r
...
If this is not what you want, you should describe your problem more clearly by showing examples of things that don't work for you, including the sample file format and the desired output you want.
I don't have an answer, but I can confirm the problem, and that most answers posted don't work. You cannot catch this exception. You cannot test with if line. Maybe you could check for the NULL byte directly, but I'm not swift enough to do that... If it is always on the last line, you could of course skip that:
import csv
FH = open('data.csv', 'wb')
line1 = [97, 44, 98, 44, 99, 10]
line2 = [100, 44, 101, 44, 102, 10]
for n in line1 + line2:
    FH.write(chr(n))
FH.write(chr(0))
FH.close()
FH = open('data.csv')
reader = csv.reader(FH)
for line in reader:
    if '\0' in line: continue
    if not line: continue
    print line
$ python script.py
['a', 'b', 'c']
['d', 'e', 'f']
Traceback (most recent call last):
  File "script.py", line 11, in <module>
    for line in reader:
_csv.Error: line contains NULL byte
Process the initial CSV file to replace the NUL '\0' bytes with nothing, and then you can read it.
The actual code looks like this:
data_initial = open(csv_file, "rU")
reader = csv.reader((line.replace('\0','') for line in data_initial))
It works for me.
And the original answer is here: csv-contain null byte
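Note that the "rU" mode is deprecated in Python 3 (and removed in 3.11); a roughly equivalent Python 3 sketch, with a placeholder filename:

import csv

with open("data.csv", "r", newline="") as data_initial:
    # strip NUL characters from each line before the csv reader sees it
    reader = csv.reader(line.replace("\0", "") for line in data_initial)
    for row in reader:
        print(row)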
