random/empty characters while re-editing a json file - python

I apologize for the vague title, but I really can't figure out what sort of problem I'm dealing with. So, here goes.
I have a Python file, edit-json.py:
import os, json

def add_rooms(data):
    if(not os.path.exists('rooms.json')):
        with open('rooms.json', 'w'): pass
    with open('rooms.json', 'r+') as f:
        d = f.read() # take existing data from file
        f.truncate(0) # empty the json file
        if(d == ''): rooms = [] # check if data is empty, i.e. the file was just created
        else: rooms = json.loads(d)['rooms']
        rooms.append({'name': data['roomname'], 'active': 1})
        f.write(json.dumps({"rooms": rooms})) # write new data (rooms list) to the json file

add_rooms({'roomname': 'friends'})
This script creates the file rooms.json (if it doesn't exist), grabs the existing data (an array) from the file, empties the file, then finally writes the new data back into it. All of this is done in the function add_rooms(), which is then called at the end of the script. Pretty simple stuff.
So, here's the problem. I run the script once and nothing weird happens, i.e. the file is created and the data inside it is:
{"rooms": [{"name": "friends", "active": 1}]}
But the weird stuff happens when I run the script again.
What I should see:
{"rooms": [{"name": "friends"}, {"name": "friends"}]}
What I see instead:
I apologize for having to post an image, but for some reason I couldn't copy the text I got.
And I obviously can't run the script a third time, because the JSON parser raises an error due to those characters.
I obtained this result in an online compiler. On my local Windows system, I get extra whitespace instead of those extra symbols.
I can't figure out what causes it. Maybe I'm doing the file handling incorrectly? Or is it due to the json module? Or am I the only one getting this result?

When you truncate the file, the file pointer is still at its old position at the end of the (now empty) file, so the next write happens at that offset and the gap before it is filled with null bytes; that is exactly the garbage symbols and extra whitespace you are seeing. Use f.seek(0) to move back to the start of the file:
import os, json

def add_rooms(data):
    if(not os.path.exists('rooms.json')):
        with open('rooms.json', 'w'): pass
    with open('rooms.json', 'r+') as f:
        d = f.read() # take existing data from file
        f.truncate(0) # empty the json file
        f.seek(0) # <<<<<<<<< add this line
        if(d == ''): rooms = [] # check if data is empty, i.e. the file was just created
        else: rooms = json.loads(d)['rooms']
        rooms.append({'name': data['roomname'], 'active': 1})
        f.write(json.dumps({"rooms": rooms})) # write new data (rooms list) to the json file

add_rooms({'roomname': 'friends'})
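As an aside, you can sidestep the truncate/seek bookkeeping entirely by reading and writing in two separate open() calls, since mode 'w' both truncates the file and positions the pointer at offset 0. A minimal sketch of that variant (same rooms.json layout as above, not the only way to do it):

import os, json

def add_rooms(data):
    rooms = []
    # read the existing list, if the file exists and is non-empty
    if os.path.exists('rooms.json'):
        with open('rooms.json') as f:
            content = f.read()
        if content:
            rooms = json.loads(content)['rooms']
    rooms.append({'name': data['roomname'], 'active': 1})
    # 'w' truncates and starts writing at offset 0, so no seek() is needed
    with open('rooms.json', 'w') as f:
        json.dump({'rooms': rooms}, f)

add_rooms({'roomname': 'friends'})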

Related

Reading a text file of dictionaries stored in one line

Question
I have a text file that records metadata of research papers requested with the SemanticScholar API. However, when I wrote the requested data, I forgot to add "\n" after each individual record. This results in something that looks like
{<metadata1>}{<metadata2>}{<metadata3>}...
and this is what it should look like if I had added "\n":
{<metadata1>}
{<metadata2>}
{<metadata3>}
...
Now I would like to read the data back. As all the metadata is stored on one line, I need a couple of hacks:
First, I split the concatenated dicts on "{".
Then I try to convert each piece back to a dict. Note that I do account for the possibility that a line might not be in proper JSON format.
import json

with open("metadata.json", "r") as f:
    for line in f.readline().split("{"):
        print(json.loads("{" + line.replace("\'", "\"")))
However, I still get an error message:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I am wondering what I should do to recover all the metadata I collected.
MWE
Note: to generate the metadata.json file I use, run the following code; it should work out of the box.
import json
import urllib
import requests

baseURL = "https://api.semanticscholar.org/v1/paper/"
paperIDList = ["200794f9b353c1fe3b45c6b57e8ad954944b1e69",
               "b407a81019650fe8b0acf7e4f8f18451f9c803d5",
               "ff118a6a74d1e522f147a9aaf0df5877fd66e377"]

for paperID in paperIDList:
    response = requests.get(urllib.parse.urljoin(baseURL, paperID))
    metadata = response.json()
    record = dict()
    record["title"] = metadata["title"]
    record["abstract"] = metadata["abstract"]
    record["paperId"] = metadata["paperId"]
    record["year"] = metadata["year"]
    record["citations"] = [item["paperId"] for item in metadata["citations"] if item["paperId"]]
    record["references"] = [item["paperId"] for item in metadata["references"] if item["paperId"]]
    with open("metadata.json", "a") as fileObject:
        fileObject.write(json.dumps(record))
The problem is that when you do the split("{") you get a first item that is empty, corresponding to the opening {. Just ignore the first element and everything works fine (note that the quote replacement must be applied to the string before json.loads is called, not to its result):
with open("metadata.json", "r") as f:
    for line in f.readline().split("{")[1:]:
        print(json.loads("{" + line.replace("\'", "\"")))
As suggested in the comments, I would actually recommend recreating the file, or saving a new version, in which you replace }{ with }\n{:
with open("metadata.json", "r") as f:
    data = f.read()

data_lines = data.replace("}{", "}\n{")

with open("metadata_mod.json", "w") as f:
    f.write(data_lines)
That way you will have one paper's metadata per line, as you wanted.
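If you'd rather avoid string splitting altogether, the standard json module can parse concatenated objects directly: json.JSONDecoder.raw_decode parses one JSON value from a string and returns the index just past it. A minimal sketch against the same metadata.json (assuming, as in the MWE, that the records really were written with json.dumps):

import json

decoder = json.JSONDecoder()

with open("metadata.json", "r") as f:
    content = f.read()

records = []
pos = 0
while pos < len(content):
    if content[pos].isspace():  # tolerate stray whitespace between objects
        pos += 1
        continue
    # raw_decode returns the parsed object and the index just past it
    record, pos = decoder.raw_decode(content, pos)
    records.append(record)

print(len(records))  # one dict per paper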

Convert to 'Strict' JSON

I have a JSON file which the creators say is not 'strict' JSON, and they have given Python code to convert it into strict JSON. I am new to Python and keep getting error messages.
JSON example:
{'asin': '0078764343', 'description': 'Brand new sealed!', 'price': 37.98, 'imUrl': 'http://ecx.images-amazon.com/images/I/513h6dPbwLL._SY300_.jpg', 'related': {'also_bought': ['B000TI836G', 'B003Q53VZC', 'B00EFFW0HC', 'B003VWGBC0', 'B003O6G5TW', 'B0037LTTRO', 'B002I098JE', 'B008OQTS0U', 'B005EVEODY', 'B008B3AVNE', 'B000PE0HBS', 'B00354NAYG', 'B0050SYPV2', 'B00503E8S2', 'B0050SY77E', 'B0022TNO7S', 'B0056WJA30', 'B0023CBY4E', 'B002SRSQ72', 'B005EZ5GQY', 'B004XACA60', 'B00273Z9WM', 'B004HX1QFY', 'B002I0K50U'], 'bought_together': ['B002I098JE'], 'buy_after_viewing': ['B0050SY5BM', 'B000TI836G', 'B0037LTTRO', 'B002I098JE']}, 'salesRank': {'Video Games': 28655}, 'categories': [['Video Games', 'Xbox 360', 'Games']]}
Python Code:
import json
import gzip

def parse(file_path=r"c:\Users\kiero\PycharmProjects\untitled\source\reviews_Video_Games.json.gz"):
    g = gzip.open(file_path, 'r')
    for l in g:
        yield json.dumps(eval(l))

    f = open("C:\\Users\\kiero\\PycharmProjects\\untitled\\source\\reviews_Video_Games.json.gz", 'w')
    for l in parse("C:\\Users\\kiero\\PycharmProjects\\untitled\\source\\reviews_Video_Games.json.gz"):
        f.write(l + '\n')
The process keeps finishing without editing any files. The script runs for less than a second, with no error messages.
Any help would be appreciated.
There are many problems. The first is that using eval on data you get from other people is a huge security hole.
The second is that your code is indented wrong -- the part from f = open( onwards, which calls parse, shouldn't be indented; as written it is part of parse, and that function is never called (so nothing happens).
Third, you open the exact same file for writing (with 'w') as you do on the next line for reading; but opening a file for writing empties it. So there is no data to read; what was there has been destroyed.
Fourth, the file you open for writing has a ".gz" filename but is written as plain text, so the result would never be a gzip file.
Real code could look like:
import ast, gzip, json

INFILE = 'c://path/to/infile.gz'            # gzip file
OUTFILE = 'c://path/to/other/file/out.json' # text file with the generated JSON

in_f = gzip.open(INFILE, 'rt')  # 'rt' so each line comes back as text, not bytes
out_f = open(OUTFILE, 'w')

for line in in_f:
    data = ast.literal_eval(line)  # assuming each line is a valid Python literal
    out_f.write(json.dumps(data) + '\n')

in_f.close()
out_f.close()
Fifth, the end result would still have one data object per line, so the file as a whole would still not be valid JSON -- a JSON string represents one object. Each individual line would be valid JSON.
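If you do need the file as a whole to be valid JSON, a common variant is to collect the parsed records into a list and dump that list once, so the file holds a single top-level array. A minimal sketch along the lines of the code above (same placeholder INFILE/OUTFILE paths):

import ast, gzip, json

INFILE = 'c://path/to/infile.gz'
OUTFILE = 'c://path/to/other/file/out.json'

records = []
with gzip.open(INFILE, 'rt') as in_f:
    for line in in_f:
        records.append(ast.literal_eval(line))  # each line is a Python literal

# one top-level array makes the whole file a single valid JSON document
with open(OUTFILE, 'w') as out_f:
    json.dump(records, out_f)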

pickle.dump dumps nothing when appending to file

The user may give a bunch of URLs as command-line arguments. All URLs given in the past are serialized with pickle. The script checks all given URLs; if they are unique, they are serialized and appended to a file. At least, that's what should be happening. Nothing is being appended. However, when I open the file in write mode, the new, unique URL is written. So what gives? Code is:
def get_new_urls():
    if(len(urls.URLs) != 0): # check if empty
        with open(urlFile, 'rb') as f:
            try:
                cereal = pickle.load(f)
                print(cereal)
                toDump = []
                for arg in urls.URLs:
                    if (arg in cereal):
                        print("Duplicate URL {0} given, ignoring it.".format(arg))
                    else:
                        toDump.append(arg)
            except Exception as e:
                print("Holy bleep something went wrong: {0}".format(e))
    return(toDump)

urlsToDump = get_new_urls()
print(urlsToDump)

# TODO: append new URLs
if(urlsToDump):
    with open(urlFile, 'ab') as f:
        pickle.dump(urlsToDump, f)

# TODO check HTML of each page against the serialized copy
with open(urlFile, 'rb') as f:
    try:
        cereal = pickle.load(f)
        print(cereal)
    except EOFError: # your URL file is empty, bruh
        pass
Pickle writes out the data you give it in a special format, e.g. it writes some header/metadata/etc. to the file you give it.
It is not intended to work this way; a single pickle.load only reads back the first object in the file, so appending a second pickle does not grow the list you read. To achieve a concatenation of your data, you'd need to first read whatever is in the file into your urlsToDump, then update urlsToDump with any new data, and then finally dump it out again (overwriting the whole file, not appending).
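A minimal sketch of that read-update-rewrite pattern, assuming the file holds a single pickled list (urlFile and new_urls are placeholders matching the question):

import os
import pickle

def merge_urls(urlFile, new_urls):
    known = []
    if os.path.exists(urlFile):
        with open(urlFile, 'rb') as f:
            known = pickle.load(f)  # one list, written by one dump()
    # keep only URLs we haven't seen before
    to_add = [u for u in new_urls if u not in known]
    with open(urlFile, 'wb') as f:      # 'wb': overwrite, don't append
        pickle.dump(known + to_add, f)  # write a single list back
    return to_add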
After
with open(urlFile, 'rb') as f:
you need a while loop that repeatedly unpickles (repeatedly reads) from the file until it hits EOF.
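If you do keep appending one pickle per run, the reading side has to loop until pickle.load raises EOFError. A minimal sketch, assuming each dump() wrote one list of URLs as in the question:

import pickle

def load_all(urlFile):
    items = []
    with open(urlFile, 'rb') as f:
        while True:
            try:
                items.extend(pickle.load(f))  # read the next pickled list
            except EOFError:
                break  # no more pickles in the file
    return items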

Odd behavior of os

I'm trying to write a Python script that will take any playlist and recreate it on another file structure. I have it written now so that all the filenames are stripped off the original playlist and put into a file. That works. Then the function findsong() is supposed to walk through the new directory, find the same songs, and make a new playlist based on the new directory structure.
Here's where it gets weird. If I use 'line' as the argument in the line 'if line in files', I get an empty new playlist. If I use ANY file that I know is there as the argument, the entire playlist is recreated, not just the file I used as the argument. That's how I have it set up in this code. I cannot figure out this weird behavior. As long as the file exists, the whole playlist is recreated with the new paths. Wut??
Here is the code:
import os

def check():
    datafile = open('testlist.m3u')
    nopath = open('nopath.txt', 'w')
    nopath.truncate()
    for line in datafile:
        if 'mp3' in line:
            nopath.write(os.path.basename(line))
        if 'wma' in line:
            nopath.write(os.path.basename(line))
    nopath.close()

def findsong():
    nopath = open('nopath.txt')
    squeezelist = open('squeezelist.m3u', 'w')
    squeezelist.truncate()
    for line in nopath:
        print line
        for root, dirs, files in os.walk("c:\\Documents and Settings\\"):
            print files
            if ' Tuesday\'s Gone.mp3' in files:
                squeezelist.write(os.path.join(root, line))
    squeezelist.close()

check()
findsong()
When you iterate over the lines of a file, Python retains the trailing newline \n on each line, so 'line' never exactly matches a filename in files. You'll want to strip it off:
for line in nopath:
    line = line.rstrip()
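Applied to the loop in findsong(), that looks roughly like this (same names as the question; the appended '\n' keeps one entry per line in the new playlist):

for line in nopath:
    song = line.rstrip()  # drop the trailing newline from the playlist entry
    for root, dirs, files in os.walk("c:\\Documents and Settings\\"):
        if song in files:
            squeezelist.write(os.path.join(root, song) + '\n')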

NameError: global name is not defined

Hello,
My error is produced when generating a zip file. Can you tell me what I should do?
main.py", line 2289, in get
buf=zipf.read(2048)
NameError: global name 'zipf' is not defined
The complete code is as follows:
def addFile(self, zipstream, url, fname):
    # get the contents
    result = urlfetch.fetch(url)
    # store the contents in a stream
    f = StringIO.StringIO(result.content)
    length = result.headers['Content-Length']
    f.seek(0)
    # write the contents to the zip file
    while True:
        buff = f.read(int(length))
        if buff == "": break
        zipstream.writestr(fname, buff)
    return zipstream

def get(self):
    self.response.headers["Cache-Control"] = "public,max-age=%s" % 86400
    start = datetime.datetime.now() - timedelta(days=20)
    count = int(self.request.get('count')) if not self.request.get('count') == '' else 1000
    from google.appengine.api import memcache
    memcache_key = "ads"
    data = memcache.get(memcache_key)
    if data is None:
        a = Ad.all().filter("modified >", start).filter("url IN", ['www.koolbusiness.com']).filter("published =", True).order("-modified").fetch(count)
        memcache.set("ads", a)
    else:
        a = data
    dispatch = 'templates/kml.html'
    template_values = {'a': a, 'request': self.request}
    path = os.path.join(os.path.dirname(__file__), dispatch)
    output = template.render(path, template_values)
    self.response.headers['Content-Length'] = len(output)
    zipstream = StringIO.StringIO()
    file = zipfile.ZipFile(zipstream, "w")
    url = 'http://www.koolbusiness.com/list.kml'
    # repeat this for every URL that should be added to the zipfile
    file = self.addFile(file, url, "list.kml")
    # we have finished with the zip so package it up and write the directory
    file.close()
    zipstream.seek(0)
    # create and return the output stream
    self.response.headers['Content-Type'] = 'application/zip'
    self.response.headers['Content-Disposition'] = 'attachment; filename="list.kmz"'
    while True:
        buf = zipf.read(2048)
        if buf == "": break
        self.response.out.write(buf)
That is probably zipstream and not zipf. So replace that with zipstream and it might work.
I don't see where you declare zipf. zipfile? Senthil Kumaran is probably right with zipstream, since you seek(0) on zipstream just before the while loop that reads chunks of the mystery variable.
edit:
Almost certainly the variable is zipstream.
zipfile docs:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or a file-like object. The mode parameter should be 'r' to read an existing file, 'w' to truncate and write a new file, or 'a' to append to an existing file. If mode is 'a' and file refers to an existing ZIP file, then additional files are added to it. If file does not refer to a ZIP file, then a new ZIP archive is appended to the file. This is meant for adding a ZIP archive to another file (such as python.exe).
your code:
zipstream = StringIO.StringIO()
creates a file-like object using StringIO, which is essentially a "memory file"; read more in the docs.
file = zipfile.ZipFile(zipstream, 'w')
opens the zipfile with the zipstream file-like object in 'w' mode.
url = 'http://www.koolbusiness.com/list.kml'
# repeat this for every URL that should be added to the zipfile
file = self.addFile(file, url, "list.kml")
# we have finished with the zip so package it up and write the directory
file.close()
uses the addFile method to retrieve the data and write it to the file-like object, then returns it. The variable names are slightly confusing, because the ZipFile you pass to the addFile method is called zipstream there (confusing because we also use zipstream for the StringIO "memory file"). Anyway, the ZipFile is returned and closed to make sure everything is "written".
It was written to our "memory file", which we now seek to index 0
zipstream.seek(0)
and after doing some header stuff, we finally reach the while loop that will read our "memory-file" in chunks
while True:
    buf = zipstream.read(2048)
    if buf == "": break
    self.response.out.write(buf)
You need to declare:
global zipf
right after your
def get(self):
line. You are modifying a global variable, and this is the only way Python knows what you are doing.
