Can't write long JSON output to text file - python

I have a long string (8,315 characters) worth of JSON, but I can't seem to write it to a .txt file using Python without it being truncated.
I write the JSON to a text file and then upload it via FTP, but both the .txt file on my system and the .txt file on the FTP server are truncated.
Here's the code:
# Upload the results
host = ftputil.FTPHost('ftp.website.com', 'username', 'password')
jsonOutput = json.dumps(full_json)
f = open('C:/Comparison.txt', 'w')
f.write(jsonOutput)
host.upload('C:/Comparison.txt', '/public_html/Comparison.txt')
f.close()
print jsonOutput
The JSON output in the console is valid and whole, but it is truncated in the .txt file that is written (and then the .txt file after it is uploaded).
Most of the time, the output will end at http://www.digikey.com/product-detail/en/A000073/1050-10 when the full URL is actually http://www.digikey.com/product-detail/en/A000073/1050-1041-ND/3476357 (and then of course, it cuts off the rest of the JSON)
I'm not sure if this makes any difference, but I also tried f.write(re.escape(jsonOutput)) with the same results.
Can anyone help with this?

with open('C:/Comparison.txt', 'w') as f:
    json.dump(full_json, f)
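In the question's code, host.upload() is called before f.close(), so part of the text may still be sitting in the write buffer when the file is read for upload. The following is a minimal sketch of the ordering, not a definitive fix: write_then_upload and its upload parameter are hypothetical names introduced here for illustration.

```python
import json


def write_then_upload(data, local_path, upload):
    """Write data as JSON, close the file, and only then call upload(local_path)."""
    with open(local_path, "w") as f:
        json.dump(data, f)
    # Leaving the with block closes the file, which flushes any buffered
    # text to disk before the upload step reads it.
    upload(local_path)
```

With the names from the question, the call might look like write_then_upload(full_json, 'C:/Comparison.txt', lambda p: host.upload(p, '/public_html/Comparison.txt')).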

Related

How To Read bytes.fromhex() From A File in Python

I'm working on a project, and a key thing that I'm stuck on is being able to read in encrypted data from a file. I've done some looking around, and I can't find anything specific about this issue.
Data is encrypted from a Python implementation of DES, and the encryption comes out from this return statement: return bytes.fromhex('').join(result). For example, encrypting b'This' gives this as a result:
b'\xc5lP\x04\x8c\xe2\xa8\x05'
I then place this encryption into a file (opened as "wb") using out_file.write(data).
My problem is that when I try to read the encrypted data from the file, nothing gets read. The code below shows that I can read in data the way I want when plaintext is used, but not when this formatting of encrypted text is. I need the read-in data as a bytes type.
with open(filename, "rb") as in_file:
    buffer = in_file.read()
Using this on a file with the plaintext This, printing buffer looks like:
b'This'
However, doing this on a file with the encrypted plaintext formed from bytes.fromhex(''), printing buffer gives nothing:
b''
Are there any suggestions on how to either format the encrypted text to put it into a file so that it can be read, or reading data from a file in this particular format? I'm just not understanding why this format is not being interpreted properly as bytes when I read it in from a file.
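For reference, bytes.fromhex('') evaluates to b'', so the return statement is equivalent to b''.join(result), and the resulting bytes are ordinary binary data. One possible cause, which the question does not confirm, is that the output file was read before it had been closed, so nothing had been flushed to disk yet. A minimal round-trip sketch under that assumption (roundtrip is a hypothetical helper name):

```python
def roundtrip(data: bytes, path: str) -> bytes:
    """Write data to path in binary mode, then read it back as bytes."""
    with open(path, "wb") as out_file:
        out_file.write(data)
    # The writer is closed here, so the bytes are on disk before reading.
    with open(path, "rb") as in_file:
        return in_file.read()
```

If this round trip returns the encrypted bytes unchanged, the file format itself is not the problem and the order of the write, close, and read calls is worth checking.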

When extracting my .json.gz file, some characters are added to it - and the file cannot be stored as a json file

I am trying to unzip some .json.gz files, but gzip adds some characters to it, and hence makes it unreadable for JSON.
What do you think is the problem, and how can I solve it?
If I use unzipping software such as 7zip to unzip the file, this problem disappears.
This is my code:
with gzip.open('filename' , 'rb') as f:
    json_content = json.loads(f.read())
This is the error I get:
Exception has occurred: json.decoder.JSONDecodeError
Extra data: line 2 column 1 (char 1585)
I used this code:
with gzip.open ('filename', mode='rb') as f:
    print(f.read())
and realized that the file starts with b' (as shown below):
b'{"id":"tag:search.twitter.com,2005:5667817","objectType":"activity"
I think b' is what makes the file unworkable for the next stage. Do you have any solution to remove the b'? There are millions of these zipped files, and I cannot do that manually.
I uploaded a sample of these files in the following link
just a few json.gz files
The problem isn't the b prefix you're seeing with print(f.read()). That prefix just means the data is a bytes object (a sequence of raw byte values) rather than a str (a sequence of Unicode characters), and json.loads() will accept either. The JSONDecodeError occurs because the data in the gzipped file, taken as a whole, isn't a single valid JSON document, which is required. The format looks like something known as JSON Lines, which the Python standard library json module doesn't (directly) support.
Dunes' answer to the question that @Charles Duffy marked this as a duplicate of (at one point) wouldn't have worked as presented because of this formatting issue. However, from the sample file you linked in your question, it looks like there is a valid JSON object on each line of the file. If that's true of all of your files, then a simple workaround is to process each file line by line.
Here's what I mean:
import json
import gzip

filename = '00_activities.json.gz'  # Sample file.
json_content = []
with gzip.open(filename , 'rb') as gzip_file:
    for line in gzip_file:  # Read one line.
        line = line.rstrip()
        if line:  # Any JSON data on it?
            obj = json.loads(line)
            json_content.append(obj)
print(json.dumps(json_content, indent=4))  # Pretty-print data parsed.
Note that the output it prints shows what valid JSON might have looked like.
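Since the question mentions millions of such files, a lazy variant of the same line-by-line idea may be useful. This is only a sketch: iter_json_lines is a hypothetical helper name, and it assumes the files are UTF-8 encoded with one JSON object per non-empty line.

```python
import gzip
import json


def iter_json_lines(path):
    """Yield one parsed object per non-empty line of a gzipped JSON Lines file."""
    # "rt" opens the gzip stream in text mode; UTF-8 is an assumption here.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # Skip blank lines between records.
                yield json.loads(line)
```

Because it is a generator, only one line is held in memory at a time, for example: for obj in iter_json_lines('00_activities.json.gz'): print(obj["id"]).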

How does one read a .dif file with Python

I am working on a project that requires me to read a file with a .dif extension. DIF stands for Data Interchange Format. The file opens nicely in OpenOffice Calc, where you can easily save it as a CSV file. However, when I open it in Python, all I get are random characters that don't make sense. Here is the last code that I tried just to see if I could read it.
txt = open('C:\myfile.dif', 'rb').read()
print txt
I would even be open to programmatically converting the file to CSV before opening it, if someone knows how to do that. As always, any help is much appreciated. Below is a partial screenshot of what I get when I run the code.
Hadn't heard of this file format. Went and got a sample here.
I tested your method and it works fine:
>>> content = open(r"E:\sample.dif", 'rb').read()
>>> print (content)
b'TABLE\r\n0,1\r\n"EXCEL"\r\nVECTORS\r\n0,8\r\n""\r\nTUPLES\r\n0,3\r\n""\r\nDATA\r\n0,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"Welcome to File Extension FYI Center!"\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n""\r\n1,0\r\n""\r\n1,0\r\n""\r\n-1,0\r\nBOT\r\n1,0\r\n"ID"\r\n1,0\r\n"Type"\r\n1,0\r\n"Description"\r\n-1,0\r\nBOT\r\n0,1\r\nV\r\n1,0\r\n"ASP"\r\n1,0\r\n"Active Server Pages"\r\n-1,0\r\nBOT\r\n0,2\r\nV\r\n1,0\r\n"JSP"\r\n1,0\r\n"JavaServer Pages"\r\n-1,0\r\nBOT\r\n0,3\r\nV\r\n1,0\r\n"PNG"\r\n1,0\r\n"Portable Network Graphics"\r\n-1,0\r\nBOT\r\n0,4\r\nV\r\n1,0\r\n"GIF"\r\n1,0\r\n"Graphics Interchange Format"\r\n-1,0\r\nBOT\r\n0,5\r\nV\r\n1,0\r\n"WMV"\r\n1,0\r\n"Windows Media Video"\r\n-1,0\r\nEOD\r\n'
>>>
The question is what is in the file and how do you want to handle it. Personally I liked:
with open(r"E:\sample.dif", 'rb') as f:
    for line in f:
        print (line)
In the first code block, the long line with a b prefix (for bytes!) in front of it is the whole file. Iterating over the file instead splits it at each \r\n:
b'TABLE\r\n'
b'0,1\r\n'
b'"EXCEL"\r\n'
b'VECTORS\r\n'
b'0,8\r\n'
b'""\r\n'
b'TUPLES\r\n'
b'0,3\r\n'
b'""\r\n'
b'DATA\r\n'
b'0,0\r\n'
.
.
.
b'"Windows Media Video"\r\n'
b'-1,0\r\n'
b'EOD\r\n'
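The question also asks about converting the file to CSV. Judging only from the sample output above, the data section appears to consist of line pairs: a "type,number" line followed by a value line, with BOT starting a row and EOD ending the data. The sketch below assumes that layout and nothing more; dif_to_rows and rows_to_csv are hypothetical helper names, and embedded quotes inside string values are not handled.

```python
import csv
import io


def dif_to_rows(text):
    """Parse the DATA section of DIF text into a list of rows (lists of strings)."""
    lines = text.splitlines()
    i = lines.index("DATA") + 3  # Skip the DATA line and its two header lines.
    rows = []
    while i + 1 < len(lines):
        type_and_number, value = lines[i], lines[i + 1]
        i += 2
        kind, _, number = type_and_number.partition(",")
        if kind == "-1":  # Directive: BOT starts a row, EOD ends the data.
            if value == "EOD":
                break
            if value == "BOT":
                rows.append([])
        elif kind == "0":  # Numeric cell: the number sits on the first line.
            rows[-1].append(number)
        elif kind == "1":  # String cell: the quoted text sits on the second line.
            rows[-1].append(value.strip('"'))
    return rows


def rows_to_csv(rows):
    """Render rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

To use it on a file read in binary mode, decode the bytes first, for example dif_to_rows(content.decode("latin-1")); the encoding is an assumption and depends on how the file was produced.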

new line chars added to csv file after ftp.storbinary()

I am attempting to store a csv file on an ftp server using python's ftplib module.
Right now, I have about 30 lines of code which generates probabilities of weather values in a 2-d array. I then write this 2-d array to a csv file.
When I write the csv file onto my local drive, the file displays as expected in Excel. However, when I view the file after uploading it to an ftp server, I see that a new line character has been added after every row.
I've done some minor testing to see what the problem may be, and I have been able to upload the csv file with coreftp. The csv file displays correctly after I do that. So I am pretty sure the file is fine, and it's something that happens when python uploads it onto an ftp server.
I was originally creating a text file with a .csv extension file then reopening it as a binary file and uploading it. I thought that may be the issue so I tried using the csv module, but same issue.
Here is my code at the moment...
TEMPSHEADER = [i-50 for i in range(181)]  # upper bounds exclusive
WINDSHEADER = [i for i in range(101)]  # upper bounds exclusive
HEADER = TEMPSHEADER + WINDSHEADER
for site in ensmosdic:
    ensmos = ensmosdic.get(site)
    with open(utcnow.strftime("%Y-%m-%d") + "-" +site+"-prob.csv","w",newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerow(["CODE ","F","ForecastDate","HOUR"]+HEADER)
        siteTable =[[0 for x in range(286)] for y in range(24,169)]  # upper bounds exclusive
        ###########
        #other code here, but not important with regards to post
        ###########
        for i in siteTable:
            writer.writerow(i)
    csvfile.close()  # not sure if you have to close csv file, not in csv module docs
    f = open(utcnow.strftime("%Y-%m-%d") + "-" +site+"-prob.csv","rb")
    ftpInno.storbinary("STOR " + utcnow.strftime("%Y-%m-%d-") + site +"-prob.csv",f)
    f.close()
ftpInno.close()
Thanks in advance
After an hour or so of troubleshooting, the answer is fairly simple, although I am not entirely sure why it works.
What I did was create a text file instead of the csv file I was creating in my original question:
with open(FILELOCATION + utcnow.strftime("%Y-%m-%d") + "-" +site+"-prob.txt","w") as f:
    # write to the file here
    pass
f.close()
# open the file again, this time in binary mode
f = open(FILELOCATION + utcnow.strftime("%Y-%m-%d") + "-" +site+"-prob.txt","rb")
ftp.storlines("STOR " + utcnow.strftime("%Y-%m-%d-") + site +"-prob.csv",f)
f.close()
Reading the file as a binary file and then storing it with the storlines method removed the extra lines I was seeing in the file after I uploaded it to an ftp server.
This might shed some light on your issue. I had a project where I was using the Windows command line and also Windows PowerShell to transfer .csv files with the ftp get and mget commands. Like you said, I was getting an extra line between each row. Switching to binary transfer mode seemed to fix my issue. For example, once you are in the ftp dialog, just type "binary" and hit enter, and it switches the mode.
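Neither answer pins down where the extra line breaks come from. One way to narrow it down is to inspect the raw bytes of the local file and of a downloaded copy, and compare the line endings they contain. The helper below is a hypothetical diagnostic sketch, not part of ftplib:

```python
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count the line-ending styles that occur in raw file bytes."""
    counts = Counter()
    i = 0
    while i < len(data):
        if data.startswith(b"\r\r\n", i):  # Doubled carriage return.
            counts["CRCRLF"] += 1
            i += 3
        elif data.startswith(b"\r\n", i):  # Windows style.
            counts["CRLF"] += 1
            i += 2
        elif data[i:i + 1] == b"\n":  # Unix style.
            counts["LF"] += 1
            i += 1
        elif data[i:i + 1] == b"\r":  # Bare carriage return.
            counts["CR"] += 1
            i += 1
        else:
            i += 1
    return counts
```

For example, count_line_endings(open(path, "rb").read()) on both copies shows whether the endings changed during the transfer; a CRCRLF or bare CR count would explain rows that appear double spaced.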

Create hash table from the contents of a file

How can I open a text file, read the contents of the file and create a hash table from this content? So far I have tried:
import json
json_data = open(/home/azoi/Downloads/yes/1.txt).read()
data = json.loads(json_data)
pprint(data)
I suggest this solution:
import json
from pprint import pprint

with open("/home/azoi/Downloads/yes/1.txt") as f:
    data = json.load(f)
pprint(data)
The with statement ensures that your file is automatically closed whatever happens and that your program raises the correct exception if the open fails. The json.load function loads data directly from an open file handle.
Additionally, I strongly suggest reading and understanding the Python tutorial. It's essential reading and won't take too long.
To open a file you have to call the open function correctly, with the path as a quoted string, something like:
json_data=open('/home/azoi/Downloads/yes/1.txt','r')
where the first string is the path to the file and the second is the mode: r = read, w = write, a = append
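On the "hash table" part of the question: when the file holds a JSON object, json.load returns a Python dict, which is Python's built-in hash table. A small sketch, using a made-up JSON string in place of the file contents:

```python
import json

# Hypothetical sample standing in for the contents of 1.txt.
text = '{"apple": 3, "pear": 5}'

table = json.loads(text)        # A JSON object becomes a dict.
count = table["apple"]          # Look up a value by key.
missing = table.get("plum", 0)  # Supply a default for an absent key.
```

If the file holds a JSON array instead, the result is a list rather than a dict.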
