Dumping JSON data straight into a text file? - python

Currently, I can't get the data being received from my other client software to write into a file that will append as well as add a space after each dump. I tried quite a few different approaches, but I'm left with this now and I'm a bit stumped.
At the moment I can no longer get a file to write and I'm not sure what I've done to destroy that part of my code.
while True:
    data = s.recv(1024).decode('utf-8')
    if data:
        with open("data.txt", 'w') as f:
            json.dump(data, f, ensure_ascii=False)
I am expecting a file will appear that will not be overwritten each time I receive new data, allowing me to develop my search and table features of my application.

What you are currently doing for each block:
Decode the block as UTF-8
Open a file, truncating the previous contents ('w' mode)
Re-encode the data (json.dump on a str writes it out as a quoted JSON string)
Dump it to the file
Why this is a bad way to do it:
Your blocks are not necessarily going to respect UTF-8 character boundaries: a multi-byte character can be split across two recv() calls. You need to accumulate all the data before you decode.
Not only are you truncating the existing file by using 'w' instead of 'a' mode, but opening and closing a file over and over is very inefficient and generally a bad idea.
You are not going to get the same result if the original block was off a UTF-8 boundary. Worst case, your whole dataset will be trash.
You have no way of ending the stream. You probably want to close the file eventually and decode it.
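The boundary problem in the first point can be reproduced without a socket. The following is a minimal sketch, assuming a hand-made two-chunk split; the sample text and the helper names are illustrative and not part of the original code. It also shows codecs' incremental decoder as an alternative to accumulating everything first, which is a different technique from the one recommended in this answer.

```python
import codecs

text = "naïve café"
raw = text.encode("utf-8")

# Split in the middle of the two-byte UTF-8 sequence for "ï",
# the way an unlucky recv(1024) boundary could.
split_at = raw.index("ï".encode("utf-8")) + 1
chunks = [raw[:split_at], raw[split_at:]]


def decode_each_chunk(chunks):
    """Decode every chunk on its own, as the loop in the question does."""
    return [chunk.decode("utf-8") for chunk in chunks]


def decode_after_joining(chunks):
    """Accumulate all bytes first, then decode once."""
    return b"".join(chunks).decode("utf-8")


def decode_incrementally(chunks):
    """Alternative: a stateful decoder that carries partial characters over."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)
```

Calling decode_each_chunk(chunks) raises UnicodeDecodeError on the first chunk, while the other two functions return the original text.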
How you should do it:
Open an output file (in binary mode)
Loop until the stream ends
Dump all your raw binary packets to a file
Close the file
Decode the file when you read it
Sample code:
with open('data.txt', 'wb') as file:
    while True:
        data = s.recv(1024)
        if not data:
            break
        file.write(data)
If the binary stream contains UTF-8 encoded JSON data, that's what you will get in your file.
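The last step, "decode the file when you read it", could look roughly like this. It is only a sketch under two assumptions: the whole file holds exactly one UTF-8 encoded JSON document, and write_chunks is a hypothetical stand-in for the recv() loop above so the example runs without a socket.

```python
import json
import os
import tempfile


def write_chunks(path, chunks):
    """Write raw byte chunks to a file (stand-in for the recv() loop)."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)


def load_json_file(path):
    """Read the raw bytes back, decode them once, then parse the JSON."""
    with open(path, "rb") as f:
        raw = f.read()
    return json.loads(raw.decode("utf-8"))


def demo():
    payload = json.dumps(
        {"name": "café", "values": [1, 2, 3]}, ensure_ascii=False
    ).encode("utf-8")
    # Five-byte chunks deliberately cut through the multi-byte "é".
    chunks = [payload[i:i + 5] for i in range(0, len(payload), 5)]
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_chunks(path, chunks)
        return load_json_file(path)
    finally:
        os.remove(path)
```

Because the bytes are only decoded after the file is complete, it does not matter where the chunk boundaries fell.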

Related

How To Read bytes.fromhex() From A File in Python

I'm working on a project, and a key thing that I'm stuck on is being able to read in encrypted data from a file. I've done some looking around, and I can't find anything specific about this issue.
Data is encrypted from a Python implementation of DES, and the encryption comes out from this return statement: return bytes.fromhex('').join(result). For example, encrypting b'This' gives this as a result:
b'\xc5lP\x04\x8c\xe2\xa8\x05'
I then place this encryption into a file (opened as "wb") using out_file.write(data).
My problem is that when I try to read the encrypted data from the file, nothing gets read. The code below shows that I can read in data the way I want when plaintext is used, but not when this formatting of encrypted text is. I need the read-in data as a bytes type.
with open(filename, "rb") as in_file:
    buffer = in_file.read()
Using this on a file with the plaintext This, printing buffer looks like:
b'This'
However, doing this on a file with the encrypted plaintext formed from bytes.fromhex(''), printing buffer gives nothing:
b''
Are there any suggestions on how to either format the encrypted text to put it into a file so that it can be read, or reading data from a file in this particular format? I'm just not understanding why this format is not being interpreted properly as bytes when I read it in from a file.
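For comparison, here is a minimal round-trip sketch. It assumes the output file is closed (and therefore flushed) before it is read back; the question does not say whether that was the case, so this is only one possible thing to check, not a diagnosis. The helper names are hypothetical.

```python
import os
import tempfile


def write_bytes(path, data):
    """Write bytes and close the file so buffered data reaches the disk."""
    with open(path, "wb") as out_file:
        out_file.write(data)


def read_bytes(path):
    """Read the whole file back as a bytes object."""
    with open(path, "rb") as in_file:
        return in_file.read()


def round_trip(data):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_bytes(path, data)
        return read_bytes(path)
    finally:
        os.remove(path)


ciphertext = b'\xc5lP\x04\x8c\xe2\xa8\x05'
restored = round_trip(ciphertext)
```

Note that bytes.fromhex('') is simply b'', so bytes.fromhex('').join(result) behaves like b''.join(result) and does not give the data any special format.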

Using requests to get an excel spreadsheet via openpyxl's virtual book (Python 3)

I am currently getting an excel file via the requests module in Python. I want to take the binary object and write the contents to an excel spreadsheet like so:
try:
    resp = requests.post('www.restendpoint.com/rest/', data=body, stream=True)
except Exception as e:
    print("Encountered exception: [ ", e, " ]")
# First just try to write the content to the file
with open(file_name, 'wb') as f:
    f.write(resp.content)
# Try to loop over chunks of the data to write to the file
with open(file_name, 'wb') as f:
    for chunk in resp.iter_lines():
        f.write(chunk)
# 2nd attempt to loop over the chunks of data to write to the file
with open(file_name, 'wb') as f:
    for chunk in resp.iter_content():
        f.write(chunk)
For each of those "with open" blocks, the data ends up being written to the spreadsheet without really doing what I want (which is to generate a nice beautiful spreadsheet).
For reference, here is the first cell that ends up being written:
b'PK\x03\x04\x14\x00\x00\x00\x08\x00L\x82oL\x1f#\xcf\x03\xc0\x00\x00\x00\x13\x02\x00\x00\x0b\x00\x00\x00_rels/.rels\xad\x92O\x8b\xc2#\x0c\xc5\xbfJ\x99\xfb\x1aW\xc1\xc3b=y\xe9mY\xfc\x02q&\xfdC;\x93!\x13\xb1~{\x87\xbdl\xb7TP\xf0\x18^\xf2\xde\x8fG\xf6?4\xa0v\x1cR\xdb\xc5T\x8c~\x08\xa94\xadj\xfc\x02H\xb6%\x8fi\xc5\x91BVj\x16\x8f\x9aGi \xa2\xed\xb1!\xd8\xac\xd7;\x90\xa9\x879\xec\xa7\x9eE\xe5J#\x95\xfb4\xc5\t\xa5!-\xcd8\xc0\x95\xa5?3\xf7\xabl\x9b\x85[\xa4gB\xb9\xae;KG\xb6\x17OA\x17\xb2g\x1b\x06\x96Y6\x7f
And to compare it, here is the same chunk from the resp.content:
b'b\'PK\\x03\\x04\\x14\\x00\\x00\\x00\\x08\\x00\\x82\\x83oL\\x1f#\\xcf\\x03\\xc0\\x00\\x00\\x00\\x13\\x02\\x00\\x00\\x0b\\x00\\x00\\x00_rels/.rels\\xad\\x92O\\x8b\\xc2#\\x0c\\xc5\\xbfJ\\x99\\xfb\\x1aW\\xc1\\xc3b=y\\xe9mY\\xfc\\x02q&\\xfdC;\\x93!\\x13\\xb1~{\\x87\\xbdl\\xb7TP\\xf0\\x18^\\xf2\\xde\\x8fG\\xf6?4\\xa0v\\x1cR\\xdb\\xc5T\\x8c~\\x08\\xa94\\xadj\\xfc\\x02H\\xb6%\\x8fi\\xc5\\x91BVj\\x16\\x8f\\x9aGi \\xa2\\xed\\xb1!\\xd8\\xac\\xd7;\\x90\\xa9\\x879\\xec\\xa7\\x9eE\\xe5J#\\x95\\xfb4\\xc5\\t\\xa5!-\\xcd8\\xc0\\x95\\xa5?3\\xf7
And at the scene of the crime, we have our culprit. The double escape characters must be the reason I can't simply generate the spreadsheet with the easy 3 lines and this is where I am stumped.
What I've tried
I have tried,
resp.content.decode()
which gives me a string without one of the backslashes. Encoding it again, just returns it to the original form.
I have also tried,
escaped = resp.content.decode('unicode_escape')
print(escaped)
escaped = escaped.replace("\\", "")
reencode_binary = escaped.encode()
print(reencode_binary)
with open(file_name, 'wb') as f:
    f.write(reencode_binary)
This generates a file, but Libre can't open it. It looks how you would expect an excel file would look if you opened it in a text editor.
I'm not quite sure of any other angles to approach this problem from. I'm fairly honed in on the binary form and that it appears to be corrupted and trying to work around that.
I would appreciate any help.
Thanks.
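The second dump starts with b'b\'PK, which suggests (this is an assumption, the question does not confirm it) that the server sent the text repr() of a bytes object rather than the raw bytes. If that is what happened, ast.literal_eval from the standard library can turn that text back into the bytes it describes. The helper name below is hypothetical, and the sample bytes are just the first few from the dump above.

```python
import ast


def unwrap_bytes_repr(payload):
    """Turn the text b'...' (a repr of bytes) back into the bytes it describes."""
    text = payload.decode("ascii")
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError("payload is not the repr of a bytes object")
    return value


# Simulate a server that sent repr(bytes) as text instead of the raw bytes.
original = b"PK\x03\x04\x14\x00\x00\x00\x08\x00"
wire_payload = repr(original).encode("ascii")
recovered = unwrap_bytes_repr(wire_payload)
```

The recovered bytes could then be written with open(file_name, 'wb') as in the first attempt. If the server can be changed, sending the raw bytes in the first place would avoid the problem entirely.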

Opening, edit/rewrite string, save back to a new or same file

I want to open a file, decode the format of data (from base64 to ASCII), rewrite or save the decoded string, either back to the same file, or new one.
I have it opening, reading, decoding (and printing as a test) the decoded base64 string into readable format (ASCII I believe)
My goal is to now save this output to: either a "newfile.txt" document or back to the original "test.mcz" file ready for the next steps of my mission...
I know there are great online base64 decoders and they do work well for what I am doing - I use them often, but my goal is to write my own program as a learning exercise more than anything (also when my internet plays up I need an offline program)
Here's where I am so far (the original file is .mcz format it is a game save)
# PYTHON 3
import base64
f = open('test.mcz', 'r')
f_read = f.read()
# print(f_read) # was just as a test
new_f_read = base64.b64decode(f_read)
print (new_f_read)
This prints a butt-load of readable code that is what I need, but I don't want to have to just copy and paste this output from the Python shell into another editor, I want to save it to a file...for convenience.
Either back into the same test.mcz (I will be re-encoding to base64 again later on anyway) or to a new file - thus leaving my original as it was.
The problem arises when I want to save/write the decoded output stored in the new_f_read variable. It's been a headache: before I started I could visualise how it needed to be written, but I got tripped up when I had to switch it all over to Python 3 for some reason (don't ask...), and I have tried so many variations from online examples that I wouldn't know where to start explaining what I've tried so far. I can't open the original file as both "r" AND "w" together, so once I've opened and decoded it I can't reopen the original file as "w" because that just wipes the contents (which are still encoded anyway).
I think I need to write functions to handle:
1. Open, read, save string to a variable
2. Manipulate string - decode
3. Write the new string to new or existing file
Sounds easy I know, but I am stuck...so here I am. If anyone shows examples, please take the time to explain what is going on, it seems pointless to me having code I don't understand. Apologies if this seems like a simple thing, help would be appreciated..Thanks
First, you can absolutely open a file for both reading and writing without truncating the contents. That's what the r+ mode is for (see https://docs.python.org/3/library/functions.html#open). If you do this, the model is (a) open the file, (b) read it, (c) seek back to the beginning with e.g. f.seek(0), (d) write it.
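A minimal sketch of that r+ sequence follows. The truncate() call is an addition that the steps above do not mention: decoded base64 is shorter than its encoded form, so without it the tail of the old contents would stay in the file. The function names and the sample data are illustrative only.

```python
import base64
import os
import tempfile


def decode_in_place(path):
    """Replace a base64-encoded file's contents with the decoded bytes."""
    with open(path, "r+b") as f:   # (a) read and write, nothing wiped on open
        encoded = f.read()         # (b) read everything
        decoded = base64.b64decode(encoded)
        f.seek(0)                  # (c) go back to the beginning
        f.write(decoded)           # (d) overwrite from the start
        f.truncate()               # added step: cut off the leftover old bytes
    return decoded


def demo():
    fd, path = tempfile.mkstemp(suffix=".mcz")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(base64.b64encode(b"player=1;gold=250"))
        decode_in_place(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```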
Secondly, you can simply open the file, read it, then close the file, and then reopen it, write it, and close it again, like this:
import base64
# open the file for reading, read the data, then close the file
with open('test.mcz', 'rb') as f:
    f_read = f.read()
new_f_read = base64.b64decode(f_read)
# open the file for writing, write the data, then close the file
with open('test.mcz', 'wb') as f:
    f.write(new_f_read)
This is probably the easiest solution.
The easiest thing is to open a read file handle first, close it, and then open a write handle. Read/write handles are more complicated because they have to track a pointer to where you are in the file, and that adds overhead you don't need here. You could do it if you wanted, but it's a waste of time here.
Using the with statement to open files is recommended, since the file is closed automatically when you leave the with block.
import base64
with open('test.mcz', 'r') as f:
    encode = base64.b64decode(f.read())
with open('test.mcz', 'wb') as f:
    f.write(encode)
This is the same as
import base64
f = open('test.mcz', 'r')
encode = base64.b64decode(f.read())
f.close()
f = open('test.mcz', 'wb')
f.write(encode)
f.close()

Python server is sending truncated images over TCP

I have a simple server on my Windows PC written in python that reads files from a directory and then sends the file to the client via TCP.
Files like HTML and Javascript are received by the client correctly (sent and original file match).
The issue is that image data is truncated.
Oddly, different images are truncated at different lengths, but it's consistent per image.
For example, a specific 1MB JPG is always received as 95 bytes. Another image which should be 7KB, is received as 120 bytes.
Opening the truncated image files in notepad++, the data that is there is correct. (The only issue is that the file ends too soon).
I do not see a pattern for where the files end. The chars/bytes immediately before and after truncation are different per image.
I've tried three different ways for the server to read the files, but they all have the same result.
Here is a snippet of the reading and sending of files:
print ("Cache size=" + str(os.stat(filename).st_size))
#1st attempt, using readlines
fileobj = open(filename, "r")
cacheBuffer = fileobj.readlines()
for i in range(0, len(cacheBuffer)):
    tcpCliSock.send(cacheBuffer[i])
#2nd attempt, using line, same result
with open(filename) as f:
    for line in f:
        tcpCliSock.send(line)
#3rd attempt, using f.read(), same result
with open(filename) as f:
    tcpCliSock.send(f.read())
The script prints to the console the size of the file read, and the number of bytes matches the original image. So this proves the problem is in sending, right?
If the issue is with sending, what can I change to have the whole image sent properly?
Since you're dealing with images, which are binary files, you need to open the files in binary mode.
open(filename, 'rb')
From the Python documentation for open():
The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.)
Since your server is running on Windows, as you read the file, Python is converting every \r\n it sees to \n. For text files, this is nice: You can write platform-independent code that only deals with \n characters. For binary files, this completely corrupts your data. That's why it's important to use 'b' when dealing with binary files, but also important to leave it off when dealing with text files.
Also, as TCP is a stream protocol, it's better to stream the data into the socket in smaller pieces. This avoids the need to read an entire file into memory, which will keep your memory usage down. Like this:
with open(filename, 'rb') as f:
    while True:
        data = f.read(4096)
        if len(data) == 0:
            break
        # sendall() keeps going until the whole chunk is sent; send() may send only part of it
        tcpCliSock.sendall(data)
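To check that chunked binary sending really delivers every byte, the loop can be exercised locally. This is a sketch under stated assumptions: socket.socketpair() stands in for the real client connection, the receiver reads until the sender closes its end, and the function names and test payload are made up for the example.

```python
import os
import socket
import tempfile
import threading

CHUNK_SIZE = 4096


def send_file(sock, path, chunk_size=CHUNK_SIZE):
    """Stream a file over a connected socket in fixed-size binary chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)


def recv_all(sock, chunk_size=CHUNK_SIZE):
    """Receive until the peer closes the connection."""
    parts = []
    while True:
        chunk = sock.recv(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


def demo_round_trip():
    # Every byte value, repeated: includes \r, \n and other bytes that
    # text mode could alter.
    payload = bytes(range(256)) * 64
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(payload)

    sender, receiver = socket.socketpair()

    def worker():
        send_file(sender, path)
        sender.close()  # signals end of stream to the receiver

    thread = threading.Thread(target=worker)
    thread.start()
    received = recv_all(receiver)
    thread.join()
    receiver.close()
    os.remove(path)
    return payload, received
```

Comparing the two values returned by demo_round_trip() shows whether the received data matches the file byte for byte.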

Python: Issue reading/writing mp3 file

I am trying to read the text data out of an mp3 file, and then save it to a different mp3 file in Python. I DON'T simply want to move the file, as I will be trying to modify its contents in the future.
Here is my code:
encoding_1 = "latin-1"
with open(path.get(), "r", encoding=encoding_1) as f:
    file = f.read()
...
...
with open("D:\\test\\music_2.mp3", "w+", encoding=encoding_1) as f:
    f.write(file)
I already tried different combinations of .encode() and .decode() with latin1 and utf8, but that didn't work either.
Here are some notes on my problem:
The file I save has about 32,000 more symbols than the original one for some reason, even though it should have the exact same length
I don't get an error message, but the mp3 file is just noise, not music
If I don't use encoding="latin-1", there is an error message, usually already while reading the file
In one of these error messages, there was a problem with the letter "ï"
MP3 files are not text files. You need to open them as binary files so that no characters are translated. You also don't need to worry about encodings with binary files, because you are dealing with binary data, not text. To open a file as binary, add a b to the mode argument of open(file, mode):
with open(path.get(), "rb") as f:
    file = f.read()  # bytes, not str
You can then parse the file and get to the text data in the binary mp3 file.
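As one illustration of parsing text out of the binary data, here is a sketch that reads an ID3v1 tag, which occupies the last 128 bytes of a file and starts with the marker TAG. This assumes the file actually carries an ID3v1 tag; many MP3 files use ID3v2 (stored at the start of the file) instead, which this sketch does not handle. The function name is hypothetical.

```python
def read_id3v1(data):
    """Return ID3v1 text fields from raw MP3 bytes, or None if there is no tag."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }


# Build a fake file: some non-text bytes followed by a 128-byte ID3v1 tag.
fake_audio = b"\xff\xfb" + b"\x00" * 100
fake_tag = (
    b"TAG"
    + b"Song".ljust(30, b"\x00")
    + b"Band".ljust(30, b"\x00")
    + b"Album".ljust(30, b"\x00")
    + b"1999"
    + b"\x00" * 30   # comment field
    + b"\x0c"        # genre byte
)
fields = read_id3v1(fake_audio + fake_tag)
```

With a real file, the bytes read with open(path.get(), "rb") would be passed to read_id3v1 in place of the fake data, and any modified bytes would be written back with a "wb" handle.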
