I'm working on a project, and a key thing that I'm stuck on is being able to read in encrypted data from a file. I've done some looking around, and I can't find anything specific about this issue.
The data is encrypted by a Python implementation of DES, and the ciphertext comes from this return statement: return bytes.fromhex('').join(result). For example, encrypting b'This' gives this as a result:
b'\xc5lP\x04\x8c\xe2\xa8\x05'
I then place this encryption into a file (opened as "wb") using out_file.write(data).
My problem is that when I try to read the encrypted data from the file, nothing gets read. The code below shows that I can read data the way I want when the file contains plaintext, but not when it contains encrypted data in this format. I need the data that is read in to be of type bytes.
with open(filename, "rb") as in_file:
    buffer = in_file.read()
Using this on a file with the plaintext This, printing buffer looks like:
b'This'
However, doing this on a file with the encrypted plaintext formed from bytes.fromhex(''), printing buffer gives nothing:
b''
Are there any suggestions on how to either format the encrypted text to put it into a file so that it can be read, or reading data from a file in this particular format? I'm just not understanding why this format is not being interpreted properly as bytes when I read it in from a file.
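For comparison, here is a minimal round-trip sketch using the ciphertext from the example above. The file location is hypothetical and chosen only for illustration. It assumes one possible cause of an empty read: the output file had not been closed (or flushed) before it was read back. The writing code is not shown in full, so this is a guess rather than a confirmed diagnosis.

```python
import os
import tempfile

# Ciphertext from the example above (the DES output for b'This').
data = b'\xc5lP\x04\x8c\xe2\xa8\x05'

# Hypothetical file location, used only for this illustration.
filename = os.path.join(tempfile.mkdtemp(), "encrypted.bin")

# Leaving the with block closes the file, which flushes the buffered bytes to disk.
with open(filename, "wb") as out_file:
    out_file.write(data)

# Binary mode returns the bytes unchanged, whatever values they contain.
with open(filename, "rb") as in_file:
    buffer = in_file.read()

print(buffer == data)
```

If the bytes survive this round trip, the format itself is fine, and the place to look is where the output file gets closed relative to the read.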
Related
Currently, I can't get the data received from my other client software to be written to a file in a way that appends and also adds a space after each dump. I tried quite a few different approaches, but I'm left with this now and I'm a bit stumped.
At the moment I can no longer get a file to write and I'm not sure what I've done to destroy that part of my code.
while True:
    data = s.recv(1024).decode('utf-8')
    if data:
        with open("data.txt", 'w') as f:
            json.dump(data, f, ensure_ascii=False)
I am expecting a file will appear that will not be overwritten each time I receive new data, allowing me to develop my search and table features of my application.
What you are currently doing for each block:
Decode the block as UTF-8
Open a file, truncating the previous contents ('w' mode)
Re-encode the data
Dump it to the file
Why this is a bad way to do it:
Your blocks will not necessarily end on UTF-8 character boundaries: a multi-byte character can be split across two recv() calls. You need to accumulate all the data before you decode.
Not only are you truncating the existing file by using 'w' instead of 'a' mode, but opening and closing a file over and over is also very inefficient and generally a bad idea.
You are not going to get the same result if the original block ended in the middle of a UTF-8 sequence. Worst case, your whole dataset will be trash.
You have no way of ending the stream. You probably want to close the file eventually and decode it.
How you should do it:
Open an output file (in binary mode)
Loop until the stream ends
Dump all your raw binary packets to a file
Close the file
Decode the file when you read it
Sample code:
with open('data.txt', 'wb') as file:
    while True:
        data = s.recv(1024)
        if not data:
            break
        file.write(data)
If the binary stream contains UTF-8 encoded JSON data, that's what you will get in your file.
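If the file should also survive between runs instead of being recreated each time, the same loop can open it in append mode. The sketch below is one way to arrange that. The names dump_stream and read_text are made up for this illustration, and it assumes the sender closes the connection when it is finished.

```python
import socket


def dump_stream(sock, path, chunk_size=1024):
    """Append everything received on sock to path until the peer closes."""
    total = 0
    # 'ab' appends raw bytes, so earlier dumps in the file are preserved.
    with open(path, "ab") as file:
        while True:
            data = sock.recv(chunk_size)
            if not data:  # an empty result means the peer closed the connection
                break
            file.write(data)
            total += len(data)
    return total


def read_text(path):
    """Decode the accumulated bytes in one go, after the stream has ended."""
    with open(path, "rb") as file:
        return file.read().decode("utf-8")
```

Because decoding happens only once, on the complete file, a multi-byte character that was split across two packets is reassembled before it is decoded.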
I am working with a SQL Server database table similar to this
USER_ID varchar(50), FILE_NAME ntext, FILE_CONTENT ntext
sample data:
USER_ID: 1
FILE_NAME: (AttachedFiles:1)=file1.pdf
FILE_CONTENT: (AttachedFiles:1)=H4sIAAAAAAAAAOy8VXQcy7Ku….
By means of regular expressions, I have successfully isolated the "content" of the FILE_CONTENT field by removing the "(AttachedFiles:1)=" part, resulting in a string similar to this:
content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc…"
My plan was to reconstruct the file using this string to download it from the database. During my investigation process, I found this post and proceeded to replicate the code like this:
content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'
with open(os.path.expanduser('test.pdf'), 'wb') as f:
    f.write(base64.decodestring(content_str))
...getting a TypeError: expected bytes-like object, not str
Investigating further, I found this other post and proceeded like this:
content_str = 'H4sIAAAAAAAAAO19B0AUR/v33...'
encoded = content_str.encode('ascii')
with open(os.path.expanduser('test.pdf'), 'wb') as f:
    f.write(base64.decodestring(encoded))
...resulting in the successful creation of a PDF. However, when trying to open it, I get an error saying that the file is corrupt.
I kindly ask you for any suggestions on how to proceed. I am even open to rethinking the process I've come up with if necessary. Many thanks in advance!
The value of the FILE_CONTENT is base64-encoded. This means it's a string consisting of 64 possible characters which represent raw bytes. All you need to do is base64-decode the string and write the resulting bytes directly to a file.
import base64
import os

content_str = "H4sIAAAAAAAAAOy8VXQcy7Ku22JmZmZmspiZGS2WLGa0xc=="
with open(os.path.expanduser('test.pdf'), 'wb') as fp:
    fp.write(base64.b64decode(content_str))
The base64 sequence "H4sI" at the start of your content string translates to the bytes 0x1f, 0x8b, 0x08. These bytes are not normally at the start of a PDF file, but indicate a gzip-compressed data stream. It's possible that a PDF reader won't understand this.
I don't know for certain if gzip compression is a valid part of the PDF file format, but it's a valid part of web communication, so maybe the file stream was compressed for transfer/download and has not been decompressed before writing it to the database.
If your PDF reader does not accept the data as is, decompress it before saving it to file:
import gzip
# ...
with open(os.path.expanduser('test.pdf'), 'wb') as fp:
    fp.write(gzip.decompress(base64.b64decode(content_str)))
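If some rows might hold uncompressed data, the two cases can be told apart by checking for the gzip signature before decompressing. This is only a sketch: the helper name decode_attachment is invented here, and it assumes the field contains nothing but base64 text once the "(AttachedFiles:1)=" prefix has been removed.

```python
import base64
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def decode_attachment(content_str):
    """Base64-decode content_str and gunzip the result if it is gzip data."""
    raw = base64.b64decode(content_str)
    if raw.startswith(GZIP_MAGIC):
        return gzip.decompress(raw)
    return raw  # not gzip: return the decoded bytes unchanged
```

The returned bytes can then be written to a file opened in 'wb' mode, exactly as in the snippets above.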
I have written an encryption program that I want to be able to encrypt Excel files with and then decrypt them and output a final Excel file. I have decided to read the whole file then encrypt it as it would be easier than reading each cell from the Excel file.
So far I have been able to read the file and convert it to bytes but cannot figure out how to turn it back into an Excel file.
root = Tk()
root.withdraw()
file = filedialog.askopenfile(initialdir="C:")##Creates a file dialog to pick a file
tempFile=open(file.name,encoding="Latin-1")##Encodes it with Latin-1 so all 256 bytes can be read
file.close()
data=tempFile.read()
tempFile.close()
newFile=open("testfile","w",encoding="cp1252")##Creates a new file with cp1252 encoding as that is what Excel uses
newFile.write(data)
newFile.close() ##Currently it just fills the Excel file with a whole bunch of random characters
Edit:
To be more concise, what I want to do is take the data from an Excel file with anything in it, encrypt it, decrypt it and then write it back into a new Excel file with all formatting etc intact. Is there a way to do the reading and writing of the whole file?
I have found the answer to my problem. @furas, your comment is what I needed to do:
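from tkinter import filedialog

selectedFile = filedialog.askopenfile(initialdir="C:")
# Read the raw bytes; binary mode avoids any text decoding
with open(selectedFile.name, mode="rb") as tempFile:
    data = tempFile.read()
# Write the bytes back unchanged
with open("test.xlsx", mode="wb") as newFile:
    newFile.write(data)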
This creates the exact same file as the original one.
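The same binary read and write carries over to the encrypt-then-decrypt round trip. In the sketch below, xor_bytes is a toy placeholder standing in for the actual encryption program, which is not shown in the question. It is not real encryption, and transform_file is likewise a name invented for this example.

```python
def xor_bytes(data, key):
    """Toy reversible transform; NOT real encryption, only a placeholder."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def transform_file(src, dst, key):
    """Read src as raw bytes, apply xor_bytes, and write the result to dst."""
    with open(src, "rb") as in_file:
        data = in_file.read()
    with open(dst, "wb") as out_file:
        out_file.write(xor_bytes(data, key))
```

Calling transform_file twice with the same key (original to encrypted, then encrypted to a new .xlsx file) should give back a byte-for-byte copy, because nothing in the pipeline ever treats the data as text.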
I want to open a file, decode the format of data (from base64 to ASCII), rewrite or save the decoded string, either back to the same file, or new one.
I have it opening, reading, decoding (and printing as a test) the decoded base64 string into readable format (ASCII I believe)
My goal is to now save this output to: either a "newfile.txt" document or back to the original "test.mcz" file ready for the next steps of my mission...
I know there are great online base64 decoders and they do work well for what I am doing - I use them often, but my goal is to write my own program as a learning exercise more than anything (also when my internet plays up I need an offline program)
Here's where I am so far (the original file is .mcz format it is a game save)
# PYTHON 3
import base64
f = open('test.mcz', 'r')
f_read = f.read()
# print(f_read) # was just as a test
new_f_read = base64.b64decode(f_read)
print (new_f_read)
This prints a butt-load of readable code that is what I need, but I don't want to have to just copy and paste this output from the Python shell into another editor, I want to save it to a file...for convenience.
Either back into the same test.mcz (I will be re-encoding to base64 again later on anyway) or to a new file - thus leaving my original as it was.
The problem arises when I want to save/write the decoded output stored in the new_f_read variable. It's been a headache: before I started I could visualise how it needed to be written, but I got tripped up when I had to switch it all over to Python 3 for some reason (don't ask...), and I have tried so many variations from online examples that I wouldn't know where to start explaining what I've tried so far. I can't open the original file as both "r" AND "w" together, so once I've opened and decoded it I can't reopen the original file as "w" because that just wipes the contents (which are still encoded anyway).
I think I need to write functions to handle:
1. Open, read, save string to a variable
2. Manipulate string - decode
3. Write the new string to new or existing file
Sounds easy I know, but I am stuck...so here I am. If anyone shows examples, please take the time to explain what is going on, it seems pointless to me having code I don't understand. Apologies if this seems like a simple thing, help would be appreciated..Thanks
First, you can absolutely open a file for both reading and writing without truncating the contents. That's what the r+ mode is for (see https://docs.python.org/3/library/functions.html#open). If you do this, the model is (a) open the file, (b) read it, (c) seek back to the beginning with e.g. f.seek(0), (d) write it.
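A minimal sketch of that open, read, seek, write sequence follows. The function name decode_in_place is made up for illustration. It also adds one step not listed above: because the decoded data is shorter than the base64 text, the file is truncated after writing so that no stale encoded bytes remain at the end.

```python
import base64


def decode_in_place(path):
    """Replace the base64 text in path with its decoded bytes."""
    # 'r+b' opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        decoded = base64.b64decode(f.read())
        f.seek(0)      # go back to the beginning before writing
        f.write(decoded)
        f.truncate()   # cut the file off at the current position
    return decoded
```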
Secondly, you can simply open the file, read it, then close the file, and then reopen it, write it, and close it again, like this:
import base64
# open the file for reading, read the data, then close the file
with open('test.mcz', 'rb') as f:
    f_read = f.read()
new_f_read = base64.b64decode(f_read)
# open the file for writing, write the data, then close the file
with open('test.mcz', 'wb') as f:
    f.write(new_f_read)
This is probably the easiest solution.
The easiest thing is to open a read file handle first, close it, and then open a write handle. Read/write handles are more complicated because they have to track a pointer to your position in the file, and that adds overhead you don't need here. You could do it if you wanted, but it's a waste of time in this case.
Using the with statement to open files is recommended, since the file is closed automatically when you leave the with block.
import base64
with open('test.mcz', 'r') as f:
    encode = base64.b64decode(f.read())
with open('test.mcz', 'wb') as f:
    f.write(encode)
This is the same as
import base64
f = open('test.mcz', 'r')
encode = base64.b64decode(f.read())
f.close()
f = open('test.mcz', 'wb')
f.write(encode)
f.close()
I am trying to read the text data out of an mp3 file, and then save it to a different mp3 file in Python. I DON'T simply want to move the file, as I will be trying to modify its contents in the future.
Here is my code:
encoding_1 = "latin-1"
with open(path.get(), "r", encoding=encoding_1) as f:
    file = f.read()
...
...
with open("D:\\test\\music_2.mp3", "w+", encoding=encoding_1) as f:
    f.write(file)
I already tried different combinations of .encode() and .decode() with latin1 and utf8, but that didn't work either.
Here are some notes on my problem:
The file I save has about 32,000 more symbols than the original one for some reason, even though it should have the exact same length
I don't get an error message, but the mp3 file is just noise, not music
If I don't use encoding="latin-1", there is an error message, usually already while reading the file
In one of these error messages, there was a problem with the letter "ï"
mp3 files are not text files. You need to open them as binary files so that certain characters are not translated. You also don't need to worry about encodings with binary files, because you are dealing with binary data, not text. To open a file as binary, add a b to the mode argument of open(file, mode):
with open(path.get(), "rb") as f:
    data = f.read()
You can then parse the file and get to the text data in the binary mp3 file.
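As one example of what such parsing can look like, the sketch below reads an ID3v1 tag, which (when present) occupies the last 128 bytes of the file and starts with the marker TAG. This assumes the file actually carries an ID3v1 tag. Many mp3 files store their metadata in an ID3v2 block at the start of the file instead, and this function does not handle that. The name read_id3v1 is invented for this illustration.

```python
import os


def read_id3v1(path):
    """Return the ID3v1 text fields of an mp3 file, or None if there is no tag."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)       # jump to the end to learn the file size
        if f.tell() < 128:
            return None
        f.seek(-128, os.SEEK_END)    # an ID3v1 tag is the last 128 bytes
        tag = f.read(128)
    if tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields have a fixed width and are padded with NUL bytes (or spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
    }
```

Only the small text fields are decoded, and only after they have been sliced out of the raw bytes. The audio data itself is never passed through an encoding.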