encoding string - python

"afile" is a previously existing file.
handle = open("afile", 'r+b')
data = handle.readline()
handle.close()
# signgenerator is a hashlib.md5() object
signgenerator.update(data)
hex = signgenerator.hexdigest()
print(hex)  # prints out 061e3f139c80d04f039b7753de5313ce
and write this to a file:
f = open("syncDB.txt", 'a')
#hex = hex.encode('utf-8')
pickle.dump(hex, f)
f.close()
But when I read back the file as:
while True:
    data = f.readline()
    print(data)
This gives the output:
b'\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00b857c3b319036d72cb85fe8a679531b0q\x00.\x80\x03X \x00\x00\x007532fb972cdb019630a2e5a1373fe1c5q\x00.\x80\x03X \x00\x00\x000126bb23767677d0a246d6be1d2e4d5cq\x00.'
How do I decode these bytes to get the same hexdigest back?
Also, I am getting some gibberish characters in syncDB.txt, like "€X", after each line. How do I correctly write the data in a readable form?

You need to unpickle the data:
pickle.load(open('syncDB.txt', 'r+b'))
Note that pickle.load reads one pickled object per call, so if you dumped several digests you have to call it once per object.
What you have there is pickled data. Proof:
>>> import pickle
>>> pickle.loads(b'\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00b857c3b319036d72cb85fe8a679531b0q\x00.\x80\x03X \x00\x00\x007532fb972cdb019630a2e5a1373fe1c5q\x00.\x80\x03X \x00\x00\x000126bb23767677d0a246d6be1d2e4d5cq\x00.')
'061e3f139c80d04f039b7753de5313ce'
But there's no point in pickling a hex string; you can just write it to the file directly. The pickle module is meant for more complex structures: lists, dicts, or even class instances.

Don't pickle the hexdigest, just write it out as text.
with open("afile",'rb') as handle:
data=handle.readline()
signgenerator.update(data)
hex=signgenerator.hexdigest()
with open("syncDB.txt",'ab') as f:
f.write(hex + '\n')
with open("syncDB.txt",'rb') as f:
for data in f:
print(data)
If you really want to use pickle, you need to use the pickle.load function to read the data back from the file.
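If you do, here is a minimal sketch (reusing hex from the snippet above), assuming each digest was written with its own pickle.dump call: pickled data must go through the file in binary mode, and every dump is matched by one load, looping until the file is exhausted.
import pickle

# write one pickle record per digest (binary mode is required)
with open("syncDB.txt", 'ab') as f:
    pickle.dump(hex, f)

# read them all back: pickle.load returns one object per call
digests = []
with open("syncDB.txt", 'rb') as f:
    while True:
        try:
            digests.append(pickle.load(f))
        except EOFError:  # raised when no records remain
            break
print(digests)  # ['061e3f139c80d04f039b7753de5313ce', ...]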

Related

Writing a JSON file from dictionary, correcting the output

So I am working on a conversion file that is taking a dictionary and converting it to a JSON file. Current code looks like:
data = {json_object}
json_string = jsonpickle.encode(data)
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(json_string, outfile)
But when I go to open that rendered file, it is adding three \ on the front and back of each string.
PS: sorry if I am using the wrong terminology; I am still new to Python and don't know the vocabulary that well yet.
Try this:
import json

data = {"k": "v"}
with open('path_to_file.json', 'w') as f:
    json.dump(data, f)
You don't need to use jsonpickle to encode dict data.
json.dump is a wrapper that first converts the data to a JSON string, then writes that string to your file.
The reason you see \ around each string is that jsonpickle has already encoded your data to a JSON string; when json.dump serializes that string again, every quote (") inside it gets escaped.
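You can reproduce the effect in isolation; encoding twice is exactly what produces the escaped quotes:
import json

data = {"k": "v"}
print(json.dumps(data))              # {"k": "v"}
print(json.dumps(json.dumps(data)))  # "{\"k\": \"v\"}"  <- encoded twice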
Just use the following code to write the dict data as JSON:
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(data, outfile)

Writing a set to an output file in python

I usually use json for lists, but it doesn't work for sets. Is there a similar function to write a set into an output file, f? Something like this, but for sets:
f = open('kos.txt', 'w')
json.dump(list, f)
f.close()
json is not a Python-specific format. It knows about lists and dictionaries, but not sets or tuples.
But if you want to persist a pure Python dataset, you could use string conversion.
with open('kos.txt', 'w') as f:
    f.write(str({1, 3, (3, 5)}))  # set of numbers & a tuple
then read it back again using ast.literal_eval:
import ast

with open('kos.txt', 'r') as f:
    my_set = ast.literal_eval(f.read())
This also works for lists of sets, nested lists with sets inside, and so on, as long as the data can be evaluated literally and no sets are empty (a known limitation of literal_eval, since set() is a call, not a literal). So almost any basic Python object structure serialized with str can be parsed back this way.
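For instance, a nested structure round-trips cleanly (a small sketch):
import ast

data = [{1, 2}, [{3}, ({4, 5}, 6)]]         # nested lists, sets and a tuple
assert ast.literal_eval(str(data)) == data  # parses back to an equal structure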
For the empty-set case there's a kludge to apply, since set() cannot be parsed back.
import ast

with open('kos.txt', 'r') as f:
    ser = f.read()
my_set = set() if ser == str(set()) else ast.literal_eval(ser)
You could also have used the pickle module, but it creates binary data, which is more "opaque", and there's also a way to use json (see How to JSON serialize sets?). But for your needs, I would stick to str/ast.literal_eval.
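A minimal sketch of the json route, assuming you are happy to convert the set to a list on the way out and back:
import json

s = {1, 3, 5}
with open('kos.txt', 'w') as f:
    json.dump(list(s), f)        # sets aren't JSON, so store a list

with open('kos.txt', 'r') as f:
    my_set = set(json.load(f))   # convert back to a set on load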
Using ast.literal_eval(f.read()) will raise ValueError: malformed node or string if we write an empty set to the file. I think pickle would be better to use here.
If the set is empty, this gives no error.
import pickle

s = set()
## To save to the file
with open('kos.txt', 'wb') as f:
    pickle.dump(s, f)
## To read it back from the file
with open('kos.txt', 'rb') as f:
    my_set = pickle.load(f)

Can't read JSON from .txt and convert back to JSON object

I have a .txt file with JSON-formatted content that I would like to read, convert to a JSON object, and then log the result. I can read the file and I'm really close, but unfortunately json_data is a string object instead of a JSON object/dictionary. I assume it's something trivial, but I have no idea, because I'm new to Python, so I would really appreciate it if somebody could show me the right solution.
import json

filename = 'html-json.txt'
with open(filename, encoding="utf8") as f:
    jsonContentTxt = f.readlines()
json_data = json.dumps(jsonContentTxt)
print(json_data)
You may want to consult the docs for the json module. The Python docs are generally pretty great and this is no exception.
f.readlines() reads the file f points to (in your case, html-json.txt) and returns its lines as a list of strings. So jsonContentTxt is a list of raw lines, not a parsed JSON object, and json.dumps just re-serializes that list.
If you simply want to print the file's contents, you could just print them directly. On the other hand, if you want to load that JSON into a Python data structure, manipulate it, and then output it, you could do something like this (which uses json.load, a function that takes a file-like object and returns an object such as a dict or list, depending on the JSON):
with open(filename, encoding="utf8") as f:
    json_content = json.load(f)
    # do stuff with json_content, e.g. json_content['foo'] = 'bar'
    # then when you're ready to output:
    print(json.dumps(json_content))
You may also want to use the indent argument to json.dumps, which will give you a nicely-formatted string.
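For example, with some made-up data:
import json

json_content = {"foo": "bar", "nums": [1, 2, 3]}  # example data
print(json.dumps(json_content, indent=4))         # pretty-printed, 4-space indent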
Read the 2.7 documentation here or the 3.5 documentation here:
json.loads(json_as_string)  # deserializes a string to a JSON hierarchy
Once you have a deserialized form, you can convert it back to JSON with a dump:
json.dumps(json_hierarchy)  # serializes a Python object back to a JSON string
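Putting the two together, a small round trip (the sample string is made up):
import json

json_as_string = '{"name": "example", "tags": ["a", "b"]}'
json_hierarchy = json.loads(json_as_string)  # str -> dict
print(json.dumps(json_hierarchy, indent=2))  # dict -> formatted str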

How to convert fileData from str to file type in Python?

I've got this file data which I read from an API as base64 and converted to regular file data using the following:
base64FileData = attachmentObj.data['data']
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
print type(fileData) # prints out <type 'str'>
Since I need the file in binary to further process it, I can then store it and read it back out as follows:
print type(fileData) # prints out <type 'str'>
with open('thefile.pdf', 'w') as f:
    f.write(fileData)
with open('thefile.pdf', 'rb') as f:
    print type(f) # prints out <type 'file'>, which I actually need.
This works, but seeing there is no need to actually store the file, this seems like one of the worst pieces of code I've ever seen.
Does anybody know how I can convert the initial fileData to a type 'file' without storing it and reading it back out? All tips are welcome!
Check out the StringIO / cStringIO modules. They present a file interface to an existing buffer (string) in memory. This will let you pass your data to the library without writing a temp file.
For example:
import StringIO
...
base64FileData = attachmentObj.data['data']
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
memoryFile = StringIO.StringIO(fileData)
someFunctionThatOperatesOnFileObjects(memoryFile)
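Note that StringIO/cStringIO are Python 2 modules; on Python 3 the binary equivalent is io.BytesIO. A minimal sketch, with a stand-in string instead of the real API payload:
import base64
import io

base64FileData = "aGVsbG8gd29ybGQ="  # stand-in for attachmentObj.data['data']
fileData = base64.urlsafe_b64decode(base64FileData.encode('UTF-8'))
memoryFile = io.BytesIO(fileData)    # in-memory file-like object
print(memoryFile.read())             # b'hello world'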

segmenting and writing binary file using Python

I have two binary input files, firstfile and secondfile. secondfile is firstfile + additional material. I want to isolate this additional material in a separate file, newfile. This is what I have so far:
import os
import struct

origbytes = os.path.getsize(firstfile)
fullbytes = os.path.getsize(secondfile)
numbytes = fullbytes - origbytes
with open(secondfile, 'rb') as f:
    first = f.read(origbytes)
    rest = f.read()
Naturally, my inclination is to do (which seems to work):
with open(newfile, 'wb') as f:
    f.write(rest)
I can't find it now, but I thought I read on SO that I should pack this first using struct.pack before writing it to the file. The following gives me an error:
with open(newfile, 'wb') as f:
    f.write(struct.pack('%%%ds' % numbytes, rest))
-----> error: bad char in struct format
This works, however:
with open(newfile, 'wb') as f:
    f.write(struct.pack('c' * numbytes, *rest))
And for the ones that work, this gives me the right answer:
with open(newfile, 'rb') as f:
    test = f.read()
len(test) == numbytes
-----> True
Is this the correct way to write a binary file? I just want to make sure I'm doing this part correctly, so I can work out whether the second half of the file is corrupted (as another reader program I am feeding newfile into claims) or I am writing it wrong. Thank you.
If you know that secondfile is the same as firstfile + appended data, why even read in the first part of secondfile?
with open(secondfile, 'rb') as f:
    f.seek(origbytes)
    rest = f.read()
As for writing things out,
with open(newfile, 'wb') as f:
    f.write(rest)
is just fine. The stuff with struct would just be a no-op anyway. The only thing you might consider is the size of rest. If it could be large, you may want to read and write the data in blocks.
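A chunked copy might look like this (a sketch; the 64 KiB block size is arbitrary):
CHUNK = 64 * 1024  # arbitrary block size

with open(secondfile, 'rb') as src, open(newfile, 'wb') as dst:
    src.seek(origbytes)  # skip the part shared with firstfile
    while True:
        block = src.read(CHUNK)
        if not block:    # EOF reached
            break
        dst.write(block)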
There is no reason to use the struct module, which is for converting between binary formats and Python objects. There's no conversion needed here.
Strings in Python 2.x are just an array of bytes and can be read and written to and from files. (In Python 3.x, the read function returns a bytes object, which is the same thing, if you open the file with open(filename, 'rb').)
So you can just read the file into a string, then write it again:
import os

origbytes = os.path.getsize(firstfile)
fullbytes = os.path.getsize(secondfile)
numbytes = fullbytes - origbytes
with open(secondfile, 'rb') as f:
    f.seek(origbytes)  # skip the part that matches firstfile
    rest = f.read()
with open(newfile, 'wb') as f:
    f.write(rest)
You don't need to read the first part at all; just move the file pointer to the right position: f.seek(origbytes).
You don't need struct packing; just write rest to the newfile.
This is not C; a struct format string must not contain %. What you want is:
f.write(struct.pack('%ds' % numbytes, rest))
It worked for me:
>>> struct.pack('%ds' % 5,'abcde')
'abcde'
Explanation: '%%%ds' % 15 is '%15s', while what you want is '%ds' % 15, which is '15s'.
