I'm trying to decode binary data stored in a .txt file, but I'm stuck. I don't see how to approach this.
def code():  # testestest
    ascii = {'01000001':'A', ...}
    binary = {'A':'01000001', ...}
    print (ascii, binary)
def encode():
    pass
def decode(code,n):
    f = open(code, mode='rb') # Open a file with filename <code>
    while True:
        chunk = f.read(n) # Read n characters at time from an open file
        if chunk == '': # This is one way to check for the End Of File in Python
            break
        if chunk != '\n':
            # Process it????
            pass
How can I take the binary in the .txt file and output it as ASCII?
From your example, your input looks like a string of a binary formatted number.
If so, you don't need a dictionary for that:
def byte_to_char(input):
    return chr(int(input, base=2))
Using the data you gave in the comments, you have to split your binary string into bytes.
input ='01010100011010000110100101110011001000000110100101110011001000000110101001110101011100110111010000100000011000010010000001110100011001010111001101110100001000000011000100110000001110100011000100110000'
length = 8
input_l = [input[i:i+length] for i in range(0,len(input),length)]
And then, per byte, you convert it into a char:
input_c = [chr(int(c,base=2)) for c in input_l]
print ''.join(input_c)
Putting it all together:
def string_decode(input, length=8):
    input_l = [input[i:i+length] for i in range(0,len(input),length)]
    return ''.join([chr(int(c,base=2)) for c in input_l])
string_decode(input)
>'This is just a test 10:10'
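Since the question is about digits stored in a .txt file, here is a minimal Python 3 sketch of the same idea that also reads the file. The function names decode_bit_string and decode_bit_file are made up for illustration, and the sketch assumes the file contains only the characters 0 and 1 plus whitespace.

```python
def decode_bit_string(bits, width=8):
    """Decode a string of '0'/'1' characters into text, `width` bits per character."""
    bits = "".join(bits.split())  # drop newlines and spaces between the digits
    if len(bits) % width:
        raise ValueError("bit string length is not a multiple of %d" % width)
    # Slice the string into fixed-width groups and convert each group to a character.
    return "".join(chr(int(bits[i:i + width], 2)) for i in range(0, len(bits), width))


def decode_bit_file(path, width=8):
    """Read a text file of binary digits (hypothetical helper) and return the decoded text."""
    with open(path) as f:  # text mode: the file holds the characters '0' and '1'
        return decode_bit_string(f.read(), width)


print(decode_bit_string("01010100011001010111001101110100"))  # Test
```

Opening the file in text mode rather than 'rb' keeps the digits as a str, so int(..., 2) can be applied directly.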
I have three text values that I am encrypting and then writing to a file. Later I want to read the values back (in another script) and decrypt them.
I've successfully encrypted the values:
cenc = rsa.encrypt(client_name.encode('utf8'), publicKey)
denc = rsa.encrypt(expiry_date.encode('utf8'), publicKey)
fenc = rsa.encrypt(features.encode('utf8'), publicKey)
and written to a binary file:
licensefh = open("license.sfb", "wb")
licensefh.write(cenc)
licensefh.write(denc)
licensefh.write(fenc)
licensefh.close()
The three values cenc, denc and fenc are all of different lengths so when I read the file back:
licensefh = open("license.sfb", "rb")
encMessage = licensefh.read()
encMessage contains the entire file and I don't know how to get the three values back again.
I've tried using a separator between the values:
SEP = bytes(chr(0x02).encode('utf8'))
...
licensefh.write(cenc)
licensefh.write(SEP)
...
and then using encMessage.partition(SEP) or encMessage.split(SEP) but the data invariably contains the SEP value in it somewhere (I've tried a few different characters) so that didn't work.
I tried getting the length of the bytes objects cenc, denc and fenc, but this returned 256 for each value even though the contents of the variables are all different lengths.
My question is this. How do I write these three variable length values to a binary file and then separate them when I read them back again?
Here's an example of the 3 binary values:
b'tX\x10Fo\x89\x10~\x83Pok\xd1\xfb\xbe\x0e<a\xe5\x11md:\xe6\x84#\xfa\xf8\xe5\xeb\xf8\xdc{\xc0Z\xa0\xc0^\xc1\xd9\x820\xec\xec\xb0R\x99/\xa2l\x88\xa9\xa6g\xa3\x01m\xf9\x7f\x91\xb9\xe1\x80\xccs|\xb7_\xa9Fp\x11yvG\xdc\x02d\x8aK2\x92t\x0e\x1f\xca\x19\xbb&\xaf{\xc0y>\t|\x86\xab\x16.\xa5kZ"\xab6\xaaV\xf4w\x7f\xc5q\x07\xef\xa9\xa5\xa3\xf3 6\xdb\x03\x19S\xbd\x81\xf9\xc8\xc5\x90\x1e\x19\x86\xa4q\xe3?i\xc4\xac\t\xd5=3C\x9b#\xc3IuAN,\xeat\xc6\x96VFL\x1eFWZ\xa4\xd73\x92P#\x1d\xb9\x12\x15\xc9\xd4~\x8aWm^\xb8\x8b\x9d\x88\n)\xeb#\xe3\x93\xb1\\\xd6^\xe0\xce\xa2(\x05\xf5\xe6\x8b\xd1\x15\xd8v\xf0\xae\x90\xd8?\x01\r\x00\xf4\xa5\xadM|%\x98\xa9SR\xc6\xd0K\x9e&\xc3\xe0M\x81\x87\xdea\xcc\xd5\x9c\xcd\xfd1l\x1f\xb9?\xed\xd1\x95\xbc\x11\x85U9'
b'l\xd3S\xcc\x03\x9a\xf2\xfdr\xca\xbbA\x06\xfb\xd8\xbbWi\xdc\xb1\xf6&\x97T\x81Kl\r\x86\x9b\x95?\x94}\x8a\xd3\xa1V\x81\xd3]*B\x1f\x96`\xa3\xd1\xf2|B\x84?\xa0\ns\xb7\xcf\x18Y\x87\xcfR\x87!\x14\x81!\xf7\xf2\xe5x|=O\xe3\xba2\xf2!\x93\x0fT7\x0c~4\xa3\xe5\xb7\xf9wy\xb5\x12FM\x96\xd9\xfd\xedn\x9c\xacw\x1b\xc2\x17+\xb6\x05`\x10\xf8\xe4\x01\xde\xc7\xa2\xa0\x80\xd8\x15\xb1+<s\xc7\x19\x9c\x14\xb0\x1a"\x10\xbb\x0f\xe1\x05\x93\xd2?xX\xd9\x93\x8an\x8d\xcd\xbd!c\xd0,\xa45\xbai\xe3\xccx\x08\xaa,\xd1\xe5\'t\x91\xb8\xf2n$\x0c\xf9-\xb4\xc2\x07\x81\xe1\xe7\x8e\xb3\x98\x11\xf3\xa6\xd9wz\x9a3\xc9\x9c?z\xd8\xaa\x08}\xa2\x9c[\xf2\x9d\xe4\xcdb\xddl\xceV\x7f\xf1\x81\xb3\x88\x1e\x9c5?k\x0f\xc9\x86\x86&\xedV.\xa7\x8d\x13&V\xad\xca\xe5\x93\xfe\xa5\x94\xbc\xf5\xd1{Cl\xc0\x030\x92\x03\xc9'
b'#\xbdd7\xe9\xa0{\t\xb9\x87B\x9e\xf9\x97P^\xf3V\xb6\x93\x1f(J\x0b\xa3\xbf\xd8\x04\x86T\xa4\xca\xf3\xe8%\xddC\x11\xdb5\xff,\xf7\x13\xd7\xd2\xbc\xf3\x893\x83\xdcmJ\xc8p\xdf\x07V\x7fb\xeb\xa9\x8b\x0f\xca\xf9\x05\xfc\xdfS\x94b\x90\xcd\xfcn?/]\x11\xaf\xe606\xfb\\U59\xa0>\xbd\xd8\x1c\xa8\xca\x83\xf4C\x95v7\xc6\xe00\xe4,d_/\x83\xa0\xb9mO\x0e\xc4\x97J\x15\xf0\xca-\xa0\xafT\xe4\x82\x03\n\x14:\xa1\xdcL\x98\x9d,1\xfa\x10\xf4\xfd\xa0\x0b\xc7\x13!\xf7\xdb/\xda\x1a\x9df\x1cQ\xc0\x99H\x08\xa0c\x8f9/4\xc4\x05\xc6\x9eM\x8e\xe5V\xf8D\xc3\xfd\xad4\x94A\xb9[\x80\xb9\xcf\xe6\xd9\xb3M2\xd9N\xfbA\x18\x84/W\x9b\x92\xfe\xbb\xd6C\x85\xa3\xc6\xd2T\xd0\xb2\xb9\xf7R\xb4(s\xda\xbcX,9w\x17\x1c\xfb|\xa0\x87\xba\xca6>y\xba\\L4wc\x94\xe7$Y\x89\x07\x9b\xfe\x9b?{\x85'
@pippo1980's comment is how I would do it, using struct:
import struct
cenc = b'tX\x10Fo\x89\x10~\x83Pok\xd1\xfb\xbe\x0e<a\xe5\x11md:\xe6\x84#\xfa\xf8\xe5\xeb\xf8\xdc{\xc0Z\xa0\xc0^\xc1\xd9\x820\xec\xec\xb0R\x99/\xa2l\x88\xa9\xa6g\xa3\x01m\xf9\x7f\x91\xb9\xe1\x80\xccs|\xb7_\xa9Fp\x11yvG\xdc\x02d\x8aK2\x92t\x0e\x1f\xca\x19\xbb&\xaf{\xc0y>\t|\x86\xab\x16.\xa5kZ"\xab6\xaaV\xf4w\x7f\xc5q\x07\xef\xa9\xa5\xa3\xf3 6\xdb\x03\x19S\xbd\x81\xf9\xc8\xc5\x90\x1e\x19\x86\xa4q\xe3?i\xc4\xac\t\xd5=3C\x9b#\xc3IuAN,\xeat\xc6\x96VFL\x1eFWZ\xa4\xd73\x92P#\x1d\xb9\x12\x15\xc9\xd4~\x8aWm^\xb8\x8b\x9d\x88\n)\xeb#\xe3\x93\xb1\\\xd6^\xe0\xce\xa2(\x05\xf5\xe6\x8b\xd1\x15\xd8v\xf0\xae\x90\xd8?\x01\r\x00\xf4\xa5\xadM|%\x98\xa9SR\xc6\xd0K\x9e&\xc3\xe0M\x81\x87\xdea\xcc\xd5\x9c\xcd\xfd1l\x1f\xb9?\xed\xd1\x95\xbc\x11\x85U9'
denc = b'l\xd3S\xcc\x03\x9a\xf2\xfdr\xca\xbbA\x06\xfb\xd8\xbbWi\xdc\xb1\xf6&\x97T\x81Kl\r\x86\x9b\x95?\x94}\x8a\xd3\xa1V\x81\xd3]*B\x1f\x96`\xa3\xd1\xf2|B\x84?\xa0\ns\xb7\xcf\x18Y\x87\xcfR\x87!\x14\x81!\xf7\xf2\xe5x|=O\xe3\xba2\xf2!\x93\x0fT7\x0c~4\xa3\xe5\xb7\xf9wy\xb5\x12FM\x96\xd9\xfd\xedn\x9c\xacw\x1b\xc2\x17+\xb6\x05`\x10\xf8\xe4\x01\xde\xc7\xa2\xa0\x80\xd8\x15\xb1+<s\xc7\x19\x9c\x14\xb0\x1a"\x10\xbb\x0f\xe1\x05\x93\xd2?xX\xd9\x93\x8an\x8d\xcd\xbd!c\xd0,\xa45\xbai\xe3\xccx\x08\xaa,\xd1\xe5\'t\x91\xb8\xf2n$\x0c\xf9-\xb4\xc2\x07\x81\xe1\xe7\x8e\xb3\x98\x11\xf3\xa6\xd9wz\x9a3\xc9\x9c?z\xd8\xaa\x08}\xa2\x9c[\xf2\x9d\xe4\xcdb\xddl\xceV\x7f\xf1\x81\xb3\x88\x1e\x9c5?k\x0f\xc9\x86\x86&\xedV.\xa7\x8d\x13&V\xad\xca\xe5\x93\xfe\xa5\x94\xbc\xf5\xd1{Cl\xc0\x030\x92\x03\xc9'
fenc = b'#\xbdd7\xe9\xa0{\t\xb9\x87B\x9e\xf9\x97P^\xf3V\xb6\x93\x1f(J\x0b\xa3\xbf\xd8\x04\x86T\xa4\xca\xf3\xe8%\xddC\x11\xdb5\xff,\xf7\x13\xd7\xd2\xbc\xf3\x893\x83\xdcmJ\xc8p\xdf\x07V\x7fb\xeb\xa9\x8b\x0f\xca\xf9\x05\xfc\xdfS\x94b\x90\xcd\xfcn?/]\x11\xaf\xe606\xfb\\U59\xa0>\xbd\xd8\x1c\xa8\xca\x83\xf4C\x95v7\xc6\xe00\xe4,d_/\x83\xa0\xb9mO\x0e\xc4\x97J\x15\xf0\xca-\xa0\xafT\xe4\x82\x03\n\x14:\xa1\xdcL\x98\x9d,1\xfa\x10\xf4\xfd\xa0\x0b\xc7\x13!\xf7\xdb/\xda\x1a\x9df\x1cQ\xc0\x99H\x08\xa0c\x8f9/4\xc4\x05\xc6\x9eM\x8e\xe5V\xf8D\xc3\xfd\xad4\x94A\xb9[\x80\xb9\xcf\xe6\xd9\xb3M2\xd9N\xfbA\x18\x84/W\x9b\x92\xfe\xbb\xd6C\x85\xa3\xc6\xd2T\xd0\xb2\xb9\xf7R\xb4(s\xda\xbcX,9w\x17\x1c\xfb|\xa0\x87\xba\xca6>y\xba\\L4wc\x94\xe7$Y\x89\x07\x9b\xfe\x9b?{\x85'
packing_format = "<HHH" # little-endian, 3 * (2-byte unsigned short)
with open("license.sfb", "wb") as licensefh:
    licensefh.write(struct.pack(packing_format, len(cenc), len(denc), len(fenc)))
    licensefh.write(cenc)
    licensefh.write(denc)
    licensefh.write(fenc)
    # close is automatic with a context-manager
with open("license.sfb", "rb") as licensefh2:
    header_length = struct.calcsize(packing_format)
    cenc2_len, denc2_len, fenc2_len = struct.unpack(packing_format, licensefh2.read(header_length))
    cenc2 = licensefh2.read(cenc2_len)
    denc2 = licensefh2.read(denc2_len)
    fenc2 = licensefh2.read(fenc2_len)
    assert len(cenc2) == cenc2_len and len(denc2) == denc2_len and len(fenc2) == fenc2_len # the file was not truncated
    unread_bytes = licensefh2.read() # until EOF
    assert len(unread_bytes) == 0 # there is nothing else in the file, everything has been read
assert cenc == cenc2
assert denc == denc2
assert fenc == fenc2
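If the number of values is not fixed at three, the same idea can be applied per value instead of in one header. The following is only a sketch: write_records and read_records are hypothetical helper names, and the 4-byte "<I" prefix is an assumption (the "<H" format above is enough for 256-byte values). Because the reader never searches for a separator, it does not matter which byte values the encrypted data contains.

```python
import io
import struct

LENGTH_PREFIX = "<I"  # assumption: little-endian 4-byte unsigned length before each value


def write_records(fh, records):
    """Write each bytes value preceded by its length."""
    for record in records:
        fh.write(struct.pack(LENGTH_PREFIX, len(record)))
        fh.write(record)


def read_records(fh):
    """Read length-prefixed values until the end of the file."""
    prefix_size = struct.calcsize(LENGTH_PREFIX)
    records = []
    while True:
        prefix = fh.read(prefix_size)
        if not prefix:
            return records  # clean end of file
        if len(prefix) != prefix_size:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(LENGTH_PREFIX, prefix)
        record = fh.read(length)
        if len(record) != length:
            raise ValueError("truncated record")
        records.append(record)


# io.BytesIO stands in for a file opened with "wb" / "rb".
buffer = io.BytesIO()
write_records(buffer, [b"\x02abc", b"", b"\x00\x02\x02"])
buffer.seek(0)
print(read_records(buffer))
```

As for len() returning 256 for every value: that is presumably because the key is 2048 bits long, since an RSA ciphertext is as long as the key modulus regardless of the plaintext length.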
I'm trying to read a null-terminated string, but I'm having issues when unpacking a char and putting it together with a string.
This is the code:
def readString(f):
    str = ''
    while True:
        char = readChar(f)
        str = str.join(char)
        if (hex(ord(char))) == '0x0':
            break
    return str
def readChar(f):
    char = unpack('c',f.read(1))[0]
    return char
Now this is giving me this error:
TypeError: sequence item 0: expected str instance, int found
I'm also trying the following:
char = unpack('c',f.read(1)).decode("ascii")
But it throws me:
AttributeError: 'tuple' object has no attribute 'decode'
I don't even know how to read the chars and add them to the string. Is there a proper way to do this?
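Here's a version that (ab)uses iter()'s lesser-known "sentinel" argument (note that it never stops if the file ends before a null byte is found, because read() then keeps returning an empty string):
with open('file.txt', 'rb') as f:
    val = ''.join(iter(lambda: f.read(1).decode('ascii'), '\x00'))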
How about:
myString = myNullTerminatedString.split("\x00")[0]
For example:
myNullTerminatedString = "hello world\x00\x00\x00\x00\x00\x00"
myString = myNullTerminatedString.split("\x00")[0]
print(myString) # "hello world"
This works by splitting the string on the null character. Since the string should terminate at the first null character, we simply grab the first item in the list after splitting. split will return a list of one item if the delimiter doesn't exist, so it still works even if there's no null terminator at all.
It also will work with byte strings:
myByteString = b'hello world\x00'
myStr = myByteString.split(b'\x00')[0].decode('ascii') # "hello world" as normal string
If you're reading from a file, you can do a relatively larger read - estimate how much you'll need to read to find your null string. This is a lot faster than reading byte-by-byte. For example:
resultingStr = b''
extraBytes = 0
while True:
    buf = f.read(512)
    if len(buf)==0: break
    resultingStr += buf
    if (b"\x00" in resultingStr):
        nullIndex = resultingStr.index(b"\x00")
        extraBytes = len(resultingStr) - nullIndex # bytes read from the null byte onwards
        resultingStr = resultingStr[:nullIndex]
        break
# now "resultingStr" contains the string (as bytes)
f.seek(0 - extraBytes,1) # seek backwards by the number of bytes, now the pointer will be on the null byte in the file
# or f.seek(1 - extraBytes,1) to skip the null byte in the file
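The chunked approach can also be wrapped in a reusable function. This is a sketch under a few assumptions: read_cstring is a made-up name, the file is opened in binary mode and is seekable, and the text is ASCII. It uses an absolute seek to leave the file position just past the terminator, so it can be called repeatedly.

```python
import io


def read_cstring(f, chunk_size=512, encoding="ascii"):
    """Read a null-terminated string from a seekable binary file (hypothetical helper)."""
    start = f.tell()
    data = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            # End of file without a terminator: return what was read.
            return data.decode(encoding)
        data += chunk
        end = data.find(b"\x00")
        if end != -1:
            f.seek(start + end + 1)  # position just after the null byte
            return data[:end].decode(encoding)


# io.BytesIO stands in for a file opened with "rb".
demo = io.BytesIO(b"sword\x00Sword_Wea_Dummy\x00\xcd\xcc")
print(read_cstring(demo))  # sword
print(read_cstring(demo))  # Sword_Wea_Dummy
print(demo.read())         # the remaining binary data
```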
(edit version 2, added extra way at the end)
Maybe there are some libraries out there that can help you with this, but as I don't know about them, let's attack the problem at hand with what we know.
In Python 2, bytes and str are basically the same thing. That changes in Python 3, where str is what Python 2 calls unicode and bytes is its own separate type. This means that in Python 2 you don't need to define a readChar, as no extra work is required, so I don't think you need the unpack function for this particular case. With that in mind, let's define the new readString:
def readString(myfile):
    chars = []
    while True:
        c = myfile.read(1)
        if c == chr(0):
            return "".join(chars)
        chars.append(c)
Just like your code, I read one character at a time, but I save the characters in a list instead. The reason is that strings are immutable, so doing str += char results in unnecessary copies. When I find the null character, I return the joined string. chr is the inverse of ord: it gives you the character for a given ASCII value. This excludes the null character; if it's needed, just move the append...
Now let's test it with your sample file.
For instance, let's try to read "Sword_Wea_Dummy" from it:
with open("sword.blendscn","rb") as archi:
    #lets simulate that some prior processing was made by
    #moving the pointer of the file
    archi.seek(6)
    string=readString(archi)
    print "string repr:", repr(string)
    print "string:", string
    print ""
    #and the rest of the file is there waiting to be processed
    print "rest of the file: ", repr(archi.read())
and this is the output
string repr: 'Sword_Wea_Dummy'
string: Sword_Wea_Dummy
rest of the file: '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
other tests
>>> with open("sword.blendscn","rb") as archi:
        print readString(archi)
        print readString(archi)
        print readString(archi)
sword
Sword_Wea_Dummy
ÍÌÌ=p=Š4:¦6¿JÆ=
>>> with open("sword.blendscn","rb") as archi:
        print repr(readString(archi))
        print repr(readString(archi))
        print repr(readString(archi))
'sword'
'Sword_Wea_Dummy'
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6='
>>>
Now that I think about it, you mention that the data portion is of fixed size. If that is true for all files, and they all have the following structure
[unknown size data][known size data]
then that is a pattern we can exploit: we only need to know the size of the file, and we can get both parts smoothly as follows
import os
def getDataPair(filename,knowSize):
    size = os.path.getsize(filename)
    with open(filename, "rb") as archi:
        unknown = archi.read(size-knowSize)
        know = archi.read()
    return unknown, know
and by knowing the size of the data portion (which I got by playing with the prior example), its use is simple
>>> string_data, data = getDataPair("sword.blendscn", 80)
>>> string_data
'sword\x00Sword_Wea_Dummy\x00'
>>> data
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
>>> string_data.split(chr(0))
['sword', 'Sword_Wea_Dummy', '']
>>>
Now a simple split will suffice to get each string, and you can pass the rest of the file, contained in data, to the appropriate function to be processed.
Doing file I/O one character at a time is horribly slow.
Instead use readline0, now on pypi: https://pypi.org/project/readline0/ . Or something like it.
In 3.x, there's a "newline" argument to open, but it doesn't appear to be as flexible as readline0.
Here is my implementation:
import struct
def read_null_str(f):
    r_str = ""
    while 1:
        back_offset = f.tell()
        try:
            r_char = struct.unpack("c", f.read(1))[0].decode("utf8")
        except:
            f.seek(back_offset)
            temp_char = struct.unpack("<H", f.read(2))[0]
            r_char = chr(temp_char)
        if ord(r_char) == 0:
            return r_str
        else:
            r_str += r_char
(Answer found. Close the topic)
I'm trying to convert hex values, stored as a string, into hex data.
I have:
data_input = 'AB688FB2509AA9D85C239B5DE16DD557D6477DEC23AF86F2AABD6D3B3E278FF9'
I need:
data_output = '\xAB\x68\x8F\xB2\x50\x9A\xA9\xD8\x5C\x23\x9B\x5D\xE1\x6D\xD5\x57\xD6\x47\x7D\xEC\x23\xAF\x86\xF2\xAA\xBD\x6D\x3B\x3E\x27\x8F\xF9'
I tried data_input.decode('hex') and binascii.unhexlify(data_input), but all they return is:
"\xabh\x8f\xb2P\x9a\xa9\xd8\\#\x9b]\xe1m\xd5W\xd6G}\xec#\xaf\x86\xf2\xaa\xbdm;>'\x8f\xf9"
What should I write to receive all bytes in '\xFF' view?
updating:
I need representation in '\xFF' view to write this data to a file (I'm opening file with 'wb') as:
«hЏІPљ©Ш\#›]бmХWЦG}м#Ї†тЄЅm;>'Џщ
update2
Sorry for bothering. An answer lies under my nose all the time:
data_output = data_input.decode('hex')
write_file(filename, data_output) #just opens a file 'wb', and writes the data in it
gives the same result as I need
I like chopping strings into fixed-width chunks using re.findall
print '\\x' + '\\x'.join(re.findall('.{2}', data_input))
If you want to actually convert the string into a list of ints, you can do that like this:
data = [int(x, 16) for x in re.findall('.{2}', data_input)]
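The snippets in this thread are Python 2. In Python 3, str.decode('hex') no longer exists; a rough equivalent, sketched here with a shortened version of the input from the question, is bytes.fromhex for the raw bytes and a format string for the '\xFF' text view.

```python
data_input = "AB688FB2509AA9D8"  # shortened sample of the hex string from the question

# The actual bytes, suitable for writing to a file opened with 'wb'.
raw = bytes.fromhex(data_input)

# A text representation with every byte shown as \xNN, for display only.
escaped = "".join("\\x{:02X}".format(b) for b in raw)

print(raw)      # b'\xabh\x8f\xb2P\x9a\xa9\xd8'
print(escaped)  # \xAB\x68\x8F\xB2\x50\x9A\xA9\xD8
```

The two printed forms describe the same bytes: repr simply shows printable bytes such as 0x68 as 'h' instead of '\x68'.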
It's an inefficient solution, but there's always:
flag = True
data_output = ''
for char in data_input:
    if flag:
        buffer = char
        flag = False
    else:
        data_output = data_output + '\\x' + buffer + char
        flag = True
EDIT HOPEFULLY THE LAST: Who knew I could mess up in so many different ways on that simple a loop? Should actually run now...
>>> int('0x10AFCC', 16)
1093580
>>> hex(1093580)
'0x10afcc'
So prepend your string with '0x' then do the above
I need to read Firefox's IndexedDB using Python.
I use the sqlite3 package to retrieve the contents of the IndexedDB:
with sqlite3.connect(indexeddb_file) as conn:
    c = conn.cursor()
    c.execute('select * from object_data;')
    rows = c.fetchall()
    for row in rows:
        print row[2]
However, although I know that contents in database are strings, they are stored as sqlite binary blobs. Is there a way to read the strings stored as blobs from python?
I've tried:
hex() and quote() sql methods just encode the blob to hexadecimal
the same problem when I write the blob to file
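For reference, this is roughly how blob columns behave with sqlite3 in Python 3, sketched against an in-memory database. Only the table name object_data comes from the question; the three columns and the sample row are invented for illustration and do not reproduce the real Firefox schema or its encoding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: only the table name is taken from the question.
conn.execute("CREATE TABLE object_data (id INTEGER, key BLOB, data BLOB)")
conn.execute(
    "INSERT INTO object_data VALUES (?, ?, ?)",
    (1, b"\x30", "there".encode("utf-16-le")),
)

rows = conn.execute("SELECT * FROM object_data").fetchall()
for row in rows:
    blob = row[2]  # a BLOB column comes back as a bytes object
    # Show the raw bytes as hex and one attempt at decoding them as text.
    print(type(blob).__name__, blob.hex(), blob.decode("utf-16-le", errors="replace"))
conn.close()
```

So the blob is already available as raw bytes on the Python side; what remains is knowing which encoding (or compression) was used to produce those bytes.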
UPDATE
Following the encoding scheme in the Firefox source code for the IndexedDB implementation, pointed out by @paa in one of the comments on this question, I implemented part of the FF encoding method for database keys in Python. So far I have implemented it only for strings, but implementing it for other types would be even easier:
import re

BYTE_LENGTH = 8

def hex_to_bin(hex_str):
    """Return binary representation of hexadecimal string."""
    return str(trim_bin(int(hex_str, 16)).zfill(len(hex_str) * 4))

def byte_to_unicode(bin_byte):
    """Return unicode encoding for binary byte."""
    return chr(int(str(bin_byte), 2))

def trim_bin(int_n):
    """Return int num converted to trimmed bin representation."""
    return bin(int_n)[2:]

def decode(key):
    """Return decoded idb key."""
    decoded = key
    m = re.search("[1-9]", key) # change for non-zero
    if m:
        i = m.start()
        typeoffset = int(key[i])
    else:
        # error
        pass
    data = key[i + 1:]
    # compare values with ==; `is` checks identity and only works here by accident
    if typeoffset == 1:
        # decode number
        pass
    elif typeoffset == 2:
        # decode date
        pass
    elif typeoffset == 3:
        # decode string
        bin_repr = hex_to_bin(data)
        decoded = ""
        for i in xrange(0, len(bin_repr), BYTE_LENGTH):
            byte = bin_repr[i:i + BYTE_LENGTH]
            if byte[0] == '0':
                byte_1 = int(byte, 2) - 1
                decoded += byte_to_unicode(trim_bin(byte_1))
            else:
                byte = byte[2:]
                if byte[1] == '0':
                    byte_127 = int(byte, 2) + 127
                    decoded += byte_to_unicode(trim_bin(byte_127))
                    i += BYTE_LENGTH
                    decoded += byte_to_unicode(bin_repr[i:i + BYTE_LENGTH])
                elif byte[1] == '1':
                    decoded += byte_to_unicode(byte)
                    i += BYTE_LENGTH
                    decoded += byte_to_unicode(bin_repr[i:i + BYTE_LENGTH])
                    i += BYTE_LENGTH
                    decoded += byte_to_unicode(bin_repr[i:i + 2])
        return decoded
    elif typeoffset == 4:
        # decode array
        pass
    else:
        # error
        pass
    return decoded
However, I'm still not able to decode the data fields of indexeddb. It seems to me that they are not using any sophisticated scheme like the one for the keys because I can read some parts of the actual values when I encode them in UTF-16.
(Typing here since I can't comment yet...)
For data itself I've been trying to do the same thing for data blobs. For my problem, I'm trying to grab JSON strings out. If I look at the DB I'm trying to sift through, I do see UTF-16 encoded characters, most of the time. But there are strange cases where I have this:
"there we go" is encoded as 7400 6800 6500 7200 6500 2000 77 [05060C] 6700 6F00. The [05060C] supposedly encodes "e ".
https://mxr.mozilla.org/mozilla-release/source/dom/indexedDB/IDBObjectStore.cpp
I'm trying to look into that and see if there are any clues. Should be plenty of other source files in the directory that could help.
Hey guys, I'm trying to make a cipher following this set of instructions:
Print a header.
Prompt the user to enter the name of the file with the encrypted message, the decode key (the shift number), and the name of the file to store the decrypted message.
Read the encrypted message from the file.
Use the decode key to shift each character in the encrypted message by the appropriate number to generate the new string corresponding to the decrypted message.
Save the decrypted message in the second file.
Print the encrypted and decrypted messages on the screen.
I'm not allowed to use the ord() or chr() functions.
What really confuses me is the encrypted and decrypted files part. I don't really know how to code for this.
I'm pretty new to this so any help would be greatly appreciated.
Thanks in advance.
Note: It sounds like you're probably doing this as a school assignment. I highly recommend that you use the code below only as an example and not as a full solution. I would hate for there to be plagiarism issues surrounding your assignment and I'm sure your professor/teacher is knowledgeable at Googling for prior work. Good luck on your assignment!
I wrote a quick example of how I might try and tackle your problem. The example has a few known issues:
It doesn't deal with capital letters. (Other than to convert them to their lowercase counterparts.)
It doesn't deal with punctuation or non alphanumeric characters. (Numbers, spaces or line endings.)
There is no error checking.
If you try to convert a number < -25 it will throw up on you.
Probably the biggest problem that needed to be solved was the limitation of not using ord() and chr(). I bypassed that limitation by creating my own conversion list of letters to numbers and vice versa. A tricky corner case to make sure you deal with is what happens if the shift moves a letter outside of the conversion range [0,25].
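The wrap-around corner case can be handled with a single modulo operation. The following is a minimal Python 3 sketch rather than the script below: ALPHABET, shift_letter and shift_text are names chosen for illustration, and, like the script, it folds capitals to lowercase. It looks letters up by position in a string, so ord() and chr() are not needed.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def shift_letter(letter, shift):
    """Shift one letter by `shift` positions, wrapping around the alphabet."""
    index = ALPHABET.find(letter.lower())
    if index == -1:
        return letter  # not a letter of the alphabet: leave it unchanged
    # The modulo keeps the result in the range 0..25 for any positive or negative shift.
    return ALPHABET[(index + shift) % len(ALPHABET)]


def shift_text(text, shift):
    """Apply shift_letter to every character of a string."""
    return "".join(shift_letter(ch, shift) for ch in text)


print(shift_text("xyz abc", 3))   # abc def
print(shift_text("abc def", -3))  # xyz abc
```

Because Python's % always returns a non-negative result for a positive modulus, the same expression covers shifts in both directions and of any size.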
As a side note if you want to decrypt a file you can simply open it up as the plaintext and use a negative offset whose absolute value is equal to the encrypting offset. Or in plain English, if you use the parameters:
infile = clear.txt, offset = 1, outfile = encrypted.txt
To decrypt you can use:
infile = encrypted.txt, offset = -1, outfile = decrypted.txt
caesarcipher.py
import itertools
letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q',
           'r','s','t','u','v','w','x','y','z']
numbers = range(26) # Numbers 0 - 25
lettersToNumbers = dict(zip(letters, numbers))
numbersToLetters = dict(zip(numbers, letters))
def printHeader():
    """ Print the program informational header """
    print """=======================================
Welcome to CaesarCipher - The unbreakable
Roman cipher.
======================================="""
def convertToNumber(letter):
    """ Convert a letter to a number using our predefined conversion table
    @param letter: The letter to convert to an integer value
    @type letter: str
    @rtype: int
    """
    return lettersToNumbers[letter]
def convertToLetter(number):
    """ Convert a number to a letter using our predefined conversion table
    @param number: The number to convert to a letter
    @type number: int
    @rtype: str
    """
    # If we shift outside of our range make sure to wrap
    # (there are 26 letters, so wrap with 26, not 25)
    if number > 25:
        return numbersToLetters[number%26]
    elif number < 0:
        return numbersToLetters[number+26]
    else:
        return numbersToLetters[number]
def shiftUp(letter, shift):
    """ Shift letter up a given number of positions
    @param letter: The letter we're shifting
    @param shift: The number of positions to shift up
    @type letter: str
    @type shift: int
    @note: For simplicity we encode both capital and lowercase letters
    to the same values
    """
    number = convertToNumber(letter.lower())
    number += shift
    return convertToLetter(number)
def prompt():
    """ Prompt for user input
    @rtype: tuple of str, int, str
    """
    infile = raw_input("File to encrypt: ")
    offset = int(raw_input("Encoding number: "))
    outfile = raw_input("Encrypted file destination: ")
    return (infile, offset, outfile)
def encrypt(infile, offset, outfile):
    """ Encrypt the file using the given offset """
    print "=== Plaintext input ==="
    printFile(infile)
    with open(infile) as red_file:
        with open(outfile, 'w') as black_file:
            for line in red_file:
                for letter in line:
                    # Only convert alphabetic characters
                    if letter.isalpha():
                        black_file.write(shiftUp(letter, offset))
                    else:
                        black_file.write(letter)
    print "=== Ciphertext output ==="
    printFile(outfile)
def printFile(path):
    """ Print the data in the given file """
    with open(path) as print_file:
        for line in print_file:
            print line
printHeader()
encrypt(*prompt()) # `*` unpacks the tuple returned by `prompt()` into
                   # three separate arguments.
test.txt
abcdef
ABCDEF
This is some text I want to try and encrypt.
Example run:
mike@test:~$ python caesarcipher.py
=======================================
Welcome to CaesarCipher - The unbreakable
Roman cipher.
=======================================
File to encrypt: test.txt
Encoding number: 1
Encrypted file destination: test.out
=== Plaintext input ===
abcdef
ABCDEF
This is some text I want to try and encrypt.
=== Ciphertext output ===
bcdefg
bcdefg
uijt jt tpnf ufyu j xbou up usz boe fodszqu.
Since you say the file part is your biggest problem, I assume you have a function like:
def decaesar(message, shift):
    pass
that does the deciphering for you on a string basis - that is, it takes the encrypted message as a string and gives you back the decrypted message as a string. If you haven't written that already, do that first, and test it with hard-coded strings. Ignore the "encrypted and decrypted files" bit at this stage - programming is all about solving one problem at a time.
Once you have that function and you're happy that it works, extending your program to deal with files instead of strings is as simple as asking:
Can I get a string with the contents of a file, given the file's name? , and conversely,
Can I write a string into a file with a given name?
If you can answer both of those with 'yes', then you can extend your program in this way without changing your decaesar function - your logic looks like this:
# Print header
encrypted_filename, decrypted_filename, shift = # get from user input
encrypted_message = # get the contents of encrypted_filename as a string
decrypted_message = decaesar(encrypted_message, shift)
# write decrypted_message to decrypted_filename
# print encrypted_message and decrypted_message
Usefully, Python's file IO works on exactly this principle of converting between strings and files. If you have a file open for reading:
in_file = open(filename)
, then the return value of:
in_file.read()
is exactly the string to answer the first point. Likewise, if you have a file open for writing:
out_file = open(filename, 'w')
, then:
out_file.write(my_string)
will put my_string into that file.
So that means that if you do already have your decaesar function, then you can slip this code into the pseudocode above at the appropriate places, and you will have a mostly working solution.
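Put together, the file handling could look something like the Python 3 sketch below. decrypt_file is a hypothetical name, and fake_decaesar is a deliberate stand-in that is not a cipher at all: it only shows where your own decaesar function plugs in. The temporary directory and file names exist only to make the demo self-contained.

```python
import os
import tempfile


def decrypt_file(encrypted_filename, decrypted_filename, shift, decaesar):
    """Read a file, decrypt its contents with `decaesar`, and write the result."""
    with open(encrypted_filename) as in_file:
        encrypted_message = in_file.read()
    decrypted_message = decaesar(encrypted_message, shift)
    with open(decrypted_filename, "w") as out_file:
        out_file.write(decrypted_message)
    return encrypted_message, decrypted_message


def fake_decaesar(message, shift):
    # Stand-in only, NOT a Caesar cipher: replace with your own function.
    return message.upper()


with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "encrypted.txt")
    dst = os.path.join(folder, "decrypted.txt")
    with open(src, "w") as f:
        f.write("uijt jt b uftu")
    encrypted, decrypted = decrypt_file(src, dst, 1, fake_decaesar)
    with open(dst) as f:
        saved = f.read()

print(encrypted)
print(decrypted)
```

Passing the string function in as an argument keeps the file handling and the cipher logic separate, so each can be tested on its own.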