struct.pack shows shifted data - python

I am trying to use struct.pack to pack a hash digest, but I am not getting the expected result.
This is how I am packing the data:
hash = hashlib.sha256(input).digest()
print('hash = ', hash.hex())
packed = struct.pack('!32p', hash)
print('packed = ', packed.hex())
Here is an example result:
hash = b5dbdb2b0a7d762fc7e429062d64b711d240e8f95f1c59fc28c28ac6677ffeaf
packed = 1fb5dbdb2b0a7d762fc7e429062d64b711d240e8f95f1c59fc28c28ac6677ffe
The bytes appear to be shifted, and "1f" has been added. Is this a result of an incorrect format specifier?
EDIT: I believe this first byte is the length of the data, because I am using 'p'. Is there any way to avoid this? I don't want to include this in my packed data

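The 'p' format character encodes a “Pascal string”: the first byte of the field stores the string's length and the remaining bytes hold the data, which is why '32p' leaves room for only 31 bytes of your digest. This is documented. If you don't want that, use the 's' format to get just the bytes themselves instead: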
packed = struct.pack('!32s', hash)
print('packed =', packed.hex())
Output:
packed = b5dbdb2b0a7d762fc7e429062d64b711d240e8f95f1c59fc28c28ac6677ffeaf
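For comparison, here is a minimal sketch that packs the same digest with both format characters. The input b"example" is only a placeholder, not the data from the question.

```python
import hashlib
import struct

digest = hashlib.sha256(b"example").digest()  # placeholder input, 32 bytes

# 'p' (Pascal string): 1 length byte + at most 31 data bytes in a 32-byte field
packed_p = struct.pack('!32p', digest)
# 's': the 32 bytes are copied unchanged
packed_s = struct.pack('!32s', digest)

print(packed_p[0])                  # 31, i.e. the 0x1f prefix seen in the question
print(packed_p[1:] == digest[:31])  # True: the last digest byte was dropped
print(packed_s == digest)           # True
```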

cant replace data in serialized textview, bytearray size problem?

I'm really struggling with serialized textbuffer data. I just got through an SQL encoding issue (thanks theGtkNerd for the help), and now my troubles are back.
I'm trying to add search/replace functionality to a textview that uses pixbufs and formatted text. Since I have images and tags stored in the buffer, I am trying to do the replace on the serialized textview buffer data.
The following code works as long as the replacement string is the same size as the search string.
def _diagFnRReplaceAll(self, oWidget):  # Replace All function
    findbox = self.builder.get_object('FnRFindEntry')
    searchstr = findbox.get_text()
    buf = self.dataview.get_buffer()
    repbox = self.builder.get_object('FnRReplaceEntry')
    repstr = repbox.get_text()
    format = buf.register_serialize_tagset()
    data = buf.serialize(buf, format, buf.get_start_iter(),
                         buf.get_end_iter())
    sys.stdout.buffer.write(data)  # print raw for debugging
    newdata = data.replace(bytes(searchstr, 'ascii'), bytes(repstr, 'ascii'))
    print("\n\n\n\n")
    sys.stdout.buffer.write(newdata)  # print raw for debugging
    buf.set_text('')
    format = buf.register_deserialize_tagset()
    buf.deserialize(buf, format, buf.get_end_iter(), newdata)
If it is smaller or larger, I get the following error:
Gtk:ERROR:../../../../gtk/gtktextbufferserialize.c:1588:text_handler: code should not be reached
I tried changing the encoding type and different ways of encoding, but it didn't help. The fact that a same-size string works fine makes me think there is a size value for the serialized buffer data or pixbuf data somewhere, but I haven't found anything by searching.
I tried doing the replace the way you would on a textview without pictures; it worked, but lost the picture and format data.
Does anyone know why this is happening?
Or does anyone know another way to do a search and replace in a textview widget that has pixbuf data and formatting tags?
Well, I got it working using a little byte patching. I just saw your comment on using marks; I will look into that, as I would rather use built-in functionality instead of what I'm doing. My approach replaces the 4 bytes after the GTKBUFFERCONTENTS-001 header with a new 4-byte value for the new size of the markup.
Here is what I have working right now:
def _diagFnRReplaceAll(self, oWidget):
    findbox = self.builder.get_object('FnRFindEntry')
    searchstr = findbox.get_text()
    buf = self.dataview.get_buffer()
    repbox = self.builder.get_object('FnRReplaceEntry')
    repstr = repbox.get_text()
    format = buf.register_serialize_tagset()
    data = buf.serialize(buf, format, buf.get_start_iter(), buf.get_end_iter())
    start_bytes = data[0:26]   # header
    size_bytes = data[26:30]   # 4-byte big-endian length of the markup
    sizeval = int.from_bytes(size_bytes, byteorder='big', signed=False)
    end_of_markup = 30 + sizeval
    the_rest = data[end_of_markup:len(data)]
    markup = data[30:end_of_markup]
    newmarkup = bytearray(markup.replace(bytes(searchstr, 'ascii'), bytes(repstr, 'ascii')))
    newsize = len(newmarkup).to_bytes(4, 'big')
    reconstruct = start_bytes + newsize + newmarkup + the_rest
    buf.set_text('')
    format = buf.register_deserialize_tagset()
    buf.deserialize(buf, format, buf.get_end_iter(), reconstruct)
This works without issue so far; I will post again if I get it working with the 'Marks' suggestion. Thanks again, theGtknerd.
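The byte patching can be tried without GTK on a synthetic blob. This is only a sketch: it assumes the layout the code above relies on (a 26-byte header, a 4-byte big-endian length, then the markup, then trailing data), and the helper name and the fake header are made up for illustration.

```python
def replace_in_length_prefixed(data, old, new, header_len=26):
    """Replace `old` with `new` in the length-prefixed section and fix the length."""
    size_end = header_len + 4
    size = int.from_bytes(data[header_len:size_end], 'big')
    body_end = size_end + size
    body = data[size_end:body_end].replace(old, new)
    # Re-emit header, updated 4-byte length, patched body, untouched trailer
    return data[:header_len] + len(body).to_bytes(4, 'big') + body + data[body_end:]

# Synthetic data: fake 26-byte header, markup section, trailing bytes (e.g. pixbuf data)
header = b"H" * 26
markup = b"<text>hello world</text>"
blob = header + len(markup).to_bytes(4, 'big') + markup + b"TRAILING"

patched = replace_in_length_prefixed(blob, b"hello", b"goodbye")
print(int.from_bytes(patched[26:30], 'big'))  # new markup length
```

Restricting the replace to the markup section also keeps the search string from accidentally matching bytes inside the trailing binary data.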

Cast an array shape into a string

I need to convert the output of a 2D array's myarray.shape into a string, because I want to isolate the rows and columns and reassign them as height and width for an image that I've read in, WITHOUT using PIL.
I tried (str)image1.shape but it just gave a syntax error.
What's the correct way to do this?
It's str(image1.shape). If you want to then parse it (say it's (50,2)), you could do this:
myshape = str(image1.shape) # returns '(50, 2)'
part1, part2 = myshape.split(', ')
part1 = part1[1:] # now is '50'
part2 = part2[:-1] # now is '2'
Or, since you're really after the numbers (I think), just skip the str() step and directly parse the output of image1.shape:
firstnum, secondnum = image1.shape
and you're done.
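A small sketch of the unpacking approach, using plain tuples as stand-ins for image1.shape so it runs without any image library. The 3-channel case is an assumption about what a colour image's shape might look like.

```python
shape_2d = (50, 2)             # stand-in for image1.shape of a 2D array
height, width = shape_2d

shape_3d = (50, 2, 3)          # stand-in for an array with a channel axis
height3, width3 = shape_3d[:2]  # slice first, otherwise unpacking raises ValueError

# Build a string only where one is needed, e.g. for display
label = "%d x %d" % (height, width)
print(label)  # 50 x 2
```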

Parse a sequence of binary digits

How can I parse a sequence of binary digits in Python?
The following is an example of what I am trying to do.
I have a sequence of binary digits, for example:
sequence = '1110110100110111011011110101100101100'
and, I need to parse this and extract the data.
Say the above sequence contains start, id, data and end fields
start is a 2 bit field, id is an 8 bit field, data field can vary from 1 to 8192 bits and end is a 4 bit field.
and after parsing I'm expecting the output as follows:
result = {start : 11,
id : 10110100,
data : 11011101101111010110010,
end : 1100,
}
I'm using this in one of my applications.
I'm able to parse the sequence using regex, but the problem is that the regex must be written by the user. So as an alternative I'm using a BNF grammar, as grammars are more readable.
I tried solving this using Python's parsimonious and pyparsing parsers, but I am not able to find a solution for fields with variable length.
The grammar I wrote in parsimonious available for python is as follows:
grammar = """sequence = start id data end
start = ~"[01]{2}"
id = ~"[01]{8}"
data = ~"[01]{1,8192}"
end = ~"[01]{4}"
"""
Since the data field is of variable length, and the parser is greedy, the above sequence is not able to match with the above grammar. The parser takes end field bits into the data field.
I just simplified my problem to above example.
Let me describe the full problem. There are 3 kinds of packets (let's call them Token, Handshake and Data packets). Token and Handshake packets have a fixed length and the Data packet has a variable length. (The example shown above is a Data packet.)
The input consists of a continuous stream of bits. Each packet beginning is marked by the "start" pattern and packet end is marked by the "end" pattern. Both of these are fixed bit patterns.
Example Token packet grammar:
start - 2 bits, id - 8 bits, address - 7bits, end - 4bits
111011010011011101100
Example Handshake packet grammar:
start - 2 bits, id - 8bits, end - 4 bits
11101101001100
Example top level rule:
packet = tokenpacket | datapacket | handshakepacket
If there were only one type of packet then slicing would work. But when we start parsing, we do not know which packet we will finally end up matching. This is why I thought of using a grammar as the problem is very similar to language parsing.
Can we make the slicing approach work in this case where we have 3 different packet types to be parsed?
What's the best way to solve this problem?
Thanks in advance,
This will do, just use slicing for this job:
def binParser(data):
    result = {}
    result["start"] = data[:2]
    result["id"] = data[2:10]  # id is 8 bits wide, so it ends at index 10
    result["end"] = data[-4:]
    result["data"] = data[10:-4]
    return result
You will get the correct data from the string.
Presumably, there will only ever be one variable-length field, so you can allow this by defining a distance from the start of the sequence and a distance from the end, e.g.
rules = {'start': (None, 2), 'id': (2, 10),
'data': (10, -4), 'end': (-4, None)}
and then use slicing:
sequence = '1110110100110111011011110101100101100'
result = dict((k, sequence[v[0]:v[1]]) for k, v in rules.items())
This gives:
result == {'id': '10110100',
'end': '1100',
'data': '11011101101111010110010',
'start': '11'}
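To extend slicing to the three packet types, one option is to pick the layout from the packet length. This is a rough sketch under two assumptions: each packet has already been cut out of the stream, and the types can be told apart by total length. That second assumption is a simplification, since a Data packet with exactly 7 data bits is as long as a Token packet; a real decoder would more likely dispatch on the id field. The names FIXED_LAYOUTS and parse_packet are made up for illustration.

```python
# Fixed-length packet layouts, keyed by total length in bits
FIXED_LAYOUTS = {
    21: ("token", [("start", 2), ("id", 8), ("address", 7), ("end", 4)]),
    14: ("handshake", [("start", 2), ("id", 8), ("end", 4)]),
}

def parse_packet(bits):
    """Return (packet_type, fields) for one already-delimited packet."""
    if len(bits) in FIXED_LAYOUTS:
        kind, layout = FIXED_LAYOUTS[len(bits)]
        fields, pos = {}, 0
        for name, width in layout:
            fields[name] = bits[pos:pos + width]
            pos += width
        return kind, fields
    if len(bits) < 15:  # 2 + 8 + at least 1 data bit + 4
        raise ValueError("too short for any known packet")
    # Variable-length data packet: fixed offsets from both ends
    return "data", {"start": bits[:2], "id": bits[2:10],
                    "data": bits[10:-4], "end": bits[-4:]}

print(parse_packet('11101101001100'))
print(parse_packet('111011010011011101100'))
print(parse_packet('1110110100110111011011110101100101100'))
```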
Since you mentioned pyparsing in the tags, here is how I would go about it using pyparsing. This uses Daniel Sanchez's binParser for post-processing.
from pyparsing import Word
# Post-processing of the data.
def binParser(m):
    data = m[0]
    return {'start': data[:2],
            'id': data[2:10],  # id is 8 bits wide, so it ends at index 10
            'end': data[-4:],
            'data': data[10:-4]}
# At least 14 characters for the required fields, attaching the processor
bin_sequence = Word('01', min=14).setParseAction(binParser)
sequence = '1110110100110111011011110101100101100'
print(bin_sequence.parseString(sequence)[0])
This could then be used as part of a larger parser.
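Since the question mentions that a regex already works, here is a sketch of that route with named groups from the standard re module. The pattern mirrors the grammar in the question. It succeeds where the grammar did not because the regex engine backtracks across the whole pattern and gives the last four bits back to the end group.

```python
import re

PACKET_RE = re.compile(
    r'(?P<start>[01]{2})'
    r'(?P<id>[01]{8})'
    r'(?P<data>[01]{1,8192})'   # greedy, but backtracks to leave 4 bits for end
    r'(?P<end>[01]{4})'
)

sequence = '1110110100110111011011110101100101100'
match = PACKET_RE.fullmatch(sequence)
fields = match.groupdict() if match else None
print(fields)
```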

I need to change a zip code into a series of dots and dashes (a barcode), but I can't figure out how

Here's what I've got so far:
def encodeFive(zip):
    zero = "||:::"
    one = ":::||"
    two = "::|:|"
    three = "::||:"
    four = ":|::|"
    five = ":|:|:"
    six = ":||::"
    seven = "|:::|"
    eight = "|::|:"
    nine = "|:|::"
    codeList = [zero,one,two,three,four,five,six,seven,eight,nine]
    allCodes = zero+one+two+three+four+five+six+seven+eight+nine
    code = ""
    digits = str(zip)
    for i in digits:
        code = code + i
    return code
With this I get the original zip code back as a string, but none of the digits are encoded into the barcode. I've figured out how to encode one digit, but it won't work the same way with five digits.
codeList = ["||:::", ":::||", "::|:|", "::||:", ":|::|",
":|:|:", ":||::", "|:::|", "|::|:", "|:|::" ]
barcode = "".join(codeList[int(digit)] for digit in str(zipcode))
Perhaps use a dictionary:
barcode = {'0':"||:::",
'1':":::||",
'2':"::|:|",
'3':"::||:",
'4':":|::|",
'5':":|:|:",
'6':":||::",
'7':"|:::|",
'8':"|::|:",
'9':"|:|::",
}
def encodeFive(zipcode):
    return ''.join(barcode[n] for n in str(zipcode))
print(encodeFive(72353))
# |:::|::|:|::||::|:|:::||:
PS. It is better not to name a variable zip, since doing so shadows the builtin function zip. Similarly, it is better to avoid naming a variable code, since code is a module in the standard library.
You're just adding i (the character in digits) to the string where I think you want to be adding codeList[int(i)].
The code would probably be much simpler by just using a dict for lookups.
I find it easier to use split() to create lists of strings:
codes = "||::: :::|| ::|:| ::||: :|::| :|:|: :||:: |:::| |::|: |:|::".split()
def zipencode(numstr):
    return ''.join(codes[int(x)] for x in str(numstr))
print(zipencode("32345"))
This is made in python.
number = ["||:::",
":::||",
"::|:|",
"::||:",
":|::|",
":|:|:",
":||::",
"|:::|",
"|::|:",
"|:|::"
]
def encode(num):
    return ''.join(map(lambda x: number[int(x)], str(num)))
print(encode(32345))
I don't know what language you are using, so I made an example in C#:
int zip = 72353;
string[] codeList = {
"||:::", ":::||", "::|:|", "::||:", ":|::|",
":|:|:", ":||::", "|:::|", "|::|:", "|:|::"
};
string code = String.Empty;
while (zip > 0) {
    code = codeList[zip % 10] + code;
    zip /= 10;
}
return code;
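Note: Instead of converting the zip code to a string and then converting each character back to a number, I calculated the digits numerically. Be aware that this drops leading zeros, since a ZIP code such as 02134 stored as an int has only four digits.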
Just for fun, here's a one-liner:
return String.Concat(zip.ToString().Select(c => "||::::::||::|:|::||::|::|:|:|::||::|:::||::|:|:|::".Substring(((c-'0') % 10) * 5, 5)).ToArray());
It appears you're trying to generate a "postnet" barcode. Note that the five-digit ZIP postnet barcodes were obsoleted by ZIP+4 postnet barcodes, which were obsoleted by ZIP+4+2 delivery point postnet barcodes, all of which are supposed to include a checksum digit and leading and ending framing bars. In any case, all of those forms are being obsoleted by the new "intelligent mail" 4-state barcodes, which require a lot of computational code to generate and no longer rely on straight digit-to-bars mappings. Search USPS.COM for more details.
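As a rough illustration of the checksum digit and framing bars mentioned above, here is a Python sketch for the five-digit form only. It assumes the check digit is chosen so that all digits sum to a multiple of 10, and that one full bar frames each end. The function name postnet is made up, and the ZIP code is taken as a string so that leading zeros survive.

```python
CODES = "||::: :::|| ::|:| ::||: :|::| :|:|: :||:: |:::| |::|: |:|::".split()

def postnet(zipcode):
    """Encode a 5-digit ZIP string with check digit and frame bars."""
    digits = str(zipcode)
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError("expected a 5-digit ZIP code string")
    # Check digit brings the digit sum up to the next multiple of 10
    check = (10 - sum(int(d) for d in digits) % 10) % 10
    body = "".join(CODES[int(d)] for d in digits + str(check))
    return "|" + body + "|"

print(postnet("72353"))
```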

python String Formatting Operations

Faulty code:
pos_1 = 234
pos_n = 12890
min_width = len(str(pos_n)) # is there a better way for this?
# How can I use min_width as the minimal width of the two conversion specifiers?
# I don't understand the Python documentation on this :(
raw_str = '... from %(pos1)0*d to %(posn)0*d ...' % {'pos1':pos_1, 'posn': pos_n}
Required output:
... from 00234 to 12890 ...
______________________EDIT______________________
New code:
# I changed my code according the second answer
pos_1 = 10234 # can be any value between 1 and pos_n
pos_n = 12890
min_width = len(str(pos_n))
raw_str = '... from % *d to % *d ...' % (min_width, pos_1, min_width, pos_n)
New Problem:
There is one extra whitespace (I marked it _) in front of the integer values, for integers with min_width digits:
print raw_str
... from _10234 to _12890 ...
Also, I wonder if there is a way to add Mapping keys?
pos_1 = 234
pos_n = 12890
min_width = len(str(pos_n))
raw_str = '... from %0*d to %0*d ...' % (min_width, pos_1, min_width, pos_n)
Concerning using a mapping type as second argument to '%':
I presume you mean something like '%(mykey)d' % {'mykey': 3}. I don't think you can combine this with the "%*d" syntax, since there is no way to supply the necessary width arguments through a dict.
But why don't you generate your format string dynamically:
fmt = '... from %%%dd to %%%dd ...' % (min_width, min_width)
# assuming min_width is e.g. 7 fmt would be: '... from %7d to %7d ...'
raw_string = fmt % pos_values_as_tuple_or_dict
This way you decouple the width issue from the formatting of the actual values, and you can use a tuple or a dict for the latter, as it suits you.
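A sketch of this two-step approach combined with mapping keys and zero padding, using the values from the question. The str.format line at the end is an alternative that is not part of the answer above; it relies on nested replacement fields for the width.

```python
pos_1 = 234
pos_n = 12890
min_width = len(str(pos_n))

# Step 1: build the format string; each %% survives as a literal % for step 2
fmt = '... from %%(pos1)0%dd to %%(posn)0%dd ...' % (min_width, min_width)
# fmt is now '... from %(pos1)05d to %(posn)05d ...'

# Step 2: fill in the values by key
raw_str = fmt % {'pos1': pos_1, 'posn': pos_n}
print(raw_str)  # ... from 00234 to 12890 ...

# Alternative with str.format: the width is itself a replacement field
alt = '... from {pos1:0{w}d} to {posn:0{w}d} ...'.format(
    pos1=pos_1, posn=pos_n, w=min_width)
```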
"1234".rjust(13,"0")
Should do what you need
addition:
a = ["123", "12"]
max_width = sorted([len(i) for i in a])[-1]
put max_width instead of 13 above and put all your strings in a single array a (which seems to me much more usable than having a stack of variables).
additional nastiness:
(Using array of numbers to get closer to your question.)
a = [123, 33, 0 ,223]
[str(x).rjust(sorted([len(str(i)) for i in a])[-1],"0") for x in a]
Who said Perl is the only language to easily produce braindumps in? If regexps are the godfather of complex code, then list comprehension is the godmother.
(I am relatively new to python and rather convinced that there must be a max-function on arrays somewhere, which would reduce above complexity. .... OK, checked, there is. Pity, have to reduce the example.)
[str(x).rjust(max(len(str(i)) for i in a), "0") for x in a]
And please observe below comments on "not putting calculation of an invariant (the max value) inside the outer list comprehension".
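A sketch of that point, with the width computed once before the comprehension. It uses str.zfill, which pads with zeros on the left and is swapped in here for rjust(width, "0").

```python
a = [123, 33, 0, 223]

# Compute the invariant once instead of once per element
width = max(len(str(x)) for x in a)
padded = [str(x).zfill(width) for x in a]

print(padded)  # ['123', '033', '000', '223']
```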