Convert string to big endian, index out of range - python

I'm trying to convert a string to big-endian integers, but due to my lack of experience with bit shifting etc. I've got stuck with the following so far:
def my_func(self, b):
    a = [(len(b)+3) >> 2]
    for i, val in enumerate(b):
        a[i>>2] |= ord(b[i]) << (24-(i & 3)*8)
    return a
The above raises
a[i>>2] |= ord(b[i]) << (24-(i & 3)*8)
IndexError: list index out of range
and never gets further through the loop than index 4. The error message points at the a[] list.
Can anyone see what I'm doing wrong here? I'm porting this from JavaScript so that may be the issue (link to that http://pastebin.com/GKE3AeCm )

Without resorting to other methods, your code just needs a more faithful translation from the JavaScript version. In JavaScript you are creating an Array of a certain length, but in your Python code you always create a list of size 1. Here it is, corrected:
def my_func(b):
    a = [0] * ((len(b)+3) >> 2)
    for i, val in enumerate(b):
        a[i>>2] |= ord(b[i]) << (24-(i & 3)*8)
    return a
So what you are doing is treating each run of 4 characters as raw bytes and unpacking them to build an integer. Using struct, the correct way would be to be explicit about your data being bytes and to pass it as such:
import struct

def my_func2(data):
    lb = len(data)
    if lb % 4:
        data += b'\x00' * (4 - (lb % 4))
    a = [struct.unpack('>i', data[i:i+4])[0] for i in range(0, lb, 4)]
    return a
print(my_func2(b'pass123'))
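As a variation (a sketch of mine, not part of the answer above; my_func3 is a made-up name), struct can also unpack every word in a single call once the data is padded:

import struct

def my_func3(data):
    # pad with zero bytes up to a multiple of 4 (-len % 4 is the shortfall)
    data += b'\x00' * (-len(data) % 4)
    # '>' = big-endian, 'i' = signed 32-bit; one unpack for all the words
    return list(struct.unpack('>%di' % (len(data) // 4), data))

print(my_func3(b'pass123'))  # same output as my_func2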

Why is using integers much slower than using strings in this problem dealing with binary strings?

I solved the following LeetCode problem https://leetcode.com/problems/find-kth-bit-in-nth-binary-string/ using strings (class Solution) and integers (class Solution2). I thought the integer solution would be a lot faster and use much less memory, but it actually takes a lot longer: with n=18 and k=200 it took 10.1 seconds, compared to 0.13 seconds for the string solution.
import time

class Solution:
    def findKthBit(self, n: int, k: int) -> str:
        i = 0
        Sn = "0"
        Sn = self.findR(Sn, i, n)
        return Sn[k-1]

    def findR(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSn(Sn, i)
        return self.findR(newSn, i+1, n)

    def calcNewSn(self, Sn, i):
        inverted = ""
        for c in Sn:
            inverted += "1" if c == "0" else "0"
        newSn = Sn + "1" + (inverted)[::-1]
        return newSn

class Solution2:
    def findKthBitBin(self, n: int, k: int) -> str:
        i = 0
        # MSB (S1) has to be 1 for bit operations but is actually always 0
        Sn = 1
        Sn = self.findRBin(Sn, i, n)
        lenSn = (2**(n+1)) - 1
        return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)

    def findRBin(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSnBin(Sn, i)
        return self.findRBin(newSn, i+1, n)

    def calcNewSnBin(self, Sn, i):
        lenSn = 2**(i+1) - 1
        newSn = (Sn << 1) | 1
        inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
        newSn = self.reverseBits(newSn, inverted, lenSn)
        return newSn

    def getMask(self, lenSn):
        mask = 0
        for i in range(lenSn):
            mask |= (1 << i)
        return mask

    def reverseBits(self, newSn, inverted, lenSn):
        for i in range(lenSn):
            newSn <<= 1
            newSn |= inverted & 1
            inverted >>= 1
        return newSn

sol = Solution()
sol2 = Solution2()

start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")

start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")
Why is the solution using integers so slow? Are the implementations of big ints faster in other languages compared to strings?
The problem isn't how fast or slow the integer implementation is. The problem is that you've written an algorithm that wants a random-access mutable sequence data type, like a list, and integers are not a random-access mutable sequence.
For example, look at your reverseBits implementation:
def reverseBits(self, newSn, inverted, lenSn):
    for i in range(lenSn):
        newSn <<= 1
        newSn |= inverted & 1
        inverted >>= 1
    return newSn
With a list, you could just call list.reverse(). Instead, your code shifts newSn and inverted over and over, copying almost all the data involved repeatedly on every iteration.
Strings aren't a random-access mutable sequence data type either, but they're closer, and the standard implementation of Python has a weird optimization that actually does try to mutate strings in loops like the one you've written:
def calcNewSn(self, Sn, i):
    inverted = ""
    for c in Sn:
        inverted += "1" if c == "0" else "0"
    newSn = Sn + "1" + (inverted)[::-1]
    return newSn
Instead of building a new string and copying all the data for the +=, Python tries to resize the string in place. When that works, it avoids most of the unnecessary copying you're doing with the integer-based implementation.
The right way to write your algorithm would be to use lists, instead of strings or integers. However, there's an even better way, with a different algorithm that doesn't need to build a bit sequence at all. You don't need the whole bit sequence. You just need to compute a single bit. Figure out how to compute just that bit.
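To illustrate the list-based suggestion, here is a minimal sketch (my code, not the answerer's) of what calcNewSn looks like when the bits live in a list of '0'/'1' characters:

def calc_new_sn_list(sn):
    # invert each bit, then reverse in place: list.reverse() is linear
    inverted = ['1' if c == '0' else '0' for c in sn]
    inverted.reverse()
    return sn + ['1'] + inverted

Appending to and reversing a list avoids the repeated whole-value copies that the bigint shifts perform.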
The bulk of the time is spent in reverseBits(). Overall, you're creating hundreds of thousands of new integer objects in that, up to 262143 bits long. It's essentially quadratic time. The corresponding operation for the string form is just (inverted)[::-1], which runs entirely at "C speed", and is worst-case linear time.
Low Level
You can get a factor of 8 speedup just by replacing reverseBits():
def reverseBits(self, newSn, inverted, lenSn):
    inverted = int(f"{inverted:0{lenSn}b}"[::-1], 2)
    return (newSn << lenSn) | inverted
That's linear time. It converts inverted to a bit string, reverses that string, and converts the result back to an int. Other tricks can be used to speed other overly-slow bigint algorithms.
High Level
But, as the other answer (which you should accept!) said, this is missing the forest for the trees: the whole approach is off the best track. It's possible, even with simpler code, to write a function such that all of these run blazingly fast (O(n) worst case per call) and require only trivial working memory:
>>> [findKthBit(4, i) for i in range(1, 16)] == list("011100110110001")
True
>>> findKthBit(500, 2**20)
'1'
>>> findKthBit(500, 2**20 - 1)
'1'
>>> findKthBit(500, 2**20 + 1)
'0'
There's not a computer in the world with enough memory to build a string with 2**500-1 characters (or a bigint with that many bits).
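For reference, here is one way such a function could look, as a recursive sketch of mine, not necessarily the code behind the output above. It relies on the construction Sn = Sn-1 + "1" + reverse(invert(Sn-1)): the middle bit is always 1, the first half is Sn-1 unchanged, and a bit in the second half is the inverse of its mirror image in Sn-1:

def findKthBit(n, k):
    # S_1 = "0"; len(S_n) = 2**n - 1 and the middle bit is always '1'
    if n == 1:
        return '0'
    mid = 2 ** (n - 1)
    if k == mid:
        return '1'
    if k < mid:
        return findKthBit(n - 1, k)  # first half is S_{n-1} verbatim
    mirror = 2 ** n - k              # mirror position within S_{n-1}
    return '1' if findKthBit(n - 1, mirror) == '0' else '0'

Each call strips one level of the construction, so the recursion depth is at most n and no bit sequence is ever materialized.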

custom crc32 calculation in python without libs

I have been looking for simple Python code that can generate a CRC-32 sum. It is for an STM32 and I don't find a good example that is adjustable.
To get the right settings for my calculation I used the following site:
http://www.sunshine2k.de/coding/javascript/crc/crc_js.html
The settings would be the following:
Polynomial: 0x4C11DB7,
Initial Value: 0xFFFFFFFF
and no XOR value (i.e. 0x00); also, the input and result are not reflected.
Does someone know where I could get a simple, adjustable algorithm, or where I can learn how to write one?
Edit:
I use this function to create the table
def create_table():
    a = []
    for i in range(256):
        k = i
        for j in range(8):
            if k & 1:
                k ^= 0x4C11DB7
            k >>= 1
        a.append(k)
    return a
and the following for generating the crc-sum
def crc32(bytestream):
    crc_table = create_table()
    crc32 = 0xffffffff
    for byte in range(int(len(bytestream))):
        lookup_index = (crc32 ^ byte) & 0xff
        crc32 = (crc32 >> 8) ^ crc_table[lookup_index]
    return crc32
and call the function with this
print(hex(crc32(b"1205")))
the result is: 0x9f8e7b8c
but the website gives me: 0xA7D10A0A
can someone help me?
First off, what you have is for a reflected CRC, not a non-reflected CRC. Though there is an error in your table construction. This:
if k & 1:
    k ^= 0x4C11DB7
k >>= 1
is wrong. The exclusive-or must be done after the shift. So it would need to be (for the reflected case):
k = (k >> 1) ^ 0xedb88320 if k & 1 else k >> 1
Note that the polynomial also needs to be reflected in this case.
Another error in your code is using range to make the integers 0, 1, ..., and using those instead of the actual data bytes to compute the CRC on! What you want for your for loop is simply:
for byte in bytestream:
The whole point of using a table is to make the CRC calculation faster. You don't want to regenerate the table every time you do a CRC. You want to generate the table once when your program starts, and then use it multiple times. Or you can generate the table separately from your program, and then put the table itself in your program. That's what's usually done.
Anyway, to do the non-reflected case, you need to flip things around. So to make the table:
def create_table():
    a = []
    for i in range(256):
        k = i << 24
        for _ in range(8):
            k = (k << 1) ^ 0x4c11db7 if k & 0x80000000 else k << 1
        a.append(k & 0xffffffff)
    return a
To use the table:
def crc32(bytestream):
    crc_table = create_table()
    crc = 0xffffffff
    for byte in bytestream:
        lookup_index = ((crc >> 24) ^ byte) & 0xff
        crc = ((crc & 0xffffff) << 8) ^ crc_table[lookup_index]
    return crc
Now it correctly implements your specification, which happens to be the MPEG-2 32-bit CRC specification (from Greg Cook's CRC catalogue):
width=32 poly=0x04c11db7 init=0xffffffff refin=false refout=false xorout=0x00000000 check=0x0376e6e7 residue=0x00000000 name="CRC-32/MPEG-2"
For the code above, if I do:
print(hex(crc32(b'123456789')))
I get 0x376e6e7, which matches the check value in the catalog.
Again, you need to take the create_table() out of the crc32() routine and do it somewhere else, once.
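A minimal sketch of that arrangement (mine, not part of the answer): build the table once at module level and let the CRC routine reuse it:

CRC_TABLE = create_table()  # built once, at import time

def crc32(bytestream):
    crc = 0xffffffff
    for byte in bytestream:
        lookup_index = ((crc >> 24) ^ byte) & 0xff
        crc = ((crc & 0xffffff) << 8) ^ CRC_TABLE[lookup_index]
    return crc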

What's actually happening when I convert an int to a string?

I understand it's easy to convert an int to a string by using the built-in method str(). However, what's actually happening? I understand it may point to the
__str__ method of the int object but how does it then compute the “informal” string representation? Tried looking at the source and didn't find a lead; any help appreciated.
Python repeatedly divides the int by 10 and uses % 10 to get the decimal digits one by one.
Just to make sure we're looking at the right code, here's the function Python 2.7 uses to convert ints to strings:
static PyObject *
int_to_decimal_string(PyIntObject *v) {
    char buf[sizeof(long)*CHAR_BIT/3+6], *p, *bufend;
    long n = v->ob_ival;
    unsigned long absn;
    p = bufend = buf + sizeof(buf);
    absn = n < 0 ? 0UL - n : n;
    do {
        *--p = '0' + (char)(absn % 10);
        absn /= 10;
    } while (absn);
    if (n < 0)
        *--p = '-';
    return PyString_FromStringAndSize(p, bufend - p);
}
This allocates enough space to store the characters of the string, then fills the digits in one by one, starting at the end. When it's done with the digits, it sticks a - sign on the front if the number is negative and constructs a Python string object from the characters. Translating that into Python, we get the following:
def int_to_decimal_string(n):
    # "enough" mirrors the C buffer bound: a third of the bit count plus a
    # little slack always covers the digits and a possible '-' sign
    enough = max(n.bit_length(), 1) // 3 + 2
    chars = [None] * enough
    abs_n = abs(n)
    i = 0
    while True:
        i += 1
        chars[-i] = str(abs_n % 10)  # chr(ord('0') + abs_n % 10) is more accurate
        abs_n //= 10
        if not abs_n:
            break
    if n < 0:
        i += 1
        chars[-i] = '-'
    return ''.join(chars[-i:])
Internally the int object is stored as a 2's-complement representation, as in C (well, this is true while the value fits in a machine word; Python automagically converts it to an arbitrary-precision representation when it no longer does).
Now to get the string representation you have to change that into a string (and a string is merely an immutable sequence of characters). The algorithm is simple arithmetic: divide the number by 10 (integer division) and keep the remainder; add that to the character code '0' and you get the units digit. Go on with the result of the division until it reaches zero. It's as simple as that.
This approach works with any integer representation, but of course it will be more efficient to call the ltoa C library function, or equivalent C code, if possible than to code it in Python.
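Transcribed directly into Python (a sketch of mine; to_decimal_string is a made-up name), the prose above becomes:

def to_decimal_string(n):
    # digits come out least-significant first, so collect them and reverse
    if n == 0:
        return '0'
    m = abs(n)
    digits = []
    while m:
        digits.append(chr(ord('0') + m % 10))
        m //= 10
    return ('-' if n < 0 else '') + ''.join(reversed(digits))

print(to_decimal_string(-4302))  # -4302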
When you call str() on an object it calls its class's __str__ magic method.
For example:
class NewThing:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name
From there you can use the str() method on the object, or use it directly in strings.
>>> thing = NewThing("poop")
>>> print thing
poop
More info on magic methods here
Not sure if this is what you wanted, but I can't comment yet to ask clarifying questions.

pop() middle two elements from list

Just doing a review of my Python class and noticed that I forgot how to do this.
def outsideIn2(lst):
    '''(list)->list
    Returns a new list where the middle two elements have been
    removed and placed at the beginning of the result. Assume all lists are
    an even length.
    >>> outsideIn2(['C','a','r','t','o','n'])
    ['r','t','C','a','o','n'] # rt moves to front
    >>> outsideIn2(['H','i'])
    ['H','i'] # Hi moves to front so output remains the same.
    >>> outsideIn2(['B','a','r','b','a','r','a',' ','A','n','n','e'])
    ['r','a','B','a','r','b','a',' ','A','n','n','e'] # ra moves to front.
    '''
    length = len(lst)
    middle1 = lst.pop((len(lst) / 2) - 1)
    middle2 = lst.pop((len(lst) / 2) + 1)
    lst.insert([0], middle1)
    lst.insert([1], middle2)
    return lst
I'm getting this error:
middle1 = lst.pop((len(lst) / 2) - 1)
TypeError: integer argument expected, got float
What am I doing wrong?
When you upgraded to Python 3, the '/' operator changed from giving you integer division to real division. Switch to the '//' operator.
You can use the // operator:
middle1 = lst.pop((len(lst) // 2) - 1)
The other answers explained why you are getting the error. You need to use // instead of / (also, just for the record, you need to give list.insert integers, not lists).
However, I'd like to suggest a different approach that uses Python's slice notation:
def outsideIn2(lst):
    x = len(lst)//2
    return lst[x-1:x+1]+lst[:x-1]+lst[x+1:]
This method should be significantly faster than using list.pop and list.insert.
As proof, I made the below script to compare the two methods with timeit.timeit:
from timeit import timeit

def outsideIn2(lst):
    length = len(lst)
    middle1 = lst.pop((len(lst) // 2) - 1)
    middle2 = lst.pop((len(lst) // 2) + 1)
    lst.insert(0, middle1)
    lst.insert(1, middle2)
    return lst

print(timeit("outsideIn2(['B','a','r','b','a','r','a',' ','A','n','n','e'])", "from __main__ import outsideIn2"))

def outsideIn2(lst):
    x = len(lst)//2
    return lst[x-1:x+1]+lst[:x-1]+lst[x+1:]

print(timeit("outsideIn2(['B','a','r','b','a','r','a',' ','A','n','n','e'])", "from __main__ import outsideIn2"))
The results were as follows:
6.255111473664949
4.465956427423038
As you can see, my proposed method was ~2 seconds faster. However, you can run more tests if you would like to validate mine.
Using pop and insert (especially inserting at positions 0 and 1) can be fairly slow with Python lists. Since the underlying storage for the list is an array, inserting at position 0 means that the element at position n-1 has to be moved to position n, then the element at n-2 has to be moved to n-1 and so on. pop has to do the same in reverse. So imagine in your little method how many element moves must be done. Roughly:
pop #1 - move n/2 elements
pop #2 - move n/2 elements
insert 0 - move n elements
insert 1 - move n elements
So approximately 3n moves are done in this code.
Breaking the list into 3 slices and reassembling a new list may be more optimal:
def outsideIn2(lst):
    midstart = len(lst)//2 - 1
    left, mid, right = lst[0:midstart], lst[midstart:midstart+2], lst[midstart+2:]
    return mid+left+right
Plus you won't run into any weird issues from pop changing the length of the list between the first and second calls to pop. And the slices implicitly guard against index errors when you get a list that is shorter than 2 elements.
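A quick sanity check of the slice version against the docstring examples (my addition):

>>> outsideIn2(['C','a','r','t','o','n'])
['r', 't', 'C', 'a', 'o', 'n']
>>> outsideIn2(['H','i'])
['H', 'i']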

read single bit operation python 2.6

I am trying to read a single bit in a binary string but can't seem to get it to work properly. I read in a value, then convert it to a 32-bit string. From there I need to read a specific bit in that string, but it's not always the same one. The getBin function returns a 32-bit string with leading 0's. The code I have always returns a 1, even if the bit is a 0. Code example:
slot=195035377
getBin = lambda x, n: x >= 0 and str(bin(x))[2:].zfill(n) or "-" + str(bin(x))[3:].zfill(n)
bits = getBin(slot,32)
bit = (bits and (1 * (2 ** y)) != 0)
print("bit: %i\n"%(bit))
in this example bits = 00001011101000000000000011110011
and if I am looking for bit 3, which is a 0, bit will be equal to 1. Any ideas?
To test for specific bits in an integer value, use the & bitwise operator; no need to convert this to a binary string.
if slot & (1 << 2):
    print 'bit 3 is set'
else:
    print 'bit 3 is not set'
The above code shifts a test bit two places to the left (bit 3, counting from 1 at the least-significant end). Alternatively, shift slot to the right twice:
if (slot >> 2) & 1:
To make this generic for any bit position, subtract 1:
if slot & (1 << (bitpos - 1)):
    print 'bit {} is set'.format(bitpos)
or
if (slot >> (bitpos - 1)) & 1:
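Wrapped up as a small helper (a sketch, using the same 1-based, LSB-first convention as above):

def get_bit(value, bitpos):
    # bitpos counts from 1 at the least-significant bit
    return (value >> (bitpos - 1)) & 1

print get_bit(195035377, 3)  # 0, matching the example value above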
Your binary formatting code is overly verbose. Just use the format() function to create a binary string representation:
format(slot, '032b')
formats your binary value to a 0-padded 32-character binary string.
n = 223
bitpos = 3
bit3 = (n >> (bitpos-1))&1
is how you should be doing it ... don't use strings!
You can just use slicing to get the correct digit.
bits = getBin(slot, 32)
bit = bits[bit_location-1:bit_location]  # 1-based position, counting from the left of the string
print("bit: %s\n" % bit)
