Why are wrong bits extracted in python? - python

I need to extract k bits starting at position p so that I can convert them to a decimal value. I use a standard function, but when I test it on a 6-, 7-, or 8-byte binary code the result is not correct. With a 1-byte code the result is little-endian, but with an 8-byte code it is shifted. For one signal (a 7-byte code) it was shifted by +7 bits, while another signal (a different ID, 8-byte code) was shifted by -21 bits. I cannot explain this, so I considered manually adding or subtracting bits to select the correct ones for the calculation. Do you have any idea why this is happening?
Example 8bytes:
extract_k_bits('100001111000000001110111011001000111011111111000001110100111101',16,0)
Output: 001110100111101 instead of 1000011110000000
extract_k_bits('100001111000000001110111011001000111011111111000001110100111101',16,24)
Output: 0110010001110111 instead of 1011001000111011
This is the code I am working with:
import openpyxl
from openpyxl import Workbook
theFile = openpyxl.load_workbook('Adapted_T013.xlsx')
allSheetNames = theFile.sheetnames
print("All sheet names {} " .format(theFile.sheetnames))
sheet = theFile.active
def extract_k_bits(inputBIN,k,p):
    end = len(inputBIN) - p
    start = end - k + 1
    kBitSub = inputBIN[start : end+1]
    print(kBitSub)
    Dec_Values=int(kBitSub,2)

Here's a working solution:
def extract_k_bits(inputBIN,k):
    # Since you always extract 16 bits
    # just use the point where you want to extract from
    kBitSub = inputBIN[k : k+16]
    print(kBitSub)
extract_k_bits('100001111000000001110111011001000111011111111000001110100111101',0)
extract_k_bits('100001111000000001110111011001000111011111111000001110100111101',23)
Output:
1000011110000000
1011001000111011
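One way to read the mismatch: the original function measures p from the right-hand end of the string, so p=0 selects the trailing bits, whereas the expected values in the question are read from the left. The sketch below makes the direction explicit. The name `extract_bits`, the `from_left` flag, and the optional `width` padding are hypothetical and not part of the original code; padding is offered only because the sample string has 63 characters rather than 64, which may mean a leading zero was dropped upstream.

```python
def extract_bits(bit_string, k, p, from_left=True, width=None):
    """Return (bits, value) for k bits at offset p.

    from_left=True counts p from the first character (most significant bit);
    from_left=False counts p from the last character (least significant bit).
    width, if given, left-pads the string with zeros to that many bits first.
    """
    if width is not None:
        bit_string = bit_string.zfill(width)
    if from_left:
        start = p
    else:
        start = len(bit_string) - p - k
    if start < 0 or start + k > len(bit_string):
        raise ValueError("requested bits fall outside the string")
    bits = bit_string[start:start + k]
    return bits, int(bits, 2)


code = '100001111000000001110111011001000111011111111000001110100111101'
print(extract_bits(code, 16, 0))                   # first 16 characters
print(extract_bits(code, 16, 0, from_left=False))  # last 16 characters
```

Returning the value instead of only printing it also lets the caller store the decimal result, which the original function never hands back.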

Related

How can I increase the amount of array iterated during the run-time of script?

My script removes unwanted strings such as "##$!" from array elements.
It works as intended, but it is extremely slow when the Excel sheet has many rows.
I tried NumPy to see if it could speed things up, but I'm not very familiar with it, so I may be using it incorrectly.
import numpy as np
import pandas as pd
from tqdm import tqdm

xls = pd.ExcelFile(path)
df = xls.parse("Sheet2")
TeleNum = np.array(df['telephone'].values)

def replace(orignstr): # removes the unwanted string from numbers
    for elem in badstr:
        if elem in orignstr:
            orignstr = orignstr.replace(elem, '')
    return orignstr

for UncleanNum in tqdm(TeleNum):
    newnum = replace(str(UncleanNum)) # calling replace function
    df['telephone'] = df['telephone'].replace(UncleanNum, newnum) # store string back in data frame
I also tried removing the function to see if that would help and just placed everything in one block of code, but the speed remained the same.
for UncleanNum in tqdm(TeleNum):
    orignstr = str(UncleanNum)
    for elem in badstr:
        if elem in orignstr:
            orignstr = orignstr.replace(elem, '')
    print(orignstr)
    df['telephone'] = df['telephone'].replace(UncleanNum, orignstr)
TeleNum = np.array(df['telephone'].values)
On an Excel file with 200,000 rows the script currently runs at about 70 it/s and takes around an hour to finish, which is not good enough since this is just one function of many.
I'm not too advanced in Python. I'm learning as I script, so any pointers would be appreciated.
Edit:
Most of the array elements I'm dealing with are numbers, but some have other characters in them. I'm trying to remove all non-digit characters from each array element.
Ex.
FD3459002912
*345*9002912$
If you are trying to clear everything that isn't a digit from the strings you can directly use re.sub like this:
import re
string = "FD3459002912"
regex_result = re.sub(r"\D", "", string)
print(regex_result) # 3459002912
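The loops in the question are likely slow because each call to `df['telephone'].replace(...)` rescans the whole column, once per row. A minimal sketch of a single-pass alternative follows, using only the standard library; the helper name `clean_numbers` is hypothetical. In pandas the same pattern can be applied to a whole column at once with `df['telephone'].astype(str).str.replace(r"\D", "", regex=True)`.

```python
import re

NON_DIGITS = re.compile(r"\D")  # matches any character that is not a decimal digit


def clean_numbers(values):
    """Strip every non-digit character from each value, visiting each value once."""
    return [NON_DIGITS.sub("", str(value)) for value in values]


print(clean_numbers(["FD3459002912", "*345*9002912$", 3459002912]))
```

One caveat: if Excel delivers a number as a float, `str()` produces a trailing `.0`, and stripping the dot would leave an extra zero, so such values may need converting to `int` first.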

Python turn array of booleans to binary

I am preparing a new driver for one of our new hardware devices.
One of the setup options is a single byte that holds 8 options. Every bit turns something on or off.
So, basically, what I need to do is take 8 zeros or ones and create one byte from them.
I have prepared a helper function for it:
@staticmethod
def setup2byte(setup_array):
    """Turn setup array (of 8 booleans) into byte"""
    data = ''
    for b in setup_array:
        data += str(int(b))
    return int(data, 2)
Called like this:
settings = [echo, reply, presenter, presenter_brake, doors_action, header, ticket_sensor, ext_paper_sensor]
data = self.setup2byte(settings)
packet = "{0:s}{1:s}{2:d}{3:s}".format(CONF_STX, 'P04', data, ETX)
self.queue_command.put(packet)
I wonder if there is an easier way to do it, such as a built-in function. Any ideas?
I believe you want this:
convert2b = lambda ls: bytes("".join([str(int(b)) for b in ls]), 'utf-8')
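Where ls is a list of booleans. Note that the two-argument bytes(...) call requires Python 3, and the result is an 8-character byte string of '0' and '1' digits rather than a single byte. Alternative more like your original: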
convert2b = lambda ls: int("".join([str(int(b)) for b in ls]), 2)
that's basically what you are already doing, but shorter:
data = int(''.join(['1' if i else '0' for i in settings]), 2)
But here is the answer you are looking for:
Bool array to integer
I think the previous answers created 8 bytes. This solution creates one byte only
settings = [False,True,False,True,True,False,False,True]
# LSB first
integerValue = 0
# init value of your settings
for idx, setting in enumerate(settings):
    integerValue += setting*2**idx
# initialize an empty byte
mybyte = bytearray(b'\x00')
mybyte[0] =integerValue
print (mybyte)
For more examples, visit this great site: binary python
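The answers above differ in bit order: the string-based versions treat the first list element as the most significant bit, while the loop with `2**idx` treats it as the least significant. The sketch below makes that choice explicit with shifts instead of string building. The name `pack_bits` and its `msb_first` flag are hypothetical; which order the device expects is an assumption to check against its documentation.

```python
def pack_bits(flags, msb_first=True):
    """Pack up to 8 truthy/falsy flags into one integer in the range 0..255."""
    flags = list(flags)
    if len(flags) > 8:
        raise ValueError("at most 8 flags fit in one byte")
    if not msb_first:
        flags.reverse()
    value = 0
    for flag in flags:
        # Shift previous bits left and append the new flag as the lowest bit.
        value = (value << 1) | bool(flag)
    return value


settings = [False, True, False, True, True, False, False, True]
as_int = pack_bits(settings)           # same value as int('01011001', 2)
as_byte = as_int.to_bytes(1, 'big')    # a bytes object of length 1
print(as_int, as_byte)
```

`int.to_bytes` is available in Python 3 only; the integer itself is enough if the packet is built with a `{:d}` format field as in the question.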

Python - Reading a CSV, won't print the contents of the last column

I'm pretty new to Python, and put together a script to parse a csv and ultimately output its data into a repeated html table.
I got most of it working, but there's one weird problem I haven't been able to fix. My script will find the index of the last column, but won't print out the data in that column. If I add another column to the end, even an empty one, it'll print out the data in the formerly-last column - so it's not a problem with the contents of that column.
Abridged (but still grumpy) version of the code:
import os
os.chdir('C:\\Python34\\andrea')
import csv
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
if 'phone' in tableHeader:
    phoneIndex = tableHeader.index('phone')
else:
    phoneIndex = -1
for row in exampleReader:
    row[-1] =''
    print(phoneIndex)
    print(row[phoneIndex])
csvOpen.close()
my.csv
stuff,phone
1,3235556177
1,3235556170
Output
1
1
Same script, small change to the CSV file:
my.csv
stuff,phone,more
1,3235556177,
1,3235556170,
Output
1
3235556177
1
3235556170
I'm using Python 3.4.3 via Idle 3.4.3
I've had the same problem with CSVs generated directly by mysql, ones that I've opened in Excel first then re-saved as CSVs, and ones I've edited in Notepad++ and re-saved as CSVs.
I tried adding several different modes to the open function (r, rU, b, etc.) and either it made no difference or gave me an error (for example, it didn't like 'b').
My workaround is just to add an extra column to the end, but since this is a frequently used script, it'd be much better if it just worked right.
Thank you in advance for your help.
row[-1] =''
The CSV reader returns to you a list representing the row from the file. On this line you set the last value in the list to an empty string. Then you print it afterwards. Delete this line if you don't want the last column to be set to an empty string.
If you know it is the last column, you can count them and then use that value minus 1. Likewise you can use your string comparison method if you know it will always be "phone". I recommend if you are using the string compare, convert the value from the csv to lower case so that you don't have to worry about capitalization.
In my code below I created functions that show how to use either method.
import os
import csv
os.chdir('C:\\temp')
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
phoneColIndex = None  # init to a value that can imply state
lastColIndex = None  # init to a value that can imply state

def getPhoneIndex(header):
    for i, col in enumerate(header):  # use this syntax to get index of item
        if col.lower() == 'phone':
            return i
    return -1  # send back invalid index

def findLastColIndex(header):
    return len(header) - 1

## methods to check for phone col. 1. by string comparison
# and 2. by assuming it's the last col.
if len(tableHeader) > 1:  # if only one column or less, why go any further?
    phoneColIndex = getPhoneIndex(tableHeader)
    lastColIndex = findLastColIndex(tableHeader)
    for row in exampleReader:
        print(row[phoneColIndex])
        print('----------')
        print(row[lastColIndex])
        print('----------')
csvOpen.close()
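As an alternative to tracking column indexes by hand, `csv.DictReader` looks values up by header name, so the position of the phone column stops mattering. The following is a minimal sketch; the helper name `read_column` is hypothetical, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import csv
import io


def read_column(csv_file, column="phone"):
    """Return the values of one column, matched by header name (case-insensitive)."""
    reader = csv.DictReader(csv_file)
    # Map lower-cased header names to the names as they appear in the file.
    names = {name.lower(): name for name in (reader.fieldnames or [])}
    if column.lower() not in names:
        raise KeyError("no column named %r" % column)
    key = names[column.lower()]
    return [row[key] for row in reader]


sample = io.StringIO("stuff,phone\n1,3235556177\n1,3235556170\n")
print(read_column(sample))
```

With a file on disk, the call would be wrapped as `with open('my.csv', newline='') as f: read_column(f)`, which also closes the file automatically.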

Fastest way to parse (split) binary bits in python

We are counting photons and time-tagging them with an FPGA counter, which produces about 500 MB of data per minute. The data arrives as 32-bit signed integers stored in little-endian byte order (not as a hex string, as I first wrote; see Update 1). Currently I do this:
import collections
import numpy as np

def getall(file):
    data1 = np.memmap(file, dtype='<i4', mode='r')
    d0=0
    raw_counts=[]
    for i in data1:
        binary = bin(i)[2:].zfill(8)
        decimal = int(binary[5:],2)
        if binary[:1] == '1':
            raw_counts.append(decimal)
    counter=collections.Counter(raw_counts)
    sorted_counts=sorted(counter.items(), key=lambda pair: pair[0], reverse=False)
    return counter,counter.keys(),counter.values()
I think this part (binary = bin(i)[2:].zfill(8); decimal = int(binary[5:],2)) is slowing down the process. (No, it is not. I found out by profiling my program.) Is there any way to speed it up? So far I only need the binary bits from [5:]; I don't need all 32 bits. So I think reducing the 32 bits to the last 27 bits is taking much of the time. Thanks,
*Update 1
J.F.Sebastian pointed me it is not in hex string.
*Update 2
Here is the final code if anyone needs it. I ended up using np.unique instead of collections.Counter. At the end, I convert back to a collections.Counter because I want cumulative counts.
#http://stackoverflow.com/questions/10741346/numpy-most-efficient-frequency-counts-for-unique-values-in-an-array
def myc(x):
    unique, counts = np.unique(x, return_counts=True)
    return np.asarray((unique, counts)).T

def getallfast(file):
    data1 = np.memmap(file, dtype='<i4', mode='r')
    data2=data1[np.nonzero((~data1 & (31 <<1)))] & 0x7ffffff #See J.F.Sebastian's comment.
    counter=myc(data2)
    raw_counts=dict(zip(counter[:,0],counter[:,1]))
    counter=collections.Counter(raw_counts)
    return counter,counter.keys(),counter.values()
However, this one looks like the fastest version for me. data1[np.nonzero((~data1 & (31 <<1)))] & 0x7ffffff is slower than counting first and converting the data afterwards with binary = bin(counter[i,0])[2:].zfill(8):
def myc(x):
    unique, counts = np.unique(x, return_counts=True)
    return np.asarray((unique, counts)).T

def getallfast(file):
    data1 = np.memmap(file, dtype='<i4', mode='r')
    counter=myc(data1)
    xnew=[]
    ynew=[]
    raw_counts=dict()
    for i in range(len(counter)):
        binary = bin(counter[i,0])[2:].zfill(8)
        decimal = int(binary[5:],2)
        xnew.append(decimal)
        ynew.append(counter[i,1])
        raw_counts[decimal]=counter[i,1]
    counter=collections.Counter(raw_counts)
    return counter,xnew,ynew
I guess you could try one of these two:
you could take the low five bits with a binary AND: fivebits = my_int & 0x1f
if you want the five bits at the other end, use fivebits = my_int >> (32-5)
But really, in my experience converting to a string is quite fast. I thought that was a bottleneck many years ago; after profiling it I found it wasn't.
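The mask-and-shift idea can be sketched for the layout the question describes, assuming the top 5 bits of each 32-bit word are flags and the low 27 bits are the value. The names `split_word`, `PAYLOAD_BITS`, and `PAYLOAD_MASK` are hypothetical. Because the words are signed, the shifted result is masked as well; the same `&` and `>>` operators also work elementwise on NumPy integer arrays.

```python
PAYLOAD_BITS = 27
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1   # 0x7ffffff


def split_word(word):
    """Split a 32-bit word into (top 5 bits, low 27 bits).

    Masking after the shift keeps the result correct for negative inputs,
    because Python's >> on a negative int is an arithmetic shift.
    """
    flags = (word >> PAYLOAD_BITS) & 0x1f
    payload = word & PAYLOAD_MASK
    return flags, payload


word = (0b10110 << 27) | 12345
print(split_word(word))              # unsigned view of the bit pattern
print(split_word(word - (1 << 32)))  # same bit pattern read as a signed value
```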

calling a function from another file

I'm writing Python code in which I must import a function from another file. I write import filename and then filename.functionname, and as I type the first letter of the function name PyCharm pops up a window showing the full name of the function, so I assume Python knows the file contains the function I need. It works when I try it in the console. But when I run the same thing in my code, it raises an error: 'module' object has no attribute 'get_ecc'. What could be the problem? The only important part is the last function, make_qr_code.
""" Create QR error correction codes from binary data, according to the
standards laid out at http://www.swetake.com/qr/qr1_en.html. Assumes the
following when making the codes:
- alphanumeric text
- level Q error-checking
Size is determined by version, where Version 1 is 21x21, and each version up
to 40 is 4 more on each dimension than the previous version.
"""
import qrcode

class Polynomial(object):
    """ Generator polynomials for error correction.
    """
    # The following tables are constants, associated with the *class* itself
    # instead of with any particular object-- so they are shared across all
    # objects from this class.
    # We break style guides slightly (no space following ':') to make the
    # tables easier to read by organizing the items in lines of 8.

def get_ecc(binary_string, version, ec_mode):
    """ Create the error-correction code for the binary string provided, for
    the QR version specified (in the range 1-9). Assumes that the length of
    binary_string is a multiple of 8, and that the ec_mode is one of 'L', 'M',
    'Q' or 'H'.
    """
    # Create the generator polynomial.
    generator_coeffs = get_coefficients(SIZE_TABLE[version, ec_mode][1])
    generator_exps = range(len(generator_coeffs) - 1, -1, -1)
    generator_poly = Polynomial(generator_coeffs, generator_exps)
    # Create the message polynomial.
    message_coeffs = []
    while binary_string:
        message_coeffs.append(qrcode.convert_to_decimal(binary_string[:8]))
        binary_string = binary_string[8:]
    message_max = len(message_coeffs) - 1 + len(generator_coeffs) - 1
    message_exps = range(message_max, message_max - len(message_coeffs), -1)
    message_poly = Polynomial(message_coeffs, message_exps)
    # Keep dividing the message polynomial as much as possible, leaving the
    # remainder in the resulting polynomial.
    while message_poly.exps[-1] > 0:
        message_poly.divide_by(generator_poly)
    # Turn the error-correcting code back into binary.
    ecc_string = ""
    for item in message_poly.coeffs:
        ecc_string += qrcode.convert_to_binary(item, 8)
    return ecc_string
