Python turn array of booleans to binary

Python turn array of booleans to binary - python

I am preparing new driver for one of our new hardware devices.
One of the option to set it up, is where one byte, has 8 options in it. Every bite turns on or off something else.
So, basically what I need to do is, take 8 zeros or ones and create one byte of them.
What I did is, I have prepares helper function for it:
#staticmethod
def setup2byte(setup_array):
"""Turn setup array (of 8 booleans) into byte"""
data = ''
for b in setup_array:
data += str(int(b))
return int(data, 2)
Called like this:
settings = [echo, reply, presenter, presenter_brake, doors_action, header, ticket_sensor, ext_paper_sensor]
data = self.setup2byte(settings)
packet = "{0:s}{1:s}{2:d}{3:s}".format(CONF_STX, 'P04', data, ETX)
self.queue_command.put(packet)
and I wonder if there is easier way how to do it. Some built in function or something like that. Any ideas?

I believe you want this:
convert2b = lambda ls: bytes("".join([str(int(b)) for b in ls]), 'utf-8')
Where ls is a list of booleans. Works in python 2.7 and 3.x. Alternative more like your original:
convert2b = lambda ls: int("".join([str(int(b)) for b in ls]), 2)

that's basically what you are already doing, but shorter:
data = int(''.join(['1' if i else '0' for i in settings]), 2)
But here is the answer you are looking for:
Bool array to integer

I think the previous answers created 8 bytes. This solution creates one byte only
settings = [False,True,False,True,True,False,False,True]
# LSB first
integerValue = 0
# init value of your settings
for idx, setting in enumerate(settings):
integerValue += setting*2**idx
# initialize an empty byte
mybyte = bytearray(b'\x00')
mybyte[0] =integerValue
print (mybyte)
For more example visit this great site: binary python

Related

PySpark / Python Slicing and Indexing Issue

Can someone let me know how to pull out certain values from a Python output.
I would like the retrieve the value 'ocweeklyreports' from the the following output using either indexing or slicing:
'config': '{"hiveView":"ocweeklycur.ocweeklyreports"}
This should be relatively easy, however, I'm having problem defining the Slicing / Indexing configuation
The following will successfully give me 'ocweeklyreports'
myslice = config['hiveView'][12:30]
However, I need the indexing or slicing modified so that I will get any value after'ocweeklycur'

I'm not sure what output you're dealing with and how robust you're wanting it but if it's just a string you can do something similar to this (for a quick and dirty solution).
input = "Your input"
indexStart = input.index('.') + 1 # Get the index of the input at the . which is where you would like to start collecting it
finalResponse = input[indexStart:-2])
print(finalResponse) # Prints ocweeklyreports
Again, not the most elegant solution but hopefully it helps or at least offers a starting point. Another more robust solution would be to use regex but I'm not that skilled in regex at the moment.

You could almost all of it using regex.
See if this helps:
import re
def search_word(di):
st = di["config"]["hiveView"]
p = re.compile(r'^ocweeklycur.(?P<word>\w+)')
m = p.search(st)
return m.group('word')
if __name__=="__main__":
d = {'config': {"hiveView":"ocweeklycur.ocweeklyreports"}}
print(search_word(d))

The following worked best for me:
# Extract the value of the "hiveView" key
hive_view = config['hiveView']
# Split the string on the '.' character
parts = hive_view.split('.')
# The value you want is the second part of the split string
desired_value = parts[1]
print(desired_value) # Output: "ocweeklyreports"

Pandas Styler.to_latex() - how to pass commands and do simple editing

How do I pass the following commands into the latex environment?
\centering (I need landscape tables to be centered)
and
\caption* (I need to skip for a panel the table numbering)
In addition, I would need to add parentheses and asterisks to the t-statistics, meaning row-specific formatting on the dataframes.
For example:
Current
variable
value
const
2.439628
t stat
13.921319
FamFirm
0.114914
t stat
0.351283
founder
0.154914
t stat
2.351283
Adjusted R Square
0.291328
I want this
variable
value
const
2.439628
t stat
(13.921319)***
FamFirm
0.114914
t stat
(0.351283)
founder
0.154914
t stat
(1.651283)**
Adjusted R Square
0.291328
I'm doing my research papers in DataSpell. All empirical work is in Python, and then I use Latex (TexiFy) to create the pdf within DataSpell. Due to this workflow, I can't edit tables in latex code while they get overwritten every time I run the jupyter notebook.
In case it helps, here's an example of how I pass a table to the latex environment:
# drop index to column
panel_a.reset_index(inplace=True)
# write Latex index and cut names to appropriate length
ind_list = [
"ageFirm",
"meanAgeF",
"lnAssets",
"bsVol",
"roa",
"fndrCeo",
"lnQ",
"sic",
"hightech",
"nonFndrFam"
]
# assign the list of values to the column
panel_a["index"] = ind_list
# format column names
header = ["", "count","mean", "std", "min", "25%", "50%", "75%", "max"]
panel_a.columns = header
with open(
os.path.join(r"/.../tables/panel_a.tex"),"w"
) as tf:
tf.write(
panel_a
.style
.format(precision=3)
.format_index(escape="latex", axis=1)
.hide(level=0, axis=0)
.to_latex(
caption = "Panel A: Summary Statistics for the Full Sample",
label = "tab:table_label",
hrules=True,
))

You're asking three questions in one. I think I can do you two out of three (I hear that "ain't bad").
How to pass \centering to the LaTeX env using Styler.to_latex?
Use the position_float parameter. Simplified:
df.style.to_latex(position_float='centering')
How to pass \caption*?
This one I don't know. Perhaps useful: Why is caption not working.
How to apply row-specific formatting?
This one's a little tricky. Let me give an example of how I would normally do this:
df = pd.DataFrame({'a':['some_var','t stat'],'b':[1.01235,2.01235]})
df.style.format({'a': str, 'b': lambda x: "{:.3f}".format(x)
if x < 2 else '({:.3f})***'.format(x)})
Result:
You can see from this example that style.format accepts a callable (here nested inside a dict, but you could also do: .format(func, subset='value')). So, this is great if each value itself is evaluated (x < 2).
The problem in your case is that the evaluation is over some other value, namely a (not supplied) P value combined with panel_a['variable'] == 't stat'. Now, assuming you have those P values in a different column, I suggest you create a for loop to populate a list that becomes like this:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
Now, we can apply a function to df.style.format, and pop/select from the list like so:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
def func(v):
fmt = fmt_list.pop(0)
return fmt.format(v)
panel_a.style.format({'variable': str, 'value': func})
Result:
This solution is admittedly a bit "hacky", since modifying a globally declared list inside a function is far from good practice; e.g. if you modify the list again before calling func, its functionality is unlikely to result in the expected behaviour or worse, it may throw an error that is difficult to track down. I'm not sure how to remedy this other than simply turning all the floats into strings in panel_a.value inplace. In that case, of course, you don't need .format anymore, but it will alter your df and that's also not ideal. I guess you could make a copy first (df2 = df.copy()), but that will affect memory.
Anyway, hope this helps. So, in full you add this as follows to your code:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
def func(v):
fmt = fmt_list.pop(0)
return fmt.format(v)
with open(fname, "w") as tf:
tf.write(
panel_a
.style
.format({'variable': str, 'value': func})
...
.to_latex(
...
position_float='centering'
))

How to traverse dictionary keys in sorted order

I am reading a cfg file, and receive a dictionary for each section. So, for example:
Config-File:
[General]
parameter1="Param1"
parameter2="Param2"
[FileList]
file001="file1.txt"
file002="file2.txt" ......
I have the FileList section stored in a dictionary called section. In this example, I can access "file1.txt" as test = section["file001"], so test == "file1.txt". To access every file of FileList one after the other, I could try the following:
for i in range(1, (number_of_files + 1)):
access_key = str("file_00" + str(i))
print(section[access_key])
This is my current solution, but I don't like it at all. First of all, it looks kind of messy in python, but I will also face problems when more than 9 files are listed in the config.
I could also do it like:
for i in range(1, (number_of_files + 1)):
if (i <= 9):
access_key = str("file_00" + str(i))
elif (i > 9 and i < 100):
access_key = str("file_0" + str(i))
print(section[access_key])
But I don't want to start with that because it becomes even worse. So my question is: What would be a proper and relatively clean way to go through all the file names in order? I definitely need the loop because I need to perform some actions with every file.

Use zero padding to generate the file number (for e.g. see this SO question answer: https://stackoverflow.com/a/339013/3775361). That way you don’t have to write the logic of moving through digit rollover yourself—you can use built-in Python functionality to do it for you. If you’re using Python 3 I’d also recommend you try out f-strings (one of the suggested solutions at the link above). They’re awesome!

If we can assume the file number has three digits, then you can do the followings to achieve zero padding. All of the below returns "015".
i = 15
str(i).zfill(3)
# or
"%03d" % i
# or
"{:0>3}".format(i)
# or
f"{i:0>3}"

Start by looking at the keys you actually have instead of guessing what they might be. You need to filter out the ones that match your pattern, and sort according to the numerical portion.
keys = [key for key in section.keys() if key.startswith('file') and key[4:].isdigit()]
You can add additional conditions, like len(key) > 4, or drop the conditions entirely. You might also consider learning regular expressions to make the checking more elegant.
To sort the names without having to account for padding, you can do something like
keys = sorted(keys, key=lambda s: int(s[4:]))
You can also try a library like natsort, which will handle the custom sort key much more generally.
Now you can iterate over the keys and do whatever you want:
for key in sorted((k for k in section if k.startswith('file') and k[4:].isdigit()), key=lambda s: int(s[4:])):
print(section[key])
Here is what a solution equipt with re and natsort might look like:
import re
from natsort import natsorted
pattern = re.compile(r'file\d+')
for key in natsorted(k for k in section if pattern.fullmatch(k)):
print(section[key])

How can I increase the amount of array iterated during the run-time of script?

My script cleans arrays from the unwanted string like "##$!" and other stuff.
The script works as intended but the speed of it is extremely slow when the excel row size is big.
I tried to use numpy if it could speed it up but I'm not too familiar with is so I might be using it incorrectly.
xls = pd.ExcelFile(path)
df = xls.parse("Sheet2")
TeleNum = np.array(df['telephone'].values)
def replace(orignstr): # removes the unwanted string from numbers
for elem in badstr:
if elem in orignstr:
orignstr = orignstr.replace(elem, '')
return orignstr
for UncleanNum in tqdm(TeleNum):
newnum = replace(str(UncleanNum)) # calling replace function
df['telephone'] = df['telephone'].replace(UncleanNum, newnum) # store string back in data frame
I also tried removing the method to if that would help and just place it as one block of code but the speed remained the same.
for UncleanNum in tqdm(TeleNum):
orignstr = str(UncleanNum)
for elem in badstr:
if elem in orignstr:
orignstr = orignstr.replace(elem, '')
print(orignstr)
df['telephone'] = df['telephone'].replace(UncleanNum, orignstr)
TeleNum = np.array(df['telephone'].values)
The current speed of the script running an excel file of 200,000 is around 70it/s and take around an hour to finish. Which is not that good since this is just one function of many.
I'm not too advanced in python. I'm just learning as I script so if you have any pointer it would be appreciated.
Edit:
Most of the array elements Im dealing with are numbers but some have string in them. I trying to remove all string in the array element.
Ex.
FD3459002912
*345*9002912$

If you are trying to clear everything that isn't a digit from the strings you can directly use re.sub like this:
import re
string = "FD3459002912"
regex_result = re.sub("\D", "", string)
print(regex_result) # 3459002912

Validating that all components required for an object to exist are present

I need to write a script that gets a list of components from an external source and based on a pre-defined list it validates whether the service is complete. This is needed because the presence of a single component doesn't automatically imply that the service is present - some components are pre-installed even when there is no service. I've devised something really simple below, but I was wondering what is the intelligent way of doing this? There must be a cleaner, simpler way.
# Components that make up a complete service
serviceComponents = ['A','B']
# Input from JSON
data = ['B','A','C']
serviceComplete = True
for i in serviceComponents:
if i in data:
print 'yay ' + i + ' found from ' + ', '.join(service2)
else:
serviceComplete = False
break
# If serviceComplete = True do blabla...

You could do it a few different ways:
set(serviceComponents) <= set(data)
set(serviceComponents).issubset(data)
all(c in data for c in serviceComponents)
You can make it shorter, but you lose readability. What you have now is probably fine. I'd go with the first approach personally, since it expresses your intent clearly with set operations.

# Components that make up a complete service
serviceComponents = ['A','B']
# Input from JSON
data = ['B','A','C']
if all(item in data for item in serviceComponents):
print("All required components are present")

Built-in Set would serve for you, use set.issubset to identify that your required service components is subset of input data:
serviceComponents = set(['A','B'])
input_data = set(['B','A','C'])
if serviceComponents.issubset(input_data):
# perform actions ...

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python turn array of booleans to binary - python

I believe you want this: convert2b = lambda ls: bytes("".join([str(int(b)) for b in ls]), 'utf-8') Where ls is a list of booleans. Works in python 2.7 and 3.x. Alternative more like your original: convert2b = lambda ls: int("".join([str(int(b)) for b in ls]), 2)

that's basically what you are already doing, but shorter: data = int(''.join(['1' if i else '0' for i in settings]), 2) But here is the answer you are looking for: Bool array to integer

Related

PySpark / Python Slicing and Indexing Issue

Pandas Styler.to_latex() - how to pass commands and do simple editing

How to traverse dictionary keys in sorted order

How can I increase the amount of array iterated during the run-time of script?

Validating that all components required for an object to exist are present

Categories

Resources