How to dynamically store and execute equations and functions - python

During my current project, I have been receiving data from a set of long-range sensors, which send their data as a series of bytes. Because there are multiple types of sensors, the byte structures and the data they contain differ between them, hence the need to make the functionality more dynamic and avoid having to hard-code every single setup in the future (which is not practical).
The server will be using Django, which I believe is irrelevant to the issue at hand but I have mentioned just in case it might have something that can be used.
The bytes data I am receiving looks like this:
b'B\x10Vu\x87%\x00x\r\x0f\x04\x01\x00\x00\x00\x00\x1e\x00\x00\x00\x00ad;l'
And my current process looks like this:
Take the first six bytes to get the deviceID (deviceID = val[0:6].hex()).
Look up the format string to be used in struct.unpack() (here: >BBHBBhBHhHL), after removing the first six bytes used for the ID.
Now, the issue is the next step. Many of the values need different forms of pre-processing. For example, some need to be run through a join statement (e.g. ".".join(str(values[2]))), others need simple mathematical changes (-113 + 2 * values[4]), and others need a simple logic check (values[7] == 0x80) to return a boolean value.
My question is: what's the best way to code those methods? I would really like to avoid hardcoding them, but it almost seems like the best idea. Another idea I saw was to store the functionality as a string and execute it, as seen here, but I've been reading that it's a very bad idea and that it also slows down execution. The last idea I had was to hard-code only some general functions and use something similar to here, but that doesn't solve the issue of having to hard-code every new sensor type, which is not realistic in a live installation. Are there any better methods to achieve the same thing?
I have also looked here, with the idea that some of the functionality could be expressed as an equation, but I didn't see that as a possibility for every occurrence, especially when any string manipulation is needed.
Additionally, is there a possibility of using some maths to apply basic string manipulation? I could maybe hard-code one string manipulation, but to be honest this whole thing has been bugging me...
Finally, if I do go with storing the functions as strings and executing them, is there a way to add some "security" to avoid malicious exploitation? Such a method is awfully insecure, to say the least.
However, after almost a week of searching, I have so far been unable to find a better solution than storing functions as strings and running eval on them, despite not liking that option. If anyone finds a better one, I would be extremely grateful for any tips or ideas.
Addendum: minimal code that can be used to showcase and test different methods:
import struct

def decode(data):
    val = bytearray(data)
    deviceID = val[0:6].hex()
    del val[0:6]
    print(deviceID)
    values = list(struct.unpack('>BBHBBhBHhHL', val))
    print(values)
    # Now what?

decode(b'B\x10Vu\x87%\x00x\r\x0f\x04\x01\x00\x00\x00\x00\x1e\x00\x00\x00\x00ad;l')
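One common alternative to eval is to treat the per-sensor processing as data rather than code: keep a small whitelist of safe, hard-coded operations, and describe each sensor type's pipeline as a list of operation names plus parameters (which could live in a database or Django model). A minimal sketch of that idea, where the operation names, the config layout, and the device ID key (taken from the sample bytes above) are all assumptions:

```python
import struct

# Whitelisted operations; only these can ever be applied to incoming data.
OPERATIONS = {
    # linear scaling: offset + scale * x
    "linear": lambda x, offset=0, scale=1: offset + scale * x,
    # equality check against a constant
    "equals": lambda x, const=0: x == const,
    # join the characters of the value's string form with a separator
    "join": lambda x, sep=".": sep.join(str(x)),
}

# Hypothetical per-device configuration; "421056758725" is the hex of the
# first six bytes of the sample payload.
SENSOR_CONFIG = {
    "421056758725": {
        "format": ">BBHBBhBHhHL",
        "steps": [
            {"index": 4, "op": "linear", "offset": -113, "scale": 2},
            {"index": 7, "op": "equals", "const": 0x80},
        ],
    },
}

def decode(payload):
    val = bytearray(payload)
    device_id = val[0:6].hex()
    del val[0:6]
    cfg = SENSOR_CONFIG[device_id]
    values = list(struct.unpack(cfg["format"], val))
    # Apply each configured step to the value at its index.
    for step in cfg["steps"]:
        params = {k: v for k, v in step.items() if k not in ("index", "op")}
        values[step["index"]] = OPERATIONS[step["op"]](values[step["index"]], **params)
    return device_id, values
```

Adding a new sensor type then means adding a config entry, not new code, and nothing user-supplied is ever executed.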

Find all possible stacktraces for a function

First of all, I'm not sure I'm asking for something that is doable.
Sometimes, when I have to do some refactoring in a large codebase, I need to change the output or even the signature of a certain function. Before I do that, I have to make sure that the new output is handled correctly by the other functions that call it.
The way I do it is by searching for the function name, say get_timezone_from_user, and then I see that the function is used by two other functions called format and change_timezone_for_user. Then I start looking for where those two functions are used, and so on, until I end up with a graph where each path looks like a stacktrace.
There are two problems.
The first one is that doing this manually is pretty time consuming.
The second is that when I look at all occurrences of a function name like format, I will often find occurrences of the word format in different contexts. It can be a comment or even another function defined somewhere else.
My question is simple:
Is there an IDE or a tool that lets you find where in your code a certain function is called? Even better would be a tool that does this recursively, so that it draws the whole graph.
I'm trying to achieve this in Python if that helps.

How do I compare two nested data structures for unittesting?

For those who know perl, I'm looking for something similar to Test::Deep::is_deeply() in Python.
In Python's unittest I can conveniently compare nested data structures already, if I expect them to be equal:
self.assertEqual(os.walk('some_path'),
                 my.walk('some_path'),
                 "compare os.walk with my own implementation")
However, in the wanted test, the order of files in the respective sublist of the os.walk tuple shall be of no concern.
If it was just this one test it would be ok to code an easy solution. But I envision several tests on differently structured nested data. And I am hoping for a general solution.
I checked Python's own unittest documentation, looked at pyUnit, and at nose and its plugins. Active maintenance would also be an important aspect for usage.
The ultimate goal for me would be to have a set of descriptive types like UnorderedIterable, SubsetOf, SupersetOf, etc which can be called to describe a nested data structure, and then use that description to compare two actual sets of data.
In the os.walk example I'd like something like:
comparison = OrderedIterable(
    OrderedIterable(
        str,
        UnorderedIterable(),
        UnorderedIterable()
    )
)
The above describes the kind of data structure that list(os.walk()) would return. For comparison of data A and data B in a unit test, the current path names would be cast into a str(), and the dir and file lists would be compared ignoring the order with:
self.assertDeep(A, B, comparison, msg)
Is there anything out there? Or is it such a trivial task that people write their own? I feel comfortable doing it, but I don't want to reinvent, and especially would not want to code the full orthogonal set of types, tests for those, etc. In short, I wouldn't publish it and thus the next one has to rewrite again...
Python Deep seems to be a project to reimplement perl's Test::Deep. It is written by the author of Test::Deep himself. Last development happened in early 2016.
Update (2018/Aug): Latest release (2016/Feb) is located on PyPi/Deep
I have done some Python 3 porting work on GitHub.
Not a solution, but the currently implemented workaround to solve the particular example listed in the question:
os_walk = list(os.walk('some_path'))
dt_walk = list(my.walk('some_path'))
self.assertEqual(len(dt_walk), len(os_walk), "walk() same length")
for ((osw, osw_dirs, osw_files), (dt, dt_dirs, dt_files)) in zip(os_walk, dt_walk):
    self.assertEqual(dt, osw, "walk() currentdir")
    self.assertSameElements(dt_dirs, osw_dirs, "walk() dirlist")
    self.assertSameElements(dt_files, osw_files, "walk() fileList")
As this example implementation shows, that's quite a bit of code. It also shows that Python's unittest already has most of the required ingredients.
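A shorter variant of the same workaround is to normalize the data first: sort the unordered parts of each (dirpath, dirnames, filenames) triple so a single plain assertEqual can compare the whole nested structure. A sketch of that idea (note that in current unittest, assertSameElements has been superseded by assertCountEqual, which compares iterables ignoring order):

```python
import unittest

def normalized_walk(walk_result):
    """Sort the unordered dirnames/filenames lists so order no longer matters."""
    return [(path, sorted(dirs), sorted(files)) for path, dirs, files in walk_result]

class WalkTest(unittest.TestCase):
    def test_same_tree_different_order(self):
        a = [("root", ["b", "a"], ["y.txt", "x.txt"])]
        b = [("root", ["a", "b"], ["x.txt", "y.txt"])]
        # One plain assertEqual now handles the whole nested structure.
        self.assertEqual(normalized_walk(a), normalized_walk(b))
```

This does not give the general descriptive-type vocabulary asked for, but it scales to any structure where you can write a small normalizing function.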

Trying to understand which is better in python creating variables or using expressions

One of the practices I have gotten into in Python from the beginning is to reduce the number of variables I create, compared to the number I would create when doing the same thing in SAS or Fortran.
For example, here is some code I wrote tonight:
def idMissingFilings(dEFilings, indexFilings):
    inBoth = set(indexFilings.keys()).intersection(dEFilings.keys())
    missingFromDE = []
    for each in inBoth:
        if len(dEFilings[each]) < len(indexFilings[each]):
            dEtemp = []
            for filing in dEFilings[each]:
                #dateText=filing.split("\\")[-1].split('-')[0]
                #year=dateText[0:5]
                #month=dateText[5:7]
                #day=dateText[7:]
                #dETemp.append(year+"-"+month+"-"+day+"-"+filing[-2:])
                dEtemp.append(filing.split('\\')[-1].split('-')[0][1:5]+"-"+filing.split('\\')[-1].split('-')[0][5:7]+"-"+filing.split('\\')[-1].split('-')[0][7:]+"-"+filing[-2:])
            indexTemp = []
            for infiling in indexFilings[each]:
                indexTemp.append(infiling.split('|')[3]+"-"+infiling[-6:-4])
            tempMissing = set(indexTemp).difference(dEtemp)
            for infiling in indexFilings[each]:
                if infiling.split('|')[3]+"-"+infiling[-6:-4] in tempMissing:
                    missingFromDE.append(infiling)
    return missingFromDE
Now, in the dEtemp.append(...) line, I split the same string four times:
filing.split('\\')
Historically, in Fortran or SAS, if I were to attempt the same thing, I would have sliced my string once and assigned a variable to each part of the string I was going to use in the expression.
I am constantly forcing myself to use expressions instead of first resolving to a value and then using the value. The only reason I do this is that I am learning by mimicking other people's code, but it has been in the back of my mind to ask this question: where can I find a cogent discussion of why one approach is better than the other?
The code compares a set of documents on a drive and a source list of those documents and checks to see whether all of those from the source are represented on the drive
Okay, the commented-out section is much easier to read, which is how I decided to respond to nosklo's answer.
Yeah, it is not better to put everything in the expression. Please use variables.
Using variables is not only better because you will do the operation only once and save the value for multiple uses. The main reason is that code becomes more readable that way. If you name the variable right, it doubles as free implicit documentation!
Use more variables. Python is known for its readability; taking that feature away is considered not "Pythonic" (see https://docs.python-guide.org/writing/style/). Code that is more readable is easier for others to understand, and easier for you to understand later.
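Concretely, the one-line append in the question can be pulled into a helper with named intermediates, following the commented-out section. A sketch, where the sample filename is made up to match the slice indices used in the question:

```python
def format_filing_date(filing):
    # Split once and name each piece, instead of repeating the split
    # four times inside one expression.
    date_text = filing.split("\\")[-1].split("-")[0]
    year = date_text[1:5]
    month = date_text[5:7]
    day = date_text[7:]
    suffix = filing[-2:]
    return year + "-" + month + "-" + day + "-" + suffix

# Hypothetical filename shaped like the ones the question slices up.
print(format_filing_date("path\\X20090105-filing-01"))  # → 2009-01-05-01
```

Each variable name now documents what the slice means, and the split runs once instead of four times.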

(Python) Is generating and executing code at runtime considered bad practice?

I'm learning Python and I want to learn good practices from the start. I had a problem and came up with a solution that involved generating variables at runtime. This is a sample of what I was doing:
for i in range(10):
    current = 'variable' + str(i) + ' = ' + str(i)
    exec(current)
So, is doing things like this considered bad practice? I know this is a simple example, but I can see it getting complicated once objects enter the mix. I'm bad at telling good, readable code from bad (I'm a newbie, after all), so I'm asking you vets: if this is frowned upon, what are the preferred ways to handle situations like this?
No, this isn't a good practice. Whatever you're doing, the solution is probably to put your data in a dict, then you can just access mydict['variable1'] or whatever.
There ARE times when exec is reasonable, but they are typically cases of advanced metaprogramming.
A guideline here: if you don't know exactly why you're doing it, and why there is no other way, don't.
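The dict version of the loop in the question looks like this; the keys play the role of the generated variable names:

```python
# Store the values under keys instead of generating variable names with exec.
values = {}
for i in range(10):
    values['variable' + str(i)] = i

print(values['variable3'])  # → 3
```

The same thing fits in a single dict comprehension: values = {'variable' + str(i): i for i in range(10)}.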
I'm hardly a vet, but the problems I see with this are:
Readability, as you mention. It's also difficult to debug because the buggy code might be assembled from all over the place.
Syntax errors at runtime. This is even more annoying, in my experience, than logical errors.
Code injection, and this one's the killer. How do you make sure that current does not contain code you do not want executed? (Think web applications: a remote user might try to erase your data.) In practice, you have to be so very careful that current does not include anything that comes from user input that it's usually faster and safer to find a different way of doing it.
The proper way to handle several related quantities is to put them in an appropriate sort of container. That applies no matter what language you're using.
In Python, the normal containers are lists, tuples and dicts. You choose depending on what you're going to be doing with the contents.
Further, you can create a new list of any sort of "patterned" or "transformed" data easily with a list comprehension:
number_strings = [str(i) for i in range(10)]
# number_strings is a list of 10 values, each of which is a string
# representation of the numbers 0 through 9 inclusive. range(10) yields
# the integers 0 through 9; the list comprehension then transforms each
# integer with 'str'.

Is there a good language/syntax for field validation we can re-use?

I'm working on a web app (using Python & Bottle) and building a decorator for validating HTTP parameters sent in GET or POST. The early version takes callables, so this:
@params(user_id=int, user_name=unicode)
... ensures that user_id is an int, user_name is a string, and both fields exist.
But that's not enough. I want to be able to specify that user_name is optional, or that it must be non-empty and within 40 characters. Implementing that is easy enough, but I'm struggling with what syntax would be most elegant. It seems like that might be a problem someone's solved before, but I'm not finding an answer. Specifically I'm wondering if there's an elegant way to take parseable strings that provide the syntax. Something like:
@params(user_id='int:min1', user_name='unicode:required:max40')
I just don't want to invent a syntax if there's a good one floating around somewhere.
Has anyone seen something like this? In any language... but I specifically value terseness and readability.
You could use lists.
@validate(user_id=[int, min(1)], user_name=[unicode, required, max(40)])
And each item could be a function (or class/object) that gets executed with the corresponding field as an argument. If it raises an error, it fails validation.
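A minimal sketch of that list-based idea. All the helper names here (required, min_val, max_len, validate_field) are made up for illustration, and min/max from the answer are renamed to avoid shadowing the builtins:

```python
def required(value):
    # Reject missing or empty values.
    if value is None or value == "":
        raise ValueError("value is required")
    return value

def min_val(limit):
    # Factory: returns a check that enforces a lower bound.
    def check(value):
        if value < limit:
            raise ValueError("value below minimum %r" % limit)
        return value
    return check

def max_len(limit):
    # Factory: returns a check that enforces a maximum length.
    def check(value):
        if len(value) > limit:
            raise ValueError("value longer than %r" % limit)
        return value
    return check

def validate_field(value, checks):
    # Run the value through each check in order; checks may also
    # convert the value (e.g. int turns "1" into 1).
    for check in checks:
        value = check(value)
    return value

user_id = validate_field("1", [int, min_val(1)])            # → 1
user_name = validate_field("bob", [str, required, max_len(40)])
```

A decorator would then apply validate_field to each named HTTP parameter and turn ValueError into a 400 response.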
