GET Request Flask - python

I have written something that works, but I am sure there is a more efficient and faster way of doing it.
The code uses the OpenBayes library to create a network with its nodes, the relationships between nodes, and the probabilities and distributions associated with each node. I am now creating a GET request using Flask so that the conditional probabilities can be computed simply by sending a request.
I send some evidence (the given values) and name the node whose probability I want (the observed value). Mathematically it looks like this:
Observed Value = O and Evidence = E1, ..., En, where n >= 1
P( O | E1, E2, ..., En)
My final goal is to have a client ping the server hosting this code (with the right parameters) and continually get back the probability of the observed value given the evidence (which could be one or more values). The code I have written so far for the GET request portion is:
@app.route('/evidence/evidence=<evidence>&observed=<obv>', methods=['GET'])
def get_evidence(evidence, obv):
    # Take <evidence> and <obv> split them up. For example:
    # 'cloudy1rain0sprinkler1' to 'cloudy1', 'rain0' and 'sprinkler1', all in a nice list.
    analyzeEvidence, observedNode = evidence.upper().strip(), obv.upper().strip()
    string, count, newCount, listOfEvidence = "", 0, 0, {}
    counter = sum(character.isdigit() for character in analyzeEvidence)
    # This portion is to set up all the evidences.
    for y in xrange(0, counter):
        string, newCount = "", count
        for x in xrange(newCount, len(analyzeEvidence)):
            count += 1
            if analyzeEvidence[x].isalpha() == True:
                string += str(analyzeEvidence[x])
            elif analyzeEvidence[x].isdigit() == True and string in allNodes:
                if int(analyzeEvidence[x]) == 1 or int(analyzeEvidence[x]) == 0:
                    listOfEvidence[string] = int(analyzeEvidence[x])
                    break
                else: abort(400)
                break
            else: abort(400)
    net.SetObs(listOfEvidence) # This would set the evidence like this: {"CLOUDY": 1, "RAIN":0}
    # This portion is to set up one single observed value
    string = ""
    for x in xrange(0, len(observedNode)):
        if observedNode[x].isalpha() == True:
            string += str(observedNode[x])
            if string == "WETGRASS":
                string = "WET GRASS"
        elif observedNode[x].isdigit() == True and string in allNodes:
            if int(observedNode[x]) == 1 or int(observedNode[x]) == 0:
                observedValue = int(observedNode[x])
                observedNode = string
                break
            else: abort(400)
        else: abort(400)
    return str(net.Marginalise(observedNode)[observedValue]) # Output returned is the value like: 0.7452
Given my code, is there any way to optimize it? Also, is there a better way of passing these parameters that doesn't take as many lines as my code does? I was planning on using fixed key parameters, but because the number of evidence values can change per request, I thought this would be one way of doing it.

You can easily split your evidence input into a list of strings with this:
import re
# 'cloudy1rain0sprinkler1' => ['cloudy1', 'rain0', 'sprinkler1']
evidence_dict = {}
input_evidence = 'cloudy1rain0sprinkler1'
# looks for a run of letters followed by one or more digits
evidence_list = re.findall(r'([a-z]+\d+)', input_evidence.lower())
for evidence in evidence_list:
    # splitting on a captured group keeps the digits: 'cloudy1' => ['cloudy', '1', '']
    name, val, _ = re.split(r'(\d+)', evidence)
    if name in allNodes:
        evidence_dict[name] = int(val)
# evidence_dict = {'cloudy': 1, 'rain': 0, 'sprinkler': 1}
You should be able to do something similar with the observations.
I would suggest you use an HTTP POST. That way you can send a JSON object in which the variable names and values are already separated for you; all you'll have to do is check that the variable names sent are valid in allNodes. It will also allow your variable list to grow somewhat arbitrarily.
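As a rough illustration of the JSON idea, here is a minimal sketch of the validation step using only the standard library. The payload layout (an "evidence" object plus an "observed" object), the name parse_query, and the ALL_NODES set are assumptions made for this example; they are not part of the question's code or of OpenBayes.

```python
import json

# Hypothetical node names; in the question these would come from `allNodes`.
ALL_NODES = {'CLOUDY', 'RAIN', 'SPRINKLER', 'WET GRASS'}

def parse_query(body):
    """Validate a JSON request body and return (evidence, node, value)."""
    payload = json.loads(body)
    # Normalise names to upper case, as the question's code does.
    evidence = {name.upper(): state for name, state in payload['evidence'].items()}
    node = payload['observed']['node'].upper()
    value = payload['observed']['value']
    # Every node must be known and every state must be 0 or 1.
    for name, state in list(evidence.items()) + [(node, value)]:
        if name not in ALL_NODES or state not in (0, 1):
            raise ValueError('invalid node or state: %r=%r' % (name, state))
    return evidence, node, value

body = '{"evidence": {"cloudy": 1, "rain": 0}, "observed": {"node": "wet grass", "value": 1}}'
print(parse_query(body))
```

In a Flask view the body would typically come from request.get_json(), a ValueError would be turned into abort(400), and the returned values would be passed to net.SetObs and net.Marginalise as in the question.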

Related

Extract words from random strings

Below I have some strings in a list:
some_list = ['a','l','p','p','l','l','i','i','r','i','r','a','a']
Now I want to take the word april from this list. There are only enough letters for two copies of april in this list, so I want to take those two copies and append them to another list, extract.
So the extract list should look something like this:
extract = ['aprilapril']
or
extract = ['a','p','r','i','l','a','p','r','i','l']
I have tried many times to get everything into extract in order, but I still can't seem to get it.
I know I could just do this:
a_count = some_list.count('a')
p_count = some_list.count('p')
r_count = some_list.count('r')
i_count = some_list.count('i')
l_count = some_list.count('l')
total_count = [a_count,p_count,r_count,i_count,l_count]
smallest_count = min(total_count)
extract = ['april' * smallest_count]
But I wouldn't be here if I could just use the code above, because I made some rules for solving this problem:
Each of the characters (a, p, r, i and l) is a magical code element. These code elements can't be created out of thin air; each one is unique and has a unique identifier, like a secret number associated with it. So you don't know how to create these magical code elements, and the only way to get them is to extract them to a list.
The characters (a, p, r, i and l) must be in order. Imagine they are links in a chain that only works when they are together. This means we have to put p right next to a (directly after it), and l must come last.
These code elements are top-secret stuff, so if you want to get them, the only way is to extract them to a list.
Below are some examples of an incorrect way to do this (breaking the rules):
import re
word = 'april'
some_list = ['aaaaaaappppppprrrrrriiiiiilll']
regex = "".join(f"({c}+)" for c in word)
match = re.match(regex, some_list[0])
if match:
    lowest_amount = min(len(g) for g in match.groups())
    print(word * lowest_amount)
else:
    print("no match")
from collections import Counter
def count_recurrence(kernel, string):
    # we need to count both strings
    kernel_counter = Counter(kernel)
    string_counter = Counter(string)
    effective_counter = {
        k: int(string_counter.get(k, 0)/v)
        for k, v in kernel_counter.items()
    }
    min_recurring_count = min(effective_counter.values())
    return kernel * min_recurring_count
This might sound really stupid, but this is actually a hard problem (well, for me). I originally designed this problem for myself to practice Python, but it turned out to be way harder than I thought. I just want to see how other people solve it.
If anyone out there knows how to solve this ridiculous problem, please help me out. I am just a fourteen-year-old trying to learn Python. Thank you very much.
I'm not sure what you mean by "cannot copy nor delete the magical codes": if you want to put them in your output list, you will need to "copy" them somehow.
And by the way, your example code (a_count = some_list.count('a') etc.) won't work when some_list holds a single long string, as in ['aaaaaaappppppprrrrrriiiiiilll'], since count will then always return zero.
That said, a possible solution is
magics = 'april'  # the code elements, in the order they must appear
worklist = [c for c in some_list[0]]
extract = []
fail = False
while not fail:
    lastpos = -1
    tempextract = []
    for magic in magics:
        try:
            # only search after the previously extracted element
            pos = worklist.index(magic, lastpos+1)
        except ValueError:
            fail = True
            break
        tempextract.append(worklist.pop(pos))
        lastpos = pos-1
    else:
        extract.append(tempextract)
Alternatively, if you don't want to pop the elements when you find them, you may compute the positions of all the occurrences of the first element (the "a"), and set lastpos to each of those positions at the beginning of each iteration.
This may not be the most efficient way, but the code works and makes the program logic more explicit:
some_list = ['aaaaaaappppppprrrrrriiiiiilll']
word = 'april'
extract = []
remove = []
string = some_list[0]
for x in range(len(some_list[0])//len(word)): #maximum number of times `word` can appear in `some_list[0]`
    pointer = i = 0
    while i<len(word):
        j=0
        while j<(len(string)-pointer):
            if string[pointer:][j] == word[i]:
                extract.append(word[i])
                remove.append(pointer+j)
                i+=1
                pointer = j+1
                break
            j+=1
        if i==len(word):
            for r_i,r in enumerate(remove):
                string = string[:r-r_i] + string[r-r_i+1:]
            remove = []
        elif j==(len(string)-pointer):
            break
print(extract,string)
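If the rules are read as "elements may only be moved out of the source list, and the extracted list must spell the word in order" (an assumption; the answers above instead look for the letters in order inside the source), a shorter sketch is possible. The function name move_word is hypothetical.

```python
def move_word(source, word):
    """Move complete copies of `word` out of `source`, one element at a time."""
    pool = list(source)
    extract = []
    # Keep going while the pool still holds every letter the word needs.
    while all(pool.count(ch) >= word.count(ch) for ch in set(word)):
        for ch in word:
            # pop() removes the element from the pool, so nothing is duplicated.
            extract.append(pool.pop(pool.index(ch)))
    return extract, pool

some_list = ['a', 'l', 'p', 'p', 'l', 'l', 'i', 'i', 'r', 'i', 'r', 'a', 'a']
extract, leftover = move_word(some_list, 'april')
print(''.join(extract))  # aprilapril
print(leftover)          # ['l', 'i', 'a']
```

Each element is popped from the pool before it is appended, so the number of elements across both lists never changes.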

Variable table width with .format

I'm trying to display data from a csv in a text table. I've got to the point where it displays everything that I need, however the table width still has to be set, meaning if the data is longer than the number set then issues begin.
I currently print the table using .format to sort out formatting. Is there a way to set the width of each column to a variable that depends on the length of the longest piece of data?
for i in range(len(list_l)):
    if i == 0:
        print(h_dashes)
        print('{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}'.format('|', (list_l[i][0].upper()),'|', (list_l[i][1].upper()),'|',(list_l[i][2].upper()),'|', (list_l[i][3].upper()),'|'))
        print(h_dashes)
    else:
        print('{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}{:^26s}{:^1s}'.format('|', list_l[i][0], '|', list_l[i][1], '|', list_l[i][2],'|', list_l[i][3],'|'))
I realise that the code is far from perfect, however I'm still a newbie so it's piecemeal from various tutorials
You can use a two-pass approach: the first pass gets the maximum length of each field, and the second does what you're currently doing, but with the calculated rather than fixed lengths. As per your example with four fields per line, the following shows the basic idea:
# Can set MINIMUM lengths here if desired, eg: lengths = [10, 0, 41, 7]
lengths = [0] * 4
fmtstr = None
for pass_num in range(2):  # `pass` is a reserved word, so it cannot be the loop variable
    for i in range(len(list_l)):
        if pass_num == 0:
            # First pass sets lengths as per data.
            for field in range(4):
                lengths[field] = max(lengths[field], len(list_l[i][field]))
        else:
            # Second pass prints the data.
            # First, set format string if not yet set.
            if fmtstr is None:
                fmtstr = '|'
                for item in lengths:
                    fmtstr += '{:^%ds}|' % (item)
            # Now print item (upper-cased and with header stuff if first item).
            if i == 0:
                print(h_dashes)
                print(fmtstr.format(list_l[i][0].upper(), list_l[i][1].upper(), list_l[i][2].upper(), list_l[i][3].upper()))
                print(h_dashes)
            else:
                print(fmtstr.format(list_l[i][0], list_l[i][1], list_l[i][2], list_l[i][3]))
The construction of the format string is done the first time you process an item in pass two.
It does so by taking a collection like [31,41,59] and giving you the string:
|{:^31s}|{:^41s}|{:^59s}|
There's little point using all those {:^1s} format specifiers when the | is not actually a varying item - you may as well code it directly into the format string.
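The same two-pass idea can be packed into a small helper. This is only a sketch: the name format_table and the sample rows are made up for illustration, and the dashed line is sized from the computed widths instead of using the question's h_dashes.

```python
def format_table(rows):
    """Return table lines with column widths fitted to the longest cell."""
    ncols = len(rows[0])
    # Pass 1: widest cell in each column.
    widths = [max(len(row[c]) for row in rows) for c in range(ncols)]
    # Build '|{:^7}|{:^4}|' style format string from the widths.
    fmt = '|' + '|'.join('{:^%d}' % w for w in widths) + '|'
    dashes = '-' * (sum(widths) + ncols + 1)
    # Pass 2: upper-cased header between dashed lines, then the data rows.
    header = fmt.format(*[cell.upper() for cell in rows[0]])
    body = [fmt.format(*row) for row in rows[1:]]
    return [dashes, header, dashes] + body

rows = [['name', 'city'], ['Al', 'Rome'], ['Barbara', 'Oslo']]
for line in format_table(rows):
    print(line)
```

Because the format string is built once from the measured widths, every line comes out the same length regardless of the data.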

Create longestPossible(longest_possible in python) helper function that takes 1 integer argument which is a maximum length of a song in seconds

I am kind of new to coding, so please help me out with this one, with explanations:
songs is an array of objects which are formatted as follows:
{artist: 'Artist', title: 'Title String', playback: '04:30'}
You can expect playback value to be formatted exactly like above.
The output should be the title of the longest song in the database that is not longer than the specified time. If there are no songs matching the criteria in the database, return false.
Either you could change playback, so that instead of a string, it's an integer (for instance, the length of the song in seconds) which you convert to a string for display, and test from there, or, during the test, you could take playback and convert it to its length in seconds, like so:
def songLength(playback):
    seconds = playback.split(':')
    lengthOfSong = int(seconds[0]) * 60 + int(seconds[1])
    return lengthOfSong
This will give the following result:
>>> playback = '04:30'
>>> songLength(playback)
270
I'm not as familiar with the particular data structure you're using, but if you can iterate over these, you could do something like this:
def longestPossible(array, maxLength):
    longest = 0
    songName = ''
    for song in array:
        lenSong = songLength(song.playback) # I'm formatting song's playback like this because I'm not sure how you're going to be accessing it.
        if maxLength >= lenSong and (maxLength - lenSong) < (maxLength - longest):
            longest = lenSong
            songName = song.title
    if longest != 0:
        return songName
    else:
        return '' # Empty strings will evaluate to False.
I haven't tested this, but I think this should at least get you on the right track. There are more Pythonic ways of doing this, so never stop improving your code. Good luck!
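For comparison, here is a sketch of a variant that takes a single integer argument and returns False when nothing fits, as the question asks. It assumes songs is a list of Python dicts with the keys shown in the question; the sample entries and the helper name to_seconds are invented for the example.

```python
# Hypothetical sample database in the format described in the question.
songs = [
    {'artist': 'Artist A', 'title': 'Short Song', 'playback': '02:15'},
    {'artist': 'Artist B', 'title': 'Medium Song', 'playback': '04:30'},
    {'artist': 'Artist C', 'title': 'Long Song', 'playback': '07:05'},
]

def to_seconds(playback):
    """Convert 'MM:SS' to a number of seconds."""
    minutes, seconds = playback.split(':')
    return int(minutes) * 60 + int(seconds)

def longest_possible(max_length):
    """Title of the longest song not exceeding max_length seconds, else False."""
    candidates = [s for s in songs if to_seconds(s['playback']) <= max_length]
    if not candidates:
        return False
    return max(candidates, key=lambda s: to_seconds(s['playback']))['title']

print(longest_possible(300))  # Medium Song
print(longest_possible(100))  # False
```

Filtering first and then taking max() keeps the "fits within the limit" rule separate from the "longest" rule, which makes each easier to check.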

Best way to enable user to input formula without making a security hole?

I would like to enable the user to input a formula for calculation given some parameters. What is the best way to do this without making a security hole?
Kind of like this:
def generate_bills():
    land_size = 100
    building_size = 200
    building_class = 1  # `class` is a reserved word in Python, so it cannot be a variable name
    formula = "(0.8*land_size)+building_size+(if class==1 10 else if class==2 5 else 2)"
    bill = calculate(formula,{'land_size':land_size,'building_size':building_size,'class':building_class})
The easiest way to do this is by sanitizing your input. Basically, you want to ONLY pay attention to parameters you define and discard everything else. Sanitization for a numerical equation follows a few simple steps:
Extract static, known equation parts (variable names, operators)
Extract numerical values (which should be allowed if the user can define their own function).
Reconstruct the function using these extracted parts. This discards everything that you do not handle and could be potentially problematic when using Python's ast or eval.
Here's a pretty robust sanitizer I adapted from another project. The code is below, but here are some sample inputs and outputs:
In an ideal case, input and output are identical:
enter func: building_size*40+land_size*20-(building_size+land_size)
building_size*40+land_size*20-(building_size+land_size)
However, were the user to use spaces/periods/tabs/even newlines (gasp), the output is still beautiful:
enter func:
building_size * 500 + land_size-20+building_size.
building_size*500+land_size-20+building_size
And whatever misguided or malicious injection your user tries, anything that is not an allowed parameter, operator or number is dropped from the output:
enter func: land_size + 2 * building_size quit()
land_size+2*building_size
enter func: 1337+land_size h4x'; DROP TABLE members;
1337+land_size
What's more, you can very easily modify the function to feed the actual values into the equation once sanitized. What I mean by this is go from land_size+2*building_size to 100+2*200 with a simple replace statement. This will allow your functions to be parseable by eval and ast.
The code is below:
import re
# find all indices of a given char
def find_spans(ch, s):
    return [(i, i+1) for i, ltr in enumerate(s) if ltr == ch]
# check to see if an unknown is a number
def is_number(s):
    try:
        float(s)
    except ValueError:
        return False
    return True
# these are the params you will allow
# change these to add/remove parameters/operators
allowed_params = ['land_size', 'building_size']
operators = ['+', '-', '*', '/', '(', ')']
# get input
in_formula = input('enter func: ')
# dictionary that will hold every allowed function element found in the input and its position(s)
found_params = {}
# extract param indices
for param in allowed_params:
    found_params[param] = [i.span() for i in re.finditer(param, in_formula)]
# extract operator indices
for op in operators:
    found_params[op] = find_spans(op,in_formula)
# get all index regions that are "approved", that is, they are either a param or operator
allowed_indices = sorted([j for i in found_params.values() for j in i])
# these help remove anything unapproved at beginning or end
allowed_indices.insert(0,(0,0))
allowed_indices.append((len(in_formula),len(in_formula)))
# find all index ranges that have not been approved
unknown_indices = [(allowed_indices[i-1][1], allowed_indices[i][0]) for i in range(1,len(allowed_indices)) if allowed_indices[i][0] != allowed_indices[i-1][1]]
# of all the unknowns, check to see if any are numbers
numbers_indices = [(''.join(in_formula[i[0]:i[1]].split()),i) for i in unknown_indices if is_number(in_formula[i[0]:i[1]])]
# add these to our final dictionary
for num in numbers_indices:
    try:
        found_params[num[0]].append(num[1])
    except KeyError:
        found_params[num[0]] = [num[1]]
# get final order of extracted parameters
final_order = sorted([(i[0],key) for key in found_params.keys() for i in found_params[key]])
# put all function elements back into a string
final_function = ''.join([i[1] for i in final_order])
#
# here you could replace the parameters in the final function with their actual values
# and then evaluate using eval()
#
print(final_function)
Let me know if something doesn't make sense and I'd be glad to explain it.
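For the evaluation step, one option that avoids eval() altogether is to walk the parsed expression with the ast module and allow only a fixed set of node types. The following is a minimal sketch of that idea, not the sanitizer's own method; the function name calculate mirrors the hypothetical call in the question, and only +, -, *, / and unary signs are supported here.

```python
import ast
import operator

# Operators the formula is allowed to use.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def calculate(formula, variables):
    """Evaluate an arithmetic formula using only whitelisted names and operators."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Calls, attribute access, unknown names, etc. all end up here.
        raise ValueError('disallowed expression: %s' % ast.dump(node))
    return _eval(ast.parse(formula, mode='eval'))

values = {'land_size': 100, 'building_size': 200}
print(calculate('land_size+2*building_size', values))  # 500
```

Anything outside the whitelist, such as quit() or an unknown variable name, raises ValueError instead of being executed. Input that is not a valid expression at all raises SyntaxError from ast.parse.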

Range for nargs in argparse

I have a script which merges multiple video and audio files. Now I have a parameter which allows four values:
# -A FILENAME LANGUAGE POSITION SPEED
$ script.py [... more parameters ...] -A audio.mp3 eng -1 1 [... more parameters ...]
Now I want the third and fourth to be optional. Currently I have two ideas but maybe there is a better solution:
Set nargs to + and throw an error if 1 or more than 4 values are supplied. Maybe the type parameter can catch this. The problem is that the help would not show that 2 to 4 values are required.
Have 4 different parameters, one for each combination. This would allow the position to be optional. The problem is that I would then need four parameter names.
The parameter also might appear multiple times (action is append).
I would suggest having -A take a single, comma-separated string (or use the delimiter of your choice), and supply a custom metavar for the help message.
def av_file_type(value):
    data = tuple(value.split(","))
    n = len(data)
    if n < 2:
        # ArgumentTypeError is the exception argparse expects from a type function
        raise argparse.ArgumentTypeError("Too few arguments")
    elif n == 2:
        return data + (default_position, default_speed)
    elif n == 3:
        return data + (default_speed,)
    elif n == 4:
        return data
    else:
        raise argparse.ArgumentTypeError("Too many arguments")
p.add_argument("-A", action='append', type=av_file_type,
               metavar='filename,language[,position[,speed]]')
With nargs='+', it would be extremely non-trivial to format the help string the way you like.
I think the things you want to happen are:
allow the user to input 2, 3 or 4 arguments. '+' allows that.
tell the user how many arguments they can give. If the code doesn't do what you want, you can always give a custom usage, description, or help.
object if they enter 1 or more than 4. You can test entries in 3 places - with a custom type, a custom action, or after parse_args.
type won't help you here, because it handles each argument separately. If I enter p.parse_args('-A one two three'.split()), the type function is called 3 times, once for each of the argument strings. It does not see all the strings together.
action might work, since it sees all the argument values that parse_args thinks -A wants. This would be all the strings between one -A and the next -A (or other flag). But since you want to append, you need to model your custom action on the argparse._AppendAction class.
Checking the namespace after the fact may be your best choice. You'll have a list of lists, and you can check the number of elements in each of the sublists. You can use parser.error(your_message) to generate an argparse-style message.
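A minimal sketch of the check-after-parsing option might look like this. The helper names build_parser and parse, the metavar, and the help text are illustrative choices, not anything prescribed by argparse.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog='script.py')
    # nargs='+' with action='append' yields one sublist per -A occurrence.
    parser.add_argument('-A', action='append', nargs='+', default=[],
                        metavar='VALUE',
                        help='FILENAME LANGUAGE [POSITION [SPEED]]')
    return parser

def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Validate the count for every -A group after parsing.
    for values in args.A:
        if not 2 <= len(values) <= 4:
            parser.error('-A takes 2 to 4 values, got %d' % len(values))
    return args

args = parse(['-A', 'audio.mp3', 'eng', '-A', 'other.mp3', 'deu', '0'])
print(args.A)
```

parser.error prints the usage line plus the message and exits with status 2, so a bad count is reported the same way as argparse's own errors.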
There is a Python bug issue about enabling a nargs range value: http://bugs.python.org/issue11354. I proposed a patch that would accept nargs='{m,n}', which is modeled on the re feature. In fact it ends up using re matching to allocate strings to the various actions. Read that issue if you want to know more about what SethMMorton is talking about.
Based on chepner's answer I developed a more advanced “subparser”:
audio_parameters = [ "f", "l", "p", "s", "b", "o" ]
def audio_parser(value):
    data = {
        "l": None,
        "p": -1,
        "s": 1,
        "b": None,
        "o": 0,
    }
    found = set()
    if value[0] in audio_parameters and value[1] == "=":
        start = 0
        while start >= 0:
            end = start
            parameter = value[start]
            found.add(parameter)
            #search for the next ',x=' block, where x is an audio_parameter
            while end >= 0:
                # try next ',' after the last found
                end = value.find(",", end + 2)
                # exit loop, when find, (or after non found)
                if end >= 0 and value[end + 1] in audio_parameters and value[end + 1] not in found and value[end + 2] == "=":
                    end += 1
                    break
            if parameter in audio_parameters:
                parameter_value = value[start + 2:end - 1 if end > 0 else len(value)]
                if parameter_value != "":
                    data[parameter] = parameter_value
            start = end
    else:
        i = 0
        for splitted in value.split(","):
            if i >= len(audio_parameters):
                # raise (not return) so argparse reports the error
                raise argparse.ArgumentTypeError("Too many arguments")
            if len(splitted) > 0:
                data[audio_parameters[i]] = splitted
            i += 1
    if "f" in data:
        return data
    else:
        raise argparse.ArgumentTypeError("Too few arguments")
This allows the proposed file[,lang[,pos[,speed]]] form but also a more advanced form that selects specific values. For example, to set only the file, language and speed, f=file,s=speed,l=lang works, and in any order. It also allows something which might look like a parameter name, but which doesn't exist or was already used. Both might have been parsed by the simple version (f=file,x=stillname,s=speed,l=lang). The f parameter there is then file,x=stillname. It also allows something like f=file,f=overwrites because it accepts only the first occurrence. So if the file name contains ,b= you can simply write b=,f=file,b=haha.
A mixed mode like file,l=lang is not possible. And as you might have seen, that parameter got way more complex and has now 6 subparameters which makes it almost impossible to use one parameter name for each combination. And a structure like '{n,m}' is also not as flexible as you can't easily omit values.
One thing I noticed though, a metavar with [] doesn't work.
