I have a situation where I need to build an if condition from items configured in my JSON configuration file.
Once the if statement has been converted into a string, I want to run it with eval.
This is what I have tried so far.
The configuration, and what flags and set_value look like, are shown below.
My JSON:
"EMAIL_CONDITION":
{
    "TOXIC_THRESOLD":"50",
    "TOXIC_PLATFORM_TODAY":"0",
    "TOXIC_PRS_TODAY":"0",
    "Explanation":"select any or all 1 for TOXIC_Thresold 2 for TOXIC_PLATFORM 3 for toxic_prs ",
    "CONDITION_TYPE":["1","2"]
},
"email_list":{
    "cc":["abc#def.t"],
    "to":["abc#def.net"]
}
The Python
CONDITION_TYPE is a setting whose value can be all, any, or a combination such as 1,2, 2,3 or 1,3.
1 stands for the toxic index, 2 for the platform and 3 for toxic PRs.
The idea is that more parameters can be added later, so I want the if condition to be generic and accept any number of conditions, rather than writing many if/else branches. I have already handled all and any, which are straightforward; the selected-numbers case has a variable number of conditions, so only that else branch is shown below.
flags['1'] = toxic_index
flags['2'] = toxic_platform
flags['3'] = toxic_prs
set_value['1'] = toxic_index_condition
set_value['2'] = toxic_platform_condition
set_value['3'] = toxic_pr_condition
else:
    condition_string = 'if '
    for val,has_more in lookahead(conditions):
        if has_more:
            condition_string = str(condition_string+ str(flags[val] >= set_value[val])+ str('and') )
        else:
            condition_string = str(condition_string+ str(flags[val] >= set_value[val]) + str(':') )
    print str(condition_string)
I understand that the comparisons are evaluated as soon as the string is built, so the output I get is
if False and False:
Instead of False, I want the string to contain the actual condition (that is, the expression flags[val] >= set_value[val] itself), so that I can decide whether to send the mail based on it.
At the moment I cannot do that, because I only get False and False.
Please suggest the best solution for this.
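One way to get a generic check without building a string for eval is to evaluate the selected comparisons directly with all() or any(). The following is only a sketch: the function name conditions_met, the require_all switch and the sample numbers are assumptions made for illustration, and it assumes the thresholds read from the JSON (which are strings there) have already been converted to numbers.

```python
def conditions_met(condition_types, flags, set_value, require_all=True):
    """Compare each selected flag against its threshold, without eval."""
    results = (flags[key] >= set_value[key] for key in condition_types)
    return all(results) if require_all else any(results)

# Hypothetical values, for illustration only.
flags = {'1': 60, '2': 0, '3': 5}
set_value = {'1': 50, '2': 1, '3': 1}

print(conditions_met(['1', '3'], flags, set_value))   # both thresholds reached
print(conditions_met(['1', '2'], flags, set_value))   # the platform check fails
```

Adding a fourth parameter later then only means adding one more key to flags and set_value; the check itself does not change.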
I'm a Python beginner, so please forgive me if I'm not using the right lingo and if my code includes blatant errors.
I have text data (i.e., job descriptions from job postings) in one column of my data frame. I want to determine which job ads contain any of the following strings: bachelor, ba/bs, bs/ba.
The function I wrote doesn't work because it produces an empty column (i.e., all zeros). It works fine if I just search for one substring at a time. Here it is:
def requires_bachelor(text):
    if text.find('bachelor|ba/bs|bs/ba')>-1:
        return True
    else:
        return False
df_jobs['bachelor']=df_jobs['description'].apply(requires_bachelor).map({True:1, False:0})
Thanks so much to anyone who is willing to help!
Here's my approach. You were pretty close, but you need to check for each of the items individually. If any of the available "Bachelor tags" exists, return True. Then instead of using map({True:1, False:0}), you can use map(bool) to make it a bit nicer. Good luck!
import pandas as pd
df_jobs = pd.DataFrame({"name":["bob", "sally"], "description":["bachelor", "ms"]})
def requires_bachelor(text):
    return any(text.find(a) > -1 for a in ['bachelor', 'ba/bs','bs/ba']) # find returns -1 if not found
df_jobs['bachelor']=df_jobs['description'].apply(requires_bachelor).map(bool)
The | in the search string does not act as an or operator; str.find looks for the literal text. You should split it into three calls like this:
if text.find('bachelor') > -1 or text.find('ba/bs') > -1 or text.find('bs/ba') > -1:
You could try doing:
bachelors = ["bachelor", "ba/bs", "bs/ba"]
if any(bachelor in text for bachelor in bachelors):
    return True
Instead of writing a custom function that requires .apply (which will be quite slow), you can use str.contains for this. Also, you don't need map to turn booleans into 1 and 0; try using astype(int) instead.
df_jobs = pd.DataFrame({'description': ['job ba/bs', 'job bachelor',
                                        'job bs/ba', 'job ba']})
df_jobs['bachelor'] = df_jobs.description.str.contains(
    'bachelor|ba/bs|bs/ba', regex=True).astype(int)
print(df_jobs)
    description  bachelor
0     job ba/bs         1
1  job bachelor         1
2     job bs/ba         1
3        job ba         0
# note that the pattern does not look for match on simply "ba"!
You are checking for the literal string bachelor|ba/bs|bs/ba in the text, which I don't believe will ever appear.
What I suggest is to check for each substring separately in the if statement and join the checks with or, as follows:
def requires_bachelor(text):
    if text.find('bachelor')>-1 or text.find('ba/bs')>-1 or text.find('bs/ba')>-1:
        return True
    else:
        return False
df_jobs['bachelor']=df_jobs['description'].apply(requires_bachelor).map({True:1, False:0})
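It can all be done in one line in pandas. Note that the pattern should list the exact strings you want; a bare bs or ba would also match inside unrelated words such as "jobs" or "database".
df_jobs['bachelor'] = df_jobs['description'].str.contains(r'bachelor|ba/bs|bs/ba')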
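If the list of terms is likely to grow, the alternation pattern can be built from a list instead of typed by hand. This is a small sketch using only the standard re module; the name DEGREE_TERMS and the case-insensitive matching are assumptions, not something the question asked for. The resulting pattern text (pattern.pattern) is an ordinary regex string of the same kind that the str.contains answers above use.

```python
import re

# Hypothetical list of terms; extend as needed.
DEGREE_TERMS = ['bachelor', 'ba/bs', 'bs/ba']

# re.escape guards against terms that contain regex metacharacters.
pattern = re.compile('|'.join(re.escape(term) for term in DEGREE_TERMS),
                     re.IGNORECASE)

def requires_bachelor(text):
    """Return 1 if any of the terms occurs in text, else 0."""
    return int(pattern.search(text) is not None)

print(requires_bachelor('job BA/BS required'))
print(requires_bachelor('job ba'))
```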
I am trying to optimize this block of code to use a single query rather than looping over and over.
while not (dataX):
    i += 1
    this_id = '/'.join(this_id.split('/')[0:-i])
    if not this_id:
        break
    else:
        dataX = db.conn[db_read].query("SELECT x AS xX FROM link WHERE _deleted = 0 AND _ref = %s AND _ntype = 'code' LIMIT 1;", data = (this_id,))
I want to use the IN clause with a variable that contains all possible prefixes of the path, but I can't get it to work.
this_id_list = "'/a/b/c/d/e' , '/a/b/c/d', '/a/b/c', '/a/b', '/a'"
result = db.conn[db_read].query("SELECT x AS xX FROM link WHERE _deleted = 0 AND _ref IN($this_id_list)")
Any idea what I am doing wrong and how to fix it? I would really appreciate any input! This is a Python script, by the way.
this_id_list = "'/a/b/c/d/e' , '/a/b/c/d', '/a/b/c', '/a/b', '/a'"
This should be a string
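Python does not substitute $this_id_list inside a string literal, so the query above is sent to the database with that text unchanged. A common alternative is to generate one placeholder per value and pass the values separately. The sketch below uses an in-memory sqlite3 database purely as a stand-in, because the db.conn[db_read].query wrapper from the question is not something I can verify; sqlite3 uses ? as its placeholder where the question's driver uses %s. The helper name ancestor_paths and the sample rows are made up for illustration.

```python
import sqlite3

def ancestor_paths(path):
    """Return the path followed by each of its parent paths, longest first."""
    parts = path.strip('/').split('/')
    return ['/' + '/'.join(parts[:n]) for n in range(len(parts), 0, -1)]

paths = ancestor_paths('/a/b/c/d/e')

# Stand-in table with made-up rows.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE link (x TEXT, _ref TEXT, _ntype TEXT, _deleted INTEGER)")
conn.executemany("INSERT INTO link VALUES (?, ?, ?, ?)",
                 [('root', '/a', 'code', 0), ('mid', '/a/b/c', 'code', 0)])

# One placeholder per value; the values themselves never enter the SQL text.
placeholders = ', '.join('?' for _ in paths)
sql = ("SELECT x AS xX FROM link WHERE _deleted = 0 AND _ntype = 'code' "
       "AND _ref IN (%s) ORDER BY LENGTH(_ref) DESC LIMIT 1" % placeholders)
row = conn.execute(sql, paths).fetchone()
print(paths)
print(row)
```

The ORDER BY LENGTH(_ref) DESC clause is there so that the deepest matching path is returned, which is roughly what the original loop does by trying longer paths first.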
I'm trying to use Python to call an API and clean a bunch of strings that represent a movie budget.
So far, I have the following 6 variants of data that come up.
"$1.2 million"
"$1,433,333"
"US$ 2 million"
"US$1,644,736 (est.)"
"$6-7 million"
"£3 million"
So far, I've only gotten 1 and 2 parsed without a problem with the following code below. What is the best way to handle all of the other cases or a general case that may not be listed below?
def clean_budget_string(input_string):
    number_to_integer = {'million' : 1000000, 'thousand' : 1000}
    budget_parts = input_string.split(' ')
    #Currently, only indices 0 and 1 are necessary for computation
    text_part = budget_parts[1]
    if text_part in number_to_integer:
        number = budget_parts[0].lstrip('$')
        int_representation = number_to_integer[text_part]
        return int(float(number) * int_representation)
    else:
        number = budget_parts[0]
        idx_dollar = 0
        for idx in xrange(len(number)):
            if number[idx] == '$':
                idx_dollar = idx
        return int(number[idx_dollar+1:].replace(',', ''))
The way I would approach a parsing task like this (and I'm happy to hear other opinions) would be to break up your function into several parts, each of which identifies a single piece of information in the input string.
For instance, I'd start by identifying what float number can be parsed from the string, ignoring currency and order of magnitude (a million, a thousand) for now:
f = float(''.join([c for c in input_str if c in '0123456789.']))
(you might want to add error handling for when you end up with a trailing dot, because of additions like 'est.')
Then, in a second step, you determine whether the float needs to be multiplied to adjust for the correct order of magnitude. One way of doing this would be with multiple if-statements :
if 'million' in input_str:
    oom = 6
elif 'thousand' in input_str:
    oom = 3
else:
    oom = 0  # no multiplier word: 10**0 == 1 leaves the number unchanged
# adjust number for order of magnitude
f = f * 10 ** oom
Those checks could of course be improved to account for small differences in formatting by using regular expressions.
Finally, you separately determine the currency mentioned in your input string, again using one or more if-statements :
if '£' in input_str:
    currency = 'GBP'
else:
    currency = 'USD'
Now the one case that this doesn't yet handle is the dash one where lower and upper estimates are given. One way of making the function work with these inputs is to split the initial input string on the dash and use the first (or second) of the substrings as input for the initial float parsing. So we would replace our first line of code with something like this:
if '-' in input_str:
    lower = input_str.split('-')[0]
    f = float(''.join([c for c in lower if c in '0123456789.']))
else:
    f = float(''.join([c for c in input_str if c in '0123456789.']))
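Putting the steps above together, a single function could look roughly like this. It is a sketch rather than a definitive parser: the name parse_budget and the MULTIPLIERS table are my own, a small regex is swapped in for the character filter so that the trailing dot in "(est.)" is not picked up, ranges keep only the lower bound, and anything that is not marked with £ is assumed to be USD.

```python
import re

# Hypothetical multiplier table; extend with 'billion' etc. as needed.
MULTIPLIERS = {'thousand': 10 ** 3, 'million': 10 ** 6}

def parse_budget(input_str):
    """Return (amount as int, currency code) for a budget string."""
    # For ranges such as "$6-7 million", keep the lower bound only.
    lower = input_str.split('-')[0]
    # First run of digits, allowing thousands separators and a decimal part.
    match = re.search(r'\d[\d,]*(?:\.\d+)?', lower)
    if match is None:
        raise ValueError('no number found in %r' % input_str)
    amount = float(match.group().replace(',', ''))
    # Scale by the order of magnitude word, if one is present.
    for word, factor in MULTIPLIERS.items():
        if word in input_str:
            amount *= factor
            break
    currency = 'GBP' if '£' in input_str else 'USD'
    return int(round(amount)), currency

for sample in ["$1.2 million", "US$1,644,736 (est.)", "$6-7 million", "£3 million"]:
    print(sample, parse_budget(sample))
```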
Here is a version using a regex and the string replace method. It also returns the currency, in case you need it.
Modify it as needed to handle more inputs or multipliers such as billion.
import re
# take in string and return integer amount and currency
def clean_budget_string(s):
    mult_dict = {'million':1000000,'thousand':1000}
    # raw string so that \D, \s and \d reach the regex engine unchanged
    tmp = re.search(r'(^\D*?)\s*((?:\d+\.?,?)+)(?:-\d+)?\s*((?:million|thousand)?)', s).groups()
    currency = tmp[0]
    mult = tmp[-1]
    tmp_int = ''.join(tmp[1:-1]).replace(',', '') # join the digit groups, remove commas
    tmp_int = int(float(tmp_int) * mult_dict.get(mult, 1))
    return tmp_int, currency
>>> clean_budget_string("$1.2 million")
(1200000, '$')
>>> clean_budget_string("$1,433,333")
(1433333, '$')
>>> clean_budget_string("US$ 2 million")
(2000000, 'US$')
>>> clean_budget_string("US$1,644,736 (est.)")
(1644736, 'US$')
>>> clean_budget_string("$6-7 million")
(6000000, '$')
>>> clean_budget_string("£3 million")
(3000000, '£') # my script doesn't recognize the £ character; you might need to set the encoding properly
So I'm writing a small project using python,
But now I'm in trouble.
I made some code like this:
START_BUTTONS = ("button1", "button2")
markup = types.ReplyKeyboardMarkup()
lengthof = len(START_BUTTONS)
countn = 0
while (countn < lengthof):
    exec("itembtn" + str(countn) + " = types.KeyboardButton(START_BUTTONS[" + str(countn) + "])")
    countn = countn + 1
So this will generate something like the following (until the tuple ends):
itembtn0 = types.KeyboardButton(START_BUTTONS[0])
itembtn1 = types.KeyboardButton(START_BUTTONS[1])
and...
So those variables are usable later.
But, my problem is here. I want to check how many of those variables are there (itembtn0 itembtn1 itembtn2 itembtn3...) and put them like this:
markup.row(itembtn0, itembtn1, itembtn2)
So if there were 5 of those, it would end up as something like this:
markup.row(itembtn0, itembtn1, itembtn2, itembtn3, itembtn4)
I have no idea what I should write to do this.
Thanks for your help, and sorry for my bad English.
You are trying to create numbered variables, which can practically always be replaced by a list. Try something simple instead:
START_BUTTONS = ("button1", "button2")
markup = types.ReplyKeyboardMarkup()
itembtn = []
for btn in START_BUTTONS:
    itembtn.append(types.KeyboardButton(btn))
Access it with
itembtn[0]
itembtn[1]
etc.
And you can know how many there are:
len(itembtn)
I am not sure about your markup function, but you can pass the whole list as separate arguments like this:
markup.row(*itembtn)
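To see what the * does without the bot library installed, here is a tiny self-contained sketch. DemoMarkup is a made-up stand-in for types.ReplyKeyboardMarkup that only records what it receives, and the strings stand in for KeyboardButton objects; neither is part of the real library.

```python
class DemoMarkup(object):
    """Hypothetical stand-in for the real markup class; it only records rows."""
    def __init__(self):
        self.rows = []

    def row(self, *buttons):
        # *buttons collects every positional argument into one tuple
        self.rows.append(buttons)

START_BUTTONS = ("button1", "button2", "button3")
# Plain strings used as stand-ins for KeyboardButton objects.
itembtn = ["<%s>" % label for label in START_BUTTONS]

markup = DemoMarkup()
markup.row(*itembtn)  # same as markup.row(itembtn[0], itembtn[1], itembtn[2])
print(markup.rows)
```

The call works for any number of buttons, so there is no need to know in advance how many entries START_BUTTONS has.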
I give a lot of information on the methods that I used to write my code. If you just want to read my question, skip to the quotes at the end.
I'm working on a project that has a goal of detecting sub populations in a group of patients. I thought this sounded like the perfect opportunity to use association rule mining as I'm currently taking a class on the subject.
There are 42 variables in total. Of those, 20 are continuous and had to be discretized. For each variable, I used the Freedman-Diaconis rule to determine how many categories to divide a group into.
def Freedman_Diaconis(column_values):
    #sort the list first
    column_values[1].sort()
    first_quartile = int(len(column_values[1]) * .25)
    third_quartile = int(len(column_values[1]) * .75)
    fq_value = column_values[1][first_quartile]
    tq_value = column_values[1][third_quartile]
    iqr = tq_value - fq_value
    n_to_pow = len(column_values[1])**(-1/3)
    h = 2 * iqr * n_to_pow
    retval = (column_values[1][-1] - column_values[1][1])/h
    test = int(retval+1)
    return test
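Two details in a function like the one above are worth checking. Under Python 2, -1/3 is integer division and evaluates to -1, so the exponent is not the intended -1/3; and the data range is normally the largest value minus the smallest, which sits at index 0 of the sorted list rather than index 1. The following is a sketch of the Freedman-Diaconis bin count with both points handled; the function name, the plain-list input and the fallback of one bin when the interquartile range is zero are my assumptions, and the simple index-based quartiles mirror the code above.

```python
def freedman_diaconis_bins(values):
    """Number of bins suggested by the Freedman-Diaconis rule."""
    data = sorted(values)            # work on a sorted copy
    n = len(data)
    q1 = data[int(n * 0.25)]         # index-based quartiles, as in the code above
    q3 = data[int(n * 0.75)]
    iqr = q3 - q1
    if iqr == 0:
        return 1                     # assumed fallback: no spread, one bin
    # float literals keep the exponent at -1/3 under Python 2 as well
    width = 2.0 * iqr * n ** (-1.0 / 3.0)
    return int((data[-1] - data[0]) / width) + 1

print(freedman_diaconis_bins(range(1, 101)))
```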
From there I used min-max normalization
def min_max_transform(column_of_data, num_bins):
    min_max_normalizer = preprocessing.MinMaxScaler(feature_range=(1, num_bins))
    data_min_max = min_max_normalizer.fit_transform(column_of_data[1])
    data_min_max_ints = take_int(data_min_max)
    return data_min_max_ints
to transform my data, and then I simply took the integer portion to get the final categorization.
def take_int(list_of_float):
    ints = []
    for flt in list_of_float:
        asint = int(flt)
        ints.append(asint)
    return ints
I then also wrote a function that I used to combine this value with the variable name.
def string_transform(prefix, column, index):
    transformed_list = []
    transformed = ""
    if index < 4:
        for entry in column[1]:
            transformed = prefix+str(entry)
            transformed_list.append(transformed)
    else:
        prefix_num = prefix.split('x')
        for entry in column[1]:
            transformed = str(prefix_num[1])+'x'+str(entry)
            transformed_list.append(transformed)
    return transformed_list
This was done to differentiate variables that have the same value, but appear in different columns. For example, having a value of 1 for variable x14 means something different from getting a value of 1 in variable x20. The string transform function would create 14x1 and 20x1 for the previously mentioned examples.
After this, I wrote everything to a file in basket format
def create_basket(list_of_lists, headers):
    #for filename in os.listdir("."):
    # if filename.e
    if not os.path.exists('baskets'):
        os.makedirs('baskets')
    down_length = len(list_of_lists[0])
    with open('baskets/dataset.basket', 'w') as basketfile:
        basket_writer = csv.DictWriter(basketfile, fieldnames=headers)
        for i in range(0, down_length):
            basket_writer.writerow({"trt": list_of_lists[0][i], "y": list_of_lists[1][i], "x1": list_of_lists[2][i],
                                    "x2": list_of_lists[3][i], "x3": list_of_lists[4][i], "x4": list_of_lists[5][i],
                                    "x5": list_of_lists[6][i], "x6": list_of_lists[7][i], "x7": list_of_lists[8][i],
                                    "x8": list_of_lists[9][i], "x9": list_of_lists[10][i], "x10": list_of_lists[11][i],
                                    "x11": list_of_lists[12][i], "x12":list_of_lists[13][i], "x13": list_of_lists[14][i],
                                    "x14": list_of_lists[15][i], "x15": list_of_lists[16][i], "x16": list_of_lists[17][i],
                                    "x17": list_of_lists[18][i], "x18": list_of_lists[19][i], "x19": list_of_lists[20][i],
                                    "x20": list_of_lists[21][i], "x21": list_of_lists[22][i], "x22": list_of_lists[23][i],
                                    "x23": list_of_lists[24][i], "x24": list_of_lists[25][i], "x25": list_of_lists[26][i],
                                    "x26": list_of_lists[27][i], "x27": list_of_lists[28][i], "x28": list_of_lists[29][i],
                                    "x29": list_of_lists[30][i], "x30": list_of_lists[31][i], "x31": list_of_lists[32][i],
                                    "x32": list_of_lists[33][i], "x33": list_of_lists[34][i], "x34": list_of_lists[35][i],
                                    "x35": list_of_lists[36][i], "x36": list_of_lists[37][i], "x37": list_of_lists[38][i],
                                    "x38": list_of_lists[39][i], "x39": list_of_lists[40][i], "x40": list_of_lists[41][i]})
and I used the apriori package in Orange to see if there were any association rules.
rules = Orange.associate.AssociationRulesSparseInducer(patient_basket, support=0.3, confidence=0.3)
print "%4s %4s %s" % ("Supp", "Conf", "Rule")
for r in rules:
    my_rule = str(r)
    split_rule = my_rule.split("->")
    if 'trt' in split_rule[1]:
        print 'treatment rule'
    print "%4.1f %4.1f %s" % (r.support, r.confidence, r)
Using this technique, I found quite a few association rules in my testing data.
THIS IS WHERE I HAVE A PROBLEM
When I read the notes for the training data, there is this note
...That is, the only
reason for the differences among observed responses to the same treatment across patients is
random noise. Hence, there is NO meaningful subgroup for this dataset...
My question is,
why do I get multiple association rules that would imply that there are subgroups, when according to the notes I shouldn't see anything?
I'm getting lift numbers that are above 2 as opposed to the 1 that you should expect if everything was random like the notes state.
Supp Conf Rule
0.3 0.7 6x0 -> trt1
Even though my code runs, I'm not getting results anywhere close to what should be expected. This leads me to believe that I messed something up, but I'm not sure what it is.
After some research, I realized that my sample size is too small for the number of variables that I have. I would need a way larger sample size in order to really use the method that I was using. In fact, the method that I tried to use was developed with the assumption that it would be run on databases with hundreds of thousands or millions of rows.
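For reference, support, confidence and lift for a single rule can be computed directly from the baskets, which makes it easier to sanity-check numbers like the ones above on a small sample. A lift of 1 corresponds to the antecedent and consequent being independent, and with few rows it can drift away from 1 by chance alone. This is a plain-Python sketch that does not use Orange; the function name rule_metrics and the toy baskets are made up for illustration and are not the patient data.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = float(len(baskets))
    n_a = sum(1 for b in baskets if antecedent <= set(b))
    n_c = sum(1 for b in baskets if consequent <= set(b))
    n_ac = sum(1 for b in baskets if (antecedent | consequent) <= set(b))
    if n_a == 0 or n_c == 0:
        raise ValueError('antecedent or consequent never occurs')
    support = n_ac / n                # share of baskets containing both sides
    confidence = n_ac / float(n_a)    # P(consequent | antecedent)
    lift = confidence / (n_c / n)     # confidence relative to the base rate
    return support, confidence, lift

# Toy baskets, for illustration only.
toy = [{'6x0', 'trt1'}, {'6x0', 'trt1'}, {'6x0', 'trt0'}, {'6x1', 'trt0'}]
print(rule_metrics(toy, ['6x0'], ['trt1']))
```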