How can I make a conditional expression? - python

I want to compare the modeling output for two data frames.
One data frame has target values from 1 to 8; the other has only 1, 2, 3, 5, 6, 7.
I made a dictionary to map the target values to names, and wrote the code below to collect the predicted probabilities.
import operator
from collections import OrderedDict
import numpy as np
my_dict ={1:'a', 2:'b', 3:'c', 4:'d', 5:'e', 6:'f', 7:'g', 8:'f'}
def func(val):
    for key, value in my_dict.items():
        if val == key:
            return value
    return "There is no such Key"
inputData = [1, 2, 3, 4, 5]
inputData2 = np.array([inputData])
index = 1
result_data = OrderedDict()
# xgb_model is the fitted classifier (not shown)
for x in xgb_model.predict_proba(inputData2,ntree_limit=None, validate_features=False,base_margin=None)[0]:
    result_data[func(index)] = round(x,2)
    index += 1
print("result_name : ", max(result_data.items(), key=operator.itemgetter(1))[0])
print("result_value : ", max(xgb_model.predict_proba(inputData2, ntree_limit=None, validate_features=False, base_margin=None)[0]))
print(result_data)
But with the second data frame, the keys are shifted.
For example, a:0.2, b:0.2, c:0.1, e:0.1, f:0.1, g:0.3 should appear, but the actual output is:
a:0.2, b:0.2, c:0.1, d:0.1, e:0.1, f:0.3
I don't know how to fix this.
So I tried the code below, which skips the indices that are missing from the target values.
But only a:0.2, b:0.2, c:0.1 comes out, and then it stops.
for x in xgb_model.predict_proba(inputData2,ntree_limit=None, validate_features=False,base_margin=None)[0]:
    if index not in y.target.unique().tolist():
        continue
    result_data[func(index)] = round(x,2)
    index += 1
Please let me know if anything in the code is unclear.
I hope someone can help. Thank you.

In the second model that has 8 coefficients, you overwrite the value for f since it is defined both for the 6th as well as for the 8th element. Your dict should be defined as:
my_dict ={1:'a', 2:'b', 3:'c', 4:'d', 5:'e', 6:'f', 7:'g', 8:'h'}
But you could make the code much simpler by just using a string ("_abcdefgh") to get the correct letter for each index. You could, then, just use result_data[mystring[i]]= and drop the function.
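A minimal sketch of that idea, assuming the probability columns come in the order of the sorted unique target values (worth verifying for your own model). The names classes and probs are hypothetical stand-ins for the target values of the second data frame and one row of predict_proba output.

```python
# Hypothetical stand-ins: the labels seen in training and one row of
# predict_proba output. Assumes the columns follow the sorted labels.
labels = "_abcdefgh"                     # labels[1] == 'a', ..., labels[8] == 'h'
classes = [1, 2, 3, 5, 6, 7]             # target values present in the second data frame
probs = [0.2, 0.2, 0.1, 0.1, 0.1, 0.3]   # one probability per class

# Pair each probability with its real class label, not a running counter.
result_data = {labels[c]: round(p, 2) for c, p in zip(classes, probs)}
print(result_data)
```

With these values the fourth column is stored under 'e' rather than 'd', because the letter is looked up from the class value and not from the column position.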

Related

Creating python dictionaries using for loop

I make a bunch of matrices that I want to store in python dictionaries and I always find myself typing the same thing for every state that I want to build, i.e.
Ne21_1st_state = {}
Ne21_2nd_state = {}
Ne21_3rd_state = {}
Ne21_4th_state = {}
Ne21_5th_state = {}
Ne21_6th_state = {}
...
Ne21_29th_state = {}
Ne21_30th_state = {}
Can somebody help me automate this using python for loops?
Thanks in advance!
I want something like this:
for i in range(3, 11):
    states = f'Ar36_{i}th_state'
    print(states)
where the output would be:
Ar36_3th_state
Ar36_4th_state
Ar36_5th_state
Ar36_6th_state
Ar36_7th_state
Ar36_8th_state
Ar36_9th_state
Ar36_10th_state
but instead of printing the names, it would create individual dictionaries named Ar36_3th_state, Ar36_4th_state, Ar36_5th_state, ...
Couldn't you make a list of dictionaries instead?
That is, a list of 30 (or any N) elements, where each element is a dictionary with key = "Ar36_{i}th_state" and value = {whatever value you want}.
You can build the name of each pseudo-variable as a string and use it as a key in a dictionary, like this:
my_dic = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
solution = {}
for i in range(1, 31):
    name = 'Ne21_'+str(i)+'st_state'
    #solution[name] = dict(my_dic)  # start each state from a copy of my_dic
    solution[name] = {}  # a fresh empty dict per state, not one shared object
for pseudo_variable in solution:
    print(pseudo_variable, solution[pseudo_variable])
print(solution['Ne21_16st_state'])
for pseudo_variable in solution:
    if '_16st' in pseudo_variable:
        print(pseudo_variable, solution[pseudo_variable])
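The same idea can be sketched more compactly with a dict comprehension. The ordinal helper below is hypothetical (it is not part of the question or the answer) and is only there to produce the 1st/2nd/3rd/4th suffixes from the question.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# One independent, empty dictionary per state name.
states = {f"Ne21_{ordinal(i)}_state": {} for i in range(1, 31)}

states["Ne21_2nd_state"]["matrix"] = [[1, 0], [0, 1]]  # fill one state
print(len(states), list(states)[:3])
```

Each value is created by its own {} expression, so filling one state does not affect the others.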
One way I've done this is using list comprehension.
key = list(
    str(input(f"Please enter a Key for value {x + 1}: "))
    if x == 0
    else str(input(f"\nPlease enter a Key for value {x + 1}: "))
    for x in range(3))
value = list(str(input(f"\nPlease enter a Bool for value {x + 1}: "))
             for x in range(3))
BoolValues = dict(zip(key, value))
I first create a list of keys, followed by a list of the values to be stored under those keys. Then I zip them together into a dictionary. The conditional expression in the first list is only there for a slightly better user experience: \n is added for every prompt past the first.
Actually, now that I look back at the question, it may be slightly different from what I was thinking. Are you trying to create a new dictionary for every matrix? If so, is it something similar to this question: How do you create different variable names while in a loop?

How to structure dictionary to apply to function with enumerate

I am trying to rebuild a simple function that asks for a dictionary as input. No matter what I try, I cannot figure out a minimum working example of a dictionary to pass to this function. I've read up on dictionaries, and there is not much room to construct one differently, so I do not know what the problem is.
I've tried the following minimal dictionary examples:
import nltk
#Different dictionaries to try as minimum working examples:
comments1 = {1 : 'Rockies', 2: 'Red Sox'}
comments2 = {'key1' : 'Rockies', 'key2': 'Red Sox'}
comments3 = dict([(1, 3), (2, 3)])
#Function:
def tokenize_body(comments):
    tokens = {}
    for idx, com_id in enumerate(comments):
        body = comments[com_id]['body']
        tokenized = [x.lower() for x in nltk.word_tokenize(body)]
        tokens[com_id] = tokenized
    return tokens
tokens = tokenize_body(comments1)
I know that with enumerate I am basically getting the index and the key, but I cannot figure out how to access the 'body', i.e. the strings that I want to tokenize.
For both comments1 and comments2 with strings as inputs I receive the error: TypeError: string indices must be integers.
If I apply integers instead of strings, comments3, I receive the error:
TypeError: 'int' object is not subscriptable.
This may seem trivial to you, but I cannot figure out what I am doing wrong. A minimum working example would be highly appreciated.
To loop through a dictionary in Python and get both keys and values, use the items method:
import nltk
comments = {"key1": "word", "key2": "word2"}
def tokenize_body(comments):
    tokens = {}
    for key, value in comments.items():
        # values - word, word2
        # keys - key1, key2
        tokens[key] = [x.lower() for x in nltk.word_tokenize(value)]
    return tokens
enumerate is mostly useful with sequences such as lists, where you want the index of each element:
l = ['a', 'b']
for index, elm in enumerate(l):
    print(index) # => 0, 1
You might be looking for .items(), e.g.:
for idx, item in enumerate(comments1.items()):
    print(idx, item)
This will print
0 (1, 'Rockies')
1 (2, 'Red Sox')
See a demo on ideone.com.
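For completeness, here is a sketch of an input that fits the original function unchanged in spirit: because it reads comments[com_id]['body'], each value has to be a dictionary with a 'body' key. The sample comments are made up, and a simple regular expression stands in for nltk.word_tokenize so the example runs without NLTK or its tokenizer data; that substitution is an assumption, not what the original code does.

```python
import re

def tokenize_body(comments):
    """Lower-case and tokenize the 'body' text of every comment."""
    tokens = {}
    for com_id, comment in comments.items():
        body = comment["body"]
        # Stand-in for nltk.word_tokenize: split on runs of word characters.
        tokens[com_id] = [word.lower() for word in re.findall(r"\w+", body)]
    return tokens

# Each value is itself a dict with a 'body' key (made-up sample data).
comments = {1: {"body": "Go Rockies"}, 2: {"body": "Red Sox win"}}
tokens = tokenize_body(comments)
print(tokens)
```

The TypeError in the question comes from indexing a plain string (or int) with 'body'; nesting the text one level deeper, as above, is what the function's lookup implies.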

How to reduce on a list of tuples in python

I have an array and I want to count the occurrence of each item in the array.
I have managed to use a map function to produce a list of tuples.
def mapper(a):
    return (a, 1)
r = list(map(mapper, arr))
# output example:
# (11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)
I'm expecting the reduce function can help me to group counts by the first number (id) in each tuple. For example:
(11817685, 2), (2014036792, 1), (2014047115, 1)
I tried
cnt = reduce(lambda a, b: a + b, r)
and some other ways, but none of them does the trick.
NOTE
Thanks for all the advice on other ways to solve the problems, but I'm just learning Python and how to implement a map-reduce here, and I have simplified my real business problem a lot to make it easy to understand, so please kindly show me a correct way of doing map-reduce.
You could use Counter:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)
print(list(zip(counter.keys(), counter.values())))
EDIT:
As pointed out by @ShadowRanger, Counter has an items() method:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
print(list(Counter(arr).items()))
Instead of importing a module, you can use some simple logic and do it by hand:
track={}
for intr in arr:
    if intr not in track:
        track[intr]=1
    else:
        track[intr]+=1
Example code :
For this type of list problem there is a common pattern.
Suppose you have a list:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
And you want to convert it to a dict with the first element of each tuple as the key and the second element as the value, something like:
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch: some tuples, such as (2006,1) and (2006,5), have the same key but different values. You want those values collected under a single key, so the expected output is:
{2008: [9], 2006: [1, 5], 2007: [4]}
For this type of problem we do the following.
First create a new dict, then follow this pattern:
if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])
So we first check whether the key is already in the new dict; if it is, we append the value of the duplicate key to its list:
full code:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict={}
for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])
print(new_dict)
output:
{2008: [9], 2006: [1, 5], 2007: [4]}
After writing my answer to a different question, I remembered this post and thought it would be helpful to write a similar answer here.
Here is a way to use reduce on your list to get the desired output.
from functools import reduce
arr = [11817685, 2014036792, 2014047115, 11817685]
def mapper(a):
    return (a, 1)
def reducer(x, y):
    if isinstance(x, dict):
        ykey, yval = y
        if ykey not in x:
            x[ykey] = yval
        else:
            x[ykey] += yval
        return x
    else:
        xkey, xval = x
        ykey, yval = y
        a = {xkey: xval}
        if ykey in a:
            a[ykey] += yval
        else:
            a[ykey] = yval
        return a
mapred = reduce(reducer, map(mapper, arr))
print(list(mapred.items()))
Which prints:
[(11817685, 2), (2014036792, 1), (2014047115, 1)]
Please see the linked answer for a more detailed explanation.
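A shorter variant, as a sketch: passing an empty dict as the third argument to reduce (the initial value) means the accumulator is always a dict, so the reducer no longer needs the isinstance branch. The function names mirror the answer above; this is one possible way to arrange a small map-reduce, not the only one.

```python
from functools import reduce

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    # Map step: emit a (key, 1) pair for every item.
    return (a, 1)

def reducer(counts, pair):
    # Reduce step: fold one (key, value) pair into the running dict of counts.
    key, value = pair
    counts[key] = counts.get(key, 0) + value
    return counts

# The empty dict is the initial accumulator, so `counts` is a dict on every call.
counts = reduce(reducer, map(mapper, arr), {})
print(list(counts.items()))
```

The initial value also makes the call safe for an empty input list, where reduce without an initializer would raise a TypeError.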
If all you need is cnt, then a dict would probably be better than a list of tuples here (if you need this format, just use dict.items).
The collections module has a useful data structure for this, a defaultdict.
from collections import defaultdict
cnt = defaultdict(int)  # create a default dict where the default value is
                        # the result of calling int
for key in arr:
    cnt[key] += 1  # if key is not in cnt, it will put in the default
# cnt_list = list(cnt.items())

Check Multiple Values and Change them

Heyo everyone, I have a question.
I have three variables, rF, tF, and dF.
Now these values can range from -100 to +100. I want to check all of them and see if they are less than 1; if they are, set them to 1.
An easy way of doing this is just 3 if statements, like
if rF < 1:
    rF = 1
if tF < 1:
    tF = 1
if dF < 1:
    dF = 1
However, as you can see, this looks bad, and if I had, say, 50 of these values, it could get out of hand quite easily.
I tried to put them in an array like so:
for item in [rF, tF, dF]:
    if item < 1:
        item = 1
However this doesn't work. I believe that when you do that you create a completely different object (the array), and when you change the items you are not changing the variables themselves but the values of the array.
So my question is: What is an elegant way of doing this?
Why not use a dictionary, if you've only got three variables of which to keep track?
rF, tF, dF = 100, -100, 1
d = {'rF': rF, 'tF': tF, 'dF': dF}
for k in d:
    if d[k] < 1:
        d[k] = 1
print(d)
{'rF': 100, 'tF': 1, 'dF': 1}
Then if you're referencing any of those values later, you can simply do this (as a trivial example):
def f(var):
    print("'%s' is equal to %d" % (var, d[var]))
>>> f('rF')
'rF' is equal to 100
If you really wanted to use lists, and you knew the order of your list, you could do this (but dictionaries are made for this type of problem):
arr = [rF, tF, dF]
arr = [1 if x < 1 else x for x in arr]
print(arr)
[100, 1, 1]
Note that the list comprehension approach won't actually change the values of rF, tF, and dF.
You can simply use a dictionary and then unpack the dict:
d = {'rF': rF, 'tF': tF, 'dF': dF}
for key in d:
    if d[key] < 1:
        d[key] = 1
rF, tF, dF = d['rF'], d['tF'], d['dF']
You can use the following instead of the last line:
rF, tF, dF = map(d.get, ('rF', 'tF', 'dF'))
Here's exactly what you asked for:
rF = -3
tF = 9
dF = -2
myenv = locals()
for k in list(myenv.keys()):
    if len(k) == 2 and k[1] == "F":
        myenv[k] = max(1, myenv[k])
print(rF, tF, dF)
# prints 1 9 1
This may accidentally modify variables you don't want to change, and it only works at module level, where locals() is the same dictionary as globals(); inside a function, writing to locals() does not change the local variables. So I recommend using a proper data structure instead of hacking the user environment.
Edit: Fixed a RuntimeError: dictionary changed size during iteration. A dictionary cannot change size while it is being iterated over. Avoid this by first copying the dictionary keys into a list and iterating over that copy instead of the dictionary itself. It should work in both Python 2 and 3 now; before, it only worked in Python 2.
Use a list comprehension and the max function:
items = [-32, 0, 43]
items = [max(1, item) for item in items]
rF, tF, dF = items
print(rF, tF, dF)
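As a rough sketch of why the for loop in the question has no effect, and of a compact alternative: assigning to the loop variable only rebinds that name and never touches rF, tF, or dF. Rebinding all three names at once with tuple unpacking avoids keeping an intermediate container around. The starting values are made up for illustration.

```python
rF, tF, dF = -3, 9, -2   # example values, assumed for illustration

for item in [rF, tF, dF]:
    if item < 1:
        item = 1          # rebinds the name `item` only
print(rF, tF, dF)         # the three variables are unchanged here

# Clamp each value to a minimum of 1 and rebind all three names at once.
rF, tF, dF = (max(1, v) for v in (rF, tF, dF))
print(rF, tF, dF)
```

This stays readable for three names; for something like 50 values, a dictionary or list as suggested in the answers above is the better fit.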

Append several variables to a list in Python

I want to append several variables to a list. The number of variables varies. All variables start with "volume". I was thinking maybe a wildcard or something would do it. But I couldn't find anything like this. Any ideas how to solve this? Note in this example it is three variables, but it could also be five or six or anything.
volumeA = 100
volumeB = 20
volumeC = 10
vol = []
vol.append(volume*)
You can use extend to append any iterable to a list:
vol.extend((volumeA, volumeB, volumeC))
Depending on the prefix of your variable names has a bad code smell to me, but you can do it. (The values come out in dictionary iteration order, which is arbitrary before Python 3.7.)
vol.extend(value for name, value in locals().items() if name.startswith('volume'))
If order is important (IMHO, still smells wrong):
vol.extend(value for name, value in sorted(locals().items(), key=lambda item: item[0]) if name.startswith('volume'))
Although you can do
vol = []
vol += [val for name, val in globals().items() if name.startswith('volume')]
# replace globals() with locals() if this is in a function
a much better approach would be to use a dictionary instead of similarly-named variables:
volume = {
'A': 100,
'B': 20,
'C': 10
}
vol = []
vol += volume.values()
Note that in the latter case the order of items is unspecified before Python 3.7, that is, you can get [100,10,20] or [10,20,100]; from 3.7 on, dicts keep insertion order. To add items in key order, use:
vol += [volume[key] for key in sorted(volume)]
EDIT: removed the filter from the list comprehension, as it was pointed out that it was an appalling idea.
I've changed it so it's not too similar to all the other answers.
volumeA = 100
volumeB = 20
volumeC = 10
lst = list(map(lambda x : x[1], filter(lambda x : x[0].startswith('volume'), globals().items())))
print(lst)
Output (the order follows dictionary iteration order, so it can differ before Python 3.7)
[100, 20, 10]
Do you want to add the variables' names as well as their values?
output=[]
output.extend([(k,v) for k,v in globals().items() if k.startswith('volume')])
or just the values:
output.extend([v for k,v in globals().items() if k.startswith('volume')])
If I understand the question correctly, you are trying to append the values of different variables to a list. See the example below.
Assuming:
email = 'example@gmail.com'
pwd='Mypwd'
values = []
values.append(email)
values.append(pwd)
for row in values:
    print(row)
# the output is:
# example@gmail.com
# Mypwd
Hope this helps, thank you.
