Fetching values from a list in Python - python

values = ['random word1', 20, 'random word2', 54]
values = list(map(list,zip(values[::2], values[1::2])))
print(values)
[['random word1', 20], ['random word2', 54]]
Now I want to fetch values
like this:
if ‘random word2’ in values:
get the value associated with random word 2 which is in this case 54.
Note: string words and values are random.
How can I do that?
I need to fetch the correct value.

I think a dictionary over a list is the way to go
d = dict(zip(values[::2], values[1::2]))
print(d['random word1']) # -> 20

try:
value = values[values.index("random words2") + 1]
except ValueError:
value = None
Or, if you need to fecth more values, convert your list to a dictionary and then use the get method. Even for 2 retrievals this will already be more efficient, algorithmically wise:
# make N 2-tuples treating the odd-positioned elements as keys and
# even ones as values, and call the dict constructor with that:
v = dict([i:i+2] for i in range(0, len(values), 2))
v.get("random word2")

Related

What is the correct way to write this function?

I was making a program where first parameter is a list and second parameter is a list of dictionaries. I want to return a list of lists like this:
As an example, if this were a function call:
make_lists(['Example'],
[{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]
)
the expected return value would be:
[ ['Made-up'] ]
As an second example, if this were a function call:
make_lists(['Hint', 'Num'],
[{'Hint': 'Length 2 Not Required', 'Num' : 8675309},
{'Num': 1, 'Hint' : 'Use 1st param order'}]
)
the expected return value would be:
[ ['Length 2 Not Required', 8675309],
['Use 1st param order', 1]
]
I have written a code for this but my code does not return a list of lists, it just returns a single list. Please can someone explain?
def make_lists(s,lod):
a = []
lol =[]
i = 0
for x in lod:
for y in x:
for k in s:
if(y==k):
lol.append(x.get(y))
i = i+1
return lol
Expected Output:
[ ['Length 2 Not Required', 8675309],['Use 1st param order', 1] ]
Output:
['Length 2 Not Required', 8675309, 1, 'Use 1st param order']
The whole point of dictionaries, is that you can access them by key:
def make_lists(keys, dicts):
result = []
for d in dicts:
vals = [d[k] for k in keys if k in d]
if len(vals) > 0:
result.append(vals)
return result
Let's have a look what happens here:
We still have the result array, which accumulates the answers, but now it's called result instead of lol
Next we iterate through every dictionary:
for d in dicts:
For each dictionary d, we create a list, which is a lookup in that dictionary for the keys in keys, if the key k is in the dictionary d:
vals = [d[k] for k in keys if k in d]
The specs don't detail this, but I assume if none of the keys are in the dictionary, you don't want it added to the array. For that, we have a check if vals have any results, and only then we add it to the results:
if len(vals) > 0:
result.append(vals)
Try this code - I've managed to modify your existing code slighty, and added explanation in the comments. Essentially, you just need to use a sub-list and add that to the master list lol, and then in each loop iteration over elements in lod, append to the sub-list instead of the outermost list.
def make_lists(s,lod):
a = []
lol =[]
i = 0
for x in lod:
## Added
# Here we want to create a new list, and add it as a sub-list
# within 'lol'
lols = []
lol.append(lols)
## Done
for y in x:
for k in s:
if(y==k):
# Changed 'lol' to 'lols' here
lols.append(x.get(y))
i = i+1
return lol
print(make_lists(['Example'], [{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Hint': 'Length 2 Not Required', 'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Prints:
[['Made-up']]
[['Length 2 Not Required', 8675309], [1, 'Use 1st param order']]
A simpler solution
For a cleaner (and potentially more efficient approach), I'd suggest using builtins like map and using a list comprehension to tackle this problem:
def make_lists(s, lod):
return [[*map(dict_obj.get, s)] for dict_obj in lod]
But note, that this approach includes elements as None in cases where the desired keys in s are not present in the dictionary objects within the list lod.
To work around that, you can pass the result of map to the filter builtin function so that None values (which represent missing keys in dictionaries) are then stripped out in the result:
def make_lists(s, lod):
return [[*filter(None, map(dict_obj.get, s))] for dict_obj in lod]
print(make_lists(['Example'], [{'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Output:
[[]]
[[8675309], ['Use 1st param order', 1]]

How to structure dictionary to apply to function with enumerate

I am trying to re-build a simple function, that ask for a dictionary as an input. No matter what I try I cannot figure out a minimum working example of a dictionary to pass through this function. I've read upon dictionaries and there is not so much room to create it differently, hence I do not know what the problem is.
I've tried to apply following minimum dictionary examples:
import nltk
#Different dictionaries to try as minimum working examples:
comments1 = {1 : 'Rockies', 2: 'Red Sox'}
comments2 = {'key1' : 'Rockies', 'key2': 'Red Sox'}
comments3 = dict([(1, 3), (2, 3)])
#Function:
def tokenize_body(comments):
tokens = {}
for idx, com_id in enumerate(comments):
body = comments[com_id]['body']
tokenized = [x.lower() for x in nltk.word_tokenize(body)]
tokens[com_id] = tokenized
return tokens
tokens = tokenize_body(comments1)
I know that with enumerate I am basically calling the index and the key, I can not figure out how to call the 'body', i.e the strings that I want to tokenize.
For both comments1 and comments2 with strings as inputs I receive the error: TypeError: string indices must be integers.
If I apply integers instead of strings, comments3, I receive the error:
TypeError: 'int' object is not subscriptable.
This may seem trivial to you, but I can not figure out what I am doing wrong. If you could provide a minimum working example, that would be highly appreciated.
In order to loop through a dictionary in python, you need to use the items method to get both keys and values:
comments = {"key1": "word", "key2": "word2"}
def tokenize_body(comments):
tokens = {}
for key, value in comments.items():
# values - word, word2
# keys - key1, key2
tokens[key] = [x.lower() for x in nltk.word_tokenize(value)]
return tokens
enumerate is used for lists, in order to get the index of an element:
l = ['a', 'b']
for index, elm in enumerate(l):
print(index) # => 0, 1
You might be looking for .items(), e.g.:
for idx, item in enumerate(comments1.items()):
print(idx, item)
This will print
0 (1, 'Rockies')
1 (2, 'Red Sox')
See a demo on ideone.com.

I want to convert the categorical variable to numerical in Python

I have a dataframe having categorical variables. I want to convert them to the numerical using the following logic:
I have 2 lists one contains the distinct categorical values in the column and the second list contains the values for each category. Now i need to map these values in place of those categorical values.
For Eg:
List_A = ['A','B','C','D','E']
List_B = [3,2,1,1,2]
I need to replace A with 3, B with 2, C and D with 1 and E with 2.
Is there any way to do this in Python.
I can do this by applying multiple for loops but I am looking for some easier way or some direct function if there is any.
Any help is very much appreciated, Thanks in Advance.
Create a mapping dict
List_A = ['A','B','C','D','E',]
List_B = [3,2,1,1,2]
d=dict(zip(List_A, List_B))
new_list=['A','B','C','D','E','A','B']
new_mapped_list=[d[v] for v in new_list if v in d]
new_mapped_list
Or define a function and use map
List_A = ['A','B','C','D','E',]
List_B = [3,2,1,1,2]
d=dict(zip(List_A, List_B))
def mapper(value):
if value in d:
return d[value]
return None
new_list=['A','B','C','D','E','A','B']
map(mapper,new_list)
Suppose df is your data frame and "Category" is the name of the column holding your categories:
df[df.Category == "A"] = 3,2, 1, 1, 2
df[(df.Category == "B") | (df.Category == "E") ] = 2
df[(df.Category == "C") | (df.Category == "D") ] = 1
If you only need to replace values in one list with the values of other and the structure is like the one you say. Two list, same lenght and same position, then you only need this:
list_a = []
list_a = list_b
A more convoluted solution would be like this, with a function that will create a dictionary that you can use on other lists:
# we make a function
def convert_list(ls_a,ls_b):
dic_new = {}
for letter,number in zip(ls_a,ls_b):
dic_new[letter] = number
return dic_new
This will make a dictionary with the combinations you need. You pass the two list, then you can use that dictionary on other list:
List_A = ['A','B','C','D','E']
List_B = [3,2,1,1,2]
dic_new = convert_list(ls_a, ls_b)
other_list = ['a','b','c','d']
for _ in other_list:
print(dic_new[_.upper()])
# prints
3
2
1
1
cheers
You could use a solution from machine learning scikit-learn module.
OneHotEncoder
LabelEncoder
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
The pandas "hard" way:
https://stackoverflow.com/a/29330853/9799449

Summing keys and values in a list of dictionaries python

I have a list of dictionaries called "timebucket" :
[{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
I would like to return the top two largest keys (.99 and .98) and average them , plus , get both of their values and average those as well.
Expected output would like something like:
{ (avg. two largest keys) : (avg. values of two largest keys) }
I've tried:
import numpy as np
import heapq
[np.mean(heapq.nlargest(2, i.keys())) for i in timebucket]
but heapq doesn't work in this scenario, and not sure how to keep keys and values linked
Doing this with numpy:
In []:
a = np.array([e for i in timebucket for e in i.items()]);
a[a[:,1].argsort()][:2].mean(axis=0)
Out[]
array([ 0.99084129, 0.00261179])
Though I suspect creating a better data-structure up front would probably be a better approach.
This gives you the average of 2 largest keys (keyave) and the average of the two corresponding values (valave).
The keys and values are put into a dictionary called newdict.
timebucket = [{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
keys = []
for time in timebucket:
for x in time:
keys.append(x)
result = {}
for d in timebucket:
result.update(d)
largestkey = (sorted(keys)[-1])
ndlargestkey = (sorted(keys)[-2])
keyave = (float((largestkey)+(ndlargestkey))/2)
largestvalue = (result[(largestkey)])
ndlargestvalue = (result[(ndlargestkey)])
valave = (float((largestvalue)+(ndlargestvalue))/2)
newdict = {}
newdict[keyave] = valave
print(newdict)
#print(keyave)
#print(valave)
Output
{0.9908412862404705: 0.002611786698168918}
Here is a solution to your problem:
def dothisthing(mydict) # define the function with a dictionary a the only parameter
keylist = [] # create an empty list
for key in mydict: # iterate the input dictionary
keylist.append(key) # add the key from the dictionary to a list
keylist.sort(reverse = True) # sort the list from highest to lowest numbers
toptwokeys = 0 # create a variable
toptwovals = 0 # create a variable
count = 0 # create an integer variable
for item in keylist: # iterate the list we created above
if count <2: # this limits the iterations to the first 2
toptwokeys += item # add the key
toptwovals += (mydict[item]) # add the value
count += 1
finaldict = {(toptwokeys/2):(toptwovals/2)} # create a dictionary where the key and val are the average of the 2 from the input dict with the greatest keys
return finaldict # return the output dictionary
dothisthing({0.9711533363722904: 0.008296776727415599, 0.97163564816067838: 0.008153794130319884, 0.99212783984967068: 0.0022392112909864364, 0.98955473263127025: 0.0029843621053514003})
#call the function with your dictionary as the parameter
I hope it helps
You can do it in just four lines without importing numpy :
One line solution
For two max average keys :
max_keys_average=sorted([keys for item in timebucket for keys,values in item.items()])[::-1][:2]
print(sum(max_keys_average)/len(max_keys_average))
output:
0.9908412862404705
for their keys average :
max_values_average=[values for item in max_keys_average for item_1 in timebucket for keys,values in item_1.items() if item==keys]
print(sum(max_values_average)/len(max_values_average))
output:
0.002611786698168918
If you are facing issue with understanding list comprehension here is detailed solution for you:
Detailed Solution
first step:
get all the keys of dict in one list :
Here is your timebucket list:
timebucket=[{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
now let's store all the keys in one list:
keys_list=[]
for dict in timebucket:
for key,value in dict.items():
keys_list.append(key)
Now next step is sort this list and get last two values of this list :
max_keys=sorted(keys_list)[::-1][:2]
Next step just take sum of this new list and divide by len of list :
print(sum(max_keys)/len(max_keys))
output:
0.9908412862404705
Now just iterate the max_keys and keys in timebucket and see if both item match then get the value of that item in a list.
max_values=[]
for item in max_keys:
for dict in timebucket:
for key, value in dict.items():
if item==key:
max_values.append(value)
print(max_values)
Now last part , just take sum and divide by len of max_values:
print(sum(max_values)/len(max_values))
Gives the output :
0.002611786698168918
This is an alternative solution to the problem:
In []:
import numpy as np
import time
def AverageTB(time_bucket):
tuples = [tb.items() for tb in time_bucket]
largest_keys = []
largest_keys.append(max(tuples))
tuples.remove(max(tuples))
largest_keys.append(max(tuples))
keys = [i[0][0] for i in largest_keys]
values = [i[0][1] for i in largest_keys]
return np.average(keys), np.average(values)
time_bucket = [{0.9711533363722904: 0.008296776727415599},
{0.97163564816067838: 0.008153794130319884},
{0.99212783984967068: 0.0022392112909864364},
{0.98955473263127025: 0.0029843621053514003}]
time_exe = time.time()
print('avg. (keys, values): {}'.format(AverageTB(time_bucket)))
print('time: {}'.format(time.time() - time_exe))
Out[]:
avg. (keys, values): (0.99084128624047052, 0.0026117866981689181)
time: 0.00037789344787

Find last occurrence of any item of one list in another list in python

I have the following 2 lists in python:
ll = [500,500,500,501,500,502,500]
mm = [499,501,502]
I want to find out the position of last occurence of any item in mm, in the list ll. I can do this for a single element like this:
len(ll) - 1 - ll[::-1].index(502)
>> 5
Here ll[::-1].index(502) provides position of last occurence of 502 and len(ll) - 1 gives the total length.
How do I extend this to work for the entire list mm? I know I can write a function, but is there a more pythonic way
If you want all the last indices of each item in ll present in mm, then:
ll = [500,500,500,501,500,502,500]
mm = [499,501,502]
d = {v:k for k,v in enumerate(ll) if v in mm}
# {501: 3, 502: 5}
It's probably worth creating a set from mm first to make it an O(1) lookup, instead of O(N), but for three items, it's really not worth it.
Following #Apero's concerns about retaining missing indices as None and also using a hash lookup to make it an O(1) lookup...
# Build a key->None dict for all `mm`
d = dict.fromkeys(mm)
# Update `None` values with last index using a gen-exp instead of dict-comp
d.update((v,k) for k,v in enumerate(ll) if v in d)
# {499: None, 501: 3, 502: 5}
results = {}
reversed = ll[::-1]
for item in mm:
try:
index = ((len(ll) - 1) - reversed.index(item))
except ValueError:
index = None
finally:
results[item] = index
print results
Output:
{499: None, 501: 3, 502: 5}
You can do this with a list comprehension and the max function. In particular, for each element in ll, iterate through mm to create a list of indices where ll[i] = mm[i], and take the max of this list.
>>> indices = [ max([-1] + [i for i, m in enumerate(mm) if m == n]) for n in ll ]
>>> indices
[-1, -1, -1, 1, -1, 2, -1]
Note that you need to add [-1] in case there are no matching indices in mm.
All this being said, I don't think it's more Python-ic than writing a function and using a list comprehension (as you alluded to in your question).

Categories