I am trying to rebuild a simple function that asks for a dictionary as input. No matter what I try, I cannot figure out a minimal working example of a dictionary to pass to this function. I've read up on dictionaries, and there is not much room to create them differently, so I do not know what the problem is.
I've tried the following minimal dictionary examples:
import nltk
# Different dictionaries to try as minimal working examples:
comments1 = {1 : 'Rockies', 2: 'Red Sox'}
comments2 = {'key1' : 'Rockies', 'key2': 'Red Sox'}
comments3 = dict([(1, 3), (2, 3)])
# Function:
def tokenize_body(comments):
    tokens = {}
    for idx, com_id in enumerate(comments):
        body = comments[com_id]['body']
        tokenized = [x.lower() for x in nltk.word_tokenize(body)]
        tokens[com_id] = tokenized
    return tokens
tokens = tokenize_body(comments1)
I know that with enumerate I get the index and the key, but I cannot figure out how to get to the 'body', i.e. the strings that I want to tokenize.
For both comments1 and comments2, which have strings as values, I receive the error: TypeError: string indices must be integers.
If I use integers instead of strings (comments3), I receive the error:
TypeError: 'int' object is not subscriptable.
This may seem trivial to you, but I cannot figure out what I am doing wrong. If you could provide a minimal working example, that would be highly appreciated.
To loop through a dictionary in Python and get both keys and values, use the items method:
import nltk
comments = {"key1": "word", "key2": "word2"}
def tokenize_body(comments):
    tokens = {}
    for key, value in comments.items():
        # values - word, word2
        # keys - key1, key2
        tokens[key] = [x.lower() for x in nltk.word_tokenize(value)]
    return tokens
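The original tokenize_body reads comments[com_id]['body'], so it expects each value to be a dictionary with a 'body' key, not a bare string. Here is a minimal sketch of input with that shape. To keep it runnable without NLTK's tokenizer data, it swaps nltk.word_tokenize for plain str.split (an assumption: that only splits on whitespace), and the comment texts are made up:

```python
def tokenize_body(comments):
    """Lower-case and tokenize the 'body' of every comment, keyed by comment id."""
    tokens = {}
    for com_id, comment in comments.items():
        body = comment['body']  # each value is itself a dict with a 'body' key
        # str.split stands in for nltk.word_tokenize here (whitespace-only splitting)
        tokens[com_id] = [x.lower() for x in body.split()]
    return tokens


# Hypothetical minimal input: each comment id maps to a dict holding its text.
comments = {
    1: {'body': 'Rockies win again'},
    2: {'body': 'Red Sox lose'},
}
print(tokenize_body(comments))
# {1: ['rockies', 'win', 'again'], 2: ['red', 'sox', 'lose']}
```

With NLTK installed and its tokenizer data downloaded, replacing body.split() with nltk.word_tokenize(body) gives the original behaviour.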
enumerate works on any iterable (most often a list) and yields the index of each element alongside the element:
l = ['a', 'b']
for index, elm in enumerate(l):
    print(index) # => 0, 1
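Iterating over a dictionary, which is what enumerate(comments) does, yields its keys, not its values. That is why, in the question, comments[com_id] is already the string 'Rockies', and indexing a string with 'body' raises the TypeError. A short sketch of what the loop actually sees:

```python
comments1 = {1: 'Rockies', 2: 'Red Sox'}

# enumerate over a dict pairs a running index with each *key*
for idx, com_id in enumerate(comments1):
    print(idx, com_id, comments1[com_id])
# 0 1 Rockies
# 1 2 Red Sox

# comments1[1] is the string 'Rockies'; indexing it with 'body' is what fails
try:
    comments1[1]['body']
except TypeError as exc:
    print(exc)  # the "string indices must be integers" error from the question
```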
You might be looking for .items(), e.g.:
for idx, item in enumerate(comments1.items()):
    print(idx, item)
This will print
0 (1, 'Rockies')
1 (2, 'Red Sox')
I have a dictionary file; here are a few example lines:
acquires,1.09861228867
acquisition,1.09861228867
acquisitions,1.60943791243
acquisitive,0.69314718056
acridine,0.0
acronyms,1.09861228867
acrylics,0.69314718056
actual,1.60943791243
words = ['acquires', 'acrylics', 'actual', 'acridine']
I need the output to be:
word_tuples = ((1.09861228867, 'acquires'), (0.69314718056, 'acrylics'), (1.60943791243, 'actual'), (0.0, 'acridine'))
I tried:
sorted_list = []
word_tuples = [(key,value) for key, value in dict]
if words in word_tuples:
    sorted_list.append(word_tuples[value])
You can do something like this:
# named my_dict to avoid shadowing the built-in dict
my_dict = {"acquires":1.09861228867, "acquisition":1.09861228867, "acquisitions":1.60943791243,
           "acquisitive":0.69314718056, "acridine":0.0, "acronyms":1.09861228867, "acrylics":0.69314718056, "actual":1.60943791243}
words = ["acquires", "acrylics", "actual", "acridine"]
tuple_list = list()
for key, value in my_dict.items():
    if key in words:
        tuple_list.append((value, key))
print(tuple_list)
You could do it by passing a generator expression to the tuple constructor:
from operator import itemgetter
my_dict = {
    'acquires': 1.09861228867,
    'acquisition': 1.09861228867,
    'acquisitions': 1.60943791243,
    'acquisitive': 0.69314718056,
    'acridine': 0.0,
    'acronyms': 1.09861228867,
    'acrylics': 0.69314718056,
    'actual': 1.60943791243,
}
words = {'acquires', 'acrylics', 'actual', 'acridine'}
word_tuples = tuple((value, word) for word, value in
                    sorted(my_dict.items(), key=itemgetter(0)) if word in words)
Note that I made words a set of strings instead of a list, because the if word in words membership check takes O(1) time on average for a set versus O(n) for a list.
I would consider iterating over the words list and checking the dictionary for each element. The reason for not doing it the other way around is that searching for an element in a list has a complexity of O(n), while a dictionary lookup is O(1) on average and therefore much faster.
Here is my solution:
my_dict = {"acquires":1.09861228867, "acquisition":1.09861228867, "acquisitions":1.60943791243,"acquisitive":0.69314718056, "acridine":0.0, "acronyms":1.09861228867, "acrylics":0.69314718056,"actual":1.60943791243}
words = ["acquires", "acrylics", "actual", "acridine"]
word_tuples = list()
for word in words:
    if word in my_dict:
        # (value, word) order, as in the desired output
        word_tuples.append((my_dict[word], word))
print(word_tuples)
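If the goal is exactly the tuple of (value, word) pairs from the question, in the order of words, the same per-word dictionary lookup fits in one generator expression passed to tuple. A sketch, repeating my_dict and words so the block runs on its own:

```python
my_dict = {"acquires": 1.09861228867, "acquisition": 1.09861228867,
           "acquisitions": 1.60943791243, "acquisitive": 0.69314718056,
           "acridine": 0.0, "acronyms": 1.09861228867,
           "acrylics": 0.69314718056, "actual": 1.60943791243}
words = ["acquires", "acrylics", "actual", "acridine"]

# Walk the words (keeps their order) and look each one up in the dict
word_tuples = tuple((my_dict[word], word) for word in words if word in my_dict)
print(word_tuples)
# ((1.09861228867, 'acquires'), (0.69314718056, 'acrylics'), (1.60943791243, 'actual'), (0.0, 'acridine'))
```

Words missing from my_dict are silently skipped by the if clause.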
I am writing a function whose first parameter is a list of keys and whose second parameter is a list of dictionaries. I want it to return a list of lists, like this:
As an example, if this were a function call:
make_lists(['Example'],
[{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]
)
the expected return value would be:
[ ['Made-up'] ]
As a second example, if this were a function call:
make_lists(['Hint', 'Num'],
[{'Hint': 'Length 2 Not Required', 'Num' : 8675309},
{'Num': 1, 'Hint' : 'Use 1st param order'}]
)
the expected return value would be:
[ ['Length 2 Not Required', 8675309],
['Use 1st param order', 1]
]
I have written code for this, but it returns a single flat list instead of a list of lists. Can someone explain why?
def make_lists(s,lod):
    a = []
    lol =[]
    i = 0
    for x in lod:
        for y in x:
            for k in s:
                if(y==k):
                    lol.append(x.get(y))
                    i = i+1
    return lol
Expected Output:
[ ['Length 2 Not Required', 8675309],['Use 1st param order', 1] ]
Output:
['Length 2 Not Required', 8675309, 1, 'Use 1st param order']
The whole point of dictionaries is that you can access values by key:
def make_lists(keys, dicts):
    result = []
    for d in dicts:
        vals = [d[k] for k in keys if k in d]
        if len(vals) > 0:
            result.append(vals)
    return result
Let's look at what happens here:
We still have the result list, which accumulates the answers, but now it's called result instead of lol.
Next we iterate through every dictionary:
for d in dicts:
For each dictionary d, we build a list by looking up each key k from keys in d, skipping any key that d does not contain:
vals = [d[k] for k in keys if k in d]
The specs don't say what to do here, but I assume that if none of the keys are in the dictionary, you don't want it added to the result. So we check whether vals has any results, and only then add it to result:
if len(vals) > 0:
    result.append(vals)
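Running this version on the two calls from the question is a quick way to check that it returns one sub-list per dictionary, with values in the order of the keys list (the function is repeated so the block runs on its own):

```python
def make_lists(keys, dicts):
    result = []
    for d in dicts:
        # look up each requested key, in the order given by `keys`
        vals = [d[k] for k in keys if k in d]
        if len(vals) > 0:  # skip dictionaries that contain none of the keys
            result.append(vals)
    return result


print(make_lists(['Example'], [{'Example': 'Made-up', 'Extra Keys': 'Possible'}]))
# [['Made-up']]
print(make_lists(['Hint', 'Num'],
                 [{'Hint': 'Length 2 Not Required', 'Num': 8675309},
                  {'Num': 1, 'Hint': 'Use 1st param order'}]))
# [['Length 2 Not Required', 8675309], ['Use 1st param order', 1]]
```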
Try this code: I've modified your existing code slightly and added explanations in the comments. Essentially, you need to create a sub-list for each dictionary in lod, add it to the outer list lol, and then append to that sub-list instead of the outermost list.
def make_lists(s,lod):
    lol =[]
    for x in lod:
        ## Added
        # Here we want to create a new list, and add it as a sub-list
        # within 'lol'
        lols = []
        lol.append(lols)
        ## Done
        # Loop over s on the outside so values come out in the order of s
        for k in s:
            for y in x:
                if(y==k):
                    # Changed 'lol' to 'lols' here
                    lols.append(x.get(y))
    return lol
print(make_lists(['Example'], [{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Hint': 'Length 2 Not Required', 'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Prints:
[['Made-up']]
[['Length 2 Not Required', 8675309], ['Use 1st param order', 1]]
A simpler solution
For a cleaner (and potentially more efficient) approach, I'd suggest using built-ins like map together with a list comprehension:
def make_lists(s, lod):
    return [[*map(dict_obj.get, s)] for dict_obj in lod]
Note, however, that this approach includes None for any key in s that is missing from a dictionary in lod.
To work around that, you can pass the result of map to the filter built-in so that the None values (which represent missing keys) are stripped out of the result:
def make_lists(s, lod):
    return [[*filter(None, map(dict_obj.get, s))] for dict_obj in lod]
print(make_lists(['Example'], [{'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Output:
[[]]
[[8675309], ['Use 1st param order', 1]]
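One caveat: filter(None, ...) drops every falsy value, not just None, so a stored 0, '' or False would disappear as well. A sketch that only skips keys that are genuinely missing, by testing membership instead of filtering on truthiness:

```python
def make_lists(s, lod):
    # keep a value whenever its key exists, even if the value itself is falsy (0, '', False)
    return [[dict_obj[key] for key in s if key in dict_obj] for dict_obj in lod]


print(make_lists(['Hint', 'Num'], [{'Num': 0}, {'Num': 1, 'Hint': 'Use 1st param order'}]))
# [[0], ['Use 1st param order', 1]]
```

With the filter(None, ...) version, the first sub-list would come back empty because 0 is falsy.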
I want to see the model output for two data frames.
One data frame has target values 1 to 8; the other has only 1, 2, 3, 5, 6, 7.
I made a dictionary to map the target values to labels, and wrote the code below to get the predicted probabilities.
import operator
from collections import OrderedDict
import numpy as np
my_dict ={1:'a', 2:'b', 3:'c', 4:'d', 5:'e', 6:'f', 7:'g', 8:'f'}
def func(val):
    for key, value in my_dict.items():
        if val == key:
            return value
    return "There is no such Key"
inputData = [1, 2, 3, 4, 5]
inputData2 = np.array([inputData])
index = 1
result_data = OrderedDict()
# xgb_model is the trained XGBoost model (defined elsewhere)
for x in xgb_model.predict_proba(inputData2,ntree_limit=None, validate_features=False,base_margin=None)[0]:
    result_data[func(index)] = round(x,2)
    index += 1
print("result_name : ", max(result_data.items(), key=operator.itemgetter(1))[0])
print("result_value : ", max(xgb_model.predict_proba(inputData2, ntree_limit=None, validate_features=False, base_margin=None)[0]))
print(result_data)
But for the second data frame, the keys are shifted.
For example, I expect a: 0.2, b: 0.2, c: 0.1, e: 0.1, f: 0.1, g: 0.3, but the actual output is:
a: 0.2, b: 0.2, c: 0.1, d: 0.1, e: 0.1, f: 0.3
I don't know what I should do.
So I tried the code below, but only a: 0.2, b: 0.2, c: 0.1 come out before it ends:
for x in xgb_model.predict_proba(inputData2,ntree_limit=None, validate_features=False,base_margin=None)[0]:
    if index not in y.target.unique().tolist():
        continue
    result_data[func(index)] = round(x,2)
    index += 1
Please let me know if anything in the code is unclear.
Any help is appreciated. Thank you.
In the second model, which has 8 coefficients, you overwrite the value for f, because 'f' is used for both the 6th and the 8th element. Your dict should be defined as:
my_dict ={1:'a', 2:'b', 3:'c', 4:'d', 5:'e', 6:'f', 7:'g', 8:'h'}
But you could make the code much simpler by using a string such as mystring = "_abcdefgh" to get the correct letter for each index (the leading "_" fills position 0, so mystring[1] is 'a'). You could then just write result_data[mystring[index]] = ... and drop the function.
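Independently of the duplicate 'f', the second data frame needs a different labelling: its model returns only six probabilities, one per class that occurs in the targets, so a running index 1, 2, 3, ... cannot line up with them. (In the attempted loop, continue also skips index += 1, so index stays at 4 and every later probability is skipped, which is why only a, b, c appear.) Below is a minimal sketch that pairs each probability with its class. It assumes the probability columns follow the sorted unique target values; please verify this against your model (scikit-learn-style classifiers typically report the order in a classes_ attribute). The numbers are placeholders for predict_proba(...)[0]:

```python
from collections import OrderedDict

my_dict = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h'}

# Assumption: the model was trained on these targets and its probability
# columns are ordered like sorted(set(targets)).
classes = [1, 2, 3, 5, 6, 7]
probs = [0.2, 0.2, 0.1, 0.1, 0.1, 0.3]  # placeholder for predict_proba(...)[0]

# Pair each probability with the class it belongs to, instead of counting 1, 2, 3, ...
result_data = OrderedDict(
    (my_dict[cls], round(p, 2)) for cls, p in zip(classes, probs)
)
print(result_data)
print("result_name : ", max(result_data, key=result_data.get))  # label with the highest probability
```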
Assume I have a Python dictionary with 2 keys.
dic = {0:'Hi!', 1:'Hello!'}
I want to extend this dictionary by duplicating its contents, but with changed keys.
For example, given this code:
dic = {0:'Hi!', 1:'Hello'}
multiplier = 3
def DictionaryExtend(number_of_multiplier, dictionary):
    "Function code"
then the result should look like this:
>>> DictionaryExtend(multiplier, dic)
>>> dic
{0:'Hi!', 1:'Hello', 2:'Hi!', 3:'Hello', 4:'Hi!', 5:'Hello'}
In this case, I changed the keys by adding the dictionary's length (2) at each duplication step. What's an efficient way of doing this?
I'm also planning to do the same for a list: extend it by duplicating itself and changing some values, as in the example above. Any suggestions for this would be helpful, too!
You can use itertools to repeat the values and OrderedDict to maintain input order (on Python 3.7+, plain dicts preserve insertion order as well).
import itertools as it
import collections as ct
def extend_dict(multiplier, dict_):
    """Return a dictionary of repeated values."""
    return dict(enumerate(it.chain(*it.repeat(dict_.values(), multiplier))))
d = ct.OrderedDict({0:'Hi!', 1:'Hello!'})
multiplier = 3
extend_dict(multiplier, d)
# {0: 'Hi!', 1: 'Hello!', 2: 'Hi!', 3: 'Hello!', 4: 'Hi!', 5: 'Hello!'}
Regarding other collection types, it is not clear what output is desired, but the following modification reproduces the dictionary result above and also works for lists:
def extend_collection(multiplier, iterable):
    """Return a collection of repeated values."""
    repeat_values = lambda x: it.chain(*it.repeat(x, multiplier))
    try:
        iterable = iterable.values()
    except AttributeError:
        result = list(repeat_values(iterable))
    else:
        result = dict(enumerate(repeat_values(iterable)))
    return result
lst = ['Hi!', 'Hello!']
multiplier = 3
extend_collection(multiplier, lst)
# ['Hi!', 'Hello!', 'Hi!', 'Hello!', 'Hi!', 'Hello!']
It's not immediately clear why you might want to do this. If the keys are always consecutive integers then you probably just want a list.
Anyway, here's a snippet:
def dictExtender(multiplier, d):
    return dict(zip(range(multiplier * len(d)), list(d.values()) * multiplier))
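If you want the literal rule from the question (each copy's keys shifted by the dictionary's length), a dict comprehension can compute the new keys from the existing ones instead of renumbering from zero. A sketch, assuming integer keys 0 to len(d) - 1 as in the example; extend_dict_offset is a hypothetical name:

```python
def extend_dict_offset(multiplier, d):
    """Repeat d's items `multiplier` times, offsetting keys by len(d) per copy."""
    size = len(d)
    return {key + copy * size: value
            for copy in range(multiplier)
            for key, value in d.items()}


dic = {0: 'Hi!', 1: 'Hello'}
print(extend_dict_offset(3, dic))
# {0: 'Hi!', 1: 'Hello', 2: 'Hi!', 3: 'Hello', 4: 'Hi!', 5: 'Hello'}
```

For a list, the equivalent is simply lst * multiplier, since list positions already play the role of these keys.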
I don't think you need to use inheritance to achieve that. It's also unclear what the keys should be in the resulting dictionary.
If the keys are always consecutive integers, then why not use a list?
origin = ['Hi', 'Hello']
extended = origin * 3
extended
>> ['Hi', 'Hello', 'Hi', 'Hello', 'Hi', 'Hello']
extended[4]
>> 'Hi'
If you want to perform a different operation with the keys, then simply:
mult_key = lambda key: [key,key+2,key+4] # just an example, this can be any custom implementation but beware of duplicate keys
dic = {0:'Hi', 1:'Hello'}
extended = { mkey:dic[key] for key in dic for mkey in mult_key(key) }
extended
>> {0: 'Hi', 2: 'Hi', 4: 'Hi', 1: 'Hello', 3: 'Hello', 5: 'Hello'}
You don't need to extend anything; you need to pick a better input format or a more appropriate type.
As others have mentioned, you need a list, not an extended dict or OrderedDict. Here's an example with lines.txt:
1:Hello!
0: Hi.
2: pylang
And here's a way to parse the lines in the correct order:
def extract_number_and_text(line):
    # split on the first ':' only, so text that contains ':' stays intact
    number, text = line.split(':', 1)
    return (int(number), text.strip())
with open('lines.txt') as f:
    lines = f.readlines()
data = [extract_number_and_text(line) for line in lines]
print(data)
# [(1, 'Hello!'), (0, 'Hi.'), (2, 'pylang')]
sorted_text = [text for i,text in sorted(data)]
print(sorted_text)
# ['Hi.', 'Hello!', 'pylang']
print(sorted_text * 2)
# ['Hi.', 'Hello!', 'pylang', 'Hi.', 'Hello!', 'pylang']
print(list(enumerate(sorted_text * 2)))
# [(0, 'Hi.'), (1, 'Hello!'), (2, 'pylang'), (3, 'Hi.'), (4, 'Hello!'), (5, 'pylang')]
I have values like
amity = 0
erudite = 2
etc.
And I am able to sort the integers with
print(sorted([amity, abnegation, candor, erudite, dauntless]))
but I want the variable names to be attached to the integers as well, so that when the numbers are sorted I can tell what each number means.
Is there a way to do this?
Define a mapping between the names and the numbers:
numbers = dict(dauntless=42, amity=0, abnegation=1, candor=4, erudite=2)
Then sort:
d = sorted(numbers.items(), key=lambda x: x[1])
print(d)
# [('amity', 0), ('abnegation', 1), ('erudite', 2), ('candor', 4), ('dauntless', 42)]
To keep the result as a mapping, pass the sorted list to collections.OrderedDict (on Python 3.7+, a plain dict also keeps this insertion order):
from collections import OrderedDict
print(OrderedDict(d))
# OrderedDict([('amity', 0), ('abnegation', 1), ('erudite', 2), ('candor', 4), ('dauntless', 42)])
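If you only need the names, ordered by their numbers, you can also sort the dictionary's keys directly, using numbers.get as the sort key:

```python
numbers = dict(dauntless=42, amity=0, abnegation=1, candor=4, erudite=2)

# Iterating a dict gives its keys; numbers.get maps each name to its number for sorting
names_by_value = sorted(numbers, key=numbers.get)
print(names_by_value)
# ['amity', 'abnegation', 'erudite', 'candor', 'dauntless']
```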
Python has a built-in data type called a dictionary, which maps keys to values. It is pretty much what you asked for in your question: attaching a value to a specific key.
What I think you should do is create a dictionary that maps each integer value to the name of its variable as a string, as shown below:
amity = 0
erudite = 2
abnegation = 50
dauntless = 10
lista = [amity, erudite, abnegation, dauntless]
dictionary = {} # initialize dictionary
dictionary[amity] = 'amity' # You're mapping the value 0 to the string 'amity', not to the variable amity.
dictionary[abnegation] = 'abnegation'
dictionary[erudite] = 'erudite'
dictionary[dauntless] = 'dauntless'
# note: two variables with the same value would overwrite each other here
print(dictionary) # prints all key, value pairs in the dictionary
print(dictionary[0]) # outputs amity
for item in sorted(lista):
    print(dictionary[item]) # prints the names in order of their values