almost similar keys in dictionary - python

I have a dataframe which contains the below column:
column_name
CUVITRU 8 gram
CUVITRU 1 grams
I want to replace both gram and grams with gm, so I created a dictionary:
dict_ = {'gram':'gm','grams':'gm'}
I am able to replace it but it is converting grams to gms. Below is the column after conversion:
column_name
CUVITRU 8 gm
CUVITRU 1 gms
How can I solve this issue?
Below is my code:
dict_ = {'gram':'gm','grams':'gm'}
for key, value in dict_.items():
    my_string = my_string.replace(key,value)
my_string = ' '.join(unique_list(my_string.split()))

def unique_list(l):
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist

Because it finds 'gram' inside 'grams'. One way around this is to use a regular expression instead of plain string replacement and match on word boundaries, like (r"\b%s\.... Look at the answer using .sub here for an example: search-and-replace-with-whole-word-only-option
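A minimal sketch of the word-boundary idea from the comment above, assuming the replacements live in a plain dict. The helper name replace_whole_words is made up for illustration and is not from the linked answer:

```python
import re

def replace_whole_words(text, mapping):
    # Hypothetical helper: build one alternation pattern, longest key first,
    # wrapped in \b word boundaries so 'gram' cannot match inside 'grams'.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, keys)))
    # Look up each matched word in the mapping to get its replacement.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

dict_ = {'gram': 'gm', 'grams': 'gm'}
print(replace_whole_words("CUVITRU 8 gram", dict_))   # CUVITRU 8 gm
print(replace_whole_words("CUVITRU 1 grams", dict_))  # CUVITRU 1 gm
```

Because the match is on whole words only, the order of the dictionary keys no longer matters, and words that merely contain 'gram' (such as 'programs') are left alone.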

You don't actually care about the dict; you care about the key/value pairs produced by its items() method, so just store that in the first place. This lets you specify the order of replacements to try regardless of your Python version.
d = [('grams', 'gm'), ('gram', 'gm')]
for key, value in d:
    my_string = my_string.replace(key,value)

You can make replacements in the reverse order of the key lengths instead:
dict_ = {'gram':'gm','grams':'gm'}
for key in sorted(dict_, key=len, reverse=True):
    my_string = my_string.replace(key, dict_[key])

Put the longer string grams before the shorter one gram, like this: {'grams':'gm','gram':'gm'}, and it will work.
Well, I'm using a recent Python 3 (3.7.2), which guarantees that items are retrieved in the same order in which they were inserted into the dictionary. On earlier Pythons you may happen to get that order (and this appears to be the problem), but it isn't guaranteed.

Related

Function that makes dict from string but swaps keys and values?

I'm trying to make a function that takes in list of strings as an input like the one listed below:
swap_values_dict(['Summons: Bahamut, Shiva, Chocomog',
                  'Enemies: Bahamut, Shiva, Cactaur'])
and creates a dictionary from them using the words after the colons as keys and the words before the colons as values. I need to clarify that, at this point, there are only two strings in the list. I plan to split the strings into sublists and, from there, try and assign them to a dictionary.
The output should look like
{'Bahamut': ['Summons','Enemies'],'Shiva':['Summons','Enemies'],'Chocomog':['Summons'],'Cactaur':['Enemies']}
As you can see, the words after the colon in the original list have become keys while the words before the colon (categories) have become the values. If one of the values appears in both lists, it is assigned two values in the final dictionary. I would like to be able to make similar dictionaries out of many lists of different sizes, not just ones that contain two strings. Could this be done without list comprehension and only for loops and if statements?
What I've Tried So Far
title_list = []
for i in range(len(mobs)):  # counts amount of strings in list
    titles = (mobs[i].split(":"))[0]  # gets titles from list using split
    title_list.append(titles)
title_list
This code returns ['Summons', 'Enemies'], which isn't the result I want, but I think it could help me write the function. I had planned on separating the keys and values into separate lists and then zipping them together into a dictionary.
Try:
def swap_values_dict(lst):
    tmp = {}
    for s in lst:
        k, v = map(str.strip, s.split(":"))
        tmp[k] = list(map(str.strip, v.split(",")))
    out = {}
    for k, v in tmp.items():
        for i in v:
            out.setdefault(i, []).append(k)
    return out

print(
    swap_values_dict(
        [
            "Summons: Bahamut, Shiva, Chocomog",
            "Enemies: Bahamut, Shiva, Cactaur",
        ]
    )
)
Prints:
{
    "Bahamut": ["Summons", "Enemies"],
    "Shiva": ["Summons", "Enemies"],
    "Chocomog": ["Summons"],
    "Cactaur": ["Enemies"],
}
I'd use a defaultdict. It saves you the trouble of manually checking if a key exists in your dictionary and constructing a new empty list, making for a rather concise function:
from collections import defaultdict

def swap_values_dict(mobs):
    result = defaultdict(list)
    for elem in mobs:
        role, members = elem.split(': ')
        for m in members.split(', '):
            result[m].append(role)
    return result
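Since the question asks whether this can be done with only for loops and if statements, here is a rough sketch along those lines. The name swap_values_dict_plain is made up for illustration, and it assumes every input string contains exactly one category followed by a colon:

```python
def swap_values_dict_plain(mobs):
    result = {}
    for elem in mobs:
        # Split once on the first colon: category before, members after.
        role, members = elem.split(":", 1)
        role = role.strip()
        for m in members.split(","):
            m = m.strip()
            # Create the list the first time a member is seen.
            if m not in result:
                result[m] = []
            result[m].append(role)
    return result

print(swap_values_dict_plain(['Summons: Bahamut, Shiva, Chocomog',
                              'Enemies: Bahamut, Shiva, Cactaur']))
```

The explicit "if m not in result" check does by hand what setdefault and defaultdict do in the answers above, and the loop works for any number of input strings.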

Comparing and returning dictionary string values?

Trying to make a function that returns a list of values from the dictionary. If the plants are watered weekly, it would be appended into the list then later returned sorted. However, my code iterates each letter of 'weekly' instead of the whole string and I have no idea how to access the watering frequency of the dictionary items. Any explanations would be appreciated.
def weekly(plants_d):
    d = []
    for plant in plants_d:
        for plan in plants_d[plant]:
            if plan == "weekly":
                d.append[plan]
    return sort(d)

weekly({'fern':'weekly', 'shamrock':'weekly', 'carnation':'weekly'})
# Should return like this: ['carnation','fern','shamrock']
Amending the previous answer so that only values with "weekly" are used:
>>> my_dict = {'fern':'weekly', 'shamrock':'weekly', 'carnation':'weekly', 'daffodil': 'monthly'}
>>> sorted(k for k, v in my_dict.items() if v == 'weekly')
['carnation', 'fern', 'shamrock']
This line:
for plan in plants_d[plant]:
is wrong. Since plants_d[plant] is a string like "weekly", this is like
for plan in "weekly":
which will iterate over the letters in the string. Then when you do if plan == "weekly": it will never match, because plan is just a single letter like "w".
You can drop the inner loop and simply use:
if plants_d[plant] == "weekly":
Or you can change the first loop to:
for plan_name, plan_frequency in plants_d.items():
    if plan_frequency == "weekly":
        d.append(plan_name)
See Iterating over dictionaries using 'for' loops
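Putting the pieces of this answer together, a corrected version of the function might look like the sketch below. Besides the loop fix, it also changes two other things in the question's code: append is called with parentheses (square brackets raise a TypeError), and the built-in sorted() replaces sort(), which is not a built-in function. The 'daffodil' entry is made-up sample data.

```python
def weekly(plants_d):
    d = []
    # Iterate over (name, frequency) pairs instead of the letters of a string.
    for plant_name, frequency in plants_d.items():
        if frequency == "weekly":
            d.append(plant_name)
    # sorted() returns a new, alphabetically ordered list.
    return sorted(d)

print(weekly({'fern': 'weekly', 'shamrock': 'weekly',
              'carnation': 'weekly', 'daffodil': 'monthly'}))
# ['carnation', 'fern', 'shamrock']
```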
A simpler way to achieve this is to use dict.keys(), which returns all the keys in the dict (as a list in Python 2, as a view object in Python 3). To sort them, you can use sorted():
>>> my_dict = {'fern':'weekly', 'shamrock':'weekly', 'carnation':'weekly'}
>>> sorted(my_dict.keys())
['carnation', 'fern', 'shamrock']
Edit: If some of the plans are monthly, first filter them out using filter or a dict comprehension. Your code should look like:
>>> my_dict = {'fern':'weekly', 'shamrock':'weekly', 'carnation':'weekly',
...            'something': 'monthly'}
# using filter() as #brianpck has already mentioned 'dict comprehension' approach
# It is better to use brian's approach
>>> filtered_dict = dict(filter(lambda x: x[1] == 'weekly', my_dict.items()))
>>> sorted(filtered_dict.keys())
['carnation', 'fern', 'shamrock']

Updating a dictionary with integer keys

I'm working on a short assignment where I have to read in a .txt file and create a dictionary in which the keys are the number of words in a sentence and the values are the number of sentences of a particular length. I've read in the file and determined the length of each sentence already, but I'm having troubles creating the dictionary.
I've already initialized the dictionary and am trying to update it (within a for loop that iterates over the sentences) using the following code:
for snt in sentences:
    words = snt.split(' ')
    sDict[len(words)]+=1
It gives me a KeyError on the very first iteration. I'm sure it has to do with my syntax but I'm not sure how else to update an existing entry in the dictionary.
When you initialize the dictionary, it starts out empty. The next thing you do is look up a key so that you can update its value, but that key doesn't exist yet, because the dictionary is empty. The smallest change to your code is probably to use the get dictionary method. Instead of this:
sDict[len(words)]+=1
Use this:
sDict[len(words)] = sDict.get(len(words), 0) + 1
The get method looks up a key, but if the key doesn't exist, you are given a default value. The default default value is None, and you can specify a different default value, which is the second argument, 0 in this case.
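As a small illustration of the get approach, here is the loop from the question run end to end. The sentences list is made-up sample data, and split() is used without an argument so that repeated spaces do not produce empty words:

```python
sentences = ["the cat sat", "hello world", "a b c", "one"]  # made-up sample data

sDict = {}
for snt in sentences:
    n = len(snt.split())
    # get() returns 0 the first time a length is seen, so no KeyError.
    sDict[n] = sDict.get(n, 0) + 1

print(sDict)  # {3: 2, 2: 1, 1: 1}
```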
The better solution is probably collections.Counter, which handles the common use case of counting occurrences:
import collections
s = map(str.split, sentences)
sDict = collections.Counter(map(len, s))
defaultdicts were invented for this purpose:
from collections import defaultdict
sDict = defaultdict(int)
for snt in sentences:
    sDict[len(snt.split())] += 1
If you are restricted to the use of pure dictionaries in the context of your assignment, then you need to test for existence of the key before incrementing its value in order to prevent a KeyError:
sDict = {}
for snt in sentences:
    num_words = len(snt.split())
    if num_words in sDict:
        sDict[num_words] += 1
    else:
        sDict[num_words] = 1

How can I sort list of strings in specific order?

Let's say I have such a list:
['word_4_0_w_7',
'word_4_0_w_6',
'word_3_0_w_10',
'word_3_0_w_2']
and I want to sort them according to number that comes after "word" and according to number after "w".
It will look like this:
['word_3_0_w_2',
'word_3_0_w_10',
'word_4_0_w_6',
'word_4_0_w_7']
What comes to mind is to create a bunch of lists, fill each one (chosen by the number after "word") with the strings sorted by the number after "w", and then merge them.
Is there a more clever way to do it in Python?
Use Python's key functionality, in conjunction with other answers:
def mykey(value):
    ls = value.split("_")
    return int(ls[1]), int(ls[-1])

newlist = sorted(firstlist, key=mykey)
## or, if you want it in place:
firstlist.sort(key=mykey)
Python will be more efficient with key vs cmp.
You can provide a function to the sort() method of list objects:
l = ['word_4_0_w_7',
'word_4_0_w_6',
'word_3_0_w_10',
'word_3_0_w_2']
def my_key_func(x):
    xx = x.split("_")
    return (int(xx[1]), int(xx[-1]))

l.sort(key=my_key_func)
Output:
print l
['word_3_0_w_2', 'word_3_0_w_10', 'word_4_0_w_6', 'word_4_0_w_7']
edit: Changed code according to comment by #dwanderson ; more info on this can be found here.
You can use a function to extract the relevant parts of your string and then use those parts to sort:
a = ['word_4_0_w_7', 'word_4_0_w_6', 'word_3_0_w_10', 'word_3_0_w_2']
def sort_func(x):
    parts = x.split('_')
    sort_key = parts[1]+parts[2]+"%02d"%int(parts[4])
    return sort_key

a_sorted = sorted(a,key=sort_func)
The expression "%02d" %int(x.split('_')[4]) is used to add a leading zero in front of second number otherwise 10 will sort before 2. You may have to do the same with the number extracted by x.split('_')[2].
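To make the padding caveat concrete, the sketch below compares the zero-padded string key with a tuple-of-integers key like the one used in the other answers. The sample names are made up, and the function names padded_key and tuple_key are for illustration only:

```python
def padded_key(x):
    # String key with the last number zero-padded to two digits.
    parts = x.split("_")
    return parts[1] + parts[2] + "%02d" % int(parts[4])

def tuple_key(x):
    # Tuple of integers, compared numerically field by field.
    parts = x.split("_")
    return int(parts[1]), int(parts[-1])

names = ['word_3_0_w_100', 'word_3_0_w_20', 'word_10_0_w_1']  # made-up sample

print(sorted(names, key=padded_key))
# ['word_10_0_w_1', 'word_3_0_w_100', 'word_3_0_w_20']
print(sorted(names, key=tuple_key))
# ['word_3_0_w_20', 'word_3_0_w_100', 'word_10_0_w_1']
```

Once a number needs more digits than the padding width, the string key falls back to character-by-character comparison, while the integer tuple keeps sorting numerically.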

Python dictionary search values for keys using regular expression

I am trying to search a Python dictionary for the values of specific keys (using a regular expression to match the keys).
Example:
I have a Python dictionary which has values like:
{'account_0':123445,'seller_account':454545,'seller_account_0':454676, 'seller_account_number':3433343}
I need to find the values whose key contains 'seller_account'. I wrote a sample program but would like to know if it can be done better. The main reason is that I am not sure about regular expressions and may be missing something (like how do I set re for key starting with 'seller_account'):
#!/usr/bin/python
import re
my_dict={'account_0':123445,'seller_account':454545,'seller_account_0':454676, 'seller_account_number':3433343}
reObj = re.compile('seller_account')
for key in my_dict.keys():
    if(reObj.match(key)):
        print key, my_dict[key]
~ home> python regular.py
seller_account_number 3433343
seller_account_0 454676
seller_account 454545
If you only need to check keys that are starting with "seller_account", you don't need regex, just use startswith()
my_dict={'account_0':123445,'seller_account':454545,'seller_account_0':454676, 'seller_account_number':3433343}
for key, value in my_dict.iteritems(): # iter on both keys and values
    if key.startswith('seller_account'):
        print key, value
or in a one_liner way :
result = [(key, value) for key, value in my_dict.iteritems() if key.startswith("seller_account")]
NB: for a python 3.X use, replace iteritems() by items() and don't forget to add () for print.
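Following that note, a Python 3 version of the same idea could look like this sketch. It uses a dict comprehension rather than a list of tuples, which is a small variation on the one-liner above:

```python
my_dict = {'account_0': 123445, 'seller_account': 454545,
           'seller_account_0': 454676, 'seller_account_number': 3433343}

# items() replaces iteritems() in Python 3; keep only the matching keys.
result = {key: value for key, value in my_dict.items()
          if key.startswith('seller_account')}

for key, value in result.items():
    print(key, value)
```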
You can solve this with dpath.
http://github.com/akesterson/dpath-python
dpath lets you search dictionaries with a glob syntax on the keys, and to filter the values. What you want is trivial:
$ easy_install dpath
>>> dpath.util.search(MY_DICT, 'seller_account*')
... That will return you a big merged dictionary of all the keys matching that glob. If you just want the paths and values:
$ easy_install dpath
>>> for (path, value) in dpath.util.search(MY_DICT, 'seller_account*', yielded=True):
...     # do something with the path and value
def search(dictionary, substr):
    result = []
    for key in dictionary:
        if substr in key:
            result.append((key, dictionary[key]))
    return result
>>> my_dict={'account_0':123445,'seller_account':454545,'seller_account_0':454676, 'seller_account_number':3433343}
>>> search(my_dict, 'seller_account')
[('seller_account_number', 3433343), ('seller_account_0', 454676), ('seller_account', 454545)]
You can use a combination of "re" and "filter". For example, if you want to find which names in the os module contain the word "stat", you can use the code below.
import re
import os
r = re.compile(".*stat.*")
list(filter(r.match, os.__dict__.keys()))
result is:
['stat', 'lstat', 'fstat', 'fstatvfs', 'statvfs', 'stat_result', 'statvfs_result']
I think the performance issue in the original question is the key/value lookup after the keys have been found with the "re" module. If part of the key can vary, we can't use "startswith", so "re" is a good choice. I also use filter to collect all matching keys into a list, so all the values can be returned with a simple [DICT[k] for k in LIST].
like how do I set re for key starting with 'seller_account'
reObj = re.compile('seller_account')
should be:
reObj = re.compile('seller_account.*')
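For what it's worth, re.match only matches at the beginning of the string, so as far as I can tell both patterns select the same keys when used with match(); the difference only shows up with search(). A small sketch, where 'old_seller_account' is a made-up key added to show the contrast:

```python
import re

keys = ['account_0', 'seller_account', 'seller_account_0',
        'seller_account_number', 'old_seller_account']  # last key is made up

plain = re.compile('seller_account')
with_tail = re.compile('seller_account.*')

# match() anchors at the start of the string, so both patterns agree.
matched_plain = [k for k in keys if plain.match(k)]
matched_tail = [k for k in keys if with_tail.match(k)]
# search() looks anywhere in the string, so it also finds the made-up key.
searched = [k for k in keys if plain.search(k)]

print(matched_plain)
print(matched_tail)
print(searched)
```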
