I am trying to search a Python dictionary for the values whose keys match a pattern (using a regular expression on the keys).
Example:
I have a Python dictionary which has values like:
{'account_0':123445,'seller_account':454545,'seller_account_0':454676, 'seller_account_number':3433343}
I need to find the values whose keys contain 'seller_account'. I wrote a sample program, but I would like to know whether it can be done better. My main concern is that I am not confident with regular expressions and may be missing something (for example, how do I write a pattern for keys starting with 'seller_account'?):
#!/usr/bin/python
import re
my_dict = {'account_0': 123445, 'seller_account': 454545, 'seller_account_0': 454676, 'seller_account_number': 3433343}
reObj = re.compile('seller_account')
for key in my_dict.keys():
    if reObj.match(key):
        print key, my_dict[key]
~ home> python regular.py
seller_account_number 3433343
seller_account_0 454676
seller_account 454545
If you only need the keys that start with "seller_account", you don't need a regex; just use startswith():
my_dict = {'account_0': 123445, 'seller_account': 454545, 'seller_account_0': 454676, 'seller_account_number': 3433343}
for key, value in my_dict.iteritems():  # iterate over both keys and values
    if key.startswith('seller_account'):
        print key, value
Or as a one-liner:
result = [(key, value) for key, value in my_dict.iteritems() if key.startswith("seller_account")]
NB: on Python 3.x, replace iteritems() with items() and remember that print is a function, so it needs parentheses.
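For readers on Python 3, the same prefix filter can be sketched as a dict comprehension. This is only a minimal sketch; filter_by_prefix is a hypothetical helper name, not something from the question.

```python
def filter_by_prefix(mapping, prefix):
    """Return a new dict holding only the items whose key starts with prefix."""
    return {key: value for key, value in mapping.items()
            if key.startswith(prefix)}


my_dict = {'account_0': 123445, 'seller_account': 454545,
           'seller_account_0': 454676, 'seller_account_number': 3433343}

sellers = filter_by_prefix(my_dict, 'seller_account')
for key, value in sellers.items():
    print(key, value)
```

Returning a dict rather than a list of tuples keeps the result usable for further lookups by key.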
You can solve this with dpath.
http://github.com/akesterson/dpath-python
dpath lets you search dictionaries with a glob syntax on the keys, and to filter the values. What you want is trivial:
$ easy_install dpath
>>> import dpath.util
>>> dpath.util.search(MY_DICT, 'seller_account*')
That will return a single merged dictionary of all the keys matching that glob. If you just want the paths and values:
>>> for (path, value) in dpath.util.search(MY_DICT, 'seller_account*', yielded=True):
...     pass  # do something with the path and value
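If installing a third-party package is not an option, a similar glob-style match on flat (non-nested) keys can be sketched with the standard library's fnmatch module. This is a different technique from dpath, not its implementation, and glob_search is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def glob_search(mapping, pattern):
    """Return the items whose key matches a shell-style glob pattern."""
    return {key: value for key, value in mapping.items()
            if fnmatchcase(key, pattern)}


my_dict = {'account_0': 123445, 'seller_account': 454545,
           'seller_account_0': 454676, 'seller_account_number': 3433343}

matches = glob_search(my_dict, 'seller_account*')
print(matches)
```

fnmatchcase() is used instead of fnmatch() so that matching stays case-sensitive on every platform.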
def search(dictionary, substr):
    result = []
    for key in dictionary:
        if substr in key:
            result.append((key, dictionary[key]))
    return result
>>> my_dict={'account_0':123445,'seller_account':454545,'seller_account_0':454676, 'seller_account_number':3433343}
>>> search(my_dict, 'seller_account')
[('seller_account_number', 3433343), ('seller_account_0', 454676), ('seller_account', 454545)]
You can use a combination of "re" and "filter". For example, if you want to find which methods in the os module have the word "stat" in their name, you can use the code below.
import re
import os
r = re.compile(".*stat.*")
list(filter(r.match, os.__dict__.keys()))
result is:
['stat', 'lstat', 'fstat', 'fstatvfs', 'statvfs', 'stat_result', 'statvfs_result']
I think the performance issue in the original question is the key/value lookup after the keys have been found with the "re" module. If a portion of the key is variable, we can't use "startswith", so "re" is a good choice. I also use filter to collect all matched keys into a list, so that all values can be returned with a simple [DICT[k] for k in LIST].
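A minimal sketch of that idea on Python 3, assuming keys whose middle part varies. The accounts dictionary and the helper name values_for_matching_keys are made up for illustration.

```python
import re


def values_for_matching_keys(mapping, pattern):
    """Collect the values of every key that the regular expression matches."""
    regex = re.compile(pattern)
    matched_keys = [key for key in mapping if regex.match(key)]
    return [mapping[key] for key in matched_keys]


accounts = {'seller_1_account': 10, 'seller_22_account': 20,
            'buyer_1_account': 30}

# The digits in the middle vary, so startswith() alone cannot express this.
result = values_for_matching_keys(accounts, r'seller_\d+_account$')
print(result)
```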
like how do I set re for key starting with 'seller_account'
reObj = re.compile('seller_account')
should be:
reObj = re.compile('seller_account.*')
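Note that re.match() only tries to match at the beginning of the string, so both patterns above select the same keys; the trailing .* changes how much text is matched, not which keys match. A small sketch comparing match(), search() and fullmatch() (the key list is made up for illustration, and fullmatch() needs Python 3.4 or later):

```python
import re

keys = ['account_0', 'seller_account', 'seller_account_0', 'my_seller_account']

# match() anchors at the start of the key.
starts_with = [k for k in keys if re.match('seller_account', k)]
# Adding .* does not change which keys are selected by match().
with_wildcard = [k for k in keys if re.match('seller_account.*', k)]
# search() finds the pattern anywhere in the key.
contains = [k for k in keys if re.search('seller_account', k)]
# fullmatch() requires the whole key to match the pattern.
exact = [k for k in keys if re.fullmatch('seller_account', k)]

print(starts_with)
print(contains)
print(exact)
```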
I have a dataframe which contains the below column:
column_name
CUVITRU 8 gram
CUVITRU 1 grams
I want to replace these gram and grams to gm. So I have created a dictionary
dict_ = {'gram':'gm','grams':'gm'}
I am able to replace it but it is converting grams to gms. Below is the column after conversion:
column_name
CUVITRU 8 gm
CUVITRU 1 gms
How can I solve this issue?
Below is my code:
dict_ = {'gram': 'gm', 'grams': 'gm'}
for key, value in dict_.items():
    my_string = my_string.replace(key, value)
    my_string = ' '.join(unique_list(my_string.split()))
def unique_list(l):
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist
This happens because 'gram' is found inside 'grams'. One way around it is to use a regular expression that replaces only on word boundaries instead of a plain string replacement, like (r"\b%s\.... Look at the answer using .sub here for an example: search-and-replace-with-whole-word-only-option
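The pattern quoted above is cut off, so the following is only a sketch of the word-boundary idea and not necessarily the linked answer's code. It assumes the replacements live in a dict (here called abbreviations, a made-up name) and builds one alternation with re.sub:

```python
import re

abbreviations = {'gram': 'gm', 'grams': 'gm'}

# Longest words first, so 'grams' is tried before 'gram'.
words = sorted(abbreviations, key=len, reverse=True)
pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(w) for w in words))


def abbreviate(text):
    """Replace only whole words that appear in the abbreviations dict."""
    return pattern.sub(lambda m: abbreviations[m.group(0)], text)


print(abbreviate('CUVITRU 8 gram'))   # CUVITRU 8 gm
print(abbreviate('CUVITRU 1 grams'))  # CUVITRU 1 gm
```

Because of the \b anchors, a word such as 'programs' is left untouched even though it contains 'grams'.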
You don't actually care about the dict; you care about the key/value pairs produced by its items() method, so just store that in the first place. This lets you specify the order of replacements to try regardless of your Python version.
d = [('grams', 'gm'), ('gram', 'gm')]
for key, value in d:
    my_string = my_string.replace(key, value)
You can make replacements in the reverse order of the key lengths instead:
dict_ = {'gram': 'gm', 'grams': 'gm'}
for key in sorted(dict_, key=len, reverse=True):
    my_string = my_string.replace(key, dict_[key])
Put the longer string grams before the shorter one gram like this {'grams':'gm','gram':'gm'}, and it will work.
This works for me because I am using a recent Python 3 (3.7.2), which guarantees that items are retrieved in the order they were inserted into the dictionary. On earlier Pythons that ordering may happen to hold, but it isn't guaranteed, and that appears to be the problem here.
I currently have a dictionary that looks like this:
{OctetString('Ethernet8/6'): Integer(1),
OctetString('Ethernet8/7'): Integer(2),
OctetString('Ethernet8/8'): Integer(2),
OctetString('Ethernet8/9'): Integer(1),
OctetString('Vlan1'): Integer(2),
OctetString('Vlan10'): Integer(1),
OctetString('Vlan15'): Integer(1),
OctetString('loopback0'): Integer(1),
OctetString('mgmt0'): Integer(1),
OctetString('port-channel1'): Integer(1),
OctetString('port-channel10'): Integer(1),
OctetString('port-channel101'): Integer(1),
OctetString('port-channel102'): Integer(1)}
I want my dictionary to look like this:
{OctetString('Ethernet8/6'): Integer(1),
OctetString('Ethernet8/7'): Integer(2),
OctetString('Ethernet8/8'): Integer(2),
OctetString('Ethernet8/9'): Integer(1)}
I am not sure of the best way to find these key/value pairs. I want anything that matches '\Ethernet(\d*)/(\d*)'. My main goal is to match all the Ethernet entries and then count their values. For example, once I have a dict containing only the Ethernetx/x keys, I want to count the number of 1's and 2's.
Also, why do I get only Ethernet8/6 when I iterate the dictionary and print, but when I pprint the dictionary I end up with OctetString('Ethernet8/6')?
for k in snmp_comb: print k
Ethernet2/18
Ethernet2/31
Ethernet2/30
Ethernet2/32
Ethernet8/46
This should do it:
new_dict = dict()
for key, value in orig_dict.items():
    if 'Ethernet' in str(key):
        new_dict[key] = value
When you use print, Python calls the __str__ method on the OctetString object, which returns Ethernet8/6. pprint, however, shows the repr() of each object, which includes the type name.
EDIT:
Stefan Pochmann has rightly pointed out below that if 'Ethernet' in will match any string which contains the word Ethernet. The OP did mention using regex in his post to match Ethernet(\d*)/(\d*), so this answer may not be suitable to anyone else looking to solve a similar problem.
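For the stricter regex match plus the counting step from the question, a minimal sketch could look like this. Plain strings and ints stand in for the OctetString and Integer objects (an assumption; the real objects are converted with str() and int() here), and \d+ is used instead of \d* so that at least one digit is required.

```python
import re
from collections import Counter

# Plain strings and ints stand in for OctetString keys and Integer values.
snmp_comb = {'Ethernet8/6': 1, 'Ethernet8/7': 2, 'Ethernet8/8': 2,
             'Ethernet8/9': 1, 'Vlan1': 2, 'mgmt0': 1, 'port-channel1': 1}

ethernet_re = re.compile(r'Ethernet(\d+)/(\d+)$')

# Keep only the keys of the form Ethernet<slot>/<port>.
ethernet_only = {key: value for key, value in snmp_comb.items()
                 if ethernet_re.match(str(key))}

# Count how many interfaces have each value.
status_counts = Counter(int(value) for value in ethernet_only.values())

print(status_counts[1], status_counts[2])  # 2 2
```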
(I'll use the same 'Ethernet' in str(key) test as the accepted answer.)
If you want to keep the original dict and have the filtered version as a separate dictionary, I'd use a comprehension:
newdict = {key: value
           for key, value in mydict.items()
           if 'Ethernet' in str(key)}
If you don't want to keep the original dict, you can also just remove the entries you don't want:
for key in list(mydict):
    if 'Ethernet' not in str(key):
        del mydict[key]
The reason you get "OctetString('...')" is the same as this one:
>>> 'foo'
'foo'
>>> pprint.pprint('foo')
'foo'
>>> print('foo')
foo
The first two tests show you a representation you can use in source code, that's why there are quotes. It's what the repr function gets you. The third test prints the value for normal pleasure, so doesn't add quotes. The "OctetString('...')" is simply such a representation as well, and you can copy&paste it into source code and get actual OctetString objects again, rather than Python string objects. I guess pprint is mostly intended for developing, where it's more useful to get the full repr version.
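The same difference can be reproduced with a small class of your own. Label below is a hypothetical stand-in for a type like OctetString, written only to show which method each call uses:

```python
import pprint


class Label:
    """Hypothetical stand-in for a type like OctetString."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Used by print() and str().
        return self.text

    def __repr__(self):
        # Used by repr(), the interactive prompt and pprint.
        return "Label(%r)" % self.text


item = Label('Ethernet8/6')
print(item)                     # Ethernet8/6
print(repr(item))               # Label('Ethernet8/6')
print(pprint.pformat([item]))   # [Label('Ethernet8/6')]
```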
I'm using ConfigParser which returns a dictionary of configuration data as such:
{'general': {'UserKey': 'thisisatestkey'}}
If I want to simply print the value of the UserKey key (in this case thisisatestkey), then I generally just do a print "Your key is: {0}".format(mydictvar.get('UserKey')).
If I just print out the raw dict to a string I get the above. If I use the print statement above I get result of None since there is no key in the root of the dict called UserKey. If I .get('general') I just get: {'UserKey': 'thisisatestkey'}
Obviously I could do a for loop like so:
keydic = cp.get_config_data()
for m, k in keydic.iteritems():
    for s, v in k.iteritems():
        userkey = v
and then print userkey which works fine. But I want to know how I can just avoid having to do the entire for loop first and just print the darned value right inline? Thanks!
You can use
mydictvar['general']['UserKey']
Or, if keys might be missing
mydictvar.get('general', {}).get('UserKey')
mydictvar['general'] returns a dictionary object; you can then just apply [...] to that value to retrieve the next key.
This works in string formatting too:
>>> mydictvar = {'general': {'UserKey': 'thisisatestkey'}}
>>> print "Your key is: {0[general][UserKey]}".format(mydictvar)
Your key is: thisisatestkey
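If the nesting gets deeper, or several levels might be missing, chaining .get() calls becomes noisy. One possible sketch for Python 3 is a small helper; get_nested is a hypothetical name, not part of ConfigParser or the standard library.

```python
def get_nested(mapping, *keys, default=None):
    """Follow keys one level at a time; return default if any is missing."""
    missing = object()  # sentinel, so stored None values are still returned
    current = mapping
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key, missing)
        if current is missing:
            return default
    return current


mydictvar = {'general': {'UserKey': 'thisisatestkey'}}
print("Your key is: {0}".format(get_nested(mydictvar, 'general', 'UserKey')))
```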
Simply, without a loop:
>>> my_dict = {'general': {'UserKey': 'thisisatestkey'}}
>>> my_dict['general']['UserKey']
'thisisatestkey'
I have a dict that has string-type keys whose exact values I can't know (because they're generated dynamically elsewhere). However, I know that the key I want contains a particular substring, and that a single key with this substring is definitely in the dict.
What's the best, or "most pythonic" way to retrieve the value for this key?
I thought of two strategies, but both irk me:
for k,v in some_dict.items():
    if 'substring' in k:
        value = v
        break
-- OR --
value = [v for (k,v) in some_dict.items() if 'substring' in k][0]
The first method is bulky and somewhat ugly, while the second is cleaner, but the extra step of indexing into the list comprehension (the [0]) irks me. Is there a better way to express the second version, or a more concise way to write the first?
There is an option to write the second version with the performance attributes of the first one.
Use a generator expression instead of list comprehension:
value = next(v for (k,v) in some_dict.iteritems() if 'substring' in k)
The expression inside the parenthesis will return an iterator which you will then ask to provide the next, i.e. first element. No further elements are processed.
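On Python 3 the same idea uses items(), and next() accepts a second argument that is returned instead of raising StopIteration when nothing matches. A minimal sketch, with a sample dictionary made up for illustration:

```python
some_dict = {'abc4589': 4578, 'abc7812': 798, 'kjuy45763': 1002}

# The generator stops at the first key containing the substring;
# the second argument to next() is the fallback when no key matches.
value = next((v for k, v in some_dict.items() if '7812' in k), None)
missing = next((v for k, v in some_dict.items() if 'zzz' in k), None)

print(value)    # 798
print(missing)  # None
```

Note that the generator expression needs its own parentheses once next() is given a second argument.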
How about this:
value = (v for (k,v) in some_dict.iteritems() if 'substring' in k).next()
It will stop immediately when it finds the first match.
But it still has O(n) complexity, where n is the number of key-value pairs. You need something like a suffix list or a suffix tree to speed up searching.
If there are many keys but the full key is easy to reconstruct from the substring, it can be faster to reconstruct it and look it up directly. For example, you often know the start of the key but not the datestamp appended to it, so you may only have to try 365 dates rather than iterate through millions of keys.
It's unlikely to be the case but I thought I would suggest it anyway.
e.g.
>>> names={'bob_k':32,'james_r':443,'sarah_p':12}
>>> firstname='james' #you know the substring james because you have a list of firstnames
>>> for c in "abcdefghijklmnopqrstuvwxyz":
...     name="%s_%s"%(firstname,c)
...     if name in names:
...         print name
...
james_r
class MyDict(dict):
    def __init__(self, *args):
        dict.__init__(self, *args)
    def __getitem__(self, x):
        return next(v for (k, v) in self.iteritems() if x in k)
# Defining several dicts ---------------------------------------------------
some_dict = {'abc4589': 4578, 'abc7812': 798, 'kjuy45763': 1002}
another_dict = {'boumboum14': 'WSZE x478',
                'tagada4783': 'ocean11',
                'maracuna102455': None}
still_another = {12: 'jfg', 45: 'klsjgf'}
# Selecting the dicts whose __getitem__ method will be changed -------------
name, obj = None, None
selected_dicos = [(name, obj) for (name, obj) in globals().iteritems()
                  if type(obj) == dict
                  and all(type(x) == str for x in obj.iterkeys())]
print 'names of selected_dicos ==', [name for (name, obj) in selected_dicos]
# Transforming the selected dicts into instances of class MyDict -----------
for k, v in selected_dicos:
    globals()[k] = MyDict(v)
# Example of getting a value -----------------------------------------------
print "some_dict['7812'] ==", some_dict['7812']
result
names of selected_dicos == ['another_dict', 'some_dict']
some_dict['7812'] == 798
I prefer the first version, although I'd use some_dict.iteritems() (if you're on Python 2) because then you don't have to build an entire list of all the items beforehand. Instead you iterate through the dict and break as soon as you're done.
On Python 3, some_dict.items() already returns a dictionary view, which is iterated lazily, so no list of items is built up front.
dict1=({"EMP$$1":1,"EMP$$2":2,"EMP$$3":3})
How do I check if EMP exists in the dictionary using Python?
dict1.get("EMP##") ??
It's not entirely clear what you want to do.
You can loop through the keys in the dict selecting keys using the startswith() method:
>>> for key in dict1:
...     if key.startswith("EMP$$"):
...         print "Found",key
...
Found EMP$$1
Found EMP$$2
Found EMP$$3
You can use a list comprehension to get all the values that match:
>>> [value for key,value in dict1.items() if key.startswith("EMP$$")]
[1, 2, 3]
If you just want to know if a key matches you could use the any() function:
>>> any(key.startswith("EMP$$") for key in dict1)
True
This approach strikes me as contrary to the intent of a dictionary.
A dictionary is made up of hash keys which have had values associated with them. The benefit of this structure is that it provides very fast lookups (on the order of O(1)). By searching through the keys, you're negating that benefit.
I would suggest reorganizing your dictionary.
dict1 = {"EMP$$": {"1": 1, "2": 2, "3": 3} }
Then, finding "EMP$$" is as simple as
if "EMP$$" in dict1:
    pass  # etc...
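If the data arrives in the flat form, the regrouping suggested above could be sketched like this. It assumes every key has the form '<prefix>$$<suffix>'; the "MGR$$1" entry is made up to show a second group.

```python
from collections import defaultdict

dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3, "MGR$$1": 10}

grouped = defaultdict(dict)
for key, value in dict1.items():
    # Split 'EMP$$1' into the prefix 'EMP' and the suffix '1'.
    prefix, _, suffix = key.partition("$$")
    grouped[prefix + "$$"][suffix] = value

print("EMP$$" in grouped)      # True
print(grouped["EMP$$"]["2"])   # 2
```

After this one-time pass, checking for a prefix is an ordinary dictionary lookup rather than a scan over all keys.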
You need to be a lot more specific with what you want to do. However, assuming the dictionary you gave:
dict1={"EMP$$1":1, "EMP$$2":2, "EMP$$3":3}
If you wanted to know if a specific key was present before trying to request it you could:
dict1.has_key('EMP$$1')
True
Returns True as dict1 has the key EMP$$1.
You could also forget about checking for keys and rely on the default return value of dict1.get():
dict1.get('EMP$$5',0)
0
Returns 0 as default given dict1 doesn't have a key EMP$$5.
In a similar way you could also use a try/except structure to catch and handle missing keys:
try:
    dict1['EMP$$5']
except KeyError, e:
    # Code to deal with the key error
    print 'Trapped key error in dict1 looking for %s' % e
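The snippets above use Python 2 syntax. For anyone on Python 3, here is a sketch of the same three checks, where the in operator takes the place of has_key() and the except clause uses "as":

```python
dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3}

# 'in' replaces has_key(), which no longer exists on Python 3.
print('EMP$$1' in dict1)         # True

# get() with a default behaves the same on both versions.
print(dict1.get('EMP$$5', 0))    # 0

try:
    dict1['EMP$$5']
except KeyError as e:            # 'as' replaces the comma syntax
    message = 'Trapped key error in dict1 looking for %s' % e
    print(message)
```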
The other answers to this question are also great, but we need more info to be more precise.
There's no way to match dictionary keys like this. I suggest you rethink your data structure for this problem. If this has to be extra quick you could use something like a suffix tree.
You can use the in operator on strings, which checks whether one string occurs inside another. Iterating over dict1 yields its keys, so you can test "EMP$$" against each key of dict1.
dict1 = {"EMP$$1": 1, "EMP$$2": 2, "EMP$$3": 3}
print(any("EMP$$" in i for i in dict1))
# True
# testing for item that doesn't exist
print(any("AMP$$" in i for i in dict1))
# False