My dictionary:
mydict = {'a':'/1/2/3/4/5/wrong1', 'b':'/x/y/x/u/t/wrong2'}
I would like to parse the values and replace 'wrong*' with 'right'. 'right' is always the same, whereas 'wrong' is different each time.
'wrong' looks like this, for example: folder_x/folder_y/somelongfilename.gz
'right' looks like this: *-AAA/debug.out
Afterwards, my dictionary should look like this:
mydict = {'a':'/1/2/3/4/5/right', 'b':'/x/y/x/u/t/right'}
Simply replacing the whole value won't work here, because I want to parse the value and replace only the last part of it. It is important to keep the first part of the value.
Does anyone have an idea how to solve this?
Thank you.
You could use re.sub to handle the replacement for you:
>>> import re
>>> {k : re.sub('wrong', 'right', v) for k,v in mydict.items()}
{'b': '/x/y/x/u/t/right2',
'a': '/1/2/3/4/5/right1'}
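Note that the substitution above swaps only the literal text 'wrong', so the trailing '1' and '2' survive. If the goal is to drop everything from that marker to the end of the value, a pattern anchored at the end of the string is one option. This is only a minimal sketch: it assumes the unwanted part always begins with the literal text 'wrong', which may not hold for the real paths described in the question.

```python
import re

mydict = {'a': '/1/2/3/4/5/wrong1', 'b': '/x/y/x/u/t/wrong2'}

# Replace everything from the first 'wrong' up to the end of each value.
fixed = {k: re.sub(r'wrong.*$', 'right', v) for k, v in mydict.items()}
print(fixed)
```

If the real values have no such marker, the pattern would have to describe the trailing part in some other way, for example by counting path components.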
I know that the title of the question may not be the most intuitive one, but I could not think of a better way to describe it briefly, so here is what I actually mean.
I want to write a small parser that builds a dictionary of kwargs out of a string that I specify.
Here is an example:
string_of_kwargs = 'n=6,m=10'
graph_kwargs = {pair.split('=')[0]: pair.split('=')[1]
                for pair in string_of_kwargs.split(',')}
And the output is:
{'n': '6', 'm': '10'}
The problem is that in the code above I had to call pair.split('=') twice,
and I wonder if there is some way to get around that in case I have to unpack more values like this in the future.
Sure:
>>> dict(pair.split('=', 1) for pair in string_of_kwargs.split(','))
{'n': '6', 'm': '10'}
Why the 1 as the second argument of split()? That's in case there is more than one '=' sign in a pair. There is more to do to make this bullet-proof, but that is beyond the scope of the question.
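As a rough illustration of what "more bullet-proof" could mean, here is a sketch that tolerates surrounding whitespace and empty segments and reports pairs without an '=' sign. The helper name parse_kwargs and the chosen error handling are assumptions made for this example, not something from the question.

```python
def parse_kwargs(text):
    """Parse 'k1=v1,k2=v2' into a dict of strings (hypothetical helper)."""
    result = {}
    for pair in text.split(','):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing comma
        # partition splits on the first '=' only, like split('=', 1)
        key, sep, value = pair.partition('=')
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        result[key.strip()] = value.strip()
    return result


print(parse_kwargs('n=6, m=10,'))
```

Whether a malformed pair should raise, be skipped, or be stored with a default value depends on the application, so treat the ValueError here as one possible choice.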
You can hackily use a nested for-clause for the binding to a name by iterating over a single-element list like this:
graph_kwargs = {
    k: v for pair in string_of_kwargs.split(',')
    for k, v in [pair.split('=')]
}
Note that I call it hacky, but it was apparently idiomatic enough to be worthy of a bespoke optimization in Python 3.9, where it basically gets compiled down to a regular assignment instead of actually creating the intermediate list. You can see this for yourself by playing with the dis module on different versions of the interpreter.
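To inspect the bytecode yourself, something along these lines should work. The function name build is just a wrapper chosen for this sketch, and the disassembly is deliberately not reproduced here because it differs between interpreter versions.

```python
import dis


def build(string_of_kwargs):
    # The single-element list [pair.split('=')] exists only to bind k and v.
    return {k: v
            for pair in string_of_kwargs.split(',')
            for k, v in [pair.split('=')]}


print(build('n=6,m=10'))
dis.dis(build)  # compare the output across Python versions
```

Running the same script under an interpreter older than 3.9 and a newer one and comparing the two listings is the easiest way to spot the difference.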
Yet another option (though I would still recommend using dict and a generator):
>>> from operator import methodcaller
>>> kv_split = methodcaller('split', '=', 1)
>>> {k: v for k, v in map(kv_split, string_of_kwargs.split(","))}
{'n': '6', 'm': '10'}
If you know that string_of_kwargs will always have the same format (trusted input), i.e. comma-separated assignment expressions, you can let Python evaluate it. See https://realpython.com/python-eval-function/
# Convenient (works even when the values themselves contain '=' or ','), but risky:
# eval executes arbitrary code, so use it only on trusted input.
# Note that the values are evaluated as well: '6' -> 6
eval(f'dict({string_of_kwargs})')
from ast import literal_eval
# Safer: only the values are evaluated, and only as Python literals: '6' -> 6
dict((k, literal_eval(v)) for k, v in (pair.split('=', 1) for pair in string_of_kwargs.split(',')))
You can use a regular expression to split on the "," and "=" characters.
Then take the even indexes as keys and the odd indexes as values for your dictionary:
import re
string_of_kwargs = 'n=6,m=10'
splitted = re.split('=|,', string_of_kwargs)  # This will split on = or ,
# Using Python list slicing
keys = splitted[0::2]  # get all the keys in splitted
values = splitted[1::2]  # get all the values in splitted
result = dict(zip(keys, values))  # pair each key with its value
Use re.findall to extract the tokens from the text, then pair them up to create the dict:
import re
string_of_kwargs = 'n=6,m=10'
x = re.findall(r"[\w']+", string_of_kwargs)
your_dict = dict(zip(x[::2], x[1::2]))
I have an ajax POST that sends a dictionary from javascript to my Flask back-end like this:
{'output[0][description]': ['Source File'],
'output[0][input]': ['Some_document.pdf'],
'output[1][description]': ['Name'],
'output[1][input]': ['Ari'],
'output[2][description]': ['Address'],
'output[2][input]': ['12 fake st']}
So I am trying to reorganize it on the back-end to look like this:
['Source File']:['Some_document.pdf'],
['Name']:['Ari'],
['Address']:['12 fake st'],
Any ideas?
One problem: you can't use a list as the key of a dict, because lists are not hashable.
You could use the regular expression module (re) to examine each key and determine whether it conforms to the expression
output\[(\d+)\]\[description\]
For each key that does, find the corresponding key
output[$1][input]
and put the two together in the final dict.
The following is a sketch:
import re
P = re.compile(r'output\[(\d+)\]\[description\]')
inp = {'output[0][description]': ['Source File'], 'output[0][input]': ['Some_document.pdf'],
       'output[1][description]': ['Name'], 'output[1][input]': ['Ari'],
       'output[2][description]': ['Address'], 'output[2][input]': ['12 fake st']}
out = {}
for key in inp:
    m = P.fullmatch(key)
    if m:
        # m.group(1) is the index; use it to look up the matching input key
        out[inp[key][0]] = inp['output[' + m.group(1) + '][input]'][0]
print(out)
I agree with @Klaus D.'s comment: you should reorganize your API to use JSON, which would simplify things. Until then, the following solution avoids regex, which should make it faster, and prints the expected pairs:
for i in range(3):  # three description/input pairs in the sample data
    print(f"{inp['output['+str(i)+'][description]']}:{inp['output['+str(i)+'][input]']}")
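For completeness, a regex-free loop can also build the dictionary itself rather than print it. This is a sketch that assumes the indices start at 0 and are contiguous, and that every value is a one-element list, as in the sample data; the name out is only illustrative.

```python
inp = {'output[0][description]': ['Source File'], 'output[0][input]': ['Some_document.pdf'],
       'output[1][description]': ['Name'], 'output[1][input]': ['Ari'],
       'output[2][description]': ['Address'], 'output[2][input]': ['12 fake st']}

out = {}
i = 0
# Walk the indices until a description key is missing.
while f'output[{i}][description]' in inp:
    description = inp[f'output[{i}][description]'][0]
    out[description] = inp[f'output[{i}][input]'][0]
    i += 1

print(out)
```

Unwrapping the one-element lists gives plain strings, which also sidesteps the problem that lists cannot be used as dict keys.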
Say I have a dictionary like this :
d = {'ben' : 10, 'kim' : 20, 'bob' : 9}
Is there a way to remove a pair like ('bob',9) from the dictionary?
I already know about d.pop('bob') but that will remove the pair even if the value was something other than 9.
Right now the only way I can think of is something like this :
if (d.get('bob', None) == 9):
    d.pop('bob')
But is there an easier way, possibly without using if at all?
pop also returns the value, so performance-wise (as negligible as the difference may be) and readability-wise it might be better to use del.
Other than that I don't think there's something easier/better you can do.
from timeit import Timer
def _del():
    d = {'a': 1}
    del d['a']
def _pop():
    d = {'a': 1}
    d.pop('a')
print(min(Timer(_del).repeat(5000, 5000)))
# 0.0005624240000000613
print(min(Timer(_pop).repeat(5000, 5000)))
# 0.0007729860000003086
You want to perform two operations here:
1) Test the condition d['bob']==9.
2) Remove the key along with its value if the condition is true.
So we cannot omit the test, which requires some form of if, altogether. But we can certainly do it in one line:
d.pop('bob') if d.get('bob')==9 else None
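If the check is needed in several places, wrapping it in a small function may read better than the one-liner. The name remove_if_equal is made up for this sketch. Testing `key in d` first, rather than relying on d.get, keeps the behaviour correct even when the value being compared against is None.

```python
def remove_if_equal(d, key, value):
    """Delete d[key] only if the key is present and maps to value.

    Returns True if the pair was removed, False otherwise.
    """
    if key in d and d[key] == value:
        del d[key]
        return True
    return False


d = {'ben': 10, 'kim': 20, 'bob': 9}
removed_bob = remove_if_equal(d, 'bob', 9)   # pair matches, so it is removed
removed_kim = remove_if_equal(d, 'kim', 9)   # value differs, so nothing happens
print(removed_bob, removed_kim, d)
```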
I currently have a dictionary that looks like this:
{OctetString('Ethernet8/6'): Integer(1),
OctetString('Ethernet8/7'): Integer(2),
OctetString('Ethernet8/8'): Integer(2),
OctetString('Ethernet8/9'): Integer(1),
OctetString('Vlan1'): Integer(2),
OctetString('Vlan10'): Integer(1),
OctetString('Vlan15'): Integer(1),
OctetString('loopback0'): Integer(1),
OctetString('mgmt0'): Integer(1),
OctetString('port-channel1'): Integer(1),
OctetString('port-channel10'): Integer(1),
OctetString('port-channel101'): Integer(1),
OctetString('port-channel102'): Integer(1)}
I want my dictionary to look like this:
{OctetString('Ethernet8/6'): Integer(1),
OctetString('Ethernet8/7'): Integer(2),
OctetString('Ethernet8/8'): Integer(2),
OctetString('Ethernet8/9'): Integer(1)}
I am not sure what the best way is to find these key/value pairs. I really want anything that matches 'Ethernet(\d*)/(\d*)'. My main goal is to match all the Ethernet values and then count them. For example: once I have the dict containing only the Ethernetx/x entries, I want to count the number of 1's and 2's.
Also, why do I get only Ethernet8/6 when I iterate the dictionary and print, but when I pprint the dictionary I end up with OctetString('Ethernet8/6')?
for k in snmp_comb: print k
Ethernet2/18
Ethernet2/31
Ethernet2/30
Ethernet2/32
Ethernet8/46
This should do it:
new_dict = dict()
for key, value in orig_dict.items():
    if 'Ethernet' in str(key):
        new_dict[key] = value
When you use print, Python calls the __str__ method on the OctetString object, which returns Ethernet8/6. pprint, on the other hand, formats objects using their repr(), which includes the type name.
EDIT:
Stefan Pochmann has rightly pointed out below that the 'Ethernet' in str(key) test will match any key that contains the word Ethernet. The OP mentioned using a regex in his post to match Ethernet(\d*)/(\d*), so this answer may not suit others looking to solve a similar problem.
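For a stricter match and the counting step the question asks about, a regex filter combined with collections.Counter is one possibility. This is a sketch under assumptions: plain strings and ints stand in for the OctetString and Integer objects, and it assumes the real values can be converted with int(). The pattern requires at least one digit on each side of the slash, which is slightly stricter than the (\d*) groups in the question.

```python
import re
from collections import Counter

# Plain strings and ints stand in for the OctetString / Integer objects.
snmp_comb = {
    'Ethernet8/6': 1, 'Ethernet8/7': 2, 'Ethernet8/8': 2, 'Ethernet8/9': 1,
    'Vlan1': 2, 'mgmt0': 1, 'port-channel1': 1,
}

pattern = re.compile(r'Ethernet\d+/\d+')

# Keep only keys whose string form matches the whole pattern.
ethernet = {key: value for key, value in snmp_comb.items()
            if pattern.fullmatch(str(key))}

# Count how often each status value occurs among the Ethernet entries.
counts = Counter(int(value) for value in ethernet.values())
print(ethernet)
print(counts)
```

Using fullmatch rather than a substring test means a key such as 'Ethernet8/6.100' would be rejected; whether that is desirable depends on the interface names in the real data.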
(I'll use the same 'Ethernet' in str(key) test as the accepted answer.)
If you want to keep the original dict and have the filtered version as a separate dictionary, I'd use a comprehension:
newdict = {key: value
           for key, value in mydict.items()
           if 'Ethernet' in str(key)}
If you don't want to keep the original dict, you can also just remove the entries you don't want:
for key in list(mydict):
    if 'Ethernet' not in str(key):
        del mydict[key]
The reason you get "OctetString('...')" is the same as this one:
>>> import pprint
>>> 'foo'
'foo'
>>> pprint.pprint('foo')
'foo'
>>> print('foo')
foo
The first two tests show you a representation you can use in source code, that's why there are quotes. It's what the repr function gets you. The third test prints the value for normal pleasure, so doesn't add quotes. The "OctetString('...')" is simply such a representation as well, and you can copy&paste it into source code and get actual OctetString objects again, rather than Python string objects. I guess pprint is mostly intended for developing, where it's more useful to get the full repr version.
I've found how to split a delimited string into key:value pairs in a dictionary elsewhere, but I have an incoming string that also includes two parameters that amount to dictionaries themselves: parameters with one or three key:value pairs inside:
clientid=b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0&keyid=987654321&userdata=ip:192.168.10.10,deviceid:1234,optdata:75BCD15&md=AMT-Cam:avatar&playbackmode=st&ver=6&sessionid=&mk=PC&junketid=1342177342&version=6.7.8.9012
Obviously these are dummy parameters to obfuscate proprietary code, here. I'd like to dump all this into a dictionary with the userdata and md keys' values being dictionaries themselves:
requestdict = {'clientid' : 'b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0', 'keyid' : '987654321', 'userdata' : {'ip' : '192.168.10.10', 'deviceid' : '1234', 'optdata' : '75BCD15'}, 'md' : {'Cam' : 'avatar'}, 'playbackmode' : 'st', 'ver' : '6', 'sessionid' : '', 'mk' : 'PC', 'junketid' : '1342177342', 'version' : '6.7.8.9012'}
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries? What would the syntax be? If not, I suppose I'll have to split by & and then check & handle splits that contain : but even then I can't figure out the syntax. Can someone help? Thanks!
I basically took Kyle's answer and made it more future-friendly:
def dictelem(request):  # renamed from "input" to avoid shadowing the builtin
    parts = request.split('&')
    listing = [part.split('=') for part in parts]
    result = {}
    for entry in listing:
        head, tail = entry[0], ''.join(entry[1:])
        if ':' in tail:
            # the value holds comma-separated key:value pairs of its own
            entries = tail.split(',')
            result.update({head: dict(e.split(':') for e in entries)})
        else:
            result.update({head: tail})
    return result
Here's a two-liner that does what I think you want:
dictelem = lambda x: x if ':' not in x[1] else [x[0], dict(y.split(':') for y in x[1].split(','))]
a = dict(dictelem(x.split('=')) for x in clientRequest.split('&'))
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries?
Of course you can, but (a) you probably don't want to, because nested comprehensions beyond two levels tend to get unreadable, and (b) this super-simple syntax won't work for cases like yours, where only some of the data can be turned into a dict.
For example, what should happen with 'PC'? Do you want to make that into {'PC': None}? Or maybe the set {'PC'}? Or the list ['PC']? Or just leave it alone? You have to decide, and write the logic for that, and trying to write it as an expression will make your decision very hard to read.
So, let's put that logic in a separate function:
def parseCommasAndColons(s):
    bits = [bit.split(':') for bit in s.split(',')]
    try:
        return dict(bits)
    except ValueError:
        return bits
This will return a dict like {'ip': '192.168.10.10', 'deviceid': '1234', 'optdata': '75BCD15'} or {'AMT-Cam': 'avatar'} for cases where each comma-separated component has a colon inside it, but a list of lists like [['1342177342']] for cases where any of them don't.
Even this may be a little too clever; I might make the "is this in dictionary format" check more explicit instead of just trying to convert the list of lists and see what happens.
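A more explicit version of that check might look like the sketch below. The function name and the decision to return the original string unchanged when the value is not in dictionary format are assumptions made for illustration; the try/except version above returns a list of lists in that case instead.

```python
def parse_commas_and_colons(s):
    """Return a dict if every comma-separated part has exactly one colon,
    otherwise return the original string unchanged (hypothetical variant)."""
    parts = s.split(',')
    if all(part.count(':') == 1 for part in parts):
        return dict(part.split(':') for part in parts)
    return s


print(parse_commas_and_colons('ip:192.168.10.10,deviceid:1234,optdata:75BCD15'))
print(parse_commas_and_colons('AMT-Cam:avatar'))
print(parse_commas_and_colons('1342177342'))
```

Returning the plain string keeps values such as 'PC' or an empty sessionid exactly as they arrived, which happens to match the layout requested in the question.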
Either way, how would you put that back into your original comprehension?
Well, you want to call it on the value in the line.split('='). So let's add a function for that:
def parseCommasAndColonsForValue(keyvalue):
    if len(keyvalue) == 2:
        return keyvalue[0], parseCommasAndColons(keyvalue[1])
    else:
        return keyvalue
requestDict = dict(parseCommasAndColonsForValue(line.split('='))
                   for line in clientRequest.split('&'))
One last thing: Unless you need to run on older versions of Python, you shouldn't often be calling dict on a generator expression. If it can be rewritten as a dictionary comprehension, it will almost certainly be clearer that way, and if it can't be rewritten as a dictionary comprehension, it probably shouldn't be a 1-liner expression in the first place.
Of course breaking expressions up into separate expressions, turning some of them into statements or even functions, and naming them does make your code longer—but that doesn't necessarily mean worse. About half of the Zen of Python (import this) is devoted to explaining why. Or one quote from Guido: "Python is a bad language for code golf, on purpose."
If you really want to know what it would look like, let's break it into two steps:
>>> {k: [bit2.split(':') for bit2 in v.split(',')] for k, v in (bit.split('=') for bit in s.split('&'))}
{'clientid': [['b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0']],
'junketid': [['1342177342']],
'keyid': [['987654321']],
'md': [['AMT-Cam', 'avatar']],
'mk': [['PC']],
'playbackmode': [['st']],
'sessionid': [['']],
'userdata': [['ip', '192.168.10.10'],
['deviceid', '1234'],
['optdata', '75BCD15']],
'ver': [['6']],
'version': [['6.7.8.9012']]}
That illustrates why you can't just add a dict call for the inner level—because most of those things aren't actually dictionaries, because they had no colons. If you changed that, then it would just be this:
{k: dict(bit2.split(':') for bit2 in v.split(',')) for k, v in (bit.split('=') for bit in s.split('&'))}
I don't think that's very readable, and I doubt most Python programmers would. Reading it 6 months from now and trying to figure out what I meant would take a lot more effort than writing it did.
And trying to debug it will not be fun. What happens if you run that on your input, with missing colons? ValueError: dictionary update sequence element #0 has length 1; 2 is required. Which sequence? No idea. You have to break it down step by step to see what doesn't work. That's no fun.
So, hopefully that illustrates why you don't want to do this.