Changing dictionary keys with a function - python

Suppose a dictionary like
d1 = {
'a b':1,
'c-d':2,
'Ef':3
}
I want to run a function that renames all the keys according to some rules, for example lowercasing and changing spaces and - to _. So as to get the result:
d2 = {
'a_b':1,
'c_d':2,
'ef':3
}
The difference between this question and the other similar questions on this site about renaming dictionary keys, is that here we don't know the columns beforehand. So we want to run a renaming function on all the keys (to normalize them or something like that).

Declare such a function, then use a dictionary comprehension (search for "dict comprehensions") to loop over the dict items and call that transformer.
def transform_key(key):
return key.lower().replace(" ", "_").replace("-", "_")
d1 = {
'a b':1,
'c-d':2,
'Ef':3
}
d2 = {transform_key(key): value for (key, value) in d1.items()}
print(d2)
outputs
{'a_b': 1, 'c_d': 2, 'ef': 3}

Related

How to sum equal key values when inserting them into a new dictionary in Python?

I have the dictionary that I got from a .txt file.
dictOne = {
"AAA": 0,
"BBB": 1,
"AAA": 3,
"BBB": 1,
}
I would like to generate a new dictionary called dictTwo with the sum of values of equal keys. Result:
dictTwo = {
"AAA": 3,
"BBB": 2,
}
I prepared the following code, but it points to error syntax (SyntaxError: invalid syntax):
import json
dictOne = json.loads(text)
dictTwo = {}
for k, v in dictOne.items():
dictTwo [k] = v += v
Can anyone help me what error?
Assuming you resolve the duplicate key issue in dict
dictOne = {
"AAA": 0,
"BBB": 1,
"AAA": 3,
"BBB": 1
}
dictTwo = {
"AAA": 3,
"BBB": 2,
}
for k, v in dictOne.items():
if k in dictTwo:
dictTwo [k] += v
else:
dictTwo[k] = v
print(dictTwo)
You can do this if you do it while reading the JSON input.
JSON permits duplicate keys in objects, although it discourages the practice, noting that different JSON processors produce different results for duplicate keys.
Python does not allow duplicate keys in dictionaries, and Python's json module handles duplicate keys in one of the ways noted by the JSON standard: it ignores all but the last value for any such key. However, it gives you a mechanism to do your own processing of objects, in case you want to do something else with duplicate keys (or produce something other than a dictionary).
You do this by providing the object_pairs_hook parameter to json.load or json.loads. That parameter should be a function whose argument is an iterable of (key, value) pairs, where the key is a string and the value is an already processed JSON object. Whatever the function returns will be the value used by json.load for an object literal; it does not need to return a dict.
That implies that the handling of duplicate keys will be the same for every object literal in the JSON input, which is a bit of a limitation, but it may be acceptable in your case.
Here's a simple example:
import json
def key_combiner(pairs):
rv = {}
for k, v in pairs:
if k in rv: rv[k] += v
else: rv[k] = v
return rv
# Sample usage:
# (Note: JSON doesn't allow trailing commas in objects or lists.)
json_data = '''{
"AAA": 0,
"BBB": 1,
"AAA": 3,
"BBB": 1
}'''
consolidated = json.loads(json_data, object_pairs_hook=key_combiner)
print(consolidated)
This prints {'AAA': 3, 'BBB': 2}.
If I'd known that the values were numbers, I could have used a slightly simpler definition using defaultdict. Writing it the way I did permits combining certain other value types, such as strings or arrays, provided that all the values for the same key in an object are the same type. (Unfortunately, it doesn't allow combining objects, because Python uses | to combine two dicts, instead of +.)
This feature was mostly intended to be used for creating class instances from json objects, but it has many other possible uses.

LEFT JOIN dictionaries in python based on value

#Input
dict_1 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.250"}}
dict_2 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.252"}}
#Mapper can be modified as required
mapper = {"10.10.210.250":"black","192.168.2.1":"black"}
I am getting each dict in a loop, in each iteration I need to check a dict against the mapper and append a flag based on match between dict_1.orig_h and mapper.10.10.210.250 . I have the flexibility to define the mapper however I need.
So the desired result would be:
dict_1 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.250", "class":"black"}}
dict_2 will remain unchanged since there is no matching value in mapper.
This is kinda what I want, but it works only if orig_h is an int
import collections
result = collections.defaultdict(dict)
for d in dict_1:
result[d[int('orig_h')]].update(d)
for d in mapper:
result[d[int('orig_h')]].update(d)
Not much explaining to be done; if the ip is in the mapper dictionary (if mapper has a key which is that ip) then set the desired attribute of the dict to the value of the key in the mapper dict ('black' here).
def update_dict(dic, mapper):
ip = dic['conn']['orig_h']
if ip in mapper:
dic['conn']['class'] = mapper[ip]
which works exactly as desired:
>>> update_dict(dict_1, mapper)
>>> dict_1
{'conn': {'ts': 15, 'uid': 'ABC', 'orig_h': '10.10.210.250', 'class': 'black'}}
>>> update_dict(dict_2, mapper)
>>> dict_2
{'conn': {'ts': 15, 'uid': 'ABC', 'orig_h': '10.10.210.252'}}
Extracting the conn value for simplicity:
conn_data = dict_1['conn']
conn_data['class'] = mapper[conn_data['orig_h']]
A two liner, extracting class and dict if the 'orig_h' is in the mapper dictionary's keys, if it id, keep it, otherwise don't keep it, then create a new dictionary comprehension inside the list comprehension to add 'class' to the dictionary's 'conn' key's dictionary.
l=[(i,mapper[i['conn']['orig_h']]) for i in (dict_1,dict_2) if i['conn']['orig_h'] in mapper]
print([{'conn':dict(a['conn'],**{'class':b})} for a,b in l])
BTW this answer chooses the dictionaries automatically

Recursively replace characters in a dictionary

How do I change all dots . to underscores (in the dict's keys), given an arbitrarily nested dictionary?
What I tried is write two loops, but then I would be limited to 2-level-nested dictionaries.
This ...
{
"brown.muffins": 5,
"green.pear": 4,
"delicious.apples": {
"green.apples": 2
{
}
... should become:
{
"brown_muffins": 5,
"green_pear": 4,
"delicious_apples": {
"green_apples": 2
{
}
Is there an elegant way?
You can write a recursive function, like this
from collections.abc import Mapping
def rec_key_replace(obj):
if isinstance(obj, Mapping):
return {key.replace('.', '_'): rec_key_replace(val) for key, val in obj.items()}
return obj
and when you invoke this with the dictionary you have shown in the question, you will get a new dictionary, with the dots in keys replaced with _s
{'delicious_apples': {'green_apples': 2}, 'green_pear': 4, 'brown_muffins': 5}
Explanation
Here, we just check if the current object is an instance of dict and if it is, then we iterate the dictionary, replace the key and call the function recursively. If it is actually not a dictionary, then return it as it is.
Assuming . is only present in keys and all the dictionary's contents are primitive literals, the really cheap way would be to use str() or repr(), do the replacement, then ast.literal_eval() to get it back:
d ={
"brown.muffins": 5,
"green.pear": 4,
"delicious_apples": {
"green.apples": 2
} # correct brace
}
Result:
>>> import ast
>>> ast.literal_eval(repr(d).replace('.','_'))
{'delicious_apples': {'green_apples': 2}, 'green_pear': 4, 'brown_muffins': 5}
If the dictionary has . outside of keys, we can replace more carefully by using a regular expression to look for strings like 'ke.y': and replace only those bits:
>>> import re
>>> ast.literal_eval(re.sub(r"'(.*?)':", lambda x: x.group(0).replace('.','_'), repr(d)))
{'delicious_apples': {'green_apples': 2}, 'green_pear': 4, 'brown_muffins': 5}
If your dictionary is very complex, with '.' in values and dictionary-like strings and so on, use a real recursive approach. Like I said at the start, though, this is the cheap way.

Initializing a dictionary in python with a key value and no corresponding values

I was wondering if there was a way to initialize a dictionary in python with keys but no corresponding values until I set them. Such as:
Definition = {'apple': , 'ball': }
and then later i can set them:
Definition[key] = something
I only want to initialize keys but I don't know the corresponding values until I have to set them later. Basically I know what keys I want to add the values as they are found. Thanks.
Use the fromkeys function to initialize a dictionary with any default value. In your case, you will initialize with None since you don't have a default value in mind.
empty_dict = dict.fromkeys(['apple','ball'])
this will initialize empty_dict as:
empty_dict = {'apple': None, 'ball': None}
As an alternative, if you wanted to initialize the dictionary with some default value other than None, you can do:
default_value = 'xyz'
nonempty_dict = dict.fromkeys(['apple','ball'],default_value)
You could initialize them to None.
you could use a defaultdict. It will let you set dictionary values without worrying if the key already exists. If you access a key that has not been initialized yet it will return a value you specify (in the below example it will return None)
from collections import defaultdict
your_dict = defaultdict(lambda : None)
It would be good to know what your purpose is, why you want to initialize the keys in the first place. I am not sure you need to do that at all.
1) If you want to count the number of occurrences of keys, you can just do:
Definition = {}
# ...
Definition[key] = Definition.get(key, 0) + 1
2) If you want to get None (or some other value) later for keys that you did not encounter, again you can just use the get() method:
Definition.get(key) # returns None if key not stored
Definition.get(key, default_other_than_none)
3) For all other purposes, you can just use a list of the expected keys, and check if the keys found later match those.
For example, if you only want to store values for those keys:
expected_keys = ['apple', 'banana']
# ...
if key_found in expected_keys:
Definition[key_found] = value
Or if you want to make sure all expected keys were found:
assert(all(key in Definition for key in expected_keys))
You can initialize the values as empty strings and fill them in later as they are found.
dictionary = {'one':'','two':''}
dictionary['one']=1
dictionary['two']=2
Comprehension could be also convenient in this case:
# from a list
keys = ["k1", "k2"]
d = {k:None for k in keys}
# or from another dict
d1 = {"k1" : 1, "k2" : 2}
d2 = {k:None for k in d1.keys()}
d2
# {'k1': None, 'k2': None}
q = input("Apple")
w = input("Ball")
Definition = {'apple': q, 'ball': w}
Based on the clarifying comment by #user2989027, I think a good solution is the following:
definition = ['apple', 'ball']
data = {'orange':1, 'pear':2, 'apple':3, 'ball':4}
my_data = {}
for k in definition:
try:
my_data[k]=data[k]
except KeyError:
pass
print my_data
I tried not to do anything fancy here. I setup my data and an empty dictionary. I then loop through a list of strings that represent potential keys in my data dictionary. I copy each value from data to my_data, but consider the case where data may not have the key that I want.

Multiple keys per value

Is it possible to assign multiple keys per value in a Python dictionary. One possible solution is to assign value to each key:
dict = {'k1':'v1', 'k2':'v1', 'k3':'v1', 'k4':'v2'}
but this is not memory efficient since my data file is > 2 GB. Otherwise you could make a dictionary of dictionary keys:
key_dic = {'k1':'k1', 'k2':'k1', 'k3':'k1', 'k4':'k4'}
dict = {'k1':'v1', 'k4':'v2'}
main_key = key_dict['k2']
value = dict[main_key]
This is also very time and effort consuming because I have to go through whole dictionary/file twice. Is there any other easy and inbuilt Python solution?
Note: my dictionary values are not simple string (as in the question 'v1', 'v2') rather complex objects (contains different other dictionary/list etc. and not possible to pickle them)
Note: the question seems similar as How can I use both a key and an index for the same dictionary value?
But I am not looking for ordered/indexed dictionary and I am looking for other efficient solutions (if any) other then the two mentioned in this question.
What type are the values?
dict = {'k1':MyClass(1), 'k2':MyClass(1)}
will give duplicate value objects, but
v1 = MyClass(1)
dict = {'k1':v1, 'k2':v1}
results in both keys referring to the same actual object.
In the original question, your values are strings: even though you're declaring the same string twice, I think they'll be interned to the same object in that case
NB. if you're not sure whether you've ended up with duplicates, you can find out like so:
if dict['k1'] is dict['k2']:
print("good: k1 and k2 refer to the same instance")
else:
print("bad: k1 and k2 refer to different instances")
(is check thanks to J.F.Sebastian, replacing id())
Check out this - it's an implementation of exactly what you're asking: multi_key_dict(ionary)
https://pypi.python.org/pypi/multi_key_dict
(sources at https://github.com/formiaczek/python_data_structures/tree/master/multi_key_dict)
(on Unix platforms it possibly comes as a package and you can try to install it with something like:
sudo apt-get install python-multi-key-dict
for Debian, or an equivalent for your distribution)
You can use different types for keys but also keys of the same type. Also you can iterate over items using key types of your choice, e.g.:
m = multi_key_dict()
m['aa', 12] = 12
m['bb', 1] = 'cc and 1'
m['cc', 13] = 'something else'
print m['aa'] # will print '12'
print m[12] # will also print '12'
# but also:
for key, value in m.iteritems(int):
print key, ':', value
# will print:1
# 1 : cc and 1
# 12 : 12
# 13 : something else
# and iterating by string keys:
for key, value in m.iteritems(str):
print key, ':', value
# will print:
# aa : 12
# cc : something else
# bb : cc and 1
m[12] = 20 # now update the value
print m[12] # will print '20' (updated value)
print m['aa'] # will also print '20' (it maps to the same element)
There is no limit to number of keys, so code like:
m['a', 3, 5, 'bb', 33] = 'something'
is valid, and either of keys can be used to refer to so-created value (either to read / write or delete it).
Edit: From version 2.0 it should also work with python3.
Using python 2.7/3 you can combine a tuple, value pair with dictionary comprehension.
keys_values = ( (('k1','k2'), 0), (('k3','k4','k5'), 1) )
d = { key : value for keys, value in keys_values for key in keys }
You can also update the dictionary similarly.
keys_values = ( (('k1',), int), (('k3','k4','k6'), int) )
d.update({ key : value for keys, value in keys_values for key in keys })
I don't think this really gets to the heart of your question but in light of the title, I think this belongs here.
The most straightforward way to do this is to construct your dictionary using the dict.fromkeys() method. It takes a sequence of keys and a value as inputs and then assigns the value to each key.
Your code would be:
dict = dict.fromkeys(['k1', 'k2', 'k3'], 'v1')
dict.update(dict.fromkeys(['k4'], 'v2'))
And the output is:
print(dict)
{'k1': 'v1', 'k2': 'v1', 'k3': 'v1', 'k4': 'v2'}
You can build an auxiliary dictionary of objects that were already created from the parsed data. The key would be the parsed data, the value would be your constructed object -- say the string value should be converted to some specific object. This way you can control when to construct the new object:
existing = {} # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
obj = existing.setdefault(v, MyClass(v)) # could be made more efficient
result[k] = obj
Then all the result dictionary duplicate value objects will be represented by a single object of the MyClass class. After building the result, the existing auxiliary dictionary can be deleted.
Here the dict.setdefault() may be elegant and brief. But you should test later whether the more talkative solution is not more efficient -- see below. The reason is that MyClass(v) is always created (in the above example) and then thrown away if its duplicate exists:
existing = {} # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
if v in existing:
obj = existing[v]
else:
obj = MyClass(v)
existing[v] = obj
result[k] = obj
This technique can be used also when v is not converted to anything special. For example, if v is a string, both key and value in the auxiliary dictionary will be of the same value. However, the existence of the dictionary ensures that the object will be shared (which is not always ensured by Python).
I was able to achieve similar functionality using pandas MultiIndex, although in my case the values are scalars:
>>> import numpy
>>> import pandas
>>> keys = [numpy.array(['a', 'b', 'c']), numpy.array([1, 2, 3])]
>>> df = pandas.DataFrame(['val1', 'val2', 'val3'], index=keys)
>>> df.index.names = ['str', 'int']
>>> df.xs('b', axis=0, level='str')
0
int
2 val2
>>> df.xs(3, axis=0, level='int')
0
str
c val3
I'm surprised no one has mentioned using Tuples with dictionaries. This works just fine:
my_dictionary = {}
my_dictionary[('k1', 'k2', 'k3')] = 'v1'
my_dictionary[('k4')] = 'v2'

Categories