I have a dictionary my_dict having some elements like:
my_dict = {
    'India': 'Delhi',
    'Canada': 'Ottawa',
}
Now I want to add multiple key-value pairs to that dict at once, so that it looks like:
my_dict = {
    'India': 'Delhi',
    'Canada': 'Ottawa',
    'USA': 'Washington',
    'Brazil': 'Brasilia',
    'Australia': 'Canberra',
}
Is there a way to do this without adding the elements one after another?
Use the update() method:
d = {'India': 'Delhi', 'Canada': 'Ottawa'}
d.update({'USA': 'Washington', 'Brazil': 'Brasilia', 'Australia': 'Canberra'})
PS: Naming your dictionary dict is a horrible idea: it shadows the built-in dict.
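Worth noting: update() is not limited to dict arguments; it also accepts keyword arguments and any iterable of key-value pairs:

```python
d = {'India': 'Delhi', 'Canada': 'Ottawa'}

# update() also takes keyword arguments (handy when the keys are
# valid Python identifiers)...
d.update(USA='Washington', Brazil='Brasilia')

# ...and any iterable of (key, value) pairs.
d.update([('Australia', 'Canberra')])
print(d)
```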
To make things more interesting in this answer section, you can also add multiple key-value pairs to a dict with dictionary unpacking (Python 3.5 or later):
d = {'India': 'Delhi', 'Canada': 'Ottawa'}
d = {**d, 'USA': 'Washington', 'Brazil': 'Brasilia', 'Australia': 'Canberra', 'India': 'Blaa'}
Which produces:
{'India': 'Blaa', 'Canada': 'Ottawa', 'USA': 'Washington', 'Brazil': 'Brasilia', 'Australia': 'Canberra'}
This alternative doesn't even seem memory-inefficient, which somewhat contradicts one of the postulates of "The Zen of Python":
There should be one-- and preferably only one --obvious way to do it
What I didn't like about the d.update() alternative is the round brackets: when I skim-read and see round brackets, I usually think of tuples.
Either way, I added this answer just to have some fun.
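The same unpacking syntax also merges two existing dicts, with later entries winning on duplicate keys:

```python
d1 = {'India': 'Delhi', 'Canada': 'Ottawa'}
d2 = {'USA': 'Washington', 'India': 'Blaa'}

# Later unpackings override earlier ones for duplicate keys,
# so 'India' ends up as 'Blaa'.
merged = {**d1, **d2}
print(merged)
```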
You have a few options:
Use update():
d = {'India': 'Delhi', 'Canada': 'Ottawa'}
d.update({'USA': 'Washington', 'Brazil': 'Brasilia', 'Australia': 'Canberra'})
Use the merge operator | (Python 3.9 or later):
d = {'India': 'Delhi', 'Canada': 'Ottawa'}
d2 = {'USA': 'Washington', 'Brazil': 'Brasilia', 'Australia': 'Canberra'}
new_dict = d | d2
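There is also the in-place variant |= (likewise Python 3.9+), which updates d directly instead of building a new dict:

```python
d = {'India': 'Delhi', 'Canada': 'Ottawa'}
d2 = {'USA': 'Washington', 'Brazil': 'Brasilia', 'Australia': 'Canberra'}

# |= merges d2 into d in place; values from d2 win on
# duplicate keys, just like dict.update().
d |= d2
print(d)
```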
The update() method works well.
As someone who primarily works with pandas DataFrames, I wanted to share how you can take values from a DataFrame and add them to a dictionary using update() and DataFrame.to_dict().
import pandas as pd
Existing dictionary
my_dict = {
    'India': 'Delhi',
    'Canada': 'Ottawa',
}
Data frame with additional values you want to add to dictionary
country_index = ['USA', 'Brazil', 'Australia']
city_column = ['Washington', 'Brasilia', 'Canberra']
new_values_df = pd.DataFrame(data=city_column, index=country_index, columns=['cities'])
Adding data frame values to dictionary
my_dict.update(new_values_df.to_dict(orient='dict')['cities'])
Dictionary now looks like
my_dict = {
    'India': 'Delhi',
    'Canada': 'Ottawa',
    'USA': 'Washington',
    'Brazil': 'Brasilia',
    'Australia': 'Canberra',
}
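Worth noting: if the data is already two parallel lists, the DataFrame round trip can be skipped entirely, since update() accepts any iterable of pairs:

```python
my_dict = {'India': 'Delhi', 'Canada': 'Ottawa'}
country_index = ['USA', 'Brazil', 'Australia']
city_column = ['Washington', 'Brasilia', 'Canberra']

# zip() pairs each country with its city; update() consumes
# the resulting (key, value) tuples directly.
my_dict.update(zip(country_index, city_column))
print(my_dict)
```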
Related
I have a single list in Python like this:
my_list = ['name', 'degree', 'age', 'score']
and would like to convert it into a dictionary that should look like this, where keys and values are taken from my_list:
my_dict = {'name': name, 'degree': degree, 'age' : age, 'score': score}
I found a lot of examples of how to convert lists, especially two lists, into a dictionary, but nothing for my case.
Use a dictionary comprehension that looks up each variable name in globals() (this assumes variables with those names actually exist at global scope):
my_dict = {var: globals()[var] for var in my_list}
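A sketch of how that behaves, assuming the variables exist (the sample values here are made up):

```python
name, degree, age, score = 'Alice', 'BSc', 30, 95
my_list = ['name', 'degree', 'age', 'score']

# globals() maps variable names (strings) to their values at module
# level; a name missing from the global scope raises KeyError here.
my_dict = {var: globals()[var] for var in my_list}
print(my_dict)
```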
Using a dictionary comprehension would probably be the easiest approach:
my_dict = {i: i for i in my_list}
You can use zip with the dictionary constructor:
dict(zip(my_list, my_list))
{'name': 'name', 'degree': 'degree', 'age': 'age', 'score': 'score'}
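A related built-in worth knowing: dict.fromkeys() builds a dict from the list in one call, though every key gets the same value (None by default) rather than the key itself:

```python
my_list = ['name', 'degree', 'age', 'score']

# All keys share a single value here -- useful for initializing
# a dict, not for the key->key mapping shown above.
d = dict.fromkeys(my_list)
print(d)  # {'name': None, 'degree': None, 'age': None, 'score': None}
```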
I have a very big list of dictionaries; the items are unordered. I would like to group certain elements under a new key. For example:
input = [
    {'name': 'emp1', 'state': 'TX', 'areacode': '001', 'mobile': 123},
    {'name': 'emp1', 'state': 'TX', 'areacode': '002', 'mobile': 234},
    {'name': 'emp1', 'state': 'TX', 'areacode': '003', 'mobile': 345},
    {'name': 'emp2', 'state': 'TX', 'areacode': None, 'mobile': None},
]
For the above input I would like to group areacode and mobile under a new key contactoptions:
opdata = [
    {'name': 'emp1', 'state': 'TX', 'contactoptions': [
        {'areacode': '001', 'mobile': 123},
        {'areacode': '002', 'mobile': 234},
        {'areacode': '003', 'mobile': 345}]},
    {'name': 'emp2', 'state': 'TX', 'contactoptions': [
        {'areacode': None, 'mobile': None}]},
]
I am doing this now with two long iterations. I want to achieve the same more efficiently, as the number of records is large. I am open to using existing methods from packages like pandas.
Try pandas. Build a DataFrame from the records, group by name and state, and collect the remaining columns back into records:
import pandas as pd

df = pd.DataFrame(input)  # the list of dicts from the question
result = (
    df.groupby(['name', 'state'])
    .apply(lambda x: x[['areacode', 'mobile']].to_dict(orient='records'))
    .reset_index(name='contactoptions')
).to_dict(orient='records')
With regular dictionaries, you can do it in a single pass/loop using the setdefault method and no sorting:
data = [
    {'name': 'emp1', 'state': 'TX', 'areacode': '001', 'mobile': 123},
    {'name': 'emp1', 'state': 'TX', 'areacode': '002', 'mobile': 234},
    {'name': 'emp1', 'state': 'TX', 'areacode': '003', 'mobile': 345},
    {'name': 'emp2', 'state': 'TX', 'areacode': None, 'mobile': None},
]
merged = dict()
for d in data:
    od = merged.setdefault(d["name"], {k: d[k] for k in ("name", "state")})
    od.setdefault("contactoptions", []).append({k: d[k] for k in ("areacode", "mobile")})
merged = list(merged.values())
output:
print(merged)
# [{'name': 'emp1', 'state': 'TX', 'contactoptions': [{'areacode': '001', 'mobile': 123}, {'areacode': '002', 'mobile': 234}, {'areacode': '003', 'mobile': 345}]}, {'name': 'emp2', 'state': 'TX', 'contactoptions': [{'areacode': None, 'mobile': None}]}]
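The same single-pass grouping can also be written with collections.defaultdict, which some find more readable (a sketch using the question's field names, with a shortened data sample):

```python
from collections import defaultdict

data = [
    {'name': 'emp1', 'state': 'TX', 'areacode': '001', 'mobile': 123},
    {'name': 'emp1', 'state': 'TX', 'areacode': '002', 'mobile': 234},
    {'name': 'emp2', 'state': 'TX', 'areacode': None, 'mobile': None},
]

groups = defaultdict(list)
for d in data:
    # Group the contact details under the (name, state) pair.
    groups[(d['name'], d['state'])].append(
        {'areacode': d['areacode'], 'mobile': d['mobile']})

merged = [
    {'name': name, 'state': state, 'contactoptions': contacts}
    for (name, state), contacts in groups.items()
]
print(merged)
```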
As you asked, you want to group the input items by 'name' and 'state' together.
My suggestion is to build a dictionary whose keys combine 'name' and 'state' (such as 'emp1-TX') and whose values are lists of 'areacode'/'mobile' entries, such as [{'areacode': '001', 'mobile': 123}]. This way, the output can be achieved in one iteration.
Output:
{'emp1-TX': [{'areacode': '001', 'mobile': 123}, {'areacode': '002', 'mobile': 234}, {'areacode': '003', 'mobile': 345}], 'emp2-TX': [{'areacode': None, 'mobile': None}]}
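A sketch of that idea (the 'name-state' compound key is just one possible choice):

```python
data = [
    {'name': 'emp1', 'state': 'TX', 'areacode': '001', 'mobile': 123},
    {'name': 'emp1', 'state': 'TX', 'areacode': '002', 'mobile': 234},
    {'name': 'emp1', 'state': 'TX', 'areacode': '003', 'mobile': 345},
    {'name': 'emp2', 'state': 'TX', 'areacode': None, 'mobile': None},
]

grouped = {}
for d in data:
    key = f"{d['name']}-{d['state']}"  # compound 'name-state' key
    grouped.setdefault(key, []).append(
        {'areacode': d['areacode'], 'mobile': d['mobile']})
print(grouped)
```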
I have a pandas series whose unique values are something like:
['toyota', 'toyouta', 'vokswagen', 'volkswagen', 'vw', 'volvo']
Now I want to fix some of these values like:
toyouta -> toyota
(Note that not all values have mistakes such as volvo, toyota etc)
I've tried making a dictionary where the key is the word to be corrected and the value is the correct word, and then mapping that onto my series.
This is how my code looks:
corrections = {'maxda': 'mazda', 'porcshce': 'porsche', 'toyota': 'toyouta', 'vokswagen': 'vw', 'volkswagen': 'vw'}
df.brands = df.brands.map(corrections)
print(df.brands.unique())
>>> [nan, 'mazda', 'porsche', 'toyouta', 'vw']
As you can see the problem is that this way, all values not present in the dictionary are automatically converted to nan. One solution is to map all the correct values to themselves, but I was hoping there could be a better way to go about this.
Use:
df.brands = df.brands.map(corrections).fillna(df.brands)
Or:
df.brands = df.brands.map(lambda x: corrections.get(x, x))
Or:
df.brands = df.brands.replace(corrections)
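The corrections.get(x, x) trick also works outside pandas: plain dict.get with the key itself as the default leaves unknown values untouched (sample brand values only):

```python
corrections = {'maxda': 'mazda', 'porcshce': 'porsche', 'vokswagen': 'vw'}
brands = ['maxda', 'volvo', 'porcshce', 'toyota']

# get(x, x) returns the corrected spelling if x is a known
# misspelling, and x unchanged otherwise -- no NaN fallout.
fixed = [corrections.get(x, x) for x in brands]
print(fixed)  # ['mazda', 'volvo', 'porsche', 'toyota']
```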
My codebase relies on managing data that's currently in a very deeply nested dictionary. Example:
'USA': {
    'Texas': {
        'Austin': {
            '2017-01-01': 169,
            '2017-02-01': 231
        },
        'Houston': {
            '2017-01-01': 265,
            '2017-02-01': 310
        }
    }
}
This extends for multiple countries, states/regions, cities, and dates.
I encounter a problem when trying to access values since I need to have a deeply nested for-loop to iterate over each country, state, city, and date to apply some kind of operation. I'm looking for some kind of alternative.
Assuming the nested dict structure is the same, is there an alternative to so many loops? Perhaps using map, reduce or lambda?
Is there a better way to store all of this data without using nested dicts?
You can use a pandas DataFrame object (see the pandas DataFrame documentation), which can store your data in a tabular format, similar to a spreadsheet. In that case, your DataFrame should have a column to represent each key in your nested data (one column for Country, another for State, and so on).
Pandas DataFrames also support filtering, grouping and other useful operations on your records (rows). Let's say you want to filter your data to return only the rows from Texas with dates after '2018-02-01' (df is your DataFrame). This could be achieved with something like this (note the parentheses: & binds tighter than the comparisons):
df[(df['State'] == 'Texas') & (df['Date'] > '2018-02-01')]
To build these DataFrame objects, you could start from your data formatted as a collection of records:
data = [['USA', 'Texas', 'Austin', '2017-01-01', 169],
        ['USA', 'Texas', 'Austin', '2017-02-01', 231],
        ['USA', 'Texas', 'Houston', '2017-01-01', 265],
        ['USA', 'Texas', 'Houston', '2017-02-01', 310]]
and then build them like this:
import pandas as pd

df = pd.DataFrame(data, columns=['Country', 'State', 'City', 'Date', 'Value'])
If DataFrame objects are not an option, and you do not want to use nested loops, you could also access inner data using list comprehensions with nested predicates and filters:
[
    d[country][state][city][date]
    for country in d.keys()
    for state in d[country].keys()
    for city in d[country][state].keys()
    for date in d[country][state][city].keys()
    if country == 'USA' and state == 'Texas' and city == 'Houston'
]
However, I cannot see much advantage in that approach over the nested loops, and there is a penalty in code readability, imho.
Using the collection of records approach pointed earlier (data), instead of a nested structure, you could filter your rows using:
[r for r in data if r[2] == 'Houston']
For improved readability, you could use a list of namedtuple objects as your list of records. Your data would be:
from collections import namedtuple
record = namedtuple('Record', 'country state city date value')
data = [
    record('USA', 'Texas', 'Austin', '2017-01-01', 169),
    record('USA', 'Texas', 'Austin', '2017-02-01', 231),
    record('USA', 'Texas', 'Houston', '2017-01-01', 265),
    record('USA', 'Texas', 'Houston', '2017-02-01', 310)
]
and the filtering becomes more readable, e.g.:
Getting specific records
[r for r in data if r.city == 'Houston']
returning
[
    Record(country='USA', state='Texas', city='Houston', date='2017-01-01', value=265),
    Record(country='USA', state='Texas', city='Houston', date='2017-02-01', value=310)
]
Getting only the values for those specific records
[r.value for r in data if r.city == 'Houston']
returning
[265, 310]
This last approach can also deal with custom object instances, considering that namedtuple objects can store them easily.
You can create a class, implementing overloading methods, and use recursion:
d = {
    'USA': {
        'Texas': {
            'Austin': {
                '2017-01-01': 169,
                '2017-02-01': 231
            },
            'Houston': {
                '2017-01-01': 265,
                '2017-02-01': 310
            }
        }
    }
}
class StateData:
    def __init__(self, structure):
        self.structure = structure
        self.levels = {'country': 0, 'state': 1, 'city': 2, 'date': 3}

    def get_level(self, d, target, current=0):
        total_listing = [((a, b) if target == 3 else a) if current == target else self.get_level(b, target, current + 1) for a, b in d.items()]
        return [i for b in total_listing for i in b] if all(isinstance(i, list) for i in total_listing) else total_listing

    def __getitem__(self, val):
        return self.get_level(self.structure, self.levels[val])
s = StateData(d)
print(s['city'])
print(s['date'])
Output:
['Austin', 'Houston']
[('2017-01-01', 169), ('2017-02-01', 231), ('2017-01-01', 265), ('2017-02-01', 310)]
It may be best to store your data as a list of lists, which will then make it possible for you to group according to the needs of each individual operation. For instance:
state_data = [['USA', 'Texas', 'Austin', '2017-01-01', 169],
              ['USA', 'Texas', 'Austin', '2017-02-01', 231],
              ['USA', 'Texas', 'Houston', '2017-01-01', 265],
              ['USA', 'Texas', 'Houston', '2017-02-01', 310]]
There is also pandas' json_normalize, which lets you flatten a nested dict into a DataFrame like this:
import json

import pandas as pd

d = json.load(f)  # f is an already-open file object holding the JSON
# Parent node of dict d is 'programs'
n = pd.json_normalize(d['programs'])
I am working with a set of data that I have converted to a list of dictionaries
For example one item in my list is
{'reportDate': u'R20070501', 'idnum': u'1078099', 'columnLabel': u'2005',
'actionDate': u'C20070627', 'data': u'76,000', 'rowLabel': u'Sales of Bananas'}
Per request
The second item in my list could be:
{'reportDate': u'R20070501', 'idnum': u'1078099', 'columnLabel': u'2006',
'actionDate': u'C20070627', 'data': u'86,000', 'rowLabel': u'Sales of Bananas'}
The third item could be:
{'reportDate': u'R20070501', 'idnum': u'1078100', 'columnLabel': u'Full Year 2005',
'actionDate': u'C20070627', 'data': u'116,000', 'rowLabel': u'Sales of Cherries'}
The fourth item could be:
{'reportDate': u'R20070501', 'idnum': u'1078100', 'columnLabel': u'Full Year 2006',
'actionDate': u'C20070627', 'data': u'76,000', 'rowLabel': u'Sales of Cherries'}
The reason I need to pickle this is because I need to find out all of the ways the columns were labeled before I consolidate the results and put them into a database. The first and second items will be one row in the results, the third and fourth would be the next line in the results (after someone decides what the uniform column header label should be)
I tested pickle and was able to save and retrieve my data. However, I need to be able to preserve the order in the output. One idea I have is to add another key that would be a counter so I could retrieve my data and then sort by the counter. Is there a better way?
I don't want to put this into a database because it is not permanent.
I marked an answer down below. It is not what I am getting, so I need to figure out if the problem is somewhere else in my code.
So what's wrong with pickle? If you structure your data as a list of dicts, then everything should work as you want it to (if I understand your problem).
>>> import pickle
>>> d1 = {1:'one', 2:'two', 3:'three'}
>>> d2 = {1:'eleven', 2:'twelve', 3:'thirteen'}
>>> d3 = {1:'twenty-one', 2:'twenty-two', 3:'twenty-three'}
>>> data = [d1, d2, d3]
>>> out = open('data.pickle', 'wb')
>>> pickle.dump(data, out)
>>> out.close()
>>> inp = open('data.pickle', 'rb')
>>> data2 = pickle.load(inp)
>>> data == data2
True
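The same round trip written with context managers, so the files are closed automatically and binary mode is explicit (a temporary directory keeps the sketch self-contained):

```python
import os
import pickle
import tempfile

data = [
    {1: 'one', 2: 'two'},
    {1: 'eleven', 2: 'twelve'},
]

# Write and read back through a temp file; 'wb'/'rb' are required
# because pickle is a binary format.
path = os.path.join(tempfile.mkdtemp(), 'data.pickle')
with open(path, 'wb') as out:
    pickle.dump(data, out)
with open(path, 'rb') as f:
    data2 = pickle.load(f)

print(data == data2)  # True: the list order survives the round trip
```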
Before Python 3.7, the Python dict was an unordered container. If you need to preserve the order of the entries on such versions, you should consider using a list of 2-tuples.
Another option would be to keep an extra, ordered list of the keys. This way you can benefit from the quick, keyed access offered by the dictionary, while still being able to iterate through its values in an ordered fashion:
data = {'reportDate': u'R20070501', 'idnum': u'1078099',
        'columnLabel': u'2005', 'actionDate': u'C20070627',
        'data': u'76,000', 'rowLabel': u'Sales of Bananas'}
dataOrder = ['reportDate', 'idnum', 'columnLabel',
             'actionDate', 'data', 'rowLabel']
for key in dataOrder:
    print(key, data[key])
Before version 3.7, Python did not guarantee order in dictionaries.
However, there is the OrderedDict class in the collections module, which preserves insertion order on all versions.
Another option would be to use a list of tuples:
[('reportDate', u'R20080501'), ('idnum', u'1078099'), ...etc]
You can use the built-in dict() if you need to convert this to a dictionary later.
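A quick sketch of that round trip (sample values only):

```python
pairs = [('reportDate', 'R20080501'), ('idnum', '1078099')]

# dict() accepts any iterable of (key, value) tuples...
d = dict(pairs)

# ...and items() goes back to (key, value) pairs, in insertion
# order on Python 3.7+.
assert list(d.items()) == pairs
print(d)  # {'reportDate': 'R20080501', 'idnum': '1078099'}
```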