Python Loop to Create Dictionary - python

I need to create lookup tables in Python from a CSV file, split by the unique values in my columns. The example is attached. I have a Name column that holds the name of the model. For each model, I need a dictionary with the title from the Variable column, the key from the Level column and the value from the Value column. I think the best structure is a dictionary of dictionaries. I will use this lookup table later to multiply the values together based on the keys.
Here is code to generate a sample data set:
import pandas as pd
Name = ['model1', 'model1', 'model1', 'model2', 'model2', 'model2',
        'model1', 'model1', 'model1', 'model1',
        'model2', 'model2', 'model2', 'model2']
Variable = ['channel_model', 'channel_model', 'channel_model',
            'channel_model', 'channel_model', 'channel_model',
            'driver_age', 'driver_age', 'driver_age', 'driver_age',
            'driver_age', 'driver_age', 'driver_age', 'driver_age']
channel_Level = ['Dir', 'IA', 'EA', 'Dir', 'IA', 'EA',
                 '21', '22', '23', '24', '21', '22', '23', '24']
Value = [1.11, 1.18, 1.002, 2.2, 2.5, 2.56,
         1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, 2.4]
df = {'Name': Name, 'Variable': Variable, 'Level': channel_Level, 'Value': Value}
factor_table = pd.DataFrame(df)
I have read the following but it hasn't yielded great results:
Python Creating Dictionary from excel data
I've also tried:
import pandas as pd
factor_table = pd.read_excel('...\\factor_table_example.xlsx')
# define function to be used multiple times
def factor_tables(file, model_column, variable_column, level_column, value_column):
    for i in file[model_column]:
        for row in file[variable_column]:
            lookup = {}
            lookup = dict(zip(file[level_column], file[value,column]))
This yields the error:
dict expected at most 1 arguments, got 2
What I would ultimately like is:
{'model2': {'channel': {'EA': 1.002, 'IA': 1.18, 'DIR': 1.11}}, 'model1': {'channel': {'EA': 1.86, 'IA': 1.66, 'DIR': 1.64}}}

Using collections.defaultdict, you can create a nested dictionary while iterating your dataframe. Then realign into a list of dictionaries via a list comprehension.
from collections import defaultdict
tree = lambda: defaultdict(tree)
d = tree()
for row in factor_table.itertuples(index=False):
    d[(row.Name, row.Variable)].update({row.Level: row.Value})
res = [{k[0]: {k[1]: dict(v)}} for k, v in d.items()]
print(res)
[{'model1': {'channel_model': {'Dir': 1.110, 'EA': 1.002, 'IA': 1.180}}},
{'model2': {'channel_model': {'Dir': 2.200, 'EA': 2.560, 'IA': 2.500}}},
{'model1': {'driver_age': {'21': 1.100, '22': 1.200, '23': 1.300, '24': 1.400}}},
{'model2': {'driver_age': {'21': 2.100, '22': 2.200, '23': 2.300, '24': 2.400}}}]
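If a single nested lookup of the form {model: {variable: {level: value}}} is preferred over a list of one-key dictionaries, the same idea can be sketched as below. This is only a sketch under assumptions: the rows list is hypothetical sample data in (Name, Variable, Level, Value) order, which is the shape that factor_table.itertuples(index=False) yields, and combined_factor is an illustrative helper name, not part of the answer above.

```python
from collections import defaultdict

# Hypothetical sample rows in (Name, Variable, Level, Value) order,
# mirroring a few rows of the question's data.
rows = [
    ("model1", "channel_model", "Dir", 1.11),
    ("model1", "channel_model", "IA", 1.18),
    ("model1", "driver_age", "21", 1.1),
    ("model2", "channel_model", "Dir", 2.2),
    ("model2", "driver_age", "21", 2.1),
]

# Two levels of defaultdict: model -> variable -> {level: value}.
lookup = defaultdict(lambda: defaultdict(dict))
for name, variable, level, value in rows:
    lookup[name][variable][level] = value


def combined_factor(lookup, model, levels):
    """Multiply the factors for one model, given a {variable: level} mapping."""
    result = 1.0
    for variable, level in levels.items():
        result *= lookup[model][variable][level]
    return result


factor = combined_factor(lookup, "model1", {"channel_model": "IA", "driver_age": "21"})
print(round(factor, 3))
```

Keeping the model and the variable as separate nesting levels means a later multiplication only needs the model name and one level per variable.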

It looks like your error could be coming from this line:
lookup = dict(zip(file[level_column], file[value,column]))
Here file[value,column] has a comma where the parameter name value_column was intended, so the lookup is not done with the column name you passed in. The loop you might be looking for is like so:
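def factor_tables(file, model_column, variable_column, level_column, value_column):
    lookup = {}
    # Visit each distinct model once and use only that model's rows.
    for model in file[model_column].unique():
        rows = file[file[model_column] == model]
        lookup[model] = dict(zip(rows[level_column], rows[value_column]))
    return lookup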
This will return to you a single dictionary with keys corresponding to individual (and unique) models:
{'model_1':{'level_col': 'val_col'}, 'model_2':...}
Allowing you to use:
lookups.get('model_1')
{'level_col': 'val_col'}
If you need the variable_column, you can wrap it one level deeper:
def factor_tables(file, model_column, variable_column, level_column, value_column):
    lookup = {}
    # Group the rows by (model, variable) and nest one level deeper.
    for (model, variable), rows in file.groupby([model_column, variable_column]):
        lookup.setdefault(model, {})[variable] = dict(zip(rows[level_column], rows[value_column]))
    return lookup

Related

how to group a file into a dictionary without importing

I'm having to make a dictionary from a file that looks like this:
example =
'Computer science', random name, 17
'Computer science', another name, 18
'math', one name, 19
I want the majors to be keys, but I'm having trouble grouping them. This is what I've tried:
dictionary = {}
for example in example_file:
    dictionary = {example[0]: {example[1]: example[2]}}
The problem is that it turns the lines into dictionaries one by one, instead of putting the lines with the same key into one dictionary.
This is what it's returning:
{computer science: {random name: 17}}
{computer science: {another name: 18}}
{math: {one name: 19}}
This is how I want it to look:
{computer science: {random name: 17, another name: 18}, math: {one name: 19}}
How do I group these?
You need to update the dictionary elements, not assign the whole dictionary each time through the loop.
You can use defaultdict(dict) to automatically create the nested dictionaries as needed.
from collections import defaultdict
dictionary = defaultdict(dict)
for subject, name, score in example_file:
    dictionary[subject][name] = int(score)
It's a pretty well known problem with an elegant solution, making use of dict's setdefault() method.
dictionary = {}
for example in example_file:
    names = dictionary.setdefault(example[0], {})
    names[example[1]] = example[2]
print(dictionary)
This code prints:
{'Computer science': {'random name': 17, 'another name': 18}, 'math': {'one name': 19}}
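Both answers assume each line has already been split into its three fields. A minimal sketch of the whole path without any imports follows; it assumes the fields never contain commas themselves, and the lines list is a hypothetical stand-in for the lines read from the file.

```python
# Hypothetical raw lines, as they might be read from the file.
lines = [
    "'Computer science', random name, 17",
    "'Computer science', another name, 18",
    "'math', one name, 19",
]

grouped = {}
for line in lines:
    # Split on commas, then trim whitespace and the quotes around the major.
    major, name, score = [part.strip().strip("'") for part in line.split(",")]
    # setdefault returns the existing inner dict or inserts a new empty one.
    grouped.setdefault(major, {})[name] = int(score)

print(grouped)
```

If a field could contain a comma, the plain split would no longer be enough and the csv module would be the safer choice.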
An alternative using pandas (but @hhimko's solution is almost 50 times faster):
import pandas as pd
df = pd.read_csv("file.csv", header=None).sort_values(0).reset_index(drop=True)
result = dict()
major_holder = None
for index, row in df.iterrows():
    # Start a new inner dictionary whenever the major changes.
    if row.iloc[0] != major_holder:
        major_holder = row.iloc[0]
        result[major_holder] = dict()
    result[major_holder][row.iloc[1]] = row.iloc[2]
print(result)

Python list of dictionaries aggregate values

Here is an example input:
[{'name': 'susan', 'wins': 1, 'team': 'team1'},
 {'name': 'jack', 'wins': 1, 'team': 'team2'},
 {'name': 'susan', 'wins': 1, 'team': 'team1'}]
Desired output:
[{'name': 'susan', 'wins': 2, 'team': 'team1'},
 {'name': 'jack', 'wins': 1, 'team': 'team2'}]
I have lots of these dictionaries and want to add up only the 'wins' values, grouped by the 'name' value, while keeping the 'team' values.
I've tried to use Counter, but the result was
{'name': 'all the names added together',
 'wins': 'all the wins added together'
}
I was able to use defaultdict, which seemed to work:
from collections import defaultdict
result = defaultdict(int)
for d in data:
    result[d['name']] += d['wins']
but the result was something like
{'susan': 2, 'jack': 1}
It added the values correctly but didn't keep the 'team' key.
I guess I'm confused about defaultdict and how it works.
Any help is much appreciated.
Did you consider using pandas?
import pandas as pd
dicts = [
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
    {'name': 'jack', 'wins': 1, 'team': 'team2'},
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
]
agg_by = ["name", "team"]
df = pd.DataFrame(dicts)
# Sum the wins within each (name, team) group.
df = df.groupby(agg_by)["wins"].sum()
df = df.reset_index()
aggregated_dict = df.to_dict("records")
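The defaultdict attempt from the question can also be kept, without pandas, by making the key carry the team as well. The following is a sketch under the assumption that a given name always belongs to one team; data, totals and result are illustrative names.

```python
from collections import defaultdict

data = [
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
    {'name': 'jack', 'wins': 1, 'team': 'team2'},
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
]

# Key on (name, team) so the team survives the aggregation.
# defaultdict(int) supplies 0 the first time a key is seen.
totals = defaultdict(int)
for d in data:
    totals[(d['name'], d['team'])] += d['wins']

# Rebuild the list-of-dictionaries shape from the aggregated totals.
result = [
    {'name': name, 'wins': wins, 'team': team}
    for (name, team), wins in totals.items()
]
print(result)
```

The original attempt lost the team only because the key was the name alone; the summed value itself can stay a plain integer.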

How to iterate through this nested dictionary within a list using for loop

I have a list of nested dictionaries that I want to get specific values and put into a dictionary like this:
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'}},
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
I saw previous questions similar to this, but I still can't get the values such as 'axe' and 'red'. I would like the new dict to have a 'Description', 'Confidence' and other columns with the values from the nested dict.
I have tried this for loop:
new_dict = {}
for x in range(len(vid)):
    for y in vid[x]['a']:
        desc = y['desc']
        new_dict['Description'] = desc
I got many errors but mostly this error:
TypeError: string indices must be integers
Can someone please help solve how to get the values from the nested dictionary?
You don't need to iterate through the keys in the dictionary (the inner for-loop), just access the value you want.
vid = [{'a':{'display':'axe', 'desc':'red'}, 'b':{'confidence':'good'} },
{'a':{'display':'book', 'desc':'blue'}, 'b':{'confidence':'poor'}},
{'a':{'display':'apple', 'desc':'green'}, 'b':{'confidence':'good'}}
]
new_dict = {}
list_of_dicts = []
for x in range(len(vid)):
    desc = vid[x]['a']['desc']
    list_of_dicts.append({'desc': desc})
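To see why the inner loop in the question raises the TypeError, it may help to look at what iterating over the nested dictionary actually yields. This is a small illustrative sketch on a shortened copy of vid; inner_keys and descriptions are hypothetical names.

```python
vid = [
    {'a': {'display': 'axe', 'desc': 'red'}, 'b': {'confidence': 'good'}},
    {'a': {'display': 'book', 'desc': 'blue'}, 'b': {'confidence': 'poor'}},
]

# Iterating over a dict yields its keys, which here are plain strings.
inner_keys = [y for y in vid[0]['a']]
print(inner_keys)

# Indexing a string with another string is what raises the TypeError.
try:
    inner_keys[0]['desc']
except TypeError as exc:
    print("TypeError:", exc)

# Indexing the nested dict directly reaches the value.
descriptions = [item['a']['desc'] for item in vid]
print(descriptions)
```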
I have found a temporary solution for this. I decided to use a pandas DataFrame instead.
import pandas as pd
df = pd.DataFrame(columns=['Desc'])
for x in range(len(vid)):
    desc = vid[x]['a']['desc']
    df.loc[len(df)] = [desc]
If you want to write this to CSV later, pandas will help a lot. Using pandas, you can get the desc values like this:
import pandas as pd
new_dict = {'description': []}
df = pd.DataFrame(vid)
for index, row in df.iterrows():
    # Each cell in column 'a' is still a nested dict.
    new_dict['description'].append(row['a']['desc'])
                                       a                       b
0      {'display': 'axe', 'desc': 'red'}  {'confidence': 'good'}
1    {'display': 'book', 'desc': 'blue'}  {'confidence': 'poor'}
2  {'display': 'apple', 'desc': 'green'}  {'confidence': 'good'}
This is how the DataFrame looks: a and b are its columns, and each of your nested dicts becomes one row.
Try using this list comprehension:
d = [{'Description': i['a']['desc'], 'Confidence': i['b']['confidence']} for i in vid]
print(d)

Number of features in dictionary

I am working on loading a dataset from a pickle file like this
""" Load the dictionary containing the dataset """
with open("final_project_dataset.pkl", "r") as data_file:
    data_dict = pickle.load(data_file)
It works fine and loads the data correctly. This is an example of one row:
'GLISAN JR BEN F': {'salary': 274975, 'to_messages': 873, 'deferral_payments': 'NaN', 'total_payments': 1272284, 'exercised_stock_options': 384728, 'bonus': 600000, 'restricted_stock': 393818, 'shared_receipt_with_poi': 874, 'restricted_stock_deferred': 'NaN', 'total_stock_value': 778546, 'expenses': 125978, 'loan_advances': 'NaN', 'from_messages': 16, 'other': 200308, 'from_this_person_to_poi': 6, 'poi': True, 'director_fees': 'NaN', 'deferred_income': 'NaN', 'long_term_incentive': 71023, 'email_address': 'ben.glisan@enron.com', 'from_poi_to_this_person': 52}
Now, how can I get the number of features, e.g. (salary, to_messages, ..., from_poi_to_this_person)?
I got this row by printing my whole dataset (print data_dict), and this is one of the results. I want to know how many features there are in general, i.e. in the whole dataset, without specifying a key in the dictionary.
Thanks
Try this:
# Take the first nested dictionary and count its keys.
no_of_features = len(next(iter(data_dict.values())))
This will work only if all your keys in data_dict have the same number of features.
Or simply:
no_of_features = len(data_dict['GLISAN JR BEN F'])
""" Load the dictionary containing the dataset """
with open("final_project_dataset.pkl", "r") as data_file:
    data_dict = pickle.load(data_file)
print(len(data_dict))
I think you want to find out the size of the set of all unique field names used in the row dictionaries. You can find that like this:
data_dict = {
    'red': {'alpha': 1, 'bravo': 2, 'golf': 3, 'kilo': 4},
    'green': {'bravo': 1, 'delta': 2, 'echo': 3},
    'blue': {'foxtrot': 1, 'tango': 2}
}
unique_features = set(
    feature
    for row_dict in data_dict.values()
    for feature in row_dict.keys()
)
print(unique_features)
# {'golf', 'delta', 'foxtrot', 'alpha', 'bravo', 'echo', 'tango', 'kilo'}
print(len(unique_features))
# 8
Apply sum to the len of each nested dictionary:
sum(len(v) for v in data_dict.values())
Here v is a nested dictionary. Calling len on a dictionary returns its number of keys, so each term is the number of features in one nested dictionary, and the sum is the total across all of them.
If the features may be duplicated across nested dictionaries, collect them in a set and apply len:
len(set(f for v in data_dict.values() for f in v.keys()))
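The answers so far compute three different quantities, and the difference only shows when the nested dictionaries do not all share the same keys. The sketch below puts them side by side on a small made-up dataset; per_row, total_entries and unique_count are illustrative names.

```python
# Hypothetical miniature dataset with uneven feature sets.
data_dict = {
    'red': {'alpha': 1, 'bravo': 2, 'golf': 3, 'kilo': 4},
    'green': {'bravo': 1, 'delta': 2, 'echo': 3},
    'blue': {'foxtrot': 1, 'tango': 2},
}

# Features per row: one count for each top-level key.
per_row = {key: len(features) for key, features in data_dict.items()}

# Total number of (row, feature) entries; shared names are counted repeatedly.
total_entries = sum(len(features) for features in data_dict.values())

# Number of distinct feature names across the whole dataset.
unique_count = len({name for features in data_dict.values() for name in features})

print(per_row)
print(total_entries)
print(unique_count)
```

When every row has exactly the same keys, as the question's sample row suggests, the per-row count and the unique count coincide.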
Here is the answer:
https://discussions.udacity.com/t/lesson-5-number-of-features/44253/4
We choose one person, in this case SKILLING JEFFREY K, from the dictionary called enron_data, and then print the number of keys in that person's dictionary.
print(len(enron_data["SKILLING JEFFREY K"].keys()))

Get unique values from a column using Python

I'm trying to get unique values from the column 'name' for every distinct value in column 'gender'.
Here's sample data:
sample input_file_data:
index,name,gender,alive
1,Adam,Male,Y
2,Bella,Female,N
3,Marc,Male,Y
1,Adam,Male,N
I can get it when I hardcode a value for 'gender', for example "Male" in the code below:
filtered_data = filter(lambda person: person["gender"] == "Male", input_file_data)
reader = (dict((k, v.strip()) for k, v in row.items() if v) for row in filtered_data)
countt = [rec[gender] for rec in reader]
final1 = input_file_name + ".txt", "gender", "Male"
output1 = str(final1).replace("(", "").replace(")", "").replace("'","").replace(", [{", " -- [").replace("}", "")
final2 = set(re.findall(r"name': '(.*?)'", str(filtered_data)))
final_count = len(final2)
output = str(final_count) + " occurrences", str(final2)
output2 = output1, str(output)
output_final = str(output2).replace('\\', "").replace('"',"").replace(']"', "]").replace("set", "").replace("(", "").replace(")", "").replace("'","").replace(", [{", " -- [").replace("}", "")
output_final = output_final + "\n"
current output:
input_file_name.txt, gender, Male, 2 occurrences, [Adam,Marc]
Expected output:
input_file_name.txt, gender, Male, 2 occurrences, [Adam,Marc], Female, 1 occurrences [Bella]
which should show up all the unique occurrences of names, for every distinct gender value (without hardcoding). Also I do not want to use Pandas. Any help is highly appreciated.
PS- I have multiple files and not all files have the same columns. So I can't hardcode them. Also, all the files have a 'name' column, but not all files have a 'gender' column. And this script should work for any other column like 'index' or 'alive' or anything else for that matter and not just gender.
I would use the csv module along with defaultdict from collections for this. Say this is stored in a file called test.csv:
>>> import csv
>>> from collections import defaultdict
>>> with open('test.csv', 'rb') as fin:
...     data = list(csv.reader(fin))[1:]
...
>>> gender_dict = defaultdict(set)
>>> for idx, name, gender, alive in data:
...     gender_dict[gender].add(name)
...
>>> gender_dict
defaultdict(<type 'set'>, {'Male': set(['Adam', 'Marc']), 'Female': set(['Bella'])})
You now have a dictionary. Each key is a unique value from the gender column. Each value is a set, so you'll only get unique items. Notice that we added 'Adam' twice, but only see one in the resulting set.
You don't need defaultdict, but it allows you to use less boilerplate code to check if a key exists.
EDIT: It might help to have better visibility into the data itself. Given your code, I can make the following assumptions:
input_file_data is an iterable (list, tuple, something like that) containing dictionaries.
Each dictionary contains a 'gender' key. If it didn't include at least 'gender', you would get a key error when trying to filter it.
Each dictionary has a 'name' key, it looks like.
Rather than doing all of that regex, what about this?
>>> gender_dict = {'Male': set(), 'Female': set()}
>>> for item in input_file_data:
...     gender_dict[item['gender']].add(item['name'])
You can use item.get('name') instead of item['name'] if not every entry will have a name.
Edit #2: Ok, the first thing you need to do is get your data into a consistent state. We can absolutely get to a point where you have a column name (gender, index, alive, whatever you want) and a set of unique names corresponding to those columns. Something like this:
data_dict = {'gender':
                 {'Male': ['Adam', 'Marc'],
                  'Female': ['Bella']},
             'alive':
                 {'Y': ['Adam', 'Marc'],
                  'N': ['Bella', 'Adam']},
             'index':
                 {1: ['Adam'],
                  2: ['Bella'],
                  3: ['Marc']}
             }
If that's what you want, you could try this:
>>> data_dict = defaultdict(lambda: defaultdict(set))
>>> for element in input_file_data:
...     for key, value in element.items():
...         if key != 'name':
...             data_dict[key][value].add(element['name'])
That should get you what you want, I think? I can't test as I don't have your data, but give it a try.
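Since the snippet above could not be tested against the real data, here is a self-contained sketch of the same idea in Python 3 that can be run as is. It assumes the file has a header row and a 'name' column; the sample string and the unique_names variable are illustrative, and csv.DictReader reads every value, including the index, as a string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory copy of the sample file from the question.
sample = """index,name,gender,alive
1,Adam,Male,Y
2,Bella,Female,N
3,Marc,Male,Y
1,Adam,Male,N
"""

# column -> value in that column -> set of unique names
unique_names = defaultdict(lambda: defaultdict(set))
for row in csv.DictReader(io.StringIO(sample)):
    for column, value in row.items():
        if column != 'name':
            unique_names[column][value].add(row['name'])

# Report the unique names for every distinct value of one column.
for value, names in unique_names['gender'].items():
    print(value, len(names), "occurrences", sorted(names))
```

Because the outer dictionary is keyed by column name, a file without a gender column simply produces no entries under 'gender', and any other column such as 'alive' or 'index' can be reported the same way.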
