Set up a hierarchy from dicts - python

I have a single CSV file of employees containing each employee's name, boss, department id and department name.
By reading that CSV file, I have created these two dict structures:
dep = {}
dep[1] = {'name': 'Sales', 'parent': None}
dep[2] = {'name': 'National Sales', 'parent': None}
dep[3] = {'name': 'International Sales', 'parent': None}
dep[4] = {'name': 'IT', 'parent': None}
dep[5] = {'name': 'Development', 'parent': None}
dep[6] = {'name': 'Support', 'parent': None}
dep[7] = {'name': 'Helpdesk', 'parent': None}
dep[8] = {'name': 'Desktop support', 'parent': None}
dep[9] = {'name': 'CEO', 'parent': None}
emp = {}
emp[1] = {'name': 'John', 'boss': None, 'dep': 9}
emp[2] = {'name': 'Jane', 'boss': 1, 'dep': 1}
emp[3] = {'name': 'Bob', 'boss': 2, 'dep': 1}
emp[4] = {'name': 'Clara', 'boss': 2, 'dep': 2}
emp[5] = {'name': 'George', 'boss': 3, 'dep': 2}
emp[6] = {'name': 'Steve', 'boss': 2, 'dep': 3}
emp[7] = {'name': 'Joe', 'boss': 1, 'dep': 4}
emp[8] = {'name': 'Peter', 'boss': 7, 'dep': 5}
emp[9] = {'name': 'Silvia', 'boss': 7, 'dep': 6}
emp[10] = {'name': 'Mike', 'boss': 9, 'dep': 7}
emp[11] = {'name': 'Lukas', 'boss': 10, 'dep': 7}
emp[12] = {'name': 'Attila', 'boss': 7, 'dep': 8}
emp[13] = {'name': 'Eva', 'boss': 12, 'dep': 8}
From this I have two tasks:
1. Create a hierarchy of departments (basically, fill in the value of the parent key).
2. Display (list) all the departments and employees for a boss.
Expected result for point #2 would be (everybody working in Sales):
employees = {1: (2, 3, 4, 5, 6)}
for everybody working in National Sales:
employees = {4: (5,)}
and for everybody working in International Sales (Steve is the only one, nobody is working for him):
employees = {6: None}
How can I achieve this in a performant manner (I have to handle several thousand employees)?
EDIT:
This is a (simplified) CSV file structure:
id;name;boss;dep_id;dep_name
1;John;;9;CEO
2;Jane;1;1;Sales
3;Bob;2;1;Sales
4;Clara;2;2;National Sales
5;George;3;2;National Sales
6;Steve;2;3;International Sales
7;Joe;1;4;IT
8;Peter;7;5;Development
9;Silvia;7;6;Support
10;Mike;9;7;Helpdesk
11;Lukas;10;7;Helpdesk
12;Attila;7;8;Desktop support
13;Eva;12;8;Desktop support

As suggested in the comments, here is a solution using pandas. The file is mocked using your example data, and it should be plenty fast for only a few thousand entries.
from io import StringIO
import pandas as pd
f = StringIO("""
id;name;boss;dep_id;dep_name
1;John;1;9;CEO
2;Jane;1;1;Sales
3;Bob;2;1;Sales
4;Clara;2;2;National Sales
5;George;3;2;National Sales
6;Steve;2;3;International Sales
7;Joe;1;4;IT
8;Peter;7;5;Development
9;Silvia;7;6;Support
10;Mike;9;7;Helpdesk
11;Lukas;10;7;Helpdesk
12;Attila;7;8;Desktop support
13;Eva;12;8;Desktop support
""")
# load data
employees = pd.read_csv(f, sep=';', index_col=0)
### print a department ###
# Filter by department and print the names
print(employees[employees.dep_id == 7].name)
### build org hierarchy ###
# keep only one entry per department (assumes they share a boss)
org = employees[['boss', 'dep_id']].drop_duplicates('dep_id')
# follow the boss id to their department id, looking it up in the full
# table so the boss does not have to be one of the rows kept in org
# note: the CEO is his own boss, to avoid special casing
org['parent'] = employees.dep_id.loc[org['boss']].values
# reindex by department id, and keep only the parent column
# note: the index is like your dictionary key, access is optimized
org = org.set_index('dep_id')[['parent']]
print(org)
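For comparison, both tasks can also be sketched with plain dictionaries and no pandas. This is only a sketch under assumptions: the helper names fill_department_parents, build_reports and subordinates are hypothetical, a department's parent is taken to be the department of a boss who sits in a different department, and the dep/emp data is the sample from the question written more compactly.

```python
from collections import defaultdict

# Sample data from the question, written compactly.
dep_names = ['Sales', 'National Sales', 'International Sales', 'IT',
             'Development', 'Support', 'Helpdesk', 'Desktop support', 'CEO']
dep = {i: {'name': name, 'parent': None} for i, name in enumerate(dep_names, 1)}

rows = [(1, 'John', None, 9), (2, 'Jane', 1, 1), (3, 'Bob', 2, 1),
        (4, 'Clara', 2, 2), (5, 'George', 3, 2), (6, 'Steve', 2, 3),
        (7, 'Joe', 1, 4), (8, 'Peter', 7, 5), (9, 'Silvia', 7, 6),
        (10, 'Mike', 9, 7), (11, 'Lukas', 10, 7), (12, 'Attila', 7, 8),
        (13, 'Eva', 12, 8)]
emp = {i: {'name': n, 'boss': b, 'dep': d} for i, n, b, d in rows}


def fill_department_parents(dep, emp):
    """Assumption: a department's parent is the department of a boss
    who belongs to a different department."""
    for person in emp.values():
        boss_id = person['boss']
        if boss_id is None:
            continue  # top of the hierarchy, no parent department
        boss_dep = emp[boss_id]['dep']
        if boss_dep != person['dep']:
            dep[person['dep']]['parent'] = boss_dep


def build_reports(emp):
    """Map each boss id to the ids of their direct reports (one pass)."""
    reports = defaultdict(list)
    for emp_id, person in emp.items():
        if person['boss'] is not None:
            reports[person['boss']].append(emp_id)
    return reports


def subordinates(reports, boss_id):
    """Collect direct and indirect reports of boss_id with an explicit stack."""
    found, stack = [], [boss_id]
    while stack:
        for child in reports.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return sorted(found)


fill_department_parents(dep, emp)
reports = build_reports(emp)
print(dep[2]['parent'])          # parent department id of National Sales
print(subordinates(reports, 2))  # everybody below Jane
```

Both passes are linear in the number of employees, so a few thousand rows should not be a problem. The traversal assumes the boss links contain no cycles, so the CEO's boss must be None here rather than his own id.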

Related

how to insert list of elements into list of dictionaries

I have one list of elements and another list of dictionaries, and I want to insert each element into the corresponding dictionary of the list.
list_elem = [1,2,3]
dict_ele = [{"Name":"Madhu","Age":25},{"Name":"Raju","Age:24},{""Name":"Mani","Age":12}],
OUTPUT As:
[{"ID":1,"Name":"Madhu","Age":25},{"ID":2,"Name":"Raju","Age:24},{"ID":3,"Name":"Mani","Age":12}]
I have tried this way :
dit = [{"id":item[0]} for item in zip(sam)]
# [{"id":1,"id":2,"id":3}]
dic1 = list(zip(dit,data))
print(dic1)
# [({"id":1},{{"Name":"Madhu","Age":25}},{"id":2},{"Name":"Raju","Age:24},{"id":3},{""Name":"Mani","Age":12})]
What is the most efficient way to do this in Python?
Making an assumption here that the OP's original question has a typo in the definition of dict_ele and also that list_elem isn't really necessary.
dict_ele = [{"Name":"Madhu","Age":25},{"Name":"Raju","Age":24},{"Name":"Mani","Age":12}]
dit = [{'ID': id_, **d} for id_, d in enumerate(dict_ele, 1)]
print(dit)
Output:
[{'ID': 1, 'Name': 'Madhu', 'Age': 25}, {'ID': 2, 'Name': 'Raju', 'Age': 24}, {'ID': 3, 'Name': 'Mani', 'Age': 12}]
dict_ele = [{"Name":"Madhu","Age":25},{"Name":"Raju","Age":24},{"Name":"Mani","Age":12}]
list_elem = [1,2,3]
[{'ID': id, **_dict} for id, _dict in zip(list_elem, dict_ele)]
[{'ID': 1, 'Name': 'Madhu', 'Age': 25}, {'ID': 2, 'Name': 'Raju', 'Age': 24}, {'ID': 3, 'Name': 'Mani', 'Age': 12}]
try this: r = [{'id':e[0], **e[1]} for e in zip(list_elem, dict_ele)]
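If the ids in list_elem are not simply 1, 2, 3, pairing the two lists with zip is the safer variant. A minimal sketch, assuming the corrected dict_ele from the answers above; the helper name add_ids and the non-sequential ids are hypothetical. It raises if the lists differ in length instead of silently dropping items, and it returns new dictionaries rather than modifying the originals.

```python
def add_ids(ids, records):
    """Return new dicts with an 'ID' key placed first; inputs are left unchanged."""
    if len(ids) != len(records):
        raise ValueError("ids and records must have the same length")
    return [{'ID': id_, **record} for id_, record in zip(ids, records)]


list_elem = [10, 20, 30]  # hypothetical non-sequential ids
dict_ele = [{"Name": "Madhu", "Age": 25},
            {"Name": "Raju", "Age": 24},
            {"Name": "Mani", "Age": 12}]

result = add_ids(list_elem, dict_ele)
print(result[0])  # {'ID': 10, 'Name': 'Madhu', 'Age': 25}
```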

From list to nested dictionary

I have these lists:
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
man_res = ['Alexandra', 'RST01', '$34,000']
man1_res = ['Santio', 'RST009', '$45,000']
man2_res = ['Rumbalski', 'RST50', '$78,000']
The expected output is a nested dictionary:
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
An easy way would be to use a pandas DataFrame:
import pandas as pd
df = pd.DataFrame([man_res, man1_res, man2_res], index=data, columns=key)
print(df)
df.to_dict(orient='index')
           name      id      sal
man   Alexandra   RST01  $34,000
man1     Santio  RST009  $45,000
man2  Rumbalski   RST50  $78,000
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
Or you could manually merge them using dict + zip
d = dict(zip(
data,
(dict(zip(key, res)) for res in (man_res, man1_res, man2_res))
))
d
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
# Save the result lists in a 2D list
all_man_res = []
all_man_res.append(man_res)
all_man_res.append(man1_res)
all_man_res.append(man2_res)
print(all_man_res)
# Add them to an output dict, keyed by the names in data
output = {}
for i in range(len(data)):
    person = data[i]
    details = {}
    for j in range(len(key)):
        value = key[j]
        details[value] = all_man_res[i][j]
    output[person] = details
output
The pandas DataFrame answer provided by NoThInG makes the most intuitive sense. If you are looking to use only the built-in Python tools, you can do:
info_list = [dict(zip(key, man)) for man in (man_res, man1_res, man2_res)]
output = dict(zip(data,info_list))

Errors when trying to copy some data from a 2-D array to another

I am trying to write a simple program that reads and writes in xlsx files.
I have managed to import data from the file, and turn it into a dictionary.
But this array is arranged in a way that bothers me. It looks something like this:
{
'position': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
'lastname': {0: 'Hamilton', 1: 'Bottas', 2: 'Verstappen', 3: 'Leclerc', 4: 'Vettel', 5: 'Albon'},
'firstname': {0: 'Lewis', 1: 'Valtteri', 2: 'Max', 3: 'Charles', 4: 'Sebastian', 5: 'Alexander'},
'team': {0: 'Mercedes', 1: 'Mercedes', 2: 'RedBull', 3: 'Ferrari', 4: 'Ferrari', 5: 'RedBull'}
}
(using F1 drivers as an example, don't pay attention)
I would like to transform this array into something that looks more like this:
{
0: {'position': 1, 'lastname': 'Hamilton', 'firstname': 'Lewis', 'team': 'Mercedes'},
1: {'position': 2, 'lastname': 'Bottas', 'firstname': 'Valtteri', 'team': 'Mercedes'},
2: {'position': 3, 'lastname': 'Verstappen', 'firstname': 'Max', 'team': 'RedBull'},
...
}
So that I could use the following code
for data in array:
    print(array[data])
to print ALL the data on Lewis Hamilton,
then ALL the data on Valtteri Bottas, etc
and not
positions of ALL drivers
names of ALL drivers
So, basically my array is like this
data[rowname][driver]
and I want it this way
data[driver][rowname]
My code below tries to transfer data from an array named data to an array named drivers:
import pandas
import_file_path = "test.xlsx"
data = pandas.read_excel(import_file_path)
data = pandas.DataFrame(data)
data = data.to_dict()
newdriver = {}
drivers = {}
lines = 0
# getting number of lines in the array
for a in data:
    for line in data[a]:
        lines += 1
    break
# for as many drivers as present in the array
for i in range(lines):
    for column in data:
        newdriver[column] = data[column][i]
        # store the driver's data in a temporary variable, field by field
    drivers[i] = newdriver
    # storing driver data in a row of our final array
print(drivers)
The final print statement results in this:
{0: {'position': 6, 'lastname': 'Albon', 'firstname': 'Alexander', 'team': 'RedBull'},
1: {'position': 6, 'lastname': 'Albon', 'firstname': 'Alexander', 'team': 'RedBull'},
2: {'position': 6, 'lastname': 'Albon', 'firstname': 'Alexander', 'team': 'RedBull'},
3: {'position': 6, 'lastname': 'Albon', 'firstname': 'Alexander', 'team': 'RedBull'},
4: {'position': 6, 'lastname': 'Albon', 'firstname': 'Alexander', 'team': 'RedBull'},
5: {'position': 6, 'lastname': 'Albon', 'firstname': 'Alexander', 'team': 'RedBull'}}
The same driver in every line of the array.
I've investigated the issue, and it would seem that this line:
newdriver[column] = data[column][i]
also updates the data in the drivers array.
Halfway through the process of storing data into newdriver, "Lewis Bottas" appears in the drivers table, when I'm not even editing it.
Which makes no sense to me. (Lewis Bottas isn't a real driver)
{0: {'position': 2, 'lastname': 'Bottas', 'firstname': 'Lewis', 'team': 'Mercedes'}}
I suspect that the drivers[i] = newdriver makes drivers and newdriver share the same memory address, and thus, updates their values at the same time.
It's like I've created a pointer without wanting to. I just want to copy the values, not make them share the same address.
Any sort of help is welcome.
If you don't want to import anything (tested on Python 2.7 and 3.7.5):
def convert(array):
    newArray = {}
    # one output row per index found in the 'position' column
    for index in range(0, len(array['position'])):
        newArray[index] = {
            'position': array['position'][index],
            'lastname': array['lastname'][index],
            'firstname': array['firstname'][index],
            'team': array['team'][index]}
    return newArray
Edit: to do this dynamically:
# Input must be { StringOrNumber: { Number: SomeValue }, etc }
def convertDynamically(array):
    newArray = {}
    mainKeys = list(array.keys())
    for index in range(0, len(array[mainKeys[0]])):
        # a fresh dict per row, so rows do not share one object
        tmp = {}
        for key in mainKeys:
            tmp[key] = array[key][index]
        newArray[index] = tmp
    return newArray
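The suspicion in the question is correct: drivers[i] = newdriver stores a reference to the same dict object on every pass, so all rows end up showing whatever was written last. A minimal sketch of the effect and of a fix that builds a fresh dict per row; the two-driver data is a shortened version of the example, and the names shared and aliased are hypothetical.

```python
data = {
    'position': {0: 1, 1: 2},
    'lastname': {0: 'Hamilton', 1: 'Bottas'},
}

# Buggy pattern: one dict object is created once and reused for every row.
shared = {}
aliased = {}
for i in range(2):
    for column in data:
        shared[column] = data[column][i]
    aliased[i] = shared  # stores a reference, not a copy

print(aliased[0] is aliased[1])  # True: both keys point at the same object
print(aliased[0]['lastname'])    # Bottas

# Fix: build a new dict for each row.
drivers = {}
for i in range(2):
    drivers[i] = {column: data[column][i] for column in data}

print(drivers[0]['lastname'])    # Hamilton
```

Calling newdriver.copy() at the point of assignment would also work; creating the dict inside the loop simply avoids the shared object in the first place.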

Filter/group dictionary by nested value

Here's a simplified example of some data I have:
{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}}
The whole data set is a bigger list of such nested dictionaries (address data). The goal is to create pairs of people from the list with randomized partners, where partners from the same country should be preferred. So my first real issue is to find a good way to group them by that country value.
I'm sure there's a smarter way to do this than iterating through the dict and writing all records out to some new list/dict?
I think this is close to what you need:
result = {key:[i for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
What this does is use itertools.groupby to group all people in the people list by their specified country. The resulting dictionary has countries as keys, and the unpacked groupings (matching people) as values. Input is expected as a list of dictionaries like the one in your example:
people = [{"id": "1234565", "fields": {"name": "john", "email":"john#example.com", "country": "uk"}},
{"id": "654321", "fields": {"name": "sam", "email":"sam#example.com", "country": "uk"}}]
Sample output:
>>> print(result)
{'uk': [{'fields': {'name': 'john', 'email': 'john#example.com', 'country': 'uk'}, 'id': '1234565'}, {'fields': {'name': 'sam', 'email': 'sam#example.com', 'country': 'uk'}, 'id': '654321'}]}
For a cleaner result, the looping construct can be tweaked so that only the ID of each person is included in the result dict:
result = {key:[i["id"] for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
>>> print(result)
{'uk': ['1234565', '654321']}
EDIT: Sorry, I forgot about the sorting. Simply sort the list of people by country before putting it through groupby. It should now work properly:
sort = sorted(people, key=lambda item: item["fields"]["country"])
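Putting the sort and the grouping together, a complete version might look like the sketch below. The record for "marie" (id "777", country "fr") and the helper name country_of are hypothetical additions, included only so that there is more than one group.

```python
import itertools

people = [
    {"id": "1234565", "fields": {"name": "john", "email": "john#example.com", "country": "uk"}},
    # hypothetical extra record so that two countries are present
    {"id": "777", "fields": {"name": "marie", "email": "marie#example.com", "country": "fr"}},
    {"id": "654321", "fields": {"name": "sam", "email": "sam#example.com", "country": "uk"}},
]


def country_of(person):
    """Key function shared by the sort and the grouping."""
    return person["fields"]["country"]


# groupby only merges adjacent items, so sort by the same key first.
people_sorted = sorted(people, key=country_of)
result = {
    country: [person["id"] for person in group]
    for country, group in itertools.groupby(people_sorted, key=country_of)
}
print(result)  # {'fr': ['777'], 'uk': ['1234565', '654321']}
```

Without the sort, the two "uk" records would form separate groups and the second would overwrite the first in the resulting dict.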
Here is another one that uses defaultdict:
import collections
def make_groups(nested_dicts, nested_key):
    default = collections.defaultdict(list)
    for nested_dict in nested_dicts:
        for value in nested_dict.values():
            try:
                default[value[nested_key]].append(nested_dict)
            except TypeError:
                pass
    return default
To test the results:
import random
# a list, because random.sample() no longer accepts a set on Python 3.11+
COUNTRY = ['af', 'br', 'fr', 'mx', 'uk']
people = [{'id': i, 'fields': {
              'name': 'name'+str(i),
              'email': str(i)+'#email',
              'country': random.choice(COUNTRY)}}
          for i in range(10)]
country_groups = make_groups(people, 'country')
for country, persons in country_groups.items():
    print(country, persons)
Random output:
fr [{'id': 0, 'fields': {'name': 'name0', 'email': '0#email', 'country': 'fr'}}, {'id': 1, 'fields': {'name': 'name1', 'email': '1#email', 'country': 'fr'}}, {'id': 4, 'fields': {'name': 'name4', 'email': '4#email', 'country': 'fr'}}]
br [{'id': 2, 'fields': {'name': 'name2', 'email': '2#email', 'country': 'br'}}, {'id': 8, 'fields': {'name': 'name8', 'email': '8#email', 'country': 'br'}}]
uk [{'id': 3, 'fields': {'name': 'name3', 'email': '3#email', 'country': 'uk'}}, {'id': 7, 'fields': {'name': 'name7', 'email': '7#email', 'country': 'uk'}}]
af [{'id': 5, 'fields': {'name': 'name5', 'email': '5#email', 'country': 'af'}}, {'id': 9, 'fields': {'name': 'name9', 'email': '9#email', 'country': 'af'}}]
mx [{'id': 6, 'fields': {'name': 'name6', 'email': '6#email', 'country': 'mx'}}]

Python find element from list of dict in other list of dict

I have two lists of dicts.
students = [{'lastname': 'JAKUB', 'id': '92051048757', 'name': 'BAJOREK'},
{'lastname': 'MARIANNA', 'id': '92051861424', 'name': 'SLOTARZ'}, {'lastname':
'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id':
'92052877491', 'name': 'LESKO'}]
And
house = [{'id_pok': '2', 'id': '92051048757'}, {'id_pok': '24', 'id': '92051861424'}]
How can I find the students that do not exist in the house list of dicts, matching by id?
Output
output = [{'lastname': 'SZYMON', 'id': '92052033215', 'name': 'WNUK'}]
I tried this:
for student in students:
    for home in house:
        if student['id'] != home['id']:
            print(student)
But this only prints the list repeatedly.
The reason your code doesn't work is that if there's any house_id which doesn't match a student_id, the student will be printed. You'd need some more logic or the any function:
for student in students:
    if not any(student['id'] == home['id'] for home in house):
        print(student)
It outputs:
{'lastname': 'SZYMON', 'id': '92052033215', 'name': 'WNUK'}
{'lastname': 'WOJCIECH', 'id': '92052877491', 'name': 'LESKO'}
A more efficient solution would be to keep a set of house_ids, and find students whose id isn't included in this set:
students = [{'lastname': 'JAKUB', 'id': '92051048757', 'name': 'BAJOREK'},
{'lastname': 'MARIANNA', 'id': '92051861424', 'name': 'SLOTARZ'}, {'lastname':
'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id':
'92052877491', 'name': 'LESKO'}]
house = [{'id_pok': '2', 'id': '92051048757'}, {'id_pok': '24', 'id': '92051861424'}]
house_ids = set(house_dict['id'] for house_dict in house)
result = [student for student in students if student['id'] not in house_ids]
print(result)
It outputs:
[{'lastname': 'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id': '92052877491', 'name': 'LESKO'}]
Note that 2 students match your description.
The reason a set is used is that it allows much faster lookup than a list.
student_ids = set(d.get('id') for d in students)
house_ids = set(d.get('id') for d in house)
ids_not_in_house = student_ids - house_ids  # set difference: student ids with no house entry
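Note that set difference (-) and symmetric difference (^) are not interchangeable for this task; they only give the same answer on the sample data because every id in house also appears in students. A small sketch with a hypothetical orphaned house entry (the made-up id '999') shows where they diverge:

```python
student_ids = {'92051048757', '92051861424', '92052033215', '92052877491'}
house_ids = {'92051048757', '92051861424', '999'}  # '999' is a made-up id with no student

only_students = student_ids - house_ids  # ids of students without a house entry
either_side = student_ids ^ house_ids    # ids present in exactly one of the two sets

print(sorted(only_students))  # ['92052033215', '92052877491']
print(sorted(either_side))    # ['92052033215', '92052877491', '999']
```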
students = [{'lastname': 'JAKUB', 'id': '92051048757', 'name': 'BAJOREK'},
{'lastname': 'MARIANNA', 'id': '92051861424', 'name': 'SLOTARZ'}, {'lastname':
'SZYMON', 'id': '92052033215', 'name': 'WNUK'}, {'lastname': 'WOJCIECH', 'id':
'92052877491', 'name': 'LESKO'}]
house = [{'id_pok': '2', 'id': '92051048757'}, {'id_pok': '24', 'id': '92051861424'}]
s = {item['id'] for item in students}
h = {item['id'] for item in house}
not_in_house_ids = s.difference(h)
not_in_house_items = [x for x in students if x['id'] in not_in_house_ids]
print (not_in_house_items)
>>>[{'name': 'WNUK', 'lastname': 'SZYMON', 'id': '92052033215'}, {'name': 'LESKO', 'lastname': 'WOJCIECH', 'id': '92052877491'}]
