Python merge list concatenating unique values as comma separated - python

I am trying to get this to work.
Here is my data:
data.csv
id,fname,lname,education,gradyear,attributes
1,john,smith,mit,2003,qa
1,john,smith,harvard,207,admin
1,john,smith,ft,212,master
2,john,doe,htw,2000,dev
I am trying to use this code, which I found on the Internet and don't fully understand:
from itertools import groupby
import csv
import pprint
t = csv.reader(open('data.csv'))
t = list(t)
def join_rows(rows):
    def join_tuple(tup):
        for x in tup:
            if x:
                return x
        else:
            return x
    return [join_tuple(x) for x in zip(*rows)]
for name, rows in groupby(sorted(t), lambda x:x[0]):
    print(join_rows(rows))
However, it does not merge unique values as comma separated.
The output is:
['1', 'john', 'smith', 'ft', '212', 'master']
['2', 'john', 'doe', 'htw', '2000', 'dev']
['id', 'fname', 'lname', 'education', 'gradyear', 'attributes']
How can I make it like:
['1', 'john', 'smith', 'mit,harvard,ft', '2003,207,212', 'qa,admin,master']
['2', 'john', 'doe', 'htw', '2000', 'dev']
['id', 'fname', 'lname', 'education', 'gradyear', 'attributes']
If there are more entries for the same column, it should also work. Should not be limited to 3 rows.
Grrrrr .... anybody have tips or ideas?
Thanks in advance!

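You can change the definition of join_rows to:
import itertools
def join_rows(rows):
    return [(e[0] if i < 3 else ','.join(e)) for (i, e) in enumerate(zip(*rows))]
This zips all rows belonging to the same id into one tuple per column. For the first three columns (id, fname, lname) the first item is returned; for the remaining columns the items are joined with commas. Because the rows are sorted before grouping, the joined values come out in sorted row order rather than file order, and repeated values are not removed: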
['1', 'john', 'smith', 'ft,harvard,mit', '212,207,2003', 'master,admin,qa']
['2', 'john', 'doe', 'htw', '2000', 'dev']
['id', 'fname', 'lname', 'education', 'gradyear', 'attributes']
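If the values really should be unique and stay in file order, one possible variation is sketched below. It assumes Python 3.7+ (where dict.fromkeys keeps first-seen order), inlines the sample data instead of reading data.csv, and uses the made-up helper names merge_unique and merge_rows; it is an illustration, not the answer's original method.

```python
import csv
import io
from itertools import groupby

# Sample data from the question, inlined so the sketch runs without data.csv.
RAW = """id,fname,lname,education,gradyear,attributes
1,john,smith,mit,2003,qa
1,john,smith,harvard,207,admin
1,john,smith,ft,212,master
2,john,doe,htw,2000,dev
"""

def merge_unique(values):
    # dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+).
    return ','.join(dict.fromkeys(values))

def merge_rows(rows):
    # zip(*rows) turns a group of rows into one tuple per column.
    return [merge_unique(column) for column in zip(*rows)]

reader = csv.reader(io.StringIO(RAW))
header = next(reader)          # keep the header out of the grouping
records = list(reader)
# list.sort is stable, so rows sharing an id keep their file order.
records.sort(key=lambda row: row[0])
merged = [merge_rows(group)
          for _, group in groupby(records, key=lambda row: row[0])]

print(header)
for row in merged:
    print(row)
```

Sorting only on the id column (rather than on whole rows) is what keeps 'mit,harvard,ft' in the order the rows appear in the file.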

Related

create a specific python dictionary from 2 lists

I have the following data:
fsm_header = ['VLAN_ID', 'NAME', 'STATUS']
fsm_results = [['1', 'default', 'active'],
['2', 'VLAN0002', 'active'],
['3', 'VLAN0003', 'active']]
I want to create a specific dictionary like this:
{'VLAN_ID':['1','2','3'],
'NAME':['default','VLAN0002','VLAN0003'],
'STATUS':['active','active','active']}
I'm having trouble finding the right combination, as the one I'm using:
[dict(zip(fsm_header, row)) for row in fsm_results]
gives me another type of useful output, but not the one I mentioned above.
I would prefer to see something without using the zip function, but even with zip is ok.
You need to unpack and zip fsm_results too:
out = {k:list(v) for k,v in zip(fsm_header, zip(*fsm_results))}
Output:
{'VLAN_ID': ['1', '2', '3'],
'NAME': ['default', 'VLAN0002', 'VLAN0003'],
'STATUS': ['active', 'active', 'active']}
If you don't mind tuples as values, you could use:
out = dict(zip(fsm_header, zip(*fsm_results)))
Output:
{'VLAN_ID': ('1', '2', '3'),
'NAME': ('default', 'VLAN0002', 'VLAN0003'),
'STATUS': ('active', 'active', 'active')}
You could also write the same thing using dict.setdefault:
out = {}
for lst in fsm_results:
    for k, v in zip(fsm_header, lst):
        out.setdefault(k, []).append(v)
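Since the question asks for a version without zip, here is one possible sketch that indexes each row by column position instead. It assumes every row in fsm_results has the same length as fsm_header.

```python
fsm_header = ['VLAN_ID', 'NAME', 'STATUS']
fsm_results = [['1', 'default', 'active'],
               ['2', 'VLAN0002', 'active'],
               ['3', 'VLAN0003', 'active']]

# For each header position i, collect the i-th field of every row.
out = {key: [row[i] for row in fsm_results]
       for i, key in enumerate(fsm_header)}

print(out)
```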

How to index multiple item positions and get items from another list with those positions?

names = ['vik', 'zed', 'loren', 'tal', 'yam', 'jay', 'alex', 'gad', 'dan', 'hed']
cities = ['NY', 'NY', 'SA', 'TNY', 'LA', 'SA', 'SA', 'NY', 'SA', 'LA']
ages = ['28', '26', '26', '31', '28', '23', '29', '31', '27', '41']
How do I create a new list with names of all the people from SA?
I tried getting all 'SA' positions and then print the same positions that are in the names list,
pos = [i for i in range(len(names)) if cities[i] == 'SA']
print(names[pos])
Returns the following error:
TypeError: list indices must be integers or slices, not list
I've also attempted to loop over the positions in cities and then do pretty much the same thing one by one, but I still wasn't able to put the results in a list:
pos = [i for i in range(len(names)) if cities[i] == 'SA']
x = 1
for i in pos:
    x+=1
You can zip the names, cities and ages together, then use a list comprehension to filter those by city:
[(a,b,c) for a,b,c in zip(names, cities, ages) if b == "SA"]
returns
[('loren', 'SA', '26'), ('jay', 'SA', '23'), ('alex', 'SA', '29'), ('dan', 'SA', '27')]
Enumerate the list of cities so you get their index as well, and collect the names if the city matches:
names = ['vik', 'zed', 'loren', 'tal', 'yam', 'jay', 'alex', 'gad', 'dan', 'hed']
cities = ['NY', 'NY', 'SA', 'TNY', 'LA', 'SA', 'SA', 'NY', 'SA', 'LA']
print( [names[idx] for idx,c in enumerate(cities) if c == "SA"] )
Output:
['loren', 'jay', 'alex', 'dan']
See: enumerate on python.org
How do I create a new list with names of all the people from SA?
One-liner (though the question may be unclear) using zip and a list comprehension with a filter:
lst = [n for n, c in zip(names, cities) if c == 'SA']
print(lst)
Output:
['loren', 'jay', 'alex', 'dan']
Explanation
The one-liner is equivalent to:
lst = []
for name, city in zip(names, cities):
    if city == 'SA':
        lst.append(name)
print(lst)
zip iterates over the names and cities lists in parallel, producing tuples in the form "(<a_name>, <a_city>)"
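For completeness, the position list from the question can also be reused directly. The TypeError arises because a list cannot be indexed with another list, so each position has to be looked up individually. A minimal sketch (sa_names is just an illustrative name):

```python
names = ['vik', 'zed', 'loren', 'tal', 'yam', 'jay', 'alex', 'gad', 'dan', 'hed']
cities = ['NY', 'NY', 'SA', 'TNY', 'LA', 'SA', 'SA', 'NY', 'SA', 'LA']

# Positions of every 'SA' entry, exactly as computed in the question.
pos = [i for i in range(len(names)) if cities[i] == 'SA']

# Look up each position one at a time instead of writing names[pos].
sa_names = [names[i] for i in pos]

print(pos)
print(sa_names)
```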

From a list of string, get a list of dictionaries

I have a list of strings which contains the ingredients of Spanish recipes and their quantities, and I would like to get a list of dictionaries splitting out every ingredient, unit and quantity.
This is the list:
ingredients=[
'50',
'ccs',
'aceite',
'1',
'hoja',
'laurel',
'\n',
'1',
'cabeza',
'ajos',
'1',
'vaso',
'vino',
'1,5',
'kilos',
'conejo',
'\n',
...]
I would like to get a dict like this:
my_dic=[
{"name":"aceite" ,"qt":50 ,"unit": "ccs"},
{"name":"laurel" ,"qt":1 ,"unit": "hoja"},
{"name":"ajos" ,"qt":1 ,"unit": "cabeza"},
{"name":"vino" ,"qt":1 ,"unit": "vaso"},
{"name":"conejo" ,"qt":1,5 ,"unit": "kilos"},
...]
I have been trying things but it was all a disaster.
Any ideas?
Thanks in advance!!
So first, you want to remove the newlines from your original list:
ingredients = [i for i in ingredients if i != '\n']
Then, each ingredient name is every third element in the ingredients list starting from the third element. Likewise for the unit and quantity, starting from the second and first elements, respectively:
names = ingredients[2::3]
units = ingredients[1::3]
qts = ingredients[::3]
Then, iterate through these lists and construct the data structure you specified (which is not actually a dict but a list of dicts):
my_list = []
for i in range(len(names)):
    my_dict = {"name":names[i],"qt":qts[i],"unit":units[i]}
    my_list.append(my_dict)
There are a lot of ways to compress all of the above, but I have written it for comprehensibility.
This doesn't produce a dictionary, but it does give you the output that you specify in the question:
# Strip out the \n values (can possibly do this with a .strip() in the input stage)
ingredients = [value for value in ingredients if value != '\n']
labels = ['qt', 'unit', 'name']
my_dic = [dict(zip(labels, ingredients[i:i+3])) for i in range(0, len(ingredients), 3)]
my_dic contains:
[{'qt': '50', 'unit': 'ccs', 'name': 'aceite'},
{'qt': '1', 'unit': 'hoja', 'name': 'laurel'},
{'qt': '1', 'unit': 'cabeza', 'name': 'ajos'},
{'qt': '1', 'unit': 'vaso', 'name': 'vino'},
{'qt': '1,5', 'unit': 'kilos', 'name': 'conejo'}]
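You can clean your list with filter to remove the '\n' entries and then zip() the result with itself to collect the items in groups of three. This relies on filter returning an iterator, as it does in Python 3, so that each of the three zip arguments pulls the next item from the same stream. This makes a quick two-liner: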
l = filter(lambda w: w != '\n', ingredients)
result = [{'name': name, 'qt':qt, 'unit': unit}
for qt, unit, name in zip(l, l, l)]
result:
[{'name': 'aceite', 'qt': '50', 'unit': 'ccs'},
{'name': 'laurel', 'qt': '1', 'unit': 'hoja'},
{'name': 'ajos', 'qt': '1', 'unit': 'cabeza'},
{'name': 'vino', 'qt': '1', 'unit': 'vaso'},
{'name': 'conejo', 'qt': '1,5', 'unit': 'kilos'}]
How about:
ingredients = list(filter(lambda a: a != '\n', ingredients))
ing_organized = []
for i in range(0, len(ingredients), 3):
    curr_dict = {"name": ingredients[i+2], "qt": ingredients[i], "unit": ingredients[i+1]}
    ing_organized.append(curr_dict)
I just removed '\n' elements from the list as they didn't seem to have meaning.
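All of the answers above leave the quantities as strings such as '1,5'. If numeric quantities are wanted, one possible approach is sketched here. It assumes the comma is always a decimal separator (never a thousands separator), and parse_qty is a hypothetical helper name. Note that it yields floats throughout, so '50' becomes 50.0 rather than 50.

```python
ingredients = ['50', 'ccs', 'aceite', '1', 'hoja', 'laurel', '\n',
               '1', 'cabeza', 'ajos', '1', 'vaso', 'vino',
               '1,5', 'kilos', 'conejo', '\n']

def parse_qty(text):
    # Hypothetical helper: treat ',' as the decimal separator ('1,5' -> 1.5).
    return float(text.replace(',', '.'))

# Drop the newline separators, then read the list in steps of three.
cleaned = [tok for tok in ingredients if tok != '\n']
records = [
    {'name': name, 'qt': parse_qty(qt), 'unit': unit}
    for qt, unit, name in zip(cleaned[0::3], cleaned[1::3], cleaned[2::3])
]

print(records[0])
print(records[-1])
```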

How to sort data in the dictionary of list of dictionary in python?

Please help me. I have dataset like this:
my_dict = { 'project_1' : [{'commit_number':'14','name':'john'},
{'commit_number':'10','name':'steve'}],
'project_2' : [{'commit_number':'12','name':'jack'},
{'commit_number':'15','name':'anna'},
{'commit_number':'11','name':'andy'}]
}
I need to sort the dataset based on the commit number in descending order and make it into a new list by ignoring the name of the project using python. The list expected will be like this:
ordered_list_of_dict = [{'commit_number':'15','name':'anna'},
{'commit_number':'14','name':'john'},
{'commit_number':'12','name':'jack'},
{'commit_number':'11','name':'andy'},
{'commit_number':'10','name':'steve'}]
Thank you so much for helping me.
Extract my_dict's values as a list of lists*
Join each sub-list together (flatten dict_values) to form a flat list
Sort each element by commit_number
*list of lists on python2. On python3, a dict_values object is returned.
from itertools import chain
res = sorted(chain.from_iterable(my_dict.values()),
key=lambda x: x['commit_number'],
reverse=True)
[{'commit_number': '15', 'name': 'anna'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '12', 'name': 'jack'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '10', 'name': 'steve'}]
On python2, you'd use dict.itervalues instead of dict.values to the same effect.
Coldspeed's answer is great as usual but as an alternative, you can use the following:
ordered_list_of_dict = sorted([x for y in my_dict.values() for x in y], key=lambda x: x['commit_number'], reverse=True)
which, when printed, gives:
print(ordered_list_of_dict)
# [{'commit_number': '15', 'name': 'anna'}, {'commit_number': '14', 'name': 'john'}, {'commit_number': '12', 'name': 'jack'}, {'commit_number': '11', 'name': 'andy'}, {'commit_number': '10', 'name': 'steve'}]
Note that in the list-comprehension you have the standard construct for flattening a list of lists:
[x for sublist in big_list for x in sublist]
I'll provide the less-pythonic and more reader-friendly answer.
First, iterate through key-value pairs in my_dict, and add each element of value to an empty list. This way you avoid having to flatten out a list of lists:
commits = []
for key, val in my_dict.items():
    for commit in val:
        commits.append(commit)
which gives this:
In [121]: commits
Out[121]:
[{'commit_number': '12', 'name': 'jack'},
{'commit_number': '15', 'name': 'anna'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '10', 'name': 'steve'}]
Then sort it in descending order:
sorted(commits, reverse = True)
On Python 2 this sorts on 'commit_number' even if you don't specify it, reportedly because it comes alphabetically before 'name'. On Python 3, dicts do not support ordering comparisons, so this call raises a TypeError and a key is required. Specifying the key is also the more defensive choice, and to the best of my knowledge this is the fastest and cleanest way:
from operator import itemgetter
sorted(commits, key = itemgetter('commit_number'), reverse = True)
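One caveat applies to every answer above: the commit numbers are strings, so they are compared character by character. That happens to give the right order here because all values have two digits. A hedged sketch of a numeric sort key follows, assuming every commit_number is a valid integer string; the sample list is made up purely to show the difference.

```python
from itertools import chain

my_dict = {'project_1': [{'commit_number': '14', 'name': 'john'},
                         {'commit_number': '10', 'name': 'steve'}],
           'project_2': [{'commit_number': '12', 'name': 'jack'},
                         {'commit_number': '15', 'name': 'anna'},
                         {'commit_number': '11', 'name': 'andy'}]}

# Flatten all project lists, then sort numerically in descending order.
res = sorted(chain.from_iterable(my_dict.values()),
             key=lambda d: int(d['commit_number']),
             reverse=True)
print([d['name'] for d in res])

# Made-up values showing why string comparison is fragile.
sample = ['9', '10', '100']
as_text = sorted(sample, reverse=True)             # compares characters
as_numbers = sorted(sample, key=int, reverse=True) # compares values
print(as_text, as_numbers)
```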

Reconstruct the list of dict in python but the result is not in order

list_1 = [{'1': 'name_1', '2': 'name_2', '3': 'name_3',},
{'1': 'age_1', '2': 'age_2' ,'3': 'age_3',}]
I want to manipulate this list so that the dicts contain all the attributes for a particular ID. The ID itself must form part of the resulting dict. An example output is shown below:
list_2 = [{'id' : '1', 'name' : 'name_1', 'age': 'age_1'},
{'id' : '2', 'name' : 'name_2', 'age': 'age_2'},
{'id' : '3', 'name' : 'name_3', 'age': 'age_3'}]
Then I did following:
>>> list_2=[{'id':x,'name':list_1[0][x],'age':list_1[1][x]} for x in list_1[0].keys()]
Then it gives:
>>> list_2
[{'age': 'age_1', 'id': '1', 'name': 'name_1'},
{'age': 'age_3', 'id': '3', 'name': 'name_3'},
{'age': 'age_2', 'id': '2', 'name': 'name_2'}]
But I don't understand why 'age' is shown first while 'id' is shown in the second position.
I tried other ways but the result is the same. Can anyone help me figure it out?
To keep the order, you should use an ordered dictionary. Using your sample:
from collections import OrderedDict
new_list = [OrderedDict([('id', x), ('name', list_1[0][x]), ('age', list_1[1][x])]) for x in list_1[0].keys()]
Printing the ordered list...
for d in new_list:
    print(d['name'], d['age'])
name_1 age_1
name_3 age_3
name_2 age_2
Try using an OrderedDict:
import collections
list_1 = [collections.OrderedDict([('1','name_1'), ('2', 'name_2'), ('3', 'name_3')]),
          collections.OrderedDict([('1','age_1'),('2','age_2'),('3', 'age_3')])]
list_2=[collections.OrderedDict([('id',x), ('name',list_1[0][x]), ('age', list_1[1][x])])
        for x in list_1[0].keys()]
This is more likely to preserve the order you want. I am still new to Python, so this may not be super Pythonic, but I think it will work.
output -
In [24]: list( list_2[0].keys() )
Out[24]: ['id', 'name', 'age']
Docs:
https://docs.python.org/3/library/collections.html#collections.OrderedDict
Examples:
https://pymotw.com/2/collections/ordereddict.html
Getting the constructors right:
Right way to initialize an OrderedDict using its constructor such that it retains order of initial data?
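As a side note, plain dicts preserve insertion order as a language guarantee from Python 3.7 onward, so on those versions the original comprehension already yields the keys in the order written. The sketch below assumes Python 3.7+ and additionally sorts the ids numerically, which is an extra step not present in the answers above.

```python
list_1 = [{'1': 'name_1', '2': 'name_2', '3': 'name_3'},
          {'1': 'age_1', '2': 'age_2', '3': 'age_3'}]

names, ages = list_1

# Keys are inserted as id, name, age and (on Python 3.7+) stay in that order.
# sorted(..., key=int) makes the record order explicit instead of implicit.
list_2 = [{'id': key, 'name': names[key], 'age': ages[key]}
          for key in sorted(names, key=int)]

for record in list_2:
    print(record)
```

On older interpreters the OrderedDict versions above remain the safer choice.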
