How to combine multiple numpy arrays into a dictionary list - python

I have the following list of column names:
column_names = ['id', 'temperature', 'price']
And three numpy arrays as follows:
idArry = ([1,2,3,4,....])
tempArry = ([20.3,30.4,50.4,.....])
priceArry = ([1.2,3.5,2.3,.....])
I want to combine the above into a list of dictionaries, as follows:
table_dict = ( {'id':1, 'temperature':20.3, 'price':1.2 },
{'id':2, 'temperature':30.4, 'price':3.5},...)
I could use a for loop together with append to build the list, but it is huge, at about 15000 rows. Can someone show me how to use Python's zip functionality, or another more efficient and faster way, to achieve the above?

You can use a list comprehension and the zip() function:
[{'id': i, 'temperature': j, 'price': k} for i, j, k in zip(idArry, tempArry, priceArry)]
# [{'id': 1, 'temperature': 20.3, 'price': 1.2}, {'id': 2, 'temperature': 30.4, 'price': 3.5}]
If your ids are 1, 2, 3, ... and you keep the result as a list, you don't need the id inside each dict; the list index already carries that information, so it would be redundant.
[{'temperature': i, 'price': j} for i, j in zip(tempArry, priceArry)]
You can also use a dict of dicts. Looking up a row by id in a dict is faster than searching through a list.
{i: {'temperature': j, 'price': k} for i, j, k in zip(idArry, tempArry, priceArry)}
# {1: {'temperature': 20.3, 'price': 1.2}, 2: {'temperature': 30.4, 'price': 3.5}}
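For example (a usage sketch, my addition): with the dict-of-dicts layout, a row can be fetched by its id in constant time instead of scanning a 15000-row list:
table = {i: {'temperature': j, 'price': k} for i, j, k in zip(idArry, tempArry, priceArry)}
table[2]['price']   # 3.5 -- looked up directly by id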

I'd take a look at the functionality of the pandas package. In particular there is a pandas.DataFrame.to_dict method.
I'm confident that for large arrays this method should be pretty fast (though I'm willing to have the zip method proved more efficient).
In the example below I first construct a pandas dataframe from your arrays and then use the to_dict method.
import numpy as np
import pandas as pd
column_names = ['id', 'temperature', 'price']
idArry = np.array([1, 2, 3])
tempArry = np.array([20.3, 30.4, 50.4])
priceArry = np.array([1.2, 3.5, 2.3])
df = pd.DataFrame(np.vstack([idArry, tempArry, priceArry]).T, columns=column_names)
table_dict = df.to_dict(orient='records')
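One caveat worth flagging (my addition, not part of the answer above): np.vstack(...).T forces a single dtype on the whole matrix, so the integer ids come out as floats (1.0, 2.0, ...). Building the frame from a dict of columns keeps each array's own dtype; a minimal sketch (whether to_dict then returns plain Python ints can depend on the pandas version):
import numpy as np
import pandas as pd

idArry = np.array([1, 2, 3])
tempArry = np.array([20.3, 30.4, 50.4])
priceArry = np.array([1.2, 3.5, 2.3])

# Each column keeps its own dtype: id stays integer, the others stay float.
df = pd.DataFrame({'id': idArry, 'temperature': tempArry, 'price': priceArry})
table_dict = df.to_dict(orient='records')
# e.g. [{'id': 1, 'temperature': 20.3, 'price': 1.2}, ...]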

This could work. enumerate is used to create a counter that starts at 0, and each counter value is then used to pull the matching elements out of tempArry and priceArry (the id is assumed to simply be the counter plus one). This is a generator expression, which helps with memory (especially if your lists are really large).
new_dict = ({'id': i + 1 , 'temperature': tempArry[i], 'price': priceArry[i]} for i, _ in enumerate(idArry))
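For illustration (my addition, not part of the answer above): being a generator, it yields one row at a time, so it can be consumed lazily and only turned into a list when a list is actually required:
rows = ({'id': i + 1, 'temperature': tempArry[i], 'price': priceArry[i]}
        for i, _ in enumerate(idArry))

first_row = next(rows)   # builds just one dict
remaining = list(rows)   # materializes the rest only when needed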

You can use a list comprehension to achieve this by just iterating over the indices of one of the arrays:
[{'id': idArry[i], 'temperature': tempArry[i], 'price': priceArry[i]} for i in range(len(idArry))]

You could build a NumPy matrix and then convert it to a list of dictionaries, as follows. Given your data (I changed the values just for the example):
import numpy as np
idArry = np.array([1,2,3,4])
tempArry = np.array([20,30,50,40])
priceArry = np.array([200,300,100,400])
Build the matrix:
table = np.array([idArry, tempArry, priceArry]).transpose()
Create the dictionary:
dict_table = [ dict(zip(column_names, values)) for values in table ]
#=> [{'id': 1, 'temperature': 20, 'price': 200}, {'id': 2, 'temperature': 30, 'price': 300}, {'id': 3, 'temperature': 50, 'price': 100}, {'id': 4, 'temperature': 40, 'price': 400}]
I don't know your end purpose, but maybe you can also use the matrix directly, for example to filter rows:
temp_col = table[:,1]
table[temp_col >= 40]
# [[ 3 50 100]
# [ 4 40 400]]

A way to do it would be as follows:
column_names = ['id', 'temperature', 'price']
idArry = ([1,2,3,4])
tempArry = ([20.3,30.4,50.4, 4])
priceArry = ([1.2,3.5,2.3, 4.5])
You can zip together the elements of the different lists:
l = zip(idArry, tempArry, priceArry)
print(list(l))
[(1, 20.3, 1.2), (2, 30.4, 3.5), (3, 50.4, 2.3), (4, 4, 4.5)]
Note that in Python 3 a zip object is a single-use iterator, so after the print above it is exhausted and has to be recreated. Then build the inner dictionaries with a list comprehension, zipping each row with column_names:
l = zip(idArry, tempArry, priceArry)
[dict(zip(column_names, row)) for row in l]
[{'id': 1, 'temperature': 20.3, 'price': 1.2},
{'id': 2, 'temperature': 30.4, 'price': 3.5},
{'id': 3, 'temperature': 50.4, 'price': 2.3},
{'id': 4, 'temperature': 4, 'price': 4.5}]
The advantage of this method is that it only uses built-in functions and that it works for an arbitrary number of column names.

Related

Python: consolidate list of lists

I want to consolidate a list of lists (of dicts), but I have honestly no idea how to get it done.
The list looks like this:
l1 = [
    [
        {'id': 1, 'category': 5}, {'id': 3, 'category': 7}
    ],
    [
        {'id': 1, 'category': 5}, {'id': 4, 'category': 8}, {'id': 6, 'category': 9}
    ],
    [
        {'id': 6, 'category': 9}, {'id': 9, 'category': 16}
    ],
    [
        {'id': 2, 'category': 4}, {'id': 5, 'category': 17}
    ]
]
If one of the dicts from l1[0] is also present in l1[1], I want to concatenate the two lists and delete l1[0]. Afterwards I want to check if there are values from l1[1] also present in l1[2].
So my desired output would eventually look like this:
new_list = [
    [
        {'id': 1, 'category': 5}, {'id': 3, 'category': 7}, {'id': 4, 'category': 8}, {'id': 6, 'category': 9}, {'id': 9, 'category': 16}
    ],
    [
        {'id': 2, 'category': 4}, {'id': 5, 'category': 17}
    ]
]
Any idea how it can be done?
I tried it with 3 different for loops, but it wouldn't work, because I change the length of the list while iterating and thereby provoke an index-out-of-range error (apart from that, it would be an ugly solution anyway):
for list in l1:
    for dictionary in list:
        for index in range(0, len(l1), 1):
            if dictionary in l1[index]:
                dictionary in l1[index].append(list)
                dictionary.remove(list)
Can I apply some map or list_comprehension here?
Thanks a lot for any help!
IIUC, the following algorithm works.
Initialize result to empty
For each sublist in l1:
    if sublist and the last list in result overlap:
        append sublist into the last list of result, without duplicating overlapping items
    otherwise:
        append sublist at the end of result
Code
# Helper functions
def append(list1, list2):
    ' append list1 and list2 (without duplicating elements) '
    return list1 + [d for d in list2 if not d in list1]

def is_intersect(list1, list2):
    ' True if list1 and list2 have an element in common '
    return any(d in list2 for d in list1) or any(d in list1 for d in list2)

# Generate desired result
result = []  # resulting list
for sublist in l1:
    if not result or not is_intersect(sublist, result[-1]):
        result.append(sublist)
    else:
        # Intersection with last list, so append to last list in result
        result[-1] = append(result[-1], sublist)

print(result)
Output
[[{'id': 1, 'category': 5},
{'id': 3, 'category': 7},
{'id': 4, 'category': 8},
{'id': 6, 'category': 9},
{'id': 9, 'category': 16}],
[{'id': 2, 'category': 4}, {'id': 5, 'category': 17}]]
Maybe you can try appending the elements to a new list. By doing so, the original list remains the same and the index-out-of-range error won't be raised.
new_list = []
for list in l1:
    inner_list = []
    for ...
        if dictionary in l1[index]:
            inner_list.append(list)
        ...
    new_list.append(inner_list)

How to convert list of dict into two lists?

For example:
persons = [{'id': 1, 'name': 'john'}, {'id': 2, 'name': 'mary'}, {'id': 3, 'name': 'tom'}]
I want to get two lists from it:
ids = [1, 2, 3]
names = ['john', 'mary', 'tom']
What I did:
names = [d['name'] for d in persons]
ids = [d['id'] for d in persons]
Is there a better way to do it?
I'd stick with the list comprehensions, or use @Woodford's technique:
ids, names = [dcts['id'] for dcts in persons], [dcts['name'] for dcts in persons]
Output:
[1, 2, 3]
['john', 'mary', 'tom']
What you did works fine. Another way to handle this (not necessarily better, depending on your needs) is to store your data in a more efficient dictionary and pull the names/ids out of it when you need them:
>>> persons = [{'id': 1, 'name': 'john'}, {'id': 2, 'name': 'mary'}, {'id': 3, 'name': 'tom'}]
>>> p2 = {x['id']: x['name'] for x in persons}
>>> p2
{1: 'john', 2: 'mary', 3: 'tom'}
>>> list(p2.keys())
[1, 2, 3]
>>> list(p2.values())
['john', 'mary', 'tom']
You can do it with pandas in a vectorized fashion:
import pandas as pd
persons = [{'id': 1, 'name': 'john'}, {'id': 2, 'name': 'mary'}, {'id': 3, 'name': 'tom'}]
df = pd.DataFrame(persons)
id_list = df.id.tolist() #[1, 2, 3]
name_list = df.name.tolist() #['john', 'mary', 'tom']
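A small aside (my addition): bracket indexing gives the same result and cannot collide with an existing DataFrame attribute or method name, so it is the slightly more defensive spelling:
id_list = df['id'].tolist()      # same as df.id.tolist()
name_list = df['name'].tolist()  # same as df.name.tolist()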
An alternative, inspired by this question, is
ids, names = zip(*map(lambda x: x.values(), persons))
which returns tuples. If you need lists:
ids, names = map(list, zip(*map(lambda x: x.values(), persons)))
It is a little bit slower on my laptop using python3.9 than the accepted answer but it might be useful.
It sounds like you're trying to iterate through the values of your list while unpacking your dictionaries:
persons = [{'id': 1, 'name': 'john'}, {'id': 2, 'name': 'mary'}, {'id': 3, 'name': 'tom'}]

ids, names = [], []
for x in persons:
    id, name = x.values()
    ids.append(id)
    names.append(name)

Pandas- Concatenating two columns of string lists

I've got an interesting data frame that comes from a database. The data frame has two columns, which are lists of strings. I need to concatenate the values in these two lists to create a new column of lists. For example:
data = [
    {'id': 1, 'l1': ['Luke', 'Han'], 'l2': ['Skywalker', 'Solo']},
    {'id': 2, 'l1': ['Darth', 'Kylo'], 'l2': ['Vader', 'Ren']},
    {'id': 3, 'l1': [], 'l2': []}
]
df = pd.DataFrame(data)
Notice the third row has no values. You can also assume that l1 and l2 are of the same length.
And I need to concat the values in l1 and l2 (with a space between), e.g.:
result = [
    {'id': 1, 'name': ['Luke Skywalker', 'Han Solo']},
    {'id': 2, 'name': ['Darth Vader', 'Kylo Ren']},
    {'id': 3, 'name': []}
]
result_df = pd.DataFrame(result)
You can use a list comprehension together with ' '.join and zip to iterate over your dataset, for example like this:
import pandas as pd

data = [
    {'id': 1, 'l1': ['Luke', 'Han'], 'l2': ['Skywalker', 'Solo']},
    {'id': 2, 'l1': ['Darth', 'Kylo'], 'l2': ['Vader', 'Ren']},
    {'id': 3, 'l1': [], 'l2': []}
]
df = pd.DataFrame(data)

result = [
    {
        'id': row['id'],
        'name': [' '.join(l1_l2) for l1_l2 in zip(row['l1'], row['l2'])]
    } for row in data
]
print(pd.DataFrame(result))
>>>
   id                        name
0   1  [Luke Skywalker, Han Solo]
1   2     [Darth Vader, Kylo Ren]
2   3                          []
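If you would rather build the new column on the DataFrame itself, the same zip-and-join idea can be applied to the l1 and l2 columns directly. A sketch of that variant (my addition, assuming as stated in the question that l1 and l2 always have the same length):
df['name'] = [[' '.join(pair) for pair in zip(a, b)]
              for a, b in zip(df['l1'], df['l2'])]
result_df = df[['id', 'name']]
print(result_df)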
This should get you where you want, assuming you only have two list columns (if you have more, just add further terms of the form ' ' + df.iloc[j, 3][i], ' ' + df.iloc[j, 4][i], and so on):
Voila = []
for j in range(len(df)):
    Voila.append([df.iloc[j, 1][i] + ' ' + df.iloc[j, 2][i]
                  for i in range(len(df.loc[j, 'l1']))])
df['Voila'] = Voila

iterable from pandas dataframe

I need to create an iterable of the form (id, {feature name: feature weight}) for use with a python package.
My data is stored in a pandas dataframe; here is an example:
data = pd.DataFrame({"id": [1, 2, 3],
                     "gender": [1, 0, 1],
                     "age": [25, 23, 40]})
For the {feature name: feature weight} part, I know I can use this:
fe = data.to_dict(orient='records')
Out[28]:
[{'age': 25, 'gender': 1, 'id': 1},
 {'age': 23, 'gender': 0, 'id': 2},
 {'age': 40, 'gender': 1, 'id': 3}]
I know I can also iterate over the dataframe to get the id, like this:
(row[1] for row in data.itertuples())
But I can't get these two together into one iterable (generator object).
I tried:
((row[1] for row in data.itertuples()), fe[i] for i in range(len(data)))
but the syntax is wrong.
Do you know how to do this?
pd.DataFrame.itertuples returns named tuples. You can iterate and convert each row to a dictionary via the purpose-built method _asdict. You can wrap this in a generator function to create a lazy reader:
data = pd.DataFrame({"id": [1, 2, 3],
                     "gender": [1, 0, 1],
                     "age": [25, 23, 40]})

def gen_rows(df):
    for row in df.itertuples(index=False):
        yield row._asdict()

G = gen_rows(data)

print(next(G))  # OrderedDict([('age', 25), ('gender', 1), ('id', 1)])
print(next(G))  # OrderedDict([('age', 23), ('gender', 0), ('id', 2)])
print(next(G))  # OrderedDict([('age', 40), ('gender', 1), ('id', 3)])
Note that the result will be OrderedDict objects. As a subclass of dict, for most purposes this should be sufficient.
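If the package specifically needs (id, {feature name: feature weight}) pairs, as asked, a small variation of the same generator can pop the id out of each row dict. A sketch (my addition; gen_id_rows is just an illustrative name):
def gen_id_rows(df):
    # yield (id, {feature: value}) pairs, with 'id' removed from the feature dict
    for row in df.itertuples(index=False):
        d = dict(row._asdict())
        yield d.pop('id'), d

pairs = gen_id_rows(data)
print(next(pairs))  # e.g. (1, {'gender': 1, 'age': 25})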
I think you need to first set_index by the id column and then to_dict with orient='index':
fe = data.set_index('id', drop=False).to_dict(orient='index')
print (fe)
{1: {'id': 1, 'gender': 1, 'age': 25},
2: {'id': 2, 'gender': 0, 'age': 23},
3: {'id': 3, 'gender': 1, 'age': 40}}
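If the goal is really an iterable of (id, {feature name: feature weight}) pairs, dropping the id column into the index (the default for set_index) and taking items() on the result gets there directly; a sketch (my addition):
fe = data.set_index('id').to_dict(orient='index')
pairs = iter(fe.items())
print(next(pairs))  # (1, {'gender': 1, 'age': 25})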

Grouping a python list of dictionaries and aggregating values

I have an input list:
inlist = [{"id":123,"hour":5,"groups":"1"},{"id":345,"hour":3,"groups":"1;2"},{"id":65,"hour":-2,"groups":"3"}]
I need to group the dictionaries by their 'groups' value. After that I need to add min and max hour keys to each dict in the new grouped lists. The output should look like this:
outlist = [(1, [{"id": 123, "hour": 5, "min_group_hour": 3, "max_group_hour": 5},
                {"id": 345, "hour": 3, "min_group_hour": 3, "max_group_hour": 5}]),
           (2, [{"id": 345, "hour": 3, "min_group_hour": 3, "max_group_hour": 3}]),
           (3, [{"id": 65, "hour": -2, "min_group_hour": -2, "max_group_hour": -2}])]
So far I have managed to group the input list:
import itertools
from operator import itemgetter

new_list = []
for domain in inlist:
    for group in domain['groups'].split(';'):
        d = dict()
        d['id'] = domain['id']
        d['group'] = group
        d['hour'] = domain['hour']
        new_list.append(d)

for k, v in itertools.groupby(new_list, key=itemgetter('group')):
    print((k, [max(list(v), key=itemgetter('hour'))]))
And output is
('1', [{'group': '1', 'id': 123, 'hour': 5}])
('2', [{'group': '2', 'id': 345, 'hour': 3}])
('3', [{'group': '3', 'id': 65, 'hour': -2}])
I don't know how to aggregate the values by group. And is there a more pythonic way of grouping dictionaries by a key whose value needs to be split?
Start by creating a dict that maps group numbers to dictionaries:
from collections import defaultdict

dicts_by_group = defaultdict(list)
for dic in inlist:
    groups = map(int, dic['groups'].split(';'))
    for group in groups:
        dicts_by_group[group].append(dic)
This gives us a dict that looks like
{1: [{'id': 123, 'hour': 5, 'groups': '1'},
{'id': 345, 'hour': 3, 'groups': '1;2'}],
2: [{'id': 345, 'hour': 3, 'groups': '1;2'}],
3: [{'id': 65, 'hour': -2, 'groups': '3'}]}
Then iterate over the grouped dicts and set the min_group_hour and max_group_hour for each group:
outlist = []
for group in sorted(dicts_by_group.keys()):
    dicts = dicts_by_group[group]
    min_hour = min(dic['hour'] for dic in dicts)
    max_hour = max(dic['hour'] for dic in dicts)
    dicts = [{'id': dic['id'], 'hour': dic['hour'],
              'min_group_hour': min_hour, 'max_group_hour': max_hour}
             for dic in dicts]
    outlist.append((group, dicts))
Result:
[(1, [{'id': 123, 'hour': 5, 'min_group_hour': 3, 'max_group_hour': 5},
{'id': 345, 'hour': 3, 'min_group_hour': 3, 'max_group_hour': 5}]),
(2, [{'id': 345, 'hour': 3, 'min_group_hour': 3, 'max_group_hour': 3}]),
(3, [{'id': 65, 'hour': -2, 'min_group_hour': -2, 'max_group_hour': -2}])]
IIUC: Here is another way to do it in pandas:
import pandas as pd
records = [{"id": 123, "hour": 5, "group": "1"},
           {"id": 345, "hour": 3, "group": "1;2"},
           {"id": 65, "hour": -2, "group": "3"}]
df = pd.DataFrame(records)
#Get minimum
dfmi = df.groupby('group').apply(min)
#Rename hour column as min_hour
dfmi.rename(columns={'hour':'min_hour'}, inplace=True)
dfmx = df.groupby('group').apply(max)
#Rename hour column as max_hour
dfmx.rename(columns={'hour':'max_hour'}, inplace=True)
#Merge min df with main df
df = df.merge(dfmi, on='group', how='outer')
#Merge max df with main df
df = df.merge(dfmx, on='group', how='outer')
output = list(df.apply(lambda x: x.to_dict(), axis=1))
#Dictionary of dictionaries
dict_out = df.to_dict(orient='index')
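As a side note (my addition, not from the answer above): groupby(...).transform can attach each group's min and max directly as new columns, which avoids the two merges. Note that, as in the code above, a value like "1;2" is treated as a single group here; the groups are not split:
import pandas as pd

records = [{"id": 123, "hour": 5, "group": "1"},
           {"id": 345, "hour": 3, "group": "1;2"},
           {"id": 65, "hour": -2, "group": "3"}]
df = pd.DataFrame(records)

# Broadcast each group's min/max back onto the original rows.
df['min_group_hour'] = df.groupby('group')['hour'].transform('min')
df['max_group_hour'] = df.groupby('group')['hour'].transform('max')

output = df.to_dict(orient='records')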
