I have a python list as follows:
my_list = [[25, 1, 0.65],
           [25, 3, 0.63],
           [25, 2, 0.62],
           [50, 3, 0.65],
           [50, 2, 0.63],
           [50, 1, 0.62]]
I want to order them according to this rule:
1 --> [0.65, 0.62] <--25, 50
2 --> [0.62, 0.63] <--25, 50
3 --> [0.63, 0.65] <--25, 50
So the expected result is as follows:
Result = [[0.65, 0.62],[0.62, 0.63],[0.63, 0.65]]
How can I do this?
I tried as follows:
df = pd.DataFrame(my_list,columns=['a','b','c'])
res = df.groupby(['b', 'c']).get_group('c')
print res
ValueError: must supply a tuple to get_group with multiple grouping keys
You can sort your list with native Python, but I find it easiest to get your required list using numpy. Since you were going to use pandas anyway, I consider this an acceptable solution:
from operator import itemgetter
import numpy as np  # or just use pandas.np if you already have pandas imported

my_list = [[25, 1, 0.65],
           [25, 3, 0.63],
           [25, 2, 0.62],
           [50, 3, 0.65],
           [50, 2, 0.63],
           [50, 1, 0.62]]

sorted_list = sorted(my_list, key=itemgetter(1, 0))  # sort by second, then first column
sliced_array = np.array(sorted_list)[:, -1].reshape(-1, 2)  # last column, two values per row
final_list = sliced_array.tolist()  # back to a plain list
The main point is to use itemgetter to sort your list on two columns, one after the other. The resulting sorted list contains the required elements in its third column, which I extract with numpy. It could be done with native Python, but if you are already using numpy/pandas, this should feel natural.
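For completeness, a small native-Python sketch of the same idea, without numpy; it assumes, as in your data, that there are exactly two distinct values in the first column:

sorted_list = sorted(my_list, key=itemgetter(1, 0))
# take the last element of each row, two rows at a time
final_list = [[row[2] for row in sorted_list[i:i + 2]]
              for i in range(0, len(sorted_list), 2)]
print(final_list)  # [[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]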
Use the following:
my_list = [[25, 1, 0.65], [25, 3, 0.63], [25, 2, 0.62], [50, 3, 0.65], [50, 2, 0.63], [50, 1, 0.62]]
list_25 = sorted([item for item in my_list if item[0] == 25], key=lambda item: item[1])
list_50 = sorted([item for item in my_list if item[0] == 50], key=lambda item: item[1])
res = [[i[2], j[2]] for i,j in zip(list_25, list_50)]
Output:
>>> res
[[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]
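Note that 25 and 50 are hardcoded above. If the first-column values are not known in advance, the same idea generalizes; this is a sketch, and lookup is a name introduced here just for illustration:

firsts = sorted({item[0] for item in my_list})   # e.g. [25, 50]
seconds = sorted({item[1] for item in my_list})  # e.g. [1, 2, 3]
lookup = {(a, b): c for a, b, c in my_list}
res = [[lookup[(a, b)] for a in firsts] for b in seconds]
# [[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]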
A way to do this with pandas is to extract each group, pull out 'c', convert it to a list, and append it to the list you want:
>>> z = []
>>> for g in df.groupby('b'):
...     z.append(g[1]['c'].tolist())
...
>>> z
[[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]
You could do this as a list comprehension:
>>> res = [g[1]['c'].tolist() for g in df.groupby('b')]
>>> res
[[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]
Another way would be to apply list directly to df.groupby('b')['c']; this gives you the object you need. Then call the .tolist() method to return a list of lists:
>>> df.groupby('b')['c'].apply(list).tolist()
[[0.65000000000000002, 0.62], [0.62, 0.63], [0.63, 0.65000000000000002]]
The numpy_indexed package (disclaimer: I am its author) has a one-liner for these kinds of problems:
import numpy as np
import numpy_indexed as npi

my_list = np.asarray(my_list)
keys, table = npi.Table(my_list[:, 1], my_list[:, 0]).mean(my_list[:, 2])
Note that if duplicate values are present in the list, the mean is reported in the table.
EDIT: I have added some improvements to the master branch of numpy_indexed that allow more control over the way you convert to a table; for instance, there is Table.unique, which asserts that each item in the table occurs exactly once in the list, and Table.sum, and eventually all other reductions supported by the numpy_indexed package that make sense. Hopefully I can do a new release for that tonight.
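For comparison, the same kind of table can be built with plain numpy; this is a sketch using np.unique and np.searchsorted, not part of numpy_indexed, and it assumes each (row, column) pair occurs at most once in the list:

import numpy as np

a = np.asarray(my_list)
rows = np.unique(a[:, 1])  # e.g. [1, 2, 3]
cols = np.unique(a[:, 0])  # e.g. [25, 50]
table = np.full((len(rows), len(cols)), np.nan)
table[np.searchsorted(rows, a[:, 1]), np.searchsorted(cols, a[:, 0])] = a[:, 2]
# [[0.65 0.62]
#  [0.62 0.63]
#  [0.63 0.65]]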
Related
I recently hit a wall with a somewhat simple thing, but no matter what I try, I am unable to solve it.
I created a small function that calculates some values and returns a list as its output:
def calc(file):
    # some calculation based on file
    return degradation  # a list
For example, for file "data1.txt":
degradation = [1,0.9,0.8,0.5]
and for file "data2.txt"
degradation = [1,0.8,0.6,0.2]
Since I have several files to which I want to apply calc(), I want to join the results sideways into an array that has len(degradation) rows and as many columns as I have files. I was planning to do it with a for loop.
For this specific case, something like:
output = [[1,   1  ],
          [0.9, 0.8],
          [0.8, 0.6],
          [0.5, 0.2]]
I tried with pandas as well, but without success.
import numpy as np

arr2d = np.array([[1, 2, 3, 4]])                    # first list as a 1x4 row
arr2d = np.append(arr2d, [[9, 8, 7, 6]], axis=0).T  # append the second row, then transpose
This gives an output like this:
array([[1, 9],
[2, 8],
[3, 7],
[4, 6]])
You can use numpy.hstack() to achieve this.
Imagine you have the data from the first two files, i.e. from the first two iterations of the for loop.
data1.txt gives you
degradation1 = [1,0.9,0.8,0.5]
and data2.txt gives you
degradation2 = [1,0.8,0.6,0.2]
First, you have to convert both lists into lists of lists.
degradation1 = [[i] for i in degradation1]
degradation2 = [[i] for i in degradation2]
This gives the outputs:
print(degradation1)
print(degradation2)
[[1], [0.9], [0.8], [0.5]]
[[1], [0.8], [0.6], [0.2]]
Now you can stack the data using numpy.hstack(). Note that it takes a single tuple of arrays:
import numpy
stacked = numpy.hstack((degradation1, degradation2))
This gives the output
array([[1. , 1. ],
[0.9, 0.8],
[0.8, 0.6],
[0.5, 0.2]])
Imagine you have the file data3.txt during the 3rd iteration of the for loop and it gives
degradation3 = [1,0.3,0.6,0.4]
You can follow the same steps as above: convert it to a list of lists, then stack it with stacked.
degradation3 = [[i] for i in degradation3]
stacked = numpy.hstack((stacked, degradation3))
This gives you the output
array([[1. , 1. , 1. ],
[0.9, 0.8, 0.3],
[0.8, 0.6, 0.6],
[0.5, 0.2, 0.4]])
You can continue this for the whole loop.
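Putting it together, here is a minimal sketch of the whole loop; it assumes calc() from the question returns a plain list per file, and the file names are placeholders:

import numpy

files = ["data1.txt", "data2.txt", "data3.txt"]  # placeholder file names

stacked = None
for f in files:
    column = [[i] for i in calc(f)]  # one column per file
    if stacked is None:
        stacked = numpy.array(column)
    else:
        stacked = numpy.hstack((stacked, column))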
Assume my_lists is a list of your lists.
my_lists = [
[1, 2, 3, 4],
[10, 20, 30, 40],
[100, 200, 300, 400]]
result = []
for _ in my_lists[0]:
    result.append([])

for l in my_lists:
    for i in range(len(result)):
        result[i].append(l[i])

for line in result:
    print(line)
The output would be
[1, 10, 100]
[2, 20, 200]
[3, 30, 300]
[4, 40, 400]
Since you seem to want to work with plain lists:
## degradations as list
degradation1 = [1,0.8,0.6,0.2]
degradation2 = [1,0.9,0.8,0.5]
degradation3 = [0.7,0.9,0.8,0.5]
degradations = [degradation1, degradation2, degradation3]
## CORE OF THE ANSWER ##
degradationstransposed = [list(i) for i in zip(*degradations)]
print(degradationstransposed)
[[1, 1, 0.7], [0.8, 0.9, 0.9], [0.6, 0.8, 0.8], [0.2, 0.5, 0.5]]
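One caveat: zip() truncates to the shortest input, so this assumes all degradation lists have the same length. If they may differ, itertools.zip_longest can pad the shorter ones instead, e.g.:

from itertools import zip_longest
degradationstransposed = [list(i) for i in zip_longest(*degradations, fillvalue=None)]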
I want to map one numpy array to another. My first array has two columns and thousands of rows:
arr_1 = [[20, 0.5],
[30, 0.75],
[40, 1.0],
[50, 1.25],
[60, 1.5],
[70, 1.75],
...]
The second array can have a different number of rows and columns:
arr_2 = [[1, 0.45],
[2, 0.57],
[4, 0.58],
[1, 1.69],
[1, 1.51],
[1, 0.95],
...]
I want to compare the values of the second column of arr_2 with the second column of arr_1 to know which row of arr_2 is closer to which row of arr_1. Then I want to copy the first column of arr_1 into arr_2 from the row with the nearest second column.
For example, 0.45 in arr_2 is closest to 0.5, i.e. first row in arr_1. After finding that, I want to copy the first column of that row (which is 20) into arr_2. The final result would look something like:
arr_2_final = [[1, 0.45, 20],
[2, 0.57, 20],
[4, 0.58, 20],
[1, 1.69, 70],
[1, 1.51, 60],
[1, 0.95, 40],
...]
Looking up lots of items in an array is easiest when the array is sorted, and you can delegate most of the work to np.searchsorted. Since we want to find elements in arr_1, it is the only array that needs to be sorted. I suspect that having a sorted arr_2 would also speed things up, by reducing the size of the search space for each successive element.
First, find the insertion points where arr_2 would end up in arr_1:
indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
Now all you have to do is check for the cases where the prior element is closer than the current one. There are two corner cases: when the index is 0, you have to accept it, and when it is arr_1.size, you have to take the prior element.
indices[indices == arr_1.shape[0]] = arr_1.shape[0] - 1
indices[(indices != 0) & (arr_1[indices, 1] - arr_2[:, 1] > arr_2[:, 1] - arr_1[indices - 1, 1])] -= 1
Doing it in this order saves you the trouble of messing with temporary arrays. The first line ensures that the index arr_1[indices, 1] is always valid. Since index -1 is valid, the second line succeeds as well.
The final result is then
np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)
If arr_1 is not already sorted, you can do the following:
arr_1 = arr_1[np.argsort(arr_1[:, 1]), :]
A quick benchmark shows that on my very moderately powered machine, this approach takes ~300ms for arr_1.shape = (500000, 2) and arr_2.shape = (300000, 2).
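For reference, a runnable end-to-end version of the steps above, using the sample data from the question:

import numpy as np

arr_1 = np.array([[20, 0.5], [30, 0.75], [40, 1.0],
                  [50, 1.25], [60, 1.5], [70, 1.75]])
arr_2 = np.array([[1, 0.45], [2, 0.57], [4, 0.58],
                  [1, 1.69], [1, 1.51], [1, 0.95]])

arr_1 = arr_1[np.argsort(arr_1[:, 1]), :]  # a no-op here, already sorted

indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
indices[indices == arr_1.shape[0]] = arr_1.shape[0] - 1
indices[(indices != 0) &
        (arr_1[indices, 1] - arr_2[:, 1] > arr_2[:, 1] - arr_1[indices - 1, 1])] -= 1

arr_2_final = np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)
# [[ 1.    0.45 20.  ]
#  [ 2.    0.57 20.  ]
#  [ 4.    0.58 20.  ]
#  [ 1.    1.69 70.  ]
#  [ 1.    1.51 60.  ]
#  [ 1.    0.95 40.  ]]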
I would probably do it this way: for each row of arr_2, find the index of the nearest value in the second column of arr_1 with argmin, and append the corresponding first-column value. (Note the loop has to run over arr_2, not arr_1, so that every row of arr_2 gets exactly one match.)
import numpy as np

arr_1 = [[20, 0.5], [30, 0.75], [40, 1], [50, 1.25], [60, 1.5], [70, 1.75]]
arr_2 = [[1, 0.45], [2, 0.57], [4, 0.58], [1, 1.69], [1, 1.51], [1, 0.95]]

arr_1_np = np.array(arr_1)[:, 1]  # second column of arr_1
for row in arr_2:
    idx = np.argmin(np.abs(arr_1_np - row[1]))  # nearest row of arr_1
    row.append(arr_1[idx][0])
print(arr_2)
This brute-force scan is O(len(arr_1) * len(arr_2)), so for very large arrays the searchsorted approach above will be considerably faster.
I'm trying to take a dictionary with list values and divide each element of each list by the corresponding element in another dictionary's list values.
For example, if you have these two dictionaries where the keys of the dictionaries do not match,
dict1={"A": [1,2,3], "B": [4,5,6], "C":[7,8,9]}
dict2={"D": [10,20,30], "B":[40,50,60], "C":[70,80,90]}
I'd like to iterate through the list elements and divide the elements of the list, such that the output is something like this,
new_dict={"A": [1/10, 2/20, 3/30], "B":[4/40, 5/50, 6/60], "C": [7/70, 8/80, 9/90]}
I've tried something like this, but am getting held up on figuring out how to get into the lists.
new_dict = {}
for key, value in dict1:
    new_dict = {key: [i/j for i, j in zip(value, dict2.values()]}
new_dict
Thank you so much for any and all help!
You can use a dict comprehension to build the resulting dictionary, and a list comprehension to build the values of that dictionary:
>>> dict1
{'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
>>> dict2
{'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]}
>>> res = {key: [x1/x2 for (x1,x2) in zip(dict1[key],dict2[key])] for key in dict2}
>>> res
{'A': [0.1, 0.1, 0.1], 'B': [0.1, 0.1, 0.1], 'C': [0.1, 0.1, 0.1]}
If the keys don't match and you are going only off the ordering of the keys (which I don't recommend), then you can zip the keys of the first dictionary with the values of the second, then operate on that:
>>> dict1={"A": [1,2,3], "B": [4,5,6], "C":[7,8,9]}
>>> dict2={"D": [10,20,30], "B":[40,50,60], "C":[70,80,90]}
>>> res = {k: [x1/x2 for (x1,x2) in zip(dict1[k],v2)] for (k,v2) in zip(
... dict1.keys(), dict2.values())}
>>> res
{'A': [0.1, 0.1, 0.1], 'B': [0.1, 0.1, 0.1], 'C': [0.1, 0.1, 0.1]}
Dictionary comprehension is what you are looking for:
dict1={"A": [1,2,3], "B": [4,5,6], "C":[7,8,9]}
dict2={"A": [10,20,30], "B":[40,50,60], "C":[70,80,90]}
new_dict={key: [i/j for i, j in zip(dict1[key], dict2[key])] for key in dict1}
Output:
{'A': [0.1, 0.1, 0.1], 'B': [0.1, 0.1, 0.1], 'C': [0.1, 0.1, 0.1]}
I have the following kind of list:
myList = [[500, 5], [500, 10], [500, 3], [504, 9], [505, 10], [505, 20]]
I don't want values with the same first element, so I want to do this: if two or more elements have the same first value, sum their second values and remove the duplicates. In my example, the new output would be:
myList = [[500, 18], [504, 9], [505, 30]]
How can I do this? I was thinking of using lambda functions, but I don't know how to write the function; the other solutions I can think of require a massive number of for loops, so I was wondering if there is an easier way. Any kind of help is appreciated!
Use a defaultdict:
import collections

# by default, non-existing keys will be initialized to zero
myDict = collections.defaultdict(int)
for key, value in myList:
    myDict[key] += value

# transform back to a sorted list of lists
myResult = sorted(list(kv) for kv in myDict.items())
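For the sample myList from the question, this yields:

>>> myResult
[[500, 18], [504, 9], [505, 30]]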
Using the pandas library:
import pandas as pd

[[k, v] for k, v in pd.DataFrame(myList).groupby(0)[1].sum().items()]
Breaking it down:
pd.DataFrame(myList) creates a DataFrame where each row is one of the short lists in myList:
     0   1
0  500   5
1  500  10
2  500   3
3  504   9
4  505  10
5  505  20
(...).groupby(0)[1].sum() groups by the first column, takes the values from the second column (to create a Series instead of a DataFrame), and sums them.
[[k, v] for k, v in (...).items()] is a simple list comprehension (treating the Series like a dictionary) to turn it back into a list of lists, like you wanted.
Output:
[[500, 18], [504, 9], [505, 30]]
The list comprehension can be made even shorter by casting each of the .items() to a list:
list(map(list, pd.DataFrame(myList).groupby(0)[1].sum().items()))
An easier-to-read implementation (less Pythonesque though :-) ):
myList = [[500, 5], [500, 10], [500, 3], [504, 9], [505, 10], [505, 20]]

sums = dict()
for a, b in myList:
    if a in sums:
        sums[a] += b
    else:
        sums[a] = b

res = []
for key, val in sums.items():
    res.append([key, val])
print(sorted(res))
You can use itertools.groupby to group the sublists by their first item, sum the last entries in each group, and create a new list pairing each group key with its sum:
from itertools import groupby
from operator import itemgetter

# groupby requires sorted input; the data already
# looks sorted, but sort anyway to be safe
myList = sorted(myList, key=itemgetter(0))
Our grouper will be the first item in each sublist (500, 504, 505):
# iterate through the groups, summing the last
# item of each sublist in the group, and pair
# each sum with its group key in a new list
result = [[key, sum(last for first, last in grp)]
          for key, grp in groupby(myList, itemgetter(0))]
print(result)
[[500, 18], [504, 9], [505, 30]]
myList = [[500, 5], [500, 10], [500, 3], [504, 9], [505, 10], [505, 20]]

temp = {}
for first, second in myList:
    if first in temp:
        temp[first] += second
    else:
        temp[first] = second

result = [[k, v] for k, v in temp.items()]
print(result)
This question already has answers here: Group Python List Elements (4 answers). Closed 6 years ago.
I have a python list as follows:
my_list = [[25, 1, 0.65],
           [25, 3, 0.63],
           [25, 2, 0.62],
           [50, 3, 0.65],
           [50, 2, 0.63],
           [50, 1, 0.62]]
I want to order them according to this rule:
1 --> [0.65, 0.62] <--25, 50
2 --> [0.62, 0.63] <--25, 50
3 --> [0.63, 0.65] <--25, 50
So the expected result is as follows:
Result = [[0.65, 0.62],[0.62, 0.63],[0.63, 0.65]]
I tried as follows:
import pandas as pd
df = pd.DataFrame(my_list,columns=['a','b','c'])
res = df.groupby(['b', 'c']).get_group('c')
print res
ValueError: must supply a tuple to get_group with multiple grouping keys
How can I do this?
Here is a pandas solution: sort the list by the first column, group by the second column, and convert the third column to a list. If you prefer the result as a list of lists, use the tolist() method afterwards:
df = pd.DataFrame(my_list, columns=list('ABC'))
s = df.sort_values('A').groupby('B').C.apply(list)
#B
#1 [0.65, 0.62]
#2 [0.62, 0.63]
#3 [0.63, 0.65]
#Name: C, dtype: object
The above method produces a pandas Series.
To get a list of lists:
s.tolist()
# [[0.65000000000000002, 0.62], [0.62, 0.63], [0.63, 0.65000000000000002]]
To get a numpy array of lists:
s.values
# array([[0.65000000000000002, 0.62], [0.62, 0.63],
# [0.63, 0.65000000000000002]], dtype=object)
s.values[0]
# [0.65000000000000002, 0.62] # here each element in the array is still a list
To get a 2D array or a matrix, you can transform the data frame in a different way, i.e. pivot your original data frame to wide format and then convert it to a 2D array:
df.pivot('B', 'A', 'C').as_matrix()
# array([[ 0.65, 0.62],
# [ 0.62, 0.63],
# [ 0.63, 0.65]])
Or:
np.array(s.tolist())
# array([[ 0.65, 0.62],
# [ 0.62, 0.63],
# [ 0.63, 0.65]])
Here is another way, since it seems in your question you were trying to use get_group():
g = [1, 2, 3]
result = []
for i in g:
    lst = df.groupby('b')['c'].get_group(i).tolist()
    result.append(lst)
print(result)
[[0.65, 0.62], [0.62, 0.63], [0.63, 0.65]]