pandas multiply each dataset row by multiple vectors - python

df = {1,2,3
4,5,6
7,8,9,
10,11,12
}
weights={[1,3,3],[2,2,2],[3,1,1]}
I want to multiply my df by every row of the weights matrix (so I'll have three different DataFrames, one for each weight vector) and then combine them by keeping, for each line, the version with the biggest values. Ex:
df0=df * weights[0]={1,6,9
4,15,18,
7,24,27
10,33,36
}
df1=df*weights[1]={2,4,6,
8,10,12,
14,16,18,
20,22,24
}
df2=df*weights[2]={3,2,3,
12,5,6,
21,8,9,
30,11,12
}
and
final_df_lines=max{df0,df1,df2}={1,6,9 - max line from df0,
4,15,18, - max line from df0,
7,24,27 - max line from df0,
10,33,36 - max line from df0,
}
In this example all the max lines came from df0, but they could come from any of the three DataFrames. The max line is found by adding up the numbers in the same line.
I need to do this vectorized (without any loops or ifs). How do I do this? Is it even possible? I really need help: I've been searching the internet for 2 days... I haven't worked in Python for long.

You can try concatenating all the weight-multiplied columns into one DataFrame, with a column suffix representing each weight,
then group by the weight that was applied and take the index of the maximum row sum.
With the max-index weight you can then multiply the DataFrame.
df2 = pd.concat([(df*i).add_suffix('__'+str(i)) for i in weights],axis=1).T
0 1 2 3
0__[1, 3, 3] 1 4 7 10
1__[1, 3, 3] 6 15 24 33
2__[1, 3, 3] 9 18 27 36
0__[2, 2, 2] 2 8 14 20
1__[2, 2, 2] 4 10 16 22
2__[2, 2, 2] 6 12 18 24
0__[3, 1, 1] 3 12 21 30
1__[3, 1, 1] 2 5 8 11
2__[3, 1, 1] 3 6 9 12
# by grouping with respect to the weight it multiplied, get max index
a = df2.groupby(df2.index.str.split('__').str[1]).apply(lambda x: x.sum()).idxmax()
# max weights with respect to summation of rows
df['idxmax'] = a.str.slice(1,-1).str.split(',').apply(lambda x: list(map(int,x)))
c [1, 3, 3]
d [1, 3, 3]
3 [1, 3, 3]
4 [1, 3, 3]
dtype: object
df.apply(lambda x: x.loc[df.columns.difference(['idxmax'])] * x['idxmax'],1)
0 1 2
0 1 6 9
1 4 15 18
2 7 24 27
3 10 33 36

EDIT: As question has been updated I had to update too:
You have to align matrices first to be able to make an element-wise matrix operation without using any loop:
import numpy as np
a = [
[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12]
]
weights = [
[1,3,3],
[2,2,2],
[3,1,1]
]
w_s = np.array( (4 * [weights[0]], 4 * [weights[1]], 4 * [weights[2]]) )
a_s = np.array(3 * [a])
result_matrix1 = w_s * a_s[0]
result_matrix2 = w_s * a_s[1]
result_matrix3 = w_s * a_s[2]
print(result_matrix1)
print(result_matrix2)
print(result_matrix3)
Output:
[[[ 1 6 9]
[ 4 15 18]
[ 7 24 27]
[10 33 36]]
[[ 2 4 6]
[ 8 10 12]
[14 16 18]
[20 22 24]]
[[ 3 2 3]
[12 5 6]
[21 8 9]
[30 11 12]]]
[[[ 1 6 9]
[ 4 15 18]
[ 7 24 27]
[10 33 36]]
[[ 2 4 6]
[ 8 10 12]
[14 16 18]
[20 22 24]]
[[ 3 2 3]
[12 5 6]
[21 8 9]
[30 11 12]]]
[[[ 1 6 9]
[ 4 15 18]
[ 7 24 27]
[10 33 36]]
[[ 2 4 6]
[ 8 10 12]
[14 16 18]
[20 22 24]]
[[ 3 2 3]
[12 5 6]
[21 8 9]
[30 11 12]]]
The solution is numpy, but you can do it with pandas as well, if you prefer it, of course.
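The full task from the question (multiply by every weight vector, then keep for each row the product with the largest row sum) can also be sketched with NumPy broadcasting and no explicit loop. This is only one possible approach, not the answerer's code; the names `products`, `row_sums`, `best` and `final` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
weights = np.array([[1, 3, 3], [2, 2, 2], [3, 1, 1]])

# products[k, i, :] is row i of `a` multiplied element-wise by weights[k].
products = weights[:, None, :] * a[None, :, :]   # shape (3, 4, 3)

# Sum each weighted row, then find which weight vector wins for every row.
row_sums = products.sum(axis=2)                  # shape (3, 4)
best = row_sums.argmax(axis=0)                   # shape (4,)

# Pick, for every row i, the product computed with weights[best[i]].
final = products[best, np.arange(a.shape[0])]    # shape (4, 3)
print(final)
```

If a DataFrame is needed, the resulting array can be passed to `pd.DataFrame` together with the original index and columns.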

Related

How to store each iteration's values in dataframe?

I want to take input in the form of lists and join them into strings. How can I store the output as a dataframe column?
The input X is a dataframe and the column name is des:
X['des'] =
[5, 13]
[L32MM, 4MM, 2]
[724027, 40]
[58, 60MM, 36MM, 0, 36, 3]
[8.5, 45MM]
[5.0MM, 44MM]
[10]
This is my code:
def clean_text():
    for i in range(len(X)):
        str1 = " "
        print(str1.join(X['des'][i]))
m = clean_text
m()
And here is my output, but how can I turn it into a dataframe?
5 13
L32MM 4MM 2
724027 40
58 60MM 36MM 0 36 3
8.5 45MM
5.0MM 44MM
10
Note that iterating in pandas is an antipattern. Whenever possible, use DataFrame and Series methods to operate on entire columns at once.
Series.str.join (recommended)
X['joined'] = X['des'].str.join(' ')
Output:
des joined
0 [5, 13] 5 13
1 [L32MM, 4MM, 2] L32MM 4MM 2
2 [724027, 40] 724027 40
3 [58, 60MM, 36MM, 0, 36, 3] 58 60MM 36MM 0 36 3
4 [8.5, 45MM] 8.5 45MM
5 [5.0MM, 44MM] 5.0MM 44MM
6 [10] 10
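A small self-contained sketch of the same idea, assuming the lists hold strings as in the question. As far as I know, Series.str.join yields NaN for a list that contains non-string items, so the second part shows a fallback that converts each item with str first; the `mixed` series is invented for illustration.

```python
import pandas as pd

# Lists of strings: Series.str.join works directly.
X = pd.DataFrame({'des': [['5', '13'], ['L32MM', '4MM', '2'], ['10']]})
X['joined'] = X['des'].str.join(' ')

# Lists with non-string items (hypothetical data): convert each item first.
mixed = pd.Series([[5, 13], ['8.5', '45MM']])
mixed_joined = mixed.apply(lambda items: ' '.join(map(str, items)))

print(X)
print(mixed_joined)
```

The fallback uses apply, so it is slower than the vectorized string accessor and is best kept for columns that really do mix types.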
Loop (not recommended)
Iterate the numpy values and assign using DataFrame.loc:
for i, des in enumerate(X['des'].to_numpy()):
    X.loc[i, 'loop'] = ' '.join(des)
Or iterate via DataFrame.itertuples:
for row in X.itertuples():
    X.loc[row.Index, 'itertuples'] = ' '.join(row.des)
Or iterate via DataFrame.iterrows:
for i, row in X.iterrows():
    X.loc[i, 'iterrows'] = ' '.join(row.des)
Output:
des loop itertuples iterrows
0 [5, 13] 5 13 5 13 5 13
1 [L32MM, 4MM, 2] L32MM 4MM 2 L32MM 4MM 2 L32MM 4MM 2
2 [724027, 40] 724027 40 724027 40 724027 40
3 [58, 60MM, 36MM, 0, 36, 3] 58 60MM 36MM 0 36 3 58 60MM 36MM 0 36 3 58 60MM 36MM 0 36 3
4 [8.5, 45MM] 8.5 45MM 8.5 45MM 8.5 45MM
5 [5.0MM, 44MM] 5.0MM 44MM 5.0MM 44MM 5.0MM 44MM
6 [10] 10 10 10

Iterate over last axis of a numpy array

Let's say we have a (20, 5) array. We can iterate over each row very pythonically:
import numpy as np
xs = np.array(range(100)).reshape(20, 5)
for x in xs:
    print(x)
If we want to iterate over another axis (here in the example, iterate over columns, but I'm looking for a solution for each possible axis in a ndarray), it's less direct, we can use the method from Iterating over arbitrary dimension of numpy.array:
for i in range(xs.shape[-1]):
    x = xs[..., i]
    print(x)
Is there a more direct way to iterate over another axis, like (pseudo-code):
for x in xs.iterator(axis=-1):
    print(x)
?
I think that as_strided from the stride tricks module should do the work here.
It creates a view into the array and not a copy (as stated by the docs).
Here is a simple demonstration of as_strided capabilities:
from numpy.lib.stride_tricks import as_strided
import numpy as np
xs = np.array(range(3 *3 * 4)).reshape(3,3, 4)
for x in xs:
    print(x)
output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]
[[24 25 26 27]
[28 29 30 31]
[32 33 34 35]]
function to iterate over array specific axis:
def iterate_over_axis(arr, axis=0):
    strides = arr.strides
    strides_ = [strides[axis], *strides[0:axis], *strides[(axis+1):]]
    shape = arr.shape
    shape_ = [shape[axis], *shape[0:axis], *shape[(axis+1):]]
    return as_strided(arr, strides=strides_, shape=shape_)
for x in iterate_over_axis(xs, axis=1):
    print(x)
output:
[[ 0 1 2 3]
[12 13 14 15]
[24 25 26 27]]
[[ 4 5 6 7]
[16 17 18 19]
[28 29 30 31]]
[[ 8 9 10 11]
[20 21 22 23]
[32 33 34 35]]
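As an alternative to building the strides by hand, np.moveaxis can bring the chosen axis to the front so that a plain for loop walks over it. This is a sketch of a different technique from the as_strided answer above, and `iter_axis` is a made-up helper name.

```python
import numpy as np

xs = np.arange(100).reshape(20, 5)

def iter_axis(arr, axis=0):
    """Iterate over `arr` along `axis` (hypothetical helper).

    np.moveaxis returns a view with `axis` moved to position 0,
    so ordinary iteration then yields one sub-array per index.
    """
    return iter(np.moveaxis(arr, axis, 0))

# Iterating over the last axis of a (20, 5) array yields 5 columns.
columns = list(iter_axis(xs, axis=-1))
for x in columns:
    print(x)
```

Each yielded item corresponds to xs[..., i] from the question, and no data is copied because moveaxis only rearranges the view.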

Slicing array with numpy?

import numpy as np
r = np.arange(36)
r.resize((6, 6))
print(r)
# prints:
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]
# [24 25 26 27 28 29]
# [30 31 32 33 34 35]]
print(r[:,::7])
# prints:
# [[ 0]
# [ 6]
# [12]
# [18]
# [24]
# [30]]
print(r[:,0])
# prints:
# [ 0 6 12 18 24 30]
The r[:,::7] gives me a column, the r[:,0] gives me a row, they both have the same numbers. Would be glad if someone could explain to me why?
Because the step (7) is greater than the length of that axis (6), the slice selects only the first element along it, i.e. column 0. However, the two results are not identical (even though they contain the same numbers): the scalar index in [:, 0] drops the corresponding dimension, so you get a 1D array of shape (6,), whereas [:, ::7] keeps the number of dimensions and only changes the size of the step-sliced dimension, giving shape (6, 1).
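Printing the shapes makes the difference visible. The sketch below also includes r[:, 0:1], an ordinary slice that keeps the dimension in the same way; the variable names are only for illustration.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

stepped = r[:, ::7]   # slice with a step: the axis is kept, with length 1
indexed = r[:, 0]     # scalar index: the axis is dropped
kept = r[:, 0:1]      # plain slice of length 1: the axis is kept as well

print(stepped.shape, indexed.shape, kept.shape)
# prints:
# (6, 1) (6,) (6, 1)
```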

Shifting the location of tensor3 elements based on an offset vector

I have a Theano tensor3 (i.e., a 3-dimensional array) x:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
as well as a Theano vector (i.e., a 1-dimensional array) y, which we will refer to as an "offset" vector, since it specifies the desired offset:
[2, 1]
I want to shift the location of the elements of x based on vector y, so that the output is as follows (the shift is performed along the second dimension):
[[[ a b c d]
[ e f g h]
[ 0 1 2 3]]
[[ i j k l]
[12 13 14 15]
[16 17 18 19]]]
where the a, b, …, l could be any number.
For example, a valid output could be:
[[[ 0 0 0 0]
[ 0 0 0 0]
[ 0 1 2 3]]
[[ 0 0 0 0]
[12 13 14 15]
[16 17 18 19]]]
Another valid output could be:
[[[ 4 5 6 7]
[ 8 9 10 11]
[ 0 1 2 3]]
[[20 21 22 23]
[12 13 14 15]
[16 17 18 19]]]
I am aware of the function theano.tensor.roll(x, shift, axis=None), however the shift can only take a scalar as input, i.e. it shifts all elements with the same offset.
E.g., the code:
import theano.tensor
from theano import shared
import numpy as np
x = shared(np.arange(24).reshape((2,3,4)))
print('theano.tensor.roll(x, 2, axis=1).eval(): \n{0}'.
      format(theano.tensor.roll(x, 2, axis=1).eval()))
outputs:
theano.tensor.roll(x, 2, axis=1).eval():
[[[ 4 5 6 7]
[ 8 9 10 11]
[ 0 1 2 3]]
[[16 17 18 19]
[20 21 22 23]
[12 13 14 15]]]
which is not what I want.
How can I shift the location of tensor3 elements based on an offset vector? (note that in the code provided in this example, the tensor3 is a shared variable for convenience, but in my actual code it will be a symbolic variable)
I couldn't find any dedicated function for that purpose, so I simply ended up using theano.scan:
import theano
import theano.tensor
from theano import shared
import numpy as np
y = shared(np.array([2,1]))
x = shared(np.arange(24).reshape((2,3,4)))
print('x.eval():\n{0}\n'.format(x.eval()))
def shift_and_reverse_row(matrix, y):
    '''
    Shift and reverse the matrix in the direction of the first dimension (i.e., rows)
    matrix: matrix
    y: scalar
    '''
    new_matrix = theano.tensor.zeros_like(matrix)
    new_matrix = theano.tensor.set_subtensor(new_matrix[:y,:], matrix[y-1::-1,:])
    return new_matrix
new_x, updates = theano.scan(shift_and_reverse_row, outputs_info=None,
                             sequences=[x, y[::-1]] )
new_x = new_x[:, ::-1, :]
print('new_x.eval(): \n{0}'.format(new_x.eval()))
output:
x.eval():
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
new_x.eval():
[[[ 0 0 0 0]
[ 0 0 0 0]
[ 0 1 2 3]]
[[ 0 0 0 0]
[12 13 14 15]
[16 17 18 19]]]
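For comparison, the same per-slice shift can be sketched in plain NumPy with integer-array indexing instead of a scan. This is not Theano code, and whether it carries over to symbolic Theano variables is an assumption I have not checked; the names `src`, `rolled` and `zeroed` are illustrative.

```python
import numpy as np

x = np.arange(24).reshape((2, 3, 4))
y = np.array([2, 1])

batch, rows, _ = x.shape
row_ids = np.arange(rows)

# src[b, j] is the source row that lands at position j after shifting
# slice b down by y[b] rows (with wrap-around).
src = (row_ids[None, :] - y[:, None]) % rows
rolled = x[np.arange(batch)[:, None], src]        # shape (2, 3, 4)

# Optionally zero out the rows that wrapped around.
valid = row_ids[None, :] >= y[:, None]            # shape (2, 3)
zeroed = np.where(valid[:, :, None], rolled, 0)

print(rolled)
print(zeroed)
```

Here `rolled` reproduces the wrapped variant of the valid outputs listed in the question, and `zeroed` reproduces the zero-filled one.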

Iteration through all 1 dimensional subarrays of a multi-dimensional array

What is the fastest way to iterate through all one-dimensional sub-arrays of an n-dimensional array in Python?
For example consider the 3-D array:
import numpy as np
a = np.arange(24)
a = a.reshape(2,3,4)
The desired sequence of yields from the iterator is :
a[:,0,0]
a[:,0,1]
..
a[:,2,3]
a[0,:,0]
..
a[1,:,3]
a[0,0,:]
..
a[1,2,:]
Here is a compact implementation of such an iterator:
import itertools
import numpy
def iter1d(a):
    return itertools.chain.from_iterable(
        numpy.rollaxis(a, axis, a.ndim).reshape(-1, dim)
        for axis, dim in enumerate(a.shape))
This will yield the subarrays in the order you gave in your post:
for x in iter1d(a):
    print(x)
prints
[ 0 12]
[ 1 13]
[ 2 14]
[ 3 15]
[ 4 16]
[ 5 17]
[ 6 18]
[ 7 19]
[ 8 20]
[ 9 21]
[10 22]
[11 23]
[0 4 8]
[1 5 9]
[ 2 6 10]
[ 3 7 11]
[12 16 20]
[13 17 21]
[14 18 22]
[15 19 23]
[0 1 2 3]
[4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]
The trick here is to iterate over all axes, and for each axis reshape the array to a two-dimensional array the rows of which are the desired one-dimensional subarrays.
There may be a more efficient way, but this should work...
import itertools
import numpy as np
a = np.arange(24)
a = a.reshape(2,3,4)
colon = slice(None)
dimensions = [list(range(dim)) + [colon] for dim in a.shape]
for dim in itertools.product(*dimensions):
    if dim.count(colon) == 1:
        print(a[dim])
This yields (I'm leaving out a trivial bit of code to print the left hand side of this...):
a[0,0,:] --> [0 1 2 3]
a[0,1,:] --> [4 5 6 7]
a[0,2,:] --> [ 8 9 10 11]
a[0,:,0] --> [0 4 8]
a[0,:,1] --> [1 5 9]
a[0,:,2] --> [ 2 6 10]
a[0,:,3] --> [ 3 7 11]
a[1,0,:] --> [12 13 14 15]
a[1,1,:] --> [16 17 18 19]
a[1,2,:] --> [20 21 22 23]
a[1,:,0] --> [12 16 20]
a[1,:,1] --> [13 17 21]
a[1,:,2] --> [14 18 22]
a[1,:,3] --> [15 19 23]
a[:,0,0] --> [ 0 12]
a[:,0,1] --> [ 1 13]
a[:,0,2] --> [ 2 14]
a[:,0,3] --> [ 3 15]
a[:,1,0] --> [ 4 16]
a[:,1,1] --> [ 5 17]
a[:,1,2] --> [ 6 18]
a[:,1,3] --> [ 7 19]
a[:,2,0] --> [ 8 20]
a[:,2,1] --> [ 9 21]
a[:,2,2] --> [10 22]
a[:,2,3] --> [11 23]
The key here is that indexing a with (for example) a[0,0,:] is equivalent to indexing a with a[(0,0,slice(None))]. (This is just generic python slicing, nothing numpy-specific. To prove it to yourself, you can write a dummy class with just a __getitem__ and print what's passed in when you index an instance of your dummy class.).
So, what we want is every possible combination of 0 to nx, 0 to ny, 0 to nz, etc. and a slice(None) for each axis.
However, we want 1D arrays, so we need to filter out anything with more or less than one slice(None) (i.e. we don't want a[:,:,:], a[0,:,:], a[0,0,0] etc).
Hopefully that makes some sense, anyway...
Edit: I'm assuming that the exact order doesn't matter... If you need the exact ordering you list in your question, you'll need to modify this...
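The bit of code left out above, which prints the left-hand side, could look roughly like the following Python 3 sketch. It is a guess at what was omitted, not the answerer's actual code, and `labelled_1d_slices` is a made-up name.

```python
import itertools
import numpy as np

def labelled_1d_slices(a):
    """Yield (label, subarray) for every 1D sub-array of `a` (hypothetical helper)."""
    colon = slice(None)
    choices = [list(range(dim)) + [colon] for dim in a.shape]
    for index in itertools.product(*choices):
        # Keep only index tuples with exactly one full slice, i.e. 1D results.
        if sum(isinstance(i, slice) for i in index) == 1:
            label = ",".join(":" if isinstance(i, slice) else str(i) for i in index)
            yield "a[{}]".format(label), a[index]

a = np.arange(24).reshape(2, 3, 4)
results = list(labelled_1d_slices(a))
for label, sub in results:
    print(label, "-->", sub)
```

For the (2, 3, 4) example this produces 26 labelled sub-arrays, starting with a[0,0,:].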
Your friends are the slice() objects, numpy's ndarray.__getitem__() method, and possibly the itertools.chain.from_iterable.
