Python finding min. value in every column in 2D array - python

I have a 2D array, and I would like to find the minimum value in every column and subtract that minimum from every element of its column.
For example,
array = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]]
The smallest values in columns are 1, 2, 4.
I would like the result to be
array = [
[0, 0, 0],
[1, 2, 2],
[4, 5, 5]]
How can I achieve this?

If you use a real numpy.array or pandas.DataFrame, then arr.min(axis=0) gives the minimum of every column and arr - arr.min(axis=0) subtracts those minimums from every row.
For numpy.array
import numpy as np
data = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]
]
arr = np.array(data)
print( arr.min(axis=0) )
print( arr - arr.min(axis=0) )
Result
[1 2 4]
[[0 0 0]
[1 2 2]
[4 5 5]]
Similar for pandas.DataFrame
import pandas as pd
data = [
[1, 2, 4],
[2, 4, 6],
[5, 7, 9]
]
df = pd.DataFrame(data)
print( df.min(axis=0) )
print( df - df.min(axis=0) )
Result
0    1
1    2
2    4
dtype: int64
   0  1  2
0  0  0  0
1  1  2  2
2  4  5  5
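If the data is a plain list of lists and neither library is available, the same column-wise subtraction can be sketched in pure Python. This is only an illustration, assuming every row has the same length; the names col_mins and result are made up for the example.

```python
# Pure-Python sketch: assumes a rectangular list of lists.
array = [
    [1, 2, 4],
    [2, 4, 6],
    [5, 7, 9],
]

# zip(*array) transposes rows into columns, so min() sees one column at a time.
col_mins = [min(col) for col in zip(*array)]

# Subtract each column's minimum from the matching entry of every row.
result = [[value - m for value, m in zip(row, col_mins)] for row in array]

print(col_mins)
print(result)
```

For large arrays the NumPy version above is likely to be faster, since the subtraction is broadcast instead of looped in Python.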

Related

How to modify every third element in matrix?

I had to make a matrix using numpy.array method. How can I now update every third element of my matrix? I have made a for loop for the problem but that is not the optimal solution. Is there a way to avoid loops? For example if I have this matrix:
matrix = np.array([[1,2,3,4],
[5,6,7,8],
[4,7,6,9]])
is there a way to add 1 to every third element and get this matrix:
[[2,2,3,5],[5,6,8,8],[4,8,6,9]]
Solution:
matrix = np.ascontiguousarray(matrix)
matrix.ravel()[::3] += 1
Why is ascontiguousarray needed? Because matrix may not be C-contiguous (for example, it may be in Fortran order, i.e. column-major). In that case ravel returns a copy instead of a view, so the in-place operation matrix.ravel()[::3] += 1 modifies the copy and leaves matrix unchanged.
Example 1
import numpy as np
arr = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[4, 7, 6, 9]])
arr.ravel()[::3] += 1
print(arr)
Works as expected:
[[2 2 3 5]
[5 6 8 8]
[4 8 6 9]]
Example 2
But with fortran-order
import numpy as np
arr = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[4, 7, 6, 9]])
arr = np.asfortranarray(arr)
arr.ravel()[::3] += 1
print(arr)
produces:
[[1 2 3 4]
[5 6 7 8]
[4 7 6 9]]
Example 3
Will work as expected in both cases
import numpy as np
arr = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[4, 7, 6, 9]])
# arr = np.asfortranarray(arr)
arr = np.ascontiguousarray(arr)
arr.ravel()[::3] += 1
print(arr)
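One way to check the view-versus-copy behaviour described above is np.shares_memory, and arr.flat is a possible alternative that does not depend on the memory layout. A minimal sketch, assuming the same 3x4 example matrix; the names c_arr and f_arr are only illustrative.

```python
import numpy as np

c_arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [4, 7, 6, 9]])
f_arr = np.asfortranarray(c_arr)  # same values, column-major memory layout

# ravel() (default order='C') returns a view only when no copy is required.
print(np.shares_memory(c_arr, c_arr.ravel()))  # view of the C-ordered array
print(np.shares_memory(f_arr, f_arr.ravel()))  # copy of the Fortran-ordered array

# arr.flat iterates in row-major order regardless of layout, and assigning
# through it writes back into the original array.
f_arr.flat[::3] += 1
print(f_arr)
```

Unlike np.ascontiguousarray, which may rebind the name to a new array, the flat approach modifies the existing array object in place.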

getting the index of the last non-empty value in pandas

Let's assume I have the following data frame:
x y
1 -1.808909 0.093380
2 1.733595 -0.380938
3 -1.385898 0.714071
And I want to insert a value in the column after "y".
However, it's possible that I might insert more than one value.
So, I need to check if the cell after "y" is empty or not to avoid overwriting the cell.
so, the expected output might be like
x y
1 -1.808909 0.093380 5
2 1.733595 -0.380938 6 7
3 -1.385898 0.714071 8
Compared to the input above, I need to check the cell first if it's empty or not.
I thought I might use: x = df.iloc[1,:].last_valid_index()
but that method returns "y" not the index of "y" which is 1.
Later I'll use that index to insert "5":
x +=1
df.iloc[1,x] = 5
I want to use that approach of finding the last non-empty cell because of the 2nd row in the output.
You see that I need to insert "6" then "7"
If I ended up using always the same method like this one:
df.iloc[1,2] = 6
df.iloc[1,2] = 7
It'll overwrite the "6" when inserting "7"
One more thing: I can't look up the value with something like (df['y'].iloc[2]).index, because later I'll have two "y" columns, so that might return an index smaller than the required one.
It is easy to identify the position of the first zero in each row in a numpy array or a dataframe. Let's create a dataframe with zeros after a certain position:
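import numpy as np
import pandas as pd
df = pd.DataFrame([
[4, 1, 4, 2, 6, 0, 0, 0, 0, 0],
[5, 4, 9, 5, 5, 4, 0, 0, 0, 0],
[6, 6, 6, 5, 4, 8, 6, 0, 0, 0],
[5, 3, 9, 5, 3, 9, 6, 3, 0, 0],
[3, 2, 7, 9, 7, 6, 6, 7, 5, 0]])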
df
0 1 2 3 4 5 6 7 8 9
0 4 1 4 2 6 0 0 0 0 0
1 5 4 9 5 5 4 0 0 0 0
2 6 6 6 5 4 8 6 0 0 0
3 5 3 9 5 3 9 6 3 0 0
4 3 2 7 9 7 6 6 7 5 0
For instance, the code below will give you all positions in the dataframe where the value is 0
np.argwhere(df.values == 0)
array([[0, 5],
[0, 6],
[0, 7],
[0, 8],
[0, 9],
[1, 6],
[1, 7],
[1, 8],
[1, 9],
[2, 7],
[2, 8],
[2, 9],
[3, 8],
[3, 9],
[4, 9]], dtype=int64)
Or you can get the positions where the values are not zero:
np.argwhere(df.values != 0)
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[0, 4],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[1, 4],
[1, 5],
[2, 0],
[2, 1],
[2, 2],
[2, 3],
[2, 4],
[2, 5],
[2, 6],
[3, 0],
[3, 1],
[3, 2],
[3, 3],
[3, 4],
[3, 5],
[3, 6],
[3, 7],
[4, 0],
[4, 1],
[4, 2],
[4, 3],
[4, 4],
[4, 5],
[4, 6],
[4, 7],
[4, 8]], dtype=int64)
I hope it helps.
I suggest this less complicated solution:
import random
nums = [0, 7, 78, 843, 34893, 0 , 2, 23, 4, 0]
random.shuffle(nums)
thg = [x for x in nums if x != 0]
print(thg[0])
This shuffles the nums list and filters out all the zeros, then prints the first non-zero value of the shuffled list.
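For the original question (getting the position, rather than the label, of the last non-empty cell in a row), one possible approach is to take the positions of the non-NaN cells and keep the last one. This is a rough sketch under assumptions that are not in the question: empty cells are NaN, and the frame already has spare columns to write into (the names "extra1" and "extra2" and the helper last_valid_position are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "extra1"/"extra2" are assumed pre-allocated empty columns.
df = pd.DataFrame({
    "x": [-1.808909, 1.733595, -1.385898],
    "y": [0.093380, -0.380938, 0.714071],
    "extra1": [np.nan] * 3,
    "extra2": [np.nan] * 3,
}, index=[1, 2, 3])


def last_valid_position(row):
    """Return the integer position of the last non-NaN cell in a row, or -1."""
    positions = np.flatnonzero(row.notna().to_numpy())
    return int(positions[-1]) if positions.size else -1


# Insert 6 and then 7 into the second row without overwriting anything.
for value in (6, 7):
    pos = last_valid_position(df.iloc[1]) + 1
    df.iloc[1, pos] = value

print(df)
```

Because the lookup is purely positional, it should also behave with duplicate column labels such as two "y" columns. If a row has no empty column left, df.iloc raises an IndexError, so a new column would have to be added first.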

Python Dataframe subtract a value from each list of a row

I have a data frame consisting of lists as elements. I want to subtract a value from each list and create a new column.
My code:
df = pd.DataFrame({'A':[[1,2],[4,5,6]]})
df
A
0 [1, 2]
1 [4, 5, 6]
# let's subtract 1 from each list
val = 1
df['A_new'] = df['A'].apply(lambda x:[a-b for a,b in zip(x[0],[val]*len(x[0]))],axis=1)
Present output:
IndexError: index 3 is out of bounds for axis 0 with size 2
Expected output:
df
A A_new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]
Convert to numpy array
df['A_new'] = df.A.map(np.array)-1
Out[455]:
0 [0, 1]
1 [3, 4, 5]
Name: A, dtype: object
df['A_new'] = df['A'].apply(lambda x:[a-b for a,b in zip(x,[val]*len(x))])
You have to pass the list itself to the len function. Here x is already the list, so indexing it with x[0] just returns a single number, which is wrong in this context. With x used directly, this gives the output:
A A_new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]
How about a simple list comprehension:
df['new'] = [[i - 1 for i in l] for l in df['A']]
A new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]
You can convert the list to np.array and then subtract the val:
import numpy as np
df['A_new'] = df['A'].apply(lambda x: np.array(x) - val)
Output:
A A_new
0 [1, 2] [0, 1]
1 [4, 5, 6] [3, 4, 5]

Any function in numpy/pandas/python to search and replace

I have a 4x4 matrix like this
ds1=
4 13 6 9
7 12 5 7
7 0 4 22
9 8 12 0
and another file with two columns:
ds2 =
4 1
5 3
6 1
7 2
8 2
9 3
12 1
13 2
22 3
ds1 = ds1.apply(lambda x: ds2_mean[1] if [condition])
What condition should be added to check that elements from ds1 and ds2 are equal?
Wherever a value in ds1 matches a value in column 1 of ds2, I want it replaced by the corresponding column 2 value, so the resulting matrix should look like
1 2 1 3
2 1 3 2
2 0 1 3
3 2 1 0
Please see "Replacing mean value from one dataset to another"; it does not answer my question.
If you are working with numpy arrays, you could do this -
# Make a copy of ds1 to initialize output array
out = ds1.copy()
# Find out the row indices in ds2 that have intersecting elements between
# its first column and ds1
_,C = np.where(ds1.ravel()[:,None] == ds2[:,0])
# New values taken from the second column of ds2 to be put in output
newvals = ds2[C,1]
# Valid positions in output array to be changed
valid = np.in1d(ds1.ravel(),ds2[:,0])
# Finally make the changes to get desired output
out.ravel()[valid] = newvals
Sample input, output -
In [79]: ds1
Out[79]:
array([[ 4, 13, 6, 9],
[ 7, 12, 5, 7],
[ 7, 0, 4, 22],
[ 9, 8, 12, 0]])
In [80]: ds2
Out[80]:
array([[ 4, 1],
[ 5, 3],
[ 6, 1],
[ 7, 2],
[ 8, 2],
[ 9, 3],
[12, 1],
[13, 2],
[22, 3]])
In [81]: out
Out[81]:
array([[1, 2, 1, 3],
[2, 1, 3, 2],
[2, 0, 1, 3],
[3, 2, 1, 0]])
Here is another solution, using the DataFrame.replace() function:
df1.replace(to_replace=df2[0].tolist(), value=df2[1].tolist(), inplace=True)
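The search-and-replace can also be sketched with a plain dictionary lookup, which may be easier to read than the index arithmetic. This is only an illustration, assuming ds1 and ds2 are the NumPy arrays from the sample input; the name lookup is made up for the example.

```python
import numpy as np

ds1 = np.array([[4, 13, 6, 9],
                [7, 12, 5, 7],
                [7, 0, 4, 22],
                [9, 8, 12, 0]])
ds2 = np.array([[4, 1], [5, 3], [6, 1], [7, 2], [8, 2],
                [9, 3], [12, 1], [13, 2], [22, 3]])

# Build a lookup table: first column of ds2 -> second column of ds2.
lookup = dict(zip(ds2[:, 0].tolist(), ds2[:, 1].tolist()))

# Replace each element that has an entry in the table; keep the rest (e.g. 0).
out = np.array([[lookup.get(v, v) for v in row] for row in ds1.tolist()])

print(out)
```

This loops in Python, so for very large arrays the vectorized answer above is probably the better choice.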

Numpy: calculate edges of a matrix

I have the following code to calculate the first differences of a matrix, i.e. the i-th element minus the (i-1)-th element.
How can I (easily) calculate the difference for each element horizontally and vertically? With a transpose?
import numpy as np
inputarr = np.arange(12)
inputarr.shape = (3,4)
inputarr+=1
# shift each row one position to the right, padding with 0
newarr = list()
for x in inputarr:
    newarr.append(np.hstack((np.array([0]),x[:-1])))
z = np.array(newarr)
print(inputarr)
print('first differences')
print(inputarr-z)
Output
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
first differences
[[1 1 1 1]
[5 1 1 1]
[9 1 1 1]]
Check out numpy.diff.
From the documentation:
Calculate the n-th order discrete difference along given axis.
The first order difference is given by out[n] = a[n+1] - a[n] along
the given axis, higher order differences are calculated by using diff
recursively.
An example:
>>> import numpy as np
>>> a = np.arange(12).reshape((3,4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.diff(a,axis = 1) # row-wise
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
>>> np.diff(a, axis = 0) # column-wise
array([[4, 4, 4, 4],
[4, 4, 4, 4]])
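To reproduce the output in the question, where the first element of each row is kept and the shape is unchanged, np.diff can pad the input before differencing. A minimal sketch, assuming a NumPy version that supports the prepend argument of np.diff; the names horizontal and vertical are only illustrative.

```python
import numpy as np

inputarr = np.arange(12).reshape(3, 4) + 1

# prepend=0 pads a zero before the first element along the chosen axis,
# so the result keeps the input's shape.
horizontal = np.diff(inputarr, axis=1, prepend=0)
vertical = np.diff(inputarr, axis=0, prepend=0)

print(horizontal)
print(vertical)
```

Here axis=1 differences neighbouring columns within each row and axis=0 differences neighbouring rows within each column, so no transpose is needed.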
