Set a column in numpy array to zero - python

I want to set a column of a numpy array to zero at different times. I have a numpy array M of size 5000x500. When I run the shape command the result is (5000, 500), so I believe 5000 is the number of rows and 500 the number of columns:
shape(M)
(5000,500)
But the problem is when I want to access one column, like the first column:
Mcol=M[:][0]
Then I check the shape of the new array Mcol:
shape(Mcol)
(500,)
I expected the result to be (5000,), since the first column has 5000 rows. Even when I changed the indexing, the result was the same:
shape(M)
(5000,500)
Mcol=M[0][:]
shape(Mcol)
(500,)
Can anyone explain what happens in my code, and whether the following operation is the right way to set one column to zero?
M[:][0]=0

You're doing this:
M[:][0] = 0
But you should be doing this:
M[:,0] = 0
The first one is wrong because M[:] just gives you the entire array, the same as M, so the [0] that follows selects the first row, not the first column.
Similarly, M[0][:] gives you the first row as well, because again the [:] has no effect.
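To see the difference on something smaller, here is a toy 3x4 stand-in for M (the shapes transfer directly to 5000x500):

```python
import numpy as np

# Small stand-in for the 5000x500 array
M = np.arange(12).reshape(3, 4)

print(M[:][0].shape)   # (4,)  -- M[:] is just M, so [0] picks the first ROW
print(M[:, 0].shape)   # (3,)  -- the comma syntax picks the first COLUMN

M[:, 0] = 0            # zeros out the first column in place
print(M[:, 0])         # [0 0 0]
```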

Related

NumPy - Finding and printing non-zero elements in each column of an n-d array

Suppose I have the following Numpy nd array:
array([[['a', 0, 0, 0],
        [0, 'b', 'c', 0],
        ['e', 'd', 0, 0]]])
Now I would like to define 'double connections' of elements as follows:
We consider each column in this array as a time instant, and all elements in that instant are considered to happen at the same time; 0 means nothing happens. For example, a and e happen at the first time instant, b and d happen at the second, and c happens by itself at the third.
If two elements happen at the same time instant, I consider them a 'double connection', and I would like to print the connections like this (if there is no such pair in a column, just move on to the next column until the end):
('a','e')
('e','a')
('b','d')
('d','b')
I tried to come up with solutions by iterating over all the columns, but they did not work. Can anyone share some tips on this?
You can recreate the original array (dropping the redundant outer dimension and using an object dtype so strings and zeros can mix) with the following command:
array = np.array([['a', 0, 0, 0],
                  [0, 'b', 'c', 0],
                  ['e', 'd', 0, 0]], dtype=object)
You could count how many non-zero elements each column has, select the columns with exactly two, stack each column's two members into pairs, repeat every pair, and reverse every second one:
cols = array[:, (array != 0).sum(axis=0) == 2]
pairs = np.repeat(cols[cols.nonzero()].reshape((2, -1)).T, 2, axis=0)
pairs[1::2] = pairs[1::2, ::-1]
If you want to convert these to tuples like in your desired output you could just do a list comprehension:
output = [tuple(pair) for pair in pairs]
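Put together with the sample array from the question, this gives the desired output (a self-contained sketch; it assumes every selected column has exactly two non-zero entries, as in the example):

```python
import numpy as np

array = np.array([['a', 0, 0, 0],
                  [0, 'b', 'c', 0],
                  ['e', 'd', 0, 0]], dtype=object)

# Keep only the columns that have exactly two non-zero entries
cols = array[:, (array != 0).sum(axis=0) == 2]
# Pair up each column's two members, then duplicate each pair
pairs = np.repeat(cols[cols.nonzero()].reshape((2, -1)).T, 2, axis=0)
# Reverse every second pair to also get the ('e', 'a') orderings
pairs[1::2] = pairs[1::2, ::-1]

output = [tuple(pair) for pair in pairs]
print(output)  # [('a', 'e'), ('e', 'a'), ('b', 'd'), ('d', 'b')]
```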

df.where based on the difference of two other DataFrames

I have three DataFrames that are all the same shape ~(1,000, 10,000).
original has ~20-100 non-zero values per row - very sparse
input is a copy of original, with 10 random non-zero values per row changed to zero
output is populated completely with non-zero values
I am now attempting to compare original and output only in the positions where input and original are different (i.e. just in the 10 randomly chosen positions)
Firstly, I create a df of only these elements of original with everything else set to zero:
maskedOriginal = original.where(original != input, other=0)
This is created in seconds. I then attempt to do the same for output:
maskedOutput = output.where(original != input, other=0)
However, since this is now working with 3 DataFrames, it is far too slow - I haven't even got a result after a couple of minutes. Is there any more suitable way to do this?
Use numpy.where with the DataFrame constructor:
arr = original.values
maskedOriginal = pd.DataFrame(np.where(arr != input.values, arr, 0),
                              index=original.index,
                              columns=original.columns)
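The same trick speeds up the second mask as well. A minimal sketch with toy 2x3 frames standing in for the real ~(1,000, 10,000) ones (renaming input to inp, since input shadows a Python built-in):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real DataFrames
original = pd.DataFrame([[1, 0, 3], [0, 5, 0]])
inp = pd.DataFrame([[1, 0, 0], [0, 5, 0]])      # original with one value zeroed
output = pd.DataFrame([[9, 8, 7], [6, 5, 4]])   # dense, all non-zero

# Boolean mask of the randomly changed positions, computed once on raw arrays
diff = original.values != inp.values

maskedOriginal = pd.DataFrame(np.where(diff, original.values, 0),
                              index=original.index, columns=original.columns)
maskedOutput = pd.DataFrame(np.where(diff, output.values, 0),
                            index=output.index, columns=output.columns)
print(maskedOutput)
```

Doing the comparison once on the underlying ndarrays avoids pandas' alignment machinery, which is where the minutes were going.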

Python - split matrix data into separate columns

I have read data from a file and stored into a matrix (frag_coords):
frag_coords =
[[ 916.0907976 -91.01391344 120.83596334]
[ 916.01117655 -88.73389753 146.912555 ]
[ 924.22832597 -90.51682575 120.81734705]
...
[ 972.55384732 708.71316138 52.24644577]
[ 972.49089559 710.51583744 72.86369124]]
type(frag_coords) =
class 'numpy.matrixlib.defmatrix.matrix'
I do not have any issues when reordering the matrix by a specified column. For example, the code below works just fine:
order = np.argsort(frag_coords[:,2], axis=0)
My issue is that:
len(frag_coords[0]) = 1
I need to access the individual numbers of the first row. I've tried splitting it and transforming it into a list, but everything seems to return the 3 numbers not as columns but rather as a single element with len == 1. I need help, please!
Your problem is that you're using a matrix instead of an ndarray. Are you sure you want that?
For a matrix, indexing the first row alone leads to another matrix, a row matrix. Check frag_coords[0].shape: it will be (1,3). For an ndarray, it would be (3,).
If you only need to index the first row, use two indices:
frag_coords[0,j]
Or if you store the row temporarily, just index into it as a row matrix:
tmpvar = frag_coords[0] # shape (1,3)
print(tmpvar[0,2]) # for column 2 of row 0
If you don't need too many matrix operations, I'd advise using np.arrays instead. You can read your data into an array directly, or transform an existing matrix at any point with np.array(frag_coords) if you wish.
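A short sketch of the difference, using a two-row stand-in for frag_coords:

```python
import numpy as np

# Small stand-in for frag_coords, built as a matrix like in the question
frag_coords = np.matrix([[916.09, -91.01, 120.83],
                         [916.01, -88.73, 146.91]])

print(frag_coords[0].shape)   # (1, 3) -- indexing a matrix row yields a row matrix

arr = np.array(frag_coords)   # convert to a plain ndarray
print(arr[0].shape)           # (3,)   -- now a 1-D row
print(arr[0][1])              # -91.01 -- individual numbers are reachable
```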

Python Numpy: Coalesce and return first nonzero observation

I am currently new to NumPy, but very proficient with SQL.
I used a function called coalesce in SQL, which I was disappointed not to find in NumPy. I need it to create a third array want from two arrays, array1 and array2, where zero/missing observations in array1 are replaced by the observations in array2 at the same positions. I can't figure out how to do this with np.where.
Once this task is accomplished, I would like to take the lower triangle of want and then populate a final array want2 with the first non-zero observation of each column. If coalesce(array1, array2) returns missing or 0 for every observation in a column, then assign zero by default.
I have written an example demonstrating the desired behavior.
import numpy as np
array1= np.array(([-10,0,20],[-1,0,0],[0,34,-50]))
array2= np.array(([10,10,50],[10,0,25],[50,45,0]))
# Coalesce array1,array2 i.e. get the first non-zero value from array1, then from array2.
# if array1 is empty or zero, then populate table want with values from array2 under same address
want=np.tril(np.array(([-10,10,20],[-1,0,25],[50,34,-50])))
print(array1)
print(array2)
print(want)
# print first instance of nonzero observation from each column of table want
want2=np.array([-10,34,-50])
print(want2)
"Coalesce": use putmask to replace values equal to zero with values from array2:
want = array1.copy()
np.putmask(want, array1 == 0, array2)
First nonzero element of each column of np.tril(want):
where_nonzero = np.where(np.tril(want) != 0)
"""For the where array, get the indices of only
the first index for each column"""
first_indices = np.unique(where_nonzero[1], return_index=True)[1]
# Get the values from want for those indices
want2 = want[(where_nonzero[0][first_indices], where_nonzero[1][first_indices])]
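Run against the arrays from the question, the two steps above reproduce want2:

```python
import numpy as np

array1 = np.array([[-10, 0, 20], [-1, 0, 0], [0, 34, -50]])
array2 = np.array([[10, 10, 50], [10, 0, 25], [50, 45, 0]])

# Coalesce: keep array1's values, fill its zeros from array2
want = array1.copy()
np.putmask(want, array1 == 0, array2)

# First non-zero element of each column of the lower triangle
tril = np.tril(want)
where_nonzero = np.where(tril != 0)
# Keep only the first occurrence of each column among the non-zero positions
first_indices = np.unique(where_nonzero[1], return_index=True)[1]
want2 = tril[where_nonzero[0][first_indices], where_nonzero[1][first_indices]]
print(want2)  # [-10  34 -50]
```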

Replace row with another row in 3D numpy array

I am trying to replace a specific row of NaN's in a 3-D array (filled with NaN's) with rows of known integer values from a specific column in a text file (ex: 24 rows of column 8). Is there a method to perform this replacement that I have missed in my search for help?
My most recent trial code (of many) is as follows:
import numpy as np
tfile = "C:\...\Lee_Gilmer_MEM_GA_01_02_2015.txt"
data = np.genfromtxt(tfile, dtype=None)
#creation of empty 24 hour global matrix
s_array = np.empty((24,361,720))
s_array[:] = np.NAN
#Get values from column 8
c_data = data[:,7]
#Replace all 24 NaN's slices of row 1 column 1 with corresponding 24 row values from column 8
s_array[:,0:1,0:1] = c_data
print s_array
This produces a result of:
ValueError: could not broadcast input array from shape (24) into shape (24,1,1)
When I print out the shape of c_data, I get:
(24L,)
Is this at all possible to do without having to use a loop and replacing each one individually?
The error message tells you pretty much everything you need to know: the array slice on the left-hand side of the assignment has a shape of (24,1,1), whereas the right-hand side has shape (24,). Since these shapes don't match, numpy raises a ValueError.
There are two ways to solve this:
Make the shape of the LHS (24,) rather than (24, 1, 1). A nice way to do this would be to index with an integer rather than a slice for the last two dimensions:
s_array[:, 0, 0] = c_data
Reshape c_data to match the shape of the LHS:
s_array[:, 0:1, 0:1] = c_data.reshape(24, 1, 1)
I think option 1 is a lot more readable.
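Both options as a runnable sketch, with a stand-in for the 24 values of column 8:

```python
import numpy as np

s_array = np.full((24, 361, 720), np.nan)
c_data = np.arange(24.0)   # stand-in for data[:, 7]

# Option 1: integer indices give the LHS shape (24,), matching c_data
s_array[:, 0, 0] = c_data

# Option 2: reshape the RHS to (24, 1, 1) to match the sliced LHS
s_array[:, 0:1, 0:1] = c_data.reshape(24, 1, 1)

print(s_array[:, 0, 0])    # the 24 inserted values
```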
