Finding indices of repeated values in numpy array - python

Here's my problem: I'm dealing with output from different receivers, and they are listed by number in column 0 of my array. I'm trying to find the indices that correspond to certain receiver values that show up. For my code below, I was trying to find all indices that had a value of 6.
My problem is that for the output (print) I'm only getting [ ], as if there are no indices that correspond to values for receiver 6. I've seen the data file and know this to be incorrect. The data text file is a 4x22000ish array. Any help would be greatly appreciated. thanks.
from numpy import *
data = loadtxt("/home/***")
s,t,q = data[:,0], data[:,2], data[:,3]
t,q = loadtxt("/home/***", usecols = (2,3), unpack=True)
indices = []
for index, value in enumerate(data[:,0]):
if value == '6':
indices.append(index)
print indices

numpy.nonzero(data[:,0]==6)[0]
data[:,0]==6 returns an array of booleans, 1 when the condition is true, 0 when it is false
numpy.nonzero returns the index of nonzero elements inside of a container
you may also be interested to know that you can do things like
data[data[:,0]==6,2]
to grab all the elements from the 2nd column when the first column is zero

Related

Sort np array based on summed selected values of each row

I have a 2D numpy array, filled with floats.
I want to take a selected chunk of each row (say item 2nd to 3rd), sum these values and sort all the rows based on that sum in a descending order.
For example:
array([[0.80372444, 0.35468653, 0.9081662 , 0.69995566],
[0.53712474, 0.90619077, 0.69068265, 0.73794143],
[0.14056974, 0.34685164, 0.87505744, 0.56927803]])
Here's what I tried:
a = np.array(sorted(a, key = sum))
But that just sums all values from each row, rather that, say, only 2nd to 6th element.
You can start by using take to get elements at indices [1,2] from each row (axis = 1). Then sum across those element for each row (again axis = 1), and use argsort to get the order of the sums. This gives a set of row indices, which you can use to slice the array in the desired order.
import numpy as np
a = np.array([[0.80372444, 0.35468653, 0.9081662 , 0.69995566],
[0.53712474, 0.90619077, 0.69068265, 0.73794143],
[0.14056974, 0.34685164, 0.87505744, 0.56927803]])
a[a.take([1, 2], axis=1).sum(axis=1).argsort()]
# returns:
array([[0.14056974, 0.34685164, 0.87505744, 0.56927803],
[0.80372444, 0.35468653, 0.9081662 , 0.69995566],
[0.53712474, 0.90619077, 0.69068265, 0.73794143]])
Replace key with the function you actually want:
a = np.array(sorted(d, key = lambda v : sum(v[1:3])))

NumPy - Finding and printing non-zero elements in each column of a n-d array

Suppose I have the following Numpy nd array:
array([[['a',0,0,0],
[0,'b','c',0],
['e','d',0,0]]])
Now I would like to define 'double connections' of elements as follows:
We consider each column in this array as a time instant, and all elements in this instant are considered to happen at the same time. 0 means nothing happens. For example, a and e happens at the first time instant, b and d happens at the second time instant, and c itself happens in the third time instant.
If two elements, I believe it has 'double connections', and I would like to print the connections like this(if there is no such pair in one column, just move on to the next column until the end):
('a','e')
('e','a')
('b','d')
('d','b')
I tried to come up with solutions on iterating all the columns but did not work.Can anyone share some tips on this?
You can recreate the original array by the following commands
array = np.array([['a',0,0,0],
[0,'b','c',0],
['e','d',0,0],dtype=object)
You could count how many non-zero elements you have for each column. You select the columns with two non-zero elements, repeat them and inverse every second column:
pairs = np.repeat(array[(array[:, (array != 0).sum(axis=0) == 2]).nonzero()].reshape((2, -1)).T, 2, axis=0)
pairs[1::2] = pairs[1::2, ::-1]
If you want to convert these to tuples like in your desired output you could just do a list comprehension:
output = [tuple(pair) for pair in pairs]

Need to extract and display row with a specific minimum value

I have the following output sample:
[[-5.53759409e-01 -2.68382610e-01 4.06747784e+00]
[-1.66055379e+00 -8.08889466e-01 7.06720368e+01]
[ 2.92172488e-01 8.17347290e-01 3.18001189e+00]
[ 1.89072607e+00 -6.68502526e-01 9.08233869e+01]
[-1.31451627e+00 1.61831269e+00 5.41709058e+00]
[ 1.15886824e+00 3.31177259e-01 5.14391851e+00]
[ 1.87270676e+00 1.24100260e+00 2.64360316e+01]
[ 1.93323801e+00 -5.64255644e-02 7.28368451e+01]
[ 1.33014215e+00 1.96282476e+00 2.96295301e-01]]
The minimum function value at generation 10 is [0.2962953]
I have concatenated two arrays - the coordinate array (elements 0 and 1) and the function values (element 2) to form the above array.
However, I would like to not only display the minimum function value e.g 0.2962953 but also the coordinates associated with it, hence the row of the above array.
Any ideas how I would approach this?
In this case, I would need the bottom row of the above array and a way to highlight the coordinates and function value.
Problem fixed! Just used: printValues = array[np.argmin(array[:, 2]), (0,1)]

Python - split matrix data into separate columns

I have read data from a file and stored into a matrix (frag_coords):
frag_coords =
[[ 916.0907976 -91.01391344 120.83596334]
[ 916.01117655 -88.73389753 146.912555 ]
[ 924.22832597 -90.51682575 120.81734705]
...
[ 972.55384732 708.71316138 52.24644577]
[ 972.49089559 710.51583744 72.86369124]]
type(frag_coords) =
class 'numpy.matrixlib.defmatrix.matrix'
I do not have any issues when reordering the matrix by a specified column. For example, the code below works just fine:
order = np.argsort(frag_coords[:,2], axis=0)
My issue is that:
len(frag_coords[0]) = 1
I need to access the individual numbers of the first row individually, I've tried splitting it, transforming it into a list and everything seems to return the 3 numbers not as columns but rather as a single element with len=1. I need help please!
Your problem is that you're using a matrix instead of an ndarray. Are you sure you want that?
For a matrix, indexing the first row alone leads to another matrix, a row matrix. Check frag_coords[0].shape: it will be (1,3). For an ndarray, it would be (3,).
If you only need to index the first row, use two indices:
frag_coords[0,j]
Or if you store the row temporarily, just index into it as a row matrix:
tmpvar = frag_coords[0] # shape (1,3)
print(tmpvar[0,2]) # for column 2 of row 0
If you don't need too many matrix operations, I'd advise that you use np.arrays instead. You can always read your data into an array directly, but at a given point you can just transform an existing matrix with np.array(frag_coords) too if you wish.

setting null values in a numpy array

how do I null certain values in numpy array based on a condition?
I don't understand why I end up with 0 instead of null or empty values where the condition is not met... b is a numpy array populated with 0 and 1 values, c is another fully populated numpy array. All arrays are 71x71x166
a = np.empty(((71,71,166)))
d = np.empty(((71,71,166)))
for indexes, value in np.ndenumerate(b):
i,j,k = indexes
a[i,j,k] = np.where(b[i,j,k] == 1, c[i,j,k], d[i,j,k])
I want to end up with an array which only has values where the condition is met and is empty everywhere else but with out changing its shape
FULL ISSUE FOR CLARIFICATION as asked for:
I start with a float populated array with shape (71,71,166)
I make an int array based on a cutoff applied to the float array basically creating a number of bins, roughly marking out 10 areas within the array with 0 values in between
What I want to end up with is an array with shape (71,71,166) which has the average values in a particular array direction (assuming vertical direction, if you think of a 3D array as a 3D cube) of a certain "bin"...
so I was trying to loop through the "bins" b == 1, b == 2 etc, sampling the float where that condition is met but being null elsewhere so I can take the average, and then recombine into one array at the end of the loop....
Not sure if I'm making myself understood. I'm using the np.where and using the indexing as I keep getting errors when I try and do it without although it feels very inefficient.
Consider this example:
import numpy as np
data = np.random.random((4,3))
mask = np.random.random_integers(0,1,(4,3))
data[mask==0] = np.NaN
The data will be set to nan wherever the mask is 0. You can use any kind of condition you want, of course, or do something different for different values in b.
To erase everything except a specific bin, try the following:
c[b!=1] = np.NaN
So, to make a copy of everything in a specific bin:
a = np.copy(c)
a[b!=1] == np.NaN
To get the average of everything in a bin:
np.mean(c[b==1])
So perhaps this might do what you want (where bins is a list of bin values):
a = np.empty(c.shape)
a[b==0] = np.NaN
for bin in bins:
a[b==bin] = np.mean(c[b==bin])
np.empty sometimes fills the array with 0's; it's undefined what the contents of an empty() array is, so 0 is perfectly valid. For example, try this instead:
d = np.nan * np.empty((71, 71, 166)).
But consider using numpy's strength, and don't iterate over the array:
a = np.where(b, c, d)
(since b is 0 or 1, I've excluded the explicit comparison b == 1.)
You may even want to consider using a masked array instead:
a = np.ma.masked_where(b, c)
which seems to make more sense with respect to your question: "how do I null certain values in a numpy array based on a condition" (replace null with mask and you're done).

Categories