NumPy: filter rows by np.array


I'd like to filter a NumPy 2-d array by checking whether another array contains a column value. How can I do that?
import numpy as np
ar = np.array([[1,2],[3,-5],[6,-15],[10,7]])
another_ar = np.array([1,6])
new_ar = ar[ar[:,0] in another_ar]
print new_ar
I hope to get [[1,2],[6,-15]], but the code above prints just [1,2].

You can use np.where, but note that since ar[:,0] holds the first element of each row of ar, you need to loop over it and check each value for membership:
>>> ar[np.where([i in another_ar for i in ar[:,0]])]
array([[ 1, 2],
[ 6, -15]])

Instead of using in, you can use np.in1d to check which values in the first column of ar are also in another_ar and then use the boolean index returned to fetch the rows of ar:
>>> ar[np.in1d(ar[:,0], another_ar)]
array([[ 1, 2],
[ 6, -15]])
This is likely to be much faster than using any kind of for loop and testing membership with in.
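Newer NumPy releases deprecate np.in1d in favor of np.isin, which performs the same element-wise membership test. A minimal sketch, reusing the arrays from the question (the names mask and new_ar are just illustrative):

```python
import numpy as np

ar = np.array([[1, 2], [3, -5], [6, -15], [10, 7]])
another_ar = np.array([1, 6])

# Boolean mask: True where the first-column value occurs in another_ar.
mask = np.isin(ar[:, 0], another_ar)

# Boolean indexing keeps only the rows where the mask is True.
new_ar = ar[mask]
print(new_ar)
```

The mask has one entry per row of ar, so it can also be combined with other conditions using & and | before indexing.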

Related

Python loop through text and set numpy array index

Given a block of text with matrix rows and columns separated by commas and semicolons, I want to parse the text and set the indices of numpy arrays. Here is the code with the variable 'matrixText' representing the base text.
I first create the matrices and then split the text by semicolons and then by commas. I loop through the split text and set each index. However with the text ...
1,2,3;4,5,6;7,8,9
I get the result
7,7,7;8,8,8;9,9,9
temp1=matrixText.split(';')
temp2=temp1[0].split(',')
rows=len(temp1)
columns=len(temp2)
rA=np.zeros((rows, columns))
arrayText=matrixText.split(';')
rowText=range(len(arrayText))
for rowIndex, rowItem in enumerate(arrayText):
    rowText[rowIndex]=arrayText[rowIndex].split(',')
    for colIndex, colItem in enumerate(rowText[rowIndex]):
        rA[[rowIndex, colIndex]]=rowText[rowIndex][colIndex]
I thought that by setting each index, I would avoid any copy by reference issues.
To provide more info, in the first iteration, the 0,0 index is set to 1 and the output of that is then 1,1,1;0,0,0;0,0,0 which I can't figure out since setting one index in the numpy array sets three.
In the second iteration, the index 0-1 is set to 2 and the result is then 2,2,2;2,2,2;0,0,0
The third iteration sets 0-2 to 3 but the result is 3,3,3;2,2,2;3,3,3
Any suggestions?

You can (ab)use the np.matrix constructor plus its .A attribute:
np.matrix('1,2,3;4,5,6;7,8,9').A
Output:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

matrixText = '1,2,3;4,5,6;7,8,9'
temp1=matrixText.split(';')
temp2=temp1[0].split(',')
rows=len(temp1)
columns=len(temp2)
rA=np.empty((rows, columns),dtype=int)  # np.int was removed in NumPy 1.24; the builtin int works
for n, line in enumerate(temp1):
    rA[n,:]=line.split(',')

Using a nested list-comprehension:
Having defined:
s = "1,2,3;4,5,6;7,8,9"
we can use a nice one-liner:
np.array([[int(c) for c in r.split(",")] for r in s.split(";")])
which would give the following array:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

Another one-liner (+1 import):
from io import StringIO
rA = np.loadtxt(StringIO(matrixText.replace(';','\n')), delimiter=',')

So, the problem here was that the double brackets in rA[[rowIndex, colIndex]] index with a list, which selects the entire rows rowIndex and colIndex and sets every cell in them. To set a single cell, this should be rA[rowIndex, colIndex].
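To see the difference between the two indexing forms, here is a small sketch on a 3x3 array of zeros (the names rows_set and rB are only for illustration). It reproduces the 2,2,2;2,2,2;0,0,0 result reported in the question for index 0-1:

```python
import numpy as np

rA = np.zeros((3, 3))

# A list inside the brackets is fancy indexing along the first axis:
# this selects rows 0 and 1 in full and sets every cell in both of them.
rA[[0, 1]] = 2
rows_set = rA.copy()
print(rows_set)

rB = np.zeros((3, 3))

# A plain "row, column" index addresses the single cell at row 0, column 1.
rB[0, 1] = 2
print(rB)
```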

Numpy: proper way to arrange input and output vectors

I am trying to create a numpy array with 2 columns and multiple rows. The first column is meant to represent input vector of size 3. The 2nd column is meant to represent output vector of size 2.
arr = np.array([
[np.array([1,2,3]), np.array([1,0])],
[np.array([4,5,6]), np.array([0,1])]
])
I was expecting: arr[:, 0].shape
to return (2, 3), but it returns (2, )
What is the proper way to arrange input and output vectors into a matrix using numpy?

If you are sure the elements in each column have the same size/length, you can select and then stack the result using numpy.row_stack (an alias of numpy.vstack, which newer NumPy releases deprecate in favor of numpy.vstack):
np.row_stack(arr[:,0]).shape
# (2, 3)
np.row_stack(arr[:,1]).shape
# (2, 2)
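As far as I know, recent NumPy releases refuse to build an array from rows of unequal-length vectors unless an object dtype is requested, so the sketch below fills an object array cell by cell and then stacks each column with np.vstack. The names inputs and outputs are illustrative only:

```python
import numpy as np

# Build a 2x2 object array explicitly; each cell holds its own 1-D array.
arr = np.empty((2, 2), dtype=object)
arr[0, 0], arr[0, 1] = np.array([1, 2, 3]), np.array([1, 0])
arr[1, 0], arr[1, 1] = np.array([4, 5, 6]), np.array([0, 1])

# A column of the object array is just two opaque objects.
print(arr[:, 0].shape)

# Stacking the objects in a column yields a regular 2-D numeric array.
inputs = np.vstack(arr[:, 0])
outputs = np.vstack(arr[:, 1])
print(inputs.shape, outputs.shape)
```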

So, the code
arr = np.array([
[np.array([1,2,3]), np.array([1,0])],
[np.array([4,5,6]), np.array([0,1])]
])
creates an object array. Indexing the first column gives you back two rows with one object in each, which accounts for the shape (2,). To get what you want, you'd need to wrap it in something like
np.vstack(arr[:, 0])
which creates an array out of the objects in the first column. This isn't very convenient; it would make more sense to me to store these in a dictionary, something like
io = {'in': np.array([[1,2,3],[4,5,6]]),
'out':np.array([[1,0], [0,1]])
}
A structured array gives you a bit of both. Creation is a bit tricky; for the example given,
arr = np.array([
((1,2,3), (1,0)),
((4,5,6), (0,1)) ],
dtype=[('in', '3int64'), ('out', '2float64')])
creates a structured array with fields in and out, consisting of 3 integers and 2 floats respectively. Rows can be accessed as usual,
In [72]: arr[0]
Out[72]: ([1, 2, 3], [ 1., 0.])
Or by the field name
In [73]: arr['in']
Out[73]:
array([[1, 2, 3],
[4, 5, 6]])
The NumPy manual has many more details (https://docs.scipy.org/doc/numpy-1.13.0/user/basics.rec.html). I can't add more, since I've been intending to use structured arrays in a project for some time but haven't yet.
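For completeness, here is a self-contained sketch of the structured-array approach. It spells the field shapes out as (name, type, shape) tuples, which is an alternative notation I chose for readability, and the name io_dtype is hypothetical:

```python
import numpy as np

# Each record has an 'in' field of 3 integers and an 'out' field of 2 floats.
io_dtype = [('in', np.int64, (3,)), ('out', np.float64, (2,))]

arr = np.array([((1, 2, 3), (1, 0)),
                ((4, 5, 6), (0, 1))], dtype=io_dtype)

# Field access returns a regular 2-D array covering all records.
print(arr['in'].shape)
print(arr['out'].shape)

# Record access followed by field access returns one vector.
print(arr[0]['in'])
```

Field access gives the (2, 3) and (2, 2) shapes the question was expecting from column indexing.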

finding the max of a column in an array

def maxvalues():
    for n in range(1,15):
        dummy=[]
        for k in range(len(MotionsAndMoorings)):
            dummy.append(MotionsAndMoorings[k][n])
        max(dummy)
        L = [x + [max(dummy)]] ## to be corrected (adding columns with value max(dummy))
    ## suggest code to add new row to L and for next function call, it should save values here.
I have an array of size (k x n) and I need to pick the max values of the first column in that array. Is there a simpler way than what I tried? My main aim is to append the result to L as columns rather than rows. If I just append, it adds values at the end. I would like this to be done in columns for row 0 in L, because I'll call this function again, add a new row to L, and do the same. Please suggest.

General suggestions for your code
First of all it's not very handy to access globals in a function. It works but it's not considered good style. So instead of using:
def maxvalues():
    do_something_with(MotionsAndMoorings)
you should do it with an argument:
def maxvalues(array):
    do_something_with(array)
MotionsAndMoorings = something
maxvalues(MotionsAndMoorings) # pass it to the function.
The next strange thing is that you seem to exclude the first column of your array:
for n in range(1,15):
I think that's unintended. The first element of a list has the index 0 and not 1. So I guess you wanted to write:
for n in range(0,15):
or even better for arbitrary lengths:
for n in range(len(array[0])): # the length of the first row is the number of columns
Alternatives to your iterations
But iterating by hand isn't necessary, because the max function already accepts a very handy keyword argument (key), so you don't need to write the loops over the array yourself:
import operator
column = 2
max(array, key=operator.itemgetter(column))[column]
Here max returns the row whose element at index column is maximal (you just choose the column you want). Because max returns the whole row, you then need to extract just that element with [column].
So to get a list of all your maximums for each column you could do:
[max(array, key=operator.itemgetter(column))[column] for column in range(len(array[0]))]
I'm not sure what your L is, but you should probably also pass it as an argument to the function:
def maxvalues(array, L): # another argument here
But since I don't know what x and L are supposed to be, I won't go further into that. It looks like you want to turn the columns of MotionsAndMoorings into rows and the rows into columns. If so, you can do it with:
dummy = [[MotionsAndMoorings[j][i] for j in range(len(MotionsAndMoorings))] for i in range(len(MotionsAndMoorings[0]))]
that's a list comprehension that converts a list like:
[[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]
to a transposed list, with rows and columns swapped:
[[1, 4, 0, 0], [2, 5, 2, 2], [3, 6, 10, 10]]
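A shorter way to get the same transposed list in plain Python is zip(*rows), which pairs up the elements at each position. This is a sketch using the sample list from above; the names rows, columns and column_maxima are illustrative:

```python
rows = [[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]

# zip(*rows) yields one tuple per column; convert each back to a list.
columns = [list(col) for col in zip(*rows)]
print(columns)

# With the columns at hand, the per-column maximum is a one-liner.
column_maxima = [max(col) for col in columns]
print(column_maxima)
```

Note that zip stops at the shortest row, so this assumes all rows have the same length.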
Alternative packages
But like roadrunner66 already said sometimes it's easiest to use a library like numpy or pandas that already has very advanced and fast functions that do exactly what you want and are very easy to use.
For example, you convert a Python list to a numpy array simply by:
import numpy as np
Motions_numpy = np.array(MotionsAndMoorings)
you get the maximum of the columns by using:
maximums_columns = np.max(Motions_numpy, axis=0)
You don't even need to convert it to a np.array to use np.max or to transpose it (turn rows into columns and columns into rows):
transposed = np.transpose(MotionsAndMoorings)
I hope this answer is not too unstructured. Some parts are suggestions for your function and some are alternatives. You should pick the parts that you need, and if you have any trouble with them, just leave a comment or ask another question. :-)
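Putting the pieces together, one possible sketch for collecting the column maxima of several calls as rows of L. The helper name column_maxima and the two sample tables are made up for illustration, since the real MotionsAndMoorings data isn't shown:

```python
import numpy as np

def column_maxima(table):
    """Return the maximum of each column of a 2-D sequence."""
    return np.max(np.asarray(table), axis=0)

# Hypothetical inputs standing in for two successive function calls.
first_call = [[1, 2, 3], [4, 5, 6], [0, 2, 10]]
second_call = [[7, 1, 0], [2, 9, 4]]

# Collect one row of maxima per call, then stack them into a 2-D array.
results = []
for table in (first_call, second_call):
    results.append(column_maxima(table))
L = np.vstack(results)
print(L)
```

Each call contributes one row to L, with one column per column of the input table.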

An example with a random input array, showing that you can take the max in either axis easily with one command.
import numpy as np
aa = np.random.random([4,3])
print(aa)
print()
print(np.max(aa,axis=0))
print()
print(np.max(aa,axis=1))
Output:
[[ 0.51972266 0.35930957 0.60381998]
[ 0.34577217 0.27908173 0.52146593]
[ 0.12101346 0.52268843 0.41704152]
[ 0.24181773 0.40747905 0.14980534]]
[ 0.51972266 0.52268843 0.60381998]
[ 0.60381998 0.52146593 0.52268843 0.40747905]

Access Columns of Numpy Array? Errors Trying to Do by Transpose or by Column Access

I've a numpy.ndarray the columns of which I'd like to access. I will be taking all columns after 8 and testing them for variance, removing the column if the variance/average is low. In order to do this, I need access to the columns, preferably with Numpy. By my current methods, I encounter errors or failure to transpose.
To mine these arrays, I am using the IOPro adapter, which gives a regular numpy.ndarray.
import iopro
import sys
adapter = iopro.text_adapter(sys.argv[1], parser='csv')
all_data = adapter[:]
z_matrix = adapter[range(8,len(all_data[0]))][1:3]
print type(z_matrix) #check type
print z_matrix # print array
print z_matrix.transpose() # attempt transpose (fails)
print z_matrix[:,0] # attempt access by column (fails)
Can someone explain what is happening?
The output is this:
<type 'numpy.ndarray'>
[ (18.712, 64.903, -10.205, -1.346, 0.319, -0.654, 1.52398, 114.495, -75.2488, 1.52184, 111.31, 175.408, 1.52256, 111.699, -128.141, 1.49227, 111.985, -138.173)
(17.679, 48.015, -3.152, 0.848, 1.239, -0.3, 1.52975, 113.963, -50.0622, 1.52708, 112.335, -57.4621, 1.52603, 111.685, -161.098, 1.49204, 113.406, -66.5854)]
[ (18.712, 64.903, -10.205, -1.346, 0.319, -0.654, 1.52398, 114.495, -75.2488, 1.52184, 111.31, 175.408, 1.52256, 111.699, -128.141, 1.49227, 111.985, -138.173)
(17.679, 48.015, -3.152, 0.848, 1.239, -0.3, 1.52975, 113.963, -50.0622, 1.52708, 112.335, -57.4621, 1.52603, 111.685, -161.098, 1.49204, 113.406, -66.5854)]
Traceback (most recent call last):
File "z-matrix-filtering.py", line 11, in <module>
print z_matrix[:,0]
IndexError: too many indices
What is going wrong? Is there a better way to access the columns? I will be reading all lines of a file, testing all columns from the 8th for significant variance, removing any columns that don't vary significantly, and then reprinting the result as a new CSV.
EDIT:
Based on responses, I have created the following very ugly and I think inane approach.
all_data = adapter[:]
z_matrix = []
for line in all_data:
    to_append = []
    for column in range(8,len(all_data.dtype)):
        to_append.append(line[column].astype(np.float16))
    z_matrix.append(to_append)
z_matrix = np.array(z_matrix)
The reason that the columns must be directly accessed is that there is a String inside the data. If this string is not circumvented in some way, an error will be thrown about a void-array with object members using buffer error.
Is there a better solution? This seems terrible, and it seems it will be inefficient for several gigabytes of data.

Notice that the output of print z_matrix has the form
[ (18.712, 64.903, ..., -138.173)
(17.679, 48.015, ..., -66.5854)]
That is, it is printed as a list of tuples. That is the output you get when the array is a "structured array". It is a one-dimensional array of structures. Each "element" in the array has 18 fields. The error occurs because you are trying to index a 1-D array as if it were 2-D; z_matrix[:,0] won't work.
Print the data type of the array to see the details. E.g.
print z_matrix.dtype
That should show the names of the fields and their individual data types.
You can get one of the elements as, for example, z_matrix[k] (where k is an integer), or you can access a "column" (really a field of the structured array) as z_matrix['name'] (change 'name' to one of the fields in the dtype).
If the fields all have the same data type (which looks like the case here--each field has type np.float64), you can create a 2-D view of the data by reshaping the result of the view method. For example:
z_2d = z_matrix.view(np.float64).reshape(-1, len(z_matrix.dtype.names))
Another way to get the data by column number rather than name is:
col = 8 # The column number (zero-based).
col_data = z_matrix[z_matrix.dtype.names[col]]
For more about structured arrays, see http://docs.scipy.org/doc/numpy/user/basics.rec.html.
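Here is a small self-contained sketch of these options on a made-up three-field structured array (the field names x, y, z and the values are hypothetical). It also shows numpy.lib.recfunctions.structured_to_unstructured, which I believe is available in NumPy 1.16 and later, as an alternative to the view/reshape trick:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A 1-D structured array: two records, three float64 fields each.
z = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
             dtype=[('x', np.float64), ('y', np.float64), ('z', np.float64)])

# 2-D view of the same data; valid because every field is float64.
z_2d = z.view(np.float64).reshape(-1, len(z.dtype.names))

# Alternative that produces a plain 2-D array from the fields.
z_alt = rfn.structured_to_unstructured(z)

# Access a "column" by position through the field names.
col_data = z[z.dtype.names[1]]

print(z_2d.shape, col_data)
```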

The display of z_matrix is consistent with it being shape (2,), a 1-d array of tuples.
np.array([list(a) for a in z_matrix])
produces a (2,18) 2-d array. You should be able to do your column tests on that.
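For the column test described in the question (dropping columns whose variance/average is low), a rough sketch on a plain 2-d float array might look like this. The sample data and the threshold value are made up for illustration, and the code assumes no column mean is zero:

```python
import numpy as np

# Hypothetical data: column 0 is constant, column 1 varies a lot,
# column 2 varies only slightly.
data = np.array([[1.0, 10.0, 5.0],
                 [1.0, 20.0, 5.1],
                 [1.0, 30.0, 4.9]])

threshold = 0.01  # hypothetical cutoff for variance / |mean|

# Per-column variance divided by the absolute per-column mean.
ratio = data.var(axis=0) / np.abs(data.mean(axis=0))

# Keep only the columns whose ratio exceeds the cutoff.
keep = ratio > threshold
filtered = data[:, keep]
print(filtered)
```

Because the whole test is vectorized, it avoids Python-level loops over rows, which should matter for files of several gigabytes.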

It is very easy to access the columns of a NumPy array. Here's a simple example that may help:
import numpy as np
A = np.array([[1, 2, 3], [5, 6, 7]])
>>> A
array([[1, 2, 3],
       [5, 6, 7]])
>>> A.T  # the transpose
array([[1, 5],
       [2, 6],
       [3, 7]])
>>> np.mean(A.T, axis=1)  # column-wise mean of A
array([3., 4., 5.])
I hope this will help you perform your transpose and column-wise operations.

Simple question: In numpy how do you make a multidimensional array of arrays?

Right, perhaps I should be using the normal Python lists for this, but here goes:
I want a 9 by 4 multidimensional array/matrix (whatever really) that I want to store arrays in. These arrays will be 1-dimensional and of length 4096.
So, I want to be able to go something like
column = 0 #column to insert into
row = 7 #row to insert into
storageMatrix[column,row][0] = NEW_VALUE
storageMatrix[column,row][4092] = NEW_VALUE_2
etc..
I appreciate I could be doing something a bit silly/unnecessary here, but it will make it a lot easier for me to have it structured like this in my code (as there are a lot of these, and a lot of analysis to be done later).
Thanks!

Note that to leverage the full power of numpy, you'd be much better off with a 3-dimensional numpy array. Breaking apart the 3-d array into a 2-d array with 1-d values may complicate your code and force you to use loops instead of built-in numpy functions.
It may be worth investing the time to refactor your code to use the superior 3-d numpy arrays.
However, if that's not an option, then:
import numpy as np
storageMatrix=np.empty((4,9),dtype='object')
By setting the dtype to 'object', we are telling numpy to allow each element of storageMatrix to be an arbitrary Python object.
Now you must initialize each element of the numpy array to be a 1-d numpy array:
storageMatrix[column,row]=np.arange(4096)
And then you can access the array elements like this:
storageMatrix[column,row][0] = 1
storageMatrix[column,row][4092] = 2
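For comparison, a sketch of the 3-dimensional alternative recommended above, using the same (4, 9) layout as the object array plus a third axis of length 4096. The name storage and the zero initialization are my own choices:

```python
import numpy as np

# One regular float array: 4 x 9 cells, each holding 4096 values.
storage = np.zeros((4, 9, 4096))

column = 0  # column to insert into
row = 7     # row to insert into

# Individual values are addressed with a third index.
storage[column, row, 0] = 1.0
storage[column, row, 4092] = 2.0

# Indexing with two indices returns the 1-D vector stored in that cell.
vector = storage[column, row]
print(vector.shape)

# Built-in reductions then work across all cells at once, without loops.
totals = storage.sum(axis=2)
print(totals.shape)
```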

The Tentative NumPy Tutorial says you can create a 2D array by passing the shape as a comma-separated tuple:
x = ones( (3,4) )
and index into a 2D array like this:
>>> x[1,2] = 20
>>> x[1,:] # x's second row
array([ 1, 1, 20, 1])
>>> x[0] = a # change first row of x
>>> x
array([[10, 20, -7, -3],
[ 1, 1, 20, 1],
[ 1, 1, 1, 1]])

