numpy: how to apply function to every row of array - python

I have a 2d numpy array called my_data. Each row represents information about one data point and each column represents different attributes of that data point.
I have a function called processRow. It takes in a row, and does some processing on the info and returns the modified row. The length of the row returned by the function is longer than the row taken in by the function (the function basically expands some categorical data into one-hot vectors)
How can I have a numpy array where every row has been processed by this function?
I tried
answer = np.array([])
for row in my_data:
    answer = np.append(answer, processRow(row))
but at the end, the answer is just a single really long row rather than a 2d grid

You can use np.vstack instead, since np.append without an axis argument flattens both arrays into one long row. You also need to be explicit about the shape of answer:
In [11]: my_data = np.array([[1, 2], [3, 4]])
    ...: process_row = lambda x: x  # do nothing
In [12]: answer = np.empty((0, 2), dtype='int64')
    ...: for row in my_data:
    ...:     answer = np.vstack([answer, process_row(row)])
    ...:
In [13]: answer
Out[13]:
array([[1, 2],
       [3, 4]])
However, you're probably better off doing a list comprehension, and then passing it to numpy after:
In [21]: np.array([process_row(row) for row in my_data])
Out[21]:
array([[1, 2],
       [3, 4]])
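Since the question is about rows that grow during processing, here is a minimal sketch of the list comprehension approach with a row function that returns a longer row. The process_row below and the N_CATEGORIES constant are hypothetical stand-ins for the asker's processRow, not the real function.

```python
import numpy as np

N_CATEGORIES = 3  # assumed number of distinct category codes (illustrative)

def process_row(row):
    # Hypothetical stand-in for processRow: keep the first value and
    # expand the category code in the second column into a one-hot vector.
    one_hot = np.zeros(N_CATEGORIES, dtype=row.dtype)
    one_hot[row[1]] = 1
    return np.concatenate([row[:1], one_hot])

my_data = np.array([[10, 0], [20, 2], [30, 1]])

# Build every processed row first, then convert to an array once.
answer = np.array([process_row(row) for row in my_data])
print(answer.shape)
```

This only produces a regular 2D array if every processed row has the same length, which holds here because each row becomes one value plus N_CATEGORIES one-hot entries.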

I'm not sure if I entirely got what you were after without seeing a sample of the data. But hopefully this helps you get to the result you want. I simplified the concept and just added one to each value in the row passed to the function and added the results together for a total (just to expand the size of the returned array). Of course you could adjust the processing to whatever you wanted.
import numpy as np

def funky(x):
    temp = []
    for value in x:
        value += 1
        temp.append(value)
    # append the total of the first two incremented values
    temp.append(temp[0] + temp[1])
    return np.array(temp)

my_data = np.array([[1,1], [2,2]])
answer = np.apply_along_axis(funky, 1, my_data)
print("This is the original data:\n{}".format(my_data))
print("This is the adjusted data:\n{}".format(answer))
Below is the before and after of the array modification:
This is the original data:
[[1 1]
 [2 2]]
This is the adjusted data:
[[2 2 4]
 [3 3 6]]
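For this particular toy function, the same result can probably be had without calling a Python function per row. The sketch below is an alternative to np.apply_along_axis, not a general replacement; it only works because the processing here is plain arithmetic on whole columns.

```python
import numpy as np

my_data = np.array([[1, 1], [2, 2]])

shifted = my_data + 1                    # add one to every value at once
totals = shifted[:, 0] + shifted[:, 1]   # per-row total of the two shifted values
answer = np.column_stack([shifted, totals])
print(answer)
```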

Related

Iterating through rows in numpy array with one row

For a 2D numpy array A, the loop for a in A will loop through all the rows in A. This functionality is what I want for my code, but I'm having difficulty with the edge case where A only has one row (i.e., is essentially a 1-dimensional array). In this case, the for loop treats A as a 1D array and iterates through its elements. What I want to instead happen in this case is a natural extension of the 2D case, where the loop retrieves the (single) row in A. Is there a way to format the array A such that the for loop functions like this?
If you create the array yourself, you can declare it as 2D directly:
A = np.array([[1, 2, 3]])
Otherwise, you can check the number of dimensions of your array before iterating over it:
B = np.array([1, 2, 3])
if B.ndim == 1:
    B = B[None, :]
Or you can use the function np.atleast_2d:
C = np.array([1, 2, 3])
C = np.atleast_2d(C)
If your array truly is a 2D array, even with one row, there is no edge case:
import numpy
a = numpy.array([[1, 2, 3]])
for line in a:
    print(line)
# prints: [1 2 3]
You seem to be confusing numpy.array([[1, 2, 3]]) which is a 2D array of one line and numpy.array([1, 2, 3]) which would be a 1D array.
I think you can use np.expand_dims to achieve your goal
X = np.expand_dims(X, axis=0)
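One way to package the suggestions above is a small helper that always iterates over rows. The name iter_rows is made up for this sketch; it relies only on np.atleast_2d, which leaves 2D input unchanged and turns a 1D array into a single-row 2D array.

```python
import numpy as np

def iter_rows(arr):
    """Yield the rows of arr, treating a 1D input as one single row."""
    for row in np.atleast_2d(arr):
        yield row

single = [r.tolist() for r in iter_rows(np.array([1, 2, 3]))]
multi = [r.tolist() for r in iter_rows(np.array([[1, 2], [3, 4]]))]
print(single)
print(multi)
```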

Get all the rows with same values in python?

So, suppose I have this 2D array in python
a = [[1,2],
     [2,3],
     [3,2],
     [1,3]]
How do I get all the rows with the same first value and store them in a new matrix?
For example, I will have
b = [[1,2],
     [1,3]]
after the query.
My approach is b = [a[i] for i in a if a[i][0] == 1]
but it didn't seem to work.
I am new to Python and the whole index slicing thing is kind of confusing. Thanks!
Since you tagged numpy, you can perform this task with NumPy arrays. First define your array:
a = np.array([[1, 2],
              [2, 3],
              [3, 2],
              [1, 3]])
For all unique values in the first column, you can use a dictionary comprehension. This is useful to avoid duplicating operations.
d = {i: a[a[:, 0] == i] for i in np.unique(a[:, 0])}
{1: array([[1, 2],
           [1, 3]]),
 2: array([[2, 3]]),
 3: array([[3, 2]])}
Then access your array where first column is equal to 1 via d[1].
For a single query, you can simply use a[a[:, 0] == 1].
The for i in a syntax gives you the actual items in the list, not their indices. For example:
list_of_strs = ['first', 'second', 'third']
first_letters = [s[0] for s in list_of_strs]
# first_letters == ['f', 's', 't']
What you are actually doing with b = [a[i] for i in a if a[i][0]==1] is trying to index a with each of its own elements. But since each element of a is itself a list, this won't work (you can't index lists with other lists).
Something like this should work:
b = [row for row in a if row[0] == 1]
Bonus points if you write it as a function so that you can pick which thing you want to filter on.
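A minimal sketch of that function, assuming plain nested lists; the name filter_rows and its parameters are made up for illustration.

```python
def filter_rows(rows, column, value):
    """Return the rows whose entry at position `column` equals `value`."""
    return [row for row in rows if row[column] == value]

a = [[1, 2], [2, 3], [3, 2], [1, 3]]
b = filter_rows(a, 0, 1)   # rows whose first value is 1
c = filter_rows(a, 1, 2)   # rows whose second value is 2
print(b)
print(c)
```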
If you're working with arrays a lot, you might also check out the numpy library. With numpy, you can do stuff like this.
import numpy as np
a = np.array([[1,2], [2,3], [3,2], [1,3]])
b = a[a[:,0] == 1]
The last line is basically indexing the original array a with a boolean array defined inside the first set of square brackets. It's very flexible, so you could also modify this to filter on the second element, filter on other conditions (like > some_number), etc. etc.
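A few variations on the boolean indexing just described, as a sketch using the same array. The variable names are only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [3, 2], [1, 3]])

second_is_3 = a[a[:, 1] == 3]               # filter on the second column
first_gt_1 = a[a[:, 0] > 1]                 # filter on a comparison
both = a[(a[:, 0] == 1) & (a[:, 1] == 3)]   # combine conditions with &
print(second_is_3, first_gt_1, both, sep="\n")
```

Note the parentheses around each condition when combining them: & binds more tightly than == in Python.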

Basic NumPy array replacement

I have a rather basic question about the NumPy module in Python 2, particularly the version on trinket.io. I do not see how to replace values in a multidimensional array several layers in, regardless of the method. Here is an example:
a = numpy.array([1,2,3])
a[0] = 0
print a
a = numpy.array([[1,2,3],[1,2,3]])
a[0][0] = a[1][0] = 0
print a
Result:
array([0, 2, 3], '<class 'int'>')
array([[1, 2, 3], [1, 2, 3]], '<class 'int'>')
I need the ability to change individual values, my specific code being:
a = numpy.empty(shape = (8,8,2),dtype = str)
for row in range(a.shape[0]):
    for column in range(a.shape[1]):
        a[row][column][1] = 'a'
Thank you for your time and any help provided.
To change individual values you can simply do something like:
a[1,2] = 'b'
If you want to change the whole array, you can do:
a[:,:] = 'c'
Use commas (array[a,b]) instead of (array[a][b])
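Applied to the 8x8x2 string array from the question, comma indexing might look like the sketch below. It is written for standard NumPy (not the trinket.io variant) and uses np.zeros with dtype 'U1' so that the array starts out as empty strings; that starting state is an assumption of this example.

```python
import numpy as np

a = np.zeros((8, 8, 2), dtype='U1')  # 8x8x2 array of empty one-character strings
a[:, :, 1] = 'a'                     # set index 1 of the last axis everywhere, no loops
a[0, 0, 0] = 'b'                     # set one individual element
print(a[0, 0], a[1, 1])
```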
When I run your code with numpy version 1.11.0, I get
[[0 2 3]
 [0 2 3]]
I guess your numpy version is newer and better.
As user3408085 said, the correct thing is to write a[0,0] = 0 to change one element, or a[:,0] = 0 if you actually want to zero the entire first column.
The reason a[0][0]=0 does not modify a (at least in your version of numpy) is that a[0] is a new array there. If you break your command a[0][0]=0 down into 2 lines:
b=a[0]
b[0]=0
then the fact that this could modify a looks counterintuitive. In standard NumPy it does modify a, because a[0] is a view onto a's data rather than a copy.
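To check how a given installation behaves, a small sketch like the following can help. It assumes standard NumPy, where np.shares_memory reports whether two arrays overlap in memory; the trinket.io implementation may not offer it.

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3]])

b = a[0]                          # basic indexing: a view in standard NumPy
is_view = np.shares_memory(a, b)
b[0] = 0                          # writes through to a when b is a view
a[1, 0] = 0                       # comma indexing always targets a directly
print(is_view)
print(a)
```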

Slicing a NumPy array within a loop [duplicate]

This question already has answers here:
What is the difference between i = i + 1 and i += 1 in a 'for' loop? [duplicate]
I need a good explanation (reference) to explain NumPy slicing within (for) loops. I have three cases.
def example1(array):
    for row in array:
        row = row + 1
    return array
def example2(array):
    for row in array:
        row += 1
    return array
def example3(array):
    for row in array:
        row[:] = row + 1
    return array
A simple case:
ex1 = np.arange(9).reshape(3, 3)
ex2 = ex1.copy()
ex3 = ex1.copy()
returns:
>>> example1(ex1)
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> example2(ex2)
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> example3(ex3)
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
It can be seen that the first result differs from the second and third.
First example:
You extract a row and add 1 to it. Then you redefine the pointer row but not what the array contains! So it will not affect the original array.
Second example:
You make an in-place operation - obviously this will affect the original array - as long as it is an array.
If you were doing a double loop it wouldn't work anymore:
def example4(array):
    for row in array:
        for column in row:
            column += 1
    return array
example4(np.arange(9).reshape(3,3))
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
This doesn't work because the inner loop yields NumPy scalars, which are immutable like Python ints. column += 1 therefore does not call np.ndarray's __iadd__ (which modifies the data the array points to); it computes a new value and rebinds the name column. So example2 only works because your rows are NumPy arrays.
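If a double loop is really needed, one possible fix is to assign through the array's own index, so the write targets the array's data instead of a loop variable. The name example4_fixed is made up for this sketch.

```python
import numpy as np

def example4_fixed(array):
    # Index into the array so each assignment writes to the array's data.
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            array[i, j] += 1
    return array

result = example4_fixed(np.arange(9).reshape(3, 3))
print(result)
```

In practice array += 1 does the same thing in one step; the loop is only here to mirror example4.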
Third example:
row[:] = row + 1 is interpreted as something like row[0] = row[0]+1, row[1] = row[1]+1, ... Again this works in place, so it affects the original array.
Bottom Line
If you are operating on mutable objects, like lists or np.ndarray, you need to be careful about what you change. A name for such an object only points to where the actual data is stored in memory, so changing this pointer (example1) doesn't affect the saved data. You need to follow the pointer (either directly with [:] (example3) or indirectly with array.__iadd__ (example2)) to change the saved data.
In the first code, you don't do anything with the newly computed row; you rebind the name row, and there is no connection to the array anymore.
In the second and the third, you don't rebind, but assign values to the existing object. With += an internal method is called, which varies depending on the type of the object you let it act upon. See links below.
If you write row + 1 on the right-hand side, a new array is computed. In the first case, you tell Python to give it the name row (and forget the original object which was called row before). In the third, the new array is written into the slice of the old row.
For further reading, follow the link in the comment on the question by @Thiru above. Or read about assignment and rebinding in general...
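The three behaviours can be observed side by side on a single row. This is a small sketch assuming standard NumPy; the variable names are only for illustration.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
row = arr[0]                       # a view onto the first row of arr

new_row = row + 1                  # like example1: a brand-new array
detached = not np.shares_memory(new_row, arr)
after_add = arr[0].tolist()        # arr is untouched

row += 1                           # like example2: in-place add on the view
after_iadd = arr[0].tolist()

row[:] = row + 1                   # like example3: write new values into the view
after_slice = arr[0].tolist()

print(detached, after_add, after_iadd, after_slice)
```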

Python Set the matrix value of multiple row and each rows with multiple different columns without for loop

How to set the same value to the matrix of multiple rows and each row with different column numbers without for loop?
For example for matrix a:
a = matrix([[1,2,3],
            [8,2,9],
            [1,8,7]])
row = [1,2,3]
col = [[1,2],
       [1,3],
       [2,3]]
I want to set a[1,1],a[1,2],a[2,1],a[2,3],a[3,2],a[3,3] to the same value.
I know I can use a for loop:
for i in xrange(len(row)):
    a[row[i],col[i]] = setvalue
But is there any way to do this without a for loop?
Using numpy, you can avoid loops:
import numpy as np
from numpy.matlib import repmat
a = np.array([[1,2,3],
              [8,2,9],
              [1,8,7]])
row = np.array([[1],
                [2],
                [3]])
col = np.array([[1,2],
                [1,3],
                [2,3]])
row = repmat(row,1,col.shape[1])
setvalue = 0
a[row.ravel(),col.ravel()] = setvalue
However, it's important to note that in python indexing starts at 0, so you should actually do
a[row-1,col-1] = setvalue
Or even better, use the correct (zero-based) indices to initialise your row and col arrays.
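As a sketch of that last suggestion: with zero-based indices and a column-shaped row array, NumPy's broadcasting pairs each row index with every column index in the matching row of col, so repmat is probably not needed at all. The index values below are the question's indices shifted down by one.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [8, 2, 9],
              [1, 8, 7]])
row = np.array([[0], [1], [2]])            # shape (3, 1), zero-based
col = np.array([[0, 1], [0, 2], [1, 2]])   # shape (3, 2), zero-based
setvalue = 0

# row broadcasts against col, giving the index pairs
# (0,0), (0,1), (1,0), (1,2), (2,1), (2,2).
a[row, col] = setvalue
print(a)
```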
Case 1: Use list comprehension
You can do like this:
value = 2
col_length = 3
line_length = 3
a = [[value for x in range(col_length)] for x in range(line_length)]
If you print a,
[[2, 2, 2], [2, 2, 2], [2, 2, 2]]
EDIT: Case 2 : Use map()
I am not very familiar with this one, but you can find more information about its performance here. General idea: it seems faster when used with a single function and no lambda expression.
You'll have to use a for loop.
Usually you want to avoid for loops (by using comprehensions) when following the functional paradigm, by building new instances instead of mutating the old one. As your goal is to mutate the old one, somewhere you will need a loop. The best you can do is to wrap it up in a function:
from numpy import matrix

def set_items_to(mx, indices, value=0):
    # indices is a list of [row, [col, col, ...]] pairs
    for row, cols in indices:
        for col in cols:
            mx[row, col] = value

setvalue = 0  # example value
a = matrix([[1,2,3],[4,5,6],[7,8,9]])
set_items_to(a, [
    [0, [0,1]],
    [1, [0,2]],
    [2, [1,2]]
], setvalue)
EDIT
In case it is a programming challenge, there are ways to accomplish that without explicit for loops by using one of the built in aggregator functions. But this approach doesn't make the code clearer nor shorter. Just for completeness, it would look something like this:
def set_items_to(mx, indices, value=0):
    sum(map(lambda item: [0,
        sum(map(lambda col: [0,
            mx.__setitem__((item[0], col), value)
        ][0], item[1]))
    ][0], indices))
