Basic NumPy array replacement - python

I have a rather basic question about the NumPy module in Python 2, particularly the version on trinket.io. I do not see how to replace values in a multidimensional array several layers in, regardless of the method. Here is an example:
a = numpy.array([1,2,3])
a[0] = 0
print a
a = numpy.array([[1,2,3],[1,2,3]])
a[0][0] = a[1][0] = 0
print a
Result:
array([0, 2, 3], '<class 'int'>')
array([[1, 2, 3], [1, 2, 3]], '<class 'int'>')
I need the ability to change individual values, my specific code being:
a = numpy.empty(shape = (8,8,2),dtype = str)
for row in range(a.shape[0]):
for column in range(a.shape[1]):
a[row][column][1] = 'a'
Thank you for your time and any help provided.

To change individual values you can simply do something like:
a[1,2] = 'b'
If you want to change all the array, you can do:
a[:,:] = 'c'
Use commas (array[a,b]) instead of (array[a][b])

With numpy version 1.11.0, I get
[[0 2 3]
[0 2 3]]
When I run your code. I guess your numpy version is newer and better.
As user3408085 said, the correct thing is to go a[0,0] = 0 to change one element or a[:,0]=0 if your actually want to zero the entire first column.
The reason a[0][0]=0 does not modify a (at least in your version of numpy) is that a[0] is a new array. If break down your command a[0][0]=0 into 2 lines:
b=a[0]
b[0]=0
Then the fact that this modifies a is counterintuitive.

Related

Difficulties to understand np.nditer

I am very new to python. I want to clearly understand the below code, if there's anyone who can help me.
Code:
import numpy as np
arr = np.array([[1, 2, 3, 4,99,11,22], [5, 6, 7, 8,43,54,22]])
for x in np.nditer(arr[0:,::4]):
print(x)
My understanding:
This 2D array has two 1D arrays.
np.nditer(arr[0:,::4]) will give all value from 0 indexed array to upto last array, ::4 means the gap between printed arrays will be 4.
Question:
Is my understanding for no 2 above correct?
How can I get the index for the print(x)? Because of the step difference of 4 e.g [0:,::4] or any gap [0:,::x] I want to find out the exact index that it is printing. But how?
Addressing your questions below
Yes, I think your understanding is correct. It might help to first print what arr[0:,::4] returns though:
iter_array = arr[0:,::4]
print(iter_array)
>>> [[ 1 99]
>>> [ 5 43]]
The slicing takes out each 4th index of the original array. All nditer does is iterate through these values in order. (Quick FYI: arr[0:] and arr[:] are equivalent, since the starting point is 0 by default).
As you pointed out, to get the index for these you need to keep track of the slicing that you did, i.e. arr[0:, ::x]. Remember, nditer has nothing to do with how you sliced your array. I'm not sure how to best get the indices of your slicing, but this is what I came up with:
import numpy as np
ls = [
[1, 2, 3, 4,99,11,22],
[5, 6, 7, 8,43,54,22]
]
arr = np.array(ls)
inds = np.array([
[(ctr1, ctr2) for ctr2, _ in enumerate(l)] for ctr1, l in enumerate(ls)
]) # create duplicate of arr filled with zeros
step = 4
iter_array = arr[0:,::step]
iter_inds = inds[0:,::step]
print(iter_array)
>>> [[ 1 99]
>>> [ 5 43]]
print(iter_inds)
>>> [[[0 0]
>>> [0 4]]
>>>
>>> [[1 0]
>>> [1 4]]]
All that I added here was an inds array. This array has elements equal to their own index. Then, when you slice both arrays in the same way, you get your indices. Hopefully this helps!

Get all the rows with same values in python?

So, suppose I have this 2D array in python
a = [[1,2]
[2,3]
[3,2]
[1,3]]
How do get all array entries with the same row value and store them in a new matrix.
For example, I will have
b = [1,2]
[1,3]
after the query.
My approach is b = [a[i] for i in a if a[i][0] == 1][0]]
but it didn't seem to work?
I am new to Python and the whole index slicing thing is kind confusing. Thanks!
Since you tagged numpy, you can perform this task with NumPy arrays. First define your array:
a = np.array([[1, 2],
[2, 3],
[3, 2],
[1, 3]])
For all unique values in the first column, you can use a dictionary comprehension. This is useful to avoid duplicating operations.
d = {i: a[a[:, 0] == i] for i in np.unique(a[:, 0])}
{1: array([[1, 2],
[1, 3]]),
2: array([[2, 3]]),
3: array([[3, 2]])}
Then access your array where first column is equal to 1 via d[1].
For a single query, you can simply use a[a[:, 0] == 1].
The for i in a syntax gives you the actual items in the list..so for example:
list_of_strs = ['first', 'second', 'third']
first_letters = [s[0] for s in list_of_strs]
# first_letters == ['f', 's', 't']
What you are actually doing with b = [a[i] for i in a if a[i][0]==1] is trying to index an element of a with each of the elements of a. But since each element of a is itself a list, this won't work (you can't index lists with other lists)
Something like this should work:
b = [row for row in a if row[0] == 1]
Bonus points if you write it as a function so that you can pick which thing you want to filter on.
If you're working with arrays a lot, you might also check out the numpy library. With numpy, you can do stuff like this.
import numpy as np
a = np.array([[1,2], [2,3], [3,2], [1,3]])
b = a[a[:,0] == 1]
The last line is basically indexing the original array a with a boolean array defined inside the first set of square brackets. It's very flexible, so you could also modify this to filter on the second element, filter on other conditions (like > some_number), etc. etc.

numpy: how to apply function to every row of array

I have a 2d numpy array called my_data. Each row represents information about one data point and each column represents different attributes of that data point.
I have a function called processRow. It takes in a row, and does some processing on the info and returns the modified row. The length of the row returned by the function is longer than the row taken in by the function (the function basically expands some categorical data into one-hot vectors)
How can I have a numpy array where every row has been processed by this function?
I tried
answer = np.array([])
for row in my_data:
answer = np.append(answer,processRow(row))
but at the end, the answer is just a single really long row rather than a 2d grid
You can use vstack rather since row has a different shape to answer. You also need to be explicit with the shape of answer:
In [11]: my_data = np.array([[1, 2], [3, 4]])
...: process_row = lambda x: x # do nothing
In [12]: answer = np.empty((0, 2), dtype='int64')
...: for row in my_data:
...: answer = np.vstack([answer, process_row(row)])
...:
In [13]: answer
Out[13]:
array([[ 1, 2],
[ 3, 4]])
However, you're probably better off doing a list comprehension, and then passing it to numpy after:
In [21]: np.array([process_row(row) for row in my_data])
Out[21]:
array([[1, 2],
[3, 4]])
I'm not sure if I entirely got what you were after without seeing a sample of the data. But hopefully this helps you get to the result you want. I simplified the concept and just added one to each value in the row passed to the function and added the results together for a total (just to expand the size of the returned array). Of course you could adjust the processing to whatever you wanted.
def funky(x):
temp = []
for value in x:
value += 1
temp.append(value)
temp.append(temp[0] + temp[1])
return np.array(temp)
my_data = np.array([[1,1], [2,2]])
answer = np.apply_along_axis(funky, 1, my_data)
print("This is the original data:\n{}".format(my_data))
print("This is the adjusted data:\n{}".format(answer))
Below is the before and after of the array modification:
This is the original data:
[[1 1]
[2 2]]
This is the adjusted data:
[[2 2 4]
[3 3 6]]

Python Set the matrix value of multiple row and each rows with multiple different columns without for loop

How to set the same value to the matrix of multiple rows and each row with different column numbers without for loop?
For example for matrix a:
a=matrix([[1,2,3],
[8,2,9],
[1,8,7]])
row = [1,2,3]
col = [[1,2]
[1,3]
[2,3]]
I want to set a[1,1],a[1,2],a[2,1],a[2,3],a[3,2],a[3,3] to the same value.
I know use for loop:
for i in xrange(len(row)):
a[row[i],col[i]] = setvalue
But is there anyway to do this without for loop?
Using numpy, you can avoid loops:
import numpy as np
from numpy.matlib import repmat
a = np.array([[1,2,3],
[8,2,9],
[1,8,7]])
row = np.array([[1],
[2],
[3]])
col = np.array([[1,2],
[1,3],
[2,3]])
row = repmat(row,1,col.shape[1])
setvalue = 0
a[row.ravel(),col.ravel()] = setvalue
However, it's important to note that in python indexing starts at 0, so you should actually do
a[row-1,col-1] = setvalue
Or even better, use the correct (zero-based) indices to initialise your row and col arrays.
Case 1: Use list comprehension
You can do like this:
value = 2
col_length = 3
line_length = 3
a = [[value for x in range(col_length)] for x in range(line_length)]
If you print a,
[[2, 2, 2], [2, 2, 2], [2, 2, 2]]
EDIT: Case 2 : Use map()
I am not very used to this one. But you can find more informations about it here in terms of performance. General idea: it seems faster when used with one function and no lambda expression.
You'll have to use a for loop.
Usually you want to avoid for loops (by using comprehesions) when following the functional paradigm, by building new instances instead of mutating the old one. As your goal is to mutate the old one, somewhere you will need a loop. The best you can do is to wrap it up in a function:
def set_items_to(mx, indices, value=0):
for row,cols in indices:
for col in cols:
mx[row, col] = value
a = matrix([[1,2,3],[4,5,6],[7,8,9]])
set_items_to(a, [
[0, [0,1]],
[1, [0,2]],
[2, [1,2]]
], setvalue)
EDIT
In case it is a programming challenge, there are ways to accomplish that without explicit for loops by using one of the built in aggregator functions. But this approach doesn't make the code clearer nor shorter. Just for completeness, it would look something like this:
def set_items_to(mx, indices, value=0):
sum(map(lambda item: [0,
sum(map(lambda col: [0,
mx.__setitem__((item[0], col), value)
][0], item[1]))
][0], indices))

Acquiring the Minimum array out of Multiple Arrays by order in Python

Say that I have 4 numpy arrays
[1,2,3]
[2,3,1]
[3,2,1]
[1,3,2]
In this case, I've determined [1,2,3] is the "minimum array" for my purposes, as it is one of two arrays with lowest value at index 0, and of those two arrays it has the the lowest index 1. If there were more arrays with similar values, I would need to compare the next index values, and so on.
How can I extract the array [1,2,3] in that same order from the pile?
How can I extend that to x arrays of size n?
Thanks
Using the python non-numpy .sort() or sorted() on a list of lists (not numpy arrays) automatically does this e.g.
a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()
gives
[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]
The numpy sort seems to only sort the subarrays recursively so it seems the best way would be to convert it to a python list first. Assuming you have an array of arrays you want to pick the minimum of you could get the minimum as
sorted(a.tolist())[0]
As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
Here's an idea using numpy:
import numpy
a = numpy.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
col = 0
while a.shape[0] > 1:
b = numpy.argmin(a[:,col:], axis=1)
a = a[b == numpy.min(b)]
col += 1
print a
This checks column by column until only one row is left.
numpy's lexsort is close to what you want. It sorts on the last key first, but that's easy to get around:
>>> a = np.array([[1,2,3],[2,3,1],[3,2,1],[1,3,2]])
>>> order = np.lexsort(a[:, ::-1].T)
>>> order
array([0, 3, 1, 2])
>>> a[order]
array([[1, 2, 3],
[1, 3, 2],
[2, 3, 1],
[3, 2, 1]])

Categories