Numpy Conditionally Replace Column Elements - python

So I already took a look at this question.
I know you can conditionally replace a single column, but what about multiple columns? When I try it, it doesn't seem to work.
the_data = np.array([[0, 1, 1, 1],
                     [0, 1, 3, 1],
                     [3, 4, 1, 3],
                     [0, 1, 2, 0],
                     [2, 1, 0, 0]])
the_data[:,0][the_data[:,0] == 0] = -1 # this works
columns_to_replace = [0, 1, 3]
the_data[:,columns_to_replace][the_data[:,columns_to_replace] == 0] = -1 # this does not work
I initially thought that the second case doesn't work because I thought the_data[:,columns_to_replace] creates a copy instead of directly referencing the elements. However, if that were the case, then the first case shouldn't work either, when you are only replacing the single column.

You're indeed getting a copy because you're using advanced indexing:
Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
(Taken from the docs)
The first part works because it uses basic slicing.
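A quick way to see the difference is to check whether the result still references the_data's buffer:
>>> the_data[:, 0].base is the_data           # basic slicing: a view into the_data
True
>>> the_data[:, [0, 1, 3]].base is the_data   # advanced indexing: an independent copy
False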
I think you can do this without copying, but still with some memory overhead:
columns_to_replace = [0, 1, 3]
mask = np.zeros(the_data.shape, bool) # don't use too much memory
mask[:, columns_to_replace] = 1
np.place(the_data, (the_data == 0) * mask, [-1]) # this doesn't copy anything
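A quick check of the result, assuming the_data starts from the array defined in the question:
>>> the_data
array([[-1,  1,  1,  1],
       [-1,  1,  3,  1],
       [ 3,  4,  1,  3],
       [-1,  1,  2, -1],
       [ 2,  1,  0, -1]])
The remaining zero (last row, column 2) is untouched because column 2 is not in columns_to_replace.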


Set rows in Python 2D array to another row without Numpy?

I want to "set" the values of a row of a Python nested list to another row without using NumPy.
I have a sample list:
lst = [[0, 0, 1],
       [0, 2, 3],
       [5, 2, 3]]
I want to set row 1 to row 2, row 2 to row 3, and row 3 to row 1. My desired output is:
lst = [[0, 2, 3],
       [5, 2, 3],
       [0, 0, 1]]
How can I do this without using Numpy?
I tried to do something like arr[[0, 1]] = arr[[1, 0]] but it gives the error 'NoneType' object is not subscriptable.
One very straightforward way:
arr = [arr[-1], *arr[:-1]]
Or another way to achieve the same:
arr = [arr[-1]] + arr[:-1]
arr[-1] is the last element of the array. And arr[:-1] is everything up to the last element of the array.
The first solution builds a new list and adds the last element first and then all the other elements. The second one constructs a list with only the last element and then extends it with the list containing the rest.
Note: naming your list an array doesn't make it one. Although you can access a list of lists like arr[i1][i2], it's still just a list of lists. Look at the array documentation for Python's actual array.
The solution @MadPhysicist provided comes down to the second one here, since [arr[-1]] == arr[-1:].
Since Python does not actually support multidimensional lists, your task is simpler than it looks: you are dealing with an outer list whose elements are the rows.
To roll the list, just reassemble the outer container:
result = lst[-1:] + lst[:-1]
NumPy gives special meaning to lists of integers (like [0, 1]), to tuples (like the :, -1 in arr[:, -1]), to single integers, and to slices. Python lists only understand single integers and slices as indices, and they do not accept tuples as multidimensional indices because, again, lists are fundamentally one-dimensional.
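For example, with the lst from the question (and NumPy imported as np):
>>> lst[1, 2]                  # a tuple index only works on NumPy arrays
TypeError: list indices must be integers or slices, not tuple
>>> np.array(lst)[[2, 0, 1]]   # a list of integers as an index: NumPy advanced indexing
array([[5, 2, 3],
       [0, 0, 1],
       [0, 2, 3]])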
Use this generalisation:
arr = [arr[-1]] + arr[:-1]
which, according to your example, means
arr[0], arr[1], arr[2] = arr[1], arr[2], arr[0]
or
arr = [arr[2]]+arr[:2]
or
arr = [arr[2]]+arr[:-1]
You can use this:
>>> lst = [[0, 0, 1],
...        [0, 2, 3],
...        [5, 2, 3]]
>>> lst = [*lst[1:], *lst[:1]]
>>> lst
[[0, 2, 3], [5, 2, 3], [0, 0, 1]]

TypeError: 'int' object has no attribute '__getitem__' (simple issue)

I'm a newbie at Python, and I'm struggling with a small piece of my code; I just don't understand why it won't work.
I have list of lists, containing 3 numbers each. I want to check if the first two numbers are the same for some of the lists. Why doesn't this work? What should I do to get it work?
list = [[0, 4, 0], [1, 4, 0], [0, 3, 1], [0, 4, 1]]
sorted(list)
for i in range(len(list)-1):
    if list[i][0][1] == list[i+1][0][1]:
        print "overlap"
You are trying to access your matrix as if it were a 3-dimensional matrix; however, it's a 2-dimensional matrix.
Remove one of the indexes:
list = [[0, 4, 0], [1, 4, 0], [0, 3, 1], [0, 4, 1]]
sorted(list)
for i in range(len(list)-1):
    if list[i][0:2] == list[i + 1][0:2]:
        print "overlap"
As @Dunes pointed out, the slice operator allows you to compare the required items of your list (check out understanding Python slice notation for details).
You don't need that extra [1].
list[i] accesses the inner list, e.g. [0, 4, 0]
list[i][0] accesses the 1st element of that list: e.g. 0
Also, please don't use built-in names as names for your variables as the built-in (list in our case) will no longer be accessible by that name.
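Putting that together, a cleaned-up sketch (the list is renamed and sorted in place, which the original snippet does not actually do):
rows = [[0, 4, 0], [1, 4, 0], [0, 3, 1], [0, 4, 1]]
rows.sort()                            # sort in place so equal prefixes end up adjacent
for i in range(len(rows) - 1):
    if rows[i][:2] == rows[i + 1][:2]:
        print "overlap"                # printed once, for [0, 4, 0] and [0, 4, 1]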

Pythonic way to index intervals of integers from splitting points

I'm coding a hash-table-ish indexing mechanism that returns an integer's interval number (0 to n), according to a set of splitting points.
For example, if integers are split at value 3 (one split point, so two intervals), we can find the interval number for each array element using a simple comparison:
>>> import numpy as np
>>> x = np.array(range(7))
>>> [int(i>3) for i in x]
[0, 0, 0, 0, 1, 1, 1]
When there are many intervals, we can define a function as below:
>>> def get_interval_id(input_value, splits):
...     for i, split_point in enumerate(splits):
...         if input_value < split_point:
...             return i
...     return len(splits)
...
>>> [get_interval_id(i, [2,4]) for i in x]
[0, 0, 1, 1, 2, 2, 2]
But this solution does not look elegant. Is there any Pythonic (better) way to do this job?
Since you're already using NumPy, I would suggest the numpy.digitize function:
>>> import numpy as np
>>> np.digitize(np.array([0, 1, 2, 3, 4, 5, 6]), [2, 4])
array([0, 0, 1, 1, 2, 2, 2])
From the documentation:
Return the indices of the bins to which each value in input array belongs.
Python itself does not have a dedicated function for this process, which is called binning. If you wanted, you could wrap your function into a one-line command, but it's more readable this way.
However, data frame packages usually have full-featured binning methods; the most popular one in Python is pandas. It lets you bin values by equal intervals, equal divisions (the same number of entries in each bin), or custom split values (your case). See this question for a good discussion and examples.
Of course, this means that you'd have to install and import pandas and convert your list to a data frame. If that's too much trouble, just keep your current implementation; it's readable, straightforward, and reasonably short.
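For illustration, a rough pandas sketch; the bin edges below are my own choice, picked so that the result matches the [2, 4] example above:
import numpy as np
import pandas as pd

x = list(range(7))
# left-closed bins [-inf, 2), [2, 4), [4, inf) mimic get_interval_id(i, [2, 4])
ids = pd.cut(x, bins=[-np.inf, 2, 4, np.inf], right=False, labels=False)
print(list(ids))   # [0, 0, 1, 1, 2, 2, 2]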
How about wrapping the whole process inside of one function instead of only half the process?
>>> get_interval_ids([0 ,1, 2, 3, 4, 5 ,6], [2, 4])
[0, 0, 1, 1, 2, 2, 2]
and your function would look like
def get_interval_ids(values, splits):
    def get_interval_id(input_value):
        for i, split_point in enumerate(splits):
            if input_value < split_point:
                return i
        return len(splits)
    return [get_interval_id(val) for val in values]

Numpy array subset - unexpected behaviour

I'm trying to copy a subset of a numpy array (to do image background subtraction - but that's by the by). I don't understand what's wrong with the following - I've demonstrated it interactively because you really don't want to wade through all my code...
>>> from numpy import zeros
>>> a = zeros((5,5,3), 'uint8')
>>> print a.shape
(5, 5, 3)
>>> b = a[1:2][1:2][:].copy()
>>> print b.shape
(0, 5, 3)
>>> print a[1:2][1:2][:].shape
(0, 5, 3)
>>> print a.shape
(5, 5, 3)
>>>
What I'd like is for b.shape to return (2,2,3) - and behave that way in the subsequent operations I need to do with it. I'm sure I've done something really obvious wrong, but I can't work out what. Any suggestions gratefully received!
I believe you meant a[1:3, 1:3, :] instead of a[1:2][1:2][:].
Also, a[1:3, 1:3, ...] would work too (... means "as many : as necessary"). NumPy seems to also allow a[1:3, 1:3].
There are two parts to the explanation:
slicing in Python is left-inclusive and right-exclusive
comma indexing is necessary here: a[1:3] gives you an array of shape (2, 5, 3), and a second [1:3] slices along the first dimension again.
For simple indexing, a[1][2][3] is the same as a[1,2,3], because each consecutive index removes one dimension. That doesn't hold for slicing, though - you need to use commas.
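On the array from the question, that looks like:
>>> a[1:3, 1:3, :].shape
(2, 2, 3)
>>> b = a[1:3, 1:3, :].copy()
>>> b.shape
(2, 2, 3)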
There are two different problems with what you're doing. The primary one is how you're handling indexing in NumPy. NumPy arrays have their own indexing syntax, which is much clearer than the list-of-lists style you're using. Use commas instead of separate indices in brackets:
>>> from numpy import zeros
>>> a = zeros((5,5,3), 'uint8')
>>> print a[1:2,1:2,:].shape
(1, 1, 3)
What you're doing instead is failing because a[1:2] still returns a list of lists, so your next index is an index on the outer list (which only has one element), not the inner list that you want:
>>> a[1:2]
array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)
>>> a[1:2][1:2]
array([], shape=(0, 5, 3), dtype=uint8)
(You wouldn't have this problem if you were using simple indices instead of slices, but you should still use the comma syntax because it's much clearer.)
Second, you're using slices wrong. The first value of the slice is the index of the first element you want, and indices start at 0. The second value is one more than the index of the last element you want, so that a[first_index:second_index] returns second_index - first_index elements. So you want something like this:
>>> b = a[0:2,0:2,:]
>>> b
array([[[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]]], dtype=uint8)
Your index of [1:2] will only return one element... the second one in the list.
Also, as a side note: taking a slice of a NumPy array does create a new array object, but that object is a view sharing the original data, so keep the .copy() if you want b to be independent of a.
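You can verify this by checking what the slice points back to:
>>> b = a[0:2, 0:2, :]
>>> b.base is a          # the basic slice is a view into a
True
>>> b.copy().base is a   # an explicit copy owns its own data
False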

itertools product speed up

I use itertools.product to generate all possible variations of 4 elements of length 13. The 4 and 13 can be arbitrary, but as it is, I get 4^13 results, which is a lot. I need the result as a Numpy array and currently do the following:
c = it.product([1,-1,np.complex(0,1), np.complex(0,-1)], repeat=length)
sendbuf = np.array(list(c))
With some simple profiling code shoved in between, it looks like the first line is pretty much instantaneous, whereas the conversion to a list and then Numpy array takes about 3 hours.
Is there a way to make this quicker? It's probably something really obvious that I am overlooking.
Thanks!
The NumPy equivalent of itertools.product() is numpy.indices(), but it will only get you the product of ranges of the form 0,...,k-1:
numpy.rollaxis(numpy.indices((2, 3, 3)), 0, 4)
array([[[[0, 0, 0],
         [0, 0, 1],
         [0, 0, 2]],

        [[0, 1, 0],
         [0, 1, 1],
         [0, 1, 2]],

        [[0, 2, 0],
         [0, 2, 1],
         [0, 2, 2]]],


       [[[1, 0, 0],
         [1, 0, 1],
         [1, 0, 2]],

        [[1, 1, 0],
         [1, 1, 1],
         [1, 1, 2]],

        [[1, 2, 0],
         [1, 2, 1],
         [1, 2, 2]]]])
For your special case, you can use
a = numpy.indices((4,)*13)
b = 1j ** numpy.rollaxis(a, 0, 14)
(This won't run on a 32-bit system, because the array is too large. Extrapolating from the size I can test, it should run in less than a minute, though.)
EDIT: Just to mention it: the call to numpy.rollaxis() is more or less cosmetic, to get the same output as itertools.product(). If you don't care about the order of the indices, you can just omit it (but it is cheap anyway, as long as you don't have any follow-up operations that would transform your array into a contiguous array).
EDIT2: To get the exact analogue of
numpy.array(list(itertools.product(some_list, repeat=some_length)))
you can use
numpy.array(some_list)[numpy.rollaxis(
    numpy.indices((len(some_list),) * some_length), 0, some_length + 1)
    .reshape(-1, some_length)]
This got completely unreadable -- just tell me whether I should explain it any further :)
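A small sanity check on a reduced case (my own, not from the answer) showing that it matches the itertools version:
import itertools
import numpy

some_list, some_length = [1, -1, 1j, -1j], 2
a = numpy.array(some_list)[numpy.rollaxis(
    numpy.indices((len(some_list),) * some_length), 0, some_length + 1)
    .reshape(-1, some_length)]
b = numpy.array(list(itertools.product(some_list, repeat=some_length)))
same = (a == b).all()   # True: same values in the same order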
The first line seems instantaneous because no actual operation is taking place: an iterator object is merely constructed, and the work only happens when you iterate over it. As you said, you get 4^13 = 67108864 tuples, and all of them are computed and stored during your list() call. np.array takes only a list or a tuple, so you could try creating a tuple out of your iterator and passing that to np.array to see if there is any performance difference. Whether it helps, and whether it affects the overall performance of your program, can only be determined by trying it for your use case, though there are some reports that tuples are slightly faster.
To try with a tuple, instead of list just do
sendbuf = np.array(tuple(c))
You could speed things up by skipping the conversion to a list:
numpy.fromiter(c, count=…) # Using count also speeds things up, but it's optional
With this function, the NumPy array is first allocated and then initialized element by element, without having to go through the additional step of a list construction.
PS: fromiter() does not handle the tuples returned by product(), so this might not be a solution, for now. If fromiter() did handle dtype=object, this should work, though.
PPS: As Joe Kington pointed out, this can be made to work by putting the tuples in a structured array. However, this does not appear to always give a speed up.
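For reference, a rough sketch of that structured-array route, reusing c and length from the question; the field names and the final view/reshape are my own guesses, not code from the linked discussion:
# hypothetical sketch: a record dtype with one complex field per tuple element,
# so that fromiter() accepts the tuples yielded by product()
row_dtype = np.dtype([('f%d' % i, np.complex128) for i in range(length)])
rows = np.fromiter(c, dtype=row_dtype, count=4**length)
# reinterpret the packed records as an ordinary (4**length, length) complex array
sendbuf = rows.view(np.complex128).reshape(-1, length)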
Let numpy.meshgrid do all the job:
length = 13
x = [1, -1, 1j, -1j]
mesh = numpy.meshgrid(*([x] * length))
result = numpy.vstack([y.flat for y in mesh]).T
On my notebook it takes ~2 minutes.
You might want to try a completely different approach: first create an empty array of the desired size:
result = np.empty((4**length, length), dtype=complex)
then use NumPy's slicing abilities to fill out the array yourself:
# Set up of the last "digit":
result[::4, length-1] = 1
result[1::4, length-1] = -1
result[2::4, length-1] = 1j
result[3::4, length-1] = -1j
You can do similar things for the other "digits" (i.e. the remaining columns, result[:, length-2] down to result[:, 0]). The whole thing could certainly be put in a loop that iterates over each digit.
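For instance, that loop could look something like this (my own sketch, using np.repeat and np.tile per column instead of explicit strided assignments; values and length are as in the question):
values = np.array([1, -1, 1j, -1j])
result = np.empty((4**length, length), dtype=complex)   # roughly 13 GB for length = 13
for j in range(length):
    block = 4 ** (length - 1 - j)   # how many consecutive rows share one value in column j
    reps = 4 ** j                   # how often that pattern repeats down the column
    result[:, j] = np.tile(np.repeat(values, block), reps)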
Transposing the whole operation (np.empty((length, 4**length)…)) is worth trying, as it might bring a speed gain (through a better use of the memory cache).
Probably not optimized, but much less reliant on Python type conversions:
ints = [1, 2, 3, 4]
repeat = 3
def prod(ints, repeat):
    w = repeat
    l = len(ints)
    h = l**repeat
    ints = np.array(ints)
    A = np.empty((h, w), dtype=int)
    rng = np.arange(h)
    for i in range(w):
        x = l**i
        idx = np.mod(rng, l*x) // x   # integer division, so idx can be used to index ints
        A[:, i] = ints[idx]
    return A
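A quick look at what it produces for a small input (note that, unlike itertools.product, the first column varies fastest here):
>>> prod([1, -1], 2)
array([[ 1,  1],
       [-1,  1],
       [ 1, -1],
       [-1, -1]])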
