Numpy array limiting operation X[X < {value}] = {value} - python

I came across the following in a piece of code:
X = numpy.array(...)
X[X < np.finfo(float).eps] = np.finfo(float).eps
I found out the following from the documentation:
class numpy.finfo(dtype):
Machine limits for floating point types.
Parameters:
dtype : float, dtype, or instance
Kind of floating point data-type about which to get information.
I understand that np.finfo(float).eps returns the machine epsilon (the smallest representable positive increment from 1.0), and that X[X < np.finfo(float).eps] = np.finfo(float).eps makes sure that no value less than np.finfo(float).eps remains in the array X, but I'm unable to understand how exactly that happens in a statement of the form X[X < {value}] = {value} and what it means. Any help is much appreciated.

The first time I saw this pattern, it was used as a way to replace NaNs in an array.
Basically, the condition X < np.finfo(float).eps creates a boolean mask of X, and then every element of X whose mask entry is True gets replaced.
So for instance,
x=np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
x[x < 0] = 0
Here the mask array would look like,
[True, True, True, True, False, False, False, False, False]
It's a quicker way of doing the following, especially with large arrays:
x=np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
for idx, y in enumerate(x):
    if y < 0:
        x[idx] = 0
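To make the masking explicit, the same replacement can be written in two steps, storing the boolean mask in its own variable (a minimal sketch):

```python
import numpy as np

x = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
mask = x < 0           # boolean array, True where the condition holds
x[mask] = 0            # assign 0 at every True position
print(mask.tolist())   # [True, True, True, True, False, False, False, False, False]
print(x.tolist())      # [0, 0, 0, 0, 0, 1, 2, 3, 4]
```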

This is a fancy way of changing the values of an array wherever a condition is met.
On an easy example:
X = np.random.randint(1, 100, size=5)
print(X) # array([ 1, 17, 92, 9, 11])
X[X < 50] = 50 # Change any value lower than 50 to 50
print(X) # array([50, 50, 92, 50, 50])
Note that this modifies X in place, so the former values are lost unless you make a copy first. Using np.where() achieves the same goal, but it returns a new array rather than overwriting the original:
X = np.random.randint(1, 100, size=5)
print(X) # array([ 1, 17, 92, 9, 11])
np.where(X < 50, 50, X) # array([50, 50, 92, 50, 50])
print(X) # array([ 1, 17, 92, 9, 11])
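As an aside (not in the original answers), for this particular pattern of flooring values at a minimum, numpy.clip does the same job; like np.where it returns a new array:

```python
import numpy as np

X = np.array([0.0, 1e-20, 0.5, 2.0])
eps = np.finfo(float).eps
clipped = np.clip(X, eps, None)  # floor every element at eps; None means no upper bound
print(clipped)                   # the two tiny values become eps, the rest are unchanged
```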
Extra info:
Fancy indexing (you need to scroll down a bit to the relevant section).

When we index a numpy array X with another array x, the output is a numpy array with values corresponding to the values of X at indices corresponding to the values of x.
And X < {value} returns a numpy array of boolean values, True or False for each item in X depending on whether that item satisfies the condition {item} < {value}. Hence, X[X < {value}] = {value} means that we assign {value} wherever an array item is less than {value}. The following should make things clearer:
>>> x = [1, 2, 0, 3, 4, 0, 5, 6, 0, 7, 8, 0]
>>> X = numpy.array(x)
>>> X < 1
array([False, False, True, False, False, True, False, False, True,
False, False, True])
>>> X[X < 1] = -1
>>> X
array([ 1, 2, -1, 3, 4, -1, 5, 6, -1, 7, 8, -1])
>>> X[x]
array([ 2, -1, 1, 3, 4, 1, -1, 5, 1, 6, -1, 1])
P.S.: The credit for this answer goes to @ForceBru and his comment above!

Related

NumPy mean of multiple slices

Let's say I have an array a where I would like to calculate the mean across multiple slices defined in idx:
a = np.arange(10)
idx = np.random.choice([0,1], a.size).astype(bool)
a, idx
Out[1]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([False, False,  True,  True, False, False, False,  True,  True,
         True]))
With the desired output:
array([2.5, 8. ])
Of course I could write a simple for loop, but I would prefer a fully vectorized approach given that the arrays sizes can become quite big and 2D.
It's possible to do it completely vectorized:
edges = np.diff(idx.astype(np.int8), prepend=0, append=0)
rising = np.where(edges == 1)[0]
falling = np.where(edges == -1)[0]
cum = np.insert(np.cumsum(a), 0, 0)
means = (cum[falling] - cum[rising]) / (falling - rising)
This takes about 0.2 seconds on my machine with a = np.arange(10**7).
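The steps above can be wrapped into a small function (the name is mine, not from the answer) and sanity-checked against the example data:

```python
import numpy as np

def slice_means(a, idx):
    """Mean of `a` over each contiguous run of True in `idx`."""
    edges = np.diff(idx.astype(np.int8), prepend=0, append=0)
    rising = np.where(edges == 1)[0]     # indices where a True-run starts
    falling = np.where(edges == -1)[0]   # indices one past where a True-run ends
    cum = np.insert(np.cumsum(a), 0, 0)  # cumulative sum with a leading 0
    return (cum[falling] - cum[rising]) / (falling - rising)

a = np.arange(10)
idx = np.array([False, False, True, True, False, False, False, True, True, True])
print(slice_means(a, idx))  # [2.5 8. ]
```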

Changing several elements in a list using list comprehension

An example:
example = [None, b'1', b'2', b'3', b'4', b'1', None, None, b'8', b'56', b'66', b'77', b'1', None]
I need to change values in this list in the following way: bytes should be converted to int, None to 0 and every 6th and 7th element to a bool. The expected output:
expected_output = [0, 1, 2, 3, 4, True, False, 0, 8, 56, 66, 77, True, False]
My attempt:
[int(value) if value else bool(value) if index in (5,6) else 0 for index, value in enumerate(example)]
The result:
[0, 1, 2, 3, 4, True, False, 0, 8, 56, 66, 77, 0, 0]
I know the problem is in the index in (5,6) part of my code. How can I change every 6th and 7th element to a bool?
You need to check whether the index modulo 7 (the number of elements in each group) is 5 or 6. But you also need to do that first since b'1' also passes the if value test:
[bool(value) if index % 7 in (5,6) else int(value) if value else 0 for index, value in enumerate(example)]
Output:
[0, 1, 2, 3, 4, True, False, 0, 8, 56, 66, 77, True, False]
The more logic you add to this list comprehension, the more unreadable it will become. I'd suggest you go with a normal if/elif statement here:
for ix, i in enumerate(example):
    if ix % 7 in (5, 6):
        example[ix] = bool(i)
    elif not i:
        example[ix] = 0
    elif isinstance(i, bytes):
        example[ix] = int(i)
print(example)
# [0, 1, 2, 3, 4, True, False, 0, 8, 56, 66, 77, True, False]
Basic
You can implement the rotation with the % (modulo) operator.
example = [
    None, b'1', b'2', b'3', b'4', b'1', None,
    None, b'8', b'56', b'66', b'77', b'1', None,
]
results = [
    bool(value) if index % 7 in {5, 6} else
    int(value) if value else 0
    for index, value in enumerate(example)
]
print(results)
output:
[0, 1, 2, 3, 4, True, False, 0, 8, 56, 66, 77, True, False]
Pythonic
But there is a more Pythonic way:
import itertools
example = [
    None, b'1', b'2', b'3', b'4', b'1', None,
    None, b'8', b'56', b'66', b'77', b'1', None,
]
converters = itertools.cycle(itertools.chain(
    itertools.repeat(lambda x: int(x) if x else 0, 5),
    itertools.repeat(bool, 2),
))
results = [
    converter(value) for converter, value in zip(converters, example)
]
print(results)
output: same as above.
Explanation
itertools.cycle: make an iterator that cycles over an iterable endlessly.
itertools.chain: concatenate iterables.
itertools.repeat: repeat a value a specified number of times.
So the converters = ... part means: the lambda x: int(x) if x else 0 function five times, then bool two times, and repeat this cycle!
You can add multiple rules for each position without complicated if-else.
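To illustrate that "multiple rules" point, here is a hypothetical three-field record cycle (the fields and data are invented for illustration):

```python
import itertools

# Hypothetical repeating 3-field record: a count, a flag, a label
converters = itertools.cycle([
    lambda x: int(x) if x else 0,  # field 0: count, with None -> 0
    bool,                          # field 1: flag
    str,                           # field 2: label
])
rows = [b'3', b'1', 7, None, None, 9]
results = [f(v) for f, v in zip(converters, rows)]
print(results)  # [3, True, '7', 0, False, '9']
```

Adding another rule is just another entry in the list handed to itertools.cycle, with no extra if/else nesting.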

Can't delete element in ndarray

I am trying to delete the last element in an array, if the element does not meet certain conditions. The code I am using is:
# Set the distibution parameter to 2
a = 2
# Set the size to 100
s = 100
# Create Zipf's Law distribution using a and s
x = np.random.zipf(a,s)
# Reorder list by number frequency
xb = np.unique(x, return_counts=True)
print("X",x)
print("XB",xb)
for i in reversed(xb):
    if xb[-1] > xb[-2]*1.5:
        xb = np.delete(xb,-1)
print("XB mod",xb)
print()
I get the following output from print("XB", xb):
XB (array([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  28,  29,
        31,  33,  56, 225]),
    array([57, 17,  4,  4,  2,  1,  2,  2,  2,  1,  2,  1,  1,  1,  1,  1,  1],
       dtype=int64))
However, when I try to run the deletion portion of the code, I get the following error:
Traceback (most recent call last):
  File "test2.py", line 22, in <module>
    if xb[-1] > xb[-2]*1.5:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Any idea how to fix it, so that I can delete the last element in the XB array, if it doesn't meet the condition?
xb is a tuple consisting of a pair of np.ndarray objects.
How do I delete the last element in the XB array, if it doesn't meet the condition
If you want to delete the last pair of zipped values (e.g. 225 and 1 for your data) based on your condition where you compare the last two numbers of the first row of data (e.g. 225 > 56 * 1.5 for your data):
if xb[0][-1] > xb[0][-2] * 1.5:
    xb = tuple(x[:-1] for x in xb)
>>> xb
(array([ 1, 2, ..., 31, 33, 56]),
array([57, 17, ..., 1, 1, 1]))
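As a self-contained sketch of that fix, using a small fixed tuple (numbers invented for illustration) in place of the random zipf draw:

```python
import numpy as np

# A (values, counts) tuple like the one np.unique(..., return_counts=True) returns
xb = (np.array([1, 2, 3, 56, 225]), np.array([57, 17, 4, 1, 1]))
# Compare the last two *values* (first array), then drop the last pair from both arrays
if xb[0][-1] > xb[0][-2] * 1.5:
    xb = tuple(arr[:-1] for arr in xb)
print(xb[0].tolist(), xb[1].tolist())  # [1, 2, 3, 56] [57, 17, 4, 1]
```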
Short answer:
Use all:
for i in reversed(xb):
    if all(xb[-1] > xb[-2]*1.5): # use all here
        xb = np.delete(xb,-1)
Equivalent: if (xb[-1] > xb[-2]*1.5).all():
Long answer:
You have:
xb
(array([ 1, 2, 3, 4, 5, 7, 9, 10, 13, 21, 22, 24, 30]),
array([62, 16, 2, 4, 6, 3, 1, 1, 1, 1, 1, 1, 1]))
that is a tuple of numpy arrays.
Next, xb[-1] > xb[-2]*1.5 returns:
array([ True, True, False, False, False, False, False, False, False,
False, False, False, False])
If you do not use all or any, this condition will raise the error.
The problem is in if xb[-1] > xb[-2]*1.5.
xb[-1] is not a scalar but a vector (a 1d array).
So what does v1 > v2 mean? All items? At least one item?
Take for example [2,3] > [1,4]: all will return False because 3 > 4 is False; any, on the other hand, will return True because at least one comparison is true (2 > 1).
As the error says, it is ambiguous.
So, if for example you want that all the items will pass the condition you have to use:
if np.all(xb[-1] > xb[-2]*1.5): ...

What is the '<' doing in this line?: data += dt < b

Input:
dt = [6,7,8,9,10]
data = [1,2,3,4,5]
b = 8.0
b = np.require(b, dtype=float)
data += dt < b
data
Output:
array([2, 3, 3, 4, 5])
I tried inputting different numbers but still couldn't figure out what the "<" is doing there.
Also, it seems to work only when b is a NumPy float (hence the conversion).
The < with numpy arrays does an element-wise comparison. That means it returns an array where there is a True where the condition is true and False if not. The np.require line is necessary here so it actually uses NumPy arrays. You could drop the np.require if you converted your data and dt to np.arrays beforehand.
Then the result is added (element-wise) to the numeric array. In this context True is equal to 1 and False to zero.
>>> dt < b # which elements are smaller than b?
array([ True, True, False, False, False])
>>> 0 + (dt < b) # boolean arrays in arithmetic operations with numbers
array([1, 1, 0, 0, 0])
So it adds 1 to every element of data where the element in dt is smaller than 8.
dt is a list:
In [50]: dt = [6,7,8,9,10]
In [51]: dt < 8
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-51-3d06f93227f5> in <module>()
----> 1 dt < 8
TypeError: '<' not supported between instances of 'list' and 'int'
< (.__lt__) is not defined for lists.
But if one element of the comparison is an ndarray, then the numpy definition of __lt__ applies. dt is turned into an array, and it does an element by element comparison.
In [52]: dt < np.array(8)
Out[52]: array([ True, True, False, False, False])
In [53]: np.array(dt) < 8
Out[53]: array([ True, True, False, False, False])
numpy array operations also explain the data += part:
In [54]: data = [1,2,3,4,5] # a list
In [55]: data + (dt < np.array(8)) # list=>array, and boolean array to integer array
Out[55]: array([2, 3, 3, 4, 5])
In [56]: data
Out[56]: [1, 2, 3, 4, 5]
In [57]: data += (dt < np.array(8))
In [58]: data
Out[58]: array([2, 3, 3, 4, 5])
Actually I'm a bit surprised that with the += data has been changed from list to array. It means the data+=... has been implemented as an assignment:
data = data + (dt < np.array(8))
Normally + for a list is a concatenate:
In [61]: data += ['a','b','c']
In [62]: data
Out[62]: [1, 2, 3, 4, 5, 'a', 'b', 'c']
# equivalent of: data.extend(['a','b','c'])
You can often get away with using lists in array contexts, but it's better to make the objects arrays up front, so you avoid these implicit, and sometimes unexpected, conversions.
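A minimal sketch of that advice: convert the lists to arrays once at the start, and += stays a plain in-place element-wise operation with no surprise rebinding:

```python
import numpy as np

dt = np.array([6, 7, 8, 9, 10])
data = np.array([1, 2, 3, 4, 5])
data += dt < 8        # in-place add of the boolean mask (True counts as 1)
print(data.tolist())  # [2, 3, 3, 4, 5]
```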
This is just an alias (or shortcut, or convenience notation) for the equivalent function numpy.less().
In [116]: arr1 = np.arange(8)
In [117]: scalar = 6.0
# comparison that generates a boolean mask
In [118]: arr1 < scalar
Out[118]: array([ True, True, True, True, True, True, False, False])
# same operation as above
In [119]: np.less(arr1, scalar)
Out[119]: array([ True, True, True, True, True, True, False, False])
Let's see how this boolean array can be added to a non-boolean array in this case. This is possible due to type coercion:
# sample array
In [120]: some_arr = np.array([1, 1, 1, 1, 1, 1, 1, 1])
# addition after type coercion
In [122]: some_arr + (arr1 < scalar)
Out[122]: array([2, 2, 2, 2, 2, 2, 1, 1])
# same output achieved with `numpy.less()`
In [123]: some_arr + np.less(arr1, scalar)
Out[123]: array([2, 2, 2, 2, 2, 2, 1, 1])
So, type coercion happens on the boolean array and then addition is performed.

Elegant list comprehension to extract values in one dimension of an array based on values in another dimension

I'm looking for an elegant solution to this:
data = np.loadtxt(file)
# data[:,0] is a time
# data[:,1] is what I want to extract
mean = 0.0
count = 0
for n in range(np.size(data[:,0])):
    if data[n,0] >= tstart and data[n,0] <= tend:
        mean = mean + data[n,1]
        count = count + 1
mean = mean / float(count)
I'm guessing I could alternatively first extract my 2D array and then apply np.mean on it but I feel like there could be some list comprehension goodness to make this more elegant (I come from a FORTRAN background...). I was thinking something like (obviously wrong since i would not be an index):
np.mean([x for x in data[i,1] for i in data[:,0] if i >= tstart and i <= tend])
In numpy, rather than listcomps you can use lists and arrays for indexing purposes. To be specific, say we have a 2D array like the one you're working with:
>>> import numpy as np
>>> data = np.arange(20).reshape(10, 2)
>>> data
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17],
[18, 19]])
We can get the first column:
>>> ts = data[:,0]
>>> ts
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
And create a boolean array corresponding to the terms we want:
>>> (ts >= 2) & (ts <= 6)
array([False, True, True, True, False, False, False, False, False, False], dtype=bool)
Then we can use this to select elements of the column we're interested in:
>>> data[:,1][(ts >= 2) & (ts <= 6)]
array([3, 5, 7])
and finally take its mean:
>>> np.mean(data[:,1][(ts >= 2) & (ts <= 6)])
5.0
Or, in one line:
>>> np.mean(data[:,1][(data[:,0] >= 2) & (data[:,0] <= 6)])
5.0
[Edit: data[:,1][(data[:,0] >= 2) & (data[:,0] <= 6)].mean() will work too; I always forget you can use methods.]
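One further readability tweak (my own, not from the answer): give the boolean mask a name, so the one-liner stays scannable even as the conditions grow:

```python
import numpy as np

data = np.arange(20).reshape(10, 2)
in_window = (data[:, 0] >= 2) & (data[:, 0] <= 6)  # boolean mask on the time column
window_mean = data[in_window, 1].mean()            # mean of column 1 where mask is True
print(window_mean)  # 5.0
```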
