Filling a numpy array with x random values - python

I have a numpy array of size x that I need to fill with 700 True values.
For example:
a = np.zeros(5956)
If I want to fill this with 70 % True, I can write this:
msk = np.random.rand(len(a)) < 0.7
b = spam_df[msk]
But what if I need exactly 700 true, and the rest false?

import numpy as np
x = 5956
a = np.zeros((x), dtype=bool)
random_places = np.random.choice(x, 700, replace=False)
a[random_places] = True

import numpy as np
zeros = np.zeros(5956 - 700, dtype=bool)
ones = np.ones(700, dtype=bool)
arr = np.concatenate((ones, zeros))
np.random.shuffle(arr)  # arr is now shuffled in place, with 700 True values and the rest False
Example: an array of 5 elements with 3 True and the rest False.
ones = np.ones(3, dtype=bool)  # array([ True,  True,  True])
zeros = np.zeros(5 - 3, dtype=bool)  # array([False, False])
arr = np.concatenate((ones, zeros))  # array([ True,  True,  True, False, False])
np.random.shuffle(arr)  # arr is now, for example, array([False,  True,  True,  True, False])
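Both answers above use the legacy np.random functions. A minimal sketch of the same idea with NumPy's newer Generator API might look like the following; the helper name random_mask and its seed parameter are illustrative assumptions, not part of the original answers.

```python
import numpy as np

def random_mask(size, n_true, seed=None):
    """Hypothetical helper: boolean array of length `size` with exactly `n_true` True values."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(size, dtype=bool)
    # Sampling indices without replacement guarantees n_true distinct positions.
    mask[rng.choice(size, size=n_true, replace=False)] = True
    return mask

mask = random_mask(5956, 700, seed=0)
print(mask.sum(), mask.size)  # 700 5956
```

Passing a seed makes the mask reproducible, which can help when the mask is used for a train/test split.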

Related

How to return a numpy array of the indices of the first element in each row of a numpy array with a given value?

Given a numpy array of shape (2, 4):
input = np.array([[False, True, False, True], [False, False, True, True]])
I want to return an array of shape (N,) where each element of the array is the index of the first True value:
expected = np.array([1, 2])
Is there an easy way to do this using numpy functions and without resorting to standard loops?
np.max with axis finds the maximum along that axis; np.argmax returns the index of its first occurrence. Since True > False, that is the index of the first True in each row:
In [42]: arr = np.array([[False, True, False, True], [False, False, True, True]])
In [43]: np.argmax(arr, axis=1)
Out[43]: array([1, 2])
This worked for me:
nonzeros = np.nonzero(input)
u, indices = np.unique(nonzeros[0], return_index=True)
expected = nonzeros[1][indices]
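One caveat with the argmax approach: for a row that contains no True at all, argmax still returns 0, which is indistinguishable from a True in the first column. A small sketch that guards against this, assuming a sentinel value of -1 is acceptable (the function name first_true_index and the missing parameter are made up for illustration):

```python
import numpy as np

def first_true_index(mask, missing=-1):
    """Index of the first True in each row, or `missing` for rows with no True."""
    mask = np.asarray(mask, dtype=bool)
    idx = mask.argmax(axis=1)       # first True per row (0 if the row is all False)
    has_true = mask.any(axis=1)     # rows that actually contain a True
    return np.where(has_true, idx, missing)

rows = np.array([[False, True, False, True],
                 [False, False, True, True],
                 [False, False, False, False]])
result = first_true_index(rows)
print(result)  # [ 1  2 -1]
```

The np.nonzero/np.unique answer simply omits rows without a True, so its output can be shorter than the number of rows.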

Merging Two Numpy Boolean Arrays after an Index

In my problem I have 2 boolean numpy arrays that I would like to merge after a given index. Currently I am using np.logical_or(arr1, arr2), but it executes on the entire array. I am trying to execute the operation only after an index.
Below I would like to use arr1 as the master and merge arr2 after any index.
For example, take the arrays and index below
arr1 = np.array([ True, False, False, True, False])
arr2 = np.array([False, True, True, True, False])
index = 2
Returns
# array([True, False, True, True, False])
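You can use array slicing and np.concatenate to implement this. In this case, arr3 takes its elements from arr1 at indices 0 up to (but not including) index, and the rest from arr2. Note that this takes arr2's values after index as they are rather than OR-ing them with arr1; for the example arrays the two give the same result.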
arr1 = np.array([ True, False, False, True, False])
arr2 = np.array([False, True, True, True, False])
index = 2
arr3= np.concatenate((arr1[:index], arr2[index:]), axis = 0)
print(arr3)
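If the goal is a true logical OR from index onward (as the np.logical_or in the question suggests), slicing and concatenating is not equivalent in general: a True in arr1 after index would be lost wherever arr2 is False. A minimal sketch of the OR variant, with merge_after as a hypothetical helper name and a modified arr1 chosen to show the difference:

```python
import numpy as np

def merge_after(arr1, arr2, index):
    """Copy of arr1 where elements from `index` onward are OR-ed with arr2."""
    out = arr1.copy()
    out[index:] |= arr2[index:]   # in-place OR on the tail only
    return out

arr1 = np.array([True, False, False, True, True])
arr2 = np.array([False, True, True, True, False])
index = 2

merged = merge_after(arr1, arr2, index)
concatenated = np.concatenate((arr1[:index], arr2[index:]))
print(merged)        # [ True False  True  True  True]
print(concatenated)  # [ True False  True  True False]
```

The last element differs because arr1 is True there while arr2 is False.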

Using entrywise sum of boolean arrays as inclusive `or`

I would like to compare many m-by-n boolean numpy arrays and get an array of the same shape whose entries are True if the corresponding entry in at least one of the inputs is True.
The easiest way I've found to do this is:
In [5]: import numpy as np
In [6]: a = np.array([True, False, True])
In [7]: b = np.array([True, True, False])
In [8]: a + b
Out[8]: array([ True, True, True])
But I can also use
In [11]: np.stack([a, b]).sum(axis=0) > 0
Out[11]: array([ True, True, True])
Are these equivalent operations? Are there any gotchas I should be aware of? Is one method preferable to the other?
You can use np.logical_or:
a = np.array([True, False, True])
b = np.array([True, True, False])
np.logical_or(a,b)
It also works for (m, n) arrays:
a = np.random.rand(3,4) < 0.5
b = np.random.rand(3,4) < 0.5
print('a\n',a)
print('b\n',b)
np.logical_or(a,b)
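For more than two arrays, np.logical_or can be applied as a reduction. The sketch below (array values chosen for illustration) compares it with the two approaches from the question. For boolean dtype, + acts as a logical OR in NumPy, so the results agree; the gotcha is that if the inputs are integer 0/1 arrays instead, a + b produces counts such as 2 rather than booleans.

```python
import numpy as np

a = np.array([True, False, False])
b = np.array([True, True, False])
c = np.array([False, False, False])

via_reduce = np.logical_or.reduce([a, b, c])   # OR across all inputs
via_any = np.stack([a, b, c]).any(axis=0)      # same result via any()
via_sum = np.stack([a, b, c]).sum(axis=0) > 0  # approach from the question
via_plus = a + b + c                           # bool + bool stays bool (logical OR)

print(via_reduce)  # [ True  True False]

# With integer inputs, + counts instead of OR-ing:
int_sum = a.astype(int) + b.astype(int)
print(int_sum)     # [2 1 0]
```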

numpy where operation on 2D array

I have a numpy array 'A' of size 571x24 and I am trying to find the index of zeros in it so I do:
>>>A.shape
(571L, 24L)
import numpy as np
z1 = np.where(A==0)
z1 is a tuple with the following sizes:
>>> len(z1)
2
>>> len(z1[0])
29
>>> len(z1[1])
29
I was hoping to create a z1 of the same shape as A. How do I achieve that?
Edit: I want to create an array z1 of booleans marking the presence of zeros in A such that:
>>>z1.shape
(571L, 24L)
You can get this directly with the equality operator, which NumPy applies elementwise. Example:
>>> A = np.array([[0,2,2,1],[2,0,0,3]])
>>> A == 0
array([[ True, False, False, False],
       [False,  True,  True, False]], dtype=bool)
Called with only a condition, np.where() returns a tuple of index arrays, one per dimension, which is why z1 has length 2. It is also possible to get the mask from the three-argument form np.where(condition, x, y), although here that is redundant because A == 0 is already the boolean array:
>>> np.where(A == 0, True, False)
array([[ True, False, False, False],
       [False,  True,  True, False]], dtype=bool)
Try this:
import numpy as np
myarray = np.array([[0,3,4,5],[9,4,0,4],[1,2,3,4]])
ix = np.in1d(myarray.ravel(), 0).reshape(myarray.shape)
Output of ix:
array([[ True, False, False, False],
       [False, False,  True, False],
       [False, False, False, False]], dtype=bool)
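To connect the two representations: the index tuple from np.where(A == 0) and the boolean mask carry the same information, and one can be rebuilt from the other. The sketch below also uses np.isin, which operates on arrays of any shape and so avoids the ravel/reshape step; np.in1d is deprecated in recent NumPy releases in favor of it. Variable names are illustrative.

```python
import numpy as np

A = np.array([[0, 2, 2, 1], [2, 0, 0, 3]])

mask = A == 0                    # boolean array, same shape as A
rows, cols = np.where(mask)      # index form, same as np.nonzero(mask)

# Rebuild the mask from the index tuple.
rebuilt = np.zeros(A.shape, dtype=bool)
rebuilt[rows, cols] = True

# np.isin keeps the input shape, so no ravel/reshape is needed.
isin_mask = np.isin(A, [0])

print(rows, cols)  # [0 1 1] [0 1 2]
```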

pandas build matrix of row by row comparisons

I have two dataframes, a (10,2) and a (3,2), and I am looking for a faster/more pythonic way to compare them row by row.
x = pd.DataFrame([range(10),range(2,12)])
x = x.transpose()
y = pd.DataFrame([[5,8],[2,3],[5,5]])
I'd like to build a (10,3) comparison matrix that shows which rows of the first dataframe meet the following requirements against each row of the second: the x[1] value must be >= the y[0] value, and the x[0] value must be <= the y[1] value. In reality the data are dates, but I have used integers to keep the example simple. We're testing for overlap between time periods, so the condition holds exactly when the two periods overlap.
arr = np.zeros((len(x),len(y)), dtype=bool)
for xrow in x.index:
    for yrow in y.index:
        if x.loc[xrow,1] >= y.loc[yrow,0] and x.loc[xrow,0] <= y.loc[yrow,1]:
            arr[xrow,yrow] = True
arr
The brute force approach above is too slow. Any suggestions for how I could vectorize this or do some sort of transposed matrix comparisons?
You can convert x and y to NumPy arrays and add a new axis with np.newaxis/None so that broadcasting applies to the same comparisons. All comparisons, and the boolean output array, are then computed in a vectorized fashion. The implementation would look like this:
X = np.asarray(x)
Y = np.asarray(y)
arr = (X[:,None,1] >= Y[:,0]) & (X[:,None,0] <= Y[:,1])
Sample run:
In [207]: x = pd.DataFrame([range(10),range(2,12)])
...: x = x.transpose()
...: y = pd.DataFrame([[5,8],[2,3],[5,5]])
...:
In [208]: X = np.asarray(x)
...: Y = np.asarray(y)
...: arr = (X[:,None,1] >= Y[:,0]) & (X[:,None,0] <= Y[:,1])
...:
In [209]: arr
Out[209]:
array([[False,  True, False],
       [False,  True, False],
       [False,  True, False],
       [ True,  True,  True],
       [ True,  True,  True][:0] if False else None
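To make the broadcasting step more explicit, here is a NumPy-only sketch of the same overlap test with the intermediate shapes spelled out and a check against the nested-loop version. The names x_start, x_end, y_start and y_end are introduced here for readability and do not appear in the original answer.

```python
import numpy as np

# Same data as the question, built directly as arrays.
X = np.column_stack([np.arange(10), np.arange(2, 12)])  # shape (10, 2)
Y = np.array([[5, 8], [2, 3], [5, 5]])                  # shape (3, 2)

x_start = X[:, 0][:, None]  # shape (10, 1)
x_end = X[:, 1][:, None]    # shape (10, 1)
y_start = Y[:, 0]           # shape (3,)
y_end = Y[:, 1]             # shape (3,)

# (10, 1) compared with (3,) broadcasts to (10, 3).
overlap = (x_end >= y_start) & (x_start <= y_end)

# Reference result from the explicit double loop.
expected = np.zeros((len(X), len(Y)), dtype=bool)
for i in range(len(X)):
    for j in range(len(Y)):
        expected[i, j] = X[i, 1] >= Y[j, 0] and X[i, 0] <= Y[j, 1]

print(overlap.shape, np.array_equal(overlap, expected))  # (10, 3) True
```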
