I am new to python coding. Kindly, help me to achieve my requirement.
Suppose there are two arrays 'a' and 'b' of size 3*4
a = [[1,0,0,1],
[0,0,1,1],
[1,0,0,1]]
b = [[12,-34,-10,4],
[2,11,-12,20],
[-12,16,19,-9]]
Here, if b[i,j]<10 than I want the corresponding a[i,j] to be same(i.e it can be either 0 or 1) else change a[i,j] element to 1.
Expected outcome for the above example :
c = [[1,0,0,1],
[0,1,1,1],
[1,1,1,1]]
You can use the or | operator:
In [11]: b >= 10
Out[11]:
array([[ True, False, False, False],
[False, True, False, True],
[False, True, True, False]])
In [12]: a | (b >= 10)
Out[12]:
array([[1, 0, 0, 1],
[0, 1, 1, 1],
[1, 1, 1, 1]])
The | is a bitwise or and is equivalent to np.bitwise_or:
In [13]: np.bitwise_or(a, b >= 10)
Out[13]:
array([[1, 0, 0, 1],
[0, 1, 1, 1],
[1, 1, 1, 1]])
This assumes both a and b are numpy arrays, you can make this so with the array constructor:
a, b = np.array(a), np.array(b)
if you do not want to use numpy you could do this nested list-comprehension:
c = [[el_a | (el_b >= 10) for el_a, el_b in zip(row_a, row_b)]
for row_a, row_b in zip(a, b)]
but i prefer Andy Hayden's anser. numpy really shines for that kind of operations.
Related
I'm trying to find elements of one DataFrame (df_other) which match a column in another DataFrame (df). In other words, I'd like to know where the values in df['a'] match the values in df_other['a'] for each row in df['a'].
An example might be easier to explain the expected result:
>>> import pandas as pd
>>> import numpy as np
>>>
>>>
>>> df = pd.DataFrame({'a': ['x', 'y', 'z']})
>>> df
a
0 x
1 y
2 z
>>> df_other = pd.DataFrame({'a': ['x', 'x', 'y', 'z', 'z2'], 'c': [1, 2, 3, 4, 5]})
>>> df_other
a c
0 x 1
1 x 2
2 y 3
3 z 4
4 z2 5
>>>
>>>
>>> u = df_other['c'].unique()
>>> u
array([1, 2, 3, 4, 5])
>>> bm = np.ones((len(df), len(u)), dtype=bool)
>>> bm
array([[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True]])
should yield a bitmap of
[
[1, 1, 0, 0, 0], # [1, 2] are df_other['c'] where df_other['a'] == df['a']
[0, 0, 1, 0, 0], # [3] matches
[0, 0, 0, 1, 0], # [4] matches
]
I'm looking for a fast numpy implementation that doesn't iterate through all rows (which is my current solution):
>>> df_other['a'] == df.loc[0, 'a']
0 True
1 True
2 False
3 False
4 False
Name: a, dtype: bool
>>>
>>>
>>> df_other['a'] == df.loc[1, 'a']
0 False
1 False
2 True
3 False
4 False
Name: a, dtype: bool
>>> df_other['a'] == df.loc[2, 'a']
0 False
1 False
2 False
3 True
4 False
Name: a, dtype: bool
Note: in the actual production code, there are many more column conditions ((df['a'] == df_other['a']) & (df['b'] == df_other['b'] & ...), but they are generally less than the number of rows in df, so I wouldn't mind a solution that loops over the conditions (and subsequently sets values in bm to false).
Also, the bitmap should have the shape of (len(df), len(df_other['c'].unique)).
numpy broadcasting is so useful here:
bm = df_other.values[:, 0] == df.values
Output:
>>> bm
array([[ True, True, False, False, False],
[False, False, True, False, False],
[False, False, False, True, False]])
If you need it as ints:
>>> bm.astype(int)
array([[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0]])
Another way to do this using pandas methods are as follows:
pd.crosstab(df_other['a'], df_other['c']).reindex(df['a']).to_numpy(dtype=int)
Output:
array([[1, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0]])
a = np.array([1, 2, 3])
b = np.array([4, 2, 3, 1, 0])
c = np.setdiff1d(b, a)
print("c", c)
The result is c [0, 4] but the answer I want is c [4 0].
How can I do that?
Get the mask of non-matches with np.in1d and simply boolean-index into b to retain the order of elements in it -
b[~np.in1d(b,a)]
Sample step-by-step run -
In [14]: a
Out[14]: array([1, 2, 3])
In [15]: b
Out[15]: array([4, 2, 3, 1, 0])
In [16]: ~np.in1d(b,a)
Out[16]: array([ True, False, False, False, True], dtype=bool)
In [17]: b[~np.in1d(b,a)]
Out[17]: array([4, 0])
If you want c to 1) have the elements of b that are not in a and 2) for them to be in the same order as they were in b you can use a list comprehension:
c = np.array([el for el in b if el not in a])
setdiff1d treats the arrays as sets and thus: 1) does not respect the order of the elements, 2) any element present more than once is treated as if it was present only once. For example this code:
a = np.array([4])
b = np.array([2, 4, 2, 1, 1])
c = np.setdiff1d(b, a)
print(c)
Will yield
[1, 2]
I would like to remove duplicates which follow each other, but not duplicates along the whole array. Also, I want to keep the ordering unchanged.
So if the input is [0 0 1 3 2 2 3 3] the output should be [0 1 3 2 3]
I found a way using itertools.groupby() but I am looking for a faster NumPy solution.
a[np.insert(np.diff(a).astype(np.bool), 0, True)]
Out[99]: array([0, 1, 3, 2, 3])
The general idea is to use diff to find the difference between two consecutive elements in the array. Then we only index those which give non-zero differences elements. But since the length of diff is shorter by 1. So before indexing, we need to insert the True to the beginning of the diff array.
Explanation:
In [100]: a
Out[100]: array([0, 0, 1, 3, 2, 2, 3, 3])
In [101]: diff = np.diff(a).astype(np.bool)
In [102]: diff
Out[102]: array([False, True, True, True, False, True, False], dtype=bool)
In [103]: idx = np.insert(diff, 0, True)
In [104]: idx
Out[104]: array([ True, False, True, True, True, False, True, False], dtype=bool)
In [105]: a[idx]
Out[105]: array([0, 1, 3, 2, 3])
For pure python wich also works with numpy arrays use this:
def modify(l):
last = None
for e in l:
if e != last:
yield e
last = e
pure = modify([0, 0, 1, 3, 2, 2, 3, 3])
import numpy
num = numpy.array(modify(numpy.array([0, 0, 1, 3, 2, 2, 3, 3])))
I don't know if there are any numpy functions wich would speed this up.
For NumPy version >= 1.16.0 you can use the prepend argument:
a[np.diff(a, prepend=np.nan).astype(bool)]
I would like to be able to quickly instantiate a matrix where the first few (variable number of) cells in a row are 0, and the rest are ones.
Imagine we want a 3x4 matrix.
I have instantiated the matrix first as all ones:
ones = np.ones([4,3])
Then imagine we have an array that announces how many leading zeros there are:
arr = np.array([2,1,3,0]) # first row has 2 zeroes, second row 1 zero, etc
Required result:
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[1, 1, 1]])
Obviously this can be done in the opposite way as well, but I'd consider the approach where 1 is a default value, and zeros would be replaced.
What would be the best way to avoid some silly loop?
Here's one way. n is the number of columns in the result. The number of rows is determined by len(arr).
In [29]: n = 5
In [30]: arr = np.array([1, 2, 3, 0, 3])
In [31]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[31]:
array([[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 1, 1]])
There are two parts to the explanation of how this works. First, how to create a row with m zeros and n-m ones? For that, we use np.arange to create a row with values [0, 1, ..., n-1]`:
In [35]: n
Out[35]: 5
In [36]: np.arange(n)
Out[36]: array([0, 1, 2, 3, 4])
Next, compare that array to m:
In [37]: m = 2
In [38]: np.arange(n) >= m
Out[38]: array([False, False, True, True, True], dtype=bool)
That gives an array of boolean values; the first m values are False and the rest are True. By casting those values to integers, we get an array of 0s and 1s:
In [39]: (np.arange(n) >= m).astype(int)
Out[39]: array([0, 0, 1, 1, 1])
To perform this over an array of m values (your arr), we use broadcasting; this is the second key idea of the explanation.
Note what arr[:, np.newaxis] gives:
In [40]: arr
Out[40]: array([1, 2, 3, 0, 3])
In [41]: arr[:, np.newaxis]
Out[41]:
array([[1],
[2],
[3],
[0],
[3]])
That is, arr[:, np.newaxis] reshapes arr into a 2-d array with shape (5, 1). (arr.reshape(-1, 1) could have been used instead.) Now when we compare this to np.arange(n) (a 1-d array with length n), broadcasting kicks in:
In [42]: np.arange(n) >= arr[:, np.newaxis]
Out[42]:
array([[False, True, True, True, True],
[False, False, True, True, True],
[False, False, False, True, True],
[ True, True, True, True, True],
[False, False, False, True, True]], dtype=bool)
As #RogerFan points out in his comment, this is basically an outer product of the arguments, using the >= operation.
A final cast to type int gives the desired result:
In [43]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[43]:
array([[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 0, 1, 1]])
Not as concise as I wanted (I was experimenting with mask_indices), but this will also do the work:
>>> n = 3
>>> zeros = [2, 1, 3, 0]
>>> numpy.array([[0] * zeros[i] + [1]*(n - zeros[i]) for i in range(len(zeros))])
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[1, 1, 1]])
>>>
Works very simple: concatenates multiplied required number of times, one-element lists [0] and [1], creating the array row by row.
Is it possible to convert an array of indices to an array of ones and zeros, given the range?
i.e. [2,3] -> [0, 0, 1, 1, 0], in range of 5
I'm trying to automate something like this:
>>> index_array = np.arange(200,300)
array([200, 201, ... , 299])
>>> mask_array = ??? # some function of index_array and 500
array([0, 0, 0, ..., 1, 1, 1, ... , 0, 0, 0])
>>> train(data[mask_array]) # trains with 200~299
>>> predict(data[~mask_array]) # predicts with 0~199, 300~499
Here's one way:
In [1]: index_array = np.array([3, 4, 7, 9])
In [2]: n = 15
In [3]: mask_array = np.zeros(n, dtype=int)
In [4]: mask_array[index_array] = 1
In [5]: mask_array
Out[5]: array([0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0])
If the mask is always a range, you can eliminate index_array, and assign 1 to a slice:
In [6]: mask_array = np.zeros(n, dtype=int)
In [7]: mask_array[5:10] = 1
In [8]: mask_array
Out[8]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
If you want an array of boolean values instead of integers, change the dtype of mask_array when it is created:
In [11]: mask_array = np.zeros(n, dtype=bool)
In [12]: mask_array
Out[12]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False], dtype=bool)
In [13]: mask_array[5:10] = True
In [14]: mask_array
Out[14]:
array([False, False, False, False, False, True, True, True, True,
True, False, False, False, False, False], dtype=bool)
For a single dimension, try:
n = (15,)
index_array = [2, 5, 7]
mask_array = numpy.zeros(n)
mask_array[index_array] = 1
For more than one dimension, convert your n-dimensional indices into one-dimensional ones, then use ravel:
n = (15, 15)
index_array = [[1, 4, 6], [10, 11, 2]] # you may need to transpose your indices!
mask_array = numpy.zeros(n)
flat_index_array = np.ravel_multi_index(
index_array,
mask_array.shape)
numpy.ravel(mask_array)[flat_index_array] = 1
There's a nice trick to do this as a one-liner, too - use the numpy.in1d and numpy.arange functions like this (the final line is the key part):
>>> x = np.linspace(-2, 2, 10)
>>> y = x**2 - 1
>>> idxs = np.where(y<0)
>>> np.in1d(np.arange(len(x)), idxs)
array([False, False, False, True, True, True, True, False, False, False], dtype=bool)
The downside of this approach is that it's ~10-100x slower than the appropch Warren Weckesser gave... but it's a one-liner, which may or may not be what you're looking for.
As requested, here it is in an answer. The code:
[x in index_array for x in range(500)]
will give you a mask like you asked for, but it will use Bools instead of 0's and 1's.