numpy.union that preserves order - python

Two arrays have been produced by dropping random values from an original array (with unique, unsorted elements):
orig = np.array([2, 1, 7, 5, 3, 8])
Let's say these arrays are:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3, 8])
Given just these two arrays, I need to merge them so that the dropped values end up in their correct positions.
The result should be:
result = np.array([2, 1, 7, 3, 8])
Another example:
a1 = np.array([2, 1, 7, 5, 8])
b1 = np.array([2, 5, 3, 8])
# the result should be: [2, 1, 7, 5, 3, 8]
Edit:
This question is ambiguous because it is unclear what to do in this situation:
a2 = np.array([2, 1, 7, 8])
b2 = np.array([2, 5, 3, 8])
# the result should be: ???
What I have in reality + solution:
The elements of these arrays are indices of two data frames containing time series. I can use pandas.merge_ordered to obtain the ordered indices I want.
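Since in the real data the indices are time stamps in increasing order, a sorted merge is acceptable there. A minimal sketch of that approach (the column name t and the toy values are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Two hypothetical time-series indices, already monotonically increasing,
# wrapped in DataFrames so pandas.merge_ordered can align them.
left = pd.DataFrame({'t': [1, 3, 5, 8]})
right = pd.DataFrame({'t': [1, 5, 6, 8]})

# merge_ordered performs an ordered outer merge on the key column.
merged = pd.merge_ordered(left, right, on='t')
print(merged['t'].to_numpy())  # [1 3 5 6 8]
```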
My previous attempts:
numpy.union1d is not suitable, because it always sorts:
np.union1d(a, b)
# array([1, 2, 3, 7, 8]) - not what I want
Maybe pandas could help?
These methods use the first array in full, and then append the leftover values of the second one:
pd.concat([pd.Series(index=a, dtype=int), pd.Series(index=b, dtype=int)], axis=1).index.to_numpy()
pd.Index(a).union(b, sort=False).to_numpy() # jezrael's version
# array([2, 1, 7, 8, 3]) - not what I want

The idea is to interleave both arrays (a column-major flatten) and then remove duplicates while preserving order:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3, 8])
c = np.vstack((a, b)).ravel(order='F')
_, idx = np.unique(c, return_index=True)
c = c[np.sort(idx)]
print(c)
[2 1 7 3 8]
Pandas solution:
c = pd.DataFrame([a,b]).unstack().unique()
print(c)
[2 1 7 3 8]
If the arrays have a different number of values:
a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3])
c = pd.DataFrame({'a':pd.Series(a), 'b':pd.Series(b)}).stack().astype(int).unique()
print(c)
[2 1 7 3 8]
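The same unequal-length case can also be handled without pandas. A pure-Python sketch of the interleave-and-dedupe idea (the sentinel padding is my own addition, not part of the original answers):

```python
import numpy as np
from itertools import zip_longest

a = np.array([2, 1, 7, 8])
b = np.array([2, 7, 3])

# Interleave element-wise, padding the shorter array with a sentinel,
# then drop the sentinel and dedupe while keeping first-seen order
# (dict.fromkeys preserves insertion order).
SENTINEL = object()
interleaved = [v for pair in zip_longest(a, b, fillvalue=SENTINEL)
               for v in pair if v is not SENTINEL]
c = np.array(list(dict.fromkeys(int(v) for v in interleaved)))
print(c)  # [2 1 7 3 8]
```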

How to remove specific elements in a numpy array (passing a list of values not indexes)

I have a 1D numpy array and a list of values to remove (not indexes). How can I modify this code so that the actual values, not their indexes, are removed?
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
values_to_remove = [2, 3, 6]
new_a = np.delete(a, values_to_remove)  # wrong: this treats the values as indexes
So what I want to delete are the values 2, 3, 6, NOT their corresponding indexes. The list is quite long, so ideally I should be able to pass it directly as the second parameter.
So the final array should be: 1, 4, 5, 7, 8, 9
Use this:
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
values_to_remove = [2, 3, 6]
for value in values_to_remove:
    index = np.where(a == value)   # indexes where the value occurs
    a = np.delete(a, index[0][0])  # delete by index, one value at a time
print(a)
Output:
[1 4 5 7 8 9]
You can use numpy.isin
If you don't mind a copy:
out = a[~np.isin(a, values_to_remove)]
Output: array([1, 4, 5, 7, 8, 9])
Note that np.delete also returns a copy (it never modifies the array in place), so reassign the result:
a = np.delete(a, np.isin(a, values_to_remove))
a is now: array([1, 4, 5, 7, 8, 9])
Intermediate:
np.isin(a, values_to_remove)
# array([False, True, True, False, False, True, False, False, False])
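If the order of the surviving elements doesn't matter (or the input is already sorted, as here), np.setdiff1d is another option; note that it returns a sorted copy:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
values_to_remove = [2, 3, 6]

# Values of `a` that are not in `values_to_remove`; the result is sorted.
out = np.setdiff1d(a, values_to_remove)
print(out)  # [1 4 5 7 8 9]
```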

How can I get each combination of a set of arrays in python

How can I (efficiently) get each combination of a group of 1D-arrays into a 2D array?
Let's say I have arrays A, B, C, and D and I want to create a 2D array with each combination such that I would have 2D arrays that represent AB, AC, AD, ABC, ABD, ..., CD.
For clarity on my notation above:
A = np.array([1,2,3,4,5])
B = np.array([2,3,4,5,6])
C = np.array([3,4,5,6,7])
so
AB = np.array([[1,2,3,4,5], [2,3,4,5,6]])
ABC = np.array([[1,2,3,4,5], [2,3,4,5,6], [3,4,5,6,7]])
So far I have tried something like:
A = np.array([1,2,3,4,5])
B = np.array([2,3,4,5,6])
C = np.array([3,4,5,6,7])
D = np.array([4,5,6,7,8])
stacked = np.vstack((A, B, C, D))  # note: np.vstack takes no axis argument
combos = []
it2 = itertools.combinations(range(4), r=2)
for i in it2:
    combos.append(i)
it3 = itertools.combinations(range(4), r=3)
for i in it3:
    combos.append(i)
it4 = itertools.combinations(range(4), r=4)
for i in it4:
    combos.append(i)
which gets me a list of all the possible combos. Then I can apply something like:
for combo in combos:
    stacked[combo, :]
    # Then I do something with each combo
And this is where I get stuck
This is fine when it's only A, B, C, D, but if I have A, B, C, ..., X, Y, Z, the approach above doesn't scale, as I'd have to call itertools 20+ times.
How can I overcome this and make it more flexible (in practice the number of arrays will likely be 5-10)?
As others have also recommended, use itertools.combinations
import numpy as np
from itertools import combinations
A = np.array([1,2,3,4,5])
B = np.array([2,3,4,5,6])
C = np.array([3,4,5,6,7])
arrays = [A, B, C]
combos = []
for i in range(2, len(arrays) + 1):
    combos.extend(combinations(arrays, i))

for combo in combos:
    arr = np.vstack(combo)  # do stuff with array
You can use an additional outer for-loop:
import itertools as it

arrays = np.array([  # let's say your input arrays are stored as one 2d array
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    ...
])
combos = []
for r in range(2, len(arrays) + 1):
    combos.extend(it.combinations(range(len(arrays)), r=r))
When you have N items, there are 2^N subsets, so this will take on the order of 2^N iterations.
You can go through these 2^N iterations with a single loop: iterate n over the range 0 <= n < 2^N and use bitwise operations to select the items from the list according to the current n.
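A minimal sketch of that bitmask idea (variable names are illustrative; each bit of n decides whether the corresponding array is included):

```python
import numpy as np

arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([2, 3, 4, 5, 6]),
          np.array([3, 4, 5, 6, 7])]
N = len(arrays)

stacks = []
for n in range(1, 2 ** N):                     # each n encodes one non-empty subset
    picked = [arrays[i] for i in range(N) if n & (1 << i)]
    if len(picked) >= 2:                       # the question only wants pairs and up
        stacks.append(np.vstack(picked))

print(len(stacks))  # 4 subsets of size >= 2 for N = 3
```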
You could try this:
from itertools import combinations
A = np.array([1,2,3,4,5])
B = np.array([2,3,4,5,6])
C = np.array([3,4,5,6,7])
lst = [A,B,C]
[list(combinations(lst, i)) for i in range(1,len(lst)+1)]
out:
# [[(array([1, 2, 3, 4, 5]),),
# (array([2, 3, 4, 5, 6]),),
# (array([3, 4, 5, 6, 7]),)],
# [(array([1, 2, 3, 4, 5]), array([2, 3, 4, 5, 6])),
# (array([1, 2, 3, 4, 5]), array([3, 4, 5, 6, 7])),
# (array([2, 3, 4, 5, 6]), array([3, 4, 5, 6, 7]))],
# [(array([1, 2, 3, 4, 5]), array([2, 3, 4, 5, 6]), array([3, 4, 5, 6, 7]))]]

How can I merge rows in np matrix?

I've got a numpy matrix that has 2 rows and N columns, e.g. (if N=4):
[[ 1 3 5 7]
[ 2 4 6 8]]
The goal is to create the string 1,2,3,4,5,6,7,8.
Merge the rows such that the elements of the first row take the odd positions (1, 3, ..., 2N-1; the index starts from 1) and the elements of the second row take the even positions (2, 4, ..., 2N).
The following code works, but it isn't really nice:
xs = []
for i in range(number_of_cols):
    xs.append(nums.item(0, i))
ys = []
for i in range(number_of_cols):
    ys.append(nums.item(1, i))
nums_str = ""
for i in range(number_of_cols):
    nums_str += '{},{},'.format(xs[i], ys[i])
Then I join the result list with a comma as a delimiter (','.join(row)).
How can I merge the rows using built in functions (or just in a more elegant way overall)?
Specify F order when flattening (or ravel):
In [279]: arr = np.array([[1,3,5,7],[2,4,6,8]])
In [280]: arr
Out[280]:
array([[1, 3, 5, 7],
[2, 4, 6, 8]])
In [281]: arr.ravel(order='F')
Out[281]: array([1, 2, 3, 4, 5, 6, 7, 8])
Joining rows end-to-end can be done this way (apply it to a.T to get the interleaved order asked for):
>>> a = np.arange(12).reshape(3,4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.hstack([a[i,:] for i in range(a.shape[0])])
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Then it's simple to convert this array into string.
Here's one way of doing it:
out_str = ','.join(nums.T.ravel().astype('str'))
We first transpose the array with .T, then flatten it with .ravel(), then convert each element from int to str, and finally apply ','.join() to combine all the string elements.
Trying it out:
import numpy as np
nums = np.array([[1,3,5,7],[2,4,6,8]])
out_str = ','.join(nums.T.ravel().astype('str'))
print(out_str)
Result:
1,2,3,4,5,6,7,8

Python: Creating list of subarrays

I have a massive array, but for illustration I am using an array of size 14. I have another list containing 2, 3, 3, 6. How do I efficiently, without a for loop, create a list of new arrays such that:
import numpy as np
A = np.array([1,2,4,5,7,1,2,4,5,7,2,8,12,3]) # array with 1 axis
subArraysizes = np.array([2, 3, 3, 6])  # sums to the number of elements in A
# desired result:
# B[0] = [1, 2]
# B[1] = [4, 5, 7]
# B[2] = [1, 2, 4]
# B[3] = [5, 7, 2, 8, 12, 3]
i.e. select the first 2 elements from A and store them in B, select the next 3 elements of A and store them in B, and so on, in the order they appear in A.
You can use np.split -
B = np.split(A,subArraysizes.cumsum())[:-1]
Sample run -
In [75]: A
Out[75]: array([ 1, 2, 4, 5, 7, 1, 2, 4, 5, 7, 2, 8, 12, 3])
In [76]: subArraysizes
Out[76]: array([2, 3, 3, 6])
In [77]: np.split(A,subArraysizes.cumsum())[:-1]
Out[77]:
[array([1, 2]),
array([4, 5, 7]),
array([1, 2, 4]),
array([ 5, 7, 2, 8, 12, 3])]
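For reference, np.split with cumulative sums is equivalent to slicing at the running totals by hand; a sketch:

```python
import numpy as np

A = np.array([1, 2, 4, 5, 7, 1, 2, 4, 5, 7, 2, 8, 12, 3])
sizes = np.array([2, 3, 3, 6])

# Each chunk runs from the previous running total to the next one.
stops = sizes.cumsum()
starts = stops - sizes
B = [A[i:j] for i, j in zip(starts, stops)]
print([chunk.tolist() for chunk in B])
# [[1, 2], [4, 5, 7], [1, 2, 4], [5, 7, 2, 8, 12, 3]]
```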

Fastest way to count identical sub-arrays in a nd-array?

Let's consider a 2d-array A
2 3 5 7
2 3 5 7
1 7 1 4
5 8 6 0
2 3 5 7
The first, second and last rows are identical. The algorithm I'm looking for should return the number of identical rows for each distinct row (i.e. the number of duplicates of each row). If the script can easily be modified to also count identical columns, that would be great.
I use an inefficient naive algorithm to do that:
import numpy

A = numpy.array([[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]])
i = 0
end = len(A)
while i < end:
    print(i)
    j = i + 1
    numberID = 1
    while j < end:
        print(j)
        if numpy.array_equal(A[i, :], A[j, :]):
            numberID += 1
        j += 1
    i += 1
print(A, len(A))
Expected result:
array([3,1,1]) # number identical arrays per line
My algorithm uses native Python loops over numpy data, and is therefore inefficient. Thanks for any help.
In numpy >= 1.9.0, np.unique has a return_counts keyword argument you can combine with the solution here to get the counts:
b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])
>>> unq_a
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> unq_cnt
array([1, 3, 1])
In an older numpy, you can replicate what np.unique does, which would look something like:
a_view = np.array(A, copy=True)
a_view = a_view.view(np.dtype((np.void,
a_view.dtype.itemsize*a_view.shape[1]))).ravel()
a_view.sort()
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
a_unq = A[a_flag]
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)
>>> a_unq
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> a_cnt
array([1, 3, 1])
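On newer NumPy (1.13+), np.unique accepts an axis argument, which makes the void-view trick unnecessary; a sketch (unique rows come back lexicographically sorted):

```python
import numpy as np

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Unique rows and how often each occurs.
unq, cnt = np.unique(A, axis=0, return_counts=True)
print(cnt)  # [1 3 1]
```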
You can lexsort on the row entries, which gives you the indices for traversing the rows in sorted order, making the duplicate search a single O(n) pass after an O(n log n) sort, rather than O(n^2). Note that np.lexsort treats the last key as the primary sort key, i.e. the rows are 'alphabetized' right to left rather than left to right.
In [9]: a
Out[9]:
array([[2, 3, 5, 7],
[2, 3, 5, 7],
[1, 7, 1, 4],
[5, 8, 6, 0],
[2, 3, 5, 7]])
In [10]: np.lexsort(a.T)
Out[10]: array([3, 2, 0, 1, 4])
In [11]: a[np.lexsort(a.T)]
Out[11]:
array([[5, 8, 6, 0],
[1, 7, 1, 4],
[2, 3, 5, 7],
[2, 3, 5, 7],
[2, 3, 5, 7]])
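The counting step left implicit above can be sketched as a run-length pass over the sorted rows (variable names are mine):

```python
import numpy as np

a = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

srt = a[np.lexsort(a.T)]                          # rows in sorted order
new_group = np.any(srt[1:] != srt[:-1], axis=1)   # True where a new row value starts
edges = np.flatnonzero(np.concatenate(([True], new_group, [True])))
counts = np.diff(edges)                           # occurrences per sorted unique row
print(counts)  # [1 1 3]
```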
You can use Counter class from collections module for this.
It works like this:
from collections import Counter

x = [2, 2, 1, 5, 2]
c = Counter(x)
print(c)
Output : Counter({2: 3, 1: 1, 5: 1})
The only issue you will face in your case is that every value of x is itself an array, which is not a hashable data structure.
If you convert every row of x into a tuple, it works:
x = [(2, 3, 5, 7),(2, 3, 5, 7),(1, 7, 1, 4),(5, 8, 6, 0),(2, 3, 5, 7)]
from collections import Counter

c = Counter(x)
print(c)
Output : Counter({(2, 3, 5, 7): 3, (5, 8, 6, 0): 1, (1, 7, 1, 4): 1})
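Applied directly to the original 2D array, the row-to-tuple conversion can be done with map:

```python
import numpy as np
from collections import Counter

A = np.array([[2, 3, 5, 7],
              [2, 3, 5, 7],
              [1, 7, 1, 4],
              [5, 8, 6, 0],
              [2, 3, 5, 7]])

# Rows of a 2D array are unhashable numpy arrays; tuples are hashable.
counts = Counter(map(tuple, A))
print(counts[(2, 3, 5, 7)])  # 3
```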
