Compare 2 numpy arrays get positional difference - python

So I have two arrays, as shown in the example below:
import os
import numpy as np
tiA = np.array([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2]) * 1000
tiB = np.array([0.1,0.2,0.4,0.5,0.6,0.7,0.8,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2]) * 1000
res = [idx for idx, elem in enumerate(tiB)
       if elem != tiA[idx]]
print(res)
It gives me an answer of [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. However, I wanted to get positions 3 and 9, i.e. [2, 8], as the answer, because 0.3 and 0.9 are missing from tiB compared to tiA.
Also, how can I use this answer to select from a 4D array? My array has size arrayA = (128 x 128 x 5 x len(tiA)), and I want my new array, arrayB = (128 x 128 x 5 x len(tiB)), to be selected from arrayA. So basically arrayB will be missing entries [2, 8] along the 4th dimension compared to arrayA, as in my example. My problem is that most of the time there can be multiple differences (1, 2, or 3 values missing) between tiA and tiB. Thank you for all your help.
Kevin

Your code was already a good start, but you need to check whether the element is missing from the other array, not whether the elements at the same position are equal.
import numpy as np
tiA = np.array([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2]) * 1000
tiB = np.array([0.1,0.2,0.4,0.5,0.6,0.7,0.8,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2]) * 1000
res = [idx for idx, elem in enumerate(tiA) if elem not in tiB]
print(res)
For the second part you should be able to use np.delete. You just need to pass the indices (res) and the correct axis (which should be 3 in your case).
arrayB = np.delete(arrayA, res, 3)
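As a side note (not part of the original answer), the same indices can also be found without a Python loop; a minimal sketch using np.isin, with arrayA replaced by random placeholder data just for illustration:
import numpy as np
tiA = np.array([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2]) * 1000
tiB = np.array([0.1,0.2,0.4,0.5,0.6,0.7,0.8,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2]) * 1000
# positions in tiA whose value does not occur anywhere in tiB
res = np.where(~np.isin(tiA, tiB))[0]
print(res)  # [2 8]
arrayA = np.random.rand(128, 128, 5, len(tiA))  # placeholder data standing in for the real 4D array
arrayB = np.delete(arrayA, res, axis=3)
print(arrayB.shape)  # (128, 128, 5, 18)
Since the time values are floats, exact membership tests can be fragile in general; here both arrays are built by the same computation, so the comparison is exact, but an np.isclose-based match is safer for arbitrary data.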

Related

How do I generate a list of all possible combinations from a single element in a pair, from numerous pairs nested within a parent list?

I find myself in a situation where I need to multiply single elements within a listed pair of numbers, where each pair is nested within a parent list of elements. For example, I have my pre-defined variables as:
output = []
initial_list = [[1,2],[3,4],[5,6]]
I am trying to calculate an output such that each element is the product of a unique combination (always of length len(initial_list)) of a single element from each pair. Using my example of initial_list, I am looking to generate an output of length pow(2, len(initial_list)) that is scalable for any number "n" of pairs in initial_list (with a minimum of 2 pairs). So in this case each element of the output would be as follows:
output[0] = 1 * 3 * 5
output[1] = 1 * 3 * 6
output[2] = 1 * 4 * 5
output[3] = 1 * 4 * 6
output[4] = 2 * 3 * 5
output[5] = 2 * 3 * 6
output[6] = 2 * 4 * 5
output[7] = 2 * 4 * 6
In my specific case, the order of output assignments does not matter other than output[0], which I need to be the product of the first element in each pair in initial_list. What is the best way to generate an output list such that each element is the product of a unique combination of one element from each pair?
...
My initial approach consisted of using:
from itertools import combinations
from itertools import permutations
from itertools import product
to somehow generate a list of every possible combination, then multiply the elements of each combination together and append each product to the output list, but I couldn't figure out a way to implement these tools successfully. I have since tried to create a recursive function that combines for x in range(2): with nested recursive calls, but once again I cannot figure out a solution.
Someone more experienced and smarter than me, please help me out; any and all help is appreciated! Thank you!
Without using any external library
def multi_comb(my_list):
    """
    Returns the products of every possible combination
    of one element from each pair in `my_list`,
    which has the form [[a1, a2], [b1, b2], ...].
    Arg: List
    Return: List
    """
    if not my_list:
        return [1]
    a, b = my_list.pop(0)
    result = multi_comb(my_list)
    left = [a * i for i in result]
    right = [b * i for i in result]
    return left + right

print(multi_comb([[1, 2], [3, 4], [5, 6]]))
# Output
# [15, 18, 20, 24, 30, 36, 40, 48]
I am using recursion to get the result. Here's a visual illustration of how this works.
Instead of taking a top-down approach, we can take a bottom-up approach to better understand how this program works.
At the last step, a and b become 5 and 6 respectively. Calling multi_comb() with an empty list returns [1]. So left and right become [5] and [6], and we return [5, 6] to the previous step.
At the second-to-last step, a and b were 3 and 4 respectively. From the last step we got [5, 6] as the result. After multiplying each of the values inside the result by a and b (notice left and right), we return [15, 18, 20, 24] to the previous step.
At our first step, that is our starting step, a and b were 1 and 2 respectively. The value returned from the last step becomes our result, i.e. [15, 18, 20, 24]. Now we multiply this result by both a and b and return the final output.
Note:
This program works only if the list is of the form [[a1, a2], [b1, b2], [c1, c2], ...], as stated by the OP in the comments. Handling sub-lists of n items would require slightly different code, but the concept is the same as in this answer (see the sketch below).
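As an aside (not part of the original answer), here is a minimal sketch of the same recursive idea generalized to sub-lists of any length, assuming the input is a list of non-empty lists:
def multi_comb_n(my_list):
    # Products of every combination of one element per sub-list,
    # for sub-lists of arbitrary length.
    if not my_list:
        return [1]
    head, *rest = my_list
    result = multi_comb_n(rest)
    return [x * r for x in head for r in result]

print(multi_comb_n([[1, 2], [3, 4], [5, 6]]))
# [15, 18, 20, 24, 30, 36, 40, 48]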
This problem can also be solved using dynamic programming:
output = [1]
for arr in initial_list:
    output = [a * b for a in arr for b in output]
This problem is easy to solve if you have just one subarray -- the output is the given subarray.
Suppose you solved the problem for the first n - 1 subarrays and you got the output. The new subarray is appended. How should the output change? The new output is all pair-wise products of the previous output and the "new" subarray.
Look closely, there's an easy pattern. Let there be n sublists, and 2 elements in each: at index 0 and 1. Now, the indexes selected can be represented as a binary string of length n.
It'll start with 0000..000, then 0000...001, 0000...010 and so on. So all you need to do is:
n = len(lst)
output = []
for i in range(2 ** n):
    binary = bin(i)[2:].zfill(n)  # binary representation, padded to n digits
    p = 1
    for j in range(n):
        if binary[j] == "1":
            p *= lst[j][1]  # include the jth list's element at index 1
        else:
            p *= lst[j][0]  # include the jth list's element at index 0
    output.append(p)
The problem with scaling this solution is that, since you're generating all possible combinations, the time complexity will be O(2^n).
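For reference (an aside, not in the original answer): with lst = [[1, 2], [3, 4], [5, 6]], the filled-in loop above collects output = [15, 18, 20, 24, 30, 36, 40, 48], matching the other answers.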
Your idea to use itertools.product is great!
import itertools
initial_list = [[1,2],[3,4],[5,6]]
combinations = list(itertools.product(*initial_list))
# [(1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6)]
Now, you can get the product of each tuple in combinations using for loops, or using functools.reduce, or you can use math.prod, which was introduced in Python 3.8:
import itertools
import math
initial_list = [[1,2],[3,4],[5,6]]
output = [math.prod(c) for c in itertools.product(*initial_list)]
# [15, 18, 20, 24, 30, 36, 40, 48]
import itertools
import functools
import operator
initial_list = [[1,2],[3,4],[5,6]]
output = [functools.reduce(operator.mul, c) for c in itertools.product(*initial_list)]
# [15, 18, 20, 24, 30, 36, 40, 48]
import itertools
output = []
for c in itertools.product(*initial_list):
    p = 1
    for x in c:
        p *= x
    output.append(p)
# output == [15, 18, 20, 24, 30, 36, 40, 48]
Note: if you are more familiar with lambdas, operator.mul is pretty much equivalent to lambda x,y: x*y.
itertools.product and math.prod are a nice fit -
from itertools import product
from math import prod
input = [[1,2],[3,4],[5,6]]
output = [prod(x) for x in product(*input)]
print(output)
[15, 18, 20, 24, 30, 36, 40, 48]

Get maximum value of each index between multiple numpy arrays

So I have three NumPy arrays, all with 300 elements in them. Is there any way I could create a new array with the greatest value at each index? I'm not sure where to start, since I'm not comparing numbers in the same list. I know there is some kind of loop where you go from 0 to the length and you need to initialize an empty array to populate, but I'm not sure how you'd compare the values at each index. Very likely I'm overthinking.
Ex.
a = [16,24,52]
b = [22,15,136]
c = [9,2,142]
Output = [22,24,142]
Since all your arrays have the same length, you can stack them vertically using np.vstack. Then use np.max on axis=0:
import numpy as np
a = np.array([16, 24, 52])
b = np.array([22, 15, 136])
c = np.array([9, 2, 142])
out = np.max(np.vstack((a, b, c)), axis=0)
print(out)
Output:
[ 22 24 142]
Hope that helps!
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.8.1
NumPy: 1.18.1
----------------------------------------
You can use amax.
np.amax(np.array([a,b,c]), axis=0)
Output:
array([ 22, 24, 142])
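Another option (an aside, not from the original answers) is NumPy's element-wise maximum, which avoids stacking; a minimal sketch with the same a, b, c as above:
import numpy as np
a = np.array([16, 24, 52])
b = np.array([22, 15, 136])
c = np.array([9, 2, 142])
# element-wise maximum, applied pairwise across the three arrays
out = np.maximum(a, np.maximum(b, c))
print(out)  # [ 22  24 142]
For more than a handful of arrays, np.maximum.reduce([a, b, c]) does the same thing in one call.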
If you want to follow your original idea involving a loop and the initialization of an array, you can use np.zeros() followed by range() and max():
import numpy as np
a = np.array([16, 24, 52])
b = np.array([22, 15, 136])
c = np.array([9, 2, 142])
# initialize array filled with zeros
Output = np.zeros(len(a), dtype=int)
# populate array
for i in range(len(a)):
    Output[i] = max(a[i], b[i], c[i])
print(Output)
Output:
[ 22 24 142]

How to efficiently create a frequency table of numbers of entries in an array Python

I'm trying to implement an efficient way of creating a frequency table in python, with a rather large numpy input array of ~30 million entries. Currently I am using a for-loop, but it's taking far too long.
The input is an ordered numpy array of the form
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9..... etc])
And I would like to have an output of the form:
Z = {4: 3, 5: 0, 6: 2, 7: 1, 8: 1, 9: 3, ... etc} (as any data type)
Currently I am using the following implementation:
Z = pd.Series(index = np.arange(Y.min(), Y.max()))
for i in range(Y.min(), Y.max()):
    Z[i] = (Y == i).sum()
Is there a quicker way of doing this or a way without iterating through a loop? Thanks for helping, and sorry if this has been asked before!
You can simply do this using Counter from the collections module. Please see the code below, which I ran for your test case.
import numpy as np
from collections import Counter
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9,10,5,5,5])
print(Counter(Y))
It gave the following output
Counter({4: 3, 9: 3, 5: 3, 6: 2, 7: 1, 8: 1, 10: 1})
You can easily use this object for further processing. I hope this helps.
If your input array x is sorted, you can do the following to get the counts in linear time:
diff1 = np.diff(x)
# get indices of the elements at which jumps occurred
jumps = np.concatenate([[0], np.where(diff1 > 0)[0] + 1, [len(x)]])
unique_elements = x[jumps[:-1]]
counts = np.diff(jumps)
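For example (an aside, not part of the original answer), applied to a small sorted sample like the Y from the question:
import numpy as np
x = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9])  # sorted input, as this answer assumes
diff1 = np.diff(x)
jumps = np.concatenate([[0], np.where(diff1 > 0)[0] + 1, [len(x)]])
unique_elements = x[jumps[:-1]]
counts = np.diff(jumps)
print(unique_elements)  # [4 6 7 8 9]
print(counts)           # [3 2 1 1 3]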
I think numpy.unique is your solution.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.unique.html
import numpy as np
t = np.random.randint(0, 1000, 100000000)
print(np.unique(t, return_counts=True))
This takes ~4 seconds for me.
The collections.Counter approach takes ~10 seconds.
But numpy.unique returns the frequencies in an array, while collections.Counter returns a dictionary. It's a matter of convenience.
Edit: I cannot comment on other posts, so I'll note here that @lomereiter's solution is lightning fast (linear) and should be the accepted one.
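If a dictionary is preferred (an aside, not from the original answers), the np.unique output can be zipped into one; a minimal sketch using the small sample from the Counter example above:
import numpy as np
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9, 10, 5, 5, 5])
values, counts = np.unique(Y, return_counts=True)
Z = dict(zip(values.tolist(), counts.tolist()))  # plain Python ints as keys and values
print(Z)  # {4: 3, 5: 3, 6: 2, 7: 1, 8: 1, 9: 3, 10: 1}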

How to use np.where() to create a new array of specific rows?

I have an array (msaarr) of 1700 values, ranging from approximately 0 to 150. I know that 894 of these values should be less than 2, and I wish to create a new array containing only these values.
So far, I have attempted this code:
Combined = np.zeros(shape=(894,8))
for i in range(len(Spitzer)): #len(Spitzer) = 1700
    index = np.where(msaarr <= 2)
    Combined[:,0] = msaarr[index]
The reason there are eight columns is because I have more data associated with each value in msaarr that I also want to display. msaarr was created using several lines of code, which is why I haven't mentioned them here, but it is an array with shape (1700,1) with type float64.
The problem I'm having is that if I print msaarr[index], then I get an array of shape (893,), but when I attempt to assign this as my zeroth column, I get the error
ValueError: could not broadcast input array from shape (1699) into shape (894)
I also attempted
Combined[:,0] = np.extract(msaarr <= 2, msaarr)
Which gave the same error.
I thought at first this might just be some confusion with Python's zero-indexing, so I tried changing the shape to 893, and also tried to assign to a different column Combined[:,1], but I have the same error every time.
Alternatively, when I try:
Combined[:,1][i] = msaarr[index][i]
I get the error:
IndexError: index 894 is out of bounds for axis 0 with size 894
What am I doing wrong?
EDIT:
A friend pointed out that I might not be calling index correctly because it is a tuple, and so his suggestion was this:
index = np.where(msaarr < 2)
Combined[:,0] = msaarr[index[0][:]]
But I am still getting this error:
ValueError: could not broadcast input array from shape (893,1) into shape (893)
How can my shape be (893) and not (893, 1)?
Also, I did check, and len(index[0][:]) = 893, and len(msaarr[index[0][:]]) = 893.
The full code as of last attempts is:
import numpy as np
from astropy.io import ascii
from astropy.io import fits
targets = fits.getdata('/Users/vcolt/Dropbox/ATLAS source matches/OzDES.fits')
Spitzer = ascii.read(r'/Users/vcolt/Desktop/Catalogue/cdfs_spitzer.csv', header_start=0, data_start=1)
## Find minimum separations, indexed.
RADiffArr = np.zeros(shape=(len(Spitzer),1))
DecDiffArr = np.zeros(shape=(len(Spitzer),1))
msaarr = np.zeros(shape=(len(Spitzer),1))
Combined= np.zeros(shape=(893,8))
for i in range(len(Spitzer)):
    x = Spitzer["RA_IR"][i]
    y = Spitzer["DEC_IR"][i]
    sep = abs(np.sqrt(((x - targets["RA"])*np.cos(np.array(y)))**2 + (y - targets["DEC"])**2))
    minsep = np.nanmin(sep)
    minseparc = minsep*3600
    msaarr[i] = minseparc
    min_positions = [j for j, p in enumerate(sep) if p == minsep]
    x2 = targets["RA"][min_positions][0]
    RADiff = x*3600 - x2*3600
    RADiffArr[i] = RADiff
    y2 = targets["DEC"][min_positions][0]
    DecDiff = y*3600 - y2*3600
    DecDiffArr[i] = DecDiff

index = np.where(msaarr < 2)
print msaarr[index].shape
Combined[:,0] = msaarr[index[0][:]]
I get the same error whether index = np.where(msaarr < 2) is in or out of the loop.
Take a look at using numpy.take in combination with numpy.where.
inds = np.where(msaarr <= 2)
new_msaarr = np.take(msaarr, inds)
If it is a multi-dimensional array, you can also add the axis keyword to take slices along that axis.
I think the loop is not in the right place. np.where() will return an array of the indices of elements that match the condition you specified.
This should suffice:
Index = np.where(msaarr <= 2)
Since Index is an array, you need to loop over it and fill the values into Combined[:,0].
Also, I want to point out one thing: you said that there will be 894 values less than 2, but in the code you are using less than or equal to 2.
np.where(condition) will return a tuple of arrays containing the indices of elements that satisfy your condition.
To get an array of the elements satisfying your condition, use:
new_array = msaarr[msaarr <= 2]
>>> x = np.random.randint(0, 10, (4, 4))
>>> x
array([[1, 6, 8, 4],
[0, 6, 6, 5],
[9, 6, 4, 4],
[9, 6, 8, 6]])
>>> x[x>2]
array([6, 8, 4, 6, 6, 5, 9, 6, 4, 4, 9, 6, 8, 6])
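Tying this back to the question (an aside, a minimal sketch rather than the answerer's code; msaarr is replaced here by stand-in random data with the shape described in the question):
import numpy as np
msaarr = np.random.uniform(0, 150, size=(1700, 1))  # stand-in for the question's (1700, 1) array
selected = msaarr[msaarr < 2]               # boolean indexing on a 2-D array returns a 1-D result
Combined = np.zeros((len(selected), 8))     # size the output from the actual selection
Combined[:, 0] = selected                   # shapes match, so no broadcast error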

Reshaping an dynamic array of numbers to a fixed size

I'm working on a small project where I have to generate line graphs. Say, for example, I need to have 10 points, but the data can be an array of thousands of points.
[1,5,3,5,6,33,9,1,12,4,2]
Considering the array of integers above (11 values), I want to reshape this into an array with 3 values while at the same time adding up the values, for a final result like this:
[14,49,18] (4 values + 4 values + 3 values)
What would the best approach be to have a function that can handle any size (at least bigger than the size being reshaped into) in Python, without any external libraries?
Did you want something like this:
a = [1, 5, 3, 5, 6, 33, 9, 1, 12, 4, 2]
step = 4
print [sum(a[i:i + step]) for i in range(0, len(a), step)]
which outputs
[14, 49, 18]
Inspired by the grouper recipe in itertools docs
data = [1,5,3,5,6,33,9,1,12,4,2]
from itertools import izip_longest
print map(sum, izip_longest(*[iter(data)] * 4, fillvalue = 0))
# [14, 49, 18]
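Note (an aside, not part of the original answers): both snippets above are written for Python 2. A minimal Python 3 sketch of the same two approaches:
from itertools import zip_longest  # izip_longest was renamed in Python 3

data = [1, 5, 3, 5, 6, 33, 9, 1, 12, 4, 2]
step = 4

# slicing approach
print([sum(data[i:i + step]) for i in range(0, len(data), step)])        # [14, 49, 18]

# grouper-recipe approach
print(list(map(sum, zip_longest(*[iter(data)] * step, fillvalue=0))))    # [14, 49, 18]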
