Find Value that Partitions two Numpy Arrays Equally - python

I have two arrays (x1 and x2) of equal length that have overlapping ranges of values.
I need to find a value q such that |l1 - l2| is minimized, where
l1 = x1[np.where(x1 > q)].shape[0]
l2 = x2[np.where(x2 < q)].shape[0]
I need this to be reasonably high-performance since the arrays can be large. A solution using native numpy routines would be preferred.

There may be a smarter way to look for a value, but you can do an exhaustive search as follows:
>>> x1 = np.random.rand(10)
>>> x2 = np.random.rand(10)
>>> x1.sort()
>>> x2.sort()
>>> x1
array([ 0.12568451, 0.30256769, 0.33478133, 0.41973331, 0.46493576,
0.52173197, 0.72289189, 0.72834444, 0.78662283, 0.78796277])
>>> x2
array([ 0.05513774, 0.21567893, 0.29953634, 0.37426842, 0.40000622,
0.54602497, 0.7225469 , 0.80116148, 0.82542633, 0.86736597])
We can compute l1 if q is one of the items in x1 as:
>>> l1_x1 = len(x1) - np.arange(len(x1)) - 1
>>> l1_x1
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
And l2 for the same q as:
>>> l2_x1 = np.searchsorted(x1, x2)
>>> l2_x1
array([ 0, 1, 1, 3, 3, 6, 6, 10, 10, 10], dtype=int64)
You can similarly get values for l1 and l2 when q is in x2:
>>> l2_x2 = np.arange(len(x2))
>>> l2_x2
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> l1_x2 = len(x1) - np.searchsorted(x1, x2, side='right')
>>> l1_x2
array([10, 9, 9, 7, 7, 4, 4, 0, 0, 0], dtype=int64)
And you then simply check for the minimum of l1 - l2:
>>> np.concatenate((l1_x1 - l2_x1, l1_x2 - l2_x2))
array([ 9, 7, 6, 3, 2, -2, -3, -8, -9, -10, 10, 8, 7,
4, 3, -1, -2, -7, -8, -9], dtype=int64)
>>> q_idx = np.argmin(np.abs(np.concatenate((l1_x1 - l2_x1, l1_x2 - l2_x2))))
>>> q = x1[q_idx] if q_idx < len(x1) else x2[q_idx - len(x1)]
>>> q
0.54602497466094291
>>> x1[x1 > q].shape[0]
4L
>>> x2[x2 < q].shape[0]
5L
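The search above can also be wrapped in a single function. This is a minimal sketch, not the answerer's code: the name best_split is hypothetical, and it counts the x2 values below each candidate with np.searchsorted(x2, candidates), which is the quantity l2 as the question defines it. Like the walkthrough, it only tries values that occur in the data as candidates for q.

```python
import numpy as np

def best_split(x1, x2):
    """Return (q, l1, l2) minimizing |l1 - l2| over candidates taken from the data."""
    x1s, x2s = np.sort(x1), np.sort(x2)
    candidates = np.concatenate((x1s, x2s))
    # l1: number of x1 values strictly greater than each candidate
    l1 = len(x1s) - np.searchsorted(x1s, candidates, side='right')
    # l2: number of x2 values strictly less than each candidate
    l2 = np.searchsorted(x2s, candidates, side='left')
    i = np.argmin(np.abs(l1 - l2))
    return candidates[i], int(l1[i]), int(l2[i])

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.5, 3.5, 4.5, 5.5])
q, l1, l2 = best_split(x1, x2)
print(q, l1, l2)
```

Sorting dominates the cost, so the whole search runs in O(n log n) without a Python-level loop.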

I think I may have found a fairly simple way to do it.
x1 = (50 - 10) * np.random.random(10000) + 10
x2 = (75 - 25) * np.random.random(10000) + 25
x1.sort()
x2.sort()
x2 = x2[::-1] # reverse the array
# The overlap point should fall where the difference is smallest
diff = np.abs(x1 - x2)
# get the index of where the minimum occurs
loc = np.where(diff == np.min(diff))
q1 = x1[loc] # 38.79087351
q2 = x2[loc] # 38.79110941
M4rtini's solution produces q = 38.7867527.

This is fundamentally an interval problem so you might want to do some reading on Interval trees, but you don't need to understand interval trees to solve this problem.
If you think of every (x1[i], x2[i]) as being an interval, you're looking for the value q which splits the intervals into two groups as evenly as possible, ignoring intervals that overlap q. Let's take the easy case first:
import numpy as np
x1 = np.array([19, 32, 47, 13, 56, 1, 87, 48])
x2 = np.array([44, 38, 50, 39, 85, 26, 92, 64])
x1sort = np.sort(x1)
x2sort = np.sort(x2)[::-1]
diff = abs(x2sort - x1sort)
mindiff = diff.argmin()
print(mindiff, x2sort[mindiff], x1sort[mindiff])
# 4 44 47
xvtk's solution works well in this case and gives us a range of [44, 47]. Because no intervals overlap the range, all values of q in the range are equivalent and yield an optimal result. Here is an example that is a little trickier:
x1 = np.array([12, 65, 46, 81, 71, 77, 37])
x2 = np.array([ 20, 85, 59, 122, 101, 87, 58])
x1sort = np.sort(x1)
x2sort = np.sort(x2)[::-1]
diff = abs(x2sort - x1sort)
mindiff = diff.argmin()
print(mindiff, x2sort[mindiff], x1sort[mindiff], x1sort[mindiff-1])
# 4 59 71 65
Here the solution gives us a range of [59, 71] but notice that not all values in the range are equivalent. Anything to the left of the green line will produce 3 and 4 intervals on the left and right respectively, while anything to the right of the green line will produce 3 intervals on both sides.
I'm pretty sure that the optimal solution is guaranteed to be in the range produced by xvtk's solution. It's possible that one of the red lines is guaranteed to be an optimal solution, though I'm not sure on this point. Hope that helps.

Maybe use some of the optimizing functions in scipy to minimize the difference.
Like this, for example:
import numpy as np
from scipy.optimize import fmin
def findQ(q, *x):
    x1, x2 = x
    l1 = x1[np.where(x1 > q)].shape[0]
    l2 = x2[np.where(x2 < q)].shape[0]
    return abs(l1-l2)
x1 = (50 - 10) * np.random.random(10000) + 10
x2 = (75 - 25) * np.random.random(10000) + 25
q0 = (min(x2) + max(x1))/2.0
q = fmin(findQ, q0, (x1,x2))
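As q grows, l1 can only shrink and l2 can only grow, so l1 - l2 is a non-increasing step function of q. That makes a plain bisection on sorted copies one possible alternative to a general-purpose optimizer. The following is a minimal sketch under that observation, assuming non-empty arrays; the name bisect_q and the fixed iteration count are illustrative choices, not part of the answer above. It returns the better of the two sides of the sign change, so a minimum that is attained only exactly at a data value can be missed.

```python
import numpy as np

def bisect_q(x1, x2, iters=100):
    """Bisect on the sign of l1 - l2, which never increases as q grows."""
    x1s, x2s = np.sort(x1), np.sort(x2)

    def g(q):
        l1 = len(x1s) - np.searchsorted(x1s, q, side='right')  # count of x1 > q
        l2 = np.searchsorted(x2s, q, side='left')               # count of x2 < q
        return int(l1 - l2)

    lo = float(min(x1s[0], x2s[0]))
    hi = float(max(x1s[-1], x2s[-1]))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid   # still more x1 values above than x2 values below
        else:
            hi = mid
    # keep whichever side of the sign change is better balanced
    return lo if abs(g(lo)) < abs(g(hi)) else hi

x1 = np.array([1.0, 4.0, 6.0, 9.0])
x2 = np.array([3.0, 5.0, 7.0, 10.0])
q = bisect_q(x1, x2)
print(q, int((x1 > q).sum()), int((x2 < q).sum()))
```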

Related

Is there a way to conditionally index 3D-numpy array?

Having an array A with the shape (2,6, 60), is it possible to index it based on a binary array B of shape (6,)?
The 6 and 60 is quite arbitrary, they are simply the 2D data I wish to access.
The underlying thing I am trying to do is to calculate two variants of the 2D data (in this case, (6,60)) and then efficiently select the ones with the lowest total sum - that is where the binary (6,) array comes from.
Example: For B = [1,0,1,0,1,0] what I wish to receive is equal to stacking
A[1,0,:]
A[0,1,:]
A[1,2,:]
A[0,3,:]
A[1,4,:]
A[0,5,:]
but I would like to do it by direct indexing and not a for-loop.
I have tried A[B], A[:,B,:], A[B,:,:] and A[:,:,B], with none of them providing the desired (6,60) matrix.
import numpy as np
A = np.array([[4, 4, 4, 4, 4, 4], [1, 1, 1, 1, 1, 1]])
A = np.atleast_3d(A)
A = np.tile(A, (1,1,60))
B = np.array([1, 0, 1, 0, 1, 0])
A[B]
The expected result is a (6,60) array containing the elements from A as described above; what I get instead is either (2,6,60) or (6,6,60).
Thank you in advance,
Linus
You can generate a range of the indices you want to iterate over, in your case from 0 to 5:
count = A.shape[1]
indices = np.arange(count) # np.arange(6) for your particular case
>>> indices
array([0, 1, 2, 3, 4, 5])
And then you can use that to do your advanced indexing:
result_array = A[B[indices], indices, :]
If you always use the full range from 0 to length - 1 (i.e. 0 to 5 in your case) of the second axis of A in increasing order, you can simplify that to:
result_array = A[B, indices, :]
# or the ugly result_array = A[B, np.arange(A.shape[1]), :]
Or even this if it's always 6:
result_array = A[B, np.arange(6), :]
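To see that this indexing really matches the row-by-row stacking from the question, here is a small self-contained check. The random test data and the variable names picked and looped are assumptions for illustration; the argmin line shows one way the selector B could be derived from the per-row sums that the question mentions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 6, 60))             # two variants of the (6, 60) data
B = A.sum(axis=2).argmin(axis=0)       # per row: index of the variant with the lower sum

# advanced indexing: pair B[j] with row j for every j
picked = A[B, np.arange(A.shape[1]), :]

# reference result built with an explicit loop
looped = np.stack([A[b, j, :] for j, b in enumerate(B)])

print(picked.shape, np.array_equal(picked, looped))
```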
An alternative solution using np.take_along_axis (available since NumPy 1.15):
import numpy as np
x = np.arange(2*6*6).reshape((2,6,6))
m = np.zeros(6, int)
m[0] = 1
# example: m is now [1, 0, 0, 0, 0, 0]
np.take_along_axis(x, m[None, :, None], 0) # add dimensions to the mask to match the array's dimensions
array([[[36, 37, 38, 39, 40, 41],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23],
        [24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]]])
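The result of np.take_along_axis keeps a leading axis of length 1, so dropping it gives the 2D array the question asks for. A short sketch comparing it with plain advanced indexing; the selector values and the names via_take and via_index are assumptions for illustration:

```python
import numpy as np

x = np.arange(2 * 6 * 6).reshape((2, 6, 6))
m = np.array([1, 0, 1, 0, 1, 0])

# take_along_axis returns shape (1, 6, 6); [0] drops the leading axis
via_take = np.take_along_axis(x, m[None, :, None], axis=0)[0]

# the same selection written with advanced indexing
via_index = x[m, np.arange(x.shape[1]), :]

print(via_take.shape, np.array_equal(via_take, via_index))
```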

Positive Count // Negative Sum

A fairly easy problem, but I'm still practicing iterating over multiple variables with for loops. In the below, I seek to return a new list, where x is the count of positive numbers and y is the sum of negative numbers from an input array arr.
If the input array is empty or null, I am to return an empty array.
Here's what I've got!
def count_positives_sum_negatives(arr):
    return [] if not arr else [(count(x), sum(y)) for x, y in arr]
Currently receiving...
TypeError: 'int' object is not iterable
Simply use sum with a generator expression:
>>> arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, -11, -12, -13, -14, -15]
>>> sum(1 for x in arr if x > 0)
10
>>> sum(x for x in arr if x < 0)
-65
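Plugged back into the function from the question, the two sums might look like the sketch below. It assumes the input is a plain Python list or None, so that "not arr" covers both the empty and the null case.

```python
def count_positives_sum_negatives(arr):
    # an empty list and None are both falsy
    if not arr:
        return []
    positives = sum(1 for x in arr if x > 0)    # count of positive numbers
    negatives = sum(x for x in arr if x < 0)    # sum of negative numbers
    return [positives, negatives]

print(count_positives_sum_negatives([1, 2, 3, -4, -5, 0]))
```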
wim's way is good. Numpy is good for these types of things too.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, -11, -12, -13, -14, -15])
print([arr[arr > 0].size, int(arr[arr < 0].sum())])
[10, -65]
The error comes from the part for x, y in arr: it expects arr to be a list of 2-element tuples (or similar containers), for example [(1,2), (5,7), (7,9)], but what you have is a flat list of numbers, which cannot be unpacked into x and y.
To get the desired result you can use wim's solution, which iterates over the list twice, or you can do it in a single pass:
>>> def fun(iterable):
...     if not iterable:
...         return []
...     pos = 0
...     neg = 0
...     for n in iterable:
...         if n > 0:  # zero is not positive
...             pos = pos + 1
...         else:
...             neg = neg + n
...     return [pos, neg]
...
>>> arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, -11, -12, -13, -14, -15]
>>> fun(arr)
[10, -65]

Python with numpy: How to delete an element from each row of a 2-D array according to a specific index

Say I have a 2-D numpy array A of size 20 x 10.
I also have an array of length 20, del_ind.
I want to delete an element from each row of A according to del_ind, to get a resultant array of size 20 x 9.
How can I do this?
I looked into np.delete with a specified axis = 1, but this only deletes element from the same position for each row.
Thanks for the help
You will probably have to build a new array.
Fortunately you can avoid python loops for this task, using fancy indexing:
h, w = 20, 10
A = np.arange(h*w).reshape(h, w)
del_ind = np.random.randint(0, w, size=h)
mask = np.ones((h,w), dtype=bool)
mask[range(h), del_ind] = False
A_ = A[mask].reshape(h, w-1)
Demo with a smaller dataset:
>>> h, w = 5, 4
>>> %paste
A = np.arange(h*w).reshape(h, w)
del_ind = np.random.randint(0, w, size=h)
mask = np.ones((h,w), dtype=bool)
mask[range(h), del_ind] = False
A_ = A[mask].reshape(h, w-1)
## -- End pasted text --
>>> A
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
>>> del_ind
array([2, 2, 1, 1, 0])
>>> A_
array([[ 0, 1, 3],
[ 4, 5, 7],
[ 8, 10, 11],
[12, 14, 15],
[17, 18, 19]])
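The mask approach can be packaged as a small function and checked against a row-by-row np.delete. A minimal sketch; the function name delete_per_row is hypothetical, and the data reuses the values from the demo above:

```python
import numpy as np

def delete_per_row(A, del_ind):
    """Remove element del_ind[i] from row i of a 2-D array."""
    h, w = A.shape
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), del_ind] = False   # mark one element per row for removal
    # boolean indexing flattens in row-major order, so rows stay intact after reshape
    return A[mask].reshape(h, w - 1)

A = np.arange(20).reshape(5, 4)
del_ind = np.array([2, 2, 1, 1, 0])
out = delete_per_row(A, del_ind)

# reference result built one row at a time
expected = np.array([np.delete(row, i) for row, i in zip(A, del_ind)])
print(np.array_equal(out, expected))
```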
Numpy isn't known for inplace edits; it's mainly intended for statically sized matrices. For that reason, I'd recommend doing this by copying the intended elements to a new array.
Assuming that it's sufficient to delete one column from every row:
def remove_indices(arr, indices):
    # use the input's dtype; np.empty would otherwise default to float64
    result = np.empty((arr.shape[0], arr.shape[1] - 1), dtype=arr.dtype)
    for i, (delete_index, row) in enumerate(zip(indices, arr)):
        result[i] = np.delete(row, delete_index)
    return result

Sort a list based on unit place, tens place, hundred place digit in Python

Suppose I have a list, I want to sort the digits in the following pattern:
A = [3, 30, 34, 256, 5, 9]
Sort by unit place digit first, if unit place digit is same then we will compare tens place and then hundred place. If you sort the A by this rule then:
A = [9, 5, 34, 3, 30, 256]
9 is the highest digit at Unit place
5 is second highest
3, 34, 30 since unit digit is same here, we will compare tens place so 34 will come first here, then 3 and 30.
256 will come last since its unit place digit is 2 which is the lowest.
Suppose B = [100, 10, 1]
then after sorting B = [1, 10, 100]
Could anyone share some Pythonic way to solve this issue?
I have tried sorted(nums, key=lambda x: int(x[0]), reverse=True) but here how will I take tenth place digit into account?
Update: There is one point missing suppose A = [1, 100, 10] then in such cases after sorting A = [1, 10, 100]. In the example I gave A = [3, 30, 34, 256, 5, 9] here after sorting A = [9, 5, 34, 3, 30, 256].
The overall logic is that I want to join all the digits and create the largest possible number.
I think you just want str as the key:
In [11]: sorted(A, key=str, reverse=True)
Out[11]: [9, 5, 34, 30, 3, 256]
Initially I read your question that you would want the reversed digits:
In [12]: sorted(A, key=lambda x: str(x)[::-1])
Out[12]: [30, 3, 34, 5, 256, 9]
The following code answers the updated question: "sort in a way that concatenated sorted numbers will give the highest possible number".
The idea is: if the leading digits are the same and the lengths differ, the longer number is "greater" than the shorter one when, in the longer number, the digit at position (length of the shorter number + 1) is greater than or equal to the most significant digit. E.g.: 30 < 3, 32 < 3, 35 > 3, 10 < 1, 3003 > 3, 3001 < 3, 345 > 34, 342 < 34.
>>> def f(x, y):
...     if x == y:
...         return 0
...     xs = str(x)
...     ys = str(y)
...     for i in range(min(len(xs), len(ys))):
...         if xs[i] > ys[i]:
...             return 1
...         elif xs[i] < ys[i]:
...             return -1
...     if len(xs) > len(ys):
...         return 1 if xs[0] <= xs[len(ys)] else -1
...     return -1 if ys[0] <= ys[len(xs)] else 1
...
>>> A = [3, 30, 34, 256, 5, 9]
>>> B = [100,10,1]
>>> sorted(A, cmp=f, reverse=True)
[9, 5, 34, 3, 30, 256]
>>> sorted(B, cmp=f, reverse=True)
[1, 10, 100]
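Python 3 removed the cmp argument of sorted, so the call above only runs on Python 2. One way to get the same ordering on Python 3 is functools.cmp_to_key. The sketch below swaps in a different comparison rule from the answer's digit-by-digit one: it puts a before b whenever the concatenation a+b is larger than b+a, a common way to build the largest joined number. The names largest_first and by_concat are hypothetical.

```python
from functools import cmp_to_key

def largest_first(nums):
    """Order nums so that joining them left to right gives the largest number."""
    def by_concat(a, b):
        # a should come first when "ab" is larger than "ba"
        if a + b > b + a:
            return -1
        if a + b < b + a:
            return 1
        return 0
    strs = sorted((str(n) for n in nums), key=cmp_to_key(by_concat))
    return [int(s) for s in strs]

print(largest_first([3, 30, 34, 256, 5, 9]))
print(largest_first([100, 10, 1]))
```

Comparing a+b with b+a as strings is safe here because both concatenations have the same length, so string order and numeric order agree.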
Oh, you did really want a numbers-as-text string sort. Well if you wanted the units -> tens -> hundreds sort you described, this does it:
# Repeatedly modulo 10 to get the rightmost digit
# (units, then tens, then hundreds) until the
# digit where the two numbers differ. Compare those two digits.
>>> def f(x, y):
...     xr = x % 10
...     yr = y % 10
...     while x and y and xr == yr:
...         x, xr = divmod(x, 10)
...         y, yr = divmod(y, 10)
...     return cmp(xr, yr)
...
>>> A = [3, 30, 34, 256, 5, 9]
>>> sorted(A, cmp=f)
[30, 3, 34, 5, 256, 9]
It's not sorting as your example output, but it is sorting by units - 0, 3, 4, 5, 6, 9. And if it had any where the units were the same, it sorts by tens, etc.
>>> A = [3, 30, 34, 266, 256, 5, 9]
>>> sorted(A, cmp=f)
[30, 3, 34, 5, 256, 266, 9]
This is a different version from other previously posted answers:
import itertools
def produce_list(A):
    At = [str(x) for x in A] # convert the items to str so we can join them
    result = 0
    for i in itertools.permutations(At): # go through all permutations
        temp = int(''.join(i))
        if result < temp: # find the biggest value of the combined numbers
            result = temp
            result_list = list(i)
    return [int(x) for x in result_list] # return the list of ints
print(produce_list([3, 30, 34, 256, 5, 9]))
print(produce_list([100, 1, 10]))
It might not be very efficient (it will go through every combination) but it is as Pythonic as I could get.

Possible optimization for a Python solution for Timus 1005 - Balanced partition

I have come across the classic problem today.
The problem description is on Timus : 1005.
I know how to solve it in c++.
But when I tried it in python, I got Time Limit Exceeded.
I use brute force but failed. Then I tried DP, also failed.
Here is my solution:
n = int(input())
wi = list(map(int, input().split()))
ans = 1<<21
up = (1<<(n-1))-1
su = 0
for x in range(up, -1, -1):
    su = 0
    for y in range(n):
        su += wi[y] if (x & 1<<y) else -wi[y]
    ans = min(ans, abs(su))
print(ans)
It got TLE on Test3.
Here is another DP solution:
n = int(input())
wi = list(map(int, input().split()))
wi.sort()
ans = sum(x for x in wi)
up = ans // 2
dp = [0] * (up + 1)
dp[0] = 1
for x in range(n):
    for y in range(up, wi[x]-1, -1):
        dp[y] |= dp[y-wi[x]]
aw = up
while not dp[aw]:
    aw -= 1
print(ans - 2 * aw)
Got TLE on Test 4.
So my question is: how can I get under the time limit for this problem while using Python?
This is just a naive algorithm, and I don't know whether it always returns the correct result.
For the smaller ranges that I can check by hand it always returns the correct result, but for larger ones I really don't know :) It would be better to check it against your working C++ code.
def minimizing_diff(lst):
    a = list()
    b = list()
    for i in sorted(lst, reverse = True):
        if sum(a)>sum(b):
            b.append(i)
        else:
            a.append(i)
    return (len(a), a, len(b), b, abs(sum(a)-sum(b)))
# I am returning the first 4 elements to debug by eye :)
These are OK. You can check them with pen and paper :)
0..9 => (5, [9, 6, 5, 2, 1], 5, [8, 7, 4, 3, 0], 1)
0..19 => (10, [19, 16, 15, 12, 11, 8, 7, 4, 3, 0], 10, [18, 17, 14, 13, 10, 9, 6, 5, 2, 1], 0)
0..14 => (7, [14, 11, 10, 7, 6, 3, 2], 8, [13, 12, 9, 8, 5, 4, 1, 0], 1)
Other results (20 random numbers between 1 and 9999); all of them completed in less than 0.1 seconds.
(10, [9944, 8573, 8083, 6900, 6664, 4644, 4544, 2362, 1522, 947], 10, [9425, 8647, 8346, 7144, 6252, 6222, 3749, 1803, 1760, 126], 709)
(10, [9839, 7087, 6747, 6016, 5300, 4174, 3702, 2469, 1970, 1758], 10, [9490, 9246, 6436, 6010, 4690, 4168, 3608, 2374, 1879, 1684], 523)
(10, [9209, 8754, 8613, 6383, 6286, 5222, 4992, 3119, 2950, 147], 10, [9102, 8960, 7588, 7317, 6042, 5769, 4861, 3041, 2078, 1516], 599)
(10, [8096, 7757, 6975, 6677, 5204, 4354, 3174, 3132, 1237, 425], 10, [8033, 7765, 7140, 6089, 5511, 4385, 3482, 2877, 1253, 1139], 643)
(10, [9243, 7887, 6890, 6689, 6347, 5173, 3953, 3380, 3079, 1032], 10, [9131, 7996, 7791, 6403, 5621, 5585, 3632, 3436, 2959, 1291], 172)
(10, [9697, 8504, 7731, 7504, 6696, 4671, 4464, 3057, 1929, 1691], 10, [9384, 8540, 8319, 7233, 6252, 5549, 4275, 2154, 2023, 1794], 421)
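Whether this greedy assignment is always optimal can be probed on a tiny input by comparing it with an exhaustive search. The sketch below re-implements the greedy rule with running sums and checks every subset with itertools.combinations; the names greedy_diff and exact_diff and the test weights are assumptions for illustration. For this particular input the greedy result is not the minimum, which suggests treating it as a heuristic rather than an exact solution.

```python
from itertools import combinations

def greedy_diff(lst):
    """Largest-first greedy: give each weight to the currently lighter pile."""
    a, b = 0, 0
    for w in sorted(lst, reverse=True):
        if a > b:
            b += w
        else:
            a += w
    return abs(a - b)

def exact_diff(lst):
    """Exhaustive search over all subsets; only feasible for small inputs."""
    total = sum(lst)
    return min(abs(total - 2 * sum(c))
               for r in range(len(lst) + 1)
               for c in combinations(lst, r))

weights = [3, 3, 2, 2, 2]
print(greedy_diff(weights), exact_diff(weights))
```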
Because Python integers can be arbitrarily large, a single integer can be used to represent a boolean array, such as the variable dp in your second solution. This lets you replace the inner loop with a couple of bitwise operations:
ans = sum(wi)
up = ans // 2
mask = 2 ** (up + 1) - 1
dp = 1
for x in wi:
    dp |= dp << x & mask
aw = dp.bit_length() - 1
print(ans - 2 * aw)
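For reference, the same bitset idea can be written as a self-contained function. This is a sketch; the name min_partition_diff and the example weights are assumptions, and input parsing is left out. Bit s of dp is set when some subset of the weights sums to s.

```python
def min_partition_diff(weights):
    """Smallest possible difference between the sums of two piles."""
    total = sum(weights)
    up = total // 2
    mask = (1 << (up + 1)) - 1      # keep only sums in the range 0..up
    dp = 1                          # only the empty sum 0 is reachable at first
    for w in weights:
        dp |= (dp << w) & mask      # add w to every sum reachable so far
    best = dp.bit_length() - 1      # largest reachable sum not exceeding up
    return total - 2 * best

print(min_partition_diff([5, 8, 13, 27, 14]))
```

Each shift handles all reachable sums at once inside the interpreter's big-integer arithmetic, which is where the speedup over the explicit inner loop comes from.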
