Numpy - Count Number of Values Until Condition Is Satisfied - python

I have two NumPy arrays of the same size:
ArrayOne = np.array([ 2, 5, 5, 6, 7, 10, 13])
ArrayTwo = np.array([ 8, 10, 12, 14, 16, 18, 24])
How can I count, for each element, how many elements there are back to the beginning of the array, unless the condition ArrayOne >= ArrayTwo is satisfied, in which case I want the number of elements back to the point where that condition holds? I then want to make an array out of the results.
For example, element [0] has 0 elements in front of it. Element [1] has 1 element in front of it, and ArrayOne >= ArrayTwo wasn't satisfied. Element [5] in ArrayOne is bigger than element [0] in ArrayTwo, so there are four elements back to element [1] in ArrayTwo, and so on.
Giving the result
result = np.array([ 0, 1, 2, 3, 4, 4, 3])
Thanks in advance.

Basically, at index i you have the value
value = i -count(how often element i in array one was bigger than array two until index i)
Because I'm on mobile with damn autocorrect, I rename the two arrays to a and b.
def get_value(a, b, i):
    max_value = a[i]
    nb_smaller_elements = sum(1 for el in range(i) if b[el] < max_value)
    return i - nb_smaller_elements

I think I got it. Using @Paul Panzer's answer, I made a for loop that goes through the list.
def toggle(ArrayOne,ArrayTwo):
    a = 0
    sum = -1
    linels = []
    for i in range(len(ArrayOne)):
        sum += 1
        a = sum - np.searchsorted(ArrayTwo, ArrayOne[i])
        linels.append(a)
    return np.array(linels)
I get the result
linels = np.array([ 0, 1, 2, 3, 4, 4, 3])
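The loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming ArrayTwo is sorted in ascending order (np.searchsorted relies on that); the function name count_until is made up for illustration and does not come from the answers.

```python
import numpy as np

def count_until(array_one, array_two):
    # For each value in array_one, count how many values of the sorted
    # array_two are strictly smaller, then subtract that from the index.
    positions = np.arange(len(array_one))
    return positions - np.searchsorted(array_two, array_one)

ArrayOne = np.array([2, 5, 5, 6, 7, 10, 13])
ArrayTwo = np.array([8, 10, 12, 14, 16, 18, 24])
result = count_until(ArrayOne, ArrayTwo)
print(result.tolist())  # [0, 1, 2, 3, 4, 4, 3]
```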

Related

How to get the sum of duplicates in a list in Python?

How do I check which numbers are duplicated and what their sum is?
I am working on a project and I can't get it.
list = [1, 3, 5, 2, 1, 6, 5, 10, 1]
You can iterate over the list and count how many times each element occurs. Then it's easy to check which elements are repeated (their counter is greater than 1), and the sum is simply element_value*count.
li = [1, 3, 5, 2, 1, 6, 5, 10, 1]
counters = {}
for element in li:
    counters[element] = counters.get(element, 0) + 1
for element, count in counters.items():
    if count >= 2:
        print('Repeated element', element, 'sum=', element*count)
You can set up a separate set to check which items have already been seen, and for those where this is the case you add them to a sum:
sum = 0
li = [1, 3, 5, 2, 1, 6, 5, 10, 1]
seen_numbers = set()
for n in li:
    if n not in seen_numbers:
        seen_numbers.add(n)
    else:
        sum += n
Note that this will add a number that is already in the list each time it recurs, i.e., a number that appears three times will be added to sum twice. I don't know if that's what you want.
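To make that caveat concrete, here is a small sketch that computes both totals for the same list. The variable names repeat_total and full_total are made up for illustration; which of the two totals is wanted depends on the asker's intent, which the question does not settle.

```python
from collections import Counter

li = [1, 3, 5, 2, 1, 6, 5, 10, 1]

# Approach from the loop above: add a number each time it recurs.
seen_numbers = set()
repeat_total = 0
for n in li:
    if n in seen_numbers:
        repeat_total += n
    else:
        seen_numbers.add(n)

# Alternative: add every occurrence of any number that appears more than once.
full_total = sum(value * count for value, count in Counter(li).items() if count > 1)

print(repeat_total)  # 7  (1 + 5 + 1)
print(full_total)    # 13 (1*3 + 5*2)
```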
If you need the single results you can construct a list where each element is a tuple. Each tuple contains the number, the count and the sum.
from collections import Counter
data = [1, 3, 5, 2, 1, 6, 5, 10, 1]
result = [(value, count, value*count) for value, count in Counter(data).items() if count > 1]
print(result)
If you need to find only the full total of all values that appear more than once:
print(sum(value*count for value, count in Counter(data).items() if count > 1))

Length of the intersections between a list and a list of lists

Note: almost a duplicate of Numpy vectorization: Find intersection between list and list of lists
Differences:
I am focused on efficiency when the lists are large
I'm searching for the largest intersections.
x = [500 numbers between 1 and N]
y = [[1, 2, 3], [4, 5, 6, 7], [8, 9], [10, 11, 12], etc. up to N]
Here are some assumptions:
y is a list of ~500,000 sublists of ~500 elements each
each sublist in y is a range, so y is characterized by the last element of each sublist. In the example: 3, 7, 9, 12 ...
x is not sorted
y contains each number between 1 and ~500000*500 once and only once
y is sorted in the sense that, as in the example, the sublists are sorted and the first element of one sublist follows the last element of the previous sublist.
y is known long before even compile-time
My goal is to find which sublists of y have at least 10 elements in common with x.
I can obviously write a loop:
def find_best(x, y):
    result = []
    for index, sublist in enumerate(y):
        intersection = set(x).intersection(set(sublist))
        if len(intersection) >= 2:  # in real life: >= 10
            result.append(index)
    return result
x = [1, 2, 3, 4, 5, 6]
y = [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
res = find_best(x, y)
print(res) # [0, 2]
Here the result is [0, 2] because the first and third sublists of y each have at least 2 elements in common with x.
Another method would be to parse y only once and count the intersections:
def find_intersec2(x, y):
    n_sublists = len(y)
    res = {num: 0 for num in range(n_sublists)}
    for list_no, sublist in enumerate(y):
        for num in sublist:
            if num in x:
                x.remove(num)  # note: this mutates the caller's x
                res[list_no] += 1
    return [n for n in range(n_sublists) if res[n] >= 2]
This second method makes more use of the assumptions.
Questions:
What optimizations are possible?
Is there a completely different approach? Indexing, a k-d tree? In my use case, the large list y is known days before the actual run, so I'm not afraid of building an index or whatever from y. The small list x is only known at runtime.
Since y contains disjoint ranges and the union of them is also a range, a very fast solution is to first perform a binary search on y and then count the resulting indices and only return the ones that appear at least 10 times. The complexity of this algorithm is O(Nx log Ny) with Nx and Ny the number of items in respectively x and y. This algorithm is nearly optimal (since x needs to be read entirely).
Actual implementation
First of all, you need to transform your current y to a Numpy array containing the beginning value of all ranges (in an increasing order) with N as the last value (assuming N is excluded for the ranges of y, or N+1 otherwise). This part can be assumed as free since y can be computed at compile time in your case. Here is an example:
import numpy as np
y = np.array([1, 4, 8, 10, 13, ..., N])
Then, you need to perform the binary search and check that the values fit in the range of y:
indices = np.searchsorted(y, x, 'right')
# The `0 < indices < len(y)` check should not be needed regarding the input.
# If so, you can use only `indices -= 1`.
indices = indices[(0 < indices) & (indices < len(y))] - 1
Then you need to count the indices and keep the ones that occur at least 10 times:
uniqueIndices, counts = np.unique(indices, return_counts=True)
result = uniqueIndices[counts >= 10]
Here is an example based on yours:
x = np.array([1, 2, 3, 4, 5, 6])
# [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
y = np.array([1, 4, 5, 7, 8, 12])
# Actual simplified version of the above algorithm
indices = np.searchsorted(y, x, 'right') - 1
uniqueIndices, counts = np.unique(indices, return_counts=True)
result = uniqueIndices[counts >= 2]
# [0, 2]
print(result.tolist())
It runs in less than 0.1 ms on my machine on a random input based on your input constraints.
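For readers without NumPy, the same idea can be sketched with the standard library. This is only an illustration under the same assumption as above (the boundaries list holds the first value of each range in increasing order, plus one sentinel past the last range); the names ranges_hit, starts and min_hits are hypothetical, and it will not match the vectorized version's speed on large inputs.

```python
from bisect import bisect_right
from collections import Counter

def ranges_hit(x, starts, min_hits):
    # starts holds the first value of every range in increasing order,
    # followed by one sentinel value just past the last range.
    counts = Counter()
    for value in x:
        index = bisect_right(starts, value) - 1
        if 0 <= index < len(starts) - 1:  # ignore values outside all ranges
            counts[index] += 1
    return sorted(i for i, c in counts.items() if c >= min_hits)

x = [1, 2, 3, 4, 5, 6]
# [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
starts = [1, 4, 5, 7, 8, 12]
print(ranges_hit(x, starts, 2))  # [0, 2]
```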
Turn y into 2 dicts.
index = {  # index to count map
    0: 0,
    1: 0,
    2: 0,
    3: 0,
    4: 0
}
y = {  # elem to index map
    1: 0,
    2: 0,
    3: 0,
    4: 1,
    5: 2,
    6: 2,
    7: 3,
    8: 4,
    9: 4,
    10: 4,
    11: 4
}
Since you know y in advance, I don't count the above operations into the time complexity. Then, to count the intersection:
x = [1, 2, 3, 4, 5, 6]
for e in x: index[y[e]] += 1
Since you mentioned x is small, I try to make the time complexity depend only on the size of x (in this case O(n)).
Finally, the answer is the list of keys in index dict where the value is >= 2 (or 10 in real case).
answer = [i for i in index if index[i] >= 2]
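The two dicts do not have to be written by hand. As a rough sketch, assuming y is still available as a list of sublists, they could be built once ahead of time like this; build_maps and sublists are hypothetical names used only for this example.

```python
def build_maps(sublists):
    # counts: sublist index -> number of hits (all zero to start with)
    # elem_to_index: element -> index of the sublist that contains it
    counts = {i: 0 for i in range(len(sublists))}
    elem_to_index = {e: i for i, sub in enumerate(sublists) for e in sub}
    return counts, elem_to_index

sublists = [[1, 2, 3], [4], [5, 6], [7], [8, 9, 10, 11]]
index, y = build_maps(sublists)

x = [1, 2, 3, 4, 5, 6]
for e in x:
    index[y[e]] += 1

answer = [i for i in index if index[i] >= 2]
print(answer)  # [0, 2]
```

Note that index is mutated by the counting loop, so a fresh copy of the zeroed counts is needed for each new x.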
This uses y to create a linear array, called x2range_counter, that maps every int to one plus the index of the range (subgroup) the int is in.
x2range_counter uses a compact array.array of unsigned integers (type code 'L', at least 32 bits per item) to save memory; it can be cached and reused for every x checked against the same y.
Calculating the hits in each range for a particular x is then just an indirect array increment of a counter, done in the function count_ranges.
import array
from typing import List

y = [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
x = [5, 3, 1, 11, 8, 10]
range_counter_max = len(y)
extent = y[-1][-1] + 1 # min in y must be 1 not 0 remember.
x2range_counter = array.array('L', [0] * extent) # compact typed storage ('L' is at least 32 bits per item)
# Map any int in any x to appropriate ranges counter.
for range_counter_index, rng in enumerate(y, start=1):
    for n in rng:
        x2range_counter[n] = range_counter_index
print(x2range_counter) # array('L', [0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4])
# x2range_counter can be saved for this y and any x on this y.
def count_ranges(x: List[int]) -> array.array:
    "Number of x-hits on each y subgroup in order"
    # Note: count[0] initially catches errors. count[1..] counts x's in y ranges [0..]
    count = array.array('L', [0] * (range_counter_max + 1))
    for xx in x:
        count[x2range_counter[xx]] += 1
    assert count[0] == 0, "x values must all exist in a y range and y must have all int in its range."
    return count[1:]
print(count_ranges(x)) # array('L', [1, 2, 1, 2])
I created a class for this, with extra functionality such as returning the ranges rather than the indices; all ranges hit >=M times; (range, hit-count) tuples sorted most hit first.
Range calculations for different x take time proportional to the length of x and are simple array lookups rather than any hashing of dicts.
What do you think?

Swapping opposite elements of a List of Integers if either is Odd

Write a program with the definition of a function named Array_Swap() that accepts an integer list and its size as arguments. The function swaps elements so that the first element is swapped with the last element, the second with the second-last, and so on, but only if either or both of the elements are odd, and then displays the result.
If initially, a list of seven elements is: [5, 16, 4, 7, 19, 8, 2], the contents of the list after the execution should be:
[2, 16, 19, 7, 4, 8, 5].
def Array_Swap(List,Size):
    for i in range (Size//2):
        List[i]=List[Size//2-i]
    print(List)
L=[]
n=int(input("Enter number of elements"))
for i in range(n):
    x=int(input("Enter element"))
    L.append(x)
Array_Swap(L,len(L))
The size/length of the list is not relevant because it can be obtained by len(list). And even then it's not required to conditionally swap items in the list. I suggest that the Size parameter be removed, but considering it's an assignment, it can be given a default of None so that it can be ignored by the caller if desired.
The following algorithm zips the input list with its reverse to form pairs relative to their index from the front and end of the list respectively, i.e. the first and last items are paired, the second and second last are paired, etc. Once the items are paired it is simply a matter of iterating over the list and emitting the second number of the pair if either number is odd, or the first number if neither is odd - effectively swapping the pairs as required.
This is done in-place (that's what the List[:] does) with a list comprehension.
def ArraySwap(List, Size=None):
    List[:] = [b if (a % 2 or b % 2) else a
               for a, b in zip(List, reversed(List))]
    print(List)
>>> l = [5, 16, 4, 7, 19, 8, 2]
>>> ArraySwap(l)
[2, 16, 19, 7, 4, 8, 5]
>>> l
[2, 16, 19, 7, 4, 8, 5]
>>> l = list(range(1,30))
>>> ArraySwap(l)
[29, 2, 27, 4, 25, 6, 23, 8, 21, 10, 19, 12, 17, 14, 15, 16, 13, 18, 11, 20, 9, 22, 7, 24, 5, 26, 3, 28, 1]
>>> ArraySwap([1])
[1]
>>> ArraySwap([])
[]
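The in-place behaviour of List[:] mentioned above can be seen by comparing slice assignment with plain assignment. This is a small illustrative sketch; the helper names replace_in_place and rebind_only are made up and are not part of the answer.

```python
def replace_in_place(items):
    # Slice assignment mutates the list object the caller passed in.
    items[:] = [v * 2 for v in items]

def rebind_only(items):
    # Plain assignment only rebinds the local name; the caller's list is untouched.
    items = [v * 2 for v in items]

first = [1, 2, 3]
second = [1, 2, 3]
replace_in_place(first)
rebind_only(second)
print(first)   # [2, 4, 6]
print(second)  # [1, 2, 3]
```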
To swap two elements in the list, use the pattern a, b = b, a.
If i is the index of a list item, its opposite (mirror) element is at index -(i+1), or -i - 1.
So for the 0th element (the first one), the mirror is at -(0+1) = -1.
Using that as the index of the mirror element, swap the two list elements,
but check that at least one of them is odd before swapping:
def Array_Swap(List,Size):
    for i in range (Size // 2):
        if List[i] % 2 == 1 or List[-(i+1)] % 2 == 1:
            List[i], List[-(i+1)] = List[-(i+1)], List[i]
    print(List)
L = [5, 16, 4, 7, 19, 8, 2] # use your input blocks as before, this is an example
Array_Swap(L,len(L))
Output: [2, 16, 19, 7, 4, 8, 5]
(And if L = [5, 16, 4, 7, 19, 8, 1, 2], output is [2, 1, 4, 19, 7, 8, 16, 5].)
Btw, you don't need to pass in the size of the list as a parameter.
You could do just: for i in range(len(List) // 2)
Another solution:
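def Array_Swap(List, Size=None):
    if Size is None:
        Size = len(List)
    # zip stops at the shorter range, so j only covers the mirror of each i;
    # counting j down to 0 also handles the middle pair of even-length lists
    for (i, j) in zip(range(Size // 2), range(Size - 1, -1, -1)):
        if List[i] % 2 or List[j] % 2:
            List[i], List[j] = List[j], List[i]
    print(List)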
Alternatively:
The Size parameter is redundant, since a Python list instance knows its own size.
Use the bitwise operators & | >> ..., which are possibly cheaper than modulus % and divide / operations.
def Array_Swap(List):
    for i in range(len(List) >> 1):
        if (List[i] | List[-i-1]) & 1:
            List[i], List[-i-1] = List[-i-1], List[i]
    print(List)
The standard way to swap two variables in Python is:
a, b = b, a
In this case, you would do:
lst[i], lst[size - i - 1] = lst[size - i - 1], lst[i]
which swaps the ith element with the element that is at index size - i - 1 (i.e. the ith index from the end).
The other issue with your code is that it doesn't check whether either of the elements being swapped is odd, which you can resolve by adding the condition:
if lst[i] % 2 or lst[size - i - 1] % 2:
before doing the swap. This uses the modulo operator (%) to check the parity of the elements. Taking a number modulo 2 returns 1 if the number is odd. If either is odd (1 has a truth value of True), the condition succeeds and the swap is performed.
Finally, your function was printing the list rather than returning it. It's usually best to return a result and print the returned result.
The full working version, with the above three changes is as follows:
def list_swap(lst, size):
    for i in range(size // 2):
        if lst[i] % 2 or lst[size - i - 1] % 2:
            lst[i], lst[size - i - 1] = lst[size - i - 1], lst[i]
    return lst
l = []
n = int(input("Enter number of elements: "))
for _ in range(n):
    x = int(input("Enter element: "))
    l.append(x)
result = list_swap(l, len(l))
print(result)
Also note, I've changed all the variables to be lowercase, which is standard in Python.
With your shown example:
Enter number of elements: 7
Enter element: 5
Enter element: 16
Enter element: 4
Enter element: 7
Enter element: 19
Enter element: 8
Enter element: 2
[2, 16, 19, 7, 4, 8, 5]

Numpy: how do I get the smallest index for which a property is true [duplicate]

This question already has answers here:
Numpy first occurrence of value greater than existing value
I have a numpy array that is one dimensional. I would like to get the biggest and the smallest index for which a property is true.
For instance,
A = np.array([0, 3, 2, 4, 3, 6, 1, 0])
and I would like to know the smallest index for which the value of A is greater than or equal to 4.
I can do
i = 0
while A[i] < 4:
    i += 1
print("smallest index", i)
i = -1
while A[i] < 4:
    i -= 1
print("largest index", len(A)+i)
Is there a better way of doing this?
As suggested in this answer,
np.argmax(A>=4)
returns 3, which is indeed the smallest index. But this doesn't give me the largest index.
You can try something like the following. As per the comments, suppose A is:
A = np.array([0, 3, 2, 4, 3, 6, 1, 4])
idx_values = np.where(A >= 4)[0]
min_idx, max_idx = idx_values[[0, -1]]
print(idx_values)
# [3 5 7]
idx_values returns all the index values meeting your condition. You can then access the smallest and largest index positions.
print(min_idx, max_idx)
# 3 7
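One thing to watch for: indexing idx_values with [0, -1] raises an IndexError when no element satisfies the condition. A hedged sketch of a guard for that case follows; the function name first_last_index is an assumption for illustration, and returning None for "no match" is just one possible convention.

```python
import numpy as np

def first_last_index(values, threshold):
    # np.flatnonzero returns the indices where the condition holds.
    idx = np.flatnonzero(values >= threshold)
    if idx.size == 0:
        return None  # nothing satisfies the condition
    return int(idx[0]), int(idx[-1])

A = np.array([0, 3, 2, 4, 3, 6, 1, 4])
print(first_last_index(A, 4))   # (3, 7)
print(first_last_index(A, 10))  # None
```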

Add a 0 or 1 based on a value in a certain column

I hope someone can help me with the following. I have a list called 'List' and a list called X.
I would like to check whether the value in the third column of each row in List is smaller than (<) X, or equal to or bigger than X. If the value is smaller I would like to add a 0 to the 6th column, and a 1 if it is equal or bigger. For each X, the answer should be added to the next column of List. In this case there are 4 X values, so 4 columns should be added to List. My code below probably shows I'm quite an amateur, and I hope you can help me out. Thank you in advance.
List = [(3,5,6,7,6),(3,5,3,2,6),(3,6,1,0,5)]
X= [1,4,5,6]
for item in X:
for number in row[3] for row in List:
count = 0
if number < item:
List[5+count].append(0)
count += 1
return List
else:
List[5+count].append(1)
count += 1
return List
return List
First, you should know that tuples (parenthesis enclosed lists) are immutable, so you can not change anything about them once they're defined. It's better to use a list in your case (enclosed by []).
List = [[3,5,6,7,6],[3,5,3,2,6],[3,6,1,0,5]]
X= [1,4,5,6]
for item in X: # loop on elements of X
    for subList in List: # loop on 'rows' in List
        if subList[2] < item: # test if 3rd element is smaller than item in X
            subList.append(0) # push 0 to the end of the row
        else:
            subList.append(1) # push 1 to the end of the row
List = [(3,5,6,7,6),(3,5,3,2,6),(3,6,1,0,5)]
X= [1,4,5,6]
scores = []
for item in List:
    scores.append(tuple(map(lambda x: 0 if item[2] < x else 1, X)))
result = []
for item, score in zip(List, scores):
    result.append(item + score)
print(result)
# [(3, 5, 6, 7, 6, 1, 1, 1, 1), (3, 5, 3, 2, 6, 1, 0, 0, 0), (3, 6, 1, 0, 5, 1, 0, 0, 0)]
Your indentation is off (you should unindent everything starting with your for statement).
You can't append to tuples (your rows inside the List variable are actually tuples).
Since you are not in a function, return does not do anything.
Since indices start with 0, you should use row[2] for the 3rd column.
There are more elements in your X than the number of rows in List.
That being said, you can also use list comprehensions to implement this. Here is a one-liner that compares each row with the X value at the same position:
>>> List = [(3,5,6,7,6),(3,5,3,2,6),(3,6,1,0,5)]
>>> X = [1,4,5,6]
>>> print([tuple(list(t[0])+[0]) if t[0][2] < t[1] else tuple(list(t[0]) + [1]) for t in zip(List, X)])
will print
[(3, 5, 6, 7, 6, 1), (3, 5, 3, 2, 6, 0), (3, 6, 1, 0, 5, 0)]
List = [[3,5,6,7,6],[3,5,3,2,6],[3,6,1,0,5]]
X= [1,4,5,6]
elems = [row[2] for row in List]  # third column (index 2)
for i in range(len(elems)):
    for x in X:
        if elems[i] < x:
            List[i].append(0)
        else:
            List[i].append(1)
print(List)
And you cannot use return if you are not using functions.
return needs to be called from inside a function. It exits the function and hands the specified value back to the caller.
So you can't use it in your program.
In the list, each row is actually known as a tuple. Tuples don't have the append function so you can't use that to add to the end of a row.
Also, you can't have two for loops in a single line. (Which is not a problem since we only need one to achieve your output)
I've modified your code so that it looks similar so it's easier for you to understand.
List = [(3,5,6,7,6),(3,5,3,2,6),(3,6,1,0,5)]
X= [1,4,5,6]
for item in X:
    n = 0
    for row in List:
        if row[2] < item:  # third column (index 2)
            List[n] = List[n] + (0,)
        else:
            List[n] = List[n] + (1,)
        n = n+1
print(List)
You need to add with (0,) or (1,) to show that it's a tuple addition. (Or else python will think that you're adding a tuple with an integer)
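A short sketch of that point, using one row from the question: concatenating with a one-element tuple works and leaves the original untouched, while adding a bare integer raises a TypeError. The names row, extended and raised are only for this example.

```python
row = (3, 5, 6, 7, 6)

# Concatenating two tuples builds a new tuple; the original is unchanged.
extended = row + (0,)
print(extended)  # (3, 5, 6, 7, 6, 0)
print(row)       # (3, 5, 6, 7, 6)

# Adding a bare integer to a tuple is not defined and raises TypeError.
try:
    row + 0
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```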
agree with Selcuk
[edited #1: Thanks @Rawing, I mistyped > as <]
Here is AlmostGr's version simplified:-
List = [[3, 5, 6, 7, 6], [3, 5, 3, 2, 6], [3, 6, 1, 0, 5]]
X = [1, 4, 5, 6]
for num in X:
    for item in List:
        if num > item[2]:
            item.append(0)
        else:
            item.append(1)
It runs for all elements in X and produces the output:
[[3, 5, 6, 7, 6, 1, 1, 1, 1], [3, 5, 3, 2, 6, 1, 0, 0, 0], [3, 6, 1, 0, 5, 1, 0, 0, 0]]
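If the rows should stay as tuples, as in the original question, the same result can be sketched with a comprehension that builds new tuples instead of appending. This is only one possible formulation; the names rows and thresholds are assumptions standing in for List and X.

```python
rows = [(3, 5, 6, 7, 6), (3, 5, 3, 2, 6), (3, 6, 1, 0, 5)]
thresholds = [1, 4, 5, 6]

# For every row, append one flag per threshold:
# 0 if the third column is smaller than the threshold, 1 otherwise.
result = [
    row + tuple(0 if row[2] < t else 1 for t in thresholds)
    for row in rows
]
print(result)
# [(3, 5, 6, 7, 6, 1, 1, 1, 1), (3, 5, 3, 2, 6, 1, 0, 0, 0), (3, 6, 1, 0, 5, 1, 0, 0, 0)]
```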
