What would be the most efficient way to find the frequency/count of elements in non-overlapping intervals? For example:
limits = [0, 25, 40, 60]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
For the above lists, I want to find the number of elements in data that are within two adjacent limits. So for the above, the count would be something like:
0-25: 6;
25-40: 1;
40-60: 3;
All I can think of is O(n^2) in time. Is there a better way to do it?
This doesn't need a Counter: the number of bins is known up front, so a plain list indexed by bin number replaces the dict lookups.
from bisect import bisect_right

def bin_it(limits, data):
    "Bin data according to (ascending) limits."
    bins = [0] * (len(limits) + 1) # adds under/over range bins too
    for d in data:
        bins[bisect_right(limits, d)] += 1
    return bins

if __name__ == "__main__":
    limits = [0, 25, 40, 60]
    data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
    bins = bin_it(limits, data)
    print(f" < {limits[0]:2} :", bins[0])
    for lo, hi, count in zip(limits, limits[1:], bins[1:]):
        print(f">= {lo:2} .. < {hi:2} :", count)
    print(f">= {limits[-1]:2} ... :", bins[-1])
"""
SAMPLE OUTPUT:
< 0 : 0
>= 0 .. < 25 : 6
>= 25 .. < 40 : 1
>= 40 .. < 60 : 3
>= 60 ... : 0
"""
I recommend this approach, which does what you want in O(nlogn) time:
limits = [0, 25, 40, 60] # m
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18] # n
data += limits # O(n+m)
data.sort() # O((n+m)log(n+m)) = O(nlogn)
result=dict() # O(1)
cnt = 0 # O(1)
aux ='' # O(1)
i = 0 # O(1)
for el in data: # O(n+m)
    if el == limits[i]:
        i+=1
        if cnt > 0:
            aux+='-'+str(el)
            result[aux] = cnt # average = O(1)
            cnt = 0
            aux = str(el)
        else:
            aux = str(el)
    else:
        cnt+=1
print(result)
# {'0-25': 6, '25-40': 1, '40-60': 3}
I annotated each important line with its time complexity to work out the total. The total is O((n+m)log(n+m)), which simplifies to O(nlogn) when m is no larger than n.
Improvement
You can improve on this if you can make some assumptions about the inputs. If limits and data are integers in a known, bounded range, you can replace the comparison sort with counting sort. Counting sort runs in O(n + k) time, where k is the size of the value range, so the total becomes O(n) whenever k grows no faster than n.
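As a rough sketch of the counting idea (not the code from the answer above), the following assumes that all data values and limits are non-negative integers, that limits is ascending, and that intervals are half-open, [lo, hi). The function name bin_counts_small_ints is made up for illustration. It tallies each value once, builds prefix sums, and reads every interval count off with one subtraction.

```python
from itertools import accumulate

def bin_counts_small_ints(limits, data):
    """Count items of data in each half-open interval [limits[i], limits[i+1]).

    Assumes non-negative integer values and ascending integer limits.
    """
    top = limits[-1]
    tally = [0] * top                        # tally[v] = occurrences of value v, for 0 <= v < top
    for d in data:
        if 0 <= d < top:                     # values outside [0, top) fall in no interval
            tally[d] += 1
    prefix = [0] + list(accumulate(tally))   # prefix[v] = number of counted items with value < v
    return [prefix[hi] - prefix[lo] for lo, hi in zip(limits, limits[1:])]

limits = [0, 25, 40, 60]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
print(bin_counts_small_ints(limits, data))   # [6, 1, 3]
```

The work is proportional to len(data) + limits[-1] + len(limits), so this only pays off when the largest limit is not much bigger than the amount of data.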
Here is a simple O(NlogN) approach. Sort your data, then use a two pointer approach to place each element in the correct interval.
limits = [0, 25, 40, 60]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
data.sort()
n,m = len(data), len(limits)
count = [0]*(m-1)
# count[i] represents count between limits[i] and limits[i+1]
# (a value equal to a limit is counted in the lower interval)
low = 0 # lower index of interval we are currently checking
ptr = 0
while ptr < n:
    i = data[ptr]
    if i < limits[low]:
        ptr += 1 # below the first limit: skip it instead of looping forever
    elif i <= limits[low+1]:
        count[low] += 1
        ptr += 1
    elif low == m-2:
        break # above the last limit: nothing left to count
    else:
        low += 1
print(count)
limits = [0, 25, 40, 60, 80]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18, 25, 45, 85]
dict_data = {}
i = 0
count = 1
while i < len(limits)-1:
    for j in data:
        if j in range(limits[i], limits[i+1]):
            if '{}-{}'.format(limits[i],limits[i+1]) in dict_data:
                dict_data['{}-{}'.format(limits[i],limits[i+1])] +=count
            else:
                dict_data['{}-{}'.format(limits[i],limits[i+1])] = count
    i+=1
print(dict_data)
You could use Counter (from collections) to manage the tallying and bisect to categorize:
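from collections import Counter
from bisect import bisect_right
limits = [0, 25, 40, 60, 80]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
# bisect_right puts a value equal to a limit into the interval that starts there;
# with bisect_left, a value equal to limits[0] would get index -1 and wrap around to the last limit
r = Counter(limits[bisect_right(limits,d)-1] for d in data)
print(r)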
Counter({0: 6, 40: 3, 25: 1})
This has a time complexity of O(NLogM) where M is the number of limit breaks and N is the number of data items
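With any bisect-based binning, the choice between bisect_left and bisect_right decides which interval a value equal to a limit falls into. A small sketch, with probe values picked only for illustration, that shows both results side by side:

```python
from bisect import bisect_left, bisect_right

limits = [0, 25, 40, 60]
probes = (0, 24, 25, 60)   # illustrative values; 0, 25 and 60 sit exactly on a limit

# Each row: (value, bisect_left index, bisect_right index)
rows = [(d, bisect_left(limits, d), bisect_right(limits, d)) for d in probes]
for d, left, right in rows:
    print(f"{d:2}: bisect_left -> {left}, bisect_right -> {right}")
```

For 25, bisect_right returns 2 and bisect_left returns 1, so only bisect_right treats 25 as the start of the 25-40 interval. The two functions agree for values that are not equal to any limit, such as 24.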
What would be the cleanest way, using Python and NumPy, to generate 1000 random numbers from 0 to 50, with the condition that no two numbers occur the same number of times?
Example for size 10: [0, 0, 0, 1, 1, 3, 3, 3, 3, 2] --> no number occurs same number of times
Drawing the counts from rng.dirichlet and rejecting draws that break the rules is guaranteed to meet the requirements, but it gives low entropy for the number of unique elements. You have to set the range of unique elements yourself with np.ones(rng.integers(min,max)). If max approaches the maximum possible number of unique elements (here 50), rejection might take a long time or have no solution at all, causing an infinite loop. The code below produces an array of size 100.
import numpy as np
times = np.array([])
rng = np.random.default_rng()
#rejection sampling
while times.sum() != 100 or len(times) != len(np.unique(times)):
    times = np.around(rng.dirichlet(np.ones(rng.integers(5,10)))*100)
nr = rng.permutation(np.arange(51))[:len(times)]
np.repeat(nr, times.astype(int))
Random output
array([ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 25, 5, 5, 5])
Here's a recursive and possibly very slow implementation that produces the output desired.
import numpy as np

def get_sequence_lengths(values, total):
    if total == 0:
        return [[]], True
    if total < 0:
        return [], False
    if len(values) == 0:
        return [], False
    sequences = []
    result = False
    for i in range(len(values)):
        ls, suc = get_sequence_lengths(values[:i] + values[i + 1:], total - values[i])
        result |= suc
        if suc:
            sequences.extend([[values[i]] + s for s in ls])
    return sequences, result

def gen_numbers(rand_min, rand_max, count):
    values = list(range(rand_min, rand_max + 1))
    sequences, success = get_sequence_lengths(list(range(1, count+1)), count)
    sequences = list(filter(lambda x: len(x) <= 1 + rand_max - rand_min, sequences))
    if not success or not len(sequences):
        raise ValueError('Cannot generate with given parameters.')
    sequence = sequences[np.random.randint(len(sequences))]
    values = np.random.choice(values, len(sequence), replace=False)
    result = []
    for v, s in zip(values, sequence):
        result.extend([v] * s)
    return result
get_sequence_lengths generates every ordering of distinct positive integers that sum to the given total. These sequences are then filtered by the number of available values. Finally, pairing randomly chosen values with the counts from one sequence produces the output.
As mentioned above, get_sequence_lengths is recursive and is going to be quite slow for larger input values.
To avoid the variability of generating random combinations in a potentially long trial-and-error loop, you could use a function that directly produces a random partition of a number where all parts are distinct (increasing). From that, you simply need to map shuffled numbers over the chunks provided by the partition function:
import random

def randPart(N,size=0): # O(√N)
    if not size:
        maxSize = int((N*2+0.25)**0.5-0.5) # ∑1..maxSize <= N
        size = random.randrange(1,maxSize) # select random size (needs N >= 3)
    if size == 1: return (N,) # one part --> all of N
    s = size*(size-1)//2 # min sum of deltas for rest
    a = random.randrange(1,(N-s)//size) # base value
    p = randPart(N-a*size,size-1) # deltas on other parts
    return (a,*(n+a for n in p)) # combine to distinct parts
usage:
size = 30
n = 10
chunks = randPart(size)
numbers = random.sample(range(n),len(chunks))
result = [n for count,n in zip(chunks,numbers) for _ in range(count)]
print(result)
[9, 9, 9, 0, 0, 0, 0, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6]
# resulting frequency counts
from collections import Counter
print(sorted(Counter(result).values()))
[3, 4, 5, 6, 12]
Note that if your range of random numbers is smaller than the maximum number of distinct parts (for example, fewer than 44 numbers for an output of 1000 values), you would need to modify the randPart function to take the limit into account in its calculation of maxSize:
def randPart(N,sizeLimit=0,size=0):
    if not size:
        maxSize = int((N*2+0.25)**0.5-0.5) # ∑1..maxSize <= N
        maxSize = min(maxSize,sizeLimit or maxSize)
        ...
You could also change it to force a minimum number of parts.
This solves your problem in the way @MYousefi suggested.
import random
seq = list(range(50))
random.shuffle(seq)
values = []
for n,v in enumerate(seq):
    values.extend( [v]*(n+1) )
    if len(values) > 1000:
        break
print(values)
Note that you can't get exactly 1,000 numbers. At first, I generated the entire sequence and then took the first 1,000, but that means whichever sequence gets truncated will be the same length as one of the earlier ones. You end up with 1,035.
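If landing on exactly the target size matters, one possible variation (a sketch, not part of the answer above) is to use the counts 1, 2, ..., k and add the leftover to the largest count, which keeps all counts distinct. The helper name distinct_count_sample is made up for illustration, and the sketch assumes it is acceptable that only the assignment of values to counts is random, not the counts themselves.

```python
import random
from collections import Counter

def distinct_count_sample(population, total):
    """Return `total` values from `population` such that no two values occur equally often."""
    # Largest k (capped by the population size) with 1 + 2 + ... + k <= total.
    k = 0
    while k < len(population) and (k + 1) * (k + 2) // 2 <= total:
        k += 1
    if k == 0:
        raise ValueError("need total >= 1 and a non-empty population")
    counts = list(range(1, k + 1))
    counts[-1] += total - sum(counts)        # leftover goes to the largest count, so counts stay distinct
    chosen = random.sample(population, k)    # k different values in random order
    out = [v for v, c in zip(chosen, counts) for _ in range(c)]
    random.shuffle(out)
    return out

values = distinct_count_sample(range(51), 1000)
print(len(values), sorted(Counter(values).values()))
```

For 1000 values this uses the counts 1 to 43 plus one count of 54, because 1 + 2 + ... + 44 is 990 and the remaining 10 are added to the last count.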
Suppose I have a list like this, where numbers increase in different steps:
[ 0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
I want to return the index for the first element in the list where the increase is incremental (+1 step only). In this case, 23 is the first location from which point the increase becomes incremental, and its index would be 8, which is what I want as an output.
What would be an elegant simple way to achieve this? This is what I have tried:
>>> for (a,b) in zip(l, l[1:]):
...     if b-a == 1:
...         print(l.index(a))
...         break
UPDATE: In this particular setup, once the increase becomes incremental it will continue to stay that way. It is possible that the increase will never become incremental.
Solution 1: operator
from operator import sub, indexOf
L = [ 0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
print(indexOf(map(sub, L[1:], L), 1))
# prints 8
This raises ValueError: sequence.index(x): x not in sequence if a difference of 1 never occurs, so you might want to wrap it in try/except.
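A minimal sketch of that try/except wrapper. The function name first_unit_step and the choice to return None when no step of 1 exists are assumptions made for illustration:

```python
from operator import sub, indexOf

def first_unit_step(seq):
    """Index of the first element whose successor is exactly 1 larger, or None."""
    try:
        return indexOf(map(sub, seq[1:], seq), 1)
    except ValueError:
        return None   # no adjacent pair differs by exactly 1

L = [0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
print(first_unit_step(L))          # 8
print(first_unit_step([0, 5, 9]))  # None
```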
Solution 2: bisect
This one only takes O(log n) time, using the monotonicity of incrementalness (as you commented "once the increase becomes incremental it will continue to stay that way").
from bisect import bisect
L = [ 0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
class IsIncremental:
    def __getitem__(_, i):
        return L[i+1] - L[i] == 1
print(bisect(IsIncremental(), False, 0, len(L) - 1))
# prints 8
Prints len(L) - 1 if difference 1 never occurs.
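The same idea can also be written as an explicit binary search without the helper class. This is only a sketch under the same assumption as above (once the step becomes 1 it stays 1) and for a non-empty list; the function name first_incremental_index is made up for illustration.

```python
def first_incremental_index(seq):
    """Binary search for the first i with seq[i+1] - seq[i] == 1.

    Assumes that once the step is 1 it stays 1. Returns len(seq) - 1 if
    no such index exists, matching the bisect version.
    """
    lo, hi = 0, len(seq) - 1
    while lo < hi:
        mid = (lo + hi) // 2            # mid < hi, so seq[mid + 1] is always valid
        if seq[mid + 1] - seq[mid] == 1:
            hi = mid                    # the answer is mid or somewhere to its left
        else:
            lo = mid + 1                # the answer is strictly to the right
    return lo

L = [0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
print(first_incremental_index(L))  # 8
```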
Btw... readability
As PEP 8 says:
Never use the characters 'l' (lowercase letter el), [...] as single character variable names. In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use 'l', use 'L' instead.
Steps:
Iterate over the array until the second last element.
Check if next element value differs from current element value by exactly 1.
Print the index and break the loop.
Code:
my_list = [0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
for i in range(len(my_list)-1):
    if my_list[i+1] - my_list[i] == 1:
        print(i)
        break
Result:
8
Here is an iterative approach. We can loop over the list and take the following action at each index:
If the current value is one plus the previous, then don't move the incremental index
Otherwise, reset the incremental index to the current position
If we reach the end of the list and we have an incremental index which is earlier than the last position, then we have a potential match.
lst = [0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
idx = 0
for i in range(1, len(lst)):
    if lst[i] != lst[i-1] + 1:
        idx = i
if idx < len(lst) - 1:
    print("Found index: " + str(idx) + ", value: " + str(lst[idx]))
else:
    print("No incremental index found")
This prints:
Found index: 8, value: 23
Do a for each loop and check the previous value with the current. Once you reach a point where your current value is only 1 greater than the previous value, return the index of the previous value in your array:
myList = [ 0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
lastVal = -1000
for i in myList:
    if i - lastVal == 1:
        print(myList.index(lastVal)) #will print your desired value's index. If this is in a function, replace print with return
        break
    lastVal = i
if myList.index(lastVal) == len(myList) - 1:
    print("There is no incremental increase in your array")
(edited, replaced return with lastVal, fixed to print the index)
Output:
8
Here is a way to do this using list comprehension
lst = [ 0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
# i > 0 keeps the first element from being compared with lst[-1], the last one
list2 = [i-1 for i,j in enumerate(lst) if i > 0 and j-lst[i-1]==1]
if len(list2)>0:
    print(list2[0])
else:
    print('No one up number exists')
Similar to the previous answer.
myList = [0, 4, 6, 8, 12, 15, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
l0 = 0 #suppose that the initial value is 0
for l1 in myList:
    increment = l1 - l0
    if increment == 1:
        print(myList.index(l0)) #if you want to get the second element, use l1 instead.
        break #if you want to get all the first elements that has 1 increment, remove break
    l0 = l1 #memorize l1
I'm pretty new to programming and python. I was asked to find out a pair of socks from a given list of numbers.
My question was - "There is a large pile of socks that must be paired by color. Given an array of integers representing the color of each sock, determine how many pairs of socks with matching colors there are."
Sample Input
STDIN                         Function
-----                         --------
9                             n = 9
10 20 20 10 10 30 50 10 20    ar = [10, 20, 20, 10, 10, 30, 50, 10, 20]
Sample Output
3
So my logic was pretty simple: iterate through the list, take a number, and compare it with the others. If two equal numbers are found, count them as a pair and remove them from the list. Then do the same until none are left.
# Complete the sockMerchant function below.
def sockMerchant(n, ar):
    print(ar)
    l=[]
    result=0
    for i in ar:
        a=i
        c=0
        print("a",a)#line for checking
        ar.remove(i)
        l=ar
        print("ar",ar)#line for checking
        print("l", l)#line for checking
        for j in l:
            f=l.index(j)
            print("index", f)#line for checking
            print("j",j)#line for checking
            if j == a:
                c=c+1
                print("c",c)#line for checking
                ar.remove(j)
                print("ar2",ar)#line for checking
    result=c/2
    print("c2",c)#line for checking
    return result
n=9
ar=[10, 20, 20, 10, 10, 30, 50, 10, 20]
sockMerchant(n, ar)
Please ignore the lines marked with comments; they are only there to show the control flow. Here is my output:
[10, 20, 20, 10, 10, 30, 50, 10, 20]
a 10
ar [20, 20, 10, 10, 30, 50, 10, 20]
l [20, 20, 10, 10, 30, 50, 10, 20]
index 0
j 20
index 0
j 20
index 2
j 10
c 1
ar2 [20, 20, 10, 30, 50, 10, 20]
index 3
j 30
index 4
j 50
index 2
j 10
c 2
ar2 [20, 20, 30, 50, 10, 20]
a 20
ar [20, 30, 50, 10, 20]
l [20, 30, 50, 10, 20]
index 0
j 20
c 1
ar2 [30, 50, 10, 20]
index 1
j 50
index 2
j 10
index 3
j 20
c 2
ar2 [30, 50, 10]
a 10
ar [30, 50]
l [30, 50]
index 0
j 30
index 1
j 50
c2 0
Python is full of wonderful utils that can be helpful. Counters from collections can be used for counting how many socks of each color you got and then you just divide by 2 to get the number of pairs.
from collections import Counter
from typing import List
def sock_merchant(socks: List[int]) -> int:
    counter = Counter(socks)
    return sum(count // 2 for count in counter.values())
Initializing counter with an array will give you something that looks like
Counter({10: 4, 20: 3, 30: 1, 50: 1})
which is the value from the array (i.e color of the sock) and the number of times it occurs in the array.
Like normal dicts, counters have a .values() method that gives you only the values. Since we want the number of pairs, we integer-divide each value by 2 and sum the results.
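For comparison, here is a sketch that follows the pairing logic from the question more literally but never removes items from the list it is iterating over, which is what makes the loops in the question skip elements. The function name count_pairs is made up for illustration. It keeps a set of colors that are currently waiting for a partner.

```python
def count_pairs(socks):
    """Count pairs of equal values by tracking which colors are still unmatched."""
    unmatched = set()
    pairs = 0
    for color in socks:
        if color in unmatched:
            unmatched.remove(color)   # second sock of this color: one pair completed
            pairs += 1
        else:
            unmatched.add(color)      # first sock of this color: wait for a partner
    return pairs

print(count_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20]))  # 3
```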
E.g. For the input 5, the output should be 7.
(bin(1) = 1, bin(2) = 10 ... bin(5) = 101) --> 1 + 1 + 2 + 1 + 2 = 7
Here's what I've tried, but it isn't a very efficient algorithm, considering that I iterate the loop once for each integer. My code (Python 3):
i = int(input())
a = 0
for b in range(i+1):
    a = a + bin(b).count("1")
print(a)
Thank you!
Here's a solution based on the recurrence relation from OEIS:
def onecount(n):
    if n == 0:
        return 0
    if n % 2 == 0:
        m = n//2 # integer division, so the results stay ints in Python 3
        return onecount(m) + onecount(m-1) + m
    m = (n-1)//2
    return 2*onecount(m)+m+1
>>> [onecount(i) for i in range(30)]
[0, 1, 2, 4, 5, 7, 9, 12, 13, 15, 17, 20, 22, 25, 28, 32, 33, 35, 37, 40, 42, 45, 48, 52, 54, 57, 60, 64, 67, 71]
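Another way to avoid visiting every integer, sketched here as an alternative to the recurrence rather than as the method from the answer above, is to count each bit position separately. Among the numbers 0..n, the bit with value 2^k is off for 2^k numbers, then on for 2^k numbers, and so on, so the number of times it is set follows from one division. The function name ones_up_to is made up for illustration.

```python
def ones_up_to(n):
    """Total number of 1 bits in the binary representations of 0..n."""
    total = 0
    bit = 1                                        # value of the bit position: 1, 2, 4, ...
    while bit <= n:
        full_cycles, rest = divmod(n + 1, 2 * bit)
        # Each full cycle of length 2*bit has this bit set `bit` times;
        # in the incomplete last cycle it is set for everything past the first `bit` numbers.
        total += full_cycles * bit + max(0, rest - bit)
        bit *= 2
    return total

print(ones_up_to(5))  # 7
```

The loop runs once per bit of n, so the work grows with the number of binary digits of n rather than with n itself.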
gmpy2, due to Alex Martelli et al., seems to perform better, at least on my Win10 machine.
from time import time
import gmpy2
def onecount(n):
    if n == 0:
        return 0
    if n % 2 == 0:
        m = n//2
        return onecount(m) + onecount(m-1) + m
    m = (n-1)//2
    return 2*onecount(m)+m+1

N = 10000
initial = time()
for _ in range(N):
    for i in range(30):
        onecount(i)
print(time()-initial)
initial = time()
for _ in range(N):
    total = 0
    for i in range(30):
        total+=gmpy2.popcount(i)
print(time()-initial)
Here's the output:
1.7816979885101318
0.07404899597167969
If you want a list, and you're using Python 3.2 or later:
>>> from itertools import accumulate
>>> result = list(accumulate([gmpy2.popcount(_) for _ in range(30)]))
>>> result
[0, 1, 2, 4, 5, 7, 9, 12, 13, 15, 17, 20, 22, 25, 28, 32, 33, 35, 37, 40, 42, 45, 48, 52, 54, 57, 60, 64, 67, 71]
I'm trying to add the difference between numbers and, until the total difference > 20, append the number to a list that becomes a dictionary value where the key is the number_set. I'm getting a list out of range error right now.
The output should be multiple dictionary entries whose list differences add up to the closest number < 20.
number_set = 1
number_dict = {}
num_list = [1, 3, 5, 9, 18, 20, 22, 25, 27, 31]
incl_num_list = []
total = 0
for x in range(1, len(num_list)):
    if total < 20:
        total = total + (num_list[x+1] - num_list[x])
        incl_num_list.append(num_list[x])
    else:
        number_dict.update({km: num_list})
        km += 1
        incl_num_list = []
        total = 0
for k, v in number_dict.items():
    print k
    print v
The output should be
1
[1, 3, 5, 9, 18, 20]
2
[22, 25, 27, 31]
num_list = [1, 3, 5, 9, 18, 20, 22, 25, 27, 31]
overflow = 20
total = 0
key = 1
number_dict = {1: [1]}
for left, right in zip(num_list[:-1], num_list[1:]):
    total += right - left
    if total >= overflow:
        key += 1
        number_dict[key] = [right]
        total = 0
    else:
        number_dict[key].append(right)
for k, v in sorted(number_dict.items()):
    print(k) # parenthesized form works in both Python 2 and Python 3
    print(v)
Outputs:
1
[1, 3, 5, 9, 18, 20]
2
[22, 25, 27, 31]
For one thing, you're using km before it's been assigned to anything.
Traceback (most recent call last):
  File "<pyshell#29>", line 15, in <module>
    number_dict.update({km: num_list})
NameError: name 'km' is not defined
As ndpu points out, your last x will be out of range on your num_list.