I have a minimum value and a maximum value, and I'd like to generate a list of numbers between them such that all the numbers have equal counts. Is there a numpy function, or any other function, that does this?
Example: GenerateNums(start=1, stop=5, nums=10)
Expected output: [1,1,2,2,3,3,4,4,5,5], i.e. each number has an almost equal count
Takes "almost equal" to heart -- the difference between the most common and least common number is at most 1. No guarantee about which number is the mode.
def gen_nums(start, stop, nums):
    binsize = (1 + stop - start) * 1.0 / nums
    return map(lambda x: int(start + binsize * x), xrange(nums))
gen_nums(1, 5, 10)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
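The same idea can be written for Python 3 with integer arithmetic only, which sidesteps floating-point rounding. This is a sketch rather than the answer's own code; the name spread_evenly is made up for illustration.

```python
from collections import Counter

def spread_evenly(start, stop, nums):
    """Return nums integers from start..stop (inclusive) with near-equal counts."""
    span = stop - start + 1  # number of distinct values
    # (i * span) // nums walks through 0..span-1 in nums evenly sized steps
    return [start + (i * span) // nums for i in range(nums)]

print(spread_evenly(1, 5, 10))  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

# When nums is not a multiple of the span, the counts differ by at most 1.
counts = Counter(spread_evenly(1, 5, 12))
print(max(counts.values()) - min(counts.values()))  # 1
```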
There is a numpy function:
In [3]: np.arange(1,6).repeat(2)
Out[3]: array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
def GenerateNums(start=1, stop=5, nums=10):
    result = []
    rep = nums // (stop - start + 1)  # how many times each value repeats
    for i in range(start, stop + 1):  # stop is inclusive
        for j in range(rep):
            result.append(i)
    return result
For almost equal counts, you can sample from a uniform distribution. numpy.random.randint does this:
>>> import numpy as np
>>> np.random.randint(low=1, high=6, size=10)
array([4, 5, 5, 4, 5, 5, 2, 1, 4, 2])
To get these values in sorted order:
>>> sorted(np.random.randint(low=1, high=6, size=10))
[1, 1, 1, 2, 3, 3, 3, 3, 5, 5]
This process is just like rolling dice :) As you sample more times, the counts of each value should become very similar:
>>> from collections import Counter
>>> Counter(np.random.randint(low=1, high=6, size=10000))
Counter({1: 1978, 2: 1996, 3: 2034, 4: 1982, 5: 2010})
For exactly equal counts:
>>> range(1,6) * 2
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> sorted(range(1,6) * 2)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
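In Python 3, range returns a lazy range object rather than a list, so range(1,6) * 2 raises a TypeError. A sketch of the same idea for Python 3 (the variable names are illustrative):

```python
values = list(range(1, 6))  # materialize the range before repeating it
repeated = sorted(values * 2)
print(repeated)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```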
def GenerateNums(start=0, stop=0, nums=0, result=None):
    assert (nums and stop > 0), "ZeroDivisionError"
    if result is None:  # a default of [] would be shared between calls
        result = []
    # get the number of times each value repeats
    iter_val = int(round(nums/stop))
    # go through start..stop and append each item iter_val times
    for x in range(start, stop):
        result.extend([x] * iter_val)
    return result
print (GenerateNums(start=0, stop=5, nums=30))
>>> [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]
I have an array that looks like this:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
I want to write a function that will randomly return some specified number of indices that correspond to a specified number. In other words, if I pass the function the array x, the desired number of indices such as 3, and the target value 1, I would want it to return an array such as:
[0, 7, 13]
Since 0, 7, and 13 are the indices that correspond to 1 in x.
Does anyone know how I might do this efficiently?
You want to use random.sample for this:
import random
def f(arr, target, num):
    return random.sample([i for i, x in enumerate(arr) if x == target], k=num)
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(f(x, 1, 3))
Output:
[0, 1, 15]
You can use the sample function from the random module and pass it the list of indices that match the specified value:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
from random import sample
def randomIndices(a,count,v):
    return sample([i for i,n in enumerate(a) if n==v],count)
print(randomIndices(x,3,1)) # [1,18,15]
Your question asks how to do this efficiently, which depends on how you plan to use this code. As others and I have pointed out, one way is to use enumerate to filter the list for the indices that correspond to the target value. The downside is that each time you pick a new target value or request a new sample, you have to enumerate the list again, which is an O(n) operation.
If you plan on taking multiple samples, you may be better off building a dictionary that maps each value to its indices upfront. You can then use this dictionary to draw random samples more efficiently than by enumerating. (The savings grow as x becomes very large.)
First build the dictionary using collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for i, val in enumerate(x):
    d[val].append(i)
print(dict(d))
#{1: [0, 1, 7, 13, 15, 16, 18], 2: [2, 5, 6, 8, 10, 12, 14, 17], 3: [3, 4, 9, 11]}
Now you can use d to draw your samples:
from random import sample
def get_random_sample(d, target_value, size):
    return sample(d[target_value], size)
print(get_random_sample(d, target_value=1, size=3))
#[16, 7, 18]
You can do the following:
1. Get the indices of the items whose value equals 1.
2. Use random.sample to randomly select a few of the indices from the previous step (without repetition).
Here is one way to do it (n indicates the number of indices to pick):
from random import sample
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
n = 3
target = 1
# build a list: random.sample no longer accepts sets as of Python 3.11
indices = [k for k in range(len(x)) if x[k] == target]
out = sample(indices, min(len(indices), n))
print(out)
Note that the number of returned indices can be lower than n (if the list contains fewer than n occurrences of the target value).
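The clamping behaviour described above can be wrapped in a function. This is a sketch only; the name sample_indices and its rng parameter are hypothetical, not part of the random module.

```python
import random

def sample_indices(values, target, n, rng=random):
    """Pick up to n distinct indices i such that values[i] == target."""
    matches = [i for i, v in enumerate(values) if v == target]
    # random.sample raises ValueError when asked for more items than exist,
    # so clamp the request to the number of matches.
    return rng.sample(matches, min(n, len(matches)))

x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
picked = sample_indices(x, target=3, n=10)
print(sorted(picked))  # [3, 4, 9, 11], since only four 3s exist
```

Passing a seeded random.Random instance as rng makes the draw reproducible.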
I have a Python list
a = [1, 2, 3, 4]
and I'd like to get a range of indices such that if I select the indices 0 through N, I'm getting (for N=10) the repeated
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
I could of course repeat the list via (int(float(N) / len(a) - 0.5) + 1) * a first and select the range [0:10] out of that, but that feels rather clumsy.
Any hints?
You can simply use the modulo operator when accessing the list, i.e.
a[i % len(a)]
This gives you the same result but doesn't require actually storing the redundant elements.
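Applied to the list from the question, the modulo lookup can be sketched as follows (the name wrapped is illustrative):

```python
a = [1, 2, 3, 4]
N = 10
# Index i wraps around to the start of the list every len(a) steps.
wrapped = [a[i % len(a)] for i in range(N)]
print(wrapped)  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
```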
You can use itertools.cycle and itertools.islice:
from itertools import cycle, islice
my_list = list(islice(cycle(my_list), 10))
Note that if you just want to iterate over this once, you should avoid calling list and just iterate over the iterable, since this avoids allocating repeated elements.
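A minimal sketch of iterating directly over the sliced cycle, without building the repeated list first (summing the values is just a stand-in for whatever per-item work is needed):

```python
from itertools import cycle, islice

a = [1, 2, 3, 4]
total = 0
# islice stops the otherwise endless cycle after 10 items;
# the repeated list is never materialized.
for value in islice(cycle(a), 10):
    total += value
print(total)  # 23
```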
One easy way is to use modulo with list comprehensions à la
a = [1, 2, 3 ,4]
[k % len(a) for k in range(10)]
>>> a = [1, 2, 3, 4]
>>> (a*3)[:-2]
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
Thought I would offer a solution using the * operator for lists.
def repeat_iterable(a, N):
    factor = N // len(a) + 1  # enough whole copies to cover N items
    repeated_list = a * factor
    return repeated_list[:N]
Sample Output:
>>> print repeat_iterable([1, 2, 3, 4], 10)
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
>>> print repeat_iterable([1, 2, 3, 4], 3)
[1, 2, 3]
>>> print repeat_iterable([1, 2, 3, 4], 0)
[]
>>> print repeat_iterable([1, 2, 3, 4], 14)
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
How about faking it? Python is good at faking.
class InfiniteList(object):
    def __init__(self, data):
        self.data = data
    def __getitem__(self, i):
        return self.data[i % len(self.data)]
x = InfiniteList([10, 20, 30])
x[0] # 10
x[34] # 20
Of course, you could add __iter__, support for slices etc. You could also add a limit (N), but this is the general idea.
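One possible way to add slice support and __iter__ is sketched below. The class name CyclicList and the choice to reject open-ended slices are assumptions made for this example, not part of the answer above.

```python
from itertools import cycle

class CyclicList:
    """Read-only view that treats a list as if it repeated forever."""

    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, i):
        if isinstance(i, slice):
            if i.stop is None:
                raise ValueError("an open-ended slice would be infinite")
            start = 0 if i.start is None else i.start
            step = 1 if i.step is None else i.step
            return [self.data[k % len(self.data)]
                    for k in range(start, i.stop, step)]
        return self.data[i % len(self.data)]

    def __iter__(self):
        return cycle(self.data)  # endless iterator; pair it with islice

x = CyclicList([1, 2, 3, 4])
print(x[:10])  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
print(x[34])   # 3
```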
How can I create a list of consecutive numbers where each number repeats N times, for example:
list = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5]
Another idea, without any need for other packages or sums:
[x//N for x in range((M+1)*N)]
Where N is your number of repeats and M is the maximum value to repeat. E.g.
N = 3
M = 5
[x//N for x in range((M+1)*N)]
yields
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
My first instinct is to get some functional help from the funcy package. If N is the number of times to repeat each value, and M is the maximum value to repeat, then you can do
import funcy as fp
fp.flatten(fp.repeat(i, N) for i in range(M + 1))
This will return a generator, so to get the array you can just call list() around it
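If adding a dependency is not desirable, the same flatten-and-repeat approach can be sketched with the standard library's itertools. This is a substitute for the funcy calls, not their implementation.

```python
from itertools import chain, repeat

N = 3  # repeats per value
M = 5  # largest value
# chain.from_iterable lazily flattens the per-value repeat() iterators.
result = list(chain.from_iterable(repeat(i, N) for i in range(M + 1)))
print(result)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
```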
sum([[i]*n for i in range(0,x)], [])
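Here x is the number of distinct values and n is the repeat count. Note that sum with a list as the start value builds a new list at every step, so its cost grows quadratically with the length of the result. A nested comprehension gives the same list in a single pass; this is an alternative sketch, not the answer's code:

```python
x = 6  # number of distinct values (0..5)
n = 3  # repeats per value
via_sum = sum([[i] * n for i in range(0, x)], [])
via_comprehension = [i for i in range(x) for _ in range(n)]
print(via_comprehension == via_sum)  # True
```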
The following piece of code is the simplest version I can think of.
It’s a bit dirty and long, but it gets the job done.
In my opinion, it’s easier to comprehend.
def mklist(s, n):
    l = []  # will hold each element repeated n times
    for i in range(s):  # iterate over 0 .. s-1
        for j in range(n):  # append element i to l, n times
            l.append(i)
    return l  # finally, return the list
If you run the code …:
print mklist(10, 2)
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
print mklist(5, 3)
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Another version a little neater, with list comprehension.
But uhmmm… We have to sort it though.
def mklist2(s, n):
    # list(...) keeps this working on Python 3, where range is not a list
    return sorted(list(range(s)) * n)
Running that version will give the following results.
print mklist2(5, 3)
Raw : [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Sorted: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Let's consider a 2d-array A
2 3 5 7
2 3 5 7
1 7 1 4
5 8 6 0
2 3 5 7
The first, second and last rows are identical. The algorithm I'm looking for should return the number of identical rows for each distinct row (i.e. the number of duplicates of each row). It would be great if the script could also be easily modified to count identical columns.
I use an inefficient naive algorithm to do that:
import numpy
A=numpy.array([[2, 3, 5, 7],[2, 3, 5, 7],[1, 7, 1, 4],[5, 8, 6, 0],[2, 3, 5, 7]])
i=0
end = len(A)
while i<end:
    print i,
    j=i+1
    numberID = 1
    while j<end:
        print j
        if numpy.array_equal(A[i,:] ,A[j,:]):
            numberID+=1
        j+=1
    i+=1
print A, len(A)
Expected result:
array([3,1,1]) # number identical arrays per line
My algorithm looks like plain Python used on top of numpy, and is therefore inefficient. Thanks for any help.
In numpy >= 1.9.0, np.unique has a return_counts keyword argument you can combine with the solution here to get the counts:
b = np.ascontiguousarray(A).view(np.dtype((np.void, A.dtype.itemsize * A.shape[1])))
unq_a, unq_cnt = np.unique(b, return_counts=True)
unq_a = unq_a.view(A.dtype).reshape(-1, A.shape[1])
>>> unq_a
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> unq_cnt
array([1, 3, 1])
In an older numpy, you can replicate what np.unique does, which would look something like:
a_view = np.array(A, copy=True)
a_view = a_view.view(np.dtype((np.void,
a_view.dtype.itemsize*a_view.shape[1]))).ravel()
a_view.sort()
a_flag = np.concatenate(([True], a_view[1:] != a_view[:-1]))
a_unq = A[a_flag]
a_idx = np.concatenate(np.nonzero(a_flag) + ([a_view.size],))
a_cnt = np.diff(a_idx)
>>> a_unq
array([[1, 7, 1, 4],
[2, 3, 5, 7],
[5, 8, 6, 0]])
>>> a_cnt
array([1, 3, 1])
You can lexsort on the row entries, which will give you the indices for traversing the rows in sorted order, making the search O(n) rather than O(n^2). Note that by default, the elements in the last column sort last, i.e. the rows are 'alphabetized' right to left rather than left to right.
In [9]: a
Out[9]:
array([[2, 3, 5, 7],
[2, 3, 5, 7],
[1, 7, 1, 4],
[5, 8, 6, 0],
[2, 3, 5, 7]])
In [10]: lexsort(a.T)
Out[10]: array([3, 2, 0, 1, 4])
In [11]: a[lexsort(a.T)]
Out[11]:
array([[5, 8, 6, 0],
[1, 7, 1, 4],
[2, 3, 5, 7],
[2, 3, 5, 7],
[2, 3, 5, 7]])
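Once the rows are in sorted order, identical rows sit next to each other, so a single pass is enough to count them. A pure-Python sketch of that counting step follows; it uses sorted and itertools.groupby in place of lexsort, so the rows come out in left-to-right lexicographic order, unlike the lexsort output above.

```python
from itertools import groupby

rows = [[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]]
# Sorting puts equal rows next to each other; groupby then collects each run.
counts = [(tuple(row), len(list(group))) for row, group in groupby(sorted(rows))]
print(counts)
# [((1, 7, 1, 4), 1), ((2, 3, 5, 7), 3), ((5, 8, 6, 0), 1)]
```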
You can use Counter class from collections module for this.
It works like this :
x = [2, 2, 1, 5, 2]
from collections import Counter
c=Counter(x)
print c
Output : Counter({2: 3, 1: 1, 5: 1})
The only issue in your case is that every value of x is itself a list, which is not hashable.
If you convert every value of x to a tuple, it works:
x = [(2, 3, 5, 7),(2, 3, 5, 7),(1, 7, 1, 4),(5, 8, 6, 0),(2, 3, 5, 7)]
from collections import Counter
c=Counter(x)
print c
Output : Counter({(2, 3, 5, 7): 3, (5, 8, 6, 0): 1, (1, 7, 1, 4): 1})
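The tuple conversion can be done on the fly, and the same pattern covers the column case from the question by transposing with zip. A sketch, assuming the data is a nested list (for a numpy array, A.tolist() produces one):

```python
from collections import Counter

A = [[2, 3, 5, 7], [2, 3, 5, 7], [1, 7, 1, 4], [5, 8, 6, 0], [2, 3, 5, 7]]
row_counts = Counter(map(tuple, A))  # each row becomes a hashable tuple
col_counts = Counter(zip(*A))        # zip(*A) yields the columns as tuples
print(row_counts[(2, 3, 5, 7)])      # 3
print(max(col_counts.values()))      # 1, because no column is duplicated
```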
I have this small bit of code which I want to know if it could be written in list comprehension. The while loop part is what I am interested in condensing.
>>> sum=33
>>> final_list=[]
>>> LastCoin=[0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1]
>>> while sum>0:
...     final_list.append(LastCoin[sum])
...     sum-=LastCoin[sum]
...
>>> print final_list
[1, 2, 5, 5, 5, 5, 5, 5]
>>>
Is there any good reason you are trying to use a list comprehension?
I see personally a lot of people trying to wedge list comprehensions where they don't belong, because, you know, 'list comprehensions are faster - they're in native C! whereas your boring loop is in interpreted Python'. That's not always true.
Just as a reference, if we compare your original solution, which is concise and readable, against the two proposed answers, you may find your assumptions violated:
In [5]: %%timeit
   ...: sum=33
   ...: while sum > 0:
   ...:     final_list.append(LastCoin[sum])
   ...:     sum -= LastCoin[sum]
   ...:
100000 loops, best of 3: 1.96 µs per loop
In [6]: %%timeit
   ...: sum=33
   ...: susu = [sum]
   ...: susu.extend(x for x in xrange(sum,-1,-1)
   ...:             if x==(susu[-1]-LastCoin[susu[-1]])!=0)
   ...: fifi = [LastCoin[x] for x in susu]
   ...:
100000 loops, best of 3: 10.4 µs per loop
# 5x slower
In [10]: %timeit final_list = [LastCoin[reduce(lambda x, y: x - LastCoin[x], range(counter, i, -1))] for i in range(counter -1, 0, -1) if reduce(lambda x, y: x - LastCoin[x], range(counter, i, -1))]
10000 loops, best of 3: 128 µs per loop
# More than 60x slower!!
A list comprehension is a good choice if you are trying to do something for every element in a list - filtering (test every element for true/false), translation, etc. where the operation is separate for every element (and, theoretically, could often be parallelized). It's not very good at loops which do processing and change state during the loop, and they usually look ugly when you try. In this particular case, you only look at 8 items as you go through the list, because you are manually calculating indices to look at. In the list comprehension case, you would have to at least look at all 33.
I don't know if that's your motivation, but if it is, just leave it as a loop. Python loops aren't that bad after all!
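If the goal is mainly to get the result as a single expression, one option that keeps the stateful loop readable is to move it into a generator. This is a sketch; the name coin_walk is made up for illustration.

```python
def coin_walk(last_coin, total):
    """Yield the coin taken at each step until the remaining total hits zero."""
    while total > 0:
        coin = last_coin[total]
        yield coin
        total -= coin

LastCoin = [0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
            1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1]
final_list = list(coin_walk(LastCoin, 33))
print(final_list)  # [1, 2, 5, 5, 5, 5, 5, 5]
```

Like the original while loop, this only visits the eight indices that matter.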
Yes, it's possible:
from functools import reduce  # needed on Python 3; a builtin on Python 2
counter = 33 # your sum variable
LastCoin = [0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
            1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1]
final_list = [LastCoin[reduce(lambda x, y: x - LastCoin[x],
                              range(counter, i, -1))]
              for i in range(counter - 1, 0, -1)
              if reduce(lambda x, y: x - LastCoin[x],
                        range(counter, i, -1))]
print(final_list)
# [1, 2, 5, 5, 5, 5, 5, 5]
But it's not in the spirit of import this!
Edit
Better solution
Not trying to use a list comprehension. Using recursion instead.
Simpler, and time of execution divided by 3 compared to my former solution.
LastCoin=[0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5,
1, 2, 1, 2, 5, 1, 2, 1]
def nekst(x,L,rec=None):
    if rec is None: rec = []
    rec.append(L[x])
    if x-L[x]>0: nekst(x-L[x],L,rec)
    return rec
print nekst(33,LastCoin)
result
[1, 2, 5, 5, 5, 5, 5, 5]
Comparing execution times
NB: the following tests use recursive functions without the line
if rec is None: rec = []
Including this line slightly increases (+12%) the execution time of the recursive solution.
from time import clock
iterat = 10000
N = 100
LastCoin=[0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5,
1, 2, 1, 2, 5, 1, 2, 1]
counter = 33
te = clock()
for i in xrange(iterat):
    final_list = [LastCoin[reduce(lambda x, y: x - LastCoin[x],
                                  range(counter, i, -1))]
                  for i in range(counter -1, 0, -1)
                  if reduce(lambda x, y: x - LastCoin[x],
                            range(counter, i, -1))]
print clock()-te,'Omid Raha'
st=33
E1 = []
for n in xrange(N):
    te = clock()
    for i in xrange(iterat):
        susu2 = [st]
        susu2.extend(x for x in xrange(st,0,-1)
                     if x==(susu2[-1]-LastCoin[susu2[-1]]))
        fifi2 = [LastCoin[x] for x in susu2]
        del fifi2
    E1.append(clock()-te)
t1 = min(E1)
print t1,'eyquem 1'
E2 = []
for n in xrange(N):
    te = clock()
    for i in xrange(iterat):
        def nekst(x,L,rec):
            rec.append(L[x])
            if x-L[x]>0: nekst(x-L[x],L,rec)
            return rec
        fifi3 = nekst(st,LastCoin,[])
        del fifi3,nekst
    E2.append(clock()-te)
t2 = min(E2)
print t2,'eyquem 2, nekst redefined at each turn of the measurement loop'
def nekst(x,L,rec):
    rec.append(L[x])
    if x-L[x]>0: nekst(x-L[x],L,rec)
    return rec
E22 = []
for n in xrange(N):
    te = clock()
    for i in xrange(iterat):
        fifi3 = nekst(st,LastCoin,[])
        del fifi3
    E22.append(clock()-te)
t22 = min(E22)
print t22,'eyquem 2, nekst defined outside of the measurement loop'
W = []
for n in xrange(N):
    te = clock()
    for i in xrange(iterat):
        y = 33
        final_list=[]
        while y>0:
            final_list.append(LastCoin[y])
            y-=LastCoin[y]
        del final_list,y
    W.append(clock()-te)
tw = min(W)
print tw,'while-loop == %.1f %% of %s' % (100*min(W)/min(E22),min(E22))
result
4.10056836833 Omid Raha
0.29426393578 eyquem 1
0.114381576429 eyquem 2, nekst redefined at each turn of the measurement loop
0.107410299354 eyquem 2, nekst defined outside of the measurement loop
0.0820501882362 while-loop == 76.4 % of 0.107410299354
It's a little faster if the definition of the function nekst() is executed outside the timing loop.
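time.clock was removed in Python 3.8, so the script above only runs on older interpreters. On current Python, a comparable measurement can be sketched with the timeit module. The function names below are illustrative, only the while-loop and recursive variants are timed, and the absolute numbers depend on the machine.

```python
import timeit

LastCoin = [0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
            1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5,
            1, 2, 1, 2, 5, 1, 2, 1]

def with_while(total=33):
    out = []
    while total > 0:
        out.append(LastCoin[total])
        total -= LastCoin[total]
    return out

def with_recursion(x=33, rec=None):
    if rec is None:
        rec = []
    rec.append(LastCoin[x])
    if x - LastCoin[x] > 0:
        with_recursion(x - LastCoin[x], rec)
    return rec

for func in (with_while, with_recursion):
    # Keep the minimum of several runs, as the script above does.
    best = min(timeit.repeat(func, number=10000, repeat=5))
    print(f"{func.__name__}: {best:.4f} s for 10000 calls")
```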
Original answer slightly edited
I couldn't do better than that:
sum=33
LastCoin=[0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5,
1, 2, 1, 2, 5, 1, 2, 1]
susu = [sum]
susu.extend(x for x in xrange(sum,0,-1)
if x==(susu[-1]-LastCoin[susu[-1]]))
fifi = [LastCoin[x] for x in susu]
print 'susu == %r\n'\
'fifi == %r\n'\
'wanted : %r' % (susu,fifi,[1, 2, 5, 5, 5, 5, 5, 5])
result
susu == [33, 32, 30, 25, 20, 15, 10, 5]
fifi == [1, 2, 5, 5, 5, 5, 5, 5]
wanted : [1, 2, 5, 5, 5, 5, 5, 5]
The edit is:
x for x in xrange(sum,0,-1) if x==(susu[-1]-LastCoin[susu[-1]])
instead of original
x for x in xrange(sum,-1,-1) if x==(susu[-1]-LastCoin[susu[-1]])!=0
I may be a bit late to the party. With Python 3.8 or later, we can use the walrus operator (:=).
With it, the while loop can be reduced to one line:
LastCoin=[0, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5, 1, 2,
1, 2, 5, 1, 2, 1, 2, 5, 1, 2, 1, 2, 5,
1, 2, 1, 2, 5, 1, 2, 1]
s = 33
final_list = [LastCoin[s]]
while (s:=s-LastCoin[s]) > 0: final_list.append(LastCoin[s])
print (final_list)
The output of this will be:
[1, 2, 5, 5, 5, 5, 5, 5]