How can I create a list of consecutive numbers where each number repeats N times, for example:
list = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5]
Another idea, without any need for other packages or sums:
[x//N for x in range((M+1)*N)]
Where N is your number of repeats and M is the maximum value to repeat. E.g.
N = 3
M = 5
[x//N for x in range((M+1)*N)]
yields
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
My first instinct is to get some functional help from the funcy package. If N is the number of times to repeat each value, and M is the maximum value to repeat, then you can do
import funcy as fp
fp.flatten(fp.repeat(i, N) for i in range(M + 1))
This returns a generator, so to get a list you can just wrap it in list().
sum([[i]*n for i in range(0,x)], [])
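The sum(..., []) one-liner above builds a fresh list on every addition, so its cost grows quadratically with the number of sublists. A minimal sketch of a linear-time alternative using only the standard library follows; the helper name repeat_each is made up for illustration:

```python
from itertools import chain, repeat

def repeat_each(count, times):
    """Return 0..count-1 with every value repeated `times` times (hypothetical helper)."""
    # chain.from_iterable concatenates the per-value iterators lazily
    return list(chain.from_iterable(repeat(i, times) for i in range(count)))

print(repeat_each(6, 3))
```

Dropping the list() call leaves a lazy iterator, which may be preferable when the result is only looped over once.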
The following piece of code is the simplest version I can think of.
It’s a bit dirty and long, but it gets the job done.
In my opinion, it’s easier to comprehend.
def mklist(s, n):
    l = []  # An empty list that will contain the elements
            # and their duplicates.
    for i in range(s):      # We iterate from 0 to s - 1
        for j in range(n):  # and append each element (i) to l n times.
            l.append(i)
    return l  # Finally we return the list.
If you run the code …:
print(mklist(10, 2))
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
print(mklist(5, 3))
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Another version, a little neater, using list repetition.
But uhmmm… we have to sort it, though.
def mklist2(s, n):
    # list(range(s)) * n repeats the whole sequence; sorted() groups equal values
    return sorted(list(range(s)) * n)
Running that version will give the following results.
print(mklist2(5, 3))
Raw : [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Sorted: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
I have an array that looks like this:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
I want to write a function that will randomly return some specified number of indices that correspond to a specified number. In other words, if I pass the function the array x, the desired number of indices such as 3, and the target value 1, I would want it to return an array such as:
[0, 7, 13]
Since 0, 7, and 13 are the indices that correspond to 1 in x.
Does anyone know how I might do this efficiently?
You want to use random.sample for this:
import random
def f(arr, target, num):
    return random.sample([i for i, x in enumerate(arr) if x == target], k=num)
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(f(x, 1, 3))
Output:
[0, 1, 15]
You can use the sample function from the random module and pass it the list of indices that match the specified value:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
from random import sample
def randomIndices(a, count, v):
    return sample([i for i, n in enumerate(a) if n == v], count)
print(randomIndices(x,3,1)) # [1,18,15]
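If x is (or can cheaply become) a NumPy array, the same idea can be expressed without a Python-level loop. This is only a sketch: the function name random_indices_np and the seed parameter are made up for illustration, while np.flatnonzero and Generator.choice are existing NumPy calls.

```python
import numpy as np

def random_indices_np(arr, target, num, seed=None):
    """Pick `num` distinct indices where arr == target (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Indices of all positions that hold the target value
    matches = np.flatnonzero(np.asarray(arr) == target)
    # replace=False draws without repetition; raises ValueError if num > len(matches)
    return rng.choice(matches, size=num, replace=False)

x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(random_indices_np(x, target=1, num=3))
```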
Your question asks how to do this efficiently, which depends on how you plan on using this code. As I and others have pointed out, one way is to use enumerate to filter the list for the indices that correspond to the target value. The downside is that each time you pick a new target value or request a new sample, you have to enumerate the list again, which is an O(n) operation.
If you plan on taking multiple samples, you may be better off building a dictionary that maps each value to its indices upfront. You can then use this dictionary to draw random samples more efficiently than by enumerating. (The savings grow as x becomes very large.)
First build the dictionary using collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for i, val in enumerate(x):
    d[val].append(i)
print(dict(d))
#{1: [0, 1, 7, 13, 15, 16, 18], 2: [2, 5, 6, 8, 10, 12, 14, 17], 3: [3, 4, 9, 11]}
Now you can use d to draw your samples:
from random import sample
def get_random_sample(d, target_value, size):
    return sample(d[target_value], size)
print(get_random_sample(d, target_value=1, size=3))
#[16, 7, 18]
You can do the following:
1. Get the indices of the items with value equal to 1.
2. Use random.sample to randomly select only a few of the indices (without repetitions) extracted in the previous step.
Here is one way to do it (n indicates the number of indices to pick):
from random import sample
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
n = 3
target = 1
indices = [k for k in range(len(x)) if x[k] == target]  # a list: sample() rejects sets in Python 3.11+
out = sample(indices, min(len(indices), n))
print(out)
Note that fewer than n indices may be returned (if the list contains fewer than n occurrences of the target).
I have a large np array called X (size:32000) filled with duplicate values of 0, 1, 2, 3.
I want to replace each of the values(0, 1, 2, 3) with permutations of the following numbers: 0, 1, 2, 3, 4, 5
For example, 0, 1, 2, 3 can be replaced with following:
1, 5, 3, 4
5, 2, 4, 3
0, 5, 1, 4
and so on (there are 360 such permutations in total).
How can I take each of the 360 permutations and replace the 32000 values in X accordingly such that finally I have 360 versions of X for each permutation?
You can try the method numpy.choose:
import numpy as np
x = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3,])
perm = [1, 5, 3, 4,]
x = np.choose(x, perm)
np.choose(x, perm) will choose a value from perm for each value of x, taking x as a list of indices. I recommend looking at the documentation since this function can lead to confusion.
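The np.choose call above handles a single permutation. To build all 360 versions in one go, one option is to stack the permutations into a 2-D array and index it with X. The following is a sketch under the assumption that X contains only the values 0 to 3; the short array here is a stand-in for the real 32000-element one.

```python
from itertools import permutations

import numpy as np

# Small stand-in for the real array (assumed to hold only 0, 1, 2, 3)
X = np.array([0, 1, 2, 3, 0, 1, 2, 3, 3, 2])

# Each row is one ordered pick of 4 distinct numbers from 0..5: 6*5*4*3 = 360 rows
perms = np.array(list(permutations(range(6), 4)))

# Fancy indexing: all_versions[p, j] == perms[p, X[j]]
all_versions = perms[:, X]

print(all_versions.shape)  # (360, 10)
```

Row p of all_versions is X with each value v replaced by perms[p, v]. For the full-size X this produces 360 * 32000 elements at once, so looping over perms and replacing one permutation at a time may be kinder on memory.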
I have a minimum value and maximum value, I'd like to generate a list of numbers between them such that all the numbers have equal counts. Is there a numpy function or any function out there?
Example: GenerateNums(start=1, stop=5, nums=10)
Expected output: [1,1,2,2,3,3,4,4,5,5] i.e each number has an almost equal count
Takes "almost equal" to heart -- the difference between the most common and least common number is at most 1. No guarantee about which number is the mode.
def gen_nums(start, stop, nums):
    binsize = (1 + stop - start) * 1.0 / nums  # width of each bin as a float
    return [int(start + binsize * x) for x in range(nums)]
gen_nums(1, 5, 10)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
There is a numpy function:
In [3]: np.arange(1,6).repeat(2)
Out[3]: array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
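The repeat call above needs a fixed count per value, so it fits best when nums is a multiple of the number of distinct values. A sketch for the "almost equal" case using integer binning follows; the function name generate_nums is made up for illustration:

```python
import numpy as np

def generate_nums(start, stop, nums):
    """Spread `nums` slots over start..stop as evenly as possible (hypothetical helper)."""
    k = stop - start + 1  # number of distinct values
    # Slot i falls into bin (i * k) // nums, so bin sizes differ by at most one
    return start + np.arange(nums) * k // nums

print(generate_nums(1, 5, 10))  # [1 1 2 2 3 3 4 4 5 5]
print(generate_nums(1, 5, 7))   # [1 1 2 3 3 4 5]
```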
def GenerateNums(start=1, stop=5, nums=10):
    result = []
    rep = nums // (stop - start + 1)  # whole repeats per value
    for i in range(start, stop + 1):  # include stop itself
        for j in range(rep):
            result.append(i)
    return result
For almost equal counts, you can sample from a uniform distribution. numpy.random.randint does this:
>>> import numpy as np
>>> np.random.randint(low=1, high=6, size=10)
array([4, 5, 5, 4, 5, 5, 2, 1, 4, 2])
To get these values in sorted order:
>>> sorted(np.random.randint(low=1, high=6, size=10))
[1, 1, 1, 2, 3, 3, 3, 3, 5, 5]
This process is just like rolling dice :) As you sample more times, the counts of each value should become very similar:
>>> from collections import Counter
>>> Counter(np.random.randint(low=1, high=6, size=10000))
Counter({1: 1978, 2: 1996, 3: 2034, 4: 1982, 5: 2010})
For exactly equal counts:
>>> list(range(1, 6)) * 2
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> sorted(list(range(1, 6)) * 2)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
def GenerateNums(start=0, stop=0, nums=0, result=None):
    assert nums and stop > start, "nums must be nonzero and stop must exceed start"
    if result is None:  # avoid sharing one default list between calls
        result = []
    # number of times each value is repeated
    iter_val = int(round(nums / (stop - start)))
    # go through start..stop-1 and append each item iter_val times
    for x in range(start, stop):
        result.extend([x] * iter_val)
    return result
print(GenerateNums(start=0, stop=5, nums=30))
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]
I have a Python list
a = [1, 2, 3, 4]
and I'd like to get a range of indices such that if I select the indices 0 through N, I'm getting (for N=10) the repeated
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
I could of course repeat the list via (int(float(N) / len(a) - 0.5) + 1) * a first and select the range [0:10] out of that, but that feels rather clumsy.
Any hints?
You can simply use the modulo operator when accessing the list, i.e.
a[i % len(a)]
This will give you the same result, but doesn't require actually storing the redundant elements.
You can use itertools.cycle and itertools.islice:
from itertools import cycle, islice
my_list = list(islice(cycle(my_list), 10))
Note that if you just want to iterate over this once, you should avoid calling list and just iterate over the iterable, since this avoids allocating repeated elements.
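A small sketch of that lazy style, where the repeated elements are consumed one at a time and the 10-element list is never built (the variable names are only illustrative):

```python
from itertools import cycle, islice

a = [1, 2, 3, 4]

total = 0
# islice stops the otherwise endless cycle after 10 items
for value in islice(cycle(a), 10):
    total += value

print(total)  # 23
```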
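One easy way is to use modulo with a list comprehension, à la
a = [1, 2, 3, 4]
[k % len(a) for k in range(10)]
This gives the cyclic indices [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]; write a[k % len(a)] in the comprehension if you want the values themselves.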
>>> a = [1, 2, 3, 4]
>>> (a*3)[:-2]
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
Thought I would offer a solution using the * operator for lists.
def repeat_iterable(a, N):
    factor = N // len(a) + 1  # enough whole copies to cover N items
    repeated_list = a * factor
    return repeated_list[:N]
Sample Output:
>>> print(repeat_iterable([1, 2, 3, 4], 10))
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
>>> print(repeat_iterable([1, 2, 3, 4], 3))
[1, 2, 3]
>>> print(repeat_iterable([1, 2, 3, 4], 0))
[]
>>> print(repeat_iterable([1, 2, 3, 4], 14))
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
How about faking it? Python is good at faking.
class InfiniteList(object):
    def __init__(self, data):
        self.data = data
    def __getitem__(self, i):
        return self.data[i % len(self.data)]
x = InfiniteList([10, 20, 30])
x[0]   # 10
x[34]  # 20
Of course, you could add __iter__, support for slices etc. You could also add a limit (N), but this is the general idea.
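As a rough sketch of what those extensions might look like, here is a variant with slice and iteration support. The class name CyclicList and the decision to reject open-ended slices are assumptions made for illustration, not part of the answer above.

```python
from itertools import cycle

class CyclicList:
    """Hypothetical extension of InfiniteList with slices and iteration."""

    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, index):
        n = len(self.data)
        if isinstance(index, slice):
            # An open-ended slice would never end, so require an explicit stop
            if index.stop is None:
                raise ValueError("slice needs an explicit stop")
            start = index.start or 0
            step = index.step or 1
            return [self.data[i % n] for i in range(start, index.stop, step)]
        return self.data[index % n]

    def __iter__(self):
        # Endless iterator; callers must break out or use itertools.islice
        return cycle(self.data)

x = CyclicList([1, 2, 3, 4])
print(x[:10])  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
print(x[34])   # 3
```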
I need a sample, without replacement, from among all possible tuples of numbers from range(n). That is, I have a collection of (0,0), (0,1), ..., (0,n), (1,0), (1,1), ..., (1,n), ..., (n,0), (n,1), (n,n), and I'm trying to get a sample of k of those elements. I am hoping to avoid explicitly building this collection.
I know random.sample(range(n), k) is simple and efficient if I needed a sample from a sequence of numbers rather than tuples of numbers.
Of course, I can explicitly build the list containing all possible (n * n = n^2) tuples, and then call random.sample. But that probably is not efficient if k is much smaller than n^2.
I am not sure if things work the same in Python 2 and 3 in terms of efficiency; I use Python 3.
Depending on how many of these you're selecting, it might be simplest to just keep track of what things you've already picked (via a set) and then re-pick until you get something that you haven't picked already.
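A minimal sketch of that re-pick approach, assuming the tuples are pairs drawn from range(n); the function name is made up for illustration:

```python
import random

def sample_pairs_by_rejection(n, k):
    """Draw k distinct (i, j) pairs with 0 <= i, j < n (hypothetical helper)."""
    if k > n * n:
        raise ValueError("k cannot exceed n * n")
    picked = set()
    while len(picked) < k:
        # Duplicates are silently ignored by the set, so we simply draw again
        picked.add((random.randrange(n), random.randrange(n)))
    return list(picked)

print(sample_pairs_by_rejection(1000, 5))
```

This works well while k is small relative to n * n; as k approaches n * n, more and more draws are duplicates and the loop slows down. The returned list follows set iteration order, so shuffle it if the order of the sample matters.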
The other option is to just use some simple math:
numbers_in_nxn = random.sample(range(n*n), k) # Use xrange in Python 2.x
tuples_in_nxn = [divmod(x,n) for x in numbers_in_nxn]
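The divmod trick treats each sampled integer as a two-digit number in base n. A sketch that extends the same idea to tuples of any length follows; the function name sample_tuples and its dim parameter are assumptions made for illustration:

```python
import random

def sample_tuples(n, k, dim=2):
    """Sample k distinct dim-tuples over range(n) without building them all (hypothetical helper)."""
    codes = random.sample(range(n ** dim), k)
    result = []
    for code in codes:
        digits = []
        # Peel off base-n digits, least significant first
        for _ in range(dim):
            code, digit = divmod(code, n)
            digits.append(digit)
        result.append(tuple(reversed(digits)))
    return result

print(sample_tuples(10, 3))
print(sample_tuples(10, 3, dim=3))
```

With dim=2 each result equals divmod(x, n) for the sampled x, matching the two-line version above.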
You say:
"Of course, I can explicitly build the list containing all possible (n * n = n^2) tuples, and then call random.sample. But that probably is not efficient if k is much smaller than n^2."
Well, how about building the tuple after you have randomly picked one? I.e., if you can build the tuples before you randomly choose which ones to pick, you can also do the picking first and the building later.
I don't understand how your tuples are supposed to look, but here is an example. I realize your tuples are all of the same length, but this shows the principle:
Instead of doing this:
>>> import random
>>> all_sequences = [list(range(x)) for x in range(10)]
>>> all_sequences
[[], [0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6], [0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
>>> random.sample(all_sequences, 3)
[[0, 1, 2, 3, 4, 5, 6, 7], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5, 6, 7, 8]]
You would do this:
>>> import random
>>> selection = random.sample(range(10), 3)
>>> [list(range(a)) for a in selection]
# one list per sampled length; the exact output varies from run to run
Without trying (no python at hand):
random.shuffle(range(n))[:k]
See comments: this does not work, because random.shuffle shuffles in place and returns None. Didn't sleep enough...