I want to replace the first N identical consecutive numbers in an array with 0.
import numpy as np
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
OUT -> np.array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
The loop works, but what would be a faster, vectorized implementation?
i = 0
first = x[0]
# check the bound before indexing, so an all-equal array doesn't raise IndexError
while i < x.size and x[i] == first:
    x[i] = 0
    i += 1
You can use argmax on a boolean array to get the index of the first changing value.
Then slice and replace:
n = (x!=x[0]).argmax() # 4
x[:n] = 0
output:
array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
intermediate array:
(x!=x[0])
# n=4
# [False False False False True True True True True True True True
# True True True True True True True]
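One caveat with the argmax approach: if every element equals x[0], the boolean array is all False and argmax returns 0, so nothing gets replaced. A minimal sketch that guards against this case (the function name zero_leading_run is hypothetical, not from the answer above):

```python
import numpy as np

def zero_leading_run(x):
    """Set the leading run of values equal to x[0] to 0, in place, and return x."""
    if x.size == 0:
        return x
    differs = x != x[0]
    # argmax of an all-False array is 0, so fall back to the full length
    n = int(differs.argmax()) if differs.any() else x.size
    x[:n] = 0
    return x

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
print(zero_leading_run(x))
```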
My solution is based on itertools.groupby, so start with import itertools.
This function creates groups of consecutive equal values, unlike e.g.
the pandas version of groupby, which collects all equal values from the
input into a single group.
Another important feature is that you can assign any value to N, and only
the first N values of each run of at least N consecutive equal values will be replaced.
To test my code, I set N = 4 and defined the source array as:
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
Note that it contains 5 consecutive values of 2 at the end.
Then, to get the expected result, run:
import itertools
import numpy as np

N = 4
rv = []
for key, grp in itertools.groupby(x):
    lst = list(grp)
    lgth = len(lst)
    # only runs of at least N equal values are affected
    if lgth >= N:
        lst[0:N] = [0] * N
    rv.extend(lst)
xNew = np.array(rv)
The result is:
[0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]
Note that a sequence of 4 zeros occurs:
at the beginning (all 4 values of 1 have been replaced),
almost at the end (of the 5 values of 2, the first 4 have been replaced).
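The same "first N of every run of length at least N" rule can also be expressed without a Python-level loop over the groups. The following is a sketch under the assumption that x is a 1-D NumPy array; the function name zero_first_n_of_runs is hypothetical. It finds where each run starts, computes each element's position inside its run and the length of that run, and zeroes the elements that satisfy both conditions:

```python
import numpy as np

def zero_first_n_of_runs(x, n):
    """Return a copy of x where the first n values of every run of length >= n are 0."""
    x = np.asarray(x)
    if x.size == 0:
        return x.copy()
    # a run starts at index 0 and wherever a value differs from its predecessor
    starts = np.flatnonzero(np.concatenate(([True], x[1:] != x[:-1])))
    lengths = np.diff(np.append(starts, x.size))
    # position of each element within its run, and the length of that run
    pos = np.arange(x.size) - np.repeat(starts, lengths)
    run_len = np.repeat(lengths, lengths)
    out = x.copy()
    out[(pos < n) & (run_len >= n)] = 0
    return out

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
print(zero_first_n_of_runs(x, 4))
```

Unlike the groupby loop, this returns a new array and leaves x untouched.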
I have an array that looks like this:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
I want to write a function that will randomly return some specified number of indices that correspond to a specified number. In other words, if I pass the function the array x, the desired number of indices such as 3, and the target value 1, I would want it to return an array such as:
[0, 7, 13]
since 0, 7, and 13 are among the indices at which x equals 1.
Does anyone know how I might do this efficiently?
You want to use random.sample for this:
import random
def f(arr, target, num):
    return random.sample([i for i, x in enumerate(arr) if x == target], k=num)
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(f(x, 1, 3))
Output:
[0, 1, 15]
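Since the question is about NumPy, here is a hedged NumPy alternative (a sketch, not one of the answers above): np.flatnonzero finds the matching positions in one vectorized pass, and Generator.choice with replace=False draws distinct indices, much like random.sample. The function name random_indices and the optional rng parameter are hypothetical:

```python
import numpy as np

def random_indices(arr, target, num, rng=None):
    """Draw num distinct indices i at which arr[i] == target."""
    rng = np.random.default_rng() if rng is None else rng
    # all positions holding the target value
    candidates = np.flatnonzero(np.asarray(arr) == target)
    # replace=False avoids repeats; raises ValueError if num > len(candidates)
    return rng.choice(candidates, size=num, replace=False)

x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
print(random_indices(x, target=1, num=3))
```

Passing a seeded generator, e.g. np.random.default_rng(0), makes the draw reproducible.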
You can use the sample function from the random module and pass it the list of indices that match the specified value:
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
from random import sample
def randomIndices(a, count, v):
    return sample([i for i, n in enumerate(a) if n == v], count)
print(randomIndices(x, 3, 1))  # e.g. [1, 18, 15]
Your question asks how to do this efficiently, which depends on how you plan to use this code. As I and others have pointed out, one way is to use enumerate to filter the list for the indices that correspond to the target value. The downside is that each time you pick a new target value or request a new sample, you have to enumerate the list again, which is an O(n) operation.
If you plan on taking multiple samples, you may be better off building a dictionary that maps each value to its indices upfront. You can then use this dictionary to draw random samples without re-enumerating the list each time. (The savings grow as x becomes very large.)
First build the dictionary using collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for i, val in enumerate(x):
    d[val].append(i)
print(dict(d))
#{1: [0, 1, 7, 13, 15, 16, 18], 2: [2, 5, 6, 8, 10, 12, 14, 17], 3: [3, 4, 9, 11]}
Now you can use d to draw your samples:
from random import sample
def get_random_sample(d, target_value, size):
    return sample(d[target_value], size)
print(get_random_sample(d, target_value=1, size=3))
#[16, 7, 18]
You can do the following:
Get the indices of the items with value equal to 1.
Use random.sample to randomly select a few of those indices (without repetition).
Here is one way to do it (n indicates the number of indices to pick):
from random import sample
x = [1, 1, 2, 3, 3, 2, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 1, 2, 1]
n = 3
target = 1
# random.sample needs a sequence (sets are rejected since Python 3.11), so build a list
indices = list(filter(lambda k: x[k] == target, range(len(x))))
out = sample(indices, min(len(indices), n))
print(out)
Note that fewer than n indices may be returned (if the list contains fewer than n 1s).
I have a minimum value and a maximum value, and I'd like to generate a list of numbers between them such that all the numbers have equal counts. Is there a numpy function or any function out there?
Example: GenerateNums(start=1, stop=5, nums=10)
Expected output: [1,1,2,2,3,3,4,4,5,5], i.e. each number has an almost equal count
This takes "almost equal" to heart: the difference between the most common and least common number is at most 1. There is no guarantee about which number is the mode.
def gen_nums(start, stop, nums):
    binsize = (1 + stop - start) / nums
    return [int(start + binsize * x) for x in range(nums)]
gen_nums(1, 5, 10)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
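The gen_nums answer computes floor(start + x * binsize) with floating-point arithmetic, which can in principle land just below an integer boundary. A sketch of the same idea using exact integer floor division instead (gen_nums_int is a hypothetical name; it assumes integer start, stop and a positive nums):

```python
def gen_nums_int(start, stop, nums):
    """Spread nums values over start..stop inclusive, counts differing by at most 1."""
    span = stop - start + 1
    # floor(i * span / nums), computed exactly with integer division
    return [start + (i * span) // nums for i in range(nums)]

print(gen_nums_int(1, 5, 10))
print(gen_nums_int(1, 5, 12))
```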
There is a numpy function:
In [3]: np.arange(1,6).repeat(2)
Out[3]: array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
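repeat(2) only works when nums is an exact multiple of the number of values. np.repeat also accepts one repeat count per element, so a sketch that hands the remainder out one extra copy at a time (generate_nums is a hypothetical name, and giving the extras to the smallest values is an arbitrary choice) could look like:

```python
import numpy as np

def generate_nums(start, stop, nums):
    """Values start..stop inclusive, repeated so counts differ by at most 1."""
    values = np.arange(start, stop + 1)
    counts = np.full(values.size, nums // values.size)
    # hand the remainder out, one extra copy each, to the first values
    counts[: nums % values.size] += 1
    return np.repeat(values, counts)

print(generate_nums(1, 5, 10))  # [1 1 2 2 3 3 4 4 5 5]
print(generate_nums(1, 5, 12))
```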
def GenerateNums(start=1, stop=5, nums=10):
    result = []
    # integer number of repeats per value (any remainder is dropped)
    rep = nums // (stop - start + 1)
    for i in range(start, stop + 1):
        for j in range(rep):
            result.append(i)
    return result
For almost equal counts, you can sample from a uniform distribution. numpy.random.randint does this:
>>> import numpy as np
>>> np.random.randint(low=1, high=6, size=10)
array([4, 5, 5, 4, 5, 5, 2, 1, 4, 2])
To get these values in sorted order:
>>> sorted(np.random.randint(low=1, high=6, size=10))
[1, 1, 1, 2, 3, 3, 3, 3, 5, 5]
This process is just like rolling dice :) As you sample more times, the proportion of each value should get closer to 1/5:
>>> from collections import Counter
>>> Counter(np.random.randint(low=1, high=6, size=10000))
Counter({1: 1978, 2: 1996, 3: 2034, 4: 1982, 5: 2010})
For exactly equal counts:
>>> list(range(1, 6)) * 2
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> sorted(list(range(1, 6)) * 2)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
def GenerateNums(start=0, stop=0, nums=0):
    # stop is exclusive here, as in range()
    assert nums > 0 and stop > start, "need nums > 0 and stop > start"
    # get repeating value
    iter_val = int(round(nums / (stop - start)))
    # go through start/stop and repeat each item iter_val times
    result = []
    for x in range(start, stop):
        result.extend([x] * iter_val)
    return result
print(GenerateNums(start=0, stop=5, nums=30))
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]
How can I create a list of consecutive numbers where each number repeats N times, for example:
list = [0,0,0,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5]
Another idea, without any need for other packages or sums:
[x//N for x in range((M+1)*N)]
Where N is your number of repeats and M is the maximum value to repeat. E.g.
N = 3
M = 5
[x//N for x in range((M+1)*N)]
yields
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
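If NumPy is available, the same integer-division trick vectorizes directly; np.repeat is an equivalent alternative. A short sketch with the same N and M:

```python
import numpy as np

N = 3  # number of repeats per value
M = 5  # maximum value to repeat

# the x // N idea applied to a whole arange at once
a = np.arange((M + 1) * N) // N
# equivalently, repeat each of 0..M N times
b = np.repeat(np.arange(M + 1), N)
print(a.tolist())
```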
My first instinct is to get some functional help from the funcy package. If N is the number of times to repeat each value, and M is the maximum value to repeat, then you can do
import funcy as fp
fp.flatten(fp.repeat(i, N) for i in range(M + 1))
This returns a generator, so wrap it in list() to get a list.
sum([[i]*n for i in range(0,x)], [])
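A standard-library equivalent of the funcy answer, as a sketch: itertools.chain.from_iterable flattens the itertools.repeat iterators lazily. It also avoids sum(..., []), which builds a new list on every addition and is therefore quadratic in the output length:

```python
from itertools import chain, repeat

N = 3  # number of repeats per value
M = 5  # maximum value to repeat

result = list(chain.from_iterable(repeat(i, N) for i in range(M + 1)))
print(result)
```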
The following piece of code is the simplest version I can think of.
It’s a bit dirty and long, but it gets the job done.
In my opinion, it’s easier to comprehend.
def mklist(s, n):
    l = []  # An empty list that will contain the elements and their duplicates.
    for i in range(s):  # We iterate from 0 to s - 1
        for j in range(n):  # and append each element (i) to l n times.
            l.append(i)
    return l  # Finally, we return the list.
If you run the code:
print(mklist(10, 2))
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
print(mklist(5, 3))
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Another, slightly neater version uses a list comprehension.
But uhmmm… we have to sort it, though.
def mklist2(s, n):
    return sorted([l for l in range(s)] * n)
Running that version gives the following results (the raw list before sorting, then the sorted result):
print(mklist2(5, 3))
Raw : [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
Sorted: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
I have an n-dimensional array in NumPy (n x n x n), which I vectorise in order to do some sampling of my data in a particular way, giving me (1 x n^3).
I would like to take the individual vectorised indices and convert them back to n-dimensional indices in the form (n x n x n). I'm not sure how NumPy actually vectorises matrices.
Can anyone advise?
Numpy has a function unravel_index which does pretty much that: given a set of 'flat' indices, it will return a tuple of arrays of indices in each dimension:
>>> indices = np.arange(25, dtype=int)
>>> np.unravel_index(indices, (5, 5))
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4,
4, 4], dtype=int64),
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2,
3, 4], dtype=int64))
You can then zip them to get your original indices.
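As a small sketch (the 3 x 3 x 3 shape and the chosen flat indices are arbitrary), zipping the arrays returned by unravel_index gives one (i, j, k) tuple per flat index, and np.ravel_multi_index maps those tuples back to flat positions:

```python
import numpy as np

n = 3
a = np.arange(n ** 3).reshape(n, n, n)
flat = a.ravel()  # C order, the default
flat_idx = np.array([1, 5, 26])

i, j, k = np.unravel_index(flat_idx, a.shape)
coords = list(zip(i.tolist(), j.tolist(), k.tolist()))
print(coords)  # [(0, 0, 1), (0, 1, 2), (2, 2, 2)]

# each tuple addresses the same element as its flat index
assert all(a[c] == flat[f] for c, f in zip(coords, flat_idx))
# ravel_multi_index inverts unravel_index
back = np.ravel_multi_index((i, j, k), a.shape)
```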
Be aware, however, that matrices can be stored as 'sequences of rows' (C convention, 'C') or 'sequences of columns' (Fortran convention, 'F'), or the corresponding convention in higher dimensions. Flattening in memory order (e.g. ravel(order='K')) preserves that layout, while ravel() and flatten() default to 'C' order; so [[1, 2], [3, 4]] can be flattened into [1, 2, 3, 4] ('C' order) or [1, 3, 2, 4] ('F' order). unravel_index takes an optional order parameter if you want to change the default (which is 'C'), so you can do:
>>> # Typically, transposition will change the memory order for
>>> # efficiency reasons: there is no need to copy the data!
>>> n = np.random.random((2, 2, 2)).transpose()
>>> n.flags.f_contiguous
True
>>> n.flags.c_contiguous
False
>>> x, y, z = np.unravel_index([1,2,3,7], (2, 2, 2), order='F')