I would like to return numbers from a list which are closer to each other than a threshold. I can do that with this script, but I'm looking for a faster solution. Is it possible with NumPy or SciPy?
mylist = [2, 4, 54, 43, 43, 3]
for i in range(len(mylist)):
    for j in range(i + 1, len(mylist)):
        closer_than = 2
        if mylist[i] < mylist[j] + closer_than and mylist[i] > mylist[j] - closer_than:
            print("close values:", mylist[i], mylist[j])
You can use combinations from itertools to create pairs between all numbers. Then you can filter these combinations with your threshold value.
from itertools import combinations
mylist = [2, 4, 54, 43, 43, 3]
def get_close_numbers(l, threshold):
    # use the argument l, not the global mylist
    combs = combinations(l, 2)
    return list(filter(lambda x: abs(x[0] - x[1]) < threshold, combs))
get_close_numbers(mylist, threshold=2)
# [(2, 3), (4, 3), (43, 43)]
This can be done with numpy.
import numpy as np
mylist = np.array([2, 4, 54, 43, 43, 3])
threshold = 2
temp = np.abs(np.subtract.outer(mylist, mylist))
# Subtract each item from each item, keeping the absolute value.
temp
# array([[ 0, 2, 52, 41, 41, 1],
# [ 2, 0, 50, 39, 39, 1],
# [52, 50, 0, 11, 11, 51],
# [41, 39, 11, 0, 0, 40],
# [41, 39, 11, 0, 0, 40],
# [ 1, 1, 51, 40, 40, 0]])
temp[ np.tril_indices_from(temp) ] = threshold
# Set the lower left triangle to threshold
r_ix, c_ix = np.where( temp < threshold )
# find where temp < threshold
# Use the indices to find the values.
list(zip(mylist[ r_ix], mylist[ c_ix ]))
# [(2, 3), (4, 3), (43, 43)]
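Both versions above compare every pair, so the work (and, for the NumPy version, the memory for the n x n difference matrix) grows quadratically with the list length. A sketch of a different technique, assuming 1-D numeric values and a positive threshold: sort once, then use np.searchsorted to find, for each value, how many of the following sorted values are closer than the threshold. The function name close_pairs_sorted is hypothetical, and the pairs come out as (smaller, larger) in sorted order rather than in the original list order.

```python
import numpy as np


def close_pairs_sorted(values, threshold):
    """Return all pairs (a, b), a <= b, from `values` with b - a < threshold.

    Hypothetical helper: sorts once and uses np.searchsorted instead of
    building the full n x n difference matrix.
    """
    s = np.sort(np.asarray(values))
    idx = np.arange(len(s))
    # end[i] = first index whose value is >= s[i] + threshold,
    # so s[i+1:end[i]] are exactly the later values closer than the threshold.
    end = np.searchsorted(s, s + threshold, side="left")
    counts = np.maximum(end - (idx + 1), 0)
    # Left index of every pair, repeated once per partner.
    left = np.repeat(idx, counts)
    # Right index runs i+1, i+2, ..., end[i]-1 for each i.
    first_of_group = np.repeat(np.cumsum(counts) - counts, counts)
    right = np.repeat(idx + 1, counts) + (np.arange(counts.sum()) - first_of_group)
    return list(zip(s[left].tolist(), s[right].tolist()))


print(close_pairs_sorted([2, 4, 54, 43, 43, 3], 2))
# [(2, 3), (3, 4), (43, 43)]
```

The memory used is proportional to the number of close pairs instead of n squared, at the cost of a sort.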
I want to write a function that can take small images and return a permutation of them, block-wise.
Basically I want to turn this: [image: original image]
Into this: [image: the same image with its blocks shuffled]
There was an excellent answer in Is there a function in Python that shuffle data by data blocks? that helped me write a solution. However, for ~50,000 28x28 images this takes a long time to run.
import time
import numpy as np

# x1: the batch of 28x28 images, shape (N, 28, 28)
begin = time.time()
# blocks of 7x7 shuffling
range1 = np.arange(4)
range2 = np.arange(4)
block_size = int(28 / 4)
print([[x1[0][i*block_size:(i+1)*block_size].shape] for i in range1])
for x in x1:
    np.random.shuffle(range1)
    x[:] = np.block([[x[i*block_size:(i+1)*block_size]] for i in range1])
    for a in x:
        np.random.shuffle(range2)
        a[:] = np.block([a[i*block_size:(i+1)*block_size] for i in range2])
print("x1", time.time() - begin)
begin = time.time()
Here's one approach based on this post -
def randomize_tiles_3D(x1, H, W):
    # H, W are the height and width of the blocks
    m,n,p = x1.shape
    l1,l2 = n//H,p//W
    combs = np.random.rand(m,l1*l2).argsort(axis=1)
    r,c = np.unravel_index(combs,(l1,l2))
    x1cr = x1.reshape(-1,l1,H,l2,W)
    out = x1cr[np.arange(m)[:,None],r,:,c]
    return out.reshape(-1,l1,l2,H,W).swapaxes(2,3).reshape(-1,n,p)
Sample run -
In [46]: x1
Out[46]:
array([[[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]],
[[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65],
[66, 67, 68, 69, 70, 71]]])
In [47]: np.random.seed(0)
In [48]: randomize_tiles_3D(x1, H=3, W=3)
Out[48]:
array([[[21, 22, 23, 0, 1, 2],
[27, 28, 29, 6, 7, 8],
[33, 34, 35, 12, 13, 14],
[18, 19, 20, 3, 4, 5],
[24, 25, 26, 9, 10, 11],
[30, 31, 32, 15, 16, 17]],
[[36, 37, 38, 54, 55, 56],
[42, 43, 44, 60, 61, 62],
[48, 49, 50, 66, 67, 68],
[39, 40, 41, 57, 58, 59],
[45, 46, 47, 63, 64, 65],
[51, 52, 53, 69, 70, 71]]])
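To try the function on input shaped like the question's (a batch of 28x28 images split into 7x7 tiles), the sketch below copies randomize_tiles_3D unchanged and adds a hypothetical helper, to_tiles, that splits each image into its tiles. Because every pixel value is unique, sorting each image's output tiles by their top-left pixel should recover the input tiles exactly, which checks that whole tiles were moved rather than individual pixels. The batch size of 100 is arbitrary.

```python
import numpy as np


def randomize_tiles_3D(x1, H, W):
    # Copied from the answer above; H, W are the height and width of each block.
    m, n, p = x1.shape
    l1, l2 = n // H, p // W
    combs = np.random.rand(m, l1 * l2).argsort(axis=1)
    r, c = np.unravel_index(combs, (l1, l2))
    x1cr = x1.reshape(-1, l1, H, l2, W)
    out = x1cr[np.arange(m)[:, None], r, :, c]
    return out.reshape(-1, l1, l2, H, W).swapaxes(2, 3).reshape(-1, n, p)


def to_tiles(x, H, W):
    # Hypothetical helper: (m, n, p) images -> (m, num_tiles, H, W), tiles in row-major order.
    m, n, p = x.shape
    return x.reshape(m, n // H, H, p // W, W).swapaxes(2, 3).reshape(m, -1, H, W)


np.random.seed(0)
imgs = np.arange(100 * 28 * 28).reshape(100, 28, 28)  # every pixel value is unique
out = randomize_tiles_3D(imgs, H=7, W=7)

tiles_out = to_tiles(out, 7, 7)
# Reorder each image's output tiles by their top-left pixel value.
order = np.argsort(tiles_out[:, :, 0, 0], axis=1)
tiles_sorted = np.take_along_axis(tiles_out, order[:, :, None, None], axis=1)
print(np.array_equal(tiles_sorted, to_tiles(imgs, 7, 7)))  # True: same tiles, new positions
```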
I already found a solution that runs much faster. I feel silly because I didn't really need a double for loop, just two separate shuffle indexes. Leaving this solution here in case anyone wants to shuffle an image block-wise in numpy.
If anyone comes up with another good solution, let me know.
# blocks of 7x7 shuffling
range1 = np.arange(4)
range2 = np.arange(4)
block_size = int(28 / 4)
for x in x1:
    np.random.shuffle(range1)
    np.random.shuffle(range2)
    x[:] = np.block([[x[i*block_size:(i+1)*block_size]] for i in range1])
    x[:] = np.block([x[:,i*block_size:(i+1)*block_size] for i in range2])
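The loop above permutes whole block rows and then whole block columns of each image, so tiles that start in the same block row end up in the same block row; with a 4x4 grid that gives 4! x 4! = 576 of the 16! possible tile arrangements, while randomize_tiles_3D above can produce any of them. If the row/column shuffle is what you want, a minimal vectorized sketch of the same operation, assuming each image side is divisible by block_size, builds per-image pixel index arrays and gathers all images in one fancy-indexing step. The function name is hypothetical, and it returns a new array instead of modifying x1 in place.

```python
import numpy as np


def shuffle_block_rows_and_cols(x1, block_size, rng=None):
    """Vectorized sketch of the loop above: for each image, permute whole block
    rows and whole block columns. Assumes each side is divisible by block_size."""
    rng = np.random.default_rng() if rng is None else rng
    m, n, p = x1.shape
    nb_r, nb_c = n // block_size, p // block_size
    # One random permutation of block indices per image (argsort of random keys).
    row_perm = rng.random((m, nb_r)).argsort(axis=1)
    col_perm = rng.random((m, nb_c)).argsort(axis=1)
    # Expand block index b to pixel indices b*block_size ... b*block_size + block_size - 1.
    offsets = np.arange(block_size)
    rows = (row_perm[:, :, None] * block_size + offsets).reshape(m, n)
    cols = (col_perm[:, :, None] * block_size + offsets).reshape(m, p)
    # One gather for all images: out[k, i, j] = x1[k, rows[k, i], cols[k, j]]
    return x1[np.arange(m)[:, None, None], rows[:, :, None], cols[:, None, :]]


imgs = np.arange(2 * 28 * 28).reshape(2, 28, 28)
shuffled = shuffle_block_rows_and_cols(imgs, block_size=7, rng=np.random.default_rng(0))
print(shuffled.shape)  # (2, 28, 28)
```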
It will be more efficient to use numpy.lib.stride_tricks.as_strided to break 2D matrices into blocks.
import numpy as np
img_width, block_width = 12, 3
n = img_width // block_width
a = np.arange(img_width * img_width).reshape(img_width, img_width)
print(a)
blocks = np.lib.stride_tricks.as_strided(a,
    shape=(n, n, block_width, block_width),
    strides=(a.itemsize * np.array([n * block_width ** 2, block_width, n * block_width, 1])))
print(blocks)
blocks = blocks.reshape((n * n, block_width, block_width)) # flatten so shuffle permutes whole blocks (this reshape copies)
np.random.shuffle(blocks)
print(blocks)
blocks = np.lib.stride_tricks.as_strided(blocks,
    shape=(n, block_width, n, block_width),
    strides=(a.itemsize * np.array([n * block_width ** 2, block_width, block_width ** 2, 1])))
shuffled = np.reshape(blocks, (img_width, img_width))
print(shuffled)
Output can be found here: blocks_shuffle_example.ipynb
Documentation: numpy.lib.stride_tricks.as_strided
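For a C-contiguous image whose sides are multiples of block_width, the same 4-D block view can be obtained with reshape and swapaxes, which avoids hand-computed strides (as_strided does no checking, so a wrong stride silently reads the wrong memory). A minimal sketch with a hypothetical function name; it indexes the tiles with a random permutation instead of shuffling in place, so the input array is never modified.

```python
import numpy as np


def shuffle_blocks_2d(a, block_width, rng=None):
    """Randomly permute the block_width x block_width tiles of a 2-D array.

    Assumes both sides of `a` are multiples of block_width.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = a.shape
    nr, nc = h // block_width, w // block_width
    # (h, w) -> (nr, bw, nc, bw) -> (nr, nc, bw, bw): the same blocks as the as_strided view
    blocks = a.reshape(nr, block_width, nc, block_width).swapaxes(1, 2)
    tiles = blocks.reshape(nr * nc, block_width, block_width)
    # Fancy indexing returns a new array, so `a` itself is left untouched.
    tiles = tiles[rng.permutation(nr * nc)]
    # Undo the split: (nr*nc, bw, bw) -> (nr, nc, bw, bw) -> (nr, bw, nc, bw) -> (h, w)
    return tiles.reshape(nr, nc, block_width, block_width).swapaxes(1, 2).reshape(h, w)


a = np.arange(12 * 12).reshape(12, 12)
print(shuffle_blocks_2d(a, 3, rng=np.random.default_rng(0)))
```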
Here's one approach:
Assume that the original image has shape (m, n), and each block has shape (w, h).
import numpy as np
# img is the original (m, n) image; split it into tiles of w*h blocks with shape = ((m * n) / (w * h), w, h)
tiles = np.array([img[x : x+w, y : y+h] for x in range(0, m, w) for y in range(0, n, h)])
np.random.shuffle(tiles)
# merge back to shape = (m, n)
mb, nb = m // w, n // h
res = np.vstack(np.hstack(tiles[i*nb : (i+1)*nb]) for i in range(mb))
Update:
res = np.vstack(np.hstack(tiles[i*nb : (i+1)*nb]) for i in range(mb))
may cause "FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future." while running.
Use
res = np.block([[np.hstack(tiles[i*nb : (i+1)*nb])] for i in range(mb)])
instead, and there are no warnings.
E.g. for the input 5, the output should be 7.
(bin(1) = 1, bin(2) = 10 ... bin(5) = 101) --> 1 + 1 + 2 + 1 + 2 = 7
Here's what I've tried, but it isn't a very efficient algorithm, considering that I iterate the loop once for each integer. My code (Python 3):
i = int(input())
a = 0
for b in range(i+1):
    a = a + bin(b).count("1")
print(a)
Thank you!
Here's a solution based on the recurrence relation from OEIS:
def onecount(n):
    if n == 0:
        return 0
    if n % 2 == 0:
        m = n // 2  # integer division keeps the results ints in Python 3
        return onecount(m) + onecount(m-1) + m
    m = (n-1) // 2
    return 2*onecount(m)+m+1
>>> [onecount(i) for i in range(30)]
[0, 1, 2, 4, 5, 7, 9, 12, 13, 15, 17, 20, 22, 25, 28, 32, 33, 35, 37, 40, 42, 45, 48, 52, 54, 57, 60, 64, 67, 71]
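The recurrence above branches into two recursive calls for every even n. A sketch of a different technique (a per-bit counting formula, not the OEIS recurrence) that needs only one loop iteration per bit of n: bit k of the numbers 0, 1, 2, ... is 0 for 2**k numbers, then 1 for 2**k numbers, repeating with period 2**(k+1), so it is enough to count the full periods in 0..n plus the remainder. The function name total_set_bits is hypothetical.

```python
def total_set_bits(n):
    """Total number of 1 bits in the binary forms of 0, 1, ..., n (n >= 0)."""
    total = 0
    k = 0
    while (1 << k) <= n:
        half = 1 << k          # bit k is 0 for `half` numbers, then 1 for `half` numbers
        period = half << 1     # ... and that pattern repeats every 2**(k+1) numbers
        full_periods, rest = divmod(n + 1, period)
        total += full_periods * half + max(0, rest - half)
        k += 1
    return total


print(total_set_bits(5))  # 7, matching the example in the question
```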
gmpy2, due to Alex Martelli et al, seems to perform better, at least on my Win10 machine.
from time import time
import gmpy2
def onecount(n):
    if n == 0:
        return 0
    if n % 2 == 0:
        m = n // 2
        return onecount(m) + onecount(m-1) + m
    m = (n-1) // 2
    return 2*onecount(m)+m+1
N = 10000
initial = time()
for _ in range(N):
    for i in range(30):
        onecount(i)
print(time()-initial)
initial = time()
for _ in range(N):
    total = 0
    for i in range(30):
        total += gmpy2.popcount(i)
print(time()-initial)
Here's the output:
1.7816979885101318
0.07404899597167969
If you want a list, and you're using Python 3.2 or later:
>>> from itertools import accumulate
>>> result = list(accumulate([gmpy2.popcount(_) for _ in range(30)]))
>>> result
[0, 1, 2, 4, 5, 7, 9, 12, 13, 15, 17, 20, 22, 25, 28, 32, 33, 35, 37, 40, 42, 45, 48, 52, 54, 57, 60, 64, 67, 71]
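On Python 3.10 or later, the standard library has int.bit_count(), which counts set bits without an extra dependency. A minimal sketch that falls back to counting "1" characters in bin(i) on older versions (the helper name popcount is hypothetical; no timing claims are made here):

```python
from itertools import accumulate


def popcount(i):
    # int.bit_count() was added in Python 3.10; older versions fall back to bin().
    return i.bit_count() if hasattr(i, "bit_count") else bin(i).count("1")


running_totals = list(accumulate(popcount(i) for i in range(30)))
print(running_totals[5])  # 7
```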
I have two lists of numbers, say [1, 2, 3, 4, 5] and [7, 8, 9, 10, 11], and I would like to form a new list which consists of the products of each member in the first list with each member in the second list. In this case, there would be 5*5 = 25 elements in the new list.
I have been unable to do this so far with a while() loop.
This is what I have so far:
x = 0
y = 99
results = []
while x < 5:
    x = x + 1
    results.append(x*y)
    while y < 11:
        y = y + 1
        results.append(x*y)
Use itertools.product to generate all possible 2-tuples, then calculate the product of that:
import itertools
[x * y for (x, y) in itertools.product([1,2,3,4,5], [7,8,9,10,11])]
The problem is an example of an outer product. The answer already posted with itertools.product is the way I would do this as well.
But here's an alternative with numpy, which is usually more efficient than working in pure python for crunching numeric data.
>>> import numpy as np
>>> x1 = np.array([1,2,3,4,5])
>>> x2 = np.array([7,8,9,10,11])
>>> np.outer(x1,x2)
array([[ 7, 8, 9, 10, 11],
[14, 16, 18, 20, 22],
[21, 24, 27, 30, 33],
[28, 32, 36, 40, 44],
[35, 40, 45, 50, 55]])
>>> np.ravel(np.outer(x1,x2))
array([ 7, 8, 9, 10, 11, 14, 16, 18, 20, 22, 21, 24, 27, 30, 33, 28, 32,
36, 40, 44, 35, 40, 45, 50, 55])
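np.outer(x1, x2) gives the same values as multiplying with broadcasting, x1[:, None] * x2[None, :]: reshaping x1 into a column of shape (5, 1) makes NumPy pair every element of x1 with every element of x2. A short sketch that also checks the flattened result against the list-comprehension form:

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5])
x2 = np.array([7, 8, 9, 10, 11])

# (5, 1) * (1, 5) broadcasts to a (5, 5) table of all products.
table = x1[:, None] * x2[None, :]
flat = table.ravel().tolist()  # row-major: all products with 1 first, then with 2, ...

print(flat == [x * y for x in range(1, 6) for y in range(7, 12)])  # True
```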
Why don't you try the good old way:
list1 = range(1, 100)
list2 = range(10, 50, 5)
new_values = []
for x in list1:
    for y in list2:
        new_values.append(x*y)
Without any importing, you can do:
[x * y for x in range(1, 6) for y in range(7, 12)]
or alternatively:
[[x * y for x in range(1, 6)] for y in range(7, 12)]
to split out the different multiples. Which one to use depends on the order you want the results in.
from functools import partial
mult = lambda x, y: x * y
l1 = [2,3,4,5,5]
l2 = [5,3,23,4,4]
results = []
for n in l1:
    results.extend( map( partial(mult, n) , l2) )
print(results)
I have a numpy array of numbers, for example,
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
I would like to find all the indexes of the elements within a specific range. For instance, if the range is (6, 10), the answer should be (3, 4, 5). Is there a built-in function to do this?
You can use np.where to get indices and np.logical_and to set two conditions:
import numpy as np
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(np.logical_and(a>=6, a<=10))
# returns (array([3, 4, 5]),)
As in @deinonychusaur's reply, but even more compact:
In [7]: np.where((a >= 6) & (a <=10))
Out[7]: (array([3, 4, 5]),)
Summary of the answers
To find out which answer is best, we can time the different solutions.
Unfortunately, the question was not well-posed, so some answers address different questions; here I try to compare answers to the same question. Given the array:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
The answer should be the indexes of the elements within a certain range (we assume inclusive), in this case 6 and 10:
answer = (3, 4, 5)
corresponding to the values 6, 9, 10.
To test the best answer we can use this code.
import timeit
setup = """
import numpy as np
import numexpr as ne

a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
# or test it with an array of similar size to yours:
# a = np.random.rand(100)*23 # change the number to an estimate of your array size.

# we define the left and right limits
ll = 6
rl = 10

def sorted_slice(a,l,r):
    start = np.searchsorted(a, l, 'left')
    end = np.searchsorted(a, r, 'right')
    return np.arange(start,end)
"""

functions = ['sorted_slice(a,ll,rl)', # works only for sorted values
             'np.where(np.logical_and(a>=ll, a<=rl))[0]',
             'np.where((a >= ll) & (a <=rl))[0]',
             'np.where((a>=ll)*(a<=rl))[0]',
             'np.where(np.vectorize(lambda x: ll <= x <= rl)(a))[0]',
             'np.argwhere((a>=ll) & (a<=rl)).T[0]', # we transpose to get a single row
             'np.where(ne.evaluate("(ll <= a) & (a <= rl)"))[0]',]

functions2 = [
    'a[np.logical_and(a>=ll, a<=rl)]',
    'a[(a>=ll) & (a<=rl)]',
    'a[(a>=ll)*(a<=rl)]',
    'a[np.vectorize(lambda x: ll <= x <= rl)(a)]',
    'a[ne.evaluate("(ll <= a) & (a <= rl)")]',
]

rdict = {}
for i in functions:
    rdict[i] = timeit.timeit(i,setup=setup,number=1000)
    print("%s -> %s s" %(i,rdict[i]))

print("Sorted:")
for w in sorted(rdict, key=rdict.get):
    print(w, rdict[w])
Results
The results for a small array are reported in the following plot (fastest solution at the top). As noted by @EZLearner, they may vary depending on the size of the array: sorted_slice could be faster for larger arrays, but it requires your array to be sorted, and for arrays with more than 10 M entries ne.evaluate could be an option. It is therefore best to run this test with an array of the same size as yours:
If you want to extract the values instead of the indexes, you can run the tests using functions2, but the results are almost the same.
I thought I would add this because the a in the example you gave is sorted:
import numpy as np
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
start = np.searchsorted(a, 6, 'left')
end = np.searchsorted(a, 10, 'right')
rng = np.arange(start, end)
rng
# array([3, 4, 5])
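The searchsorted trick needs a sorted array. A sketch for an unsorted array, assuming you can afford one np.argsort: search in the sorted copy, then map the positions back through the argsort order. If the same array is queried many times, the order can be computed once and passed in, so each query costs two binary searches plus the size of the result. The function name indices_in_range is hypothetical.

```python
import numpy as np


def indices_in_range(a, lo, hi, order=None):
    """Indexes of the elements of `a` with lo <= value <= hi, for unsorted `a`.

    `order` is an optional precomputed np.argsort(a) so repeated queries reuse it.
    """
    a = np.asarray(a)
    if order is None:
        order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    start = np.searchsorted(sorted_a, lo, side="left")
    end = np.searchsorted(sorted_a, hi, side="right")
    # Map positions in the sorted copy back to positions in `a`, in increasing order.
    return np.sort(order[start:end])


b = np.array([10, 1, 56, 6, 3, 9])
print(indices_in_range(b, 6, 10))  # [0 3 5]
```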
a = np.array([1,2,3,4,5,6,7,8,9])
b = a[(a>2) & (a<8)]
Another way is:
np.vectorize(lambda x: 6 <= x <= 10)(a)
which returns:
array([False, False, False, True, True, True, False, False, False])
It is sometimes useful for masking time series, vectors, etc.
This code snippet returns all the numbers in a numpy array between two values:
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56] )
a[(a>6)*(a<10)]
It works as following:
(a>6) returns a numpy array of True (1) and False (0), and so does (a<10). Multiplying the two gives an array that is True only where both conditions are True (because 1x1 = 1) and False elsewhere (because 0x0 = 0 and 1x0 = 0).
The part a[...] returns all values of array a where the array between brackets is True.
Of course you can make this more complicated, for instance with
...*~(a<10)
which acts like an "and not" condition (~ inverts a boolean array). Writing (1-a<10) instead would not work, because it is parsed as ((1-a) < 10).
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.argwhere((a>=6) & (a<=10))
Wanted to add numexpr into the mix:
import numpy as np
import numexpr as ne
a = np.array([1, 3, 5, 6, 9, 10, 14, 15, 56])
np.where(ne.evaluate("(6 <= a) & (a <= 10)"))[0]
# array([3, 4, 5], dtype=int64)
This only makes sense for larger arrays with millions of elements, or if you are hitting memory limits.
This may not be the prettiest, but it works for any number of columns:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
ranges = (0,4), (0,4)
def conditionRange(X : np.ndarray, ranges : list) -> np.ndarray:
    idx = None
    for column, r in enumerate(ranges):
        tmp = np.where(np.logical_and(X[:, column] >= r[0], X[:, column] <= r[1]))[0]
        if idx is None:
            idx = set(tmp)
        else:
            # an empty intersection stays empty instead of being reset
            idx = idx & set(tmp)
    # sorted() keeps the original row order; dtype=int keeps an empty result indexable
    idx = np.array(sorted(idx), dtype=int)
    return X[idx, :]
b = conditionRange(a, ranges)
print(b)
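A boolean mask avoids Python sets entirely, keeps the rows in their original order, and handles the case where no row matches. A minimal sketch using broadcasting against per-column lower and upper bounds (the name condition_range_mask is hypothetical; like the function above, it only checks the first len(ranges) columns):

```python
import numpy as np


def condition_range_mask(X, ranges):
    """Rows of X whose first len(ranges) columns all fall inside the given inclusive ranges."""
    lo = np.array([r[0] for r in ranges])
    hi = np.array([r[1] for r in ranges])
    cols = X[:, :len(ranges)]
    # (rows, k) >= (k,) broadcasts across rows; .all(axis=1) requires every column to pass.
    mask = ((cols >= lo) & (cols <= hi)).all(axis=1)
    return X[mask]


a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])
print(condition_range_mask(a, [(0, 4), (0, 4)]))
# [[3 4]
#  [0 0]]
```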
s=[52, 33, 70, 39, 57, 59, 7, 2, 46, 69, 11, 74, 58, 60, 63, 43, 75, 92, 65, 19, 1, 79, 22, 38, 26, 3, 66, 88, 9, 15, 28, 44, 67, 87, 21, 49, 85, 32, 89, 77, 47, 93, 35, 12, 73, 76, 50, 45, 5, 29, 97, 94, 95, 56, 48, 71, 54, 55, 51, 23, 84, 80, 62, 30, 13, 34]
dic={}
for i in range(0,len(s),10):
    dic[i,i+10]=list(filter(lambda x:((x>=i)&(x<i+10)),s))
print(dic)
for keys,values in dic.items():
    print(keys)
    print(values)
Output:
(0, 10)
[7, 2, 1, 3, 9, 5]
(20, 30)
[22, 26, 28, 21, 29, 23]
(30, 40)
[33, 39, 38, 32, 35, 30, 34]
(10, 20)
[11, 19, 15, 12, 13]
(40, 50)
[46, 43, 44, 49, 47, 45, 48]
(60, 70)
[69, 60, 63, 65, 66, 67, 62]
(50, 60)
[52, 57, 59, 58, 50, 56, 54, 55, 51]
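Note that range(0, len(s), 10) steps up to the number of elements (66) rather than the largest value, which is why the values 70 and above do not appear in the output above. A sketch that bins by value instead, assuming non-negative integers and bins of width 10 (bin_by_value is a hypothetical name). np.unique returns the bin starts sorted, so the keys come out in order; empty bins are simply skipped.

```python
import numpy as np


def bin_by_value(values, width=10):
    """Group values into [lo, lo + width) bins keyed by (lo, lo + width)."""
    arr = np.asarray(values)
    starts = (arr // width) * width  # start of the bin each value falls in
    return {
        (int(lo), int(lo) + width): arr[starts == lo].tolist()  # original order kept
        for lo in np.unique(starts)
    }


print(bin_by_value([7, 12, 3, 25, 19, 70]))
# {(0, 10): [7, 3], (10, 20): [12, 19], (20, 30): [25], (70, 80): [70]}
```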
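You can use np.clip() to achieve something similar:
a = [1, 3, 5, 6, 9, 10, 14, 15, 56]
np.clip(a,6,10)
# array([ 6,  6,  6,  6,  9, 10, 10, 10, 10])
However, it does not drop the values less than 6 and greater than 10; it replaces them with 6 and 10 respectively, and it returns values rather than indexes.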