How can I downscale raster data of size 4×6 to 2×3 using the 'mode', i.e. the most common value within each 2×2 block of pixels?
import numpy as np
data=np.array([
[0,0,1,1,1,1],
[1,0,0,1,1,1],
[1,0,1,1,0,1],
[1,1,0,1,0,0]])
The result should be:
result = np.array([
[0,1,1],
[1,1,0]])
Please refer to this thread for a full explanation. The following code will calculate your desired result.
from sklearn.feature_extraction.image import extract_patches
data=np.array([
[0,0,1,1,1,1],
[1,0,0,1,1,1],
[1,0,1,1,0,1],
[1,1,0,1,0,0]])
patches = extract_patches(data, patch_shape=(2, 2), extraction_step=(2, 2))
most_frequent_number = ((patches > 0).sum(axis=-1).sum(axis=-1) > 2).astype(int)
print(most_frequent_number)
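Note that extract_patches was deprecated and later removed from scikit-learn, so the snippet above may fail on recent versions. A NumPy-only sketch of the same idea (my addition, assuming NumPy >= 1.20) builds the non-overlapping 2x2 blocks with sliding_window_view and takes a majority vote per block:
from numpy.lib.stride_tricks import sliding_window_view
# keep every second sliding window along both axes -> non-overlapping 2x2 blocks
patches = sliding_window_view(data, (2, 2))[::2, ::2]
# majority vote per block for binary data: 1 if more than half of the 4 pixels are 1
most_frequent_number = ((patches > 0).sum(axis=(-2, -1)) > 2).astype(int)
print(most_frequent_number)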
Here's one way to go,
from itertools import product
from numpy import empty,argmax,bincount
res = empty((data.shape[0] // 2, data.shape[1] // 2))
for j, k in product(range(res.shape[0]), range(res.shape[1])):
    subvec = data[2 * j:2 * j + 2, 2 * k:2 * k + 2].flatten()
    res[j, k] = argmax(bincount(subvec))
This works as long as the input data divides evenly into 2x2 blocks.
Notice that a tied block like [[0,0],[1,1]] will yield 0 as the result, because argmax returns only the index of the first maximum. Use res[j,k] = subvec.max() - argmax(bincount(subvec)[::-1]) if you want such blocks to count as 1 instead.
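A quick sketch of the tie-breaking difference (my own illustration, not part of the original answer):
import numpy as np
tied = np.array([0, 0, 1, 1])  # a flattened 2x2 block with no unique mode
print(np.argmax(np.bincount(tied)))  # 0 -- argmax picks the first (smallest) tied value
print(tied.max() - np.argmax(np.bincount(tied)[::-1]))  # 1 -- picks the largest tied value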
There appears to be more than one statistic you wish to collect about each block. Using toblocks (below) you can apply various computations to the last axis of blocks to obtain the desired statistics:
import numpy as np
import scipy.stats as stats
def toblocks(arr, nrows, ncols):
    # reshape so that each nrows x ncols block ends up flattened along the last axis
    h, w = arr.shape
    blocks = (arr.reshape(h // nrows, nrows, -1, ncols)
                 .swapaxes(1, 2)
                 .reshape(h // nrows, w // ncols, ncols * nrows))
    return blocks
data=np.array([
[0,0,1,1,1,1],
[1,0,0,1,1,1],
[1,0,1,1,0,1],
[1,1,0,1,0,0]])
blocks = toblocks(data, 2, 2)
vals, counts = stats.mode(blocks, axis=-1)
vals = vals.squeeze()
print(vals)
# [[ 0. 1. 1.]
# [ 1. 1. 0.]]
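Because each block occupies the last axis, other per-block statistics follow the same pattern (my own illustration, not part of the original answer):
print(blocks.mean(axis=-1))  # per-block mean
print(blocks.max(axis=-1))   # per-block maximum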
I need to divide a 2D matrix into a set of 2D patches with a certain stride, then multiply every patch by its center element and sum the elements of each patch.
It feels not unlike a convolution where a separate kernel is used for every element of the matrix.
(The original post included images illustrating the patch layout, how each element of the result is calculated, and the expected result.)
Here's a solution I came up with:
import numpy as np

window_shape = (2, 2)
stride = 1
# Matrix
m = np.arange(1, 17).reshape((4, 4))
# Pad once at the end of each axis so that the number of views
# equals the number of elements
m_padded = np.pad(m, (0, 1))
# window_nd divides the array into windows; taken from:
# https://stackoverflow.com/questions/45960192/using-numpy-as-strided-function-to-create-patches-tiles-rolling-or-sliding-w#45960193
w = window_nd(m_padded, window_shape, stride)
ww, wh, *_ = w.shape
w = w.reshape((ww * wh, 4))  # the first two dimensions multiplied give the number of windows
# Tile each center element for element-wise multiplication
m_tiled = np.tile(m.ravel(), (4, 1)).transpose()
result = (w * m_tiled).sum(axis=1).reshape(m.shape)
In my view it's not very efficient as a few arrays are allocated in the intermediary steps.
What is a better or more efficient way to accomplish this?
Try scipy.signal.convolve
from scipy.signal import convolve
window_shape = (2, 2)
stride = 1
# Matrix
m = np.arange(1, 17).reshape((4, 4))
# Pad it once per axis to make sure the number of views
# equals the number of elements
m_padded = np.pad(m, (0, 1))
output = convolve(m_padded, np.ones(window_shape), 'valid') * m
print(output)
Output:
array([[ 14., 36., 66., 48.],
[150., 204., 266., 160.],
[414., 500., 594., 336.],
[351., 406., 465., 256.]])
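As a further sketch (my addition, assuming NumPy >= 1.20), the external window_nd helper can also be replaced by numpy's built-in sliding_window_view, which creates the windows as views without copying:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
m = np.arange(1, 17).reshape((4, 4))
m_padded = np.pad(m, (0, 1))
# windows[i, j] is the 2x2 window whose top-left corner is (i, j)
windows = sliding_window_view(m_padded, (2, 2))
# sum each window, then multiply by its center element
result = windows.sum(axis=(-2, -1)) * m
print(result)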
I have a list of indices
a = [
[1,2,4],
[0,2,3],
[1,3,4],
[0,2]]
What's the fastest way to convert this to a numpy array of zeros and ones, where each listed index marks a position that should be 1?
I.e. what I want is:
output = array([
[0,1,1,0,1],
[1,0,1,1,0],
[0,1,0,1,1],
[1,0,1,0,0]])
I know the max size of the array beforehand. I know I could loop through each list and insert a 1 at each index position, but is there a faster/vectorized way to do this?
My use case could have thousands of rows/cols and I need to do this thousands of times, so the faster the better.
How about this:
ncol = 5
nrow = len(a)
out = np.zeros((nrow, ncol), int)
out[np.arange(nrow).repeat([*map(len,a)]), np.concatenate(a)] = 1
out
# array([[0, 1, 1, 0, 1],
# [1, 0, 1, 1, 0],
# [0, 1, 0, 1, 1],
# [1, 0, 1, 0, 0]])
Here are timings for a 1000x1000 binary array. Note that I use an optimized version of the above; see function pp below:
pp 21.717635259992676 ms
ts 37.10938713003998 ms
u9 37.32933565042913 ms
Code to produce timings:
import itertools as it
import numpy as np
def make_data(n, m):
    I, J = np.where(np.random.random((n, m)) < np.random.random((n, 1)))
    return [*map(np.ndarray.tolist, np.split(J, I.searchsorted(np.arange(1, n))))]

def pp():
    sz = np.fromiter(map(len, a), int, nrow)
    out = np.zeros((nrow, ncol), int)
    out[np.arange(nrow).repeat(sz), np.fromiter(it.chain.from_iterable(a), int, sz.sum())] = 1
    return out

def ts():
    out = np.zeros((nrow, ncol), int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out

def u9():
    out = np.zeros((nrow, ncol), int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out
nrow,ncol = 1000,1000
a = make_data(nrow,ncol)
from timeit import timeit
assert (pp()==ts()).all()
assert (pp()==u9()).all()
print("pp", timeit(pp,number=100)*10, "ms")
print("ts", timeit(ts,number=100)*10, "ms")
print("u9", timeit(u9,number=100)*10, "ms")
This might not be the fastest way; you would need to compare execution times of these answers on large arrays to find out. Here's my solution:
output = np.zeros((4,5))
for i, ix in enumerate(a):
    output[i][ix] = 1
# output ->
# array([[0, 1, 1, 0, 1],
# [1, 0, 1, 1, 0],
# [0, 1, 0, 1, 1],
# [1, 0, 1, 0, 0]])
In case you can and want to use Cython, you can create a readable (at least if you don't mind the type declarations) and fast solution.
Here I'm using the IPython bindings of Cython to compile it in a Jupyter notebook:
%load_ext cython
%%cython
cimport cython
cimport numpy as cnp
import numpy as np
@cython.boundscheck(False)  # remove this if you cannot guarantee that nrow/ncol are correct
@cython.wraparound(False)
cpdef cnp.int_t[:, :] mseifert(list a, int nrow, int ncol):
    cdef cnp.int_t[:, :] out = np.zeros([nrow, ncol], dtype=int)
    cdef list subl
    cdef int row_idx
    cdef int col_idx
    for row_idx, subl in enumerate(a):
        for col_idx in subl:
            out[row_idx, col_idx] = 1
    return out
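The cpdef function returns a typed memoryview; a small usage sketch (my addition) wraps the result back into an ndarray:
a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
out = np.asarray(mseifert(a, 4, 5))
print(out)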
To compare the performance of the solutions presented here I use my library simple_benchmark:
Note that the benchmark plot uses logarithmic axes to show the differences for small and large arrays at the same time. According to my benchmark my function is actually the fastest of the solutions, although none of them is far off.
Here is the complete code I used for the benchmark:
import numpy as np
from simple_benchmark import BenchmarkBuilder, MultiArgument
import itertools
b = BenchmarkBuilder()
@b.add_function()
def pp(a, nrow, ncol):
    sz = np.fromiter(map(len, a), int, nrow)
    out = np.zeros((nrow, ncol), int)
    out[np.arange(nrow).repeat(sz), np.fromiter(itertools.chain.from_iterable(a), int, sz.sum())] = 1
    return out

@b.add_function()
def ts(a, nrow, ncol):
    out = np.zeros((nrow, ncol), int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out

@b.add_function()
def u9(a, nrow, ncol):
    out = np.zeros((nrow, ncol), int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out

b.add_functions([mseifert])

@b.add_arguments("number of rows/columns")
def argument_provider():
    for n in range(2, 13):
        ncols = 2**n
        a = [
            sorted(set(np.random.randint(0, ncols, size=np.random.randint(0, ncols))))
            for _ in range(ncols)
        ]
        yield ncols, MultiArgument([a, ncols, ncols])
r = b.run()
r.plot()
May not be the best way but the only way I can think of:
output = np.zeros((4,5))
for i, (x, y) in enumerate(zip(a, output)):
    y[x] = 1
    output[i] = y
print(output)
Which outputs:
[[ 0. 1. 1. 0. 1.]
[ 1. 0. 1. 1. 0.]
[ 0. 1. 0. 1. 1.]
[ 1. 0. 1. 0. 0.]]
How about using array indexing? If you knew more about your input, you could get rid of the penalty for having to convert to a linear array first.
import numpy as np
def main():
    row_count = 4
    col_count = 5
    a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
    # iterate through each row, concatenate all indices and convert them to linear indices;
    # numpy append performs a copy even if you don't want one, so list append is faster
    b = []
    for row_idx, row in enumerate(a):
        b.append(np.array(row, dtype=np.int64) + (row_idx * col_count))
    linear_idxs = np.hstack(b)
    # could skip the previous steps if the indices are given beforehand, or in linear index order
    c = np.zeros(row_count * col_count)
    c[linear_idxs] = 1
    c = c.reshape(row_count, col_count)
    print(c)

if __name__ == "__main__":
    main()
#output
# [[0. 1. 1. 0. 1.]
# [1. 0. 1. 1. 0.]
# [0. 1. 0. 1. 1.]
# [1. 0. 1. 0. 0.]]
Depending on your use case, you might look into using sparse matrices. The input matrix looks suspiciously like a Compressed Sparse Row (CSR) matrix. Perhaps something like
import numpy as np
from scipy.sparse import csr_matrix
from itertools import accumulate
def ragged2csr(inds):
    # CSR row pointer: cumulative row lengths with a leading 0 (length n_rows + 1)
    lens = [len(x) for x in inds]
    indptr = np.array([0] + list(accumulate(lens)))
    indices = np.array([val for sublist in inds for val in sublist])
    data = np.ones(indices.size)
    return csr_matrix((data, indices, indptr))
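A small usage sketch (my addition), using the index lists from the question:
a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
mat = ragged2csr(a)
print(mat.toarray())  # dense view, only for inspection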
Again, if it fits in your use case, a sparse matrix would allow elementwise/masking operations to scale with the number of nonzeros, rather than the number of elements (rows*columns), which could bring significant speedup (for a sparse enough matrix).
Another good introduction to CSR matrices is section 3.4 of Iterative Methods. In this case, data is aa, indices is ja and indptr is ia. This format also has the benefit of being very popular among different packages/libraries.
I have two numpy arrays:
import numpy as np
points_1 = np.array([1.5,2.5,1,3])
points_2 = np.array([3,4])
I would like to take every point from the points_1 array and subtract the whole points_2 array from it in order to get a matrix.
I would like to get:
[[-1.5,-2.5]
[-0.5,-1.5]
[-2 , -3]
[0 , -1]]
I know there is a way with iteration
points = [x - points_2 for x in points_1]
points = np.array(points)
However, this option is not fast enough; in reality I am using much bigger arrays.
Is there some faster way?
Thanks!
You just have to choose points_2 "better" (better here means giving it another dimension, so it becomes a column vector), then it works as you expect:
So do not use points_2 = np.array([3, 4]) but points_2 = np.array([[3],[4]]):
import numpy as np
points_1 = np.array([1.5,2.5,1,3])
points_2 = np.array([[3],[4]])
points = (points_1 - points_2).transpose()
print(points)
results in:
[[-1.5 -2.5]
[-0.5 -1.5]
[-2. -3. ]
[ 0. -1. ]]
If you don't need the whole array at once, you can use generators and benefit from lazy evaluation:
import numpy as np
points_1 = np.array([1.5,2.5,1,3])
points_2 = np.array([3,4])
def get_points():
    def get_points_internal():
        for p1 in points_1:
            for p2 in points_2:
                yield p1 - p2
    x = len(points_1) * len(points_2)
    points_1d = get_points_internal()
    # pair up the flat stream of differences into rows (this assumes len(points_2) == 2)
    for i in range(0, int(x / 2)):
        yield [next(points_1d), next(points_1d)]

points = get_points()
Make use of numpy's broadcasting feature. This will provide the following:
import numpy as np
points_1 = np.array([1.5,2.5,1,3])
points_2 = np.array([3,4])
points = points_1[:, None] - points_2
print(points)
Output:
[[-1.5 -2.5]
[-0.5 -1.5]
[-2. -3. ]
[ 0. -1. ]]
It works by broadcasting the subtraction over the size-1 dimension injected by the None index. For more info see the link.
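A short shape sketch (my own illustration) of what the broadcast does:
print(points_1[:, None].shape)               # (4, 1): the points as a column
print(points_2.shape)                        # (2,): broadcast across the columns
print((points_1[:, None] - points_2).shape)  # (4, 2): every pairwise difference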
You can do it in one line:
np.subtract.outer(points_1, points_2)
This is vectorized, so it is very fast.
You need to use the transposed matrix:
points_1 - np.transpose([points_2])
and for your result:
np.transpose(points_1 - np.transpose([points_2]))
I want to sum the values in vals into elements of a smaller array a specified in an index list idx.
import numpy as np
a = np.zeros((1,3))
vals = np.array([1,2,3,4])
idx = np.array([0,1,2,2])
a[0,idx] += vals
This produces the result [[ 1. 2. 4.]] but I want the result [[ 1. 2. 7.]], because it should add the 3 from vals and 4 from vals into the 2nd element of a.
I can achieve what I want with:
import numpy as np
a = np.zeros((1,3))
vals = np.array([1,2,3,4])
idx = np.array([0,1,2,2])
for i in np.unique(idx):
    fidx = (idx == i).astype(int)
    psum = (vals * fidx).sum()
    a[0, i] = psum
print(a)
Is there a way to do this with numpy without using a for loop?
Possible with np.add.at as long as the shapes align, i.e., a will need to be 1D here.
a = a.squeeze()
np.add.at(a, idx, vals)
a
array([1., 2., 7.])
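If idx holds non-negative integers, np.bincount with weights is another vectorized option (my addition, not part of the original answer) that avoids modifying a in place:
summed = np.bincount(idx, weights=vals, minlength=a.size)
print(summed)  # [1. 2. 7.]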
I am trying to add one column to the array created from recfromcsv. In this case it's an array: [210,8] (rows, cols).
I want to add a ninth column. Empty or with zeroes doesn't matter.
from numpy import genfromtxt
from numpy import recfromcsv
import numpy as np
import time
if __name__ == '__main__':
    print("testing")
    my_data = recfromcsv('LIAB.ST.csv', delimiter='\t')
    array_size = my_data.size
    #my_data = np.append(my_data[:array_size],my_data[9:],0)
    new_col = np.sum(x, 1).reshape((x.shape[0], 1))
    np.append(x, new_col, 1)
I think that your problem is that you are expecting np.append to add the column in-place, but because of how numpy data is stored, what it actually does is create a copy of the joined arrays:
Returns
-------
append : ndarray
A copy of `arr` with `values` appended to `axis`. Note that `append`
does not occur in-place: a new array is allocated and filled. If
`axis` is None, `out` is a flattened array.
so you need to save the output all_data = np.append(...):
my_data = np.random.random((210,8)) #recfromcsv('LIAB.ST.csv', delimiter='\t')
new_col = my_data.sum(1)[...,None] # None keeps (n, 1) shape
new_col.shape
#(210,1)
all_data = np.append(my_data, new_col, 1)
all_data.shape
#(210,9)
Alternative ways:
all_data = np.hstack((my_data, new_col))
#or
all_data = np.concatenate((my_data, new_col), 1)
I believe that the only difference between these three functions (as well as np.vstack) is their default behavior when axis is unspecified (see the short demonstration after this list):
concatenate assumes axis = 0
hstack assumes axis = 1 unless inputs are 1d, then axis = 0
vstack assumes axis = 0 after adding an axis if inputs are 1d
append flattens array
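A quick sketch (my addition) of those defaults on small arrays:
a2 = np.zeros((2, 2))
b2 = np.ones((2, 2))
print(np.concatenate((a2, b2)).shape)  # (4, 2): axis 0
print(np.hstack((a2, b2)).shape)       # (2, 4): axis 1 for 2d inputs
print(np.vstack((a2, b2)).shape)       # (4, 2): axis 0
print(np.append(a2, b2).shape)         # (8,): flattened when axis is not given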
Based on your comment, and looking more closely at your example code, I now believe that what you are probably looking to do is add a field to a record array. You imported both genfromtxt, which returns a structured array, and recfromcsv, which returns the subtly different record array (recarray). You used recfromcsv, so right now my_data is actually a recarray, which means that most likely my_data.shape == (210,), since recarrays are 1d arrays of records, where each record is a tuple with the given dtype.
So you could try this:
import numpy as np
from numpy.lib.recfunctions import append_fields
x = np.random.random(10)
y = np.random.random(10)
z = np.random.random(10)
data = np.array( list(zip(x,y,z)), dtype=[('x',float),('y',float),('z',float)])
data = np.recarray(data.shape, data.dtype, buf=data)
data.shape
#(10,)
tot = data['x'] + data['y'] + data['z'] # sum(axis=1) won't work on recarray
tot.shape
#(10,)
all_data = append_fields(data, 'total', tot, usemask=False)
all_data
#array([(0.4374783740738456 , 0.04307289878861764, 0.021176067323686598, 0.5017273401861498),
# (0.07622262416466963, 0.3962146058689695 , 0.27912715826653534 , 0.7515643883001745),
# (0.30878532523061153, 0.8553768789387086 , 0.9577415585116588 , 2.121903762680979 ),
# (0.5288343561208022 , 0.17048864443625933, 0.07915689716226904 , 0.7784798977193306),
# (0.8804269791375121 , 0.45517504750917714, 0.1601389248542675 , 1.4957409515009568),
# (0.9556552723429782 , 0.8884504475901043 , 0.6412854758843308 , 2.4853911958174133),
# (0.0227638618687922 , 0.9295332854783015 , 0.3234597575660103 , 1.275756904913104 ),
# (0.684075052174589 , 0.6654774682866273 , 0.5246593820025259 , 1.8742119024637423),
# (0.9841793718333871 , 0.5813955915551511 , 0.39577520705133684 , 1.961350170439875 ),
# (0.9889343795296571 , 0.22830104497714432, 0.20011292764078448 , 1.4173483521475858)],
# dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('total', '<f8')])
all_data.shape
#(10,)
all_data.dtype.names
#('x', 'y', 'z', 'total')
If you have an array a of, say, 210 rows by 8 columns:
import numpy
a = numpy.empty([210, 8])
and want to add a ninth column of zeros, you can do this:
b = numpy.append(a, numpy.zeros([len(a), 1]), 1)
The easiest solution is to use numpy.insert().
The advantage of np.insert() over np.append() is that you can insert the new columns at custom indices.
import numpy as np
X = np.arange(20).reshape(10,2)
X = np.insert(X, [0,2], np.random.rand(X.shape[0]*2).reshape(-1,2)*10, axis=1)
np.append or np.hstack expects the appended column to be the proper shape, that is N x 1. We can use np.zeros to create this zeros column (or np.ones to create a ones column) and append it to our original matrix (2D array).
def append_zeros(x):
    zeros = np.zeros((len(x), 1))  # zeros column as 2D array
    return np.hstack((x, zeros))   # append column
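For example (my own illustration) on a 210 x 8 array:
import numpy as np
x = np.random.random((210, 8))
print(append_zeros(x).shape)  # (210, 9)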
I add a new column of ones to a matrix array in this way:
Z = np.append([[1 for _ in range(len(Z))]], Z.T, 0).T
Maybe it is not that efficient?
It can be done like this:
import numpy as np
# create a random matrix:
A = np.random.normal(size=(5,2))
# add a column of zeros to it:
print(np.hstack((A,np.zeros((A.shape[0],1)))))
In general, if A is an m*n matrix and you need to add a column, you have to create an m*1 matrix of zeros, then use "hstack" to add the matrix of zeros to the right of A.
Similar to some of the other answers suggesting using numpy.hstack, but more readable:
import numpy as np
# declare 10 rows x 3 cols integer array of all 1s
arr = np.ones((10, 3), dtype=np.int64)
# get the number of rows in the original array (as if we didn't know it was 10 or it could be different in other cases)
numRows = arr.shape[0]
# declare the new array which will be the new column, integer array of all 0s so it's visually distinct from the original array
additionalColumn = np.zeros((numRows, 1), dtype=np.int64)
# use hstack to tack on the additional column
result = np.hstack((arr, additionalColumn))
print(result)
result:
$ python3 scratchpad.py
[[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]]
Here's a shorter one-liner:
import numpy as np
data = np.random.rand(210, 8)
data = np.c_[data, np.zeros(len(data))]
I often use something like this with np.ones instead, to convert points to homogeneous coordinates.
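For instance (a sketch of my own, assuming points is an (N, 2) array of 2-D points):
points = np.random.rand(5, 2)
homogeneous = np.c_[points, np.ones(len(points))]  # shape (5, 3)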