I'd like to slice NumPy arrays in a parametric way inside a function, so that I get the array elements I need for my computation. I know how to slice an array by index, but I would rather not hard-code the index. In my case, I have a coefficient array c, a power array p, and a parameter num_order. Basically, num_order decides where the arrays are sliced.
my attempt:
import numpy as np
c=[1,1/2, -1/6, 1/12]
p= [1,2,3,4]
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])
def arr_pow(x, num_order):
    output = []
    for i in range(num_order):
        mul = c[i] * np.power(x, p[i])
        output.append(mul)
    return output
So, if num_order=2, I slice the first two terms of c and p with c_new = c[:-2], p_new = p[:-2], which gives c_new=[1,1/2], p_new=[1,2], and so on. Is there a better way to slice two or more arrays based on the parameter num_order? Can anyone point me to an elegant way to do this in a parameterized function? Any thoughts?
update:
Instead of doing c_new=c[:-1], p_new=p[:-1] if num_order=3, and c_new=c[:-2], p_new=p[:-2] if num_order=2, and so on, is there a more elegant (parametric) way to do this? Is there an efficient way of doing this in a Python function? Thanks!
I'm not sure if this is the output you want (if you could please update your question to include the expected output that would be helpful):
import numpy as np
c = np.array([1, 1 / 2, -1 / 6, 1 / 12])
p = np.array([1, 2, 3, 4])
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])
def arr_pow_numpy(x, num_order):
    return c[:num_order, None] * np.power(x[None], p[:num_order, None])
def arr_pow(x, num_order):
    output = []
    for i in range(num_order):
        mul = c[i] * np.power(x, p[i])
        output.append(mul)
    return np.asarray(output)
for num_order in range(1, len(p)):
    assert np.array_equal(arr_pow(x, num_order), arr_pow_numpy(x, num_order)), f"{num_order}"
The idea here is to use NumPy broadcasting plus NumPy slicing to achieve the result you want without for loops and in a parametric way.
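To see why the broadcast version matches the loop, it can help to look at the shapes involved. The following is a minimal sketch, assuming the same c, p and x as above; the helper name arr_pow_terms and the choice to pass c and p as arguments are mine, added for illustration, and are not part of the original answer.

```python
import numpy as np

c = np.array([1, 1 / 2, -1 / 6, 1 / 12])
p = np.array([1, 2, 3, 4])
x = np.array([1, 1, 2, 3, 5, 8, 13, 21])

def arr_pow_terms(x, c, p, num_order):
    """Return the first num_order terms c[i] * x**p[i], one row per term."""
    c_col = c[:num_order, None]   # shape (num_order, 1)
    p_col = p[:num_order, None]   # shape (num_order, 1)
    x_row = x[None, :]            # shape (1, len(x))
    # Broadcasting stretches the columns and the row to (num_order, len(x)).
    return c_col * np.power(x_row, p_col)

terms = arr_pow_terms(x, c, p, num_order=2)
print(terms.shape)  # (2, 8)
```

Slicing with c[:num_order] rather than a negative index means the function does not need to know how long c and p are.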
Use the following:
num_order = 2
np.array([c[i] * np.power(x, p[i]) for i in range(num_order)])
# Out:
# array([[ 1. , 1. , 2. , 3. , 5. , 8. , 13. , 21. ],
# [ 0.5, 0.5, 2. , 4.5, 12.5, 32. , 84.5, 220.5]])
I have a list of indices
a = [
[1,2,4],
[0,2,3],
[1,3,4],
[0,2]]
What's the fastest way to convert this to a numpy array of ones, where each index shows the position where 1 would occur?
I.e. what I want is:
output = array([
[0,1,1,0,1],
[1,0,1,1,0],
[0,1,0,1,1],
[1,0,1,0,0]])
I know the max size of the array beforehand. I know I could loop through each list and insert a 1 at each index position, but is there a faster/vectorized way to do this?
My use case could have thousands of rows/cols and I need to do this thousands of times, so the faster the better.
How about this:
ncol = 5
nrow = len(a)
out = np.zeros((nrow, ncol), int)
out[np.arange(nrow).repeat([*map(len,a)]), np.concatenate(a)] = 1
out
# array([[0, 1, 1, 0, 1],
# [1, 0, 1, 1, 0],
# [0, 1, 0, 1, 1],
# [1, 0, 1, 0, 0]])
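The one-liner above pairs every column index with its row index before assigning. Here is a small sketch that spells out the two index arrays for the example input; the names rows, cols and lens are mine, added for illustration.

```python
import numpy as np

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
nrow, ncol = len(a), 5

lens = [len(sub) for sub in a]             # [3, 3, 3, 2]
rows = np.repeat(np.arange(nrow), lens)    # row index for every entry
cols = np.concatenate(a)                   # flattened column indices

out = np.zeros((nrow, ncol), dtype=int)
out[rows, cols] = 1                        # one vectorized assignment
print(rows.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
print(cols.tolist())  # [1, 2, 4, 0, 2, 3, 1, 3, 4, 0, 2]
```

If a sub-list can be empty, np.concatenate may return a float array, so an explicit integer conversion of cols would probably be needed before indexing.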
Here are timings for a 1000x1000 binary array. Note that I use an optimized version of the above; see function pp below:
pp 21.717635259992676 ms
ts 37.10938713003998 ms
u9 37.32933565042913 ms
Code to produce timings:
import itertools as it
import numpy as np
def make_data(n,m):
    I,J = np.where(np.random.random((n,m))<np.random.random((n,1)))
    return [*map(np.ndarray.tolist, np.split(J, I.searchsorted(np.arange(1,n))))]
def pp():
    sz = np.fromiter(map(len,a),int,nrow)
    out = np.zeros((nrow,ncol),int)
    out[np.arange(nrow).repeat(sz),np.fromiter(it.chain.from_iterable(a),int,sz.sum())] = 1
    return out
def ts():
    out = np.zeros((nrow,ncol),int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out
def u9():
    out = np.zeros((nrow,ncol),int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out
nrow,ncol = 1000,1000
a = make_data(nrow,ncol)
from timeit import timeit
assert (pp()==ts()).all()
assert (pp()==u9()).all()
print("pp", timeit(pp,number=100)*10, "ms")
print("ts", timeit(ts,number=100)*10, "ms")
print("u9", timeit(u9,number=100)*10, "ms")
This might not be the fastest way. You will need to compare execution times of these answers using large arrays in order to find out the fastest way. Here's my solution
output = np.zeros((4,5))
for i, ix in enumerate(a):
    output[i][ix] = 1
# output ->
# array([[0, 1, 1, 0, 1],
# [1, 0, 1, 1, 0],
# [0, 1, 0, 1, 1],
# [1, 0, 1, 0, 0]])
In case you can and want to use Cython you can create a readable (at least if you don't mind the typing) and fast solution.
Here I'm using the IPython bindings of Cython to compile it in a Jupyter notebook:
%load_ext cython
%%cython
cimport cython
cimport numpy as cnp
import numpy as np
@cython.boundscheck(False)  # remove this if you cannot guarantee that nrow/ncol are correct
@cython.wraparound(False)
cpdef cnp.int_t[:, :] mseifert(list a, int nrow, int ncol):
    cdef cnp.int_t[:, :] out = np.zeros([nrow, ncol], dtype=int)
    cdef list subl
    cdef int row_idx
    cdef int col_idx
    for row_idx, subl in enumerate(a):
        for col_idx in subl:
            out[row_idx, col_idx] = 1
    return out
To compare the performance of the solutions presented here I use my library simple_benchmark:
Note that this uses logarithmic axes to show the differences for small and large arrays simultaneously. According to my benchmark my function is actually the fastest of the solutions; however, it's also worth pointing out that none of the solutions is far off.
Here is the complete code I used for the benchmark:
import numpy as np
from simple_benchmark import BenchmarkBuilder, MultiArgument
import itertools
b = BenchmarkBuilder()
@b.add_function()
def pp(a, nrow, ncol):
    sz = np.fromiter(map(len, a), int, nrow)
    out = np.zeros((nrow, ncol), int)
    out[np.arange(nrow).repeat(sz), np.fromiter(itertools.chain.from_iterable(a), int, sz.sum())] = 1
    return out
@b.add_function()
def ts(a, nrow, ncol):
    out = np.zeros((nrow, ncol), int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out
@b.add_function()
def u9(a, nrow, ncol):
    out = np.zeros((nrow, ncol), int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out
b.add_functions([mseifert])
@b.add_arguments("number of rows/columns")
def argument_provider():
    for n in range(2, 13):
        ncols = 2**n
        a = [
            sorted(set(np.random.randint(0, ncols, size=np.random.randint(0, ncols))))
            for _ in range(ncols)
        ]
        yield ncols, MultiArgument([a, ncols, ncols])
r = b.run()
r.plot()
May not be the best way but the only way I can think of:
output = np.zeros((4,5))
for i, (x, y) in enumerate(zip(a, output)):
    y[x] = 1
    output[i] = y
print(output)
Which outputs:
[[ 0. 1. 1. 0. 1.]
[ 1. 0. 1. 1. 0.]
[ 0. 1. 0. 1. 1.]
[ 1. 0. 1. 0. 0.]]
How about using array indexing? If you knew more about your input, you could get rid of the penalty for having to convert to a linear array first.
import numpy as np
def main():
    row_count = 4
    col_count = 5
    a = [[1,2,4],[0,2,3],[1,3,4],[0,2]]
    # iterate through each row, concatenate all indices and convert them to linear
    # numpy append performs copy even if you don't want it, list append is faster
    b = []
    for row_idx, row in enumerate(a):
        b.append(np.array(row, dtype=np.int64) + (row_idx * col_count))
    linear_idxs = np.hstack(b)
    # could skip the previous steps if the indices are given well beforehand, or in linear index order
    c = np.zeros(row_count * col_count)
    c[linear_idxs] = 1
    c = c.reshape(row_count, col_count)
    print(c)
if __name__ == "__main__":
    main()
#output
# [[0. 1. 1. 0. 1.]
# [1. 0. 1. 1. 0.]
# [0. 1. 0. 1. 1.]
# [1. 0. 1. 0. 0.]]
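The row-by-row offset arithmetic above can also be delegated to NumPy. This is a sketch that uses np.ravel_multi_index to turn (row, column) pairs into linear indices; it assumes the same input a and known row and column counts, and the variable names are illustrative.

```python
import numpy as np

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
row_count, col_count = 4, 5

rows = np.repeat(np.arange(row_count), [len(r) for r in a])
cols = np.concatenate(a)

# The linear index of (r, c) in a row-major array is r * col_count + c.
linear_idxs = np.ravel_multi_index((rows, cols), (row_count, col_count))

flat = np.zeros(row_count * col_count, dtype=int)
flat[linear_idxs] = 1
c = flat.reshape(row_count, col_count)
print(c)
```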
Depending on your use case, you might look into using sparse matrices. The input matrix looks suspiciously like a Compressed Sparse Row (CSR) matrix. Perhaps something like
import numpy as np
from scipy.sparse import csr_matrix
from itertools import accumulate
def ragged2csr(inds):
    lens = [len(x) for x in inds]
    # indptr needs a leading 0 followed by the running end position of each row
    indptr = np.array([0] + list(accumulate(lens)))
    indices = np.array([val for sublist in inds for val in sublist])
    n = indices.size
    data = np.ones(n)
    return csr_matrix((data, indices, indptr))
Again, if it fits in your use case, a sparse matrix would allow elementwise/masking operations to scale with the number of nonzeros, rather than the number of elements (rows*columns), which could bring significant speedup (for a sparse enough matrix).
Another good introduction to CSR matrices is section 3.4 of Iterative Methods. In this case, data is aa, indices is ja and indptr is ia. This format also has the benefit of being very popular among different packages/libraries.
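To make the three CSR arrays concrete, here is a NumPy-only sketch for the example index list from the question. It assumes that input and reuses the names data, indices and indptr from the paragraph above; the dense reconstruction at the end is only a sanity check, not something a sparse workflow would normally do.

```python
import numpy as np

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
nrow, ncol = len(a), 5

lens = np.array([len(sub) for sub in a])
indptr = np.concatenate(([0], np.cumsum(lens)))   # row i spans indptr[i]:indptr[i+1]
indices = np.concatenate(a)                       # column index of each nonzero
data = np.ones(indices.size, dtype=int)           # the stored values (all ones)

# Rebuild the dense array from the CSR triplet to check it.
rows = np.repeat(np.arange(nrow), np.diff(indptr))
dense = np.zeros((nrow, ncol), dtype=int)
dense[rows, indices] = data
print(indptr.tolist())  # [0, 3, 6, 9, 11]
```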
I have a set of data in Python with columns like:
x y angle
I want to calculate the distance between every possible pair of points and plot those distances against the difference between the two angles.
x, y, a = np.loadtxt('w51e2-pa-2pk.log', unpack=True)
n = 0
f=(((x[n])-x[n+1:])**2+((y[n])-y[n+1:])**2)**0.5
d = a[n]-a[n+1:]
plt.scatter(f,d)
There are 255 points in my data.
f is the distance and d is the difference between two angles.
My question is: can I set n = [1, 2, 3, ..., 255] and repeat the calculation to get f and d for all possible pairs?
You can obtain the pairwise distances through broadcasting by considering it as an outer operation on the array of 2-dimensional vectors as follows:
vecs = np.stack((x, y)).T
np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
For example,
In [1]: import numpy as np
...: x = np.array([1, 2, 3])
...: y = np.array([3, 4, 6])
...: vecs = np.stack((x, y)).T
...: np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
...:
Out[1]:
array([[ 0. , 1.41421356, 3.60555128],
[ 1.41421356, 0. , 2.23606798],
[ 3.60555128, 2.23606798, 0. ]])
Here, the (i, j)'th entry is the distance between the i'th and j'th vectors.
The case of the pairwise differences between angles is similar, but simpler, as you only have one dimension to deal with:
In [2]: a = np.array([10, 12, 15])
...: a[np.newaxis, :] - a[: , np.newaxis]
...:
Out[2]:
array([[ 0, 2, 5],
[-2, 0, 3],
[-5, -3, 0]])
Moreover, plt.scatter does not care that the results are given as matrices, and putting everything together using the notation of the question, you can obtain the plot of angles by distances by doing something like
vecs = np.stack((x, y)).T
f = np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
d = a[np.newaxis, :] - a[:, np.newaxis]
plt.scatter(f, d)
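The full matrices contain every pair twice, plus the zero-distance diagonal. If each pair should appear only once, as in the question's x[n] - x[n+1:] formulation, one option is to keep only the upper triangle. Here is a sketch with small made-up data; the arrays are illustrative stand-ins for the loaded file.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 4.0, 6.0])
a = np.array([10.0, 12.0, 15.0])

# Index pairs (i, j) with i < j: each unordered pair exactly once.
i, j = np.triu_indices(len(x), k=1)

f = np.hypot(x[i] - x[j], y[i] - y[j])   # distances
d = a[i] - a[j]                          # angle differences, a[i] - a[j]
print(f)
print(d)
```

Here f and d are 1-D arrays of length n*(n-1)/2, so they can be passed straight to plt.scatter(f, d).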
You can use a for loop and range() to iterate over n, e.g. like this:
n = len(x)
for i in range(n):
    # do something with the current index
    # e.g. print the points
    print(x[i])
    print(y[i])
But note that if you use i+1 inside the last iteration, this will already be outside of your list.
Also note how (x[n])-x[n+1:] behaves: x[n] is a single value, while x[n+1:] is a sequence starting from the (n+1)'th element. With plain Python lists you cannot subtract a list from a number; the expression only works here because np.loadtxt returns NumPy arrays, which broadcast the subtraction.
Maybe you will have to even use two nested loops to do what you want. I guess that you want to calculate the distance between each point so a two dimensional array may be the data structure you want.
If you are interested in all combinations of the points in x and y, I suggest using itertools, which will give you all possible index pairs. Then you can do it as follows:
import itertools
f = [((x[i]-x[j])**2 + (y[i]-y[j])**2)**0.5 for i,j in itertools.product(range(len(x)), repeat=2) if i!=j]
# and similar for the angles
But maybe there is even an easier way...
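If each unordered pair should be visited only once, itertools.combinations is an alternative to product with an i != j filter. Here is a minimal pure-Python sketch with made-up points (not the data from the question):

```python
import itertools
import math

x = [0.0, 3.0, 0.0]
y = [0.0, 4.0, 1.0]
a = [10.0, 40.0, 25.0]

f = []  # distances
d = []  # angle differences
# combinations yields (0, 1), (0, 2), (1, 2): every pair with i < j, once.
for i, j in itertools.combinations(range(len(x)), 2):
    f.append(math.hypot(x[i] - x[j], y[i] - y[j]))
    d.append(a[i] - a[j])

print(f)
print(d)
```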
I am very new to Python, so my apologies if the question is too simple. I have a function which returns the standard deviation of the pixels surrounding a principal pixel, something like*:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def sliding_window(arr, window_size):
    """ Construct a sliding window view of the array"""
    arr = np.asarray(arr)
    window_size = int(window_size)
    if arr.ndim != 2:
        raise ValueError("need 2-D input")
    if not (window_size > 0):
        raise ValueError("need a positive window size")
    shape = (arr.shape[0] - window_size + 1,
             arr.shape[1] - window_size + 1,
             window_size, window_size)
    if shape[0] <= 0:
        shape = (1, shape[1], arr.shape[0], shape[3])
    if shape[1] <= 0:
        shape = (shape[0], 1, shape[2], arr.shape[1])
    strides = (arr.shape[1]*arr.itemsize, arr.itemsize,
               arr.shape[1]*arr.itemsize, arr.itemsize)
    return as_strided(arr, shape=shape, strides=strides)
def std(arr, i, j, d):
    """Return the std of the d-th neighbors of cell (i, j)"""
    w = sliding_window(arr, 2*d+1)
    ix = np.clip(i - d, 0, w.shape[0]-1)
    jx = np.clip(j - d, 0, w.shape[1]-1)
    i0 = max(0, i - d - ix)
    j0 = max(0, j - d - jx)
    i1 = w.shape[2] - max(0, d - i + ix)
    j1 = w.shape[3] - max(0, d - j + jx)
    return np.std(w[ix, jx][i0:i1,j0:j1].ravel())
Now I want to apply this function to each element of the array and get as result an array with the same structure:
For example:
array = [[2,3,4,4],
[3,4,3,5],
[4,5,6,6],
[3,6,7,7]]
formula = [[std(array, 0,0,2), std(array, 1,0,2),std(array, 2,0,2),std(array, 3,0,2)],
[std(array, 0,1,2), std(array, 1,1,2),std(array, 2,1,2),std(array, 3,1,2)],
[std(array, 0,2,2), std(array, 1,2,2),std(array, 2,2,2),std(array, 3,2,2)],
[std(array, 0,3,2), std(array, 1,3,2),std(array, 2,3,2),std(array, 3,3,2)]]
result = [[0.70710678118654757, 0.9574271077563381, 1.1989578808281798, 1.0671873729054748],
[0.68718427093627676, 1.1331154474650633, 1.4624940645653537, 1.4229164972072998],
[0.8660254037844386, 1.1873172373979173, 1.5, 1.4409680388158819],
[0.68718427093627676, 1.0657403385139377, 1.35400640077266, 1.2570787221094177]]
I was trying to make a loop. Something like:
For k, v in array[i][j]:
sd(array, i,j, n)
But so far, loops have been very frustrating. I hope you can help me.
from here
You can do it with
array = [[2,3,4,4],
[3,4,3,5],
[4,5,6,6],
[3,6,7,7]]
print [[[std(array,x,i,2) for x in xrange(len(array[i]))] for i in xrange(len(array))]]
# Result
[[[0.70710678118654757, 0.9574271077563381, 1.1989578808281798, 1.0671873729054748], [0.68718427093627676, 1.1331154474650633, 1.4624940645653537, 1.4229164972072998], [0.8660254037844386, 1.1873172373979173, 1.5, 1.4409680388158819], [0.68718427093627676, 1.0657403385139377, 1.35400640077266, 1.2570787221094177]]]
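For readers who find nested comprehensions hard to follow, an explicit loop over every cell does the same job. The sketch below uses np.ndindex and a simplified stand-in, local_std, that takes the standard deviation of the window clipped at the borders. That helper is an assumption made for illustration and is not claimed to reproduce the std function or the numbers from the question.

```python
import numpy as np

def local_std(arr, i, j, d):
    """Std of the (2d+1)x(2d+1) window around (i, j), clipped at the borders."""
    window = arr[max(i - d, 0):i + d + 1, max(j - d, 0):j + d + 1]
    return window.std()

array = np.array([[2, 3, 4, 4],
                  [3, 4, 3, 5],
                  [4, 5, 6, 6],
                  [3, 6, 7, 7]])

result = np.empty(array.shape, dtype=float)
for i, j in np.ndindex(array.shape):   # visits (0, 0), (0, 1), ..., (3, 3)
    result[i, j] = local_std(array, i, j, d=1)

print(result.shape)  # (4, 4)
```

Preallocating result with the input's shape guarantees the output has the same structure as the input array.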
It looks like you can use the scipy function 'ndimage.generic_filter'.
We give it a footprint (your window_size), and map a function (np.nanstd) onto a 1d array of each item which matches in that footprint.
Because we have to worry about borders, we can use np.nanstd and pad the array with NaNs (cval = np.nan).
import numpy as np
import scipy.ndimage as ndimage
results = np.empty(shape = (4,4), dtype = 'float') #to avoid type conversion
footprint = np.ones((2,2)) #change for your window size
#footprint[0,0] = 0 #not sure if this is intended
array = [[2,3,4,4],
[3,4,3,5],
[4,5,6,6],
[3,6,7,7]]
ndimage.generic_filter(array, np.nanstd, footprint=footprint, mode = 'constant', cval= np.nan, output = results, origin = -1)
results
array([[ 0.70710678, 0.5 , 0.70710678, 0.5 ],
[ 0.70710678, 1.11803399, 1.22474487, 0.5 ],
[ 1.11803399, 0.70710678, 0.5 , 0.5 ],
[ 1.5 , 0.5 , 0. , 0. ]])
The results are different from yours - not sure if I have something wrong in the understanding, or the footprint/origin/function.
How do I compute the derivative of an array, y (say), with respect to another array, x (say) - both arrays from a certain experiment?
e.g.
y = [1,2,3,4,4,5,6] and x = [.1,.2,.5,.6,.7,.8,.9];
I want to get dy/dx!
Use numpy.diff
If dx is constant
from numpy import diff
dx = 0.1
y = [1, 2, 3, 4, 4, 5, 6]
dy = diff(y)/dx
print(dy)
array([ 10., 10., 10., 0., 10., 10.])
dx is not constant (your example)
from numpy import diff
x = [.1, .2, .5, .6, .7, .8, .9]
y = [1, 2, 3, 4, 4, 5, 6]
dydx = diff(y)/diff(x)
print(dydx)
[10., 3.33333, 10. , 0. , 10. , 10.]
Note that this approximated "derivative" has size n-1 where n is your array/list size.
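Because each difference quotient spans two neighboring samples, a natural (though not the only) choice is to associate it with the midpoint of that interval. Here is a small sketch using the data from the question; x_mid is a name introduced here for illustration.

```python
import numpy as np

x = np.array([.1, .2, .5, .6, .7, .8, .9])
y = np.array([1, 2, 3, 4, 4, 5, 6])

dydx = np.diff(y) / np.diff(x)     # n - 1 slope estimates
x_mid = (x[:-1] + x[1:]) / 2       # midpoint of each interval

for xm, s in zip(x_mid, dydx):
    print(f"x = {xm:.2f}: dy/dx ~ {s:.3f}")
```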
I don't know what you are trying to achieve, but here are some ideas:
If you are trying to do numerical differentiation, a finite-difference formulation might serve you better.
The solution above amounts to a first-order-accurate forward finite-difference scheme on a non-uniform grid/array.
use numpy.gradient()
Please be aware that there are more advanced ways to calculate the numerical derivative than simply using diff. I would suggest using numpy.gradient, as in this example.
import numpy as np
from matplotlib import pyplot as plt
# we sample a sin(x) function
dx = np.pi/10
x = np.arange(0,2*np.pi,np.pi/10)
# we calculate the derivative, with np.gradient
plt.plot(x,np.gradient(np.sin(x), dx), '-*', label='approx')
# we compare it with the exact first derivative, i.e. cos(x)
plt.plot(x,np.cos(x), label='exact')
plt.legend()
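np.gradient also accepts the sample coordinates instead of a constant spacing, which covers the non-uniform x from the question and returns an array of the same length as the input. Here is a sketch, assuming a NumPy version recent enough to support coordinate arrays (1.13 or later, to my knowledge):

```python
import numpy as np

x = np.array([.1, .2, .5, .6, .7, .8, .9])
y = np.array([1, 2, 3, 4, 4, 5, 6], dtype=float)

# Passing x as the second argument gives the coordinates of the samples.
dydx = np.gradient(y, x)

print(dydx.shape)  # (7,)
```

With the default settings, the two end points use one-sided differences and the interior points use central differences adapted to the uneven spacing.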
I'm assuming this is what you meant:
>>> from __future__ import division
>>> x = [.1,.2,.5,.6,.7,.8,.9]
>>> y = [1,2,3,4,4,5,6]
>>> from itertools import izip
>>> def pairwise(iterable): # question 5389507
... "s -> (s0,s1), (s2,s3), (s4, s5), ..."
... a = iter(iterable)
... return izip(a, a)
...
>>> for ((a, b), (c, d)) in zip(pairwise(x), pairwise(y)):
... print (d - c) / (b - a)
...
10.0
10.0
10.0
>>>
question 5389507 link
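That is, define dx as the difference between adjacent elements in x. Note that this pairwise recipe yields non-overlapping pairs, so only every other difference is computed (three values for the seven points above).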
numpy.diff(x) computes "the difference between adjacent elements in x", just like in the answer by @tsm.
As a result you get an array which is 1 element shorter than the original one. This of course makes sense, as you can only start computing the differences from the first index (1 "history element" is needed).
>>> x = [1,3,4,6,7,8]
>>> dx = numpy.diff(x)
>>> dx
array([2, 1, 2, 1, 1])
>>> y = [1,2,4,2,3,1]
>>> dy = numpy.diff(y)
>>> dy
array([ 1, 2, -2, 1, -2])
Now you can divide those 2 resulting arrays to get the desired derivative.
>>> d = dy / dx
>>> d
array([ 0.5, 2. , -1. , 1. , -2. ])
If for some reason, you need a relative (to the y-values) growth, you can do it the following way:
>>> d / y[:-1]
array([ 0.5 , 1. , -0.25 , 0.5 , -0.66666667])
Interpret as 50% growth, 100% growth, -25% growth, etc.
Full code:
import numpy
x = [1,3,4,6,7,8]
y = [1,2,4,2,3,1]
dx = numpy.diff(x)
dy = numpy.diff(y)
d = dy/dx