Construct 2 time series random variables with fixed correlation - python

Is there an easy way to generate two time-series with a fixed correlation? For instance 0.5.
Does anyone know a solution in R or Python?
Thanks!

This question is quite general, I think; it is not limited to time series. What you are asking for is a 2-D random variable with a known covariance. r = 0.5, std1 = 1 and std2 = 2 translate to a covariance matrix of [[1, 1], [1, 4]], since the off-diagonal entry is r*std1*std2 = 0.5*1*2 = 1. So, if we assume the data is multivariate normally distributed, we can generate such a random variable:
In [42]:
import numpy as np
val=np.random.multivariate_normal((0,0),[[1,1],[1,4]],1000)
In [43]:
np.corrcoef(val.T)
Out[43]:
array([[ 1.      ,  0.488883],
       [ 0.488883,  1.      ]])
In [44]:
np.cov(val.T)
Out[44]:
array([[ 1.03693888,  0.96490767],
       [ 0.96490767,  3.75671707]])
In [45]:
val=np.random.multivariate_normal((0,0),[[1,1],[1,4]],10)
In [46]:
np.corrcoef(val.T)
Out[46]:
array([[ 1.        ,  0.56807297],
       [ 0.56807297,  1.        ]])
In [48]:
val[:,0]
Out[48]:
array([-0.77425116,  0.35758601, -1.21668939, -0.95127533, -0.5714381 ,
        0.87530824,  0.9594394 ,  1.30123373,  1.92511929,  0.98070711])
In [49]:
val[:,1]
Out[49]:
array([-1.75698285,  2.24011423, -3.5129411 , -1.33889305,  2.32720257,
        0.53750133,  3.23935645,  2.96819425, -0.72551024,  3.0743096 ])
As this example shows, if your sample size is small, the resulting random variable may deviate considerably from r = 0.5.
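A minimal, self-contained sketch of the same recipe wrapped in a helper function (the name correlated_normal and the use of default_rng are my own choices here, not part of the answer above):

import numpy as np

def correlated_normal(r, std1, std2, n, seed=None):
    # Sketch: draw n samples of two normals with correlation r.
    # Off-diagonal covariance is r * std1 * std2; variances sit on the diagonal.
    cov = [[std1**2, r * std1 * std2],
           [r * std1 * std2, std2**2]]
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal((0, 0), cov, size=n)

vals = correlated_normal(0.5, 1, 2, 100000)
print(np.corrcoef(vals.T)[0, 1])  # close to 0.5 for a sample this large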

Related

Quickest way to calculate the average growth rate across columns of a numpy array

Given an array such as:
import numpy as np
a = np.array([[1,2,3,4,5],[6,7,8,9,10]])
What's the quickest way to calculate the growth rate of each row, so that my results would be 0.52083333333333326 and 0.13640873015873009 respectively?
I tried using:
>>> np.nanmean(np.rate(1,0,-a[:-1],a[1:]), axis=0)
array([ 5. , 2.5 , 1.66666667, 1.25 , 1. ])
but of course it doesn't yield the right result, and I don't know how to get the axis right for the numpy.rate function.
In [262]: a = np.array([[1,2,3,4,5],[6,7,8,9,10]]).astype(float)
In [263]: np.nanmean((a[:, 1:]/a[:, :-1]), axis=1) - 1
Out[263]: array([ 0.52083333, 0.13640873])
Here a[:, 1:]/a[:, :-1] gives the period-over-period growth factors of each row; subtracting 1 converts them to rates, and nanmean averages them across the periods. To keep your approach using numpy.rate, you need to index into your a array properly (treating the rows separately) and use axis=1:
In [6]: np.nanmean(np.rate(1,0,-a[:,:-1],a[:,1:]), axis=1)
Out[6]: array([ 0.52083333, 0.13640873])
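Note that in recent NumPy releases the financial functions, including rate, were removed from NumPy itself; they now live in the separate numpy-financial package. A hedged equivalent of the call above, assuming that package is installed:

import numpy as np
import numpy_financial as npf  # pip install numpy-financial

a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype=float)
# rate(nper, pmt, pv, fv) solves for the per-period rate of each adjacent pair
print(np.nanmean(npf.rate(1, 0, -a[:, :-1], a[:, 1:]), axis=1))
# [0.52083333 0.13640873]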

numpy get std between datasets

I have a dataset array A of shape n×2; it can be plotted on the x and y axes.
A[:,1] gets me all of the y values and A[:,0] gets me all the x values.
Now, I have a few other dataset arrays that are similar to A, and the x values are the same across these arrays. How do I calculate the standard deviation of the datasets? There should be one std value for each x, so in the end my result should have length n.
I can do this the manual way with loops, but I'm not sure how to do it with NumPy in a pythonic and simple manner.
Here is some sample data:
A = [[0, 2.54], [1, 254.5], [2, -43]]
B = [[0, 3.34], [1, 154.5], [2, -93]]
std_Array = [std(2.54, 3.34), std(254.5, 154.5), std(-43, -93)]
Suppose your arrays are all the same shape and they are in a list. Then, to get the standard deviation of the first column of each, you can do
arrays = [np.random.rand(10, 2) for _ in range(8)]
np.dstack(arrays).std(axis=0)[0]
This stacks the 2-D arrays into a 3-D array and then takes the std along the first axis, giving a 2 × 8 result (two columns by eight arrays). The first row of that result holds the standard deviations of the 8 sets of x values.
If you post some sample data, perhaps we can help more.
Is this pythonic enough?
std_Array = numpy.std((A,B), axis = 0)[:,1]
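Applied to the sample data above, this one-liner gives exactly the requested per-x standard deviations:

import numpy as np

A = np.array([[0, 2.54], [1, 254.5], [2, -43]])
B = np.array([[0, 3.34], [1, 154.5], [2, -93]])

# Stack the datasets along a new leading axis, take the std across datasets,
# then keep only the y column (index 1).
std_Array = np.std((A, B), axis=0)[:, 1]
print(std_Array)  # [  0.4  50.   25. ]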
li_arr = [np.array(x)[:, 1] for x in [A, B]]
This produces NumPy arrays holding the specific column you want; the result is
[array([  2.54, 254.5 , -43.  ]), array([  3.34, 154.5 , -93.  ])]
Then you stack the values using column_stack:
arr = np.column_stack(li_arr)
which gives
array([[   2.54,    3.34],
       [ 254.5 ,  154.5 ],
       [ -43.  ,  -93.  ]])
and then finally
np.std(arr, axis=1)
which again yields array([  0.4,  50. ,  25. ]).

How to use numpy.frompyfunc to return an array of elements instead of array of arrays?

I am using the PLegendre function from the SHTOOLS package. PLegendre(lmax, x) returns an array of the Legendre polynomial values P_0(x) through P_lmax(x) for a given argument x. It works like this:
In [1]: from pyshtools import PLegendre
loading shtools documentation
In [2]: import numpy as np
In [3]: PLegendre(3,0.5)
Out[3]: array([ 1. , 0.5 , -0.125 , -0.4375])
I would like to pass an array as a parameter, so I use frompyfunc.
In [4]: legendre=np.frompyfunc(PLegendre,2,1)
In [5]: legendre(3,np.linspace(0,1,4))
Out[5]:
array([array([ 1. ,  0. , -0.5, -0. ]),
       array([ 1.        ,  0.33333333, -0.33333333, -0.40740741]),
       array([ 1.        ,  0.66666667,  0.16666667, -0.25925926]),
       array([ 1.,  1.,  1.,  1.])], dtype=object)
The output is an array of arrays. I understand that I can create an array of elements from this by slicing the array.
In [6]: a=legendre(3,np.linspace(0,1,4))
In [7]: np.array([a[i][:] for i in range(4)])
Out[7]:
array([[ 1.        ,  0.        , -0.5       , -0.        ],
       [ 1.        ,  0.33333333, -0.33333333, -0.40740741],
       [ 1.        ,  0.66666667,  0.16666667, -0.25925926],
       [ 1.        ,  1.        ,  1.        ,  1.        ]])
But.. is there a way to get to this directly, instead of having to slice the array of arrays?
I think it cannot be done directly, as already pointed out here for np.vectorize, which does almost the same thing. Note that your code is not faster than an ordinary for loop just because it uses np.frompyfunc; it only looks nicer.
However, what you can do is using np.vstack instead of the list comprehension
a = legendre(3,np.linspace(0,1,4))
np.vstack(a)
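As an aside, if unnormalized Legendre values on an array are all you need, NumPy's own Legendre Vandermonde matrix returns the stacked 2-D result directly. A sketch (pyshtools' PLegendre may use a different normalization, so treat this as an analogue rather than a drop-in replacement):

import numpy as np

# Each row is [P_0(x), P_1(x), P_2(x), P_3(x)] for one value of x.
x = np.linspace(0, 1, 4)
vals = np.polynomial.legendre.legvander(x, 3)
print(vals.shape)  # (4, 4) -- already a plain 2-D array, no stacking needed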
np.frompyfunc is compiled, so I'd have to dig into the source to see exactly what it is doing. But it appears to assume that the func output is an (inscrutable) Python object.
foo1 = np.frompyfunc(np.arange,1,1)
foo2 = np.vectorize(np.arange,otypes='O')
These 2 functions produce the same outputs, though foo1 is faster.
foo1(np.arange(4))
produces arrays with different sizes
array([array([], dtype=int32), array([0]), array([0, 1]), array([0, 1, 2])], dtype=object)
whereas the outputs of foo1(np.ones((4,))) all have the same size and could, in theory, be stacked.
There's no attempt, during the loop or after, to test whether the objects are arrays (or lists) and whether they can be combined into a single higher dimensional array.
plonser's use of vstack is a good idea. In fact frompyfunc plus vstack is faster than the more common list comprehension plus vstack.
In [54]: timeit np.vstack([np.arange(i) for i in 10*np.ones((10,))])
10000 loops, best of 3: 169 µs per loop
In [55]: timeit np.vstack(foo1(10*np.ones((10,))))
10000 loops, best of 3: 127 µs per loop

Efficient way of taking the logarithm of a sparse matrix

I have a big sparse matrix, and I want to take the log base 4 of every element in it.
I tried numpy.log(), but it doesn't work with sparse matrices.
I can also take the logarithm row by row, then overwrite the old row with the new one:
# Assume A is a sparse matrix (linked-list format, LIL) with float data.
# This handles only one row:
import numpy as np
c = np.log(A.getrow(0)) / np.log(4)
A[0, :] = c
This was not as quick as I'd expected. Is there a faster way to do this?
You can modify the data attribute directly:
>>> from scipy.sparse import coo_matrix
>>> a = np.array([[5,0,0,0,0,0,0],[0,0,0,0,2,0,0]])
>>> coo = coo_matrix(a)
>>> coo.data
array([5, 2])
>>> coo.data = np.log(coo.data)
>>> coo.data
array([ 1.60943791, 0.69314718])
>>> coo.todense()
matrix([[ 1.60943791,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.69314718,
          0.        ,  0.        ]])
Note that this doesn't work properly if the sparse format has repeated elements (which is valid in the COO format); it'll take the logs individually, and log(a) + log(b) != log(a + b). You probably want to convert to CSR or CSC first (which is fast) to avoid this problem.
You'll also have to add checks if the sparse matrix is in a different format, of course. And if you don't want to modify the matrix in-place, just construct a new sparse matrix as you did in your answer, but without adding 3 because that's completely unnecessary here.
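Putting those pieces together, a hedged sketch of the whole in-place log-base-4 operation on a CSR matrix (converting to CSR first means any duplicate COO entries have already been summed):

import numpy as np
import scipy.sparse as sp

# A random sparse matrix just for demonstration; its stored values lie in (0, 1).
A = sp.random(1000, 1000, density=0.01, format='csr')

# Operate on the stored nonzero values only: log_4(x) = ln(x) / ln(4).
A.data = np.log(A.data) / np.log(4)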
I think I solved it in a very easy way; it is strange that no one answered it immediately.
# Let A be a COO matrix
import numpy as np
from scipy.sparse import coo_matrix
new_data = np.log(A.data + 3) / np.log(4)  # the 3 is not so important; it can be 1 too
A = coo_matrix((new_data, (A.row, A.col)), shape=A.shape)

Euclidean distances between points

I have an array of points in numpy:
points = rand(dim, n_points)
And I want to:
Calculate all the L2 norms (Euclidean distances) between a certain point and all other points
Calculate all pairwise distances
and preferably with pure NumPy, no for loops. How can one do it?
If you're willing to use SciPy, the scipy.spatial.distance module (the functions cdist and/or pdist) do exactly what you want, with all the looping done in C. You can do it with broadcasting too but there's some extra memory overhead.
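A minimal sketch of both parts with SciPy (cdist and pdist expect points as rows, so the column-wise array from the question is transposed first):

import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

points = np.random.rand(3, 4)    # dim x n_points, as in the question
X = points.T                     # cdist/pdist want n_points x dim

one_to_all = cdist(X[:1], X)     # distances from point 0 to every point
pairwise = squareform(pdist(X))  # full symmetric n_points x n_points matrix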
This might help with the second part:
import numpy as np
p = np.random.rand(3, 4)  # this is column-wise, so each vector has length 3
np.sqrt(np.sum((p[:, np.newaxis, :] - p[:, :, np.newaxis])**2, axis=0))
which gives
array([[ 0.        ,  0.37355868,  0.64896708,  1.14974483],
       [ 0.37355868,  0.        ,  0.6277216 ,  1.19625254],
       [ 0.64896708,  0.6277216 ,  0.        ,  0.77465192],
       [ 1.14974483,  1.19625254,  0.77465192,  0.        ]])
if p was
array([[ 0.46193242,  0.11934744,  0.3836483 ,  0.84897951],
       [ 0.19102709,  0.33050367,  0.36382587,  0.96880535],
       [ 0.84963349,  0.79740414,  0.22901247,  0.09652746]])
and you can check one of the entries via
np.sqrt(np.sum((p[:,0] - p[:,2])**2))
0.64896708223796884
The trick is to insert new axes with np.newaxis and let broadcasting compute all pairwise differences at once.
Good luck!
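For the first part alone (one fixed point against all the others), the same broadcasting idea is even simpler; a sketch:

import numpy as np

p = np.random.rand(3, 4)                       # column-wise points, as above
d = np.sqrt(((p - p[:, [0]])**2).sum(axis=0))  # distances from point 0 to all points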
