Efficient Way to Recursively Multiply - python

I'm creating N_MC paths of simulated stock prices S with n points in each path, excluding the initial point. The algorithm to do so is recursive on the previous value of the stock price, for a given path. Here's what I have now:
import numpy as np
import time
N_MC = 1000
n = 10000
S = np.zeros((N_MC, n+1))
S0 = 1.0
S[:, 0] = S0
start_time_normals = time.clock()
Z = np.exp(np.random.normal(size=(N_MC, n)))
print "generate normals time = ", time.clock() - start_time_normals
start_time_prices = time.clock()
for i in xrange(N_MC):
    for j in xrange(1, n+1):
        S[i, j] = S[i, j-1]*Z[i, j-1]
print "prices time = ", time.clock() - start_time_prices
The times were:
generate normals time = 1.07
prices time = 9.98
Is there a much more efficient way to generate the arrays S, perhaps using Numpy's routines? It would be nice if the normal random variables Z could be generated more quickly, too, but I'm not as hopeful.

It's not necessary to loop over 'paths', because they're independent of each other. So, you can remove the outer loop for i in xrange(N_MC) and just operate on entire columns of S and Z.
To accelerate the recursive computation, consider just a single 'path'. Say z is a vector containing the random values at each timestep (all known ahead of time), s is a vector that should contain the output at each timestep, s0 is the initial output at time zero, and j is the time index.
Your code defines the output recursively:
s[j] = s[j-1]*z[j-1]
Let's expand this:
s[1] = s[0]*z[0]
s[2] = s[1]*z[1]
     = s[0]*z[0]*z[1]
s[3] = s[2]*z[2]
     = s[0]*z[0]*z[1]*z[2]
s[4] = s[3]*z[3]
     = s[0]*z[0]*z[1]*z[2]*z[3]
Each output s[j] is given by s[0] times the product of the random values from 0 to j-1. You can calculate cumulative products like this using numpy.cumprod(), which should be much more efficient than looping:
s = np.concatenate(([s0], s0 * np.cumprod(z[0:-1])))
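You can use the axis parameter to operate along one dimension of a matrix (e.g. to do this in parallel across 'paths'). Note that z[0:-1] assumes z has one entry per timestep, i.e. the same length as s; the question's Z has only n columns for n+1 prices, so there the whole row goes into the cumulative product.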
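As a minimal sketch of the axis-based version, assuming Z has shape (N_MC, n) as in the question (the helper name simulate_paths is made up for illustration), the whole double loop can be replaced by one cumulative product along the time axis:

```python
import numpy as np

def simulate_paths(Z, S0=1.0):
    """Return S with S[:, 0] = S0 and S[:, j] = S[:, j-1] * Z[:, j-1]."""
    n_paths, n_steps = Z.shape
    S = np.empty((n_paths, n_steps + 1))
    S[:, 0] = S0
    # The cumulative product along axis 1 (time) replaces the inner loop;
    # every row (path) is processed at once, which replaces the outer loop.
    S[:, 1:] = S0 * np.cumprod(Z, axis=1)
    return S

rng = np.random.default_rng(0)
Z = np.exp(rng.normal(size=(4, 6)))  # small sizes, for illustration only
S = simulate_paths(Z)
```

The result should match the looped version up to floating-point rounding, since both multiply the same factors in the same order.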

Related

How to do sampling based on some conditions in parallel in Python?

Assume I would like to do sampling in parallel based on a condition.
For example, given the matrix A, I want to sample p pairs of indices (i,j) such that A[i][j] != 5
import numpy as np
import random
A = np.random.randint(10, size=(5000, 5000)) # assume this is fixed
p = 400 # sample 400 index
res = set()
cnt = 0
while cnt < p:
    r, c = random.randint(0, A.shape[0]-1), random.randint(0, A.shape[0]-1)
    if A[r, c] != 5 and (r,c) not in res:
        res.add((r,c))
        cnt += 1
Above is my attempt. However, the matrix A and the number of samples p can be very large. Can we do it in parallel, e.g. with joblib or multiprocessing? Or is there any fast way to obtain the rows and columns?
You can use Numba to speed up this code. Numba can generate fast (parallel) functions at runtime using a just-in-time (JIT) compiler. Using a smaller datatype like np.int8 saves some memory and results in faster execution: smaller arrays can be read from and written to RAM faster, and they are more likely to fit in the CPU cache, which speeds up random access. While you can parallelize the random picking, this is quite hard, and creating threads can be more expensive than the actual computation given the chosen parameters. Still, Numba can improve the speed by a large margin just by (mostly) removing the overhead of the Python interpreter.
Here is the resulting code:
# Initial conditions
import numba as nb
import numpy as np
import random
@nb.njit('int8[:,:](int_, int_)', parallel=True)
def genArray(n, m):
    res = np.empty((n, m), dtype=np.int8)
    # Parallel loop
    for i in nb.prange(n):
        for j in range(m):
            res[i, j] = np.random.randint(10)
    return res
p = 400
A = genArray(5000, 5000)
# Actual computing code
@nb.njit('(int8[:,::1], int_)')
def genPosSet(A, p):
    maxi = A.shape[0]-1
    res = set()
    cnt = 0
    while cnt < p:
        r, c = random.randint(0, maxi), random.randint(0, maxi)
        if A[r, c] != 5 and (r,c) not in res:
            res.add((r,c))
            cnt += 1
    return res
res = genPosSet(A, p)
This implementation of genPosSet takes 64 us on my machine while the initial function takes 1350 us. The new implementation is thus 21 times faster.
Note that creating/deleting threads (1 thread per core) and sharing the work between them usually takes from 10 us to 1000 us.
Note that if p is not much smaller than A.size * prob, where prob is the probability of finding a value different from 5, then the current algorithm is not very efficient. In this case, it is better to filter the values that are different from 5 before picking random locations. If p is not much smaller than A.size, then the best solution is to shuffle all the possible locations that can be picked and then take the first p values of the resulting list.
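The "filter first, then pick" idea can be sketched in plain NumPy as follows. This is only an illustration under assumptions: the function name sample_positions and its parameters are made up here, and no timing claim is implied.

```python
import numpy as np

def sample_positions(A, p, excluded=5, seed=None):
    """Pick p distinct (row, col) pairs with A[row, col] != excluded."""
    rng = np.random.default_rng(seed)
    # Flat indices of every admissible cell.
    candidates = np.flatnonzero(A != excluded)
    if p > candidates.size:
        raise ValueError("not enough admissible cells")
    # Sampling without replacement guarantees distinct positions,
    # so no rejection loop or membership test is needed.
    chosen = rng.choice(candidates, size=p, replace=False)
    rows, cols = np.unravel_index(chosen, A.shape)
    return set(zip(rows.tolist(), cols.tolist()))

A = np.random.default_rng(0).integers(10, size=(200, 300))
res = sample_positions(A, 400, seed=1)
```

The cost is dominated by scanning A once, so this pays off mainly when p is large relative to the number of admissible cells; for small p the rejection loop touches far fewer elements.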

How to efficiently find separately for each element N maximum values among multiple matrices?

I am looping through a large number of H x W matrices. I cannot store them all in memory. I need to get N matrices. For example, the element at position (i, j) of the 1st of the N matrices will be the largest among all elements at position (i, j) of all processed matrices. For the second of the N matrices, the second-largest elements will be taken, and so on.
Example.
Let N = 2. Then the 1st matrix will look like this.
And the second matrix is like this.
How to do such an operation inside a loop so as not to store all matrices in memory?
The comments suggested using the np.partition function. I replaced numpy with cupy, which uses the GPU, and also added a buffer so that the partitioning runs less frequently.
import cupy as np
buf = // # As much as fits into the GPU
largests = np.zeros((buf + N, h, w))
for i in range(num):
    val = //
    largests[i % buf] = val
    if i % buf == buf - 1:
        largests.partition(range(buf, buf + N), axis=0)
largests.partition(range(buf, buf + N), axis=0) # Let's not forget the tail
res = largests[:-(N + 1):-1]
The solution does not work very quickly, but I have come to terms with this speed.
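For comparison, here is a minimal unbuffered sketch of the same idea in plain NumPy (not cupy), assuming the matrices arrive one at a time from any iterable. The name running_top_n is hypothetical. It keeps only N + 1 matrices in memory at any moment.

```python
import numpy as np

def running_top_n(matrices, n):
    """Element-wise n largest values over an iterable of equal-shape 2-D arrays."""
    top = None
    for m in matrices:
        m = np.asarray(m, dtype=float)
        if top is None:
            # Start from -inf so that any real value replaces the placeholders.
            top = np.full((n,) + m.shape, -np.inf)
        # Put the newcomer next to the current top-n: shape (n + 1, H, W).
        stacked = np.concatenate((top, m[None]), axis=0)
        # After partitioning at index 1, index 0 holds the smallest value
        # along axis 0, so dropping it keeps the n largest.
        stacked.partition(1, axis=0)
        top = stacked[1:]
    if top is None:
        raise ValueError("no matrices were provided")
    # Sort descending so that result[0] is the largest, result[1] the second, ...
    return np.sort(top, axis=0)[::-1]

rng = np.random.default_rng(0)
stream = (rng.normal(size=(3, 4)) for _ in range(10))  # stand-in for the real loop
res = running_top_n(stream, 2)
```

If fewer than n matrices are processed, the remaining slots stay at -inf. Partitioning on every step is simpler than the buffered version but does more work per matrix, which is the trade-off the buffer addresses.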

How to minimize code when there are lot of lists?

I'm making a code to simulate a Brownian motion.
from random import random
import matplotlib.pyplot as plt
import numpy as np
N=100
p=0.5
l=1
x1=[]
x2=[]
x1.append(0)
x2.append(0)
for i in range(1, N):
    step = -l if random() < p else l
    X1 = x1[i-1] + step
    x1.append(X1)
for i in range(1, N):
    step = -l if random() < p else l
    X2 = x2[i-1] + step
    x2.append(X2)
x1mean=np.array(x1)
x2mean=np.array(x2)
mean=[]
for j in range (0,N):
    mean.append((x1mean[j]+x2mean[j])/2.0)
plt.plot(mean)
plt.plot(x1)
plt.plot(x2)
plt.show()
This code computes the displacement for 2 different particles, but in order to calculate the mean displacement properly, I would need a large number of particles, like 100. As you can see, I'm looking for a way to condense the code, because I cannot repeat the same code 100 times.
Is there a way to write a loop that does all this as a function of 1 variable, i.e. the number of particles?
Thanks.
I can't provide working Python code, because I have never written a single line of Python. But I can give you an idea of how to solve your problem.
Assumptions:
N : Number of Moves
P : Number of Particles
Step 1:
Create a method generating your array/list and returning it. So you can re-use it and avoid copying your code.
def createParticleMotion(N, p, l):
    x1=[]
    x1.append(0)
    for i in range(1, N):
        step = -l if random() < p else l
        X1 = x1[i-1] + step
        x1.append(X1)
    return x1
Step 2:
Create a list of lists; let's call it particleMotions. The list itself holds P lists of your N moves. Fill it in a for loop over your number of particles P by calling the method from the first step and appending the returned list/array to particleMotions.
Maybe the answers to "Python: list of lists" will help you create this.
Step 3:
After you created and filled particleMotions use this list within a double for loop and calculate the mean and store it in a list of means.
mean=[]
for n in range (0,N):
    total=0
    for k in range (0,P):
        total = total + particleMotions[k][n]
    mean.append(total/float(P))
And now you can use another for loop to plot your result.
for particle in range (0,P):
    plt.plot(particleMotions[particle])
So again, don't blame me for syntax errors. I am no Python developer. I just want to give you a way to solve your problem.
This?
from random import random
import matplotlib.pyplot as plt
import numpy as np
N=100
p=0.5
l=1
mydict = {}
for n in range(100):
    mydict[n] = []
    mydict[n].append(0)
    for i in range(1, N):
        step = -l if random() < p else l
        X1 = mydict[n][i-1] + step
        mydict[n].append(X1)
for k,v in mydict.items():
    plt.plot(v)
# mean over all particles at each time step
plt.plot(np.mean(list(mydict.values()), axis=0))
plt.show()
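Since the question already imports NumPy, the per-particle lists can also be replaced by one 2-D array. The following is a sketch under assumptions (the function name random_walks and its parameters are invented for illustration); it keeps the question's rule of stepping -l with probability p and +l otherwise.

```python
import numpy as np

def random_walks(num_particles, num_moves, p=0.5, l=1, seed=None):
    """Return an array of shape (num_particles, num_moves); each row is one walk from 0."""
    rng = np.random.default_rng(seed)
    # -l with probability p, +l otherwise, drawn for all particles at once.
    steps = np.where(rng.random((num_particles, num_moves - 1)) < p, -l, l)
    paths = np.zeros((num_particles, num_moves), dtype=steps.dtype)
    # Each position is the running sum of the steps taken so far.
    paths[:, 1:] = np.cumsum(steps, axis=1)
    return paths

paths = random_walks(100, 100, seed=0)
mean_path = paths.mean(axis=0)  # mean displacement at each time step
```

With matplotlib, plt.plot(paths.T) draws every particle in one call and plt.plot(mean_path) adds the mean, so the number of particles becomes the single variable the question asks for.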

Fast algorithm to compute Adamic-Adar

I'm working on graph analysis. I want to compute an N by N similarity matrix that contains the Adamic Adar similarity between every two vertices. To give an overview of Adamic Adar let me start with this introduction:
Let A be the adjacency matrix of an undirected graph G. CN is the set of all common neighbors of two vertices x, y. A common neighbor of two vertices is a node to which both vertices have an edge/link, i.e. both vertices have a 1 in the column of that node in A. k_n is the degree of node n.
Adamic-Adar is defined as the following:
$$AA(x, y) = \sum_{n \in CN(x, y)} \frac{1}{\log k_n}$$
My attempt to compute it is to fetch the rows of the x and y nodes from A and sum them, then look for the elements that have 2 as the value, get their degrees and apply the equation. However, computing that takes a really long time. I tried it with a graph that contains 1032 vertices, and after 7 minutes I cancelled the computation. So my question: is there a better algorithm to compute it?
Here's my code in python:
def aa(graph):
    """
    Calculates the Adamic-Adar index.
    """
    N = graph.num_vertices()
    A = gts.adjacency(graph)
    S = np.zeros((N,N))
    degrees = get_degrees_dic(graph)
    for i in xrange(N):
        A_i = A[i]
        for j in xrange(N):
            if j != i:
                A_j = A[j]
                intersection = A_i + A_j
                common_ns_degs = list()
                for index in xrange(N):
                    if intersection[index] == 2:
                        cn_deg = degrees[index]
                        common_ns_degs.append(1.0/np.log10(cn_deg))
                S[i,j] = np.sum(common_ns_degs)
    return S
Since you're using numpy, you can really cut down on your need to iterate for every operation in the algorithm. My numpy- and vectorization-fu aren't the greatest, but the code below runs in around 2.5 s on a graph with ~13,000 nodes:
def adar_adamic(adj_mat):
    """Computes Adar-Adamic similarity matrix for an adjacency matrix"""
    Adar_Adamic = np.zeros(adj_mat.shape)
    for row in adj_mat:
        AdjList = row.nonzero()[0] #column indices with nonzero values
        k_deg = len(AdjList)
        if k_deg < 2:
            continue # no pair of neighbors to update; also avoids log(1) = 0
        d = 1.0/np.log(k_deg) # this node's AA contribution, 1/log(degree)
        #add this node's score to every pair of its neighbors
        for i in xrange(len(AdjList)):
            for j in xrange(len(AdjList)):
                if AdjList[i] != AdjList[j]:
                    cell = (AdjList[i],AdjList[j])
                    Adar_Adamic[cell] = Adar_Adamic[cell] + d
    return Adar_Adamic
Unlike MBo's answer, this does build the full, symmetric matrix, but the inefficiency (for me) was tolerable, given the execution time.
I believe you are using a rather slow approach. It would be better to invert it:
- initialize AA (Adamic-Adar) matrix by zeros
- for every node k get its degree k_deg
- calc d = 1.0/log(k_deg) (why log10 - is it important or not?)
- add d to all AAij, where i,j - all pairs of 1s in kth row of adjacency matrix
Edit:
- for sparse graphs it is useful to extract positions of all 1s in kth row to the list to reach O(V*(V+E)) complexity instead of O(V^3)
AA = np.zeros((N,N))
for k in range(N):
    # positions of all 1s in the k-th row
    AdjList = [j for j in range(N) if A[k, j] == 1]
    k_deg = len(AdjList)
    if k_deg < 2:
        continue # no pairs to update; also avoids log(1) = 0
    d = 1.0/np.log(k_deg)
    for j in range(k_deg - 1):
        for i in range(j + 1, k_deg):
            AA[AdjList[i], AdjList[j]] = AA[AdjList[i], AdjList[j]] + d
# half of matrix filled, it is symmetric for undirected graph
I don't see a way of reducing the time complexity, but it can be vectorized:
degrees = A.sum(axis=0)
weights = np.zeros(degrees.shape)
weights[degrees > 1] = 1.0/np.log10(degrees[degrees > 1]) # degree-1 nodes are never common neighbors
adamic_adar = (A*weights).dot(A.T)
Here A is a regular Numpy array. It seems you're using graph_tool.spectral.adjacency and thus A would be a sparse matrix. In that case the code would be:
from scipy.sparse import csr_matrix
degrees = np.asarray(A.sum(axis=0)).ravel()
weights = np.zeros(degrees.shape)
weights[degrees > 1] = 1.0/np.log10(degrees[degrees > 1])
adamic_adar = A.multiply(csr_matrix(weights)) * A.T
This is much faster than using Python loops. A small warning though: with this approach you really need to make sure that the values on the main diagonal (of A and adamic_adar) are what you expect them to be. Also, A must not contain weights, but only zeros and ones.
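To make the vectorized form concrete, here is a small self-contained check on a toy graph. It is a sketch under assumptions: weights are taken as 1/log(k_n) with the natural logarithm (the question uses log10, which only rescales the result), the diagonal is zeroed explicitly, and both function names are invented for illustration.

```python
import numpy as np

def adamic_adar_dense(A):
    """Vectorized Adamic-Adar for a dense 0/1 symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=0)
    weights = np.zeros_like(degrees)
    mask = degrees > 1  # a common neighbor always has degree >= 2
    weights[mask] = 1.0 / np.log(degrees[mask])
    # S[x, y] = sum over n of A[x, n] * weights[n] * A[y, n]
    S = (A * weights).dot(A.T)
    np.fill_diagonal(S, 0.0)  # self-similarity is left at zero here
    return S

def adamic_adar_naive(A):
    """Direct translation of the definition, for cross-checking only."""
    A = np.asarray(A)
    n = A.shape[0]
    degrees = A.sum(axis=0)
    S = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x != y:
                common = np.flatnonzero((A[x] == 1) & (A[y] == 1))
                S[x, y] = sum(1.0 / np.log(degrees[k]) for k in common)
    return S

# Toy graph with edges 0-1, 0-2, 1-2, 2-3.
A = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1
S = adamic_adar_dense(A)
```

In this graph, nodes 0 and 1 share only node 2, which has degree 3, so S[0, 1] comes out as 1/log(3). Nodes 2 and 3 share no neighbor and score 0.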
I believe there must be a function in python_igraph, like the one defined in R igraph, for node similarity (including Adamic-Adar).

Not sure how to integrate negative number function in data generating algorithm?

I’m having a bit of trouble controlling the results from a data generating algorithm I am working on. Basically it takes values from a list and then lists all the different combinations that reach a specific sum. So far the code works fine (I haven’t tested scaling it with many variables yet), but I need to allow negative numbers to be included in the list.
The way I think I can solve this problem is to put a cap on the possible results to prevent infinitely many results (if apples is 2 and oranges is -1, then for any sum there are infinitely many solutions, but if I put a limit on either, it cannot go on forever).
So here's super basic code that computes the maximum weights:
import math
data = [-2, 10,5,50,20,25,40]
target_sum = 100
max_percent = .8 #no value can exceed 80% of total (this is to prevent infinite solutions)
for node in data:
    max_value = abs(math.floor((target_sum * max_percent)/node))
    print node, "'s max value is ", max_value
Here's the code that generates the results (the first function generates a table if it's possible and the second function composes the actual results. Details/pseudocode of the algo are here: Can brute force algorithms scale?):
from collections import defaultdict
data = [-2, 10,5,50,20,25,40]
target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i]
T = defaultdict(bool) # all values are False by default
T[0, 0] = True # base case
for i, x in enumerate(data): # i is index, x is data[i]
    for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself
        for c in range(s // x + 1):
            if T[s - c * x, i]:
                T[s, i+1] = True
coeff = [0]*len(data)
def RecursivelyListAllThatWork(k, sum): # Using first k variables, make sum
    # /* Base case: If we've assigned all the variables correctly, list this
    # * solution.
    # */
    if k == 0:
        # print what we have so far
        print(' + '.join("%2s*%s" % t for t in zip(coeff, data)))
        return
    x_k = data[k-1]
    # /* Recursive step: Try all coefficients, but only if they work. */
    for c in range(sum // x_k + 1):
        if T[sum - c * x_k, k - 1]:
            # mark the coefficient of x_k to be c
            coeff[k-1] = c
            RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            # unmark the coefficient of x_k
            coeff[k-1] = 0
RecursivelyListAllThatWork(len(data), target_sum)
My problem is, I don't know where/how to integrate my limiting code into the main code in order to restrict results and allow for negative numbers. When I add a negative number to the list, it displays it but does not include it in the output. I think this is due to it not being added to the table (first function), and I'm not sure how to have it added (and still keep the program's structure so I can scale it with more variables).
Thanks in advance and if anything is unclear please let me know.
edit: a bit unrelated (and if it detracts from the question just ignore it), but since you're looking at the code already, is there a way I can utilize both CPUs on my machine with this code? Right now when I run it, it only uses one CPU. I know the technical method of parallel computing in Python but not sure how to logically parallelize this algo.
You can restrict results by changing both loops over c from
for c in range(s / x + 1):
to
max_value = int(abs((target_sum * max_percent)/x))
for c in range(max_value + 1):
This will ensure that any coefficient in the final answer will be an integer in the range 0 to max_value inclusive.
A simple way of adding negative values is to change the loop over s from
for s in range(target_sum + 1):
to
R=200 # Maximum size of any partial sum
for s in range(-R,R+1):
Note that if you do it this way then your solution will have an additional constraint.
The new constraint is that the absolute value of every partial weighted sum must be <=R.
(You can make R large to avoid this constraint reducing the number of solutions, but this will slow down execution.)
The complete code looks like:
from collections import defaultdict
data = [-2,10,5,50,20,25,40]
target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i]
T = defaultdict(bool) # all values are False by default
T[0, 0] = True # base case
R=200 # Maximum size of any partial sum
max_percent=0.8 # Maximum weight of any term
for i, x in enumerate(data): # i is index, x is data[i]
    for s in range(-R,R+1): # every partial sum from -R to R inclusive
        max_value = int(abs((target_sum * max_percent)/x))
        for c in range(max_value + 1):
            if T[s - c * x, i]:
                T[s, i+1] = True
coeff = [0]*len(data)
def RecursivelyListAllThatWork(k, sum): # Using first k variables, make sum
    # /* Base case: If we've assigned all the variables correctly, list this
    # * solution.
    # */
    if k == 0:
        # print what we have so far
        print(' + '.join("%2s*%s" % t for t in zip(coeff, data)))
        return
    x_k = data[k-1]
    # /* Recursive step: Try all coefficients, but only if they work. */
    max_value = int(abs((target_sum * max_percent)/x_k))
    for c in range(max_value + 1):
        if T[sum - c * x_k, k - 1]:
            # mark the coefficient of x_k to be c
            coeff[k-1] = c
            RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            # unmark the coefficient of x_k
            coeff[k-1] = 0
RecursivelyListAllThatWork(len(data), target_sum)
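For small inputs, a brute-force enumeration can serve as a cross-check of the coefficient cap. The sketch below is not the table-based method above: it simply tries every coefficient tuple within the caps, so it grows exponentially with the number of values and does not apply the partial-sum bound R. The helper names coefficient_caps and bounded_combinations are made up for illustration.

```python
from itertools import product

def coefficient_caps(data, target_sum, max_percent):
    """Largest allowed coefficient per value: int(abs(target_sum * max_percent / x))."""
    return [int(abs((target_sum * max_percent) / x)) for x in data]

def bounded_combinations(data, target_sum, max_percent=0.8):
    """All coefficient tuples within the caps whose weighted sum hits target_sum."""
    caps = coefficient_caps(data, target_sum, max_percent)
    ranges = [range(cap + 1) for cap in caps]
    return [coeffs for coeffs in product(*ranges)
            if sum(c * x for c, x in zip(coeffs, data)) == target_sum]

# Small example with one negative value.
for coeffs in bounded_combinations([-2, 10, 5], 20):
    print(' + '.join("%2s*%s" % t for t in zip(coeffs, [-2, 10, 5])))
```

Because it has no bound on partial sums, this enumeration can list solutions that the table-based code misses when R is chosen too small, which makes it a useful sanity check while tuning R.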
