I want to store certain values in a 2D array. In the code below, I want every value of sT to be stored in total: as the inner loop runs, the values should fill successive rows, and each increment of the outer loop should move on to the next column.
class pricing_lookback:
    def __init__(self,spot,rate,sigma,time,sims,steps):
        self.spot = spot
        self.rate = rate
        self.sigma = sigma
        self.time = time
        self.sims = sims
        self.steps = steps
        self.dt = self.time/self.steps
    def call_floatingstrike(self):
        simulationS = np.array([])
        simulationSt = np.array([])
        call2 = np.array([])
        total = np.empty(shape=[self.steps, self.sims])
        for j in range(self.sims):
            sT = self.spot
            pathwiseminS = np.array([])
            for i in range(self.steps):
                phi= np.random.normal()
                sT *= np.exp((self.rate-0.5*self.sigma*self.sigma)*self.dt + self.sigma*phi*np.sqrt(self.dt))
                pathwiseminS = np.append(pathwiseminS, sT)
                np.append(total,[[j,sT]])###This should store values in rows of j column
            #print (pathwiseminS)
            #tst1 = np.append(tst1, pathwiseminS[1])
            call2 = np.append(call2, max(pathwiseminS[self.steps-1]-self.spot,0))
            #print (pathwiseminS[self.steps-1])
            #print(call2)
            simulationSt = np.append(simulationSt,pathwiseminS[self.steps-1])
            simulationS = np.append(simulationS,min(pathwiseminS))
        call = max(np.average(simulationSt) - np.average(simulationS),0)
        return call, total#,call2,
Here is a simple example of what I think you are trying to do:
for i in range(5):
    row = np.random.rand(5,)
    if i == 0:
        my_array = row
    else:
        my_array = np.vstack((my_array, row))
    print(row)
However, this is not very efficient with memory, especially if you are dealing with large arrays, as this has to allocate new memory on every loop. It would be much better to preallocate an empty array and then populate it if possible.
To answer the question of how to append a column, it would be something like this:
import numpy as np
x = np.random.rand(5, 4)
column_to_append = np.random.rand(5,)
x = np.insert(x, x.shape[1], column_to_append, axis=1)  # np.insert returns a new array, so keep the result
Again, this is not memory efficient and should be avoided whenever possible. Preallocation is much better.
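As a rough illustration of the preallocation idea applied to the question's loop, here is a minimal sketch. The function name simulate_paths, the seed argument and the sample parameter values are assumptions made for this example, not part of the original class. The point is only that total[i, j] = sT writes into the array in place, whereas np.append returns a copy that the original code throws away.

```python
import numpy as np

def simulate_paths(spot, rate, sigma, time, sims, steps, seed=0):
    """Return a (steps, sims) array; column j holds the j-th simulated path."""
    rng = np.random.default_rng(seed)  # seeded only to make the sketch repeatable
    dt = time / steps
    total = np.empty((steps, sims))    # allocated once, filled in place
    for j in range(sims):
        sT = spot
        for i in range(steps):
            phi = rng.standard_normal()
            sT *= np.exp((rate - 0.5 * sigma**2) * dt + sigma * phi * np.sqrt(dt))
            total[i, j] = sT           # row i of column j
    return total

# Hypothetical parameter values, chosen only for the demonstration
total = simulate_paths(spot=100.0, rate=0.05, sigma=0.2, time=1.0, sims=4, steps=10)
print(total.shape)
```

With the paths stored this way, the per-path minimum and the terminal value should be available as total.min(axis=0) and total[-1], without keeping separate arrays.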
I'm using NumPy to store data into matrices.
I'm struggling to make the below Python code perform better.
RESULT is the data store I want to put the data into.
TMP = np.array([[1,1,0],[0,0,1],[1,0,0],[0,1,1]])
n_row, n_col = TMP.shape[0], TMP.shape[0]
RESULT = np.zeros((n_row, n_col))
def do_something(array1, array2):
    intersect_num = np.bitwise_and(array1, array2).sum()
    union_num = np.bitwise_or(array1, array2).sum()
    try:
        return intersect_num / float(union_num)
    except ZeroDivisionError:
        return 0
for i in range(n_row):
    for j in range(n_col):
        if i >= j:
            continue
        RESULT[i, j] = do_something(TMP[i], TMP[j])
I guess it would be much faster if I could use some NumPy built-in function instead of for-loops.
I was looking for the various questions around here, but I couldn't find the best fit for my problem.
Any suggestion? Thanks in advance!
Approach #1
You could do something like this as a vectorized solution -
# Store number of rows in TMP as a parameter
N = TMP.shape[0]
# Get the indices that would be used as row indices to select rows off TMP and
# also as row,column indices for setting output array. These basically correspond
# to the iterators involved in the loopy implementation
R,C = np.triu_indices(N,1)
# Calculate intersect_num, union_num and division results across all iterations
I = np.bitwise_and(TMP[R],TMP[C]).sum(-1)
U = np.bitwise_or(TMP[R],TMP[C]).sum(-1)
vals = np.true_divide(I,U)
# Setup output array and assign vals into it
out = np.zeros((N, N))
out[R,C] = vals
Approach #2
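When TMP holds only 1s and 0s, the np.bitwise_and and np.bitwise_or calls can be replaced with dot products, which could be faster. The intersection count for two rows is their dot product, and the union count is the number of columns M minus the number of columns where both rows are 0. With those, we would have an implementation like so -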
M = TMP.shape[1]
I = TMP.dot(TMP.T)
TMP_inv = 1-TMP
U = M - TMP_inv.dot(TMP_inv.T)
out = np.triu(np.true_divide(I,U),1)
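A quick way to gain confidence in both approaches is to compare them against the explicit double loop on the sample TMP from the question. The sketch below does that. It assumes no row of TMP is all zeros, since an empty union would mean a division by zero in the vectorized versions.

```python
import numpy as np

TMP = np.array([[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]])
N, M = TMP.shape

# Reference: explicit loop over the strict upper triangle
ref = np.zeros((N, N))
for i in range(N):
    for j in range(i + 1, N):
        inter = np.bitwise_and(TMP[i], TMP[j]).sum()
        union = np.bitwise_or(TMP[i], TMP[j]).sum()
        ref[i, j] = inter / union

# Approach #1: index pairs from np.triu_indices
R, C = np.triu_indices(N, 1)
out1 = np.zeros((N, N))
out1[R, C] = np.bitwise_and(TMP[R], TMP[C]).sum(-1) / np.bitwise_or(TMP[R], TMP[C]).sum(-1)

# Approach #2: dot products, valid because TMP holds only 0s and 1s
I = TMP.dot(TMP.T)
U = M - (1 - TMP).dot((1 - TMP).T)
out2 = np.triu(I / U, 1)

print(np.allclose(ref, out1), np.allclose(ref, out2))
```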
I have a CSV that looks like this:
0.500187550,CPU1,7.93
0.500187550,CPU2,1.62
0.500187550,CPU3,7.93
0.500187550,CPU4,1.62
1.000445359,CPU1,9.96
1.000445359,CPU2,1.61
1.000445359,CPU3,9.96
1.000445359,CPU4,1.61
1.500674877,CPU1,9.94
1.500674877,CPU2,1.61
1.500674877,CPU3,9.94
1.500674877,CPU4,1.61
The first column is time, the second the CPU used and the third is energy.
As a final result I would like to have these arrays:
Time:
[0.500187550, 1.000445359, 1.500674877]
Energy (per CPU): e.g. CPU1
[7.93, 9.96, 9.94]
For parsing the CSV I'm using:
query = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
#Arrays global time and power:
for row in query:
    x = row[0]
    x = float(x)
    x_array.append(x) #column 0 to array
    y = row[2]
    y = float(y)
    y_array.append(y) #column 2 to array
print x_array
print y_array
This way I get all the time and energy data into two arrays: x_array and y_array.
Then I order the arrays:
energy_core_ord_array = []
time_ord_array = []
#Dividing array into energy and time per core:
for i in range(number_cores[0]):
    e = 0 + i
    for j in range(len(x_array)/(int(number_cores[0]))):
        time_ord = x_array[e]
        time_ord_array.append(time_ord)
        energy_core_ord = y_array[e]
        energy_core_ord_array.append(energy_core_ord)
        e = e + int(number_cores[0])
And lastly, I cut the time array down to the length it should have:
final_time_ord_array = []
for i in range(len(x_array)/(int(number_cores[0]))):
    final_time_ord = time_ord_array[i]
    final_time_ord_array.append(final_time_ord)
Till here, although the code is not elegant, it works.
The problem comes when I try to get the array for each core.
I get it for the first core, but I don't know how to iterate on to the next one, or how to store each array in a variable with its own name.
final_energy_core_ord_array = []
#Trunk energy core array:
for i in range(len(x_array)/(int(number_cores[0]))):
    final_energy_core_ord = energy_core_ord_array[i]
    final_energy_core_ord_array.append(final_energy_core_ord)
So using Pandas (library to handle dataframes in Python) you can do something like this, which is much quicker than trying to process the CSV manually like you're doing:
import pandas as pd
csvfile = "C:/Users/Simon/Desktop/test.csv"
data = pd.read_csv(csvfile, header=None, names=['time','cpu','energy'])
times = list(pd.unique(data.time.ravel()))
print times
cpuList = data.groupby(['cpu'])
cpuEnergy = {}
for i in range(len(cpuList)):
    curCPU = 'CPU' + str(i+1)
    cpuEnergy[curCPU] = list(cpuList.get_group('CPU' + str(i+1))['energy'])
for k, v in cpuEnergy.items():
    print k, v
That will give the following as output:
[0.50018755000000004, 1.000445359, 1.5006748769999998]
CPU4 [1.6200000000000001, 1.6100000000000001, 1.6100000000000001]
CPU2 [1.6200000000000001, 1.6100000000000001, 1.6100000000000001]
CPU3 [7.9299999999999997, 9.9600000000000009, 9.9399999999999995]
CPU1 [7.9299999999999997, 9.9600000000000009, 9.9399999999999995]
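For readers on Python 3 and a recent pandas, the same grouping might be written as below. This is a sketch rather than a drop-in replacement: io.StringIO stands in for the real file, the sample rows are the ones from the question, and the names cpu_energy and csv_text are made up for the example. Iterating over the groupby avoids having to rebuild the 'CPU1', 'CPU2', ... labels by hand.

```python
import io
import pandas as pd

# Sample rows copied from the question; StringIO is a stand-in for the CSV file
csv_text = """0.500187550,CPU1,7.93
0.500187550,CPU2,1.62
0.500187550,CPU3,7.93
0.500187550,CPU4,1.62
1.000445359,CPU1,9.96
1.000445359,CPU2,1.61
1.000445359,CPU3,9.96
1.000445359,CPU4,1.61
1.500674877,CPU1,9.94
1.500674877,CPU2,1.61
1.500674877,CPU3,9.94
1.500674877,CPU4,1.61
"""

data = pd.read_csv(io.StringIO(csv_text), header=None, names=["time", "cpu", "energy"])

# Unique timestamps, in order of first appearance
times = data["time"].unique().tolist()

# One list of energy readings per CPU label
cpu_energy = {cpu: group["energy"].tolist() for cpu, group in data.groupby("cpu")}

print(times)
for cpu, values in cpu_energy.items():
    print(cpu, values)
```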
Finally I got the answer using globals. It is not a great idea, but it works, so I'll leave it here in case someone finds it useful.
final_energy_core_ord_array = []
#Trunk energy core array:
a = 0
for j in range(number_cores[0]):
    for i in range(len(x_array)/(int(number_cores[0]))):
        final_energy_core_ord = energy_core_ord_array[a + i]
        final_energy_core_ord_array.append(final_energy_core_ord)
    globals()['core%s' % j] = final_energy_core_ord_array
    final_energy_core_ord_array = []
    a = a + 12
print 'Final time and cores:'
print final_time_ord_array
for j in range(number_cores[0]):
    print globals()['core%s' % j]
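A dictionary keyed by the CPU label gives the same "one array per core" result without touching globals(). The sketch below uses only the standard library and Python 3. The names energy_by_cpu and csv_text are illustrative, the sample is a shortened copy of the question's data, and the code assumes that rows sharing a timestamp sit next to each other in the file.

```python
import csv
import io
from collections import defaultdict

# Shortened sample from the question; StringIO stands in for the opened CSV file
csv_text = """0.500187550,CPU1,7.93
0.500187550,CPU2,1.62
1.000445359,CPU1,9.96
1.000445359,CPU2,1.61
1.500674877,CPU1,9.94
1.500674877,CPU2,1.61
"""

times = []
energy_by_cpu = defaultdict(list)  # maps 'CPU1' -> [energy, energy, ...]

for time_str, cpu, energy_str in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
    t = float(time_str)
    if not times or times[-1] != t:  # record each timestamp once (assumes contiguous rows)
        times.append(t)
    energy_by_cpu[cpu].append(float(energy_str))

print(times)
for cpu in sorted(energy_by_cpu):
    print(cpu, energy_by_cpu[cpu])
```

Looking up energy_by_cpu['CPU1'] then plays the role that the generated core0 variable plays in the globals() version.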
I have a list of 100k items and each item has a list of indices. I am trying to put this into a boolean sparse matrix for vector multiplication. My code isn't running as fast as I would like, so I am looking for performance tips or maybe alternative approaches for getting this data into a matrix.
rows = []
cols = []
for i, item in enumerate(items):
    indices = item.getIndices()
    rows += [i]*len(indices)
    cols += indices
data = np.ones(len(rows), dtype='?')
mat = coo_matrix((data,(rows,cols)),shape=(len(items),totalIndices),dtype='?')
mat = mat.tocsr()
There end up being 800k entries in the rows/cols lists, and just extending those lists seems to take up 16% and 13% of the building time. Converting to the coo_matrix then takes up 12%. Enumeration takes up 13%. I got these stats from line_profiler and I am using Python 3.3.
The best I can do is:
def foo3(items,totalIndices):
    N = len(items)
    cols=[]
    cnts=[]
    for item in items:
        indices = getIndices(item)
        cols += indices
        cnts.append(len(indices))
    rows = np.arange(N).repeat(cnts) # main change
    data = np.ones(rows.shape, dtype=bool)
    mat = sparse.coo_matrix((data,(rows,cols)),shape=(N,totalIndices))
    mat = mat.tocsr()
    return mat
For 100000 items it's only a 50% increase in speed.
A lot of sparse matrix algorithms run twice through the data, once to figure out the size of the sparse matrix, the other to fill it in with the right values. So perhaps it is worth trying something like this:
total_len = 0
for item in items:
    total_len += len(item.getIndices())
rows = np.empty((total_len,), dtype=np.int32)
cols = np.empty((total_len,), dtype=np.int32)
total_len = 0
for i, item in enumerate(items):
    indices = item.getIndices()
    len_ = len(indices)
    rows[total_len:total_len + len_] = i
    cols[total_len:total_len + len_] = indices
    total_len += len_
Then continue with the same steps you are currently doing. You can also build the CSR matrix directly, avoiding the COO one, which will save some time as well. After the first pass to find out the total size you would do:
indptr = np.empty((len(items) + 1,), dtype=np.int32)
indptr[0] = 0
indices = np.empty((total_len,), dtype=np.int32)
for i, item in enumerate(items):
    item_indices = item.getIndices()
    len_ = len(item_indices)
    indptr[i+1] = indptr[i] + len_
    indices[indptr[i]:indptr[i+1]] = item_indices
data = np.ones((total_len,), dtype=bool)
mat = csr_matrix((data, indices, indptr))
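To make the direct CSR construction concrete, here is a small self-contained sketch. The list index_lists is a hypothetical stand-in for what item.getIndices() would return for each item, and n_cols plays the role of totalIndices. Passing shape explicitly is a choice made here so that trailing empty columns are kept. Without it, the column count would be inferred from the largest index present.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for [item.getIndices() for item in items]
index_lists = [[0, 3], [], [1, 2, 4], [4]]
n_cols = 5  # stand-in for totalIndices

# First pass: row lengths give the CSR row pointer directly
counts = np.array([len(ix) for ix in index_lists])
indptr = np.zeros(len(index_lists) + 1, dtype=np.int32)
indptr[1:] = np.cumsum(counts)

# Second pass: copy each row's column indices into its slot
indices = np.empty(indptr[-1], dtype=np.int32)
for i, ix in enumerate(index_lists):
    indices[indptr[i]:indptr[i + 1]] = ix

data = np.ones(indptr[-1], dtype=bool)
mat = csr_matrix((data, indices, indptr), shape=(len(index_lists), n_cols))

print(mat.toarray().astype(int))
```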
I'm trying to optimize the following code, potentially by rewriting it in Cython: it simply takes a low-dimensional but relatively long numpy array, looks into its columns for 0 values, and marks those as -1 in a result array. The code is:
import numpy as np
def get_data():
    data = np.array([[1,5,1]] * 5000 + [[1,0,5]] * 5000 + [[0,0,0]] * 5000)
    return data
def get_cols(K):
    cols = np.array([2] * K)
    return cols
def test_nonzero(data):
    K = len(data)
    result = np.array([1] * K)
    # Index into columns of data
    cols = get_cols(K)
    # Mark zero points with -1
    idx = np.nonzero(data[np.arange(K), cols] == 0)[0]
    result[idx] = -1
import time
t_start = time.time()
data = get_data()
for n in range(5000):
    test_nonzero(data)
t_end = time.time()
print (t_end - t_start)
data is the data. cols is the array of columns of data to look for non-zero values (for simplicity, I made it all the same column). The goal is to compute a numpy array, result, which has a 1 value for each row where the column of interest is non-zero, and -1 for the rows where the corresponding columns of interest have a zero.
Running this function 5000 times on a not-so-large array of 15,000 rows by 3 columns takes about 20 seconds. Is there a way this can be sped up? It appears that most of the work goes into finding the nonzero elements and retrieving them with indices (the call to nonzero and subsequent use of its index.) Can this be optimized or is this the best that can be done?
How could a Cython implementation gain speed on this?
cols = np.array([2] * K)
That's going to be really slow. It creates a very large Python list and then converts it into a numpy array. Instead, do something like:
cols = np.ones(K, int)*2
That'll be way faster
result = np.array([1] * K)
Here you should do:
result = np.ones(K, int)
That will produce the numpy array directly.
idx = np.nonzero(data[np.arange(K), cols] == 0)[0]
result[idx] = -1
The cols is an array, but you can just pass a 2. Furthermore, using nonzero adds an extra step.
idx = data[np.arange(K), 2] == 0
result[idx] = -1
Should have the same effect.
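Going one step further, np.where can build result in a single expression, which removes the separate allocation and assignment. This is a sketch on a shortened version of the question's data, not something taken from the answer above. The per-row cols array in the second half is a made-up example of the general case the question describes.

```python
import numpy as np

# Shortened version of the question's data: 3 rows of each kind instead of 5000
data = np.array([[1, 5, 1]] * 3 + [[1, 0, 5]] * 3 + [[0, 0, 0]] * 3)

# Single column of interest: a plain column slice is enough
result = np.where(data[:, 2] == 0, -1, 1)

# General case: a different column per row (hypothetical choice of columns)
cols = np.array([2, 1, 0, 2, 1, 0, 2, 1, 0])
picked = data[np.arange(len(data)), cols]
result_per_row = np.where(picked == 0, -1, 1)

print(result)
print(result_per_row)
```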
data is a matrix containing 2500 time series of a measurement. I need to average each time series over time, discarding data points that were recorded around a spike (in the interval tspike-10*dt ... tspike+10*dt). The number of spike times is variable for each neuron and stored in a dictionary with 2500 entries. My current code iterates over neurons and spike times and sets the masked values to NaN. Then bottleneck.nanmean() is called. However, this code is too slow in its current version, and I am wondering whether there is a faster solution. Thanks!
import bottleneck
import numpy as np
from numpy.random import rand, randint
t = 1
dt = 1e-4
N = 2500
dtbin = 10*dt
data = np.ones((N, int(round(t/dt))), dtype=np.float32)  # the shape must be an integer
times = np.arange(0,t,dt)
spiketimes = dict.fromkeys(np.arange(N))
for key in spiketimes:
    spiketimes[key] = rand(randint(100))
means = np.empty(N)
for i in range(N):
    spike_times = spiketimes[i]
    datarow = data[i]
    if len(spike_times) > 0:
        for spike_time in spike_times:
            start=max(spike_time-dtbin,0)
            end=min(spike_time+dtbin,t)
            idx = np.all([times>=start,times<=end],0)
            datarow[idx] = np.nan
    means[i] = bottleneck.nanmean(datarow)
The vast majority of the processing time in your code comes from this line:
idx = np.all([times>=start,times<=end],0)
This is because for each spike, you are comparing every value in times against start and end. Since you have uniform time steps in this example (and I presume this is true in your data as well), it is much faster to simply compute the start and end indexes:
# This replaces the last loop in your example:
for i in range(N):
    spike_times = spiketimes[i]
    datarow = data[i]
    if len(spike_times) > 0:
        for spike_time in spike_times:
            start=max(spike_time-dtbin,0)
            end=min(spike_time+dtbin,t)
            #idx = np.all([times>=start,times<=end],0)
            #datarow[idx] = np.nan
            datarow[int(start/dt):int(end/dt)] = np.nan
    ## replaced this with equivalent for testing
    means[i] = datarow[~np.isnan(datarow)].mean()
This reduces the run time for me from ~100s to ~1.5s.
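One caveat: truncating with int(start/dt) and using end as an exclusive slice bound may select a window that differs by a sample at the edges from the original times>=start, times<=end comparison. If the exact window matters, np.searchsorted on the sorted times array is one alternative that keeps the slice-based speed. It is a different technique from the one above, sketched here with made-up spike times:

```python
import numpy as np

t, dt = 1.0, 1e-4
dtbin = 10 * dt
times = np.arange(0, t, dt)                      # sorted, uniform time grid
spike_times = np.array([0.0005, 0.25, 0.9995])   # hypothetical spike times

mask_slices = np.zeros(times.size, dtype=bool)
mask_compare = np.zeros(times.size, dtype=bool)
for spike_time in spike_times:
    start = max(spike_time - dtbin, 0)
    end = min(spike_time + dtbin, t)
    lo = np.searchsorted(times, start, side="left")   # first index with times >= start
    hi = np.searchsorted(times, end, side="right")    # first index with times > end
    mask_slices[lo:hi] = True
    # Original full-array comparison, kept only to check the result
    mask_compare |= (times >= start) & (times <= end)

print(np.array_equal(mask_slices, mask_compare))
```

Each searchsorted call is a binary search, so the cost per spike stays small compared with comparing every entry of times.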
You can also shave off a bit more time by vectorizing the loop over spike_times. The effect of this will depend on the characteristics of your data (should be most effective for high spike rates):
kernel = np.ones(20, dtype=bool)
for i in range(N):
    spike_times = spiketimes[i]
    datarow = data[i]
    mask = np.zeros(len(datarow), dtype=bool)
    indexes = (spike_times / dt).astype(int)
    mask[indexes] = True
    mask = np.convolve(mask, kernel)[10:-9]
    means[i] = datarow[~mask].mean()
Instead of using nanmean you could just index the values you need and use mean.
means[i] = data[ (times<start) | (times>end) ].mean()
If I misunderstood and you do need your indexing, you might try
means[i] = data[np.logical_not( np.all([times>=start,times<=end],0) )].mean()
Also, you probably do not need the if len(spike_times) > 0 check: for spike_time in spike_times on its own is enough, because the loop body simply does not run when the array is empty.