I'm a beginner. I recently came across the Mandelbrot set, which is fantastic, so I decided to draw it with Python.
But there is a problem: I get a MemoryError when I run this code.
The statement num_set = gen_num_set(10000) produces a very large list of about 20000*20000*4 = 1,600,000,000 complex numbers. When I use 1000 instead of 10000, the code runs successfully.
My computer has 4 GB of memory and the operating system is Windows 7 32-bit. I want to know whether this problem is a limit of my computer or whether there is some way to optimize my code.
Thanks.
#!/usr/bin/env python3.4
import matplotlib.pyplot as plt
import numpy as np
import random,time
from multiprocessing import *
def first_quadrant(n):
    start_point = 1 / n
    n = 2 * n
    return gen_complex_num(start_point, n, 1)

def second_quadrant(n):
    start_point = 1 / n
    n = 2 * n
    return gen_complex_num(start_point, n, 2)

def third_quadrant(n):
    start_point = 1 / n
    n = 2 * n
    return gen_complex_num(start_point, n, 3)

def four_quadrant(n):
    start_point = 1 / n
    n = 2 * n
    return gen_complex_num(start_point, n, 4)
def gen_complex_num(start_point, n, quadrant):
    complex_num = []
    if quadrant == 1:
        for i in range(n):
            real = i * start_point
            for j in range(n):
                imag = j * start_point
                complex_num.append(complex(real, imag))
        return complex_num
    elif quadrant == 2:
        for i in range(n):
            real = i * start_point * (-1)
            for j in range(n):
                imag = j * start_point
                complex_num.append(complex(real, imag))
        return complex_num
    elif quadrant == 3:
        for i in range(n):
            real = i * start_point * (-1)
            for j in range(n):
                imag = j * start_point * (-1)
                complex_num.append(complex(real, imag))
        return complex_num
    elif quadrant == 4:
        for i in range(n):
            real = i * start_point
            for j in range(n):
                imag = j * start_point * (-1)
                complex_num.append(complex(real, imag))
        return complex_num
def gen_num_set(n):
    return [first_quadrant(n), second_quadrant(n), third_quadrant(n), four_quadrant(n)]
def if_man_set(num_set):
    iteration_n = 10000
    man_set = []
    for c in num_set:
        z = complex(0, 0)   # reset z for every candidate point c
        if_man = 1
        for i in range(iteration_n):
            if abs(z) > 2:
                if_man = 0
                break
            z = z * z + c
        if if_man:
            man_set.append(c)
    return man_set
def plot_scatter(x, y):
    #plt.plot(x,y)
    color = ran_color()
    plt.scatter(x, y, c=color)
    plt.show()

def ran_num():
    return random.random()

def ran_color():
    return [ran_num() for i in range(3)]

def plot_man_set(man_set):
    z_real = []
    z_imag = []
    for z in man_set:
        z_real.append(z.real)
        z_imag.append(z.imag)
    plot_scatter(z_real, z_imag)

if __name__ == "__main__":
    start_time = time.time()
    num_set = gen_num_set(10000)
    with Pool(processes=4) as pool:
        # use multiprocessing
        set_part = pool.map(if_man_set, num_set)
    man_set = []
    for i in set_part:
        man_set += i
    plot_man_set(man_set)
    end_time = time.time()
    use_time = end_time - start_time
    print(use_time)
You say you are creating a list with 1.6 billion elements. Each element is a complex number containing 2 floats. A Python complex object takes 24 bytes (at least on my system: sys.getsizeof(complex(1.0, 1.0)) gives 24), so you'll need over 38 GB just to store the values, and that's before you even start looking at the list itself.
A list with 1.6 billion elements won't fit at all on a 32-bit system (6.4 GB with 4-byte pointers), so you would need to go to a 64-bit system with 8-byte pointers, and that will need 12.8 GB just for the pointers.
So there is no way you're going to do that unless you upgrade to a 64-bit OS with maybe 64 GB of RAM (though it might need even more).
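For the curious, here is a quick back-of-the-envelope check of those numbers (a sketch assuming CPython; object sizes can vary between builds):
import sys

n_values = 20000 * 20000 * 4                    # the 1.6 billion complex numbers
per_complex = sys.getsizeof(complex(1.0, 1.0))  # 24 bytes on a typical CPython build
print(n_values * per_complex / 1e9, "GB for the complex objects alone")  # about 38.4
print(n_values * 8 / 1e9, "GB just for 8-byte list pointers")            # about 12.8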
When handling large data like this you should prefer numpy arrays over Python lists. There is a nice post explaining why (What are the advantages of NumPy over regular Python lists?), but I will try to sum it up.
In Python, each complex number in your list is a full object (with methods and attributes) and carries per-object overhead. That is why it takes 24 bytes (as Duncan pointed out) instead of the bare 64 bits needed to store two single-precision floats.
Numpy arrays build on C-style arrays (basically all values written next to each other in memory as raw numbers, not objects). They don't provide some of the nice functionality of Python lists (like appending) and are restricted to a single data type, but they save a lot of space because the per-object overhead disappears. With complex64, the space needed for each complex number drops from 24 bytes to 8 bytes (two 32-bit floats).
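You can compare the per-element sizes directly; a small sketch (exact numbers depend on your Python build):
import sys
import numpy as np

print(sys.getsizeof(complex(1.0, 1.0)))   # 24 bytes: a full Python object
print(np.dtype(np.complex64).itemsize)    # 8 bytes: two packed 32-bit floats
print(np.dtype(np.complex128).itemsize)   # 16 bytes: two packed 64-bit floats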
While Duncan is right that the big instance you tried will not fit even with numpy, this might help you process bigger instances.
As you have already imported numpy, you could change your code to use numpy arrays instead. Please mind that I am not too proficient with numpy and there most certainly is a better way to do this, but here is an example with only small changes to your original code:
def gen_complex_num_np(start_point, n, quadrant):
    # create an n x n array of complex numbers
    complex_num = np.ndarray(shape=(n, n), dtype=np.complex64)
    if quadrant == 1:
        for i in range(n):
            real = i * start_point
            for j in range(n):
                imag = j * start_point
                # fill one entry in the array
                complex_num[i, j] = complex(real, imag)
        # concatenate the array rows to
        # get a list-like return value again
        return complex_num.flatten()
    ...
Here your Python list is replaced with a 2-D numpy array of dtype complex64. After the array has been filled, it is flattened (all row vectors are concatenated) to mimic your original return format.
Note that you would have to change the man_set lists in all other parts of your program accordingly.
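If you want to go further, the Python-level loops can be removed entirely. Here is a sketch (my own addition, not part of the code above) of a fully vectorized first quadrant using np.mgrid; the other quadrants only differ in the signs:
import numpy as np

def gen_complex_num_np_vec(start_point, n):
    # integer index grids i (rows) and j (columns), each of shape (n, n)
    i, j = np.mgrid[0:n, 0:n]
    # first quadrant: real = i*start_point, imag = j*start_point
    grid = (i * start_point) + 1j * (j * start_point)
    return grid.astype(np.complex64).flatten()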
I hope this helps.
I'm making a script that does some mathematical morphology on images (mainly GIS rasters). So far I've implemented erosion and dilation, with opening/closing by reconstruction still on the TODO list, but that's not the subject here.
My implementation is very simple, with nested loops; I tried it on a 10900x10900 raster and it took an absurdly long time to finish, obviously.
Before I continue with the other operations, I'd like to know if there's a faster way to do this.
My implementation:
def erode(image, S):
    (m, n) = image.shape
    buffer = np.full((m, n), 0).astype(np.float64)
    for i in range(S, m - S):
        for j in range(S, n - S):
            buffer[i, j] = np.min(image[i - S: i + S + 1, j - S: j + S + 1])  # dilation is just np.max()
    return buffer
I've heard about vectorization, but I'm not quite sure I understand it well. Any advice or pointers are appreciated. I am also aware that OpenCV has these morphological operations, but I want to implement my own to learn about them.
The question here is whether you want a more efficient implementation because you want to learn about numpy, or whether you want a more efficient algorithm.
I think there are two obvious things that could be improved in your approach. One is that you want to avoid looping at the Python level, because that is slow. The other is that you are taking the minimum of heavily overlapping parts of the array, and you can make this more efficient if you reuse the effort you already spent finding the previous minimum.
I will illustrate both with 1-D implementations of erosion.
Baseline for comparison
Here is basically your implementation, just a 1-D version:
def erode(image, S):
    n = image.shape[0]
    buffer = np.full(n, 0).astype(np.float64)
    for i in range(S, n - S):
        buffer[i] = np.min(image[i - S: i + S + 1])  # dilation is just np.max()
    return buffer
You can make this faster using stride_tricks/sliding_window_view, i.e. by avoiding the Python-level loops and doing the work at the numpy level.
Faster Implementation
np.lib.stride_tricks.sliding_window_view(arr,2*S+1).min(1)
Notice that it's not doing quite the same thing, since it only starts producing values once there are 2S+1 values to take the minimum of. For this illustration I will ignore that border problem.
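For completeness, the same trick extends to your original 2-D erode. This is only a sketch: it assumes a reasonably recent numpy (sliding_window_view was added in 1.20) and, as noted above, it does not handle the borders, so the output is smaller than the input:
import numpy as np

def erode_2d(image, S):
    w = 2 * S + 1
    # view of shape (m - w + 1, n - w + 1, w, w): one w-by-w window per output pixel
    windows = np.lib.stride_tricks.sliding_window_view(image, (w, w))
    return windows.min(axis=(2, 3))   # dilation would use .max(axis=(2, 3))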
Faster Algorithm
A completely different approach is to not compute the minimum from scratch for every window, but to keep the window's values ordered and only add one value and remove one value when sliding the window one step to the right.
Here is a rough implementation of that:
from sortedcontainers import SortedDict

def smart_erode(arr, m):
    # m is the full window size (2*S + 1)
    sd = SortedDict()
    # count the values in the first window
    for new in arr[:m]:
        if new in sd:
            sd[new] += 1
        else:
            sd[new] = 1
    for to_remove, new in zip(arr[:-m+1], arr[m:]):
        yield sd.keys()[0]          # smallest value currently in the window
        if new in sd:
            sd[new] += 1
        else:
            sd[new] = 1
        if sd[to_remove] > 1:
            sd[to_remove] -= 1
        else:
            sd.pop(to_remove)
    yield sd.keys()[0]              # minimum of the last window
Notice that an ordered set wouldn't work, and an ordered list would need a way to remove just one element with a specific value, since your array can contain repeated values. I am using an ordered dict (SortedDict from sortedcontainers) to store how many times each value is currently in the window.
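As a usage sketch (assuming the sortedcontainers package is installed), you can materialize the generator and compare it with the part of the baseline that actually gets filled in:
import numpy as np

arr = np.random.randint(0, 100, 1000)
S = 5
m = 2 * S + 1
eroded = np.fromiter(smart_erode(arr, m), dtype=arr.dtype)
# one value per full window, so erode(arr, S)[S:-S] should match eroded element-wise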
A Rough Benchmark
I want to illustrate how the three implementations compare for different window sizes, so I am testing them on an array of 10^5 random integers with window sizes ranging from 10^3 to 10^4.
# (run in IPython/Jupyter, since this uses the %timeit magic)
import numpy as np
import pandas as pd

arr = np.random.randint(0, 10**5, 10**5)
sliding_window_times = []
op_times = []
better_alg_times = []
for m in np.linspace(0, 10**4, 11)[1:].astype('int'):
    x = %timeit -o -n 1 -r 1 np.lib.stride_tricks.sliding_window_view(arr, 2*m+1).min(1)
    sliding_window_times.append(x.best)
    x = %timeit -o -n 1 -r 1 erode(arr, m)
    op_times.append(x.best)
    x = %timeit -o -n 1 -r 1 tuple(smart_erode(arr, 2*m+1))
    better_alg_times.append(x.best)
    print("")

pd.DataFrame({"Baseline Comparison": op_times,
              'Faster Implementation': sliding_window_times,
              'Faster Algorithm': better_alg_times,
              },
             index=np.linspace(0, 10**4, 11)[1:].astype('int')
             ).plot.bar()
Notice that for very small window sizes the raw speed of the numpy implementation wins, but fairly quickly the work saved by not recalculating the minimum from scratch becomes more important.
I have an array with a length of over 1,000,000 and integer values between 0 and 255 (inclusive). I would like to plot the integers from 0 to 255 on the x-axis and, on the y-axis, the number of times each x value occurs in the array (called Arr in my current code).
I thought about this code:
list = []
for i in range(0, 256):
    icounter = 0
    for x in range(len(Arr)):
        if Arr[x] == i:
            icounter += 1
    list.append(icounter)
But is there any way I can do this a little bit faster (it takes me several minutes at the moment)? I thought about an import ..., but wasn't able to find a good package for this.
Use numpy.bincount for this task (look for more details here)
import numpy as np
list = np.bincount(Arr)
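A small sketch of how that fits the plotting part of the question (Arr is your array; minlength=256 guarantees a bin for every value from 0 to 255 even if some values never occur):
import numpy as np
import matplotlib.pyplot as plt

counts = np.bincount(Arr, minlength=256)   # counts[i] == number of occurrences of i in Arr
plt.bar(range(256), counts)
plt.show()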
While I completely agree with the previous answers that you should use a standard histogram routine, it's quite easy to greatly speed up your own implementation. Its problem is that you pass through the entire input once for every bin. It would be much faster to process the input only once and increment only the relevant bin:
def hist(arr):
    nbins = 256
    result = [0] * nbins  # or np.zeros(nbins)
    for y in arr:
        if 0 <= y < nbins:
            result[y] += 1
    return result
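A quick usage sketch, checking the single-pass version against numpy's bincount on some random data:
import numpy as np

arr = np.random.randint(0, 256, 1_000_000)
counts = hist(arr)   # one pass over the input
assert np.array_equal(counts, np.bincount(arr, minlength=256))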
I have a piece of code I need to run; however, it is taking very long. I estimate it will take a minimum of 2 hours and at most 100 hours to run. As you can see, I would like to speed this up.
My code is as follows:
#B1
file = open("/home/1015097/Downloads/B1/tour5.txt", "r").readlines()
# This is a very large file. You can find it on http://codemasters.eng.unimelb.edu.au/bit_problems/B1.zip
# It is named tour5.txt, you can also find the question I am trying to solve there.
racers = [int(file[0].split()[0]) / float(x) for x in file[0].split()[1::]]
print("hi")

def av(race):
    race = race.split()
    j = 0
    while j != len([float(race[0]) / float(x) for x in race[1::]]):
        racers[j] = [float(race[0]) / float(x) for x in race[1::]][j] + racers[j]
        j += 1
        print(j)

print("yay...")
for i in range(1, len(file)):
    print("yay")
    av(file[i])
print("yaaay")
a = min(racers)
del(racers[racers.index(min(racers))])
b = min(racers)
c = b - a
h = int(c)
c -= h
m = int(c * 60)
c -= m / 60
s = round(c * 60 * 60)
print(str(h) + "h" + str(m) + "m" + str(s) + "s")
We are currently coming first in Australia in the Code Bits contest and would not like to drop our perfect score. The random print statements are there so we can tell whether the code is actually running; they are essentially checkpoints. The number that is printed out is the racer number; there are at least 3000 racers, but we do not know exactly how many.
I would start by changing:
while j != len([float(race[0]) / float(x) for x in race[1::]]):
    racers[j] = [float(race[0]) / float(x) for x in race[1::]][j] + racers[j]
into
while j != len(race) - 1:
    racers[j] += float(race[0]) / float(race[j + 1])   # race[0] is the reference value, race[1:] are the divisors
Avoid loops like the plague. Vectorize everything. Use numpy, etc. If you have to, even look into Cython. But most importantly, vectorize.
The av() function is probably where your code spends most of its time (by the way, it would be a good idea to profile the code at various points, figure out the most expensive part, and focus on vectorizing that). Also, try to minimize the number of initializations: if you can, create an object once and reuse it.
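As a side note on the profiling suggestion, the standard library's cProfile is enough to find the hot spots. This is only a sketch; solve() stands in for whatever your entry point actually is:
import cProfile
import pstats

cProfile.run("solve()", "profile.out")            # solve() is a placeholder for your main routine
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)    # show the 10 most expensive calls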
Below is how I would change up the function.
import numpy as np

racers = np.array(racers)

def av(race, racers):
    race = race.split()
    # build the per-racer values for this race as a numpy array in one step
    race_float = np.array([float(race[0]) / float(x) for x in race[1:]])
    racers += race_float
    return racers
Also, please refrain from:
Using print for debugging. Python has a built-in logging module; use it (a short sketch follows this list).
Using globals. Pass them into functions as arguments and return the new object instead of modifying a global object directly.
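A minimal logging sketch (my own example of what the first point looks like in practice; the format string is just one common choice):
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

log.info("processed racer %d", j)   # replaces print(j); j is the loop counter in av()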
I think you should be looking at numpy arrays instead of lists; that way you avoid the loops and get close to C speed. This problem maps easily onto that approach. And by the way, why not store everything as float64 or float32 so there is no conversion of data types? The code example is not fully portable; it is schoolwork and I should not do it for you:
import numpy as np

racer = np.array(racer)     # works for 1-D lists and for 2-D lists whose rows have the same length
racer_time = racer / time   # dividing a vector by a scalar is easy
I am running some simulations that take a numpy array as input, continue for several iterations until some condition is met (stochastically), and then repeat again using the same starting array. Each complete simulation starts with the same array and the number of steps to complete each simulation isn't known beforehand (but it's fine to put a limit on it, maxt).
I would like to save the array (X) after each step of each simulation, preferably in one large multi-dimensional array. Below I'm saving each simulation's output in a list and saving the array with copy.copy. I appreciate there are other methods I could use (tuples, for instance), so is there a more efficient way of doing this in Python?
Note: I appreciate this is a trivial example and that one could vectorize the code below. In the actual application being used, however, I have to use a loop as the stochasticity is introduced in a more complicated manner.
import numpy as np, copy

N = 10
Xstart = np.zeros(N)
Xstart[0] = 1
num_sims = 3
prob = 0.2
maxt = 20

analysis_result = []

for i in range(num_sims):
    print("-------- Starting new simulation --------")
    t = 0
    X = copy.copy(Xstart)
    # Create a new array to store results, save the array
    sim_result = np.zeros((maxt, N))
    sim_result[t, :] = X
    while (np.count_nonzero(X) < N) & (t < maxt):
        print(X)
        # Increment elements of the array stochastically
        X[(np.random.rand(N) < prob)] += 1
        # Save the array for time t
        sim_result[t, :] = copy.copy(X)
        t += 1
    print(X)
    analysis_result.append(sim_result[:t, :])
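One possible alternative (a sketch of my own, not from the question itself): preallocate a single 3-D array of shape (num_sims, maxt, N) up front and record how many steps each simulation actually took. Assigning into a numpy slice already copies the data, so the explicit copy.copy calls are not needed; note this version keeps the starting state as row 0 of each simulation.
all_results = np.zeros((num_sims, maxt, N))
steps_taken = np.zeros(num_sims, dtype=int)

for i in range(num_sims):
    t = 0
    X = Xstart.copy()
    all_results[i, t, :] = X                     # row 0 holds the starting state
    while (np.count_nonzero(X) < N) and (t < maxt - 1):
        X[np.random.rand(N) < prob] += 1
        t += 1
        all_results[i, t, :] = X                 # slice assignment copies X's values
    steps_taken[i] = t + 1                       # number of saved rows for simulation i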
I read on Wikipedia that Python uses row-major order. But I am multiplying two matrices, once with a row-major access pattern and once with a column-major access pattern, and when I run the code the column-major approach is always faster.
Here is a snippet of my code:
class Matrix:
    # flag indicates the type of matrix
    def __init__(self, num_lines, num_cols, flag):
        self.num_lines = num_lines
        self.num_cols = num_cols
        if flag == 0:
            # First Matrix
            self.matrix = [1]*num_lines*num_cols
        elif flag == 1:
            # Second Matrix
            self.matrix = [0]*num_lines*num_cols
            for i in range(num_lines):
                for j in range(num_cols):
                    self.matrix[i*num_cols+j] = i+1
        elif flag == 2:
            # Result Matrix
            self.matrix = [0]*num_lines*num_cols

    def setMatrixValue(self, line, column, value):
        self.matrix[line*self.num_cols + column] = value

    def getMatrixValue(self, line, column):
        return self.matrix[line*self.num_cols + column]

def multiplyMatrices(num_lines, num_cols, flag):
    matrix_a = Matrix(num_lines, num_cols, 0)
    matrix_b = Matrix(num_cols, num_lines, 1)
    matrix_result = Matrix(num_lines, num_lines, 2)
    # Column-major approach
    if flag == 0:
        start_time = time.time()
        for i in range(matrix_result.num_lines):
            for j in range(matrix_result.num_cols):
                temp = 0
                for k in range(matrix_a.num_cols):
                    temp += matrix_a.getMatrixValue(i, k) * matrix_b.getMatrixValue(k, j)
                matrix_result.setMatrixValue(i, j, temp)
    # Row-major approach
    elif flag == 1:
        start_time = time.time()
        for i in range(matrix_result.num_lines):
            for k in range(matrix_result.num_cols):
                for j in range(matrix_a.num_lines):
                    matrix_result.setMatrixValue(i, j, matrix_result.getMatrixValue(i, j) + (matrix_a.getMatrixValue(i, k) * matrix_b.getMatrixValue(k, i)))
    end_time = time.time()
    print(matrix_result.matrix)
    diffTime(start_time, end_time)
I have also realised that matrix multiplication in Python is much slower than in Java or C++. Is there any reason for that?
The behaviour you are expecting is caused by the advantage of accessing sequential memory addresses. So here
self.matrix[line*self.num_cols + column]
You would want to be incrementing column in the inner loop.
There are a few problems with this in pure Python. Since all the ints are objects, accessing one means first getting a reference from the list sequentially (good), but then following that reference to the int object itself, which lives at a non-sequential memory address (bad).
Fortunately there are alternatives, such as numpy, which additionally has well-tested matrix routines you can use.
Python is slow because it checks the types of all objects at runtime to decide which operation to perform (e.g. which multiplication to use when it sees *). Not only that, but most abstractions over simple data types have a runtime cost, as data is marshalled in and out of objects.
You can mitigate some of these speed problems by using numpy, e.g.:
import numpy
numpy.matrix([0, 1]) * numpy.matrix([[0], [1]])
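For what it's worth, current numpy documentation recommends plain arrays with the @ operator rather than numpy.matrix; the equivalent of the example above would be:
import numpy as np

a = np.array([[0, 1]])      # shape (1, 2)
b = np.array([[0], [1]])    # shape (2, 1)
print(a @ b)                # matrix product, array([[1]])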