Matplotlib error when plotting in multiprocessing - python

I am using Python's multiprocessing.Pool to plot some data using multiple processes, as follows:
from multiprocessing import Pool

class PlotDriver:
    def plot(self, parameterList):
        numberOfWorkers = len(parameterList)
        pool = Pool(numberOfWorkers)
        pool.map(plotWorkerFunction, parameterList)
        pool.close()
        pool.join()
This is a simplified version of my class; the driver also contains other things I have chosen to omit. plotWorkerFunction is a single-threaded function that imports matplotlib, does all the plotting, sets the figure styles, and saves the plots to one PDF file, and the workers do not interact with each other.
I need to call this plot function multiple times, since I have many parameter lists, like the following:
parameters = [parameterList0, parameterList1, ... parameterListn]
for param in parameters:
    driver = PlotDriver()
    driver.plot(param)
If parameters contains only one parameterList (the for loop runs only once), the code seems to work fine. But it consistently fails whenever parameters contains more than one element, with the following error appearing on the second pass through the loop.
Traceback (most recent call last):
  File "plot.py", line 59, in <module>
    plottingDriver.plot(outputFile_handle)
  File "/home/yingryic/PlotDriver.py", line 69, in plot
    pool.map(plotWrapper, workerParamList)
  File "/home/yingryic/.conda/envs/pp/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/yingryic/.conda/envs/pp/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
RuntimeError: In set_text: could not load glyph
X Error: BadIDChoice (invalid resource ID chosen for this connection) 14
  Extension:    138 (RENDER)
  Minor opcode: 17 (RenderCreateGlyphSet)
  Resource id:  0xe00002
: Fatal IO error: client killed
Any idea what is going wrong and how I should fix it?

You can try placing import matplotlib inside plotWorkerFunction() so that each child process gets its own copy of the module.
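A minimal sketch of that idea, assuming a hypothetical plotWorkerFunction and that a non-interactive backend such as Agg is acceptable for figures that are only saved to file:

def plotWorkerFunction(params):
    # Import matplotlib inside the worker so each child process has its
    # own module state; selecting a non-interactive backend before pyplot
    # is imported keeps the workers away from the X server entirely.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(params["x"], params["y"])        # hypothetical data layout
    fig.savefig(params["output_path"])       # e.g. one PDF per worker
    plt.close(fig)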

Related

Multiprocessing array .get_lock works on one computer but not another

I am working on a somewhat extensive Python program that uses multiprocessing. Because I wanted the user to see some progress on the console while the program runs, I read about using a shared counter on Stack Overflow and, after a while of playing around with my code, I got it to work. As I said, it's too much code to post here, but the gist is that I instantiate a multiprocessing Array after the if __name__ == "__main__" line,
if __name__ == "__main__":
    total_progress_counter = Array('i', [0, 0])
and then, during the main portion of the code, I pass this array to a function in another module:
some_name.plot(<other variables>,
               total_progress_counter=total_progress_counter)
Then, within that other function, I use the .get_lock() method that I found described here on Stack Overflow:
with total_progress_counter.get_lock():
    total_progress_counter[0] += self.total_panels_to_plot
I also update the other component, total_progress_counter[1], in the same function. This works fine for me on my work machine, where I wrote the code; that machine runs CentOS.
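Put together, the pattern is roughly the following minimal sketch (the worker body, the increment amounts, and the use of Process instead of the Pool-based plot method described above are placeholders):

from multiprocessing import Array, Process

def plot(n_panels, total_progress_counter):
    # Hypothetical worker: bump the shared counter under its lock.
    with total_progress_counter.get_lock():
        total_progress_counter[0] += n_panels
        total_progress_counter[1] += 1

if __name__ == "__main__":
    total_progress_counter = Array('i', [0, 0])
    workers = [Process(target=plot, args=(3, total_progress_counter))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(total_progress_counter[:])   # expected: [12, 4]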
But when I run it on my personal MacBook, it gives the following traceback:
Traceback (most recent call last):
  File "./program.py", line 775, in <module>
    program.run()
  File "./program.py", line 177, in run
    cases_plotted = pool.map(self.__plot__, all_cases)
  File "/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
AttributeError: 'list' object has no attribute 'get_lock'
I have Python 3.8.3 on my personal machine and Python 3.7.4 on my work machine. Can anyone help me understand why I'm getting different behavior in these two environments? I'd be grateful, as this is meant to be software others might use on different machines.

How do I get rid of attribute error in matplotlib animation

I want to turn a series of matplotlib figures into an animation. However, whatever I do, I always receive an error. I use Enthought Canopy 1.6.2 and Python 2.7.13 on Windows 10.
I have tried using the videofig package. While it was good, I could not manage to save the mp4 file. Also, I believe using the source directly, i.e. the matplotlib animation package, would be more versatile for future use. I checked a few answers, including 1, 2 and 3, yet none of them solved my problem.
The function I call is structured as follows.
def some_plotter(self, path, start_value, image_array):
    some_unrelated_fig_functions()
    im = plt.savefig(path, animated=True)
    image_array.append([im])
    plt.close("all")
The main code is as follows:
import matplotlib.pyplot as plt
import matplotlib.animation as animation

image_array = []
while something:
    some_obj.some_plotter(path, start_value, image_array)

fig = plt.figure()
ani = animation.ArtistAnimation(fig, image_array, interval=50, blit=True, repeat_delay=1000)
I receive the following error:
Traceback (most recent call last):
  File "C:\Users\kocac\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\matplotlib\cbook\__init__.py", line 387, in process
    proxy(*args, **kwargs)
  File "C:\Users\kocac\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\matplotlib\cbook\__init__.py", line 227, in __call__
    return mtd(*args, **kwargs)
  File "C:\Users\kocac\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\matplotlib\animation.py", line 1026, in _start
    self._init_draw()
  File "C:\Users\kocac\AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\matplotlib\animation.py", line 1557, in _init_draw
    artist.set_visible(False)
AttributeError: 'NoneType' object has no attribute 'set_visible'
I had more lines similar to these, but updating Matplotlib, following the suggestion in 3, reduced the error lines to four. Still, I cannot get any further. Note that the saved images are perfectly fine, so I am probably not doing anything wrong in the image creation.
How can I get rid of these errors? Where am I going wrong?

Serialize iterator object to be passed between processes in Python

I have a Python script that calculates the eigenvalues of matrices from a list, and I would like to insert these eigenvalues into another collection in the same order as the original matrices. I would like to do this by spawning multiple processes.
Here is my code:
import time
import collections
import numpy as NP
from scipy import linalg as LA
from joblib import Parallel, delayed

def computeEigenV(unit_of_work):
    current_index = unit_of_work[0]
    current_matrix = unit_of_work[1]
    e_vals, e_vecs = LA.eig(current_matrix)
    # pair the eigenvalues (reversed) with the matrix's original index
    finished_unit = (current_index, e_vals[::-1])
    return finished_unit

def run(work_list):
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
    return results

if __name__ == '__main__':
    # create original array of matrices
    original_matrix_list = []
    work_list = []
    # basic set up so we can run this test
    for i in range(0, 100):
        # generate the matrix & unit of work
        matrix = NP.random.random_integers(0, 100, (500, 500))
        # insert into respective resources
        original_matrix_list.append(matrix)
    for i, matrix in enumerate(original_matrix_list):
        unit_of_work = [i, matrix]
        work_list.append(unit_of_work)
    work_result = run(work_list)
So work_result should hold all the eigenvalues from each matrix after all processes finish. The iterator I am using is unit_of_work, a list containing the index of the matrix (from original_matrix_list) and the matrix itself.
The weird thing is, if I run this code with python matrix.py, everything works perfectly. But when I use auto (a program that does calculations for differential equations?) to run my script, typing auto matrix.py gives me the following error:
Traceback (most recent call last):
  File "matrix.py", line 50, in <module>
    work_result = run(work_list)
  File "matrix.py", line 27, in run
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 805, in __call__
    while self.dispatch_one_batch(iterator):
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 69, in __init__
    self.items = list(iterator_slice)
  File "matrix.py", line 27, in <genexpr>
    results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
  File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 162, in delayed
    pickle.dumps(function)
TypeError: expected string or Unicode object, NoneType found
Note: when I ran this with auto I had to change if __name__ == '__main__': to if __name__ == '__builtin__':
I looked up this error, and it seems like I am not serializing the iterator unit_of_work correctly when passing it around to different processes. I then tried using serialized_unit_of_work = pickle.dumps(unit_of_work), passing that around, and calling pickle.loads when I need to use the iterator, but I still get the same error.
Can someone please help point me in the right direction as to how I can fix this? I hesitate to use pickle.dump(obj, file[, protocol]) because eventually I will be running this to calculate eigenvalues of thousands of matrices and I don't really want to create that many files to store the serialized iterator if possible.
Thanks!! :)
You can't pickle an iterator in python2.7 (but you can from 3.4 onward).
Also, pickling works differently in __main__ than it does outside of __main__, and it would seem that auto is doing something odd with __main__. What you will often observe when pickling fails on a particular object is that if, instead of running the script containing the object directly, you run a script as main which imports the portion of the script with the "difficult-to-serialize" object, then pickling will succeed. This is because the object is pickled by reference at a namespace level above where the "difficult" object lives, so it's never directly pickled.
So you can probably get away with pickling what you want by adding a reference layer: a file import or a class. But if you want to pickle an iterator, you are out of luck unless you move to at least Python 3.4.
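A minimal sketch of that reference-layer idea, assuming the worker function can be moved into its own module (the file name eigen_worker.py is made up; the rest mirrors the question's code):

# eigen_worker.py -- holds the function so it pickles by reference
from scipy import linalg as LA

def computeEigenV(unit_of_work):
    current_index, current_matrix = unit_of_work
    e_vals, e_vecs = LA.eig(current_matrix)
    return (current_index, e_vals[::-1])

# matrix.py -- the driver only imports the worker
import numpy as NP
from joblib import Parallel, delayed
from eigen_worker import computeEigenV

work_list = [(i, NP.random.random_integers(0, 100, (50, 50))) for i in range(10)]
work_result = Parallel(n_jobs=-1, verbose=1)(
    delayed(computeEigenV)(unit) for unit in work_list)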

Import text files into mysqldb

For a current assignment, I must import two .txt files into a MySQL database with Python using MySQLdb. I'm having immense trouble; I have tried various methods and I simply can't do it.
I've searched through this site and many others over the past few days and I simply cannot get this to work. Whenever I've tried to adapt another person's solution to my own code, it fails - so I figure I should ask for help directly for my own code.
This is what I have so far:
import MySQLdb

# connect to database
mydb = MySQLdb.connect("localhost", "root", "0dy5seuS", "cars_db")

# define the function
def data_entry(cars_for_sale):
    # cursor creation
    cursor = mydb.cursor()
    # load the file 'cars_for_sale.txt' into the database under the table 'cars_for_sale'
    sql = """LOAD DATA LOCAL INFILE 'cars_for_sale.TXT'
             INTO TABLE cars_for_sale
             FIELDS TERMINATED BY '\t'
             LINES TERMINATED BY '\r\n'"""
    # execute the sql function above
    cursor.execute(sql)
    # commit to the database
    mydb.commit()

# call data_entry(cars_for_sale) function
data_entry(cars_for_sale)
mydb.close()
I can hardly wrap my head around this; any help would be appreciated.
I now get the following feedback from the testing function:
Trying:
    data_entry("cars_for_sale")
Expecting:
    The number of rows inserted to cars_for_sale is 7049
****************************************
File "main", line 4, in main
Failed example:
    data_entry("cars_for_sale")
Exception raised:
    Traceback (most recent call last):
      File "C:\Python27\lib\doctest.py", line 1289, in __run
        compileflags, 1) in test.globs
      File "", line 1, in
        data_entry("cars_for_sale")
      File "E:/Uni/104/Portfolio 2/MediumTask_DataStatistics/question/TEST2_data_statistics.py", line 270, in data_entry
        data_entry(cars_for_sale)
    (it repeats this last portion several hundred/thousand times)
The following few lines are after the repeated error above.
File "C:\Python27\lib\site-packages\MySQLdb\connections.py", line
243, in cursor return (cursorclass or self.cursorclass)(self)
File "C:\Python27\lib\site-packages\MySQLdb\cursors.py", line 51, in
init from weakref import proxy RuntimeError: maximum recursion depth exceeded while calling a Python object
I'm aware that this is an infinite recursion although I have no idea how to stop it.
Thanks
The following code reproduces your error "RuntimeError: maximum recursion depth exceeded while calling a Python object":
def data_entry(cars_for_sale):
    data_entry(cars_for_sale)
You don't need recursion here (and it is used incorrectly anyway).
I'm aware that this is an infinite recursion although I have no idea how to stop it.
Just remove the data_entry call inside the data_entry function.
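A minimal corrected sketch based on that advice: the function no longer calls itself, and the single call at module level stays as the only entry point (the table and file names come from the question):

def data_entry(cars_for_sale):
    cursor = mydb.cursor()
    sql = """LOAD DATA LOCAL INFILE 'cars_for_sale.TXT'
             INTO TABLE cars_for_sale
             FIELDS TERMINATED BY '\t'
             LINES TERMINATED BY '\r\n'"""
    cursor.execute(sql)   # no recursive call back into data_entry here
    mydb.commit()

data_entry("cars_for_sale")   # called once, from module level
mydb.close()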

RuntimeError: underlying C/C++ object has been deleted when saving and afterwards closing a pyplot figure

I ran into a Python error that I have been trying to solve for several days now.
My program creates figures, then saves and closes them, which works fine except for this error. Usually it does not hinder the saving process, but sometimes a picture is missing its lower part when saved. The odd thing is that this only happens every second time the loop reaches the savefig call. Here is my code:
for num in np.arange(file_number):
    plt.figure('abc' + str(num), figsize=(22,12), dpi=100)
    # some plots are added to the figure
    print 1
    plt.savefig(os.path.join(savepath, filename), dpi=100)
    print 2
    plt.close()
    print 3
I use the print commands to see where the error occurs. Here is the console output from Spyder:
Reading file1.file
1
2
3
Reading file2.file
1
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/matplotlib/backends/backend_qt4.py", line 151, in <lambda>
    lambda: self.close_event())
  File "/usr/lib/pymodules/python2.7/matplotlib/backend_bases.py", line 1564, in close_event
    self.callbacks.process(s, event)
RuntimeError: underlying C/C++ object has been deleted
2
3
Reading file3.file
1
2
3
Reading file4.file
1
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/matplotlib/backends/backend_qt4.py", line 151, in <lambda>
    lambda: self.close_event())
  File "/usr/lib/pymodules/python2.7/matplotlib/backend_bases.py", line 1564, in close_event
    self.callbacks.process(s, event)
RuntimeError: underlying C/C++ object has been deleted
2
3
To my understanding, the error already occurs while saving the figure (every second time), although it works fine if I omit the close() command. In that case, my RAM fills up after about 70 files, and sometimes I need to evaluate a couple of hundred. That's why I need to include the close() command or something similar.
If you can solve this (or improve my programming; I guess the way I did this saving and closing might be considered ugly), please help me.
How about changing the backend to another option? For example:
import matplotlib as mpl
mpl.use("agg")
from matplotlib import pyplot as plt
import numpy as np

print plt.get_backend()

file_number = 100
for num in np.arange(file_number):
    plt.figure('abc' + str(num), figsize=(22,12), dpi=100)
    # some plots are added to the figure
    print 1
    plt.savefig("%d.png" % num, dpi=100)
    print 2
    plt.close()
    print 3
I can't replicate your problem (partially because your example is not self-contained), but I think you could look at going about solving the problem slightly differently.
Since your figure definition (size, dpi, etc.) stays the same throughout the loop (and even if it didn't), you could look at producing just one figure and updating it inside the loop:
import matplotlib as mpl
mpl.use("tkagg")
from matplotlib import pyplot as plt
import numpy as np

file_number = 1000
fig = plt.figure('abc', figsize=(22,12), dpi=100)
plt.show(block=False)

for num in np.arange(file_number):
    fig.set_label('abc%s' % num)
    # add an axes to the figure
    ax = plt.axes()
    # some plots are added to the figure (I just plotted a line)
    plt.plot(range(num))
    plt.savefig("%d.png" % num, dpi=100)
    # draw the latest changes to the gui
    plt.draw()
    # remove the axes now that we have done what we want with it.
    fig.delaxes(ax)

# put in a blocking show to wait for user interaction / closure.
plt.show()
Typically this isn't how you would do things (I would normally update the axes rather than add/remove one each time) but perhaps you have a good reason for doing it this way.
That should improve the performance significantly.
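For reference, a minimal sketch of the variant alluded to above, reusing a single axes and clearing it each iteration instead of adding and removing axes (the plotted data and the agg backend choice are placeholders):

import matplotlib as mpl
mpl.use("agg")                      # non-interactive backend, as in the first answer
from matplotlib import pyplot as plt

fig, ax = plt.subplots(figsize=(22, 12), dpi=100)
for num in range(10):
    ax.plot(range(num + 1))         # placeholder data
    fig.savefig("%d.png" % num, dpi=100)
    ax.cla()                        # clear the axes for the next figure
plt.close(fig)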
