Stop images produced by pymc.Matplot.plot being saved - python

I recently started experimenting with pymc and only just realised that the images produced by pymc.Matplot.plot, which I use to diagnose whether the MCMC has performed well, are being saved to disk. This results in images appearing wherever I run my scripts from, and it is time-consuming to clear them up. Is there a way to stop figures being saved to disk? I can't see anything clearly in the documentation.

There is currently no way to plot them without them being saved to disk. I would recommend only plotting a few diagnostic parameters and specifying plot=False for the others; that would at least cut down on the volume of plots being generated. I agree, though, that there probably should be a saveplot argument.
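For example, a minimal sketch assuming a PyMC 2.x model (the variable names are made up): mark the variables you don't need diagnostics for with plot=False, and for anything you just want to eyeball without writing a file, pull the trace out and plot it with matplotlib yourself.

import numpy as np
import matplotlib.pyplot as plt
import pymc

alpha = pymc.Normal('alpha', mu=0.0, tau=1.0)              # keep diagnostics for this one
beta = pymc.Normal('beta', mu=0.0, tau=1.0, plot=False)    # Matplot.plot will skip this one
obs = pymc.Normal('obs', mu=alpha + beta, tau=1.0,
                  value=np.random.randn(50), observed=True)

M = pymc.MCMC([alpha, beta, obs])
M.sample(iter=5000, burn=1000)

pymc.Matplot.plot(M)            # still writes alpha.png etc. to the working directory

plt.plot(M.trace('alpha')[:])   # nothing is written to disk this way
plt.show()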

Related

Convert downloaded .pth checkpoint file for use in browser (TensorFlow)

Overall goal:
I used some Python code from mmpose which can identify animals in a picture, and then deduce their pose. Great. Now, my goal is to be able to bring this to the browser with TensorFlow.js. I understand this question might require many steps.
What I've managed so far:
I used the file top_down_img_demo_with_mmdet.py which came in the demo/ folder of mmpose. Detecting objects works like a charm, the key line being mmdet_results = inference_detector(det_model, image_name) (from mmdet.apis) which returns bounding boxes of what's found. Next, it runs inference_top_down_pose_model (from mmpose.apis) which returns an array of all the coordinates of key points on the animal. Perfect. From there, it draws out to a file. Now, shifting over to TensorFlow.js, I've included their COCO-SSD model, so I can get bounding boxes of animals. Works fine.
What I need help with:
As I understand it, to use the .pth file (big) used in the animal pose identification, it must be ported to another format (.pt, maybe with an intermediate ONNX step) and then loaded as a model in TensorFlow.js, where it can run its pose-detection magic in-browser. Two problems: 1) most instructions seem to expect me to know details about the model, which I don't. Kernel size? Stride? Do I need this info? If so, how do I get it? 2) it's honestly not clear what my real end-goal should be. If I end up with a .pt file, is it a simple few lines to load it as a model in TensorFlow.js and run an image through it?
TL;DR: I've got a working Python program that finds animal pose using a big .pth file. How do I achieve the same in-browser (e.g. with TensorFlow.js)?
What didn't work
This top answer does not run, since "model" is not defined. Adding model = torch.load('./hrnet_w32_animalpose_256x256-1aa7f075_20210426.pth') still failed with AttributeError: 'dict' object has no attribute 'training' (see the sketch after this list).
This GitHub project spits out a tiny saved_model.pb file, less than 0.1% the size of the .pth file, so that can't be right.
This answer gave a huge wall of text, array values off my screen, which it said were weights anyway, not a new model file.
This article expects me to know the structure of the model.
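For reference, a rough sketch of the conversion step being described (untested; init_pose_model, forward_dummy and the config path are assumptions that would have to match the checkpoint). The key point is that the .pth is only a dict of weights, so the network has to be built from its mmpose config first and the weights loaded into it before anything can be exported, which is why torch.load() alone produces the 'dict' object has no attribute 'training' error.

import torch
from mmpose.apis import init_pose_model

# The config must be the one that matches the checkpoint; this path is an assumption.
config = 'configs/animal/2d_kpt_sview_rgb_img/topdown_heatmap/animalpose/hrnet_w32_animalpose_256x256.py'
checkpoint = './hrnet_w32_animalpose_256x256-1aa7f075_20210426.pth'

# Build the network from its config, then load the weights into it.
model = init_pose_model(config, checkpoint, device='cpu')
model.forward = model.forward_dummy      # plain tensor-in/tensor-out forward for export
model.eval()

dummy = torch.randn(1, 3, 256, 256)      # N, C, H, W; 256x256 as in the checkpoint name
torch.onnx.export(model, dummy, 'animalpose_hrnet.onnx', opset_version=11)

# From ONNX, the usual route to the browser is ONNX -> TensorFlow SavedModel -> TF.js:
#   onnx-tf convert -i animalpose_hrnet.onnx -o animalpose_tf
#   tensorflowjs_converter --input_format=tf_saved_model animalpose_tf web_model
# Kernel sizes, strides etc. are all defined in the mmpose config, so they never
# have to be supplied by hand.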
Thank you all. Honestly, even comments about apparent misunderstandings I have about the process would be very valuable to me. Cheers.
Chris

Spyder does not release memory when re-running cells

Spyder's memory management is driving me crazy lately. I need to load large datasets (up to 10 GB) and preprocess them. Later on I perform some calculations / modelling (using sklearn) and plots on them. Spyder's cell functionality is perfect for this since it allows me to run the same calculations several times (using different parameters) without repeating the time-consuming preprocessing steps. However, I'm running into memory issues when repeatedly running cells for various reasons:
Re-running the same cell several times increases the memory consumption. I don't understand this, since I'm not introducing new variables and previous (global) variables should just be overwritten.
Encapsulating variables in a function helps but not to the degree that one should expect. I have a strong feeling that the memory of local variables is often not released correctly (this also holds when returning any values with .copy() to avoid references to local variables).
Large matplotlib objects do not get recycled properly. Running gc.collect() can sometimes help but does not always clear all the memory used by plots (see the sketch at the end of this question).
When using the 'Clear all variables' button from the IPython console, often only part of the memory is released and several GB might still remain occupied.
Running %reset from the IPython console works better but does not always clear the memory completely either (even when running import gc; gc.collect() afterwards).
The only thing that helps for sure is restarting the kernel. However, I don't like doing this since it removes all the output from the console.
Advice on any of the above points is appreciated, as well as some elaboration on the memory management of Spyder. I'm using multi-threading in my code and have a suspicion that some of the issues are related to that, even though I was not able to pinpoint the problem.
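For what it's worth, a minimal pattern that tends to keep memory bounded in this kind of workflow, assuming the plots are pyplot figures (pyplot holds a reference to every figure until it is explicitly closed, which is one common reason gc.collect() alone does not free them):

import gc
import numpy as np
import matplotlib.pyplot as plt

def preprocess(path):
    # Keep the heavy intermediates local so they can be freed when the call returns.
    raw = np.random.rand(2000, 2000)          # stand-in for the real 10 GB load from 'path'
    cleaned = raw - raw.mean(axis=0)
    return cleaned

def plot_overview(data):
    fig, ax = plt.subplots()
    ax.imshow(data)
    plt.show()
    plt.close(fig)          # release the figure; pyplot otherwise keeps it alive

data = preprocess('dataset.h5')               # hypothetical path

# %% modelling cell (safe to re-run)
plot_overview(data)
plt.close('all')            # drop any figures left over from earlier cell runs
gc.collect()                # then let the garbage collector reclaim their memory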

Optimising drawing graphs from a model: code architecture

OK, I have a question about how to lay out code efficiently.
I have a model written in Python which generates results that I use to produce graphs in matplotlib. As written, the model is contained within a single file, and I have 15 other run-files which call on it with complicated configurations and produce graphs. It takes a while to go through and run each of these run-files, but since they all use substantially different settings for the model, I need to have complicated setup files anyway, and it all works.
I have the output set up for figures which could go in an academic paper. I have now realised that I am going to need each of these figures again in other formats: one for presentations (low dpi, medium size, different font) and one for a poster (high dpi, much bigger, different font again).
This means I could potentially have 45-odd files to wade through every time I want to make a change to my model. I would also have to copy and paste a lot of boilerplate matplotlib code with minor alterations (each run-file would become 3 different files, one for each graph).
Can anybody explain to me how (and if) I could speed things up? At the moment, I think it's taking me much longer than it should.
As I see it there are 3 main options:
Set up 3 run-files for each actual model run (duplicating a fair amount and running the model more often than I need), but I can then tweak everything independently (at the risk of missing something important).
Add another layer - so save the results as .csv or equivalent and then read them into the files for producing graphs. This means more files, but I only have to run the model once per 3 graphs (which might save some time).
Keep the graph and model parameter files integrated, but add another file which sets up graphing templates, so every time I run the file it spits out 3 graphs. It might speed things up a bit, and will certainly keep the number of files down, but the files will get very big (and probably much more complicated).
Something else.
Can anybody point me to a resource or provide me with some advice on how best to handle this?
Thanks!
I think you are close to finding what you want.
If the calculations take some time, store the results in files so they can be processed later without recalculating.
Most importantly: separate code from configuration, instead of copy-pasting variations of the two mixed together.
If the model takes parameters, define a Model class. Maybe instantiate the model only once, but have it know how to load_config, read_input_data and run; the model also does write_results. That way you can loop over load_config, read_input_data, run and write_results for every configuration, and maybe for every input dataset.
Write the config files by hand, in INI format for example, and use the configparser module to load them.
Do something similar for your Graph class: put the template definitions in configuration files, including output format, sizes, fonts and so on.
In the end you will be able to "manage" the intended workflow with a single script that uses these facilities. Maybe store groups of related configuration files, output templates and input data together, one group per folder for each modelling session.
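A minimal sketch of that layout (all names, the INI files and the toy "model" are placeholders for your own code and configuration):

import configparser
import numpy as np
import matplotlib.pyplot as plt

class Model:
    def load_config(self, path):
        cfg = configparser.ConfigParser()
        cfg.read(path)
        self.n_steps = cfg.getint('run', 'n_steps')
        self.rate = cfg.getfloat('run', 'rate')

    def run(self):
        # Placeholder for the real model: just produces something plottable.
        self.results = np.cumsum(np.random.normal(self.rate, 1.0, self.n_steps))

    def write_results(self, path):
        np.savetxt(path, self.results, delimiter=',')

def make_figure(results_path, style_path, out_path):
    # One style file per output target: paper.ini, slides.ini, poster.ini, ...
    style = configparser.ConfigParser()
    style.read(style_path)
    plt.rcParams['font.size'] = style.getfloat('figure', 'fontsize')
    data = np.loadtxt(results_path, delimiter=',')
    fig, ax = plt.subplots(figsize=(style.getfloat('figure', 'width'),
                                    style.getfloat('figure', 'height')))
    ax.plot(data)
    fig.savefig(out_path, dpi=style.getint('figure', 'dpi'))
    plt.close(fig)

if __name__ == '__main__':
    # Tiny example config files so the sketch runs standalone; in practice these
    # would be written by hand, one per model run and one per output style.
    with open('run_a.ini', 'w') as f:
        f.write('[run]\nn_steps = 200\nrate = 0.1\n')
    with open('paper.ini', 'w') as f:
        f.write('[figure]\nfontsize = 9\nwidth = 3.5\nheight = 2.5\ndpi = 300\n')
    with open('slides.ini', 'w') as f:
        f.write('[figure]\nfontsize = 16\nwidth = 8\nheight = 4.5\ndpi = 100\n')

    model = Model()
    for run_cfg in ['run_a.ini']:                         # one config per model run
        results_file = run_cfg.replace('.ini', '.csv')
        model.load_config(run_cfg)
        model.run()
        model.write_results(results_file)
        for style_cfg in ['paper.ini', 'slides.ini']:     # one style per output target
            out = run_cfg.replace('.ini', '_') + style_cfg.replace('.ini', '.png')
            make_figure(results_file, style_cfg, out)

The model runs once per configuration and each set of results is rendered once per style file, so adding a fourth output format only means adding one more INI file, not another set of run-files.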

How to save large Python numpy datasets?

I'm attempting to create an autonomous RC car and my Python program is supposed to query the live stream on a given interval and add it to a training dataset. The data I want to collect is the array of the current image from OpenCV and the current speed and angle of the car. I would then like it to be loaded into Keras for processing.
I found out that numpy.save() just saves one array to a file. What is the best/most efficient way of saving data for my needs?
As with anything regarding performance or efficiency, test it yourself. The problem with recommendations for the "best" of anything is that they might change from year to year.
First, you should determine whether this is even an issue you should be tackling. If you're not experiencing performance or storage issues, then don't bother optimizing until it becomes a problem. Whatever you do, don't waste your time on premature optimizations.
Next, assuming it actually is an issue, try out each method of saving to see which one yields the smallest files in the shortest amount of time. Maybe compression is the answer, but that might slow things down? Maybe pickling objects would be faster? Who knows until you've tried.
Finally, weigh the trade-offs and decide which method you can compromise on; you'll almost never have one silver-bullet solution. While you're at it, determine whether just throwing more CPU, RAM or disk space at the problem would solve it. Cloud computing affords you a lot of headroom in those areas.
The simplest way is np.savez_compressed(). This saves any number of arrays using the same format as np.save(), but encapsulated in a standard Zip file.
If you need to be able to add more arrays to an existing file, you can do that easily, because after all the NumPy ".npz" format is just a Zip file. So open or create a Zip file using zipfile, and then write arrays into it using np.save(). The APIs aren't perfectly matched for this, so you can first write into an in-memory buffer (io.BytesIO on Python 3) with np.save(), then add the buffer's contents to the archive with zipfile's writestr().
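A sketch of both approaches (the array names image/speed/angle are just examples matching the question, and the file names are made up):

import io
import zipfile
import numpy as np

# One shot: save several named arrays into a single compressed .npz file.
image = np.zeros((120, 160, 3), dtype=np.uint8)     # example frame from OpenCV
speed = np.array(0.42)
angle = np.array(-3.0)
np.savez_compressed('frame_0001.npz', image=image, speed=speed, angle=angle)

loaded = np.load('frame_0001.npz')
print(loaded['image'].shape, float(loaded['speed']), float(loaded['angle']))

# Incremental: append another array to the existing .npz by treating it as a
# plain Zip archive; each member written as '<name>.npy' shows up as key '<name>'.
buf = io.BytesIO()
np.save(buf, np.full((120, 160, 3), 7, dtype=np.uint8))
with zipfile.ZipFile('frame_0001.npz', mode='a') as zf:
    zf.writestr('image_0002.npy', buf.getvalue())

print(np.load('frame_0001.npz').files)   # now also contains 'image_0002'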

Save data to disk to avoid re-computation

I'm a beginner with Python, but I'm at the final stages of a project I've been working on for the past year and I need help at the final step.
If needed I'll post my code though it's not really relevant.
Here is my problem:
I have a database of images, say for example 100 images. On each one of those images, I run an algorithm called ICA. This algorithm is computationally very heavy; each picture usually takes 7-10 seconds, so 100 pictures can take 700-1000 seconds, which is way too long to wait.
Thing is, my database of images never changes. I never add pictures or delete pictures, and so the output of the ICA algorithm is always the same. So in reality, every time I run my code, I wait forever and gain the same output every time.
Is there a way to save the data to the hard disk, and extract it at a later time?
Say, I compute the ICA of the 100 images, it takes forever, and I save it and close my computer. Now when I run the program, I don't want it to recompute ICA, I want it to use the values I stored previously.
Would such a thing be possible in Python? If so, how?
Since you're running computation-heavy algorithms, I'm going to assume you're using Numpy. If not, you should be.
Numpy has a numpy.save() function that lets you save arrays in binary format. You can then load them using numpy.load().
EDIT: Docs for the aforementioned functions can be found here, under the "NPZ Files" section.
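For example, a minimal caching pattern along those lines (the cache file name and the run_ica stub are placeholders for your own ICA code and image database):

import os
import numpy as np

CACHE_FILE = 'ica_results.npz'             # hypothetical cache file name

def run_ica(image):
    # Stand-in for the real (slow) ICA computation on one image.
    return image.mean(axis=0)

def get_ica_results(images):
    """Return the ICA output for every image, computing it only once."""
    if os.path.exists(CACHE_FILE):
        cached = np.load(CACHE_FILE)       # cheap: just reads the saved arrays
        return {name: cached[name] for name in cached.files}
    results = {name: run_ica(img) for name, img in images.items()}
    np.savez(CACHE_FILE, **results)        # the slow pass happens only on the first run
    return results

# Example with dummy data standing in for the real image database.
images = {f'img_{i:03d}': np.random.rand(64, 64) for i in range(5)}
print(len(get_ica_results(images)))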
