OK, I have a question about how to lay out code efficiently.
I have a model written in python which generates results which I use to produce graphs in matplotlib. As written, the model is contained within a single file, and I have 15 other run-files, which call on it with complicated configurations and produce graphs. It takes a while to go through and run each of these run-files, but since they all use substantially different settings for the model, I need to have complicated setup files anyway, and it all works.
I have the output set up for figures which could go in an academic paper. I have now realised that I am going to need each of these figures again in other formats - one for presentations (low dpi, medium size, different font) and one for a poster (high dpi, much bigger, different font again.)
This means I could potentially have 45 odd files to wade through every time I want to make a change to my model. I also would have to cut and paste a lot of boilerplate matplotlib code with minor alterations (each run-file would become 3 different files - one for each graph).
Can anybody explain to me how (and if) I could speed things up? At the moment, I think it's taking me much longer than it should.
As I see it there are 3 main options:
Set up 3 run-files for each actual model run (so duplicate a fair amount, and run the model a lot more than I need) but I can then tweak everything independently (but risk missing something important).
Add another layer - so save the results as .csv or equivalent and then read them into the files for producing graphs. This means more files, but I only have to run the model once per 3 graphs (which might save some time).
Keep the graph and model parameter files integrated, but add another file which sets up graphing templates, so every time I run the file it spits out 3 graphs) It might speed things up a bit, and will certainly keep the number of files down, but they will get very big (and probably much more complicated).
Something else..
Can anybody point me to a resource or provide me with some advice on how best to handle this?
Thanks!
I think you are close to finding what you want.
If the calculations take some time, store the results in files so they can be processed later without recalculation.
Most importantly: separate code from configuration, instead of copy-pasting variations of the two mixed together.
If the model takes parameters, define a model class. You might instantiate the model only once, but the model knows how to load_config, read_input_data and run, and it also does write_results. That way you can loop over a sequence of load_config, read_data, run, write_results for every config (and possibly input data).
Write the config files by hand, in ini format for example, and use the configparser module to load them.
Do something similar for your Graph class. Put the template definitions in configuration files, including output format, sizes, fonts, and so on.
In the end you will be able to "manage" the intended workflow with a single script that uses these facilities. Maybe store groups of related configuration files, output templates and input data together, one group per folder for each modelling session.
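The loop described above might be sketched like this. Everything here is hypothetical (the class name, the parameter names alpha and steps, and the file names are all made up for illustration); the point is the shape: one Model, many configs, one driver loop.

```python
import configparser

# Write a sample config so the sketch is self-contained; in practice
# you would keep one hand-written .ini per model run.
with open("run1.ini", "w") as f:
    f.write("[model]\nalpha = 0.5\nsteps = 4\n")

class Model:
    def load_config(self, path):
        cfg = configparser.ConfigParser()
        cfg.read(path)
        # Parameter names here are invented for illustration.
        self.alpha = cfg.getfloat("model", "alpha")
        self.steps = cfg.getint("model", "steps")

    def run(self):
        # Stand-in for the real (slow) calculation.
        self.results = [self.alpha * i for i in range(self.steps)]

    def write_results(self, path):
        with open(path, "w") as f:
            for value in self.results:
                f.write(f"{value}\n")

# Driver: one loop over all configurations instead of 15 run-files.
model = Model()
for config_path in ["run1.ini"]:
    model.load_config(config_path)
    model.run()
    model.write_results(config_path.replace(".ini", "_results.csv"))
```

The graphing stage can then read the saved results files with its own set of template configs, so the model is only run once per set of three graphs.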
What are you trying to achieve?:
I am currently doing a reaction time and accuracy task that involves comparing visually presented numerical information and auditory numerical information. The visually presented numerical information will be presented in three forms: Arabic numerals (e.g. 5), number words (e.g. five), and non-symbolic magnitude (a picture of 5 dots). Both visual numerical information and auditory numerical information will be presented sequentially. After the presentation of the 2nd stimulus, participants are to respond whether these two stimuli convey the same information or not. They are supposed to press "a" if the numerical information is the same and "l" if it's different.
Apart from varying the format of the visual numerical stimuli I am presenting, I also intend to vary the stimulus onset asynchrony (SOA)/time interval between the two stimuli. I have 7 levels of time intervals/SOAs (±750, ±500 and ±250 ms, plus 0 ms), which led me to structure my experiment as shown (see attached picture).
One set of fixation_cross and VA_750ms (for example) constitutes a block. Hence, in total, there are 7 blocks here (only 4 are pictured though). I have already randomized the trials within each block. The next step for me is to randomize the presentation of these blocks, with one block denoting one level of SOA/time interval (e.g +750ms). To do this, I’ve placed a loop around all the blocks, with this loop titled “blocknames” in the picture. While the experiment still works fine, randomization still doesn’t occur.
I understand that there was a post addressing the randomization of blocks, but I felt that it was more specific to experiments that only have one routine. This is not very feasible for my case considering that I would have to vary the time interval between two numerical stimuli within a trial.
What did you try to make it work?:
Nevertheless, I’ve tried to create an excel file with the names of the excel files in each condition - across all routines, the excel files actually contain the same information, but they’re just named differently according to what the condition name is (e.g AV500ms, VA750ms). In this case, the experiment still works, but the blocks are still not being randomized.
What specifically went wrong when you tried that?:
With the same excel file, I also tried to label my conditions as $condsFile instead of using the exact document location, but this was what I got instead.
At the same time, I was wondering if I could incorporate my SOA/time interval levels into Excel instead - how would this be carried out in Builder?
This might be some useful background info on my Psychopy software and laptop.
OS (e.g. Win10): Win 10
PsychoPy version (e.g. 1.84.x): 2020.1.3
Standard Standalone? (y/n) Yes
I apologize if this might have been posted a few times. However, I’ve tried to apply these solutions according to what my experiment requires, but to no avail. I’m also quite a new user to Psychopy and am not very sure on how to proceed from here as well. Would really appreciate any advice on this!
This isn't really a programming question per se, as it can be addressed entirely by using the graphical Builder interface of PsychoPy. In the future, you should probably address such questions to the dedicated support forum at https://discourse.psychopy.org rather than here at Stack Overflow.
In essence, your experiment should have a much simpler structure. Embed your two trial routines within a trials loop. After that loop, insert your break routine. Lastly, embed the whole lot within an outer blocks loop, i.e. your experiment will show only three routines and two loops, not the very long structure you currently have. The nested loops mean the two trial routines will run on every trial, while the break routine will run only once per block.
The key aspect to controlling the block order is the outer blocks loop. Connect it to a conditions file that looks like this:
condition_file
block_1.csv
block_2.csv
block_3.csv
block_4.csv
block_5.csv
block_6.csv
block_7.csv
And set the loop to be "random".
In the inner trials loop, put the variable name $condition_file in the conditions file field. The order of blocks will now be randomised across your subjects.
The other key aspect you need to learn is to control more of the task using variables contained within each of your conditions files. e.g. you are currently creating a separate routine for each ISI value (e.g. AV500ms and AV750ms). Instead, you should just have a single routine, called say AV. Make the timings of the stimulus components within that routine be controlled by a variable from your conditions file.
A key principle of programming is DRY: Don't Repeat Yourself (and although you aren't directly programming, under the hood, PsychoPy Builder will generate a Python program for you). Creating multiple routines that differ only in one respect is an indicator that things are not being specified optimally. By having only one routine, if you need to alter it in some way, you only have to do it once, rather than repeat it 7 times. The latter approach is very fragile and hard to maintain, and can easily lead to errors.
There is a resource on controlling blocks of trials here:
https://www.psychopy.org/builder/blocksCounterbalance.html
While we are replicating the implementation of NGNN Dressing as a whole paper, I am stuck on one pickle file which is actually required to progress further i.e. fill_in_blank_1000_from_test_score.pkl.
Can someone help by sharing the same, else with its alternative?
Github implementation doesn't contain the same!
https://github.com/CRIPAC-DIG/NGNN
You're not supposed to use main_score.py (which requires the fill_in_blank_1000_from_test_score.pkl pickle); it's obsolete, but the authors fail to mention this in the README. The problem was raised in this issue. Long story short, use another "main": main_multi_modal.py.
One of the comments explains in detail how to proceed; I will copy it here so that it does not get lost:
Download the pre-processed dataset from the authors (the one on Google Drive, around 7 GB)
Download the "normal" dataset in order to get the text for the images (the one on GitHub, only a few MBs)
Change all the folder paths in the files to your corresponding ones
Run onehot_embedding.py to create the textual features (the rest of the pre-processing was already done by the authors)
Run main_multi_modal.py to train. At the end of the file you can adjust the config of the network (Beta, d, T etc.), so the file Config.py is not needed here.
If you want to train several instances in the for-loop, you need to reset the graph at the beginning of training. Just add tf.reset_default_graph() at the start of the function cm_ggnn().
With this setup, I could reproduce the results fairly well, with the same accuracy as in the paper.
I have an object that I used Blender's "pixelate" (advanced) object function on; this created what looked like thousands of duplicates of a single cube.
Having exported and then re-imported this, I ended up with a single object consisting of some 18,000 cubes.
Different materials have been added to many of the cubes.
The aim is to split the object into "layers" of all the cubes that are at the same height, while retaining their materials.
I have tried a number of things like boolean operations, but that's been prohibitively slow and hasn't always kept the materials.
In addition there are some 70+ layers, so manually creating the layers would be somewhat tedious.
Ideally I'd like to write some kind of script that would filter out each layer at a time and export them (with materials) so they can be rendered as 2D images.
The Python documentation for Blender initially seems somewhat opaque, probably due to the very large size of the API (where do you start?).
Can anyone help with at least some of the steps I would need to write this script? I'm having problems gaining any kind of traction.
In the end I used a partially automated, partially manual method to do what I wanted. Thanks to #keltar I found the Info window and the commands I needed, picking one up from the Python commands in the menu popup:
>>> def dolayer(name):
... bpy.ops.mesh.select_linked(delimit={'SEAM'})
... bpy.ops.mesh.separate(type='SELECTED')
... bpy.data.objects[name].hide = True
...
>>> dolayer('Cube.018')
>>> dolayer('Cube.019')
I selected the next layer with box select, being sure to turn off "limit selection to visible"! Then I simply provide the dolayer function with what will be the new name for the split object (this lets you hide it); the up cursor key is your friend here!
This minimal amount of automation made it practical to separate out 72 layers into separate objects, which allows me to hide different layers and show only the ones I want for each step of the build.
I had completely missed the Info window, which should make scripting very much more accessible!
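A fully scripted route would first need to group the cubes into layers by height. The sketch below shows only that grouping logic in plain Python; the object names and z values are made up, and in Blender you would instead read each object's z-coordinate from obj.location.z via the bpy API and then select/separate each group.

```python
from collections import defaultdict

# Stand-in data: in Blender this would come from iterating over
# bpy.data.objects and reading obj.location.z for each cube.
cubes = [("Cube.001", 0.0), ("Cube.002", 1.0),
         ("Cube.003", 0.0), ("Cube.004", 2.0)]

layers = defaultdict(list)
for name, z in cubes:
    # Round so floating-point noise doesn't split one layer into several.
    layers[round(z, 3)].append(name)

for height in sorted(layers):
    # In Blender, this is where you would select the objects in
    # layers[height] and separate/export them with their materials.
    print(height, layers[height])
```

With the cubes bucketed like this, each layer can be hidden or exported in turn, which is essentially an automated version of the manual box-select workflow above.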
I recently started experimenting with pymc and only just realised that the images being produced by pymc.Matplot.plot, which I use to diagnose whether the MCMC has performed well, are being saved to disk. This results in images appearing wherever I am running my scripts from, and it is time consuming to clear them up. Is there a way to stop figures being saved to disk? I can't see anything clearly in the documentation.
There is currently no way to plot them without their being saved to disk. I would recommend only plotting a few diagnostic parameters and specifying plot=False for the others. That would at least cut down on the volume of plots being generated. I agree that there probably should be a saveplot argument, though.
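Since the images are written to the current working directory, one workaround (not part of the pymc API, just a sketch) is to temporarily switch into a throwaway directory while plotting, so the saved PNGs land somewhere disposable and are cleaned up automatically:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def scratch_cwd():
    """Run the enclosed block inside a temporary directory,
    then restore the old working directory and delete it."""
    old = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            yield tmp
        finally:
            os.chdir(old)

# Usage; the pymc call is shown as a comment because it needs a
# fitted model object, so a plain file write stands in for it here.
with scratch_cwd() as tmp:
    # pymc.Matplot.plot(mcmc)   # saved images would go into tmp
    open("diagnostic.png", "w").close()  # stand-in for a saved figure
    assert os.path.exists(os.path.join(tmp, "diagnostic.png"))
# tmp and everything in it are deleted on exit
```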
I'm a beginner with Python, but I'm at the final stages of a project I've been working on for the past year and I need help at the final step.
If needed I'll post my code though it's not really relevant.
Here is my problem:
I have a database of images, say for example 100 images. On each one of those images, I run an algorithm called ICA. This algorithm is very computationally heavy and each picture individually usually takes 7-10 seconds, so 100 pictures can take 700-1000 seconds, and that's way too long to wait.
Thing is, my database of images never changes. I never add pictures or delete pictures, and so the output of the ICA algorithm is always the same. So in reality, every time I run my code, I wait forever and gain the same output every time.
Is there a way to save the data to the hard disk, and extract it at a later time?
Say, I compute the ICA of the 100 images, it takes forever, and I save it and close my computer. Now when I run the program, I don't want it to recompute ICA, I want it to use the values I stored previously.
Would such a thing be possible in Python? If so, how?
Since you're running computation-heavy algorithms, I'm going to assume you're using Numpy. If not, you should be.
Numpy has a numpy.save() function that lets you save arrays in binary format. You can then load them using numpy.load().
EDIT: Docs for the aforementioned functions can be found here, under the "NPZ Files" section.
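A minimal sketch of the compute-once, load-thereafter pattern with these functions follows. run_ica and the arange array are stand-ins for the real (expensive) ICA step and the image database; the cache file name is made up.

```python
import os
import numpy as np

def run_ica(images):
    # Stand-in for the real ICA computation that takes 7-10 s per image.
    return images * 2.0

CACHE = "ica_results.npy"

if os.path.exists(CACHE):
    results = np.load(CACHE)                 # fast path: reuse stored output
else:
    images = np.arange(12.0).reshape(3, 4)   # stand-in for the image database
    results = run_ica(images)                # slow path: compute once
    np.save(CACHE, results)                  # persist for every later run

print(results.shape)
```

The first run pays the full cost and writes the .npy file; every run after that just loads it, which is exactly the behaviour described in the question.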