How to dynamically rename the hdf5 file from psychopy's iohub - python

I'm using the PsychoPy 1.82.01 Coder and its iohub functionality (on Ubuntu 14.04 LTS). It is working, but I was wondering if there is a way to dynamically rename the hdf5 file it produces during an experiment, so that in the end I know which participant each file belongs to and two participants get two separate files instead of one overwriting the other.
It seems to me that the filename is determined in this file: https://github.com/psychopy/psychopy/blob/df68d434973817f92e5df78786da313b35322ae8/psychopy/iohub/default_config.yaml
But is there a way to change this dynamically?

If you want to create a different hdf5 file for each experiment run, then the options depend on how you are starting the ioHub process. Assuming you are using the psychopy.iohub.launchHubServer() function to start ioHub, then you can pass the 'experiment_code' kwarg to the function and that will be used as the hdf5 file name.
For example, if you created a script with the following code and ran it:
import psychopy.iohub as iohub
io = iohub.launchHubServer(experiment_code="exp_sess_1")
# your experiment code here ....
# ...
io.quit()
An ioHub hdf5 file called 'exp_sess_1.hdf5' will be created in the same folder as the script file.
As a side note, you do not have to save each experiment session's data in a separate hdf5 file. The ioHub hdf5 file structure is designed to hold the data of multiple participants / sessions in a single file. Each time the experiment is run, a unique session code is required, and the data from each run is saved in the hdf5 file under a session id that is associated with that session code.
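To get one file per participant, the 'experiment_code' value can be built at run time. The following is only a sketch: build_experiment_code and launch_for_participant are hypothetical helper names, the "exp" prefix is an arbitrary choice, and the only ioHub call used is the launchHubServer(experiment_code=...) form shown above.

```python
import re


def build_experiment_code(participant_id, prefix="exp"):
    """Return a file-name-safe code such as 'exp_p007' (hypothetical helper)."""
    # Replace anything that is not a letter, digit, underscore or hyphen.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "_", str(participant_id).strip())
    if not safe_id:
        raise ValueError("participant_id must not be empty")
    return "{}_{}".format(prefix, safe_id)


def launch_for_participant(participant_id):
    """Start ioHub so that the hdf5 file is named after the participant."""
    # Imported here so the helper above can be used without PsychoPy installed.
    import psychopy.iohub as iohub
    return iohub.launchHubServer(
        experiment_code=build_experiment_code(participant_id))
```

Calling launch_for_participant("p007") at the start of the experiment script should then produce 'exp_p007.hdf5', following the naming behaviour described above. The participant id itself could come from a dialog box or a command line argument.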

Related

Script for post-processing vtu or pvd files

I am pretty new to post-processing with Python scripts. Until now I have used ParaView to look at my results (images at different time steps), but as my mesh resolution increases, the image for the next time step takes forever to load. Therefore, I would like to create a Python script that saves the results at every time step as images (png or jpeg) and perhaps also merges the images into a video file.
I have a folder SavingsforParaview which contains a single .pvd file and 217 .vtu files, one for each time step. In ParaView, we load the pvd file and then visualize everything. Now, I would like to build a script to do the same. I don't want to use the inbuilt python script in ParaView, but create a separate file that I can run in a terminal using python commands.
The files can be found here.
https://filesender.renater.fr/?s=download&token=6aad92fb-dde3-41e0-966d-92284aa5884e
You can use the Python Trace feature, found in the Tools menu.
Usage:
start trace
use PV as usual (load files, setup filters and views, take screenshot ...)
stop trace
It generates a Python version of your actions and displays it. You can then save it as a Python file and modify it manually.
For instance, you can do the visualization for the first two timesteps and then edit the trace file to add a loop that covers every timestep.
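An edited trace with such a loop might look roughly like the sketch below. It is not the output of an actual trace: frame_path and render_all_timesteps are hypothetical names, the frame naming scheme is an assumption, and the filters, colouring and camera settings recorded in your own trace would have to be added where indicated. It relies on the paraview.simple module, so it is meant to be run from a terminal with ParaView's pvpython interpreter.

```python
import os


def frame_path(out_dir, index):
    """Build a zero-padded image name such as out_dir/frame_0003.png."""
    return os.path.join(out_dir, "frame_{:04d}.png".format(index))


def render_all_timesteps(pvd_file, out_dir):
    """Save one screenshot per timestep of the given .pvd file."""
    # paraview.simple is only importable where ParaView's Python modules
    # are available, for example inside pvpython.
    from paraview.simple import (GetActiveViewOrCreate, PVDReader, Render,
                                 SaveScreenshot, Show)

    os.makedirs(out_dir, exist_ok=True)
    reader = PVDReader(FileName=pvd_file)
    view = GetActiveViewOrCreate("RenderView")
    Show(reader, view)
    view.ResetCamera()
    # Filters, colouring and camera settings from your own trace go here.

    for index, time_value in enumerate(reader.TimestepValues):
        view.ViewTime = time_value  # move the view to this timestep
        Render(view)
        SaveScreenshot(frame_path(out_dir, index), view)
```

The zero-padded frame names keep the images in order, which helps if they are later merged into a video with a separate tool.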

How to version-control a set of input data along with its processing scripts?

I am working with a set of Python scripts that take data from an Excel file that is set up to behave as a pseudo-database. Excel is used instead of SQL software because of compatibility and access requirements for other people I work with who aren't familiar with databases.
I have a set of about 10 tables with multiple records in each and relational keys linking them all (again in a pseudo-linking kind of way, using some flimsy data validation).
The scripts I am using are version controlled by Git, and I know the pitfalls of adding a .xlsx file to a repo, so I have kept it away. Since the data is a bit vulnerable, I want to make sure I have a way of keeping track of any changes we make to it. My thought was to have a script that breaks the Excel file into .csv tables and adds those to the repo, i.e.:
import pandas as pd
from pathlib import Path
excel_input_file = Path(r"<...>")
output_path = Path(r"<...>")
# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
tables_dict = pd.read_excel(excel_input_file, sheet_name=None)
for sheet_name, table in tables_dict.items():
    table.to_csv(output_path / (sheet_name + '.csv'), index=False)
Would this be a typically good method for keeping track of the input files at each stage?
Git tends to work better with text files rather than binary files, as you've noted, so this would be a better choice than just checking in an Excel file. Specifically, Git would be able to merge and diff these files, whereas they couldn't be merged natively by Git otherwise.
Typically the way that people handle this sort of situation is to take one or more plain text input files (e.g., CSV or SQL) and then build them into the usable output format (e.g., Excel or database) as part of the build or test step, depending on where they're needed. I've done similar things by using a Git fast-export dump to create test Git repositories, and it generally works well.
If you had just one input file, which you don't in this case, you could also use smudge and clean filters to turn the source file in the repository into a different format in the checkout. You can read about this with man gitattributes.
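As an illustration of the build-step idea, the sketch below loads a folder of version-controlled CSV tables into a SQLite database using only the standard library. build_database is a hypothetical helper, and it makes simplifying assumptions: each file's first row is the header, the file name becomes the table name, and all values are stored as text.

```python
import csv
import sqlite3
from pathlib import Path


def quote_identifier(name):
    """Quote a table or column name for use in a SQLite statement."""
    return '"{}"'.format(name.replace('"', '""'))


def build_database(csv_dir, db_path=":memory:"):
    """Load every <table>.csv in csv_dir into a table of the same name."""
    conn = sqlite3.connect(str(db_path))
    for csv_file in sorted(Path(csv_dir).glob("*.csv")):
        with csv_file.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        if not rows:
            continue  # skip empty files
        header, records = rows[0], rows[1:]
        table = quote_identifier(csv_file.stem)
        columns = ", ".join(quote_identifier(name) for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute("CREATE TABLE {} ({})".format(table, columns))
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(table, placeholders), records)
    conn.commit()
    return conn
```

With this arrangement only the CSV files live in the repository, and the database (or, analogously, an Excel workbook) is regenerated whenever it is needed.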

pyspark MLUtils saveaslibsvm saving only under _temporary and not saving on master

I am using PySpark and MLUtils.saveAsLibSVMFile to save an RDD of LabeledPoint objects.
It works, but the output ends up as many files under /_temporary/ on each of the worker nodes.
No error is thrown. I would like the files to be saved in the proper folder, preferably with all the output in a single libsvm file located either on the nodes or on the master.
Is that possible?
Edit:
No matter what I do, I can't use MLUtils.loadLibSVMFile() to load the libsvm data from the same path I used to save it. Maybe something is wrong with how the file is written?
This is normal behavior for Spark. All reading and writing is performed in parallel directly from the worker nodes, and data is not passed to or from the driver node.
This is why reading and writing should use storage that can be accessed from every machine, such as a distributed file system, an object store or a database. Using Spark with a local file system has very limited applications.
For testing you can use a network file system (it is quite easy to deploy), but it won't work well in production.
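A rough sketch of what this could look like in PySpark is given below. is_shared_uri and save_and_reload are hypothetical helpers, the scheme check is only a crude heuristic, and the storage location passed in is assumed to be reachable from every node. coalesce(1) is used so that the output directory contains a single part file, which is only sensible when the data fits comfortably in one partition.

```python
import warnings


def is_shared_uri(path):
    """Crude check: True when the path has a scheme other than file://."""
    scheme = path.split("://", 1)[0] if "://" in path else ""
    return scheme not in ("", "file")


def save_and_reload(sc, labeled_points, path):
    """Save an RDD of LabeledPoint as one libsvm part file and load it back."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.mllib.util import MLUtils

    if not is_shared_uri(path):
        warnings.warn("'{}' looks like a local path; on a cluster each worker "
                      "would write to its own disk".format(path))
    # coalesce(1) moves all records into one partition, hence one part file.
    MLUtils.saveAsLibSVMFile(labeled_points.coalesce(1), path)
    return MLUtils.loadLibSVMFile(sc, path)
```

Note that the path names a directory, not a file: Spark writes the part file inside it, and loadLibSVMFile reads the directory as a whole.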

Saving File to Dictionary - Python

Is there any way to save a file to a dictionary under python?
(Indeed I am not asking how to export dictionaries to files here.)
Maybe a file could be pickled or transformed into some Python object
and then saved.
Is this generally advisable?
Or should I only save the file's path to the dictionary?
How would I retrieve the file later on?
The background of my question relates to my usage of dictionaries as
databases. I use the handy little module sqliteshelf as a form of permanent dictionary: https://github.com/shish/sqliteshelf
Each dataset includes a unique config file (~500 kB) which is retrieved from an application. When a data set is opened, its config file is copied into the working directory of the application, and copied back out afterwards. I could instead use a folder to save the config files in. Yet it strikes me as more elegant to save them together with the other data.

can linux command line programs see python temporary files?

I have a simple web-server written using Python Twisted. Users can log in and use it to generate certain reports (pdf-format), specific to that user. The report is made by having a .tex template file where I replace certain content depending on user, including embedding user-specific graphs (.png or similar), then use the command line program pdflatex to generate the pdf.
Currently the graphs are saved in a tmp folder, and that path is then put into the .tex template before calling pdflatex. But this probably opens up a whole pile of problems when the number of users increases, so I want to use temporary files (tempfile module) instead of a real tmp folder. Is there any way I can make pdflatex see these temporary files? Or am I doing this the wrong way?
without any code it's hard to tell you how, but
Is there any way I can make pdflatex see these temporary files?
Yes, you can get the path of a temporary file by using a named temporary file:
>>> import tempfile
>>> with tempfile.NamedTemporaryFile() as temp:
...     print(temp.name)
...
/tmp/tmp7gjBHU
As commented you can use tempfile.NamedTemporaryFile. The problem is that this will be deleted once it is closed. That means you have to run pdflatex while the file is still referenced within python.
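A minimal sketch of that pattern follows: the external command is started inside the with block, so the file still exists while the command runs. run_while_open and its build_command argument are hypothetical names, and the ".png" suffix is just an example; for the report generator, build_command would return the pdflatex command line after the path has been written into the .tex file.

```python
import subprocess
import tempfile


def run_while_open(data, build_command):
    """Write data to a named temporary file and run a command on it.

    build_command receives the file's path and returns the argument list
    to execute. The file is deleted when the with block exits.
    """
    with tempfile.NamedTemporaryFile(suffix=".png") as temp:
        temp.write(data)
        temp.flush()  # make sure the bytes are on disk before the child reads
        return subprocess.check_output(build_command(temp.name))
```

This works on Linux, where another process can open the file while Python still holds it open.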
Alternatively, you could just save the picture under a randomly generated name. The tempfile module is designed to let you create temporary files on various platforms in a consistent way. That is not what you need here, since I guess you'll always run the script on the same web server.
You could generate random file names using the uuid module:
import uuid
for i in range(3):
    print(str(uuid.uuid4()))
Then you save the pictures explicitly using the random name and insert that name into the .tex file.
After running pdflatex you explicitly have to delete the file, which is the drawback of that approach.
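The save, use and delete steps can be wrapped so that the cleanup is not forgotten. This is only a sketch: with_random_file is a hypothetical helper, and action stands in for whatever fills in the .tex template and calls pdflatex.

```python
import os
import uuid


def with_random_file(data, directory, action, suffix=".png"):
    """Save data under a random name, call action(path), then delete it."""
    path = os.path.join(directory, uuid.uuid4().hex + suffix)
    with open(path, "wb") as handle:
        handle.write(data)
    try:
        return action(path)
    finally:
        # Runs even if action raises, so no stray files are left behind.
        os.remove(path)
```

Because the name is random, reports generated for different users at the same time do not collide, even when they share one folder.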
