Background:
My program currently assembles arrays in Python. These arrays are connected to a front-end UI and as such have interactive elements (i.e. user specified values in array elements). These arrays are then saved to .txt files (depending on their later use). The user must then leave the Python program and run a separate Fortran script which simulates a system based on the Python output files. While this only takes a couple of minutes at most, I would ideally like to automate the process without having to leave my Python UI.
Assemble Arrays (Python) -> Edit Arrays (Python) -> Export to File (Python)
-> Import File (Fortran) -> Run Simulation (Fortran) -> Export Results to File (Fortran)
-> Import File to UI, Display Graph (Python)
Question:
Is this possible? What are my options for automating this process? Can I completely remove the repeated export/import of files altogether?
Edit:
I should also mention that the Fortran code uses LAPACK; I don't know if that makes a difference.
You do not have to pass arrays to Fortran code using text files. If you create an entry point to the Fortran code as a subroutine, you can pass all the numpy arrays using f2py. You are presumably aware of that if you added the f2py tag yourself. Just use any of the numerous tutorials, for example https://github.com/thehackerwithin/PyTrieste/wiki/F2Py or http://www.engr.ucsb.edu/~shell/che210d/f2py.pdf .
The way back is the same: the Fortran code just fills any of the intent(out) or intent(inout) arrays and variables with the results.
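For example, a minimal sketch (the subroutine, file, and module names are placeholders, not from the question):

# Assume a Fortran file simulate.f90 containing a subroutine such as:
#
#   subroutine run_sim(a, n, result)
#     integer, intent(in) :: n
#     double precision, intent(in)  :: a(n)
#     double precision, intent(out) :: result(n)
#     result = 2.0d0 * a        ! stand-in for the real simulation
#   end subroutine run_sim
#
# compiled with:  python -m numpy.f2py -c simulate.f90 -m simulate
import numpy as np
import simulate  # the f2py-generated extension module

a = np.linspace(0.0, 1.0, 100)
result = simulate.run_sim(a)  # n is inferred from a; the intent(out) array is returned
print(result[:5])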
I love the Python+Fortran stack. :)
When needing close communication between your Python front-end and Fortran engine, a good option is to use the subprocess module in Python. Instead of saving the arrays to a text file, you'll keep them as arrays. Then you'll execute the Fortran engine as a subprocess within the Python script. You'll pipe the Python arrays into the Fortran engine and then pipe the results out to display.
This solution will require changing the file I/O in both the Python and Fortran codes to writing and reading to/from a pipe (on the Python side) and from/to standard input and output (on the Fortran side), but in practice this isn't too much work.
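A minimal sketch of the Python side, assuming the Fortran engine has been rebuilt as an executable (here called ./engine, a placeholder) that reads the array length and values from standard input and writes its results to standard output:

import subprocess
import numpy as np

a = np.random.rand(1000)

# Build the text payload the Fortran READ statements expect:
# first the length, then one value per line.
payload = f"{a.size}\n" + "\n".join(f"{x:.17g}" for x in a) + "\n"

proc = subprocess.run(
    ["./engine"],          # the compiled Fortran simulation
    input=payload,
    capture_output=True,
    text=True,
    check=True,
)

# Parse whatever the Fortran code wrote to stdout back into a numpy array.
results = np.array(proc.stdout.split(), dtype=float)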
Good luck!
Related
I use Node.js to call a Python script that runs object detection on some JPG images read from the hard disk. These images are written to disk by Node.js prior to calling the script.
To make it dynamic and faster, I now want to send multiple images as a multidimensional array from Node.js to the Python script. This saves me writing and reading images from disk. Is this the best way to do it? If so, how do I pass images as a multidimensional array to the Python script? Or is there a better solution?
Your question leaves out a lot of specifics, but if you literally mean "pass input to a python script," you have two options: command line arguments or standard input.
While it is technically possible to read a series of images from standard input, it is certainly more complicated, with no benefit that makes it worthwhile.
Command line arguments can only be strings, read from sys.argv. So, while you could try to use multiple command line arguments, it would be more trouble than it's worth to translate an array of strings into a multidimensional array.
So, tl;dr: create a data file format that lets you represent a set of images as a multidimensional array of file paths (or URLs, if you wanted). The easiest by far would be simply to use a CSV file and read it in with the csv Python module.
import csv
# Each row of the CSV becomes a list of image path strings.
with open('filename.csv', 'r', newline='') as f:
    image_path_array = list(csv.reader(f))
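From there, the detection code can open each path as needed, for example with Pillow (assuming it is installed):

from PIL import Image
images = [Image.open(path) for row in image_path_array for path in row]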
I wonder if anyone could point me in the right direction to create code which performs these operations:
- The first code imports a series of libraries, performs some operations, and generates some data (for example, a pandas DataFrame)
- The first code runs a second Python script, using this pandas DataFrame as an argument
- The second code imports a library and performs some operations independently of the objects and variables created by the first program
- The second program generates some data (a second DataFrame, for example) and then stops
- The first program reads this data and works with it before closing
The question is then: how do you make a Python script run a second script separately, but transfer complex objects between them?
A bit of context:
I am using PyRAF, which is a Python-based library that works on top of IRAF. The latter is a very important astronomical library, but it has a rather... conflicting design... For example, if I run matplotlib before creating a PyRAF object, I get crashes. Moreover, I work with Eclipse, and if I run a PyRAF routine there I also get errors. So I need to run the second script as if it were launched from the terminal, independent of where the first code is actually running. I understand I can write files to pass the data from one code to the other, but I wonder if there is a cleaner approach.
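Something along these lines is what I am imagining (a sketch only; the script names are placeholders and I am assuming the DataFrame can simply be pickled):

# first_script.py: run the second script in its own interpreter and exchange
# pickled objects over its stdin/stdout.
import pickle
import subprocess
import sys
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

proc = subprocess.run(
    [sys.executable, "second_script.py"],  # runs independently, as if from the terminal
    input=pickle.dumps(df),
    capture_output=True,
    check=True,
)
result_df = pickle.loads(proc.stdout)      # second_script.py pickles its result to stdout

# second_script.py: read a pickled DataFrame from stdin, work on it,
# and pickle the result back to stdout.
import pickle
import sys

df = pickle.loads(sys.stdin.buffer.read())
out = df * 2                               # stand-in for the real processing
sys.stdout.buffer.write(pickle.dumps(out))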
Thanks for your patience :)
I have a Python script that needs to read a huge file into a variable and then search through it and perform other operations.
The problem is the web server calls this script multiple times, and every time I am getting a latency of around 8 seconds while the file loads.
Is it possible to make the file persist in memory to have faster access to it at later times?
I know I can make the script a service using supervisor, but I can't do that for this.
Any other suggestions, please?
PS: I am already using var = pickle.load(open(file))
You should take a look at http://docs.h5py.org/en/latest/. It allows you to perform various operations on huge files. It's what NASA uses.
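A minimal sketch (the file and dataset names are placeholders) that stores the data once and then reads only the slice each request needs, without loading the whole file into memory:

import h5py
import numpy as np

# One-time conversion: store the big array in an HDF5 file.
with h5py.File("bigdata.h5", "w") as f:
    f.create_dataset("data", data=np.random.rand(10_000_000))

# Per-request access: open the file and read only the slice that is needed;
# h5py does not pull the whole dataset into memory.
with h5py.File("bigdata.h5", "r") as f:
    chunk = f["data"][100_000:100_010]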
Not an easy problem. I assume you can do nothing about the fact that your web server calls your application multiple times. In that case I see two solutions:
(1) Write TWO separate applications. The first application, A, loads the large file and then it just sits there, waiting for the other application to access the data. "A" provides access as required, so it's basically a sort of custom server. The second application, B, is the one that gets called multiple times by the web server. On each call, it extracts the necessary data from A using some form of interprocess communication. This ought to be relatively fast. The Python standard library offers some tools for interprocess communication (socket, http server) but they are rather low-level. Alternatives are almost certainly going to be operating-system dependent.
(2) Perhaps you can pre-digest or pre-analyze the large file, writing out a more compact file that can be loaded quickly. A similar idea is suggested by tdelaney in his comment (some sort of database arrangement).
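A minimal sketch of option (1) using the standard library's multiprocessing.connection (the file name, address, key, and query logic are placeholders, and the loaded object is assumed to be a dict):

# data_server.py: load the large file once, then answer queries over a local socket.
import pickle
from multiprocessing.connection import Listener

with open("huge_file.pkl", "rb") as f:
    data = pickle.load(f)                  # the slow 8-second load happens only once

with Listener(("localhost", 6000), authkey=b"secret") as listener:
    while True:
        with listener.accept() as conn:
            query = conn.recv()
            conn.send(data.get(query))     # stand-in for the real search

# web_handler.py: called by the web server; asks the running server instead of reloading.
from multiprocessing.connection import Client

with Client(("localhost", 6000), authkey=b"secret") as conn:
    conn.send("some_key")
    result = conn.recv()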
You are talking about memory-caching a large array, essentially…?
There are three fairly viable options for large arrays:
use memory-mapped arrays
use h5py or pytables as a back-end
use an array caching-aware package like klepto or joblib.
Memory-mapped arrays index the array in a file, as if it were in memory.
h5py or pytables give you fast access to arrays on disk, and also can avoid loading the entire array into memory. klepto and joblib can store arrays as a collection of "database" entries (typically a directory tree of files on disk), so you can load portions of the array into memory easily. Each has a different use case, so the best choice for you depends on what you want to do. (I'm the klepto author, and it can use SQL database tables as a backend instead of files).
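For instance, a minimal sketch of the memory-mapped option with numpy (the file name, dtype, and shape are placeholders):

import numpy as np

# Create a memory-mapped array backed by a file on disk; only the portions
# that are actually accessed get paged into memory.
arr = np.memmap("big_array.dat", dtype="float64", mode="w+", shape=(100_000, 100))
arr[0, :10] = np.arange(10)
arr.flush()

# Later processes can reopen the same file read-only and slice it cheaply.
view = np.memmap("big_array.dat", dtype="float64", mode="r", shape=(100_000, 100))
print(view[0, :10])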
I have a MATLAB file that currently saves its variables into a .mat workspace. The Python script uses scipy.io to read these variables from the workspace. The Python script performs some operations and resaves the variables into a MATLAB workspace (again using scipy.io), which MATLAB should then reopen. I'm using MATLAB R2013a and I don't think there's an easy way to run the Python script from within the .m file itself.
There may be an easier way than the method I'm going about it, but my current plan is to create a bash script that runs the MATLAB file and only proceeds to the latter section if a variable (stored in another file) is of a certain value. The script then calls the Python script and sets that variable to a different value (it can be viewed as a sort of boolean). The MATLAB script will then execute the second section but not the first section. I need to have about 5 or 6 such exclusive sections, however, and it's easier to have them all in the same .m file than it is to separate them.
This seems tedious, however, when all I really want is a way to have the system pause the MATLAB script, run the Python script, and come back to that spot in the MATLAB script.
I'd appreciate any creative suggestions to make this workflow as efficient as possible and easy to modify.
MATLAB code detailed below
I saved the workspace using MATLAB's save function
Used MATLAB's system() function to execute the python script.
Within Python, used scipy.io.savemat to save the variables I wanted to access in MATLAB
Used MATLAB's load function to load the variables from Python back into MATLAB's workspace
writeto = ['insert path to save to here'];       % .mat file to save the workspace to
save(writeto)                                    % dump all workspace variables to that file
first_Pypath = ['insert path of python script here'];
py_call = horzcat('python ', first_Pypath);      % build the shell command
system(py_call);                                 % run the Python script and wait for it to finish
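On the Python side, the corresponding step is roughly this (a sketch; the .mat file names and the variable name are assumptions standing in for the placeholders above):

# python_script.py: read the variables MATLAB saved, process them,
# and save the results where the MATLAB script expects to find them.
import scipy.io

mat = scipy.io.loadmat("workspace_from_matlab.mat")  # dict of variable name -> array
result = mat["some_variable"] * 2                    # stand-in for the real processing
scipy.io.savemat("results_for_matlab.mat", {"result": result})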
I'm working on a side project where we want to process images in a Hadoop MapReduce program (for eventual deployment to Amazon's Elastic MapReduce). The input to the process will be a list of all the files, each with a little extra data attached (the lat/long position of the bottom left corner - these are aerial photos).
The actual processing needs to take place in Python code so we can leverage the Python Imaging Library. All the Python streaming examples I can find use stdin and process text input. Can I send image data to Python through stdin? If so, how?
I wrote a Mapper class in Java that takes the list of files and saves the names, the extra data, and the binary contents to a sequence file. I was thinking maybe I need to write a custom Java mapper that takes in the sequence file and pipes it to Python. Is that the right approach? If so, what should the Java code to pipe the images out and the Python code to read them in look like?
In case it's not obvious, I'm not terribly familiar with Java OR Python, so it's also possible I'm just biting off way more than I can chew with this as my introduction to both languages...
There are a few possible approaches that I can see:
Use both the extra data and the file contents as input to your python program. The tricky part here will be the encoding. I frankly have no idea how streaming works with raw binary content, and I'm assuming that the basic answer is "not well." The main issue is that the stdin/stdout communication between processes is very text-based, relying on delimiting input with tabs and newlines, and things like that. You would need to worry about the encoding of the image data, and probably have some sort of pre-processing step, or a custom InputFormat, so that you could represent the image as text.
Use only the extra data and the file location as input to your python program. Then the program can independently read the actual image data from the file. The hiccup here is making sure that the file is available to the python script. Remember this is a distributed environment, so the files would have to be in HDFS or somewhere similar, and I don't know if there are good libraries for reading files from HDFS in python.
Do the Java-Python interaction yourself. Write a Java mapper that uses the Runtime class to start the Python process itself. This way you get full control over exactly how the two worlds communicate, but obviously it's more code and a bit more involved.
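As an illustration of the encoding idea in option 1, a streaming mapper could receive each image as base64 text, one record per line (a sketch only; the lat<TAB>long<TAB>base64data line format is an assumption, not anything Hadoop provides):

# mapper.py: read tab-separated lines of "lat<TAB>long<TAB>base64-image" from
# stdin, decode each image, and emit one result line per image.
import base64
import io
import sys

from PIL import Image

for line in sys.stdin:
    lat, lon, b64 = line.rstrip("\n").split("\t")
    img = Image.open(io.BytesIO(base64.b64decode(b64)))
    width, height = img.size      # stand-in for the real image processing
    print(f"{lat}\t{lon}\t{width}x{height}")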