I use nodejs to call a python script that runs object detection on some jpg images read from the hard disk. These images are written to disk by nodejs prior to calling the script.
To make it dynamic and faster, I now want to send multiple images as a multidimensional array from nodejs to the python script. This saves me writing and reading images from disk. Is this the best way to do it? If so, how do I pass images as a multidimensional array to the python script? Or is there a better solution?
Your question leaves out a lot of specifics, but if you literally mean "pass input to a python script," you have two options: command line arguments or standard input.
While it is technically possible to read a series of images from standard input, it is certainly more complicated, with no benefit to make it worthwhile.
Command line arguments can only be strings, read from sys.argv. So, while you could try to use multiple command line arguments, it would be more trouble than it's worth to translate an array of strings into a multidimensional array.
So, tl;dr: create a data file format that lets you represent a set of images as a multidimensional array of file paths (or URLs, if you wanted). The easiest by far would be simply to use a CSV file and read it in with the csv Python module.
import csv

with open('filename.csv', 'r', newline='') as f:
    image_path_array = list(csv.reader(f))
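As a self-contained sketch (file and directory names are made up for illustration), this is the round trip: the nodejs side writes one row of image paths per line, and the Python side reads it back as a nested list:

```python
import csv
import os
import tempfile

# Example rows as the nodejs side might write them:
# each row is one batch of image paths.
rows = [
    ['img/a1.jpg', 'img/a2.jpg'],
    ['img/b1.jpg', 'img/b2.jpg'],
]

path = os.path.join(tempfile.mkdtemp(), 'filename.csv')
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# Read it back: a list of rows, each row a list of path strings.
with open(path, 'r', newline='') as f:
    image_path_array = list(csv.reader(f))

print(image_path_array)  # round-trips to the original nested list
```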
I am running some numerical code in Python (using numpy) on a Linux cluster that can be submitted as an array job. This means that the same code will run in thousands of parallel instances, each generating some data (using a random input) and saving it as a .npy file.
To avoid generating thousands of output files, I want to save all the data to the same file. However, I'm worried that if two instances of the code try to open the file simultaneously, it might be corrupted.
What is the best way to do this?
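One common pattern (sketched below with made-up file names; the task-ID scheme is an assumption, e.g. taken from `$SLURM_ARRAY_TASK_ID`) is to sidestep concurrent access entirely: each instance writes its own uniquely named file, and a single post-processing step merges them afterwards.

```python
import glob
import os
import tempfile

import numpy as np

outdir = tempfile.mkdtemp()

def run_instance(task_id, rng_seed):
    # Each array-job instance writes to its own file, named by its task
    # ID, so no two instances ever open the same file.
    rng = np.random.default_rng(rng_seed)
    data = rng.standard_normal(5)  # stand-in for the real computation
    np.save(os.path.join(outdir, f'result_{task_id:04d}.npy'), data)

# Simulate a few instances of the array job.
for tid in range(3):
    run_instance(tid, rng_seed=tid)

# A single merge step, run once at the end, combines everything.
parts = [np.load(p)
         for p in sorted(glob.glob(os.path.join(outdir, 'result_*.npy')))]
merged = np.stack(parts)
np.save(os.path.join(outdir, 'all_results.npy'), merged)
print(merged.shape)  # (3, 5)
```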
I learned that file-like objects created in Python from io.BytesIO or io.StringIO are not stored on the disk. Are they stored in memory, like variables? If not, where?
Also, is there a way to store them on the disk?
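Yes, they live in an in-memory buffer owned by the Python process; nothing touches the disk unless you write the buffer out yourself. A minimal sketch (file names are made up):

```python
import io
import os
import tempfile

# io.BytesIO keeps its contents in process memory, like any other object.
buf = io.BytesIO()
buf.write(b'\x49\x44\x33 some bytes that never touch the disk')

# To store it on disk, dump the buffer's contents into a real file.
path = os.path.join(tempfile.mkdtemp(), 'dump.bin')
with open(path, 'wb') as f:
    f.write(buf.getvalue())

print(os.path.getsize(path))  # same as len(buf.getvalue())
```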
My code writes mp3 data to files, which takes a long time (30+ seconds to write ~3.5 MB). I planned to split the input data and write to files simultaneously, but I have no idea how to do this. Do I need to run multiple python scripts? I don't mind writing to different files and later reading and editing the content. Can you point me to any references to get started?
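You don't need multiple scripts; one option is a thread pool inside a single script, with each worker writing its chunk to its own file. A rough sketch under that assumption (chunking scheme and file names are made up):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

data = bytes(range(256)) * 1000  # stand-in for the mp3 data
n_parts = 4
chunk = len(data) // n_parts
outdir = tempfile.mkdtemp()

def write_part(i):
    # Last part takes any remainder so no bytes are dropped.
    end = (i + 1) * chunk if i < n_parts - 1 else len(data)
    path = os.path.join(outdir, f'part_{i}.mp3')
    with open(path, 'wb') as f:
        f.write(data[i * chunk:end])
    return path

# Write all parts concurrently from one process.
with ThreadPoolExecutor(max_workers=n_parts) as pool:
    paths = list(pool.map(write_part, range(n_parts)))

# Later, the parts can be read back and concatenated in order.
rejoined = b''.join(open(p, 'rb').read() for p in sorted(paths))
print(rejoined == data)  # True
```

Whether this is actually faster depends on the disk; for a single spinning disk, concurrent writes may not help much, so it's worth benchmarking.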
I am having trouble reading with mpi4py. I have opened the file and read it; however, I then want to do manipulation in Python with a list. I get an error that the mpi4py datatype object does not support this. How do I read it into an object Python supports, or convert it?
I use MPI.File.Seek(block_start), where block_start is nprocs/size, then MPI.File.Read (after opening the file with MPI). It takes a buffer argument; I'm not certain what that is, but I use a bytearray of size block_end - block_start. I have figured out how to use Python to turn the bytearray into a string, do the manipulation I need, then turn it back into a bytearray and print. I am wondering if there is a more efficient way.
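The bytearray-to-str round trip itself is straightforward and can be sketched in plain Python (the MPI calls are shown only as comments, since they assume a running MPI environment, and the file name and offsets are hypothetical):

```python
# Suppose the MPI read has already filled this buffer, e.g. via
#   fh = MPI.File.Open(comm, 'ticks.txt', MPI.MODE_RDONLY)
#   fh.Seek(block_start)
#   fh.Read(buf)
buf = bytearray(b'20240101:09:30:00.000001,101.5,300\n'
                b'20240101:09:30:00.000002,101.6,200\n')

# Decode the raw bytes into a str so ordinary string/list operations work.
text = buf.decode('ascii')
lines = text.splitlines()

# ... manipulate as ordinary Python lists and strings ...
upper = [ln.upper() for ln in lines]

# Encode back into a bytearray whenever MPI needs a buffer again.
out = bytearray('\n'.join(upper).encode('ascii'))
print(len(lines))  # 2
```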
The task is to read financial tick data (date, price, volume), with the date coming in at microsecond resolution in the form yyyymmdd:hh:mm:ss.ssssss, and to identify malformed lines. I succeeded with a small file using Python sequentially. We are required to use Python with mpi4py. The task seems simple; however, most of us are very inexperienced programmers (in fact, I am taking my first programming course simultaneously), and we are not learning programming in the class.
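The per-line validation (independent of the MPI part) can be sketched with `datetime.strptime`, whose `%Y%m%d:%H:%M:%S.%f` format matches the yyyymmdd:hh:mm:ss.ssssss timestamps; the field layout below is an assumption based on the description:

```python
from datetime import datetime

def is_malformed(line):
    # A line is assumed well formed if it has exactly three fields and
    # the date, price, and volume all parse.
    parts = line.strip().split(',')
    if len(parts) != 3:
        return True
    date_s, price_s, volume_s = parts
    try:
        datetime.strptime(date_s, '%Y%m%d:%H:%M:%S.%f')
        float(price_s)
        int(volume_s)
    except ValueError:
        return True
    return False

lines = [
    '20240102:09:30:00.000125,101.25,500',   # good
    '20240102:25:30:00.000125,101.25,500',   # bad hour
    '20240102:09:30:00.000125,abc,500',      # bad price
]
flags = [is_malformed(ln) for ln in lines]
print(flags)  # [False, True, True]
```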
Background:
My program currently assembles arrays in Python. These arrays are connected to a front-end UI and as such have interactive elements (i.e. user specified values in array elements). These arrays are then saved to .txt files (depending on their later use). The user must then leave the Python program and run a separate Fortran script which simulates a system based on the Python output files. While this only takes a couple of minutes at most, I would ideally like to automate the process without having to leave my Python UI.
Assemble Arrays (Python) -> Edit Arrays (Python) -> Export to File (Python)
-> Import File (Fortran) -> Run Simulation (Fortran) -> Export Results to File (Fortran)
-> Import File to UI, Display Graph (Python)
Question:
Is this possible? What are my options for automating this process? Can I completely remove the repeated export/import of files altogether?
Edit:
I should also mention that the Fortran script uses LAPACK; I don't know if that makes a difference.
You do not have to pass arrays to Fortran code using text files. If you create an entry point to the Fortran code as a subroutine, you can pass all the numpy arrays using f2py. You should already be aware of this if you added the f2py tag yourself. Just use any of the numerous tutorials, for example https://github.com/thehackerwithin/PyTrieste/wiki/F2Py or http://www.engr.ucsb.edu/~shell/che210d/f2py.pdf .
The way back is the same, the Fortran code just fills any of the intent(out) or intent(inout) arrays and variables with the results.
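A minimal sketch of what such an entry point might look like (the subroutine and array names are made up; compile with something like `f2py -c -m engine engine.f90`):

```fortran
! engine.f90 -- hypothetical entry point for the simulation.
! f2py reads the intent declarations to generate the Python wrapper.
subroutine run_simulation(a, b, n)
    implicit none
    integer, intent(in) :: n
    double precision, intent(in)  :: a(n)   ! input array from numpy
    double precision, intent(out) :: b(n)   ! results returned to Python
    b = 2.0d0 * a                           ! stand-in for the real model
end subroutine run_simulation
```

On the Python side, `import engine` followed by `b = engine.run_simulation(a)` then returns the results directly as a numpy array, with no intermediate files.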
I love the Python+Fortran stack. :)
When needing close communication between your Python front-end and Fortran engine, a good option is to use the subprocess module in Python. Instead of saving the arrays to a text file, you'll keep them as arrays. Then you'll execute the Fortran engine as a subprocess within the Python script. You'll pipe the Python arrays into the Fortran engine and then pipe the results out to display.
This solution will require changing the file I/O in both the Python and Fortran codes to writing and reading to/from a pipe (on the Python side) and from/to standard input and output (on the Fortran side), but in practice this isn't too much work.
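A rough sketch of the piping idea, using `cat` as a stand-in for the compiled Fortran engine (a real engine would read the same bytes from standard input and write its results to standard output; the stdlib `array` module is used here in place of numpy):

```python
import subprocess
from array import array

# Raw doubles to send to the engine; a real program would pack the
# simulation arrays the same way (or use a simple text protocol).
values = array('d', [1.0, 2.5, -3.25])
payload = values.tobytes()

# 'cat' stands in for the Fortran executable: it reads the bytes from
# stdin and writes its "results" back out on stdout.
proc = subprocess.run(['cat'], input=payload,
                      stdout=subprocess.PIPE, check=True)

# Unpack the engine's output back into Python numbers.
result = array('d')
result.frombytes(proc.stdout)
print(list(result))  # [1.0, 2.5, -3.25]
```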
Good luck!
I'm working on a side project where we want to process images in a hadoop mapreduce program (for eventual deployment to Amazon's elastic mapreduce). The input to the process will be a list of all the files, each with a little extra data attached (the lat/long position of the bottom left corner - these are aerial photos).
The actual processing needs to take place in Python code so we can leverage the Python Image Library. All the Python streaming examples I can find use stdin and process text input. Can I send image data to Python through stdin? If so, how?
I wrote a Mapper class in Java that takes the list of files and saves the names, the extra data, and the binary contents to a sequence file. I was thinking maybe I need to write a custom Java mapper that takes in the sequence file and pipes it to Python. Is that the right approach? If so, what should the Java code to pipe the images out, and the Python code to read them in, look like?
In case it's not obvious, I'm not terribly familiar with Java OR Python, so it's also possible I'm just biting off way more than I can chew with this as my introduction to both languages...
There are a few possible approaches that I can see:
Use both the extra data and the file contents as input to your python program. The tricky part here will be the encoding. I frankly have no idea how streaming works with raw binary content, and I'm assuming that the basic answer is "not well." The main issue is that the stdin/stdout communication between processes is very text-based, relying on delimiting input with tabs and newlines, and things like that. You would need to worry about the encoding of the image data, and probably have some sort of pre-processing step, or a custom InputFormat, so that you could represent the image as text.
Use only the extra data and the file location as input to your python program. Then the program can independently read the actual image data from the file. The hiccup here is making sure that the file is available to the python script. Remember this is a distributed environment, so the files would have to be in HDFS or somewhere similar, and I don't know if there are good libraries for reading files from HDFS in python.
Do the java-python interaction yourself. Write a java mapper that uses the Runtime class to start the python process itself. This way you get full control over exactly how the two worlds communicate, but obviously it's more code and a bit more involved.
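For the encoding concern in the first approach, one common workaround (sketched below; the record layout and field values are made up) is to base64-encode the image bytes so the record survives tab/newline-delimited streaming:

```python
import base64

# One record per line: key <TAB> lat,long <TAB> base64(image bytes).
# Base64 output contains no tabs or newlines, so the streaming
# delimiters stay unambiguous.
image_bytes = b'\x89PNG\r\n\x1a\n fake image payload \x00\xff'
record = b'\t'.join([b'photo_001', b'40.7128,-74.0060',
                     base64.b64encode(image_bytes)])

# The python mapper splits the line and decodes the payload back.
key, latlong, b64 = record.split(b'\t')
decoded = base64.b64decode(b64)
print(decoded == image_bytes)  # True
```

The cost is roughly a 33% size increase on the wire, which may matter for large aerial photos.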