Storing file-like objects and writing to multiple files - python

I've learned that file-like objects created in Python with io.BytesIO or io.StringIO are not stored on disk. Are they stored in memory like variables? If not, where?
Also, is there a way to store them on disk?
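A minimal sketch of what I mean: io.BytesIO keeps its contents entirely in RAM, and getvalue() hands the bytes back so they can be copied into an ordinary file (the filename here is made up):

import io

buf = io.BytesIO()
buf.write(b'ID3...')  # hypothetical mp3 bytes held in memory

# The buffer lives in RAM only; to persist it, copy the bytes
# out into a regular file on disk.
with open('track.mp3', 'wb') as f:
    f.write(buf.getvalue())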
My code writes mp3 data to files, which takes a long time (30+ seconds to write ~3.5 MB). I planned to split the input data and write the files simultaneously, but I have no idea how to do this. Do I need to run multiple Python scripts? I don't mind writing to different files and later reading and editing the content. Can you point me to any references to get started?
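Something like the following is what I have in mind, as a rough sketch (the chunk size and file names are made up, and I don't know whether parallel writes actually help when the disk itself is the bottleneck):

from concurrent.futures import ThreadPoolExecutor

def write_chunk(index, chunk):
    # Each worker writes its own chunk to a separate file on disk.
    with open('part_%d.mp3' % index, 'wb') as f:
        f.write(chunk)

mp3_data = b'...'        # hypothetical: the mp3 bytes to be stored
chunk_size = 512 * 1024  # 512 KiB per file, chosen arbitrarily
chunks = [mp3_data[i:i + chunk_size]
          for i in range(0, len(mp3_data), chunk_size)]

with ThreadPoolExecutor() as pool:
    # list() forces iteration so any worker exception is raised here.
    list(pool.map(write_chunk, range(len(chunks)), chunks))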

Related

Multiple instances of python saving to the same file

I am running some numerical code in Python (using numpy) on a Linux cluster that can be submitted as an array job. This means that the same code will run in thousands of parallel instances, each generating some data (using a random input) and saving it as a .npy file.
To avoid generating thousands of output files, I want to save all the data to the same file. However, I'm worried that if two instances of the code try to open the file simultaneously, it might be corrupted.
What is the best way to do this?
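For example, I was considering something like the following, assuming fcntl advisory locks actually work on the cluster's filesystem (they often don't across NFS mounts), so that each instance appends its array under an exclusive lock:

import fcntl
import numpy as np

def append_result(path, arr):
    # Serialize writers with an exclusive advisory lock; each call to
    # np.save appends one self-contained .npy block to the file.
    with open(path, 'ab') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        np.save(f, arr)
        f.flush()  # make sure the block hits the file before unlocking
        fcntl.flock(f, fcntl.LOCK_UN)

append_result('all_results.npy', np.random.rand(100))
# To read back, open the file and call np.load on it repeatedly,
# one array per append, until the end of the file.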

How to store all files universally in memory (variable) Python

Essentially, I want to be able to go through a folder of text files, jpg files, csv files, png files, any kind of file, and load each one into memory as some kind of object. When necessary, I would then like to be able to save it and create an instance of it on disk. This would need to work for any file type.
I would create a class that contains the file data itself as well as metadata, but that is not essential to my question.
Is this possible, and if so, how can I do it?
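A minimal sketch of the kind of thing I mean, with invented names, assuming that raw bytes are a universal enough representation for any file type:

import os

class StoredFile(object):
    """Holds a file's raw bytes plus whatever metadata is useful."""
    def __init__(self, path):
        self.name = os.path.basename(path)
        with open(path, 'rb') as f:
            self.data = f.read()  # works for any file type

    def save(self, directory):
        # Recreate an instance of the file on disk.
        with open(os.path.join(directory, self.name), 'wb') as f:
            f.write(self.data)

# Load every file in a folder into memory as objects.
files = [StoredFile(os.path.join('folder', n)) for n in os.listdir('folder')]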

How to pass multiple images as input to python script

I use nodejs to call a Python script that runs object detection on some jpg images read from the hard disk. These images are written to disk by nodejs prior to calling the script.
To make it dynamic and faster, I now want to send multiple images as a multidimensional array from nodejs to the Python script, which would save me from writing and reading images on disk. Is this the best way to do it? If so, how do I pass images as a multidimensional array to the Python script? Or is there a better solution?
Your question leaves out a lot of specifics, but if you literally mean "pass input to a python script," you have two options: command line arguments or standard input.
While it is technically possible to read a series of images from standard input, it is considerably more complicated, with no benefit to make it worthwhile.
Command-line arguments can only be strings, read from sys.argv. So, while you could try to use multiple command-line arguments, it would be more trouble than it's worth to translate an array of strings into a multidimensional array.
So, tl;dr: create a data file format that lets you represent a set of images as a multidimensional array of file paths (or URLs, if you wanted). The easiest by far would be to use a CSV file and read it in with Python's csv module.
import csv
# Read the CSV into a list of rows, each a list of image paths.
with open('filename.csv', 'r') as f:
    image_path_array = list(csv.reader(f))

Better way to store a set of files with arrays?

I've accumulated a set of 500 or so files, each of which has an array and header that stores metadata. Something like:
2,.25,.9,26 #<-- header, which is actually cryptic metadata
1.7331,0
1.7163,0
1.7042,0
1.6951,0
1.6881,0
1.6825,0
1.678,0
1.6743,0
1.6713,0
I'd like to read these arrays into memory selectively. We've built a GUI that lets users select one or more files from disk, and each is then read into the program. If users want to read in all 500 files, the program is slow, opening and closing each file. My question, therefore, is: would it speed up my program to store all of these in a single structure, something like HDF5? Ideally, this would have faster access than the individual files. What is the best way to go about this? I haven't ever dealt with these kinds of considerations. What's the best way to speed up this bottleneck in Python? The total data is only a few megabytes; I'd even be amenable to storing it in the program somewhere, not just on disk (but I don't know how to do that).
Reading 500 files in Python should not take much time, as the overall file size is only a few MB. The data structure in your files is plain and simple, so I'd guess parsing won't take much time either.
If the actual slowness is due to opening and closing files, there may be an OS-related issue (it may have very poor I/O).
Have you timed how long it takes to read all the files?
You could also try a small database such as SQLite, where you can store your file data and access the required data on the fly.
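If you do go the HDF5 route mentioned in the question, a minimal packing sketch with h5py might look like this (the file locations and names are hypothetical, and this assumes each text file has one header line followed by comma-separated rows, as in the sample above):

import glob
import h5py
import numpy as np

# One-time packing step: copy every text file into a single HDF5 file.
with h5py.File('combined.h5', 'w') as h5:
    for path in glob.glob('data/*.txt'):
        with open(path) as f:
            header = f.readline().strip()
        data = np.loadtxt(path, delimiter=',', skiprows=1)
        dset = h5.create_dataset(path, data=data)
        dset.attrs['header'] = header  # keep the metadata with the array

# Later, the GUI opens combined.h5 once and reads datasets selectively.
with h5py.File('combined.h5', 'r') as h5:
    arr = h5['data/example.txt'][...]  # loads only this one array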

how long can i store data in cPickle?

I'm storing a list of dictionaries with cPickle, but I need to be able to add and remove items occasionally. If I store the dictionary data with cPickle, is there some sort of limit on when I will be able to load it again?
You can store it for as long as you want; it's just a file. However, if your data structures start becoming complicated, it can become tedious and time-consuming to unpickle, update, and re-pickle the data. Also, it's just file access, so you have to handle concurrency issues yourself.
No. cPickle just writes data to files and reads it back; why would you think there would be a limit?
cPickle is just a faster implementation of pickle. You can use it to convert a Python object to its string equivalent and retrieve it back by unpickling.
You can do one of two things with a pickled object:
Do not write it to a file. In this case, the pickled data lives only in memory, and its scope is the same as that of any other variable.
Write it to a file. You can write the pickled data to a file and read it back whenever you want, getting the Python objects/data structures back. Your pickled data is safe for as long as the pickled file is stored on disk.
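A minimal sketch of both cases (the file name is invented; on Python 3 the module is just pickle):

import cPickle as pickle  # Python 2; on Python 3, use the pickle module

records = [{'id': 1}, {'id': 2}]

# In memory: pickle to a string and back; scope is that of a variable.
blob = pickle.dumps(records, pickle.HIGHEST_PROTOCOL)
assert pickle.loads(blob) == records

# On disk: the data lasts for as long as the file does.
with open('records.pkl', 'wb') as f:
    pickle.dump(records, f, pickle.HIGHEST_PROTOCOL)

# Updating means loading, modifying, and re-pickling the whole list.
with open('records.pkl', 'rb') as f:
    records = pickle.load(f)
records.append({'id': 3})
with open('records.pkl', 'wb') as f:
    pickle.dump(records, f, pickle.HIGHEST_PROTOCOL)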
