Python HDF5 h5py issues opening multiple files

I am using the 64-bit version of Enthought Python to process data across multiple HDF5 files. I'm using h5py version 1.3.1 (HDF5 1.8.4) on 64-bit Windows.
I have an object that provides a convenient interface to my specific data hierarchy, but testing h5py.File(fname, 'r') on its own yields the same results. I am iterating through a long list (~100 files at a time) and attempting to pull out specific pieces of information from the files. The problem is that I'm getting the same information out of several files! My loop looks something like:
import csv
from glob import glob
import h5py

files = glob(r'path\*.h5')
out_csv = csv.writer(open('output_file.csv', 'wb'))  # 'wb': csv needs a writable (binary, on Python 2) file
for filename in files:
    handle = h5py.File(filename, 'r')
    data = extract_data_from_handle(handle)
    for row in data:
        out_csv.writerow((filename, ) + row)
When I inspect the files using something like HDFView, I know the internals are different. However, the CSV I get seems to indicate that all the files contain the same data. Has anyone seen this behavior before? Any suggestions on where to start debugging this issue?

I've concluded that this is a strange manifestation of "Perplexing assignment behavior with h5py object as instance variable". I rewrote my code so that each file is handled within a function call and the variable is not reused. With this approach I don't see the strange behavior, and it seems to work much better. For clarity, the solution looks more like:
files = glob(r'path\*.h5')
out_csv = csv.writer(open('output_file.csv', 'wb'))

def extract_data_from_filename(filename):
    # each call gets a fresh handle instead of reusing one variable
    return extract_data_from_handle(h5py.File(filename, 'r'))

for filename in files:
    data = extract_data_from_filename(filename)
    for row in data:
        out_csv.writerow((filename, ) + row)
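A variant of the same idea, sketched under assumptions: each file is opened in a with block (current h5py releases support h5py.File as a context manager; whether 1.3.1 does is not confirmed here), and the extracted rows are copied into a list before the file closes, so nothing written to the CSV can still depend on a reused handle. The opener and extractor are passed in as parameters, so the hypothetical helper write_rows_per_file can be tried without real HDF5 files.

```python
import csv


def write_rows_per_file(filenames, opener, extract, out):
    """Write (filename, *row) to `out` for every row extracted from each file.

    `opener` is assumed to be h5py.File (or any callable returning a context
    manager); `extract` stands in for the asker's extract_data_from_handle.
    """
    writer = csv.writer(out)
    for filename in filenames:
        # A fresh handle per file, closed when the with-block exits.
        with opener(filename, "r") as handle:
            # Copy the rows out now, while this file is still open.
            rows = [tuple(row) for row in extract(handle)]
        for row in rows:
            writer.writerow((filename,) + row)
```

With real files this would be called as write_rows_per_file(glob(r'path\*.h5'), h5py.File, extract_data_from_handle, out), where out is a file opened for writing (with newline='' on Python 3).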

Related

Name object/lists when combining json files in Python

I have a series of Python scripts that run Oracle SQL database checks and write each result to its own JSON file. That's all working fine, although there are probably better ways to do it.
I then merge the JSON files into a single JSON file using the following code, which I found on here:
import json
import glob
result = []
for f in glob.glob("*.json"):
    with open(f, "rb") as infile:
        result.append(json.load(infile))
with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)
This gives roughly the output I wanted; the only issue I'm facing is the "names" of the result sets (I'm sure my lack of terminology is what has made this hard to search for).
To visualize what I mean, this is the output structure without the data (I realise this isn't the actual output format; it is from a GUI viewer of the file, to show the structure more clearly):
[]JSON
    {}0
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
    {}1
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
I'm looking to have the 0 and 1 be the labels of the result set, so that the output would look like:
[]JSON
    {}LAX
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
    {}LGW
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
Is that possible with Python? Or should I be looking at alternative solutions?
I've seen suggestions for similar questions that recommend writing the outputs directly into one file rather than merging multiple files; however, the results come from different database connections, and finding a way to "queue" database connections and store their outputs has been above my skill level.
Thanks!
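One way to get named result sets, sketched under an assumption the question does not confirm: that each JSON file is named after its label (for example LAX.json and LGW.json). Collecting the results into a dict keyed by the file name stem, instead of appending to a list, makes json.dump write those labels as the top-level keys. The helper name merge_json_files is hypothetical.

```python
import glob
import json
import os


def merge_json_files(pattern, out_path):
    """Merge JSON files matching `pattern` into one object keyed by file stem.

    Assumes each file is named after its label, e.g. LAX.json -> "LAX".
    """
    merged = {}
    for path in sorted(glob.glob(pattern)):
        # Skip the output file so re-running does not merge it into itself.
        if os.path.abspath(path) == os.path.abspath(out_path):
            continue
        label = os.path.splitext(os.path.basename(path))[0]
        with open(path, "r", encoding="utf-8") as infile:
            merged[label] = json.load(infile)
    with open(out_path, "w", encoding="utf-8") as outfile:
        json.dump(merged, outfile, indent=2)
    return merged
```

For the script in the question this would be merge_json_files("*.json", "merged_file.json"). If the label lives inside each file's data rather than in its name, the key would have to be read from there instead.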

REPL.it JSON Files

Recently I've been using Replit to host my Python code, but whenever I'm offline, any information stored in my JSON file is rolled back after a while. After some research I know this is a Replit-specific problem, but is there any way I can fix it? My code is quite long, so I would rather not switch to a completely different storage method.
To successfully store data in JSON files on replit.com, it's important to load and dump it the correct way.
An example of storing data in JSON files:
import json

with open("sample.json", "r") as file:
    sample = json.load(file)
sample["item"] = "Value"
with open("sample.json", "w") as file:
    json.dump(sample, file)
Let me know if you've already followed these steps.
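The same load, modify, dump pattern can be wrapped in a small helper so every write goes through one place. This is a sketch: the name update_json is hypothetical, and starting from an empty object when the file does not exist yet is an assumption the answer above does not make. It does not change how Replit itself persists files.

```python
import json
import os


def update_json(path, key, value):
    """Load the JSON object at `path`, set `key` to `value`, and write it back."""
    data = {}
    # Assumption: a missing file means "start with an empty object".
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as file:
            data = json.load(file)
    data[key] = value
    # Reopen in "w" mode so the whole object is rewritten, not appended.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(data, file)
    return data
```

Usage: update_json("sample.json", "item", "Value") performs the same steps as the example above.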

Can't open and read content of an uploaded zip file with FastAPI

I am currently developing a small backend project for myself with the Python framework FastAPI. I made an endpoint where the user should be able to upload two files: the first is a zip file (containing X .xml files) and the second a normal .xml file.
The code is as follows:
@router.post("/sendxmlinzip/")
def create_upload_files_with_zip(files: List[UploadFile] = File(...)):
    if not len(files) == 2:
        raise Httpex.EXPECTEDTWOFILES
    my_file = files[0].file
    zfile = zipfile.ZipFile(my_file, 'r')
    filelist = []
    for finfo in zfile.infolist():
        print(finfo)
        ifile = zfile.open(finfo)
        line_list = ifile.readlines()
        print(line_list)
This should print the contents of the files inside the .zip file, but it raises the exception
AttributeError: 'SpooledTemporaryFile' object has no attribute 'seekable'
on the line ifile = zfile.open(finfo).
After about three days of research and a lot of trial and error, trying other functions such as .read() and .extract(), I gave up, because the Python docs state that this should work exactly this way...
For those who don't know FastAPI: it's a backend framework for RESTful web services and uses the Starlette data structure for UploadFile. Please forgive me if I have overlooked something VERY obvious, but I tried to check every corner that might have caused the error, such as:
Checking whether another implementation is possible
Checking that the .zip file is correct
Checking that I attach the correct file (lol)
Debugging to see whether the data that reaches the backend is indeed the .zip file
This is a known Python bug:
SpooledTemporaryFile does not fully satisfy the abstract for IOBase.
Namely, seekable, readable, and writable are missing.
This was discovered when seeking a SpooledTemporaryFile-backed lzma file.
As @larsks suggested in his comment, I would try writing the contents of the spooled file to a new TemporaryFile and then operating on that. As long as your files aren't too large, that should work just as well.
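This is my workaround:
with zipfile.ZipFile(io.BytesIO(file.read()), 'r') as zf:  # copy the upload into a seekable in-memory buffer
    print([zf.read(name) for name in zf.namelist()])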
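A slightly fuller sketch of that workaround, written as a plain function so it could be called from the endpoint in the question (for example read_zip_members(files[0].file)). The function name is hypothetical; it illustrates the point above: copying the SpooledTemporaryFile into an io.BytesIO gives zipfile a fully seekable object. Reading everything into memory assumes the uploads are reasonably small.

```python
import io
import zipfile


def read_zip_members(fileobj):
    """Return {member name: bytes} for every file stored in a zip upload.

    `fileobj` is any binary file-like object, e.g. UploadFile.file.
    """
    fileobj.seek(0)  # the upload may already have been read once
    buffer = io.BytesIO(fileobj.read())  # seekable copy for zipfile
    contents = {}
    with zipfile.ZipFile(buffer, "r") as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories have no data to read
            with zf.open(info) as member:
                contents[info.filename] = member.read()
    return contents
```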

Python: Iterate through opening files

If anyone could point me in the right direction I'd be really grateful. I am looking to replace the following:
file1 = open('filepath', 'w')
file1.write(data1)
file2 = open('filepath2', 'w')
file2.write(data2)
file3 = open('filepath3', 'w')
file3.write(data3)
With something like this which can be iterated through:
file[i] = open ('filepath')
file[i].write(data[i])
They all need different names because all the files must be open at once without being closed. This is just a requirement of the system.
Is there any way in which this can be done?
open_files = [open(fname, 'w') for fname in ['filepath1', 'filepath2', 'filepath3']]
for fh in open_files:
    fh.write(...)
or
for i, fh in enumerate(open_files):
    fh.write(data[i])
You can iterate over the file paths using enumerate:
for i, f in enumerate(("file1", "file2", "file3")):
    with open(f, "w") as fle:
        fle.write(data[i])
Or zip the file names and data:
for f, d in zip(("file1", "file2", "file3"), data):
    with open(f, "w") as fle:
        fle.write(d)
If you want them to stay open, store the file objects in a dict:
files = {}
for f, d in zip(("file1", "file2", "file3"), data):
    files[f] = open(f, "w")
    files[f].write(d)
You can use a dictionary.
files = {'filepath1': open('filepath1', 'w'), 'filepath2': open('filepath2', 'w')}
If you want to generate the dictionary iteratively, you can do something like this:
path = 'filepath{0}'
files = {}
for i in range(10):
    filepath = path.format(i)
    files[filepath] = open(filepath, 'w')
You can do something like:
filepaths = ["abc.txt", r"\c\example\abc.txt"]  # list of all your file paths
fileptr = []
for path in filepaths:
    fileptr.append(open(path, 'w'))  # append, not +=, which would iterate the file
for fil, item in zip(fileptr, data):
    fil.write(item)
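If the files really must all be open at the same time, the standard library's contextlib.ExitStack is one way to open a variable number of them and still guarantee that every one is closed afterwards, even if a write fails. A sketch, assuming one data item per path; the helper name write_all_open is hypothetical.

```python
from contextlib import ExitStack


def write_all_open(paths, data):
    """Open every path at once, write data[i] to paths[i], then close them all."""
    if len(paths) != len(data):
        raise ValueError("expected one data item per path")
    with ExitStack() as stack:
        # Each file is registered with the stack, so all stay open together.
        handles = [stack.enter_context(open(p, "w")) for p in paths]
        for handle, item in zip(handles, data):
            handle.write(item)
    # Leaving the with-block closed every handle, in reverse order.
    return handles
```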
Thank you for the suggestions. After looking at the various ways of doing this, I explored the subject further and came up with a good solution (for my application, anyway).
The trouble I was having was that I'm writing data to a networked CSV file from a Raspberry Pi. If the file is open elsewhere, the Pi can no longer access it, and data is lost because it can't be recorded in the CSV file.
The initial solution, which prompted this question, was to keep all the files open at once, so that other users could only open the CSV file in read-only mode, which still allows the Pi to write data. However, this requires keeping all the files open in the Python script and, if I'm right, is rather memory intensive.
Therefore, the solution I found was to keep the file read-only whenever the Pi wasn't writing to it, and make it writable only while adding information to the CSV file.
I used the code explained here: How to remove read-only attrib directory with Python in Windows?
Many thanks
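The read-only toggling described above can be sketched with os.chmod and the stat module. This is an illustration under assumptions, not the code from the linked answer: the helper name append_row_unlocked is hypothetical, the CSV file is assumed to exist already, and clearing the write bits is how a file is made read-only (on Windows os.chmod honors only the owner-write flag, which maps to the read-only attribute).

```python
import csv
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def append_row_unlocked(path, row):
    """Make `path` writable, append one CSV row, then make it read-only again."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)  # owner-write on: clears read-only
    try:
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow(row)
    finally:
        # Drop every write bit so the file stays read-only between writes.
        current = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, current & ~WRITE_BITS)
```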

cPickle.load( ) error

I am using cPickle to convert structured data into a data-stream format and pass it to a library. I need to read the contents of a manually written file, "targetstrings.txt", and convert them into the format the NetCDF library needs, as follows:
Note: targetstrings.txt contains Latin characters
op=open("targetstrings.txt",'rb')
targetStrings=cPickle.load(op)
The NetCDF library takes the contents as strings.
Loading the file fails with the following error:
cPickle.UnpicklingError: invalid load key, 'A'.
Please tell me how I can fix this error; I have googled around but did not find an appropriate solution.
Any suggestions?
pickle is not for reading/writing generic text files, but to serialize/deserialize Python objects to file. If you want to read text data you should use Python's usual IO functions.
with open('targetstrings.txt', 'r') as f:
    fileContent = f.read()
If, as it seems, the library just wants a list of strings, with each line as a list element, you just have to do:
with open('targetstrings.txt', 'r') as f:
    lines = [l for l in f]
# now lines holds the lines read from the file
As stated, pickle is not meant to be used this way.
If you need to manually edit complex Python objects that are to be read and passed as Python objects to another function, there are plenty of other formats to use, for example XML, JSON, or Python files themselves. Pickle uses a Python-specific protocol which, while not binary (in version 0 of the protocol) and not changing across Python versions, is not meant for this, and is not even the recommended method for recording Python objects for persistence or communication (although it can be used for those purposes).
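Following that suggestion, a sketch of a hand-editable alternative: keep the target strings in a JSON array (a hypothetical targetstrings.json, saved as UTF-8 so Latin characters survive) and load it with the json module. This is written for Python 3, where cPickle became pickle and text files take an explicit encoding; load_target_strings is a hypothetical name.

```python
import json


def load_target_strings(path):
    """Return the list of strings stored as a JSON array in `path`."""
    with open(path, "r", encoding="utf-8") as f:
        strings = json.load(f)
    # Fail early if the hand-edited file is not a plain list of strings.
    if not isinstance(strings, list) or not all(isinstance(s, str) for s in strings):
        raise ValueError("expected a JSON array of strings")
    return strings
```

The file itself would then look like ["Añejo", "Zürich"], which is easy to edit by hand.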
