I want to load 38 .mat files into a single dictionary.
The .mat files are named subject1.mat to subject38.mat.
The code I tried is a simple for loop:
import scipy.io as sio

data = {}
# range(1, 39) covers subject1.mat through subject38.mat;
# range(1, 38) would stop one short, at subject37.mat
for i in range(1, 39):
    data["data{}".format(i)] = sio.loadmat('subject{}.mat'.format(i))
The error I'm getting is:
Traceback (most recent call last):
File "D:/senior project/python/dataAqu.py", line 7, in
data["data{0}".format(i)] = sio.loadmat('subject{0}.mat'.format(i))
File "C:\Users\mamdo\AppData\Roaming\Python\Python27\site-packages\scipy\io\matlab\mio.py", line 208, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "C:\Users\mamdo\AppData\Roaming\Python\Python27\site-packages\scipy\io\matlab\mio5.py", line 292, in get_variables
res = self.read_var_array(hdr, process)
File "C:\Users\mamdo\AppData\Roaming\Python\Python27\site-packages\scipy\io\matlab\mio5.py", line 252, in read_var_array
return self._matrix_reader.array_from_header(header, process)
File "mio5_utils.pyx", line 675, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header
File "mio5_utils.pyx", line 705, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header
File "mio5_utils.pyx", line 778, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex
File "mio5_utils.pyx", line 450, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric
File "mio5_utils.pyx", line 355, in scipy.io.matlab.mio5_utils.VarReader5.read_element
File "streams.pyx", line 194, in scipy.io.matlab.streams.ZlibInputStream.read_string
File "pyalloc.pxd", line 9, in scipy.io.matlab.pyalloc.pyalloc_v
MemoryError
So I found the problem. The .mat files shouldn't be open in any other program (like MATLAB) while you load them; if you hit this error, restart the computer.
Also, if there is a memory problem, try loading the .mat files separately: load one file, perform whatever processing you need on it, and then load the next one.
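If memory is the constraint, a rough sketch of that one-file-at-a-time approach could look like this (process_subject is just a placeholder for whatever analysis you need):

import scipy.io as sio

for i in range(1, 39):
    # load one subject, process it, and release it before reading the next file
    contents = sio.loadmat('subject{}.mat'.format(i))
    process_subject(contents)  # placeholder for your own processing
    del contents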
Related
I am trying to read an .xlsx file into a DataFrame:
import pandas as pd

itut_ir = pd.read_excel('C:\\Users\\Administrator\\Downloads\\reportdata.xlsx')
print(itut_ir.to_string())
I receive this:
Traceback (most recent call last):
File "C:\Users\Administrator\eclipse-workspace\Reports\GOW\Report.py", line 44, in <module>
df = pd.read_excel('C:\Users\Administrator\Downloads\reportdata.xlsx')
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 824, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_xlrd.py", line 21, in __init__
super().__init__(filepath_or_buffer)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_xlrd.py", line 36, in load_workbook
return open_workbook(filepath_or_buffer)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 117, in open_workbook
zf = zipfile.ZipFile(filename)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1222, in __init__
self._RealGetContents()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1289, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Does anybody have an idea? The file does not seem to be broken; I can open it with Excel.
Thanks!
*** UPDATE ***
The file producing the error is being downloaded from FTP. Opening the original file works ... if that gives you a hint :) Thanks
I had the same issue just a little bit ago with an XLSX that I created in LibreOffice.
The solution was to check the XLSX to make sure it wasn't corrupted. In my case, loading a previous version of the XLSX file corrected the problem.
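A quick way to check for this kind of corruption: an .xlsx file is just a ZIP container, so you can test the downloaded file directly (the path below is the one from the question):

import zipfile

# A valid .xlsx is a ZIP archive; False here means the download
# (for example an FTP transfer in text/ASCII mode) corrupted the file.
print(zipfile.is_zipfile(r'C:\Users\Administrator\Downloads\reportdata.xlsx'))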
I am trying to build a Python program that recognizes text from text-containing objects in real time. I was able to do OCR with Tesseract from the command line, just to check whether it works for static images. To use it in Python, however, we have to go through the pytesseract wrapper. I am following this tutorial.
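The part of the tutorial script that fails is essentially this (a minimal sketch reconstructed from the traceback below; example_01.png is the tutorial's sample image):

from PIL import Image
import pytesseract

# run Tesseract on the tutorial's sample image and print the recognized text
print(pytesseract.image_to_string(Image.open('example_01.png')))

But when I execute the script with the sample image, it throws an error like this: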
Traceback (most recent call last):
File "C:\Python27\tess1.py", line 4, in <module>
print(pytesseract.image_to_string(Image.open('example_01.png')))
File "build\bdist.win-amd64\egg\pytesseract\pytesseract.py", line 193, in image_to_string
return run_and_get_output(image, 'txt', lang, config, nice)
File "build\bdist.win-amd64\egg\pytesseract\pytesseract.py", line 130, in run_and_get_output
temp_name, img_extension = save_image(image)
File "build\bdist.win-amd64\egg\pytesseract\pytesseract.py", line 86, in save_image
image.save(input_file_name, format=img_extension, **image.info)
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1420, in save
self.load()
File "C:\Python27\lib\site-packages\PIL\ImageFile.py", line 193, in load
d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
File "C:\Python27\lib\site-packages\PIL\Image.py", line 356, in _getdecoder
raise IOError("decoder %s not available" % decoder_name)
IOError: decoder zip not available
I'm trying to merge multiple .xls files into a single workbook, where each file is inserted as a sheet named after the .xls filename.
While searching the web, I came across the documentation of pyexcel and a specific module which, as written here, should do the job easily.
Here's the code.
from pyexcel.cookbook import merge_all_to_a_book
import glob
merge_all_to_a_book(glob.glob("Dir\*.xls"),"output.xls")
As expected, it doesn't work. Here's the console output.
File "..\Desktop\scripts\provaimport.py", line 48, in <module>
merge_all_to_a_book(glob.glob("C:\Users\Tesisti\Desktop\forpythonscript\*.xls"),"output.xls")
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel\cookbook.py", line 148, in merge_all_to_a_book
merged.save_as(outfilename)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel\internal\meta.py", line 339, in save_as
return save_book(self, file_name=filename, **keywords)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel\internal\core.py", line 51, in save_book
return _save_any(a_source, book)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel\internal\core.py", line 55, in _save_any
a_source.write_data(instance)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel\plugins\sources\file_output.py", line 38, in write_data
**self._keywords)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel\plugins\renderers\excel.py", line 30, in render_book_to_file
save_data(file_name, book.to_dict(), **keywords)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel_io\io.py", line 119, in save_data
**keywords)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel_io\io.py", line 141, in store_data
writer.write(data)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel_io\book.py", line 58, in __exit__
self.close()
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\pyexcel_xls\xlsw.py", line 86, in close
self.work_book.save(self._file_alike_object)
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\xlwt\Workbook.py", line 710, in save
doc.save(filename_or_stream, self.get_biff_data())
File "C:\Users\Tesisti\Anaconda2\lib\site-packages\xlwt\Workbook.py", line 680, in get_biff_data
self.__worksheets[self.__active_sheet].selected = True
Any idea on how to fix this?
It seems to me that glob.glob("Dir*.xls") returned an empty list of files, so pyexcel's plugin pyexcel-xls fails while trying to write a book with no sheets.
The solution I would currently recommend is to take the latest pyexcel-xls and wrap merge_all_to_a_book in a try-except statement, catching the empty-file-list case.
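Something along these lines (a sketch; the raw-string path and the IndexError are assumptions, since the traceback above stops before showing the exception type):

import glob
from pyexcel.cookbook import merge_all_to_a_book

# a raw string keeps the backslashes in the Windows path literal
files = glob.glob(r"C:\Users\Tesisti\Desktop\forpythonscript\*.xls")

try:
    merge_all_to_a_book(files, "output.xls")
except IndexError:
    # with an empty file list there is no worksheet for xlwt to mark as active
    print("No .xls files matched the pattern, nothing to merge:", files)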
I tried to open an RDF file (the dmoz RDF dump), but I get this error message:
Traceback (most recent call last):
File "/media/_dev_/ODP_RDF_get_links.py", line 4, in <module>
result = g.parse("data/content.rdf")
File "/usr/local/lib/python2.7/dist-packages/rdflib/graph.py", line 1033, in parse
parser.parse(source, self, **args)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 577, in parse
self._parser.parse(source)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 210, in feed
self._parser.Parse(data, isFinal)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 352, in end_element_ns
self._cont_handler.endElementNS(pair, None)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 160, in endElementNS
self.current.end(name, qname)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 331, in node_element_end
self.error("Repeat node-elements inside property elements: %s"%"".join(name))
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 185, in error
raise ParserError(info + message)
file:///media/_dev_/data/content.rdf:5:12: Repeat node-elements inside property elements: http://dmoz.org/rdf/catid
My simple code is as follows:
import rdflib
g = rdflib.Graph()
result = g.parse("data/content.rdf")
print("graph has %s statements." % len(g))
I need to be able to read the file and extract all the links in the World category.
Thanks for any possible help.
EDIT:
PS: I found this wikipedia rdf_dumps page, so developing custom scripts is necessary to use this dump.
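A rough sketch of such a custom script is below, streaming the dump with ElementTree instead of rdflib. The element names (Topic, link), the id/resource attributes and the "Top/World" prefix are my assumptions about the dump's layout, so check them against the actual content.rdf:

import xml.etree.ElementTree as ET

def world_links(path):
    # stream the dump instead of parsing it in one go; the file is huge
    in_world_topic = False
    for event, elem in ET.iterparse(path, events=("start", "end")):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if event == "start" and tag == "Topic":
            topic_id = next((v for k, v in elem.attrib.items() if k.endswith("id")), "")
            in_world_topic = topic_id.startswith("Top/World")
        elif event == "end":
            if tag == "link" and in_world_topic:
                resource = next((v for k, v in elem.attrib.items() if k.endswith("resource")), None)
                if resource:
                    print(resource)
            elem.clear()  # free finished elements so memory stays bounded

world_links("data/content.rdf")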
I have a problem in Python. I'm using SciPy, specifically scipy.io, to load a .mat file. The .mat file was created using MATLAB.
import os
import scipy.io as sio

listOfFiles = os.listdir(loadpathTrain)
for f in listOfFiles:
    fullPath = loadpathTrain + '/' + f
    mat_contents = sio.loadmat(fullPath)
    print fullPath
Here's the error:
Traceback (most recent call last):
File "tryRankNet.py", line 1112, in <module>
demo()
File "tryRankNet.py", line 645, in demo
mat_contents = sio.loadmat(fullPath)
File "/usr/lib/python2.6/dist-packages/scipy/io/matlab/mio.py", line 111, in loadmat
matfile_dict = MR.get_variables()
File "/usr/lib/python2.6/dist-packages/scipy/io/matlab/miobase.py", line 356, in get_variables
getter = self.matrix_getter_factory()
File "/usr/lib/python2.6/dist-packages/scipy/io/matlab/mio5.py", line 602, in matrix_getter_factory
return self._array_reader.matrix_getter_factory()
File "/usr/lib/python2.6/dist-packages/scipy/io/matlab/mio5.py", line 274, in matrix_getter_factory
tag = self.read_dtype(self.dtypes['tag_full'])
File "/usr/lib/python2.6/dist-packages/scipy/io/matlab/miobase.py", line 171, in read_dtype
order='F')
TypeError: buffer is too small for requested array
The whole thing is in a loop, and I checked the size of the file where it gives the error by loading it interactively in IDLE.
The size is (9, 521), which is not huge at all. I tried to find out whether I'm supposed to clear some buffer after each iteration of the loop, but I could not find anything.
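For reference, a wrapper like this (just a sketch, not part of my original script) reports which file triggers the error without stopping the loop:

import os
import scipy.io as sio

for f in os.listdir(loadpathTrain):
    fullPath = loadpathTrain + '/' + f
    try:
        mat_contents = sio.loadmat(fullPath)
    except Exception as err:
        # report the offending file and keep going with the rest
        print 'failed on', fullPath, '-', err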
Any help would be appreciated.
Thanks.