I'm new to SciPy but not to Python. I'm trying to import a .sav file into SciPy so I can do some basic work on it, but each time I try to import the file using scipy.io.readsav(), Python throws an error:
Traceback (most recent call last):
File "<ipython-input-7-743be643d8a1>", line 1, in <module>
dataset = io.readsav("c:/users/me/desktop/survey.sav")
File "C:\Users\me\Anaconda3\lib\site-packages\scipy\io\idl.py", line 726, in readsav
raise Exception("Invalid SIGNATURE: %s" % signature)
Exception: Invalid SIGNATURE: b'$F'
Any idea what's happening? I can open the file in R and manipulate the data, but I'd like to do it in Python. Running Anaconda on Windows.
scipy.io.readsav() reads IDL SAVE files. You have tagged this question spss, so I assume you are trying to read an SPSS file. The format of an SPSS .sav file is not the same as the format of an IDL SAVE file.
Look on PyPI for savReaderWriter, a Python package for reading and writing SPSS .sav files.
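If you are unsure which kind of .sav file you have, looking at its first few bytes usually settles it. The following is only a sketch: it assumes SPSS system files begin with `$FL2` (or `$FL3` for the compressed variant) and IDL SAVE files begin with `SR`, which is the signature readsav checks for. The function names are made up for illustration.

```python
def guess_sav_kind(first_bytes):
    """Guess the flavour of a .sav file from its leading bytes."""
    # Assumed SPSS system-file magic: "$FL2" (or "$FL3" when compressed).
    if first_bytes.startswith((b"$FL2", b"$FL3")):
        return "spss"
    # IDL SAVE files start with the two-byte signature "SR".
    if first_bytes.startswith(b"SR"):
        return "idl"
    return "unknown"


def sniff_sav(path):
    """Read the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return guess_sav_kind(f.read(4))
```

The b'$F' in the traceback above is consistent with the SPSS signature, so sniff_sav on that file would be expected to report "spss" rather than "idl".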
I am trying to read some Canadian census data from Statistics Canada (the XML option for the "Canada, provinces and territories" geographic level). I see that the XML file is in the SDMX format and that a structure file is provided, but I cannot figure out how to read the data from the XML file.
It seems there are 2 options in Python, pandasdmx and sdmx1, both of which say they can read local files. When I try
import sdmx
datafile = '~/Documents/Python/Generic_98-401-X2016059.xml'
canada = sdmx.read_sdmx(datafile)
It appears to read the first 903 lines and then produces the following:
Traceback (most recent call last):
File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/xml.py", line 238, in read_message
raise NotImplementedError(element.tag, event) from None
NotImplementedError: ('{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}GenericData', 'start')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/__init__.py", line 126, in read_sdmx
return reader().read_message(obj, **kwargs)
File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/xml.py", line 259, in read_message
raise XMLParseError from exc
sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}GenericData', 'start')
Is this happening because I've not loaded the structure of the sdmx file (Structure_98-401-X2016059.xml in the zip file from the StatsCan link above)? If so, how do I go about loading that and telling sdmx to use that when reading datafile?
The documentation for sdmx and pandasdmx only shows examples of loading files from online providers, not from local files, so I'm stuck. I have limited familiarity with Python, so any help is much appreciated.
For reference, I can read the file in R using the instructions from the rsdmx github. I would like to be able to do the same/similar in Python.
Thanks in advance.
From a cursory inspection of the documentation, it seems that Statistics Canada is not one of the sources that is included by default. There is however an sdmx.add_source function. I suggest you try that (before loading the data).
According to the sdmx1 developer, StatsCan is using an older version of SDMX (2.0) that is not supported. The current version is 2.1, and sdmx1 supports only that version (work is also under way to support the upcoming version 3).
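You can check which SDMX-ML version a local file declares before handing it to a reader, because the version is part of the root element's namespace (v2_0 in the traceback above). A rough sketch using only the standard library; the function name is hypothetical, and the regular expression assumes the namespace follows the `.../schemas/vX_Y/...` pattern seen in the traceback:

```python
import io
import re
import xml.etree.ElementTree as ET


def sdmx_schema_version(fileobj):
    """Return the version named in the root element's namespace, or None."""
    # With "start" events the root element is yielded first, so we can
    # stop right away instead of parsing the whole file.
    for _event, element in ET.iterparse(fileobj, events=("start",)):
        match = re.search(r"/schemas/v(\d+)_(\d+)/", element.tag)
        return ".".join(match.groups()) if match else None
    return None


# Demonstration on an in-memory document that uses the namespace
# from the traceback.
sample = io.BytesIO(
    b'<GenericData xmlns="http://www.SDMX.org/resources/SDMXML/'
    b'schemas/v2_0/message"><Header/></GenericData>'
)
print(sdmx_schema_version(sample))
```

For a file on disk, open it in binary mode and pass the file object, for example `with open(path, "rb") as f: sdmx_schema_version(f)`.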
Can someone help me with the Python code to read the .mat file generated from Visual SFM? You can download the .mat file from the link:
https://github.com/cvlab-epfl/tf-lift/tree/master/example
The zip archive at that link contains a .mat file, which is the file I need help with.
It seems to be an ASCII file, but I do not know how to read the data in it.
I tried to load the data in the .mat file with scipy.io.loadmat(), but got the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
raise ValueError('Unknown mat file type, version %s, %s' % ret)
ValueError: Unknown mat file type, version 20, 0
Can someone help me to load the data in the file with Python code?
Thanks in advance for your help.
If you mean this VisualSFM (http://ccwu.me/vsfm/doc.html), then the .mat file isn't a MATLAB .mat file, but a 'match' file.
From the website:
[name].sift stores all the detected SIFT features, and [name].mat stores the feature matches.
It seems there is C++ code for reading this file (http://ccwu.me/vsfm/MatchFile.zip), which you could use as a basis for writing a Python parser.
Additionally, there seems to be a Python socket interface to VSFM, which may let you do what you want: https://github.com/nrhine1/vsfm_util
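To tell the two kinds of .mat file apart before calling loadmat, you could look for the text header that MATLAB writes at the start of its files. This is only a heuristic sketch: it assumes the usual "MATLAB 5.0 MAT-file, ..." style header of Level 5 (and 7.3) files, older Level 4 files have no such header, and the function name is invented here.

```python
def looks_like_matlab_mat(header):
    """True if the bytes look like the text header of a MATLAB MAT-file."""
    # Level 5 files start with descriptive text such as
    # "MATLAB 5.0 MAT-file, Platform: ..." (assumption noted above).
    return header.startswith(b"MATLAB ") and b"MAT-file" in header[:128]


def is_matlab_mat(path):
    """Check the first 128 bytes of the file at `path`."""
    with open(path, "rb") as f:
        return looks_like_matlab_mat(f.read(128))
```

A file for which this returns False is probably not something scipy.io.loadmat can read, as with the VisualSFM match file here.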
I am trying to load an arff file using Python's 'loadarff' function from scipy.io.arff. The file has string attributes and it is giving the following error.
>>> data,meta = arff.loadarff(fpath)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/home/eex608/conda3_envs/PyT3/lib/python3.6/site-packages/scipy/io/arff/arffread.py", line 805, in loadarff
return _loadarff(ofile)
File "/data/home/eex608/conda3_envs/PyT3/lib/python3.6/site-packages/scipy/io/arff/arffread.py", line 838, in _loadarff
raise NotImplementedError("String attributes not supported yet, sorry")
NotImplementedError: String attributes not supported yet, sorry
How can I read the ARFF file successfully?
SciPy's loadarff converts the contents of an ARFF file into a NumPy array, and it does not support string attributes.
As of 2020, you can use the liac-arff package instead.
import arff  # provided by the liac-arff package
with open('your_document.arff', 'r') as f:  # close the file when done
    data = arff.load(f)
However, make sure your ARFF document does not contain inline comments after meaningful text.
That is, there should be no lines like this:
@ATTRIBUTE class {F,A,L,LF,MN,O,PE,SC,SE,US,FT,PO} %Check and make sure that FT and PO should be there
Delete the comment or move it to the next line.
I had this mistake in one document, and it took some time to figure out what was wrong.
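To find such lines before loading, a small scan like the one below may help. It is a rough sketch with a made-up function name, and it also flags any '%' that legitimately appears inside a quoted value, so treat the result as a list of candidates to inspect rather than definite errors.

```python
def find_inline_comments(lines):
    """Return 1-based numbers of lines that contain '%' after other text."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Blank lines and full-line comments (starting with '%') are fine.
        if not stripped or stripped.startswith("%"):
            continue
        if "%" in stripped:
            flagged.append(number)
    return flagged
```

Since a file object iterates over its lines, this can be called as `find_inline_comments(open_file)` on a file opened in text mode.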
I am trying to use Python 3.7.2 with PyPDF2 1.26 to select some pages of an input PDF file and write the output to stdout (the actual code is more complicated; this is just an MCVE):
import sys
from PyPDF2 import PdfFileReader, PdfFileWriter
input = PdfFileReader("example.pdf")
output = PdfFileWriter()
output.addPage(input.getPage(0))
output.write(sys.stdout)
This fails with the following error:
UserWarning: File <<stdout>> to write to is not in binary mode. It may not be written to correctly. [pdf.py:453]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/site-packages/PyPDF2/pdf.py", line 487, in write
stream.write(self._header + b_("\n"))
TypeError: write() argument must be str, not bytes
The problem seems to be that sys.stdout is not open in binary mode. As some of the answers suggest, I have tried the following:
output.write(sys.stdout.buffer)
This fails with the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/site-packages/PyPDF2/pdf.py", line 491, in write
object_positions.append(stream.tell())
OSError: [Errno 29] Illegal seek
I have also tried the answer from Changing the way stdin/stdout is opened in Python 3:
sout = open(sys.stdout.fileno(), "wb")
output.write(sout)
This fails with the same error as above.
How can I use the PyPDF2 library to output a PDF to standard output?
More generally, how do I correctly switch sys.stdout to binary mode (akin to Perl's binmode STDOUT)?
Note: There is no need to tell me that I can open a file in binary mode and write the PDF to that file. That works; however, I specifically want to write the PDF to stdout.
From the documentation:
write(stream)
Writes the collection of pages added to this object out as a PDF file.
Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.
It turns out that sys.stdout.buffer is not tellable if not redirected to a file, hence you can't use it as a stream for PdfFileWriter.write.
Say your script is called myscript. If you call just myscript, then you'll get this error, but if you use it with a redirection, as in:
myscript > myfile.pdf
then Python understands it's a seekable stream, and you won't get the error.
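A common workaround is to let the writer fill an in-memory buffer first, since io.BytesIO supports both write and tell, and then copy the bytes to sys.stdout.buffer. A minimal sketch: write_via_buffer is a name made up here, and I have not run it against PyPDF2 1.26 itself.

```python
import io
import sys


def write_via_buffer(write_func, out=None):
    """Call write_func on a seekable in-memory stream, then copy to out."""
    if out is None:
        out = sys.stdout.buffer   # binary view of standard output
    buffer = io.BytesIO()         # supports both write() and tell()
    write_func(buffer)            # e.g. a PDF writer's write method
    out.write(buffer.getvalue())  # plain sequential write, no seeking
    out.flush()
```

With the writer from the question, this would be called as write_via_buffer(output.write). The whole PDF is held in memory before it is emitted, which is usually acceptable for a few selected pages.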
I am trying to call Python code from Excel using
wb = xw.Book.caller()
If the file path contains only English characters, it works. However, if the path contains characters from another language, it raises the error popup below:
---------------------------
Error
---------------------------
C:\Anaconda2\lib\site-packages\xlwings\main.py:2692: UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
throw = (os.path.normpath(os.path.realpath(impl.fullname.lower())) != os.path.normpath(fullname.lower()))
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "test.py", line 13, in plot_chart
wb = xw.Book.caller()
File "C:\Anaconda2\lib\site-packages\xlwings\main.py", line 545, in caller
return cls(impl=app.books.open(fullname).impl)
File "C:\Anaconda2\lib\site-packages\xlwings\main.py", line 2695, in open
"Cannot open two workbooks named '%s', even if they are saved in different locations." % name
ValueError: Cannot open two workbooks named 'test.xlsm', even if they are saved in different locations.
I guess this has something to do with a Unicode problem. I did not have this kind of problem with previous versions (e.g. 0.6 or 0.7). It is a new problem that appeared after I updated to version 0.9.2.
Thank you for any help.
P.S. I am using Python 2.7.
xlwings version 0.9.3 solves the above problem. Closing the question myself.