Read a .net Pajek file using the Python igraph library

I am trying to load a .net file using python igraph library. Here is the sample code:
import igraph
g = igraph.read("s.net",format="pajek")
But when I try to run this script I get the following error:
Traceback (most recent call last):
File "demo.py", line 2, in <module>
g = igraph.read('s.net',format="pajek")
File "C:\Python27\lib\site-packages\igraph\__init__.py", line 3703, in read
return Graph.Read(filename, *args, **kwds)
File "C:\Python27\lib\site-packages\igraph\__init__.py", line 2062, in Read
return reader(f, *args, **kwds)
igraph._igraph.InternalError: Error at .\src\foreign.c:574: Parse error in Pajek
file, line 1 (syntax error, unexpected ARCSLINE, expecting VERTICESLINE), Parse error
Could someone provide a hint about what is going wrong?

Either your file is not a regular Pajek file, or igraph's Pajek parser cannot read this particular file. (Writing a Pajek parser is a bit hit-and-miss, since the Pajek file format has no formal specification.) If you send me your Pajek file via email, I'll take a look at it.
Update: you were missing the *Vertices section of the Pajek file. Adding a line like *Vertices N (where N is the number of vertices in the graph) resolves your problem. I cannot say that this line is mandatory in Pajek files, because of the lack of a formal specification for the file format, but all the Pajek files I have seen so far included it, so I guess it's pretty standard.
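In the same spirit, here is a stdlib-only sketch of an automatic repair (this is not part of igraph; the filenames and the assumption that every edge line has the form "source target" are illustrative): it prepends the missing *Vertices N header to a Pajek file that jumps straight to *Arcs, inferring N as the largest vertex id that appears.

```python
def add_vertices_line(src_path, dst_path):
    # Prepend a "*Vertices N" header to a Pajek file that starts directly
    # with *Arcs. N is guessed as the largest vertex id mentioned in any
    # "source target" edge line; section lines like "*Arcs" are skipped.
    with open(src_path) as f:
        lines = f.readlines()
    n = 0
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit() and parts[1].isdigit():
            n = max(n, int(parts[0]), int(parts[1]))
    with open(dst_path, "w") as f:
        f.write("*Vertices %d\n" % n)
        f.writelines(lines)
```

After repairing the file this way, igraph.read(dst_path, format="pajek") should get past the "expecting VERTICESLINE" error, though the vertices will have no names.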

Related

Cannot read Statistics Canada sdmx file

I am trying to read some Canadian census data from Statistics Canada
(the XML option for the "Canada, provinces and territories" geographic level). I see that the XML file is in the SDMX format and that there is a structure file provided, but I cannot figure out how to read the data from the XML file.
It seems there are two options in Python, pandasdmx and sdmx1, both of which say they can read local files. When I try
import sdmx
datafile = '~/Documents/Python/Generic_98-401-X2016059.xml'
canada = sdmx.read_sdmx(datafile)
It appears to read the first 903 lines and then produces the following:
Traceback (most recent call last):
File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/xml.py", line 238, in read_message
raise NotImplementedError(element.tag, event) from None
NotImplementedError: ('{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}GenericData', 'start')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/__init__.py", line 126, in read_sdmx
return reader().read_message(obj, **kwargs)
File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/xml.py", line 259, in read_message
raise XMLParseError from exc
sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}GenericData', 'start')
Is this happening because I've not loaded the structure of the sdmx file (Structure_98-401-X2016059.xml in the zip file from the StatsCan link above)? If so, how do I go about loading that and telling sdmx to use that when reading datafile?
The documentation for sdmx and pandasdmx only show examples of loading files from online providers and not from local files, so I'm stuck. I have limited familiarity with python so any help is much appreciated.
For reference, I can read the file in R using the instructions from the rsdmx github. I would like to be able to do the same/similar in Python.
Thanks in advance.
From a cursory inspection of the documentation, it seems that Statistics Canada is not one of the sources included by default. There is, however, an sdmx.add_source function. I suggest you try that (before loading the data).
According to the sdmx1 developer, StatsCan is using an older, unsupported version of the SDMX standard (v2.0). The current version is 2.1, and sdmx1 only supports this (support is also being added for the upcoming v3).

How to read the .mat file from Visual SFM in Python Code?

Can someone help me with the Python code to read the .mat file generated from Visual SFM? You can download the .mat file from the link:
https://github.com/cvlab-epfl/tf-lift/tree/master/example
You can get the .mat file from the zip at that link; that file is what I am asking for help with.
It seems to be an ASCII file. I do not know how to read the data in the file.
I tried to load the data in the .mat file with scipy.io.loadmat() but an error occurred as:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
raise ValueError('Unknown mat file type, version %s, %s' % ret)
ValueError: Unknown mat file type, version 20, 0
Can someone help me to load the data in the file with Python code?
Sincere thanks for your help and replies.
If you mean this VisualSFM (http://ccwu.me/vsfm/doc.html), then the .mat file isn't a MATLAB .mat file, but a 'match' file.
From the website:
[name].sift stores all the detected SIFT features, and [name].mat stores the feature matches.
It seems there is C++ code for reading this file (http://ccwu.me/vsfm/MatchFile.zip), which you could use to write a Python parser.
Additionally, it seems there is a Python socket interface to VSFM, which may allow you to do what you want: https://github.com/nrhine1/vsfm_util
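A quick way to confirm that a file is not a genuine MATLAB .mat file is to look at its first bytes: MATLAB v5+ files begin with a human-readable text header starting with "MATLAB", which is what scipy.io.loadmat expects to find. A small stdlib sketch (the filenames are stand-ins, not the actual match file):

```python
def looks_like_matlab_mat(path):
    # MATLAB v5+ .mat files start with a 116-byte text header that begins
    # with the string "MATLAB"; other file types will confuse
    # scipy.io.loadmat, producing errors like "Unknown mat file type".
    with open(path, "rb") as f:
        return f.read(6) == b"MATLAB"
```

Running this on VisualSFM's match file should return False, confirming that loadmat is the wrong tool for it.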

Python loadarff fails for string attributes

I am trying to load an arff file using Python's 'loadarff' function from scipy.io.arff. The file has string attributes and it is giving the following error.
>>> data,meta = arff.loadarff(fpath)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/home/eex608/conda3_envs/PyT3/lib/python3.6/site-packages/scipy/io/arff/arffread.py", line 805, in loadarff
return _loadarff(ofile)
File "/data/home/eex608/conda3_envs/PyT3/lib/python3.6/site-packages/scipy/io/arff/arffread.py", line 838, in _loadarff
raise NotImplementedError("String attributes not supported yet, sorry")
NotImplementedError: String attributes not supported yet, sorry
How to read the arff successfully?
Since SciPy's loadarff converts the contents of an ARFF file into a NumPy array, it does not support string attributes.
In 2020, you can use liac-arff package.
import arff
data = arff.load(open('your_document.arff', 'r'))
However, make sure your ARFF document does not contain inline comments after meaningful text, i.e. inputs like this:
#ATTRIBUTE class {F,A,L,LF,MN,O,PE,SC,SE,US,FT,PO} %Check and make sure that FT and PO should be there
Delete the comment or move it to the next line.
I had a mistake like this in one document, and it took some time to figure out what was wrong.
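If pulling in another package is not an option, a minimal stdlib-only reader can handle simple ARFF files with string attributes. This is a hedged sketch, not a full ARFF parser: it assumes dense data, no quoted commas, and no sparse or date types.

```python
import io

def read_arff_simple(text):
    # Minimal ARFF reader: collects attribute names from @attribute lines
    # and row values after @data. String attributes are kept as plain str.
    names, rows, in_data = [], [], False
    for line in io.StringIO(text):
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blanks and comment lines
        lower = line.lower()
        if lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append(line.split(","))
    return names, rows

sample = """@relation demo
@attribute name string
@attribute age numeric
@data
alice,30
bob,25
"""
names, rows = read_arff_simple(sample)
print(names)    # ['name', 'age']
print(rows[0])  # ['alice', '30']
```

For anything beyond toy files, liac-arff remains the safer choice.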

Invalid Signature Exception with .sav file

New to scipy but not to python. Trying to import a .sav file to scipy so I can do some basic work on it. But, each time I try to import the file using scipy.io.readsav(), python throws an error:
Traceback (most recent call last):
File "<ipython-input-7-743be643d8a1>", line 1, in <module>
dataset = io.readsav("c:/users/me/desktop/survey.sav")
File "C:\Users\me\Anaconda3\lib\site-packages\scipy\io\idl.py", line 726, in readsav
raise Exception("Invalid SIGNATURE: %s" % signature)
Exception: Invalid SIGNATURE: b'$F'
Any idea what's happening? I can open the file in R and manipulate the data, but I'd like to do it in Python. Running Anaconda on Windows.
scipy.io.readsav() reads IDL SAVE files. You have tagged this question spss, so I assume you are trying to read an SPSS file. The format of an SPSS .sav file is not the same as the format of an IDL SAVE file.
Look on PyPI for savReaderWriter for Python code to read and write .sav files.
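The mix-up is visible in the signature itself: IDL SAVE files begin with the two bytes b'SR', while SPSS .sav files begin with b'$FL2', which is why scipy.io.readsav reports Invalid SIGNATURE: b'$F'. A stdlib sketch that tells the two apart (the filenames used are illustrative):

```python
def sav_format(path):
    # IDL SAVE files start with b"SR"; SPSS system files start with b"$FL2".
    # The same .sav extension is used for both, hence the confusion.
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"$FL2":
        return "spss"
    if magic[:2] == b"SR":
        return "idl"
    return "unknown"
```

Checking the magic bytes first saves a round of guessing which reader to use.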

how to parse big datasets using RDFLib?

I'm trying to parse several big graphs with RDFLib 3.0. Apparently it handles the first one but dies on the second with a MemoryError. It also looks like MySQL is not supported as a store anymore. Can you please suggest a way to parse these files?
Traceback (most recent call last):
File "names.py", line 152, in <module>
main()
File "names.py", line 91, in main
locals()[graphname].parse(filename, format="nt")
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/graph.py", line 938, in parse
location=location, file=file, data=data, **args)
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/graph.py", line 757, in parse
parser.parse(source, self, **args)
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/plugins/parsers/nt.py", line 24, in parse
parser.parse(f)
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/plugins/parsers/ntriples.py", line 124, in parse
self.line = self.readline()
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/plugins/parsers/ntriples.py", line 151, in readline
m = r_line.match(self.buffer)
MemoryError
How many triples are in those RDF files? In my tests, rdflib won't scale much further than a few tens of thousands of triples, if you are lucky. There is no way it performs well on files with millions of triples.
The best parser out there is rapper, from the Redland libraries. My first advice is not to use RDF/XML; go for N-Triples instead, which is a lighter format. You can transform RDF/XML to N-Triples using rapper:
rapper -i rdfxml -o ntriples YOUR_FILE.rdf > YOUR_FILE.ntriples
If you like Python you can use the Redland python bindings:
import RDF

parser = RDF.Parser(name="ntriples")
model = RDF.Model()
stream = parser.parse_into_model(model, "file://file_path",
                                 "http://your_base_uri.org")
for triple in model:
    print triple.subject, triple.predicate, triple.object
I have parsed fairly big files (a couple of gigabytes) with the Redland libraries with no problem.
Eventually, if you are handling big datasets, you might need to load your data into a scalable triple store; the one I normally use is 4store. 4store internally uses Redland to parse RDF files. In the long term, I think, going for a scalable triple store is what you'll have to do, and with it you'll be able to use SPARQL to query your data and SPARQL/Update to insert and delete triples.
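Because N-Triples is line-oriented, for simple extraction jobs you can also stream the file yourself instead of building a graph in memory, which sidesteps the MemoryError entirely. A hedged stdlib sketch (it assumes well-formed lines and does not handle the full escaping rules of the format):

```python
def iter_ntriples(path):
    # Yield (subject, predicate, object) strings one line at a time;
    # memory use stays constant no matter how large the file is.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comment lines
            s, p, o = line.rstrip(" .").split(" ", 2)
            yield s, p, o
```

This is no substitute for a real parser or a triple store, but it is often enough to count triples or pull out specific predicates from multi-gigabyte dumps.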
