I'm completely new to coding and Python and am having trouble with the simple task of reading a csv file.
Naturally, I started with:
import pandas as pd
import igraph as ig
I tested the csv using:
test_df = pd.read_csv('griplinks.csv')
print(test_df.head())
It seemed to work, since I got this output:
   From  To
0     1  11
1     1  31
2     1  40
3     1  44
4     1  53
However, when I tried to load the CSV file as a graph with:
griplinks = ig.Graph.Read_Ncol('griplinks.csv', directed=False)
I got:
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
in ()
      1 # Attempt 1
      2
----> 3 griplinks = ig.Graph.Read_Ncol('griplinks.csv', directed=False)
InternalError: Error at c:\users\vssadministrator\appdata\local\temp\pip-req-build-ft6_7fco\vendor\build\igraph\igraph-0.8.3-msvc\src\foreign.c:244: Parse error in NCOL file, line 1 (syntax error, unexpected NEWLINE, expecting ALNUM), Parse error
Since there seems to be nothing wrong with my CSV file or its path, is there something wrong with the code I used to read it?
The documentation is indeed not very clear on this: Read_Ncol expects the nodes on each line to be separated by whitespace, not by a comma. It might be easier to construct your graph directly from the pandas DataFrame:
griplinks = ig.Graph.DataFrame(test_df)
Note that this was only introduced in python-igraph version 0.8.3, so make sure to use at least that version.
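If you would rather keep using Read_Ncol, the file has to be in NCOL format: one whitespace-separated pair of node names per line. The sketch below uses only the standard library to convert the comma-separated edge list into that format and to drop the "From,To" header, which NCOL would otherwise read as an edge between nodes named From and To. The function name csv_to_ncol is hypothetical, not part of igraph.

```python
import csv
import io


def csv_to_ncol(csv_text: str) -> str:
    """Convert a comma-separated edge list with a header row into NCOL text:
    one 'source target' pair per line, separated by a single space."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the "From,To" header row
    lines = [" ".join(field.strip() for field in row) for row in reader if row]
    return "\n".join(lines) + "\n"


sample = "From,To\n1,11\n1,31\n1,40\n"
print(csv_to_ncol(sample), end="")
```

After writing the converted text to a new file (for example a hypothetical griplinks.ncol), ig.Graph.Read_Ncol('griplinks.ncol', directed=False) should be able to parse it.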
I have some code that relies on the koala2 package. It was running a few weeks ago but has since stopped working. The following code throws an error:
!pip install koala2
from koala.ExcelCompiler import ExcelCompiler
ImportError Traceback (most recent call last)
<ipython-input-2-715ed1f6c9fe> in <module>()
----> 1 from koala.ExcelCompiler import ExcelCompiler
2 from koala.Spreadsheet import Spreadsheet
3 import pandas as pd
4 import numpy as np
5 import string
/usr/local/lib/python3.7/dist-packages/koala/__init__.py in <module>()
2
3 from openpyxl import *
----> 4 from .ast import *
5 from .Cell import *
6 from .ExcelCompiler import *
/usr/local/lib/python3.7/dist-packages/koala/ast/__init__.py in <module>()
7 import networkx
8 from networkx.classes.digraph import DiGraph
----> 9 from openpyxl.compat import unicode
10
11 from koala.utils import uniqueify, flatten, max_dimension, col2num, resolve_range
ImportError: cannot import name 'unicode' from 'openpyxl.compat' (/usr/local/lib/python3.7/dist-packages/openpyxl/compat/__init__.py)
This code is taken directly from the very first example on the koala2 PyPI page.
I can get around this by:
!pip install openpyxl==2.5.14
But then I can no longer use pandas read_excel(), because it requires openpyxl >= 3.0.0.
This is strange to me because, as I mentioned earlier, everything was running fine a few weeks ago, and I don't think there are new versions of koala or openpyxl. Is anyone aware of a workaround for this? Any help is greatly appreciated.
Edit: It also looks like koala has been abandoned by its creators. My goal was to read in an Excel file, record where the formulas are, and apply these formulas to a new Excel file or DataFrame with the same layout and dimensions (number of rows and columns) as the original. If there is a better way to do this without koala, any insight would be very helpful. Thanks!
This looks like a known problem from 2019.
The koala code hasn't been updated in more than 2 years (maybe you should search for a newer alternative?).
Here are two possible solutions:
Use older versions of all the other libraries in your code so that the dependencies stay compatible.
Patch the koala library manually every time you install it:
Until the bugfix is put into the repository, the fix I made locally is: replace the "from openpyxl.compat import unicode" line with "unicode = str" in all the koala files that have it.
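The manual patch described above can be scripted so it is easy to re-apply after each reinstall. This is a minimal sketch, not an official koala fix: it assumes the offending line appears exactly as "from openpyxl.compat import unicode" (possibly indented), and it uses importlib.util.find_spec to locate the installed package without running its broken __init__.py. patch_unicode_import is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path

OLD_LINE = "from openpyxl.compat import unicode"
NEW_LINE = "unicode = str"


def patch_unicode_import(package_dir):
    """Replace the removed openpyxl.compat 'unicode' import with 'unicode = str'
    in every .py file under package_dir. Returns the list of patched files."""
    patched = []
    for path in Path(package_dir).rglob("*.py"):
        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        new_lines = [
            # keep the original indentation, swap only the import statement
            line.replace(OLD_LINE, NEW_LINE) if line.strip() == OLD_LINE else line
            for line in lines
        ]
        if new_lines != lines:
            path.write_text("".join(new_lines), encoding="utf-8")
            patched.append(path)
    return patched


if __name__ == "__main__":
    # find_spec locates a top-level package without executing its __init__.py
    spec = importlib.util.find_spec("koala")
    if spec is None or spec.submodule_search_locations is None:
        print("koala is not installed")
    else:
        for location in spec.submodule_search_locations:
            for patched_file in patch_unicode_import(location):
                print("patched", patched_file)
```

Running the script once after each pip install of koala2 should apply the same edit as the manual fix.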
KeyError Traceback (most recent call last)
<ipython-input-17-e0d335b3929e> in <module>()
1 numfile = 0
2 filename = os.path.join(data_dir, Directory[0])
----> 3 radar = pyart.io.read(filename)
4 display = pyart.graph.RadarMapDisplay(radar)
/usr/local/lib/python3.7/dist-packages/pyart/io/cfradial.py in read_cfradial(filename, field_names, additional_metadata, file_field_names, exclude_fields, include_fields, delay_field_loading, **kwargs)
134 # 4.4 coordinate variables -> create attribute dictionaries
135 time = _ncvar_to_dict(ncvars['time'])
136 _range = _ncvar_to_dict(ncvars['range'])
137
KeyError: 'time'
It is hard to give a specific solution unless you can provide a little more detail. The basis of this error, however, is that the reader pyart.io.cfradial.read_cfradial is looking for a specific variable ("time") in the provided NetCDF file but cannot find it. One simple solution might be to add a "time" variable directly using either netCDF4.Dataset or xarray, which might allow the file to be opened if no other variables are missing.
It is also possible that a specific file handler in Pyart could open your NetCDF file; take a look at pyart.io and pyart.aux_io for listings of available functions.
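To see what the file actually contains before choosing a reader, you can list its variables. This sketch assumes the netCDF4 package is installed (it is imported only inside check_file). The required-name list covers just the two variables visible in the traceback (time and range); the CF/Radial reader needs others as well, so treat this as a first check, not a full validation. missing_variables and check_file are hypothetical helper names.

```python
# Variables read at the top of read_cfradial according to the traceback;
# the reader needs more than these, so this is only a first check.
REQUIRED = ("time", "range")


def missing_variables(available, required=REQUIRED):
    """Return the required variable names that are absent from `available`."""
    present = set(available)
    return [name for name in required if name not in present]


def check_file(filename):
    """Print the variables stored in a NetCDF file and return the missing ones."""
    from netCDF4 import Dataset  # needs the netCDF4 package

    with Dataset(filename) as ds:
        names = list(ds.variables)
    print("variables in file:", names)
    return missing_variables(names)
```

If check_file(filename) reports time as missing, that confirms the cause of the KeyError, and the printed list may show whether the time coordinate is stored under a different name.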
I am using tabula-py to read data from some PDFs, but I keep getting this error:
Exception has occurred: ParserError
Error tokenizing data. C error: Expected 1 fields in line 51, saw 2
The PDF I am reading from is almost exactly the same as the one I built this code around. I built the code while testing with another PDF, and am now switching to an updated one with the same format and style, but the code now fails with this error.
I'm not sure what I am doing wrong, or why this code that previously worked no longer works.
Code snippet:
import os
import pandas as pd
import tabula

tabula.convert_into_by_batch("-----", stream=True, output_format='csv', pages='11-57')
path = "-------"
filenamelist = os.listdir(path)
updated_path = os.path.join(path, filenamelist[0])
new_frame = pd.read_csv(updated_path, skiprows=2, encoding='ISO-8859-1')  # error thrown here
Converting PDFs to CSVs is never a perfect transformation. Converting anything out of a PDF is actually quite difficult and can be finicky no matter which library you use. Your error is telling you that on line 51 of your converted CSV there is a comma that pandas did not expect to see. In all of the rows leading up to the "bad" row there were no commas, so pandas expected each row to contain 1 value. Then on line 51 it encountered either 2 values or a value with a trailing comma, which makes this an improperly formatted CSV.
import pandas as pd
import io
bad_csv_file = io.StringIO("""
A
1
2
3
99
50,
100
""".strip())
pd.read_csv(bad_csv_file)
Output:
Error tokenizing data. C error: Expected 1 fields in line 6, saw 2
Note that there's an extra comma on line 6 that leads to the above error. Simply removing that extra trailing comma resolves this error.
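For a long converted file, it can help to locate every malformed row at once instead of fixing them one error at a time. This standard-library sketch (find_bad_rows is a hypothetical name) compares each row's field count with the header row and ignores blank lines, which pandas also skips by default. The optional skiprows argument mirrors the skiprows=2 used in the question by discarding that many leading lines before the header.

```python
import csv
import io


def find_bad_rows(csv_text, skiprows=0):
    """Return (line_number, fields) for each row whose field count differs
    from the header row. Line numbers are 1-based, counted from the start
    of csv_text, including any skipped lines."""
    rows = csv.reader(io.StringIO(csv_text))
    for _ in range(skiprows):
        next(rows, None)  # discard leading lines, like pandas' skiprows
    header = next(rows, None)
    if header is None:
        return []
    expected = len(header)
    bad = []
    for row in rows:
        if row and len(row) != expected:  # blank lines come back as []
            bad.append((rows.line_num, row))
    return bad


sample = "A\n1\n2\n3\n99\n50,\n100"
print(find_bad_rows(sample))
```

On the sample above it reports [(6, ['50', ''])], the same line pandas complains about. For the converted file, you could pass open(updated_path, encoding='ISO-8859-1').read() together with skiprows=2.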
When reading a CSV file, I get the following error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 12, saw 2
I opened my CSV file, went to that line, and saw that the error occurs because one of the numbers has decimals separated by a comma.
That entire column of my CSV file has whole numbers, but also decimal numbers that look like the following:
385433,4
How can I resolve this error when reading the CSV file with pandas?
It sounds like you have a European-formatted CSV. Since you haven't provided a real sample of your CSV as requested, I will guess. If this doesn't solve your issue, edit your question to provide an actual sample:
Given test.csv:
c1;c2;c3
1,2;3,4;5,6
3,4;5,6;7,8
Then:
import pandas as pd
data = pd.read_csv('test.csv',decimal=',',delimiter=';')
print(data)
Produces:
    c1   c2   c3
0  1.2  3.4  5.6
1  3.4  5.6  7.8
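A self-contained variant of the same call, using an in-memory sample instead of test.csv so it can be tried without a file. The sample is hypothetical: it assumes a semicolon delimiter and mixes whole numbers with comma-decimal values like the 385433,4 from the question. With decimal=',', pandas parses the whole column as floats.

```python
import io

import pandas as pd

# Hypothetical sample: ';' separates columns, ',' is the decimal mark.
sample = io.StringIO("id;amount\n1;385433,4\n2;1200\n3;17,25\n")

df = pd.read_csv(sample, sep=";", decimal=",")
print(df.dtypes)
print(df)
```

If the file really uses commas for both the delimiter and the decimal mark without quoting, the values are ambiguous and no read_csv option can separate them; the file would have to be exported again with a different delimiter.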
I have multiple VCF files, and I need to extract some rows from each VCF file based on defined filters. To achieve that, I started off with:
import vcf
import vcf.utils
This seems straightforward and neat, but I am running into issues. It would be really great if someone could take a look and guide me toward the desired output.
The VCF file looks like this: it has header lines starting with #, followed by the records I need (a few header lines and the relevant rows are shown below):
##fileformat=VCFv4.1
##source=V2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
chr10 197523 . G A . PASS DP=26;SS=1;SSC=2;GPV=5.4595E-6;SPV=6.1327E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:17:8:9:52.94%:5,3,4,5 0/1:.:9:4:5:55.56%:2,2,2,3
chr10 198411 . T G . PASS DP=37;SS=1;SSC=5;GPV=1.2704E-5;SPV=2.7151E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:19:13:6:31.58%:8,5,1,5 0/1:.:18:9:8:47.06%:3,6,5,3
I used the following Python code to get the information I need, but it throws an error:
reader_BM_CR_ID = vcf.Reader(filename="sample/sam/sample.vcf", compressed=False)
writer_CR = vcf.Writer(open('same/sam/sample_filtered.vcf', 'w'), reader_BM_CR_ID)
for variants in vcf.utils(reader_BM_CR_ID):
    for call in variants.samples:
        if call.sample == 'T':
            if call.data.FREQ >= '20%':
                if call.data.FREQ > '0%':
                    if call.data.FREQ != '100%':
                        if call.data.DP >= 20:
                            writer.write_record(id_var)
The error message:
TypeError Traceback (most recent call last)
<ipython-input-471-526e4c3bbab1> in <module>()
----> 1 for variants in vcf.utils(reader_BM_CR_ID):
2
3 for call in variants.samples:
4 if call.sample == 'T':
5 if call.data.FREQ >='20%':
TypeError: 'module' object is not callable
Any help is really appreciated!
You are trying to call a module as a function.
Python reports exactly that:
TypeError: 'module' object is not callable
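A corrected version of the loop might look like the sketch below. It iterates over the vcf.Reader object directly (the reader is already iterable, so no vcf.utils call is needed), writes the current record through the writer that was actually created, and compares FREQ numerically: comparing strings such as '3.5%' >= '20%' is lexicographic and wrongly returns True. The redundant "> 0%" check is dropped. It assumes FREQ values look like '52.94%' as in the sample rows and that the sample is named T as in the question; parse_freq, passes_filters and filter_vcf are hypothetical helper names, and the PyVCF import is deferred so the filtering logic can be tested without it.

```python
def parse_freq(freq):
    """Convert a FREQ value such as '52.94%' to the float 52.94."""
    return float(str(freq).rstrip("%"))


def passes_filters(freq, depth):
    """Apply the question's filters: FREQ >= 20%, FREQ != 100%, DP >= 20."""
    if freq is None or depth is None:  # PyVCF uses None for missing '.' values
        return False
    value = parse_freq(freq)
    return value >= 20.0 and value != 100.0 and int(depth) >= 20


def filter_vcf(in_path, out_path, sample_name="T"):
    """Write every record whose `sample_name` call passes the filters."""
    import vcf  # PyVCF, imported here so the helpers above work without it

    reader = vcf.Reader(filename=in_path, compressed=False)
    with open(out_path, "w") as out:
        writer = vcf.Writer(out, reader)
        for record in reader:  # iterate the Reader itself, not vcf.utils
            for call in record.samples:
                if call.sample == sample_name and passes_filters(call.data.FREQ, call.data.DP):
                    writer.write_record(record)
                    break  # write each record at most once
```

For example, filter_vcf("sample/sam/sample.vcf", "sample_filtered.vcf") would write the matching records to a new file, where the output name is only illustrative.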