vcfutils for parsing multiple vcf files - python

I have multiple VCF files, and what I need to achieve is to extract certain rows from each VCF file based on defined filters. To do that I started off with
import vcf
import vcf.utils
which seemed straightforward and neat, but I am running into issues. It would be really great if someone could take a look and give me a little guidance towards the desired output.
The VCF file looks like this: it has header lines starting with #, followed by the records we need (a few header lines and sample records are shown below):
##fileformat=VCFv4.1
##source=V2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
chr10 197523 . G A . PASS DP=26;SS=1;SSC=2;GPV=5.4595E-6;SPV=6.1327E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:17:8:9:52.94%:5,3,4,5 0/1:.:9:4:5:55.56%:2,2,2,3
chr10 198411 . T G . PASS DP=37;SS=1;SSC=5;GPV=1.2704E-5;SPV=2.7151E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:19:13:6:31.58%:8,5,1,5 0/1:.:18:9:8:47.06%:3,6,5,3
So I used the following piece of Python code to get the information I need. This code throws an error message:
reader_BM_CR_ID = vcf.Reader(filename="sample/sam/sample.vcf", compressed=False)
writer_CR = vcf.Writer(open('same/sam/sample_filtered.vcf', 'w'), reader_BM_CR_ID)

for variants in vcf.utils(reader_BM_CR_ID):
    for call in variants.samples:
        if call.sample == 'T':
            if call.data.FREQ >= '20%':
                if call.data.FREQ > '0%':
                    if call.data.FREQ != '100%':
                        if call.data.DP >= 20:
                            writer.write_record(id_var)
The error message,
TypeError Traceback (most recent call last)
<ipython-input-471-526e4c3bbab1> in <module>()
----> 1 for variants in vcf.utils(reader_BM_CR_ID):
2
3 for call in variants.samples:
4 if call.sample == 'T':
5 if call.data.FREQ >='20%':
TypeError: 'module' object is not callable
Any help is really appreciated!

You are trying to call a module as a function, and Python reports exactly that:
TypeError: 'module' object is not callable
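vcf.utils is a module; the callable you were probably after is vcf.utils.walk_together(), which walks several readers in lockstep. Since this script only uses one reader, you can simply iterate the reader object itself. A minimal sketch under those assumptions (taking 'T' to be the tumour sample name and FREQ to be a percentage string such as '52.94%'; paths as in the question):
import vcf

reader_BM_CR_ID = vcf.Reader(filename="sample/sam/sample.vcf", compressed=False)
writer_CR = vcf.Writer(open('same/sam/sample_filtered.vcf', 'w'), reader_BM_CR_ID)

# Iterate the reader directly; vcf.utils.walk_together(reader1, reader2, ...)
# is only needed when walking several readers at the same time.
for record in reader_BM_CR_ID:
    for call in record.samples:
        if call.sample != 'T':                      # assumed tumour sample name
            continue
        freq = float(call.data.FREQ.rstrip('%'))    # FREQ is a string like '52.94%'
        if 20 <= freq < 100 and int(call.data.DP) >= 20:
            writer_CR.write_record(record)

writer_CR.close()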

Related

How can I resolve this Pyart issue with "time" as the KeyError from the data?

KeyError Traceback (most recent call last)
<ipython-input-17-e0d335b3929e> in <module>()
1 numfile = 0
2 filename = os.path.join(data_dir, Directory[0])
----> 3 radar = pyart.io.read(filename)
4 display = pyart.graph.RadarMapDisplay(radar)
1 frames
/usr/local/lib/python3.7/dist-packages/pyart/io/cfradial.py in read_cfradial(filename, field_names, additional_metadata, file_field_names, exclude_fields, include_fields, delay_field_loading, **kwargs)
134 # 4.4 coordinate variables -> create attribute dictionaries
135 time = _ncvar_to_dict(ncvars['time'])
136 _range = _ncvar_to_dict(ncvars['range'])
137
KeyError: 'time'
It is hard to answer this with a specific solution unless you can provide a little more detail. The basis of this error, however, is that the pyart.io.cfradial reader is looking for a specific variable in the provided NetCDF file but cannot find it. One simple solution might be to add a "time" variable directly using either netCDF4.Dataset or xarray, which might allow the file to be opened if there are no other missing variables.
It is also possible that a specific file handler in Pyart could open your NetCDF file; take a look at pyart.io and pyart.aux_io for listings of available functions.
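As a first step, you could open the file yourself to see which variables it actually contains. A minimal sketch, assuming the netCDF4 package is installed and filename points at the file from the traceback:
import netCDF4

nc = netCDF4.Dataset(filename)      # filename: path to the NetCDF/CfRadial file
print(nc.variables.keys())          # is there a 'time' variable at all?
print(nc.dimensions.keys())         # which dimensions exist
nc.close()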

Graph.Read_Ncol (csv) for igraph in Python

I'm completely new to coding and Python and am having trouble with the simple task of reading a csv file.
Naturally, I started with:
import pandas as pd
import igraph as ig
I tested the csv using:
test_df = pd.read_csv('griplinks.csv')
print(test_df.head())
It seemed to work because I was able to come up with the output:
From To
0 1 11
1 1 31
2 1 40
3 1 44
4 1 53
However, when it was time to actually read my csv file using:
griplinks = ig.Graph.Read_Ncol('griplinks.csv', directed=False)
I would come up with:
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
in ()
      1 # Attempt 1
      2
----> 3 griplinks = ig.Graph.Read_Ncol('griplinks.csv', directed=False)
InternalError: Error at c:\users\vssadministrator\appdata\local\temp\pip-req-build-ft6_7fco\vendor\build\igraph\igraph-0.8.3-msvc\src\foreign.c:244: Parse error in NCOL file, line 1 (syntax error, unexpected NEWLINE, expecting ALNUM), Parse error
Since nothing's really wrong with my csv file or its path, I was wondering if there's something wrong with the code I used to read it?
The documentation is indeed not really clear: Read_Ncol expects the nodes to be separated by whitespace, not by a comma. It might be easier to construct your graph directly from the pandas DataFrame:
griplinks = ig.Graph.DataFrame(test_df)
Note that this was only introduced in python-igraph version 0.8.3, so make sure to use at least that version.
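Putting the pieces together, a minimal sketch (assuming the CSV has a header row and its first two columns are the edge endpoints, as in the output shown above):
import pandas as pd
import igraph as ig

test_df = pd.read_csv('griplinks.csv')                   # columns: From, To
griplinks = ig.Graph.DataFrame(test_df, directed=False)  # build the graph from the edge list
print(griplinks.summary())                               # quick check of vertex and edge counts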

osmnx elevation data: no module named 'keys'

I'm coming from R and am new to Python, so I assume this is a novice question, but any help would be appreciated. I'm following along with this example to add elevation data to OpenStreetMap data using the OSMnx package:
https://github.com/gboeing/osmnx-examples/blob/master/notebooks/12-node-elevations-edge-grades.ipynb
When I type the first block of code
from keys import my_google_elevation_api_key
I get this error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-50-384032deeff7> in <module>()
----> 1 from keys import my_google_elevation_api_key #replace this with your own API key
ModuleNotFoundError: No module named 'keys'
-----------
(Note: I am using an API key from Google as instructed in the example; the code above is intentionally generic.) I'm using Jupyter to write the Python code.
The author probably has a keys.py file in which he defines the variable my_google_elevation_api_key and assigns it a key that he later uses to access the elevation API.
You could simply replace the import line
from keys import my_google_elevation_api_key
with this
my_google_elevation_api_key = '<your_api_key>'
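Alternatively, if you would rather keep the notebook's import line unchanged, you can create your own keys.py next to the notebook. A minimal sketch (the key value is a placeholder you replace with your own):
# keys.py -- saved in the same directory as the notebook,
# so that `from keys import my_google_elevation_api_key` works
my_google_elevation_api_key = 'YOUR_GOOGLE_ELEVATION_API_KEY'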

Error processing a csv file in Keras

I'm working on building an LSTM recurrent neural network that processes a set of texts and uses them to predict the author of new texts. I have a CSV file containing a single column of long text entries that is comma separated like this:
"sample text here","more text here","extra text written here"
This goes on for a few thousand entries. I'm trying to load this so I can feed it through the Keras Tokenizer and then use it in training my model, but I'm stuck on an error from the first call to the Tokenizer, which kicks back with:
Traceback (most recent call last):
File "test.py", line 35, in <module>
t.fit_on_texts(X_train)
File "I:\....text.py",
line 175, in fit_on_texts
self.split)
File "I:\....text.py",
line 47, in text_to_word_sequence
text = text.translate(translate_map)
AttributeError: 'numpy.ndarray' object has no attribute 'translate'
I'm very new to python, but as far as I can tell the issue is that the Tokenizer is expecting strings, but it's getting passed an ndarray instead. What I can't seem to manage is finding a way to pass it the correct thing, and I would really appreciate any advice. I've been working on this for a couple days now and it's just not coming to me.
Here's the relevant section of my code:
X_train = pandas.read_csv('I:\\xTrain.csv', sep=",", header=None, error_bad_lines=False).as_matrix()
t = Tokenizer(lower=False)
t.fit_on_texts(X_train)
t.texts_to_matrix(X_train, mode='count', lower=False)
I've tried reading it in a variety of ways, including using numpy.loadtxt. The error has varied a bit with the methods, but it's always that I'm trying to feed the wrong kind of input to the Tokenizer and I can't seem to work out how to get the right kind. What am I missing here? Thanks for taking the time to read!
Update
With help from furas, I discovered that my array was two columns wide and have successfully removed the second empty column. Unfortunately, this seems to have simply changed the error I'm getting slightly. It now reads:
Traceback (most recent call last):
File "test.py", line 36, in <module>
t.fit_on_texts(X_train)
File "I:\....text.py",
line 175, in fit_on_texts
self.split)
File "I:\....text.py",
line 47, in text_to_word_sequence
text = text.translate(translate_map)
AttributeError: 'numpy.int64' object has no attribute 'translate'
The only change is that numpy.ndarray is now numpy.int64. It looks to me like this is an int array now, even though it contains strings of text, so I'm attempting to find a way to convert it into a string array. The code I've tried so far:
del X_train[1]
X_train[0] = Y_train[0].apply(str)
The first line strips the extra column, but the second line seems to do nothing. I'm still trying to figure out how to get this data into the proper format.
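One way around the error would be to hand the Tokenizer a plain list of Python strings rather than the matrix returned by as_matrix(). A sketch of that idea, assuming the texts live in the first column of the CSV and that keras.preprocessing.text.Tokenizer is being used:
import pandas
from keras.preprocessing.text import Tokenizer

# Read the file, keep only the first column, and force every cell to str
raw = pandas.read_csv('I:\\xTrain.csv', sep=',', header=None, error_bad_lines=False)
texts = raw[0].astype(str).tolist()     # fit_on_texts expects an iterable of strings

t = Tokenizer(lower=False)
t.fit_on_texts(texts)
counts = t.texts_to_matrix(texts, mode='count')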

TypeError when trying to fetch data from CSV file

I am new to Python. I have a .csv file named supermarket.csv. I am trying to fetch data from the file and store it in a DataFrame object. I am using Jupyter as my editor.
The data the file contains:
,Address,City,Country,Employees,ID,Name,State
0,3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
1,735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
2,332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3,3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
4,1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
5,551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114
The code I am trying to run:
import pandas
df1=pandas.read_csv("supermarkets.csv")
df1
and it's throwing a TypeError:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-123-0000e09242f0> in <module>()
----> 1 df1=pandas.read_csv("supermarkets.csv")
----> 2 df1
TypeError: 'str' object is not callable
---------------------------------------------------------------------------
I was following a tutorial, and in the tutorial it worked fine for the instructor, but whenever I try to run this code I get the same error. I have also tried .json and .xlsx files, and both work fine; only the read_csv() method gives this error.
Improper Indentation
Python programs get structured through indentation, i.e. code blocks are defined by their indentation. Okay, that's what we expect from any program code, isn't it? Yes, but in the case of Python it's a language requirement, not a matter of style. This principle makes it easier to read and understand other people's Python code.
Read more about Python Indentation
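For instance, in the following sketch the two print calls end up in different blocks purely because of their indentation:
x = 5
if x > 3:
    print("x is greater than 3")   # indented: part of the if-block
print("done")                      # not indented: always runs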
Try this:
import pandas
df1 = pandas.read_csv('supermarkets.csv')
print(df1)
