OSMnx elevation data: No module named 'keys' - Python

I'm coming from R and new to Python, so I assume this is a novice question, but any help would be appreciated. I'm following along with this example to add elevation data to OpenStreetMap data using the OSMnx package:
https://github.com/gboeing/osmnx-examples/blob/master/notebooks/12-node-elevations-edge-grades.ipynb
When I type the first block of code
from keys import my_google_elevation_api_key
I get this error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-50-384032deeff7> in <module>()
----> 1 from keys import my_google_elevation_api_key #replace this with your own API key
ModuleNotFoundError: No module named 'keys'
-----------
(Note: I am using an API key from Google as instructed in the example; the code above is intentionally generic.) I'm using Jupyter to type the Python code.

The author probably has a keys.py in which he defines the variable my_google_elevation_api_key and assigns it a key that he later uses to access the elevation API.
You could simply replace the import line
from keys import my_google_elevation_api_key
with this
my_google_elevation_api_key = '<your_api_key>'
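If you would rather keep the notebook's import line unchanged, you can recreate the keys.py module the author presumably has. A minimal sketch, assuming the notebook's working directory is writable; the module and variable names come from the example's import line, and '<your_api_key>' is a placeholder for your own key:

```python
import importlib
import sys
from pathlib import Path

# Write a keys.py module into the current working directory (where the notebook runs).
# '<your_api_key>' is a placeholder: put your real Google Elevation API key there.
Path("keys.py").write_text("my_google_elevation_api_key = '<your_api_key>'\n")

# Jupyter normally already has the working directory on sys.path; adding it
# explicitly makes the same import work from a plain script too.
if str(Path.cwd()) not in sys.path:
    sys.path.insert(0, str(Path.cwd()))
importlib.invalidate_caches()  # let the import system notice the newly created file

from keys import my_google_elevation_api_key
```

In practice you would create keys.py once with a text editor rather than from code, and keep it out of version control because it holds a secret.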

Related

module 'pandas' has no attribute 'read_csv': AttributeError

I have written an AWS Lambda function that uses pandas to handle DataFrames. When I tested this Lambda function, I got the error: No module named pandas.
I then put pandas and its dependency libraries in a library folder of my repository.
Now I am facing another issue that I am unable to solve.
Current error:
module 'pandas' has no attribute 'read_csv': AttributeError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 127, in lambda_handler
initial_df = pd.read_csv(obj['Body']) # 'Body' is a key word
AttributeError: module 'pandas' has no attribute 'read_csv'
I checked the solutions available on this site, such as module 'pandas' has no attribute 'read_csv'.
I don't have a pandas.py or csv.py in my pandas folder (the cause described in the discussion linked above); instead I have test_to_csv.py, csvs.py and test_pandas.py.
I am unable to figure out a way forward here.
Pandas is indeed not available by default on AWS lambda.
If you want to use pandas with AWS Lambda, the easiest way is to use the AWS Data Wrangler layer.
When you add a new layer, select AWS layers, then in the dropdown menu select the AWSDataWrangler-Python39 one.
Once you have added the layer, you will be able to use pandas as usual.
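If you keep bundling pandas yourself instead, it can help to log where the imported pandas actually comes from. The helper below is a diagnostic sketch, not part of the original answer, and describe_module is a hypothetical name. One possible explanation for a pandas module without read_csv is that the name resolved to a bare folder without an __init__.py, which Python (3.7+) treats as a namespace package whose spec origin is None:

```python
import importlib
import importlib.util


def describe_module(name, attr):
    """Report where `name` is imported from and whether it exposes `attr`."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing importable under that name
    module = importlib.import_module(name)
    return {
        # None for a namespace package, i.e. a folder with no __init__.py
        "origin": spec.origin,
        "search_locations": list(spec.submodule_search_locations or []),
        "has_attr": hasattr(module, attr),
    }


# Inside lambda_handler you could log, for example:
# print(describe_module("pandas", "read_csv"))
```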

Python read pickle protocol 4 error: STACK_GLOBAL requires str

With Python 3.7.5 on Ubuntu 18.04, reading a pickle (protocol 4) gives an error.
Sample code:
import pickle as pkl
file = open("sample.pkl", "rb")
data = pkl.load(file)
Error:
UnpicklingError Traceback (most recent call last)
----> 1 data = pickle.load(file)
UnpicklingError: STACK_GLOBAL requires str
Reading from the same file object solves the problem.
Reading it with pandas gives the same error.
I also had this error; it turned out I was opening a NumPy file with pickle. ;)
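Since this error can simply mean the file is not a pickle at all, it can help to peek at the first bytes before calling pickle.load. A minimal sketch with hypothetical helper names: NumPy .npy files begin with the magic string b'\x93NUMPY', pickles written with protocol 2 or later begin with the PROTO opcode b'\x80' followed by the protocol number, and zip-based formats (.npz, recent torch.save files) begin with b'PK\x03\x04'. Protocol 0 and 1 pickles have no such header and are reported as "unknown":

```python
import pickle

NPY_MAGIC = b"\x93NUMPY"   # header of NumPy .npy files
ZIP_MAGIC = b"PK\x03\x04"  # header of zip archives (.npz, recent torch.save files)


def sniff_format(path):
    """Guess a file's serialization format from its first few bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(NPY_MAGIC):
        return "npy"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if len(head) >= 2 and head[0] == 0x80:
        return f"pickle protocol {head[1]}"  # PROTO opcode followed by the protocol number
    return "unknown"


def load_pickle(path):
    """Refuse to unpickle files that are clearly NumPy or zip archives."""
    kind = sniff_format(path)
    if kind in ("npy", "zip"):
        raise ValueError(f"{path} does not look like a pickle (detected: {kind})")
    with open(path, "rb") as f:
        return pickle.load(f)
```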
It turns out this is a known issue; there is an issue page for it on GitHub.
I had this problem and just added pckl to the end of the file name.
My problem was that I was trying to pickle and un-pickle across different python environments - watch out to make sure your pickle versions match!
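When a pickle has to move between Python environments, one way to reduce surprises is to write it with an explicit protocol that the oldest interpreter involved can read, rather than relying on the writer's default. A sketch, assuming both sides run Python 3.4 or newer (protocol 4 needs 3.4+, protocol 5 needs 3.8+); the function names are hypothetical. This does not help if the pickled objects come from libraries (pandas, NumPy, ...) whose versions differ between the environments:

```python
import pickle
import sys

PORTABLE_PROTOCOL = 4  # readable by Python 3.4+; protocol 5 would require 3.8+


def save_portable(obj, path, protocol=PORTABLE_PROTOCOL):
    """Pickle `obj` with a fixed protocol instead of the interpreter's default."""
    if protocol > pickle.HIGHEST_PROTOCOL:
        raise ValueError(f"this interpreter only supports up to protocol {pickle.HIGHEST_PROTOCOL}")
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


def load_portable(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Run this in both environments and compare before exchanging files.
print(sys.version.split()[0], "highest pickle protocol:", pickle.HIGHEST_PROTOCOL)
```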
Perhaps this will be the solution to this error for someone.
I needed to load a NumPy array and used
torch.load(file)
When I loaded the array this way, this error appeared. All that is needed is to load the file with NumPy and turn the array into a tensor.
For example (with numpy imported as np and torch imported):
result = torch.from_numpy(np.load(file))

NameError: name 'pd' is not defined when calling a function in custom package

Context
I'm learning Python for Data Science and I'm using the Foursquare API to explore venues near a coordinate. It returns JSON, so I created a function that uses the 'foursquare' package (github.com/mLewisLogic/foursquare) to extract the data from the Foursquare results, append it to a pandas DataFrame, and return that DataFrame.
The function works in my Jupyter Notebook (you can check the function here https://github.com/dacog/foursquare_api_tools/blob/master/foursquare_api_tools/foursquare_api_tools.py), and I thought about making it easier for others, so I tried to create a package that could be installed with pip directly from GitHub. I successfully created a package and published it to GitHub to test it, but when I try to use the function it returns
NameError: name 'pd' is not defined
Steps to try the package
!pip install git+https://github.com/dacog/foursquare_api_tools.git#egg=foursquare_api_tools
# #hidden_cell
CLIENT_ID = 'Secret' # your Foursquare ID
CLIENT_SECRET = 'Secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
from foursquare_api_tools import foursquare_api_tools as ft
ft.venues_explore(client,lat='40.7233',lng='-74.0030',limit=100)
and I get
NameError Traceback (most recent call last)
<ipython-input-47-0a062ed9d667> in <module>()
3 import pandas as pd
4
----> 5 ft.venues_explore(client,lat='40.7233',lng='-74.0030',limit=100)
/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/foursquare_api_tools/foursquare_api_tools.py in venues_explore(client, lat, lng, limit)
3 This returns a pandas dataframe with name, city ,country, lat, long, postal code, address and main category as columns'''
4 # creata a dataframe
----> 5 df_a = pd.DataFrame(columns=['Name', 'City', 'Latitude','Longitude','Category','Postal Code', 'Address'])
6 ll=lat+','+lng
7 #get venues using client https://github.com/mLewisLogic/foursquare
NameError: name 'pd' is not defined
I tried adding import pandas as pd in the main notebook, inside the function, and in __init__.py, always with the same result.
You can check the code at https://github.com/dacog/foursquare_api_tools
It's the first time I'm creating a package and pretty new to python, so any help will be greatly appreciated.
UPDATES
Pandas is working fine in the environment when I'm doing the tests.
The installed Python versions are:
!which python --> /home/jupyterlab/conda/bin/python
!whereis python
/usr/bin/python /usr/bin/python2.7 /usr/lib/python2.7 /etc/python /etc/python2.7
/usr/local/lib/python2.7 /usr/share/python
/home/jupyterlab/conda/bin/python /home/jupyterlab/conda/bin/python3.6
/home/jupyterlab/conda/bin/python3.6-config /home/jupyterlab/conda/bin/python3.6m /home/jupyterlab/conda/bin/python3.6m-config /usr/share/man/man1/python.1.gz
You are missing a import pandas as pd statement in foursquare_api_tools.py. Just add that line at the top of that file, and you should be good to go.
The clue is in the error: a NameError on line 5, where you call pd.DataFrame. Because that module has no import statement, Python does not know what the name pd means.
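Every module has its own global namespace, which is why importing pandas in the notebook does not make pd visible inside foursquare_api_tools.py. A small self-contained sketch of that behaviour, using json imported as js as a stand-in for pandas imported as pd so it runs without pandas; the module names and to_text are hypothetical:

```python
import types

# Stand-ins for foursquare_api_tools.py: the first source forgets the module-level import.
BROKEN_SRC = "def to_text(data):\n    return js.dumps(data)\n"
FIXED_SRC = "import json as js\n\ndef to_text(data):\n    return js.dumps(data)\n"


def make_module(name, source):
    """Build a module object from source text; like an imported .py file, it gets its own globals."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


import json as js  # importing in the caller (the notebook) does NOT help the module

broken = make_module("broken_tools", BROKEN_SRC)
fixed = make_module("fixed_tools", FIXED_SRC)

try:
    broken.to_text([1, 2])
except NameError as err:
    print("broken:", err)  # broken: name 'js' is not defined

print("fixed:", fixed.to_text([1, 2]))  # fixed: [1, 2]
```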
In addition to "import pandas as pd", you can add seaborn to your imports like this:
import pandas as pd
import seaborn as sns
sns.set()
This should work in a Jupyter notebook.

ValueError when reading a sas file with pandas

pandas.read_sas() prints traceback messages that I cannot suppress. The problem is that it prints a message for EACH row it reads, so when I try to read the whole file it just freezes from printing so much.
I tried the following from other Stack Overflow answers:
import warnings
warnings.simplefilter(action='ignore')
And
warnings.filterwarnings('ignore')
And
from IPython.display import HTML
HTML('''<script>
code_show_err=false;
function code_toggle_err() {
if (code_show_err){
$('div.output_stderr').hide();
} else {
$('div.output_stderr').show();
}
code_show_err = !code_show_err
}
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click here.''')
But nothing works.
The message it prints is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
pandas\io\sas\sas.pyx in pandas.io.sas._sas.rle_decompress()
ValueError: Unexpected non-zero end_of_first_byte
Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data'
Traceback (most recent call last):
  File "pandas\io\sas\sas.pyx", line 29, in pandas.io.sas._sas.rle_decompress
ValueError: Unexpected non-zero end_of_first_byte
As highlighted in the traceback, the error is caused by a bug in the pandas implementation of RLE decompression, which is used when the SAS dataset is exported using CHAR (RLE) compression.
Note the pandas issue created for this topic: https://github.com/pandas-dev/pandas/issues/31243
The resolution that pandas implemented for this bug in read_sas is contained in the following Pull Request, which is part of the version 1.5 milestone, yet to be released at the time of answering: https://github.com/pandas-dev/pandas/pull/47113
To answer your question, you have two options:
Wait until pandas releases version 1.5, update to that version, and read_sas should then work as expected. You've already been waiting a while since you asked, so I suspect this will be fine.
Use the python sas7bdat library instead (https://pypi.org/project/sas7bdat/), and then convert to a pandas DataFrame:
from sas7bdat import SAS7BDAT
df = SAS7BDAT("./path/to/file.sas7bdat").to_data_frame()
The sas7bdat approach worked for me, after facing the exact same error as you did.
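The two options can be combined in one helper that picks the reader based on the installed pandas version. A sketch, assuming the fix from the pull request above shipped in pandas 1.5 as anticipated; read_sas_frame and version_tuple are hypothetical names, and the sas7bdat call is the one shown above:

```python
import re


def version_tuple(version):
    """Extract (major, minor) from a version string such as '1.5.0' or '2.0.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def read_sas_frame(path):
    """Use pandas.read_sas on pandas >= 1.5, otherwise fall back to the sas7bdat library."""
    import pandas as pd

    if version_tuple(pd.__version__) >= (1, 5):
        return pd.read_sas(path)
    from sas7bdat import SAS7BDAT  # pip install sas7bdat

    return SAS7BDAT(path).to_data_frame()
```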

vcfutils for parsing multiple vcf files

I have multiple VCF files, and what I need to achieve is to extract some rows from each VCF file based on defined filters. To do that, I started off with
import vcf
import vcf.utils
which seems straightforward and neat, but I am running into issues. It would be really great if someone could take a look and guide me a little toward the desired output.
The VCF file looks like this: it has header lines starting with #, followed by the rows with the information we need (a few header lines and the needed rows are shown below):
##fileformat=VCFv4.1
##source=V2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
chr10 197523 . G A . PASS DP=26;SS=1;SSC=2;GPV=5.4595E-6;SPV=6.1327E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:17:8:9:52.94%:5,3,4,5 0/1:.:9:4:5:55.56%:2,2,2,3
chr10 198411 . T G . PASS DP=37;SS=1;SSC=5;GPV=1.2704E-5;SPV=2.7151E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:19:13:6:31.58%:8,5,1,5 0/1:.:18:9:8:47.06%:3,6,5,3
So I used the following Python code to get the information I need, but it throws an error:
reader_BM_CR_ID = vcf.Reader(filename="sample/sam/sample.vcf", compressed=False)
writer_CR = vcf.Writer(open('same/sam/sample_filtered.vcf', 'w'), reader_BM_CR_ID)
for variants in vcf.utils(reader_BM_CR_ID):
    for call in variants.samples:
        if call.sample == 'T':
            if call.data.FREQ >='20%':
                if call.data.FREQ >'0%':
                    if call.data.FREQ !='100%':
                        if call.data.DP >=20:
                            writer.write_record(id_var)
The error message,
TypeError Traceback (most recent call last)
<ipython-input-471-526e4c3bbab1> in <module>()
----> 1 for variants in vcf.utils(reader_BM_CR_ID):
2
3 for call in variants.samples:
4 if call.sample == 'T':
5 if call.data.FREQ >='20%':
TypeError: 'module' object is not callable
Any help is really appreciated..!!
You are trying to call a module as a function.
Python reports exactly that:
TypeError: 'module' object is not callable
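Once the call is fixed, a vcf.Reader can be iterated directly, and two more problems in the loop will surface: writer and id_var are never defined (the writer was created as writer_CR), and FREQ is compared as a string, so '9%' >= '20%' is True. A sketch of the loop with those fixed, assuming PyVCF's vcf.Reader and vcf.Writer as used in the question and FREQ values formatted like '52.94%'; parse_freq, passes_filters and filter_vcf are hypothetical names. If the goal is to step through several VCF files together, vcf.utils provides helpers such as walk_together for that:

```python
def parse_freq(freq):
    """Convert a FREQ value such as '52.94%' to a float (52.94); None or '.' means missing."""
    if freq is None or freq == ".":
        return None
    return float(str(freq).rstrip("%"))


def passes_filters(freq, depth, min_freq=20.0, min_depth=20):
    """The question's filters compared as numbers: FREQ >= 20% (implies > 0%), FREQ != 100%, DP >= 20."""
    value = parse_freq(freq)
    if value is None or depth is None:
        return False
    return value >= min_freq and value != 100.0 and int(depth) >= min_depth


def filter_vcf(in_path, out_path, sample_name="T"):
    """Copy records whose `sample_name` call passes the filters (requires PyVCF)."""
    import vcf  # imported here so the helpers above work without PyVCF installed

    reader = vcf.Reader(filename=in_path, compressed=False)
    with open(out_path, "w") as out:
        writer = vcf.Writer(out, reader)  # writes the header from the reader's template
        for record in reader:  # iterate the Reader itself; vcf.utils is a module, not a function
            for call in record.samples:
                if call.sample == sample_name and passes_filters(call.data.FREQ, call.data.DP):
                    writer.write_record(record)
                    break  # write each record at most once
```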
