module 'pandas' has no attribute 'read_csv': AttributeError - python

I have written an AWS Lambda function that uses pandas to handle a DataFrame. When I tested this Lambda function, I got the error: No module named pandas.
I then added pandas and its dependency libraries to the library folder of my repository.
Now I am facing another issue that I am unable to solve.
Current error:
module 'pandas' has no attribute 'read_csv': AttributeError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 127, in lambda_handler
initial_df = pd.read_csv(obj['Body']) # 'Body' is a key word
AttributeError: module 'pandas' has no attribute 'read_csv'
I checked the solutions available on this site, such as: module 'pandas' has no attribute 'read_csv'.
I don't have pandas.py or csv.py in my pandas folder, but rather test_to_csv.py, csvs.py and test_pandas.py, which is what the discussion in the link above says is required.
I am unable to figure out a way forward here.

Pandas is indeed not available by default on AWS Lambda.
If you want to use pandas with AWS Lambda, the easiest way is to use the AWS Data Wrangler layer.
When you add a new layer, select AWS layers, then in the dropdown menu select the AWSDataWrangler-Python39 one.
Once you have added the layer, you will be able to use pandas as usual.
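For example, once the layer is attached, a handler along these lines should work (a minimal sketch; the bucket and key names are placeholders, but pd.read_csv(obj['Body']) is taken from the question):
import boto3
import pandas as pd
s3 = boto3.client('s3')
def lambda_handler(event, context):
    # 'my-bucket' and 'data.csv' are placeholder names
    obj = s3.get_object(Bucket='my-bucket', Key='data.csv')
    # obj['Body'] is a streaming body that read_csv can consume directly
    initial_df = pd.read_csv(obj['Body'])
    return {'rows': len(initial_df)}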

Related

Pandas read_xml - AttributeError: module 'pandas' has no attribute 'read_xml'

I ran into the error:
"AttributeError: module 'pandas' has no attribute 'read_xml'"
It would be a huge lifesaver if I could ingest the XML into a pandas DataFrame with one function, without trying to iterate through it, etc.
I am running pandas 1.3.4 and Python 3.8.8. I have tried opening an XML file in my local folder (where the script is housed).
I tried directly importing the file like so:
df = pd.read_xml('xmltest.xml')
As well as trying to import via a string like so:
txt = Path('xmltest.txt').read_text()
df = pd.read_xml(txt)
Both attempts gave me the same error. Any help would be awesome, as it would be amazing to ingest XML into a DataFrame with one function! Are there similar functions out there if this is no longer a valid solution?
It appears this person had the same problem, but I'm running the updated pandas:
AttributeError: module 'pandas' has no attribute 'read_xml' or 'to_xml'
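A hedged note: pandas only added read_xml in version 1.3.0, so this exact AttributeError usually means the interpreter actually running the script is picking up an older pandas than the one you think is installed. A quick sanity check:
import pandas as pd
# read_xml was introduced in pandas 1.3.0; anything older raises
# exactly this AttributeError.
print(pd.__version__)
print(hasattr(pd, 'read_xml'))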

When importing into Google Colab: TypeError: 'str' object is not callable

Using pandas in Google Colaboratory, I am attempting to import a .csv file named 'gifted.csv' with the following code:
df=pd.read_csv('/content/gifted.csv')
I have imported the pandas library as pd, but whenever I run the code it fails and the following error appears:
TypeError: 'str' object is not callable
I don't know where the CSV is located, but try
df=pd.read_csv('content/gifted.csv')
without the '/' before 'content'.
The error doesn't directly point to that, but it's worth trying.
Also check that the package was installed properly and that the import succeeded.
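A quick way to run that check (a minimal sketch; a 'str' object is not callable error often means a name such as pd or read_csv was reassigned to a string earlier in the notebook):
import pandas as pd
# If this prints False, something earlier in the notebook shadowed
# the function; restart the runtime and re-run the cells in order.
print(callable(pd.read_csv))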

Spark - is map function available for Dataframe or just RDD?

I just realized that I can do following in Scala
val df = spark.read.csv("test.csv")
val df1=df.map(x=>x(0).asInstanceOf[String].toLowerCase)
However, in Python, if I try to call the map function on a DataFrame, it throws an error.
df = spark.read.csv("Downloads/test1.csv")
df.map(lambda x: x[1].lower())
Error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/apache-spark/2.4.3/libexec/python/pyspark/sql/dataframe.py", line 1300, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'map'
In Python I need to explicitly convert the DataFrame to an RDD first.
My question is: why do I need to do this in Python?
Is this a difference in the Spark API implementations, or does Scala implicitly convert the DataFrame to an RDD and back to a DataFrame again?
The Python DataFrame API doesn't have a map function because of how the Python API works.
In Python, every time you convert to an RDD or use a UDF with the Python API, you create a Python call during execution.
What does that mean? It means that during Spark execution, instead of all the data being processed inside the JVM by generated Scala code (the DataFrame API), the JVM has to call out to Python code to apply the logic you created. By default that creates a huge overhead during execution.
So the solution for Python is an API that blocks the use of Python code and only runs generated Scala code through the DataFrame pipeline.
This article will help you understand how UDFs work with Python, which is basically very close to how RDD maps work with Python: https://medium.com/wbaa/using-scala-udfs-in-pyspark-b70033dd69b9
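As a hedged sketch of both routes in PySpark (the file path and column index are taken from the question; the DataFrame-only variant uses the built-in lower function instead of a Python lambda):
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("Downloads/test1.csv")
# Route 1: drop to the RDD explicitly, paying the per-row Python call
lowered_rdd = df.rdd.map(lambda x: x[1].lower())
# Route 2: stay in the DataFrame API so the work runs as generated
# JVM code, with no Python call per row
lowered_df = df.select(F.lower(df[df.columns[1]]))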

osmnx elevation data: no module named 'keys'

I'm coming from R and new to Python, so I assume this is a novice question, but any help would be appreciated. I'm following along with this example to add elevation data to OpenStreetMap data using the OSMnx package:
https://github.com/gboeing/osmnx-examples/blob/master/notebooks/12-node-elevations-edge-grades.ipynb
When I run the first block of code:
from keys import my_google_elevation_api_key
I get this error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-50-384032deeff7> in <module>()
----> 1 from keys import my_google_elevation_api_key #replace this with your own API key
ModuleNotFoundError: No module named 'keys'
-----------
(Note - I am using an API key from Google as instructed in the example. The code above is intentionally generic.) I'm using jupyter to type the Python code.
The author probably has a keys.py file in which he defines the variable my_google_elevation_api_key and assigns it the key that he later uses to access the elevation API.
You could simply replace the import line
from keys import my_google_elevation_api_key
with this
my_google_elevation_api_key = '<your_api_key>'
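Alternatively, if you want to follow the notebook verbatim, create your own keys.py next to it (the file name and variable name must match the import; the key value is a placeholder):
# keys.py -- save this in the same directory as the notebook
my_google_elevation_api_key = '<your_api_key>'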

'Tables' not recognizing 'isHDF5File'

I am writing code that creates an HDF5 file that can later be used for data analysis. I load the following packages:
import numpy as np
import tables
Then I use the tables module to determine if my file is an HDF5 file with:
tables.isHDF5File(FILENAME)
This would normally return either True or False, depending on whether the file is actually an HDF5 file.
AttributeError: module 'tables' has no attribute 'isHDF5File'
So I tried:
from tables import isHDF5File
and got the error:
ImportError: cannot import name 'isHDF5File'
I've tried this code on another computer, and it ran fine. I've tried updating both numpy and tables with pip, but it says they are already up to date. Is there a reason 'tables' isn't recognizing 'isHDF5File' for me? I am running this code on a Mac (not working), but it worked on a PC (if that matters).
Do you have the function name right?
In [21]: import tables
In [22]: tables.is_hdf5_file?
Docstring:
is_hdf5_file(filename)
Determine whether a file is in the HDF5 format.
When successful, it returns a true value if the file is an HDF5
file, false otherwise. If there were problems identifying the file,
an HDF5ExtError is raised.
Type: builtin_function_or_method
In [23]:
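So the snake_case spelling should work where the camelCase one failed (PyTables renamed its API to PEP 8 style, which is presumably why the old name is missing on the newer install):
import tables
# is_hdf5_file is the current PEP 8 name; newer PyTables releases no
# longer expose the old camelCase isHDF5File spelling.
print(tables.is_hdf5_file('data.h5'))  # 'data.h5' is a placeholder path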
