I'm trying to write the data from an existing gzip file to HDFS in Parquet format, but I encountered the error below. I would be glad if you could help. (By the way, I'm open to ideas for making this code serve the same purpose in a different way.)
import pandas as pd
import pyarrow.parquet as pq
file = "c:/okay.log.gz"
df = pd.read_csv(file, compression="gzip", low_memory=False, sep="|", error_bad_lines=False)
pq.write_table(df, "target_path")
AttributeError: 'DataFrame' object has no attribute 'schema'
I've just run into the same issue, but I assume you've resolved yours. In case you haven't or someone else comes across this with a similar issue, try creating a pyarrow table from the dataframe first.
import pyarrow as pa
import pyarrow.parquet as pq
df = {some dataframe}
table = pa.Table.from_pandas(df)
pq.write_table(table, '{path}')
I'm using a Jupyter notebook and I'm trying to open a data file, but I keep getting the error AttributeError: 'pandas._libs.properties.AxisProperty' object has no attribute 'unique'. This is my first time using Jupyter, so I am not familiar with errors like this.
import pandas as pd
df = pd.DataFrame
df - pd.read_csv("C:/Users/yusra/OneDrive/Documents/vgsales.csv")
df
You are not using pd.DataFrame correctly. See the corrected code below:
import pandas as pd
df=pd.read_csv("C:/Users/yusra/OneDrive/Documents/vgsales.csv")
df
For a current project, I am planning to clean a Pandas DataFrame of its null values. For this purpose, I want to use pandas.DataFrame.fillna, which is apparently a solid solution for data cleanups.
When running the code below, however, I receive the following error: AttributeError: module 'pandas' has no attribute 'df'. I tried several options to rewrite the line df = pd.df().fillna, none of which changed the outcome.
Is there any smart tweak to get this running?
import string
import json
import pandas as pd
# Loading and normalising the input file
file = open("sp500.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df = pd.df().fillna
When you load the file into pandas, the result of pd.json_normalize(data) assigned to df is already a DataFrame instance. The last line is a typo: there is no pd.df, and fillna should be called on the DataFrame itself.
df = pd.json_normalize(data)
df = df.fillna(0)  # fillna needs a fill value or method
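One detail worth noting: calling fillna() with no arguments raises a ValueError, because pandas needs to know what to fill with. A minimal sketch using a made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [100.0, np.nan, 250.0]})

# fillna must be told what to fill with -- here a constant 0
filled = df.fillna(0)
print(filled["price"].tolist())  # [100.0, 0.0, 250.0]
```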
I am doing a data analysis project, and while importing the CSV file into Spyder I am facing this error. Please help me debug this, as I am new to programming.
# import library
import pandas as pd
# read the data from csv as a pandas dataframe
df = pd.read.csv('/Documents/Melbourne_housing_FULL.csv')
This is the error shown when I use the pd.read.csv command:
File "C:/Users/mylaptop/.spyder-py3/temp.py", line 4, in <module>
df = pd.read.csv('/Documents/Melbourne_housing_FULL.csv')
AttributeError: module 'pandas' has no attribute 'read'
You should use:
df = pd.read_csv('/Documents/Melbourne_housing_FULL.csv')
See the docs here.
You need to use pandas.read_csv() instead of pandas.read.csv(); the error is literally telling you this method doesn't exist.
I am reading a parquet file and transforming it into a dataframe.
from fastparquet import ParquetFile
pf = ParquetFile('file.parquet')
df = pf.to_pandas()
Is there a way to read a parquet file from a variable (that was previously read and now holds the parquet data)?
Thanks.
In Pandas there is a method to deal with parquet files. Here is a reference to the docs. Something like this:
import pandas as pd
pd.read_parquet('file.parquet')
should work. Also, please read this post on engine selection.
You can also read a parquet file from a variable using pandas.read_parquet with the following code. I tested this with the pyarrow backend, but it should also work with the fastparquet backend.
import pandas as pd
import io
with open("file.parquet", "rb") as f:
    data = f.read()

buf = io.BytesIO(data)
df = pd.read_parquet(buf)
I am getting the following error after attempting to read in an xlsx file into a dataframe, write that dataframe to disk using feather, then read it back in with feather and display the results using df.head().
import pandas as pd
import feather
v = pd.__version__
#--- read in excel data
in_path = '/Users/me/Documents/python/data/FGS2000-2015.xlsx'
out_path = '/Users/me/Documents/python/data/mydata.feather'
df = pd.read_excel(in_path)
#--- write pandas dataframe to disk
feather.write_dataframe(df,out_path)
#--- read dataframe from disk
data = feather.read_dataframe(out_path)
data.head()
'module' object has no attribute 'write_dataframe' error.
I had the same issue. I was able to fix it by doing:
pip install feather-format
Instead of:
pip install feather