I am trying to run the line of code:
pd.get_dummies(pd_df, columns = ['ethnicity'])
However, I keep getting the error 'DataFrame' object has no attribute '_internal'. It looks like it's linked to the ...pyspark/pandas/namespace.py file, so I am not sure how to fix it.
Unfortunately, the dataframe itself is private, so I can't show or describe it on Stack Overflow. Any information about why this could be happening would be greatly appreciated!
The example below works perfectly, but the same call won't work in my code. The only difference is that my DataFrame has been converted from PySpark to pandas:
sales_data = pd.DataFrame({"name":["William","Emma","Sofia","Markus","Edward","Thomas","Ethan","Olivia","Arun","Anika","Paulo"]
,"sales":[50000,52000,90000,34000,42000,72000,49000,55000,67000,65000,67000]
,"region":["East","North","East","South","West","West","South","West","West","East",np.nan]
}
)
pd.get_dummies(sales_data, columns = ['region'])
I had this same error. I was mixing up ps (pyspark.pandas) and pd (pandas).
Ensure your aliases are correct and that you're not accidentally binding the pd name to a different library.
For example, this import makes pd refer to pyspark.pandas rather than pandas:
import pyspark.pandas as pd
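A quick way to check for this kind of mix-up is to look at which package the alias and the DataFrame each come from. The sketch below uses plain pandas only; frame_origin is a hypothetical helper name, and the small sales_data frame is a shortened stand-in for the one in the question. If the check reports pyspark for the DataFrame, converting it with to_pandas() before calling the pandas get_dummies should avoid the mismatch.

```python
import numpy as np
import pandas as pd  # plain pandas; `import pyspark.pandas as pd` would shadow this name


def frame_origin(df):
    """Return the top-level package that defines the object's class (hypothetical helper)."""
    return type(df).__module__.split(".")[0]


# Shortened stand-in for the DataFrame from the question.
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Paulo"],
    "region": ["East", "North", np.nan],
})

print(pd.__name__)               # which library the alias points to
print(frame_origin(sales_data))  # which library built the DataFrame

# Only call get_dummies when the alias and the DataFrame come from the same library.
if pd.__name__ == "pandas" and frame_origin(sales_data) == "pandas":
    dummies = pd.get_dummies(sales_data, columns=["region"])
    print(list(dummies.columns))
```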
I am new to data, so after a few lessons on importing data in Python, I tried the following code in my Jupyter notebook, but I keep getting an error saying df is not defined. I need help.
The code I wrote is as follows:
import pandas as pd
url = "https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv"
df = pd.read_csv(url)
After running the third line, I got a series of messages in the Jupyter notebook, but the one that stood out was "df not defined".
The problem here is that your data is a ZIP file containing multiple CSV files. You need to download the data, unpack the ZIP file, and then read one CSV file at a time.
If you can give more details on the problem (e.g. screenshots), debugging will be easier.
One possible cause of the error is that the response returned by the URL (https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv) is a ZIP file, which may prevent pandas from processing it further.
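The unpack-then-read step described in the answers can be sketched with the standard library alone. This is a minimal illustration, not the actual World Bank layout: read_csv_members is a hypothetical helper, and the archive is built in memory with made-up file names and rows so the example runs without a network connection. In the real case the bytes would come from downloading the URL, and each opened member could be passed to pd.read_csv instead of csv.reader.

```python
import csv
import io
import zipfile


def read_csv_members(zip_bytes):
    """Return {member name: list of rows} for every CSV file inside a ZIP archive."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything in the archive that is not a CSV file
            with archive.open(name) as raw:
                # Decode the binary member as text so csv.reader can parse it.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.reader(text))
    return tables


# Build a small in-memory archive to stand in for the downloaded file (made-up content).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "country,year,value\nA,2020,1.5\n")
    archive.writestr("metadata.csv", "code,note\nA,example\n")

tables = read_csv_members(buffer.getvalue())
print(sorted(tables))
```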
I am facing a small issue with a line of code that I am converting from pandas into Koalas.
Note: I am running my code on Databricks.
The following line is pandas code:
input_data['t_avail'] = np.where(input_data['purchase_time'] != time(0, 0), 1, 0)
I converted it to Koalas as follows. Note that input_data is already defined as a Koalas DataFrame before this line of code.
# Add a new column called 't_avail' in input_data Koalas dataframe
input_data = input_data.assign(
t_avail = (input_data['purchase_time'] != time(0, 0))
)
I get the following error with the Koalas conversion: TypeError: 'module' object is not callable
I am not sure what the issue with the time module is; I just want to set the t_avail column based on whether each entry in the purchase_time column holds a non-empty time.
Could someone help me resolve the issue? I think I am missing something silly.
Thank you to all.
As you say, you import the time module in your code.
The error occurs because you write time(0, 0).
However, time is a module, and you are calling it as if it were a function.
You can use this instead:
input_data = input_data.assign(
t_avail = ((input_data['purchase_time']).str.strip() != "")
)
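The difference between the time module and the datetime.time class can be seen in a few lines of standard-library Python. This is only a sketch: the purchase_times list holds made-up sample values, and a plain list stands in for the DataFrame column. In pandas, the analogous step would be comparing the column with time(0, 0) and casting the result with .astype(int).

```python
import time as time_module   # the module: calling it raises TypeError
from datetime import time    # the class: time(0, 0) represents midnight

# Calling the module reproduces the error from the question.
try:
    time_module(0, 0)
except TypeError as exc:
    module_error = str(exc)
    print(module_error)

midnight = time(0, 0)

# Hypothetical sample values standing in for the purchase_time column.
purchase_times = [time(0, 0), time(9, 30), time(14, 5)]

# 1 where a real purchase time is present, 0 where the entry is midnight.
t_avail = [int(t != midnight) for t in purchase_times]
print(t_avail)
```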
I am starting to learn and understand the pandas module in Python. However, my issue is with the rename method. The rename works fine when I use print, which shows the column has been renamed:
print(data.rename(columns={"Rep": "Name"}))
However, when I use print(data) to show all of the data from the document, the column does not show as being renamed. The rename is also missing when the file is exported using data.to_csv("example.csv").
I would really appreciate it if somebody could shed some light on this.
Full Source code below:
import pandas as pd
data = pd.read_excel(r"D:\Downloads\Book1.xlsx")
del data["Region"]
del data["Item"]
print(data.rename(columns={"Rep": "Name"}))
print(data)
data.to_csv("example.csv")
Use the inplace argument so that the change is applied to the DataFrame itself, like this:
data.rename(columns={"Rep": "Name"}, inplace = True)
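Try adding inplace=True to data.rename. With inplace=True the call returns None, so print the DataFrame afterwards rather than the result of the call:
data.rename(columns={"Rep": "Name"}, inplace=True)
print(data)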
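An alternative to inplace=True is to assign the result of rename back to the variable, since rename returns a new DataFrame and leaves the original untouched. The sketch below uses a small made-up frame in place of the Excel file from the question.

```python
import pandas as pd

# Made-up stand-in for the data read from Book1.xlsx.
data = pd.DataFrame({"Rep": ["Jones", "Kivell"], "Units": [95, 50]})

# rename returns a new DataFrame; the original keeps its column names.
renamed = data.rename(columns={"Rep": "Name"})
print(list(data.columns))
print(list(renamed.columns))

# Reassigning makes the change stick for later steps such as to_csv.
data = data.rename(columns={"Rep": "Name"})
print(list(data.columns))
```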
I intended to write code that displays a table/DataFrame in a GUI (Kivy). I found a solution here; it uses dfgui, an unofficial package from a GitHub repo.
The problem occurred when I ran the code as described in the StackOverflow link. It returned this error:
wx._core.PyAssertionError: C++ assertion "!items.IsEmpty()" failed at
/usr/include/wx-3.0/wx/ctrlsub.h(154) in InsertItems(): need something
to insert
I broke the problem down by running the code selectively, in the following way:
import dfgui
import pandas as pd
xls = pd.read_excel('Res.xls')
df = pd.DataFrame(xls)
dfgui.show(df)
#dfgui.show(xls) Apparently the same as df
which then returned
TypeError: String or Unicode type required
and led me to this link, which I couldn't make much sense of.
A pointer in the right direction, or perhaps a different solution, would be great too.
I have pandas version 0.20.3 installed. I am trying to turn off header_style so that I can format the header row, as described in this question: xlsxwriter not applying format to header row of dataframe - Python Pandas
I keep getting the error: AttributeError: 'module' object has no attribute 'formats'
I have tried
pd.formats.format.header_style = None
and
pd.core.format.header_style = None
Any idea what I am doing wrong?
As you can see in the API reference, the modules pandas.formats and pandas.core.format do not exist: https://pandas.pydata.org/pandas-docs/stable/api.html
So this error is expected.
According to the API changes in 0.20, pandas.formats has become pandas.io.formats. Check the API reference for the new location.
Another way to do this, suggested by @Martin Evans, is to write the headers directly, outside of pandas. This avoids issues like the one above with different pandas versions.
See also this example in the XlsxWriter docs.