print(df["date"].str.replace("2016","16"))
The code above works fine. What I really want to do is to make this replacement in just a small part of the data-frame. Something like:
df.loc[2:4,["date"]].str.replace("2016","16")
However here I get an error:
AttributeError: 'DataFrame' object has no attribute 'str'
What about df['date'].loc[2:4].str.replace('2016', '16')?
By selecting ['date'] first, you know you are dealing with a Series, which does have the .str accessor.
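To make the change stick in only part of the frame, the result can be assigned back to the same slice. This is a minimal sketch; the sample dates are made up for illustration, and it assumes a default integer index so that 2:4 selects rows 2, 3 and 4 (label slices with .loc include both ends).

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's frame.
df = pd.DataFrame({"date": ["01/01/2016", "02/01/2016", "03/01/2016",
                            "04/01/2016", "05/01/2016", "06/01/2016"]})

# Passing a single column label (not a list) to .loc returns a Series,
# so the .str accessor is available. Assigning back updates rows 2..4 only.
df.loc[2:4, "date"] = df.loc[2:4, "date"].str.replace("2016", "16", regex=False)

print(df)
```

Passing ["date"] as a list, as in the question, returns a one-column DataFrame instead, which is why .str was not found.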
I have categoryDf, which is a Spark DataFrame, and it prints successfully:
categoryDf.limit(10).toPandas()
I want to join this to another Spark DataFrame, so I tried this:
df1=spark.read.parquet("D:\\source\\202204121920-seller_central_opportunity_explorer_niche_summary.parquet")
#df1.limit(5).toPandas()
df2=df1.join(categoryDf,df1["category_id"] == categoryDf["cat_id"])
df2.show()
When I use df2.show() then I see the output as:
The join succeeds. But when I changed it to df2.limit(10).toPandas(), I got this error:
AttributeError: 'DataFrame' object has no attribute 'dtype'
I want to see how the data looks after join. So, I tried to use df2.limit(10).toPandas(). Or is there any other method to see the data since my join is happening successfully?
My Python version is: 3.7.7
Spark version is: 2.4.4
I faced the same problem; in my case it was because I had duplicate column names after the join.
I see you have report_date and marketplaceid in both dataframes. For each duplicated pair, you need to either drop one or both, or rename one of them.
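One way to find the colliding names before joining is to compare the two column lists. The sketch below is plain Python so it runs without a Spark session; the helper name colliding_columns, the "_cat" suffix, and the columns "niche" and "cat_name" are made up for illustration. With real DataFrames the lists would come from df1.columns and categoryDf.columns.

```python
def colliding_columns(left_cols, right_cols):
    """Return the names present in both column lists, in left-hand order."""
    right = set(right_cols)
    return [c for c in left_cols if c in right]

# Hypothetical column lists; only report_date and marketplaceid are
# mentioned in the thread as appearing on both sides.
left_cols = ["category_id", "report_date", "marketplaceid", "niche"]
right_cols = ["cat_id", "report_date", "marketplaceid", "cat_name"]

dupes = colliding_columns(left_cols, right_cols)
print(dupes)

# With Spark, each colliding column on one side could then be renamed
# before the join, for example:
#   for c in dupes:
#       categoryDf = categoryDf.withColumnRenamed(c, c + "_cat")
```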
for index, row in HDFC_Nifty.iterrows():
    if HDFC_Nifty.iat(index,0).dt.year ==2016:
        TE_Nifty_2016.append(row['TE'])
    else:
        TE_Nifty_2017.append(row['TE'])
Hello,
I am attempting to iterate over the DataFrame and, more specifically, apply a conditional to the Date column, which is formatted as a datetime object.
I keep getting the error below:
TypeError: '_iAtIndexer' object is not callable
I do not know how to proceed. I have tried various loops but have been largely unsuccessful, and I cannot see what I am doing wrong.
Thanks for the help!
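The error message points at the parentheses: .iat is an indexer and takes square brackets, and the single Timestamp it returns exposes .year directly, without .dt. A minimal sketch follows, assuming the date is the first column and the frame has a default integer index; the sample values are made up. It also shows a boolean-mask alternative that avoids the loop.

```python
import pandas as pd

# Hypothetical stand-in for HDFC_Nifty: a datetime column first, then 'TE'.
HDFC_Nifty = pd.DataFrame({
    "Date": pd.to_datetime(["2016-03-01", "2016-11-15", "2017-02-10"]),
    "TE": [0.12, 0.08, 0.15],
})

# Square brackets with .iat; the result is a scalar Timestamp with .year.
first_year = HDFC_Nifty.iat[0, 0].year

# Vectorised alternative: .dt.year works on the whole column at once.
is_2016 = HDFC_Nifty["Date"].dt.year == 2016
TE_Nifty_2016 = HDFC_Nifty.loc[is_2016, "TE"].tolist()
# As in the original else branch, everything that is not 2016 goes here.
TE_Nifty_2017 = HDFC_Nifty.loc[~is_2016, "TE"].tolist()
```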
I am trying to fill the NAs in my data frame using the following code and got an error. Can anyone help? Why is it not working? I need to group by more than one column (gender and age). With only one column it works, but with more than one I get an error.
Here is the code:
df['NewCol'].fillna(df.groupby(['gender','age'])['grade'].transform('mean'),inplace=True)
The error message is:
TypeError: 'NoneType' object is not subscriptable
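That message is what Python raises when None is subscripted, so one thing worth checking is whether df itself has become None, for example by assigning the return value of an earlier inplace=True call back to df. This is a guess, since the thread does not show the earlier code. The sketch below, on made-up data, does the same group-wise fill without inplace and assigns the result back explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M"],
    "age": [10, 10, 11, 11, 11],
    "grade": [80.0, 90.0, 70.0, 60.0, 65.0],
    "NewCol": [np.nan, 88.0, np.nan, 61.0, np.nan],
})

# Mean grade per (gender, age) group, aligned row by row with df.
group_mean = df.groupby(["gender", "age"])["grade"].transform("mean")

# fillna returns a new Series; assigning it back avoids inplace=True.
df["NewCol"] = df["NewCol"].fillna(group_mean)
```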
I have a dataframe with a column test that contains test names, which I am using to extract the grade each test was written for. Because I know that the test name always has the grade in it as the next digit after the date, I have been extracting that data using this line of code:
df['Grade'] = df['test'].apply(lambda x: str(list(filter(str.isdigit, x[10:]))[0]))
This line, however, raises TypeError: 'float' object is not subscriptable. I should note that before running it I checked df.dtypes, and the column test was listed as object. That makes sense, as the test names are strings such as 2015-2016_math_grade_7, so there is no way pandas could see them as floats.
I have checked, and test names are the only data in that column, so I have no idea why I am getting this type error. No matter what I change the code to, I get this error because I need to perform a string operation after x[:10]. (I have used df['Grade'] = df['test'].apply(lambda x: str(re.sub("\D", "", str(x[:10])))))
I should also note, that I have used this code before and it worked perfectly, but for some reason on this data set it seems to fail, if that helps.
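A possible cause, which the thread does not confirm, is a missing value: an object column can hold NaN alongside strings, and NaN is a float, so x[10:] fails on that row. The sketch below uses made-up data and the .str accessor, which passes NaN through instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an object column with one missing test name.
df = pd.DataFrame({"test": ["2015-2016_math_grade_7", np.nan,
                            "2015-2016_read_grade_3"]})

# Rows holding NaN are the ones where the lambda would fail.
n_missing = int(df["test"].isna().sum())

# Skip the 10-character date prefix, then take the first digit that
# follows; missing names simply yield a missing grade.
df["Grade"] = df["test"].str[10:].str.extract(r"(\d)", expand=False)
```

Checking df['test'].isna().sum() on the real data set would show whether this is what is happening.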
Could you please help me clarify the proper way to convert a selected DataFrame column to strings?
The 'Product_ID' column of DataFrame 'df' is automatically read as integer.
If I use the following statement:
df['Product_ID']=df['Product_ID'].to_string()
it generates this error:
TypeError: Can't convert 'int' object to str implicitly
The same error is raised by .astype(str) or .apply(str).
Thanks!
First, note that to_string and .astype(str) do different things. to_string returns a single string, and .astype(str) returns a series with string values. Which are you trying to do?
Second, how sure are you that you are working with an integer series? What does df['Product_ID'].dtype return?
Third, can you try to post a reproducible example? One way to narrow down the data that is causing the problem:
for i, v in enumerate(df['Product_ID'].values):
    try:
        str(v)
    except TypeError:
        # Print the position and value that could not be converted.
        print(i, v)
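The difference between to_string and .astype(str) mentioned in the first point can be seen on a small made-up frame. This is only an illustration; the Product_ID values are hypothetical.

```python
import pandas as pd

# Hypothetical integer column standing in for the asker's data.
df = pd.DataFrame({"Product_ID": [101, 102, 103]})

# to_string renders the whole Series (index included) as ONE Python string.
as_text = df["Product_ID"].to_string()

# astype(str) converts element by element and returns a Series of strings.
as_series = df["Product_ID"].astype(str)

print(type(as_text))
print(as_series.tolist())
```

Assigning as_text back to the column would put the same long string in every row, which is rarely what is wanted; assigning as_series back gives one string per row.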