Replace string in one part of a pandas dataframe - python

print(df["date"].str.replace("2016","16"))
The code above works fine. What I really want to do is to make this replacement in just a small part of the data-frame. Something like:
df.loc[2:4,["date"]].str.replace("2016","16")
However here I get an error:
AttributeError: 'DataFrame' object has no attribute 'str'

What about df['date'].loc[2:4].str.replace('2016', '16')?
By selecting ['date'] first, you know you are dealing with a Series, which does have the .str accessor.
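A minimal sketch of the same idea that also writes the result back into df. It assumes a default integer index, so the label slice 2:4 covers rows 2, 3 and 4 (.loc slices include both ends), and uses made-up dates. Passing the single label "date" to .loc, rather than the list ["date"], is what makes .loc return a Series instead of a one-column DataFrame. regex=False is passed so "2016" is matched literally, because the default for that parameter has differed across pandas versions.

```python
import pandas as pd

# Made-up dates; the default RangeIndex means labels 0..5 equal positions.
df = pd.DataFrame({"date": ["2016-01-01", "2016-02-01", "2016-03-01",
                            "2016-04-01", "2016-05-01", "2016-06-01"]})

# .loc[2:4, "date"] (a single label, not ["date"]) returns a Series, so .str works.
# Label slices in .loc include both ends, so rows 2, 3 and 4 are affected.
part = df.loc[2:4, "date"].str.replace("2016", "16", regex=False)

# Write the modified values back into the same rows of the original frame.
df.loc[2:4, "date"] = part
print(df)
```

Writing through df.loc[rows, column] in one step also avoids chained assignment such as df['date'].loc[2:4] = ..., which may modify a temporary copy instead of df.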

Related

AttributeError: 'DataFrame' object has no attribute 'dtype' error in pyspark

I have categoryDf, which is a Spark DataFrame, and it prints successfully:
categoryDf.limit(10).toPandas()
I want to join it to another Spark DataFrame, so I tried this:
df1=spark.read.parquet("D:\\source\\202204121920-seller_central_opportunity_explorer_niche_summary.parquet")
#df1.limit(5).toPandas()
df2=df1.join(categoryDf,df1["category_id"] == categoryDf["cat_id"])
df2.show()
When I use df2.show(), I see the output as:
The join is happening successfully. But when I try to convert it with df2.limit(10).toPandas(), I get this error:
AttributeError: 'DataFrame' object has no attribute 'dtype'
I want to see how the data looks after the join, which is why I tried df2.limit(10).toPandas(). Is there any other way to look at the data, given that the join itself succeeds?
My Python version is 3.7.7.
My Spark version is 2.4.4.
I faced the same problem; in my case it was because I had duplicate column names after the join.
I see you have report_date and marketplaceid in both dataframes. For each duplicated pair, you need to either drop one of the two columns or rename one of them.
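The answer does not explain why duplicate names produce this particular message. One plausible mechanism, which is an assumption and not confirmed by the answer: toPandas() builds a pandas DataFrame, and in pandas, selecting a column label that appears twice returns a DataFrame rather than a Series, and a DataFrame has dtypes but no dtype. The sketch reproduces that in plain pandas and adds a hypothetical dedupe_names helper that suffixes repeated names. With PySpark you could pass its output to df2.toDF(*new_names), which renames every column by position, or drop one copy of each duplicated column instead.

```python
from collections import Counter

import pandas as pd


def dedupe_names(columns):
    """Hypothetical helper: append _1, _2, ... to repeated column names."""
    seen = Counter()
    result = []
    for name in columns:
        seen[name] += 1
        # First occurrence keeps its name; later copies get a numeric suffix.
        # (Does not guard against a suffixed name already existing.)
        result.append(name if seen[name] == 1 else f"{name}_{seen[name] - 1}")
    return result


# Duplicate labels, like report_date after the join, break per-column access in pandas.
pdf = pd.DataFrame([[1, 2, 3]], columns=["report_date", "cat_id", "report_date"])
dup_selection = pdf["report_date"]
print(type(dup_selection).__name__)    # DataFrame, not Series
print(hasattr(dup_selection, "dtype"))  # False: only .dtypes exists on a DataFrame

new_names = dedupe_names(pdf.columns)
print(new_names)  # ['report_date', 'cat_id', 'report_date_1']
```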

Python Conditional on a date time Object

for index, row in HDFC_Nifty.iterrows():
    if HDFC_Nifty.iat(index,0).dt.year ==2016:
        TE_Nifty_2016.append(row['TE'])
    else:
        TE_Nifty_2017.append(row['TE'])
Hello,
I am attempting to iterate over the DataFrame, more specifically to apply a conditional to the Date column, which is formatted as a datetime object.
I keep getting the following error:
TypeError: '_iAtIndexer' object is not callable
I do not know how to proceed. I have tried various loops but have been largely unsuccessful, and I do not understand what I am doing wrong.
Thanks for the help!
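The question has no answer here, but two things in the loop look wrong. .iat is an indexer used with square brackets, not called like a function, which is what "'_iAtIndexer' object is not callable" is complaining about. And .iat returns a single Timestamp, which has .year directly, whereas .dt is an accessor that exists only on a Series. A minimal sketch with a hypothetical HDFC_Nifty frame whose first column is the datetime Date column; because .iat is positional while iterrows yields index labels, it assumes a default 0..n-1 index. A vectorized version that avoids the loop is included for comparison.

```python
import pandas as pd

# Hypothetical stand-in for HDFC_Nifty: first column is a datetime "Date", plus "TE".
HDFC_Nifty = pd.DataFrame({
    "Date": pd.to_datetime(["2016-12-30", "2017-01-02", "2016-06-15"]),
    "TE": [0.1, 0.2, 0.3],
})

TE_Nifty_2016, TE_Nifty_2017 = [], []
for index, row in HDFC_Nifty.iterrows():
    # .iat takes square brackets and returns a scalar Timestamp, so use .year directly.
    # .iat is positional; this assumes the index labels are 0..n-1.
    if HDFC_Nifty.iat[index, 0].year == 2016:
        TE_Nifty_2016.append(row["TE"])
    else:
        TE_Nifty_2017.append(row["TE"])

# Vectorized alternative: .dt works here because HDFC_Nifty["Date"] is a Series.
mask = HDFC_Nifty["Date"].dt.year == 2016
TE_2016_vec = HDFC_Nifty.loc[mask, "TE"].tolist()
TE_2017_vec = HDFC_Nifty.loc[~mask, "TE"].tolist()
```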

How can I fill the NAs using groupby function in Pandas (using Python) when I have more than one column?

I am trying to fill the NAs in my data frame using the following code and got an error. Can anyone help? Why is it not working? I need to group by more than one column (gender and age). With one column it works, but with more than one I get an error.
Here is the code:
df['NewCol'].fillna(df.groupby(['gender','age'])['grade'].transform('mean'),inplace=True)
The error message is:
TypeError: 'NoneType' object is not subscriptable
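The question has no answer here. A minimal sketch with made-up data shows that grouping by two columns works with transform("mean"), which returns one group mean per row, aligned to df's index. It assigns the filled column back instead of using inplace=True. Our guess, which is an assumption, is that the NoneType error comes from capturing the return value of a fillna(..., inplace=True) call (it returns None) and indexing it later; recent pandas versions also warn about chained inplace calls on df['NewCol'].

```python
import numpy as np
import pandas as pd

# Made-up data with the column names from the question.
df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "F"],
    "age": [20, 20, 21, 21, 21],
    "grade": [80.0, 90.0, 70.0, 60.0, 75.0],
    "NewCol": [np.nan, 1.0, np.nan, 2.0, np.nan],
})

# One mean per (gender, age) group, broadcast back to every row of that group.
group_means = df.groupby(["gender", "age"])["grade"].transform("mean")

# Assign the result; fillna(..., inplace=True) would return None instead of a Series.
df["NewCol"] = df["NewCol"].fillna(group_means)
print(df)
```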

Pandas apply giving float error when operation is performed on string

I have a dataframe with a column test that contains test names, which I am using to extract some information about which grade a test was written for. Because I know that the test name always contains the grade as the next digit after the date, I have been extracting it with this line of code:
df['Grade'] = df['test'].apply(lambda x: str(list(filter(str.isdigit, x[10:]))[0]))
This line, however, gives a TypeError: 'float' object is not subscriptable. I should note that before I ran this, I checked df.dtypes and the column test was listed as object. That makes sense, as the test names are strings like 2015-2016_math_grade_7, so I do not see how Pandas could treat them as floats.
I have checked, and test names are the only data in that column, so I have no idea why I am getting this TypeError. No matter how I change the code, I get this error, because I need to perform a string operation after slicing x. (I have also tried df['Grade'] = df['test'].apply(lambda x: str(re.sub(r"\D", "", str(x[:10])))))
I should also note that I have used this code before and it worked perfectly, but on this data set it fails, if that helps.
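The question has no answer here. A likely cause, which is an assumption: the column contains missing values. pandas stores a missing entry in an object column as NaN, which is a float, so df.dtypes still reports object while x[10:] fails on that one row. The sketch uses made-up test names plus one NaN, shows how to locate the non-string rows, and gives two possible fixes: guarding the original lambda, or using vectorized .str methods, which pass NaN through instead of raising.

```python
import numpy as np
import pandas as pd

# Made-up data: a blank cell (e.g. from read_csv) becomes NaN, a float,
# even though the column dtype is still reported as object.
df = pd.DataFrame({"test": ["2015-2016_math_grade_7", np.nan, "2016-2017_science_grade_5"]})

# Locate rows whose value is not a string (here, the NaN row).
bad_rows = df[~df["test"].map(lambda v: isinstance(v, str))]
print(bad_rows)

# Option 1: keep the original lambda but skip non-strings.
df["Grade"] = df["test"].apply(
    lambda x: str(list(filter(str.isdigit, x[10:]))[0]) if isinstance(x, str) else np.nan
)

# Option 2: vectorized .str methods propagate NaN automatically.
# .str[10:] drops the "2015-2016_" prefix; extract takes the first digit after it.
df["Grade2"] = df["test"].str[10:].str.extract(r"(\d)", expand=False)
print(df)
```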

python3 pandas - #TypeError: Can't convert 'int' object to str implicitly

Could you please help me clarify the proper way to convert a selected DataFrame column to strings?
The 'Product_ID' column of dataframe 'df' is automatically inferred as integer.
If I use the following statement:
df['Product_ID']=df['Product_ID'].to_string()
it generates an error:
TypeError: Can't convert 'int' object to str implicitly
The same issue occurs with .astype(str) or .apply(str).
thanks!
First, note that to_string and .astype(str) do different things. to_string returns a single string, and .astype(str) returns a series with string values. Which are you trying to do?
Second, how sure are you that you are working with an integer series? What does df['Product_ID'].dtype return?
Third, can you try to post a reproducible example? One way to narrow down the data that is causing the problem:
for i, v in enumerate(df['Product_ID'].values):
    try:
        str(v)
    except TypeError:
        print(i, v)
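The difference described in the first point, sketched with a hypothetical integer column. It does not reproduce the asker's TypeError: on a plain integer column .astype(str) works as expected, which is why the answer asks what dtype the column really has.

```python
import pandas as pd

# Hypothetical integer column standing in for Product_ID.
df = pd.DataFrame({"Product_ID": [101, 202, 303]})

# .to_string() renders the whole Series (index and values) as ONE text block;
# assigning it back would put that same single string in every row.
as_text = df["Product_ID"].to_string()
print(repr(as_text))

# .astype(str) converts element-wise and keeps one value per row.
df["Product_ID"] = df["Product_ID"].astype(str)
print(df["Product_ID"].tolist())  # ['101', '202', '303']
```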
