Replacing values in pandas dataframe using values in a list - python

I have a column in my df whose values end with one of ['-A','-B','-T','-Z','-EQ','-BE','-BL','-BT','-GC','-IL','-IQ'], and I need to remove these suffixes.
I tried the following and got an error:
df['name'] = df['name'].str.replace(['-A','-B','-T','-Z','-EQ','-BE','-BL','-BT','-GC','-IL','-IQ'],'', regex=True)
TypeError: unhashable type: 'list'

Use Series.replace instead of Series.str.replace:
df['name'] = df['name'].replace(['-A','-B','-T','-Z','-EQ','-BE','-BL','-BT','-GC','-IL','-IQ'],'', regex=True)
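The patterns in the list are not anchored, so they can also match in the middle of a value. A minimal sketch, assuming the suffix should be removed only when it sits at the very end of the string; the sample names and the variables suffixes and pattern are made up for illustration:

```python
import re

import pandas as pd

# Hypothetical sample data; only the suffix list comes from the question.
df = pd.DataFrame({'name': ['ABC-A', 'XYZ-BE', 'FOO-IQ', 'PLAIN']})

suffixes = ['-A', '-B', '-T', '-Z', '-EQ', '-BE', '-BL', '-BT', '-GC', '-IL', '-IQ']

# Build one alternation and anchor it with $ so only a trailing suffix matches.
pattern = '(?:' + '|'.join(re.escape(s) for s in suffixes) + ')$'

df['name'] = df['name'].str.replace(pattern, '', regex=True)
print(df['name'].tolist())
```

Series.str.replace accepts a single pattern string, which is why the list is joined into one regular expression here.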

Related

Create a new column which is cast to a string in pandas

What would be the proper way to assign a stringified column to a dataframe? I would like to keep the original, so I don't want to use .astype({'deliveries': 'str'}). So far I have:
df = (df.groupby('path')
        .agg(agg_dict)
        .assign(deliveries_str=df['deliveries'].str ??)
     )
What would be the proper way to do this?
I also tried the following but I get an unhashable type error:
.assign(deliveries_str=lambda x: x.deliveries.str)
TypeError: unhashable type: 'list'
You need to replace .str, since it is an accessor for string methods rather than a conversion; use astype(str) instead:
.assign(deliveries_str=lambda x: x.deliveries.astype(str))
To keep missing values missing instead of converting them to strings, add a mask:
.assign(deliveries_str=lambda x: x['deliveries'].astype(str).mask(x['deliveries'].isnull()))
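As a rough illustration of the masked version, here is a small sketch. The dataframe contents are invented for the example, and the groupby/agg step from the question is left out because agg_dict is not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the aggregated result.
df = pd.DataFrame({'path': ['a', 'b', 'c'],
                   'deliveries': [1.0, np.nan, 3.0]})

out = df.assign(
    # Convert to strings, then blank out rows where the original was missing.
    deliveries_str=lambda x: x['deliveries'].astype(str).mask(x['deliveries'].isnull())
)
print(out)
```

The original deliveries column keeps its numeric dtype; only the new column holds strings.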

Remove white space from entire DataFrame

I have a dataframe with 22 columns and 65 rows. The data comes from a CSV file.
Each value in the dataframe has an extra unwanted whitespace character. So if I loop over the 'Year' column and call len() on each value, I get
2019 5
2019 5
2018 5
...
This one extra whitespace character appears in every value throughout the dataframe. I tried calling .strip() on the dataframe, but no such attribute exists.
I also tried looping over the columns with df[column].str.strip(), but the columns have various data types (dtypes: float64(6), int64(4), object(14)), so this raises an error.
Any ideas on how to apply a function to the entire dataframe, and if so, which function or method? If not, what is the best way to handle this?
Handle the error:
for col in df.columns:
    try:
        df[col] = df[col].str.strip()
    except AttributeError:
        pass
Normally, I'd say select the object dtypes, but that can still be problematic if the data are messy enough to store numeric data in an object container.
import pandas as pd
df = pd.DataFrame({'foo': [1, 2, 3], 'bar': ['seven ']*3})
df['foo2'] = df.foo.astype(object)
for col in df.select_dtypes('object'):
    df[col] = df[col].str.strip()
#AttributeError: Can only use .str accessor with string values!
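One way around both problems is to strip only the values that are actually strings and leave everything else untouched. This is a sketch under that assumption; strip_if_str is a hypothetical helper and the sample data is invented:

```python
import pandas as pd

# Hypothetical data: a string column, a numeric column, and a mixed object column.
df = pd.DataFrame({'Year': ['2019 ', '2018 '],
                   'n': [1, 2],
                   'mixed': ['a ', 3]})

def strip_if_str(value):
    """Strip surrounding whitespace from strings; return other values unchanged."""
    return value.strip() if isinstance(value, str) else value

for col in df.columns:
    df[col] = df[col].map(strip_if_str)

print(df)
```

Because the check happens per value, numeric columns pass through unchanged and no exception handling is needed.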
You can use apply() to do this:
df['Year'] = df['Year'].apply(lambda x: x.strip())
You can apply this function to each column separately, as long as every value is a string (numbers have no strip() method):
for column in df.columns:
    df[column] = df[column].apply(lambda x: x.strip())
Try this:
for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).replace(' ', ''))
Why not try this?
for column in df.columns:
    df[column] = df[column].apply(lambda x: str(x).strip())

Replace NaN values of filtered column by the mean

I have a dataframe with the following shape:
Index([u'PRODUCT',u'RANK', u'PRICE', u'STARS', u'SNAPDATE', u'CAT_NAME'], dtype='object')
For each product of that dataframe I can have NaN values for a specific date.
The goal is to replace for each product the NaN values by the mean of the existing values.
Here is what I tried without success:
for product in df['PRODUCT'].unique():
    df = df[df['PRODUCT'] == product]['RANK'].fillna((df[df['PRODUCT'] == product]['RANK'].mean()), inplace=True)
print df
gives me:
TypeError: 'NoneType' object has no attribute '__getitem__'
What am I doing wrong?
You can use groupby to create a mean series:
s = df.groupby('PRODUCT')['RANK'].mean()
Then use this series to fillna values:
df['RANK'] = df['RANK'].fillna(df['PRODUCT'].map(s))
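To see the two steps together, here is a small sketch with invented data; only the column names PRODUCT and RANK come from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two products, each with one missing rank.
df = pd.DataFrame({'PRODUCT': ['a', 'a', 'a', 'b', 'b'],
                   'RANK': [1.0, np.nan, 3.0, 10.0, np.nan]})

# Mean rank per product; NaN values are ignored by mean().
s = df.groupby('PRODUCT')['RANK'].mean()

# Map each row's product to its mean and use that only where RANK is missing.
df['RANK'] = df['RANK'].fillna(df['PRODUCT'].map(s))
print(df)
```

Each missing rank is filled with the mean of its own product, so product a gets 2.0 and product b gets 10.0.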
The reason you're getting this error is your use of inplace=True in fillna, which modifies the object and returns None. Unfortunately, the documentation there is wrong:
Returns: filled : Series
This shows otherwise, though:
>>> df = pd.DataFrame({'a': [3]})
>>> type(df.a.fillna(6, inplace=True))
NoneType
>>> type(df.a.fillna(6))
pandas.core.series.Series
So when you assign
df = df[df['PRODUCT'] == product]['RANK'].fillna((df[df['PRODUCT'] == product]['RANK'].mean()), inplace=True)
you're assigning df = None, and the next iteration fails with the error you get.
You can omit the assignment df =, or, better yet, use the other answer.

Pandas: 'Column' object is not callable

I am trying to strip everything after 'H' from each value and store the result in a column.
df['col1'] = df['col1'].str.split('H').str[0]
But pyspark gives me the error: 'Column' object is not callable
One possible solution in pandas is to add expand=True, which returns a DataFrame, and then select the second column (the text after 'H'; use column 0 for the text before it):
df['col1'] = df['col1'].str.split('H', expand=True).iloc[:, 1]
Or:
df['col1'] = df['col1'].str.split('H', expand=True)[1]
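A short sketch of what expand=True produces in pandas, using invented sample values; the names parts, before and after are only for illustration:

```python
import pandas as pd

# Hypothetical values; the last one contains no 'H' at all.
df = pd.DataFrame({'col1': ['12H30', '7H05', 'noon']})

# expand=True returns a DataFrame with one column per split piece.
parts = df['col1'].str.split('H', expand=True)

before = parts[0]  # text before the first 'H'
after = parts[1]   # text after it; missing when there was no 'H'
print(parts)
```

Rows without the separator get a missing value in the second column, so which column to keep depends on which side of 'H' is wanted.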

Error while using rstrip in pandas

I have a dataframe df with a column named "values". It contains:
values
[u'12f4',u'ff45',u'tr23']
[u'125g4',u'ff145',u'trr523']
[u'12f34',u'ff2345',u'trg23a']
I want to remove ']' from each cell. I am using the following code -
df['values'] = df['values'].map(lambda x: x.rstrip(']'))
This gives me an error -
AttributeError: 'float' object has no attribute 'rstrip'
How do I get rid of this error?
Try using str.rstrip, which skips missing values (NaN is a float, which is why the lambda fails):
df['values'] = df['values'].str.rstrip(']')
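The difference can be seen in a small sketch, assuming the column holds strings plus at least one NaN; the sample data below is invented:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value in the middle.
df = pd.DataFrame({'values': ["[u'12f4']", np.nan, "[u'ff45']"]})

# The .str accessor applies rstrip to strings and leaves NaN as it is.
df['values'] = df['values'].str.rstrip(']')
print(df['values'].tolist())
```

The missing row stays missing, while the string rows lose their trailing bracket.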
