I have a dataframe with 15 columns. Five of them should be numeric, but some of the entries are blank or contain words, and I want to convert those to zero.
I can do the conversion for a single column, but when I try it on multiple columns at once it fails. This works for one column:
pd.to_numeric(Tracker_sample['Product1'],errors='coerce').fillna(0)
and it works, but when I try this for multiple columns:
pd.to_numeric(Tracker_sample[['product1','product2','product3','product4','Total']],errors='coerce').fillna(0)
I get the error: arg must be a list, tuple, 1-d array, or Series
I think it is the way I am calling the columns to be fixed. I am new to pandas so any help would be appreciated. Thank you
You can use:
Tracker_sample[['product1','product2','product3','product4','Total']].apply(pd.to_numeric, errors='coerce').fillna(0)
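A minimal runnable sketch of why this works (the column names and data here are invented for illustration): apply() feeds pd.to_numeric one column at a time, so each call receives a Series rather than a whole DataFrame. Note that the result must be assigned back if you want to keep it:

```python
import pandas as pd

# Toy frame: numeric columns polluted with blanks and words (names are illustrative)
Tracker_sample = pd.DataFrame({
    "product1": ["1", "", "abc", "4"],
    "product2": ["5", "x", "7", ""],
})

cols = ["product1", "product2"]
# apply() runs pd.to_numeric on each column (a Series) in turn,
# which satisfies its "1-d array, or Series" requirement
Tracker_sample[cols] = Tracker_sample[cols].apply(pd.to_numeric, errors="coerce").fillna(0)
print(Tracker_sample)
```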
With a for loop?
for col in ['product1','product2','product3','product4','Total']:
    Tracker_sample[col] = pd.to_numeric(Tracker_sample[col], errors='coerce').fillna(0)
I can't seem to find a way to split the array values in a column of a dataframe.
I have managed to get all the array values using this code:
The dataframe is as follows:
I want to use value_counts() on the dataframe and I get this
I want the array values that are clubbed together to be split so that I can get the accurate count of every value.
Thanks in advance!
You could try .explode(), which would create a new row for every value in each list.
df_mentioned_id_exploded = pd.DataFrame(df_mentioned_id.explode('entities.user_mentions'))
With the above code you would create a new dataframe df_mentioned_id_exploded with a single column entities.user_mentions, which you could then use .value_counts() on.
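A small self-contained sketch of the idea (the data here is invented; only the column name comes from the question):

```python
import pandas as pd

# Toy frame: each row holds a list of mentioned user ids
df_mentioned_id = pd.DataFrame({
    "entities.user_mentions": [["a", "b"], ["a"], ["b", "c", "a"]]
})

# explode() turns each list element into its own row, repeating the index
exploded = df_mentioned_id.explode("entities.user_mentions")
# now every value is counted individually instead of per-list
counts = exploded["entities.user_mentions"].value_counts()
print(counts)
```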
I have a dataframe which looks as follows:
I want to multiply elements in a row except for the "depreciation_rate" column with the value in the same row in the "depreciation_rate" column.
I tried df2.iloc[:,6:26]*df2["depreciation_rate"] as well as df2.iloc[:,6:26].mul(df2["depreciation_rate"])
I get the same result with both, which looks as follows: NaN values with additional columns I don't want. I think the row elements are also being multiplied by the "depreciation_rate" values of other rows. What would be a good way to solve this?
Try using mul() along axis=0:
df2.iloc[:,6:26].mul(df2["depreciation_rate"], axis=0)
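A minimal sketch of the difference (column names invented): without axis=0, multiplying a DataFrame by a Series aligns the Series index against the DataFrame's column labels, which produces the NaN-filled extra columns; axis=0 aligns it row by row instead:

```python
import pandas as pd

# Toy frame (column names are illustrative, not from the original data)
df2 = pd.DataFrame({
    "cost_2020": [100.0, 200.0],
    "cost_2021": [10.0, 20.0],
    "depreciation_rate": [0.5, 0.1],
})

# axis=0 aligns the Series with the rows, so each row is scaled
# by its own depreciation_rate instead of aligning on column labels
result = df2[["cost_2020", "cost_2021"]].mul(df2["depreciation_rate"], axis=0)
print(result)
```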
I have a dataframe named df_train with 20 columns. Is there a pythonic way to view info on only one column by selecting its name?
Basically I am trying to loop through the df and extract the number of unique values and the number of missing values:
print("\nUnique Values:")
for col in df_train.columns:
    print(f'{col:<25}: {df_train[col].nunique()} unique values. \tMissing values: {} ')
If you want the total number of null values, this is the pythonic way to achieve it:
df_train[col].isnull().sum()
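Putting that into the loop from the question gives something like the following sketch (the data standing in for df_train is invented):

```python
import pandas as pd
import numpy as np

# Toy frame standing in for df_train (contents invented)
df_train = pd.DataFrame({
    "city": ["NY", "LA", None, "NY"],
    "price": [1.0, np.nan, 3.0, 3.0],
})

print("\nUnique Values:")
for col in df_train.columns:
    # nunique() ignores NaN; isnull().sum() counts the missing entries
    print(f"{col:<25}: {df_train[col].nunique()} unique values. "
          f"\tMissing values: {df_train[col].isnull().sum()}")
```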
Yes there is a way to select individual columns from a dataframe.
df_train['your_column_name']
This will extract only the column with <your_column_name>.
PS: This is my first StackOverflow answer. Please be nice.
I want to select both the rows 489-493 and the rows 503-504 in this dataframe. I can slice them separately by df.iloc[489:493] and df.iloc[503:504], respectively, but am not sure how to combine them?
I have tried using df[(df.State =='Washington') & (df.State=='Wisconsin')], however I'm getting an empty dataframe with only the column labels.
If I do only one of them, e.g. df[df.State =='Washington'], this works fine and produces 5 rows with Washington as expected.
So how can I combine them?
Use pandas.DataFrame.loc (note that this selects by label, so it assumes the dataframe is indexed by State):
df = df.loc[['Washington','Wisconsin'],['Region Name']]
df.iloc[np.r_[489:493, 503:504], :] worked for me!
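The empty result comes from the `&`: no row can be both states at once, so the condition needs `|` (or isin()). A small sketch with invented data (only the State values come from the question), including the np.r_ trick for combining positional slices:

```python
import pandas as pd
import numpy as np

# Toy frame (State values from the question; the rest is invented)
df = pd.DataFrame({"State": ["Washington", "Wisconsin", "Texas", "Washington"]})

# '&' asks for rows where State equals both values at once -> always empty.
# Use '|' (or) or isin() instead:
both = df[(df.State == "Washington") | (df.State == "Wisconsin")]
same = df[df.State.isin(["Washington", "Wisconsin"])]

# Positional slices can be combined with np.r_:
rows = df.iloc[np.r_[0:2, 3:4]]
print(both)
```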
Essentially this is the same question as in this link: How to automatically shrink down row numbers in R data frame when removing rows in R. However, I want to do this with a pandas dataframe. How would I go about doing so? There seems to be nothing similar to the rownames method of R dataframes in the pandas library. Any ideas?
What you call the "row number" is part of the index in pandas-speak, in this case an integer index. You can rebuild it using
df = df.reset_index(drop=True)
There is another way of doing this, which assigns a fresh integer index directly (note that reset_index with drop=True also avoids keeping the old index as a column):
df.index=range(len(df.index))
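A quick sketch of the reset_index approach with invented data, starting from the gappy index left behind after dropping rows:

```python
import pandas as pd

# Toy frame with a gappy index, as left behind after removing rows
df = pd.DataFrame({"x": [10, 20, 30]}, index=[4, 7, 9])

# drop=True discards the old labels instead of storing them in a new column
df = df.reset_index(drop=True)
print(df.index.tolist())
```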