Combining multiple rows in a pandas dataframe - python

I want to select both rows 489-493 and rows 503-504 in this dataframe. I can slice them separately with df.iloc[489:493] and df.iloc[503:504], respectively, but I'm not sure how to combine them.
I have tried df[(df.State == 'Washington') & (df.State == 'Wisconsin')], but I'm getting an empty dataframe with the column labels only.
If I do only one of them, e.g. df[df.State == 'Washington'], it works fine and produces 5 rows with Washington as expected.
So how can I combine them?

Use pandas.DataFrame.loc (this assumes State is the dataframe's index):
df = df.loc[['Washington','Wisconsin'],['Region Name']]

df.iloc[np.r_[489:493, 503:504], :] worked for me!
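To see why the & attempt returns an empty frame: no single row can have State equal to both values, so the conditions should be combined with | (or .isin). A runnable sketch with toy data (the column values and row counts here are assumptions, not the asker's actual frame):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's frame: a State column with repeated values.
df = pd.DataFrame({'State': ['Washington'] * 5 + ['Wyoming'] * 5 + ['Wisconsin'] * 2})

# A single row can never equal both states, so & returns an empty frame;
# combine the conditions with | (or use isin) to match either state.
by_state = df[df['State'].isin(['Washington', 'Wisconsin'])]

# Positional version: np.r_ concatenates the two ranges into one index array.
by_position = df.iloc[np.r_[0:5, 10:12]]
```

Both selections pick out the same seven rows; isin is the label-based route, np.r_ the positional one.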

Related

Select 2 different set of columns from column multiindex dataframe

I have the following column multiindex dataframe.
I would like to select (or get a subset of) the dataframe with different columns from each level_0 index (i.e. x_mm and y_mm from virtual, and z_mm, rx_deg, ry_deg, rz_deg from actual). From what I have read I think I might be able to use pandas IndexSlice, but I'm not entirely sure how to use it in this context.
So far my workaround is to use pd.concat, selecting the 2 sets of columns independently. I have the feeling that this can be done more neatly with slicing.
You can programmatically generate the tuples to slice your MultiIndex:
from itertools import product
cols = ((('virtual',), ('x_mm', 'y_mm')),
        (('actual',), ('z_mm', 'rx_deg', 'ry_deg', 'rz_deg')))
out = df[[t for x in cols for t in product(*x)]]
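For instance, with a toy two-level column frame mirroring the question's labels, the comprehension expands to exactly the six desired (level_0, level_1) tuples:

```python
from itertools import product

import numpy as np
import pandas as pd

# Toy frame with a two-level column MultiIndex (labels taken from the question).
levels = ['x_mm', 'y_mm', 'z_mm', 'rx_deg', 'ry_deg', 'rz_deg']
columns = pd.MultiIndex.from_product([['virtual', 'actual'], levels])
df = pd.DataFrame(np.zeros((3, 12)), columns=columns)

# Each pair (level_0 labels, level_1 labels) is expanded by product()
# into full column tuples, preserving the requested order.
cols = ((('virtual',), ('x_mm', 'y_mm')),
        (('actual',), ('z_mm', 'rx_deg', 'ry_deg', 'rz_deg')))
out = df[[t for x in cols for t in product(*x)]]
```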

Pandas - drop rows based on two conditions on different columns

Although there are several related questions answered in Pandas, I cannot solve this issue. I have a large dataframe (~49,000 rows) and want to drop the rows that meet two conditions at the same time (~120 rows):
For one column: an exact string
For another column: a NaN value
My code is ignoring the conditions and no row is removed.
to_remove = ['string1', 'string2']
df.drop(df[df['Column 1'].isin(to_remove) & (df['Column 2'].isna())].index, inplace=True)
What am I doing wrong? Thanks for any hint!
Instead of calling drop and passing the index, you can create a mask for the condition under which you want to keep the rows, then take only those rows:
df[~(df['Column1'].isin(to_remove) & (df['Column2'].isna()))]
Also, if the two conditions are meant to apply to the same column, you would want to combine them with or, i.e. |, since a single value cannot satisfy both at once.
If needed, you can reset_index at the end.
As a side note, your list to_remove has two identical string values; I'm assuming that's a typo in the question.
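A minimal sketch of the mask-and-keep approach with made-up data (the column names and values are placeholders, not the asker's frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column 1': ['string1', 'keep me', 'string2'],
                   'Column 2': [np.nan, np.nan, 1.0]})
to_remove = ['string1', 'string2']

# Rows to drop: exact string match in Column 1 AND NaN in Column 2.
mask = df['Column 1'].isin(to_remove) & df['Column 2'].isna()
result = df[~mask].reset_index(drop=True)
```

Only the first row satisfies both conditions (the last row matches the string but has a non-NaN value), so two rows survive.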

What's the most efficient way to drop columns (from beginning and end) in pandas from a large dataframe?

I am trying to drop a number of columns from the beginning and end of the pandas dataframe.
My dataframe has 397 rows and 291 columns. I currently have this solution to remove the first 8 columns, but I also want to remove some at the end:
SMPS_Data = SMPS_Data.drop(SMPS_Data.columns[0:8], axis=1)
I know I could just repeat this step and remove the last few columns, but I was hoping there is a more direct way to approach this problem.
I tried using
SMPS_Data = SMPS_Data.drop(SMPS_Data.columns[0:8,278:291], axis=1)
but it doesn't work.
Also, it seems that the .drop method somehow slows down the console responsiveness, so maybe there's a cleaner way to do it?
You could use .drop() if you want to remove your columns by their column names:
drop_these = ['column_name1', 'column_name2', 'last_columns']
df = df.drop(columns=drop_these)
If you want to remove them by their position instead, you could use .iloc:
df.iloc[:, 8:15]   # Columns 8 through 14 (the end of a slice is exclusive)
df.iloc[:, :-5]    # All columns except the last five
df.iloc[:, 2:-5]   # All columns except the first two and the last five
See this documentation on indexing and slicing data with pandas, for more information.
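Applied to the question's shape (8 leading columns and, assuming columns 278-290 are the trailing ones, 13 at the end), both a single iloc slice and drop with np.r_ remove everything in one step; the frame below is a zero-filled stand-in:

```python
import numpy as np
import pandas as pd

# Zero-filled stand-in for the 397 x 291 frame from the question.
df = pd.DataFrame(np.zeros((397, 291)))

# Keep only the middle columns: skip the first 8 and the last 13.
trimmed = df.iloc[:, 8:-13]

# Equivalent with drop, joining the two positional ranges via np.r_.
trimmed_drop = df.drop(columns=df.columns[np.r_[0:8, 278:291]])
```

The iloc version also avoids the copy work drop does, which may help with the responsiveness issue mentioned above.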

Adding Multiple Columns

How do I add multiple columns from one dataframe to another dataframe? I already figured out how to add a single column, but not multiple columns. I am a newbie.
df
new['Symbol']= pd.Series(df['Symbol'])
dfnew['Symbol']['Desc']= pd.Series(df['Symbol']['Desc'])
Use:
dfnew['Symbol'],dfnew['Desc']= df['Symbol'],df['Desc']
Or df.assign():
dfnew=dfnew.assign(Symbol=df.Symbol,Desc=df.Desc)
If needed initialize dfnew first as dfnew=pd.DataFrame()
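A sketch of both variants with a made-up source frame (the Symbol/Desc values are placeholders):

```python
import pandas as pd

df = pd.DataFrame({'Symbol': ['AAA', 'BBB'],
                   'Desc': ['first thing', 'second thing'],
                   'Price': [1.0, 2.0]})

# Direct assignment of each column...
dfnew = pd.DataFrame()
dfnew['Symbol'], dfnew['Desc'] = df['Symbol'], df['Desc']

# ...or assign(), which returns a new frame with the added columns.
dfnew2 = pd.DataFrame().assign(Symbol=df.Symbol, Desc=df.Desc)
```

Note that assign() does not modify the frame in place, which is why its result is captured in a variable.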

Use to_numeric on certain columns only in PANDAS

I have a dataframe with 15 columns. 5 of those columns contain numbers, but some of the entries are either blanks or words, and I want to convert those to zero.
I am able to convert the entries in one of the column to zero but when I try to do that for multiple columns, I am not able to do it. I tried this for one column:
pd.to_numeric(Tracker_sample['Product1'],errors='coerce').fillna(0)
and it works, but when I try this for multiple columns:
pd.to_numeric(Tracker_sample[['product1','product2','product3','product4','Total']],errors='coerce').fillna(0)
I get the error : arg must be a list, tuple, 1-d array, or Series
I think it is the way I am calling the columns to be fixed. I am new to pandas so any help would be appreciated. Thank you
You can use .apply, assigning the result back to the same columns:
cols = ['product1','product2','product3','product4','Total']
Tracker_sample[cols] = Tracker_sample[cols].apply(pd.to_numeric, errors='coerce').fillna(0)
With a for loop?
for col in ['product1','product2','product3','product4','Total']:
    Tracker_sample[col] = pd.to_numeric(Tracker_sample[col], errors='coerce').fillna(0)
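Run against a small made-up frame (column names from the question, values assumed): blanks and words coerce to NaN, and fillna then turns them into 0.

```python
import pandas as pd

Tracker_sample = pd.DataFrame({'product1': ['1', '', 'abc'],
                               'product2': ['2', '3', '']})

cols = ['product1', 'product2']
# errors='coerce' turns non-numeric entries into NaN; fillna replaces them with 0.
Tracker_sample[cols] = (Tracker_sample[cols]
                        .apply(pd.to_numeric, errors='coerce')
                        .fillna(0))
```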
