I want to delete some specific rows from a DataFrame in Python. The DataFrame consists of a series of tables, and I have to delete the rows where only the first cell has a value. For example, in the image at the bottom, these are the rows highlighted in yellow.
If the unwanted rows share a common substring, you can delete them explicitly with
df_new = df[~df.columnName.str.contains("FINANCIAL SERVICES")]
and if the remaining cells in those rows are null, use dropna:
df.dropna(subset=df.columns[1:], how='all', inplace=True)
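A minimal sketch of both approaches on a toy frame (the column names and data here are assumptions for illustration):

```python
import pandas as pd

# Toy frame: the second row is a section header, so only its first cell is filled
df = pd.DataFrame({
    "columnName": ["Alpha Ltd", "FINANCIAL SERVICES", "Beta Inc"],
    "amount": [100.0, None, 200.0],
    "region": ["EU", None, "US"],
})

# Option 1: drop rows whose first column contains a known substring
by_substring = df[~df.columnName.str.contains("FINANCIAL SERVICES")]

# Option 2: drop rows where every column except the first is NaN
by_nulls = df.dropna(subset=df.columns[1:], how="all")
```

Both leave only the two data rows; the second option is more general because it does not depend on knowing the header text.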
Importing a SQL data table as a pandas DataFrame and dropping all completely empty columns:
equip = %sql select * from [coswin].[dbo].[Work Order]
df = equip.DataFrame()
#dropping empty columns
df.dropna(axis=1, how="all", inplace=True)
The problem is that the empty columns still show up in the output, without any errors.
Are you sure the columns you want to remove are full of null values? You might check with df.isna().sum() if you haven't.
Also, you could use pd.read_sql() to read your data directly into a DataFrame.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
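A runnable sketch of the read_sql route, using an in-memory SQLite table as a stand-in for the SQL Server source (the table and column names here are made up):

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the real [Work Order] table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE work_order (id INTEGER, descr TEXT, empty_col TEXT)")
con.execute(
    "INSERT INTO work_order VALUES (1, 'pump repair', NULL), (2, 'inspection', NULL)"
)

# read_sql returns a DataFrame directly -- no intermediate result object
df = pd.read_sql("SELECT * FROM work_order", con)

# Drop columns that are entirely null
df.dropna(axis=1, how="all", inplace=True)
```

If a column survives this, it is not fully null; df.isna().sum() will show how many nulls each column actually has.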
Consider a column Amount.Requested with some missing values. Based on those missing values, I want to drop the entire row, because if Amount.Requested is missing there is no point in keeping that client's data in my sample.
If you have nulls, then to remove only the rows with nulls, try
df = df.loc[~df['Amount.Requested'].isna()]
or
df = df.loc[df['Amount.Requested'] > 0]
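A small sketch (the values are made up) showing that the two variants are not equivalent: the isna filter only drops missing values, while the > 0 filter also discards zero and negative amounts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Amount.Requested": [1000.0, np.nan, 0.0, 2500.0]})

# Keep only rows where Amount.Requested is present
non_null = df.loc[~df["Amount.Requested"].isna()]

# The > 0 variant also discards zero (and negative) amounts, not just NaN,
# because NaN > 0 evaluates to False
positive = df.loc[df["Amount.Requested"] > 0]
```

Prefer the isna version if a requested amount of 0 is meaningful data in your sample.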
I have a DataFrame DF in Python and I want to filter its rows based on two columns.
In particular, I want to remove the rows where orderdate is earlier than startdate.
How can I reverse/opposite the condition inside the following code to achieve what I want?
DF = DF.loc[DF['orderdate']<DF['startdate']]
I could reframe the code as below, but it would also drop some rows that have NaT, and I want to keep them:
DF = DF.loc[DF['orderdate']>=DF['startdate']]
Inserting ~ in front of the parenthesized condition negates it, removing every row that satisfies the original condition. Rows where the comparison involves NaT evaluate to False and are therefore kept.
DF = DF.loc[~(DF['orderdate']<DF['startdate'])]
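A minimal sketch (dates are made up) showing why the negated form keeps NaT rows while the >= form silently drops them:

```python
import pandas as pd

DF = pd.DataFrame({
    "orderdate": pd.to_datetime(["2021-01-05", "2021-01-01", pd.NaT]),
    "startdate": pd.to_datetime(["2021-01-01", "2021-01-10", "2021-01-01"]),
})

# NaT compares as False in both directions, so >= drops the NaT row
kept_strict = DF.loc[DF["orderdate"] >= DF["startdate"]]

# Negating the opposite condition keeps rows where the comparison is False,
# which includes the NaT row
kept_with_nat = DF.loc[~(DF["orderdate"] < DF["startdate"])]
```

Here kept_strict has one row, while kept_with_nat keeps both the valid row and the NaT row.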
1 - loc compares the 'orderdate' column with the 'startdate' column row by row. Where the condition is true, it returns the row's index label, and these labels are stored in the ids array.
2 - The drop method deletes rows from the DataFrame. Its parameters are the array of row index labels and inplace=True, which ensures the operation is performed on the DataFrame itself; if it is False, the operation returns a modified copy instead.
# Get index labels of rows where orderdate < startdate
ids = DF.loc[DF['orderdate'] < DF['startdate']].index
# Delete these rows from the DataFrame
DF.drop(ids, inplace=True)
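A self-contained sketch of the drop-by-index pattern for the question's condition (dates are made up); rows where orderdate is earlier than startdate are removed, and the NaT row survives because its comparison evaluates to False:

```python
import pandas as pd

DF = pd.DataFrame({
    "orderdate": pd.to_datetime(["2021-01-05", "2021-01-01", pd.NaT]),
    "startdate": pd.to_datetime(["2021-01-01", "2021-01-10", "2021-01-01"]),
})

# Collect the index labels of the rows to discard, then drop them in place
ids = DF.loc[DF["orderdate"] < DF["startdate"]].index
DF.drop(ids, inplace=True)
```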
I need to delete the rows that contain 'nan%' in the 'Precison' and 'Recall' columns, as the image below shows.
I just need to remove all rows that show 'nan%' both in 'Precison' and in 'Recall'.
dropna() does not work here.
You can select all rows where the value is not equal to 'nan%' in both columns:
df[df[['Precison','Recall']].ne('nan%').all(axis=1)]
Or you can replace 'nan%' with NaN (requires import numpy as np) so DataFrame.dropna works:
df = df.replace('nan%', np.nan).dropna(subset=['Precison','Recall'])
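A small sketch of both variants (the data and the 'Model' column are made up; 'Precison' is spelled as in the question's data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Model": ["a", "b", "c"],
    "Precison": ["90%", "nan%", "85%"],
    "Recall": ["80%", "nan%", "nan%"],
})

# Keep rows where neither column equals the literal string 'nan%'
filtered = df[df[["Precison", "Recall"]].ne("nan%").all(axis=1)]

# Equivalent: turn 'nan%' into real NaN so dropna can see it
via_dropna = df.replace("nan%", np.nan).dropna(subset=["Precison", "Recall"])
```

Note both variants drop a row if either column holds 'nan%'; dropna() alone does nothing here because 'nan%' is an ordinary string, not a missing value.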
I have a DataFrame which should have 14343 rows. But when I check df.info(), it shows 14365 rows: after the last data row there are cells that explain the column names, and pandas treats them as rows. I tried the following code, but it did not work: df.drop(df.index[14344, ])
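One way to trim such trailing rows, sketched with scaled-down numbers (the toy data is an assumption; in the question n_data_rows would be 14343). The original attempt fails because df.index[14344, ] indexes with a tuple and drop is not assigned back or run in place; passing a slice of index labels works:

```python
import pandas as pd

# Toy stand-in: 5 data rows plus 2 trailing "explanation" rows
df = pd.DataFrame({"col": list(range(5)) + ["column names", "explained here"]})
n_data_rows = 5  # in the question this would be 14343

# Slice keeps the first n rows and returns a new frame
trimmed = df.iloc[:n_data_rows]

# Equivalent with drop: pass all trailing index labels, not a single position
df.drop(df.index[n_data_rows:], inplace=True)
```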