How can I drop duplicates in pandas without dropping NaN values - python

I have a dataframe which I query and I want to get only unique values out of a certain column.
I tried to do that by executing this code:
database = pd.read_csv(db_file, sep='\t')
query = database.loc[database[db_specification[0]].isin(elements)].drop_duplicates(subset=db_specification[1])
db_specification is just a list containing the two columns that I query.
Some of the values are NaN, and I don't want to consider them duplicates of each other. How can I achieve that?

You can start by selecting all rows that are NaN in that column, and then drop duplicates on the rest of the dataframe.
col = db_specification[1]
mask = data[col].isna()
data = pd.concat([data[mask], data[~mask].drop_duplicates(subset=col)])
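Here is a small self-contained sketch of that idea. The sample frame and the column names key and value are made up for illustration; by default drop_duplicates treats NaN values as equal to each other, which is why the NaN rows are set aside first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "key" stands in for the column being deduplicated.
data = pd.DataFrame({
    "key": ["a", "a", np.nan, "b", np.nan],
    "value": [1, 2, 3, 4, 5],
})

mask = data["key"].isna()  # True for rows whose key is missing
deduped = data[~mask].drop_duplicates(subset="key")  # dedupe only the non-NaN rows

# Put the NaN rows back and restore the original row order.
result = pd.concat([data[mask], deduped]).sort_index()
print(result)
```

Both NaN rows survive, while the second "a" row is dropped.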

Related

Merge two Panda rows

I need help getting two rows in the same dataframe merged/joined.
The first table is the df that I have right now.
The second one is the one that I would like to have.
I need to combine Jim and Bill. I don't want to overwrite values in either table. I just want to update the NaN values in the row (Bill) with the values from the row (Jim), e.g. city.
There are about 20 columns that need updating, so I cannot just update the Bill/City cell.
Thanks
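You can try
df.loc['Bill'] = df.loc['Bill'].fillna(df.loc['Jim'])
# or (assign the result back; an in-place fillna on df.loc['Bill'] may act on a copy and leave df unchanged)
df.loc['Bill'] = df.loc['Bill'].combine_first(df.loc['Jim'])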
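A runnable sketch of the row fill, for anyone who wants to try it. The original tables are not shown, so the columns city, age and team below are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the real columns are not shown in the question.
df = pd.DataFrame(
    {"city": ["Boston", np.nan], "age": [np.nan, 41.0], "team": ["red", "blue"]},
    index=["Jim", "Bill"],
)

# Fill only Bill's missing cells from Jim's row; Bill's existing values are kept.
df.loc["Bill"] = df.loc["Bill"].fillna(df.loc["Jim"])
print(df)
```

Jim's row is left untouched, so his own missing age stays NaN.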

How to merge two string columns in a dataframe that share some values and also contain NaN values

I have 2 dataframes which have 2 common column names, emp_id and emp_name. When I joined these two dataframes with on=emp_id, separate columns emp_name_x and emp_name_y were created, which contain NaN values as well, and there are some rows where emp_name_x = emp_name_y. I want to make them into one column. Can anyone help me?
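One possible approach, offered only as a sketch: take emp_name_x where it is present and fall back to emp_name_y otherwise, using Series.combine_first. The frames left and right and their contents are assumptions; only emp_id and emp_name come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; only the column names emp_id and emp_name are from the question.
left = pd.DataFrame({"emp_id": [1, 2, 3], "emp_name": ["Ann", np.nan, "Cy"]})
right = pd.DataFrame({"emp_id": [1, 2, 3], "emp_name": ["Ann", "Bob", np.nan]})

merged = left.merge(right, on="emp_id")  # produces emp_name_x and emp_name_y

# Prefer emp_name_x; use emp_name_y only where emp_name_x is NaN.
merged["emp_name"] = merged["emp_name_x"].combine_first(merged["emp_name_y"])
merged = merged.drop(columns=["emp_name_x", "emp_name_y"])
print(merged)
```

If the two columns can disagree on rows where both are present, this keeps the emp_name_x value, so check whether that is the behaviour you want.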

How to change NaN values to zero in a list of DataFrames in Python 3.7

I want to change the NaN values in a specific column in a list of DataFrames. I have applied the methods below, but I am unable to change the NaN values to zero. Is there any way to replace the values with zero?
data is the list of DataFrames and qobs is the specific column in each DataFrame.
for value in data:
    value['qobs'] = value['qobs'].replace(np.nan, 0)
for value in data:
    value['qobs'] = value['qobs'].fillna(0)
Because data is a list, apply the change to each DataFrame in it:
for df in data:
    df['qobs'] = df['qobs'].fillna(0)
print(data)
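If the loops in the question run without error but the values still look unchanged, one possible cause is that qobs holds the text "nan" rather than real missing values. This is only a guess, since the question does not show the data. In that case a sketch like the following, with made-up sample frames, converts the column to numbers first:

```python
import pandas as pd

# Hypothetical data: qobs was read as text, so "nan" is a string here.
data = [
    pd.DataFrame({"qobs": ["1.5", "nan", "3.0"]}),
    pd.DataFrame({"qobs": ["nan", "5.0"]}),
]

for frame in data:
    # Convert to numbers (entries that are not numbers become NaN), then fill with zero.
    frame["qobs"] = pd.to_numeric(frame["qobs"], errors="coerce").fillna(0)

print([frame["qobs"].tolist() for frame in data])
```

Checking frame["qobs"].dtype before and after is a quick way to see whether this applies to your data.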

Deleting rows in a dataset when they are missing data in a specific column using Python

I'm trying to identify which rows have a value of nan in a specific column (index 2), and either delete the rows that have nan or move the ones that don't have nan into their own dataframe. Any recommendations on how to go about either way?
I've tried to create a vector with all of the rows and specified column, but the data type object is giving me trouble. Also, I tried creating a list and adding all of the rows that != 'nan' in that specific column to the list.
patientsDD = patients.iloc[:,2].values
ddates = []
for value in patients[:,2]:
    if value != 'nan':
        ddates.append(value)
I'm expecting that it returns all of the rows that != 'nan' in index 2, but nothing is added to the list, and the error I am receiving is '(slice(None, None, None), 2)' is an invalid key.
I'm a newbie to all of this, so I really appreciate any help!
You can use .isna() of pandas:
patients[~patients.iloc[:, 2].isna()]
Instead of deleting the rows that are NaN, you can select only the rows that are not NaN.
You can try this (assuming df is the name of your data frame):
import numpy as np
df1 = df[np.isfinite(df['index 2'])]
This will give you a new data frame df1 with only the rows that have a finite value in the column index 2. Note that np.isfinite only works on numeric columns and raises a TypeError on object dtype. You can also try this:
import pandas as pd
df1 = df[pd.notnull(df['index 2'])]
If you want to drop all the rows that have NaN values in any of the columns, you can use this:
df1 = df.dropna()
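To drop rows based on one column only, dropna also accepts a subset argument. A minimal sketch, assuming a made-up patients table whose third column (position 2) holds the dates:

```python
import numpy as np
import pandas as pd

# Hypothetical patient table; the third column (position 2) has a missing date.
patients = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "discharge_date": ["2019-01-05", np.nan, "2019-02-11"],
})

target = patients.columns[2]                 # look the column up by position
kept = patients.dropna(subset=[target])      # rows that have a value in that column
missing = patients[patients[target].isna()]  # rows where that column is NaN
print(kept)
print(missing)
```

This gives both options from the question: kept is the dataframe without the NaN rows, and missing holds the rows that were left out.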

Pandas: Updating dataframe values for rows where one column has missing data

I have a dataframe called firstpart.
I'm trying to update the values in one column (Key), but only for rows in which another column (Zone) has no data.
I'm using this code, which doesn't work:
firstpart.ix[firstpart.Zone ==np.nan,"Key"] = "newvalue"
Neither does this:
firstpart.ix[firstpart.Zone =="","Key"] = "newvalue"
Using this syntax I'm able to update values in rows for which Zone has another value, but for some reason not if I try to select the rows in which it is blank.
What am I doing wrong?
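firstpart.loc[firstpart.Zone.isnull(), "Key"] = "newvalue"
You can't equate NaN to anything, not even to itself.
In [1]: np.nan == np.nan
Out[1]: False
You need special methods for that, and this is what .isnull() is for.
Note that .ix was removed in pandas 1.0; .loc does the same job here.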
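A small runnable sketch of the difference, using a made-up stand-in for firstpart and .loc for the assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for firstpart; Zone has one missing value.
firstpart = pd.DataFrame({"Zone": ["north", np.nan, "south"], "Key": ["a", "b", "c"]})

# An equality test against NaN selects no rows, because NaN never compares equal.
matches = (firstpart["Zone"] == np.nan).sum()
print(matches)

# isnull() finds the missing rows, so only their Key is updated.
firstpart.loc[firstpart["Zone"].isnull(), "Key"] = "newvalue"
print(firstpart)
```

Only the row with the missing Zone gets the new Key; the other rows keep theirs.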
