How to delete an undesirable row from pandas dataframe [duplicate] - python

This question already has an answer here:
Deleting DataFrame row in Pandas where column value in list
(1 answer)
Closed 3 years ago.
I have a pandas DataFrame, for example:
id column1 column2
1 aaa mmm
2 bbb nnn
3 ccc ooo
4 ddd ppp
5 eee qqq
I have a list that contains some values from column1:
[bbb],[ddd],[eee]
I need Python code to delete from the DataFrame every row whose column1 value appears in the list.
P.S.: my DataFrame contains 280,000 rows, so I need fast code.
Thanks

You can use isin and its negation (~):
df[~df.column1.isin(['bbb','ddd', 'eee'])]

Try this (values is your list of values to drop; avoid naming it list, which shadows the built-in):
df = df.loc[~df['column1'].isin(values), :]
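
For reference, here is a minimal self-contained sketch of the isin approach on the sample data from the question. The names values_to_drop, mask and result are illustrative, and the list is assumed to be a flat list of strings rather than the nested form shown in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "column1": ["aaa", "bbb", "ccc", "ddd", "eee"],
    "column2": ["mmm", "nnn", "ooo", "ppp", "qqq"],
})

# Assumption: the values to remove are available as a flat list of strings.
values_to_drop = ["bbb", "ddd", "eee"]

# Boolean mask: True where column1 is one of the unwanted values.
mask = df["column1"].isin(values_to_drop)

# Keep only the rows where the mask is False, then renumber the index.
result = df[~mask].reset_index(drop=True)
print(result)
```

Since isin is vectorized, this should be comfortable at a few hundred thousand rows, though it is worth timing on the real data.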

Related

Separate rows made of numbers into a new column [duplicate]

This question already has answers here:
How to extract every second row into seperate column in dataframe?
(5 answers)
Closed 1 year ago.
I have the following data in pandas from a .CSV file:
ABC
123
DEF
456
GHI
789
But I want to separate the row made of numbers to a new column like:
ABC 123
DEF 456
GHI 789
Any idea how I can do this in pandas?
thank you.
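You can reshape the column's values into two columns (this assumes the rows strictly alternate and the row count is even):
df = pd.DataFrame(df['YOUR_COLUMN'].values.reshape(-1, 2), columns=['Letters', 'Numbers'])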
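
A small runnable sketch of the reshape idea, using the data from the question. The column name raw is a placeholder for whatever the CSV column is actually called.

```python
import pandas as pd

# Assumption: the CSV was read into a single column, here called "raw".
df = pd.DataFrame({"raw": ["ABC", "123", "DEF", "456", "GHI", "789"]})

# Reshape the flat values into rows of two: (letters, numbers).
# This raises a ValueError if the number of rows is odd.
pairs = df["raw"].to_numpy().reshape(-1, 2)

out = pd.DataFrame(pairs, columns=["Letters", "Numbers"])
print(out)
```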

How to find the difference between two pandas DataFrames [duplicate]

This question already has answers here:
Find difference between two data frames
(19 answers)
Closed 1 year ago.
I have two DataFrames, both with a name column. I want to build a new DataFrame containing the rows that dataframeA has and dataframeB does not.
dataframeA
id name
1 aaa
2 bbbb
3 cccc
4 gggg
dataframeB
id name
1 ddd
2 aaa
3 gggg
new dataframe
id name
1 bbbb
2 cccc
If I understand correctly, you can merge the two dataframes (note that the default inner merge keeps only the names present in both):
import pandas as pd
merged_df = pd.merge(dataframe_a, dataframe_b, on='name')
You can use reduce from functools, or you can use isin, to create a new_df that only contains values in dfA that are also present in dfB.
Approach 1 using reduce:
from functools import reduce #import package
li = [dfA, dfB] #create list of dataframes
new_df = reduce(lambda left,right: pd.merge(left,right,on='name'), li) #reduce list
Approach 2 using isin:
new_df = dfA[dfA['name'].isin(dfB['name'])]
One way you could do this is to utilise python's set functionality.
This will convert the specified columns to sets and then create a new dataframe using the output.
dataframe = pd.DataFrame(data = {
'name': list(set(dataframeA['name'].tolist()) - set(dataframeB['name'].tolist()))
})
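
To get the rows of dataframeA whose name does not occur in dataframeB while keeping the other columns, one option is to negate isin, or to use a left merge with indicator=True. The sketch below uses the data from the question; the names only_in_a, merged and only_in_a_merge are illustrative.

```python
import pandas as pd

dataframeA = pd.DataFrame({"id": [1, 2, 3, 4],
                           "name": ["aaa", "bbbb", "cccc", "gggg"]})
dataframeB = pd.DataFrame({"id": [1, 2, 3],
                           "name": ["ddd", "aaa", "gggg"]})

# Option 1: keep rows of A whose name is NOT in B.
only_in_a = dataframeA[~dataframeA["name"].isin(dataframeB["name"])]

# Option 2: left merge with an indicator column, then keep "left_only" rows.
merged = dataframeA.merge(
    dataframeB[["name"]].drop_duplicates(),
    on="name",
    how="left",
    indicator=True,
)
only_in_a_merge = merged.loc[merged["_merge"] == "left_only", ["id", "name"]]

print(only_in_a)
print(only_in_a_merge)
```

Both options keep the original id values (2 and 3) from dataframeA; if the ids should restart at 1 as in the desired output, they would need to be reassigned afterwards.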

Return rows with some text in them and delete the rest [duplicate]

This question already has answers here:
Drop rows containing empty cells from a pandas DataFrame
(8 answers)
Closed 2 years ago.
I have a dataframe df with values as below:
Common_words count
0 realdonaldtrump 2932
2 new 2347
3 2030
4 trump 2013
5 good 1553
6 1440
7 great 200
I only need the rows that contain some text. For example, rows with a blank value, like rows 3 and 6, need to be removed.
Tried:
df = df.dropna(how='any',axis=0)
but I still get the same result. I suspect these are not null values but spaces, so I also tried the following:
df.Common_words = df.Common_words.str.replace(' ', '')
But I still get the same result: rows 3 and 6 are not removed. What should I do?
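You can try the following (assign the results back, since neither call modifies df in place; \s* also matches empty strings):
import numpy as np
df = df.replace(r'^\s*$', np.nan, regex=True)
df = df.dropna()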
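You can do this (the pattern is anchored so that only blank cells become NaN, and np.nan is used because np.NaN was removed in NumPy 2.0):
df.Common_words = df.Common_words.replace(r"^\s*$", np.nan, regex=True)
df = df.dropna()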
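
A runnable sketch of both ideas on data shaped like the question's. It assumes the blank cells hold either an empty string or only whitespace; the names blank, cleaned and cleaned_via_nan are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: "blank" cells are empty strings or whitespace, not real NaN.
df = pd.DataFrame({
    "Common_words": ["realdonaldtrump", "new", " ", "trump", "good", "", "great"],
    "count": [2932, 2347, 2030, 2013, 1553, 1440, 200],
})

# Option 1: build a boolean mask of blank cells and filter them out.
blank = df["Common_words"].str.strip().eq("")
cleaned = df[~blank]

# Option 2: turn blank cells into NaN first, then let dropna remove them.
cleaned_via_nan = df.replace(r"^\s*$", np.nan, regex=True).dropna()

print(cleaned)
```

The original dropna call had no effect because a string of spaces is a valid, non-null value as far as pandas is concerned.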

Merge Disjoint Columns in Pandas [duplicate]

This question already has answers here:
How to remove nan value while combining two column in Panda Data frame?
(5 answers)
Closed 4 years ago.
I have a pretty simple Pandas question that deals with merging two series. I have two series in a dataframe together that are similar to this:
Column1 Column2
0 Abc NaN
1 NaN Abc
2 Abc NaN
3 NaN Abc
4 NaN Abc
The answer will probably end up being a really simple .merge() or .concat() command, but I'm trying to get a result like this:
Column1
0 Abc
1 Abc
2 Abc
3 Abc
4 Abc
The idea is that for each row, there is a string of data in either Column1 or Column2, but never both. I spent about 10 minutes looking for answers on StackOverflow as well as Google, but I couldn't find a similar question that cleanly applied to what I was looking to do.
I realize that a lot of this question just stems from my ignorance on the three functions that Pandas has to stick series and dataframes together. Any help is very much appreciated. Thank you!
You can just use pd.Series.fillna:
df['Column1'] = df['Column1'].fillna(df['Column2'])
Merge or concat are not appropriate here; they are used primarily for combining dataframes or series based on labels.
Use groupby with first (groupby(..., axis=1) is deprecated in recent pandas, so this version transposes instead):
df.T.groupby(df.columns.str[:-1]).first().T
Out[294]:
Column
0 Abc
1 Abc
2 Abc
3 Abc
4 Abc
Or:
ndf = pd.DataFrame({'Column1':df.fillna('').sum(1)})
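
As a self-contained illustration of filling one column from the other, here is a sketch using combine_first, which behaves like the fillna answer for this data. The names merged and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question: each row has a value in exactly one column.
df = pd.DataFrame({
    "Column1": ["Abc", np.nan, "Abc", np.nan, np.nan],
    "Column2": [np.nan, "Abc", np.nan, "Abc", "Abc"],
})

# Take Column1 where it is present, otherwise fall back to Column2.
merged = df["Column1"].combine_first(df["Column2"])

# Wrap the result in a one-column DataFrame, as in the desired output.
result = merged.to_frame("Column1")
print(result)
```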

How to use map on the index of a pandas DataFrame [duplicate]

This question already has answers here:
Map dataframe index using dictionary
(6 answers)
Closed 1 year ago.
I want to create a new column on a pandas DataFrame using the values in the index and a dictionary that translates these values into something more meaningful. My initial idea was to use map. I arrived at a solution, but it is very convoluted and there must be a more elegant way to do it. Suggestions?
#dataframe and dict definition
df=pd.DataFrame({'foo':[1,2,3],'boo':[3,4,5]},index=['a','b','c'])
d={'a':'aa','b':'bb','c':'cc'}
df['new column']=df.reset_index().set_index('index',drop=False)['index'].map(d)
Creating a new series explicitly is a bit shorter:
df['new column'] = pd.Series(df.index, index=df.index).map(d)
After to_series, you can use map or replace:
df.index=df.index.to_series().map(d)
df
Out[806]:
boo foo
aa 3 1
bb 4 2
cc 5 3
Or, another way:
df['New']=pd.Series(d).get(df.index)
df
Out[818]:
boo foo New
a 3 1 aa
b 4 2 bb
c 5 3 cc
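
Another option worth trying is calling map directly on the index, since Index.map accepts a dictionary. The sketch below reuses the df and d from the question; index labels that are missing from the dictionary would come out as NaN.

```python
import pandas as pd

# DataFrame and dictionary from the question.
df = pd.DataFrame({"foo": [1, 2, 3], "boo": [3, 4, 5]}, index=["a", "b", "c"])
d = {"a": "aa", "b": "bb", "c": "cc"}

# Index.map looks up each index label in the dictionary.
# The resulting Index is assigned to the new column by position.
df["new column"] = df.index.map(d)
print(df)
```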
