Dropping filtered rows in pandas - python

I need to filter phone numbers and drop the filtered rows. I couldn't combine the conditions into one filter, so I made two. Dropping by the first filter works fine, but dropping by the second one raises a warning, even though I still end up with a properly filtered data frame.
What do I need to change to avoid the warning?
import pandas as pd
df = pd.DataFrame({"Phone" :['+77013655566','87014324366','7014324366','11111','999999','43434343','+77015452313','7012334212','87010956612', '7777777', '8888888']})
print(df)
Phone
0 +77013655566
1 87014324366
2 7014324366
3 11111
4 999999
5 43434343
6 +77015452313
7 7012334212
8 87010956612
9 7777777
10 8888888
phone_filter = ((df['Phone'].map(str) == '8888888') |
(df['Phone'].map(str) == '7777777'))
phone_filter2 = ((df['Phone'].map(str).str[0] != '8') &
(df['Phone'].map(str).str[0] != '7') &
(df['Phone'].map(str).str[0] != '+'))
df.drop(df[phone_filter].index, inplace = True)
df.drop(df[phone_filter2].index, inplace = True)
<ipython-input-83-80183cb110d3>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Expected output:
print(df)
Phone
0 +77013655566
1 87014324366
2 7014324366
6 +77015452313
7 7012334212
8 87010956612

Use boolean indexing with isin instead of two drops:
invalid_numbers = ['8888888', '7777777']
df[(~df.Phone.isin(invalid_numbers)) & (df.Phone.str[0].isin(['8','7','+']))]
Output:
Phone
0 +77013655566
1 87014324366
2 7014324366
6 +77015452313
7 7012334212
8 87010956612
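For the record, the warning in the original code comes from the second drop: the first drop(..., inplace=True) shrinks the frame, so df[phone_filter2] then indexes a 9-row frame with a boolean Series still aligned to the old 11-row index, and pandas has to reindex the key. If you prefer to keep the two filters, combining the masks before a single drop also avoids it; a minimal sketch:
# one combined mask, built while df still has its original index
to_drop = phone_filter | phone_filter2
df.drop(df[to_drop].index, inplace=True)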

Related

changing index of 1 row in pandas

I have the below df, built from a pivot of a larger df. In this table 'week' is the index (dtype = object) and I need to show week 53 as the first row instead of the last.
Can someone advise, please? I tried reindex and custom sorting but can't find a way.
Thanks!
Here is the table:
Since you can't insert a row and push the others back directly, a clever trick is to create a new ordering column:
# add a new column "new" with the original order
df['new'] = range(1, len(df) + 1)
# set the row whose index is 53 to 0 in the new column;
# note that the comparison must match the index dtype,
# so if the weeks are object (strings), compare df.index == '53'
df.loc[df.index == 53, 'new'] = 0
# sort by the new column, then drop it
df = df.sort_values("new").drop('new', axis=1)
Before:
numbers
weeks
1 181519.23
2 18507.58
3 11342.63
4 6064.06
53 4597.90
After:
numbers
weeks
53 4597.90
1 181519.23
2 18507.58
3 11342.63
4 6064.06
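On pandas 1.1+ the helper column can be skipped by sorting the index with a key; a minimal sketch, assuming the object-dtype week labels from the question:
# stable sort: week '53' gets key False (sorts first), every other week True,
# and the stable kind keeps the remaining rows in their existing order
df = df.sort_index(key=lambda idx: idx != '53', kind='stable')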
One way of doing this would be:
import pandas as pd
df = pd.DataFrame(range(10))
# move the last row to the front, keeping the original labels
new_df = df.loc[[df.index[-1]] + list(df.index[:-1])]
output:
0
9 9
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Alternate method, selecting the target row first and concatenating the rest after it:
new_df = pd.concat([df[df.index == 53], df[df.index != 53]])
# compare against '53' instead if the index holds strings
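If the label order is known, the reindex the asker tried also works once the labels match the index dtype; a minimal sketch:
# put '53' first, then the remaining week labels in their current order
order = ['53'] + [w for w in df.index if w != '53']
df = df.reindex(order)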

python pandas new column with order of values

I would like to make a new column with the order of the numbers in a list. I get 3,1,0,4,2,5 (the indices of the lowest numbers), but I would like a new column with 2,1,4,0,3,5 (so if I look at a row, I can see what position its number takes in the whole list). What am I doing wrong?
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df.sort_values(by='list').index
print(df)
What you're looking for is the rank:
import pandas as pd
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df['list'].rank().sub(1).astype(int)
Result:
list order
0 4 2
1 3 1
2 6 4
3 1 0
4 5 3
5 9 5
You can use the method parameter to control how to resolve ties.
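Note that the .astype(int) above assumes there are no ties, since tied values get fractional ranks by default. A small illustrative example of how ties resolve under the default method='average' versus method='first':
import pandas as pd

s = pd.Series([4, 3, 3, 9])
print(s.rank().tolist())                # [3.0, 1.5, 1.5, 4.0] -- tied values share their mean rank
print(s.rank(method='first').tolist())  # [3.0, 1.0, 2.0, 4.0] -- ties broken by order of appearance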

how to drop rows based on another column only if it has multiple different values

What I have?
I have a dataframe like this:
id value
0 0 5
1 0 5
2 0 6
3 1 7
4 1 7
What I want to get?
I want to drop all the rows whose id has more than one distinct value. In the example above, I want to drop all the rows with id = 0:
id value
3 1 7
4 1 7
What I have tried?
import pandas as pd
df = pd.DataFrame({'id':[0, 0, 0, 1, 1], 'value':[5,5,6,7,7]})
print(df)
id_list = df['id'].tolist()
id_set = set(id_list)
for id in id_set:
    temp_list = df.loc[df['id'] == id, 'value'].tolist()
    s = set(temp_list)
    if len(s) > 1:
        df = df.loc[df['id'] != id]
It works, but it's ugly and inefficient.
Is there a better Pythonic way using pandas methods?
Use GroupBy.transform with DataFrameGroupBy.nunique to get the number of unique values per group as a Series aligned with the original index, then compare and filter with boolean indexing:
df = df[df.groupby('id')['value'].transform('nunique').eq(1)]
print (df)
id value
3 1 7
4 1 7
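If readability matters more than speed, GroupBy.filter expresses the same rule directly; a minimal sketch of the equivalent:
# keep only the groups whose 'value' column holds a single unique value
df = df.groupby('id').filter(lambda g: g['value'].nunique() == 1)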
# Try this code #
import pandas as pd
id1 = pd.Series([0,0,0,1,1])
value = pd.Series([5,5,6,7,7])
data = pd.DataFrame({'id':id1,'value':value})
datag = data.groupby('id')
# collect the indices of rows whose id maps to more than one distinct value
datadel = []
for i in set(data.id):
    if len(set(datag.get_group(i)['value'])) != 1:
        datadel.extend(data.loc[data["id"] == i].index.tolist())
data.drop(datadel, inplace=True)
print(data)

Pandas countifs value from column and other column not null

I am trying to perform the equivalent of an Excel COUNTIFS formula in pandas, where the first range is a dataframe column and the search criterion is each value in that column. The second search range is a different column, and the criterion is non-null values in that column.
Written as an Excel formula, it would look like: COUNTIFS(A:A,A2,B:B,"<>")
Here is some sample data:
data = {'ADJL':['BCF-364/BTS-1091/ADJL-4', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-581/BTS-1742/ADJL-1', 'BCF-581/BTS-1742/ADJL-1'],
'LNCEL':['LNBTS-55/LNCEL-63', '', 'LNBTS-801/LNCEL-62', '', 'LNBTS-801/LNCEL-61', '', '', '']}
df = pd.DataFrame(data)
I need to add two columns to this. The first is a count of each "ADJL" value. I found this solution for that column:
df['Count_of_ADJL'] = df.groupby('ADJL')['ADJL'].transform('count')
What I am stuck on is the next column: I need to calculate how many times the value in ADJL occurs throughout the entire ADJL column AND the LNCEL column is not empty.
I removed many other columns to simplify my question, so a solution where I can just add another column is ideal.
Many thanks in advance.
Use groupby.transform with np.count_nonzero, which treats empty strings as zero:
import numpy as np

df['Count_of_ADJL'] = df.groupby('ADJL')['ADJL'].transform('count')
df['Count_of_ADJL & LNCEL not null'] = df.groupby('ADJL')['LNCEL'].transform(np.count_nonzero)
# or, after converting the empty strings to real missing values,
# transform('count') also works, since 'count' skips NaN:
df['Count_of_ADJL & LNCEL not null'] = df['LNCEL'].replace('', np.nan).groupby(df['ADJL']).transform('count')
print(df)
ADJL LNCEL Count_of_ADJL \
0 BCF-364/BTS-1091/ADJL-4 LNBTS-55/LNCEL-63 1
1 BCF-130/BTS-389/ADJL-1 5
2 BCF-130/BTS-389/ADJL-1 LNBTS-801/LNCEL-62 5
3 BCF-130/BTS-389/ADJL-1 5
4 BCF-130/BTS-389/ADJL-1 LNBTS-801/LNCEL-61 5
5 BCF-130/BTS-389/ADJL-1 5
6 BCF-581/BTS-1742/ADJL-1 2
7 BCF-581/BTS-1742/ADJL-1 2
Count_of_ADJL & LNCEL not null
0 1
1 2
2 2
3 2
4 2
5 2
6 0
7 0
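Mirroring the "<>" (non-blank) criterion of COUNTIFS without numpy, the non-empty test can also be written as a boolean sum; a minimal sketch:
# count, per ADJL group, how many LNCEL entries are non-empty strings
df['Count_of_ADJL & LNCEL not null'] = df['LNCEL'].ne('').groupby(df['ADJL']).transform('sum')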

Split Dataframe from back to front

Does somebody know how to make a split from back to front? When I make a split like
dfgeo['geo'].str.split(',',expand=True)
I get:
1,2,3,4,nan,nan,nan
but I want:
nan,nan,nan,4,3,2,1
Thanks, people :)
If you're looking to reverse the column order, you can do this:
new_df = dfgeo['geo'].str.split(',', expand=True)
new_df[new_df.columns[::-1]]
Try this:
dfgeo['geo'].str.split(',').map(lambda parts: parts[::-1])
This assumes a split without expand=True, which returns a list per row, and reverses each list!
Use iloc with ::-1 to swap the order of the columns:
import numpy as np
import pandas as pd

dfgeo = pd.DataFrame({'geo': ['1,2,3,4', '1,2,3,4,5,6,7']})
print (dfgeo)
geo
0 1,2,3,4
1 1,2,3,4,5,6,7
df = dfgeo['geo'].str.split(',',expand=True).iloc[:, ::-1]
# if necessary, set default column names
df.columns = np.arange(len(df.columns))
print (df)
0 1 2 3 4 5 6
0 None None None 4 3 2 1
1 7 6 5 4 3 2 1
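The same layout can also be built manually, left-padding each reversed split list before framing it; a minimal sketch:
import pandas as pd

dfgeo = pd.DataFrame({'geo': ['1,2,3,4', '1,2,3,4,5,6,7']})
parts = dfgeo['geo'].str.split(',')   # one Python list per row
width = parts.map(len).max()          # the widest row fixes the column count
# left-pad each reversed list with None up to the full width
df = pd.DataFrame([[None] * (width - len(p)) + p[::-1] for p in parts])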
