Dropping filtered rows in pandas - python

I need to filter phone numbers and drop the filtered rows. I couldn't combine the conditions into one filter, so I made two. Dropping by the first filter works fine, but dropping by the second one raises a warning, even though I still end up with a properly filtered data frame.
What do I need to change to avoid the warning?
import pandas as pd
df = pd.DataFrame({"Phone" :['+77013655566','87014324366','7014324366','11111','999999','43434343','+77015452313','7012334212','87010956612', '7777777', '8888888']})
print(df)
Phone
0 +77013655566
1 87014324366
2 7014324366
3 11111
4 999999
5 43434343
6 +77015452313
7 7012334212
8 87010956612
9 7777777
10 8888888
phone_filter = ((df['Phone'].map(str) == '8888888') |
(df['Phone'].map(str) == '7777777'))
phone_filter2 = ((df['Phone'].map(str).str[0] != '8') &
(df['Phone'].map(str).str[0] != '7') &
(df['Phone'].map(str).str[0] != '+'))
df.drop(df[phone_filter].index, inplace = True)
df.drop(df[phone_filter2].index, inplace = True)
<ipython-input-83-80183cb110d3>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Expected output:
print(df)
Phone
0 +77013655566
1 87014324366
2 7014324366
6 +77015452313
7 7012334212
8 87010956612

Use boolean indexing with isin instead of two drops:
invalid_numbers = ['8888888', '7777777']
df[(~df.Phone.isin(invalid_numbers)) & (df.Phone.str[0].isin(['8','7','+']))]
Output:
Phone
0 +77013655566
1 87014324366
2 7014324366
6 +77015452313
7 7012334212
8 87010956612
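For the record, the warning in the original code comes from the second drop: the first drop(..., inplace=True) shrinks the frame, so df[phone_filter2] then indexes a 9-row frame with a boolean Series still aligned to the old 11-row index, and pandas has to reindex the key. If you prefer to keep the two filters, combining the masks before a single drop also avoids it; a minimal sketch:
# one combined mask, built while df still has its original index
to_drop = phone_filter | phone_filter2
df.drop(df[to_drop].index, inplace=True)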

Related

changing index of 1 row in pandas

I have the below df, built from a pivot of a larger df. In this table 'week' is the index (dtype = object) and I need to show week 53 as the first row instead of the last.
Can someone advise, please? I tried reindex and custom sorting but can't find a way.
Thanks!
Here is the table:
Since you can't insert a row and push the others back directly, a clever trick is to create a new ordering column:
# add a new column "new" with the original order
df['new'] = range(1, len(df) + 1)
# set the row whose index is 53 to 0 in the new column;
# note that the comparison must match the index dtype,
# so if the weeks are object (strings), compare df.index == '53'
df.loc[df.index == 53, 'new'] = 0
# sort by the new column, then drop it
df = df.sort_values("new").drop('new', axis=1)
Before:
numbers
weeks
1 181519.23
2 18507.58
3 11342.63
4 6064.06
53 4597.90
After:
numbers
weeks
53 4597.90
1 181519.23
2 18507.58
3 11342.63
4 6064.06
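On pandas 1.1+ the helper column can be skipped by sorting the index with a key; a minimal sketch, assuming the object-dtype week labels from the question:
# stable sort: week '53' gets key False (sorts first), every other week True,
# and the stable kind keeps the remaining rows in their existing order
df = df.sort_index(key=lambda idx: idx != '53', kind='stable')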
One way of doing this would be:
import pandas as pd
df = pd.DataFrame(range(10))
# move the last row to the front, keeping the original labels
new_df = df.loc[[df.index[-1]] + list(df.index[:-1])]
output:
0
9 9
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Alternate method, selecting the target row first and concatenating the rest after it:
new_df = pd.concat([df[df.index == 53], df[df.index != 53]])
# compare against '53' instead if the index holds strings
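If the label order is known, the reindex the asker tried also works once the labels match the index dtype; a minimal sketch:
# put '53' first, then the remaining week labels in their current order
order = ['53'] + [w for w in df.index if w != '53']
df = df.reindex(order)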

python pandas new column with order of values

I would like to make a new column with the order of the numbers in a list. I get 3,1,0,4,2,5 (the indices of the lowest numbers), but I would like a new column with 2,1,4,0,3,5 (so if I look at a row, I can see what position its number takes in the whole list). What am I doing wrong?
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df.sort_values(by='list').index
print(df)
What you're looking for is the rank:
import pandas as pd
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df['list'].rank().sub(1).astype(int)
Result:
list order
0 4 2
1 3 1
2 6 4
3 1 0
4 5 3
5 9 5
You can use the method parameter to control how to resolve ties.
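Note that the .astype(int) above assumes there are no ties, since tied values get fractional ranks by default. A small illustrative example of how ties resolve under the default method='average' versus method='first':
import pandas as pd

s = pd.Series([4, 3, 3, 9])
print(s.rank().tolist())                # [3.0, 1.5, 1.5, 4.0] -- tied values share their mean rank
print(s.rank(method='first').tolist())  # [3.0, 1.0, 2.0, 4.0] -- ties broken by order of appearance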

how to drop rows based on another column only if it has multiple different values

What I have?
I have a dataframe like this:
id value
0 0 5
1 0 5
2 0 6
3 1 7
4 1 7
What I want to get?
I want to drop all the rows whose id has more than one distinct value. In the example above, I want to drop all the rows with id = 0:
id value
3 1 7
4 1 7
What I have tried?
import pandas as pd
df = pd.DataFrame({'id':[0, 0, 0, 1, 1], 'value':[5,5,6,7,7]})
print(df)
id_list = df['id'].tolist()
id_set = set(id_list)
for id in id_set:
    temp_list = df.loc[df['id'] == id, 'value'].tolist()
    s = set(temp_list)
    if len(s) > 1:
        df = df.loc[df['id'] != id]
It works, but it's ugly and inefficient.
Is there a better Pythonic way using pandas methods?
Use GroupBy.transform with DataFrameGroupBy.nunique to get the number of unique values per group as a Series aligned with the original index, then compare and filter with boolean indexing:
df = df[df.groupby('id')['value'].transform('nunique').eq(1)]
print (df)
id value
3 1 7
4 1 7
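If readability matters more than speed, GroupBy.filter expresses the same rule directly; a minimal sketch of the equivalent:
# keep only the groups whose 'value' column holds a single unique value
df = df.groupby('id').filter(lambda g: g['value'].nunique() == 1)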
# Try this code #
import pandas as pd
id1 = pd.Series([0,0,0,1,1])
value = pd.Series([5,5,6,7,7])
data = pd.DataFrame({'id':id1,'value':value})
datag = data.groupby('id')
# collect the indices of rows whose id maps to more than one distinct value
datadel = []
for i in set(data.id):
    if len(set(datag.get_group(i)['value'])) != 1:
        datadel.extend(data.loc[data["id"] == i].index.tolist())
data.drop(datadel, inplace=True)
print(data)

Pandas countifs value from column and other column not null

I am trying to perform the equivalent of an Excel COUNTIFS formula in pandas, where the first range is a dataframe column and the search criterion is each value in that column. The second search range is a different column, and the criterion is non-null values in that column.
Written as an Excel formula, it would look like: COUNTIFS(A:A,A2,B:B,"<>")
Here is some sample data:
data = {'ADJL':['BCF-364/BTS-1091/ADJL-4', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-130/BTS-389/ADJL-1', 'BCF-581/BTS-1742/ADJL-1', 'BCF-581/BTS-1742/ADJL-1'],
'LNCEL':['LNBTS-55/LNCEL-63', '', 'LNBTS-801/LNCEL-62', '', 'LNBTS-801/LNCEL-61', '', '', '']}
df = pd.DataFrame(data)
I need to add two columns to this. The first is a count of each "ADJL" value. I found this solution for that column:
df['Count_of_ADJL'] = df.groupby('ADJL')['ADJL'].transform('count')
What I am stuck on is the next column: I need to calculate how many times the value in ADJL occurs throughout the entire ADJL column AND the LNCEL column is not empty.
I removed many other columns to simplify my question, so a solution where I can just add another column is ideal.
Many thanks in advance.
Use groupby.transform with np.count_nonzero, which treats empty strings as zero:
import numpy as np

df['Count_of_ADJL'] = df.groupby('ADJL')['ADJL'].transform('count')
df['Count_of_ADJL & LNCEL not null'] = df.groupby('ADJL')['LNCEL'].transform(np.count_nonzero)
# or, after converting the empty strings to real missing values,
# transform('count') also works, since 'count' skips NaN:
df['Count_of_ADJL & LNCEL not null'] = df['LNCEL'].replace('', np.nan).groupby(df['ADJL']).transform('count')
print(df)
ADJL LNCEL Count_of_ADJL \
0 BCF-364/BTS-1091/ADJL-4 LNBTS-55/LNCEL-63 1
1 BCF-130/BTS-389/ADJL-1 5
2 BCF-130/BTS-389/ADJL-1 LNBTS-801/LNCEL-62 5
3 BCF-130/BTS-389/ADJL-1 5
4 BCF-130/BTS-389/ADJL-1 LNBTS-801/LNCEL-61 5
5 BCF-130/BTS-389/ADJL-1 5
6 BCF-581/BTS-1742/ADJL-1 2
7 BCF-581/BTS-1742/ADJL-1 2
Count_of_ADJL & LNCEL not null
0 1
1 2
2 2
3 2
4 2
5 2
6 0
7 0
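Mirroring the "<>" (non-blank) criterion of COUNTIFS without numpy, the non-empty test can also be written as a boolean sum; a minimal sketch:
# count, per ADJL group, how many LNCEL entries are non-empty strings
df['Count_of_ADJL & LNCEL not null'] = df['LNCEL'].ne('').groupby(df['ADJL']).transform('sum')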

Split Dataframe from back to front

Does somebody know how to make a split from back to front? When I make a split like
dfgeo['geo'].str.split(',',expand=True)
I get:
1,2,3,4,nan,nan,nan
but I want:
nan,nan,nan,4,3,2,1
Thanks, people :)
If you're looking to reverse the column order, you can do this:
new_df = dfgeo['geo'].str.split(',', expand=True)
new_df[new_df.columns[::-1]]
Try this:
dfgeo['geo'].str.split(',').map(lambda parts: parts[::-1])
This assumes a split without expand=True, which returns a list per row, and reverses each list!
Use iloc with ::-1 to swap the order of the columns:
import numpy as np
import pandas as pd

dfgeo = pd.DataFrame({'geo': ['1,2,3,4', '1,2,3,4,5,6,7']})
print (dfgeo)
geo
0 1,2,3,4
1 1,2,3,4,5,6,7
df = dfgeo['geo'].str.split(',',expand=True).iloc[:, ::-1]
# if necessary, set default column names
df.columns = np.arange(len(df.columns))
print (df)
0 1 2 3 4 5 6
0 None None None 4 3 2 1
1 7 6 5 4 3 2 1
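The same layout can also be built manually, left-padding each reversed split list before framing it; a minimal sketch:
import pandas as pd

dfgeo = pd.DataFrame({'geo': ['1,2,3,4', '1,2,3,4,5,6,7']})
parts = dfgeo['geo'].str.split(',')   # one Python list per row
width = parts.map(len).max()          # the widest row fixes the column count
# left-pad each reversed list with None up to the full width
df = pd.DataFrame([[None] * (width - len(p)) + p[::-1] for p in parts])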
