Filter Pandas Dataframe Under Multiple Conditions

My current progress
I currently have a pandas DataFrame with 5 rows:
import pandas as pd
df = pd.DataFrame({
    'Name': ['John', 'Mark', 'Kevin', 'Ron', 'Amira'],
    'ID': [110, 111, 112, 113, 114],
    'Job title': ['xox', 'xoy', 'xoz', 'yow', 'uyt'],
    'Manager': ['River', 'Trevor', 'John', 'Lydia', 'Connor'],
    'M2': ['Shaun', 'Mary', 'Ronald', 'Cary', 'Miranda'],
    'M3': ['Clavis', 'Sharon', 'Randall', 'Mark', 'Doug'],
    'M4': ['Pat', 'Karen', 'Brad', 'Chad', 'Anita'],
    'M5': ['Ty', 'Jared', 'Bill', 'William', 'Bob'],
    'Location': ['US', 'US', 'JP', 'CN', 'JA'],
})
# Named lst rather than list so the built-in type is not shadowed
lst = ['River', 'Pat', 'Brad', 'William', 'Clogah']
I need to drop every row of the DataFrame that contains no values from my list, and also every row that contains more than one value from my list. In the example above, rows 1, 2 and 5 would be dropped:
Row 1, i.e. (1: 'John', 110, 'xox', 'River', 'Shaun', 'Clavis', 'Pat', 'Ty', 'US'), would be dropped because both 'River' and 'Pat' are in the list.
Row 2, i.e. (2: 'Mark', 111, 'xoy', 'Trevor', 'Mary', 'Sharon', 'Karen', 'Jared', 'US'), would be dropped because the row does not contain any values from the list as given above.
Row 5, i.e. (5: 'Amira', 114, 'uyt', 'Connor', 'Miranda', 'Doug', 'Anita', 'Bob', 'JA'), would be dropped because the row does not contain any values from my list.
The two other rows would be kept.
Original Printed DF
0: 'Name', 'ID', 'Job title', 'Manager', 'M2', 'M3', 'M4', 'M5', 'Location'
1: 'John', 110, 'xox', 'River', 'Shaun', 'Clavis', 'Pat', 'Ty', 'US'
2: 'Mark', 111, 'xoy', 'Trevor', 'Mary', 'Sharon', 'Karen', 'Jared', 'US'
3: 'Kevin', 112, 'xoz', 'John', 'Ronald', 'Randall', 'Brad', 'Bill', 'JP'
4: 'Ron', 113, 'yow', 'Lydia', 'Cary', 'Mark', 'Chad', 'William', 'CN'
5: 'Amira', 114, 'uyt', 'Connor', 'Miranda', 'Doug', 'Anita', 'Bob', 'JA'
Filtered Printed DF
3: 'Kevin', 112, 'xoz', 'John', 'Ronald', 'Randall', 'Brad', 'Bill', 'JP'
4: 'Ron', 113, 'yow', 'Lydia', 'Cary', 'Mark', 'Chad', 'William', 'CN'
My current process only filters out rows that don't contain any value from my managers list. I want to keep rows with exactly one manager from the list, but not rows without managers from the list.

Not the prettiest way to achieve this, but it works:
import pandas as pd
d = {
    "Name": ["John", "Mark", "Kevin", "Ron", "Amira"],
    "ID": [110, 111, 112, 113, 114],
    "Job title": ["xox", "xoy", "xoz", "yow", "uyt"],
    "M1": ["River", "Trevor", "John", "Lydia", "Connor"],
    "M2": ["Shaun", "Mary", "Ronald", "Cary", "Miranda"],
    "M3": ["Clavis", "Sharon", "Randall", "Mark", "Doug"],
    "M4": ["Pat", "Karen", "Brad", "Chad", "Anita"],
    "M5": ["Ty", "Jared", "Bill", "William", "Bob"],
    "Location": ["US", "US", "JP", "CN", "JA"],
}
df = pd.DataFrame(d)
# Isolate the manager columns in their own DataFrame
managers = ["River", "Pat", "Trevor", "Jared", "Connor"]
df_managers = df[["M1", "M2", "M3", "M4", "M5"]]
# Flag each employee whose row contains fewer than two of the listed managers
less_than_two_managers = []
for i in range(df_managers.shape[0]):
    if len(set(df_managers.iloc[i]).intersection(set(managers))) < 2:
        less_than_two_managers.append(True)
    else:
        less_than_two_managers.append(False)
df["LT two managers"] = less_than_two_managers
# Keep only the flagged rows (test for == 1 above to also drop rows with no match)
df[df["LT two managers"]]

Here you go:
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'Mark', 'Kevin', 'Ron', 'Amira'],
                   'ID': [110, 111, 112, 113, 114],
                   'Job title': ['xox', 'xoy', 'xoz', 'yow', 'uyt'],
                   'Manager': ['River', 'Trevor', 'John', 'Lydia', 'Connor'],
                   'M2': ['Shaun', 'Mary', 'Ronald', 'Cary', 'Miranda'],
                   'M3': ['Clavis', 'Sharon', 'Randall', 'Mark', 'Doug'],
                   'M4': ['Pat', 'Karen', 'Brad', 'Chad', 'Anita'],
                   'M5': ['Ty', 'Jared', 'Bill', 'William', 'Bob'],
                   'Location': ['US', 'US', 'JP', 'CN', 'JA']}
                  )
managers = ['River', 'Pat', 'Trevor', 'Jared', 'Connor']
# True wherever a cell holds one of the managers (isin replaces the deprecated applymap)
mask = df.isin(managers)
# Keep rows with fewer than two matches
filtered_df = df[mask.values.sum(axis=1) < 2]
print(filtered_df)
To also filter out the rows with 0 matches (so only rows with exactly 1 manager stay):
filtered_df = df[mask.values.sum(axis=1) == 1]

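Vectorized solution using a boolean mask:
lst = ['River', 'Pat', 'Brad', 'William', 'Clogah']  # the list from the question
# Select the columns whose name starts with "M", count the matches per row, keep rows with exactly one
m = (df.filter(regex=r'^M')
       .isin(lst)
       .sum(axis=1).eq(1)
    )
out = df.loc[m]
Output:
    Name   ID  Job title  Manager      M2       M3    M4       M5  Location
2  Kevin  112        xoz     John  Ronald  Randall  Brad     Bill        JP
3    Ron  113        yow    Lydia    Cary     Mark  Chad  William        CN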
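The mask approach can be checked end to end with a minimal, self-contained sketch. It assumes the list from the question is named lst and, instead of a regex, names the manager columns explicitly (manager_cols is a name introduced here for illustration), so that a non-manager column starting with "M" could never be counted by accident:

```python
import pandas as pd

# Data from the question, reduced to the columns needed for the check
df = pd.DataFrame({
    "Name": ["John", "Mark", "Kevin", "Ron", "Amira"],
    "Manager": ["River", "Trevor", "John", "Lydia", "Connor"],
    "M2": ["Shaun", "Mary", "Ronald", "Cary", "Miranda"],
    "M3": ["Clavis", "Sharon", "Randall", "Mark", "Doug"],
    "M4": ["Pat", "Karen", "Brad", "Chad", "Anita"],
    "M5": ["Ty", "Jared", "Bill", "William", "Bob"],
})
lst = ["River", "Pat", "Brad", "William", "Clogah"]

# Hypothetical helper name: the columns that hold manager names
manager_cols = ["Manager", "M2", "M3", "M4", "M5"]

# Count, per row, how many manager cells appear in lst
match_count = df[manager_cols].isin(lst).sum(axis=1)

# Keep only the rows with exactly one match
out = df[match_count == 1]
print(out["Name"].tolist())
```

With this data the per-row counts are 2, 0, 1, 1, 0, so only Kevin and Ron remain.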

Related

Combining three different list collection of dictionary having same value in key name “firstname” and “lastname” in python

I have three lists of dictionaries, shown below. All three contain the same first and last names (stored under "firstName" or "First Name", and "lastName"). I need to combine them into a single list without duplicating the first and last name, i.e. one dictionary per first name and last name that merges the matching entries from all three lists:
list one
[{'First Name': 'Justin',
'lastName': 'Walker',
'Age (Years)': '29',
'Sex': 'Male',
'Vehicle Make': 'Toyota',
'Vehicle Model': 'Continental',
'Vehicle Year': '2012',
'Vehicle Type': 'Sedan'},
{'First Name': 'Maria',
'lastName': 'Jones',
'Age (Years)': '66',
'Sex': 'Female',
'Vehicle Make': 'Mitsubishi',
'Vehicle Model': 'Yukon XL 2500',
'Vehicle Year': '2014',
'Vehicle Type': 'Van/Minivan'},
{'First Name': 'Samantha',
'lastName': 'Norman',
'Age (Years)': '19',
'Sex': 'Female',
'Vehicle Make': 'Aston Martin',
'Vehicle Model': 'Silverado 3500 HD Regular Cab',
'Vehicle Year': '1995',
'Vehicle Type': 'SUV'}]
list two
[{'firstName': 'Justin',
'lastName': 'Walker',
'age': 71,
'iban': 'GB43YKET96816855547287',
'credit_card_number': '2221597849919620',
'credit_card_security_code': '646',
'credit_card_start_date': '03/18',
'credit_card_end_date': '06/26',
'address_main': '462 Marilyn radial',
'address_city': 'Lynneton',
'address_postcode': 'W4 0GW'},
{'firstName': 'Maria',
'lastName': 'Jones',
'age': 91,
'iban': 'GB53QKRK45175204753504',
'credit_card_number': '4050437758955103343',
'credit_card_security_code': '827',
'credit_card_start_date': '11/21',
'credit_card_end_date': '01/27',
'address_main': '366 Brenda radial',
'address_city': 'Ritafurt',
'address_postcode': 'NE85 1RG'}]
list three
[{'firstName': 'Justin',
'lastName': 'Walker',
'age': '64',
'sex': 'Male',
'retired': 'False',
'dependants': '2',
'marital_status': 'single',
'salary': '56185',
'pension': '0',
'company': 'Hudson PLC',
'commute_distance': '14.1',
'address_postcode': 'G2J 0FH'},
{'firstName': 'Maria',
'lastName': 'Jones',
'age': '69',
'sex': 'Female',
'retired': 'False',
'dependants': '1',
'marital_status': 'divorced',
'salary': '36872',
'pension': '0',
'company': 'Wall, Reed and Whitehouse',
'commute_distance': '10.47',
'address_postcode': 'TD95 7FL'}]
This is what I have tried so far, but it does not work:
for i in range(0,2):
    dict1 = list_one[i]
    dict2 = list_two[i]
    dict3 = list_three[i]
    combine_file = list_three.copy()
    for k, v in dict1.items():
        if k == "firstname" or "lastname":
            for k1, v1 in combine_file.items():
                if dict1.get(k) == combine_file.v1:
This is what I'm expecting:
print(combine_file)
[{'firstName': 'Justin',
'lastName': 'Walker',
'age': '64',
'sex': 'Male',
'retired': 'False',
'dependants': '2',
'marital_status': 'single',
'salary': '56185',
'pension': '0',
'company': 'Hudson PLC',
'commute_distance': '14.1',
'iban': 'GB43YKET96816855547287',
'credit_card_number': '2221597849919620',
'credit_card_security_code': '646',
'credit_card_start_date': '03/18',
'credit_card_end_date': '06/26',
'address_main': '462 Marilyn radial',
'address_city': 'Lynneton',
'address_postcode': 'W4 0GW',
'Vehicle Make': 'Mitsubishi',
'Vehicle Model': 'Yukon XL 2500',
'Vehicle Year': '2014',
'Vehicle Type': 'Van/Minivan'},
{'firstName': 'Maria',
'lastName': 'Jones',
'age': '69',
'sex': 'Female',
'retired': 'False',
'dependants': '1',
'marital_status': 'divorced',
'salary': '36872',
'pension': '0',
'company': 'Wall, Reed and Whitehouse',
'commute_distance': '10.47',
'iban': 'GB53QKRK45175204753504',
'credit_card_number': '4050437758955103343',
'credit_card_security_code': '827',
'credit_card_start_date': '11/21',
'credit_card_end_date': '01/27',
'address_main': '366 Brenda radial',
'address_city': 'Ritafurt',
'address_postcode': 'NE85 1RG',
'Vehicle Make': 'Aston Martin',
'Vehicle Model': 'Silverado 3500 HD Regular Cab',
'Vehicle Year': '1995',
'Vehicle Type': 'SUV'}]

Create a new dictionary keyed on a composite of the first and last name (the first name is stored under either 'firstName' or 'First Name'), then you can do this:
master = {}
for _list in list_one, list_two, list_three:
    for d in _list:
        # list one uses 'First Name'; the other lists use 'firstName' (:= needs Python 3.8+)
        if not (firstname := d.get('firstName')):
            firstname = d['First Name']
        name_key = f'{firstname}_{d["lastName"]}'
        # Copy every key into the merged dictionary for this person
        for k, v in d.items():
            master.setdefault(name_key, {})[k] = v
print(list(master.values()))

Python's dict.update() functionality might be what you are looking for.
For example:
dict1 = {'a': 0,
         'b': 1,
         'c': 2}
dict2 = {'c': 0,
         'd': 1,
         'e': 2}
dict2.update(dict1)
dict2 is now:
{'c': 2, 'd': 1, 'e': 2, 'a': 0, 'b': 1}
Notice how 'c' was overwritten with the value from dict1.
You shouldn't update dictionaries that belong to different people into each other, but if you run through your lists beforehand you can group the dictionaries so that each group belongs to one person.
You can create a new dictionary, called people, and then iterate through your lists of dictionaries and extract the person's name from those dictionaries and turn it into a key in the new "people" dictionary.
If that person's name is not in people yet, you can add that dictionary, so that people[name] points to that dictionary.
If people[name] does exist, then you can use the people[name].update() function on the new dictionary to add the new values.
After this process, you will have a dictionary whose keys are the names of people and the values point to a dictionary containing those people's attributes.
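The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: merge_by_person is a hypothetical helper name, the sample records are shortened versions of the question's data, and the fallback from 'firstName' to 'First Name' is an assumption based on the two spellings that appear in the question.

```python
def merge_by_person(*lists):
    """Merge dictionaries that describe the same person into one dict per person."""
    people = {}
    for records in lists:
        for record in records:
            # Assumption: the first name is stored under one of these two keys
            first = record.get("firstName", record.get("First Name"))
            name = (first, record["lastName"])
            if name not in people:
                # Copy so the input dictionaries are not modified
                people[name] = dict(record)
            else:
                # Values from later lists overwrite shared keys such as 'age'
                people[name].update(record)
    return list(people.values())


# Shortened sample records based on the question's data
list_one = [{"First Name": "Justin", "lastName": "Walker", "Vehicle Make": "Toyota"}]
list_two = [{"firstName": "Justin", "lastName": "Walker", "age": 71,
             "iban": "GB43YKET96816855547287"}]
list_three = [{"firstName": "Justin", "lastName": "Walker", "age": "64",
               "company": "Hudson PLC"}]

merged = merge_by_person(list_one, list_two, list_three)
print(merged)
```

Because update() lets the last value win, the order in which the lists are passed decides which 'age' or 'address_postcode' survives. Both 'First Name' and 'firstName' stay in the merged dictionary unless one of them is removed afterwards.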

Extract values from dicts inside lists

I'm trying to extract the values from this JSON file, but I'm having trouble extracting the data from the lists inside the dict values. For example, for city and state I would like to get only the name values, and then create a pandas DataFrame that contains only some of the keys.
I tried using for loops with the get method, but without success.
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}

In [122]: required = ['birthday', 'gender', 'id', 'name', 'city', 'state']
In [123]: data
Out[123]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'cpf': ['9999999999999'],
'rg': [],
'gender': ['Feminino'],
'email': ['my_user#bol.com.br'],
'phone_numbers': ['51999999999'],
'photo': [],
'id': 11111111,
'duplicate_id': -1,
'name': 'My User',
'cnpj': [],
'company_name': '[]',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}],
'type': 'Private Person',
'tags': [],
'pending_tickets_count': 0}
In [124]: data2 = {k:v for k,v in data.items() if k in required}
In [125]: data2
Out[125]:
{'birthday': ['1987-07-13T00:00:00.000Z'],
'gender': ['Feminino'],
'id': 11111111,
'name': 'My User',
'city': [{'id': '0001', 'name': 'Porto Alegre'}],
'state': [{'id': 100, 'name': 'Rio Grande do Sul', 'fs': 'RS'}]}
In [126]: pd.DataFrame(data2).assign(
...: city_name=lambda x: x['city'].str.get('name'),
...: state_name=lambda x: x['state'].str.get('name'),
...: state_fs=lambda x: x['state'].str.get('fs')
...: ).drop(['state', 'city'], axis=1)
Out[126]:
                   birthday    gender        id     name     city_name         state_name  state_fs
0  1987-07-13T00:00:00.000Z  Feminino  11111111  My User  Porto Alegre  Rio Grande do Sul        RS
The reason data2 is required is that the columns of a DataFrame cannot differ in length. In this case, pd.DataFrame(data) won't work because rg has 0 items but birthday has 1 item.
If you are dealing with JSON files directly, pd.json_normalize is also worth a look.
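As an alternative to the .str.get calls, the nested values can be flattened in plain Python before the DataFrame is built. The following is a minimal sketch under stated assumptions: first is a hypothetical helper introduced here, data is a shortened copy of the record above, and each list is assumed to hold at most one item of interest.

```python
import pandas as pd

# Shortened copy of the record from the question
data = {
    "birthday": ["1987-07-13T00:00:00.000Z"],
    "rg": [],
    "gender": ["Feminino"],
    "id": 11111111,
    "name": "My User",
    "city": [{"id": "0001", "name": "Porto Alegre"}],
    "state": [{"id": 100, "name": "Rio Grande do Sul", "fs": "RS"}],
}


def first(value, default=None):
    """Return the first item of a list (default if it is empty); pass other values through."""
    if isinstance(value, list):
        return value[0] if value else default
    return value


# Build one flat row; empty lists such as 'rg' would simply become None
row = {
    "birthday": first(data["birthday"]),
    "gender": first(data["gender"]),
    "id": data["id"],
    "name": data["name"],
    "city_name": first(data["city"], {}).get("name"),
    "state_name": first(data["state"], {}).get("name"),
    "state_fs": first(data["state"], {}).get("fs"),
}
df = pd.DataFrame([row])
print(df)
```

Building the DataFrame from a list of flat rows avoids the length mismatch between empty and one-item lists, because every row dictionary carries exactly one value per column.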

Trying to print with specific format from DataFrame

I'm new to Python and am trying to print values from a DataFrame in a specific format:
customers = {'NAME': ['Breadpot', 'Hoviz', 'Hovis', 'Grenns', 'Magnolia', 'Dozen', 'Sun'],
'CITY': ['Sydney', 'Manchester', 'London', 'London', 'Chicago', 'San Francisco', 'San Francisco'],
'COUNTRY': ['Australia', 'UK', 'UK', 'UK', 'USA', 'USA', 'USA'],
'CPERSON': ['Sam.Keng#info.com', 'harry.ham#hoviz.com', 'hamlet.host#hoviz.com', 'grenns#grenns.com', 'man#info.com', 'dozen#dozen.com', 'sunny#sun.com'],
'EMPLCNT': [250, 150, 1500, 200, 1024, 1000, 2000],
'CONTRCNT': [48, 7, 12800, 12800, 25600, 5, 2],
'CONTRCOST': [1024.00, 900.00, 10510.50, 128.30, 512000.00, 1000.20, 10000.01]
}
df = pd.DataFrame(customers, columns=['CITY', 'COUNTRY', 'CPERSON', 'EMPLCNT', 'CONTRCNT', 'EMPLCNT', 'CONTRCOST'])
new_df = df.loc[df['CONTRCNT'].idxmax()]
print('City with the largest number of signed contracts:')
print(new_df['CITY'],'(', new_df['CONTRCNT'], 'contracts)')
I am trying to get the code to print "City with the largest number of signed contracts:" followed by the city and, in parentheses, the number of contracts,
but instead I keep getting this:
City with the largest number of signed contracts:
4 Chicago
4 Chicago
Name: CITY, dtype: object ( CONTRCNT CONTRCNT
4 25600 25600
4 25600 25600 contracts)

This should work:
import pandas as pd
customers = {'NAME': ['Breadpot', 'Hoviz', 'Hovis', 'Grenns', 'Magnolia', 'Dozen', 'Sun'],
             'CITY': ['Sydney', 'Manchester', 'London', 'London', 'Chicago', 'San Francisco', 'San Francisco'],
             'COUNTRY': ['Australia', 'UK', 'UK', 'UK', 'USA', 'USA', 'USA'],
             'CPERSON': ['Sam.Keng#info.com', 'harry.ham#hoviz.com', 'hamlet.host#hoviz.com', 'grenns#grenns.com', 'man#info.com', 'dozen#dozen.com', 'sunny#sun.com'],
             'EMPLCNT': [250, 150, 1500, 200, 1024, 1000, 2000],
             'CONTRCNT': [48, 7, 12800, 12800, 25600, 5, 2],
             'CONTRCOST': [1024.00, 900.00, 10510.50, 128.30, 512000.00, 1000.20, 10000.01]
             }
df = pd.DataFrame(customers, columns=['CITY', 'COUNTRY', 'CPERSON', 'EMPLCNT', 'CONTRCNT', 'CONTRCOST'])
# Sum the contracts per city; selecting the column first avoids aggregating the text columns
contracts_per_city = df.groupby('CITY')['CONTRCNT'].sum()
# idxmax returns the first label if several cities tie for the maximum
top_city = contracts_per_city.idxmax()
print('City with the largest number of signed contracts:')
print(f'{top_city} ({contracts_per_city[top_city]} contracts)')
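The row-based lookup from the question can also produce the desired format. The sketch below assumes that every column label appears exactly once and that the index has no duplicate labels, so that the selected row is a single Series and its fields are scalars; the data is cut down to the two columns involved.

```python
import pandas as pd

# Only the two columns needed for the lookup, taken from the question
customers = {
    "CITY": ["Sydney", "Manchester", "London", "London", "Chicago",
             "San Francisco", "San Francisco"],
    "CONTRCNT": [48, 7, 12800, 12800, 25600, 5, 2],
}
df = pd.DataFrame(customers)  # each column label appears exactly once

# idxmax gives the index label of the row with the most contracts
top_row = df.loc[df["CONTRCNT"].idxmax()]

print("City with the largest number of signed contracts:")
print(f"{top_row['CITY']} ({top_row['CONTRCNT']} contracts)")
```

Unlike the groupby version, this compares individual rows, so the two London rows are not added together.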

Update dictionary keys inside a list based on another dictionary key value pairs

I have a list of dictionaries, and a separate dictionary that maps old key names to new key names.
I am trying to use the keys from dict_map to rename the matching keys of the dictionaries inside the list.
list = [{'name': 'Megan', 'Age': '28', 'occupation': 'yes', 'race': 'american', 'children': 'yes'}, {'name': 'Ryan', 'Age': '25', 'occupation': 'no', 'race': 'american', 'intern': 'yes'}]
The dictionary that holds the correct keys is:
dict_map = {'occupation': 'service', 'intern': 'employee', 'race': 'ethnicity'}
I am new to Python. So far I have gone through Stack Overflow pages and tried a few approaches, but I have not been able to get the desired result.
The closest I got was with this question: Python Dictionary: How to update dictionary value, base on key - using separate dictionary keys
The final output should be:
[{'name': 'Megan', 'Age': '28', 'service': 'yes', 'ethnicity': 'american', 'children': 'yes'}, {'name': 'Ryan', 'Age': '25', 'service': 'no', 'ethnicity': 'american', 'employee': 'yes'}]

You could try this.
Note that I renamed your list to lst (list is a built-in type that you should never overwrite!):
lst = [
    {
        "name": "Megan",
        "Age": "28",
        "occupation": "yes",
        "race": "american",
        "children": "yes",
    },
    {
        "name": "Ryan",
        "Age": "25",
        "occupation": "no",
        "race": "american",
        "intern": "yes",
    },
]
dict_map = {'occupation': 'service', 'intern': 'employee', 'race': 'ethnicity'}
for dct in lst:
    for old_key, new_key in dict_map.items():
        if old_key not in dct:
            continue
        # Move the value to the new key and remove the old key
        dct[new_key] = dct.pop(old_key)

Using a list comprehension with dict.get
Ex:
lst = [{'name': 'Megan', 'Age': '28', 'occupation': 'yes', 'race': 'american', 'children': 'yes'}, {'name': 'Ryan', 'Age': '25', 'occupation': 'no', 'race': 'american', 'intern': 'yes'}]
dict_map = {'occupation': 'service', 'intern': 'employee', 'race': 'ethnicity'}
result = [{dict_map.get(k, k): v for k, v in i.items()} for i in lst]
print(result)
Output:
[{'name': 'Megan', 'Age': '28', 'service': 'yes', 'ethnicity': 'american', 'children': 'yes'}, {'name': 'Ryan', 'Age': '25', 'service': 'no', 'ethnicity': 'american', 'employee': 'yes'}]

Python dictionary value conversion to dictionary list

class Weightcheck:
    def bag_products(self, product_list):
        bag_list = []
        non_bag_items = []
        MAX_BAG_WEIGHT = 5.0
        for product in product_list:
            if float(product['weight']) > MAX_BAG_WEIGHT:
                product_list.remove(product)
                non_bag_items.append(product)
The argument product_list looks like this:
product_list = {'barcode': [123, 456], 'Name': ['Milk, 2 Litres', 'Bread'], 'Price': ['2', '3.5'], 'weight': ['2', '0.6']}
If the passed argument is a list of dictionaries like this:
product_list = [{'name': 'Milk', 'price': 2.0, 'weight': 2.0},
                {'name': 'LowfatMilk', 'price': 2.0, 'weight': 2.0},
                {'name': 'HighfatMilk', 'price': 2.0, 'weight': 2.0},
                {'name': 'Bread', 'price': 2.0, 'weight': 7.0}]
then it works properly. How can I convert the dictionary of lists into a list of dictionaries?

This is not the best way, but you can use something like this:
product_in_basket = {'barcode': [123, 456], 'Name': ['Milk, 2 Litres', 'Bread'],
                     'Price': ['2', '3.5'], 'weight': ['2.0', '0.6']}
final_list = []
for i in range(len(product_in_basket['Name'])):
    item = {}  # each new item
    for k, v in product_in_basket.items():
        item[k] = v[i]  # fill the item with the value at index i
    final_list.append(item)  # append to the final list
print(final_list)
Output:
[{'barcode': 123, 'Name': 'Milk, 2 Litres', 'Price': '2', 'weight': '2.0'}, {'barcode': 456, 'Name': 'Bread', 'Price': '3.5', 'weight': '0.6'}]

Here's a one-liner that does the trick:
product_list = [dict(zip(product_in_basket, t)) for t in zip(*product_in_basket.values())]
print(product_list)
Output:
[{'barcode': 123, 'Name': 'Milk, 2 Litres', 'Price': '2', 'weight': '2.0'}, {'barcode': 456, 'Name': 'Bread', 'Price': '3.5', 'weight': '0.6'}]
In general, it's better not to use a library when plain Python will do, but I thought a solution using pandas might be interesting:
import pandas as pd
product_in_basket = {'barcode': [123, 456], 'Name': ['Milk, 2 Litres', 'Bread'],
                     'Price': ['2', '3.5'], 'weight': ['2.0', '0.6']}
df = pd.DataFrame(product_in_basket)
# One dictionary per row
output = df.to_dict(orient='records')
print(output)
Output:
[{'barcode': 123, 'Name': 'Milk, 2 Litres', 'Price': '2', 'weight': '2.0'}, {'barcode': 456, 'Name': 'Bread', 'Price': '3.5', 'weight': '0.6'}]
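The conversion can be combined with the weight check from the question in a rough sketch like the one below. The names columns_to_records and split_by_weight are hypothetical helpers, and the third product is made up for illustration so that one item exceeds the limit. The split builds two new lists instead of removing items from the list that is being iterated, which can skip elements.

```python
MAX_BAG_WEIGHT = 5.0


def columns_to_records(columns):
    """Turn {'key': [v1, v2, ...]} into [{'key': v1}, {'key': v2}, ...]."""
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


def split_by_weight(products, limit=MAX_BAG_WEIGHT):
    """Return (bag_items, non_bag_items) without mutating the input list."""
    bag_items = [p for p in products if float(p["weight"]) <= limit]
    non_bag_items = [p for p in products if float(p["weight"]) > limit]
    return bag_items, non_bag_items


product_in_basket = {
    "barcode": [123, 456, 789],
    # The third item is invented for this example; it is not in the question
    "Name": ["Milk, 2 Litres", "Bread", "Flour sack"],
    "Price": ["2", "3.5", "9"],
    "weight": ["2.0", "0.6", "7.0"],
}

bag, non_bag = split_by_weight(columns_to_records(product_in_basket))
print(bag)
print(non_bag)
```

The weights are strings in the question's data, so the comparison converts them with float() just as the original bag_products method does.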
