group by with python and streamlit - python

I have the DataFrame below. I want to filter it and return the result based on the user's selection from a multiselect box, grouped by name.
The multiselect options are the unique values of the name field.
import streamlit as st
import pandas as pd
data = {
    'ID': [1, 2, 3, 4],
    'name': ['peter', 'john', 'james', 'james'],
    'nickname': ['pet', 'jon', 'james', 'jem'],
    'mother_name': ['maria', 'linda', 'ana', 'beth'],
    'bd': ['2000-05-15', '2006-09-12', '2004-10-25',]
}
with st.sidebar.form(key='search_form', clear_on_submit=False):
    choices = df["name"].unique().tolist()
    regular_search_term = st.multiselect(" ", choices)
    if st.form_submit_button("search"):
        df_result_search = df[df["name"].isin(regular_search_term)]
        df_group = df_result_search.groupby('name')
        st.write(df_group)
If I select james, it returns 2 records, while I need it to return
1 record that includes both rows of data related to james.
How can I return this result?

There is a missing value for the key bd in your data dictionary.
You can use this:
import streamlit as st
import pandas as pd
data = {
    'ID': [1, 2, 3, 4],
    'name': ['peter', 'john', 'james', 'james'],
    'nickname': ['pet', 'jon', 'james', 'jem'],
    'mother_name': ['maria', 'linda', 'ana', 'beth'],
    'bd': ['2000-05-15', '2006-09-12', '2004-10-25', '2004-10-26']
}
df = pd.DataFrame(data)
with st.sidebar.form(key='search_form', clear_on_submit=False):
    choices = df["name"].unique().tolist()
    regular_search_term = st.multiselect(" ", choices)
    if st.form_submit_button("search"):
        st.text('Filter on name')
        st.write(df[df["name"].isin(regular_search_term)])
        st.text('Filter on nickname')
        st.write(df[df["nickname"].isin(regular_search_term)])
        df_gr = df[['ID', 'nickname', 'mother_name', 'bd']
                   ].astype(str).groupby(df['name']).agg('|'.join).reset_index()
        st.text('Filter on name with grouped columns')
        st.write(df_gr[df_gr["name"].isin(regular_search_term)])
I'll let you choose whichever of the three filters/displays you want.
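If the goal is literally one row per selected name, with the values of the matching rows kept together, one option is to aggregate each group into lists instead of joined strings. The following is a minimal sketch in plain pandas, outside Streamlit; the helper name group_selected is hypothetical, and its result could be passed to st.write in the app above.

```python
import pandas as pd

data = {
    'ID': [1, 2, 3, 4],
    'name': ['peter', 'john', 'james', 'james'],
    'nickname': ['pet', 'jon', 'james', 'jem'],
    'mother_name': ['maria', 'linda', 'ana', 'beth'],
    'bd': ['2000-05-15', '2006-09-12', '2004-10-25', '2004-10-26'],
}
df = pd.DataFrame(data)


def group_selected(frame, selected_names):
    """Hypothetical helper: keep the selected names, one row per name."""
    filtered = frame[frame['name'].isin(selected_names)]
    # agg(list) collects every value of each remaining column into a list
    return filtered.groupby('name').agg(list).reset_index()


result = group_selected(df, ['james'])
print(result)
```

Selecting james then yields a single row whose nickname cell holds both 'james' and 'jem'.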

Related

From list to nested dictionary

There are these lists:
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
man_res = ['Alexandra', 'RST01', '$34,000']
man1_res = ['Santio', 'RST009', '$45,000']
man2_res = ['Rumbalski', 'RST50', '$78,000']
The expected output is a nested dictionary:
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
 'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
 'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
An easy way would be to use a pandas DataFrame:
import pandas as pd
df = pd.DataFrame([man_res, man1_res, man2_res], index=data, columns=key)
print(df)
df.to_dict(orient='index')
           name      id      sal
man   Alexandra   RST01  $34,000
man1     Santio  RST009  $45,000
man2  Rumbalski   RST50  $78,000
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
 'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
 'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
Or you could manually merge them using dict + zip:
d = dict(zip(
    data,
    (dict(zip(key, res)) for res in (man_res, man1_res, man2_res))
))
d
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
 'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
 'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
# Save the rows in a 2D list
all_man_res = []
all_man_res.append(man_res)
all_man_res.append(man1_res)
all_man_res.append(man2_res)
print(all_man_res)
# Add them to the output dict
output = {}
for i in range(len(data)):
    person = data[i]
    details = {}
    for j in range(len(key)):
        value = key[j]
        details[value] = all_man_res[i][j]
    output[person] = details
output
The pandas DataFrame answer provided by NoThInG makes the most intuitive sense. If you are looking to use only the built-in Python tools, you can do:
info_list = [dict(zip(key, man)) for man in (man_res, man1_res, man2_res)]
output = dict(zip(data, info_list))
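The answers above list man_res, man1_res and man2_res by hand. As a small sketch of the same dict + zip idea, assuming the rows can be kept together in one list (the name rows is introduced here only for illustration), a dict comprehension pairs each label with its row:

```python
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
# Assumption: the three *_res lists are gathered into a single list of rows
rows = [
    ['Alexandra', 'RST01', '$34,000'],
    ['Santio', 'RST009', '$45,000'],
    ['Rumbalski', 'RST50', '$78,000'],
]

# zip(data, rows) pairs each label with its row;
# zip(key, row) pairs each field name with its value
output = {label: dict(zip(key, row)) for label, row in zip(data, rows)}
print(output)
```

Keeping the rows in one list means a fourth person only needs a new entry in data and rows, not a new variable.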

adding 'rows' to dictionary in a WHILE loop

I am missing something small here and could use a pointer. I am trying to generate test data to save time in my CRUD work against a database via pymongo and other Python database libraries. Below is the code that I am having trouble with. I would like to create a function that builds a dictionary of length n, but I cannot figure out how to append to the dictionary appropriately. As you can see, it only keeps the last item generated. Any input would be great!
import names
import random
import numpy as np
age_choices = np.arange(18, 90)
gender_choices = ['male', 'female']
salary_choices = np.arange(10000, 200000)
def create_data(n=20):
    age_choices = np.arange(18, 90)
    gender_choices = ['male', 'female']
    salary_choices = np.arange(10000, 200000)
    person_values = []
    data_dict = {}
    unique_id = 0
    while unique_id < n:
        age = random.choice(age_choices)
        gender = random.choice(gender_choices)
        salary = random.choice(salary_choices)
        person_keys = ['id', 'name', 'gender', 'age', 'salary']
        person_values = [unique_id, names.get_full_name(gender), gender, age, salary]
        for k, v in zip(person_keys, person_values):
            data_dict[k] = v
        unique_id += 1
    return person_values, data_dict
data_list, data_dict = create_data(5)
print(data_list)
print()
print(data_dict)
Current output:
[4, 'Anthony Shultz', 'male', 29, 188503] # This is the last item of the list generated in the while loop
{'id': 4, 'name': 'Anthony Shultz', 'gender': 'male', 'age': 29, 'salary': 188503} # This is the "whole" dictionary generated but should have length 5 since n=5
The desired output should be a dictionary of length n, not just one.
You should introduce another variable in your function, such as a list, and append each data_dict to it every time you create one. You should also create a new data_dict on every iteration of your while loop. For example (check the lines with comments):
import names
import random
import numpy as np
age_choices = np.arange(18, 90)
gender_choices = ['male', 'female']
salary_choices = np.arange(10000, 200000)
def create_data(n=20):
    age_choices = np.arange(18, 90)
    gender_choices = ['male', 'female']
    salary_choices = np.arange(10000, 200000)
    person_values = []
    all_data = []  # Make a list which will store all our dictionaries
    unique_id = 0
    while unique_id < n:
        data_dict = {}  # Create a dictionary with current values
        age = random.choice(age_choices)
        gender = random.choice(gender_choices)
        salary = random.choice(salary_choices)
        person_keys = ['id', 'name', 'gender', 'age', 'salary']
        person_values = [unique_id, names.get_full_name(gender), gender, age,
                         salary]
        for k, v in zip(person_keys, person_values):
            data_dict[k] = v
        all_data.append(data_dict)  # Add newly created `data_dict` dictionary to our list
        unique_id += 1
    return person_values, data_dict, all_data  # Return as desired
data_list, data_dict, all_data = create_data(5) # Just as an example
print(data_list)
print()
print(data_dict)
print()
print(all_data) # Print the output
This will result in a list of dictionaries, which I assume is what you want as output, e.g.:
[{'id': 0, 'name': 'David Medina', 'gender': 'male', 'age': 87, 'salary': 67957}, {'id': 1, 'name': 'Valentina Reese', 'gender': 'female', 'age': 68, 'salary': 132938}, {'id': 2, 'name': 'Laura Franklin', 'gender': 'female', 'age': 84, 'salary': 93839}, {'id': 3, 'name': 'Melita Pierce', 'gender': 'female', 'age': 21, 'salary': 141055}, {'id': 4, 'name': 'Brenda Clay', 'gender': 'female', 'age': 36, 'salary': 94385}]
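The same idea can be written more compactly by building each dictionary directly inside a for loop over range(n). This is only a sketch: it uses the standard library's random module instead of NumPy, and a hypothetical make_name placeholder stands in for names.get_full_name so that the example runs without third-party packages.

```python
import random

GENDER_CHOICES = ['male', 'female']


def make_name(gender):
    """Hypothetical stand-in for names.get_full_name(gender)."""
    return f"{gender}-person-{random.randint(1000, 9999)}"


def create_data(n=20):
    """Return a list of n person dictionaries with ids 0..n-1."""
    people = []
    for unique_id in range(n):
        gender = random.choice(GENDER_CHOICES)
        # A fresh dict per iteration, so earlier entries are not overwritten
        people.append({
            'id': unique_id,
            'name': make_name(gender),
            'gender': gender,
            'age': random.randrange(18, 90),  # same range as np.arange(18, 90)
            'salary': random.randrange(10000, 200000),
        })
    return people


all_data = create_data(5)
print(len(all_data))
```

Because every person gets a new dictionary literal, there is no shared data_dict that later iterations could overwrite.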

How to get the length of dict keys after a specific element?

There is a dict:
example_dict = {
    'spend': '3.91',
    'impressions': '791',
    'clicks': '19',
    'campaign_id': '1111',
    'date_start': '2017-11-01',
    'date_stop': '2019-11-27',
    'age': '18-24',
    'gender': 'male'
}
I have to check whether there are any additional keys after the date_stop key and, if so, get how many there are and their names.
So far I have made a list of the keys:
list_keys = list(example_dict.keys())
which gives:
['spend',
 'impressions',
 'clicks',
 'campaign_id',
 'date_start',
 'date_stop',
 'age',
 'gender']
Checking that the 'date_stop' element is present is simple:
if 'date_stop' in list_keys:
    # what next
But I am not sure how to proceed. I'd appreciate any help.
I guess it should be implemented in a different way. You should be using a dict, but if you really want to do it this way, you could use OrderedDict from collections:
from collections import OrderedDict
my_dict = {
    'spend': '3.91',
    'impressions': '791',
    'clicks': '19',
    'campaign_id': '1111',
    'date_start': '2017-11-01',
    'date_stop': '2019-11-27',
    'age': '18-24',
    'gender': 'male'
}
# Note: sorting orders the keys alphabetically, not in their original order
sorted_ordered_dict = OrderedDict(sorted(my_dict.items(), key=lambda t: t[0]))
if 'date_stop' in sorted_ordered_dict.keys():
    keys = list(sorted_ordered_dict.keys())
    index = keys.index('date_stop')
    # The slice starts at 'date_stop' itself
    after_list = keys[index:]
    print('len: ', len(after_list))
    print('list: ', after_list)
Use the code below:
new_dict = {}
list_keys = list(example_dict.keys())
k = ""
for i in list_keys:
    if 'date_stop' == i:
        k = "done"
    if k == "done":
        # Maps each key (from 'date_stop' onward) to the length of its name
        new_dict[i] = len(i)
Output:
{'date_stop': 9, 'age': 3, 'gender': 6}
I hope I understood your question.
If you want just the names and the number of keys, use this:
new_dict = []
list_keys = list(example_dict.keys())
k = ""
for i in list_keys:
    if 'date_stop' == i:
        k = "done"
    if k == "done":
        new_dict.append(i)
Output:
print(new_dict)
print(len(new_dict))
['date_stop', 'age', 'gender']
3
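If only the keys that come after date_stop are wanted (not date_stop itself), a slice of the key list is enough. The following is a minimal sketch, assuming Python 3.7 or later, where a plain dict preserves insertion order; the helper name keys_after is hypothetical.

```python
example_dict = {
    'spend': '3.91',
    'impressions': '791',
    'clicks': '19',
    'campaign_id': '1111',
    'date_start': '2017-11-01',
    'date_stop': '2019-11-27',
    'age': '18-24',
    'gender': 'male',
}


def keys_after(d, marker):
    """Return the keys that follow `marker` in insertion order."""
    keys = list(d)
    if marker not in keys:
        return []
    # +1 skips the marker itself
    return keys[keys.index(marker) + 1:]


extra = keys_after(example_dict, 'date_stop')
print(len(extra), extra)  # 2 ['age', 'gender']
```

An empty result means either that date_stop is the last key or that it is missing, so a caller who needs to tell the two apart should check membership first.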

Turn two list comprehensions into one?

I am trying to create the table_data dictionary from a pandas DataFrame like below:
import pandas as pd
d = {
    'Name': ['John', 'Tom', 'Jack', 'Jill'],
    'Age': [42, 53, 18, 22],
    'City': ['London', 'New York', 'Bangkok', 'Warsaw']
}
df = pd.DataFrame(d)
table_data = dict(
    headers = [[header] for header in list(df)],
    columns = [df[header].tolist() for header in list(df)],
)
print(table_data)
Is there any way to avoid iterating over list(df) twice and turn those two list comprehensions into one?
Or does that defeat the purpose of list comprehensions, and should I use a plain old for loop instead, like so?
import pandas as pd
d = {
    'Name': ['John', 'Tom', 'Jack', 'Jill'],
    'Age': [42, 53, 18, 22],
    'City': ['London', 'New York', 'Bangkok', 'Warsaw']
}
df = pd.DataFrame(d)
headers = []
columns = []
table_data = {
    'headers': headers,
    'columns': columns,
}
for header in list(df):
    table_data['headers'].append([header])
    table_data['columns'].append(df[header].tolist())
print(table_data)
Something like this:
header = [list(df.columns.values)]
values = df.values.T
table_data = dict(headers=header, columns=values)
Yes, it is possible by creating tuples first, then zip and convert tuples to lists:
L = [([header], df[header].tolist()) for header in list(df)]
h, c = zip(*L)
table_data = dict(
    headers = list(h),
    columns = list(c),
)
Non loop solution:
table_data = dict(
    headers = df.columns.to_frame().values.tolist(),
    columns = df.T.values.tolist(),
)
print(table_data)
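Another possibility, sketched here under the assumption that df is the DataFrame from the question, is to let pandas produce the column lists once with to_dict(orient='list') and derive both parts from that mapping. The name as_lists is introduced only for this example.

```python
import pandas as pd

d = {
    'Name': ['John', 'Tom', 'Jack', 'Jill'],
    'Age': [42, 53, 18, 22],
    'City': ['London', 'New York', 'Bangkok', 'Warsaw'],
}
df = pd.DataFrame(d)

# {column name: list of values}, built by pandas in one call
as_lists = df.to_dict(orient='list')

table_data = dict(
    headers=[[header] for header in as_lists],
    columns=list(as_lists.values()),
)
print(table_data)
```

This still loops over the column names to wrap each header in a list, but the DataFrame itself is only accessed once, and each column keeps its own values, so numeric columns are not mixed with string columns.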

Set up a hierarchy from dicts

I have a single CSV file of employees containing employee data, including the name, boss, department id and department name.
By reading that CSV file, I have created these 2 dict structures:
dep = {}
dep[1] = {'name': 'Sales', 'parent': None}
dep[2] = {'name': 'National Sales', 'parent': None}
dep[3] = {'name': 'International Sales', 'parent': None}
dep[4] = {'name': 'IT', 'parent': None}
dep[5] = {'name': 'Development', 'parent': None}
dep[6] = {'name': 'Support', 'parent': None}
dep[7] = {'name': 'Helpdesk', 'parent': None}
dep[8] = {'name': 'Desktop support', 'parent': None}
dep[9] = {'name': 'CEO', 'parent': None}
emp = {}
emp[1] = {'name': 'John', 'boss': None, 'dep': 9}
emp[2] = {'name': 'Jane', 'boss': 1, 'dep': 1}
emp[3] = {'name': 'Bob', 'boss': 2, 'dep': 1}
emp[4] = {'name': 'Clara', 'boss': 2, 'dep': 2}
emp[5] = {'name': 'George', 'boss': 3, 'dep': 2}
emp[6] = {'name': 'Steve', 'boss': 2, 'dep': 3}
emp[7] = {'name': 'Joe', 'boss': 1, 'dep': 4}
emp[8] = {'name': 'Peter', 'boss': 7, 'dep': 5}
emp[9] = {'name': 'Silvia', 'boss': 7, 'dep': 6}
emp[10] = {'name': 'Mike', 'boss': 9, 'dep': 7}
emp[11] = {'name': 'Lukas', 'boss': 10, 'dep': 7}
emp[12] = {'name': 'Attila', 'boss': 7, 'dep': 8}
emp[13] = {'name': 'Eva', 'boss': 12, 'dep': 8}
Out of this I have 2 tasks:
1. Create a hierarchy of departments (basically fill in the value of the parent key).
2. Display (list) all the departments and employees for a boss.
The expected result for point #2 would be (everybody working in Sales):
employees = {1: (2, 3, 4, 5, 6)}
for everybody working in National Sales:
employees = {4: (5)}
and for everybody working in International Sales (Steve is the only one, nobody is working for him):
employees = {6: None}
How can I achieve this in a performant manner (I have to handle several thousand employees)?
EDIT:
This is a (simplified) CSV file structure:
id;name;boss;dep_id;dep_name
1;John;;9;CEO
2;Jane;1;1;Sales
3;Bob;2;1;Sales
4;Clara;2;2;National Sales
5;George;3;2;National Sales
6;Steve;2;3;International Sales
7;Joe;1;4;IT
8;Peter;7;5;Development
9;Silvia;7;6;Support
10;Mike;9;7;Helpdesk
11;Lukas;10;7;Helpdesk
12;Attila;7;8;Desktop support
13;Eva;12;8;Desktop support
As suggested in the comments, here is a solution using pandas. The file is mocked using your example data, and it should be plenty fast for only a few thousand entries.
from io import StringIO
import pandas as pd
f = StringIO("""
id;name;boss;dep_id;dep_name
1;John;1;9;CEO
2;Jane;1;1;Sales
3;Bob;2;1;Sales
4;Clara;2;2;National Sales
5;George;3;2;National Sales
6;Steve;2;3;International Sales
7;Joe;1;4;IT
8;Peter;7;5;Development
9;Silvia;7;6;Support
10;Mike;9;7;Helpdesk
11;Lukas;10;7;Helpdesk
12;Attila;7;8;Desktop support
13;Eva;12;8;Desktop support
""")
# load data
employees = pd.read_csv(f, sep=';', index_col=0)
### print a department ###
# Filter by department and print the names
print(employees[employees.dep_id == 7].name)
### build org hierarchy ###
# keep only one entry per department (assumes they share a boss)
org = employees[['boss', 'dep_id']].drop_duplicates('dep_id')
# follow the boss id to their department id
# note: the CEO is his own boss, to avoid special casing
org['parent'] = org.dep_id.loc[org['boss']].values
# reindex by department id, and keep only the parent column
# note: the index is like your dictionary key, access is optimized
org = org.set_index('dep_id')[['parent']]
print(org)
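For comparison, both tasks can also be sketched with plain dictionaries like the emp and dep structures from the question, without pandas. This sketch assumes, as the pandas answer does, that a department's parent is the department of the boss of the first employee listed in it, and that an employee whose boss is None sits at the top. The names emp_rows, reports and all_subordinates are hypothetical, and the result is keyed by boss id rather than by department id.

```python
from collections import defaultdict

dep_names = ['Sales', 'National Sales', 'International Sales', 'IT',
             'Development', 'Support', 'Helpdesk', 'Desktop support', 'CEO']
dep = {i: {'name': name, 'parent': None}
       for i, name in enumerate(dep_names, start=1)}

# (name, boss id, department id) per employee id, as in the question
emp_rows = {
    1: ('John', None, 9), 2: ('Jane', 1, 1), 3: ('Bob', 2, 1),
    4: ('Clara', 2, 2), 5: ('George', 3, 2), 6: ('Steve', 2, 3),
    7: ('Joe', 1, 4), 8: ('Peter', 7, 5), 9: ('Silvia', 7, 6),
    10: ('Mike', 9, 7), 11: ('Lukas', 10, 7), 12: ('Attila', 7, 8),
    13: ('Eva', 12, 8),
}
emp = {i: {'name': n, 'boss': b, 'dep': d}
       for i, (n, b, d) in emp_rows.items()}

# Task 1: parent department = department of the boss of the first
# employee seen in each department (assumption stated above)
seen = set()
for person in emp.values():
    if person['dep'] in seen:
        continue
    seen.add(person['dep'])
    if person['boss'] is not None:
        dep[person['dep']]['parent'] = emp[person['boss']]['dep']

# Task 2: index direct reports once, then walk the tree per boss
reports = defaultdict(list)
for emp_id, person in emp.items():
    if person['boss'] is not None:
        reports[person['boss']].append(emp_id)


def all_subordinates(boss_id):
    """Return ids of everyone below boss_id, directly or indirectly."""
    found = []
    stack = list(reports.get(boss_id, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reports.get(current, []))
    return sorted(found)


print(dep[2]['parent'])
print(all_subordinates(2))
```

Building the reports index takes one pass over the employees, and each lookup then only visits the people actually below the given boss, which should be adequate for a few thousand rows.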
