I am trying to create the table_data dictionary from a Pandas DataFrame like below:
import pandas as pd
d = {
'Name': ['John', 'Tom', 'Jack', 'Jill'],
'Age': [42, 53, 18, 22],
'City': ['London', 'New York', 'Bangkok', 'Warsaw']
}
df = pd.DataFrame(d)
table_data = dict(
headers = [[header] for header in list(df)],
columns = [df[header].tolist() for header in list(df)],
)
print(table_data)
Is there any way to avoid iterating over list(df) twice and merge those two list comprehensions into one?
Or does that defeat the purpose of a list comprehension, and should I use a plain old for loop instead, like so:
import pandas as pd
d = {
'Name': ['John', 'Tom', 'Jack', 'Jill'],
'Age': [42, 53, 18, 22],
'City': ['London', 'New York', 'Bangkok', 'Warsaw']
}
df = pd.DataFrame(d)
headers = []
columns = []
table_data = {
'headers': headers,
'columns': columns,
}
for header in list(df):
    table_data['headers'].append([header])
    table_data['columns'].append(df[header].tolist())
print(table_data)
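Something like this:
header = [[name] for name in df.columns]  # one single-element list per column name
values = df.values.T.tolist()  # transpose so that each inner list is one column
table_data = dict(headers=header, columns=values)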
Yes, it is possible by creating tuples first, then zip and convert tuples to lists:
L = [([header],df[header].tolist()) for header in list(df)]
h, c = zip(*L)
table_data = dict(
headers = list(h),
columns = list(c),
)
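As a rough illustration of the unzip idiom used above, here is a sketch on a plain dict of lists instead of a DataFrame (the `table` name is just a stand-in for illustration, not part of the original code). A single pass builds ([header], column) pairs, and zip(*pairs) splits them back into two sequences:

```python
# Stand-in for the DataFrame: column name -> column values (an assumption for illustration)
table = {
    'Name': ['John', 'Tom', 'Jack', 'Jill'],
    'Age': [42, 53, 18, 22],
    'City': ['London', 'New York', 'Bangkok', 'Warsaw'],
}

# One pass over the columns produces ([header], values) pairs
pairs = [([header], list(values)) for header, values in table.items()]

# zip(*pairs) transposes the pairs into a tuple of headers and a tuple of columns
headers, columns = zip(*pairs)

table_data = dict(headers=list(headers), columns=list(columns))
print(table_data['headers'])  # [['Name'], ['Age'], ['City']]
```

Note that unpacking into two names raises a ValueError if there are no columns at all, so an empty input would need a separate check.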
Non loop solution:
table_data = dict(
headers = df.columns.to_frame().values.tolist(),
columns = df.T.values.tolist(),
)
print(table_data)
I have the DataFrame below. I want to filter it based on the user's selection from a multiselect box and return the result grouped by name.
The multiselect options are the unique values of the name field.
import streamlit as st
import pandas as pd
data = {
'ID': [1, 2, 3, 4],
'name': ['peter', 'john', 'james', 'james'],
'nickname': ['pet', 'jon','james', 'jem'],
'mother_name': ['maria', 'linda', 'ana', 'beth'],
'bd': ['2000-05-15', '2006-09-12', '2004-10-25',]
}
with st.sidebar.form(key='search_form', clear_on_submit=False):
    choices = df["name"].unique().tolist()
    regular_search_term = st.multiselect(" ", choices)
    if st.form_submit_button("search"):
        df_result_search = df[df["name"].isin(regular_search_term)]
        df_group = df_result_search.groupby('name')
        st.write(df_group)
If I select james, it returns 2 records, while I need it to return
1 record that combines the data of both james rows.
How can I return this result?
There is a missing value for the key bd in your data dictionary.
You can use this:
import streamlit as st
import pandas as pd
data = {
'ID': [1, 2, 3, 4],
'name': ['peter', 'john', 'james', 'james'],
'nickname': ['pet', 'jon', 'james', 'jem'],
'mother_name': ['maria', 'linda', 'ana', 'beth'],
'bd': ['2000-05-15', '2006-09-12', '2004-10-25', '2004-10-26']
}
df = pd.DataFrame(data)
with st.sidebar.form(key='search_form', clear_on_submit=False):
    choices = df["name"].unique().tolist()
    regular_search_term = st.multiselect(" ", choices)
    if st.form_submit_button("search"):
        st.text('Filter on name')
        st.write(df[df["name"].isin(regular_search_term)])
        st.text('Filter on nickname')
        st.write(df[df["nickname"].isin(regular_search_term)])
        df_gr = df[['ID', 'nickname', 'mother_name', 'bd']
                   ].astype(str).groupby(df['name']).agg('|'.join).reset_index()
        st.text('Filter on name with grouped columns')
        st.write(df_gr[df_gr["name"].isin(regular_search_term)])
I'll let you choose whichever of the three filters/displays you want.
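For reference, the third option (grouping by name and joining the other columns with '|') can be sketched without pandas or Streamlit. This is only an illustration of what the groupby/agg step does; the `rows` list and the `group_and_join` helper are hypothetical names, not part of the code above:

```python
from collections import defaultdict

# Same sample data as above, laid out as a list of row dicts (hypothetical layout)
rows = [
    {'ID': 1, 'name': 'peter', 'nickname': 'pet', 'mother_name': 'maria', 'bd': '2000-05-15'},
    {'ID': 2, 'name': 'john', 'nickname': 'jon', 'mother_name': 'linda', 'bd': '2006-09-12'},
    {'ID': 3, 'name': 'james', 'nickname': 'james', 'mother_name': 'ana', 'bd': '2004-10-25'},
    {'ID': 4, 'name': 'james', 'nickname': 'jem', 'mother_name': 'beth', 'bd': '2004-10-26'},
]

def group_and_join(rows, key, sep='|'):
    """Group rows by `key` and join every other column's values with `sep`."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column != key:
                # Values are converted to str so that they can be joined
                grouped[row[key]][column].append(str(value))
    return {
        name: {column: sep.join(values) for column, values in columns.items()}
        for name, columns in grouped.items()
    }

result = group_and_join(rows, 'name')
print(result['james'])
# {'ID': '3|4', 'nickname': 'james|jem', 'mother_name': 'ana|beth', 'bd': '2004-10-25|2004-10-26'}
```

Selecting james then yields a single record that carries the data of both rows.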
I have these lists:
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
man_res = ['Alexandra', 'RST01', '$34,000']
man1_res = ['Santio', 'RST009', '$45,000']
man2_res = ['Rumbalski', 'RST50', '$78,000']
The expected output is a nested dictionary:
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
 'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
 'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
An easy way would be to use a pandas DataFrame:
import pandas as pd
df = pd.DataFrame([man_res, man1_res, man2_res], index=data, columns=key)
print(df)
df.to_dict(orient='index')
name id sal
man Alexandra RST01 $34,000
man1 Santio RST009 $45,000
man2 Rumbalski RST50 $78,000
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
Or you could merge them manually using dict + zip:
d = dict(zip(
data,
(dict(zip(key, res)) for res in (man_res, man1_res, man2_res))
))
d
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
# Save the rows in a 2D list
all_man_res = []
all_man_res.append(man_res)
all_man_res.append(man1_res)
all_man_res.append(man2_res)
print(all_man_res)
# Build the nested dict output
output = {}
for i in range(len(data)):
    person = data[i]  # outer key, e.g. 'man'
    details = {}
    for j in range(len(key)):
        value = key[j]  # inner key, e.g. 'name'
        details[value] = all_man_res[i][j]
    output[person] = details
output
The pandas DataFrame answer provided by NoThInG makes the most intuitive sense. If you want to use only the built-in Python tools, you can do:
info_list = [dict(zip(key, man)) for man in (man_res, man1_res, man2_res)]
output = dict(zip(data, info_list))
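One caveat with the dict + zip approach is that zip silently stops at the shortest input, so a missing row would go unnoticed. A minimal sketch of a helper that checks for this first (the `nest` function and its parameter names are made up for illustration):

```python
def nest(outer_keys, inner_keys, rows):
    """Build {outer_key: {inner_key: value}} from parallel lists."""
    if len(outer_keys) != len(rows):
        raise ValueError("need exactly one row per outer key")
    return {
        outer: dict(zip(inner_keys, row))
        for outer, row in zip(outer_keys, rows)
    }

data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
rows = [
    ['Alexandra', 'RST01', '$34,000'],
    ['Santio', 'RST009', '$45,000'],
    ['Rumbalski', 'RST50', '$78,000'],
]

output = nest(data, key, rows)
print(output['man1'])  # {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'}
```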
I loaded a CSV file using DictReader, so I essentially have a list of dictionaries. For example, I have a variable called reader with the following:
[{'name': 'Jack', 'hits': 7, 'misses': 12, 'year': 10},
{'name': 'Lisa', 'hits': 5, 'misses': 3, 'year': 8},
{'name': 'Jack', 'hits': 5, 'misses': 7, 'year': 9}]
I am using a loop to create lists like the following:
name = []
hits = []
for row in reader:
    name.append(row["name"])
    hits.append(row["hits"])
However, I don't want duplicates in my lists: where a name is duplicated, I am only interested in the entry with the highest year. So basically I want to end up with the following:
name = ['Jack', 'Lisa']
hits = [7, 5]
What is the best way to go about this?
Try:
reader = sorted(reader, key=lambda i: i['year'], reverse=True)
name = []
hits = []
for row in reader:
    if row['name'] in name:
        continue
    name.append(row["name"])
    hits.append(row["hits"])
The idea is to sort the list of dicts by year in descending order and then iterate over it, keeping only the first (most recent) entry for each name.
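One thing to watch for: csv.DictReader yields every value as a string, so sorting or comparing on 'year' compares text unless you convert first (as text, '9' sorts after '10'). Below is a small sketch of the sort-then-skip approach with that conversion; the inline CSV text is made up to match the sample data in the question:

```python
import csv
import io

# Made-up CSV text matching the question's sample data (an assumption)
csv_text = """name,hits,misses,year
Jack,7,12,10
Lisa,5,3,8
Jack,5,7,9
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Sort newest first; int() is needed because DictReader values are strings
rows.sort(key=lambda row: int(row['year']), reverse=True)

seen = set()  # names already kept; set lookups avoid rescanning a list
name, hits = [], []
for row in rows:
    if row['name'] in seen:
        continue  # an entry with a higher year was already kept
    seen.add(row['name'])
    name.append(row['name'])
    hits.append(int(row['hits']))

print(name)  # ['Jack', 'Lisa']
print(hits)  # [7, 5]
```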
import pandas as pd
data = [{'name': 'Jack', 'hits' :7, 'misses': 12, 'year': 10},
{'name': 'Lisa', 'hits': 5, 'misses': 3,'year': 8},
{'name': 'Jack', 'hits': 5, 'misses':7, 'year': 9}]
df = pd.DataFrame(data).sort_values(by=['name','year'],ascending=False).groupby('name').first()
dict(zip(df.index,df['hits']))
In pure Python (no libraries):
people = {}  # maps "name" -> "info"
for record in csv_reader:
    # do we have someone with that name already?
    old_record = people.get(record['name'], {})
    # what's their year (defaulting to -1)
    old_year = old_record.get('year', -1)
    # if this record is more up to date
    if record['year'] > old_year:
        # replace the old record
        people[record['name']] = record
# -- then, you can pull out your name and hits lists
name = list(people.keys())
hits = [r['hits'] for r in people.values()]
If you want to learn Pandas:
import pandas as pd
df = pd.read_csv('yourdata.csv')
# for each name, keep the whole row that has the highest year
df.loc[df.groupby('name')['year'].idxmax()]
Solution without pandas:
lst = [
    {"name": "Jack", "hits": 7, "misses": 12, "year": 10},
    {"name": "Lisa", "hits": 5, "misses": 3, "year": 8},
    {"name": "Jack", "hits": 5, "misses": 7, "year": 9},
]
out = {}
for d in lst:
    # collect all records that share a name
    out.setdefault(d["name"], []).append(d)
name = [*out]
# for each name, take the hits from the record with the highest year
hits = [max(v, key=lambda i: i["year"])["hits"] for v in out.values()]
print(name)
print(hits)
Prints:
['Jack', 'Lisa']
[7, 5]
I am missing something small here and could use a pointer. I am trying to generate test data to save time on CRUD work against a database via pymongo and other Python database libraries. Below is the code I am having trouble with. I would like to write a function that creates a dictionary of length n, but I cannot figure out how to append to the dictionary appropriately. As you can see, it only keeps the last item generated. Any input would be great!
import names
import random
import numpy as np
age_choices = np.arange(18, 90)
gender_choices = ['male', 'female']
salary_choices = np.arange(10000, 200000)
def create_data(n=20):
    age_choices = np.arange(18, 90)
    gender_choices = ['male', 'female']
    salary_choices = np.arange(10000, 200000)
    person_values = []
    data_dict = {}
    unique_id = 0
    while unique_id < n:
        age = random.choice(age_choices)
        gender = random.choice(gender_choices)
        salary = random.choice(salary_choices)
        person_keys = ['id', 'name', 'gender', 'age', 'salary']
        person_values = [unique_id, names.get_full_name(gender), gender, age, salary]
        for k, v in zip(person_keys, person_values):
            data_dict[k] = v
        unique_id += 1
    return person_values, data_dict
data_list, data_dict = create_data(5)
print(data_list)
print()
print(data_dict)
Current output:
[4, 'Anthony Shultz', 'male', 29, 188503] # This is only the last item generated in the while loop
{'id': 4, 'name': 'Anthony Shultz', 'gender': 'male', 'age': 29, 'salary': 188503} # This is the "whole" dictionary generated, but it should have length 5 since n=5
The desired output should be a dictionary of length n, not just one.
You should introduce another variable in your function, a list, and append each data_dict to it every time you create one (a tuple would not work here, since tuples cannot be appended to). You should also create a new data_dict inside your while loop, on every iteration. For example (check the lines with comments):
import names
import random
import numpy as np
age_choices = np.arange(18, 90)
gender_choices = ['male', 'female']
salary_choices = np.arange(10000, 200000)
def create_data(n=20):
    age_choices = np.arange(18, 90)
    gender_choices = ['male', 'female']
    salary_choices = np.arange(10000, 200000)
    person_values = []
    all_data = []  # Make a list which will store all our dictionaries
    unique_id = 0
    while unique_id < n:
        data_dict = {}  # Create a dictionary with current values
        age = random.choice(age_choices)
        gender = random.choice(gender_choices)
        salary = random.choice(salary_choices)
        person_keys = ['id', 'name', 'gender', 'age', 'salary']
        person_values = [unique_id, names.get_full_name(gender), gender, age,
                         salary]
        for k, v in zip(person_keys, person_values):
            data_dict[k] = v
        all_data.append(data_dict)  # Add newly created `data_dict` dictionary to our list
        unique_id += 1
    return person_values, data_dict, all_data  # Return as desired
data_list, data_dict, all_data = create_data(5) # Just as an example
print(data_list)
print()
print(data_dict)
print()
print(all_data) # Print the output
This will result in a list of dictionaries, which I assume is the output you want, e.g.:
[{'id': 0, 'name': 'David Medina', 'gender': 'male', 'age': 87, 'salary': 67957}, {'id': 1, 'name': 'Valentina Reese', 'gender': 'female', 'age': 68, 'salary': 132938}, {'id': 2, 'name': 'Laura Franklin', 'gender': 'female', 'age': 84, 'salary': 93839}, {'id': 3, 'name': 'Melita Pierce', 'gender': 'female', 'age': 21, 'salary': 141055}, {'id': 4, 'name': 'Brenda Clay', 'gender': 'female', 'age': 36, 'salary': 94385}]
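The key point is that a fresh dictionary must be created on every iteration; otherwise every list entry refers to the same object and shows the last values written. A small sketch of the difference, using only the standard library (the `make_people` helper and the fixed name list are made up here in place of the `names` package):

```python
import random

FAKE_NAMES = ['Ann Lee', 'Bob Ray', 'Cy Dunn']  # placeholder names, not from the `names` package

def make_people(n, fresh_dict=True):
    """Return a list of n person dicts; reuse one shared dict when fresh_dict is False."""
    people = []
    shared = {}
    for unique_id in range(n):
        # A new dict per iteration keeps the records independent
        person = {} if fresh_dict else shared
        person['id'] = unique_id
        person['name'] = random.choice(FAKE_NAMES)
        person['age'] = random.randint(18, 89)
        people.append(person)
    return people

good = make_people(3)
bad = make_people(3, fresh_dict=False)

print([p['id'] for p in good])  # [0, 1, 2]
print([p['id'] for p in bad])   # [2, 2, 2] because all entries are the same dict
```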
I have a list of dictionaries and need to create a new one that contains new keys as well as keys and values from my original dictionaries. One of the keys has to contain a list of dictionaries (holding the values from the original dictionary).
My data looks like the following:
data = [{'CloseDate': '2020-05-01',
'OpportunityID': '1',
'CustomerID': '10'},
{'CloseDate': '2020-07-31',
'OpportunityID': '2',
'CustomerID': '11'}]
I want my new list of dicts to look like this:
new_data = [{'id': '39',
'Query': [{'records': '40', 'Order Name': '1', 'CustomerID': '10'}]},
{'id': '39',
'Query': [{'records': '40', 'Order Name': '2', 'CustomerID': '11'}]}]
I have tried the following:
new_data = []
for item in data:
    params_dict = {}
    params_dict["id"] = "39"
    params_dict["Query"] = []
    # push new_dicts in params_dict
    new_dict = {}
    new_dict["records"] = "40"
    new_dict["Order Name"] = data["OpportunityID"]
    params_dict.append(new_dict)
    new_data.append(params_dict)
Error: TypeError: list indices must be integers or slices, not str
datas_list = []
for get_dict in data:
    new_dict = {}
    new_dict["id"] = 39
    new_dict['Query'] = []
    other_dictionary = {}
    other_dictionary['records'] = 40
    for values in get_dict:
        if values == "OpportunityID":
            other_dictionary['Order Name'] = get_dict[values]
        if values == "CustomerID":
            other_dictionary[values] = get_dict[values]
    new_dict["Query"].append(other_dictionary)
    datas_list.append(new_dict)
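Inside the loop you were indexing data (the whole list) instead of item (the current dictionary), which is what raises the TypeError.
Also, you need to append to the list stored under "Query", not to params_dict itself.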
Try:
new_data = []
for item in data:
    params_dict = {}
    params_dict["id"] = "39"
    params_dict["Query"] = []
    new_dict = {}  # defined new_dict
    new_dict["records"] = "40"
    new_dict["Order Name"] = item["OpportunityID"]  # here it should be item
    new_dict["CustomerID"] = item["CustomerID"]  # also part of the desired output
    params_dict["Query"].append(new_dict)
    new_data.append(params_dict)
Also:
new_data = []
for item in data:
    params_dict = {}
    params_dict["id"] = "39"
    params_dict["Query"] = [{"records": "40", "Order Name": item["OpportunityID"], "CustomerID": item["CustomerID"]}]
    new_data.append(params_dict)
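If you prefer, the same result can be built with a list comprehension. This is just a sketch; the hard-coded '39' and '40' are taken from the question as-is, and CustomerID is copied over to match the desired output:

```python
data = [
    {'CloseDate': '2020-05-01', 'OpportunityID': '1', 'CustomerID': '10'},
    {'CloseDate': '2020-07-31', 'OpportunityID': '2', 'CustomerID': '11'},
]

# One new dict per original item; 'Query' holds a one-element list of dicts
new_data = [
    {
        'id': '39',
        'Query': [{
            'records': '40',
            'Order Name': item['OpportunityID'],
            'CustomerID': item['CustomerID'],
        }],
    }
    for item in data
]

print(new_data[1]['Query'][0])
# {'records': '40', 'Order Name': '2', 'CustomerID': '11'}
```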