Below is a simple reproducible example that illustrates the problem in its simplest form. The problem description is long, so feel free to jump straight to the code and the expected behaviour.
The main concept
There are 3 dataframes stored in a list, and a form on the sidebar shows the supplier_name and po_number from the relevant dataframe. When the user clicks the Next button, the information inside the supplier_name and po_number text_input will be saved (in this example, they basically got printed out on top of the sidebar).
Problem
This app works well when the user doesn't change anything inside the text_input, but it breaks as soon as the user changes something. For example, when I change the po_number to somethingrandom, the saved information is not somethingrandom but P123 from the first dataframe.
What's more, if a value in the next dataframe is the same as in the current one, the edited value inside the text_input carries over to the next display. For example, because the first and second dataframes both have the supplier name S1, if I change the supplier name to S10 and click Next, the supplier_name still shows S10 for the second dataframe, when it should show S1. If the next dataframe has a different supplier name, however, the text_input does update.
Justification
If you are struggling to understand why I want to do this: in the original use case, the sidebar input area shows information extracted from a series of PDFs. When the user confirms that the information is correct, they click Next to review the next PDF. If something is wrong, they can change the information inside the text_input and then click Next, and the changed value should be recorded. For the next PDF, the inputs should show what was extracted from that PDF. I did this quite simply in R Shiny, but I can't figure out how the data flow works in Streamlit. Please help.
Reproducible Example
import streamlit as st
import pandas as pd
# 3 dataframes that are stored in a list
data1 = {
"supplier_name": ["S1"],
"po_number": ["P123"],
}
data2 = {
"supplier_name": ["S1"],
"po_number": ["P124"],
}
data3 = {
"supplier_name": ["S2"],
"po_number": ["P125"],
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
list1 = [df1, df2, df3]
# initiate a page session state, every time next button is clicked
# it will go to the next dataframe in the list
if 'page' not in st.session_state:
    st.session_state.page = 0
def next_page():
    st.sidebar.write(f"Submitted! supplier_name: {supplier_name} po_number: {po_number}")
    st.session_state.page += 1
supplier_name_value = list1[st.session_state.page]["supplier_name"][0]
po_number_value = list1[st.session_state.page]["po_number"][0]
# main area
list1[st.session_state.page]
# sidebar form
with st.sidebar.form("form"):
    supplier_name = st.text_input(label="Supplier Name", value=supplier_name_value)
    po_number = st.text_input(label="PO Number", value=po_number_value)
    next_button = st.form_submit_button("Next", on_click=next_page)
Expected behaviour
The dataframe's values are extracted into the sidebar input area. The user can change the inputs if they wish and then click Next, and the values inside the input areas are saved. When the app moves to the next dataframe, the text inputs are refreshed with the values from that dataframe, and the cycle repeats.
I'm not totally sure what you're going for, but after some messing around, the only way I was able to achieve this sort of sequential form submission handling is with st.experimental_rerun(). I hate to resort to that since it may be removed any time, so hopefully there's a better way.
Without experimental_rerun(), forms take two submits to actually update state. I wasn't able to find a "correct" way to achieve an immediate update to support the expected behavior.
Here's my attempt:
import pandas as pd # 1.5.1
import streamlit as st # 1.18.1
def initialize_state():
    data = [
        {
            "supplier_name": ["S1"],
            "po_number": ["P123"],
        },
        {
            "supplier_name": ["S1"],
            "po_number": ["P124"],
        },
        {
            "supplier_name": ["S2"],
            "po_number": ["P125"],
        },
    ]
    state.dfs = state.get("dfs", [pd.DataFrame(x) for x in data])
    first_vals = [{x: df[x][0] for x in df.columns} for df in state.dfs]
    state.selections = state.get("selections", first_vals)
    state.pages_expanded = state.get("pages_expanded", 0)
    state.current_page = state.get("current_page", 0)
    state.just_modified_page = state.get("just_modified_page", -1)
def handle_submit(i):
    st.session_state.selections[i] = {
        "supplier_name": state.new_supplier_name,
        "po_number": state.new_po_number,
    }
    state.current_page = i
    state.just_modified_page = i
    if i < len(state.dfs) - 1 and state.pages_expanded == i:
        state.pages_expanded += 1
    st.experimental_rerun()
def render_form(i):
    with st.sidebar.form(key=f"form-{i}"):
        supplier_name = state.selections[i]["supplier_name"]
        po_number = state.selections[i]["po_number"]
        if i == state.just_modified_page:
            st.sidebar.write(
                f"Submitted! supplier_name: {supplier_name} "
                f"po_number: {po_number}"
            )
            state.just_modified_page = -1
        state.new_supplier_name = st.text_input(
            label="Supplier Name",
            value=supplier_name,
        )
        state.new_po_number = st.text_input(
            label="PO Number",
            value=po_number,
        )
        if st.form_submit_button("Next"):
            handle_submit(i)
state = st.session_state
initialize_state()
for i in range(state.pages_expanded + 1):
    render_form(i)
# debug
st.write("state.pages_expanded", state.pages_expanded)
st.write("state.current_page", state.current_page)
st.write("state.just_modified_page", state.just_modified_page)
st.write("state.dfs[state.current_page]", state.dfs[state.current_page])
st.write("state.selections", state.selections)
I'm assuming you want to keep track of the user's selections, but not actually modify the dataframes. If you do want to modify the dataframes, that's simpler: replace state.selections with actual writes to dfs by index and column:
# ...
def handle_submit(i):
    st.session_state.dfs[i]["supplier_name"] = state.new_supplier_name,
    st.session_state.dfs[i]["po_number"] = state.new_po_number,
    #st.session_state.selections[i] = {
    #    "supplier_name": state.new_supplier_name,
    #    "po_number": state.new_po_number,
    #}
# ...
def render_form(i):
    with st.sidebar.form(key=f"form-{i}"):
        supplier_name = state.dfs[i]["supplier_name"][0]
        po_number = state.dfs[i]["po_number"][0]
        #supplier_name = state.selections[i]["supplier_name"]
        #po_number = state.selections[i]["po_number"]
# ...
Now, it's possible to make this 100% dynamic, but I hardcoded supplier_name and po_number to avoid premature generalization that you may not need. If you do want to generalize, use df.columns like initialize_state does throughout the code.
I'm not sure I quite understand what you're trying to accomplish, but it seems like you're never updating the supplier name in list1 after the user updates the name via the text input widget.
I have multiple dropdowns that I'd like to populate depending on what the user chooses in the previous dropdown. I populate the first dropdown with:
schools_requests = requests.get("http://ipwhatever:portwhatever/list_all_schools")
schools_data = schools_requests.json()
df = pd.DataFrame(schools_data)
nome = df['nome'].tolist()
It gives me the names of the schools I got listed. I then send it (nome) to the first dropdown like this:
html.Label('Escola'),
dcc.Dropdown(
    options = nome,
    id = "escola",
)
The first callback works fine and it's the one down below:
@callback(
    Output('id_school', 'children'),
    Input('escola','value')
)
def find_id_school(school_name):
    all_schools = requests.get(
        "http://ipwhatever:portwhatever/list_all_schools")
    all_schools_data = all_schools.json()
    for element in all_schools_data:
        if school_name == element['nome']:
            id_school = element['id_escola']
            return id_school
It basically searches for the corresponding school id given the name of the school the user chose in the first dropdown and stores this id in a hidden html.Div.
Now comes the second callback, where I use pandas and don't understand why it's different from the first time.
@callback(
    Output('ano', 'options'),
    Input('id_school', 'children')
)
def render_grade_from_school(chosen_id):
    grade = requests.get(
        "http://ipwhatever:portwhatever/grade?school_id="+str(chosen_id))
    grade_data = grade.json()
    indices = list(range(0,len(grade_data)))
    df = pd.DataFrame(grade_data, index=indices)
    ano = df['serie'].tolist()
    return ano
So it takes the school id, requests the grades from another endpoint and basically does the same thing as the first time I used pandas in the code.
The only difference is the index argument. pandas started complaining about the lack of an index, so I check the length of the list of JSON objects, generate a list of indices like [0,1,2,...] and pass it as the index argument to the DataFrame. That stopped the complaint.
But now I get a KeyError: 'serie'. The traceback highlights return self._engine.get_loc(casted_key) as the source, and I don't know why. Still, the dropdown 'ano' (grade) updates correctly and shows the grades. But the error never goes away.
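One possible explanation, offered as a guess rather than a diagnosis: Dash runs every callback once when the page loads, so render_grade_from_school is first called with chosen_id set to None. If the endpoint then answers with a single JSON object instead of a list of records (an assumption about this API), pandas asks for an index, and once an index is supplied the resulting frame has no 'serie' column, hence the KeyError. The sketch below uses a hypothetical helper, extract_series, that pulls the values out without pandas and returns an empty list for anything that is not a list of records.

```python
def extract_series(grade_data, key="serie"):
    """Return the values stored under `key` in a list of JSON records.

    `extract_series` is a hypothetical helper, not part of Dash or pandas.
    Anything that is not a list of dicts (for example an error object
    returned for a missing school id) yields an empty list.
    """
    if not isinstance(grade_data, list):
        return []
    return [row[key] for row in grade_data if isinstance(row, dict) and key in row]


# A normal response: a list of records.
print(extract_series([{"serie": "1A"}, {"serie": "2B"}]))
# A response that is a single object, e.g. when no valid id was sent.
print(extract_series({"detail": "school not found"}))
```

Inside the callback, returning early when chosen_id is None (for example by raising dash.exceptions.PreventUpdate) would avoid the request altogether.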
I am working on a charting module to which I can pass a dataframe; the module creates reports based on plots generated by calling a few functions, shown below.
I am using Altair for plotting and "Datapane" for creating the report, the documentation of the same can be found here : https://datapane.github.io/datapane/
My DataFrame looks like this
d = {'Date': ['2021-01-01', '2021-01-01','2021-01-01','2021-01-01','2021-01-02','2021-01-03'],
'country': ['IND','IND','IND','IND','IND','IND' ],
'channel': ['Organic','CRM','Facebook','referral','CRM','CRM' ],
'sessions': [10000,8000,4000,2000,7000,6000 ],
'conversion': [0.1,0.2,0.1,0.05,0.12,0.11 ],
}
country_channel = pd.DataFrame(d)
Plotting functions :
def plot_chart(source,Y_axis_1,Y_axis_2,chart_caption):
    base = alt.Chart(source).encode(
        alt.X('Date:T', axis=alt.Axis(title="Date"))
    )
    line_1 = base.mark_line(opacity=1, color='#5276A7').encode(
        alt.Y(Y_axis_1,
              axis=alt.Axis( titleColor='#5276A7'))
    )
    line_2 = base.mark_line(opacity=0.3,color='#57A44C', interpolate='monotone').encode(
        alt.Y(Y_axis_2,
              axis=alt.Axis( titleColor='#57A44C'))
    )
    chart_ae=alt.layer(line_1, line_2).resolve_scale(
        y = 'independent'
    ).interactive()
    charted_plot = dp.Plot(chart_ae , caption=chart_caption)
    return charted_plot
def channel_plot_split(filter_1,filter_2,country,channel):
    channel_split_data = country_channel[(country_channel[filter_1]==country.upper())]
    channel_split_data =channel_split_data[(channel_split_data[filter_2].str.upper()==channel.upper())]
    channel_split_data=channel_split_data.sort_values(by='Date',ascending = True)
    channel_split_data=channel_split_data.reset_index(drop=True)
    channel_split_data.head()
    plot_channel_split = plot_chart(source=channel_split_data,Y_axis_1='sessions:Q',Y_axis_2='conversion:Q',chart_caption="Sessions-Conversion Plot for Country "+country.upper()+" and channel :"+ channel)
    channel_plot=dp.Group(dp.HTML("<div class='center'> <h3> Country : "+country.upper()+" & Channel : "+channel.upper()+"</h3></div>"),plot_channel_split,rows=2)
    return channel_plot
def grpplot(plot_1,plot_2):
    gp_plot = dp.Group(plot_1,plot_2,columns=2)
    return gp_plot
The above functions when called, will filter the dataframe, create plot for each filters and group 2 plots in a row.
row_1 = grpplot(channel_plot_split('country','channel','IND','Organic'),channel_plot_split('country','channel','IND','CRM'))
row_2 = grpplot(channel_plot_split('country','channel','IND','Facebook'),channel_plot_split('country','channel','IND','referral'))
I can now generate a report by calling datapane.Report() function as follows
r= dp.Report(row_1,row_2)
Problem: This works fine when I know how many channels are present, but my channel list is dynamic. I am thinking of using a "for" loop to generate the rows, but I am not sure how to pass these rows as arguments to the dp.Report() function. For example, if I have 10 channels, I need to pass 10 rows dynamically.
I had a similar problem and solved it as follows
Create a list to store the pages or elements of the report, such as
report_pages=[]
report_pages.append(dp.Page)
report_pages.append(dp.Table)
report_pages.append(dp.Plot)
At the end, generate the report by unpacking the list into positional arguments:
dp.Report(*report_pages)
In your case, I think you can do the following
create a list
rows=[]
add the rows to the list
rows.append(row_1)
rows.append(row_2)
and then create the report with
r= dp.Report(*rows)
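To illustrate the unpacking step for a channel list of unknown length, here is a minimal sketch. make_row and make_report are stand-ins invented for this example so that it runs without Datapane; in the real module they would correspond to grpplot(...) and dp.Report(...). The channel names are sample values.

```python
def make_row(left, right=None):
    # Stand-in for grpplot(channel_plot_split(...), channel_plot_split(...)).
    return (left, right)


def make_report(*blocks):
    # Stand-in for dp.Report(*blocks): each row arrives as its own positional argument.
    return list(blocks)


channels = ["Organic", "CRM", "Facebook", "referral", "Email"]

rows = []
# Walk the list two channels at a time; the last row may hold a single channel.
for start in range(0, len(channels), 2):
    pair = channels[start:start + 2]
    rows.append(make_row(*pair))

# The star unpacks the list, so the call works for any number of rows.
report = make_report(*rows)
print(len(report))
```

The same star syntax applies whether the list holds two rows or ten, which is why no row needs to be named individually.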
I found this solution on datapane's GitHub and in this notebook in the last line of code.
So here is how I solved this problem.
channel_graph_list=[]
for i in range(0,len(unique_channels),1):
    channel_1_name = unique_channels[i]
    filtered_data = filter_the_data(source=channel_data,filter_1='channel',fv_1=channel_1_name)
    get_chart = plot_chart(filtered_data,Y_axis_1='sessions:Q',Y_axis_2='conversion:Q',chart_title='Session & Conv. Chart for '+channel_1_name)
    #This is where the trick starts - The below code creates a dynamic variable
    vars() ["channel_row_"+str(i)] = get_chart
    channel_graph_list.append("dp.Plot(channel_row_"+str(i)+",label='"+channel_1_name+"')")
#convert the list to a string
channel_graph_row = ','.join(channel_graph_list)
# assign the code you want to run
code="""channel_graph = dp.Select(blocks=["""+channel_graph_row+ """],type=dp.SelectType.TABS)"""
#execute the code
exec(code)
Hope the above solution helps others looking to pass dynamically generated parameters into any function.
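The same result can probably be reached without vars() and exec, because a list of blocks is an ordinary Python object that can be passed as an argument. The sketch below shows the pattern with stand-ins: Plot and select are placeholders invented for this example (they mimic the shape of dp.Plot(..., label=...) and dp.Select(blocks=...) but are not Datapane code), and each chart is replaced by a string.

```python
from dataclasses import dataclass


@dataclass
class Plot:
    # Placeholder for dp.Plot: holds a chart object and a tab label.
    chart: str
    label: str


def select(blocks):
    # Placeholder for dp.Select(blocks=..., type=dp.SelectType.TABS).
    return {"tabs": [b.label for b in blocks], "blocks": blocks}


unique_channels = ["Organic", "CRM", "Facebook"]

# Build one block per channel directly; no dynamic variable names are needed.
blocks = [Plot(chart=f"chart for {name}", label=name) for name in unique_channels]
channel_graph = select(blocks=blocks)
print(channel_graph["tabs"])
```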
I am creating a word search app using Dash by Plotly. I have seen some other questions similar to mine, but none seem to address my exact point. I want a user to enter a query into a Dash object, in my case a dcc.Input, and have that input create a DataFrame (or a dt.DataTable, if someone can explain how to further manipulate those properly). Most of the examples on Dash's website use a pre-built DataFrame, and none of them show an @app.callback creating a DataFrame.
So... step by step where I am
Here is my app.layout. I want to pass an input that creates a DataFrame/table. Then, pass that resulting table to some graphs (starting with one for simplicity).
app.layout = html.Div([
html.H2('Enter a text query'),
html.H6('Searching multiple words will create an AND statement where \
\n |valve leak| will return records with valve and leak. Or, \
\n you can use " " to search for specific phrases like "valve leak".'),
dcc.Input(id='searchId', value='Enter Search Term', type='text'),
html.Button('Submit', id='button', n_clicks=0),
dcc.Graph(id='tableGraph', figure='fig'),
html.Button('Update Graph', id='graph', n_clicks=0),
dt.DataTable(style_cell={
'whiteSpace': 'normal',
'height': 'auto',
'textAlign': 'left'
}, id='queryTable',
)
])
Here is the first search callback. Right now, I am attempting to use a global df to 'export' the DataFrame from the function. A problem is that Dash does not really allow DataFrame returns (or does it? not really sure how to extract my search DataFrame). This does output the table properly via data, columns
@app.callback(
    [Output(component_id='queryTable', component_property='data'),
     Output(component_id='queryTable', component_property='columns')],
    [Input(component_id='button', component_property='n_clicks')],
    [State('searchId', 'value')]
)
def update_Frame(n_clicks, value):
    if n_clicks > 0:
        with index.searcher() as searcher:
            parser = QueryParser("content", index.schema)
            myquery = parser.parse(value)
            results = searcher.search(myquery, limit=None)
            #print(results[0:10])
            print("Documents Containing ", value, ": ", len(results), "\n")
            global df
            df = pd.DataFrame([i['date'], i['site'], i['ticket'], i.score, i['docId'],i['content']] for i in results)
            df.columns=['Reported Date', 'Site','Ticket ID', 'Score', 'Document ID', 'Content']
            columns = [{'name': col, 'id': col} for col in df.columns]
            data = df.to_dict(orient='records')
            return data, columns
Now, if I had the DataFrame, I would pass it to another callback to manipulate and create figures. My attempt is to assign the global df in a new callback, but that does not work.
@app.callback(
    Output(component_id='tableGraph', component_property='figure'),
    [Input(component_id='graph', component_property='n_clicks')]
)
def updateFig(n_clicks):
    if n_clicks > 0:
        frame = df
        frame = frame.sort_values(by='Reported Date')
        #fig = px.line(df, x='Reported Date', y='Score', title=value)
        frame['Avg'] = frame['Score'].rolling(window=10).mean()
        # Test
        abc = frame.loc[frame['Site'] =='ABC']
        # real
        fig = go.Figure()
        fig.add_trace(go.Scatter(x=abc['Reported Date'], y=abc['Score'],
                                 mode='markers',
                                 marker_color='BLUE',
                                 name='ABC',
                                 text="Site: " + abc['Site'].map(str) + " " + "Ticket: "+ abc['Ticket ID'].map(str)))
        # There is a good bit more of figure trace stuff here, but I am shortening it.
        print(fig)
        return fig
It seems that Python is recognizing the correct frame, and when I print fig the console shows what looks to be the correct Dash object. However, no figure appears on the actual test website. My main question is: How can I pass a variable to a Dash object and ultimately a callback to create an initial DataFrame to pass to further Dash objects?
Thank you for reading a long question
You could use dcc.Store. The dcc.Store component works like a session based storage. For your case you would have two callbacks then.
First define the Store component in your Frontend section:
dcc.Store(id='memory')
The first callback outputs the generated data into the dcc.Store component:
@app.callback(Output('memory', 'data'), [Input('button', 'n_clicks')])
The second callback fetches the data from the store to show graphs, plots or anything else:
@app.callback(Output('queryTable', 'data'), [Input('memory', 'data')])
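As a rough illustration of what passes through the store: dcc.Store keeps its data as JSON, so a DataFrame is usually stored as a list of records and rebuilt on the other side. The sketch below imitates that round trip with the standard json module only. The function names and sample rows are made up for the example, and in the app the two function bodies would live inside the two callbacks.

```python
import json


def search_to_records(results):
    # Body of the first callback: turn search hits into JSON-friendly records.
    return [{"Reported Date": date, "Site": site, "Score": score}
            for date, site, score in results]


def records_to_sorted_rows(data):
    # Body of the second callback: read the stored records and sort them by date.
    return sorted(data, key=lambda row: row["Reported Date"])


hits = [("2021-03-02", "ABC", 1.5), ("2021-01-15", "XYZ", 2.0)]  # made-up sample hits

# The store serializes to JSON between callbacks; json.dumps/loads mimics that step.
stored = json.loads(json.dumps(search_to_records(hits)))
rows = records_to_sorted_rows(stored)
print([row["Site"] for row in rows])
```

With pandas, df.to_dict('records') and pd.DataFrame(data) play the same roles on either side of the store.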
If I understand correctly, your user input from dcc.Input is used as a search query and that generates your main dataframe lets say op_df.
Edit:
Not sure what exactly you are generating in your df, but here is some pseudocode to give you some pointers:
def generate_df(user_input):
    # Wrap the values in lists so pandas can build a one-row frame from them.
    created_dict = {'col': [user_input], 'value': [user_input * 3]}
    op_df = pd.DataFrame(created_dict)
    return op_df
Now to display this op_df, you can use the table trace from Plotly's graph_objects (go.Table). Here is the official documentation for go.Table. As an example, in your layout part, you would have:
dcc.Graph(
id='main_table',
figure=mn_table,
style = {'width':'50%', 'height':'30%'} #
)
And you can then generate mn_table as:
mn_table = go.Figure(data=[go.Table(
    header=dict(fill_color='white', line_color='black'),
    # Column names match the keys used in generate_df.
    cells=dict(values=[op_df['col'], op_df['value']],
               fill_color='white',
               align='left',
               font_size=16,
               line_color='black',
               height=25
    ))
])
Later in the callback you can pass in the user input and call the function(generate_df) that calculates or generates your op_df.
Edit2:
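Pseudocode for the callback:
@app.callback(Output('main_table', 'figure'),
              [Input('user_ip', 'value')])
def refresh_df(user_input):
    op_df = generate_df(user_input)
    # A figure output needs a figure, not a DataFrame, so wrap the frame in a go.Table.
    return go.Figure(data=[go.Table(
        header=dict(values=list(op_df.columns)),
        cells=dict(values=[op_df[c] for c in op_df.columns]),
    )])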
I am trying to use a Holoviz Panel dropdown widget value to query a dataframe. The dataframe however does not reflect the change in the dropdown value. I added a markdown widget to check if the change in the dropdown value is being captured - It seems to be. However, I can't figure out how to update the dataframe. I am a complete beginner to programming, just trying to learn. Any help is appreciated.
import pandas as pd
import panel as pn
pn.extension()
# Dataframe
df = pd.DataFrame({'CcyPair':['EUR/USD', 'AUD/USD' ,'USD/JPY'],
'Requester':['Client1', 'Client2' ,'Client3'],
'Provider':['LP1', 'LP2' ,'LP3']})
# Dropdown
a2 = pn.widgets.Select(options=list(df.Provider.unique()))
# Query dataframe based on value in Provider dropdown
def query(x=a2):
    y = pn.widgets.DataFrame(df[(df.Provider==x)])
    return y
# Test Markdown Panel to check if the dropdown change returns value
s = pn.pane.Markdown(object='')
# Register watcher and define callback
w = a2.param.watch(callback, ['value'], onlychanged=False)
def callback(*events):
    print(events)
    for event in events:
        if event.name == 'value':
            df1 = query(event.new)
            s.object = event.new
# Display Output
pn.Column(query, s)
Inspired by the self-answer, the following code produces a select box containing the list of providers and a dataframe filtered on that selection. It was tested on Panel version 0.13.1.
Note that the watch=True suggestion in the self-answer wasn't necessary.
import pandas as pd
import panel as pn
pn.extension()
# Dataframe
df = pd.DataFrame({
'CcyPair':['EUR/USD', 'AUD/USD' ,'USD/JPY'],
'Requester':['Client1', 'Client2' ,'Client3'],
'Provider':['LP1', 'LP2' ,'LP3']
})
# Dropdown
providers = list(df.Provider.unique())
select_widget = pn.widgets.Select(options=providers)
# Query dataframe based on value in Provider dropdown
@pn.depends(select_widget)
def query(x):
    filtered_df = pn.widgets.DataFrame(df[df.Provider==x])
    return filtered_df
# Display Output
pn.Column(select_widget, query)
Figured it out: it turned out I just needed to add @pn.depends above my query function. Once I added @pn.depends(a2, watch=True), the dataframe was filtered based on the a2 input. The callback and watcher were unnecessary.