Create an initial/update Pandas DataFrame with Dash Object - python

I am creating a word search app using Dash by Plotly. I have seen some similar questions, but none address my exact problem. I want a user to enter a query into a Dash component, in my case a dcc.Input, and have that input create a DataFrame (or a dt.DataTable, if someone can explain how to manipulate those further). Most of the examples on Dash's website use a pre-built DataFrame, and none show an @app.callback creating a DataFrame.
So... step by step where I am
Here is my app.layout. I want to pass an input that creates a DataFrame/table. Then, pass that resulting table to some graphs (starting with one for simplicity).
app.layout = html.Div([
html.H2('Enter a text query'),
html.H6('Searching multiple words will create an AND statement where \
\n |valve leak| will return records with valve and leak. Or, \
\n you can use " " to search for specific phrases like "valve leak".'),
dcc.Input(id='searchId', value='Enter Search Term', type='text'),
html.Button('Submit', id='button', n_clicks=0),
dcc.Graph(id='tableGraph', figure='fig'),
html.Button('Update Graph', id='graph', n_clicks=0),
dt.DataTable(style_cell={
'whiteSpace': 'normal',
'height': 'auto',
'textAlign': 'left'
}, id='queryTable',
)
])
Here is the first search callback. Right now, I am attempting to use a global df to 'export' the DataFrame from the function. A problem is that Dash does not really allow DataFrame returns (or does it? not really sure how to extract my search DataFrame). This does output the table properly via data, columns
@app.callback(
    [Output(component_id='queryTable', component_property='data'),
     Output(component_id='queryTable', component_property='columns')],
    [Input(component_id='button', component_property='n_clicks')],
    [State('searchId', 'value')]
)
def update_Frame(n_clicks, value):
    if n_clicks > 0:
        with index.searcher() as searcher:
            parser = QueryParser("content", index.schema)
            myquery = parser.parse(value)
            results = searcher.search(myquery, limit=None)
            #print(results[0:10])
            print("Documents Containing ", value, ": ", len(results), "\n")
            global df
            df = pd.DataFrame([i['date'], i['site'], i['ticket'], i.score, i['docId'],i['content']] for i in results)
            df.columns=['Reported Date', 'Site','Ticket ID', 'Score', 'Document ID', 'Content']
            columns = [{'name': col, 'id': col} for col in df.columns]
            data = df.to_dict(orient='records')
        return data, columns
Now, if I had the DataFrame, I would pass it to another callback to manipulate and create figures. My attempt is to assign the global df in a new callback, but that does not work.
@app.callback(
    Output(component_id='tableGraph', component_property='figure'),
    [Input(component_id='graph', component_property='n_clicks')]
)
def updateFig(n_clicks):
    if n_clicks > 0:
        frame = df
        frame = frame.sort_values(by='Reported Date')
        #fig = px.line(df, x='Reported Date', y='Score', title=value)
        frame['Avg'] = frame['Score'].rolling(window=10).mean()
        # Test
        abc = frame.loc[frame['Site'] =='ABC']
        # real
        fig = go.Figure()
        fig.add_trace(go.Scatter(x=abc['Reported Date'], y=abc['Score'],
                                 mode='markers',
                                 marker_color='BLUE',
                                 name='ABC',
                                 text="Site: " + abc['Site'].map(str) + " " + "Ticket: "+ abc['Ticket ID'].map(str)))
        # There is a good bit more of figure trace stuff here, but I am shortening it.
        print(fig)
        return fig
It seems that Python is recognizing the correct frame, and when I print fig the console shows what looks to be the correct Dash object. However, no figure appears on the actual test website. My main question is: How can I pass a variable to a Dash object and ultimately a callback to create an initial DataFrame to pass to further Dash objects?
Thank you for reading a long question

You could use dcc.Store, which keeps JSON-serializable data in the user's browser so that callbacks can share it. In your case you would then have two callbacks.
First, define the Store component in your layout:
dcc.Store(id='memory')
In the first callback, output the generated data into the dcc.Store component:
@app.callback(Output('memory', 'data'), [Input('button', 'n_clicks')])
In the second callback, fetch the data from the store to show graphs, plots, or anything else:
@app.callback(Output('queryTable', 'data'), [Input('memory', 'data')])
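A minimal sketch of this two-callback pattern, under stated assumptions: run_search and its sample records are hypothetical stand-ins for the Whoosh search in the question, and the figure is built as a plain dict instead of with go.Figure so that the helpers need nothing beyond the standard library. The store holds the records as a list of dicts, the same shape that df.to_dict(orient='records') produces.

```python
from typing import Any

# Hypothetical sample data standing in for the Whoosh index in the question.
_FAKE_INDEX = [
    {"Reported Date": "2020-01-02", "Site": "ABC", "Ticket ID": 2, "Score": 1.5, "Content": "valve leak"},
    {"Reported Date": "2020-01-01", "Site": "ABC", "Ticket ID": 1, "Score": 2.5, "Content": "valve stuck"},
    {"Reported Date": "2020-01-03", "Site": "XYZ", "Ticket ID": 3, "Score": 0.5, "Content": "pump leak"},
]


def run_search(query: str) -> list[dict[str, Any]]:
    """Return the records whose content contains every word of the query."""
    words = query.lower().split()
    return [r for r in _FAKE_INDEX if all(w in r["Content"].lower() for w in words)]


def records_to_figure(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a scatter figure (as a plain dict) from stored records."""
    ordered = sorted(records, key=lambda r: r["Reported Date"])
    trace = {
        "type": "scatter",
        "mode": "markers",
        "x": [r["Reported Date"] for r in ordered],
        "y": [r["Score"] for r in ordered],
    }
    return {"data": [trace], "layout": {"title": {"text": "Score by date"}}}


def build_app():
    """Wire the helpers into a Dash app.

    Dash is imported here so the helpers above can be used without it.
    """
    from dash import Dash, Input, Output, State, dash_table, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Input(id="searchId", type="text"),
        html.Button("Submit", id="button", n_clicks=0),
        dcc.Store(id="memory"),
        dash_table.DataTable(id="queryTable"),
        dcc.Graph(id="tableGraph"),
    ])

    # First callback: run the search and put the records into the store.
    @app.callback(Output("memory", "data"), Input("button", "n_clicks"),
                  State("searchId", "value"), prevent_initial_call=True)
    def store_results(n_clicks, value):
        return run_search(value or "")

    # Second callback: read the records back and feed the table and the graph.
    @app.callback(Output("queryTable", "data"), Output("queryTable", "columns"),
                  Output("tableGraph", "figure"), Input("memory", "data"),
                  prevent_initial_call=True)
    def show_results(records):
        columns = [{"name": k, "id": k} for k in records[0]] if records else []
        return records, columns, records_to_figure(records)

    return app
```

Calling build_app().run(debug=True) (run_server in older Dash releases) would start the app. Inside show_results, pd.DataFrame(records) would rebuild a DataFrame if pandas operations such as sorting or rolling means are needed before plotting.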

If I understand correctly, the user input from dcc.Input is used as a search query that generates your main DataFrame, let's say op_df.
Edit:
I am not sure what exactly you are generating in your df, but here is some pseudocode to give you some pointers:
def generate_df(user_input):
    # wrap the values in lists so pandas can build a one-row DataFrame
    created_dict = {'col': [user_input], 'value': [user_input * 3]}
    op_df = pd.DataFrame(created_dict)
    return op_df
Now, to display op_df, you can use the table trace from plotly's graph_objects (go.Table); see the official documentation for tables. As an example, in your layout you would have:
dcc.Graph(
id='main_table',
figure=mn_table,
style = {'width':'50%', 'height':'30%'} #
)
And you can then generate mn_table as:
mn_table = go.Figure(data=[go.Table(
    header=dict(fill_color='white', line_color='black'),
    # column names match the ones created in generate_df
    cells=dict(values=[op_df['col'], op_df['value']],
               fill_color='white',
               align='left',
               font_size=16,
               line_color='black',
               height=25
    ))
])
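Later, in the callback, you can pass in the user input and call the function (generate_df) that computes your op_df.
Edit2:
Pseudocode for the callback:
@app.callback(Output('main_table', 'figure'),
              [Input('user_ip', 'value')])
def refresh_df(user_input):
    new_df = generate_df(user_input)
    # a 'figure' output needs a figure, not a DataFrame, so wrap it in a table trace
    new_table = go.Figure(data=[go.Table(
        cells=dict(values=[new_df['col'], new_df['value']]))])
    return new_table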

Related

KeyError warning from pandas dataframe inside plotly dash chained callback

I got multiple dropdowns that I'd like to populate depending on what the user chooses in the previous dropdown. I populate the first dropdown with:
schools_requests = requests.get("http://ipwhatever:portwhatever/list_all_schools")
schools_data = schools_requests.json()
df = pd.DataFrame(schools_data)
nome = df['nome'].tolist()
It gives me the names of the schools I got listed. I then send it (nome) to the first dropdown like this:
html.Label('Escola'),
dcc.Dropdown(
options = nome,
id = "escola",
)
The first callback works fine and it's the one down below:
@callback(
    Output('id_school', 'children'),
    Input('escola','value')
)
def find_id_school(school_name):
    all_schools = requests.get(
        "http://ipwhatever:portwhatever/list_all_schools")
    all_schools_data = all_schools.json()
    for element in all_schools_data:
        if school_name == element['nome']:
            id_school = element['id_escola']
            return id_school
It basically searches for the corresponding school id given the name of the school the user chose in the first dropdown and stores this id in a hidden html.Div.
Now comes the second callback, where I use pandas and don't understand why it's different from the first time.
@callback(
    Output('ano', 'options'),
    Input('id_school', 'children')
)
def render_grade_from_school(chosen_id):
    grade = requests.get(
        "http://ipwhatever:portwhatever/grade?school_id="+str(chosen_id))
    grade_data = grade.json()
    indices = list(range(0,len(grade_data)))
    df = pd.DataFrame(grade_data, index=indices)
    ano = df['serie'].tolist()
    return ano
So it takes the school id, requests the grades from another endpoint and basically does the same thing as the first time I used pandas in the code.
The only difference is the index argument. pandas started complaining about the lack of an index, so I check the length of the list of JSON objects, generate a list of indices like [0, 1, 2, ...], and pass it as the index argument to the DataFrame. That stopped the complaint.
But now... I get a KeyError: 'serie'. The warning highlights this line as the source: return self._engine.get_loc(casted_key). I don't know why. Still, the 'ano' (grade) dropdown updates correctly and shows the grades. But the warning never goes away.
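One possible explanation, offered as an assumption because the question does not show the endpoint's responses: Dash runs callbacks once on page load, when id_school is still empty, so the request goes out with school_id=None and may return something other than a list of grade records. pandas raises the "must pass an index" error for a dict of scalar values, and such a payload would also have no 'serie' column, while later calls with a real id succeed. The sketch below uses a hypothetical helper, series_options, to check the payload's shape before reading 'serie'.

```python
from typing import Any, Optional


def series_options(grade_data: Any) -> Optional[list]:
    """Return the 'serie' values when the payload is a list of records.

    Returns None when the payload has another shape (for example an
    error object), so the caller can skip the update.
    """
    if not isinstance(grade_data, list):
        return None
    if not all(isinstance(row, dict) and "serie" in row for row in grade_data):
        return None
    return [row["serie"] for row in grade_data]


# Inside the callback this could be used as follows
# (PreventUpdate comes from dash.exceptions):
#     if chosen_id is None:
#         raise PreventUpdate
#     options = series_options(grade.json())
#     if options is None:
#         raise PreventUpdate
#     return options
```

With this check in place the DataFrame and the hand-built index are no longer needed for this callback, since the dropdown options are just the list of 'serie' values.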

Creating several input boxes from df in Python - smartly

I'm trying to create an input form based on an Excel spreadsheet.
I use the spreadsheet to create a DataFrame (which has 30 "products" listed).
I need to create a series of input boxes for each product on the list.
Currently I do this in a very inefficient way:
product_1 = (ipw.Dropdown(options=barrier_list['Product Name'],
value = barrier_list['Product Name'][0],
description= barrier_list['ISIN'][0],
disabled=False,
layout = {'width':'350px'}))
product_1.style.description_width = 'initial'
units_1 = (ipw.IntText(value=for_table['Units'][0],
description='Units:',
disabled=False,
layout = {'width':'200px'}
))
price_1 = (ipw.FloatText(value=for_table['Price'][0],
description='Price:',
disabled=False,
layout = {'width':'200px'}
))
value_1 = (ipw.FloatText(value=0,
description='Value:',
disabled=False,
layout = {'width':'200px'}
))
HBox_1 = ipw.HBox([product_1,units_1,price_1, value_1])
Which creates exactly what I need for one line of the input sheet. To do the next line I copy this exact code again and change all the [0] to [1]. This goes on 30x.
I know this is a terrible way to do it but I cannot figure out how to use a loop to create the 30 lines (1 per product) of input boxes.
The guidance worked. My solution was simple and along the lines of :
dd_list = []
for i in range(len(barrier_list['Product Name'])):
    dropdown = widgets.Text(description=barrier_list['ISIN'][i],
                            value=barrier_list['Product Name'][i])
    dd_list.append(dropdown)
VBox1 = widgets.VBox(dd_list)
VBox1
Just need to create the other boxes and add to the VBox.

Python Dash Callback is not updating Data when I select Dropdown Values

Can anyone let me know why my code is not updating the graph with data when I select drop-down value? (entire GitHub code in link in comments below)
def filterPollutants(selected_pollutants):
    if selected_pollutants:
        dff = df.loc[df.pollutant_abb.isin([selected_pollutants])]
    else:
        dff = df
    bar_fig = {
        "data": [
            go.Bar(
                x=dff["U_SAMPLE_DTTM"],
                y=dff["DISPLAYVALUE"],
            )
        ],
        "layout": go.Layout(
            title="Sampling for Local Limits",
            # yaxis_range=[0,2],
            yaxis_title_text="Metals mg/L",
        ),
    }
From testing the code at the Github link you shared I think the problem is in this line where you filter your data set in the callback:
dff = df.loc[df.pollutant_abb.isin([selected_pollutants])]
The problem with this line is that the value of selected_pollutants inside the callback is of the form ['SELECTED_VALUE_FROM_DROPDOWN'] and not 'SELECTED_VALUE_FROM_DROPDOWN'. This is because your dropdown has multi set to True.
But because of this your isin filter doesn't work, since you're essentially doing this:
dff = df.loc[df.pollutant_abb.isin([['SELECTED_VALUE_FROM_DROPDOWN']])]
instead of this:
dff = df.loc[df.pollutant_abb.isin(['SELECTED_VALUE_FROM_DROPDOWN'])]
So the fix could be to remove the list surrounding selected_pollutants:
dff = df.loc[df.pollutant_abb.isin(selected_pollutants)]
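A small sketch of the corrected filter, assuming made-up sample data in place of the real data set. It also accepts a plain string, in case the dropdown is later switched back to single selection; that extra branch is an addition for illustration, not part of the original code.

```python
import pandas as pd

# Illustrative data only; the column names follow the question.
df = pd.DataFrame({
    "pollutant_abb": ["Cu", "Zn", "Pb", "Cu"],
    "DISPLAYVALUE": [0.1, 0.4, 0.2, 0.3],
})


def filter_pollutants(selected):
    """Filter df by a dropdown value that may be None, a string, or a list."""
    if not selected:
        # Nothing selected (None or an empty list): show everything.
        return df
    if isinstance(selected, str):
        # Single-select dropdowns pass a bare string; isin needs a list-like.
        selected = [selected]
    # Multi-select dropdowns already pass a list, so use it directly.
    return df.loc[df["pollutant_abb"].isin(selected)]
```

For example, filter_pollutants(["Cu", "Pb"]) keeps the three Cu and Pb rows, and filter_pollutants(None) returns the full frame.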

Python (Datapane) : How to pass dynamic variables into a datapane report function

I am working on a charting module to which I can pass a DataFrame; the module creates reports based on plots generated by calling a few functions, as shown below.
I am using Altair for plotting and Datapane for creating the report. The Datapane documentation can be found here: https://datapane.github.io/datapane/
My DataFrame looks like this
d = {'Date': ['2021-01-01', '2021-01-01','2021-01-01','2021-01-01','2021-01-02','2021-01-03'],
'country': ['IND','IND','IND','IND','IND','IND' ],
'channel': ['Organic','CRM','Facebook','referral','CRM','CRM' ],
'sessions': [10000,8000,4000,2000,7000,6000 ],
'conversion': [0.1,0.2,0.1,0.05,0.12,0.11 ],
}
country_channel = pd.DataFrame(d)
Plotting functions :
def plot_chart(source,Y_axis_1,Y_axis_2,chart_caption):
    base = alt.Chart(source).encode(
        alt.X('Date:T', axis=alt.Axis(title="Date"))
    )
    line_1 = base.mark_line(opacity=1, color='#5276A7').encode(
        alt.Y(Y_axis_1,
              axis=alt.Axis( titleColor='#5276A7'))
    )
    line_2 = base.mark_line(opacity=0.3,color='#57A44C', interpolate='monotone').encode(
        alt.Y(Y_axis_2,
              axis=alt.Axis( titleColor='#57A44C'))
    )
    chart_ae=alt.layer(line_1, line_2).resolve_scale(
        y = 'independent'
    ).interactive()
    charted_plot = dp.Plot(chart_ae , caption=chart_caption)
    return charted_plot
def channel_plot_split(filter_1,filter_2,country,channel):
    channel_split_data = country_channel[(country_channel[filter_1]==country.upper())]
    channel_split_data =channel_split_data[(channel_split_data[filter_2].str.upper()==channel.upper())]
    channel_split_data=channel_split_data.sort_values(by='Date',ascending = True)
    channel_split_data=channel_split_data.reset_index(drop=True)
    channel_split_data.head()
    plot_channel_split = plot_chart(source=channel_split_data,Y_axis_1='sessions:Q',Y_axis_2='conversion:Q',chart_caption="Sessions-Conversion Plot for Country "+country.upper()+" and channel :"+ channel)
    channel_plot=dp.Group(dp.HTML("<div class='center'> <h3> Country : "+country.upper()+" & Channel : "+channel.upper()+"</h3></div>"),plot_channel_split,rows=2)
    return channel_plot
def grpplot(plot_1,plot_2):
    gp_plot = dp.Group(plot_1,plot_2,columns=2)
    return gp_plot
The above functions when called, will filter the dataframe, create plot for each filters and group 2 plots in a row.
row_1 = grpplot(channel_plot_split('country','channel','IND','Organic'),channel_plot_split('country','channel','IND','CRM'))
row_2 = grpplot(channel_plot_split('country','channel','IND','Facebook'),channel_plot_split('country','channel','IND','referral'))
I can now generate a report by calling datapane.Report() function as follows
r= dp.Report(row_1,row_2)
Problem: This works fine when I know how many channels are present, but my channel list is dynamic. I am thinking of using a "for" loop to generate the rows, but I am not sure how to pass these rows as arguments to the dp.Report() function. For example, if I have 10 channels, I need to pass 10 rows dynamically.
I had a similar problem and solved it as follows
Create a list to store the pages or elements of the report, such as
report_pages=[]
report_pages.append(dp.Page)
report_pages.append(dp.Table)
report_pages.append(dp.Plot)
At the end, generate the report by unpacking the list with *:
dp.Report(*report_pages)
In your case, I think you can do the following
create a list
rows=[]
add the rows to the list
rows.append(row_1)
rows.append(row_2)
and then create the report with
r= dp.Report(*rows)
I found this solution on datapane's GitHub and in this notebook in the last line of code.
So here is how I solved this problem.
channel_graph_list=[]
for i in range(0,len(unique_channels),1):
    channel_1_name = unique_channels[i]
    filtered_data = filter_the_data(source=channel_data,filter_1='channel',fv_1=channel_1_name)
    get_chart = plot_chart(filtered_data,Y_axis_1='sessions:Q',Y_axis_2='conversion:Q',chart_title='Session & Conv. Chart for '+channel_1_name)
    #This is where the trick starts - The below code creates a dynamic variable
    vars() ["channel_row_"+str(i)] = get_chart
    channel_graph_list.append("dp.Plot(channel_row_"+str(i)+",label='"+channel_1_name+"')")
#convert the list to a string
channel_graph_row = ','.join(channel_graph_list)
# assign the code you want to run
code="""channel_graph = dp.Select(blocks=["""+channel_graph_row+ """],type=dp.SelectType.TABS)"""
#execute the code
exec(code)
Hope the above solution helps others looking to pass dynamically generated parameters into any function.
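The same result can usually be reached without exec or dynamically named variables: build the objects in a list and unpack the list with * at the call site. The sketch below shows only that mechanism. make_plot, group and report are hypothetical stand-ins for channel_plot_split, dp.Group and dp.Report, and the channel names are illustrative, so the code runs without Datapane installed.

```python
def make_plot(channel: str) -> str:
    """Hypothetical stand-in for channel_plot_split(...); returns a label."""
    return f"plot:{channel}"


def group(*blocks: str) -> tuple:
    """Hypothetical stand-in for dp.Group(..., columns=2)."""
    return blocks


def report(*rows: tuple) -> list:
    """Hypothetical stand-in for dp.Report(*rows)."""
    return list(rows)


# Illustrative channel list; in practice this would come from the data.
channels = ["Organic", "CRM", "Facebook", "referral", "Email"]

# One plot per channel, however many channels there are.
plots = [make_plot(c) for c in channels]

# Pair the plots up, two per row; an odd count leaves a final row of one.
rows = [group(*plots[i:i + 2]) for i in range(0, len(plots), 2)]

# The * operator passes each list element as a separate positional argument.
result = report(*rows)
```

Since the exec string above passes blocks=[...] to dp.Select, that argument is itself a list, so a list of dp.Plot objects built in the loop could be passed to it directly.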

Plotly Dash figure/graph/callback: Where do I go from here?

I get the following error, and looking at my code I can't figure out why?
I've tried applying to_json/to_dict('records') to my return fig object but it doesn't seem to fix it.
Any help would be appreciated...
dash.exceptions.InvalidCallbackReturnValue: The callback for <Output bar_line_1.figure>
returned a value having type Figure
which is not JSON serializable.
......
...
.
The value in question is either the only value returned,
or is in the top level of the returned list,
........
...
.
In general, Dash properties can only be
dash components, strings, dictionaries, numbers, None,
or lists of those.
@app.callback(
    Output('bar_line_1', 'figure'),
    [Input('region', 'value')],
    [Input('countries', 'value')],
    [Input('select_years', 'value')]
)
def update_graph(region, countries, select_years):
    mask = (
        (data['Region'] == region)
        & (data['Country'] == countries)
        & (data['Year'] >= select_years[0])
        & (data['Year'] <= select_years[1])
    )
    filtered_data = data.loc[mask, :]
    fig = make_subplots(specs=[[{'secondary_y': True}]])
    fig.add_trace(
        go.Scatter(
            x=filtered_data['Year'],
            y=filtered_data['GDP'],
            name='GDP'
        ),
        secondary_y=False,
    )
    fig.add_trace(
        go.Scatter(
            x=filtered_data['Year'],
            y=filtered_data['FDI'],
            name='FDI'
        ),
        secondary_y=True
    )
    return fig
The problem came from a different .py file that I use to clean my API data before importing it into my main.py.
One column (Year) had become object dtype when it was meant to be an integer; I had been trying to find the error for days.
Converting it with df['Year'] = df['Year'].astype(int) resolved the issue, and the graph now works through the callback with no other change needed.
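The conversion described above can be sketched as follows, using made-up data in which the Year values arrive as strings. Note that astype returns a new Series, so the result has to be assigned back to the column.

```python
import pandas as pd

# Illustrative data: 'Year' arrives as strings rather than integers.
data = pd.DataFrame({"Year": ["2001", "2002", "2003"], "GDP": [1.0, 1.2, 1.5]})

# astype returns a new Series, so assign it back to the column.
data["Year"] = data["Year"].astype(int)

# Range comparisons like those in the callback's mask now compare integers.
select_years = [2002, 2003]
mask = (data["Year"] >= select_years[0]) & (data["Year"] <= select_years[1])
filtered = data.loc[mask]
```

Checking data.dtypes right after loading or cleaning the data is a quick way to catch a column that has silently changed type.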
