I'm making a multi-page dash application that I plan to host on a server using Gunicorn and Nginx. It will access a PostgreSQL database on an external server over the network.
The data on one of the pages is obtained by a query to the database and should be refreshed every 30 seconds. I trigger the update through a callback driven by dcc.Interval.
My code (simplified version):
from dash import Dash, html, dash_table, dcc, Input, Output, callback
import dash_bootstrap_components as dbc
from flask import Flask
import pandas as pd
from random import random
server = Flask(__name__)
app = Dash(__name__, server=server, suppress_callback_exceptions=True, external_stylesheets=[dbc.themes.BOOTSTRAP])
app.layout = html.Div([
    dcc.Interval(
        id='interval-component-time',
        interval=1000,  # fires every second, for the countdown text
        n_intervals=0
    ),
    html.Br(),
    html.H6(id='time_update'),
    dcc.Interval(
        id='interval-component-table',
        interval=30 * 1000,  # fires every 30 seconds, for the table refresh
        n_intervals=0
    ),
    html.Br(),
    html.H6(id='table_update')
])
@callback(
    Output('time_update', 'children'),
    Input('interval-component-time', 'n_intervals')
)
def time_update(n_intervals):
    time_show = 30
    text = "Next update in {} sec".format(time_show - (n_intervals % 30))
    return text
@callback(
    Output('table_update', 'children'),
    Input('interval-component-table', 'n_intervals')
)
def data_update(n_intervals):
    # here a separate module queries the database and returns a dataframe;
    # below is a simplified stand-in df
    col = ["Col1", "Col2", "Col3"]
    data = [[random(), random(), random()]]
    df = pd.DataFrame(data, columns=col)
    return dash_table.DataTable(df.to_dict('records'),
                                style_cell={'text-align': 'center', 'margin-bottom': '0'},
                                style_table={'width': '500px'})
if __name__ == '__main__':
    server.run(port=5000, debug=True)
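For the Gunicorn deployment mentioned at the top, the Flask instance server is the WSGI entry point. A minimal launch command might look like this (assuming the module is named app.py; the worker count and port are illustrative):

gunicorn app:server --workers 4 --bind 127.0.0.1:8000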
Locally everything works fine for me; the load on the database is small, since one such query loads one of eight CPU cores to 30% for about 3 seconds.
But if I open my application in several browser windows, the same data is displayed in each window by separate queries to the database fired at different times, so the load doubles. I am worried that with more than 10 users connected, the database server will not cope and will slow down badly, while the database needs to keep working without delays and must not go down.
Question:
Is it possible to make the refresh identical across connections? That is, so that the data is updated at the same moment for all users, using only a single query to the database.
I studied everything about the callback in the documentation and did not find an answer.
Solution
Thanks for the advice, @Epsi95! I studied the Dash Performance page and added this to my code:

from flask_caching import Cache

cache = Cache(app.server, config={
    'CACHE_TYPE': 'filesystem',
    'CACHE_DIR': 'cache-directory',
    'CACHE_THRESHOLD': 50
})
@cache.memoize(timeout=30)
def query_data():
    # here I make a query to the database and save the result in a dataframe
    return df

def dataframe():
    df = query_data()
    return df
And in the @callback function I call the dataframe() function.
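For completeness, a minimal sketch of that callback (assuming the cache setup above and that query_data() returns the dataframe from the database). Because the filesystem cache is shared between Gunicorn workers, at most one query hits the database per 30-second window, no matter how many users are connected:

@callback(
    Output('table_update', 'children'),
    Input('interval-component-table', 'n_intervals')
)
def data_update(n_intervals):
    # dataframe() goes through the memoized query_data(), so all users
    # within the same 30-second window are served from the cache
    df = dataframe()
    return dash_table.DataTable(df.to_dict('records'),
                                style_cell={'text-align': 'center', 'margin-bottom': '0'},
                                style_table={'width': '500px'})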
Everything works the way I needed it. Thank you!
Related
I am developing a Dash application that has a file upload feature. The files are large (at minimum around 100 MB), so to support that I have set max_size=-1 (no file size limit).
Below is code:
dcc.Upload(
    id="upload_dataset",
    children=html.Div(
        [
            "Drag and Drop or ",
            html.A(
                "Select File",
                style={
                    "font-weight": "bold",
                },
                title="Click to select file.",
            ),
        ]
    ),
    multiple=False,
    max_size=-1,
)
The uploaded files are saved on the server side. The dcc.Upload component has a contents attribute that holds the entire file as a base64-encoded string. While researching, I learned that before the data is sent to the server, contents is also held in the web browser's memory.
Problem: for small files, storing contents in browser memory may be fine, but with files as large as mine the browser may crash and the app may freeze.
Is there any way to bypass this default behavior? I would like to send the file in chunks or as a stream.
How to achieve this in dash using dcc.upload component or any other way?
You can use the dash-uploader library. It allows you to directly transfer data from the browser to the server hard drive, so you don't face any file size issues.
This library is a hobby project of the maintainer, so it might not be the most production-worthy library. That said, I tested it today and it seems stable enough; I even got it working with a Dash app that runs on an AWS Lambda.
Visit the more extensive documentation to get started with the library.
Here is a short code example to get you started with a local version.
requirements.txt
install with pip install -r requirements.txt
dash==2.8.1
dash-uploader==0.7.0a1
packaging==21.3
app.py
Copy the code into a file app.py and run the file. It runs as-is.
import pprint
from pathlib import Path
import os
import uuid

import dash_uploader as du
import dash
from dash import Output, html

app = dash.Dash(__name__)

UPLOAD_FOLDER_ROOT = Path("./tmp") / "uploads"
du.configure_upload(
    app,
    str(UPLOAD_FOLDER_ROOT),
    use_upload_id=True,
)
def get_upload_component(id):
    return du.Upload(
        id=id,
        max_file_size=50,  # 50 MB
        chunk_size=4,  # 4 MB
        filetypes=["csv", "json", "txt", "xlsx", "xls", "png"],
        upload_id=uuid.uuid1(),  # unique session id
    )
def get_app_layout():
    return html.Div(
        [
            html.H1("Demo"),
            html.Div(
                children=[
                    get_upload_component("upload_data"),
                    html.Div(
                        id="upload_output",
                    ),
                ],
                style={  # wrapper div style
                    "textAlign": "center",
                    "width": "600px",
                    "padding": "10px",
                    "display": "inline-block",
                },
            ),
        ],
        style={
            "textAlign": "center",
        },
    )

# get_app_layout is a function.
# This way we can use unique session ids as upload_ids.
app.layout = get_app_layout
@du.callback(
    output=Output("upload_output", "children"),
    id="upload_data",
)
def callback_on_completion(status: du.UploadStatus):
    """Has some print statements to get you started in understanding the status
    object and how to access the file location of your uploaded file."""
    pprint.pprint(status.__dict__)
    print(f"Contents of {UPLOAD_FOLDER_ROOT}:\n{os.listdir(UPLOAD_FOLDER_ROOT)}")
    upload_id_folder = Path(status.uploaded_files[0]).parent
    print(f"Current upload_id: {upload_id_folder.name}")
    print(
        f"Contents of subfolder {upload_id_folder.name}:\n{os.listdir(upload_id_folder)}"
    )
    return html.Ul([html.Li(str(x)) for x in status.uploaded_files])

if __name__ == "__main__":
    app.run(debug=True)
I am making my first Dash application. I am creating a layout that will contain a dash_table, but at load time the table will be empty, since it only populates once the user selects an option.
I have tried setting the dash table to {} and None, but when I do, the page will not load. How can I have an empty table as part of my layout when the page loads?
You need to provide at least the column definitions to create an empty DataTable. You can leave the data attribute empty; here is a minimal working example:
from dash import Dash, dash_table, html

app = Dash(__name__)

app.layout = html.Div([
    dash_table.DataTable(id="table_infos",
                         columns=[
                             {'id': "Intitulé", 'name': "Intitulé"},
                             {'id': "Donnée", 'name': "Donnée"}
                         ])
])

if __name__ == '__main__':
    app.run_server(debug=True)
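To fill it later, a callback can target the table's data property. A minimal sketch, where the dropdown id 'option_selector' and the row contents are hypothetical placeholders:

from dash import Input, Output, callback

@callback(
    Output("table_infos", "data"),
    Input("option_selector", "value")  # hypothetical dropdown id
)
def fill_table(option):
    if option is None:
        return []  # table stays empty until the user selects an option
    # hypothetical row; in practice, build the rows from the selection
    return [{"Intitulé": "Exemple", "Donnée": option}]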
I have a dash table that allows editing. I want to sort the table by a column so that when the user inputs data, the table is re-sorted right away. I achieve this as on the page https://dash.plotly.com/datatable/callbacks; the sort is already set when the page loads. I got stuck on the last step, where I want to hide the sort option from the user. Is that possible?
Example in the image: I want to remove the arrows marked in yellow but keep the sort by column 'pop'.
Edited code example from https://dash.plotly.com/datatable/callbacks:
import dash
from dash.dependencies import Input, Output
import dash_table
import pandas as pd

app = dash.Dash(__name__)

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')

PAGE_SIZE = 5

app.layout = dash_table.DataTable(
    id='table-multicol-sorting',
    columns=[
        {"name": i, "id": i} for i in sorted(df.columns)
    ],
    data=df.to_dict('records'),
    page_size=PAGE_SIZE,
    sort_action='native',
    sort_mode='multi',
    sort_as_null=['', 'No'],
    sort_by=[{'column_id': 'pop', 'direction': 'asc'}],
    editable=True,
)

if __name__ == '__main__':
    app.run_server(debug=True)
You can target the sort element and hide it with CSS like this:

span.column-header--sort {
    display: none;
}

You can put that rule in a CSS file in your assets directory, for example. See the documentation for more information about the ways to include styles in a Dash app.
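As a sketch, the project layout would look like this, assuming a stylesheet named custom.css (the filename is arbitrary; Dash automatically loads every .css file found in assets/):

app.py
assets/
    custom.css    <- contains the span.column-header--sort rule above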
I was able to do it with sort_action='none' in Dash v1.16.2.
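A minimal sketch of that approach, reusing the df from the example above. Note that sort_action='none' removes the native sorting UI entirely, so the initial order has to be applied to the data itself:

app.layout = dash_table.DataTable(
    id='table-presorted',
    columns=[{"name": i, "id": i} for i in sorted(df.columns)],
    # pre-sort in pandas, since the native sort controls are disabled
    data=df.sort_values('pop').to_dict('records'),
    sort_action='none',  # no sort arrows are rendered
    editable=True,
)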
I have a Dash application that lets the user filter a pandas dataframe, which results in a graph. They can also download a CSV of the filtered dataframe. This is accomplished by passing arguments to the URL, retrieving the arguments using flask.request.args, refiltering the dataframe, and writing the output to a file.
While working on this solution, I added print statements to help me track variable values. Although the download link works with the desired result, I came across some behavior that I don't fully understand. I think it may have something to do with @app.server.route and when/how it is executed.
For example:
The print statements are not always executed; they only run sometimes.
They seem to run more often once I apply filters, rather than when I click the download link with no filters applied.
After applying a filter, clicking the download link, and confirming that this caused the print statements to execute, reloading the app and applying the same filter may result in the print statements not executing.
The download link always performs as intended, but I do not understand how the dataframe is being filtered and written via @app.server.route('/download_csv') while the print statements are skipped.
UPDATE
I have produced an MRE for Jupyter notebooks. Please note that you must pip install jupyter-dash for this code to execute.
Some notes:
I ran three tests where I would click the download CSV link 10x each.
In the first two tests, the print statements executed 8/10 times.
In the final test, they executed 3/10 times.
In the third test, I cleared the age dropdown and performed most of the tests with it as 'Null'. Sometimes the print statements executed and printed 'Null', but most of the time they did not run.
The MRE is below:
Update 2
After waiting 24 hours and running the code again, all of the lines in @app.server.route seem to be executing. That is, until I click the download button quickly after changing filters; when this happens, I can get the print statements not to execute even though the other lines do. Some guesses as to what is going on:
When the print statements don't execute, it seems that a previous version of the code is being executed. Perhaps it is stored in some temporary memory location?
It seems that restarting the system or long periods of inactivity cause the current version of the code to become the default when print statements don't execute.
print statements seem to execute less frequently after quick filter changes and download requests.
import io

import flask
import numpy as np
import pandas as pd
import plotly.express as px
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

# Load Data
df = pd.DataFrame({'id': np.arange(100),
                   'name': ['cat', 'dog', 'mouse', 'bird', 'rabbit'] * 20,
                   'age': np.random.randint(1, 30, size=100)})
# Build App
app = JupyterDash(__name__)

app.layout = html.Div([
    html.H1("JupyterDash Demo"),
    dcc.Graph(id='graph'),
    html.Label([
        "names",
        dcc.Dropdown(
            id='names', clearable=False,
            value='cat', options=[
                {'label': names, 'value': names}
                for names in df.name.unique()
            ])
    ]),
    html.Label([
        "age",
        dcc.Dropdown(
            id='age', clearable=True,
            options=[
                {'label': att, 'value': att}
                for att in df.age.unique()
            ])
    ]),
    html.Br(),
    html.A(
        "Download CSV",
        id="download_csv",
        href="#",
        className="btn btn-outline-secondary btn-sm"
    )
])
# Define callback to update graph
@app.callback(
    [Output('graph', 'figure'),
     Output('download_csv', 'href')],
    [Input("names", "value"),
     Input('age', 'value')]
)
def update_figure(names, age):
    if not names:
        names = 'Null'
        fil_df = df
    else:
        fil_df = df[df['name'].isin([names])]
    fig = px.bar(
        fil_df, x='id', y='age',
        title="Animals"
    )
    if not age:
        age = 'Null'
    else:
        fil_df = fil_df[(fil_df['age'].isin([age]))]
        fig = px.bar(
            fil_df, x='id', y='age', title="Animals"
        )
    return fig, "/download_csv?value={}/{}".format(names, age)

app.run_server(mode='inline')
@app.server.route('/download_csv')
def download_csv():
    value = flask.request.args.get('value')
    value = value.split('/')
    selected_1 = value[0]
    selected_2 = value[1]
    print(selected_1)
    print(selected_2)
    str_io = io.StringIO()
    df.to_csv(str_io)
    mem = io.BytesIO()
    mem.write(str_io.getvalue().encode('utf-8'))
    mem.seek(0)
    str_io.close()
    return flask.send_file(mem,
                           mimetype='text/csv',
                           attachment_filename='downloadFile.csv',  # renamed to download_name in Flask >= 2.0
                           as_attachment=True)
I built a Flask web app that manipulates big dataframes.
Each time I make a call to the URL, the server's used RAM increases by 2 GB.
But if I kill the browser session, the server's used RAM does not decrease (the memory is never freed), and it eventually ends with a memory error.
The structure of the code is the following:

import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/index", methods=['GET', 'POST'])
def index():
    global df
    df2 = df.copy()
    # ...data treatment on df2 depending on POST parameters...
    return render_template('template.html', df2=df2.to_html())

df = pd.read_csv("path_to_huge_dataframe")
app.run(debug=True, threaded=True)
I cannot load the dataframe inside the index() function because, given the size of the file, that takes several minutes. So I use a global variable, but I think it is the cause of my problem.