Bokeh not updating plot line update from CheckboxGroup - python

I'm following this great tutorial to play a little bit with Bokeh.
Basically, I have a figure with two independent lines added to it. Everything renders properly, but when I try to update the plot nothing happens, even though I checked that the ColumnDataSource is correctly updated with the new values.
I render it using the command : bokeh serve --show my_app
Here is how I create my figure :
src_p6 = make_dataset(["select_a", "select_b"])
p6 = make_plot(src_p6)
select_selection = CheckboxGroup(labels=["select_a", "select_b"], active = [0, 1])
select_selection.on_change('active', update)
controls = WidgetBox(select_selection)
curdoc().add_root(column(controls, p6, width=1200))
def make_dataset(select_list):
    if 'select_a' in select_list and 'select_b' in select_list:
        tmp = pd.DataFrame({'time': df["time"],
                            'a': df["a"],
                            'b': df["b"]
                            })
    elif 'select_a' in select_list and 'select_b' not in select_list:
        tmp = pd.DataFrame({'time': df["time"],
                            'a': df["a"]
                            })
    elif 'select_a' not in select_list and 'select_b' in select_list:
        tmp = pd.DataFrame({'time': df["time"],
                            'b': df["b"]
                            })
    else:
        tmp = pd.DataFrame({'time': df["time"]
                            })
    src = ColumnDataSource(tmp)
    return src
def make_plot(plot_src):
    p = figure(plot_width=1000, plot_height=600,
               title="Line x2 with hover and update",
               x_axis_label='Time',
               y_axis_label='Values'
               )
    # Hover tooltips reference data source columns with the "@" prefix
    hover_content = [("Time", "@time")]
    if 'a' in plot_src.data:
        p.line(x='time', y='a', source=plot_src, legend="A", line_color="blue")
        hover_content.append(("A", "@a"))
    if 'b' in plot_src.data:
        p.line(x='time', y='b', source=plot_src, legend="B", line_color="red")
        hover_content.append(("B", "@b"))
    p.add_tools(HoverTool(tooltips=hover_content))
    return p
def update(attr, old, new):
    print(src_p6.data)
    select_to_plot = [select_selection.labels[i] for i in select_selection.active]
    new_src = make_dataset(select_to_plot)
    src_p6.data = new_src.data
    print("**********************")
    print(src_p6.data) # I see here that the data are well updated compared to the first print
My incoming data is JSON and looks like this :
# {"data":[{"time":0,"a":123,"b":123},{"time":1,"a":456,"b":456},{"time":2,"a":789,"b":789}]}
# data = json.load(data_file, encoding='utf-8')
# df = pd.io.json.json_normalize(data['data'])
Thank you for your insights

This will not function correctly:
src_p6.data = new_src.data
The ColumnDataSource is one of the most complicated objects in Bokeh. For example, the .data object on a CDS is not a plain Python dict; it has lots of special instrumentation to make things like efficient streaming possible. But it is also tied to the CDS it was created on. Ripping the .data out of one CDS and assigning it to another is not going to work. We probably need to find a way to make that complain; I am just not sure how, offhand.
In any case, you need to assign .data from a plain Python dict, as all the examples and demonstrations do:
src_p6.data = dict(...)
For your specific code, that probably means having make_dataset return the dicts it creates directly, instead of putting them in DataFrames and then making a CDS out of those.

First of all, thanks to @bigreddot for his time and guidance.
One of my biggest problems was that I didn't actually want to update the values but rather just show or hide a line, so simply removing its column from the source was not working.
Adding the lines in my make_plot function behind an if statement doesn't work either, because make_plot is only called once, when the plot is first created. An update changes the values shown on the existing figure but does not rebuild everything from scratch. So if you start with only one line, I don't know how a new line would get created, if that's even possible.
I started to simplify my make_dataset function to only return a simple Python dict :
tmp = dict(time=df["time"], a=df["a"], b=df["b"])
But when I wanted to remove a line, I used an empty array, even though there are better solutions (I was just playing with Bokeh here): Line ON/OFF, Interactive legend
empty = np.full(len(df["time"]), np.nan)  # all-NaN column, so nothing is drawn for it
tmp = dict(time=df["time"], a=df["a"], b=empty)
When I first create my plot I then do :
src_p6 = ColumnDataSource(data=make_dataset(["select_a", "select_b"]))
p6 = make_plot(src_p6)
And the update function updates the .data of the ColumnDataSource with a basic Python dict :
new_src = make_dataset(select_to_plot)
src_p6.data = new_src
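Putting the pieces above together, here is a minimal sketch of a make_dataset that always returns a plain dict with the same columns and blanks out deselected series with NaN. It uses plain lists rather than a DataFrame so that it runs on its own; RAW is a hypothetical stand-in for df, and the final assignment to src_p6.data is only shown as a comment.

```python
import math

# Hypothetical stand-in for the question's DataFrame `df`
RAW = {"time": [0, 1, 2], "a": [123, 456, 789], "b": [123, 456, 789]}


def make_dataset(select_list, raw=RAW):
    """Return a plain dict that always has the columns time, a and b.

    Deselected series are filled with NaN instead of being dropped, so every
    column keeps the same length and the column names used by the glyphs
    never disappear from the source.
    """
    n = len(raw["time"])
    data = {"time": list(raw["time"])}
    for label, column in (("select_a", "a"), ("select_b", "b")):
        if label in select_list:
            data[column] = list(raw[column])
        else:
            data[column] = [math.nan] * n
    return data


new_data = make_dataset(["select_a"])
# In the Bokeh callback this plain dict would then be assigned directly:
# src_p6.data = new_data
```

Keeping the set of columns fixed is a design choice of this sketch: the two line glyphs are created once in make_plot, so they need their y columns to exist after every update.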

Related

Pandas Styler.to_latex() - how to pass commands and do simple editing

How do I pass the following commands into the latex environment?
\centering (I need landscape tables to be centered)
and
\caption* (I need to skip for a panel the table numbering)
In addition, I would need to add parentheses and asterisks to the t-statistics, meaning row-specific formatting on the dataframes.
For example:
Current
variable
value
const
2.439628
t stat
13.921319
FamFirm
0.114914
t stat
0.351283
founder
0.154914
t stat
2.351283
Adjusted R Square
0.291328
I want this
variable
value
const
2.439628
t stat
(13.921319)***
FamFirm
0.114914
t stat
(0.351283)
founder
0.154914
t stat
(1.651283)**
Adjusted R Square
0.291328
I'm doing my research papers in DataSpell. All empirical work is in Python, and then I use LaTeX (TeXiFy) to create the PDF within DataSpell. Because of this workflow, I can't edit the tables in the LaTeX code, since they get overwritten every time I run the Jupyter notebook.
In case it helps, here's an example of how I pass a table to the latex environment:
# drop index to column
panel_a.reset_index(inplace=True)
# write Latex index and cut names to appropriate length
ind_list = [
"ageFirm",
"meanAgeF",
"lnAssets",
"bsVol",
"roa",
"fndrCeo",
"lnQ",
"sic",
"hightech",
"nonFndrFam"
]
# assign the list of values to the column
panel_a["index"] = ind_list
# format column names
header = ["", "count","mean", "std", "min", "25%", "50%", "75%", "max"]
panel_a.columns = header
with open(
    os.path.join(r"/.../tables/panel_a.tex"), "w"
) as tf:
    tf.write(
        panel_a
        .style
        .format(precision=3)
        .format_index(escape="latex", axis=1)
        .hide(level=0, axis=0)
        .to_latex(
            caption="Panel A: Summary Statistics for the Full Sample",
            label="tab:table_label",
            hrules=True,
        ))
You're asking three questions in one. I think I can do you two out of three (I hear that "ain't bad").
How to pass \centering to the LaTeX env using Styler.to_latex?
Use the position_float parameter. Simplified:
df.style.to_latex(position_float='centering')
How to pass \caption*?
This one I don't know. Perhaps useful: Why is caption not working.
How to apply row-specific formatting?
This one's a little tricky. Let me give an example of how I would normally do this:
df = pd.DataFrame({'a':['some_var','t stat'],'b':[1.01235,2.01235]})
df.style.format({'a': str, 'b': lambda x: "{:.3f}".format(x)
if x < 2 else '({:.3f})***'.format(x)})
Result: the b value in the first row renders as 1.012 and the one in the second row as (2.012)***.
You can see from this example that style.format accepts a callable (here nested inside a dict, but you could also do: .format(func, subset='value')). So, this is great if each value itself is evaluated (x < 2).
The problem in your case is that the evaluation is over some other value, namely a (not supplied) P value combined with panel_a['variable'] == 't stat'. Now, assuming you have those P values in a different column, I suggest you create a for loop to populate a list that becomes like this:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
Now, we can apply a function to df.style.format, and pop/select from the list like so:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
def func(v):
    fmt = fmt_list.pop(0)
    return fmt.format(v)
panel_a.style.format({'variable': str, 'value': func})
This solution is admittedly a bit "hacky", since modifying a globally declared list inside a function is far from good practice; e.g. if you modify the list again before calling func, its functionality is unlikely to result in the expected behaviour or worse, it may throw an error that is difficult to track down. I'm not sure how to remedy this other than simply turning all the floats into strings in panel_a.value inplace. In that case, of course, you don't need .format anymore, but it will alter your df and that's also not ideal. I guess you could make a copy first (df2 = df.copy()), but that will affect memory.
Anyway, hope this helps. So, in full you add this as follows to your code:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
def func(v):
    fmt = fmt_list.pop(0)
    return fmt.format(v)
with open(fname, "w") as tf:
    tf.write(
        panel_a
        .style
        .format({'variable': str, 'value': func})
        ...
        .to_latex(
            ...
            position_float='centering'
        ))
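As a less stateful variant of the fmt_list approach, the strings can be built by a pure function that looks at the row label and a p-value instead of popping from a global list. This is only a sketch: the p-values, the helper names stars and format_rows, and the 0.01 / 0.05 / 0.10 thresholds are assumptions, since the question does not supply any of them.

```python
def stars(p_value):
    """Map a p-value to significance stars (thresholds are an assumption)."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def format_rows(labels, values, p_values):
    """Return one formatted string per row, without any global state."""
    out = []
    for label, value, p in zip(labels, values, p_values):
        if label == "t stat":
            # t-statistics get parentheses plus stars from their p-value
            out.append("({:.3f}){}".format(value, stars(p)))
        else:
            out.append("{:.3f}".format(value))
    return out


labels = ["const", "t stat", "FamFirm", "t stat"]
values = [2.439628, 13.921319, 0.114914, 0.351283]
p_values = [None, 0.000, None, 0.726]  # hypothetical p-values
formatted = format_rows(labels, values, p_values)
```

The resulting strings could then replace the value column on a copy of the frame (for example with panel_a.assign(value=formatted)) before calling .style, so the original numbers stay untouched.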

Create dictionaries from data frames stored in a dictionary in Python

I have a for loop that cycles through and creates 3 data frames and stores them in a dictionary. From each of these data frames, I would like to be able to create another dictionary, but I can't figure out how to do this.
Here is the repetitive code without the loop:
Trad = allreports2[allreports2['Trad'].notna()]
Alti = allreports2[allreports2['Alti'].notna()]
Alto = allreports2[allreports2['Alto'].notna()]
Trad_dict = dict(zip(Trad.State, Trad.Position))
Alti_dict = dict(zip(Alti.State, Alti.Position))
Alto_dict = dict(zip(Alto.State, Alto.Position))
As stated earlier, I understand how to make the 3 dataframes by storing them in a dictionary and I understand what needs to go on the right side of the equal sign in the second statement in the for loop, but not what goes on the left side (denoted below as XXXXXXXXX).
Routes = ['Trad', 'Alti', 'Alto']
dfd = {}
for route in Routes:
    dfd[route] = allreports2[allreports2[route].notna()]
    XXXXXXXXX = dict(zip(dfd[route].State, dfd[route].Position))
(Please note: I am very new to Python and teaching myself so apologies in advance!)
This compromises readability, but this should work.
Routes = ['Trad', 'Alti', 'Alto']
dfd, output = [{},{}] # Unpack List
for route in Routes:
    dfd[route] = allreports2[allreports2[route].notna()]
    output[route] = dict(zip(dfd[route].State, dfd[route].Position))
Trad_dict, Alti_dict, Alto_dict = list(output.values()) # Unpack List
Reference
How can I get list of values from dict?
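A self-contained sketch of the same idea with made-up data, keeping the results in one dictionary keyed by route instead of unpacking them into separate variables. The contents of allreports2 below are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's `allreports2`
allreports2 = pd.DataFrame({
    "State": ["CO", "UT", "WY"],
    "Position": [1, 2, 3],
    "Trad": [5.0, None, 7.0],
    "Alti": [None, 2.0, 3.0],
    "Alto": [1.0, None, None],
})

routes = ["Trad", "Alti", "Alto"]
route_dicts = {}
for route in routes:
    # Keep only the rows that have a value for this route
    subset = allreports2[allreports2[route].notna()]
    # Map State -> Position for those rows
    route_dicts[route] = dict(zip(subset["State"], subset["Position"]))
```

Each mapping is then available as route_dicts["Trad"], route_dicts["Alti"] and so on, which scales to any number of routes without creating new variable names.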

Realtime multi_line graph updates at decent performance

I'm currently using Bokeh to present a multi_line plot, that has several static lines and one line, that is live updated. This runs fine with only few lines but, depending on the resolution of the lines (usually 2000-4000 points per line), the refreshing rate drops significantly when having 50+ lines in the plot. The CPU usage of the browser is pretty high at that moment.
This is how the plot is initialized and the live update is triggered:
figure_opts = dict(plot_width=750,
plot_height=750,
x_range=(0, dset_size),
y_range=(0, np.iinfo(dtype).max),
tools='pan,wheel_zoom')
line_opts = dict(
line_width=5, line_color='color', line_alpha=0.6,
hover_line_color='color', hover_line_alpha=1.0,
source=profile_lines
)
profile_plot = figure(**figure_opts)
profile_plot.toolbar.logo = None
multi_line_plot = profile_plot.multi_line(xs='x', ys='y', **line_opts)
profile_plot.xaxis.axis_label = "x"
profile_plot.yaxis.axis_label = "y"
ds = multi_line_plot.data_source
def update_live_plot():
    random_arr = np.random.random_integers(65535 * (i % 100) / (100 + 100 / 4), 65535 * (i % 100 + 1) / 100, (2048))
    profile = random_arr.astype(np.uint16)
    if profile is not None:
        profile_lines["x"][i] = x
        profile_lines["y"][i] = profile
        profile_lines["color"][i] = Category20_20[0]
        ds.data = profile_lines
doc.add_periodic_callback(update_live_plot, 100)
Is there any way to make this better performing?
Is it, for example, possible to only update the one line, that needs to get updated, instead of ds.data = profile_lines?
Edit: The one line that needs to be updated has to be updated in its full length. I.e. I'm not streaming data at one end, but instead I have a full new set of 2000-4000 values and want to show those, instead of the old live line.
Currently the live line is the element at i in the arrays in the profile_lines dictionary.
You are in luck, updating a single line with all new elements while keeping the same length is something that can be accomplished with the CDS patch method. (Streaming would not help here, since streaming to the end of a CDS for a multi_line means adding an entire new line, and the other case of streaming to the end of each sub-line does not have a good solution at all.)
There is a patch_app.py example in the repository that shows how to use patch to update one line of a multi_line. The example only updates a single point in the line, but it's possible to update the entire line at once using slices:
source.patch({ 'ys' : [([i, slice(None)], new_y)]})
That will update the ith line in source.data['ys'], as long as new_y has the same length as the old line.

Updating pyfits bin table data

I am trying to update an existing fits table with pyfits. It is working fine for some columns of the table, unfortunately not for the first column.
Here is the columns definition:
ColDefs(
name = 'EVENT_ID'; format = '1J'; bscale = 1; bzero = 2147483648
name = 'TEL_ID'; format = '1I'
name = 'TIMESLICE'; format = '1I'; null = 0...
And the simple code fragment to update it:
event = pyfits.open('file.fits.gz')[1]
event.data.field('EVENT_ID')[0] = np.uint32(event.event_ID)
event.data.field('TEL_ID')[0] = int(tel.ID[2])
event.writeto('test.fits')
Writing TEL_ID (and others not shown here) works, EVENT_ID does not. I already tried different formats (np.int32, int) but always the same...
type(event.data.field('EVENT_ID')[0])
returns numpy.uint32 (for the unmodified file)
Thanks for your help
Edit:
If I change the definition of 'EVENT_ID', leaving out 'bscale' and 'bzero' the update of the value works. So it seems there is a problem with the unsigned integer.
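For context on the bscale / bzero pair in the column definition: FITS has no native unsigned 32-bit integer, so a '1J' column stores signed 32-bit values and an unsigned value is represented as stored * bscale + bzero with bzero = 2**31. The sketch below only illustrates that arithmetic in plain Python; the helper names are made up, and it does not by itself explain why the write fails in pyfits.

```python
BSCALE = 1
BZERO = 2147483648  # 2**31, as in the EVENT_ID column definition


def to_stored(physical):
    """Convert an unsigned (physical) value to the signed value kept in the file."""
    stored = (physical - BZERO) // BSCALE
    if not -2**31 <= stored <= 2**31 - 1:
        raise ValueError("value does not fit in a signed 32-bit column")
    return stored


def to_physical(stored):
    """Convert the signed on-disk value back to the unsigned (physical) value."""
    return stored * BSCALE + BZERO


smallest = to_stored(0)           # lowest unsigned value
largest = to_stored(2**32 - 1)    # highest unsigned 32-bit value
```

The full unsigned range maps exactly onto the signed 32-bit range, which is why removing bscale and bzero changes how values assigned to the field are interpreted.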

Chaco: 2 tools write the same metadata key

I have this problem with chaco.
In the plot, I need to select some points (points that I generate). I can select these points with two tools: RangeSelection and ScatterInspector. With only one tool the code works well and I can detect which points I selected, but when I use both, they write to the same metadata key: selections. This is the most important part of the code:
#this are all the tools.append
my_plot.tools.append(ScatterInspector(my_plot, selection_mode="toggle", persistent_hover=False))
my_plot.overlays.append(
ScatterInspectorOverlay(my_plot,
hover_color = "transparent",
hover_marker_size = 10,
hover_outline_color = "purple",
hover_line_width = 2,
selection_marker_size = 8,
selection_color = "red")
)
my_plot.tools.append(RangeSelection(my_plot, left_button_selects = False, rigth_button_selects = True, auto_handle_event = False))
my_plot.overlays.append(RangeSelectionOverlay(component=my_plot))
my_plot.tools.append(PanTool(my_plot))
my_plot.overlays.append(ZoomTool(my_plot, drag_button="right"))
return plot
#the rest of the code
def _metadata_handler(self):
    sel_indices = self.index_datasource.metadata.get('selections', [])
    su = self.index_datasource.metadata.get('annotations', [])
    print su
    print "Selection indices:", sel_indices
def _plot_default(self):
    plot = _create_plot_component()
    # Retrieve the plot hooked to the tool.
    my_plot = plot.plots["my_plot"][0]
    # Set up the trait handler for the selection
    self.index_datasource = my_plot.index
    self.index_datasource.on_trait_change(self._metadata_handler,
                                          "metadata_changed")
    return plot
When I run the code, annotations is always empty, while both tools write to selections, which causes an error.
How can I tell a tool which metadata key to write to?
Thanks for your help.
The solution is to pass metadata_name="annotations" to both RangeSelection and RangeSelectionOverlay.
