How do I pre-select rows in a Bokeh.widget.DataTable? - python

Bokeh has the ability to display data in a dataframe as shown here:
http://docs.bokeh.org/en/latest/docs/user_guide/interaction/widgets.html#data-table
The Setup:
I have a dataframe of the following format:
Index|Location|Value
-----|--------|-----
1    |1       | 10
2    |1       | 20
3    |1       | 30
4    |2       | 20
5    |2       | 30
6    |2       | 40
This dataframe can be displayed in a data table like so:
source = ColumnDataSource(data={
    LOCATION_NAME: [],
    VALUE_NAME: []
})
columns = [
    TableColumn(field=LOCATION_NAME, title=LOCATION_NAME),
    TableColumn(field=VALUE_NAME, title=VALUE_NAME)
]
data_table = DataTable(source=source, columns=columns, width=400, height=800)

def update_dt(df):
    """Update the data table. This function is called upon some trigger."""
    source.data = {
        LOCATION_NAME: df[LOCATION_NAME],
        VALUE_NAME: df[VALUE_NAME]}
Ideally, I want this datatable to drive a heatmap where selections made for each location will lead to a changed value in the heatmap. But a heatmap cannot have several values for one location. I also do not know how to pre-select items in a datatable.
Assume that I have a second dataframe:
Index|Location|Value
-----|--------|-----
2    |1       | 20
6    |2       | 40
This dataframe represents a subset of the table above, perhaps some custom selection of its rows.
The Problem:
At the most basic level: I have the index of my selection of rows. How can I highlight/pre-select rows in the data table above based on the rows of the second dataframe?
Update (2017-07-14): So far I have tried setting the selected indices on the data source on the Python side. Although source['selected']['1d'].indices = [LIST OF MY SELECTION] does correctly set the indices, I am not seeing a corresponding update of the front-end DataTable in Bokeh 0.12.5.
I have also tried setting the indices on the front-end. My problem there is that I don't know how to pass parameters into CustomJS that are not Bokeh-related.
At a more complete level: How can selections in the datatable drive the heatmap?
Update (2017-07-17): I have not gotten the proposed solution to work within the context of a Bokeh app! I am currently trying to find the cause, but it is a bit tricky to see why nothing ends up selected. My suspicion is that the code string is instantiated once when the page loads, whereas my coordinates are not calculated until later. Therefore, hitting a button with the callback selects nothing, even if the row selection has been computed by then. Continued help would be appreciated!

I have found a partial answer to the above questions thanks to the helpful comments of Claire Tang and Bryan Van de Ven here.
Concerning Pre-selection not showing up on the DataTable
This turns out to be caused by two issues, as far as I am aware.
1.) When I updated the selected index list in a CustomJS callback, I failed to register the change with the DataTable.
button.callback = CustomJS(args=dict(source2=source2), code="""
    source2.selected['1d'].indices = [1,2,3];
    // This is the line I was originally missing: the changed selection has to be emitted.
    source2.properties.selected.change.emit();
    console.log(source2)
""")
2.) The other important aspect to note is that I was on Bokeh version 0.12.5. In that version, "source2.properties.selected" is an unknown property (perhaps because this functionality lives somewhere else or was not yet implemented). Therefore, selection also failed for me as long as I remained on Bokeh 0.12.5. Updating to Bokeh 0.12.6, together with the line above, made selections appear on the DataTable.
Concerning dynamic input from a Jupyter Notebook
The above example shows how I can use a button and a linked CustomJS callback to trigger selection of a hard-coded list. The question is how to feed in a list of index values based on some more dynamic calculation, because CustomJS does not allow external parameters that are not Bokeh objects. On this topic, since the CustomJS "code" attribute just takes a string, I tried the following:
dynamically_calculated_index = [1,2,3]
button.callback = CustomJS(args=dict(source1=source1, source2=source2), code="""
    source2.selected['1d'].indices = {0};
    source2.properties.selected.change.emit();
    console.log(source2)
""".format(dynamically_calculated_index))
I am not sure if this is best practice, so I welcome feedback in this regard, but it works for me for now. As pointed out by @DuCorey, once these changes are in the main branch of Bokeh, some permutations of my issue could be solved more easily in the way they describe.
Also: this approach only works in a Jupyter Notebook where the entire cell gets re-executed, because any pre-computed selected indices are bound at cell execution time. It works if I add a static list, but if I want to calculate the list dynamically later on, it does not. I still need to find a workaround for that.
Solving the above issues now allows me to concentrate on propagating changes in what is selected to a heatmap.
Final answer using Bokeh server
The answer here was rather simple: It is possible to change the selected items, but it has to be done in the following way:
ds.selected = {'0d': {'glyph': None, 'indices': []},
               '1d': {'indices': selected_index_list},
               '2d': {}}
Previously, I had only tried to replace the '1d' indices, but for some unknown reason I have to replace the entire selected dictionary for the change in selected index to be registered by the Bokeh app. So don't just do:
ds.selected['1d']['indices'] = selected_index_list
This now works for me. An explanation from someone more knowledgeable would be appreciated, though.
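For reference, a minimal sketch of what that looks like as a standalone Bokeh-server app, using the 0.12.x-era dict-style selected property described above (the data, column names and selected_index_list below are made up for illustration):
# Run with: bokeh serve --show select_app.py  (sketch only, assumes Bokeh ~0.12.6)
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import Button, ColumnDataSource, DataTable, TableColumn

source = ColumnDataSource(data=dict(location=[1, 1, 1, 2, 2, 2],
                                    value=[10, 20, 30, 20, 30, 40]))
columns = [TableColumn(field="location", title="Location"),
           TableColumn(field="value", title="Value")]
table = DataTable(source=source, columns=columns, width=400, height=280)

def select_rows():
    # Indices that would come from the second dataframe in the question.
    selected_index_list = [1, 5]
    # Replace the whole 'selected' dict; updating only ['1d']['indices']
    # in place did not propagate to the DataTable.
    source.selected = {'0d': {'glyph': None, 'indices': []},
                       '1d': {'indices': selected_index_list},
                       '2d': {}}

button = Button(label="Select rows")
button.on_click(select_rows)

curdoc().add_root(column(button, table))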

I managed to pre-select rows using this:
import pandas
from bokeh.models import ColumnDataSource, DataTable, TableColumn

resultPath = "path/to/some/file.tsv"
resultsTable = pandas.read_csv(resultPath, sep="\t")
source = ColumnDataSource(data=resultsTable)
source.selected.indices = [0, 1, 2, 3, 4]  # KEY LINE
columns = [TableColumn(field=c, title=c) for c in resultsTable.columns]
table = DataTable(source=source, columns=columns)

Related

Python folium - Markercluster not iterable with GroupedLayerControl

I would like to group my 2 marker cluster layers, where one depends on the other by providing separate styling. Hence the second one is set with control=False.
Nevertheless, I want it to disappear when the first one is switched off.
With the new folium v0.14 release I found that a new feature has been provided which could potentially resolve my issue:
https://github.com/ikoojoshi/Folium-GroupedLayerControl
Allow only one layer at a time in Folium LayerControl
and I've applied the following code:
df = pd.read_csv("or_geo.csv")
fo=FeatureGroup(name="OR")
or_cluster = MarkerCluster(name="Or", overlay=True, visible=True).add_to(map)
or_status = MarkerCluster(overlay=True,
control=False,
visible=False,
disableClusteringAtZoom=16,
).add_to(map)
GroupedLayerControl(
groups={'OrB': or_cluster, 'OrC': or_status},
collapsed=False,
).add_to(map)
and the console throws the following error:
TypeError: 'MarkerCluster' object is not iterable
How could I switch off 2 layer groups at once?
UPDATE:
The answer below provides code which seems to work, but not in the way I need.
df = pd.read_csv("or_geo.csv")
fo=FeatureGroup(name="Or",overlay = True)
or_cluster = MarkerCluster(name="Or").add_to(map)
or_status = MarkerCluster(control=False,
visible=True,
disableClusteringAtZoom=16,
).add_to(map)
# definition of or_marker
# definition of or_stat_marker
or_cluster.add_child(or_marker)
or_status.add_child(or_stat_marker)
GroupedLayerControl(
groups={"Or": [or_cluster, or_status]},
collapsed=False,
exclusive_group=False,
).add_to(map)
I get a separate box instead, but what is worse, I can only switch between one layer and the other, whereas I would like them to depend on the main group. The exclusive_groups option allows me to untick both of them, but I am looking for something that would let me switch both of them off at once (i.e. place the tick box on the major group instead). Is it possible to have something like this?
Try passing your markerclusters as a list to the GroupedLayerControl, not one by one. This is described here:
https://nbviewer.org/github/chansooligans/folium/blob/plugins-groupedlayercontrol/examples/plugin-GroupedLayerControl.ipynb
GroupedLayerControl(
    groups={'OrB': [or_cluster, or_status]},
    collapsed=False,
).add_to(map)
Update I
I see what you mean; that was definitely nonsense, as it splits the groups instead of joining them. So, back to topic.
We had a similar discussion here, and I am still convinced that FeatureGroupSubGroup should solve this issue. I use it in exactly that way: I enable/disable a MarkerCluster in the legend, and multiple FeatureGroupSubGroups (which are tied to the MarkerCluster rather than added to the map directly) appear/disappear with it. Perhaps you could try that again.
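For what it's worth, a minimal sketch of the FeatureGroupSubGroup pattern I mean (names and coordinates below are made up for illustration; the sub-groups are constructed with the cluster as their parent, so their markers cluster inside it and disappear when the parent is switched off in the legend):
import folium
from folium.plugins import MarkerCluster, FeatureGroupSubGroup

m = folium.Map(location=[52.0, 19.0], zoom_start=6)

# The parent cluster is the only entry shown in the layer control.
or_cluster = MarkerCluster(name="Or").add_to(m)

# Sub-groups are tied to the cluster via the constructor and added to the map;
# control=False keeps them out of the legend so only the parent is toggled.
or_markers = FeatureGroupSubGroup(or_cluster, name="Or markers", control=False).add_to(m)
or_status = FeatureGroupSubGroup(or_cluster, name="Or status", control=False).add_to(m)

folium.Marker([52.23, 21.01], tooltip="or marker").add_to(or_markers)
folium.Marker([50.06, 19.94], tooltip="or status").add_to(or_status)

folium.LayerControl(collapsed=False).add_to(m)
m.save("grouped_map.html")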

Tkinter, How do i save the entry/dropdown selection to a str variable in my script

I have searched for answers to this question and always get sent back to printing the selection with print(var.get()).
I don't want to print the selection, but to store it in a variable that I can then use in my code.
For more context, I'm making a simple gui to filter data frames. There are a bunch of dropdown menus from which the user selects specific features. I then want to get all of those features to filter down my data frame to a single entry.
The code snippet below shows what I'm trying, but I don't know how to get the data frame that the function filter_df returns (the data frame df is defined before).
I'd also need to use the value of each previous dropdown menu to remove all the impossible feature values (as in, no entry has both the previous value and the one I select after).
Is any of this possible, or are there elements I'm going to have to find a way around?
I thank any and all answers in advance, this website is what makes the (or at least my) coding world go round.
import tkinter as tk

app = tk.Tk()
app.geometry('100x200')

detail_app_str = tk.StringVar(app)
detail_app_str.set('Select Scope')

detail_dropedown = tk.OptionMenu(app, detail_app_str, *feature_values)
detail_dropedown.config(width=90, font=('Helvetica', 12))
detail_dropedown.pack(side="top")

def filter_df(*args):
    filtered_df = df[(df['A'] == int(detail_app_str.get()))]
    return filtered_df

detail_app_str.trace_add("write", filter_df)
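Not an authoritative answer, but a minimal sketch of one way to keep the filtered frame around for the rest of the script: since Tkinter ignores the trace callback's return value, have the callback write its result into a mutable container (or a global) instead. The df and feature_values below are made up for illustration.
import tkinter as tk
import pandas as pd

# Made-up data standing in for the real dataframe and feature values.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
feature_values = sorted(int(v) for v in df["A"].unique())

app = tk.Tk()

state = {"filtered_df": df}  # holds the most recent filtered frame

detail_app_str = tk.StringVar(app)
detail_app_str.set("Select Scope")
tk.OptionMenu(app, detail_app_str, *feature_values).pack(side="top")

def filter_df(*args):
    # Store the result instead of returning it; the return value of a
    # trace callback is discarded by Tkinter.
    state["filtered_df"] = df[df["A"] == int(detail_app_str.get())]

detail_app_str.trace_add("write", filter_df)
app.mainloop()

# After the window closes, the last selection is still available here.
print(state["filtered_df"])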

Can we add annotations, or some kind of labels, to Folium Maps?

I have a basic SQL script which pulls data from MySQL, adds it to a dataframe, and then creates a Folium map.
Here is my code.
#### login to DB
#### df = pd.read_sql("SELECT ... FROM ...)
m = folium.Map(location=[40.6976701, -74.2598704], zoom_start=10)
locations = list(zip(df.Latitude, df.Longitude))
#print(locations)
cluster = MarkerCluster(locations=locations)
m.add_child(cluster)
m
That produces this awesome map.
I can zoom in or zoom out, and the clusters expand or combine dynamically. Clearly the numbers are counts of items per cluster. I am wondering if I can add another data point, like summing expenses per count of items. So, in the image here, we can see a 3 at the top center. If that consists of 3 separate expenses of 200 each, can I show the 600 as some kind of annotation, or label, pointing to the cluster? In the documentation I saw parameters called popup and tooltip, but they don't seem to work for me.
Maybe I need to do some kind of aggregation, like this.
df.groupby(['Latitude','Longitude']).sum()
Just thinking out loud here.
I ended up doing this.
m = folium.Map(location=[40.6976701, -74.2598704], zoom_start=10)
cluster = MarkerCluster()  # the markers below are added to this cluster
for lat, lon, name, tip in zip(df.Latitude, df.Longitude, df.SiteName, df.Site):
    folium.Marker(location=[lat, lon], tooltip=tip, popup=name).add_to(cluster)
m.add_child(cluster)
m
This lets you add a tooltip and a popup. That's pretty helpful. I still can't find a way to do sums. It seems like the only option is counts.
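As a rough sketch of the groupby idea above (it does not change the cluster badge itself, which still shows counts, but it puts the per-location sum into each marker's tooltip). The Expense column and values below are made up for illustration.
import folium
from folium.plugins import MarkerCluster
import pandas as pd

# Made-up data standing in for the dataframe pulled from MySQL.
df = pd.DataFrame({
    "Latitude":  [40.70, 40.70, 40.70, 40.75],
    "Longitude": [-74.25, -74.25, -74.25, -74.00],
    "Expense":   [200, 200, 200, 150],
})

# One row per coordinate, with the expenses summed.
sums = df.groupby(["Latitude", "Longitude"], as_index=False)["Expense"].sum()

m = folium.Map(location=[40.6976701, -74.2598704], zoom_start=10)
cluster = MarkerCluster().add_to(m)
for _, row in sums.iterrows():
    folium.Marker(
        location=[row["Latitude"], row["Longitude"]],
        tooltip="Total expenses: {}".format(row["Expense"]),
    ).add_to(cluster)
m.save("expenses_map.html")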

How to have a horizontal scroll bar when a column in the output is really long when using Jupyter and Python

I am trying to use Jupyter + Python. Here is an example of the output
You can see that because the column 'correspondencedata' is too long, it cannot be shown fully in the output.
Can I change this so that a horizontal scroll bar appears when a column has content that is too long?
You want to use pd.set_option('max_colwidth', nbr_chars) beforehand; the value is a maximum number of characters, not pixels.
If you use a number big enough, it will always show the entire content of your cells.
For example: pd.set_option('max_colwidth', 4000)
For more information:
## To see the current setting:
pd.get_option("display.max_colwidth")
## To reset to the default value:
pd.reset_option("max_colwidth")
Documentation

SPSS Python - fast(er) way of accessing Value Labels

I am trying to pull the variables' names, labels and value labels. I noticed that all assignments are quite fast, except the one referencing the ValueLabels. On my test dataset, if I comment out that line, everything else takes about 1 second. But that line alone delays the whole code by about 15 seconds, and the test dataset is not a large one (by my standards at least :))
Is this something inherent to accessing the variable dictionary? Or is there another, faster way of pulling the whole dictionary, without going variable by variable...?
begin program.
import spss
import spssaux

vardict = spssaux.VariableDict()
var_list = []
var_values = {}
var_type = {}
var_labels = {}

for i in range(spss.GetVariableCount()):
    var = spss.GetVariableName(i)
    var_list.append(var)
    # this is the line causing the massive delay
    var_values[var] = vardict[i].ValueLabels
    var_type[var] = str(spss.GetVariableFormat(i)[0])
    var_labels[var] = vardict[i].VariableLabel
end program.
In fact, I only need this to check whether a variable has value labels defined or not, but I have no idea how to check that in any other way.
It turns out that using the spssaux module was the culprit here. I have no idea why, because pretty much all the Internet knowledge points to that way of getting the value labels.
However, almost by accident I stumbled upon the help for the spss module, which states:
| valueLabels
| Get, set or delete value labels. The set of value labels for a particular variable is represented
| as a Python dictionary whose keys are the values for which labels are being set and whose
| values are the associated labels. Labels must be specified as quoted strings.
|
| --examples
| # Get all value labels for a specified variable
| import spss
| spss.StartDataStep()
| datasetObj = spss.Dataset()
| varObj = datasetObj.varlist['minority']
| vallabels = varObj.valueLabels
| spss.EndDataStep()
As I was only interested to see if variables have (or do not have) value labels, I created a dictionary storing the length of the valueLabels dictionary of each variable:
begin program.
# Get the value labels for every variable
import spss

spss.StartDataStep()
datasetObj = spss.Dataset()
var_labels = {}
for var in datasetObj.varlist:
    var_labels[var.name] = len(var.valueLabels)
spss.EndDataStep()
print var_labels
end program.
It is instantaneous, even on large files. (I admit, what "large" means may be different from user to user; I stopped the code in the OP after 30 minutes on a "large" file, as it was obviously not being time-effective).
