Altair linked map with scatter plot - python

I am trying to create a linked plot similar to examples here and here. I want a scatter plot on one side and a geomap on the other. Each dot in the scatter plot should also appear as a dot at its corresponding geolocation on the map. When I select a few points on the scatter plot, I'd like to see only those points on the map, and vice versa. However, I couldn't get it to work.
I think the problem is the base, that is, the values used for the x and y axes of the two plots. The scatter plot's base uses plain values (two numeric columns of the dataframe), while the geomap uses latitude and longitude (the topojson file for the map, plus the latitude and longitude columns for placing the points on it). You can think of the dataset as data.airports() from vega_datasets with two more numeric columns, and of the topojson as data.us_10m.url.
Is there a way to establish a connection between them?

Working from the US Airports example plot and adding an accompanying scatter plot, you can do something like this:
import altair as alt
from vega_datasets import data
airports = data.airports()
states = alt.topo_feature(data.us_10m.url, feature='states')
selection = alt.selection_interval()
# US states background
background = alt.Chart(states).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(
    width=500,
    height=300
).project('albersUsa')
# airport positions on background
points = alt.Chart(airports).mark_circle(
    size=10,
).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    tooltip=['name', 'city', 'state'],
    color=alt.condition(selection, alt.value('steelblue'), alt.value('lightgray'))
)
# lat/lon scatter
scatter = alt.Chart(airports).mark_point().encode(
    x='longitude:Q',
    y='latitude:Q',
    color=alt.condition(selection, alt.value('steelblue'), alt.value('lightgray'))
).add_selection(
    selection
)
scatter | (background + points)
Note that interval selections are currently not supported on geographic projections, so it will not be possible to select points on the map itself.

Related

customization of plotly create_scattermatrix plots

A simple call to plotly's figure_factory routine to create a scatter matrix:
import pandas as pd
import numpy as np
from plotly import figure_factory
df = pd.DataFrame(np.random.randn(40,3))
fig = figure_factory.create_scatterplotmatrix(df, diag='histogram')
fig.show()
yields
My questions are:
How can I specify a single color for all the plots?
How can I set the axes ranges for each of the three variables on the scatter plot?
Is there a way to create a density (normalized) version of the histogram?
Is there a way to include the correlation coefficient (say, computed from df.corr()) in the upper right corner of the non-diagonal plots?
To use a single color (first question), update the marker color attribute in the generated figure data. To change the axis ranges (second question), update the generated layout in the same way; only the x-axes are modified here, so apply the same technique to the y-axes if necessary. To get a normalized version of the histograms (third question), set the normalization on the histogram traces, as in the example specification in the reference. If that does not give the normalization you want, I believe the traces can be replaced with data obtained from np.histogram() or similar. For the fourth question, I added the values from df.corr() as annotations, referring to each subplot by its axis names.
import pandas as pd
import numpy as np
from plotly import figure_factory
import plotly.express as px  # needed for px.histogram below
np.random.seed(20220529)
df = pd.DataFrame(np.random.randn(40,3))
density = px.histogram(df, x=[0,1,2], histnorm='probability density')
df_corr = df.corr()
fig = figure_factory.create_scatterplotmatrix(df, diag='histogram', height=600, width=600)
# 1.How can I specify a single color for all the plots?
for i in range(9):
    fig.data[i]['marker']['color'] = 'blue'
# 2.How can I set the axes ranges for each of the three variables on the scatter plot?
for axes in ['xaxis2','xaxis3','xaxis4','xaxis6','xaxis7']:
    fig.layout[axes]['range']=(-4,4)
# 3.Is there a way to create a density (normalized) version of the histogram?
fig['data'][0]['histnorm'] = 'probability density'
fig['data'][4]['histnorm'] = 'probability density'
fig['data'][8]['histnorm'] = 'probability density'
# 4.Is there a way to include the correlation coefficient (say, computed from df.corr())
# in the upper right corner of the non-diagonal plots?
for r,x,y in zip(df_corr.values.flatten(),
                 ['x1','x2','x3','x4','x5','x6','x7','x8','x9'],
                 ['y1','y2','y3','y4','y5','y6','y7','y8','y9']):
    if r == 1.0:
        pass
    else:
        fig.add_annotation(x=3.3, y=2, xref=x, yref=y, showarrow=False, text='R:'+str(round(r,2)))
fig.show()
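The np.histogram() fallback mentioned above can be sketched as follows. This is only an illustration with NumPy; the bin count of 10 is an arbitrary assumption, and the resulting densities would still have to be put into the diagonal traces by hand (for example as bar heights), which is not shown here:

```python
import numpy as np

rng = np.random.default_rng(20220529)
values = rng.standard_normal(40)  # stand-in for one column of df

# density=True rescales the counts so that the bars integrate to 1
density, edges = np.histogram(values, bins=10, density=True)
widths = np.diff(edges)
centers = edges[:-1] + widths / 2  # x positions for the bars

# Sanity check: sum of (height * width) over all bins
area = float(np.sum(density * widths))
print(f"area under the histogram: {area:.6f}")
```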

How to plot line plot with vertical-based data (well-log)?

I was trying to plot geophysics data (a well log) in Altair using mark_line, but the line connects the points from left to right rather than from top to bottom. In the figure, the left panel shows the data as points, clearly distributed vertically; the middle panel is the result of mark_line; and the right panel is what I want, except that the X and Y axes are flipped.
Is there any way to make the plot look like the left panel, but drawn as a line?
Or perhaps some hack to flip the display of the right panel?
chart1 = alt.Chart(w).mark_point(color='green').encode(
    alt.X('GR', scale=alt.Scale(domain=[0,300])),
    alt.Y('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart2 = alt.Chart(w).mark_line(color='green').encode(
    alt.X('GR', scale=alt.Scale(domain=[0,300])),
    alt.Y('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart3 = alt.Chart(w).mark_line(color='green').encode(
    alt.Y('GR', scale=alt.Scale(domain=[0,300])),
    alt.X('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart1 | chart2 | chart3
Plot using Altair
For those who need more information, this is a typical dataset from borehole geophysics (a well log). The data (GR) is displayed as a vertical line against depth (DEPT).
Thanks for the help!
From what I have tested so far, a line drawn with mark_line in Altair connects the points in X-axis order by default. Therefore, to draw the line along the Y-axis, you have to specify the order in which the points are connected. In the following, I add order = 'DEPT', which is the field on the Y-axis of the plot.
alt.Chart(
    w
).mark_line(
    color='green',
    point=True,
).encode(
    alt.X('GR', scale=alt.Scale(domain=[0,250])),
    alt.Y('DEPT', sort = 'descending',scale=alt.Scale(domain=[7000, 7030])),
    order = 'DEPT' # this has to be added to make sure the line follows the order of the Y-axis, DEPT
).configure_mark(
    color = 'red'
).interactive()

Can you add a label for "missing data" to a legend for an Altair chart?

I have a heatmap plotted using Altair that includes a colorbar, but the missing data (blank/white) in the heatmap is not labeled on the colorbar. Is there a way to add a separate label to the legend (e.g. below the colorbar) to show how missing data is represented in the chart?
I've come up with a solution that includes a "ghost" layer on top of my chart -- a ruler chart with size = 0 (so that the line is invisible) that is colored by a column filled with string values of "No Data" (see code below). This forces a legend item, but I'm wondering if there's a better way.
(See my full example here: heatmap plot.)
import numpy as np
import altair as alt
import pandas as pd
# Example heatmap data
heatmap_df = pd.DataFrame([["NY",1999,1],["NY",2000,np.nan], ["MA",1999,np.nan], ["MA",2000,4]], columns = ["state","year","rate"])
# Example Legend dataframe
legend_no_data = pd.DataFrame([[1999, "No Data"]], columns = ["year", "text"])
# Example chart with "No Data" label
heatmap = alt.Chart(heatmap_df).mark_rect().encode(alt.X("year:O"), alt.Y("state:N"), alt.Color("rate:Q"))
# Chart for "No Data" legend item
vacc_legend_no_data = alt.Chart(legend_no_data).mark_line(
    size=0
).encode(
    x='year:O',
    color = alt.Color("text:N", legend = alt.Legend(title = "", symbolType = "square")))
heatmap + vacc_legend_no_data
The "ghost layer" adds the empty square labeled "No Data" at the bottom of the colorbar, but I hope there is a better way to represent this!
Unfortunately, I don't know an easy way to handle nulls within a quantitative scale. But you can handle them naturally within nominal scales; I would probably generate the Null dataset layer in the chart spec using a calculate transform, to avoid having to construct a second dataframe. It might look something like this:
heatmap = alt.Chart(heatmap_df).mark_rect().encode(
    alt.X("year:O"),
    alt.Y("state:N"),
    alt.Color("rate:Q")
)
nulls = heatmap.transform_filter(
    "!isValid(datum.rate)"
).mark_rect(opacity=0.5).encode(
    alt.Color('rate:N', scale=alt.Scale(scheme='greys'))
)
heatmap + nulls

Pie charts and geopandas map

I want to represent the ethnic distribution within each region of my map.
I'm a newbie in geopandas, and so far I can only make a map that shows the share of one single ethnic group by region.
My code is the following:
geodf.plot(column="resid_preto", cmap="Blues", figsize=(20,12),
           edgecolor='black', linewidth=0.5, alpha=0.9, legend=False)
plt.axis('off')
Here 'resid_preto' is a column that contains the share of the black population within the region.
I want to make a map like this one, so that all ethnic groups are represented on a single map instead of one map per group.
AFAIK there is no straightforward way to do this with geopandas alone, but combining it with scatter from pyplot gives the desired result.
The first trick is to obtain the x, y coordinates where each pie chart is going to be plotted. In my case I used the centroid attribute of geopandas together with its x and y attributes, i.e., the x and y coordinates are given by (i is the row index in the geopandas DataFrame):
centroids = geo_df.centroid
x_coord = centroids.x[i]
y_coord = centroids.y[i]
The next step is to use pyplot to create a pie chart and store the returned structure, which comprises, among other things, a list of patches with the wedges of the pie:
wedges=plt.pie([1,1,1])
The 3rd step is to create a plot object and then plot the geopandas DataFrame with the plot API, e.g.,
fig, ax = plt.subplots(1, 1)
geo_df.boundary.plot(ax=ax,zorder=1,edgecolor='black')
Finally, use scatter to add the pie charts on top of the geopandas plot. The for loop adds each of the 3 wedges obtained from plt.pie([1,1,1]); with more wedges, change the range accordingly. (There must be a more efficient/elegant way of doing this.)
piecolors=['b','g','y']
for j in range(3):
    ax.scatter([x_coord],[y_coord],marker=(wedges[0][j].get_path().vertices.tolist()),facecolor=piecolors[j], s=800)
plt.show()
The trick is to pass a non-default marker to scatter, in this case the wedges. For each element in wedges first get the path, then the vertices and finally convert it to a list and that's the marker.
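As a possible variation, the wedge markers can be built directly from the shares of each group instead of reusing the patches from plt.pie([1,1,1]). The sketch below uses only matplotlib and NumPy; the coordinates and shares are made-up values standing in for a region's centroid and its ethnic distribution, and pie_markers and draw_pie are hypothetical helper names. Because matplotlib rescales a custom marker by its largest coordinate, the marker size is corrected per wedge so that all wedges share one radius:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def pie_markers(shares, n_points=30):
    """Return one (n_points + 1, 2) vertex array per share, tiling a unit circle."""
    fractions = np.asarray(shares, dtype=float)
    fractions = fractions / fractions.sum()
    bounds = np.concatenate([[0.0], np.cumsum(fractions)]) * 2 * np.pi
    markers = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        angles = np.linspace(start, stop, n_points)
        arc = np.column_stack([np.cos(angles), np.sin(angles)])
        # Wedge polygon: the circle centre followed by the points on the arc
        markers.append(np.vstack([[0.0, 0.0], arc]))
    return markers


def draw_pie(ax, x, y, shares, colors, size=800):
    """Draw one pie at (x, y) as a stack of scatter markers, one per wedge."""
    for verts, color in zip(pie_markers(shares), colors):
        # Undo matplotlib's per-marker rescaling so every wedge has the same radius
        scale = np.abs(verts).max()
        ax.scatter([x], [y], marker=verts, s=size * scale**2,
                   facecolor=color, zorder=2)


fig, ax = plt.subplots(1, 1)
# Hypothetical centroid and shares for a single region
draw_pie(ax, x=-46.6, y=-23.5, shares=[0.5, 0.3, 0.2], colors=["b", "g", "y"])
```

In the geopandas setting, draw_pie would be called once per row, with the centroid coordinates and that row's shares, after plotting the boundaries on the same ax.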

MatPlotLib - Showing legend

I'm making a scatter plot from a Pandas DataFrame with 3 columns. The first two are the x and y axes, and the third is classification data that I want to visualize by giving the points different colors. My question is, how can I add a legend to this plot:
df= df.groupby(['Month', 'Price'])['Quantity'].sum().reset_index()
df.plot(kind='scatter', x='Month', y='Quantity', c=df.Price , s = 100, legend = True);
As you can see, I'd like to automatically color the dots based on their price, so adding labels manually is a bit of an inconvenience. Is there a way I could add something to this code, that would also show a legend to the Price values?
Also, this colors the scatter plot dots on a range from black to white. Can I add custom colors without giving up the easy usage of c=df.Price?
Thank you!
