How to make a line plot of vertical-based data (well-log)? - python

I was trying to plot geophysics data (well-log) in Altair using the mark_line function, but the line does not connect the dots/points from top to bottom. It connects them from left to right instead. In the figure on the left you can clearly see that the data is distributed vertically. The middle figure is the result of mark_line, and the figure on the right is the one I wanted, except with the X and Y axes flipped.
Is there any way to make a plot behave just like the left figure, but with a line encoding?
Or perhaps some kind of hack to flip the display of the right figure?
import altair as alt

# w is a pandas DataFrame with the columns GR and DEPT
chart1 = alt.Chart(w).mark_point(color='green').encode(
    alt.X('GR', scale=alt.Scale(domain=[0, 300])),
    alt.Y('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart2 = alt.Chart(w).mark_line(color='green').encode(
    alt.X('GR', scale=alt.Scale(domain=[0, 300])),
    alt.Y('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
# same as chart2, but with the axes swapped
chart3 = alt.Chart(w).mark_line(color='green').encode(
    alt.Y('GR', scale=alt.Scale(domain=[0, 300])),
    alt.X('DEPT', scale=alt.Scale(domain=[7000, 7100])),
).interactive()
chart1 | chart2 | chart3
[Figure: the three charts above, plotted using Altair]
For those who need more information: this is a typical borehole geophysics (well-log) dataset. The data (GR) is displayed as a vertical line against depth (DEPT).
Thanks for the help!

From what I have tested so far, an Altair mark_line chart always connects the points in x-axis order by default. Therefore, when you want the line to follow the y-axis, you have to specify the order of the connecting line explicitly. Below, I add order='DEPT', where DEPT is the y-axis field in the plot.
import altair as alt

alt.Chart(
    w
).mark_line(
    color='green',
    point=True,
).encode(
    alt.X('GR', scale=alt.Scale(domain=[0, 250])),
    alt.Y('DEPT', sort='descending', scale=alt.Scale(domain=[7000, 7030])),
    order='DEPT'  # this has to be added so that the line follows the order of the y-axis field, DEPT
).configure_mark(
    color='red'
).interactive()
Result:

Related

How to create a plot with stacked and labeled line segments

I want to create a sort of stacked bar chart (I don't know the proper name). I hand-drew the graph for the years 2016 and 2017 and attached it here.
The code to create the df is below:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = [[2016.0, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
[2017.0, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387],
[2018.0, 0.4748, 0.4016, 0.3381, 0.2905, 0.2012],
[2019.0, 0.4705, 0.4247, 0.3857, 0.3333, 0.2457],
[2020.0, 0.4755, 0.4196, 0.3971, 0.3825, 0.2965]]
cols = ['attribute_time', '100-81 percentile', '80-61 percentile', '60-41 percentile', '40-21 percentile', '20-0 percentile']
df = pd.DataFrame(data, columns=cols)
#set seaborn plotting aesthetics
sns.set(style='white')
#create stacked bar chart
df.set_index('attribute_time').plot(kind='bar', stacked=True)
The data doesn't need to be stacked on top of each other. The code creates a stacked bar chart, but that's not exactly what needs to be displayed. Each percentile needs a labeled horizontal line at its value, placed above the corresponding year on the x axis. Does anyone have recommendations on how to achieve this? Is it a sort of modified stacked bar chart that needs to be visualized?
My approach to this is to represent the data as a categorical scatter plot (stripplot in Seaborn) using horizontal lines rather than points as markers. You'll have to make some choices about exactly how and where you want to plot things, but this should get you started!
I first modified the data a little bit:
df['attribute_time'] = df['attribute_time'].astype('int') # Just to get rid of the decimals.
df = df.melt(id_vars = ['attribute_time'],
value_name = 'pct_value',
var_name = 'pct_range')
Melting the DataFrame turns the wide data into long data, so the columns are now attribute_time (the year), pct_range, and pct_value, with one row per data point.
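To see what the reshaping produces, here is the same `melt` call run on the question's data, with checks on the resulting columns and row count:

```python
import pandas as pd

# The question's wide table: one row per year, one column per percentile band.
data = [[2016.0, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
        [2017.0, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387],
        [2018.0, 0.4748, 0.4016, 0.3381, 0.2905, 0.2012],
        [2019.0, 0.4705, 0.4247, 0.3857, 0.3333, 0.2457],
        [2020.0, 0.4755, 0.4196, 0.3971, 0.3825, 0.2965]]
cols = ['attribute_time', '100-81 percentile', '80-61 percentile',
        '60-41 percentile', '40-21 percentile', '20-0 percentile']
df = pd.DataFrame(data, columns=cols)
df['attribute_time'] = df['attribute_time'].astype('int')

long_df = df.melt(id_vars=['attribute_time'],
                  value_name='pct_value',
                  var_name='pct_range')
# 25 rows: 5 years x 5 percentile bands, stacked one band after another
print(long_df.head())
```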
Next is the plotting:
fig, ax = plt.subplots()
sns.stripplot(data = df,
x = 'attribute_time',
y = 'pct_value',
hue = 'pct_range',
jitter = False,
marker = '_',
s = 40,
linewidth = 3,
ax = ax)
Instead of labeling each point with the range it belongs to, I thought it would be a lot cleaner to separate the ranges by color.
Jitter is used when there are many points in a given category that might overlap. It spreads them out to keep them from touching. Here we don't need to worry about that, so I turned jitter off. The marker '_' is matplotlib's hline (horizontal line) marker.
The s parameter is the horizontal width of each line, and the linewidth is the thickness, so you can play around with those a bit to see what works best for you.
Text is added to the figure using the ax.text method as follows:
for year, value in zip(df['attribute_time'], df['pct_value']):
    ax.text(year - 2016,
            value,
            str(value),
            ha='center',
            va='bottom',
            fontsize='small')
The categorical x positions are indexed from 0, even though the tick labels show the years, so the x position of each text label is shifted left by the minimum year (2016). The y position equals the value, and the text itself is a string representation of the value. Because the vertical alignment is 'bottom', the text is anchored at its bottom edge, so it sits centered just above the line.
There's obviously a lot you can tweak to make it look how you want with sizing and labeling and stuff, but hopefully this is at least a good start!
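For comparison, here is a seaborn-free variant of the same idea. It swaps stripplot for matplotlib's `Axes.hlines`, and it computes each year's categorical position from the data instead of hard-coding `year - 2016`. The segment half-width and label formatting are arbitrary choices, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

data = [[2016.0, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
        [2017.0, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387],
        [2018.0, 0.4748, 0.4016, 0.3381, 0.2905, 0.2012],
        [2019.0, 0.4705, 0.4247, 0.3857, 0.3333, 0.2457],
        [2020.0, 0.4755, 0.4196, 0.3971, 0.3825, 0.2965]]
cols = ['attribute_time', '100-81 percentile', '80-61 percentile',
        '60-41 percentile', '40-21 percentile', '20-0 percentile']
df = pd.DataFrame(data, columns=cols)
df['attribute_time'] = df['attribute_time'].astype('int')
df = df.melt(id_vars=['attribute_time'], value_name='pct_value', var_name='pct_range')

# Map each year to a position 0, 1, 2, ... so the labels don't depend on the first year.
years = sorted(df['attribute_time'].unique())
pos = {year: i for i, year in enumerate(years)}
df['x'] = df['attribute_time'].map(pos)

fig, ax = plt.subplots()
half_width = 0.3  # half the length of each horizontal segment, in category units
for i, (band, group) in enumerate(df.groupby('pct_range', sort=False)):
    # One colored set of horizontal segments per percentile band.
    ax.hlines(group['pct_value'], group['x'] - half_width, group['x'] + half_width,
              colors=f"C{i}", linewidth=3, label=band)
for x, value in zip(df['x'], df['pct_value']):
    # Centered label whose bottom edge sits on the segment.
    ax.text(x, value, f"{value:.4f}", ha='center', va='bottom', fontsize='small')

ax.set_xticks(range(len(years)))
ax.set_xticklabels([str(y) for y in years])
ax.set_xlim(-0.5, len(years) - 0.5)
ax.legend(title='pct_range', fontsize='small')
```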

Create a horizontal waterfall chart with python matplotlib

I am trying to create a waterfall chart, which is like a bar chart except that each bar starts where its neighboring bar ends, so you can see the total and how it breaks down.
I am trying to create this chart in Python, but matplotlib has no built-in chart type called waterfall.
I found code for a vertical waterfall, but I could not transform it into a horizontal one.
How can I turn a matplotlib barh chart, for example, into a horizontal waterfall?
I want to create a HORIZONTAL waterfall.
For example, I am trying to make each bar in a matplotlib barh chart start at the end of the previous one, but I don't think I am approaching the problem the right way, because I have no results so far.
It should look like this:
Code to create the plot:
my_plot = trans.plot(
kind='barh',
stacked=True,
bottom=blank,legend=None,
figsize=(10, 5)
)
How do I separate the bars?
EDIT
I have found this ready-to-use Python package, but it doesn't work with DataFrames, so I cannot use it.
import waterfall_chart
from matplotlib import transforms
a = ['sales','returns','credit fees','rebates','late charges','shipping']
b = [10,-30,-7.5,-25,95,-7]
my_plot = waterfall_chart.plot(a, b, rotation_value=30, sorted_value=True, threshold=0.2,
formatting="$ {:,.1f}", net_label="end result", other_label="misc",
Title="chart", x_lab="X", y_lab="money", blue_color="blue",
green_color="#95ff24", red_color="r")
rot = transforms.Affine2D().rotate_deg(90)
my_plot.show()
I also found this tutorial, with code, for a vertical waterfall chart.
https://pbpython.com/waterfall-chart.html.
It works great, but I didn't manage to reproduce the same thing for a horizontal waterfall.
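One way to get a horizontal waterfall is to draw the bars with matplotlib's `Axes.barh` and pass the running total of the previous steps as `left`, which is the horizontal-bar counterpart of `bottom`. The sketch below reuses the labels and amounts from the waterfall_chart example above. The colors and the extra "net" bar are illustrative choices. In the question's pandas call, the analogous change would likely be `left=blank` instead of `bottom=blank`, since pandas uses `left` as the bar offset for `kind='barh'`. That part is untested here.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Amounts from the waterfall_chart example, as a DataFrame.
trans = pd.DataFrame(
    {"amount": [10, -30, -7.5, -25, 95, -7]},
    index=["sales", "returns", "credit fees", "rebates", "late charges", "shipping"],
)

running = trans["amount"].cumsum()       # total after each step
blank = running.shift(1).fillna(0)       # total before each step = where each bar starts

fig, ax = plt.subplots(figsize=(10, 5))
colors = ["tab:green" if v >= 0 else "tab:red" for v in trans["amount"]]
# For horizontal bars the offset argument is `left`; negative widths extend leftward.
bars = ax.barh(trans.index, trans["amount"], left=blank, color=colors)
# Final bar showing the net result, starting from zero.
ax.barh(["net"], [running.iloc[-1]], color="tab:blue")
ax.axvline(0, color="black", linewidth=0.8)
ax.invert_yaxis()  # first step at the top, reading downward
ax.set_xlabel("money")
```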

Altair linked map with scatter plot

I am trying to create a linked plot similar to examples here and here. I want a scatter plot on one side and a geographic map on the other. The dots in the scatter plot should also appear as dots at their corresponding geolocations on the map. Once I select a few points on the scatter plot, I'd like to see only those points on the map, or vice versa. However, I couldn't get it to work.
I think the problem is the base, i.e. the values used for the x and y axes of these plots. The scatter plot's base uses plain values (the DataFrame, with two numeric columns selected), while the map uses latitude and longitude (the TopoJSON file for the background, and the latitude and longitude columns for placing the points on the map). You can think of the dataset as the one from vega_datasets: data.airports() with two more numeric columns, and of the TopoJSON as data.us_10m.url.
Is there a way to establish a connection between them?
Working from the US Airports example plot and adding an accompanying scatter plot, you can do something like this:
import altair as alt
from vega_datasets import data
airports = data.airports()
states = alt.topo_feature(data.us_10m.url, feature='states')
selection = alt.selection_interval()
# US states background
background = alt.Chart(states).mark_geoshape(
fill='lightgray',
stroke='white'
).properties(
width=500,
height=300
).project('albersUsa')
# airport positions on background
points = alt.Chart(airports).mark_circle(
size=10,
).encode(
longitude='longitude:Q',
latitude='latitude:Q',
tooltip=['name', 'city', 'state'],
color=alt.condition(selection, alt.value('steelblue'), alt.value('lightgray'))
)
#lat/lon scatter
scatter = alt.Chart(airports).mark_point().encode(
x='longitude:Q',
y='latitude:Q',
color=alt.condition(selection, alt.value('steelblue'), alt.value('lightgray'))
).add_selection(
selection
)
scatter | (background + points)
Note that interval selections are currently not supported on geographic projections, so it will not be possible to select points on the map itself.

Multi horizontal barplots in one plot

Does anybody know whether it's possible to get multiple horizontal bar plots in one plot? Say I have two horizontal bar plots (attached) that both use the same y-axis data, but their x-value data differs. Can I get these two plots into one plot?
I have attached my bar plots and the code that I use to plot them:
# Plot the first hbar
plt.barh(index, b1_plt, color='k')
plt.barh(index, b2_plt, color='k')
plt.xlabel('Width')
plt.ylabel('layer nr')
plt.title('Section outline')
# Plot the second hbar
plt.barh(index, micro_xmi_all)
plt.xlabel('Micro strain')
plt.ylabel('layer nr')
plt.title('Strain diagram')
The list micro_xmi_all has a different range from b1_plt and b2_plt.
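One common approach is a single figure with two side-by-side axes that share the y axis (layer nr). Each panel keeps its own x range, so the differing ranges of micro_xmi_all and b1_plt/b2_plt don't interfere. The arrays below are hypothetical placeholders that reuse the question's variable names. If you really need both on the same axes, `Axes.twiny()` gives a second x axis instead.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for 10 layers, named after the question's variables.
index = np.arange(10)
b1_plt = np.linspace(5, 10, 10)
b2_plt = -b1_plt                        # assumed mirrored outline, left of zero
micro_xmi_all = np.linspace(-800, 1200, 10)

# Two panels in one figure; sharey=True keeps the layer axis aligned.
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(10, 4))

ax1.barh(index, b1_plt, color='k')
ax1.barh(index, b2_plt, color='k')
ax1.set_xlabel('Width')
ax1.set_ylabel('layer nr')
ax1.set_title('Section outline')

ax2.barh(index, micro_xmi_all)
ax2.set_xlabel('Micro strain')
ax2.set_title('Strain diagram')

fig.tight_layout()
```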

MatPlotLib - Showing legend

I'm making a scatter plot from a Pandas DataFrame with 3 columns. The first two would be the x and y axes, and the third would be classification data that I want to visualize by giving the points different colors. My question is: how can I add a legend to this plot?
df= df.groupby(['Month', 'Price'])['Quantity'].sum().reset_index()
df.plot(kind='scatter', x='Month', y='Quantity', c=df.Price , s = 100, legend = True);
As you can see, I'd like to automatically color the dots based on their price, so adding labels manually is a bit of an inconvenience. Is there a way I could add something to this code, that would also show a legend to the Price values?
Also, this colors the scatter plot dots on a range from black to white. Can I add custom colors without giving up the easy usage of c=df.Price?
Thank you!
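One way to get a legend with custom colors is to call matplotlib's `scatter` once per price level, each with its own `label` and color, and then call `ax.legend()`. The data and the price-to-color mapping below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with the question's columns.
df = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3, 3],
    "Price": [10, 15, 10, 15, 10, 20],
    "Quantity": [120, 80, 135, 70, 150, 40],
})
df = df.groupby(["Month", "Price"])["Quantity"].sum().reset_index()

# Custom color per price level; any matplotlib color spec works here.
palette = {10: "tab:blue", 15: "tab:orange", 20: "tab:green"}

fig, ax = plt.subplots()
for price, group in df.groupby("Price"):
    # One scatter call per price, so each price gets its own legend entry.
    ax.scatter(group["Month"], group["Quantity"], s=100,
               color=palette[price], label=f"{price}")
ax.set_xlabel("Month")
ax.set_ylabel("Quantity")
ax.legend(title="Price")
```

If you'd rather keep a single `c=df.Price` call with a colormap, matplotlib's `PathCollection.legend_elements()` can build the legend entries from the object returned by `ax.scatter`.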
