Customizing axis titles in Seaborn pairplots

I would like to customize axis titles in a Seaborn pairplot, e.g., I would like to use the formatted versions (r'$I_{bs}$') and (r'$k_{HL}$') instead of "Ibs" and "kHL" which are the column titles in the dataframe I use to generate the plot. How can I achieve this?

Problem solved. I named the columns with formatted (mathtext) strings when defining the dataframe used to create the pairplot:
df_red = pd.DataFrame(DataMatrix_red, columns=['shape', r'$d_n$', r'$\nu_s$', r'$D_s$[%]', r'$I_{{bs}}$', r'$k_{{VssL}}$', r'$k_{{HL}}$'])

Related

Why am I unable to make a plot containing subplots in plotly using a px.scatter plot?

I have been trying to make a figure using plotly that combines multiple figures together. In order to do this, I have been trying to use the make_subplots function, but I have found it very difficult to have the plots added in such a way that they are properly formatted. I can currently make singular plots (as seen directly below):
However, whenever I try to combine these singular plots using make_subplots, I end up with this:
This figure has the subplots set up completely wrong, since I need each of the four subplots to contain data pertaining to the four methods (A, B, C, and D). In other words, I would like to have four subplots that look like my singular plot example above.
I have set up the code in the following way:
for sequence in sequences:
    # process for making sequence profile is done here
    sequence_df = pd.DataFrame(sequence_profile)
    row_number = 1
    grand_figure = make_subplots(rows=4, cols=1)
    # there are four groups per sequence, so the grand figure should have four subplots in total
    for group in sequence_df["group"].unique():
        figure_df_group = sequence_df[(sequence_df["group"] == group)]
        figure_df_group.sort_values("sample", ascending=True, inplace=True)
        figure = px.line(figure_df_group, x=figure_df_group["sample"], y=figure_df_group["intensity"], color=figure_df_group["method"])
        figure.update_xaxes(title="sample")
        figure.update_traces(mode='markers+lines')
        # note: the next line fails, since data must be extracted from the figure, hence why it is commented out
        # grand_figure.append_trace(figure, row=row_number, col=1)
        figure.update_layout(title_text="{} Profile Plot".format(sequence))
        grand_figure.append_trace(figure.data[0], row=row_number, col=1)
        row_number += 1
        figure.write_image(os.path.join(output_directory+"{}_profile_plot_subplots_in_{}.jpg".format(sequence, group)))
    grand_figure.write_image(os.path.join(output_directory+"grand_figure_{}_profile_plot_subplots.jpg".format(sequence)))
I have tried following directions (like for example, here: ValueError: Invalid element(s) received for the 'data' property) but I was unable to get my figures added as is as subplots. At first it seemed like I needed to use the graph object (go) module in plotly (https://plotly.com/python/subplots/), but I would really like to keep the formatting/design of my current singular plot. I just want the plots to be conglomerated in groups of four. However, when I try to add the subplots like I currently do, I need to use the data property of the figure, which causes the design of my scatter plot to be completely messed up. Any help for how I can ameliorate this problem would be great.

Ok, so I found a solution here. Rather than using the make_subplots function, I just instead exported all the figures onto an .html file (Plotly saving multiple plots into a single html) and then converted it into an image (HTML to IMAGE using Python). This isn't exactly the approach I would have preferred to have, but it does work.
UPDATE
I have found that Plotly Express offers another solution: px.line accepts facet parameters (facet_row, facet_col) that split the data into multiple subplots within a single figure. My code is set up like this; unlike the code above, the dataframe does not need to be iterated over in a for loop by group:
sequence_df = pd.DataFrame(sequence_profile)
figure = px.line(sequence_df, x = sequence_df["sample"], y = sequence_df["intensity"], color= sequence_df["method"], facet_col= sequence_df["group"])
Although it still needs more formatting, my plot now looks like this, which works much better for my purposes:

same data produces different pandas plot

I created a graph using DOGE crypto data:
import pandas as pd
import matplotlib.pyplot as plt
df2 = pd.read_csv("https://raw.githubusercontent.com/peoplecure/pandoras-box/master/doge.csv")
plt.plot(df2['begins_at'], df2['open_price'])
plt.show()
The graph above looks fine. But when I create a graph another way from the exact same data, it looks totally off:
from pandas import DataFrame
df = DataFrame(DOGE_data)
plt.plot(df['begins_at'], df['open_price'])
plt.show()
Regrettably, I don't have a way to share the data used in the second method. However, the data used in the first graph was created from df. I was hoping someone might have an idea of what is going on here.

The messed-up y-axis could be the hint. With numerical data, there are usually only a handful of y-axis ticks and labels (roughly 4 to 12). With non-numerical data such as strings, there is one tick for each distinct value ("category").
Check the data type of the y-data in the second dataset: df['open_price'].dtype
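If the dtype turns out to be non-numeric (for example because the prices arrived as strings), converting the columns before plotting usually restores a normal axis. A minimal sketch with made-up values, since the original DOGE_data is not available:

```python
import pandas as pd

# Made-up rows: prices stored as strings, as can happen with JSON or API data
df = pd.DataFrame({
    "begins_at": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "open_price": ["0.0047", "0.0057", "0.0105"],
})

print(df["open_price"].dtype)  # a non-numeric dtype at this point

# Convert prices to numbers; values that cannot be parsed become NaN
df["open_price"] = pd.to_numeric(df["open_price"], errors="coerce")
# Parse the timestamps so the x-axis is treated as dates
df["begins_at"] = pd.to_datetime(df["begins_at"])

print(df.dtypes)
```

After the conversion, plt.plot(df['begins_at'], df['open_price']) should draw a continuous numeric y-axis instead of one tick per value.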

How can I loop through a list of elements and create time series plots in Python

Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've used pandas to try grouping by well name and then attempting to plot, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.

If I understand your question correctly, you want one plot for each combination of Well and Parameter. No subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that since your data has just one date per well per parameter, each plot is just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({'WellName': ['A','A','A','A','B','B','C','C','C'],
                   'SampleDate': ['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
                   'Parameter': ['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
                   'Value': [0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
                   })
for well in df.WellName.unique():
    temp1 = df[df.WellName == well]
    for param in temp1.Parameter.unique():
        fig = plt.figure()
        temp2 = temp1[temp1.Parameter == param]
        plt.scatter(temp2.SampleDate, temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well, param))
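The same loop can also be written with groupby, which the question mentions trying. This is a sketch, not the answer's original method: the dataframe is a shortened, partly invented variant of the sample above (well A has two Arsenic dates so that a line is visible), and parsing SampleDate as datetimes is an added step so the x-axis is ordered chronologically.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, shortened data in the same shape as the sample above
df = pd.DataFrame({
    "WellName": ["A", "A", "A", "B"],
    "SampleDate": ["2018-02-15", "2018-03-31", "2018-06-07", "2018-02-15"],
    "Parameter": ["Arsenic", "Arsenic", "Lead", "Arsenic"],
    "Value": [0.2, 0.25, 1.6, 0.3],
})
df["SampleDate"] = pd.to_datetime(df["SampleDate"])

figures = {}
# One group, and therefore one figure, per (well, parameter) combination
for (well, param), subset in df.groupby(["WellName", "Parameter"]):
    subset = subset.sort_values("SampleDate")
    fig, ax = plt.subplots()
    ax.plot(subset["SampleDate"], subset["Value"], marker="o")
    ax.set_xlabel("SampleDate")
    ax.set_ylabel("Value")
    ax.set_title("Well {} and Parameter {}".format(well, param))
    figures[(well, param)] = fig
```

Keeping the figures in a dictionary makes it easy to save each one later, for example with figures[key].savefig(...).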

Is it possible to stack line graphs with df.plot() and if not how can it be done?

I want to create something like this figure with a dataframe that contains 9 columns.

Use the area type:
df.plot(kind='area')
Starting with pandas version 0.17, the equivalent accessor form is:
df.plot.area()
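A small self-contained sketch of the area plot, using an invented three-column dataframe (the question's nine-column data is not available). Stacking is the default for area plots, so each column is drawn on top of the previous ones:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data: three non-negative columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 2, 2], "c": [3, 1, 2]})

# stacked=True is the default, so the top edge is the row-wise sum
ax = df.plot.area()
ax.set_ylabel("stacked total")
```

Passing stacked=False instead would overlay the areas rather than stack them.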

ggplot Bar Plot semantics

I am trying to use ggplot in Python for the first time and the semantics are completely unobvious to me.
I have a pandas dataframe with two columns: date and entries_sum. What I would like to do is plot a bar plot with the date column as each entry on the x-axis and entries_sum as the respective heights.
I cannot figure out how to do this with the ggplot API. Am I formatting my data wrong for this?

How about:
ggplot(aes(x='date', y='entries_sum'), data=data) + geom_bar(stat='identity')
