X-Axis scales not matching with 2 data sets on same plot


I have two datasets that I'm trying to plot on the same figure. They share a common column that I'm using for the X-axis; however, one dataset is collected annually and the other monthly, so the number of data points in each set is significantly different.
Pyplot is not plotting the X values for each set where I would expect when I plot both sets on the same graph.
When I plot just my annually collected data set I get:
When I plot just my monthly collected data set I get:
But when I plot the two sets overlaid (code below) I get:
tframe:
   10003    Date
0    257  201401
1    216  201402
2    417  201403
3    568  201404
4    768  201405
5    836  201406
6    798  201407
7    809  201408
8    839  201409
9    796  201410
tax_for_zip_data:
TAX BRACKET  $1 under $25,000  ...    Date
2                        5740  ...  201301
0                        5380  ...  201401
1                        5320  ...  201501
3                        5030  ...  201601
So I did as wwii suggested in the comments and converted my Date columns to datetime objects:
tframe:
   10003        Date
0    257  2014-01-31
1    216  2014-02-28
2    417  2014-03-31
3    568  2014-04-30
4    768  2014-05-31
5    836  2014-06-30
6    798  2014-07-31
7    809  2014-08-31
8    839  2014-09-30
9    796  2014-10-31
tax_for_zip_data:
TAX BRACKET  $1 under $25,000  ...        Date
2                        5740  ...  2013-01-31
0                        5380  ...  2014-01-31
1                        5320  ...  2015-01-31
3                        5030  ...  2016-01-31
But the dates are still plotted with an offset.
None of my data goes back to 2012; January 2013 is the earliest. The tax_for_zip_data points are all offset by a year. If I plot just that set alone, it plots properly.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
tframe.plot(kind = 'line',x = 'Date', y = "10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
tax_for_zip_data.plot(kind = 'line', x = 'Date', y = tax_for_zip_data.columns[:-1], ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()

If you can make the DataFrame index a datetime index, plotting is easier.
import io
import pandas as pd

# Columns are separated by two or more spaces, matching the regex delimiter below.
s = '''10003  Date
257  201401
216  201402
417  201403
568  201404
768  201405
836  201406
798  201407
809  201408
839  201409
796  201410
'''
df1 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
# Use the parsed dates as the index so plot() picks them up as x values.
df1.index = pd.to_datetime(df1['Date'], format='%Y%m')
s = '''TAX BRACKET  $1 under $25,000  Date
2  5740  201301
0  5380  201401
1  5320  201501
3  5030  201601
'''
df2 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df2.index = pd.to_datetime(df2['Date'], format='%Y%m')
You don't need to pass an argument for plot's x parameter; when x is omitted, the index is used for the x-axis.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
df1.plot(kind = 'line',y="10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
df2.plot(kind = 'line', y='$1 under $25,000', ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
plt.close()
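As a cross-check, here is a minimal sketch of the same overlay drawn with Matplotlib's own ax.plot instead of DataFrame.plot. This is a substitute technique, not the code from the answer above, and the frame and column names (monthly, annual, trips, returns) are made up for illustration. Because twinx() creates a second y-axis on the same x-axis, both lines are placed on one shared date scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for tframe and tax_for_zip_data, indexed by datetime.
monthly = pd.DataFrame(
    {"trips": [257, 216, 417, 568]},
    index=pd.to_datetime(["201401", "201402", "201403", "201404"], format="%Y%m"),
)
annual = pd.DataFrame(
    {"returns": [5740, 5380, 5320, 5030]},
    index=pd.to_datetime(["201301", "201401", "201501", "201601"], format="%Y%m"),
)

fig, ax1 = plt.subplots()
ax1.plot(monthly.index, monthly["trips"], color="tab:red")
ax1.set_xlabel("Date")
ax1.set_ylabel("Trips", color="tab:red")

ax2 = ax1.twinx()  # second y-axis that shares ax1's x-axis
ax2.plot(annual.index, annual["returns"], color="tab:blue")
ax2.set_ylabel("Num Returns", color="tab:blue")

fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Calling plt.show() afterwards displays the figure when an interactive backend is in use.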

Related

plot and draw curves in python matplotlib without ignoring first and last Nan values from the graph figure

I have a CSV file like the table below:
depth  x1  x2           x3
1000   15  Nan          Nan
1001   10  Nan          Nan
1002   5   Nan          Nan
1003   8   10           Nan
1004   12  11.11111111  Nan
1010   13  17.77777778  14.16666667
1011   14  18.88888889  15
1012   15  20           15.71428571
1013   16  20.55555556  16.42857143
1014   17  21.11111111  17.14285714
1017   20  22.77777778  19.28571429
1018   21  23.33333333  20
1019   22  23.88888889  20.83333333
1024   27  17.5         25
1025   28  15           25
1026   25  Nan          Nan
1027   26  Nan          Nan
1028   7   Nan          Nan
I want to plot the x1, x2, and x3 columns against the depth column, but these columns sometimes contain Nan values at the start and end. I want every curve to be drawn over the whole depth range, including the rows where the first and last values are Nan.
The code below is my attempt to plot the curves, but each plot always starts and ends at its first and last valid values and ignores the leading and trailing Nan rows:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
df = pd.read_csv("result.csv")
fig = plt.figure(figsize=(15, 12), dpi=100, tight_layout=True)
gs = gridspec.GridSpec(nrows=1, ncols=5, wspace=0)
fig.add_subplot(gs[0, 1])
plt.plot(df['x1'],df["depth"], linewidth=2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,2 ])
plt.plot(df["x2"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
fig.add_subplot(gs[0,3])
plt.plot(df["x3"],df["depth"], linewidth =2, color='black', marker="o", markersize=3)
plt.gca().invert_yaxis()
plt.show()
The current result:
The desired result is shown in the image below, where the y-axes of all curves start from the same depth:

You need to share the y-axis of each subplot with the y-axis of the first subplot:
fig, axs = plt.subplots(1, 3, figsize=(15, 12), dpi=100, tight_layout=True, gridspec_kw={'wspace': 0})
axs[0].plot(df.x1, df.depth, '-ok', lw=2, ms=3)
axs[1].plot(df.x2, df.depth, '-ok', lw=2, ms=3)
axs[1].sharey(axs[0])
axs[2].plot(df.x3, df.depth, '-ok', lw=2, ms=3)
axs[2].sharey(axs[0])
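A minimal self-contained sketch of the same idea, assuming it is acceptable to request the shared axis up front with sharey=True in plt.subplots rather than calling sharey on each axes. The small DataFrame below is made up for illustration and only mimics the leading and trailing NaN pattern of the question's data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: x2 and x3 have NaN at the start and end, x1 is complete.
df = pd.DataFrame({
    "depth": [1000, 1001, 1002, 1003, 1004, 1005],
    "x1": [15, 10, 5, 8, 12, 13],
    "x2": [nan, nan, 10, 11, nan, nan],
    "x3": [nan, nan, nan, 14, 15, nan],
})

# sharey=True links the y-axes, so every panel spans the full depth range.
fig, axs = plt.subplots(1, 3, sharey=True, gridspec_kw={"wspace": 0})
for ax, col in zip(axs, ["x1", "x2", "x3"]):
    ax.plot(df[col], df["depth"], "-ok", lw=2, ms=3)

axs[0].invert_yaxis()  # the y-axis is shared, so one call flips every panel
```

With the axes linked, the panels for x2 and x3 keep the depth range of the complete x1 curve even though their own valid points cover only part of it.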

Different binning for histplot as JointGrid (x,y) marginal plot

I have a pandas dataframe like this:
         Date  Weight  Year  Month  Day  Week  DayOfWeek
0  2017-11-13    76.1  2017     11   13    46          0
1  2017-11-14    76.2  2017     11   14    46          1
2  2017-11-15    76.6  2017     11   15    46          2
3  2017-11-16    77.1  2017     11   16    46          3
4  2017-11-17    76.7  2017     11   17    46          4
...       ...     ...   ...    ...  ...   ...        ...
I created a JointGrid with:
g = sns.JointGrid(data=df,
                  x="Date",
                  y="Weight",
                  marginal_ticks=True,
                  height=6,
                  ratio=2,
                  space=.05)
Then I defined the joint and marginal plots:
g.plot_joint(sns.scatterplot,
             hue=df["Year"],
             alpha=.4,
             legend=True)
g.plot_marginals(sns.histplot,
                 multiple="stack",
                 bins=20,
                 hue=df["Year"])
Result is this.
Now the question is: "is it possible to specify different binning for the two histplot resulting in the x and y marginal plot?"

I don't think there is a built-in way to do that, but you can plot directly on the marginal axes using the plotting function of your choice, like so:
import seaborn as sns
penguins = sns.load_dataset('penguins')
data = penguins
x_col = "bill_length_mm"
y_col = "bill_depth_mm"
hue_col = "species"
g = sns.JointGrid(data=data, x=x_col, y=y_col, hue=hue_col)
g.plot_joint(sns.scatterplot)
# top marginal
sns.histplot(data=data, x=x_col, hue=hue_col, bins=5, ax=g.ax_marg_x, legend=False, multiple='stack')
# right marginal
sns.histplot(data=data, y=y_col, hue=hue_col, bins=40, ax=g.ax_marg_y, legend=False, multiple='stack')

How to make plots with small whitespace separations in Matplotlib or Seaborn?

I'd like to make this type of plot, with multiple columns separated by small whitespace. Each column is a different category with 3-5 (5 in this example) observations that have varying values on the y-axis:

Actually, I can draw this plot using ggplot2. For example:
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
library(dplyr)
library(ggplot2)
mtcars %>% reshape2::melt() %>%
ggplot(aes(x = variable, y = value)) +
geom_point() + facet_grid(~ variable) +
theme(axis.text.x = element_blank())
Set a categorical variable in your dataset, then use facet_grid(~). This function splits your plot into multiple panels, one per level of the categorical variable.

Here is an approach to draw a similar plot using Python's matplotlib. The plot has a grey background, with white major and minor gridlines delimiting the zones. Centering the dots in each small cell is somewhat tricky: divide each unit-wide column into n cells and shift every dot by half a cell, 1/(2n). A secondary x-axis can be used to set the labels. A zorder has to be set so that the dots are drawn on top of the gridlines.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import ticker
n = 5
cols = 7
values = [np.random.uniform(1, 10, n) for c in range(cols)]
fig, ax = plt.subplots()
ax.set_facecolor('lightgrey')
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(1 / (n)))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
ax.grid(True, which='both', axis='both', color='white')
ax.set_xticklabels([])
ax.tick_params(axis='x', which='both', length=0)
ax.grid(which='major', axis='both', lw=3)
ax.set_xlim(1, cols + 1)
for i in range(1, cols + 1):
    ax.scatter(np.linspace(i, i + 1, n, endpoint=False) + 1 / (2 * n), values[i-1], c='crimson', zorder=2)
ax2 = ax.twiny()
ax2.set_xlim(0.5, cols + 0.5)
ticks = range(1, cols + 1)
ax2.set_xticks(ticks)
ax2.set_xticklabels([f'Cat_{t:02d}' for t in ticks])
bbox = dict(boxstyle="round", ec="limegreen", fc="limegreen", alpha=0.5)
plt.setp(ax2.get_xticklabels(), bbox=bbox)
ax2.tick_params(axis='x', length=0)
plt.show()
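The centering arithmetic from the scatter call above can be checked in isolation. This is a small sketch using the same n and the first column (i = 1); the variable names left_edges and centers are introduced here only for illustration.

```python
import numpy as np

n = 5  # observations per category, as in the code above
i = 1  # left edge of the first column on the x-axis

# n evenly spaced cell edges in [i, i + 1), excluding the right end
left_edges = np.linspace(i, i + 1, n, endpoint=False)
# shift by half a cell width, 1 / (2 * n), to land in the middle of each cell
centers = left_edges + 1 / (2 * n)
print(centers)
```

For n = 5 the cell edges sit at steps of 0.2, so the dots end up at roughly 1.1, 1.3, 1.5, 1.7 and 1.9, each midway between two minor gridlines.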

Matplotlib is printing the line plot twice/multiple times

What could be the problem if Matplotlib draws a line plot twice or multiple times, like this one:
Here is my code:
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy import integrate
def compute_integrated_spectral_response_ikonos(file, sheet):
    df = pd.read_excel(file, sheet_name=sheet, header=2)
    blue = integrate.cumtrapz(df['Blue'], df['Wavelength'])
    green = integrate.cumtrapz(df['Green'], df['Wavelength'])
    red = integrate.cumtrapz(df['Red'], df['Wavelength'])
    nir = integrate.cumtrapz(df['NIR'], df['Wavelength'])
    pan = integrate.cumtrapz(df['Pan'], df['Wavelength'])
    plt.figure(num=None, figsize=(6, 4), dpi=80, facecolor='w', edgecolor='k')
    plt.plot(df[1:], blue, label='Blue', color='darkblue')
    plt.plot(df[1:], green, label='Green', color='b')
    plt.plot(df[1:], red, label='Red', color='g')
    plt.plot(df[1:], nir, label='NIR', color='r')
    plt.plot(df[1:], pan, label='Pan', color='darkred')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('Spectral Response (%)')
    plt.title(f'Integrated Spectral Response of {sheet} Bands')
    plt.show()

compute_integrated_spectral_response_ikonos('Sorted Wavelengths.xlsx', 'IKONOS')
Here is my dataset.

This is because plotting df[1:] is plotting the entire dataframe as the x-axis.
>>> df[1:]
Wavelength Blue Green Red NIR Pan
1 355 0.001463 0.000800 0.000504 0.000532 0.000619
2 360 0.000866 0.000729 0.000391 0.000674 0.000361
3 365 0.000731 0.000806 0.000597 0.000847 0.000244
4 370 0.000717 0.000577 0.000328 0.000729 0.000435
5 375 0.001251 0.000842 0.000847 0.000906 0.000914
.. ... ... ... ... ... ...
133 1015 0.002601 0.002100 0.001752 0.002007 0.149330
134 1020 0.001602 0.002040 0.002341 0.001793 0.136372
135 1025 0.001946 0.002218 0.001260 0.002754 0.118682
136 1030 0.002417 0.001376 0.000898 0.000000 0.103634
137 1035 0.001300 0.001602 0.000000 0.000000 0.089097
[137 rows x 6 columns]
The slice [1:] just gives the dataframe without the first row. Altering each instance of df[1:] to df['Wavelength'][1:] gives us what I presume is the expected output:
>>> df['Wavelength'][1:]
1 355
2 360
3 365
4 370
5 375
133 1015
134 1020
135 1025
136 1030
137 1035
Name: Wavelength, Length: 137, dtype: int64
Output:
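To see the shape difference without the spreadsheet, here is a small sketch on made-up data. The NumPy expression below is only a stand-in for a cumulative trapezoid integral (it is not the SciPy call from the question); like that integral, it returns one value fewer than the number of samples, which is why the x values need the [1:] slice.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of the spreadsheet.
df = pd.DataFrame({
    "Wavelength": [350, 355, 360, 365],
    "Blue": [0.0, 0.2, 0.4, 0.2],
    "Green": [0.1, 0.1, 0.3, 0.5],
})

x = df["Wavelength"].to_numpy(dtype=float)
y = df["Blue"].to_numpy(dtype=float)
# Cumulative trapezoid rule written with NumPy: n samples give n - 1 values.
blue = np.cumsum((y[1:] + y[:-1]) / 2 * np.diff(x))

print(df[1:].shape)                # 2-D: every column would be used as x data
print(df["Wavelength"][1:].shape)  # 1-D: one x value per integrated value
print(blue.shape)
```

Passing the 2-D slice as x makes Matplotlib draw one line per column, which is the repeated-line effect in the question; the 1-D Wavelength slice gives a single line per band.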

Plot a line on a OHLC minutes graph by mpl_finance

Using the code below, I can successfully plot an OHLC graph:
names = pd.Series(data.index.strftime("%d-%m-%y").unique())
indexs = pd.Series(data.index.date).unique()
ohlc = data[data.index.date == indexs[0]].copy()
ohlc['mdate'] = [mdates.date2num(d) for d in ohlc.index]
ohlc['SMA10'] = ohlc["Close"].rolling(10).mean()
fig, ax = plt.subplots(figsize = (10,5))
mpl_finance.candlestick2_ohlc(ax,ohlc['Open'],ohlc['High'],ohlc['Low'],ohlc['Close'], width = 0.6)
xdate = ohlc.index
def mydate(x, pos):
    try:
        return xdate[int(x)]
    except IndexError:
        return ''
ax.xaxis.set_major_formatter(ticker.FuncFormatter(mydate))
fig.autofmt_xdate()
fig.tight_layout()
plt.show()
However, when I add this line:
ax.plot(ohlc.mdate, ohlc["SMA10"], color ="green", label = "SMA50")
I get an empty graph with two tiny vertical lines. What is wrong here?
Open High Low ... Volume mdate SMA10
Date_Time ...
2018-02-13 11:55:00 7169.7 7172.4 7167.0 ... 444 736738.496528 NaN
2018-02-13 12:00:00 7171.6 7174.2 7164.2 ... 578 736738.500000 NaN
2018-02-13 12:05:00 7174.2 7174.7 7170.7 ... 458 736738.503472 NaN
2018-02-13 12:10:00 7172.0 7175.7 7171.2 ... 401 736738.506944 NaN
2018-02-13 12:15:00 7174.7 7176.7 7173.0 ... 389 736738.510417 NaN
These are the columns of my data:
Index(['Open', 'High', 'Low', 'Close', 'Volume', 'mdate', 'SMA10'], dtype='object')

Replace
ax.plot(ohlc.mdate, ohlc["SMA10"], color ="green", label = "SMA50")
with
ax.plot(ohlc.index, ohlc["SMA10"], color ="green", label = "SMA50")
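For context, candlestick2_ohlc takes no date argument and draws its candles at integer positions 0, 1, 2, and so on, which is why the question's formatter looks up xdate[int(x)]. A line plotted against Matplotlib date numbers (values around 736738 in the table above) therefore lands far away from the candles. If the line still does not align with the candles, one alternative is to plot it against the same integer positions. The sketch below is an assumption on my part rather than part of the answer above; it leaves out mpl_finance and uses made-up closing prices with a 3-period average (SMA3) so it stays short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical 5-minute bars; only the Close column is needed for the average.
index = pd.date_range("2018-02-13 11:55", periods=6, freq="5min")
ohlc = pd.DataFrame(
    {"Close": [7171.6, 7173.9, 7172.0, 7174.7, 7175.1, 7176.0]},
    index=index,
)
ohlc["SMA3"] = ohlc["Close"].rolling(3).mean()

# Integer positions 0..N-1, the same x coordinates the candles would use.
positions = np.arange(len(ohlc))

fig, ax = plt.subplots()
(line,) = ax.plot(positions, ohlc["SMA3"], color="green", label="SMA3")
ax.legend()
```

With this layout the FuncFormatter from the question can keep translating integer tick positions back into timestamps for the labels.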
