I'm going insane here ... this should be a simple exercise but I'm stuck:
I have a Jupyter notebook and am using the ruptures Python package. All I want to do is take the figure or AxesSubplot(s) that the display() function returns and add it to a figure of my own, so that I can share the x-axis, end up with a single image, and so on:
import pandas as pd
import ruptures as rpt
import matplotlib.pyplot as plt
# df is an existing DataFrame with one numeric column per series
myfigure = plt.figure()
l = len(df.columns)
for index, (_, series) in enumerate(df.items()):
    data = series.to_numpy().astype(int)
    algo = rpt.KernelCPD(kernel='rbf', min_size=4).fit(data)
    result = algo.predict(pen=3)
    myfigure.add_subplot(l, 1, index+1)
    rpt.display(data, result)  # this opens a new figure instead of using the subplot
    plt.title(series.name)
plt.show()
What I get is a figure with the desired number of subplots (all empty) and n separate figures from ruptures:
Instead, I want the subplots to be filled with the ruptures figures ...
I basically had to recreate the plot that ruptures.display(data,result) produces, to get my desired figure:
import pandas as pd
import numpy as np
import ruptures as rpt
import matplotlib.pyplot as plt
from matplotlib.ticker import EngFormatter
fig, axs = plt.subplots(len(df.columns), figsize=(22,20), dpi=300)
for index, series in enumerate(df):  # iterating a DataFrame yields its column names
    # ffill() replaces pad(), which is deprecated in recent pandas
    resampled = df[series].dropna().resample('6H').mean().ffill()
    data = resampled.to_numpy().astype(int)
    algo = rpt.KernelCPD(kernel='rbf', min_size=4).fit(data)
    result = algo.predict(pen=3)
    # Create ndarray of (start, end) pairs from the result
    result = np.insert(result, 0, 0)  # Insert 0 as first result
    tuples = np.array([ result[i:i+2] for i in range(len(result)-1) ])
    ax = axs[index]
    # Fill area between results, alternating blue/red
    for i, tup in enumerate(tuples):
        if i%2==0:
            ax.axvspan(tup[0], tup[1], lw=0, alpha=.25)
        else:
            ax.axvspan(tup[0], tup[1], lw=0, alpha=.25, color='red')
    ax.plot(data)
    ax.set_title(series)
    ax.yaxis.set_major_formatter(EngFormatter())
plt.subplots_adjust(hspace=.3)
plt.show()
I've wasted more time on this than I can justify, but it's pretty now and I can sleep well tonight :D
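For anyone who wants to reuse this, the shading loop can be pulled into a small helper that draws onto whatever Axes you hand it. This is only a sketch: shade_segments is a made-up name, the data and breakpoints are synthetic stand-ins for a real series and the output of algo.predict(), and the Agg backend line is just there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def shade_segments(ax, data, breakpoints, colors=("tab:blue", "tab:red")):
    """Plot `data` on `ax` and shade the spans between consecutive breakpoints.

    `breakpoints` is assumed to look like the list used above: sorted segment
    end indices, with the last one equal to len(data).
    """
    edges = [0, *breakpoints]
    for k, (start, end) in enumerate(zip(edges[:-1], edges[1:])):
        # Alternate the fill color from one segment to the next.
        ax.axvspan(start, end, lw=0, alpha=0.25, color=colors[k % len(colors)])
    ax.plot(data)
    return ax


# Synthetic stand-in for one series and its predicted breakpoints.
data = np.concatenate([np.zeros(10), np.ones(15), np.zeros(5)])
breakpoints = [10, 25, 30]

# Two stacked subplots sharing the x-axis, both filled by the helper.
fig, axs = plt.subplots(2, sharex=True)
for ax in axs:
    shade_segments(ax, data, breakpoints)
```

With real data you would call shade_segments(axs[index], data, result) inside the loop, using the unmodified result from algo.predict().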
I am plotting some routes on a black-and-white PNG, and an item appears in the legend that should not be there. I iterate over a pandas DataFrame and identify the different routes by their unique id. The start and end points sit right at the beginning of the DataFrame, at i=0 and i=1, and I plot them with marker='o' so that those single points (single rows in my DataFrame) are visible on the plot. All of this works fine so far, but as you can see in the legend, there are two entries for i=0: the starting point, and on the second line an extra orange line. How can that be? The DataFrame definitely contains only one row with id=0.
Here is my code with an example DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x':[100,60,1,1,1,5,4,4], 'y':[100,125,1,2,3,10,10,9],'id':[0,1,2,2,2,3,3,3]})
for i, g in df.groupby('id'):
    if(i==0):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="Start Punkt")
    if(i==1):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="End Punkt")
    else:
        g.plot(x='x',y='y',ax=ax, title="Alternative Routes",label=i)
plt.show()
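Here is the resulting plot:
I found the answer myself: the check for i==1 should be an elif instead of an if. With two separate if statements, the else belongs only to the second one, so for i==0 the first branch plots the start point and then the else branch plots the same group again as a line labelled 0.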
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x':[100,60,1,1,1,5,4,4], 'y':[100,125,1,2,3,10,10,9],'id':[0,1,2,2,2,3,3,3]})
for i, g in df.groupby('id'):
    if(i==0):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="Start Punkt")
    elif(i==1):
        g.plot(x='x',y='y',ax=ax,marker='o',title="Alternative Routes",label="End Punkt")
    else:
        g.plot(x='x',y='y',ax=ax, title="Alternative Routes",label=i)
plt.show()
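To see the difference without any plotting, the two control flows can be traced in plain Python. This is just an illustrative sketch; the function names are made up, and each one records which plot calls would have been made for a given id.

```python
def branches_with_two_ifs(i):
    """Mimic the original flow: `if ... if ... else` (the else pairs with the second if)."""
    drawn = []
    if i == 0:
        drawn.append("Start Punkt")
    if i == 1:
        drawn.append("End Punkt")
    else:
        drawn.append(f"route {i}")
    return drawn


def branches_with_elif(i):
    """Mimic the fixed flow: `if ... elif ... else` (exactly one branch runs)."""
    drawn = []
    if i == 0:
        drawn.append("Start Punkt")
    elif i == 1:
        drawn.append("End Punkt")
    else:
        drawn.append(f"route {i}")
    return drawn


# Compare what each version would draw for the first few ids.
for i in range(3):
    print(i, branches_with_two_ifs(i), branches_with_elif(i))
```

Only id 0 differs between the two versions, which matches the single unexpected legend entry.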
I have a pandas dataframe which I would like to slice, and plot each slice in a separate subplot. I would like to use the sharey='all' and have matplotlib decide on some reasonable y-axis limits, rather than having to search the dataframe for the min and max and add offsets.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=0,ncols=0, sharey='all', tight_layout=True)
for i in range(1, len(df.columns) + 1):
    ax = fig.add_subplot(2,3,i)
    iC = df.iloc[:, i-1]
    iC.plot(ax=ax)
Which gives the following plot:
In fact, I get that regardless of what I set sharey to ('all', 'col', 'row', True, or False). What I was hoping for with sharey='all' is something like:
Can somebody explain to me what I'm doing wrong here?
The following version would only add those axes you need for your df-columns and share their y-scales:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig = plt.figure(tight_layout=True)
ref_ax = None
for i in range(len(df.columns)):
    ax = fig.add_subplot(2, 3, i+1, sharey=ref_ax)
    ref_ax=ax
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
plt.show()
The grid-layout parameters, given explicitly here as add_subplot(2, 3, ...), can of course be calculated from len(df.columns).
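One way that calculation might look is sketched below. grid_shape is a hypothetical helper, and the fixed width of three columns is only an assumption carried over from the example above; the Agg backend line is there so the sketch runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def grid_shape(n_plots, ncols=3):
    """Return (nrows, ncols) with just enough rows to hold n_plots axes."""
    nrows = max(1, math.ceil(n_plots / ncols))
    return nrows, ncols


df = pd.DataFrame(np.arange(50).reshape((5, 10))).transpose()
nrows, ncols = grid_shape(len(df.columns))

fig = plt.figure(tight_layout=True)
axes = []
for i, col in enumerate(df.columns):
    # Share the y-axis with the first axes created (None for the first one).
    ax = fig.add_subplot(nrows, ncols, i + 1, sharey=axes[0] if axes else None)
    df[col].plot(ax=ax)
    axes.append(ax)
```

Sharing every axes with the first one has the same effect as chaining them through ref_ax, since sharing is transitive.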
Your plots are not shared. You create a subplot grid with 0 rows and 0 columns, i.e. no subplots at all, and it is those nonexistent subplots that would have their y axes shared. Then you create some other (existing) subplots, which are not shared, and those are the ones you plot to.
Instead, set nrows and ncols to useful values and plot to the axes created that way.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=2,ncols=3, sharey='all', tight_layout=True)
for i, ax in zip(range(len(df.columns)), axes.flat):
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
for j in range(len(df.columns),len(axes.flat)):
    axes.flatten()[j].axis("off")
plt.show()
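If you want to convince yourself of the difference, a quick check is to compare the y-limits of two axes after plotting data on very different scales. The sketch below uses made-up data and variable names, with the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt

# Axes added one by one with add_subplot: independent y-axes.
fig_a = plt.figure()
a1 = fig_a.add_subplot(1, 2, 1)
a2 = fig_a.add_subplot(1, 2, 2)
a1.plot([0, 1])
a2.plot([0, 100])

# Axes created together by plt.subplots with sharey='all': one common y-range.
fig_b, (b1, b2) = plt.subplots(nrows=1, ncols=2, sharey="all")
b1.plot([0, 1])
b2.plot([0, 100])

unshared_match = a1.get_ylim() == a2.get_ylim()  # limits follow each axes' own data
shared_match = b1.get_ylim() == b2.get_ylim()    # limits are kept in sync
print("unshared limits match:", unshared_match)
print("shared limits match:", shared_match)
```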
I am trying to compare two CSV files and find the proper points of intersection between them. I am plotting them as graphs for better understanding.
But one graph comes out very small compared to the other.
See the following:
Here is the data: trade-volume.csv
Here is the real graph:
Here is the data: miners-revenue.csv
Here is the real graph:
Here is the program I wrote for comparison:
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
# .dt.days gives whole days; astype('timedelta64[D]') raises in recent pandas
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).dt.days
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).dt.days
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'])
ax.plot(dat3['timeDiff'], dat3['Value'])
plt.show()
I got the output like the following:
As you can see, the orange graph is squashed near the bottom, so I cannot make out its points. I would like to overlay the two graphs and then compare them.
Please help me achieve this with my existing code, ideally with as few alterations as possible.
The problem comes down to your y axis. One has a maximum of 60,000,000 while the other has a maximum of 6,000,000,000. Trying to plot these on the same graph is going to lead to one "looking" like a straight line even though it isn't if you zoom in.
A possible solution is to use a second y axis (you can change the color of the lines using the color= argument in ax.plot()):
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
# .dt.days gives whole days; astype('timedelta64[D]') raises in recent pandas
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).dt.days
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).dt.days
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value'], color="blue")
ax2=ax.twinx()
ax2.plot(dat3['timeDiff'], dat3['Value'], color="red")
plt.show()
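With two y axes it helps to color each axis like its line and to build one combined legend. Here is a rough, self-contained sketch of that idea; the data is synthetic (the CSV files are not available here), so the series shapes, magnitudes, and label texts are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the two series, roughly two orders of magnitude apart.
days = np.arange(100)
rng = np.random.default_rng(0)
volume = 6e9 * (0.5 + 0.5 * np.sin(days / 15)) + rng.normal(0, 1e8, days.size)
revenue = 6e7 * (0.5 + 0.5 * np.cos(days / 20)) + rng.normal(0, 1e6, days.size)

fig, ax = plt.subplots()
line1, = ax.plot(days, volume, color="blue", label="trade volume")
ax.set_xlabel("days since start")
ax.set_ylabel("trade volume", color="blue")
ax.tick_params(axis="y", labelcolor="blue")

# Second y-axis on the right, sharing the same x-axis.
ax2 = ax.twinx()
line2, = ax2.plot(days, revenue, color="red", label="miners revenue")
ax2.set_ylabel("miners revenue", color="red")
ax2.tick_params(axis="y", labelcolor="red")

# The lines live on different axes, so pass both handles to one legend.
legend = ax.legend(handles=[line1, line2], loc="upper right")
```

Keep in mind that crossings seen on a twin-axis plot depend on how each axis happens to be scaled, so they are visual only.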
The two datasets live on very different scales. You may normalize both in order to compare them.
import pandas as pd
import matplotlib.pyplot as plt
dat2 = pd.read_csv("trade-volume.csv", parse_dates=['time'])
dat3 = pd.read_csv("miners-revenue.csv", parse_dates=['time'])
# .dt.days gives whole days; astype('timedelta64[D]') raises in recent pandas
dat2['timeDiff'] = (dat2['time'] - dat2['time'][0]).dt.days
dat3['timeDiff'] = (dat3['time'] - dat3['time'][0]).dt.days
fig, ax = plt.subplots()
ax.plot(dat2['timeDiff'], dat2['Value']/dat2['Value'].values.max())
ax.plot(dat3['timeDiff'], dat3['Value']/dat3['Value'].values.max())
plt.show()
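Once both series are on the same 0 to 1 scale, the points where they cross can be located from sign changes of their difference. The following is only a sketch on made-up numbers: normalize_by_max and crossing_indices are hypothetical helpers, and both series are assumed to be sampled at the same time points.

```python
import numpy as np


def normalize_by_max(values):
    """Scale a series so that its maximum becomes 1."""
    values = np.asarray(values, dtype=float)
    return values / values.max()


def crossing_indices(a, b):
    """Return indices i where a - b changes sign between samples i and i + 1.

    Samples where the two series are exactly equal are not reported.
    """
    sign = np.sign(np.asarray(a) - np.asarray(b))
    return np.where(sign[:-1] * sign[1:] < 0)[0]


# Made-up data: one rising and one falling series on very different scales.
volume = np.array([1, 2, 3, 4, 5, 6]) * 1e9
revenue = np.array([6, 5, 4, 3, 2, 1]) * 1e7

vol_n = normalize_by_max(volume)
rev_n = normalize_by_max(revenue)
crossings = crossing_indices(vol_n, rev_n)
print(crossings)
```

Note that crossings of normalized curves say where the relative levels swap, not where the raw values are equal.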
I need to draw several datasets within a single plot. The number of datasets varies, so I don't know a priori how many there will be.
If I just draw the legends, I get this (MCVE below):
How can I tell plt.legend() to draw only, say, the first 10 legend entries? I've looked through the plt.legend() documentation, but there seems to be no argument for setting such a limit.
MCVE:
import numpy as np
import matplotlib.pyplot as plt
dataset = []
for _ in range(20):
    dataset.append(np.random.uniform(0, 1, 2))
lbl = ['adfg', 'dfgb', 'cgfg', 'rtbd', 'etryt', 'frty', 'jklg', 'jklh',
       'ijkl', 'dfgj', 'kbnm', 'bnmbl', 'qweqw', 'fghfn', 'dfg', 'hjt', 'dfb',
       'sdgdas', 'werwe', 'dghfg']
for i, xy in enumerate(dataset):
    plt.scatter(xy[0], xy[1], label=lbl[i])
plt.legend()
plt.savefig('test.png')
You can just limit the number of labels shown.
import matplotlib.pyplot as plt
maxn = 16
for i in range(25):
    plt.scatter(.5, .5, label=(i//maxn)*"_"+str(i))
plt.legend()
plt.show()
This method of course also works for text labels:
import numpy as np
import matplotlib.pyplot as plt
labels = ["".join(np.random.choice(list("ABCDEFGHIJK"), size=8)) for k in range(25)]
maxn = 16
for i,l in enumerate(labels):
    plt.scatter(.5, .5, label=(i//maxn)*"_"+l)
plt.legend()
plt.show()
The reason this works is that labels starting with "_" are ignored in the legend. This is used internally to give objects a label without showing them in the legend but can of course also be used by us to limit the number of elements in the legend.
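You can check this behaviour directly by asking the Axes which handles and labels it would put in the legend. A small sketch, with made-up series names and an arbitrary cap of 10 entries (the Agg backend line is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt

names = [f"series {i}" for i in range(25)]  # hypothetical labels
max_entries = 10

fig, ax = plt.subplots()
for i, name in enumerate(names):
    # Prefix everything past the cap with "_" so the legend skips it.
    label = name if i < max_entries else "_" + name
    ax.scatter(i, i, label=label)

# Only the labels that do not start with "_" are collected for the legend.
handles, labels = ax.get_legend_handles_labels()
legend = ax.legend()
print(len(names), "artists plotted,", len(labels), "legend entries")
```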
I would like to suggest an alternative way to get your desired output, which I feel relies less on a "hack" of the legend labels.
You can use the function Axes.get_legend_handles_labels() to get a list of the handles and the labels of the objects that are to be put in the legend.
You can truncate these lists however you feel like, before passing them to plt.legend(). For instance:
import numpy as np
import matplotlib.pyplot as plt
dataset = []
for _ in range(20):
    dataset.append(np.random.uniform(0, 1, 2))
lbl = ['adfg', 'dfgb', 'cgfg', 'rtbd', 'etryt', 'frty', 'jklg', 'jklh',
       'ijkl', 'dfgj', 'kbnm', 'bnmbl', 'qweqw', 'fghfn', 'dfg', 'hjt', 'dfb',
       'sdgdas', 'werwe', 'dghfg']
fig, ax = plt.subplots()
for i, xy in enumerate(dataset):
    ax.scatter(xy[0], xy[1], label=lbl[i])
h,l = ax.get_legend_handles_labels()
plt.legend(h[:3], l[:3]) # <<<<<<<< This is where the magic happens
plt.show()
You could even display every other label with plt.legend(h[::2], l[::2]), or whatever else you want.
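If you need this in several places, the slicing can be wrapped in a small function that takes any slice. This is just a sketch: legend_subset is a made-up helper name, and the default of ten entries is an arbitrary assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def legend_subset(ax, selection=slice(None, 10), **legend_kwargs):
    """Draw a legend showing only the handles/labels picked by `selection`."""
    handles, labels = ax.get_legend_handles_labels()
    return ax.legend(handles[selection], labels[selection], **legend_kwargs)


rng = np.random.default_rng(0)
lbl = [f"set {i}" for i in range(20)]  # hypothetical labels

fig, ax = plt.subplots()
for name in lbl:
    x, y = rng.uniform(0, 1, 2)
    ax.scatter(x, y, label=name)

# First three entries only.
first_three = [t.get_text() for t in legend_subset(ax, slice(None, 3)).get_texts()]
# Every other entry; calling ax.legend again replaces the previous legend.
every_other = [t.get_text() for t in legend_subset(ax, slice(None, None, 2)).get_texts()]
```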
I need to create a scatter matrix in Python. I tried using scatter_matrix for this, but I would like to keep only the scatter plots above the diagonal.
I'm at the very beginning (I did not get far) and I run into trouble when the columns have names instead of the default numbers.
Here is my code:
import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data=pd.DataFrame(np.random.randint(0,100,size=(10, 5)), columns=list('ABCDE')) #THE PROBLEM IS HERE - I WILL HAVE COLUMNS WITH NAMES
d = data.shape[1]
fig, axes = plt.subplots(nrows=d, ncols=d, sharex=True, sharey=True)
for i in range(d):
    for j in range(d):
        ax = axes[i,j]
        if i == j:
            ax.text(0.5, 0.5, "Diagonal", transform=ax.transAxes,
                    horizontalalignment='center', verticalalignment='center',
                    fontsize=16)
        else:
            ax.scatter(data[j], data[i], s=10)
The problem is how you select columns from the data frame: data[j] looks for a column labelled j, which does not exist once the columns are named 'A' to 'E'. You can use iloc to select columns by integer position instead. Change your last line to:
ax.scatter(data.iloc[:,j], data.iloc[:,i], s=10)
This gives a d x d grid with scatter plots off the diagonal and the text label on the diagonal.
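To get closer to the original goal of keeping only the plots above the diagonal, one option is to hide every axes below it. The sketch below is one possible way rather than the only one: it uses random demo data, puts the column name on the diagonal instead of the fixed "Diagonal" text, and sets the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random demo data with named columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 100, size=(10, 5)), columns=list("ABCDE"))

d = data.shape[1]
fig, axes = plt.subplots(nrows=d, ncols=d, sharex=True, sharey=True, figsize=(8, 8))
for i in range(d):
    for j in range(d):
        ax = axes[i, j]
        if i == j:
            # Diagonal cell: show the column name.
            ax.text(0.5, 0.5, data.columns[i], transform=ax.transAxes,
                    ha="center", va="center", fontsize=16)
        elif j > i:
            # Above the diagonal: column j on x, column i on y.
            ax.scatter(data.iloc[:, j], data.iloc[:, i], s=10)
        else:
            # Below the diagonal: hide the axes entirely.
            ax.set_visible(False)
```

Because sharex=True only labels the bottom row, most x tick labels disappear along with the hidden axes; you may want to re-enable them on the diagonal cells.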