How to generate matplotlib figures inside a for loop - python

Hello, I have a piece of code that reads an Excel data file, processes it, and then plots a figure. Now I want to plot many Excel data files in one run, with each file plotted to its own figure.
Example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import MultiCursor
import matplotlib.pylab as pl
Files_to_read = np.array([r"C:\file1",r"C:\file2"])
for ii in range(len(Files_to_read)):
    df=pd.read_excel(Files_to_read[ii])
    #do a lot of stuff to "df"
    fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2,2,sharex=True)
    fig.suptitle('some name')
    p1 = ax1.plot(df["some vector"], df["some vector"])
    p2 = ax2.plot(df["some vector"], df["some vector"])
    p3 = ax3.plot(df["some vector"], df["some vector"])
    p4 = ax4.plot(df["some vector"], df["some vector"])
    multi = MultiCursor(fig.canvas, (ax1, ax2, ax3, ax4), color='r', lw=1)
    plt.show()
Doing it like this generates one figure with the data of the first file and then overwrites the same figure with the data of the second file. How can I change it to generate a new figure on each pass through the for loop?

Move your call to plt.show() so that it occurs after the for loop has completed. All figures created in the loop are then displayed at once.
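A minimal sketch of that change, assuming the per-file processing is already done. The `frames` dictionary and its column names `x` and `y` are hypothetical stand-ins for the DataFrames read from the Excel files, so the snippet runs without any input files:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the DataFrames read from the Excel files
frames = {
    "file1": pd.DataFrame({"x": np.arange(10), "y": np.arange(10) ** 2}),
    "file2": pd.DataFrame({"x": np.arange(10), "y": np.sqrt(np.arange(10))}),
}

figures = []
for name, df in frames.items():
    # plt.subplots() creates a new figure on every pass through the loop
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex=True)
    fig.suptitle(name)
    for ax in (ax1, ax2, ax3, ax4):
        ax.plot(df["x"], df["y"])
    figures.append(fig)  # keep a reference to each figure

# A single call after the loop displays every figure that is still open
plt.show()
```

If you attach widgets such as MultiCursor, it is probably worth appending each one to a list as well, since the `multi` variable is rebound on every pass and a widget needs a live reference to stay responsive.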

Related

Hover all data in y-axis in mplcursor python

I'm currently doing some data visualization with Python, matplotlib and mplcursors that needs to show several parameters and their values at the same time over a given time period.
Sample CSV data that was extracted from a system:
https://i.stack.imgur.com/fjd1d.png
My expected output would look like this:
https://i.stack.imgur.com/zXGXA.png
I found a similar case, but it used numpy functions: "Add the vertical line to the hoverbox (see pictures)".
I hope someone can suggest the best approach to my problem.
Code below:
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
import pandas as pd
fig, ax=plt.subplots()
y1=ax.twinx()
y2=ax.twinx()
y2.spines.right.set_position(("axes", 1.05))
df=pd.read_csv(r"C:\Users\OneDrive\Desktop\sample.csv")
time=df['Time']
yd1=df['Real Power']
yd2=df['Frequency']
yd3=df['SOC']
l1=ax.plot(time,yd1,color='black', label='Real Power')
l2=y1.plot(time,yd2, color='blue', label='Frequency')
l3=y2.plot(time,yd3, color='orange', label='SOC')
df=pd.DataFrame(df)
arr=df.to_numpy()
print(arr)
def show_annotation(sel):
    x=sel.target[0]
    annotation_str = df['Real Power'][sel.index]
    #sel.annotation.set_text(annotation_str)
fig.autofmt_xdate()
cursor=mplcursors.cursor(hover=True)
cursor.connect('add', show_annotation)
plt.show()

How can I plot specific attributes rather than default of all attributes in Time Series

How can I plot specific attributes of a time series rather than the default of all attributes in the DataFrame? I would like to plot a time series of one particular attribute, and another of two particular attributes. Is it possible to make one time series graph of headcount and another of headcount and tables open? Below is the code I have been using; if I try to select specific variables I get errors. Thanks in advance.
# Load necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load data
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
headcount_df.describe()
headcount_df.columns
ax = plt.figure(figsize=(12, 3)).gca() # define axis
headcount_df.plot(ax = ax)
ax.set_xlabel('Date')
ax.set_ylabel('Number of guests')
ax.set_title('Time series of Casino data')
You might have to mess around with the ticks and some other formatting, but this should get you headed in the right direction.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
# Forward-fill missing dates; fillna('ffill') would insert the literal string 'ffill'
headcount_df['DateFormat'] = pd.to_datetime(headcount_df['DateFormat'].ffill())
headcount_df.set_index('DateFormat', inplace=True)
headcount_df.sort_index(inplace=True)
headcount_df_to = headcount_df[['TablesOpen']]
headcount_df_hc_to = headcount_df[['HeadCount', 'TablesOpen']]
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(12, 8))
headcount_df_to.plot(ax=axes[0], color=['orange'])
headcount_df_hc_to.plot(ax=axes[1], color=['blue', 'orange'])
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Tables Open')
axes[0].legend(loc='center left', bbox_to_anchor=(1, 0.5))
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Number of guests and Tables Open')
axes[1].legend(loc='center left', bbox_to_anchor=(1, 0.5))
fig.suptitle('Time Series of Casino data')
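As a possible alternative to slicing the DataFrame first, the columns can be picked through the `y` argument of `DataFrame.plot`. This is only a sketch: the data below is randomly generated, and `OtherColumn` is a hypothetical extra column that shows it being left out of both plots. Only the names `HeadCount` and `TablesOpen` come from the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data; only the column names HeadCount and
# TablesOpen are taken from the answer above
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=30, freq="D")
df = pd.DataFrame(
    {
        "HeadCount": rng.integers(0, 30, size=30),
        "TablesOpen": rng.integers(1, 5, size=30),
        "OtherColumn": rng.integers(0, 2, size=30),  # never plotted
    },
    index=dates,
)

fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(12, 8))
# y= selects which columns are drawn; everything else is ignored
df.plot(y="HeadCount", ax=axes[0])
df.plot(y=["HeadCount", "TablesOpen"], ax=axes[1])
axes[0].set_ylabel("Number of guests")
axes[1].set_ylabel("Guests and tables open")
```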

Multiple files, multiple plots saved to a multipage, single pdf file

I am working with more than 100 CSV files, which I open and plot in a loop. My aim is to save each plot on its own PDF page, producing one large PDF file in which each page contains the plot from a single file. I am looking at these examples - (1) and (2). Trying out combinations using matplotlib.backends.backend_pdf, I am unable to get the required result.
Here I re-create my code and the approach I am using:
pdf = PdfPages('alltogther.pdf')
fig, ax = plt.subplots(figsize=(20,10))
for file in glob.glob('path*'):
    df_in=pd.read_csv(file)
    df_d = df_in.resample('d')
    df_m = df_in.resample('m')
    y1=df_d['column1']
    y2=df_m['column2']
    plt.plot(y1,linewidth='2.5')
    plt.plot(y2,linewidth='2.5')
    pdf.savefig(fig)
With this all the plots are getting superimposed on the same figure and the pdf generated is empty.
You need to move the line
fig, ax = plt.subplots(figsize=(20,10))
inside the loop; otherwise each iteration reuses the same figure instance instead of creating a new one. Also note that you need to close the PDF when you are done with it, because closing is what finalizes the file. So the code should be
pdf = PdfPages('alltogther.pdf')
for file in glob.glob('path*'):
    fig, ax = plt.subplots(figsize=(20,10))
    df_in=pd.read_csv(file)
    df_d = df_in.resample('d')
    df_m = df_in.resample('m')
    y1=df_d['column1']
    y2=df_m['column2']
    plt.plot(y1,linewidth='2.5')
    plt.plot(y2,linewidth='2.5')
    pdf.savefig(fig)
pdf.close()
Edit
Complete, self-contained example:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
pdf = PdfPages('out.pdf')
for i in range(5):
    fig, ax = plt.subplots(figsize=(20, 10))
    plt.plot(np.random.random(10), linestyle=None, marker='.')
    pdf.savefig(fig)
pdf.close()
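A variant you might prefer with more than 100 files, offered as a sketch rather than the only way to do it: PdfPages can be used as a context manager, so the file is closed even if a plot raises, and closing each figure after saving keeps memory from growing with the number of pages. The output path in the temp directory and the page titles are assumptions for the example.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical output location; replace with your own path
out_path = os.path.join(tempfile.gettempdir(), "out_sketch.pdf")

n_pages = 5
with PdfPages(out_path) as pdf:  # closed automatically when the block exits
    for i in range(n_pages):
        fig, ax = plt.subplots(figsize=(20, 10))
        ax.plot(np.random.random(10), linestyle="None", marker=".")
        ax.set_title(f"Page {i + 1}")
        pdf.savefig(fig)  # one figure per PDF page
        plt.close(fig)    # release the figure once it has been saved
    page_count = pdf.get_pagecount()
```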

How to make horizontal linechart with categorical variables and timeseries?

I want to replicate plots from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5000555/pdf/nihms774453.pdf I'm particularly interested in the plot on page 16, right panel. I tried to do this in matplotlib, but it seems to me that there is no way to access the individual lines in a LineCollection.
I don't know how to change the color of each line according to the value at every index. I'd like to eventually get something like this: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/multicolored_line.html but for every line, according to the data.
This is what I tried:
The data, as a numpy array: https://pastebin.com/B1wJu9Nd
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib import colors as mcolors
%matplotlib inline
base_range = np.arange(qq.index.max()+1)
fig, ax = plt.subplots(figsize=(12,8))
ax.set_xlim(qq.index.min(), qq.index.max())
# ax.set_ylim(qq.columns[0], qq.columns[-1])
ax.set_ylim(-5, len(qq.columns) +5)
line_segments = LineCollection(
    [np.column_stack([base_range, [y]*len(qq.index)]) for y in range(len(qq.columns))],
    cmap='viridis',
    linewidths=(5),
    linestyles='solid',
)
line_segments.set_array(base_range)
ax.add_collection(line_segments)
axcb = fig.colorbar(line_segments)
plt.show()
My result: [image not preserved]
What I want to achieve: [image not preserved]
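One possible approach, sketched under assumptions: instead of one long line per row, split each horizontal line into short segments between consecutive x positions and pass the data values to `set_array`, so every segment gets its own color. The `values` array below is random, hypothetical data standing in for `qq` (one row per horizontal line), so the result will not match the paper's figure.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Hypothetical stand-in for the question's data: 6 lines, 50 time steps
rng = np.random.default_rng(0)
values = rng.random((6, 50))
n_rows, n_steps = values.shape
x = np.arange(n_steps)

segments = []
colors = []
for row in range(n_rows):
    # Points along the horizontal line y = row, shape (n_steps, 2)
    points = np.column_stack([x, np.full(n_steps, row)])
    # Pair consecutive points into segments, shape (n_steps - 1, 2, 2)
    segments.append(np.stack([points[:-1], points[1:]], axis=1))
    # Color each segment by the value at its left end
    colors.append(values[row, :-1])

segments = np.concatenate(segments)
colors = np.concatenate(colors)

fig, ax = plt.subplots(figsize=(12, 8))
lc = LineCollection(segments, cmap="viridis", linewidths=5)
lc.set_array(colors)  # one scalar per segment, mapped through the colormap
ax.add_collection(lc)
ax.set_xlim(0, n_steps - 1)
ax.set_ylim(-1, n_rows)
fig.colorbar(lc, ax=ax)
plt.show()
```

Coloring by the left end of each segment is an arbitrary choice here; averaging the two end values would work just as well.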

iPython/Jupyter Notebook and Pandas, how to plot multiple graphs in a for loop?

Consider the following code running in iPython/Jupyter Notebook:
from pandas import *
%matplotlib inline
ys = [[0,1,2,3,4],[4,3,2,1,0]]
x_ax = [0,1,2,3,4]
for y_ax in ys:
    ts = Series(y_ax,index=x_ax)
    ts.plot(kind='bar', figsize=(15,5))
I would expect two separate plots as output. Instead, I get the two series merged into a single plot.
Why is that? How can I get two separate plots while keeping the for loop?
Just add the call to plt.show() after you plot the graph (you might want to import matplotlib.pyplot to do that), like this:
from pandas import Series
import matplotlib.pyplot as plt
%matplotlib inline
ys = [[0,1,2,3,4],[4,3,2,1,0]]
x_ax = [0,1,2,3,4]
for y_ax in ys:
    ts = Series(y_ax,index=x_ax)
    ts.plot(kind='bar', figsize=(15,5))
    plt.show()
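As for the "why": when no `ax` is given, pandas draws onto the current matplotlib axes, so successive calls inside one cell land on the same axes until the figure is shown or a new one is created. A sketch of a variant that makes this explicit by creating one figure per iteration and passing its axes to pandas (the `figures` list is only there to keep references and is not required):

```python
import pandas as pd
import matplotlib.pyplot as plt

ys = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]]
x_ax = [0, 1, 2, 3, 4]

figures = []
for y_ax in ys:
    # A fresh figure and axes for every series
    fig, ax = plt.subplots(figsize=(15, 5))
    # ax=ax tells pandas exactly where to draw instead of using the current axes
    pd.Series(y_ax, index=x_ax).plot(kind="bar", ax=ax)
    figures.append(fig)
```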
In the IPython notebook the best way to do this is often with subplots. You create multiple axes on the same figure and then render the figure in the notebook. For example:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
ys = [[0,1,2,3,4],[4,3,2,1,0]]
x_ax = [0,1,2,3,4]
fig, axs = plt.subplots(ncols=2, figsize=(10, 4))
for i, y_ax in enumerate(ys):
    pd.Series(y_ax, index=x_ax).plot(kind='bar', ax=axs[i])
    axs[i].set_title('Plot number {}'.format(i+1))
This generates the two charts side by side in a single figure.
