Multiple files, multiple plots saved to a multipage, single pdf file - python

I am working with more than 100 CSV files, which I open and plot in a loop. My aim is to save each plot on its own PDF page, producing one large PDF in which each page contains the plot from a single file. I have looked at these examples, (1) and (2), and tried various combinations using matplotlib.backends.backend_pdf, but I am unable to get the required result.
Here I re-create my code and the approach I am using:
pdf = PdfPages('alltogther.pdf')
fig, ax = plt.subplots(figsize=(20,10))
for file in glob.glob('path*'):
    df_in=pd.read_csv(file)
    df_d = df_in.resample('d')
    df_m = df_in.resample('m')
    y1=df_d['column1']
    y2=df_m['column2']
    plt.plot(y1,linewidth='2.5')
    plt.plot(y2,linewidth='2.5')
    pdf.savefig(fig)
With this, all the plots are superimposed on the same figure and the generated pdf is empty.

You need to move the line
fig, ax = plt.subplots(figsize=(20,10))
inside the loop; otherwise every iteration draws on the same figure instance instead of a new one. Also note that you need to close the pdf when you are done with it, because PdfPages only finalizes the file on close. So the code should be
pdf = PdfPages('alltogther.pdf')
for file in glob.glob('path*'):
    fig, ax = plt.subplots(figsize=(20,10))
    df_in=pd.read_csv(file)
    df_d = df_in.resample('d')
    df_m = df_in.resample('m')
    y1=df_d['column1']
    y2=df_m['column2']
    plt.plot(y1,linewidth='2.5')
    plt.plot(y2,linewidth='2.5')
    pdf.savefig(fig)
pdf.close()
Edit
Complete, self-contained example:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np

pdf = PdfPages('out.pdf')
for i in range(5):
    fig, ax = plt.subplots(figsize=(20, 10))
    plt.plot(np.random.random(10), linestyle=None, marker='.')
    pdf.savefig(fig)
pdf.close()
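As a variation on the answer above, here is a minimal sketch that uses PdfPages as a context manager, so the file is closed even if a plot raises, and closes each figure after saving it. The function name csvs_to_pdf and its arguments are hypothetical, and it assumes every CSV has at least one numeric column; the forced Agg backend is only there so the sketch runs without a display.

```python
import glob

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; enough for writing files
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def csvs_to_pdf(pattern, out_path):
    """Write one PDF page per CSV matching `pattern`; return the page count.

    Hypothetical helper: assumes each CSV has at least one numeric column.
    """
    with PdfPages(out_path) as pdf:  # the file is finalized on exit
        for path in sorted(glob.glob(pattern)):
            df = pd.read_csv(path)
            fig, ax = plt.subplots(figsize=(20, 10))  # fresh figure per file
            df.plot(ax=ax)  # numeric columns plotted against the index
            ax.set_title(path)
            pdf.savefig(fig)  # appends this figure as a new page
            plt.close(fig)  # free the figure; matters with >100 files
        return pdf.get_pagecount()
```

A call such as csvs_to_pdf('path*', 'alltogther.pdf') would then mirror the loop in the question.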

Related

How to generate matplotlib figures inside a for loop

Hello, I have a piece of code which reads an Excel data file, processes it, and then plots a figure. Now I want to be able to plot many Excel data files in one run, each on its own figure.
example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import MultiCursor
import matplotlib.pylab as pl
Files_to_read = np.array([r"C:\file1",r"C:\file2"])
for ii in range(len(Files_to_read)):
    df=pd.read_excel(Files_to_read[ii])
    #do a lot of stuff to "df"
    fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2,2,sharex=True)
    fig.suptitle('some name')
    p1 = ax1.plot(df["some vector"], df["some vector"])
    p2 = ax2.plot(df["some vector"], df["some vector"])
    p3 = ax3.plot(df["some vector"], df["some vector"])
    p4 = ax4.plot(df["some vector"], df["some vector"])
    multi = MultiCursor(fig.canvas, (ax1, ax2, ax3, ax4), color='r', lw=1)
    plt.show()
Doing it like this generates one figure with the data of the first file and then overwrites the same figure with the data of the second file. How can I change it to generate a new figure on each pass through the for loop?
Move your call to plt.show() so that it occurs after the for loop has completed. All created figures will then be displayed at once.
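A minimal sketch of that advice, with made-up in-memory data standing in for the Excel files: each pass through the loop calls plt.subplots(), which creates a new Figure, and nothing is shown until the loop has finished. The helper make_figures and the dataset names are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def make_figures(datasets):
    """Create one independent 2x2 figure per dataset and return them all.

    `datasets` maps a title to an (x, y) pair; it stands in for the
    DataFrames that the question reads from Excel.
    """
    figures = []
    for name, (x, y) in datasets.items():
        fig, axes = plt.subplots(2, 2, sharex=True)  # new Figure each pass
        fig.suptitle(name)
        for ax in axes.flat:
            ax.plot(x, y)
        figures.append(fig)
    return figures  # nothing has been displayed yet
```

After figs = make_figures({...}), a single plt.show() opens all of the figures together.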

Matplotlib: Generating Subplots for Multiple Time Series

I have the following dataset that was randomly generated through a simulation I am building:
https://drive.google.com/drive/folders/1JF5QrliE9s8VPMaGc8Z-mwpFhNWkeYtk?usp=sharing
For debugging purposes, I would like to be able to view this data in a series of small multiples. Like this:
I am attempting to do this using matplotlib and pandas. Here is my code for that:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
def graph_file(f: str):
    """
    Graphs a single file of data
    and exports it as a pdf of separate charts.
    """
    data = pd.read_csv(f)
    header = data.columns
    fname = f[:-4] + '.pdf'
    with PdfPages(fname) as pdf:
        n = len(header)
        time: str = header[0]
        # Multiple charts on one page
        fig = plt.figure()
        for i in range(1, n):
            y: str = header[i]
            ax = fig.add_subplot()
            data.plot(x=time, y=y)
        pdf.savefig(bbox_inches='tight')
When I open up the .csv file and try to run the function using a Jupyter notebook, I get the same deprecation warning over and over again:
<ipython-input-5-0563709f3c08>:24: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
ax = fig.add_subplot()
The resulting pdf file does not contain a single page with multiple graphs (which is what I want like in the first image) but just a single page with a single graph:
What exactly am I doing wrong? I greatly appreciate any feedback you can give.
Here is a solution that should meet your needs. It reads the csv file into a dataframe and iterates through the columns of the dataframe to plot corresponding subplots.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
def graph_file(f: str):
    df = pd.read_csv(f)
    fig, axs = plt.subplots(nrows=3, ncols=3)
    fig.set_size_inches(20, 10)
    fig.subplots_adjust(wspace=0.5)
    fig.subplots_adjust(hspace=0.5)
    fname = f[:-4] + '.pdf'
    with PdfPages(fname) as pdf:
        for col, ax in zip(df.columns[1:], axs.flatten()):
            ax.plot(df['time (days)'], df[col])
            ax.set(xlabel='time (days)', ylabel=col)
            ax.tick_params(axis='x', labelrotation=30)
        pdf.savefig(bbox_inches='tight')
        plt.show()
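The answer above hardcodes a 3x3 grid and the column name 'time (days)'. As a hedged variation, the grid size can be derived from the number of columns. The sketch below is not the answerer's code: the name small_multiples and the ncols parameter are hypothetical, and it assumes the first column is the x axis and that at least one data column follows it.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; enough for writing files
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def small_multiples(df, out_path, ncols=3):
    """Plot each column after the first against the first, on one PDF page.

    Returns the number of grid rows used. Assumes at least two columns.
    """
    x_name, y_names = df.columns[0], list(df.columns[1:])
    nrows = math.ceil(len(y_names) / ncols)
    # squeeze=False keeps axs two-dimensional even for a single row
    fig, axs = plt.subplots(nrows, ncols, figsize=(20, 10), squeeze=False)
    cells = axs.ravel()
    for ax, col in zip(cells, y_names):
        ax.plot(df[x_name], df[col])
        ax.set(xlabel=x_name, ylabel=col)
        ax.tick_params(axis="x", labelrotation=30)
    for ax in cells[len(y_names):]:
        ax.set_visible(False)  # hide unused cells in the last row
    with PdfPages(out_path) as pdf:
        pdf.savefig(fig, bbox_inches="tight")
    plt.close(fig)
    return nrows
```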

How to make a graphing code graph with csv instead of txt

I have code that graphs stacked bar charts, importing its data from a .txt file.
df = pd.read_table("tiempo.txt",header =None, names = ['Tiempos no Contributivos','Tiempos Contributivos', 'Tiempos Productivos'])
df['bar'] = 1
dfgroup = df.groupby('bar').sum()
ax = dfgroup[['Tiempos no Contributivos','Tiempos Contributivos', 'Tiempos Productivos']].plot(kind='bar', title ="Tiempos de obra",
    figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel(" ", fontsize=12)
ax.set_ylabel("Tiempo(segundos)", fontsize=12)
plt.show()
The graph this code produces looks like this.
Graph Image
What I need is for this code to keep working as it does now, but reading a .csv file instead of a .txt file.
When I tried to swap the .txt file for the .csv one in the read call, this happened.
Error
The data used in this code looks like this (I won't post all of it because there are about 500 rows).
Example of the data used
So I fixed that and changed the function to read_csv. Then I noticed that your csv wasn't comma delimited but tab delimited, so I added delimiter='\t' to the read call. Next, I went ahead and updated the functions you were calling to set the ylabel and xlabel to their modern names. Then I took your current column names and fed them into the plot, so that we have a properly labeled graph.
import pandas as pd
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
df = pd.read_csv("tiempo.txt",header=None,delimiter='\t',names = ['Tiempos no Contributivos','Tiempos Contributivos', 'Tiempos Productivos'])
y_pos = np.arange(len(df.columns))
bars = df.sum(axis=0)
plt.bar(y_pos, bars, align='center', alpha=0.5)
plt.xticks(y_pos, df.columns)
plt.ylabel("Tiempo(segundos)", fontsize=12)
plt.xlabel(" ", fontsize=12)
plt.title('Tiempo')
plt.show()
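To check the delimiter fix without the original file, the same read can be tried on a small in-memory sample. The two rows below are invented for illustration; only the column names come from the question.

```python
import io

import pandas as pd

NAMES = ['Tiempos no Contributivos', 'Tiempos Contributivos', 'Tiempos Productivos']

# Invented tab-separated sample standing in for the real data file
sample = "1\t2\t3\n4\t5\t6\n"

# sep='\t' (an alias of delimiter='\t') splits each row on tabs
df = pd.read_csv(io.StringIO(sample), sep='\t', header=None, names=NAMES)
totals = df.sum(axis=0)  # one total per column, as used for the bar heights
```

If the separator were left at its default comma, each row would be read as a single field, which is the kind of mismatch the answer describes.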

Plot multiple csv files with Python/matplotlib loop

I have a directory filled with multiple .csv files, each with only two columns (a date and an integer). I am trying to get this code to loop over each file and plot them individually, so that there is a corresponding .png for each .csv. Every time it runs, I end up with the correct number of .png files, but each has exactly the same data. I have already implemented the plt.clf() method to clear the figure on each loop, but it doesn't work. Here is the code:
import numpy as np
import pylab as pl
import matplotlib.pyplot as plt
import datetime as DT
import matplotlib.dates as mdates
import scipy
import os
import glob
rootdir='/path/to/file'
for infile in glob.glob( os.path.join(rootdir, '*.csv.out') ):
    output = infile + '.out'
data= np.loadtxt(infile, delimiter=',',
    dtype={'names': ('date', 'session'),'formats': ('S10', 'i4')} )
#Organizes 2-column spreadsheet
dates, sessions = map(list, zip(*data))
print dates, sessions
x = [DT.datetime.strptime(date,"%m-%d-%y") for date in dates]
y = [sessions]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis_date()
ax.grid()
#Fills space under plotted line
ax.fill_between(x, sessions, color='blue')
# slants the x axis
fig.autofmt_xdate()
plt.plot(x,sessions)
plt.xlabel('Date')
plt.ylabel('Sessions')
plt.title('Peak Usage')
fileNameTemplate = r'\path\to\file\Plot{}.png'
for subdir,dirs,files in os.walk(rootdir):
    for count, file in enumerate(files):
        pl.savefig(fileNameTemplate.format(count), format='png')
        pl.clf()
I modeled the enumerator after a solution in this answer but I am still getting an issue.
You need to define a function for your plots, call that function from your loop, and include plt.close() at the end of said function.
Right now, you're not creating new plots as you walk the directory. The plot command needs to be inside the loop.
def plot():
    #do your plotting in here. If this is being called from a loop and the
    #variables used herein are defined before, it will use the
    #global values as they exist at the time. You can also end this function with
    fig.savefig(**args)
    plt.close()

for count, file in enumerate(files):
    plot()
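The outline above can be sketched as runnable code. This is one possible reading, not the answerer's implementation: the names plot_file and plot_directory are hypothetical, the files are passed in as arguments rather than read from globals, and each CSV is assumed to have two headerless columns with dates in the question's "%m-%d-%y" format.

```python
import glob
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; enough for writing files
import matplotlib.pyplot as plt
import pandas as pd


def plot_file(csv_path, png_path):
    """Plot one two-column CSV (date, sessions) and save it as a PNG."""
    df = pd.read_csv(csv_path, names=["date", "sessions"])
    dates = pd.to_datetime(df["date"], format="%m-%d-%y")
    fig, ax = plt.subplots()  # a new figure for every file
    ax.plot(dates, df["sessions"])
    ax.fill_between(dates, df["sessions"], color="blue")  # fill under the line
    ax.set(xlabel="Date", ylabel="Sessions", title="Peak Usage")
    ax.grid()
    fig.autofmt_xdate()  # slant the x-axis labels
    fig.savefig(png_path, format="png")
    plt.close(fig)  # close so the next file starts clean


def plot_directory(rootdir):
    """Save Plot<N>.png in `rootdir` for each CSV found there; return the paths."""
    outputs = []
    csv_paths = sorted(glob.glob(os.path.join(rootdir, "*.csv")))
    for count, csv_path in enumerate(csv_paths):
        png_path = os.path.join(rootdir, "Plot{}.png".format(count))
        plot_file(csv_path, png_path)  # read, plot and save inside the loop
        outputs.append(png_path)
    return outputs
```

The key difference from the question's code is that reading, plotting, and saving all happen once per file, inside the loop.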

plot data from CSV file with matplotlib

I have a CSV file at e:\dir1\datafile.csv.
It contains three columns and 10 heading and trailing lines need to be skipped.
I would like to plot it with numpy.loadtxt(), for which I haven't found any rigorous documentation.
Here is what I started to write from the several tries I found on the web.
import matplotlib as mpl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cbook as cbook
def read_datafile(file_name):
    # the skiprows keyword is for heading, but I don't know if trailing lines
    # can be specified
    data = np.loadtxt(file_name, delimiter=',', skiprows=10)
    return data

data = read_datafile('e:\dir1\datafile.csv')
x = ???
y = ???
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("Mains power stability")
ax1.set_xlabel('time')
ax1.set_ylabel('Mains voltage')
ax1.plot(x,y, c='r', label='the data')
leg = ax1.legend()
plt.show()
According to the docs numpy.loadtxt is
a fast reader for simply formatted files. The genfromtxt function provides more sophisticated handling of, e.g., lines with missing values.
so there are only a few options to handle more complicated files.
As mentioned numpy.genfromtxt has more options. So as an example you could use
import numpy as np
# raw string so the backslashes in the Windows path are taken literally
data = np.genfromtxt(r'e:\dir1\datafile.csv', delimiter=',', skip_header=10,
                     skip_footer=10, names=['x', 'y', 'z'])
to read the data and assign names to the columns (or read a header line from the file with names=True) and then plot it with
ax1.plot(data['x'], data['y'], color='r', label='the data')
I think numpy is quite well documented now. You can easily inspect the docstrings from within IPython, or by using an IDE like Spyder if you prefer to read them rendered as HTML.
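The skip_header and skip_footer options described above can be tried without the original file by feeding genfromtxt an in-memory text. The sample lines below are invented for illustration: 10 header lines, three data rows, and 10 trailing lines.

```python
import io

import numpy as np

# Invented sample: 10 heading lines, 3 data rows, 10 trailing lines
header = "".join("heading line {}\n".format(i) for i in range(10))
body = "0.0,230.1,50.0\n1.0,229.8,50.1\n2.0,230.4,49.9\n"
footer = "".join("trailing,row,{}\n".format(i) for i in range(10))

data = np.genfromtxt(io.StringIO(header + body + footer), delimiter=',',
                     skip_header=10, skip_footer=10, names=['x', 'y', 'z'])

# With names given, the result is a structured array indexed by column name
x = data['x']
y = data['y']
```

Because the result is a structured array, columns are accessed by name (data['x']) rather than by position (data[:, 0]).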
I'm guessing
x= data[:,0]
y= data[:,1]
