Plotting two datasets from different folders on one plot - python

I have two folders with similar number of files: maindirNo and maindirWith. I'm trying to plot each pair of similar files from folders on one plot:
for i in [maindirNo, maindirWith]:
    for root, dirs, files in os.walk(i):
        for fil in files:
            if 'output.rsv' in fil:
                df = pd.read_csv(os.path.join(i, fil), skiprows = 9, delimiter = r'\s+', header = None)
                df['SIMULATEDm'] = mergedlevels
                df['OBSERVEDm'] = df_observed['OBSERVEDm']
                df['date'] = pd.date_range('1/1991','12/2040', freq='MS')
                if i == maindirNo:
                    plt.plot(df['date'], df['SIMULATEDm'], 'b', label='No outlet')
                if i == maindirWith:
                    plt.plot(df['date'], df['SIMULATEDm'], 'r', label='With outlet')
                plt.legend(loc = 'lower right')
                plt.savefig('C:/Users/sgulbin/Desktop/AGU_Conf/plots/%s.jpg' %fil)
The problem is that I get either all datasets on a single plot or a separate plot for each file, whereas I need the two matching datasets on one plot. I assume I could append the output to an empty dataframe and then plot it, but is there a simpler way to plot them within the loop?
P.S. I know there are kind of similar questions to this, but not exactly.

pandas uses matplotlib, which gives you fig and ax when you create several plots, e.g. 5 plots in one column:
fig, ax = plt.subplots(5, 1)
You can then use ax[0], ax[1], etc. to choose the plot on which each line is drawn.
import matplotlib.pyplot as plt
import pandas as pd
import random
SIZE = 5
# create grid 5x1
fig, ax = plt.subplots(SIZE, 1)
# --- first folder --- blue ---
for idx in range(SIZE):
    # dataframe with random data as example
    df = pd.DataFrame([ random.randint(0,10) for _ in range(10) ])
    # draw it
    ax[idx].plot(df, 'b')
# --- second folder --- red ---
for idx in range(SIZE):
    # dataframe with random data as example
    df = pd.DataFrame([ random.randint(0,10) for _ in range(10) ])
    # draw it
    ax[idx].plot(df, 'r')
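If a separate image per pair of files is preferred over a fixed grid, one option is to loop over the shared file names once and draw both folders' curves on the same axes before saving. This is only a minimal sketch: it assumes both folders contain identically named CSV files with one numeric column, and plot_pairs, the "value" column, the folder names and the synthetic data are all hypothetical, not taken from the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_pairs(dir_no, dir_with, out_dir):
    """Save one figure per file name that exists in both folders (hypothetical helper)."""
    saved = []
    common = sorted(set(os.listdir(dir_no)) & set(os.listdir(dir_with)))
    for name in common:
        fig, ax = plt.subplots()  # a fresh figure for every pair
        for folder, color, label in [(dir_no, "b", "No outlet"),
                                     (dir_with, "r", "With outlet")]:
            df = pd.read_csv(os.path.join(folder, name))
            ax.plot(df.index, df["value"], color, label=label)
        ax.legend(loc="lower right")
        out_path = os.path.join(out_dir, os.path.splitext(name)[0] + ".png")
        fig.savefig(out_path)
        plt.close(fig)  # release the figure before the next pair
        saved.append(out_path)
    return saved


# Synthetic demo data: two input folders with three identically named files each
base = tempfile.mkdtemp()
folders = {}
for key in ("no", "with", "plots"):
    folders[key] = os.path.join(base, key)
    os.makedirs(folders[key])
for k in range(3):
    for key, offset in (("no", 0), ("with", 5)):
        pd.DataFrame({"value": [offset + k + j for j in range(12)]}).to_csv(
            os.path.join(folders[key], f"run_{k}.csv"), index=False)

saved_paths = plot_pairs(folders["no"], folders["with"], folders["plots"])
```

Creating the figure inside the loop over file names, rather than inside the loop over folders, is what keeps exactly two lines on each saved plot.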


Plotting multiple excel sheets on the same graph

I have this Excel file that I need to plot. So far my code looks like this:
import pandas as pd
import matplotlib.pyplot as plt
file = 'weatherdata.xlsx'
def plotMeteoData(file,x_axis,metric,*list_of_cities):
    df = pd.ExcelFile(file)
    sheets = df.sheet_names
    df_list = []
    for city in list_of_cities:
        df_list.append(df.parse(city))  # assumed loop body: read one sheet per city
    x = list(range(len(df_list[0][x_axis])))  # assumed: one bar position per x_axis entry
    width = 0.3
    for j,df_item in enumerate(df_list):
        plt.bar(x,df_item[metric],width,label = sheets[j]) #this isn't working
        x = [i+width for i in x]
and gives this output.
Each color represents an excel sheet, x-axis represents the months and y-axis represents some values.
I've commented the line where I'm trying to add some labels for each sheet and I'm unable to. Also if you look at the above output the bars aren't centered with each xtick. How can I fix those problems? Thanks
Typically you use plt.subplots, as it gives you more control over the graph. The code below calculates the offset needed for the xtick labels to be centered and shows the legend with the city labels:
import pandas as pd
import matplotlib.pyplot as plt
file = 'weatherdata.xlsx'
def plotMeteoData(file,x_axis,metric,*list_of_cities):
    df = pd.ExcelFile(file)
    sheets = df.sheet_names
    df_list = []
    for city in list_of_cities:
        df_list.append(df.parse(city))  # assumed loop body: read one sheet per city
    x = list(range(len(df_list[0][x_axis])))  # assumed: one bar position per x_axis entry
    width = 0.3
    # Calculate the offset of the center of the xtick labels
    xTickOffset = width*(len(list_of_cities)-1)/2
    # Create a plot
    fig, ax = plt.subplots()
    for j,df_item in enumerate(df_list):
        ax.bar(x,df_item[metric],width,label = sheets[j])
        x = [i+width for i in x]
    # Add a legend (feel free to change the location)
    ax.legend(loc='upper right')
    # Add the xTickOffset to the xtick label positions so they are centered
    ax.set_xticks(list(map(lambda x:x+xTickOffset, range(len(df_list[0][x_axis].tolist())))),labels=df_list[0][x_axis].tolist())
t=['Athena', 'Thessaloniki', 'Patras']
Resulting Graph:
The xtick offset accounts for different numbers of Excel sheets. See this for more information on legends.
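The centering offset can be tried without the Excel file. The sketch below uses made-up monthly values for three cities (the numbers and the months list are hypothetical) and shifts the ticks by width*(n-1)/2, so that each label sits under the middle of its group of bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = ["Jan", "Feb", "Mar", "Apr"]
# hypothetical values, one list per city
data = {
    "Athena": [10, 12, 15, 19],
    "Thessaloniki": [6, 8, 12, 17],
    "Patras": [11, 12, 14, 18],
}

width = 0.3
x = np.arange(len(months))  # left-most bar position of every group

fig, ax = plt.subplots()
for j, (city, values) in enumerate(data.items()):
    # shift every city by one bar width so the bars sit side by side
    ax.bar(x + j * width, values, width, label=city)

# centre of a group = first bar position + half the span of the remaining bars
tick_offset = width * (len(data) - 1) / 2
ax.set_xticks(x + tick_offset)
ax.set_xticklabels(months)
ax.legend(loc="upper right")

tick_positions = ax.get_xticks()
```

With three cities and width = 0.3 the offset equals one bar width, so the ticks land on the middle bar of each group.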

How to use for loop to plot in subplots in Python

Example Input Data:
I am a beginner in Python. I use a for loop to read several CSV files that look like the one above (all of the files have the same format).
So far my code looks like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
ax, fig = plt.subplots(4,4,sharex=False, sharey=False, figsize=(22, 10), dpi=70, linewidth=0.5)
ax = np.array(ax)
def loop_directory(directory):
    for filename in os.listdir(directory):
        if filename.endswith(".csv"):
            file_directory = os.path.join(directory, filename)
            # print(filename)
            df = pd.read_csv(file_directory)
            df = df[df['Tavg'].isin([-999]) == False]
            df[['Year','Month']] = df[['Year','Month']].astype(int).astype(str)
            df["Year&Month"] = df[['Year', 'Month']].agg("/".join,axis=1)
            df["Year&Month"] = pd.to_datetime(df["Year&Month"])
            x = df["Year&Month"]
            y = df["Tavg"]
            for axes,col in zip(x, y):
                axes.plot(df.index, df[col]) # here is the problem, i dont know how to use for loop to plot in subplots
if __name__ == "__main__":
I've tried more than ten times, but it didn't work at all.
I want to know how to use this syntax, e.g. ax, zip, etc.
I want to draw a plot in every subplot of one figure.
Every ax should contain a plot.
Firstly, you have your fig and ax reversed in your call to plt.subplots, it should be:
fig, ax = plt.subplots(4,4,sharex=False, sharey=False, figsize=(22, 10), dpi=70, linewidth=0.5)
You can then access each set of axes to call plot by indexing. You can index the 4 by 4 numpy array to get each axes set in your 4 by 4 grid of plots. i.e. ax[0, 0].plot(...), ax[0, 1].plot(...), etc. up to ax[3, 3].plot(...)
Your question needs a bit more information to clarify how you want the data plotted though! I can see you combine the first two columns so that you have 4 columns, but consider how do you want each sample to be plotted.
EDIT: As you want to plot your files sequentially in ax[0, 0], ax[0, 1], etc., you can flatten the 2D numpy array of axes to get a 1D iterable that you can loop through or index with one value. I don't have your files so I can't test it but here's some demo code that should give you an idea of what to do.
As @sam mentioned in the comments, you should separate your CSV collection logic from your plotting logic.
def loop_directory(directory):
    # Get all files, filter for '.csv' and prepend dir path
    files = os.listdir(directory)
    csvs = [os.path.join(directory, f) for f in files if f.endswith('.csv')]
    return csvs
def plot_csvs(csvs):
    fig, ax = plt.subplots(4, 4, sharex=False, sharey=False, figsize=(22, 10), dpi=70, linewidth=0.5)
    ax = np.array(ax).flatten() # Flatten to 1D, [0 ,0], [0, 1], etc
    # This assumes number of subplots >= number of CSVs
    for i, filename in enumerate(csvs):
        df = pd.read_csv(filename)
        # Do your processing here
        x = df["Year&Month"]
        y = df["Tavg"]
        ax[i].plot(x, y)
csv_dir = '/path/to/csv/dir'
csv_paths = loop_directory(csv_dir)
plot_csvs(csv_paths)
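The flattening step can be tried in isolation, without any CSV files. In the sketch below the per-file data is replaced by random series (a hypothetical stand-in), and the grid cells that are left over are hidden, which covers the case where there are fewer datasets than subplots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# hypothetical stand-in for the per-file data: 5 series of 12 monthly values
rng = np.random.default_rng(0)
series = [rng.normal(20, 5, size=12) for _ in range(5)]

fig, axes = plt.subplots(2, 3, figsize=(9, 5))
flat_axes = axes.flatten()  # shape (2, 3) -> 6 Axes in row-major order

# zip stops at the shorter sequence, so extra Axes are simply not drawn on
for ax, y in zip(flat_axes, series):
    ax.plot(range(1, 13), y)

# hide the grid cells that received no data
for ax in flat_axes[len(series):]:
    ax.set_visible(False)

used = [ax for ax in flat_axes if len(ax.lines) > 0]
```

Iterating with zip instead of indexing ax[i] also avoids an IndexError when there are more datasets than subplots, although the surplus datasets are then silently skipped.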

Merging legends of different lineplots using seaborn

I would like to plot two dataframes in 'long' representation, with differing axes, on one plot using sns.lineplot(). However, I am failing to plot them with a single legend containing the elements of both lineplots.
The issue is similar to this: Secondary axis with twinx(): how to add to legend?, though I'd like to use seaborn.
A minimal working example up to the point I got stuck is given below.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.subplots() below
import numpy as np
import itertools
# mock dataset
lst = range(1,11)
steps1 = list(itertools.chain.from_iterable(itertools.repeat(x, 4) for x in lst))
labels1 = ['A','B']*20
values1 = list(np.random.uniform(0,1,40))
df1 = pd.DataFrame({'steps':steps1, 'lab':labels1, 'vals':values1})
lst = range(6,11)
steps2 = list(itertools.chain.from_iterable(itertools.repeat(x, 4) for x in lst))
labels2 = ['C','D']*10
values2 = list(np.random.uniform(10,20,20))
df2 = pd.DataFrame({'steps':steps2, 'lab2':labels2, 'others':values2})
# plotting
fig, ax = plt.subplots()
fig = sns.lineplot(x='steps',y='vals', data=df1, hue='lab',palette='bright', legend='brief')
ax2 = ax.twinx()
fig2 = sns.lineplot(x='steps',y='others', hue='lab2', data=df2 ,palette='dark', legend='brief')
# How do I merge the legends into one?
# the solution below gives me one merged and one separate legend
h1,l1 = fig.get_legend_handles_labels()
h2,l2 = fig2.get_legend_handles_labels()
ax.legend(loc=3, handles=h1+h2, labels = l1+l2)
I just resolved it by removing the obsolete legend by ax2.get_legend().remove().

How to plot data from multiple files in a loop

I have more than 1000 .csv files (data_1.csv......data1000.csv), each containing X and Y values:
x1 y1 x2 y2
5.0 60 5.5 500
6.0 70 6.5 600
7.0 80 7.5 700
8.0 90 8.5 800
9.0 100 9.5 900
I have made a subplot program in python which can give two plots (plot1 - X1vsY1, Plot2 - X2vsY2) at a time using one file.
I need help in looping all the files, (open a file, read it, plot it, pick another file, open it, read it, plot it, ... until all the files in a folder get plotted)
I have the following code:
import pandas as pd
import matplotlib.pyplot as plt
fig = plt.figure()
plt.subplot(2, 1, 1)
plt.subplot(2, 1, 2)
How can this be accomplished more efficiently?
You can generate a list of filenames using glob and then plot them in a for loop.
import glob
import pandas as pd
import matplotlib.pyplot as plt
files = glob.glob('*.csv')  # file pattern, something like '*.csv'
for file in files:
    df1 = pd.read_csv(file)  # assumed read step; set header/sep to match your files
    fig = plt.figure()
    plt.subplot(2, 1, 1)
    plt.subplot(2, 1, 2)
    plt.plot(df1.iloc[:,[3]],df1.iloc[:,[4]])
    plt.show() # this will stop the loop until you close the plot
I used NetCDF (.nc) files, in case anyone is interested in using NetCDF data. You could replace them with .txt files too; the idea is the same. I used this for a contour plot loop.
path_to_folder='#type the path to the files'
count = 0  # assumed counter for the subplot position
fig = plt.figure(figsize=(10,5))
files = []
for i in os.listdir(path_to_folder):
    if i.endswith('.nc'):
        count += 1  # assumed increment: one subplot per file
        # prec is the data variable read from file i
        plt.subplot(1, 2, count) # change 1 and 2 to the shape you want
        prec.groupby('time.month').mean(dim=('time','longitude')).T.plot.contourf(cmap='Purples') # this draws a contour plot, but you can replace it with any plot command
Here is the basic setup for what I am using here at work. This code plots the data from each file, going through each file separately. It works on any number of files as long as the column names remain the same. Just point it to the proper folder.
import os
import csv
import numpy as np
import matplotlib.pyplot as plt
def graphWriterIRIandRut():
    m = 0
    List1 = []
    List2 = []
    List3 = []
    List4 = []
    fileList = []
    for file in os.listdir(os.getcwd()):
        fileList.append(file)  # assumed loop body: collect the file names
    while m < len(fileList):
        # the 'rU' open mode was removed in Python 3.11; newline='' is what the csv module expects
        for col in csv.DictReader(open(fileList[m], newline='')):
            List1.append(col['Col 1 Name'])
            List2.append(col['Col 2 Name'])
            List3.append(col['Col 3 Name'])
            List4.append(col['Col 4 Name'])
        plt.subplot(2, 1, 1)
        colors = np.random.rand(n)
        plt.tick_params(axis='both', which='major', labelsize=8)
        plt.subplot(2, 1, 2)
        colors = np.random.rand(n)
        plt.tick_params(axis='both', which='major', labelsize=8)
        m = m + 1
# plotting all the file data and saving the plots
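import os
import csv
import matplotlib.pyplot as plt
def graphWriterIRIandRut():
    m = 0
    List1 = []
    List2 = []
    List3 = []
    List4 = []
    fileList = []
    for file in os.listdir(os.getcwd()):
        fileList.append(file)  # assumed loop body: collect the file names
    while m < len(fileList):
        # the 'rU' open mode was removed in Python 3.11; newline='' is what the csv module expects
        for col in csv.DictReader(open(fileList[m], newline='')):
            # assumed body, mirroring the version above
            List1.append(col['Col 1 Name'])
            List2.append(col['Col 2 Name'])
            List3.append(col['Col 3 Name'])
            List4.append(col['Col 4 Name'])
        plt.subplot(2, 1, 1)
        # colors = np.random.rand(2)
        plt.tick_params(axis='both', which='major', labelsize=8)
        plt.subplot(2, 1, 2)
        # colors = np.random.rand(2)
        plt.tick_params(axis='both', which='major', labelsize=8)
        m = m + 1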
What we want to do is create new empty lists for each iteration, i.e. for each file. So on each iteration the data is plotted, and once that data has been plotted, new empty lists are created for the next file. Once the data from every file has been plotted, you finally call plt.show(), which shows all the plots together. Here is a link to a similar problem I was having: Traceback lines on plot of multiple files. Good luck!
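import os
import csv
import numpy as np
import matplotlib.pyplot as plt
def graphWriter():
    for filename in os.listdir(os.getcwd()):
        # fresh lists for every file
        List1 = []
        List2 = []
        List3 = []
        List4 = []
        with open(filename, 'r') as file:
            for col in csv.DictReader(file):
                # assumed body, mirroring the earlier version
                List1.append(col['Col 1 Name'])
                List2.append(col['Col 2 Name'])
                List3.append(col['Col 3 Name'])
                List4.append(col['Col 4 Name'])
        plt.subplot(2, 1, 1)
        colors = np.random.rand(2)
        plt.tick_params(axis='both', which='major', labelsize=8)
        plt.subplot(2, 1, 2)
        colors = np.random.rand(2)
        plt.tick_params(axis='both', which='major', labelsize=8)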
If for some reason @Neill Herbst's answer (which I consider the easiest way) doesn't work as expected: I ran into a problem reading the files, so I rearranged the code into a version that worked for me.
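import glob
import pandas as pd
import matplotlib.pyplot as plt
for file in glob.glob("*.csv"):
    df1 = pd.read_csv(file)  # assumed read step; set header/sep to match your files
    fig = plt.figure()
    plt.subplot(2, 1, 1)
    plt.subplot(2, 1, 2)
    plt.plot(df1.iloc[:,[3]],df1.iloc[:,[4]])
    plt.show() # plots one csv; when you close it, the next one is plotted
# plt.show() <------ call it here instead, after the loop, if you want to see all the plots in different windows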
Using p = Path(...): p → WindowsPath('so_data/files')
files = p.rglob(...) yields all files matching the pattern
file[0] → WindowsPath('so_data/files/data_1.csv')
p.parent / 'plots' / f'{file.stem}.png' → WindowsPath('so_data/plots/data_1.png')
p.parent → WindowsPath('so_data')
file.stem → data_1
This assumes all directories exist. Directory creation / checking is not included.
This example uses pandas, as does the OP.
Plotted with pandas.DataFrame.plot, which uses matplotlib as the default backend.
Use .iloc to specify the columns, and then x=0 will always be the x-axis data, based on the given example data.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
p = Path('so_data/files') # specify the path to the files
files = p.rglob('data_*.csv') # generator for all files based on rglob pattern
for file in files:
    df = pd.read_csv(file, header=0, sep=',') # specify header row and separator as needed
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5))
    df.iloc[:, [0, 1]].plot(x=0, ax=ax1) # plot 1st x/y pair; assumes x data is at position 0
    df.iloc[:, [2, 3]].plot(x=0, ax=ax2) # plot 2nd x/y pair; assumes x data is at position 0
    fig.savefig(p.parent / 'plots' / f'{file.stem}.png')
    plt.close(fig) # close each figure, otherwise they stay in memory
Sample Data
This is for testing the plotting code
Create a so_data/files directory manually.
df = pd.DataFrame({'x1': [5.0, 6.0, 7.0, 8.0, 9.0], 'y1': [60, 70, 80, 90, 100], 'x2': [5.5, 6.5, 7.5, 8.5, 9.5], 'y2': [500, 600, 700, 800, 900]})
for x in range(1, 1001):
    df.to_csv(f'so_data/files/data_{x}.csv', index=False)
Alternate Answer
This answer addresses cases where there are many consecutive pairs of x/y columns
df.columns creates an array of columns that can be chunked into pairs
For consecutive column pairs, this answer works
list(zip(*[iter(df.columns)]*2)) → [('x1', 'y1'), ('x2', 'y2')]
If necessary, use some other pattern to create pairs of columns
Use .loc, since there will be column names, instead of .iloc for column indices.
p = Path('so_data/files')
files = p.rglob('data_*.csv')
for file in files:
    df = pd.read_csv(file, header=0, sep=',')
    col_pair = list(zip(*[iter(df.columns)]*2)) # extract column pairs
    fig, axes = plt.subplots(len(col_pair), 1) # a number of subplots based on number of col_pairs
    axes = axes.ravel() # flatten the axes if necessary
    for cols, ax in zip(col_pair, axes):
        df.loc[:, cols].plot(x=0, ax=ax) # assumes x data is at position 0
    fig.savefig(p.parent / 'plots' / f'{file.stem}.png')
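The pairing idiom zip(*[iter(...)]*2) is compact but easy to misread. The short sketch below (with a hypothetical list of column labels) shows why it works: the same iterator object is handed to zip twice, so each tuple consumes two consecutive items. A slicing alternative that gives the same pairs is included for comparison.

```python
columns = ["x1", "y1", "x2", "y2", "x3", "y3"]  # hypothetical column labels

# one shared iterator passed twice: zip pulls two consecutive items per tuple
it = iter(columns)
pairs = list(zip(it, it))  # same idea as list(zip(*[iter(columns)] * 2))

# explicit alternative: even positions paired with odd positions
pairs_sliced = list(zip(columns[0::2], columns[1::2]))
```

Both variants drop a trailing unpaired column if the number of columns is odd.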

How to plot multiple dataframes in subplots

I have a few Pandas DataFrames sharing the same value scale, but having different columns and indices. When invoking df.plot(), I get separate plot images. What I really want is to have them all in the same plot as subplots, but I'm unfortunately failing to come up with a solution and would highly appreciate some help.
You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2x2):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2)
Here axes is an array which holds the different subplot axes, and you can access one just by indexing axes.
If you want a shared x-axis, then you can provide sharex=True to plt.subplots.
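As a concrete illustration of the ax keyword, the sketch below draws four dataframes on a 2x2 grid with a shared x-axis. The dataframes and their column names are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# four hypothetical dataframes with different column names
frames = [pd.DataFrame(rng.random((10, 2)), columns=[f"a{i}", f"b{i}"])
          for i in range(4)]

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True)
frames[0].plot(ax=axes[0, 0])  # top-left
frames[1].plot(ax=axes[0, 1])  # top-right
frames[2].plot(ax=axes[1, 0])  # bottom-left
frames[3].plot(ax=axes[1, 1])  # bottom-right

lines_per_axes = [len(ax.lines) for ax in axes.flat]
```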
The documentation has an example demonstrating joris's answer. Also from the documentation: you can set subplots=True and layout=(rows, cols) within the pandas plot function:
df.plot(subplots=True, layout=(1,2))
You could also use fig.add_subplot() which takes subplot grid parameters such as 221, 222, 223, 224, etc. as described in the post here. Nice examples of plot on pandas data frame, including subplots, can be seen in this ipython notebook.
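Both variants mentioned above can be compared side by side. In this sketch (the dataframe is random example data), df.plot(subplots=True, layout=...) lets pandas create one subplot per column, while fig.add_subplot() creates the axes by hand and passes them in via ax.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])  # example data

# Variant 1: pandas builds the grid, one subplot per column
pandas_axes = df.plot(subplots=True, layout=(1, 2), figsize=(8, 3))

# Variant 2: build the axes manually; 221 means 2x2 grid, first cell
fig = plt.figure()
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
df["A"].plot(ax=ax1)
df["B"].plot(ax=ax2)
```

Variant 1 only covers the columns of a single dataframe, while variant 2 also works when each subplot comes from a different dataframe.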
You can plot multiple pandas dataframes as subplots with matplotlib using a simple trick: put all the dataframes in a list, then use a for loop to plot each one on its own subplot.
Working code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
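# define number of rows and columns for subplots
nrow = 3  # assumed values: 3 x 2 gives one subplot per dataframe
ncol = 2
# make a list of all dataframes
df_list = [df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count = 0
for r in range(nrow):
    for c in range(ncol):
        df_list[count].plot(ax=axes[r, c])
        count += 1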
Using this code you can plot subplots in any configuration. You need to define the number of rows nrow and the number of columns ncol. Also, you need to make list of data frames df_list which you wanted to plot.
You can use the familiar Matplotlib style calling a figure and subplot, but you simply need to specify the current axis using plt.gca(). An example:
df.A.plot() #no need to specify for first axis
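A fuller version of this pattern might look like the sketch below, which uses random example data. It relies on Series.plot drawing on the current axes when a figure already exists, whereas passing ax=plt.gca() makes the target explicit.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.random((10, 2)), columns=["A", "B"])  # example data

fig = plt.figure()
plt.subplot(2, 1, 1)
df["A"].plot()              # a Series is drawn on the current axes
plt.subplot(2, 1, 2)
df["B"].plot(ax=plt.gca())  # pass the current axes explicitly
```

Passing ax=plt.gca() everywhere is the safer habit, because DataFrame.plot (as opposed to Series.plot) opens a new figure when no ax is given.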
You can use this:
fig = plt.figure()
ax = fig.add_subplot(221)
ax = fig.add_subplot(222)
You may not need to use Pandas at all. Here's a matplotlib plot of cat frequencies:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x**2)
f, axes = plt.subplots(2, 1)
for c, i in enumerate(axes):
    axes[c].plot(x, y)
Option 1: Create subplots from a dictionary of dataframes with long (tidy) data
There is a dictionary of multiple dataframes of tidy data that are either:
Created by reading in from files
Created by separating a single dataframe into multiple dataframes
The categories, cat, may be overlapping, but all dataframes don't necessarily contain all values of cat
This example uses a dict of dataframes, but a list of dataframes would be similar.
If the dataframes are wide, use pandas.DataFrame.melt to convert them to long form.
Because dataframes are being iterated through, there's no guarantee that colors will be mapped the same for each plot
A custom color map needs to be created from the unique 'cat' values for all the dataframes
Since the colors will be the same, place one legend to the side of the plots, instead of a legend in every plot
Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1, seaborn 0.11.2
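For the wide-to-long conversion mentioned in the list above, a minimal sketch with pandas.DataFrame.melt could look like this. The wide dataframe and the target column names cat and y are chosen here only to match the example data used in this answer.

```python
import pandas as pd

# hypothetical wide data: one column per category
wide = pd.DataFrame({"x": [1, 2, 3],
                     "A": [0.1, 0.2, 0.3],
                     "B": [1.0, 2.0, 3.0]})

# keep x as identifier; former column names become values of 'cat'
long_df = wide.melt(id_vars="x", var_name="cat", value_name="y")
```

Each row of long_df then holds one (x, cat, y) observation, which is the shape that hue='cat' expects.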
Imports and Test Data
import pandas as pd
import numpy as np # used for random data
import matplotlib.pyplot as plt
from matplotlib.patches import Patch # for custom legend - square patches
from matplotlib.lines import Line2D # for custom legend - round markers
import seaborn as sns
import math # math.ceil determines the correct number of subplot rows
# synthetic data
df_dict = dict()
for i in range(1, 7):
    np.random.seed(i) # for repeatable sample data
    data_length = 100
    data = {'cat': np.random.choice(['A', 'B', 'C'], size=data_length),
            'x': np.random.rand(data_length), 'y': np.random.rand(data_length)}
    df_dict[i] = pd.DataFrame(data)
# display(df_dict[1].head())
cat x y
0 B 0.944595 0.606329
1 A 0.586555 0.568851
2 A 0.903402 0.317362
3 B 0.137475 0.988616
4 B 0.139276 0.579745
# display(df_dict[6].tail())
cat x y
95 B 0.881222 0.263168
96 A 0.193668 0.636758
97 A 0.824001 0.638832
98 C 0.323998 0.505060
99 C 0.693124 0.737582
Create color mappings and plot
# create color mapping based on all unique values of cat
unique_cat = {cat for v in df_dict.values() for cat in v['cat'].unique()} # get unique cats
colors = sns.color_palette('tab10', n_colors=len(unique_cat)) # get a number of colors
cmap = dict(zip(unique_cat, colors)) # zip values to colors
col_nums = 3 # how many plots per row
row_nums = math.ceil(len(df_dict) / col_nums) # how many rows of plots
# create the figure and axes
fig, axes = plt.subplots(row_nums, col_nums, figsize=(9, 6), sharex=True, sharey=True)
# convert to 1D array for easy iteration
axes = axes.flat
# iterate through dictionary and plot
for ax, (k, v) in zip(axes, df_dict.items()):
    sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap, ax=ax)
    sns.despine(top=True, right=True)
    ax.legend_.remove() # remove the individual plot legends
    ax.set_title(f'dataset = {k}', fontsize=11)
# create legend from cmap
# patches = [Patch(color=v, label=k) for k, v in cmap.items()] # square patches
patches = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v, label=k, markersize=8) for k, v in cmap.items()] # round markers
# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(title='cat', handles=patches, bbox_to_anchor=(1.06, 1.2), loc='center left', borderaxespad=0, frameon=False)
Option 2: Create subplots from a single dataframe with multiple separate datasets
The dataframes must be in a long form with the same column names.
This option uses pd.concat to combine multiple dataframes into a single dataframe, and .assign to add a new column.
See Import multiple csv files into pandas and concatenate into one DataFrame for creating a single dataframe from a list of files.
This option is easier because it doesn't require manually mapping colors to 'cat'
Combine DataFrames
# using df_dict, with dataframes as values, from the top
# combine all the dataframes in df_dict to a single dataframe with an identifier column
df = pd.concat((v.assign(dataset=k) for k, v in df_dict.items()), ignore_index=True)
# display(df.head())
cat x y dataset
0 B 0.944595 0.606329 1
1 A 0.586555 0.568851 1
2 A 0.903402 0.317362 1
3 B 0.137475 0.988616 1
4 B 0.139276 0.579745 1
# display(df.tail())
cat x y dataset
595 B 0.881222 0.263168 6
596 A 0.193668 0.636758 6
597 A 0.824001 0.638832 6
598 C 0.323998 0.505060 6
599 C 0.693124 0.737582 6
Plot a FacetGrid with seaborn.relplot
sns.relplot(kind='scatter', data=df, x='x', y='y', hue='cat', col='dataset', col_wrap=3, height=3)
Both options create the same result, however, it's less complicated to combine all the dataframes, and plot a figure-level plot with sns.relplot.
Building on @joris's response above, if you have already established a reference to the subplot, you can use the reference as well. For example,
ax1 = plt.subplot2grid((50,100), (0, 0), colspan=20, rowspan=10)
df.plot.barh(ax=ax1, stacked=True)
Here is a working pandas subplot example, where modes holds the column names of the dataframe.
figure_size=(20, 10)
fig, ax = plt.subplots(len(modes), 1, sharex="all", sharey="all", dpi=dpi)
for i in range(len(modes)):
    # the plot kind is not recoverable from the source; a plain .plot() call is assumed
    ax[i] = pivot_df.loc[:, modes[i]].plot(figsize=(figure_size[0], figure_size[1]*len(modes)),
                                           ax=ax[i], title=modes[i], color=my_colors[i])
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2,2)
df = pd.DataFrame({'A':np.random.randint(1,100,10),
                   'B': np.random.randint(100,1000,10)})
# plot the same dataframe on every subplot
for ax in axes.flatten():
    df.plot(ax =ax)
