Plotting multiple columns on the same figure using pandas [duplicate] - python

I am trying to plot multiple different lines on the same graph using pandas and matplotlib.
I have a series of 100 synthetic temperature histories that I want to plot, all in grey, against the original real temperature history I used to generate the data.
How can I plot all of these series on the same graph? I know how to do it in MATLAB but am using Python for this project, and pandas has been the easiest way I have found to read in every single column from the output file without having to specify each column individually. The number of columns of data will change from 100 up to 1000, so I need a generic solution. My code plots each of the data series individually fine, but I just need to work out how to add them both to the same figure.
Here is the code so far:
# dirPath is the path to my working directory
outputFile = "output.csv"
original_data = "temperature_data.csv"
# Read in the synthetic temperatures from the output file, time is the index in the first column
data = pd.read_csv(outputFile,header=None, skiprows=1, index_col=0)
# Read in the original temperature data, time is the index in the first column
orig_data = pd.read_csv(dirPath+original_data,header=None, skiprows=1, index_col=0)
# Convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)
# Plot all columns of synthetic data in grey
data = data.plot.line(title="ARMA Synthetic Temperature Histories",
xlabel="Time (yrs)",
ylabel=("Synthetic avergage hourly temperature (C)"),
color="#929591",
legend=None)
# Plot one column of original data in black
orig_data = orig_data.plot.line(color="k",legend="Original temperature data")
# Create and save figure
fig = data.get_figure()
fig = orig_data.get_figure()
fig.savefig("temp_arma.png")
This is some example data for the output data:
And this is the original data:
Plotting each individually gives these graphs - I just want them overlaid!

Your data.plot.line call returns a matplotlib Axes object (an AxesSubplot instance). Capture it and pass it to the second plotting call through the ax argument:
# plot 1
ax = data.plot.line(…)
# plot 2
data.plot.line(…, ax=ax)
Try to run this code:
# convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)
# Plot all columns of synthetic data in grey
ax = data.plot.line(title="ARMA Synthetic Temperature Histories",
                    xlabel="Time (yrs)",
                    ylabel="Synthetic average hourly temperature (C)",
                    color="#929591",
                    legend=None)
# Plot one column of original data in black
orig_data.plot.line(color="k",legend="Original temperature data", ax=ax)
# Create and save figure
ax.figure.savefig("temp_arma.png")
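As a minimal, self-contained sketch of the same idea, the snippet below replaces the CSV files with randomly generated stand-in data (the shapes, seed, and column name are assumptions, not taken from the question). It also shows one way to get a legend that lists only the original series instead of all 100 grey lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 100 synthetic histories plus one "original" series
rng = np.random.default_rng(0)
time = np.arange(50)
data = pd.DataFrame(rng.normal(10, 2, size=(50, 100)), index=time)
orig_data = pd.DataFrame({"Original temperature data": rng.normal(10, 1, size=50)},
                         index=time)

# The first call creates the axes; legend=False keeps the grey lines out of the legend
ax = data.plot.line(color="#929591", legend=False)
# The second call draws onto the same axes because ax=ax is passed
orig_data.plot.line(ax=ax, color="k", legend=False)

# Build the legend by hand from the last line drawn (the original data)
original_line = ax.get_lines()[-1]
ax.legend([original_line], ["Original temperature data"])
```

Because the column count is never written into the plotting calls, the same code should work unchanged for 100 or 1000 synthetic columns.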

You can also use matplotlib functions directly. This gives you more control and is easy to use as well.
Part 1 - Reading files (borrowing your code)
# Read in the synthetic temperatures from the output file, time is the index in the first column
data = pd.read_csv(outputFile,header=None, skiprows=1, index_col=0)
# Read in the original temperature data, time is the index in the first column
orig_data = pd.read_csv(dirPath+original_data,header=None, skiprows=1, index_col=0)
# Convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)
Part 2 - Plotting
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 8))
# Plotting all histories:
# time is the index (index_col=0), so every column is a temperature history
for col in data.columns:
    ax.plot(data.index, data[col], color='grey')
# Original data: a single column, also indexed by time
ax.plot(orig_data.index, orig_data.iloc[:, 0], color='k')
# axis labels
ax.set_xlabel("Time (yrs)")
ax.set_ylabel("Average hourly temperature (C)")
fig.savefig("temp_arma.png")
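When the number of columns grows towards 1000, the per-column loop can be replaced by a single call, since `Axes.plot` draws one line per column when it is given a 2D `y` array. The sketch below uses randomly generated stand-in data (sizes and names are assumptions), not the files from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 250 synthetic histories indexed by time
rng = np.random.default_rng(1)
time = np.linspace(0, 10, 40)
data = pd.DataFrame(rng.normal(10, 2, size=(40, 250)), index=time)
orig = pd.Series(rng.normal(10, 1, size=40), index=time)

fig, ax = plt.subplots(figsize=(10, 8))
# A 2D y array is plotted column by column, so no Python loop is needed
grey_lines = ax.plot(data.index, data.to_numpy(), color="grey", linewidth=0.5)
# Draw the original series last so it sits on top of the grey lines
(black_line,) = ax.plot(orig.index, orig.to_numpy(), color="k", label="Original")
ax.set_xlabel("Time (yrs)")
ax.legend()  # only the labelled original line appears in the legend
```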

Try the following:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
data.plot.line(ax=ax,
               title="ARMA Synthetic Temperature Histories",
               xlabel="Time (yrs)",
               ylabel="Synthetic average hourly temperature (C)",
               color="#929591",
               legend=False)
orig_data.rename(columns={orig_data.columns[0]: "Original temperature data"},
                 inplace=True)
orig_data.plot.line(ax=ax, color="k")
It's pretty much your original code with the following slight modifications:
Getting the ax object
fig, ax = plt.subplots()
and using it for the plotting
data.plot.line(ax=ax, ...
...
orig_data.plot.line(ax=ax, ...)
Result for some randomly generated sample data:
import random # For sample data only
import pandas as pd
# Sample data
data = pd.DataFrame({
    f'col_{i}': [random.random() for _ in range(25)]
    for i in range(1, 50)
})
orig_data = pd.DataFrame({
    'col_0': [random.random() for _ in range(25)]
})

Related

Plotting multiple excel sheets on the same graph

I have an Excel file that I need to plot. So far my code looks like this:
import pandas as pd
import matplotlib.pyplot as plt
file = 'weatherdata.xlsx'
def plotMeteoData(file,x_axis,metric,*list_of_cities):
    df = pd.ExcelFile(file)
    sheets = df.sheet_names
    print(sheets)
    df_list = []
    for city in list_of_cities:
        df_list.append(df.parse(city))
    x=range(len(df_list[0][x_axis].tolist()))
    width = 0.3
    for j,df_item in enumerate(df_list):
        plt.bar(x,df_item[metric],width,label = sheets[j]) #this isn't working
        x = [i+width for i in x]
    plt.xlabel(x_axis)
    plt.ylabel(metric)
    plt.xticks(range(len(df_list[0][x_axis].tolist())),labels=df_list[0][x_axis].tolist())
    plt.show()
t=['Αθήνα','Θεσσαλονίκη','Πάτρα']
plotMeteoData(file,'Μήνας','Υγρασία',*t)
and gives this output.
Each color represents an excel sheet, x-axis represents the months and y-axis represents some values.
I've marked with a comment the line where I try to add a label for each sheet, which isn't working. Also, as the output above shows, the bars aren't centered on each xtick. How can I fix these problems? Thanks
Typically you use plt.subplots, as it gives you more control over the graph. The code below calculates the offset needed for the xtick labels to be centered and shows the legend with the city labels:
import pandas as pd
import matplotlib.pyplot as plt
file = 'weatherdata.xlsx'
def plotMeteoData(file,x_axis,metric,*list_of_cities):
    df = pd.ExcelFile(file)
    sheets = df.sheet_names
    print(sheets)
    df_list = []
    for city in list_of_cities:
        df_list.append(df.parse(city))
    x=range(len(df_list[0][x_axis].tolist()))
    width = 0.3
    # Calculate the offset of the center of the xtick labels
    xTickOffset = width*(len(list_of_cities)-1)/2
    # Create a plot
    fig, ax = plt.subplots()
    for j,df_item in enumerate(df_list):
        # Label each series with the city it was parsed from
        # (sheets[j] follows the workbook order, which may differ from list_of_cities)
        ax.bar(x,df_item[metric],width,label = list_of_cities[j])
        x = [i+width for i in x]
    ax.set_xlabel(x_axis)
    ax.set_ylabel(metric)
    # Add a legend (feel free to change the location)
    ax.legend(loc='upper right')
    # Add the xTickOffset to the xtick label positions so they are centered
    ax.set_xticks([i+xTickOffset for i in range(len(df_list[0][x_axis].tolist()))],labels=df_list[0][x_axis].tolist())
    plt.show()
t=['Athena', 'Thessaloniki', 'Patras']
plotMeteoData(file,'Month','Humidity',*t)
Resulting Graph:
The xtick offset accounts for any number of Excel sheets. See the Matplotlib legend documentation for more information on legends.
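The centering arithmetic can be checked without the Excel file. The sketch below is an illustration under assumptions: the month names, city names, and humidity values are made up, and it shifts each series around the tick instead of shifting the ticks, which is an equivalent variant of the offset idea above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values standing in for the Excel sheets
months = ["Jan", "Feb", "Mar", "Apr"]
humidity = {
    "CityA": [70, 68, 65, 60],
    "CityB": [75, 72, 70, 66],
    "CityC": [72, 71, 67, 63],
}

n_series = len(humidity)
width = 0.8 / n_series          # the bars of one group together span 0.8 of a slot
base = np.arange(len(months))   # one integer slot per month

fig, ax = plt.subplots()
for j, (city, values) in enumerate(humidity.items()):
    # Shift each series so that the group as a whole is centered on the tick
    offset = (j - (n_series - 1) / 2) * width
    ax.bar(base + offset, values, width, label=city)

ax.set_xticks(base)
ax.set_xticklabels(months)
ax.legend(loc="upper right")
```

Deriving `width` from the number of series keeps neighbouring groups from overlapping when more sheets are added.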

Matplotlib Time-Series Heatmap Visualization Row Modification

Thank you in advance for the assistance!
I am trying to create a heat map from time-series data. The data begins mid-year, which causes the top row of my heat map to be shifted to the left so that it does not match up with the rest of the plot (shown below). How would I go about shifting just the top row over so that the visualization of the data syncs up with the rest of the plot?
(Code Provided Below)
import pandas as pd
import matplotlib.pyplot as plt
# link to the data
url1 = 'https://raw.githubusercontent.com/the-datadudes/deepSoilTemperature/master/minotDailyAirTemp.csv'
# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
df1 = pd.read_csv(url1, parse_dates=['Date'], index_col=['Date'])
# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
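The problem is the dates in the dataset: its first record is
`1990-4-24,15.533`
so the first year has fewer values than the others and its row starts at the left edge. To fix this, pad the data with NaN for 1990-01-01 through 1990-04-23 and drop every 29 February so that all years have the same length.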
import numpy as np
# NaN placeholders for the missing days at the start of 1990
rng = pd.date_range(start='1990-01-01', end='1990-04-23', freq='D')
df = pd.DataFrame(index=rng)
df['Temp'] = np.nan
frames = [df, df1]
result = pd.concat(frames)
result = result[~((result.index.month == 2) & (result.index.day == 29))]
With this data
dfg1 = result.groupby(result.index.year).agg({'Temp': list})
df1_wide = pd.DataFrame(dfg1['Temp'].tolist(), index=dfg1.index)
# plotting the data
fig, (ax1) = plt.subplots(ncols=1, figsize=(20, 5))
ax1.matshow(df1_wide, interpolation=None, aspect='auto');
The unfilled portions are a consequence of the NaN values in your dataset. One option is to replace the NaN values with the column mean or the row mean; other ways of filling NaN values are also available.
For example, using the column mean:
df1_wide = df1_wide.apply(lambda x: x.fillna(x.mean()),axis=0)
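A different way to get the same alignment, sketched here as an alternative and not as the method used above, is to pivot on (year, day of year). Each value then lands in the column of its calendar day, so no manual padding is needed. The series below is generated for illustration; only the start date mirrors the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series that starts mid-year, like the data in the question
idx = pd.date_range("1990-04-24", "1993-12-31", freq="D")
df = pd.DataFrame({"Temp": np.sin(2 * np.pi * idx.dayofyear / 365.0)}, index=idx)

# Drop 29 February so every year has the same 365 calendar days
df = df[~((df.index.month == 2) & (df.index.day == 29))]

# Day of year, shifted back by one after February in leap years so that
# 1 March is day 60 in every year
after_feb_in_leap_year = (df.index.is_leap_year & (df.index.month > 2)).astype(int)
doy = df.index.dayofyear.to_numpy() - after_feb_in_leap_year

# One row per year, one column per calendar day; missing days stay NaN
wide = df.assign(year=df.index.year, doy=doy).pivot(
    index="year", columns="doy", values="Temp")

fig, ax = plt.subplots(figsize=(20, 5))
ax.matshow(wide.to_numpy(), aspect="auto")
```

The leading part of the 1990 row stays empty (NaN) in the plot, and the first real value appears in the column for 24 April.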

How to plot from multiple datasets with same fieldnames?

I have a few monthly datasets of usage stats stored in different CSVs, each with a couple hundred fields. I keep only the top 30 rows of each one, but the bottom of that list will change (and the top changes too as stuff is banned, albeit less commonly). Currently the lines represent months, but I want the points to be (y = usage %) against (x = month), with the legend showing the different users.
column[0] is their number in the file (1-30)
column[1] is their name
column[2] is the usage percent
AprilStats = pd.read_csv(r'filepath', nrows=30)
MayStats = pd.read_csv(r'filepath', nrows=30)
JuneStats = pd.read_csv(r'filepath', nrows=30)
## Assign labels and sources
labels = [[AprilStats.columns[1]], [MayStats.columns[1]], [JuneStats.columns[1]]]
AprilUsage=np.array(AprilStats[AprilStats.columns[2]].tolist())
MayUsage=np.array(MayStats[MayStats.columns[2]].tolist())
JuneUsage=np.array(JuneStats[JuneStats.columns[2]].tolist())
x = np.array(AprilStats[AprilStats.columns[0]].tolist())
y = np.array(AprilStats[AprilStats.columns[2]].tolist())
my_xticks = AprilStats[AprilStats.columns[1]].tolist()
plt.xticks(x, my_xticks, rotation='55')
x1 = np.array(MayStats[MayStats.columns[0]].tolist())
y1 = np.array(MayStats[MayStats.columns[2]].tolist())
my_xticks1 = MayStats[MayStats.columns[1]].tolist()
plt.xticks(x, my_xticks1, rotation='55')
x2 = np.array(JuneStats[JuneStats.columns[0]].tolist())
y2 = np.array(JuneStats[JuneStats.columns[2]].tolist())
my_xticks2 = JuneStats[JuneStats.columns[1]].tolist()
plt.xticks(x, my_xticks2, rotation='55',)
### Plot the data
plt.rc('xtick', labelsize='xx-small')
plt.title('Little Cup Usage')
plt.ylabel('Usage (Percent)')
plt.plot(x,y,label='April', color='green', alpha=.4)
plt.plot(x1,y1,label='May', color='blue', alpha=.4)
plt.plot(x2,y2,label='June', color='red', alpha=.4)
plt.subplots_adjust(bottom=.2)
plt.legend()
plt.savefig('90daytest.png', dpi=500)
plt.show()
I think I am mislabeling them, but the month of usage isn't stored in the file. I reckon I could add it, but I'd like to not have to go in and edit these files every month. Also, sorry if this is horribly inefficient coding; I started learning Python less than two weeks ago and this is a little project for me to learn with.
I'd divide this into two steps:
Gather all the data into a single dataframe in which the rows correspond to the different months, the columns to the different names and the values are the usage %.
Plot each column as a different series in a scatter plot.
Step 1:
import datetime as dt
import pandas as pd
# Create a dictionary associating a file to each month
files = {dt.date(2019, 4, 1): 'april.csv',
         dt.date(2019, 5, 1): 'may.csv'}
# For each file, build a one-row data frame like the one below and collect it:
#   month       name1  name2  ...
#   2019-04-01  0.5    0.2
rows = []
for month, file in files.items():
    data = pd.read_csv(file, usecols=['name', 'usage'], index_col='name')
    data = data.transpose()
    data['month'] = month
    data = data.set_index('month')
    rows.append(data)
# DataFrame.append was removed in pandas 2.0, so concatenate the rows instead
df = pd.concat(rows)
Step 2:
import matplotlib.pyplot as plt
# New figure
fig = plt.figure()
# Plot one series for each column in df
for name in df.columns:
    plt.scatter(x=df.index, y=df[name], label=name)
# Additional plot formatting code here
plt.show()
I hope that helps.
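The same reshaping can also be sketched with `pd.concat` and `pivot`. The example below is only an illustration: the CSV contents, the column names `rank`, `name`, `usage`, and the month keys are assumptions, and the files are simulated in memory so the snippet runs on its own.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd

# Hypothetical file contents; in practice these would be the monthly CSV paths
csv_by_month = {
    "2019-04": "rank,name,usage\n1,A,30.5\n2,B,20.1\n",
    "2019-05": "rank,name,usage\n1,B,28.0\n2,C,19.5\n",
}

# Read the top rows of every file, keyed by month
frames = {month: pd.read_csv(io.StringIO(text), nrows=30)
          for month, text in csv_by_month.items()}

# The dict keys become an index level named "month", then a regular column
long = pd.concat(frames, names=["month"]).reset_index(level="month")

# Rows are months, columns are names, values are usage percentages
usage = long.pivot(index="month", columns="name", values="usage")

# One line (with markers) per name; names missing in a month are left as gaps
ax = usage.plot(marker="o")
ax.set_ylabel("Usage (Percent)")
```

Since the month comes from the dictionary key, the CSV files themselves do not need to be edited.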

How to plot both Price and Volume in same Chart

I have a dataframe as mentioned below:
Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
31/01/2019,09:15:04,10700.05,2625
31/01/2019,09:15:05,10700.05,2400
31/01/2019,09:15:06,10698.10,3000
31/01/2019,09:15:07,10699.90,5925
31/01/2019,09:15:08,10699.25,5775
31/01/2019,09:15:09,10700.45,5925
31/01/2019,09:15:10,10700.00,4650
31/01/2019,09:15:11,10699.40,8025
31/01/2019,09:15:12,10698.95,5025
31/01/2019,09:15:13,10698.45,1950
31/01/2019,09:15:14,10696.15,3900
31/01/2019,09:15:15,10697.15,2475
31/01/2019,09:15:16,10697.05,4275
31/01/2019,09:15:17,10696.25,3225
31/01/2019,09:15:18,10696.25,3300
The data frame contains approx. 8000 rows. I want to plot both price and volume in the same chart. (Volume range: 0 - 8,00,000)
Suppose you want to compare price and volume vs time, try this:
df = pd.read_csv('your_path_here')
df.plot('Time', ['Price', 'Volume'], secondary_y='Price')
edit: x-axis customization
Since you want x-axis customization, try this (it is just a basic example you can follow):
# Create a Datetime column while parsing the csv file
df = pd.read_csv('your_path_here', parse_dates= {'Datetime': ['Date', 'Time']})
Then you need to create two lists, one containing the positions on the x-axis and the other the labels.
Say you want labels every 5 seconds (your request for every 30 minutes is possible, but not with the data you provided):
positions = [p for p in df.Datetime if p.second in range(0, 60, 5)]
labels = [l.strftime('%H:%M:%S') for l in positions]
Then you plot passing the positions and labels lists to set_xticks and set_xticklabels
ax = df.plot('Datetime', ['Price', 'Volume'], secondary_y='Price')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
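For comparison, here is a sketch of the same chart built directly with matplotlib, using `twinx()` for the second y-axis and the first few rows of the sample data from the question. Building the `Datetime` column with `pd.to_datetime` and an explicit day-first format is a choice made for this sketch, not something taken from the answer above.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the sample data from the question
csv_text = """Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
31/01/2019,09:15:04,10700.05,2625
"""
df = pd.read_csv(io.StringIO(csv_text))

# The dates are day-first, so state the format explicitly
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                format="%d/%m/%Y %H:%M:%S")

fig, ax_price = plt.subplots()
ax_volume = ax_price.twinx()  # second y-axis sharing the same x-axis

ax_price.plot(df["Datetime"], df["Price"], color="tab:blue")
ax_price.set_ylabel("Price")
ax_volume.plot(df["Datetime"], df["Volume"], color="tab:orange")
ax_volume.set_ylabel("Volume")

# Show the tick labels as clock times
ax_price.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
```

Each series keeps its own y-scale, which matters here because the prices are around 10,700 while the volumes range up to 8,00,000.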

Python: Legend has wrong colors on Pandas MultiIndex plot

I'm trying to plot data from 2 separate MultiIndex frames, with the same data as levels in each.
Currently, this generates two separate plots, and I'm unable to customise the legend by appending a string that identifies each line on the graph. Any help would be appreciated!
Here is the method so far:
def plot_lead_trail_res(df_ante, df_post, symbols=[]):
    if len(symbols) < 1:
        print("Try again with a symbol list. (Time constraints)")
    else:
        df_ante = df_ante.loc[symbols]
        df_post = df_post.loc[symbols]
        ante_leg = [str(x)+'_ex-ante' for x in df_ante.index.levels[0]]
        post_leg = [str(x)+'_ex-post' for x in df_post.index.levels[0]]
        print("ante_leg", ante_leg)
        ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
        ax = df_post.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=post_leg)
        ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
        ax.set_ylabel('Mutual Information')
Using this function call:
sentisignal.plot_lead_trail_res(data_nasdaq_top_100_preprocessed_mi_res, data_nasdaq_top_100_preprocessed_mi_res_validate, ['AAL', 'AAPL'])
I obtain the following figure:
Current plots
Ideally, both sets of lines would be on the same graph with the same axes!
Update 2 [Concatenation Solution]
I've solved the issues of plotting from multiple frames using concatenation, however the legend does not match the line colors on the graph.
There are no specific calls to legend, and the label parameter in plot() has not been used.
Code:
df_ante = data_nasdaq_top_100_preprocessed_mi_res
df_post = data_nasdaq_top_100_preprocessed_mi_res_validate
symbols = ['AAL', 'AAPL']
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
df_ante.index.set_levels([[str(x)+'_ex-ante' for x in df_ante.index.levels[0]],df_ante.index.levels[1]], inplace=True)
df_post.index.set_levels([[str(x)+'_ex-post' for x in df_post.index.levels[0]],df_post.index.levels[1]], inplace=True)
df_merge = pd.concat([df_ante, df_post])
df_merge['SHIFT'] = abs(df_merge['SHIFT'])
df_merge.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION')
Image:
MultiIndex Plot Image
I think, with
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
you store the output of plot() in ax, including the lines, and the second function call then overwrites it. Am I right that the lines which were plotted first are missing?
The usual procedure would rather be something like
fig = plt.figure(figsize=(5, 5)) # size in inch
ax = fig.add_subplot(111) # if you want only one axes
Now you have an Axes object in ax and can pass it as input to the following plot calls.
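A minimal sketch of that procedure follows. The two frames are made-up stand-ins shaped like the ones in the question (the helper `make_frame` and all values are hypothetical). Each line is drawn on the shared axes with its own `label`, which keeps every legend entry tied to the color of the line it describes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

def make_frame(scale):
    """Hypothetical stand-in for the question's frames, indexed by (symbol, row)."""
    idx = pd.MultiIndex.from_product([["AAL", "AAPL"], range(3)],
                                     names=["SYMBOL", None])
    values = [scale * v for v in (1, 2, 3, 2, 3, 4)]
    return pd.DataFrame({"SHIFT": [0, 1, 2] * 2,
                         "MUTUAL_INFORMATION": values}, index=idx)

df_ante, df_post = make_frame(0.1), make_frame(0.2)

fig, ax = plt.subplots(figsize=(5, 5))  # one figure, one axes for all lines
for suffix, frame in (("ex-ante", df_ante), ("ex-post", df_post)):
    for symbol, group in frame.groupby(level=0):
        # label= ties the legend entry to the line drawn by this call
        ax.plot(group["SHIFT"], group["MUTUAL_INFORMATION"],
                label=f"{symbol}_{suffix}")

ax.set_xlabel("Time-shift of sentiment data (days) with financial data")
ax.set_ylabel("Mutual Information")
ax.legend()
```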
