I have a few monthly datasets of usage stats stored in different CSVs, with a couple hundred fields. I am cutting off the top 30 of each one, but the bottom will change (and the top as changes as stuff is banned, albeit less commonly). Currently I have the lines representing months, but I want the points to be (y=usage %) and (x=month) with the legend being different users.
column[0] is their number in the file (1-30)
column[1] is their name
column[2] is the usage percent
AprilStats = pd.read_csv(r'filepath', nrows=30)
MayStats = pd.read_csv(r'filepath', nrows=30)
JuneStats = pd.read_csv(r'filepath', nrows=30)
## Assign labels and sources
labels = [[AprilStats.columns[1]], [MayStats.columns[1]], [JuneStats.columns[1]]]
AprilUsage=np.array(AprilStats[AprilStats.columns[2]].tolist())
MayUsage=np.array(MayStats[MayStats.columns[2]].tolist())
JuneUsage=np.array(JuneStats[JuneStats.columns[2]].tolist())
x = np.array(AprilStats[AprilStats.columns[0]].tolist())
y = np.array(AprilStats[AprilStats.columns[2]].tolist())
my_xticks = AprilStats[AprilStats.columns[1]].tolist()
plt.xticks(x, my_xticks, rotation='55')
x1 = np.array(MayStats[MayStats.columns[0]].tolist())
y1 = np.array(MayStats[MayStats.columns[2]].tolist())
my_xticks1 = MayStats[MayStats.columns[1]].tolist()
plt.xticks(x, my_xticks1, rotation='55')
x2 = np.array(JuneStats[JuneStats.columns[0]].tolist())
y2 = np.array(JuneStats[JuneStats.columns[2]].tolist())
my_xticks2 = JuneStats[JuneStats.columns[1]].tolist()
plt.xticks(x, my_xticks2, rotation='55',)
### Plot the data
plt.rc('xtick', labelsize='xx-small')
plt.title('Little Cup Usage')
plt.ylabel('Usage (Percent)')
plt.plot(x,y,label='April', color='green', alpha=.4)
plt.plot(x1,y1,label='May', color='blue', alpha=.4)
plt.plot(x2,y2,label='June', color='red', alpha=.4)
plt.subplots_adjust(bottom=.2)
plt.legend()
plt.savefig('90daytest.png', dpi=500)
plt.show()
I think I am mislabeling them, but the month of usage isn't stored in the file. I reckon I could add it, but I'd like to not have to go in and edit these files every month. Also, sorry if this is horribly inneficient coding, I have just started learning python less than two weeks ago and this is a little project for me to learn with.
I'd divide this into two steps:
Gather all the data into a single dataframe in which the rows correspond to the different months, the columns to the different names and the values are the usage %.
Plot each column as a different series in a scatter plot.
Step 1:
# Create a dictionary associating a file to each month
files = {dt.date(2019, 4, 1): 'april.csv',
dt.date(2019, 5, 1): 'may.csv'}
# An empty data frame
df = pd.DataFrame()
''' For each file, generate a one entry data frame as follows, and append it to df.
Month name1 name2 ...
2019-1-1 0.5 0.2
'''
for month, file in files.items():
data = pd.read_csv(file, usecols=['name', 'usage'], index_col='name')
data = data.transpose()
data['month'] = month
data = data.set_index('month')
df = df.append(data)
Step 2:
# New figure
fig = plt.figure()
# Plot one series for each column in df
for name in df.columns:
plt.scatter(x=df.index, y=df[name], label=name)
# Additional plot formatting code here
plt.show()
I hope that helps.
Related
I'm playing around with kaggle dataframe to practice using matplotlib.
I was creating bar graph one by one, but it keeps adding up.
When I called plt.show() there were like 10 windows of figure suddenly shows up.
Is it possible to combine 4 of those figures into 1 window?
These part are in the same segments "Time Analysis" So I want to combine these 4 figures in 1 window.
import matplotlib.pyplot as plt
import seaborn as sns
dataset = ('accidents_data.csv')
df = pd.read_csv(dataset)
"""Time Analysis :
Analyze the time that accidents happen for various patterns and trends"""
df.Start_Time = pd.to_datetime(df.Start_Time) #convert the start time column to date time format
df['Hour_of_Accident'] = df.Start_Time.dt.hour #extract the hour from the time data
hour_accident = df['Hour_of_Accident'].value_counts()
hour_accident_df = hour_accident.to_frame() #convert the series data to dataframe in order to sort the index columns
hour_accident_df.index.names = ['Hours'] #naming the index column
hour_accident_df.sort_index(ascending=True, inplace=True)
print(hour_accident_df)
# Plotting the hour of accidents data in a bargraph
hour_accident_df.plot(kind='bar',figsize=(8,4),color='blue',title='Hour of Accident')
#plt.show() #Show the bar graph
"""Analyzing the accident frequency per day of the week"""
df['Day_of_the_week'] = df.Start_Time.dt.day_of_week
day_of_accident = df['Day_of_the_week'].value_counts()
day_of_accident_df = day_of_accident.to_frame() #convert the series data to dataframe so that we can sort the index columns
day_of_accident_df.index.names = ['Day'] # Renaming the index column
day_of_accident_df.sort_index(ascending=True, inplace=True)
print(day_of_accident_df)
f, ax = plt.subplots(figsize = (8, 5))
x = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Sartuday', 'Sunday']
l = day_of_accident_df.index.values
y = day_of_accident_df.Day_of_the_week
plt.bar(l, y, color='green')
plt.title('Day of the week vs total number of accidents')
plt.ylabel("No. of accidents recorded")
ax.set_xticks(l)
ax.set_xticklabels(x)
#plt.show()
"""Analysis for the months"""
df['Month'] = df.Start_Time.dt.month
accident_month = df['Month'].value_counts()
accident_month_df = accident_month.to_frame() #convert the series data to dataframe so that we can sort the index columns
accident_month_df.index.names = ['Month'] # Renaming the index column
accident_month_df.sort_index(ascending=True, inplace=True)
print(accident_month_df)
#Plotting the Bar Graph
accident_month_df.plot(kind='bar',figsize=(8,5),color='purple',title='Month of Accident')
"""Yearly Analysis"""
df['Year_of_accident'] = df.Start_Time.dt.year
#Check the yearly trend
yearly_count = df['Year_of_accident'].value_counts()
yearly_count_df = pd.DataFrame({'Year':yearly_count.index, 'Accidents':yearly_count.values})
yearly_count_df.sort_values(by='Year', ascending=True, inplace=True)
print(yearly_count_df)
#Creating line plot
yearly_count_df.plot.line(x='Year',color='red',title='Yearly Accident Trend ')
plt.show()
I have dataframe as ,
i need something like this for each columns like stress , depression and anxiety and each participant data in each category
i wrote the python code as
ax = data_full.plot(x="participants", y=["Stress","Depression","Anxiety"],kind="line", lw=3, ls='--', figsize = (12,6))
plt.grid(True)
plt.show()
get the output like this
Split the participant column and merge it with the original data frame. Change the data frame to a data frame with only the columns you need in the merged data frame. Transform the data frame in its final form by pivoting. The resulting data frame is then used as the basis for the graph. Now we can adjust the x-axis tick marks, the legend position, and the y-axis limits.
dfs = pd.concat([df,df['participants'].str.split('_', expand=True)],axis=1)
dfs.columns = ['Stress', 'Depression', 'Anxiety', 'participants', 'category', 'group']
fin_df = dfs[['category','group','Stress']]
fin_df = dfs.pivot(index='category', columns='group', values='Stress')
# update
fin_df = fin_df.sort_index(ascending=False)
g = fin_df.plot(kind='line', title='Stress')
g.set_xticks([0,1])
g.set_xticklabels(['pre','post'])
g.legend(loc='center right')
g.set_ylim(5,25)
This question already has answers here:
How to plot different groups of data from a dataframe into a single figure
(5 answers)
Closed 5 months ago.
I am trying to plot multiple different lines on the same graph using pandas and matplotlib.
I have a series of 100 synthetic temperature histories that I want to plot, all in grey, against the original real temperature history I used to generate the data.
How can I plot all of these series on the same graph? I know how to do it in MATLAB but am using Python for this project, and pandas has been the easiest way I have found to read in every single column from the output file without having to specify each column individually. The number of columns of data will change from 100 up to 1000, so I need a generic solution. My code plots each of the data series individually fine, but I just need to work out how to add them both to the same figure.
Here is the code so far:
# dirPath is the path to my working directory
outputFile = "output.csv"
original_data = "temperature_data.csv"
# Read in the synthetic temperatures from the output file, time is the index in the first column
data = pd.read_csv(outputFile,header=None, skiprows=1, index_col=0)
# Read in the original temperature data, time is the index in the first column
orig_data = pd.read_csv(dirPath+original_data,header=None, skiprows=1, index_col=0)
# Convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)
# Plot all columns of synthetic data in grey
data = data.plot.line(title="ARMA Synthetic Temperature Histories",
xlabel="Time (yrs)",
ylabel=("Synthetic avergage hourly temperature (C)"),
color="#929591",
legend=None)
# Plot one column of original data in black
orig_data = orig_data.plot.line(color="k",legend="Original temperature data")
# Create and save figure
fig = data.get_figure()
fig = orig_data.get_figure()
fig.savefig("temp_arma.png")
This is some example data for the output data:
And this is the original data:
Plotting each individually gives these graphs - I just want them overlaid!
Your data.plot.line returns an AxesSubplot instance, you can catch it and feed it to your second command:
# plot 1
ax = data.plot.line(…)
# plot 2
data.plot.line(…, ax=ax)
Try to run this code:
# convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)
# Plot all columns of synthetic data in grey
ax = data.plot.line(title="ARMA Synthetic Temperature Histories",
xlabel="Time (yrs)",
ylabel=("Synthetic avergage hourly temperature (C)"),
color="#929591",
legend=None)
# Plot one column of original data in black
orig_data.plot.line(color="k",legend="Original temperature data", ax=ax)
# Create and save figure
ax.figure.savefig("temp_arma.png")
You should directy use matplotlib functions. It offers more control and is easy to use as well.
Part 1 - Reading files (borrowing your code)
# Read in the synthetic temperatures from the output file, time is the index in the first column
data = pd.read_csv(outputFile,header=None, skiprows=1, index_col=0)
# Read in the original temperature data, time is the index in the first column
orig_data = pd.read_csv(dirPath+original_data,header=None, skiprows=1, index_col=0)
# Convert data to float format
data = data.astype(float)
orig_data = orig_data.astype(float)
Part 2 - Plotting
fig = plt.figure(figsize=(10,8))
ax = plt.gca()
# Plotting all histories:
# 1st column contains time hence excluding
for col in data.columns[1:]:
ax.plot(data["Time"], data[col], color='grey')
# Orig
ax.plot(orig_data["Time"], orig_data["Temperature"], color='k')
# axis labels
ax.set_xlabel("Time (yrs)")
ax.set_ylabel("avergage hourly temperature (C)")
fig.savefig("temp_arma.png")
Try the following:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
data.plot.line(ax=ax,
title="ARMA Synthetic Temperature Histories",
xlabel="Time (yrs)",
ylabel="Synthetic avergage hourly temperature (C)",
color="#929591",
legend=False)
orig_data.rename(columns={orig_data.columns[0]: "Original temperature data"},
inplace=True)
orig_data.plot.line(ax=ax, color="k")
It's pretty much your original code with the following slight modifications:
Getting the ax object
fig, ax = plt.subplots()
and using it for the plotting
data.plot.line(ax=ax, ...
...
orig_data.plot.line(ax=ax, ...)
Result for some randomly generated sample data:
import random # For sample data only
# Sample data
data = pd.DataFrame({
f'col_{i}': [random.random() for _ in range(25)]
for i in range(1, 50)
})
orig_data = pd.DataFrame({
'col_0': [random.random() for _ in range(25)]
})
I have a dataframe as mentioned below:
Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
31/01/2019,09:15:04,10700.05,2625
31/01/2019,09:15:05,10700.05,2400
31/01/2019,09:15:06,10698.10,3000
31/01/2019,09:15:07,10699.90,5925
31/01/2019,09:15:08,10699.25,5775
31/01/2019,09:15:09,10700.45,5925
31/01/2019,09:15:10,10700.00,4650
31/01/2019,09:15:11,10699.40,8025
31/01/2019,09:15:12,10698.95,5025
31/01/2019,09:15:13,10698.45,1950
31/01/2019,09:15:14,10696.15,3900
31/01/2019,09:15:15,10697.15,2475
31/01/2019,09:15:16,10697.05,4275
31/01/2019,09:15:17,10696.25,3225
31/01/2019,09:15:18,10696.25,3300
The data frame contains approx 8000 rows. I want plot both price and volume in same chart. (Volume Range: 0 - 8,00,000)
Suppose you want to compare price and volume vs time, try this:
df = pd.read_csv('your_path_here')
df.plot('Time', ['Price', 'Volume'], secondary_y='Price')
edit: x-axis customization
Since you want x-axis customization,try this (this is just a basic example you can follow):
# Create a Datetime column while parsing the csv file
df = pd.read_csv('your_path_here', parse_dates= {'Datetime': ['Date', 'Time']})
Then you need to create two list, one containing the position on the x-axis and the other one the labels.
Say you want labels every 5 seconds (your requests at 30 min is possibile but not with the data you provided)
positions = [p for p in df.Datetime if p.second in range(0, 60, 5)]
labels = [l.strftime('%H:%M:%S') for l in positions]
Then you plot passing the positions and labels lists to set_xticks and set_xticklabels
ax = df.plot('Datetime', ['Price', 'Volume'], secondary_y='Price')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
I'm trying to plot data from 2 seperate MultiIndex, with the same data as levels in each.
Currently, this is generating two seperate plots and I'm unable to customise the legend by appending some string to individualise each line on the graph. Any help would be appreciated!
Here is the method so far:
def plot_lead_trail_res(df_ante, df_post, symbols=[]):
if len(symbols) < 1:
print "Try again with a symbol list. (Time constraints)"
else:
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
ante_leg = [str(x)+'_ex-ante' for x in df_ante.index.levels[0]]
post_leg = [str(x)+'_ex-post' for x in df_post.index.levels[0]]
print "ante_leg", ante_leg
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
ax = df_post.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=post_leg)
ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
ax.set_ylabel('Mutual Information')
Using this function call:
sentisignal.plot_lead_trail_res(data_nasdaq_top_100_preprocessed_mi_res, data_nasdaq_top_100_preprocessed_mi_res_validate, ['AAL', 'AAPL'])
I obtain the following figure:
Current plots
Ideally, both sets of lines would be on the same graph with the same axes!
Update 2 [Concatenation Solution]
I've solved the issues of plotting from multiple frames using concatenation, however the legend does not match the line colors on the graph.
There are not specific calls to legend and the label parameter in plot() has not been used.
Code:
df_ante = data_nasdaq_top_100_preprocessed_mi_res
df_post = data_nasdaq_top_100_preprocessed_mi_res_validate
symbols = ['AAL', 'AAPL']
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
df_ante.index.set_levels([[str(x)+'_ex-ante' for x in df_ante.index.levels[0]],df_ante.index.levels[1]], inplace=True)
df_post.index.set_levels([[str(x)+'_ex-post' for x in df_post.index.levels[0]],df_post.index.levels[1]], inplace=True)
df_merge = pd.concat([df_ante, df_post])
df_merge['SHIFT'] = abs(df_merge['SHIFT'])
df_merge.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION')
Image:
MultiIndex Plot Image
I think, with
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
you put the output of the plot() in ax, including the lines, which then get overwritten by the second function call. Am I right, that the lines which were plotted first are missing?
The official procedure would be rather something like
fig = plt.figure(figsize=(5, 5)) # size in inch
ax = fig.add_subplot(111) # if you want only one axes
now you have an axes object in ax, and can take this as input for the next plots.