Matplotlib Plot points on an existing line, only by knowing x values

I have 2 dataframes:
'stock' is a dataframe with columns Date and Price.
'events' is a dataframe with columns Date and Text.
My goal is to plot the stock prices as a line and place dots on that line where the events occur. However, I do not know how to get the 'y' value for the events dataframe, since I want each dot to sit at the stock price on that date.
I am able to plot the first dataframe fine with:
plt.plot('Date', 'Price', data=stock)
And I try to plot the event dots with:
plt.scatter('created_at', ???, data=events)
However, it is the ??? that I don't know how to set

Assuming Date and created_at are datetime:
stock = pd.DataFrame({'Date':['2021-01-01','2021-02-01','2021-03-01','2021-04-01','2021-05-01'],'Price':[1,5,3,4,10]})
events = pd.DataFrame({'created_at':['2021-02-01','2021-03-01'],'description':['a','b']})
stock.Date = pd.to_datetime(stock.Date)
events.created_at = pd.to_datetime(events.created_at)
Filter stock by events.created_at (or merge) and plot the matching rows onto the same Axes:
stock_events = stock[stock.Date.isin(events.created_at)]
# or merge on the date columns
# stock_events = stock.merge(events, left_on='Date', right_on='created_at')
ax = stock.plot(x='Date', y='Price')
stock_events.plot.scatter(ax=ax, x='Date', y='Price', label='Event', c='r', s=50)
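The filter above only finds events whose date matches a stock date exactly. When an event falls between two stock dates, one option is to interpolate the price along the line. The following is a minimal sketch under that assumption, not part of the original answer; the sample values and the helper names epoch, one_day, stock_days and event_days are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the first event falls between two stock dates
stock = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
    "Price": [1.0, 5.0, 3.0],
})
events = pd.DataFrame({
    "created_at": pd.to_datetime(["2021-01-16", "2021-02-01"]),
    "description": ["a", "b"],
})

# Express every date as a float number of days so np.interp can work on it
epoch = pd.Timestamp("1970-01-01")
one_day = pd.Timedelta(days=1)
stock_days = ((stock["Date"] - epoch) / one_day).to_numpy()
event_days = ((events["created_at"] - epoch) / one_day).to_numpy()

# Linear interpolation along the price line; stock must be sorted by Date
events["Price"] = np.interp(event_days, stock_days, stock["Price"].to_numpy())
```

The interpolated column can then be passed as the y value, for example plt.scatter('created_at', 'Price', data=events).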

Related

Changing plot title through loop

I am new to Python and need your help. I have several dataframes, one per day, and I use a for loop to plot each of them. For each plot I want to add the date to the title. I have created a variable 'date_created' and assigned the dates I want. I want my title to look like this:
'Voltage vs time
28-01-2022'
for df in (df1,df2,df3,df4,df5,df6,df7,df8):
    y = df[' Voltage']
    x = df['time']
    date_created = [ '28-01-2022, 29-01-2022, 30-01-2022, 31-08-2022, 01-02-2022, 02-02-2022, 03-02-2022, 04-02-2022' ]
    fig, ax = plt.subplots(figsize=(18,7))
    plt.plot(x,y, 'b')
    plt.xlabel("time")
    plt.ylabel(" Voltage [V]")
    plt.title("Voltage vs time")

To make the code more effective, it is better to create a dictionary that pairs each date with its dataframe (if you haven't got a date column in your dataframes). A DataFrame is not hashable, so it cannot be a dictionary key; use the dates as keys instead.
frames = {'28-01-2022': df1, '29-01-2022': df2, '30-01-2022': df3}
Then loop over the items of this dictionary:
for date, df in frames.items():
    y = df[' Voltage']
    x = df['time']
    fig, ax = plt.subplots(figsize=(18,7))
    plt.plot(x,y, 'b')
    plt.xlabel("time")
    plt.ylabel(" Voltage [V]")
    # "\n" puts the date on a second title line
    plt.title(f"Voltage vs time\n{date}")
Hope this works for you!
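If the dataframes and the dates already live in two parallel lists, zip is another way to pair them. The sketch below is only an illustration: df1 and df2 are small hypothetical stand-ins for the daily dataframes, and the Agg backend is selected so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the daily dataframes
df1 = pd.DataFrame({"time": [0, 1, 2], " Voltage": [3.1, 3.3, 3.2]})
df2 = pd.DataFrame({"time": [0, 1, 2], " Voltage": [3.0, 3.4, 3.1]})

frames = [df1, df2]
dates = ["28-01-2022", "29-01-2022"]

titles = []
for df, date in zip(frames, dates):
    fig, ax = plt.subplots(figsize=(18, 7))
    ax.plot(df["time"], df[" Voltage"], "b")
    ax.set_xlabel("time")
    ax.set_ylabel("Voltage [V]")
    # Two-line title: fixed text on the first line, the date on the second
    ax.set_title(f"Voltage vs time\n{date}")
    titles.append(ax.get_title())
    plt.close(fig)  # free the figure once it is no longer needed
```

zip stops at the shorter of the two lists, so a missing date silently drops a plot; checking len(frames) == len(dates) first avoids that.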

How to do line plot for each column separately based on another column for each two samples for pandas dataframe?

I have a dataframe with the columns participants, Stress, Depression and Anxiety.
I need a separate line plot for each column (Stress, Depression and Anxiety), showing each participant's data in each category.
I wrote the following Python code:
ax = data_full.plot(x="participants", y=["Stress","Depression","Anxiety"],kind="line", lw=3, ls='--', figsize = (12,6))
plt.grid(True)
plt.show()
This plots all three columns on a single chart instead.

Split the participants column and concatenate the result with the original data frame. Keep only the columns you need, then pivot so that each row is a category and each column a group; the pivoted data frame is the basis for the graph. Finally, adjust the x-axis tick marks, the legend position, and the y-axis limits.
dfs = pd.concat([df,df['participants'].str.split('_', expand=True)],axis=1)
dfs.columns = ['Stress', 'Depression', 'Anxiety', 'participants', 'category', 'group']
fin_df = dfs[['category','group','Stress']]
fin_df = fin_df.pivot(index='category', columns='group', values='Stress')
# update
fin_df = fin_df.sort_index(ascending=False)
g = fin_df.plot(kind='line', title='Stress')
g.set_xticks([0,1])
g.set_xticklabels(['pre','post'])
g.legend(loc='center right')
g.set_ylim(5,25)
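The same split-and-pivot step can be repeated for all three measures in a loop. The sketch below assumes, purely for illustration, that participants holds values such as 'pre_1' and 'post_1'; the sample numbers are made up, and reindex is used here in place of the descending sort to put 'pre' before 'post'.

```python
import pandas as pd

# Hypothetical sample data; the 'pre_1' style labels are an assumption
df = pd.DataFrame({
    "Stress": [10, 14, 8, 12],
    "Depression": [7, 9, 6, 11],
    "Anxiety": [12, 15, 9, 13],
    "participants": ["pre_1", "pre_2", "post_1", "post_2"],
})

# Split 'pre_1' into a category ('pre') and a group ('1')
parts = df["participants"].str.split("_", expand=True)
parts.columns = ["category", "group"]
dfs = pd.concat([df, parts], axis=1)

# One pivoted table per measure: rows are categories, columns are groups
pivots = {}
for measure in ["Stress", "Depression", "Anxiety"]:
    wide = dfs.pivot(index="category", columns="group", values=measure)
    pivots[measure] = wide.reindex(["pre", "post"])
```

Each table can then be drawn on its own chart, for example pivots['Stress'].plot(kind='line', title='Stress').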

How to plot from multiple datasets with same fieldnames?

I have a few monthly datasets of usage stats stored in different CSVs, each with a couple hundred rows. I keep only the top 30 of each one, but the bottom of that list will change (and the top also changes as stuff is banned, albeit less commonly). Currently the lines represent months, but I want the points to be (y=usage %) and (x=month), with the legend showing the different users.
column[0] is their number in the file (1-30)
column[1] is their name
column[2] is the usage percent
AprilStats = pd.read_csv(r'filepath', nrows=30)
MayStats = pd.read_csv(r'filepath', nrows=30)
JuneStats = pd.read_csv(r'filepath', nrows=30)
## Assign labels and sources
labels = [[AprilStats.columns[1]], [MayStats.columns[1]], [JuneStats.columns[1]]]
AprilUsage=np.array(AprilStats[AprilStats.columns[2]].tolist())
MayUsage=np.array(MayStats[MayStats.columns[2]].tolist())
JuneUsage=np.array(JuneStats[JuneStats.columns[2]].tolist())
x = np.array(AprilStats[AprilStats.columns[0]].tolist())
y = np.array(AprilStats[AprilStats.columns[2]].tolist())
my_xticks = AprilStats[AprilStats.columns[1]].tolist()
plt.xticks(x, my_xticks, rotation='55')
x1 = np.array(MayStats[MayStats.columns[0]].tolist())
y1 = np.array(MayStats[MayStats.columns[2]].tolist())
my_xticks1 = MayStats[MayStats.columns[1]].tolist()
plt.xticks(x, my_xticks1, rotation='55')
x2 = np.array(JuneStats[JuneStats.columns[0]].tolist())
y2 = np.array(JuneStats[JuneStats.columns[2]].tolist())
my_xticks2 = JuneStats[JuneStats.columns[1]].tolist()
plt.xticks(x, my_xticks2, rotation='55',)
### Plot the data
plt.rc('xtick', labelsize='xx-small')
plt.title('Little Cup Usage')
plt.ylabel('Usage (Percent)')
plt.plot(x,y,label='April', color='green', alpha=.4)
plt.plot(x1,y1,label='May', color='blue', alpha=.4)
plt.plot(x2,y2,label='June', color='red', alpha=.4)
plt.subplots_adjust(bottom=.2)
plt.legend()
plt.savefig('90daytest.png', dpi=500)
plt.show()
I think I am mislabeling them, but the month of usage isn't stored in the file. I reckon I could add it, but I'd like to not have to go in and edit these files every month. Also, sorry if this is horribly inefficient coding; I started learning Python less than two weeks ago and this is a little project for me to learn with.

I'd divide this into two steps:
1. Gather all the data into a single dataframe in which the rows correspond to the different months, the columns to the different names, and the values are the usage %.
2. Plot each column as a different series in a scatter plot.
Step 1:
import datetime as dt
import pandas as pd

# Create a dictionary associating a file to each month
files = {dt.date(2019, 4, 1): 'april.csv',
         dt.date(2019, 5, 1): 'may.csv'}
# For each file, build a one-row data frame as follows and collect it:
#   month       name1  name2  ...
#   2019-04-01  0.5    0.2
rows = []
for month, file in files.items():
    data = pd.read_csv(file, usecols=['name', 'usage'], index_col='name')
    data = data.transpose()
    data['month'] = month
    rows.append(data.set_index('month'))
# DataFrame.append was removed in pandas 2.0, so concatenate once instead
df = pd.concat(rows)
Step 2:
# New figure
fig = plt.figure()
# Plot one series for each column in df
for name in df.columns:
    plt.scatter(x=df.index, y=df[name], label=name)
# Additional plot formatting code here
plt.show()
I hope that helps.
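As a variation on step 1, the monthly files can be stacked in long form and pivoted once at the end. The sketch below is illustrative only: the CSV text is made up, io.StringIO stands in for the real files, and the column names rank, name and usage are assumptions.

```python
import datetime as dt
import io

import pandas as pd

# Hypothetical CSV contents standing in for the monthly files
april_csv = "rank,name,usage\n1,alpha,12.5\n2,beta,9.0\n"
may_csv = "rank,name,usage\n1,beta,11.0\n2,gamma,8.5\n"
files = {
    dt.date(2019, 4, 1): io.StringIO(april_csv),
    dt.date(2019, 5, 1): io.StringIO(may_csv),
}

# Tag each file's rows with its month and stack them in long form
frames = []
for month, source in files.items():
    data = pd.read_csv(source, usecols=["name", "usage"])
    data["month"] = month
    frames.append(data)
long_df = pd.concat(frames, ignore_index=True)

# Rows are months, columns are names; a name missing from a month becomes NaN
usage = long_df.pivot(index="month", columns="name", values="usage")
```

Because the month comes from the dictionary key, the CSV files themselves never need editing, and names that drop out of the top 30 in a given month simply show up as gaps.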

How to plot both Price and Volume in same Chart

I have a dataframe as mentioned below:
Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
31/01/2019,09:15:04,10700.05,2625
31/01/2019,09:15:05,10700.05,2400
31/01/2019,09:15:06,10698.10,3000
31/01/2019,09:15:07,10699.90,5925
31/01/2019,09:15:08,10699.25,5775
31/01/2019,09:15:09,10700.45,5925
31/01/2019,09:15:10,10700.00,4650
31/01/2019,09:15:11,10699.40,8025
31/01/2019,09:15:12,10698.95,5025
31/01/2019,09:15:13,10698.45,1950
31/01/2019,09:15:14,10696.15,3900
31/01/2019,09:15:15,10697.15,2475
31/01/2019,09:15:16,10697.05,4275
31/01/2019,09:15:17,10696.25,3225
31/01/2019,09:15:18,10696.25,3300
The data frame contains approximately 8000 rows. I want to plot both price and volume in the same chart. (Volume Range: 0 - 8,00,000)

Suppose you want to compare price and volume vs time, try this:
df = pd.read_csv('your_path_here')
df.plot('Time', ['Price', 'Volume'], secondary_y='Price')
Edit: x-axis customization
Since you want to customize the x-axis, try this (a basic example you can adapt):
# Build a Datetime column from the Date and Time columns (the dates are day-first)
df = pd.read_csv('your_path_here')
df['Datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], dayfirst=True)
Then create two lists, one containing the tick positions on the x-axis and the other the labels.
Say you want a label every 5 seconds (the 30-minute spacing you asked for is possible, but not with the data you provided):
positions = [p for p in df.Datetime if p.second in range(0, 60, 5)]
labels = [l.strftime('%H:%M:%S') for l in positions]
Then plot, passing the positions and labels lists to set_xticks and set_xticklabels:
ax = df.plot('Datetime', ['Price', 'Volume'], secondary_y='Price')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
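For more control over each axis, the same idea can be written with plain Matplotlib and twinx, which creates a second y-axis sharing the x-axis. This is a sketch rather than the answer's method: it uses the first rows of the sample data, the names ax_price and ax_volume are made up, and the Agg backend is chosen so it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the sample data from the question
csv_text = """Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
"""
df = pd.read_csv(io.StringIO(csv_text))
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                format="%d/%m/%Y %H:%M:%S")

fig, ax_price = plt.subplots()
ax_volume = ax_price.twinx()  # second y-axis that shares the x-axis

# Each series gets its own scale, so the large volumes do not flatten the price
ax_price.plot(df["Datetime"], df["Price"], color="tab:blue")
ax_volume.plot(df["Datetime"], df["Volume"], color="tab:gray", alpha=0.6)

ax_price.set_xlabel("Time")
ax_price.set_ylabel("Price")
ax_volume.set_ylabel("Volume")
```

Calling fig.savefig(...) or plt.show() afterwards renders the chart.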

Add Pivot table columns and index as xticks and yticks

I have a pivot table created according to this: Color mapping of data on a date vs time plot and plot it with imshow(). I want to use the index and columns of the pivot table as yticks and xticks. The columns in my pivot table are dates and the index are time of the day.
data = pd.DataFrame()
data['Date']=Tgrad_GFAVD_3m2mRot.index.date
data['Time']=Tgrad_GFAVD_3m2mRot.index.strftime("%H")
data['Tgrad']=Tgrad_GFAVD_3m2mRot.values
C = data.pivot(index='Time', columns='Date', values='Tgrad')
print(C.head()):
Date 2016-08-01 2016-08-02 2016-08-03 2016-08-04 2016-08-05 2016-08-06 \
Time
00 -0.841203 -0.541871 -0.042984 -0.867929 -0.790869 -0.940757
01 -0.629176 -0.520935 -0.194655 -0.866815 -0.794878 -0.910690
02 -0.623608 -0.268820 -0.255457 -0.859688 -0.824276 -0.913808
03 -0.615145 -0.008241 -0.463920 -0.909354 -0.811136 -0.878619
04 -0.726949 -0.169488 -0.529621 -0.897773 -0.833408 -0.825612
I plot the pivot table with
fig, ax = plt.subplots(figsize = (16,9))
plt = ax.imshow(C,aspect = 'auto', extent=[0,len(data["Date"]),0,23], origin = "lower")
I tried a couple of things but nothing worked. At the moment my xticks range between 0 and 6552, which is the length of the C.columns object and is set by the extent argument in imshow()
I would like to have the xticks at every first of the month but not by index number but as a datetick in the format "2016-08-01" for example.
I am sure it was just a small thing that has been stopping me the last hour, but now I give up. Do you know how to set the xticks accordingly?

I found the solution myself after trying one more thing. I created another column in the "data" DataFrame with datenum entries instead of dates (mdates is matplotlib.dates):
data["datenum"]=mdates.date2num(data["Date"])
Then changed the plot line to:
pl = ax.imshow(C, aspect='auto',
               extent=[data["datenum"].iloc[0], data["datenum"].iloc[-1],
                       data["Time"].iloc[0], data["Time"].iloc[-1]],
               origin="lower")
So the change of the extent argument provided the datenum values to the plot instead of the index of the date column.
Then with this the following lines worked:
ax.set_yticks(data["Time"]) # sets yticks
ax.xaxis_date() # tells the xaxis that it should expect datetime values
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d") ) # formats the datetime values
fig.autofmt_xdate() # makes it look nice
Best,
Vroni
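The approach above can be sketched end to end on synthetic data. Everything in this example is an assumption made for illustration: the sine-wave values, the integer Hour column (used instead of strftime strings so that extent receives numbers), and the MonthLocator that places a tick on the first of each month as the question asked.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic hourly series covering 60 days, standing in for the real data
index = pd.date_range("2016-08-01", periods=24 * 60, freq=pd.Timedelta(hours=1))
values = np.sin(np.arange(len(index)) / 24.0)

data = pd.DataFrame({
    "Date": index.normalize(),  # midnight timestamp of each day
    "Hour": index.hour,         # integer hour of day, 0 to 23
    "Tgrad": values,
})
C = data.pivot(index="Hour", columns="Date", values="Tgrad")

# Convert the first and last column dates to Matplotlib date numbers
xnum = mdates.date2num(C.columns.to_numpy())
x0, x1 = xnum[0], xnum[-1]

fig, ax = plt.subplots(figsize=(16, 9))
im = ax.imshow(C.to_numpy(), aspect="auto", origin="lower",
               extent=[x0, x1, float(C.index[0]), float(C.index[-1])])

ax.xaxis_date()                                    # treat x values as dates
ax.xaxis.set_major_locator(mdates.MonthLocator())  # tick on the 1st of each month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()                                # rotate the date labels
```

Changing the DateFormatter pattern to "%m/%d" reproduces the label style used in the answer.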
