Integrating over range of dates, and labeling the xaxis - python

I am trying to integrate two curves as they change through time using pandas. I am loading the data from a CSV file (its contents are shown as text below).
The dates are the X-axis and both the oil (O) and water (W) points are the Y-axis. I have learned to use the cross-section option to isolate the "NAME" values, but am having trouble finding a good way to integrate with dates as the X-axis. Eventually I would like to take the integrals of both curves and stack them against each other. I am also having trouble with the plot defaulting the x-ticks to arbitrary values instead of the dates.
I can change the labels/ticks manually, but have a large CSV to process and would like to automate the process. Any help would be greatly appreciated.
NAME,DATE,O,W
A,1/20/2000,12,50
B,1/20/2000,25,28
C,1/20/2000,14,15
A,1/21/2000,34,50
B,1/21/2000,8,3
C,1/21/2000,10,19
A,1/22/2000,47,35
B,1/22/2000,4,27
C,1/22/2000,46,1
A,1/23/2000,19,31
B,1/23/2000,18,10
C,1/23/2000,19,41
Contents of CSV in text form above.

Further to my comment above, here is some sample code (using logic from the example mentioned) to label your xaxis with formatted dates. Hope this helps.
Data Collection / Imports:
Just re-creating your dataset for the example.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
header = ['NAME', 'DATE', 'O', 'W']
data = [['A','1/20/2000',12,50],
['B','1/20/2000',25,28],
['C','1/20/2000',14,15],
['A','1/21/2000',34,50],
['B','1/21/2000',8,3],
['C','1/21/2000',10,19],
['A','1/22/2000',47,35],
['B','1/22/2000',4,27],
['C','1/22/2000',46,1],
['A','1/23/2000',19,31],
['B','1/23/2000',18,10],
['C','1/23/2000',19,41]]
df = pd.DataFrame(data, columns=header)
df['DATE'] = pd.to_datetime(df['DATE'], format='%m/%d/%Y')
# Subset to just the 'A' labels.
df_a = df[df['NAME'] == 'A']
Plotting:
# Define the number of ticks you need.
nticks = 4
# Define the date format.
mask = '%m-%d-%Y'
# Create the set of custom date labels.
# Use at least 1 so the slice step is never zero when there are fewer rows than ticks.
step = max(1, df_a.shape[0] // nticks)
xdata = np.arange(df_a.shape[0])
xlabels = df_a['DATE'].dt.strftime(mask).tolist()[::step]
# Create the plot.
fig, ax = plt.subplots(1, 1)
ax.plot(xdata, df_a['O'], label='Oil')
ax.plot(xdata, df_a['W'], label='Water')
ax.set_xticks(np.arange(df_a.shape[0], step=step))
ax.set_xticklabels(xlabels, rotation=45, horizontalalignment='right')
ax.set_title('Test in Naming Labels for the X-Axis')
ax.legend()

I'd recommend converting the X-axis into integers or floats (seconds, minutes, hours, or days since a reference time, depending on the precision you need). You can then use the usual numerical integration methods, and the x-axis will no longer default to some other values.
See How to convert datetime to integer in python
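As a minimal sketch of that suggestion, the snippet below converts the dates for NAME == 'A' into elapsed days and integrates both curves. The trapezoidal rule and the helper name trapezoid are assumptions made for illustration; the answer does not say which integration method to use.

```python
import numpy as np
import pandas as pd

# Rows for NAME == 'A', copied from the CSV in the question.
df_a = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["1/20/2000", "1/21/2000", "1/22/2000", "1/23/2000"], format="%m/%d/%Y"
    ),
    "O": [12, 34, 47, 19],
    "W": [50, 50, 35, 31],
})

# Elapsed time in days since the first sample, as floats.
elapsed = df_a["DATE"] - df_a["DATE"].iloc[0]
days = elapsed.dt.total_seconds().to_numpy() / 86400.0


def trapezoid(y, x):
    """Trapezoidal-rule integral of y over x (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))


oil_total = trapezoid(df_a["O"], days)    # units: O-value * days
water_total = trapezoid(df_a["W"], days)  # units: W-value * days
print(oil_total, water_total)
```

Because the x values are days, the result is in "value times days"; dividing by 86400 is what fixes that unit, so change the divisor if you need hours or seconds instead.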

Related

How do you get dates on the start on the specified month? (matplotlib)

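# FEB
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
# configuring the figure and plot space
fig, lx = plt.subplots(figsize=(30,10))
# converting the Series to float so the data can be plotted
wd = df2['Unnamed: 1']
wd = wd.astype(float)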
# adding the x and y axes' values
lx.plot(list(df2.index.values), wd)
# defining what the labels will be
lx.set(xlabel='Day', ylabel='Weight', title='Daily Weight February 2022')
# defining the date format
date_format = DateFormatter('%m-%d')
lx.xaxis.set_major_formatter(date_format)
lx.xaxis.set_minor_locator(mdates.WeekdayLocator(interval=1))
Values I would like the x-axis to have:
['2/4', '2/5', '2/6', '2/7', '2/8', '2/9', '2/10', '2/11', '2/12', '2/13', '2/14', '2/15', '2/16', '2/17', '2/18', '2/19', '2/20', '2/21', '2/22', '2/23', '2/24', '2/25', '2/26', '2/27']
Values on the x-axis: (image not shown)
It is giving me the right number of values, just not the right labels. I have tried to specify the start and end with xlim=['2/4', '2/27'], however that did not seem to work.
It would be great to see how your df2 actually looks, but from your code snippet, it looks like it has weights recorded but not the corresponding dates.
How about preparing a data frame that has the dates in it?
(Also, since this question is tagged with seaborn too, I'm going to use Seaborn, but the same idea should work.)
import pandas as pd
import seaborn as sns
import seaborn.objects as so
from matplotlib.dates import DateFormatter
sns.set_theme()
Create an index with the dates starting from 4 Feb with the number of days we have weight recorded.
index = pd.date_range(start="2/4/2022", periods=df2.count().Weight, name="Date")
Then with Seaborn's object interface (v0.12+), we can do:
(
so.Plot(df2.set_index(index), x="Date", y="Weight")
.add(so.Line())
.scale(x=so.Temporal().label(formatter=DateFormatter('%m-%d')))
.label(title="Daily Weight February 2022")
)
I have solved this problem. Very simple: I just passed mdates.WeekdayLocator() to set_major_locator. I overlooked this when I was going through the matplotlib docs, but I am happy to have found the solution.
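A minimal matplotlib-only sketch of the idea, assuming the data are plotted against real dates: the locator decides where the ticks go and the formatter decides how they are written. The weight values here are made up, since the real df2 is not shown in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily weights for 4 Feb to 27 Feb 2022.
dates = pd.date_range(start="2022-02-04", end="2022-02-27", freq="D")
weights = pd.Series([180.0 - 0.1 * i for i in range(len(dates))], index=dates)

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(weights.index, weights.to_numpy())
ax.set(xlabel="Day", ylabel="Weight", title="Daily Weight February 2022")

# One major tick per day, labelled as month-day.
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
fig.autofmt_xdate()  # rotate the labels so 24 of them fit
```

If the x values are plain row numbers rather than dates, DateFormatter interprets them as days since the matplotlib epoch, which is a likely reason the labels in the question did not match the expected dates.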

Plot data frame fast and with correct date format

I have the data shown in the screenshot in a DataFrame, and I would like to plot it quickly and with the correct date format.
The following code is much faster than using e.g. plt.plot(df["Date"], df["D30"]):
df.plot(marker='.', linestyle='none')
I would therefore like to keep using DataFrame.plot() directly, because it is much faster than plotting each column against the "Date" column separately. However, as shown in the graph, the dates are not correct. My actual starting date is 2006-01-10, but the figure starts at 70-01 (1970-01-01).
I find the official matplotlib documentation for DateFormatter quite confusing and not very helpful. I tried to google an easy and clear solution, but most answers relate to plt.plot(x, y), where x is the date and y is the actual value. With that approach it is easy to adjust the date format in the figure, but it makes my plot very slow since I am plotting 11 columns in total.
Any idea how I can plot a DataFrame quickly and with the correct date format?
import os
import datetime as dt
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
date_format = mdates.DateFormatter('%y%m')
df_file = r"C:\Codes\df_file.csv"
df = pd.read_csv(df_file)
print(len(df), df.info(), df["Date"][0], type(df["Date"][0]))
df.head(2)
fig = plt.figure(figsize=(12.0, 8.0))
df.plot(marker='.', linestyle='none')
plt.title("data_frame_show date", fontsize=16)
plt.gca().xaxis.set_major_formatter(date_format)
plt.legend(loc=(1.04, 0))
plt.show()
partial input:
Date,D10,D30,D60,D91,D122,D152,D182,D273,D365,D547,D730
2006-01-10,,0.1373444,0.1544265,0.1541397,0.1429375,0.1421464,0.1426055,0.1460771,0.1486266,0.1551848,0.1593932
2006-01-11,,0.135426,0.1411246,0.141093,0.1384091,0.1383636,0.1395791,0.1438944,0.1469191,0.1553112,0.1598582
2006-01-12,,0.1311339,0.1292621,0.1304292,0.1363482,0.1362213,0.1367843,0.1404174,0.1439877,0.152306,0.1568677
2006-01-13,,0.1594458,0.1355387,0.1367246,0.1434708,0.143745,0.1441349,0.1453056,0.1481918,0.157193,0.1607564
2006-01-16,,0.1374846,0.1182223,0.1272385,0.1415359,0.1418881,0.1430098,0.1468544,0.1496407,0.1547714,0.158936
2006-01-17,,0.1453834,0.1418838,0.143198,0.1437924,0.143473,0.1440987,0.1473208,0.1501543,0.1590842,0.1629096
2006-01-18,,0.1385479,0.141472,0.1481763,0.1515037,0.1511353,0.1511544,0.1535245,0.1554254,0.1626349,0.1663554
2006-01-19,,0.1639788,0.1462084,0.1483903,0.1486906,0.1483109,0.1492335,0.1539002,0.1563708,0.1611751,0.1644693
2006-01-20,,0.189771,0.178394,0.1638331,0.1565402,0.1559029,0.1553547,0.1526479,0.1516396,0.1614136,0.1646431
2006-01-23,,0.1420271,0.1570005,0.1614942,0.1607205,0.1605297,0.1630065,0.1653838,0.1642349,0.166809,0.1701779
2006-01-24,,0.1814291,0.1633585,0.1563364,0.1548823,0.15382,0.1545099,0.1590869,0.1609158,0.1653819,0.1681759
2006-01-25,,0.1272998,0.1445222,0.1487031,0.1522032,0.152714,0.1524364,0.1532192,0.1550062,0.1635665,0.1658293
2006-01-26,,0.1392162,0.1413034,0.1443807,0.1476261,0.1482458,0.1473548,0.1471019,0.1493254,0.1578586,0.160699
2006-01-27,,0.1360269,0.1374056,0.1387952,0.1426731,0.1441445,0.144917,0.1462428,0.1478979,0.1519537,0.1550311
2006-01-30,,0.1439245,0.1430108,0.1434628,0.1448731,0.1450397,0.1454756,0.1467621,0.1487521,0.1538424,0.1561802
2006-01-31,,0.1483135,0.1468713,0.1473837,0.1519043,0.1519379,0.1502139,0.1504632,0.1529254,0.1571567,0.1589795
2006-02-01,,0.1464208,0.1447363,0.1443483,0.1459808,0.1477726,0.1505124,0.1520256,0.1535773,0.1589145,0.1607383
2006-02-02,,0.1484249,0.1414394,0.1412338,0.1497531,0.1500731,0.1475751,0.147502,0.1512457,0.1571017,0.1606797
2006-02-03,,0.1496503,0.1485318,0.1502473,0.1565336,0.156727,0.1556335,0.1560396,0.1579241,0.1619183,0.1634751
2006-02-06,,0.149966,0.1457216,0.1475524,0.1539103,0.1546401,0.154973,0.1553681,0.1570598,0.161173,0.1630743
2006-02-08,,0.1463649,0.1436135,0.1454147,0.1498372,0.1507231,0.1520234,0.1538407,0.1563603,0.1617697,0.1639547
2006-02-09,,0.1401312,0.1432856,0.1437166,0.1443243,0.1463163,0.148681,0.1496198,0.1516376,0.1584639,0.1615756
2006-02-10,,0.1339916,0.1405194,0.1432779,0.1464605,0.1470921,0.1484831,0.1514307,0.1550715,0.1599564,0.1623171
2006-02-13,,0.1470304,0.1423007,0.1446087,0.1470668,0.1485171,0.1503383,0.1508497,0.1532987,0.1591155,0.1615874
2006-02-14,,0.1454322,0.1449017,0.1455735,0.1462286,0.1478059,0.1501469,0.1522522,0.1541999,0.157668,0.1601427
2006-02-15,,0.1429312,0.1455881,0.1464055,0.1471812,0.1489883,0.1514654,0.153837,0.1559375,0.16082,0.1631557
2006-02-16,,0.134637,0.1373471,0.140634,0.1432172,0.145788,0.14875,0.1507805,0.15325,0.1581015,0.1613797
2006-02-20,,0.1303785,0.1334454,0.139216,0.1423217,0.1454704,0.1477552,0.1487534,0.1509405,0.1554398,0.1588761
2006-02-21,,0.1359587,0.1370814,0.1416117,0.1418016,0.1441761,0.1468109,0.1476679,0.1496546,0.1561362,0.1607204
2006-02-22,,0.1302253,0.1337104,0.1415016,0.141451,0.1438881,0.1467031,0.1502449,0.1514018,0.1531452,0.1582335
2006-02-23,,0.1282022,0.1333902,0.1342376,0.1385976,0.1453201,0.1481733,0.1490296,0.1512885,0.1554035,0.1593463
2006-02-24,,0.1269229,0.1304391,0.1348061,0.1378378,0.1419301,0.1442134,0.1472283,0.1507224,0.1555662,0.1595938
2006-02-27,,0.1254707,0.128201,0.1334554,0.1374389,0.1427246,0.1446071,0.1465459,0.1496113,0.1541296,0.1578174
2006-02-28,,0.1346332,0.1361773,0.139586,0.1421924,0.1468084,0.1489651,0.1505661,0.1541479,0.1606205,0.1675438
2006-03-01,,0.1301198,0.1318495,0.1343342,0.1376886,0.1434328,0.1459977,0.1490832,0.1525961,0.1557153,0.1593923
2006-03-02,,0.1304425,0.1347556,0.1398592,0.1420431,0.1457691,0.1479747,0.1510143,0.1544964,0.1589201,0.1616325
2006-03-03,,0.1311674,0.1339681,0.138887,0.1418598,0.1451706,0.1472144,0.1495689,0.1536886,0.1599843,0.162247
2006-03-06,,0.1308081,0.1367775,0.1412145,0.1436582,0.1480171,0.1495588,0.1511633,0.1545973,0.1588486,0.1616268
2006-03-07,,0.1344355,0.1387528,0.143365,0.1459607,0.1482421,0.1491656,0.1512236,0.1550063,0.1593201,0.1615385
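When plotting time series, pandas uses the index for the x-axis when you call the plot function. In your DataFrame the index is still the default row number, which the date formatter interprets as days since 1970-01-01.
I would suggest parsing the Date column and setting it as the index:
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")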
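Putting the pieces together, here is a hedged end-to-end sketch using four rows and two columns from the partial input. The x_compat=True argument is an addition that is not in the answer: it asks pandas to leave tick handling to matplotlib, so that a matplotlib DateFormatter is applied to ordinary matplotlib date numbers.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import pandas as pd

# A small excerpt of the question's data, inlined so the example is self-contained.
csv_text = """Date,D30,D60
2006-01-10,0.1373444,0.1544265
2006-01-11,0.135426,0.1411246
2006-01-12,0.1311339,0.1292621
2006-01-13,0.1594458,0.1355387
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the dates and make them the index, so DataFrame.plot() uses them as x.
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")

# One call plots every column; x_compat=True keeps matplotlib's own date handling.
ax = df.plot(marker=".", linestyle="none", x_compat=True, figsize=(12, 8))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_title("data_frame_show date", fontsize=16)
ax.legend(loc=(1.04, 0))
```

This keeps the single df.plot() call the question wants for speed, while the axis now starts in January 2006 instead of 1970.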

How to add a string comment above every single candle using mplfinance.plot() or any similar package?

I want to add a string comment above every single candle using the mplfinance package.
Is there a way to do it using mplfinance or any other package?
Here is the code I used:
import pandas as pd
import mplfinance as mpf
import matplotlib.animation as animation
from mplfinance import *
import datetime
from datetime import date, datetime
fig = mpf.figure(style="charles",figsize=(7,8))
ax1 = fig.add_subplot(1,1,1 , title='ETH')
def animate(ival):
    idf = pd.read_csv("test1.csv", index_col=0)
    idf['minute'] = pd.to_datetime(idf['minute'], format="%m/%d/%Y %H:%M")
    idf.set_index('minute', inplace=True)
    ax1.clear()
    mpf.plot(idf, ax=ax1, type='candle', ylabel='Price US$')
ani = animation.FuncAnimation(fig, animate, interval=250)
mpf.show()
You should be able to do this using Axes.text()
After calling mpf.plot() then call
ax1.text()
for each text that you want (in your case for each candle).
There is an important caveat regarding the x-axis values that you pass into ax1.text():
If you do not specify show_nontrading=True, it defaults to False. In that case the x-axis value that you pass into ax1.text() for the position of the text must be the row number of the candle where you want the text, counting from 0 for the first row in your DataFrame.
On the other hand, if you do set show_nontrading=True, then the x-axis value that you pass into ax1.text() must be a matplotlib datetime. You can convert the pandas datetimes in your DataFrame's DatetimeIndex into matplotlib datetimes as follows:
import matplotlib.dates as mdates
my_mpldates = mdates.date2num(idf.index.to_pydatetime())
I suggest using the first option (DataFrame row number) because it is simpler. I am currently working on an mplfinance enhancement that will allow you to enter the x-axis values as any type of datetime object (which is the more intuitive way to do it) however it may be another month or two until that enhancement is complete, as it is not trivial.
Code example, using data from the mplfinance repository examples data folder:
import pandas as pd
import mplfinance as mpf
infile = 'data/yahoofinance-SPY-20200901-20210113.csv'
# take rows [18:25] to keep the demo small:
df = pd.read_csv(infile, index_col=0, parse_dates=True).iloc[18:25]
fig, axlist = mpf.plot(df,type='candle',volume=True,
ylim=(330,345),returnfig=True)
x = 1
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'Custom\nText\nHere')
x = 3
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'High here\n= '+str(y-1),fontstyle='italic')
x = 5
y = df.loc[df.index[x],'High']+1
axlist[0].text(x-0.2,y,'More\nCustom\nText\nHere',fontweight='bold')
mpf.show()
Comments on the above code example:
I am setting the ylim=(330,345) in order to provide a little extra room above the candles for the text. In practice you might choose the high dynamically as perhaps high_ylim = 1.03*max(df['High'].values).
Notice that for the first two candles with text, the text begins at the center of the candle. The third text call uses x-0.2 to position the text more over the center of the candle.
For this example, the y location of the text is determined by taking the high of that candle and adding 1 (y = df.loc[df.index[x],'High']+1). Of course adding 1 is arbitrary, and in practice, depending on the magnitude of your prices, adding 1 may be too little or too much. Instead, you may want to add a small percentage, for example 0.2 percent:
y = df.loc[df.index[x],'High']
y = y * 1.002
Here is the plot the above code generates:
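To label every candle rather than a few hand-picked ones, the same Axes.text() call can be placed in a loop over row numbers. The sketch below is only an illustration: annotate_rows, the comment strings, and the price values are hypothetical, and plain vertical lines stand in for candles so that it runs without mplfinance installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def annotate_rows(ax, df, comments, pad=0.002):
    """Write comments[i] just above the High of row i, using row numbers as x."""
    for row_number, (high, comment) in enumerate(zip(df["High"], comments)):
        ax.text(row_number, high * (1 + pad), comment,
                ha="center", va="bottom", fontsize=8)


# Hypothetical prices; real data would come from your own DataFrame.
df = pd.DataFrame(
    {"High": [335.0, 338.5, 336.2, 340.1], "Low": [331.0, 334.0, 333.5, 336.0]},
    index=pd.date_range("2020-09-28", periods=4, freq="D"),
)
comments = ["buy", "hold", "hold", "sell"]

fig, ax = plt.subplots()
ax.vlines(range(len(df)), df["Low"], df["High"])  # stand-in for the candles
annotate_rows(ax, df, comments)
ax.set_ylim(df["Low"].min() * 0.99, df["High"].max() * 1.03)  # headroom for text
```

With mplfinance, the Axes to pass would be axlist[0] from mpf.plot(..., returnfig=True), or ax1 after calling mpf.plot(idf, ax=ax1, ...). Row numbers are the right x values only while show_nontrading is left at its default of False, as described above.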

Graphing a dataframe line plot with a legend in Matplotlib

I'm working with a dataset that has grades and states and need to create line graphs by state showing what percent of each state's students fall into which bins.
My methodology (so far) is as follows:
First I import the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
records = [{'Name':'A', 'Grade':'.15','State':'NJ'},{'Name':'B', 'Grade':'.15','State':'NJ'},{'Name':'C', 'Grade':'.43','State':'CA'},{'Name':'D', 'Grade':'.75','State':'CA'},{'Name':'E', 'Grade':'.17','State':'NJ'},{'Name':'F', 'Grade':'.85','State':'HI'},{'Name':'G', 'Grade':'.89','State':'HI'},{'Name':'H', 'Grade':'.38','State':'CA'},{'Name':'I', 'Grade':'.98','State':'NJ'},{'Name':'J', 'Grade':'.49','State':'NJ'},{'Name':'K', 'Grade':'.17','State':'CA'},{'Name':'K', 'Grade':'.94','State':'HI'},{'Name':'M', 'Grade':'.33','State':'HI'},{'Name':'N', 'Grade':'.22','State':'NJ'},{'Name':'O', 'Grade':'.7','State':'NJ'}]
df = pd.DataFrame(records)
df.Grade = df.Grade.astype(float)
Next I cut each grade into a bin
df['bin'] = pd.cut(df['Grade'],[-np.inf,.05,.1,.15,.2,.25,.3,.35,.4,.45,.5,.55,.6,.65,.7,.75,.8,.85,.9,.95,1],labels=False)/10
Then I create a pivot table giving me the count of people by bin in each state
df2 = pd.pivot_table(df,index=['bin'],columns='State',values=['Name'],aggfunc=pd.Series.nunique,margins=True)
df2 = df2.fillna(0)
Then I convert those n-counts into percentages and remove the margin rows
df3 = df2.div(df2.iloc[-1])
df3 = df3.iloc[:-1,:-1]
Now I want to create a line graph with multiple lines (one for each state) with the bin on the x axis and the percentage on the Y axis. df3.plot() will give me the chart I want but I would like to accomplish the same using matplotlib, because it offers me greater customization of the graph. However, running
plt.plot(df3)
gives me the lines I need, but I can't get the legend to work properly. Any thoughts on how to accomplish this?
It may not be the best way, but I use the pandas plot function to draw df3, then get the legend and get the new label names. Please note that the processing of the legend string is limited to this data.
line = df3.plot(kind='line')
handles, labels = line.get_legend_handles_labels()
label = []
for l in labels:
    label.append(l[7:-1])
plt.legend(handles, label, loc='best')
You can do this:
plt.plot(df3,label="label")
plt.legend()
plt.show()
For more information visit here
And if it helps you to solve your issues then don't forget to mark this as accepted answer.
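Another option, sketched here under the assumption that df3 has one column per state (possibly as a ('Name', state) tuple left over from the pivot table), is to plot each column in its own ax.plot() call so that every line carries its own label. The small df3 below is a made-up stand-in, not the output of the question's code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df3: bins as the index, ('Name', state) columns.
columns = pd.MultiIndex.from_tuples(
    [("Name", "CA"), ("Name", "HI"), ("Name", "NJ")], names=[None, "State"]
)
df3 = pd.DataFrame(
    [[0.25, 0.00, 0.50], [0.50, 0.25, 0.25], [0.25, 0.75, 0.25]],
    index=pd.Index([0.1, 0.4, 0.8], name="bin"),
    columns=columns,
)

fig, ax = plt.subplots()
for col in df3.columns:
    # Use the last level of a tuple column name as the legend label.
    state = col[-1] if isinstance(col, tuple) else col
    ax.plot(df3.index, df3[col], marker="o", label=state)

ax.set_xlabel("bin")
ax.set_ylabel("share of students")
ax.legend(title="State", loc="best")
```

Because every line is created by its own call, colours, markers, and line styles can be customised per state, which is the kind of control the question is after.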

Inconsistent automatic pandas date labeling

I was wondering how exactly pandas formats the x-axis dates. I am using the same script on a number of result sets, which all have the same pandas DataFrame format. However, pandas formats the dates of each DataFrame differently. How can this be made more consistent?
Each df has a DatetimeIndex like this, with dtype='datetime64[ns]':
>>> df.index
DatetimeIndex(['2014-10-02', '2014-10-03', '2014-10-04', '2014-10-05',
'2014-10-06', '2014-10-07', '2014-10-08', '2014-10-09',
'2014-10-10', '2014-10-11',
...
'2015-09-23', '2015-09-24', '2015-09-25', '2015-09-26',
'2015-09-27', '2015-09-28', '2015-09-29', '2015-09-30',
'2015-10-01', '2015-10-02'],
dtype='datetime64[ns]', name='Date', length=366, freq=None)
Eventually, I plot with df.plot() where the df has two columns.
But the axes of the plots have different styles, like this:
I would like all plots to have the x-axis style of the first plot. pandas should do this automatically, and I would prefer not to start formatting xticks by hand, since I have quite a lot of data to plot. Could anyone explain what to do? Thanks!
EDIT:
I'm reading two csv-files from 2015. The first has the model results of about 200 stations, the second has the gauge measurements of the same stations. Later, I read another two csv-files from 2016 with the same format.
import pandas as pd
df_model = pd.read_csv(path_model, sep=';', index_col=0, parse_dates=True)
df_gauge = pd.read_csv(path_gauge, sep=';', index_col=0, parse_dates=True)
df = pd.DataFrame(columns=['model', 'gauge'], index=df_model.index)
df['model'] = df_model['station_1'].copy()
df['gauge'] = df_gauge['station_1'].copy()
df.plot()
I do this for each year, so the x-axis should look the same, right?
I do not think this is possible unless you modify the pandas library. I looked around a bit for options that one may set in pandas, but couldn't find one. Pandas tries to intelligently select the type of axis ticks using logic implemented here (I THINK). So in my opinion, it would be best to define your own function to make the plots and then override the tick formatting (although you would rather not do that).
There are many references around the internet which show how to do this. I used this one by "Simone Centellegher" and this stackoverflow answer to come up with a function that may work for you (tested in python 3.7.1 with matplotlib 3.0.2, pandas 0.23.4):
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
## pass df with columns you want to plot
def my_plotter(df, xaxis, y_cols):
    fig, ax = plt.subplots()
    plt.plot(xaxis,df[y_cols])
    ax.xaxis.set_minor_locator(mdates.MonthLocator())
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
    # Remove overlapping major and minor ticks
    majticklocs = ax.xaxis.get_majorticklocs()
    minticklocs = ax.xaxis.get_minorticklocs()
    minticks = ax.xaxis.get_minor_ticks()
    for i in range(len(minticks)):
        cur_mintickloc = minticklocs[i]
        if cur_mintickloc in majticklocs:
            minticks[i].set_visible(False)
    return fig, ax
df = pd.DataFrame({'values':np.random.randint(0,1000,36)}, \
                  index=pd.date_range(start='2014-01-01', \
                                      end='2016-12-31',freq='M'))
fig, ax = my_plotter(df, df.index, ["values"])
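For the two-column, one-plot-per-year case in the question, the same idea can be sketched as a small helper that applies identical tick rules to every Axes. Everything here is illustrative: plot_with_fixed_ticks is a hypothetical name, the data are random, and the two-month tick spacing is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_fixed_ticks(df, ax):
    """Plot every column of df against its DatetimeIndex with fixed tick rules."""
    for col in df.columns:
        ax.plot(df.index, df[col], label=col)
    ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))
    ax.legend()
    return ax


# Random stand-ins for the yearly model/gauge DataFrames.
rng = np.random.default_rng(0)
frames = {}
for year in (2015, 2016):
    index = pd.date_range(f"{year - 1}-10-02", periods=366, freq="D", name="Date")
    frames[year] = pd.DataFrame(
        {"model": rng.random(366), "gauge": rng.random(366)}, index=index
    )

fig, axes = plt.subplots(len(frames), 1, figsize=(8, 6))
for ax, (year, df) in zip(axes, frames.items()):
    plot_with_fixed_ticks(df, ax)
    ax.set_title(str(year))
fig.tight_layout()
```

Since the locator and formatter are set explicitly, the tick style no longer depends on whatever pandas infers from each DataFrame's index.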
