Integrating over range of dates, and labeling the xaxis - python
I am trying to integrate two curves as they change through time using pandas. I am loading data from a CSV file like this:
The dates are the x-axis, and both the Oil (O) and Water (W) points are the y-axis. I have learned to use the cross-section option to isolate the "NAME" values, but am having trouble finding a good way to integrate with dates as the x-axis. Eventually I would like to be able to take the integrals of both curves and stack them against each other. I am also having trouble with the plot defaulting the x-ticks to arbitrary values instead of the dates.
I can change the labels/ticks manually, but have a large CSV to process and would like to automate the process. Any help would be greatly appreciated.
NAME,DATE,O,W
A,1/20/2000,12,50
B,1/20/2000,25,28
C,1/20/2000,14,15
A,1/21/2000,34,50
B,1/21/2000,8,3
C,1/21/2000,10,19
A,1/22/2000,47,35
B,1/22/2000,4,27
C,1/22/2000,46,1
A,1/23/2000,19,31
B,1/23/2000,18,10
C,1/23/2000,19,41
Contents of CSV in text form above.
Further to my comment above, here is some sample code (using logic from the example mentioned) to label your x-axis with formatted dates. Hope this helps.
Data Collection / Imports:
Just re-creating your dataset for the example.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
header = ['NAME', 'DATE', 'O', 'W']
data = [['A','1/20/2000',12,50],
['B','1/20/2000',25,28],
['C','1/20/2000',14,15],
['A','1/21/2000',34,50],
['B','1/21/2000',8,3],
['C','1/21/2000',10,19],
['A','1/22/2000',47,35],
['B','1/22/2000',4,27],
['C','1/22/2000',46,1],
['A','1/23/2000',19,31],
['B','1/23/2000',18,10],
['C','1/23/2000',19,41]]
df = pd.DataFrame(data, columns=header)
df['DATE'] = pd.to_datetime(df['DATE'], format='%m/%d/%Y')
# Subset to just the 'A' labels.
df_a = df[df['NAME'] == 'A']
Plotting:
# Define the number of ticks you need.
nticks = 4
# Define the date format.
mask = '%m-%d-%Y'
# Create the set of custom date labels.
# At least 1, so the slice below never gets a zero step when there are fewer rows than ticks.
step = max(1, df_a.shape[0] // nticks)
xdata = np.arange(df_a.shape[0])
xlabels = df_a['DATE'].dt.strftime(mask).tolist()[::step]
# Create the plot.
fig, ax = plt.subplots(1, 1)
ax.plot(xdata, df_a['O'], label='Oil')
ax.plot(xdata, df_a['W'], label='Water')
ax.set_xticks(np.arange(df_a.shape[0], step=step))
ax.set_xticklabels(xlabels, rotation=45, horizontalalignment='right')
ax.set_title('Test in Naming Labels for the X-Axis')
ax.legend()
plt.show()
Output: (plot image not included)
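Since the question mentions a large CSV and wanting to automate the labelling, here is a minimal sketch of an alternative: plot against the real datetimes instead of row positions and let matplotlib's date locator and formatter place the ticks, with one subplot per NAME produced by a groupby loop. It assumes the question's column names (NAME, DATE, O, W) and date format; the inline CSV stands in for the real file, and the tick choices (DayLocator, "%m-%d-%Y") and subplot layout are illustrative, not part of the answer above.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The question's sample rows; for the real data use pd.read_csv("your_file.csv").
CSV_TEXT = """NAME,DATE,O,W
A,1/20/2000,12,50
B,1/20/2000,25,28
C,1/20/2000,14,15
A,1/21/2000,34,50
B,1/21/2000,8,3
C,1/21/2000,10,19
A,1/22/2000,47,35
B,1/22/2000,4,27
C,1/22/2000,46,1
A,1/23/2000,19,31
B,1/23/2000,18,10
C,1/23/2000,19,41
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")

groups = df.sort_values("DATE").groupby("NAME")
fig, axes = plt.subplots(groups.ngroups, 1, sharex=True, squeeze=False,
                         figsize=(6, 2.5 * groups.ngroups))

for ax, (name, group) in zip(axes[:, 0], groups):
    # Plot against the real datetimes, not 0, 1, 2, ... row positions.
    ax.plot(group["DATE"], group["O"], label="Oil")
    ax.plot(group["DATE"], group["W"], label="Water")
    ax.set_title(f"NAME = {name}")
    ax.legend(loc="upper right")
    # One major tick per day, labelled in the same style as the answer above.
    ax.xaxis.set_major_locator(mdates.DayLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d-%Y"))

fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Call plt.show() afterwards to display the figure. Because no tick positions are computed by hand, the same loop works unchanged for any number of NAME values or dates.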
I'd recommend converting the x-axis into integers or floats (seconds, minutes, hours, or days since a reference time, depending on the precision you need). You can then use the usual methods to integrate, and the x-axis would no longer default to other values.
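A minimal sketch of that suggestion, assuming the question's CSV layout: elapsed time is expressed as float days since each NAME's first date, and each curve is integrated with a hand-written trapezoidal rule. The helper cumulative_trapezoid is hypothetical (NumPy and SciPy ship equivalents, but their names differ across versions). The running integrals are kept per NAME so they can be compared or stacked.

```python
import io

import numpy as np
import pandas as pd

CSV_TEXT = """NAME,DATE,O,W
A,1/20/2000,12,50
B,1/20/2000,25,28
C,1/20/2000,14,15
A,1/21/2000,34,50
B,1/21/2000,8,3
C,1/21/2000,10,19
A,1/22/2000,47,35
B,1/22/2000,4,27
C,1/22/2000,46,1
A,1/23/2000,19,31
B,1/23/2000,18,10
C,1/23/2000,19,41
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")


def cumulative_trapezoid(y, x):
    """Running trapezoidal-rule integral of y over x, starting at 0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Area of each trapezoid: mean of the two heights times the step width.
    areas = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(areas)))


totals = {}
cumulative = {}
for name, group in df.sort_values("DATE").groupby("NAME"):
    # Elapsed time in days since this NAME's first date (a float x-axis).
    days = (group["DATE"] - group["DATE"].iloc[0]).dt.total_seconds() / 86400.0
    running = pd.DataFrame(
        {col: cumulative_trapezoid(group[col], days) for col in ["O", "W"]},
        index=group["DATE"],
    )
    cumulative[name] = running
    totals[name] = running.iloc[-1]  # the last running value is the full integral

integrals = pd.DataFrame(totals).T  # one row per NAME, columns O and W
print(integrals)
```

To stack the two running integrals for one NAME, something like ax.stackplot(run.index, run["O"], run["W"], labels=["Oil", "Water"]) with run = cumulative["A"] should work.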
See "How to convert datetime to integer in python".
Related
How do you get dates on the start on the specified month? (matplotlib)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter

# FEB
# configuring the figure and plot space
fig, lx = plt.subplots(figsize=(30,10))
# converting the Series to float so the data can be plotted
wd = df2['Unnamed: 1']
wd = wd.astype(float)
# adding the x and y axes' values
lx.plot(list(df2.index.values), wd)
# defining what the labels will be
lx.set(xlabel='Day', ylabel='Weight', title='Daily Weight February 2022')
# defining the date format
date_format = DateFormatter('%m-%d')
lx.xaxis.set_major_formatter(date_format)
lx.xaxis.set_minor_locator(mdates.WeekdayLocator(interval=1))
Values I would like the x-axis to have:
['2/4', '2/5', '2/6', '2/7', '2/8', '2/9', '2/10', '2/11', '2/12', '2/13', '2/14', '2/15', '2/16', '2/17', '2/18', '2/19', '2/20', '2/21', '2/22', '2/23', '2/24', '2/25', '2/26', '2/27']
Values on the x-axis: (image not included)
It is giving me the right number of values, just not the right labels. I have tried to specify the start and end with xlim=['2/4', '2/27'], however that did not seem to work.
It would be great to see what your df2 actually looks like, but from your code snippet, it looks like it has weights recorded but not the corresponding dates. How about preparing a data frame that has dates in it? (Also, since this question is tagged with seaborn too, I'm going to use Seaborn, but the same idea should work.)
import pandas as pd
import seaborn as sns
import seaborn.objects as so
from matplotlib.dates import DateFormatter

sns.set_theme()
Create an index with the dates starting from 4 Feb, with one entry for each day that has a weight recorded:
index = pd.date_range(start="2/4/2022", periods=df2.count().Weight, name="Date")
Then, with Seaborn's objects interface (v0.12+), we can do:
(
    so.Plot(df2.set_index(index), x="Date", y="Weight")
    .add(so.Line())
    .scale(x=so.Temporal().label(formatter=DateFormatter('%m-%d')))
    .label(title="Daily Weight February 2022")
)
I have solved this. Very simple: I just added mdates.WeekdayLocator() as the major locator (set_major_locator). I overlooked this when I was going through the matplotlib docs, but I'm happy to have found this solution.
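If you prefer to stay in plain matplotlib, the same idea (give the data real dates) can be sketched like this. The weights are hypothetical placeholders for df2's 'Unnamed: 1' column; DayLocator puts one tick on each day, DateFormatter('%m-%d') labels it, and the x-limits are pinned to the first and last date so no extra ticks appear in the margins.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: one weight per day from 4 Feb to 27 Feb 2022.
dates = pd.date_range(start="2022-02-04", end="2022-02-27", freq="D", name="Date")
weights = pd.Series(np.linspace(180.0, 176.0, len(dates)), index=dates, name="Weight")

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(weights.index, weights.values)  # x values are real dates, not 0, 1, 2, ...
ax.set(xlabel="Day", ylabel="Weight", title="Daily Weight February 2022")

ax.xaxis.set_major_locator(mdates.DayLocator())              # one tick per day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))  # e.g. 02-04
ax.set_xlim(dates[0], dates[-1])                              # no padding days at the edges
fig.autofmt_xdate()
```

The labels come out as 02-04, 02-05, and so on; the leading zero is part of the "%m-%d" format.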
Plot data frame fast and with correct date format
I have the data shown in the screenshot in a DataFrame, and I would like to plot the DataFrame quickly and with the correct date format. The following code is much faster than using e.g. plt.plot(df["Date"], df["D30"]):
df.plot(marker='.', linestyle='none')
So I would like to keep using the DataFrame.plot() functionality directly, because it is much faster than plotting each column against the "Date" column separately. However, as shown in the graph, the date is not correct. My actual starting date is 2006-01-10, but in the figure it starts at 70-01 (1970-01-01). For me, the official matplotlib DateFormatter documentation is quite confusing and not very helpful. I tried to google an easy and clear solution, but most answers are about plt.plot(x, y), where x is the date and y is the actual value; with that approach it is easy to adjust the format of the "Date" in the figure. But it makes my plot very slow, since I am plotting 11 columns in total. Any idea how I can plot the DataFrame quickly and with the correct date format?
import os
import datetime as dt
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

date_format = mdates.DateFormatter('%y%m')
df_file = r"C:\Codes\df_file.csv"
df = pd.read_csv(df_file)
print(len(df), df.info(), df["Date"][0], type(df["Date"][0]))
df.head(2)
fig = plt.figure(figsize=(12.0, 8.0))
df.plot(marker='.', linestyle='none', ax=fig.gca())
plt.title("data_frame_show date", fontsize=16)
plt.gca().xaxis.set_major_formatter(date_format)
plt.legend(loc=(1.04, 0))
plt.show()
Partial input:
Date,D10,D30,D60,D91,D122,D152,D182,D273,D365,D547,D730
2006-01-10,,0.1373444,0.1544265,0.1541397,0.1429375,0.1421464,0.1426055,0.1460771,0.1486266,0.1551848,0.1593932
2006-01-11,,0.135426,0.1411246,0.141093,0.1384091,0.1383636,0.1395791,0.1438944,0.1469191,0.1553112,0.1598582
2006-01-12,,0.1311339,0.1292621,0.1304292,0.1363482,0.1362213,0.1367843,0.1404174,0.1439877,0.152306,0.1568677
2006-01-13,,0.1594458,0.1355387,0.1367246,0.1434708,0.143745,0.1441349,0.1453056,0.1481918,0.157193,0.1607564
2006-01-16,,0.1374846,0.1182223,0.1272385,0.1415359,0.1418881,0.1430098,0.1468544,0.1496407,0.1547714,0.158936
2006-01-17,,0.1453834,0.1418838,0.143198,0.1437924,0.143473,0.1440987,0.1473208,0.1501543,0.1590842,0.1629096
2006-01-18,,0.1385479,0.141472,0.1481763,0.1515037,0.1511353,0.1511544,0.1535245,0.1554254,0.1626349,0.1663554
2006-01-19,,0.1639788,0.1462084,0.1483903,0.1486906,0.1483109,0.1492335,0.1539002,0.1563708,0.1611751,0.1644693
2006-01-20,,0.189771,0.178394,0.1638331,0.1565402,0.1559029,0.1553547,0.1526479,0.1516396,0.1614136,0.1646431
2006-01-23,,0.1420271,0.1570005,0.1614942,0.1607205,0.1605297,0.1630065,0.1653838,0.1642349,0.166809,0.1701779
2006-01-24,,0.1814291,0.1633585,0.1563364,0.1548823,0.15382,0.1545099,0.1590869,0.1609158,0.1653819,0.1681759
2006-01-25,,0.1272998,0.1445222,0.1487031,0.1522032,0.152714,0.1524364,0.1532192,0.1550062,0.1635665,0.1658293
2006-01-26,,0.1392162,0.1413034,0.1443807,0.1476261,0.1482458,0.1473548,0.1471019,0.1493254,0.1578586,0.160699
2006-01-27,,0.1360269,0.1374056,0.1387952,0.1426731,0.1441445,0.144917,0.1462428,0.1478979,0.1519537,0.1550311
2006-01-30,,0.1439245,0.1430108,0.1434628,0.1448731,0.1450397,0.1454756,0.1467621,0.1487521,0.1538424,0.1561802
2006-01-31,,0.1483135,0.1468713,0.1473837,0.1519043,0.1519379,0.1502139,0.1504632,0.1529254,0.1571567,0.1589795
2006-02-01,,0.1464208,0.1447363,0.1443483,0.1459808,0.1477726,0.1505124,0.1520256,0.1535773,0.1589145,0.1607383
2006-02-02,,0.1484249,0.1414394,0.1412338,0.1497531,0.1500731,0.1475751,0.147502,0.1512457,0.1571017,0.1606797
2006-02-03,,0.1496503,0.1485318,0.1502473,0.1565336,0.156727,0.1556335,0.1560396,0.1579241,0.1619183,0.1634751
2006-02-06,,0.149966,0.1457216,0.1475524,0.1539103,0.1546401,0.154973,0.1553681,0.1570598,0.161173,0.1630743
2006-02-08,,0.1463649,0.1436135,0.1454147,0.1498372,0.1507231,0.1520234,0.1538407,0.1563603,0.1617697,0.1639547
2006-02-09,,0.1401312,0.1432856,0.1437166,0.1443243,0.1463163,0.148681,0.1496198,0.1516376,0.1584639,0.1615756
2006-02-10,,0.1339916,0.1405194,0.1432779,0.1464605,0.1470921,0.1484831,0.1514307,0.1550715,0.1599564,0.1623171
2006-02-13,,0.1470304,0.1423007,0.1446087,0.1470668,0.1485171,0.1503383,0.1508497,0.1532987,0.1591155,0.1615874
2006-02-14,,0.1454322,0.1449017,0.1455735,0.1462286,0.1478059,0.1501469,0.1522522,0.1541999,0.157668,0.1601427
2006-02-15,,0.1429312,0.1455881,0.1464055,0.1471812,0.1489883,0.1514654,0.153837,0.1559375,0.16082,0.1631557
2006-02-16,,0.134637,0.1373471,0.140634,0.1432172,0.145788,0.14875,0.1507805,0.15325,0.1581015,0.1613797
2006-02-20,,0.1303785,0.1334454,0.139216,0.1423217,0.1454704,0.1477552,0.1487534,0.1509405,0.1554398,0.1588761
2006-02-21,,0.1359587,0.1370814,0.1416117,0.1418016,0.1441761,0.1468109,0.1476679,0.1496546,0.1561362,0.1607204
2006-02-22,,0.1302253,0.1337104,0.1415016,0.141451,0.1438881,0.1467031,0.1502449,0.1514018,0.1531452,0.1582335
2006-02-23,,0.1282022,0.1333902,0.1342376,0.1385976,0.1453201,0.1481733,0.1490296,0.1512885,0.1554035,0.1593463
2006-02-24,,0.1269229,0.1304391,0.1348061,0.1378378,0.1419301,0.1442134,0.1472283,0.1507224,0.1555662,0.1595938
2006-02-27,,0.1254707,0.128201,0.1334554,0.1374389,0.1427246,0.1446071,0.1465459,0.1496113,0.1541296,0.1578174
2006-02-28,,0.1346332,0.1361773,0.139586,0.1421924,0.1468084,0.1489651,0.1505661,0.1541479,0.1606205,0.1675438
2006-03-01,,0.1301198,0.1318495,0.1343342,0.1376886,0.1434328,0.1459977,0.1490832,0.1525961,0.1557153,0.1593923
2006-03-02,,0.1304425,0.1347556,0.1398592,0.1420431,0.1457691,0.1479747,0.1510143,0.1544964,0.1589201,0.1616325
2006-03-03,,0.1311674,0.1339681,0.138887,0.1418598,0.1451706,0.1472144,0.1495689,0.1536886,0.1599843,0.162247
2006-03-06,,0.1308081,0.1367775,0.1412145,0.1436582,0.1480171,0.1495588,0.1511633,0.1545973,0.1588486,0.1616268
2006-03-07,,0.1344355,0.1387528,0.143365,0.1459607,0.1482421,0.1491656,0.1512236,0.1550063,0.1593201,0.1615385
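When plotting time series, pandas uses the index for the x-axis when you call the plot function. I would suggest converting the "Date" column to datetimes and making it the index:
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")
and then calling df.plot(marker='.', linestyle='none') as before.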
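A sketch of the full workflow, using the first four rows and three columns of the sample data inline. After the index is converted, df.plot() draws against real dates. Passing x_compat=True makes pandas hand the dates to matplotlib's standard date handling, so a DateFormatter set afterwards should label the ticks correctly; without it, pandas may use its own time-series tick machinery, where a plain DateFormatter is not guaranteed to line up. The "%y-%m" format is only an example.

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# First rows/columns of the question's CSV; use pd.read_csv(df_file) for the real file.
CSV_TEXT = """Date,D30,D60
2006-01-10,0.1373444,0.1544265
2006-01-11,0.135426,0.1411246
2006-01-12,0.1311339,0.1292621
2006-01-13,0.1594458,0.1355387
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
# Turn the text dates into datetimes and make them the index pandas plots against.
df = df.assign(Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")).set_index("Date")

# One call plots every column; x_compat=True keeps matplotlib's own date ticking.
ax = df.plot(marker=".", linestyle="none", figsize=(12, 8), x_compat=True)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%y-%m"))
ax.set_title("data_frame_show date", fontsize=16)
ax.legend(loc=(1.04, 0))
```

Call plt.show() afterwards to display it. The x-axis now starts at the first date in the file instead of 1970-01-01, because the index holds real datetimes rather than row numbers.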
How to add a string comment above every single candle using mplfinance.plot() or any similar package?
I want to add a string comment above every single candle using the mplfinance package. Is there a way to do it using mplfinance or any other package? Here is the code I used:
import pandas as pd
import mplfinance as mpf
import matplotlib.animation as animation
from mplfinance import *
import datetime
from datetime import date, datetime

fig = mpf.figure(style="charles", figsize=(7,8))
ax1 = fig.add_subplot(1,1,1, title='ETH')

def animate(ival):
    idf = pd.read_csv("test1.csv", index_col=0)
    idf['minute'] = pd.to_datetime(idf['minute'], format="%m/%d/%Y %H:%M")
    idf.set_index('minute', inplace=True)
    ax1.clear()
    mpf.plot(idf, ax=ax1, type='candle', ylabel='Price US$')

ani = animation.FuncAnimation(fig, animate, interval=250)
mpf.show()
You should be able to do this using Axes.text(). After calling mpf.plot(), call ax1.text() for each text that you want (in your case, for each candle).
There is an important caveat regarding the x-axis values that you pass into ax1.text():
If you do not specify show_nontrading=True, it defaults to False, in which case the x-axis value that you pass into ax1.text() for the position of the text must be the row number of the candle where you want the text, counting from 0 for the first row in your DataFrame.
On the other hand, if you do set show_nontrading=True, then the x-axis value that you pass into ax1.text() needs to be the matplotlib datetime. You can convert pandas datetimes from your DataFrame's DatetimeIndex into matplotlib datetimes as follows:
import matplotlib.dates as mdates
my_mpldates = mdates.date2num(idf.index.to_pydatetime())
I suggest using the first option (DataFrame row number) because it is simpler. I am currently working on an mplfinance enhancement that will allow you to enter the x-axis values as any type of datetime object (which is the more intuitive way to do it); however, it may be another month or two until that enhancement is complete, as it is not trivial.
Code example, using data from the mplfinance repository examples data folder:
import pandas as pd
import mplfinance as mpf

infile = 'data/yahoofinance-SPY-20200901-20210113.csv'

# take rows [18:25] to keep the demo small:
df = pd.read_csv(infile, index_col=0, parse_dates=True).iloc[18:25]

fig, axlist = mpf.plot(df, type='candle', volume=True, ylim=(330,345), returnfig=True)

x = 1
y = df.loc[df.index[x],'High']+1
axlist[0].text(x, y, 'Custom\nText\nHere')

x = 3
y = df.loc[df.index[x],'High']+1
axlist[0].text(x, y, 'High here\n= '+str(y-1), fontstyle='italic')

x = 5
y = df.loc[df.index[x],'High']+1
axlist[0].text(x-0.2, y, 'More\nCustom\nText\nHere', fontweight='bold')

mpf.show()
Comments on the above code example:
I am setting ylim=(330,345) to provide a little extra room above the candles for the text. In practice you might choose the upper limit dynamically, perhaps as high_ylim = 1.03*max(df['High'].values).
Notice that for the first two candles with text, the text begins at the center of the candle. The third text call uses x-0.2 to position the text more centrally over the candle.
For this example, the y location of the text is determined by taking the high of that candle and adding 1 (y = df.loc[df.index[x],'High']+1). Of course, adding 1 is arbitrary; in practice, depending on the magnitude of your prices, adding 1 may be too little or too much. Instead, you may want to add a small percentage, for example 0.2 percent:
y = df.loc[df.index[x],'High']
y = y * 1.002
Here is the plot the above code generates: (image not included)
Graphing a dataframe line plot with a legend in Matplotlib
I'm working with a dataset that has grades and states, and I need to create line graphs by state showing what percentage of each state's students fall into which bins. My methodology (so far) is as follows:
First I import the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

records = [{'Name':'A', 'Grade':'.15','State':'NJ'},{'Name':'B', 'Grade':'.15','State':'NJ'},{'Name':'C', 'Grade':'.43','State':'CA'},
           {'Name':'D', 'Grade':'.75','State':'CA'},{'Name':'E', 'Grade':'.17','State':'NJ'},{'Name':'F', 'Grade':'.85','State':'HI'},
           {'Name':'G', 'Grade':'.89','State':'HI'},{'Name':'H', 'Grade':'.38','State':'CA'},{'Name':'I', 'Grade':'.98','State':'NJ'},
           {'Name':'J', 'Grade':'.49','State':'NJ'},{'Name':'K', 'Grade':'.17','State':'CA'},{'Name':'K', 'Grade':'.94','State':'HI'},
           {'Name':'M', 'Grade':'.33','State':'HI'},{'Name':'N', 'Grade':'.22','State':'NJ'},{'Name':'O', 'Grade':'.7','State':'NJ'}]
df = pd.DataFrame(records)
df.Grade = df.Grade.astype(float)
Next I cut each grade into a bin:
df['bin'] = pd.cut(df['Grade'],[-np.inf,.05,.1,.15,.2,.25,.3,.35,.4,.45,.5,.55,.6,.65,.7,.75,.8,.85,.9,.95,1],labels=False)/10
Then I create a pivot table giving me the count of people by bin in each state:
df2 = pd.pivot_table(df,index=['bin'],columns='State',values=['Name'],aggfunc=pd.Series.nunique,margins=True)
df2 = df2.fillna(0)
Then I convert those counts into percentages and remove the margin row and column:
df3 = df2.div(df2.iloc[-1])
df3 = df3.iloc[:-1,:-1]
Now I want to create a line graph with multiple lines (one for each state), with the bin on the x-axis and the percentage on the y-axis. df3.plot() gives me the chart I want, but I would like to accomplish the same using matplotlib, because it offers greater customization of the graph. However, running plt.plot(df3) gives me the lines I need, but I can't get the legend to work properly. Any thoughts on how to accomplish this?
It may not be the best way, but I use the pandas plot function to draw df3, then get the legend and the new label names. Please note that the processing of the legend strings is specific to this data.
line = df3.plot(kind='line')
handles, labels = line.get_legend_handles_labels()
label = []
for l in labels:
    label.append(l[7:-1])
plt.legend(handles, label, loc='best')
You can do this:
plt.plot(df3, label="label")
plt.legend()
plt.show()
And if it helps you solve your issue, don't forget to mark this as the accepted answer.
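An alternative to both answers is to draw one line per column yourself and give each its own label, so the legend comes straight from matplotlib. This is a sketch: df3 below is a small hypothetical stand-in with the same shape as the question's pivot output (bins as the index, ('Name', state) column tuples), and taking the last level of each column tuple as the label replaces the string slicing above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's df3: fraction of each state's students per bin.
columns = pd.MultiIndex.from_tuples(
    [("Name", "CA"), ("Name", "HI"), ("Name", "NJ")], names=[None, "State"]
)
df3 = pd.DataFrame(
    [[0.25, 0.0, 0.5], [0.5, 0.25, 0.25], [0.25, 0.5, 0.0], [0.0, 0.25, 0.25]],
    index=pd.Index([0.0, 0.1, 0.2, 0.3], name="bin"),
    columns=columns,
)

fig, ax = plt.subplots()
for col in df3.columns:
    # col is a tuple like ('Name', 'CA'); its last level is the state code.
    ax.plot(df3.index, df3[col], marker="o", label=col[-1])

ax.set_xlabel("bin")
ax.set_ylabel("share of students")
ax.legend(title="State")
```

Because each ax.plot call gets its own label, every matplotlib customization (colors, line styles, markers) can be set per state inside the loop.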
Inconsistent automatic pandas date labeling
I was wondering how exactly pandas formats the x-axis dates. I am using the same script on a bunch of data results, which all have the same pandas DataFrame format. However, pandas formats each DataFrame's dates differently. How can I make this more consistent?
Each df has a DatetimeIndex like this, dtype='datetime64[ns]':
>>> df.index
DatetimeIndex(['2014-10-02', '2014-10-03', '2014-10-04', '2014-10-05',
               '2014-10-06', '2014-10-07', '2014-10-08', '2014-10-09',
               '2014-10-10', '2014-10-11',
               ...
               '2015-09-23', '2015-09-24', '2015-09-25', '2015-09-26',
               '2015-09-27', '2015-09-28', '2015-09-29', '2015-09-30',
               '2015-10-01', '2015-10-02'],
              dtype='datetime64[ns]', name='Date', length=366, freq=None)
Eventually, I plot with df.plot(), where the df has two columns. But the axes of the plots have different styles, like this: (plot images not included)
I would like all plots to have the x-axis style of the first plot. pandas should do this automatically, so I'd rather not start formatting xticks by hand, since I have quite a lot of data to plot. Could anyone explain what to do? Thanks!
EDIT: I'm reading two CSV files from 2015. The first has the model results of about 200 stations; the second has the gauge measurements of the same stations. Later, I read another two CSV files from 2016 with the same format.
import pandas as pd

df_model = pd.read_csv(path_model, sep=';', index_col=0, parse_dates=True)
df_gauge = pd.read_csv(path_gauge, sep=';', index_col=0, parse_dates=True)
df = pd.DataFrame(columns=['model', 'gauge'], index=df_model.index)
df['model'] = df_model['station_1'].copy()
df['gauge'] = df_gauge['station_1'].copy()
df.plot()
I do this for each year, so the x-axis should look the same, right?
I do not think this is possible unless you make modifications to the pandas library. I looked around a bit for options that one may set in pandas, but couldn't find one. Pandas tries to intelligently select the type of axis ticks using logic implemented here (I THINK). So in my opinion, it would be best to define your own function to make the plots and then overwrite the tick formatting (although you do not want to do that). There are many references around the internet that show how to do this. I used this one by "Simone Centellegher" and this stackoverflow answer to come up with a function that may work for you (tested in Python 3.7.1 with matplotlib 3.0.2, pandas 0.23.4):
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

## pass df with columns you want to plot
def my_plotter(df, xaxis, y_cols):
    fig, ax = plt.subplots()
    ax.plot(xaxis, df[y_cols])
    ax.xaxis.set_minor_locator(mdates.MonthLocator())
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))

    # Remove overlapping major and minor ticks
    majticklocs = ax.xaxis.get_majorticklocs()
    minticklocs = ax.xaxis.get_minorticklocs()
    minticks = ax.xaxis.get_minor_ticks()

    for i in range(len(minticks)):
        cur_mintickloc = minticklocs[i]
        if cur_mintickloc in majticklocs:
            minticks[i].set_visible(False)

    return fig, ax

df = pd.DataFrame({'values': np.random.randint(0, 1000, 36)},
                  index=pd.date_range(start='2014-01-01', end='2016-12-31', freq='MS'))

fig, ax = my_plotter(df, df.index, ["values"])
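If you would rather keep calling df.plot(), a hedged alternative is pandas' x_compat=True option, which turns off pandas' own time-series tick selection, followed by an explicit locator and formatter so every year's plot gets the same style. The helper name plot_with_fixed_date_axis and the random data are hypothetical; the quarterly MonthLocator and the "%b\n%Y" format are only examples.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_with_fixed_date_axis(df):
    """Hypothetical helper: plot all columns with df.plot(), then force one tick style."""
    # x_compat=True stops pandas from choosing its own time-series ticks.
    ax = df.plot(x_compat=True)
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 4, 7, 10]))  # quarterly
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))
    ax.xaxis.set_minor_locator(mdates.MonthLocator())  # unlabelled monthly ticks
    return ax


# Hypothetical data shaped like the question's frame: 366 daily values, two columns.
index = pd.date_range("2014-10-02", "2015-10-02", freq="D", name="Date")
rng = np.random.default_rng(0)
df = pd.DataFrame({"model": rng.random(len(index)), "gauge": rng.random(len(index))},
                  index=index)

ax = plot_with_fixed_date_axis(df)
```

Calling the same helper on each year's frame should give identically styled x-axes, since the tick rules no longer depend on pandas' guess about the data's span.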