I have used calmap to check whether data has been imported correctly over the past 2.5 years. There are a few gaps, some of which are easily explained by holidays. However, I have to search Google for those dates to confirm this, because holidays are not marked in the output calendar by default. Is there a way to do that automatically? The documentation does not mention any way to highlight specific dates. I was thinking of adding an extra df containing only the holidays for the respective years, but I couldn't figure out how to add two dataframes (with different coloring) to the calendar.
Here's the function I am referencing:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from plotly_calplot import calplot
all_days = pd.date_range('1/1/2019', periods=730, freq='D')
days = np.random.choice(all_days, 500)
events = pd.Series(np.random.randn(len(days)), index=days)
fig = calplot(events, cmap='YlGn', colorbar=False, title="Fantastic Calendar")
fig.show()
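The manual holiday lookup described above could be automated separately from the calendar plot. The following is only a sketch under stated assumptions: it uses pandas' built-in `USFederalHolidayCalendar` (the relevant holidays may well be from a different country), the helper name `explain_gaps` is hypothetical, and the `events` series is synthetic. It does not color anything in the calendar itself; it only lists the days without data and flags which of them are holidays.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def explain_gaps(events: pd.Series) -> pd.DataFrame:
    """Return one row per calendar day without data, flagged as holiday or not.

    Hypothetical helper; assumes US federal holidays are the relevant ones.
    """
    days_with_data = events.index.normalize().unique()
    start, end = days_with_data.min(), days_with_data.max()
    all_days = pd.date_range(start, end, freq="D")
    # Days in the full range that never appear in the data
    missing = all_days.difference(days_with_data)
    holidays = USFederalHolidayCalendar().holidays(start=start, end=end)
    return pd.DataFrame({"is_holiday": missing.isin(holidays)}, index=missing)


# Synthetic example: daily data for 2019 with three days removed
all_days = pd.date_range("2019-01-02", "2019-12-31", freq="D")
gaps_to_make = pd.to_datetime(["2019-03-13", "2019-07-04", "2019-12-25"])
events = pd.Series(1.0, index=all_days.drop(gaps_to_make))

gaps = explain_gaps(events)
print(gaps)
```

Any gap where `is_holiday` is False is a candidate for a genuine import problem.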
# FEB
# configuring the figure and plot space
fig, lx = plt.subplots(figsize=(30,10))
# converting the Series into float so the data can be plotted
wd = df2['Unnamed: 1']
wd = wd.astype(float)
# adding the x and y axes' values
lx.plot(list(df2.index.values), wd)
# defining what the labels will be
lx.set(xlabel='Day', ylabel='Weight', title='Daily Weight February 2022')
# defining the date format
date_format = DateFormatter('%m-%d')
lx.xaxis.set_major_formatter(date_format)
lx.xaxis.set_minor_locator(mdates.WeekdayLocator(interval=1))
Values I would like the x-axis to have:
['2/4', '2/5', '2/6', '2/7', '2/8', '2/9', '2/10', '2/11', '2/12', '2/13', '2/14', '2/15', '2/16', '2/17', '2/18', '2/19', '2/20', '2/21', '2/22', '2/23', '2/24', '2/25', '2/26', '2/27']
Values on the x-axis:
[screenshot of the plot]
It is giving me the right number of values, just not the right labels. I have tried to specify the start and end with xlim=['2/4', '2/27'], however that did not seem to work.
It would be great to see how your df2 actually looks, but from your code snippet, it looks like it has weights recorded but not the corresponding dates.
How about preparing a data frame that has dates in it?
(Also, since this question is tagged with seaborn too, I'm going to use Seaborn, but the same idea should work.)
import pandas as pd
import seaborn as sns
import seaborn.objects as so
from matplotlib.dates import DateFormatter
sns.set_theme()
Create an index with the dates starting from 4 Feb with the number of days we have weight recorded.
index = pd.date_range(start="2/4/2022", periods=df2.count().Weight, name="Date")
Then with Seaborn's object interface (v0.12+), we can do:
(
so.Plot(df2.set_index(index), x="Date", y="Weight")
.add(so.Line())
.scale(x=so.Temporal().label(formatter=DateFormatter('%m-%d')))
.label(title="Daily Weight February 2022")
)
I have solved this problem. It was very simple: I just set mdates.WeekdayLocator() as the major locator (with set_major_locator, alongside the formatter passed to set_major_formatter). I overlooked this when I was going through the matplotlib docs, but I am happy to have found this solution.
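For reference, the locator/formatter pairing can be sketched as follows. This is a minimal illustration, not the poster's exact code: the weights are made-up stand-ins for df2, and it uses `mdates.DayLocator` rather than the WeekdayLocator mentioned above, on the assumption that one label per day is wanted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df2: one weight per day, 4 Feb to 27 Feb 2022
dates = pd.date_range("2022-02-04", "2022-02-27", freq="D")
weights = pd.Series([180.0 - 0.1 * i for i in range(len(dates))], index=dates)

fig, ax = plt.subplots(figsize=(12, 4))
# Plot against real dates so matplotlib knows the x-axis is temporal
ax.plot(weights.index, weights.values)
# The locator decides WHERE ticks go; the formatter decides HOW they are labeled
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
ax.set(xlabel="Day", ylabel="Weight", title="Daily Weight February 2022")
fig.autofmt_xdate()

fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

The key point is that the x values passed to `plot` are datetimes; a date formatter applied to plain row numbers produces labels that do not match the data.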
I want to add a string comment above every single candle using the mplfinance package.
Is there a way to do it using mplfinance or any other package?
Here is the code I used:
import pandas as pd
import mplfinance as mpf
import matplotlib.animation as animation
from mplfinance import *
import datetime
from datetime import date, datetime
fig = mpf.figure(style="charles",figsize=(7,8))
ax1 = fig.add_subplot(1,1,1 , title='ETH')
def animate(ival):
    idf = pd.read_csv("test1.csv", index_col=0)
    idf['minute'] = pd.to_datetime(idf['minute'], format="%m/%d/%Y %H:%M")
    idf.set_index('minute', inplace=True)
    ax1.clear()
    mpf.plot(idf, ax=ax1, type='candle', ylabel='Price US$')
ani = animation.FuncAnimation(fig, animate, interval=250)
mpf.show()
You should be able to do this using Axes.text()
After calling mpf.plot() then call
ax1.text()
for each text that you want (in your case for each candle).
There is an important caveat regarding the x-axis values that you pass into ax1.text():
If you do not specify show_nontrading=True, it defaults to False. In that case, the x-axis value that you pass into ax1.text() for the position of the text must be the row number of the candle where you want the text, counting from 0 for the first row in your DataFrame.
On the other hand, if you do set show_nontrading=True, then the x-axis value that you pass into ax1.text() must be a matplotlib datetime. You can convert the pandas datetimes in your DataFrame's DatetimeIndex into matplotlib datetimes as follows:
import matplotlib.dates as mdates
my_mpldates = mdates.date2num(idf.index.to_pydatetime())
I suggest using the first option (DataFrame row number) because it is simpler. I am currently working on an mplfinance enhancement that will allow you to enter the x-axis values as any type of datetime object (which is the more intuitive way to do it) however it may be another month or two until that enhancement is complete, as it is not trivial.
Code example, using data from the mplfinance repository examples data folder:
import pandas as pd
import mplfinance as mpf
infile = 'data/yahoofinance-SPY-20200901-20210113.csv'
# take rows [18:25] to keep the demo small:
df = pd.read_csv(infile, index_col=0, parse_dates=True).iloc[18:25]
fig, axlist = mpf.plot(df,type='candle',volume=True,
                       ylim=(330,345),returnfig=True)
x = 1
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'Custom\nText\nHere')
x = 3
y = df.loc[df.index[x],'High']+1
axlist[0].text(x,y,'High here\n= '+str(y-1),fontstyle='italic')
x = 5
y = df.loc[df.index[x],'High']+1
axlist[0].text(x-0.2,y,'More\nCustom\nText\nHere',fontweight='bold')
mpf.show()
Comments on the above code example:
I am setting the ylim=(330,345) in order to provide a little extra room above the candles for the text. In practice you might choose the high dynamically as perhaps high_ylim = 1.03*max(df['High'].values).
Notice that for the first two candles with text, the text begins at the center of the candle. The third text call uses x-0.2 to position the text more over the center of the candle.
For this example, the y location of the text is determined by taking the high of that candle and adding 1 (y = df.loc[df.index[x],'High']+1). Of course, adding 1 is arbitrary, and in practice, depending on the magnitude of your prices, adding 1 may be too little or too much. Instead, you may want to add a small percentage, for example 0.2 percent:
y = df.loc[df.index[x],'High']
y = y * 1.002
Here is the plot the above code generates:
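To put a label above every candle, as the question asks, the same idea can be wrapped in a loop. The sketch below is an illustration under assumptions: `annotate_candles` is a hypothetical helper, the OHLC values are made up, and a plain matplotlib Axes stands in for the mplfinance Axes so that the block runs on its own. It uses the row-number convention (show_nontrading left at False) and the 0.2 percent offset described above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def annotate_candles(ax, df, labels, pad=0.002):
    """Write labels[i] just above the High of row i (hypothetical helper).

    x is the row number, counting from 0; y is the High plus a small
    percentage so the text clears the candle wick.
    """
    texts = []
    for row, (high, label) in enumerate(zip(df["High"], labels)):
        texts.append(
            ax.text(row, high * (1 + pad), label, ha="center", va="bottom")
        )
    return texts


# Made-up OHLC rows, for illustration only
df = pd.DataFrame(
    {
        "Open": [100.0, 102.0, 101.0, 104.0],
        "High": [103.0, 104.0, 105.0, 106.0],
        "Low": [99.0, 100.0, 100.5, 103.0],
        "Close": [102.0, 101.0, 104.0, 105.0],
    },
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
)

fig, ax = plt.subplots()
ax.set_xlim(-0.5, len(df) - 0.5)
ax.set_ylim(df["Low"].min(), 1.03 * df["High"].max())  # headroom for the text
texts = annotate_candles(ax, df, ["A", "B", "C", "D"])
```

With mplfinance, the helper would be called on the main panel after plotting, for example `fig, axlist = mpf.plot(df, type='candle', returnfig=True)` followed by `annotate_candles(axlist[0], df, labels)`. Using `ha="center"` avoids the manual x-0.2 adjustment.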
newbie programmer here:
I have this big data set (Excel file) on gas, hydro, and water bills per unit from 2017 to 2020 for each building.
Basically, the first column is the date column, and each subsequent column has the building name as the title of the column which contains the cost/unit for that particular building.
So there are 61 buildings, hence 61 columns, plus the date column bringing the total # of columns to 62. I am trying to make 62 individual plots of "cost/unit vs time", whereby I want my cost/unit to be on the y axis and the date(time) to be on the x axis.
I think I am getting the plots right, I am just not able to figure out why my dates don't come the way they should on the x axis.
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stat
import numpy as np
import math as mt
import matplotlib.dates as mdates
from datetime import datetime
df1 = pd.read_csv('Gas Costs.csv')
df1['Date'] = pd.to_datetime(df1['Date'], format='%m-%y')
df1 = df1.set_index('Date')
for column in df1:
    columnSeriesObj = df1[column]
    plt.plot(columnSeriesObj.values)
    plt.gcf().autofmt_xdate()
    plt.show()
By doing this, I get 61 plots, one of which looks like this:
Cost/unit v/s time plot for one of the buildings
I also wish to give each plot a title stating the building name, but I am unable to do so. I tried it using the for loop but didn't strike much luck with it.
Any help on this will be appreciated!
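One possible approach, sketched here with made-up data: plotting `columnSeriesObj.values` discards the DatetimeIndex, so matplotlib only sees row positions. Passing the index as the x values keeps the dates, and the column name can serve as the title. The helper name `plot_each_building` and the building names below are hypothetical; the `'%m-%y'` date format is taken from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_each_building(df):
    """Create one figure per column, with dates on x and the column name as title."""
    figures = []
    for building in df.columns:
        fig, ax = plt.subplots()
        # Use the DatetimeIndex as x instead of plotting bare .values
        ax.plot(df.index, df[building])
        ax.set(title=building, xlabel="Date", ylabel="Cost/unit")
        fig.autofmt_xdate()
        figures.append(fig)
    return figures


# Made-up stand-in for 'Gas Costs.csv' with two buildings
raw = pd.DataFrame(
    {
        "Date": ["01-17", "02-17", "03-17"],
        "Building A": [1.0, 1.2, 1.1],
        "Building B": [0.9, 0.8, 1.0],
    }
)
raw["Date"] = pd.to_datetime(raw["Date"], format="%m-%y")
df1 = raw.set_index("Date")

figures = plot_each_building(df1)
```

Calling `plt.show()` once after the loop (or `fig.savefig(...)` inside it) would display or save the figures.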
I am working with this dataframe containing Bitcoin data from Yahoo Finance. I set up a list of cryptocurrencies and I would like:
a. to limit the x axis to the last 2 months
b. to put all the graphs together, faceting them next to each other in a grid of plots, as is possible with ggplot in R.
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
# For reading stock data from yahoo
from pandas.io.data import DataReader
# For time stamps
from datetime import datetime
# For division
from __future__ import division
tech_list = ['BTC','TTC','DGC','DEE','PPC']
end = datetime.now()
start = datetime(end.year - 1,end.month,end.day)
for stock in tech_list:
    # Set DataFrame as the Stock Ticker
    globals()[stock] = DataReader(stock,'yahoo',start,end)
DEE['Volume'].plot(legend=True,figsize=(10,4))
Should I change something in the time definition or in seaborn itself?
thanks
Your question has nothing to do with seaborn, it is basic
matplotlib stuff. seaborn only changes the style of figures in
your case, so I exclude it from the code I provide below, but you can
import it if you want to change the style.
To get the data for the last 2 months you should make your start for two months before today. It is easy to do with relativedelta:
import datetime as dt
from dateutil.relativedelta import relativedelta
start = dt.datetime(end.year,end.month,end.day) - relativedelta(months=2)
Changing the limits of the x axis would be more painful if you plot with pandas.DataFrame.plot(). But if you want to import all the data and then select only the data for the last 2 months, you can use the same relativedelta trick, but with indexing, like this:
DEE.loc[DEE.index >= end - relativedelta(months=2),'Volume'].plot()
As for putting the graphs together, your question is not phrased clearly, so I can only guess that you meant putting all stocks one under another as subplots, like this:
I rewrote your code to do this:
import pandas as pd
# For reading stock data from yahoo
# (pandas.io.data was moved to the separate pandas-datareader package)
from pandas_datareader.data import DataReader
import matplotlib.pyplot as plt
%matplotlib inline
# For time stamps
import datetime as dt
from dateutil.relativedelta import relativedelta
end = dt.datetime.now()
start = dt.datetime(end.year-1,end.month,end.day)
tech_list = dict({'BTC':None,'TTC':None,'DGC':None,'DEE':None,'PPC':None})
for stock in tech_list.keys():
    tech_list[stock] = DataReader(stock,'yahoo',start,end)
months_to_plot = relativedelta(months=2)
fig = plt.figure(figsize=(8,10))
for (n,stock) in enumerate(tech_list.keys()):
    # subplot indices start at 1, so use n+1
    ax = fig.add_subplot(len(tech_list),1,n+1)
    tech_list[stock].loc[tech_list[stock].index >= end - months_to_plot,'Volume'].plot(ax=ax)
    ax.set_title(stock)
plt.tight_layout()
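Since the Yahoo reader may not be available, here is a self-contained sketch of the same two steps (selecting the last 2 months with relativedelta, then stacking one subplot per ticker). The volumes are random placeholder numbers, the fixed `end` date is arbitrary, and `plt.subplots` is used in place of repeated `add_subplot` calls as a design choice, not because the code above requires it.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta

# Arbitrary fixed dates so the example is reproducible
end = dt.datetime(2016, 6, 30)
start = end - relativedelta(years=1)
index = pd.date_range(start, end, freq="D")

# Placeholder data: random volumes for three of the tickers
rng = np.random.default_rng(0)
data = {
    name: pd.DataFrame({"Volume": rng.integers(1, 1000, len(index))}, index=index)
    for name in ["BTC", "TTC", "DGC"]
}

cutoff = end - relativedelta(months=2)
fig, axes = plt.subplots(len(data), 1, figsize=(8, 10), sharex=True)
for ax, (name, frame) in zip(axes, data.items()):
    # Keep only the rows from the last 2 months before plotting
    recent = frame.loc[frame.index >= cutoff, "Volume"]
    recent.plot(ax=ax)
    ax.set_title(name)
fig.tight_layout()
```

`sharex=True` keeps the date axes aligned across the stacked panels, which makes the tickers easier to compare.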
P.S. Please be more clear in the future with your questions if you want a concise answer. The code you provided contains extra lines, not necessary for your question, which makes it more difficult to understand what exactly you want to do. Your question, on the other hand, is not nearly as detailed as it could be.
I have two time-series datasets that I want to make a step-chart of.
The time series data is between Monday 2015-04-20 and Friday 2015-04-24.
The first dataset contains 26337 rows with values ranging from 0-1.
The second dataset contains 80 rows with values between 0-4.
First dataset represents motion sensor values in a room, with around 2-3 minutes between each measurement. 1 indicates the room is occupied, 0 indicates that it is empty. The second contains data from a survey where users could fill in how many people were in the same room, at the time they were answering the survey.
Now I want to compare this data, to find out how well the sensor performs. Obviously there is a lot of data that is "missing" in the second set. Is there a way to fill in the "blanks" in a step chart?
Each row has the following format:
Header
Timestamp (%Y-%m-%d %H:%M:%S),value
Example:
Time,Occupancy
24-04-2015 21:40:33,1
24-04-2015 21:43:11,0
.....
So far I have managed to import the first dataset and make a plot of it. Unfortunately the x-axis is not showing dates, but a lot of numbers:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
data = open('PIRDATA.csv')
ts = pd.Series.from_csv(data, sep=',')
plot(ts);
Result:
How should I proceed from here?
Try to use Pandas to read the data, using the Date column as the index (parsing the values to dates).
data = pd.read_csv('PIRDATA.csv', index_col=0, parse_dates=True)
To achieve your step chart objective, try:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.dates import DateFormatter
from matplotlib.dates import HourLocator
small_dataset = pd.read_csv('SURVEY_RESULTS_WEEK1.csv', header=0, index_col=0, parse_dates=True)
big_dataset = pd.read_csv('PIRDATA_RAW_CONVERTED_DATETIME.csv', header=0, index_col=0, parse_dates=True)
small_dataset.rename(columns={'Occupancy': 'Survey'}, inplace=True)
big_dataset.rename(columns={'Occupancy': 'PIR'}, inplace=True)
big = big_dataset.plot()
big.xaxis.set_major_formatter(DateFormatter('%y-%m-%d H: %H'))
big.xaxis.set_major_locator(HourLocator(np.arange(0, 25, 6)))
big.set_ylabel('Occupancy')
small_dataset.plot(ax=big, drawstyle='steps')
fig = plt.gcf()
fig.suptitle('PIR and Survey Occupancy Comparison')
plt.show()
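On the question of filling in the "blanks" in the sparse survey data: one option is to carry each survey answer forward onto the sensor's timestamps before plotting. This is only a sketch with made-up values, and forward-filling rests on the assumption that a survey answer stays valid until the next answer arrives, which may not hold for real occupancy.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up dense sensor readings, one every 3 minutes
sensor_index = pd.date_range("2015-04-20 08:00", periods=20, freq="3min")
pir = pd.Series([i % 2 for i in range(20)], index=sensor_index, name="PIR")

# Made-up sparse survey answers
survey = pd.Series(
    [0, 2, 1],
    index=pd.to_datetime(
        ["2015-04-20 08:05", "2015-04-20 08:20", "2015-04-20 08:47"]
    ),
    name="Survey",
)

# Carry each survey answer forward to the sensor timestamps;
# timestamps before the first answer stay NaN
survey_on_sensor_times = survey.reindex(pir.index, method="ffill")

ax = pir.plot(drawstyle="steps-post")
survey_on_sensor_times.plot(ax=ax, drawstyle="steps-post")
ax.set_ylabel("Occupancy")
ax.legend()
```

With both series on the same index, they can also be compared numerically, for example by checking how often the sensor reports 1 while the survey value is greater than 0.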