Plotting whole month in python with only 1 day data - python

I am trying to create a plot with an amount (int) on the y-axis and days on the x-axis.
I want the plot to always show the whole month on the x-axis, even though I don't have data for every day.
This is the code I tried:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import datetime as dt
df=get_pandas_data(datab) #Taking data from database in pandas DataFrame
fig = plt.figure(figsize=(10,10)) #Initialize plot
ax1 = fig.add_subplot(1,1,1)
dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in df['date']]
dates=list(set(dates)) #Takes all the dates from the DataFrame; set() removes repeated dates
s=df.resample('D', on='date')['amount'].sum() #Takes the total amount for the same date
ax1.bar(dates,s) #Bar plot for dates and amount
ax1.set(xlabel="Date",
ylabel="Balance (€)",
title="Total Monthly balance") # Plot information
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
#this is supposed to set all days of the month in the x-axis
ax1.xaxis.set_major_locator(mdates.DayLocator(interval=1))
fig.autofmt_xdate()
plt.show()
The result is a plot that shows only the days that have data.
How can I make the plot show every day of the month and draw bars only on the days that have data?

This works fine with bare datetimes and matplotlib, so your data is probably being malformed somewhere in your pandas manipulations. We can't really help, though, because we don't have your dataframe. It's always preferable to create a standalone example with dummy data and as little code as possible to recreate the issue: a) 90% of the time you will find the problem yourself; b) if not, we can help...
import numpy as np
import matplotlib.pyplot as plt
import datetime
x = np.array([1, 3, 7, 8, 10])
y = x * 2
dates = [datetime.datetime(2000, 2, xx) for xx in x]
fig, ax = plt.subplots()
ax.bar(dates, y)
fig.autofmt_xdate()
plt.show()
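If the goal is simply to make the x-axis span the whole month, one option is to set the axis limits explicitly. The following is a minimal sketch with dummy data (the dates and amounts are made up, not the asker's dataframe). It uses calendar.monthrange to find the last day of the month and ax.set_xlim to force the range:

```python
import calendar
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Dummy data (assumption): only three days of February 2000 have values.
dates = [dt.datetime(2000, 2, d) for d in (1, 3, 7)]
amounts = [10, 25, 5]

# First and last day of the month that contains the data.
year, month = dates[0].year, dates[0].month
last_day = calendar.monthrange(year, month)[1]
month_start = dt.datetime(year, month, 1)
month_end = dt.datetime(year, month, last_day)

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(dates, amounts)

# Pad by half a day so the first and last bars are not clipped.
half_day = dt.timedelta(hours=12)
ax.set_xlim(month_start - half_day, month_end + half_day)

# One tick per day, labelled as day-month-year.
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
fig.autofmt_xdate()
```

Call plt.show() (or fig.savefig) afterwards to see the result. Days without data simply stay empty, while the axis still covers the full month.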

Related

Python Pyplot - Format Plotted Graph's Y Axis as a Percent to 2 Decimal Places

I am trying to represent CDC Delay of Care data as a line graph but am having some trouble formatting the y axis so that it is a percentage to the hundredths place. I would also like for the x axis to show every year in the range selected.
Here is my code:
import pandas as pd
from isolation import isolate_total_stub, isolate_age_stub
import matplotlib.pyplot as plt
# very simple extraction, drop some columns and check some data
cdc_data = pd.read_csv('CDC_Delay_of_Care_Data.csv')
# separate the categories of delayed care
delay_of_medical_care = cdc_data[cdc_data.PANEL == 'Delay or nonreceipt of needed medical care due to cost']
# isolate the totals stub
total_delay_of_medical_care = isolate_total_stub(delay_of_medical_care)
x_axis = total_delay_of_medical_care.YEAR
y_axis = total_delay_of_medical_care.ESTIMATE
plt.plot(x_axis, y_axis)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.show()
The graph that displays looks like this:
[image: line graph]
Excuse me for being a novice. I have been googling for an hour now, and instead of continuing to search for an answer I thought it would be more productive to ask StackOverflow.
Thank you for your time.
To change the format of the Y-axis, you can use set_major_formatter.
To show the X-axis in years, you will need set_major_locator, assuming that your dates are in datetime format.
To change the label format of the X-axis, you can again use set_major_formatter.
Below is a small example with dummy data. Hope this works.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import matplotlib.dates as mdate
estimate = [8, 7.1, 11, 10.6, 8, 8.3]
year = ['2000-01-01', '2004-01-01', '2008-01-01', '2012-01-01', '2016-01-01', '2020-01-01']
year=pd.to_datetime(year) ## Convert string to datetime
plt.figure(figsize=(12,5)) ## Added so the Years don't overlap on each other
plt.plot(year, estimate)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.gca().yaxis.set_major_formatter(FormatStrFormatter('%.2f')) ## Formats Y-axis labels with two decimal places
locator = mdate.YearLocator()
plt.gca().xaxis.set_major_locator(locator) ## Changes datetime to years - 1 label per year
plt.gca().xaxis.set_major_formatter(mdate.DateFormatter('%Y')) ## Shows X-axis in Years
plt.gcf().autofmt_xdate() ## Rotates X-labels, if you want to use it
plt.show()
[image: output plot]
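Note that FormatStrFormatter('%.2f') prints two decimals but no percent sign. If the sign is wanted on the tick labels, matplotlib's PercentFormatter is one way to get it. A minimal sketch, assuming the estimates are already in percent units (8.3 means 8.3%) and the years are plain integers (both are assumptions, with made-up numbers):

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, PercentFormatter

# Dummy data (assumption): yearly estimates already expressed in percent.
years = [2015, 2016, 2017, 2018, 2019, 2020]
estimate = [8.0, 7.1, 11.0, 10.6, 8.0, 8.3]

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(years, estimate)
ax.set_xlabel("Year")
ax.set_ylabel("Percentage")

# xmax=100 means a data value of 100 corresponds to 100%.
percent_fmt = PercentFormatter(xmax=100, decimals=2)
ax.yaxis.set_major_formatter(percent_fmt)

# Integer years on the x-axis: one tick per year.
ax.xaxis.set_major_locator(MultipleLocator(1))
```

If the values were fractions instead (0.083 for 8.3%), xmax=1 would be the matching setting.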

How can I list sequentially the x and y axis on chart?

I have a dataframe that I want to show on a graph. When I run my code, the values on the x and y axes are not in sequential order. How can I solve this? I have also included example graphs: the first image is mine, the second one is what I want.
This is my code:
from datetime import timedelta, date
import datetime as dt #date analyse
import matplotlib.pyplot as plt
import pandas as pd #read file
def daterange(date1, date2):
    for n in range(int((date2 - date1).days) + 1):
        yield date1 + timedelta(n)
tarih="01-01-2021"
tarih2="20-06-2021"
start=dt.datetime.strptime(tarih, '%d-%m-%Y')
end=dt.datetime.strptime(tarih2, '%d-%m-%Y')
fg=pd.DataFrame()
liste=[]
tarih=[]
for dt in daterange(start, end):
    dates=dt.strftime("%d-%m-%Y")
    with open("fng_value.txt", "r") as filestream:
        for line in filestream:
            date = line.split(",")[0]
            if dates == date:
                fng_value=line.split(",")[1]
                liste.append(fng_value)
                tarih.append(dates)
fg['date']=tarih
fg['fg_value']=liste
print(fg.head())
plt.subplots(figsize=(20, 10))
plt.plot(fg.date,fg.fg_value)
plt.title('Fear&Greed Index')
plt.ylabel('Fear&Greed Data')
plt.xlabel('Date')
plt.show()
This is my graph:
This is the graph that I want:
[image: line plot with datetime x axis]
So it appears this code is opening a text file, adding values to either a list of dates or a list of values, and then making a pandas dataframe with those lists. Finally, it plots the date vs values with a line plot.
A few changes should help your graph look a lot better. A lot of this is very basic, and I'd recommend reviewing some matplotlib tutorials. The Real Python tutorial is a good starting place in my opinion.
Fix the y axis limit:
plt.ylim(0, 100)
Use an x-axis locator from mdates to get better-spaced x label locations. The right one depends on your time range; I made some data and used the day locator.
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_locator(mdates.DayLocator())
Use a scatter plot to add data points as on the linked graph
plt.scatter(x, y ... )
Add a grid
plt.grid(axis='both', color='gray', alpha=0.5)
Rotate the x tick labels
plt.tick_params(axis='x', rotation=45)
I simulated some data and plotted it to look like the plot you linked. This may be helpful for you to work from.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(15,5))
x = pd.date_range(start='june 26th 2021', end='july 25th 2021')
rng = np.random.default_rng()
y = rng.integers(low=15, high=25, size=len(x))
ax.plot(x, y, color='gray', linewidth=2)
ax.scatter(x, y, color='gray')
ax.set_ylim(0,100)
ax.grid(axis='both', color='gray', alpha=0.5)
ax.set_yticks(np.arange(0,101, 10))
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.tick_params(axis='x', rotation=45)
ax.set_xlim(min(x), max(x))
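One likely reason the axes come out non-sequential in the question's code (an assumption, since the data file is not shown) is that the dates and values read from the text file stay strings, so matplotlib treats both axes as categories and places them in order of appearance. Converting them to real datetimes and numbers before plotting avoids that. A minimal sketch, with a small in-memory stand-in for fng_value.txt whose "dd-mm-YYYY,value" layout is assumed:

```python
import io

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for fng_value.txt (assumption: "dd-mm-YYYY,value" per line).
raw = io.StringIO("01-01-2021,94\n02-01-2021,93\n03-01-2021,9\n04-01-2021,40\n")

# dtype=str mimics the question, where every field starts out as text.
fg = pd.read_csv(raw, header=None, names=["date", "fg_value"], dtype=str)

# Convert to real datetimes and numbers so the axes are ordered by value.
fg["date"] = pd.to_datetime(fg["date"], format="%d-%m-%Y")
fg["fg_value"] = pd.to_numeric(fg["fg_value"])
fg = fg.sort_values("date")

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(fg["date"], fg["fg_value"], marker="o")
ax.set_ylim(0, 100)
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
```

With numeric values, the y-axis runs from 0 to 100 in order instead of listing the strings as they were read.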

How do I control the number of x-axis ticks?

I have pulled in a dataset that I want to use, with columns named Date and Adjusted. Adjusted is just the adjusted percentage growth on the base month.
The code I currently have is:
x = data['Date']
y = data['Adjusted']
fig = plt.figure(dpi=128, figsize=(7,3))
plt.plot(x,y)
plt.title("FTSE 100 Growth", fontsize=25)
plt.xlabel("Date", fontsize=14)
plt.ylabel("Adjusted %", fontsize=14)
plt.show()
However, when I run it I get essentially a solid black line across the bottom where all of the dates are covering each other up. It is trying to show every single date, when obviously I only want to show major ones. The dates are in the format Apr-19, and the data runs from Oct-03 to May-20.
How do I limit the number of date ticks and labels to one per year, or any amount I choose? If you do have a solution, if you could respond with the edits made to the code itself that would be great. I've tried other solutions I've found on here but I haven't been able to get it to work.
The dates module of matplotlib will do the job. You can control the tick spacing by changing the interval argument of MonthLocator (it's currently set to 6 months). Here's how:
import pandas as pd
from datetime import date, datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
import matplotlib.ticker as ticker
x = data['Date']
y = data['Adjusted']
#converts differently formatted date to a datetime object
def convert_date(df):
    return datetime.strptime(df['Date'], '%b-%y')
data['Formatted_Date'] = data.apply(convert_date, axis=1)
# plot
fig, ax = plt.subplots(1, 1)
ax.plot(data['Formatted_Date'], y,'ok')
## Set time format and the interval of ticks (every 6 months)
xformatter = md.DateFormatter('%Y-%m') # format as year, month
xlocator = md.MonthLocator(interval = 6)
## Set xtick labels to appear every 6 months
ax.xaxis.set_major_locator(xlocator)
## Format xtick labels as YYYY-mm
ax.xaxis.set_major_formatter(xformatter)
plt.title("FTSE 100 Growth", fontsize=25)
plt.xlabel("Date", fontsize=14)
plt.ylabel("Adjusted %", fontsize=14)
plt.show()
Example output:
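For the "one label per year" case the question asks about, YearLocator can replace MonthLocator. A minimal sketch, using a handful of made-up values in the question's Apr-19 style (the numbers are dummy data, not the FTSE series) and parsing the whole column at once with pd.to_datetime:

```python
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd

# Dummy data (assumption): a few dates in the question's "Apr-19" style.
data = pd.DataFrame({
    "Date": ["Oct-03", "Jun-07", "Jan-12", "Apr-19", "May-20"],
    "Adjusted": [0.0, 12.5, 8.0, 30.2, 21.7],
})

# Parse the whole column at once instead of row by row.
data["Formatted_Date"] = pd.to_datetime(data["Date"], format="%b-%y")

fig, ax = plt.subplots(dpi=128, figsize=(7, 3))
ax.plot(data["Formatted_Date"], data["Adjusted"])

# One tick per year; YearLocator(base=2) would give one every two years.
ax.xaxis.set_major_locator(md.YearLocator())
ax.xaxis.set_major_formatter(md.DateFormatter("%Y"))
fig.autofmt_xdate()
```

Changing the base argument of YearLocator is the knob for "any amount I choose" at yearly granularity.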

Pandas plotting--ignore time range

Let's say I have one-minute data during business hours of 8am to 4pm over three days. I would like to plot these data using the pandas plot function:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(51723)
dates = pd.date_range("11/8/2018", "11/11/2018", freq = "min")
df = pd.DataFrame(np.random.rand(len(dates)), index = dates, columns = ['A'])
df = df[(df.index.hour >= 8) & (df.index.hour <= 16)] # filter for business hours
fig, ax = plt.subplots()
df.plot(ax = ax)
plt.show()
However, the plot function also includes overnight hours in the plot, resulting in unintended plotting during this time:
I would like the data to be plotted contiguously, ignoring the overnight time (something like this):
What is a good way to plot only the intended hours of 8am to 4pm?
This can be done by plotting each date on a different axis. But things like the labels will get cramped in certain cases.
import datetime
import numpy as np
import matplotlib.pyplot as plt
# df is the business-hours DataFrame built in the question
pdates = np.unique(df.index.date) # Unique Dates
fig, ax = plt.subplots(ncols=len(pdates), sharey=True, figsize=(18,6))
# Adjust spacing between subplots
# (Set to 0 for continuous, though labels will overlap)
plt.subplots_adjust(wspace=0.05)
# Plot all data on each subplot, adjust the limits of each accordingly
for i in range(len(pdates)):
    df.plot(ax=ax[i], legend=None)
    # Hours 8-16 each day:
    ax[i].set_xlim(datetime.datetime.combine(pdates[i], datetime.time(8)),
                   datetime.datetime.combine(pdates[i], datetime.time(16)))
    # Deal with spines for each panel
    if i != 0:
        ax[i].spines['left'].set_visible(False)
        ax[i].tick_params(right=False,
                          which='both',
                          left=False,
                          axis='y')
    if i != len(pdates)-1:
        ax[i].spines['right'].set_visible(False)
plt.show()
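A different approach, offered here only as a sketch and not as part of the answer above, is to plot against integer positions instead of timestamps, so the overnight gaps take up no horizontal space, and then label selected positions with their dates. The data setup is copied from the question. The choice of one tick at the start of each day is an assumption:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(51723)
dates = pd.date_range("11/8/2018", "11/11/2018", freq="min")
df = pd.DataFrame(np.random.rand(len(dates)), index=dates, columns=["A"])
df = df[(df.index.hour >= 8) & (df.index.hour <= 16)]  # business hours

# Plot against 0..N-1 so the overnight gaps take up no horizontal space.
positions = np.arange(len(df))
fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(positions, df["A"].to_numpy())

# Put one tick at the first sample of each day and label it with its timestamp.
day_starts = np.flatnonzero(~df.index.normalize().duplicated())
ax.set_xticks(day_starts)
ax.set_xticklabels(df.index[day_starts].strftime("%Y-%m-%d %H:%M"))
```

The trade-off is that the x-axis is no longer a true time axis, so date locators and formatters do not apply and the tick labels have to be set by hand.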

How to plot time series that consists of different dates but same timestamps on one graph in matplotlib

I have data that shows some values collected on three different dates: 2015-01-08, 2015-01-09 and 2015-01-12. For each date there are several data points that have timestamps.
Date/times are in a list and it looks as follows:
['2015-01-08-09:00:00', '2015-01-08-10:00:00', '2015-01-08-11:00:00', '2015-01-08-12:00:00', '2015-01-08-13:00:00', '2015-01-09-14:00:00', '2015-01-09-15:00:00', '2015-01-09-16:00:00', '2015-01-12-09:00:00', '2015-01-12-10:00:00', '2015-01-12-11:00:00']
On the other hand I have corresponding values (floats) in another list:
[12210.0, 12210.0, 12180.0, 12240.0, 12250.0, 12420.0, 12390.0, 12400.0, 12380.0, 12450.0, 12460.0]
To put all this together and plot a graph I use following code:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as md
import dateutil
from matplotlib.font_manager import FontProperties
timestamps = ['2015-01-08-09:00:00', '2015-01-08-10:00:00', '2015-01-08-11:00:00', '2015-01-08-12:00:00', '2015-01-08-13:00:00', '2015-01-09-14:00:00', '2015-01-09-15:00:00', '2015-01-09-16:00:00', '2015-01-12-09:00:00', '2015-01-12-10:00:00', '2015-01-12-11:00:00']
ticks = [12210.0, 12210.0, 12180.0, 12240.0, 12250.0, 12420.0, 12390.0, 12400.0, 12380.0, 12450.0, 12460.0]
plt.subplots_adjust(bottom=0.2)
plt.xticks( rotation=90 )
dates = [dateutil.parser.parse(s) for s in timestamps]
ax=plt.gca()
ax.set_xticks(dates)
ax.tick_params(axis='x', labelsize=8)
xfmt = md.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(dates, ticks, label="Price")
plt.xlabel("Date and time", fontsize=12)
plt.ylabel("Price", fontsize=12)
plt.suptitle("Price during last three days", fontsize=12)
plt.legend(loc=0,prop={'size':8})
plt.savefig("figure.pdf")
When I try to plot these datetimes and values I get a messy graph with the line going back and forth.
It looks like the dates are being ignored and only the timestamps are taken into account, which is the reason for the messy chart. I tried editing the datetimes to have the same date and consecutive timestamps, and that fixed the chart. However, I must have the dates as well.
What am I doing wrong?
"When I try to plot these datetimes and values I get a messy graph with the line going back and forth."
Your plots are going all over the place because plt.plot connects the dots in the order you give it. If this order is not monotonically increasing in x, then it looks "messy". You can sort the points by x first to fix this. Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
X = np.random.random(20)
Y = 2*X+np.random.random(20)
idx = np.argsort(X)
X2 = X[idx]
Y2 = Y[idx]
fig,ax = plt.subplots(2,1)
ax[0].plot(X,Y)
ax[1].plot(X2,Y2)
plt.show()
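The same idea applied to datetimes can be sketched as follows. The points are a few of the question's values, deliberately shuffled for illustration. The explicit strptime format is an assumption about how the strings should be read, chosen so that nothing depends on how a generic parser treats the hyphen before the time:

```python
import datetime as dt

import matplotlib.dates as md
import matplotlib.pyplot as plt

# A few points from the question, deliberately shuffled.
timestamps = ["2015-01-12-09:00:00", "2015-01-08-09:00:00",
              "2015-01-09-14:00:00", "2015-01-08-10:00:00"]
prices = [12380.0, 12210.0, 12420.0, 12210.0]

# Parse with an explicit format, then sort the (date, price) pairs by date.
dates = [dt.datetime.strptime(s, "%Y-%m-%d-%H:%M:%S") for s in timestamps]
pairs = sorted(zip(dates, prices))
sorted_dates, sorted_prices = zip(*pairs)

fig, ax = plt.subplots()
ax.plot(sorted_dates, sorted_prices, label="Price")

# Show both date and time so points on different days stay distinguishable.
ax.xaxis.set_major_formatter(md.DateFormatter("%Y-%m-%d %H:%M"))
fig.autofmt_xdate()
```

Including the date in the tick format also avoids the impression that only the time of day is being plotted.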
