I have read in a CSV file of monthly temperature anomalies using the pandas read_csv() function. Years run from 1881 to 2022 (I excluded the last 3 months of 2022 to avoid -999 values). The date format is yyyy-mm-dd. How can I plot just the year on the x-axis, with one label per year instead of 12 (i.e., I don't need each year repeated 12 times)?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import YearLocator, MonthLocator, DateFormatter
import matplotlib.dates as mdates
ds = pd.read_csv('path_to_file.csv', header='infer', engine='python', skipfooter=3)
dates = ds['Date']
tAnoms = ds[' Berkeley Earth 2m Air Temperature (degree C) 0N-90N;0E-360E']
fig = plt.figure(figsize=(10,10))
ax = plt.subplot(111)
ax.plot(dates,tAnoms)
ax.plot(dates,tAnoms.rolling(60, center=True).mean())
ax.xaxis.set_major_locator(mdates.YearLocator(month=1))  # EDIT
years_fmt = mdates.DateFormatter('%Y') # EDIT 2
ax.xaxis.set_major_formatter(years_fmt) # EDIT 2
plt.show()
EDIT: adding the YearLocator line (marked # EDIT) gives me the second plot.
EDIT 2: adding the DateFormatter lines (marked # EDIT 2) gives me yearly labels, but only from 1970 to 1975 (third plot).
You could:
Create a new column year from your Date column.
Compute the average temperature for each year (using mean or median): df.groupby(['year']).mean()
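A minimal sketch of those two steps, assuming the Date column has already been parsed as datetimes (for example with parse_dates=['Date']). The column name anom and the values are made up for illustration; in the question the data column is ' Berkeley Earth 2m Air Temperature (degree C) 0N-90N;0E-360E'.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the CSV: three years of month-start dates
# with made-up values 0..35 in a column called 'anom'.
ds = pd.DataFrame({
    "Date": pd.date_range("1881-01-01", "1883-12-01", freq="MS"),
    "anom": np.arange(36, dtype=float),
})

# Step 1: derive a 'year' column from the datetime column.
ds["year"] = ds["Date"].dt.year

# Step 2: collapse the 12 monthly values of each year into one annual mean.
annual = ds.groupby("year")["anom"].mean()

# One point (and at most one label) per year; the x values are plain integers.
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(annual.index, annual.values)
ax.set_xlabel("Year")
ax.set_ylabel("Anomaly")
```

Using .median() in place of .mean() gives the median-based variant mentioned above.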
So, I found a good, but maybe not perfect, solution. The first thing I needed to do was use parse_dates and infer_datetime_format when reading in the CSV file. Then I converted the dates with to_pydatetime(). mdates.AutoDateLocator() was what I needed, along with set_major_formatter. I'm not sure how I could manually change the interval, however (e.g., every 10 or 25 years instead of the default). This does work well enough, though.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
# infer_datetime_format is deprecated (and no longer needed) in pandas >= 2.0; drop it there
ds = pd.read_csv('path_to_file.csv', parse_dates=['Date'], infer_datetime_format=True,
                 header='infer', engine='python', skipfooter=3)
dates = ds['Date'].dt.to_pydatetime()  # Convert to an array of Python datetime objects
tAnoms = ds[' Berkeley Earth 2m Air Temperature (degree C) 0N-90N;0E-360E']
fig = plt.figure(figsize=(10,10))
ax = plt.subplot(111)
# Produce plot (5-year centred rolling mean of the monthly values)
ax.plot(dates,tAnoms.rolling(60, center=True).mean())
# Use AutoDateLocator() from matplotlib.dates (mdates)
# Set date format to years
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
years_fmt = mdates.DateFormatter('%Y')
ax.xaxis.set_major_formatter(years_fmt)
plt.show()
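On the interval question: matplotlib's YearLocator takes a base argument and places ticks only on years that are multiples of base, so YearLocator(base=10) or YearLocator(base=25) should give ticks every 10 or 25 years in place of AutoDateLocator(). A small sketch, with a made-up signal standing in for the anomaly file:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for the parsed CSV: monthly dates 1881-01 .. 2022-09
# with a made-up smooth signal instead of real anomalies.
dates = pd.date_range("1881-01-01", "2022-09-01", freq="MS")
values = np.sin(np.arange(len(dates)) / 60.0)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(dates, values)

# Major ticks on 1 January of every year that is a multiple of 10 ...
ax.xaxis.set_major_locator(mdates.YearLocator(base=10))
# ... labelled with just the year ...
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
# ... plus an unlabelled minor tick for every individual year.
ax.xaxis.set_minor_locator(mdates.YearLocator(base=1))
```

Call plt.show() to display it; changing base=10 to base=25 switches to ticks every 25 years.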
I am plotting some time series from .nc files using pandas, xarray and matplotlib. I have two datasets:
Sea Surface Temperature from 1982 to 2019, from which I plot the monthly mean for my area and represent the monthly temperature variation over those 37 years.
Sea Surface Temperature from 2020 to 2021, where I plot the monthly variation for each of the years.
To plot this, I use the following code. (Please note that, due to memory allocation issues I had while looping through the variables, I wrote very basic code with no loops. Sorry for that!)
import xarray as xr
import matplotlib.pyplot as plt
from matplotlib import dates as md
import pandas as pd
import numpy as np
import seaborn as sns
import datetime
sns.set()
ds_original = xr.open_dataset('sst_med_f81_to21_L4.nc')
ds_original_last = xr.open_dataset('sst_med_f20_to21_L4.nc')
extract_date = datetime.datetime.today()
date = extract_date.strftime("%Y-%m-%d")
ds1 = ds_original.sel(time=slice('1982-01-01','2019-12-31'))
ds2 = ds_original_last.sel(time=slice('2020-01-01','2020-12-31'))
ds3 = ds_original_last.sel(time=slice('2021-01-01', date))
# Convert to Pandas Dataframe
df1 = ds1.to_dataframe().reset_index().set_index('time')
df2 = ds2.to_dataframe().reset_index().set_index('time')
df3 = ds3.to_dataframe().reset_index().set_index('time')
# Converting to Celsius
def kelvin_to_celsius(temp_k):
    """
    Receives temperature in K and returns
    temperature in °C
    """
    temp_c = temp_k - 273.15
    return temp_c
df1['analysed_sst_C'] = kelvin_to_celsius(df1['analysed_sst'])
df2['analysed_sst_C'] = kelvin_to_celsius(df2['analysed_sst'])
df3['analysed_sst_C'] = kelvin_to_celsius(df3['analysed_sst'])
#Indexing by month and yearday
df1['month'] = df1.index.month
df1['yearday'] = df1.index.dayofyear
df2['month'] = df2.index.month
df2['yearday'] = df2.index.dayofyear
df3['month'] = df3.index.month
df3['yearday'] = df3.index.dayofyear
# Calculating the average for each day of the year
daily_sst_82_19 = df1.analysed_sst_C.groupby(df1.yearday).mean()
daily_sst_2020 = df2.analysed_sst_C.groupby(df2.yearday).mean()
daily_sst_2021 = df3.analysed_sst_C.groupby(df3.yearday).mean()
# Quick Plot
sns.set_theme(style="whitegrid")
fig, ax=plt.subplots(1, 1, figsize=(15, 7))
ax.xaxis.set_major_locator(md.MonthLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('%b'))
ax.margins(x=0)
plt.plot(daily_sst_82_19, label='1982-2019')
plt.plot(daily_sst_2020,label='2020')
plt.plot(daily_sst_2021,label='2021', c = 'black')
plt.legend(loc = 'upper left')
I obtain the following plot:
I want my plot to start with Jan and end with Dec, but I cannot figure out where the problem is. I have tried to set the x-axis limits between two specific dates, but this creates a conflict, as one of the time series covers 37 years and the other two cover only 1 year each.
Any help would be much appreciated!
UPDATE
I figured out how to shift the months by specifying the following:
ax.xaxis.set_major_locator(md.MonthLocator(bymonthday=2))
So I obtained this:
But I still need to delete that last Jan, and I cannot figure out how to do it.
Okay, so I figured out how to solve the issue.
While fine-tuning the plot parameters, I switched the DateFormatter to %D to see the year as well. To my surprise, the year was set to 1970, and I have no idea why, because my oldest dataset starts in 1981. Once I discovered this, I set the x-limits to the values below, and it worked pretty well:
#Add to plot settings:
ax.set_xlim(np.datetime64('1970-01-01'), np.datetime64('1970-12-31'))
ax.xaxis.set_major_locator(md.MonthLocator(bymonthday=1))
ax.xaxis.set_major_formatter(md.DateFormatter('%b'))
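A likely explanation for the 1970 dates: the series are indexed by yearday (plain integers 1 to 366), and on a date axis matplotlib reads a bare number as days since its epoch, which defaults to 1970-01-01 in matplotlib 3.3 and later. Day 1 therefore lands on 2 January 1970 and day 366 on 2 January 1971, which matches both the missing first Jan label (with ax.margins(x=0) the axis starts after 1 January) and the extra Jan at the end. The sketch below maps yearday onto real dates instead; the values are made up, and the dummy leap year 2000 is an arbitrary choice that only needs 366 days.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md

# With the default epoch, the float 1 on a date axis is 1970-01-02.
print(md.num2date(1).date())  # 1970-01-02

# Hypothetical day-of-year climatology (made-up values, not SST data).
yearday = np.arange(1, 367)
sst = 20 + 5 * np.sin(2 * np.pi * (yearday - 120) / 366)
clim = pd.Series(sst, index=yearday)

# Map day-of-year onto real dates in a dummy leap year: day 1 -> 1 Jan.
dummy_dates = pd.Timestamp("2000-01-01") + pd.to_timedelta(clim.index - 1, unit="D")

fig, ax = plt.subplots(figsize=(15, 7))
ax.plot(dummy_dates, clim.values)
ax.xaxis.set_major_locator(md.MonthLocator())         # tick on the 1st of each month
ax.xaxis.set_major_formatter(md.DateFormatter("%b"))  # label as Jan, Feb, ...
# Limits inside the dummy year, so no January of the following year appears.
ax.set_xlim(pd.Timestamp("2000-01-01"), pd.Timestamp("2000-12-31"))
```

The same mapping would be applied to each of the three yearday-indexed series before plotting them on one axis.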
I have a dataset containing COVID-19 data with columns = ['total_cases', 'new_cases', 'date']. The data increase monotonically; at least, there are no sudden spikes in new_cases in January. The dataset can be found here: https://fnvuusdqoptinxntjrmodi.coursera-apps.org/edit/CovidIndiaData.csv. It has lots of columns, of which I use only ['total_cases', 'new_cases', 'date'].
For the first 10 days, 'new_cases' is 0, as shown in this image:
I use this code to plot bar plot for 'date' vs 'new_cases':
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
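from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates  # needed for mdates.WeekdayLocator below
df = pd.read_csv("CovidIndiaData.csv", parse_dates=['date'], index_col=['date'])
df = df[['new_cases', 'total_cases']]
df = df.fillna(0)  # fillna returns a new frame, so assign it back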
fig = plt.figure()
ax = plt.gca()
ax.bar(df.index.values,
df['new_cases'],
color='purple')
ax.set(xlabel="Date",
ylabel="New Cases",
title="New Cases per day",
xlim=["2020-01-01", "2020-07-18"])
date_form = DateFormatter("%m-%d")
ax.xaxis.set_major_formatter(date_form)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
plt.setp(ax.get_xticklabels(), rotation=45)
plt.show()
The final plot looks like this:
The plot shows spikes on 7 January ('01-07' on the plot), where the dataset clearly has new_cases equal to 0. The same thing repeats at roughly one-month intervals.
Where does this data come from? How can I plot a correct graph for this data?
Thanks to Davis Herring for pointing out my mistake.
In case anyone faces a similar issue: the solution is to specify the date format explicitly when your dates aren't in a standard format (here they are dd-mm-yyyy).
What I did is:
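from datetime import datetime  # pd.datetime was removed in pandas 2.0
mydateparser = lambda x: datetime.strptime(x, "%d-%m-%Y")
# On pandas >= 2.0, date_parser is deprecated; date_format="%d-%m-%Y" does the same job
df = pd.read_csv("CovidIndiaData.csv", parse_dates=['date'], date_parser=mydateparser, index_col=['date'])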
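A small sketch of why the spikes appear, using two made-up rows in the same dd-mm-yyyy layout: without a format, a string such as 01-07-2020 is read month-first, so 1 July becomes 7 January. Parsing after reading, with an explicit format, avoids the problem and sidesteps the date_parser/date_format differences between pandas versions.

```python
import io
import pandas as pd

# Hypothetical sample in the dd-mm-YYYY layout (values made up).
csv_text = """date,new_cases,total_cases
30-01-2020,1,1
01-07-2020,100,200
"""

# Without a format, "01-07-2020" is read month-first: 7 January, not 1 July.
print(pd.to_datetime("01-07-2020"))  # 2020-01-07 00:00:00

# Read the column as text, then parse it with an explicit format string.
df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
df = df.set_index("date")
print(df.index[1])  # 2020-07-01 00:00:00
```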
I am trying to plot temperature over time using the datetime format. But when I plot it, lines obscure the plot in a seemingly random way. Maybe this is due to the cyclical nature of a year? Just a thought.
Here is the code
(the column df["DateTime"] holds datetime objects):
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
days = mdates.DayLocator()    # a major tick every day
hours = mdates.HourLocator()  # a minor tick every hour
days_fmt = mdates.DateFormatter('%D')
fig, ax = plt.subplots()
ax.minorticks_on()
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(days_fmt)
ax.xaxis.set_minor_locator(hours)
# head(1)/tail(1) return one-row Series; set_xlim needs scalar limits
datemin = df['DateTime'].iloc[0]
datemax = df['DateTime'].iloc[-1]
ax.set_xlim(datemin, datemax)
ax.plot(df.DateTime, df.TempTop, label='Top')
ax.set_ylabel('Temperature in Celsius')
plt.show()
The call
dfData.DateTime = pd.to_datetime(dfData['DateTime'])
was swapping months and days at what seemed to me random times (in fact, whenever the day was 12 or less).
Setting dayfirst=True:
dfData['DateTime'] = pd.to_datetime(dfData['DateTime'], dayfirst=True)
resolved the issue.
Solution came from here:
Python Pandas : pandas.to_datetime() is switching day & month when day is less than 13
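A sketch of the same fix with an explicit format string, which removes the day/month ambiguity entirely, plus a check that the parsed timestamps are in order. Matplotlib connects points in row order, so out-of-order timestamps produce exactly the lines jumping back and forth across the plot. The timestamps and temperatures are made up, and the %d/%m/%Y %H:%M format is an assumption about how the logger writes its dates.

```python
import pandas as pd

# Hypothetical logger rows in dd/mm/YYYY HH:MM form (made-up readings).
dfData = pd.DataFrame({
    "DateTime": ["12/03/2021 23:00", "13/03/2021 00:00", "13/03/2021 01:00"],
    "TempTop": [18.2, 18.0, 17.9],
})

# An explicit format leaves no room to read 12/03 as 3 December;
# dayfirst=True is the looser alternative.
dfData["DateTime"] = pd.to_datetime(dfData["DateTime"], format="%d/%m/%Y %H:%M")

# Line plots join points in row order, so timestamps should increase
# before plotting; sorting is a cheap safety net.
dfData = dfData.sort_values("DateTime")
print(dfData["DateTime"].is_monotonic_increasing)  # True
```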
I have pulled in a dataset that I want to use, with columns named Date and Adjusted. Adjusted is just the adjusted percentage growth on the base month.
The code I currently have is:
x = data['Date']
y = data['Adjusted']
fig = plt.figure(dpi=128, figsize=(7,3))
plt.plot(x,y)
plt.title("FTSE 100 Growth", fontsize=25)
plt.xlabel("Date", fontsize=14)
plt.ylabel("Adjusted %", fontsize=14)
plt.show()
However, when I run it, I get what is essentially a solid black line across the bottom, where all the dates are covering each other up. It is trying to show every single date, when obviously I only want to show the major ones. The dates are in the format Apr-19, and the data run from Oct-03 to May-20.
How do I limit the number of date ticks and labels to one per year, or any number I choose? If you have a solution, it would be great if you could show the edits made to the code itself. I've tried other solutions I found on here, but I haven't been able to get them to work.
The matplotlib.dates module will do the job. You can control the interval by modifying the MonthLocator (it's currently set to 6 months). Here's how:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
# data is your DataFrame with 'Date' (e.g. 'Apr-19') and 'Adjusted' columns
y = data['Adjusted']
# Convert the 'Mon-yy' strings to datetime objects in one vectorised call
data['Formatted_Date'] = pd.to_datetime(data['Date'], format='%b-%y')
# plot
fig, ax = plt.subplots(1, 1)
ax.plot(data['Formatted_Date'], y, 'ok')
## Set time format and the interval of ticks (every 6 months)
xformatter = md.DateFormatter('%Y-%m')  # format as year-month
xlocator = md.MonthLocator(interval=6)
## Set xtick labels to appear every 6 months
ax.xaxis.set_major_locator(xlocator)
## Format xtick labels as YYYY-mm
ax.xaxis.set_major_formatter(xformatter)
plt.title("FTSE 100 Growth", fontsize=25)
plt.xlabel("Date", fontsize=14)
plt.ylabel("Adjusted %", fontsize=14)
plt.show()
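For the "any number I choose" part, an alternative to a fixed MonthLocator interval is AutoDateLocator with a maxticks cap: it picks a tidy interval (years, months, ...) that keeps the tick count at roughly that number or fewer. A sketch with made-up values in the same Mon-yy format as the question:

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md

# Hypothetical stand-in for the FTSE data: one row per month from Oct-03
# to May-20, in the same "Mon-yy" text format, with made-up values.
months = pd.date_range("2003-10-01", "2020-05-01", freq="MS")
data = pd.DataFrame({"Date": months.strftime("%b-%y"),
                     "Adjusted": range(len(months))})

# Parse "Apr-19"-style strings in one vectorised call.
data["Formatted_Date"] = pd.to_datetime(data["Date"], format="%b-%y")

fig, ax = plt.subplots(dpi=128, figsize=(7, 3))
ax.plot(data["Formatted_Date"], data["Adjusted"])

# Let matplotlib choose "nice" date ticks, aiming for at most about 6.
ax.xaxis.set_major_locator(md.AutoDateLocator(minticks=3, maxticks=6))
ax.xaxis.set_major_formatter(md.DateFormatter("%Y"))
```

For exactly one tick per year, md.YearLocator() can replace the AutoDateLocator.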
My dataframe looks like this:
Energy_MWh Month
0 39686.82 1979-01
1 35388.78 1979-02
2 50134.02 1979-03
3 37499.22 1979-04
4 20104.08 1979-05
5 17440.26 1979-06
It continues like this up to 2015-12, so you can imagine the rest of the data.
I want to plot a continuous graph with the months as the x-axis and the Energy_MWh as the y-axis. How to best represent this using matplotlib?
I would also like to know if there's a way to show 1979-01 as Jan-1979 on the x-axis, and so on. Probably with a lambda function or something while plotting.
Borrowed liberally from this answer, which you should go out and upvote:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
# df is your DataFrame with 'Month' and 'Energy_MWh' columns
myDates = pd.to_datetime(df['Month'])
myValues = df['Energy_MWh']
fig, ax = plt.subplots()
ax.plot(myDates,myValues)
myFmt = DateFormatter("%b-%Y")
ax.xaxis.set_major_formatter(myFmt)
## Rotate date labels automatically
fig.autofmt_xdate()
plt.show()
Set Month as the index:
df.set_index('Month', inplace=True)
Convert the index to Datetime:
df.index = pd.DatetimeIndex(df.index)
Plot:
df.plot()
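To get Jan-1979 style labels with the df.plot() approach, one option is x_compat=True, which makes pandas use ordinary matplotlib date units so that a matplotlib DateFormatter applies to the ticks. A sketch using the six rows shown in the question:

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# The first six rows from the question (the real frame runs to 2015-12).
df = pd.DataFrame({
    "Energy_MWh": [39686.82, 35388.78, 50134.02, 37499.22, 20104.08, 17440.26],
    "Month": ["1979-01", "1979-02", "1979-03", "1979-04", "1979-05", "1979-06"],
})

df.set_index("Month", inplace=True)
df.index = pd.DatetimeIndex(df.index)  # "1979-01" -> 1979-01-01

# x_compat=True keeps pandas from applying its own period-based tick
# labelling, so matplotlib locators/formatters control the x-axis.
ax = df.plot(x_compat=True, legend=False)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b-%Y"))
ax.set_ylabel("Energy (MWh)")
```

With the full 1979 to 2015 data a tick every month would be far too dense; mdates.YearLocator() or mdates.MonthLocator(interval=12) is likely more readable.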