In Python, I am pulling in data from a data frame that should show me the number of COVID-19 cases by date. See example values for three dates:
date: 20201201; positive: 10000
date: 20201202; positive: 10500
date: 20201203; positive: 11000
I am hitting a roadblock when I try to format the plot I created. How can I increase the font size on the x and y axes and change the tick intervals so that, instead of every individual date, only the first day of every month is shown? Note that "date" currently has dtype object and "positive" has dtype int64.
Also, what does the 121 represent in my code? I picked it up elsewhere and noticed that whenever I change the number, I get an error.
Thanks in advance.
import matplotlib.pyplot as plt
import numpy as np
x = data["date"]
y = data["positive"]
fig = plt.figure(figsize=(75, 25))
# Adds subplot on position 1
ax = fig.add_subplot(121)
ax.plot(x, y)
plt.show()
plt.xlabel('xlabel', fontsize=10) # for the label
plt.xticks(fontsize=10) # for the tick values
I also suggest using the Seaborn library for plotting data; it is relatively more straightforward than Matplotlib.
Quick example resource: https://seaborn.pydata.org/examples/errorband_lineplots.html
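A minimal sketch covering the parts of the question the answer above leaves open, assuming the "date" column holds strings (or integers) like 20201201 and using made-up, hypothetical case counts. Converting the column with pd.to_datetime lets matplotlib treat it as dates, mdates.MonthLocator(bymonthday=1) puts one tick on the first of each month, and ax.tick_params(labelsize=...) enlarges the tick labels. As for 121: add_subplot(121) means a grid of 1 row and 2 columns with the new Axes in position 1, so the plot fills only the left half of the figure; a position larger than 2 (e.g. 123) is invalid for that grid, which is one way changing the number raises an error. For a single plot filling the figure, use 111 or plt.subplots().

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: daily dates stored as "YYYYMMDD" strings (dtype object)
# with a made-up, steadily rising count of positive cases.
dates = pd.date_range("2020-12-01", "2021-03-31", freq="D").strftime("%Y%m%d")
data = pd.DataFrame({"date": dates,
                     "positive": range(10000, 10000 + 500 * len(dates), 500)})

# Convert the object column to real datetimes so matplotlib can place date ticks.
data["date"] = pd.to_datetime(data["date"].astype(str), format="%Y%m%d")

# A single Axes filling the whole figure (equivalent to add_subplot(111)).
fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(data["date"], data["positive"])

# One major tick on the first day of every month, labelled as a full date.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))

# Larger tick labels on both axes, and larger axis labels.
ax.tick_params(axis="both", labelsize=14)
ax.set_xlabel("date", fontsize=16)
ax.set_ylabel("positive", fontsize=16)
plt.show()
```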
I have a data frame with only three columns: timestamp (type: datetime), favorite_count, and retweet_count.
I want to draw a scatter plot to show how favorite_count and retweet_count changed over time.
Here is my code:
x = df.timestamp
y = df.favorite_count
plt.scatter(x,y,alpha = 0.4)
The result:
I ran into two problems:
Since the timestamps look like "2017-08-01 16:23:56+00:00", I cannot read the values on the x-axis.
How can I show only year-month and set the interval to 2 months on the x-axis? (like: 2017-06 2017-08 ...)
I want to show the two variables (favorite_count and retweet_count) in the same plot with different colors, but I failed to do that.
Thank you in advance.
Although the sample data is not optimal, two of the issues can be addressed as follows.
You can use MonthLocator(interval=2) to place an x-axis tick every two months.
Both columns can be displayed in the same graph by calling scatter once per column; matplotlib gives each call a different color automatically.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# assumes the timestamp column has been made the index, e.g. df = df.set_index('timestamp')
fig = plt.figure(figsize=(12,9),dpi=144)
ax = fig.add_subplot(111)
ax.scatter(df.index, df['favorite_count'], alpha=0.4, label='favorite_count')
ax.scatter(df.index, df['retweet_count'], alpha=0.4, label='retweet_count')
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=None, interval=2, tz=None))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m")) # year-month only, e.g. 2017-08
ax.legend()
plt.show()
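A self-contained version of the same idea, as a sketch under stated assumptions: the frame below is hypothetical stand-in data with tz-aware UTC timestamps like the question's "2017-08-01 16:23:56+00:00" and random counts. It drops the timezone (converting to UTC first) before using the timestamps as the index, labels each series so a legend can tell the colors apart, and formats the ticks as year-month every two months.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: tz-aware UTC timestamps
# plus two made-up count columns.
rng = np.random.default_rng(0)
timestamps = pd.date_range("2017-01-01", "2017-12-31", periods=200, tz="UTC")
df = pd.DataFrame({
    "timestamp": timestamps,
    "favorite_count": rng.integers(0, 5000, size=200),
    "retweet_count": rng.integers(0, 1500, size=200),
})

# Convert to UTC and drop the timezone, then use the timestamps as the index.
df["timestamp"] = df["timestamp"].dt.tz_convert(None)
df = df.set_index("timestamp")

fig, ax = plt.subplots(figsize=(12, 6))
# One scatter call per column; each gets its own color and legend entry.
ax.scatter(df.index, df["favorite_count"], alpha=0.4, label="favorite_count")
ax.scatter(df.index, df["retweet_count"], alpha=0.4, label="retweet_count")

ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))  # a tick every 2 months
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))   # e.g. 2017-03
ax.legend()
plt.show()
```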
I have data in this format.
Using the data above, I created a graph using code below.
from matplotlib import pyplot
pyplot.figure(figsize=(12,8))
pyplot.plot(df_movie['date'], df_movie['rate'], label = 'rating')
pyplot.xticks(rotation=90)
pyplot.legend(loc='best')
pyplot.grid()
pyplot.show()
Below is the result of this code.
There is an x-tick label for every single date, but I want the labels shown only every week, every 10 days, or similar. How can I do this?
Thank you
Here's a very simple way of doing it. You can set the parameter interval to 7 or 10 or any other value you fancy:
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
x = list('ABCDEFGHIJ')
y = [1,2,4,8,16,32,64,128,256,512]
fig, ax = plt.subplots(1)
interval = 2 # This parameter regulates the interval between xticks
plt.plot(x,y)
ax.xaxis.set_major_locator(tkr.MultipleLocator(interval))
plt.show()
You should first set your date column to the right type.
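import pandas as pd
df_movie['date'] = pd.to_datetime(df_movie['date'])
Once the column is a real datetime, matplotlib treats the x-axis as dates rather than as one category per string, and picks a sparser tick spacing automatically.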
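If you want to control the spacing explicitly after converting the column, the locators in matplotlib.dates can do it. A minimal sketch, assuming df_movie has "date" and "rate" columns as in the question; the 60 days of ratings below are made-up, hypothetical data. WeekdayLocator(byweekday=mdates.MO) places one tick per week; mdates.DayLocator(interval=10) is an alternative if you want roughly a 10-day spacing instead.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_movie: one rating per day, dates stored as strings.
dates = pd.date_range("2019-01-01", periods=60, freq="D").strftime("%Y-%m-%d")
df_movie = pd.DataFrame({"date": dates,
                         "rate": [7 + (i % 5) * 0.2 for i in range(60)]})

# Convert the string column to datetimes first.
df_movie["date"] = pd.to_datetime(df_movie["date"])

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(df_movie["date"], df_movie["rate"], label="rating")

# One labelled tick per week, on Mondays.
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MO))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.tick_params(axis="x", labelrotation=90)
ax.legend(loc="best")
ax.grid()
plt.show()
```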
I have a pandas Series with a datetime index that I am trying to visualize using a bar graph. My code is below, but the chart I am getting does not seem accurate (pic below). How do I fix this?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)
dti = pd.date_range('2012-12-31', periods=30, freq='Q')
s2 = pd.Series(np.random.randint(100,1000,size=(30)),index=dti)
df4 = s2.to_frame(name='count')
print('\ndf4:')
print(df4)
print(type(df4))
f2 = plt.figure("Quarterly",figsize=(10,5))
ax = plt.subplot(1,1,1)
ax.bar(df4.index,df4['count'])
plt.tight_layout()
plt.show()
Unfortunately, matplotlib's bar plots don't seem to play along very happily with pandas dates.
In theory, matplotlib expresses the bar widths in days. But if you try something like ax.bar(df4.index,df4['count'], width=30), you'll see the plot with extremely wide bars, almost completely filling the plot. When experimenting with the width, something odd happens: when the width is smaller than 2, it looks like it is expressed in days, but when it is larger than 2, it suddenly jumps to something much wider.
On my system (matplotlib 3.1.2, pandas 0.25.3, Windows) it looks like:
A workaround uses the bar plots from pandas. These seem to make the bars categorical, with one tick per bar. But they get labelled with a full date including hours, minutes and seconds. You could relabel them, for example like:
df4.plot.bar(y='count', width=0.9, ax=ax)
plt.xticks(range(len(df4.index)),
[t.to_pydatetime().strftime("%b '%y") for t in df4.index],
rotation=90)
Investigating further, the inconsistent jumping around of matplotlib's bar width seems related to the frequency built into pandas times. So, a solution could be to convert the dates to matplotlib dates. Trying this, the widths do get expressed consistently in days.
Unfortunately, the quarterly dates don't have exactly the same number of days between them, resulting in some bars too wide, and others too narrow. A solution to this next problem is explicitly calculating the number of days for each bar. In order to get nice separations between the bars, it helps to draw their edges in white.
from datetime import datetime
x = [datetime.date(t) for t in df4.index] # convert the pandas Timestamps to plain datetime.date objects
widths = [t1-t0 for t0, t1 in zip(x, x[1:])] # time differences (timedeltas) between consecutive dates
widths += [widths[-1]] # the very last bar didn't get a width, just repeat the last width
ax.bar(x, df4['count'], width=widths, edgecolor='white')
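A self-contained variant of the same per-bar width idea, as a sketch: here the dates are converted to matplotlib date numbers with mdates.date2num, so each width is a plain float number of days from np.diff. The pd.offsets.QuarterEnd() offset is an assumption standing in for the question's freq='Q' alias; it should produce the same quarter-end dates while avoiding the alias, which newer pandas versions have deprecated.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(100)
# Quarter-end dates starting 2012-12-31, as in the question.
dti = pd.date_range("2012-12-31", periods=30, freq=pd.offsets.QuarterEnd())
df4 = pd.Series(np.random.randint(100, 1000, size=30), index=dti).to_frame(name="count")

# Matplotlib date numbers are floats measured in days.
x = mdates.date2num(df4.index.to_pydatetime())
# Width of each bar = days until the next quarter; reuse the last gap for the final bar.
widths = np.diff(x)
widths = np.append(widths, widths[-1])

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(x, df4["count"], width=widths, edgecolor="white")  # white edges separate the bars
ax.xaxis_date()  # interpret the float x values as dates
fig.tight_layout()
plt.show()
```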
You can set the width of the bars via the width argument of ax.bar() to a value larger than the default of 0.8 (which, with date x-values, means 0.8 days).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(100)
dti = pd.date_range('2012-12-31', periods=30, freq='Q')
s2 = pd.Series(np.random.randint(100,1000,size=(30)),index=dti)
df4 = s2.to_frame(name='count')
f2 = plt.figure("Quarterly",figsize=(10,5))
ax = plt.subplot(1,1,1)
ax.bar(df4.index,df4['count'], width=70)
plt.tight_layout()
plt.show()
In this case the width is interpreted as a scalar in days.
Edit
For some reason the above only works correctly for older versions of matplotlib (tested 2.2.3). In order to work with current (3.1.2) version the following modification must be made:
# ...
dti = pd.date_range('2012-12-31', periods=30, freq='Q')
dti = [pd.to_datetime(t) for t in dti]
# ...
which will then produce the correct behavior in setting the widths of the bars.
I have a line chart based on a simple list of numbers. By default the x-axis is just an increment of 1 for each value plotted. I would like it to be a percentage instead but can't figure out how. So instead of an x-axis from 0 to 5, it would go from 0% to 100% (but keep reasonably spaced tick marks). Code below. Thanks!
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid.axislines import Subplot
data=[8,12,15,17,18,18.5]
fig=plt.figure(1,(7,4))
ax=Subplot(fig,111)
fig.add_subplot(ax)
plt.plot(data)
The code below gives you a simplified, percentage-based x-axis. It assumes that your values are spaced equally between 0% and 100%.
It creates a perc array holding evenly spaced percentages to plot against. It then formats the x-axis tick labels with a percent sign using matplotlib.ticker.FormatStrFormatter. Unfortunately this uses old-style (%) string formatting rather than the new-style str.format syntax.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
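If you prefer new-style formatting, matplotlib.ticker also provides StrMethodFormatter, which formats the tick value passed in as the field x. A minimal sketch of the same plot using it:

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np

data = [8, 12, 15, 17, 18, 18.5]
perc = np.linspace(0, 100, len(data))  # evenly spaced x positions from 0 to 100

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(perc, data)
# New-style format string; the tick value arrives as the field "x".
ax.xaxis.set_major_formatter(mtick.StrMethodFormatter("{x:.0f}%"))
plt.show()
```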
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you can do as follows to set the axis:
import matplotlib.ticker as mtick
# Actual plotting code omitted
ax.xaxis.set_major_formatter(mtick.PercentFormatter(5.0))
This will display values from 0 to 5 on a scale of 0% to 100%. The formatter is similar in concept to what @Ffisegydd suggests, except that it takes whatever ticks already exist into account.
PercentFormatter() accepts three arguments: max, decimals, and symbol. max sets the value that corresponds to 100% on the axis (in your example, 5); in the released version this argument is named xmax and defaults to 100.
The other two parameters set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None automatically chooses the number of decimal places based on how much of the axis you are showing.
Note that this formatter will use whatever ticks would normally be generated if you just plotted your data. It does not modify anything besides the strings that are output to the tick marks.
Update
PercentFormatter was accepted into Matplotlib in version 2.1.0.
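A complete, runnable version of the snippet above, as a sketch using the released keyword name xmax: the data is plotted against its indices 0 to 5, and the last index is treated as 100%, so no separate perc array is needed.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

data = [8, 12, 15, 17, 18, 18.5]

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(data)  # x runs over the indices 0..5
# Treat the last index (5) as 100%; decimals=None lets the formatter choose.
ax.xaxis.set_major_formatter(mtick.PercentFormatter(xmax=len(data) - 1))
plt.show()
```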
Totally late in the day, but I wrote this and thought it could be of use:
import pandas as pd

def transformColToPercents(x, rnd, navalue):
    # Returns a pandas Series (e.g. for a new dataframe column) with the values min-max scaled to 0-100%
    # rnd: number of decimal places to round to
    # navalue: value used to replace NaN results
    hv = x.max(axis=0)
    lv = x.min(axis=0)
    pp = pd.Series(((x-lv)*100)/(hv-lv)).round(rnd)
    return pp.fillna(navalue)

df['new column'] = transformColToPercents(df['a'], 2, 0)
I am trying to plot datetimes on the y-axis and times on the x-axis using a bar graph. I need to specify the bar heights in terms of the y-axis datetimes, and I am not sure how to do that.
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.dates  # ensures mpl.dates is available
import numpy as np
import datetime as dt
# Make a series of events 1 day apart
y = mpl.dates.drange(dt.datetime(2009,10,1),
dt.datetime(2010,1,15),
dt.timedelta(days=1))
# Vary the datetimes so that they occur at random times
# Remember, 1.0 is equivalent to 1 day in this case...
y += np.random.random(y.size)
# We can extract the time by using a modulo 1, and adding an arbitrary base date
times = y % 1 + int(y[0]) # (The int is so the y-axis starts at midnight...)
# I'm just plotting points here, but you could just as easily use a bar.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(times, y, width = 10)
ax.yaxis_date()
ax.yaxis.set_major_formatter(mpl.dates.DateFormatter('%d/%b/%Y')) # Figure has autofmt_xdate() but no autofmt_ydate()
plt.show()
I think it could be your y range.
When I run your code I get a plot with the y-axis ranging from about 11/Dec/2011 to 21/Dec/2011.
However, you've generated dates ranging from 1/10/2009 to 15/1/2010.
When I adjusted my y limits to take this into account it worked fine. I added this after the ax.bar.
plt.ylim( (min(y)-1,max(y)+1) )
Another reason the output is confusing is that since you've picked a width of 10, the bars are too wide and are actually obscuring each other.
Try using ax.plot(times,y,'ro') to see what I mean.
I produced the following plot using ax.bar(times,y,width=.1,alpha=.2) and ax.plot(times,y,'ro') to show you what I meant about bars overlapping each other:
And that's with a width of .1 for the bars, so if they had a width of 10 they'd be completely obscuring each other.
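Putting both suggestions together, a minimal sketch: narrow bars (0.01 days, about 15 minutes, an arbitrary choice) with the points overlaid, and the y-limits set explicitly to the generated date range. It uses a seeded NumPy Generator instead of np.random.random so the output is reproducible, and sets the limits last so later axis configuration cannot change them.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
# One event per day, shifted to a random time of day (1.0 == 1 day).
y = mdates.drange(dt.datetime(2009, 10, 1), dt.datetime(2010, 1, 15),
                  dt.timedelta(days=1))
y = y + rng.random(y.size)
# Time of day on an arbitrary base date, so the x-axis starts at midnight.
times = y % 1 + int(y[0])

fig, ax = plt.subplots()
ax.bar(times, y, width=0.01, alpha=0.2)  # narrow bars so they don't overlap
ax.plot(times, y, "ro", markersize=3)    # the actual data points
ax.xaxis_date()
ax.yaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
# Restrict the y-axis to the data's date range (bars start at 0 otherwise).
ax.set_ylim(y.min() - 1, y.max() + 1)
plt.show()
```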