How to change the scale of a plot with Matplotlib and add increments? - python

I am making a simple plot in Python with Matplotlib that shows the populations of different regions over time. I have a CSV file with one column per region holding that region's population over the years, so the years are on the x-axis and the population is on the y-axis. The plot looks okay except for the y-axis. As you can see in the image, every single population value is included on the y-axis, which is too many values and is unnecessary. I would like the y-axis to have regular increments (such as 100 million). Is there a simple way to do that, or would I have to add my own increments manually?
I tried linear and logarithmic scaling, but I would still prefer to have increments on the y-axis.
This is what the plot looks like right now.
(I took out unnecessary code such as legend and formatting):
import pandas as pd
import matplotlib.pyplot as plt
data2 = pd.read_csv('data02_world.csv')
for region in data2:
    if region != 'Year':
        plt.plot(data2.Year, data2[region], marker='.', label=region)
plt.xlabel('Year')
plt.ylabel('Population')
plt.show()

I think you can do this simply with pandas:
data2 = pd.read_csv('data02_world.csv')
data2.set_index('Year', inplace=True)
data2.plot()
If you would rather use Matplotlib directly, plt.yticks is what you need.
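A minimal sketch of fixed y-axis increments, assuming the population columns are numeric. The region names and numbers below are made up to stand in for data02_world.csv, and MultipleLocator is used here as one alternative to passing explicit positions to plt.yticks. If every value still shows up as its own tick, the columns may have been read as strings (Matplotlib treats strings as categories); converting them with pd.to_numeric would be one thing to check.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

# Made-up numbers standing in for data02_world.csv (assumption)
data2 = pd.DataFrame({
    "Year": [1990, 2000, 2010, 2020],
    "Region A": [310e6, 350e6, 420e6, 480e6],
    "Region B": [120e6, 180e6, 260e6, 330e6],
})

fig, ax = plt.subplots()
for region in data2.columns.drop("Year"):
    ax.plot(data2["Year"], data2[region], marker=".", label=region)

# Place one major tick every 100 million
ax.yaxis.set_major_locator(MultipleLocator(100e6))
# Show the tick values in millions, e.g. "300M"
ax.yaxis.set_major_formatter(
    FuncFormatter(lambda value, pos: f"{value / 1e6:.0f}M")
)
ax.set_xlabel("Year")
ax.set_ylabel("Population")
ax.legend()
```

For interactive use, drop the matplotlib.use("Agg") line and finish with plt.show().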

Related

How to skip some x-values in matplotlib plots to reduce the density?

I'm trying to plot minimum and maximum daily temperature values for the last 20 years. Since there are so many days, my plot looks too cluttered.
How can I change the frequency of days to reduce the density of my graph?
In other words, I want the plot to show the weather of one day and then skip the following 2 days, without changing the dataframe.
fig, ax = plt.subplots()
colors = ["Orange", "Blue"]
for i, col in enumerate(weather_data.columns):
    # compare strings with ==, not "is" (identity), to skip the Date column
    if col == "Date":
        continue
    ax.plot('Date', col, data=weather_data)
ax.set_xlabel("Date")
ax.set_ylabel("Temperature (Celsius)")
# set 15 xticks to prevent overlapping
ax.set_xticks(np.arange(0, weather_data.shape[0], weather_data.shape[0] / 15))
ax.legend()
fig.autofmt_xdate()
ax.set_title('Time Plot of Weather')
Dataset:
https://drive.google.com/uc?id=1O-7DuL6-bkPBpz7mAUZ7M62P6EOyngG2
Hard to say without sample data, but one option is to show only one data point out of every k data points in the original DataFrame, and interpolate the missing days with straight line segments. (This is basically downsampling.)
For example, to show every 5th data point, change this line:
ax.plot('Date', col, data=weather_data)
to this:
ax.plot('Date', col, data=weather_data.iloc[::5])
There are other approaches such as nonlinear interpolation or showing a rolling average, but this should serve as a starting point.
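A rough sketch of both ideas, using a synthetic weather_data frame (the column names "Min" and "Max" and all values are assumptions, not the asker's dataset). The top panel plots every third row, which matches "show one day, skip the next two"; the bottom panel shows a 7-day centered rolling average instead. The original DataFrame is left unchanged in both cases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (assumption)
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=365, freq="D")
seasonal = 10 * np.sin(2 * np.pi * np.arange(365) / 365)
weather_data = pd.DataFrame({
    "Date": dates,
    "Min": 5 + seasonal + rng.normal(0, 2, 365),
    "Max": 15 + seasonal + rng.normal(0, 2, 365),
})

# Keep one row, skip the next two; weather_data itself is not modified
every_third = weather_data.iloc[::3]
# Alternative: smooth with a 7-day centered rolling mean
smoothed = weather_data.set_index("Date").rolling(window=7, center=True).mean()

fig, (ax_skip, ax_roll) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
for col in ["Min", "Max"]:
    ax_skip.plot(every_third["Date"], every_third[col], label=col)
    ax_roll.plot(smoothed.index, smoothed[col], label=col)
ax_skip.set_title("Every 3rd day")
ax_roll.set_title("7-day rolling average")
ax_skip.legend()
fig.autofmt_xdate()
```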

How to change tick labels for plot chart from 19:00-7:00 hours in matplotlib

I am trying to plot line charts for both nighttime and daytime to compare the differences in traffic volume in both time periods.
plt.subplot(2,1,1) #plot in grid chart to better compare differences
by_hour_business_night['traffic_volume'].plot.line()
plt.title('Business Nights Traffic Volume by Hours')
plt.ylabel('Traffic Volume')
plt.ylim(0,6500)
plt.show()
The nighttime chart renders fine, but the x-tick labels are [0,5,10,15,20,25]. How can I change the labels to match the hours, something along the lines of [0,1,2,3,4,5,6,19,20,21,22,23]?
I have tried
x=[0,1,2,3,4,5,6,19,20,21,22,23]
plt.xticks(x)
But then I just got [0-6] on the left, and [19-23] on the right, both crammed on either side, leaving the middle of the xticks blank.
Or is there a better way to plot the chart? Since there will be a breaking point between 6 and 19 hours, is there a way to avoid this?
I am new to python and matplotlib, so forgive me if my wordings aren't precise enough.
xticks takes in two arguments: an array-like object of the placements and an array-like object of the labels. So you can do something like this:
plt.xticks(x, x)
This will set each label equal to the placement of its xtick. For more information, see the Matplotlib documentation for xticks.
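Note that plt.xticks(x, x) still places the ticks at their numeric positions, so the gap between 6 and 19 stays. One possible way around that, sketched below under the assumption that the frame is indexed by hour, is to plot against consecutive positions and use the hours only as labels. The traffic numbers are made up, and starting the axis at 19:00 is just one choice of ordering.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Night hours in the order they should appear on the axis
hours = [19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6]

# Made-up volumes standing in for the real data (assumption)
by_hour_business_night = pd.DataFrame(
    {"traffic_volume": [600, 400, 300, 350, 800, 2400, 5200,
                        3300, 2800, 2200, 1600, 1000]},
    index=[0, 1, 2, 3, 4, 5, 6, 19, 20, 21, 22, 23],
)

# Reorder the rows so the night runs from 19:00 through 06:00
night = by_hour_business_night.loc[hours]
positions = list(range(len(hours)))  # 0..11, evenly spaced

fig, ax = plt.subplots()
ax.plot(positions, night["traffic_volume"])
ax.set_xticks(positions)                      # where the ticks go
ax.set_xticklabels([str(h) for h in hours])   # what the ticks say
ax.set_title("Business Nights Traffic Volume by Hours")
ax.set_ylabel("Traffic Volume")
ax.set_ylim(0, 6500)
```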

Date labels intersecting

I'm using Matplotlib to plot data on Ubuntu 15.10. My y-axis has numeric values and my x-axis timestamps.
My problem is that the date labels overlap each other, which looks bad. How do I increase the distance between the x-axis ticks/labels while keeping them evenly spaced? Since the automatic tick selection was bad, I'm okay with manually setting the number of date ticks. Any other solution is appreciated, too.
Besides, I'm using the following DateFormatter:
formatter = DateFormatter('%m/%d/%y')
axis = plt.gca()
axis.xaxis.set_major_formatter(formatter)
You could add the following to your code:
plt.gcf().autofmt_xdate()
This automatically formats the x-axis for you (for example, it rotates the labels by about 30 degrees).
You can also manually limit the number of ticks shown on the x-axis, so that it does not get crowded, with the following:
ax = plt.gca()
max_xticks = 10
xloc = plt.MaxNLocator(max_xticks)
ax.xaxis.set_major_locator(xloc)
I personally use both together as it makes the graph look much nicer when using dates.
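Put together, the two suggestions might look like the sketch below. The timestamps and values are synthetic (an assumption), and the DateFormatter pattern is the one from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import DateFormatter
from matplotlib.ticker import MaxNLocator

# Synthetic data standing in for the real series (assumption)
timestamps = pd.date_range("2015-01-01", periods=200, freq="D")
values = np.linspace(0.0, 1.0, len(timestamps)) ** 2

fig, ax = plt.subplots()
ax.plot(timestamps, values)

# Cap the number of major tick intervals on the x-axis
max_xticks = 10
ax.xaxis.set_major_locator(MaxNLocator(max_xticks))
ax.xaxis.set_major_formatter(DateFormatter("%m/%d/%y"))

# Rotate and right-align the date labels so they do not collide
fig.autofmt_xdate()
```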
You can simply set the locations you want to be labeled:
axis.set_xticks(x[[0, int(len(x)/2), -1]])
where x would be your array of timestamps

grids of graphs in matplotlib

Using the AXIS notation for matplotlib has allowed me to manually plot a grid of 2x2 or 3x3 or whatever size grid (if I know what size grid I want beforehand.)
However, how do you determine automatically what size grid is needed? For example, what if you don't know how many unique values are in a column that you want to graph?
I am thinking there must be a way of doing this in a loop and figuring out based on the number of unique values in the column this is how big the graph needs to be.
Example
When I plot this, for some reason it doesn't show month_name on the x-axis (as in Jan, Feb, Mar, etc.):
avg_all_account.plot(legend=False,subplots=True,x='month_date',figsize=(10,20))
plt.xlabel('month')
plt.ylabel('number of proposals')
Yet when I plot subplots on a figure and specify the x-axis parameter x='month_name', the month name appears on the plot:
f = plt.figure()
f.set_figheight(8)
f.set_figwidth(8)
f.sharex=True
f.sharey=True
#graph1 = f.add_subplot(2,2,1)
avg_all_account.loc[:,['month_date','number_open_proposals_all']].plot(ax=f.add_subplot(331),legend=False,subplots=True,x='month_date',y='number_open_proposals_all',title='open proposals')
plt.xlabel('month')
plt.ylabel('number of proposals')
Since the subplot approach worked and showed month_name on the x-axis along with my x- and y-axis labels, I would like to know: how can I work out how many subplots I need without calculating it by hand, writing out each line, and hard-coding each subplot position?
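One possible approach, sketched under assumptions: count the unique values, derive a roughly square grid from that count, and loop over the groups. The "account" column and the numbers below are hypothetical; only month_date and number_open_proposals_all come from the question.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: five accounts, two months each (assumption)
df = pd.DataFrame({
    "account": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"],
    "month_date": ["Jan", "Feb"] * 5,
    "number_open_proposals_all": [3, 5, 2, 4, 6, 1, 7, 2, 4, 8],
})

groups = list(df.groupby("account"))
n = len(groups)                      # number of subplots needed
ncols = math.ceil(math.sqrt(n))      # roughly square grid
nrows = math.ceil(n / ncols)

# squeeze=False always returns a 2-D array of axes, even for a 1x1 grid
fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows),
                         squeeze=False)
flat_axes = axes.ravel()

for ax, (name, group) in zip(flat_axes, groups):
    ax.plot(group["month_date"], group["number_open_proposals_all"],
            marker=".")
    ax.set_title(str(name))
    ax.set_xlabel("month")
    ax.set_ylabel("number of proposals")

# Hide any cells of the grid that were not needed
for ax in flat_axes[n:]:
    ax.set_visible(False)

fig.tight_layout()
```

With five groups this produces a 2x3 grid with one hidden cell; the same code adapts to any number of unique values.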

Plotting multiple timeseries power data using matplotlib and pandas

I have a csv file of power levels at several stations (4 in this case, though "HUT4" is not in this short excerpt):
2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
(etc . .)
The times are not synchronised across stations. The power level is (in this case) integer percent. Some machines report in volts (~13.0), which would be an additional issue when plotting.
The data is easy to read into a dataframe, to index the dataframe, to put into a dictionary. But I can't get the right syntax to make a meaningful plot. Either all stations on a single plot sharing a timeline that's big enough for all stations, or as separate plots, maybe a subplot for each station. If I do:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Power_Log.csv', names=['DT', 'Station', 'Power'])
df2 = df.groupby(['Station'])  # group the rows by station
d = dict(iter(df2))  # make a dictionary including each station's data
for stn in d.keys():
    d[stn].plot(x='DT', y='Power')
plt.legend(loc='lower right')
plt.savefig('Station_Power.png')
I do get a plot but the X axis is not right for each station.
I have not figured out yet how to do four independent subplots, which would free me from making a wide-enough timescale.
I would greatly appreciate comments on getting a single plot right and/or getting good looking subplots. The subplots do not need to have synchronised X axes.
I'd rather plot the typical way, something like:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.savefig('Station_Power.png')  # savefig requires an output filename
( http://matplotlib.org/users/pyplot_tutorial.html )
As for plotting more than one series: simply call plt.plot() multiple times, once for each data series.
P.S. you can set xticks this way: Changing the "tick frequency" on x or y axis in matplotlib?
Sorry for the comment above where I needed to add code. Still learning . .
From the 5th code line:
import matplotlib.dates as mdates
for stn in d.keys():
    plt.figure()
    d[stn].interpolate().plot(x='DT',y='Power',title=stn,rot=45)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%D/%M/%Y'))
    plt.savefig('Station_Power_'+stn+'.png')
Does more or less what I want to do except the DateFormatter line does not work. I would like to shorten my datetime data to show just date. If it places ticks at midnight that would be brilliant but not strictly necessary.
The key to getting a continuous plot is to use the interpolate() method in the plot.
With this data having different x scales from station to station a plot of all stations on the same graph does not work. HUT4 in my data has far fewer records and only plots to about 25% of the scale even though the datetime values cover more or less the same range as the other HUTs.
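A hedged sketch of one way to get date-only tick labels with one subplot per station. In strftime-style patterns `%d` is the day of the month, `%m` the month and `%M` the minute, so '%D/%M/%Y' does not mean day/month/year; '%d/%m/%Y' is used below. The sketch also assumes the DT column is parsed into real datetimes (parse_dates) and plots with Matplotlib directly rather than through DataFrame.plot. The first six rows are the excerpt from the question; the last three rows are made up so that the data spans a midnight.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

excerpt = (
    "2014-06-21T20:03:21,HUT3,74\n"
    "2014-06-21T21:03:16,HUT1,70\n"
    "2014-06-21T21:04:31,HUT3,73\n"
    "2014-06-21T21:04:33,HUT2,30\n"
    "2014-06-21T22:03:50,HUT3,64\n"
    "2014-06-21T23:03:29,HUT1,60\n"
)
# Made-up rows (assumption) so that each station crosses midnight
made_up = (
    "2014-06-22T09:10:00,HUT1,55\n"
    "2014-06-22T21:30:00,HUT2,28\n"
    "2014-06-23T08:45:00,HUT3,58\n"
)

df = pd.read_csv(io.StringIO(excerpt + made_up),
                 names=["DT", "Station", "Power"],
                 parse_dates=["DT"])  # turn the timestamps into datetimes

grouped = df.groupby("Station")
fig, axes = plt.subplots(grouped.ngroups, 1,
                         figsize=(8, 2.5 * grouped.ngroups), squeeze=False)

for ax, (stn, grp) in zip(axes[:, 0], grouped):
    # Each subplot keeps its own x range; lines connect the readings
    ax.plot(grp["DT"], grp["Power"], marker=".")
    ax.set_title(stn)
    ax.set_ylabel("Power")
    ax.xaxis.set_major_locator(mdates.DayLocator())  # ticks at midnight
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))

fig.tight_layout()
```

Because the subplots do not share an x-axis, a station with far fewer records still fills its own panel.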
