Plot with many data points - python

New to matplotlib. I have two y-axes created with ax.twinx(), and daily data going back 20 years. One of the plots (LHS, red) shows up as expected, but when I add the second plot (RHS, blue), it doesn't look the way I want because there is a large variation between the data points.
How can I fix it? I want it to be a smooth line.
When I add the second subplot
This is what I want the blue line to look like:
Before I add the second subplot

Related

How to change tick labels for plot chart from 19:00-7:00 hours in matplotlib

I am trying to plot line charts for both nighttime and daytime to compare the differences in traffic volume in both time periods.
plt.subplot(2,1,1) #plot in grid chart to better compare differences
by_hour_business_night['traffic_volume'].plot.line()
plt.title('Business Nights Traffic Volume by Hours')
plt.ylabel('Traffic Volume')
plt.ylim(0,6500)
plt.show()
The chart for nighttime shows up alright, but the xtick labels are [0,5,10,15,20,25]. How can I change the labels to fit the hours? Something along the lines of: [0,1,2,3,4,5,6,19,20,21,22,23]
I have tried
x=[0,1,2,3,4,5,6,19,20,21,22,23]
plt.xticks(x)
But then I just got [0-6] on the left and [19-23] on the right, both crammed at either end, leaving the middle of the x axis blank.
Or is there a better way to plot the chart? Since there will be a break between hours 6 and 19, is there a way to avoid this?
I am new to python and matplotlib, so forgive me if my wording isn't precise enough.
xticks takes two arguments: an array-like object of the tick placements and an array-like object of the labels. So you can do something like this:
plt.xticks(x, x)
This sets each label equal to the placement of its xtick. For more info, see the matplotlib documentation for xticks.
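Because plt.xticks(x, x) still places the ticks at the hour values themselves, the gap between 6 and 19 remains. A minimal sketch of one way around this, assuming the hours should read from evening to early morning: plot against evenly spaced positions and use the hours only as labels. The volume numbers below are made up for illustration and are not from the question.

```python
import matplotlib.pyplot as plt

# Night-time hours in display order (evening first, then early morning).
hours = [19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6]
# Made-up traffic volumes, one per hour, purely for illustration.
volume = [3200, 2800, 2400, 1900, 1300, 800, 500, 400, 350, 700, 2400, 5200]

# Plot against evenly spaced positions instead of the hour values,
# so there is no empty stretch between hour 6 and hour 19.
positions = list(range(len(hours)))

fig, ax = plt.subplots()
ax.plot(positions, volume)

# Placements are the positions; labels are the hours they stand for.
ax.set_xticks(positions)
ax.set_xticklabels([str(h) for h in hours])

ax.set_title('Business Nights Traffic Volume by Hours')
ax.set_ylabel('Traffic Volume')
ax.set_ylim(0, 6500)
```

Call plt.show() afterwards to display the figure. With real data, the series would first need to be reordered to match the order of hours.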

I need to plot multiple mayavi plots in one screen with the ability to choose one at a time

I have a data frame with 11 columns. The first 3 columns give me the x, y, z coordinates. I need to plot each of the remaining 8 columns separately.
mlab.plot3d(df['x_coordinate'],df['y_coordinate'],df['z_coordinate'],df['Fx'],tube_radius=1, colormap='copper', opacity = 0.5)
mlab.show()
mlab.plot3d(df['x_coordinate'],df['y_coordinate'],df['z_coordinate'],df['Fy'],tube_radius=0.5, colormap='spectral', opacity = 0.5)
mlab.show()
This is example code in which I am trying to plot two columns. I am satisfied with the result, but with this, the plots of "Fx" and "Fy" all end up in one plot. I want a single screen in which I can switch between "Fx", "Fy", etc. As you can see in the image, once I plot all 8 columns it will be hard to read from one image.

python bar chart not centered

I am trying to build a simple histogram. For some reason, my bars are behaving abnormally. As you can see in this picture, my bar over "3" is moved to the right side. I am not sure what caused it. I did align='mid' but it did not fix it.
This is the code that I used to create it:
def createBarChart(colName):
    df[colName].hist(align='mid')
    plt.title(str(colName))
    RUNS = [1,2,3,4,5]
    plt.xticks(RUNS)
    plt.show()

for column in colName:
    createBarChart(column)
And this is what I got:
bar is not centered over 3
To recreate my data:
df = pd.DataFrame(np.random.randint(1,6,size=(100, 4)), columns=list('ABCD'))
Thank you for your help!
P/s: idk if this info is relevant, but I am using seaborn-whitegrid style. I tried to recreate a plot with sample data and it's still showing up. Is it a bug?
hist created using random data
The hist function is behaving exactly as it is supposed to. By default it splits the data you pass into 10 bins, with the left edge of the first bin at the data's minimum value and the right edge of the last bin at its maximum. The chart below shows the randomly generated data binned this way, with red dashed lines to mark the edges of the bins.
The way around this is to define the bin edges yourself, with a slight adjustment to the minimum and maximum values to centre the bars over the x axis ticks. This can be done quite easily with numpy's linspace function (using column A in the randomly generated data frame as an example):
bins = np.linspace(df["A"].min() - .5, df["A"].max() + .5, 6)
df["A"].hist(bins=bins)
We ask for 6 values because we are defining the bin edges; 6 edges give 5 bins, as shown in this chart:
If you wanted to keep the gaps between the bars you can increase the number of bins to 9 and adjust the offset slightly, but this wouldn't work in all cases (it works here because every value is either 1, 2, 3, 4 or 5).
bins = np.linspace(df["A"].min() - .25, df["A"].max() + .25, 10)
df["A"].hist(bins=bins)
Finally, as this data contains discrete values and really you are plotting the counts, you could use the value_counts function to create a series that can then be plotted as a bar chart:
df["A"].value_counts().sort_index().plot(kind="bar")
# Provide a 'color' argument if you need all of the bars to look the same.
df["A"].value_counts().sort_index().plot(kind="bar", color="steelblue")
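The bin-edge approach described above can be checked numerically without drawing anything. This is a minimal sketch using a small made-up array in place of a data frame column; the names values, edges and centres are illustrative only.

```python
import numpy as np

# Made-up discrete data standing in for one column of the data frame.
values = np.array([1, 2, 2, 3, 3, 3, 4, 5, 5])

# Edges at 0.5, 1.5, ..., 5.5 put each integer in the middle of its own bin.
edges = np.arange(values.min() - 0.5, values.max() + 1.5, 1.0)

# Count how many values fall into each bin.
counts, _ = np.histogram(values, bins=edges)

# The midpoint of each bin is where its bar will be centred.
centres = (edges[:-1] + edges[1:]) / 2

print(edges)    # six edges, so five bins
print(counts)   # one count per integer value
print(centres)  # bar centres line up with the integers 1 to 5
```

The same edges array can be passed as bins=edges to hist, which should centre each bar over its tick.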
Try using something like this in your code to line up all of the histogram bars with the ticks.
plt.hist(data, bins=range(1,7), align='left', rwidth=1, density=True)
Put your own data in place of data. In current matplotlib versions, density=True replaces the older normed=True argument, which has been removed.

Matplotlib markers which plot and render fast

I'm using matplotlib to plot 5 sets of approx. 400,000 data points each. Although each set of points is plotted in a different color, I need different markers for people reading the graph on black and white print-outs. The issue I'm facing is that almost all of the possible markers available in the documentation at http://matplotlib.org/api/markers_api.html take too much time to plot and render while displaying. I could only find two markers which plot and render quickly, these are '-' and '--'. Here's my code:
plt.plot(series1,'--',label='Label 1',lw=5)
plt.plot(series2,'-',label='Label 2',lw=5)
plt.plot(series3,'^',label='Label 3',lw=5)
plt.plot(series4,'*',label='Label 4',lw=5)
plt.plot(series5,'_',label='Label 5',lw=5)
I tried multiple markers. Series 1 and series 2 plot quickly and render in no time. But series 3, 4, and 5 take forever to plot and AGES to display.
I'm not able to figure out the reason behind this. Does someone know of more markers that plot and render quickly?
The first two ('--' and '-') are linestyles, not markers. That's why they are rendered faster.
It doesn't make sense to plot ~400,000 markers. You won't be able to see all of them. However, what you could do is plot only a subset of the points.
So add the line with all your data (even though you could probably subsample that too) and then add a second "line" with only the markers.
For that you need an "x" vector, which you can subsample too:
# define the number of markers you want
nrmarkers = 100
# define a x-vector
x = np.arange(len(series3))
# calculate the subsampling step size
subsample = int(len(series3) / nrmarkers)
# plot the line
plt.plot(x, series3, color='g', label='Label 3', lw=5)
# plot the markers (using every `subsample`-th data point)
plt.plot(x[::subsample], series3[::subsample], color='g',
         lw=5, linestyle='', marker='*')
# similar procedure for series4 and series5
Note: The code is written from scratch and not tested
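As an alternative to slicing the data by hand, matplotlib lines also accept a markevery argument that draws a marker only on every n-th point. The sketch below is not the answer's method, just a variation on it; the series3 data is synthetic and nrmarkers is reused from the code above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for one of the ~400,000-point series.
rng = np.random.default_rng(0)
series3 = np.cumsum(rng.normal(size=400_000))

# Aim for roughly this many markers along the line.
nrmarkers = 100
step = max(1, len(series3) // nrmarkers)

fig, ax = plt.subplots()
# One call draws the full line but places a marker only on every `step`-th point.
(line,) = ax.plot(series3, color='g', marker='*', markevery=step, label='Label 3')

# A fixed legend location avoids the 'best' placement search over all points.
ax.legend(loc='upper left')
```

Call plt.show() to display the result. Since the line and its markers belong to a single artist here, the legend entry shows both together.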

grids of graphs in matplotlib

Using the axes notation for matplotlib has allowed me to manually plot a grid of 2x2 or 3x3 or whatever size (if I know what size grid I want beforehand).
However, how do you determine automatically what size grid is needed? For example, what if you don't know how many unique values are in a column that you want to graph?
I am thinking there must be a way of doing this in a loop, working out how big the grid needs to be from the number of unique values in the column.
Example
When I plot this, for some reason it doesn't show month_name on the x axis (as in Jan, Feb, Mar etc.)
avg_all_account.plot(legend=False,subplots=True,x='month_date',figsize=(10,20))
plt.xlabel('month')
plt.ylabel('number of proposals')
Yet when I plot subplots on a figure and specify the x axis parameter x='month_name', the month name appears on the plot:
f = plt.figure()
f.set_figheight(8)
f.set_figwidth(8)
f.sharex=True
f.sharey=True
#graph1 = f.add_subplot(2,2,1)
avg_all_account.loc[:,['month_date','number_open_proposals_all']].plot(ax=f.add_subplot(331),legend=False,subplots=True,x='month_date',y='number_open_proposals_all',title='open proposals')
plt.xlabel('month')
plt.ylabel('number of proposals')
Since the subplot method worked and showed the month_name on the x axis along with my x and y axis labels, I want to know: how can I work out how many subplots I need without calculating it by hand first, then writing out each line and hard coding the subplot position?
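One possible way to avoid hard coding the positions, sketched under assumptions: derive a near-square grid from the number of unique values, create all axes with plt.subplots, and loop over them. The grid_shape helper, the accounts list and the placeholder data are hypothetical and not from the question; the helper assumes at least one plot.

```python
import math
import matplotlib.pyplot as plt


def grid_shape(n_plots):
    """Return (rows, cols) of a near-square grid with at least n_plots cells."""
    cols = math.ceil(math.sqrt(n_plots))
    rows = math.ceil(n_plots / cols)
    return rows, cols


# Hypothetical unique values of the column being graphed,
# e.g. what df['account'].unique() might return.
accounts = ['A', 'B', 'C', 'D', 'E']

rows, cols = grid_shape(len(accounts))
# squeeze=False keeps `axes` two-dimensional even for a 1x1 grid.
fig, axes = plt.subplots(rows, cols, figsize=(8, 8),
                         sharex=True, sharey=True, squeeze=False)

flat_axes = axes.ravel()
for ax, name in zip(flat_axes, accounts):
    ax.plot([1, 2, 3], [1, 2, 3])  # placeholder data for illustration
    ax.set_title(name)
    ax.set_xlabel('month')
    ax.set_ylabel('number of proposals')

# Hide any grid cells left over when the count is not a perfect fit.
for ax in flat_axes[len(accounts):]:
    ax.set_visible(False)
```

With a pandas data frame, each ax from the loop could be passed as ax=ax to the frame's plot method in place of the placeholder line.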
