I have a script for plotting big data sets, and I have a problem setting the xticks in my plot. I have tried the following code:
plt.xticks(configDBcomplete.index, data, rotation=90, fontsize=12)
The problem is that I have more than 2000 data points for x, so the tick labels overlap. I want a tick at every 5th data point. I have tried using np.arange:
plt.xticks(np.arange(min(configDBcomplete.index), max(configDBcomplete.index[-1]), 5), data, rotation=90, fontsize=12)
but it labels the ticks with the first 50 data points instead of the ones that correspond to each tick position. Any idea how to solve this?
Currently you are using the whole data array for the x tick labels. The first argument to the xticks() function is the locations of the ticks and the second argument is the tick labels.
You need to slice the labels so that only every 5th data point is used, matching the ticks. Slicing with [::5] does exactly this, so pass data[::5] to xticks():
# np.arange excludes its stop value, so use max(...) + 1 to get one tick per label in data[::5]
plt.xticks(np.arange(min(configDBcomplete.index), max(configDBcomplete.index) + 1, 5), data[::5], rotation=90, fontsize=12)
You can also use range() (this requires an integer index):
plt.xticks(range(min(configDBcomplete.index), max(configDBcomplete.index) + 1, 5), data[::5], rotation=90, fontsize=12)
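A minimal self-contained sketch of the same idea, using hypothetical data (2000 integer positions named x and string labels named labels) in place of configDBcomplete and data. It uses the object-oriented set_xticks/set_xticklabels pair instead of plt.xticks and slices positions and labels with the same step, so the two always have the same length:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 2000 points at integer positions 0..1999, one label each
x = np.arange(2000)
y = np.random.default_rng(0).normal(size=x.size).cumsum()
labels = [f"pt{i}" for i in x]

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(x, y)

# Use the same step for positions and labels so they stay paired:
# the k-th tick sits at x[5 * k] and shows labels[5 * k]
step = 5
ax.set_xticks(x[::step])
ax.set_xticklabels(labels[::step], rotation=90, fontsize=12)
```

Slicing both sequences with one step is what keeps them paired; recent Matplotlib versions may raise an error when the number of labels differs from the number of fixed tick positions. With 400 labels the axis can still be crowded, so a larger step may be needed.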
I am drawing some graphs and I want to import them into LaTeX in a 2-by-2 layout. One problem is that the y-axis values of one graph range from 1 to 6, while those of another range from 1 to 200. Because of that, the graphs do not look good when I import them into my document. Is there any way to give the y-axis tick labels the same width in every graph?
You can set the y axis limits using ax.set_ylim or plt.ylim:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Set the y-axis range from 1 to 200
ax.set_ylim((1, 200))
# Or use pyplot directly, which acts on the current axes
plt.ylim((1, 200))
Edit: the question is about widths rather than limits.
I think drawing all four graphs as subplots of one figure should solve this problem:
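import matplotlib.pyplot as plt

# x1, y1 ... x4, y4 are your four data sets
plt.figure()
plt.subplot(2, 2, 1)
plt.plot(x1, y1)
plt.subplot(2, 2, 2)
plt.plot(x2, y2)
plt.subplot(2, 2, 3)
plt.plot(x3, y3)
plt.subplot(2, 2, 4)
plt.plot(x4, y4)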
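If the four graphs have to stay as separate files, another option (a swapped-in technique, not part of the answer above) is to give every figure the same size and the same fixed margins, so the plotting area is identical however wide the y tick labels are. A minimal sketch; make_panel is a hypothetical helper, and the figure size and margin fractions are assumed values to tune:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def make_panel(x, y, filename=None):
    """Hypothetical helper: one graph with a fixed figure size and fixed margins.

    Because every panel uses the same figsize and the same subplots_adjust
    fractions, the axes box has the same size and position in each figure,
    however wide the y tick labels are.
    """
    fig, ax = plt.subplots(figsize=(4, 3))
    # Assumed margins: 20% on the left is enough for labels such as "200"
    fig.subplots_adjust(left=0.2, right=0.95, bottom=0.15, top=0.95)
    ax.plot(x, y)
    if filename is not None:
        # Plain savefig keeps the whole figure; bbox_inches="tight" would crop
        # each figure to its own labels and undo the alignment
        fig.savefig(filename)
    return fig, ax


x = np.linspace(0, 10, 50)
fig_a, ax_a = make_panel(x, 1 + 5 * np.abs(np.sin(x)))    # values roughly 1..6
fig_b, ax_b = make_panel(x, 1 + 199 * np.abs(np.sin(x)))  # values roughly 1..200
```

When the files are included in LaTeX at the same width, the axes boxes then line up, because only the space reserved for labels differs in how full it is, not in its size.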
I am trying to build a simple histogram, but the bars are not where I expect them. As you can see in this picture, the bar over "3" is shifted to the right. I am not sure what causes this. I set align='mid', but that did not fix it.
This is the code that I used to create it:
import matplotlib.pyplot as plt

def createBarChart(colName):
    df[colName].hist(align='mid')
    plt.title(str(colName))
    RUNS = [1, 2, 3, 4, 5]
    plt.xticks(RUNS)
    plt.show()

# Draw one histogram per column of df
for column in df.columns:
    createBarChart(column)
And this is what I got:
[Image: bar is not centered over 3]
To recreate my data:
df = pd.DataFrame(np.random.randint(1,6,size=(100, 4)), columns=list('ABCD'))
Thank you for your help!
P.S. I don't know if this is relevant, but I am using the seaborn-whitegrid style. I tried to recreate the plot with sample data and the problem still shows up. Is it a bug?
[Image: hist created using random data]
The hist function is behaving exactly as it is supposed to. By default it splits the data you pass into 10 bins, with the left edge of the first bin at the data's minimum value and the right edge of the last bin at its maximum. The chart below shows the randomly generated data binned this way, with red dashed lines to mark the edges of the bins.
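To see this without a plot, the sketch below bins a small hypothetical sample with np.histogram, which is what matplotlib's hist (and therefore pandas' hist) uses for binning, with the same default of 10 equal-width bins:

```python
import numpy as np

# Hypothetical sample containing each of the discrete values 1..5
data = np.array([1, 2, 3, 4, 5, 3, 3, 1])

# With no bins argument, np.histogram uses 10 equal-width bins from min to max
counts, edges = np.histogram(data)

# Print only the bins that actually contain data
for i in np.nonzero(counts)[0]:
    print(f"bin {i}: [{edges[i]:.1f}, {edges[i + 1]:.1f}]  count={counts[i]}")
```

Only bins 0, 2, 5, 7 and 9 are non-empty. The value 3 falls in the bin from 3.0 to 3.4, so its bar sits to the right of the tick, as in the question; 1 is shifted right as well and 5 is shifted left, while 2 and 4 happen to land in bins centred on them.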
The way around this is to define the bin edges yourself, with a slight adjustment to the minimum and maximum values to centre the bars over the x axis ticks. This can be done quite easily with numpy's linspace function (using column A in the randomly generated data frame as an example):
bins = np.linspace(df["A"].min() - .5, df["A"].max() + .5, 6)
df["A"].hist(bins=bins)
We ask for 6 values because we are defining the bin edges; 6 edges give 5 bins, as shown in this chart:
If you want to keep the gaps between the bars, you can increase the number of bins to 9 (10 edges) and adjust the offset slightly, but this won't work in all cases (it works here because every value is 1, 2, 3, 4 or 5).
bins = np.linspace(df["A"].min() - .25, df["A"].max() + .25, 10)
df["A"].hist(bins=bins)
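The linspace calls above hard-code the number of edges (6, or 10 for the version with gaps). For integer data with any range, the number of edges can be computed from the minimum and maximum instead. A minimal sketch; centred_integer_bins is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Same kind of data as in the question: integers 1..5 in four columns
df = pd.DataFrame(np.random.randint(1, 6, size=(100, 4)), columns=list('ABCD'))


def centred_integer_bins(series):
    """Bin edges half a unit either side of every integer in the series' range.

    Generalises np.linspace(min - .5, max + .5, 6): using (max - min + 2)
    edges gives exactly one bin of width 1 centred on each integer.
    """
    lo, hi = int(series.min()), int(series.max())
    return np.linspace(lo - 0.5, hi + 0.5, hi - lo + 2)


bins = centred_integer_bins(df["A"])
counts, edges = np.histogram(df["A"], bins=bins)
```

The resulting edges can be passed straight to df["A"].hist(bins=bins) as in the answer; integers that never occur simply get an empty bin.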
Finally, as this data contains discrete values and really you are plotting the counts, you could use the value_counts function to create a series that can then be plotted as a bar chart:
df["A"].value_counts().sort_index().plot(kind="bar")
# Provide a 'color' argument if you need all of the bars to look the same.
df["A"].value_counts().sort_index().plot(kind="bar", color="steelblue")
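Try something like this to center each histogram bar on its value:
plt.hist(data, bins=range(1, 7), align='left', rwidth=1, density=True)
Replace data with your own data, e.g. df["A"]. With bins=range(1, 7) the bin edges are 1, 2, ..., 6, and align='left' centers each bar on the left edge of its bin, i.e. over 1, 2, 3, 4 and 5. (Older answers use normed=True here; normed has been removed from recent Matplotlib versions, and density=True is its replacement.)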
I am trying to plot a graph with Matplotlib using the following code:
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots()
axs.set_xlim([1, 5])
axs.grid()
axs.errorbar(plot1_dataerr[1], range(len(plot1_dataerr[1])), xerr=plot1_dataerr[2], fmt='k o')
axs.yaxis.set_ticks(np.arange(len(plot1_dataerr[1])))
axs.set_yticklabels(plot1_dataerr[0])
The variable plot1_dataerr contains the labels for the data as its 0th element, the means as its 1st element and the half-widths of the error bars as its 2nd element. When I run this code (along with the exact data) I get the following:
However, as you can see, some of the y-axis tick labels are cut off; they should all start with 'vegetable based side dishes'. What should I change so that everything fits? I don't mind if some of the labels need to occupy two lines.
Thanks in advance!
You probably need to increase the left margin. For automatic adjustment, use
fig.tight_layout()
Otherwise, start with
fig.subplots_adjust(left=0.4)
and decrease the value until you are happy with the result.
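Since the question allows labels on two lines, one option (a swapped-in technique, not part of the answer above) is to wrap the labels with Python's textwrap module before setting them, and then reserve the left margin as described. A minimal sketch; the contents of plot1_dataerr below are made-up placeholders and the wrap width of 25 characters is an assumption:

```python
import textwrap

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for plot1_dataerr: [labels, means, error-bar half-widths]
plot1_dataerr = [
    ["vegetable based side dishes with potatoes",
     "vegetable based side dishes with rice",
     "vegetable based side dishes without starch"],
    [2.5, 3.1, 4.0],
    [0.4, 0.3, 0.5],
]
labels, means, half_widths = plot1_dataerr
ypos = np.arange(len(labels))

fig, axs = plt.subplots()
axs.set_xlim([1, 5])
axs.grid()
axs.errorbar(means, ypos, xerr=half_widths, fmt='ko')

# Break each label into lines of at most 25 characters (assumed width)
wrapped = [textwrap.fill(label, width=25) for label in labels]
axs.set_yticks(ypos)
axs.set_yticklabels(wrapped)

# Reserve room on the left for the labels; tune the fraction by eye
fig.subplots_adjust(left=0.4)
```

Wrapping shortens the longest line of each label, so a smaller left margin is usually enough than with the unwrapped text.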
I'm using Matplotlib to plot data on Ubuntu 15.10. My y-axis has numeric values and my x-axis timestamps.
The date labels overlap each other, which looks bad. How can I increase the distance between the x-axis ticks/labels while keeping them evenly spaced? Since the automatic tick selection was poor, I'm fine with setting the number of date ticks manually. Any other solution is welcome, too.
In addition, I'm using the following DateFormatter:
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
formatter = DateFormatter('%m/%d/%y')
axis = plt.gca()
axis.xaxis.set_major_formatter(formatter)
You could add the following to your code:
plt.gcf().autofmt_xdate()
This automatically formats the x-axis for you: by default it rotates the tick labels by 30 degrees, right-aligns them and raises the bottom margin.
You can also cap the number of ticks on your x-axis to keep it from getting crowded:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
max_xticks = 10
xloc = MaxNLocator(max_xticks)
ax = plt.gca()
ax.xaxis.set_major_locator(xloc)
I personally use both together as it makes the graph look much nicer when using dates.
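Putting the pieces of this answer together with the DateFormatter from the question gives something like the following sketch; the daily dates and values are hypothetical data. Note that MaxNLocator's argument is the maximum number of intervals between ticks, not an exact tick count:

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from matplotlib.ticker import MaxNLocator

# Hypothetical data: one value per day for 120 days
dates = [datetime(2016, 1, 1) + timedelta(days=i) for i in range(120)]
values = [i % 17 for i in range(120)]

fig, ax = plt.subplots()
ax.plot(dates, values)

# Format tick labels as month/day/year, as in the question
ax.xaxis.set_major_formatter(DateFormatter('%m/%d/%y'))
# At most 10 intervals between major ticks
ax.xaxis.set_major_locator(MaxNLocator(10))
# Rotate and right-align the labels and raise the bottom margin
fig.autofmt_xdate()
```

MaxNLocator picks "nice" numbers, which on a date axis are counts of days rather than calendar boundaries. If ticks should fall on whole months or weeks, a date-aware locator from matplotlib.dates (for example AutoDateLocator with its maxticks argument) is an alternative.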
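You can simply set the locations you want to be labeled, for example the first, middle and last timestamps:
axis.set_xticks(x[[0, len(x) // 2, -1]])
where x is your NumPy array of timestamps (a plain Python list does not support this kind of indexing).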
Using matplotlib's axes notation has allowed me to manually plot a 2x2, 3x3 or any other size grid (as long as I know beforehand what size grid I want).
However, how can the required grid size be determined automatically, for example when you don't know how many unique values there are in a column you want to plot?
I think there must be a way to do this in a loop, working out from the number of unique values in the column how big the grid needs to be.
Example
When I plot this, for some reason the month names (Jan, Feb, Mar, etc.) do not appear on the x-axis:
avg_all_account.plot(legend=False,subplots=True,x='month_date',figsize=(10,20))
plt.xlabel('month')
plt.ylabel('number of proposals')
Yet when I plot subplots on a figure and specify the x-axis parameter x='month_name', the month names appear on the plot:
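f = plt.figure()
f.set_figheight(8)
f.set_figwidth(8)
# Note: these two assignments only create attributes on the Figure and do not share axes;
# sharing is requested with plt.subplots(..., sharex=True, sharey=True)
f.sharex=True
f.sharey=True
#graph1 = f.add_subplot(2,2,1)
# .ix has been removed from pandas; .loc selects the rows and columns by label
avg_all_account.loc[:, ['month_date','number_open_proposals_all']].plot(ax=f.add_subplot(331),legend=False,subplots=True,x='month_date',y='number_open_proposals_all',title='open proposals')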
plt.xlabel('month')
plt.ylabel('number of proposals')
So, since the subplot approach works and shows the month names on the x-axis along with my x and y axis labels, how can I work out how many subplots I need without calculating it by hand first and then writing out each line with a hard-coded subplot position?
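One way to size the grid automatically (a sketch, not the only option) is to count the unique values first (df[col].nunique(), or groupby(...).ngroups as below), pick a near-square grid with math.ceil, create all the axes at once with plt.subplots, fill them in a loop and hide any spare panels. The data frame, its account column and the grid_shape helper are hypothetical stand-ins for avg_all_account and its columns:

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical long-format data: 12 months of proposal counts per account
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "account": np.repeat(["a", "b", "c", "d", "e"], 12),
    "month_name": months * 5,
    "number_open_proposals_all": rng.integers(0, 50, size=60),
})


def grid_shape(n):
    """Hypothetical helper: (nrows, ncols) of a near-square grid for n panels."""
    ncols = math.ceil(math.sqrt(n))
    nrows = math.ceil(n / ncols)
    return nrows, ncols


groups = df.groupby("account")
nrows, ncols = grid_shape(groups.ngroups)
# squeeze=False keeps `axes` 2-D even when the grid is 1x1 or 1xN
fig, axes = plt.subplots(nrows, ncols, sharey=True, squeeze=False,
                         figsize=(4 * ncols, 3 * nrows))
flat_axes = axes.ravel()

for ax, (name, group) in zip(flat_axes, groups):
    ax.plot(group["month_name"], group["number_open_proposals_all"])
    ax.set_title(str(name))
    ax.set_xlabel("month")
    ax.set_ylabel("number of proposals")

# Hide the panels left over when n does not fill the grid exactly
for ax in flat_axes[groups.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
```

With five groups this gives a 2x3 grid with one hidden panel. Plotting the month names as strings lets matplotlib treat them as categories, so they appear as tick labels in data order.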