How to scale X of plot in Python? - python

I have the following code that builds the empirical function according to data stored in Z_score_list.
Z_score_list.sort()
edf = []
step = 1 / len(Z_score_list)
for i in range(len(Z_score_list)):
    edf.append(step * i)
edf = np.array(edf)
fig, ax = plt.subplots()
ax.plot(Z_score_list, edf,
        'b--', lw=3, alpha=0.6, label='Empirical')
plt.show()
As a result I have this:
There isn't enough space on the X axis, so the plot breaks and the line starts over from the beginning of the X axis. How can I scale the plot so that it is one continuous line, regardless of the size of Z_score_list?

The problem wasn't in the plot function itself: there were NaN values in Z_score_list, and each NaN made the sort start over from the index where the NaN occurred. Removing these values fixed it.
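A minimal sketch of that fix, assuming the scores are plain floats. The helper name empirical_distribution and the sample values are made up for illustration; only the idea of dropping NaN before sorting comes from the answer above.

```python
import math


def empirical_distribution(values):
    """Return the sorted non-NaN values and their empirical CDF heights."""
    # NaN compares False against every number, so a list that contains NaN
    # can come out of sort() only partially ordered. Drop NaN first.
    clean = sorted(v for v in values if not math.isnan(v))
    step = 1 / len(clean)  # raises ZeroDivisionError if nothing is left
    edf = [step * i for i in range(len(clean))]
    return clean, edf


# Hypothetical data with two NaN entries
z_scores = [0.3, float("nan"), -1.2, 2.0, float("nan"), 0.7]
xs, edf = empirical_distribution(z_scores)
```

xs and edf can then be passed to ax.plot(xs, edf, 'b--') in place of Z_score_list and the original edf.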

Related

Secondary x axis labels

I have two csv files that have been generated on one chronological basis during my recording (they both have a timestamp column based on one clock).
I want to plot my data in matplotlib (or elsewhere using python, if you have a better suggestion).
On my primary x axis, I want to have the general continuous timestamps (from csv file 1).
On my y axis I need the recordings of my desired variable (from csv file 1).
On my secondary x axis, I need to have my experiment events or annotations (from csv file 2), right at the timestamps (ticks) when they happened.
I try to plot all of these, this way:
ticks = annotations_pd_frame['timestamp']
labels = annotations_pd_frame['label']
fig, ax1 = plt.subplots()
ax2 = ax1.twiny()
fig.set_figheight(5)
fig.set_figwidth(25)
ax1.yaxis.grid()
plt.xticks(ticks, labels)
plt.plot(pupil_data_in_trial_eye0['pupil_timestamp'].loc[pupil_data_in_trial_eye0['trial'] == trial_label], pupil_data_in_trial_eye0['diameter_3d'].loc[pupil_data_in_trial_eye0['trial'] == trial_label])
plt.plot(pupil_data_in_trial_eye1['pupil_timestamp'].loc[pupil_data_in_trial_eye1['trial'] == trial_label], pupil_data_in_trial_eye1['diameter_3d'].loc[pupil_data_in_trial_eye1['trial'] == trial_label])
plt.legend(['eye0', 'eye1'])
ax1.set_xlabel('Timestamps [s]')
ax1.set_ylabel('Diameter [mm]')
plt.title('Pupil Diameter in ' + str(label) )
plt.grid(b=True)
An example of the csv files is here :
https://gist.github.com/Zahra-on-Github/aa67a3e309fa66582a118f5c08509f77
First figure is when I plot my main data using plt.plot
and I get correct ticks and labels (ticks and labels correctly shown as they happened in this one trial of data),
but incorrect timestamps on the primary x axis.
Second figure is when I plot my main data using ax1.plot
and I get correct timestamps on primary x axis,
but incorrect ticks and labels (the whole run’s ticks and labels are shown for this one trial of data).
Any ideas what I'm doing wrong?
I solved it like this:
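import matplotlib.transforms as mtransforms
for (t, l) in zip(ticks, labels):
    ax1.axvline(t, color='black', linestyle='--')
    # x in data coordinates, y in axes coordinates (1.1 is just above the top edge)
    trans = mtransforms.blended_transform_factory(ax1.transData, ax1.transAxes)
    ax1.text(t, 1.1, l, ha='center', transform=trans, rotation = 30)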
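For reference, here is a self-contained sketch of the same idea that runs without the csv files. The timestamps, diameters and event names are invented stand-ins, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms

# Hypothetical stand-ins for the recording (file 1) and the annotations (file 2)
timestamps = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
diameters = [3.1, 3.3, 3.0, 3.6, 3.4, 3.2, 3.5]
events = [(0.5, "stimulus on"), (2.0, "stimulus off")]

fig, ax1 = plt.subplots(figsize=(10, 4))
ax1.plot(timestamps, diameters, label="eye0")

# Blended transform: x is read in data coordinates, y in axes coordinates
# (0 = bottom edge of the axes, 1 = top edge).
trans = mtransforms.blended_transform_factory(ax1.transData, ax1.transAxes)

event_texts = []
for t, name in events:
    ax1.axvline(t, color="black", linestyle="--")
    event_texts.append(
        ax1.text(t, 1.02, name, ha="center", transform=trans, rotation=30)
    )

ax1.set_xlabel("Timestamps [s]")
ax1.set_ylabel("Diameter [mm]")
ax1.legend()
fig.canvas.draw()
```

Because the labels are attached to ax1 itself, there is no second x axis whose limits could drift away from the timestamps.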

How to multiply the y-axis values of a histogram by a fixed number in Python

I have a list of data to plot using histograms. I want to scale the y-axis of each plot separately. If I do it as follows, it scales each plot's y-axis by 10.
protocols = {}
types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}
for protname, values in protocols.items():
    fig, ax1 = plt.subplots()
    ax1.hist(values["col_data"], facecolor='blue', alpha=0.9, label=protname,align='left')
    y_vals = ax1.get_yticks()
    ax1.set_yticklabels(['{:3.0f}'.format(x * 10) for x in y_vals])
    plt.legend()
    plt.show()
However, I want the scaling to be separate for each histogram. I tried it as the following but it doesn't seem to be working as intended.
for protname, values in protocols.items():
    fig, ax1 = plt.subplots()
    ax1.hist(values["col_data"], facecolor='blue', alpha=0.9, label=protname,align='left')
    y_vals = ax1.get_yticks()
    ax1.set_yticklabels(['{:3.0f}'.format(x * 10) for x in y_vals if protname=="data1" and ['{:3.0f}'.format(x * 10) for x in y_vals if protname=="data2" and ['{:3.0f}'.format(x * 15) for x in y_vals if protname=="data3"]]])
    plt.legend()
    plt.show()
If we try it for ONLY one plot, as ax1.set_yticklabels(['{:3.0f}'.format(x * 10) for x in y_vals if protname=="data2"]), it applies the changes only to the second plot and leaves the others blank.
First, I'd be interested in why you want to manipulate the y-axis values: the histogram values are those of your data, and I don't see a reason for changing them without losing the meaning of your data.
That said, my next question would be generally if you intentionally set plt.subplots inside your for-loop, because one use case for this command is in fact creating several subplots into one figure - perhaps you'll think about that later...
However, the easiest way to apply different factors at different iterations is simply to add them as another list into your loop with zip:
factors = [10, 10, 15]
for (protname, values), m in zip(protocols.items(), factors):
    ...
    ax1.set_yticklabels(['{:3.0f}'.format(x * m) for x in y_vals])
    ...
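A small variation on the zip approach, sketched under the assumption that the factors are known per protocol name: keeping them in a dictionary means the pairing doesn't depend on the order of protocols. The helper scaled_tick_labels and the y_vals list are illustrative only. In the real loop, y_vals would come from ax1.get_yticks().

```python
def scaled_tick_labels(y_vals, factor):
    """Format each tick value multiplied by `factor`, using the question's format."""
    return ['{:3.0f}'.format(y * factor) for y in y_vals]


# One factor per protocol name (same values as the factors list above)
factors = {"data1": 10, "data2": 10, "data3": 15}

# Stand-in for ax1.get_yticks()
y_vals = [0, 2, 4]

labels = {name: scaled_tick_labels(y_vals, m) for name, m in factors.items()}
```

Inside the plotting loop this becomes ax1.set_yticklabels(scaled_tick_labels(y_vals, factors[protname])).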

Matplotlib.pyplot - Multiple plots on a log-log plot the third plot will not show

The following handles the first two plots:
data = [228782,38376,1416068,15177,3267,37238,43946,16882,1032,18015,32867,99886,10837,539578,2097,243504,721254,821,7264,24889,11301229,45200,1489405,106374,86755,117564,1195884,35285,1,1,69718,49305,99705,755060,172177,90544,375,13231,41636,2194328,445222,4151167,16397,13233,16167,28013,23123,20600,279078,204561,903049,1077988,3175300,133098,313077,150900,7916,7192,27825,11757,131951,36491,412661,13770,234963,14,27165,2655993,35906,4726736,1400946,50211,970,12991,500415,49638,227901,113491,34705,3601896,13206,2,163363,1763238,704929,17683,3345746,6169795,50960,10562405,19964,8768,60654,36688,408987,5491596,9198,3555,6956,44754,24177,44582,961052,53915,136564,88098,2506,4963972,30797,182620,120865,255538,82780,62005,31621,1878092,9538,3,104934,190948,32669,144033,1476,10792,490043,44086,16947,65156,4,67559,91119,158176,41496,444282,716632,370458,3497,113392,14303,14122,29358,35527,3359,1283,49466,964715,2231204,222141,215161,295728,32342,13462,2479807,5400340,388405,67750,22347,1491,106668,1533418,535,10953,10796,286274,33799,68004,11507,84894,353376,963503,2450530,161553,146589,796,94487,124304,4909631,31693,7574328,321190,15335,243789,71551,4162413,292380,140728,611758,536,120032,549,32414,52314,3321,5251,17321,290913,224,138860,2175516,13886,3674482,308513,2179765,11830,2228414,128376,103459,100288,442028,2685164,176,2,821791,780,533,125481,140268,96293,223237,22322,51114,10292,126129,271666,104450,5073589,1767339,250,32306,154856,20439,2160830,5,310789,896015,754529,38198,1445987,12020931,795411,25003,538,36861,18150,991877,1962984,48752,82654,3963056,6494512,79644,12438,20884,7,3849311,1495,6469,291234,72614,5439,26130,274373,12597,811805,3,295383,99982,22564,38928,92,1907481,3075,729,658168,1165,951107,128879,680182,3601,4208,2026,108807,746024,2866765,1505305,344209,223629,121982,77107,20725,12501,6308,858843,3675204,5050872,152005,36862,238924,52329,2049905,29376,855373,3766990,1756,1516744,22267,1515269,616,10687,17424,30983,314870,5017553,110463,50482,340
61,33543,107524,619803,1108,190765,108684,2452800,90389,3213,871491,3760,773869,63341,5691,23539,20696,36256,373034,8614,4724,3692,13870,105831,26373,3188,160035,27253,1281,3332,43168]
rank = sorted(list(range(1, len(data)+1)))
freq = np.array(sorted(data, reverse=True))
plt.figure(figsize=(8, 6))
plt.xlim(1, 10**3)
plt.ylim(1, 10**8)
lines = plt.loglog(rank, freq, marker=".")
plt.plot([1, freq[0]], [freq[0], 1], 'r')
Now I am trying to add this log-normal plot of:
# log-normal
x = np.ma.log(freq)
avg = np.mean(x)
std = np.std(x)
pdf = lognorm.pdf(x, avg, 0, np.exp(std))
plt.loglog(pdf, x, marker='v')
# plt.plot(pdf, x, 'gv-')
title('Zipf plot of Airport frequency')
xlabel('frequency rank')
ylabel('airport frequency')
grid(True)
plt.show()
This is what the two code blocks produce together, and as you can see the third plot is not showing:
I'd appreciate any insight on how to get this to work. I am running on Win10 using Python 3.7 with the latest versions of matplotlib and other packages.
Both plots are actually being plotted, but the x and y limits you're using are ignoring your log-normal data. I copied and ran your code exactly, except I changed the limits to
plt.xlim(10**-3, 10**3)
plt.ylim(10**-1, 10**8)
and this produced the plot below
When I want to adjust the x and y limits for my plots, I typically do that at the end right before plt.show() or when I define the xlabel and ylabel. I also will do min(xdata) and max(xdata) to get the limits that show all the data (you can also add offsets as necessary).
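As a rough sketch of the min/max idea for log axes: the helper below takes every value that will go on one axis and returns limits rounded outward to whole decades. The name log_axis_limits, the pad_decades parameter and the sample numbers are my own assumptions, not something from the code above.

```python
import math


def log_axis_limits(values, pad_decades=0):
    """Return (low, high) limits covering all positive values, rounded out to
    whole powers of ten, with optional extra decades of padding on both sides."""
    positive = [v for v in values if v > 0]  # a log axis cannot show values <= 0
    low = math.floor(math.log10(min(positive))) - pad_decades
    high = math.ceil(math.log10(max(positive))) + pad_decades
    return 10.0 ** low, 10.0 ** high


# Hypothetical x values collected from all three curves
x_all = [1, 370, 0.004, 0.2]
xlim = log_axis_limits(x_all)
```

The result can be passed on right before plt.show(), for example plt.xlim(*xlim), and the same helper works for the y values.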

Force all x-axis values to appear in scatterplot in Python

I'm using matplotlib.pyplot to plot a scatterplot. The following code produces a scatterplot that does not match this request.
months = []
data = [...] #some data in list form
#skipping the 8th value since I don't want data to refer at this value
for i in [x for x in range(1, len(data) +2) if x != 8]:
    months.append(i)
fig, ax = plt.subplots()
plt.scatter(months,data)
plt.scatter([months[-1]],[data[-1]], color=['red'])
plt.title('Quantity scatterplot')
ax.set_xlabel('Months')
ax.set_ylabel('Quantities')
ax.legend(['Historical quantities','Forecasted quantity'], loc=1)
plt.show()
While I would like to see all months (from 1 to 10) on x-axis
The easiest way to force all numbers between 1 and 10 to appear as ticklabels on the x axis is to use
ax.set_xticks(range(1,11))
For the more general case where axis limits are not determined beforehand, you can get ticklabels at integer positions using a matplotlib.ticker.MultipleLocator.
ax.xaxis.set_major_locator(matplotlib.ticker.MultipleLocator(1))
where 1 is the base: every tick is placed at a multiple of this number.
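A runnable sketch of the locator variant, with made-up quantities. The month list skips 8 as in the question, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

months = [m for m in range(1, 11) if m != 8]  # month 8 has no data point
data = [12, 15, 11, 18, 20, 17, 22, 25, 27]   # hypothetical quantities

fig, ax = plt.subplots()
ax.scatter(months, data)
ax.set_xlim(0.5, 10.5)

# Place a major tick at every multiple of 1, whether or not data exists there
ax.xaxis.set_major_locator(MultipleLocator(1))
fig.canvas.draw()

# The locator may also report ticks just outside the view; keep the visible ones
tick_positions = [t for t in ax.get_xticks() if 1 <= t <= 10]
```

Month 8 gets a tick even though no point is drawn there, which is the point of setting the ticks independently of the data.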

expand plot for readability without expanding lines

I am plotting 2 lines and a dot, X axis is a date range. The dot is most important, but it appears on the boundary of the plot. I want to "expand" the plot further right so that the dot position is more visible.
In other words I want to expand the X axis without adding new values to Y values of lines. However if I just add a few dates to X values of lines I get the "x and y dimensions must be equal" error. I tried to add a few np.NaN values to Y so that dimensions are equal, but then I get an error "integer required".
My plot:
My code:
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
plot_x = train_original.index.values
train_y = train_original.values
ax1.plot(plot_x, train_y, 'grey')
x = np.concatenate([np.array([train_original.index.values[-1]]), test_original.index.values])
y = np.concatenate([np.array([train_original.dropna().values[-1]]), test_original.dropna().values])
ax1.plot(x, y, color='grey')
ax1.plot(list(predicted.index.values), list(predicted.values), 'ro')
ax1.axvline(x=train_end, alpha=0.7, linestyle='--',color='blue')
plt.show()
There are a couple of ways to do this.
An easy, automatic way to do this, without needing knowledge of the existing xlim is to use ax.margins. This will add a certain fraction of the data limits to either side of the plot. For example:
ax.margins(x=0.1)
will add 10% of the current x range to both ends of the plot.
Another method is to explicitly set the x limits using ax.set_xlim.
Just change the xlim(). Something like:
xmin, xmax = plt.xlim() # return the current xlim
plt.xlim(xmax=xmax+1)
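Both options side by side in a small sketch. It uses plain numbers on the x axis rather than dates to keep it short, and the data are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt

x = list(range(10))
y = [v * v for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, color="grey")
ax.plot([9], [81], "ro")  # the important dot sits on the right edge of the data

# Option 1: pad both ends by 10% of the x data range (0..9, so 0.9 on each side)
ax.margins(x=0.1)
xmin, xmax = ax.get_xlim()

# Option 2: keep the left limit and push only the right limit further out
ax.set_xlim(xmin, xmax + 1)
new_xmin, new_xmax = ax.get_xlim()
```

ax.margins keeps working if the data change, while set_xlim fixes the limits at the values given.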
