MemoryError with numpy arange - python

I want to create an array of powers of 10 as a label for the y axis of a plot.
I am using the plt.yticks() with matplotlib imported as plt but this does not matter here anyway.
I have plots where the y axis ranges from 1e3 to 1e15. Those are log plots.
Matplotlib automatically displays ticks in steps of 1e2, and I want a step of 10 instead (in order to be able to use the minor ticks properly).
I want to use plt.yticks(numpy.arange(1e3, 1e15, 10)), but numpy.arange(1e3, 1e15, 10) results in a MemoryError. Isn't it supposed to output an array of length 13? Why does the memory get full?
How can I get around this issue without building the array manually?
I also tried using built-in range but it won't work with floats.
Thank you.

Try logspace from NumPy:
plt.yticks(numpy.logspace(3, 15, 13))
Here you give the first and last exponents (powers of 10) and the total number of points, endpoints included. If you print the above array, you get the following
array([1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10,
1.e+11, 1.e+12, 1.e+13, 1.e+14, 1.e+15])
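As for why the original call fails: numpy.arange takes an additive step, not a multiplicative one, so arange(1e3, 1e15, 10) asks for 1e3, 1010, 1020, ... rather than 13 powers of ten. A rough sketch of the size of that request (the variable names are illustrative only, and the byte count assumes float64 elements):

```python
import numpy as np

start, stop, step = 10**3, 10**15, 10

# arange(start, stop, step) would hold ceil((stop - start) / step) elements
n_elements = -(-(stop - start) // step)
bytes_needed = n_elements * np.dtype(np.float64).itemsize

print(n_elements)            # number of float64 values requested
print(bytes_needed / 1e12)   # approximate size in terabytes

# logspace spaces the points evenly in the exponent instead
ticks = np.logspace(3, 15, 13)
print(len(ticks))
```

The request is on the order of 1e14 elements, which is why the allocation fails long before any ticks are drawn.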

You could also just do:
10. ** np.arange(3,16)
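That decimal point is important: without it the result is an integer array, and on platforms where NumPy's default integer dtype is int32 (such as Windows with older NumPy versions) the larger powers overflow.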
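The overflow can be reproduced on any platform by requesting int32 explicitly. This is only a sketch: the dtype is forced by hand here rather than taken from the platform default.

```python
import numpy as np

exponents = np.arange(3, 16)

# Float base: the result is float64 and holds every power up to 1e15
as_float = 10.0 ** exponents

# Forced 32-bit integers: values beyond 2**31 - 1 wrap around silently
as_int32 = np.int32(10) ** exponents.astype(np.int32)

print(as_float[-1])
print(as_int32.dtype, as_int32[-1])
```

Powers up to 10**9 still fit in int32, so only the larger entries come out wrong, which makes the problem easy to miss.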

An alternative way to do this, rather than explicitly defining the tick positions, is to use a LogLocator from the matplotlib.ticker module, and manually increase the number of ticks (by default, it will try to set a nice-looking number of ticks; i.e. so it doesn't look too cramped).
In this example, I set the number of ticks to 13 on the Axes on the right (using numticks=13), and you can see this increases the number of ticks so there is one on each integer power of 10.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# Create figure and axes
fig, (ax1, ax2) = plt.subplots(ncols=2)
# Make yscale logarithmic
ax1.set_yscale('log')
ax2.set_yscale('log')
# Set y limits
ax1.set_ylim(1e3, 1e15)
ax2.set_ylim(1e3, 1e15)
# On ax2, lets tell the locator how many ticks we want
ax2.yaxis.set_major_locator(ticker.LogLocator(numticks=13))
ax1.set_title('default ticks')
ax2.set_title('LogLocator with numticks=13')
plt.show()
EDIT:
To add minor ticks with this method, we can use another LogLocator, and this time set the subs option to say where we want minor ticks in each decade. Here I haven't set minor ticks on every 0.1 because it would be too cramped, so I have only done it for a subset. Note that if you set minor ticks like this, you also need to turn off tick labels for the minor ticks, using a NullFormatter.
ax2.yaxis.set_major_locator(ticker.LogLocator(numticks=13))
ax2.yaxis.set_minor_locator(ticker.LogLocator(subs=(0.2,0.4,0.6,0.8),numticks=13))
ax2.yaxis.set_minor_formatter(ticker.NullFormatter())

In this case the function logspace from numpy is more suitable.
The answer to the example is
np.logspace(3,15,num=15-3+1, endpoint=True)

Related

Seaborn jointplot axis on log scale with kind="hex"

I'd like to show the chart below, but with the x-axis on a log scale.
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=list('XY'))
sns.jointplot(data=df,x='X',y='Y',height=3,kind='hex')
To be clear, I don't want to log X first, rather I want the numbers to stay the same but the distance between the axis ticks to change. In altair, it would look like the following (I can't do hex in altair, although please correct me if I'm wrong on that):
EDIT: Matt suggested adding xscale="log". That gets me very nearly there. I just need a way to go from powers to normal integers.
You can use the xscale="log" keyword argument, which gets passed to the Matplotlib hexbin function that is used under-the-hood by seaborn. E.g.,
sns.jointplot(data=df, x='X', y='Y', height=3, kind='hex', xscale="log")
As stated in the comments, there are various ways to set the axis tick labels to not be in scientific format. The simplest is to do:
import matplotlib.ticker as mticker
grid = sns.jointplot(data=df, x='X', y='Y', height=3, kind='hex', xscale="log")
grid.ax_joint.xaxis.set_major_formatter(mticker.ScalarFormatter())
If instead you want, e.g., 1000 to be formatted with a comma so that it is 1,000, you could instead do:
grid.ax_joint.xaxis.set_major_formatter(mticker.StrMethodFormatter("{x:,.0f}"))

How to ensure even spacing between labels on x axis of matplotlib graph?

I have been given a data for which I need to find a histogram. So I used pandas hist() function and plot it using matplotlib. The code runs on a remote server so I cannot directly see it and hence I save the image. Here is what the image looks like
Here is my code below
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)  # raw_data is the data supplied to me
plt.savefig('/path/to/file.png')
plt.close()
As you can see the x axis labels are overlapping. So I used this function plt.tight_layout() like so
import matplotlib.pyplot as plt
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)
plt.tight_layout()
plt.savefig('/path/to/file.png')
plt.close()
There is some improvement now
But still the labels are too close. Is there a way to ensure the labels do not touch each other and there is fair spacing between them? Also I want to resize the image to make it smaller.
I checked the documentation here https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html but not sure which parameter to use for savefig.
Since raw_data is not already a pandas dataframe there's no need to turn it into one to do the plotting. Instead you can plot directly with matplotlib.
There are many different ways to achieve what you'd like. I'll start by setting up some data which looks similar to yours:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma
raw_data = gamma.rvs(a=1, scale=1e6, size=100)
If we go ahead and use matplotlib to create the histogram we may find the xticks too close together:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
ax.hist(raw_data, bins=5)
fig.tight_layout()
The xticks are hard to read with all the zeros, regardless of spacing. So, one thing you may wish to do would be to use scientific formatting. This makes the x-axis much easier to interpret:
ax.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
Another option, without using scientific formatting would be to rotate the ticks (as mentioned in the comments):
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
Finally, you also mentioned altering the size of the image. Note that this is best done when the figure is initialised. You can set the size of the figure with the figsize argument. The following would create a figure 5" wide and 3" in height:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
I think the two best fixes were mentioned by Pam in the comments.
You can rotate the labels with
plt.xticks(rotation=45)
For more information, look here: Rotate axis text in python matplotlib
The real problem is too many zeros that don't provide any extra info. Numpy arrays are pretty easy to work with, so pd.DataFrame(np.array(raw_data)/1000).hist(bins=5) should remove three zeros from the x-axis tick labels (the counts on the y axis are unaffected). Then just add a 'kilo' to the axis label.
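A minimal sketch of that rescaling, plotting with matplotlib directly. The gamma-distributed raw_data is stand-in data (an assumption, since the real data is not shown), and the axis label text is only a suggestion:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for a remote server
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the data supplied in the question
rng = np.random.default_rng(0)
raw_data = rng.gamma(shape=1.0, scale=1e6, size=100)

# Divide by 1000 so the tick labels lose three zeros
scaled = np.asarray(raw_data) / 1000

fig, ax = plt.subplots(figsize=(5, 3))
counts, edges, _ = ax.hist(scaled, bins=5)
ax.set_xlabel("value (thousands)")  # say in the label that the unit changed
fig.tight_layout()
```

From here fig.savefig(...) can be called as in the question.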
To change the size of the graph use rcParams.
from matplotlib import rcParams
rcParams['figure.figsize'] = 7, 5.75 #the numbers are the dimensions

How to define the number of minor ticks in matplotlib config file

Using a Locator, one can control the number of minor or major ticks on a matplotlib axes.
For example :
plt.gca().yaxis.set_minor_locator(plt.MultipleLocator(1))
But the above line depends on the data you are plotting.
I would like to know if there is a way to change the number of minor ticks from a matplotlib style file such as matplotlibrc (https://matplotlib.org/users/customizing.html). For example, you can manage the minor ticks style with:
xtick.minor.visible : True
xtick.minor.width : 2
xtick.minor.size : 5
But I do not know how to deal with the number of minor ticks.
The number of ticks is not really considered a "style" attribute, so there is currently no way to set it from an rc file.
As to where the default settings come from, this is determined by the AutoMinorLocator, which is used if no other custom locator is given and the minor ticks are turned on.
matplotlib.ticker.AutoMinorLocator(n=None)
Dynamically find minor tick positions based on the positions of major ticks. The scale must be linear with major ticks evenly spaced.
n is the number of subdivisions of the interval between major ticks; e.g., n=2 will place a single minor tick midway between major ticks.
If n is omitted or None, it will be set to 5 or 4.
This leaves room for the following workaround. You may monkey patch the AutoMinorLocator to use a different default than usual. E.g. to have 23 minor bins,
# use these lines on top of your matplotlib script
import matplotlib.ticker
class MyLocator(matplotlib.ticker.AutoMinorLocator):
    def __init__(self, n=23):
        super().__init__(n=n)
matplotlib.ticker.AutoMinorLocator = MyLocator
# Now use matplotlib as usual.
import matplotlib.pyplot as plt
plt.rcParams["xtick.minor.visible"] = True
plt.plot([1,2])
plt.show()
Use
ax.xaxis.set_minor_locator(matplotlib.ticker.AutoMinorLocator(2))
ax.yaxis.set_minor_locator(matplotlib.ticker.AutoMinorLocator(2))
where ax is your axes and 2 is the number of subdivisions between major ticks (so n=2 gives one minor tick per interval), which you can replace to your liking.
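As a quick check of how n maps to the tick count (n is the number of subdivisions, so n - 1 minor ticks fall between adjacent major ticks), here is a small sketch. The axis limits and the MultipleLocator(1) major spacing are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

fig, ax = plt.subplots()
ax.set_xlim(0, 2)
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))   # major ticks at integers
ax.xaxis.set_minor_locator(ticker.AutoMinorLocator(4))  # 4 subdivisions per major interval

# Minor tick positions, excluding those that coincide with major ticks
minor = ax.xaxis.get_minorticklocs()
between_0_and_1 = [t for t in minor if 0 < t < 1]
print(between_0_and_1)
```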

Pyplot how to reduce xticks *and* xticklabels density?

I have to plot several curves with very high xtick density, say 1000 date strings. To prevent these tick labels overlapping each other I manually set them to be 60 dates apart. Code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
fig = plt.figure(1)
ax = plt.subplot(1, 1, 1)
tick_spacing = 60
for i in range(5):
    plt.plot(ts_index, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
fig.savefig(r".\net_value_curves.png", )
fig.clf()
I'm running this piece of code in PyCharm Community Edition 2017.2.2 with a Python 3.6 kernel. Now comes the funny thing: whenever I ran the code in the normal "run" mode (i.e. just hit the execution button and let the code run "freely" till interruption or termination), then the figure I got would always miss xticklabels:
However, if I ran the code in "debug" mode and ran it step by step then I would get an expected figure with complete xticklabels:
This is really weird. Anyway, I just hope to find a way that can ensure me getting the desired output (the second figure) in the normal "run" mode. How can I modify my current code to achieve this?
Thanks in advance!
Your x axis data are strings. Hence you will get one tick per data point. This is probably not what you want. Instead use the dates to plot. Because you are using pandas, this is easily converted,
dates = pd.to_datetime(ts_index, format="%Y%m%d")
You may then get rid of your manual xtick locating and formatting, because matplotlib will automatically choose some nice tick locations for you.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
dates = pd.to_datetime(ts_index, format="%Y%m%d")
fig, ax = plt.subplots()
for i in range(5):
    plt.plot(dates, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
However in case you do want to have some manual control over the locations and formats you may use matplotlib.dates locators and formatters.
import matplotlib.dates as mdates
# tick every 3 months
plt.gca().xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))
# format as "%Y%m%d"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
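Put together, a self-contained sketch of the locator and formatter approach might look like this. It builds the dates with NumPy rather than pandas purely to keep the example small, and it plots a single curve as stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# 1000 consecutive days starting at the date used in the question
dates = np.datetime64("2006-04-29") + np.arange(1000)

fig, ax = plt.subplots()
ax.plot(dates, 1 + 0.01 * np.arange(1000), label="group 1")

# One major tick on the first day of January, April, July and October
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
ax.tick_params(axis="x", rotation=90)

fig.canvas.draw()
tick_dates = mdates.num2date(ax.xaxis.get_majorticklocs())
print([d.strftime("%Y%m%d") for d in tick_dates])
```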
In general, the Axis object computes and places ticks using a Locator object. Locators and Formatters are meant to be easily replaceable, with appropriate methods of Axis. The default Locator does not seem to be doing the trick for you, so you can replace it with anything you want using axes.xaxis.set_major_locator. This problem is not complicated enough to write your own, so I would suggest that MaxNLocator fits your needs fairly well. Your example seems to work well with nbins=16 (which is what you have in the picture, since there are 17 ticks).
You need to add an import:
from matplotlib.ticker import MaxNLocator
You need to replace the block
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
with
ax.xaxis.set_major_locator(MaxNLocator(nbins=16))
or just
ax.xaxis.set_major_locator(MaxNLocator(16))
You may want to play around with the other arguments (all of which have to be keywords, except nbins). Pay special attention to the integer argument.
Note that for the Locator and Formatter APIs we work with an Axis object, not Axes. Axes is the whole plot, while Axis is the thing with the spines on it. Axes usually contains two Axis objects and all the other stuff in your plot.
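To see what MaxNLocator proposes before wiring it into a plot, the locator can be queried directly. This sketch assumes the 1000 string labels sit at positions 0 to 999, which is how matplotlib places categorical data:

```python
import numpy as np
from matplotlib.ticker import MaxNLocator

locator = MaxNLocator(nbins=16)  # at most 16 intervals, so at most 17 ticks

# Candidate tick positions for an axis spanning 0..999
candidate_ticks = locator.tick_values(0, 999)

# The locator may return ticks just outside the range; keep the visible ones
visible = candidate_ticks[(candidate_ticks >= 0) & (candidate_ticks <= 999)]
print(len(visible), visible)
```

The locator picks a "nice" step, so the number of ticks is usually below the maximum rather than exactly nbins + 1.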
You can set the visibility of the xticks-labels to False
for label in plt.gca().xaxis.get_ticklabels()[::N]:
    label.set_visible(False)
This will set every Nth label invisible.

're-sort' / adapt ticks of matshow matrix plot

I tried hard, but I'm stuck with matplotlib here. Please bear with me; the mpl docs are a bit confusing to me. My question concerns the following:
I draw a symmetrical n*n matrix D with matshow function. That works.
I want to do the same thing, just with different order of (the n) items in D
D = D[:, neworder]
D = D[neworder, :]
Now, how do I make the ticks reproduce this neworder, preferably using additionally MaxNLocator?
As far as I understand...
set_xticklabels assigns labels to the ticks by order, independently of where the ticks are set?!
set_xticks (mpl docs: 'Set the x ticks with list of ticks') here I'm really not sure what it does. Can somebody explain it precisely? I don't know, whether these functions are helpful in my case at all. Maybe even things are different between using a common xy plot and matshow.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.gca()
D = np.arange(100).reshape(10,10)
neworder = np.arange(10)
np.random.shuffle(neworder)
D = D[:,neworder]
D = D[neworder, :]
# modify ticks somehow...
ax.matshow(D)
plt.show()
Referring to Paul's answer, I think I tried something like this. Using neworder to define both the positions and the labels, I added plt.xticks(neworder, neworder) as the tick modifier. For example, with neworder = [9 8 4 7 2 6 3 0 1 5] I get this:
The order of the labels is correct, but the ticks are not. The labels should show the correct element independently of where the ticks are set. So where is the mistake?
I think what you want to do is set the labels on the new plot to show the rearranged order of the values. Is that right? If so, you want to keep the tick locations the same, but change the labels:
plt.xticks(np.arange(0,10), neworder)
plt.yticks(np.arange(0,10), neworder)
Edit: Note that these commands must be issued after matshow. This seems to be a quirk of matshow (plot does not show this behaviour, for example). Perhaps it's related to this line from the plt.matshow documentation:
Because of how :func:matshow tries to set the figure aspect ratio to be the
one of the array, if you provide the number of an already
existing figure, strange things may happen.
Perhaps the safest way to go is to issue plt.matshow(D) without first creating a figure, then use plt.xticks and plt.yticks to make adjustments.
Your question also asks about the set_ticks and related axis methods. The same thing can be accomplished using those tools, again after issuing matshow:
ax = plt.gca()
ax.xaxis.set_ticks(np.arange(0,10)) # turn on all tick locations
ax.xaxis.set_ticklabels(neworder) # use neworder for labels
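For reference, a complete sketch of the relabelling using the Axes-level methods. It hardcodes the neworder quoted in the question and creates the figure with plt.subplots, both of which are choices made for this example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

D = np.arange(100).reshape(10, 10)
neworder = np.array([9, 8, 4, 7, 2, 6, 3, 0, 1, 5])  # the order quoted in the question
D_new = D[neworder][:, neworder]  # reorder rows, then columns

fig, ax = plt.subplots()
ax.matshow(D_new)  # draw first, then adjust the ticks

# Tick positions stay at 0..9; only the labels follow the new order
positions = np.arange(len(neworder))
ax.set_xticks(positions)
ax.set_xticklabels(neworder)
ax.set_yticks(positions)
ax.set_yticklabels(neworder)

fig.canvas.draw()
xlabels = [t.get_text() for t in ax.get_xticklabels()]
print(xlabels)
```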
Edit2: The next part of your question is related to setting a max number of ticks. 20 would require a new example. For our example I'll set nbins to 3, which allows at most 4 ticks:
ax = plt.gca()
ax.xaxis.set_major_locator(plt.MaxNLocator(nbins=3)) # nbins is the max number of intervals, one less than the max number of ticks
tl = ax.xaxis.get_ticklocs() # get current tick locations
tl[1:-1] = [neworder[int(idx)] for idx in tl[1:-1]] # find what the labels should be at those locs (tick locations are floats, so cast before indexing)
ax.xaxis.set_ticklabels(tl) # set the labels
plt.draw()
You are on the right track. The plt.xticks command is what you need.
You can specify the xtick locations and the label at each position with the following command.
labelPositions = np.arange(len(D))
newLabels = ['z','y','x','w','v','u','t','s','q','r']
plt.xticks(labelPositions,newLabels)
You could also specify an arbitrary order for labelPositions, as they will be assigned based on the values in the vector.
labelPositions = [0,9,1,8,2,7,3,6,4,5]
newLabels = ['z','y','x','w','v','u','t','s','q','r']
plt.xticks(labelPositions,newLabels)
