Connect different data series with the same line

Connect different data series with the same line - python

Is there a way to get matplotlib to connect data from two different data sets with the same line?
Context: I need to plot some data in log scale, but some of them are negative. I use the workaround of plotting the data absolute value in different colours (red for positive and green for negative), something like:
import pylab as pl
pl.plot( x, positive_ys, 'r-' ) # positive y's
pl.plot( x, abs( negative_ys ), 'g-' ) # negative y's
pl.show()
However, as they represent the same quantity, it would be helpful to have the two data series connected by the same line. Is this possible?
I cannot use pl.plot( x, abs( ys )) because I need to be able to differentiate between the positive and originally negative values.

With numpy you can use logical indexing.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.array([10000, 1000, 100, 10, 1, 5, 50, 500, 5000, 50000])
y = np.array([-10000, -1000, -100, -10, -1, 5, 50, 500, 5000, 50000])
ax.plot(x,abs(y),'+-b',label='all data')
ax.plot(abs(x[y<= 0]),abs(y[y<= 0]),'o',markerfacecolor='none',
markeredgecolor='r',
label='we are negative')
ax.set_xscale('log')
ax.set_yscale('log')
ax.legend(loc=0)
plt.show()
The key feature is first plotting all absolute y-values and then re-plotting those that were originally negative as hollow circles to single them out. This second step uses the logical indexing x[y<=0] and y[y<=0] to only pick those elements of the y-array which are negative.
The example above gives you this figure:
If you really have two different data sets, the following code will give you the same figure as above:
x1 = np.array([1, 10, 100, 1000, 10000])
x2 = np.array([5, 50, 500, 5000, 50000])
y1 = np.array([-1, -10, -100, -1000, -10000])
y2 = np.array([5, 50, 500, 5000, 50000])
x = np.concatenate((x1,x2))
y = np.concatenate((y1,y2))
sorted = np.argsort(y)
ax.plot(x[sorted],abs(y[sorted]),'+-b',label='all data')
ax.plot(abs(x[y<= 0]),abs(y[y<= 0]),'o',markerfacecolor='none',
markeredgecolor='r',
label='we are negative')
Here, you first use np.concatenate to combine both the x- and the y-arrays. Then you employ np.argsort to sort the y-array in a way that makes sure you do not get a overly zig-zaggy line when plotting. You use that index-array (sorted) when you call the first plot. As the second plot only plots symbols but no connecting line, you do not require sorted arrays here.

Related

Matplotlib + pandas change xtick label frequency when using period[Q-DEC] [duplicate]

I am trying to fix how python plots my data.
Say:
x = [0,5,9,10,15]
y = [0,1,2,3,4]
matplotlib.pyplot.plot(x,y)
matplotlib.pyplot.show()
The x axis' ticks are plotted in intervals of 5. Is there a way to make it show intervals of 1?

You could explicitly set where you want to tick marks with plt.xticks:
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
For example,
import numpy as np
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(np.arange(min(x), max(x)+1, 1.0))
plt.show()
(np.arange was used rather than Python's range function just in case min(x) and max(x) are floats instead of ints.)
The plt.plot (or ax.plot) function will automatically set default x and y limits. If you wish to keep those limits, and just change the stepsize of the tick marks, then you could use ax.get_xlim() to discover what limits Matplotlib has already set.
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
The default tick formatter should do a decent job rounding the tick values to a sensible number of significant digits. However, if you wish to have more control over the format, you can define your own formatter. For example,
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
Here's a runnable example:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 0.712123))
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
plt.show()

Another approach is to set the axis locator:
import matplotlib.ticker as plticker
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
There are several different types of locator depending upon your needs.
Here is a full example:
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
fig, ax = plt.subplots()
ax.plot(x,y)
loc = plticker.MultipleLocator(base=1.0) # this locator puts ticks at regular intervals
ax.xaxis.set_major_locator(loc)
plt.show()

I like this solution (from the Matplotlib Plotting Cookbook):
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [0,5,9,10,15]
y = [0,1,2,3,4]
tick_spacing = 1
fig, ax = plt.subplots(1,1)
ax.plot(x,y)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.show()
This solution give you explicit control of the tick spacing via the number given to ticker.MultipleLocater(), allows automatic limit determination, and is easy to read later.

In case anyone is interested in a general one-liner, simply get the current ticks and use it to set the new ticks by sampling every other tick.
ax.set_xticks(ax.get_xticks()[::2])

if you just want to set the spacing a simple one liner with minimal boilerplate:
plt.gca().xaxis.set_major_locator(plt.MultipleLocator(1))
also works easily for minor ticks:
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
a bit of a mouthfull, but pretty compact

This is a bit hacky, but by far the cleanest/easiest to understand example that I've found to do this. It's from an answer on SO here:
Cleanest way to hide every nth tick label in matplotlib colorbar?
for label in ax.get_xticklabels()[::2]:
label.set_visible(False)
Then you can loop over the labels setting them to visible or not depending on the density you want.
edit: note that sometimes matplotlib sets labels == '', so it might look like a label is not present, when in fact it is and just isn't displaying anything. To make sure you're looping through actual visible labels, you could try:
visible_labels = [lab for lab in ax.get_xticklabels() if lab.get_visible() is True and lab.get_text() != '']
plt.setp(visible_labels[::2], visible=False)

This is an old topic, but I stumble over this every now and then and made this function. It's very convenient:
import matplotlib.pyplot as pp
import numpy as np
def resadjust(ax, xres=None, yres=None):
"""
Send in an axis and I fix the resolution as desired.
"""
if xres:
start, stop = ax.get_xlim()
ticks = np.arange(start, stop + xres, xres)
ax.set_xticks(ticks)
if yres:
start, stop = ax.get_ylim()
ticks = np.arange(start, stop + yres, yres)
ax.set_yticks(ticks)
One caveat of controlling the ticks like this is that one does no longer enjoy the interactive automagic updating of max scale after an added line. Then do
gca().set_ylim(top=new_top) # for example
and run the resadjust function again.

I developed an inelegant solution. Consider that we have the X axis and also a list of labels for each point in X.
Example:
import matplotlib.pyplot as plt
x = [0,1,2,3,4,5]
y = [10,20,15,18,7,19]
xlabels = ['jan','feb','mar','apr','may','jun']
Let's say that I want to show ticks labels only for 'feb' and 'jun'
xlabelsnew = []
for i in xlabels:
if i not in ['feb','jun']:
i = ' '
xlabelsnew.append(i)
else:
xlabelsnew.append(i)
Good, now we have a fake list of labels. First, we plotted the original version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabels,rotation=45)
plt.show()
Now, the modified version.
plt.plot(x,y)
plt.xticks(range(0,len(x)),xlabelsnew,rotation=45)
plt.show()

Pure Python Implementation
Below's a pure python implementation of the desired functionality that handles any numeric series (int or float) with positive, negative, or mixed values and allows for the user to specify the desired step size:
import math
def computeTicks (x, step = 5):
"""
Computes domain with given step encompassing series x
# params
x - Required - A list-like object of integers or floats
step - Optional - Tick frequency
"""
xMax, xMin = math.ceil(max(x)), math.floor(min(x))
dMax, dMin = xMax + abs((xMax % step) - step) + (step if (xMax % step != 0) else 0), xMin - abs((xMin % step))
return range(dMin, dMax, step)
Sample Output
# Negative to Positive
series = [-2, 18, 24, 29, 43]
print(list(computeTicks(series)))
[-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
# Negative to 0
series = [-30, -14, -10, -9, -3, 0]
print(list(computeTicks(series)))
[-30, -25, -20, -15, -10, -5, 0]
# 0 to Positive
series = [19, 23, 24, 27]
print(list(computeTicks(series)))
[15, 20, 25, 30]
# Floats
series = [1.8, 12.0, 21.2]
print(list(computeTicks(series)))
[0, 5, 10, 15, 20, 25]
# Step – 100
series = [118.3, 293.2, 768.1]
print(list(computeTicks(series, step = 100)))
[100, 200, 300, 400, 500, 600, 700, 800]
Sample Usage
import matplotlib.pyplot as plt
x = [0,5,9,10,15]
y = [0,1,2,3,4]
plt.plot(x,y)
plt.xticks(computeTicks(x))
plt.show()
Notice the x-axis has integer values all evenly spaced by 5, whereas the y-axis has a different interval (the matplotlib default behavior, because the ticks weren't specified).

Generalisable one liner, with only Numpy imported:
ax.set_xticks(np.arange(min(x),max(x),1))
Set in the context of the question:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0,5,9,10,15]
y = [0,1,2,3,4]
ax.plot(x,y)
ax.set_xticks(np.arange(min(x),max(x),1))
plt.show()
How it works:
fig, ax = plt.subplots() gives the ax object which contains the axes.
np.arange(min(x),max(x),1) gives an array of interval 1 from the min of x to the max of x. This is the new x ticks that we want.
ax.set_xticks() changes the ticks on the ax object.

xmarks=[i for i in range(1,length+1,1)]
plt.xticks(xmarks)
This worked for me
if you want ticks between [1,5] (1 and 5 inclusive) then replace
length = 5

Since None of the above solutions worked for my usecase, here I provide a solution using None (pun!) which can be adapted to a wide variety of scenarios.
Here is a sample piece of code that produces cluttered ticks on both X and Y axes.
# Note the super cluttered ticks on both X and Y axis.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x) # set xtick values
ax.set_yticks(y) # set ytick values
plt.show()
Now, we clean up the clutter with a new plot that shows only a sparse set of values on both x and y axes as ticks.
# inputs
x = np.arange(1, 101)
y = x * np.log(x)
fig = plt.figure() # create figure
ax = fig.add_subplot(111)
ax.plot(x, y)
ax.set_xticks(x)
ax.set_yticks(y)
# which values need to be shown?
# here, we show every third value from `x` and `y`
show_every = 3
sparse_xticks = [None] * x.shape[0]
sparse_xticks[::show_every] = x[::show_every]
sparse_yticks = [None] * y.shape[0]
sparse_yticks[::show_every] = y[::show_every]
ax.set_xticklabels(sparse_xticks, fontsize=6) # set sparse xtick values
ax.set_yticklabels(sparse_yticks, fontsize=6) # set sparse ytick values
plt.show()
Depending on the usecase, one can adapt the above code simply by changing show_every and using that for sampling tick values for X or Y or both the axes.
If this stepsize based solution doesn't fit, then one can also populate the values of sparse_xticks or sparse_yticks at irregular intervals, if that is what is desired.

You can loop through labels and show or hide those you want:
for i, label in enumerate(ax.get_xticklabels()):
if i % interval != 0:
label.set_visible(False)

Plot an histogram with y-axis as percentage (using FuncFormatter?)

I have a list of data in which the numbers are between 1000 and 20 000.
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
When I plot a histogram using the hist() function, the y-axis represents the number of occurrences of the values within a bin. Instead of the number of occurrences, I would like to have the percentage of occurrences.
Code for the above plot:
f, ax = plt.subplots(1, 1, figsize=(10,5))
ax.hist(data, bins = len(list(set(data))))
I've been looking at this post which describes an example using FuncFormatter but I can't figure out how to adapt it to my problem. Some help and guidance would be welcome :)
EDIT: Main issue with the to_percent(y, position) function used by the FuncFormatter. The y corresponds to one given value on the y-axis I guess. I need to divide this value by the total number of elements which I apparently can' t pass to the function...
EDIT 2: Current solution I dislike because of the use of a global variable:
def to_percent(y, position):
# Ignore the passed in position. This has the effect of scaling the default
# tick locations.
global n
s = str(round(100 * y / n, 3))
print (y)
# The percent symbol needs escaping in latex
if matplotlib.rcParams['text.usetex'] is True:
return s + r'$\%$'
else:
return s + '%'
def plotting_hist(folder, output):
global n
data = list()
# Do stuff to create data from folder
n = len(data)
f, ax = plt.subplots(1, 1, figsize=(10,5))
ax.hist(data, bins = len(list(set(data))), rwidth = 1)
formatter = FuncFormatter(to_percent)
plt.gca().yaxis.set_major_formatter(formatter)
plt.savefig("{}.png".format(output), dpi=500)
EDIT 3: Method with density = True
Actual desired output (method with global variable):

Other answers seem utterly complicated. A histogram which shows the proportion instead of the absolute amount can easily produced by weighting the data with 1/n, where n is the number of datapoints.
Then a PercentFormatter can be used to show the proportion (e.g. 0.45) as percentage (45%).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
plt.hist(data, weights=np.ones(len(data)) / len(data))
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()
Here we see that three of the 7 values are in the first bin, i.e. 3/7=43%.

Simply set density to true, the weights will be implicitly normalized.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
plt.hist(data, density=True)
plt.gca().yaxis.set_major_formatter(PercentFormatter(1))
plt.show()

You can calculate the percentages yourself, then plot them as a bar chart. This requires you to use numpy.histogram (which matplotlib uses "under the hood" anyway). You can then adjust the y tick labels:
import matplotlib.pyplot as plt
import numpy as np
f, ax = plt.subplots(1, 1, figsize=(10,5))
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
heights, bins = np.histogram(data, bins = len(list(set(data))))
percent = [i/sum(heights)*100 for i in heights]
ax.bar(bins[:-1], percent, width=2500, align="edge")
vals = ax.get_yticks()
ax.set_yticklabels(['%1.2f%%' %i for i in vals])
plt.show()

I think the simplest way is to use seaborn which is a layer on matplotlib. Note that you can still use plt.subplots(), figsize(), ax, and fig to customize your plot.
import seaborn as sns
And using the following code:
sns.displot(data, stat='probability'))
Also, sns.displot has so many parameters that allow for very complex and informative graphs very easily. They can be found here: displot Documentation

You can use functools.partial to avoid using globals in your example.
Just add n to function parameters:
def to_percent(y, position, n):
s = str(round(100 * y / n, 3))
if matplotlib.rcParams['text.usetex']:
return s + r'$\%$'
return s + '%'
and then create a partial function of two arguments that you can pass to FuncFormatter:
percent_formatter = partial(to_percent,
n=len(data))
formatter = FuncFormatter(percent_formatter)
Full code:
from functools import partial
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
def to_percent(y, position, n):
s = str(round(100 * y / n, 3))
if matplotlib.rcParams['text.usetex']:
return s + r'$\%$'
return s + '%'
def plotting_hist(data):
f, ax = plt.subplots(figsize=(10, 5))
ax.hist(data,
bins=len(set(data)),
rwidth=1)
percent_formatter = partial(to_percent,
n=len(data))
formatter = FuncFormatter(percent_formatter)
plt.gca().yaxis.set_major_formatter(formatter)
plt.show()
plotting_hist(data)
gives:

I found yet an other way to do so. As you can see in other answers, density=True alone doesn't solve the problem, as it calculates the area under the curve in percentage. But that can easily be converted, just divide it by the width of the bars.
import matplotlib.pyplot as plt
data = [1000, 1000, 5000, 3000, 4000, 16000, 2000]
bins=10
plt.hist(data, bins=bins, density=True)
bar_width = (max(data)-min(data))/bins # calculate width of a bar
ticks = plt.yticks()[0] # get ticks
tick_labels = ticks * bar_width # calculate labels for ticks
tick_labels = map(lambda f: f"{f:0.2}%",tick_labels) # format float to string
plt.yticks(ticks=ticks, labels=tick_labels) # set new labels
plt.show()
However, the solution weights=np.ones(len(data)) / len(data) may be a shorther and cleaner. This is just an other way and without numpy

Plotting a choropleth map (with geopandas) using a user_defined classification scheme

I'm kind of new to python, so I'm hoping that the answer to my question is relatively straight forward.
I'm trying to make a choropleth map using geopandas. However, since I'm making multiple maps that need to be compared to each other, it is indispensable that I use a custom data classification scheme (rather than quantiles or jenks). Hence, I've been trying to work with the User_Defined scheme, and I'm able to create the bins but I don't know how to apply them to the map itself.
This is what I did to create my classification scheme:
import pysal.esda.mapclassify as ps
from pysal.esda.mapclassify import User_Defined
bins = [5, 20, 100, 600, 1000, 3000, 5000, 10000, 20000, 400000]
ud = User_Defined(projected_world_exports['Value'], bins)
(where 'Value' is the column I plot in the map)
And then when I try to plot the choropleth map I don't know what the scheme is meant to be called
projected_world_exports.plot(column='Value', cmap='Greens', scheme = ?????)
If anyone could help I would be hugely appreciative!
Thanks x

Here is an alternative approach that does not require modifying the geopandas code. It involves first labeling the bins so that you can create a custom colormap that maps each bin label to a specific color. A column must then be created in your geodataframe that specifies which bin label is applied to each row in the geodataframe, and this column is then used to plot the choropleth using the custom colormap.
from matplotlib.colors import LinearSegmentedColormap
bins = [5, 20, 100, 600, 1000, 3000, 5000, 10000, 20000, 400000]
# Maps values to a bin.
# The mapped values must start at 0 and end at 1.
def bin_mapping(x):
for idx, bound in enumerate(bins):
if x < bound:
return idx / (len(bins) - 1.0)
# Create the list of bin labels and the list of colors
# corresponding to each bin
bin_labels = [idx / (len(bins) - 1.0) for idx in range(len(bins))]
color_list = ['#edf8fb', '#b2e2e2', '#66c2a4', '#2ca25f', '#006d2c', \
'#fef0d9', '#fdcc8a', '#fc8d59', '#e34a33', '#b30000']
# Create the custom color map
cmap = LinearSegmentedColormap.from_list('mycmap',
[(lbl, color) for lbl, color in zip(bin_labels, color_list)])
projected_world_exports['Bin_Lbl'] = projected_world_exports['Value'].apply(bin_mapping)
projected_world_exports.plot(column='Bin_Lbl', cmap=cmap, alpha=1, vmin=0, vmax=1)

I took a look at the code of geopandas plotting function (https://github.com/geopandas/geopandas/blob/master/geopandas/plotting.py) but I guess the plot method only accepts one of the three name ("quantiles", "equal_interval", "fisher_jenks") but not directly a list of bins or a pysal.esda.mapclassify classifier such as User_Defined.
(I guess it could be linked to that issue where the last comment is about defining an API for "user defined" binning).
However for now I guess you can achieve this by slightly modifying and reusing the functions from the file I linked.
For example you could rewrite you're own version of plot_dataframe like this :
import numpy as np
def plot_dataframe(s, column, binning, cmap,
linewidth=1.0, figsize=None, **color_kwds):
import matplotlib.pyplot as plt
values = s[column]
values = np.array(binning.yb)
fig, ax = plt.subplots(figsize=figsize)
ax.set_aspect('equal')
mn = values.min()
mx = values.max()
poly_idx = np.array(
(s.geometry.type == 'Polygon') | (s.geometry.type == 'MultiPolygon'))
polys = s.geometry[poly_idx]
if not polys.empty:
plot_polygon_collection(ax, polys, values[poly_idx], True,
vmin=mn, vmax=mx, cmap=cmap,
linewidth=linewidth, **color_kwds)
plt.draw()
return ax
Then you would need to define the functions _flatten_multi_geoms and plot_polygon_collection by copying them and you are ready to use it like this :
bins = [5, 20, 100, 600, 1000, 3000, 5000, 10000, 20000, 400000]
ud = User_Defined(projected_world_exports['Value'], bins)
plot_dataframe(projected_world_exports, 'Value', ud, 'Greens')

This can be done easily using UserDefined scheme. While defining such scheme, a mapclassify.MapClassifier object will be used under the hood. In fact, all the supported schemes are provided by mapclassify.
For passing your bins, you need to pass them in classification_kwds arguments.
So, your code is going to be:
projected_world_exports.plot(
column='Value',
cmap='Greens',
scheme='UserDefined',
classification_kwds={'bins': bins}
)

How to make grid of the irregular data?

I have the numpy arrays of longitudes, latitudes, and the data.
I want to plot this data as a raster image using numpy, scipy, and matplotlib.
import numpy as np
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
longitudes = np.array([[139.79391479492188, 140.51760864257812, 141.19119262695312, 141.82083129882812, 142.41165161132812],
[139.79225158691406, 140.51416015625, 141.18606567382812, 141.8140869140625, 142.40338134765625],
[139.78591918945312, 140.50637817382812, 141.17694091796875, 141.80377197265625, 142.3919677734375],
[139.78387451171875, 140.50253295898438, 141.17147827148438, 141.79678344726562, 142.38360595703125],
[139.77781677246094, 140.4949951171875, 141.16250610351562, 141.78646850585938, 142.37196350097656]],dtype=float)
latitudes = np.array([[55.61929702758789, 55.621070861816406, 55.61888122558594, 55.613487243652344, 55.60547637939453],
[55.53120040893555, 55.532840728759766, 55.53053665161133, 55.525047302246094, 55.5169677734375],
[55.44305419921875, 55.444580078125, 55.44219207763672, 55.43663024902344, 55.42848587036133],
[55.35470199584961, 55.356109619140625, 55.353614807128906, 55.34796905517578, 55.33975601196289],
[55.26683807373047, 55.268131256103516, 55.26553726196289, 55.25981140136719, 55.25152587890625]],dtype=float)
data = np.array([[10, 10, 10, 10, 10],
[20, 20, 20, 20, 20],
[30, 30, 30, 30, 30],
[40, 40, 40, 40, 40],
[50, 50, 50, 50, 50]],dtype=float)
x = longitudes.ravel()
y = latitudes.ravel()
z = data.ravel()
xMin, xMax = np.min(x), np.max(x)
yMin, yMax = np.min(y), np.max(y)
xi = np.linspace(xMin, xMax, 0.005) ##choosen spacing of 0.005
yi = np.linspace(yMin, yMax, 0.005) ##choosen spacing of 0.005
The data are not exactly a grid. Actually I could not imagine how to do it ahead:
zi_matplotlib = griddata(x, y, z, xi, yi, interp='linear')
from scipy.interpolate import griddata ##Using scipy method
zi_scipy = griddata((x, y), z, (xi, yi), method='nearest')
plt.imshow(????)
Any ideas and solution please.

You can use interpolation to convert the distorted grid into a regular grid. The interpolation fits the original data points and returns a function that can be evaluated at any point of your choosing, and in this case, you would choose a regular grid of points.
Here's an example:
import numpy as np
from scipy.interpolate import interp2d
import matplotlib.pyplot as plt
# your data here, as posted in the question
f = interp2d(lon, lat, data, kind="cubic", bounds_error=False)
dlon, dlat = 1.2, .2
xlon = np.linspace(min(lon.flat), max(lon.flat), 20)
xlat = np.linspace(min(lat.flat), max(lat.flat), 20)
# the next few lines are because there seems to be a bug in interp2d
# instead one would just want to use r = interp2d(X.flat, Y.flat) (where X,Y are as below)
# but for the version of scipy I'm using ('0.13.3'), this throws an exception.
r = np.zeros((len(xlon), len(xlat)))
for i, rlat in enumerate(xlat):
for j, rlon in enumerate(xlon):
r[i,j] = f(rlon, rlat)
X, Y = np.meshgrid(xlon, xlat)
plt.imshow(r, interpolation="nearest", origin="lower", extent=[min(xlon), max(xlon), min(xlat), max(xlat)], aspect=6.)
plt.scatter(lon.flat, lat.flat, color='k')
plt.show()
Here, I left the mesh fairly coarse (20x20) and used interpolation="nearest" so you could still see the colored squares representing each of the interpolated values, done, of course, on a regular grid (created using the two linspace calls). Note also the use or origin="lower" which sets the image and the scatter plot to have the same orientation.
To interpret this, the main issue is that changing of values from left-to-right. This is due to the data being specified as constant across the horizontal set of points, but because the points where these specified were warped, the interpolated values slowly change as they move across. For example, the lowest scatter point on the right should have approximately the same color as the highest one towards the left. Also, indicative of this is that there's not much color change between any of the two leftmost pairs, but a lot between the two right most, where the warping is largest.
Note that the interpolation could be done for any values, not only a regular grid, which is just being used for imshow as per the original question. Also note that I used bounds_error=False so I could evaluate a few points slightly outside of the original dataset, but be very careful with this as points outside of the original data will quickly become unreasonable due to the cubics being evaluated beyond the region where they were fit.

Assuming that longitudes and latitudes are equally spaced, you can use imshow directly as it features interpolation:
import numpy as np
import matplotlib.pyplot as plt
longitudes = np.array([[139.79391479492188, 140.51760864257812, 141.19119262695312, 141.82083129882812, 142.41165161132812],
[139.79225158691406, 140.51416015625, 141.18606567382812, 141.8140869140625, 142.40338134765625],
[139.78591918945312, 140.50637817382812, 141.17694091796875, 141.80377197265625, 142.3919677734375],
[139.78387451171875, 140.50253295898438, 141.17147827148438, 141.79678344726562, 142.38360595703125],
[139.77781677246094, 140.4949951171875, 141.16250610351562, 141.78646850585938, 142.37196350097656]],dtype=float)
latitudes = np.array([[55.61929702758789, 55.621070861816406, 55.61888122558594, 55.613487243652344, 55.60547637939453],
[55.53120040893555, 55.532840728759766, 55.53053665161133, 55.525047302246094, 55.5169677734375],
[55.44305419921875, 55.444580078125, 55.44219207763672, 55.43663024902344, 55.42848587036133],
[55.35470199584961, 55.356109619140625, 55.353614807128906, 55.34796905517578, 55.33975601196289],
[55.26683807373047, 55.268131256103516, 55.26553726196289, 55.25981140136719, 55.25152587890625]],dtype=float)
data = np.array([[10, 10, 10, 10, 10],
[20, 20, 20, 20, 20],
[30, 30, 30, 30, 30],
[40, 40, 40, 40, 40],
[50, 50, 50, 50, 50]],dtype=float)
extent = (longitudes[0,0], longitudes[0,-1], latitudes[0,0], latitudes[-1,0])
plt.imshow(data, interpolation='bilinear', extent=extent, aspect='auto')
plt.show()
I'm aware that this does not exactly answer your question. But I think it is an easy solution to the underlying problem.
Edit
I just realized that your data is in fact not exactly a grid, but almost. You have to decide if you still want to use my solution...

Here's an example of a scatter 3d plot using your data, breaking out each set of lat/long data in its own series with respective colored markers.
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
longitudes = np.array([[139.79391479492188, 140.51760864257812, 141.19119262695312, 141.82083129882812, 142.41165161132812],
[139.79225158691406, 140.51416015625, 141.18606567382812, 141.8140869140625, 142.40338134765625],
[139.78591918945312, 140.50637817382812, 141.17694091796875, 141.80377197265625, 142.3919677734375],
[139.78387451171875, 140.50253295898438, 141.17147827148438, 141.79678344726562, 142.38360595703125],
[139.77781677246094, 140.4949951171875, 141.16250610351562, 141.78646850585938, 142.37196350097656]],dtype=float)
latitudes = np.array([[55.61929702758789, 55.621070861816406, 55.61888122558594, 55.613487243652344, 55.60547637939453],
[55.53120040893555, 55.532840728759766, 55.53053665161133, 55.525047302246094, 55.5169677734375],
[55.44305419921875, 55.444580078125, 55.44219207763672, 55.43663024902344, 55.42848587036133],
[55.35470199584961, 55.356109619140625, 55.353614807128906, 55.34796905517578, 55.33975601196289],
[55.26683807373047, 55.268131256103516, 55.26553726196289, 55.25981140136719, 55.25152587890625]],dtype=float)
data = np.array([[10, 10, 10, 10, 10],
[20, 20, 20, 20, 20],
[30, 30, 30, 30, 30],
[40, 40, 40, 40, 40],
[50, 50, 50, 50, 50]],dtype=float)
colors = ['r','g','b','k','k']
markers = ['o','o','o','o','^']
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for i in range(5):
ax.scatter(longitudes[i], latitudes[i], data[i], c=colors[i], marker=markers[i])
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_zlabel('Data')
plt.show()
Which results in an image like

How to get this line plot to show up properly using matplotlib

I have these data structures:
X axis values:
delta_Array = np.array([1000,2000,3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000])
Y Axis values
error_matrix =
[[ 24.22468454 24.22570421 24.22589308 24.22595919 24.22598979
24.22600641 24.22601644 24.22602294 24.2260274 24.22603059]
[ 28.54275713 28.54503017 28.54545119 28.54559855 28.54566676
28.54570381 28.54572615 28.54574065 28.5457506 28.54575771]]
How do I plot them as a line plot using matplotlib and python
This code I came up with renders a flat line as follows
figure(3)
i = 0
for i in range(error_matrix.shape[0]):
plot(delta_Array, error_matrix[i,:])
title('errors')
xlabel('deltas')
ylabel('errors')
grid()
show()
The problem here looks like is scaling of the axes. But im not sure how to fix it. Any ideas, suggestions how to get the curvature showing up properly?

You could use ax.twinx to create twin axes:
import matplotlib.pyplot as plt
import numpy as np
delta_Array = np.array([1000,2000,3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000])
error_matrix = np.array(
[[ 24.22468454, 24.22570421, 24.22589308, 24.22595919, 24.22598979, 24.22600641, 24.22601644, 24.22602294, 24.2260274, 24.22603059],
[ 28.54275713, 28.54503017, 28.54545119, 28.54559855, 28.54566676, 28.54570381, 28.54572615, 28.54574065, 28.5457506, 28.54575771]])
fig = plt.figure()
ax = []
ax.append(fig.add_subplot(1, 1, 1))
ax.append(ax[0].twinx())
colors = ('red', 'blue')
for i,c in zip(range(error_matrix.shape[0]), colors):
ax[i].plot(delta_Array, error_matrix[i,:], color = c)
plt.show()
yields
The red line corresponds to error_matrix[0, :], the blue with error_matrix[1, :].
Another possibility is to plot the ratio error_matrix[0, :]/error_matrix[1, :].

Matplotlib is showing you the right thing. If you want both curves on the same y scale, then they will be flat because their difference is much larger than the variation in each. If you don't mind different y scales, then do as unutbu suggested.
If you want to compare the rate of change between the functions, then I'd suggest normalising by the highest value in each:
import matplotlib.pyplot as plt
import numpy as np
plt.plot(delta_Array, error_matrix[0] / np.max(error_matrix[0]), 'b-')
plt.plot(delta_Array, error_matrix[1] / np.max(error_matrix[1]), 'r-')
plt.show()
And by the way, you don't need to be explicit in the dimensions of your 2D array. When you use error_matrix[i,:], it is the same as error_matrix[i].

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Connect different data series with the same line - python

Related

Matplotlib + pandas change xtick label frequency when using period[Q-DEC] [duplicate]

Plot an histogram with y-axis as percentage (using FuncFormatter?)

Plotting a choropleth map (with geopandas) using a user_defined classification scheme

How to make grid of the irregular data?

How to get this line plot to show up properly using matplotlib

Categories

Resources