I have been trying to insert a .dot graph from an sklearn decision tree into a pyplot subplot, and have been struggling to do so. The library pygraphviz is refusing to work on my Windows system, so I've used the following to insert the images:
from subprocess import check_call
from PIL import Image
import matplotlib.pyplot as plt
gridSpec = fig.add_gridspec(3, 2)
treeSubPlot = plt.subplot(gridSpec.new_subplotspec((2, 1)))
treeSubPlot.tick_params(axis = "both", which = "both", bottom = False, top = False, labelbottom = False, right = False, left = False, labelleft = False)
treeSubPlot.set_xlabel("Final model")
# Obtain the size of a single plot in inches to inform the dot picture creator.
height = treeSubPlot.get_window_extent().y1 - treeSubPlot.get_window_extent().y0
width = treeSubPlot.get_window_extent().x1 - treeSubPlot.get_window_extent().x0
check_call(['dot','-Tpng', "-Gsize=" + str(width/100) + "," + str(height/100) + "!", "-Gdpi=100", "-Gratio=fill", 'random_tree.dot','-o','random.png'])
randImg = Image.open("random.png")
treeSubplot.imshow(randImg, aspect = "auto")
With the .dot file:
digraph Tree {
node [shape=box] ;
0 [label="B field <= 332.72\nsamples = 19\nvalue = 0.41"] ;
1 [label="samples = 11\nvalue = 0.67"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="B field <= 576.73\nsamples = 8\nvalue = 0.04"] ;
0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
3 [label="samples = 4\nvalue = 0.05"] ;
2 -> 3 ;
4 [label="samples = 4\nvalue = 0.03"] ;
2 -> 4 ;
}
What results is the plot, but it looks extremely 'jagged', for lack of a better term:
Are there any settings I can use to stop this? I'm trying to make the image to fit the subplot perfectly but it's still jagged.
There are several issues.
The image produced by dot is not actually the exact size you specify in -Gsize={},{}! parameter.
There are rounding errors, because the matplotlib axes is positioned in relative figure coordinates and those later do not precisely match the integer positions of pixels.
Finally, the axes to show the image in is extremely small, so the above effects become even more pronounced.
A solution one can come up with is to find the approximate size of the axes, call the dot program and retrieve the image from it. Then set the actual size of the axes according to the actual image size (this might shift the axes by a few pixels compared to their original position).
In the code below I took a bigger axes. Also note that a higher dpi or figure size will increase the readability.
import numpy as np
from subprocess import check_call
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.transforms as mtrans
dpi = 100
fig = plt.figure(dpi=dpi)
gridSpec = fig.add_gridspec(3, 2)
ax = plt.subplot(gridSpec.new_subplotspec((2, 1)))
#ax = fig.add_subplot(111)
ax.tick_params(axis = "both", which = "both", bottom = False, top = False,
labelbottom = False, right = False, left = False, labelleft = False)
ax.set_xlabel("Final model")
# calculate nominal size supplied to `dot`
ext = ax.get_position().transformed(fig.transFigure).extents
bbox = mtrans.Bbox.from_extents(np.array(ext).astype(int))
height = bbox.height
width = bbox.width
# get image from dot
check_call(['dot','-Tpng', "-Gsize={},{}!".format(width/100, height/100),
"-Gdpi={}".format(100), "-Gratio=fill", 'random_tree.dot','-o','random.png'])
randImg = np.array(Image.open("random.png"))
# get ACTUAL size from produced image
aheight, awidth, _ = randImg.shape
# create new bounding box from actual size
abbox = mtrans.Bbox.from_bounds(bbox.x0, bbox.y0, awidth, aheight)
# transform bbox to figure coordintes
abboxf = bbox.transformed(fig.transFigure.inverted())
# Use the thus created bbox to modify the position of the axes
ax.set_position(abboxf)
# finally plot the data
ax.imshow(randImg, aspect = "auto")
# save image (this should now be correct)
plt.savefig("randomdot_mpl.png")
# show image (this might still be slightly wrong,
# if the GUI changes the figure size on the fly.)
plt.show()
As you can see the result is still not perfect (e.g the l in "values" is slightly thicker). In general, a much better solution might be to not place the image inside the axes at all, but to use a figimage, which is a non-resampled image placed inside the figure. If you still want to show the axes around it, you need to make the axes transparent to "see-through".
This would be accomplished by the following code
import numpy as np
from subprocess import check_call
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.transforms as mtrans
dpi = 100
fig = plt.figure(dpi=dpi)
#gridSpec = fig.add_gridspec(3, 2)
#ax = plt.subplot(gridSpec.new_subplotspec((2, 1)))
ax = fig.add_subplot(111)
ax.tick_params(axis = "both", which = "both", bottom = False, top = False,
labelbottom = False, right = False, left = False, labelleft = False)
ax.patch.set_visible(False)
ax.set_xlabel("Final model")
# calculate nominal size supplied to `dot`
ext = ax.get_position().transformed(fig.transFigure).extents
bbox = mtrans.Bbox.from_extents(np.array(ext).astype(int))
height = bbox.height
width = bbox.width
# get image from dot
check_call(['dot','-Tpng', "-Gsize={},{}!".format(width/100, height/100),
"-Gdpi={}".format(100), "-Gratio=fill", 'random_tree.dot','-o','random.png'])
randImg = np.array(Image.open("random.png"))
# produce a figimage, i.e. a non-resampled image
fig.figimage(randImg, xo=bbox.x0, yo=bbox.y0)
# save image (this should now be perfect)
plt.savefig("randomdot_mpl.png")
# show image (this should now be perfect)
plt.show()
Related
I am trying to add a function to the pycircos module written by ponnhide on https://github.com/ponnhide/pyCircos/blob/master/pycircos/pycircos.py. I tried to add a circular arrow in the circular plot, which is a patch collection from a ring and an arrow. However, when I call the function I put into the pycircos module in my python notebook, I get a very distorted shape, which has nothing to do with the ring (wedge) or the arrow (polygon).
I have integrated the following function from https://github.com/NaysanSaran/markov-chain/blob/master/notebooks/Drawing.ipynb in the following pycircos module:
def draw_self_loop(self, center=(0, 0), radius=500, facecolor='#2693de', edgecolor='#000000', theta1=-30, theta2=240):
# Add the ring
rwidth = 0.02
ring = mpatches.Wedge(center, radius, theta1, theta2, width=rwidth)
# Triangle edges
offset = 0.02
xcent = center[0] - radius + (rwidth/2)
left = [xcent - offset, center[1]]
right = [xcent + offset, center[1]]
bottom = [(left[0]+right[0])/2., center[1]-0.05]
arrow = plt.Polygon([left, right, bottom, left])
p = PatchCollection(
[ring, arrow],
edgecolor = edgecolor,
facecolor = facecolor
)
self.ax.add_collection(p)
In my python notebook, I have called the function from the class object "circle" at the second last line:
%matplotlib inline
import pycircos
import matplotlib.pyplot as plt
import collections
from Bio import SeqIO
Garc = pycircos.Garc
Gcircle = pycircos.Gcircle
arcList_size = []
arcList_name = []
circle = Gcircle(figsize=(8,8))
with open("BIM_data_general.txt") as f:
f.readline()
for line in f:
line = line.rstrip().split(",")
name = line[0]
length = int(line[-1])
arc = Garc(arc_id=name, size=length, interspace=2, raxis_range=(990,1000),
labelposition=50,facecolor="#FFFFFF",edgecolor="#FFFFFF",labelsize=15, label_visible=False)
arcList_size.append(arc.size)
arcList_name.append(arc.arc_id)
circle.add_garc(arc)
circle.set_garcs(0,360)
circle.draw_self_loop()
circle.figure
instead of this example , I get this distorted shape
I'm facing an issue when trying to use 'sampleRectangle()' function in GEE, it is returning 1x1 arrays and I can't seem to find a workaround. Please, see below a python code in which I'm using an approach posted by Justin Braaten. I suspect there's something wrong with the geometry object I'm passing to the function, but at the same time I've tried several ways to check how this argument is behaving and couldn't no spot any major issue.
Can anyone give me a hand trying to understand what is happening?
Thanks!
import json
import ee
import numpy as np
import matplotlib.pyplot as plt
ee.Initialize()
point = ee.Geometry.Point([-55.8571, -9.7864])
box_l8sr = ee.Geometry(point.buffer(50).bounds())
box_l8sr2 = ee.Geometry.Polygon(box_l8sr.coordinates())
# print(box_l8sr2)
# Define an image.
# l8sr_y = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_038029_20180810')
oli_sr_coll = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
## Function to mask out clouds and cloud-shadows present in Landsat images
def maskL8sr(image):
## Bits 3 and 5 are cloud shadow and cloud, respectively.
cloudShadowBitMask = (1 << 3)
cloudsBitMask = (1 << 5)
## Get the pixel QA band.
qa = image.select('pixel_qa')
## Both flags should be set to zero, indicating clear conditions.
mask = qa.bitwiseAnd(cloudShadowBitMask).eq(0)
mask = qa.bitwiseAnd(cloudsBitMask).eq(0)
return image.updateMask(mask)
l8sr_y = oli_sr_coll.filterDate('2019-01-01', '2019-12-31').map(maskL8sr).mean()
l8sr_bands = l8sr_y.select(['B2', 'B3', 'B4']).sampleRectangle(box_l8sr2)
print(type(l8sr_bands))
# Get individual band arrays.
band_arr_b4 = l8sr_bands.get('B4')
band_arr_b3 = l8sr_bands.get('B3')
band_arr_b2 = l8sr_bands.get('B2')
# Transfer the arrays from server to client and cast as np array.
np_arr_b4 = np.array(band_arr_b4.getInfo())
np_arr_b3 = np.array(band_arr_b3.getInfo())
np_arr_b2 = np.array(band_arr_b2.getInfo())
print(np_arr_b4.shape)
print(np_arr_b3.shape)
print(np_arr_b2.shape)
# Expand the dimensions of the images so they can be concatenated into 3-D.
np_arr_b4 = np.expand_dims(np_arr_b4, 2)
np_arr_b3 = np.expand_dims(np_arr_b3, 2)
np_arr_b2 = np.expand_dims(np_arr_b2, 2)
# # print(np_arr_b4.shape)
# # print(np_arr_b5.shape)
# # print(np_arr_b6.shape)
# # Stack the individual bands to make a 3-D array.
rgb_img = np.concatenate((np_arr_b2, np_arr_b3, np_arr_b4), 2)
# print(rgb_img.shape)
# # Scale the data to [0, 255] to show as an RGB image.
rgb_img_test = (255*((rgb_img - 100)/3500)).astype('uint8')
# plt.imshow(rgb_img)
plt.show()
# # # create L8OLI plot
# fig, ax = plt.subplots()
# ax.set(title = "Satellite Image")
# ax.set_axis_off()
# plt.plot(42, 42, 'ko')
# img = ax.imshow(rgb_img_test, interpolation='nearest')
I have the same issue. It seems to have something to do with .mean(), or any reduction of image collections for that matter.
One solution is to reproject after the reduction. For example, you could try adding "reproject" at the end:
l8sr_y = oli_sr_coll.filterDate('2019-01-01', '2019-12-31').map(maskL8sr).mean().reproject(crs = ee.Projection('EPSG:4326'), scale=30)
It should work.
Would there be a way to plot the borders of the continents with Basemap (or without Basemap, if there is some other way), without those annoying rivers coming along? Especially that piece of Kongo River, not even reaching the ocean, is disturbing.
EDIT: I intend to further plot data over the map, like in the Basemap gallery (and still have the borderlines of the continents drawn as black lines over the data, to give structure for the worldmap) so while the solution by Hooked below is nice, masterful even, it's not applicable for this purpose.
Image produced by:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 4.5))
plt.subplots_adjust(left=0.02, right=0.98, top=0.98, bottom=0.00)
m = Basemap(projection='robin',lon_0=0,resolution='c')
m.fillcontinents(color='gray',lake_color='white')
m.drawcoastlines()
plt.savefig('world.png',dpi=75)
For reasons like this i often avoid Basemap alltogether and read the shapefile in with OGR and convert them to a Matplotlib artist myself. Which is alot more work but also gives alot more flexibility.
Basemap has some very neat features like converting the coordinates of input data to your 'working projection'.
If you want to stick with Basemap, get a shapefile which doesnt contain the rivers. Natural Earth for example has a nice 'Land' shapefile in the physical section (download 'scale rank' data and uncompress). See http://www.naturalearthdata.com/downloads/10m-physical-vectors/
You can read the shapefile in with the m.readshapefile() method from Basemap. This allows you to get the Matplotlib Path vertices and codes in the projection coordinates which you can then convert into a new Path. Its a bit of a detour but it gives you all styling options from Matplotlib, most of which are not directly available via Basemap. Its a bit hackish, but i dont now another way while sticking to Basemap.
So:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection
from matplotlib.path import Path
fig = plt.figure(figsize=(8, 4.5))
plt.subplots_adjust(left=0.02, right=0.98, top=0.98, bottom=0.00)
# MPL searches for ne_10m_land.shp in the directory 'D:\\ne_10m_land'
m = Basemap(projection='robin',lon_0=0,resolution='c')
shp_info = m.readshapefile('D:\\ne_10m_land', 'scalerank', drawbounds=True)
ax = plt.gca()
ax.cla()
paths = []
for line in shp_info[4]._paths:
paths.append(Path(line.vertices, codes=line.codes))
coll = PathCollection(paths, linewidths=0, facecolors='grey', zorder=2)
m = Basemap(projection='robin',lon_0=0,resolution='c')
# drawing something seems necessary to 'initiate' the map properly
m.drawcoastlines(color='white', zorder=0)
ax = plt.gca()
ax.add_collection(coll)
plt.savefig('world.png',dpi=75)
Gives:
How to remove "annoying" rivers:
If you want to post-process the image (instead of working with Basemap directly) you can remove bodies of water that don't connect to the ocean:
import pylab as plt
A = plt.imread("world.png")
import numpy as np
import scipy.ndimage as nd
import collections
# Get a counter of the greyscale colors
a = A[:,:,0]
colors = collections.Counter(a.ravel())
outside_and_water_color, land_color = colors.most_common(2)
# Find the contigous landmass
land_idx = a == land_color[0]
# Index these land masses
L = np.zeros(a.shape,dtype=int)
L[land_idx] = 1
L,mass_count = nd.measurements.label(L)
# Loop over the land masses and fill the "holes"
# (rivers without outlays)
L2 = np.zeros(a.shape,dtype=int)
L2[land_idx] = 1
L2 = nd.morphology.binary_fill_holes(L2)
# Remap onto original image
new_land = L2==1
A2 = A.copy()
c = [land_color[0],]*3 + [1,]
A2[new_land] = land_color[0]
# Plot results
plt.subplot(221)
plt.imshow(A)
plt.axis('off')
plt.subplot(222)
plt.axis('off')
B = A.copy()
B[land_idx] = [1,0,0,1]
plt.imshow(B)
plt.subplot(223)
L = L.astype(float)
L[L==0] = None
plt.axis('off')
plt.imshow(L)
plt.subplot(224)
plt.axis('off')
plt.imshow(A2)
plt.tight_layout() # Only with newer matplotlib
plt.show()
The first image is the original, the second identifies the land mass. The third is not needed but fun as it ID's each separate contiguous landmass. The fourth picture is what you want, the image with the "rivers" removed.
Following user1868739's example, I am able to select only the paths (for some lakes) that I want:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 4.5))
plt.subplots_adjust(left=0.02, right=0.98, top=0.98, bottom=0.00)
m = Basemap(resolution='c',projection='robin',lon_0=0)
m.fillcontinents(color='white',lake_color='white',zorder=2)
coasts = m.drawcoastlines(zorder=1,color='white',linewidth=0)
coasts_paths = coasts.get_paths()
ipolygons = range(83) + [84] # want Baikal, but not Tanganyika
# 80 = Superior+Michigan+Huron, 81 = Victoria, 82 = Aral, 83 = Tanganyika,
# 84 = Baikal, 85 = Great Bear, 86 = Great Slave, 87 = Nyasa, 88 = Erie
# 89 = Winnipeg, 90 = Ontario
for ipoly in ipolygons:
r = coasts_paths[ipoly]
# Convert into lon/lat vertices
polygon_vertices = [(vertex[0],vertex[1]) for (vertex,code) in
r.iter_segments(simplify=False)]
px = [polygon_vertices[i][0] for i in xrange(len(polygon_vertices))]
py = [polygon_vertices[i][2] for i in xrange(len(polygon_vertices))]
m.plot(px,py,linewidth=0.5,zorder=3,color='black')
plt.savefig('world2.png',dpi=100)
But this only works when using white background for the continents. If I change color to 'gray' in the following line, we see that other rivers and lakes are not filled with the same color as the continents are. (Also playing with area_thresh will not remove those rivers that are connected to ocean.)
m.fillcontinents(color='gray',lake_color='white',zorder=2)
The version with white background is adequate for further color-plotting all kind of land information over the continents, but a more elaborate solution would be needed, if one wants to retain the gray background for continents.
I frequently modify Basemap's drawcoastlines() to avoid those 'broken' rivers. I also modify drawcountries() for the sake of data source consistency.
Here is what I use in order to support the different resolutions available in Natural Earth data:
from mpl_toolkits.basemap import Basemap
class Basemap(Basemap):
""" Modify Basemap to use Natural Earth data instead of GSHHG data """
def drawcoastlines(self):
shapefile = 'data/naturalearth/coastline/ne_%sm_coastline' % \
{'l':110, 'm':50, 'h':10}[self.resolution]
self.readshapefile(shapefile, 'coastline', linewidth=1.)
def drawcountries(self):
shapefile = 'data/naturalearth/countries/ne_%sm_admin_0_countries' % \
{'l':110, 'm':50, 'h':10}[self.resolution]
self.readshapefile(shapefile, 'countries', linewidth=0.5)
m = Basemap(llcrnrlon=-90, llcrnrlat=-40, urcrnrlon=-30, urcrnrlat=+20,
resolution='l') # resolution = (l)ow | (m)edium | (h)igh
m.drawcoastlines()
m.drawcountries()
Here is the output:
Please note that by default Basemap uses resolution='c' (crude), which is not supported in the code shown.
If you're OK with plotting outlines rather than shapefiles, it's pretty easy to plot coastlines that you can get from wherever. I got my coastlines from the NOAA Coastline Extractor in MATLAB format:
http://www.ngdc.noaa.gov/mgg/shorelines/shorelines.html
To edit the coastlines, I converted to SVG, then edited them with Inkscape, then converted back to the lat/lon text file ("MATLAB" format).
All Python code is included below.
# ---------------------------------------------------------------
def plot_lines(mymap, lons, lats, **kwargs) :
"""Plots a custom coastline. This plots simple lines, not
ArcInfo-style SHAPE files.
Args:
lons: Longitude coordinates for line segments (degrees E)
lats: Latitude coordinates for line segments (degrees N)
Type Info:
len(lons) == len(lats)
A NaN in lons and lats signifies a new line segment.
See:
giss.noaa.drawcoastline_file()
"""
# Project onto the map
x, y = mymap(lons, lats)
# BUG workaround: Basemap projects our NaN's to 1e30.
x[x==1e30] = np.nan
y[y==1e30] = np.nan
# Plot projected line segments.
mymap.plot(x, y, **kwargs)
# Read "Matlab" format files from NOAA Coastline Extractor.
# See: http://www.ngdc.noaa.gov/mgg/coast/
lineRE=re.compile('(.*?)\s+(.*)')
def read_coastline(fname, take_every=1) :
nlines = 0
xdata = array.array('d')
ydata = array.array('d')
for line in file(fname) :
# if (nlines % 10000 == 0) :
# print 'nlines = %d' % (nlines,)
if (nlines % take_every == 0 or line[0:3] == 'nan') :
match = lineRE.match(line)
lon = float(match.group(1))
lat = float(match.group(2))
xdata.append(lon)
ydata.append(lat)
nlines = nlines + 1
return (np.array(xdata),np.array(ydata))
def drawcoastline_file(mymap, fname, **kwargs) :
"""Reads and plots a coastline file.
See:
giss.basemap.drawcoastline()
giss.basemap.read_coastline()
"""
lons, lats = read_coastline(fname, take_every=1)
return drawcoastline(mymap, lons, lats, **kwargs)
# =========================================================
# coastline2svg.py
#
import giss.io.noaa
import os
import numpy as np
import sys
svg_header = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
version="1.1"
width="360"
height="180"
id="svg2">
<defs
id="defs4" />
<metadata
id="metadata7">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
id="layer1">
"""
path_tpl = """
<path
d="%PATH%"
id="%PATH_ID%"
style="fill:none;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
"""
svg_footer = "</g></svg>"
# Set up paths
data_root = os.path.join(os.environ['HOME'], 'data')
#modelerc = giss.modele.read_modelerc()
#cmrun = modelerc['CMRUNDIR']
#savedisk = modelerc['SAVEDISK']
ifname = sys.argv[1]
ofname = ifname.replace('.dat', '.svg')
lons, lats = giss.io.noaa.read_coastline(ifname, 1)
out = open(ofname, 'w')
out.write(svg_header)
path_id = 1
points = []
for lon, lat in zip(lons, lats) :
if np.isnan(lon) or np.isnan(lat) :
# Process what we have
if len(points) > 2 :
out.write('\n<path d="')
out.write('m %f,%f L' % (points[0][0], points[0][1]))
for pt in points[1:] :
out.write(' %f,%f' % pt)
out.write('"\n id="path%d"\n' % (path_id))
# out.write('style="fill:none;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"')
out.write(' />\n')
path_id += 1
points = []
else :
lon += 180
lat = 180 - (lat + 90)
points.append((lon, lat))
out.write(svg_footer)
out.close()
# =============================================================
# svg2coastline.py
import os
import sys
import re
# Reads the output of Inkscape's "Plain SVG" format, outputs in NOAA MATLAB coastline format
mainRE = re.compile(r'\s*d=".*"')
lineRE = re.compile(r'\s*d="(m|M)\s*(.*?)"')
fname = sys.argv[1]
lons = []
lats = []
for line in open(fname, 'r') :
# Weed out extraneous lines in the SVG file
match = mainRE.match(line)
if match is None :
continue
match = lineRE.match(line)
# Stop if something is wrong
if match is None :
sys.stderr.write(line)
sys.exit(-1)
type = match.group(1)[0]
spairs = match.group(2).split(' ')
x = 0
y = 0
for spair in spairs :
if spair == 'L' :
type = 'M'
continue
(sdelx, sdely) = spair.split(',')
delx = float(sdelx)
dely = float(sdely)
if type == 'm' :
x += delx
y += dely
else :
x = delx
y = dely
lon = x - 180
lat = 90 - y
print '%f\t%f' % (lon, lat)
print 'nan\tnan'
Okay I think I have a partial solution.
The basic idea is that the paths used by drawcoastlines() are ordered by the size/area. Which means the first N paths are (for most applications) the main land masses and lakes and the later paths the smaller islands and rivers.
The issue is that the first N paths that you want will depend on the projection (e.g., global, polar, regional), if area_thresh has been applied and whether you want lakes or small islands etc. In other words, you will have to tweak this per application.
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
mp = 'cyl'
m = Basemap(resolution='c',projection=mp,lon_0=0,area_thresh=200000)
fill_color = '0.9'
# If you don't want lakes set lake_color to fill_color
m.fillcontinents(color=fill_color,lake_color='white')
# Draw the coastlines, with a thin line and same color as the continent fill.
coasts = m.drawcoastlines(zorder=100,color=fill_color,linewidth=0.5)
# Exact the paths from coasts
coasts_paths = coasts.get_paths()
# In order to see which paths you want to retain or discard you'll need to plot them one
# at a time noting those that you want etc.
for ipoly in xrange(len(coasts_paths)):
print ipoly
r = coasts_paths[ipoly]
# Convert into lon/lat vertices
polygon_vertices = [(vertex[0],vertex[1]) for (vertex,code) in
r.iter_segments(simplify=False)]
px = [polygon_vertices[i][0] for i in xrange(len(polygon_vertices))]
py = [polygon_vertices[i][1] for i in xrange(len(polygon_vertices))]
m.plot(px,py,'k-',linewidth=1)
plt.show()
Once you know the relevant ipoly to stop drawing (poly_stop) then you can do something like this...
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
mproj = ['nplaea','cyl']
mp = mproj[0]
if mp == 'nplaea':
m = Basemap(resolution='c',projection=mp,lon_0=0,boundinglat=30,area_thresh=200000,round=1)
poly_stop = 10
else:
m = Basemap(resolution='c',projection=mp,lon_0=0,area_thresh=200000)
poly_stop = 18
fill_color = '0.9'
# If you don't want lakes set lake_color to fill_color
m.fillcontinents(color=fill_color,lake_color='white')
# Draw the coastlines, with a thin line and same color as the continent fill.
coasts = m.drawcoastlines(zorder=100,color=fill_color,linewidth=0.5)
# Exact the paths from coasts
coasts_paths = coasts.get_paths()
# In order to see which paths you want to retain or discard you'll need to plot them one
# at a time noting those that you want etc.
for ipoly in xrange(len(coasts_paths)):
if ipoly > poly_stop: continue
r = coasts_paths[ipoly]
# Convert into lon/lat vertices
polygon_vertices = [(vertex[0],vertex[1]) for (vertex,code) in
r.iter_segments(simplify=False)]
px = [polygon_vertices[i][0] for i in xrange(len(polygon_vertices))]
py = [polygon_vertices[i][1] for i in xrange(len(polygon_vertices))]
m.plot(px,py,'k-',linewidth=1)
plt.show()
As per my comment to #sampo-smolander
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 4.5))
plt.subplots_adjust(left=0.02, right=0.98, top=0.98, bottom=0.00)
m = Basemap(resolution='c',projection='robin',lon_0=0)
m.fillcontinents(color='gray',lake_color='white',zorder=2)
coasts = m.drawcoastlines(zorder=1,color='white',linewidth=0)
coasts_paths = coasts.get_paths()
ipolygons = range(83) + [84]
for ipoly in xrange(len(coasts_paths)):
r = coasts_paths[ipoly]
# Convert into lon/lat vertices
polygon_vertices = [(vertex[0],vertex[1]) for (vertex,code) in
r.iter_segments(simplify=False)]
px = [polygon_vertices[i][0] for i in xrange(len(polygon_vertices))]
py = [polygon_vertices[i][1] for i in xrange(len(polygon_vertices))]
if ipoly in ipolygons:
m.plot(px,py,linewidth=0.5,zorder=3,color='black')
else:
m.plot(px,py,linewidth=0.5,zorder=4,color='grey')
plt.savefig('world2.png',dpi=100)
I am trying to plot a very big file (~5 GB) using python and matplotlib. I am able to load the whole file in memory (the total available in the machine is 16 GB) but when I plot it using simple imshow I get a segmentation fault. This is most probable to the ulimit which I have set to 15000 but I cannot set higher. I have come to the conclusion that I need to plot my array in batches and therefore made a simple code to do that. My main isue is that when I plot a batch of the big array the x coordinates start always from 0 and there is no way I can overlay the images to create a final big one. If you have any suggestion please let me know. Also I am not able to install new packages like "Image" on this machine due to administrative rights. Here is a sample of the code that reads the first 12 lines of my array and make 3 plots.
import os
import sys
import scipy
import numpy as np
import pylab as pl
import matplotlib as mpl
import matplotlib.cm as cm
from optparse import OptionParser
from scipy import fftpack
from scipy.fftpack import *
from cmath import *
from pylab import *
import pp
import fileinput
import matplotlib.pylab as plt
import pickle
def readalllines(file1,rows,freqs):
file = open(file1,'r')
sizer = int(rows*freqs)
i = 0
q = np.zeros(sizer,'float')
for i in range(rows*freqs):
s =file.readline()
s = s.split()
#print s[4],q[i]
q[i] = float(s[4])
if i%262144 == 0:
print '\r ',int(i*100.0/(337*262144)),' percent complete',
i += 1
file.close()
return q
parser = OptionParser()
parser.add_option('-f',dest="filename",help="Read dynamic spectrum from FILE",metavar="FILE")
parser.add_option('-t',dest="dtime",help="The time integration used in seconds, default 10",default=10)
parser.add_option('-n',dest="dfreq",help="The bandwidth of each frequency channel in Hz",default=11.92092896)
parser.add_option('-w',dest="reduce",help="The chuncker divider in frequency channels, integer default 16",default=16)
(opts,args) = parser.parse_args()
rows=12
freqs = 262144
file1 = opts.filename
s = readalllines(file1,rows,freqs)
s = np.reshape(s,(rows,freqs))
s = s.T
print s.shape
#raw_input()
#s_shift = scipy.fftpack.fftshift(s)
#fig = plt.figure()
#fig.patch.set_alpha(0.0)
#axes = plt.axes()
#axes.patch.set_alpha(0.0)
###plt.ylim(0,8)
plt.ion()
i = 0
for o in range(0,rows,4):
fig = plt.figure()
#plt.clf()
plt.imshow(s[:,o:o+4],interpolation='nearest',aspect='auto', cmap=cm.gray_r, origin='lower')
if o == 0:
axis([0,rows,0,freqs])
fdf, fdff = xticks()
print fdf
xticks(fdf+o)
print xticks()
#axis([o,o+4,0,freqs])
plt.draw()
#w, h = fig.canvas.get_width_height()
#buf = np.fromstring(fig.canvas.tostring_argb(), dtype=np.uint8)
#buf.shape = (w,h,4)
#buf = np.rol(buf, 3, axis=2)
#w,h,_ = buf.shape
#img = Image.fromstring("RGBA", (w,h),buf.tostring())
#if prev:
# prev.paste(img)
# del prev
#prev = img
i += 1
pl.colorbar()
pl.show()
If you plot any array with more than ~2k pixels across something in your graphics chain will down sample the image in some way to display it on your monitor. I would recommend down sampling in a controlled way, something like
data = convert_raw_data_to_fft(args) # make sure data is row major
def ds_decimate(row,step = 100):
return row[::step]
def ds_sum(row,step):
return np.sum(row[:step*(len(row)//step)].reshape(-1,step),1)
# as per suggestion from tom10 in comments
def ds_max(row,step):
return np.max(row[:step*(len(row)//step)].reshape(-1,step),1)
data_plotable = [ds_sum(d) for d in data] # plug in which ever function you want
or interpolation.
Matplotlib is pretty memory-inefficient when plotting images. It creates several full-resolution intermediate arrays, which is probably why your program is crashing.
One solution is to downsample the image before feeding it into matplotlib, as #tcaswell suggests.
I also wrote some wrapper code to do this downsampling automatically, based on your screen resolution. It's at https://github.com/ChrisBeaumont/mpl-modest-image, if it's useful. It also has the advantage that the image is resampled on the fly, so you can still pan and zoom without sacrificing resolution where you need it.
I think you're just missing the extent=(left, right, bottom, top) keyword argument in plt.imshow.
x = np.random.randn(2, 10)
y = np.ones((4, 10))
x[0] = 0 # To make it clear which side is up, etc
y[0] = -1
plt.imshow(x, extent=(0, 10, 0, 2))
plt.imshow(y, extent=(0, 10, 2, 6))
# This is necessary, else the plot gets scaled and only shows the last array
plt.ylim(0, 6)
plt.colorbar()
plt.show()
My goal is to trace drawings that have a lot of separate shapes in them and to split these shapes into individual images. It is black on white. I'm quite new to numpy,opencv&co - but here is my current thought:
scan for black pixels
black pixel found -> watershed
find watershed boundary (as polygon path)
continue searching, but ignore points within the already found boundaries
I'm not very good at these kind of things, is there a better way?
First I tried to find the rectangular bounding box of the watershed results (this is more or less a collage of examples):
from numpy import *
import numpy as np
from scipy import ndimage
np.set_printoptions(threshold=np.nan)
a = np.zeros((512, 512)).astype(np.uint8) #unsigned integer type needed by watershed
y, x = np.ogrid[0:512, 0:512]
m1 = ((y-200)**2 + (x-100)**2 < 30**2)
m2 = ((y-350)**2 + (x-400)**2 < 20**2)
m3 = ((y-260)**2 + (x-200)**2 < 20**2)
a[m1+m2+m3]=1
markers = np.zeros_like(a).astype(int16)
markers[0, 0] = 1
markers[200, 100] = 2
markers[350, 400] = 3
markers[260, 200] = 4
res = ndimage.watershed_ift(a.astype(uint8), markers)
unique(res)
B = argwhere(res.astype(uint8))
(ystart, xstart), (ystop, xstop) = B.min(0), B.max(0) + 1
tr = a[ystart:ystop, xstart:xstop]
print tr
Somehow, when I use the original array (a) then argwhere seems to work, but after the watershed (res) it just outputs the complete array again.
The next step could be to find the polygon path around the shape, but the bounding box would be great for now!
Please help!
#Hooked has already answered most of your question, but I was in the middle of writing this up when he answered, so I'll post it in the hopes that it's still useful...
You're trying to jump through a few too many hoops. You don't need watershed_ift.
You use scipy.ndimage.label to differentiate separate objects in a boolean array and scipy.ndimage.find_objects to find the bounding box of each object.
Let's break things down a bit.
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
def draw_circle(grid, x0, y0, radius):
ny, nx = grid.shape
y, x = np.ogrid[:ny, :nx]
dist = np.hypot(x - x0, y - y0)
grid[dist < radius] = True
return grid
# Generate 3 circles...
a = np.zeros((512, 512), dtype=np.bool)
draw_circle(a, 100, 200, 30)
draw_circle(a, 400, 350, 20)
draw_circle(a, 200, 260, 20)
# Label the objects in the array.
labels, numobjects = ndimage.label(a)
# Now find their bounding boxes (This will be a tuple of slice objects)
# You can use each one to directly index your data.
# E.g. a[slices[0]] gives you the original data within the bounding box of the
# first object.
slices = ndimage.find_objects(labels)
#-- Plotting... -------------------------------------
fig, ax = plt.subplots()
ax.imshow(a)
ax.set_title('Original Data')
fig, ax = plt.subplots()
ax.imshow(labels)
ax.set_title('Labeled objects')
fig, axes = plt.subplots(ncols=numobjects)
for ax, sli in zip(axes.flat, slices):
ax.imshow(labels[sli], vmin=0, vmax=numobjects)
tpl = 'BBox:\nymin:{0.start}, ymax:{0.stop}\nxmin:{1.start}, xmax:{1.stop}'
ax.set_title(tpl.format(*sli))
fig.suptitle('Individual Objects')
plt.show()
Hopefully that makes it a bit clearer how to find the bounding boxes of the objects.
Use the ndimage library from scipy. The function label places a unique tag on each block of pixels that are within a threshold. This identifies the unique clusters (shapes). Starting with your definition of a:
from scipy import ndimage
image_threshold = .5
label_array, n_features = ndimage.label(a>image_threshold)
# Plot the resulting shapes
import pylab as plt
plt.subplot(121)
plt.imshow(a)
plt.subplot(122)
plt.imshow(label_array)
plt.show()