GEE 'sampleRectangle()' returning 1x1 array - Python

I'm facing an issue when trying to use the 'sampleRectangle()' function in GEE: it is returning 1x1 arrays and I can't seem to find a workaround. Please see below the Python code, in which I'm using an approach posted by Justin Braaten. I suspect there's something wrong with the geometry object I'm passing to the function, but at the same time I've tried several ways to check how this argument is behaving and couldn't spot any major issue.
Can anyone give me a hand trying to understand what is happening?
Thanks!
import json
import ee
import numpy as np
import matplotlib.pyplot as plt
ee.Initialize()
point = ee.Geometry.Point([-55.8571, -9.7864])
box_l8sr = ee.Geometry(point.buffer(50).bounds())
box_l8sr2 = ee.Geometry.Polygon(box_l8sr.coordinates())
# print(box_l8sr2)
# Define an image.
# l8sr_y = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_038029_20180810')
oli_sr_coll = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
## Function to mask out clouds and cloud-shadows present in Landsat images
def maskL8sr(image):
    ## Bits 3 and 5 are cloud shadow and cloud, respectively.
    cloudShadowBitMask = (1 << 3)
    cloudsBitMask = (1 << 5)
    ## Get the pixel QA band.
    qa = image.select('pixel_qa')
    ## Both flags should be set to zero, indicating clear conditions.
    ## (Combine the two tests with And(); assigning twice would discard the first mask.)
    mask = qa.bitwiseAnd(cloudShadowBitMask).eq(0) \
             .And(qa.bitwiseAnd(cloudsBitMask).eq(0))
    return image.updateMask(mask)
l8sr_y = oli_sr_coll.filterDate('2019-01-01', '2019-12-31').map(maskL8sr).mean()
l8sr_bands = l8sr_y.select(['B2', 'B3', 'B4']).sampleRectangle(box_l8sr2)
print(type(l8sr_bands))
# Get individual band arrays.
band_arr_b4 = l8sr_bands.get('B4')
band_arr_b3 = l8sr_bands.get('B3')
band_arr_b2 = l8sr_bands.get('B2')
# Transfer the arrays from server to client and cast as np array.
np_arr_b4 = np.array(band_arr_b4.getInfo())
np_arr_b3 = np.array(band_arr_b3.getInfo())
np_arr_b2 = np.array(band_arr_b2.getInfo())
print(np_arr_b4.shape)
print(np_arr_b3.shape)
print(np_arr_b2.shape)
# Expand the dimensions of the images so they can be concatenated into 3-D.
np_arr_b4 = np.expand_dims(np_arr_b4, 2)
np_arr_b3 = np.expand_dims(np_arr_b3, 2)
np_arr_b2 = np.expand_dims(np_arr_b2, 2)
# Stack the individual bands to make a 3-D array.
rgb_img = np.concatenate((np_arr_b2, np_arr_b3, np_arr_b4), 2)
print(rgb_img.shape)
# Scale the data to [0, 255] to show as an RGB image.
rgb_img_test = (255 * ((rgb_img - 100) / 3500)).astype('uint8')
# Create the L8 OLI plot.
fig, ax = plt.subplots()
ax.set(title="Satellite Image")
ax.set_axis_off()
ax.imshow(rgb_img_test, interpolation='nearest')
plt.show()

I have the same issue. It seems to have something to do with .mean(), or any reduction of image collections for that matter.
One solution is to reproject after the reduction. For example, you could try adding "reproject" at the end:
l8sr_y = oli_sr_coll.filterDate('2019-01-01', '2019-12-31').map(maskL8sr).mean().reproject(crs=ee.Projection('EPSG:4326'), scale=30)
It should work.
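For reference, a minimal sketch of the full pipeline with the fix applied, reusing the variable names from the question (the expected output shape is an assumption based on the 50 m buffer sampled at 30 m scale):
l8sr_y = (oli_sr_coll.filterDate('2019-01-01', '2019-12-31')
          .map(maskL8sr)
          .mean()
          .reproject(crs=ee.Projection('EPSG:4326'), scale=30))
l8sr_bands = l8sr_y.select(['B2', 'B3', 'B4']).sampleRectangle(box_l8sr2)
np_arr_b4 = np.array(l8sr_bands.get('B4').getInfo())
print(np_arr_b4.shape)  # should now be larger than (1, 1)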

Why is this code not visually showing the right colors extracted from the image?

I am working on a program to extract up to 4 of the most common colors from a picture. Right now, I'm working on visually showing the most common colors; however, after reading the image, I am:
unable to get an output of the correct RGB codes (it's not outputting them for me)
and
the chart that pops up either shows all black, or shows 3 random colors that are not in the picture.
Any tips or help? I've tried everything I can and I am not sure why it cannot read the colors well. Thank you.
The code:
import matplotlib.image as img
import matplotlib.pyplot as plt
from scipy.cluster.vq import whiten
from scipy.cluster.vq import kmeans
import pandas as pd
import numpy as np
bimage = img.imread('Images/build2.jpg') #read image (this part works)
print(bimage.shape)
r = []
g = []
b = []
for row in bimage:
    for temp_r, temp_g, temp_b in row:
        r.append(temp_r)
        g.append(temp_g)
        b.append(temp_b)
bimage_df = pd.DataFrame({'red': r,
'green': g,
'blue': b})
bimage_df['scaled_color_red'] = whiten(bimage_df['red']) #supposed to give color codes
bimage_df['scaled_color_blue'] = whiten(bimage_df['blue'])
bimage_df['scaled_color_green'] = whiten(bimage_df['green'])
cluster_centers, _ = kmeans(bimage_df[['scaled_color_red', #to find most common colors
'scaled_color_blue',
'scaled_color_green']], 3)
dominant_colors = []
red_std, green_std, blue_std = bimage_df[['red',
'green',
'blue']].std()
for cluster_center in cluster_centers:
red_scaled, green_scaled, blue_scaled = cluster_center
dominant_colors.append((
red_scaled * red_std / 255,
green_scaled * green_std / 255,
blue_scaled * blue_std / 255
))
plt.imshow([dominant_colors])
plt.show()
The image I used:
I have tried using this method for an output and another type of chart too, but that gave me all black or purple, unrelated colors. I had referred to GeeksforGeeks for this but could not troubleshoot either. Any help would be greatly appreciated.
The major issue is the usage of the whiten method, which is not adequate for the sample image:
whiten documentation:
Before running k-means, it is beneficial to rescale each feature dimension of the observation set by its standard deviation (i.e. “whiten” it - as in “white noise” where each frequency has equal power). Each feature is divided by its standard deviation across all observations to give it unit variance.
The normalization method assumes a normal distribution of the noise.
The sample image is not a natural image (it has no noise), so the normalization procedure does not fit the given image.
Instead of normalization, it is recommended to convert the image to the LAB color space, where color distances better match perceptual distances.
Keeping the colors in RGB format may work well enough...
Swapping the green and the blue channels is another issue.
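For completeness, if you do want to try the LAB route, here is a minimal sketch (assuming scikit-image is available; this is not part of the original code, and rgb2lab expects float RGB in [0, 1]):
import matplotlib.image as img
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans
from skimage.color import rgb2lab, lab2rgb

bimage = img.imread('Images/build2.jpg')
lab_pixels = rgb2lab(bimage / 255.0).reshape(-1, 3)  # cluster in LAB space
centers_lab, _ = kmeans(lab_pixels, 4, iter=100)
dominant_rgb = lab2rgb(centers_lab.reshape(1, -1, 3))  # back to RGB in [0, 1]
plt.imshow(dominant_rgb)
plt.show()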
Instead of using a for loop, we may use NumPy array operations (it's not a bug, just faster):
fimage = bimage.astype(float)  # Convert image from uint8 to float (kmeans requires floats).
r = fimage[:, :, 0].flatten().tolist()  # Convert red elements to list
g = fimage[:, :, 1].flatten().tolist()  # Convert green elements to list
b = fimage[:, :, 2].flatten().tolist()  # Convert blue elements to list
bimage_df = pd.DataFrame({'red': r,
'green': g,
'blue': b})
Apply kmeans with 100 iterations (the default is 20, and may not be enough):
cluster_centers, _ = kmeans(bimage_df[['red',    # Find the 4 most common colors
                                       'green',
                                       'blue']], 4, iter=100)  # The default is 20 iterations; use 100 for better convergence
Before using plt.imshow we have to convert the colors to uint8 type (we may also convert to the range [0, 1]), otherwise the displayed colors are going to be white (saturated).
dominant_colors = np.round(cluster_centers).astype(np.uint8) # Round and convert to uint8
plt.imshow([dominant_colors])
plt.show()
Code sample:
import matplotlib.image as img
import matplotlib.pyplot as plt
#from scipy.cluster.vq import whiten
from scipy.cluster.vq import kmeans
import pandas as pd
import numpy as np
bimage = img.imread('Images/build2.jpg') #read image (this part works)
print(bimage.shape)
#r = []
#g = []
#b = []
#for row in bimage:
# for temp_r, temp_g, temp_b in row:
# r.append(temp_r)
# g.append(temp_g)
# b.append(temp_b)
# Use NumPy array operations, instead of using a for loop.
fimage = bimage.astype(float) # Convert image from uint8 to float (kmeans requires floats).
r = fimage[:, :, 0].flatten().tolist() # Convert red elements to list
g = fimage[:, :, 1].flatten().tolist() # Convert green elements to list
b = fimage[:, :, 2].flatten().tolist() # Convert blue elements to list
bimage_df = pd.DataFrame({'red': r,
'green': g,
'blue': b})
# Don't use whiten
#bimage_df['scaled_color_red'] = whiten(bimage_df['red']) #supposed to give color codes
#bimage_df['scaled_color_blue'] = whiten(bimage_df['blue'])
#bimage_df['scaled_color_green'] = whiten(bimage_df['green'])
#cluster_centers, _ = kmeans(bimage_df[['scaled_color_red', #to find most common colors
# 'scaled_color_blue',
# 'scaled_color_green']], 3)
cluster_centers, _ = kmeans(bimage_df[['red', #Find the 4 most common colors
'green',
'blue']], 4, iter=100) # The default is 20 iterations, use 100 iterations for better convergence
dominant_colors = np.round(cluster_centers).astype(np.uint8) # Round and convert to uint8
print(dominant_colors)
# Since whiten is not used, we don't need the STD
#red_std, green_std, blue_std = bimage_df[['red',
# 'green',
# 'blue']].std()
#for cluster_center in cluster_centers:
# red_scaled, green_scaled, blue_scaled = cluster_center
# dominant_colors.append((
# red_scaled * red_std / 255,
# green_scaled * green_std / 255,
# blue_scaled * blue_std / 255
# ))
plt.imshow([dominant_colors])
plt.show()
Result:

NameError: name 'IMG_H' is not defined

I am new to programming. I am using the PIL and Matplotlib libraries for contrast stretching. When I use the histogram equalizer I get the error: name 'IMG_H' is not defined. I am converting my image to a NumPy array, calculating the histogram and cumulative sum, building a mapping, and then applying the mapping to create a new image.
You can see my code below -
# HISTOGRAM EQUALIZATION
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
def make_histogram(img):
    """ Take an image and create a histogram from its luma values """
    y_vals = img[:,:,0].flatten()
    histogram = np.zeros(256, dtype=int)
    for y_index in range(y_vals.size):
        histogram[y_vals[y_index]] += 1
    return histogram
def make_cumsum(histogram):
    """ Create an array that represents the cumulative sum of the histogram """
    cumsum = np.zeros(256, dtype=int)
    cumsum[0] = histogram[0]
    for i in range(1, histogram.size):
        cumsum[i] = cumsum[i-1] + histogram[i]
    return cumsum
def make_mapping(histogram, cumsum):
    mapping = np.zeros(256, dtype=int)
    luma_levels = 256
    for i in range(histogram.size):
        mapping[i] = max(0, round((luma_levels*cumsum[i])/(IMG_H*IMG_W))-1)
    return mapping
def apply_mapping(img, mapping):
    """ Apply the mapping to our image """
    new_image = img.copy()
    new_image[:,:,0] = list(map(lambda a: mapping[a], img[:,:,0]))
    return new_image
# Load image
pillow_img = Image.open('pout.jpg')
# Convert our image to numpy array, calculate the histogram, cumulative sum,
# mapping and then apply the mapping to create a new image
img = np.array(pillow_img)
histogram = make_histogram(img)
cumsum = make_cumsum(histogram)
mapping = make_mapping(histogram, cumsum)
new_image = apply_mapping(img, mapping)
output_image = Image.fromarray(np.uint8(new_image))
plt.imshow(output_image, cmap='gray')
# Display the old (black) and new (red) histograms next to each other
x_axis = np.arange(256)
fig = plt.figure()
fig.add_subplot(1,2,1)
plt.bar(x_axis , histogram, color = "black")
fig.add_subplot(1,2,2)
plt.bar(x_axis , make_histogram(new_image), color = "red")
plt.show()
You use this variable here:
mapping[i] = max(0, round((luma_levels*cumsum[i])/(IMG_H*IMG_W))-1)
But you didn't define it (or import it) beforehand, which is why you get this error.
for i in range(histogram.size):
    mapping[i] = max(0, round((luma_levels*cumsum[i])/(IMG_H*IMG_W))-1)
In the line above you are using two variables, IMG_H and IMG_W.
Where did you define these variables?
EDITED PART
for i in range(histogram.size):
    mapping[i] = max(0, round((luma_levels*cumsum[i])/(IMG_H*IMG_W))-1)
In the line above you are trying to compute the product (IMG_H*IMG_W), but you never define or import these two variables anywhere in the code.
You can fix it by defining these variables at the top of the code.
Your code suggests that these variables hold the image width and height:
IMG_W = 120 #Any value in integer for Image Width
IMG_H = 124 #Any value in integer for Image Height
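Alternatively (just a suggestion, not part of the original code), you can derive the two values from the loaded image itself, so they always match the file being processed:
img = np.array(pillow_img)
IMG_H, IMG_W = img.shape[0], img.shape[1]  # height and width of the loaded image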

Extract N number of patches from an image

I have an image of dimension 155 x 240. Like the following:
I want to extract patches of a certain shape (25 x 25).
I don't want patches from the whole image.
I want to extract N patches from the non-zero (not background) area of the image. How can I do that? Any idea, suggestion, or implementation will be appreciated. You can try it with either Matlab or Python.
Note:
I have generated a random image so that you can process it for patching. The image_process variable holds that image in this code.
import numpy as np
from scipy.ndimage.filters import convolve
import matplotlib.pyplot as plt
background = np.ones((155,240))
background[78,120] = 2
n_d = 50
y,x = np.ogrid[-n_d: n_d+1, -n_d: n_d+1]
mask = x**2+y**2 <= n_d**2
mask = 254*mask.astype(float)
image_process = convolve(background, mask)-sum(sum(mask))+1
image_process[image_process==1] = 0
image_process[image_process==255] = 1
plt.imshow(image_process)
Let's assume that the pixel value you want to omit is 0.
In this case, what you could do is first find the indices of the non-zero values, then slice the image at the min/max positions to get only the desired area, and then simply apply extract_patches_2d with the desired window size and number of patches.
For example, given the dummy image you supplied:
import numpy as np
from scipy.ndimage.filters import convolve
import matplotlib.pyplot as plt
background = np.ones((155,240))
background[78,120] = 2
n_d = 50
y,x = np.ogrid[-n_d: n_d+1, -n_d: n_d+1]
mask = x**2+y**2 <= n_d**2
mask = 254*mask.astype(float)
image_process = convolve(background, mask)-sum(sum(mask))+1
image_process[image_process==1] = 0
image_process[image_process==255] = 1
plt.figure()
plt.imshow(image_process)
plt.show()
from sklearn.feature_extraction.image import extract_patches_2d
x, y = np.nonzero(image_process)
xl,xr = x.min(),x.max()
yl,yr = y.min(),y.max()
only_desired_area = image_process[xl:xr+1, yl:yr+1]
window_shape = (25, 25)
B = extract_patches_2d(only_desired_area, window_shape, max_patches=100) # B shape will be (100, 25, 25)
If you plot the only_desired_area you will get the following image:
This is the main logic; if you wish an even tighter bound you should adjust the slicing accordingly. If, for example, you want to guarantee that no patch contains background at all, you can filter the extracted patches afterwards, as sketched below.
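A sketch of that filtering (my own suggestion, not part of the original answer; B comes from extract_patches_2d above):
# Keep only patches with no background (zero) pixels.
full_patches = B[~np.any(B == 0, axis=(1, 2))]
print(full_patches.shape)  # (n_kept, 25, 25)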

Plot really big file in python (5GB) with x axis offset

I am trying to plot a very big file (~5 GB) using python and matplotlib. I am able to load the whole file in memory (the total available in the machine is 16 GB), but when I plot it using simple imshow I get a segmentation fault. This is most probably due to the ulimit, which I have set to 15000 but cannot set higher. I have come to the conclusion that I need to plot my array in batches and therefore made a simple code to do that. My main issue is that when I plot a batch of the big array, the x coordinates always start from 0 and there is no way I can overlay the images to create a final big one. If you have any suggestions please let me know. Also, I am not able to install new packages like "Image" on this machine due to administrative rights. Here is a sample of the code that reads the first 12 lines of my array and makes 3 plots.
import os
import sys
import scipy
import numpy as np
import pylab as pl
import matplotlib as mpl
import matplotlib.cm as cm
from optparse import OptionParser
from scipy import fftpack
from scipy.fftpack import *
from cmath import *
from pylab import *
import pp
import fileinput
import matplotlib.pylab as plt
import pickle
def readalllines(file1, rows, freqs):
    file = open(file1, 'r')
    sizer = int(rows*freqs)
    i = 0
    q = np.zeros(sizer, 'float')
    for i in range(rows*freqs):
        s = file.readline()
        s = s.split()
        #print s[4], q[i]
        q[i] = float(s[4])
        if i%262144 == 0:
            print '\r ', int(i*100.0/(337*262144)), ' percent complete',
        i += 1
    file.close()
    return q
parser = OptionParser()
parser.add_option('-f',dest="filename",help="Read dynamic spectrum from FILE",metavar="FILE")
parser.add_option('-t',dest="dtime",help="The time integration used in seconds, default 10",default=10)
parser.add_option('-n',dest="dfreq",help="The bandwidth of each frequency channel in Hz",default=11.92092896)
parser.add_option('-w',dest="reduce",help="The chuncker divider in frequency channels, integer default 16",default=16)
(opts,args) = parser.parse_args()
rows=12
freqs = 262144
file1 = opts.filename
s = readalllines(file1,rows,freqs)
s = np.reshape(s,(rows,freqs))
s = s.T
print s.shape
#raw_input()
#s_shift = scipy.fftpack.fftshift(s)
#fig = plt.figure()
#fig.patch.set_alpha(0.0)
#axes = plt.axes()
#axes.patch.set_alpha(0.0)
###plt.ylim(0,8)
plt.ion()
i = 0
for o in range(0, rows, 4):
    fig = plt.figure()
    #plt.clf()
    plt.imshow(s[:, o:o+4], interpolation='nearest', aspect='auto', cmap=cm.gray_r, origin='lower')
    if o == 0:
        axis([0, rows, 0, freqs])
    fdf, fdff = xticks()
    print fdf
    xticks(fdf+o)
    print xticks()
    #axis([o, o+4, 0, freqs])
    plt.draw()
    #w, h = fig.canvas.get_width_height()
    #buf = np.fromstring(fig.canvas.tostring_argb(), dtype=np.uint8)
    #buf.shape = (w, h, 4)
    #buf = np.roll(buf, 3, axis=2)
    #w, h, _ = buf.shape
    #img = Image.fromstring("RGBA", (w, h), buf.tostring())
    #if prev:
    #    prev.paste(img)
    #    del prev
    #prev = img
    i += 1
pl.colorbar()
pl.show()
If you plot any array with more than ~2k pixels across, something in your graphics chain will downsample the image in some way to display it on your monitor. I would recommend downsampling in a controlled way, something like
data = convert_raw_data_to_fft(args)  # make sure data is row major

def ds_decimate(row, step=100):
    return row[::step]

def ds_sum(row, step=100):
    return np.sum(row[:step*(len(row)//step)].reshape(-1, step), 1)

# as per suggestion from tom10 in comments
def ds_max(row, step=100):
    return np.max(row[:step*(len(row)//step)].reshape(-1, step), 1)

data_plotable = [ds_sum(d) for d in data]  # plug in whichever function you want
or interpolation.
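For the interpolation route, a sketch using scipy.ndimage.zoom (the zoom factor here is an arbitrary assumption; pick one that brings the width under a few thousand samples):
from scipy import ndimage
factor = 0.05  # keep ~5% of the samples along each axis
data_plotable = ndimage.zoom(data, factor, order=1)  # bilinear interpolation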
Matplotlib is pretty memory-inefficient when plotting images. It creates several full-resolution intermediate arrays, which is probably why your program is crashing.
One solution is to downsample the image before feeding it into matplotlib, as @tcaswell suggests.
I also wrote some wrapper code to do this downsampling automatically, based on your screen resolution. It's at https://github.com/ChrisBeaumont/mpl-modest-image, if it's useful. It also has the advantage that the image is resampled on the fly, so you can still pan and zoom without sacrificing resolution where you need it.
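Usage is meant to mirror Axes.imshow; a sketch based on the project's README (treat the exact helper name and signature as an assumption):
import matplotlib.pyplot as plt
from modest_image import imshow

fig, ax = plt.subplots()
imshow(ax, data, cmap='gray')  # resamples to the screen resolution on the fly
plt.show()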
I think you're just missing the extent=(left, right, bottom, top) keyword argument in plt.imshow.
x = np.random.randn(2, 10)
y = np.ones((4, 10))
x[0] = 0 # To make it clear which side is up, etc
y[0] = -1
plt.imshow(x, extent=(0, 10, 0, 2))
plt.imshow(y, extent=(0, 10, 2, 6))
# This is necessary, else the plot gets scaled and only shows the last array
plt.ylim(0, 6)
plt.colorbar()
plt.show()

Shape recognition with numpy/scipy (perhaps watershed)

My goal is to trace drawings that have a lot of separate shapes in them and to split these shapes into individual images. The drawings are black on white. I'm quite new to numpy, opencv & co, but here is my current thought:
scan for black pixels
black pixel found -> watershed
find watershed boundary (as polygon path)
continue searching, but ignore points within the already found boundaries
I'm not very good at this kind of thing; is there a better way?
First I tried to find the rectangular bounding box of the watershed results (this is more or less a collage of examples):
from numpy import *
import numpy as np
from scipy import ndimage
np.set_printoptions(threshold=np.nan)
a = np.zeros((512, 512)).astype(np.uint8) #unsigned integer type needed by watershed
y, x = np.ogrid[0:512, 0:512]
m1 = ((y-200)**2 + (x-100)**2 < 30**2)
m2 = ((y-350)**2 + (x-400)**2 < 20**2)
m3 = ((y-260)**2 + (x-200)**2 < 20**2)
a[m1+m2+m3]=1
markers = np.zeros_like(a).astype(int16)
markers[0, 0] = 1
markers[200, 100] = 2
markers[350, 400] = 3
markers[260, 200] = 4
res = ndimage.watershed_ift(a.astype(uint8), markers)
unique(res)
B = argwhere(res.astype(uint8))
(ystart, xstart), (ystop, xstop) = B.min(0), B.max(0) + 1
tr = a[ystart:ystop, xstart:xstop]
print tr
Somehow, when I use the original array (a), argwhere seems to work, but after the watershed (res) it just outputs the complete array again.
The next step could be to find the polygon path around the shape, but the bounding box would be great for now!
Please help!
@Hooked has already answered most of your question, but I was in the middle of writing this up when he answered, so I'll post it in the hopes that it's still useful...
You're trying to jump through a few too many hoops. You don't need watershed_ift.
You can use scipy.ndimage.label to differentiate separate objects in a boolean array and scipy.ndimage.find_objects to find the bounding box of each object.
Let's break things down a bit.
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
def draw_circle(grid, x0, y0, radius):
    ny, nx = grid.shape
    y, x = np.ogrid[:ny, :nx]
    dist = np.hypot(x - x0, y - y0)
    grid[dist < radius] = True
    return grid
# Generate 3 circles...
a = np.zeros((512, 512), dtype=np.bool)
draw_circle(a, 100, 200, 30)
draw_circle(a, 400, 350, 20)
draw_circle(a, 200, 260, 20)
# Label the objects in the array.
labels, numobjects = ndimage.label(a)
# Now find their bounding boxes (This will be a tuple of slice objects)
# You can use each one to directly index your data.
# E.g. a[slices[0]] gives you the original data within the bounding box of the
# first object.
slices = ndimage.find_objects(labels)
#-- Plotting... -------------------------------------
fig, ax = plt.subplots()
ax.imshow(a)
ax.set_title('Original Data')
fig, ax = plt.subplots()
ax.imshow(labels)
ax.set_title('Labeled objects')
fig, axes = plt.subplots(ncols=numobjects)
for ax, sli in zip(axes.flat, slices):
    ax.imshow(labels[sli], vmin=0, vmax=numobjects)
    tpl = 'BBox:\nymin:{0.start}, ymax:{0.stop}\nxmin:{1.start}, xmax:{1.stop}'
    ax.set_title(tpl.format(*sli))
fig.suptitle('Individual Objects')
plt.show()
Hopefully that makes it a bit clearer how to find the bounding boxes of the objects.
Use the ndimage library from scipy. The function label places a unique tag on each block of pixels that are within a threshold. This identifies the unique clusters (shapes). Starting with your definition of a:
from scipy import ndimage
image_threshold = .5
label_array, n_features = ndimage.label(a>image_threshold)
# Plot the resulting shapes
import pylab as plt
plt.subplot(121)
plt.imshow(a)
plt.subplot(122)
plt.imshow(label_array)
plt.show()
