I need to download a bunch of Landsat images from Google Earth Engine, and I'm doing it with its Python API:
task = ee.batch.Export.image(IMAGE, NAME, config=CONFIG)
task.start()
It works, but it is extremely slow: my images are only around 70 kB each, yet each export takes 4 minutes or longer. Currently I start the tasks one by one in a for loop, so is it possible to pack multiple images into a single task? Or is there some other way to speed up this process?
The startup costs for an export are pretty high. If you're just exporting tiny pieces, you might be able to do it with image.getThumbnail for RGB visualized images, or image.getDownloadURL() for raw numbers.
The getThumbnail() function will give you a PNG or JPEG, and getDownloadURL() will give you a zipfile with the bands as individual GeoTIFFs. Note that both are unreliable for large downloads or images that take a lot of computation. Since the image only starts being computed once you access the resulting URL, there's no way to communicate failure; if something goes wrong, you just get an empty or corrupt file.
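For example, a rough sketch of the getDownloadURL() route (the asset ID, region, and scale here are just placeholders, and a recent earthengine-api is assumed, where 'region' accepts an ee.Geometry):

import ee
ee.Initialize()

# placeholder scene and area of interest
image = ee.Image('LANDSAT/LC08/C02/T1_TOA/LC08_044034_20140318').select(['B4', 'B3', 'B2'])
region = ee.Geometry.Rectangle([-122.1, 37.4, -122.0, 37.5])

# returns a URL to a zip of per-band GeoTIFFs; the computation only runs when the URL is fetched
url = image.getDownloadURL({
    'scale': 30,
    'crs': 'EPSG:4326',
    'region': region,
})
print(url)  # fetch it with e.g. urllib or requests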
You simply can't. The export runs on Google's servers, so there is nothing you can do about it on your end. The only thing I can think of is that unnecessary operations may make it take longer. For example (a very simple one):
# img_1 does redundant work: B is added and then subtracted again
img_1 = ee.Image(A).add(ee.Image(B)).add(ee.Image(C)).subtract(ee.Image(B))
# img_2 produces the same result without the extra operations
img_2 = ee.Image(A).add(ee.Image(C))
Exporting img_1 could take longer than exporting img_2. But I'm only guessing, because I don't really know what happens on the "server side". You could ask the people who do know in the Earth Engine developers forum (https://groups.google.com/forum/#!forum/google-earth-engine-developers).
By the way, if you use the Python API, this could be useful: https://github.com/gee-community/gee_tools
For my school project, I need to find images in a large dataset. I'm working with Python and OpenCV. So far I've managed to find an exact match of an image in the dataset, but it takes a lot of time even though I only had 20 images for the test code. So I've searched a few pages of Google and tried the code from these pages:
image hashing
building an image hashing search engine
feature matching
I've also been thinking about searching through the hashed dataset first, saving the candidate paths, and then finding the best feature-matching image among them. But most of the time the narrowed-down candidate set is very different from my query image.

Image hashing looks really great, and it seems like what I need, but there is a problem: I need to find an exact match, not similar photos. So I'm asking you: if you have any suggestion, or a piece of code that might help or improve the reference code I've linked, can you share it with me? I'd be really happy to try or research whatever you send or suggest.
OpenCV is probably the wrong tool for this. Its algorithms are geared towards finding similar matches, not exact ones. The general idea is to use machine learning to teach the code what a car looks like so it can detect cars in videos, even when the color or form changes (driving in the shadow, a different make, etc.).
I've found two approaches work well when trying to build an image database.
Use a normal hash algorithm like SHA-256 plus maybe some metadata (file or image size) to find matches
Resize the image down to 4x4 or even 2x2. Use the pixel RGB values as "hash".
The first approach reduces the whole image to a single number. You can then put that number in a lookup table. When searching for an image, apply the same hashing algorithm to the image you're looking for and use the resulting number to look in the table. If it's there, you have a match.

Note: in all cases, hashing can produce the same number for different pictures, so you have to compare all the pixels of the two pictures to make sure it's really an exact match. That's why it sometimes helps to add information like the picture size (in pixels, not the file size in bytes).

The second approach lets you find pictures that look identical to the eye but are in fact slightly different. Imagine cropping off a single pixel column on the left, or tilting the image by 0.01°. To you the images will look the same, but to a computer they will be totally different. The second approach averages such small changes out. The cost is that you will get more collisions, especially for B&W pictures.
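A minimal sketch of both keys, using Pillow and hashlib (the helper names are made up, and the verification step against the raw pixels is left out):

import hashlib
from PIL import Image

def exact_key(path):
    # approach 1: hash the raw file bytes, plus the pixel dimensions as extra metadata
    with open(path, 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return (digest, Image.open(path).size)

def tiny_key(path, size=4):
    # approach 2: shrink to size x size and use the RGB values themselves as the "hash"
    img = Image.open(path).convert('RGB').resize((size, size))
    return img.tobytes()

Remember that either key only narrows things down; on a hit you still compare the full pixel data of the two candidates to confirm an exact match.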
Finding exact image matches using hash functions can be done with the undouble library (disclaimer: I am also the author). It works in a multi-step process: pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping images based on a threshold value.
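Roughly like this (a sketch following the API described in the project's README, so check it against the installed version; the image directory is a placeholder):

from undouble import Undouble

model = Undouble(method='phash', hash_size=8)
model.import_data('path/to/images/')  # placeholder directory
model.compute_hash()
model.group(threshold=0)  # threshold=0 groups only identical hashes
print(model.results)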
I was wondering how to create a fingerprint database. If fingerprints are stored as images, how do you compare images in a database, or create an image search engine like TinEye?
I know this is a big subject, but I'm just looking for a starting point. Can this be done using Python/Django libraries and MySQL?
OpenCV comes with a sample program that does what you are looking for. It's called find_obj.py. Pull it up in your editor and change:
surf = cv2.SURF(1000)
to
surf = cv2.SURF(100)
This should find lots of "inlier" points of interest in the negative of the fingerprint scan.
You can play around with a number of the variables and eventually find the best configuration for the sort of images you're comparing. It's also fairly straightforward to alter the sample to allow you to compare a single image against an entire directory.
I should point out that this will only be effective for the sort of digitized fingerprint scans used by law enforcement.
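If you want a self-contained starting point in the same spirit as the sample, here is a rough sketch that compares one query image against a directory. It uses ORB instead of SURF (SURF now lives in the opencv-contrib xfeatures2d module), and the paths and distance threshold are placeholders to tune:

import glob
import cv2

orb = cv2.ORB_create(nfeatures=1000)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

query = cv2.imread('query.png', cv2.IMREAD_GRAYSCALE)
q_kp, q_des = orb.detectAndCompute(query, None)

best_path, best_score = None, 0
for path in glob.glob('prints/*.png'):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    kp, des = orb.detectAndCompute(img, None)
    if des is None:
        continue
    matches = bf.match(q_des, des)
    # keep only reasonably close descriptor matches
    good = [m for m in matches if m.distance < 40]
    if len(good) > best_score:
        best_path, best_score = path, len(good)

print(best_path, best_score)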
The Python Imaging Library is probably the best library to get started on image processing with.
The library most commonly used for real-time image processing (you don't need real time, but you can't go wrong with fast) is OpenCV. It has Python bindings and built-in feature detection algorithms. See also this comparison.
For an overview of image comparison algorithms take a look at this question.
As a very simple approach you can crawl all images and compute a hash for each.
Later on, when a user submits an image for a search, you compute a hash for that too and look for the same hash in your database.

However, this is a really simplistic approach and will only work when searching for exact image copies. Ideally, each image should be converted to some simplified feature set (to tolerate different versions of the same image: different formats, sizes, noise, etc.) that is used for comparison. For instance, it could be worth converting the images (both crawled and submitted for search) to 128x128 grayscale and computing a hash of that.
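A small sketch of that idea (the path variables are placeholders, and note that plain resizing still only tolerates trivial differences such as format or scale, not real noise):

import hashlib
from PIL import Image

def normalized_hash(path):
    # reduce to 128x128 grayscale, then hash the pixel bytes
    img = Image.open(path).convert('L').resize((128, 128))
    return hashlib.md5(img.tobytes()).hexdigest()

# crawl phase: build the lookup table
index = {}
for path in all_image_paths:  # all_image_paths is assumed to exist
    index.setdefault(normalized_hash(path), []).append(path)

# search phase: same hash, plain dictionary lookup
matches = index.get(normalized_hash(query_path), [])  # query_path is assumed to exist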
I'm using Python 2.7, PyGTK 2.24, and PyGST (Gstreamer).
To ensure smooth playback from one clip to another (without a blink), I combined all the clips I need into one larger video. This lets me seek to the exact place I need in code. One of the clips is a "fill-in", which should loop whenever none of the other clips is playing.
However, to make my code easier and more streamlined, I want to use segments to define the various clips within the larger video. Then, at the end of each segment (I know there is a segment end event), I seek to the fill-in clip. When I need another clip, I just seek to that segment.
My question is, how exactly do I create these segments? I'm guessing that would be event_new_new_segment(), but I am not sure. Can I create multiple clips to seek with using this function? Is there another I should use? Are there any gotchas to this method of seeking in my video that I should be aware of?

Second, how do I seek to that segment?
Thank you!
It looks like only GstElements can generate NEWSEGMENT events; you can't simply attach one to an existing element. The closest thing you could do, if not using Python, would be to create a single-shot or periodic GstClockID and use gst_clock_id_wait_async until the clock time is hit. But the problem is that GstClockID is not wrapped in PyGst.
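Within Python, one workaround you could try (untested here, and assuming PyGST 0.10 with a playbin2-based pipeline; the URI and clip boundaries are placeholders) is to issue a segment seek for each clip and listen for the segment-done message on the bus, jumping back to the fill-in clip when it arrives:

import pygst
pygst.require("0.10")
import gst

FILL_IN_START, FILL_IN_STOP = 0.0, 5.0  # placeholder boundaries of the fill-in clip, in seconds

player = gst.element_factory_make("playbin2", "player")
player.set_property("uri", "file:///path/to/combined-video")  # placeholder URI

def seek_clip(start_sec, stop_sec):
    # a segment seek plays only [start, stop) and posts SEGMENT_DONE instead of EOS;
    # drop SEEK_FLAG_FLUSH on later seeks if you want gapless transitions
    player.seek(1.0, gst.FORMAT_TIME,
                gst.SEEK_FLAG_FLUSH | gst.SEEK_FLAG_SEGMENT,
                gst.SEEK_TYPE_SET, int(start_sec * gst.SECOND),
                gst.SEEK_TYPE_SET, int(stop_sec * gst.SECOND))

def on_segment_done(bus, message):
    # the current clip's segment finished; loop back to the fill-in clip
    seek_clip(FILL_IN_START, FILL_IN_STOP)

bus = player.get_bus()
bus.add_signal_watch()
bus.connect("message::segment-done", on_segment_done)
player.set_state(gst.STATE_PLAYING)
# call seek_clip(start, stop) whenever you want to jump to another clip's segment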
I think I'm actually working on a similar problem. The solution I'm using now is gluing video streams together in real time with gnonlin. The good side: it seems to work, though I haven't had time to test it thoroughly yet. The bad side: it's poorly documented and buggy. These sources from the Flumotion project (and the comments inside!) were very, very helpful to me for understanding how to make the whole thing work.
I have some .png images and I want to be able to quickly:
(a) Load a .png from a file.
(b) Draw some simple lines on top of the .png.
(c) Get the contents (bytes) of the resulting image to return as the result of an http request.
It sounds like PIL is a good candidate for doing this with relatively little code. However, I'm trying to understand how efficient it is, especially when I have, say, thousands of lines to draw in step (b). The alternative is PyOpenGL, but before getting into that I wanted to understand whether PIL is already fast enough.

I was going to ask whether PIL uses OpenGL under the covers. But that might be the wrong question, because my understanding is that to get the real speed benefit from PyOpenGL I'd want to submit my line vertices as NumPy arrays. So presumably even if PIL used OpenGL, I'd lose much of that benefit by making an individual PIL call to draw each of my lines?

Does anybody have concrete data on the speed of PIL when drawing lots of primitives?
"Draw some simple lines on top of the .png" is not a computationally intensive task.
This doesn't seem to be a good candidate for the GPU, since GPUs are better suited to more complex tasks. Keep in mind that the image is initially loaded into RAM, so it's your job to send that data to GPU memory and then retrieve it back. That transfer costs a few milliseconds, depending on the size of the image, which could be better spent on CPU processing.
Your application would only benefit from the GPU if it had high arithmetic intensity.
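For reference, the PIL version of steps (a)-(c) is only a few lines; a rough sketch (the fill color, line width, and the shape of the lines argument are just choices made for the example):

import io
from PIL import Image, ImageDraw

def render(png_path, lines):
    # lines: iterable of ((x1, y1), (x2, y2)) pairs
    img = Image.open(png_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for start, end in lines:
        draw.line([start, end], fill=(255, 0, 0), width=1)
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()  # bytes to hand back in the HTTP response

Timing this loop with a few thousand random lines on your own images is probably the quickest way to settle the speed question.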
I am having an issue with an embedded 64-bit Python instance not liking PIL. Before I start exhausting more methods to get a compiled image editor to read the pixels for me (such as ImageMagick), I am hoping someone here can think of a purely Python solution that will be comparable in speed to the compiled counterparts.

Now, I am aware that the compiled friends will always be much faster, but I am hoping that because I "just" want to read the alpha of a group of pixels, a fast enough pure Python solution can be conjured up. Anyone have any bright ideas?

That said, I have tried PyPNG and it is far too slow, so I'm not expecting any magic solutions. Nonetheless, I had to ask.

Thanks for any replies!

Just for reference, the images I'll be reading will be around 512*512 to 2048*2048 on average, and I'll be reading anywhere from one pixel's alpha to all of them (this happens a few million times, but the values can be cached so nothing is read twice).
Getting data out of a PNG requires unpacking the data and decompressing it. These steps are likely to be too slow in Python for your application. One possibility is to start with PyPNG and strip out anything you don't need. For example, it is probably storing all of the data it reads from the PNG, and some of the slowness you see may be due to memory allocations.
When you say PyPNG is too slow, how slow is it? To put it another way, how fast would be fast enough? PyPNG doesn't do anything stupid to make itself slow, but it is written in Python.
Make sure you're using read() to read the image row by row, and make sure you're using row[3::4] to extract the alpha channel. Extracting the alpha channel by using slice notation is no slower than reading the whole image.
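In other words, something along these lines (a sketch; asRGBA8() is used here instead of read() so that rows are guaranteed to come back as 8-bit RGBA, and the filename is a placeholder):

import png

reader = png.Reader(filename="image.png")  # placeholder path
width, height, rows, info = reader.asRGBA8()  # rows are yielded one at a time
alphas = [row[3::4] for row in rows]  # every 4th value, starting at index 3, is the alpha channel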
I've added some notes to the PyPNG documentation about its speed.