So I have a shapefile that is 3GB in size and as you can imagine my browser doesn't like it. How can I compress the data I have which is either in lon/lat coordinates or points on an X,Y grid?
I saw a video on Computerphile about Discrete Cosine Transforms for reducing high-dimensionality data, but being a programmer and not a mathematician I don't know if this is even possible. I have tried taking a point every 10 steps in the file, like so: map[0:100000:10], but this had an undesirable and very lossy effect.
I would ideally like my data to work like Google Earth, in which the resolution adjusts to your viewport altitude. So when you zoom in to the map, higher-frequency data is presented in the viewport, limiting the amount of points. But I don't know how they do this, and Google returns nothing of value.
Last point: since these are just vectors, is there any type of vector compression I could use? I'm not too great at math, so as you can imagine, when I look into this I just get confused fairly quickly. I understand SciPy has a DCT built in, and I know it has a whole bunch of other features which I don't understand; perhaps I could use this?
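For the vector-compression part: one well-established option is line simplification (Douglas-Peucker), which Shapely exposes as simplify(). It drops points that contribute less than a given tolerance to the shape, which is usually what "lossy but still looks the same" means for map data. A minimal sketch, assuming the shapefile opens with Fiona and using data.shp / simplified.shp as placeholder file names:

import fiona
from shapely.geometry import shape, mapping

# Tolerance is in the data's CRS units (degrees for lon/lat);
# larger values remove more points.
TOLERANCE = 0.001

with fiona.open('data.shp') as src:  # hypothetical input file
    with fiona.open('simplified.shp', 'w', driver=src.driver,
                    crs=src.crs, schema=src.schema) as dst:
        for feature in src:
            geom = shape(feature['geometry'])
            # preserve_topology avoids self-intersections at some speed cost
            simple = geom.simplify(TOLERANCE, preserve_topology=True)
            dst.write({'geometry': mapping(simple),
                       'properties': dict(feature['properties'])})

Unlike the every-10th-point slice, this keeps points where the geometry actually bends, which is why it tends to look far less lossy at the same compression ratio.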
I can answer the "level of detail" part: you can experiment with Leaflet (a JavaScript mapping library). You could then define a "coarse" layer which is displayed at low zoom levels and "high detail" layers that are only displayed at higher zoom levels. You probably need to capture the map's zoomend event and load/unload your layers from there.
One solution to this problem is to use a Web Map Server (WMS) like GeoServer or MapServer that stores your shapefile (though a spatial database like PostGIS would be better) on the server and sends a rendered image (often broken down into cacheable tiles) to the browser.
I have some time series plots I made a long time ago that I would like to improve in terms of graphics and style. However, I didn't save the raw data and I cannot recover it.
So I was wondering, is there a way to retrieve the data points from a chart image (e.g. a PNG file)? Something where I input an image and get back a CSV/dataframe/array with pairs of x,y coordinates?
To give an idea, that's the kind of images I would like to convert:
I've seen that GRABIT could potentially work, but I'm not familiar with MATLAB. Is there anything Python-based, or possibly some web tools?
Preferences:
works on Linux systems (in particular Ubuntu)
doesn't require installation
There is a small, simple piece of software developed here in Brazil that can do this kind of thing.
You can download it at the following link:
http://paginapessoal.utfpr.edu.br/lasouza/analise-nao-linear-de-estruturas/Pega%20Ponto%201.0.exe/view
You can load the image, specify the origin and set the x and y labels. Afterwards, you can retrieve the points that you clicked on the image.
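If a Python-based route is preferred, the same click-to-digitize workflow can be put together with matplotlib's ginput. A minimal sketch, assuming linear axes; chart.png, the calibration values and points.csv are all placeholders:

import numpy as np
import matplotlib.pyplot as plt

img = plt.imread('chart.png')  # hypothetical input image
plt.imshow(img)

# Calibration: click two reference points whose data coordinates you know,
# e.g. the origin and one labelled tick on each axis.
print('Click the (x0, y0) and then the (x1, y1) reference points')
(px0, py0), (px1, py1) = plt.ginput(2)
x0, y0 = 0.0, 0.0    # data values of the first click (assumed)
x1, y1 = 10.0, 1.0   # data values of the second click (assumed)

# Digitize: click along the curve; middle-click (or Enter) to stop.
pixels = np.array(plt.ginput(n=-1, timeout=0))

# Linear pixel -> data mapping derived from the two calibration clicks
xs = x0 + (pixels[:, 0] - px0) * (x1 - x0) / (px1 - px0)
ys = y0 + (pixels[:, 1] - py0) * (y1 - y0) / (py1 - py0)
np.savetxt('points.csv', np.column_stack([xs, ys]),
           delimiter=',', header='x,y', comments='')

It runs on Ubuntu with nothing beyond matplotlib and numpy, though it is manual rather than automatic.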
I have a GeoTIFF that I display on a tile map, but it's slightly off to the south. For example, on this screenshot the edge of the image should be where the country border is, but it's a bit to the south:
Here's the relevant part of the code:
import rioxarray
import holoviews as hv
import datashader as ds
import datashader.utils  # provides ds.utils.lnglat_to_meters
hv.extension('bokeh')

tiff_rio_500 = rioxarray.open_rasterio('/content/mw/mw_dist_to_light_at_all_from_light_mask_mw_cut_s3_500.tif')
dataarray_500 = tiff_rio_500[0]
dataarray_500_meters = dataarray_500.copy()
# Convert lon/lat degrees to Web Mercator meters so the image can sit on the tile map
dataarray_500_meters['x'], dataarray_500_meters['y'] = ds.utils.lnglat_to_meters(dataarray_500.x, dataarray_500.y)
hv_dataset_500_meters = hv.Dataset(dataarray_500_meters, name='nightlights', vdims='cumulative_cost')
hv_tiles_osm_bokeh = hv.element.tiles.OSM().opts(width=1000, height=800)
hv_image_500_meters_bokeh = hv.Image(hv_dataset_500_meters, kdims=['x', 'y'], vdims=['cumulative_cost'], rtol=1).opts(cmap='inferno_r')
hv_combined_osm_500_meters_bokeh = hv_tiles_osm_bokeh * hv_image_500_meters_bokeh
hv_combined_osm_500_meters_bokeh
You can see the live notebook on Google Colab.
Now this is not the usual "everything is way off" problem that occurs when one doesn't convert the map to Web Mercator. It is almost perfect; it just isn't quite right.
The GeoTIFF is an Earth Engine export. This is how it looked originally in Earth Engine (see the live code):
As you can see, the image follows the borders everywhere.
At first I suspected that maybe the export went wrong, or that the Google Maps tileset is somewhat different, but no: if I open the same exported TIFF in the QGIS application on my Windows laptop and view it on the same OSM tilemap as I do in the Colab notebook, it looks fine:
Okay, the image does not follow the borders perfectly, but I know why and that's unrelated (I oversimplified the country border geometry). The point is that it is projected to the correct location. So based on that, the TIFF contains the correct information and it can be displayed at the same location as the borders in the OSM tilemap, yet in my HoloViews-Datashader-Bokeh project it is slightly off.
Any idea why this happens?
I got the answer on the HoloViz Discourse from one of the developers. Seeing how the recommended function is practically undocumented, I copy it here in case somebody is looking for an easy way to load a GeoTIFF and add it to a tilemap in HoloViews/GeoViews:
https://discourse.holoviz.org/t/geotiff-overlay-position-is-slightly-off-on-holoviews-bokeh-tilemap/2071
philippjfr
I wouldn't expect manually transforming the coordinates to work particularly well. While it's a much heavier-weight dependency, for accurate coordinate transforms I'd recommend using GeoViews.
img = gv.util.load_tiff('/content/mw/mw_dist_to_light_at_all_from_light_mask_mw_cut_s3_500.tif')
gv.tile_sources.OSM() * img.opts(cmap='inferno_r')
Edit: It is possible that one doesn't want to use GeoViews, as it has a pretty heavy dependency chain that requires a lot of patience and luck to set up right. Fortunately, rioxarray (through rasterio) has a tool to reproject: just append .rio.reproject('EPSG:3857') to the first line, and then you don't need lnglat_to_meters at all, which is not intended for this purpose anyway.
So the corrected code becomes:
import rioxarray
import holoviews as hv
hv.extension('bokeh')

# Reproject the raster to Web Mercator (EPSG:3857) right at load time
tiff_rio_500 = rioxarray.open_rasterio('/content/mw/mw_dist_to_light_at_all_from_light_mask_mw_cut_s3_500.tif').rio.reproject('EPSG:3857')
hv_dataset_500_meters = hv.Dataset(tiff_rio_500[0], name='nightlights', vdims='cumulative_cost')
hv_tiles_osm_bokeh = hv.element.tiles.OSM().opts(width=1000, height=800)
hv_image_500_meters_bokeh = hv.Image(hv_dataset_500_meters, kdims=['x', 'y'], vdims=['cumulative_cost'], rtol=1).opts(cmap='inferno_r')
hv_combined_osm_500_meters_bokeh = hv_tiles_osm_bokeh * hv_image_500_meters_bokeh
hv_combined_osm_500_meters_bokeh
Compared to the GeoViews solution (which supposedly handles everything automatically), this solution has one downside: if you use a hover tooltip to display the values and coordinates under the mouse cursor, the coordinates show up in the newly projected Web Mercator system, in millions of meters, instead of the expected degrees. The solution for that is outside the scope of this answer, but I'm just finishing a detailed step-by-step guide that contains a solution for that too, and I will link it here as soon as it is published. Of course, if you don't use a hover tooltip, the code above will be perfect for you without any more tinkering.
I have a logic analyser project that records several hundred million 16-bit values (~100-500 million) and I need to display anything from a few hundred samples to the entire capture as the user zooms.
When you zoom out, the whole system takes a huge performance hit because it's loading a massive chunk from the file.
It occurred to me this morning that it would be more efficient to "stride" through the file at the user's screen resolution. You can't physically display anything between pixels anyway. This doesn't solve the massive file-size hit in memory, though.
Is there a way I can take a huge data set and stream-chunk it down efficiently?
I was thinking of streaming from start to start + view size, stepping by the horizontal resolution. That makes for a very choppy zoom, though.
The program uses Python, but I am open to calling something in C if it already exists.
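A rough sketch of that striding idea with numpy memory-mapping: instead of keeping one sample per step, reduce each screen column to a (min, max) pair so narrow spikes survive the decimation. Nothing is loaded until a slice is touched, which avoids the big memory hit. Assumes raw unsigned 16-bit samples and a hypothetical file name capture.bin:

import numpy as np

def envelope(path, start, stop, width):
    """Reduce samples[start:stop] to `width` (min, max) pairs, one per pixel column."""
    samples = np.memmap(path, dtype=np.uint16, mode='r')  # no data read yet
    view = samples[start:stop]
    n = len(view) // width * width           # drop the ragged tail
    buckets = view[:n].reshape(width, -1)    # one row per screen column
    return buckets.min(axis=1), buckets.max(axis=1)

# e.g. render 200M samples into 1920 screen columns
lo, hi = envelope('capture.bin', 0, 200_000_000, 1920)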
Well, I don't know if this is actually a question about programming or about overall design.
For the "zooming" problem with visualizations I suggest:
Have a pre-computed/cached version for some zoom levels. Ideally, the gradation should be calculated based on user behaviour.
When the user zooms in, you simultaneously:
calculate the "proper" data, or load pre-computed aggregated data for the deeper zoom layer and crop it to your view frame
cheat by rendering low-res data from the previous layer, or smooth it with some approximation (but make sure to somehow tell the user that the data is not finalized)
Aside from that, think about whether you can optimize the way you store the data. Trees may make your life way easier, both for partial disk reads/searches and for storing aggregated data.
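As a sketch of the pre-computed levels idea: a factor-of-two (min, max) pyramid, built once after capture, lets any zoom level read from the nearest smaller level instead of the raw file. The file names and the uint16 sample format are assumptions:

import numpy as np

def build_pyramid(path, dtype=np.uint16, factor=2, levels=8):
    """Write successively halved (min, max) reductions next to the raw capture."""
    lo = hi = np.memmap(path, dtype=dtype, mode='r')
    for i in range(levels):
        n = len(lo) // factor * factor       # drop the ragged tail
        lo, hi = (lo[:n].reshape(-1, factor).min(axis=1),
                  hi[:n].reshape(-1, factor).max(axis=1))
        # a zoomed-out view then loads the level whose length best matches the screen width
        np.save(f'{path}.L{i}.min.npy', lo)
        np.save(f'{path}.L{i}.max.npy', hi)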
In my opinion, there is no point in displaying even a few hundred samples unless they form some kind of image/shape. I guess one can look at a hundred numbers if they are properly structured (colored); several hundred, I doubt it. At that point you replace the actual data with some visualization (plots, charts, maps, ...).
To approach the problem you may define some rule for when to stop displaying the actual data at all. For instance, if the digit height becomes less than, say, 10 pixels, you display some kind of message ("selected numbers are from rows 200...300, columns 400...500") or some graphical alternative with corner coordinates and the amount of numbers.
I am trying to obtain a radius and diameter distribution from some AFM (atomic force microscopy) measurements. So far I am trying out Gwyddion, ImageJ and different workflows in MATLAB.
At the moment the best result I have found is to use Gwyddion, take the phase image, high-pass filter it and then try an edge detection with 'Laplacian of Gaussian'. The result is shown in figure 3. However, this image is still too noisy and doesn't really capture the edges of all the particles (some are merged together, others do not have a clear perimeter).
In the end I need an image which segments each of the spherical particles, which I can then use for blob detection/analysis to obtain size/radius information.
Can anyone recommend a different method?
I would definitely try granulometry; it was designed for something really similar. There is a good explanation of granulometry here, starting at page 158.
The granulometry performs consecutive openings of increasing size that erase the different patterns according to their dimensions. The bigger the pattern, the later it is erased. It gives you a curve that represents the distribution of pattern dimensions in your image, so exactly what you want.
However, it will not give you any information about the positions inside the image. If you want a rough modeling of the blobs present in your image, you can take a look at the Ultimate Opening.
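For what it's worth, the granulometry curve itself is only a few lines with scikit-image: open the image with disks of growing radius and track how much foreground survives. A minimal sketch, assuming `binary` is a thresholded boolean version of the particle image and that radii up to 30 px are of interest (both assumptions):

import numpy as np
from skimage.morphology import opening, disk

def granulometry(binary, max_radius=30):
    """Remaining foreground area after opening with disks of radius 0..max_radius."""
    areas = [binary.sum()]
    for r in range(1, max_radius + 1):
        areas.append(opening(binary, disk(r)).sum())
    # the (negative) derivative of this curve peaks at the dominant particle radii
    return -np.diff(np.asarray(areas, dtype=float))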
Maybe you can use Avizo; it's a powerful piece of software for dealing with image issues, especially for 3D data (CT).
After experimenting with a client-side approach to clustering large numbers of Google markers, I decided that it won't be possible for my project (a social network with 28,000+ users).
Are there any examples of clustering the coordinates on the server side - preferably in Python/Django?
The way I would like this to work is to gradually index the markers based on their proximity (radius) and the zoom level.
In other words, when a new user registers, he/she is automatically assigned to a certain 'group' of markers that are close to each other, thus increasing that 'group's' counter. What's being sent to the browser is just a small number of 'groups'. Only when the zoom level/scale of the map is 1:1 are actual users shown on the map.
That way the client side will have to deal with only 10-50 markers per request/zoom level.
This is a paid service that uses server-side clustering, but I'm not sure how it works. I'm guessing that they just use your data to generate the markers to be shown at each zoom level.
Update: This tutorial demonstrates a basic server-side clustering function. It's written in PHP for the Static Maps API, but you could use it as a starting point.
You might want to take a look at the DBSCAN and OPTICS pages on Wikipedia; these look very suitable for clustering places on a map. There is also a page about cluster analysis that shows all the possible algorithms you can use; most would be trivial to implement in the language of your choice.
With 28k+ points, you might want to skip Django and jump into C/C++ directly, and surely not expect this to be calculated in real time in response to web requests.
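Before reaching for C/C++ it may still be worth a quick experiment: scikit-learn's DBSCAN handles 28k points comfortably when given the haversine metric, and the radius can be tied to the zoom level. A minimal sketch, assuming `coords` is an (n, 2) array of (lat, lon) in degrees and a placeholder 5 km radius:

import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

def cluster(coords, radius_km=5.0, min_samples=2):
    """Group nearby markers; a label of -1 marks an un-clustered point."""
    db = DBSCAN(eps=radius_km / EARTH_RADIUS_KM,  # haversine works in radians
                min_samples=min_samples,
                metric='haversine',
                algorithm='ball_tree').fit(np.radians(coords))
    return db.labels_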
One way to do it would be to define a grid with a unit size based on the zoom level. So you collect up all the items within a grid cell by lat,lon to one decimal place. An example is 42.2x73.4, so a point at 42.2003x73.4021 falls in that grid cell, which lies between the neighbouring cells 42.2x73.3 and 42.2x73.5.
If there are one or more points in a grid cell, you place a marker in the center of that cell.
You then hook up the zoomend event and change your grid size accordingly, and redraw the markers.
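A hedged sketch of that grid scheme in Python, assuming `points` is a list of (lat, lon) tuples and rounding (rather than truncating) to the chosen precision, neither of which is spelled out above:

from collections import defaultdict

def grid_cluster(points, precision=1):
    """Bucket (lat, lon) points into cells of 10**-precision degrees; one marker per cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[(round(lat, precision), round(lon, precision))].append((lat, lon))
    return [(sum(p[0] for p in pts) / len(pts),   # cell centroid latitude
             sum(p[1] for p in pts) / len(pts),   # cell centroid longitude
             len(pts))                            # marker count for the badge
            for pts in cells.values()]

# the example point from above plus a hypothetical neighbour share one marker
markers = grid_cluster([(42.2003, 73.4021), (42.2107, 73.3995)], precision=1)

On zoomend you would re-run this with a precision derived from the new zoom level (coarser when zoomed out, finer when zoomed in).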
http://code.google.com/apis/maps/documentation/reference.html#GMap2.zoomend
You can try my server-side clustering Django app:
https://github.com/biodiv/anycluster
It provides a k-means and a grid cluster.