I have some time series plots I made a long time ago that I would like to improve in terms of graphics and style. However, I didn't save the raw data and I cannot recover it.
So I was wondering: is there a way to retrieve the data points from a chart image (e.g. a PNG file)? Something where I input an image and get a CSV/dataframe/array with pairs of x,y coordinates?
To give an idea, this is the kind of image I would like to convert:
I've seen that GRABIT could potentially work, but I'm not familiar with MATLAB. Is there anything Python-based, or possibly some web tools?
Preferences:
works on Linux systems (in particular Ubuntu)
doesn't require installation
There is a small, simple piece of software developed here in Brazil that can do this kind of thing.
You can download it from the following link:
http://paginapessoal.utfpr.edu.br/lasouza/analise-nao-linear-de-estruturas/Pega%20Ponto%201.0.exe/view
You can load the image, specify the origin, and set the x and y labels. Afterwards, you can retrieve the points you clicked on the image.
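In case you'd rather script it: the calibration these point-picking tools ask for (an origin plus one known value per axis) amounts to a simple linear map from pixel to data coordinates. A minimal sketch, assuming linear (non-log) axes and an unrotated image; all pixel numbers below are made up for illustration:

```python
def make_pixel_to_data(origin_px, x_ref_px, x_ref_val, y_ref_px, y_ref_val):
    """Build a pixel->data converter from three calibration clicks:
    the axes origin, a known point on the x axis, and one on the y axis."""
    ox, oy = origin_px
    sx = x_ref_val / (x_ref_px[0] - ox)  # data units per pixel along x
    sy = y_ref_val / (y_ref_px[1] - oy)  # data units per pixel along y
                                         # (negative scale: image y grows downward)
    def to_data(px, py):
        return (px - ox) * sx, (py - oy) * sy
    return to_data

# Hypothetical calibration: origin at pixel (50, 400),
# x=10 at pixel (450, 400), y=1 at pixel (50, 100)
to_data = make_pixel_to_data((50, 400), (450, 400), 10.0, (50, 100), 1.0)
```

Clicked pixels can then be converted in bulk, e.g. `to_data(250, 250)` gives the data coordinates of that pixel under this calibration.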
I have a GeoTIFF that I display on a tile map, but it's slightly off to the south. For example, on this screenshot the edge of the image should be where the country border is, but it's a bit to the south:
Here's the relevant part of the code:
import rioxarray
import datashader as ds  # for ds.utils.lnglat_to_meters
import holoviews as hv
hv.extension('bokeh')

tiff_rio_500 = rioxarray.open_rasterio('/content/mw/mw_dist_to_light_at_all_from_light_mask_mw_cut_s3_500.tif')
dataarray_500 = tiff_rio_500[0]
dataarray_500_meters = dataarray_500.copy()
dataarray_500_meters['x'], dataarray_500_meters['y'] = ds.utils.lnglat_to_meters(dataarray_500.x, dataarray_500.y)
hv_dataset_500_meters = hv.Dataset(dataarray_500_meters, name='nightlights', vdims='cumulative_cost')
hv_tiles_osm_bokeh = hv.element.tiles.OSM().opts(width=1000, height=800)
hv_image_500_meters_bokeh = hv.Image(hv_dataset_500_meters, kdims=['x', 'y'], vdims=['cumulative_cost'], rtol=1).opts(cmap='inferno_r')
hv_combined_osm_500_meters_bokeh = hv_tiles_osm_bokeh * hv_image_500_meters_bokeh
hv_combined_osm_500_meters_bokeh
You can see the live notebook on Google Colab.
Now this is not the usual "everything is way off" problem that occurs when one doesn't convert the map to Web Mercator. It is almost perfect, but not quite.
The GeoTIFF is an Earth Engine export. This is how it looked originally in Earth Engine (see live code):
As you can see, the image follows the borders everywhere.
At first, I suspected that maybe the export went wrong, or that the Google Maps tileset is somewhat different, but no: if I open the same exported TIFF in the QGIS application on my Windows laptop and view it on the same OSM tilemap as in the Colab notebook, it looks fine:
Okay, the image does not follow the borders perfectly, but I know why, and that's unrelated (I oversimplified the country border geometry). The point is that it is projected to the correct location. So the TIFF contains the correct information: it can be displayed at the same location as the borders in the OSM tilemap, yet in my Holoviews-Datashader-Bokeh project it is slightly off.
Any idea why this happens?
I've got the answer on the Holoviz Discourse from one of the developers. Seeing that the recommended function is practically undocumented, I'm copying it here in case somebody is looking for an easy way to load a GeoTIFF and add it to a tilemap in Holoviews/Geoviews:
https://discourse.holoviz.org/t/geotiff-overlay-position-is-slightly-off-on-holoviews-bokeh-tilemap/2071
philippjfr
I wouldn’t expect manually transforming the coordinates to work
particularly well. While it’s a much heavier weight dependency for
accurate coordinate transforms I’d recommend using GeoViews.
img = gv.util.load_tiff('/content/mw/mw_dist_to_light_at_all_from_light_mask_mw_cut_s3_500.tif')
gv.tile_sources.OSM() * img.opts(cmap='inferno_r')
Edit: It is possible that one doesn't want to use GeoViews, as it has a pretty heavy dependency chain that requires a lot of patience and luck to set up right. Fortunately, rioxarray (through rasterio) has a tool to reproject: just append .rio.reproject('EPSG:3857') to the first line, and then you don't need lnglat_to_meters, which is not intended for this purpose.
So the corrected code becomes:
import rioxarray
import holoviews as hv
hv.extension('bokeh')

tiff_rio_500 = rioxarray.open_rasterio('/content/mw/mw_dist_to_light_at_all_from_light_mask_mw_cut_s3_500.tif').rio.reproject('EPSG:3857')
hv_dataset_500_meters = hv.Dataset(tiff_rio_500[0], name='nightlights', vdims='cumulative_cost')
hv_tiles_osm_bokeh = hv.element.tiles.OSM().opts(width=1000, height=800)
hv_image_500_meters_bokeh = hv.Image(hv_dataset_500_meters, kdims=['x', 'y'], vdims=['cumulative_cost'], rtol=1).opts(cmap='inferno_r')
hv_combined_osm_500_meters_bokeh = hv_tiles_osm_bokeh * hv_image_500_meters_bokeh
hv_combined_osm_500_meters_bokeh
Compared to the GeoViews solution (which supposedly handles everything automatically), this solution has a downside: if you use a hover tooltip to display the values and coordinates under the mouse cursor, the coordinates show up in the newly projected Web Mercator system, in millions of meters, instead of the expected degrees. The solution for that is outside the scope of this answer; I'm just finishing a detailed step-by-step guide that contains a solution for that too, and I will link it here as soon as it is published. Of course, if you don't use a hover tooltip, the code above will work for you without any more tinkering.
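For what it's worth, the hover coordinates in "millions of meters" are simply Web Mercator: EPSG:3857 measures positions in meters on a sphere of radius 6378137 m. A pure-Python sketch of the transform (the same math lnglat_to_meters performs; illustrative only):

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857), in meters

def lnglat_to_webmercator(lon_deg, lat_deg):
    """Forward Web Mercator: degrees -> EPSG:3857 meters."""
    x = math.radians(lon_deg) * R
    y = math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2)) * R
    return x, y

def webmercator_to_lnglat(x, y):
    """Inverse transform: EPSG:3857 meters -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

# Malawi sits around 34°E, 13°S, which lands in the millions of meters
x, y = lnglat_to_webmercator(34.0, -13.0)
```

The inverse function is the kind of thing a custom hover formatter would apply to turn those large meter values back into degrees.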
Currently, I use MATLAB extensively for analyzing experimental scientific data (mostly time traces and images). However, again and again I keep running into fundamental problems with the MATLAB language, and I would like to make the switch to Python. One feature of MATLAB is holding me back, however: its ability to add datatips to plots and images.
For a line plot, the datatip is a window next to one of the data points that shows its coordinates. This is very useful to quickly see where data points are and what their values are. Of course this can also be done by inspecting the vectors that were used to plot the line, but that is slightly more cumbersome and becomes a headache when trying to analyze loads of data. E.g., let's say we quickly want to know for which value of x we have y = 0.6. Moving the datatip around will give a rough estimate very quickly.
For images, the datatip shows the x and y coordinates, but also the greyscale value (called index by MATLAB) and the RGB color. I'm mainly interested in the greyscale value here. Suppose we want to know the coordinates of the bottom tip of the pupil of the cat's eye. A datatip allows to simply click that point and copy the coordinates (either manually or programmatically). Alternatively, one would have to write some image processing script to find this pixel location. For a one time analysis that is not worthwhile.
The plotting library for Python that I'm most familiar with, and that is commonly called the most flexible, is matplotlib. An old Stack Overflow question seems to indicate that this can be done using mpldatacursor, and another module seems to be mplcursors. These libraries do not seem to be compatible with Spyder, however, limiting their usability. Also, I imagine many Python programmers would use a feature like datatips, so it seems odd to have to rely on a third-party module.
Now on to the actual question: Is there any module (or simple piece of code that I could put in my personal library) to get the equivalent of MATLAB's datatips in all figures generated by a python script?
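To illustrate the kind of "simple piece of code" I have in mind, here is a rough pick-event sketch in plain matplotlib (no third-party modules; the Agg backend is used only so it runs headless, and this is nowhere near full MATLAB datatip parity):

```python
import matplotlib
matplotlib.use("Agg")  # headless for this sketch; use a GUI backend interactively
import matplotlib.pyplot as plt
import numpy as np

def add_datatips(ax, line):
    """Attach a rough MATLAB-datatip equivalent: clicking near a data
    point of `line` shows an annotation with its coordinates."""
    annot = ax.annotate(
        "", xy=(0, 0), xytext=(10, 10), textcoords="offset points",
        bbox=dict(boxstyle="round", fc="yellow", alpha=0.8),
        arrowprops=dict(arrowstyle="->"))
    annot.set_visible(False)
    xdata, ydata = line.get_xdata(), line.get_ydata()

    def on_pick(event):
        if event.artist is not line:
            return
        i = event.ind[0]  # index of the nearest picked data point
        annot.xy = (xdata[i], ydata[i])
        annot.set_text(f"x={xdata[i]:.4g}\ny={ydata[i]:.4g}")
        annot.set_visible(True)
        ax.figure.canvas.draw_idle()

    ax.figure.canvas.mpl_connect("pick_event", on_pick)
    return annot

x = np.linspace(0, 2 * np.pi, 50)
fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x), "o-", picker=5)  # picker=5: 5-point click tolerance
tip = add_datatips(ax, line)
```

Something like this could live in a personal library and be called once per line, but it still falls short of MATLAB's built-in behavior (no dragging between points, no image pixel values).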
I am trying to extract a time series dataset from an image (with x-axis and y-axis). Is there a quick way to do so on Python?
To be more precise, this is my graph:
(image: "HEL Share Price" chart)
and I am trying to get daily data.
Any help?
Thanks! :)
I know this Web App that can do it: WebPlotDigitizer
Looking at alternativeto.net I found Engauge Digitizer, which "accepts image files (like PNG, JPEG and TIFF) containing graphs, and recovers the data points from those graphs", and a recent version "adds python support". I never used Engauge, but it sounds like what you want...
Keep in mind that it is not that easy to fully automate such a task: finding the correct axis labels is hard, and a label like "49,28" might even overlap the graph sometimes...
In Python, you could try this Python3 utility. It says it can extract raw data from plot images.
But you can more easily extract data from graph images using GUI-friendly tools, like plotdigitizer.com or automeris.io. I prefer the former over the latter. You can find the entire list of such programs over here.
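Under the hood, most of these digitizers do two things: find the curve's pixels and map pixel coordinates to data coordinates using the axis calibration. A toy pure-Python sketch of the idea on a synthetic grayscale array (it assumes a dark curve on a light background, one y value per column, and linear axes; real tools are much more robust):

```python
def trace_curve(gray, x_axis, y_axis):
    """gray: 2D list of grayscale values (0=black curve, 255=background),
    row 0 at the top, as in image files.
    x_axis/y_axis: (pixel_min, pixel_max, data_min, data_max) calibration."""
    h, w = len(gray), len(gray[0])
    px0, px1, dx0, dx1 = x_axis
    py0, py1, dy0, dy1 = y_axis
    points = []
    for col in range(w):
        column = [gray[r][col] for r in range(h)]
        darkest = min(range(h), key=column.__getitem__)
        if column[darkest] > 128:  # no curve pixel in this column
            continue
        x = dx0 + (col - px0) / (px1 - px0) * (dx1 - dx0)
        y = dy0 + (darkest - py0) / (py1 - py0) * (dy1 - dy0)
        points.append((x, y))
    return points
```

For a daily share-price chart you would load the PNG into such an array (e.g. with Pillow), calibrate the axes from two known ticks each, and get back one (date, price) pair per pixel column.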
So I have a shapefile that is 3 GB in size and, as you can imagine, my browser doesn't like it. How can I compress the data I have, which is either in lon/lat coordinates or points on an X,Y grid?
I saw a video on Computerphile about Discrete Cosine Transforms for reducing high-dimensionality data, but being a programmer and not a mathematician I don't know if this is even possible. I have tried to take a point every 10 steps in the file, like so: map[0:100000:10], but this had an undesirable and very lossy effect.
I would ideally like to have my data work like Google Earth, in which the resolution adjusts to your viewport altitude. So when you zoom in to the map, higher-frequency data is presented in the viewport, limiting the amount of points. But I don't know how they do this, and my Google searches return nothing of value.
Last point: since these are just vectors, is there any type of vector compression I could use? I'm not too great at math, so as you can imagine, when I look into this I get confused fairly quickly. I understand SciPy has some DCT functionality built in, and I know it has a whole bunch of other features which I don't understand; perhaps I could use this?
I can answer the "level of detail" part: you can experiment with Leaflet (a JavaScript mapping library). You could then define a "coarse" layer which is displayed at low zoom levels and "high detail" layers that are only displayed at higher zoom levels. You probably need to capture the map's zoomend event and load/unload your layers from there.
One solution to this problem is to use a Web Map Server (WMS) like GeoServer or MapServer that stores your ShapeFile (though a spatial database like PostGIS would be better) on the server and sends a rendered image (often broken down into cacheable tiles) to the browser.
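On the vector-compression part of the question: naive decimation like map[0:100000:10] drops points blindly, whereas line-simplification algorithms such as Ramer-Douglas-Peucker (the algorithm behind shapely's simplify()) drop only points that barely deviate from the overall shape. A minimal pure-Python sketch (recursive, for illustration; not tuned for 3 GB inputs):

```python
def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * (px - ax) - dx * (py - ay)) / (dx * dx + dy * dy) ** 0.5

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: keep only points that deviate more than
    epsilon from the line between the current segment's endpoints."""
    if len(points) < 3:
        return list(points)
    dists = [perpendicular_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] > epsilon:
        # split at the farthest point and simplify both halves
        return rdp(points[:i + 1], epsilon)[:-1] + rdp(points[i:], epsilon)
    return [points[0], points[-1]]
```

The epsilon tolerance gives you the knob decimation lacks: near-collinear runs collapse to their endpoints while sharp features survive, and in practice you would just call a library implementation such as shapely's geometry.simplify(tolerance).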
I was wondering how to create a fingerprint database. If fingerprints are stored as images, how do you compare images in a database, or create an image search engine like TinEye?
I know this is a big subject, but I'm just looking for a starting point. Can this be done using Python/Django libraries and MySQL?
OpenCV comes with a sample program that does what you are looking for. It's called find_obj.py. Pull it up in your editor and change:
surf = cv2.SURF(1000)
to
surf = cv2.SURF(100)
(Note: this is the OpenCV 2.x API; in OpenCV 3+, SURF lives in the opencv-contrib package as cv2.xfeatures2d.SURF_create(100).)
This should find lots of "inlier" points of interest in the negative of the fingerprint scan.
You can play around with a number of the variables and eventually find the best configuration for the sort of images you're comparing. It's also fairly straightforward to alter the sample to allow you to compare a single image against an entire directory.
I should point out that this will only be effective for the sort of digitized fingerprint scans used by law enforcement.
The Python Imaging Library (PIL, continued today as the Pillow fork) is probably the best library to get started with for image processing.
The library most commonly used for real time image processing (you don't need real time, but you can't go wrong with fast) is OpenCV. It has Python Bindings and built-in feature detection algorithms. See also this comparison.
For an overview of image comparison algorithms take a look at this question.
As a very simple approach you can crawl all images and compute a hash for each.
Later on, when user submits an image for a search, you compute a hash for that too and look for the same hash in your database.
However, this is a really simplistic approach and will only work when searching for exact image copies. Ideally, each image should be converted to some simplified feature set (to have tolerance against different versions of the same image: different formats, sizes, noise, etc.) used for comparison. For instance, it could be worth converting images (both crawled and submitted for search) to 128x128 grayscale and computing a hash of that.
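The grayscale-and-hash idea above is essentially a perceptual "average hash" (aHash): shrink, grayscale, threshold each pixel against the mean, and compare hashes by Hamming distance. A toy sketch over plain nested lists; a real version would use Pillow or OpenCV to resize the image to the small fixed size first:

```python
def average_hash(gray):
    """gray: 2D list of grayscale values, already resized to a small fixed
    size (e.g. 8x8). Returns an int whose bits mark pixels above the mean."""
    pixels = [v for row in gray for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:  # row-major bit order
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(h1, h2):
    """Number of differing bits; a small distance means visually similar."""
    return bin(h1 ^ h2).count("1")

img_a = [[10, 200], [220, 30]]
img_b = [[12, 198], [210, 35]]  # slightly noisy copy of img_a
```

Unlike an exact file hash, small noise leaves the aHash unchanged (or within a few bits), so the database lookup becomes "find stored hashes within Hamming distance k" rather than an exact match.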