I am looking for an easy way to plot a world map at a higher resolution than the dataset built into GeoPandas. As far as I know, the built-in world map dataset is only available in low resolution:
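import geopandas
import matplotlib.pyplot as plt

# Built-in low-resolution dataset (geopandas.datasets was removed in GeoPandas 1.0)
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
world.plot()
plt.show()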
I already read this page but couldn't find an answer: https://geopandas.org/en/stable/docs/user_guide/mapping.html#
I am not looking for google maps precision, but I would appreciate a map where if I zoom in, Belgium for example is plotted by a polygon which has a bit more than 14 points (see screenshot). (Let's say 100 to 1000 points.)
(I need the full map of the world as I am plotting data in different countries and would like to zoom in.)
That is a limitation of the built-in data: its polygons simply contain few points. Vector data is drawn exactly as it is stored, so zooming in on the map does not add any detail.
The more points a polygon is made of, the more precise the map, but also the more storage and memory it requires.
A variety of higher-resolution sources is available:
https://gis.stackexchange.com/questions/182944/seeking-polygon-shapefile-of-countries-states-and-islands
The sites listed in the answers to that question will help.
Download and load world borders from OSM or GADM. I haven't checked the size of the data, but the closer the boundaries are to reality, the larger the files and the more memory is needed to load them.
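As a rough sketch of loading a more detailed world map, the snippet below points GeoPandas at the Natural Earth country boundaries, which are published at 1:110m, 1:50m and 1:10m scales. The download host and file layout (naciscdn.org) and the helper names natural_earth_url and load_world are assumptions made for illustration, not something the answer above prescribes; check the Natural Earth download page if the URL does not resolve.

```python
# Assumed download host and file layout for Natural Earth data.
NATURAL_EARTH_CDN = "https://naciscdn.org/naturalearth"
SCALES = ("10m", "50m", "110m")  # 10m is the most detailed


def natural_earth_url(scale="10m"):
    """Build the (assumed) URL of the country polygons at a given scale."""
    if scale not in SCALES:
        raise ValueError(f"scale must be one of {SCALES}, got {scale!r}")
    return f"{NATURAL_EARTH_CDN}/{scale}/cultural/ne_{scale}_admin_0_countries.zip"


def load_world(scale="10m"):
    """Download and read the country polygons into a GeoDataFrame."""
    # Imported here so the URL helper can be used without GeoPandas installed.
    import geopandas

    # read_file accepts a URL to a zipped shapefile.
    return geopandas.read_file(natural_earth_url(scale))
```

Calling load_world("50m").plot() followed by plt.show() would then replace the built-in low-resolution dataset. The 1:10m file is noticeably larger and slower to draw, in line with the size and memory trade-off described above.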
Related
Recently I have been reading a paper called Modeling Taxi Drivers’ Behaviour for the Next Destination Prediction. It contains a figure (Fig. 1) that I would like to know how to draw. As far as I can tell, it may have been drawn with Python. If so, which Python library should I use to draw such a heatmap?
Thanks a lot in advance for your time and your expertise.
Best Regards
I have built something very similar for myself in the past. However, as you are just wondering how this would be drawn and not exactly asking how to do it with Python, I will share how I did it.
1. Building Grid:
The grid cells on the map are squares spanning a fixed number of degrees of latitude and longitude; in short, it is a latitude/longitude grid. I used an interactive map library named leaflet.js to build the world map with an overlay of this grid. This is the Leaflet tutorial I followed: https://leafletjs.com/examples/choropleth/
Remember that you can, and will have to, build your own version of the grid as GeoJSON to overlay on the world map, as discussed in the tutorial. At least when I was building mine, there wasn't a publicly available lat/long square grid.
2. Showing Colors (Heatmap):
Once you have built the grid as GeoJSON, Leaflet can take the whole GeoJSON as it is and overlay it on any map of your choice. That means you can put the numbers (i.e. the data) for each grid cell in the GeoJSON. This part is also shown in the same tutorial.
For my project, I created a complete GeoJSON file with normalized data in Python and then visualized it in leaflet.js. Below is an example of what I have built using these tools.
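The Python side of this workflow could look roughly like the sketch below: it turns per-cell numbers into a GeoJSON FeatureCollection of square cells with a normalized value attached to each. The function name build_grid_geojson, the (row, col) keying of the input and the property names value and density are assumptions for illustration; this is not the code used in the project described above.

```python
import json


def build_grid_geojson(values, cell_deg, lat_min, lon_min):
    """Build a GeoJSON grid from a dict mapping (row, col) -> raw number.

    Each cell is a square of cell_deg degrees whose south-west corner is at
    (lat_min + row * cell_deg, lon_min + col * cell_deg).
    """
    if not values:
        return {"type": "FeatureCollection", "features": []}

    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal

    features = []
    for (row, col), raw in values.items():
        south = lat_min + row * cell_deg
        west = lon_min + col * cell_deg
        north, east = south + cell_deg, west + cell_deg
        # GeoJSON positions are [longitude, latitude]; the ring must be closed.
        ring = [[west, south], [east, south], [east, north],
                [west, north], [west, south]]
        features.append({
            "type": "Feature",
            "properties": {"value": raw, "density": (raw - lo) / span},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical counts for two neighbouring half-degree cells.
grid = build_grid_geojson({(0, 0): 2, (0, 1): 6}, 0.5, 40.0, -74.0)
geojson_text = json.dumps(grid)  # write this to a file and load it in Leaflet
```

On the Leaflet side, a style function can then map each feature's normalized property to a colour, as the choropleth tutorial does for its own data.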
I have a dataset of Points of Interest (given as latitude/longitude) and want to generate a graph from them. The aim is to have a routable graph, so I want the "nearest" POIs to be connected by an edge.
I came up with Delaunay triangulation, because it creates a simple planar graph from the points. So far so good, I got some results. The problem is that the edges are not well connected, because the earth is not flat. In the northern and southern hemispheres the triangles are vertically stretched.
Is there a way to make scipy.spatial.Delaunay accept latitude/longitude as positions, instead of assuming a flat plane?
Or does it make sense to use another procedure? It would be great to have a solution in Python.
I got it running. I read this article and followed its first steps to create the Delaunay triangulation.
The main point is to apply a stereographic projection first and then run the regular 2D Delaunay triangulation.
Now I have a nice basic graph.
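The two steps named above can be sketched as follows. This is not the code from the article, only a minimal illustration using the standard stereographic projection formulas on a unit sphere; the function names, and the choice of the mean latitude/longitude as projection centre, are assumptions. The centre choice breaks down for points spread across the antimeridian, and points near the antipode of the centre project very far away.

```python
import numpy as np


def stereographic(lat_deg, lon_deg, lat0_deg, lon0_deg):
    """Project lat/lon (degrees) onto a plane tangent at (lat0, lon0)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat0, lon0 = np.radians(lat0_deg), np.radians(lon0_deg)
    dlon = lon - lon0

    # Cosine of the angular distance from the projection centre.
    cos_c = np.sin(lat0) * np.sin(lat) + np.cos(lat0) * np.cos(lat) * np.cos(dlon)
    k = 2.0 / (1.0 + cos_c)  # scale factor; infinite at the antipode

    x = k * np.cos(lat) * np.sin(dlon)
    y = k * (np.cos(lat0) * np.sin(lat) - np.sin(lat0) * np.cos(lat) * np.cos(dlon))
    return np.column_stack([x, y])


def delaunay_edges(lat_deg, lon_deg):
    """Return sorted (i, j) index pairs connected by the triangulation."""
    # Imported here so the projection helper works without SciPy installed.
    from scipy.spatial import Delaunay

    # Assumption: centre the projection on the mean of the points.
    xy = stereographic(lat_deg, lon_deg, np.mean(lat_deg), np.mean(lon_deg))
    tri = Delaunay(xy)

    edges = set()
    for a, b, c in tri.simplices:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((min(int(i), int(j)), max(int(i), int(j))))
    return sorted(edges)
```

The resulting index pairs can be used directly as graph edges, for example weighted by the great-circle distance between the two POIs.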
I want to understand the clear difference between Datashader and other plotting libraries, e.g. Plotly, Matplotlib, etc.
I understand that to plot millions or billions of data points we need Datashader, because other plotting libraries will hang the browser.
But what exactly makes Datashader fast and keeps it from hanging the browser, and how exactly is the plotting done so that it doesn't put any load on the browser?
Is it that Datashader creates the graph in the backend from my dataframe and sends only the image to the browser, which is why it's fast?
Please explain; I am unable to understand the ins and outs clearly.
It may be helpful to first think of Datashader not in comparison to Matplotlib or Plotly, but in comparison to numpy.histogram2d. By default, Datashader will turn a long list of (x,y) points into a 2D histogram, just like histogram2d. Doing so only requires a simple increment of a grid cell for each new point, which is easily accelerated to machine-code speeds with Numba and is trivial to parallelize with Dask. The resulting array is then at most the size of your display screen, no matter how big your dataset is. So it's cheap to process in a separate program that adds axes, labels, etc., and it will never crash your browser.
By contrast, a plotting program like Plotly will need to convert each data point into a JSON or other serialized representation, pass that to JavaScript in the browser, have JavaScript draw a shape into a graphics buffer, and make each such shape support hover and other interactive features. Those interactive features are great, but it means Plotly is doing vastly more work per data point than Datashader is, and requires that the browser can hold all those data points. The only computation Datashader needs to do with your full data is to linearly scale the x and y locations of each point to fit the grid, then increment the grid value, which is much easier than what Plotly does.
The comparison to Matplotlib is slightly more complicated, because with an Agg backend, Matplotlib is also pre-rendering to a fixed-size graphics buffer before display (somewhat like Datashader). But Matplotlib was written before Numba and Dask (making it more difficult to speed up), it still has to draw shapes for each point (not just a simple increment), it can't fully parallelize the operations (because later points overwrite earlier ones in Matplotlib), and it provides anti-aliasing and other nice features not available in Datashader. So again Matplotlib is doing a lot more work than Datashader.
But if what you really want to do is see the faithful 2D distribution of billions of data points, Datashader is the way to go, because that's really all it is doing. :-)
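To make the "scale, then increment a grid cell" idea concrete, here is a small NumPy-only sketch of that aggregation step. It is not Datashader's actual implementation (which compiles the loop with Numba); the function name rasterize_counts and its parameters are assumptions chosen for illustration.

```python
import numpy as np


def rasterize_counts(x, y, x_range, y_range, width, height):
    """Count points per pixel on a fixed height x width grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Linearly scale each coordinate to a column / row index.
    col = np.floor((x - x_range[0]) / (x_range[1] - x_range[0]) * width).astype(int)
    row = np.floor((y - y_range[0]) / (y_range[1] - y_range[0]) * height).astype(int)

    # Drop points that fall outside the requested ranges.
    inside = (col >= 0) & (col < width) & (row >= 0) & (row < height)

    grid = np.zeros((height, width), dtype=np.int64)
    # np.add.at increments correctly even when many points hit the same cell.
    np.add.at(grid, (row[inside], col[inside]), 1)
    return grid


# One million points collapse into a 200 x 300 array, whatever the input size.
rng = np.random.default_rng(0)
points = rng.normal(size=(1_000_000, 2))
counts = rasterize_counts(points[:, 0], points[:, 1], (-4, 4), (-4, 4), 300, 200)
```

Only the small counts array would need to be coloured and sent to the browser, which is why the browser's workload does not grow with the number of points.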
From the datashader docs,
datashader is designed to "rasterize" or "aggregate" datasets into regular grids that can be viewed as images, making it simple and quick to see the properties and patterns of your data. Datashader can plot a billion points in a second or so on a 16GB laptop, and scales up easily to out-of-core or distributed processing for even larger datasets.
There aren't any tricks going on in any of these libraries - rendering a huge number of points takes a long time. What datashader does is to shift the burden of visualization from rendering to computing. There's a very good reason you have to create a canvas before plotting instructions in datashader. The first step in a datashader pipeline is to rasterize a dataset, in other words, it approximates the position of each piece of data and then uses aggregation functions to determine the intensity or color of each pixel. This allows datashader to plot enormous numbers of points; even more points than can be held in memory.
Matplotlib, on the other hand, renders every single point you instruct it to plot, making plotting large datasets time consuming or even impossible.
I have data in the format (latitude, longitude, value). I want to plot (lat, long) -> value on a map of the city. Something like the following images:
I've already tried the following:
Python's Matplotlib: Unable to find required functions
Plotly
r-barplots on map, RG-histogram-bar-chart-over-map.
plot-3d-bars-on-a-map-in-matlab: This will do, but I'm trying to find a similar thing in python
D3 map histogram: This allows me to plot city-wise, but not within a city.
After posting the above question, I found some interesting plotting libraries.
Cesium : An open-source JavaScript library for 3D globes and maps.
ArcGIS: This one is paid (a 60-day free trial is available), but provides a wide variety of beautiful visualizations on 2D maps and 3D globes.
How about Basemap?
It is a Matplotlib extension, so it has all of Matplotlib's features for creating data visualizations, and it adds geographical projections and some datasets so that coastlines and countries can be plotted directly from the library.
I'm working with some instrument data that records the temperature at a specific latitude, longitude, and pressure (height) coordinate. I need to create a 3D grid from this instrument data that I can then use to take vertical cross sections of the interpolated gridded data. I've looked at pretty much every interpolation function/library I can find and I'm still having trouble just wrapping my head around how to do this.
I'd prefer not to use Mayavi, since it seems to bug out on my school's server and I'd rather not try to deal with fixing it right now.
The data is currently in 4 separate 1d arrays and I used those to mock up some scatter plots of what I'm trying to get.
Here is the structure of my instrument data points:
And here is what I'm trying to create:
Ultimately, I'd like to create some kind of 3d contour from these points that I can take slices of. Each of the plotted points has a corresponding temperature attached to it, which is really what I think is throwing me off in terms of dimensions and whatnot.
There are a few options to go from the unstructured data which you have to a structured dataset.
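The simplest option might be to use SciPy's scipy.interpolate.griddata function, which can interpolate unstructured points using nearest-neighbour or linear interpolation (cubic interpolation is also offered, but only for 1-D and 2-D data, so not for a 3D grid).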
Another option is to define your grid and then average all of the unstructured points which fall into each grid cell, giving you some gridded representation of the data. You could use a tool such as CIS to do this easily (full disclosure, I wrote this package to do exactly this kind of thing).
Or, there are more complicated methods of interpolating the data by trying to determine the most likely value of the grid points based on the unstructured data, for example using kriging with the pyKriging package, though I've never used this.
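The first two options can be sketched as follows, assuming the four 1-D arrays have been stacked into an (n, 3) array of (latitude, longitude, pressure) points plus an (n,) array of temperatures. The function names are made up for illustration, and the averaging helper is plain NumPy rather than the CIS package mentioned above.

```python
import numpy as np


def bin_average(points, values, edges):
    """Average the values of all points falling into each 3D grid cell.

    points: (n, 3) array, values: (n,) array, edges: three arrays of bin edges.
    Cells that contain no points are returned as NaN.
    """
    sums, _ = np.histogramdd(points, bins=edges, weights=values)
    counts, _ = np.histogramdd(points, bins=edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)


def interpolate_linear(points, values, grid_axes):
    """Linearly interpolate scattered values onto a regular 3D grid."""
    # Imported here so bin_average can be used without SciPy installed.
    from scipy.interpolate import griddata

    mesh = np.meshgrid(*grid_axes, indexing="ij")
    # rescale=True puts degrees and pressure units on a comparable scale
    # before triangulating; grid points outside the data's convex hull are NaN.
    return griddata(points, values, tuple(mesh), method="linear", rescale=True)
```

Either function returns an array indexed as [latitude, longitude, pressure], so a vertical cross section at a fixed longitude index j is simply result[:, j, :], which can be passed to a 2D contour plot.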