I have a set of data in which each row represents a series of observations taken at a particular point. Each column represents a different observation, along with information about where the data was collected (longitude and latitude in columns 3 and 4, respectively). The data needs to be gridded into 5-degree-latitude by 5-degree-longitude bin-averaged grids. How would I go about doing this in Python?
I don't know how to solve this and hope someone can help me. Thanks so much.
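A minimal sketch of one way to do this with pandas: floor each coordinate to its 5-degree bin edge, then group and average. The column names `obs`, `lon`, and `lat` are placeholders for your own columns.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real file: the lon/lat columns play
# the role of columns 3 and 4 in the question.
df = pd.DataFrame({
    "obs": [1.0, 2.0, 3.0, 4.0],
    "lon": [12.3, 13.9, 100.2, 101.7],
    "lat": [-33.1, -34.8, 45.2, 46.9],
})

# Assign each row to a 5-degree bin by flooring to the bin edge.
df["lon_bin"] = (np.floor(df["lon"] / 5) * 5).astype(int)
df["lat_bin"] = (np.floor(df["lat"] / 5) * 5).astype(int)

# Average every observation column within each 5x5-degree cell.
gridded = df.groupby(["lat_bin", "lon_bin"])["obs"].mean().reset_index()
```

With real data you would list all observation columns (or use `df.groupby([...]).mean()` on the whole frame) instead of the single `obs` column.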
I am attempting to generate a folium heatmap based on geopoints and how often each geopoint appears in my dataset. To be honest, it seems that the count of how often a geopoint appears does not affect my heatmap at all.
In addition, I need a legend so that everybody can read my heatmap.
My data is saved in a pandas dataframe with following columns:
Latitude Longitude count
Count holds the number of times each Latitude/Longitude point occurs in the dataset.
If I generate a heatmap like:
heat_data2 = [[row['Latitude'], row['Longitude'], row['count']] for index, row in df.iterrows()]
it seems that the count does not get included. I noticed this the moment I added a legend like this:
steps = 200
colormap = branca.colormap.linear.YlOrRd_09.scale(0, 5000).to_step(steps)
gradient_map = defaultdict(dict)
for i in range(steps):
    gradient_map[1 / steps * i] = colormap.rgb_hex_str(1 / steps * i)
colormap.add_to(map)
all points I see on the heatmap have the color of the least occurrence. How can I combine the count and the geopoints to get a heatmap that shows how often each point occurred?
I would also appreciate any tips on better tools for generating heatmaps from geodata in Python!
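One possible culprit (an assumption here, since the `HeatMap` call itself isn't shown): folium's `HeatMap` normalizes the third element of each triple internally, so raw counts spanning a large range can all end up near the bottom of the gradient. Normalizing the counts to the 0..1 range yourself makes the mapping onto the 0..1 gradient keys of the legend explicit. A sketch with pandas:

```python
import pandas as pd

# Toy stand-in for the real dataframe from the question.
df = pd.DataFrame({
    "Latitude": [52.5, 48.1, 50.9],
    "Longitude": [13.4, 11.6, 6.9],
    "count": [5000, 250, 1000],
})

# Normalize counts to [0, 1] so the heat weight lines up with the
# 0..1 keys used for the branca gradient/legend.
max_count = df["count"].max()
heat_data = [
    [row["Latitude"], row["Longitude"], row["count"] / max_count]
    for _, row in df.iterrows()
]

# The normalized list would then be passed to folium, e.g.:
# folium.plugins.HeatMap(heat_data, gradient=gradient_map).add_to(map)
```

If the most frequent points should dominate even more strongly, a nonlinear transform of the counts (e.g. square root or log) before normalizing is another option.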
I am currently working with a 3D array of global ozone data with dimensions (42, 361, 576), i.e. (years, latitude, longitude). This array contains the ozone data on January 1st of every year of my dataset (which is 42 years long).
I am trying to now make a time series line/scatter plot of ozone in specific locations around the globe. However, I can't seem to figure out how to specify the latitude and longitude of a location within the entire array, and put the information from that location over the 42 years into a 2D array which I can then use to make my plot.
Within the data, longitude ranges from -180 to 180, with steps of 0.625. Latitude ranges from -90 to 90, with steps of 0.5.
My end goal is to have a plot with time on the x-axis, and the actual ozone values for the y-axis. I have done a lot of research into finding solutions for this, and I have yet to find anything that applies to what I am trying to do.
Any help is appreciated, as I am still fairly new to Python and how to work with arrays.
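Since the grid is regular, a latitude/longitude pair can be converted directly to array indices; the array name `ozone` and the example location below are placeholders based on the ranges described above. A sketch:

```python
import numpy as np

# Stand-in for the real (years, latitude, longitude) = (42, 361, 576) array.
ozone = np.random.rand(42, 361, 576)

def grid_index(lat, lon):
    """Map a lat/lon pair to the nearest index on a regular grid:
    latitude -90..90 in 0.5-degree steps, longitude starting at -180
    in 0.625-degree steps."""
    lat_idx = int(round((lat + 90.0) / 0.5))
    lon_idx = int(round((lon + 180.0) / 0.625))
    return lat_idx, lon_idx

# Time series at one location (coordinates here are just an example).
lat_idx, lon_idx = grid_index(19.5, -155.625)
series = ozone[:, lat_idx, lon_idx]   # shape (42,): one value per year
```

`series` can then go straight on the y-axis of a line/scatter plot, with the 42 years on the x-axis, e.g. `plt.plot(years, series)`.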
I have a .nc file that I open with xarray as a dataset. This dataset has 3 variables:
Band (5000x300x250)
latitude (300x250)
longitude (300x250)
Its dimensions are:
time (5000)
y (300)
x (250)
I created the dataset myself and made a mistake, because now I would like to "grab" the time series of a specific point of "Band" based on its coordinate values:
dataset.Band.sel(longitude=6.696e+06,latitude=4.999e+05,method='nearest')
(I based the values to grab on the first values of both variables).
The issue is that when I created the .nc file, I did not enter latitude and longitude as dimensions but as variables. Is there a way to keep my code but modify a few things so I can grab the point based on the nearest values of the latitude and longitude variables? Or should I completely redefine the dimensions of my .nc file to replace x and y with longitude and latitude?
There isn't a great way to select data using the lat/lon values - as your data is structured, you essentially have multidimensional coordinates.
That said, if your lat/lon are actually only indexed by x OR y; that is, latitude has the same value repeated over and over for all levels of x and same for longitude with y, you could reorganize your data pretty easily:
# average out the redundant dimension to get 1-D coordinate arrays
lats = dataset.latitude.mean(dim='x')
lons = dataset.longitude.mean(dim='y')
# drop the 2-D variables and re-attach the 1-D versions as coordinates
dataset = dataset.drop_vars(['latitude', 'longitude'])
dataset.coords['latitude'] = lats
dataset.coords['longitude'] = lons
# make latitude/longitude the indexing dimensions in place of y/x
dataset = dataset.swap_dims({'x': 'longitude', 'y': 'latitude'})
At this point, your data is indexed by time, latitude, and longitude, and you can select the data however you'd like.
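End to end, the reorganization can be sketched on a toy dataset with the same structure (the sizes and coordinate values below are made up for illustration):

```python
import numpy as np
import xarray as xr

# Toy dataset mimicking the question: lat/lon stored as 2-D variables
# over (y, x), though each really varies along only one dimension.
ny, nx, nt = 4, 3, 5
lat_1d = np.linspace(40.0, 43.0, ny)
lon_1d = np.linspace(5.0, 7.0, nx)
ds = xr.Dataset(
    {"Band": (("time", "y", "x"), np.random.rand(nt, ny, nx)),
     "latitude": (("y", "x"), np.tile(lat_1d[:, None], (1, nx))),
     "longitude": (("y", "x"), np.tile(lon_1d[None, :], (ny, 1)))},
)

# Collapse the redundant dimension and promote lat/lon to coordinates.
lats = ds.latitude.mean(dim="x")
lons = ds.longitude.mean(dim="y")
ds = ds.drop_vars(["latitude", "longitude"])
ds.coords["latitude"] = lats
ds.coords["longitude"] = lons
ds = ds.swap_dims({"x": "longitude", "y": "latitude"})

# Nearest-point time-series selection now works as intended.
series = ds.Band.sel(longitude=6.1, latitude=41.9, method="nearest")
```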
I was wondering if I could get some concept ideas from you all before spending too much time on this.
I have a (X,Y,Z) heatmap file showing the energy (Z value) of multiple XY coordinates.
X,Y,Z
-8.000000,0.000000,30
-7.920000,0.000000,30
-7.840000,0.000000,30
-7.760000,0.000000,30
-7.680000,0.000000,30
(...)
7.680000,25.000000,30
7.760000,25.000000,30
7.840000,25.000000,30
7.920000,25.000000,30
8.000000,25.000000,30
I would like to determine possible pathways between two points in the XY space. These pathways should consist of a series of XY coordinates with the lowest Z values necessary to connect the selected regions.
I appreciate any suggestions on how to approach this.
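One way to frame this (a sketch, not necessarily the best formulation for your energy landscape): treat each XY grid point as a graph node connected to its neighbors, and run Dijkstra's algorithm with the Z value as the cost of entering a cell, so the returned path minimizes the total energy along the way.

```python
import heapq

def lowest_energy_path(grid, start, goal):
    """Dijkstra over a 2-D grid; grid[y][x] is the Z (energy) value.
    Returns the list of (y, x) cells on a minimum-total-energy path."""
    ny, nx = len(grid), len(grid[0])
    dist = {start: grid[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        y, x = node
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < ny and 0 <= x2 < nx:
                nd = d + grid[y2][x2]
                if nd < dist.get((y2, x2), float("inf")):
                    dist[(y2, x2)] = nd
                    prev[(y2, x2)] = node
                    heapq.heappush(heap, (nd, (y2, x2)))
    # Reconstruct the path by walking predecessors back from the goal.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Tiny example: a low-energy corridor through the middle row.
grid = [[9, 9, 9],
        [1, 1, 1],
        [9, 9, 9]]
path = lowest_energy_path(grid, (1, 0), (1, 2))
```

If "lowest Z values necessary" instead means minimizing the highest energy crossed (a minimax path, like finding the lowest mountain pass), replace the sum `d + grid[y2][x2]` with `max(d, grid[y2][x2])`; the rest of the algorithm stays the same.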
I am trying to read a NetCDF file from the IRI/LDEO Climate Data Library (dust_pm25_sconc10_mon), but I am having a problem reading it. When I select the variables that make up the dataset (longitude (X), latitude (Y), and time (T)), the output for X and Y is just a sequence numbering the observations (1, 2, ..., 139, for example). That is, the values of longitude and latitude are not exported correctly.
Could someone help me with this problem? I have already tried reading this file with R, Python, and QGIS, and in all three the output of X and Y is the same.
My codes are below (Python).
Thank you all very much.
from netCDF4 import Dataset as dt
filestr = 'dust_pm25_sconc10_mon.nc'
ncfile = dt(filestr, 'r')
print(ncfile.variables)
lat = ncfile.variables['Y'][:]
lat
lon = ncfile.variables['X'][:]
lon
time = ncfile.variables['T'][:]
time
Edit:
This file has three independent variables: X, Y, and T. The values of X and Y intentionally go from 1 to len(X) and len(Y), respectively.
Look at the description of the file:
http://iridl.ldeo.columbia.edu/home/.nasa_roses_a19/.Dust_model/.dust_mon_avg/.dust_pm25_sconc10_mon/
Independent Variables (Grids)
Time
grid: /T (months since 1960-01-01) ordered (Mar 1979) to (Mar 2010) by 1.0 N= 373 pts :grid
Longitude
grid: /X (unitless) ordered (1.0) to (191.0) by 1.0 N= 191 pts :grid
Latitude
grid: /Y (unitless) ordered (1.0) to (139.0) by 1.0 N= 139 pts :grid
Of course, this might be meaningful for longitude, but for latitude it is nonsense. Unfortunately, I did not find any hint of which area of the planet this dataset is supposed to describe.
However, I also did not find any data in its only dependent variable, dust_pm25_sconc10_mon - it is empty.
PS: Just as an example:
This dataset here
http://iridl.ldeo.columbia.edu/home/.nasa_roses_a19/.Dust_model/.RegDustModelProjected/.dust_pm25_sconc10/datafiles.html
looks much more reasonable...
The description alone is much more promising:
Independent Variables (Grids)
Time (time)
grid: /T (days since 2009-01-02 00:00) ordered (0130-0430 2 Jan 2009) to (2230 1 Apr 2010 - 0130 2 Apr 2010) by 0.125 N= 3640 pts :grid
Longitude
grid: /X (degree_east) ordered (19.6875W) to (54.6875E) by 0.625 N= 120 pts :grid
Latitude
grid: /Y (degree_north) ordered (0.3125N) to (39.6875N) by 0.625 N= 64 pts :grid
And its dependent variable dust_pm25_sconc10 is also not empty.
I really tried to find this file on the website you mentioned, but it seems futile, in my opinion. So, without knowing the file, I have to guess:
NetCDF files offer the possibility of saving space by scaling and shifting the values of any variable so that they can be stored, e.g., as int instead of float.
You could simply check whether the file has add_offset attributes other than 0 and scale_factor attributes other than 1.
For further information about this concept you can refer to https://www.unidata.ucar.edu/software/netcdf/workshops/2010/bestpractices/Packing.html.
While the information in the link above states that the Java interface to NetCDF applies these attributes automatically, in netcdf4-python this is controlled by set_auto_maskandscale (current versions apply them automatically by default). If automatic scaling is disabled in your setup, you have to rescale and re-offset the data back to the original values as described.
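The unpacking rule itself is simple; `scale_factor` and `add_offset` are the conventional NetCDF attribute names mentioned above, and the values below are invented for illustration:

```python
def unpack(packed_values, scale_factor=1.0, add_offset=0.0):
    """Undo NetCDF packing: map stored integer values back to
    physical units via unpacked = packed * scale_factor + add_offset."""
    return [v * scale_factor + add_offset for v in packed_values]

# e.g. temperatures packed as ints with scale 0.01 and offset 273.15
values = unpack([1500, -250, 0], scale_factor=0.01, add_offset=273.15)
```

Packing in the other direction rounds `(value - add_offset) / scale_factor` to the nearest integer, which is where the space saving (and the precision loss) comes from.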
However, you could also consider trying xarray, a library which implements the n-dimensional data structure of NetCDF files and, as far as I have experienced, performs automatic scaling and offsetting according to the rules described above.
http://xarray.pydata.org/en/stable/
The example file at http://iridl.ldeo.columbia.edu/home/.nasa_roses_a19/.Dust_model/.dust_mon_avg/.dust_pm25_sconc10_mon/datafiles.html that you linked in your comment on SpghttCd's response is not well-formed. For one thing, the X and Y arrays do not have units attributes appropriate to such dimensions but instead both have value "units". And as already noted the values in the arrays don't "look" valid anyway. Further, the values in the dust_pm25_sconc10_mon array in that file all appear to be NaN.
On the other hand the example dataset at http://iridl.ldeo.columbia.edu/home/.nasa_roses_a19/.Dust_model/.RegDustModelProjected/.dust_pm25_sconc10/datafiles.html that SpghttCd references has good units attribute information ("degrees_east" and "degrees_north", respectively). Furthermore, the actual values in the X and Y arrays look good. I had no problem making a plot of the dust_pm25_sconc10 variable in that dataset (using Panoply) and seeing the data mapped over the appropriate region.
SpghttCd's comments regarding scaling and offsets do not apply here as the longitude and latitudes in that second, good file have actual lon and lat values.