How to replace values in netcdf file with Nan? - python

I'm using a NASA GISS netCDF file with gridded monthly temperature values. According to the readme file, "Missing data is flagged with a value of 9999.f". I am trying to plot the data but keep getting blank maps. I think it's because this 9999.f value is throwing off my scale. How do I replace it with NaN? I tried:
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
data2 = Dataset(r'GriddedAir250.nc')
lats=data2.variables['lat'][:]
lons=data2.variables['lon'][:]
time=data2.variables['time'][:]
air=data2.variables['air'][:]
air=air.astype('float')
air[air==9999]=np.nan
But it looks like this gives me an array of boolean values:

netCDF4 creates masked arrays, and automatically masks the value 9999.0. In your code, this means the result of air = data2.variables['air'][:] is a masked array. So I suspect the problem is that the plotting code that you are trying to use does not handle masked arrays. If you think the plotting code can handle nan, you could try
air = air.filled(fill_value=np.nan)
This will convert air to a regular NumPy array, with the masked values (i.e. the values that were originally 9999.0 in the .nc file) converted to nan.
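As a small illustration of what filled does, the sketch below builds a stand-in masked array with NumPy alone and fills the masked entries with nan. The sample values are made up, not taken from GriddedAir250.nc. The array has to be floating point, since nan cannot be stored in an integer array.

```python
import numpy as np

# Made-up sample standing in for data2.variables['air'][:]
raw = np.array([12.5, 9999.0, -3.0, 9999.0])
air = np.ma.masked_values(raw, 9999.0)   # mask entries equal to the flag value

# Plain ndarray in which every masked entry has become nan
filled = air.filled(fill_value=np.nan)
print(filled)

# nan-aware reductions ignore the missing entries when computing a colour scale
print(np.nanmin(filled), np.nanmax(filled))
```

Whether the plot then shows up correctly still depends on the plotting call accepting nan, which is the assumption stated above.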

Related

Read a HDF data to a 3d array and save as a dataframe in python

I am currently working on the NASA aerosol optical depth data (MCD19A2), which is a NASA satellite level-three product. I have uploaded the data. I want to save the data as a dataframe that includes the longitude, the latitude, and the values. I have successfully converted the 0.47um band file into a three-dimensional array. How do I convert this array into a correct dataframe that includes X, Y and the value?
Below is the code I have tried:
from osgeo import gdal
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
rds = gdal.Open("MCD19A2.A2006001.h26v04.006.2018036214627.hdf")
names=rds.GetSubDatasets()
names[0][0]
'HDF4_EOS:EOS_GRID:"MCD19A2.A2006001.h26v04.006.2018036214627.hdf":grid1km:Optical_Depth_047'
aod_047 = gdal.Open(names[0][0])
a47=aod_047.ReadAsArray()
a47[1].shape
(1200,1200)
I would like the result to be like
X (n=1200)    Y (n=1200)    AOD_047
8896067       5559289       0.0123
I know that in R this can be done by
require('gdalUtils')
require('raster')
require('rgdal')
file.name<-"MCD19A2.A2006001.h26v04.006.2018036214627.hdf"
sds <- get_subdatasets(file.name)
gdal_translate(sds[1], dst_dataset = paste0('tmp047', basename(file.name), '.tiff'), b = nband)
r.047 <- raster(paste0('tmp047', basename(file.name), '.tiff'))
df.047 <- raster::as.data.frame(r.047, xy = T)
names(df.047)[3] <- 'AOD_047'
But, R really relies on memory and saving to 'tif' and reading 'tif' is using a lot of memory. So I want to do this task in python. Thanks a lot for your help.
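You can try pandas, although read_hdf only reads HDF5 files (through PyTables), so it will not open an HDF4 file such as this MCD19A2 granule directly: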
import pandas as pd
df=pd.read_hdf('filename.hdf')
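Staying with the GDAL route from the question, one possible sketch is to turn one 2-D band (for example a47[1]) into flat X, Y and value columns using the six-element geotransform that GDAL's GetGeoTransform() returns for the opened subdataset. The function name grid_to_columns, the tiny sample grid and the geotransform numbers below are all made up for illustration. The sketch uses NumPy only.

```python
import numpy as np

def grid_to_columns(values, geotransform, name="AOD_047"):
    """Flatten a 2-D grid into X, Y and value columns (hypothetical helper).

    geotransform follows the GDAL convention:
    (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
    """
    rows, cols = np.indices(values.shape)
    rows = rows + 0.5   # shift from pixel corner to pixel centre
    cols = cols + 0.5
    x0, dx, rx, y0, ry, dy = geotransform
    x = x0 + cols * dx + rows * rx
    y = y0 + cols * ry + rows * dy
    return {"X": x.ravel(), "Y": y.ravel(), name: values.ravel()}

# Made-up 2 x 3 grid and geotransform, standing in for a47[1] and
# aod_047.GetGeoTransform()
grid = np.array([[0.1, 0.2, 0.3],
                 [0.4, 0.5, 0.6]])
gt = (1000.0, 10.0, 0.0, 500.0, 0.0, -10.0)

columns = grid_to_columns(grid, gt)
print(columns["X"])
print(columns["Y"])
```

The returned dictionary holds equal-length 1-D arrays, so it can be passed to pd.DataFrame(columns). Fill values and any scale factor stored in the file's metadata are not handled here.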

Astronomical Plotting Techniques in Python

I am new to Python and am currently attempting to plot the retrograde motion of Mars. I have a txt file that has R.A. and Declination in addition to 12 other columns of data (like apparent magnitude etc.). However, from that file I am trying to convert only the R.A. and Dec. to decimal degrees in order to create a scatter plot with Dec. on the x axis and R.A. on the y axis. After researching I discovered that astropy's SkyCoord may be the best tool to use. The problem I am having is how to code the conversion for the two specific columns of data I need. Any help is greatly appreciated!
import numpy as np
import pandas as pd
import csv
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
f = open("Mars2.txt", "r")
print(f.read())
df = pd.read_csv('Mars2.txt', sep=";", names=['Date(0 UT)','Apparent R.A.','Apparent Declination','Distance to Earth','Distance to Sun','App. Mag.','Ang. Diam.','Phase Illum','Phase Angle','S.E Long','S.E Lat','P.A Axis','Ls','Solar Elong'])
print (df)
df.plot(x ='Apparent Declination', y='Apparent R.A.', kind = 'scatter')
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.io import ascii
c = SkyCoord(ra=10.625*u.degree, dec=41.2*u.degree, frame='icrs')
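The conversion itself is simple arithmetic, so a minimal sketch without astropy is possible. It assumes, since the contents of Mars2.txt are not shown, that R.A. is written as hours, minutes and seconds ("hh mm ss.s") and Declination as signed degrees, arcminutes and arcseconds ("+dd mm ss.s"), separated by spaces or colons. The function names are made up for illustration.

```python
def sexagesimal_to_decimal(text):
    """Convert 'a b c' or 'a:b:c' (units, minutes, seconds) to a decimal number."""
    parts = text.strip().replace(":", " ").split()
    # Read the sign from the text so that values such as '-00 30 00' stay negative
    sign = -1.0 if parts[0].startswith("-") else 1.0
    whole, minutes, seconds = (abs(float(p)) for p in parts)
    return sign * (whole + minutes / 60.0 + seconds / 3600.0)

def ra_to_degrees(text):
    """R.A. in hours, minutes, seconds -> decimal degrees (1 hour = 15 degrees)."""
    return 15.0 * sexagesimal_to_decimal(text)

def dec_to_degrees(text):
    """Declination in degrees, arcminutes, arcseconds -> decimal degrees."""
    return sexagesimal_to_decimal(text)

# Made-up sample values, not taken from Mars2.txt
print(ra_to_degrees("10 30 00"))
print(dec_to_degrees("-00 30 00"))
```

With pandas, the two functions can be applied column-wise, for example df['Apparent R.A.'].apply(ra_to_degrees). If astropy is preferred, SkyCoord can parse such strings when it is given unit=(u.hourangle, u.deg), after which the ra.deg and dec.deg attributes hold the decimal degrees.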

plotting a large matrix in python

I have a data file in Excel (.xlsx). The data represents a 100 micrometer by 100 micrometer area. The number of steps was set to 50 for x and 50 for y, meaning each pixel is 2 micrometers in size. How can I create a 2D image from this data?
Getting data from xlsx files can be achieved using the openpyxl Python module. After installing the module, a simple example is (assuming you have an xlsx file as in the image attached):
from openpyxl import load_workbook
wb = load_workbook("/path/to/matrix.xlsx")
cell_range = wb['Sheet1']['B2:G16']
for row in cell_range:
    for cell in row:
        print(str(cell.value) + " ", end='')
    print("")
This would print all the values in the range; you could also read them into a NumPy array and plot.
If you are willing to plot the pixels instead of points using matplotlib then you can convert your dataframe into numpy array and then plot that array using imshow() method of matplotlib, after manipulating the numpy array as per your need.
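A minimal sketch of the imshow route, assuming the spreadsheet values have already been read into a 50 x 50 NumPy array. Random numbers stand in for the real measurements here, and the output file name scan.png is arbitrary. The extent argument maps the 50 pixels per axis onto the 100 micrometer scan range.

```python
import numpy as np
import matplotlib.pyplot as plt

side_um = 100.0               # physical size of the scanned area, in micrometers
n_steps = 50                  # steps per axis, as stated in the question
pixel_size = side_um / n_steps

# Stand-in data; in practice this would be the array built from the spreadsheet
rng = np.random.default_rng(0)
values = rng.random((n_steps, n_steps))

fig, ax = plt.subplots()
image = ax.imshow(values, extent=(0, side_um, 0, side_um), origin="lower")
ax.set_xlabel("x (micrometers)")
ax.set_ylabel("y (micrometers)")
fig.colorbar(image, ax=ax, label="measured value")
fig.savefig("scan.png")
```

Whether origin="lower" is right depends on how the rows of the spreadsheet map to the y direction of the scan, which the question does not say.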

Image of Mnist data Python - Error when displaying the image

I'm working with the Mnist data set, in order to learn about Machine learning, and as for now I'm trying to display the first digit in the Mnist data set as an image, and I have encountered a problem.
I have a matrix with the dimensions 784x10000, where each column is a digit in the data set. I created the matrix myself, because the Mnist data set came in the form of a text file, which in itself caused me quite a lot of problems, but that's a question in itself.
The MN_train matrix below, is my large 784x10000 matrix. So what I'm trying to do below, is to fill up a 28x28 matrix, in order to display my image.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
grey = np.zeros(shape=(28,28))
k = 0
for l in range(28):
    for p in range(28):
        grey[p,l]=MN_train[k,0]
        k = k + 1
print(grey)
plt.show(grey)
But when I try to display the image, I get the following error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Followed by an image plot that does not look like the number five, as I would expect.
Is there something I have overlooked, or does this tell me that my manipulation of the text file, in order to construct the MN_train matrix, has resulted in an error?
The error you get is because you pass the array to show. show does not take image data; its only argument is the optional boolean block, so the array ends up being evaluated as a truth value.
In order to create an image plot, you need to use imshow.
plt.imshow(grey)
plt.show() # <- no argument here
Also note that the loop is rather inefficient. You may just reshape the input column array.
The complete code would then look like
import numpy as np
import matplotlib.pyplot as plt
MN_train = np.loadtxt( ... )
grey = MN_train[:,0].reshape((28,28))
plt.imshow(grey)
plt.show()
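One detail worth checking is the element order. The sketch below uses a made-up column (np.arange(784) in place of MN_train[:, 0]) and assumes that k in the question's loop advances inside the inner loop. Under that assumption the loop fills the image column by column, which corresponds to reshape with order="F", whereas the default reshape fills row by row and yields the transpose.

```python
import numpy as np

column = np.arange(784)   # stand-in for MN_train[:, 0]

# The explicit loop from the question (k assumed to advance in the inner loop)
looped = np.zeros((28, 28))
k = 0
for l in range(28):
    for p in range(28):
        looped[p, l] = column[k]
        k += 1

row_major = column.reshape((28, 28))              # NumPy default (C order)
col_major = column.reshape((28, 28), order="F")   # fills columns first

print(np.array_equal(looped, col_major))
print(np.array_equal(looped, row_major.T))
```

Which of the two orientations shows the digit upright depends on how the text file stored the pixels; if the image appears transposed, switching the order (or adding .T) should fix it.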

how to combine numpy ndarray?

I have MODIS atmospheric product. I used the code below to read the data.
%matplotlib inline
import numpy as np
from pyhdf import SD
import matplotlib.pyplot as plt
files = ['file1.hdf','file2.hdf','file3.hdf']
for n in files:
    hdf=SD.SD(n)
    lat = (hdf.select('Latitude'))[:]
    lon = (hdf.select('Longitude'))[:]
    sds=hdf.select('Deep_Blue_Aerosol_Optical_Depth_550_Land')
    data=sds.get()
    attributes = sds.attributes()
    scale_factor = attributes['scale_factor']
    data= data*scale_factor
    plt.contourf(lon,lat,data)
The problem is that on some days there are three data sets (as in this case) and on other days four, and I cannot use hstack or vstack to merge them.
My intention is to get a single array from the three different data arrays.
I have also attached the data files at this link: https://drive.google.com/open?id=0B2rkXkOkG7ExYW9RNERaZU5lam8
Your help will be highly appreciated.
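One possible approach, sketched under the assumption that the granules cannot be stacked because their 2-D shapes differ: flatten each granule's longitude, latitude and data arrays and concatenate them into three long 1-D arrays of points. The helper names and the sample granules below are made up for illustration and do not come from the linked files.

```python
import numpy as np

def flatten_granules(granules):
    """Merge (lon, lat, data) triples of differing 2-D shapes into 1-D arrays."""
    lon = np.concatenate([np.ravel(g[0]) for g in granules])
    lat = np.concatenate([np.ravel(g[1]) for g in granules])
    data = np.concatenate([np.ravel(g[2]) for g in granules])
    return lon, lat, data

def fake_granule(shape, offset):
    """Build a made-up granule whose data values all equal `offset`."""
    lat, lon = np.meshgrid(np.arange(shape[0]) + offset,
                           np.arange(shape[1]) + offset, indexing="ij")
    return lon.astype(float), lat.astype(float), np.full(shape, float(offset))

# Two granules with incompatible shapes, (2, 3) and (4, 2)
granules = [fake_granule((2, 3), 0), fake_granule((4, 2), 10)]
lon, lat, data = flatten_granules(granules)
print(lon.shape, lat.shape, data.shape)
```

In the loop from the question, each (lon, lat, data) triple would be appended to a list instead of being plotted immediately. The merged 1-D arrays no longer form a regular grid, so they suit point-based plotting such as plt.scatter or plt.tricontourf rather than plt.contourf.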
