how to edit part of an hdf5 file - python

I'm trying to edit precipitation rate values in an existing hdf5 file such that values >= 10 get rewritten as 1 and values < 10 get rewritten as 0. This is what I have so far. The code runs without errors, but after checking the hdf5 files it appears that the changes to the precipitation rate dataset weren't made. I'd appreciate any ideas on how to make it work.
import h5py
import numpy as np
import glob

filenames = []
filenames += glob.glob("/IMERG/Exceedance/2014_E/3B-HHR.MS.MRG.3IMERG.201401*")

for file in filenames:
    f = h5py.File(file, 'r+')
    new_value = np.zeros((3600, 1800))
    new_value = new_value.astype(int)
    precip = f['Grid/precipitationCal'][0][:][:]
    for i in precip:
        for j in i:
            if j >= 10.0:
                new_value[...] = 1
            else:
                pass
    precip[...] = new_value
    f.close()

It seems like you are not writing the new values into the file, but only storing them in an array.

It seems like you're only changing the values of the in-memory array, not actually updating anything in the file object: slicing a dataset with `[0][:][:]` returns a NumPy copy, so assigning into that copy never touches the file. Write back through the dataset itself. Also, I'd get rid of that for loop - it's slow! Try this:

import h5py
import numpy as np
import glob

filenames = []
filenames += glob.glob("/IMERG/Exceedance/2014_E/3B-HHR.MS.MRG.3IMERG.201401*")

for file in filenames:
    f = h5py.File(file, 'r+')
    precip = f['Grid/precipitationCal'][0, :, :]
    # Replacing the for loop: 1 where rate >= 10, 0 elsewhere
    precip = np.where(precip >= 10.0, 1, 0)
    # Assign values back through the dataset, not through a copied array
    f['Grid/precipitationCal'][0, :, :] = precip
    f.close()

Related

How to write csv inside a loop python

I've got my outputs for the CSV file, but I don't know how to write them into a CSV file because the output result is a NumPy array:
import glob
import cv2
import numpy as np

def find_mode(np_array):
    vals, counts = np.unique(np_array, return_counts=True)
    index = np.argmax(counts)
    return vals[index]

folder = ("C:/Users/ROG FLOW/Desktop/Untuk SIDANG TA/Sudah Aman/testbikincsv/folderdatacitra/*.jpg")
for file in glob.glob(folder):
    a = cv2.imread(file)
    rows = a.shape[0]
    cols = a.shape[1]
    middlex = cols/2
    middley = rows/2
    middle = [middlex, middley]
    titikawalx = middlex - 10
    titikawaly = middley - 10
    titikakhirx = middlex + 10
    titikakhiry = middley + 10
    crop = a[int(titikawaly):int(titikakhiry), int(titikawalx):int(titikakhirx)]
    c = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)
    H, S, V = cv2.split(c)
    hsv_split = np.concatenate((H, S, V), axis=1)
    Modus_citra = find_mode(H)  # how to put this in csv
My output is Modus_citra, which is an np.uint8 array. I'm trying to put it in a CSV file, but I'm still confused about how to write it because the result comes from a loop.
Can someone help me write it into a CSV file? I appreciate every bit of help.
Run your loop, and put the data into a list,
e.g. mydata = [result1, result2, result3]
Then create a csv.writer and use its writerows() method to write your list into CSV rows:
https://docs.python.org/3/library/csv.html#csv.csvwriter.writerows
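As a concrete sketch of that answer (the filenames and mode values below are made up for illustration; in your loop you would append `[file, Modus_citra]` instead):

```python
import csv

# Hypothetical results collected inside the image loop: one [filename, mode] pair per image
results = [["image1.jpg", 42], ["image2.jpg", 17], ["image3.jpg", 99]]

with open("modes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["filename", "mode_H"])  # header row
    writer.writerows(results)                # one CSV row per image
```

Note that `newline=""` is the documented way to open a CSV file for writing on all platforms, and a NumPy scalar such as `np.uint8` is written out as its plain number.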
You can save your NumPy arrays to CSV files using the savetxt() function. This function takes a filename and array as arguments and saves the array in CSV format. You can also specify the delimiter; this is the character used to separate each variable in the file, most commonly a comma. For example:

import numpy as np
my_array = np.array([1,2,3,4,5,6,7,8,9,10])
np.savetxt('randomtext.csv', my_array, delimiter=',', fmt='%d')

Extracting all specific rows (separately) from multiple csv files and combine rows to save as a new file

I have a number of csv files. I need to extract all respective rows from each file and save it as a new file.
i.e. the first output file must contain the first row of every input file, and so on.
I have done the following.
import pandas as pd
import os
import numpy as np

data = pd.DataFrame('', columns=['ObjectID', 'SPI'], index=np.arange(1, 100))
path = r'C:\Users\bikra\Desktop\Pandas'
i = 1
for files in os.listdir(path):
    if files[-4:] == '.csv':
        for j in range(0, 10, 1):
            #print(files)
            dataset = pd.read_csv(r'C:\Users\bikra\Desktop\Pandas' + '\\' + files)
            spi1 = dataset.loc[j, 'SPI']
            data.loc[i]['ObjectID'] = files[:]
            data.loc[i]['SPI'] = spi1
            data.to_csv(r'C:\Users\bikra\Desktop\Pandas\output\\' + str(j) + '.csv')
    i + 1
It works well when the index (i.e. 'j') is specified manually. But when I loop, each output csv file contains only the first row. Where am I wrong?
You'd better use append:
data = data.append(spi1)
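Appending a bare scalar does lose the ObjectID, though. Below is a fuller sketch of the row-extraction logic: for each row index j, pull row j from every input file and write those rows as output file j. It generates two tiny input CSVs in place of the question's C:\Users\bikra\Desktop\Pandas folder (whose files are assumed to have an 'SPI' column), so it is self-contained:

```python
import os
import pandas as pd

# Build two small sample CSVs standing in for the question's input folder
os.makedirs('inputs', exist_ok=True)
os.makedirs('output', exist_ok=True)
pd.DataFrame({'SPI': [10, 11, 12]}).to_csv('inputs/a.csv', index=False)
pd.DataFrame({'SPI': [20, 21, 22]}).to_csv('inputs/b.csv', index=False)

csv_files = sorted(f for f in os.listdir('inputs') if f.endswith('.csv'))

# One output file per row index j, containing row j of every input file
for j in range(3):
    rows = []
    for name in csv_files:
        df = pd.read_csv(os.path.join('inputs', name))
        rows.append({'ObjectID': name, 'SPI': df.loc[j, 'SPI']})
    pd.DataFrame(rows).to_csv(os.path.join('output', str(j) + '.csv'), index=False)
```

Collecting plain dicts and building each output DataFrame in one go avoids both the chained `data.loc[i][...]` assignment and the `i + 1` counter in the question, which never increments anything (it would need to be `i += 1`).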

How to update portion of the netCDF file?

I have netCDF files with 1500 rows and 2000 columns. A few of them contain inconsistencies in the data at different locations. I want to overwrite such inconsistencies with NoData values. While researching I found many answers for updating variable values above/below a certain threshold. For example:
#------ Research-----
dset['var'][:][dset['var'][:] < 0] = -1
#-----------------
Python : Replacing Values in netcdf file using netCDF4
Since the values of the inconsistencies coincide with valid data values, updating the inconsistencies based on a threshold is not possible.
My approach 1:
from netCDF4 import Dataset

ncfile = r'C:\\abcd\\55618_12.nc'
variableName = 'MAX'
fh = Dataset(ncfile, mode='r+')
for i in range(500, 600, 1):
    for j in range(200, 300, 1):
        fh.variables[variableName][i][j] = -99900.0  # NoData value
        #--- or
        #fh.variables[variableName][i:j] = -99900.0
fh.close()
Approach 2:
fh = Dataset(ncfile, mode='r')
val = fh.variables[variableName]
for i in range(500, 600, 1):
    for j in range(200, 300, 1):
        val[i][j] = -99900.0
fh = Dataset(ncfile, mode='w')  #(ncfile, mode='a')(ncfile, mode='r+')
fh.variables[variableName] = val
fh.close()
Result:
The script completes processing successfully but does not update the .nc file.
Friends, your help is highly appreciated.
The following approach worked for me: read the variable into a NumPy array, edit the array, then open the file again in 'r+' mode and assign the whole slice back.
import netCDF4 as nc
import numpy as np

ncfile = r'C:\\abcd\\55618_12.nc'
variableName = 'MAX'

fh = nc.Dataset(ncfile, mode='r')
val = fh.variables[variableName][:]
fh.close()
print(type(val))

for i in range(500, 600, 1):
    for j in range(200, 300, 1):
        val[i][j] = -99900.0
        if val[i][j] > -99900.0:
            print(val[i][j])

fh = nc.Dataset(ncfile, mode='r+')
fh.variables[variableName][:] = val
fh.close()
Is the data on a lat/lon grid? If so it may be easier to do it from the command line using cdo:
cdo setclonlatbox,FillValue,lon1,lon2,lat1,lat2 infile.nc outfile.nc
Where FillValue is your missing value which seems to be -99900.0 in your case.

Numpy ValueError: setting an array element with a sequence reading in list

I have this code that reads numbers and is meant to calculate the std and %rms using numpy:
import numpy as np
import glob
import os

values = []
line_number = 6
road = '/Users/allisondavis/Documents/HCl'
for pbpfile in glob.glob(os.path.join(road, 'pbpfile*')):
    lines = open(pbpfile, 'r').readlines()
    while line_number < len(lines):
        variables = lines[line_number].split()
        values.append(variables)
        line_number = line_number + 3
a = np.asarray(values).astype(np.float)
std = np.std(a)
rms = std * 100
print rms
However, I keep getting this error:
Traceback (most recent call last):
  File "rmscalc.py", line 17, in <module>
    a = np.asarray(values).astype(np.float)
ValueError: setting an array element with a sequence.
Any idea how to fix this? I am new to python/numpy. If I print my values it looks something like this:
[[1,2,3,4],[2,4,5,6],[1,3,5,6]]
I can think of a modification to your code which can potentially fix your problem:
Initialize values as a NumPy array and grow it with numpy append or concatenate:
values = np.array([], dtype=float)
Then inside the loop, convert each line to floats before appending:
variables = np.array(lines[line_number].split(), dtype=float)
values = np.append(values, variables)
# or
values = np.concatenate((values, variables), axis=0)
Alternatively, if your files are .csv (or any other type Pandas can read):
import pandas as pd
# Replace `read_csv` with your appropriate file reader
a = pd.concat([pd.read_csv(pbpfile)
               for pbpfile in glob.glob(os.path.join(road, 'pbpfile*'))]).values
# or
a = np.concatenate([pd.read_csv(pbpfile).values
                    for pbpfile in glob.glob(os.path.join(road, 'pbpfile*'))], axis=0)
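For what it's worth, this ValueError almost always means the collected rows have unequal lengths, so the nested list cannot form a rectangular array. A quick self-contained check (with made-up rows standing in for the file data) that you can run before converting:

```python
import numpy as np

# Rows of equal length convert cleanly to a 2-D float array
values = [['1', '2', '3'], ['4', '5', '6']]
a = np.asarray(values).astype(float)
print(a.shape)  # (2, 3)

# One short row makes the list ragged; astype(float) on such input
# raises the "setting an array element with a sequence" error.
ragged = [['1', '2', '3'], ['4', '5']]
lengths = {len(row) for row in ragged}
if len(lengths) > 1:
    print('ragged rows, lengths:', sorted(lengths))
```

If the check fires, inspect the lines being split: a blank or partial line in one of the pbpfiles is the usual culprit.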

Filling an array with data from dat files in python

I have a folder of dat files, each of which contains data that should be placed on a 360 x 181 grid. How can I populate an array of that size with the data? The data comes out as a strip, that is, 1 x (360*181), and needs to be reshaped and then placed into the array.
Try as I might, I cannot get this to work correctly. I was able to read the data into an array, but the values seemed to land pseudo-randomly: the elements did not match up with the placement I had previously found in MATLAB. I also have the data in txt format, should that make this easier.
Here is what I have so far, not much luck (very new to python):
#!/usr/bin/python
############################################
#
import csv
import sys
import numpy as np
import scipy as sp
#
#############################################
level = input("Enter a level: ")
LEVEL = str(level)
MODEL = raw_input("Enter a model: ")
NX = 360
NY = 181
date = 201409060000
DATE = str(date)
#############################################
FileList = []
data = []
for j in range(24, 384, 24):
    J = str(j)
    for i in range(1, 51, 1):
        I = str(i)
        fileName = '/Users/alexg/ECMWF_DATA/DAT_FILES/'+MODEL+'_'+LEVEL+'_h_'+I+'_FT0'+J+'_'+DATE+'.dat'
        fo = open(FileList(i), "r")
        data.append(fo)
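If each file really holds one flat strip of NX*NY values, the core of the task is a single reshape, and the "pseudo-random" placement is the classic symptom of using the wrong memory order: MATLAB stores column-major (Fortran order), NumPy defaults to row-major (C order). The sketch below uses a synthetic strip in place of a real .dat file; for the txt version of the data, np.loadtxt(fileName).ravel() would produce the same kind of strip:

```python
import numpy as np

NX = 360  # per the question's grid
NY = 181

# Synthetic stand-in for one file's contents: a flat strip of NX*NY values
strip = np.arange(NX * NY, dtype=float)

# Row-major (C order) reshape, NumPy's default:
grid_c = strip.reshape((NX, NY))
# Column-major (Fortran order) reshape, matching MATLAB's reshape:
grid_f = strip.reshape((NX, NY), order='F')

print(grid_c.shape)  # (360, 181)
```

Whichever order reproduces the placement seen in MATLAB is the one the files were written in; trying both on one known file settles it immediately.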
