I'm really new to Python, and I'm using a script to do some data analysis.
The analysis part is standard, and in order to get results I need to import an .epw file.
nhours = 8760
n12h = 730
klimain = 'weather.epw'
data = readclimafile(klimain)
I then use the data array to do the analysis. For example, I do this:
def readradfile(radin):
    rad = np.loadtxt(radin)

def do_analysis(t, rad):
    R = 6.667
    for j in range(rad.shape[0]):
        q[j] = (20-t[j])/R
So, when I use an .epw for a whole year it works fine.
But when I import an .epw for 8 days (192 hours), I get an error:
q[j] = (20-t[j])/R
IndexError: index 192 is out of bounds for axis 0 with size 192
I changed nhours to 191 and n12h to 16. I don't understand why it doesn't work: the array length isn't defined anywhere else, it comes only from the imported .epw. And it works fine for the larger file, so why does it fail for a much smaller one?
Any ideas?
Thank you!
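For what it's worth, here is a minimal sketch (not the original script) of sizing the arrays from whatever file was actually loaded rather than from hard-coded constants; it assumes readclimafile returns a NumPy array with one row per hour, and the temperature column index is hypothetical:
import numpy as np

# data comes from readclimafile(klimain) as above
nhours = data.shape[0]      # 8760 for a full year, 192 for an 8-day file
n12h = nhours // 12         # 730 for a year, 16 for 8 days
t = data[:, 0]              # hypothetical: temperature column
q = np.zeros(nhours)        # preallocate q to match the loaded data
for j in range(nhours):
    q[j] = (20 - t[j]) / 6.667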
The function I tried to replicate is the blackbody function. I'm doing a project for coursework in which I need to implement it and manipulate it in a few ways. I'm trying out alternate forms of the equation, and for two of them I keep getting an overflow error.
This is the error message:
alt2_2 = (1/((const_e**(freq/temp))-1))
OverflowError: (34, 'Result too large')
temp is given in kelvin (I'm using 5800 as my test value, as it is approximately the temperature of the Sun).
freq is the speed of light divided by whatever wavelength is inputted:
freq = (3*(10**8))/wavelength
In this case I am using 0.00000005 as the test value for the wavelength, and const_e is 2.7182.
First time using Stack Overflow, and also my first time doing a project on my own; any help is appreciated.
This does the blackbody computation with your values.
import math
# Planck constant
h = 6.6e-34
# Boltzmann constant
k = 1.38e-23
# Speed of light
c = 3e+8
# Wavelength
wl = 0.00000005
# Temp
T = 5800
# Frequency
f = c/wl
# This is the exponent for e (about 49).
k1 = h*f / (k*T)
# This computes the spectral radiance.
Bvh = 2*f*f*f*h / (math.exp(k1)-1)
print(Bvh)
Output:
9.293819741690355e-08
Since we only used one or two digits on the way in, the resulting value is only good to one or two digits, 9.3E-08.
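As a quick cross-check of why the original expression blew up (a sketch using the same test values): dropping h and k from the exponent makes it astronomically large, and math.exp() and float powers overflow once the exponent exceeds roughly 709.
import math

f = 3e8 / 5e-8                         # frequency, as in the question
T = 5800
print(f / T)                           # ~1.03e12: far past the overflow limit
print(6.6e-34 * f / (1.38e-23 * T))    # ~49.5: perfectly fine to exponentiate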
I'm a Python newbie trying to find my way around using it for Dynamo. I've had quite a bit of success using simple while loops/nested ifs to neaten my Dynamo scripts; however, I've been stumped by this recent error.
I'm pulling in lists of data (flow rates from pipe fittings) and then outputting the maximum flow rate of each fitting by comparing the corresponding indices of each list (a cross fitting has 4 flow rates in Revit; I compare each pipe inlet/outlet flow rate and take the maximum for sizing purposes). For some reason, adding lists in the while loop and iterating the indices gives me the "unexpected token" error, which I presume is related to "i += 1" according to online debuggers.
I've been using this while-loop format for a while now, and it has always worked for non-list-related iterations. Can anyone give me some guidance here?
Thank you in advance!
Error in Dynamo:
Warning: IronPythonEvaluator.EvaluateIronPythonScript
operation failed.
unexpected token 'i'
Code used:
import sys
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
dataEnteringNode = IN
a = IN[0]
b = IN[1]
c = IN[2]
d = IN[3]
start = 0
end = 3
i = start
y = []

while i < end:
    y.append(max( (a[i], b[i], c[i] ))
    i += 1

OUT = y
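For reference, a minimal sketch of the per-index maximum described above, assuming a, b, c are equal-length lists (whether to include d depends on the fitting type); it simply restates the intent without manual index bookkeeping and is not a diagnosis of the error:
# Hypothetical restatement of the loop above using zip
y = [max(vals) for vals in zip(a, b, c)]
OUT = y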
I'm plotting a function over a range of values, so naturally I fed a numpy.arange() into the function to get the dependent values for the plot. However, some of the values come out as NaN or infinity. I know why this happens, so I went back into the function and added some conditionals. If they were working, these would replace the values leading to NaN outputs only when the conditions are met. However, based on the errors I'm getting back, it appears the operations are being performed on every entry of the input array, rather than only where those conditions hold.
My code is as follows:
import matplotlib.pyplot as plt
import numpy as np
import math
k=10
l=50
pForPlot = np.arange(0,1,0.01)
def ProbThingHappens(p,k,l):
    temp1 = (((1-p)**(k-1)*((k-1)*p+1))**l)
    temp2 = (((1-p)**(1-k))/((k-1)*p+1))
    if float(temp2) <= 1.0*10^-300:
        temp2 = 0
    print("Temp1 = " + str(temp1))
    print("Temp2 = " + str(temp2))
    temp = temp1*(temp2**l-1)
    if math.isnan(temp):
        temp = 1
    print("Temp = " + str(temp))
    return temp
plt.plot(pForPlot,ProbThingHappens(pForPlot,k,l))
plt.axis([0,1,0,1])
plt.xlabel("p")
plt.ylabel("Probability of thing happening")
plt.rcParams["figure.figsize"] = (20,20)
plt.rcParams["font.size"] = (20)
plt.show()
And the error it returns is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-e707ad24af01> in <module>
20 return temp
21
---> 22 plt.plot(pForPlot,ProbAnyOverlapsAtAll(pForPlot,k,l))
23 plt.axis([0,1,0,0.01])
24 plt.xlabel("p")
<ipython-input-20-e707ad24af01> in ProbAnyOverlapsAtAll(p, k, l)
10 temp1 = (((1-p)**(k-1)*((k-1)*p+1))**l)
11 temp2 = (((1-p)**(1-k))/((k-1)*p+1))
---> 12 if float(temp2) <= 1.0*10^-300:
13 temp2 = 0
14 print("Temp1 = " + str(temp1))
TypeError: only size-1 arrays can be converted to Python scalars
How do I specify within the function that I only want to operate on specific values, rather than on the whole array at once? I assume that if I know how to get float() to pick out just one value, it should be straightforward to do the same for other conditionals. I'm terribly sorry if this question has been asked before; I searched for answers using every phrasing I could imagine, but I'm afraid I may simply not know the proper terminology. Any assistance would be greatly appreciated.
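For reference, a minimal sketch of how element-wise conditionals are usually expressed in NumPy with np.where (or boolean masks) rather than a Python if, which only works on scalars. The names mirror the function above, and the threshold is written as 1e-300 (note that 10^-300 in Python is XOR, not a power):
import numpy as np

def prob_thing_happens_sketch(p, k, l):
    # Same formulas as above, but the conditionals act per element
    temp1 = ((1 - p)**(k - 1) * ((k - 1)*p + 1))**l
    temp2 = ((1 - p)**(1 - k)) / ((k - 1)*p + 1)
    temp2 = np.where(temp2 <= 1e-300, 0.0, temp2)   # element-wise "if"
    temp = temp1 * (temp2**l - 1)
    temp = np.where(np.isnan(temp), 1.0, temp)      # replace NaNs element-wise
    return temp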
I have a binary file that was created using a Python code. This code mainly scripts a bunch of tasks to pre-process a set of data files. I would now like to read this binary file in Fortran. The content of the binary file is coordinates of points in a simple format e.g.: number of points, x0, y0, z0, x1, y1, z1, ....
These binary files were created using the 'tofile' function in numpy. I have the following code in Fortran so far:
integer :: intValue
double precision :: dblValue
integer :: counter
integer :: check

open(unit=10, file='file.bin', form='unformatted', status='old', access='stream')

counter = 1
do
  if ( counter == 1 ) then
    read(unit=10, iostat=check) intValue
    if ( check < 0 ) then
      print*, "End Of File"
      stop
    else if ( check > 0 ) then
      print*, "Error Detected"
      stop
    else if ( check == 0 ) then
      counter = counter + 1
      print*, intValue
    end if
  else if ( counter > 1 ) then
    read(unit=10, iostat=check) dblValue
    if ( check < 0 ) then
      print*, "End Of File"
      stop
    else if ( check > 0 ) then
      print*, "Error Detected"
      stop
    else if ( check == 0 ) then
      counter = counter + 1
      print*, dblValue
    end if
  end if
end do

close(unit=10)
This unfortunately does not work, and I get garbage numbers (e.g 6.4731191026611484E+212, 2.2844499004808491E-279 etc.). Could someone give some pointers on how to do this correctly?
Also, what would be a good way of writing and reading binary files interchangeably between Python and Fortran? It seems like that is going to be one of the requirements of my application.
Thanks
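(A frequent cause of garbage values like those above is a size or type mismatch between the writer and the reader, e.g. numpy writing the point count as a default 8-byte integer, or as a float, while the Fortran code reads a 4-byte integer, which shifts everything that follows. A minimal sketch, with the hypothetical file name 'points.bin', of writing with explicit dtypes so both sides agree:)
import numpy as np

coords = np.random.rand(5, 3)   # 5 points with x, y, z
with open('points.bin', 'wb') as f:
    # 4-byte integer count, then 8-byte doubles, matching the Fortran reader
    np.array([coords.shape[0]], dtype=np.int32).tofile(f)
    coords.astype(np.float64).tofile(f)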
Here's a trivial example of how to take data generated with numpy to Fortran the binary way.
I calculated 360 values of sin on [0,2π),
#!/usr/bin/env python3
import numpy as np
with open('sin.dat', 'wb') as outfile:
    np.sin(np.arange(0., 2*np.pi, np.pi/180.,
                     dtype=np.float32)).tofile(outfile)
exported that with tofile to the binary file 'sin.dat' (1440 bytes, i.e. 360 * sizeof(float32)), and read that file with the following Fortran 95 program (compiled with gfortran -O3 -Wall -pedantic), which outputs 1. - (val**2 + cos(x)**2) for x in [0,2π):
program numpy_import
  integer, parameter :: REAL_KIND = 4
  integer, parameter :: UNIT = 10
  integer, parameter :: SAMPLE_LENGTH = 360

  real(REAL_KIND), parameter :: PI = acos(-1.)
  real(REAL_KIND), parameter :: DPHI = PI/180.

  real(REAL_KIND), dimension(0:SAMPLE_LENGTH-1) :: arr
  real(REAL_KIND) :: r
  integer :: i

  open(UNIT, file="sin.dat", form='unformatted',&
       access='direct', recl=4)

  do i = 0, ubound(arr, 1)
    read(UNIT, rec=i+1, err=100) arr(i)
  end do

  do i = 0, ubound(arr, 1)
    r = 1. - (arr(i)**2. + cos(real(i*DPHI, REAL_KIND))**2)
    write(*, '(F6.4, " ")', advance='no')&
         real(int(r*1E6+1)/1E6, REAL_KIND)
  end do

100 close(UNIT)

  write(*,*)
end program numpy_import
Thus, if val == sin(x), the numeric result should vanish to good approximation for float32 types.
And indeed:
output:
360 x 0.0000
So, thanks to this great community, all the advice I got, and a little bit of tinkering around, I think I have figured out a stable solution to this problem, and I wanted to share it with you all. I will provide a minimal example here, where I want to write a variable-size array from Python into a binary file and read it using Fortran. I am assuming that the number of rows numRows and the number of columns numCols are written along with the full array dataArray. The following Python script writeBin.py writes the file:
import numpy as np
# Read in the numRows and numCols value
# Read in the array values
numRowArr = np.array([numRows], dtype=np.float32)
numColArr = np.array([numCols], dtype=np.float32)
fileObj = open('pybin.bin', 'wb')
numRowArr.tofile(fileObj)
numColArr.tofile(fileObj)
for i in range(numRows):
    lineArr = dataArray[i,:]
    lineArr.tofile(fileObj)
fileObj.close()
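(Purely hypothetical values for the placeholders at the top of the script, just so it runs end to end:)
import numpy as np

# Stand-ins for the "read in" step above
numRows, numCols = 4, 3
dataArray = np.arange(numRows * numCols, dtype=np.float32).reshape(numRows, numCols)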
Following this, the Fortran code to read the array from the file can be written as follows:
program readBin
  use iso_fortran_env
  implicit none

  integer :: nR, nC, i
  real(kind=real32) :: numRowVal, numColVal
  real(kind=real32), dimension(:), allocatable :: rowData
  real(kind=real32), dimension(:,:), allocatable :: fullData

  open(unit=10, file='pybin.bin', form='unformatted', status='old', access='stream')

  read(unit=10) numRowVal
  nR = int(numRowVal)
  read(unit=10) numColVal
  nC = int(numColVal)

  allocate(rowData(nC))
  allocate(fullData(nR,nC))

  do i = 1, nR
    read(unit=10) rowData
    fullData(i,:) = rowData(:)
  end do

  close(unit=10)
end program readBin
The main point I gathered from the discussion on this thread is to match the read and the write as closely as possible, with precise specification of the data types to be read and the way they are written. As you may note, this is a made-up example, so there may be some things here and there that are not perfect. However, I have since used this in a finite element program, where the mesh data is read/written this way, and it worked very well.
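(One way to sanity-check the byte layout before involving Fortran at all is to read the file straight back with numpy; a minimal sketch against the layout written above:)
import numpy as np

with open('pybin.bin', 'rb') as f:
    nR = int(np.fromfile(f, dtype=np.float32, count=1)[0])
    nC = int(np.fromfile(f, dtype=np.float32, count=1)[0])
    data = np.fromfile(f, dtype=np.float32).reshape(nR, nC)
print(nR, nC, data.shape)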
P.S.: In case you find a typo, please let me know and I will edit it right away.
Thanks a lot.
I am struggling to get to grips with this.
I create a netCDF4 file with the following dimensions and variables (note in particular the unlimited point dimension):
dimensions:
point = UNLIMITED ; // (275935 currently)
realization = 24 ;
variables:
short mod_hs(realization, point) ;
mod_hs:scale_factor = 0.01 ;
short mod_ws(realization, point) ;
mod_ws:scale_factor = 0.01 ;
short obs_hs(point) ;
obs_hs:scale_factor = 0.01 ;
short obs_ws(point) ;
obs_ws:scale_factor = 0.01 ;
short fchr(point) ;
float obs_lat(point) ;
float obs_lon(point) ;
double obs_datetime(point) ;
}
I have a Python program that populated this file with data in a loop (hence the unlimited record dimension; I don't know a priori how big the file will be).
After populating the file, it is 103MB in size.
My issue is that reading data from this file is quite slow. I guessed that this has something to do with chunking and the unlimited point dimension?
I ran ncks --fix_rec_dmn on the file and (after a lot of churning) it produced a new netCDF file that is only 32MB in size (which is about the right size for the data it contains).
This is a massive difference in size - why is the original file so bloated? Also - accessing the data in this file is orders of magnitude quicker. For example, in Python, to read in the contents of the hs variable takes 2 seconds on the original file and 40 milliseconds on the fixed record dimension file.
The problem I have is that some of my files contain a lot of points and seem to be too big to run ncks on (my machine runs out of memory, and I have 8 GB), so I can't convert all the data to a fixed record dimension.
Can anyone explain why the file sizes are so different and how I can make the original files smaller and more efficient to read?
By the way - I am not using zlib compression (I have opted for scaling floating point values to an integer short).
Chris
EDIT
My Python code is essentially building up a single time series file of collocated model and observation data from multiple individual model forecast files over 3 months. My forecast model runs 4 times a day, and I am aggregating 3 months of data, so that is ~120 files.
The program extracts a subset of the forecast period from each file (e.g. T+24h -> T+48h), so it is not a simple matter of concatenating the files.
This is a rough approximation of what my code is doing (it actually reads/writes more variables, but I am just showing 2 here for clarity):
# Create output file:
dout = nc.Dataset(fn, mode='w', clobber=True, format="NETCDF4")
dout.createDimension('point', size=None)
dout.createDimension('realization', size=24)
for varname in ['mod_hs', 'mod_ws']:
    v = dout.createVariable(varname, np.short,
                            dimensions=('point', 'realization'), zlib=False)
    v.scale_factor = 0.01

# Cycle over dates
date = <some start date>
end_date = <some end date>

# Keep track of record dimension ('point') size:
n = 0

while date < end_date:
    din = nc.Dataset("<path to input file>", mode='r')
    fchr = din.variables['fchr'][:]

    # get mask for specific forecast hour range
    m = np.logical_and(fchr >= 24, fchr < 48)
    sz = np.count_nonzero(m)

    if sz == 0:
        continue

    dout.variables['mod_hs'][n:n+sz,:] = din.variables['mod_hs'][:][m,:]
    dout.variables['mod_ws'][n:n+sz,:] = din.variables['mod_wspd'][:][m,:]

    # Increment record dimension count:
    n += sz

    din.close()

    # Goto next file
    date += dt.timedelta(hours=6)

dout.close()
Interestingly, if I make the output file format NETCDF3_CLASSIC rather than NETCDF4, the output is the size I would expect. The NETCDF4 output seems to be bloated.
My experience has been that the default chunksize for record dimensions depends on the version of the netCDF library underneath. For 4.3.3.1, it is 524288. 275935 records is about half a record-chunk. ncks automatically chooses (without telling you) more sensible chunksizes than netCDF defaults, so the output is better optimized. I think this is what is happening. See http://nco.sf.net/nco.html#cnk
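(If you want to see what chunking a given file actually ended up with from Python, the netCDF4 module exposes it per variable; a quick sketch, with a hypothetical file name:)
import netCDF4 as nc

ds = nc.Dataset('myfile.nc', mode='r')   # hypothetical file name
for name, var in ds.variables.items():
    # chunking() returns a list of per-dimension chunk sizes, or 'contiguous'
    print(name, var.chunking())
ds.close()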
Please try to provide code that works without modification, if possible; I had to edit yours to get it working, but it wasn't too difficult.
import netCDF4 as nc
import numpy as np
dout = nc.Dataset('testdset.nc4', mode='w', clobber=True, format="NETCDF4")
dout.createDimension('point', size=None)
dout.createDimension('realization', size=24)
for varname in ['mod_hs', 'mod_ws']:
    v = dout.createVariable(varname, np.short,
                            dimensions=('point', 'realization'),
                            zlib=False, chunksizes=[1000, 24])
    v.scale_factor = 0.01

date = 1
end_date = 5000
n = 0
while date < end_date:
    sz = 100
    dout.variables['mod_hs'][n:n+sz,:] = np.ones((sz,24))
    dout.variables['mod_ws'][n:n+sz,:] = np.ones((sz,24))
    n += sz
    date += 1
dout.close()
The main difference is in the createVariable command. For file size: without providing chunksizes when creating the variable, I also got a file twice as large compared to when I added it. So for file size, that should do the trick.
For reading variables from the file, I did not actually notice any difference; maybe I should add more variables?
Anyway, it should be clear how to add chunk sizes now. You will probably need to experiment a bit to find a good configuration for your problem. Feel free to ask more if it still does not work for you, and if you want to understand more about chunking, read the HDF5 docs.
I think your problem is that the default chunk size for unlimited dimensions is 1, which creates a huge number of internal HDF5 structures. By setting the chunksize explicitly (obviously ok for unlimited dimensions), the second example does much better in space and time.
Unlimited dimensions require chunking in HDF5/netCDF4, so if you want unlimited dimensions you have to think about chunking performance, as you have discovered.
More here:
https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_perf_chunking.html