I have a big dataset in array form, arranged like this:
Rainfall amounts arranged in array form.
The average (mean) for each latitude and longitude along axis=0 is computed with this declaration:
Lat=data[:,0]
Lon=data[:,1]
rain1=data[:,2]
rain2=data[:,3]
--
rain44=data[:,44]
rainT=[rain1,rain2,rain3,rain4,....rain44]
mean=np.mean(rainT)
The result was awesome, but the computation takes time, so I would like to use a for loop to simplify the calculation. At the moment the script I use looks like this:
mean=[]
lat=data[:,0]
lon=data[:,1]
for x in range(2,46):
    rainT=data[:,x]
    mean=np.mean(rainT,axis=0)
print(mean)
But it gives a weird result. Any ideas?
First, you probably meant the for loop to accumulate the subarrays rather than keep replacing rainT with successive slices. Only the last assignment matters, so the code averages just that one subarray, rainT=data[:,45], and it also doesn't divide by the correct number of original elements. Both mistakes contribute to the weird result.
Second, numpy should be able to average elements faster than a Python for loop can do it since that's just the kind of thing that numpy is designed to do in optimized native code.
Third, your original code copies a bunch of subarrays into a Python List, then asks numpy to average that. You should get much faster results by asking numpy to sum the relevant subarray without making a copy, something like this:
rainT = data[:,2:] # this gets a view onto data[], not a copy
mean = np.mean(rainT)
That computes an average over all the rainfall values, like your original code.
If you want an average for each latitude or some such, you'll need to do it differently. You can average over an array axis, but latitude and longitude aren't axes in your data[].
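For instance, since each row of data holds one lat/lon pair, a per-location average can be taken over axis 1. A minimal sketch with made-up numbers (two locations, three rainfall readings each):

```python
import numpy as np

# Hypothetical data: columns are lat, lon, then rainfall readings
data = np.array([[10.0, 100.0, 1.0, 2.0, 3.0],
                 [11.0, 101.0, 4.0, 5.0, 6.0]])

per_location = np.mean(data[:, 2:], axis=1)  # one mean per row (lat/lon pair)
print(per_location)  # -> [2. 5.]
```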
Thanks, friends, you have given me real inspiration. Here is the working script based on the ideas from @Jerry101; I decided NOT to use a Python loop. The new declaration looks like this:
lat1=data[:,0]
lon1=data[:,1]
rainT=data[:,2:46]              # THIS IS THE STEP I WAS MISSING EARLIER
mean=np.mean(rainT,axis=1)*24   # MAKE AVERAGE DAILY RAINFALL BY EACH LAT AND LON
mean2=np.array([lat1,lon1,mean])
mean2=mean2.T
np.savetxt('average-daily-rainfall.dat2',mean2,fmt='%9.3f')
And finally the result is exactly the same as the program written in Fortran.
Related
I want to create a randomized array that contains another array a few times.
So if:
Big_Array = np.zeros((5,5))
Inner_array = np.array([[1,1,1],
                        [2,1,2]])
And if we want 2 copies of Inner_array, it could look like:
Big_Array = [[1,2,0,0,0],
[1,1,0,0,0],
[1,2,0,0,0],
[0,0,2,1,2],
[0,0,1,1,1]]
I would like to write code that will:
A. tell whether the bigger array can fit the required number of inner arrays, and
B. randomly place the inner array (in random rotations) x times in the big array without overlap.
Thanks in advance!
If I understood correctly, you'd like to sample valid tilings of a square which contain a specified amount of integral-sided rectangles.
This is a special case of the exact cover problem, which is NP-complete, so in general I'd expect there to be no really efficient solutions, but you could solve it using Knuth's Algorithm X. It would take a while to code yourself, though.
There are also a few implementations of DLX online, such as this one from code review SE (not sure what the copyright on that is though).
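For part B alone (random non-overlapping placement, without part A's guarantee), simple rejection sampling may be enough when the big array is sparsely filled. This is a sketch, not an exhaustive solver; the function name and parameters are my own:

```python
import random

import numpy as np

def place_randomly(big_shape, inner, copies, max_tries=1000):
    """Try random rotations/positions for `copies` copies of `inner`
    inside a zero array of shape `big_shape`, skipping any placement
    that overlaps an earlier one. Returns the array, or None on failure."""
    big = np.zeros(big_shape, dtype=inner.dtype)
    occupied = np.zeros(big_shape, dtype=bool)
    placed = 0
    for _ in range(max_tries):
        if placed == copies:
            break
        rot = np.rot90(inner, k=random.randrange(4))  # 0-3 quarter turns
        h, w = rot.shape
        if h > big_shape[0] or w > big_shape[1]:
            continue  # this rotation cannot fit at all
        r = random.randrange(big_shape[0] - h + 1)
        c = random.randrange(big_shape[1] - w + 1)
        if not occupied[r:r + h, c:c + w].any():      # overlap check
            big[r:r + h, c:c + w] = rot
            occupied[r:r + h, c:c + w] = True
            placed += 1
    return big if placed == copies else None          # None = gave up
```

For example, place_randomly((5, 5), np.array([[1, 1, 1], [2, 1, 2]]), 2) returns a 5x5 array containing two rotated copies, or None if it runs out of tries. For a guaranteed yes/no answer to part A you would still need an exhaustive method such as Algorithm X.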
I am trying to find a way to vectorize the following for loop using numpy. This for loop is making my code really drag. The problem I am having is that I need to look up a value in the dictionary, d, based on the index where the value, val, falls within a range in the array, row.
for i in range(len(row)-1):
    if row[i]<val<=row[i+1]:
        return d[i]*row[-1]
I would imagine that I could use np.where and np.logical_and to get between two numbers in the array, but then I need the index to grab the value from a dictionary, and that is the part that I just can't seem to figure out without the loop.
Thanks to Divakar's comment, I think the right answer is to replace the entire for-loop with this numpy monstrosity:
np.vectorize(d.get)(np.searchsorted(row[:-1], vals, side='left') - 1) * row[-1]
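As a quick sanity check of that expression, here is a self-contained example with made-up values for row and d (note that row's last element acts as the multiplier, as in the question):

```python
import numpy as np

row = np.array([0.0, 10.0, 20.0, 30.0, 2.0])  # sorted bin edges, then multiplier
d = {0: 1.5, 1: 2.5, 2: 3.5}                  # bin index -> value
vals = np.array([5.0, 15.0, 25.0])

# For each val, find i such that row[i] < val <= row[i+1], then look up d[i]
idx = np.searchsorted(row[:-1], vals, side='left') - 1
result = np.vectorize(d.get)(idx) * row[-1]
print(result)  # -> [3. 5. 7.]
```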
I want to initialise an array that will hold some data. I have created a random matrix (using np.empty) and then multiplied it by np.nan. Is there anything wrong with that? Or is there a better practice that I should stick to?
To further explain my situation: I have data I need to store in an array. Say I have 8 rows of data. The number of elements in each row is not equal, so my matrix row length needs to be as long as the longest row. In other rows, some elements will not be filled. I don't want to use zeros since some of my data might actually be zeros.
I realise I can use some value I know my data will never contain, but NaNs are definitely clearer. I am just wondering whether that can cause any issues later in processing. I realise I need to use nanmax instead of max, and so on.
You can use np.full, for example:
np.full((100, 100), np.nan)
However depending on your needs you could have a look at numpy.ma for masked arrays or scipy.sparse for sparse matrices. It may or may not be suitable, though. Either way you may need to use different functions from the corresponding module instead of the normal numpy ufuncs.
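For the unequal-row situation described in the question, a minimal NaN-padding sketch (the rows here are made up):

```python
import numpy as np

# Rows of unequal length, padded with NaN to the longest row
rows = [[1.0, 2.0, 0.0], [4.0, 5.0]]
width = max(len(r) for r in rows)
arr = np.full((len(rows), width), np.nan)
for i, r in enumerate(rows):
    arr[i, :len(r)] = r

print(np.nanmax(arr))  # -> 5.0
```

The genuine 0.0 in the first row survives, while the missing slot stays NaN, which is exactly why NaN beats zero-padding here.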
A way I like to do it, which probably isn't the best but is easy to remember, is adding a 'nans' function to the numpy module like this:
import numpy as np

def nans(n):
    return np.array([np.nan for i in range(n)])

setattr(np, 'nans', nans)
and now you can simply use np.nans just as you would np.zeros:
np.nans(10)
I'm developing a genetic program, and by now the whole algorithm appears to be fine (albeit slow...).
I'm iterating through lists of real values, one at a time and then applying a function to the list. The format is something like :
trainingset=[[3.32,55,33,22],[3.322,5,3,223],[23.32,355,33,122],...]
Where each inner list represents a line in the set and the last item of that list is the result of the regression in that line/individual.
The function I use is some thing like:
def getfitness(individual, dataset):
    ...
    fitness = 0
    for elem in dataset:
        # apply the function `individual` to this line's parameters
        fitness = fitness + (elem[-1] - (result of individual on elem[:-1]))
    fitness = RMS(fitness)
    return fitness
So, what I'd like to know is: is there a way of calculating the function in one go? Are there any libs that can do this? I've been looking at matrices in numpy, but to no avail.
Thanks in advance.
Jorge
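The per-row loop sketched above can indeed be done in one go if the training set is stored as a 2-D NumPy array. This is only a sketch: the individual here is a hypothetical weighted sum, just to make the example runnable, and a real GP individual would be some evolved expression instead.

```python
import numpy as np

# The training set as a 2-D array; the last column holds the targets
trainingset = np.array([[3.32, 55, 33, 22],
                        [3.322, 5, 3, 223],
                        [23.32, 355, 33, 122]])
inputs, targets = trainingset[:, :-1], trainingset[:, -1]

# Hypothetical individual: a simple weighted sum of the inputs
def individual(x):
    return x @ np.array([1.0, 0.5, 2.0])

predictions = individual(inputs)  # evaluated on every row at once
rms = np.sqrt(np.mean((targets - predictions) ** 2))
```

The key point is that one call evaluates every row at once, so the per-row Python loop disappears.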
This question potentially has two parts, but maybe only one if the first part can be encapsulated by the second. I am using Python with numpy and netCDF4.
First:
I have four lists of different variable values (hereafter referred to as elevation values), each of which has a length of 28. There is one such set of four lists for each of 5 different latitude values, and one set of those for each of 24 different time values.
So: 24 times... each time with 5 latitudes... each latitude with four lists... each list with 28 values.
I want to create an array with the following dimensions (elevation, latitude, time, variable)
In words, I want to be able to specify which of the four lists I access, which index in that list, and a specific time and latitude. So an index into this array would look like this:
array[0,1,2,3], where the 3 specifies the 4th list, the 0 specifies the first index of that list, the 1 specifies the 2nd latitude, and the 2 specifies the 3rd time; the output is the value at that point.
I won't include my code for this part, since the only things worth mentioning are the lists:
list1=[...]
list2=[...]
list3=[...]
list4=[...]
How can I do this? Is there an easier structure for the array, or is there anything else I am missing?
Second:
I have created a netCDF file with variables having these four dimensions. I need to set those variables to the array structure made above. I have no idea how to do this, and the netCDF4 documentation only covers a 1-d array, in a fairly cryptic way. If the arrays can be made directly in the netCDF file, bypassing the need to use numpy first, by all means show me how.
Thanks!
After talking to a few people where I work, we came up with this solution:
First we made an array of zeros using the following:
array1=np.zeros((28,5,24,4))
Then we filled in the array by specifying which slice we wanted to change:
array1[:,0,0,0]=list1
This inserted the values of the list into the first entry in the array.
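As a runnable sketch of that step, with dummy data standing in for the real 28-element lists:

```python
import numpy as np

# Dummy stand-ins for two of the four 28-element lists (hypothetical values)
list1 = list(range(28))
list2 = [x * 2 for x in range(28)]

array1 = np.zeros((28, 5, 24, 4))      # (elevation, latitude, time, variable)
array1[:, 0, 0, 0] = list1             # list 1 at latitude 0, time 0
array1[:, 0, 0, 1] = list2             # list 2 at the same latitude/time

print(array1[3, 0, 0, 1])              # -> 6.0
```

Repeating the slice assignment for each latitude, time, and list index fills the whole array.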
Next, to write the array to a netCDF file, I created a netCDF file in the same program that made the array, made a single variable, and gave it values like this:
netcdfvariable[:]=array1
Hope that helps anyone who finds this.