Working with multiple columns from a data file

Working with multiple columns from a data file - python

I have a file in which I need to use the first column. The remaining columns need to be integrated with respect to the first. Lets say my file looks like this:
100 1.0 1.1 1.2 1.3 0.9
110 1.8 1.9 2.0 2.1 2.2
120 1.8 1.9 2.0 2.1 2.2
130 2.0 2.1 2.3 2.4 2.5
Could I write a piece of code that takes the second column and integrates with the first then the third and integrates with respect to the first and so on? For my code I have:
import scipy as sp
first_col=dat[:,0] #first column from data file
cols=dat[:,1:] #other columns from data file
col2 = cols[:,0] # gets the first column from variable cols
I = sp.integrate.cumtrapz(col2, first_col, initial = 0) #integration step
This works only for the first row from the variable col, however, I don't want to write this out for all the other columns, it would look discussing (the thought of it makes me shiver). I have seen similar questions but haven't been able to relate the answers to mine and the ones that are more or less the same have vague answers. Any ideas?

The function cumtrapz accepts an axis argument. For example, suppose you put your first column in x and the remaining columns in y, and they have these values:
In [61]: x
Out[61]: array([100, 110, 120, 130])
In [62]: y
Out[62]:
array([[ 1.1, 2.1, 2. , 1.1, 1.1],
[ 2. , 2.1, 1. , 1.2, 2.1],
[ 1.2, 1. , 1.1, 1. , 1.2],
[ 2. , 1.1, 1.2, 2. , 1.2]])
You can integrate each column of y with respect to x as follows:
In [63]: cumtrapz(y, x=x, axis=0, initial=0)
Out[63]:
array([[ 0. , 0. , 0. , 0. , 0. ],
[ 15.5, 21. , 15. , 11.5, 16. ],
[ 31.5, 36.5, 25.5, 22.5, 32.5],
[ 47.5, 47. , 37. , 37.5, 44.5]])

Related

turning a list of numpy.ndarray to a matrix in order to perform multiplication

i have vectors of this form :
test=np.linspace(0,1,10)
i want to stack them horizontally in order to make a matrix .
problem is that i define them in a loop so the first stack is between an empty matrix and the first column vector , which gives the following error:
ValueError: all the input arrays must have same number of dimensions
bottom line - i have a for loop that with every iteration creates a vector p1 and i want to add it to a final matrix of the form :
[p1 p2 p3 p4] which i could then do matrix operations on such as multiplying by the transposed etc

If you've got a list of 1D arrays that you want horizontally stacked, you could convert them all to column first, but it's probably easier to just vertically stack them and then transpose:
In [6]: vector_list = [np.linspace(0, 1, 10) for _ in range(3)]
In [7]: np.vstack(vector_list).T
Out[7]:
array([[0. , 0. , 0. ],
[0.11111111, 0.11111111, 0.11111111],
[0.22222222, 0.22222222, 0.22222222],
[0.33333333, 0.33333333, 0.33333333],
[0.44444444, 0.44444444, 0.44444444],
[0.55555556, 0.55555556, 0.55555556],
[0.66666667, 0.66666667, 0.66666667],
[0.77777778, 0.77777778, 0.77777778],
[0.88888889, 0.88888889, 0.88888889],
[1. , 1. , 1. ]])

How did you get this dimension error? What does empty array have to do with it?
A list of arrays of the same length:
In [610]: alist = [np.linspace(0,1,6), np.linspace(10,11,6)]
In [611]: alist
Out[611]:
[array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]),
array([10. , 10.2, 10.4, 10.6, 10.8, 11. ])]
Several ways of making an array from them:
In [612]: np.array(alist)
Out[612]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
In [614]: np.stack(alist)
Out[614]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
If you want to join them in columns, you can transpose one of the above, or use:
In [615]: np.stack(alist, axis=1)
Out[615]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
np.column_stack is also handy.
In newer numpy versions you can do:
In [617]: np.linspace((0,10),(1,11),6)
Out[617]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
You don't specify how you create the 'empty array' and how you attempt to stack. I can't exactly recreate the error message (full traceback would have helped). But given that message did you check the number of dimensions of the inputs? Did they match?
Array stacking in a loop is tricky. You have to pay close attention to the shapes, especially of the initial 'empty' array. There isn't a close analog to the empty list []. np.array([]) is 1d with shape (1,). np.empty((0,6)) is 2d with shape (0,6). Also all the stacking functions create a new array with each call (non operate in-place), so they are inefficient (compared to list append).

Generating multiple vandermonde arrays

I have a function that creates a 2-dim array, a Vandermonde matrix and is called as:
vandermonde(generator, rank)
Where generator is a n-sized array for example
generator = np.array([-1/2, 1/2, 3/2, 5/2, 7/2, 9/2])
and rank=4
Then I need to create 4 Vandermonde matrices (because rank=4) skewed by h in my space (that h is arbitrary here, lets call h=1).
Therefore I came with the following deterministic code:
V = np.array([
vandermonde(generator-0*h, rank),
vandermonde(generator-1*h, rank),
vandermonde(generator-2*h, rank),
vandermonde(generator-3*h, rank)
])
Then I want instead do multiple manual calls to vandermonde I used a for-loop as in:
V=[]
for i in range(rank):
V.append(vandermonde(generator - h*i, rank))
V = np.array(V)
This approach works fine, but seems too "patchy". I tried a np.append approach as below:
M = np.array([])
for i in range(rank):
M = np.append(M,[vandermonde(generator - h*i, rank)])
But didn't worked as I expected, seems np.append expand the array instead to create a new element.
My questions are:
How can I not use standard Python lists, use directly a np approach cause np.append seems not behave as I expect, instead it just grow that array instead add a new array element
Is there any more direct numpy approaches to this?
My vandermonde function is:
def vandermonde(generator, rank=None):
"""Returns a vandermonde matrix
If rank not passwd returns a square vandermonde matrix
"""
if rank is None:
rank = len(generator)
return np.tile(generator,(rank,1)) ** np.array(range(rank)).reshape((rank,1))
The expected answer is a 3 dimensional array with size (generator, rank, rank) where each element is one of the generator skewed vandermonde matrices. For the constants above(generator, rank, h) we have:
V= array([[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -0.5 , 0.5 , 1.5 , 2.5 , 3.5 , 4.5 ],
[ 0.25, 0.25, 2.25, 6.25, 12.25, 20.25],
[ -0.12, 0.12, 3.38, 15.62, 42.88, 91.12]],
[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -1.5 , -0.5 , 0.5 , 1.5 , 2.5 , 3.5 ],
[ 2.25, 0.25, 0.25, 2.25, 6.25, 12.25],
[ -3.38, -0.12, 0.12, 3.38, 15.62, 42.88]],
[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -2.5 , -1.5 , -0.5 , 0.5 , 1.5 , 2.5 ],
[ 6.25, 2.25, 0.25, 0.25, 2.25, 6.25],
[-15.62, -3.38, -0.12, 0.12, 3.38, 15.62]],
[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -3.5 , -2.5 , -1.5 , -0.5 , 0.5 , 1.5 ],
[ 12.25, 6.25, 2.25, 0.25, 0.25, 2.25],
[-42.88, -15.62, -3.38, -0.12, 0.12, 3.38]]])
Some related ideas can be found in this discussion on: efficient-way-to-compute-the-vandermonde-matrix

Use broadcasting to get the final 3D array in a vectorized manner -
r = np.arange(rank)
V_out = (generator - h*r[:,None,None]) ** r[:,None]
We can also use cumprod to achieve the exponential values for another solution -
gr = np.repeat(generator - h*r[:,None,None], rank, axis=1)
gr[:,0] = 1
out = gr.cumprod(1)

python: performance enhancement shrinking image

I have written the following code to scale an image to 50%. However, it took this algorithm 65 seconds to shrink a 3264x2448 image. Can someone who understands numpy explain why this algorithm is so inefficient and suggest more efficient changes?
def shrinkX2(im):
X, Y = im.shape[1] / 2, im.shape[0] / 2
new = np.zeros((Y, X, 3))
for y in range(Y):
for x in range(X):
new[y, x] = im[2*y:2*y + 2, 2*x:2*x + 2].reshape(4, 3).mean(axis=0)
return new

Going by the text of the question, it seems you are shrinking the image by 50% and by the code it seems, you are doing it in blocks. We can reshape to split each of the two axes of the 2D input by lengths as the required block sizes to get a 4D array and then compute mean along the axes corresponding to the block sizes, like so -
def block_mean(im, BSZ):
m,n = im.shape[:2]
return im.reshape(m//BSZ[0],BSZ[0],n//BSZ[1],BSZ[1],-1).mean((1,3))
Sample run -
In [44]: np.random.seed(0)
...: im = np.random.randint(0,9,(6,8,3))
In [45]: im[:2,:2,:].mean((0,1)) # average of first block across all 3 channels
Out[45]: array([3.25, 3.75, 3.5 ])
In [46]: block_mean(im, BSZ=(2,2))
Out[46]:
array([[[3.25, 3.75, 3.5 ],
[4. , 4.5 , 3.75],
[5.75, 2.75, 5. ],
[3. , 3.5 , 3.25]],
[[4. , 5.5 , 5.25],
[6.25, 1.75, 2. ],
[4.25, 2.75, 1.75],
[2. , 4.75, 3.75]],
[[3.25, 3.5 , 5.25],
[4.25, 1.5 , 5.25],
[3.5 , 3.5 , 4.25],
[0.75, 5. , 5.5 ]]])

Wrong CSV printing in Python (enumerating numpy array)

I apologize if this question looks like a duplicate. I am trying to write a 7x2 array to a .csv file. The array I want to print is called x5:
x5
Out[47]:
array([[ 0.5, 1. ],
[ 0.7, 3. ],
[ 1.1, 5. ],
[ 1.9, 6. ],
[ 2. , 7. ],
[ 2.2, 9. ],
[ 3.1, 10. ]])
The code I use:
import time
import csv
import numpy
timestr = time.strftime("%Y%m%d-%H%M%S")
with open('mydir\\AreaIntCurve'+'_'+str(timestr)+'.csv', 'w') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['Unique value', ' Occurrences'])
for m, val in numpy.ndenumerate(x5):
writer.writerow([m, val])
The result I get:
Unique value, Occurrences
"(0, 0)",0.5
"(0, 1)",1.0
"(1, 0)",0.69999999999999996
"(1, 1)",3.0
"(2, 0)",1.1000000000000001
"(2, 1)",5.0
"(3, 0)",1.8999999999999999
"(3, 1)",6.0
"(4, 0)",2.0
"(4, 1)",7.0
"(5, 0)",2.2000000000000002
"(5, 1)",9.0
"(6, 0)",3.1000000000000001
"(6, 1)",10.0
The result I want:
Unique value, Occurrences
0.5, 1
0.7, 3
1.1, 5
1.9, 6
2.0, 7
2.2, 9
3.1, 10
I assume the problem is with ndenumerate(x5), which prints the coordinates of my values. I have tried different approaches like numpy.savetxt, but it does not produce what I want and also does not print the current date in the file name. How to amend the ndenumerate() call to get rid of the value coordinates, while keeping the current date in the file name? Thanks a lot!

Here's an alternative that uses numpy.savetxt instead of the csv library:
In [17]: x5
Out[17]:
array([[ 0.5, 1. ],
[ 0.7, 3. ],
[ 1.1, 5. ],
[ 1.9, 6. ],
[ 2. , 7. ],
[ 2.2, 9. ],
[ 3.1, 10. ]])
In [18]: np.savetxt('foo.csv', x5, fmt=['%4.1f', '%4i'], header='Unique value, Occurrences', delimiter=',', comments='')
In [19]: !cat foo.csv
Unique value, Occurrences
0.5, 1
0.7, 3
1.1, 5
1.9, 6
2.0, 7
2.2, 9
3.1, 10

replace this line
for m, val in numpy.ndenumerate(x5):
writer.writerow([m, val])
with:
for val in x5:
writer.writerow(val)
you dont need to do ndenumerate

Have you tried replacing your two last lines of code with
for x in x5:
writer.writerow(x)
?
You may be surpised to see 1.8999999999999999 instead of 1.9 in your csv result; that is because 1.9 cannot be represented exactly in floating point arithmetics (see this question).
If you want to limit the number of digits to 3, you can replace the last line with writer.writerow([["{0:.3f}".format(val) for val in x]])
But this will also add three zeroes to integer values. Since you can check if a float is an integer with is_integer(), you can avoid this with
writer.writerow([str(y) if y.is_integer() else "{0:.3f}".format(y) for y in x])

Using Python to change dimensions of numpy array

I have data in an array.
The first column is time. Second, latitude, third longitude, fourth precipitation
Sample:
2 70 100 5.6
2 70 110 5.9
2 80 100 6.2
2 80 110 5.0
3 70 100 2.3
3 70 110 1.1
3 80 100 0.0
3 80 110 7.9
I would like to convert this into an array where the y axis is longitude, the z axis is latitude, and the x axis is time. Precipitation amounts will be located at each 3d grid point.
For instance, in the following image:
The sizes of the bubbles represent different precipitation amounts (ignore the colors)
How can I use python to do this?
So far I have:
import numpy as np<br>
a=open('time.dat') #original file
b=open('three.dat','w+')
dif=np.fromfile(a)
tim=dif[:,[0]]
lat=dif[:,[1]]
lon=dif[:,[2]]
pre=dif[:,[3]]
c=np.empty(780,360,720)
780 time steps, 360 latitudes, 720 longitudes

So you want a 2 dimensional array with the inner dimension containing all of the data, and the outer dimension ordered by lon, lat, time.
You can read in the file as a array of values, convert to a 2d array to group them into each 4 tuple. Then translate the column order of the inner array. Next sort the outer dimension on the inner dimension.
>>> data = np.array([2, 70, 100, 5.6, 2, 70, 110, 5.9, 2, 80, 100, 6.2, 2, 80, 110, 5.0, 3, 70, 100, 2.3, 3, 70, 110, 1.1, 3, 80, 100, 0.0, 3, 80, 110, 7.9])
>>> data2 = data.reshape((8, 4))
>>> data2
array([[ 2. , 70. , 100. , 5.6],
[ 2. , 70. , 110. , 5.9],
[ 2. , 80. , 100. , 6.2],
[ 2. , 80. , 110. , 5. ],
[ 3. , 70. , 100. , 2.3],
[ 3. , 70. , 110. , 1.1],
[ 3. , 80. , 100. , 0. ],
[ 3. , 80. , 110. , 7.9]])
>>> data2 = data2[:,[1,2,0,3]]
>>> data2
array([[ 70. , 100. , 2. , 5.6],
[ 70. , 110. , 2. , 5.9],
[ 80. , 100. , 2. , 6.2],
[ 80. , 110. , 2. , 5. ],
[ 70. , 100. , 3. , 2.3],
[ 70. , 110. , 3. , 1.1],
[ 80. , 100. , 3. , 0. ],
[ 80. , 110. , 3. , 7.9]])
The goofiness with view and sort described here

You can't use the numpy reshape for a simple reason : you have data duplicity in your original array (time and positions) and not in the result you want. Before and after a reshape the number of elements must be the same.
You have to do a loop to read your initial array and fill your new array.
Hope it helped

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Working with multiple columns from a data file - python

Related

turning a list of numpy.ndarray to a matrix in order to perform multiplication

Generating multiple vandermonde arrays

python: performance enhancement shrinking image

Wrong CSV printing in Python (enumerating numpy array)

Using Python to change dimensions of numpy array

Categories

Resources