Write multidimensional numpy array to csv - python

I have a multidimensional numpy array containing function values, and I'd like to write it to a long csv. How can I do that cleanly? I couldn't find a numpy function but maybe I was googling the wrong terms. An example:
#!/usr/bin/python
import csv
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([50, 51])
z = np.array([99, 100, 101])
f = np.arange(24).reshape((4, 2, 3)) # Contains f(x, y, z)
assert f.shape == (x.size, y.size, z.size)
## I'd like to create a csv file whose columns are x, y, z, f
## How can I do that?
## np.savetxt("test.csv", f, delimiter=",")
## TypeError: float argument required, not numpy.ndarray
## Works, but does numpy already have a function that does this?
with open("test.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=",", quotechar="'", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["x", "y", "z", "f"])
    for x_index in range(x.size):
        for y_index in range(y.size):
            for z_index in range(z.size):
                writer.writerow([x[x_index], y[y_index], z[z_index],
                                 f[x_index, y_index, z_index]])
I have three vectors x, y, z and an X-by-Y-by-Z array containing function values f(x, y, z). In other words, f[i, j, k] contains the function value f that corresponds to x[i], y[j] and z[k]. Is there a cleaner way to write a long csv with columns x,y,z,f?
Here's head test.csv:
x,y,z,f
1,50,99,0
1,50,100,1
1,50,101,2
1,51,99,3
1,51,100,4
1,51,101,5
2,50,99,6
2,50,100,7
2,50,101,8
Edit: This seems to work as well:
import itertools
x_y_z = np.array(list(itertools.product(x, y, z)))
assert x_y_z.shape[0] == f.size
output_array = np.hstack((x_y_z, f.flatten().reshape((f.size, 1))))
np.savetxt("test2.csv", output_array, comments="", delimiter=",", fmt="%i",
           header="x,y,z,f")
Am I reinventing the wheel?

In fact, yes, it's slightly more complicated than it needs to be.
Given 3 lists x,y and z
import numpy as np
x = [1,2,3]
y = [4,5]
z = [6,7,8]
You need to expand these lists to get all possible combinations; use numpy.repeat this way:
new_x = np.array(x).repeat(len(y)*len(z))
print(new_x)
>> [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3]
# repeat each y value len(z) times, and the whole pattern once per x value
new_y = np.array([y]).repeat(len(x),axis=0).repeat(len(z),axis=1)
new_z = np.array([z]).repeat(len(x)*len(y),axis=0)
# new_y and new_z are 2-D at this point; reshape them just like new_x
new_y = new_y.reshape(new_x.shape)
new_z = new_z.reshape(new_x.shape)
print(new_y)
>> [4 4 4 5 5 5 4 4 4 5 5 5 4 4 4 5 5 5]
print(new_z)
>> [6 7 8 6 7 8 6 7 8 6 7 8 6 7 8 6 7 8]
Then just stack them as columns:
# suppose this is your vector f
f = np.arange(len(x)*len(y)*len(z))
matrix = np.array([new_x,new_y,new_z,f]).T
# or matrix = np.column_stack((new_x,new_y,new_z,f))
print(matrix)
>>
[[ 1 4 6 0]
[ 1 4 7 1]
[ 1 4 8 2]
[ 1 5 6 3]
[ 1 5 7 4]
[ 1 5 8 5]
[ 2 4 6 6]
[ 2 4 7 7]
[ 2 4 8 8]
[ 2 5 6 9]
[ 2 5 7 10]
[ 2 5 8 11]
[ 3 4 6 12]
[ 3 4 7 13]
[ 3 4 8 14]
[ 3 5 6 15]
[ 3 5 7 16]
[ 3 5 8 17]]
Finally, save the array as csv:
np.savetxt('file_name.csv', matrix, delimiter=',', fmt='%d', header='x,y,z,f', comments='')
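The same long table can also be built with np.meshgrid, which produces the coordinate grids directly. This is only a minimal sketch, assuming the x, y, z and f from the question; the helper name long_table and the file name test3.csv are made up for illustration.

```python
import numpy as np

def long_table(x, y, z, f):
    """Return an (N, 4) array with columns x, y, z, f (hypothetical helper)."""
    # indexing="ij" keeps the grids in f's axis order: f[i, j, k] <-> x[i], y[j], z[k]
    xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")
    # ravel() flattens in C order, the same order as the nested loops in the question
    return np.column_stack((xx.ravel(), yy.ravel(), zz.ravel(), f.ravel()))

x = np.array([1, 2, 3, 4])
y = np.array([50, 51])
z = np.array([99, 100, 101])
f = np.arange(24).reshape((4, 2, 3))

table = long_table(x, y, z, f)
np.savetxt("test3.csv", table, delimiter=",", fmt="%d", header="x,y,z,f", comments="")
```

Because the grids take their shape from f, this works for any lengths of x, y and z, as long as f.shape == (len(x), len(y), len(z)).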

Related

How to get a stratified random sample of indices?

I have an array (pd.Series) of two values (A's and B's, for example).
y = pd.Series(['A','B','A','A','B','B','A','B','A','B','B'])
0 A
1 B
2 A
3 A
4 B
5 B
6 A
7 B
8 A
9 B
10 B
I want to get a random sample of indices from series, but half of the indices must correspond with an A, and the other half must correspond with a B.
For example
get_random_stratified_sample_of_indices(y=y, n=4)
[0, 1, 2, 4]
The indices 0 and 2 correspond with the indices of A's, and the indices of 1 and 4 correspond with the indices of B's.
Another example
get_random_stratified_sample_of_indices(y=y, n=6)
[1, 4, 5, 0, 2, 3]
The order of the returned list of indices doesn't matter but I need it to be even split between indices of A's and B's from the y array.
My plan was to first look at the indices of A's, then take a random sample (size=n/2) of the indices. And then repeat for B.
You can use groupby.sample:
N = 4
idx = (y
    .index.to_series()
    .groupby(y)
    .sample(n=N//len(y.unique()))
    .to_list()
)
Output: [3, 8, 10, 1]
Check:
3 A
8 A
10 B
1 B
dtype: object
Here's one way to do it:
def get_random_stratified_sample_of_indices(s, n):
    mask = s == 'A'
    s1 = s[mask]
    s2 = s[~mask]
    m1 = n // 2
    m2 = m1 if n % 2 == 0 else m1 + 1
    i1 = s1.sample(m1).index.to_list()
    i2 = s2.sample(m2).index.to_list()
    return i1 + i2
Which could be used in this way:
y = pd.Series(['A','B','A','A','B','B','A','B','A','B','B'])
i = get_random_stratified_sample_of_indices(y, 5)
print(i)
print()
print(y[i])
Result:
[6, 2, 7, 10, 5]
6 A
2 A
7 B
10 B
5 B
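If the labels are not limited to 'A' and 'B', the same idea can be sketched with NumPy alone. This is only an illustration: the helper name stratified_indices and its seed parameter are hypothetical, and it assumes n is divisible by the number of distinct labels (otherwise it returns fewer than n indices).

```python
import numpy as np

def stratified_indices(labels, n, seed=None):
    """Sample n positions, split evenly across the distinct labels (hypothetical helper)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    per_class = n // len(classes)
    picked = []
    for c in classes:
        candidates = np.flatnonzero(labels == c)
        # replace=False means no index is drawn twice; raises ValueError if a class is too small
        picked.extend(rng.choice(candidates, size=per_class, replace=False).tolist())
    return picked

y = ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'B']
idx = stratified_indices(y, n=4, seed=0)
```

The returned values are positions, which coincide with the index labels of a pd.Series that uses the default RangeIndex.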
I think you could use the train_test_split from Scikit-Learn, defining its stratify parameter.
from sklearn.model_selection import train_test_split
import pandas as pd
y = (
pd.Series(["A", "B", "A", "A", "B", "B", "A", "B", "A", "B", "B"])
.T.to_frame("col")
.assign(i=lambda xdf: xdf.index)
)
print(y)
# Prints:
#
# col i
# 0 A 0
# 1 B 1
# 2 A 2
# 3 A 3
# 4 B 4
# 5 B 5
# 6 A 6
# 7 B 7
# 8 A 8
# 9 B 9
# 10 B 10
print('\n')
# ===== Actual solution =====================================
a, b = train_test_split(y, test_size=0.5, stratify=y["col"])
# ===========================================================
print(a)
# Prints:
#
# col i
# 10 B 10
# 6 A 6
# 7 B 7
# 8 A 8
# 4 B 4
print('\n')
print(b)
# Prints:
#
# col i
# 3 A 3
# 9 B 9
# 2 A 2
# 1 B 1
# 5 B 5
# 0 A 0

How to construct a simple matrix and change values according to equation (numpy)?

My question is really simple. I have to make a 5*5 matrix and each i,j value should follow a formula of i+j
I have this so far:
w = np.zeros(shape=(5,5))
print(w)
for i in range(5):
    for j in range(5):
        w[i][j] == i**2+j
print(w)
But it's just returning a zero matrix right now. What should I do?
Just change
w[i][j] == i**2+j
to the following if you want to keep that formula (== only compares, = assigns):
w[i,j] = i**2+j
or use the formula from your question:
w[i,j] = i+j
If you want to get rid of the loops, you can use numpy:
w = np.arange(5)
w = np.add.outer(w ** 2, w)
print(w)
Out:
[[ 0 1 2 3 4]
[ 1 2 3 4 5]
[ 4 5 6 7 8]
[ 9 10 11 12 13]
[16 17 18 19 20]]
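Broadcasting gives the same result as np.add.outer and may be easier to adapt to other formulas. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

i = np.arange(5)[:, None]  # row indices as a column vector, shape (5, 1)
j = np.arange(5)           # column indices, shape (5,)

w = i + j         # broadcasts to shape (5, 5), so w[i, j] == i + j
w_sq = i**2 + j   # the i**2 + j variant used in the loop above
```

Any elementwise expression in i and j can be written this way without an explicit loop.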

An efficient way to concatenate rows of a 2-dim array according to a given list of pairs of indexes

Suppose I have a 2 dimensional array with a very large number of rows, and a list of pairs of indexes of that array. I want to create a new 2 dim array, whose rows are concatenations of the rows of the original array, made according to the list of pairs of indexes. For example:
a =
1 2 3
4 5 6
7 8 9
0 0 0
indexes = [[0,0], [0,1], [2,3]]
the returned array should be:
1 2 3 1 2 3
1 2 3 4 5 6
7 8 9 0 0 0
Obviously I can iterate the list of indexes, but my question is whether there is a more efficient way of doing this. I should say that the list of indexes is also very large.
First convert indexes to a Numpy array:
ind = np.array(indexes)
Then generate your result as:
result = np.concatenate([a[ind[:,0]], a[ind[:,1]]], axis=1)
The result is:
array([[1, 2, 3, 1, 2, 3],
[1, 2, 3, 4, 5, 6],
[7, 8, 9, 0, 0, 0]])
Another possible formula (with the same result):
result = np.concatenate([ a[ind[:,i]] for i in range(ind.shape[1]) ], axis=1)
You can do this in one line using NumPy as:
a = np.arange(12).reshape(4, 3)
print(a)
b = [[0, 0], [1, 1], [2, 3]]
b = np.array(b)
print(b)
c = a[b.reshape(-1)].reshape(-1, a.shape[1]*b.shape[1])
print(c)
'''
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[0 0]
[1 1]
[2 3]]
[[ 0 1 2 0 1 2]
[ 3 4 5 3 4 5]
[ 6 7 8 9 10 11]]
'''
You can use horizontal stacking np.hstack:
c = np.array(indexes)
np.hstack((a[c[:,0]],a[c[:,1]]))
output:
[[1 2 3 1 2 3]
[1 2 3 4 5 6]
[7 8 9 0 0 0]]
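For reference, here is a small self-contained check of the fancy-indexing approach on the data from the question. It is a sketch, not a benchmark: indexing with the whole 2-D index array selects all rows in one step, and a reshape then joins each pair.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [0, 0, 0]])
indexes = [[0, 0], [0, 1], [2, 3]]

ind = np.asarray(indexes)
# a[ind] has shape (n_pairs, 2, n_cols); merging the last two axes concatenates each pair of rows
result = a[ind].reshape(len(ind), -1)
```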

(Inverse-) Sorting 2d numpy array column-wise

The following code sorts a 2d numpy array column-wise, forth and back:
import numpy as np
#Column-wise sort and inverse sort of image (2d array)
nrows = 10
ncols = 5
a = np.random.randint(nrows, size=(nrows, ncols))
a_sorted = np.sort(a, axis=0)
ori_indices = np.zeros_like(a)
for c in range(ncols):
    ori_indices[:,c] = np.argsort(np.argsort(a[:,c]))
#Do some work on sorted array, like e.g row-wise filtering
#After processing sorted array, move it back to original order
a_backsorted = np.zeros_like(a)
for c in range(ncols):
    a_backsorted[:,c] = a_sorted[:,c][ori_indices[:,c]]
print (a); print ()
print (a_backsorted); print ()
print (a_sorted); print ()
The code works as is, but I guess there is a more efficient implementation without the for loops (using fancy indexing).
You can try a_sorted[::-1] to reverse the array
print (a_sorted); print ()
print (a_sorted[::-1])
[[0 0 0 2 0]
[2 0 0 2 2]
[4 0 2 6 4]
[4 2 3 7 5]
[4 4 4 7 6]
[5 5 4 8 7]
[6 5 4 8 7]
[7 6 8 9 8]
[8 7 9 9 9]
[8 8 9 9 9]]
[[8 8 9 9 9]
[8 7 9 9 9]
[7 6 8 9 8]
[6 5 4 8 7]
[5 5 4 8 7]
[4 4 4 7 6]
[4 2 3 7 5]
[4 0 2 6 4]
[2 0 0 2 2]
[0 0 0 2 0]]
#Column-wise sort and inverse sort of image (2d array)
import numpy as np
#Define random array and sort it
nrows = 10
ncols = 5
a = np.random.randint(nrows, size=(nrows, ncols))
a_sorted = np.sort(a, axis=0)
#Save original order of columns
ori_indices = np.argsort(np.argsort(a, axis=0), axis=0)
#Do some work on sorted array, like e.g row-wise filtering.
#....
#After processing sorted array, move it back to original order:
c=np.array([[i] for i in range(ncols)]).T
a_backsorted = a_sorted[ori_indices, c]
#Check results
print (a); print ()
print (a_backsorted); print ()
print (a_sorted); print ()
import numpy as np
nrows = 10; ncols = 5
a = np.random.randint(nrows, size=(nrows, ncols))
a_sorted = np.sort(a, axis=0)
a_backsorted = np.zeros_like(a)
c = np.array([[i] for i in range(ncols)]).T
a_backsorted[np.argsort(a, axis=0), c] = a_sorted
The column-wise sort is reverted by inserting the values of the sorted array at the argsorted positions in the backsorted array. Since this is done column-wise, the argsorted positions are paired with the column indices held in the c array.
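The same round trip can also be sketched with np.take_along_axis and np.put_along_axis (available since NumPy 1.15), which removes the need for the helper column array c. The seeded random generator is only there to make the example reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(10, 5))

order = np.argsort(a, axis=0)                    # per-column sorting permutation
a_sorted = np.take_along_axis(a, order, axis=0)  # same values as np.sort(a, axis=0)

# ... process a_sorted here ...

# Scatter each sorted value back to the row it originally came from
a_backsorted = np.empty_like(a_sorted)
np.put_along_axis(a_backsorted, order, a_sorted, axis=0)
```

Only a single argsort is needed here, because the permutation is applied once as a gather and once as a scatter.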

Creating subarrays with the number of subarrays passed as an argument in python

I have a large 100x15 array like this:
[a b c d e f g h i j k l m n o]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
.
.
.(Up to 100 rows)
I want to split this data into subsets using a function with an argument 'k', where 'k' denotes the number of subsets to be made; say k=5 means the data attributes are divided into 3 subsets like below:
[a b c d e] [f g h i j] [k l m n o]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
[1 2 3 4 5] [6 7 8 9 10] [11 12 13 14 15]
.
.
.(Up to 100 rows)
and they are stored in a different array. I want to implement this using python. I have implemented this partially. Can any one implement this and provide me the code in the answer?
Partial logic for the inner loop
given k
set start_index = 0
end_index = length of array/k = increment
for j from start_index to end_index
start_index=end_index + 1
end_index = end_index + increment
//newarray[][] (I'm not sure abt here)
Thank You.
This returns an array of matrices with columnsize = 2 , which works for k=2:
import numpy as np
def portion(mtx, k):
    array = []
    array.append( mtx[:, :k])
    for i in range(1, mtx.shape[1]-1):
        array.append( mtx[:, k*i:k*(i+1)])
    return array[:k+1]
mtx = np.matrix([[1,2,3,10,13,14], [4,5,6,11,15,16], [7,8,9,12,17,18]])
k = 2
print(portion(mtx, k))
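If k is read as the number of column blocks, NumPy's np.array_split may already do the job. The sketch below assumes that reading; the helper name split_columns and the sample data are made up for illustration.

```python
import numpy as np

def split_columns(mtx, k):
    """Split mtx into k blocks of columns (hypothetical helper)."""
    # array_split tolerates widths that are not divisible by k; np.hsplit would raise instead
    return np.array_split(mtx, k, axis=1)

# 4 rows, each holding 1..15, standing in for the 100x15 array
data = np.arange(1, 16).reshape(1, 15).repeat(4, axis=0)
blocks = split_columns(data, 3)
```

Each element of blocks is a 2-D array that keeps all rows, so blocks[0] holds columns 1 to 5, blocks[1] columns 6 to 10, and blocks[2] columns 11 to 15.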
Unfortunately I had to do it myself; this is the Python code for the logic. Anyway, thanks to #astaning for the attempt.
import numpy as np
def build_rotationtree_model(k):
    mtx = np.array([[2.95,6,63,23],[2,53,7,79],[3.57,5,65,32],[3.16,5,47,34],[21,2.58,4,46],[3.1,2.16,6,22],[3.5,3.27,3,52],[12,2.56,4,42]])
    # Number of attributes (width of matrix)
    a = mtx.shape[1]
    # Height of matrix (total rows)
    b = mtx.shape[0]
    # Separation limit: columns per sub matrix (at least 1, so the loop always advances)
    limit = max(1, a // k)
    # Start of sub matrix
    start = 0
    # End of sub matrix (exclusive)
    end = limit
    print(end)
    print(a)
    # Loop over the column blocks
    while start < a:
        # Copy columns start..end-1 of every row into a fresh sub matrix
        newArray = [[mtx[i][j] for j in range(start, min(end, a))] for i in range(b)]
        for row in newArray:
            print(row)
        # Call LDA function and add the result to Sparse Matrix
        # sparseMat = LDA(newArray) should be called here, inside the loop
        start = end
        end = end + limit
a=list(input())
for i in range(0,len(a)):
    for j in range(i,len(a)):
        for k in range(i,j+1):
            print(a[k],end=" ")
        print("\n",end="")
