Using Python to read a column vector from Excel

OK, I think this must be a super simple thing to do, but I keep getting index error messages no matter how I format this. My professor is making us multiply a 1x3 row vector by a 3x1 column vector, and I can't get Python to read the column vector. The row vector is in cells A1-C1 and the column vector is in cells A3-A5 of my Excel spreadsheet. I am using the right "format" for how he wants us to do it (if I do something that works but don't format it the way he likes, I don't get credit). The row vector is reading properly in the variable explorer, but I am only getting a 2x2 column vector (with the first column being the 0th column and being all zeros, again how he wants it). I haven't even gotten to the multiplication part of the code because I can't get Python to read the column vector correctly. Here is the code:
import xlwings as xw
import numpy as np

filename = 'C:\\python\\homework4.xlsm'
wb = xw.Workbook(filename)

#initialize vectors
a = np.zeros((1+1, 3+1))
b = np.zeros((3+1, 1+1))
n = 3

#Read a and b vectors from excel
for i in range(1, n+1):
    for j in range(1, n+1):
        a[i, j] = xw.Range((i, j)).value
    'end j'
    b[i, j] = xw.Range((i+2, j)).value
'end i'

Something like this should work. The way you iterate over i and j is wrong (and so is the initialization of a and b):
#initialize vectors
a = np.zeros((1, 3))
b = np.zeros((3, 1))
n = 3

#Read a and b vectors from excel
for i in range(0, n):
    a[0, i] = xw.Range((1, i+1)).value
for i in range(0, n):
    b[i, 0] = xw.Range((3+i, 1)).value

Remember, Python uses 0-based indexing and Excel uses 1-based indexing.
This code will read the vectors in properly, and then you can look up NumPy's dot product to do the multiplication. You can also assign the whole vectors at once, without a loop.
import xlwings as xw
import numpy as np

filename = 'C:\\Temp\\Book2.xlsx'
wb = xw.Book(filename).sheets[0]
n = 3

#initialize vectors
a = np.zeros((1, n))
b = np.zeros((n, 1))

#Read a and b vectors from excel
for j in range(1, n+1):
    a[0, j-1] = wb.range((1, j)).value
    b[j-1, 0] = wb.range((j+3-1, 1)).value

#Without a loop
a = wb.range((1, 1), (1, 3)).value
b = wb.range((3, 1), (5, 1)).value
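Once a and b are read in as the 1x3 and 3x1 NumPy arrays from the loop version above, the multiplication itself is a single dot product. A minimal sketch with placeholder values (in practice a and b come from the Excel reads):
import numpy as np

a = np.array([[1.0, 2.0, 3.0]])      # 1x3 row vector (example values)
b = np.array([[4.0], [5.0], [6.0]])  # 3x1 column vector (example values)

c = np.dot(a, b)   # equivalently: a @ b
print(c)           # 1x1 array holding the scalar product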

Related

How do I save an N x M array/list using Pandas?

I have an N x M numpy array / list. I want to save this matrix to a .csv file using Pandas. Unfortunately, I don't know the values of M and N a priori, and they can be large. I am interested in Pandas because I find it convenient for accessing data by column.
Let's start with this MWE:
import numpy as np
import pandas as pd

N, M = np.random.randint(10, 100, size=2)
A = np.random.randint(10, size=(N, M))

columns = []
for i in range(len(A[0, :])):
    columns.append("column_{}".format(i))
I cannot do something like pd.append(), i.e. append columns with new indices one at a time in a for loop.
Is there a way to save A to a .csv file?
Following Quang Hoang's comment, there are two possibilities:
pd.DataFrame(A).to_csv('yourfile.csv')
np.save("yourfile.npy", A) and then A = np.load("yourfile.npy")

Vectorization - how to append to an array without a for loop

I have the following code:
import numpy as np

x = range(100)
M = len(x)

sample = np.zeros((M, 41632))
for i in range(M):
    lista = np.load('sample' + str(i) + '.npy')
    for j in range(41632):
        sample[i, j] = np.array(lista[j])
    print(i)
to create an array made of sample_i numpy arrays.
sample0, sample1, sample2, etc. are numpy arrays, and my expected output is an M x 41632 array like this:
sample = [[sample0],[sample1],[sample2],...]
How can I make this operation more compact and faster, without the for loop? M can also reach 1 million.
Also, how can I append to my sample array if the starting index is, for example, 1000 instead of 0?
Thanks in advance
Initial load
You can make your code a lot faster by avoiding the inner loop and not initialising sample to zeros.
x = range(100)
M = len(x)

sample = np.empty((M, 41632))
for i in range(M):
    sample[i, :] = np.load('sample' + str(i) + '.npy')
In my tests this took the reading code from 3 seconds to 60 milliseconds!
Adding rows
In general it is very slow to change the size of a numpy array. You can append a row once you have loaded the data in this way:
sample = np.insert(sample, len(sample), newrow, axis=0)
but this is almost never what you want to do, because it is so slow.
Better storage: HDF5
Also if M is very large you will probably start running out of memory.
I recommend that you have a look at PyTables which will allow you to store your sample results in one HDF5 file and manipulate the data without loading it into memory. This will in general be a lot faster than the .npy files you are using now.
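As a rough illustration of the PyTables route (the file name, node name, and the loop over 100 files are just assumptions for the sketch), you can append rows to an extendable array on disk without ever holding all M rows in memory:
import numpy as np
import tables

# create an HDF5 file with an extendable array: 0 rows for now, 41632 columns
with tables.open_file('samples.h5', mode='w') as f:
    earray = f.create_earray(f.root, 'sample',
                             atom=tables.Float64Atom(),
                             shape=(0, 41632))
    for i in range(100):  # one file at a time, never all of them in memory
        row = np.load('sample' + str(i) + '.npy')
        earray.append(row[None, :])  # append as a 1 x 41632 block

# later: read just a slice without loading the whole array
with tables.open_file('samples.h5', mode='r') as f:
    first_ten = f.root.sample[:10]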
It is quite simple with numpy. Consider this example:
import numpy as np
l = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
#create an array with 4 rows and 3 columns
arr = np.zeros([4,3])
arr[:,:] = l
You can also insert rows or columns separately:
#insert the first row
arr[0,:] = l[0]
You just have to make sure that the dimensions are the same.

Python NumPy transpose and calculate

I am relatively new to Python and NumPy. I am currently trying to replicate in Python, using NumPy, a table that was shown in an image (not reproduced here).
As in the figure, I have the populated columns "group", "sub_group" and "value". I want to transpose the "sub_group" column and do a simple calculation, i.e. value minus shift(value), and display the result in the lower triangle of the matrix for each group. If sub_group is 0, then the whole column should be assigned 0. The transposed sub_group columns can be named anything (preferably index numbers) if that makes it easier. I am OK with a pandas solution as well; I just think pandas may be slow?
Below is code in array form:
import numpy as np
a=np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)], dtype=[('group',float),('sub_group',float),('value',float)])
Any help would be appreciated.
Regards,
S
Try this out:
import numpy as np
import pandas as pd

a = np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)],
             dtype=[('group',float),('sub_group',float),('value',float)])
df = pd.DataFrame(a)

for i in df.index:
    col_name = str(int(df['sub_group'][i]))
    df[col_name] = None
    if df['sub_group'][i] == 0:
        df[col_name] = 0
    else:
        val = df['value'][i]
        for j in range(i, df.index[-1]+1):
            df[col_name][j] = val - df['value'][j]
For the upper triangle of the matrix, I have put None values. You can replace them with whatever you want.
This piece of code does the calculation for the sub_group example. I am not sure if this is what you actually want; if not, post a comment here and I will edit.
import numpy as np

array_1 = np.array([(1,-1,10),(1,0,10),(1,-2,15),(1,-3,1),(1,-4,1),(1,0,12),(1,-5,16)])

#transpose the matrix
transposed_group = array_1.transpose()

#remember the first value of the row before it is overwritten
first_value = transposed_group[0, 0]
#loop over the first row
for i in range(0, len(transposed_group[1, :])):
    #value[i] - first value of the row
    transposed_group[0, i] = transposed_group[0, i] - first_value
print(transposed_group)
In case you want to display that in the diagonal of the matrix, you can loop through the rows and columns, for example:
import numpy as np

#create an array of zeros
array = np.zeros(shape=(3, 3))
print(array)

#fill the diagonal of the array with 1
#loop over rows
for i in range(0, len(array[:, 1])):
    #loop over columns
    for j in range(0, len(array[1, :])):
        if i == j:
            array[i, j] = 1
print(array)
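As a side note, NumPy can also fill a diagonal without an explicit double loop; a minimal sketch using built-ins:
import numpy as np

array = np.zeros((3, 3))
np.fill_diagonal(array, 1)   # set the main diagonal in place
print(array)

print(np.eye(3))             # or build an identity matrix directly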

Updating a NumPy array by adding columns

I am working with a large dataset and I would like to build a new array by adding columns: I update the array by opening each new file, taking a piece from it, and adding that piece to my new array as a column.
I have already tried the following code:
import numpy as np

Powers = np.array([])
with open('paths powers.tex', 'r') as paths_list:
    for file_path in paths_list:
        with open(file_path.strip(), 'r') as file:
            data = np.loadtxt(file_path.strip())
            Pname = data[0:32446, 0]
            Powers = np.append(Powers, Pname, axis=1)
np.savetxt("Powers.txt", Powers)
However, what this does is just append the contents of Pname to the bottom of the array, making one large 1D array instead of adding new columns and building a 2D ndarray.
I have also tried numpy.insert, numpy.hstack and numpy.concatenate, and I tried changing the shape of Pname. Unfortunately, they all give me the same result.
Have you tried numpy.column_stack?
Powers = np.column_stack([Powers,Pname])
However, your array starts out empty, so make sure the array isn't empty before concatenating, or you will get a dimension mismatch error:
import numpy as np

Powers = np.array([])
with open('paths powers.tex', 'r') as paths_list:
    for file_path in paths_list:
        with open(file_path.strip(), 'r') as file:
            data = np.loadtxt(file_path.strip())
            Pname = data[0:32446, 0]
            if len(Powers) == 0:
                Powers = Pname[:, None]
            else:
                Powers = np.column_stack([Powers, Pname])
np.savetxt("Powers.txt", Powers)
len(Powers) checks the number of rows in Powers. At the start this is 0, so on the first iteration the condition is true and we explicitly make Powers a one-column 2D array consisting of the first column from your file. Powers = Pname[:,None] does this for you, and is the same as Powers = Pname[:,np.newaxis]. It transforms a 1D array into a 2D array with a singleton column. The underlying problem is that 1D arrays in numpy are agnostic of whether they are rows or columns, so you must explicitly convert them into columns before appending. numpy.column_stack takes care of that for you.
However, you'll also need to make sure that Powers is a 2D matrix with one column the first time the loop iterates. Should you not want to use numpy.column_stack, you can still use numpy.append, but make sure that what you're concatenating to the array is a column. The trick described above helps you do this:
import numpy as np

Powers = np.array([])
with open('paths powers.tex', 'r') as paths_list:
    for file_path in paths_list:
        with open(file_path.strip(), 'r') as file:
            data = np.loadtxt(file_path.strip())
            Pname = data[0:32446, 0]
            if len(Powers) == 0:
                Powers = Pname[:, None]
            else:
                Pname = Pname[:, None]
                Powers = np.append(Powers, Pname, axis=1)
np.savetxt("Powers.txt", Powers)
The second statement ensures that the array becomes a 2D array with a singleton column before concatenating.
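As a design note, repeatedly calling np.column_stack or np.append copies the whole array on every iteration, which gets slow as the number of files grows. A minimal sketch of an alternative (using the same file list as above): collect the columns in a Python list and stack them once at the end:
import numpy as np

columns = []
with open('paths powers.tex', 'r') as paths_list:
    for file_path in paths_list:
        data = np.loadtxt(file_path.strip())
        columns.append(data[0:32446, 0])   # keep each column as a 1D array

Powers = np.column_stack(columns)          # single concatenation at the end
np.savetxt("Powers.txt", Powers)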

Fastest way to get elements from a numpy array and create a new numpy array

I have a numpy array called data with dimensions 150x4.
I want to create a new numpy array called mean with dimensions 3x4 by choosing random rows from data.
My current implementation is:
cols = data.shape[1]
K = 3

mean = np.zeros((K, cols))
for row in range(K):
    index = np.random.randint(data.shape[0])
    for col in range(cols):
        mean[row][col] = data[index][col]
Is there a faster way to do the same?
You can specify the number of random integers to draw with numpy.random.randint (its third argument, size). It also helps to know numpy's array indexing notation: you can select all the elements in a row, or several rows at once, with the : specifier.
mean = data[np.random.randint(0,len(data),3),:]
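Note that np.random.randint draws row indices with replacement, so the same row can be picked more than once (the original loop allows that too). If you want three distinct rows, which is an assumption about the intent, np.random.choice with replace=False is a small variation:
import numpy as np

data = np.random.rand(150, 4)                         # example data with the stated shape
idx = np.random.choice(len(data), 3, replace=False)   # three distinct row indices
mean = data[idx, :]
print(mean.shape)                                     # (3, 4)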
