picking out rows and columns in matrix - python

In Python, I have defined my matrix the following way
A = [[1, 4, 5, 12],
[-5, 8, 9, 0],
[-6, 7, 11, 19],
[-2, 7, 4, 23]]
and wanted to try printing out the individual columns and rows by the following
print(A[2][:]) and print(A[:][2]) for the 3rd row and 3rd column, respectively.
To my surprise, they both printed the 3rd row.
For the purpose of learning, I am not using Numpy or any math packages.
Not sure why print(A[2][:]) and print(A[:][2]) result in the same output

This follows from how list indexing works. A[2] returns the 3rd element of the outer list, which is itself a list (the 3rd row), and A[2][:] then takes a slice containing all elements of that row. A[:][2] works the other way round: A[:] first makes a (shallow) copy of the whole outer list, and [2] then picks the 3rd element of that copy, which is again the 3rd row. A list of lists has no notion of columns, so the slice never reaches into the inner lists.

To extract a column, you need to loop through all rows and pick out the element from each, e.g., with a list comprehension:
third_column = [row[2] for row in A]
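As a minimal sketch of the points above, using the matrix from the question (the names third_row and columns are only illustrative), the following shows that A[:][2] and A[2] are the same row, and two ways to get at columns with plain lists:

```python
A = [[1, 4, 5, 12],
     [-5, 8, 9, 0],
     [-6, 7, 11, 19],
     [-2, 7, 4, 23]]

# A[:] copies only the outer list, so A[:][2] is the very same inner list as A[2].
print(A[:][2] is A[2])

# A row is just one element of the outer list.
third_row = A[2]

# A column has to be collected from every row.
third_column = [row[2] for row in A]

# zip(*A) transposes the matrix, which gives all columns at once.
columns = [list(col) for col in zip(*A)]

print(third_row)
print(third_column)
print(columns[2])
```

The zip(*A) form is convenient when several columns are needed, since it builds them all in one pass.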

Related

How to update only selected values in a 2 dimensional list without using for loop?

I have a huge matrix with around 80000 rows and 66000 columns. I need to update selected values in each row. These selected values vary from row to row. For example, I might have to update the 346th, 446th, 789th and 321st column values for the first row, and for the second row I might have to update the 821st, 564th, 101st and 781st column values. I hope you get the situation.
Here, I am simulating the problem using a small matrix.
Suppose I have a 2 dimensional list/matrix.
matrix = [ [1,2,3], [4,5,6], [7,8,9]]
In the actual problem, I need to update all the rows but here for the sake of simplicity I am considering only 1 row. i.e. 2nd row. I wish to update 1st and 2nd values of 2nd row and keep the rest of the values in 2nd row as they are.
I need to do it without using for loops.
The code I tried is as follows :
index_list = [0,1]
matrix[1] = [ matrix[1][index] + 1 for index in index_list ]
print(matrix)
Here, index_list is the list of selected columns that need to be updated. The output I get is :
[[1, 2, 3], [5, 6], [7, 8, 9]]
The output I need / expected output is :
[[1, 2, 3], [5, 6, 6], [7, 8, 9]]
So, the question is: I wish to update only the 1st and 2nd values of the second row of the matrix given above and keep the rest of the values in the 2nd row as they are. This needs to be done without using for loops because of time constraints. I am trying to use a list comprehension because it is relatively fast. Could you please help with it?
I forgot to mention the code is in python, and we can use pandas, numpy if required.
matrix[1] = [matrix[1][i] + 1 if i in index_list else matrix[1][i] for i in range(len(matrix[1]))]
This solution worked.
You need to cover the case where the index is not in the list, preserving those values:
matrix[1] = [matrix[1][index] + 1 if index in index_list
             else matrix[1][index]
             for index in range(len(matrix[1]))]
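A slightly tighter variant of the same idea, as a sketch rather than a benchmarked solution: enumerate avoids indexing the row twice, and a set (index_set is a name introduced here) makes each membership test constant time, which may matter with tens of thousands of columns.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
index_list = [0, 1]

# Set lookup is O(1); "i in index_list" would scan the list for every column.
index_set = set(index_list)

# Rebuild row 1: add 1 at the selected positions, keep every other value.
matrix[1] = [value + 1 if i in index_set else value
             for i, value in enumerate(matrix[1])]

print(matrix)
```

Since the question allows numpy, converting the data to an array and writing arr[1, index_list] += 1 would update only the selected positions instead of rebuilding the whole row.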

Python loop through text and set numpy array index

Given a block of text with matrix rows and columns separated by commas and semicolons, I want to parse the text and set the indices of numpy arrays. Here is the code with the variable 'matrixText' representing the base text.
I first create the matrices and then split the text by semicolons and then by commas. I loop through the split text and set each index. However with the text ...
1,2,3;4,5,6;7,8,9
I get the result
7,7,7;8,8,8;9,9,9
temp1=matrixText.split(';')
temp2=temp1[0].split(',')
rows=len(temp1)
columns=len(temp2)
rA=np.zeros((rows, columns))
arrayText=matrixText.split(';')
rowText=range(len(arrayText))
for rowIndex, rowItem in enumerate(arrayText):
    rowText[rowIndex]=arrayText[rowIndex].split(',')
    for colIndex, colItem in enumerate(rowText[rowIndex]):
        rA[[rowIndex, colIndex]]=rowText[rowIndex][colIndex]
I thought that by setting each index, I would avoid any copy by reference issues.
To provide more info, in the first iteration, the 0,0 index is set to 1 and the output of that is then 1,1,1;0,0,0;0,0,0 which I can't figure out since setting one index in the numpy array sets three.
In the second iteration, the index 0-1 is set to 2 and the result is then 2,2,2;2,2,2;0,0,0
The third iteration sets 0-2 to 3 but the result is 3,3,3;2,2,2;3,3,3
Any suggestions?
You can (ab-) use the matrix constructor plus the A property
np.matrix('1,2,3;4,5,6;7,8,9').A
Output:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
matrixText = '1,2,3;4,5,6;7,8,9'
temp1=matrixText.split(';')
temp2=temp1[0].split(',')
rows=len(temp1)
columns=len(temp2)
rA=np.empty((rows, columns),dtype=int)  # np.int was removed in NumPy 1.24; the builtin int does the same job
for n, line in enumerate(temp1):
    rA[n,:]=line.split(',')
Using a nested list-comprehension:
Having defined:
s = "1,2,3;4,5,6;7,8,9"
we can use a nice one-liner:
np.array([[int(c) for c in r.split(",")] for r in s.split(";")])
which would give the following array:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Another one-liner (+1 import):
from io import StringIO
rA = np.loadtxt(StringIO(matrixText.replace(';','\n')), delimiter=',')
So, the problem here was the double brackets in rA[[rowIndex, colIndex]]: indexing with a list selects the whole rows rowIndex and colIndex, so every cell in both of those rows was set. This should be rA[rowIndex, colIndex].
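The difference between the two index forms can be reproduced in isolation. This is a small sketch that assumes numpy is installed; the array names by_list and by_tuple are made up for the demonstration, and the indices 0 and 2 mirror the third iteration described in the question.

```python
import numpy as np

# Index with a list: [0, 2] picks whole rows 0 and 2.
by_list = np.zeros((3, 3))
by_list[[0, 2]] = 3

# Index with two comma-separated integers: picks the single cell (0, 2).
by_tuple = np.zeros((3, 3))
by_tuple[0, 2] = 3

print(by_list)
print(by_tuple)
```

In the first array both selected rows are filled with 3, while in the second only one cell changes.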

finding the max of a column in an array

def maxvalues():
    for n in range(1,15):
        dummy=[]
        for k in range(len(MotionsAndMoorings)):
            dummy.append(MotionsAndMoorings[k][n])
        max(dummy)
        L = [x + [max(dummy)]] ## to be corrected (adding columns with value max(dummy))
        ## suggest code to add new row to L and for next function call, it should save values here.
I have an array of size (k x n) and I need to pick the max values of the first column in that array. Is there a simpler way than what I tried? My main aim is to append the result to L as columns rather than rows. If I just append, it adds the values at the end. I would like this to be done in columns for row 0 in L, because I'll call this function again, add a new row to L and do the same. Please suggest.
General suggestions for your code
First of all it's not very handy to access globals in a function. It works but it's not considered good style. So instead of using:
def maxvalues():
    do_something_with(MotionsAndMoorings)
you should do it with an argument:
def maxvalues(array):
    do_something_with(array)
MotionsAndMoorings = something
maxvalues(MotionsAndMoorings) # pass it to the function.
The next strange thing is that you seem to exclude the first column of your array:
for n in range(1,15):
I think that's unintended. The first element of a list has the index 0 and not 1. So I guess you wanted to write:
for n in range(0,15):
or even better, for arbitrary lengths:
for n in range(len(array[0])): # the length of the first row is the number of columns
Alternatives to your iterations
But you don't need to write these loops yourself, because the max function already accepts a very handy keyword argument (key) that lets it compare rows by a single column:
import operator
column = 2
max(array, key=operator.itemgetter(column))[column]
Here max returns the row whose i-th element is maximal (i being the column you chose). Because max returns the whole row, you still need to index the result to extract just the i-th element.
So to get a list of all your maximums for each column you could do:
[max(array, key=operator.itemgetter(column))[column] for column in range(len(array[0]))]
For your L I'm not sure what this is but for that you should probably also pass it as argument to the function:
def maxvalues(array, L): # another argument here
but since I don't know what x and L are supposed to be I'll not go further into that. But it looks like you want to make the columns of MotionsAndMoorings to rows and the rows to columns. If so you can just do it with:
dummy = [[MotionsAndMoorings[j][i] for j in range(len(MotionsAndMoorings))] for i in range(len(MotionsAndMoorings[0]))]
that's a list comprehension that converts a list like:
[[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]
to an "inverted" column/row list:
[[1, 4, 0, 0], [2, 5, 2, 2], [3, 6, 10, 10]]
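A shorter way to write the same transposition, and the per-column maxima with it, is zip(*array). This is a sketch using the example list above; the names transposed and column_maxima are only illustrative.

```python
array = [[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]

# zip(*array) yields one tuple per column, i.e. the transposed list.
transposed = [list(column) for column in zip(*array)]

# The maximum of every column, without operator.itemgetter or index loops.
column_maxima = [max(column) for column in zip(*array)]

print(transposed)
print(column_maxima)
```

This assumes all rows have the same length; zip silently stops at the shortest row otherwise.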
Alternative packages
But like roadrunner66 already said sometimes it's easiest to use a library like numpy or pandas that already has very advanced and fast functions that do exactly what you want and are very easy to use.
For example you convert a python list to a numpy array simply by:
import numpy as np
Motions_numpy = np.array(MotionsAndMoorings)
you get the maximum of the columns by using:
maximums_columns = np.max(Motions_numpy, axis=0)
you don't even need to convert it to a np.array to use np.max or to transpose it (turn the rows into columns and the columns into rows):
transposed = np.transpose(MotionsAndMoorings)
I hope this answer is not too unstructured. Some parts are suggestions for your function and some are alternatives. You should pick the parts that you need and if you have any trouble with it, just leave a comment or ask another question. :-)
An example with a random input array, showing that you can take the max in either axis easily with one command.
import numpy as np
aa= np.random.random([4,3])
print(aa)
print()
print(np.max(aa,axis=0))
print()
print(np.max(aa,axis=1))
Output:
[[ 0.51972266 0.35930957 0.60381998]
[ 0.34577217 0.27908173 0.52146593]
[ 0.12101346 0.52268843 0.41704152]
[ 0.24181773 0.40747905 0.14980534]]
[ 0.51972266 0.52268843 0.60381998]
[ 0.60381998 0.52146593 0.52268843 0.40747905]

Reduce size of array based on multiple column criteria in python

I need to reduce the size of an array, based on criteria found on another array; I need to look into the relationships and change the value based on the new information. Here is a simplified version of my problem.
I have an array (or dataframe) with my data:
data = np.array([[[[1, 2, 3, 4], [5, 6, 7, 8]]]]).reshape((4,2))
I have another file, of different size, that holds information about the values in the data array:
a = np.array([[1, 1, 2],[2, 3, 4],[3, 5, 6], [4, 7, 8] ]).reshape((4,3))
The information I have in a tells me how I can reduce the size of data, for example a[0] tells me that data[0][0:2] == a[0][1:].
so I can replace the unique value a[0][0:1] with data[0][0:2] (effectively reducing the size of array data).
To clarify, array a holds three pieces of information per position, a[0] has the information 1, 1, 2 - now I want to scan through the data array, and when the a[i][1:] is equal to any of the data[i][0:2] or data[i][2:] then I want to replace the value with the a[i][0:1] - is that any clearer?
my final array should be like this:
new_format = np.array([[[[1, 2], [3,4]]]]).reshape((2,2))
There are questions like the following: Filtering a DataFrame based on multiple column criteria
but are only based on filtering based on certain numerical criteria.
I figured out a way to do it, using the pandas library. Probably not the best solution, but it worked for me.
In my case I read the data in the pandas library, but for the posted example I can convert the arrays to dataframes
datas = pd.DataFrame(data) ## convert to dataframe
az = pd.DataFrame(a)
datas = datas.rename(columns={0: 1, 1: 2}) ## rename columns (the labels are integers) for comparison with the a array
new_format = pd.merge(datas, az, how='right') # do the comparison on the shared columns 1 and 2
new_format = new_format.drop(columns=[1, 2]) # drop the old columns, keeping only the new format

How to only add item from list if item in different list is not 0 keeping same index?

I am working with Excel (using xlsxwriter and openpyxl) and I am trying to populate cells of one column from one list based on if the cell in the adjacent column has a 0 in it or not. If said adjacent column cell has a 0 in it, the code should ignore whatever number is in the second list and replace that with a 0 in the new cell.
To simplify my code, here is what I am working with, just less numbers. I have two lists:
full = [2, 5, 0, 1, 3, 0, 3, 4, 5, 0]
regr = [3, 6, 4, 5, 1, 5, 7, 8, 9, 3]
List full is displayed in Excel's column B, one list item per cell. What I need to do is display the items from list regr in the next column C, replacing the current number with 0 if a 0 is found in the adjacent cell in column B.
So it should ideally look something like this:
http://i.stack.imgur.com/fJ0HG.png
What I am finding difficult is having a loop that keeps track of the index of each list, and a counter that adds each time (for column insertion purposes - B1, B2, B3, B4 etc.)
I have code that populates column B with the regr list but it doesn't do the 0 check and all my attempts to store and use the index have failed.
for x in range(0, 50):
    worksheet1.write("B" + str(x), str(regr[x]))
Any help would be greatly appreciated. Thanks!
You should probably use zip to loop over the two lists in parallel. Also, don't try and create cell coordinates programmatically using the "A1" syntax. Both openpyxl and xlsxwriter allow the use of numeric row and column indices for this kind of thing.
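A minimal sketch of that suggestion, using the two lists from the question. To keep it runnable without a workbook, the cells are collected in a plain list (cells and column_c are names made up here); with xlsxwriter each triple would instead be passed to worksheet.write(row, col, value), which takes zero-based numeric indices.

```python
full = [2, 5, 0, 1, 3, 0, 3, 4, 5, 0]
regr = [3, 6, 4, 5, 1, 5, 7, 8, 9, 3]

# zip pairs the items at the same index; keep the regr value unless full has a 0 there.
column_c = [r if f != 0 else 0 for f, r in zip(full, regr)]

# (row, col, value) triples with zero-based indices: column B is 1, column C is 2.
cells = []
for row, (b_value, c_value) in enumerate(zip(full, column_c)):
    cells.append((row, 1, b_value))
    cells.append((row, 2, c_value))

print(column_c)
```

enumerate supplies the row counter, so no separate index bookkeeping or "B" + str(x) string building is needed.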
