How to zero out rows and columns in a SymPy Matrix

I'm implementing a simple FEA code and I need to zero out particular rows and columns to apply boundary conditions. Example matrix:
I tried my_matrix[:,1] = 0, but it returns an error: ValueError: unexpected value: 0
Can someone guide me on how to set particular rows and columns to zero?

SymPy matrix objects don't appear to support assigning a scalar to multiple entries the way NumPy arrays do.
Try my_matrix[:,1] = [0]*my_matrix.shape[0] instead, which builds a list of zeros whose length equals the number of rows of my_matrix.
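For instance, a minimal sketch (the 3x3 matrix here is only an illustration):

import sympy as sp

M = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

# Zero out column 1 by assigning a list with one zero per row
M[:, 1] = [0] * M.shape[0]

# Zero out row 1; sp.zeros(1, n) builds a 1 x n zero matrix of matching shape
M[1, :] = sp.zeros(1, M.shape[1])

print(M)  # Matrix([[1, 0, 3], [0, 0, 0], [7, 0, 9]])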

How do you filter rows in a dataframe based on the column numbers from a Python list?

I have a Pandas dataframe with two columns, x and y, that correspond to a large signal. It is about 3 million rows in size.
(plot: wavelength signal from the dataframe)
I am trying to isolate the peaks from the signal. After using scipy, I got a 1D Python list corresponding to the indexes of the peaks. However, they are not the actual x-values of the signal, but just the index of their corresponding row:
from scipy.signal import find_peaks
peaks, _ = find_peaks(y, height=(None, peakline))
So, I decided I would just filter the original dataframe by setting all values in its y column to NaN unless they were at an index found in the peak list. I did this iteratively; however, since it is 3,000,000 rows, it is extremely slow:
peak_index = 0
for data_index in list(data.index):
    if peak_index >= len(peaks) or data_index != peaks[peak_index]:
        data.iloc[data_index, 1] = float('NaN')  # blank out y for non-peak rows
    else:
        peak_index += 1
Does anyone know what a faster method of filtering a Pandas dataframe might be?
Looping is, in most cases, extremely inefficient with pandas. Assuming you just need a filtered DataFrame that contains the values of the x and y columns only where y is a peak, you can use the following:
df.iloc[peaks]
Alternatively, if you want to keep the original DataFrame, with the y column retaining its peak values and NaN otherwise, use:
df.y = df.y.where(df.y.iloc[peaks] == df.y.iloc[peaks])
Finally, since you seem to care about just the x values of the peaks, you might just rework the first piece in the following way:
df.iloc[peaks].x
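For a concrete picture, a small sketch with synthetic data (the column names x and y follow the question; the signal itself is made up):

import numpy as np
import pandas as pd
from scipy.signal import find_peaks

x = np.linspace(0, 10, 1000)
df = pd.DataFrame({'x': x, 'y': np.sin(5 * x)})

peaks, _ = find_peaks(df.y)

peak_rows = df.iloc[peaks]                 # only the peak rows
mask = np.isin(np.arange(len(df)), peaks)  # True at peak positions
df['y_peaks'] = df.y.where(mask)           # y at the peaks, NaN elsewhere
print(peak_rows.x)                         # x-values of the peaks

Building an explicit boolean mask over row positions avoids relying on the DataFrame index lining up with the positional indices returned by find_peaks.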

Creating array from existing array based on conditions of other arrays

itmPaths is a list with 719255 (integer) values.
Pt is a 719255x1 matrix/array with float64 values.
C is a 719255x1 matrix/array with float64 values.
I would like to find the indices where Pt > C, use those indices to extract the corresponding values from itmPaths, and store them in a new array called exPaths. I have tried the following code:
exPaths = itmPaths[index for index,value in enumerate(Pt-C) if value > 0]
In Matlab I can successfully do this using:
exPaths = itmPaths(Pt>C);
I would like to keep the code as efficient as possible. Thanks.
Using a list comprehension, you could do this. Since I don't know the exact structure of what you call a matrix, you may need to adapt it, but zipping Pt and C keeps track of both the index (to extract the value afterwards) and the values (to apply the condition):
exPaths = [itmPaths[idx] for idx, pc in enumerate(zip(Pt, C)) if pc[0] > pc[1]]
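If Pt, C and itmPaths are (or can be converted to) NumPy arrays, a vectorized boolean mask is closer to the MATLAB one-liner and avoids the Python-level loop entirely; a sketch, assuming Pt and C have shape (N, 1):

import numpy as np

itmPaths = np.asarray(itmPaths)   # make it indexable with a boolean mask
mask = (Pt > C).ravel()           # flatten the (N, 1) comparison to shape (N,)
exPaths = itmPaths[mask]          # same idea as MATLAB's itmPaths(Pt > C)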

Python - split matrix data into separate columns

I have read data from a file and stored it in a matrix (frag_coords):
frag_coords =
[[ 916.0907976 -91.01391344 120.83596334]
[ 916.01117655 -88.73389753 146.912555 ]
[ 924.22832597 -90.51682575 120.81734705]
...
[ 972.55384732 708.71316138 52.24644577]
[ 972.49089559 710.51583744 72.86369124]]
type(frag_coords) =
class 'numpy.matrixlib.defmatrix.matrix'
I do not have any issues when reordering the matrix by a specified column. For example, the code below works just fine:
order = np.argsort(frag_coords[:,2], axis=0)
My issue is that:
len(frag_coords[0]) = 1
I need to access the individual numbers of the first row. I've tried splitting it and converting it to a list, but everything returns the 3 numbers not as separate columns but as a single element with len = 1. I need help, please!
Your problem is that you're using a matrix instead of an ndarray. Are you sure you want that?
For a matrix, indexing the first row alone leads to another matrix, a row matrix. Check frag_coords[0].shape: it will be (1,3). For an ndarray, it would be (3,).
If you only need to index the first row, use two indices:
frag_coords[0,j]
Or if you store the row temporarily, just index into it as a row matrix:
tmpvar = frag_coords[0] # shape (1,3)
print(tmpvar[0,2]) # for column 2 of row 0
If you don't need too many matrix operations, I'd advise using np.array instead. You can read your data into an array directly, or at any point convert an existing matrix with np.array(frag_coords).
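A small illustration of the difference, with made-up coordinates:

import numpy as np

m = np.matrix([[916.09, -91.01, 120.84],
               [916.01, -88.73, 146.91]])
print(m[0].shape)   # (1, 3): indexing a matrix always returns another 2-D matrix
print(m[0, 2])      # 120.84: two indices give a single number

a = np.array(m)     # convert to a plain ndarray
print(a[0].shape)   # (3,): the first row is now a 1-D array
print(a[0][2])      # 120.84: element access works the usual way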

Python Numpy: Coalesce and return first nonzero observation

I am new to NumPy, but very proficient with SQL.
In SQL I used a function called coalesce, which I was disappointed not to find in NumPy. I need it to create a third array want from two arrays, array1 and array2, where zero/missing observations in array1 are replaced by the observations in array2 at the same location. I can't figure out how to use np.where for this.
Once this is accomplished, I would like to take the lower triangle of want and populate a final array want2 with the first non-zero observation of each column. If coalesce(array1, array2) returns missing or 0 for an entire column, then assign zero by default.
I have written an example demonstrating the desired behavior.
import numpy as np
array1= np.array(([-10,0,20],[-1,0,0],[0,34,-50]))
array2= np.array(([10,10,50],[10,0,25],[50,45,0]))
# Coalesce array1 and array2: take the non-zero value from array1 first, then from array2.
# If array1 is missing or zero at a position, populate want with the value from array2 at the same position.
want=np.tril(np.array(([-10,10,20],[-1,0,25],[50,34,-50])))
print(array1)
print(array2)
print(want)
# print first instance of nonzero observation from each column of table want
want2=np.array([-10,34,-50])
print(want2)
"Coalesce": use putmask to replace values equal to zero with values from array2:
want = array1.copy()
np.putmask(want, want == 0, array2)  # fill zeros in place from array2
First nonzero element of each column of np.tril(want):
where_nonzero = np.where(np.tril(want) != 0)
"""For the where array, get the indices of only
the first index for each column"""
first_indices = np.unique(where_nonzero[1], return_index=True)[1]
# Get the values from want for those indices
want2 = want[(where_nonzero[0][first_indices], where_nonzero[1][first_indices])]
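Putting the pieces together, a runnable sketch using the example arrays from the question (it assumes every column of the lower triangle has at least one nonzero entry, so the default-zero case is not handled here):

import numpy as np

array1 = np.array([[-10, 0, 20], [-1, 0, 0], [0, 34, -50]])
array2 = np.array([[10, 10, 50], [10, 0, 25], [50, 45, 0]])

# Coalesce: start from array1 and fall back to array2 wherever array1 is zero
want = array1.copy()
np.putmask(want, want == 0, array2)

# First nonzero entry in each column of the lower triangle
where_nonzero = np.where(np.tril(want) != 0)
first_indices = np.unique(where_nonzero[1], return_index=True)[1]
want2 = want[where_nonzero[0][first_indices], where_nonzero[1][first_indices]]
print(want2)  # [-10  34 -50]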

calculating means of many matrices in numpy

I have many csv files which each contain roughly identical matrices. Each matrix is 11 columns by either 5 or 6 rows. The columns are variables and the rows are test conditions. Some of the matrices do not contain data for the last test condition, which is why there are 5 rows in some matrices and six rows in other matrices.
My application is in Python 2.6 using NumPy and SciPy.
My question is this:
How can I most efficiently create a summary matrix that contains the means of each cell across all of the identical matrices?
The summary matrix would have the same structure as all of the other matrices, except that the value in each cell in the summary matrix would be the mean of the values stored in the identical cell across all of the other matrices. If one matrix does not contain data for the last test condition, I want to make sure that its contents are not treated as zeros when the averaging is done. In other words, I want the means of all the non-zero values.
Can anyone show me a brief, flexible way of organizing this code so that it does everything I want to do with as little code as possible and also remain as flexible as possible in case I want to re-use this later with other data structures?
I know how to pull in all the csv files and how to write the output. I just don't know the most efficient way to structure the flow of data in the script, including whether to use Python arrays or NumPy arrays, and how to structure the operations.
I have tried coding this in a number of different ways, but they all seem to be rather code intensive and inflexible if I later want to use this code for other data structures.
You could use masked arrays. Say N is the number of csv files. You can store all your data in a masked array A, of shape (N,11,6).
import numpy as np

A = np.ma.zeros((N, 11, 6))
A.mask = np.zeros_like(A)   # an all-False mask: nothing is masked
A.mask = (A.data == 0)      # another way of masking: mask every entry equal to zero
A.mask[0, 0, 0] = True      # mask a single value
A[1, 2, 3] = 12.            # fill a value, just like a regular array
Then the mean along the first axis, taking the masked values into account, is given by:
A.mean(axis=0)  # the returned shape is (11, 6)
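A sketch of how the missing sixth test condition could be handled: start from a fully masked array and unmask only the cells you actually fill. The matrices list and its shapes are placeholders for whatever your CSV-reading code produces, following the (N, 11, 6) layout above:

import numpy as np

# placeholder data: two files with 6 test conditions, one with only 5
matrices = [np.random.rand(11, 6), np.random.rand(11, 5), np.random.rand(11, 6)]

N = len(matrices)
A = np.ma.masked_all((N, 11, 6))      # every cell starts out masked

for i, m in enumerate(matrices):
    A[i, :, :m.shape[1]] = m          # filling a cell unmasks it

summary = A.mean(axis=0)              # shape (11, 6); masked cells are ignored, not zero
print(summary)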
