Efficient replacement of rows or columns of 2d arrays - python

I need to repeatedly replace parts of the array, and I hope there is an efficient way to avoid loops, because I found that 2D array slicing does not support write operations. So I constructed a simple function to achieve this goal.
import numpy as np

a = np.random.rand(4,4)
b = np.random.rand(4)
c = [1,1,1,1]

def ravel_index(a, b, row_index, col_index, order='c'):
    rindex = row_index * a.shape[1] + col_index
    lindex = rindex + b.ravel().shape[0]
    return rindex, lindex

f, l = ravel_index(a, b, 1, 0)
a.ravel()[f:l] = c
print(a)
>>>[[ 0.013631517 0.81654666 0.96975073 0.832641632]
[ 1. 1. 1. 1. ]
[ 0.092047737 0.149801674 0.322049501 0.162026284]
[ 0.490197753 0.54935894 0.527087062 0.126544099]]
It looks ideal now, but when trying to write in the column direction..
f,l = ravel_index(a,b,1,0)
a.ravel('F')[f:l]=c
print (a)
>>>[[ 0.306372691 0.586445896 0.052487946 0.864993735]
[ 0.873470159 0.762572666 0.986864265 0.803903923]
[ 0.000208709 0.579103322 0.811386673 0.196167481]
[ 0.928682626 0.707539068 0.752064295 0.564061717]]
Obviously the array was copied. I don't know how to solve this problem; I hope to get help. Thank you.

The documentation for numpy.ravel mentions that
A copy is made only if needed
which means that ravel('F') won't be the solution you're looking for. If you assume instead that everything is in order 'C', you can modify your ravel_index function:
def ravel_index(a, c, row_index, col_index, order='c'):
    rindex = row_index * a.shape[1] + col_index
    if order == 'c':
        lindex = rindex + np.ravel(c).shape[0]
        return range(rindex, lindex)
    elif order == 'f':
        lindex = rindex + a.shape[1] * np.ravel(c).shape[0]
        return (None, range(rindex, lindex, a.shape[1]))
E.g.
>>> a = np.random.rand(4,4)
>>> c = [1,1,1,1]
>>> ravel_slice = ravel_index(a, c, 0, 1, order='f')
>>> a.ravel()[ravel_slice] = c
>>> print(a)
[[0.56152208 1. 0.76850125 0.90981706]
[0.00753469 1. 0.33609404 0.01321701]
[0.36101786 1. 0.36610868 0.77170151]
[0.64812018 1. 0.33486985 0.58649772]]
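As a side note, basic slices of a numpy array are writable views, so a single row or column can also be replaced directly, without going through ravel at all. A minimal sketch of that, plus the view/copy difference the answer above relies on:

import numpy as np

a = np.random.rand(4, 4)
c = [1, 1, 1, 1]

a[1, :] = c   # replace row 1 in place
a[:, 1] = c   # replace column 1 in place

# ravel() on a C-contiguous array returns a view, while ravel('F') must copy:
print(np.shares_memory(a, a.ravel()))     # True
print(np.shares_memory(a, a.ravel('F')))  # False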

Related

How can I make my code simpler and still get the same output?

So I had to write code to find all the diagonal strings in a grid called mystery:
mystery = [["r","a","w","b","i","t"],
["x","a","y","z","c","h"],
["p","q","b","e","i","e"],
["t","r","s","b","o","g"],
["u","w","x","v","i","t"],
["n","m","r","w","o","t"]]
And here is what I have so far, with the help of a few experts because I'm new to this. The expert who helped me is https://stackoverflow.com/users/5237560/alain-t
def diagsDownRight(M):
    diags, pad = [], []
    while any(M):
        edge = [*next(zip(*reversed(M))), *M[0][1:]]
        M = [r[1:] for r in M[1:]]
        diags.append(pad + edge + pad)
        pad.append("")
    return [*map("".join, zip(*diags))]
While this does work, I find it hard to grasp, and I do not want to just write down code that I do not understand. So, can anyone please help make the code as basic as possible?
By "as basic as possible", I mean: picture yourself as a person who has just learnt coding for a couple of months, and please try to simplify my code as much as possible.
The easiest I could think of: pad rows so that diagonals become columns. The code:
def diagsDownRight(M):
    n = len(M)
    m = [[''] * (n-i-1) + row + [''] * i for i, row in enumerate(M)]  # pad rows
    return [''.join(col) for col in zip(*m)]
The result is the same, and IMO the approach is more intuitive
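To see how the padding works, here is a hypothetical 3x3 example (the letters are just for illustration): each row is shifted so that every down-right diagonal lines up as a column of m.

M = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]

# padded rows (m):
#   ['',  '',  'a', 'b', 'c']
#   ['',  'd', 'e', 'f', '' ]
#   ['g', 'h', 'i', '',  '' ]
print(diagsDownRight(M))   # ['g', 'dh', 'aei', 'bf', 'c']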
consider a square matrix
[
[ 1, 2, 3 ],
[ 4, 5, 6 ],
[ 7, 8, 9 ]
]
the indexes into the diagonals are as follows
d1 = [[0,0],[1,1],[2,2]]
d2 = [[0,1],[1,2]]
d3 = [[1,0],[2,1]]
d4 = [[2,0]]
d5 = [[0,2]]
to get the middle diagonal you can simply start with the indexes
for i in range(3):
    index = [i, i]
for the next diagonal we simply do the same... but offset x by 1, until we go out of bounds
for i in range(3):
    if i + 1 > 2:
        break
    index = [i, i + 1]
for the next diagonal it's the same... except we do it on the other axis
for i in range(3):
    if i + 1 > 2:
        break
    index = [i + 1, i]
for the top-rightmost (in this case...) it's the same but we add 2
for i in range(3):
    if i + 2 > 2:
        break
    index = [i, i + 2]
same for the bottommost, but using the other index
for i in range(3):
    if i + 2 > 2:
        break
    index = [i + 2, i]
I will leave it to you to extrapolate this into a working solution
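For illustration, here is one way that idea could be turned into code (my sketch, not the answerer's; it assumes a square grid like mystery from the question): walk the offsets from the bottom-left corner to the top-right corner and collect the in-bounds cells on each diagonal.

def diagsFromOffsets(M):
    n = len(M)
    result = []
    for offset in range(-(n - 1), n):          # bottom-left ... top-right
        s = ""
        for i in range(n):
            # negative offsets shift the row index, positive ones the column index
            r, c = (i - offset, i) if offset < 0 else (i, i + offset)
            if r < n and c < n:                # skip out-of-bounds cells
                s += M[r][c]
        result.append(s)
    return result

print(diagsFromOffsets(mystery))
# ['n', 'um', 'twr', 'prxw', 'xqsvo', 'rabbit', 'ayeot', 'wzig', 'bce', 'ih', 't']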
here's a simpler version:
def diagsDownRight(M):
    rows = len(M)                                # number of rows
    cols = len(M[0])                             # number of columns
    result = []                                  # result will be a list of strings
    leftSide = [(r, 0) for r in range(rows)]     # first column
    topSide = [(0, c) for c in range(1, cols)]   # first row
    for r, c in leftSide[::-1] + topSide:        # all start positions
        s = ""                                   # string on the diagonal
        while r < rows and c < cols:
            s += M[r][c]                         # accumulate characters
            r += 1                               # move down
            c += 1                               # and to the right
        result.append(s)                         # add diagonal string to result
    return result
print(diagsDownRight(mystery))
['n', 'um', 'twr', 'prxw', 'xqsvo',
'rabbit', 'ayeot', 'wzig', 'bce', 'ih', 't']
The way it works is by starting at the coordinates of the left and top positions and accumulating characters, going one place to the right and down, until going out of the matrix.
I would suggest you go with Marat's solution though. It is simple and elegant. If you print the m matrix, I'm sure you'll understand what's going on.

Assigning submatrix to matrix in numpy [duplicate]

This question already has an answer here:
Numpy getting in the way of int -> float type casting
(1 answer)
Closed 2 years ago.
I wrote some code to compute the Householder reduction to Hessenberg form:
V = []
m, n = A.shape
for i in range(m-1):
    x = A[i+1:, i]
    e1 = np.zeros(x.shape)
    e1[0] = 1
    v = sgn(x[0]) * np.linalg.norm(x) * e1 + x
    v = v / np.linalg.norm(v)
    V.append(v)
    vv = np.outer(v, v)
    print(A[i+1:, i:] - 2 * vv @ A[i+1:, i:])
    A[i+1:, i:] = A[i+1:, i:] - 2 * vv @ A[i+1:, i:]
    print(A)
    A[:, i+1:] = A[:, i+1:] - 2 * np.outer(A[:, i+1:] @ v, v)
I ran this code with A =
[[1,2,3],
[2,4,5],
[1,3,2]]
The first print statement prints
[[-2.23606798 -4.91934955 -5.36656315]
[ 0. 0.89442719 -0.4472136 ]]
Which makes sense.
While the second prints
[[ 1 2 3]
[-2 -4 -5]
[ 0 0 0]]
And this doesn't make sense.
Why do they print differently?
If this kind of assignment doesn't work, is there some other smart way?
You are trying to assign floats to an array of ints. Here is a quick fix - Change the dtype before operating on it.
A = np.array([[1, 2, 3],
              [2, 4, 5],
              [1, 3, 2]])
b = np.array([[-2.23606798, -4.91934955, -5.36656315],
              [ 0.        ,  0.89442719, -0.4472136 ]])
A = A.astype('float64')
A[1:, 0:] = b
print(A)
>>>
[[ 1.          2.          3.        ]
 [-2.23606798 -4.91934955 -5.36656315]
 [ 0.          0.89442719 -0.4472136 ]]
>>>
From the documentation regarding indexing/assignments:
Note that assignments may result in changes if assigning higher types to lower types (like floats to ints) or even exceptions (assigning complex to floats or ints):
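A quick illustration of that note (values chosen only for demonstration): assigning floats into an int array silently truncates them, whereas creating the array with a float dtype up front, as in the fix above, keeps the fractional parts through the Householder updates.

import numpy as np

B = np.array([[1, 2], [3, 4]])   # integer dtype
B[0, 0] = 2.7                    # the float is truncated on assignment
print(B[0, 0])                   # 2

A = np.array([[1, 2, 3],
              [2, 4, 5],
              [1, 3, 2]], dtype=float)   # float dtype from the start
A[0, 0] = 2.7
print(A[0, 0])                   # 2.7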

Checking for a Magic Square Python

I'm trying to make a function that checks whether or not a matrix is a magic square. I only need to check the vertical and horizontal sums (not the diagonal). Sometimes it passes and sometimes it fails. I was hoping that someone could help me solve the issue. This is my code:
def magic_square(m):
    for i in range(len(m)):
        if len(m[i]) != len(m):
            return False
    return True
And this is the one that fails; it returns False:
m = [ [1,2,3,4]
, [5,6,7,8]
, [9,10,11,12]
, [13,14,15,16]
]
print(magic_square(m) == False)
The code you provide does not check if a matrix is a magic square, it only checks if a matrix (in this case, a list of lists) is square. After this check, you need to calculate the sums of each row, each column and each diagonal (although for some reason you said you don't need those) and compare if all of them are equal.
If you are ok with using numpy then you can use
m = np.array(m)
len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1
Test cases:
m = [ [1,2,3,4]
, [5,6,7,8]
, [9,10,11,12]
, [13,14,15,16]
]
m = np.array(m)
print (len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1)
m = [ [2,7,6]
, [9,5,1]
, [4,3,8]
]
m = np.array(m)
print (len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1)
Output:
False
True
Meaning:
m.sum(axis=1) : Sum the numpy array along the rows
m.sum(axis=0) : Sum the numpy array along the columns
np.concatenate([m.sum(axis=1), m.sum(axis=0)]) : Combine both sets of sums (along rows and columns) into a single numpy array
np.unique(x) : Find the unique elements of the numpy array x
len(np.unique(np.concatenate([m.sum(axis=1), m.sum(axis=0)]))) == 1 : Check that the number of unique values among the row-wise and column-wise sums is 1, i.e. all the row sums and column sums are the same
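Wrapped into a small helper for reuse (the function name is mine, not from the answer):

import numpy as np

def magic_square_np(m):
    m = np.asarray(m)
    sums = np.concatenate([m.sum(axis=1), m.sum(axis=0)])  # row sums + column sums
    return len(np.unique(sums)) == 1

print(magic_square_np([[2, 7, 6], [9, 5, 1], [4, 3, 8]]))  # True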
This is not as clever as the numpy answer, but it works
def magic_square(m):
    # check size
    for i in range(len(m)):
        if len(m[i]) != len(m):
            return False
    # check row sums
    for r in m:
        if sum(r) != sum(m[0]):
            return False
    # check column sums
    cols = [[r[c] for r in m] for c in range(len(m[0]))]
    for c in cols:
        if sum(c) != sum(m[0]):
            return False
    return True
m = [ [1,2,3,4]
, [5,6,7,8]
, [9,10,11,12]
, [13,14,15,16]
]
print(magic_square(m)) # False
m = [ [8,11,14,1]
, [13,2,7,12]
, [3,16,9,6]
, [10,5,4,15]
]
print(magic_square(m)) # True

Numpy where conditional statement along axis 0

I have a 1D vector Zc containing n elements that are 2D arrays. I want to find the index of each 2D array that equals np.ones(Zc[i].shape).
import numpy as np

a = np.zeros((5,5))
b = np.ones((5,5))*4
c = np.ones((5,5))
d = np.ones((5,5))*2
Zc = np.stack((a,b,c,d))

for i in range(len(Zc)):
    a = np.ones(Zc[i].shape)
    b = Zc[i]
    if np.array_equal(a, b):
        print(i)
    else:
        pass
Which returns 2. The code above works and returns the correct answer, but I want to know if there is a vectorized way to achieve the same result?
Going off of hpaulj's comment:
>>> allones = (Zc == np.ones(Zc.shape[1:])).all(axis=(1,2))
>>> np.where(allones)[0][0]
2
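Putting the question's setup and this answer together into one self-contained snippet, for reference:

import numpy as np

a = np.zeros((5, 5))
b = np.ones((5, 5)) * 4
c = np.ones((5, 5))
d = np.ones((5, 5)) * 2
Zc = np.stack((a, b, c, d))

# Compare every 2D slice against all-ones in one shot, then reduce over the last two axes.
allones = (Zc == np.ones(Zc.shape[1:])).all(axis=(1, 2))
print(np.where(allones)[0])  # [2]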

Applying several functions to each row of an array

I have a numpy array which has only a few non-zero entries which can be either positive or negative. E.g. something like this:
myArray = np.array([[ 0. , 0. , 0. ],
[ 0.32, -6.79, 0. ],
[ 0. , 0. , 0. ],
[ 0. , 1.5 , 0. ],
[ 0. , 0. , -1.71]])
In the end, I would like to receive a list where each entry of this list corresponds to a row of myArray and is a cumulative product of function outputs which depend on the entries of the respective row of myArray and another list (in the example below it is called l).
The individual terms depend on the sign of the myArray entry: When it is positive, I apply "funPos", when it is negative, I apply "funNeg" and if the entry is 0, the term will be 1. So in the example array from above it would be:
output = [1*1*1 ,
funPos(0.32, l[0])*funNeg(-6.79,l[1])*1,
1*1*1,
1*funPos(1.5, l[1])*1,
1*1*funNeg(-1.71, l[2])]
I implemented this as shown below and it gives me the desired output (note: this is just a highly simplified toy example; the actual matrices are far bigger and the functions more complicated). I go through each row of the array; if the sum of the row is 0, I don't have to do any calculations and the output is just 1. If it is not equal to 0, I go through this row, check the sign of each value and apply the appropriate function.
import numpy as np

def doCalcOnArray(Array1, myList):
    output = np.ones(Array1.shape[0])  # initialize output
    for indRow, row in enumerate(Array1):
        if sum(row) != 0:  # only then calculations are needed
            tempProd = 1.  # initialize the product that corresponds to the row
            for indCol, valCol in enumerate(row):
                if valCol > 0:
                    tempVal = funPos(valCol, myList[indCol])
                elif valCol < 0:
                    tempVal = funNeg(valCol, myList[indCol])
                elif valCol == 0:
                    tempVal = 1
                tempProd = tempProd*tempVal
            output[indRow] = tempProd
    return output

def funPos(val1, val2):
    return val1*val2

def funNeg(val1, val2):
    return val1*(val2+1)

myArray = np.array([[ 0.  ,  0.  ,  0.  ],
                    [ 0.32, -6.79,  0.  ],
                    [ 0.  ,  0.  ,  0.  ],
                    [ 0.  ,  1.5 ,  0.  ],
                    [ 0.  ,  0.  , -1.71]])
l = [1.1, 2., 3.4]
op = doCalcOnArray(myArray, l)
print op
The output is
[ 1. -7.17024 1. 3. -7.524 ]
which is the desired one.
My question is whether there is a more efficient way for doing that since that is quite "expensive" for large arrays.
EDIT:
I accepted gabhijit's answer because the pure numpy solution he came up with seems to be the fastest one for the arrays I am dealing with. Please note that there is also a nice working solution from RaJa that requires pandas, and the solution from dave also works fine and can serve as a nice example of how to use generators and numpy's "apply_along_axis".
Here's what I have tried - using reduce, map. I am not sure how fast this is - but is this what you are trying to do?
Edit 4: Simplest and most readable - make l a numpy array, which then greatly simplifies the where calls.
import numpy as np
import time

l = np.array([1.0, 2.0, 3.0])

def posFunc(x, y):
    return x*y

def negFunc(x, y):
    return x*(y+1)

def myFunc(x, y):
    if x > 0:
        return posFunc(x, y)
    if x < 0:
        return negFunc(x, y)
    else:
        return 1.0

myArray = np.array([
    [ 0.,    0.,    0.],
    [ 0.32, -6.79,  0.],
    [ 0.,    0.,    0.],
    [ 0.,    1.5,   0.],
    [ 0.,    0.,   -1.71]])

t1 = time.time()
a = np.array([reduce(lambda x, (y,z): x*myFunc(z, l[y]), enumerate(x), 1) for x in myArray])
t2 = time.time()
print (t2-t1)*1000000
print a
Basically, let's just look at the last line: it says to cumulatively multiply the things in enumerate(x), starting with 1 (the last parameter to reduce). myFunc simply takes the element of myArray (from the row) and the element at the corresponding index in l and multiplies them as needed.
My output is not the same as yours - so I am not sure whether this is exactly what you want, but maybe you can follow the logic.
Also I am not so sure how fast this will be for huge arrays.
edit: Following is a 'pure numpy way' to do this.
my = myArray # just for brevity
t1 = time.time()
# First set the positive and negative values
# complicated - [my.itemset((x,y), posFunc(my.item(x,y), l[y])) for (x,y) in zip(*np.where(my > 0))]
# changed to
my = np.where(my > 0, my*l, my)
# complicated - [my.itemset((x,y), negFunc(my.item(x,y), l[y])) for (x,y) in zip(*np.where(my < 0))]
# changed to
my = np.where(my < 0, my*(l+1), my)
# print my - commented out to time it.
# Now set the zeroes to 1.0s
my = np.where(my == 0.0, 1.0, my)
# print my - commented out to time it
a = np.prod(my, axis=1)
t2 = time.time()
print (t2-t1)*1000000
print a
Let me try to explain the zip(*np.where(my != 0)) part as best as I can. np.where simply returns two numpy arrays: the first is the row indices and the second the column indices of the entries that match the condition (my != 0 in this case). We take a tuple of those indices and then use array.itemset and array.item; thankfully, the column index is available to us for free, so we can just take the element at that index in the list l. This should be faster than the previous version (and orders of magnitude more readable!). Need to timeit to find out whether it indeed is.
Edit 2: You don't have to call np.where separately for positive and negative values; it can be done with one call, np.where(my != 0).
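For reference, here is a self-contained Python 3 version of the pure-numpy approach above, using the l from the question so that the output matches the original loop implementation:

import numpy as np

l = np.array([1.1, 2.0, 3.4])
myArray = np.array([[ 0.  ,  0.  ,  0.  ],
                    [ 0.32, -6.79,  0.  ],
                    [ 0.  ,  0.  ,  0.  ],
                    [ 0.  ,  1.5 ,  0.  ],
                    [ 0.  ,  0.  , -1.71]])

my = myArray.copy()
my = np.where(my > 0, my * l, my)        # positive entries -> funPos(x, l[col])
my = np.where(my < 0, my * (l + 1), my)  # negative entries -> funNeg(x, l[col])
my = np.where(my == 0.0, 1.0, my)        # zeros contribute a factor of 1
print(np.prod(my, axis=1))               # [ 1.      -7.17024  1.       3.      -7.524  ]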
So, let's see if I understand your question.
You want to map elements of your matrix to a new matrix such that:
0 maps to 1
x>0 maps to funPos(x)
x<0 maps to funNeg(x)
You want to calculate the product of all elements in the rows this new matrix.
So, here's how I would go about doing it:
1:
def myFun(a):
    if a == 0:
        return 1
    if a > 0:
        return funPos(a)
    if a < 0:
        return funNeg(a)

newFun = np.vectorize(myFun)
newArray = newFun(myArray)
And for 2:
np.prod(newArray, axis = 1)
Edit: To pass the index to funPos, funNeg, you can probably do something like this:
# Python 2.7
r, c = myArray.shape
ctr = -1  # I don't understand why this should be -1 instead of 0

def myFun(a):
    global ctr
    global c
    ind = ctr % c
    ctr += 1
    if a == 0:
        return 1
    if a > 0:
        return funPos(a, l[ind])
    if a < 0:
        return funNeg(a, l[ind])
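(A likely reason for the -1, in case it helps: unless otypes is given, np.vectorize calls the wrapped function once on the first element just to determine the output dtype, which advances the counter one extra time before the real pass. Something like newFun = np.vectorize(myFun, otypes=[float]) should avoid that extra probe call, in which case the counter could start at 0.)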
I think this numpy function would be helpful to you
numpy.apply_along_axis
Here is one implementation. Also, I would warn against checking whether the sum of the array is 0: comparing floats to 0 can give unexpected behavior due to machine accuracy constraints, and if you have -5 and 5 the sum is zero, which I'm not sure is what you want. I used numpy's any() function to see if anything was nonzero. For simplicity I also pulled your list (my_list) into global scope.
import numpy as np

my_list = 1.1, 2., 3.4

def func_pos(val1, val2):
    return val1 * val2

def func_neg(val1, val2):
    return val1 * (val2 + 1)

def my_generator(row):
    for i, a in enumerate(row):
        if a > 0:
            yield func_pos(a, my_list[i])
        elif a < 0:
            yield func_neg(a, my_list[i])
        else:
            yield 1

def reduce_row(row):
    if not row.any():
        return 1.0
    else:
        return np.prod(np.fromiter(my_generator(row), dtype=float))

def main():
    myArray = np.array([
        [ 0.  ,  0.  ,  0.  ],
        [ 0.32, -6.79,  0.  ],
        [ 0.  ,  0.  ,  0.  ],
        [ 0.  ,  1.5 ,  0.  ],
        [ 0.  ,  0.  , -1.71]])
    return np.apply_along_axis(reduce_row, axis=1, arr=myArray)
There are probably faster implementations; I think apply_along_axis is really just a loop under the covers.
I didn't test, but I bet this is faster than what you started with, and should be more memory efficient.
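As a usage note (my addition, not part of the original answer), calling the function on the question's data should reproduce the loop version's result:

if __name__ == "__main__":
    print(main())  # expected: [ 1.      -7.17024  1.       3.      -7.524  ]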
I've tried your example with the masking function of numpy arrays. However, I couldn't find a solution to replace the values in your array by funPos or funNeg.
So my suggestion would be to try this using pandas instead as it conserves indices while masking.
See my example:
import numpy as np
import pandas as pd

def funPos(a, b):
    return a * b

def funNeg(a, b):
    return a * (b + 1)

myPosFunc = np.vectorize(funPos)  # vectorized form of funPos
myNegFunc = np.vectorize(funNeg)  # vectorized form of funNeg

# Input
I = [1.0, 2.0, 3.0]
x = pd.DataFrame([
    [ 0.,    0.,    0.],
    [ 0.32, -6.79,  0.],
    [ 0.,    0.,    0.],
    [ 0.,    1.5,   0.],
    [ 0.,    0.,   -1.71]])

b = pd.DataFrame(myPosFunc(x[x>0], I))  # calculate all positive values
c = pd.DataFrame(myNegFunc(x[x<0], I))  # calculate all negative values
b = b.combineMult(c)                    # put values of c in b
b = b.fillna(1)                         # replace all missing values that were '0' in the raw array
y = b.product()                         # multiply all elements in one row

# Output
print ('final result')
print (y)
print (y.tolist())
