Perform element-wise operation on numpy string array

Say I have a numpy array like this: io = np.asarray(['hello world','hello Graz', 'hello all']). Its shape, io.shape, is (3,). I would like to split each element. I know this works: splituf = lambda i: np.asarray([item.split(" ",1) for item in i]). Because the real-life application will run on a much larger array, I'd like to avoid the for loop and use a vectorized operation.
Any ideas?
Many thanks

There's a collection of numpy functions that apply the Python str operations to the elements of an array:
http://docs.scipy.org/doc/numpy/reference/routines.char.html
This includes np.char.split.
In my limited experience these aren't significantly faster than a list comprehension because they still call Python functions, not fast compiled numpy code.
If the split occurs at the same point in each string, e.g. a[:5], a[5:], we might be able to do some dtype conversion.
The result will be 2d, right?
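As a minimal sketch of the np.char.split route mentioned above (the names parts and split_2d are only illustrative, and stacking into 2-D assumes every string splits into the same number of pieces):

```python
import numpy as np

io = np.asarray(['hello world', 'hello Graz', 'hello all'])

# np.char.split applies str.split to every element; maxsplit=1 splits once.
# The result is a 1-D object array whose elements are Python lists.
parts = np.char.split(io, ' ', 1)

# Every string splits into exactly two pieces here, so the lists can be
# stacked into a regular 2-D string array.
split_2d = np.array(parts.tolist())
print(split_2d.shape)
```

This is mainly a convenience: as noted above, it still calls str.split once per element, so it should not be expected to beat a list comprehension by much.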

You can use the pandas library. It is built on top of numpy and provides rich documentation and convenient operations such as pivots, graphs, element-wise operations, and many more.
Note: pandas is not a replacement for numpy.
Pandas element-wise operation
Here is one special case of an element-wise operation:
>>> sam = np.arange(15)
>>> print sam
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
>>> print pd.rolling_apply(sam, 2, lambda x: x[1] - x[0])
[ nan 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
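pd.rolling_apply was removed in later pandas releases. Assuming a recent pandas version, the same pairwise difference can probably be sketched with Series.rolling instead (diffs and result are illustrative names):

```python
import numpy as np
import pandas as pd

sam = np.arange(15)

# Rolling window of 2 elements; raw=True hands each window to the
# function as a plain ndarray, so x[0] and x[1] index by position.
diffs = pd.Series(sam).rolling(2).apply(lambda x: x[1] - x[0], raw=True)

# Back to a NumPy array: NaN for the first (incomplete) window, then 1.0s.
result = diffs.to_numpy()
print(result)
```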

You can use join and re.split. Note that this returns one flat list of words rather than one pair per element:
import numpy as np
import re
io = np.asarray(['hello world','hello Graz', 'hello all'])
print(re.split('[ -]', '-'.join(io)))

Related

How can I improve the efficiency of my algorithm, while I use two loops inside?

Dear experienced friends, I proposed a method to solve an algorithm problem. However, I found that my method becomes very time-consuming when the data size grows. Is there a better way to solve this problem? Is it possible to use matrix manipulation?
The question:
Suppose we have 1 score matrix and 3 value matrices.
Each of them is a square matrix of the same size (N*N).
An element of the score matrix is the weight between two entities. For example, S12 is the score between entity 1 and entity 2. (Weights are only meaningful when greater than 0.)
An element of a value matrix is the value between two entities. For example, V12 is the value between entity 1 and entity 2. Since we have 3 value matrices, we have 3 different V12.
The target: I want to multiply the values by the corresponding weights, so that I can finally output an (Nx3) matrix.
My solution: I solved this problem as follows. However, I use two for-loops here, which makes my program very time-consuming (e.g. when N is big or 3 becomes 100). Is there any way to improve this code? Any suggestions or hints would be much appreciated. Thank you in advance!
# generate sample data
import numpy as np
score_mat = np.random.randint(low=0, high=4, size=(2,2))
value_mat = np.random.randn(3,2,2)
# solve problem
# init the output info
output = np.zeros((2, 3))
# update the output info
for entity_1 in range(2):
    # consider meaningful score
    entity_others_list = np.where(score_mat[entity_1,:]>0)[0].tolist()
    # iterate every other entity
    for entity_2 in entity_others_list:
        vec = value_mat[:,entity_1,entity_2].copy()
        vec *= score_mat[entity_1,entity_2]
        output[entity_1] += vec

You don't need to iterate manually: just multiply score_mat by value_mat, then call sum on axis=2, and call sum again on axis=1.
As you mentioned, the score only makes sense if it is greater than zero. If that's the case, you can first replace non-positive values by 1, since multiplying by 1 leaves a value unchanged:
>>> score_mat[score_mat<=0] = 1
>>> (score_mat*value_mat).sum(axis=2).sum(axis=1)
array([-0.58826032, -3.08093186, 10.47858256])
Break-down:
# This is what the randomly generated numpy arrays look like:
>>> score_mat
array([[3, 3],
[1, 3]])
>>> value_mat
array([[[ 0.81935985, 0.92228075],
[ 1.07754964, -2.29691059]],
[[ 0.12355602, -0.36182607],
[ 0.49918847, -0.95510339]],
[[ 2.43514089, 1.17296263],
[-0.81233976, 0.15553725]]])
# When you multiply the matrices, each inner matrix in value_mat is multiplied
# element-wise by score_mat
>>> score_mat*value_mat
array([[[ 2.45807955, 2.76684225],
[ 1.07754964, -6.89073177]],
[[ 0.37066806, -1.08547821],
[ 0.49918847, -2.86531018]],
[[ 7.30542266, 3.51888789],
[-0.81233976, 0.46661176]]])
# Now calling sum on axis=2 gives the sum of each row in the inner-most matrices
>>> (score_mat*value_mat).sum(axis=2)
array([[ 5.22492181, -5.81318213],
[-0.71481015, -2.36612171],
[10.82431055, -0.34572799]])
# Finally, calling sum on axis=1 sums the row values again
>>> (score_mat*value_mat).sum(axis=2).sum(axis=1)
array([-0.58826032, -3.08093186, 10.47858256])
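For comparison, the double loop in the question skips non-positive scores (which amounts to giving them weight 0) and keeps one output row per entity. A minimal sketch that reproduces that loop with broadcasting might look like the following. The sizes, the seeded generator, the inclusion of negative scores, and the names weights and expected are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_values = 4, 3
score_mat = rng.integers(low=-1, high=4, size=(n_entities, n_entities))
value_mat = rng.standard_normal((n_values, n_entities, n_entities))

# Reference: the double loop from the question.
expected = np.zeros((n_entities, n_values))
for entity_1 in range(n_entities):
    for entity_2 in np.where(score_mat[entity_1, :] > 0)[0]:
        expected[entity_1] += value_mat[:, entity_1, entity_2] * score_mat[entity_1, entity_2]

# Vectorized: zero out the non-positive weights, broadcast over the value
# matrices, sum over entity_2 (the last axis), then transpose to (N, 3).
weights = np.where(score_mat > 0, score_mat, 0)
output = (weights * value_mat).sum(axis=2).T

print(np.allclose(output, expected))
```

Summing over only the last axis keeps the per-entity rows, so the result has shape (N, 3) as the question asks.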

How to multiply a number with negative power in python

When I try to multiply this by a negative integer it just returns an error
I use:
A = np.array([[1,2,0], [2,4,-2], [0,-2,3]])

From the screenshot, I can see this is homework.
So it asks for the matrix inverse. In maths this is written as A^(-1)
import numpy as np
A = np.array([[1,2,0], [2,4,-2], [0,-2,3]])
np.linalg.inv(A)
array([[-2. , 1.5 , 1. ],
[ 1.5 , -0.75, -0.5 ],
[ 1. , -0.5 , 0. ]])

In numpy, you cannot raise integers to negative integer powers.
In plain Python, the ** operator returns the value without any error:
In [6]: A = 20
In [7]: print(A ** -1)
0.05
You can also use pow(),
In [1]: A = 20
In [2]: pow(20, -1)
Out[2]: 0.05
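To make the numpy side concrete, here is a small sketch using the matrix from the question. It shows the element-wise error on an integer array, the element-wise reciprocal on a float array, and np.linalg.matrix_power, which treats a negative exponent as a matrix inverse. The names raised, reciprocal and A_inv are illustrative:

```python
import numpy as np

A = np.array([[1, 2, 0], [2, 4, -2], [0, -2, 3]])

# Element-wise power on an integer array with a negative exponent fails.
try:
    A ** -1
    raised = False
except ValueError:
    raised = True

# With a float dtype the element-wise reciprocal is computed (zeros give inf).
with np.errstate(divide='ignore'):
    reciprocal = A.astype(float) ** -1

# The matrix inverse is a different operation; a negative exponent in
# np.linalg.matrix_power inverts the matrix first.
A_inv = np.linalg.matrix_power(A, -1)
```

Note that the element-wise reciprocal and the matrix inverse are different results; which one is wanted depends on what A^(-1) means in the exercise.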

If you're working with matrices, one option is to make them instances of the numpy.matrix type rather than the more generic numpy.ndarray (note that the current NumPy documentation no longer recommends numpy.matrix for new code).
import numpy as np
M = np.matrix([[ ... ]])
To convert an existing generic array to a matrix, you can also pass it into np.asmatrix().
Once you have a matrix instance M, one way to get the inverse is M.I.
To avoid the "integers not allowed" problem, ensure that the dtype of your matrix is floating-point, not integer (specify dtype=float in the call to matrix() or asmatrix()).

To use a negative power, assign the negative value to a separate variable (named "pow" here; note that this name shadows Python's built-in pow()).
Now put the following in your code:
pow = -3
value = 5**pow
print(value)
Execute the code and you will see the result.
Hope it helps... 🤗🤗🤗

Converting object of type dtype='<U77' into numpy array

I have an object of dtype='<U77', consisting of a string of numbers separated by spaces:
array('[ 0.20988965 0.05172284 -0.13468404 ... 2.06070718 -0.6160391\n 3. ]',
dtype='<U77')
How can I convert it into a numpy array?

Even if you wanted to do some kludgy string parsing to try to fix this object, you can't. You've already lost almost all of the original data, and there's no way to get it back just by looking at the string.
See that ... in the middle? That's what happens when you print an array large enough to trigger summarization:
>>> print(numpy.arange(1001))
[ 0 1 2 ... 998 999 1000]
It looks like you printed a large array and then called array on the resulting string. NumPy isn't designed for print to be reversible, and even in the cases where it is reversible, calling array on the printed output isn't how you'd reverse it.
You need to redo the computation that originally produced the array, and pick a better way to save the result, like numpy.save.
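A minimal sketch of the difference, using an in-memory buffer as a stand-in for a file on disk (the array contents and the names original, printed and restored are made up for illustration):

```python
import io
import numpy as np

original = np.arange(1001) * 0.5

# str()/print() summarizes large arrays, so the text no longer holds the data.
printed = str(original)

# np.save / np.load store and restore the exact values and dtype.
buffer = io.BytesIO()  # stands in for a file on disk
np.save(buffer, original)
buffer.seek(0)
restored = np.load(buffer)
```

In real code the buffer would be a filename such as 'result.npy' (a hypothetical name), passed to both np.save and np.load.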

So here is a quick solution:
Save the original data string with np.savetxt('filename', data_string); when loading it back you get something like the following:
array('[ 0.119871 -0.50688947 0.27891722 0.58804999 -2.03537473 0.63659631\n 1.2 -0.83374409 -1.04955507 -0.6538087 -0.05 -0.23323881\n 1.2 3. 1.2 ]', dtype='<U183')
Use np.fromstring(c1[1:-2], dtype=float, sep=' ') as a converter, where c1 is the loaded string. This returns a numpy array like:
array([ 0.119871 , -0.50688947, 0.27891722, 0.58804999, -2.03537473,
0.63659631, 1.2 , -0.83374409, -1.04955507, -0.6538087 ,
-0.05 , -0.23323881, 1.2 , 3. , 1.2 ])
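As an alternative sketch that relies only on plain string splitting rather than np.fromstring, and that only works when the printed string is complete (no "..." summarization), something like this could be used. The sample text and the names text and values are hypothetical:

```python
import numpy as np

# Hypothetical complete (non-summarized) printed array, newline included.
text = '[ 0.119871 -0.50688947 0.27891722\n 1.2 3. ]'

# Drop the surrounding brackets, split on any whitespace, convert to float.
values = np.array(text.strip('[]').split(), dtype=float)
```

The values are only as precise as the digits that were printed, so this recovers the displayed numbers, not necessarily the original ones.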

How can I apply the assignment operator correctly in Python?

I have to do some math operations (e.g., add, multiply) on a large array.
To prevent any 'MemoryError', I am doing my computations as suggested in the answer from this thread.
However, I am running into some trouble while applying the assignment operations as suggested in the thread. I will demonstrate my problem using a small 3x3 array.
I have the following input array K:
array([[ 0. , 0.51290339, 0.24675368],
[ 0.51290339, 0. , 0.29440921],
[ 0.24675368, 0.29440921, 0. ]])
I want to apply the following computation to the input array K:
output = K* (1.5 - 0.5 * K* K)
I apply the above equation to compute the desired output as follows in Python:
K*= (1.5+np.dot(np.dot(-0.5,K),K))
However, the output answer is not correct.
My desired answer should be:
0.0000000 0.7018904 0.3626184
0.7018904 0.0000000 0.4288546
0.3626184 0.4288546 0.0000000
Any help is welcome.

The difference arises because dot computes the dot product whereas * computes the element-wise product. Try using
K *= 1.5 - 0.5 * K * K
instead.
Addition
Unfortunately, that does not yet solve the memory problems. I would recommend using cython to compute the desired function without allocating extra memory.
# This cython function must be compiled
def evaluate_function_inplace(double[:] values):
    cdef int i
    for i in range(values.shape[0]):
        values[i] *= 1.5 - 0.5 * values[i] * values[i]
Subsequently, you can use the function like so.
K = ...
evaluate_function_inplace(K.ravel())
The K.ravel() call flattens the array without allocating new memory, provided K is contiguous (otherwise ravel returns a copy).
Of course, you can also use the above approach without resorting to cython, but the performance overhead of iterating over the elements of such a large array in python is very large.
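If compiling Cython is not an option, a plain-NumPy sketch can at least limit the extra memory to a single scratch array by doing every step in place. This is a different technique from the Cython function above, and scratch is an illustrative name:

```python
import numpy as np

K = np.array([[0.0, 0.51290339, 0.24675368],
              [0.51290339, 0.0, 0.29440921],
              [0.24675368, 0.29440921, 0.0]])

# One scratch array the size of K; every later step reuses it in place.
scratch = np.multiply(K, K)   # K * K
scratch *= -0.5               # -0.5 * K * K
scratch += 1.5                # 1.5 - 0.5 * K * K
K *= scratch                  # K * (1.5 - 0.5 * K * K), written back into K
```

Compared with K *= 1.5 - 0.5 * K * K, which builds several temporaries, this keeps peak usage at roughly twice the size of K.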

Your problem is that you're actually performing matrix multiplications.
In your case what you want is the following:
K = (np.dot(-0.5,K) * K + 1.5) * K

Try this
K*= (1.5+np.multiply(np.multiply(-0.5,K),K))
It gives output
array([[ 0. , 0.70189037, 0.36261843],
[ 0.70189037, 0. , 0.42885459],
[ 0.36261843, 0.42885459, 0. ]])

Division by zero in numpy (sub)arrays

I have three arrays that are processed with a mathematical function to get a final result array. Some of the arrays contain NaNs and some contain 0. A division by zero raises a warning, and a calculation with NaN gives NaN. So I'd like to apply different operations to the parts of the arrays where zeros are involved:
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
1.0*n*numpy.exp(r*(1-(n/k)))
E.g. in cases where k == 0, I'd like to get 0 as the result. In all other cases I'd like to calculate the function above. So what is the way to do such calculations on parts of the array (via indexing) to get a single final result array?

import numpy
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
indxZeros=numpy.where(k==0)
indxNonZeros=numpy.where(k!=0)
d=numpy.empty(k.shape)
d[indxZeros]=0
d[indxNonZeros]=n[indxNonZeros]/k[indxNonZeros]
print(d)

Is the following what you need?
>>> rv = 1.0*n*numpy.exp(r*(1-(n/k)))
>>> rv[k==0] = 0
>>> rv
array([ nan, 0., nan])

So, you may think that the solution to this problem is to use numpy.where, but the following:
numpy.where(k==0, 0, 1.0*n*numpy.exp(r*(1-(n/k))))
still gives a warning, as the expression is actually evaluated for the cases where k is zero, even if those results aren't used.
If this really bothers you, you can use numexpr for this expression, which will actually branch on the where statement and not evaluate the k==0 case:
import numexpr
numexpr.evaluate('where(k==0, 0, 1.0*n*exp(r*(1-(n/k))))')
Another way, based on indexing as you asked for, involves a little loss in legibility
result = numpy.zeros_like(k)
good = k != 0
result[good] = 1.0*n[good]*numpy.exp(r[good]*(1-(n[good]/k[good])))
This can be bypassed somewhat by defining a gaussian function:
def gaussian(r, k, n):
    return 1.0*n*numpy.exp(r*(1-(n/k)))

result = numpy.zeros_like(k)
good = k != 0
result[good] = gaussian(r[good], k[good], n[good])
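Two further NumPy-only variants, sketched under the assumption that the inputs are float arrays as in the question: silencing the warnings locally with np.errstate, or performing the division only where k is non-zero via the where= argument of np.divide. The names result_where, ratio and result_divide are illustrative:

```python
import numpy as np

r = np.array([3.0, 3.0, 3.0])
k = np.array([np.nan, 0.0, np.nan])
n = np.array([np.nan, 0.0, 0.0])

# Option 1: evaluate everywhere, silence the warnings locally, then select.
with np.errstate(divide='ignore', invalid='ignore'):
    result_where = np.where(k == 0, 0.0, n * np.exp(r * (1 - n / k)))

# Option 2: only divide where k is non-zero; other slots keep the 0 from `out`.
ratio = np.divide(n, k, out=np.zeros_like(n), where=(k != 0))
result_divide = np.where(k == 0, 0.0, n * np.exp(r * (1 - ratio)))
```

Both give 0 where k == 0 and NaN where an input is NaN. Option 1 still computes the unused values; it only hides the warning.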
