I am currently working on a Python script that extracts measurement data from a text file, using the IPython Notebook and Python 2.7.
I have run into some odd behaviour when working with numpy arrays, and I have no explanation for it.
import numpy

myArray = numpy.zeros((4,3))
myArrayTransposed = myArray.transpose()
for i in range(0,4):
    for j in range(0,3):
        myArray[i][j] = i+j
print myArray
print myArrayTransposed
leads to:
[[ 0. 1. 2.]
[ 1. 2. 3.]
[ 2. 3. 4.]
[ 3. 4. 5.]]
[[ 0. 1. 2. 3.]
[ 1. 2. 3. 4.]
[ 2. 3. 4. 5.]]
So without ever writing to the transposed array, its values are updated as well.
How is this possible?
From http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html:
Different ndarrays can share the same data, so that changes made in one ndarray may be visible in another. That is, an ndarray can be a “view” to another ndarray, and the data it is referring to is taken care of by the “base” ndarray. ndarrays can also be views to memory owned by Python strings or objects implementing the buffer or array interfaces.
When you do a transpose(), this returns a "view" to the original ndarray. It points to the same memory buffer, but it has a different indexing scheme:
A segment of memory is inherently 1-dimensional, and there are many different schemes for arranging the items of an N-dimensional array in a 1-dimensional block. Numpy is flexible, and ndarray objects can accommodate any strided indexing scheme.
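You can see this directly by inspecting the strides (a quick illustration; the byte counts below assume the default 8-byte float64 dtype):

print myArray.strides            # (24, 8): row-major layout of the 4x3 array
print myArrayTransposed.strides  # (8, 24): same buffer, strides swapped
print myArrayTransposed.base is myArray  # True, so the transpose is a view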
To create an independent ndarray, use the copy() method:
myArrayTransposed = myArray.transpose().copy()
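After the copy, the two arrays no longer share a buffer, so later writes to myArray leave the copy untouched. A quick check:

myArrayTransposed = myArray.transpose().copy()
myArray[0][0] = 99
print myArrayTransposed[0][0]  # still 0.0: the copy owns its own data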
I have a file whose content is:
30373
25512
65332
33549
35390
So I want to create a 2D array (a matrix) from the file content, like this:
[[3. 0. 3. 7. 3.]
[2. 5. 5. 1. 2.]
[6. 5. 3. 3. 2.]
[3. 3. 5. 4. 9.]
[3. 5. 3. 9. 0.]]
So I tried this:
import numpy as np
print(np.loadtxt('file.txt'))
But this gave me the following:
[30373. 25512. 65332. 33549. 35390.]
which is not the answer that I expected.
There is also a parameter called delimiter in this method:
import numpy as np
print(np.loadtxt('file.txt', delimiter=''))
But this was not the expected answer either.
Can anyone help me figure out this problem?
[EDIT]
It's easy with the following code:
array = [i.split() for i in open('file.txt').read().splitlines()]
But I want to know: is it possible to do this with numpy?
coniferous was not far off, though: replacing loadtxt with genfromtxt gets you most of the way, even if that alone was not quite it.
genfromtxt's delimiter option can be either a string describing what separates two fields (e.g., ','), or an integer, which is then the width of the field (for fixed-width field formats).
So, in your case
np.genfromtxt('file.txt', delimiter=1)
does exactly what you want.
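For completeness, a minimal run against the file from the question:

import numpy as np

data = np.genfromtxt('file.txt', delimiter=1)
print(data)
# [[3. 0. 3. 7. 3.]
#  [2. 5. 5. 1. 2.]
#  [6. 5. 3. 3. 2.]
#  [3. 3. 5. 4. 9.]
#  [3. 5. 3. 9. 0.]]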
Simply read the file using plain Python into a list, typecast the characters to float as follows, and then convert to a numpy array.
import numpy as np

with open(r"text_file.txt") as f:
    data = [x.rstrip('\n').split(" ") for x in f.readlines()]

new = []
for x in data:
    for y in x:
        # Map float over the individual characters of each token.
        s = list(map(float, str(y)))
        new.append(s)

npa = np.asarray(new, dtype=np.float32)
print(npa)
This gives:
[[3. 0. 3. 7. 3.]
[2. 5. 5. 1. 2.]
[6. 5. 3. 3. 2.]
[3. 3. 5. 4. 9.]
[3. 5. 3. 9. 0.]]
Note: there may be a more functional approach, but I'm not aware of one, so I solved it step by step.
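For what it's worth, the same idea fits in a single comprehension (a sketch, assuming every line of the file contains only single digits):

import numpy as np

# Each line becomes a list of single-digit floats.
npa = np.array([list(map(float, line.strip())) for line in open('file.txt')])
print(npa)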
I've recently used both of these functions, and am looking for input from anyone who can speak to the following:
do argsort and rankdata differ fundamentally in their purpose?
are there performance advantages with one over the other? (specifically: large vs small array performance differences?)
what is the memory overhead associated with importing rankdata?
Thanks in advance.
p.s. I could not create the new tags 'argsort' or 'rankdata'. If anyone with sufficient standing feels they should be added to this question, please do.
Do argsort and rankdata differ fundamentally in their purpose?
In my opinion, they do slightly. The first gives you the positions of the data if the data was sorted, while the second the rank of the data. The difference can become apparent in the case of ties:
import numpy as np
from scipy import stats
a = np.array([ 5, 0.3, 0.4, 1, 1, 1, 3, 42])
almost_ranks = np.empty_like(a)
almost_ranks[np.argsort(a)] = np.arange(len(a))
print(almost_ranks)
print(almost_ranks+1)
print(stats.rankdata(a))
This results in (notice 3. 4. 5. vs. 4. 4. 4.):
[6. 0. 1. 2. 3. 4. 5. 7.]
[7. 1. 2. 3. 4. 5. 6. 8.]
[7. 1. 2. 4. 4. 4. 6. 8.]
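Note that rankdata also takes a method argument that controls how ties are resolved ('average' is the default; 'min', 'max', 'dense' and 'ordinal' are also available). With method='ordinal' every element gets a distinct rank, which matches the argsort-based result above (up to the order in which ties are broken):

print(stats.rankdata(a, method='ordinal'))
# [7 1 2 3 4 5 6 8] (dtype may differ between scipy versions)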
Are there performance advantages with one over the other?
(specifically: large vs small array performance differences?)
Both algorithms seem to me to have the same complexity, O(N log N). I would expect the numpy implementation to be slightly faster since it has somewhat less overhead, plus it's numpy. But you should test this yourself. Checking the code for scipy's rankdata, it currently calls np.unique among other functions, so I would guess it takes longer in practice.
What is the memory overhead associated with importing rankdata?
Well, you import scipy (if you had not done so already), so the overhead is that of importing scipy...
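If you want numbers rather than guesses, here is a rough timing sketch (the array size and repeat count are arbitrary choices, and note that argsort alone does not yet produce ranks):

import timeit
import numpy as np
from scipy import stats

a = np.random.rand(10**6)

print("argsort: ", timeit.timeit(lambda: np.argsort(a), number=10))
print("rankdata:", timeit.timeit(lambda: stats.rankdata(a), number=10))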
I have to do some math operations (e.g., add, multiply) on a large array.
To prevent a 'MemoryError', I am doing my computations as suggested in the answer from that thread.
However, I am running into some trouble while applying the assignment operations suggested there. I will demonstrate my problem using a small 3x3 array.
I have the following input array K:
array([[ 0. , 0.51290339, 0.24675368],
[ 0.51290339, 0. , 0.29440921],
[ 0.24675368, 0.29440921, 0. ]])
I want to apply the following computation to the input array K:
output = K * (1.5 - 0.5 * K * K)
I apply the above equation to compute the desired output as follows in Python:
K *= (1.5 + np.dot(np.dot(-0.5, K), K))
However, the output answer is not correct.
My desired answer should be:
0.0000000 0.7018904 0.3626184
0.7018904 0.0000000 0.4288546
0.3626184 0.4288546 0.0000000
Any help is welcome.
The difference arises because dot computes the dot product whereas * computes the element-wise product. Try using
K *= 1.5 - 0.5 * K * K
instead.
Addition
Unfortunately, that does not yet solve the memory problems. I would recommend using cython to compute the desired function without allocating extra memory.
# This cython function must be compiled
def evaluate_function_inplace(double[:] values):
    cdef int i
    for i in range(values.shape[0]):
        values[i] *= 1.5 - 0.5 * values[i] * values[i]
Subsequently, you can use the function like so.
K = ...
evaluate_function_inplace(K.ravel())
The K.ravel() call will flatten the array but will not allocate new memory.
Of course, you can also use the above approach without resorting to cython, but the performance overhead of iterating over the elements of such a large array in pure Python is very large.
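If cython is not an option, a chunked pure-numpy update is a possible middle ground: it keeps the temporaries small at the cost of a Python-level loop over blocks. A sketch, assuming K is contiguous so that ravel() returns a view (the chunk size is an arbitrary choice):

import numpy as np

def evaluate_inplace_chunked(values, chunk_size=2**20):
    flat = values.ravel()  # a view for contiguous arrays, so writes hit `values`
    for start in range(0, flat.size, chunk_size):
        chunk = flat[start:start + chunk_size]
        chunk *= 1.5 - 0.5 * chunk * chunk  # temporaries are only chunk-sized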
Your problem is that you're actually performing matrix multiplications.
In your case, what you want is the following:
K = (np.dot(-0.5, K) * K + 1.5) * K
Try this:
K *= (1.5 + np.multiply(np.multiply(-0.5, K), K))
It gives the output:
array([[ 0. , 0.70189037, 0.36261843],
[ 0.70189037, 0. , 0.42885459],
[ 0.36261843, 0.42885459, 0. ]])
Say I have a numpy array like this: io = np.asarray(['hello world','hello Graz', 'hello all']). Its shape is io.shape == (3,). I would like to perform a split on each element. I know this works: splituf = lambda i: np.asarray([item.split(" ",1) for item in i]). Since the real-life application will be on a much larger array, I'd like to avoid the for loop and use a vectorized operation.
Any ideas?
Many thanks
There's a collection of numpy functions that applies the Python str operations to elements of an array:
http://docs.scipy.org/doc/numpy/reference/routines.char.html
This includes np.char.split.
In my limited experience these aren't significantly faster than a list comprehension, because they still call Python string functions rather than fast compiled numpy code.
If the split occurs at the same point in each string (e.g. a[:5], a[5:]), we might be able to do some dtype conversion.
The result will be 2d, right?
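For example (a quick sketch): np.char.split returns an object array of lists, which you can stack into a 2d array when every string splits into the same number of pieces.

import numpy as np

io = np.asarray(['hello world', 'hello Graz', 'hello all'])
out = np.char.split(io)  # splits on whitespace by default
print(out)
# [list(['hello', 'world']) list(['hello', 'Graz']) list(['hello', 'all'])]
print(np.vstack(out))    # a (3, 2) array of strings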
You can use the pandas library. It is built on numpy and provides rich documentation and useful operations like pivoting, graphing, element-wise operations, and lots more.
Note: pandas is not a replacement for numpy.
Pandas element-wise operations
Here is one special case of an element-wise operation:
>>> import numpy as np
>>> import pandas as pd
>>> sam = np.arange(15)
>>> print sam
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
>>> print pd.rolling_apply(sam, 2, lambda x: x[1] - x[0])
[ nan 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
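(In later pandas releases rolling_apply was removed; the rough equivalent in the newer API would be pd.Series(sam).rolling(2).apply(lambda x: x[1] - x[0], raw=True).)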
You can use join and re.split:
import numpy as np
import re
io = np.asarray(['hello world','hello Graz', 'hello all'])
print(re.split('[ -]', '-'.join(io)))
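This prints one flat list rather than a 2d array:
['hello', 'world', 'hello', 'Graz', 'hello', 'all']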
I have a set of tasks I have to complete. Please help me, I'm stuck on the multiplication one :(
1. np.array([0,5,10]) will create an array of integers starting at 0, finishing at 10, with step 5. Use a different command to create the same array automatically.
array_a = np.linspace(0,10,5)
print array_a
Is this correct? Also, what is meant by "automatically"?
2. Create (automatically, not using np.array!) another array that contains 3 equally-spaced floating point numbers starting at 2.5 and finishing at 3.5.
array_b = np.linspace(2.5, 3.5, 3)
print array_b
3. Use the multiplication operator * to multiply the two arrays together.
How do I multiply them? I get an error that they aren't the same shape, so do I need to slice array_a?
The answer to the first problem is wrong; the task asks you to create an array with elements [0, 5, 10]. When I run your code it prints [ 0. 2.5 5. 7.5 10. ] instead. I don't want to give the answer away completely (it is homework, after all), but try looking up the docs for the arange function. You can solve #1 with either linspace or arange (you'll have to tweak the parameters either way), but I think arange is better suited to the specific wording of the question.
Once you've got #1 returning the correct result, the error in #3 should go away because the arrays will both have length 3 (i.e. they'll have the same shape).