I was unable to find anything describing how to do this, which leads me to believe I'm not doing this in the proper idiomatic Python way. Advice on the 'proper' Python way to do this would also be appreciated.
I have a bunch of variables for a datalogger I'm writing (arbitrary logging length, with a known maximum length). In MATLAB, I would initialize them all as 1-D arrays of zeros of length n, n bigger than the number of entries I would ever see, assign each individual element variable(measurement_no) = data_point in the logging loop, and trim off the extraneous zeros when the measurement was over. The initialization would look like this:
[dData gData cTotalEnergy cResFinal etc] = deal(zeros(n,1));
Is there a way to do this in Python/NumPy so I don't either have to put each variable on its own line:
dData = np.zeros(n)
gData = np.zeros(n)
etc.
I would also prefer not to just make one big matrix, because keeping track of which column is which variable is unpleasant. Perhaps the solution is to make a (length x numvars) matrix and assign the column slices out to individual variables?
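For reference, the column-slice idea might be sketched like this (the variable names are the question's own; the buffer layout and the use of `.T` unpacking are assumptions, not an established idiom):

```python
import numpy as np

n = 100
buf = np.zeros((n, 3))                 # one column per logged variable
dData, gData, cTotalEnergy = buf.T     # rows of buf.T are views of buf's columns
dData[0] = 1.5                         # writes through to buf[0, 0]
```

Each unpacked name is a view, so writing through it updates the shared buffer, and trimming at the end of the run is a single slice like `buf[:measurement_no]`.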
EDIT: Assume I'm going to have a lot of vectors of the same length by the time this is over; e.g., my post-processing takes each log file, calculates a bunch of separate metrics (>50), stores them, and repeats until the logs are all processed. Then I generate histograms, means/maxes/sigmas/etc. for all the various metrics I computed. Since initializing 50+ vectors is clearly not easy in Python, what's the best (cleanest code and decent performance) way of doing this?
If you're really motivated to do this in a one-liner you could create an (n_vars, ...) array of zeros, then unpack it along the first dimension:
a, b, c = np.zeros((3, 5))
print(a is b)
# False
Another option is to use a list comprehension or a generator expression:
a, b, c = [np.zeros(5) for _ in range(3)] # list comprehension
d, e, f = (np.zeros(5) for _ in range(3)) # generator expression
print(a is b, d is e)
# False False
Be careful, though! You might think that using the * operator on a list or tuple containing your call to np.zeros() would achieve the same thing, but it doesn't:
h, i, j = (np.zeros(5),) * 3
print(h is i)
# True
This is because the expression inside the tuple gets evaluated first. np.zeros(5) therefore only gets called once, and each element in the repeated tuple ends up being a reference to the same array. This is the same reason why you can't just use a = b = c = np.zeros(5).
Unless you really need to assign a large number of empty array variables and you really care deeply about making your code compact (!), I would recommend initialising them on separate lines for readability.
Nothing wrong or un-Pythonic with
dData = np.zeros(n)
gData = np.zeros(n)
etc.
You could put them on one line, but there's no particular reason to do so.
dData, gData = np.zeros(n), np.zeros(n)
Don't try dData = gData = np.zeros(n), because a change to dData changes gData (they point to the same object). For the same reason you usually don't want to use x = y = [].
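A quick demonstration of that aliasing pitfall:

```python
import numpy as np

dData = gData = np.zeros(3)   # both names bind to the SAME array object
dData[0] = 7.0
print(gData[0])               # 7.0 -- the "other" array changed too
```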
The deal in MATLAB is a convenience, but isn't magical. Here's how Octave implements it
function [varargout] = deal (varargin)
  if (nargin == 0)
    print_usage ();
  elseif (nargin == 1 || nargin == nargout)
    varargout(1:nargout) = varargin;
  else
    error ("deal: nargin > 1 and nargin != nargout");
  endif
endfunction
In contrast to Python, in Octave (and presumably MATLAB)
one=two=three=zeros(1,3)
assigns different objects to the 3 variables.
Notice also how MATLAB talks about deal as a way of assigning contents of cells and structure arrays. http://www.mathworks.com/company/newsletters/articles/whats-the-big-deal.html
If you put your data in a collections.defaultdict you won't need to do any explicit initialization. Everything will be initialized the first time it is used.
import numpy as np
import collections
n = 100
data = collections.defaultdict(lambda: np.zeros(n))
for i in range(1, n):
    data['g'][i] = data['d'][i - 1]
    # ...
How about using map:
import numpy as np
n = 10 # Number of data points per array
m = 3 # Number of arrays being initialised
gData, pData, qData = map(np.zeros, [n] * m)
Python has a built in functionality for checking the validity of entire slices: slice.indices. Is there something similar that is built-in for individual indices?
Specifically, I have an index, say a = -2 that I wish to normalize with respect to a 4-element list. Is there a method that is equivalent to the following already built in?
def check_index(index, length):
    if index < 0:
        index += length
    if index < 0 or index >= length:
        raise IndexError(...)
My end result is to be able to construct a tuple with a single non-None element. I am currently using list.__getitem__ to do the check for me, but it seems a little awkward/overkill:
items = [None] * 4
items[a] = 'item'
items = tuple(items)
I would like to be able to do
a = check_index(a, 4)
items = tuple('item' if i == a else None for i in range(4))
Everything in this example is pretty negotiable. The only things that are fixed are that I am getting a in a way that can have all of the problems that an arbitrary index can have, and that the final result has to be a tuple.
I would be more than happy if the solution used numpy and only really applied to numpy arrays instead of Python sequences. Either one would be perfect for the application I have in mind.
If I understand correctly, you can use range(length)[index], in your example range(4)[-2]. This properly handles negative and out-of-bounds indices. At least in recent versions of Python, range() doesn't literally create a full list so this will have decent performance even for large arguments.
If you have a large number of indices to do this with in parallel, you might get better performance doing the calculation with Numpy vectorized arithmetic, but I don't think the technique with range will work in that case. You'd have to manually do the calculation using the implementation in your question.
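For example, both approaches side by side (the vectorized version is just the question's own normalization arithmetic applied elementwise; it is a sketch, not a library function):

```python
import numpy as np

# Built-in: range() performs the negative-index and bounds handling for you.
print(range(4)[-2])            # 2
try:
    range(4)[-5]
except IndexError:
    print("out of bounds")     # raised, just like list indexing

# Vectorized sketch for many indices at once:
idx = np.array([-2, 0, 3, -1])
length = 4
norm = np.where(idx < 0, idx + length, idx)
if ((norm < 0) | (norm >= length)).any():
    raise IndexError("index out of bounds")
print(norm)                    # [2 0 3 3]
```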
There is a function called numpy.core.multiarray.normalize_axis_index which does exactly what I need. It is particularly useful to me because the implementation I had in mind was for numpy array indexing:
>>> from numpy.core.multiarray import normalize_axis_index
>>> normalize_axis_index(3, 4)
3
>>> normalize_axis_index(-3, 4)
1
>>> normalize_axis_index(-5, 4)
Traceback (most recent call last):
...
numpy.core._internal.AxisError: axis -5 is out of bounds for array of dimension 4
The function was added in version 1.13.0. The source for this function is available here, and the documentation source is here.
This is a pretty simple question. I've written a lot in case people find this who are a few hours behind me on the 'WHY WON'T THIS WORK?' train, to help them out.
In Matlab, the following code would create a dynamically increasing-in-size array:
for i = 1:5
    array(i) = i*2;
end
but I am having some problems with the complexities in Python. The following does work but isn't exactly what I want:
array = []
for i in range(1, 5):
    array.append(i*2)
This works; however, you can only append to the end. You can also assign values to a range of cells that already exist, beyond simply 'sticking them on the end' (i.e. the code below, which could replace cells 14-36 in a 100-cell-long list):
array = list(range(1, 100))  # creates list from 1 to 99
for i in range(14, 36):
    array[i] = i*2  # assign and overwrite previous cell values
Is there some catch-all coding method here that combines the two? A solution to the following code:
array = list(range(1, 50))  # creates list from 1 to 49
for i in range(34, 66):
    array[i] = i*2
Error message:
IndexError: list assignment index out of range
General differences I've seen so far:
Python starts at cell number [0] not [1]
You can't dynamically update list sizes and so you need to use the append function
(Possibly?) need to preallocate the lists before using them
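For what it's worth, one way to get the catch-all behaviour the question asks for is a small helper that grows the list on demand before assigning (`set_item` is a hypothetical name for illustration, not a built-in):

```python
def set_item(lst, i, value, fill=0):
    """Assign lst[i] = value, padding with `fill` if the list is too short."""
    if i >= len(lst):
        lst.extend([fill] * (i + 1 - len(lst)))
    lst[i] = value

array = []
for i in range(34, 66):
    set_item(array, i, i * 2)
print(len(array))   # 66 -- indices 0-33 were padded with 0
```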
Note for others struggling:
One error that consistently came up was:
TypeError: 'builtin_function_or_method' object does not support item assignment
This was due to trying to assign a value like this:
array.append[i-1] = i*2
See higher up for the correct method.
Misc.
Thanks for any help! I'm sure this is really simple but I have run out of ideas and can't find a solution in previous questions!
Other similar questions that either didn't solve it or I didn't understand:
"TypeError: 'function' object does not support item assignment"
Python Array is read-only, can't append values
http://www.pythonbackend.com/topic/1329787069?reply=3
You're going to need to familiarize yourself with numpy as a minimum if you're going to get close to Matlab-like functionality in python. The following is a useful reference:
https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html
I suggest you try to avoid dynamically increasing the size of your list altogether (in MATLAB as well as python) unless you have a good reason to do so. Maybe you could check the desired (final) size of your list elsewhere in the code, before allocation.
So I would go along the same lines as one of your suggestions, with the difference that you can allocate an empty list like so:
arr = [None] * 100
for i in range(100):
    arr[i] = i*2
Or, as John Greenall suggested, use numpy:
import numpy as np
arr = np.empty(100)
for i in range(100):
    arr[i] = i*2
I'm wondering what the most efficient way is to replace elements in an array with other random elements from the array, given some criteria. More specifically, I need to replace each element which doesn't meet a given criterion with another random value from that row. For example, I want to replace each outlier in data(row) with a random cell from the same row whose value is between -.8 and .8. My inefficient solution looks something like this:
import numpy as np
import random as r  # needed for r.randint below

data = np.random.normal(0, 1, (10, 100))
for index, row in enumerate(data):
    row_copy = np.copy(row)
    outliers = np.logical_or(row > .8, row < -.8)
    for prob in np.where(outliers == 1)[0]:
        fixed = 0
        while fixed == 0:
            random_other_value = r.randint(0, 99)
            if random_other_value in np.where(outliers == 1)[0]:
                fixed = 0
            else:
                row_copy[prob] = row[random_other_value]
                fixed = 1
Obviously, this is not efficient.
I think it would be faster to pull out all the good values, then use random.choice() to pick one whenever you need it. Something like this:
import numpy as np
import random
from itertools import izip

data = np.random.normal(0, 1, (10, 100))
for row in data:
    good_ones = np.logical_and(row >= -0.8, row <= 0.8)
    good = row[good_ones]
    row_copy = np.array([x if f else random.choice(good) for f, x in izip(good_ones, row)])
High-level Python code that you write is slower than the C internals of Python. If you can push work down into the C internals it is usually faster. In other words, try to let Python do the heavy lifting for you rather than writing a lot of code. It's zen... write less code to get faster code.
I added a loop to run your code 1000 times, and to run my code 1000 times, and measured how long they took to execute. According to my test, my code is ten times faster.
Additional explanation of what this code is doing:
row_copy is being set by building a new list, and then calling np.array() on the new list to convert it to a NumPy array object. The new list is being built by a list comprehension.
The new list is made according to the rule: if the number is good, keep it; else, take a random choice from among the good values.
A list comprehension walks over a sequence of values, but to apply this rule we need two values: the number, and the flag saying whether that number is good or not. The easiest and fastest way to make a list comprehension walk along two sequences at once is to use izip() to "zip" the two sequences together. izip() will yield up tuples, one at a time, where the tuple is (f, x); f in this case is the flag saying good or not, and x is the number. (Python has a built-in feature called zip() which does pretty much the same thing, but actually builds a list of tuples; izip() just makes an iterator that yields up tuple values. But you can play with zip() at a Python prompt to learn more about how it works.)
In Python we can unpack a tuple into variable names like so:
a, b = (2, 3)
In this example, we set a to 2 and b to 3. In the list comprehension we unpack the tuples from izip() into variables f and x.
Then the heart of the list comprehension is a "ternary if" statement like so:
a if flag else b
The above will return the value a if the flag value is true, and otherwise return b. The one in this list comprehension is:
x if f else random.choice(good)
This implements our rule.
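For completeness, the same keep-or-resample rule can be written with NumPy's newer Generator API (np.random.default_rng, available since NumPy 1.17; this is a modern update, not part of the original Python 2-era answer):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0, 1, (10, 100))
out = data.copy()
for i, row in enumerate(data):
    good = row[(row >= -0.8) & (row <= 0.8)]        # in-range values in this row
    bad = (row < -0.8) | (row > 0.8)                # outlier mask
    out[i, bad] = rng.choice(good, size=bad.sum())  # resample each outlier from the good values
print(((out >= -0.8) & (out <= 0.8)).all())         # True
```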
I have a problem with some numpy stuff. I need a numpy array to behave in an unusual manner by returning a slice as a view of the data I have sliced, not a copy. So here's an example of what I want to do:
Say we have a simple array like this:
a = array([1, 0, 0, 0])
I would like to update consecutive entries in the array (moving left to right) with the previous entry from the array, using syntax like this:
a[1:] = a[0:3]
This would get the following result:
a = array([1, 1, 1, 1])
Or something like this:
a[1:] = 2*a[:3]
# a = [1,2,4,8]
To illustrate further I want the following kind of behaviour:
for i in range(len(a) - 1):
    a[i+1] = a[i]   # each assignment sees the value written in the previous step
Except I want the speed of numpy.
The default behavior of numpy is to take a copy of the slice, so what I actually get is this:
a = array([1, 1, 0, 0])
I already have this array as a subclass of the ndarray, so I can make further changes to it if need be, I just need the slice on the right hand side to be continually updated as it updates the slice on the left hand side.
Am I dreaming or is this magic possible?
Update: This is all because I am trying to use Gauss-Seidel iteration to solve a linear algebra problem, more or less. It is a special case involving harmonic functions; I was trying to avoid going into this because it's really not necessary and likely to confuse things further, but here goes.
The algorithm is this:
while not converged:
    for i in range(len(u[:,0])):
        for j in range(len(u[0,:])):
            # skip over boundary entries, i,j == 0 or len(u)
            u[i,j] = 0.25*(u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1])
Right? But you can do this two ways. Jacobi updates each element from its neighbours without considering updates already made until the while loop cycles; to do it with loops you would copy the array, then update one array from the copy. Gauss-Seidel, by contrast, uses the values you have already updated for the i-1 and j-1 entries, so no copy is needed: the loop essentially 'knows', since the array is re-evaluated after each single-element update. That is to say, every time we read an entry like u[i-1,j] or u[i,j-1], the value calculated in the previous step is already there.
I want to replace this slow and ugly nested loop situation with one nice clean line of code using numpy slicing:
u[1:-1,1:-1] = 0.25*(u[:-2,1:-1] + u[2:,1:-1] + u[1:-1,:-2] + u[1:-1,2:])
But the result is Jacobi iteration, because when you take a slice like u[:-2,1:-1] you copy the data, so the slice is not aware of any updates made. Now numpy still loops, right? It's not parallel; it's just a faster way to loop that looks like a parallel operation in Python. I want to exploit this behaviour by sort of hacking numpy to return a pointer instead of a copy when I take a slice. Right? Then every time numpy loops, that slice will 'update', or really just replicate whatever happened in the update. To do this I need the slices on both sides of the assignment to be pointers.
Anyway, if there is some really, really clever person out there, that's awesome, but I've pretty much resigned myself to believing the only answer is to loop in C.
Late answer, but this turned up on Google, so I can probably point to the doc the OP wanted. Your problem is clear: when using NumPy slices, temporaries are created. Wrap your code in a quick call to weave.blitz to get rid of the temporaries and get the behaviour you want.
Read the weave.blitz section of PerformancePython tutorial for full details.
accumulate is designed to do what you seem to want; that is, to propagate an operation along an array. Here's an example:
from numpy import *
a = array([1,0,0,0])
a[1:] = add.accumulate(a[0:3])
# a = [1, 1, 1, 1]
b = array([1,1,1,1])
b[1:] = multiply.accumulate(2*b[0:3])
# b = [1 2 4 8]
Another way to do this is to explicitly specify the result array as the input array. Here's an example:
c = array([2,0,0,0])
multiply(c[:3], c[:3], c[1:])
# c = [ 2 4 16 256]
Just use a loop. I can't immediately think of any way to make the slice operator behave the way you're saying you want it to, except maybe by subclassing numpy's array and overriding the appropriate method with some sort of Python voodoo... but more importantly, the idea that a[1:] = a[0:3] should copy the first value of a into the next three slots seems completely nonsensical to me. I imagine that it could easily confuse anyone else who looks at your code (at least the first few times).
It is not the correct logic.
I'll try to use letters to explain it.
Imagine array = abcd, with a, b, c, d as elements.
Now, array[1:] means everything from the element in position 1 (starting from 0) on, in this case: bcd. And array[0:3] means from the character in position 0 up to (but not including) the one in position 3, in this case: abc.
Writing something like:
array[1:] = array[0:3]
means: replace bcd with abc
To obtain the output you want, now in python, you should use something like:
a[1:] = a[0]
It must have something to do with assigning a slice. Operators, however, as you may already know, do follow your expected behavior:
>>> a = numpy.array([1,0,0,0])
>>> a[1:]+=a[:3]
>>> a
array([1, 1, 1, 1])
If your real-world problem already has zeros where your example does, then this solves it. Otherwise, at added cost, set them to zero first, either by multiplying by zero or by assigning zero (whichever is faster).
edit:
I had another thought. You may prefer this:
numpy.put(a,[1,2,3],a[:3])
Numpy must be checking if the target array is the same as the input array when doing the setkey call. Luckily, there are ways around it. First, I tried using numpy.put instead
In [46]: a = numpy.array([1,0,0,0])
In [47]: numpy.put(a,[1,2,3],a[0:3])
In [48]: a
Out[48]: array([1, 1, 1, 1])
And then from the documentation of that, I gave using flatiters a try (a.flat)
In [49]: a = numpy.array([1,0,0,0])
In [50]: a.flat[1:] = a[0:3]
In [51]: a
Out[51]: array([1, 1, 1, 1])
But this doesn't solve the problem you had in mind
In [55]: a = np.array([1,0,0,0])
In [56]: a.flat[1:] = 2*a[0:3]
In [57]: a
Out[57]: array([1, 2, 0, 0])
This fails because the multiplication is done before the assignment, not in parallel as you would like.
Numpy is designed for repeated application of the exact same operation in parallel across an array. To do something more complicated, unless you can find a way to decompose it in terms of functions like numpy.cumsum and numpy.cumprod, you'll have to resort to something like scipy.weave or writing the function in C. (See the PerformancePython page for more details.) (Also, I've never used weave, so I can't guarantee it will do what you want.)
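As a sketch of that kind of decomposition: the a[1:] = 2*a[:3] recurrence from the question (each entry doubling the one before it) has the closed form a[0] * 2**i, which vectorizes directly:

```python
import numpy as np

a = np.array([1, 0, 0, 0])
result = a[0] * 2 ** np.arange(len(a))   # geometric fill instead of a sequential loop
print(result)                            # [1 2 4 8]
```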
You could have a look at np.lib.stride_tricks.
There is some information in these excellent slides:
http://mentat.za.net/numpy/numpy_advanced_slides/
with stride_tricks starting at slide 29.
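As a small taste of stride_tricks in current NumPy (sliding_window_view was added in NumPy 1.20, long after these slides were written, so treat this as a modern footnote rather than part of the original answer):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(6)
windows = sliding_window_view(a, 3)   # overlapping windows that share a's memory, no copy
print(windows.shape)                  # (4, 3)
print(windows[0])                     # [0 1 2]
```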
I'm not completely clear on the question, though, so I can't suggest anything more concrete, although I would probably do it in cython, or in fortran with f2py, or with weave. I'm liking fortran more at the moment because by the time you add all the required type annotations in cython, I think it ends up looking less clear than the fortran.
There is a comparison of these approaches here:
www.scipy.org/PerformancePython
(can't post more links as I'm a new user)
with an example that looks similar to your case.
In the end I came up with the same problem as you. I had to resort to use Jacobi iteration and weaver:
while (iter_n < max_time_steps):
    expr = "field[1:-1, 1:-1] = (field[2:, 1:-1] "\
           "+ field[:-2, 1:-1] + "\
           "field[1:-1, 2:] + "\
           "field[1:-1, :-2])/4."
    weave.blitz(expr, check_size=0)
    # Toroidal conditions
    field[:, 0] = field[:, self.flow.n_x - 2]
    field[:, self.flow.n_x - 1] = field[:, 1]
    iter_n = iter_n + 1
It works and is fast, but it is not Gauss-Seidel, so convergence can be a bit tricky. The only option for doing Gauss-Seidel is a traditional loop with indexes.
I would suggest cython instead of looping in C. There might be some fancy numpy way of getting your example to work using a lot of intermediate steps... but since you already know how to write it in C, just write that quick little bit as a cython function and let cython's magic make the rest of the work easy for you.