Converting an array of time increments to an array of instants - python

If I have an array of time increments, for example:
intervals = np.random.normal(loc=1,scale=0.1,size=100)
one possible way to create the corresponding array of time instants is to build a list and accumulate the sum manually:
Sum = 0.
instants = []
for k in range(len(intervals)):
    Sum += intervals[k]
    instants.append(Sum)
instants = np.array(instants)
So, I have just switched from an array of dt(i) to an array of t(i).
But Python usually offers elegant alternatives to for loops. Is there a better way to do it?

What you describe here is the cumulative sum. You can calculate this with .cumsum() [numpy-doc]:
intervals.cumsum()
For example:
>>> intervals
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> intervals.cumsum()
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45])
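Applied to the original question, the whole loop then collapses to a one-liner. A minimal sketch (the values are random, so the exact output will vary):
import numpy as np

intervals = np.random.normal(loc=1, scale=0.1, size=100)
instants = intervals.cumsum()  # instants[i] = intervals[0] + ... + intervals[i]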

How can I find the final cumulative sum across numpy axis? [duplicate]

I have a numpy array
np.array(data).shape
(50,50)
Now, I want to find the cumulative sums across axis=1. The problem is cumsum creates an array of cumulative sums, but I just care about the final value of every row.
This is incorrect of course:
np.cumsum(data, axis=1)[-1]
Is there a succinct way of doing this without looping through the array?
You are almost there, but as you have it now, you are selecting just the final row. What you need is to select all rows from the last column, so your indexing at the end should be: [:,-1].
Example:
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
>>> a.cumsum(axis=1)[:,-1]
array([ 10, 35, 60, 85, 110])
Note, I'm leaving this up as I think it explains what was going wrong with your attempt, but admittedly, there are more effective ways of doing this in the other answers!
The final cumulative sum of every row is in fact simply the sum of every row, i.e. the row-wise sum, so we can implement this as:
>>> a.sum(axis=1)
array([ 10, 35, 60, 85, 110])
So here, for every row, we calculate the sum over all the columns. We thus do not need to generate the intermediate sums first: numpy will likely keep a running total in an accumulator internally, but the intermediate values are never emitted as an array.
You can use numpy.ufunc.reduce if you don't need the intermediate accumulated results of any ufunc.
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.add.reduce(a, axis=1)
array([ 3, 12, 21])
However, in the case of sum, Willem's answer is clearly superior and to be preferred. Just keep in mind that in the general case, there's ufunc.reduce.
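As a further illustration of that general case (my example, not part of the original answer), any binary ufunc can be reduced the same way; for instance a row-wise product over the 3x3 array a from above:
>>> np.multiply.reduce(a, axis=1)  # row-wise product, equivalent to a.prod(axis=1)
array([  0,  60, 336])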

Function reads np.array - produces the mean for k nn to number p in np.array

I need to define a function which reads a numpy array and produces the mean of the k nearest points to a number p in the array.
Example:
array = np.array([1, 2, 3, 4, 5, 6, 7, 50, 24, 32, 9, 11, 12, 10])
p = 15  (note: this is not necessarily a number in the array; I need to find the
number closest to p, or p itself)
k = 3
In this case, I would need to generate the mean for [11, 12, 10],
as they are the closest to p = 15.
With the above numbers, I need to find the mean of the k points closest to p, where p may or may not appear explicitly in the array.
I am new and very confused at this point and feel I have exhausted my resources. I feel this question has been asked before but the answers are much too complex for what I need.
Thanks in advance.
Given a (1d) array arr and scalar input p, here's how you could find the mean of the n nearest values:
def neighbor_mean(arr, p, n=3):
    idx = np.abs(arr - p).argsort()[:n]
    return arr[idx].mean()

arr = np.array([1, 2, 3, 4, 5, 6, 7, 50, 24, 32, 9, 11, 12, 10])
neighbor_mean(arr, p=15)
# 11.0
In the above, first you take the absolute differences:
np.abs(arr - 15)
# array([14, 13, 12, 11, 10, 9, 8, 35, 9, 17, 6, 4, 3, 5])
Then argsort() returns the indices that would sort an array. We're interested in the n-smallest absolute differences. This is what you're really looking for, rather than sorting the differences directly.
np.abs(arr - p).argsort()[:3]
# array([12, 11, 13])
Lastly you want to index your input array arr and take the mean of this:
arr[[12, 11, 13]]
# array([12, 11, 10]) # mean: 11.0
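As an aside (an addition, not part of the original answer): since only the n smallest differences are needed, not their order, np.argpartition avoids the full sort and scales better on large arrays. A minimal variant of the same function:
def neighbor_mean_fast(arr, p, n=3):
    # argpartition places the indices of the n smallest differences first,
    # in arbitrary order, without sorting the whole array
    idx = np.argpartition(np.abs(arr - p), n)[:n]
    return arr[idx].mean()

neighbor_mean_fast(arr, p=15)
# 11.0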

find repeating dates between two datetime arrays python

I have two datetime arrays, and I am trying to output an array with only those dates which occur in both arrays. I feel like this is something I should be able to answer myself, but I have spent a lot of time searching and I do not understand how to solve this.
>>> datetime1[0:4]
array([datetime.datetime(2014, 6, 19, 4, 0),
       datetime.datetime(2014, 6, 19, 5, 0),
       datetime.datetime(2014, 6, 19, 6, 0),
       datetime.datetime(2014, 6, 19, 7, 0)], dtype=object)
>>> datetime2[0:4]
array([datetime.datetime(2014, 6, 19, 3, 0),
       datetime.datetime(2014, 6, 19, 4, 0),
       datetime.datetime(2014, 6, 19, 5, 0),
       datetime.datetime(2014, 6, 19, 6, 0)], dtype=object)
I've tried the following, but I still do not understand why it does not work:
>>> np.where(datetime1==datetime2)
(array([], dtype=int64),)
This:
datetime1 == datetime2
is an element-wise comparison. It compares [0] with [0], then [1] with [1], and gives you a boolean array. Since no position holds the same datetime in both of your arrays, every element is False, which is why np.where finds nothing.
Instead, try:
np.in1d(datetime1, datetime2)
This gives you a boolean array the same size as datetime1, set to True for those elements which exist in datetime2.
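For the four-element slices shown above, for example, this gives:
>>> np.in1d(datetime1[0:4], datetime2[0:4])
array([ True,  True,  True, False])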
If your goal is only to get the values rather than the indexes, use this:
np.intersect1d(datetime1, datetime2)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.intersect1d.html
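Using the same slices as above, this returns the common values directly (sorted and unique):
>>> np.intersect1d(datetime1[0:4], datetime2[0:4])
array([datetime.datetime(2014, 6, 19, 4, 0),
       datetime.datetime(2014, 6, 19, 5, 0),
       datetime.datetime(2014, 6, 19, 6, 0)], dtype=object)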
I would say just iterate over the values of datetime1 and check each for containment in datetime2. So for example:
for date in datetime1:
    if date in datetime2:
        print(date)
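As an aside: the containment test scans datetime2 linearly on every iteration, so for large arrays it is usually faster to build a set once and test against that. A small sketch:
common = set(datetime2)  # hash-based membership tests
for date in datetime1:
    if date in common:
        print(date)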

Slicing without views (or: shuffling multiple arrays)

I have two different numpy arrays and I would like to shuffle them in a synchronized way.
The current solution is taken from https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html and proceeds as follows:
perm = np.arange(self.no_images_train)
np.random.shuffle(perm)
self.images_train = self.images_train[perm]
self.labels_train = self.labels_train[perm]
The problem is that it doubles the memory usage each time I do it. Somehow the old arrays are not getting deleted, probably because the slicing operator creates views, I guess. I tried the following change, out of pure desperation:
perm = np.arange(self.no_images_train)
np.random.shuffle(perm)
n_images_train = self.images_train[perm]
n_labels_train = self.labels_train[perm]
del self.images_train
del self.labels_train
gc.collect()
self.images_train = n_images_train
self.labels_train = n_labels_train
Still the same, memory leaks and I am running out of memory after a couple of operations.
Btw, the two arrays have shapes (100000, 224, 244, 1) and (100000, 1).
I know that this has been dealt with here (Better way to shuffle two numpy arrays in unison), but the answer didn't help me, as the provided solution needs slicing again.
Thanks for any help.
One way to permute two large arrays in-place in a synchronized way is to save the state of the random number generator and then shuffle the first array. Then restore the state and shuffle the second array.
For example, here are my two arrays:
In [48]: a
Out[48]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
In [49]: b
Out[49]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
Save the current internal state of the random number generator:
In [50]: state = np.random.get_state()
Shuffle a in-place:
In [51]: np.random.shuffle(a)
Restore the internal state of the random number generator:
In [52]: np.random.set_state(state)
Shuffle b in-place:
In [53]: np.random.shuffle(b)
Check that the permutations are the same:
In [54]: a
Out[54]: array([13, 12, 11, 15, 10, 5, 1, 6, 14, 3, 9, 7, 0, 8, 4, 2])
In [55]: b
Out[55]: array([13, 12, 11, 15, 10, 5, 1, 6, 14, 3, 9, 7, 0, 8, 4, 2])
For your code, this would look like:
state = np.random.get_state()
np.random.shuffle(self.images_train)
np.random.set_state(state)
np.random.shuffle(self.labels_train)
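The same trick carries over to the newer Generator API (a sketch, assuming NumPy >= 1.17, where bit_generator.state is the documented way to save and restore the generator's state):
import numpy as np

rng = np.random.default_rng()
a = np.arange(16)
b = np.arange(16)

state = rng.bit_generator.state   # save the generator's internal state
rng.shuffle(a)                    # shuffle a in-place
rng.bit_generator.state = state   # restore the state
rng.shuffle(b)                    # b receives the identical permutation
assert (a == b).all()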
Actually I don't think there is any issue with numpy or Python. Numpy uses the system malloc/free to allocate the arrays, and this can lead to memory fragmentation (see Memory Fragmentation on SO).
So I guess your memory profile may increase and then suddenly drop when the system is able to reduce fragmentation, if possible.

Extracting minimum values per row using numpy

I have a question and I could not find the answer on the internet or on this website. I am sure it is very easy though. Let's say I have a set of 20 numbers and I have them in a 5x4 matrix:
numbers = np.arange(20).reshape(5,4)
This yields the following matrix:
[ 0,  1,  2,  3]
[ 4,  5,  6,  7]
[ 8,  9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]
Now I would like to have the minimum value of each row, in this case amounting to 0, 4, 8, 12, 16. However, I would like to add that for my problem the minimum value is NOT always in the first column; it can be at a random place in the matrix (i.e. the first, second, third or fourth column of each row). If someone could shed some light on this it would be greatly appreciated.
You just need to specify the axis across which you want to take the minimum. To find the minimum value in each row, you need to specify axis 1:
>>> numbers.min(axis=1)
array([ 0, 4, 8, 12, 16])
For a 2D array, numbers.min() finds the single minimum value in the array, numbers.min(axis=0) returns the minimum value for each column and numbers.min(axis=1) returns the minimum value for each row.
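To illustrate that the column position of the minimum does not matter, here is a small example with shuffled rows (and, as an aside, argmin(axis=1) gives the column index of each row's minimum):
>>> shuffled = np.array([[3, 0, 2, 1],
...                      [7, 6, 4, 5]])
>>> shuffled.min(axis=1)
array([0, 4])
>>> shuffled.argmin(axis=1)  # column index of each row's minimum
array([1, 2])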
