I need a fast way to keep a running maximum of a numpy array. For example, if my array was:
x = numpy.array([11,12,13,20,19,18,17,18,23,21])
I'd want:
numpy.array([11,12,13,20,20,20,20,20,23,23])
Obviously I could do this with a little loop:
def running_max(x):
    result = [x[0]]
    for val in x[1:]:  # start from the second element; the first is already in result
        if val > result[-1]:
            result.append(val)
        else:
            result.append(result[-1])
    return result
But my arrays have hundreds of thousands of entries and I need to call this many times. It seems like there's got to be a numpy trick to remove the loop, but I can't seem to find anything that will work. The alternative will be to write this as a C extension, but it seems like I'd be reinventing the wheel.
numpy.maximum.accumulate works for me.
>>> import numpy
>>> numpy.maximum.accumulate(numpy.array([11,12,13,20,19,18,17,18,23,21]))
array([11, 12, 13, 20, 20, 20, 20, 20, 23, 23])
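Since the question mentions calling this many times on large arrays, it may be worth knowing that ufunc accumulations also accept an out argument, so the result can be written in place instead of allocating a new array on each call; a minimal sketch:
>>> x = numpy.array([11, 12, 13, 20, 19, 18, 17, 18, 23, 21])
>>> numpy.maximum.accumulate(x, out=x)  # overwrite x with its running maximum
array([11, 12, 13, 20, 20, 20, 20, 20, 23, 23])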
As suggested, there is also scipy.maximum.accumulate (this works because older SciPy re-exported NumPy's top-level names; in modern SciPy, use numpy.maximum.accumulate directly):
In [9]: x
Out[9]: [1, 3, 2, 5, 4]
In [10]: scipy.maximum.accumulate(x)
Out[10]: array([1, 3, 3, 5, 5])
I have the following numpy array
u = np.array([a1,b1,a2,b2...,an,bn])
where I would like to subtract the a and b elements from each other and end up with a numpy array:
u_result = np.array([(a2-a1),(b2-b1),(a3-a2),(b3-b2),....,(an-a_(n-1)),(bn-b_(n-1))])
How can I do this without too much array splitting and without for loops? I'm using this in a larger loop, so ideally I would like to do it efficiently (and learn something new).
(I hope the indexing of the resulting array is clear)
Or simply, perform a subtraction:
u = np.array([3, 2, 5, 3, 7, 8, 12, 28])
u[2:] - u[:-2]
Output:
array([ 2, 1, 2, 5, 5, 20])
You can use ravel to rearrange the result back into the shape of your original vector.
Short answer:
u_r = np.ravel([np.diff(u[::2]),
                np.diff(u[1::2])], 'F')
Here is a longer and more detailed explanation:
1) Separate a from b in u; this can be achieved with slice indexing.
2) Differentiate a and b; np.diff keeps the code simple.
3) Ravel the differentiated values back together.
#------- Create u---------------
import numpy as np
a_aux = np.array([50,49,47,43,39,34,28])
b_aux = np.array([1,2,3,4,5,6,7])
u = np.ravel([a_aux,b_aux],'F')
print(u)
#-------------------------------
#1)
# get a as elements with index 0, 2, 4 ....
a = u[::2]
b = u[1::2] #get b as 1,3,5,....
#2)
#differentiate
ad = np.diff(a)
bd = np.diff(b)
#3)
#ravel, interleaving one element from each ('F' order)
u_result = np.ravel([ad,bd],'F')
print(u_result)
You can try it this way: first, split out the a and b elements using array[::2] and array[1::2]; then subtract the a elements from the b elements (np.array(array[1::2] - array[::2])).
import numpy as np
array = np.array([7,8,9,6,5,2])
u_result = np.array(array[1::2] - array[::2])
print(u_result)  # [ 1 -3 -3]
Looks like you need to use np.roll:
shift = 2
u = np.array([1, 11, 2, 12, 3, 13, 4, 14])
shifted_u = np.roll(u, -shift)
(shifted_u - u)[:-shift]
Returns:
array([1, 1, 1, 1, 1, 1])
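For what it's worth, np.roll returns a copy of the whole array, so on large inputs the plain slicing shown earlier (u[shift:] - u[:-shift]) computes the same thing without the extra copy; a quick equivalence check:
import numpy as np
shift = 2
u = np.array([1, 11, 2, 12, 3, 13, 4, 14])
# both expressions compute the same shifted difference
assert np.array_equal((np.roll(u, -shift) - u)[:-shift], u[shift:] - u[:-shift])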
I was reading an article and I came across this below-given piece of code. I ran it and it worked for me:
x = df.columns
x_labels = [v for v in sorted(x.unique())]
x_to_num = {p[1]:p[0] for p in enumerate(x_labels)}
#till here it is okay. But I don't understand what is going with this map.
x.map(x_to_num)
The final result from the map is given below:
Int64Index([ 0, 3, 28, 1, 26, 23, 27, 22, 20, 21, 24, 18, 10, 7, 8, 15, 19,
13, 14, 17, 25, 16, 9, 11, 6, 12, 5, 2, 4],
dtype='int64')
Can someone please explain to me how the .map() worked here. I searched online, but could not find anything related.
ps: df is a pandas dataframe.
Let's look at what the map() function in general does in Python.
>>> l = [1, 2, 3]
>>> list(map(str, l))
# ['1', '2', '3']
Here, a list of numeric elements is converted to a list of string elements.
So whatever function we apply via map gets called on each element of an iterable.
You might have been confused because the general syntax of map, map(MappingFunction, IteratorObject), is not used here, yet things still work.
The variable x plays the role of the IteratorObject, while the dictionary x_to_num contains the mapping and hence plays the role of the MappingFunction: pandas' Index.map looks each element up in the dict.
Edit: the general idea is not pandas-specific, though accepting a dict as the mapper is a convenience of pandas' .map; Python's built-in map expects a callable.
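To see the dict-as-mapper behavior in isolation, here is a minimal, self-contained sketch (the column names are made up for illustration):
import pandas as pd
df = pd.DataFrame(columns=['b', 'a', 'c'])
x = df.columns
x_labels = [v for v in sorted(x.unique())]            # ['a', 'b', 'c']
x_to_num = {p[1]: p[0] for p in enumerate(x_labels)}  # {'a': 0, 'b': 1, 'c': 2}
# Index.map looks each column label up in the dict,
# returning that label's position in the sorted order
print(x.map(x_to_num))  # Index([1, 0, 2], dtype='int64'); exact repr varies by pandas version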
I'm trying to implement an efficient way of creating a frequency table in python, with a rather large numpy input array of ~30 million entries. Currently I am using a for-loop, but it's taking far too long.
The input is an ordered numpy array of the form
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9..... etc])
And I would like to have an output of the form:
Z = {4: 3, 5: 0, 6: 2, 7: 1, 8: 1, 9: 3, ... etc} (as any data type)
Currently I am using the following implementation:
import pandas as pd

Z = pd.Series(index=np.arange(Y.min(), Y.max() + 1))
for i in range(Y.min(), Y.max() + 1):
    Z[i] = (Y == i).sum()
Is there a quicker way of doing this or a way without iterating through a loop? Thanks for helping, and sorry if this has been asked before!
You can simply do this using Counter from the collections module. Here is the code I ran for your test case:
import numpy as np
from collections import Counter
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9,10,5,5,5])
print(Counter(Y))
It gave the following output:
Counter({4: 3, 9: 3, 5: 3, 6: 2, 7: 1, 8: 1, 10: 1})
You can easily use this Counter object for further processing. I hope this helps.
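If the zero entries in the desired output matter (e.g. 5: 0 in the question's Z), the Counter can be expanded over the full value range afterwards; a small sketch:
import numpy as np
from collections import Counter
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9])
counts = Counter(Y)
# fill in 0 for values inside the range that never occur
Z = {i: counts.get(i, 0) for i in range(Y.min(), Y.max() + 1)}
print(Z)  # {4: 3, 5: 0, 6: 2, 7: 1, 8: 1, 9: 3}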
If your input array x is sorted, you can do the following to get the counts in linear time:
diff1 = np.diff(x)
# get indices of the elements at which jumps occurred
jumps = np.concatenate([[0], np.where(diff1 > 0)[0] + 1, [len(x)]])
unique_elements = x[jumps[:-1]]
counts = np.diff(jumps)
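Run on the sorted sample from the question, this gives (a quick check of the approach):
import numpy as np
x = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9])  # must already be sorted
diff1 = np.diff(x)
jumps = np.concatenate([[0], np.where(diff1 > 0)[0] + 1, [len(x)]])
print(x[jumps[:-1]])   # [4 6 7 8 9]  -> unique elements
print(np.diff(jumps))  # [3 2 1 1 3]  -> their counts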
I think numpy.unique is your solution.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.unique.html
import numpy as np
t = np.random.randint(0, 1000, 100000000)
print(np.unique(t, return_counts=True))
This takes ~4 seconds for me.
The collections.Counter approach takes ~10 seconds.
But numpy.unique returns the frequencies in an array, while collections.Counter returns a dictionary; it's a matter of convenience.
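If a dict like the question's Z is wanted, the two arrays that np.unique returns zip together directly; a small sketch (values absent from Y simply get no entry):
import numpy as np
Y = np.array([4, 4, 4, 6, 6, 7, 8, 9, 9, 9])
values, counts = np.unique(Y, return_counts=True)
Z = dict(zip(values, counts))  # keys and values are NumPy scalars
print(Z)  # {4: 3, 6: 2, 7: 1, 8: 1, 9: 3}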
Edit: I cannot comment on other posts, so I'll write here that #lomereiter's solution is lightning fast (linear) and should be the accepted one.
I'm trying to vectorize some element calculations, but I'm having difficulty doing so without list comprehensions that map local information to global information. I was told that I can accomplish what I want using logical arrays, but so far the examples I've found have not been helpful. While I can accomplish this with list comprehensions, speed is a main concern with my code.
I have a set of values that indicate indices in the "global" calculation that should not be adjusted.
For example, these "fixed" indices are
1 2 6
If my global calculation has ten elements, I would be able to set all the "free" values by creating a list of the set of the global indices and subtracting the fixed indices.
free = list(set(range(len(global_arr))) - set(fixed))
[0, 3, 4, 5, 7, 8, 9]
In the global calculation, I would be able to adjust the "free" elements as shown in the following code snippet (global is a reserved word in Python, so I use global_arr):
global_arr = np.ones(10)
global_arr[free] = global_arr[free] * 10
which should produce:
global_arr = [10, 1, 1, 10, 10, 10, 1, 10, 10, 10]
My "local" calculation is a subset of the global one, where the local map indicates the corresponding indices in the global calculation.
local_map = [4, 2, 1, 8, 6]
local_values = [40, 40, 40, 40, 40]
but I need the values associated with the local map to retain their order for calculation purposes.
What would the equivalent of global_arr[free] be on the local level?
the desired output would be something like this:
local_free = list(set(range(len(local))) - set(fixed))
local_values[local_free] *= 10
OUTPUT: local_values = [400, 40, 40, 400, 40]
I apologize if the question formatting is off; the code block formatting doesn't seem to be working in my browser, so please let me know if you need clarification.
For such comparison-related operations, NumPy has tools like np.setdiff1d and np.in1d among others. To solve our case, these two would be enough. I would assume that the inputs are NumPy arrays, as then we could use vectorized indexing methods supported by NumPy.
For the first case, we have -
In [97]: fixed = np.array([1,2,6])
...: global_arr = np.array([10, 1, 1, 10, 10, 10, 1, 10, 10, 10])
...:
To get the equivalent of list(set(range(len(global_arr))) - set(fixed)) in NumPy, we could make use of np.setdiff1d -
In [98]: np.setdiff1d(np.arange(len(global_arr)),fixed)
Out[98]: array([0, 3, 4, 5, 7, 8, 9])
Next up, we have -
In [99]: local_map = np.array([4, 2, 1, 8, 6])
...: local_values = np.array([42, 40, 48, 41, 43])
...:
We were trying to get -
local_free = list(set(range(len(local))) - set(fixed))
local_values[local_free] *= 10
Here, we can use np.in1d to get a mask equivalent to local_free, which can then be used to index and assign into local_values with NumPy's boolean indexing -
In [100]: local_free = ~np.in1d(local_map,fixed)
...: local_values[local_free] *= 10
...:
In [101]: local_values
Out[101]: array([420, 40, 48, 410, 43])
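A side note: in recent NumPy versions, np.isin is the recommended spelling of np.in1d (same behavior for 1-D inputs like these), so the masking step could equivalently read -
local_free = ~np.isin(local_map, fixed)
local_values[local_free] *= 10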
I have a 3-element Python tuple that I'm trying to sort or rearrange using the indices in a 3-element list, and I want to know the most concise way to do this.
So far I've got:
my_tuple = (10, 20, 30)
new_positions = [2, 0, 1]
my_shuffled_tuple = my_tuple[new_positions[0]], my_tuple[new_positions[1]], my_tuple[new_positions[2]]
# outputs: (30, 10, 20)
I also get the same result if I do:
my_shuffled_tuple = tuple([my_tuple[i] for i in new_positions])
Is there a more concise way to create my_shuffled_tuple?
One way to do this is with a generator expression as an argument to tuple, which accepts an iterable:
In [1]: my_tuple = (10, 20, 30)
...: new_positions = [2, 0, 1]
...:
In [2]: my_shuffled_tuple = tuple(my_tuple[i] for i in new_positions)
In [3]: my_shuffled_tuple
Out[3]: (30, 10, 20)
If speed is an issue and you are working with a large amount of data, you should consider using Numpy. This allows direct indexing with a list or array of indices:
In [4]: import numpy as np
In [5]: my_array = np.array([10, 20, 30])
In [6]: new_positions = [2, 0, 1] # or new_positions = np.array([2, 0, 1])
In [7]: my_shuffled_array = my_array[new_positions]
In [8]: my_shuffled_array
Out[8]: array([30, 10, 20])
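If a plain tuple is ultimately needed again, the indexed array converts straight back (.tolist() first, so the elements come back as Python ints rather than NumPy scalars):
In [9]: tuple(my_shuffled_array.tolist())
Out[9]: (30, 10, 20)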
You can use operator.itemgetter like this:
from operator import itemgetter
my_tuple = (10, 20, 30)
new_positions = [2, 0, 1]
print(itemgetter(*new_positions)(my_tuple))  # (30, 10, 20)
If you will be accessing the elements of my_tuple (or other things too) in the new ordering a lot, you can save this itemgetter as a helper function:
access_at_2_0_1 = itemgetter(*new_positions)
and then access_at_2_0_1(foo) will be the same as (foo[2], foo[0], foo[1]).
This is very helpful when you are trying to work with an argsort-like operation (where lots of arrays need to be re-accessed in a sort order that comes from sorting some other array). Generally, by that point you should probably be using NumPy arrays, but still this is a handy approach.
Note that since itemgetter relies on the __getitem__ protocol, it is not guaranteed to work with all types of iterables, if that matters for your use.
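As a small sketch of that argsort-style use (the data here is made up for illustration), compute a sort order from one sequence, then reuse a single itemgetter to re-access several parallel sequences in that order:
from operator import itemgetter
keys = [30, 10, 20]
names = ['c', 'a', 'b']
# indices that would sort keys ascending, like numpy's argsort
order = sorted(range(len(keys)), key=keys.__getitem__)  # [1, 2, 0]
reorder = itemgetter(*order)
print(reorder(keys))   # (10, 20, 30)
print(reorder(names))  # ('a', 'b', 'c')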
Use a generator expression inside the tuple() built-in (it accepts generators):
>>> my_tuple = (10, 20, 30)
>>> new_positions = [2, 0, 1]
>>> tuple(my_tuple[i] for i in new_positions)
(30, 10, 20)