C++'s std::vector has .reserve(size) and .capacity() methods, which let you reserve memory for the array and query the currently reserved size. The reserved size is always greater than or equal to the vector's actual size (obtained through .size()).
If I call .push_back(element) on such a vector, its memory is not reallocated as long as the current .size() < .capacity(). This makes appending elements fast. If there is no capacity left, the array is reallocated at a new memory location and all the data is copied.
I'd like to know whether the same low-level methods are available for numpy arrays. Can I reserve a large capacity so that small appends/inserts don't reallocate the numpy array in memory too often?
There is probably already some growth mechanism built into numpy arrays, like 10% growth of the reserved capacity on each reallocation. But I wonder whether I can control this myself and maybe implement faster growth, like doubling the reserved capacity on each growth.
It would also be nice to know whether there are in-place variants of numpy functions, like insert/append, which modify the array in place without creating a copy. I.e. part of the array is somehow reserved and filled with zeros, and this part is used for shifting. E.g. if I have the array [1 0 0 0] with the last 3 zero elements reserved, then an in-place .append(2) would mutate this array into [1 2 0 0] with 2 reserved zero elements left. Then .insert(1, 3) would again modify it to become [1 3 2 0] with 1 reserved zero element left. I.e. everything like in C++.
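For illustration, here is a minimal sketch of what such a wrapper could look like if written by hand. np.append and np.insert always return a new array, so the reserved capacity has to be managed manually. The class name GrowableArray and its methods are hypothetical and not part of NumPy, and the doubling policy is just one possible choice.

```python
import numpy as np


class GrowableArray:
    """Hypothetical helper: a 1-D buffer with vector-like reserve/append/insert."""

    def __init__(self, capacity=4, dtype=np.int64):
        self._buf = np.zeros(max(capacity, 1), dtype=dtype)  # reserved storage
        self._size = 0                                       # elements in use

    def capacity(self):
        return len(self._buf)

    def __len__(self):
        return self._size

    def reserve(self, capacity):
        # Reallocate only when more room is requested than is available.
        if capacity > len(self._buf):
            new_buf = np.zeros(capacity, dtype=self._buf.dtype)
            new_buf[:self._size] = self._buf[:self._size]
            self._buf = new_buf

    def _grow_if_full(self):
        if self._size == len(self._buf):
            self.reserve(2 * len(self._buf))  # doubling growth policy

    def append(self, value):
        self._grow_if_full()
        self._buf[self._size] = value
        self._size += 1

    def insert(self, index, value):
        # Assumes 0 <= index <= len(self).
        self._grow_if_full()
        # Shift the tail one slot to the right inside the reserved space;
        # the copy keeps the overlapping assignment unambiguous.
        self._buf[index + 1:self._size + 1] = self._buf[index:self._size].copy()
        self._buf[index] = value
        self._size += 1

    def view(self):
        # A view of the used part; no data is copied.
        return self._buf[:self._size]


arr = GrowableArray(capacity=4)
arr.append(1)
arr.append(2)
arr.insert(1, 3)
print(arr.view(), arr.capacity())
```

With a capacity of 4, the three operations above never reallocate; only the fifth element would trigger a doubling to 8.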
I have a large NumPy array nodes = np.arange(100_000_000) and I need to rearrange this array by:
1. Recording and then removing the middle value in the array
2. Split the array into the left half and right half
3. Repeat Steps 1-2 for each half
4. Stop when all values are exhausted
So, for a smaller input example nodes = np.arange(10), the output would be:
[5 2 8 1 4 7 9 0 3 6]
This was accomplished by naively doing:
import numpy as np

def split(node, out):
    mid = len(node) // 2
    out.append(node[mid])
    return node[:mid], node[mid+1:]

def reorder(a):
    nodes = [a.tolist()]
    out = []
    while nodes:
        tmp = []
        for node in nodes:
            for n in split(node, out):
                if n:
                    tmp.append(n)
        nodes = tmp
    return np.array(out)

if __name__ == "__main__":
    nodes = np.arange(10)
    print(reorder(nodes))
However, this is way too slow for nodes = np.arange(100_000_000) and so I am looking for a much faster solution.
You can vectorize your function with Numpy by working on groups of slices.
Here is an implementation:
import numpy as np

# Similar to [e for tmp in zip(a, b) for e in tmp],
# but on Numpy arrays and much faster
def interleave(a, b):
    assert len(a) == len(b)
    return np.column_stack((a, b)).reshape(len(a) * 2)

# n is the length of the input range (len(a) in your example)
def fast_reorder(n):
    if n == 0:
        return np.empty(0, dtype=np.int32)

    startSlices = np.array([0], dtype=np.int32)
    endSlices = np.array([n], dtype=np.int32)
    allMidSlices = np.empty(n, dtype=np.int32)  # Similar to "out" in your implementation
    midInsertCount = 0  # Actual size of allMidSlices

    # Generate a bunch of middle values as long as there are valid slices to split
    while midInsertCount < n:
        # Generate the new mid/left/right slices
        midSlices = (endSlices + startSlices) // 2

        # Computing the next slices is not needed for the last step
        if midInsertCount + len(midSlices) < n:
            # Generate the next slices (possibly with invalid ones)
            newStartSlices = interleave(startSlices, midSlices+1)
            newEndSlices = interleave(midSlices, endSlices)

            # Discard invalid slices
            isValidSlices = newStartSlices < newEndSlices
            startSlices = newStartSlices[isValidSlices]
            endSlices = newEndSlices[isValidSlices]

        # Fast appending
        allMidSlices[midInsertCount:midInsertCount+len(midSlices)] = midSlices
        midInsertCount += len(midSlices)

    return allMidSlices[0:midInsertCount]
On my machine, this is 89 times faster than your scalar implementation with the input np.arange(100_000_000), dropping from 2min35 to 1.75s. It also consumes far less memory (roughly 3~4 times less). Note that if you want even faster code, you probably need to use a native language like C or C++.
Edit:
The question has been updated to have a much smaller input array so I leave the below for historical reasons. Basically it was likely a typo but we often get accustomed to computers working with insanely large numbers and when memory is involved they can be a real problem.
There is already a numpy based solution submitted by someone else that I think fits the bill.
Your code requires an insane amount of RAM just to hold 100 billion 64-bit integers. Do you have 800GB of RAM? Then you convert the numpy array to a list, which will be substantially larger than the array (each packed 64-bit int in the numpy array becomes a much less memory-efficient Python int object, and the list holds a pointer to that object). Then you make a lot of slices of the list, which will not duplicate the data but will duplicate the pointers to the data and use even more RAM. You also append all the result values to a list one value at a time. Lists are generally very fast for adding items, but at such an extreme size this will not only be slow; the way the list is allocated is also likely to be wasteful RAM-wise and contribute to major problems (I believe lists over-allocate by a fraction of their current size each time they fill up, so you end up allocating more RAM than you need and doing many reallocations and likely copies). What kind of machine are you running this on? There are ways to improve your code, but unless you're running it on a supercomputer I don't know that you're ever going to finish that calculation. I only (only?) have 32GB of RAM, and I'm not even going to try to create a 100B int64 numpy array, as I don't want to use up SSD write life on a mass of virtual memory.
As for improving your code: stick to numpy arrays and don't change to a Python list, which will greatly increase the RAM you need. Preallocate a numpy array to put the answer in. Then you need a new algorithm. Anything recursive or recursive-like (i.e. a loop splitting the input) will require tracking a lot of state; your nodes list is going to be extraordinarily gigantic and again use a lot of RAM. You could use len(a) as a sentinel to mark values that have been removed from your array and scan through the entire array each time to figure out what to do next, but that saves RAM at the cost of a tremendous amount of searching through a gigantic array. I feel like there is an algorithm that cuts numbers from each end, places them in the output, and just tracks the beginning and end, but I haven't figured it out, at least not yet.
I also think there is a simpler algorithm where you just track the number of splits you've done instead of making a giant list of slices and keeping it all in memory. Take the middle of the left half and then the middle of the right half, then count up one; when you take the middle of the left half's left half, you know you have to jump to the right half, and since the count is one you jump over to the original right half's left half, and so on. Based on the depth into the halves and the length of the input, you should be able to jump around without scanning or tracking all of those slices, though I haven't been able to dedicate much time to thinking this through.
With a problem of this nature, if you really need to push the limits, you should consider using C/C++ so you can be as efficient as possible with RAM usage, and because you're doing an insane number of tiny operations, which doesn't map well to Python performance.
I have set up this code using 2 Python dictionary objects and a loop, but my code needs to run faster, and therefore I am looking at numpy arrays, since I read that these can operate faster than dicts, especially when all values are numerical.
Basically what I have is 2 arrays of data.
The first array contains variables. These variables are pulled in from a websocket service and are constantly updating. Each row represents 2 values of 1 parameter. No values will be added, all values are constantly being updated.
VariablesArray (this array is about 70 rows, 2 columns).
[
1.5 0.1
8 9
4 3
27 6
...
]
(in theory this could also be just a 1D array of 70 variables)
The second array needs to be some type of fully static array containing the referenced operations that need to be performed and checked on these variables.
OperationsArray (this array is about 1000 rows, 1 column)
[
VariablesArray[1,1] * VariablesArray[2,1] * VariablesArray[3,0]
VariablesArray[1,0] * VariablesArray[2,0] * VariablesArray[3,1]
VariablesArray[1,0] * VariablesArray[5,0] * VariablesArray[2,1]
...
]
Every time a variable changes, this list of calculations is checked, preferably only the rows that contain this updated variable, but for the sake of simplicity in this question let's perhaps recalculate everything.
If any of these multiplications returns a result higher than 100, I need to take action and trigger some alarm code.
If I put both these arrays in dictionary objects and loop through them in Python, I can do 7 "OperationsArray" calculations per millisecond. Since some variables are referenced in a few hundred calculation rows, any update of those variables would take up to 100ms to calculate alarms, which is too long.
Now I am wondering what the best approach is for the fastest result. I am really new to Python and coding; perhaps this is just as easy as adding these variables as specified above to these 2 arrays and then looping through the second array to see if anything is bigger than 100?
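One possible way to express this in NumPy, as a rough sketch rather than a definitive design: keep the operations as two static integer index arrays instead of expressions, then evaluate all products at once with fancy indexing. The sample values, the names op_rows, op_cols and check_alarms, and the assumption that every operation multiplies exactly three entries are all illustrative.

```python
import numpy as np

# Assumed sample data: 6 parameters with 2 values each (the question has about 70).
variables = np.array([
    [1.5, 0.1],
    [8.0, 9.0],
    [4.0, 3.0],
    [27.0, 6.0],
    [2.0, 5.0],
    [0.5, 7.0],
])

# Each operation multiplies three entries of `variables`. Row k of `op_rows`
# and `op_cols` holds the (row, column) indices of the three factors.
op_rows = np.array([[1, 2, 3],
                    [1, 2, 3],
                    [1, 5, 2]])
op_cols = np.array([[1, 1, 0],
                    [0, 0, 1],
                    [0, 0, 1]])


def check_alarms(variables, op_rows, op_cols, limit=100.0):
    factors = variables[op_rows, op_cols]     # shape (n_ops, 3), fancy indexing
    products = factors.prod(axis=1)           # one product per operation
    return np.flatnonzero(products > limit)   # indices of operations over the limit


alarms = check_alarms(variables, op_rows, op_cols)
print(alarms)
```

Because the websocket updates only overwrite entries of variables in place, the index arrays never need to be rebuilt; each update just calls check_alarms again.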
I have a 2 * N integer array ids representing intervals, where N is about a million. It looks like this
0 2 1 ...
3 4 3 ...
The ints in the array can be 0, 1, ... , M-1, where M <= 2N. (Detail: if M = 2N, then the ints span all the 2N integers; if M < 2N, then some entries share the same value.)
I need to calculate a kind of inverse map from ids. What I call the "inverse map" treats the columns of ids as intervals and maps each interior point back to the indices of the intervals that contain it.
Intuition:
Intuitively,
0 2 1
3 4 3
can be seen as
0 -> 0, 1, 2
1 -> 2, 3
2 -> 1, 2
where the right-hand-side endpoints are excluded for my problem. The "inverse" map would be
0 -> 0
1 -> 0, 2
2 -> 0, 1, 2
3 -> 1
Code:
I have a piece of Python code that attempts to calculate the inverse map in a dictionary inv below:
for i in range(ids.shape[1]):
    for j in range(ids[0][i], ids[1][i]):
        inv[j].append(i)
where each inv[j] is an array-like data initialized as empty before the nested loop. Currently I use python's built-in arrays to initialize it.
for i in range(M): inv[i]=array.array('I')
Question:
The nested loop above performs poorly. In my problem setting (image processing), the outer loop has a million iterations and the inner one about 3000. Not only does it take a lot of memory (because inv is huge), it is also slow. I would like to focus on speed in this question. How can I accelerate the nested loop above, e.g. with vectorization?
You could try the option below, in which your outer loop is hidden away inside numpy's apply_along_axis(). Note that apply_along_axis() still calls my_func once per column from Python rather than running the loop in C, so I'm not sure about the performance benefit; only a test at a decent scale can tell (especially as there's some initial overhead involved in converting lists to numpy arrays):
import numpy as np
import array

ids = [[0,2,1],[3,4,3]]
ids_arr = np.array(ids) # Convert to numpy array. Expensive operation?
range_index = 0 # Initialize. To be bumped up by each invocation of my_func()
inv = {}
for i in range(np.max(ids_arr)):
    inv[i] = array.array('I')

def my_func(my_slice):
    global range_index
    for i in range(my_slice[0], my_slice[1]):
        inv[i].append(range_index)
    range_index += 1

np.apply_along_axis(my_func, 0, ids_arr)
print(inv)
Output:
{0: array('I', [0]), 1: array('I', [0, 2]), 2: array('I', [0, 1, 2]),
3: array('I', [1])}
Edit:
I feel that using a dictionary might not be a good idea here. I suspect that in this particular context, dictionary-indexing might actually be slower than numpy array indexing. Use the below lines to create and initialize inv as a numpy array of Python arrays. The rest of the code can remain as-is:
inv_len = np.max(ids_arr)
inv = np.empty(shape=(inv_len,), dtype=array.array)
for i in range(inv_len):
    inv[i] = array.array('I')
(Note: This assumes that your application isn't doing dict-specific stuff on inv, such as inv.items() or inv.keys(). If it is, you might need an extra step to convert the numpy array into a dict.)
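For comparison, here is a sketch of a fully vectorized variant that avoids Python-level loops altogether. It is not the approach above but a swapped-in technique based on np.repeat and a stable sort; the function name inverse_map is hypothetical. It materializes one entry per (point, interval) pair, so with a million intervals of about 3000 points each it may need to be run in chunks to fit in memory.

```python
import numpy as np


def inverse_map(ids):
    starts, ends = ids[0], ids[1]
    lengths = ends - starts
    # Interval index i, repeated once per point of interval i.
    owners = np.repeat(np.arange(ids.shape[1]), lengths)
    # All covered points: the concatenation of range(starts[i], ends[i]).
    offsets = np.cumsum(lengths) - lengths
    points = np.arange(lengths.sum()) - np.repeat(offsets - starts, lengths)
    # Group owners by point; a stable sort keeps interval indices ascending.
    order = np.argsort(points, kind="stable")
    points, owners = points[order], owners[order]
    uniq, first = np.unique(points, return_index=True)
    # uniq[k] is a covered point, groups[k] the intervals containing it.
    return uniq, np.split(owners, first[1:])


ids = np.array([[0, 2, 1], [3, 4, 3]])
uniq, groups = inverse_map(ids)
for point, group in zip(uniq, groups):
    print(point, "->", group.tolist())
```

On the small example this reproduces the mapping from the question: 0 -> [0], 1 -> [0, 2], 2 -> [0, 1, 2], 3 -> [1].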
To avoid the for loop, here is just a pandas sample:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    "A": np.random.randint(0, 100, 100000),
    "B": np.random.randint(0, 100, 100000)
})
df.groupby("B")["A"].agg(list)
Since the order of N is large, I've come up with what seems like a practical approach; let me know if there are any flaws.
1. For the ith interval [x, y], store it as [x, y, i]. Sort the intervals by their start and end points. This takes O(N log N) time.
2. Create a frequency array freq[2*N+1]. For each interval, update the frequency using a range update (difference array) in O(1) per update. Generating the frequencies then takes O(N) in total.
3. Determine a threshold based on your data. According to that value, each element is classified as either sparse or frequent. For sparse elements, do nothing. For frequent elements only, store the intervals in which they occur.
4. During lookup, if the element is frequent, you can directly access the precomputed list. If the element is sparse, you can search the intervals in O(log N) time, since the intervals are sorted and their indices were attached in step 1.
This seems like a practical approach to me; the rest depends on your usage, such as the amortized time complexity you need per query.
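The range-update step can be sketched as follows, assuming half-open intervals [start, end) as in the question. The function name coverage_counts and the parameter m (number of distinct points) are hypothetical; np.add.at is used so that repeated start or end values are all counted.

```python
import numpy as np


def coverage_counts(ids, m):
    """Number of intervals [ids[0][i], ids[1][i]) covering each point 0..m-1."""
    diff = np.zeros(m + 1, dtype=np.int64)
    np.add.at(diff, ids[0], 1)     # +1 where an interval starts
    np.add.at(diff, ids[1], -1)    # -1 just past where it ends
    return np.cumsum(diff)[:m]     # prefix sum turns the updates into counts


ids = np.array([[0, 2, 1], [3, 4, 3]])
freq = coverage_counts(ids, 4)
print(freq)
```

The resulting counts could then be compared against the chosen threshold to decide which points are treated as frequent.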
I am new to Python programming and I have a problem in assigning specific values to the first column of a very large numpy.array.
This is the code I use:
import numpy as np
a = np.zeros((365343020, 9), dtype=np.float32)
for n in range(0, 36534302):
    a[n*10:(n+1)*10, 0] = n
where the second line creates an array of 365343020 rows and 9 columns, filled with zeros, and the subsequent “for” loop is meant to replace the first column of the array with a vector whose elements are 36534302 sequential integers repeated 10 times each (e.g. [0,0,…,0,1,1,…,1,2,2,…, 36534301, 36534301,…, 36534301]).
The code seems to behave as desired until around row 168000000 of the array; after that, it replaces the 10 repetitions of each number whose last digit is odd with a second repetition of the (even) number before it.
I have looked for explanations regarding the difference between views and copies. However, even when I try to manually set the content of a specific cell of the array (one that was wrongly set by the loop), it does not change.
Could you please help me in solving this problem?
Thanks
Maybe your program is consuming too much memory. Here is some basic math for your code.
Data type: float32
Bits used per element: 32 bits
Size of array: 3288087180 elements (365343020*9)
Total memory consumed: 105218789760 bits (13.15234872 GB)
1. Try a smaller data type if the values being stored in the array are not large (note that NumPy has no 8-bit float; float16 is its smallest float type).
2. Try to decrease your array size.
3. Both 1 and 2.
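Separately from memory, the symptom in the question (odd values turning into even ones from around row 168000000, i.e. n near 16.8 million) is consistent with the precision limit of float32, which has a 24-bit significand and so cannot represent every integer above 2**24 = 16777216. The sketch below demonstrates that limit and shows, on a deliberately small array, one way to fill the column without a Python loop; the sizes n_groups and repeat are illustrative only, and float64 doubles the memory needed.

```python
import numpy as np

# float32 has a 24-bit significand, so integers above 2**24 = 16777216
# are no longer all representable; odd values get rounded to an even neighbour.
limit = 2 ** 24
print(np.float32(limit + 1) == np.float32(limit))   # True

# Filling the first column without a Python loop, shown on a small array.
# float64 represents integers exactly up to 2**53.
n_groups, repeat = 5, 10
a = np.zeros((n_groups * repeat, 9), dtype=np.float64)
a[:, 0] = np.repeat(np.arange(n_groups), repeat)
```

If float32 must be kept for the other eight columns, storing the sequential integers in a separate integer array would be another option.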
Have a look at this image:
In my application I receive from an iterator an arbitrary amount (let's say 1000 for now) of big 1-dimensional arrays arr1, arr2, arr3, ..., arr1000 (10000 entries each). Each entry is an integer between 0 and n, where in this case n = 9. My ultimate goal is to compute a 1-dimensional array result such that result[i] == the mode of arr1[i], arr2[i], arr3[i], ..., arr1000[i].
However, it is not tractable to concatenate the arrays to one big matrix and then compute the mode row-wise, since this may exceed the RAM on my machine.
An alternative would be to set up an array res2 of shape (10000, 10), then loop through every array, use each entry e as an index, and increase the value of res2[i][e] by 1. After looping, I would apply something like argmax. However, this is too slow.
So: Is there a way to perform the task quickly, maybe by using NumPy's advanced indexing?
EDIT (due to the comments):
This is basically the code which calculates the modes row-wise, without concatenating the arrays:
import numpy as np

def foo(length, n):
    # counts[i][e] = how often value e was seen at position i
    # (n is the number of columns, i.e. 10 for the res2 shape described above)
    counts = np.zeros((length, n), dtype=np.int_)
    for arr in array_iterator():
        for i, e in enumerate(arr):
            counts[i][e] += 1
    return np.argmax(counts, axis=1)
It already takes 60 seconds for 100 arrays of size 10000 (although there is more work done behind the scenes which contributes to that time; however, this work scales linearly with the number of arrays).
Regarding the real sizes:
The number of different arrays is really arbitrary. It's a parameter of my experiments and I'd like to be able to set it even to values like 10^6. The length of each array depends on the data set I'm working with. This could be 10000, or 100000, or even worse. However, splitting this into smaller pieces may be possible, though annoying.
My free RAM for this task is about 4 GB.
EDIT 2:
The running time I gave above leads to a wrong impression. Actually, the running time which just belongs to the inner loop (for e in arr) in the above mentioned scenario is just 5 seconds – which is now ok for me, since it's negligible compared to the remaining running time. I will leave this question open anyway for a moment, since there might be an even faster method waiting out there.
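For reference, the inner loop can be replaced by a single advanced-indexing update per array. This is a minimal sketch under a few assumptions: the name rowwise_mode is hypothetical, arrays stands in for whatever array_iterator() yields, and n_values is the number of distinct values (10 for entries 0..9). Only the counts array of shape (length, n_values) is kept in memory.

```python
import numpy as np


def rowwise_mode(arrays, length, n_values):
    """Mode per position over an iterable of 1-D integer arrays."""
    counts = np.zeros((length, n_values), dtype=np.int64)
    rows = np.arange(length)
    for arr in arrays:
        # Each (row, value) pair occurs at most once per array, so a plain
        # fancy-indexed += is safe here (no repeated index pairs).
        counts[rows, arr] += 1
    return np.argmax(counts, axis=1)


sample = [np.array([0, 1, 2]), np.array([0, 1, 1]), np.array([3, 1, 2])]
result = rowwise_mode(sample, length=3, n_values=10)
print(result)
```

As with np.argmax in the original code, ties are resolved in favour of the smallest value.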