Which object is generally smaller in memory given the exact same data: a numpy array with dtype int64 or a C++ vector of type int? For example:
v = np.array([34, 23])
std::vector<int> v { 34,23 };
There are effectively 2 parts to an np.array: the object overhead plus attributes like shape and strides, and a data buffer. The first has roughly the same size for all arrays; the second scales with the number of elements (and the size of each element). In numpy the data buffer is 1d, regardless of the array shape.
With only 2 elements the overhead part of your example array is probably larger than the data buffer. But with 1000s of elements the size proportion goes the other way.
Saving the array with np.save will give a rough idea of the memory use. That file format writes a fixed-size header (on the order of 100 bytes), and the rest is the data buffer.
I'm less familiar with C++ storage, though I think that's more transparent (if you know the language).
But remember that efficiency in storing one array is only part of the story. In practice you also need to think about the memory used when doing math and indexing. The ndarray distinction between a view and a copy makes it harder to predict just how much memory is being used.
In [1155]: np.save('test.npy',np.array([1,2]))
In [1156]: ls -l test.npy
-rw-rw-r-- 1 paul paul 88 Jun 30 17:08 test.npy
In [1157]: np.save('test.npy',np.arange(1000))
In [1158]: ls -l test.npy
-rw-rw-r-- 1 paul paul 4080 Jun 30 17:08 test.npy
This looks like 80 bytes of header, and 4*len bytes for the data (4 bytes per element because these arrays use a 32-bit integer dtype; with int64 it would be 8*len).
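A quick way to see that split for yourself is to compare the size of the data buffer with the size of the whole array object (a minimal sketch; the exact overhead varies with the NumPy version and platform):

import sys
import numpy as np

for n in (2, 1000, 1_000_000):
    a = np.arange(n, dtype=np.int64)
    # a.nbytes is the data buffer alone; sys.getsizeof(a) also counts the
    # fixed per-array overhead (shape, strides, dtype, reference count, ...).
    print(n, a.nbytes, sys.getsizeof(a), sys.getsizeof(a) - a.nbytes)

The last column (the overhead) stays roughly constant at around a hundred bytes, while a.nbytes grows linearly with the element count.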
I have a data frame of about 19 million rows, in which 4 of the variables are latitudes & longitudes. I created a function to calculate the distance between the latitude/longitude pairs with the help of the Python haversine package.
from haversine import haversine_vector, Unit

# function to calculate the distance between 2 sets of coordinates
def measure_distance(lat_1, long_1, lat_2, long_2):
    coordinate_start = list(zip(lat_1, long_1))
    coordinate_end = list(zip(lat_2, long_2))
    distance = haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
    return distance
I use the magic command %%memit to measure the memory usage of the calculation. On average, memory usage is between 8 and 10 GB. I run my work on Google Colab, which has 12 GB of RAM; as a result, the operation sometimes hits the runtime's limit and the runtime restarts.
%%memit
measure_distance(df.station_latitude_start.values,
df.station_longitude_start.values,
df.station_latitude_end.values,
df.station_longitude_end.values)
peak memory: 7981.16 MiB, increment: 5312.66 MiB
Is there a way to optimise my code?
TL;DR: use Numpy and compute the result in chunks.
The amount of memory taken by the CPython interpreter is expected given the large input size.
Indeed, CPython stores values in lists using references. On a 64-bit system, a reference takes 8 bytes and basic objects (floats and small integers) usually take about 32 bytes. A tuple of two floats is a compound object that contains the size of the tuple as well as references to the two floats (not the values themselves); its size should be close to 64 bytes. Since you have 2 lists containing 19 million (references to) float pairs and 4 lists containing 19 million (references to) floats, the resulting memory taken should be about 4*19e6*(8+32) + 2*19e6*(8+64) = 5.7 GB. Not to mention that haversine can make some internal copies and the result takes some space too.
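You can sanity-check these per-object costs on your own interpreter (a minimal sketch; the exact figures differ between Python versions and builds, and allocator rounding adds a bit on top of what sys.getsizeof reports):

import sys

print(sys.getsizeof(1.0))                 # one float object
print(sys.getsizeof((1.0, 2.0)))          # tuple header plus two 8-byte references
print(sys.getsizeof([0.0] * 1_000_000))   # a list costs about 8 bytes per stored reference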
If you want to reduce the memory usage, then use Numpy. Indeed, float Numpy arrays store values in a much more compact way (no references, no internal tags). You can replace the lists of tuples with N x 2 Numpy 2D arrays. The resulting size should be about 4*19e6*8 + 2*19e6*(8*2) = 1.2 GB. Moreover, the computation will be much faster because haversine uses Numpy internally. Here is an example:
import numpy as np
from haversine import haversine_vector, Unit

# Assume lat_1, long_1, lat_2 and long_2 are Numpy arrays.
# Use np.array(yourList) if you need to convert a list first.
def measure_distance(lat_1, long_1, lat_2, long_2):
    coordinate_start = np.column_stack((lat_1, long_1))
    coordinate_end = np.column_stack((lat_2, long_2))
    return haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
The above code is about 25 times faster.
If you want to reduce the memory usage even more, you can compute the distances in chunks (for example 32K values at a time) and then concatenate the output chunks. You can also use single-precision numbers rather than double precision if you do not care too much about the accuracy of the computed distances.
Here is an example of how to compute the result by chunk:
def better_measure_distance(lat_1, long_1, lat_2, long_2):
    chunk_size = 65536
    result = np.zeros(len(lat_1))
    for i in range(0, len(lat_1), chunk_size):
        coordinate_start = np.column_stack((lat_1[i:i+chunk_size], long_1[i:i+chunk_size]))
        coordinate_end = np.column_stack((lat_2[i:i+chunk_size], long_2[i:i+chunk_size]))
        result[i:i+chunk_size] = haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
    return result
On my machine, using double precision, the above code takes about 800 MB, while the initial implementation takes 8 GB. That is 10 times less memory! It is also still 23 times faster! Using single precision, the above code takes about 500 MB, so 16 times less memory, and it is 48 times faster!
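If single precision is acceptable, the conversion can be done once on the input arrays (a sketch reusing df and better_measure_distance from above; float32 halves the size of the inputs and of the stacked coordinate chunks):

# Hypothetical usage, assuming the same df as in the question.
lat_1 = df.station_latitude_start.values.astype(np.float32)
long_1 = df.station_longitude_start.values.astype(np.float32)
lat_2 = df.station_latitude_end.values.astype(np.float32)
long_2 = df.station_longitude_end.values.astype(np.float32)
distances = better_measure_distance(lat_1, long_1, lat_2, long_2)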
C++'s std::vector has .reserve(size) and .capacity() methods, which let you reserve memory for the array and query the currently reserved size. This reserved size is greater than or equal to the vector's actual size (obtained through .size()).
If I call .push_back(element), the vector's memory is not reallocated as long as the current .size() < .capacity(). This makes appending elements fast. When there is no more capacity, the array is reallocated to a new memory location and all the data is copied.
I'd like to know if there are similar low-level methods available for numpy arrays. Can I reserve a large capacity so that small appends/inserts don't reallocate the numpy array in memory too often?
Perhaps there is already some growth mechanism built into numpy arrays, like 10% growth of the reserved capacity on each reallocation. But I wonder if I can control this myself and maybe implement faster growth, like doubling the reserved capacity on each reallocation.
It would also be nice to know if there are in-place variants of numpy functions like insert/append which modify the array in place without creating a copy, i.e. part of the array is reserved and filled with zeros, and this reserved part is used for shifting. E.g. if I have the array [1 0 0 0] with the last 3 zero elements reserved, then an in-place .append(2) would mutate this array into [1 2 0 0] with 2 reserved zero elements left. Then .insert(1, 3) would again modify it to become [1 3 2 0] with 1 reserved zero element left, i.e. everything just like in C++.
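NumPy itself exposes nothing like reserve()/capacity(): np.append and np.insert always allocate a fresh array and copy the data. The usual workaround is to preallocate a bigger buffer and track the used length yourself, doubling the buffer when it fills up, which gives the same amortized O(1) append as std::vector. A minimal sketch of that idea (not a NumPy API, just user code):

import numpy as np

class GrowableArray:
    """Append with amortized O(1) cost on top of a preallocated NumPy buffer."""
    def __init__(self, capacity=16, dtype=np.int64):
        self._buf = np.zeros(capacity, dtype=dtype)  # reserved storage
        self._size = 0                               # elements actually in use

    def capacity(self):
        return len(self._buf)

    def append(self, value):
        if self._size == len(self._buf):
            # Out of reserved space: double the buffer and copy once,
            # like a std::vector reallocation.
            new_buf = np.zeros(2 * len(self._buf), dtype=self._buf.dtype)
            new_buf[:self._size] = self._buf
            self._buf = new_buf
        self._buf[self._size] = value
        self._size += 1

    def data(self):
        return self._buf[:self._size]   # a view of the used part, no copy

a = GrowableArray()
for i in range(100):
    a.append(i)
print(a.data()[:5], a.capacity())   # [0 1 2 3 4] 128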
I'm attempting to parallelize a program that reads chunked numpy arrays over the network using shared memory. It seems to work (my data comes out the other side), but the memory of all my child processes is blowing up to about the size of the shared memory (~100-250MB each) and it happens when I write to it. Is there some way to avoid these copies being created? They seem to be unnecessary since the data is propagating back to the actual shared memory array.
Here's how I've set up my array using posix_ipc, mmap, and numpy (np):
import mmap
import numpy as np
import posix_ipc
from posix_ipc import O_CREAT

shared = posix_ipc.SharedMemory(vol.uuid, flags=O_CREAT, size=int(nbytes))
array_like = mmap.mmap(shared.fd, shared.size)
renderbuffer = np.ndarray(buffer=array_like, dtype=vol.dtype, shape=mcshape)
The memory increases when I do this:
renderbuffer[ startx:endx, starty:endy, startz:endz, : ] = 1
Thanks for your help!
Your actual data has 4 dimensions, but I'm going to work through a simpler 2D example.
Imagine you have this array (renderbuffer):
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Now imagine your startx/endx/starty/endy parameters select this slice in one process:
8 9
13 14
18 19
The entire array is 4x5 times 8 bytes, so 160 bytes. The "window" is 3x2 times 8 bytes, so 48 bytes.
Your expectation seems to be that accessing this 48 byte slice would require 48 bytes of memory in this one process. But actually it requires closer to the full 160 bytes. Why?
The reason is that memory is mapped in pages, which are commonly 4096 bytes each. So when you access the first element (here, the number 8), you will map the entire page of 4096 bytes containing that element.
Memory mapping on many systems is guaranteed to start on a page boundary, so the first element in your array will be at the beginning of a page. So if your array is 4096 bytes or smaller, accessing any element of it will map the entire array into memory in each process.
In your actual use case, each element you access in the slice will result in mapping the entire page which contains it. Elements adjacent in memory (meaning either the first or the last index in the slice increments by one, depending on your array order) will usually reside in the same page, but elements which are adjacent in other dimensions will likely be in separate pages.
But take heart: the memory mappings are shared between processes, so if your entire array is 200 MB, even though each process will end up mapping most or all of it, the total memory usage is still 200 MB combined for all processes. Many memory measurement tools will report that each process uses 200 MB, which is sort of true but useless: they are sharing a 200 MB view of the same memory.
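On Linux you can see this for yourself: a process's Rss counts every resident page it has touched, while Pss divides shared pages among the processes sharing them, so the Pss values are what actually add up to the real total (a minimal sketch; /proc/<pid>/smaps_rollup needs a reasonably recent kernel):

# Print this process's resident (Rss) vs proportional (Pss) memory.
with open('/proc/self/smaps_rollup') as f:
    for line in f:
        if line.startswith(('Rss:', 'Pss:')):
            print(line.strip())

If four children map the same 200 MB segment, each can report an Rss close to 200 MB, while their Pss values still sum to roughly 200 MB in total.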
I am new to Python and I want to have a list with 2 elements: the first one is an integer between 0 and 2 billion, the other one is a number between 0 and 10. I have a large number of these lists (billions).
Suppose I use the chr() function to produce the second element of the list. For example:
first_number = 123456678
second_number = chr(1)
mylist = [first_number,second_number]
In this case how does Python allocate memory? Will it assume that the second element is a char and give it (1 byte + overhead), or will it assume that it is a string? If it treats it as a string, is there any way that I can define and enforce something as a char, or make this somehow more memory efficient?
Edit: added some more information about why I need this data structure.
Here is some more information about what I want to do:
I have a sparse weighted graph with 2 billion edges and 25 million nodes. To represent this graph I tried to create a dictionary (because I need fast lookup) in which the keys are the nodes (as integers). These nodes are represented by a number between 0 and 2 billion (there is no relation between this and the number of edges). The edges are represented like this: for each node (i.e. each key in the dictionary) I keep a list of lists. Each element of this list of lists is a list like the one I described above: the first element is the other node and the second element is the weight of the edge between the key and the first element. For example, for a graph that contains 5 nodes, if I have something like
{1: [[2, 1], [3, 1], [4, 2], [5, 1]], 2: [[5, 1]], 3: [[5, 2]], 4: [[6, 1]], 5: [[6, 1]]}
it means that node 1 has 4 edges: one that goes to node 2 with weight 1, one that goes to node 3 with weight 1, one that goes to node 4 with weight 2, etc.
I was looking to see if I could make this more memory efficient by making the second element of each edge smaller.
Using a single-character string will take up about the same amount of memory as a small integer, because CPython will only create one object of each value and use that object every time it needs a string or integer of that value. Using strings will take up a bit more space, but it'll be insignificant.
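You can check the caching behaviour interactively (a minimal sketch; the sizes printed are CPython implementation details and differ between versions):

import sys

a = chr(1)
b = chr(1)
print(a is b)                         # True: CPython caches single-character strings
print(int('256') is int('25' + '6'))  # True: small ints (-5..256) are cached too
print(sys.getsizeof(chr(1)), sys.getsizeof(5))   # 1-char string vs small int object size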
But let's answer your real question: how can you reduce the amount of memory your Python program uses? First I'll calculate about how much memory the objects you want to create will use. I'm using the 64-bit version of Python 2.7 to get my numbers, but other 64-bit versions of Python should be similar.
Starting off, you have only one dict object, but it has 25 million entries (one per node). Python will use 2^26 hash buckets for a dict of this size, and each bucket is 24 bytes. That comes to about 1.5 GB for the dict itself.
The dict will have 25 million keys, all of them int objects, and each of them is 24 bytes. That comes to a total of about 570 MB for all the integers that represent nodes. It will also have 25 million list objects as values. Each list will take up 72 bytes plus 8 bytes per element in the list. These lists will have a total of 2 billion elements, so they'll take up a total of 16.6 GB.
Each of these 2 billion list elements will refer to another list object that's two elements long. That comes to a whopping 164 GB. Each of the two-element lists will refer to two different int objects. Now the good news: while that appears to be a total of about 4 billion integer objects, it's actually only 2 billion different integer objects. There will be only one object created for each of the small integer values used in the second element. So that's a total of 44.7 GB of memory used by the integer objects referred to by the first element.
That comes to at least 227 GB of memory you'll need for the data structure as you plan to implement it. Working back through this list, I'll explain how it's possible for you to reduce the memory you'll need to something more practical.
The 44.7 GB of memory used by the int objects that represent nodes in your two-element edge lists is the easiest to deal with. Since there are only 25 million nodes, you don't need 2 billion different objects, just one for each node value. Also, since you're already using the node values as keys, you can just reuse those objects. So that's 44.7 GB gone right there, and depending on how you build your data structure, it might not take much effort to ensure that no redundant node value objects are created. That brings the total down to 183 GB.
Next let's tackle the 164 GB needed for all the two-element edge list objects. It's possible that you could share list objects that happen to have the same node value and weighting, but you can do better: eliminate all the edge lists by flattening the lists of lists. You'll have to do a bit of arithmetic to access the correct elements, but unless you have a system with a huge amount of memory, you're going to have to make compromises. The list objects used as dict values will have to double in length, increasing their total size from 16.6 GB to 31.5 GB. That makes your net savings from flattening the lists a nice 149 GB, bringing the total down to a more reasonable 33.5 GB.
Going farther than this is trickier. One possibility is to use arrays (array.array from the standard library). Unlike lists, their elements don't refer to other objects; the value is stored in each element. An array.array object is 56 bytes long plus the size of the elements, which in this case are 32-bit integers. That adds up to 16.2 GB, for a net savings of 15.3 GB. The total is now only 18.3 GB.
It's possible to squeeze out a little more space by taking advantage of the fact that your weights are small integers that fit in single-byte characters. Create two array.array objects for each node: one with 32-bit integers for the node values, and the other with 8-bit integers for the weights. Because there are now two array objects, use a tuple object to hold the pair. The total size of all these objects is 13.6 GB. It's not a big savings over a single array, but now you don't need to do any arithmetic to access elements; you just need to switch how you index them. The total is down to 15.66 GB.
Finally, the last thing I can think of to save memory is to have only two array.array objects in total. The dict values then become tuple objects that refer to two int objects: the first is an index into the two arrays, the second is a length. This representation takes up 11.6 GB of memory, another small net decrease, bringing the total to 13.6 GB.
That final total of 13.6 GB should work on a machine with 16 GB of RAM without much swapping, but it won't leave much room for anything else.
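That final layout is essentially a compressed adjacency list: one big array of neighbour ids, one big array of weights, and per node an (offset, count) pair into both. A minimal sketch of the idea (hypothetical helper names; array type codes as described above):

from array import array

neighbors = array('i')   # edge target node ids, typically 4-byte signed ints
weights = array('b')     # weights 0..10 fit in a signed byte
index = {}               # node -> (offset, edge_count) into the two arrays

def add_node(node, edge_list):
    # edge_list is a sequence of (target_node, weight) pairs for this node
    index[node] = (len(neighbors), len(edge_list))
    for target, weight in edge_list:
        neighbors.append(target)
        weights.append(weight)

def edges_of(node):
    offset, count = index[node]
    return [(neighbors[offset + k], weights[offset + k]) for k in range(count)]

# the example graph from the question
add_node(1, [(2, 1), (3, 1), (4, 2), (5, 1)])
add_node(2, [(5, 1)])
print(edges_of(1))   # [(2, 1), (3, 1), (4, 2), (5, 1)]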
The program I've written stores a large amount of data in dictionaries. Specifically, I'm creating 1588 instances of a class, each of which contains 15 dictionaries with 1500 float to float mappings. This process has been using up the 2GB of memory on my laptop pretty quickly (I start writing to swap at about the 1000th instance of the class).
My question is, which of the following is using up my memory?
Some 34 million pairs of floats?
The overhead of 22,500 dictionaries?
The overhead of 1500 class instances?
To me it seems like the memory hog should be the huge number of floating-point numbers that I'm holding in memory. However, if what I've read so far is correct, each of my floating-point numbers takes up 16 bytes. Since I have 34 million pairs, this should be about 108 million bytes, which should be just over a gigabyte.
Is there something I'm not taking into consideration here?
The floats do take up 16 bytes apiece, and a dict with 1500 entries takes about 100 KB:
>>> sys.getsizeof(1.0)
16
>>> d = dict.fromkeys((float(i) for i in range(1500)), 2.0)
>>> sys.getsizeof(d)
98444
so the 22,500 dicts take over 2 GB all by themselves, and the 68 million floats another GB or so. Not sure how you compute that 68 million times 16 equals only 100M -- you may have dropped a zero somewhere.
The class itself takes up a negligible amount, and 1500 instances thereof (net of the objects they refer to, of course, just as getsizeof gives us such net amounts for the dicts) not much more than a smallish dict each, so that's hardly the problem. I.e.:
>>> sys.getsizeof(Sic)
452
>>> sys.getsizeof(Sic())
32
>>> sys.getsizeof(Sic().__dict__)
524
452 for the class, (524 + 32) * 1550 = 862K for all the instances; as you see, that's not the worry when you have gigabytes each in dicts and floats.