I am new to Python and I want to have a list with 2 elements: the first one is an integer between 0 and 2 billion, the other one is a number between 0 and 10. I have a large number of these lists (billions).
Suppose I use the chr() function to produce the second element of the list. For example:
first_number = 123456678
second_number = chr(1)
mylist = [first_number,second_number]
In this case how does Python allocate memory? Will it assume that the second element is a char and give it 1 byte plus overhead, or will it assume that it is a string? If it treats it as a string, is there any way that I can define and enforce something as a char, or otherwise make this more memory efficient?
Edit: added some more information about why I need this data structure.
Here is some more information about what I want to do:
I have a sparse weighted graph with 2 billion edges and 25 million nodes. To represent this graph I tried to create a dictionary (because I need a fast lookup) in which the keys are the nodes (as integers). These nodes are represented by a number between 0 to 2 billion (there is no relation between this and the number of edges). The edges are represented like this: For each of the nodes (or the keys in the dictionary ) I am keeping a list of list. Each element of this list of list is a list that I have explained above. The first one represent the other node and the second argument represents the weight of the edge between the key and the first argument. For example, for a graph that contain 5 nodes, if I have something like
{1: [[2, 1], [3, 1], [4, 2], [5, 1]], 2: [[5, 1]], 3: [[5, 2]], 4: [[6, 1]], 5: [[6, 1]]}
it means that node 1 has 4 edges: one that goes to node 2 with weight 1, one that goes to node 3 with weight 1, one that goes to node 4 with weight 2, etc.
I was looking to see if I could make this more memory efficient by making the second argument of the edge smaller.
Using a single-character string will take up about the same amount of memory as a small integer, because CPython creates only one object for each such value and reuses it every time a string or integer of that value is needed. The strings will take up a bit more space, but the difference is insignificant.
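You can check this caching behaviour directly; a quick sketch (the sizes shown are from 64-bit CPython 3 and vary slightly by version, but the object sharing is the point):
import sys

print(sys.getsizeof(1))       # e.g. 28 bytes for a small int object
print(sys.getsizeof(chr(1)))  # e.g. 50 bytes for a 1-character str

x, y = 199 + 1, 100 + 100
print(x is y)                 # True: small ints (-5..256) are cached
print(chr(1) is chr(1))       # True: 1-character strings are cached too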
But let's answer your real question: how can you reduce the amount of memory your Python program uses? First I'll calculate about how much memory the objects you want to create will use. I'm using the 64-bit version of Python 2.7 to get my numbers, but other 64-bit versions of Python should be similar.
Starting off you have only one dict object, but it has 25 million nodes. Python will use 2^26 hash buckets for a dict of this size, and each bucket is 24 bytes. That comes to about 1.5 GB for the dict itself.
The dict will have 25 million keys, all of them int objects, each 24 bytes. That comes to a total of about 570 MB for all the integers that represent nodes. It will also have 25 million list objects as values. Each list takes up 72 bytes plus 8 bytes per element. These lists will have a total of 2 billion elements, so they'll take up a total of 16.6 GB.
Each of these 2 billion list elements will refer to another list object that's two elements long. That comes to a whopping 164 GB. Each of the two-element lists refers to two different int objects. Now the good news: while that appears to be a total of about 4 billion integer objects, it's actually only 2 billion distinct integer objects, because only one object is created for each of the small integer values used in the second element. So that's a total of 44.7 GB of memory used by the integer objects referred to by the first element.
That comes to at least 227 GB of memory you'll need for the data structure as you plan to implement it. Working back through this list, I'll explain how it's possible to reduce the memory you'll need to something more practical.
The 44.7 GB of memory used by the int objects that represent nodes in your two-element edge lists is the easiest to deal with. Since there are only 25 million nodes, you don't need 2 billion distinct objects, just one per node value. Also, since you're already using the node values as keys, you can simply reuse those objects. So that's 44.7 GB gone right there, and depending on how you build your data structure it might not take much effort to ensure that no redundant node value objects are created. That brings the total down to 183 GB.
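For instance, here is a minimal sketch of one way to guarantee a single canonical int object per node id (the helper name is illustrative):
canonical = {}

def node(n):
    # Return the one shared int object for node id n, creating it on first use.
    return canonical.setdefault(n, n)

edge = [node(123456678), 1]  # reuses the same 123456678 object everywhere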
Next let's tackle the 164 GB needed for all the two-element edge list objects. It's possible to share list objects that happen to have the same node value and weight, but you can do better: eliminate the edge lists entirely by flattening the lists of lists. You'll have to do a bit of arithmetic to access the correct elements (see the sketch below), but unless you have a system with a huge amount of memory you're going to have to make compromises. The list objects used as dict values will have to double in length, increasing their total size from 16.6 GB to 31.5 GB. That makes your net savings from flattening the lists a nice 149 GB, bringing the total down to a more reasonable 33.5 GB.
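A sketch of the flattened layout, assuming each node maps to [node1, weight1, node2, weight2, ...]:
graph = {1: [2, 1, 3, 1, 4, 2, 5, 1]}  # was {1: [[2, 1], [3, 1], [4, 2], [5, 1]]}

def edges(graph, key):
    flat = graph[key]
    for i in range(0, len(flat), 2):  # step through the flat list in pairs
        yield flat[i], flat[i + 1]    # (neighbour node, edge weight)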
Going farther than this is trickier. One possibility is to use arrays. Unlike lists, their elements don't refer to other objects; the value is stored directly in each element. An array.array object is 56 bytes long plus the size of its elements, which in this case are 32-bit integers. That adds up to 16.2 GB, for a net savings of 15.3 GB. The total is now only 18.3 GB.
It's possible to squeeze out a little more space by taking advantage of the fact that your weights are small integers that fit in single bytes. Create two array.array objects for each node: one with 32-bit integers for the node values, and the other with 8-bit integers for the weights. Because there are now two array objects, use a tuple object to hold the pair. The total size of all these objects is 13.6 GB. Not a big savings over a single array, but now you don't need any arithmetic to access elements; you just need to switch how you index them. The total is down to 15.66 GB.
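A sketch of that layout, assuming the weights fit in an unsigned byte (typecode 'B') and node ids in a 32-bit unsigned int (typecode 'I' is 4 bytes on most 64-bit platforms; check array('I').itemsize):
from array import array

graph = {1: (array('I', [2, 3, 4, 5]),    # neighbour node ids, 4 bytes each
             array('B', [1, 1, 2, 1]))}   # edge weights, 1 byte each

neighbours, weights = graph[1]
for n, w in zip(neighbours, weights):     # same pairs as before, new indexing
    pass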
Finally, the last thing I can think of to save memory is to have only two array.array objects in total. The dict values then become tuple objects that refer to two int objects: the first is an offset into the two arrays, the second is a length. This representation takes up 11.6 GB of memory, another small net decrease, with the total becoming 13.6 GB.
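A sketch of this final layout, with two global arrays and (offset, length) tuples as the dict values (all names are illustrative):
from array import array

all_nodes = array('I')    # every node's neighbours, concatenated
all_weights = array('B')  # the matching weights
graph = {}                # node -> (offset, length) into the two arrays

def add_node(key, neighbours, weights):
    graph[key] = (len(all_nodes), len(neighbours))
    all_nodes.extend(neighbours)
    all_weights.extend(weights)

def edges(key):
    offset, length = graph[key]
    for i in range(offset, offset + length):
        yield all_nodes[i], all_weights[i]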
That final total of 13.6 GB should work on a machine with 16 GB of RAM without much swapping, but it won't leave much room for anything else.
I have a data frame of about 19 million rows, of which 4 variables are latitudes and longitudes. I created a function to calculate the distance between latitude/longitude pairs with the help of the Python haversine package.
# function to calculate distance between 2 sets of coordinates
from haversine import haversine_vector, Unit

def measure_distance(lat_1, long_1, lat_2, long_2):
    coordinate_start = list(zip(lat_1, long_1))
    coordinate_end = list(zip(lat_2, long_2))
    distance = haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
    return distance
I use the magic command %%memit to measure the memory usage of the calculation. On average, memory usage is between 8 and 10 GB. I run my work on Google Colab, which has 12 GB of RAM; as a result, the operation sometimes hits the runtime limit and restarts.
%%memit
measure_distance(df.station_latitude_start.values,
                 df.station_longitude_start.values,
                 df.station_latitude_end.values,
                 df.station_longitude_end.values)
peak memory: 7981.16 MiB, increment: 5312.66 MiB
Is there a way to optimise my code?
TL;DR: use Numpy and compute the result by chunk.
The amount of memory taken by the CPython interpreter is expected given the large input size.
Indeed, CPython stores values in lists using references. On a 64-bit system, a reference takes 8 bytes, and basic types (floats and small integers) usually take around 24-32 bytes each once allocator overhead is counted. A tuple of two floats contains the size of the tuple as well as references to the two floats (not the values themselves), so its size is close to 64 bytes. Since you have 2 lists containing 19 million (references to) float pairs and 4 lists containing 19 million (references to) floats, the resulting memory taken should be about 4*19e6*(8+32) + 2*19e6*(8+64) = 5.7 GB. Not to mention that Haversine can make some internal copies, and the result takes some space too.
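These per-object sizes are easy to check (values shown are from 64-bit CPython 3; they vary slightly between versions):
import sys

sys.getsizeof(1.0)         # 24 -> roughly 32 bytes per float with allocator overhead
sys.getsizeof((1.0, 2.0))  # 56 -> close to the ~64 bytes assumed above
sys.getsizeof([])          # 56 bytes of list overhead, plus 8 bytes per reference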
If you want to reduce the memory usage, use Numpy. Indeed, float Numpy arrays store values in a much more compact way (no references, no internal tags). You can replace the list of tuples with an N x 2 Numpy 2D array. The resulting size should be about 4*19e6*8 + 2*19e6*(8*2) = 1.2 GB. Moreover, the computation will be much faster, since Haversine uses Numpy internally. Here is an example:
import numpy as np
from haversine import haversine_vector, Unit

# Assume lat_1, long_1, lat_2 and long_2 are of type np.array.
# Use np.array(yourList) if you want to convert a list.
def measure_distance(lat_1, long_1, lat_2, long_2):
    coordinate_start = np.column_stack((lat_1, long_1))
    coordinate_end = np.column_stack((lat_2, long_2))
    return haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
The above code is about 25 times faster.
If you want to reduce the memory usage even more, you can compute the coordinates in chunks (of, for example, 32K values) and then concatenate the output chunks. You can also use single-precision numbers rather than double precision if you do not care too much about the accuracy of the computed distances.
Here is an example of how to compute the result by chunk:
import numpy as np
from haversine import haversine_vector, Unit

def better_measure_distance(lat_1, long_1, lat_2, long_2):
    chunkSize = 65536
    result = np.zeros(len(lat_1))
    for i in range(0, len(lat_1), chunkSize):
        coordinate_start = np.column_stack((lat_1[i:i+chunkSize], long_1[i:i+chunkSize]))
        coordinate_end = np.column_stack((lat_2[i:i+chunkSize], long_2[i:i+chunkSize]))
        result[i:i+chunkSize] = haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
    return result
On my machine, using double precision, the above code takes about 800 MB while the initial implementation takes 8 GB: 10 times less memory! It is also still 23 times faster! Using single precision, the above code takes about 500 MB (16 times less memory), and it is 48 times faster!
I have set up this code using 2 Python dictionary objects with a loop, but my code needs to run faster, and therefore I am looking at numpy arrays, since I read that these can operate faster than dicts, especially for all-numerical values.
Basically what I have is 2 arrays of data.
The first array contains variables. These variables are pulled in from a websocket service and are constantly updating. Each row represents the 2 values of 1 parameter. No values will be added; all values are constantly being updated.
VariablesArray (this array is about 70 rows, 2 columns).
[
1.5 0.1
8 9
4 3
27 6
...
]
(in theory this could also be just a 1D array of 70 variables)
The second array needs to be some type of fully static array containing referenced operations that need to be done and verified on these variables.
OperationsArray (this array is about 1000 rows, 1 column)
[
VariablesArray[1,1] * VariablesArray[2,1] * VariablesArray[3,0]
VariablesArray[1,0] * VariablesArray[2,0] * VariablesArray[3,1]
VariablesArray[1,0] * VariablesArray[5,0] * VariablesArray[2,1]
...
]
Every time a variable changes, this list of calculations is checked, preferably only the rows that contain the updated variable, but for the sake of simplicity in this question let's recalculate everything.
If any of these multiplications returns a result higher than 100, I need to take action and trigger some alarm code.
If I put both these arrays in dictionary objects and loop through them in Python, I can do 7 "OperationsArray" calculations per millisecond. Since some variables are referenced in a few hundred calculation rows, any update of those variables can take up to 100 ms to recalculate the alarms, which is too long.
Now I am wondering what the best approach is for the fastest result. I am really new to Python and coding; perhaps this is just as easy as storing the variables as specified above in these 2 arrays and then looping through the second array to see if anything is bigger than 100?
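One way to make this fast is to encode the operations as Numpy index arrays and evaluate every product in one vectorized pass. A minimal sketch, assuming each operation multiplies exactly three entries of the variables array (all names and shapes here are illustrative):
import numpy as np

variables = np.random.rand(70, 2) * 10           # stand-in for VariablesArray
rows = np.random.randint(0, 70, size=(1000, 3))  # row index of each operand
cols = np.random.randint(0, 2, size=(1000, 3))   # column index of each operand

def check_alarms(variables, rows, cols, threshold=100.0):
    # Gather all 3 operands of all 1000 operations at once, multiply along
    # the last axis, and return the operation indices exceeding the threshold.
    products = variables[rows, cols].prod(axis=1)
    return np.nonzero(products > threshold)[0]

alarm_rows = check_alarms(variables, rows, cols)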
I'm attempting to parallelize a program that reads chunked numpy arrays over the network using shared memory. It seems to work (my data comes out the other side), but the memory of all my child processes is blowing up to about the size of the shared memory (~100-250 MB each), and it happens when I write to it. Is there some way to avoid these copies being created? They seem to be unnecessary, since the data is propagating back to the actual shared memory array.
Here's how I've set up my array using posix_ipc, mmap, and numpy (np):
import mmap
import numpy as np
import posix_ipc

shared = posix_ipc.SharedMemory(vol.uuid, flags=posix_ipc.O_CREAT, size=int(nbytes))
array_like = mmap.mmap(shared.fd, shared.size)
renderbuffer = np.ndarray(buffer=array_like, dtype=vol.dtype, shape=mcshape)
The memory increases when I do this:
renderbuffer[ startx:endx, starty:endy, startz:endz, : ] = 1
Thanks for your help!
Your actual data has 4 dimensions, but I'm going to work through a simpler 2D example.
Imagine you have this array (renderbuffer):
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Now imagine your startx/endx/starty/endy parameters select this slice in one process:
8 9
13 14
18 19
The entire array is 4x5 times 8 bytes, so 160 bytes. The "window" is 3x2 times 8 bytes, so 48 bytes.
Your expectation seems to be that accessing this 48 byte slice would require 48 bytes of memory in this one process. But actually it requires closer to the full 160 bytes. Why?
The reason is that memory is mapped in pages, which are commonly 4096 bytes each. So when you access the first element (here, the number 8), you will map the entire page of 4096 bytes containing that element.
Memory mapping on many systems is guaranteed to start on a page boundary, so the first element in your array will be at the beginning of a page. So if your array is 4096 bytes or smaller, accessing any element of it will map the entire array into memory in each process.
In your actual use case, each element you access in the slice will result in mapping the entire page which contains it. Elements adjacent in memory (meaning either the first or the last index in the slice increments by one, depending on your array order) will usually reside in the same page, but elements which are adjacent in other dimensions will likely be in separate pages.
But take heart: the memory mappings are shared between processes, so if your entire array is 200 MB, even though each process will end up mapping most or all of it, the total memory usage is still 200 MB combined for all processes. Many memory measurement tools will report that each process uses 200 MB, which is sort of true but useless: they are sharing a 200 MB view of the same memory.
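You can see the view behaviour behind this with the 2D example above; a quick sketch (not the asker's shared-memory code):
import numpy as np

arr = np.arange(1.0, 21.0).reshape(4, 5)  # the 4x5 example array
window = arr[1:4, 2:4]                    # the 3x2 slice (8 9 / 13 14 / 18 19)

print(arr.nbytes)          # 160: the whole buffer
print(window.nbytes)       # 48: the slice's data...
print(window.base is arr)  # True: ...but it's a view into the same 160-byte buffer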
So, I'm making a game in Python 3.4. In the game I need to keep track of a map. It is a map of joined rooms, starting at (0,0) and continuing in every direction, generated in a filtered-random way (only correct matches for the next position are used in a random selection).
I have several types of rooms, which have a name, and a list of doors:
RoomType = namedtuple('Room','Type,EntranceLst')
typeA = RoomType("A",["Bottom"])
...
For the map at the moment I keep a dict of positions and the type of room:
currentRoomType = typeA
currentRoomPos = (0,0)
navMap = {currentRoomPos: currentRoomType}
I have a loop that generates 9,000,000 rooms, to test the memory usage.
I get around 600 to 800 MB when I run it.
I was wondering if there is a way to optimize that.
I tried, instead of doing
navMap = {currentRoomPos: currentRoomType}
doing
navMap = {currentRoomPos: "A"}
but this doesn't make a real change in usage.
Now I was wondering if I could - and should - keep a list of all the types, and for every type keep the positions at which it occurs. I do not know, however, if it will make a difference given the way Python manages its variables.
This is pretty much a thought-experiment, but if anything useful comes from it I will probably implement it.
You can use sys.getsizeof(object) to get the size of a Python object. However, you have to be careful when calling sys.getsizeof on containers: it only gives the size of the container itself, not its contents -- see this recipe for an explanation of how to get the total size of a container, including contents. In this case, we don't need to go quite so deep: we can just manually add up the size of the container and the size of its contents.
The sizes of the types in question are:
# room type size
>>> sys.getsizeof(RoomType("A",["Bottom"])) + sys.getsizeof("A") + sys.getsizeof(["Bottom"]) + sys.getsizeof("Bottom")
233
# position size
>>> sys.getsizeof((0,0)) + 2*sys.getsizeof(0)
120
# One character size
>>> sys.getsizeof("A")
38
Let's look at the different options, assuming you have N rooms:
Dictionary from position -> room_type. This involves keeping N*(size(position) + size(room_type)) = 353 N bytes in memory.
Dictionary from position -> 1-character string. This involves keeping N*158 bytes in memory.
Dictionary from type -> set of positions. This involves keeping N*120 bytes plus a tiny overhead for storing the dictionary keys.
In terms of memory usage, the third option is clearly better. However, as is often the case, you have a CPU/memory tradeoff. It's worth thinking briefly about the computational complexity of the queries you are likely to do. To find the type of a room given its position, with each of the three choices above you have to:
Look up the position in a dictionary. This is an O(1) lookup, so you'll always have the same run time (approximately), independent of the number of rooms (for a large number of rooms).
Same
Look at each type, and for each type ask whether that position is in the set of positions for that type. This is an O(ntypes) lookup; that is, the time it takes is proportional to the number of types you have (see the sketch after the next paragraph). Note that if you had used a list instead of a set to store the rooms of a given type, this would grow to O(nrooms * ntypes), which would kill your performance.
As always, when optimising, it is important to consider the effect of an optimisation on both memory usage and CPU time. The two are often at odds.
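For reference, a minimal sketch of the third option (illustrative names):
rooms_by_type = {"A": {(0, 0), (1, 0)}, "B": {(0, 1)}}  # type -> set of positions

def type_at(pos):
    # O(ntypes): scan each type, with an O(1) membership test per set.
    for room_type, positions in rooms_by_type.items():
        if pos in positions:
            return room_type
    return None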
As an alternative, you could consider keeping the types in a 2-dimensional numpy array of characters, if your map is sufficiently rectangular. I believe this would be far more efficient. Each character in a numpy bytes array is a single byte, so the memory usage would be much less, and looking up a room's type from its position would still be an O(1) operation:
# Generate a 20 x 10 rectangular map, all rooms of type 'a'
>>> map = np.repeat(b'a', 200).reshape(20, 10)
>>> map.nbytes
200  # i.e. 1 byte per character
Some additional small-scale optimisations:
Encode the room type as an int rather than a string. Ints have size 24 bytes, while one-character strings have size 38.
Encode the position as a single integer, rather than a tuple. For instance:
# Random position
xpos = 5
ypos = 92
# Encode the position as a single int, using high-order bits for x and low-order bits for y
pos = 5*1000 + ypos
# Recover the x and y values of the position.
xpos = pos / 1000
ypos = pos % 1000
Note that this kills readability, so it's only worth doing if you want to squeeze out the last bits of performance. In practice, you might want to use a power of 2, rather than a power of 10, as your multiplier (but a power of 10 helps with debugging and readability). Note that this brings your number of bytes per position down from 120 to 24. If you do go down this route, consider defining a Position class using __slots__ to tell Python how to allocate memory, and add xpos and ypos properties to the class, as sketched below. You don't want to litter your code with pos // 1000 and pos % 1000 statements.
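A minimal sketch of such a Position class (illustrative, using the power-of-10 encoding from above):
class Position(object):
    __slots__ = ('pos',)  # no per-instance __dict__, keeping instances small

    def __init__(self, xpos, ypos):
        self.pos = xpos * 1000 + ypos

    @property
    def xpos(self):
        return self.pos // 1000

    @property
    def ypos(self):
        return self.pos % 1000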
The program I've written stores a large amount of data in dictionaries. Specifically, I'm creating 1588 instances of a class, each of which contains 15 dictionaries with 1500 float-to-float mappings. This process uses up the 2 GB of memory on my laptop pretty quickly (I start writing to swap at about the 1000th instance of the class).
My question is, which of the following is using up my memory?
The 34 million or so pairs of floats?
The overhead of 22,500 dictionaries?
The overhead of the 1588 class instances?
To me it seems like the memory hog should be the huge number of floating-point numbers that I'm holding in memory. However, if what I've read so far is correct, each of my floating-point numbers takes up 16 bytes. Since I have 34 million pairs, this should be about 108 million bytes, which should be just over a gigabyte.
Is there something I'm not taking into consideration here?
The floats do take up 16 bytes apiece, and a dict with 1500 entries about 100k:
>>> sys.getsizeof(1.0)
16
>>> d = dict.fromkeys((float(i) for i in range(1500)), 2.0)
>>> sys.getsizeof(d)
98444
so the 22,500 dicts take over 2 GB all by themselves, and the 68 million floats another GB or so. Not sure how you computed that 68 million times 16 equals only about 100 MB -- you may have dropped a zero somewhere.
The class itself takes up a negligible amount, and 1588 instances thereof (net of the objects they refer to, of course, just as getsizeof gives us such net amounts for the dicts) not much more than a smallish dict each, so that's hardly the problem. I.e., with Sic standing in for a trivial class:
>>> sys.getsizeof(Sic)
452
>>> sys.getsizeof(Sic())
32
>>> sys.getsizeof(Sic().__dict__)
524
452 for the class, and (524 + 32) * 1588 ≈ 883K for all the instances; as you see, that's not the worry when you have gigabytes each in dicts and floats.