I'm collecting the system information of the current machine. Part of this information is the RAM and HDD capacity. The problem is that the capacity being gathered is measured in bytes rather than GB.
In a nutshell, how do I convert the display of the internal specifications to resemble what you would see from a consumer/commercial standpoint?
1000GB HDD or 8GB RAM, as opposed to the exact number of bytes available, especially since manufacturers set aside different amounts of space for recovery sectors, RAM can be reserved for integrated graphics, there is the 1000 vs 1024 (decimal vs binary) difference, etc. Here's an example of my current code:
import os
import wmi # Windows Management Instrumentation (WMI) access
import math
c = wmi.WMI()
SYSINFO = c.Win32_ComputerSystem()[0] # Manufacturer/Model/Spec blob
RAMTOTAL = int(SYSINFO.TotalPhysicalMemory) # Gathers only the RAM capacity in bytes.
RAMROUNDED = math.ceil(RAMTOTAL / 2000000000.) * 2.000000000 # attempts to round bytes to nearest, even, GB.
HDDINFO = c.Win32_DiskDrive()[0] # first physical disk (assumed; the original snippet never defines HDDINFO)
HDDTOTAL = int(HDDINFO.size) # Gathers only the HDD capacity in bytes.
HDDROUNDED = math.ceil(HDDTOTAL / 2000000000.) * 2.000000000 # attempts to round bytes to nearest, even, GB.
HDDPRNT = "HDD: " + str(HDDROUNDED) + "GB"
RAMPRNT = "RAM: " + str(RAMROUNDED) + "GB"
print(HDDPRNT)
print(RAMPRNT)
The area of interest is the RAMROUNDED and HDDROUNDED lines, where I'm rounding up to the nearest even number, since the internal size of RAM/HDD is always lower than advertised for the reasons mentioned previously. StackOverflow posts have gotten me this method, which is the most accurate across the most machines, but it's still hard-coded, meaning the HDD only rounds accurately for either hundreds of GB or thousands, not both. Also, the RAM isn't 100% accurate.
Here are a couple of workarounds that come to mind that would produce the results I'm looking for:
Adding additional commands to RAMTOTAL that may or may not be available, allowing for GB output instead of KB. However, I would prefer it to be a part of the WMI import rather than straight native Windows code.
Figure out a more conditional method of rounding, i.e.: if HDDTOTAL > 1TB, round up to decimal point X; else if HDDTOTAL < 1TB, use a different rounding method.
I think you could write a simple function that solves it. In case the number in kB would be significantly smaller or greater, I added the possibility of different suffixes (this is inspired by a very similar example in the book Dive Into Python 3). It might look something like this:
def round(x):
    # note: this shadows the built-in round()
    suffixes = ('kB', 'MB', 'GB', 'TB')
    a = 0
    while x > 1000:
        a += 1  # this will go up the suffixes tuple with each division
        x = x / 1000
    return math.ceil(x), suffixes[a]
Results of this function might look like this:
>>> print(round(19276246))
(20, 'GB')
>>> print(round(135565666656))
(136, 'TB')
>>> print(round(1355))
(2, 'MB')
and you could integrate it into your code like this:
import os
import wmi # Windows Management Instrumentation (WMI) access
import math
def round(x):
    # note: this shadows the built-in round()
    suffixes = ('kB', 'MB', 'GB', 'TB')
    a = 0
    while x > 1000:
        a += 1  # this will go up the suffixes tuple with each division
        x = x / 1000
    return math.ceil(x), suffixes[a]
...
RAMROUNDED = round(RAMTOTAL)  # rounds up and returns a (value, suffix) tuple
HDDTOTAL = int(HDDINFO.size)  # gathers only the HDD capacity in bytes
HDDROUNDED = round(HDDTOTAL)  # rounds up and returns a (value, suffix) tuple
HDDPRNT = "HDD: " + str(HDDROUNDED[0]) + HDDROUNDED[1]
RAMPRNT = "RAM: " + str(RAMROUNDED[0]) + RAMROUNDED[1]
print(HDDPRNT)
print(RAMPRNT)
PowerShell has a lot of very powerful native math capabilities built in, allowing us to do things like divide by 1GB to get the whole number in gigabytes of a particular drive.
So, to see the total physical memory rounded to whole gigabytes, this is how to do it:
get-wmiobject -Class Win32_ComputerSystem |
select @{Name='Ram(GB)';Expression={[int]($_.TotalPhysicalMemory /1GB)}}
This method is called a Calculated Property. The way it differs from using a regular select statement (like Select TotalPhysicalMemory) is that I'm telling PowerShell to make a new property called Ram(GB) and use the following expression to determine its value.
[int]($_.TotalPhysicalMemory /1GB)
The expression I'm using begins in the parentheses, where I'm getting the TotalPhysicalMemory (which returns as 17080483840). I then divide by 1GB to give me 15.9074401855469. Finally, I apply [int] to cast the whole thing as an integer, that is to say, make it a whole number, rounding as appropriate.
Here is the output
Ram(GB)
-------
16
I used a combination of the two previous suggestions.
I used an if/elif block rather than a while loop, but got the same results. I also mirrored the internal process of the suggested PowerShell commands to keep the script native to Python, without adding modules/dependencies.
GBasMB = int(1000000000)  # bytes per decimal GB, for bytes-to-GB conversion
global RAMSTRROUNDED
RAMTOTAL = int(SYSINFO.TotalPhysicalMemory) / (GBasMB)  # RAM in GB, by dividing by the constant above
RAMROUNDED = math.ceil(RAMTOTAL / 2.) * 2  # rounds up to the nearest even whole number
RAMSTRROUNDED = int(RAMROUNDED)  # final converted value
HDDTOTAL = int(HDDINFO.size) / (GBasMB)  # same process for the hard drive (HDDINFO gathered earlier via WMI)
HDDROUNDED = math.ceil(HDDTOTAL / 2.) * 2  # round up to the nearest even whole number
def ROUNDHDDTBORGB():  # determines whether to report the HDD as TB or GB
    global HDDTBORGBOUTPUT
    global HDDPRNT
    if HDDROUNDED >= 1000:  # if equal to or greater than 1000GB, list as TB
        HDDTBORGB = HDDROUNDED * .001
        HDDTBORGBOUTPUT = str(HDDTBORGB) + "TB"
        HDDPRNT = "HDD: " + str(HDDTBORGBOUTPUT)
        print(HDDPRNT)
    elif HDDROUNDED < 1000:  # if less than 1000GB, list as GB
        HDDTBORGBOUTPUT = str(str(HDDROUNDED) + "GB")
        HDDPRNT = "HDD: " + str(HDDTBORGBOUTPUT)
I've run this script on several dozen computers and it seems to accurately gather the appropriate RAM and HDD capacities, regardless of how much RAM the integrated graphics decides to consume, reserved sectors on the HDD, etc.
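For completeness, a hypothetical sketch of how these pieces might be wired together (the function call and the RAM print line are not shown in the snippet above, so this part is an assumption):

ROUNDHDDTBORGB()                               # builds (and, for >= 1TB, prints) HDDPRNT
RAMPRNT = "RAM: " + str(RAMSTRROUNDED) + "GB"
print(RAMPRNT)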
Related
I have an assignment to use a greedy approach to solve TSP. The problem has 33708 cities. Because I had a lot of helpful tools for this from the previous assignment, I decided to reuse that approach and precompute the distances.
So that is barely more than half a billion entries (33708 choose 2), each comfortably fitting in a float32. The x and y coordinates, likewise, are numbers $|n| < 10000$ with no more than 4 decimal places.
My python for the same was:
def get_distance(left, right):
    """ return the euclidean distance between tuples left and right, which are coordinates"""
    return ((left[0] - right[0]) ** 2 + (left[1] - right[1]) ** 2) ** 0.5

# precompute all distances
distances = {}
for i in range(len(cities)):
    for j in range(i + 1, len(cities)):
        d = get_distance(cities[i], cities[j])
        distances[frozenset((i, j))] = d
and I expected this to occupy (3 * 32 bits) * 568M ≈ 6.7 gigabytes of memory. But in fact, watching the live runtime in my Jupyter notebook, it appears to be shooting past even 35GB (442s and counting). I had to kill it as I was well into my swap space and it slowed down a lot. Anyone know why this is so surprisingly large?
update: trying again with tuple(sorted((i,j))) -- but already at 110s it is 15GB and counting
sizes
>>> import sys
>>> a = frozenset((1,2))
>>> sys.getsizeof(a)
216
>>> sys.getsizeof(tuple(sorted((1,2))))
56
>>> sys.getsizeof(1)
28
is there anything like float32 and int16 in python?? -- ans: numpy has them
updated attempt:
from numpy import float32, int16
from itertools import combinations
import sys
def get_distance(left, right):
    """ return the euclidean distance between tuples left and right, which are coordinates"""
    return float32(((left[0] - right[0]) ** 2 + (left[1] - right[1]) ** 2) ** 0.5)

# precompute all distances
distances = {}
for i, j in combinations(range(len(cities)), 2):
    distances[tuple(sorted((int16(i), int16(j))))] = get_distance(cities[i], cities[j])

print(sys.getsizeof(distances))
observed sizes:
with cities = cities[:2] : 232
with cities = cities[:3] : also 232
with cities = cities[:10] : 2272
with cities = cities[:100] : 147552
with cities = cities[:1000] : 20971608 (20MB)
with cities = cities[:10000] : 2684354656 (2.6GB)
note that the dict's size grows somewhat faster than the number of entries, even as we approach 50 million entries, i.e. 10000 choose 2 (about 10% of the total number of pairs):
2684354656/(10000 choose 2 / 1000 choose 2 * 20971608) ≈ 1.27
20971608/(1000 choose 2 / 100 choose 2 * 147552) ≈ 1.4
I decided to halt my attempt at the full cities list, as my OS snapshot of the memory grew to well over 30GB and I was about to start swapping. This means that, even if the final object ends up that big, the amount of memory the notebook requires is much larger still.
Python objects have an overhead because of dynamic typing and reference counting. The absolute minimal object, object(), has a size of 16 bytes (on 64-bit machines): an 8-byte reference count and an 8-byte type pointer. No Python object can be smaller than that. float and int are slightly larger, at least 24 bytes. A list is at least an array of pointers, which adds an additional 8 bytes per element. So the smallest possible memory footprint of a list of half a billion ints is 32 * 500_000_000 ≈ 16 GB. Sets and dicts are even larger than that, since they store more than just one pointer per element.
Use numpy (maybe the stdlib array module is already enough).
(Note: boxed numpy scalars such as numpy.float32 are still full Python objects, so they are not any smaller; only numpy arrays avoid the per-element overhead.)
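To illustrate that suggestion, here is a minimal sketch (my own, under the assumption that cities is a list of (x, y) tuples) that stores all pairwise distances in one flat float32 numpy array instead of a dict; the pair-to-index formula is the usual condensed upper-triangle layout:

import numpy as np

def precompute_distances(cities):
    coords = np.asarray(cities, dtype=np.float32)         # shape (n, 2)
    n = len(coords)
    out = np.empty(n * (n - 1) // 2, dtype=np.float32)    # one float32 per unordered pair
    k = 0
    for i in range(n - 1):
        diff = coords[i + 1:] - coords[i]                  # vectorized differences to all later cities
        m = n - 1 - i
        out[k:k + m] = np.sqrt((diff * diff).sum(axis=1))
        k += m
    return out

def pair_index(i, j, n):
    # position of the (i, j) pair, with i < j, in the condensed array above
    return i * n - i * (i + 1) // 2 + (j - i - 1)

At 4 bytes per entry, the full 33708-city table would be roughly 568e6 * 4 bytes ≈ 2.3 GB, with no per-entry Python object overhead.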
I have a data frame of about 19 million rows, 4 of whose columns are latitudes & longitudes. I created a function to calculate the distance between the coordinate pairs with the help of the Python haversine package.
from haversine import haversine_vector, Unit

# function to calculate the distance between 2 sets of coordinates
def measure_distance(lat_1, long_1, lat_2, long_2):
    coordinate_start = list(zip(lat_1, long_1))
    coordinate_end = list(zip(lat_2, long_2))
    distance = haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
    return distance
I use the magic command %%memit to measure the memory usage of the calculation. On average, memory usage is between 8-10 GB. I run my work on Google Colab, which has 12GB of RAM; as a result, the operation sometimes hits the runtime limit and restarts.
%%memit
measure_distance(df.station_latitude_start.values,
df.station_longitude_start.values,
df.station_latitude_end.values,
df.station_longitude_end.values)
peak memory: 7981.16 MiB, increment: 5312.66 MiB
Is there a way to optimise my code?
TL;DR: use Numpy and compute the result by chunk.
The amount of memory taken by the CPython interpreter is expected given the big input size.
Indeed, CPython stores values in lists using references. On a 64-bit system, a reference takes 8 bytes and basic types (float and small integers) usually take 32 bytes. A tuple of two floats is a compound type that contains the size of the tuple as well as references to the two floats (not the values themselves). Its size should be close to 64 bytes. Since you have 2 lists containing 19 million (references to) float pairs and 4 lists containing 19 million (references to) floats, the resulting memory taken should be about 4*19e6*(8+32) + 2*19e6*(8+64) = 5.7 GB. Not to mention that Haversine can make some internal copies and the result takes some space too.
If you want to reduce the memory usage, then use Numpy. Indeed, float Numpy arrays store values in a much more compact way (no references, no internal tags). You can replace the lists of tuples with N x 2 Numpy 2D arrays. The resulting size should be about 4*19e6*8 + 2*19e6*(8*2) = 1.2 GB. Moreover, the computation will be much faster since Haversine uses Numpy internally. Here is an example:
import numpy as np
# Assume lat_1, long_1, lat_2 and long_2 are of type np.array.
# Use np.array(yourList) if you want to convert it.
def measure_distance(lat_1, long_1, lat_2, long_2):
    coordinate_start = np.column_stack((lat_1, long_1))
    coordinate_end = np.column_stack((lat_2, long_2))
    return haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
The above code is about 25 times faster.
If you want to reduce the memory usage even further, you can compute the result in chunks (for example 32K values at a time) and then concatenate the output chunks. You can also use single-precision numbers rather than double precision if you do not care too much about the accuracy of the computed distances.
Here is an example of how to compute the result by chunk:
def better_measure_distance(lat_1, long_1, lat_2, long_2):
    chunk_size = 65536
    result = np.zeros(len(lat_1))
    for i in range(0, len(lat_1), chunk_size):
        coordinate_start = np.column_stack((lat_1[i:i+chunk_size], long_1[i:i+chunk_size]))
        coordinate_end = np.column_stack((lat_2[i:i+chunk_size], long_2[i:i+chunk_size]))
        result[i:i+chunk_size] = haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
    return result
On my machine, using double precision, the above code takes about 800 MB while the initial implementation takes 8 GB. Thus, 10 times less memory! It is also still 23 times faster! Using single precision, the above code takes about 500 MB, so 16 times less memory, and it is 48 times faster!
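For reference, a hedged sketch of what the single-precision variant could look like (assuming the accuracy loss is acceptable; haversine_vector and Unit are the same names from the haversine package used above):

import numpy as np
from haversine import haversine_vector, Unit

def single_precision_measure_distance(lat_1, long_1, lat_2, long_2, chunk_size=65536):
    # cast the inputs and the output buffer to float32, roughly halving the memory footprint
    lat_1 = np.asarray(lat_1, dtype=np.float32)
    long_1 = np.asarray(long_1, dtype=np.float32)
    lat_2 = np.asarray(lat_2, dtype=np.float32)
    long_2 = np.asarray(long_2, dtype=np.float32)
    result = np.zeros(len(lat_1), dtype=np.float32)
    for i in range(0, len(lat_1), chunk_size):
        coordinate_start = np.column_stack((lat_1[i:i+chunk_size], long_1[i:i+chunk_size]))
        coordinate_end = np.column_stack((lat_2[i:i+chunk_size], long_2[i:i+chunk_size]))
        result[i:i+chunk_size] = haversine_vector(coordinate_start, coordinate_end, Unit.KILOMETERS)
    return result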
I'm writing a toy rsync-like tool in Python. Like many similar tools, it will first use a very fast hash as the rolling hash, and then a SHA256 once a match has been found (but the latter is off-topic here: SHA256, MD5, etc. are too slow as a rolling hash).
I'm currently testing various fast hash methods:
import os, random, time
block_size = 1024 # 1 KB blocks
total_size = 10*1024*1024 # 10 MB random bytes
s = os.urandom(total_size)
t0 = time.time()
for i in range(len(s)-block_size):
    h = hash(s[i:i+block_size])
print('rolling hashes computed in %.1f sec (%.1f MB/s)' % (time.time()-t0, total_size/1024/1024/(time.time()-t0)))
I get: 0.8 MB/s ... so the Python built-in hash(...) function is too slow here.
Which solution would allow a faster hash of at least 10 MB/s on a standard machine?
I tried with
import zlib
...
h = zlib.adler32(s[i:i+block_size])
but it's not much better (1.1 MB/s)
I tried with sum(s[i:i+block_size]) % modulo and it's slow too
Interesting fact: even without any hash function, the loop itself is slow!
t0 = time.time()
for i in range(len(s)-block_size):
    s[i:i+block_size]
I get: 3.0 MB/s only! So the simple fact of having a loop accessing a rolling block of s is already slow.
Instead of reinventing the wheel and writing my own hash (or using a custom Rabin-Karp algorithm), what would you suggest, first to speed up this loop, and then as a hash?
Edit: (Partial) solution for the "Interesting fact" slow loop above:
import os, random, time, zlib
from numba import jit
@jit()
def main(s):
    for i in range(len(s)-block_size):
        block = s[i:i+block_size]
total_size = 10*1024*1024 # 10 MB random bytes
block_size = 1024 # 1 KB blocks
s = os.urandom(total_size)
t0 = time.time()
main(s)
print('rolling hashes computed in %.1f sec (%.1f MB/s)' % (time.time()-t0, total_size/1024/1024/(time.time()-t0)))
With Numba, there is a massive improvement: 40.0 MB/s, but still no hash done here. At least we're not blocked at 3 MB/s.
Instead of reinventing the wheel and writing my own hash (or using a custom Rabin-Karp algorithm), what would you suggest, first to speed up this loop, and then as a hash?
It's always great to start with this mentality, but it seems that you didn't get the idea of rolling hashes.
What makes a hashing function great for rolling is its capability of reusing the previous processing.
A few hash functions allow a rolling hash to be computed very
quickly—the new hash value is rapidly calculated given only the old
hash value, the old value removed from the window, and the new value
added to the window.
From the Wikipedia page on rolling hashes.
It's hard to compare performance across different machines without timeit, but I changed your script to use simple polynomial hashing with a prime modulus (it would be even faster with a Mersenne prime, because the modulo operation could then be done with binary operations):
import os, random, time

block_size = 1024              # 1 KB blocks
total_size = 10*1024*1024      # 10 MB random bytes
s = os.urandom(total_size)

base = 256
mod = int(1e9)+7

def extend(previous_mod, byte):
    # append one byte on the right of the window (bytes indexing yields ints in Python 3)
    return ((previous_mod * base) + byte) % mod

most_significant = pow(base, block_size-1, mod)

def remove_left(previous_mod, byte):
    # remove the contribution of the byte falling out of the window on the left
    return (previous_mod - (most_significant * byte) % mod) % mod

def start_hash(data):
    h = 0
    for b in data:
        h = extend(h, b)
    return h

t0 = time.time()
h = start_hash(s[:block_size])
for i in range(block_size, len(s)):
    h = remove_left(h, s[i - block_size])
    h = extend(h, s[i])
print('rolling hashes computed in %.1f sec (%.1f MB/s)' % (time.time()-t0, total_size/1024/1024/(time.time()-t0)))
Apparently you achieved quite an improvement with Numba, and it may speed up this code as well.
To extract more performance you may want to write a C (or another low-level language such as Rust) function that processes a big slice of the data at a time and returns an array of hashes.
I'm creating an rsync-like tool as well, but as I'm writing it in Rust, performance at this level isn't a concern of mine. Instead, I'm following the tips of the creator of rsync and trying to parallelize everything I can, a painful task to do in Python (probably impossible without Jython).
what would you suggest, first to speed up this loop, and then as a hash?
Increase the blocksize. The smaller your blocksize, the more Python you'll be executing per byte, and the slower it will be.
edit: your range has the default step of 1 and you don't multiply i by block_size, so instead of iterating over 10*1024 non-overlapping 1 KB blocks, you're iterating over 10 million - 1024 mostly overlapping blocks.
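To illustrate the point, a sketch of the non-overlapping version of that benchmark loop (stepping by block_size rather than by 1):

for i in range(0, len(s) - block_size + 1, block_size):
    h = hash(s[i:i + block_size])   # roughly 10*1024 iterations instead of ~10 million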
First, your slow loop. As has been mentioned, you are slicing a new block for every byte (minus blocksize) in the stream. That is a lot of work on both CPU and memory.
A faster loop would be to pre-chunk the data into parallel bits.
chunksize = 4096  # suggestion; blocksize (e.g. 1024) is assumed to be defined elsewhere

# roll the window over the previous chunk's last block into the new chunk
lastblock = None
for readchunk in read_file_chunks(chunksize):   # read_file_chunks is a placeholder generator
    for i in range(0, len(readchunk), blocksize):
        # slice a block only once
        newblock = readchunk[i:i + blocksize]
        if lastblock:
            for bi in range(len(newblock)):
                outbyte = lastblock[bi]
                inbyte = newblock[bi]
                # update rolling hash with inbyte and outbyte
                # check rolling hash for "hit"
        else:
            pass  # calculate initial weak hash, check for "hit"
        lastblock = newblock
Chunksize should be a multiple of blocksize
Next, you were calculating a "rolling hash" over the entirety of each block in turn, instead of updating the hash byte by byte in "rolling" fashion. That is immensely slower. The above loop forces you to deal with the bytes as they go in and out of the window. Still, my trials show pretty poor throughput (3 Mbps; edit: sorry, that's 3 MiB/s) even with a modest number of arithmetic operations on each byte. Edit: I initially had a zip() and that appears rather slow; I got more than double the throughput for the loop alone without the zip (current code above).
Python is single-threaded and interpreted. I see one CPU pegged and that is the bottleneck. To get faster you'll want multiple processes (e.g. subprocess), or to break into C, or both. Simply running the math in C would probably be enough, I think. (Haha, "simply")
I have these two small programs:
1.
x = 1000
while (1000 * x != x):
    x = 1000 * x
print("Done")
2.
x = 1000.0
while (1000.0 * x != x):
    x = 1000.0 * x
print("Done")
I am trying to make an informed guess about how these programs would execute. I thought that, as integers are stored in 4 bytes (32 bits), the first program would execute the loop until x reaches 2^31 and then maybe give an error. And I guessed that the second loop would go on forever, as floats can store more information than ints.
My guess couldn't have been more wrong. The first one seems to go on forever, whereas the second exits the loop and prints "Done" after x reaches approximately 10^308; this is when x takes the value inf (presumably infinity).
I can't understand how this works, any explanation would be appreciated. Thank you!
The first example with integers will loop until no memory is available (in which case the process will stop or the machine will swap to death):
x = 1000
while (1000 * x != x):
    x = 1000 * x
because integers don't have a fixed size in Python; they just use all the memory available (within the process address range).
In the second example you're multiplying your floating point value, which has a limit because it uses the processor's floating point format, 8 bytes (a Python float generally uses the C double type).
After reaching the max value, it overflows to inf (infinite) and in that case
1000 * inf == inf
small interactive demo:
>>> f = 10.0**308
>>> f*2
inf
>>> f*2 == f*1000
True
>>>
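For reference, the float limit itself can be inspected with the standard library's sys.float_info:

>>> import sys
>>> sys.float_info.max
1.7976931348623157e+308
>>> sys.float_info.max * 2
inf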
From this article:
When a variable is initialized with an integer value, that value becomes an integer object, and the variable points to it (references the object).
Python removes this confusion, there is only the integer object. Does it have any limits? Very early versions of Python had a limit that was later removed. The limits now are set by the amount of memory you have in your computer. If you want to create an astronomical integer 5,000 digits long, go ahead. Typing it or reading it will be the only problem! How does Python do all of this? It automatically manages the integer object, which is initially set to 32 bits for speed. If it exceeds 32 bits, then Python increases its size as needed up to the RAM limit.
So example 1 will run as long as your computer has the RAM.
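A small interactive illustration of that growth (the exact sizes shown are typical for 64-bit CPython):

>>> import sys
>>> sys.getsizeof(1000)        # a small int object
28
>>> x = 1000 ** 1000           # a 3001-digit integer
>>> len(str(x))
3001
>>> sys.getsizeof(x) > 1000    # the object itself grows with the value
True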
Setup
Python 2.6
Ubuntu x64
I have a set of unique integers with values between 1 and 50 million. New integers are added at random e.g. numberset.add(random.randint(1, 50000000)). I need to be able to quickly add new integers and quickly check if an integer is already present.
Problem
After a while, the set grows too large for my low memory system and I experience MemoryErrors.
Question
How can I achieve this while using less memory? What's the fastest way to do this using the disk without reconfiguring the system e.g. swapfiles? Should I use a database file like sqlite? Is there a library that will compress the integers in memory?
You can avoid dependencies on 3rd-party bit-array modules by writing your own -- the functionality required is rather minimal:
import array
BITS_PER_ITEM = array.array('I').itemsize * 8
def make_bit_array(num_bits, initially=0):
    num_items = (num_bits + BITS_PER_ITEM - 1) // BITS_PER_ITEM
    return array.array('I', [initially]) * num_items

def set_bit(bit_array, offset):
    item_index = offset // BITS_PER_ITEM
    bit_index = offset % BITS_PER_ITEM
    bit_array[item_index] |= 1 << bit_index

def clear_bit(bit_array, offset):
    item_index = offset // BITS_PER_ITEM
    bit_index = offset % BITS_PER_ITEM
    bit_array[item_index] &= ~(1 << bit_index)

def get_bit(bit_array, offset):
    item_index = offset // BITS_PER_ITEM
    bit_index = offset % BITS_PER_ITEM
    return (bit_array[item_index] >> bit_index) & 1
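A short usage sketch under the question's constraints (assuming the integer values 1..50,000,000 are mapped directly to bit offsets):

import random

bits = make_bit_array(50000001)      # covers offsets 0..50,000,000, about 6 MB
for _ in range(100):
    v = random.randint(1, 50000000)
    if not get_bit(bits, v):         # membership test
        set_bit(bits, v)             # add the integer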
Use a bit-array. This will reduce the need for a huge amount of space.
Related SO question:
Python equivalent to Java's BitSet
Use an array of bits as flags for each integer - the memory needed will be only 50 million bits (about 6 MB). There are a few modules that can help. This example uses bitstring, another option is bitarray:
import random
from bitstring import BitArray

i = BitArray(50000000)       # initialise 50 million zero bits
for x in xrange(100):
    v = random.randint(1, 50000000)
    if not i[v]:             # test if it's already present
        i.set(1, v)          # set a single bit
Setting and checking bits is very fast and it uses very little memory.
Try the array module.
Depending on your requirements, you might also consider a bloom filter. It is a memory-efficient data structure for testing whether an element is in a set. The catch is that it can give false positives, though it will never give false negatives.
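A minimal sketch of the idea (my own illustration, not a tuned implementation; the bit-array size and the salted-sha256 hashing scheme are arbitrary choices):

import hashlib

class BloomFilter(object):
    def __init__(self, num_bits=8 * 1024 * 1024, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)     # 1 MB of flags for 8M bits

    def _positions(self, value):
        # derive num_hashes bit positions from salted sha256 digests
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%d" % (salt, value)).encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, value):
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(value))

Membership is then checked with the in operator: a value that was added always tests True, while a value that was never added tests False except with a small false-positive probability that depends on the sizes chosen.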
If the integers are unique then use bits. Example: binary 01011111 (reading the bits left to right, with indices starting at 0) means that these are present: 1, 3, 4, 5, 6 and 7. This way every bit is used to check whether its integer index is used (value 1) or not (value 0).
It was described in one chapter of "Programming Pearls" by Jon Bentley (look for "The file contains at most ten million records; each record is a seven-digit integer.")
It seems that the bitarray module mentioned by Emil works this way.