Possibility of memory error? - python

a = raw_input()
prefix_dict = {}
for j in xrange(1, len(a) + 1):
    prefix = a[:j]
    prefix_dict[prefix] = len(prefix)
print prefix_dict
Is there any possibility of a memory error in the above code? This code runs on a server: a quad-core Xeon machine running 32-bit Ubuntu (Ubuntu 12.04 LTS). For some cases it works, and for others it shows a memory error. FYI: I do not know the cases they are testing, but the inputs are lowercase letters. Size of input <= 10,000

The amount of memory for just the string data alone will be 1 + 2 + 3 + ... + (n-2) + (n-1) + n, where n is the length of the input, in other words len(a). That works out to (n+1) * n/2 characters. If n is 10,000, this is about 50 MB of string data, plus however much RAM a Python dictionary uses to store 10,000 entries. Testing on my OS X box, that overhead seems minimal; indeed, if I run this code, the process shows 53.9 MB used:
s = "a"  # renamed from "str" to avoid shadowing the built-in
d = {}
for i in xrange(10000):
    d[s] = i
    s = s + "a"
I don't see anything overtly wrong with your code, and when I run it on a string that is 10,000 letters long, it happily spits out about 50 MB of output, so something else must be going wrong.
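The 50 MB figure is just the triangular-number sum, which is easy to sanity-check (Python 3 here for brevity):

```python
n = 10000
# 1 + 2 + ... + n characters across all the prefixes
total_chars = n * (n + 1) // 2
print(total_chars)  # 50005000, i.e. roughly 50 MB at one byte per character
```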
What does top show as memory usage for the process?

Maybe a smaller piece of code will help:
prefix_dict = {a[:j]: j for j in xrange(1, len(a) + 1)}
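For reference, here is the same comprehension in Python 3 (range instead of xrange), run on a short sample input:

```python
a = "abc"
prefix_dict = {a[:j]: j for j in range(1, len(a) + 1)}
print(prefix_dict)  # {'a': 1, 'ab': 2, 'abc': 3}
```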


Range of int and float in Python

I have these two small programs:
1.
x = 1000
while (1000 * x != x):
    x = 1000 * x
print("Done")
2.
x = 1000.0
while (1000.0 * x != x):
    x = 1000.0 * x
print("Done")
I am trying to make an informed guess about how these programs will execute. I thought that since integers are stored in 4 bytes (32 bits), the first program would run its loop until x reached 2^31 and then perhaps raise an error. And I guessed that the second loop would go on forever, since floats can store more information than ints.
My guess couldn't have been more wrong. The first one seems to go on forever, whereas the second exits the loop and prints "Done" after x reaches approximately 10^308, which is when x takes the value inf (presumably infinity).
I can't understand how this works; any explanation would be appreciated. Thank you!
The first example, with integers, will loop until no memory is available (at which point the process will be killed or the machine will swap to death):
x = 1000
while (1000 * x != x):
    x = 1000 * x
because integers don't have a fixed size in Python; they can keep growing until they use all the available memory (in the process address range).
In the second example you're multiplying a floating-point value, which does have a limit, because it uses the processor's floating point in 8 bytes (a Python float generally uses the C double type).
After reaching the maximum value, it overflows to inf (infinity), and in that case
1000 * inf == inf
small interactive demo:
>>> f = 10.0**308
>>> f*2
inf
>>> f*2 == f*1000
True
>>>
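The same check can be scripted; this sketch runs the float loop from the question to completion and confirms that x ends up at infinity:

```python
import math

x = 1000.0
while 1000.0 * x != x:
    x = 1000.0 * x  # overflows to inf after roughly a hundred multiplications
print(math.isinf(x))  # True
```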
From this article:
When a variable is initialized with an integer value, that value becomes an integer object, and the variable points to it (references the object).
Python removes this confusion, there is only the integer object. Does it have any limits? Very early versions of Python had a limit that was later removed. The limits now are set by the amount of memory you have in your computer. If you want to create an astronomical integer 5,000 digits long, go ahead. Typing it or reading it will be the only problem! How does Python do all of this? It automatically manages the integer object, which is initially set to 32 bits for speed. If it exceeds 32 bits, then Python increases its size as needed up to the RAM limit.
So example 1 will run as long as your computer has the RAM.
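To see the arbitrary-precision behavior directly, you can grow an int far past 64 bits and check that it is still exact (a small Python 3 sketch):

```python
x = 1000
for _ in range(50):
    x = 1000 * x  # x is now 1000**51, about 10**153
print(x.bit_length())   # 509 bits: far beyond any fixed-width integer type
print(x == 1000 ** 51)  # True: still exact, no overflow
```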

Round system RAM and HDD bytes, UP to nearest even gigabyte. Python

I'm collecting the system information of the current machine. Part of this information is the RAM and HDD capacity. The problem is that the capacity is gathered in bytes rather than GB.
In a nutshell, how do I convert the displayed internal specifications to resemble what you would see from a consumer/commercial standpoint?
That is, "1000GB HDD" or "8GB RAM", as opposed to the exact number of bytes available. This matters because manufacturers set aside different amounts of recovery sectors, RAM can be claimed by integrated graphics, there's the 1000-vs-1024 binary differential, etc. Here's an example of my current code:
import os
import wmi # import native PowerShell/WMI functionality
import math
c = wmi.WMI()
SYSINFO = c.Win32_ComputerSystem()[0] # Manufacturer/Model/Spec blob
HDDINFO = c.Win32_DiskDrive()[0] # (missing from the original snippet; presumably obtained like this)
RAMTOTAL = int(SYSINFO.TotalPhysicalMemory) # Gathers only the RAM capacity, in bytes.
RAMROUNDED = math.ceil(RAMTOTAL / 2000000000.) * 2.000000000 # attempts to round bytes to the nearest even GB
HDDTOTAL = int(HDDINFO.size) # Gathers only the HDD capacity, in bytes.
HDDROUNDED = math.ceil(HDDTOTAL / 2000000000.) * 2.000000000 # attempts to round bytes to the nearest even GB
HDDPRNT = "HDD: " + str(HDDROUNDED) + "GB"
RAMPRNT = "RAM: " + str(RAMROUNDED) + "GB"
print(HDDPRNT)
print(RAMPRNT)
The area of interest is lines 8-11, where I'm rounding up to the nearest even number, since the internal size of RAM/HDD is always lower than advertised for the reasons mentioned previously. StackOverflow posts have gotten me this method, which is the most accurate across the most machines, but it's still hard-coded, meaning the HDD only rounds accurately for either hundreds of GB or thousands, not both. Also, the RAM isn't 100% accurate.
Here's a couple workarounds that come to mind that will produce the results I'm looking for:
Adding additional commands to RAMTOTAL that may or may not be available, allowing for GB output instead of KB. However, I would prefer it to be a part of the WMI import rather than straight native Windows code.
Figuring out a more static method of rounding, e.g.: if HDDTOTAL > 1TB, round up to decimal point X; else if HDDTOTAL < 1TB, use a different rounding method.
I think you could write a simple function that solves it. In case the number in kB is significantly smaller or greater, I added the possibility of different suffixes (inspired by a very similar example in the book Dive Into Python 3). It might look something like this:
def round(x):
    suffixes = ('kB', 'MB', 'GB', 'TB')
    a = 0
    while x > 1000:
        a += 1  # this will go up the suffixes tuple with each division
        x = x / 1000
    return math.ceil(x), suffixes[a]
Results of this function might look like this:
>>> print(round(19276246))
(20, 'GB')
>>> print(round(135565666656))
(136, 'TB')
>>> print(round(1355))
(2, 'MB')
and you could work it into your code like this:
import os
import wmi #import native powershell functionality
import math
def round(x):
    suffixes = ('kB', 'MB', 'GB', 'TB')
    a = 0
    while x > 1000:
        a += 1  # this will go up the suffixes tuple for each division
        x = x / 1000
    return math.ceil(x), suffixes[a]
.
.
.
RAMROUNDED = round(RAMTOTAL) #attempts to round bytes to nearest, even, GB.
HDDTOTAL = int(HDDINFO.size) # Gathers only the HDD capacity in bytes.
HDDROUNDED = round(HDDTOTAL) #attempts to round bytes to nearest, even, GB.
HDDPRNT = "HDD: " + str(HDDROUNDED[0]) + HDDROUNDED[1]
RAMPRNT = "RAM: " + str(RAMROUNDED[0]) + RAMROUNDED[1]
print(HDDPRNT)
print(RAMPRNT)
PowerShell has a lot of very powerful native math capabilities built in, allowing us to do things like divide by 1GB to get the whole number in gigabytes of a particular drive.
So, to see the total physical memory rounded by 1 GB, this is how to do it:
get-wmiobject -Class Win32_ComputerSystem |
select @{Name='Ram(GB)';Expression={[int]($_.TotalPhysicalMemory /1GB)}}
This method is called a calculated property. The way it differs from a regular select statement (like Select TotalPhysicalMemory) is that I'm telling PowerShell to make a new property called Ram(GB) and use the following expression to determine its value.
[int]($_.TotalPhysicalMemory /1GB)
The expression begins in the parentheses, where I'm getting TotalPhysicalMemory (which comes back as 17080483840). I then divide by 1GB to get 15.9074401855469. Finally, I apply [int] to cast the whole thing to an integer, that is, to make it a whole number, rounding as appropriate.
Here is the output
>Ram(GB)
-------
16
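A rough Python equivalent of that calculated property, for comparison (the 17080483840 sample value comes from the answer above; PowerShell's 1GB constant is 2**30 bytes):

```python
def bytes_to_gb(nbytes):
    # mirrors [int]($_.TotalPhysicalMemory / 1GB): divide by 2**30, round to a whole number
    return round(nbytes / 2 ** 30)

print(bytes_to_gb(17080483840))  # 16
```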
I used a combination of the two previous suggestions.
I used an if statement rather than a while loop, but I get the same results. I also mirrored the internal process of the suggested PowerShell commands to keep the script native to Python, without adding modules/dependencies.
GBasMB = int(1000000000) # Allows for accurate Bytes to GB conversion
global RAMSTRROUNDED
RAMTOTAL = int(SYSINFO.TotalPhysicalMemory) / (GBasMB) # Outputs GB by dividing by previous MB variable
RAMROUNDED = math.ceil(RAMTOTAL / 2.) * 2 # Rounds up to nearest even whole number
RAMSTRROUNDED = int(RAMROUNDED) # final converted variable
HDDTOTAL = int(HDDINFO.size) / (GBasMB) # Similar process as before for HardDrive
HDDROUNDED = math.ceil(HDDTOTAL / 2.) * 2 # round up to nearest even whole number
def ROUNDHDDTBORGB(): # function for determining TB- or GB-sized HDD
    global HDDTBORGBOUTPUT
    global HDDPRNT
    if HDDROUNDED >= 1000: # if equal to or greater than 1000GB, list as TB
        HDDTBORGB = HDDROUNDED * .001
        HDDTBORGBOUTPUT = str(HDDTBORGB) + "TB"
        HDDPRNT = "HDD: " + str(HDDTBORGBOUTPUT)
        print(HDDPRNT)
    elif HDDROUNDED < 1000: # if less than 1000GB, list as GB
        HDDTBORGBOUTPUT = str(HDDROUNDED) + "GB"
        HDDPRNT = "HDD: " + str(HDDTBORGBOUTPUT)
I've run this script on several dozen computers, and it seems to accurately gather the appropriate RAM and HDD capacities, regardless of how much RAM the integrated graphics decides to consume, reserved sectors on the HDD, and so on.

Itertools.combinations performance issue with large lists

I've got a list of tickers in tickers_list, and I need to generate all combinations of the unique elements.
Here's my code:
corr_list = []
i = 0
for ticker in tickers_list:
    tmp_list = tickers_list[i+1:]
    for tl_ticker in tmp_list:
        corr_list.append({'ticker1': ticker, 'ticker2': tl_ticker})
    i = i + 1
len(corr_list)
Here's the code using itertools:
from itertools import combinations
result = combinations(tickers_list, 2)
new_list = [{'ticker1':comb[0], 'ticker2':comb[1]} for comb in combinations(tickers_list, 2)]
len(new_list)
They produce the exact same results. Of course, the itertools code is much more elegant, and both work perfectly for 10, 100, and 1,000 items; the time required is almost identical (I timed it). However, at 4,000 items, my code takes about 12 seconds to run on my machine while the itertools version crashes my computer. The resulting set is only around 20m rows, so am I doing something wrong, or is this an issue with itertools?
from itertools import combinations, product
# 20 * 20 * 20 == 8000 items
tickers_list = ["".join(chars) for chars in product("ABCDEFGHIJKLMNOPQRST", repeat=3)]
# 8000 * 7999 / 2 == 31996000 (just under 32 million) items
unique_pairs = [{"ticker1":t1, "ticker2":t2} for t1,t2 in combinations(tickers_list, 2)]
works fine on my machine (Win7Pro x64, Python 3.4) up to len(tickers_list) == 8000 (taking 38s and consuming almost 76 GiB of RAM to do so!).
Are you running 32-bit or 64-bit Python? Do you actually need to store all combinations, or can you use and discard them as you go?
Edit: I miscalculated the size; it is actually about 9.5 GiB. Try it yourself:
from sys import getsizeof as size
size(unique_pairs) + len(unique_pairs) * size(unique_pairs[0])
which gives
9479596592 # total size of structure in bytes == 9.5 GiB
In any case, this is not a result of using itertools; it is the memory footprint of the resulting list-of-dicts and would be the same regardless of how you generated it. Your computer can probably handle it anyway using virtual RAM (swapping to disk), though it is much slower.
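If the pairs can be consumed one at a time instead of being stored, memory stays flat, since combinations is a lazy generator; a toy sketch (ticker names are made up):

```python
from itertools import combinations

tickers_list = ["AAPL", "GOOG", "MSFT", "AMZN"]
count = 0
for t1, t2 in combinations(tickers_list, 2):
    # process each pair here (e.g. compute its correlation), then let it go
    count += 1
print(count)  # 4 * 3 / 2 == 6 unique pairs
```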
If you are just doing this to get it into a database, I suggest letting the database do the work: make a table containing the items of tickers_list, then select a cross-join to produce the final table, something like
SELECT a.ticker, b.ticker
FROM
tickers as a,
tickers as b
WHERE
a.ticker < b.ticker

Python prime number code giving runtime error(NZEC) on spoj

I am trying to get an accepted answer for this question: http://www.spoj.com/problems/PRIME1/
It's nothing new: I just want prime numbers to be generated between two given numbers. Eventually, I coded the following, but SPOJ is giving me a runtime error (NZEC), and I have no idea how to deal with it. I hope you can help me with it. Thanks in advance.
def is_prime(m, n):
    myList = []
    mySieve = [True] * (n+1)
    for i in range(2, n+1):
        if mySieve[i]:
            myList.append(i)
            for x in range(i*i, n+1, i):
                mySieve[x] = False
    for a in [y for y in myList if y >= m]:
        print(a)

t = input()
count = 0
while count < int(t):
    m, n = input().split()
    count += 1
    is_prime(int(m), int(n))
    if count == int(t):
        break
    print("\n")
Looking at the problem definition:
In each of the next t lines there are two numbers m and n (1 <= m <= n <= 1000000000, n-m<=100000) separated by a space.
Looking at your code:
mySieve= [True] * (n+1)
So, if n is 1000000000, you're going to try to create a list of 1000000001 boolean values. That means you're asking Python to allocate storage for a billion pointers. On a 64-bit platform, that's 8 GB, which is fine as far as Python is concerned, but might well throw your system into swap hell or get the process killed by a limit or watchdog. On a 32-bit platform, that's 4 GB, which will guarantee you a MemoryError.
The problem also explicitly has this warning:
Warning: large Input/Output data, be careful with certain languages
So, if you want to implement it this way, you're going to have to come up with more compact storage. For example, array.array('B', [True]) * (n+1) will take only 1 GB instead of 4 or 8. And you can make it even smaller (128 MB) if you store the flags as bits instead of bytes, but that's not quite as trivial a change to the code.
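As an illustration of the compact-storage idea (a sketch, not a judge-ready solution for these input sizes), a bytearray sieve keeps one byte per flag instead of one pointer per list element:

```python
def count_primes(n):
    # one byte per flag; a list of bools costs ~8 bytes per pointer on 64-bit
    if n < 2:
        return 0
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    i = 2
    while i * i <= n:
        if sieve[i]:
            # clear every multiple of i starting at i*i
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
        i += 1
    return sum(sieve)

print(count_primes(100))  # 25 primes below 100
```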
Calculating prime numbers between two numbers is meaningless: you can only calculate primes up to a given number by using the other primes you found before, and then show only the range you wanted.
Here is a python code and some calculated primes you can continue by using them:
bzr branch http://bzr.ceremcem.net/calc-primes
This code is somewhat trivial, but it works correctly and is well tested.

NZEC error in python code

My code works perfectly fine on my machine, but it gives an NZEC error when run by SPOJ.
Here is the link to my ques:
http://www.spoj.com/problems/CPRIME/
Here is my code:
def smallPrimes(n):
    """Given an integer n, compute a list of the primes <= n"""
    if n <= 1:
        return []
    sieve = range(3, n+1, 2)
    top = len(sieve)
    for si in sieve:
        if si:
            bottom = (si*si - 3)//2
            if bottom >= top:
                break
            sieve[bottom::si] = [0] * -((bottom-top)//si)
    return [2] + filter(None, sieve)

from math import *
import sys

def main():
    flag = True
    while flag == True:
        x = input()
        if x == 0:
            flag = False
            return 0
        z = x/log(x)
        v = len(smallPrimes(x))
        print round((abs(v-z)*100/v), 1)

if __name__ == "__main__":
    main()
In SPOJ, the NZEC error is raised when an exception occurs during execution of a Python script.
In your case, the input format is well specified and terminates on a zero, and you take that into consideration, so the problem can't be the input.
The error is most likely caused by using more memory than allowed. For this problem the memory limit is specified as 256 MB. But in your code,
sieve = range(3, n+1, 2)
this line declares a list of about n/2 elements. When n = 10^8, that means you will declare a list of 5*10^7 integers which, with a naive approximation ignoring all overheads, will take
(5*10^7)*4 bytes
~ 200 MB
Including the overheads, plus the memory used by your second big list declaration,
[0] * -((bottom-top)//si)
which can reach about 130 MB (again neglecting overheads), you will exceed the memory limit just by storing that many integers in lists. I observed memory usage of about 1 GB from your code on my machine. So your code exceeds the memory limit on SPOJ, and an exception is raised.
The best thing to do is to optimize your approach; declaring lists on the order of 10^8 elements is seldom needed in such questions. I can see a way in which you won't need to declare a list that big, but since this is an online-judge question, it's best to let you figure out the approach. :)
