Why does this Python statement get stuck? - python

Has anyone ever experienced this? The following statement looks like it should work, but it makes my laptop freeze when the exponent reaches 9 or above.
ordered_tuple = tuple(range(10**9))
Every time I run this statement, my laptop slows down and gets stuck, with RAM usage at 100%.
I searched for the reason this happens, but found no suitable answer.
Please help me understand why this code keeps the computer so busy.
Thanks in advance.
I tried each exponent from 1 to 10.
Exponents 1-8 work fairly well, as I expected:
ordered_tuple = tuple(range(10**1))
ordered_tuple = tuple(range(10**2))
ordered_tuple = tuple(range(10**3))
ordered_tuple = tuple(range(10**4))
ordered_tuple = tuple(range(10**5))
ordered_tuple = tuple(range(10**6))
ordered_tuple = tuple(range(10**7))
ordered_tuple = tuple(range(10**8))
whereas 9 and 10 don't work.

10**9 is 10 times as big as 10**8, and 10**10 is 100 times as big as 10**8. Most likely your computer simply does not have enough RAM to hold all these numbers.
PS: A Python int is not a raw machine word. In 64-bit CPython each small int is a full object of about 28 bytes, and the tuple stores an 8-byte pointer per element on top of that, so 10**9 entries come to roughly 36 GB, and 10**10 entries to several hundred GB.
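Note also that range itself is lazy and takes constant memory; it is only the tuple(...) call that forces all 10**9 int objects to exist at once. A small sketch illustrating the difference (sizes shown are for 64-bit CPython and may vary):
import sys

lazy = range(10**9)          # stores only start, stop, and step
print(sys.getsizeof(lazy))   # 48 bytes, no matter how long the range is

# Indexing and membership tests work without materializing anything:
print(lazy[123_456_789])     # 123456789
print(10**8 in lazy)         # True

# tuple(lazy) would allocate 10**9 separate int objects plus an
# 8-byte pointer each inside the tuple -- tens of gigabytes.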

Related

Processing a byte string in chunks

I have a really long byte string such as this (the actual value may be random):
in_var = b'\x01\x02\x03\x04\x05\x06...\xff'
I also have a function that performs an operation on a chunk of bytes and returns the same number of bytes (let's say 10 bytes for this example):
def foo(chunk):
    # do something with chunk
    # ......
    return chunk
I want to process in_var with foo() for all chunks of 10 bytes (sending the last chunk as is if less than 10 bytes remain at the end) and create a new variable out_var with the outputs.
The way I'm currently doing it is taking way too long:
out_var = b''
for chunk in range(0, len(in_var), 10):
    out_var += foo(in_var[chunk: chunk + 10])
The function foo() only takes a fraction of a second per run, so the total (over all chunks of 10) should be fast. However, it's taking an order of magnitude longer.
I also tried this with similar results:
import numpy as np
import math
in_var = np.array_split(np.frombuffer(in_var, dtype=np.uint8), math.ceil(len(in_var) / 10))
out_var = b"".join(map(lambda x: foo(x), in_var))
foo() can only process 10 bytes at a time for this example (e.g. it's an encryption function with a fixed block size), and if a smaller chunk is given to it, it just pads the chunk to 10 bytes. Let's say I have no control over it, and foo() can only process chunks of 10 bytes.
Is there a much faster way to do this? As a last resort, I may have to parallelize my code so all chunks get processed in parallel...
Thank you!
UPDATE:
Apparently I had not correctly measured the time foo() takes. It turns out, foo() is taking the majority of the time, hence the order of magnitude comment above. Thank you again for your comments and suggestions, I did make some improvements nevertheless. Parallelizing the code seems to be the correct path forward.
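A minimal sketch of that parallel path, assuming foo() is a pure, picklable function defined at module top level (process_parallel, the worker count, and the chunksize value below are illustrative, not from the original post):
from concurrent.futures import ProcessPoolExecutor

def process_parallel(in_var, chunk_size=10, max_workers=4):
    # Split up front; the last chunk may be shorter, which foo() pads itself.
    chunks = [in_var[i:i + chunk_size]
              for i in range(0, len(in_var), chunk_size)]
    # executor.map preserves input order, so the joined output
    # lines up with the input chunks.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return b"".join(pool.map(foo, chunks, chunksize=64))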
The problem with your for loop is that it creates a new bytes object, slightly longer each time, and copies all the old data into it, so the loop's total cost grows quadratically. You can speed it up by pre-allocating a bytearray and writing into it directly:
out_var = bytearray(len(in_var))
for chunk in range(0, len(in_var), 10):
    out_var[chunk: chunk + 10] = foo(in_var[chunk: chunk + 10])
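Another common idiom with the same effect is to collect the processed chunks and concatenate them once with join, which likewise avoids the repeated copying; a brief sketch:
# Each chunk is processed once and joined in a single final pass,
# so no intermediate bytes objects are rebuilt along the way.
out_var = b"".join(
    foo(in_var[i:i + 10]) for i in range(0, len(in_var), 10)
)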

How do I get all the divisors of an input using its prime factors and multiplicities?

I've been trying to solve this problem for days, but I still couldn't get the correct output. The problem is like this: for example, if I use the number 50, it has [2, 5] as its list of prime factors, and the corresponding multiplicities of those factors are [1, 2]:
prime_factors = [2, 5]
multiplicities = [1, 2]
So the output should be the divisors [1, 2, 5, 10, 25, 50]. But that's exactly the problem: how do I create a function that gets all the divisors of a given input, using its prime factors and their corresponding multiplicities?
Well, the question is "how do I write a function", so maybe this advice will help you:
Get a simple example. You already did this part: you have the input 50 (prime_factors = [2, 5], multiplicities = [1, 2]) and the output [1, 2, 5, 10, 25, 50].
Work the example on paper step by step. Just try to figure out how the output is obtained from the input. For example, 10 in the output list comes from 5 in prime_factors raised to the power 1, multiplied by 2 raised to the power 1, where each exponent is between 0 and the corresponding multiplicity. Try to figure out the rest of the elements in the output list. Where does the 25 come from?
Come up with the function signature. Think about the minimal information you need to figure out the result, and what form the result should have. That should be easy, since you already posted an example with well-defined inputs and output.
Generalize the steps you did with the example. Once you come up with a solution for your example, think about another one. How would you get the result for 40 or 60? Can you use the same patterns as for 50?
Write some pseudocode. Write your idea in a code-like fashion: add loops where necessary, name variables, and structure your code a little. The outcome should be the generalized steps from step 4, ready to be written as code.
Write the code. Since you want to program in Python, translate your pseudocode into Python. This step may be difficult depending on your skills, but if you come this far, people here will help you finish.
If you are stuck on any of these steps, people here will gladly help; we just need some starting point. I would recommend writing out each element of the output and where it comes from. Good luck!
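For reference, once you have worked through the steps above, the generalization usually lands on something like this sketch: every divisor is the product of each prime raised to an exponent between 0 and its multiplicity, so itertools.product can enumerate all exponent combinations.
from itertools import product

def divisors(prime_factors, multiplicities):
    # Each divisor picks an exponent from 0..multiplicity for every prime.
    exponent_choices = [range(m + 1) for m in multiplicities]
    divs = []
    for exponents in product(*exponent_choices):
        d = 1
        for prime, exp in zip(prime_factors, exponents):
            d *= prime ** exp
        divs.append(d)
    return sorted(divs)

print(divisors([2, 5], [1, 2]))  # -> [1, 2, 5, 10, 25, 50]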

Python MemoryError on large array

This is the python script that I'm trying to run:
import random

n = 50000000000  ## 50 billion
b = [0]*n
for x in range(0, n):
    b[x] = random.randint(1, 899999)
... But the output I'm getting is:
E:\python\> python sort.py
Traceback (most recent call last):
  File "E:\python\sort.py", line 8, in <module>
    b = [0]*n
MemoryError
So, what do I do now?
The problem is the size of the list you are generating: 50 billion elements.
An int object takes about 24 bytes in Python 2 (sys.getsizeof(int(899999)), the upper limit of your random numbers; around 28 bytes in Python 3), so 50,000,000,000 of them alone would take about 1.09 TiB. On top of that, the list stores an 8-byte pointer per element, roughly another 370 GiB, which is why even the initial b = [0]*n already fails.
In other words, to create such a list you would need well over 1,100 GB of RAM in your computer.
I don't know what your use case is, but you should consider a different approach to whatever you are trying to solve: maybe define a generator, or simply don't store the numbers in memory at all and use each one directly inside the loop.
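As an illustration of the generator idea, here is a minimal sketch (the bounds mirror the question; the smaller n is just so it finishes): each number is produced on demand, so memory use stays constant no matter how large n is.
import random

def random_stream(n, lo=1, hi=899999):
    # Yields one number at a time instead of building a giant list.
    for _ in range(n):
        yield random.randint(lo, hi)

# Consume the values directly instead of storing them:
total = 0
for value in random_stream(1_000_000):
    total += value
print(total)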
Since other people already answered your question here's a quick tip when dealing with big numbers: you can use "_" to separate the digits of your numbers as you wish:
n = 50_000_000_000
is the same as
n = 50000000000
but the former is much easier on the eyes
One other possibility is to increase your computer's virtual memory. It helped me in my code: I had a max of 3000 MB of virtual memory, and when I increased it to 5000 MB the memory error was gone.

Python for loops optimisation

So I've got a piece of code that I'm trying to make run faster, because it takes roughly 40 seconds to execute:
for i in range(1, len(pk), 2):
    for j in range(2, len(pk), 2):
        b = random.randrange(2 ** Const.ALPHA)
        sum += (b * pk[i] * pk[j])
I tried threads... it doesn't run any faster.
I also tried sum() with the two for loops embedded in it. It doesn't run any faster either.
The pk elements are very large ints.
Right now, len(pk) is 162 and Const.ALPHA is 9, but in the future both might very well be larger.
Thx
PS: Btw, you get a cookie if you can guess what the purpose of the program is from these variables.
I don't have an i7 :) and I don't know how big your numbers are. I tried it with 65536-bit pk[i], and your function took almost 10.5 seconds, so I suppose your numbers are still a lot bigger.
But I think the results should be indicative. The function below took 0.45 seconds (for a better than 20x speed-up); it avoids multiplying bignums by turning sum(sum(r(b)*pk[i]*pk[j])) into sum(pk[i]*sum(r(b)*pk[j])). Not only does this do fewer multiplications, the majority of the multiplications which are left are smallnum * bignum instead of bignum * bignum.
The use of generators and list comprehensions might not help. YMMV.
def xmul(pk):
    # The speedup from these locals is insignificant, but
    # it lets the inline generator fit on a line.
    rmax = 2 ** Const.ALPHA
    r = random.randrange
    return sum(
        pki * sum(r(rmax) * pkj for pkj in pk[2::2])
        for pki in pk[1::2]
    )
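A rough timing harness for comparison, assuming the xmul definition above is in scope; the 2048-bit values and the Const class below are made-up stand-ins for the poster's real data:
import random
import timeit

class Const:
    ALPHA = 9

# Synthetic bignums roughly mimicking "very large int" pk elements.
pk = [random.getrandbits(2048) for _ in range(162)]

def naive(pk):
    # The original nested-loop version from the question.
    total = 0
    for i in range(1, len(pk), 2):
        for j in range(2, len(pk), 2):
            b = random.randrange(2 ** Const.ALPHA)
            total += b * pk[i] * pk[j]
    return total

print("naive:", timeit.timeit(lambda: naive(pk), number=10))
print("xmul: ", timeit.timeit(lambda: xmul(pk), number=10))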

Why does Python slow to a crawl in this code?

So I'm doing a Project Euler problem, trying to use the Sieve of Eratosthenes to find the largest prime factor of a number. However, when I try to fill my initial hash table, it slows to a crawl, eats up gigabytes of RAM, and takes over my CPU. Can anyone explain why? I realize the code itself is probably subpar.
allNums = {}
maxNum = 600851475143
maxFactor = 0
# fill dictionary, slows to a crawl here
for x in xrange(2, maxNum + 1):
    allNums[x] = True
# sieve of Eratosthenes
for x in xrange(2, len(allNums)):
    y = x
    if allNums[x]:
        y **= 2
        while y <= maxNum:
            if allNums[y]:
                allNums.pop(y)
            y += x
# largest prime factor
for x in allNums:
    if maxNum % x == 0 and x > maxFactor:
        maxFactor = x
print x
Well, you allocate a huge dictionary: one entry for every number from 2 up to maxNum, which is about 600 billion entries. Even though a dict lookup is fast (average constant time), each entry costs tens of bytes, so the whole structure would need tens of terabytes. Such a huge structure cannot possibly fit into your RAM, so your OS has to emulate the extra memory with swap space, which makes even just reading that data far slower.
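To put a rough number on it, you can measure a small sample dict and extrapolate; a sketch (Python 3 syntax here, and the per-entry cost varies by version, so treat the result as a ballpark estimate):
import sys

maxNum = 600851475143

# Measure the dict's own table on a million-entry sample; the int keys
# are separate objects on top of this, so the real total is even larger.
sample = {x: True for x in range(2, 10**6 + 2)}
per_entry = sys.getsizeof(sample) / len(sample)

print("%.0f bytes per entry" % per_entry)
print("~%.0f TB for the dict table alone" % (per_entry * maxNum / 10**12))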
By the way, this is how your problem can be done (not very efficiently: it takes a couple of seconds on my machine :), using plain trial division that divides each factor out as it is found, with one small optimization, namely stopping once x*x exceeds what remains of the number.
maxNum = 600851475143
n = maxNum
maxFactor = 1
x = 2
while x * x <= n:
    (quotient, remainder) = divmod(n, x)
    if remainder == 0:
        # x is a prime factor; divide it out and try x again
        maxFactor = x
        n = quotient
    else:
        x += 1
if n > 1:
    # whatever remains after dividing everything out is prime
    maxFactor = n
print maxNum, maxFactor
