I am designing an operating system written in NASM. The purpose is to run Fermat's primality test on a low-level system in protected mode without multitasking. I use DOS-level system calls through DPMI. The code is standard, and I do not think it is a good idea to blow up this question with long listings.
The question is about Python's pow(base, exponent, modulus) function. I will obviously calculate with very long numbers, on the order of 10^100000000 (example: 2**332200000-1). The code gets the input from the user or reads it from a file. It takes approx. 40 MB to cache such a large integer in a file or in memory.
This means I just allocate 40 MB of memory in protected mode.
Fermat's little theorem works as you know:
if p is prime,
a is an integer and gcd(a, p) = 1, then
a**(p-1) mod p = 1
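For illustration, the theorem is easy to check with Python's three-argument pow, which is exactly the modular-exponentiation primitive I want to rebuild (small numbers here just to show the shape):

p = 101                    # a small prime
print(pow(2, p - 1, p))    # 1, as the theorem predicts
n = 100                    # composite
print(pow(2, n - 1, n))    # 88, not 1, so 100 is certainly not prime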
In Python this is calculated like a charm; with no extra effort it gives a lot back. But extraordinary integers like 2^332200000-1 are slow, so I decided to make my own operating-system shell that fires up when my computer boots. The reason is to get the most out of my computer, without any system calls lowering the speed of my calculations. I have the following question:
Is there a website where I can inspect the assembly code behind Python's power function?
Either way, can you give me a hint on how to do this effectively in assembly, short and fast?
The idea is very basic and brief:
A 4-byte integer will not work here. So I decided to read the long hex integer from a file into the allocated memory (40 MB). When I do calculations with this very long integer, for example multiplying it by 2, I shift each 4-byte limb into a second free memory area; if a carry is left over, it is added to the next 4-byte calculation, and so on. So it is possible to use plain memory for these long integers. Everything is designed and ready, but the kick to make it work in assembly is still in the research phase (a Python model of the idea is sketched below). Can you help or inform me in some way?
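To make the plan concrete, here is a minimal Python model of that limb-by-limb doubling (the names are illustrative only; the real version would be a NASM loop over 32-bit dwords in the buffer):

MASK32 = 0xFFFFFFFF

def double_limbs(limbs):
    # limbs is the number in little-endian 32-bit pieces
    out, carry = [], 0
    for limb in limbs:
        v = (limb << 1) | carry    # shift left by one, add the incoming carry
        out.append(v & MASK32)     # keep the low 32 bits
        carry = v >> 32            # the bit that spilled over
    if carry:
        out.append(carry)          # the number grew by one limb
    return out

print([hex(x) for x in double_limbs([0xFFFFFFFF])])  # ['0xfffffffe', '0x1']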
Again: how do I calculate with very, very long numbers in assembly, and how do I make a Python-like power function that takes an exponent and a modulus? What would that look like in code form?
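From what I have researched so far, the standard technique is binary (square-and-multiply) exponentiation, which I believe is also the core idea behind Python's pow. A minimal Python sketch of what I want to translate into assembly:

def powmod(base, exp, mod):
    # right-to-left binary exponentiation: one square per bit of exp,
    # plus one extra multiply for each bit that is set
    result = 1
    base %= mod
    while exp:
        if exp & 1:
            result = (result * base) % mod
        base = (base * base) % mod
        exp >>= 1
    return result

print(powmod(7, 560, 561) == pow(7, 560, 561))  # True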
I'm wondering: if I use pygame.time.get_ticks(), can I get an overflow?
I wrote a game, and the score depends on the time the player spends in it before dying. This way I can get quite a large number, especially measuring it in milliseconds. What's the upper limit pygame can handle? I know that Python is only limited by my computer's memory, but this seems inefficient, even by Python's own standards.
What are the alternatives if I don't want this kind of memory leak? Or is it a problem at all?
This is 100000% not an issue. When dealing with an int Python will never throw an OverflowError.
get_ticks returns a Python int, which on a 32-bit system starts as a 4-byte number, so it can store 2^32 values. 2^32 milliseconds is equal to about 50 days, so you are not going to run into any issues whatsoever. Even if the game somehow ran longer than that, Python automatically increases the size of its integers; there is no limit. Even a simple 8-byte value (2^64) can store milliseconds totaling up to 584,554,531 years.
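A quick demonstration of both claims, using nothing beyond built-in ints:

ms_per_day = 1000 * 60 * 60 * 24
print(2**32 / ms_per_day)        # ~49.7 days before a 32-bit counter would wrap
print(2**64 / ms_per_day / 365)  # ~585 million years for a 64-bit value
print((2**64 + 1) ** 2)          # past 64 bits Python just keeps going, no OverflowError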
See also How does Python manage int and long? and the OverflowError doc.
I'm a MATLAB user trying to understand Python, so sorry if this is obvious.
If I say
print(9**9)
I get:
387420489
Great.
If I say print(9**9**9)
Python just sits there indefinitely and freezes (I use Spyder version 4). Ctrl-C doesn't stop it.
Why does it not just immediately return Inf? Is this expected behavior?
When doing numerical calculations with integers, Python is not limited to machine-specific types such as int32, and therefore a number such as 2147483647 does not mean much to it. Instead, it uses a "big integer" representation which can, in principle, express any large number, provided there is enough memory for it. When facing a computation such as 9**9**9, Python tries to perform it exactly, producing the exact result, however big it may be. For this particular calculation it just takes a lot of time (and memory: presumably Python is internally allocating more and more memory as needed).
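You can get a feel for how big the result would be, without computing it, from logarithms:

import math
# 9**9**9 parses as 9**(9**9); its decimal length follows from log10
digits = math.floor(9**9 * math.log10(9)) + 1
print(digits)  # 369693100 -- the result has roughly 370 million digits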
The number 9**9**9 is simply very big to calculate. You can wait until it returns a result, but it can take a very long time.
Why does my Python freeze when I do an overflow calculation?
Because no overflow occurred and Python hasn't given up. Python will extend the precision until either the calculation succeeds or the machine runs out of memory.
This is mostly a question out of curiosity. I noticed that the numpy test suite contains tests for 128-bit integers, and the numerictypes module refers to int128, float256 (octuple precision?), and other types that don't seem to map to numpy dtypes on my machine.
My machine is 64-bit, yet I can use quadruple-precision 128-bit floats (but not really). I suppose that if it's possible to emulate quadruple floats in software, one can in theory also emulate octuple floats and 128-bit ints. On the other hand, until just now I had never heard of either 128-bit ints or octuple-precision floating point. Why does numpy's numerictypes module refer to 128-bit ints and 256-bit floats if there are no corresponding dtypes, and how can I use those?
This is a very interesting question, and there are probably reasons related to Python, to computing, and/or to hardware. While not trying to give a full answer, here is what I would look at...
First note that the types are defined by the language and can differ from your hardware architecture. For example, you could even have doubles on an 8-bit processor. Of course any arithmetic then involves multiple CPU instructions, making the computation much slower. Still, if your application requires it, it might be worth it or even required (better to be late than wrong, especially if, say, you are running a simulation of a bridge's stability...). So where is 128-bit precision required? Here's the Wikipedia article on it...
One more interesting detail is that when we say a computer is, say, 64-bit, that does not fully describe the hardware. There are many pieces that can each be (and at times have been) a different width: the computational registers in the CPU, the memory-addressing scheme / memory registers, and the various buses, most importantly the bus from CPU to memory.
-The ALU (arithmetic and logic unit) has registers that do the calculations. Your machine's are 64-bit (I'm not sure whether that also means it could do two 32-bit calculations at the same time). This is clearly the most relevant quantity for this discussion. A long time ago, you could go out and buy a co-processor to speed up calculations at higher precision...
-The registers that hold memory addresses limit the memory the computer can see (directly). That is why computers with 32-bit memory registers can only see 2^32 bytes (approx. 4 GB). Notice that for 16 bits this becomes 64 KB, which is very low. The OS can find ways around this limit, but not for a single program, so no program on a 32-bit computer can normally have more than 4 GB of memory.
-Notice that those limits are about bytes, not bits. That is because when addressing and loading from memory we load bytes. In fact, the way this is done, loading one byte (8 bits) or eight bytes (64 bits, the bus width on your computer) takes the same time: we ask for an address and then get all the bits at once through the bus.
In a given architecture, these quantities need not all be the same number of bits.
Python is amazingly powerful and can handle numbers much bigger than the internal CPU representation (e.g. 64-bit).
In the case of this dynamic type, it stores the number as an array of digits and can extend the memory block as needed; that is why you can have an integer with 500 digits. This dynamic type is called a bignum. In older Python versions it was the type long; in newer Python (3.0+) there is only one integer type, called int, which supports an almost arbitrary number of digits (i.e., a bignum).
If you specify a data type (int32, for example), then you specify the bit length and the bit format, i.e. which bits in memory stand for what. Example:
import numpy as np

dt = np.dtype(np.int32)       # 32-bit integer
dt = np.dtype(np.complex128)  # 128-bit complex floating-point number
Look in: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
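To see which of these extended types your particular NumPy build actually exposes, a quick probe (a sketch; availability is platform- and compiler-dependent):

import numpy as np
for name in ("int128", "float96", "float128", "float256", "complex256"):
    print(name, hasattr(np, name))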
I have implemented the Sieve of Atkin and it works great up to primes nearing 100,000,000 or so. Beyond that, it breaks down because of memory problems.
In the algorithm, I want to replace the memory-based array with a hard-drive-based array. Python's "wb" file functions and seek may do the trick. Before I go off reinventing wheels, can anyone offer advice? Two issues appear at the outset:
Is there a way to "chunk" the Sieve of Atkin to work on segments in memory, and
is there a way to suspend the activity and come back to it later, suggesting I could serialize the memory variables and restore them?
Why am I doing this? An old geezer looking for entertainment and to keep the noodle working.
Implementing the SoA in Python sounds fun, but note it will probably be slower than the SoE in practice. For some good monolithic SoE implementations, see RWH's StackOverflow post. These can give you some idea of the speed and memory use of very basic implementations. The numpy version will sieve to over 10,000M on my laptop.
What you really want is a segmented sieve. This lets you constrain memory use to some reasonable limit (e.g. 1M + O(sqrt(n)), and the latter can be reduced if needed). A nice discussion and C++ code are shown at primesieve.org. You can find various other examples in Python. primegen, Bernstein's implementation of the SoA, is implemented as a segmented sieve (your question 1: yes, the SoA can be segmented). This is closely related (but not identical) to sieving a range: that is how we can use a sieve to find the primes between 10^18 and 10^18+1e6 in a fraction of a second -- we certainly don't sieve all numbers up to 10^18+1e6.
Involving the hard drive is, IMO, going in the wrong direction. We ought to be able to sieve faster than we can read values from the drive (at least with a good C implementation). A ranged and/or segmented sieve should do what you need.
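To illustrate, here is a minimal segmented Sieve of Eratosthenes in Python (a rough sketch, not tuned for speed; the SoA can be segmented along the same lines):

import math

def segmented_sieve(limit, segment_size=1 << 20):
    # base primes up to sqrt(limit), via a plain in-memory sieve
    root = math.isqrt(limit)
    flags = bytearray([1]) * (root + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(root) + 1):
        if flags[i]:
            flags[i * i::i] = bytearray(len(flags[i * i::i]))
    base = [i for i in range(2, root + 1) if flags[i]]
    yield from base
    # sieve the rest one fixed-size segment at a time
    low = root + 1
    while low <= limit:
        high = min(low + segment_size - 1, limit)
        seg = bytearray([1]) * (high - low + 1)
        for p in base:
            start = max(p * p, (low + p - 1) // p * p)
            seg[start - low::p] = bytearray(len(seg[start - low::p]))
        for i, ok in enumerate(seg):
            if ok:
                yield low + i
        low = high + 1

print(sum(1 for _ in segmented_sieve(10**6)))  # 78498 primes below a million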
There are better ways to do the storage, which will help some. My SoE, like a few others, uses a mod-30 wheel, so it has 8 candidates per 30 integers and hence uses a single byte per 30 values. It looks like Bernstein's SoA does something similar, using 2 bytes per 60 values. RWH's Python implementations aren't quite there, but are close enough at 10 bits per 30 values. Unfortunately, it looks like Python's native bool array uses about 10 bytes per bit, and numpy a byte per bit. Either use a segmented sieve and don't worry about it too much, or find a way to be more efficient with the Python storage.
First of all, you should make sure that you store your data in an efficient manner. You could easily store the sieve data for numbers up to 100,000,000 in 12.5 MB of memory by using a bitmap; by skipping obvious non-primes (even numbers and so on) you could make the representation even more compact. This also helps when storing the data on the hard drive. That you are getting into trouble at 100,000,000 suggests you're not storing the data efficiently.
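A minimal bitmap in pure Python shows how little space this takes (one bit per number; skipping evens would halve it again):

class Bitmap:
    def __init__(self, size):
        self.bits = bytearray((size + 7) // 8)  # 100,000,000 bits -> 12.5 MB
    def set(self, n):
        self.bits[n >> 3] |= 1 << (n & 7)
    def test(self, n):
        return bool(self.bits[n >> 3] & (1 << (n & 7)))

flags = Bitmap(100_000_000)
flags.set(99_999_989)                          # mark a prime
print(flags.test(99_999_989), flags.test(10))  # True False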
Some hints if you don't receive a better answer.
1. Is there a way to "chunk" the Sieve of Atkin to work on segments in memory?
Yes, for the Eratosthenes-like part, what you can do is run multiple elements of the sieve list in "parallel" (one block at a time) and thereby minimize the disk accesses.
The first part is somewhat trickier: what you would want is to process the 4*x**2+y**2, 3*x**2+y**2 and 3*x**2-y**2 terms in a more sorted order. One way is to compute them first and then sort the numbers; there are sorting algorithms that work well on drive storage (still O(N log N)), but that would hurt the time complexity. A better way is to iterate over x and y so that you work on one block at a time: since a block is determined by an interval, you can simply iterate over all x and y such that lo <= 4*x**2+y**2 <= hi (see the sketch after these hints).
2. Is there a way to suspend the activity and come back to it later, suggesting I could serialize the memory variables and restore them?
In order to achieve this (no matter how and when the program is terminated), you first have to journal the disk accesses (e.g. use an SQL database to keep the data, though with care you could do it yourself).
Second, since the operations in the first part are not idempotent, you have to make sure that you don't repeat them. However, since you would be running that part block by block, you can simply detect which block was last processed and resume there (if you can end up with a partially processed block, just discard it and redo that block). The Eratosthenes part is idempotent, so you could simply run through all of it, but to increase speed you could store a list of the produced primes after they have been sieved out (so you would resume sieving after the last produced prime).
As a by-product, you should even be able to construct the program in a way that keeps the data from the first step while the second step is running, thereby extending the limit at a later moment by continuing the first step and then running the second step again. Perhaps you could even have two programs, terminating the first when you've gotten tired of it and then feeding its output to the Eratosthenes part (thereby not having to define a limit).
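Here is a sketch of that blocked iteration for the first quadratic form (illustrative only; the 3*x**2+y**2 and 3*x**2-y**2 terms follow the same pattern):

import math

def quad_candidates(lo, hi):
    # yield every 4*x**2 + y**2 that lands in [lo, hi], never generating
    # or sorting anything outside the block
    for x in range(1, math.isqrt(hi // 4) + 1):
        base = 4 * x * x
        if base + 1 > hi:  # even y = 1 overshoots the block
            break
        # smallest y with base + y*y >= lo, i.e. ceil(sqrt(lo - base))
        y = 1 if base >= lo else math.isqrt(lo - base - 1) + 1
        while base + y * y <= hi:
            yield base + y * y
            y += 1

print(sorted(set(quad_candidates(20, 60))))  # [20, 25, 29, 32, 37, 40, 41, 45, 52, 53]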
You could try using a signal handler to catch when your application is terminated and save the current state before exiting. The following script shows a simple counter that continues where it left off when restarted:
import os, pickle, signal

class MyState:
    def __init__(self):
        self.count = 1

def stop_handler(signum, frame):
    # ask the main loop to exit cleanly on Ctrl-C
    global running
    running = False

signal.signal(signal.SIGINT, stop_handler)
running = True
state_filename = "state.txt"

# restore the previous state if a saved one exists
if os.path.isfile(state_filename):
    with open(state_filename, "rb") as f_state:
        my_state = pickle.load(f_state)
else:
    my_state = MyState()

while running:
    print(my_state.count)
    my_state.count += 1

# save the state on the way out
with open(state_filename, "wb") as f_state:
    pickle.dump(my_state, f_state)
As for improving disk writes, you could try increasing Python's own file buffering to a 1 MB or larger buffer, e.g. open('output.txt', 'w', 2**20). Using a with block should also ensure your file gets flushed and closed.
There is a way to compress the array. It may cost some efficiency depending on the Python interpreter, but you'll be able to keep more in memory before having to resort to disk. If you search online, you'll probably find other sieve implementations that use compression.
Neglecting compression though, one of the easier ways to persist memory to disk would be through a memory mapped file. Python has an mmap module that provides the functionality. You would have to encode to and from raw bytes, but it is fairly straightforward using the struct module.
>>> import struct
>>> struct.pack('H', 0xcafe)
b'\xfe\xca'
>>> struct.unpack('H', b'\xfe\xca')
(51966,)
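And a small sketch of the mmap approach (the file name is made up; the OS pages the array to and from disk for you):

import mmap

size = 100_000_000 // 8                # one bit per number: 12.5 MB on disk
with open("sieve.bin", "wb") as f:
    f.truncate(size)                   # create the zero-filled backing file
with open("sieve.bin", "r+b") as f:
    buf = mmap.mmap(f.fileno(), size)  # behaves like a mutable byte buffer
    n = 12345
    buf[n >> 3] |= 1 << (n & 7)        # same bit tricks as an in-memory bytearray
    print(bool(buf[n >> 3] & (1 << (n & 7))))  # True
    buf.flush()
    buf.close()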
So, I've been beating my head against the wall on this issue for several months now, partly because it's a side interest and partly because I suck at programming. I've searched and researched all across the web but have not had any luck (except one small bit of success; see below), so I thought I might try asking the experts.
What I am trying to do is, as the title suggests, generate a 40/64-bit WEP key from a passphrase, according to the "de facto" standard. (A site such as http://www.powerdog.com/wepkey.cgi produces the expected outputs.) I have already written the portions of the script that take inputs and write them to a file; one of the inputs is the passphrase, sanitized to lower case.
For the longest time I had no idea what the de facto standard was, much less how to go about implementing it. I finally stumbled across a paper (http://www.lava.net/~newsham/wlan/WEP_password_cracker.pdf) that sheds as much light as I've had yet on the issue (page 18 has the relevant bits). Apparently, the passphrase is "mapped to a 32-bit value with XOR", and the result is then used as the seed for a linear congruential PRNG (which of the several PRNGs Python has would fit this description, I don't know), from each result of which several bits are taken. I have no idea how to go about implementing this, since the description is rather vague.
What I need is help writing the generator in Python, and also in understanding how exactly the key is generated. In other words, I need code to turn "jackson" into "09F38AF593". (And please don't tell me jackson = "09F38AF593"; print(jackson).)
I'm not much of a programmer, so explanations are appreciated as well.
(Yes, I know that WEP isn't secure.)
That C code you linked to would have been awfully helpful to include in the question ;-) Anyway, I went ahead and translated it into Python. Before you read it, let me say that I highly encourage you to try it yourself and use my transcription only as a guide. Translating algorithms from one programming language to another is generally great practice when you want to boost your skills in one or both languages. Even if you don't know C, as long as you're familiar enough with Python to write programs in it, you should be able to get the gist of the C code, since there are many similarities.
Anyway, on to the code.
import itertools, operator
from functools import reduce  # reduce lives in functools in Python 3
First, the pseudorandom number generator, which was identified in the presentation as a linear congruential generator. This type of PRNG is a generic algorithm that can be "customized" by choosing specific values of a, c, and m (the variables mentioned in the Wikipedia article). Here is an implementation of a generic linear congruential generator:
def prng(x, a, c, m):
    while True:
        x = (a * x + c) % m
        yield x
(hopefully you could have come up with that on your own)
Now for the actual function:
def pass_to_key(passphrase):
The first step in the process is to hash (or "map") the passphrase provided to a 32-bit number. The WEP algorithm does this by creating a set of 4 bytes (thus 4*8=32 bits) which are initialized to zero.
    bits = [0, 0, 0, 0]
It goes through the string and XORs each character with one of the bytes; specifically, character i is XOR'd into byte i % 4.
    for i, c in enumerate(passphrase):
        bits[i & 3] ^= ord(c)
These four bytes are then concatenated together, in order, to form a single 32-bit value. (Alternatively, I could have written the code to store them as a 32-bit number from the beginning)
    val = reduce(operator.or_, (b << 8 * i for (i, b) in enumerate(bits)))
This 32-bit value is used as the seed for a linear congruential generator with the specific constants you can see in the code. How the original developer chose these numbers, I don't know (they happen to be the same multiplier and increment used by the Microsoft C runtime's rand()).
    keys = []
The linear congruential generator can produce up to 32 bits of output at a time. (In C this is a limitation of the data type; in Python I had to enforce it artificially with the modulus.) I need 20 bytes to generate four 40-bit (5-byte) WEP keys, so I'll iterate the PRNG 20 times,
    for b in itertools.islice(prng(val, 0x343fd, 0x269ec3, 1 << 32), 20):
and from each number, take only the 3rd byte from the right (bits 16-23):
        keys.append((b >> 16) & 0xff)
Why the third? Well, the bits at the high end (4th from the right) tend not to change much, and those at the low end can be predictable for many values of the PRNG constants.
Afterwards, all that's left is to print out the generated bytes in groups of 5.
    print(('%02x:%02x:%02x:%02x:%02x\n' * 4) % tuple(keys))
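If my transcription is faithful, running it on the passphrase from the question should reproduce the site's output, with 09:f3:8a:f5:93 (the "09F38AF593" you quoted) as the first of the four keys:

pass_to_key('jackson')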
I'm not sure what "de facto standard" that website is talking about, but I'm fairly sure router manufacturers all implement their own methods. It doesn't matter how you do it, as long as the same input always results in the same output; it's a convenience so WEP users can remember a passphrase instead of the actual hex key. Even the method in the PDF you posted is largely ambiguous: it uses an unspecified PRNG (and every type of PRNG is going to give a different result) and takes "one byte" from each result without specifying which. If you're trying to reverse-engineer a particular router's method, mention that in the post and we might be able to find out how that one works, but there isn't a standard method.