I'm writing a game in Python in which the environment is generated randomly. Currently, the game's "save" function works by writing out all parts of the environment which the player has explored. The result is that save files are larger than they need to be—why write random data to disk when you can just generate it again?
What I could use is a random noise function: a function noise such that noise(x) returns a random number, and always the same number whenever it's called with the same value of x. Now, for each point (x,y) in the game's environment, instead of generating a random number using random() and storing the result in env[(x,y)], I can generate a random number using noise((x,y)), throw it away, and generate the same number later.
Not quite sure if I'm stating the obvious, but using some variation of a Perlin noise generator is a common way to do this. This post is a nice description of doing exactly this (as mentioned in the comments, it's not exactly Perlin noise)
For a given position, the Perlin function will return a random value (the position can be 2D, 3D or any dimensionality).
There is a noise module, and this page has an implementation of it.
There's a similar thread on gamedev.SE
First, if you need it to be true that noise(x) would always return the same value for the same x, no matter what, even if it's never been called, then you can't really use randomness at all. A good hash function is the only possibility.
However, if you just need to be able to restore a previous state consisting of the values for all of the previously-explored points (never-explored points may turn out different after save and load than if you hadn't quit… but how can anyone tell without access to multiple universes?), and you don't want to store all of those points, then it might be reasonable to regenerate them.
But let's back up a step. You want something that acts like a hash function. Is there a hash function you can use?
I'd imagine the algorithms in hashlib are too slow (md5 is probably the fastest, but test them all), but I wouldn't reject them without actually testing.
It's possible that the "random period" of zlib.adler32 (or zlib.crc32) is too short, but I wouldn't reject it (except maybe hash) without thinking through whether it's good enough. For that matter, even hash plus a decent fixed-size blender function might be good enough (at least on a 64-bit system).
Python doesn't come with anything "between" md5 and `adler32` out of the box. But you can find PyPI modules or source recipes for hundreds of other hash algorithms. For that matter, if you're familiar with any particular hash algorithm that sounds good, most of them are trivial—you could probably code up, e.g., an FNV hash with xor-folding in less time than it takes you to look through the alternatives.
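For instance, here's a minimal FNV-1a sketch with xor-folding (the two constants are the standard 64-bit FNV parameters; mapping the result onto [0.0, 1.0) is just my illustrative choice):

# 64-bit FNV-1a with xor-folding down to 32 bits.
FNV_OFFSET = 0xcbf29ce484222325   # standard FNV-1a offset basis
FNV_PRIME = 0x100000001b3         # standard FNV-1a prime

def fnv1a_fold(data):
    h = FNV_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return (h >> 32) ^ (h & 0xFFFFFFFF)   # xor-fold 64 -> 32 bits

def noise(point):
    # Hash the point's repr and scale onto [0.0, 1.0).
    return fnv1a_fold(repr(point).encode()) / 2**32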
Also, keep in mind that you can generate a bunch of random bytes at "new game" time, store that in the save file, and use it as salt to your hash function.
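A sketch of that idea on top of hashlib (md5 is only an example; the per-game salt makes each world's noise unique while staying reproducible from the save file):

import hashlib
import os

SALT = os.urandom(16)   # generate once at "new game" time; store in the save file

def noise(point, salt=SALT):
    digest = hashlib.md5(salt + repr(point).encode()).digest()
    # Use the first 8 bytes of the digest as an integer in [0, 2**64).
    return int.from_bytes(digest[:8], "big") / 2**64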
If you've exhausted those possibilities and you really do need more randomness than a fast-enough hash function with arbitrary salt can give you, then:
It sounds like you'll already need to store a list of the points the user has explored (because how else do you know which points you need to restore?). And the order doesn't really matter. So, you can store them in the order of exploration. That means you can regenerate the values deterministically (just by iterating the list). Which means you can use the suggestion by @delnan on your own answer.
However, seed is not the way to do that. It isn't guaranteed to put the RNG into the same state each time across runs, Python versions, machines, etc. For that, you need setstate:
To save, call random.getstate(), and pickle and stash the result.
To load, read and unpickle the state, and call random.setstate(state).
See the docs for full details.
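A minimal sketch of that round-trip (the file name is just an example):

import pickle
import random

def save_rng_state(path="rng_state.pickle"):
    with open(path, "wb") as f:
        pickle.dump(random.getstate(), f)

def load_rng_state(path="rng_state.pickle"):
    with open(path, "rb") as f:
        random.setstate(pickle.load(f))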
If you're using a random.Random instance, it's exactly the same, except of course that you have to construct a random.Random before you can call setstate on it.
This is guaranteed to work between runs of your program, across machines, etc. Even with a newer version of Python. However, it's not guaranteed to work with an older version of Python. (That is, if the user saves a game with Python 2.6, then tries to load it with 2.5, the state will not be compatible. I believe the only problems come with 2.6->older and 2.3->older, but of course there's no guarantee there won't be additional ones in the future.) I'd suggest stashing the Python version, and if they've downgraded, show a warning saying "This save file requires Python 2.6 or later. You have Python 2.5. The load may fail. Continue anyway?"
This is only guaranteed for random.Random and for the random module itself (since the top-level module functions just use a hidden random.Random). In particular, random.SystemRandom is explicitly documented not to work.
Practically speaking, you can also just pickle a random.Random directly, because the state gets pickled in. It seems like that ought to work, or what would be the sense of pickling a Random object? And it definitely does work. But it isn't actually documented to work, so I'd stick with pickling the getstate() result, for safety.
One possible implementation of noise is this:
import random

def noise(point):
    # A private generator seeded from the point gives the same value
    # for the same point on every call.
    # (Recent Python versions only accept None, int, float, str, bytes,
    # or bytearray as seeds, so convert the tuple to a string first.)
    gen = random.Random()
    gen.seed(str(point))
    return gen.random()
I don't know how fast Random.seed() is, though. In addition, Random may change from one version of Python to the next, causing the players of my game to find that the environment changes when they upgrade.
Some methods don't need to create a new variable; e.g., list.reverse() works like this:
lists = [123, 456, 789]
lists.reverse()
print(lists)
This method reverses the list in place (without creating a new variable).
Why are there various ways of producing values in Python?
Some calls, like variable.method().method2().method3(), can be chained, but type(variable) and print(variable) are written as standalone functions. Why can't we write variable.print() or variable.type()?
Is there a philosophical reason for this in Python?
You may be confused by the difference between a function and a method, and by the three different purposes they serve. As much as I dislike using SO for tutorial purposes, these issues can be hard to grasp from other documentation. You can look up function vs method easily enough -- once you know it's a (slightly) separate issue.
Your first question is a matter of system design. Python merely facilitates what programmers want to do, and the differentiation is common to many (most?) programming languages since ASM and FORTRAN crawled out of the binary slime pools in the days when dinosaurs roamed the earth.
When you design how your application works, you need to make a lot of implementation decisions: individual variables vs a sequence, in-line coding vs functions, separate functions vs encased functions vs classes and methods, etc. Part of this decision making is what each function should do. You've raised three main types:
(1) Process this data -- take the given data and change it, rearrange it, whatever needs doing -- but I don't need the previous version, just the improved version, so just put the new stuff where the old stuff was. This is used almost exclusively when one variable is getting processed; we don't generally take four separate variables and change each of them. In that case, we'd put them all in a list and change the list (a single variable). reverse falls into this class.
One important note is that for such a function, the argument in question must be mutable (capable of change). Python has mutable and immutable types. For instance, a list is mutable; a tuple is immutable. If you wanted to reverse a tuple, you'd need to return a new tuple; you can't change the original.
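For instance:

nums = [1, 2, 3]
nums.reverse()                     # in place; returns None
print(nums)                        # [3, 2, 1]

pair = (1, 2, 3)
# pair.reverse() would raise AttributeError: tuples are immutable
new_pair = tuple(reversed(pair))   # build a new tuple instead
print(new_pair)                    # (3, 2, 1)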
(2) Tell me something interesting -- take the given data and extract some information. However, I'm going to need the originals, so leave them alone. If I need to remember this cool new insight, I'll put it in a variable of my own. This is a function that returns a value. sqrt is one such function.
(3) Interact with the outside world -- input or output data permanently. For output, nothing in the program changes; we may present the data in an easy-to-read format, but we don't change anything internally. print is such a function.
Much of this decision also depends on the function's designed purpose: is this a "verb" function (do something) or a noun/attribute function (look at this data and tell me what you see)?
Now you get the interesting job for yourself: learn the art of system design. You need to become familiar enough with the available programming tools that you have a feeling for how they can be combined to form useful applications.
See the documentation:
The reverse() method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence.
I'm using Python to set up a computationally intense simulation, then running it in a custom-built C extension, and finally processing the results in Python. During the simulation, I want to store a fixed-length number of floats (C doubles converted to PyFloatObjects) representing my variables at every time step, but I don't know how many time steps there will be in advance. Once the simulation is done, I need to pass the results back to Python in a form where the data logged for each individual variable is available as a list-like object (for example a (wrapper around a) contiguous array, piece-wise contiguous array, or column in a matrix with a fixed stride).
At the moment I'm creating a dictionary mapping the name of each variable to a list containing PyFloatObject objects. This format is perfect for working with in the post-processing stage but I have a feeling the creation stage could be a lot faster.
Time is quite crucial since the simulation is a computationally heavy task already. I expect that a combination of A. buying lots of memory and B. setting up my experiment wisely will allow the entire log to fit in RAM. However, with my current dict-of-lists solution, keeping every variable's log in a contiguous section of memory would require a lot of copying and overhead.
My question is: What is a clever, low-level way of quickly logging gigabytes of doubles in memory with minimal space/time overhead, that still translates to a neat python data structure?
Clarification: when I say "logging", I mean storing until after the simulation. Once that's done a post-processing phase begins and in most cases I'll only store the resulting graphs. So I don't actually need to store the numbers on disk.
Update: In the end, I changed my approach a little and added the log (as a dict mapping variable names to sequence types) to the function parameters. This allows you to pass in objects such as lists or array.arrays or anything that has an append method. This adds a little time overhead because I'm using the PyObject_CallMethodObjArgs function to call the append method instead of PyList_Append or similar. Using arrays allows you to reduce the memory load, which appears to be the best I can do short of writing my own expanding storage type. Thanks everyone!
You might want to consider doing this in Cython, instead of as a C extension module. Cython is smart, and lets you do things in a pretty pythonic way, even though it at the same time lets you use C datatypes and python datatypes.
Have you checked out the array module? It lets you store many values of a homogeneous scalar type in a single compact collection.
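For example, an array of C doubles has almost no per-element overhead (a quick sketch):

from array import array

log = array("d")               # a contiguous buffer of C doubles
for step in range(1000):
    log.append(step * 0.5)     # stores a raw double, not a PyFloatObject

print(log[10])                 # indexing converts to a Python float on demand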
If you're truly "logging" these, and not just returning them to CPython, you might try opening a file and fprintf'ing them.
BTW, realloc might be your friend here, whether you go with a C extension module or Cython.
This is going to be more a huge dump of ideas rather than a consistent answer, because it sounds like that's what you're looking for. If not, I apologize.
The main thing you're trying to avoid here is storing billions of PyFloatObjects in memory. There are a few ways around that, but they all revolve around storing billions of plain C doubles instead, and finding some way to expose them to Python as if they were sequences of PyFloatObjects.
To make Python (or someone else's module) do the work, you can use a numpy array, a standard library array, a simple hand-made wrapper on top of the struct module, or ctypes. (It's a bit odd to use ctypes to deal with an extension module, but there's nothing stopping you from doing it.) If you're using struct or ctypes, you can even go beyond the limits of your memory by creating a huge file and mmapping windows of it in as needed.
To make your C module do the work, instead of actually returning a list, return a custom object that implements the sequence protocol, so when someone calls, say, foo[i] (i.e., foo.__getitem__(i)), you convert the underlying C double to a PyFloatObject on the fly.
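Here's a Python-level sketch of the shape of such a wrapper (a real extension would fill in the sequence slots in C; _array here is just a stand-in for the raw C buffer):

class DoubleLog:
    """Sequence-like view over a raw buffer of doubles."""

    def __init__(self, raw):
        self._array = raw      # stand-in for the C double buffer

    def __len__(self):
        return len(self._array)

    def __getitem__(self, i):
        # Convert to a Python float only when actually asked for.
        return float(self._array[i])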
Another advantage of mmap is that, if you're creating the arrays iteratively, you can create them by just streaming to a file, and then use them by mmapping the resulting file back as a block of memory.
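A sketch of that streaming-then-mmap pattern with the standard array, mmap, and struct modules (the file name is illustrative):

import mmap
import struct
from array import array

# Stream doubles to a file during the simulation...
with open("log.bin", "wb") as f:
    array("d", (i * 0.5 for i in range(1000))).tofile(f)

# ...then map the file back in and read values on demand.
with open("log.bin", "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    value, = struct.unpack_from("d", buf, 10 * 8)   # the 10th double
print(value)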
Otherwise, you need to handle the allocations. If you're using the standard array, it takes care of auto-expanding as needed, but otherwise, you're doing it yourself. The code to do a realloc and copy if necessary isn't that difficult, and there's lots of sample code online, but you do have to write it. Or you may want to consider building a strided container that you can expose to Python as if it were contiguous even though it isn't. (You can do this directly via the complex buffer protocol, but personally I've always found that harder than writing my own sequence implementation.) If you can use C++, vector is an auto-expanding array, and deque is a strided container (and if you've got the SGI STL rope, it may be an even better strided container for the kind of thing you're doing).
As the other answer pointed out, Cython can help for some of this. Not so much for the "exposing lots of floats to Python" part; you can just move pieces of the Python part into Cython, where they'll get compiled into C. If you're lucky, all of the code that needs to deal with the lots of floats will work within the subset of Python that Cython implements, and the only things you'll need to expose to actual interpreted code are higher-level drivers (if even that).
I have a rather big program where I use functions from the random module in different files. I would like to be able to set the random seed once, in one place, to make the program always return the same results. Can that even be achieved in Python?
The main python module that is run should import random and call random.seed(n) - this is shared between all other imports of random as long as somewhere else doesn't reset the seed.
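A sketch of that layout (worldgen is an invented module name):

# main.py -- the entry point
import random
import worldgen            # any other module that uses the random module

random.seed(42)            # seeds the one hidden Random shared by all imports
print(worldgen.roll())     # reproducible on every run

# worldgen.py, for reference:
#   import random
#   def roll():
#       return random.randint(1, 6)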
zss's comment should be highlighted as an actual answer:
Another thing for people to be careful of: if you're using numpy.random, then you need to use numpy.random.seed() to set the seed. Using random.seed() will not set the seed for random numbers generated from numpy.random. This confused me for a while. -zss
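In other words, the two generators are independent and each needs its own seed. A quick sketch:

import random
import numpy as np

random.seed(42)       # seeds the stdlib generator only
np.random.seed(42)    # seeds NumPy's global generator separately

print(random.random())      # reproducible thanks to random.seed
print(np.random.random())   # reproducible thanks to numpy.random.seed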
In the beginning of your application call random.seed(x) making sure x is always the same. This will ensure the sequence of pseudo random numbers will be the same during each run of the application.
Jon Clements pretty much answers my question. However, it wasn't the real problem:
It turns out that the reason for my code's randomness was the numpy.linalg SVD, because it does not always produce the same results for badly conditioned matrices!
So be sure to check for that in your code if you have the same problem!
Building on previous answers: be aware that many constructs can diverge execution paths, even when all seeds are controlled.
I was thinking "well I set my seeds so they're always the same, and I have no changing/external dependencies, therefore the execution path of my code should always be the same", but that's wrong.
The example that bit me was list(set(...)), where the resulting order may differ.
One important caveat is that for Python versions earlier than 3.7, dictionary key order is not deterministic. This can lead to randomness in the program, or even a different order in which the random numbers are generated and therefore non-deterministic random numbers. Conclusion: update Python.
I was also puzzled by this question when reproducing a deep learning project, so I did a toy experiment and will share the results.
I created two files in a project, named test1.py and test2.py respectively. In test1, I set random.seed(10) for the random module and printed 10 random numbers several times. As you can verify, the results were always the same.
What about test2? I did the same, except for setting the seed for the random module. The results were different every time. However, as long as I imported test1, even without using it, the results were the same as in test1.
So the experiment leads to this conclusion: if you want to set the seed for all files in a project, you need to import the file/module that defines and sets the seed.
According to Jon's answer, setting random.seed(n) at the beginning of the main program will set the seed globally. Afterward, to set the seeds of imported libraries, one can use the output from random.random(). For example,
rng = np.random.default_rng(int(abs(math.log(random.random()))))
tf.random.set_seed(int(abs(math.log(random.random()))))
You can guarantee this pretty easily by using your own random number generator.
Just pick three largish primes (assuming this isn't a cryptography application), and plug them into a, b and c:
a = ((a * b) % c)
This gives a feedback system that produces pretty random data. Note that not all primes work equally well, but if you're just doing a simulation, it shouldn't matter - all you really need for most simulations is a jumble of numbers with a pattern (pseudo-random, remember) complex enough that it doesn't match up in some way with your application.
Knuth talks about this.
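A sketch of that feedback loop (these particular primes are arbitrary picks for illustration, not vetted constants):

# Multiplicative feedback: a = (a * b) % c, scaled onto [0.0, 1.0).
B = 7919      # arbitrary prime
C = 104729    # arbitrary (larger) prime

def make_rng(seed):
    a = seed
    def rng():
        nonlocal a
        a = (a * B) % C
        return a / C
    return rng

rng = make_rng(12345)
print([round(rng(), 3) for _ in range(5)])   # same sequence for the same seed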
I have a project with different modules. Then I have a file called Main.py which has some code that calls these modules during the run. In the file Main.py I set random seed using:
random.seed(2)
The output that I get from different runs is not identical even if I use the same random seed. Could you tell me why this might be happening? The various modules in my class use random.uniform, random.choice, random.sample functions. In one place, I also define rnduniform = random.uniform and use that.
Any help about how to solve this problem (i.e., be able to replicate the result by setting random seed) and help me understand this would be greatly appreciated.
Thank you.
EDIT: Solved. My error.
Sorry for wasting your time. I looked more carefully at the code, and one of the functions that uses random number generation was called in the __init__ method of one of the classes. The __init__ method was accessed before the seed was set. I tried to delete the post but I could not. Hence, this edit.
Thread safety deals with concurrent programming - or in other words, when you have two different code paths executing at the same time through means of threading. Since something that might be a single line of code to you as a programmer is usually a multitude of separate actions, a different thread might interfere with whatever variables you are using, or use intermediate calculations. This will cause bugs that are very hard to understand, because your code will usually seem completely fine.
In this case, he is saying that your code using random() might conflict with other code in a thread that is somehow using the random number generator, and not behave as expected. For example, the numbers might no longer be as mathematically random, or if you initialize with a certain base seed and then expect random() to return a set sequence of values over multiple calls, those numbers may not be the ones you expect. In the very worst case of using non-thread-safe functions, you might end up with harsh exceptions and/or crashes, since the function is not designed to be used in multiple threads at the same time.
Also see the Wikipedia topic on Thread safety.
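One common way to sidestep the problem is to give each thread its own random.Random instance, so no state is shared (a sketch):

import random
import threading

def worker(seed):
    rng = random.Random(seed)           # private generator; no shared state
    values = [rng.random() for _ in range(3)]
    print(threading.current_thread().name, values)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()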
I'm making a game which uses procedurally generated levels, and when I'm testing I'll often want to reproduce a level. Right now I haven't made any way to save the levels, but I thought a simpler solution would be to just reuse the seed used by Python's random module. However, I've tried using both random.seed() and random.setstate() and neither seems to reliably reproduce results. Oddly, I'll sometimes get the same level a few times in a row if I reuse a seed, but it's never 100% reliable. Should I just save the level normally (as a file containing its information)?
Edit:
Thanks for the help everyone. It turns out that my problem came from the fact that I was randomly selecting sprites from groups in Pygame, which are retrieved in unordered dictionary views. I altered my code to avoid using Pygame's sprite groups for that part and it works perfectly now.
random.seed should work ok, but remember that it is not thread safe - if random numbers are being used elsewhere at the same time you may get different results between runs.
In that case you should use an instance of random.Random() to get a private random number generator
>>> import random
>>> seed=1234
>>> n=10
>>> random.Random(seed).sample(range(1000),n)
[966, 440, 7, 910, 939, 582, 671, 83, 766, 236]
This will always return the same result for a given seed.
Best would be to create a levelRandom class with a data slot for every randomly produced result when generating a level. Then separate the random-generation code from the level-construction code. When you want a new level, first generate a new levelRandom object, then hand that object to the level-generator to produce your level. Later, to get the same level again, you can just reuse the levelRandom instance.
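A minimal sketch of that idea (all names invented for illustration):

import random

class LevelRandom:
    """Holds every random draw needed to build one level."""

    def __init__(self, seed=None):
        rng = random.Random(seed)
        self.width = rng.randint(20, 40)
        self.height = rng.randint(20, 40)
        self.tile_rolls = [rng.random() for _ in range(self.width * self.height)]

def build_level(params):
    # Construction reads only the pre-drawn numbers, never an RNG,
    # so the same LevelRandom always yields the same level. (Stub.)
    return [params.tile_rolls[:params.width]]

level = build_level(LevelRandom(seed=1234))
# Keep the LevelRandom instance around to rebuild the identical level later.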
The numbers generated by a random function are deterministic. Thus, seed with a known value and save it. Create the level from the random numbers that follow. To load the level, simply seed again with the same known value and use the "random" numbers again, which will recur in the same sequence. This is language agnostic, but for Python do mind threading, as pointed out by gnibbler.
This exact technique was actually used to create the humongous world of the legendary game Elite. If you are brave, just go check how Ian Bell did it. ;)