I am testing this code
protected final double sqrt_3 = Math.sqrt( 3 );
protected final double denom = 4 * Math.sqrt( 2 );
//
// forward transform scaling (smoothing) coefficients
//
protected final double h0 = (1 + sqrt_3)/denom;
protected final double h1 = (3 + sqrt_3)/denom;
protected final double h2 = (3 - sqrt_3)/denom;
protected final double h3 = (1 - sqrt_3)/denom;
//
// forward transform wavelet coefficients
//
protected final double g0 = h3;
protected final double g1 = -h2;
protected final double g2 = h1;
protected final double g3 = -h0;
protected void transform( double a[], int n )
{
   if (n >= 4) {
      int i, j;
      int half = n >> 1;

      double tmp[] = new double[n];

      i = 0;
      for (j = 0; j < n-3; j = j + 2) {
         tmp[i]      = a[j]*h0 + a[j+1]*h1 + a[j+2]*h2 + a[j+3]*h3;
         tmp[i+half] = a[j]*g0 + a[j+1]*g1 + a[j+2]*g2 + a[j+3]*g3;
         i++;
      }

      tmp[i]      = a[n-2]*h0 + a[n-1]*h1 + a[0]*h2 + a[1]*h3;
      tmp[i+half] = a[n-2]*g0 + a[n-1]*g1 + a[0]*g2 + a[1]*g3;

      for (i = 0; i < n; i++) {
         a[i] = tmp[i];
      }
   }
} // transform
to perform a Daubechies D4 wavelet transform on this discrete array:
[1,2,0,4,5,6,8,10]
the result is
- 0 : 1.638357430415108
- 1 : 3.6903274198537357
- 2 : -2.6439375651698196
- 3 : 79.01146993331695
- 4 : 7.399237211089009
- 5 : 0.3882285676537802
- 6 : -39.6029588778518
- 7 : -19.794010741818195
- 8 : -2.1213203435596424
- 9 : 0.0
but when I use Python pywt.dwt on the same array, I get this:
import pywt
[cA, cD] = pywt.dwt([1,2,0,4,5,6,8,10], 'db4')
>>> cA
array([ 7.14848277, 1.98754736, 1.9747116 , 0.95510018, 4.90207373,
8.72887094, 14.23995582])
>>> cD
array([-0.5373913 , -2.00492859, 0.01927609, 0.1615668 , -0.0823509 ,
-0.32289939, 0.92816281])
Beyond the different values, one has 10 items and the other 7.
What am I missing?
I have never used either of these codes and I'm not certain about your question, but this information might help you get closer to an answer:
Daubechies 4 Wiki
Daubechies Coefficients Wiki
Before that: your input vector (signal) may be too small for the wavelet calculations to come out right. Not sure, but maybe try something of size 1x128.
The Java code appears to be a Fast Wavelet Transform, guessing from the following methods:
Code
/**
   Forward Daubechies D4 transform
*/
public void daubTrans( double s[] )
{
   final int N = s.length;
   int n;
   for (n = N; n >= 4; n >>= 1) {
      transform( s, n );
   }
}

/**
   Inverse Daubechies D4 transform
*/
public void invDaubTrans( double coef[])
{
   final int N = coef.length;
   int n;
   for (n = 4; n <= N; n <<= 1) {
      invTransform( coef, n );
   }
}
Based on the above methods, this seems to be a "Fast Wavelet Transform"; I'm not so sure about its calculations either, but you might look into this link.
There are many similar-sounding "terms" in wavelet transforms, so it is best to go through the math to find out what the exact method is (e.g., Discrete Wavelet Transform, Continuous Wavelet Transform, discrete with packet decomposition). Every library has its own terminology and assumptions and makes different calculations. You might first print the coefficients to see whether you get anything close to D4 Wavelet = {−0.1830127, −0.3169873, 1.1830127, −0.6830127} for DB4. Or do other testing to check that the calculations are correct.
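For instance, assuming pywt is installed, you can print the filter coefficients directly. Note the naming: in pywt the 4-tap Daubechies filter (what the Java code calls D4) is named 'db2', while 'db4' is the 8-tap filter:

import pywt

# 4-tap scaling filter: the same four values as the Java h0..h3 (pywt stores them in reverse order)
print(pywt.Wavelet('db2').dec_lo)
# 8-tap scaling filter: what the pywt call in the question used
print(pywt.Wavelet('db4').dec_lo)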
Methods of Decomposition in Wavelets
It looks like cA and cD are the coefficients of the "approximation" and "detail" signals produced by a discrete wavelet transform. However, I'm not sure into how many levels your input vector has been decomposed.
There are two well-known ways of decomposing a signal in wavelets. One is "packet" decomposition, which decomposes both the "approximation" and the "detail" signals, so decomposing your original signal to 4 levels gives you 2^4 = 16 sub-signals.
The other decomposition method decomposes only the low-frequency part of the signal at each level. So you may need to find out to what level your vector is being decomposed.
Also, if you write your own code, you can decompose it however you wish.
Simple Keys to Understand Wavelet
Shifting (Time) vs Scale (Frequency)
There is one simple thing that, once you understand it, makes wavelets much easier. First, as you may know, the wavelet transform is a time-frequency method. However, instead of plotting time vs. frequency, you plot time vs. scale, where scale is the "inverse" of frequency.
Children of a Wavelet Function such as DB4
The wavelet transform maps a wavelet function - such as DB4 - across your original signal, and that is presumably how it computes the numbers you printed out. One thing to consider is finding a base function, DB4, that "looks like" your original signal. How do you do that?
Basically, you pick a base function, DB4, and the wavelet transform creates multiple forms of that base function (imagine you name them DB4-0, DB4-1, DB4-2, ..., DB4-15). These children are created by:
(a) Shifting (in a for loop, incrementing time: sliding a child function along the signal and calculating coefficients at each position); shifting relates to time, obviously.
(b) Scaling ("stretching" the wavelet function, which changes the frequency nature of the base function, then sliding it through time again); scale has an inverse relation with frequency, meaning the higher the scale, the lower the frequency, and vice versa.
Therefore, how many children functions you need depends on the decomposition (sub-signals). If you have 16 sub-signals (4 levels of decomposition with the packet method), then you will have 16 of those "children" functions mapped across your signal, and that is how the coefficient vectors are calculated. You can then toss the unnecessary sub-signals and keep focusing on the sub-signals (frequencies) you are interested in. The key point is that wavelets preserve the time information, as opposed to Fourier.
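As a rough illustration of scale (a sketch, assuming pywt is installed): pywt can evaluate the base function at increasing refinement levels, i.e. the same DB4 viewed on finer and finer grids:

import pywt

# each refinement level approximates the same base function on a finer grid
for level in (2, 4, 6):
    phi, psi, x = pywt.Wavelet('db4').wavefun(level=level)
    print(level, len(x))  # more points as the level increases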
Normal Decomposition
Also, since you are a good programmer, I'm pretty sure you can quickly crack the code; I don't think you are missing anything here. Just go through their methods and read a few pages of Wikipedia, and you will probably get there.
If you have really detailed questions, you may try DSP SE - many signal-processing experts are there. Hopefully others will edit this and provide a more precise answer.
In short, you are not missing anything. Good method, good luck and best wishes!
I am making an implementation of MobileNetV2 in C and comparing it, layer by layer, to a Keras model of the same network to make sure I'm doing things right. I managed to get a very close approximation of the inference result of the network, but there is an error around the 5th decimal place. Looking for the reason for the imprecision, I came across something strange.
I am working exclusively with float objects in C, and all of the arrays in Python, including all of the weight arrays and other parameters, are float32.
When exporting my processed image from Python to a .csv file, I passed seven decimal places to the export function: np.savetxt(outfile, twoD_data_slice, fmt='%-1.7e'). This still results in a float, but with a limitation: that last decimal place does not have full precision. One of the numbers I got was "0.98431373". When converting this in C it instead gave me "0.98431377".
I asked a question here about this result and was told of my mistake in using seven decimal places, but this still doesn't explain why Python can handle a number like "0.98431373" as a float32 when in C it gets changed to "0.98431377".
My guess is that Python is using a different 32-bit float than the one I'm using in C, as evidenced by how their float32 can handle a number like "0.98431373" and the float in C cannot. And I think this is what is causing the imprecision of my implementation when compared to the final result in Python. Because if Python can handle numbers like these, then the precision it has while doing calculations for the neural network is higher than in C, or at least different, so the answer should be different as well.
Is the floating point standard different in Python compared to C? And if so, is there a way I can tell Python to use the same format as the one in C?
Update
I changed the way I import files using atof, like so:
void import_image(float data0[224][224][3]) {
    // open file
    FILE *fptr;
    fptr = fopen("image.csv", "r");
    if (fptr == NULL) {
        perror("fopen()");
        exit(EXIT_FAILURE);
    }

    int c = fgetc(fptr); // generic char (int so EOF can be detected)
    char s[15];          // maximum number of characters is "-x.xxxxxxxe-xx" = 14, plus '\0'

    for (int y = 0; y < 224; ++y) {         // lines
        for (int x = 0; x < 224; ++x) {     // columns
            for (int d = 0; d < 3; ++d) {   // depth
                // collect one number as a string
                int i;
                for (i = 0; c != '\n' && c != ' ' && c != EOF; ++i) {
                    assert( 0 <= i && i < 14 ); // leave room for the terminating '\0'
                    s[i] = c;
                    c = fgetc(fptr);
                }
                s[i] = '\0';

                float f = atof(s);  // atof returns a double; it is narrowed to float here
                data0[y][x][d] = f; // save on array
                c = fgetc(fptr);
            }
        }
    }
    fclose(fptr);
}
I also exported the images from Python using seven decimal places, and the result seems more accurate. If the float standard in both is the same, even the half-precision digit should be the same. And indeed, there doesn't seem to be any error in the image I import when compared to the one I exported from Python.
There is still, however, an error in the last digits of the final answer in the system. Python displays this answer with eight significant places. I mimic this with %.8g.
My answer:
'tiger cat', 0.42557633
'tabby, tabby cat', 0.35453162
'Egyptian cat', 0.070309319
'lynx, catamount', 0.0073038512
'remote control, remote', 0.0032443549
Python's Answer:
('n02123159', 'tiger_cat', 0.42557606)
('n02123045', 'tabby', 0.35453174)
('n02124075', 'Egyptian_cat', 0.070309244)
('n02127052', 'lynx', 0.007303906)
('n04074963', 'remote_control', 0.0032443653)
The error seems to start appearing after the first convolutional layer, which is where I start making mathematical operations with these values. There could be an error in my implementation, but assuming there isn't, could this be caused by a difference in the way Python operates with floats compared to C? Is this imprecision expected, or is it likely an error in the code?
A 32-bit floating point number can encode about 2^32 different values.
0.98431373 is not one of them.
Finite floating point values are of the form: some_integer * power-of-two.
The closest choice to 0.98431373 is 0.98431372_64251708984375 which is 16514044 * 2^-24.
Printing 0.98431372_64251708984375 to 8 fractional decimal places is 0.98431373. That may appear to be the 32-bit float value, but its exact value differs a small amount.
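You can check this from Python itself; a small sketch using numpy:

import numpy as np
from decimal import Decimal

x = np.float32('0.98431373')
print(Decimal(float(x)))  # 0.9843137264251708984375 (the exact stored value)
print(f'{x:.8f}')         # 0.98431373 when printed to 8 decimal places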
in C that gets changed to "0.98431377"
0.98431377 is not an expected output of a 32-bit float, as the next larger float is 0.98431378_6029815673828125. Most likely OP's conversion code in C goes through a 64-bit double, with some conversion artifacts in the unposted code.
"the way I import the data to C is I take the mantissa, convert it to float, then take then exponent, convert it to long, and then multiply the mantissa by 10^exponent" is too vague. Best to post code than only a description of code.
Is the floating point standard different in Python compared to C?
They could differ, but likely are the same.
And if so, is there a way I can tell Python to use the same format as the one in C?
Not really. More likely the other way around. I am certain C allows more variations on FP than Python.
I currently have an enormous amount of medical records which consist of medical terms that need to be translated. For cost reasons, we don't want to translate every term for each record. For example, if the terms in a record already appeared frequently in previous records, they might already have been translated there, so we don't want to translate them again. I was asked to design a program to accomplish this goal. The hints I got were that I may need to break the records down to the alphabet level, and that a matrix may be needed to solve this problem. I am literally a beginner in programming, so I'm looking for help here. Rough thoughts/suggestions are enough for now. Thanks.
[Edit by Spektre] moved from comments
My problem boils down to this:
Say there are two sentences A and B. A has m tokens (a1, a2, ..., am) and B has n tokens (b1, b2, ..., bn). A and B might have common tokens, so I need a function to estimate the likelihood that tokens in B are not covered by A.
The tokens are already stored in dictionary.
How to implement this?
So if I see it right, you want to know whether a token b[j] of B is not in A.
I do not code in Python, but I see it like this (in C++-like languages):
bool untranslated(int j,int m,int n,string *a,string *b)
{
    // the dictionaries are: a[m], b[n]
    for (int i=0;i<m;i++)  // inspect all tokens of A
        if (b[j]==a[i])    // if b[j] present in A
            return false;
    return true;
}
Now if the dictionaries are rather large, you need to change this linear search to a binary search. Also, to speed things up (if the words are big) you should use hashes (a hash map) for matching. Of course, depending on your language, you cannot compare words naively with ==; rather, implement some function that converts each word into its simplest grammatical form and store just that in the dictionary. That can be pretty complicated to implement.
Now the probability for the whole sentence would be:
// your dictionaries:
const int m=?,n=?;
string A[m],B[n];
// code:
int j; float p;
for (p=0.0,j=0;j<n;j++)          // test all words of B
    if (untranslated(j,m,n,A,B))
        p++;                     // and count how many are untranslated
p/=float(n);                     // normalize p to <0,1>
The resulting probability p is in the range <0,1>: it is the probability that sentence B is not covered by A. If you want a percentage instead, just multiply it by 100.
[Edit1] occurrence of b[i]
That is an entirely different problem, but also solvable relatively easily. It is the same as computing a histogram:
add a counter for each word in the A dictionary
so each record of A will look like this:
struct A_record
{
    string word;
    int cnt;
};
int m=0;
A_record a[];
process the B sentences
for each word b[i], look into dictionary A. If it is not present, add it to the dictionary and set its counter to 1. If it is present, just increment its counter by one.
const int n=?;     // input sentence word count
string b[n]={...}; // input sentence words
int i,j;
for (i=0;i<n;i++)    // process B
{
    for (j=0;j<m;j++) // search in A (should be a binary search or hash-map search)
        if (b[i]==a[j].word)
        { a[j].cnt++; j=-1; break; } // found: a[j].cnt is the b[i] occurrence count you wanted (divided by m it is a probability <0,1>)
    if (j>=0)
    { a[m].word=b[i]; a[m].cnt=1; m++; } // no previous occurrence of b[i]: add it
}
Now if you want just the previous occurrence count of b[i], look at the matched a[j].cnt during the search. If you want the occurrence count of any b[i] word in the whole text, look at the same counter after the whole text is processed.
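Since the asker seems to use Python: the same idea is only a few lines there with a counting dictionary. A minimal sketch (the names are mine, not from any library):

from collections import Counter

seen = Counter()  # word -> number of occurrences in previously processed records

def untranslated_fraction(tokens):
    """Fraction of tokens in this record not seen in any previous record."""
    new = [t for t in tokens if seen[t] == 0]
    seen.update(tokens)
    return len(new) / len(tokens) if tokens else 0.0

print(untranslated_fraction(["fever", "cough"]))   # 1.0 - all tokens are new
print(untranslated_fraction(["fever", "nausea"]))  # 0.5 - "fever" was already seen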
My queries are regarding the uniform random numbers generated by numpy.random.uniform on [0,1).
Does this implementation involve a uniform step-size, i.e. is the universe of possibilities {0, a, 2a, ..., Na} where (N+1)a = 1 and a is constant?
If the above is true, then what's the value of this step-size? I noticed that the value of numpy.nextafter(x,y) keeps on changing depending upon x. Hence my question regarding whether a uniform step-size was used to implement numpy.random.uniform.
If the step-size is not uniform, then what would be the best way to figure out the number of unique values that numpy.random.uniform(low=0, high=1) can take?
What's the recurrence period of numpy.random.uniform, i.e. after how many samples will I see my original number again? For maximum efficiency, this should be equal to the number of unique values.
I tried looking up the source code at Github but didn't find anything directly interpretable there.
The relevant function is
double
rk_double(rk_state *state)
{
    /* shifts : 67108864 = 0x4000000, 9007199254740992 = 0x20000000000000 */
    long a = rk_random(state) >> 5, b = rk_random(state) >> 6;
    return (a * 67108864.0 + b) / 9007199254740992.0;
}
which is found in randomkit.c inside the numpy source tree.
As you can see, the granularity is 1 / 9007199254740992.0, which happens to equal 2^-53, the (downward) float64 resolution at 1.
>>> 1 / 9007199254740992.0
1.1102230246251565e-16
>>> 2**-53
1.1102230246251565e-16
>>> 1-np.nextafter(1.0, 0)
1.1102230246251565e-16
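You can also verify the granularity empirically; a quick check (using the legacy numpy.random interface this code belongs to):

import numpy as np

np.random.seed(0)
u = np.random.uniform(0.0, 1.0, size=100000)
k = u / 2.0**-53                 # exact: dividing by a power of two
print(np.all(k == np.round(k)))  # True: every sample is an integer multiple of 2**-53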
I've started messing around with parallel programming and cython/openmp, and I have a simple program that sums over an array using prange:
import numpy as np
from cython.parallel import prange
from cython import boundscheck, wraparound

@boundscheck(False)
@wraparound(False)
def parallel_summation(double[:] vec):
    cdef int n = vec.shape[0]
    cdef double total
    cdef int i

    for i in prange(n, nogil=True):
        total += vec[i]

    return total
It seems to work OK with a setup.py file. However, I was wondering if it is possible to adjust this function and have a little more control over what the processors are doing.
Let's say I have 4 processors: I want to split the vector to be summed into 4 parts, and then have each processor locally add the elements inside. Then at the end, I can combine the results from each processor to get the total sum. From the cython documentation, I wasn't able to gather whether something like this is possible or not (the documentation is a little sparse).
I'd appreciate it if someone could explain whether/how something like this is done using cython/openmp, or help locate some relevant examples (it's been surprisingly hard to find simple ones online).
I want to split the vector to be summed into 4 parts, and then have each processor locally add the elements inside. Then at the end, I can combine the results from each processor to get the total sum.
That's exactly what's happening here already. Cython infers from your in-place operation that you want to do a reduction. OpenMP will implement the parallel loop with private (zero-initialized) copies of the total variable, and add them all to total at the end of the loop.
In the generated C, this looks like this:
#pragma omp parallel
{
    #pragma omp for firstprivate(__pyx_v_i) lastprivate(__pyx_v_i) reduction(+:__pyx_v_total)
    for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_3; __pyx_t_2++){
        {
            __pyx_v_i = (int)(0 + 1 * __pyx_t_2);
            __pyx_t_4 = __pyx_v_i;
            __pyx_v_total = (__pyx_v_total + (*((double *) ( /* dim=0 */ (__pyx_v_vec.data + __pyx_t_4 * __pyx_v_vec.strides[0]) ))));
        }
    }
}
You just need to enable OpenMP as described here.
The one thing that you should change in your code is to initialize total = 0; otherwise it's just an uninitialized C variable which may contain garbage.
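For reference, the function from the question with just that initialization added (otherwise unchanged):

import numpy as np
from cython.parallel import prange
from cython import boundscheck, wraparound

@boundscheck(False)
@wraparound(False)
def parallel_summation(double[:] vec):
    cdef int n = vec.shape[0]
    cdef double total = 0  # explicit init: the reduction adds its partial sums to this value
    cdef int i

    for i in prange(n, nogil=True):
        total += vec[i]

    return total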
I have noticed that you can put various numbers inside of numpy.random.seed(), for example numpy.random.seed(1), numpy.random.seed(101). What do the different numbers mean? How do you choose the numbers?
Consider a very basic random number generator:
Z[i] = (a*Z[i-1] + c) % m
Here, Z[i] is the i-th random number, a is the multiplier, c is the increment, and m is the modulus - for different a, c and m combinations you have different generators. This is known as the linear congruential generator, introduced by Lehmer. The remainder of that division, the modulus operation (%), generates a number between zero and m-1, and by setting U[i] = Z[i] / m you get random numbers between zero and one.
As you may have noticed, in order to start this generative process - in order to have Z[1] - you need an initial value Z[0]. This initial value that starts the process is called the seed. Take a look at this example:
here the initial value, the seed, is set to 7 to start the process. However, that value itself is not used as a random number; instead, it is used to generate the first Z.
The most important feature of a pseudo-random number generator is its unpredictability. Generally, as long as you don't share your seed, you are fine with any seed, as today's generators are much more complex than this one. As a further step, you can also generate the seed itself randomly, or skip the first n numbers as another alternative.
Main source: Law, A. M. (2007). Simulation modeling and analysis. Tata McGraw-Hill.
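A toy version of such a generator in Python (illustrative only; the a, c, m values here are the classic glibc constants, not what numpy uses):

def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Linear congruential generator yielding U[i] = Z[i] / m in [0, 1)."""
    z = seed
    while True:
        z = (a * z + c) % m
        yield z / m

gen = lcg(7)  # the same seed reproduces the same stream every time
print([round(next(gen), 6) for _ in range(3)])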
The short answer:
There are three ways to seed() a random number generator in numpy.random:
use no argument or use None -- the RNG initializes itself from the OS's random number generator (which generally is cryptographically random)
use some 32-bit integer N -- the RNG will use this to initialize its state based on a deterministic function (same seed → same state)
use an array-like sequence of 32-bit integers n0, n1, n2, etc. -- again, the RNG will use this to initialize its state based on a deterministic function (same values for seed → same state). This is intended to be done with a hash function of sorts, although there are magic numbers in the source code and it's not clear why they are doing what they're doing.
If you want to do something repeatable and simple, use a single integer.
If you want to do something repeatable but unlikely for a third party to guess, use a tuple or a list or a numpy array containing some sequence of 32-bit integers. You could, for example, use numpy.random with a seed of None to generate a bunch of 32-bit integers (say, 32 of them, which would generate a total of 1024 bits) from the OS's RNG, store in some seed S which you save in some secret place, then use that seed to generate whatever sequence R of pseudorandom numbers you wish. Then you can later recreate that sequence by re-seeding with S again, and as long as you keep the value of S secret (as well as the generated numbers R), no one would be able to reproduce that sequence R. If you just use a single integer, there's only 4 billion possibilities and someone could potentially try them all. That may be a bit on the paranoid side, but you could do it.
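A minimal demonstration of the three seeding modes with the legacy numpy.random API:

import numpy as np

np.random.seed()           # no argument: seeded from the OS's random source
np.random.seed(42)         # single 32-bit integer: reproducible
print(np.random.random(2))

np.random.seed([1, 2, 3])  # sequence of 32-bit integers: also reproducible
print(np.random.random(2))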
Longer answer
The numpy.random module uses the Mersenne Twister algorithm, which you can confirm yourself in one of two ways:
Either by looking at the documentation for numpy.random.RandomState, of which numpy.random uses an instance for the numpy.random.* functions (you can also use an isolated, independent instance of it), or
by looking at the source code in mtrand.pyx, which uses something called Pyrex to wrap a fast C implementation, plus randomkit.c and initarray.c.
In any case here's what the numpy.random.RandomState documentation says about seed():
Compatibility Guarantee
A fixed seed and a fixed series of calls to RandomState methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged.
Parameters:
seed : {None, int, array_like}, optional
Random seed used to initialize the pseudo-random number generator. Can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If seed is None, then RandomState will try to read data from /dev/urandom (or the Windows analogue) if available or seed from the clock otherwise.
It doesn't say how the seed is used, but if you dig into the source code it refers to the init_by_array function (docstring elided):
def seed(self, seed=None):
    cdef rk_error errcode
    cdef ndarray obj "arrayObject_obj"
    try:
        if seed is None:
            with self.lock:
                errcode = rk_randomseed(self.internal_state)
        else:
            idx = operator.index(seed)
            if idx > int(2**32 - 1) or idx < 0:
                raise ValueError("Seed must be between 0 and 2**32 - 1")
            with self.lock:
                rk_seed(idx, self.internal_state)
    except TypeError:
        obj = np.asarray(seed).astype(np.int64, casting='safe')
        if ((obj > int(2**32 - 1)) | (obj < 0)).any():
            raise ValueError("Seed must be between 0 and 2**32 - 1")
        obj = obj.astype('L', casting='unsafe')
        with self.lock:
            init_by_array(self.internal_state, <unsigned long *>PyArray_DATA(obj),
                          PyArray_DIM(obj, 0))
And here's what the init_by_array function looks like:
extern void
init_by_array(rk_state *self, unsigned long init_key[], npy_intp key_length)
{
    /* was signed in the original code. RDH 12/16/2002 */
    npy_intp i = 1;
    npy_intp j = 0;
    unsigned long *mt = self->key;
    npy_intp k;

    init_genrand(self, 19650218UL);
    k = (RK_STATE_LEN > key_length ? RK_STATE_LEN : key_length);
    for (; k; k--) {
        /* non linear */
        mt[i] = (mt[i] ^ ((mt[i - 1] ^ (mt[i - 1] >> 30)) * 1664525UL))
            + init_key[j] + j;
        /* for > 32 bit machines */
        mt[i] &= 0xffffffffUL;
        i++;
        j++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
        if (j >= key_length) {
            j = 0;
        }
    }
    for (k = RK_STATE_LEN - 1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
            - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i >= RK_STATE_LEN) {
            mt[0] = mt[RK_STATE_LEN - 1];
            i = 1;
        }
    }

    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */

    self->gauss = 0;
    self->has_gauss = 0;
    self->has_binomial = 0;
}
This essentially "munges" the random number state in a nonlinear, hash-like method using each value within the provided sequence of seed values.
What is normally called a random number sequence is in reality a "pseudo-random" number sequence, because the values are computed using a deterministic algorithm and probability plays no real role.
The "seed" is a starting point for the sequence and the guarantee is that if you start from the same seed you will get the same sequence of numbers. This is very useful for example for debugging (when you are looking for an error in a program you need to be able to reproduce the problem and study it, a non-deterministic program would be much harder to debug because every run would be different).
Basically the number guarantees the same 'randomness' every time.
More properly, the number is a seed, which can be an integer, an array (or other sequence) of integers of any length, or the default (None). If the seed is None, then random will try to read data from /dev/urandom if available, or make a seed from the clock otherwise.
Edit: In all honesty, as long as your program isn't something that needs to be super secure, it shouldn't matter what you pick. If it is, don't use these methods - use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator.
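For example, a minimal sketch:

import os
import random

print(os.urandom(16))           # 16 bytes from the OS's cryptographic random source
secure = random.SystemRandom()  # random.Random variant backed by os.urandom
print(secure.random())          # cryptographically strong float in [0, 1)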
The most important concept to understand here is that of pseudo-randomness. Once you understand this idea, you can determine if your program really needs a seed etc. I'd recommend reading here.
To understand the meaning of random seeds, you first need to understand that a "random" number sequence is really a "pseudo-random" sequence whose values are computed using a deterministic algorithm.
So you can think of this number as a starting value for calculating the next number you get from the random generator. Putting the same value here will make your program get the same "random" value every time, so your program becomes deterministic.
As said in this post
they (numpy.random and random.random) both use the Mersenne twister sequence to generate their random numbers, and they're both completely deterministic - that is, if you know a few key bits of information, it's possible to predict with absolute certainty what number will come next.
If you really care about randomness, ask the user to generate some noise (some arbitrary words) or just use the system time as the seed.
If your code runs on an Intel CPU (or AMD with the newest chips), I also suggest you check the RdRand package, which uses the CPU instruction rdrand to collect "true" (hardware) randomness.
Refs:
Random seed
What is a seed in terms of generating a random number
One very specific answer: np.random.seed can take values from 0 to 2**32 - 1, which interestingly differs from random.seed, which can take any hashable object.
A side comment: it is better to set your seed to a rather large number, but still within the generator's limit. Doing so lets the seed number have a good balance of 0 and 1 bits. Avoid having many 0 bits in the seed.
Reference: pyTorch documentation