I'm looking for a Python object which is guaranteed to compare greater than any given int. It should be portable, platform-independent and work on both Python 2.7+ and 3.x.
For example:
x = float('inf')
while True:
    n = next(my_gen)
    if my_calc(n):
        x = min(n, x)
    if my_cond(x):
        break
Here I've used float('inf') for this purpose because it seems to behave correctly. But this feels dirty, because I think it relies on some underlying float specification and I don't know whether that's going to be platform dependent or break in unexpected ways.
I'm aware that I could create my own class and define the comparison operators, but I thought there might be an existing built-in way.
Is it safe to use float('inf') like this? Is there a less ugly way of creating this "biggest integer"?
float('inf') is guaranteed to compare greater than any number, including integers. This is not platform-specific.
From the floatobject.c source code:
else if (!Py_IS_FINITE(i)) {
    if (PyInt_Check(w) || PyLong_Check(w))
        /* If i is an infinity, its magnitude exceeds any
         * finite integer, so it doesn't matter which int we
         * compare i with. If i is a NaN, similarly.
         */
        j = 0.0;
Python integers themselves are bounded only by available memory, so no fixed constant like 10 ** 3000 is guaranteed to be big enough.
float('inf') is always available; Python will handle underlying platform specifics for you to make this so.
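For instance, a quick sanity check (same result on Python 2.7 and 3.x):

big = 10 ** 3000                      # an int far beyond the range of a float
print(float('inf') > big)             # True
print(min(big, float('inf')) == big)  # True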
Why not just use:
x = float('inf')
instead of:
x = 1e3000
Read this post for more information.
In the following I remove the need for that first sentinel value of x by using an outer while loop to capture the first valid x, then running the preserved inner while loop with it:
while True:
    n = next(my_gen)
    if my_calc(n):
        x = n
        if my_cond(x):
            break
        else:
            while True:
                n = next(my_gen)
                if my_calc(n):
                    x = min(n, x)
                if my_cond(x):
                    break
            break
It is more code. Usually the removal of sentinel values is good, but the above would have to be assessed for maintainability.
Further factoring of the code gives the following, but the code above preserves more of the original conditionals.
while True:
    n = next(my_gen)
    if my_calc(n):
        x = n
        if not my_cond(x):
            while True:
                n = next(my_gen)
                if my_calc(n):
                    x = min(n, x)
                if my_cond(x):
                    break
        break
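For comparison, here is a sketch of the same sentinel-free idea using a filtered generator (assuming the question's my_gen, my_calc and my_cond); whether it is more maintainable is the same judgment call:

valid = (n for n in my_gen if my_calc(n))  # only values that pass my_calc
x = next(valid)                            # the first valid value seeds the minimum
while not my_cond(x):
    x = min(next(valid), x)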
Related
I'm implementing Wiener's Exponent Attack using Python and SageMath.
My code is as follows:
from sage.all import *

# constants
b = some_very_large_number
n = some_very_large_number

b_over_n = continued_fraction(b/n)

i = 0
while True:
    t_over_a = b_over_n.convergent(i+1)
    t = t_over_a.numerator()
    a = t_over_a.denominator()
    # check if t divides a*b - 1
    if (t != 0) and (gcd(a*b - 1, t) == t):
        print("Found i: ", i)
        break
    i += 1
I found that the loop would never end, so I added this line of code before the while loop:
print(b_over_n.convergent(5))
And I found that b_over_n was always returning 0 no matter what.
I also printed out type(b_over_n) and checked it was of 'long' type.
I have checked SageMath manuals but haven't found anything useful yet.
Is there something I'm doing wrong here?
It turns out I was using Python 2, where int/int performs integer (floor) division.
Thus, since b was smaller than n in my case, b/n evaluated to 0.
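A minimal sketch of a fix, under the assumption that b and n are plain integers (for cryptographic sizes you want the exact rational, not a float):

# Option 1: keep the quotient exact as a Sage Rational, which
# continued_fraction() accepts (assumes b, n are ints/Integers):
b_over_n = continued_fraction(Integer(b) / Integer(n))

# Option 2 (Python 2 only): make / true division module-wide.
# Note a float quotient loses precision for very large b, n,
# so Option 1 is the safer choice for Wiener's attack.
# from __future__ import division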
I could do this by brute force, but I was hoping there was clever coding, or perhaps an existing function, or something I am not realising...
So some examples of numbers I want:
00000000001111110000
11111100000000000000
01010101010100000000
10101010101000000000
00100100100100100100
The full permutation. Except with results that have ONLY six 1's. Not more. Not less. 64 or 32 bits would be ideal. 16 bits if that provides an answer.
I think what you need here is the itertools module.
BAD SOLUTION
But you need to be careful: for instance, something like permutations only works for very small inputs, i.e.:
Something like the below would give you a binary representation:
>>> ["".join(v) for v in set(itertools.permutations(["1"]*2+["0"]*3))]
['11000', '01001', '00101', '00011', '10010', '01100', '01010', '10001', '00110', '10100']
then just get the decimal value of each bit string by parsing it in base 2:
>>> [int("".join(v), 2) for v in set(itertools.permutations(["1"]*2+["0"]*3))]
[24, 9, 5, 3, 18, 12, 10, 17, 6, 20]
if you wanted 32 bits with 6 ones and 26 zeros, you'd use:
>>> [int("".join(v), 2) for v in set(itertools.permutations(["1"]*6+["0"]*26))]
but this computation would take a supercomputer to deal with (32! = 263130836933693530167218012160000000).
DECENT SOLUTION
So a more clever way to do it is using combinations, maybe something like this:
import itertools

num_bits = 32
num_ones = 6

lst = [
    f"{sum(2**vv for vv in v):b}".zfill(num_bits)
    for v in itertools.combinations(range(num_bits), num_ones)
]
print(len(lst))
print(len(lst))
this tells us there are 906192 numbers with exactly six ones in the whole spectrum of 32-bit numbers.
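That count can be cross-checked without generating anything, since it is simply C(32, 6); a one-liner assuming Python 3.8+ for math.comb:

import math
print(math.comb(32, 6))  # 906192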
CREDITS:
Credits for this answer go to @Mark Dickinson, who pointed out that using permutations was infeasible and suggested using combinations.
Well, I am not a Python coder, so I cannot post valid code for you. Instead I can do a C++ version...
If you look at your problem, you set 6 bits among many zeros... so I would approach this with 6 nested for loops computing all the possible positions of the 1s, and set those bits...
Something like:
for (i0=   0;i0<32-5;i0++)
 for (i1=i0+1;i1<32-4;i1++)
  for (i2=i1+1;i2<32-3;i2++)
   for (i3=i2+1;i3<32-2;i3++)
    for (i4=i3+1;i4<32-1;i4++)
     for (i5=i4+1;i5<32-0;i5++)
      {
      // here i0,...,i5 mark the set bit positions
      }
So the O(2^32) scan becomes C(32,6) = 906192 iterations, and you cannot go faster than that, as fewer iterations would mean missing valid solutions...
I assume you want to print the numbers, so for speed you can build each number as a binary string from the start, avoiding slow conversions between string and number...
The nested for loops can be encoded as an increment operation on an array (similar to bignum arithmetic).
When I put all together I got this C++ code:
int generate()
{
    const int n1=6;        // number of set bits
    const int n=32;        // number of bits
    char x[n+2];           // output number string
    int i[n1],j,cnt;       // nested for loop iterators and found solutions count
    for (j=0;j<n;j++) x[j]='0'; x[j]='b'; j++; x[j]=0;  // x = 0
    for (j=0;j<n1;j++){ i[j]=j; x[i[j]]='1'; }          // first solution
    for (cnt=0;;)
    {
        // Form1->mm_log->Lines->Add(x);  // here x is the valid answer to print
        cnt++;
        for (j=n1-1;j>=0;j--)             // this emulates n1 nested for loops
        {
            x[i[j]]='0'; i[j]++;
            if (i[j]<n-n1+j+1){ x[i[j]]='1'; break; }
        }
        if (j<0) break;
        for (j++;j<n1;j++){ i[j]=i[j-1]+1; x[i[j]]='1'; }
    }
    return cnt; // found valid answers
}
When I use this with n1=6, n=32 I get this output (without printing the numbers):
cnt = 906192
and it was finished in 4.246 ms on AMD A8-5500 3.2GHz (win7 x64 32bit app no threads) which is fast enough for me...
Beware: once you start outputting the numbers somewhere, the speed will drop drastically, especially if you output to a console or whatever... It might be better to buffer the output somehow, e.g. writing 1024 number strings at once... But as I mentioned before, I am no Python coder, so that might already be handled by the environment...
On top of all this, once you play with the variables n1, n you can do the same for zeros instead of ones as a faster approach (if there are fewer zeros than ones, use the nested for loops to mark zero positions instead of one positions).
If the solutions are wanted as numbers (not strings), then it's possible to rewrite this so that i[] or i0,...,i5 hold bitmasks instead of bit positions... instead of inc/dec you just shift left/right... and there is no need for the x array any more, as the number would be x = i0|...|i5 ...
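The bitmask idea in the previous paragraph is essentially a known bit trick (Gosper's hack, HAKMEM item 175) for stepping to the next integer with the same popcount; a minimal Python sketch, offered as an illustration rather than a port of the C++ above:

def next_same_popcount(v):
    # smallest integer greater than v with the same number of set bits
    c = v & -v              # lowest set bit of v
    r = v + c               # ripple the carry upward
    return (((r ^ v) >> 2) // c) | r

x = (1 << 6) - 1            # 0b111111, the smallest 6-one pattern
count = 0
while x < (1 << 32):
    count += 1
    x = next_same_popcount(x)
print(count)                # 906192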
You could create a counter array for positions of 1s in the number and assemble it by shifting the bits in their respective positions. I created an example below. It runs pretty fast (less than a second for 32 bits on my laptop):
bitCount = 32
oneCount = 6
maxBit = 1 << (bitCount-1)
ones = [1 << b for b in reversed(range(oneCount))]  # start with bits on low end
ones[0] >>= 1  # shift back 1st one because it will be incremented at start of loop
index = 0
result = []
while index < len(ones):
    ones[index] <<= 1            # shift one at current position
    if index == 0:
        number = sum(ones)       # build output number
        result.append(number)
    if ones[index] == maxBit:
        index += 1               # go to next position when bit reaches max
    elif index > 0:
        index -= 1               # return to previous position
        ones[index] = ones[index+1]  # and prepare it to move up (relative to next)
64 bits takes about a minute; run time is roughly proportional to the number of values output, i.e. O(n).
The same approach can be expressed more concisely in a recursive generator function which will allow more efficient use of the bit patterns:
def genOneBits(bitcount=32, onecount=6):
    for bitPos in range(onecount-1, bitcount):
        value = 1 << bitPos
        if onecount == 1: yield value; continue
        for otherBits in genOneBits(bitPos, onecount-1):
            yield value + otherBits

result = [n for n in genOneBits(32, 6)]
This is not faster when you get all the numbers but it allows partial access to the list without going through all values.
If you need direct access to the Nth bit pattern (e.g. to get a random one-bits pattern), you can use the following function. It works like indexing a list but without having to generate the list of patterns.
def numOneBits(bitcount=32, onecount=6):
    def factorial(X): return 1 if X < 2 else X * factorial(X-1)
    return factorial(bitcount)//factorial(onecount)//factorial(bitcount-onecount)

def nthOneBits(N, bitcount=32, onecount=6):
    if onecount == 1: return 1 << N
    bitPos = 0
    while bitPos <= bitcount-onecount:
        group = numOneBits(bitcount-bitPos-1, onecount-1)
        if N < group: break
        N -= group
        bitPos += 1
    if bitPos > bitcount-onecount: return None
    result = 1 << bitPos
    result |= nthOneBits(N, bitcount-bitPos-1, onecount-1) << (bitPos+1)
    return result
# bit pattern at position 1000:
nthOneBits(1000) # --> 10485799 (00000000101000000000000000100111)
This allows you to get the bit patterns on very large integers that would be impossible to generate completely:
nthOneBits(10000, bitcount=256, onecount=9)
# 77371252457588066994880639
# 100000000000000000000000000000000001000000000000000000000000000000000000000000001111111
It is worth noting that the pattern order does not follow the numerical order of the corresponding numbers.
Although nthOneBits() can produce any pattern instantly, it is much slower than the other functions when mass producing patterns. If you need to manipulate them sequentially, you should go for the generator function instead of looping on nthOneBits().
Also, it should be fairly easy to tweak the generator to have it start at a specific pattern so you could get the best of both approaches.
Finally, it may be useful to obtain the next bit pattern given a known pattern. This is what the following function does:
def nextOneBits(N=0, bitcount=32, onecount=6):
    if N == 0: return (1 << onecount) - 1
    bitPositions = []
    for pos in range(bitcount):
        bit = N % 2
        N //= 2
        if bit == 1: bitPositions.insert(0, pos)
    index = 0
    result = None
    while index < onecount:
        bitPositions[index] += 1
        if bitPositions[index] == bitcount:
            index += 1
            continue
        if index == 0:
            result = sum(1 << bp for bp in bitPositions)
            break
        if index > 0:
            index -= 1
            bitPositions[index] = bitPositions[index+1]
    return result
nthOneBits(12) #--> 131103 00000000000000100000000000011111
nextOneBits(131103) #--> 262175 00000000000001000000000000011111 5.7ns
nthOneBits(13) #--> 262175 00000000000001000000000000011111 49.2ns
Like nthOneBits(), this one does not need any setup time. It could be used in combination with nthOneBits() to get subsequent patterns after getting an initial one at a given position. nextOneBits() is much faster than nthOneBits(i+1) but is still slower than the generator function.
For very large integers, using nthOneBits() and nextOneBits() may be the only practical options.
You are dealing with permutations of multisets. There are many ways to achieve this and, as @BPL points out, doing it efficiently is non-trivial. Many great methods are mentioned here: permutations with unique values. The cleanest (though not necessarily the most efficient) is to use multiset_permutations from the sympy module.
import time
from sympy.utilities.iterables import multiset_permutations

t = time.process_time()

## Credit to @BPL for the general setup
multiPerms = ["".join(v) for v in multiset_permutations(["1"]*6+["0"]*26)]

elapsed_time = time.process_time() - t
print(elapsed_time)
On my machine, the above computes in just over 8 seconds. It generates just under a million results as well:
len(multiPerms)
906192
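If the patterns are wanted as integers rather than bit strings (an assumption on my part, mirroring the earlier answer), each string converts directly in base 2:

nums = [int(s, 2) for s in multiPerms]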
x = [1,2,3,4,5,6,7,8,9,10]
#Random list elements
for i in range(int(len(x)/2)):
    value = x[i]
    x[i] = x[len(x)-i-1]
    x[len(x)-i-1] = value
#Confusion on efficiency
print(x)
This is a first-year uni course, so no Python shortcuts are allowed.
Not sure what counts as "a shortcut" (reversed and the "Martian Smiley" [::-1] being obvious candidates -- but does either count as "a shortcut"?!), but at least a couple small improvements are easy:
L = len(x)
for i in range(L//2):
    mirror = L - i - 1
    x[i], x[mirror] = x[mirror], x[i]
This gets len(x) only once -- it's a fast operation, but there's no reason to keep repeating it over and over -- computes mirror just once per iteration, does the swap more directly, and halves L (for the range argument) with the truncating-division operator // rather than using true division and then truncating with int. Nanoseconds each, but it may be considered slightly clearer as well as microscopically faster.
x = [1,2,3,4,5,6,7,8,9,10]
x = x.__getitem__(slice(None,None,-1))
slice is a Python builtin object (like the range and len you used in your example).
__getitem__ is a method belonging to iterable types (of which x is one).
there are absolutely no shortcuts here :) and its effectively one line.
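For what it's worth, the call spells out exactly what the usual idioms do under the hood, as a quick check confirms:

x = [1,2,3,4,5,6,7,8,9,10]
assert x.__getitem__(slice(None, None, -1)) == x[::-1] == list(reversed(x))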
I currently have the code below set as my randprime(p,q) function. Is there any way to condense this, via something like a genexp or listcomp? Here's my function:
n = randint(p, q)
while not isPrime(n):
    n = randint(p, q)
It's better to just generate the list of primes and then choose from that list.
As is, your code has a slim chance of hitting an infinite loop: if there are no primes in the interval, or if randint keeps picking non-primes, the while loop will never end.
So this is probably shorter and less troublesome:
import random
primes = [i for i in range(p,q) if isPrime(i)]
n = random.choice(primes)
The other advantage of this is that there is no chance of an endless loop if there are no primes in the interval (random.choice on an empty list raises an error instead). As stated, this can be slow depending on the range, so it would be quicker if you cached the primes ahead of time:
# initialising primes
minPrime = 0
maxPrime = 1000
cached_primes = [i for i in range(minPrime,maxPrime) if isPrime(i)]
#elsewhere in the code
import random
n = random.choice([i for i in cached_primes if p<i<q])
Again, further optimisations are possible, but are very much dependent on your actual code... and you know what they say about premature optimisation.
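One further refinement, assuming cached_primes is sorted ascending (it is, coming from a range scan): bisect can locate the p..q slice in O(log n) instead of scanning the whole cache:

import bisect
import random

lo = bisect.bisect_right(cached_primes, p)  # first index with a prime > p
hi = bisect.bisect_left(cached_primes, q)   # first index with a prime >= q
n = random.choice(cached_primes[lo:hi])     # primes strictly between p and q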
Here is a script written in Python to generate n random prime integers between two given integers:
import numpy as np

def getRandomPrimeInteger(bounds):
    for i in range(len(bounds) - 1):
        if bounds[i + 1] > bounds[i]:
            x = bounds[i] + np.random.randint(bounds[i+1] - bounds[i])
            if isPrime(x):
                return x
        else:
            if isPrime(bounds[i]):
                return bounds[i]
            if isPrime(bounds[i + 1]):
                return bounds[i + 1]

    # no prime found: split each interval in half and try again
    newBounds = [0 for i in range(2*len(bounds) - 1)]
    newBounds[0] = bounds[0]
    for i in range(1, len(bounds)):
        newBounds[2*i - 1] = int((bounds[i-1] + bounds[i])/2)
        newBounds[2*i] = bounds[i]

    return getRandomPrimeInteger(newBounds)

def isPrime(x):
    count = 0
    for i in range(int(x/2)):
        if x % (i+1) == 0:
            count = count + 1
    return count == 1

#ex: get 50 random prime integers between 100 and 10000:
bounds = [100, 10000]
for i in range(50):
    x = getRandomPrimeInteger(bounds)
    print(x)
So it would be great if you could use an iterator that yields the integers from p to q in random order (without replacement). I haven't found a way to do that. The following will give random integers in that range and will skip anything it has tested already.
import random

fail = False
tested = set([])
n = random.randint(p, q)
while not isPrime(n):
    tested.add(n)
    if len(tested) == q - p + 1:  # every integer in [p, q] has been tried
        fail = True
        break
    while n in tested:
        n = random.randint(p, q)
if fail:
    print 'I failed'
else:
    print n, ' is prime'
The big advantage of this is that if say the range you're testing is just (14,15), your code would run forever. This code is guaranteed to produce an answer if such a prime exists, and tell you there isn't one if such a prime does not exist. You can obviously make this more compact, but I'm trying to show the logic.
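Incidentally, the "random order without replacement" iterator wished for above can be emulated by shuffling, memory permitting; a sketch assuming the question's isPrime:

import random

candidates = list(range(p, q + 1))
random.shuffle(candidates)  # random order, no repeats
n = next((c for c in candidates if isPrime(c)), None)  # None if no prime in range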
next(i for i in itertools.imap(lambda x: random.randint(p,q)|1,itertools.count()) if isPrime(i))
This starts with itertools.count() - this gives an infinite sequence of integers.
Each number is mapped to a new random number in the range by itertools.imap(). imap is like map, but returns an iterator rather than a list - we don't want to generate a list of infinite random numbers!
Then, the first matching number is found, and returned.
Works efficiently, even if p and q are very far apart - e.g. 1 and 10**30, where generating a full list won't do!
By the way, this is not more efficient than your code above, and is a lot more difficult to understand at a glance - please have some consideration for the next programmer to have to read your code, and just do it as you did above. That programmer might be you in six months, when you've forgotten what this code was supposed to do!
P.S - in practice, you might want to replace count() with xrange (NOT range!), e.g. xrange(int((q-p)**1.5)+20), to do no more than that number of attempts (balanced between limited tests for small ranges and large ranges, with no more than a 1/2% chance of failing when it could succeed); otherwise, as was suggested in another post, you might loop forever.
PPS - improvement: replaced random.randint(p,q) with random.randint(p,q)|1 - forcing the candidate to be odd makes the code twice as efficient, but eliminates the possibility that the result will be 2.
I'm doing some statistics work. I have a (large) collection of random numbers to compute the mean of, and I'd like to work with generators, because I only need the mean, so I don't need to store the numbers.
The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this?
It would be nice if I could say sum(values)/len(values), but len doesn't work for generators, and sum has already consumed values by then.
Here's an example:
import numpy

def my_mean(values):
    n = 0
    Sum = 0.0
    try:
        while True:
            Sum += next(values)
            n += 1
    except StopIteration: pass
    return float(Sum)/n

X = [k for k in range(1,7)]
Y = (k for k in range(1,7))

print numpy.mean(X)
print my_mean(Y)
These both give the same, correct answer, but my_mean doesn't work for lists, and numpy.mean doesn't work for generators.
I really like the idea of working with generators, but details like this seem to spoil things.
In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.
The simplest of these (that I know) is usually credited to Knuth, and also calculates variance. The link contains a python implementation, but just the mean portion is copied here for completeness.
def mean(data):
    n = 0
    mean = 0.0
    for x in data:
        n += 1
        mean += (x - mean)/n
    if n < 1:
        return float('nan')
    else:
        return mean
I know this question is super old, but it's still the first hit on google, so it seemed appropriate to post. I'm still sad that the python standard library doesn't contain this simple piece of code.
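A quick sanity check of the function above, using the question's generator:

print(mean(k for k in range(1, 7)))  # 3.5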
Just one simple change to your code would let you use both. Generators were meant to be used interchangeably with lists in a for loop.
def my_mean(values):
    n = 0
    Sum = 0.0
    for v in values:
        Sum += v
        n += 1
    return Sum / n

Or, letting enumerate do the counting:

def my_mean(values):
    total = 0
    for n, v in enumerate(values, 1):
        total += v
    return total / n

print my_mean(X)
print my_mean(Y)
There is statistics.mean() in Python 3.4 but it calls list() on the input:
def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    return _sum(data)/n
where _sum() returns an accurate sum (a math.fsum()-like function that, in addition to float, also supports Fraction and Decimal).
The old-fashioned way to do it:
def my_mean(values):
    sum, n = 0, 0
    for x in values:
        sum += x
        n += 1
    return float(sum)/n
One way would be
numpy.fromiter(Y, int).mean()
but this actually temporarily stores the numbers.
Your approach is a good one, but you should use the for x in y idiom instead of repeatedly calling next until you get a StopIteration. It works for both lists and generators:
def my_mean(values):
    n = 0
    Sum = 0.0
    for value in values:
        Sum += value
        n += 1
    return float(Sum)/n
You can use reduce without knowing the size of the array:
from itertools import izip, count
reduce(lambda c,i: (c*(i[1]-1) + float(i[0]))/i[1], izip(values,count(1)),0)
def my_mean(values):
    n = 0
    sum = 0
    for v in values:
        sum += v
        n += 1
    return sum/n
The above is very similar to your code, except by using for to iterate values you are good no matter if you get a list or an iterator.
The Python builtin sum is, however, very optimized, so unless the list is really, really long, you might be happier temporarily storing the data.
(Also notice that since you are using python3, you don't need float(sum)/n)
If you know the length of the generator in advance and you want to avoid storing the full list in memory, you can use:
reduce(np.add, generator)/length
Try:
import itertools

def mean(i):
    (i1, i2) = itertools.tee(i, 2)
    return sum(i1) / sum(1 for _ in i2)

print mean([1,2,3,4,5])
tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting.
(Note that 'tee' will still use intermediate storage).