Should I pickle primes in Python to produce primorials? - python

I produced an algorithm in Python to sieve the first n primes, then list the ordered pairs of the index n and the nth primorial p_n#.
Next I evaluate a function of n and p_n#, and finally, the objective, to determine whether the function f(n, p_n#) is monotonic, so the algorithm assesses where the sequence changes from rising to falling and vice versa. The code is listed here for what it's worth.
This is of course memory-intensive and my PC can only cope with numbers up to around 2,000,000.
At any given point all I actually need is f(n-1), the ordered pair (n, p_n#), the prime p_n (in order to quickly find the next prime), and a boolean indicating whether the sequence most recently rose or fell.
What are the best approaches to avoid storing a hundred thousand or more primes and primorials in memory while preserving speed?
I thought a first step would be to make a sieve that finds the next prime above a given prime, rather than every prime below some maximum. Then I can evaluate the next value of the function.
But I also wondered if it would be better to sieve batches of, say, 100 primes at a time. This could be supported by some "perpetual list" of ordered triples [n, p_n, p_n#] containing only n = 100, 200, 300, ..., which I generate before runtime. Searching, I found the concept of "pickling" a list and wondered if this is the right scenario in which to use it, or is there a better way?
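A sketch of one possible approach (not the asker's code): a segmented sieve that yields primes one at a time while keeping in memory only the current segment and the "base" primes up to the square root of the segment's upper bound, plus a driver loop that carries just the running state described above. The function f, the segment size and the cutoff max_n are placeholders.

from math import isqrt

def simple_sieve(limit):
    # Classic sieve of Eratosthenes: all primes <= limit.
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(flags[p * p :: p]))
    return [i for i, flag in enumerate(flags) if flag]

def prime_stream(segment_size=100_000):
    # Yield primes in order; only the current segment and the base primes
    # up to sqrt(high) are ever held in memory.
    low = 2
    while True:
        high = low + segment_size
        base = simple_sieve(isqrt(high) + 1)
        seg = bytearray([1]) * (high - low)   # seg[i] ~ "is low + i prime?"
        for p in base:
            # first multiple of p inside the segment, never p itself
            start = max(p * p, ((low + p - 1) // p) * p)
            if start < high:
                seg[start - low :: p] = bytearray(len(seg[start - low :: p]))
        for i, flag in enumerate(seg):
            if flag:
                yield low + i
        low = high

def scan_direction_changes(f, max_n):
    # Carry only the running state: n, p_n, p_n#, the previous f value
    # and whether the sequence most recently rose or fell.
    primorial = 1
    prev_value = None
    was_rising = None
    for n, p in enumerate(prime_stream(), start=1):
        if n > max_n:
            break
        primorial *= p                 # p_n# built incrementally
        value = f(n, primorial)
        if prev_value is not None:
            rising = value > prev_value
            if was_rising is not None and rising != was_rising:
                print(f"direction change at n={n} (p_n={p})")
            was_rising = rising
        prev_value = value

As for pickling: it is the right tool for saving the small resumable state between runs (for example pickle.dump((n, p, primorial, prev_value, was_rising), fh), then pickle.load to resume later), not for storing every prime and primorial, which is exactly the memory cost to be avoided.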

Related

Can a genetic algorithm optimize my NP-complete problem?

I have an array that stores a large collection of elements.
Any two elements can be compared by some function, with the result being true or false.
The problem is to find the largest, or at least a relatively large, subgroup in which every element is in a true relationship with all the others.
Finding the largest such subgroup in an array of size N requires an exponential number of operations by brute force, so the exhaustive way is out.
Randomly adding successive matching elements works, but the resulting subgroups are too small.
Can this problem be significantly optimised using a genetic algorithm, so that much larger subgroups can be found in a reasonable time?
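For reference, a minimal sketch of the random greedy baseline the question describes: shuffle the elements, add each one that is compatible with everything chosen so far, and keep the best subgroup over several tries. The function related(a, b) stands in for the asker's boolean comparison and is an assumption.

import random

def random_greedy_subgroup(elements, related, tries=1000):
    # Repeat the random greedy construction and keep the largest
    # mutually compatible subgroup found.
    best = []
    for _ in range(tries):
        order = list(elements)
        random.shuffle(order)
        group = []
        for e in order:
            if all(related(e, g) for g in group):
                group.append(e)
        if len(group) > len(best):
            best = group
    return best

A genetic algorithm would essentially replace these independent restarts with a population of candidate subgroups that are mutated and recombined, a common heuristic for this maximum-clique-style problem.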

Time and Space analysis in python [closed]

Can someone provide examples of O(log(n)) and O(n log(n)) problems for both time and space?
I am quite new to this type of analysis and cannot see past polynomial time/space.
What I don't get is how you can have O(1) < O(log(n)) < O(n); is that like "semi-constant"?
Additionally, I would appreciate examples which cover these cases for both time and space.
I find space analysis a bit more ambiguous, so it would be nice to see it compared side by side with the corresponding time analysis in the same place, something I couldn't find reliably online.
Can you provide examples for each case in both space and time analysis?
Before examples, a little clarification on big O notation
Perhaps I'm mistaken, but seeing
What I don't get is how you can have O(1) < O(log(n)) < O(n); is that like "semi-constant"?
makes me think that you have been introduced to the idea of big-O notation as the number of operations to be carried out (or the number of bytes to be stored, etc.), e.g. if you have a loop for(int i=0;i<n;++i) then there are n operations, so the time complexity is O(n). While this is a nice first intuition, I think it can be misleading, as big-O notation defines an upper asymptotic bound.
Let's say that you have chosen an algorithm to sort an array of numbers; let x denote the number of elements in that array and f(x) the time complexity of that algorithm. Saying that the algorithm is O(g(x)) means that there exist a positive real number alpha and a threshold x_t such that, for every x_i > x_t, abs(f(x_i)) <= alpha*g(x_i).
As a result, an algorithm that is O(1) doesn't always take the same constant time; rather, you can be sure that no matter how much data it is given, the time it takes to complete its task will stay below some constant amount of time, e.g. 5 seconds. Similarly, O(log(n)) doesn't mean that there is any notion of a semi-constant. It just means that 1) the time the algorithm takes may depend on the size of the dataset you feed it, and 2) if the dataset is large enough (i.e. n is sufficiently large), then the time it takes to complete will always be at most a constant multiple of log(n).
Some examples regarding time complexity
O(1): Accessing an element from an array.
O(log(n)): binary search in a sorted array (see the sketch after this list). Say you have an array of n elements and you want to find the index where the value is equal to x. You can start at the middle of the array; if the value v that you read there is greater than x, you repeat the same process on the left half, and if it is smaller you look at the right half. You continue this process until the value you're looking for is found. As you can see, if you're lucky you can find the value at the middle of the array on the first try, or it can take up to log(n) steps. So there is no semi-constancy; big-O notation tells you the worst case.
O(nlogn): sorting an array using Heap sort. This is a bit too long to explain here.
O(n^2): computing the sum of all pixels on square gray-scale images (which you can consider as a 2d matrix of numbers).
O(n^3): naively multiplying two matrices of size n*n.
O(n^{2+epsilon}): multiplying matrices in clever ways (see Wikipedia on fast matrix multiplication).
O(n!): enumerating all permutations of n elements, e.g. a brute-force travelling salesman search.
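As a concrete illustration of the O(log(n)) binary search described above, here is a minimal sketch:

def binary_search(arr, x):
    # Return the index of x in the sorted list arr, or -1 if x is absent.
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == x:
            return mid
        elif arr[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

Each iteration halves the range [lo, hi], so at most about log2(n) iterations are needed.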
Some examples regarding space complexity
O(1): Heapsort. One might think that since you need to remove elements from the root of the heap, you will need extra space. However, since a heap can be implemented directly inside the array being sorted, you can store each removed root at the end of that array instead of allocating new space.
An interesting example would be, I think, to compare two solutions to a classical problem: assume you have an array X of integers and a target value T, and that you are given the guarantee that there exist two values x, y in X such that x+y == T. Your goal is to find those two values.
One solution (known as two pointers) would be to sort the array using heapsort (O(1) extra space) and then define two indexes i, j that respectively point to the start and end of the sorted array X_sorted. Then, if X_sorted[i]+X_sorted[j] < T we increment i, and if X_sorted[i]+X_sorted[j] > T we decrement j; we stop when X_sorted[i]+X_sorted[j] == T. This requires no extra allocations, so the solution has O(1) space complexity (a sketch of it appears after the second solution). A second solution would be this:
D = {}
for i in range(len(X)):
    D[T - X[i]] = i              # complement of each value -> its index
for j in range(len(X)):
    x = X[j]
    if x in D and D[x] != j:     # x completes X[D[x]]; skip pairing an element with itself
        return X[D[x]], x
which has space complexity O(n) because of the dictionary.
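For comparison, a minimal sketch of the two-pointer solution described above, assuming the array has already been sorted in place:

def two_pointer_sum(X_sorted, T):
    # Only two indices are kept, so the extra space is O(1).
    i, j = 0, len(X_sorted) - 1
    while i < j:
        s = X_sorted[i] + X_sorted[j]
        if s < T:
            i += 1
        elif s > T:
            j -= 1
        else:
            return X_sorted[i], X_sorted[j]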
The examples given for time complexity above are (except the one regarding efficient matrix multiplication) pretty straightforward to derive. As others have said, I think that reading a book on the subject is your best bet at understanding this topic in depth. I highly recommend Cormen's book.
Here is a rather trivial answer: whatever formula f(n) you have, the following algorithms run in O(f(n)) time and space respectively, so long as f itself isn't too slow to compute.
def meaningless_waste_of_time(n):
    m = f(n)
    for i in range(int(m)):
        print('foo')

def meaningless_waste_of_space(n):
    m = f(n)
    lst = []
    for i in range(int(m)):
        lst.append('bar')
For example, if you define f = lambda n: (n ** 2) * math.log(n) (with math imported), then the time and space complexities will be O(n² log n) respectively.
First of all, I would like to point out that we work out the time or space complexity of an algorithm, not of a programming language. If you want to reason about the time complexity of a concrete program, I can only suggest you go for C; calculating time complexity precisely in Python is technically very difficult.
Example:
Say you are creating a list and then sorting it at every pass of a for loop, something like this:
n = int(input())
l = []
for i in range(n):
    l.append(int(input()))
    l = sorted(l)
Here, at first glance our intuition will be that this has a time complexity of O(n), but on closer examination one notices that the sorted() function is being called inside the loop, and as we all know no sorting algorithm can do better than O(n log n) (except for radix and counting sort, which have O(kn) and O(n+k) time complexity), so the minimum time complexity of this code will be O(n^2 log n).
With this I would suggest you read a good Data Structures and Algorithms book for better understanding; you could go for a book prescribed in a B.Tech or B.E. curriculum. Hope this helps you :)

Create high nr of random sequences with min Edit Distance time efficient

I need to create a program/script for generating a high number of random sequences (20-letter sequences built from 4 different letters) with a minimum edit distance between all sequences. "High" here means at least 100k sequences, but ideally up to 1 million.
I started with a naive approach of just generating random 20-letter sequences and, for each new sequence, calculating the edit distance between it and all the sequences already created and stored. If the new sequence passes my threshold value, I store it; otherwise I discard it.
As you can imagine, this scales very badly as the number of sequences grows. Up to 10k is reasonably fine, but getting to 100k starts to become troublesome.
I really only need to create the sequences once and store the output, so I'm not that fussy about speed, but making 1 million at this rate is simply not possible today.
I've been trying to think of alternatives to speed up the process, like building the sequences in "blocks" of minimal ED and then combining them, but I haven't come up with any solution.
Does anyone have a smart idea/method that could be implemented to create such a high number of sequences with a minimal ED more time-efficiently?
Cheers,
JB
It seems, from Wikipedia, that edit distance is the minimum number of insertions, deletions and substitutions needed to turn one string into another. Why not systematically generate all strings up to N edits away from a starting string, then stop when you reach your limit?
There would be no need to check the actual edit distance, as the strings would be correct by construction. For randomness, you could generate a number of them and then shuffle them.
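For reference, a minimal sketch of the naive rejection approach described in the question, with a plain dynamic-programming edit distance; the alphabet, sequence length and distance threshold below are placeholders.

import random

ALPHABET = "ACGT"   # placeholder 4-letter alphabet
LENGTH = 20
MIN_DIST = 5        # placeholder threshold

def edit_distance(a, b):
    # Classic Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def generate(count):
    kept = []
    while len(kept) < count:
        cand = "".join(random.choices(ALPHABET, k=LENGTH))
        if all(edit_distance(cand, s) >= MIN_DIST for s in kept):
            kept.append(cand)
    return kept

Each new candidate is compared against everything kept so far, which is exactly the quadratic cost the question runs into.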

Why does an iterative approximation algorithm sometimes have fewer iterations despite larger input values?

Why does my iterative approximation loop get executed fewer times for 24690 than for 12345, which is half the size?
I am using a bisection algorithm or bisecting search. Please help me.
It really depends on how your bisection loop is terminated. The divide-and-average method (a special case of Newton's method) can be iterated a fixed number of times (obviously not in your case), or it can run until the successive differences fall within some tolerance. In the latter case, the number of iterations doesn't depend on the size of the input; it depends on how the divisions work out, i.e. how close the successive guesses happen to land to the answer.
Hope this helps :-)
Consider a simple example of 4 and 8:
4/2 = 2, and 2*2 = 4, therefore we get the answer in one iteration.
8/2 = 4, but 4*4 = 16, therefore you will need more than one iteration.
The number of steps is not linear in the input value.
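The original loop isn't shown, so here is a minimal sketch of one common bisection square-root routine with an iteration counter; the bracket, tolerance and termination test are assumptions.

def bisect_sqrt(x, eps=0.01):
    # Approximate sqrt(x) by bisection on [0, max(1, x)], counting iterations.
    lo, hi = 0.0, max(1.0, x)
    guess = (lo + hi) / 2
    steps = 0
    while abs(guess * guess - x) >= eps:
        if guess * guess < x:
            lo = guess
        else:
            hi = guess
        guess = (lo + hi) / 2
        steps += 1
    return guess, steps

The step count is governed by the tolerance and by where the successive midpoints happen to land relative to the true root, which is why doubling the input does not simply double the count and can sometimes even reduce it.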

Find the 'pits' in a list of integers. Fast

I'm attempting to write a program which finds the 'pits' in a list of integers.
A pit is any integer x where x is less than or equal to the integers immediately preceding and following it. If the integer is at the start or end of the list, it is only compared on the inward side.
For example in:
[2,1,3] 1 is a pit.
[1,1,1] all elements are pits.
[4,3,4,3,4] the elements at indices 1 and 3 are pits.
I know how to work this out by taking a linear approach and walking along the list, however I am curious about how to apply divide-and-conquer techniques to do this comparatively quickly. I am quite inexperienced and am not really sure where to start; I feel like something similar to a binary tree could be applied?
If it's pertinent, I'm working in Python 3.
Thanks for your time :).
Without any additional information on the distribution of the values in the list, it is not possible to do better than O(x), where x is the number of elements in the list.
Logically, if the data is random, such as Brownian noise, a pit can occur anywhere, so every element has to be examined in order to correctly find every pit.
Even if one only wants to find the single lowest pit in the sequence, that cannot be achieved in sub-linear time without repercussions on the correctness of the results.
Optimizations can be considered, such as parallelization or skipping the values adjacent to a pit just found, but the overall complexity stays the same.
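For completeness, a minimal sketch of the linear scan the answer says cannot be beaten in general:

def find_pits(xs):
    # Return the indices of all pits: elements <= each neighbour they have.
    pits = []
    n = len(xs)
    for i in range(n):
        left_ok = (i == 0) or (xs[i] <= xs[i - 1])
        right_ok = (i == n - 1) or (xs[i] <= xs[i + 1])
        if left_ok and right_ok:
            pits.append(i)
    return pits

find_pits([2, 1, 3]) gives [1], find_pits([1, 1, 1]) gives [0, 1, 2], and find_pits([4, 3, 4, 3, 4]) gives [1, 3], matching the examples in the question.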
