Find the 'pits' in a list of integers. Fast

Find the 'pits' in a list of integers. Fast - python

I'm attempting to write a program which finds the 'pits' in a list of
integers.
A pit is any integer x where x is less than or equal to the integers
immediately preceding and following it. If the integer is at the start
or end of the list it is only compared on the inward side.
For example in:
[2,1,3] 1 is a pit.
[1,1,1] all elements are pits.
[4,3,4,3,4] the elements at 1 and 3 are pits.
I know how to work this out by taking a linear approach and walking along
the list however i am curious about how to apply divide and conquer
techniques to do this comparatively quickly. I am quite inexperienced and
am not really sure where to start, i feel like something similar to a binary
tree could be applied?
If its pertinent i'm working in Python 3.
Thanks for your time :).

Without any additional information on the distribution of the values in the list, it is not possible to achieve any algorithmic complexity of less than O(x), where x is the number of elements in the list.
Logically, if the dataset is random, such as a brownian noise, a pit can happen anywhere, requiring a full 1:1 sampling frequency in order to correctly find every pit.
Even if one just wants to find the absolute lowest pit in the sequence, that would not be possible to achieve in sub-linear time without repercussions on the correctness of the results.
Optimizations can be considered, such as mere parallelization or skipping values neighbor to a pit, but the overall complexity would stay the same.

Related

How do write a code to distribute different weights as evenly as possible among 4 boxes?

I was given a problem in which you are supposed to write a python code that distributes a number of different weights among 4 boxes.
Logically we can't expect a perfect distribution as in case we are given weights like 10, 65, 30, 40, 50 and 60 kilograms, there is no way of grouping those numbers without making one box heavier than another. But we can aim for the most homogenous distribution. ((60),(40,30),(65),(50,10))
I can't even think of an algorithm to complete this task let alone turn it into python code. Any ideas about the subject would be appreciated.

The problem you're describing is similar to the "fair teams" problem, so I'd suggest looking there first.
Because a simple greedy algorithm where weights are added to the lightest box won't work, the most straightforward solution would be a brute force recursive backtracking algorithm that keeps track of the best solution it has found while iterating over all possible combinations.

As stated in #j_random_hacker's response, this is not going to be something easily done. My best idea right now is to find some baseline. I describe a baseline as an object with the largest value since it cannot be subdivided. Using that you can start trying to match the rest of the data to that value which would only take about three iterations to do. The first and second would create a list of every possible combination and then the third can go over that list and compare the different options by taking the average of each group and storing the closest average value to your baseline.
Using your example, 65 is the baseline and since you cannot subdivide it you know that has to be the minimum bound on your data grouping so you would try to match all of the rest of the values to that. It wont be great, but it does give you something to start with.

As j_random_hacker notes, the partition problem is NP-complete. This problem is also NP-complete by a reduction from the 4-partition problem (the article also contains a link to a paper by Garey and Johnson that proves that 4-partition itself is NP-complete).
In particular, given a list to 4-partition, you could feed that list as an input to a function that solves your box distribution problem. If each box had the same weight in it, a 4-partition would exist, otherwise not.
Your best bet would be to create an exponential time algorithm that uses backtracking to iterate over the 4^n possible assignments. Because unless P = NP (highly unlikely), no polynomial time algorithm exists for this problem.

Time and Space analysis in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
Can someone provide an example of O(log(n)) and O(nlog(n)) problems for both time and space?
I am quiet new to this type of analysis and can not see past polynomial time/space.
What I don't get is how can you be O(1) < O(log(n)) < O(n) is that
like "semi-constant"?
Additionally, I would appreciate any great examples which cover these cases (both time and space):
I find space analysis a bit more ambiguous so it would be nice to see it compared to other cases from the time analysis in the same place - something I couldn't find reliably online.
Can you provide examples for each case in both space and time
analysis?

Before examples, a little clarification on big O notation
Perhaps I'm mistaken, but seeing
What I don't get is how can you be O(1) < O(log(n)) < O(n) is that like "semi-constant"?
makes me think that you have been introduced to the idea of big-O notation as the number of operation to be carried (or number of bytes to be stored, etc), e.g. if you have a loop for(int i=0;i<n;++i) then there are n operations so the time complexity is O(n). While this is a nice first intuition, I think that it can be misleading as big-O notation defines a higher asymptotic bound.
Let's say that you have chosen an algorithm to sort an array of numbers, and let's denote x the number of element in that array, and f(x) the time complexity of that algorithm. Assume now that we say that the algorithm is O(g(x)). What this means is that as x grows, we will eventually reach a threshold x_t such that if x_i>x_t, then abs(f(x_i)) will always be lower or equal than alpha*g(x_i) where alpha is a postivie real number.
As a result, a function that is O(1) doesn't always take the same constant time, rather, you can be sure that no matter how many data it needs, the time it will take to complete its task will be lower than a constant amount of time, e.g. 5seconds. Similarly, O(log(n)) doesn't mean that there is any notion of a semi-constant. It just means that 1) the time the algorithm will take will depend on the size of the dataset that you feed it and 2) If the dataset is large enough (i.e. n is sufficiently large) then the time that it will take for it to complete is will always be less or equal than log(n).
Some examples regarding time complexity
O(1): Accessing an element from an array.
O(log(n)): binary search in an incrementally sorted array. Say you have an array of n elements and you want to find the index where the value is equal to x. You can start at the middle of the array, and if the value v that you read there is greater than x, you repeat the same process on the left side of v, and if it's smaller you look to the right side of v. You continue this process until the value you're looking for is found. As you can see, if you're lucky, you can find the value in the middle of the array on first try, or you can find it after log(n) operations. So there is no semi-constancy, and Big-O notation tells you the worst case.
O(nlogn): sorting an array using Heap sort. This is a bit too long to explain here.
O(n^2): computing the sum of all pixels on square gray-scale images (which you can consider as a 2d matrix of numbers).
O(n^3): naively multiplying two matrices of size n*n.
O(n^{2+epsilon}): multiplying matrices in smart ways (see wikipedia)
O(n!) naively computing a factorial.
Some examples regarding space complexity
O(1) Heapsort. One might think that since you need to remove variables from the root of the tree, you will need extra space. However, since a heap can just be implemented as an array, you can store the removed values at the end of said array instead of allocating new space.
An interesting example would be, I think, to compare two solutions to a classical problem: assume you have an array X of integers and a target value T, and that you are given the guarentee that there exist two values x,y in X such that x+y==T. You goal is to find those two values.
One solution (known as two-pointers) would be to sort the array using heapsort (O(1) space ) and then define two indexes i,j that respectively point to the start and end of the sorted array X_sorted. Then, if X[i]+X[j]<T, we increment i and if X[i]+X[j]>T, we decrement j. We stop when X[i]+X[j]==T. It's obvious that this requires no extra allocations, and so the solution has O(1) space complexity. A second solution would be this:
D={}
for i in range(len(X)):
D[T-X[i]]=i
for x in X:
y=T-x
if y in D:
return X[D[y]],x
which has space complexity O(n) because of the dictionary.
The examples given for time complexity above are (except the one regarding efficient matrix multiplications) pretty straight-forward to derive. As others have said I think that reading a book on the subject is your best bet at understanding this topic in depth. I highly recomment Cormen's book.

Here is a rather trivial answer: whatever formula f(n) you have, the following algorithms run in O(f(n)) time and space respectively, so long as f itself isn't too slow to compute.
def meaningless_waste_of_time(n):
m = f(n)
for i in range(int(m)):
print('foo')
def meaningless_waste_of_space(n):
m = f(n)
lst = []
for i in range(int(m)):
lst.append('bar')
For example, if you define f = lambda n: (n ** 2) * math.log(n) then the time and space complexities will be O(n² log n) respectively.

First of all I would like to point out the fact that we find out Time Complexity or Space Complexity of an Algorithm and not that of a programming language. If you consider calculating the time complexity of any program I can only suggest you go for C. Calculating Time Complexity in python is technically very difficult.
Example:
Say you are creating an list and the sorting it at every pass of a for loop, something like this
n = int(input())
for i in range(n):
l.append(int(input())
l = sorted(l)
Here, on the first glance our intuition will be that this has a time complexity of O(n), but on closer examination, one would notice that the sorted() function is being called and as we all know that any sorting algorithm can not be less than O(n log n) (except for radix and counting sort which have O(kn) and O(n+k) time complexity), so the minimum time complexity of this code will be O(n^2 log n).
With this I would suggest you to read some good Data Structure and Algorithm book for better understanding. You can go for a book which in prescribed in B. Tech or B.E. curriculum. Hope this helps you :)

Should I pickle primes in Python to produce primorials?

I produced an algorithm in Python to sieve the first n primes, then list the ordered pairs of the index n and the nth primorial p_n#.
Next I evaluate a function based on n and p_n# and finally, the objective, to determine whether the function f(n,p_n#) is monotonic, so the algorithm assesses where the sequence changes from rising to falling and vice-versa. The code is listed here for what it's worth.
This is of course memory-intensive and my pc can only cope with numbers up to around 2,000,000.
At any given point all I actually need is f(n-1), the ordered pair n,p_n#, the prime p_n (in order to quickly find the next prime), and a boolean indicating whether the sequence most recently rose or fell.
What are the best approaches to avoid storing a hundred thousand or more primes and primorials in memory while preserving speed?
I thought a first step would be to make a sieve that finds the one next prime above some given prime rather than every prime below some maximum. Then I can evaluate the next value of the function.
But I also wondered if it would be better to sieve batches of say 100 primes at a time. This could be supported by some "perpetual list" of ordered triples [n,p_n,p_n#] only containing n=100,200,300,... which I generate before runtime. Searching, I found the concept of "pickling" a list and wondered if this is the right scenario in which to use it, or is there a better way?

In python, how can I efficiently find a consecutive sequence that is a subset of a larger consecutive sequence?

I need to find all the days of the month where a certain activity occurs. The days when the activity occurs will be sequential. The sequence of days can range from one to the entire month, and the sequence will occur exactly one time per month.
To test whether or not the activity occurs on any given day is not an expensive calculation, but I thought I would use this problem learn something new. Which algorithm minimizes the number of days I have to test?

You can't really do much better than iterating through the sequence to find the first match, then iterating until the first non match. You can use itertools to make it nice and readable:
itertools.takewhile(mytest,
itertools.dropwhile(lambda x: not mytest(x), mysequence))

I think the linear probe suggested by #isbadawi is the best way to find the beginning of the subsequence. This is because the subsequence could be very short and could be anywhere within the larger sequence.
However, once the beginning of the subsequence is found, we can use a binary search to find the end of it. That will require fewer tests than doing a second linear probe, so it's a better solution for you.
As others have pointed out, there is not much practical reason for doing this. This is true for two reasons: your large sequence is quite short (only about 31 elements), and you still need to do at least one linear probe anyway, so the big-O runtime will be still be linear in the length of the large sequence, even though we have reduced part of the algorithm from linear to logarithmic.

The best method depends a bit on your input data structure. If your input data structure is a list of booleans for each day of the month then you can use the following code.
start = activity.find(True)
end = activity.rfind(True)

how to generate all possible combinations of a 14x10 matrix containing only 1's and 0's

I'm working on a problem and one solution would require an input of every 14x10 matrix that is possible to be made up of 1's and 0's... how can I generate these so that I can input every possible 14x10 matrix into another function? Thank you!
Added March 21: It looks like I didn't word my post appropriately. Sorry. What I'm trying to do is optimize the output of 10 different production units (given different speeds and amounts of downtime) for several scenarios. My goal is to place blocks of downtime to minimized the differences in production on a day-to-day basis. The amount of downtime and frequency each unit is allowed is given. I am currently trying to evaluate a three week cycle, meaning every three weeks each production unit is taken down for a given amount of hours. I was asking the computer to determine the order the units would be taken down based on the constraint that the lines come down only once every 3 weeks and the difference in daily production is the smallest possible. My first approach was to use Excel (as I tried to describe above) and it didn't work (no suprise there)... where 1- running, 0- off and when these are summed to calculate production. The calculated production is subtracted from a set max daily production. Then, these differences were compared going from Mon-Tues, Tues-Wed, etc for a three week time frame and minimized using solver. My next approach was to write a Matlab code where the input was a tolerance (set allowed variation day-to-day). Is there a program that already does this or an approach to do this easiest? It seems simple enough, but I'm still thinking through the different ways to go about this. Any insight would be much appreciated.

The actual implementation depends heavily on how you want to represent matrices… But assuming the matrix can be represented by a 14 * 10 = 140 element list:
from itertools import product
for matrix in product([0, 1], repeat=140):
# ... do stuff with the matrix ...
Of course, as other posters have noted, this probably isn't what you want to do… But if it really is what you want to do, that's the best code (given your requirements) to do it.

Generating Every possible matrix of 1's and 0's for 14*10 would generate 2**140 matrixes. I don't believe you would have enough lifetime for this. I don't know, if the sun would still shine before you finish that. This is why it is impossible to generate all those matrices. You must look for some other solution, this looks like a brute force.

This is absolutely impossible! The number of possible matrices is 2140, which is around 1.4e42. However, consider the following...
If you were to generate two 14-by-10 matrices at random, the odds that they would be the same are 1 in 1.4e42.
If you were to generate 1 billion unique 14-by-10 matrices, then the odds that the next one you generate would be the same as one of those would still be exceedingly slim: 1 in 1.4e33.
The default random number stream in MATLAB uses a Mersenne twister algorithm that has a period of 219936-1. Therefore, the random number generator shouldn't start repeating itself any time this eon.
Your approach should be thus:
Find a computer no one ever wants to use again.
Give it as much storage space as possible to save your results.
Install MATLAB on it and fire it up.
Start computing matrices at random like so:
while true
newMatrix = randi([0 1],14,10);
%# Process the matrix and output your results to disk
end
Walk away
Since there are so many combinations, you don't have to compare newMatrix with any of the previous matrices since the length of time before a repeat is likely to occur is astronomically large. Your processing is more likely to stop due to other reasons first, such as (in order of likely occurrence):
You run out of disk space to store your results.
There's a power outage.
Your computer suffers a fatal hardware failure.
You pass away.
The Earth passes away.
The Universe dies a slow heat death.
NOTE: Although I injected some humor into the above answer, I think I have illustrated one useful alternative. If you simply want to sample a small subset of the possible combinations (where even 1 billion could be considered "small" due to the sheer number of combinations) then you don't have to go through the extra time- and memory-consuming steps of saving all of the matrices you've already processed and comparing new ones to it to make sure you aren't repeating matrices. Since the odds of repeating a combination are so low, you could safely do this:
for iLoop = 1:whateverBigNumberYouWant
newMatrix = randi([0 1],14,10); %# Generate a new matrix
%# Process the matrix and save your results
end

Are you sure you want every possible 14x10 matrix? There are 140 elements in each matrix, and each element can be on or off. Therefore there are 2^140 possible matrices. I suggest you reconsider what you really want.
Edit: I noticed you mentioned in a comment that you are trying to minimize something. There is an entire mathematical field called optimization devoted to doing this type of thing. The reason this field exists is because quite often it is not possible to exhaustively examine every solution in anything resembling a reasonable amount of time.

Trying this:
import numpy
for i in xrange(int(1e9)): a = numpy.random.random_integers(0,1,(14,10))
(which is much, much, much smaller than what you require) should be enough to convince you that this is not feasible. It also shows you how to calculate one, or few, such random matrices even up to a million is pretty fast).
EDIT: changed to xrange to "improve speed and memory requirements" :)

You don't have to iterate over this:
def everyPossibleMatrix(x,y):
N=x*y
for i in range(2**N):
b="{:0{}b}".format(i,N)
yield '\n'.join(b[j*x:(j+1)*x] for j in range(y))

Depending on what you want to accomplish with the generated matrices, you might be better off generating a random sample and running a number of simulations. Something like:
matrix_samples = []
# generate 10 matrices
for i in range(10):
sample = numpy.random.binomial(1, .5, 14*10)
sample.shape = (14, 10)
matrix_samples.append(sample)
You could do this a number of times to see how results vary across simulations. Of course, you could also modify the code to ensure that there are no repeats in a sample set, again depending on what you're trying to accomplish.

Are you saying that you have a table with 140 cells and each value can be 1 or 0 and you'd like to generate every possible output? If so, you would have 2^140 possible combinations...which is quite a large number.

Instead of just suggesting the this is unfeasible, I would suggest considering a scheme that samples the important subset of all possible combinations instead of applying a brute force approach. As one of your replies suggested, you are doing minimization. There are numerical techniques to do this such as simulated annealing, monte carlo sampling as well as traditional minimization algorithms. You might want to look into whether one is appropriate in your case.

I was actually much more pessimistic to begin with, but consider:
from math import log, e
def timeInYears(totalOpsNeeded=2**140, currentOpsPerSecond=10**9, doublingPeriodInYears=1.5):
secondsPerYear = 365.25 * 24 * 60 * 60
doublingPeriodInSeconds = doublingPeriodInYears * secondsPerYear
k = log(2,e) / doublingPeriodInSeconds # time-proportionality constant
timeInSeconds = log(1 + k*totalOpsNeeded/currentOpsPerSecond, e) / k
return timeInSeconds / secondsPerYear
if we assume that computer processing power continues to double every 18 months, and you can currently do a billion combinations per second (optimistic, but for sake of argument) and you start today, your calculation will be complete on or about April 29th 2137.

Here is an efficient way to do get started Matlab:
First generate all 1024 possible rows of length 10 containing only zeros and ones:
dec2bin(0:2^10-1)
Now you have all possible rows, and you can sample from them as you wish. For example by calling the following line a few times:
randperm(1024,14)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.