Again I have a question concerning large loops.
Suppose I have a function
limits
def limits(a,b):
*evaluate integral with upper and lower limits a and b*
return float result
A and B are simple np.arrays that store my values a and b. Now I want to calculate the integral 300'000^2/2 times because A and B are of the length of 300'000 each and the integral is symmetrical.
In Python I tried several ways like itertools.combinations_with_replacement to create the combinations of A and B and then put them into the integral but that takes huge amount of time and the memory is totally overloaded.
Is there any way, for example transferring the loop in another language, to speed this up?
I would like to run the loop
for i in range(len(A)):
for j in range(len(B)):
np.histogram(limits(A[i],B[j]))
I think histrogramming the return of limits is desirable in order not to store additional arrays that grow squarely.
From what I read python is not really the best choice for this iterative ansatzes.
So would it be reasonable to evaluate this loop in another language within Python, if yes, How to do it. I know there are ways to transfer code, but I have never done it so far.
Thanks for your help.
If you're worried about memory footprint, all you need to do is bin the results as you go in the for loop.
num_bins = 100
bin_upper_limits = np.linspace(-456, 456, num=num_bins-1)
# (last bin has no upper limit, it goes from 456 to infinity)
bin_count = np.zeros(num_bins)
for a in A:
for b in B:
if b<a:
# you said the integral is symmetric, so we can skip these, right?
continue
new_result = limits(a,b)
which_bin = np.digitize([new_result], bin_upper_limits)
bin_count[which_bin] += 1
So nothing large is saved in memory.
As for speed, I imagine that the overwhelming majority of time is spent evaluating limits(a,b). The looping and binning is plenty fast in this case, even in python. To convince yourself of this, try replacing the line new_result = limits(a,b) with new_result = 234. You'll find that the loop runs very fast. (A few minutes on my computer, much much less than the 4 hour figure you quote.) Python does not loop very fast compared to C, but it doesn't matter in this case.
Whatever you do to speed up the limits() call (including implementing it in another language) will speed up the program.
If you change the algorithm, there is vast room for improvement. Let's take an example of what it seems you're doing. Let's say A and B are 0,1,2,3. You're integrating a function over the ranges 0-->0, 0-->1, 1-->1, 1-->2, 0-->2, etc. etc. You're re-doing the same work over and over. If you have integrated 0-->1 and 1-->2, then you can add up those two results to get the integral 0-->2. You don't have to use a fancy integration algorithm, you just have to add two numbers you already know.
Therefore it seems to me that you can compute integrals in all the smallest ranges (0-->1, 1-->2, 2-->3), store the results in an array, and add subsets of the results to get the integral over whatever range you want. If you want this program to run in a few minutes instead of 4 hours, I suggest thinking through an alternative algorithm along those lines.
(Sorry if I'm misunderstanding the problem you're trying to solve.)
Related
I'm currently trying to solve the 'dance recital' kattis challenge in Python 3. See here
After taking input on how many performances there are in the dance recital, you must arrange performances in such a way that sequential performances by a dancer are minimized.
I've seen this challenge completed in C++, but my code kept running out of time and I wanted to optimize it.
Question: As of right now, I generate all possible permutations of performances and run comparisons off of that. A faster way to would be to not generate all permutations, as some of them are simply reversed and would result in the exact same output.
import itertools
print(list(itertools.permutations(range(2)))) --> [(0,1),(1,0)] #They're the same, backwards and forwards
print(magic_algorithm(range(2))) --> [(0,1)] #This is what I want
How might I generate such a list of permutations?
I've tried:
-Generating all permutation, running over them again to reversed() duplicates and saving them. This takes too long and the result cannot be hard coded into the solution as the file becomes too big.
-Only generating permutations up to the half-way mark, then stopping, assuming that after that, no unique permutations are generated (not true, as I found out)
-I've checked out questions here, but no one seems to have the same question as me, ditto on the web
Here's my current code:
from itertools import permutations
number_of_routines = int(input()) #first line is number of routines
dance_routine_list = [0]*10
permutation_list = list(permutations(range(number_of_routines))) #generate permutations
for q in range(number_of_routines):
s = input()
for c in s:
v = ord(c) - 65
dance_routine_list[q] |= (1 << v) #each routine ex.'ABC' is A-Z where each char represents a performer in the routine
def calculate():
least_changes_possible = 1e9 #this will become smaller, as optimizations are found
for j in permutation_list:
tmp = 0
for i in range(1,number_of_routines):
tmp += (bin(dance_routine_list[j[i]] & dance_routine_list[j[i - 1]]).count('1')) #each 1 represents a performer who must complete sequential routines
least_changes_possible = min(least_changes_possible, tmp)
return least_changes_possible
print(calculate())
Edit: Took a shower and decided adding a 2-element-comparison look-up table would speed it up, as many of the operations are repeated. Still doesn't fix iterating over the whole permutations, but it should help.
Edit: Found another thread that answered this pretty well. How to generate permutations of a list without "reverse duplicates" in Python using generators
Thank you all!
There are at most 10 possible dance routines, so at most 3.6M permutations, and even bad algorithms like generate 'em all and test will be done very quickly.
If you wanted a fast solution for up to 24 or so routines, then I would do it like this...
Given the the R dance routines, at any point in the recital, in order to decide which routine you can perform next, you need to know:
Which routines you've already performed, because there you can't do those ones next. There are 2R possible sets of already-performed routines; and
Which routine was performed last, because that helps determine the cost of the next one. There are at most R-1 possible values for that.
So there are at less than (R-2)*2R possible recital states...
Imagine a directed graph that connects each possible state to all the possible following states, by an edge for the routine that you would perform to get to that state. Label each edge with the cost of performing that routine.
For example, if you've performed routines 5 and 6, with 5 last, then you would be in state (5,6):5, and there would be an edge to (3,5,6):3 that you could get to after performing routine 3.
Starting at the initial "nothing performed yet" state ():-, use Dijkstra's algorithm to find the least cost path to a state with all routines performed.
Total complexity is O(R2*2R) or so, depending exactly how you implement it.
For R=10, R2*2R is ~100 000, which won't take very long at all. For R=24 it's about 9 billion, which is going to take under half a minute in pretty good C++.
I am not sure what maxint is, and what it does. But I am trying to solve this problem without using maxint, possibly write another function to do this.
I would like to write this function without using:
from sys import maxint
Here is what I am trying to implement without using maxint.
def maxSubArraySum(a,size):
maxi = -maxint - 1
maxi_ends = 0
for i in range(0, size):
maxi_ends = maxi_ends + a[i]
if (maxi < maxi_ends):
maxi = maxi_ends
if maxi_ends < 0:
maxi_ends = 0
return maxi
this is a common pattern in programming (see this answer): to find the maximum number in a list, initialize ans to a value that's guaranteed to be lower than anything else in the list, then go through the list and simply perform the operation
if element > ans:
ans = element
alternatively, you could set ans to be the value of the first value, but this is less elegant, since you need to check for the existence of the first element (otherwise, if the list is empty, you will get an error). Or, the "list" might not be an actual list, but rather an iterator, like in your example, so getting the first value would be messy.
Anyways, that's the point of initializing your variable to be an extreme value. MaxInt, in other languages like java and C, are very useful and nice to use as this extreme value, because the values you encounter along the way are literally not possibly bigger than MaxInt.
But note that, as I said in the very beginning, the point of infinity/maxint/minint is only to be larger/smaller than anything you could possibly encounter. So, for problems like the one you posted, you can usually easily make yourself a lower-bound for the smallest possible value of any maxi_ends. And you can make it a really loose lower-bound too (it doesn't have to be realistically attainable). So here, one way to find the lowest possible value of maxi_ends would be the sum of all the negative values in a. maxi_ends can't possibly go any lower than that.
Even easier, if you know what the smallest possible value of a[i] is, and you know the maximum possible length of a, you can calculate the smallest possible value that maxi_end could take for any input, right off the bat. This is useful in these kinds of abstract programming problems that you see on coding competitions, coding interview prep, and homework, especially since you just need a quick and dirty solution. For example, if a[i] can't be any less than -100, and len(a) is at most 1000, then maxi_end will never exceed -100 * 1000 = -100000, so you can just write maxi = -100001.
It's also a useful python trick in some real life situations (again, where you only need a quick hack) to pick a preposterously large number if you are lazy -- for example, if you're estimating the shortest path, in miles, between two buildings in a road network -- you could just pick 1000000000000 as an upper-bound, since you know no path will ever be that high.
I don't know why you don't want to use sys.maxint (although as the comments say, it's a good idea not to use it), but there are actual ways to represent infinity in python: in python 3.5 or above, you can do
import math
test = math.inf
and otherwise, you can use float('inf').
I am quite new to Python but I've started using it for some data analysis and now I love it. Before, I used C, which I find just terrible for file I/O.
I am working on a script which computes the radial distribution function between a set of N=10000 (ten thousands) points in a 3D box with periodic boundary conditions (PBC). Basically, I have a file of 10000 lines made like this:
0.037827 0.127853 -0.481895
12.056849 -12.100216 1.607448
10.594823 1.937731 -9.527205
-5.333775 -2.345856 -9.217283
-5.772468 -10.625633 13.097802
-5.290887 12.135528 -0.143371
0.250986 7.155687 2.813220
which represents the coordinates of the N points. What I have to do is to compute the distance between every couple of points (I hence have to consider all the 49995000 combinations of 2 elements) and then do some operation on it.
Of course the most taxing part of the program is the loop over the 49995000 combinations.
My current function looks like this:
g=[0 for i in range(Nbins)]
for i in range(Nparticles):
for j in range(i+1,Nparticles):
#compute distance and apply PBC
dx=coors[i][0]-coors[j][0]
if(dx>halfSide):
dx-=boxSide
elif(dx<-halfSide):
dx+=boxSide
dy=coors[i][1]-coors[j][1]
if(dy>halfSide):
dy-=boxSide
elif(dy<-halfSide):
dy+=boxSide
dz=coors[i][2]-coors[j][2]
if(dz>halfSide):
dz-=boxSide
elif(dz<-halfSide):
dz+=boxSide
d2=dx**2+dy**2+dz**2
if(d2<(cutoff*boxSide)**2):
g[int(sqrt(d2)/dr)]+=2
Notice: coors is a nested array, created using loadtxt() on the data file.
I basically recycled a function that I used in another program, written in C.
I am not using itertool.combinations() because I have noticed that the program runs slightly slower if I use it for some reason (one iteration is around 111 s while it runs in around 106 with this implementation).
This function takes around 106 s to run, which is terrible considering that I have to analyze some 500 configuration files.
Now, my question is: is there general a way to make this kind of loops run faster using Python? I guess that the corresponding C code would be faster, but I would like to stick to Python because it is so much easier to use.
I would like to stress that even though I'm certainly looking for a particular solution to my problem, I would like to know how to iterate more efficiently when using Python in general.
PS Please try to explain the code in your answers as much as possible (if you write any) because I'm a newbie in Python.
Update
First, I want to say that I know that, since there is a cutoff, I could write a more efficient algorithm if I divide the box in smaller boxes and only compute the distances in neighboring boxes, but for now I would only like to speed up this particular algorithm.
I also want to say that using Cython (I followed this tutorial) I managed to speed up everything a little bit, as expected (77 s, where before it took 106).
If memory is not an issue (and it probably isn't given that the actual amount of data won't be different from what you are doing now), you can use numpy to do the math for you and put everything into an NxN array (around 800MB at 8 bytes/float).
Given the operations your code is trying to do, I do not think you need any loops outside numpy:
g = numpy.zeros((Nbins,))
coors = numpy.array(coors)
#compute distance and apply PBC
dx = numpy.subtract.outer(coors[:, 0], coors[:, 0])
dx[dx < -halfSide] += boxSide
dx[dx > halfSide)] -= boxSide
dy = numpy.subtract.outer(coors[:, 1], coors[:, 1])
dy[dy < -halfSide] += boxSide
dy[dy > halfSide] -= boxSide
dz = numpy.subtract.outer(coors[:, 2], coors[:, 2])
dz[dz < -halfSide] += boxSide
dz[dz > halfSide] -= boxSide
d2=dx**2 + dy**2 + dz**2
# Avoid counting diagonal elements: inf would do as well as nan
numpy.fill_diagonal(d2, numpy.nan)
# This is the same length as the number of times the
# if-statement in the loops would have triggered
cond = numpy.sqrt(d2[d2 < (cutoff*boxSide)**2]) / dr
# See http://stackoverflow.com/a/24100418/2988730
np.add.at(g, cond.astype(numpy.int_), 2)
The point is to always stay away from looping in Python when dealing with large amounts of data. Python loops are inefficient. They perform lots of book-keeping operations that slow down the math code.
Libraries such as numpy and dynd provide operations that run the loops you want "under the hood", usually bypassing the bookkeeping with a C implementation. The advantage is that you can do the Python-level book-keeping once for each enormous array and then get on with processing the raw numbers instead of dealing with Python wrapper objects (numbers are full blown-objects in Python, there are no primitive types).
In this particular example, I have recast your code to use a sequence of array-level operations that do the same thing that your original code did. This requires restructuring how you think about the steps a little, but should save a lot on runtime.
You've got an interesting problem there. There are some common guidelines for high performance Python; these are for Python2 but should carry for the most part to Python3.
Profile your code. Using %timeit and %%timeit on jupyter/ipython is quick and fast for interactive sessions, but cProfile and line_profiler are valuable for finding bottlenecks.
This a link to a short essay that covers the basics from the Python documentation that I found helpful: https://www.python.org/doc/essays/list2str
Numpy is a great package for vectorized operations. Note that numpy vectors are generally slower to work with than small-medium size lists but the memory savings are huge. Speed gain is substantial over lists when doing multi-dimensional arrays. Also if you start observing cache misses and page-faults with pure Python, then the numpy benefit will be even greater.
I have been using Cython recently with a fair amount of success, where numpy/scipy don't quite cut it with the builtin functions.
Check out scipy which has a huge library of functions for scientific computing. There's a lot to explore; eg., the scipy.spatial.pdist function computes fast nC2 pairwise distances. In the testrun below, 10k items pairwise distances completed in 375ms. 100k items will probably break my machine though without refactoring.
import numpy as np
from scipy.spatial.distance import pdist
xyz_list = np.random.rand(10000, 3)
xyz_list
Out[2]:
array([[ 0.95763306, 0.47458207, 0.24051024],
[ 0.48429121, 0.12201472, 0.80701931],
[ 0.26035835, 0.76394588, 0.7832222 ],
...,
[ 0.07835084, 0.8775841 , 0.20906537],
[ 0.73248369, 0.60908474, 0.57163023],
[ 0.68945879, 0.19393467, 0.23771904]])
In [10]: %timeit xyz_pairwise = pdist(xyz_list)
1 loop, best of 3: 375 ms per loop
In [12]: xyz_pairwise = pdist(xyz_list)
In [13]: len(xyz_pairwise)
Out[13]: 49995000
Happy exploring!
I have to solve many independent constrained linear least squares problems (with both bounds and constrains). So I do it multiple times in loops. For each problem I am looking for x that is min||Cx-d|| and x is bounded (in 0,1) and all x elements must sum to 1.
I am seeking a way of doing it fast, because although each optimization doesn't require a lot of time, I need to include it in a large loop.
For example my Matlab implementation seems like this:
img = imread('test.tif');
C = randi(255,6,4);
x=zeros(size(C,2),1);
tp = zeros(size(C,2),1);
Aeq = ones(1,size(C,2));
beq = 1;
options = optimset('LargeScale','off','Display','off');
A = (-1).*eye(size(C,2));
b = zeros(1,size(C,2));
result = zeros(size(img,1),size(img,2),size(C,2));
for i=1:size(img,1)
for j=1:size(img,2)
for k=1:size(img,3)
tp(k) = img(i,j,k);
end
x = lsqlin(C,tp,A,b,Aeq,beq,[],[],[],options);
for l=1:size(C,2)
result(i,j,l)=x(l);
end
end
end
and for a 500x500 loop it takes about 5 minutes. But my loops are much bigger than this. Any idea is welcome, but I would prefer a Matlab, Python or R solution.
As mentioned in the comments: The used commands are highly optimized, and switching languages is not going to make a world of difference.
First of all go back to what you are trying to achieve, perhaps to achieve that you don't need to solve this many hard problems.
If you come to the conclusion that you do want to do this exact calculation, consider changing the settings, and play around with the algorithm.
From doc lsqlin it appears the two main parameters that determine the speed are:
Number of iterations
Accepted tolerance
If all these things don't help enough, consider some tricks like:
Only solving problems that are sufficiently different
Solving with easier constraints, and rounding the result
I am new to python and my problem is the following:
I have defined a function func(a,b) that return a value, given two input values.
Now I have my data stored in lists or numpy arrays A,Band would like to use func for every combination. (A and B have over one million entries)
ATM i use this snippet:
for p in A:
for k in B:
value = func(p,k)
This takes really really a lot of time.
So i was thinking that maybe something like this:
C=(map(func,zip(A,B)))
But this method only works pairwise... Any ideas?
Thanks for help
First issue
You need to calculate the output of f for many pairs of values. The "standard" way to speed up this kind of loops (calculations) is to make your function f accept (NumPy) arrays as input, and do the calculation on the whole array at once (ie, no looping as seen from Python). Check any NumPy tutorial to get an introduction.
Second issue
If A and B have over a million entries each, there are one trillion combinations. For 64 bits numbers, that means you'll need 7.3 TiB of space just to store the result of your calculation. Do you have enough hard drive to just store the result?
Third issue
If A and B where much smaller, in your particular case you'd be able to do this:
values = f(*meshgrid(A, B))
meshgrid returns the cartesian product of A and B, so it's simply a way to generate the points that have to be evaluated.
Summary
You need to use NumPy effectively to avoid Python loops. (Or if all else fails or they can't easily be vectorized, write those loops in a compiled language, for instance by using Cython)
Working with terabytes of data is hard. Do you really need that much data?
Any solution that calls a function f 1e12 times in a loop is bound to be slow, specially in CPython (which is the default Python implementation. If you're not really sure and you're using NumPy, you're using it too).
suppose, itertools.product does what you need:
from itertools import product
pro = product(A,B)
C = map(lambda x: func(*x), pro)
so far as it is generator it doesn't require additional memory
One million times one million is one trillion. Calling f one trillion times will take a while.
Unless you have a way of reducing the number of values to compute, you can't do better than the above.
If you use NumPy, you should definitely look the np.vectorize function which is designed for this kind of problems...