How to store an array of really huge numbers in Python?

I am working with huge numbers, such as 150!. Calculating the result is not a problem; for example,
f = factorial(150) gives
57133839564458545904789328652610540031895535786011264182548375833179829124845398393126574488675311145377107878746854204162666250198684504466355949195922066574942592095735778929325357290444962472405416790722118445437122269675520000000000000000000000000000000000000.
But I also need to store an array with N of those huge numbers, in full precision. A Python list can store them, but it is slow. A numpy array is fast, but cannot handle the full precision, which is required for some operations I perform later; as I have tested, a number in scientific notation (float) does not produce an accurate result.
Edit:
150! is just an example of a huge number; it does not mean I am working only with factorials. Also, the full set of numbers (NOT always results of a factorial) changes over time, and I need to update them and re-evaluate a function for which those numbers are parameters. And yes, full precision is required.

numpy arrays are very fast when they can internally work with a simple data type that can be directly manipulated by the processor. Since there is no simple, native data type that can store huge numbers, they are converted to floats. numpy can be told to work with Python objects instead, but then it will be slower.
Here are some times on my computer. First the setup.
a is a Python list containing the first 50 factorials. b is a numpy array with all the values converted to float64. c is a numpy array storing Python objects.
import numpy as np
import math
a=[math.factorial(n) for n in range(50)]
b=np.array(a, dtype=np.float64)
c=np.array(a, dtype=object)
a[30]
265252859812191058636308480000000L
b[30]
2.6525285981219107e+32
c[30]
265252859812191058636308480000000L
Now to measure indexing.
%timeit a[30]
10000000 loops, best of 3: 34.9 ns per loop
%timeit b[30]
1000000 loops, best of 3: 111 ns per loop
%timeit c[30]
10000000 loops, best of 3: 51.4 ns per loop
Indexing into a Python list is fastest, followed by extracting a Python object from a numpy array, and slowest is extracting a 64-bit float from an optimized numpy array.
Now let's measure multiplying each element by 2.
%timeit [n*2 for n in a]
100000 loops, best of 3: 4.73 µs per loop
%timeit b*2
100000 loops, best of 3: 2.76 µs per loop
%timeit c*2
100000 loops, best of 3: 7.24 µs per loop
Since b*2 can take advantage of numpy's optimized array, it is the fastest. The Python list takes second place. And a numpy array using Python objects is the slowest.
At least with the tests I ran, indexing into a Python list doesn't seem slow. What is slow for you?

Store it as tuples of prime factors and their powers. A factorization of a factorial (of, say, N) will contain ALL the primes less than N, so the k'th place in each tuple will be the k'th prime. You'll also want to keep a separate list of all the primes you've found. You can easily store factorials as high as a few hundred thousand in this notation. If you really need the digits, you can restore them by multiplying the factors back out (and you can get the trailing zeros cheaply by pairing each factor of 5 with a factor of 2, since 5*2 = 10).
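For illustration, here is a minimal sketch of that representation (the helper names are mine, not the answerer's); it stores n! as a list of (prime, exponent) pairs, computing the exponents with Legendre's formula, and multiplies them back out when the digits are needed:
import math

def primes_up_to(n):
    # simple sieve of Eratosthenes
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def factorial_prime_powers(n):
    # exponent of prime p in n! is sum(n // p**k) over k >= 1 (Legendre's formula)
    powers = []
    for p in primes_up_to(n):
        e, q = 0, p
        while q <= n:
            e += n // q
            q *= p
        powers.append((p, e))
    return powers

def restore(powers):
    # multiply the prime powers back out to recover the exact integer
    result = 1
    for p, e in powers:
        result *= p ** e
    return result

assert restore(factorial_prime_powers(150)) == math.factorial(150)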

If you need the exact value of a factorial later on, why don't you save in the array not the result, but the number you want to take the factorial of?
E.G.
You have f = factorial(150)
and you have the result 57133839564458545904789328652610540031895535786011264182548375833179829124845398393126574488675311145377107878746854204162666250198684504466355949195922066574942592095735778929325357290444962472405416790722118445437122269675520000000000000000000000000000000000000
But you can simply:
def values():
    to_factorial_list = []
    ...
    to_factorial_list.append(values_you_want_to_factorialize)
    return to_factorial_list

def setToFactorial(number):
    return factorial(number)

print setToFactorial(values()[302])
EDIT:
Fair enough; then my advice is to combine the logic I suggested with getsizeof(number). You can merge the two ideas or work with two arrays: one to store the small numbers and another to store the big ones, e.g. splitting whenever getsizeof(number) exceeds some size.
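A rough sketch of that two-list idea (the threshold value and the names are mine, chosen only for illustration):
import sys

SIZE_THRESHOLD = 64  # bytes; an arbitrary cut-off, tune it to your data

small_numbers = []   # values whose in-memory size stays under the threshold
big_numbers = []     # values that exceed it

def store(value):
    if sys.getsizeof(value) > SIZE_THRESHOLD:
        big_numbers.append(value)
    else:
        small_numbers.append(value)

for n in (10, 150, 2 ** 2000):
    store(n)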

Related

Best way to count Greater Than in numpy 2d array

results is a 2D numpy array with 300,000 rows.
count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1
This takes 0.7 seconds in Python, but when I run the same logic in C++ it takes less than 0.07 seconds.
So how can I make this Python code as fast as possible?
When doing numerical computation for speed, especially in Python, you never want to use for loops if possible. Numpy is optimized for "vectorized" computation, so you want to pass off the work you'd typically do in for loops to special numpy indexing and functions like where.
I did a quick test on a 300,000 x 600 array of random values from 0 to 1 and found the following.
Your code, non-vectorized with one for loop:
226 ms per run
%%timeit
count = 0
for i in range(np.size(results, 0)):
    if results[i][0] >= 0.7:
        count += 1
emilaz Solution:
8.36 ms per run
%%timeit
first_col = results[:,0]
x = len(first_col[first_col>.7])
Ethan's Solution:
7.84 ms per run
%%timeit
np.bincount(results[:,0]>=.7)[1]
Best I came up with
6.92 ms per run
%%timeit
len(np.where(results[:,0] > 0.7)[0])
All 4 methods yielded the same answer, which for my data was 90,134. Hope this helps!
Try
first_col=results[:,0]
res =len(first_col[first_col>.7])
Depending on the shape of your matrix, this can be 2-10 times faster than your approach.
You could give the following a try:
np.bincount(results[:,0]>=.7)[1]
Not sure it’s faster, but should produce the correct answer
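For completeness, another well-known vectorized option (not from the answers above) is to count the True values of the boolean mask directly; the stand-in data below just mirrors the shapes mentioned in the question:
import numpy as np

results = np.random.rand(300000, 600)            # stand-in data, assumption
count = np.count_nonzero(results[:, 0] >= 0.7)   # equivalently: (results[:, 0] >= 0.7).sum()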

Python: Very slow execution loops

I am writing code for proposing typo corrections using an HMM and the Viterbi algorithm. At some point, for each word in the text, I have to do the following (let's assume I have 10,000 words).
# FYI: Windows 10, 64-bit, Intel i7, 4 GB RAM, Python 2.7.3
import numpy as np
import pandas as pd
for k in range(10000):
    tempWord = corruptList20[k]  # temp word read from the list which has all of the words
    delta = np.zeros((26, len(tempWord)))
    sai = np.chararray((26, len(tempWord)))
    sai[:] = '#'

    # INITIALIZATION OF DELTA
    for i in range(26):
        delta[i][0] = # CALCULATION: matrix read and multiplication; each cell is different
    # INITIALIZATION END

    # 6. DELTA CALCULATION
    for deltaIndex in range(1, len(tempWord)):
        for j in range(26):
            tempDelta = 0.0
            maxDelta = 0.0
            maxState = ''
            for i in range(26):
                # CALCULATION to fill each cell, involving:
                # 1- matrix read and multiplication
                # 2- finding the column max
                # logical operations and if-then-else operations

    # 7. SAI BACKWARD TRACKING
    delta2 = pd.DataFrame(delta)
    sai2 = pd.DataFrame(sai)
    proposedWord = np.zeros(len(tempWord), str)
    editId = 0
    for col in delta2.columns:
        # CALCULATION to fill each cell, involving:
        # 1- matrix read and multiplication
        # 2- finding the column max
        # logical operations and if-then-else operations

    editList20.append(''.join(editWord))
# END OF LOOP
As you can see, it is computationally involved, and when I run it, it takes too much time.
My laptop was stolen recently, so I am currently running this on Windows 10, 64-bit, 4 GB RAM, Python 2.7.3.
My question: can anybody see anything I could optimize? Do I have to delete the matrices I create in the loop before the loop goes to the next round to free memory, or is this done automatically?
After the comments below and using xrange instead of range, the performance increased by almost 30%. I am adding the screenshot here after this change.
I don't think the range discussion makes much difference. In Python 3, where range is lazy, expanding it into a list before iterating doesn't change the time much.
In [107]: timeit for k in range(10000):x=k+1
1000 loops, best of 3: 1.43 ms per loop
In [108]: timeit for k in list(range(10000)):x=k+1
1000 loops, best of 3: 1.58 ms per loop
With numpy and pandas the real key to speeding up loops is to replace them with compiled operations that work on the whole array or dataframe. But even in pure Python, focus on streamlining the contents of the iteration, not the iteration mechanism.
======================
for i in range(26):
    delta[i][0] = # CALCULATION: matrix read and multiplication
A minor change: delta[i, 0] = ...; this is the array way of addressing a single element. Functionally it is often the same, but the intent is clearer. But think: can't you set all of that column at once?
delta[:,0] = ...
====================
N = len(tempWord)
delta = np.zeros((26, N))
etc
In tight loops, temporary variables like this can save time. This loop isn't tight, so here it just adds clarity.
===========================
This one is an ugly nested triple loop; admittedly 26 steps isn't large, but 26*26*N is:
for deltaIndex in range(1, N):
    for j in range(26):
        tempDelta = 0.0
        maxDelta = 0.0
        maxState = ''
        for i in range(26):
            # CALCULATION
            # 1- matrix read and multiplication
            # 2- finding the column max
            # logical operations and if-then-else operations
Focus on replacing this with array operations. It's those 3 commented lines that need to be changed, not the iteration mechanism.
================
Making proposedWord a list rather than an array might be faster. Small list operations are often faster than array ones, since numpy arrays have a creation overhead.
In [136]: timeit np.zeros(20,str)
100000 loops, best of 3: 2.36 µs per loop
In [137]: timeit x=[' ']*20
1000000 loops, best of 3: 614 ns per loop
You have to be careful when creating 'empty' lists that the elements are truly independent, not just copies of the same thing.
In [159]: %%timeit
x = np.zeros(20,str)
for i in range(20):
    x[i] = chr(65+i)
.....:
100000 loops, best of 3: 14.1 µs per loop
In [160]: timeit [chr(65+i) for i in range(20)]
100000 loops, best of 3: 7.7 µs per loop
As noted in the comments, the behavior of range changed between Python 2 and 3.
In 2, range constructs an entire list populated with the numbers to iterate over, then iterates over that list. Doing this in a tight loop is very expensive.
In 3, range instead constructs a simple object that (as far as I know) consists of only 3 numbers: the starting number, the step (distance between numbers), and the end number. Using simple math, you can calculate any point along the range without iterating to it. This makes "random access" on it O(1), instead of the O(n) you pay when the entire list is created and iterated, and it avoids building a costly list at all.
In 2, use xrange to iterate over a range object instead of a list.
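To make that concrete, a small Python 3 illustration (mine, not the answerer's):
r = range(0, 10 ** 9, 7)   # no billion-element list is ever built
print(len(r))              # 142857143, computed from start/stop/step
print(r[100000000])        # 700000000, computed in O(1) by arithmetic
print(700000000 in r)      # True; membership on an int is arithmetic too, not a scan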
It's hard to see exactly what you need to do because of the missing code, but it's clear that you need to learn how to vectorize your numpy code. This can lead to a 100x speedup.
You can probably get rid of all the inner for-loops and replace them with vectorized operations.
eg. instead of
for i in range(26):
    delta[i][0] = # CALCULATION: matrix read and multiplication; each cell is different
do
delta[:, 0] = # Vectorized form of whatever operation you were going to do.
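As a hedged sketch of what that vectorized form could look like for the elided calculations (the transition matrix trans, the emission scores emis, and the word length used here are invented stand-ins, not taken from the question), the whole 26x26 inner loop for one column collapses into a few array operations:
import numpy as np

n_states, word_len = 26, 12
# hypothetical model parameters standing in for the elided "matrix read and multiplication"
trans = np.random.rand(n_states, n_states)   # trans[i, j]: score of moving from state i to state j
emis = np.random.rand(n_states, word_len)    # emis[j, t]: score of state j at position t

delta = np.zeros((n_states, word_len))
sai = np.zeros((n_states, word_len), dtype=int)
delta[:, 0] = emis[:, 0]                     # vectorized initialization of column 0

for t in range(1, word_len):
    # scores[i, j] = delta[i, t-1] * trans[i, j]: one broadcast instead of 26*26 Python steps
    scores = delta[:, t - 1, None] * trans
    delta[:, t] = scores.max(axis=0) * emis[:, t]   # column max for all 26 states at once
    sai[:, t] = scores.argmax(axis=0)               # backpointers for the backward tracking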

Why does numpy's fromiter function require specifying the dtype when other array creation routines don't?

In order to improve memory efficiency, I've been working on converting some of my code from lists to generators/iterators where I can. I've found a lot of instances of cases where I am just converting a list I've made to an np.array with the code pattern np.array(some_list).
Notably, some_list is often a list comprehension that is iterating over a generator.
I was looking into np.fromiter to see if I could use the generator more directly (rather than having to first turn it into a list and then convert that into a numpy array), but I noticed that np.fromiter, unlike any other array creation routine that uses existing data, requires specifying the dtype.
In most of my particular cases I can make that work (I am mostly dealing with log-likelihoods, so float64 will be fine), but it left me wondering why this is only necessary for the fromiter array creator and not for the other array creators.
First attempts at a guess:
Memory preallocation?
What I understand is that if you know the dtype and the count, it allows preallocating memory to the resulting np.array, and that if you don't specify the optional count argument that it will "resize the output array on demand". But if you do not specify the count, it would seem that you should be able to infer the dtype on the fly in the same way that you can in a normal np.array call.
Datatype recasting?
I could see this being useful for recasting data into new dtypes, but that would hold for other array creation routines as well, and would seem to merit placement as an optional but not required argument.
A couple ways of restating the question
So why is it that you need to specify the dtype to use np.fromiter? Or, put another way, what are the gains from specifying the dtype if the array is going to be resized on demand anyway?
A more subtle version of the same question that is more directly related to my problem:
I know many of the efficiency gains of np.ndarrays are lost when you're constantly resizing them, so what is gained from using np.fromiter(generator,dtype=d) over np.fromiter([gen_elem for gen_elem in generator],dtype=d) over np.array([gen_elem for gen_elem in generator],dtype=d)?
If this code was written a decade ago, and there hasn't been pressure to change it, then the old reasons still apply. Most people are happy using np.array. np.fromiter is mainly used by people who are trying to squeeze some speed out of iterative methods of generating values.
My impression is that np.array, the main alternative, reads/processes the whole input before deciding on the dtype (and other properties):
I can force a float return just by changing one element:
In [395]: np.array([0,1,2,3,4,5])
Out[395]: array([0, 1, 2, 3, 4, 5])
In [396]: np.array([0,1,2,3,4,5,6.])
Out[396]: array([ 0., 1., 2., 3., 4., 5., 6.])
I don't use fromiter much, but my sense is that by requiring dtype, it can start converting the inputs to that type right from the start. That could end up producing a faster iteration, though that needs time tests.
I know that the np.array generality comes at a certain time cost. Often for small lists it is faster to use a list comprehension than to convert it to an array - even though array operations are fast.
Some time tests:
In [404]: timeit np.fromiter([0,1,2,3,4,5,6.],dtype=int)
100000 loops, best of 3: 3.35 µs per loop
In [405]: timeit np.fromiter([0,1,2,3,4,5,6.],dtype=float)
100000 loops, best of 3: 3.88 µs per loop
In [406]: timeit np.array([0,1,2,3,4,5,6.])
100000 loops, best of 3: 4.51 µs per loop
In [407]: timeit np.array([0,1,2,3,4,5,6])
100000 loops, best of 3: 3.93 µs per loop
The differences are small, but they suggest my reasoning is correct. Requiring dtype helps keep fromiter fast. count does not make a difference at this small size.
Curiously, specifying a dtype for np.array slows it down. It's as though it appends an astype call:
In [416]: timeit np.array([0,1,2,3,4,5,6],dtype=float)
100000 loops, best of 3: 6.52 µs per loop
In [417]: timeit np.array([0,1,2,3,4,5,6]).astype(float)
100000 loops, best of 3: 6.21 µs per loop
The differences between np.array and np.fromiter are more dramatic when I use range(1000) (the lazy Python 3 range object):
In [430]: timeit np.array(range(1000))
1000 loops, best of 3: 704 µs per loop
Actually, turning the range into a list is faster:
In [431]: timeit np.array(list(range(1000)))
1000 loops, best of 3: 196 µs per loop
but fromiter is still faster:
In [432]: timeit np.fromiter(range(1000),dtype=int)
10000 loops, best of 3: 87.6 µs per loop
It is faster to apply the int-to-float conversion to the whole array than to each element during the generation/iteration:
In [434]: timeit np.fromiter(range(1000),dtype=int).astype(float)
10000 loops, best of 3: 106 µs per loop
In [435]: timeit np.fromiter(range(1000),dtype=float)
1000 loops, best of 3: 189 µs per loop
Note that the astype conversion is not that expensive, only some 20 µs.
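For reference, this is the usage pattern the question describes, sketched with a throwaway generator (names are mine): the required dtype lets fromiter convert each value as it arrives, and count, when known, lets it size the buffer up front instead of growing it on demand.
import numpy as np

def loglikelihoods(n):
    # stand-in generator; imagine each value is expensive to produce
    for k in range(1, n + 1):
        yield np.log(k) - k

# dtype is required; count is optional but avoids the grow-and-shrink resizing
arr = np.fromiter(loglikelihoods(1000), dtype=np.float64, count=1000)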
============================
array_fromiter(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *keywds) is defined in:
https://github.com/numpy/numpy/blob/eeba2cbfa4c56447e36aad6d97e323ecfbdade56/numpy/core/src/multiarray/multiarraymodule.c
It processes the keywds and calls
PyArray_FromIter(PyObject *obj, PyArray_Descr *dtype, npy_intp count)
in
https://github.com/numpy/numpy/blob/97c35365beda55c6dead8c50df785eb857f843f0/numpy/core/src/multiarray/ctors.c
This makes an initial array ret using the defined dtype:
ret = (PyArrayObject *)PyArray_NewFromDescr(&PyArray_Type, dtype, 1,
                                            &elcount, NULL, NULL, 0, NULL);
The data attribute of this array is grown with 50% overallocation => 0, 4, 8, 14, 23, 36, 56, 86 ..., and shrunk to fit at the end.
The dtype of this array, PyArray_DESCR(ret), apparently has a function that can take a value (provided by the iterator's next), convert it, and set it in the data:
PyArray_DESCR(ret)->f->setitem(value, item, ret)
In other words, all the dtype conversion is done by the defined dtype. The code would be a lot more complicated if it had to decide 'on the fly' how to convert the value (and all previously allocated ones). Most of the code in this function deals with allocating the data buffer.
I'll hold off on looking up np.array. I'm sure it is much more complex.

How to insert sort through swap or through pop() - insert()?

I've made my own version of insertion sort that uses pop and insert - to pick out the current element and insert it before the smallest element larger than the current one - rather than the standard swapping backwards until a larger element is found. When I run the two on my computer, mine is about 3.5 times faster. When we did it in class, however, mine was much slower, which is really confusing. Anyway, here are the two functions:
def insertionSort(alist):
    for x in range(len(alist)):
        for y in range(x, 0, -1):
            if alist[y] < alist[y-1]:
                alist[y], alist[y-1] = alist[y-1], alist[y]
            else:
                break

def myInsertionSort(alist):
    for x in range(len(alist)):
        for y in range(x):
            if alist[y] > alist[x]:
                alist.insert(y, alist.pop(x))
                break
Which one should be faster? Does alist.insert(y,alist.pop(x)) change the size of the list back and forth, and how does that affect time efficiency?
Here's my quite primitive test of the two functions:
from time import time
from random import shuffle
listOfLists = []
for x in range(100):
    a = list(range(1000))
    shuffle(a)
    listOfLists.append(a)

start = time()
for i in listOfLists:
    myInsertionSort(i[:])
myInsertionTime = time() - start

start = time()
for i in listOfLists:
    insertionSort(i[:])
insertionTime = time() - start

print("regular:", insertionTime)
print("my:", myInsertionTime)
I had underestimated your question, but it actually isn't easy to answer. There are a lot of different elements to consider.
Doing lst.insert(y, lst.pop(x)) is a O(n) operation, because lst.pop(x) costs O(len(lst) - x) since list elements must be contiguous, and thus the list has to shift-left by one all the elements after index x, and dually lst.insert(y, _) costs O(len(lst) - y) since it has to shift all the elements right by one.
This means that a naive analysis gives an upper bound of O(n^3) complexity in the worst case for your code. As you suggested this is actually correct [remember that O(n^2) is a subset of O(n^3)], but it's not a tight upper bound, because you move each element only once. So n times you do O(n) work, and the complexity is indeed O(n * n + n^2) = O(n^2), where the second n^2 refers to the number of comparisons, which is n^2 in the worst case. So, asymptotically, your solution is the same as insertion sort.
The first algorithm and the second algorithm differ in the order of iteration over y. As I have already commented, this changes the worst case for the algorithm.
While insertion sort has its worst-case with reverse-sorted sequences, your algorithm doesn't (which is actually good). This might be a factor that adds to the difference in timings since if you do not use random lists you might use an input that is worst-case for one algorithm but not worst-case for the other.
In [2]: %timeit insertionSort(list(range(10)))
100000 loops, best of 3: 5.46 us per loop
In [3]: %timeit myInsertionSort(list(range(10)))
100000 loops, best of 3: 8.47 us per loop
In [4]: %timeit insertionSort(list(reversed(range(10))))
10000 loops, best of 3: 20.4 us per loop
In [5]: %timeit myInsertionSort(list(reversed(range(10))))
100000 loops, best of 3: 9.81 us per loop
You should always test with random inputs of different lengths as well.
The average complexity of insertion sort is O(n^2). Your algorithm might have a lower average time, however it's not entirely trivial to compute it.
I don't get why you use the insert+pop at all when you can use the swap. Trying this on my machine yields quite a big improvement in efficiency, since you reduce an O(n^2) component to an O(n) component.
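One way to read that suggestion, as a sketch of my own rather than the answerer's code: keep the forward search of myInsertionSort, but walk the element into place with pairwise swaps (as insertionSort does) instead of pop() + insert().
def myInsertionSortSwap(alist):
    for x in range(len(alist)):
        for y in range(x):
            if alist[y] > alist[x]:
                # rotate alist[y:x+1] one slot to the right with swaps,
                # so alist[x] ends up at position y
                for k in range(x, y, -1):
                    alist[k], alist[k - 1] = alist[k - 1], alist[k]
                break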
Now, you ask why there was such a big change between the execution at home and in class.
There can be various reasons; for example, if you did not use a randomly generated list, you might have used an almost best-case input for insertion sort while it was an almost worst-case input for your algorithm, and similar considerations. Without seeing what you did in class, it is not possible to give an exact answer.
However I believe there is a very simple answer: you forgot to copy the list before profiling. This is the same error I did when I first posted this answer (quote from the previous answer):
If you want to compare the two functions you should use random lists:
In [6]: import random
...: input_list = list(range(10))
...: random.shuffle(input_list)
...:
In [7]: %timeit insertionSort(input_list) # Note: no input_list[:]!! Argh!
100000 loops, best of 3: 4.82 us per loop
In [8]: %timeit myInsertionSort(input_list)
100000 loops, best of 3: 7.71 us per loop
Also you should use big inputs to see the difference clearly:
In [11]: input_list = list(range(1000))
...: random.shuffle(input_list)
In [12]: %timeit insertionSort(input_list) # Note: no input_list[:]! Argh!
1000 loops, best of 3: 508 us per loop
In [13]: %timeit myInsertionSort(input_list)
10 loops, best of 3: 55.7 ms per loop
Note also that I, unfortunately, always executed the pairs of profilings in the same order, confirming my previous ideas.
As you can see, all calls to insertionSort except the first one used a sorted list as input, which is the best case for insertion sort! This means that the timing for insertion sort is wrong (and I'm sorry for having written this before!). Meanwhile, myInsertionSort was always executed with an already sorted list, and guess what? It turns out that one of the worst cases for myInsertionSort is the sorted list!
Think about it:
for x in range(len(alist)):
    for y in range(x):
        if alist[y] > alist[x]:
            alist.insert(y, alist.pop(x))
            break
If you have a sorted list, the alist[y] > alist[x] comparison will always be false. You might say "perfect! no swaps => no O(n) work => better timing"; unfortunately this is false, because no swaps also means no break, and hence you are doing n*(n-1)/2 iterations, i.e. the worst-case performance.
Note that this is very bad! Real-world data is very often partially sorted, so an algorithm whose worst case is the sorted list is usually not a good algorithm for real-world use.
Note that this does not change if you replace insert + pop with a simple swap, hence the algorithm itself is not good from this point of view, independently of the implementation.

Are Numpy functions slow?

Numpy is supposed to be fast. However, when comparing Numpy ufuncs with standard Python functions I find that the latter are much faster.
For example,
aa = np.arange(1000000, dtype = float)
%timeit np.mean(aa) # 1000 loops, best of 3: 1.15 ms per loop
%timeit aa.mean # 10000000 loops, best of 3: 69.5 ns per loop
I got similar results with other Numpy functions like max, power. I was under the impression that Numpy has an overhead that makes it slower for small arrays but would be faster for large arrays. In the code above aa is not small: it has 1 million elements. Am I missing something?
Of course, Numpy is fast, only the functions seem to be slow:
bb = range(1000000)
%timeit mean(bb) # 1 loops, best of 3: 551 ms per loop
%timeit mean(list(bb)) # 10 loops, best of 3: 136 ms per loop
Others already pointed out that your comparison is not a real comparison (you are not calling the function + both are numpy).
But to give an answer to the question "Are numpy functions slow?": generally speaking, no, numpy functions are not slow (or at least not slower than plain Python functions). Of course there are some side notes to make:
'Slow' of course depends on what you compare with, and it can always be faster. With things like cython, numexpr, numba, calling C code, and so on, it is in many cases certainly possible to get faster results.
Numpy has a certain overhead, which can be significant in some cases. For example, as you already mentioned, numpy can be slower on small arrays and scalar math. For a comparison on this, see e.g. Are NumPy's math functions faster than Python's?
To make the comparison you wanted to make:
In [1]: import numpy as np
In [2]: aa = np.arange(1000000)
In [3]: bb = range(1000000)
For the mean (note there is no mean function in the Python standard library; see Calculating arithmetic mean (average) in Python):
In [4]: %timeit np.mean(aa)
100 loops, best of 3: 2.07 ms per loop
In [5]: %timeit float(sum(bb))/len(bb)
10 loops, best of 3: 69.5 ms per loop
For max, numpy vs plain python:
In [6]: %timeit np.max(aa)
1000 loops, best of 3: 1.52 ms per loop
In [7]: %timeit max(bb)
10 loops, best of 3: 31.2 ms per loop
As a final note, in the above comparison I used a numpy array (aa) for the numpy functions and a list (bb) for the plain Python functions. If you were to use a list with the numpy functions, it would again be slower:
In [10]: %timeit np.max(bb)
10 loops, best of 3: 115 ms per loop
because the list is first converted to an array (which consumes most of the time). So, if you want to rely on numpy in your application, it is important to use numpy arrays to store your data (or, if you have a list, convert it to an array so this conversion only has to be done once).
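A small sketch of that "convert once, reuse" point (variable names are mine):
import numpy as np

bb = list(range(1000000))
bb_arr = np.asarray(bb)   # pay the list-to-array conversion cost a single time

# subsequent numpy calls operate on the array directly, with no per-call conversion
np.max(bb_arr)
np.mean(bb_arr)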
You're not calling aa.mean. Put the function call parentheses on the end, to actually call it, and the speed difference will nearly vanish. (Both np.mean(aa) and aa.mean() are NumPy; neither uses Python builtins to do the math.)
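To see the mistake concretely (my illustration, not part of the answer):
import numpy as np

aa = np.arange(1000000, dtype=float)
m = aa.mean     # only looks up the bound method; no math is done, hence the nanosecond timing
v = aa.mean()   # actually computes the mean over the million elements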
