Why does numpy.sum not give me the right result? - python

The sum of a standard Python list, say x=list(range(100000)), does not match the sum of the same sequence as an array, x_array=np.array(x). In the first case I obtain sum(x)=4999950000, which is the correct result. Using NumPy's sum I obtain np.sum(x_array)=704982704. This troubles me because I am a beginner with this language. Does anyone have an explanation for this difference? Thank you. The code is:
import numpy as np
x=list(range(100000))
print("sum x using standard python function = ",sum(x))
x_array=np.array(x)
print("sum x using numpy =",np.sum(x_array))

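Your NumPy defaults to 32-bit integers (this is the case, for example, on Windows with NumPy versions before 2.0), whereas Python's built-in int switches to arbitrary precision as needed.
You got bitten by overflow/wraparound: the true sum does not fit in 32 bits, so only the low 32 bits survive.
4999950000 % (2**32) == 704982704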
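A minimal sketch that reproduces the wraparound on any platform by forcing 32-bit integers, and shows the usual fix of asking for a 64-bit integer dtype, either when building the array or as the accumulator passed to sum. The variable names are illustrative only:

```python
import numpy as np

x = list(range(100000))
exact = sum(x)  # Python ints grow as needed, so this never overflows

# Force 32-bit storage and a 32-bit accumulator to reproduce the problem
x32 = np.array(x, dtype=np.int32)
wrapped = int(x32.sum(dtype=np.int32))  # silently wraps modulo 2**32

# Fix 1: build the array with a 64-bit integer dtype
fixed_array = int(np.array(x, dtype=np.int64).sum())

# Fix 2: keep the int32 array but accumulate the sum in 64 bits
fixed_acc = int(x32.sum(dtype=np.int64))

print(exact, wrapped, fixed_array, fixed_acc)
```

Note that NumPy does not warn about integer overflow in reductions like this, so checking the dtype of large-valued arrays is worth making a habit.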

Related

Converting MATLAB random function to python

My task is to convert one big MATLAB file into python.
There is a line in MATLAB
weightsEI_slow = random('binom',1,0.2,[EneuronNum_slow,IneuronNum_slow]);
I am trying to convert this into Python code, but I can't find the right documentation. I looked at the NumPy library too. Does anyone have any suggestions?
It looks like you are generating random numbers that follow the binomial distribution with probability p=0.2 and n=1 trial (that is, Bernoulli draws). You can use NumPy for this:
import numpy as np
np.random.binomial(n=1, p=0.2)
0
If you require reproducibility, add np.random.seed(3408) before the number is sampled; otherwise, the output may be 0 or 1 depending on the run. You can of course use any other integer as the seed instead of 3408. To get a whole [EneuronNum_slow, IneuronNum_slow] matrix, as the MATLAB line does, pass size=(EneuronNum_slow, IneuronNum_slow).
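A sketch of the full-matrix translation using NumPy's newer Generator API (np.random.default_rng) instead of the legacy np.random.seed. The sizes 800 and 200 are hypothetical placeholders for EneuronNum_slow and IneuronNum_slow, and the draws will not match MATLAB's numbers even with the same seed:

```python
import numpy as np

# Hypothetical sizes; replace with your EneuronNum_slow / IneuronNum_slow
EneuronNum_slow = 800
IneuronNum_slow = 200

rng = np.random.default_rng(3408)  # seeded Generator for reproducible draws

# Rough equivalent of MATLAB's
# random('binom', 1, 0.2, [EneuronNum_slow, IneuronNum_slow])
weightsEI_slow = rng.binomial(n=1, p=0.2, size=(EneuronNum_slow, IneuronNum_slow))
```

Each entry is 0 or 1, with roughly 20% ones.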

How to handle very large integers in python?

I wrote code for computing the Fibonacci sequence using the concept of the Fibonacci matrix, inspired by this Instagram post:
import numpy as np

def f(n):
    m = np.array([[1, 1], [1, 0]], dtype=np.int64)
    m1 = m
    for i in range(0, n):
        m1 = np.matmul(m1, m)
    return m1[0, 1]
but after n=93, it starts to give negative numbers. If I use np.int32 as dtype then after n=47, it starts to give negative and erroneous results.
I am using Python 3.9 and I want the results to be integers (not floats). What can I do to get correct results for n=1000 or larger?
NumPy's integer dtypes are fixed-width native C types, so np.int64 starts overflowing once a value exceeds 2**63 - 1 (and np.int32 once it exceeds 2**31 - 1). However, you can use dtype=object so that the array holds Python's arbitrarily large integers; this can be quite slow to process, though.
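A minimal sketch of the dtype=object approach, assuming you want F(n) with F(0)=0 and F(1)=1 (the question's f(n) starts from m rather than the identity, so it returns F(n+1)). fib_matrix is a hypothetical name; np.dot is used here because it works on object arrays:

```python
import numpy as np

def fib_matrix(n):
    """Return F(n) from powers of the Fibonacci matrix [[1, 1], [1, 0]].

    dtype=object stores Python ints, so entries never overflow,
    at the cost of speed compared with int64.
    """
    m = np.array([[1, 1], [1, 0]], dtype=object)
    result = np.array([[1, 0], [0, 1]], dtype=object)  # identity matrix
    for _ in range(n):
        result = np.dot(result, m)
    # m**n == [[F(n+1), F(n)], [F(n), F(n-1)]]
    return result[0, 1]

print(fib_matrix(100))
```

The loop does n matrix products; for very large n, exponentiation by squaring would need far fewer multiplications.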

Generate a numpy array from a python function

I have what I thought would be a simple task in numpy, but I'm having trouble.
I have a function which takes an index in the array and returns the value that belongs at that index. I would like to, efficiently, write the values into a numpy array.
I have found numpy.fromfunction, but it doesn't behave remotely like the documentation suggests. It seems to "vectorise" the function, which means that instead of passing the actual indices it passes a numpy array of indices:
def vsin(i):
    return float(round(A * math.sin((2 * pi * wf) * i)))

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
# TypeError: only length-1 arrays can be converted to Python scalars
(if we use a debugger to inspect i, it is a numpy.array instance.)
So, if we try to use numpy's vectorised sin function:
def vsin(i):
    return (A * numpy.sin((2 * pi * wf) * i)).astype(numpy.int16)

numpy.fromfunction(vsin, (len,), dtype=numpy.int16)
We don't get a type error, but if len > 2**15 we get discontinuities chopping across our oscillator, because numpy is using int16_t to represent the index!
The point here isn't about sin in particular: I want to be able to write arbitrary python functions like this (whether a numpy vectorised version exists or not) and be able to run them inside a tight C loop (rather than a roundabout python one), and not have to worry about integer wraparound.
Do I really have to write my own cython extension in order to be able to do this? Doesn't numpy have support for running python functions once per item in an array, with access to the index?
It doesn't have to be a creation function: I can use numpy.empty (or indeed, reuse an existing array from somewhere else.) So a vectorised transformation function would also do.
I think the issue of integer wraparound is unrelated to numpy's vectorized sin implementation, and even to whether Python or C is used.
If you use a 2-byte signed integer and try to generate an array of integer values ranging from 0 to above 32767, the values will wrap around. The array will look like:
[0, 1, 2, ... , 32767, -32768, -32767, ...]
The simplest solution, assuming memory is not too tight, is to use more bytes for the integer index array generated by fromfunction, so you don't have a wraparound problem in the first place (int32 is fine up to 2**31 - 1, about 2.1 billion):
numpy.fromfunction(vsin, (len,), dtype=numpy.int32)
NumPy is optimized to work fast on arrays by passing whole arrays around between vectorized functions. I think that, in general, NumPy's tools are inconvenient for running scalar functions once per array element.
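A sketch of both routes, with hypothetical values for A, wf and the length (the question does not give them). The first route keeps fromfunction but uses 64-bit indices so they cannot wrap; the second calls an arbitrary scalar Python function once per index via np.fromiter, which avoids wraparound but still runs a Python-level loop, not a tight C loop:

```python
import math
import numpy as np

# Hypothetical parameters standing in for A, wf and len in the question
A = 10000.0
wf = 1.0 / 48000
n = 70000  # more than 2**15 samples

# Route 1: vectorised function, 64-bit index array from fromfunction
vec = np.fromfunction(
    lambda i: np.round(A * np.sin(2 * np.pi * wf * i)).astype(np.int16),
    (n,),
    dtype=np.int64,
)

def vsin_scalar(i):
    # Plain scalar function: receives one Python int index at a time
    return round(A * math.sin(2 * math.pi * wf * i))

# Route 2: arbitrary scalar function evaluated once per index
scalar = np.fromiter((vsin_scalar(i) for i in range(n)), dtype=np.int16, count=n)
```

The two results can differ by 1 in rare places, because np.sin and math.sin may round the last bit differently before the value is rounded to an integer.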

Comparing Matlab and Numpy code that uses random number generation

Is there some way to make the random number generator in numpy generate the same random numbers as in Matlab, given the same seed?
I tried the following in Matlab:
>> rng(1);
>> randn(2, 2)
ans =
0.9794 -0.5484
-0.2656 -0.0963
And the following in iPython with Numpy:
In [21]: import numpy as np
In [22]: np.random.seed(1)
In [23]: np.random.randn(2, 2)
Out[23]:
array([[ 1.624, -0.612],
[-0.528, -1.073]])
Values in both the arrays are different.
Or could someone suggest a good way to compare two implementations of the same algorithm in MATLAB and Python that use random number generation?
Thanks!
Just to clarify the twister/seeding method further: MATLAB and NumPy generate the same sequence with this seeding, but they fill matrices in a different order.
MATLAB fills a matrix down the columns (column-major order), while NumPy fills it across the rows (row-major order). So to get the same matrices in both, you have to transpose:
MATLAB:
rand('twister', 1337);
A = rand(3,5)
A =
Columns 1 through 2
0.262024675015582 0.459316887214567
0.158683972154466 0.321000540520167
0.278126519494360 0.518392820597537
Columns 3 through 4
0.261942925565145 0.115274226683149
0.976085284877434 0.386275068634359
0.732814552690482 0.628501179539712
Column 5
0.125057926335599
0.983548605143641
0.443224868645128
python:
import numpy as np
np.random.seed(1337)
A = np.random.random((5,3))
A.T
array([[ 0.26202468, 0.45931689, 0.26194293, 0.11527423, 0.12505793],
[ 0.15868397, 0.32100054, 0.97608528, 0.38627507, 0.98354861],
[ 0.27812652, 0.51839282, 0.73281455, 0.62850118, 0.44322487]])
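The transpose trick can also be written as a reshape in column-major (Fortran) order. A sketch, assuming the legacy np.random.seed / np.random.random interface used above; matlab_like_rand is a hypothetical helper name, not an existing NumPy function:

```python
import numpy as np

def matlab_like_rand(rows, cols, seed):
    """Aim to reproduce MATLAB's rand('twister', seed); rand(rows, cols).

    Uses the legacy global Mersenne Twister. MATLAB fills matrices
    column by column, so the flat stream is reshaped in Fortran order.
    """
    np.random.seed(seed)
    flat = np.random.random(rows * cols)
    return flat.reshape((rows, cols), order="F")

A = matlab_like_rand(3, 5, 1337)
print(A)
```

This produces the same 3x5 matrix as np.random.random((5, 3)).T after the same seeding, without having to swap the requested shape by hand.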
As Bakuriu suggested, it works if you use MATLAB's twister generator:
MATLAB:
>> rand('twister', 1337)
>> rand()
ans =
0.2620
Python (Numpy):
>>> import numpy as np
>>> np.random.seed(1337)
>>> np.random.random()
0.2620246750155817
One way to ensure the same numbers are fed to your process is to generate them in one of the two languages, save them, and import them into the other language. This is fairly easy: you could write them to a simple text file.
If this is not possible or desirable, you can also make sure the numbers are the same by generating the pseudo-random numbers yourself. Here is a site that shows a very simple example of an easy-to-implement algorithm: Build your own simple random numbers
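A minimal sketch of the text-file route from the Python side. The file name random_numbers.txt is hypothetical; writing with 17 significant digits lets the doubles survive the round trip exactly. On the MATLAB side, numbers = load('random_numbers.txt'); should read the whitespace-separated matrix back:

```python
import numpy as np

rng = np.random.default_rng(1)            # any generator and seed will do
numbers = rng.standard_normal((2, 2))

# 17 significant digits are enough to round-trip any float64 exactly
path = "random_numbers.txt"               # hypothetical file name
np.savetxt(path, numbers, fmt="%.17g")

# Reading it back in Python to confirm nothing was lost
loaded = np.loadtxt(path)
```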
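As one illustration of a do-it-yourself generator, here is a sketch of a linear congruential generator using the well-known Numerical Recipes constants (a=1664525, c=1013904223, m=2**32). This is not necessarily the algorithm on the linked site, and its statistical quality is modest. The products stay below 2**53, so the same arithmetic should also be exact with MATLAB's double-precision numbers when ported:

```python
def lcg_uniform(seed, count, a=1664525, c=1013904223, m=2**32):
    """Return `count` pseudo-random floats in [0, 1) from a simple LCG.

    Only integer multiply, add and modulo are used, which makes the
    sequence easy to reproduce bit for bit in another language.
    """
    x = seed % m
    out = []
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x / m)  # scale the 32-bit state to [0, 1)
    return out

print(lcg_uniform(1, 3))
```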
If the quality of your homemade random generator is not sufficient, you can write a random-number function in one language and call it from the other. The easiest path is probably to call MATLAB from Python.
If you are feeling lucky, try playing around with the settings. For example, try using the (outdated) seed input to MATLAB's random functions, or try different kinds of generators. I believe the default in both languages is the Mersenne Twister, but if the two implementations do not match, perhaps a simpler generator will.
How about running a MATLAB script from within your Python code to get the random numbers for a given seed?

Python SciPy FFT function - Input?

I am currently writing some code that is supposed to perform an FFT on a set of data. I have a Python list of data points and I can easily create a matching list of times. When I run fft(datalist), I get the error TypeError: 'numpy.ndarray' object is not callable. I think (but please correct me) the issue is that the list is one-dimensional, so in that one line of code the data have no connection to time at all. My question is: do I have to input a two-dimensional array with times and data points, or am I completely wrong and need to rethink?
Thanks, Mike
Edit - forgot to add some code. Here t is time. Could it be because the number of entries in the array isn't equal to 2^N, where N is an integer?
sample_rate=10.00
t=r_[0:191.6:1/sample_rate]
S = fft([mylist])
print S
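The NumPy and SciPy fft functions take a 1-D sequence of samples; the times are not passed in, and the sample spacing only matters when you label the frequency axis. They accept any array-like input, including a plain Python list, and they work just fine with lengths that are not powers of two. The error 'numpy.ndarray' object is not callable usually means the name fft has been rebound to an array earlier in your script (for example by an assignment such as fft = ...), so check for that. Also pass mylist itself rather than [mylist]: the extra brackets turn your data into a 2-D array of shape (1, N). Converting the list to an array explicitly does no harm.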
From your example code above try:
from numpy.fft import fft
from numpy import array
# However you generate your list goes here
S = fft(array(mylist))
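A self-contained sketch showing where time actually enters: the samples go into the FFT, and the sample rate is only used to compute the frequency of each output bin with rfftfreq. The 1.5 Hz sine is a hypothetical stand-in for mylist, and the time axis is built as np.arange(n) / sample_rate so that its length is exactly 1916:

```python
import numpy as np

sample_rate = 10.0                 # samples per second, as in the question
n = 1916                           # 191.6 s of data; not a power of two
t = np.arange(n) / sample_rate     # time axis, only needed for plotting

# Hypothetical stand-in for mylist: a 1.5 Hz sine wave
mylist = list(np.sin(2 * np.pi * 1.5 * t))

spectrum = np.fft.rfft(mylist)                     # a plain list is accepted
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)    # frequency of each bin, in Hz
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(peak_freq)
```

rfft is used because the input is real-valued; np.fft.fft with np.fft.fftfreq works the same way but also returns the redundant negative-frequency half.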
