I'm trying to square all the elements in a numpy array, but the results are not what I expect (i.e. some are negative and none are the actual squares). Can anyone please explain what I'm doing wrong and/or what's going on?
import numpy as np
import math
f = 'file.bin'
frameNum = 25600
channelNum = 2640
data = np.fromfile(f,dtype=np.int16)
total = frameNum*channelNum*2
rs = data[:total].reshape(channelNum,-1) #reshaping the data a little. Omitting added values at the end.
I = rs[:,::2] # pull out every other column
print "Shape :", I.shape
print "I : ", I[1,:10]
print "I**2 : ", I[1,:10]**2
print "I*I : ",I[1,:10]* I[1,:10]
print "np.square : ",np.square(I[1,:10])
exit()
Output:
Shape : (2640L, 25600L)
I : [-5302 -5500 -5873 -5398 -5536 -6708 -6860 -6506 -6065 -6363]
I**2 : [ -3740 -27632 20193 -25116 -23552 -25968 4752 -8220 18529 -13479]
I*I : [ -3740 -27632 20193 -25116 -23552 -25968 4752 -8220 18529 -13479]
np.square : [ -3740 -27632 20193 -25116 -23552 -25968 4752 -8220 18529 -13479]
Any suggestions?
It is because of dtype=np.int16. You are allowing only 16 bits to represent each number, and (-5302)**2 is larger than the maximum value (32767) that a signed 16-bit integer can hold. So you're seeing only the lowest 16 bits of the result, the highest of which is interpreted (or, from your point of view, misinterpreted) as a sign bit.
Convert your array to a different dtype - for example
I = np.array( I, dtype=np.int32 )
or
I = np.array( I, dtype=np.float64 )
before performing numerical operations that might go out of range.
With dtype=np.int16, the highest-magnitude integers you can square are +181 and -181. The square of 182 is larger than 32767 and so it overflows. Even with dtype=np.int32 representation, the highest-magnitude integers you can square are +46340 and -46340: the square of 46341 overflows.
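These limits can be checked directly. The following is a small sketch rather than part of the original answers; it assumes Python 3.8+ for math.isqrt, and the helper name max_squarable is made up for illustration.

```python
import math
import numpy as np

def max_squarable(dtype):
    """Largest magnitude whose square still fits in the given integer dtype
    (hypothetical helper, not a NumPy function)."""
    return math.isqrt(int(np.iinfo(dtype).max))

a = np.array([-5302, -5500], dtype=np.int16)
wrapped = a * a                    # computed in int16, so the result wraps around
widened = a.astype(np.int32) ** 2  # widen first, then square

print(max_squarable(np.int16), max_squarable(np.int32))
print(wrapped, widened)
```

Widening before the arithmetic is the important part; converting the already-wrapped result afterwards would not recover the true squares.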
This is the reason:
>>> a = np.array([-5302, -5500], dtype=np.int16)
>>> a * a
array([ -3740, -27632], dtype=int16)
This is the solution:
>>> b = np.array([-5302, -5500], dtype=np.int32)
>>> b * b
array([28111204, 30250000], dtype=int32)
Change:
data = np.fromfile(f, dtype=np.int16)
into:
data = np.fromfile(f, dtype=np.int16).astype(np.int32)
So, I was working on implementing my own version of the statistical test of homogeneity in Python, where the user submits a list of lists and the function computes the corresponding chi value.
One issue I found was that my function was dropping decimals when performing division, resulting in a somewhat inaccurate chi value for small sample sizes.
Here is the code:
import numpy as np
import scipy.stats as stats
def test_of_homo(list1):
    a = np.array(list1)
    #n = a.size
    num_rows = a.shape[0]
    num_cols = a.shape[1]
    dof = (num_cols-1)*(num_rows-1)
    column_totals = np.sum(a, axis=0)
    row_totals = np.sum(a, axis=1)
    n = sum(row_totals)
    b = np.array(list1)
    c = 0
    for x in range(num_rows):
        for y in range(num_cols):
            print("X is " + str(x))
            print("Y is " + str(y))
            print("a[x][y] is " + str(a[x][y]))
            print("row_totals[x] is " + str(row_totals[x]))
            print("column_total[y] is " + str(column_totals[y]))
            b[x][y] = (float(row_totals[x])*float(column_totals[y]))/float(n)
            print("b[x][y] is " + str(b[x][y]))
            numerator = ((a[x][y]) - b[x][y])**2
            chi = float(numerator)/float(b[x][y])
            c = float(c)+ float(chi)
    print(b)
    print(c)
    print(stats.chi2.cdf(c, df=dof))
    print(1-(stats.chi2.cdf(c, df=dof)))
listc = [(21, 36, 30), (48, 26, 19)]
test_of_homo(listc)
When the results were printed, I saw that the b[x][y] values were [[33 29 23] [35 32 25]] instead of values like 33.35, 29.97, 23.68, etc. This caused my resulting chi value to be 15.58 with a p of 0.0004 instead of the expected 14.5.
I tried to convert everything to float, but that didn't seem to work. Using decimal.Decimal(b[x][y]) resulted in a TypeError. Any help?
I think the problem is the integer values in the list you pass to the function. If you convert a list to a NumPy array without specifying the data type, NumPy infers one from the values:
>>> listc = [(21, 36, 30), (48, 26, 19)]
>>> a = np.array(listc)
>>> a.dtype
dtype('int64')
Here is how you force conversion to a desired data type:
>>> a = np.array(listc, dtype=float)
>>> a.dtype
dtype('float64')
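Try that on the first and ninth lines of your function (the two np.array(list1) calls) and see if it solves the problem. Because b then holds floats, assigning the expected value to b[x][y] no longer truncates it, so you shouldn't need to call float() all the time.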
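As an illustration of the float-dtype fix, here is a minimal vectorized sketch. It is not the asker's function: the name chi_square_statistic and the use of np.outer for the expected counts are choices made for this example only.

```python
import numpy as np

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table (illustrative helper)."""
    observed = np.array(table, dtype=float)  # float dtype avoids integer truncation
    row_totals = observed.sum(axis=1)
    column_totals = observed.sum(axis=0)
    n = observed.sum()
    # Expected count for each cell: row total * column total / grand total
    expected = np.outer(row_totals, column_totals) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return chi2, dof, expected

chi2, dof, expected = chi_square_statistic([(21, 36, 30), (48, 26, 19)])
print(expected.round(2))
print(round(chi2, 2), dof)
```

With the question's data, the first expected value comes out as 33.35 instead of 33, and the statistic lands near the 14.5 the asker expected.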
The code below calculates compounded values starting from $100 and an array of percentage gains. It starts with the entire gains array [20,3,4,55,6.5,-10,20,-60,5], which results in 96.25 at the end, then drops the first element and recalculates the compounded value for [3,4,55,6.5,-10,20,-60,5], which results in 80.20. It does this until only the last element, [5], is left. I want to calculate the maximum drawdown while calculating f. These are the compounded results for the first iteration of f: [120., 123.6, 128.544, 199.243, 212.194008, 190.9746072, 229.16952864, 91.66781146, 96.25120203]. I want to record the lowest value if it is below the initial capital (Amount). The lowest value in the first iteration is 91.67, so that would be the output; in the second iteration it would be 76.37. In the last iteration, [5] compounds to 105, so no value goes below 100 and the output is None. How can I add this to the code below and get the expected output?
import numpy as np
Amount = 100
def moneyrisk(array):
    f = lambda array: Amount*np.cumprod(array/100 + 1, 1)
    rep = array[None].repeat(len(array), 0)
    rep_t = np.triu(rep, k=0)
    final = f(rep_t)[:, -1]
gains= np.array([20,3,4,55,6.5,-10, 20,-60,5])
Expected output:
[91.67, 76.37, 74.164, 71.312, 46.008, 43.2, 48., 40., None]
I think I've understood the requirement. Converting the gains to compound factors after np.triu turns the zeroes below the diagonal into ones, so they leave the running product unchanged and the min method returns a valid value.
import numpy as np
gains= np.array( [20,3,4,55,6.5,-10, 20,-60,5] ) # Gains in %
amount = 100
def moneyrisk( arr ):
    rep = arr[ None ].repeat( len(arr), 0 )
    rep_t = np.triu( rep, k = 0 )
    rep_t = ( 1 + rep_t * .01 ) # Create factors to compound in rep_t
    result = amount*(rep_t.cumprod( axis = 1 ).min( axis = 1 ))
    # compound and find min value.
    return [ x if x < amount else None for x in result ]
    # Set >= amount to None in a list as numpy floats can't hold None
moneyrisk( gains )
# [91.667811456, 76.38984288, 74.164896, 71.3124, 46.008, 43.2, 48.0, 40.0, None]
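For cross-checking the vectorized result, a plain-Python loop can compute the same thing one starting index at a time. This is only a sketch; the function name min_below_start is hypothetical and does not appear in the question or the answer.

```python
def min_below_start(gains, amount=100.0):
    """For each starting index, compound the remaining gains and return the
    lowest value reached, or None if it never drops below the start amount."""
    results = []
    for start in range(len(gains)):
        value = amount
        lowest = amount
        for gain in gains[start:]:
            value *= 1 + gain / 100   # apply one percentage gain
            lowest = min(lowest, value)
        results.append(lowest if lowest < amount else None)
    return results

drawdowns = min_below_start([20, 3, 4, 55, 6.5, -10, 20, -60, 5])
print(drawdowns)
```

The loop is slower than the np.triu approach for long arrays, but it avoids building the full square matrix.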
I am attempting to convert a 3 channel numpy array to a single channel numpy array. I want to combine all 3 element values into 1 number using:
x << 16 + y << 8 + z
My code below does that, but it seems to turn a lot of the numbers into zero. Is that correct, or am I doing something wrong? Should those last 2 numbers be zero or something else?
ar = np.array((
((255,255,255),),
((255,20,255),),
((0,255,255),), # this becomes zero, is that correct?
((22,10,12),), # this becomes zero, is that correct?
), dtype='uint8')
c1,c2,c3 = cv2.split(ar)
single = np.int32(c1) << 16 + np.int32(c2) << 8 + np.int32(c3)
print(single)
print(ar.shape)
[[1069547520]
[ 522240]
[ 0]
[ 0]]
(4, 1, 3)
Add a column of zeros to make the array 4 bytes wide:
ar4 = np.insert(ar, 0, 0, 2)
Then simply view it as a big-endian array of 4-byte integers:
ar4.view('>u4')
This gives:
array([[[16777215]],
[[16717055]],
[[ 65535]],
[[ 1444364]]], dtype=uint32)
The only step here which really takes time is np.insert(), so if you are able to add that extra column while loading your data, the rest of the transformation is basically free (i.e. does not require copying data).
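For what it is worth, the zeros in the question look like an operator-precedence effect: in Python, + binds more tightly than <<, so x << 16 + y << 8 + z is parsed as (x << (16 + y)) << (8 + z). Below is a minimal sketch with explicit parentheses. It uses plain NumPy indexing in place of cv2.split, which is an assumption made only to keep the example self-contained.

```python
import numpy as np

ar = np.array((
    ((255, 255, 255),),
    ((255, 20, 255),),
    ((0, 255, 255),),
    ((22, 10, 12),),
), dtype='uint8')

# Widen each channel first so the shifted values do not overflow uint8
c1, c2, c3 = (ar[..., k].astype(np.uint32) for k in range(3))

# Parenthesize each shift, then combine the three fields
single = (c1 << 16) | (c2 << 8) | c3
print(single.ravel())
```

This produces the same four values as the view-based approach, at the cost of a few temporary arrays.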
I am doing a project on encrypting data using the RSA algorithm. I read a .wav file as input using wavfile, and I can apply the encryption key (3, 25777), but when I apply the decryption key (16971, 25777) I get wrong output like this:
The output I'm getting:
[[ 0 -25777]
[ 0 -25777]
[ 0 -25777]
...
[-25777 -25777]
[-15837 -15837]
[ -8621 1]]
The output I want:
[[ 0 -1]
[ 2 -1]
[ 2 -3]
...
[-9 -5]
[-2 -2]
[-4 1]]
This was happening only with the decryption part, so I decided to convert the 2D array to a 2D list. After that it gives me the desired output, but it takes a lot of time to apply the keys to all the elements of the list (16 min; with the array it took 2 s). I don't understand why this is happening. Is there any other solution to this problem?
here is the encryption and decryption part of the program:
#encryption
for i in range(0, tup[0]): #tup[0] is the no of rows
    for j in range(0, tup[1]): #tup[1] is the no of cols
        x = data[i][j]
        x = ((pow(x,3)) % 25777) #applying the keys
        data[i][j] = x #storing back the updated value
#decryption
data = data.tolist() #2d array to list of lists
for i1 in range(len(data)):
    for j1 in range(len(data[i1])):
        x1 = data[i1][j1]
        x1 = (pow(x1, 16971)%25777) #applying the keys
        data[i1][j1] = x1
Looking forward to suggestions. Thank you.
The occurrence of something like pow(x1, 16971) should give you pause. For almost any integer x1 this yields a result that a 64-bit int cannot hold. That is why numpy gives the wrong result: numpy uses fixed-width 64-bit or 32-bit integers on the most common platforms. It is also why plain Python is slow: it can handle arbitrarily large integers, but doing so is costly.
A way around this is to apply the modulus in between multiplications, that way numbers remain small and can be readily handled by 64 bit arithmetic.
Here is a simple implementation:
def powmod(b, e, m):
    b2 = b
    res = 1
    while e:
        if e & 1:
            res = (res * b2) % m
        b2 = (b2*b2) % m
        e >>= 1
    return res
For example:
>>> powmod(2000, 16971, 25777)
10087
>>> (2000**16971)%25777
10087
>>> timeit(lambda: powmod(2000, 16971, 25777), number=100)
0.00031936285085976124
>>> timeit(lambda: (2000**16971)%25777, number=100)
0.255017823074013
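Python's built-in pow also accepts a modulus as a third argument, pow(base, exp, mod), which performs the same kind of modular exponentiation on single integers. For whole arrays, the square-and-multiply loop above carries over to NumPy almost unchanged. The sketch below makes two assumptions: the helper name powmod_array is made up, and the modulus is small enough that (modulus - 1)**2 fits in int64, which holds for 25777.

```python
import numpy as np

def powmod_array(base, exponent, modulus):
    """Element-wise (base ** exponent) % modulus without overflow
    (hypothetical helper; requires (modulus - 1)**2 to fit in int64)."""
    b = np.asarray(base, dtype=np.int64) % modulus  # widen and reduce first
    result = np.ones_like(b)
    e = int(exponent)
    while e:
        if e & 1:
            result = (result * b) % modulus
        b = (b * b) % modulus  # square for the next bit of the exponent
        e >>= 1
    return result

samples = np.array([[0, -1], [2, -3], [2000, 25776]], dtype=np.int16)
decrypted = powmod_array(samples, 16971, 25777)
print(decrypted)
```

Each pass through the loop works on the whole array at once, so the number of Python-level iterations depends only on the bit length of the exponent, not on the number of samples.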
For some reason I'm having a heck of a time figuring out how to do this in Python.
I am trying to represent a binary string in a string variable, and all I want it to have is
0010111010
However, no matter how I try to format it as a string, Python always chops off the leading zeroes, which is giving me a headache in trying to parse it out.
I'd hoped this question would have helped, but it doesn't really...
Is there a way to force Python to stop auto-converting my string to an integer?
I have tried the following:
val = ""
if (random.random() > 0.50):
    val = val + "1"
else:
    val = val + "0"
and
val = ""
if (random.random() > 0.50):
    val = val + "%d" % (1)
else:
    val = val + "%d" % (0)
I had stuck it into an array previously, but ran into issues inserting that array into another array, so I figured it would just be easier to parse it as a string.
Any thoughts on how to get my leading zeroes back? The string is supposed to be a fixed length of 10 bits if that helps.
Edit:
The code:
def create_string(x):
    for i in xrange(10): # 10 random populations
        for j in xrange(int(x)): # population size
            v = ''.join(choice(('0','1')) for _ in range(10))
            arr[i][j] = v
    return arr
a = create_string(5)
print a
Hopefully the output I'm seeing will show you why I'm having issues:
[[ 10000100 1100000001 101010110 111011 11010111]
[1001111000 1011011100 1110110111 111011001 10101000]
[ 110010001 1011010111 1100111000 1011100011 1000100001]
[ 10011010 1000011001 1111111010 11100110 110010101]
[1101010000 1010110101 110011000 1100001001 1010100011]
[ 10001010 1100000001 1110010000 10110000 11011010]
[ 111011 1000111010 1100101 1101110001 110110000]
[ 110100100 1100000000 1010101001 11010000 1000011011]
[1110101110 1100010101 1110001110 10011111 101101100]
[ 11100010 1111001010 100011101 1101010 1110001011]]
The issue here isn't only with printing, I also need to be able to manipulate them on a per-element basis. So if I go to play with the first element, then it returns a 1, not a 0 (on the first element).
If I understood you right, you could do it this way:
a = 0b0010111010
'{:010b}'.format(a)
#The output is: '0010111010'
This uses the string format method (Python 2.7).
This is the answer if you want to represent the binary string with leading zeros.
If you are just trying to generate a random binary string, you could do it this way:
from random import choice
''.join(choice(('0','1')) for _ in range(10))
Update
Answering your update.
I wrote code that produces different output from yours:
from random import choice
from pprint import pprint
arr = []
def create_string(x):
    for i in xrange(10): # 10 random populations
        arr.append([])
        for j in xrange(x): # population size
            v = ''.join(choice(('0','1')) for _ in range(10))
            arr[-1].append(v)
    return arr
a = create_string(5)
pprint(a)
The output is:
[['1011010000', '1001000010', '0110101100', '0101110111', '1101001001'],
['0010000011', '1010011101', '1000110001', '0111101011', '1100001111'],
['0011110011', '0010101101', '0000000100', '1000010010', '1101001000'],
['1110101111', '1011111001', '0101100110', '0100100111', '1010010011'],
['0100010100', '0001110110', '1110111110', '0111110000', '0000001010'],
['1011001011', '0011101111', '1100110011', '1100011001', '1010100011'],
['0110011011', '0001001001', '1111010101', '1110010010', '0100011000'],
['1010011000', '0010111110', '0011101100', '1111011010', '1011101110'],
['1110110011', '1110111100', '0011000101', '1100000000', '0100010001'],
['0100001110', '1011000111', '0101110100', '0011100111', '1110110010']]
Is this what you are looking for?
How about the following:
In [30]: ''.join('1' if random.random() > 0.50 else '0' for i in xrange(10))
Out[30]: '0000110111'
This gives a ten-character binary string; there's no chopping off of leading zeroes.
If you don't need to vary digit probability (the 0.50 above), a slightly more concise version is:
In [39]: ''.join(random.choice('01') for i in xrange(10))
Out[39]: '0001101001'
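The printed array in the question looks like a NumPy integer array, which would explain the missing zeros: a numeric array stores numbers, so the strings are converted on assignment. This is an assumption, since the creation of arr is not shown. If a NumPy array is still wanted, a fixed-width string dtype keeps the text as text. A minimal Python 3 sketch, with the hypothetical helper name create_population:

```python
import random
import numpy as np

def create_population(groups, size, bits=10):
    """Fill a (groups, size) array with random fixed-length bit strings."""
    # A 'U10' dtype stores text of up to 10 characters, so leading zeros survive
    arr = np.empty((groups, size), dtype='U%d' % bits)
    for i in range(groups):
        for j in range(size):
            arr[i, j] = ''.join(random.choice('01') for _ in range(bits))
    return arr

population = create_population(10, 5)
first = population[0, 0]
print(first, first[0])  # the whole string, then its first character
```

Each element can then be indexed character by character, which is what the per-element manipulation in the question needs.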