amplitude spectrum in Python

I have a given array with a length of over 1'000'000 and values between 0 and 255 (inclusive) as integers. Now I would like to plot the integers from 0 to 255 on the x-axis and, on the y-axis, the number of times each value occurs in the given array (called Arr in my current code).
I thought about this code:
list = []
for i in range(0, 256):
    icounter = 0
    for x in range(len(Arr)):
        if Arr[x] == i:
            icounter += 1
    list.append(icounter)
But is there any way I can do this a little bit faster (it takes me several minutes at the moment)? I thought about an import ..., but wasn't able to find a good package for this.

Use numpy.bincount for this task (see the numpy.bincount documentation for more details):
import numpy as np
list = np.bincount(Arr)
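One caveat: np.bincount only allocates bins up to the largest value actually present in Arr, so the result can have fewer than 256 entries if some high values never occur. Passing minlength guarantees a full 256-bin result (a small sketch, using counts instead of list to avoid shadowing the built-in):
import numpy as np

counts = np.bincount(Arr, minlength=256)  # counts[i] == number of occurrences of i in Arr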

While I completely agree with the previous answers that you should use a standard histogram routine, it's quite easy to greatly speed up your own implementation. Its problem is that it passes through the entire input once per bin, 256 times in total. It would be much faster to process the input a single time and increment only the relevant bin:
def hist(arr):
    nbins = 256
    result = [0] * nbins  # or np.zeros(nbins)
    for y in arr:
        if 0 <= y < nbins:
            result[y] += 1
    return result
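For reference, the "standard histogram" route mentioned above can also be expressed with np.histogram; a minimal sketch, with bin edges chosen so that each integer from 0 to 255 gets its own bin:
import numpy as np

counts, edges = np.histogram(Arr, bins=np.arange(257))  # 256 unit-width bins covering 0..255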

Related

Efficiently adding two different sized one dimensional arrays

I want to add two numpy arrays of different sizes starting at a specific index. As I need to do this a couple of thousand times with large arrays, it needs to be efficient, and I am not sure how to do this efficiently without iterating through each cell.
a = [5,10,15]
b = [0,0,10,10,10,0,0]
res = add_arrays(b,a,2)
print(res) => [0,0,15,20,25,0,0]
naive approach:
# b is the bigger array
def add_arrays(b, a, i):
    for j in range(len(a)):
        b[i+j] += a[j]  # += so the existing values in b are added to, matching the expected output
You could assign the smaller array into a zeros array of the same shape as the larger one, then add. I would do it the following way:
import numpy as np
a = np.array([5,10,15])
b = np.array([0,0,10,10,10,0,0])
z = np.zeros(b.shape,dtype=int)
z[2:2+len(a)] = a # 2 is offset
res = z+b
print(res)
Output:
[ 0  0 15 20 25  0  0]
Disclaimer: I assume that offset + len(a) is always less than or equal to len(b).
Nothing wrong with your approach. You cannot get better asymptotic time or space complexity. If you want to reduce the number of lines of code (which is not an end in itself), you could use slice assignment and a few other built-ins:
def add_arrays(b, a, i):
    b[i:i+len(a)] = map(sum, zip(b[i:i+len(a)], a))
But the functional overhead probably makes this less performant, if anything.
Some docs: map, sum, zip
It should be faster than Daweo's answer, about 1.5-5x (depending on the size ratio between a and b).
result = b.copy()
result[offset: offset+len(a)] += a
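For completeness, the same idea wrapped into a function (a sketch, assuming numpy arrays and that offset + len(a) <= len(b), as in the disclaimer above):
import numpy as np

def add_arrays(b, a, offset):
    # add the shorter array a onto a copy of b, starting at offset
    result = b.copy()
    result[offset:offset + len(a)] += a
    return result

print(add_arrays(np.array([0, 0, 10, 10, 10, 0, 0]), np.array([5, 10, 15]), 2))
# [ 0  0 15 20 25  0  0]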

`numpy.nanpercentile` is extremely slow

numpy.nanpercentile is extremely slow.
So I wanted to use cupy.nanpercentile, but cupy.nanpercentile is not implemented yet.
Does someone have a solution for it?
I also had a problem with np.nanpercentile being very slow for my datasets. I found a workaround that lets you use the standard np.percentile, and it can also be applied to many other libs.
This one should solve your problem, and it also works a lot faster than np.nanpercentile:
import numpy as np

arr = np.array([[np.nan, 2, 3, 1, 2, 3],
                [np.nan, np.nan, 1, 3, 2, 1],
                [4, 5, 6, 7, np.nan, 9]])
mask = (arr >= np.nanmin(arr)).astype(int)
count = mask.sum(axis=1)
groups = np.unique(count)
groups = groups[groups > 0]
p90 = np.zeros((arr.shape[0]))
for g in range(len(groups)):
    pos = np.where(count == groups[g])
    values = arr[pos]
    values = np.nan_to_num(values, nan=(np.nanmin(arr) - 1))
    values = np.sort(values, axis=1)
    values = values[:, -groups[g]:]
    p90[pos] = np.percentile(values, 90, axis=1)
So instead of taking the percentile with the NaNs, it sorts the rows by the number of valid values, takes the percentile of each group of rows separately, and then puts everything back together. This also works for 3D arrays; just add y_pos and x_pos instead of pos, and watch out for which axis you are calculating over.
import random
import numpy as np
import cupy as cp

def testset_gen(num):
    init = []
    for i in range(num):
        a = random.randint(65, 122)  # Dummy name
        b = random.randint(1, 100)   # Dummy value: 11~100 and 10% of nan
        if b < 11:
            b = np.nan               # 10% = nan
        init.append([a, b])
    return np.array(init)

np_testset = testset_gen(30000000)   # 468,751KB
cp_testset = cp.asarray(np_testset)  # cupy copy of the same test set

def f1_np(arr, num):
    return np.percentile(arr[:, 1], num)

# 55.0, 0.523902416229248 sec
print(f1_np(np_testset, 50))  # pass the full test set; f1_np selects column 1 itself

def cupy_nanpercentile(arr, num):
    return len(cp.where(arr > num)[0]) / (len(arr) - cp.sum(cp.isnan(arr))) * 100

# 55.548758317136446, 0.3640251159667969 sec
# 43% faster
# If you need the same result, use int(). But you lose the saved time.
print(cupy_nanpercentile(cp_testset[:, 1], 50))
I can't imagine how the test could take a few days; on my machine that would correspond to a trillion rows of data or more, so I can't reproduce the problem due to lack of resources.
Here's an implementation with numba. After it's been compiled it is more than 7x faster than the numpy version.
Right now it is set up to take the percentile along the first axis, however it could be changed easily.
import numba
import numpy as np

@numba.jit(nopython=True, cache=True)
def nan_percentile_axis0(arr, percentiles):
    """Faster implementation of np.nanpercentile
    This implementation always takes the percentile along axis 0.
    Uses numba to speed up the calculation by more than 7x.
    Function is equivalent to np.nanpercentile(arr, <percentiles>, axis=0)
    Params:
        arr (np.array): Array to calculate percentiles for
        percentiles (np.array): 1D array of percentiles to calculate
    Returns:
        (np.array) Array with first dimension corresponding to
            values as passed in percentiles
    """
    shape = arr.shape
    arr = arr.reshape((arr.shape[0], -1))
    out = np.empty((len(percentiles), arr.shape[1]))
    for i in range(arr.shape[1]):
        out[:, i] = np.nanpercentile(arr[:, i], percentiles)
    shape = (out.shape[0], *shape[1:])
    return out.reshape(shape)
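A minimal usage sketch (the test data here is illustrative, not from the original answer):
arr = np.random.rand(1000, 10, 10)
arr[arr < 0.1] = np.nan  # sprinkle in some NaNs
result = nan_percentile_axis0(arr, np.array([50.0, 90.0]))
# should agree with np.nanpercentile(arr, [50, 90], axis=0) up to floating-point noise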

Matrix of variable size [i x j] (Python, Numpy)

I am attempting to build a simple genetic algorithm that will optimize to an input string, but am having trouble building the [individual x genome] matrix (row n is individual n's genome.) I want to be able to change the population size, mutation rate, and other parameters to study how that affects convergence rate and program efficiency.
This is what I have so far:
import random
import itertools
import numpy as np

def evolve():
    goal = 'Hello, World!'  # string to optimize towards
    ideal = list(goal)
    # converting the string into a list of integers
    for i in range(0, len(ideal)):
        ideal[i] = ord(ideal[i])
    print(ideal)
    popSize = 10         # population size
    genome = len(ideal)  # determining the length of the genome to be the length of the target string
    mut = 0.03           # mutation rate
    S = 4                # tournament size
    best = float("inf")  # initial best is very large
    maxVal = max(ideal)
    minVal = min(ideal)
    print(maxVal)
    i = 0  # counting variables assigned to solve UnboundLocalError
    j = 0
    print(maxVal, minVal)
    # constructing initial population array (individual x genome)
    pop = np.empty([popSize, len(ideal)])
    for i, j in itertools.product(range(i), range(j)):
        pop[i, j] = [i, random.randint(minVal, maxVal)]
    print(pop)
This produces a matrix of the population size with the correct genome length, but the genomes are something like:
[ 6.91364167e-310 6.91364167e-310 1.80613009e-316 1.80613009e-316
5.07224590e-317 0.00000000e+000 6.04100487e+151 3.13149876e-120
1.11787892e+253 1.47872844e-028 7.34486815e+223 1.26594941e-118
7.63858409e+228]
I need them to be random integers corresponding to random ASCII characters.
What am I doing wrong with this method?
Is there a way to make this faster?
I found my current method here:
building an nxn matrix in python numpy, for any n
I found another method that I do not understand, but it seems faster and simpler; if I can use it here, I would like to.
Initialise numpy array of unknown length
Thank you for any assistance you can provide.
Your loop isn't executing because i and j are both 0, so range(i) and range(j) are empty. Also, you can't assign a list [i, random] to an array element (np.empty defaults to np.float64). I've simply changed it to store only the random number, but if you really want to store a list, you can change the creation of pop to pop = np.empty([popSize, len(ideal)], dtype=list).
Otherwise use this for the last lines:
for i, j in itertools.product(range(popSize), range(len(ideal))):
    pop[i, j] = random.randint(minVal, maxVal)
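As an aside (not part of the original answer), the entire population matrix can also be filled in a single vectorized call, which avoids the Python-level double loop entirely; a sketch:
pop = np.random.randint(minVal, maxVal + 1, size=(popSize, len(ideal)))  # maxVal + 1 because the upper bound is exclusive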

2D Bit matrix with every possible combination

I need to create a Python generator which yields every possible combination of a 2D bit matrix.
The length of each dimension is variable.
So for a 2x2 matrix:
1.
00
00
2.
10
00
3.
11
00
....
x.
00
01
Higher lengths of the dimensions (up to 200*1000) need to work too.
In the end, I will not need all of the combinations. I only need the ones where the sum of each row is 1, but I need all combinations where this is the case. I would filter them before yielding. Printing is not required.
I want to use this as filter masks to test all possible variations of a data set.
Generating variations like this must be a common problem. Maybe there is even a good library for Python?
Going through all possible values of a bit vector of a given size is exactly what a counter does. It's not evident from your question what order you want, but it looks much like a Gray counter. Example:
from sys import stdout

w, h = 2, 2
for val in range(2**(w*h)):  # one value per possible w*h-bit pattern
    gray = val ^ (val >> 1)
    for y in range(h):
        for x in range(w):
            stdout.write('1' if gray & (1 << (w*y + x)) else '0')
        stdout.write('\n')
    stdout.write('\n')
Note that the dimensions of the vector don't matter to the counter, only the size. Also, while this gives every static pattern, it does not cover all possible transitions.
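Since the question ultimately only needs the patterns where each row sums to 1, those can also be generated directly instead of filtering all 2**(w*h) patterns; a hedged sketch (one_hot_rows is just an illustrative name), yielding each matrix as a tuple of row strings:
from itertools import product

def one_hot_rows(w, h):
    # choosing one column index per row gives exactly the matrices whose rows each sum to 1
    for cols in product(range(w), repeat=h):
        yield tuple(''.join('1' if x == c else '0' for x in range(w)) for c in cols)

for m in one_hot_rows(2, 2):
    print('\n'.join(m), '\n')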
This can be done using permutations from itertools in the following way.
import itertools

dim = 2
dimension = dim * dim
data = [0 for i in range(0, dimension)] + [1 for i in range(0, dimension)]
count = 1
for matrix in set(itertools.permutations(data, dimension)):
    print('\n', count, '.')
    for i in range(0, dimension, dim):
        print(' '.join(map(str, matrix[i:i+dim])))
    count += 1
P.S.: This is fine for a 2x2 matrix, but it gets quite time- and memory-consuming for higher orders. I would be glad if someone provided a less expensive algorithm for this.
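One less expensive route (a sketch, not from the answer above) is itertools.product, which lazily yields every length-n bit tuple without materializing and deduplicating permutations of a multiset:
import itertools

dim = 2
for count, bits in enumerate(itertools.product((0, 1), repeat=dim * dim), start=1):
    print('\n', count, '.')
    for i in range(0, dim * dim, dim):
        print(' '.join(map(str, bits[i:i+dim])))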
You can generate every possible row of length 2 by using every number from 0 to 3 (2 to the power of 2 values in total).
0 -> 00
1 -> 01
2 -> 10
3 -> 11
For displaying a number as binary, the bin function can be used.
Since you have a 2x2 matrix, you need two numbers (i and j), one for each row. Then you can just convert these numbers to binary and print them.
for i in range(4):
    for j in range(4):
        row1 = bin(i)[2:].zfill(2)
        row2 = bin(j)[2:].zfill(2)
        print(row1, "\n", row2, "\n")
EDIT:
I have found the zfill function, which pads a string with zeros to make it a fixed length.
>>> '1'.zfill(5)
'00001'
Another generic solution might be:
import re

dim1 = 2
dim2 = 2
n = dim1 * dim2
i = 0
limit = 2**n
while i < limit:
    print('\n'.join(re.findall('.' * dim2, bin(i)[2:].zfill(n))), '\n')
    i += 1
You could do something like this for a 3x3 binary matrix:
for i in range(pow(2, 9)):
    p = '{0:09b}'.format(i)
    print(p)
    x = []
    x.append([p[0], p[1], p[2]])
    x.append([p[3], p[4], p[5]])
    x.append([p[6], p[7], p[8]])
    for j in range(3):
        x[j] = list(map(int, x[j]))

Compress an array in python?

Is there a way to "compress" an array in python so as to keep the same range but simply decrease the number of elements to a given value?
For example I have an array with 1000 elements and I want to modify it to have 100. Specifically I have a numpy array that is
x = linspace(-1,1,1000)
But because of the way in which I am using it in my project, I can't simply recreate it using linspace as it will not always be in the domain of -1 to 1 and have 1000 elements. These parameters change and I don't have access to them in the function I am defining. So I need a way to compress the array while keeping the -1 to 1 mapping. Think of it as decreasing the "resolution" of the array. Is this possible with any built in functions or different libraries?
A simple way to "resample" your array is to group it into chunks, then average each chunk:
(The chunking function is from another Stack Overflow answer.)
# Chunking function
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]

# Resampling function
def resample(arr, newLength):
    chunkSize = len(arr) // newLength
    return [np.mean(chunk) for chunk in chunks(arr, chunkSize)]

# Example:
import numpy as np

x = np.linspace(-1, 1, 15)
y = resample(x, 5)
print(y)
# Result:
# [-0.85714285714285721, -0.4285714285714286, -3.7007434154171883e-17, 0.42857142857142844, 0.8571428571428571]
As you can see, the range of the resampled array does drift inward, but this effect would be much smaller for larger arrays.
It's not clear to me whether the arrays will always be generated by numpy.linspace or not. If so, there are simpler ways of doing this, like simply picking every nth member of the original array, where n is determined by the "compression" ratio:
def linearResample(arr, newLength):
    spacing = len(arr) // newLength
    return arr[::spacing]
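If keeping the exact endpoints matters (so -1 and 1 stay in the result), another option is to interpolate onto a coarser grid with np.interp; a sketch (interp_resample is just an illustrative name, not part of the answer above):
import numpy as np

def interp_resample(arr, newLength):
    # sample newLength points evenly across the old index range, endpoints included
    old_idx = np.arange(len(arr))
    new_idx = np.linspace(0, len(arr) - 1, newLength)
    return np.interp(new_idx, old_idx, arr)

y = interp_resample(np.linspace(-1, 1, 1000), 100)  # y[0] == -1.0, y[-1] == 1.0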
You could pick items at random to reduce any bias in the reduction. If the original sample is unordered, it would just be:
import random

sample = list(range(1000))

def reduce(sample, count):
    work = sample[:]
    random.shuffle(work)
    return work[:count]
If order matters, then use enumerate to track each position and reassemble:
def reduce(sample, count):
    indexed = [item for item in enumerate(sample)]
    random.shuffle(indexed)
    trimmed = indexed[:count]
    trimmed.sort()
    return [item for index, item in trimmed]
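A slightly shorter order-preserving variant (my suggestion, not from the answer above) samples the indices instead of shuffling everything:
import random

def reduce_ordered(sample, count):
    # pick `count` distinct positions, then keep them in their original order
    idx = sorted(random.sample(range(len(sample)), count))
    return [sample[i] for i in idx]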
