Suppose I have a huge array of data, of which this is a sample:
x= [ 511.31, 512.24, 571.77, 588.35, 657.08, 665.49, -1043.45, -1036.56,-969.39, -955.33]
I used the following code to generate all possible pairs:
Pairs=[(x[i],x[j]) for i in range(len(x)) for j in range(i+1, len(x))]
This gave me all possible pairs. Now I would like to group the pairs whose two values are within 25 of each other (a difference between -25 and +25) and label them accordingly.
Any ideas or advice on how to do this? Thanks in advance.
If I understood your problem correctly, the code below should do the trick. The idea is to build a dictionary whose keys are the mean value of each pair (truncated to an integer), and to keep appending pairs to the matching entry:
import numpy as np  # NumPy is used for the mean
# Your threshold
threshold = 25
# A dictionary will hold the relevant pairs
mylist = {}
for i in Pairs:
    # Check for the threshold and discard otherwise
    diff = abs(i[1] - i[0])
    if diff < threshold:
        # Name of the entry in the dictionary
        entry = str(int(np.mean(i)))
        # If the entry already exists, append. Otherwise, create a container list
        if entry in mylist:
            mylist[entry].append(i)
        else:
            mylist[entry] = [i]
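which results in the following output (the input used for this run also contained 511.1, 512.35 and 513.35, which are not in the sample above):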
{'-1040': [(-1043.45, -1036.56)],
'-962': [(-969.39, -955.33)],
'511': [(511.1, 511.31),
(511.1, 512.24),
(511.1, 512.35),
(511.31, 512.24),
(511.31, 512.35)],
'512': [(511.1, 513.35),
(511.31, 513.35),
(512.24, 512.35),
(512.24, 513.35),
(512.35, 513.35)],
'580': [(571.77, 588.35)],
'661': [(657.08, 665.49)]}
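As a side note, the append-or-create branch can be folded into collections.defaultdict, and itertools.combinations can produce the pairs directly. This is only a sketch of the same idea, not a different algorithm. The function name group_close_pairs is made up for illustration, and the key is the mean truncated with int(), as in the code above.

```python
from collections import defaultdict
from itertools import combinations

# Sample data from the question
x = [511.31, 512.24, 571.77, 588.35, 657.08,
     665.49, -1043.45, -1036.56, -969.39, -955.33]
threshold = 25


def group_close_pairs(values, threshold):
    """Group pairs closer than `threshold`, keyed by their truncated mean.

    Hypothetical helper, written only for this illustration.
    """
    groups = defaultdict(list)
    # combinations() yields every pair (a, b) with a before b in `values`
    for a, b in combinations(values, 2):
        if abs(b - a) < threshold:
            key = str(int((a + b) / 2))
            groups[key].append((a, b))
    return dict(groups)


groups = group_close_pairs(x, threshold)
for key, pairs in groups.items():
    print(key, pairs)
```

With the ten sample values this gives five groups of one pair each. Note that two pairs sharing a key are not guaranteed to be close to each other, only to have the same truncated mean.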
This should be a fast way to do that:
import numpy as np
from scipy.spatial.distance import pdist
# Input data
x = np.array([511.31, 512.24, 571.77, 588.35, 657.08,
              665.49, -1043.45, -1036.56, -969.39, -955.33])
thres = 25.0
# Compute pairwise distances
# The default distance metric is 'euclidean', which
# would be equivalent but more expensive to compute
d = pdist(x[:, np.newaxis], 'cityblock')
# Find distances within threshold
d_idx = np.where(d <= thres)[0]
# Convert "condensed" distance indices to pair of indices
r = np.arange(len(x))
c = np.zeros_like(r, dtype=np.int32)
np.cumsum(r[:0:-1], out=c[1:])
i = np.searchsorted(c[1:], d_idx, side='right')
j = d_idx - c[i] + r[i] + 1
# Get pairs of values
v_i = x[i]
v_j = x[j]
# Find means
m = np.round((v_i + v_j) / 2).astype(np.int32)
# Print result
for idx in range(len(m)):
    print(f'{m[idx]}: ({v_i[idx]}, {v_j[idx]})')
Output:
512: (511.31, 512.24)
580: (571.77, 588.35)
661: (657.08, 665.49)
-1040: (-1043.45, -1036.56)
-962: (-969.39, -955.33)
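If SciPy is not available, a similar result can be sketched with NumPy alone. This swaps pdist for np.triu_indices, which lists the index pairs (i, j) with i < j row by row, so no conversion from condensed indices is needed. The names close and result are only for illustration. Like the pdist version, it materialises all n*(n-1)/2 pairs, so memory may become a concern for a really huge array.

```python
import numpy as np

# Sample data from the question
x = np.array([511.31, 512.24, 571.77, 588.35, 657.08,
              665.49, -1043.45, -1036.56, -969.39, -955.33])
thres = 25.0

# Index pairs (i, j) with i < j, in row-major order
i, j = np.triu_indices(len(x), k=1)

# Keep only the pairs whose absolute difference is within the threshold
close = np.abs(x[i] - x[j]) <= thres
v_i = x[i[close]]
v_j = x[j[close]]

# Rounded mean of each remaining pair, used as its label
m = np.round((v_i + v_j) / 2).astype(np.int32)

# Collect plain Python values: (label, first value, second value)
result = [(int(mean), float(a), float(b)) for mean, a, b in zip(m, v_i, v_j)]
for mean, a, b in result:
    print(f'{mean}: ({a}, {b})')
```

For the sample data this should list the same five pairs as the output above.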
I created a list in Python with 18 other lists inside. See the example:
[[3.29588, 3.14241, 2.53874, 1.87257, 1.01365, 0.844504, 0.761601, 1.28007, 1.95795, 2.33491, 3.21032, 3.6976],
[3.74857, 3.4343, 2.97245, 1.7386, 0.931359, 0.82109, 0.840537, 1.46436, 1.75026, 2.467, 3.36575, 3.6428],
[3.2517, 3.37892, 2.84753, 1.7375, 1.11921, 0.761399, 0.780625, 1.40971, 1.80878, 2.49257, 3.0503, 3.22026],
[4.86471, 3.95591, 3.31745, 2.16819, 1.40167, 0.962902, 1.01542, 1.56245, 2.2488, 3.30197, 3.78625, 4.16218],
[4.37859, 3.58889, 2.18892, 1.85142, 1.36302, 1.04413, 1.14967, 1.63279, 2.06895, 3.36799, 3.64174, 4.00779],
[3.78213, 2.85967, 2.29597, 2.0755, 1.32856, 1.07074, 1.05019, 1.43226, 2.01495, 2.96983, 4.20358, 3.97129],
[4.11538, 2.98188, 2.51697, 1.81049, 1.23526, 0.982138, 1.09718, 1.55118, 2.42966, 3.4746, 3.70046, 4.6149],
[4.28626, 4.00553, 3.36899, 2.40897, 1.40696, 0.961761, 0.881263, 1.25325, 2.05434, 2.54193, 4.13187, 4.60115],
[4.15797, 3.16266, 3.31037, 2.16276, 1.42262, 0.924327, 1.11161, 1.57012, 2.21882, 2.94404, 4.18211, 4.19463],
[3.94132, 3.74934, 3.52944, 1.98444, 1.33248, 0.974261, 0.976807, 1.63763, 1.96279, 3.17012, 2.96314, 4.23448],
[4.21067, 4.1027, 3.48602, 2.26189, 1.36373, 1.06551, 1.06262, 1.24214, 2.11701, 3.19951, 3.83816, 4.18072],
[4.52377, 4.02346, 3.10936, 2.41148, 1.44596, 1.03784, 0.997611, 1.66809, 2.2909, 3.13247, 4.07816, 3.4008],
[2.40782, 3.18881, 2.95376, 1.84203, 1.28495, 0.957945, 1.03246, 1.80852, 2.15366, 2.74635, 4.26849, 4.12046],
[4.48346, 3.81883, 2.96019, 2.34712, 1.33384, 1.01678, 1.09052, 1.44302, 2.18529, 3.29472, 3.90009, 4.67098],
[4.34282, 4.45031, 3.55955, 2.35169, 1.44429, 1.02647, 1.24539, 1.73125, 2.3716, 3.3476, 4.21021, 4.11485],
[4.5259, 4.21495, 3.26138, 2.38399, 1.55304, 1.21289, 1.17101, 1.79027, 2.24747, 3.03854, 3.31494, 3.70687],
[4.47717, 4.6265, 3.10359, 2.15151, 1.26597, 0.886686, 1.18106, 1.67292, 2.45298, 3.21713, 4.20611, 4.35356],
[4.10159, 3.83354, 2.95835, 1.65168, 1.26774, 0.846464, 0.943836, 1.49787, 2.01609, 2.84914, 3.47291, 3.63075]]
How do I compute the mean for each element position of these lists? I need to take the first element of each list and calculate the mean, then take the second element of each list and calculate the mean, and so on for each of the twelve positions. In the end, I'll have just one list with 12 elements, each being the mean of the corresponding elements of the inner lists.
Thank you so much for the help!
Here is a solution (lst is your list of lists):
means = [sum(sublst[i] for sublst in lst) / len(lst) for i in range(len(lst[0]))]
Using map and zip functions would be appropriate here:
list(map(lambda x: sum(x)/len(x), zip(*lst)))
[4.049761666666666,
3.695478333333333,
3.015501666666667,
2.067323888888889,
1.3063504999999997,
0.9665465000000002,
1.0216338888888887,
1.5359944444444444,
2.130572222222222,
2.993912222222222,
3.7513661111111114,
4.029226111111111]
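To see why this works: zip(*lst) unpacks the outer list so that zip receives each inner list as a separate argument, and it then yields one tuple per position, i.e. the columns. A tiny sketch with made-up numbers (not the data from the question):

```python
# Small made-up list of lists, three rows of three values each
lst = [[1.0, 2.0, 3.0],
       [3.0, 4.0, 5.0],
       [5.0, 6.0, 7.0]]

# zip(*lst) is the same as zip(lst[0], lst[1], lst[2]):
# it groups the first elements together, then the second, and so on
columns = list(zip(*lst))
print(columns)

# The mean of each column
means = [sum(col) / len(col) for col in columns]
print(means)
```

Keep in mind that zip stops at the shortest inner list, so this assumes all inner lists have the same length.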
You could also use statistics.mean:
from statistics import mean
list(map(mean, zip(*lst)))
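If NumPy is an option (an assumption on my part, since the question only uses plain lists), averaging along axis 0 should give the same column-wise means. A short sketch with a small made-up list:

```python
import numpy as np

# Small made-up list of lists; every row must have the same length
lst = [[1.0, 2.0, 3.0],
       [3.0, 4.0, 5.0],
       [5.0, 6.0, 7.0]]

# axis=0 averages down the rows, giving one mean per column
means = np.mean(lst, axis=0).tolist()
print(means)
```

The .tolist() call is only there to get back a plain Python list, as asked for in the question.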
a = [[3.29588, 3.14241, 2.53874, 1.87257, 1.01365, 0.844504, 0.761601, 1.28007, 1.95795, 2.33491, 3.21032, 3.6976], [3.74857, 3.4343, 2.97245, 1.7386, 0.931359, 0.82109, 0.840537, 1.46436, 1.75026, 2.467, 3.36575, 3.6428], [3.2517, 3.37892, 2.84753, 1.7375, 1.11921, 0.761399, 0.780625, 1.40971, 1.80878, 2.49257, 3.0503, 3.22026], [4.86471, 3.95591, 3.31745, 2.16819, 1.40167, 0.962902, 1.01542, 1.56245, 2.2488, 3.30197, 3.78625, 4.16218], [4.37859, 3.58889, 2.18892, 1.85142, 1.36302, 1.04413, 1.14967, 1.63279, 2.06895, 3.36799, 3.64174, 4.00779], [3.78213, 2.85967, 2.29597, 2.0755, 1.32856, 1.07074, 1.05019, 1.43226, 2.01495, 2.96983, 4.20358, 3.97129], [4.11538, 2.98188, 2.51697, 1.81049, 1.23526, 0.982138, 1.09718, 1.55118, 2.42966, 3.4746, 3.70046, 4.6149], [4.28626, 4.00553, 3.36899, 2.40897, 1.40696, 0.961761, 0.881263, 1.25325, 2.05434, 2.54193, 4.13187, 4.60115], [4.15797, 3.16266, 3.31037, 2.16276, 1.42262, 0.924327, 1.11161, 1.57012, 2.21882, 2.94404, 4.18211, 4.19463], [3.94132, 3.74934, 3.52944, 1.98444, 1.33248, 0.974261, 0.976807, 1.63763, 1.96279, 3.17012, 2.96314, 4.23448], [4.21067, 4.1027, 3.48602, 2.26189, 1.36373, 1.06551, 1.06262, 1.24214, 2.11701, 3.19951, 3.83816, 4.18072], [4.52377, 4.02346, 3.10936, 2.41148, 1.44596, 1.03784, 0.997611, 1.66809, 2.2909, 3.13247, 4.07816, 3.4008], [2.40782, 3.18881, 2.95376, 1.84203, 1.28495, 0.957945, 1.03246, 1.80852, 2.15366, 2.74635, 4.26849, 4.12046], [4.48346, 3.81883, 2.96019, 2.34712, 1.33384, 1.01678, 1.09052, 1.44302, 2.18529, 3.29472, 3.90009, 4.67098], [4.34282, 4.45031, 3.55955, 2.35169, 1.44429, 1.02647, 1.24539, 1.73125, 2.3716, 3.3476, 4.21021, 4.11485], [4.5259, 4.21495, 3.26138, 2.38399, 1.55304, 1.21289, 1.17101, 1.79027, 2.24747, 3.03854, 3.31494, 3.70687], [4.47717, 4.6265, 3.10359, 2.15151, 1.26597, 0.886686, 1.18106, 1.67292, 2.45298, 3.21713, 4.20611, 4.35356], [4.10159, 3.83354, 2.95835, 1.65168, 1.26774, 0.846464, 0.943836, 1.49787, 2.01609, 2.84914, 3.47291, 3.63075]]
avgs = []
cnts = []
N = len(a)
M = len(a[0])
# initialize arrays
for i in range(0, M):
    avgs.append(0)
    cnts.append(0)
# update averages
for i in range(0, N):
    for j in range(0, len(a[i])):
        cnts[j] += 1
        avgs[j] += a[i][j]
# divide by count
for i in range(0, M):
    avgs[i] /= cnts[i]
# print averages
print(avgs)
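Keeping a separate count per column only matters when the rows can have different lengths, and the loop above would still fail if some row were longer than the first one. Assuming that ragged rows are a concern (the question's data is rectangular, so this is a guess), here is a sketch that grows the accumulators as needed. The function name column_means and the ragged sample are made up for illustration.

```python
def column_means(rows):
    """Mean per position, counting only the rows that have that position.

    Hypothetical helper, written only for this illustration.
    """
    sums = []
    counts = []
    for row in rows:
        for j, value in enumerate(row):
            # First time this position is seen: add a new accumulator
            if j == len(sums):
                sums.append(0.0)
                counts.append(0)
            sums[j] += value
            counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]


# Made-up ragged input: rows of length 3, 2 and 1
ragged = [[1.0, 2.0, 3.0], [3.0, 4.0], [5.0]]
result = column_means(ragged)
print(result)
```

For rectangular input it behaves like the loop above.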
I am new to programming and have a question. Suppose I have two NumPy arrays:
A = np.array([[1,0,3], [2,6,5], [3,4,1],[4,3,2],[5,7,9]], dtype=np.int64)
B = np.array([[3,4,5],[6,7,9],[1,0,3],[4,5,6]], dtype=np.int64)
I want to compare the last two columns of array A to the last two columns of array B, and then if they are equal, output the entire row to a new array. So, the output of these two arrays would be:
[[1, 0, 3],
 [1, 0, 3],
 [5, 7, 9],
 [6, 7, 9]]
Because even though the first element does not match for the last two rows, the last two elements do.
Here is my code so far, but it is not even close to working. Can anyone give me some tips?
column_two_A = A[:,1]
column_two_B = B[:,1]
column_three_A = A[:,2]
column_three_B = B[:,2]
column_four_A = A[:,3]
column_four_B = B[:,3]
times = A[:,0]
for elementA in column_three_A:
    for elementB in column_three_B:
        if elementA == elementB:
            continue
for elementC in column_two_A:
    for elementD in column_two_B:
        if elementC == elementD:
            continue
for elementE in column_four_A:
    for elementF in column_four_B:
        if elementE == elementF:
            continue
element.append(time)
print(element)
NumPy has many functions for this kind of task. Here is a solution that checks, for each row of A, whether its last two values match the last two values of any row of B. Add print() statements to see what chk, chk2 and x are.
import numpy as np
A = np.array([[1,0,3], [2,6,5], [3,4,1],[4,3,2],[5,7,9]], dtype=np.int64)
B = np.array([[3,4,5],[6,7,9],[1,0,3],[4,5,6]], dtype=np.int64)
c = []
for k in A:
    chk = np.equal(k[-2:], B[:, -2:])
    chk2 = np.all(chk, axis=1)
    x = (B[chk2, :])
    if x.size:
        c.append(x)
print(c)
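The loop above collects only the matching rows of B. If you want the rows from both arrays interleaved as in the question's expected output, one possible sketch uses broadcasting to compare every row of A with every row of B at once. The names match, ia, ib and pairs are made up for illustration.

```python
import numpy as np

A = np.array([[1, 0, 3], [2, 6, 5], [3, 4, 1], [4, 3, 2], [5, 7, 9]], dtype=np.int64)
B = np.array([[3, 4, 5], [6, 7, 9], [1, 0, 3], [4, 5, 6]], dtype=np.int64)

# match[i, j] is True when the last two columns of A[i] equal those of B[j]
match = (A[:, None, -2:] == B[None, :, -2:]).all(axis=2)

# Row indices into A and B for every matching combination
ia, ib = np.nonzero(match)

# Put each matching A row next to its B row, then flatten to a 2-D array
pairs = np.stack([A[ia], B[ib]], axis=1).reshape(-1, A.shape[1])
print(pairs)
```

The comparison builds an intermediate array of shape (len(A), len(B), 2), so for very large inputs the row-by-row loop may be the more memory-friendly choice.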
I think I figured it out by staying up all night... thank you!
for i in range(len(A)):
    for j in range(len(B)):
        if A[i][1] == B[j][1]:
            if A[i][2] == B[j][2]:
                print(B[j])
                print(A[i])