I have two arrays of the same length as shown below.
import numpy as np
y1 = [12.1, 6.2, 1.4, 0.8, 5.6, 6.8, 8.5]
y2 = [8.2, 5.6, 2.8, 1.4, 2.5, 4.2, 6.4]
y1_a = np.array(y1)
y2_a = np.array(y2)
print(y1_a)
print(y2_a)
for i in range(len(y2_a)):
    y3_a[i] = abs(y2_a[i] - y2_a[i])
I am computing the absolute difference between the two arrays at each index/location. Whenever the absolute difference exceeds 2.0 at a given index, I have to replace the value from 'y1_a' with the value from 'y2_a' and write the result to a new array variable 'y3_a'. The starter code is above.
First of all, let numpy do the lifting for you. You can calculate your absolute differences without a manual for loop:
abs_diff = np.abs(y2_a - y1_a) # I assume your original code has a typo
Now you can get all the values where the absolute difference is more than 2.0:
y3_a = y1_a.copy()  # copy first so the original y1_a is left untouched
y3_a[abs_diff > 2.0] = y2_a[abs_diff > 2.0]
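If you prefer a single expression, np.where does the same selection; a minimal sketch of the same idea, reusing abs_diff from above:
# take y2_a where the absolute difference exceeds 2.0, keep y1_a elsewhere
y3_a = np.where(abs_diff > 2.0, y2_a, y1_a)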
So I have this function to find the lowest value in a matrix and return its position in the matrix, i.e. its indices:
import numpy as np

final_matrix = np.array([[3.57, 2.71, 9.2, 5.63],
                         [4.42, 1.4, 3.53, 8.97],
                         [1.2, 0.33, 6.26, 7.77],
                         [6.36, 3.6, 8.91, 7.42],
                         [1.59, 0.9, 2.4, 4.24]])  # this changes in my code, I'm just giving a very simple version of it here
def lowest_values(final_matrix):
    best_value = 10000  # or any arbitrarily high number
    for i in range(0, len(final_matrix[:, 0])):
        for j in range(0, len(final_matrix[0, :])):
            if final_matrix[i, j] < best_value:
                best_value = final_matrix[i, j]
                lowest_val_i = i
                lowest_val_j = j
    return (lowest_val_i, lowest_val_j)
This returns me (1,2), which just by visual analysis is correct. I now want to find the lowest 3 values, hopefully by building on this loop, but I really cannot think how, or at least I don't know how to implement it. I was thinking of some if-else logic: once the lowest value is found, 'void' it and find the 2nd lowest, then do the same again to find the third. But I'm not sure.
Please don't be too quick to shut this question down. I'm very new to programming, and very stuck!
The human approach
I think my approach to this is different enough from the other answers to share.
I am only doing 3 comparisons for every list element, so it should be O(n). I'm also not creating an entirely new list of (value, indices) tuples for all the elements.
matrix=[[3.57, 2.71, 9.2, 5.63],
[4.42, 1.4, 3.53, 8.97],
[1.2, 0.33, 6.26, 7.77],
[6.36, 3.6, 8.91, 7.42],
[1.59, 0.9, 2.4, 4.24]]
def compare_least_values(value, i, j):
    global least
    if value < least[2][0]:
        if value < least[1][0]:
            if value < least[0][0]:
                least.insert(0, (value, (i, j)))
            else:
                least.insert(1, (value, (i, j)))
        else:
            least.insert(2, (value, (i, j)))

def lowest_three_values(matrix):
    global least
    least = [(10000, (None, None)), (10000, (None, None)), (10000, (None, None))]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            compare_least_values(value, i, j)
    return least[:3]

print(lowest_three_values(matrix))
Output:
[(0.33, (2, 1)), (0.9, (4, 1)), (1.2, (2, 0))]
The practical approach (Numpy)
If you're familiar with numpy then this is the way to go. Even if you're not, it can be used as a copy-paste snippet.
import numpy as np
matrix=[[3.57, 2.71, 9.2, 5.63],
[4.42, 1.4, 3.53, 8.97],
[1.2, 0.33, 6.26, 7.77],
[6.36, 3.6, 8.91, 7.42],
[1.59, 0.9, 2.4, 4.24]]
matrix = np.array(matrix)
indices_1d = np.argpartition(matrix, 3, axis=None)[:3]
indices_2d = np.unravel_index(indices_1d, matrix.shape)
least_three = matrix[indices_2d]
print('least three values : ', least_three)
print('indices : ', *zip(*indices_2d) )
Output:
least three values : [0.33 0.9 1.2 ]
indices : (2, 1) (4, 1) (2, 0)
See this Stack Overflow question for a detailed answer on this.
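One caveat worth noting: np.argpartition only guarantees that the first 3 entries are the 3 smallest, not that they come back in ascending order (here they happen to). A minimal sketch to sort those three afterwards, if you need them ordered:
order = np.argsort(least_three)                 # order just the three candidates
print('sorted least three : ', least_three[order])
print('sorted indices     : ', *[tuple(p) for p in np.array(indices_2d).T[order]])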
I didn't understand that (1,2) return value. The lowest value of the matrix is 0.33, and its position is (2,1).
So here is my solution for your code (treating final_matrix as a plain list of lists):
all_items = []

# append all of the matrix's items to one list
for row in final_matrix:
    for i in row:
        all_items.append(i)

# sort that list from min to max
all_items.sort()

# then take the first 3 values
lowest_3 = all_items[0:3]

positions = []
# and append their positions in the matrix to positions
for i in lowest_3:
    for row in range(len(final_matrix)):
        if i in final_matrix[row]:
            positions.append([row, final_matrix[row].index(i)])
            break

# lowest_3 = [0.33, 0.9, 1.2]
# positions = [[2, 1], [4, 1], [2, 0]]
This is my solution:
final_matrix=[[3.57, 2.71, 9.2, 5.63],
[4.42, 1.4, 3.53, 8.97],
[1.2, 0.33, 6.26, 7.77],
[6.36, 3.6, 8.91, 7.42],
[1.59, 0.9, 2.4, 4.24]]
min_values = []

for i in range(3):
    mini = final_matrix[0][0]
    for row in final_matrix:
        for n in row:
            if n < mini:
                mini = n
                n_index = row.index(n)
                row_index = final_matrix.index(row)
    min_values.append(mini)
    del final_matrix[row_index][n_index]

print("Finals {}".format(min_values))
Let me explain:
The first loop controls how many minimum values you want (change it and you will see what I mean).
The second and third loops go through the matrix to find the minimum value.
The line del final_matrix[row_index][n_index] removes that minimum number from the original matrix.
So if you want to keep the original matrix, you have to create a new one and copy the original into it; use deepcopy() from the copy module (sketched below).
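For example, a minimal sketch of that copy step:
import copy

work_matrix = copy.deepcopy(final_matrix)   # deep copy, so del only modifies the copy
# run the loop above on work_matrix instead of final_matrix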
You are looking for the N smallest items from a list of items (or a "matrix" in this case). When N is small you can do better than sorting the whole list: create a heap queue, which takes linear time, and then pop the N smallest elements, where each pop is an O(log N) operation. The heap queue is an important data structure, which you should study.
import heapq
final_matrix=[[3.57, 2.71, 9.2, 5.63],
[4.42, 1.4, 3.53, 8.97],
[1.2, 0.33, 6.26, 7.77],
[6.36, 3.6, 8.91, 7.42],
[1.59, 0.9, 2.4, 4.24]]
def lowest_values(final_matrix):
    # create a flat list, l, from the matrix
    # each element is a tuple: (value, x-coordinate, y-coordinate)
    l = [(final_matrix[x][y], x, y)
         for x in range(len(final_matrix))
         for y in range(len(final_matrix[0]))
         ]
    heapq.heapify(l)  # O(N) time
    for _ in range(3):
        # pop next smallest tuple:
        value, x, y = heapq.heappop(l)  # O(log N) time
        print(f'value={value}, x={x}, y={y}')

lowest_values(final_matrix)
Prints:
value=0.33, x=2, y=1
value=0.9, x=4, y=1
value=1.2, x=2, y=0
Note
The above code could have been simplified to the following, which is probably even slightly more efficient if all you want are the 3 smallest items and you have no further need for the heap queue structure. But I wanted to show the two basic operations: creating a heap queue from a list and then successively producing the smallest items from it:
import heapq
final_matrix=[[3.57, 2.71, 9.2, 5.63],
[4.42, 1.4, 3.53, 8.97],
[1.2, 0.33, 6.26, 7.77],
[6.36, 3.6, 8.91, 7.42],
[1.59, 0.9, 2.4, 4.24]]
def lowest_values(final_matrix):
    # create a flat list, l, from the matrix
    # each element is a tuple: (value, x-coordinate, y-coordinate)
    l = [(final_matrix[x][y], x, y)
         for x in range(len(final_matrix))
         for y in range(len(final_matrix[0]))
         ]
    for value, x, y in heapq.nsmallest(3, l):
        print(f'value={value}, x={x}, y={y}')

lowest_values(final_matrix)
I was computing Spearman correlations for a matrix. I found that the matrix input and the two-array input gave different results when using scipy.stats.spearmanr. The results are also different from pandas.DataFrame.corr.
from scipy.stats import spearmanr # scipy 1.0.1
import pandas as pd # 0.22.0
import numpy as np
# Data
X = pd.DataFrame({"A": [-0.4, 1, 12, 78, 84, 26, 0, 0],
                  "B": [-0.4, 3.3, 54, 87, 25, np.nan, 0, 1.2],
                  "C": [np.nan, 56, 78, 0, np.nan, 143, 11, np.nan],
                  "D": [0, -9.3, 23, 72, np.nan, -2, -0.3, -0.4],
                  "E": [78, np.nan, np.nan, 0, -1, -11, 1, 323]})
matrix_rho_scipy = spearmanr(X,nan_policy='omit',axis=0)[0]
matrix_rho_pandas = X.corr('spearman')
print(matrix_rho_scipy == matrix_rho_pandas.values) # All False except diagonal
print(spearmanr(X['A'],X['B'],nan_policy='omit',axis=0)[0]) # 0.8839285714285714 from scipy 1.0.1
print(spearmanr(X['A'],X['B'],nan_policy='omit',axis=0)[0]) # 0.8829187134416477 from scipy 1.1.0
print(matrix_rho_scipy[0,1]) # 0.8263621207201486
print(matrix_rho_pandas.values[0,1]) # 0.8829187134416477
Later I found Pandas's rho is the same as R's rho.
X = data.frame(A=c(-0.4,1,12,78,84,26,0,0),
B=c(-0.4,3.3,54,87,25,NaN,0,1.2), C=c(NaN,56,78,0,NaN, 143,11,NaN),
D=c(0,-9.3,23,72,NaN,-2,-0.3,-0.4), E=c(78,NaN,NaN,0,-1,-11,1,323))
cor.test(X$A,X$B,method='spearman', exact = FALSE, na.action="na.omit") # 0.8829187
However, Pandas's corr doesn't work with large tables (e.g., here and my case is 16,000).
Thanks to Warren Weckesser's testing, I found the two-array results from Scipy 1.1.0 (but not 1.0.1) are the same results as Pandas and R.
Please let me know if you have any suggestions or comments. Thank you.
I use Python: 3.6.2 (Anaconda); Mac OS: 10.10.5.
It appears that scipy.stats.spearmanr doesn't handle nan values as expected when the input is an array and an axis is given. Here's a script that compares a few methods of computing pairwise Spearman rank-order correlations:
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
x = np.array([[np.nan, 3.0, 4.0, 5.0, 5.1, 6.0, 9.2],
[5.0, np.nan, 4.1, 4.8, 4.9, 5.0, 4.1],
[0.5, 4.0, 7.1, 3.8, 8.0, 5.1, 7.6]])
r = spearmanr(x, nan_policy='omit', axis=1)[0]
print("spearmanr, array: %11.7f %11.7f %11.7f" % (r[0, 1], r[0, 2], r[1, 2]))
r01 = spearmanr(x[0], x[1], nan_policy='omit')[0]
r02 = spearmanr(x[0], x[2], nan_policy='omit')[0]
r12 = spearmanr(x[1], x[2], nan_policy='omit')[0]
print("spearmanr, individual: %11.7f %11.7f %11.7f" % (r01, r02, r12))
df = pd.DataFrame(x.T)
c = df.corr('spearman')
print("Pandas df.corr('spearman'): %11.7f %11.7f %11.7f" % (c[0][1], c[0][2], c[1][2]))
print("R cor.test: 0.2051957 0.4857143 -0.4707919")
print(' (method="spearman", continuity=FALSE)')
"""
# R code:
> x0 = c(NA, 3, 4, 5, 5.1, 6.0, 9.2)
> x1 = c(5.0, NA, 4.1, 4.8, 4.9, 5.0, 4.1)
> x2 = c(0.5, 4.0, 7.1, 3.8, 8.0, 5.1, 7.6)
> cor.test(x0, x1, method="spearman", continuity=FALSE)
> cor.test(x0, x2, method="spearman", continuity=FALSE)
> cor.test(x1, x2, method="spearman", continuity=FALSE)
"""
Output:
spearmanr, array: -0.0727393 -0.0714286 -0.4728054
spearmanr, individual: 0.2051957 0.4857143 -0.4707919
Pandas df.corr('spearman'): 0.2051957 0.4857143 -0.4707919
R cor.test: 0.2051957 0.4857143 -0.4707919
(method="spearman", continuity=FALSE)
My suggestion is to not use scipy.stats.spearmanr in the form spearmanr(x, nan_policy='omit', axis=<whatever>). Use the corr() method of the Pandas DataFrame, or use a loop to compute the values pairwise using spearmanr(x0, x1, nan_policy='omit').
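For instance, a minimal sketch of that pairwise loop, reusing the df from the script above:
n = df.shape[1]
rho = np.full((n, n), 1.0)              # diagonal of a correlation matrix is 1
for i in range(n):
    for j in range(i + 1, n):
        # each pair gets its own NaN handling, matching the individual results above
        rho[i, j] = rho[j, i] = spearmanr(df[i], df[j], nan_policy='omit')[0]
print(rho)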
(Quick note! While I know there are plenty of options for sorting in Python, this code is more of a generalized proof-of-concept and will later be ported to another language, so I won't be able to use any specific Python libraries or functions.
In addition, the solution you provide doesn't necessarily have to follow my approach below.)
Background
I have a quicksort algorithm and am trying to implement a method to allow later 'unsorting' of the new location of a sorted element. That is, if element A is at index x and is sorted to index y, then the 'pointer' (or, depending on your terminology, reference or mapping) array changes its value at index x from x to y.
In more detail:
You begin the program with an array, arr, with some given set of numbers. This array is later run through a quick sort algorithm, as sorting the array is important for future processing on it.
The ordering of this array is important. As such, you have another array, ref, which contains the indices of the original array such that when you map the reference array to the array, the original ordering of the array is reproduced.
Before the array is sorted, the array and mapping looks like this:
arr = [1.2, 1.5, 1.5, 1.0, 1.1, 1.8]
ref = [0, 1, 2, 3, 4, 5]
--------
map(arr,ref) -> [1.2, 1.5, 1.5, 1.0, 1.1, 1.8]
You can see that index 0 of ref points to index 0 of arr, giving you 1.2. Index 1 of ref points to index 1 of arr, giving you 1.5, and so on.
When the algorithm is sorted, ref should be rearranged such that when you map it according to the above procedure, it generates the pre-sorted arr:
arr = [1.0, 1.1, 1.2, 1.5, 1.5, 1.8]
ref = [2, 3, 4, 0, 1, 5]
--------
map(arr,ref) -> [1.2, 1.5, 1.5, 1.0, 1.1, 1.8]
Again, index 0 of ref is 2, so the first element of the mapped array is arr[2]=1.2. Index 1 of ref is 3, so the second element of the mapped array is arr[3]=1.5, and so on.
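(To make the notation concrete, map(arr, ref) here is just an index lookup; a minimal sketch with 0-based indices, where map_arr is only an illustrative name:)
def map_arr(arr, ref):
    # element i of the result is the arr element that ref[i] points to
    return [arr[ref[i]] for i in range(len(ref))]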
The Issue
The current implementation of my code works great for sorting, but horribly for the remapping of ref.
Given the same array arr, the output of my program looks like this:
arr = [1.0, 1.1, 1.2, 1.5, 1.5, 1.8]
ref = [3, 4, 0, 1, 2, 5]
--------
map(arr,ref) -> [1.5, 1.5, 1.0, 1.1, 1.2, 1.8]
This is a problem because this mapping is definitely not equal to the original:
[1.5, 1.5, 1.0, 1.1, 1.2, 1.8] != [1.2, 1.5, 1.5, 1.0, 1.1, 1.8]
My approach has been this:
When elements a and b, at indices x and y in arr are switched,
Then set ref[x] = y and ref[y] = x.
This is not working and I can't think of another solution that doesn't need O(n^2) time.
Thank you!
Minimal Reproducible Example
testing = [1.5, 1.2, 1.0, 1.0, 1.2, 1.2, 1.5, 1.3, 2.0, 0.7, 0.2, 1.4, 1.2, 1.8, 2.0, 2.1]

# This is the 'map(arr,ref) ->' function
def print_links(a, b):
    tt = [a[b[i]-1] for i in range(0, len(a))]
    print("map(arr,ref) -> {}".format(tt))

    # This tests the re-mapping against an original copy of the array
    f = 0
    for i in range(0, len(testing)):
        if testing[i] == tt[i]:
            f += 1
    print("{}/{}".format(f, len(a)))
def quick_sort(arr, ref, first=None, last=None):
    if first == None:
        first = 0
    if last == None:
        last = len(arr)-1
    if first < last:
        split = partition(arr, ref, first, last)
        quick_sort(arr, ref, first, split-1)
        quick_sort(arr, ref, split+1, last)

def partition(arr, ref, first, last):
    pivot = arr[first]
    left = first+1
    right = last
    done = False
    while not done:
        while left <= right and arr[left] <= pivot:
            left += 1
        while arr[right] >= pivot and right >= left:
            right -= 1
        if right < left:
            done = True
        else:
            temp = arr[left]
            arr[left] = arr[right]
            arr[right] = temp

            # This is my attempt at preserving indices part 1
            temp = ref[left]
            ref[left] = ref[right]
            ref[right] = temp

    temp = arr[first]
    arr[first] = arr[right]
    arr[right] = temp

    # This is my attempt at preserving indices part 2
    temp = ref[first]
    ref[first] = ref[right]
    ref[right] = temp

    return right
# Main body of code
a = [1.5, 1.2, 1.0, 1.0, 1.2, 1.2, 1.5, 1.3, 2.0, 0.7, 0.2, 1.4, 1.2, 1.8, 2.0, 2.1]
b = list(range(1, len(a)+1))  # list() so the ref array is mutable in Python 3

print("The following should match:")
print("a = {}".format(a))
a0 = a[:]
print("ref = {}".format(b))
print("----")
print_links(a, b)

print("\nQuicksort:")
quick_sort(a, b)
print(a)

print("\nThe following should match:")
print("arr = {}".format(a0))
print("ref = {}".format(b))
print("----")
print_links(a, b)
You can do what you ask, but when we have to do something like this in real life, we usually mess with the sort's comparison function instead of the swap function. Sorting routines provided with common languages usually have that capability built in so you don't have to write your own sort.
In this procedure, you sort the ref array (called order below) by the value of the arr element it points to. This generates the same ref array you already have, but without modifying arr.
Mapping with this ordering sorts the original array. You expected it to unsort the sorted array, which is why your code isn't working.
You can invert this ordering to get the ref array you were originally looking for, or you can just leave arr unsorted and map it through order when you need it ordered.
arr = [1.5, 1.2, 1.0, 1.0, 1.2, 1.2, 1.5, 1.3, 2.0, 0.7, 0.2, 1.4, 1.2, 1.8, 2.0, 2.1]

order = list(range(len(arr)))   # list() so it can be sorted in place in Python 3
order.sort(key=lambda i: arr[i])
new_arr = [arr[order[i]] for i in range(len(arr))]

print("original array = {}".format(arr))
print("sorted ordering = {}".format(order))
print("sorted array = {}".format(new_arr))

ref = [0]*len(order)
for i in range(len(order)):
    ref[order[i]] = i

unsorted = [new_arr[ref[i]] for i in range(len(ref))]
print("unsorted after sorting = {}".format(unsorted))
Output:
original array = [1.5, 1.2, 1.0, 1.0, 1.2, 1.2, 1.5, 1.3, 2.0, 0.7, 0.2, 1.4, 1.2, 1.8, 2.0, 2.1]
sorted ordering = [10, 9, 2, 3, 1, 4, 5, 12, 7, 11, 0, 6, 13, 8, 14, 15]
sorted array = [0.2, 0.7, 1.0, 1.0, 1.2, 1.2, 1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.8, 2.0, 2.0, 2.1]
unsorted after sorting = [1.5, 1.2, 1.0, 1.0, 1.2, 1.2, 1.5, 1.3, 2.0, 0.7, 0.2, 1.4, 1.2, 1.8, 2.0, 2.1]
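To see that order and ref really are inverse permutations of each other, composing them in either direction gives back the identity; a small sanity-check sketch:
print(all(ref[order[i]] == i for i in range(len(order))))   # True
print(all(order[ref[i]] == i for i in range(len(ref))))     # True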
You don't need to maintain the map of indices and elements; just sort the indices as you sort your array. For example:
unsortedArray = [1.5, 1.2, 2.1]
unsortedIndexes = [0, 1, 2]
sortedArray = [1.2, 1.5, 2.1]
Then you just swap 0 and 1 as you sort unsortedArray and get sortedIndexes [1, 0, 2], so you can get the original array back as sortedArray[1], sortedArray[0], sortedArray[2].
def inplace_quick_sort(s, indexes, start, end):
    if start >= end:
        return
    pivot = getPivot(s, start, end)  # it should be a func
    left = start
    right = end - 1
    while left <= right:
        while left <= right and customCmp(pivot, s[left]):
            # s[left] < pivot:
            left += 1
        while left <= right and customCmp(s[right], pivot):
            # pivot < s[right]:
            right -= 1
        if left <= right:
            s[left], s[right] = s[right], s[left]
            indexes[left], indexes[right] = indexes[right], indexes[left]
            left, right = left + 1, right - 1
    s[left], s[end] = s[end], s[left]
    indexes[left], indexes[end] = indexes[end], indexes[left]
    inplace_quick_sort(s, indexes, start, left-1)
    inplace_quick_sort(s, indexes, left+1, end)

def customCmp(a, b):
    return a > b

def getPivot(s, start, end):
    return s[end]
if __name__ == '__main__':
    arr = [1.5, 1.2, 1.0, 1.0, 1.2, 1.2, 1.5, 1.3, 2.0, 0.7, 0.2, 1.4, 1.2, 1.8, 2.0, 2.1]
    indexes = [i for i in range(len(arr))]
    inplace_quick_sort(arr, indexes, 0, len(arr)-1)
    print("sorted = {}".format(arr))
    ref = [0]*len(indexes)
    for i in range(len(indexes)):
        # the core point of Matt Timmermans' answer about how to construct the ref:
        # the value of indexes[i] is the index in the original array
        # and i is the index in the sorted array,
        # so we get the map by ref[indexes[i]] = i
        ref[indexes[i]] = i
    unsorted = [arr[ref[i]] for i in range(len(ref))]
    print("unsorted after sorting = {}".format(unsorted))
It's not that horrible: you've merely reversed your reference usage. Your indices, ref, tell you how to build the sorted list from the original. However, you've used it in the opposite direction: you've applied it to the sorted list, trying to reconstruct the original. You need the inverse mapping.
Is that enough to get you to solve your problem?
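A minimal sketch of that inverse mapping, using 0-based indices and a hypothetical name order for the forward ordering:
inverse = [0] * len(order)
for new_pos, old_pos in enumerate(order):
    # the element that came from old_pos now sits at new_pos
    inverse[old_pos] = new_pos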
I think you can just repair your ref array after the fact. From your code sample, just insert the following snippet after the call to quick_sort(a,b):
c = list(range(1, len(b)+1))   # list() so it can be assigned into in Python 3
for i in range(0, len(b)):
    c[b[i]-1] = i+1
The c array should now contain the correct references.
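You can sanity-check it with the helper from the question; a quick usage sketch:
print_links(a, c)   # should now reproduce the original (pre-sort) ordering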
Stealing/rewording what @Prune writes: what you have in b is the forward transformation, the sorting itself. Applying it to a0 reproduces the sorted list (print_links(a0,b)).
You just have to invert it by looking up which element went to which position:
c=[b.index(i)+1 for i in range(1,len(a)+1)]
print_links(a,c)
The relevant excerpt of my code is as follows:
import numpy as np
def create_function(duration, start, stop):
    rates = np.linspace(start, stop, duration*1000)
    return rates

def generate_spikes(duration, start, stop):
    rates = [create_function(duration, start, stop)]
    array = [np.arange(0, (duration*1000), 1)]
    start_value = [np.repeat(start, duration*1000)]
    double_array = [np.add(array, array)]
    times = np.arange(np.add(start_value, array), np.add(start_value, double_array), rates)
    return times/1000.
I know this is really inefficient coding (especially the start_value and double_array stuff), but it's all a product of trying to somehow use arange with lists as my inputs.
I keep getting this error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Essentially, an example of what I'm trying to do is this:
If I had two arrays a = [1, 2, 3, 4] and b = [0.1, 0.2, 0.3, 0.4], how would I use np.arange to generate [1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8, 5.2]? (I'd be using a different step size for every element in the array.)
Is this even possible? And if so, would I have to flatten my list?
You can use broadcasting there for efficiency purposes -
(a + (b[:,None] * a)).ravel('F')
Sample run -
In [52]: a
Out[52]: array([1, 2, 3, 4])
In [53]: b
Out[53]: array([ 0.1, 0.2, 0.3, 0.4])
In [54]: (a + (b[:,None] * a)).ravel('F')
Out[54]:
array([ 1.1, 1.2, 1.3, 1.4, 2.2, 2.4, 2.6, 2.8, 3.3, 3.6, 3.9,
4.2, 4.4, 4.8, 5.2, 5.6])
Looking at the expected output, it seems you are using just the first three elements of b for the computation. So, to achieve that target, we just slice the first three elements and do that computation, like so -
In [55]: (a + (b[:3,None] * a)).ravel('F')
Out[55]:
array([ 1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8,
5.2])
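As a quick sanity check, the same 12 numbers can be spelled out with a plain nested comprehension showing what the broadcast computes per element (just a sketch, not a recommendation for large arrays):
expected = [a[j] + b[i] * a[j]          # the formula the broadcast evaluates per element
            for j in range(len(a))      # j varies slowest because of the 'F' (column-major) ravel
            for i in range(3)]          # only the first three elements of b, as above
print(expected)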