I am a complete beginner in programming and I just started learning my first language, which is Python. Lately I've been practising solving problems on HackerRank and I got stuck on the "Diagonal Difference" problem.
The problem is completely new to me, so I searched the web for answers and came across this function that somebody posted on GitHub.
def diagonalDifference(arr):
    prim = 0
    sec = 0
    length = len(arr[0])
    i = 0  # what does i = 0 even do here?
    for count in range(length):
        prim += arr[count][count]  # don't understand what "[count][count]" means
        sec += arr[count][(length-count-1)]  # don't understand this either
    return abs(prim-sec)
Here is the same code with further explanation. Basically this function sums the elements of the upper-left to bottom-right diagonal, storing the running total in prim, and sums the elements of the upper-right to bottom-left diagonal, storing the running total in sec. Then the absolute value of the difference is returned. For an array the indices are arr[row][column], running from 0 to one less than the length of the array. Hope it helps.
import numpy as np

def diagonalDifference(arr):
    prim = 0
    sec = 0
    length = len(arr[0])
    for i in range(length):
        print("Iteration:", i,
              "UL to BR Diagonal:", arr[i][i],
              "UR to BL Diagonal:", arr[i][(length-i-1)])
        # Get value of arr in the ith row and ith column (i.e. the UL to BR diagonal)
        # Add to the cumulative sum
        prim = prim + arr[i][i]
        # Get the value of arr in the ith row and the (length-i-1)th column
        # Columns traverse right to left (i.e. the UR to BL diagonal)
        sec = sec + arr[i][(length-i-1)]
    print("UL to BR Diagonal Sum:", prim,
          "----",
          "UR to BL Diagonal Sum:", sec)
    # Take the absolute value of the difference between the running totals
    return abs(prim-sec)
a = np.array([[1, 2, 4], [3, 4, 6], [3, 8, 1]])
print(a)
diagonalDifference(a)
Output:
[[1 2 4]
[3 4 6]
[3 8 1]]
Iteration: 0 UL to BR Diagonal: 1 UR to BL Diagonal: 4
Iteration: 1 UL to BR Diagonal: 4 UR to BL Diagonal: 4
Iteration: 2 UL to BR Diagonal: 1 UR to BL Diagonal: 3
UL to BR Diagonal Sum: 6 ---- UR to BL Diagonal Sum: 11
First of all, the i here is unnecessary; it is assigned but never used. Now, let's say we have a square matrix:
arr =
[[1, 2, 4],
[3, 5, 8],
[6, 2, 1]]
The indices will be:
[[(0,0), (0,1), (0,2)],
[(1,0), (1,1), (1,2)],
[(2,0), (2,1), (2,2)]]
So the primary diagonal is [(0,0),(1,1),(2,2)]
And the secondary diagonal is: [(0,2),(1,1),(2,0)]
Now in the function:
length = len(arr[0])
arr[0] is [1, 2, 4], i.e. the first row,
so length = 3
for count in range(length):
so count will have values: [0, 1, 2]
Now, for all the iterations:
arr[count][count] will yield: arr[0][0], arr[1][1] and arr[2][2],
hence giving the first diagonal.
And
arr[count][(length-count-1)] will yield: arr[0][(3-0-1)], arr[1][(3-1-1)],
and arr[2][(3-2-1)], i.e. arr[0][2], arr[1][1] and arr[2][0],
which is the second diagonal.
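To see this concretely, here is a minimal sketch (plain Python, using the example matrix above) that collects the two diagonals with exactly those index expressions:
arr = [[1, 2, 4],
       [3, 5, 8],
       [6, 2, 1]]
length = len(arr[0])
# primary diagonal: arr[0][0], arr[1][1], arr[2][2]
print([arr[count][count] for count in range(length)])                # [1, 5, 1]
# secondary diagonal: arr[0][2], arr[1][1], arr[2][0]
print([arr[count][length - count - 1] for count in range(length)])   # [4, 5, 6]
So for this matrix the function would return abs((1+5+1) - (4+5+6)) = 8.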
The question is, how can I completely remove elements that appear more than once in an array? Below you see an approach that is very slow for bigger arrays.
Any idea how to do this the numpy way? Thanks in advance.
import numpy as np

count = 0
result = []
input = np.array([[1,1], [1,1], [2,3], [4,5], [1,1]])  # array with points [x, y]

# count appearance of elements with same x and y coordinate
# append to result if element appears just once
for i in input:
    for j in input:
        if (j[0] == i[0]) and (j[1] == i[1]):
            count += 1
    if count == 1:
        result.append(i)
    count = 0

print np.array(result)
UPDATE: BECAUSE OF FORMER OVERSIMPLIFICATION
Again, to be clear: how can I remove elements that appear more than once with respect to a certain attribute from an array/list? Here: a list whose elements each have length 6; if the first and second entries of an element together appear more than once in the list, remove all of the affected elements from the list. Hope I'm not too confusing. Eumiro helped me a lot on this, but I don't manage to flatten the output list as it should be :(
import numpy as np
import collections

input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
# here, input[0], input[1] and input[4] should be removed, because their
# first and second entries appear more than once in the list, got it? :)

d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2:])
outputDict = [list(k)+list(v) for k,v in d.iteritems() if len(v) == 1]

result = []

def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

# I took flatten(x) from http://stackoverflow.com/a/2158522/1132378
# And I need it, because the output is a nested list :(

for i in outputDict:
    result.append(flatten(i))

print np.array(result)
So, this works, but it's impractical with big lists.
First I got
RuntimeError: maximum recursion depth exceeded in cmp
and after applying
sys.setrecursionlimit(10000)
I got
Segmentation fault
How could I implement Eumiro's solution for big lists (> 100000 elements)?
np.array(list(set(map(tuple, input))))
returns
array([[4, 5],
[2, 3],
[1, 1]])
UPDATE 1: If you want to remove the [1, 1] too (because it appears more than once), you can do:
from collections import Counter
np.array([k for k, v in Counter(map(tuple, input)).iteritems() if v == 1])
returns
array([[4, 5],
[2, 3]])
UPDATE 2: with input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]:
input=[[1,1,2], [1,1,3], [2,3,4], [4,5,5], [1,1,7]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a[2])
d is now:
{(1, 1): [2, 3, 7],
(2, 3): [4],
(4, 5): [5]}
so we want to take all key-value pairs that have a single value and re-create the arrays:
np.array([k+tuple(v) for k,v in d.iteritems() if len(v) == 1])
returns:
array([[4, 5, 5],
[2, 3, 4]])
UPDATE 3: For larger arrays, you can adapt my previous solution to:
import numpy as np
input = [[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]]
d = {}
for a in input:
    d.setdefault(tuple(a[:2]), []).append(a)
np.array([v for v in d.itervalues() if len(v) == 1])
returns:
array([[[456, 6, 5, 343, 435, 5]],
[[ 1, 3, 4, 5, 6, 7]],
[[ 3, 4, 6, 7, 7, 6]],
[[ 3, 3, 3, 3, 3, 3]]])
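The result above has an extra dimension, because each dictionary value is itself a list containing one row. If you need a flat 2-D array instead, a small variation of the same idea (just a sketch) is to take the single row out of each value list:
np.array([v[0] for v in d.itervalues() if len(v) == 1])
which returns a (4, 6) array with the same four rows.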
This is a corrected, faster version of Hooked's answer. count_unique counts the number of occurrences of each unique key in keys.
import numpy as np

input = np.array([[1,1,3,5,6,6],
                  [1,1,4,4,5,6],
                  [1,3,4,5,6,7],
                  [3,4,6,7,7,6],
                  [1,1,4,6,88,7],
                  [3,3,3,3,3,3],
                  [456,6,5,343,435,5]])

def count_unique(keys):
    """Finds an index to each unique key (row) in keys and counts the number of
    occurrences for each key"""
    order = np.lexsort(keys.T)
    keys = keys[order]
    diff = np.ones(len(keys)+1, 'bool')
    diff[1:-1] = (keys[1:] != keys[:-1]).any(-1)
    count = np.where(diff)[0]
    count = count[1:] - count[:-1]
    ind = order[diff[1:]]
    return ind, count
key = input[:, :2]
ind, count = count_unique(key)
print key[ind]
#[[ 1 1]
# [ 1 3]
# [ 3 3]
# [ 3 4]
# [456 6]]
print count
#[3 1 1 1 1]
ind = ind[count == 1]
output = input[ind]
print output
#[[ 1 3 4 5 6 7]
# [ 3 3 3 3 3 3]
# [ 3 4 6 7 7 6]
# [456 6 5 343 435 5]]
Updated Solution:
From the comments below, the new solution is:
from numpy import *

# A is the input as a numpy array
idx = argsort(A[:, 0:2], axis=0)[:,1]
kidx = where(sum(A[idx,:][:-1,0:2] != A[idx,:][1:,0:2], axis=1) == 0)[0]
kidx = unique(concatenate((kidx, kidx+1)))

for n in arange(0, A.shape[0], 1):
    if n not in kidx:
        print A[idx,:][n]
[1 3 4 5 6 7]
[3 3 3 3 3 3]
[3 4 6 7 7 6]
[456 6 5 343 435 5]
kidx is an index list of the elements you don't want. This preserves rows whose first two entries do not match the first two entries of any other row. Since everything is done with indexing, it should be fast(ish), though it requires a sort on the first two elements. Note that the original row order is not preserved, though I don't think this is a problem.
Old Solution:
If I understand it correctly, you simply want to filter out the results of a list of lists where the first element of each inner list is equal to the second element.
With your input from your update A=[[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],[3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],[456,6,5,343,435,5]], the following line removes A[0],A[1] and A[4]. A[5] is also removed since that seems to match your criteria.
[x for x in A if x[0]!=x[1]]
If you can use numpy, there is a really slick way of doing the above. Assume that A is a numpy array, then
A[A[:, 0] != A[:, 1]]
will pull out the same rows. This is probably faster than the solution listed above if you want to loop over it.
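For instance, a quick check (assuming A is the numpy array built from the list above):
import numpy as np

A = np.array([[1,1,3,5,6,6],[1,1,4,4,5,6],[1,3,4,5,6,7],
              [3,4,6,7,7,6],[1,1,4,6,88,7],[3,3,3,3,3,3],
              [456,6,5,343,435,5]])
print A[A[:, 0] != A[:, 1]]
# [[  1   3   4   5   6   7]
#  [  3   4   6   7   7   6]
#  [456   6   5 343 435   5]]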
Why not create another array to hold the output?
Iterate through your main list and, for each element i, check whether i is already in the other array; if not, append it.
This way your new array will not contain more than one copy of each element.
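A minimal sketch of that idea (a plain Python loop over a numpy array, using the small example input from the question):
import numpy as np

input = np.array([[1,1], [1,1], [2,3], [4,5], [1,1]])
output = []
for i in input:
    # append the point only if an equal point is not already in output
    if not any((i == j).all() for j in output):
        output.append(i)
print np.array(output)
# [[1 1]
#  [2 3]
#  [4 5]]
Note that this keeps one copy of each duplicated point rather than dropping duplicates entirely, so it matches the description in this answer rather than the stricter criterion from the question's update.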