Unique items in a list with condition

If I have a list in Python, say:
thing = [[20,0,1],[20,0,2],[20,1,1],[20,0],[30,1,1]]
I would like the resulting list to be:
thing = [[20,1,1],[20,0,2],[30,1,1]]
That is, if the first element is the same, remove duplicates and give priority to the number 1 in the second element. Lastly, the third element must also be unique with respect to the first element.
In a previous question we worked out a complicated method in which each transaction details a purchased unit, and I want to output the other units in that course. If two transactions relate to two units in the same course, the output displays them in duplicate (or once more for each subsequent unit).
The aim of this question is to ensure that this duplication is stopped. Because the solution is complicated, it has resulted in a series of questions. Thanks to everyone who has helped so far.

I am not sure you would like this, but it works with your example:
[list(i) + j for i, j in dict([(tuple(x[:2]), x[2:]) for x in sorted(thing, key=lambda x:len(x))]).items()]
EDIT:
Here is a more detailed version (note that it fits your description of the problem better; sorting ONLY by the length of each sublist may not be the best solution):
thing = [[20,0,1],[20,0,2],[20,1,1],[20,0],[30,1,1]]
dico = {}
for x in thing:
    key = tuple(x[:2])  # the first two elements identify a group
    if key not in dico:
        dico[key] = x[2:]
        continue
    # prefer the longer tail; a later tail of equal length replaces the earlier one
    if len(x[2:]) >= len(dico[key]):
        dico[key] = x[2:]
new_thing = []
for i, j in dico.items():
    new_thing.append(list(i) + j)

You might want to try using the unique_everseen function from the itertools recipes.
As a first step, here is a solution excluding [20, 0]:
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
thing = [[20,0,1],[20,0,2],[20,1,1],[30,1,1]]
# stable sort: sublists with 1 as their second element come first
thing.sort(key=lambda x: 0 if x[1] == 1 else 1)
# keep the first sublist seen for each (first, third) pair
print(list(unique_everseen(thing, key=lambda x: (x[0], x[2]))))
Output:
[[20, 1, 1], [30, 1, 1], [20, 0, 2]]
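The unique_everseen answer deliberately leaves out [20, 0]. A minimal sketch of one way to handle the full input, assuming sublists without a third element can simply be discarded (the question does not say what should happen to them), is to filter them out first and then apply the same sort-then-deduplicate idea with a plain dictionary (insertion order is guaranteed from Python 3.7):

```python
thing = [[20, 0, 1], [20, 0, 2], [20, 1, 1], [20, 0], [30, 1, 1]]

# Assumption: entries shorter than three elements are incomplete and dropped.
complete = [x for x in thing if len(x) >= 3]

# Stable sort: entries whose second element is 1 come first.
complete.sort(key=lambda x: 0 if x[1] == 1 else 1)

# Keep the first entry seen for each (first, third) pair.
seen = {}
for x in complete:
    seen.setdefault((x[0], x[2]), x)

result = list(seen.values())
print(result)  # [[20, 1, 1], [30, 1, 1], [20, 0, 2]]
```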

thing = [[20,0,1],[20,0,2],[20,1,1],[20,0,1],[30,1,1]]
d = {}
for e in thing:
    k = (e[0], e[2])
    if k not in d or (d[k][1] != 1 and e[1] == 1):
        d[k] = list(e)
print(list(d.values()))
[[20, 1, 1], [20, 0, 2], [30, 1, 1]]
If you don't need the initial list (this stores the original sublists instead of copies):
thing = [[20,0,1],[20,0,2],[20,1,1],[20,0,1],[30,1,1]]
d = {}
for e in thing:
    k = (e[0], e[2])
    if k not in d or (d[k][1] != 1 and e[1] == 1):
        d[k] = e
thing = list(d.values())
[[20, 1, 1], [20, 0, 2], [30, 1, 1]]
If you want to keep the order of your lists on Python versions before 3.7, use OrderedDict (from 3.7 on, plain dicts preserve insertion order):
from collections import OrderedDict
d = OrderedDict()
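The OrderedDict suggestion can be filled in as follows. This is a sketch that reuses the same loop as above; the only change is the container type:

```python
from collections import OrderedDict

thing = [[20, 0, 1], [20, 0, 2], [20, 1, 1], [20, 0, 1], [30, 1, 1]]

d = OrderedDict()
for e in thing:
    k = (e[0], e[2])  # group by first and third element
    # Insert new keys; replace an existing entry only to prefer a 1 in position 1.
    if k not in d or (d[k][1] != 1 and e[1] == 1):
        d[k] = list(e)

result = list(d.values())
print(result)  # [[20, 1, 1], [20, 0, 2], [30, 1, 1]]
```

Replacing the value of an existing key keeps that key in its original position, so the (20, 1) group stays first, where [20, 0, 1] originally appeared.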

Related

getting second smallest value from list

x = [[1,2,3,4],[4,5,0,1],[22,21,31,10]]
def minFor(x):
    removingvalue = []
    for i in x:
        minvalue = i[0]
        for j in i:
            if j < minvalue:
                minvalue=j
        for i in range(0,len(x)):
            if x==minvalue:
                removingvalue = removingvalue + minvalue
    return minvalue
print(minvalue)
What I'm trying to do here is first find the smallest number in the list, then remove that smallest value and find the smallest number in the list again. But the removal doesn't work.

This finds the second smallest value of each sublist in the list:
lst = [[1,2,3,4],[4,5,0,1],[22,21,31,10]]
print([sorted(x)[1] for x in lst])
# [2, 1, 21]
You just need to sort each sublist in ascending order and select the second value. There is no need to remove a value from the list.
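If the sublists are long, fully sorting each one may be more work than needed. A sketch using heapq.nsmallest from the standard library, which returns only the k smallest items in ascending order, gives the same result:

```python
import heapq

lst = [[1, 2, 3, 4], [4, 5, 0, 1], [22, 21, 31, 10]]

# nsmallest(2, ...) returns the two smallest values in ascending order,
# so index 1 is the second smallest (duplicates count, as with sorted()).
second_smallest = [heapq.nsmallest(2, x)[1] for x in lst]
print(second_smallest)  # [2, 1, 21]
```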

Personally, I'd use the built-in sorted function:
def second_min(x):
    result = []
    for sublist in x:
        # this flattens the sublists into a single list
        result.extend(sublist)
    result = sorted(result)
    return result[1]  # return the second element
And without the built-ins, replace the sorted() call with a bubble sort (one pass is not enough, so repeat it once per element):
...
for _ in range(len(result)):
    for i in range(len(result) - 1):
        if result[i] > result[i + 1]:
            result[i:i + 2] = [result[i + 1], result[i]]
...
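Another way to avoid sorted() entirely is to scan the values once while tracking the two smallest seen so far. This is a swapped-in technique rather than part of the answer above; the helper name second_min_one_pass is hypothetical. Like sorted(result)[1], it counts duplicates (so [[0, 0]] gives 0):

```python
def second_min_one_pass(nested):
    """Return the second smallest value across all sublists (hypothetical helper).

    Returns float('inf') if there are fewer than two values in total.
    """
    smallest = second = float("inf")
    for sublist in nested:
        for value in sublist:
            if value < smallest:
                # New minimum: the old minimum becomes the runner-up.
                smallest, second = value, smallest
            elif value < second:
                second = value
    return second


x = [[1, 2, 3, 4], [4, 5, 0, 1], [22, 21, 31, 10]]
print(second_min_one_pass(x))  # 1
```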

Use min() on a generator expression that flattens the list to get the overall minimum.
Then use min() on the same expression with a twist: only allow values that are bigger than that minimum:
xxxx = [[1,2,3,4],[4,5,0,1],[22,21,31,10]]
minmin = min(x for y in xxxx for x in y)  # flattening generator expression
secmin = min(x for y in xxxx for x in y if x > minmin)
print(minmin, secmin)
Output:
0 1

You can flatten the given data structure into a single list and then sort that list. The first two elements give you the answer you want. Here's what you can do:
input_list = x
new_l = []
for sl in input_list:
    new_l.extend(sl)
new_l.sort()
# The first two elements of new_l are the ones that you want.
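The flatten-then-sort step can also be written with itertools.chain.from_iterable, which concatenates the sublists lazily. A short sketch:

```python
from itertools import chain

x = [[1, 2, 3, 4], [4, 5, 0, 1], [22, 21, 31, 10]]

# Flatten all sublists into one sorted list.
new_l = sorted(chain.from_iterable(x))
smallest, second_smallest = new_l[0], new_l[1]
print(smallest, second_smallest)  # 0 1
```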

remove_smallest = [sorted(i)[1:] for i in x]
get_smallest = [min(i) for i in remove_smallest]
print(remove_smallest)
print(get_smallest)
[[2, 3, 4], [1, 4, 5], [21, 22, 31]]
[2, 1, 21]
Expanded loops:
remove_smallest = []
for i in x:
    remove_smallest.append(sorted(i)[1:])
get_smallest = []
for i in remove_smallest:
    get_smallest.append(min(i))

Filtering a (Nx1) list in Python

I have a list of the form
[(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
I want to scan the list and return the first elements of those pairs whose second element (index 1) is repeated. (I apologize; I couldn't frame this better.)
For example, in the given list the pairs are (2,3),(4,3) and I see that 3 is repeated so I wish to return 2 and 4. Similarly, from (3,4),(1,4),(5,4) I will return 3, 1, and 5 because 4 is repeated.
I have implemented a bubble-style nested-loop search, but it is obviously very slow:
arr = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
p = len(arr)
for i in range(0,p):
    for j in range(i+1,p):
        if (arr[i][1] == arr[j][1]):
            print(arr[i][0],arr[j][0])
How do I go about it?

You can use collections.defaultdict. This will return a mapping from the second item to a list of first items. You can then filter for repetition via a dictionary comprehension.
from collections import defaultdict
lst = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
d = defaultdict(list)
for i, j in lst:
    d[j].append(i)
print(d)
# defaultdict(<class 'list'>, {3: [2, 4], 4: [3, 1, 5], 5: [6]})
res = {k: v for k, v in d.items() if len(v)>1}
print(res)
# {3: [2, 4], 4: [3, 1, 5]}
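If you want a flat list of the first elements, as described in the question (2 and 4, then 3, 1 and 5), one sketch is to count the second elements with collections.Counter and keep the pairs whose second element occurs more than once. This preserves the original order of the pairs instead of grouping them by key:

```python
from collections import Counter

lst = [(2, 3), (4, 3), (3, 4), (1, 4), (5, 4), (6, 5)]

# How many times each second element appears.
counts = Counter(j for _, j in lst)

# First elements of the pairs whose second element is repeated.
repeated_firsts = [i for i, j in lst if counts[j] > 1]
print(repeated_firsts)  # [2, 4, 3, 1, 5]
```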

Using numpy lets you avoid explicit Python for loops:
import numpy as np
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
a = np.array(l)
items, counts = np.unique(a[:,1], return_counts=True)
is_duplicate = np.isin(a[:,1], items[counts > 1]) # get elements that have more than one count
print(a[is_duplicate, 0]) # return elements with duplicates
# tuple(map(tuple, a[is_duplicate, :])) # use this to get tuples in output
(Uncomment the last line to get the output as tuples.)
pandas is another option:
import pandas as pd
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
df = pd.DataFrame(l, columns=['first', 'second'])
print(df.groupby('second').filter(lambda g: len(g) > 1))

Group repeated elements of a list

I am trying to create a function that receives a list and returns another list with the repeated elements.
For example, for the input A = [2,2,1,1,3,2] (the list is not sorted), the function would return result = [[1,1], [2,2,2]]. The result doesn't need to be sorted.
I already did it in Wolfram Mathematica, but now I have to translate it to Python 3. Mathematica has functions like Select, Map and Split that make this very simple without long loops full of instructions.

result = [[x] * A.count(x) for x in set(A) if A.count(x) > 1]

Simple approach:
def grpBySameConsecutiveItem(l):
    rv = []
    last = None
    for elem in l:
        if last is None:
            last = [elem]
            continue
        if elem == last[0]:
            last.append(elem)
            continue
        if len(last) > 1:
            rv.append(last)
        last = [elem]
    # the loop never appends the final run, so check it here
    if last is not None and len(last) > 1:
        rv.append(last)
    return rv
print(grpBySameConsecutiveItem([1,2,1,1,1,2,2,3,4,4,4,4,5,4]))
Output:
[[1, 1, 1], [2, 2], [4, 4, 4, 4]]
You can sort your output afterwards if you want it sorted, or sort your input list beforehand, although then you would no longer be grouping only consecutive identical numbers.
See https://stackoverflow.com/a/4174955/7505395 for how to sort lists of lists by an index (just use 0, since all elements of each inner list are identical).
You could also use itertools; it has functions such as takewhile that would look much smarter if used.
This version ignores whether the repeats are consecutive and just collects them all:
def grpByValue(lis):
    d = {}
    for key in lis:
        if key in d:
            d[key] += 1
        else:
            d[key] = 1
    rv = []
    for k in d:
        if d[k] < 2:
            continue
        rv.append([])
        for n in range(0, d[k]):
            rv[-1].append(k)
    return rv
data = [1,2,1,1,1,2,2,3,4,4,4,4,5,4]
print(grpByValue(data))
Output:
[[1, 1, 1, 1], [2, 2, 2], [4, 4, 4, 4, 4]]
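The itertools idea can be sketched with itertools.groupby (swapped in for takewhile), which groups consecutive equal items. Used directly, it reproduces the consecutive-run version; applied to sorted(data), it collects every repeat, like grpByValue:

```python
from itertools import groupby

data = [1, 2, 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 4]

# Consecutive runs of equal values, keeping only runs longer than one.
runs = [list(g) for _, g in groupby(data)]
consecutive = [r for r in runs if len(r) > 1]
print(consecutive)  # [[1, 1, 1], [2, 2], [4, 4, 4, 4]]

# Sorting first makes all equal values consecutive, so every repeat is collected.
all_repeats = [r for r in (list(g) for _, g in groupby(sorted(data))) if len(r) > 1]
print(all_repeats)  # [[1, 1, 1, 1], [2, 2, 2], [4, 4, 4, 4, 4]]
```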

You could do this with a list comprehension (used here only for its side effect of appending to B):
A = [1,1,1,2,2,3,3,3]
B = []
[B.append([n]*A.count(n)) for n in A if B.count([n]*A.count(n)) == 0]
B is now [[1,1,1],[2,2],[3,3,3]]
Or, more Pythonically:
A = [1,2,2,3,4,1,1,2,2,2,3,3,4,4,4]
B = []
for n in A:
    if B.count([n]*A.count(n)) == 0:
        B.append([n]*A.count(n))
B is now [[1,1,1],[2,2,2,2,2],[3,3,3],[4,4,4,4]]
This works with sorted or unsorted lists; if you want the groups in sorted order, use for n in sorted(A) instead.

This is a job for Counter(). Iterating over each element x and calling A.count(x) has O(N^2) complexity. Counter() counts how many times each element occurs in your iterable in a single pass, and then you can generate your result by iterating over that dictionary.
>>> from collections import Counter
>>> A = [2,2,1,1,3,2]
>>> counts = Counter(A)
>>> result = [[key] * value for key, value in counts.items() if value > 1]
>>> result
[[2, 2, 2], [1, 1]]

Slice a list in python

I have a list in Python like:
l = [1,2,4,5,7,8,2,1]
I want code that checks whether a number in the list is bigger than the next one, and if it is, removes that next number. I want my code to return [1,2,4,5,7,8], removing the last 2 and 1.
Pseudocode:
if l[i] > l[i+1]:
    remove l[i+1]
    then check whether l[i] is bigger than the next one after that (what used to be l[i+2])
Can someone help me?

I started with your list and appended to a new one based on your criteria:
l = [1,2,4,5,7,8,2,1]
result = []
chk = float('-inf')  # largest value kept so far; -inf also handles negative numbers
for num in l:
    if num >= chk:  # keep numbers that are not smaller than the last kept one
        chk = num
        result.append(num)
print(result)
Deleting entries from a list while looping over it is bad practice, so this builds a new list instead.
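The same rule (keep a number only if it is not smaller than the largest number kept so far) can be expressed with itertools.accumulate and max, which yields the running maximum. A sketch, assuming ties with the running maximum should be kept, as in the loop above:

```python
from itertools import accumulate

l = [1, 2, 4, 5, 7, 8, 2, 1]

# accumulate(l, max) gives the running maximum: 1, 2, 4, 5, 7, 8, 8, 8
running_max = accumulate(l, max)

# Keep each value that equals the running maximum at its position.
result = [v for v, m in zip(l, running_max) if v == m]
print(result)  # [1, 2, 4, 5, 7, 8]
```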

Iterate through the array, and whenever array[i] > array[i+1], remove array[i+1]:
array = [1,2,4,5,7,8,2,1]
l = len(array)
first = 0
second = 1
while second != l:
    if array[first] > array[second]:
        del array[second]  # the next element shifts into position `second`
        l -= 1
    else:
        first += 1
        second += 1
print(array)

You can use itertools.groupby to find the areas of a list that fulfill a condition; in this case, each number is larger than the number preceding it:
from itertools import groupby
l = [1,2,4,5,7,8,2,1]
# (index, value) pairs starting at index 1; the key tests value > previous value
for k, g in groupby(enumerate(l[1:], 1), key=lambda t: t[1] > l[t[0]-1]):
    if k:
        grp = list(g)
        print(l[grp[0][0]-1:grp[-1][0]+1])
# [1, 2, 4, 5, 7, 8]
This will find any area in the list that meets that condition:
l = [1,2,4,5,7,8,2,5]
for k, g in groupby(enumerate(l[1:], 1), key=lambda t: t[1] > l[t[0]-1]):
    if k:
        grp = list(g)
        print(l[grp[0][0]-1:grp[-1][0]+1])
# prints [1, 2, 4, 5, 7, 8] and [2, 5]
So break after the first group if you only want one:
l = [10, 11, 12, 1, 2]
for k, g in groupby(enumerate(l[1:], 1), key=lambda t: t[1] > l[t[0]-1]):
    if k:
        grp = list(g)
        print(l[grp[0][0]-1:grp[-1][0]+1])
        break
# [10, 11, 12]

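You can even do this in one line (note that it compares each element with its immediate predecessor in the original list, not with the last element kept):
l = l[:1] + [el for index, el in enumerate(l[1:], 1) if el > l[index-1]]
# l[:1] keeps the first element; starting enumerate at 1 means l[index-1]
# never wraps around to l[-1] (the last element)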

Removing Elements in A from B

I have two lists, let's say:
a = [1,2,3]
b = [1,2,3,1,2,3]
I would like to remove 1, 2 and 3 from list b, but not all occurrences. The resulting list should have:
b = [1,2,3]
I currently have:
for element in a:
    try:
        b.remove(element)
    except ValueError:
        pass
However, this has poor performance when a and b get very large. Is there a more efficient way of getting the same results?
EDIT
To clarify 'not all occurrences', I mean I do not wish to remove both '1's from b, as there was only one '1' in a.

I would do this:
set_a = set(a)
new_b = []
for x in b:
    if x in set_a:
        set_a.remove(x)
    else:
        new_b.append(x)
Unlike the other set solutions, this maintains order in b (if you care about that).
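One limitation of the set-based loop is that a set holds each value only once, so if a itself contains duplicates (for example a = [1, 1]), only one copy is removed from b. A sketch that keeps the order of b but tracks how many copies of each value are still to be removed, using collections.Counter (the function name remove_once_each is hypothetical):

```python
from collections import Counter


def remove_once_each(a, b):
    """Remove one occurrence from b for each element of a, keeping b's order."""
    to_remove = Counter(a)
    result = []
    for x in b:
        if to_remove[x] > 0:
            to_remove[x] -= 1  # consume one pending removal
        else:
            result.append(x)
    return result


print(remove_once_each([1, 2, 3], [1, 2, 3, 1, 2, 3]))  # [1, 2, 3]
print(remove_once_each([1, 1], [1, 2, 1, 3, 1]))        # [2, 3, 1]
```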

I would do something like this:
from collections import defaultdict
a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]
# Build up the count of occurrences in b
d = defaultdict(int)
for bb in b:
    d[bb] += 1
# Remove one for each occurrence in a
for aa in a:
    d[aa] -= 1
# Create a list for all elements that still have a count of one or more
result = []
for k, v in d.items():
    if v > 0:
        result += [k] * v
Or, if you are willing to be slightly more obscure:
from functools import reduce
from operator import iadd
result = reduce(iadd, [[k] * v for k, v in d.items() if v > 0], [])
defaultdict generates a count of the occurrences of each key. Once it has been built up from b, it is decremented for each occurrence of a key in a. Then we collect the elements that are still left over, allowing them to occur multiple times.
defaultdict works with Python 2.6 and up. If you are using Python 2.7 or later, you can look into collections.Counter.
Later: you can also generalize this and create subtractions of counter-style defaultdicts:
from collections import defaultdict
from functools import reduce
from operator import iadd
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
def build_dd(lst):
    d = defaultdict(int)
    for item in lst:
        d[item] += 1
    return d
def subtract_dd(left, right):
    # iterate over left so keys that appear only in left are kept
    return {k: v - right[k] for k, v in left.items()}
db = build_dd(b)
da = build_dd(a)
result = reduce(iadd,
                [[k] * v for k, v in subtract_dd(db, da).items() if v > 0],
                [])
print(result)
But the reduce expression is pretty obscure now.
Later still: in Python 2.7 and later, using collections.Counter, it looks like this:
from collections import Counter
base = [1, 2, 3]
missing = [4, 5, 6]
extra = [7, 8, 9]
a = base + missing
b = base * 4 + extra
result = Counter(b) - Counter(a)
print(result)
assert result == dict([(k, 3) for k in base] + [(k, 1) for k in extra])
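Counter subtraction gives counts rather than a list. To turn the result back into a list, Counter.elements() repeats each key as many times as its count. A short sketch with the lists from the question:

```python
from collections import Counter

a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]

# Subtraction drops keys whose count would be zero or negative.
remaining = Counter(b) - Counter(a)

# elements() yields each key `count` times, in first-insertion order (Python 3.7+).
result = list(remaining.elements())
print(result)  # [1, 2, 3]
```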

Generally, you want to avoid list.remove() (you are right, it would hurt performance badly). Also, looking up an element in a dictionary or a set is much faster (O(1) on average) than in a list, so create a set out of a (and, if order doesn't matter, out of b).
Something like this:
sa = set(a)
new_b = [x for x in b if x not in sa]
# this creates a third list, but I bet that's OK.
However, I'm not sure what your actual rule is for choosing which elements to remove. Please elaborate on "but not all occurrences".
