Python: unique value in list array - python

Imagine this output from fuzzywuzzy (the values could be in a different order):
[('car', 100, 28),
('tree', 80, 5),
('house', 44, 12),
('house', 44, 25),
('house', 44, 27)]
I want to treat the three houses as the same.
What is an efficient way to keep only unique string values, to arrive at this result:
(EDIT: since all houses have the same value 44, I don't care which of them ends up in the list. The last value of each house tuple is irrelevant.)
[('car', 100, 28),
('tree', 80, 5),
('house', 44, 12)]
I saw a lot of questions here about uniqueness in lists, but the answers do not work for my example, mostly because the author needs a solution for just one list.
I tried this:
unique = []
for element in domain1:
    if element[0] not in unique:
        unique.append(element)
I thought I could address the first values with element[0] and check whether they exist in unique.
If I print unique I have the same result as after fuzzywuzzy. Seems I am not on the right path with my idea, so how can I achieve my desired result?
Thanks!

You can use a dict for this, for example:
data = [('car', 100, 28),
('tree', 80, 5),
('house', 44, 12),
('house', 44, 25),
('house', 44, 27)
]
list({x[0]: x for x in reversed(data)}.values())
gives you
[('house', 44, 12), ('tree', 80, 5), ('car', 100, 28)]
Using a dict keeps one entry per first element. By default the last tuple seen for a key wins, so reversed() is needed to end up with the first one; as a side effect, the result comes out in reverse order.

You could use dict.setdefault here to store the first item found (using the first item in each tuple as the key):
lst = [
("car", 100, 28),
("tree", 80, 5),
("house", 44, 12),
("house", 44, 25),
("house", 44, 27),
]
d = {}
for x, y, z in lst:
    d.setdefault(x, (x, y, z))
print(list(d.values()))
Or using indexing instead of tuple unpacking:
d = {}
for item in lst:
    d.setdefault(item[0], item)
Output:
[('car', 100, 28), ('tree', 80, 5), ('house', 44, 12)]
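For reference, the loop in the question fails because element[0] is a string while unique holds whole tuples, so the membership test is never true. A minimal sketch of the same loop with a separate set of seen names (unique_by_first and seen are names made up for illustration, not from the question):

```python
def unique_by_first(rows):
    """Keep the first tuple seen for each distinct first element."""
    seen = set()   # first elements already encountered
    unique = []    # tuples kept, in their original order
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            unique.append(row)
    return unique

domain1 = [('car', 100, 28), ('tree', 80, 5),
           ('house', 44, 12), ('house', 44, 25), ('house', 44, 27)]
result = unique_by_first(domain1)
print(result)
```

This keeps the original order and the first house, like the setdefault version.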

Related

Python sum values in list of tuples up to certain values

NOTE: I edited the question!
I am having trouble with iteration in Python, especially when I would like to sum up values up to a certain number. Here's more information on the problem I'm facing:
I have a list of tuples that looks like this:
[(1, 0.5, 'min'),
(2, 3, 'NA'),
(3, 6, 'NA'),
(4, 40, 'NA'),
(5, 90, 'NA'),
(6, 130.8, 'max'),
(7, 129, 'NA'),
(8, 111, 'NA'),
(9, 8, 'NA'),
(10, 9, 'NA'),
(11, 0.01, 'min'),
(12, 9, 'NA'),
(13, 40, 'NA'),
(14, 90, 'NA'),
(15, 130.1, 'max'),
(16, 112, 'NA'),
(17, 108, 'NA'),
(18, 90, 'NA'),
(19, 77, 'NA'),
(20, 68, 'NA'),
(21, 0.9, 'min'),
(22, 8, 'NA'),
(23, 40, 'NA'),
(24, 90, 'NA'),
(25, 92, 'NA'),
(26, 130.4, 'max')]
I want to sum each value leading up to "max" and each value leading up to "min" and append these results to two separate lists.
For instance, the output should be:
min_sums = [1+2+3+4+5,11+12+13+14, 21+22+23+24+25]
max_sums = [6+7+8+9+10, 15+16+17+18+19+20, 26]
I would also like to keep track of the values I am actually summing up and have this as an output as well:
min_sums_lst = [[1,2,3,4,5], [11,12,13,14],[21,22,23,24,25]]
max_sums_lst = [[6,7,8,9,10], [15,16,17,18,19,20], [26]]
I'm thinking I can use the index value, but am pretty new to Python and am not exactly sure how to proceed. I'm studying biology, but I believe learning CS could help with my work.
max_list = []
min_list = []
flag = ''
min_index = 0
max_index = float('inf');
if flag == 'h':
    max_list.append(item)
elif flag == 'c':
    min_list.append(item)
for i, item in enumerate(minmax_list):
    print(i, item)
    print("max_index: ", max_index)
    print("min_index: ", min_index)
    if item[2] == 'min':
         min_index = i
         max_list('h', item[0])
    elif item[2] == 'NA' and (i < max_index):
        max_list('h', item[0])
    elif item[2] == 'max':
         max_index = i
         max_list('c', item[0])
    elif item[2] == 'NA' and (i > min_index):
        min_list('c', item[0])
I'm quite new to Python - any help would be appreciated. I am only trying to add the first item in each tuple based on min and max as indicated in the output above.
My answer takes a slightly different approach to @Stefan's. It does a bit more validation, and you could pretty easily add other kinds besides 'min' and 'max'.
def partition_items(items):
    lists = {
        'min': [],
        'max': [],
    }
    current_kind = None
    current_list = None
    for value, _, kind in items:
        if kind != current_kind and kind != 'NA':
            current_kind = kind
            # You'll get an error here if current_kind isn't one of 'min'
            # or 'max'.
            current_list = lists[current_kind]
            current_list.append(0)
        # You'll get an error here if the first item in the list doesn't
        # have type of 'min' or 'max'.
        current_list[-1] += value
    return lists
lists = partition_items(items)
print(lists['min'])
# -> [15, 50, 115]
print(lists['max'])
# -> [40, 105, 26]
Sorry, didn't bother reading your attempt, looks very complicated.
min_sums = []
max_sums = []
for x, _, what in minmax_list:
    if what != 'NA':
        current = min_sums if what == 'min' else max_sums
        current.append(0)
    current[-1] += x
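Neither answer returns the lists of values that went into each sum, which the question also asks for. A possible sketch that collects the runs first and sums them afterwards, assuming (as in the question's data) that the first tuple is tagged 'min' or 'max'. The helper name group_runs is hypothetical, and the data below is a shortened, made-up sample:

```python
def group_runs(rows):
    """Split the first column into runs that start at each 'min' or 'max' tag."""
    groups = {'min': [], 'max': []}
    current = None  # the run currently being filled
    for value, _, kind in rows:
        if kind != 'NA':
            current = []
            groups[kind].append(current)  # KeyError for any other tag
        if current is None:
            raise ValueError("data must start with a 'min' or 'max' row")
        current.append(value)
    return groups

# Shortened sample in the same shape as the question's list.
minmax_list = [(1, 0.5, 'min'), (2, 3, 'NA'), (3, 6, 'NA'),
               (4, 130.8, 'max'), (5, 129, 'NA'),
               (6, 0.01, 'min'), (7, 9, 'NA'), (8, 130.1, 'max')]
groups = group_runs(minmax_list)
min_sums_lst, max_sums_lst = groups['min'], groups['max']
min_sums = [sum(run) for run in min_sums_lst]
max_sums = [sum(run) for run in max_sums_lst]
print(min_sums_lst, min_sums)
print(max_sums_lst, max_sums)
```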

Sorting tuples with a custom sorting function using a range?

I would like to sort a list of tuples based on the two last columns:
mylist = [(33, 36, 84),
(34, 37, 656),
(23, 38, 42)]
I know I can do this like:
final = sorted(mylist, key=lambda x: [ x[1], x[2]])
Now my problem is that I want to compare the second column of my list with a special condition: if the difference between two numbers is less than an offset they should be taken as equal ( 36 == 37 == 38) and the third column should be used to sort the list. The end result I wish to see is:
mylist = [(23, 38, 42),
(33, 36, 84),
(34, 37, 656)]
I was thinking of creating my own integer type and overriding the equality operator. Is this possible? Is it overkill? Is there a better way to solve this problem?
I think the easiest way is to create a new class that compares like you want it to:
mylist = [(33, 36, 84),
(34, 37, 656),
(23, 38, 42)]
offset = 2
class Comp(object):
    def __init__(self, tup):
        self.tup = tup
    def __lt__(self, other):  # sorted works even if only __lt__ is implemented.
        # If the second items differ by at most the offset, compare the third items
        if abs(self.tup[1] - other.tup[1]) <= offset:
            return self.tup[2] < other.tup[2]
        # otherwise compare them as usual
        else:
            return (self.tup[1], self.tup[2]) < (other.tup[1], other.tup[2])
A sample run shows your expected result:
>>> sorted(mylist, key=Comp)
[(23, 38, 42), (33, 36, 84), (34, 37, 656)]
I think it's a bit cleaner than using functools.cmp_to_key but that's a matter of personal preference.
Sometimes an old-style sort based on a cmp function is easier than doing one based on a key. So -- write a cmp function and then use functools.cmp_to_key to convert it to a key:
import functools
def compare(s,t,offset):
    _,y,z = s
    _,u,v = t
    if abs(y-u) > offset: #use 2nd component
        if y < u:
            return -1
        else:
            return 1
    else: #use 3rd component
        if z < v:
            return -1
        elif z == v:
            return 0
        else:
            return 1
mylist = [(33, 36, 84),
(34, 37, 656),
(23, 38, 42)]
mylist.sort(key = functools.cmp_to_key(lambda s,t: compare(s,t,2)))
for t in mylist: print(t)
output:
(23, 38, 42)
(33, 36, 84)
(34, 37, 656)
In https://wiki.python.org/moin/HowTo/Sorting look for "The Old Way Using the cmp Parameter". This allows you to write your own comparison function, instead of just setting the key and using comparison operators.
There is a danger to making a sort ordering like this. Look up "strict weak ordering." You could have multiple different valid orderings. This can break other code which assumes there is one correct way to sort things.
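To make the danger concrete: "equal within an offset" is not transitive, so it does not behave like real equality. A small sketch with made-up numbers (close is a hypothetical helper, not part of any answer here):

```python
offset = 2

def close(a, b):
    """Treat two numbers as 'equal' when they differ by at most offset."""
    return abs(a - b) <= offset

# 36 ~ 38 and 38 ~ 40, yet 36 and 40 are not 'equal'.
checks = (close(36, 38), close(38, 40), close(36, 40))
print(checks)
```

With values chained like this, the outcome of a sort built on such a comparison can depend on the order of the input.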
Now to actually answer your question:
mylist = [(33, 36, 84),
(34, 37, 656),
(23, 38, 42)]
def custom_sort_term(x, y, offset = 2):
    if abs(x-y) <= offset:
        return 0
    return x-y
def custom_sort_function(x, y):
    x1 = x[1]
    y1 = y[1]
    first_comparison_result = custom_sort_term(x1, y1)
    if (first_comparison_result):
        return first_comparison_result
    x2 = x[2]
    y2 = y[2]
    return custom_sort_term(x2, y2)
# Python 2 only: Python 3 dropped the cmp argument, so there use
# key=functools.cmp_to_key(custom_sort_function) and print(final).
final = sorted(mylist, cmp=custom_sort_function)
print final
[(23, 38, 42), (33, 36, 84), (34, 37, 656)]
Not pretty, but I tried to be general in my interpretation of the OP's problem statement.
I expanded the test case, then applied unsophisticated brute force:
# test case expanded
mylist = [(33, 6, 104),
(31, 36, 84),
(35, 86, 84),
(30, 9, 4),
(23, 38, 42),
(34, 37, 656),
(33, 88, 8)]
threshld = 2 # different final output can be seen if changed to 1, 3, 30
def collapse(nums, threshld):
    """
    takes sorted (increasing) list of numbers, nums
    replaces runs of consecutive nums
    that successively differ by threshld or less
    with 1st number in each run
    """
    cnums = nums[:]
    cur = nums[0]
    for i in range(len(nums)-1):
        if (nums[i+1] - nums[i]) <= threshld:
            cnums[i+1] = cur
        else:
            cur = cnums[i+1]
    return cnums
mylists = [list(i) for i in mylist] # change the tuples to lists to modify
indxd=[e + [i] for i, e in enumerate(mylists)] # append the original indexing
#print(*indxd, sep='\n')
im0 = sorted(indxd, key=lambda x: [ x[1]]) # sort by middle number
cns = collapse([i[1] for i in im0], threshld) # then collapse()
#print(cns)
for i in range(len(im0)): # overwrite collapsed into im0
    im0[i][1] = cns[i]
#print(*im0, sep='\n')
im1 = sorted(im0, key=lambda x: [ x[1], x[2]]) # now do 2 level sort
#print(*sorted(im0, key=lambda x: [ x[1], x[2]]), sep='\n')
# rebuild using new order of original indices
final = [mylist[im1[i][3]] for i in range(len(im1))]
print(*final, sep='\n')
(33, 6, 104)
(30, 9, 4)
(23, 38, 42)
(31, 36, 84)
(34, 37, 656)
(33, 88, 8)
(35, 86, 84)

Find duplicates in a list of lists with tuples

I am trying to find duplicate tuples nested inside lists, which are themselves collected in an outer list. If there is a better way to organize the data so that the problem becomes easier to solve, I'd be glad to know, because I built this structure up as I went along.
pairsList = [
[1, (11, 12), (13, 14)], #list1
[2, (21, 22), (23, 24)], #list2
[3, (31, 32), (13, 14)], #list3
[4, (43, 44), (21, 22)], #list4
]
The first element in each list uniquely identifies each list.
From this object pairsList, I want to find out which lists have identical tuples. So I want to report that list1 has the same tuple as list3 (because both have (13,14)). Likewise, list2 and list4 have the same tuple (both have (21,22)) and need to be reported. The position of tuples within the list doesn't matter (list2 and list4 both have (21,22) even though the tuple sits at a different position in each list).
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
I am aware of sets and have used them to delete duplicates within the lists in other situations, but cannot understand how to solve this problem. I can check like this if one list contains any element from the other list:
list1 = [1, (11, 12), (13, 14)]
list2 = [3, (31, 32), (13, 14)]
print not set(list1).isdisjoint(list2)
>>>True
So, the code below lets me know what lists have same tuple(s) as the first one. But what is the correct way to perform this on all the lists?
counter = 0
for pair in pairsList:
    list0 = pairsList[0]
    iterList = pairsList[counter]
    if not set(list0).isdisjoint(iterList):
        print iterList[0] #print list ID
    counter += 1
The first element in each list uniquely identifies each list.
Great, then let's convert it to a dict first:
d = {x[0]: x[1:] for x in pairsList}
# d:
{1: [(11, 12), (13, 14)],
2: [(21, 22), (23, 24)],
3: [(31, 32), (13, 14)],
4: [(43, 44), (21, 22)]}
Let's index the whole data structure:
index = {}
for k, vv in d.items():  # items() works on both Python 2 and 3
    for v in vv:
        index.setdefault(v, []).append(k)
Now index is:
{(11, 12): [1],
(13, 14): [1, 3],
(21, 22): [2, 4],
(23, 24): [2],
(31, 32): [3],
(43, 44): [4]}
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
pairs = [v for v in index.values() if len(v) == 2]
returns [[1,3],[2,4]].
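A Python 3 sketch of the same inverted-index idea. It goes slightly beyond the len(v) == 2 filter above: itertools.combinations is used so that a tuple shared by three or more lists would also be reported pairwise. The name shared_pairs is hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def shared_pairs(pairs_list):
    """Return sorted ID pairs of lists that share at least one tuple."""
    index = defaultdict(list)        # tuple -> IDs of the lists containing it
    for list_id, *tuples in pairs_list:
        for tup in set(tuples):      # set() ignores repeats inside one list
            index[tup].append(list_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return sorted(pairs)

pairsList = [
    [1, (11, 12), (13, 14)],
    [2, (21, 22), (23, 24)],
    [3, (31, 32), (13, 14)],
    [4, (43, 44), (21, 22)],
]
result = shared_pairs(pairsList)
print(result)  # [(1, 3), (2, 4)]
```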

sorting dates in the form mm/dd/yy

I'm extremely new to Python and I was wondering how I would be able to sort a list of tuples where the second element of each tuple is another tuple, like this:
[ ('Adam', (12,16,1949) ), ('Charlie', (9,4,1988) ), ('Daniel', (11,29,1990) ),
('Ellie', (11, 28, 1924) ), ('Feenie', (2,10,1954) ), ('Harry', (8,15,1924) ),
('Iggy', (12, 29, 1924) ), ('Jack', (2,21,1920) )]
I want to sort the list by using the sorted function to organize from youngest to oldest, but when I tried my function:
def sort_ages(a:list):
    return sorted(a, key=lambda x:x[1][2], reverse=True)
[('Daniel', (11, 29, 1990)), ('Charlie', (9, 4, 1988)), ('Feenie', (2, 10, 1954)), ('Adam', (12, 16, 1949)), ('Ellie', (11, 28, 1924)), ('Harry', (8, 15, 1924)), ('Iggy', (12, 29, 1924)), ('Jack', (2, 21, 1920))]
It organizes the list by year, but doesn't seem to care about the month or day.
Most of the questions I found here had their dates in the form YYYY-MM-DD.
sorted(xs, reverse=True, key=lambda (name, (m,d,y)): (y,m,d))
Converting to datetime.date isn't necessary, but it is perhaps a better expression of the meaning.
import datetime
sorted(xs, reverse=True, key=lambda (name, (m,d,y)): datetime.date(y,m,d))
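The two lambdas above unpack tuples in their parameter list, which only works in Python 2; Python 3 removed that syntax (PEP 3113). A Python 3 sketch of the same idea, with the unpacking moved into a named key function (birth_date is a hypothetical name, and xs here is a shortened copy of the question's list):

```python
import datetime

def birth_date(entry):
    """Build a date from a (name, (month, day, year)) entry."""
    _, (m, d, y) = entry
    return datetime.date(y, m, d)

xs = [('Adam', (12, 16, 1949)), ('Ellie', (11, 28, 1924)),
      ('Harry', (8, 15, 1924)), ('Iggy', (12, 29, 1924))]
youngest_first = sorted(xs, key=birth_date, reverse=True)
names = [name for name, _ in youngest_first]
print(names)
```

A side benefit of datetime.date is that an impossible date such as month 13 raises a ValueError instead of being sorted silently.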
Make the key function return a tuple that contains year, month, day, in that order.
def by_day(t):
    m, d, y = t[1]
    return y, m, d
def sort_ages(a:list):
    return sorted(a, key=by_day, reverse=True)
example output for the given list:
[('Daniel', (11, 29, 1990)),
('Charlie', (9, 4, 1988)),
('Feenie', (2, 10, 1954)),
('Adam', (12, 16, 1949)),
('Iggy', (12, 29, 1924)),
('Ellie', (11, 28, 1924)),
('Harry', (8, 15, 1924)),
('Jack', (2, 21, 1920))]
def sort_ages(a:list):
    return sorted(a, key=lambda x:(x[1][2], x[1][0], x[1][1]), reverse=True)
This sorts by year, then month, then day.

removing something from a list of tuples

Say I have a list:
[(12,34,1),(123,34,1),(21,23,1)]
I want to remove the 1 from each tuple in the list so it becomes
[(12,34),(123,34),(21,23)]
You want to truncate your tuples, use a list comprehension:
[t[:-1] for t in listoftuples]
or, as a simple demonstration:
>>> listoftuples = [(12,34,1),(123,34,1),(21,23,1)]
>>> [t[:-1] for t in listoftuples]
[(12, 34), (123, 34), (21, 23)]
Tuples are immutable, so you cannot remove an item. However, you can create a new tuple from the old one that leaves out the elements you do not want. So, to delete every occurrence of a given value from each tuple in a list of tuples, you can do:
def deleteItem(lst, toDel):
    return [tuple(x for x in y if x != toDel) for y in lst]
Result:
>>> lst = [(12,34,1),(123,34,1),(21,23,1)]
>>> deleteItem(lst, 1)
[(12, 34), (123, 34), (21, 23)]
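>>> a=[(12, 34, 1), (123, 34, 1), (21, 23, 1)]
>>> [tuple(filter(lambda v: v != 1, x)) for x in a]
[(12, 34), (123, 34), (21, 23)]
This will remove every 1 from each tuple, irrespective of its index. (The tuple() call is needed on Python 3, where filter returns an iterator rather than a tuple.)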
Since you can't change tuples (as they are immutable), I suggest using a list of lists:
my_list = [[12,34,1],[123,34,1],[21,23,1]]
for i in my_list:
    i.remove(1)
print(my_list)
This prints: [[12, 34], [123, 34], [21, 23]].
python 3.2
1. [(i,v)for i,v,c in list1]
2. list(map(lambda x:x[:2],list1))
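The answers above fall into two groups that only agree when the unwanted value appears exactly once, in the last position: slicing drops by position, filtering drops by value. A short sketch with made-up data showing the difference:

```python
rows = [(12, 34, 1), (1, 34, 1)]

by_position = [t[:-1] for t in rows]                      # drop the last item
by_value = [tuple(x for x in t if x != 1) for t in rows]  # drop every 1

print(by_position)
print(by_value)
```

The second tuple comes out as (1, 34) with slicing but as (34,) with filtering, so the choice depends on whether the position or the value defines what should go.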
