Parse sublists of a list in Python

In Python, I need to parse a list consisting of sublists. If the first elements of some sublists are the same, I need to pick the sublist with the smallest 4th element; but if the 4th elements are also the same, then I need to select the sublist with the higher 3rd element. For example, in the following list, I need to select sublists 1, 4 and 5.
alignments=[["A","B","10","4"],["A","C","15","8"],["A","E","20","10"],\
["D","C","15","3"],\
["G","U","1","9"],["G","O","10","9"]]
I achieved it with the code below which is very cumbersome:
best_alignments=[]
best_al=alignments[0]
k=0
c=0
counter_list=[]
for al in alignments[1:]:
    c+=1
    if best_al[0]==al[0]:
        if best_al[3]==al[3]:
            if best_al[2]<al[2]:
                best_al=al
                counter_list.append(c-1)
            else:
                counter_list.append(c)
        else:
            counter_list.append(c)
    else:
        if k==0:
            best_al=al
            k+=1
        else:
            best_al=al
for index in sorted(counter_list, reverse=True):
    del alignments[index]
for el in alignments:
    print(el)
I am sure there is a much easier way to do that. Any suggestions are appreciated.

Here's a method that essentially does two passes over the data. First, it groups the data by the first item. Then, it returns the maximum as defined by your criteria, the least of the third element, and the most of the fourth (assuming you meant the integer value of the string).
from collections import defaultdict
def foo(alignments):
    grouped = defaultdict(list)
    for al in alignments:
        grouped[al[0]].append(al)
    return [
        max(v, key=lambda al: (-int(al[2]),int(al[3])))
        for v in grouped.values()
    ]
Pretty sure this is O(N) space and time, so not terribly inefficient.
In an IPython REPL:
In [3]: from collections import defaultdict
   ...: def foo(alignments):
   ...:     grouped = defaultdict(list)
   ...:     for al in alignments:
   ...:         grouped[al[0]].append(al)
   ...:     return [
   ...:         max(v, key=lambda al: (-int(al[2]),int(al[3])))
   ...:         for v in grouped.values()
   ...:     ]
   ...:
In [4]: foo([['A', 'B', '10', '4'],
   ...:      ['A', 'C', '15', '8'],
   ...:      ['A', 'E', '20', '10'],
   ...:      ['D', 'C', '15', '3'],
   ...:      ['G', 'U', '1', '9'],
   ...:      ['G', 'O', '10', '9']])
Out[4]: [['A', 'B', '10', '4'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]
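If the rule is instead read literally as worded in the question (smallest 4th element, ties broken by the larger 3rd element), the same grouping idea can be kept and only the selection key changes. This is a minimal sketch under that reading, assuming the numeric strings should be compared as integers; the name pick_best is made up for illustration.

```python
from collections import defaultdict

def pick_best(alignments):
    # Group sublists by their first element (dicts keep insertion order).
    grouped = defaultdict(list)
    for al in alignments:
        grouped[al[0]].append(al)
    # Smallest 4th element wins; ties go to the larger 3rd element.
    return [
        min(group, key=lambda al: (int(al[3]), -int(al[2])))
        for group in grouped.values()
    ]

alignments = [["A", "B", "10", "4"], ["A", "C", "15", "8"], ["A", "E", "20", "10"],
              ["D", "C", "15", "3"],
              ["G", "U", "1", "9"], ["G", "O", "10", "9"]]
print(pick_best(alignments))
# [['A', 'B', '10', '4'], ['D', 'C', '15', '3'], ['G', 'O', '10', '9']]
```

Under this reading the "G" group yields ['G', 'O', '10', '9'], so which key to use depends on whether the worded rule or the example in the question is authoritative.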

This is a sorted grouping where the sort order has multiple fields with different ascending/descending sequence. So you can sort the list in accordance with the fields and sequence, then pick the first occurrence of items based on the sublist's first element:
a = [["A","B","10","4"],["A","C","15","8"],["A","E","20","10"],
["D","C","15","3"],
["G","U","1","9"],["G","O","10","9"]]
seen = set()
sortKey = lambda sl: (sl[0],-int(sl[3]),sl[2])
first = lambda sl: sl[0] not in seen and not seen.add(sl[0])
result = [ sl for sl in sorted(a,key=sortKey) if first(sl) ]
print(result)
# [['A', 'E', '20', '10'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]
This uses the sorted function's key parameter to produce a sort order that combines the 3 fields (reversing the order for the second sort field). It then filters the sorted list, using a set to identify the first occurrence of each sublist's first field.
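The "sort, then keep the first row of each group" step can also be written with itertools.groupby instead of a set with a side effect. This is only a sketch: first_per_group is a hypothetical helper, and it assumes the sort key starts with the sublist's first element so that equal first elements end up adjacent.

```python
from itertools import groupby

def first_per_group(rows, sort_key):
    # sorted() places rows with the same first element next to each other,
    # which is what groupby needs to form one group per first element.
    ordered = sorted(rows, key=sort_key)
    return [next(group) for _, group in groupby(ordered, key=lambda sl: sl[0])]

a = [["A", "B", "10", "4"], ["A", "C", "15", "8"], ["A", "E", "20", "10"],
     ["D", "C", "15", "3"],
     ["G", "U", "1", "9"], ["G", "O", "10", "9"]]
# Same sort key as in the answer above.
sort_key = lambda sl: (sl[0], -int(sl[3]), sl[2])
print(first_per_group(a, sort_key))
# [['A', 'E', '20', '10'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]
```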

Related

When I want to remove an element from the list, it deletes it incorrectly

pasw = "1234abc"
mylist = list(pasw)
a = list(map(lambda x: mylist.remove(x) if x.isnumeric() == True else False, mylist))
print(mylist)
Output:
['2', '4', 'a', 'b', 'c']
I want to check if there is a number in the list, and if there is a number, I want to delete it from the list.
As a general rule, it's not recommended to modify a sequence while you are iterating over it. The function below behaves similarly to your map.
def deleting_while_iterating(iterable):
    for i in iterable:
        iterable.remove(i)
        print(f"i: {i}, iterable: {iterable}")
If I give this function your input, the output is:
i: 1, iterable: ['2', '3', '4', 'a', 'b', 'c']
i: 3, iterable: ['2', '4', 'a', 'b', 'c']
i: a, iterable: ['2', '4', 'b', 'c']
i: c, iterable: ['2', '4', 'b']
As you can see, after the first iteration, "2", which was originally at index 1, is now at index 0. However, the iteration has moved on to index 1, so "2" is skipped. That's why it's better to create a new list containing only the elements you want. There are different ways to do it.
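One such way, shown here as a minimal sketch, is a list comprehension that builds a new list and leaves the original untouched. The name without_digits is chosen for illustration.

```python
pasw = "1234abc"

# Build a new list with only the non-numeric characters,
# instead of removing items from a list while looping over it.
without_digits = [ch for ch in pasw if not ch.isnumeric()]
print(without_digits)  # ['a', 'b', 'c']
```

If the same list object has to be updated in place, assigning the result to mylist[:] replaces its contents without mutating it during iteration.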

How to write complex sort in python?

Is there a concise way to sort a list by first sorting numbers in ascending order and then sort the characters in descending order?
How would you sort the following:
['2', '4', '1', '6', '7', '4', '2', 'K', 'A', 'Z', 'B', 'W']
To:
['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
One way (there might be better ones) is to separate digits and letters beforehand, sort them appropriately and glue them again together in the end:
lst = ['2', '4', '1', '6', '7', '4', '2', 'K', 'A', 'Z', 'B', 'W']
numbers = sorted([number for number in lst if number.isdigit()])
letters = sorted([letter for letter in lst if not letter.isdigit()], reverse=True)
combined = numbers + letters
print(combined)
Another way makes use of ord(...) and the ability to sort by tuples. Here we use zero for numbers and one for letters:
def sorter(item):
    if item.isdigit():
        return 0, int(item)
    else:
        return 1, -ord(item)
print(sorted(lst, key=sorter))
Both will yield
['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
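One caveat, in case the input can contain multi-digit number strings (an assumption, since the question only shows single digits): sorting digit strings as plain text would place '10' before '2'. A small variation of the split-and-join approach that compares the number strings by their integer value might look like this, with a made-up sample list:

```python
# Hypothetical input containing a multi-digit number string.
lst = ['2', '10', '1', 'K', 'A', 'Z']

# key=int compares the number strings by value rather than as text.
numbers = sorted((s for s in lst if s.isdigit()), key=int)
letters = sorted((s for s in lst if not s.isdigit()), reverse=True)
combined = numbers + letters
print(combined)  # ['1', '2', '10', 'Z', 'K', 'A']
```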
As for timing:
def different_lists():
    global my_list
    numbers = sorted([number for number in my_list if number.isdigit()])
    letters = sorted([letter for letter in my_list if not letter.isdigit()], reverse=True)
    return numbers + letters
def key_function():
    global my_list
    def sorter(item):
        if item.isdigit():
            return 0, int(item)
        else:
            return 1, -ord(item)
    return sorted(my_list, key=sorter)
from timeit import timeit
print(timeit(different_lists, number=10**6))
print(timeit(key_function, number=10**6))
This yields (running it a million times on my MacBook):
2.9208732349999997
4.54283629
So the approach with list comprehensions is faster here.
To elaborate on the custom-comparison approach: in Python the built-in sort does key comparison.
How to think about the problem: to group values and then sort each group by a different quality, we can think of "which group is a given value in?" as a quality - so now we are sorting by multiple qualities, which we do with a key that gives us a tuple of the value for each quality.
Since we want to sort the letters in descending order, and we can't "negate" them (in the arithmetic sense), it will be easiest to apply reverse=True to the entire sort, so we keep that in mind.
We encode: True for digits and False for non-digits (since numerically, these are equivalent to 1 and 0 respectively, and we are sorting in descending order overall). Then for the second value, we'll use the symbol directly for non-digits; for digits, we need the negation of the numeric value, to re-reverse the sort.
This gives us:
def custom_key(value):
    numeric = value.isdigit()
    return (numeric, -int(value) if numeric else value)
And now we can do:
my_list.sort(key=custom_key, reverse=True)
which works for me (and also handles multi-digit numbers):
>>> my_list
['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
You will have to implement your own function and pass it as the key argument to the sorted function. What you are seeking is not a trivial comparison: you "assign" custom values to the items, so you have to let Python know how you rank each of them.
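If an explicit comparison function feels more natural than a key, functools.cmp_to_key can wrap it so that sorted accepts it. The compare function below is only an illustrative sketch of the ordering asked for in the question (digits first in ascending order, then letters in descending order):

```python
from functools import cmp_to_key

def compare(a, b):
    # Return a negative number if a should come before b, positive if after.
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num != b_num:
        return -1 if a_num else 1      # digits come before letters
    if a_num:
        return int(a) - int(b)         # numbers ascending
    return (b > a) - (b < a)           # letters descending

lst = ['2', '4', '1', '6', '7', '4', '2', 'K', 'A', 'Z', 'B', 'W']
print(sorted(lst, key=cmp_to_key(compare)))
# ['1', '2', '2', '4', '4', '6', '7', 'Z', 'W', 'K', 'B', 'A']
```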

How to keep unique inner lists within a list of lists by ignoring one element of the inner list

I have a list of lists, and would like to keep the unique lists by ignoring one element of the list.
MWE:
my_list_of_lists = [['b','c','1','d'],['b','c','1','d'],['b','c','2','e']]
print(my_list_of_lists)
new_list_of_lists = []
for the_list in my_list_of_lists:
    if the_list not in new_list_of_lists:
        new_list_of_lists.append(the_list)
print(new_list_of_lists)
MWE Output:
[['b', 'c', '1', 'd'], ['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']] # 1st print
[['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']] # 2nd print
Question:
Is there a Pythonic way to remove duplicates as with the example above by ignoring a specific element within the inner list?
i.e. my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']] should yield [['b','c','1','d'],['b','c','2','e']]
my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']]
# my_list_of_lists[0] and my_list_of_lists[1] are identical
# if my_list_of_lists[n][-2] is ignored
print(my_list_of_lists)
new_list_of_lists = []
for the_list in my_list_of_lists:
    if the_list[ignore:-2] not in new_list_of_lists: #ignore the second last element when comparing
        new_list_of_lists.append(the_list)
print(new_list_of_lists)
This is not "Pythonic" per se, but it is relatively short and gets the job done:
my_list_of_lists = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']]
print(my_list_of_lists)
new_list_of_lists = []
ignore = 2
for the_list in my_list_of_lists:
    if all(
        any(e != other_list[i]
            for i, e in enumerate(the_list)
            if i != ignore)
        for other_list in new_list_of_lists
    ):
        new_list_of_lists.append(the_list)
print(new_list_of_lists)
It outputs [['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']] for the given input.
My question and your reply from the comments:
"ignoring a specific element" - Which one? The first? The largest? The one that's a digit? Some other rule? Your example input doesn't specify. – Heap Overflow
@HeapOverflow, I think a generic (non-specific) function would be best as other users in the future can integrate this generic function for their own use. – 3kstc
Doing that @GreenCloakGuy's style:
def unique(values, key):
    return list({key(value): value for value in values}.values())
new_list_of_lists = unique(my_list_of_lists, lambda a: tuple(a[:2] + a[3:]))
A bit shorter:
def unique(values, key):
    return list(dict(zip(map(key, values), values)).values())
Those take the last duplicate. If you want the first, you could use this:
def unique(values, key):
    tmp = {}
    for value in values:
        tmp.setdefault(key(value), value)
    return list(tmp.values())
The following approach:
1. Creates a dict where the values are the lists in the list-of-lists, and the corresponding keys are those lists without the indices you want to ignore, converted to tuples (since lists cannot be used as dict keys but tuples can)
2. Gets the dict's values, which should appear in order of insertion
3. Converts that to a list, and returns it
This preserves the later elements in the original list, as they overwrite earlier elements that are 'identical'.
def filter_unique(list_of_lists, indices_to_ignore):
    return list({
        tuple(elem for idx, elem in enumerate(lst) if idx not in indices_to_ignore) : lst
        for lst in list_of_lists
    }.values())
mlol = [['b','c','1','d'],['b','c','3','d'],['b','c','2','e']]
print(filter_unique(mlol, [2]))
# [['b', 'c', '3', 'd'], ['b', 'c', '2', 'e']]
print(filter_unique(mlol, [3]))
# [['b', 'c', '1', 'd'], ['b', 'c', '3', 'd'], ['b', 'c', '2', 'e']]
That's a one-liner, abusing a dict comprehension. A multi-line version might look like this:
def filter_unique(list_of_lists, indices_to_ignore):
    dct = {}
    for lst in list_of_lists:
        key = []
        for idx, elem in enumerate(lst):
            if idx not in indices_to_ignore:
                key.append(elem)
        dct[tuple(key)] = lst
    return list(dct.values())
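The versions above keep the last of several "identical" sublists, whereas the expected output in the question keeps the first. A possible variant that keeps the first occurrence, using dict.setdefault, is sketched below; the name filter_unique_keep_first is hypothetical.

```python
def filter_unique_keep_first(list_of_lists, indices_to_ignore):
    ignored = set(indices_to_ignore)
    seen = {}
    for lst in list_of_lists:
        # Key: the sublist without the ignored positions, as a hashable tuple.
        key = tuple(elem for idx, elem in enumerate(lst) if idx not in ignored)
        # setdefault only stores lst if the key has not been seen before.
        seen.setdefault(key, lst)
    return list(seen.values())

mlol = [['b', 'c', '1', 'd'], ['b', 'c', '3', 'd'], ['b', 'c', '2', 'e']]
print(filter_unique_keep_first(mlol, [2]))
# [['b', 'c', '1', 'd'], ['b', 'c', '2', 'e']]
```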

How to convert only parts of a string list into integers

I have a list of lists like:
[['c', '2', '3', '4', 'd', '1'], ['e', '14', '16', '18', 'f', '1'], etc.]
They all follow the same pattern (one character string, 3 number strings, one character string, one number string). I would like to convert all of the number strings into integers and am having difficulties doing so.
I have tried an exception loop, which doesn't seem to be working (I'm not sure why).
I know it's targeting the sublists, as originally I got a ValueError saying int() doesn't recognise 'c' in base 10 (the first letter in the first element of the sublist).
rows = []
with open(path) as infile:
    for line in infile:
        line = line.strip()
        if not line:
            continue
        try:
            [[int(i) for i in sub] for i in rows for sub in i]
        except ValueError:
            pass
        rows.append(line.split("\t"))
del rows[0]
When I print the results with the exception loop in it, it still produces a list of lists as if the exception wasn't there in the first place.
e.g.
[['c', '2', '3', '4', 'd', '1'], ['e', '14', '16', '18', 'f', '1'], etc.]
whereas I expect it to be:
[['c', 2, 3, 4, 'd', 1], ['e', 14, 16, 18, 'f', 1], etc.]
It is a data set analysis, so a requirement for it is to remain in this list of list format (so I can't target just a list using rows.append, as it changes how to split the final data). I was thinking if I can't get this to work, I might experiment with a full list to tuple conversion, with an exception loop for characters and then attempt to convert and split it back into a list of lists. Any help or understanding why this loop isn't working would be very appreciated, or other ways to get this result.
Thank you!
Use:
print([[int(x) if x.isdigit() else x for x in i] for i in rows])
Full code:
rows = []
with open(path) as infile:
    for line in infile:
        line = line.strip()
        if not line:
            continue
        rows.append(line.split("\t"))
rows = [[int(x) if x.isdigit() else x for x in i] for i in rows]
del rows[0]
If you don't want to rely on autodetect, follow your data format:
inp = [['c', '2', '3', '4', 'd', '1'], ['e', '14', '16', '18', 'f', '1']]
out = [[c1, int(d1), int(d2), int(d3), c2, int(d4)] for c1, d1, d2, d3, c2, d4 in inp]
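Another option, closer in spirit to the try/except attempt in the question, is to wrap int() in a small helper and apply it to every element. This is a sketch only: to_int_if_possible is a made-up name, and the sample rows (including the negative number, which str.isdigit would not recognise) are invented for illustration.

```python
def to_int_if_possible(text):
    # Try the conversion for a single element; fall back to the original string.
    try:
        return int(text)
    except ValueError:
        return text

# Hypothetical sample data in the same shape as the question's rows.
rows = [['c', '2', '3', '4', 'd', '1'], ['e', '14', '-16', '18', 'f', '1']]
converted = [[to_int_if_possible(x) for x in row] for row in rows]
print(converted)
# [['c', 2, 3, 4, 'd', 1], ['e', 14, -16, 18, 'f', 1]]
```

The key difference from the original attempt is that the try/except guards one element at a time, so a single non-numeric string does not abort the conversion of the whole list.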
I see two problems in your code, both in the same line:
[[int(i) for i in sub] for i in rows for sub in i]
First of all, you are using i twice, and you might override its value. Try to replace one of those i with a different letter, for example, j:
[[int(j) for j in sub] for i in rows for sub in i]
The second problem is that this is a list comprehension. You are creating a new list; you are not updating any existing list. You should assign this list to some variable:
rows = [[int(j) for j in sub] for i in rows for sub in i]
Note that int() will still raise ValueError for non-numeric strings such as 'c', so a check for that is needed as well. I saw that user U10-Forward added a nice solution that handles this. I just wanted to explain why your solution is not working ;)

How to split string at index[0] of each sublist, and have each split index in its own original list?

How could you split a string at index 0 of all sublists into separate elements, and have each of the split elements be contained within a copy of the original sublist?
For example:
Original_List = [['a,b','c','d','e','f'],['1','2','3'],['z,y','1','2']]
Desired Result:
Desired_List = [['a','c','d','e','f'],['b','c','d','e','f'],['1','2','3'],['z','1','2'],['y','1','2']]
Also, to add further clarity with one more actual example:
Original_List = [['Contract_ID1,Contract_ID2','Amount','Date'],['Contract_ID3,Contract_ID4','400','Jan1']]
I want every sublist to have only one Contract_ID, but still have the Amount and Date Associated with it
Desired_List = [['Contract_ID1','Amount','Date'],['Contract_ID2','Amount','Date'],['Contract_ID3','400','Jan1'],['Contract_ID4','400','Jan1']]
I can split all strings at index 0 of all sublists with the below, but I can't figure out how I would duplicate the whole list for each split element and then replace the strings that were split with the split element, so that each split element had its own list.
Split_First_Indices_of_Sublists = map(lambda x: x[0].split(","),Original_List)
>>[['a', 'b'], ['1'], ['z', 'y']]
for x in Original_List:
    x.pop(0)
>> [['c', 'd', 'e', 'f'], ['2', '3'], ['1', '2']]
I think it's clearest written out as explicit loops:
Desired_List = []
for li in Original_List:
    for spl in li[0].split(','):
        Desired_List.append([spl] + li[1:])
gives:
Desired_List
Out[153]:
[['a', 'c', 'd', 'e', 'f'],
['b', 'c', 'd', 'e', 'f'],
['1', '2', '3'],
['z', '1', '2'],
['y', '1', '2']]
And of course you can immediately turn this into the equivalent one-line list comp:
[[spl] + li[1:] for li in Original_List for spl in li[0].split(',')]
Which may or may not be less readable, depending on the reader :-)
And as a last note, make sure that this is the data structure you really want. A dict keyed by Contract_ID seems like a very natural structure for your end product.
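As a rough illustration of that last suggestion, the same split can feed a dict keyed by contract ID. This is only a sketch of one possible layout (ID mapped to the remaining fields), using the example data from the question; the name by_contract is made up.

```python
Original_List = [['Contract_ID1,Contract_ID2', 'Amount', 'Date'],
                 ['Contract_ID3,Contract_ID4', '400', 'Jan1']]

# Map each individual contract ID to a copy of the remaining fields,
# so that IDs from the same row do not share one list object.
by_contract = {
    contract_id: list(rest)
    for first, *rest in Original_List
    for contract_id in first.split(',')
}
print(by_contract['Contract_ID4'])  # ['400', 'Jan1']
```

Note that a dict keeps only one entry per key, so this layout assumes each contract ID appears at most once in the data.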
Original_List=[['a,b','c','d','e','f'],['1','2','3'],['z,y','1','2']]
desired_list=[]
for p in Original_List:
    # split only the first element of each sublist on commas
    splited=p[0].split(',')
    for part in splited:
        # build a new sublist for each piece, keeping the remaining elements
        desired_list.append([part]+p[1:])
print(desired_list)
