Concatenate 2 lists in Python based on a shared substring - python

I have two lists, for example they look something like this:
list1 = ["Time1Person001", "Time1Person002", "Time1Person003", "Time1Person004", "Time1Person005"]
list2 = ["Time2Person001", "Time2Person003", "Time2Person004", "Time2Person007"]
I want to create a third list that contains only the strings whose last 3 characters also appear as the last 3 characters of a string in the other list, so the output should be:
list3 = ["Time1Person001", "Time1Person003", "Time1Person004", "Time2Person001", "Time2Person003", "Time2Person004"]
Is there an efficient way to do it?
Thanks!

Create a set of the common endings then filter each list by that set.
list1 = ["Time1Person001", "Time1Person002", "Time1Person003", "Time1Person004", "Time1Person005"]
list2 = ["Time2Person001", "Time2Person003", "Time2Person004", "Time2Person007"]
endings = set(v[-3:] for v in list1) & set(v[-3:] for v in list2)
list3 = [v for v in list1+list2 if v[-3:] in endings]

Is this what you're after:
list3 = [x for x in list1 for y in list2 if x[-3:] == y[-3:]]
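Note that the nested comprehension above yields only elements of list1, and it compares every pair of elements. A minimal sketch of how it could be extended to also collect the matching list2 items (the names from_list1 and from_list2 are made up for illustration):

```python
list1 = ["Time1Person001", "Time1Person002", "Time1Person003", "Time1Person004", "Time1Person005"]
list2 = ["Time2Person001", "Time2Person003", "Time2Person004", "Time2Person007"]

# The nested comprehension keeps an item of list1 once per matching item of list2.
from_list1 = [x for x in list1 for y in list2 if x[-3:] == y[-3:]]

# Mirror check for list2: keep y if any item of list1 shares its last 3 characters.
from_list2 = [y for y in list2 if any(x[-3:] == y[-3:] for x in list1)]

list3 = from_list1 + from_list2
print(list3)
```

If several items in list2 shared the same ending, the first comprehension would repeat the list1 item once per match, which is one reason the set-based answer is usually the safer choice.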

Related

Return matching strings from two lists based on last 4 characters

list1 = ['1_Maths','2_Chemistry','10.1_Geography','12_History']
list2 = ['1_Maths', '2_Physics', '3_Chemistry','11.1_Geography','13_History']
I want to produce two outputs from list1 and list2 based on last 4 characters.
lists = [itm for itm in list1 if itm in list2]
The above only prints 1_Maths. Cannot think of a way to produce all the matching subjects.
last_4char = [sub[-4:] for sub in list1]
This could be an idea, but I'm not sure how to use it to produce the exact results from list1 and list2.
Output
new_list1 = ['1_Maths','2_Chemistry','10.1_Geography','12_History']
new_list2 = ['1_Maths', '3_Chemistry','11.1_Geography','13_History']
Try this:
def common_4_last(list1, list2):
    ends1, ends2 = {k[-4:] for k in list1}, {k[-4:] for k in list2}  # build each set once
    return [[i for i in list1 if i[-4:] in ends2], [i for i in list2 if i[-4:] in ends1]]
This returns a list with 2 elements: one list of the items from list1 and a second list of the items from list2 that share their last 4 characters with an item in the other list.
You can run the function for any pair of lists
For example for your given list1 and list2 result will be:
common_4_last(list1, list2)
[['1_Maths', '2_Chemistry', '10.1_Geography', '12_History'], ['1_Maths', '3_Chemistry', '11.1_Geography', '13_History']]
If you want only the first list, you can get it with
common_4_last(list1, list2)[0], and likewise index [1] gives the second list.
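As a sketch of the same idea with the suffix length as a parameter, the function below builds each set of endings once and returns a tuple that can be unpacked directly. The name common_suffix and the parameter n are hypothetical and do not come from the answer above.

```python
def common_suffix(list1, list2, n=4):
    """Return (items of list1, items of list2) whose last n characters
    also end some item in the other list. Hypothetical helper."""
    ends1 = {s[-n:] for s in list1}
    ends2 = {s[-n:] for s in list2}
    common = ends1 & ends2  # endings present in both lists
    return ([s for s in list1 if s[-n:] in common],
            [s for s in list2 if s[-n:] in common])


list1 = ['1_Maths', '2_Chemistry', '10.1_Geography', '12_History']
list2 = ['1_Maths', '2_Physics', '3_Chemistry', '11.1_Geography', '13_History']

new_list1, new_list2 = common_suffix(list1, list2)
print(new_list1)
print(new_list2)
```

Each input list is scanned twice and every membership test is a set lookup, so the cost grows roughly linearly with the total number of items.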

Create two lists from list of tuples

I want to create two lists based on sorted_bounds with every other tuple.
bounds = [1078.08, 1078.816, 1078.924, 1079.348, 1079.448, 1079.476]
sorted_bounds = list(zip(bounds,bounds[1:]))
print(sorted_bounds)
# -> [(1078.08, 1078.816), (1078.816, 1078.924), (1078.924, 1079.348), (1079.348, 1079.448), (1079.448, 1079.476)]
Desired output:
list1 = [(1078.08, 1078.816), (1078.924, 1079.348), (1079.448, 1079.476)]
list2 = [(1078.816, 1078.924), (1079.348, 1079.448)]
How would I do this? I am completely blanking.
list1 = sorted_bounds[0::2]
list2 = sorted_bounds[1::2]
The third value in brackets is "step", so every second element in this case.
Trying to think of a way of doing this in a single pass, but here's a clean way to do it in two passes:
list1 = [x for i, x in enumerate(sorted_bounds) if not i % 2]
list2 = [x for i, x in enumerate(sorted_bounds) if i % 2]
print(list1)
print(list2)
Result:
[(1078.08, 1078.816), (1078.924, 1079.348), (1079.448, 1079.476)]
[(1078.816, 1078.924), (1079.348, 1079.448)]
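For the single pass mentioned above, one possible sketch is to walk the pairs once and append each to one of two lists depending on whether its index is even or odd. This is just one way to do it, not necessarily faster than slicing.

```python
bounds = [1078.08, 1078.816, 1078.924, 1079.348, 1079.448, 1079.476]
sorted_bounds = list(zip(bounds, bounds[1:]))

list1, list2 = [], []
for i, pair in enumerate(sorted_bounds):
    # Even indices go to list1, odd indices go to list2.
    (list2 if i % 2 else list1).append(pair)

print(list1)
print(list2)
```

In practice the slicing version (sorted_bounds[0::2] and sorted_bounds[1::2]) is shorter and gives the same two lists.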

Loop through 2 list of dictionaries

I have these 2 lists of dictionaries, and I am trying to print all the names from list1 that are not found in list2.
list1=[{'name':'A','color':'1'},
{'name':'B','color':'2'}]
list2=[{'name':'A','color':'3'},
{'name':'C','color':'1'}]
for item in list1:
    for ii in list2:
        if item['name'] != ii['name']:
            print(item['name'])
The output I'm getting is
A
B
B
I expected it to print only B, because there is no B in list2. I'm not sure what I'm doing wrong. Any help would be appreciated.
Thanks
That's (obviously) not the logic of your code. You iterate through all combinations of names, and print the name from list1 every time it differs from a name in list2.
Instead, don't print it until you know it is a mismatch for all of those names:
for item in list1:
    found = False
    for ii in list2:
        if item['name'] == ii['name']:
            found = True
    if not found:
        print(item['name'])
This is the direct change to your implementation. There are one-liners that can do this using comprehensions, all, and other Python capabilities.
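One variant of the flag-based loop, offered as a sketch rather than the answerer's own code, uses Python's for/else: the else branch runs only when the inner loop finishes without hitting break, so no found flag is needed and the search stops at the first match. The list name missing is made up for illustration.

```python
list1 = [{'name': 'A', 'color': '1'},
         {'name': 'B', 'color': '2'}]
list2 = [{'name': 'A', 'color': '3'},
         {'name': 'C', 'color': '1'}]

missing = []
for item in list1:
    for other in list2:
        if item['name'] == other['name']:
            break  # match found, stop searching list2
    else:
        # Runs only if the inner loop never hit break, i.e. no match at all.
        missing.append(item['name'])

print(missing)  # ['B']
```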
Your loop prints in every case where a single comparison fails, not only when no match is found at all.
You can instead use a lookup in a set, which is more efficient:
names2 = {y['name'] for y in list2}  # build the set once, outside the loop
for x in list1:
    if x['name'] not in names2:
        print(x['name'])
Using all(), you can do:
for x in list1:
    if all(x['name'] != y['name'] for y in list2):
        print(x['name'])
Currently in your double for loop you print item['name'] for a mismatch between any two elements of list1 and list2, which is not what you want.
Instead you can convert the names in both lists to a set and take the set difference
list1=[{'name':'A','color':'1'},
{'name':'B','color':'2'}]
list2=[{'name':'A','color':'3'},
{'name':'C','color':'1'}]
#Iterate through both lists and convert the names to a set in both lists
set1 = {item['name'] for item in list1}
set2 = {item['name'] for item in list2}
#Take the set difference to find items in list1 not in list2
output = set1 - set2
print(output)
The output will be
{'B'}
If the names are unique in list1, you can use a set:
list1=[{'name':'A','color':'1'},
{'name':'B','color':'2'}]
list2=[{'name':'A','color':'3'},
{'name':'C','color':'1'}]
set1 = set(d['name'] for d in list1)
missingNames = set1.difference(d['name'] for d in list2) # {'B'}
If they are not unique and you want to match number of instances, you can do it with Counter from collections:
from collections import Counter
count1 = Counter(d['name'] for d in list1)
count2 = Counter(d['name'] for d in list2)
missingNames = list((count1-count2).elements()) # ['B']
With Counter, if you had two entries in list1 with name 'A', then the output would have been ['A','B'] since only one of the two 'A' in list1 would find a match in list2
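To make the duplicate case concrete, here is a small sketch with a second 'A' entry added to list1 (the extra entry and its color value are invented for the example):

```python
from collections import Counter

list1 = [{'name': 'A', 'color': '1'},
         {'name': 'A', 'color': '5'},   # extra 'A', added for illustration
         {'name': 'B', 'color': '2'}]
list2 = [{'name': 'A', 'color': '3'},
         {'name': 'C', 'color': '1'}]

count1 = Counter(d['name'] for d in list1)
count2 = Counter(d['name'] for d in list2)

# Counter subtraction keeps only positive counts: one 'A' is cancelled, one remains.
missingNames = list((count1 - count2).elements())
print(missingNames)

# A plain set difference ignores multiplicity and reports only 'B'.
print(set(count1) - set(count2))
```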

Matching characters in strings

I have 2 lists of strings:
list1 = ['GERMANY','FRANCE','SPAIN','PORTUAL','UK']
list2 = ['ERMANY','FRANCE','SPAN','PORTUGAL','K']
I want to obtain a list containing only the strings from list2 that are one character shorter than the string at the same position in list1, i.e.:
final_list = ['ERMANY','SPAN','K']
What's the best way to do it? Using regular expressions?
Thanks
You can try this:
list1 = ['GERMANY','FRANCE','SPAIN','PORTUGAL','UK']
list2 = ['ERMANY','FRANCE','SPAN','PORTUGAL','K']
new = [a for a, b in zip(list2, list1) if len(a) < len(b)]
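The length comparison above accepts any shorter string, even one with different letters or more than one character missing. If a stricter test is wanted, a possible sketch is to check that deleting exactly one character from the longer string gives the shorter one. The helper name is_one_deletion is hypothetical.

```python
def is_one_deletion(short, full):
    """True if `short` equals `full` with exactly one character removed.
    Hypothetical helper, not part of the answer above."""
    if len(full) - len(short) != 1:
        return False
    # Try removing each position of `full` in turn.
    return any(full[:i] + full[i + 1:] == short for i in range(len(full)))


list1 = ['GERMANY', 'FRANCE', 'SPAIN', 'PORTUGAL', 'UK']
list2 = ['ERMANY', 'FRANCE', 'SPAN', 'PORTUGAL', 'K']

final_list = [a for a, b in zip(list2, list1) if is_one_deletion(a, b)]
print(final_list)  # ['ERMANY', 'SPAN', 'K']
```

No regular expression is needed for this; plain slicing is enough.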

compare two lists and print out unequal elements

I have two lists in the following format:
list1 = ['A','B','C','D']
list2 = [('A',1),('B',2),('C',3)]
I want to compare the two lists and print out a third list which will have those elements present in list1 but not in list2 and I want to compare only the list2[i][0] elements.
I tried the below code:
fin = [i for i in list1 if i not in list2]
But it prints all the elements in list1. I want the output in the above case to be :
fin = ['D']
Could somebody please suggest how to do that?
Also, I do not want to convert my 2D array list2 to 1D array.
Use the set difference.
set(list1) - set(i[0] for i in list2)
You can do this as well (you need to compare i with the first element of each tuple in list2):
fin = [i for i in list1 if i not in map(lambda t: t[0], list2)]
How about nested comprehensions:
fin = [a for a in list1 if a not in [b for b,_ in list2]]
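A sketch that combines the two ideas: collect the first element of each tuple into a set once, then filter list1 with a comprehension. Unlike the plain set difference this keeps the order of list1, and list2 itself is left unchanged. The name keys is made up for illustration.

```python
list1 = ['A', 'B', 'C', 'D']
list2 = [('A', 1), ('B', 2), ('C', 3)]

# Build the lookup set once; list2 stays a list of tuples.
keys = {key for key, _ in list2}

fin = [item for item in list1 if item not in keys]
print(fin)  # ['D']
```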
