Fastest way to match two list fields python

I have a performance problem in my latest Python script.
In essence, I have two lists, e.g.
List1:
([a,1],[b,2])
List2:
([a,3],[b,4])
In the example above I have provided two entries in each list. However, in reality there are about 150,000.
In my current script I retrieve the first field from the first list [a] and loop through the entire List2 until there is a match. The two list entries are then combined.
The final result would be:
([a,1,3],[b,2,4])
However, given the size of the lists, this is taking forever.
Is there a way I can use the field of list1 [a] and, in constant time, retrieve all entries in list2 that have [a]?
I have seen some answers online suggesting sets, but I am unsure how to implement one and use it to solve the problem above.
Any help would be appreciated.
Further example:
l1=(['abc123','hi'], ['efg456','bye']) - l1 has around 2000 tuples
l2=(['abc123','letter'],['abc123','john'],['abc123','leaf']) - l2 has around 100,000+ tuples
Output:
l3=(['abc123','hi','letter'],['abc123','hi','john'],['abc123','hi','leaf'])

Not so hard: just use a dict for list1 and a for loop over list2.
dict1 = {key1: [value1] for key1, value1 in list1}  # convert list1 to a dict
# the values are wrapped in lists so that more values can be appended
for key2, value2 in list2:
    try:
        dict1[key2].append(value2)
    except KeyError:
        continue  # I'm not sure what you want to do if a key in list2 doesn't exist in list1, so just ignore it
list3 = tuple([key3, *value3] for key3, value3 in dict1.items())
print(list3)
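For the asker's "Further example", where every l2 entry should produce its own output row, the same dictionary idea can be sketched as below. This is a minimal sketch that assumes the keys in l1 are unique; the names lookup and l3 are illustrative and not taken from the asker's script.

```python
l1 = (['abc123', 'hi'], ['efg456', 'bye'])
l2 = (['abc123', 'letter'], ['abc123', 'john'], ['abc123', 'leaf'])

# Build the lookup table once (assumption: each key appears only once in l1).
lookup = {key: value for key, value in l1}

# One dict lookup per l2 entry, average constant time each;
# l2 entries whose key is missing from l1 are skipped.
l3 = tuple([key, lookup[key], value] for key, value in l2 if key in lookup)

print(l3)
```

Building the dict costs one pass over l1 and the join costs one pass over l2, instead of scanning l2 once for every l1 entry.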

If your a and b values are unique, you can convert the "lists" (what you have is actually a tuple of lists, not a list of lists) into dictionaries and then merge them. For example:
l1 = (['a', 1], ['b', 2], ['c', 5])
l2 = (['a', 3], ['b', 4])
d1 = { k : [v] for [k, v] in l1 }
d2 = { k : [v] for [k, v] in l2 }
for k in d1.keys():
    d1[k] += d2.get(k, [])
print(d1)
Output:
{'a': [1, 3], 'b': [2, 4], 'c': [5]}
You can convert that dictionary back to a tuple of lists using a comprehension:
print(tuple([k, *v] for k, v in d1.items()))
Output:
(['a', 1, 3], ['b', 2, 4], ['c', 5])

Related

how can I merge lists to create a python dictionary

So I have tried many methods to do this but could not find a working solution. In this problem, I have two python arrays and I would like to join them to create a big dictionary. It would go something like this:
`list1 = [
[2, "ford"],
[4,"Ferrari"],
[3, "Mercedes"],
[1, "BMW"]
]`
`list2 = [
[4, "mustang"],
[3,"LaFerrari"],
[2,"CLA"],
[1,"M5"],
[6,"opel"]
]`
The result that I would like to have is a dictionary that looks like this:
`result = {
1: ["BMW","M5"], 2: ["ford","CLA"], 3: ["Mercedes","LaFerrari"], 4: ["Ferrari","mustang"], 6:["opel"]
}`
So it just basically needs to merge these two arrays based on the "key" (which is just the [0] place in the array)

It looks like a task for collections.defaultdict. I would do:
import collections
list1 = [
[1, "ford"],
[2,"Ferrari"],
[3, "Mercedes"],
[4, "BMW"]
]
list2 = [
[1, "mustang"],
[2,"LaFerrari"],
[3,"CLA"],
[4,"M5"]
]
result = collections.defaultdict(list)
for key, value in list1:
    result[key].append(value)
for key, value in list2:
    result[key].append(value)
result = dict(result)
print(result)
Output:
{1: ['ford', 'mustang'], 2: ['Ferrari', 'LaFerrari'], 3: ['Mercedes', 'CLA'], 4: ['BMW', 'M5']}
Here I used a defaultdict with lists. Unlike an ordinary dict, if you access a key which does not exist yet, it first stores list() (i.e. an empty list) under that key and then performs the requested action (appending, in this case). At the end I convert it into a dict just to fulfill your requirement (create a Python dictionary).
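The difference described above can be seen in a small side-by-side sketch. The variable names plain, auto and plain_raised are made up for this illustration.

```python
from collections import defaultdict

plain = {}
try:
    plain["missing"].append(1)  # an ordinary dict raises KeyError for an unknown key
except KeyError:
    plain_raised = True
else:
    plain_raised = False

auto = defaultdict(list)
auto["missing"].append(1)  # defaultdict stores list() under the key first, then appends

print(plain_raised, dict(auto))
```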

Use collections.defaultdict
from collections import defaultdict
result = defaultdict(list)
for k, v in list1 + list2:
    result[k].append(v)
print(dict(result))
# {2: ['ford', 'CLA'], 4: ['Ferrari', 'mustang'], 3: ['Mercedes', 'LaFerrari'], 1: ['BMW', 'M5'], 6: ['opel']}
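One possible variation, not part of the answer itself: list1 + list2 builds a temporary concatenated list, which itertools.chain avoids by iterating over both inputs in turn. A sketch using the question's data:

```python
from collections import defaultdict
from itertools import chain

list1 = [[2, "ford"], [4, "Ferrari"], [3, "Mercedes"], [1, "BMW"]]
list2 = [[4, "mustang"], [3, "LaFerrari"], [2, "CLA"], [1, "M5"], [6, "opel"]]

result = defaultdict(list)
# chain() yields the items of list1 and then list2 without copying them into a new list.
for k, v in chain(list1, list2):
    result[k].append(v)
result = dict(result)

print(result)
```

For lists of this size the difference is negligible; it may matter only when the inputs are very large.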

I am also pretty new to Python, but I think something like this should work if both lists have the same keys:
list1 = [
[1, "ford"],
[2, "Ferrari"],
[3, "Mercedes"],
[4, "BMW"]
]
list2 = [
[1, "mustang"],
[2, "LaFerrari"],
[3, "CLA"],
[4, "M5"]
]
dict1 = dict(list1)
dict2 = dict(list2)
result = {}
for key, val in dict1.items():
    result[key] = [val]
for key, val in dict2.items():
    result[key].append(val)
print(result)
Output:
{1: ['ford', 'mustang'], 2: ['Ferrari', 'LaFerrari'], 3: ['Mercedes', 'CLA'], 4: ['BMW', 'M5']}
As already mentioned, I am a newbie too, so there is probably a more "pythonic" way of doing this.

First, create a dict using the values in list1. Then update the lists in the dict with the values from list2, or create new lists for the keys in list2 which don't exist in list1:
result = {i: [j] for i, j in list1}  # create initial dict from all values in list1
for i, j in list2:
    if i in result:
        result[i].append(j)  # add to preexisting list corresponding to key
    else:
        result[i] = [j]  # create new list corresponding to key
If your sublists can hold more than one value after the key, you can use this version, where the add logic is handled in a separate function:
result = {}
def add_to_dict(d, key, val):
    if key in d:
        d[key].append(val)
    else:
        d[key] = [val]
for el in (list1 + list2):
    key, *vals = el
    for val in vals:
        add_to_dict(result, key, val)
Here, rather than assuming each sublist has only 2 elements, we unpack the first element as the key and the rest of the elements into a list called vals. Then we iterate over vals and apply the same adding logic to each value.

If you're sure that both lists contain the same number of items and that both have a matching first element in each item (1, 2, 3, 4 in your example), you can use:
result = {k: [dict(list1)[k], dict(list2)[k]] for k in dict(list1)}
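The one-liner above calls dict(list1) and dict(list2) again for every key. A sketch that converts each list only once, under the same assumption that both lists have exactly the same keys (d1 and d2 are illustrative names):

```python
list1 = [[1, "ford"], [2, "Ferrari"], [3, "Mercedes"], [4, "BMW"]]
list2 = [[1, "mustang"], [2, "LaFerrari"], [3, "CLA"], [4, "M5"]]

# Convert each list to a dict a single time instead of once per key.
d1 = dict(list1)
d2 = dict(list2)

# Assumes every key of d1 also exists in d2; a missing key would raise KeyError.
result = {k: [d1[k], d2[k]] for k in d1}

print(result)
```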

Python: sort elements of first list by elements of second list

result_dict = {"Liz": 4, "Garry": 4, "Barry": 6}
list1 = []
for m in sorted(result_dict, key=result_dict.get, reverse=True):
    list1.append(m)
After that we have two lists:
list1 = ["Barry", "Liz", "Garry"]
list2 = ["Garry", "Liz", "Barry"]
I want the output to work like this: if elements have the same value in the dict, they should appear in list1 in the same order as in list2. For example, if Garry comes before Liz in list2, he should also come first in list1, right after "Barry":
list1 = ["Barry", "Garry", "Liz"]

The key function can return a tuple to break ties. So in your case
d = {"Liz": 4, "Garry": 4, "Barry": 6}
list2 = ["Garry", "Liz", "Barry"]
list1 = sorted(d, key=lambda x: (d.get(x), -list2.index(x)), reverse=True)
print(list1)
will print
['Barry', 'Garry', 'Liz']

You need to use as a key the combination of your current key with the positions on the second list, something like this:
dict = {'Liz': 4, 'Garry': 4, 'Barry': 6}
list2 = ['Garry', 'Liz', 'Barry']
dict2 = {key: i for i, key in enumerate(list2)}
list1 = sorted(dict, key=lambda x: (dict.get(x), -1*dict2.get(x)), reverse=True)
print(list1)
Output
['Barry', 'Garry', 'Liz']
This approach is faster for large lists than using list.index. Calling index inside the key function makes the overall complexity O(n^2), which dominates the expected O(n*logn) complexity of the sorting algorithm; using a dictionary keeps it at O(n*logn).
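Another option, offered here as a sketch rather than as either answerer's method, relies on Python's sort being stable: elements with equal keys keep their input order, and this also holds with reverse=True. Sorting list2 itself by the dict values therefore breaks ties in list2 order. It assumes list2 contains exactly the keys of the dict, which is named d here.

```python
d = {"Liz": 4, "Garry": 4, "Barry": 6}
list2 = ["Garry", "Liz", "Barry"]

# Stable sort: "Garry" and "Liz" tie on 4 and keep their relative order from list2.
# Assumption: every name in list2 is a key of d.
list1 = sorted(list2, key=d.get, reverse=True)

print(list1)
```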

How to sort one string using another string

I have one string of letters that are ordered as follows:
List1 = 'ZQXJKVBPYGFWMUCLDRHSNIOATE'
I have another string which is a bunch of characters
List2 = 'AVERT'
I want to order List2 based on List1. eg. List2 should get ordered as,
VRATE
How would I do this in python?

You can use sorted with the following key:
List1 = 'ZQXJKVBPYGFWMUCLDRHSNIOATE'
List2 = 'AVERT'
''.join(sorted(List2, key=List1.index))
# 'VRATE'
Or, for better performance, you could build a dictionary from List1 using enumerate, mapping each character to its index, and sort by looking up each character of List2:
d = {j:i for i, j in enumerate(List1)}
# {'Z': 0, 'Q': 1, 'X': 2, 'J': 3, 'K': 4, ...
''.join(sorted(List2, key = lambda x: d[x]))
# 'VRATE'

This will work:
List1 = 'ZQXJKVBPYGFWMUCLDRHSNIOATE'
List2 = 'AVERT'
List3 = ''
for i in List1:
    if i in List2:
        List3 += i
print(List3)
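One behavioural difference is worth checking before choosing between the approaches: the loop above emits each character of List1 at most once, so repeated characters in List2 are collapsed, while sorting with a key keeps them. A sketch with a hypothetical input that contains a repeated 'A' (position, by_key and by_loop are illustrative names):

```python
List1 = 'ZQXJKVBPYGFWMUCLDRHSNIOATE'
List2 = 'AVERTA'  # hypothetical input, not from the question

# Key-based sort: every character of List2 is kept, duplicates included.
position = {ch: i for i, ch in enumerate(List1)}
by_key = ''.join(sorted(List2, key=position.__getitem__))

# Loop over List1: each matching character is added only once.
by_loop = ''
for ch in List1:
    if ch in List2:
        by_loop += ch

print(by_key, by_loop)
```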

Complexity reduction: Find common elements in lists

Simple set-up: I have a list (roughly 40,000 entries) containing lists of strings (each with 2-15 elements). I want to compare all of the sublists to check if they have a common element (they share at most one). At the end, I want to create a dictionary (graph if you wish) where the index of each sublist is used as a key, and its values are the indices of the other sublists with which it shares common elements.
For example
lst = [['dam', 'aam','adm', 'ada', 'adam'], ['va','ea','ev','eva'], ['va','aa','av','ava']]
should give the following:
dic = {0: [], 1: [2], 2: [1]}
My problem is that I found a solution, but it's very computationally expensive. First, I wrote a function to compute the intersection of two lists:
def intersection(lst1, lst2):
    temp = set(lst2)
    lst3 = [value for value in lst1 if value in temp]
    return lst3
Then I would loop over all the lists to check for intersections:
dic = {}
iter_range = range(len(lst))
# loop over all lists where k != i
for i in iter_range:
    # create range that doesn't contain i
    new_range = list(iter_range)
    new_range.remove(i)
    lst = []
    for k in new_range:
        # check if the lists at position i and k intersect
        if len(intersection(mod_names[i], mod_names[k])) > 0:
            lst.append(k)
    # fill dictionary
    dic[i] = lst
I know that for loops are slow, and that I'm looping over the list unnecessarily often (in the above example, I compare 1 with 2, then 2 with 1), but I don't know how to change it to make the program run faster.

You can create a dict word_occurs_in which records, for each word, the indices of the lists it occurs in. For your sample that would be:
{'dam': [0], 'aam': [0], 'adm': [0], 'ada': [0], 'adam': [0], 'va': [1, 2], 'ea': [1], 'ev': [1], 'eva': [1], 'aa': [2], 'av': [2], 'ava': [2]}
Then you can create a new dict, let's call it result, in which you store the final result, e.g. {0: [], 1: [2], 2: [1]} in your case.
Now, to get result from word_occurs_in, traverse the values of word_occurs_in and check whether each list has more than one element. If it does, then for every index in that list, add all the other indices to that index's entry in result. For instance, when checking the value [1, 2] (for key 'va'), you will add 1 to the value corresponding to key 2 in the result dict and add 2 to the value corresponding to key 1. I hope this helps.
In my understanding, the biggest cost in your code comes from comparing every sublist with every other sublist, i.e. a nested loop over the 40K entries. This approach iterates over the list only once, but uses a bit more space.
Maybe I didn't explain myself sufficiently, so here is the code:
from collections import defaultdict
lst = [['dam', 'aam', 'adm', 'ada', 'adam'], ['va', 'ea', 'ev', 'eva'], ['va', 'aa', 'av', 'ava']]
word_occurs_in = defaultdict(list)
for idx, l in enumerate(lst):
    for i in l:
        word_occurs_in[i].append(idx)
print(word_occurs_in)
result = defaultdict(list)
for v in word_occurs_in.values():
    if len(v) > 1:
        for j in v:
            result[j].extend([k for k in v if k != j])
print(result)
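With a defaultdict, sublists that share nothing (index 0 in the sample) never get an entry in result. A variant sketch that pre-fills every index and uses sets so that no neighbour is recorded twice, producing the exact dic from the question; the names neighbours and dic are chosen for this illustration:

```python
from collections import defaultdict

lst = [['dam', 'aam', 'adm', 'ada', 'adam'],
       ['va', 'ea', 'ev', 'eva'],
       ['va', 'aa', 'av', 'ava']]

# word -> indices of the sublists that contain it
word_occurs_in = defaultdict(list)
for idx, words in enumerate(lst):
    for word in set(words):  # set() guards against a word repeated inside one sublist
        word_occurs_in[word].append(idx)

# Start every index with an empty set so isolated sublists still appear in the output.
neighbours = {idx: set() for idx in range(len(lst))}
for indices in word_occurs_in.values():
    for j in indices:
        neighbours[j].update(k for k in indices if k != j)

dic = {idx: sorted(others) for idx, others in neighbours.items()}
print(dic)
```

Note that a word shared by very many sublists still produces a large number of pairs, so the saving depends on how widely words are shared.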

get the smallest items from a listOfLists group by keys

I've got a list like
listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]
The first item of an inner list is the key. The second item of an inner list is the value.
I want to get the output [['key1', 1], ['key2', 1]]: for each key, the inner list with the smallest value, with the result grouped by key (my English is poor, so think of it in terms of SQL's GROUP BY).
I've written some code like this:
listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]
listOfLists.sort() #this will sort by key, and then ascending by value
output = []
for index, l in enumerate(listOfLists):
    if index == 0:
        output.append(l)  # after sorting, the first item is the smallest for its key
    elif l[0] == listOfLists[index - 1][0]:
        # has the same key, and the value is larger, discard
        continue
    else:
        output.append(l)
This does not seem smart enough.
Is there a simpler way to do this?

How about using a dictionary (no need to sort the data)?
>>> listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]
>>> d = {}
>>> for k,v in listOfLists:
...     d.setdefault(k, []).append(v)
...
>>> d = {k:min(v) for k,v in d.items()}
>>> d
{'key2': 1, 'key1': 1}
You can convert to a list if you want
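The conversion back to a list of [key, value] pairs, as in the output the question asks for, could be sketched like this. Sorting by key is an assumption based on the question's expected output; the name output is illustrative.

```python
listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]

# Group the values by key, then keep the minimum per key.
d = {}
for k, v in listOfLists:
    d.setdefault(k, []).append(v)
d = {k: min(v) for k, v in d.items()}

# Turn the dict back into a list of [key, value] lists, ordered by key.
output = sorted([k, v] for k, v in d.items())

print(output)
```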

O(N log N) solution
You can just use the dict constructor for this. It is O(N log N) because of the sorting step
>>> dict(sorted(listOfLists, reverse=True))
{'key2': 1, 'key1': 1}
To see why this works, look at the result of sorted
>>> sorted(listOfLists, reverse=True)
[['key2', 2], ['key2', 1], ['key1', 2], ['key1', 1]]
The dict constructor overwrites the value for each key as it traverses the list, and sorted has pushed the minimum for each key to the end of the run for that key, so the minimum is the value that survives.
O(N) solution
>>> d = {}
>>> for k, v in listOfLists:
...     d[k] = min(d.get(k, v), v)
...
>>> d
{'key2': 1, 'key1': 1}

The itertools module has a very useful groupby function that is probably exactly what you need:
from itertools import groupby
listOfLists.sort()  # groupby only groups adjacent items, so sort by key first
for key, subgroup in groupby(listOfLists, lambda item: item[0]):
    print(key, min(subgroup))
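To collect the result as a list rather than printing it, the groupby approach can be sketched as follows. Using operator.itemgetter for the keys is a choice made for this sketch, and ordered and output are illustrative names.

```python
from itertools import groupby
from operator import itemgetter

listOfLists = [['key2', 1], ['key1', 2], ['key2', 2], ['key1', 1]]

# groupby only merges adjacent items, so equal keys must be next to each other first.
ordered = sorted(listOfLists, key=itemgetter(0))

# For each run of equal keys, keep the inner list with the smallest value.
output = [min(group, key=itemgetter(1))
          for _, group in groupby(ordered, key=itemgetter(0))]

print(output)
```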
