Efficiently find strings in list of lists of strings (Python) - python

I'm looking for an efficient way to find different strings in a list of string lists and return their indices. Here is the code:
inp = [ 'ans1', 'ans2', 'ans3' ]
output = [ [ 'aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd' ],
[ 'bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa' ],
[ 'ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb' ] ]
# expected result
# result = [ [ 1, 4, 3 ], [ 4, 2, 2 ], [ -1, -1, -1 ] ]
Each row of result corresponds to one string in inp and holds that string's index in each sublist of output. For example, ans2 is at index 4 in the first sublist, index 2 in the second sublist, and index 2 in the third sublist; likewise for ans1. ans3, however, does not appear in any sublist, so its index is -1 everywhere.
What I'm looking for is an efficient way to do this computation (possibly in parallel?) while avoiding the classic for loops that this can clearly be done with.
Some considerations:
output has shape equal to [ len( inp ), L ], where L is the size of the dictionary. In this case L = 5.

You can try list comprehension:
result = [[o.index(s) if s in o else -1 for o in output] for s in inp]
print(result) # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
Update:
Also, it's probably not the best idea to store -1 as the index for strings that are not present in the output list. -1 is a valid index in Python (it refers to the last element), so it may silently cause errors later if you use the indices stored in the result to look things up.
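To make the caveat above concrete, here is a minimal sketch that uses None instead of -1 as the "not found" marker. The helper name find_indices and its missing parameter are hypothetical, introduced only for illustration; they are not part of the original answer.

```python
def find_indices(needles, rows, missing=None):
    """For each needle, return its first index in every row, or `missing`."""
    return [[row.index(s) if s in row else missing for row in rows]
            for s in needles]

inp = ['ans1', 'ans2', 'ans3']
output = [['aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd'],
          ['bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa'],
          ['ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb']]

result = find_indices(inp, output)
print(result)  # [[1, 4, 3], [4, 2, 2], [None, None, None]]

# With -1 as the marker, a later lookup silently returns the last element:
print(output[0][-1])  # ddd
```

Indexing a list with None raises a TypeError, so a missing string fails loudly instead of quietly pointing at the last element.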

You can create a dictionary index first to speed up the search:
inp = ["ans1", "ans2", "ans3"]
output = [
["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]
tmp = [{v: i for i, v in enumerate(subl)} for subl in output]
result = [[d.get(i, -1) for d in tmp] for i in inp]
print(result)
Prints:
[[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
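One difference from list.index is worth knowing: a dict comprehension keeps the last position of a repeated string, while list.index returns the first. The strings in inp occur at most once per sublist, so both approaches agree on the sample data. If first-occurrence semantics matter, a sketch using dict.setdefault could look like this (the helper name first_index_map is hypothetical):

```python
def first_index_map(row):
    """Map each string to the index of its FIRST occurrence in row."""
    positions = {}
    for i, value in enumerate(row):
        positions.setdefault(value, i)  # later duplicates do not overwrite
    return positions

row = ['bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa']
last = {v: i for i, v in enumerate(row)}   # last occurrence wins
first = first_index_map(row)               # first occurrence wins

print(last['aaa'], first['aaa'], row.index('aaa'))  # 5 1 1
```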

Related

list of dicts - change key values from one to many to many to one

with a list of dict, say list1 like below
[
{'subId': 0, 'mainIds': [0]},
{'subId': 3, 'mainIds': [0, 3, 4, 5], 'parameter': 'off', 'Info': 'true'}
]
Need to convert to below format.
[
{'mainId': 0, 'subIds':[0,3]},
{'mainId': 3, 'subIds': [3] },
{'mainId': 4, 'subIds': [3] },
{'mainId': 5, 'subIds': [3]}
]
What is tried so far
finalRes = []
for i in list1:
    subId = i['subId']
    for j in i['mainIds']:
        res = {}
        res['mainId'] = j
        res['subIds'] = []
        res['subIds'].append(subId)
        finalRes.append(res)
This gives something closer to the required output. Need help with getting the output mentioned above. Is there any popular name for this kind of operation (something like one to many to many to one ?)
[
{'mainId': 0, 'subIds':[0]},
{'mainId': 0, 'subIds':[3]},
{'mainId': 3, 'subIds': [3]},
{'mainId': 4, 'subIds': [3]},
{'mainId': 5, 'subIds': [3]}
]
This kind of join can be implemented easily with defaultdict:
from collections import defaultdict
subs_by_main_id = defaultdict(list)
for entry in list1:
    sub_id = entry['subId']
    for main_id in entry['mainIds']:
        subs_by_main_id[main_id].append(sub_id)
# collect the grouped ids into the requested list-of-dicts format
result = [{'mainId': main_id, 'subIds': sub_ids}
          for main_id, sub_ids in subs_by_main_id.items()]
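As a self-contained sketch of the defaultdict approach, run against the sample data from the question: the function name invert_mapping is hypothetical, chosen because the operation can be described as inverting a one-to-many mapping. Extra keys such as 'parameter' and 'Info' are simply ignored here, which is an assumption about what the asker wants.

```python
from collections import defaultdict

def invert_mapping(entries):
    """Group subIds by each mainId they are listed under."""
    subs_by_main_id = defaultdict(list)
    for entry in entries:
        for main_id in entry['mainIds']:
            subs_by_main_id[main_id].append(entry['subId'])
    # dicts keep insertion order, so mainIds appear in first-seen order
    return [{'mainId': main_id, 'subIds': sub_ids}
            for main_id, sub_ids in subs_by_main_id.items()]

list1 = [
    {'subId': 0, 'mainIds': [0]},
    {'subId': 3, 'mainIds': [0, 3, 4, 5], 'parameter': 'off', 'Info': 'true'},
]

inverted = invert_mapping(list1)
print(inverted)
# [{'mainId': 0, 'subIds': [0, 3]}, {'mainId': 3, 'subIds': [3]},
#  {'mainId': 4, 'subIds': [3]}, {'mainId': 5, 'subIds': [3]}]
```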
Here's a solution using comprehensions and itertools.chain. Start by converting the lists to sets, for fast membership tests; then build the result directly. It is not as efficient as the defaultdict solution.
from itertools import chain
sets = { d['subId']: set(d['mainIds']) for d in list1 }
result = [
    {'mainId': i, 'subIds': [ j for j, v in sets.items() if i in v ]}
    for i in set(chain.from_iterable(sets.values()))
]

Python: Find count of the elements of one list in another list

Let's say I have two lists list1 and list2 as:
list1 = [ 3, 4, 7 ]
list2 = [ 5, 2, 3, 5, 3, 4, 4, 9 ]
I want to find the count of the elements of list1 which are present in list2.
The expected output is 4, because 3 and 4 from list1 each appear twice in list2, giving a total count of 4.
Use a list comprehension and check whether each element exists:
c = len([i for i in list2 if i in list1 ])
A better one, from @Jon:
c = sum(el in list1 for el in list2)
Output : 4
You may use sum(...) to achieve this with the generator expression as:
>>> list1 = [ 3, 4, 7 ]
>>> list2 = [ 5, 2, 3, 5, 3, 4, 4, 9 ]
# `x in list1` evaluates to `True`/`False`, which Python sums as `1`/`0`
>>> sum(x in list1 for x in list2)
4
As an alternative, you may also use the list's __contains__ magic method to check whether an element exists in the list, and use filter(..) to drop the elements of the list that do not satisfy the "in" condition. For example:
>>> len(list(filter(list1.__contains__, list2)))
4
# Here "list(filter(list1.__contains__, list2))" will return the
# list as: [3, 3, 4, 4]
For more details about __contains__, read: What does __contains__ do, what can call __contains__ function?
You can iterate over the first list and add the occurrences of each number to a sum using the count method.
s = 0
for number in list1:
    s += list2.count(number)
You can use collections.Counter here, so a naive and rather ugly implementation first (mine).
list1 = [ 3, 4, 7 ]
list2 = [ 5, 2, 3, 5, 3, 4, 4, 9 ]
from collections import Counter
total = 0
c = Counter(list2)
for i in list1:
    if c[i]:
        total += c[i]
This doesn't take into account what happens if you've got duplicates in the first list (HT Jon), and a much more elegant version of this would be:
counter = Counter(list2)
occurrences = sum(counter[v] for v in set(list1))
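To illustrate the duplicate issue mentioned above, here is a small sketch that compares the Counter version with and without set(), and also shows a set-based variant of the sum(...) one-liner. The function name count_matches is hypothetical.

```python
from collections import Counter

def count_matches(needles, haystack):
    """Count how many items of haystack are equal to some item of needles."""
    counts = Counter(haystack)
    # set() ensures a needle that is repeated is only counted once
    return sum(counts[v] for v in set(needles))

list1 = [3, 4, 7]
list2 = [5, 2, 3, 5, 3, 4, 4, 9]

print(count_matches(list1, list2))         # 4
print(count_matches([3, 3, 4, 7], list2))  # 4

# Without set(), the repeated 3 is counted twice:
naive = sum(Counter(list2)[v] for v in [3, 3, 4, 7])
print(naive)  # 6

# Same count with a set for constant-time membership tests:
lookup = set(list1)
print(sum(x in lookup for x in list2))  # 4
```

Both variants make a single pass over list2, whereas testing x in list1 against a plain list scans list1 for every element.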

Can I subtract a value from a column of a list of lists?

If I have
listOfLists = [
[123, "str1"],
[234, "str2"]
]
listOfLists[:[0]] = [x - 15 for x in listOfLists]
can I perform an operation to subtract a value from just the [:[0]] part of the information?
The error I am getting currently is
can only concatenate list (not "int") to list
You could do it like this:
listOfLists = [ [x[0]-15, x[1]] for x in listOfLists]
You can use the map function to apply an operation to each member of a list. In Python 3, map returns an iterator, so wrap it in list() to get a list back. One way to subtract from the first element regardless of the length of each list would be like so:
>>> foo = [ [1, "a"], [2, "b", "c"] ]
>>> list(map(lambda x: [x[0] - 5] + x[1:], foo))
[[-4, 'a'], [-3, 'b', 'c']]
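Both answers above build new lists. As a sketch of the two options side by side, the following shows a helper that returns new rows for any column, and an in-place loop applied to the data from the question. The helper name subtract_from_column is hypothetical.

```python
def subtract_from_column(rows, column, amount):
    """Return new rows with `amount` subtracted from one column."""
    return [row[:column] + [row[column] - amount] + row[column + 1:]
            for row in rows]

listOfLists = [
    [123, "str1"],
    [234, "str2"],
]

# Non-mutating: the original rows are left untouched.
adjusted = subtract_from_column(listOfLists, 0, 15)
print(adjusted)     # [[108, 'str1'], [219, 'str2']]
print(listOfLists)  # [[123, 'str1'], [234, 'str2']]

# In place: modify the first element of every row directly.
for row in listOfLists:
    row[0] -= 15
print(listOfLists)  # [[108, 'str1'], [219, 'str2']]
```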

Map a Numpy array into a list of characters

Given a two dim numpy array:
a = array([[-1, -1],
[-1, 1],
[ 1, 1],
[ 1, 1],
[ 1, 0],
[ 0, -1],
[-1, 0],
[ 0, -1],
[-1, 0],
[ 0, 1],
[ 1, 1],
[ 1, 1]])
and a dictionary of conversions:
d = {-1:'a', 0:'b', 1:'c'}
how to map the original array into a list of character combinations?
What I need is the following list (or array)
out_put = ['aa', 'ac', 'cc', 'cc', 'cb', 'ba', ....]
(I am doing some machine learning classification and my classes are labeled by the combination of -1, 0, 1, and I need to convert the array of 'labels' into something readable, such as 'aa', 'bc' and so on).
If there is a simple function (binarizer, or one-hot-encoding) within the sklearn package which can convert the original NumPy array into a set of labels, that would be perfect!
Here's another approach with list comprehension:
my_dict = {-1:'a', 0:'b', 1:'c'}
out_put = ["".join([my_dict[val] for val in row]) for row in a]
I think you ought to be able to do this via a list comprehension:
# naming something `dict` is a bad idea
d = {-1:'a', 0:'b', 1:'c'}
out_put = ['%s%s' % (d[x], d[y]) for x, y in a]
I think the following is very readable:
def switch(row):
    dic = {
        -1:'a',
        0:'b',
        1:'c'
    }
    return dic.get(row)
out_put = [switch(x)+switch(y) for x,y in a]
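Since there are only nine possible pairs of codes, another option is to precompute a lookup table for all of them once and then map each row with a single dictionary access. This is only a sketch: the name pair_labels is hypothetical, and plain nested lists stand in for the rows of the array so the example runs without NumPy.

```python
from itertools import product

d = {-1: 'a', 0: 'b', 1: 'c'}

# Precompute the label for every possible pair of codes (9 entries here).
pair_labels = {(x, y): d[x] + d[y] for x, y in product(d, repeat=2)}

# Plain nested lists stand in for the rows of the 2-D array.
rows = [[-1, -1], [-1, 1], [1, 1], [1, 0], [0, -1]]

# int() normalises the values so the tuple keys are plain Python ints.
out_put = [pair_labels[(int(x), int(y))] for x, y in rows]
print(out_put)  # ['aa', 'ac', 'cc', 'cb', 'ba']
```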

How do I delete the Nth list item from a list of lists (column delete)?

How do I delete a "column" from a list of lists?
Given:
L = [
["a","b","C","d"],
[ 1, 2, 3, 4 ],
["w","x","y","z"]
]
I would like to delete "column" 2 to get:
L = [
["a","b","d"],
[ 1, 2, 4 ],
["w","x","z"]
]
Is there a slice or del method that will do that? Something like:
del L[:][2]
You could loop.
for x in L:
    del x[2]
If you're dealing with a lot of data, you can use a library that supports sophisticated slicing like that. However, a simple list of lists doesn't slice.
Just iterate through the list and delete the index you want to remove.
For example:
for sublist in L:
    del sublist[index]  # index is the column to delete
You can do it with a list comprehension:
>>> removed = [ l.pop(2) for l in L ]
>>> print(L)
[['a', 'b', 'd'], [1, 2, 4], ['w', 'x', 'z']]
>>> print(removed)
['C', 3, 'y']
It loops over the list and pops the element at position 2 from every sublist.
You end up with a list of the removed elements and the main list without them.
A slightly twisted version:
index = 2 # Delete column 2
[(x[0:index] + x[index+1:]) for x in L]
[(x[0], x[1], x[3]) for x in L]
It works fine.
This is a very easy way to remove whatever column you want.
L = [
["a","b","C","d"],
[ 1, 2, 3, 4 ],
["w","x","y","z"]
]
temp = [[x[0],x[1],x[3]] for x in L] # list every column you want to keep
print(temp)
O/P->[['a', 'b', 'd'], [1, 2, 4], ['w', 'x', 'z']]
L = [['a', 'b', 'C', 'd'], [1, 2, 3, 4], ['w', 'x', 'y', 'z']]
_ = [i.remove(i[2]) for i in L]
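One caveat with the remove-based version: list.remove deletes the first element equal to the given value, not the element at a given position. It behaves like a column delete only when the value at index 2 does not also occur earlier in the same row. A small sketch with made-up data (the rows below are not from the question) shows the difference from del:

```python
rows = [['a', 'b', 'a', 'd'], [1, 2, 3, 4]]

# remove() searches by value: in the first row it deletes the 'a' at index 0.
by_value = [row[:] for row in rows]   # copy so both variants start equal
for row in by_value:
    row.remove(row[2])
print(by_value)  # [['b', 'a', 'd'], [1, 2, 4]]

# del works by position, so it always drops column 2.
by_index = [row[:] for row in rows]
for row in by_index:
    del row[2]
print(by_index)  # [['a', 'b', 'd'], [1, 2, 4]]
```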
If you don't mind creating a new list, you can try the following:
filter_col = lambda lVals, iCol: [[x for i,x in enumerate(row) if i!=iCol] for row in lVals]
filter_col(L, 2)
An alternative to pop():
[x.__delitem__(n) for x in L]
Here n is the index of the elements to be deleted.
