Python How to find arrays that has a certain element efficiently - python

Given lists(a list can have an element that is in another list) and a string, I want to find all names of lists that contains a given string.
Simply, I could just go through all lists using if statements, but I feel that there is more efficient way to do so.
Any suggestion and advice would be appreciated. Thank you.
Example of Simple Method I came up with
arrayA = ['1','2','3','4','5']
arrayB = ['3','4','5']
arrayC = ['1','3','5']
arrayD = ['7']
foundArrays = []
if givenString in arrayA:
foundArrays.append('arrayA')
if givenString in arrayB:
foundArrays.append('arrayB')
if givenString in arrayC:
foundArrays.append('arrayC')
if givenString in arrayD:
foundArrays.append('arrayD')
return foundArrays

Lookup in a list is not very efficient; a set is much better.
Let's define your data like
data = { # a dict of sets
"a": {1, 2, 3, 4, 5},
"b": {3, 4, 5},
"c": {1, 3, 5},
"d": {7}
}
then we can search like
search_for = 3 # for example
in_which = {label for label,values in data.items() if search_for in values}
# -> in_which = {'a', 'b', 'c'}
If you are going to repeat this often, it may be worth pre-processing your data like
from collections import defaultdict
lookup = defaultdict(set)
for label,values in data.items():
for v in values:
lookup[v].add(label)
Now you can simply
in_which = lookup[search_for] # -> {'a', 'b', 'c'}

The simple one-liner is:
result = [lst for lst in [arrayA, arrayB, arrayC, arrayD] if givenString in lst]
or if you prefer a more functional style:
result = filter(lambda lst: givenString in lst, [arrayA, arrayB, arrayC, arrayD])
Note that neither of these gives you the NAME of the list. You shouldn't ever need to know that, though.

Array names?
Try something like this with eval() nonetheless using eval() is evil
arrayA = [1,2,3,4,5,'x']
arrayB = [3,4,5]
arrayC = [1,3,5]
arrayD = [7,'x']
foundArrays = []
array_names = ['arrayA', 'arrayB', 'arrayC', 'arrayD']
givenString = 'x'
result = [arr for arr in array_names if givenString in eval(arr)]
print result
['arrayA', 'arrayD']

Related

Append list based on another element in list and remove lists that contained the items

Let's say I have two lists like this:
list_all = [[['some_item'],'Robert'] ,[['another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice]]
I want to combine the items together by their holder (i.e 'Robert') only when they are in separate lists. Ie in the end list_all should contain:
list_all = [[['some_name','something_else'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice]]
What is a fast and effective way of doing it?
I've tried in different ways but I'm looking for something more elegant, more simplistic.
Thank you
Here is one solution. It is often better to store your data in a more structured form, e.g. a dictionary, rather than manipulate from one list format to another.
from collections import defaultdict
list_all = [[['some_item'],'Robert'],
[['another_item'],'Robert'],
[['itemx'],'Adam'],
[['item2','item3'],'Maurice']]
d = defaultdict(list)
for i in list_all:
d[i[1]].extend(i[0])
# defaultdict(list,
# {'Adam': ['itemx'],
# 'Maurice': ['item2', 'item3'],
# 'Robert': ['some_item', 'another_item']})
d2 = [[v, k] for k, v in d.items()]
# [[['some_item', 'another_item'], 'Robert'],
# [['itemx'], 'Adam'],
# [['item2', 'item3'], 'Maurice']]
You can try this, though it's quite similar to above answer but you can do this without importing anything.
list_all = [[['some_item'], 'Robert'], [['another_item'], 'Robert'], [['itemx'], 'Adam'], [['item2', 'item3'], 'Maurice']]
x = {} # initializing a dictionary to store the data
for i in list_all:
try:
x[i[1]].extend(i[0])
except KeyError:
x[i[1]] = i[0]
list2 = [[j, i ] for i,j in x.items()]
list_all = [[['some_item'],'Robert'] ,[['another_item'],'Robert'],[['itemx'],'Adam'],[['item2','item3'],'Maurice']]
dict_value = {}
for val in list_all:
list_, name = val
if name in dict_value:
dict_value[name][0].extend(list_)
else:
dict_value.setdefault(name,[list_, name])
print(list(dict_value.values()))
>>>[[['some_item', 'another_item'], 'Robert'],
[['itemx'], 'Adam'],
[['item2', 'item3'], 'Maurice']]

Extending python dictionary and changing key values

Assume I have a python dictionary with 2 keys.
dic = {0:'Hi!', 1:'Hello!'}
What I want to do is to extend this dictionary by duplicating itself, but change the key value.
For example, if I have a code
dic = {0:'Hi!', 1:'Hello'}
multiplier = 3
def DictionaryExtend(number_of_multiplier, dictionary):
"Function code"
then the result should look like
>>> DictionaryExtend(multiplier, dic)
>>> dic
>>> dic = {0:'Hi!', 1:'Hello', 2:'Hi!', 3:'Hello', 4:'Hi!', 5:'Hello'}
In this case, I changed the key values by adding the multipler at each duplication step. What's the efficient way of doing this?
Plus, I'm also planning to do the same job for list variable. I mean, extend a list by duplicating itself and change some values like above exmple. Any suggestion for this would be helpful, too!
You can try itertools to repeat the values and OrderedDict to maintain input order.
import itertools as it
import collections as ct
def extend_dict(multiplier, dict_):
"""Return a dictionary of repeated values."""
return dict(enumerate(it.chain(*it.repeat(dict_.values(), multiplier))))
d = ct.OrderedDict({0:'Hi!', 1:'Hello!'})
multiplier = 3
extend_dict(multiplier, d)
# {0: 'Hi!', 1: 'Hello!', 2: 'Hi!', 3: 'Hello!', 4: 'Hi!', 5: 'Hello!'}
Regarding handling other collection types, it is not clear what output is desired, but the following modification reproduces the latter and works for lists as well:
def extend_collection(multiplier, iterable):
"""Return a collection of repeated values."""
repeat_values = lambda x: it.chain(*it.repeat(x, multiplier))
try:
iterable = iterable.values()
except AttributeError:
result = list(repeat_values(iterable))
else:
result = dict(enumerate(repeat_values(iterable)))
return result
lst = ['Hi!', 'Hello!']
multiplier = 3
extend_collection(multiplier, lst)
# ['Hi!', 'Hello!', 'Hi!', 'Hello!', 'Hi!', 'Hello!']
It's not immediately clear why you might want to do this. If the keys are always consecutive integers then you probably just want a list.
Anyway, here's a snippet:
def dictExtender(multiplier, d):
return dict(zip(range(multiplier * len(d)), list(d.values()) * multiplier))
I don't think you need to use inheritance to achieve that. It's also unclear what the keys should be in the resulting dictionary.
If the keys are always consecutive integers, then why not use a list?
origin = ['Hi', 'Hello']
extended = origin * 3
extended
>> ['Hi', 'Hello', 'Hi', 'Hello', 'Hi', 'Hello']
extended[4]
>> 'Hi'
If you want to perform a different operation with the keys, then simply:
mult_key = lambda key: [key,key+2,key+4] # just an example, this can be any custom implementation but beware of duplicate keys
dic = {0:'Hi', 1:'Hello'}
extended = { mkey:dic[key] for key in dic for mkey in mult_key(key) }
extended
>> {0:'Hi', 1:'Hello', 2:'Hi', 3:'Hello', 4:'Hi', 5:'Hello'}
You don't need to extend anything, you need to pick a better input format or a more appropriate type.
As others have mentioned, you need a list, not an extended dict or OrderedDict. Here's an example with lines.txt:
1:Hello!
0: Hi.
2: pylang
And here's a way to parse the lines in the correct order:
def extract_number_and_text(line):
number, text = line.split(':')
return (int(number), text.strip())
with open('lines.txt') as f:
lines = f.readlines()
data = [extract_number_and_text(line) for line in lines]
print(data)
# [(1, 'Hello!'), (0, 'Hi.'), (2, 'pylang')]
sorted_text = [text for i,text in sorted(data)]
print(sorted_text)
# ['Hi.', 'Hello!', 'pylang']
print(sorted_text * 2)
# ['Hi.', 'Hello!', 'pylang', 'Hi.', 'Hello!', 'pylang']
print(list(enumerate(sorted_text * 2)))
# [(0, 'Hi.'), (1, 'Hello!'), (2, 'pylang'), (3, 'Hi.'), (4, 'Hello!'), (5, 'pylang')]

python group by and ordered a list by another list

I wonder if there is more Pythonic way to do group by and ordered a list by the order of another list.
The lstNeedOrder has couple pairs in random order. I want the output to be ordered as order in lst. The result should have all pairs containing a's then follow by all b's and c's.
The lstNeedOrder would only have either format in a/c or c/a.
input:
lstNeedOrder = ['a/b','c/b','f/d','a/e','c/d','a/c']
lst = ['a','b','c']
output:
res = ['a/b','a/c','a/e','c/b','c/d','f/d']
update
The lst = ['a','b','c'] is not actual data. it just make logic easy to understand. the actual data are more complex string pairs
Using sorted with customer key function:
>>> lstNeedOrder = ['a/b','c/d','f/d','a/e','c/d','a/c']
>>> lst = ['a','b','c']
>>> order = {ch: i for i, ch in enumerate(lst)} # {'a': 0, 'b': 1, 'c': 2}
>>> def sort_key(x):
... # 'a/b' -> (0, 1), 'c/d' -> (2, 3), ...
... a, b = x.split('/')
... return order.get(a, len(lst)), order.get(b, len(lst))
...
>>> sorted(lstNeedOrder, key=sort_key)
['a/b', 'a/c', 'a/e', 'c/d', 'c/d', 'f/d']

Get a unique list of items that occur more than once in a list

I have a list of items:
mylist = ['A','A','B','C','D','E','D']
I want to return a unique list of items that appear more than once in mylist, so that my desired output would be:
[A,D]
Not sure how to even being this, but my though process is to first append a count of each item, then remove anything equal to 1. Then dedupe, but this seems like a really roundabout, inefficient way to do it, so I am looking for advice.
You can use collections.Counter to do what you have described easily:
from collections import Counter
mylist = ['A','A','B','C','D','E','D']
cnt = Counter(mylist)
print [k for k, v in cnt.iteritems() if v > 1]
# ['A', 'D']
>>> mylist = ['A','A','B','C','D','E','D']
>>> set([i for i in mylist if mylist.count(i)>1])
set(['A', 'D'])
import collections
cc = collections.Counter(mylist) # Counter({'A': 2, 'D': 2, 'C': 1, 'B': 1, 'E': 1})
cc.subtract(cc.keys()) # Counter({'A': 1, 'D': 1, 'C': 0, 'B': 0, 'E': 0})
cc += collections.Counter() # remove zeros (trick from the docs)
print cc.keys() # ['A', 'D']
Try some thing like this:
a = ['A','A','B','C','D','E','D']
import collections
print [x for x, y in collections.Counter(a).items() if y > 1]
['A', 'D']
Reference: How to find duplicate elements in array using for loop in Python?
OR
def list_has_duplicate_items( mylist ):
return len(mylist) > len(set(mylist))
def get_duplicate_items( mylist ):
return [item for item in set(mylist) if mylist.count(item) > 1]
mylist = [ 'oranges' , 'apples' , 'oranges' , 'grapes' ]
print 'List: ' , mylist
print 'Does list have duplicate item(s)? ' , list_has_duplicate_items( mylist )
print 'Redundant item(s) in list: ' , get_duplicate_items( mylist )
Reference https://www.daniweb.com/software-development/python/threads/286996/get-redundant-items-in-list
Using a similar approach to others here, heres my attempt:
from collections import Counter
def return_more_then_one(myList):
counts = Counter(my_list)
out_list = [i for i in counts if counts[i]>1]
return out_list
It can be as simple as ...
print(list(set([i for i in mylist if mylist.count(i) > 1])))
Use set to help you do that, like this maybe :
X = ['A','A','B','C','D','E','D']
Y = set(X)
Z = []
for val in Y :
occurrences = X.count(val)
if(occurrences > 1) :
#print(val,'occurs',occurrences,'times')
Z.append(val)
print(Z)
The list Z will save the list item which occur more than once. And the part I gave comment (#), that will show the number of occurrences of each list item which occur more than once
Might not be as fast as internal implementations, but takes (almost) linear time (since set lookup is logarithmic)
mylist = ['A','A','B','C','D','E','D']
myset = set()
dups = set()
for x in mylist:
if x in myset:
dups.add(x)
else:
myset.add(x)
dups = list(dups)
print dups
another solution what's written:
def delete_rep(list_):
new_list = []
for i in list_:
if i not in list_[i:]:
new_list.append(i)
return new_list
This is my approach without using packages
result = []
for e in listy:
if listy.count(e) > 1:
result.append(e)
else:
pass
print(list(set(result)))

Python list set value at index if index does not exist

Is there a way, lib, or something in python that I can set value in list at an index that does not exist?
Something like runtime index creation at list:
l = []
l[3] = 'foo'
# [None, None, None, 'foo']
And more further, with multi dimensional lists:
l = []
l[0][2] = 'bar'
# [[None, None, 'bar']]
Or with an existing one:
l = [['xx']]
l[0][1] = 'yy'
# [['xx', 'yy']]
There isn't a built-in, but it's easy enough to implement:
class FillList(list):
def __setitem__(self, index, value):
try:
super().__setitem__(index, value)
except IndexError:
for _ in range(index-len(self)+1):
self.append(None)
super().__setitem__(index, value)
Or, if you need to change existing vanilla lists:
def set_list(l, i, v):
try:
l[i] = v
except IndexError:
for _ in range(i-len(l)+1):
l.append(None)
l[i] = v
Not foolproof, but it seems like the easiest way to do this is to initialize a list much larger than you will need, i.e.
l = [None for i in some_large_number]
l[3] = 'foo'
# [None, None, None, 'foo', None, None None ... ]
If you really want the syntax in your question, defaultdict is probably the best way to get it:
from collections import defaultdict
def rec_dd():
return defaultdict(rec_dd)
l = rec_dd()
l[3] = 'foo'
print l
{3: 'foo'}
l = rec_dd()
l[0][2] = 'xx'
l[1][0] = 'yy'
print l
<long output because of defaultdict, but essentially)
{0: {2: 'xx'}, 1: {0: 'yy'}}
It isn't exactly a 'list of lists' but it works more or less like one.
You really need to specify the use case though... the above has some advantages (you can access indices without checking whether they exist first), and some disadvantages - for example, l[2] in a normal dict will return a KeyError, but in defaultdict it just creates a blank defaultdict, adds it, and then returns it.
Other possible implementations to support different syntactic sugars could involve custom classes etc, and will have other tradeoffs.
You cannot create a list with gaps. You could use a dict or this quick little guy:
def set_list(i,v):
l = []
x = 0
while x < i:
l.append(None)
x += 1
l.append(v)
return l
print set_list(3, 'foo')
>>> [None, None, None, 'foo']

Categories