Efficient way of comparing multiple lists in python - python

I have 5 long lists with word pairs as given in the example below. Note that this could include word pair lists like [['Salad', 'Fat']] AND word pair list of lists like [['Bread', 'Oil'], ['Bread', ' Salt']]
list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar'] ]]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt'] ]]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread'] ]
Now I want to calculate the frequency of each word pair.
For example, for the 5 lists above, I should get output like the following, where each word pair is shown with its frequency.
output_list = [{['Salad', 'Fat']: 3}, {['Bread', 'Oil']: 2}, {['Salt', 'Sugar']: 2},
{['Sugar', 'Salt']: 1}, and so on]
What is the most efficient way of doing this in Python?

Your lists are nested unevenly (list_5 is one level shallower than the others), which makes the code ugly, so I would look to fix the input lists first.
collections.Counter() is built for this kind of thing, but lists are not hashable, so you need to turn them into tuples (and strip off the spurious spaces):
In []:
import itertools as it
from collections import Counter
list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar'] ]]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt'] ]]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread']]
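# Normalise a pair: strip stray spaces and turn it into a hashable tuple
t = lambda x: tuple(map(str.strip, x))
# list_1..list_4 are lists of lists of pairs, so flatten them by one level
c = Counter(map(t, it.chain.from_iterable(it.chain(list_1, list_2, list_3, list_4))))
# list_5 is already a flat list of pairs
c += Counter(map(t, list_5))
c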
Out[]:
Counter({('Bread', 'Oil'): 2,
('Bread', 'Salt'): 1,
('Protein', 'Soup'): 1,
('Salad', 'Chicken'): 1,
('Salad', 'Fat'): 3,
('Salad', 'Protein'): 1,
('Salt', 'Sugar'): 2,
('Sugar', 'Bread'): 1,
('Sugar', 'Protein'): 1,
('Sugar', 'Salt'): 1})
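If fixing the input lists is not an option, the uneven nesting could also be handled by walking the structure recursively. The following is only a sketch under that assumption: iter_pairs is a hypothetical helper name, not part of the answer above, and it treats any non-empty list made up entirely of strings as a word pair.

```python
from collections import Counter

list_1 = [[['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar']]]
list_2 = [[['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']]]
list_3 = [[['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt']]]
list_4 = [[['Salad', ' Fat'], ['Salad', 'Chicken']]]
list_5 = [['Sugar', 'Protein'], ['Sugar', 'Bread']]


def iter_pairs(nested):
    """Yield every word pair as a stripped tuple, whatever its nesting depth."""
    for item in nested:
        if item and all(isinstance(word, str) for word in item):
            # Assumption: a non-empty list of strings is a word pair.
            yield tuple(word.strip() for word in item)
        else:
            # Otherwise assume a container of further lists and descend.
            yield from iter_pairs(item)


counts = Counter(iter_pairs([list_1, list_2, list_3, list_4, list_5]))
print(counts.most_common(3))
```

Because the helper descends until it finds a pair, list_5 needs no special handling here, at the cost of a recursive call per nesting level.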

You could flatten all the lists. Then use Counter to count the word frequencies.
>>> import itertools
>>> from collections import Counter
>>> l = [[1,2,3],[3,4,1,5]]
>>> counts = Counter(list(itertools.chain(*l)))
>>> counts
Counter({1: 2, 3: 2, 2: 1, 4: 1, 5: 1})
NOTE: this flattening technique works only for lists of lists (exactly one level of nesting). For other flattening techniques see the link provided above.
EDIT:
Thanks to AChampion: counts = Counter(list(itertools.chain(*l))) can be written as counts = Counter(list(itertools.chain.from_iterable(l)))

Related

Find List2 in List1 at starting Position

List1 = ['RELEASE', 'KM123', 'MOTOR', 'XS4501', 'NAME']
List2 = ['KM', 'XS', 'M']
Right now I am using code that finds the items of List2 at any position in the items of List1.
Result = [s for s in List1 if any(xs in s for xs in List2)]
Output:
['KM123', 'MOTOR', 'XS4501', 'NAME']
But I don't want 'NAME' in the list, because it contains 'M' but not at the start. Any help...
Use str.startswith() which checks if a string starts with a particular sequence of characters:
[s for s in List1 if any(s.startswith(xs) for xs in List2)]
Looks like you can use str.startswith
Ex:
List1 = ['RELEASE', 'KM123', 'MOTOR', 'XS4501', 'NAME']
List2 = ('KM', 'XS', 'M')  # str.startswith() accepts a tuple of prefixes, not a list
result = [s for s in List1 if s.startswith(List2)]
print(result)  # --> ['KM123', 'MOTOR', 'XS4501']

how can I manipulate key with for loops to update dictionary

I am trying to put a list into a dictionary and count the number of occurrences of each word in the list. The one thing I don't understand is that when I use the update function, it takes x itself as the dictionary key, when I want the key to be the value of x from list_. I am new to Python, so any advice is appreciated. Thanks
list_ = ["hello", "there", "friend", "hello"]
d = {}
for x in list_:
    d.update(x = list_.count(x))
Use a Counter object if you want a simple way of converting a list of items to a dictionary that maps list_entry: number_of_occurrences.
>>> from collections import Counter
>>> words = ['hello', 'there', 'friend', 'hello']
>>> c = Counter(words)
>>> print(c)
Counter({'hello': 2, 'there': 1, 'friend': 1})
>>> print(dict(c))
{'there': 1, 'hello': 2, 'friend': 1}
An option would be using dictionary comprehension with list.count() like this:
list_ = ["hello", "there", "friend", "hello"]
d = {item: list_.count(item) for item in list_}
Output:
>>> d
{'hello': 2, 'there': 1, 'friend': 1}
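But the best option is collections.Counter(), as used in @AK47's solution: list.count() rescans the whole list for every item, so the comprehension takes quadratic time on long lists.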
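As for why the loop in the question misbehaves: in d.update(x=...), x is a keyword argument name, so the key is always the literal string 'x' rather than the value of the loop variable. A minimal sketch of the difference (the names d_wrong and d_right are only for illustration):

```python
words = ["hello", "there", "friend", "hello"]

d_wrong = {}
for x in words:
    # x=... is a keyword argument, so the key is the string 'x' every time
    d_wrong.update(x=words.count(x))

d_right = {}
for x in words:
    # Index with the variable itself so that its value becomes the key
    d_right[x] = d_right.get(x, 0) + 1

print(d_wrong)  # {'x': 2}
print(d_right)  # {'hello': 2, 'there': 1, 'friend': 1}
```

The second loop also avoids calling list.count() repeatedly, since each word is counted as it is seen.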

Populating a list of lists with random items

I have a list of lists with a certain range:
l = [["this", "is", "a"], ["list", "of"], ["lists", "that", "i", "want"], ["to", "copy"]]
And a list of words:
words = ["lorem", "ipsum", "dolor", "sit", "amet", "id", "sint", "risus", "per", "ut", "enim", "velit", "nunc", "ultricies"]
I need to create an exact replica of the list of lists, but with random terms picked from the other list.
This was the first thing that came to mind, but no dice.
for random.choice in words:
    for x in list:
        for y in x:
            y = random.choice
Any ideas? Thank you in advance!
You can use list comprehensions for this:
import random
my_list = [[1, 2, 3], [5, 6]]
words = ['hello', 'Python']
new_list = [[random.choice(words) for y in x] for x in my_list]
print(new_list)
Output:
[['Python', 'Python', 'hello'], ['Python', 'hello']]
This is equivalent to:
new_list = []
for x in my_list:
    subl = []
    for y in x:
        subl.append(random.choice(words))
    new_list.append(subl)
With your example data:
my_list = [['this', 'is', 'a'], ['list', 'of'],
           ['lists', 'that', 'i', 'want'], ['to', 'copy']]
words = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'id', 'sint', 'risus',
         'per', 'ut', 'enim', 'velit', 'nunc', 'ultricies']
new_list = [[random.choice(words) for y in x] for x in my_list]
print(new_list)
Output:
[['enim', 'risus', 'sint'], ['dolor', 'lorem'], ['sint', 'nunc', 'ut', 'lorem'], ['ipsum', 'amet']]
You're not storing the values back into your lists. Try:
# l is the list of lists from the question (naming it list would shadow the built-in)
for i in range(len(l)):
    subl = l[i]
    for n in range(len(subl)):
        subl[n] = random.choice(words)
You should flatten your list of lists, then shuffle, then rebuild. Example:
import random
def super_shuffle(lol):
    # Remember each sublist's length so the shape can be rebuilt afterwards
    sublist_lengths = [len(sublist) for sublist in lol]
    flat = [item for sublist in lol for item in sublist]
    random.shuffle(flat)
    pos = 0
    shuffled_lol = []
    for length in sublist_lengths:
        shuffled_lol.append(flat[pos:pos+length])
        pos += length
    return shuffled_lol
print(super_shuffle([[1,2,3,4],[5,6,7],[8,9]]))
Prints (for example):
[[7, 8, 5, 6], [9, 1, 3], [2, 4]]
This randomizes across ALL the lists, not just within a single sublist, and because it only rearranges the existing items it introduces no duplicates.
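The shuffle approach rearranges the items that are already in the lists. If the goal is instead to fill the same shape with terms drawn from the separate words list without repeats, one possible sketch uses random.sample. Note that fill_like is a hypothetical helper name, and the sketch assumes words holds at least as many items as there are slots to fill.

```python
import random


def fill_like(shape, pool):
    """Return a list of lists shaped like `shape`, filled with distinct items from `pool`."""
    total = sum(len(sub) for sub in shape)
    # random.sample picks without replacement; it raises ValueError
    # if `pool` has fewer than `total` items.
    picks = iter(random.sample(pool, total))
    return [[next(picks) for _ in sub] for sub in shape]


l = [["this", "is", "a"], ["list", "of"], ["lists", "that", "i", "want"], ["to", "copy"]]
words = ["lorem", "ipsum", "dolor", "sit", "amet", "id", "sint", "risus",
         "per", "ut", "enim", "velit", "nunc", "ultricies"]

new_list = fill_like(l, words)
print(new_list)
```

If repeats are acceptable, random.choice per slot (as in the list comprehension answer) is simpler and has no size restriction.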

Remove a substr from string items in a list

list = [ 'u'adc', 'u'toto', 'u'tomato', ...]
What I want is to end up with a list of the kind:
list2 = [ 'adc', 'toto', 'tomato'... ]
Can you please tell me how to do that without using regex?
I'm trying:
for item in list:
    list.extend(str(item).replace("u'",''))
    list.remove(item)
but this ends up giving something of the form [ 'a', 'd', 'd', 'm'...]
In the list I may have an arbitrary number of strings.
You can encode each item to "utf-8" like this (Python 2; on Python 3, encode() returns bytes objects instead):
list_a=[ u'adc', u'toto', u'tomato']
list_b=list()
for i in list_a:
    list_b.append(i.encode("utf-8"))
list_b
Output:
['adc', 'toto', 'tomato']
Or you can use the str function:
list_c = list()
for i in list_a:
    list_c.append(str(i))
list_c
Output:
['adc', 'toto', 'tomato']
If the strings really contain the two characters u', call replace with "u\'".
For example:
l = [ "u'adc", "u'toto", "u'tomato"]
for item in l:
    print(item.replace("u\'", ""))
Will output:
adc
toto
tomato
I tried the code from your question, but it raises a syntax error, which means the strings in the list are not declared properly. I have corrected that at input In [2] below.
In [1]: list = [ 'u'adc', 'u'toto', 'u'tomato']
File "<ipython-input-1-2c6e581e868e>", line 1
list = [ 'u'adc', 'u'toto', 'u'tomato']
^
SyntaxError: invalid syntax
In [2]: list = [ u'adc', u'toto', u'tomato']
In [3]: list = [ str(item) for item in list ]
In [4]: list
Out[4]: ['adc', 'toto', 'tomato']
Solution-1
input_list = [u'adc', u'toto', u'tomato']
output_list = list(map(str, input_list))
print(output_list)
And the output looks like:
['adc', 'toto', 'tomato']
Solution-2 (Python 2 only; on Python 3, encode() returns bytes, so the items would print as b'adc' and so on)
input_list = [u'adc', u'toto', u'tomato']
output_list = list(map(lambda x: x.encode("utf-8"), input_list))
print(output_list)
And the output looks like:
['adc', 'toto', 'tomato']
Try this:
new_list = []
for item in list:
    # Strings are immutable, so build a new string without the 'u' characters
    new_list.append(''.join(ch for ch in item if ch != 'u'))
This goes through every string in the list and checks each character for 'u'. If 'u' is found, then it is dropped, essentially deleting it. Note that this removes every 'u', including any inside the word itself. Some more code could allow this to check for combinations of letters ('abc', etc.).
Your input items can be treated as JSON strings! You can dump each item in the list with json.dumps to get the desired output.
Since the dumped output comes with quotes, you need to strip them (leading and trailing)!
import json
list = [ u'adc', u'toto', u'tomato']
print([json.dumps(i).strip('"') for i in list])
Output:
['adc', 'toto', 'tomato']
Hope it helps!
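For completeness, the loop in the question produces single characters because list.extend() iterates over its argument, and iterating over a string yields its characters. The sketch below assumes the items really are strings that begin with the two characters u' (as in the "u\'" answer above); the names items, prefix and cleaned are only for illustration.

```python
items = ["u'adc", "u'toto", "u'tomato"]

# extend() adds each element of the iterable, so a string is split into characters
chars = []
chars.extend("adc")
print(chars)  # ['a', 'd', 'c']

# Build a new list instead of mutating the one being iterated over
prefix = "u'"
cleaned = [s[len(prefix):] if s.startswith(prefix) else s for s in items]
print(cleaned)  # ['adc', 'toto', 'tomato']
```

Using append() instead of extend() would also have kept each string whole, although removing items from a list while looping over it is still best avoided.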

counting the number of co-occurences in a list

I have an array consisting of a set of lists of strings (can assume each string is a single word).
I want an efficient way, in Python, to count pairs of words in this array.
It is not collocation or bi-grams, as each word in the pair may be in any position on the list.
It's unclear what your list looks like. Is it something like:
li = ['hello','bye','hi','good','bye','hello']
If so the solution is simple:
In [1342]: [i for i in set(li) if li.count(i) > 1]
Out[1342]: ['bye', 'hello']
Otherwise if it is like:
li = [['hello'],['bye','hi','good'],['bye','hello']]
Then:
In [1378]: f = []
In [1379]: for x in li:
..........     for i in x:
..........         f.append(i)
In [1380]: f
Out[1380]: ['hello', 'bye', 'hi', 'good', 'bye', 'hello']
In [1381]: [i for i in set(f) if f.count(i) > 1]
Out[1381]: ['bye', 'hello']
>>> from itertools import chain, combinations
>>> from collections import Counter
>>> L = [['foo', 'bar'], ['apple', 'orange', 'mango'], ['bar']]
>>> c = Counter(frozenset(x) for x in combinations(chain.from_iterable(L), r=2))
>>> c
Counter({frozenset(['mango', 'bar']): 2, frozenset(['orange', 'bar']): 2, frozenset(['foo', 'bar']): 2, frozenset(['bar', 'apple']): 2, frozenset(['orange', 'apple']): 1, frozenset(['foo', 'apple']): 1, frozenset(['bar']): 1, frozenset(['orange', 'mango']): 1, frozenset(['foo', 'mango']): 1, frozenset(['mango', 'apple']): 1, frozenset(['orange', 'foo']): 1})
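Note that the snippet above pairs words across the whole flattened input, including words that come from different inner lists. If a pair should only count when both words appear in the same inner list, which is one possible reading of the question, a sketch could look like this (count_cooccurrences and the sample data are made up for illustration):

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(lists):
    """Count unordered word pairs that occur together in the same inner list."""
    counts = Counter()
    for words in lists:
        # De-duplicate and sort so each pair has one canonical tuple form
        unique = sorted(set(words))
        counts.update(combinations(unique, 2))
    return counts


L = [['foo', 'bar'], ['apple', 'orange', 'mango'], ['bar', 'foo', 'apple']]
counts = count_cooccurrences(L)
print(counts[('bar', 'foo')])  # 2
```

Sorting each list's unique words means ('bar', 'foo') and ('foo', 'bar') are counted as the same pair without needing frozenset keys.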
