Removing phrases in reverse order from a List - python

I have two lists.
L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop'] # bigrams list
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always'] #trigrams list
If I look closely, L1 has 'not worry' and 'good very' which are exact reversed repetitions of 'worry not' and 'very good'.
I need to remove such reversed elements from the list. Similarly, in L2, 'happy be always' is the reverse of 'always be happy' and should be removed as well.
The final output I'm looking for is:
L1 = ['worry not', 'be happy', 'very good', 'full stop']
L2 = ['take into account', 'always be happy', 'stay safe friend']
I tried one solution
[[max(zip(map(set, map(str.split, group)), group))[1]] for group in L1]
But it is not giving the correct output.
Should I write separate functions for removing reversed repetitions from bigrams and from trigrams, or is there a faster, Pythonic way of doing this? I'll have to run this on about 10K+ strings.

You can do it with a list comprehension if you iterate over the list from the end:
lst = L1[::-1] # L2[::-1]
x = [s for i, s in enumerate(lst) if ' '.join(s.split()[::-1]) not in lst[i+1:]][::-1]
# L1: ['worry not', 'be happy', 'very good', 'full stop']
# L2: ['take into account', 'always be happy', 'stay safe friend']

You can use an index set and add both direct and reversed n-grams to it:
index = set()
res = []
for x in L1:
    a = tuple(x.split())
    b = tuple(reversed(a))
    if a in index or b in index:
        continue
    index.add(a)
    index.add(b)
    res.append(x)
print(res)
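The same idea can be wrapped in one function that handles bigrams, trigrams, or any other n-gram length, so separate functions should not be needed. This is only a minimal sketch; the name `drop_reversed` is an assumption for illustration and not part of the original answer. Each phrase costs a set lookup, so the work grows roughly linearly with the number of strings.

```python
def drop_reversed(phrases):
    """Keep the first occurrence of each phrase; drop later phrases that are
    the word-for-word reverse (or an exact repeat) of one already kept."""
    seen = set()   # word tuples of kept phrases, plus their reversals
    kept = []
    for phrase in phrases:
        words = tuple(phrase.split())
        if words in seen:
            continue
        seen.add(words)
        seen.add(words[::-1])
        kept.append(phrase)
    return kept


L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop']
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always']

print(drop_reversed(L1))  # ['worry not', 'be happy', 'very good', 'full stop']
print(drop_reversed(L2))  # ['take into account', 'always be happy', 'stay safe friend']
```

Because both the word tuple and its reversal are stored, a single membership test per phrase is enough.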

Using a set of tuples is the way to deal with this:
L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop'] # bigrams list
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always'] #trigrams list
for list_ in L1, L2:
    s = set()
    for e in list_:
        t = tuple(e.split())
        if t[::-1] not in s:
            s.add(t)
    print([' '.join(e) for e in s])
Output:
['be happy', 'worry not', 'very good', 'full stop']
['always be happy', 'stay safe friend', 'take into account']

L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop'] # bigrams list
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always'] #trigrams list
def solution(lst):
    res = []
    for item in lst:
        if " ".join(item.split()[::-1]) not in res:
            res.append(item)
    return res
print(solution(L2))

My solution iterates over each element in the list, splits it into a list of words and sorts that list, then does the same for each later element. If the two sorted lists match, the later element is removed.
Here is my code:
L1 = ['worry not', 'be happy', 'very good', 'not worry', 'good very', 'full stop'] # bigrams list
L2 = ['take into account', 'always be happy', 'stay safe friend', 'happy be always'] #trigrams list
def remove_duplicates(L):
    for idx_i, l_i in enumerate(L):
        aux_i = l_i.split()
        aux_i.sort()
        for idx_j, l_j in enumerate(L[idx_i+1:]):
            aux_j = l_j.split()
            aux_j.sort()
            if aux_i == aux_j:
                L.pop(idx_i + idx_j + 1)
    print(L)
remove_duplicates(L1)
remove_duplicates(L2)
The output is what you're looking for:
>>> remove_duplicates(L1)
['worry not', 'be happy', 'very good', 'full stop']
>>> remove_duplicates(L2)
['take into account', 'always be happy', 'stay safe friend']
Hope this works for you
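One thing worth checking before relying on this approach: comparing sorted word lists treats any reordering of the same words as a duplicate, not only an exact reversal. For bigrams the two notions coincide, but for trigrams and longer phrases they can differ. The sketch below uses hypothetical helper names (`is_reversal`, `is_permutation`) and a made-up reordered phrase to show the difference.

```python
def is_reversal(a, b):
    """True if b has the words of a in exactly reversed order."""
    return a.split()[::-1] == b.split()


def is_permutation(a, b):
    """True if a and b contain the same words in any order."""
    return sorted(a.split()) == sorted(b.split())


print(is_reversal('always be happy', 'happy be always'))     # True
print(is_permutation('always be happy', 'happy be always'))  # True

# Same words, but not a reversal: the sorted comparison still flags it.
print(is_reversal('always be happy', 'be always happy'))     # False
print(is_permutation('always be happy', 'be always happy'))  # True
```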

This is a possible solution (the complexity is linear with respect to the number of strings):
from collections import defaultdict
from operator import itemgetter
d = defaultdict(list)
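for s in L2:
    words = tuple(s.split())
    # Use the larger of the word tuple and its reversal as a canonical key,
    # so a phrase and its reverse land in the same bucket
    d[max(words, words[::-1])].append(s)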
result = list(map(itemgetter(0), d.values()))
Here are the results:
['worry not', 'be happy', 'very good', 'full stop']
['take into account', 'always be happy', 'stay safe friend']

Related

How do I create a new list of tuples?

I have this homework problem, and I'm new to python. I have this list of tuples:
[('the, this is me', 'the night'), ('the night', 'me'), ('me', 'the store')]
My code doesn't work when I'm trying to write to target_bigrams with only the tuples that have "the" in position [0]. Please help.
target_bigrams = ()
bigrams_length = len(bigrams)
for i in range(bigrams_length):
    if i[0] == target_word:
        target_bigrams.append(i[0])
I think this is what you need;
bigrams = [('the, this is me', 'the night'), ('the night', 'me'), ('me', 'the store')]
target_word = 'the'
target_bigrams = []
bigrams_length = len(bigrams)
for i in range(bigrams_length):
    if bigrams[i][0].startswith(target_word):
        target_bigrams.append(bigrams[i][0])
The question is not clear, but I believe you want to select the tuples that have "the" in the first position.
If that is the case, here is some sample code for reference:
lst = [('the, this is me', 'the night'), ('the night', 'me'), ('me', 'the store'), ('the', 'the store'),("the","How are you")]
target_list = []
target_word = "the"
for i in range(len(lst)):
    if target_word == lst[i][0]:
        target_list.append(lst[i])
for i in target_list:
    print(i)
Output:
('the', 'the store')
('the', 'How are you')
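The same filter can also be written as a list comprehension that iterates over the tuples directly instead of over indices. This is a small sketch reusing the sample data above; whether to match exactly (`==`) or by prefix (`startswith`) depends on what the question intends, so both are shown.

```python
bigrams = [('the, this is me', 'the night'), ('the night', 'me'),
           ('me', 'the store'), ('the', 'the store'), ('the', 'How are you')]
target_word = 'the'

# Tuples whose first element is exactly the target word
exact = [pair for pair in bigrams if pair[0] == target_word]

# Tuples whose first element merely starts with the target word
prefixed = [pair for pair in bigrams if pair[0].startswith(target_word)]

print(exact)     # [('the', 'the store'), ('the', 'How are you')]
print(prefixed)
```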

How to sort a dictionary by value

I am trying to sort a dictionary by value, which is a timestamp in the format H:MM:SS (eg "0:41:42") but the code below doesn't work as expected:
album_len = {
'The Piper At The Gates Of Dawn': '0:41:50',
'A Saucerful of Secrets': '0:39:23',
'More': '0:44:53', 'Division Bell': '1:05:52',
'The Wall': '1:17:46',
'Dark side of the moon': '0:45:18',
'Wish you were here': '0:44:17',
'Animals': '0:41:42'
}
album_len = OrderedDict(sorted(album_len.items()))
This is the output I get:
OrderedDict([
('A Saucerful of Secrets', '0:39:23'),
('Animals', '0:41:42'),
('Dark side of the moon', '0:45:18'),
('Division Bell', '1:05:52'),
('More', '0:44:53'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('The Wall', '1:17:46'),
('Wish you were here', '0:44:17')])
It's not supposed to be like that. The first element I expected to see is ('The Wall', '1:17:46'), the longest one.
How do I get the elements sorted the way I intended?
Try converting each value to a datetime and using that as the key:
from collections import OrderedDict
from datetime import datetime
def convert_to_datetime(val):
    return datetime.strptime(val, "%H:%M:%S")
album_len = {'The Piper At The Gates Of Dawn': '0:41:50',
             'A Saucerful of Secrets': '0:39:23', 'More': '0:44:53',
             'Division Bell': '1:05:52', 'The Wall': '1:17:46',
             'Dark side of the moon': '0:45:18',
             'Wish you were here': '0:44:17', 'Animals': '0:41:42'}
album_len = OrderedDict(
    sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]))
)
print(album_len)
Output:
OrderedDict([('A Saucerful of Secrets', '0:39:23'), ('Animals', '0:41:42'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('Wish you were here', '0:44:17'), ('More', '0:44:53'),
('Dark side of the moon', '0:45:18'), ('Division Bell', '1:05:52'),
('The Wall', '1:17:46')])
Or in descending order with reverse set to True:
album_len = OrderedDict(
    sorted(
        album_len.items(),
        key=lambda i: convert_to_datetime(i[1]),
        reverse=True
    )
)
Output:
OrderedDict([('The Wall', '1:17:46'), ('Division Bell', '1:05:52'),
('Dark side of the moon', '0:45:18'), ('More', '0:44:53'),
('Wish you were here', '0:44:17'),
('The Piper At The Gates Of Dawn', '0:41:50'),
('Animals', '0:41:42'), ('A Saucerful of Secrets', '0:39:23')])
Edit: If only insertion order needs to be maintained and OrderedDict-specific methods such as move_to_end are not going to be used, then a regular Python dict also works here on Python 3.7+.
Ascending:
album_len = dict(
    sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]))
)
Descending:
album_len = dict(
    sorted(album_len.items(), key=lambda i: convert_to_datetime(i[1]),
           reverse=True)
)
This is a duplicate of the question "How do I sort a dictionary by value?"
>>> dict(sorted(album_len.items(), key=lambda item: item[1]))
{'A Saucerful of Secrets': '0:39:23',
'Animals': '0:41:42',
'The Piper At The Gates Of Dawn': '0:41:50',
'Wish you were here': '0:44:17',
'More': '0:44:53',
'Dark side of the moon': '0:45:18',
'Division Bell': '1:05:52',
'The Wall': '1:17:46'}
Note: the time format already sorts correctly in lexicographic order, so you don't need to convert to datetime.
See the comment below by @DarrylG. He's totally right: the remark on lexicographic order is valid only as long as the duration does not exceed 9:59:59, unless the hours are padded with a leading zero.
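To illustrate that caveat, here is a small sketch with made-up durations (not from the album data) in which one value reaches ten hours. Plain string comparison puts '10:00:00' before '9:59:59', whereas a key that converts to seconds sorts as intended. The helper name `to_seconds` is an assumption for illustration.

```python
def to_seconds(duration):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = map(int, duration.split(':'))
    return hours * 3600 + minutes * 60 + seconds


durations = {'short': '0:41:42', 'long': '9:59:59', 'longest': '10:00:00'}

by_string = dict(sorted(durations.items(), key=lambda item: item[1]))
by_seconds = dict(sorted(durations.items(), key=lambda item: to_seconds(item[1])))

print(list(by_string))   # ['short', 'longest', 'long']
print(list(by_seconds))  # ['short', 'long', 'longest']
```

A numeric key like this also copes with durations of 24 hours or more, which `%H` in `datetime.strptime` would reject.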

How do I convert a list of strings to a proper sentence

How do I convert a list of strings to a proper sentence like this?
lst = ['eat', 'drink', 'dance', 'sleep']
string = 'I love'
output: "I love to eat, drink, dance and sleep."
Note: the "to" needs to be generated and not added manually to string
Thanks!
You can join all the verbs except the last with commas, and add the last with an and
def build(start, verbs):
    return f"{start} to {', '.join(verbs[:-1])} and {verbs[-1]}."
string = 'I love'
lst = ['eat', 'drink', 'dance', 'sleep']
print(build(string, lst)) # I love to eat, drink, dance and sleep.
lst = ['eat', 'drink', 'dance', 'sleep', 'run', 'walk', 'count']
print(build(string, lst)) # I love to eat, drink, dance, sleep, run, walk and count.
One option, using list to string joining:
import re
lst = ['eat', 'drink', 'dance', 'sleep']
string = 'I love'
output = string + ' to ' + ', '.join(lst)
output = re.sub(r', (?!.*,)', ' and ', output)
print(output) # I love to eat, drink, dance and sleep
Note that the call to re.sub above selectively replaces the final comma with and.
Hey, you can combine the string elements of a list into a bigger string by doing the following:
verbs = ", ".join(lst[:-1]) # This will result in "eat, drink, dance"
verbs = verbs + " and " + lst[-1] # This will result in "eat, drink, dance and sleep"
string = string + ' to ' + verbs # This will result in "I love to eat, drink, dance and sleep"
print(string)
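The answers above assume the list holds at least two verbs. Below is a minimal sketch that also covers a single-verb list; the function name `to_sentence` is hypothetical, and returning just the start phrase for an empty list is an assumption about the desired behavior.

```python
def to_sentence(start, verbs):
    """Build 'start to a, b and c.' from a list of verbs."""
    if not verbs:
        # Assumed behavior: nothing to append, so just close the sentence
        return f"{start}."
    if len(verbs) == 1:
        return f"{start} to {verbs[0]}."
    return f"{start} to {', '.join(verbs[:-1])} and {verbs[-1]}."


print(to_sentence('I love', ['eat', 'drink', 'dance', 'sleep']))
# I love to eat, drink, dance and sleep.
print(to_sentence('I love', ['eat']))
# I love to eat.
```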

Python if statement on list of strings with any statements on desirable / exclusion lists of keywords

I'm trying to check whether the items in a list of strings contain substrings that belong to one list of strings (a desirable list) but not to another (an exclusion list). Here's an example of what I'm trying to do:
worthwhile_gifts = []
wishlist = ['dog', 'cat', 'horse', 'pony']
gifts = ['a dog', 'a bulldog', 'a cartload of cats', 'Mickey doghouse', 'blob fish']
# Checking that various Xmas gifts include items from wishlist
for gift in gifts:
    if any(i in gift for i in wishlist):
        worthwhile_gifts.append(gift)
Looking at the result, we get what we expect this way
>>> print(worthwhile_gifts)
['a dog', 'a bulldog', 'a cartload of cats', 'Mickey doghouse']
Now what I'm trying to do is check the list of gifts against the following two lists (I want items from wishlist but not from blocklist), and I'm having a hard time writing the if condition with two any statements in it:
wishlist = ['dog', 'cat', 'horse', 'pony']
blocklist = ['bulldog', 'caterpillar', 'workhorse']
# Expected result would exclude 'bulldog'
>>> print(worthwhile_gifts)
['a dog', 'a cartload of cats', 'Mickey doghouse']
Any idea how to construct this if statement? I tried if (any(i in gift for i in wishlist)) and (any(i in gift for i not in blocklist)) but this doesn't work.
You're close. You need to check that the gift contains nothing from the blocklist (all & not in):
wishlist = ['dog', 'cat', 'horse', 'pony']
blocklist = ['bulldog', 'caterpillar', 'workhorse']
gifts = ['a dog', 'a bulldog', 'a cartload of cats', 'Mickey doghouse', 'blob fish']
worthwhile_gifts = []
for gift in gifts:
    if any(i in gift for i in wishlist) and all(i not in gift for i in blocklist):
        worthwhile_gifts.append(gift)
print(worthwhile_gifts)
Result:
['a dog', 'a cartload of cats', 'Mickey doghouse']
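The same condition also fits in a list comprehension. Here is a sketch with a hypothetical helper, `is_worthwhile`; note that these are plain substring tests, so 'dog' matches inside 'doghouse' just as it does in the loop above.

```python
def is_worthwhile(gift, wishlist, blocklist):
    """True if gift contains a wishlist keyword and no blocklist keyword."""
    wanted = any(word in gift for word in wishlist)
    blocked = any(word in gift for word in blocklist)
    return wanted and not blocked


wishlist = ['dog', 'cat', 'horse', 'pony']
blocklist = ['bulldog', 'caterpillar', 'workhorse']
gifts = ['a dog', 'a bulldog', 'a cartload of cats', 'Mickey doghouse', 'blob fish']

worthwhile_gifts = [g for g in gifts if is_worthwhile(g, wishlist, blocklist)]
print(worthwhile_gifts)  # ['a dog', 'a cartload of cats', 'Mickey doghouse']
```

`not any(...)` over the blocklist is equivalent to the `all(i not in gift ...)` form used in the answer.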

Using Python Higher Order Functions to Manipulate Lists

I've made this list; each item is a string that contains commas (in some cases) and a colon (always):
dinner = [
'cake,peas,cheese : No',
'duck,broccoli,onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes,bacon,carrots,water: Yes',
'rats,hats : Definitely Not',
'seltzer : Yes',
'sleeping,whining,spitting : No Way',
'marmalade : No'
]
I would like to create a new list from the one above as follows:
['cake : No',
'peas : No',
'cheese : No',
'duck : Maybe',
'broccoli : Maybe',
'onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes : Yes',
'bacon : Yes',
'carrots : Yes',
'water : Yes',
'rats : Definitely Not',
'hats : Definitely Not',
'seltzer : Yes',
'sleeping : No Way',
'whining : No Way',
'spitting : No Way',
'marmalade : No']
But I'd like to know if, and how, it's possible to do so in a line or two of efficient code that primarily employs Python's higher-order functions. Here is my attempt:
reduce(lambda x,y: x + y, (map(lambda x: x.split(':')[0].strip().split(','), dinner)))
...produces this:
['cake',
'peas',
'cheese',
'duck',
'broccoli',
'onions',
'motor oil',
'pizza',
'ice cream',
'bologna',
'potatoes',
'bacon',
'carrots',
'water',
'rats',
'hats',
'seltzer',
'sleeping',
'whining',
'spitting',
'marmalade']
...but I'm struggling with appending the piece of each string after the colon back onto each item.
I would create a dict using zip, map and itertools.repeat:
from itertools import repeat
data = ({k.strip(): v.strip() for _k, _v in map(lambda x: x.split(":"), dinner)
for k, v in zip(_k.split(","), repeat(_v))})
from pprint import pprint as pp
pp(data)
Output:
{'bacon': 'Yes',
'bologna': 'No',
'broccoli': 'Maybe',
'cake': 'No',
'carrots': 'Yes',
'cheese': 'No',
'duck': 'Maybe',
'hats': 'Definitely Not',
'ice cream': 'Maybe',
'marmalade': 'No',
'motor oil': 'Definitely Not',
'onions': 'Maybe',
'peas': 'No',
'pizza': 'Damn Right',
'potatoes': 'Yes',
'rats': 'Definitely Not',
'seltzer': 'Yes',
'sleeping': 'No Way',
'spitting': 'No Way',
'water': 'Yes',
'whining': 'No Way'}
Or using the dict constructor:
from itertools import repeat
data = dict(map(str.strip, t) for _k, _v in map(lambda x: x.split(":"), dinner)
for t in zip(_k.split(","), repeat(_v)))
from pprint import pprint as pp
pp(data)
If you really want a list of strings, we can do something similar using itertools.chain and joining the substrings:
from itertools import repeat, chain
data = chain.from_iterable(map(":".join, zip(_k.split(","), repeat(_v)))
for _k, _v in map(lambda x: x.split(":"), dinner))
from pprint import pprint as pp
pp(list(data))
Output:
['cake: No',
'peas: No',
'cheese : No',
'duck: Maybe',
'broccoli: Maybe',
'onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes: Yes',
'bacon: Yes',
'carrots: Yes',
'water: Yes',
'rats: Definitely Not',
'hats : Definitely Not',
'seltzer : Yes',
'sleeping: No Way',
'whining: No Way',
'spitting : No Way',
'marmalade : No']
This assumes you really need a list of strings rather than a dictionary, which looks like the better data structure here.
Simply by using comprehensions you can do this:
>>> [[x+':'+y for x in i.split(',')]
... for i, y in map(lambda l: map(str.strip, l.split(':')), dinner)]
[['cake:No', 'peas:No', 'cheese:No'],
['duck:Maybe', 'broccoli:Maybe', 'onions:Maybe'],
['motor oil:Definitely Not'],
...
['marmalade:No']]
Now just add up the lists:
>>> from functools import reduce  # reduce is not a builtin in Python 3
>>> from operator import add
>>> reduce(add, ([x+':'+y for x in i.split(',')]
... for i, y in map(lambda l: map(str.strip, l.split(':')), dinner)), [])
['cake:No',
'peas:No',
'cheese:No',
'duck:Maybe',
...
'marmalade:No']
Or just flatten the list:
>>> [a for i, y in map(lambda l: map(str.strip, l.split(':')), dinner)
... for a in (x+':'+y for x in i.split(','))]
['cake:No',
'peas:No',
'cheese:No',
'duck:Maybe',
...
'marmalade:No']
This may work:
def processList(aList):
    finalList = []
    for aListEntry in aList:
        aListEntry_entries = aListEntry.split(':')
        aListEntry_list = aListEntry_entries[0].split(',')
        for aListEntry_list_entry in aListEntry_list:
            finalList.append(aListEntry_list_entry.strip() + ' : ' + aListEntry_entries[1].strip())
    return finalList
List comprehensions are often preferred in Python for their legibility (at least for some ;).
The code below demonstrates two types of list comprehension nesting: the first basically chains the operations, while the second produces one list from two nested loops.
If you make your data more consistent by adding a space after water in the 'potatoes,bacon,carrots,water: Yes' entry, you can get rid of the two .strip() calls ;)
dinner = [
'cake,peas,cheese : No',
'duck,broccoli,onions : Maybe',
'motor oil : Definitely Not',
'pizza : Damn Right',
'ice cream : Maybe',
'bologna : No',
'potatoes,bacon,carrots,water : Yes',
'rats,hats : Definitely Not',
'seltzer : Yes',
'sleeping,whining,spitting : No Way',
'marmalade : No'
]
prefs = [(pref, items.split(',')) for items, pref in [it.split(" : ") for it in dinner]]
[" : ".join([item, pref]) for pref, items in prefs for item in items]
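For completeness, here is a sketch that stays close to the higher-order style the question asks for and produces the ' : ' separator exactly as in the desired output, without first normalizing the data. It relies on `str.partition` and `itertools.chain.from_iterable`; the helper name `expand` is an assumption, and the list is shortened to three entries for brevity.

```python
from itertools import chain


def expand(entry):
    """Turn 'a,b : Pref' into ['a : Pref', 'b : Pref']."""
    items, _, pref = entry.partition(':')
    return [f"{item.strip()} : {pref.strip()}" for item in items.split(',')]


dinner = [
    'cake,peas,cheese : No',
    'motor oil : Definitely Not',
    'potatoes,bacon,carrots,water: Yes',
]

# map applies expand to every entry; chain.from_iterable flattens the result
result = list(chain.from_iterable(map(expand, dinner)))
print(result)
```

Stripping both sides inside `expand` means the inconsistent spacing around the colon in the 'water: Yes' entry does not matter.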
