Select strings by positions of words


For the following tuple:
mysentence = ('i have a dog and a cat', 'i have a cat and a dog', 'i have a cat', 'i have a dog')
How can I select only the strings 'i have a cat' and 'i have a dog', i.e. exclude strings that have the word dog or cat in the middle?

You can do this with regular expressions. The regex .+(dog|cat).+ matches one or more characters, followed by dog or cat, followed by one or more further characters. You can then use filter to keep only the strings that don't match this regex:
import re
regex = re.compile(r'.+(dog|cat).+')
sentence = ('i have a dog and a cat', 'i have a cat and a dog', 'i have a cat', 'i have a dog')
# filter returns a lazy iterator in Python 3, so wrap it in list() to see the result
filtered_sentence = list(filter(lambda s: not regex.match(s), sentence))

You could use a Regular Expression to match the sentences you don't want.
We can build up the pattern as follows:
We want to match dog or cat - (dog|cat)
followed by a space, i.e. not at the end of the line
So our code looks like so:
>>> mysentence = ('i have a dog and a cat', 'i have a cat and a dog', 'i have a cat', 'i have a dog')
>>> import re
>>> pattern = re.compile("(dog|cat) ")
>>> [x for x in mysentence if not pattern.search(x)]
['i have a cat', 'i have a dog']

If the string should just end with a specific phrase then this will do the job:
phrases = ("I have a cat", "I have a dog")
for sentence in mysentence:
    for phrase in phrases:
        if sentence.lower().endswith(phrase.lower()):
            print(sentence)

Simplest thing that could possibly work:
In [10]: [phrase for phrase in mysentence if not ' and ' in phrase]
Out[10]: ['i have a cat', 'i have a dog']

You can use regular expressions or string methods.
Others have answered with regex, so I'll try string methods: str.find() gives you the position of a substring in a string. Then check whether that position is in the middle of the sentence.
def filter_function(sentence, words):
    for word in words:
        p = sentence.find(word)
        # p > 0: the word was found and is not at the start;
        # p < len(sentence) - len(word): it is not at the end either
        if p > 0 and p < len(sentence) - len(word):
            return 0
    return 1
for sentence in mysentence:
    print('%s: %d' % (sentence, filter_function(sentence, ['dog', 'cat'])))
You also need to decide what should happen when the sentence consists only of 'cat'.
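The position check above works on substrings, so it would also react to a word such as 'catalog'. Below is a minimal sketch of a word-based variant, assuming words are separated by whitespace and that a keyword is acceptable only as the last word. The helper name keyword_only_at_end is made up for illustration and is not part of the original answer.

```python
def keyword_only_at_end(sentence, keywords):
    """Return True if no keyword appears before the last word of the sentence."""
    words = sentence.split()
    # Every word except the last one must not be a keyword.
    return not any(word in keywords for word in words[:-1])

mysentence = ('i have a dog and a cat', 'i have a cat and a dog',
              'i have a cat', 'i have a dog')
selected = [s for s in mysentence if keyword_only_at_end(s, {'dog', 'cat'})]
print(selected)  # ['i have a cat', 'i have a dog']
```

Note that this sketch also keeps sentences that contain no keyword at all, as well as a sentence consisting of a single keyword; whether that is desirable depends on the requirements.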

for items in mysentence:
    if (items.find("dog") >= 0) ^ (items.find("cat") >= 0):
        print(items)
You just need the xor operator and the find function, which together keep the strings that contain exactly one of the two words. No need to import anything.

Related

How do I convert a list of strings to a proper sentence

How do I convert a list of strings to a proper sentence like this?
lst = ['eat', 'drink', 'dance', 'sleep']
string = 'I love'
output: "I love to eat, drink, dance and sleep."
Note: the "to" needs to be generated and not added manually to string
Thanks!

You can join all the verbs except the last with commas, and add the last with an and
def build(start, verbs):
    return f"{start} to {', '.join(verbs[:-1])} and {verbs[-1]}."
string = 'I love'
lst = ['eat', 'drink', 'dance', 'sleep']
print(build(string, lst)) # I love to eat, drink, dance and sleep.
lst = ['eat', 'drink', 'dance', 'sleep', 'run', 'walk', 'count']
print(build(string, lst)) # I love to eat, drink, dance, sleep, run, walk and count.
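The join-then-append pattern above assumes at least two verbs; with a single verb it would produce "I love to  and eat.". A possible sketch that also covers short lists follows. The name build_sentence and the choice to return just the start plus a period for an empty list are assumptions made for illustration.

```python
def build_sentence(start, verbs):
    """Join verbs into 'start to a, b and c.' while handling short lists."""
    if not verbs:
        # Assumption: with nothing to list, return just the start with a period.
        return f"{start}."
    if len(verbs) == 1:
        return f"{start} to {verbs[0]}."
    return f"{start} to {', '.join(verbs[:-1])} and {verbs[-1]}."

print(build_sentence('I love', ['eat']))                    # I love to eat.
print(build_sentence('I love', ['eat', 'drink']))           # I love to eat and drink.
print(build_sentence('I love', ['eat', 'drink', 'dance']))  # I love to eat, drink and dance.
```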

One option, using list to string joining:
import re
lst = ['eat', 'drink', 'dance', 'sleep']
string = 'I love'
output = string + ' to ' + ', '.join(lst)
output = re.sub(r', (?!.*,)', ' and ', output)
print(output) # I love to eat, drink, dance and sleep
Note that the call to re.sub above selectively replaces the final comma with and.

You can combine the string elements of a list into a bigger string like this:
verbs = ", ".join(lst[:-1]) # This will result in "eat, drink, dance"
verbs = verbs + " and " + lst[-1] # This will result in "eat, drink, dance and sleep"
string = string + ' to ' + verbs # This will result in "I love to eat, drink, dance and sleep"
print(string)

Remove first word from the sentence and return remaining string

Write a function that is given a phrase and returns the phrase we get if we take
out the first word from the input phrase.
For example, given ‘the quick brown fox’, your function should return ‘quick brown fox’
This is my code:
def whatistheremainder(v):
    remainderforone = v.split(' ', 1)
    outcome = remainderforone[1:]
    return outcome
Instead of getting a sensible output like:
'quick brown fox'
I am getting something like this:
['quick brown fox']
Please help

Is this what you wanted?
def whatistheremainder(v):
    remainderforone = v.split(' ', 1)
    outcome = remainderforone[1:][0]
    return outcome
print(whatistheremainder('the quick brown fox'))
Output
quick brown fox

Your logic can be further simplified in one line by setting maxsplit param as 1 for str.split() function as:
>>> my_string = 'the quick brown fox'
>>> my_string.split(' ', 1)[1]
'quick brown fox'
This will raise IndexError if your string has only one word or none.
Another alternative uses string slicing with str.index(...):
>>> my_string[my_string.index(' ')+1:]
'quick brown fox'
Like the earlier solution, this one does not work for strings with one word or none; it raises a ValueError exception.
To handle strings with one word or none, you can use the first solution with the maxsplit param, but take a list slice instead of indexing and join the result back:
>>> ''.join(my_string.split(' ', 1)[1:])
'quick brown fox'
The issue with your code is that you need to join the list of strings you are returning, using ' '.join(outcome). Hence, your function becomes:
def whatistheremainder(v):
    remainderforone = v.split(' ', 1)
    outcome = remainderforone[1:]
    return ' '.join(outcome)
Sample run:
>>> whatistheremainder('the quick brown fox')
'quick brown fox'
Your logic above, which splits the string into words and joins them back while skipping the first word, can also be written in one line:
>>> ' '.join(my_string.split()[1:])
'quick brown fox'

[1:] takes a slice from the list, which is itself a list:
>>> remainderforone
['the', 'quick brown fox']
>>> remainderforone[1:]
['quick brown fox']
Here the slice notation [1:] says to slice everything from index 1 (the second item) to the end of the list. There are only two items in the list, so you get a list of size one because the first item is skipped over.
To fix this, extract a single element from the list. The list should contain two elements and you want the second one, so use index 1:
>>> remainderforone[1]
'quick brown fox'
As a more general solution you might want to consider using str.partition():
for s in ['the quick brown fox', 'hi there', 'single', '', 'abc\tefg']:
    first, sep, rest = s.partition(' ')
    print((first, sep, rest))
('the', ' ', 'quick brown fox')
('hi', ' ', 'there')
('single', '', '')
('', '', '')
('abc\tefg', '', '')
Depending on how you want to handle those cases where no partitioning occurred you could just return rest, or possibly first:
def whatistheremainder(v):
    first, sep, rest = v.partition(' ')
    return rest
for s in ['the quick brown fox', 'hi there', 'single', '', 'abc\tefg']:
    print(repr(whatistheremainder(s)))
'quick brown fox'
'there'
''
''
''
Or you could argue that the original string should be returned if no partitioning occurred because there was no first word to remove. You can use the fact that sep will be an empty string if no partitioning occurred:
def whatistheremainder(v):
    first, sep, rest = v.partition(' ')
    return rest if sep else first
for s in ['the quick brown fox', 'hi there', 'single', '', 'abc\tefg']:
    print(repr(whatistheremainder(s)))
'quick brown fox'
'there'
'single'
''
'abc\tefg'
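The partition examples above split only on a single space, so 'abc\tefg' counts as one word. If any run of whitespace should count as a separator, one option is str.split with sep=None. This is a sketch under that assumption; the function name remainder_after_first_word is hypothetical, and returning an empty string when nothing follows is one of the two choices discussed above.

```python
def remainder_after_first_word(phrase):
    """Drop the first whitespace-delimited word; return '' if nothing follows."""
    # sep=None splits on any run of whitespace and ignores leading whitespace.
    parts = phrase.split(None, 1)
    return parts[1] if len(parts) > 1 else ''

for s in ['the quick brown fox', 'hi there', 'single', '', 'abc\tefg']:
    print(repr(remainder_after_first_word(s)))
# 'quick brown fox'
# 'there'
# ''
# ''
# 'efg'
```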

def whatistheremainder(v):
    remainderforone = v.split(' ', 1)
    outcome = v if len(remainderforone) == 1 else ''.join(remainderforone[1:])
    return outcome
The line outcome = v if len(remainderforone) == 1 else ''.join(remainderforone[1:]) checks the length of the split result. If the length is 1, there is only one word, so outcome equals v (the string entered). Otherwise there is more than one word, and outcome equals the entered string without its first word.

def whatistheremainder(v):
    remainderforone = v.split(' ', 1)
    outcome = ''.join(remainderforone[1:])
    return outcome

split string by using regex in python

What is the best way to split a string like
text = "hello there how are you"
in Python?
So I'd end up with an array like such:
['hello there', 'there how', 'how are', 'are you']
I have tried this:
liste = re.findall(r'((\S+\W*){' + str(2) + '})', text)
for a in liste:
    print(a[0])
But I'm getting:
hello there
how are
you
How can I make the findall function move only one token when searching?

Here's a solution with re.findall:
>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']
Have a look at the Python docs for re: https://docs.python.org/3/library/re.html
(?=...) Lookahead assertion
(?:...) Non-capturing regular parentheses

If regex isn't required you could do something like:
l = text.split(' ')
out = []
for i in range(len(l)):
    try:
        out.append(l[i] + ' ' + l[i+1])
    except IndexError:
        continue
Explanation:
First split the string on the space character. The result is a list where each element is a word in the sentence. Instantiate an empty list to hold the result. Loop over the list of words, adding each two-word combination, separated by a space, to the output list. This raises an IndexError when accessing the element after the last word in the list; just catch it and continue, since you don't seem to want that lone word in your result anyway.

I don't think you actually need regex for this.
I understand you want a list in which each element contains two words, the latter also being the former of the following element. We can do this easily like this:
string = "Hello there how are you"
liste = string.split(" ")
for i in range(len(liste)-1):
    liste[i] = liste[i] + " " + liste[i+1]
# remove the last element, which would otherwise contain only one word
liste.pop(-1)

I don't know whether you are required to use regex, but I'd do it this way.
First, you can get the list of words with the str.split() method.
>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
Then, you can make pairs.
>>> output = []
>>> for i in range(1, len(splited_sentence)):
...     output += [splited_sentence[i-1] + ' ' + splited_sentence[i]]
...
>>> output
['hello there', 'there how', 'how are', 'are you']

An alternative is just to split, zip, then join like so...
sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
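The split, zip, join idea above can be generalised from pairs to runs of n consecutive words. The following is a minimal sketch; the function name word_ngrams and its parameter n are made up for illustration, and words are assumed to be whitespace-separated.

```python
def word_ngrams(sentence, n=2):
    """Return every run of n consecutive words as a space-joined string."""
    words = sentence.split()
    # zip over n staggered views of the word list: words, words[1:], words[2:], ...
    return [' '.join(group) for group in zip(*(words[i:] for i in range(n)))]

print(word_ngrams("hello there how are you"))
# ['hello there', 'there how', 'how are', 'are you']
print(word_ngrams("hello there how are you", n=3))
# ['hello there how', 'there how are', 'how are you']
```

Because zip stops at the shortest input, a sentence with fewer than n words simply yields an empty list.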

Another possible solution using findall.
>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']

Python - Iterate through a list of strings and group partial matching strings

So I have a list of strings as below:
list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
How do I iterate through the list and group partially matching strings without being given keywords? The result should look like this:
list 1 = [["I love cat","I love dog","I love fish"],["I hate banana","I hate apple","I hate orange"]]
Thank you so much.

Sequence matcher will do the task for you. Tune the score ratio for better results.
Try this:
from difflib import SequenceMatcher
sentence_list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
result=[]
for sentence in sentence_list:
    if(len(result)==0):
        result.append([sentence])
    else:
        for i in range(0,len(result)):
            score=SequenceMatcher(None,sentence,result[i][0]).ratio()
            if(score<0.5):
                if(i==len(result)-1):
                    result.append([sentence])
            else:
                if(score != 1):
                    result[i].append(sentence)
Output:
[['I love cat', 'I love dog', 'I love fish'], ['I hate banana', 'I hate apple', 'I hate orange']]

Try building an inverted index, and then you can pick whichever keywords you like. This approach ignores word order:
index = {}
for sentence in sentence_list:
    for word in set(sentence.split()):
        index.setdefault(word, set()).add(sentence)
Or this approach, which keys the index by all possible full-word phrase prefixes:
index = {}
for sentence in sentence_list:
    number_of_words = len(sentence.split())
    for i in range(1, number_of_words):
        key_phrase = sentence.rsplit(maxsplit=i)[0]
        index.setdefault(key_phrase, set()).add(sentence)
And then if you want to find all of the sentences that contain a keyword (or start with a phrase, if that's your index):
matching_sentences = index[key_term]
Or a given set of keywords:
from functools import reduce
matching_sentences = reduce(lambda x, y: x & index[y], list_of_keywords[1:], index[list_of_keywords[0]])
Now you can generate a list grouped by pretty much any combination of terms or phrases by building a list comprehension that uses those indices to generate sentences. E.g., if you built the phrase prefix index and want everything grouped by the first two-word phrase:
grouped = [list(index[k]) for k in index if len(k.split()) == 2]
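If only the final grouping is needed, the prefix index can be reduced to a single pass. The sketch below assumes the grouping key is the first two words of each sentence, which the question does not state explicitly; the function name group_by_prefix and the parameter prefix_words are hypothetical.

```python
from collections import defaultdict

def group_by_prefix(sentences, prefix_words=2):
    """Group sentences that share their first `prefix_words` words."""
    groups = defaultdict(list)
    for sentence in sentences:
        key = ' '.join(sentence.split()[:prefix_words])
        groups[key].append(sentence)
    # dicts keep insertion order (Python 3.7+), so groups appear in first-seen order
    return list(groups.values())

sentence_list = ["I love cat", "I love dog", "I love fish",
                 "I hate banana", "I hate apple", "I hate orange"]
print(group_by_prefix(sentence_list))
# [['I love cat', 'I love dog', 'I love fish'], ['I hate banana', 'I hate apple', 'I hate orange']]
```

Unlike the set-based index, this keeps the sentences in their original order within each group.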

You can try this approach. Although it is not the best approach, it is helpful for understanding the problem in a more methodical way.
from itertools import groupby
my_list = ["I love cat","I love dog","I love fish","I hate banana","I hate apple","I hate orange"]
each_word = sorted([x.split() for x in my_list])
# I assumed the keywords would be everything except the last word
grouped = [list(value) for key, value in groupby(each_word, lambda x: x[:-1])]
result = []
for group in grouped:
    temp = []
    for i in range(len(group)):
        temp.append(" ".join(group[i]))
    result.append(temp)
print(result)
Output:
[['I hate apple', 'I hate banana', 'I hate orange'], ['I love cat', 'I love dog', 'I love fish']]

Avoid names like list for your variables, since they shadow built-ins. Also, list 1 is not a valid Python variable name.
Try this:
from itertools import groupby
# Assuming you group by the first two words in each string, e.g. 'I love', 'I hate'.
L = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
L = sorted(L)
result = []
for key, group in groupby(L, lambda x: x.split(' ')[0] + ' ' + x.split(' ')[1]):
    result.append(list(group))
print(result)

Search list: match only exact word/string

How do I match an exact string/word while searching a list? I have tried, but it's not correct. Below are my sample list, my code and the test results.
list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
my code:
for str in list:
    if s in str:
        print(str)
test results:
s = "hello" ~ expected output: 'Hi, hello' ~ output I get: 'Hi, hello'
s = "123" ~ expected output: *nothing* ~ output I get: 'hi mr 12345'
s = "12345" ~ expected output: 'hi mr 12345' ~ output I get: 'hi mr 12345'
s = "come" ~ expected output: *nothing* ~ output I get: 'welcome sir'
s = "welcome" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'
s = "welcome sir" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'
My list contains more than 200K strings

It looks like you need to perform this search more than once, so I would recommend converting your list into a dictionary:
>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> d = dict()
>>> for item in l:
...     for word in item.split():
...         d.setdefault(word, list()).append(item)
...
So now you can easily do:
>>> d.get('hi')
['hi mr 12345']
>>> d.get('come') # nothing
>>> d.get('welcome')
['welcome sir']
P.S. You will probably have to improve item.split() to handle commas, periods and other separators, maybe using a regex with \w.
P.P.S. As cularion mentioned, this won't match "welcome sir". If you want to match a whole string, that takes just one additional line. But if you have to match part of a string bounded by spaces and punctuation, regex should be your choice.
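The tokenisation improvement suggested in the P.S. could look roughly like the sketch below, which extracts words with \w+ so that 'Hi,' is indexed as 'hi'. Lower-casing the words is an added assumption (the question does not say whether matching should be case-sensitive), and the function name build_word_index is made up for illustration.

```python
import re
from collections import defaultdict

def build_word_index(strings):
    """Map each lower-cased word to the list of strings that contain it."""
    index = defaultdict(list)
    for item in strings:
        # \w+ pulls out runs of letters, digits and underscores, dropping punctuation;
        # set() avoids listing the same string twice for a repeated word
        for word in set(re.findall(r'\w+', item.lower())):
            index[word].append(item)
    return index

strings = ['Hi, hello', 'hi mr 12345', 'welcome sir']
index = build_word_index(strings)
print(index.get('hi', []))     # ['Hi, hello', 'hi mr 12345']
print(index.get('123', []))    # []
print(index.get('12345', []))  # ['hi mr 12345']
```

Building the index costs one pass over the list; each later lookup is then a single dictionary access, which matters for a list of 200K strings.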

>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> search = lambda word: list(filter(lambda x: word in x.split(), l))
>>> search('123')
[]
>>> search('12345')
['hi mr 12345']
>>> search('hello')
['Hi, hello']

If you are searching for an exact word match:
for str in list:
    if set(s.split()) & set(str.split()):
        print(str)

Provided s only ever consists of just a few words, you could do
s = s.split()
n = len(s)
for x in my_list:
    words = x.split()
    if s in (words[i:i+n] for i in range(len(words) - n + 1)):
        print(x)
If s consists of many words, there are more efficient, but also much more complex, algorithms for this.

Use a regular expression with the word boundary \b to match the exact word:
import re
.....
for str in list:
    if re.search(r'\b' + wordToLook + r'\b', str):
        print(str)
\b matches only at a word boundary, i.e. between a word character and a non-word character (such as a space or punctuation), or at the start or end of the string.
Or do something like this to avoid typing the search word again and again:
import re
list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
listOfWords = ['hello', 'Mr', '123']
reg = re.compile(r'(?i)\b(?:%s)\b' % '|'.join(listOfWords))
for str in list:
    if reg.search(str):
        print(str)
(?i) makes the search case-insensitive; remove it if you want a case-sensitive search.
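As a sketch of how the word-boundary approach lines up with the test table in the question, the search term can be passed through re.escape so that characters such as '.' or '+' are treated literally. The function name find_whole_matches is hypothetical, and the sketch assumes the search term starts and ends with a word character, since \b only behaves as intended in that case.

```python
import re

def find_whole_matches(strings, s):
    """Return the strings in which s occurs as a whole word or phrase."""
    # re.escape keeps characters such as '.' or '+' in s from acting as regex syntax
    pattern = re.compile(r'\b' + re.escape(s) + r'\b')
    return [item for item in strings if pattern.search(item)]

strings = ['Hi, hello', 'hi mr 12345', 'welcome sir']
for s in ['hello', '123', '12345', 'come', 'welcome', 'welcome sir']:
    print(s, '->', find_whole_matches(strings, s))
# hello -> ['Hi, hello']
# 123 -> []
# 12345 -> ['hi mr 12345']
# come -> []
# welcome -> ['welcome sir']
# welcome sir -> ['welcome sir']
```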
