Find substring in Python


I have found synonyms of a word "plant"
syn = wordnet.synsets('plant')[0].lemmas()
>>>[Lemma('plant.n.01.plant'), Lemma('plant.n.01.works'), Lemma('plant.n.01.industrial_plant')]
and an input word
word = 'work'
I want to find if 'work' appears in syn. How to do it?

Lemma objects have a name() method, so you could do:
>>> 'works' in map(lambda x: x.name(), syn)
True
Edit: I did not notice that you said "work", not "works". For a substring match this becomes:
>>> for i in syn:
...     if 'work' in i.name():
...         print(True)
...
True
You can wrap it in a function, for example.
Or combine the two suggestions I made:
any(map(lambda x: 'work' in x, map(lambda x: x.name(), syn)))
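The substring check above can be wrapped in a small function. The sketch below is only an illustration: FakeLemma is a hypothetical stand-in that models nothing but the name() method, so the example runs without NLTK, and contains_substring is a made-up helper name.

```python
class FakeLemma:
    """Hypothetical stand-in for an NLTK Lemma; only name() is modeled."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


def contains_substring(lemmas, word):
    """Return True if `word` is a substring of any lemma name."""
    return any(word in lemma.name() for lemma in lemmas)


syn = [FakeLemma("plant"), FakeLemma("works"), FakeLemma("industrial_plant")]
print(contains_substring(syn, "work"))  # True
print(contains_substring(syn, "tree"))  # False
```

Using a generator expression inside any() stops at the first match instead of building an intermediate list.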

You can easily check for the presence of a substring using the in keyword in Python:
>>> word = "work"
>>> word in 'plant.n.01.works'
True
>>> word in 'plant.n.01.industrial_plant'
False
If you want to test this against a list, you can use a loop:
syn = ["plant.one","plant.two"]
for plant in syn:
    if word in plant:
        print("ok")
Or, better, a list comprehension:
result = [word in plant for plant in syn]
# To get the number of matches, you can sum the resulting list:
sum(result)
Edit: If you have a long list of words to look for, you can just nest two loops:
words_to_search = ["work","spam","foo"]
syn = ["plant.one","plant.two"]
for word in words_to_search:
    if sum([word in plant for plant in syn]):
        print("{} is present in syn".format(word))
Note that you are manipulating Lemma objects and not strings. You might need to check for word in plant.name() instead of just word in plant if the object does not implement the [__contains__](https://docs.python.org/2/library/operator.html#operator.__contains__) method. I am not familiar with this library though.
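As a rough sketch of the nested-loop idea, the helper below returns the search words that occur in at least one name. It assumes the lemma names have already been extracted into plain strings; find_present_words and the sample data are hypothetical.

```python
def find_present_words(words_to_search, names):
    """Return the search words that are a substring of at least one name."""
    return [word for word in words_to_search
            if any(word in name for name in names)]


# Assumed sample data: names already pulled out of the Lemma objects.
names = ["plant", "works", "industrial_plant"]
print(find_present_words(["work", "spam", "plant"], names))  # ['work', 'plant']
```

any() short-circuits on the first hit, which avoids summing a full list of booleans for every search word.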

str1 = "this is an example , xxx"
str2 = "example"
target_len = len(str2)
str_start_position = str1.index(str2)  # or str1.find(str2), which returns -1 instead of raising ValueError
str_end_position = str_start_position + target_len
You can use str_start_position and str_end_position to slice out your target substring: str1[str_start_position:str_end_position].
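A minimal sketch of the same idea as a function, using str.find() so that a missing substring does not raise. The name locate and the choice to return None when nothing is found are assumptions made for this example.

```python
def locate(haystack, needle):
    """Return (start, end) of the first occurrence of needle, or None."""
    start = haystack.find(needle)  # find() returns -1 when needle is absent
    if start == -1:
        return None
    return start, start + len(needle)


text = "this is an example , xxx"
span = locate(text, "example")
print(span)                    # (11, 18)
print(text[span[0]:span[1]])   # example
print(locate(text, "absent"))  # None
```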

Related

Remove words from a list that end with a suffix without using endswith()

I want to write a python function that takes 2 parameters:
List of words and
Ending letters
I want my function to work in such a way that it modifies the original list of words and removes the words which end with the "ending letters" specified.
For example:
list_words = ["hello", "jello","whatsup","right", "cello", "estello"]
ending = "ello"
my_func(list_words, ending)
This should give the following output:
list_words = ["whatsup","right"]
It should pop off all the strings that end with the ending letters given in the second argument of the function.
I can code this function using the .endswith method but I am not allowed to use it. How else can I do this using a loop?

Try:
def my_func(list_words, ending):
    return [word for word in list_words if word[len(word)-len(ending):] != ending]

def filter_words(list_words, ending):
    return [*filter(lambda x: x[-len(ending):] != ending, list_words)]

Not allowed to use endswith? Not a problem :-P
def my_func(list_words, ending):
    list_words[:] = [word for word in list_words
                     if not word[::-1].startswith(ending[::-1])]
    return list_words
Loopholes ftw.
(Adapted to your insistence on modifying the given list. You should probably really decide whether to modify or return, though, not do both, which is rather unusual in Python.)

You can easily check the last 4 characters of a string using string[-4:].
So you can use the code below:
list_words = ["hello", "jello","whatsup","right", "cello", "estello"]
ending = "ello"
def my_func(wordsArray, endingStr):
    endLen = len(endingStr)
    output = []
    for x in wordsArray:
        if not x[-endLen:] == endingStr:
            output.append(x)
    return output
list_words = my_func(list_words, ending)
You can shorten the function with a list comprehension like this:
def short_func(wordsArray, endingStr):
    endLen = len(endingStr)
    output = [x for x in wordsArray if x[-endLen:] != endingStr]
    return output
list_words = short_func(list_words, ending)

It is generally better not to modify the existing list. Instead, you can build a new list that leaves out the words with the specified ending, as below. If you want it as a function, you can write it in the following manner and assign the filtered list to list_words again.
def format_list(words, ending):
    new_list = []
    n = len(ending)
    for word in words:
        if len(word) >= n and n > 0:
            if not word[-n:] == ending:
                new_list.append(word)
        else:
            new_list.append(word)
    return new_list
list_words = format_list(list_words, ending)
print(list_words)
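If the original list really must be modified in place with a loop, as the question asks, one possible sketch is to walk the indices backwards and delete matching entries, so that deletions never shift an index that has not been visited yet. The name remove_suffixed is hypothetical, and treating an empty ending as "remove nothing" is an assumption of this example.

```python
def remove_suffixed(words, ending):
    """Remove, in place, every word that ends with `ending` (no endswith)."""
    n = len(ending)
    if n == 0:
        return  # assumption: an empty ending removes nothing
    # Iterate from the last index to the first so deletions are safe.
    for i in range(len(words) - 1, -1, -1):
        word = words[i]
        if len(word) >= n and word[-n:] == ending:
            del words[i]


list_words = ["hello", "jello", "whatsup", "right", "cello", "estello"]
remove_suffixed(list_words, "ello")
print(list_words)  # ['whatsup', 'right']
```

The function returns None on purpose, following the usual Python convention for functions that mutate their argument.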

Rearranging a string based on specific requirements

Hi there. I am looking to build this Python function using simple things like def, find, etc. So far I only know how to write the first part of the code.
Given a string such as "HELLODOGMEMEDOGPAPA", I need to return a list that gives me three things:
Everything before the first occurrence of the word DOG, which I will call before_dog
Everything from the first DOG up to, but not including, the second DOG, which I will call dog_todog
Everything from the second DOG onward, which I will call after_todog
The list will be in the form [before_dog, dog_todog, after_todog].
For example, given "HELLODOGMEMEDOGPAPADD", this will return the list
("HELLO","DOGMEME","DOGPAPADD")
Another example: "HEYHELLOMANDOGYDOGDADDY" would return the list
("HEYHELLOMAN","DOGY","DOGDADDY")
But if I have "HEYHELLODOGDADDY",
the output will be ("HEYHELLO","DOGDADDY","")
Also, if DOG never appears, as in "HEYHELLOYO", then the output will be ("HEYHELLOYO","","")
This is what I have so far:
def split_list(words):
    # declare the list
    lst = []
    # find the position of the first "DOG"
    first_pos = words.find("DOG")
    # slice out everything before the first "DOG"
    before_dog = words[0:first_pos]
    lst.append(before_dog)
    return lst

A fun split_2_dogs() function using re.findall():
import re
def split_2_dogs(s):
    if s.count('DOG') == 2:  # assuring 2 dogs are "walking" there
        return list(re.findall(r'^(.*)(DOG.*)(DOG.*)$', s)[0])
print(split_2_dogs("HELLODOGMEMEDOGPAPADD"))
print(split_2_dogs("HEYHELLOMANDOGYDOGDADDY"))
The output:
['HELLO', 'DOGMEME', 'DOGPAPADD']
['HEYHELLOMAN', 'DOGY', 'DOGDADDY']
Alternative solution with the str.index() and str.rfind() functions:
def split_2_dogs(s):
    if 'DOG' not in s:
        return [s, '', '']
    pos1, pos2 = s.index('DOG'), s.rfind('DOG')
    return [s[0:pos1], s[pos1:pos2], s[pos2:]]

This is pretty easy to do using the split function. For example, you can split any string by a delimiter, such as DOG, like so:
>>> chunks = 'HELLODOGMEMEDOGPAPA'.split('DOG')
>>> print(chunks)
['HELLO', 'MEME', 'PAPA']
You could then use the output of that in a list comprehension, like so:
>>> dog_chunks = chunks[:1] + ["DOG" + chunk for chunk in chunks[1:]]
>>> print(dog_chunks)
['HELLO', 'DOGMEME', 'DOGPAPA']
The only slightly tricky bit is making sure you don't prepend dog to the first string in the list, hence the little bits of slicing.

Split the string at 'DOG' and use conditions to get the desired result
s = 'HELLODOGMEMEDOGPAPADD'
l = s.split('DOG')
dl = ['DOG'+i for i in l[1:]]
[l[0]]+dl if l[0] else dl
Output:
['HELLO', 'DOGMEME', 'DOGPAPADD']

Splitting at DOG is the key! This code will work for all the cases that you have mentioned.
from itertools import zip_longest
words = 'HEYHELLODOGDADDY'
words = words.split("DOG")
words = ['DOG'+j if i>0 else j for i,j in enumerate(words)]
# words = ['HEYHELLO', 'DOGDADDY']
ans = ['','','']
# stitch words and ans together
ans = [m+n for m,n in zip_longest(words,ans,fillvalue='')]
print(ans)
Output :
['HEYHELLO', 'DOGDADDY', '']
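For comparison, here is a sketch that sticks to str.find(), which the question mentions, and always returns three parts. The name split_dogs and the marker parameter are hypothetical. The behavior for three or more occurrences (everything from the second DOG onward lands in the last part) is an assumption, since the question does not cover that case.

```python
def split_dogs(s, marker="DOG"):
    """Split s into [before first marker, first-to-second marker, the rest]."""
    first = s.find(marker)
    if first == -1:
        # The marker never appears: everything goes into the first slot.
        return [s, "", ""]
    # Search for the second marker only after the end of the first one.
    second = s.find(marker, first + len(marker))
    if second == -1:
        return [s[:first], s[first:], ""]
    return [s[:first], s[first:second], s[second:]]


print(split_dogs("HELLODOGMEMEDOGPAPADD"))    # ['HELLO', 'DOGMEME', 'DOGPAPADD']
print(split_dogs("HEYHELLOMANDOGYDOGDADDY"))  # ['HEYHELLOMAN', 'DOGY', 'DOGDADDY']
print(split_dogs("HEYHELLODOGDADDY"))         # ['HEYHELLO', 'DOGDADDY', '']
print(split_dogs("HEYHELLOYO"))               # ['HEYHELLOYO', '', '']
```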

Can I call a function inside a Lambda expression in python

I have a function that includes if/else conditions and a for loop, and I want to write it as a lambda expression. I have tried many ways to build this lambda but still could not do it. This is my function, which implements one more rule:
negation = 'no,not,never'.split(',')
list2 = 'miss,loss,gone,give up,lost'.split(',')
def f(sentence):
    s = sentence.split()
    # List of indices (into the sentence) of the words that are in list2
    l = [s.index(word) for word in s if word in list2]
    if len(l) > 0:
        for e in l:
            # Check the previous word
            if s[e-1] not in negation:
                print('sad')
Can I express this function as a lambda expression? I am developing a rule-based classifier that detects an emotion such as happy, sad, or angry from a sentence. These are my lambda rules so far:
rules = [(lambda x: word_tokenize(x)[-1] == '?', "neutral"),
         (lambda x: word_tokenize(x)[0] in question, "neutral"),
         (lambda x: any(word in list2 for word in [WordNetLemmatizer().lemmatize(word,'v') for word in word_tokenize(x)]), "sad"),
         (lambda x: any(word in list1 for word in [WordNetLemmatizer().lemmatize(word,'v') for word in word_tokenize(x)]), "happy")]
print(classify("I miss you", rules))

Instead of cramming everything into a lambda expression, I would just create a function that did everything you need it to do (from your comment, it sounds like you want to apply certain rules to a sentence in a certain order). You can always use that function in list comprehension, map, reduce, etc. Since I don't know exactly what your rules are though, this is the best example I can give:
a = ["This is not a sentence. That was false.",
"You cannot play volleyball. You can play baseball.",
"My uncle once ate an entire bag of corn chips! I am not lying!"]
def f(paragraph):
    sentences = paragraph.split(".")
    result = []
    for i in range(len(sentences)):
        # apply rules to sentences
        if "not" in sentences[i]:
            result.append("negative")
        else:
            result.append("positive")
    return result
my_result = [f(x) for x in a]

Your function could use some improvement:
negation_words = {"no", "not", "never"}
sad_words = {"miss", "loss", "gone", "give", "lost"}
def count_occurrences(s, search_words, negation_words=negation_words):
    count = 0
    neg = False
    for word in s.lower().split():  # should also strip punctuation
        if word in search_words and not neg:
            count += 1
        neg = word in negation_words
    return count
# s is the sentence to classify
print("\n".join(["sad"] * count_occurrences(s, sad_words)))
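To connect this back to the question: a rule list of (predicate, label) pairs accepts a named function anywhere it accepts a lambda, and a lambda body may itself call a function. The sketch below assumes a simple first-match classify() and whitespace tokenization instead of NLTK. All names here (has_unnegated_sad_word, classify, the word sets, the "unknown" default) are hypothetical and not the asker's actual code.

```python
NEGATIONS = {"no", "not", "never"}
SAD_WORDS = {"miss", "loss", "gone", "lost"}


def has_unnegated_sad_word(sentence):
    """True if a sad word occurs that is not directly preceded by a negation."""
    words = sentence.lower().split()
    for i, word in enumerate(words):
        if word in SAD_WORDS and (i == 0 or words[i - 1] not in NEGATIONS):
            return True
    return False


rules = [
    (lambda s: s.strip().endswith("?"), "neutral"),  # an inline lambda
    (has_unnegated_sad_word, "sad"),                 # a plain function works too
]


def classify(sentence, rules, default="unknown"):
    """Return the label of the first rule whose predicate matches."""
    for predicate, label in rules:
        if predicate(sentence):
            return label
    return default


print(classify("I miss you", rules))         # sad
print(classify("I do not miss you", rules))  # unknown
print(classify("Do you miss me?", rules))    # neutral
```

Keeping multi-statement logic in a def and referencing it from the rule list avoids squeezing loops and conditionals into a single lambda expression.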

more efficient way to replace items on a list based on a condition

I have the following piece of code. Basically, I'm trying to replace a word if it matches one of these regex patterns. If the word matches even once, the word should be completely gone from the new list. The code below works, however, I'm wondering if there's a way to implement this so that I can indefinitely add more patterns to the 'pat' list without having to write additional if statements within the for loop.
To clarify, my regex patterns have negative lookaheads and lookbehinds to make sure it's one word.
pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']
if isinstance(x, list):
    new = []
    for i in x:
        if re.search(pat[0], i):
            i = re.sub(pat[0], '', i)
        if re.search(pat[1], i):
            i = re.sub(pat[1], '', i)
        if len(i) > 0:
            new.append(i)
    x = new
else:
    x = x.strip()

Just add another for loop inside the existing one:
for patn in pat:
    if re.search(patn, i):
        i = re.sub(patn, '', i)
if i:
    new.append(i)

pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']
if isinstance(x, list):
    new = []
    for i in x:
        for p in pat:
            i = re.sub(p, '', i)
        if len(i) > 0:
            new.append(i)
    x = new
else:
    x = x.strip()

Add another loop:
pat = [r'(?<![a-z][ ])Pacific(?![ ])', r'(?<![a-z][ ])Global(?![ ])']
if isinstance(x, list):
    new = []
    for i in x:
        # iterate through pat list
        for regx in pat:
            if re.search(regx, i):
                i = re.sub(regx, '', i)
        ...

If the only thing that changes between your patterns is the word, you can join the words with | so that the pattern matches any one of them. The two patterns from your example then become a single pattern like the one below.
r'(?<![a-z][ ])(?:Pacific|Global)(?![ ])'
If you need to add more words, just add them with a pipe. For example (?:word1|word2|word3)
Inside the parentheses, ?: means the group is not captured.
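A sketch of building that combined pattern from a word list, so that new words only need to be appended to the list. It reuses the lookarounds from the question unchanged. build_pattern, strip_words, and the sample items are hypothetical names and data; re.escape is applied on the assumption that the entries are literal words rather than regex fragments.

```python
import re


def build_pattern(words):
    """Compile one pattern matching any of the words, with the question's lookarounds."""
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r'(?<![a-z][ ])(?:' + alternatives + r')(?![ ])')


def strip_words(items, words):
    """Remove matches from each item and drop items that end up empty."""
    pattern = build_pattern(words)
    stripped = (pattern.sub('', item) for item in items)
    return [item for item in stripped if item]


items = ["Pacific", "Global", "Asia Pacific", "Pacific Ocean", "Acme"]
print(strip_words(items, ["Pacific", "Global"]))
# ['Asia Pacific', 'Pacific Ocean', 'Acme']
```

Compiling once and calling sub() directly also removes the need for a separate re.search() check, since sub() leaves non-matching strings unchanged.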

something like this:
[word for word in l if not any(re.search(p, word) for p in pat)]

I will attempt a guess here; if I am wrong, please skip to the "this is how I'd write it" and modify the code that I provide, according to what you intend to do (which I may have failed to understand).
I am assuming you are trying to eliminate the words "Global" and "Pacific" in a list of phrases that may contain them.
If that is the case, I think your regular expression does not do what you specify. You probably intended to have something like the following (which does not work as-is!):
pat = [r'(?<=[a-z][ ])Pacific(?=[ ])', r'(?<=[a-z][ ])Global(?=[ ])']
The difference is in the lookaround assertions, which are positive ((?=...) and (?<=...)) instead of negative ((?!...) and (?<!...)).
Furthermore, writing your regular expressions like this will not always correctly eliminate white space between your words.
This is how I'd write it:
import re
words = ['Pacific', 'Global']
pat = "|".join(r'\b' + word + r'\b\s*' for word in words)
if isinstance(x, str):
    x = x.strip()  # I don't understand why you don't sub here, anyway!
else:
    x = [s for s in (re.sub(pat, '', s) for s in x) if s != '']
In the regular expression for patterns, notice (a) \b, standing for "the empty string, but only at the beginning or end of a word" (see the manual), (b) the use of | for separating alternative patterns, and (c) \s, standing for "characters considered whitespace". The latter is what takes care of correctly removing unnecessary space after each eliminated word.
This works correctly in both Python 2 and Python 3. I think the code is much clearer and, in terms of efficiency, it's best if you leave re to do its work instead of testing each pattern separately.
Given:
x = ["from Global a to Pacific b",
"Global Pacific",
"Pacific Global",
"none",
"only Global and that's it"]
this produces:
x = ['from a to b', 'none', "only and that's it"]

Find word infront and behind of a Python list

This is related to the following question: Searching for Unicode characters in Python
I have a string like this:
sentence = 'AASFG BBBSDC FEKGG SDFGF'
I split it and get a list of words like this:
words = ['AASFG', 'BBBSDC', 'FEKGG', 'SDFGF']
I search for part of a word using the following code and get the whole word:
[word for word in sentence.split() if word.endswith("GG")]
It returns ['FEKGG']
Now I need to find out what is in front of and behind that word.
For example, when I search for "GG" it returns ['FEKGG']. It should also be able to get
behind = 'BBBSDC'
infront = 'SDFGF'

You can use a generator. Given the following string (edited from the original):
sentence = 'AASFG BBBSDC FEKGG SDFGF KETGG'
def neighborhood(iterable):
    iterator = iter(iterable)
    prev = None
    try:
        item = next(iterator)
    except StopIteration:
        return  # empty input: yield nothing
    for nxt in iterator:
        yield (prev, item, nxt)
        prev = item
        item = nxt
    yield (prev, item, None)
matches = [word for word in sentence.split() if word.endswith("GG")]
results = []
for prev, item, nxt in neighborhood(sentence.split()):
    for match in matches:
        if match == item:
            results.append((prev, item, nxt))
This returns:
[('BBBSDC', 'FEKGG', 'SDFGF'), ('SDFGF', 'KETGG', None)]

Here's one possibility:
words = sentence.split()
[pos] = [i for (i, word) in enumerate(words) if word.endswith("GG") ]
behind = words[pos - 1]
infront = words[pos + 1]
You might need to take care with edge cases, such as "…GG" not appearing, appearing more than once, or being the first and/or last word. As it stands, a missing or repeated match raises ValueError and a match in the last position raises IndexError, which may well be the correct behaviour. A match in the first position raises nothing, though: words[pos - 1] silently wraps around to the last word.
A completely different solution using regexes (this needs import re) avoids splitting the string into an array in the first place:
match = re.search(r'\b(\w+)\s+(?:\w+GG)\s+(\w+)\b', sentence)
(behind, infront) = match.groups()

This is one way. The behind and infront elements will be None if the "GG" word is at the beginning or end of the sentence, respectively.
words = sentence.split()
[(behind, word, infront) for (behind, word, infront) in
 zip([None] + words[:-1], words, words[1:] + [None])
 if word.endswith("GG")]

sentence = 'AASFG BBBSDC FEKGG SDFGF AAABGG FOOO EEEGG'
def make_trigrams(l):
    # pad both ends so the first and last words also get a trigram
    l = [None] + l + [None]
    for i in range(len(l)-2):
        yield (l[i], l[i+1], l[i+2])
for result in [t for t in make_trigrams(sentence.split()) if t[1].endswith('GG')]:
    behind, match, infront = result
    print('Behind:', behind)
    print('Match:', match)
    print('Infront:', infront, '\n')
Output:
Behind: BBBSDC
Match: FEKGG
Infront: SDFGF
Behind: SDFGF
Match: AAABGG
Infront: FOOO
Behind: FOOO
Match: EEEGG
Infront: None

Another itertools-based option, which may be more memory-friendly on large datasets:
from itertools import tee
def sentence_targets(sentence, endstring):
    before, target, after = tee(sentence.split(), 3)
    # offset the iterators so they line up as (previous, current, next)
    next(target, None)
    next(after, None)
    next(after, None)
    for trigram in zip(before, target, after):
        if trigram[1].endswith(endstring):
            yield trigram
EDIT: fixed typo
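A lazy variant that also reports matches at the very start or end can be sketched with a three-slot sliding window over a None-padded stream. This is only one possible approach; words_with_neighbors is a hypothetical name, and using None for a missing neighbor follows the convention of the other answers here.

```python
from collections import deque
from itertools import chain


def words_with_neighbors(words, suffix):
    """Yield (behind, word, infront) for every word ending with suffix."""
    # Pad both ends with None so edge words still get a full window.
    padded = chain([None], words, [None])
    window = deque(maxlen=3)  # keeps only the three most recent items
    for item in padded:
        window.append(item)
        if len(window) == 3:
            behind, word, infront = window
            if word.endswith(suffix):
                yield behind, word, infront


sentence = 'AASFG BBBSDC FEKGG SDFGF KETGG'
print(list(words_with_neighbors(sentence.split(), "GG")))
# [('BBBSDC', 'FEKGG', 'SDFGF'), ('SDFGF', 'KETGG', None)]
```

Because the window holds only three items at a time, the function also works on any iterable of words, not just a list.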
