In Python, how to match a string in a word

Python code:
str = "bcd"
word = "abcd1"
if pattern = re.search(str, word):
    print pattern.group(1)
I want to search for "bcd" in a word. How do I do it?

>>> str= "bcd"
>>> word = "abcd1"
>>> str in word
True
Simple way

You can do it with the find() method of a string object.
str = "abc"
word = "abcd1"
index = word.find(str)
if index != -1:
    print(index)
The index is the position of the first character of the subsequence you are looking for, or -1 if it is not found.

You can use word.index(str). It returns the position of str in word, or raises a ValueError if str is not found in word.
But if you want to use regular expressions, your call to re.search() is correct. re.search returns a match object if the pattern is found in the string, or None if it is not. Since None evaluates to False, you can just do:
if re.search(str, word):
    pass  # found the pattern

If all you need is to know whether str is contained in word, you can also simply write:
if str in word:
    whatever
else:
    somethingelse

You don't mention whether you need any other information, such as the position of the match. If you just want a simple check, you can use the "if X in Y" approach:
In [1]: needle = "pie"
In [2]: haystack = "piece of string"
In [3]: if needle in haystack: print(True)
True

If you simply want to know whether the string is in another word, you can do:
if 'bcd' in word:
    pass  # do something
If you need to do that with a regex:
import re
pat = re.compile('bcd')
pat.search(word)
(Or just re.search('bcd', word).)
Also, you should almost never use str as a variable name, because it shadows the built-in str type.
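For a side-by-side comparison, here is a minimal sketch of the approaches from the answers above. The names needle and haystack are illustrative, and re.escape is an addition that the answers do not mention; it only matters when the search text could contain regex metacharacters.

```python
import re

needle = "bcd"
haystack = "abcd1"

# Membership test: only answers "is it there?"
contains = needle in haystack

# str.find returns the index of the first match, or -1 if absent.
position = haystack.find(needle)

# str.index returns the same index but raises ValueError if absent.
try:
    missing = haystack.index("xyz")
except ValueError:
    missing = None

# re.search returns a match object or None; re.escape makes the
# needle match as literal text rather than as a pattern.
match = re.search(re.escape(needle), haystack)
matched_text = match.group(0) if match else None

print(contains, position, missing, matched_text)  # True 1 None bcd
```

Note that group(0) is the whole match. group(1) only works when the pattern contains a capturing group, which the pattern in the question does not.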

Related

Pattern matching Strings

I have an array of strings:
Input:
["series-a-funding", "series-b-financials","series-c-funding","series-b-funding","abc-funding","raised-acd", "fund-series-a", "fund-series-b"]
and I want to keep all the strings that contain series-a, series-b, or series-c. My expected output is:
["series-a-funding", "series-b-financials","series-c-funding","series-b-funding","fund-series-a", "fund-series-b"]
I have tried something like this
def interesting(textInput):
    textInput = textInput.lower()
    if any([word in textInput for word in ['fund-series-%s' or 'series-%s-funding' or 'series-%s-financing' %i for i in ['a', 'b', 'c', 'd']]]):
        return True
    return False
But no luck. Can anyone help me with this? I am new to Python, so I don't have much idea how to do this.
I believe this will do:
import re
pattern = re.compile(
    '^(fund-series-[abcd]|series-[abcd]-funding|series-[abcd]-financing)$',
    re.IGNORECASE
)
def interesting(word):
    return bool(pattern.match(word))
We declare a pattern based on your rules and the code you posted, then declare a function that checks whether a word matches the pattern (case-insensitively).
Or, if you need the more general series-a/b/c/d pattern, search will do:
pattern = re.compile('series-[abcd]', re.IGNORECASE)
def interesting(word):
    return bool(pattern.search(word))
This solution will filter strings matching the regex series-[abc]:
import re
regex = re.compile('series-[abc]')
output_list = list(filter(regex.search, input_list))
The expression filter(regex.search, input_list) applies the function regex.search to each element in the list and keeps only those elements for which the result is True when converted to bool. This relies on the fact that regex.search returns a match object on a match (shown as SRE_Match in older Python versions), which evaluates to True when converted to bool, and None on a mismatch, which evaluates to False.
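A self-contained version of that filter call, as a sketch. The input_list is the one from the question, and the variables hit and miss are illustrative names added here to show the truthiness the explanation relies on.

```python
import re

input_list = [
    "series-a-funding", "series-b-financials", "series-c-funding",
    "series-b-funding", "abc-funding", "raised-acd",
    "fund-series-a", "fund-series-b",
]

regex = re.compile("series-[abc]")

# A match object is truthy; None (no match) is falsy.
hit = regex.search("fund-series-a")
miss = regex.search("abc-funding")

# filter() keeps the elements for which regex.search returns a truthy value.
output_list = list(filter(regex.search, input_list))
print(output_list)
```

Because search scans the whole string, "fund-series-a" and "fund-series-b" are kept even though the pattern does not start at position 0.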
import re
s = ["series-a-funding", "series-b-financials","series-c-funding","series-b-funding","abc-funding","raised-acd", "fund-series-a", "fund-series-b"]
# re.search finds the pattern anywhere in the string; re.match would only check the start
[i for i in s if re.search(r"series-[abc]", i)]
gives
['series-a-funding', 'series-b-financials', 'series-c-funding', 'series-b-funding', 'fund-series-a', 'fund-series-b']
What you want is a list containing the strings that have the necessary keywords, not whether each one is true or false. So instead of using return True or return False, you should return a list.
Another problem in your code is incorrect use of methods. The lower() method applies to strings only, not to lists. Therefore you should have used textInput = [text.lower() for text in textInput].
Also, you would want to return the original string, not its lowercased copy. Converting the original list to lowercase at the beginning of the function is therefore inadvisable. Instead, you could lowercase each string when comparing.
Here is a simple program I have written:
textList = ["series-a-funding", "series-b-financials","series-c-funding","series-b-funding","abc-funding","raised-acd", "fund-series-a", "fund-series-b"]
def printStrings(strings):
    # compare against a lowercased copy, but return the original strings
    return [string for string in strings if 'series-a' in string.lower() or 'series-b' in string.lower() or 'series-c' in string.lower()]
print(printStrings(textList))

Check if a string has unique characters excluding whitespace

I'm practicing questions from Cracking the coding interview to become better and just in case, be prepared. The first problem states: Find if a string has all unique characters or not? I wrote this and it works perfectly:
def isunique(string):
    x = []
    for i in string:
        if i in x:
            return False
        else:
            x.append(i)
    return True
Now, my question is: what if all the characters are unique except for the spaces, as in:
'I am J'
That would be pretty rare, but let's say it occurs by chance. How can I make an exception for spaces, so that the function doesn't count a space as a character and returns True rather than False?
Now, no matter how many spaces or special characters are in your string, it will only look at the word characters (letters, digits, and underscores):
import re
def isunique(string):
    pattern = r'\w'
    # keep only word characters, dropping spaces and punctuation
    search = re.findall(pattern, string)
    string = search
    x = []
    for i in string:
        if i in x:
            return False
        else:
            x.append(i)
    return True
print(isunique('I am J'))
output:
True
Test case without spaces:
print(isunique('war'))
True
Test case with spaces:
print(isunique('w a r'))
True
Repeated letters:
print(isunique('warrior'))
False
Create a list of characters you want to ignore and remove them from the string. Then run your function code.
Alternatively, a better way to check the uniqueness of the characters is to compare the length of the final string with the length of the set of its characters:
def isunique(my_string):
    nonchars = [' ', '.', ',']
    for nonchar in nonchars:
        my_string = my_string.replace(nonchar, '')
    return len(set(my_string)) == len(my_string)
Sample Run:
>>> isunique( 'I am J' )
True
As per the Python documentation for set():
Return a new set object, optionally with elements taken from iterable.
set is a built-in class. See set and Set Types — set, frozenset for
documentation about this class.
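The set idea can also be combined with the early exit from the question's loop. The following is a minimal sketch under that assumption; the function name is_unique_ignoring_whitespace is made up for illustration, and it skips all whitespace (not just spaces) via str.isspace().

```python
def is_unique_ignoring_whitespace(text):
    """Return True if no non-whitespace character occurs twice."""
    seen = set()
    for ch in text:
        if ch.isspace():
            continue  # whitespace never counts as a duplicate
        if ch in seen:
            return False  # stop at the first repeated character
        seen.add(ch)
    return True


print(is_unique_ignoring_whitespace('I am J'))   # True
print(is_unique_ignoring_whitespace('warrior'))  # False
```

Membership tests on a set take constant time on average, so this avoids the repeated list scans of the original loop while still returning as soon as a duplicate is seen.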
And... a pool of answers is never complete unless there is also a regex solution:
def is_unique(string):
    import re
    patt = re.compile(r"^.*?(.).*?(\1).*$")
    return not re.search(patt, string)
(I'll leave the whitespace handling as an exercise to the OP)
An elegant approach (YMMV), with collections.Counter:
from collections import Counter
def isunique(string):
    # note: raises IndexError if the string is empty or contains only spaces
    return Counter(string.replace(' ', '')).most_common(1)[0][-1] == 1
Alternatively, if your strings contain more than just spaces (tabs and newlines, for instance), I'd recommend regex-based substitution:
import re
string = re.sub(r'\s+', '', string)
Simple solution:
def isunique(string):
    return all(string.count(i)==1 for i in string if i!=' ')

simple regex pattern not matching [duplicate]

>>> import re
>>> s = 'this is a test'
>>> reg1 = re.compile('test$')
>>> match1 = reg1.match(s)
>>> print(match1)
None
In Kiki, this pattern matches the "test" at the end of s. What am I missing? (I tried re.compile(r'test$') as well.)
Use
match1 = reg1.search(s)
instead. The match function only matches at the start of the string. From the documentation:
Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default).
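To make the difference concrete, here is a small sketch using the string and pattern from the question. The re.fullmatch call and the variable names are additions for illustration; fullmatch is shown only as the variant that requires the pattern to cover the entire string.

```python
import re

s = "this is a test"
reg1 = re.compile("test$")

# match() only tries at position 0, so "test$" cannot succeed here.
at_start = reg1.match(s)

# search() scans the string and finds the trailing "test".
anywhere = reg1.search(s)

# fullmatch() succeeds only if the pattern covers the whole string.
whole = re.fullmatch(r".*test", s)

print(at_start)           # None
print(anywhere.group(0))  # test
print(whole.group(0))     # this is a test
```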
Your regex does not match the full string. You can use search instead as Useless mentioned, or you can change your regex to match the full string:
'^this is a test$'
Or somewhat harder to read but somewhat less useless:
'^t[^t]*test$'
It depends on what you're trying to do.
That is because the match method returns None if it can't find the expected pattern; if it finds the pattern, it returns a match object (of type _sre.SRE_Match in older Python versions, re.Match in current ones).
So, if you want a Boolean (True or False) result from match, you must check whether the result is None.
You could check whether a text matches like this:
import re
string_to_evaluate = "Your text that needs to be examined"
expected_pattern = "pattern"
if re.match(expected_pattern, string_to_evaluate) is not None:
    print("The text is as you expected!")
else:
    print("The text is not as you expected!")

Can Python's str.find() return the substring rather than the index number?

a = 'hello world'
a.find('wor')
# result: 6
which is index 6.
This is not what I want. I want to get back the particular substring that was found at index 6.
In Python, how do I search for a substring in a string so that the substring itself is returned if found, not an index number?
Is there a way to do this?
Note that strings in Python are immutable, so if you somehow extract a substring from a source string a new string has to be created by copying the characters from the source string. It's not like in some other languages where you can simply return a reference into the source string.
To do what you want, I'd simply use the in operator. E.g.:
a = 'hello world'
data = ('wor', 'WOR', 'hell', 'help')
for s in data:
    print(s, s if s in a else None)
output
wor wor
WOR None
hell hell
help None
Or if you prefer a function:
def found(src, substring):
    return substring if substring in src else None
for s in data:
    print(s, found(a, s))
In case you're unfamiliar with Python's yes_value if some_condition else no_value syntax (aka a conditional expression), here's that function re-written using a more "traditional" if ... else block:
def found(src, substring):
    if substring in src:
        return substring
    else:
        return None
You can simply do it like this:
a = "hello world"
found = a.find('wor')
a = a[found: found+3]
Of course this is just a simple example, but it should give you the idea. It uses Python's string slicing. You can go one step further:
def find_substring(string, substring):
    pos = string.find(substring)
    if pos == -1:
        return None  # find() returns -1 when the substring is absent
    return string[pos: pos + len(substring)]
This fulfills your stated requirement:
>>> import re
>>> re.search('wor', 'Hello world').group(0)
'wor'
(Read the Documentation for re.search for more information)
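Returning the matched text is mostly useful when the match can differ from what you searched for, for example with case-insensitive matching. The sketch below assumes that use case; find_text and its ignore_case parameter are hypothetical names, and re.escape is used so the substring is matched literally.

```python
import re


def find_text(src, substring, ignore_case=False):
    """Return the text in src that matches substring, or None if absent."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.search(re.escape(substring), src, flags)
    return match.group(0) if match else None


print(find_text('hello world', 'wor'))                    # wor
print(find_text('Hello World', 'wor', ignore_case=True))  # Wor
print(find_text('hello world', 'help'))                   # None
```

With ignore_case=True the function returns the text as it appears in the source string ('Wor'), not the search term.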

Matching whole words using "in" in python

I've been searching for this for some time but still have not found an answer. Maybe it has something to do with regular expressions, but I think there should be a simple answer that I am missing. It seems very trivial to me. Here goes:
In the Python interpreter I get:
"abc" in "abc123"
as True.
I want a command that returns False here. I want the entire word to be matched.
Thanks!
The in operator isn't how it's done. Use a regex with word boundaries (\b):
>>> import re
>>> re.search(r'\babc\b', 'abc123')
>>> re.search(r'\babc\b', 'abc 123')
<_sre.SRE_Match object at 0x1146780>
If you want to do a plain match of just one word, use ==:
'abc' == 'abc123' # False
If you're doing 'abc' in ['cde','fdabc','abc123'], that returns False anyway:
'abc' in ['cde','fdabc','abc123'] # False
The reason 'abc' in 'abc123' returns true, from the docs:
For the Unicode and string types, x in y is true if and only if x is a
substring of y. An equivalent test is y.find(x) != -1.
So for comparing against a single string, use ==. If comparing against a collection of strings, in can be used (you could also do 'abc' in ['abc123']), since in behaves as your intuition expects when y is a list or other collection.
I might not understand your question, but it seems like what you want is "abc123" == "abc". This returns False, whereas "abc123" == "abc123" returns True.
Perhaps what you are looking for is matching on whole words but splitting on whitespace? That is, "abc" does not match "abc123", but it does match "abc def"? If that is the case, you want something like this:
def word_in(word, phrase):
    return word in phrase.split()
word_in("abc", "abc123") # False
word_in("abc", "abc def") # True
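Splitting on whitespace leaves punctuation attached to words, so "abc," would not equal "abc". A regex word boundary handles that case. This is a minimal sketch; contains_word is a hypothetical helper name, and re.escape is used so the word is matched literally.

```python
import re


def contains_word(word, text):
    """Return True if word occurs in text as a whole word."""
    # \b matches between a word character (letter, digit, underscore)
    # and a non-word character or the edge of the string.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None


print(contains_word("abc", "abc123"))            # False
print(contains_word("abc", "abc 123"))           # True
print(contains_word("abc", "say abc, please"))   # True
```

Because \b treats digits and underscores as word characters, "abc" is not found in "abc_def" but is found in "abc-def".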
In my case, I used a small trick: the whole word should be surrounded by spaces. So, if I want to find a word like "kill", I search for " kill ". That way it won't match a word like "Skill".
' kill ' in myString
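One limitation of the space trick is that it misses a word at the very start or end of the string, where there is no surrounding space. A possible workaround, sketched here with the hypothetical helper has_spaced_word, is to pad the text with a space on both sides before testing.

```python
def has_spaced_word(word, text):
    """Return True if word appears in text delimited by spaces or string edges."""
    # Padding the text lets a word at the start or end match as well.
    return f" {word} " in f" {text} "


print(has_spaced_word("kill", "kill the process"))  # True
print(has_spaced_word("kill", "Skill matters"))     # False
```

This still treats punctuation as part of the word, so "kill," is not matched.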
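Perhaps I'm off the mark, but it can be done in a simpler way.
def word_in(needle, haystack, case_sensitive=True):
    if not case_sensitive:
        needle, haystack = needle.lower(), haystack.lower()
    # the word may be the whole string, or sit at the start, in the middle, or at the end
    if (haystack == needle or haystack.startswith(needle + ' ')
            or ' ' + needle + ' ' in haystack or haystack.endswith(' ' + needle)):
        return True
    return False
print(word_in('abc', 'abc123'))
print(word_in('abc', 'abc 123'))
The first example produces False, the other True.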
You can try to use find method and compare the result to -1:
>>> a = "abc123"
>>> a.find("abc")
0
>>> a.find("bcd")
-1
