"IN" operator with empty strings in Python 3.0 [duplicate] - python

As I am going through tutorials on Python 3, I came across the following:
>>> '' in 'spam'
True
My understanding is that '' is an empty string with no blank spaces in it.
When I try the following in the shell, I get the output shown below it:
>>> '' in ' spam '
True
Can someone please help explain what is happening?

'' is the empty string, same as "". The empty string is a substring of every other string.
When a and b are strings, the expression a in b checks that a is a substring of b. That is, the sequence of characters of a must exist in b; there must be an index i such that b[i:i+len(a)] == a. If a is empty, then any index i satisfies this condition.
This does not mean that iterating over b will ever produce a. Strings differ from other sequences here: every element produced by for a in b satisfies a in b, but a in b does not imply that iterating over b will produce a.
So '' in x and "" in x return True for any string x:
>>> '' in 'spam'
True
>>> "" in 'spam'
True
>>> "" in ''
True
>>> '' in ""
True
>>> '' in ''
True
>>> '' in ' '
True
>>> "" in " "
True
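To make the difference between substring membership and iteration concrete, here is a small sketch (the variable names word and chars are only illustrative):

```python
word = "spam"

# Iterating over a string yields its characters, never the empty string.
chars = list(word)
print(chars)          # ['s', 'p', 'a', 'm']
print("" in chars)    # False: list membership compares whole items
print("" in word)     # True: string membership is a substring test
print("pa" in word)   # True, although "pa" is never produced by iteration
```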

The string literal '' represents the empty string. This is basically a string with a length of zero, which contains no characters.
The in operator is defined for sequences to return “True if an item of s is equal to x, else False” for an expression x in s. For general sequences, this means that one of the items in s (usually accessible using iteration) equals the tested element x. For strings, however, the in operator tests for a contiguous subsequence: x in s is true when x is a substring of s.
Formally, this means that for a substring x with a length of n, there must be an index i which satisfies the following expression: s[i:i+n] == x.
This is easily understood with an example:
>>> s = 'foobar'
>>> x = 'foo'
>>> n = len(x) # 3
>>> i = 0
>>> s[i:i+n] == x
True
>>> x = 'obar'
>>> n = len(x) # 4
>>> i = 2
>>> s[i:i+n] == x
True
Algorithmically, what the in operator (or the underlying __contains__ method) needs to do is step i through all possible values (0 <= i <= len(s) - n) and check whether the condition is true for any i.
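That search can be sketched as a brute-force loop. This is only an illustration of the definition: naive_contains is a made-up name, and CPython's real implementation uses a faster search. Note that the last candidate start index is len(s) - n itself.

```python
def naive_contains(s: str, x: str) -> bool:
    """Return True if x occurs in s, checking every candidate start index."""
    n = len(x)
    # Candidate start indices run from 0 through len(s) - n inclusive.
    # If x is longer than s, the range is empty and the result is False.
    return any(s[i:i + n] == x for i in range(len(s) - n + 1))


print(naive_contains("foobar", "obar"))  # True
print(naive_contains("foobar", ""))      # True
print(naive_contains("", ""))            # True
print(naive_contains("foo", "foobar"))   # False
```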
Looking back at the empty string, it becomes clear why the '' in s check is true for every string s: n is zero, so we are checking s[i:i]; and that is the empty string itself for every valid index i:
>>> s[0:0]
''
>>> s[1:1]
''
>>> s[2:2]
''
It is even true for s being the empty string itself, because sequence slicing is defined to return an empty sequence when a range outside of the sequence is specified (that’s why you could do s[74565463:74565469] on short strings).
So that explains why the containment check with in always returns True when checking the empty string as a substring. But even if you think about it logically, you can see the reason: a substring is a part of a string which you can find in another string. The empty string, however, can be found between every two characters. Just as you can add an infinite number of zeros to a number without changing it, you can add an infinite number of empty strings to a string without actually modifying that string.

As Rushy Panchal points out, the in operator follows the set-theoretic convention and treats the empty string as a substring of any string.
You can persuade yourself that this makes sense by considering the following: let s be a string such that '' in s is False. Then '' in s[len(s):] had better be False too, by transitivity (otherwise a substring of s contains '' while s itself does not). But s[len(s):] is '', so '' in '' would be False, which isn't great either. So there is no string s for which '' not in s could hold without creating a problem.
Of course, when in doubt, simulate it:
s = input('Enter any string you dare:\n')
print('' in '')
print(s == s + '' == '' + s)
print('' in '' + s)

Related

Testing if a string contains or startswith "no-value" ("") [duplicate]

I am not able to understand the behavior of the str.startswith method.
If I execute "hello".startswith("") it returns True. I wouldn't expect it to start with an empty string.
>>> "hello".startswith("")
True
The documentation states:
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for.
So how does the function work?
str.startswith() can be expressed in Python code as:
def startswith(source, prefix):
    return source[:len(prefix)] == prefix
It tests if the first len(prefix) characters of the source string are equal to the prefix. If you pass in a prefix of length zero, that means the first 0 characters are tested. A string of length 0 is always equal to any other string of length 0.
Note that this applies to other string tests too:
>>> s = 'foobar'
>>> '' in s
True
>>> s.endswith('')
True
>>> s.find('')
0
>>> s.index('')
0
>>> s.count('')
7
>>> s.replace('', ' -> ')
' -> f -> o -> o -> b -> a -> r -> '
Those last two demos, counting the empty string or replacing the empty string with something else, show that you can find an empty string at every position in the input string.
A string p is a prefix of a string s if s = p + x, so the empty string is a prefix of all strings (it's like 0, s = 0 + s).
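As a rough check of the "every position" claim, the sketch below uses the optional start argument of str.startswith to list the offsets at which the empty string matches (the name positions is only illustrative):

```python
s = "foobar"

# The empty string matches at every offset from 0 through len(s) inclusive.
positions = [i for i in range(len(s) + 1) if s.startswith("", i)]
print(positions)                       # [0, 1, 2, 3, 4, 5, 6]
print(len(positions) == s.count(""))   # True: both are len(s) + 1
```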

How to Check if String Has the same characters in Python [duplicate]

What is the shortest way to check if a given string has the same characters?
For example if you have name = 'aaaaa' or surname = 'bbbb' or underscores = '___' or p = '++++', how do you check to know the characters are the same?
An option is to check whether the set of its characters has length 1:
>>> len(set("aaaa")) == 1
True
Or with all(), this could be faster if the strings are very long and it's rare that they are all the same character (but then the regex is good too):
>>> s = "aaaaa"
>>> s0 = s[0]
>>> all(c == s0 for c in s[1:])
True
You can use regex for this:
import re
p = re.compile(r'^(.)\1*$')  # the ur'' prefix is not valid in Python 3; a raw string is enough
re.search(p, "aaaa") # returns a match object
re.search(p, "bbbb") # returns a match object
re.search(p, "aaab") # returns None
Here's an explanation of what this regex pattern means: https://regexper.com/#%5E(.)%5C1*%24
Also possible:
s = "aaaaa"
s.count(s[0]) == len(s)
name = 'aaaaa'
compare = name == len(name) * name[0]
if compare:
    print("all characters are the same")
else:
    print("the characters are not all the same")
Here are a couple of ways.
def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)

def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)

all_match = all_match0

data = [
    'aaaaa',
    'bbbb',
    '___',
    '++++',
    'q',
    'aaaaaz',
    'bbbBb',
    '_---',
]

for s in data:
    print(s, all_match(s))
output
aaaaa True
bbbb True
___ True
++++ True
q True
aaaaaz False
bbbBb False
_--- False
all_match0 will be faster unless the string is very long, because its testing loop runs at C speed, but it uses more RAM because it constructs a duplicate string. For very long strings, the time taken to construct the duplicate string becomes significant, and of course it can't do any testing until it creates that duplicate string.
all_match1 should only be slightly slower, even for short strings, and because it stops testing as soon as it finds a mismatch it may even be faster than all_match0, if the mismatch occurs early enough in the string.
Try using Counter from the collections module (high-performance container datatypes).
>>> from collections import Counter
>>> s = 'aaaaaaaaa'
>>> c = Counter(s)
>>> len(c) == 1
True
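Several of the snippets above index s[0], which raises IndexError for an empty string. A minimal sketch that avoids this, assuming an empty string should count as "all the same" (that choice, and the name all_same, are assumptions rather than anything stated in the answers):

```python
def all_same(s: str) -> bool:
    """Return True if every character of s is identical.

    Assumption: the empty string is treated as True (vacuously).
    """
    # s[:1] is '' for an empty string, so no IndexError is raised.
    return s == s[:1] * len(s)


for sample in ["aaaaa", "___", "q", "", "aaab"]:
    print(repr(sample), all_same(sample))
```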

Search for a pattern in a string in python

Question: I am very new to python so please bear with me. This is a homework assignment that I need some help with.
So, for the matchPat function, I need to write a function that will take two arguments, str1 and str2, and return a Boolean indicating whether str1 is in str2. But I have to use an asterisk as a wild card in str1. The * can only be used in str1 and it will represent one or more characters that I need to ignore. Examples of matchPat are as follow:
matchPat ( 'a*t*r', 'anteaters' ) : True
matchPat ( 'a*t*r', 'albatross' ) : True
matchPat ( 'a*t*r', 'artist' ) : False
My current matchPat function can tell whether the characters of str1 are in str2 but I don't really know how I could tell python (by using the * as a wild card) to look for 'a' (the first letter) and after it finds a, skip the next 0 or more characters until it finds the next letter(which would be 't' in the example) and so on.
def matchPat(str1,str2):
    ## str(*)==str(=>1)
    if str1=='':
        return True
    elif str2=='':
        return False
    elif str1[0]==str2[0]:
        return matchPat(str1[2],str2[len(str1)-1])
    else: return True
Python strings have the in operator; you can check if str1 is a substring of str2 using str1 in str2.
You can split a string into a list of substrings based on a token. "a*b*c".split("*") is ["a","b","c"].
You can find the offset of next occurrence of a substring in a string using the string's find method.
So the problem of wildcard matching becomes:
split the pattern into the parts that were separated by asterisks
for each part of the pattern:
    can we find this part after the previous part's location?
You are going to have to cope with corner cases like patterns that start with or end with an asterisk or have two asterisk beside each other and so on. Good luck!
There is a find() method of strings that searches for a substring from a particular point, returning either its index (if found) or -1 if not found. The index() method is similar but raises an exception if the target string is not found.
I'd suggest that you first split the pattern string on "*". This will give you a list of chunks to look for. Set the starting position to zero, and for each element in the list of chunks, do a find() or index() from the current position.
If you find the current chunk then work out from its starting position and length where to start searching for the next chunk and update the starting position. If you find all the chunks then the target string matches the pattern. If any chunk is missing then the pattern search should fail.
Since this is homework I am hoping that gives you enough of an idea to move on.
The basic idea here is to compare each character in str1 and str2, and if char in str1 is "*", find that character in str2 which is the character next to the "*" in str1.
Assuming that you are not going to use any function, (except find(), which can be implemented easily), this is the hard way (the code is straight-forward but messy, and I've commented wherever possible)-
def matchPat(str1, str2):
    index1 = 0
    index2 = 0
    while index1 < len(str1):
        c = str1[index1]
        # Check if str2 has run its course.
        if index2 >= len(str2):
            # This needs to be checked, assuming matchPat("*", "") to be true
            if(len(str2) == 0 and str1 == "*"):
                return True
            return False
        # If c is not "*", then it's a normal comparison.
        if c != "*":
            if c != str2[index2]:
                return False
            index2 += 1
        # If c is "*", then you need to advance in str1,
        # search for the next value in str2,
        # and update index2
        else:
            index1 += 1
            if(index1 == len(str1)):
                return True
            c = str1[index1]
            # Search for the character in str2
            i = str2.find(c, index2)
            # If the search fails, return False
            if(i == -1):
                return False
            index2 = i + 1
        index1 += 1
    return True
OUTPUT -
print(matchPat("abcde", "abcd"))
#False
print(matchPat("a", ""))
#False
print(matchPat("", "a"))
#True
print(matchPat("", ""))
#True
print(matchPat("abc", "abc"))
#True
print(matchPat("ab*cd", "abacacd"))
#False
print(matchPat("ab*cd", "abaascd"))
#True
print(matchPat('a*t*r', 'anteater'))
#True
print(matchPat('a*t*r', 'albatross'))
#True
print(matchPat('a*t*r', 'artist'))
#False
Without giving you the complete answer, first, split the str1 string into a list of strings on the '*' character. I usually call str1 the "needle" and str2 the "haystack", since you are looking for the needle in the haystack.
needles = needle.split('*')
Next, have a counter (which I will call i) start at 0. You will always be looking at haystack[i:] for the next string in needles.
In pseudocode, it'll look like this:
needles = needle.split('*')
i = 0
loop through all strings in needles:
    if current needle not in haystack[i:], return false
    increment i to just after the occurrence of the current needle in haystack (use the find() string method or write your own function to handle this)
return true
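The pseudocode above can be sketched in Python roughly as follows. This is one possible reading, not a definitive solution: it assumes * stands for zero or more characters (as the 'albatross' example requires) and that the pattern may match anywhere in the haystack. The name match_pat is hypothetical.

```python
def match_pat(needle: str, haystack: str) -> bool:
    """Return True if the parts of needle occur in haystack, in order."""
    i = 0  # position in haystack from which the next part is searched
    for part in needle.split("*"):
        found = haystack.find(part, i)
        if found == -1:
            return False
        # Continue searching just after this occurrence.
        i = found + len(part)
    return True


print(match_pat("a*t*r", "anteaters"))  # True
print(match_pat("a*t*r", "albatross"))  # True
print(match_pat("a*t*r", "artist"))     # False
```

Empty parts (from a leading, trailing, or doubled asterisk) are found immediately by find(), so they need no special handling in this sketch.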
Are you allowed to use regular expressions? If so, the function you're looking for already exists as re.search. Note that . matches exactly one character, so a wildcard that stands for any run of characters is written .* instead:
import re
bool(re.search('a.*t.*r', 'anteaters')) # True
bool(re.search('a.*t.*r', 'albatross')) # True
bool(re.search('a.*t.*r', 'artist')) # False
And if asterisks are a strict necessity, you can use regular expressions for that, too:
newstr = re.sub(r'\*', '.*', 'a*t*r') # Replace * with .*
bool(re.search(newstr, 'anteaters')) # Search using the new string
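If the pattern may contain characters that are special in regular expressions (such as . or +), a slightly more careful translation escapes the literal parts first. The sketch below assumes * means "zero or more characters"; wildcard_to_regex and match_pat_regex are hypothetical helper names.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a pattern with * wildcards into a regular expression."""
    # Escape each literal chunk, then join the chunks with '.*'.
    return ".*".join(re.escape(part) for part in pattern.split("*"))


def match_pat_regex(pattern: str, text: str) -> bool:
    """Return True if pattern (with * wildcards) matches somewhere in text."""
    return re.search(wildcard_to_regex(pattern), text) is not None


print(match_pat_regex("a*t*r", "anteaters"))  # True
print(match_pat_regex("a*t*r", "artist"))     # False
print(match_pat_regex("a.c*d", "abcd"))       # False: the dot is literal here
```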
If regular expressions aren't allowed, the simplest way to do that would be to look at substrings of the second string that are the same length as the first string, and compare the two. Note that this treats each * as exactly one character. Something like this:
def matchpat(str1, str2):
    if len(str1) > len(str2): return False  # Can't match if the first string is longer
    matched = False  # defined up front so an empty str1 does not raise NameError
    for i in range(0, len(str2)-len(str1)+1):
        substring = str2[i:i+len(str1)]  # create substring of same length as first string
        for j in range(0, len(str1)):
            matched = False  # assume False until match is found
            if str1[j] != '*' and str1[j] != substring[j]:  # check each character
                break
            matched = True
        if matched == True: break  # we don't need to keep searching if we've found a match
    return matched

Find index of last occurrence of a substring in a string

I want to find the position (or index) of the last occurrence of a certain substring in given input string str.
For example, suppose the input string is str = 'hello' and the substring is target = 'l', then it should output 3.
How can I do this?
Use .rfind():
>>> s = 'hello'
>>> s.rfind('l')
3
Also don't use str as variable name or you'll shadow the built-in str().
You can use rfind() or rindex()
>>> s = 'Hello StackOverflow Hi everybody'
>>> print( s.rfind('H') )
20
>>> print( s.rindex('H') )
20
>>> print( s.rfind('other') )
-1
>>> print( s.rindex('other') )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found
The difference shows when the substring is not found: rfind() returns -1, while rindex() raises a ValueError exception.
If you do not want to check for the rfind() return value -1, you may prefer rindex(), which provides an understandable error message. Otherwise you may spend minutes searching for where the unexpected value -1 is coming from within your code...
Example: Search of last newline character
>>> txt = '''first line
... second line
... third line'''
>>> txt.rfind('\n')
22
>>> txt.rindex('\n')
22
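The difference between rfind() and rindex() described above can be wrapped in a small helper when neither -1 nor an exception is convenient. This is only a sketch; last_index_or_none is a hypothetical name, and returning None is a design choice, not something either method does.

```python
from typing import Optional


def last_index_or_none(s: str, target: str) -> Optional[int]:
    """Return the index of the last occurrence of target in s, or None."""
    try:
        return s.rindex(target)
    except ValueError:
        # rindex() raises ValueError when target is absent.
        return None


print(last_index_or_none("hello", "l"))  # 3
print(last_index_or_none("hello", "z"))  # None
```

Unlike -1, None cannot be silently used as a valid index (s[-1] is the last character), so a missed check tends to fail loudly.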
Use the str.rindex method.
>>> 'hello'.rindex('l')
3
>>> 'hello'.index('l')
2
Not trying to resurrect an inactive post, but since this hasn't been posted yet...
(This is how I did it before finding this question)
s = "hello"
target = "l"
last_pos = len(s) - 1 - s[::-1].index(target)
Explanation: When you're searching for the last occurrence, really you're searching for the first occurrence in the reversed string. Knowing this, I did s[::-1] (which returns a reversed string), and then indexed the target from there. Then I did len(s) - 1 - the index found because we want the index in the unreversed (i.e. original) string.
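Watch out, though! If target is more than one character, you probably won't find it in the reversed string, and the offset has to account for the target's length. To fix this, use last_pos = len(s) - len(target) - s[::-1].index(target[::-1]), which searches for a reversed version of target and returns the index where the last occurrence starts.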
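A sketch of the reversed-string approach for targets of any length, checked against str.rfind. The function name last_pos_reversed is hypothetical; it uses find() rather than index() so that a missing target gives -1 instead of an exception.

```python
def last_pos_reversed(s: str, target: str) -> int:
    """Index where the last occurrence of target starts, or -1 if absent."""
    # The first match of the reversed target in the reversed string
    # corresponds to the last match in the original string.
    idx = s[::-1].find(target[::-1])
    if idx == -1:
        return -1
    # Subtract len(target) (not 1) to land on the start of the match.
    return len(s) - len(target) - idx


print(last_pos_reversed("hello", "l"))    # 3
print(last_pos_reversed("hello", "ll"))   # 2
print(last_pos_reversed("abcabc", "bc"))  # 4
```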
Try this:
s = 'hello plombier pantin'
print (s.find('p'))
6
print (s.index('p'))
6
print (s.rindex('p'))
15
print (s.rfind('p'))
15
For this case both rfind() and rindex() string methods can be used, both will return the highest index in the string where the substring is found like below.
test_string = 'hello'
target = 'l'
print(test_string.rfind(target))
print(test_string.rindex(target))
But keep one thing in mind when using rindex(): it raises a ValueError (substring not found) if the target is not found within the searched string, whereas rfind() just returns -1.
The more_itertools library offers tools for finding indices of all characters or all substrings.
Given
import more_itertools as mit
s = "hello"
pred = lambda x: x == "l"
Code
Characters
Now there is the rlocate tool available:
next(mit.rlocate(s, pred))
# 3
A complementary tool is locate:
list(mit.locate(s, pred))[-1]
# 3
mit.last(mit.locate(s, pred))
# 3
Substrings
There is also a window_size parameter available for locating the leading item of several items:
s = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
substring = "chuck"
pred = lambda *args: args == tuple(substring)
next(mit.rlocate(s, pred=pred, window_size=len(substring)))
# 59
Python String rindex() Method
Description
Python string method rindex() returns the last index where the substring str is found, or raises an exception if no such index exists, optionally restricting the search to string[beg:end].
Syntax
Following is the syntax for rindex() method −
str.rindex(str, beg=0, end=len(string))
Parameters
str − This specifies the string to be searched.
beg − This is the starting index; by default it is 0.
end − This is the ending index; by default it is equal to the length of the string.
Return Value
This method returns last index if found otherwise raises an exception if str is not found.
Example
The following example shows the usage of rindex() method.
#!/usr/bin/python
str1 = "this is string example....wow!!!"
str2 = "is"
print(str1.rindex(str2))
print(str1.index(str2))
When we run above program, it produces following result −
5
2
Ref: Python String rindex() Method
- Tutorialspoint
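The optional start and end arguments described above restrict the search to a slice of the string. A short sketch using the same example string (the variable name s is only illustrative):

```python
s = "this is string example....wow!!!"

print(s.rindex("is"))        # 5: last occurrence in the whole string
print(s.rindex("is", 0, 5))  # 2: only s[0:5] == "this " is searched
print(s.rfind("is", 6))      # -1: no occurrence at or after index 6
```

The returned index is always relative to the whole string, not to the slice.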
If you don't want to use rfind, then this will do the trick:
def find_last(s, t):
    last_pos = -1
    while True:
        pos = s.find(t, last_pos + 1)
        if pos == -1:
            return last_pos
        else:
            last_pos = pos
# Last occurrence of a character in a string without using built-in search methods
s = input("Enter a string : ")
char = input("Enter a character to search in string : ")
flag = -1  # -1 means "not found"; 0 would be a valid index
for i in range(len(s)):
    if s[i] == char:
        flag = i
if flag == -1:
    print("Entered character ", char, " is not present in string")
else:
    print("Character ", char, " last occurred at index : ", flag)
You can use the rindex() method to get the last occurrence of a character in a string:
s="hellloooloo"
b='l'
print(s.rindex(b))
s = "Hello, World"
target = 'l'
print(s.rfind(target))  # 0-based index; add 1 for a 1-based position
or
s = "Hello, World"
flag = 0
target = 'l'
for i, j in enumerate(s[::-1]):
    if target == j:
        flag = 1
        break
if flag == 1:
    print(len(s) - 1 - i)  # 0-based index; use len(s) - i for a 1-based position
