I am not able to understand the behavior of the str.startswith method.
If I execute "hello".startswith("") it returns True. I would expect that "hello" does not start with an empty string.
>>> "hello".startswith("")
True
The documentation states:
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for.
So how does the function work?
str.startswith() can be expressed in Python code as:
def startswith(source, prefix):
    return source[:len(prefix)] == prefix
It tests if the first len(prefix) characters of the source string are equal to the prefix. If you pass in a prefix of length zero, that means the first 0 characters are tested. A string of length 0 is always equal to any other string of length 0.
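To check that this slice-based model agrees with the built-in method, including the tuple form mentioned in the documentation, here is a small sketch. The name startswith_model is made up for illustration, and this is only a model of the behavior, not how CPython implements the method.

```python
def startswith_model(source, prefix):
    """Slice-based model of str.startswith (illustrative, not CPython's code)."""
    if isinstance(prefix, tuple):
        # A tuple matches if any one of its prefixes matches.
        return any(startswith_model(source, p) for p in prefix)
    return source[:len(prefix)] == prefix


# Compare the model with the built-in on a few sample inputs.
for source, prefix in [("hello", ""), ("hello", "he"), ("hello", "lo"),
                       ("", ""), ("hello", ("x", "he")), ("hello", ())]:
    assert startswith_model(source, prefix) == source.startswith(prefix)

print(startswith_model("hello", ""))  # True: "hello"[:0] == ""
```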
Note that this applies to other string tests too:
>>> s = 'foobar'
>>> '' in s
True
>>> s.endswith('')
True
>>> s.find('')
0
>>> s.index('')
0
>>> s.count('')
7
>>> s.replace('', ' -> ')
' -> f -> o -> o -> b -> a -> r -> '
Those last two demos, counting the empty string and replacing the empty string with something else, show that you can find an empty string at every position in the input string.
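The count of 7 for a six-character string follows from the slice view: there is one empty slice at each index from 0 through len(s). A quick check, as a sketch:

```python
s = 'foobar'

# Every index from 0 to len(s) inclusive is the start of an empty slice.
positions = [i for i in range(len(s) + 1) if s[i:i] == '']

print(positions)                      # [0, 1, 2, 3, 4, 5, 6]
print(len(positions) == s.count(''))  # True
```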
A string p is a prefix of a string s if s = p + x, so the empty string is a prefix of all strings (it's like 0, s = 0 + s).
There is an input string like "2r-rj1225-f11e-12-x-w"
The task is to return it in the following format:
all groups except the first and last must be 5 characters
the first and the last groups must be between 1 and 5 characters
if the first group in the input is less than 5 characters, it must be preserved
The result should be "2r-rj122-5f11e-12xw"
import re
string = "2r-rj1225-f11e-12-x-w"
baseLength = 5
def formatKey(string: str, baseLength: int) -> str:
    p = re.compile(r"{1,baseLength}[a-zA-Z0-9]{baseLength}[a-zA-z0-9]+")
    formatted = '-'.join(p.match(string))
    return formatted
print(f'The reformatted string is {formatKey(string, baseLength)}')
That does not work, naturally. I would also like to avoid '-'.join and simply return something like regexp(re.compile('[a-z]FORMATREGEXP'), string), where FORMATREGEXP is the regexp that does the job.
Clarification: The actual solution is to use re.sub(pattern, repl, string) function: "The sub() function searches for the pattern in the string and replaces the matched strings with the replacement" -- And that is exactly what I've been asking for, that simple, in one line!!
I don't really see this as a regex problem. It's just reorganizing the characters after the first hyphen.
x = "2r-rj1225-f11e-12-x-w"
def reencode(x):
    parts = x.split('-')
    p1 = ''.join(parts[1:])
    s = parts[0]
    while len(p1) >= 5:
        s += '-' + p1[:5]
        p1 = p1[5:]
    if p1:
        s += '-' + p1
    return s
print(reencode(x))
Output:
2r-rj122-5f11e-12xw
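For comparison, the re.sub() route the asker mentions could look like the sketch below. The function name format_key and the choice to strip the hyphens after the first group before substituting are assumptions made for illustration, not something taken from the question.

```python
import re


def format_key(key: str, base_length: int = 5) -> str:
    # Keep the first group as-is; drop the hyphens from everything after it.
    head, _, rest = key.partition('-')
    rest = rest.replace('-', '')
    # Put a hyphen in front of every chunk of up to base_length characters.
    return head + re.sub(rf'(.{{1,{base_length}}})', r'-\1', rest)


print(format_key("2r-rj1225-f11e-12-x-w"))  # 2r-rj122-5f11e-12xw
```

The quantifier is greedy, so every chunk except possibly the last is exactly base_length characters long.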
What is the shortest way to check if a given string has the same characters?
For example if you have name = 'aaaaa' or surname = 'bbbb' or underscores = '___' or p = '++++', how do you check to know the characters are the same?
An option is to check whether the set of its characters has length 1:
>>> len(set("aaaa")) == 1
True
Or with all(), which could be faster if the strings are very long and it's rare that they are all the same character (but then the regex is good too):
>>> s = "aaaaa"
>>> s0 = s[0]
>>> all(c == s0 for c in s[1:])
True
You can use regex for this:
import re
p = re.compile(r'^(.)\1*$')  # the ur'' prefix is a syntax error in Python 3
re.search(p, "aaaa") # returns a match object
re.search(p, "bbbb") # returns a match object
re.search(p, "aaab") # returns None
Here's an explanation of what this regex pattern means: https://regexper.com/#%5E(.)%5C1*%24
Also possible:
s = "aaaaa"
s.count(s[0]) == len(s)
name = 'aaaaa'
compare = name == len(name) * name[0]
if compare:
    print('all characters are the same')
else:
    print('not all characters are the same')
Here are a couple of ways.
def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)

def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)

all_match = all_match0

data = [
    'aaaaa',
    'bbbb',
    '___',
    '++++',
    'q',
    'aaaaaz',
    'bbbBb',
    '_---',
]

for s in data:
    print(s, all_match(s))
output
aaaaa True
bbbb True
___ True
++++ True
q True
aaaaaz False
bbbBb False
_--- False
all_match0 will be faster unless the string is very long, because its testing loop runs at C speed, but it uses more RAM because it constructs a duplicate string. For very long strings, the time taken to construct the duplicate string becomes significant, and of course it can't do any testing until it creates that duplicate string.
all_match1 should only be slightly slower, even for short strings, and because it stops testing as soon as it finds a mismatch it may even be faster than all_match0, if the mismatch occurs early enough in the string.
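These timing claims are easy to measure on your own machine. The sketch below uses timeit with arbitrarily chosen string lengths and repeat counts; the numbers depend on the interpreter and hardware, so no output is shown.

```python
import timeit


def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)


def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)


# Sample inputs: short and long uniform strings, and a long string
# whose mismatch sits right at the start.
cases = {
    'short, all same': 'a' * 10,
    'long, all same': 'a' * 100_000,
    'long, early mismatch': 'ab' + 'a' * 100_000,
}

for label, s in cases.items():
    assert all_match0(s) == all_match1(s)
    t0 = timeit.timeit(lambda: all_match0(s), number=50)
    t1 = timeit.timeit(lambda: all_match1(s), number=50)
    print(f'{label}: all_match0 {t0:.5f}s, all_match1 {t1:.5f}s')
```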
Try using Counter (High-performance container datatypes).
>>> from collections import Counter
>>> s = 'aaaaaaaaa'
>>> c = Counter(s)
>>> len(c) == 1
True
I'm trying to create a method that brings in a string, looks for the first letter of the string and then replaces all occurrences of that letter with another character.
Assigning the new character to s[letter] obviously does not work, since letter in this case is not an index. But what solution should be used instead?
def fix_start(s):
    letterToReplace = s[0]
    for letter in s:
        if letter is letterToReplace:
            s[letter] = '*'
    return s
Always keep the manual at hand. The str type has a method which is even called replace.
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
So, you can just do
def fix_start(s):
    return s.replace(s[0], '*')
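A short illustration of both forms, with and without the optional count argument (the sample strings are arbitrary). Note that s[0] raises IndexError for an empty string, so this sketch assumes non-empty input.

```python
def fix_start(s):
    # Replace every occurrence of the first character, including the first itself.
    return s.replace(s[0], '*')


print(fix_start('babble'))            # *a**le
print('banana'.replace('a', '*', 2))  # b*n*na (only the first two occurrences)
```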
Note that if you have a situation where you cannot just do str.replace (e.g. you have a list of strings, not string of characters) the enumerate function is handy. It will give you the element and its index:
def fix_start(sl):
    elementToReplace = sl[0]
    for idx, elem in enumerate(sl):
        if elem == elementToReplace:
            sl[idx] = '*'
    return sl
On a side note, do not test letter is letterToReplace in such cases. The is operator tests for identity, not equality. Identity means it's the same object (same memory address, for example), whereas equality means it has the same meaning (represents the letter A, for example).
For some basic immutable values (single characters, small ints, etc.), CPython keeps only one object around, so the two tests happen to agree. For example, int(1) == int(1) and int(1) is int(1) both give True. This is an implementation detail and should not be relied on: with foo = int(123123), foo == 123123 is True, but foo is 123123 may fail the is test.
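The difference is easy to see with two equal values built at runtime. Whether the is comparisons below come out False is a CPython implementation detail, so only the == results are guaranteed.

```python
a = int('123123')
b = int('123123')
print(a == b)  # True: same value
print(a is b)  # usually False in CPython: two separate int objects

x = ''.join(['ab', 'c'])
y = 'abc'
print(x == y)  # True: same characters
print(x is y)  # usually False in CPython: x was built at runtime
```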
You can use the re library:
import re
def fix_start(s):
    # re.escape keeps a first character such as '+' or '.' from being read as regex syntax
    return re.sub(re.escape(s[0]), "*", s)
print(fix_start("string"))
Output:
*tring
Another way is this:
def fix_start(s):
    return "".join(map(lambda x: '*' if x == s[0] else x, s))
As I am going through tutorials on Python 3, I came across the following:
>>> '' in 'spam'
True
My understanding is that '' means no blank spaces.
When I try the following in the shell terminal, I get the output shown below it:
>>> '' in ' spam '
True
Can someone please help explain what is happening?
'' is the empty string, same as "". The empty string is a substring of every other string.
When a and b are strings, the expression a in b checks that a is a substring of b. That is, the sequence of characters of a must exist in b; there must be an index i such that b[i:i+len(a)] == a. If a is empty, then any index i satisfies this condition.
This does not mean that when you iterate over b, you will get a. Unlike other sequences, while every element produced by for a in b satisfies a in b, a in b does not imply that a will be produced by iterating over b.
So '' in x and "" in x return True for any string x:
>>> '' in 'spam'
True
>>> "" in 'spam'
True
>>> "" in ''
True
>>> '' in ""
True
>>> '' in ''
True
>>> '' in ' '
True
>>> "" in " "
True
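The point about iteration can be seen by comparing a string with the list of its characters, where in tests elements rather than substrings. A small sketch:

```python
b = 'spam'
chars = list(b)  # ['s', 'p', 'a', 'm']

print('' in b)        # True: the empty string is a substring
print('' in chars)    # False: iterating over b never yields ''
print('pa' in b)      # True: substring test
print('pa' in chars)  # False: 'pa' is not a single element

# Every element produced by iteration is also a substring.
print(all(c in b for c in b))  # True
```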
The string literal '' represents the empty string. This is basically a string with a length of zero, which contains no characters.
The in operator is defined for sequences to return “True if an item of s is equal to x, else False” for an expression x in s. For general sequences, this means that one of the items in s (usually accessible using iteration) equals the tested element x. For strings however, the in operator has subsequence semantics. So x in s is true, when x is a substring of s.
Formally, this means that for a substring x with a length of n, there must be an index i which satisfies the following expression: s[i:i+n] == x.
This is easily understood with an example:
>>> s = 'foobar'
>>> x = 'foo'
>>> n = len(x) # 3
>>> i = 0
>>> s[i:i+n] == x
True
>>> x = 'obar'
>>> n = len(x) # 4
>>> i = 2
>>> s[i:i+n] == x
True
Algorithmically, what the in operator (or the underlying __contains__ method) needs to do is iterate i over all possible values (0 <= i <= len(s) - n) and check if the condition is true for any i.
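As a rough sketch of that search (a naive version for illustration only; CPython's real implementation uses a faster algorithm), every start index from 0 through len(s) - n is tried. The name contains is hypothetical.

```python
def contains(s, x):
    """Naive substring test: is there an i with s[i:i+n] == x?"""
    n = len(x)
    # range(len(s) - n + 1) covers i = 0 .. len(s) - n;
    # it is empty when x is longer than s.
    return any(s[i:i + n] == x for i in range(len(s) - n + 1))


print(contains('foobar', 'obar'))  # True (i = 2)
print(contains('foobar', ''))      # True (s[0:0] == '')
print(contains('', ''))            # True (i = 0 is the only index tried)
print(contains('foo', 'foobar'))   # False
```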
Looking back at the empty string, it becomes clear why the '' in s check is true for every string s: n is zero, so we are checking s[i:i]; and that is the empty string itself for every valid index i:
>>> s[0:0]
''
>>> s[1:1]
''
>>> s[2:2]
''
It is even true for s being the empty string itself, because sequence slicing is defined to return an empty sequence when a range outside of the sequence is specified (that’s why you could do s[74565463:74565469] on short strings).
So that explains why the containment check with in always returns True when checking the empty string as a substring. But even if you think about it logically, you can see the reason: A substring is part of a string which you can find in another string. The empty string however can be found between every two characters. It’s like how you can add an infinite amount of zeros to a number, you can add an infinite amount of empty strings to a string without actually modifying that string.
As Rushy Panchal points out, the in inclusion operator follows the set-theoretic convention and assumes that an empty string is a substring of any string.
You can try to persuade yourself why this makes sense by considering the following: let s be a string such that '' in s is false. Then '' in s[len(s):] had better be false as well (or else there is a substring of s that contains '', but s does not contain '', etc). But s[len(s):] is '', so '' in '' would be false, which isn't great either. So you cannot pick any string s such that '' not in s without creating a problem.
Of course, when in doubt, simulate it:
s = input('Enter any string you dare:\n')
print('' in '')
print(s == s + '' == '' + s)
print('' in '' + s)