How to Check if String Has the same characters in Python [duplicate] - python

This question already has answers here:
efficiently checking that string consists of one character in Python
(8 answers)
Closed 6 years ago.
What is the shortest way to check if a given string has the same characters?
For example if you have name = 'aaaaa' or surname = 'bbbb' or underscores = '___' or p = '++++', how do you check to know the characters are the same?

An option is to check whether the set of its characters has length 1:
>>> len(set("aaaa")) == 1
True
Or use all(), which can be faster when the strings are very long and rarely consist of a single repeated character, because it stops at the first mismatch (the regex is a good option in that case too):
>>> s = "aaaaa"
>>> s0 = s[0]
>>> all(c == s0 for c in s[1:])
True

You can use regex for this:
import re
p = re.compile(r'^(.)\1*$')  # the ur'' prefix is Python 2 only; a plain raw string works in Python 3
re.search(p, "aaaa") # returns a match object
re.search(p, "bbbb") # returns a match object
re.search(p, "aaab") # returns None
Here's an explanation of what this regex pattern means: https://regexper.com/#%5E(.)%5C1*%24

Also possible:
s = "aaaaa"
s.count(s[0]) == len(s)
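Note that the variants above that index s[0] raise IndexError on an empty string. A minimal sketch of a guard follows. It assumes that an empty string should count as "not all the same", which is a convention chosen here and not something the question specifies. The name is_uniform is hypothetical.

```python
def is_uniform(s):
    """Return True if s is non-empty and every character equals the first one."""
    # Assumption: the empty string is treated as "not uniform".
    if not s:
        return False
    return s.count(s[0]) == len(s)


print(is_uniform("aaaaa"))
print(is_uniform("aaab"))
print(is_uniform(""))
```

If you would rather treat the empty string as trivially uniform, return True in the guard instead.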

compare = name == len(name) * name[0]
if compare:
    pass  # all characters are the same
else:
    pass  # not all characters are the same

Here are a couple of ways.
def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)

def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)

all_match = all_match0

data = [
    'aaaaa',
    'bbbb',
    '___',
    '++++',
    'q',
    'aaaaaz',
    'bbbBb',
    '_---',
]

for s in data:
    print(s, all_match(s))
output
aaaaa True
bbbb True
___ True
++++ True
q True
aaaaaz False
bbbBb False
_--- False
all_match0 will be faster unless the string is very long, because its testing loop runs at C speed, but it uses more RAM because it constructs a duplicate string. For very long strings, the time taken to construct the duplicate string becomes significant, and of course it can't do any testing until it creates that duplicate string.
all_match1 should only be slightly slower, even for short strings, and because it stops testing as soon as it finds a mismatch it may even be faster than all_match0, if the mismatch occurs early enough in the string.
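The trade-off described above can be checked on your own machine with timeit. This is only a rough sketch: the input sizes and the repeat count are arbitrary assumptions, and the timings depend on the interpreter and hardware, so no expected numbers are given.

```python
import timeit


def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)


def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)


# Hypothetical inputs: one uniform string and one with an early mismatch.
samples = {
    "uniform": "a" * 100_000,
    "early mismatch": "ab" + "a" * 100_000,
}

for label, s in samples.items():
    for func in (all_match0, all_match1):
        seconds = timeit.timeit(lambda: func(s), number=20)
        print(f"{label:15} {func.__name__}: {seconds:.4f}s")
```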

Try using Counter from the collections module (high-performance container datatypes).
>>> from collections import Counter
>>> s = 'aaaaaaaaa'
>>> c = Counter(s)
>>> len(c) == 1
True

Related

Detecting if there are 3 same letters next to each other [duplicate]

This question already has answers here:
How to use re to find consecutive, repeated chars
(3 answers)
Closed 2 years ago.
I want to detect if there are three of the same letter next to each other in a string.
For example:
string1 = 'this is oooonly excaple' # ooo
string2 = 'nooo way that he did this' # ooo
string3 = 'I kneeeeeew it!' # eee
Is there any pythonic way to do this?
I guess that a solution like this is not the best one:
for letters in ['aaa', 'bbb', 'ccc', 'ddd', ..., 'zzz']:
    if letters in string:
        print(True)
You don't have to use regex, but the solution is a little long for something as simple as that:
def repeated(string, amount):
    current = None
    count = 0
    for letter in string:
        if letter == current:
            count += 1
            if count == amount:
                return True
        else:
            count = 1
            current = letter
    return False

print(repeated("helllo", 3) == True)
print(repeated("hello", 3) == False)
You can use groupby to group similar letters and then check the length of each group:
from itertools import groupby
string = "this is ooonly an examplle nooo wway that he did this I kneeeeeew it!"
for letter, group in groupby(string):
    if len(list(group)) >= 3:
        print(letter)
Will output:
o
o
e
If you don't care for the letters themselves and just want to know if there was a repetition, take advantage of short-circuiting with the built-in any function:
print(any(len(list(group)) >= 3 for letter, group in groupby(string)))
One of the best ways to tackle these simple pattern problems is with regex
import re
test_cases = [
'abc',
'a bbb a', # expected match for 'bbb'
'bb a b',
'aaa c bbb', # expected match for 'aaa' and 'bbb'
]
for string in test_cases:
    # We use re.findall because we don't want to keep only the first result.
    # To stop at the first result, use re.search instead.
    match = re.findall(r'(?P<repeated_characters>(.)\2{2})', string)
    if match:
        print([groups[0] for groups in match])
Result:
['bbb']
['aaa', 'bbb']
Use a regular expression:
import re
pattern = r"(\w)\1{2}"
string = "this is ooonly an example"
print(re.search(pattern, string) is not None)
Output:
True
How about using regex? - ([a-z])\1{2}
>>> import re
>>> re.search(r'([a-z])\1{2}', 'I kneeew it!', flags=re.I)
<re.Match object; span=(4, 7), match='eee'>
re.search returns None if it doesn't find a match; otherwise it returns a match object, and you can get the full match by indexing that object with [0].
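The same back-reference idea generalizes from three repetitions to any run length. The helper below is a sketch, and the name has_run and its parameters are hypothetical. It matches any character, not just letters, and because . does not match a newline by default, runs of newlines are not detected.

```python
import re


def has_run(text, n):
    """Return True if some character occurs at least n times in a row."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (.) captures one character; \1{n-1} requires n-1 further copies of it.
    pattern = r"(.)\1{%d}" % (n - 1)
    return re.search(pattern, text) is not None


print(has_run("I kneeeeeew it!", 3))
print(has_run("I kneew it!", 3))
```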
string1 = 'nooo way that he did this'
for i in range(0, len(string1) - 2):
    sub_st = string1[i:i+3]
    if sub_st[0]*3 == sub_st:
        print('true')
The print statement is from your example.
sub_st[0]*3 repeats the first character of sub_st three times and combines the copies into a single string. If the original sub_st equals this repeated string, sub_st contains the same letter three times.
If you don't need a general answer for n repetitions you can just iterate through the string and print true if previous character and next character are equal to current character, excluding the first and last character.
text = "I kneeeeeew it!"
for i in range(1, len(text) - 1):
    if text[i-1] == text[i] and text[i+1] == text[i]:
        print(True)
        break
We can define following predicate with itertools.groupby + any functions like
from itertools import groupby
def has_repeated_letter(string):
    return any(len(list(group)) >= 3 for _, group in groupby(string))
and after that use it
>>> has_repeated_letter('this is oooonly example')
True
>>> has_repeated_letter('nooo way that he did this')
True
>>> has_repeated_letter('I kneeeeeew it!')
True
>>> has_repeated_letter('I kneew it!')
False

Python's slice notation: return True when both words of a two-word string start with the same letter

I need to check whether both words of a two-word string start with the same string (letter); if so, the function should return True. I am not sure which slicing method applies here. I went through various posts here but could not find what I needed. With my code, the result is always None.
def word_checker(name):
    if name[0] =='a' and name[::1] == 'a':
        return True

print(word_checker('abc adgh'))
You need to split the string on spaces and check the first letter of each split:
def word_checker(name):
    first, second = name.split()
    return first[0] == 'a' and second[0] == 'a'

print(word_checker('abc adgh'))
Output
True
But the previous code only returns True if both words start with 'a'. If both must simply start with the same letter, you can do it like this:
def word_checker(name):
    first, second = name.split()
    return first[0] == second[0]

print(word_checker('abc adgh'))
print(word_checker('bar barfoo'))
print(word_checker('bar foo'))
Output
True
True
False
'abc adgh'[::1] will simply return the entire string. See Understanding Python's slice notation for more details (list slicing is similar to string slicing).
Instead, you need to split by whitespace, e.g. using str.split. A functional method can use map with operator.itemgetter:
from operator import itemgetter
def word_checker(name):
    a, b = map(itemgetter(0), name.split())
    return a == b

print(word_checker('abc adgh')) # True
print(word_checker('abc bdgh')) # False
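To see why the original name[::1] == 'a' test can never succeed for this input, it may help to print what each expression evaluates to. This is a small illustrative sketch; the variable first_letters is introduced here only for the demonstration.

```python
name = "abc adgh"

print(name[::1])   # the whole string, stepping by 1
print(name[0])     # the first character
print(name[:1])    # the first character as a slice (empty for an empty string)

# The first letter of each whitespace-separated word.
first_letters = [word[0] for word in name.split()]
print(first_letters)
```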

Regex replace in Spyder with case conversion [duplicate]

This question's answers are a community effort. Edit existing answers to improve this post. It is not currently accepting new answers or interactions.
Example:
>>> convert('CamelCase')
'camel_case'
Camel case to snake case
import re
name = 'CamelCaseName'
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name) # camel_case_name
If you do this many times and the above is slow, compile the regex beforehand:
pattern = re.compile(r'(?<!^)(?=[A-Z])')
name = pattern.sub('_', name).lower()
To handle more advanced cases specially (this is not reversible anymore):
def camel_to_snake(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

print(camel_to_snake('camel2_camel2_case')) # camel2_camel2_case
print(camel_to_snake('getHTTPResponseCode')) # get_http_response_code
print(camel_to_snake('HTTPResponseCodeXYZ')) # http_response_code_xyz
To also handle cases with two or more underscores:
def to_snake_case(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    name = re.sub('__([A-Z])', r'_\1', name)
    name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()
Snake case to pascal case
name = 'snake_case_name'
name = ''.join(word.title() for word in name.split('_'))
print(name) # SnakeCaseName
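One thing to be aware of with the snippet above: str.title() also capitalizes a letter that follows a digit, so parts containing digits may come out differently from what you expect. The sketch below shows the difference. The helper snake_to_pascal is a hypothetical alternative that only upper-cases the first character of each part.

```python
def snake_to_pascal(name):
    # Upper-case only the first character of each part and keep the rest as is.
    return "".join(part[:1].upper() + part[1:] for part in name.split("_"))


print("".join(w.title() for w in "http2server_name".split("_")))  # Http2ServerName
print(snake_to_pascal("http2server_name"))                        # Http2serverName
```

Which behaviour is preferable depends on your naming scheme; neither is more correct in general.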
There's an inflection library in the package index that can handle these things for you. In this case, you'd be looking for inflection.underscore():
>>> inflection.underscore('CamelCase')
'camel_case'
I don't know why these are all so complicated.
For most cases, the simple expression ([A-Z]+) will do the trick:
>>> re.sub('([A-Z]+)', r'_\1','CamelCase').lower()
'_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camelCase').lower()
'camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camel2Case2').lower()
'camel2_case2'
>>> re.sub('([A-Z]+)', r'_\1','camelCamelCase').lower()
'camel_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'
To ignore the first character, simply add the negative lookahead (?!^):
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCase').lower()
'camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCamelCase').lower()
'camel_camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','Camel2Camel2Case').lower()
'camel2_camel2_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'
If you want to convert ALLCaps to all_caps and expect numbers in your string, you still don't need two separate passes; just use | (alternation). The expression ((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z])) can handle just about every scenario in the book:
>>> a = re.compile('((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z]))')
>>> a.sub(r'_\1', 'getHTTPResponseCode').lower()
'get_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponseCode').lower()
'get2_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponse123Code').lower()
'get2_http_response123_code'
>>> a.sub(r'_\1', 'HTTPResponseCode').lower()
'http_response_code'
>>> a.sub(r'_\1', 'HTTPResponseCodeXYZ').lower()
'http_response_code_xyz'
It all depends on what you want so use the solution that best suits your needs as it should not be overly complicated.
nJoy!
Avoiding libraries and regular expressions:
def camel_to_snake(s):
    return ''.join(['_'+c.lower() if c.isupper() else c for c in s]).lstrip('_')

>>> camel_to_snake('ThisIsMyString')
'this_is_my_string'
stringcase is my go-to library for this; e.g.:
>>> from stringcase import pascalcase, snakecase
>>> snakecase('FooBarBaz')
'foo_bar_baz'
>>> pascalcase('foo_bar_baz')
'FooBarBaz'
I think this solution is more straightforward than previous answers:
import re
def convert(camel_input):
    words = re.findall(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+', camel_input)
    return '_'.join(map(str.lower, words))

# Let's test it
test_strings = [
    'CamelCase',
    'camelCamelCase',
    'Camel2Camel2Case',
    'getHTTPResponseCode',
    'get200HTTPResponseCode',
    'getHTTP200ResponseCode',
    'HTTPResponseCode',
    'ResponseHTTP',
    'ResponseHTTP2',
    'Fun?!awesome',
    'Fun?!Awesome',
    '10CoolDudes',
    '20coolDudes'
]
for test_string in test_strings:
    print(convert(test_string))
Which outputs:
camel_case
camel_camel_case
camel_2_camel_2_case
get_http_response_code
get_200_http_response_code
get_http_200_response_code
http_response_code
response_http
response_http_2
fun_awesome
fun_awesome
10_cool_dudes
20_cool_dudes
The regular expression matches three patterns:
[A-Z]?[a-z]+: Consecutive lower-case letters that optionally start with an upper-case letter.
[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$): Two or more consecutive upper-case letters. It uses a lookahead to exclude the last upper-case letter if it is followed by a lower-case letter.
\d+: Consecutive numbers.
By using re.findall we get a list of individual "words" that can be converted to lower-case and joined with underscores.
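To make the three alternatives concrete, here is a short sketch that prints the intermediate word list before it is lower-cased and joined. The constant name WORD_PATTERN is introduced only for this illustration.

```python
import re

WORD_PATTERN = re.compile(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+')

for sample in ('getHTTP200ResponseCode', 'ResponseHTTP2'):
    # Each list item comes from exactly one of the three alternatives.
    words = WORD_PATTERN.findall(sample)
    print(sample, '->', words)
```

For the first sample, 'get', 'Response' and 'Code' come from the first alternative, 'HTTP' from the second, and '200' from the third.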
Personally I am not sure how anything using regular expressions in python can be described as elegant. Most answers here are just doing "code golf" type RE tricks. Elegant coding is supposed to be easily understood.
def to_snake_case(not_snake_case):
    final = ''
    for i in range(len(not_snake_case)):  # range replaces the Python 2 xrange
        item = not_snake_case[i]
        if i < len(not_snake_case) - 1:
            next_char_will_be_underscored = (
                not_snake_case[i+1] == "_" or
                not_snake_case[i+1] == " " or
                not_snake_case[i+1].isupper()
            )
        if (item == " " or item == "_") and next_char_will_be_underscored:
            continue
        elif (item == " " or item == "_"):
            final += "_"
        elif item.isupper():
            final += "_"+item.lower()
        else:
            final += item
    if final[0] == "_":
        final = final[1:]
    return final

>>> to_snake_case("RegularExpressionsAreFunky")
'regular_expressions_are_funky'
>>> to_snake_case("RegularExpressionsAre Funky")
'regular_expressions_are_funky'
>>> to_snake_case("RegularExpressionsAre_Funky")
'regular_expressions_are_funky'
''.join('_'+c.lower() if c.isupper() else c for c in "DeathToCamelCase").strip('_')
re.sub("(.)([A-Z])", r'\1_\2', 'DeathToCamelCase').lower()
Here's my solution:
def un_camel(text):
    """ Converts a CamelCase name into an under_score name.

        >>> un_camel('CamelCase')
        'camel_case'
        >>> un_camel('getHTTPResponseCode')
        'get_http_response_code'
    """
    result = []
    pos = 0
    while pos < len(text):
        if text[pos].isupper():
            if pos-1 > 0 and text[pos-1].islower() or pos-1 > 0 and \
            pos+1 < len(text) and text[pos+1].islower():
                result.append("_%s" % text[pos].lower())
            else:
                result.append(text[pos].lower())
        else:
            result.append(text[pos])
        pos += 1
    return "".join(result)
It supports those corner cases discussed in the comments. For instance, it'll convert getHTTPResponseCode to get_http_response_code like it should.
I don't see why both .sub() calls are needed. :) I'm no regex guru, but I simplified the function to the one below, which suits my needs: I just needed to convert camelCasedVars from a POST request to vars_with_underscore:
def myFunc(...):
    return re.sub('(.)([A-Z]{1})', r'\1_\2', "iTriedToWriteNicely").lower()
It does not work with names like getHTTPResponse, but I've heard that is a bad naming convention anyway (it should be getHttpResponse, which is obviously much easier to memorize).
For the fun of it:
>>> def un_camel(input):
...     output = [input[0].lower()]
...     for c in input[1:]:
...         if c in ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
...             output.append('_')
...             output.append(c.lower())
...         else:
...             output.append(c)
...     return str.join('', output)
...
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'
Or, more for the fun of it:
>>> un_camel = lambda i: i[0].lower() + str.join('', ("_" + c.lower() if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" else c for c in i[1:]))
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'
Using regexes may be the shortest, but this solution is way more readable:
def to_snake_case(s):
    snake = "".join(["_"+c.lower() if c.isupper() else c for c in s])
    return snake[1:] if snake.startswith("_") else snake
This is not an elegant method; it is a very 'low-level' implementation of a simple state machine (a bitfield state machine), possibly the most anti-Pythonic way to solve this. However, the re module also implements an overly complex state machine for such a simple task, so I think this is a good solution.
def splitSymbol(s):
    si, ci, state = 0, 0, 0 # start_index, current_index
    '''
        state bits:
        0: no yields
        1: lower yields
        2: lower yields - 1
        4: upper yields
        8: digit yields
        16: other yields
        32 : upper sequence mark
    '''
    for c in s:
        if c.islower():
            if state & 1:
                yield s[si:ci]
                si = ci
            elif state & 2:
                yield s[si:ci - 1]
                si = ci - 1
            state = 4 | 8 | 16
            ci += 1
        elif c.isupper():
            if state & 4:
                yield s[si:ci]
                si = ci
            if state & 32:
                state = 2 | 8 | 16 | 32
            else:
                state = 8 | 16 | 32
            ci += 1
        elif c.isdigit():
            if state & 8:
                yield s[si:ci]
                si = ci
            state = 1 | 4 | 16
            ci += 1
        else:
            if state & 16:
                yield s[si:ci]
            state = 0
            ci += 1 # eat ci
            si = ci
        print(' : ', c, bin(state))
    if state:
        yield s[si:ci]

def camelcaseToUnderscore(s):
    return '_'.join(splitSymbol(s))
splitSymbol can parse all case types: UpperSEQUENCEInterleaved, under_score, BIG_SYMBOLS and cammelCasedMethods.
I hope it is useful
Take a look at the excellent Schematics lib
https://github.com/schematics/schematics
It allows you to create typed data structures that can serialize/deserialize from Python to JavaScript flavour, e.g.:
class MapPrice(Model):
    price_before_vat = DecimalType(serialized_name='priceBeforeVat')
    vat_rate = DecimalType(serialized_name='vatRate')
    vat = DecimalType()
    total_price = DecimalType(serialized_name='totalPrice')
So many complicated methods...
Just find all the "Titled" groups and join their lower-cased variants with underscores.
>>> import re
>>> def camel_to_snake(string):
...     groups = re.findall('([A-z0-9][a-z]*)', string)
...     return '_'.join([i.lower() for i in groups])
...
>>> camel_to_snake('ABCPingPongByTheWay2KWhereIsOurBorderlands3???')
'a_b_c_ping_pong_by_the_way_2_k_where_is_our_borderlands_3'
If you don't want digits to start a group or to form a separate group, you can use the ([A-z][a-z0-9]*) mask.
A horrendous example using regular expressions (you could easily clean this up :) ):
import re

def f(s):
    return s.group(1).lower() + "_" + s.group(2).lower()

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(f, "CamelCase"))
print(p.sub(f, "getHTTPResponseCode"))
Works for getHTTPResponseCode though!
Alternatively, using lambda:
p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "CamelCase"))
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "getHTTPResponseCode"))
EDIT: It should also be pretty easy to see that there's room for improvement for cases like "Test", because the underscore is unconditionally inserted.
Lightly adapted from https://stackoverflow.com/users/267781/matth,
who uses generators.
def uncamelize(s):
    buff, l = '', []
    for ltr in s:
        if ltr.isupper():
            if buff:
                l.append(buff)
                buff = ''
        buff += ltr
    l.append(buff)
    return '_'.join(l).lower()
This simple method should do the job:
import re
def convert(name):
    return re.sub(r'([A-Z]*)([A-Z][a-z]+)', lambda x: (x.group(1) + '_' if x.group(1) else '') + x.group(2) + '_', name).rstrip('_').lower()
We look for capital letters that are preceded by any number of (or zero) capital letters, and followed by one or more lowercase characters.
An underscore is placed just before the occurrence of the last capital letter found in the group, and one can be placed before that capital letter in case it is preceded by other capital letters.
If there are trailing underscores, remove those.
Finally, the whole result string is changed to lower case.
(taken from here, see working example online)
Here's something I did to change the headers on a tab-delimited file. I'm omitting the part where I only edited the first line of the file. You could adapt it to Python pretty easily with the re library. This also includes separating out numbers (but keeps the digits together). I did it in two steps because that was easier than telling it not to put an underscore at the start of a line or tab.
Step One...find uppercase letters or integers preceded by lowercase letters, and precede them with an underscore:
Search:
([a-z]+)([A-Z]|[0-9]+)
Replacement:
\1_\l\2/
Step Two...take the above and run it again to convert all caps to lowercase:
Search:
([A-Z])
Replacement (that's backslash, lowercase L, backslash, one):
\l\1
I was looking for a solution to the same problem, except that I needed a chain; e.g.
"CamelCamelCamelCase" -> "Camel-camel-camel-case"
Starting from the nice two-word solutions here, I came up with the following:
"-".join(x.group(1).lower() if x.group(2) is None else x.group(1) \
for x in re.finditer("((^.[^A-Z]+)|([A-Z][^A-Z]+))", "stringToSplit"))
Most of the complicated logic is to avoid lowercasing the first word. Here's a simpler version if you don't mind altering the first word:
"-".join(x.group(1).lower() for x in re.finditer("(^[^A-Z]+|[A-Z][^A-Z]+)", "stringToSplit"))
Of course, you can pre-compile the regular expressions or join with underscore instead of hyphen, as discussed in the other solutions.
Concise without regular expressions, but HTTPResponseCode=> httpresponse_code:
def from_camel(name):
    """
    ThisIsCamelCase ==> this_is_camel_case
    """
    name = name.replace("_", "")
    _cas = lambda _x : [_i.isupper() for _i in _x]
    seq = zip(_cas(name[1:-1]), _cas(name[2:]))
    ss = [_x + 1 for _x, (_i, _j) in enumerate(seq) if (_i, _j) == (False, True)]
    # enumerate (the original had the typo "numerate") pairs each character with its index
    return "".join([ch + "_" if _x in ss else ch for _x, ch in enumerate(name.lower())])
Without any library :
def camelify(out):
    return (''.join(["_"+x.lower() if i<len(out)-1 and x.isupper() and out[i+1].islower()
        else x.lower()+"_" if i<len(out)-1 and x.islower() and out[i+1].isupper()
        else x.lower() for i,x in enumerate(list(out))])).lstrip('_').replace('__','_')
A bit heavy, but
CamelCamelCamelCase -> camel_camel_camel_case
HTTPRequest -> http_request
GetHTTPRequest -> get_http_request
getHTTPRequest -> get_http_request
Very nice RegEx proposed on this site:
(?<!^)(?=[A-Z])
If Python has a comparable string split method, it should work...
In Java:
String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
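Python's counterpart to Java's regex-based String.split is re.split. The sketch below assumes Python 3.7 or later, because earlier versions did not split on patterns that can match the empty string, and this pattern is zero-width.

```python
import re

# Requires Python 3.7+, where re.split accepts patterns that match the empty string.
words = re.split(r'(?<!^)(?=[A-Z])', 'loremIpsum')
print(words)                    # ['lorem', 'Ipsum']
print('_'.join(words).lower())  # lorem_ipsum
```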
Just in case someone needs to transform a complete source file, here is a script that will do it.
# Copy and paste your camel case code in the string below
camelCaseCode ="""
cv2.Matx33d ComputeZoomMatrix(const cv2.Point2d & zoomCenter, double zoomRatio)
{
auto mat = cv2.Matx33d::eye();
mat(0, 0) = zoomRatio;
mat(1, 1) = zoomRatio;
mat(0, 2) = zoomCenter.x * (1. - zoomRatio);
mat(1, 2) = zoomCenter.y * (1. - zoomRatio);
return mat;
}
"""
import re
def snake_case(name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

def lines(str):
    return str.split("\n")

def unlines(lst):
    return "\n".join(lst)

def words(str):
    return str.split(" ")

def unwords(lst):
    return " ".join(lst)

def map_partial(function):
    return lambda values : [ function(v) for v in values]

import functools
def compose(*functions):
    return functools.reduce(lambda f, g: lambda x: f(g(x)), functions, lambda x: x)

snake_case_code = compose(
    unlines ,
    map_partial(unwords),
    map_partial(map_partial(snake_case)),
    map_partial(words),
    lines
)
print(snake_case_code(camelCaseCode))
Wow I just stole this from django snippets. ref http://djangosnippets.org/snippets/585/
Pretty elegant
camelcase_to_underscore = lambda str: re.sub(r'(?<=[a-z])[A-Z]|[A-Z](?=[^A-Z])', r'_\g<0>', str).lower().strip('_')
Example:
camelcase_to_underscore('ThisUser')
Returns:
'this_user'
REGEX DEMO
from functools import reduce  # reduce is no longer a builtin in Python 3

def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() else '') + y,
        name
    ).lower()
And if we need to cover a case with already-un-cameled input:
def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() and not x.endswith('_') else '') + y,
        name
    ).lower()
Not in the standard library, but I found this module that appears to contain the functionality you need.
If you use Google's (nearly) deterministic camel case algorithm, you do not need to handle names like HTMLDocument, since they should be written HtmlDocument, and this regex-based approach is simple. It prefixes every capital letter or digit with an underscore. Note that it does not handle multi-digit numbers.
import re
def to_snake_case(camel_str):
    return re.sub('([A-Z0-9])', r'_\1', camel_str).lower().lstrip('_')

def convert(camel_str):
    temp_list = []
    for letter in camel_str:
        if letter.islower():
            temp_list.append(letter)
        else:
            temp_list.append('_')
            temp_list.append(letter)
    result = "".join(temp_list)
    return result.lower()
Use: str.capitalize() to convert first letter of the string (contained in variable str) to a capital letter and returns the entire string.
Example:
Command: "hello".capitalize()
Output: Hello

"IN" operator with empty strings in Python 3.0 [duplicate]

This question already has answers here:
Why empty string is on every string? [duplicate]
(2 answers)
Closed 6 years ago.
As I am going through tutorials on Python 3, I came across the following:
>>> '' in 'spam'
True
My understanding is that '' equals no blank spaces.
When I try the following in the shell, I get the output shown below it:
>>> '' in ' spam '
True
Can someone please help explain what is happening?
'' is the empty string, same as "". The empty string is a substring of every other string.
When a and b are strings, the expression a in b checks that a is a substring of b. That is, the sequence of characters of a must exist in b; there must be an index i such that b[i:i+len(a)] == a. If a is empty, then any index i satisfies this condition.
This does not mean that when you iterate over b, you will get a. Unlike other sequences, while every element produced by for a in b satisfies a in b, a in b does not imply that a will be produced by iterating over b.
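Both points can be sketched in a few lines. The function naive_contains is a hypothetical, deliberately naive rendering of the index-based definition above; it is not how CPython actually implements the in operator.

```python
def naive_contains(a, b):
    """Return True if a is a substring of b, using the index definition."""
    n = len(a)
    # i ranges over every start position where a slice of length n still fits.
    return any(b[i:i + n] == a for i in range(len(b) - n + 1))


b = "spam"
print(naive_contains("pa", b), "pa" in b)
print(naive_contains("", b), "" in b)

# Iterating over b only ever produces single characters, never ''.
print("" in list(b))
```

The first two lines print matching results for the naive function and the built-in operator, while the last line prints False because '' is not one of the characters produced by iteration.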
So '' in x and "" in x returns True for any string x:
>>> '' in 'spam'
True
>>> "" in 'spam'
True
>>> "" in ''
True
>>> '' in ""
True
>>> '' in ''
True
>>> '' in ' '
True
>>> "" in " "
True
The string literal '' represents the empty string. This is basically a string with a length of zero, which contains no characters.
The in operator is defined for sequences to return “True if an item of s is equal to x, else False” for an expression x in s. For general sequences, this means that one of the items in s (usually accessible using iteration) equals the tested element x. For strings however, the in operator has subsequence semantics. So x in s is true, when x is a substring of s.
Formally, this means that for a substring x with a length of n, there must be an index i which satisfies the following expression: s[i:i+n] == x.
This is easily understood with an example:
>>> s = 'foobar'
>>> x = 'foo'
>>> n = len(x) # 3
>>> i = 0
>>> s[i:i+n] == x
True
>>> x = 'obar'
>>> n = len(x) # 4
>>> i = 2
>>> s[i:i+n] == x
True
Algorithmically, what the in operator (or the underlying __contains__ method) needs to do is iterate i over all possible values (0 <= i <= len(s) - n) and check whether the condition is true for any i.
Looking back at the empty string, it becomes clear why the '' in s check is true for every string s: n is zero, so we are checking s[i:i]; and that is the empty string itself for every valid index i:
>>> s[0:0]
''
>>> s[1:1]
''
>>> s[2:2]
''
It is even true for s being the empty string itself, because sequence slicing is defined to return an empty sequence when a range outside of the sequence is specified (that’s why you could do s[74565463:74565469] on short strings).
So that explains why the containment check with in always returns True when checking the empty string as a substring. But even if you think about it logically, you can see the reason: A substring is part of a string which you can find in another string. The empty string however can be found between every two characters. It’s like how you can add an infinite amount of zeros to a number, you can add an infinite amount of empty strings to a string without actually modifying that string.
As Rushy Panchal points out, the in operator follows set-theoretic convention and treats the empty string as a substring of any string.
You can try to persuade yourself why this makes sense by considering the following: let s be a string such that '' in s == False. Then '' in s[len(s):] better be false by transitivity (or else there is a subset of s that contains '', but s does not contain '', etc). But then '' in '' == False, which isn't great either. So you cannot pick any string s such that '' not in s which does not create a problem.
Of course, when in doubt, simulate it:
s = input('Enter any string you dare:\n')
print('' in '')
print(s == s + '' == '' + s)
print('' in '' + s)

Python: how to eliminate 3 or more consecutive letters

I am trying to replace words that have three or more consecutive identical letters, for example realllllly to really.
pattern = re.compile(r"(.)\1\1{2,}", re.DOTALL)
return pattern.sub(r"\1\1\1", text)
I can't get it to work. Can anyone help?
Your solution actually appears to be working correctly:
>>> import re
>>> a = 'foooooooo baaaar'
>>> reg = re.compile( r"(.)\1\1{2,}")
>>> reg.sub(r'\1\1', a)
'foo baar'
But based on the comment, you want to replace xyyyx with xyyx, whereas your regexp requires at least 4 repetitions, so only xyyyyx gets replaced. Simply change this line:
>>> reg = re.compile( r"(.)\1{2,}")
>>> reg.sub(r'\1\1', 'fooo baaaar actuallly')
'foo baar actually'
I'd suggest not to use regular expressions when they're not necessary. This task can be accomplished easily without, in a more readable fashion, with linear time and constant space complexity (not sure about the regex).
def filter_repetitions(text, max_repetitions=0):
    last_character = None
    repetition_count = 0
    for character in text:
        if character == last_character:
            repetition_count += 1
        else:
            last_character = character
            repetition_count = 0
        if repetition_count <= max_repetitions:
            yield character

print(''.join(filter_repetitions("fooo baaaar actuallly", 1)))
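The same regex-free idea can also be written with the standard library's itertools. This is an alternative sketch and not the answer's own method; the name squeeze_runs and its max_run parameter are hypothetical.

```python
from itertools import groupby, islice


def squeeze_runs(text, max_run=2):
    # groupby yields one group per run of equal characters;
    # islice keeps at most max_run characters from each run.
    return ''.join(''.join(islice(run, max_run)) for _, run in groupby(text))


print(squeeze_runs("fooo baaaar actuallly"))  # foo baar actually
```

Note the different parameter convention: max_run counts the total characters kept per run, whereas max_repetitions in the generator above counts the extra repeats after the first character.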
