How to check if string is a pangram? - python

I want to create a function that takes a string as input and checks whether it is a pangram (a piece of text that contains every letter of the alphabet).
I wrote the following code, which works, but I am looking for an alternative way to do it, hopefully a shorter one.
import string
def is_pangram (gram):
    gram = gram.lower()
    gram_list_old = sorted([c for c in gram if c != ' '])
    gram_list = []
    for c in gram_list_old:
        if c not in gram_list:
            gram_list.append(c)
    if gram_list == list(string.ascii_lowercase): return True
    else: return False
I feel like this question might be against the rules of this website but hopefully it isn't. I am just curious and would like to see alternative ways to do this.

is_pangram = lambda s: not set('abcdefghijklmnopqrstuvwxyz') - set(s.lower())
>>> is_pangram('abc')
False
>>> is_pangram('the quick brown fox jumps over the lazy dog')
True
>>> is_pangram('Does the quick brown fox jump over the lazy dog?')
True
>>> is_pangram('Do big jackdaws love my sphinx of quartz?')
True
Test string s is a pangram if we start with the alphabet, remove every letter found in the test string, and all the alphabet letters get removed.
Explanation
The use of 'lambda' is a way of creating a function, so it's a one line equivalent to writing a def like:
def is_pangram(s):
    return not set('abcdefghijklmnopqrstuvwxyz') - set(s.lower())
set() creates a data structure which can't have any duplicates in it, and here:
The first set is the (English) alphabet letters, in lowercase
The second set is the characters from the test string, also in lowercase. And all the duplicates are gone as well.
Subtracting things like set(..) - set(..) returns the contents of the first set, minus the contents of the second set. set('abcde') - set('ace') == set('bd').
In this pangram test:
we take the characters in the test string away from the alphabet
If there's nothing left, then the test string contained all the letters of the alphabet and must be a pangram.
If there's something leftover, then the test string did not contain all the alphabet letters, so it must not be a pangram.
Any spaces or punctuation characters in the test string set were never in the alphabet set, so they don't matter.
set(..) - set(..) will return an empty set, or a set with content. If we force sets into the simplest True/False values in Python, then containers with content are 'True' and empty containers are 'False'.
So we're using not to check "is there anything leftover?" by forcing the result into a True/False value, depending on whether there's any leftovers or not.
not also changes True -> False, and False -> True. Which is useful here, because (alphabet used up) -> an empty set which is False, but we want is_pangram to return True in that case. And vice-versa, (alphabet has some leftovers) -> a set of letters which is True, but we want is_pangram to return False for that.
Then return that True/False result.
is_pangram = lambda s: not set('abcdefghijklmnopqrstuvwxyz') - set(s.lower())
# Test string `s`
#is a pangram if
# the alphabet letters
# minus
# the test string letters
# has NO leftovers
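To see which letters are left over, rather than only True/False, the same set subtraction can be kept in a variable. This is a minimal sketch; the helper name missing_letters is introduced here for illustration and is not part of the answer above.

```python
import string

def missing_letters(s):
    # Alphabet minus the letters found in s; an empty result means s is a pangram.
    leftover = set(string.ascii_lowercase) - set(s.lower())
    return ''.join(sorted(leftover))

def is_pangram(s):
    # An empty string is falsy, so `not` turns "nothing missing" into True.
    return not missing_letters(s)

print(missing_letters('the quick brown fox jumps over the dog'))  # alyz
print(is_pangram('Do big jackdaws love my sphinx of quartz?'))    # True
```

Returning the leftovers as a sorted string makes a failing input easier to debug than a bare False.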

You can use something as simple as:
import string
is_pangram = lambda s: all(c in s.lower() for c in string.ascii_lowercase)

Sets are excellent for membership testing:
>>> import string
>>> candidate = 'ammdjri * itouwpo ql ? k # finvmcxzkasjdhgfytuiopqowit'
>>> ascii_lower = set(string.ascii_lowercase)
Strip the whitespace and punctuation from the candidate then test:
>>> candidate_lower = ascii_lower.intersection(candidate.lower())
>>> ascii_lower == candidate_lower
False
Find out what is missing:
>>> ascii_lower.symmetric_difference(candidate_lower)
set(['b', 'e'])
Try it again but add the missing letters:
>>> candidate = candidate + 'be'
>>> candidate_lower = ascii_lower.intersection(candidate.lower())
>>> ascii_lower == candidate_lower
True
>>>

def pangram(word):
    # range(26) covers 'a' (97) through 'z' (122); range(25) would skip 'z'
    word = word.lower()
    return all(chr(c+97) in word for c in range(26))

How about simply checking whether every lowercase letter of the alphabet is in the sentence:
text = input()
s = set(text.lower())
if sum(1 for c in s if 96 < ord(c) < 123) == 26:
    print ('pangram')
else:
    print ('not pangram')
or in a function:
def ispangram(text):
    return sum(1 for c in set(text.lower()) if 96 < ord(c) < 123) == 26

Here is another definition:
def is_pangram(s):
    # Assumes s holds only letters and spaces; punctuation or digits would be counted too
    return len(set(s.lower().replace(" ", ""))) == 26

I came up with what I think is the easiest program, without using any modules.
def checking(str_word):
    b=[]
    for i in str_word:
        if i not in b:
            b.append(i)
    b.sort()
    #print(b)
    #print(len(set(b)))
    if(len(set(b))>=26):
        print(b)
        print(len(b))
        print(" String is pangram .")
    else:
        print(" String isn't pangram .")
    #b.sort()
    #print(b)
str_word=input(" Enter the String :")
checking(str_word)

I see this thread is a little old, but I thought I'd throw in my solution anyway.
import string
def panagram(phrase):
    new_phrase=sorted(phrase.lower())
    phrase_letters = ""
    for index in new_phrase:
        for letter in string.ascii_lowercase:
            if index == letter and index not in phrase_letters:
                phrase_letters+=letter
    print(len(phrase_letters) == 26)
or for the last line:
    print(phrase_letters == string.ascii_lowercase)

def panagram(phrase):
    alphabet="abcdefghijklmnopqrstuvwxyz"
    phraseletter=""
    for char in phrase:
        if char in alphabet:
            phraseletter= phraseletter + char
    for char in alphabet:
        if char not in phrase:
            return False
    return True

import string
def ispangram(str, alphabet=string.ascii_lowercase):
    alphabet = set(alphabet)
    return alphabet <= set(str.lower())
or, more simply (this assumes the input holds only letters and spaces):
def ispangram(str):
    return len(set(str.lower().replace(" ", ""))) == 26

import string
def is_pangram(phrase, alpha=string.ascii_lowercase):
    num = len(alpha)
    count=0
    for i in alpha:
        if i in phrase:
            count += 1
    return count == num

def panagram(str1):
    # Assumes str1 holds only letters and spaces; other characters would be counted too
    str1=str1.replace(' ','').lower()
    s=set(str1)
    l=list(s)
    if len(l)==26:
        return True
    return False
str1='The quick brown fox jumps over the lazy dog'
q=panagram(str1)
print(q)
True

import string
def ispangram(str1,alphabet=string.ascii_lowercase):
    for myalphabet in alphabet:
        if myalphabet not in str1:
            print("it's not pangram")
            break
    else:
        print("it's pangram")
Execute the command:
ispangram("The quick brown fox jumps over the lazy dog")
Output: it's pangram
Hint: string.ascii_lowercase is the string
abcdefghijklmnopqrstuvwxyz

import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class Solution {
    public static void main(String[] args) {
        String s;
        char f;
        Scanner in = new Scanner(System.in);
        s = in.nextLine();
        char[] charArray = s.toLowerCase().toCharArray();
        final Set set = new HashSet();
        for (char a : charArray) {
            if ((int) a >= 97 && (int) a <= 122) {
                f = a;
                set.add(f);
            }
        }
        if (set.size() == 26){
            System.out.println("pangram");
        }
        else {
            System.out.println("not pangram");
        }
    }
}

import string
import re
def is_pangram(text):
    # Keep only the ASCII letters, then check that all 26 of them appear
    letters = re.sub('[^a-z]+', '', text.lower())
    return set(letters) == set(string.ascii_lowercase)
sample = input("enter the string\n")
if is_pangram(sample):
    print("sentence is a pangram")
else:
    print("sentence is not a pangram")

Related

check if string contains number and return the number

I'm using Python and I have a string variable foo = " I have 1 kilo of tomatoes "
What I want is to check whether my string contains an integer (in our case 1) and return that integer.
I know I can use the isdigit function like this:
def hashnumbers(inputString):
    return any(char.isdigit() for char in inputString)
But it returns True or False and does not store the number.
I would appreciate your help. Thank you in advance.
List Comprehension : Return the digits
To return the digits, use a list comprehension with if
def hashnumbers(inputString):
    return [char for char in inputString if char.isdigit()]
print(hashnumbers("super string")) # []
print(hashnumbers("super 2 string")) # ['2']
print(hashnumbers("super 2 3 string")) # ['2', '3']
Return a default value if no digits found (empty list is evaluated as False)
return [char for char in inputString if char.isdigit()] or None
Regex version with re.findall
return re.findall(r"\d", inputString)
return re.findall(r"\d", inputString) or None
Return first one only
def hashnumbers(inputString):
    return next((char for char in inputString if char.isdigit()), None)
print(hashnumbers("super string")) # None
print(hashnumbers("super 2 string")) # 2
print(hashnumbers("super 2 3string")) # 2
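The versions above return single digit characters, so "12" would come back as '1' and '2'. If whole integers are wanted, matching runs of digits is one option. A minimal sketch; the function name find_numbers is made up for this example.

```python
import re

def find_numbers(input_string):
    # \d+ matches runs of digits, so "12" stays one number instead of '1', '2'.
    return [int(match) for match in re.findall(r"\d+", input_string)]

print(find_numbers(" I have 1 kilo of tomatoes "))  # [1]
print(find_numbers("12 eggs and 3 apples"))         # [12, 3]
print(find_numbers("no digits here"))               # []
```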
Your question is a little unclear, but I've assumed the following:
You want your function to return true or false. Not a number
You want the function to print any numbers in the string
If that's right, this code should work:
def hashnumbers(inputString):
    num = False
    for i in inputString:
        if i.isdigit():
            num = True
            print(i)
    return num
However, if I've misunderstood what functionality you're looking for, let me know and I'll amend this.
I hope this helps.

Problem with changing lowercase letters to uppercase and vice versa using str.replace

OK, so this is my code. I don't want to use the built-in swapcase() method. It does not work for the given string.
def myFunc(a):
    for chars in range(0,len(a)):
        if a[chars].islower():
            a = a.replace(a[chars], a[chars].upper())
        elif a[chars].isupper():
            a = a.replace(a[chars], a[chars].lower())
    return a
print(myFunc("AaAAaaaAAaAa"))
replace changes every occurrence of the letter, and you assign the result back to a, so you end up with all upper case.
def myFunc(a):
    # use a list to collect changed letters
    new_text = []
    for char in a:
        if char.islower():
            new_text.append(char.upper())
        else:
            new_text.append(char.lower())
    # join the letters back into a string
    return ''.join(new_text)
print(myFunc("AaAAaaaAAaAa")) # aAaaAAAaaAaA
or shorter:
def my2ndFunc(text):
    return ''.join( a.upper() if a.islower() else a.lower() for a in text)
using a generator expression and a ternary expression to modify the letter (see Does Python have a ternary conditional operator?)
The problem was that you were replacing all occurrences of that character in the string. Here you have a working solution:
def myFunc(a):
    result = ''
    for chars in range(0,len(a)):
        print(a[chars])
        if a[chars].islower():
            result += a[chars].upper()
        elif a[chars].isupper():
            result += a[chars].lower()
    return result
print(myFunc("AaAAaaaAAaAa"))
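The failure of the original loop can be reproduced on a three-letter string. This is only an illustrative sketch; the variable names step1 and step2 are made up here.

```python
text = "AaA"

# str.replace swaps every occurrence, not only the character at one index.
step1 = text.replace(text[0], text[0].lower())   # every 'A' becomes 'a'
print(step1)  # aaa

# On the next loop iteration the character at index 1 is lowercase,
# so every 'a' (including the ones just converted) is turned into 'A'.
step2 = step1.replace(step1[1], step1[1].upper())
print(step2)  # AAA
```

Because each pass flips the whole string, the result depends only on the last index visited, which is why building a new string character by character is the safer approach.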

Checking for duplicate letters within lists by using a histogram function

I'm trying to knock out my homework, but having difficulties incorporating the required histogram function.
This is the code I have to work with:
alphabet = "abcdefghijklmnopqrstuvwxyz"
test_dups = ["zzz","dog","bookkeeper","subdermatoglyphic","subdermatoglyphics"]
test_miss = ["zzz","subdermatoglyphic","the quick brown fox jumps over the lazy dog"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
I need to write a function called has_duplicates() that takes a string parameter and returns True if the string has any repeated characters. Otherwise, it should return False.
Implement has_duplicates() by creating a histogram using the histogram() function above. Do not use any of the implementations of has_duplicates() that are given in your textbook. Instead, your implementation should use the counts in the histogram to decide if there are any duplicates.
Write a loop over the strings in the provided test_dups list. Print each string in the list and whether or not it has any duplicates based on the return value of has_duplicates() for that string. For example, the output for aaa and abc would be the following.
aaa has duplicates
abc has no duplicates
Print a line like one of the above for each of the strings in test_dups.
Write a function called missing_letters that takes a string parameter and returns a new string with all the letters of the alphabet that are not in the argument string. The letters in the returned string should be in alphabetical order.
My implementation should use a histogram from the histogram() function. It should also use the global variable alphabet. It should use this global variable directly, not through an argument or a local copy. It should loop over the letters in alphabet to determine which are missing from the input parameter.
The function missing_letters should combine the list of missing letters into a string and return that string.
Write a loop over the strings in list test_miss and call missing_letters with each string. Print a line for each string listing the missing letters. For example, for the string "aaa", the output should be the following.
aaa is missing letters bcdefghijklmnopqrstuvwxyz
If the string has all the letters in alphabet, the output should say it uses all the letters. For example, the output for the string alphabet itself would be the following.
"abcdefghijklmnopqrstuvwxyz uses all the letters"
Print a line like one of the above for each of the strings in test_miss.
This is as far as I got...
def has_duplicates(t):
    if histogram(t) > 1:
        return True
    else:
        return False
Result:
'>' not supported between instances of 'str' and 'int'
The following should provide the desired result:
alphabet = "abcdefghijklmnopqrstuvwxyz"
test_dups = ["zzz","dog","bookkeeper","subdermatoglyphic","subdermatoglyphics"]
test_miss = ["zzz","subdermatoglyphic","the quick brown fox jumps over the lazy dog"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def has_duplicates(s):
    # The histogram has one key per distinct character, so it is shorter
    # than s exactly when some character is repeated.
    return len(histogram(s)) != len(s)
def missing_letters(s):
    h = histogram(s)
    rv = ''
    # Loop over letters in alphabet, if the letter is not in the histogram then
    # append to the return string.
    for c in alphabet:
        if c not in h:
            rv = rv + c
    return rv
# Loop over test strings as required.
for s in test_miss:
    miss = missing_letters(s)
    if miss:
        print(f"{s} is missing letters {miss}.")
    else:
        print(f"{s} uses all the letters.")
Output:
zzz is missing letters abcdefghijklmnopqrstuvwxy.
subdermatoglyphic is missing letters fjknqvwxz.
the quick brown fox jumps over the lazy dog uses all the letters.
alphabet = "abcdefghijklmnopqrstuvwxyz"
test_dups = ["zzz", "dog", "bookkeeper", "subdermatoglyphic", "subdermatoglyphics"]
test_miss = ["zzz", "subdermatoglyphic", "the quick brown fox jumps over the lazy dog"]
def histogram(string):
    d = dict()
    for char in string:
        if char not in d:
            d[char] = 1
        else:
            d[char] += 1
    return d
# Part 1
def has_duplicate(string):
    h = histogram(string)
    for k, v in h.items():
        if v > 1:
            return True
    return False
for string in test_dups:
    if has_duplicate(string):
        print(string, "has duplicates")
    else:
        print(string, "has no duplicates")
# Part 2
def missing_letters(string):
    h = histogram(string)
    new_list = []
    for char in alphabet:
        if char not in h:
            new_list.append(char)
    return "".join(new_list)
for string in test_miss:
    new_list = missing_letters(string)
    if len(new_list):
        print(string, "is missing letters", new_list)
    else:
        print(string, "uses all letters")
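The attempt in the question compares the whole dictionary with 1, while the assignment asks for a decision based on the individual counts. A compact sketch of that idea using any(); the histogram here uses dict.get only to keep the example short and is otherwise equivalent to the one given in the assignment.

```python
def histogram(s):
    d = dict()
    for c in s:
        d[c] = d.get(c, 0) + 1
    return d

def has_duplicates(s):
    # A character is repeated exactly when its count exceeds 1.
    return any(count > 1 for count in histogram(s).values())

for word in ["zzz", "dog", "bookkeeper"]:
    suffix = "has duplicates" if has_duplicates(word) else "has no duplicates"
    print(word, suffix)
```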

Regex replace in Spyder with case conversion [duplicate]

Example:
>>> convert('CamelCase')
'camel_case'
Camel case to snake case
import re
name = 'CamelCaseName'
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name) # camel_case_name
If you do this many times and the above is slow, compile the regex beforehand:
pattern = re.compile(r'(?<!^)(?=[A-Z])')
name = pattern.sub('_', name).lower()
To handle more advanced cases specially (this is not reversible anymore):
def camel_to_snake(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
print(camel_to_snake('camel2_camel2_case')) # camel2_camel2_case
print(camel_to_snake('getHTTPResponseCode')) # get_http_response_code
print(camel_to_snake('HTTPResponseCodeXYZ')) # http_response_code_xyz
To add also cases with two underscores or more:
def to_snake_case(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    name = re.sub('__([A-Z])', r'_\1', name)
    name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()
Snake case to pascal case
name = 'snake_case_name'
name = ''.join(word.title() for word in name.split('_'))
print(name) # SnakeCaseName
There's an inflection library in the package index that can handle these things for you. In this case, you'd be looking for inflection.underscore():
>>> inflection.underscore('CamelCase')
'camel_case'
I don't know why these are all so complicated.
For most cases, the simple expression ([A-Z]+) will do the trick:
>>> re.sub('([A-Z]+)', r'_\1','CamelCase').lower()
'_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camelCase').lower()
'camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camel2Case2').lower()
'camel2_case2'
>>> re.sub('([A-Z]+)', r'_\1','camelCamelCase').lower()
'camel_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'
To ignore the first character, simply add the negative lookahead (?!^):
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCase').lower()
'camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCamelCase').lower()
'camel_camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','Camel2Camel2Case').lower()
'camel2_camel2_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'
If you want to separate ALLCaps to all_caps and expect numbers in your string, you still don't need two separate runs; just use |. This expression, ((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z])), can handle just about every scenario in the book:
>>> a = re.compile('((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z]))')
>>> a.sub(r'_\1', 'getHTTPResponseCode').lower()
'get_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponseCode').lower()
'get2_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponse123Code').lower()
'get2_http_response123_code'
>>> a.sub(r'_\1', 'HTTPResponseCode').lower()
'http_response_code'
>>> a.sub(r'_\1', 'HTTPResponseCodeXYZ').lower()
'http_response_code_xyz'
It all depends on what you want so use the solution that best suits your needs as it should not be overly complicated.
nJoy!
Avoiding libraries and regular expressions:
def camel_to_snake(s):
    return ''.join(['_'+c.lower() if c.isupper() else c for c in s]).lstrip('_')
>>> camel_to_snake('ThisIsMyString')
'this_is_my_string'
stringcase is my go-to library for this; e.g.:
>>> from stringcase import pascalcase, snakecase
>>> snakecase('FooBarBaz')
'foo_bar_baz'
>>> pascalcase('foo_bar_baz')
'FooBarBaz'
I think this solution is more straightforward than previous answers:
import re
def convert (camel_input):
    words = re.findall(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+', camel_input)
    return '_'.join(map(str.lower, words))
# Let's test it
test_strings = [
    'CamelCase',
    'camelCamelCase',
    'Camel2Camel2Case',
    'getHTTPResponseCode',
    'get200HTTPResponseCode',
    'getHTTP200ResponseCode',
    'HTTPResponseCode',
    'ResponseHTTP',
    'ResponseHTTP2',
    'Fun?!awesome',
    'Fun?!Awesome',
    '10CoolDudes',
    '20coolDudes'
]
for test_string in test_strings:
    print(convert(test_string))
Which outputs:
camel_case
camel_camel_case
camel_2_camel_2_case
get_http_response_code
get_200_http_response_code
get_http_200_response_code
http_response_code
response_http
response_http_2
fun_awesome
fun_awesome
10_cool_dudes
20_cool_dudes
The regular expression matches three patterns:
[A-Z]?[a-z]+: Consecutive lower-case letters that optionally start with an upper-case letter.
[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$): Two or more consecutive upper-case letters. It uses a lookahead to exclude the last upper-case letter if it is followed by a lower-case letter.
\d+: Consecutive numbers.
By using re.findall we get a list of individual "words" that can be converted to lower-case and joined with underscores.
Personally I am not sure how anything using regular expressions in python can be described as elegant. Most answers here are just doing "code golf" type RE tricks. Elegant coding is supposed to be easily understood.
def to_snake_case(not_snake_case):
    final = ''
    for i in range(len(not_snake_case)):
        item = not_snake_case[i]
        if i < len(not_snake_case) - 1:
            next_char_will_be_underscored = (
                not_snake_case[i+1] == "_" or
                not_snake_case[i+1] == " " or
                not_snake_case[i+1].isupper()
            )
        if (item == " " or item == "_") and next_char_will_be_underscored:
            continue
        elif (item == " " or item == "_"):
            final += "_"
        elif item.isupper():
            final += "_"+item.lower()
        else:
            final += item
    if final[0] == "_":
        final = final[1:]
    return final
>>> to_snake_case("RegularExpressionsAreFunky")
'regular_expressions_are_funky'
>>> to_snake_case("RegularExpressionsAre Funky")
'regular_expressions_are_funky'
>>> to_snake_case("RegularExpressionsAre_Funky")
'regular_expressions_are_funky'
''.join('_'+c.lower() if c.isupper() else c for c in "DeathToCamelCase").strip('_')
re.sub("(.)([A-Z])", r'\1_\2', 'DeathToCamelCase').lower()
Here's my solution:
def un_camel(text):
    """ Converts a CamelCase name into an under_score name.
        >>> un_camel('CamelCase')
        'camel_case'
        >>> un_camel('getHTTPResponseCode')
        'get_http_response_code'
    """
    result = []
    pos = 0
    while pos < len(text):
        if text[pos].isupper():
            if pos-1 > 0 and text[pos-1].islower() or pos-1 > 0 and \
            pos+1 < len(text) and text[pos+1].islower():
                result.append("_%s" % text[pos].lower())
            else:
                result.append(text[pos].lower())
        else:
            result.append(text[pos])
        pos += 1
    return "".join(result)
It supports those corner cases discussed in the comments. For instance, it'll convert getHTTPResponseCode to get_http_response_code like it should.
I don't see why both .sub() calls are needed. :) I'm no regex guru, but I simplified the function to this one, which suits my needs; I just needed a solution to convert camelCasedVars from a POST request to vars_with_underscore:
def myFunc(...):
    return re.sub('(.)([A-Z]{1})', r'\1_\2', "iTriedToWriteNicely").lower()
It does not work with names like getHTTPResponse, because I've heard that is a bad naming convention (it should be getHttpResponse, which is obviously much easier to memorize).
For the fun of it:
>>> def un_camel(input):
...     output = [input[0].lower()]
...     for c in input[1:]:
...         if c in ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
...             output.append('_')
...             output.append(c.lower())
...         else:
...             output.append(c)
...     return str.join('', output)
...
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'
Or, more for the fun of it:
>>> un_camel = lambda i: i[0].lower() + str.join('', ("_" + c.lower() if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" else c for c in i[1:]))
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'
Using regexes may be the shortest, but this solution is way more readable:
def to_snake_case(s):
    snake = "".join(["_"+c.lower() if c.isupper() else c for c in s])
    return snake[1:] if snake.startswith("_") else snake
This is not an elegant method; it is a very 'low level' implementation of a simple state machine (a bitfield state machine), and possibly the least Pythonic way to solve this. However, the re module also implements a state machine that is too complex for this simple task, so I think this is a good solution.
def splitSymbol(s):
    si, ci, state = 0, 0, 0 # start_index, current_index
    '''
        state bits:
        0: no yields
        1: lower yields
        2: lower yields - 1
        4: upper yields
        8: digit yields
        16: other yields
        32 : upper sequence mark
    '''
    for c in s:
        if c.islower():
            if state & 1:
                yield s[si:ci]
                si = ci
            elif state & 2:
                yield s[si:ci - 1]
                si = ci - 1
            state = 4 | 8 | 16
            ci += 1
        elif c.isupper():
            if state & 4:
                yield s[si:ci]
                si = ci
            if state & 32:
                state = 2 | 8 | 16 | 32
            else:
                state = 8 | 16 | 32
            ci += 1
        elif c.isdigit():
            if state & 8:
                yield s[si:ci]
                si = ci
            state = 1 | 4 | 16
            ci += 1
        else:
            if state & 16:
                yield s[si:ci]
            state = 0
            ci += 1 # eat ci
            si = ci
        print(' : ', c, bin(state))
    if state:
        yield s[si:ci]
def camelcaseToUnderscore(s):
    return '_'.join(splitSymbol(s))
splitSymbol can parse all case types: UpperSEQUENCEInterleaved, under_score, BIG_SYMBOLS and cammelCasedMethods
I hope it is useful
Take a look at the excellent Schematics lib
https://github.com/schematics/schematics
It allows you to create typed data structures that can serialize/deserialize from Python to JavaScript flavour, e.g.:
class MapPrice(Model):
    price_before_vat = DecimalType(serialized_name='priceBeforeVat')
    vat_rate = DecimalType(serialized_name='vatRate')
    vat = DecimalType()
    total_price = DecimalType(serialized_name='totalPrice')
So many complicated methods...
Just find all "Titled" group and join its lower cased variant with underscore.
>>> import re
>>> def camel_to_snake(string):
...     groups = re.findall('([A-z0-9][a-z]*)', string)
...     return '_'.join([i.lower() for i in groups])
...
>>> camel_to_snake('ABCPingPongByTheWay2KWhereIsOurBorderlands3???')
'a_b_c_ping_pong_by_the_way_2_k_where_is_our_borderlands_3'
If you don't want digits to start a group or form a separate group, you can use the ([A-z][a-z0-9]*) mask instead.
A horrendous example using regular expressions (you could easily clean this up :) ):
def f(s):
    return s.group(1).lower() + "_" + s.group(2).lower()
p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(f, "CamelCase"))
print(p.sub(f, "getHTTPResponseCode"))
Works for getHTTPResponseCode though!
Alternatively, using lambda:
p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "CamelCase"))
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "getHTTPResponseCode"))
EDIT: It should also be pretty easy to see that there's room for improvement for cases like "Test", because the underscore is unconditionally inserted.
Lightly adapted from https://stackoverflow.com/users/267781/matth,
who uses generators.
def uncamelize(s):
    buff, l = '', []
    for ltr in s:
        if ltr.isupper():
            if buff:
                l.append(buff)
                buff = ''
        buff += ltr
    l.append(buff)
    return '_'.join(l).lower()
This simple method should do the job:
import re
def convert(name):
    return re.sub(r'([A-Z]*)([A-Z][a-z]+)', lambda x: (x.group(1) + '_' if x.group(1) else '') + x.group(2) + '_', name).rstrip('_').lower()
We look for capital letters that are preceded by any number of (or zero) capital letters, and followed by any number of lowercase characters.
An underscore is placed just before the occurrence of the last capital letter found in the group, and one can be placed before that capital letter in case it is preceded by other capital letters.
If there are trailing underscores, remove those.
Finally, the whole result string is changed to lower case.
(taken from here, see working example online)
Here's something I did to change the headers on a tab-delimited file. I'm omitting the part where I only edited the first line of the file. You could adapt it to Python pretty easily with the re library. This also includes separating out numbers (but keeps the digits together). I did it in two steps because that was easier than telling it not to put an underscore at the start of a line or tab.
Step One...find uppercase letters or integers preceded by lowercase letters, and precede them with an underscore:
Search:
([a-z]+)([A-Z]|[0-9]+)
Replacement:
\1_\l\2/
Step Two...take the above and run it again to convert all caps to lowercase:
Search:
([A-Z])
Replacement (that's backslash, lowercase L, backslash, one):
\l\1
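A rough Python adaptation of the two steps above might look like the following. It is a sketch under assumptions: the function name headers_to_snake is made up, and because Python's re module has no \l escape, the lowercasing is done with str.lower() in the second step instead.

```python
import re

def headers_to_snake(text):
    # Step one: put an underscore between a lowercase run and the
    # uppercase letter or digit run that follows it.
    step_one = re.sub(r'([a-z]+)([A-Z]|[0-9]+)', r'\1_\2', text)
    # Step two: lowercase everything (Python's re has no \l escape).
    return step_one.lower()

print(headers_to_snake('firstName2'))        # first_name_2
print(headers_to_snake('tabDelimitedFile'))  # tab_delimited_file
```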
I was looking for a solution to the same problem, except that I needed a chain; e.g.
"CamelCamelCamelCase" -> "Camel-camel-camel-case"
Starting from the nice two-word solutions here, I came up with the following:
"-".join(x.group(1).lower() if x.group(2) is None else x.group(1) \
for x in re.finditer("((^.[^A-Z]+)|([A-Z][^A-Z]+))", "stringToSplit"))
Most of the complicated logic is to avoid lowercasing the first word. Here's a simpler version if you don't mind altering the first word:
"-".join(x.group(1).lower() for x in re.finditer("(^[^A-Z]+|[A-Z][^A-Z]+)", "stringToSplit"))
Of course, you can pre-compile the regular expressions or join with underscore instead of hyphen, as discussed in the other solutions.
Concise without regular expressions, but HTTPResponseCode=> httpresponse_code:
def from_camel(name):
    """
    ThisIsCamelCase ==> this_is_camel_case
    """
    name = name.replace("_", "")
    _cas = lambda _x : [_i.isupper() for _i in _x]
    seq = zip(_cas(name[1:-1]), _cas(name[2:]))
    ss = [_x + 1 for _x, (_i, _j) in enumerate(seq) if (_i, _j) == (False, True)]
    return "".join([ch + "_" if _x in ss else ch for _x, ch in enumerate(name.lower())])
Without any library :
def camelify(out):
    return (''.join(["_"+x.lower() if i<len(out)-1 and x.isupper() and out[i+1].islower()
         else x.lower()+"_" if i<len(out)-1 and x.islower() and out[i+1].isupper()
                 else x.lower() for i,x in enumerate(list(out))])).lstrip('_').replace('__','_')
A bit heavy, but
CamelCamelCamelCase -> camel_camel_camel_case
HTTPRequest -> http_request
GetHTTPRequest -> get_http_request
getHTTPRequest -> get_http_request
Very nice RegEx proposed on this site:
(?<!^)(?=[A-Z])
Python's re.split can split on this pattern as well (splitting on zero-width matches requires Python 3.7 or later).
In Java:
String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
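For comparison, a Python counterpart of the Java snippet, assuming Python 3.7 or later (earlier versions do not split on empty matches):

```python
import re

# Split before every capital letter that is not at the start of the string.
words = re.split(r'(?<!^)(?=[A-Z])', 'loremIpsum')
print(words)                    # ['lorem', 'Ipsum']
print('_'.join(words).lower())  # lorem_ipsum
```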
Just in case someone needs to transform a complete source file, here is a script that will do it.
# Copy and paste your camel case code in the string below
camelCaseCode ="""
cv2.Matx33d ComputeZoomMatrix(const cv2.Point2d & zoomCenter, double zoomRatio)
{
auto mat = cv2.Matx33d::eye();
mat(0, 0) = zoomRatio;
mat(1, 1) = zoomRatio;
mat(0, 2) = zoomCenter.x * (1. - zoomRatio);
mat(1, 2) = zoomCenter.y * (1. - zoomRatio);
return mat;
}
"""
import re
def snake_case(name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
def lines(str):
    return str.split("\n")
def unlines(lst):
    return "\n".join(lst)
def words(str):
    return str.split(" ")
def unwords(lst):
    return " ".join(lst)
def map_partial(function):
    return lambda values : [ function(v) for v in values]
import functools
def compose(*functions):
    return functools.reduce(lambda f, g: lambda x: f(g(x)), functions, lambda x: x)
snake_case_code = compose(
    unlines ,
    map_partial(unwords),
    map_partial(map_partial(snake_case)),
    map_partial(words),
    lines
)
print(snake_case_code(camelCaseCode))
Wow I just stole this from django snippets. ref http://djangosnippets.org/snippets/585/
Pretty elegant
camelcase_to_underscore = lambda str: re.sub(r'(?<=[a-z])[A-Z]|[A-Z](?=[^A-Z])', r'_\g<0>', str).lower().strip('_')
Example:
camelcase_to_underscore('ThisUser')
Returns:
'this_user'
REGEX DEMO
from functools import reduce  # reduce is not a builtin in Python 3
def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() else '') + y,
        name
    ).lower()
And if we need to cover a case with already-un-cameled input:
def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() and not x.endswith('_') else '') + y,
        name
    ).lower()
Not in the standard library, but I found this module that appears to contain the functionality you need.
If you use Google's (nearly) deterministic camel case algorithm, you do not need to handle things like HTMLDocument, since it should be HtmlDocument, and this regex-based approach is simple. It prefixes every capital letter or digit with an underscore. Note that it does not handle multi-digit numbers.
import re
def to_snake_case(camel_str):
    return re.sub('([A-Z0-9])', r'_\1', camel_str).lower().lstrip('_')
def convert(camel_str):
    temp_list = []
    for letter in camel_str:
        if letter.islower():
            temp_list.append(letter)
        else:
            temp_list.append('_')
            temp_list.append(letter)
    result = "".join(temp_list)
    return result.lower()
Use str.capitalize() to convert the first letter of a string to a capital letter; it returns the whole string, with the remaining letters lowercased.
Example:
Command: "hello".capitalize()
Output: Hello

Problems title-casing a string in Python

I have a name as a string, in this example "markus johansson".
I'm trying to code a program that makes 'm' and 'j' uppercase:
name = "markus johansson"
for i in range(1, len(name)):
    if name[0] == 'm':
        name[0] = "M"
    if name[i] == " ":
        count = name[i] + 1
    if count == 'j':
        name[count] = 'J'
I'm pretty sure this should work, but it gives me this error:
File "main.py", line 5 in <module>
    name[0] = "M"
TypeError: 'str' object does not support item assignment
I know there is a library function called .title(), but I want to do "real programming".
How do I fix this?
I guess that what you're trying to achieve is:
from string import capwords
capwords(name)
Which yields:
'Markus Johansson'
EDIT: OK, I see you want to tear down an open door.
Here's a low-level implementation.
''.join([char.upper() if prev==' ' else char for char,prev in zip(name,' '+name)])
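The one-liner above is dense, so here is the same idea unpacked as a sketch. The intermediate variable pairs is introduced only for illustration.

```python
name = "markus johansson"

# Pair every character with the one before it; the prepended space stands in
# for the "previous character" of the very first letter.
pairs = list(zip(name, ' ' + name))
print(pairs[:3])   # [('m', ' '), ('a', 'm'), ('r', 'a')]

# Upper-case a character only when the previous one was a space.
result = ''.join(char.upper() if prev == ' ' else char for char, prev in pairs)
print(result)      # Markus Johansson
```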
>>> "markus johansson".title()
'Markus Johansson'
Built in string methods are the way to go.
EDIT:
I see you want to re-invent the wheel. Any particular reason?
You can choose from any number of convoluted methods like:
' '.join(j[0].upper()+j[1:] for j in "markus johansson".split())
Standard Libraries are still the way to go.
string.capwords() (defined in string.py)
# Capitalize the words in a string, e.g. " aBc dEf " -> "Abc Def".
def capwords(s, sep=None):
    """capwords(s, [sep]) -> string
    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join. Note that this replaces runs of whitespace characters by
    a single space.
    """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))
str.title() (defined in stringobject.c)
PyDoc_STRVAR(title__doc__,
"S.title() -> string\n\
\n\
Return a titlecased version of S, i.e. words start with uppercase\n\
characters, all remaining cased characters have lowercase.");
static PyObject*
string_title(PyStringObject *self)
{
    char *s = PyString_AS_STRING(self), *s_new;
    Py_ssize_t i, n = PyString_GET_SIZE(self);
    int previous_is_cased = 0;
    PyObject *newobj = PyString_FromStringAndSize(NULL, n);
    if (newobj == NULL)
        return NULL;
    s_new = PyString_AsString(newobj);
    for (i = 0; i < n; i++) {
        int c = Py_CHARMASK(*s++);
        if (islower(c)) {
            if (!previous_is_cased)
                c = toupper(c);
            previous_is_cased = 1;
        } else if (isupper(c)) {
            if (previous_is_cased)
                c = tolower(c);
            previous_is_cased = 1;
        } else
            previous_is_cased = 0;
        *s_new++ = c;
    }
    return newobj;
}
str.title() in pure Python
class String(str):
    def title(self):
        s = []
        previous_is_cased = False
        for c in self:
            if c.islower():
                if not previous_is_cased:
                    c = c.upper()
                previous_is_cased = True
            elif c.isupper():
                if previous_is_cased:
                    c = c.lower()
                previous_is_cased = True
            else:
                previous_is_cased = False
            s.append(c)
        return ''.join(s)
Example:
>>> s = ' aBc dEf '
>>> import string
>>> string.capwords(s)
'Abc Def'
>>> s.title()
' Abc Def '
>>> s
' aBc dEf '
>>> String(s).title()
' Abc Def '
>>> String(s).title() == s.title()
True
Strings are immutable. They can't be changed. You must create a new string with the changed content.
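As a minimal sketch of what that means in practice: the assignment from the question fails, while building a new string from slices, or going through a list, works.

```python
name = "markus johansson"

try:
    name[0] = "M"          # in-place assignment is not allowed on str
except TypeError as exc:
    print("TypeError:", exc)

# Build a new string from slices instead.
capitalized = name[0].upper() + name[1:]
print(capitalized)         # Markus johansson

# Or go through a mutable list and join it back into a string.
chars = list(name)
chars[0] = chars[0].upper()
rejoined = ''.join(chars)
print(rejoined)            # Markus johansson
```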
If you want to make every 'j' uppercase:
def make_uppercase_j(char):
    if char == 'j':
        return 'J'
    else:
        return char
name = "markus johansson"
''.join(make_uppercase_j(c) for c in name)
If you're looking for a more generic solution for names, you should also consider the following examples:
John Adams-Smith
Joanne d'Arc
Jean-Luc de'Breu
Donatien Alphonse François de Sade
Also some parts of the names shouldn't start with capital letters, like:
Herbert von Locke
Sander van Dorn
Edwin van der Sad
So, if you're looking into creating a more generic solution, keep all those little things in mind.
(This would be a perfect place to apply test-driven development, with all the conditions your method/function must satisfy.)
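As a rough sketch of what such a test-first approach could look like, the names listed above become a table of expected results. The function capitalize_name and the PARTICLES set are hypothetical, written only to make the table pass; real naming rules are far messier.

```python
import re

# Assumption: these particles stay lowercase; the list is illustrative only.
PARTICLES = {"von", "van", "der", "de", "d"}

def capitalize_name(name):
    def fix(match):
        word = match.group(0)
        return word if word in PARTICLES else word[0].upper() + word[1:]
    # [^\W\d_]+ matches runs of letters, so hyphens and apostrophes split words.
    return re.sub(r"[^\W\d_]+", fix, name.lower())

EXPECTED = [
    "John Adams-Smith",
    "Joanne d'Arc",
    "Jean-Luc de'Breu",
    "Donatien Alphonse François de Sade",
    "Herbert von Locke",
    "Sander van Dorn",
    "Edwin van der Sad",
]

for expected in EXPECTED:
    result = capitalize_name(expected.lower())
    print(result, "OK" if result == expected else "FAIL")
```

One limitation of this sketch is that a particle at the very start of a name also stays lowercase, which may or may not be what you want.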
If I understand your original algorithm correctly, this is what you want to do:
namn = list("markus johansson")
if namn[0] == 'm':
    namn[0] = "M"
count = 0
for i in range(1, len(namn)):
    if namn[i] == " ":
        count = i + 1
    if count and namn[count] == 'j':
        namn[count] = 'J'
print(''.join(namn))
Of course, there are a million better ways ("wannabe" ways) to do what you're trying to do, as shown in vartec's answer. :)
As it stands, your code only works for names whose first and last names start with an M and a J, respectively.
Plenty of good suggestions, so I'll be in good company adding my own 2 cents :-)
I'm assuming you want something a little more generic that can handle more than just names starting with 'm' and 'j'. You'll probably also want to consider hyphenated names (like Markus Johnson-Smith) which have caps after the hyphen too.
from string import lowercase, uppercase  # Python 2 only; removed in Python 3
name = 'markus johnson-smith'
state = 0
title_name = []
for c in name:
    if c in lowercase and not state:
        c = uppercase[lowercase.index(c)]
        state = 1
    elif c in [' ', '-']:
        state = 0
    else:
        state = 1 # might already be uppercase
    title_name.append(c)
print(''.join(title_name))
One last caveat is the potential for non-ASCII characters. Using the uppercase and lowercase properties of the string module is good in this case because their contents change depending on the user's locale (i.e. system-dependent, or when locale.setlocale() is called). I know you want to avoid using upper() for this exercise, and that's quite neat... as an FYI, upper() uses the locale controlled by setlocale() too, so using uppercase and lowercase is a good use of the API without getting too high-level. That said, if you need to handle, say, French names on a system running an English locale, you'll need a more robust implementation.
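string.lowercase and string.uppercase exist only in Python 2. As a rough Python 3 counterpart of the same state machine, the sketch below leans on str.upper(), which in Python 3 follows Unicode case mapping, so it gives up the "no upper()" constraint of the exercise. The function name title_name and the separators parameter are my own choices.

```python
def title_name(name, separators=(" ", "-")):
    out = []
    at_word_start = True            # plays the role of "not state" above
    for c in name:
        if c in separators:
            at_word_start = True
            out.append(c)
        else:
            # Upper-case only the first character after a separator.
            out.append(c.upper() if at_word_start else c)
            at_word_start = False
    return "".join(out)

print(title_name("markus johnson-smith"))  # Markus Johnson-Smith
print(title_name("émile zola"))            # Émile Zola
```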
"real programming"?
I would use .title(), and I'm a real programmer.
Or I would use regular expressions
re.sub(r"(^|\s)[a-z]", lambda m: m.group(0).upper(), "this is a set of words")
This says "If the start of the text or a whitespace character is followed by a lower-case letter" (in English - other languages are likely not supported), then for each match convert the match text to upper-case. Since the match text is the space and the lower-case letter, this works just fine.
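Wrapped in a function (title_words is just a name picked for this sketch), the regex can be tried on a few inputs, including ones that show the limits mentioned above:

```python
import re

def title_words(text):
    # Upper-case a lower-case ASCII letter at the start or after whitespace.
    return re.sub(r"(^|\s)[a-z]", lambda m: m.group(0).upper(), text)

print(title_words("this is a set of words"))  # This Is A Set Of Words
print(title_words("markus johnson-smith"))    # Markus Johnson-smith
print(title_words("émile zola"))              # émile Zola
```

The hyphenated part and the accented first letter are left alone, because neither matches "whitespace followed by a-z".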
If you want it as low-level code then the following works. Here I only allow space as the separator (but you may want to support newline and other characters). On the other hand, "string.lowercase" is internationalized, so if you're in another locale then it will, for the most part, still work. If you don't want that then use string.ascii_lowercase.
import string
def title(s):
    # Capitalize the first character (guarding against an empty string)
    if s and s[0] in string.lowercase:
        s = s[0].upper() + s[1:]
    # Find spaces
    offset = 0
    while 1:
        offset = s.find(" ", offset)
        # Reached the end of the string or the
        # last character is a space
        if offset == -1 or offset == len(s)-1:
            break
        if s[offset+1:offset+2] in string.lowercase:
            # Is it followed by a lower-case letter?
            s = s[:offset+1] + s[offset+1].upper() + s[offset+2:]
            # Skip the space and the letter
            offset += 2
        else:
            # Nope, so start searching for the next space
            offset += 1
    return s
To elaborate on my comment to this answer, this question can only be an exercise for curiosity's sake. Real names have special capitalization rules: the "van der" in "Johannes Diderik van der Waals" is never capitalized, "Farrah Fawcett-Majors" has the "M", and "Cathal Ó hEochaidh" uses the non-ASCII Ó and h, which modify "Eochaidh" to mean "grandson of Eochaidh".
string = 'markus johansson'
string = ' '.join(substring[0].upper() + substring[1:] for substring in string.split(' '))
# string == 'Markus Johansson'
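One thing to watch with split(' '): two spaces in a row produce an empty substring, and substring[0] then raises IndexError. A hedged variant (capitalize_words is a name I made up) uses the slice substring[:1], which is simply empty in that case:

```python
def capitalize_words(text):
    # w[:1] is '' for an empty word, so doubled spaces do not raise IndexError.
    return ' '.join(w[:1].upper() + w[1:] for w in text.split(' '))

print(capitalize_words('markus johansson'))   # Markus Johansson
print(capitalize_words('markus  johansson'))  # Markus  Johansson
```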
