How to check whether two words are anagrams python - python

I'm dealing with a simple problem:
checking if two strings are anagrams.
I wrote the simple code that can check whether two strings, such as
'abcd' and 'dcba', are anagrams, but I have no idea what to do with more complex ones, like "Astronomer" and "Moon starter."
line1 = input('Enter the first word: ')
line2 = input('Enter the second word: ')
def deleteSpaces(s):
s_new = s.replace(" ","")
return s_new
def anagramSolution2(s1,s2):
alist1 = list(deleteSpaces(s1))
alist2 = list(deleteSpaces(s2))
print(alist1)
print(alist2)
alist1.sort()
alist2.sort()
pos = 0
matches = True
while pos < len(deleteSpaces(s1)) and matches:
if alist1[pos]==alist2[pos]:
pos = pos + 1
else:
matches = False
return matches
Firstly I thought that the problem lies in working with spaces, but then I understood that my algorithm doesn't work if the strings are not the same size.
I have no idea what to do in that case.
Here I found a beautiful solution, but it doesn't work either:
def anagrams(s1,s2):
return [False, True][sum([ord(x) for x in s1]) == sum([ord(x) for x in s2])]
If I run this function and test it on two strings, I'll get such output:
Examples:
First Word: apple
Second Word: pleap
output: True
First Word: Moon starter
Second Word: Astronomer
output: False //however it should should be True because this words are anagrams

Your algorithm is ok. Your problem is that you don't consider upper and lower case letters. Changing the two lines
alist1 = list(deleteSpaces(s1))
alist2 = list(deleteSpaces(s2))
to
alist1 = list(deleteSpaces(s1).lower())
alist2 = list(deleteSpaces(s2).lower())
will solve your issue.
As an alternative you could simply use the following function:
def anagrams(s1, s2):
def sort(s):
return sorted(s.replace(" ", "").lower())
return sort(s1) == sort(s2)
If you want to have a faster solution with complexity of O(n), you should use a Counter instead of sorting the two words:
from collections import Counter
def anagrams(s1, s2):
def get_counter(s):
return Counter(s.replace(" ", "").lower())
return get_counter(s1) == get_counter(s2)

As other's have pointed out, your algorithm is given 'false' results as Moon starter and Astronomer are in fact not anagrams.
You can drastically improve your algorithm by simply using the available object methods. They already provide all the functionality.
def normalize_str(s):
return s.replace(" ","").lower()
def anagramSolution2(s1,s2):
return sorted(normalize_str(s1)) == sorted(normalize_str(s2))
normalize_str is like your deleteSpaces, but it also converts everything to lowercase. This way, Moon and moon will compare equal. You may want to do stricter or looser normalization in the end, it's just an example.
The call to sorted will already provide you with a list, you don't have to do an explicit conversion. Also, list comparison via == will compare the lists element wise (what you are doing with the for loop), but much more efficiently.

Something like this?
$ cat /tmp/tmp1.py
#!/usr/bin/env python
def anagram (first, second):
return sorted(first.lower()) == sorted(second.lower())
if __name__ == "__main__":
for first, second in [("abcd rsh", "abcd x rsh"), ("123 456 789", "918273645 ")]:
print("is anagram('{0}', '{1}')? {2}".format(first, second, anagram(first, second)))
which gives:
$ python3 /tmp/tmp1.py
is anagram('abcd rsh', 'abcd x rsh')? False
is anagram('123 456 789', '918273645 ')? True

a=input("string1:");
b=input("string2:");
def anagram(a,b):
arra=split(a)
arrb=split(b)
arra.sort()
arrb.sort()
if (len(arra)==len(arrb)):
if(arra==arrb):
print ("True")
else:
ana=0;
print ("False");
else:
print ("False");
def split(x):
x=x.replace(' ','').lower()
temp=[]
for i in x:
temp.append(i)
return temp;
anagram(a,b)

Lazy way of doing it but super clean:
def anagram(a,b):
return cleanString(a) == cleanString(b)
def cleanString(string):
return ''.join(sorted(string)).lower().split()

def anagram(str1,str2):
l1=list(str1)
l2=list(str2)
if sorted(l1) == sorted(l2):
print("yess")
else:
print("noo")
str1='listen'
str2='silent'
anagram(str1,str2)
although not the most optimized solution but works fine.

I found this the best way:
from collections import Counter
def is_anagram(string1, string2):
return Counter(string1) == Counter(string2)

I tried this and it worked
def is_anagram(word1,word2):
a = len(word1)
b = len(word2)
if a == b:
for l in word1:
if l in word2:
return True
return False

Related

Permutations to compare two different string to match

I'm cracking my head here to solve this problem.
I'm trying to compare two strings like:
a = 'apple'
b = 'ppale'
If I user permutation method, variable b could be re-generated and it can match with the variable a.
I wrote the code, god knows why, I'm only getting False value even when I'm expecting True.
import itertools
def gen(char, phrase):
char = list(char)
wordsList = list(map(''.join, itertools.permutations(char)))
for word in wordsList:
if word == phrase:
return True
else:
return False
print(gen('apple', 'appel')) # should be True
print(gen('apple', 'elppa')) # should be True
print(gen('apple', 'ap')) # should be False
The problem is you're returning on the first iteration of the loop, regardless of whether there's a match or not. In particular, you're returning False the first time there's a mismatch. Instead, you need to complete the entire loop before returning False:
def gen(char, phrase):
char = list(char)
wordsList = list(map(''.join, itertools.permutations(char)))
for word in wordsList:
if word == phrase:
return True
return False
Note that there are some improvements that can be made. One is there's no need to do char = list(char), since char is already an iterable. Also, there's no need to expand the map result into a list. It's just being used as an iterable, so it can be used directly, which can potentially save a lot of memory:
def gen(char, phrase):
wordsIter = map(''.join, itertools.permutations(char))
for word in wordsIter:
if word == phrase:
return True
return False
However, since you're just comparing two words for the same characters, you don't really need to generate all the permutations. Instead, you just need to check to see of the two sets of characters are the same (allowing for multiple instances of characters, so technically you want to compare two multisets). You can do this much more efficiently as follows:
import collections
def gen(char, phrase):
counter1 = collections.Counter(char)
counter2 = collections.Counter(phrase)
return counter1 == counter2
This is an O(n) algorithm, which is the best that can be done for this problem. This is much faster than generating the permutations. For long strings, it's also significantly faster than sorting the letters and comparing the sorted results, which is O(n*log(n)).
Example output:
>>> gen("apple", "elxpa")
False
>>> gen("apple", "elppa")
True
>>> gen("apple", "elpa")
False
>>>
Note that this only returns True if the letters are the same, and the number of each letter is the same.
If you want to speed up the case where the two strings have different lengths, you could add a fast check up front that returns False if the lengths differ, before counting the characters.
The main reason it's not working is this that your loop is returning False back to the caller as soon as the first non-match occurs. What you want is something like this:
for word in wordsList:
if word == phrase:
return True
return False
which will test one or more permutations; if one matches, it will return True immediately, but only after they all fail to match will it return False.
Also, there's no need to do the char = list(char). A string is an iterable, just like a list, so you can use it as an argument to permutations().
You can do this much more simply: sort the letter in each word and compare for equality.
def gen(a, b):
return sorted(a) == sorted(b)
An alternate method you can use is to just loop through the letters of one word, and remove the letters in the other word if it exists. If after the iteration of the first word, the second word is empty, it is an anagram:
def anagram(word1, word2):
for letter in word1:
word2 = word2.replace(letter,"",1)
if word2 == "":
return True
return False
a = "apple"
b = "pleap"
print(anagram(a,b)) #returns True
a = "apple"
b = "plaap"
print(anagram(a,b)) #returns False

Isomorphic Strings (the easy way?)

So I have been trying to solve the Easy questions on Leetcode and so far I dont understand most of the answers I find on the internet. I tried working on the Isomorphic strings problem (here:https://leetcode.com/problems/isomorphic-strings/description/)
and I came up with the following code
def isIso(a,b):
if(len(a) != len(b)):
return false
x=[a.count(char1) for char1 in a]
y=[b.count(char1) for char1 in b]
return x==y
string1 = input("Input string1..")
string2 = input("Input string2..")
print(isIso(string1,string2))
Now I understand that this may be the most stupid code you have seen all day but that is kinda my point. I'd like to know why this would be wrong(and where) and how I should further develop on this.
If I understand the problem correctly, because a character can map to itself, it's just a case of seeing if the character counts for the two words are the same.
So egg and add are isomorphic as they have character counts of (1,2). Similarly paper and title have counts of (1,1,1,2).
foo and bar aren't isomorphic as the counts are (1,2) and (1,1,1) respectively.
To see if the character counts are the same we'll need to sort them.
So:
from collections import Counter
def is_isomorphic(a,b):
a_counts = list(Counter(a).values())
a_counts.sort()
b_counts = list(Counter(b).values())
b_counts.sort()
if a_counts == b_counts:
return True
return False
Your code is failing because here:
x=[a.count(char1) for char1 in a]
You count the occurrence of each character in the string for each character in the string. So a word like 'odd' won't have counts of (1,2), it'll have (1,2,2) as you count d twice!
You can use two dicts to keep track of the mapping of each character in a to b, and the mapping of each character in b to a while you iterate through a, and if there's any violation in a corresponding character, return False; otherwise return True in the end.
def isIso(a, b):
m = {} # mapping of each character in a to b
r = {} # mapping of each character in b to a
for i, c in enumerate(a):
if c in m:
if b[i] != m[c]:
return False
else:
m[c] = b[i]
if b[i] in r:
if c != r[b[i]]:
return False
else:
r[b[i]] = c
return True
So that:
print(isIso('egg', 'add'))
print(isIso('foo', 'bar'))
print(isIso('paper', 'title'))
print(isIso('paper', 'tttle')) # to test reverse mapping
would output:
True
False
True
False
I tried by creating a dictionary, and it resulted in 72ms runtime.
here's my code -
def isIsomorphic(s: str, t: str) -> bool:
my_dict = {}
if len(s) != len(t):
return False
else:
for i in range(len(s)):
if s[i] in my_dict.keys():
if my_dict[s[i]] == t[i]:
pass
else:
return False
else:
if t[i] in my_dict.values():
return False
else:
my_dict[s[i]] = t[i]
return True
There are many different ways on how to do it. Below I provided three different ways by using a dictionary, set, and string.translate.
Here I provided three different ways how to solve Isomorphic String solution in Python.
from itertools import zip_longest
def isomorph(a, b):
return len(set(a)) == len(set(b)) == len(set(zip_longest(a, b)))
here is the second way to do it:
def isomorph(a, b):
return [a.index(x) for x in a] == [b.index(y) for y in b]

Finding anagrams in Python

I solved this problem the following way in Python:
s1,s2 = raw_input().split()
set1 = set(s1)
set2 = set(s2)
diff = len(set1.intersection(s2))
if(diff == 0)
print "Anagram!"
else:
print "Not Anagram!"
It seemed fine to me. But my professor's program said I'm missing some edge cases. Can you think of any edge cases I might have missed?
The correct way to solve this would be to count the number of characters in both the strings and comparing each of them to see if all the characters are the same and their counts are the same.
Python has a collections.Counter to do this job for you. So, you can simply do
from collections import Counter
if Counter(s1) == Counter(s2):
print "Anagram!"
else:
print "Not Anagram!"
If you don't want to use Counter, you can roll your own version of it, with normal dictionaries and then compare them.
def get_frequency(input_string):
result = {}
for char in input_string:
result[char] = result.get(char, 0) + 1
return result
if get_frequency(s1) == get_frequency(s2):
print "Anagram!"
else:
print "Not Anagram!"
use sorted :
>>> def using_sorted(s1,s2):
... return sorted(s1)==sorted(s2)
...
>>> using_sorted("hello","llho")
False
>>> using_sorted("hello","llhoe")
True
you can also use count:
>>> def using_count(s1,s2):
... if len(s1)==len(s2):
... for x in s1:
... if s1.count(x)!=s2.count(x):
... return False
... return True
... else: return False
...
>>> using_count("abb","ab")
False
>>> using_count("abb","bab")
True
>>> using_count("hello","llohe")
True
>>> using_count("hello","llohe")
sorted solution runs in O(n lg n) complexity and the count solution runs in O(n ^ 2) complexity, whereas the Counter solution in runs in O(N).
Note collections.Counter is better to use
check #fourtheye solution
Another way without sorting considering all are alphabets:
>>> def anagram(s1, s2):
... return sum([ord(x)**2 for x in s1]) == sum([ord(x)**2 for x in s2])
...
>>> anagram('ark', 'day')
False
>>> anagram('abcdef', 'bdefa')
False
>>> anagram('abcdef', 'bcdefa')
True
>>>
Don't do it with set Theory:
Code:
a='aaab'
b='aab'
def anagram(a,b):
setA=list(a)
setB=list(b)
print setA, setB
if len(setA) !=len(setB):
print "no anagram"
diff1 =''.join(sorted(setA))
diff2= ''.join(sorted(setB))
if (diff1 == diff2 ):
print "matched"
else:
print "Mismatched"
anagram(a,b)
the anagram check with two strings
def anagrams (s1, s2):
# the sorted strings are checked
if(sorted(s1.lower())== sorted(s2.lower())):
return True
else:
return False
check anagram in one liner
def anagram_checker(str1, str2):
"""
Check if the input strings are anagrams of each other
Args:
str1(string),str2(string): Strings to be checked
Returns:
bool: Indicates whether strings are anagrams
"""
return sorted(str1.replace(" ", "").lower()) == sorted(str2.replace(" ", "").lower())
print(anagram_checker(" XYZ","z y x"))

Run length encoding in Python

I'm trying to write a simple Python algorithm to solve this problem. Can you please help me figure out how to do this?
If any character is repeated more than 4 times, the entire set of
repeated characters should be replaced with a slash '/', followed by a
2-digit number which is the length of this run of repeated characters,
and the character. For example, "aaaaa" would be encoded as "/05a".
Runs of 4 or less characters should not be replaced since performing
the encoding would not decrease the length of the string.
I see many great solutions here but none that feels very pythonic to my eyes. So I'm contributing with a implementation I wrote myself today for this problem.
def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
"""Returns run length encoded Tuples for string"""
# A memory efficient (lazy) and pythonic solution using generators
return ((x, sum(1 for _ in y)) for x, y in groupby(data))
This will return a generator of Tuples with the character and number of instances, but can easily be modified to return a string as well. A benefit of doing it this way is that it's all lazy evaluated and won't consume more memory or cpu than needed if you don't need to exhaust the entire search space.
If you still want string encoding the code can quite easily be modified for that use case like this:
def run_length_encode(data: str) -> str:
"""Returns run length encoded string for data"""
# A memory efficient (lazy) and pythonic solution using generators
return "".join(f"{x}{sum(1 for _ in y)}" for x, y in groupby(data))
This is a more generic run length encoding for all lengths, and not just for those of over 4 characters. But this could also quite easily be adapted with a conditional for the string if wanted.
Rosetta Code has a lot of implementations, that should easily be adaptable to your usecase.
Here is Python code with regular expressions:
from re import sub
def encode(text):
'''
Doctest:
>>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW')
'12W1B12W3B24W1B14W'
'''
return sub(r'(.)\1*', lambda m: str(len(m.group(0))) + m.group(1),
text)
def decode(text):
'''
Doctest:
>>> decode('12W1B12W3B24W1B14W')
'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'
'''
return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),
text)
textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
assert decode(encode(textin)) == textin
Aside for setting a=i after encoding a sequence and setting a width for your int when printed into the string. You could also do the following which takes advantage of pythons groupby. Its also a good idea to use format when constructing strings.
from itertools import groupby
def runLengthEncode (plainText):
res = []
for k,i in groupby(plainText):
run = list(i)
if(len(run) > 4):
res.append("/{:02}{}".format(len(run), k))
else:
res.extend(run)
return "".join(res)
Just observe the behaviour:
>>> runLengthEncode("abcd")
'abc'
Last character is ignored. You have to append what you've collected.
>>> runLengthEncode("abbbbbcd")
'a/5b/5b'
Oops, problem after encoding. You should set a=i even if you found a long enough sequence.
I know this is not the most efficient solution, but we haven't studied functions like groupby() yet so here's what I did:
def runLengthEncode (plainText):
res=''
a=''
count = 0
for i in plainText:
count+=1
if a.count(i)>0:
a+=i
else:
if len(a)>4:
if len(a)<10:
res+="/0"+str(len(a))+a[0][:1]
else:
res+="/" + str(len(a)) + a[0][:1]
a=i
else:
res+=a
a=i
if count == len(plainText):
if len(a)>4:
if len(a)<10:
res+="/0"+str(len(a))+a[0][:1]
else:
res+="/" + str(len(a)) + a[0][:1]
else:
res+=a
return(res)
Split=(list(input("Enter string: ")))
Split.append("")
a = 0
for i in range(len(Split)):
try:
if (Split[i] in Split) >0:
a = a + 1
if Split[i] != Split[i+1]:
print(Split[i],a)
a = 0
except IndexError:
print()
this is much easier and works everytime
def RLE_comp_encode(text):
if text == text[0]*len(text) :
return str(len(text))+text[0]
else:
comp_text , r = '' , 1
for i in range (1,len(text)):
if text[i]==text[i-1]:
r +=1
if i == len(text)-1:
comp_text += str(r)+text[i]
else :
comp_text += str(r)+text[i-1]
r = 1
return comp_text
This worked for me,
You can use the groupby() function combined with a list/generator comprehension:
from itertools import groupby, imap
''.join(x if reps <= 4 else "/%02d%s" % (reps, x) for x, reps in imap(lambda x: (x[0], len(list(x[1]))), groupby(s)))
An easy solution to run-length encoding which I can think of:
For encoding a string like "a4b5c6d7...":
def encode(s):
counts = {}
for c in s:
if counts.get(c) is None:
counts[c] = s.count(c)
return "".join(k+str(v) for k,v in counts.items())
For decoding a string like "aaaaaabbbdddddccccc....":
def decode(s):
return "".join((map(lambda tup: tup[0] * int(tup[1]), zip(s[0:len(s):2], s[1:len(s):2]))))
Fairly easy to read and simple.
text=input("Please enter the string to encode")
encoded=[]
index=0
amount=1
while index<=(len(text)-1):
if index==(len(text)-1) or text[index]!=text[(index+1)]:
encoded.append((text[index],amount))
amount=1
else:
amount=amount+1
index=index+1
print(encoded)

How can I simplify this conversion from underscore to camelcase in Python?

I have written the function below that converts underscore to camelcase with first word in lowercase, i.e. "get_this_value" -> "getThisValue". Also I have requirement to preserve leading and trailing underscores and also double (triple etc.) underscores, if any, i.e.
"_get__this_value_" -> "_get_ThisValue_".
The code:
def underscore_to_camelcase(value):
output = ""
first_word_passed = False
for word in value.split("_"):
if not word:
output += "_"
continue
if first_word_passed:
output += word.capitalize()
else:
output += word.lower()
first_word_passed = True
return output
I am feeling the code above as written in non-Pythonic style, though it works as expected, so looking how to simplify the code and write it using list comprehensions etc.
This one works except for leaving the first word as lowercase.
def convert(word):
return ''.join(x.capitalize() or '_' for x in word.split('_'))
(I know this isn't exactly what you asked for, and this thread is quite old, but since it's quite prominent when searching for such conversions on Google I thought I'd add my solution in case it helps anyone else).
Your code is fine. The problem I think you're trying to solve is that if first_word_passed looks a little bit ugly.
One option for fixing this is a generator. We can easily make this return one thing for first entry and another for all subsequent entries. As Python has first-class functions we can get the generator to return the function we want to use to process each word.
We then just need to use the conditional operator so we can handle the blank entries returned by double underscores within a list comprehension.
So if we have a word we call the generator to get the function to use to set the case, and if we don't we just use _ leaving the generator untouched.
def underscore_to_camelcase(value):
def camelcase():
yield str.lower
while True:
yield str.capitalize
c = camelcase()
return "".join(c.next()(x) if x else '_' for x in value.split("_"))
I prefer a regular expression, personally. Here's one that is doing the trick for me:
import re
def to_camelcase(s):
return re.sub(r'(?!^)_([a-zA-Z])', lambda m: m.group(1).upper(), s)
Using unutbu's tests:
tests = [('get__this_value', 'get_ThisValue'),
('_get__this_value', '_get_ThisValue'),
('_get__this_value_', '_get_ThisValue_'),
('get_this_value', 'getThisValue'),
('get__this__value', 'get_This_Value')]
for test, expected in tests:
assert to_camelcase(test) == expected
Here's a simpler one. Might not be perfect for all situations, but it meets my requirements, since I'm just converting python variables, which have a specific format, to camel-case. This does capitalize all but the first word.
def underscore_to_camelcase(text):
"""
Converts underscore_delimited_text to camelCase.
Useful for JSON output
"""
return ''.join(word.title() if i else word for i, word in enumerate(text.split('_')))
I think the code is fine. You've got a fairly complex specification, so if you insist on squashing it into the Procrustean bed of a list comprehension, then you're likely to harm the clarity of the code.
The only changes I'd make would be:
To use the join method to build the result in O(n) space and time, rather than repeated applications of += which is O(n²).
To add a docstring.
Like this:
def underscore_to_camelcase(s):
"""Take the underscore-separated string s and return a camelCase
equivalent. Initial and final underscores are preserved, and medial
pairs of underscores are turned into a single underscore."""
def camelcase_words(words):
first_word_passed = False
for word in words:
if not word:
yield "_"
continue
if first_word_passed:
yield word.capitalize()
else:
yield word.lower()
first_word_passed = True
return ''.join(camelcase_words(s.split('_')))
Depending on the application, another change I would consider making would be to memoize the function. I presume you're automatically translating source code in some way, and you expect the same names to occur many times. So you might as well store the conversion instead of re-computing it each time. An easy way to do that would be to use the #memoized decorator from the Python decorator library.
This algorithm performs well with digit:
import re
PATTERN = re.compile(r'''
(?<!\A) # not at the start of the string
_
(?=[a-zA-Z]) # followed by a letter
''', re.X)
def camelize(value):
tokens = PATTERN.split(value)
response = tokens.pop(0).lower()
for remain in tokens:
response += remain.capitalize()
return response
Examples:
>>> camelize('Foo')
'foo'
>>> camelize('_Foo')
'_foo'
>>> camelize('Foo_')
'foo_'
>>> camelize('Foo_Bar')
'fooBar'
>>> camelize('Foo__Bar')
'foo_Bar'
>>> camelize('9')
'9'
>>> camelize('9_foo')
'9Foo'
>>> camelize('foo_9')
'foo_9'
>>> camelize('foo_9_bar')
'foo_9Bar'
>>> camelize('foo__9__bar')
'foo__9_Bar'
Here's mine, relying mainly on list comprehension, split, and join. Plus optional parameter to use different delimiter:
def underscore_to_camel(in_str, delim="_"):
chunks = in_str.split(delim)
chunks[1:] = [_.title() for _ in chunks[1:]]
return "".join(chunks)
Also, for sake of completeness, including what was referenced earlier as solution from another question as the reverse (NOT my own code, just repeating for easy reference):
first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def camel_to_underscore(in_str):
s1 = first_cap_re.sub(r'\1_\2', name)
return all_cap_re.sub(r'\1_\2', s1).lower()
I agree with Gareth that the code is ok. However, if you really want a shorter, yet readable approach you could try something like this:
def underscore_to_camelcase(value):
# Make a list of capitalized words and underscores to be preserved
capitalized_words = [w.capitalize() if w else '_' for w in value.split('_')]
# Convert the first word to lowercase
for i, word in enumerate(capitalized_words):
if word != '_':
capitalized_words[i] = word.lower()
break
# Join all words to a single string and return it
return "".join(capitalized_words)
The problem calls for a function that returns a lowercase word the first time, but capitalized words afterwards. You can do that with an if clause, but then the if clause has to be evaluated for every word. An appealing alternative is to use a generator. It can return one thing on the first call, and something else on successive calls, and it does not require as many ifs.
def lower_camelcase(seq):
it=iter(seq)
for word in it:
yield word.lower()
if word.isalnum(): break
for word in it:
yield word.capitalize()
def underscore_to_camelcase(text):
return ''.join(lower_camelcase(word if word else '_' for word in text.split('_')))
Here is some test code to show that it works:
tests=[('get__this_value','get_ThisValue'),
('_get__this_value','_get_ThisValue'),
('_get__this_value_','_get_ThisValue_'),
('get_this_value','getThisValue'),
('get__this__value','get_This_Value'),
]
for test,answer in tests:
result=underscore_to_camelcase(test)
try:
assert result==answer
except AssertionError:
print('{r!r} != {a!r}'.format(r=result,a=answer))
Here is a list comprehension style generator expression.
from itertools import count
def underscore_to_camelcase(value):
words = value.split('_')
counter = count()
return ''.join('_' if w == '' else w.capitalize() if counter.next() else w for w in words )
def convert(word):
if not isinstance(word, str):
return word
if word.startswith("_"):
word = word[1:]
words = word.split("_")
_words = []
for idx, _word in enumerate(words):
if idx == 0:
_words.append(_word)
continue
_words.append(_word.capitalize())
return ''.join(_words)
This is the most compact way to do it:
def underscore_to_camelcase(value):
words = [word.capitalize() for word in value.split('_')]
words[0]=words[0].lower()
return "".join(words)
Another regexp solution:
import re
def conv(s):
"""Convert underscore-separated strings to camelCase equivalents.
>>> conv('get')
'get'
>>> conv('_get')
'_get'
>>> conv('get_this_value')
'getThisValue'
>>> conv('__get__this_value_')
'_get_ThisValue_'
>>> conv('_get__this_value__')
'_get_ThisValue_'
>>> conv('___get_this_value')
'_getThisValue'
"""
# convert case:
s = re.sub(r'(_*[A-Z])', lambda m: m.group(1).lower(), s.title(), count=1)
# remove/normalize underscores:
s = re.sub(r'__+|^_+|_+$', '|', s).replace('_', '').replace('|', '_')
return s
if __name__ == "__main__":
import doctest
doctest.testmod()
It works for your examples, but it might fail for names containting digits - it depends how you would capitalize them.
For regexp sake !
import re
def underscore_to_camelcase(value):
def rep(m):
if m.group(1) != None:
return m.group(2) + m.group(3).lower() + '_'
else:
return m.group(3).capitalize()
ret, nb_repl = re.subn(r'(^)?(_*)([a-zA-Z]+)', rep, value)
return ret if (nb_repl > 1) else ret[:-1]
A slightly modified version:
import re
def underscore_to_camelcase(value):
first = True
res = []
for u,w in re.findall('([_]*)([^_]*)',value):
if first:
res.append(u+w)
first = False
elif len(w)==0: # trailing underscores
res.append(u)
else: # trim an underscore and capitalize
res.append(u[:-1] + w.title())
return ''.join(res)
I know this has already been answered, but I came up with some syntactic sugar that handles a special case that the selected answer does not (words with dunders in them i.e. "my_word__is_____ugly" to "myWordIsUgly"). Obviously this can be broken up into multiple lines but I liked the challenge of getting it on one. I added line breaks for clarity.
def underscore_to_camel(in_string):
return "".join(
list(
map(
lambda index_word:
index_word[1].lower() if index_word[0] == 0
else index_word[1][0].upper() + (index_word[1][1:] if len(index_word[1]) > 0 else ""),
list(enumerate(re.split(re.compile(r"_+"), in_string)
)
)
)
)
)
Maybe, pydash works for this purpose (https://pydash.readthedocs.io/en/latest/)
>>> from pydash.strings import snake_case
>>>> snake_case('needToBeSnakeCased')
'get__this_value'
>>> from pydash.strings import camel_case
>>>camel_case('_get__this_value_')
'getThisValue'

Categories