Parsing strings in python

So my problem is this: I have a file that looks like this:
[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1
This would of course translate to
' This is an example file!'
I am looking for a way to parse the original content into the end content, so that a [BACKSPACE] deletes the last character (spaces included) and multiple backspaces delete multiple characters. The [SHIFT] doesn't really matter as much to me. Thanks for all the help!

Here's one way, but it feels hackish. There's probably a better way.
def process_backspaces(input, token='[BACKSPACE]'):
    """Delete the character before each occurrence of "token" in a string."""
    output = ''
    for item in (input + ' ').split(token):
        output += item
        output = output[:-1]
    return output
def process_shifts(input, token='[SHIFT]'):
    """Replace the character after each occurrence of "token" with its uppercase
    equivalent. (Doesn't turn "1" into "!" or "3" into "#", however!)"""
    output = ''
    for item in (' ' + input).split(token):
        # item[:1] instead of item[0], so an empty item (e.g. a trailing [SHIFT]) doesn't fail
        output += item[:1].upper() + item[1:]
    return output
test_string = '[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'
print(process_backspaces(process_shifts(test_string)))
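If you also want digits and punctuation shifted, a translation table can do the mapping. This is a sketch that assumes a US keyboard layout (the two strings are an assumption; swap them for another layout) and is meant as a drop-in replacement for process_shifts above:

```python
# A sketch assuming a US keyboard layout; adjust both strings for other layouts.
UNSHIFTED = "`1234567890-=[];'\\,./"
SHIFTED = '~!@#$%^&*()_+{}:"|<>?'
SHIFT_TABLE = str.maketrans(UNSHIFTED, SHIFTED)

def shift_char(c):
    """Return the character Shift+c would produce (letters via str.upper)."""
    return c.upper() if c.isalpha() else c.translate(SHIFT_TABLE)

def process_shifts(text, token='[SHIFT]'):
    """Shift the single character that follows each occurrence of token."""
    first, *rest = text.split(token)
    # Text before the first token is left alone; item[:1] is '' for an empty item.
    return first + ''.join(shift_char(item[:1]) + item[1:] for item in rest)

print(process_shifts('[SHIFT]this is a test[SHIFT]1'))  # This is a test!
```

Unlike the original, this version does not prepend a space, so process_backspaces(process_shifts(test_string)) gives 'This is an example file!' without the leading space.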

If you don't care about the shifts, just strip them, then load
(defun apply-bspace ()
  (interactive)
  (goto-char (point-min))
  (while (search-forward "[BACKSPACE]" nil t)
    ;; remove the token itself, then the character before it
    (replace-match "")
    (unless (bobp) (delete-char -1))))
and hit M-x apply-bspace while viewing your file. It's Elisp, not Python, but it fits your initial requirement of "something I can download for free to a PC".
Edit: Shift is trickier if you want to apply it to numbers too (so that [SHIFT]1 => !, [SHIFT]3 => #, etc). The naive way that works on letters is
(defun apply-shift ()
  (interactive)
  (goto-char (point-min))
  (while (search-forward "[SHIFT]" nil t)
    ;; remove the token, then upcase the character that follows it
    (replace-match "")
    (unless (eobp) (upcase-region (point) (1+ (point))))))

This does exactly what you want:
def shift(s):
    LOWER = "`1234567890-=[];'\\,./"
    UPPER = '~!@#$%^&*()_+{}:"|<>?'
    if s.isalpha():
        return s.upper()
    elif s in LOWER:
        return UPPER[LOWER.index(s)]
    else:
        return s  # e.g. a space has no shifted form here
def parse(text):
    parts = text.split("[BACKSPACE]")
    answer = parts[0]
    for part in parts[1:]:
        # every [BACKSPACE] deletes the last character typed before it
        answer = answer[:-1] + part
    # text before the first [SHIFT] is kept as-is; each [SHIFT] shifts one character
    segments = answer.split("[SHIFT]")
    return segments[0] + ''.join(shift(seg[0]) + seg[1:] for seg in segments[1:] if seg)
>>> print(parse("[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"))
This is an example file!

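It seems that you could use a regular expression to search for (something)[BACKSPACE] and replace it with nothing:
re.sub(r'.?\[BACKSPACE\]', '', YourString.replace('[SHIFT]', ''))
Not sure what you meant by "multiple backspaces delete multiple characters", though: a single re.sub pass treats each [BACKSPACE] on its own, so in "isrd[BACKSPACE][BACKSPACE]" only the "d" is removed (the second token is matched with an empty prefix).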
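One way to make consecutive backspaces work with the same regex is to apply it one occurrence at a time (count=1) until no token is left. The leftmost match is then always "the character before the first [BACKSPACE] plus the token". A minimal sketch; the function name apply_backspaces is just for illustration:

```python
import re

BACKSPACE = re.compile(r'.?\[BACKSPACE\]')

def apply_backspaces(text):
    # Drop the [SHIFT] markers, as in the one-liner above
    text = text.replace('[SHIFT]', '')
    # Apply one backspace per pass: the leftmost match is the character
    # just before the first token together with the token itself
    while '[BACKSPACE]' in text:
        text = BACKSPACE.sub('', text, count=1)
    return text

print(apply_backspaces('[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'))
```

This rescans the string once per token, which is fine for short inputs; the tokenizer approach in the next answer handles everything in one pass.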

You need to read the input, extract the tokens, recognize them, and give a meaning to them.
This is how I would do it:
# -*- coding: utf-8 -*-
import re
# shifted characters for the digit keys; extend as needed
upper_value = {
    '1': '!', '2': '"',
}
tokenizer = re.compile(r'(\[.*?\]|.)')
origin = "[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"
result = ""
shift = False
for token in tokenizer.findall(origin):
    if not token.startswith("["):
        if shift:
            shift = False
            # digits come from the table, everything else is simply upper-cased
            token = upper_value.get(token, token.upper())
        result = result + token
    else:
        if token == "[SHIFT]":
            shift = True
        elif token == "[BACKSPACE]":
            result = result[0:-1]
print(result)
It's neither the fastest nor the most elegant solution, but I think it's a good start.
Hope it helps :-)

Related

Parsing a string containing code into a list / tree in python

As the title suggests, I'm trying to parse a piece of code into a tree or a list.
First off, I would like to thank you for any contribution and time spent on this.
So far my code is doing what I expect, yet I am not sure that this is the optimal / most generic way to do this.
Problem
1. I want a more generic solution, since in the future I am going to need further analysis of this syntax.
2. Right now I am unable to separate operators like '=' or '>=', as you can see in the output I share below.
In the future I might change the content of the list / tree from strings to tuples so I can identify the kind of operator (parameter, comparison like = or >=, ...). But this is not a real need right now.
Research
My first attempt was parsing the text character by character, but my code was getting too messy and barely readable, so I assumed that I was doing something wrong there (I don't have that code to share here anymore).
So I started looking at how people were doing it and found some approaches that didn't necessarily fulfil my requirements of simplicity and genericity.
I would share the links to the sites, but I didn't keep track of them.
The Syntax of the code
The syntax is pretty simple; after all, I'm not interested in types or any further detail, just the functions and parameters.
Strings are defined as 'my string', variables as !variable and numbers as in any other language.
Here is a sample of code:
db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)
My Output
Here my output is partially correct, since I'm still unable to separate the "= '3'" part (of course I have to separate it, because in this case it's a comparison operator and not part of a string).
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, "= '3'", "'4'", "'5'"]}, '6']}]
Desired Output
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, "=", "'3'", "'4'", "'5'"]}, '6']}]
My code so far
The parseRecursive method is the entry point.
import re
class FileParser:
    #order is important to avoid wrong splits
    #(note: a set literal does not preserve this order; use a list if order matters)
    COMPARATOR_SIGN = {
        '#='
        ,'#<>'
        ,'<>'
        ,'>='
        ,'<='
        ,'='
        ,'>'
        ,'<'
    }
    def __init__(self):
        pass
    def __charExistsInOccurences(self, current_needle, needles, text):
        """
        check if other needles are present in text
        current_needle : string -> the current needle being evaluated
        needles : list -> list of needles
        text : string/list<string> -> a string or a list of strings to evaluate
        """
        #if text is a string convert it to a list of strings
        text = text if isinstance(text, list) else [text]
        exists = False
        for t in text:
            #check if needle is inside text value
            for needle in needles:
                #dont check the same key
                if needle != current_needle:
                    regex_search_needle = r'\s*' + r'\s*'.join(needle) + r'\s*'
                    #list of 1's and 0's. 1 if another character is found in the string.
                    found = [1 if re.search(regex_search_needle, x) else 0 for x in t]
                    if sum(found) > 0:
                        exists = True
                        break
        return exists
    def findOperator(self, needles, haystack):
        """
        split parameters from operators
        needles : list -> list of operators
        haystack : string
        """
        string_open = haystack.find("'")
        #if no string has been found set the index to 0
        if string_open < 0:
            string_open = 0
        occurences = []
        string_closure = haystack.rfind("'")
        operator = ''
        for needle in needles:
            #regex to ignore the possible spaces between characters of the needle
            split_regex = r'\s*' + r'\s*'.join(needle) + r'\s*'
            #parse parameters before and after the string
            before_string = re.split(split_regex, haystack[0:string_open])
            after_string = re.split(split_regex, haystack[string_closure+1:])
            #check if any other needle exists in the results found
            before_string_exists = self.__charExistsInOccurences(needle, needles, before_string)
            after_string_exists = self.__charExistsInOccurences(needle, needles, after_string)
            #if the operator has been found merge the results with the occurences and assign the operator
            if not before_string_exists and not after_string_exists:
                occurences.extend(before_string)
                occurences.extend([haystack[string_open:string_closure+1]])
                occurences.extend(after_string)
                operator = needle
        #filter blank spaces generated
        occurences = list(filter(lambda x: len(x.strip()) > 0, occurences))
        result_check = [1 if x == haystack else 0 for x in occurences]
        #if the haystack was originally a simple string like '1' the occurences list is going to be filled with the same character over and over due to the before string and after string part
        if len(result_check) == sum(result_check):
            occurences = [haystack]
            operator = ''
        return operator, occurences
    def parseRecursive(self, text):
        """
        parse a block of text
        text : string
        """
        #an empty block has no parameters (assert(cond, msg) tests a non-empty tuple and never fires)
        if not text.strip():
            return []
        function_open = text.find('(')
        accumulated_params = []
        if function_open > -1:
            #there is another function nested
            text_prev_function = text[0:function_open]
            #find last space, comma or equals sign to retrieve the function name
            last_space = -1
            for j in range(len(text_prev_function)-1, 0, -1):
                if text_prev_function[j] == ' ' or text_prev_function[j] == ',' or text_prev_function[j] == '=':
                    last_space = j
                    break
            func_name = ''
            if last_space > -1:
                #there is something else behind the function name
                func_name = text_prev_function[last_space+1:]
                #no parenthesis before so previous characters from function name are parameters
                text_prev_func_params = list(filter(lambda x: len(x.strip()) > 0, text_prev_function[:last_space+1].split(',')))
                text_prev_func_params = [x.strip() for x in text_prev_func_params]
                #debug here
                #accumulated_params.extend(text_prev_func_params)
                for itext_prev in text_prev_func_params:
                    operator, text_prev_operator = self.findOperator(self.COMPARATOR_SIGN, itext_prev)
                    if operator == '':
                        accumulated_params.extend(text_prev_operator)
                    else:
                        text_prev_operator.append(operator)
                        accumulated_params.extend(text_prev_operator)
                    #accumulated_params.extend(text_prev_operator)
            else:
                #function name is the start of the string
                func_name = text_prev_function[0:].strip()
            #find the closure of parenthesis
            function_close = text.rfind(')')
            #parse the next function and extend the current list of parameters
            next_func = text[function_open+1:function_close]
            func_params = {func_name : self.parseRecursive(next_func)}
            accumulated_params.append(func_params)
            #
            # parameters after the function
            #
            new_text = text[function_close+1:]
            accumulated_params.extend(self.parseRecursive(new_text))
        else:
            #there is no other function nested
            split_text = text.split(',')
            current_func_params = list(filter(lambda x: len(x.strip()) > 0, split_text))
            current_func_params = [x.strip() for x in current_func_params]
            accumulated_params.extend(current_func_params)
        #accumulated_params = list(filter(lambda x: len(x.strip())>0,accumulated_params))
        return accumulated_params
text = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
obj = FileParser()
print(obj.parseRecursive(text))
You can use pyparsing to deal with such a case.
* pyparsing can be installed by pip install pyparsing
Code:
import pyparsing as pp
# A parsing pattern
w = pp.Regex(r'(?:![^(),]+)|[^(), ]+') ^ pp.Suppress(',')
pattern = w + pp.nested_expr('(', ')', content=w)
# A recursive function to transform a pyparsing result into your desirable format
def transform(elements):
    stack = []
    for e in elements:
        if isinstance(e, list):
            key = stack.pop()
            stack.append({key: transform(e)})
        else:
            stack.append(e)
    return stack
# A sample
string = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
# Operations to parse the sample string
elements = pattern.parse_string(string).as_list()
result = transform(elements)
# Assertion
assert result == [{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, '=', "'3'", "'4'", "'5'"]}, '6']}]
# Show the result
print(result)
Output:
[{'db': ["'1'", "'2'", {'if': [{'ATTRS': ["'Dim 1'", '!Element Structure', "'ID'"]}, '=', "'3'", "'4'", "'5'"]}, '6']}]
Note:
If there is an unbalanced parenthesis inside () (for example a(b(c), a(b)c), etc), an unexpected result is obtained or an IndexError is raised. So be careful in such cases.
At the moment, only a single sample is available to make a pattern to parse string. So if you encounter a parsing error, provide more examples in your question.
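If you'd rather not add a dependency, a small regular-expression tokenizer plus a recursive-descent builder can also produce the desired output. This is a sketch under assumptions inferred from the question's syntax description: strings are single-quoted with no escaped quotes, !variables run until a comma, parenthesis or operator, the whole input is one top-level call such as db(...), and unbalanced parentheses are not handled. The names tokenize and parse are illustrative. Multi-character operators are listed before their one-character prefixes so that '>=' is never split into '>' and '=':

```python
import re

# Token rules inferred from the question's syntax description
TOKEN = re.compile(r"""
    \s*(
        '[^']*'                                  # string literal
      | ![^(),=<>\#]+                            # !variable (may contain spaces)
      | \#<> | \#= | <> | >= | <= | = | > | <    # operators
      | [A-Za-z_]\w*                             # function name
      | \d+(?:\.\d+)?                            # number
      | [(),]                                    # punctuation
    )""", re.VERBOSE)

def tokenize(text):
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at position {pos}: {text[pos:pos + 10]!r}")
        tokens.append(m.group(1).strip())  # strip trailing spaces of !variables
        pos = m.end()
    return tokens

def parse(tokens):
    """Build [{name: [args...]}] from a token list, assuming one top-level call."""
    def parse_args(i):
        # tokens[i] is the first token after '('; returns (args, index after ')')
        args = []
        while tokens[i] != ')':
            tok = tokens[i]
            if tok == ',':
                i += 1
            elif i + 1 < len(tokens) and tokens[i + 1] == '(':
                inner, i = parse_args(i + 2)  # nested call
                args.append({tok: inner})
            else:
                args.append(tok)
                i += 1
        return args, i + 1
    args, _ = parse_args(2)  # skip the name and its '('
    return [{tokens[0]: args}]

code = "db('1', '2', if(ATTRS('Dim 1', !Element Structure, 'ID') = '3','4','5'), 6)"
print(parse(tokenize(code)))
```

Keeping tokenizing and tree building separate also makes it easy to later emit tuples such as ('operator', '=') instead of plain strings, as mentioned in the question.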

Replace character in parentheses with another

I need to replace all occurrences of dots with something else (a semicolon, for example), but only if the dot is inside parentheses, using Python, like this:
Input: "Hello (This . will be replaced, this one. too)."
Output: "Hello (This ; will be replaced, this one; too)."
Assuming the parentheses are balanced and not nested, here's an idea with re.split.
>>> import re
>>> s = 'Hello (This . will be replaced, this one. too). This ... not but this (.).'
>>> ''.join(m.replace('.', ';') if m.startswith('(') else m
...         for m in re.split(r'(\([^)]+\))', s))
'Hello (This ; will be replaced, this one; too). This ... not but this (;).'
The main trick here is to wrap the regex \([^)]+\) in another pair of parentheses (a capturing group), so that re.split keeps the splitting matches in its result.
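To see what the capturing group changes, here is a minimal comparison on a short, made-up string:

```python
import re

s = 'a (b.c) d'
# Without a group, the separators (the parenthesised parts) are dropped ...
print(re.split(r'\([^)]+\)', s))    # ['a ', ' d']
# ... with a capturing group, they are kept in the result list.
print(re.split(r'(\([^)]+\))', s))  # ['a ', '(b.c)', ' d']
```

Because the kept pieces are exactly the parenthesised parts, checking m.startswith('(') is enough to decide which pieces to modify.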
Loop over characters in string, track number of opening and closing parentheses, only replace if more opening than closing parentheses encountered.
def replace_inside_parentheses(string, find_string, replace_string):
    bracket_count = 0
    return_string = ""
    for a in string:
        if a == "(":
            bracket_count += 1
        elif a == ")":
            bracket_count -= 1
        if bracket_count > 0:
            return_string += a.replace(find_string, replace_string)
        else:
            return_string += a
    return return_string
my_str = "Hello (This . will be replaced, this one. too, (even this one . inside nested parentheses!))."
print(my_str)
print(replace_inside_parentheses(my_str, ".", ";"))
Not the most elegant way, and it only handles the first pair of parentheses, but this should work.
def sanitize(string):
    string = string.split("(", 1)
    string0 = string[0] + "("
    string1 = string[1].split(")", 1)
    ending = ")" + string1[1]
    middle = string1[0]
    # replace ";" with the character you'd like to use instead of "."
    middle = middle.replace(".", ";")
    stringBackTogether = string0 + middle + ending
    return stringBackTogether
a = sanitize("Hello (This . will be replaced, this one. too).")
print(a)

How do I reverse words in a string with Python

I am trying to reverse the words of a string but am having difficulty; any assistance would be appreciated:
S = " what is my name"
def reversStr(S):
    for x in range(len(S)):
        return S[::-1]
        break
What I get now is: eman ym si tahw
However, I am trying to get: tahw is ym eman (individual words reversed)
def reverseStr(s):
    return ' '.join([x[::-1] for x in s.split(' ')])
orig = "what is my name"
reverse = ""
for word in orig.split():
    reverse = "{} {}".format(reverse, word[::-1])
print(reverse.strip())  # strip the leading space added by the first format()
Since everyone else has covered the case where the punctuation moves, I'll cover the one where you don't want the punctuation to move.
import re
def reverse_words(sentence):
    return re.sub(r'[a-zA-Z]+', lambda x: x.group()[::-1], sentence)
Breaking this down.
re is Python's regex module, and re.sub is the function in that module that handles substitutions. It has three required parameters.
The first is the regex you're matching. In this case, I'm using r'[a-zA-Z]+'. The r denotes a raw string, [a-zA-Z] matches any ASCII letter, and + means "at least one".
The second is either a string to substitute in, or a function that takes a match object and returns a string. I'm using a lambda (nameless) function that simply returns the matched string, reversed.
The third is the string you want to search and replace in.
So "What is my name?" -> "tahW si ym eman?"
Addendum:
I considered a regex of r'\w+' initially, because of its better Unicode support (it is Unicode-aware by default for str patterns in Python 3), but \w also matches digits and underscores. Matching - might also be desired behavior: the regexes would be r'[a-zA-Z-]+' (note the trailing hyphen) and r'[\w-]+', but then you'd probably not want to match double dashes (i.e. --), so more regex modifications might be needed.
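The difference between these patterns is easiest to see side by side. This sketch generalises reverse_words with an extra pattern parameter (an illustrative addition, not part of the answer above) and runs it on a made-up string containing an accented letter, an underscore, a digit and a hyphen:

```python
import re

def reverse_words(sentence, pattern=r'[a-zA-Z]+'):
    """Reverse every match of pattern in place, leaving everything else alone."""
    return re.sub(pattern, lambda m: m.group()[::-1], sentence)

s = "H\u00e9llo my_var2, well-known!"  # "Héllo my_var2, well-known!"
print(reverse_words(s))              # ASCII letters only: "Héoll ym_rav2, llew-nwonk!"
print(reverse_words(s, r'\w+'))      # also é, digits, underscores: "olléH 2rav_ym, llew-nwonk!"
print(reverse_words(s, r'[\w-]+'))   # hyphenated words stay one unit: "olléH 2rav_ym, nwonk-llew!"
```

With [a-zA-Z]+ the accented "é" splits "Héllo" into two words, which is usually not what you want for non-English text.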
The built-in reversed() returns an iterator rather than a string, so you have to join it back into a string; that's why I generally prefer the [::-1] option.
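A quick check that both spellings give the same result, and that reversed() on its own is not a string:

```python
s = "what is my name"
r = reversed(s)
print(type(r).__name__)   # reversed: an iterator, not a str
print(''.join(r))         # eman ym si tahw
print(s[::-1])            # eman ym si tahw
```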
In-place refers to modifying the object without creating a copy. As many have already pointed out, Python strings are immutable, so technically we cannot reverse a Python str object in place. However, if you use a mutable data type, say a bytearray, to store the string's characters, you can actually reverse it in place:
# slicing creates a copy; implies not-inplace reversing
def rev(x):
    return x[-1::-1]
# inplace reversing, if input is bytearray datatype
def rev_inplace(x: bytearray):
    i, j = 0, len(x) - 1
    while i < j:
        x[i], x[j] = x[j], x[i]  # swap the two ends
        i += 1
        j -= 1
    return x
Input:
x = bytearray(b'some string to reverse')
rev_inplace(x)
Output:
bytearray(b'esrever ot gnirts emos')
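For comparison, bytearray already has a built-in in-place reverse() method (like list.reverse()), and a list of characters works the same way for arbitrary text, including non-ASCII:

```python
x = bytearray(b'some string to reverse')
x.reverse()            # mutates x and returns None, like list.reverse()
print(x)               # bytearray(b'esrever ot gnirts emos')

chars = list('some string')
chars.reverse()        # same idea with a mutable list of characters
print(''.join(chars))  # gnirts emos
```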
Try splitting each word in the string into a list (see: https://docs.python.org/2/library/stdtypes.html#str.split).
Example:
>>> string = "This will be split up"
>>> string_list = string.split(" ")
>>> string_list
['This', 'will', 'be', 'split', 'up']
Then iterate through the list and reverse each constituent list item (i.e. word) which you have working already.
def reverse_in_place(phrase):
    res = []
    phrase = phrase.split(" ")
    for word in phrase:
        word = word[::-1]
        res.append(word)
    res = " ".join(res)
    return res
[thread has been closed, but IMO, not well answered]
Python's str type doesn't include an in-place reverse() method.
So use the built-in reversed() function to accomplish the same thing:
>>> S = " what is my name"
>>> ''.join(reversed(S))
'eman ym si tahw '
There is no obvious way of reversing a string "truly" in-place with Python. However, you can do something like:
def reverse_string_inplace(string):
    w = len(string)-1
    p = w
    while True:
        q = string[p]
        string = ' ' + string + q
        w -= 1
        if w < 0:
            break
    return string[(p+1)*2:]
Hope this makes sense.
In Python, strings are immutable. This means you cannot change the string once you have created it. So in-place reverse is not possible.
There are many ways to reverse the string in python, but memory allocation is required for that reversed string.
print(' '.join(word[::-1] for word in string.split()))
s1 = input("Enter a string with multiple words:")
print(f'Original:{s1}')
print(f'Reverse is:{s1[::-1]}')
each_word_new_list = []
s1_split = s1.split()
for i in range(0, len(s1_split)):
    each_word_new_list.append(s1_split[i][::-1])
print(f'New Reverse as List:{each_word_new_list}')
each_word_new_string=' '.join(each_word_new_list)
print(f'New Reverse as String:{each_word_new_string}')
If the sentence contains multiple spaces, then using the split() function will cause trouble, because you won't know how many spaces you need to rejoin after you reverse each word in the sentence. The snippet below might help:
# Sentence having multiple spaces
given_str = "I know this country runs by mafia "
tmp = ""
tmp_list = []
for i in given_str:
    if i != ' ':
        tmp = tmp + i
    else:
        if tmp == "":
            tmp_list.append(i)
        else:
            tmp_list.append(tmp)
            tmp_list.append(i)
            tmp = ""
if tmp:
    # the last word is not followed by a space, so add it here
    tmp_list.append(tmp)
print(tmp_list)
rev_list = []
for x in tmp_list:
    rev = x[::-1]
    rev_list.append(rev)
print(rev_list)
print(''.join(rev_list))
output:
def rev(a):
    if a == "":
        return ""
    else:
        z = rev(a[1:]) + a[0]
        return z
Reverse string --> gnirts esreveR
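def rev_words(k):
    # reverse the whole string with rev() above, then print its words in reverse order
    y = rev(k).split()
    for i in range(len(y)-1, -1, -1):
        print(y[i], end=' ')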
-->esreveR gnirts
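For the multiple-spaces case discussed above, re.split with a capturing group keeps the whitespace runs, so the original spacing survives without tracking it by hand. A minimal sketch; the name reverse_each_word is illustrative:

```python
import re

def reverse_each_word(sentence):
    # The capturing group makes re.split keep the whitespace runs, so the
    # original spacing is preserved; only the non-space parts are reversed.
    parts = re.split(r'(\s+)', sentence)
    return ''.join(p if p.isspace() else p[::-1] for p in parts)

print(reverse_each_word("I  know   this country"))  # "I  wonk   siht yrtnuoc"
```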

Make multiple modifications to string: how to, being immutable in Python?

I'm new to Python, so maybe I'm asking for something very easy but I can't think of the problem in a Python way.
I have a compressed string. The idea is, if a character gets repeated 4-15 times, I make this change:
'0000' ---> '0|4'
If more than 15 times, I use a slash and two digits to represent the amount (working with hexadecimal values):
'00...(16 times)..0' ---> '0/10'
So, accustomed to other languages, my approach is the following:
def uncompress(line):
    verticalBarIndex = line.index('|')
    while verticalBarIndex!=-1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.index('|') #next one
    slashIndex = line.index('/')
    while slashIndex!=-1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.index('/') #next one
    return line
I know this is wrong, since strings are immutable in Python and I keep changing the contents of line until no '|' or '/' is present.
I know UserString exists, but I guess there is an easier and more Pythonic way of doing it, which would be great to learn.
Any help?
The changes necessary to get your code running with the sample strings:
Change .index() to .find(). .index() raises an exception if the substring isn't found, .find() returns -1.
Change uncompressedChars.join() to ''.join(uncompressedChars).
Change timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16) to timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)
Set uncompressedChars = [] to start with, instead of uncompressedChars = [repeatedChar].
This should get it working properly. There are a lot of places where the code can be tidied and optimised, but this works.
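For reference, here is the question's function with exactly those four changes applied and nothing else restructured, as a sketch to check the list against (the sample strings in the test are made up from the format described in the question):

```python
def uncompress(line):
    # '|' + one hex digit: previous character repeated that many times
    verticalBarIndex = line.find('|')          # find() returns -1 instead of raising
    while verticalBarIndex != -1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = []                 # start empty, not [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.find('|')      # next one
    # '/' + two hex digits: same idea for longer runs
    slashIndex = line.find('/')
    while slashIndex != -1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)  # slashIndex, not verticalBarIndex
        uncompressedChars = []
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = ''.join(uncompressedChars)
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.find('/')            # next one
    return line

print(uncompress('a0|4b'))  # a0000b
```

The inner for loop could also be replaced by repeatedChar * timesRepeated, which builds the same run.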
The most common pattern I have seen is to use a list of characters. Lists are mutable and work as you describe above.
To create a list from a string
mystring = 'Hello'
mylist = list(mystring)
To create a string from a list
mystring = ''.join(mylist)
You should build a list of substrings as you go and join them at the end:
def uncompress(line):
    result = []
    chars = iter(line)
    prevchar = None  # last plain character seen, not yet written to result
    for curchar in chars:
        if curchar == '|':
            # current character is a pipe: repeat the previous character,
            # the number of repeats is the next (hex) character
            result.append(prevchar * int(next(chars), 16))
            prevchar = None
        elif curchar == '/':
            # current character is a slash: the number of repeats is
            # given by the next two (hex) characters
            result.append(prevchar * int(next(chars) + next(chars), 16))
            prevchar = None
        else:
            # no repeat marker: the previous character stands on its own
            if prevchar is not None:
                result.append(prevchar)
            prevchar = curchar
    # no more characters: flush the last plain character
    if prevchar is not None:
        result.append(prevchar)
    return ''.join(result)
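Another option is to let re.sub find each run and compute its replacement with a function, so no manual index bookkeeping is needed. A sketch that assumes the format from the question (a character followed by '|' and one hex digit, or '/' and two hex digits):

```python
import re

# A run: any character, then '|' + one hex digit or '/' + two hex digits
RUN = re.compile(r'(.)(?:\|([0-9a-fA-F])|/([0-9a-fA-F]{2}))')

def uncompress(line):
    def expand(m):
        count = m.group(2) or m.group(3)  # whichever alternative matched
        return m.group(1) * int(count, 16)
    return RUN.sub(expand, line)

print(uncompress('a0|4b'))  # a0000b
```

re.sub builds a new string in a single pass, which sidesteps the immutability concern entirely.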

Problems title-casing a string in Python

I have a name as a string, in this example "markus johansson".
I'm trying to code a program that makes 'm' and 'j' uppercase:
name = "markus johansson"
for i in range(1, len(name)):
    if name[0] == 'm':
        name[0] = "M"
    if name[i] == " ":
        count = name[i] + 1
    if count == 'j':
        name[count] = 'J'
I'm pretty sure this should work, but it gives me this error:
File "main.py", line 5, in <module>
name[0] = "M"
TypeError: 'str' object does not support item assignment
I know there is a library function called .title(), but I want to do "real programming".
How do I fix this?
I guess that what you're trying to achieve is:
from string import capwords
capwords(name)
Which yields:
'Markus Johansson'
EDIT: OK, I see you want to kick in an open door.
Here's a low-level implementation:
''.join([char.upper() if prev==' ' else char for char,prev in zip(name,' '+name)])
>>> "markus johansson".title()
'Markus Johansson'
Built in string methods are the way to go.
EDIT:
I see you want to reinvent the wheel. Any particular reason?
You can choose from any number of convoluted methods like:
' '.join(j[0].upper()+j[1:] for j in "markus johansson".split())
Standard Libraries are still the way to go.
string.capwords() (defined in string.py)
# Capitalize the words in a string, e.g. " aBc dEf " -> "Abc Def".
def capwords(s, sep=None):
    """capwords(s, [sep]) -> string
    Split the argument into words using split, capitalize each
    word using capitalize, and join the capitalized words using
    join. Note that this replaces runs of whitespace characters by
    a single space.
    """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))
str.title() (defined in stringobject.c)
PyDoc_STRVAR(title__doc__,
"S.title() -> string\n\
\n\
Return a titlecased version of S, i.e. words start with uppercase\n\
characters, all remaining cased characters have lowercase.");
static PyObject*
string_title(PyStringObject *self)
{
    char *s = PyString_AS_STRING(self), *s_new;
    Py_ssize_t i, n = PyString_GET_SIZE(self);
    int previous_is_cased = 0;
    PyObject *newobj = PyString_FromStringAndSize(NULL, n);
    if (newobj == NULL)
        return NULL;
    s_new = PyString_AsString(newobj);
    for (i = 0; i < n; i++) {
        int c = Py_CHARMASK(*s++);
        if (islower(c)) {
            if (!previous_is_cased)
                c = toupper(c);
            previous_is_cased = 1;
        } else if (isupper(c)) {
            if (previous_is_cased)
                c = tolower(c);
            previous_is_cased = 1;
        } else
            previous_is_cased = 0;
        *s_new++ = c;
    }
    return newobj;
}
str.title() in pure Python
class String(str):
    def title(self):
        s = []
        previous_is_cased = False
        for c in self:
            if c.islower():
                if not previous_is_cased:
                    c = c.upper()
                previous_is_cased = True
            elif c.isupper():
                if previous_is_cased:
                    c = c.lower()
                previous_is_cased = True
            else:
                previous_is_cased = False
            s.append(c)
        return ''.join(s)
Example:
>>> s = ' aBc dEf '
>>> import string
>>> string.capwords(s)
'Abc Def'
>>> s.title()
' Abc Def '
>>> s
' aBc dEf '
>>> String(s).title()
' Abc Def '
>>> String(s).title() == s.title()
True
Strings are immutable. They can't be changed. You must create a new string with the changed content.
If you want to make every 'j' uppercase:
def make_uppercase_j(char):
    if char == 'j':
        return 'J'
    else:
        return char
name = "markus johansson"
''.join(make_uppercase_j(c) for c in name)
If you're looking for a more generic solution for names, you should also look at the following examples:
John Adams-Smith
Joanne d'Arc
Jean-Luc de'Breu
Donatien Alphonse François de Sade
Also some parts of the names shouldn't start with capital letters, like:
Herbert von Locke
Sander van Dorn
Edwin van der Sad
so, if you're looking into creating a more generic solution, keep all those little things in mind.
(This would be a perfect place to do test-driven development, with all those conditions your method/function must satisfy.)
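A minimal sketch of that test-first approach, using a hypothetical title_name function and an assumed, deliberately small set of lowercase particles. It handles hyphenated names and particles such as "von" or "van der", but not apostrophes (Joanne d'Arc, Jean-Luc de'Breu), which would be the next tests to add:

```python
# Hypothetical helper for illustration only: splits on spaces and hyphens and
# keeps a small, assumed set of name particles in lowercase.
PARTICLES = {'von', 'van', 'der', 'de'}

def title_name(name):
    result = []
    for index, word in enumerate(name.split(' ')):
        if index > 0 and word in PARTICLES:
            result.append(word)  # "van", "der", ... stay lowercase
        else:
            # capitalise every hyphen-separated part: "adams-smith" -> "Adams-Smith"
            result.append('-'.join(p[:1].upper() + p[1:] for p in word.split('-')))
    return ' '.join(result)

# In TDD the tests come first; these use names from the lists above.
assert title_name('john adams-smith') == 'John Adams-Smith'
assert title_name('herbert von locke') == 'Herbert von Locke'
assert title_name('edwin van der sad') == 'Edwin van der Sad'
```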
If I understand your original algorithm correctly, this is what you want to do:
namn = list("markus johansson")
if namn[0] == 'm':
    namn[0] = "M"
count = 0
for i in range(1, len(namn)):
    if namn[i] == " ":
        count = i + 1
    if count and namn[count] == 'j':
        namn[count] = 'J'
print(''.join(namn))
Of course, there are a million better ways ("wannabe" ways) to do what you're trying to do, like as shown in vartec's answer. :)
As it stands, your code only works for names that start with an M and a J for the first and last names, respectively.
Plenty of good suggestions, so I'll be in good company adding my own 2 cents :-)
I'm assuming you want something a little more generic that can handle more than just names starting with 'm' and 'j'. You'll probably also want to consider hyphenated names (like Markus Johnson-Smith) which have caps after the hyphen too.
# Python 2 only: Python 3's string module has ascii_lowercase/ascii_uppercase instead
from string import lowercase, uppercase
name = 'markus johnson-smith'
state = 0
title_name = []
for c in name:
    if c in lowercase and not state:
        c = uppercase[lowercase.index(c)]
        state = 1
    elif c in [' ', '-']:
        state = 0
    else:
        state = 1 # might already be uppercase
    title_name.append(c)
print(''.join(title_name))
Last caveat is the potential for non-ASCII characters. Using the uppercase and lowercase properties of the string module is good in this case because their contents change depending on the user's locale (i.e. system-dependent, or when locale.setlocale() is called). I know you want to avoid using upper() for this exercise, and that's quite neat... As an FYI, upper() uses the locale controlled by setlocale() too, so using uppercase and lowercase is a good use of the API without getting too high-level. That said, if you need to handle, say, French names on a system running an English locale, you'll need a more robust implementation.
"real programming"?
I would use .title(), and I'm a real programmer.
Or I would use regular expressions
re.sub(r"(^|\s)[a-z]", lambda m: m.group(0).upper(), "this is a set of words")
This says "If the start of the text or a whitespace character is followed by a lower-case letter" (in English - other languages are likely not supported), then for each match convert the match text to upper-case. Since the match text is the space and the lower-case letter, this works just fine.
If you want it as low-level code then the following works. Here I only allow space as the separator (but you may want to support newline and other characters). On the other hand, "string.lowercase" is internationalized, so if you're in another locale then it will, for the most part, still work. If you don't want that then use string.ascii_lowercase.
import string
# Python 2: string.lowercase is locale-dependent (Python 3 only has string.ascii_lowercase)
def title(s):
    # Capitalize the first character (the "s and" guard avoids an IndexError on "")
    if s and s[0] in string.lowercase:
        s = s[0].upper() + s[1:]
    # Find spaces
    offset = 0
    while 1:
        offset = s.find(" ", offset)
        # Reached the end of the string or the
        # last character is a space
        if offset == -1 or offset == len(s)-1:
            break
        if s[offset+1:offset+2] in string.lowercase:
            # Is it followed by a lower-case letter?
            s = s[:offset+1] + s[offset+1].upper() + s[offset+2:]
            # Skip the space and the letter
            offset += 2
        else:
            # Nope, so start searching for the next space
            offset += 1
    return s
To elaborate on my comment to this answer, this question can only be an exercise for curiosity's sake. Real names have special capitalization rules: the "van der" in "Johannes Diderik van der Waals" is never capitalized, "Farrah Fawcett-Majors" has the "M", and "Cathal Ó hEochaidh" uses the non-ASCII Ó and h, which modify "Eochaidh" to mean "grandson of Eochaidh".
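You can see the problem directly with str.title() on the three names above (lowercased as input); the intended spellings are the ones given in the paragraph:

```python
examples = {
    "johannes diderik van der waals": "Johannes Diderik van der Waals",
    "farrah fawcett-majors": "Farrah Fawcett-Majors",
    "cathal \u00f3 heochaidh": "Cathal \u00d3 hEochaidh",
}
for raw, wanted in examples.items():
    got = raw.title()
    print(got, '(ok)' if got == wanted else '(expected: ' + wanted + ')')
```

Only the hyphenated name comes out right: str.title() turns "van der" into "Van Der" and "hEochaidh" into "Heochaidh".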
string = 'markus johansson'
string = ' '.join(substring[0].upper() + substring[1:] for substring in string.split(' '))
# string == 'Markus Johansson'
