How to explain the behavior of 'Abc123P'.istitle() in Python? - python

I cannot understand the results of Python's istitle() method in the cases marked 'Incomprehensible' below:
>>> # Comprehensible
...
>>> print 'Abc123'.istitle()
True
>>> # Incomprehensible
...
>>> print 'Abc123P'.istitle()
True
>>> # Comprehensible
...
>>> print 'This is 27Python'.istitle()
False
>>> # Comprehensible
...
>>> print 'ABc123D'.istitle()
False
>>> # Incomprehensible
...
>>> print 'Abc1D'.istitle()
True
The documentation of this method is:
"i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise."
I thought it might be some special string behavior, say, treating '1D' as the decimal '1', but that does not seem to be the case when I print it out:
>>> # Check
...
>>> print 'Abc1D'
Abc1D
>>> l = []
>>> l.extend('Abc1D')
>>> print l
['A', 'b', 'c', '1', 'D']
I really cannot understand it. Is this a bug in Python?
I'm using Python 2.7 on Windows 7 Enterprise 64bit.

Take Abc123P as an example.
Uppercase characters: A and P. A follows nothing while P follows a decimal digit which is uncased.
Lowercase characters: b and c. b follows A which is cased; c follows b which is also cased.
Thus, Abc123P follows the definition of istitle().

The implementation is here.
It's pretty easy to read. The trick is to realize that digits are neither uppercase nor lowercase, so they reset the previous_is_cased flag. The same goes for any other non-letter character: Abc&D -> True, ABc&D -> False.
For a simpler explanation, imagine replacing every non-letter character in your string with a space. istitle() returns the same result for the translated string as for the original.
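The rule described above can be sketched in pure Python. This is only an illustration of the logic, not the actual CPython source; the name istitle_sketch and the seen_cased variable are made up for this example, and the sketch ignores Unicode titlecase characters.

```python
def istitle_sketch(s):
    """Approximate str.istitle() for plain ASCII text (illustrative only)."""
    previous_is_cased = False  # was the previous character a letter?
    seen_cased = False         # has any letter been seen at all?
    for ch in s:
        if ch.isupper():
            # An uppercase letter may only follow an uncased character.
            if previous_is_cased:
                return False
            previous_is_cased = True
            seen_cased = True
        elif ch.islower():
            # A lowercase letter may only follow a cased character.
            if not previous_is_cased:
                return False
            previous_is_cased = True
            seen_cased = True
        else:
            # Digits, spaces and punctuation are uncased: reset the flag.
            previous_is_cased = False
    return seen_cased


for sample in ['Abc123', 'Abc123P', 'This is 27Python', 'ABc123D', 'Abc1D']:
    print(sample, istitle_sketch(sample))
```

In Abc123P the digit 3 resets the flag, so the P that follows is treated as the start of a new word, exactly as if it followed a space.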

Related

regex in python using OR for single or double quotation marks

I am trying to write regex in python for either single or double quotation marks from these examples:
animal="cat"
animal="horse"
animal='dog'
animal='cow'
My attempt with | comes up empty:
re.compile("animal=\"|'(.+?)\"|'").findall
Please help. Thanks
You can take advantage of back-reference:
r = re.compile(r"""animal=(["'])(.+?)\1""")
This guarantees that the opening and closing characters are the same.
It's time to test it:
assert r.search('animal="cat"').group(2) == "cat"
assert r.search('animal="horse"').group(2) == "horse"
assert r.search("animal='dog'").group(2) == "dog"
assert r.search("animal='cow'").group(2) == "cow"
Your logical OR does not apply to just ' and "; the | splits the whole pattern into separate alternatives. Use a character class instead:
>>> s="""animal="cat"
...
... animal="horse"
...
... animal='dog'
...
... animal='cow'"""
>>>
>>> re.findall(r"""animal=["'](.+?)["']""",s)
['cat', 'horse', 'dog', 'cow']
>>>
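The two answers differ when the value itself contains a quote character. The following sketch compares them; the sample line animal="it's" is an assumption added for illustration and is not part of the original question.

```python
import re

# Back-reference: the closing quote must be the same character as the opening one.
backref = re.compile(r"""animal=(["'])(.+?)\1""")
# Character class: any quote opens, any quote closes.
char_class = re.compile(r"""animal=["'](.+?)["']""")

text = '''animal="cat"
animal='dog'
animal="it's"'''

# Group 1 is the quote character, group 2 is the value.
backref_values = [m.group(2) for m in backref.finditer(text)]
class_values = char_class.findall(text)

print(backref_values)  # the last value keeps its apostrophe
print(class_values)    # the last value is cut off at the apostrophe
```

If the values never contain quotes, both patterns return the same list, and the character class is the shorter of the two.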

Checking two string in python?

Given two strings
s='chayote'
d='aceihkjouty'
all the characters in string s are present in d. Is there a built-in Python function to check this?
Thanks in advance
Using sets:
>>> set("chayote").issubset("aceihkjouty")
True
Or, equivalently:
>>> set("chayote") <= set("aceihkjouty")
True
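The set approach can be wrapped in a small helper. The name has_all_chars is hypothetical; this is a sketch that only checks which characters occur, not how many times each one occurs.

```python
def has_all_chars(s, d):
    """Return True if every distinct character of s also occurs in d."""
    # issubset accepts any iterable, so d does not need to be converted first.
    return set(s).issubset(d)


found = has_all_chars('chayote', 'aceihkjouty')
missing = has_all_chars('chayotes', 'aceihkjouty')  # 's' is not in d
print(found, missing)
```

Because sets discard duplicates, has_all_chars('aa', 'a') is also True. If repeated characters must be matched one for one, a different approach is needed.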
I believe you are looking for all and a generator expression:
>>> s='chayote'
>>> d='aceihkjouty'
>>> all(x in d for x in s)
True
>>>
The code will return True if all characters in string s can be found in string d.
Also, if string s contains duplicate characters, it would be more efficient to make it a set using set:
>>> s='chayote'
>>> d='aceihkjouty'
>>> all(x in d for x in set(s))
True
>>>
Try this, which prints each character of s that also occurs in d:
for i in s:
    if i in d:
        print i

How to get the first 2 letters of a string in Python?

Let's say I have a string
str1 = "TN 81 NZ 0025"
two = first2(str1)
print(two) # -> TN
How do I get the first two letters of this string? I need the first2 function for this.
It is as simple as string[:2]. If you need a function, it is easy to write:
def first2(s):
    return s[:2]
In general, you can get the characters of a string from i until j with string[i:j].
string[:2] is shorthand for string[0:2]. This works for lists as well.
Learn about Python's slice notation at the official tutorial
t = "your string"
Get the first N characters of a string with
def firstN(s, n=2):
    return s[:n]
which, with the default n, is equivalent to
t[:2]
Here's what the simple function would look like:
def firstTwo(string):
    return string[:2]
In Python, strings are sequences of characters. They are not of type list, just list-like (i.e. they can be indexed and sliced like a list). More formally, they are known as sequence types (see http://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange):
>>> a = 'foo bar'
>>> isinstance(a, list)
False
>>> isinstance(a, str)
True
Since strings are sequences, you can use slicing to access parts of them, written sequence[start_index:end_index]; see Explain Python's slice notation. For example:
>>> a = [1,2,3,4]
>>> a[0]
1 # first element, NOT a sequence.
>>> a[0:1]
[1] # a slice from first to second, a list, i.e. a sequence.
>>> a[0:2]
[1, 2]
>>> a[:2]
[1, 2]
>>> x = "foo bar"
>>> x[0:2]
'fo'
>>> x[:2]
'fo'
When omitted, the start defaults to 0 and the end defaults to len(sequence).
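A short sketch of these defaults, reusing the string from the question. The variable names are chosen for illustration only. It also shows that a slice end beyond the length of the string is clamped instead of raising an error.

```python
text = "TN 81 NZ 0025"

first_two = text[:2]    # start omitted, so it defaults to 0
same_thing = text[0:2]  # explicit start gives the same result
whole = text[:]         # both omitted: a copy of the entire string

# Slices are clamped to the available length, so short inputs are safe.
short = "T"[:2]
empty = ""[:2]

print(first_two, same_thing, repr(short), repr(empty))
```

Only indexing a single position, such as "T"[1], raises IndexError; slicing does not.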
In the olden C days, a string was an array of characters; the whole issue of dynamic vs. static lists sounds like legend now. See Python List vs. Array - when to use?
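Note that slicing does not raise an exception when the string is not long enough: 'a'[:2] simply returns 'a'. Only indexing a single position, such as 'a'[1], raises IndexError.
Another approach is to pad the string first:
'yourstring'.ljust(100)[:100].strip()
This gives you at most the first 100 characters, but the padding is not needed for safety.
Because strip() removes leading and trailing whitespace, you might get a shorter string if the first 100 characters begin or end with spaces.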
For completeness: Instead of using def you could give a name to a lambda function:
first2 = lambda s: s[:2]

python convert unicode to string

I fetched my results from SQLite with Python, and they come back as tuples like this: (u'PR:000017512',)
However, I want to print it as 'PR:000017512'. At first, I tried selecting the first item of the tuple with index [0], but the printed result was still u'PR:000017512'. Then I used str() to convert it, and nothing changed. How can I print this without the u''?
You're confusing the string representation with its value. When you print a unicode string the u doesn't get printed:
>>> foo=u'abc'
>>> foo
u'abc'
>>> print foo
abc
Update:
Since you're dealing with a tuple, you don't get off this easy: You have to print the members of the tuple:
>>> foo=(u'abc',)
>>> print foo
(u'abc',)
>>> # If the tuple really only has one member, you can just subscript it:
>>> print foo[0]
abc
>>> # Join is a more realistic approach when dealing with iterables:
>>> print '\n'.join(foo)
abc
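The difference between the representation and the value can be checked directly. This sketch assumes a one-element row shaped like the one in the question. It runs on Python 2 and on Python 3.3+, where the u prefix is accepted but no longer appears in the repr.

```python
row = (u'PR:000017512',)  # shape of a row as shown in the question

value = row[0]
value_repr = repr(value)  # includes quotes (and the u prefix on Python 2)
tuple_text = str(row)     # a tuple always shows the repr of its items

# The extra characters come from repr(), not from the data itself.
assert tuple_text == "(" + value_repr + ",)"
assert len(value) == len('PR:000017512')

print(value)           # prints the bare text
print(", ".join(row))  # joining also yields the bare text
```

So the u'' is never part of the stored string; it only shows up when the tuple, or the interactive prompt, displays the repr of the item.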
Don't see the problem:
>>> x = (u'PR:000017512',)
>>> print x
(u'PR:000017512',)
>>> print x[0]
PR:000017512
>>>
The string is a unicode object, but its value is still PR:000017512
Check out the docs on String literals
http://docs.python.org/2/reference/lexical_analysis.html#string-literals
In [22]: unicode('foo').encode('ascii','replace')
Out[22]: 'foo'

Alternative to python string item assignment

What is the best / correct way to emulate item assignment on a Python string?
i.e. s = "ABCDEFGH"; s[1] = 'a'; s[-1] = 'b'
The normal way raises: 'str' object does not support item assignment
Strings are immutable. That means you can't assign to them at all. You could use formatting:
>>> s = 'abc{0}efg'.format('d')
>>> s
'abcdefg'
Or concatenation:
>>> s = 'abc' + 'd' + 'efg'
>>> s
'abcdefg'
Or replacement (thanks Odomontois for reminding me):
>>> s = 'abc0efg'
>>> s.replace('0', 'd')
'abcdefg'
But keep in mind that all of these methods create copies of the string, rather than modifying it in-place. If you want in-place modification, you could use a bytearray -- though that will only work for plain ascii strings, as alexis points out.
>>> b = bytearray('abc0efg')
>>> b[3] = 'd'
>>> b
bytearray(b'abcdefg')
Or you could create a list of characters and manipulate that. This is probably the most efficient and correct way to do frequent, large-scale string manipulation:
>>> l = list('abc0efg')
>>> l[3] = 'd'
>>> l
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> ''.join(l)
'abcdefg'
And consider the re module for more complex operations.
String formatting and list manipulation are the two methods that are most likely to be correct and efficient IMO -- string formatting when only a few insertions are required, and list manipulation when you need to frequently update your string.
Since strings are "immutable", you get the effect of editing by constructing a modified version of the string and assigning it over the old value. If you want to replace or insert to a specific position in the string, the most array-like syntax is to use slices:
s = "ABCDEFGH"
s = s[:3] + 'd' + s[4:] # Change D to d at position 3
It's more likely that you want to replace a particular character or string with another. Do that with re, again collecting the result rather than modifying in place:
import re
s = "ABCDEFGH"
s = re.sub("DE", "--", s)
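The slicing idea can be wrapped in a small function that also accepts negative positions, as in the question. The name replace_at is hypothetical; this is a sketch, not a standard library function.

```python
def replace_at(s, index, new_char):
    """Return a copy of s with the character at index replaced."""
    if index < 0:
        index += len(s)  # translate -1 to the last position, and so on
    if not 0 <= index < len(s):
        raise IndexError("string index out of range")
    # Build a new string; the original is left untouched.
    return s[:index] + new_char + s[index + 1:]


s = "ABCDEFGH"
s = replace_at(s, 1, 'a')
s = replace_at(s, -1, 'b')
print(s)  # AaCDEFGb
```

Each call creates a new string, so for many edits on a long string the list-and-join approach is likely to be the better fit.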
I guess this Object could help:
class Charray(list):
    # Python 2 only: relies on long and __getslice__.
    def __init__(self, mapping=[]):
        "A character array."
        if type(mapping) in [int, float, long]:
            mapping = str(mapping)
        list.__init__(self, mapping)
    def __getslice__(self, i, j):
        return Charray(list.__getslice__(self, i, j))
    def __setitem__(self, i, x):
        # Only a single character may be assigned to a position.
        if type(x) != str or len(x) > 1:
            raise TypeError
        else:
            list.__setitem__(self, i, x)
    def __repr__(self):
        return "charray['%s']" % self
    def __str__(self):
        return "".join(self)
For example:
>>> carray = Charray("Stack Overflow")
>>> carray
charray['Stack Overflow']
>>> carray[:5]
charray['Stack']
>>> carray[-8:]
charray['Overflow']
>>> str(carray)
'Stack Overflow'
>>> carray[6] = 'z'
>>> carray
charray['Stack zverflow']
For s = "ABCDEFGH" with s[1] = 'a' and s[-1] = 'b', you can use slicing like this:
s = s[0:1] + 'a' + s[2:]
s = s[:-1] + 'b'
This is much simpler than the more complex approaches.
