Why is lowercase 'p' greater than uppercase 'P'? [duplicate] - python

This question already has answers here:
How are strings compared?
(7 answers)
Closed 13 days ago.
print 'Python' > 'python' # equals False
print 'python' > 'Python' # equals True
Can someone please explain how this is interpreted, since p is lowercase and P is uppercase? Yet 'p' is always greater than 'P'.
Tested on Python 2.7

String comparison in Python is case-sensitive and conventionally uppercase characters go before lowercase characters.
Python compares strings lexicographically, character by character, using the ASCII or Unicode code points of the characters. The same principle applies in Python 3.
In ASCII, and therefore in Unicode, lowercase letters are greater than all uppercase letters. Therefore, 'p' > 'P', and indeed, 'a' > 'Z'. In your case, "python" begins with the letter 'p', whereas "Python" begins with the uppercase letter 'P'. They begin with different code points; the lowercase variant is greater.
The convention that lowercase letters are greater than uppercase letters in ASCII is historical.
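The comparison described above can be checked directly. This is a minimal sketch, assuming Python 3 (the question used Python 2.7, where the results for these ASCII strings are the same); the variable names are illustrative only.

```python
# Code points of the two letters from the question.
upper_code = ord("P")
lower_code = ord("p")
print(upper_code, lower_code)   # 80 112

# Single characters compare by code point.
print("p" > "P")                # True, because 112 > 80
print("a" > "Z")                # True, because 97 > 90

# For whole strings, the first position where they differ decides.
lowercase_is_greater = "python" > "Python"
print(lowercase_is_greater)     # True
```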

It comes down to the Unicode values (code points) of the letters.
>>> ord('p')
112
>>> ord('P')
80
112 > 80, therefore 'p' > 'P'

Related

why "a" is bigger than "A" in python? [duplicate]

This question already has answers here:
How are strings compared?
(7 answers)
Closed 1 year ago.
print(4 > 5)
output is False
this is very easy to understand using basic math
print("a" > "A")
output is True
how does python compare a and A ?
Python string comparison is performed using the characters in both strings. The characters in both strings are compared one by one. When different characters are found then their Unicode value is compared. The character with lower Unicode value is considered to be smaller.
The Unicode value of 'A' is 65, whereas for 'a' it is 97.
The ord() function returns the Unicode value of a character.
ord('A') # returns 65
ord('a') # returns 97
ord('AA') # ERROR: ord() expects a string of length 1.
"a" ascii is 97 --> ord("a")
"A" ascii is 65 --> ord("A")
Hence:
print("a" > "A") --> True
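The character-by-character rule described in this answer can be sketched as a small function. The name compare_strings is hypothetical, and this is only an illustration of the idea, not how the interpreter is implemented. The tie-break for strings where one is a prefix of the other (the shorter one sorts first) is not stated in the answer above; it is added here as an assumption that matches Python's documented behaviour.

```python
def compare_strings(left, right):
    """Return -1, 0 or 1, mimicking how < and > order two strings."""
    for a, b in zip(left, right):
        if a != b:
            # First differing position: the code points decide.
            return -1 if ord(a) < ord(b) else 1
    # All shared positions are equal, so the shorter string sorts first.
    return (len(left) > len(right)) - (len(left) < len(right))


print(compare_strings("a", "A"))            # 1, i.e. "a" > "A"
print(compare_strings("Python", "python"))  # -1, i.e. "Python" < "python"
```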

Printing escape sequence character in python [duplicate]

This question already has answers here:
How to get the ASCII value of a character
(5 answers)
Closed 3 years ago.
I tried to print the escape sequence characters or the ASCII representation of numbers in Python in a for loop.
Like:
for i in range(100, 150):
    b = "\%d" %i
    print(b)
I expected the output like,
A
B
C
Or something.
But I got like,
\100
\101
How to print ASCII representation of the numbers?
Python has two built-in functions for this, ord and chr.
ord returns the code point of a character (its ASCII value, for ASCII characters), for example:
print(ord('h'))
The output of the above is 104
ord only accepts a string of length 1
chr is inverse of ord
print(chr(104))
The output of the above is 'h'
chr only accepts an integer; floats, strings, and bytes are not supported
chr and ord are really important if you want to make a translation of a text file (encoded text file)
You can use the ord() function to print the ASCII value of a character.
print(ord('b'))
> 98
Likewise, you can use the chr() function to print the ASCII character represented by a number.
print(chr(98))
> b
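Applied to the loop in the question, chr() replaces the "\%d" formatting, which cannot work because escape sequences are interpreted when the string literal is parsed, not after the number is substituted. The following is a sketch assuming Python 3; the range 65 to 69 is chosen here only because the question expected output starting at A.

```python
# Build the characters for a run of code points with chr().
codes = range(65, 70)
letters = [chr(code) for code in codes]

for code, letter in zip(codes, letters):
    print(code, letter)
# 65 A
# 66 B
# 67 C
# 68 D
# 69 E
```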

Is it possible to check if a Letter is in a Range of the ASCII Alphabet? [duplicate]

This question already has answers here:
How to detect lowercase letters in Python?
(6 answers)
Closed 4 years ago.
USR_PWD = raw_input ("Please Input A 10 Digit Password")
if USR_PWD[0] == chr(range(65,90))
print "True"
Line 2 does not work, I'm attempting to check and see if the input's first character is a capital letter (65 is A and 90 is Z). Not even sure if this is the best way to go about it either. I'm a beginner so I could be making a very easy mistake but, thanks for the help.
You shouldn't need to use chr. Just check the character is between 'A' and 'Z'.
if 'A' <= USR_PWD[0] <= 'Z':
    print "True"
You could also use if USR_PWD[0].isupper(), but that also returns true for lots of characters outside the A-Z range, like Œ.
If you want to know the ASCII code of a character you can use ord().
In this case your code looks like:
USR_PWD = raw_input ("Please Input A 10 Digit Password")
if ord(USR_PWD[0]) in range(65, 91):  # range() excludes its end, so 91 is needed to include 'Z' (90)
    print "True"
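For comparison, here is a Python 3 sketch of the range check from the first answer, wrapped in a function so that an empty input does not raise IndexError. The function name starts_with_ascii_capital is made up for this example.

```python
def starts_with_ascii_capital(text):
    """Return True only if the first character is one of A-Z."""
    # bool(text) guards against an empty string before indexing.
    return bool(text) and "A" <= text[0] <= "Z"


print(starts_with_ascii_capital("Password12"))  # True
print(starts_with_ascii_capital("password12"))  # False
print(starts_with_ascii_capital("Œuvre"))       # False: Œ is outside A-Z
print("Œ".isupper())                            # True: isupper() is not limited to ASCII
```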

Capitalize first letter in string if first character is not letter? [duplicate]

This question already has answers here:
python capitalize first letter only
(10 answers)
Closed 9 years ago.
I'd like to capitalize the first letter in a string. The string will be a hash (and therefore mostly numbers), so string.title() won't work, because a string like 85033ba6c would be changed to 85033Ba6C, not 85033Ba6c, because the number separates words, confusing title(). I'd like to capitalize the first letter of a string, no matter how far into the string the letter is. Is there a function for this?
Using re.sub with count:
>>> import re
>>> strs = '85033ba6c'
>>> re.sub(r'[A-Za-z]', lambda m: m.group(0).upper(), strs, count=1)
'85033Ba6c'
It is assumed in this answer that there is at least one character in the string where isalpha will return True (otherwise, this raises StopIteration)
i,letter = next(x for x in enumerate(myhash) if x[1].isalpha())
new_string = ''.join((myhash[:i],letter.upper(),myhash[i+1:]))
Here, I pick out the character (and index) of the first alpha character in the string. I turn that character into an uppercase character and I join the rest of the string with it.
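A variant of the same idea that returns the string unchanged when it contains no letter, instead of raising StopIteration, might look like the sketch below. The function name capitalize_first_letter is hypothetical.

```python
def capitalize_first_letter(text):
    """Uppercase the first alphabetic character, wherever it occurs."""
    for index, char in enumerate(text):
        if char.isalpha():
            # Rebuild the string around the uppercased letter.
            return text[:index] + char.upper() + text[index + 1:]
    # No alphabetic character at all: nothing to change.
    return text


print(capitalize_first_letter("85033ba6c"))  # 85033Ba6c
print(capitalize_first_letter("12345"))      # 12345
```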

Right justify string containing Thai characters

I would like to right justify strings containing Thai characters (Thai rendering doesn't work from left to right, but can go up and down as well).
For example, for the strings ไป (two characters, length 2) and ซื้อ (four characters, length 2) I want to have the following output (length 5):
...ไป
...ซื้อ
The naive
print 'ไป'.decode('utf-8').rjust(5)
print 'ซื้อ'.decode('utf-8').rjust(5)
however, respectively produce
...ไป
.ซื้อ
Any ideas how to get to the desired formatting?
EDIT:
Given a string of Thai characters tc, I want to determine how many [places/fields/positions/whatever you want to call it] the string uses. This is not the same as len(tc); len(tc) is usually larger than the number of places used. The second word gives len(tc) = 4, but has length 2 / uses 2 places / uses 2 positions.
Cause
Thai script contains normal characters (positive advance width) and non-spacing marks as well (zero advance width).
For example, in the word ซื้อ:
the first character is the initial consonant "SO SO",
then it has vowel mark SARA UEE,
then tone mark MAI THO,
and then the final pseudo-consonant O ANG
The problem is that characters 2 and 3 in the list above are zero-width ones.
In other words, they do not make the string "wider".
In yet other words, ซื้อ ("to buy") and ซอ ("fiddle") would have equal width of two character places (but string lengths of 4 and 2, correspondingly).
Solution
In order to calculate the "real" string length, one must skip zero-width characters.
Python-specific
The unicodedata module provides access to the Unicode Character Database (UCD) which defines character properties for all Unicode characters. The data contained in this database is compiled from the UCD version 8.0.0.
The unicodedata.category(unichr) method returns a General Category value; the two that matter here are:
"Lo" for normal character;
"Mn" for zero-width non-spacing marks;
The rest is obvious, simply filter out the latter ones.
Further info:
Unicode data for Thai script (scroll till the first occurrence of "THAI CHARACTER")
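The categories can be inspected per character. This sketch assumes Python 3, where str is already Unicode; the word is written with escape sequences so the example does not depend on the source file's encoding.

```python
import unicodedata

word = "\u0e0b\u0e37\u0e49\u0e2d"  # the word from the question, "to buy"

# General category of each code point in the word.
categories = [unicodedata.category(ch) for ch in word]
print(categories)        # ['Lo', 'Mn', 'Mn', 'Lo']

# Count only the characters that are not non-spacing marks.
spacing_count = sum(1 for cat in categories if cat != "Mn")
print(len(word), spacing_count)  # 4 2
```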
I think what you mean to ask is, how to determine the 'true' # of characters in เรือ, ไป, ซื้อ etc. (which are 3,2 and 2, respectively)
Unfortunately, here's how Python interprets these characters:
ไป
>>> 'ไป'
'\xe0\xb9\x84\xe0\xb8\x9b'
>>> len('ไป')
6
>>> len('ไป'.decode('utf-8'))
2
ซื้อ
>>> 'ซื้อ'
'\xe0\xb8\x8b\xe0\xb8\xb7\xe0\xb9\x89\xe0\xb8\xad'
>>> len('ซื้อ')
12
>>> len('ซื้อ'.decode('utf-8'))
4
เรือ
>>> 'เรือ'
'\xe0\xb9\x80\xe0\xb8\xa3\xe0\xb8\xb7\xe0\xb8\xad'
>>> len('เรือ')
12
>>> len('เรือ'.decode('utf-8'))
4
There's no real correlation between the # of characters displayed and the # of actual (from Python's perspective) characters that make up the string.
I can't think of an obvious way to do this. However, I've found this library, which might be of help to you. (You will also need to install some prerequisites.)
It looks like the rjust() function will not work for you and you will need to count the number of cells in the string yourself. You can then insert the number of spaces required before the string to achieve justification.
You seem to know about Thai language. Sum the number of consonants, preceding vowels, following vowels and Thai punctuation. Don't count diacritics and above and below vowels.
Something like this (string must be a unicode string, not a UTF-8 byte string):
cells = 0
for ch in string:
    if ch == u'\u0e31' or u'\u0e34' <= ch <= u'\u0e3a' or u'\u0e47' <= ch <= u'\u0e4e':
        # diacritic or above/below vowel: takes no cell of its own
        pass
    else:
        # consonant, preceding or following vowel or punctuation
        cells += 1
Here's a function to compute the length of a thai string (the number of characters arranged horizontally), based on bytebuster's answer
import unicodedata
def get_thai_string_length(string):
    length = 0
    for c in string:
        if unicodedata.category(c) != 'Mn':
            length += 1
    return length
print(len('บอินทัช'))
print(get_thai_string_length('บอินทัช'))
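Building on that length function, right justification as asked in the question could be sketched as follows. The names display_width and rjust_display are made up for this example, Python 3 is assumed, and the approach only discounts non-spacing marks (category "Mn"), so it is an approximation rather than a full display-width calculation.

```python
import unicodedata


def display_width(text):
    """Count characters that occupy a cell, skipping non-spacing marks."""
    return sum(1 for ch in text if unicodedata.category(ch) != "Mn")


def rjust_display(text, width, fillchar=" "):
    """Like str.rjust(), but pads according to display_width()."""
    padding = max(0, width - display_width(text))
    return fillchar * padding + text


go = "\u0e44\u0e1b"               # first word from the question, 2 cells
buy = "\u0e0b\u0e37\u0e49\u0e2d"  # second word, 4 code points but 2 cells

print(rjust_display(go, 5, "."))   # three dots, then the word
print(rjust_display(buy, 5, "."))  # also three dots, then the word
```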
