Python UTF-8 Lowercase Turkish Specific Letter

Python UTF-8 Lowercase Turkish Specific Letter - python

with using python 2.7:
>myCity = 'Isparta'
>myCity.lower()
>'isparta'
#-should be-
>'ısparta'
tried some decoding, (like, myCity.decode("utf-8").lower()) but could not find how to do it.
how can lower this kinds of letters? ('I' > 'ı', 'İ' > 'i' etc)
EDIT: In Turkish, lower case of 'I' is 'ı'. Upper case of 'i' is 'İ'

Some have suggested using the tr_TR.utf8 locale. At least on Ubuntu, perhaps related to this bug, setting this locale does not produce the desired result:
import locale
locale.setlocale(locale.LC_ALL, 'tr_TR.utf8')
myCity = u'Isparta İsparta'
print(myCity.lower())
# isparta isparta
So if this bug affects you, as a workaround you could perform this translation yourself:
lower_map = {
ord(u'I'): u'ı',
ord(u'İ'): u'i',
}
myCity = u'Isparta İsparta'
lowerCity = myCity.translate(lower_map)
print(lowerCity)
# ısparta isparta
prints
ısparta isparta

You should use new derived class from unicode from emre's solution
class unicode_tr(unicode):
CHARMAP = {
"to_upper": {
u"ı": u"I",
u"i": u"İ",
},
"to_lower": {
u"I": u"ı",
u"İ": u"i",
}
}
def lower(self):
for key, value in self.CHARMAP.get("to_lower").items():
self = self.replace(key, value)
return self.lower()
def upper(self):
for key, value in self.CHARMAP.get("to_upper").items():
self = self.replace(key, value)
return self.upper()
if __name__ == '__main__':
print unicode_tr("kitap").upper()
print unicode_tr("KİTAP").lower()
Gives
KİTAP
kitap
This must solve your problem.

You can just use .replace() function before changing to upper/lower. In your case:
myCity.replace('I', 'ı').lower()

I forked and redesigned Emre's solution by monkey-patching method to built-in unicode module. The advantage of this new approach is no need to use a subclass of unicode and
redefining unicode strings by my_unicode_string = unicode_tr(u'bla bla bla')
Just importing this module, integrates seamlessly with builtin native unicode strings
https://github.com/technic-programming/unicode_tr
# -*- coding: utf8 -*-
# Redesigned by #guneysus
import __builtin__
from forbiddenfruit import curse
lcase_table = tuple(u'abcçdefgğhıijklmnoöprsştuüvyz')
ucase_table = tuple(u'ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ')
def upper(data):
data = data.replace('i',u'İ')
data = data.replace(u'ı',u'I')
result = ''
for char in data:
try:
char_index = lcase_table.index(char)
ucase_char = ucase_table[char_index]
except:
ucase_char = char
result += ucase_char
return result
def lower(data):
data = data.replace(u'İ',u'i')
data = data.replace(u'I',u'ı')
result = ''
for char in data:
try:
char_index = ucase_table.index(char)
lcase_char = lcase_table[char_index]
except:
lcase_char = char
result += lcase_char
return result
def capitalize(data):
return data[0].upper() + data[1:].lower()
def title(data):
return " ".join(map(lambda x: x.capitalize(), data.split()))
curse(__builtin__.unicode, 'upper', upper)
curse(__builtin__.unicode, 'lower', lower)
curse(__builtin__.unicode, 'capitalize', capitalize)
curse(__builtin__.unicode, 'title', title)
if __name__ == '__main__':
print u'istanbul'.upper()
print u'İSTANBUL'.lower()

You need to set the proper locale (I'm guessing tr-TR) with locale.setLocale(). Otherwise the default upper-lower mappings will be used, and if that default is en-US, the lowercase version of I is i.

Related

Using SendMessage in python to set text in foreground window

We have an old, legacy database that needs input from another system. SendInput method of data input into database forms is slow and unreliable, setting clipboard and then ^v is not reliable either (I have no idea why, but database interface is very old, early 2000s). After a lot of fiddling I discovered that using SendMessage to set text and then sending VK_RETURN is fast (much faster than SendInput/keybd_event) and reliable with our database. Now this code in plain C works:
HWND fghwnd = GetForegroundWindow();
DWORD threadId = GetWindowThreadProcessId(fghwnd, NULL);
DWORD myId = GetCurrentThreadId();
if (AttachThreadInput(myId, threadId, true)) {
HWND ctrl = GetFocus();
SendMessage(ctrl, WM_SETTEXT, 0, (LPARAM) sendbuf); // TESTING
PostMessage(ctrl, WM_KEYDOWN, VK_RETURN, 0);
PostMessage(ctrl, WM_KEYUP, VK_RETURN, 0);
AttachThreadInput(myId, threadId, false);
} else {
printf("\nError: AttachThreadInput failure!\n");
}
But this one in python does not:
foregroundHwnd = win32gui.GetForegroundWindow()
foregroundThreadID = win32process.GetWindowThreadProcessId(foregroundHwnd)[0]
ourThreadID = win32api.GetCurrentThreadId()
if foregroundThreadID != ourThreadID:
win32process.AttachThreadInput(foregroundThreadID, ourThreadID, True)
focus_whd = win32gui.GetFocus()
win32gui.SendMessage(focus_whd, win32con.WM_SETTEXT, None, "test text")
win32gui.PostMessage(focus_whd, win32con.WM_KEYDOWN, win32con.VK_RETURN, None)
win32gui.PostMessage(focus_whd, win32con.WM_KEYUP, win32con.VK_RETURN, None)
win32process.AttachThreadInput(foregroundThreadID, ourThreadID, False)
The trouble is, most of our new logic in python. I turned that C code into a small python module and it works, but as result now I've got dependency on Microsoft's huge compiler and a lot of fiddling with module building. I'd like to have a python-only solution.
Any ideas why this python code does not work? These system calls look the same...

Yes, AttachThreadInput failed. According to the comment here https://toster.ru/q/79336 win32process.GetWindowThreadProcessId returns wrong value, ctypes must be used. This code works:
"""
Fast "paste" implemented via calls to Windows internals, sends parameter
string and RETURN after that
Usage:
from paste import paste
paste("test")
"""
import time
import random
import string
from ctypes import windll
import ctypes
import win32con
def random_string(string_length=10):
"""Generate a random string of fixed length """
letters = string.ascii_lowercase
return ''.join(random.choice(letters) for i in range(string_length))
ERROR_INVALID_PARAMETER = 87
def paste(text_to_paste):
"""Fast "paste" using WM_SETTEXT method + Enter key"""
current_hwnd = windll.user32.GetForegroundWindow()
current_thread_id = windll.kernel32.GetCurrentThreadId()
thread_process_id = windll.user32.GetWindowThreadProcessId(current_hwnd, None)
if thread_process_id != current_thread_id:
res = windll.user32.AttachThreadInput(thread_process_id, current_thread_id, True)
# ERROR_INVALID_PARAMETER means that the two threads are already attached.
if res == 0 and ctypes.GetLastError() != ERROR_INVALID_PARAMETER:
print("WARN: could not attach thread input to thread {0} ({1})"
.format(thread_process_id, ctypes.GetLastError()))
return
focus_whd = windll.user32.GetFocus()
windll.user32.SendMessageW(focus_whd, win32con.WM_SETTEXT, None, text_to_paste)
windll.user32.PostMessageW(focus_whd, win32con.WM_KEYDOWN, win32con.VK_RETURN, None)
windll.user32.PostMessageW(focus_whd, win32con.WM_KEYUP, win32con.VK_RETURN, None)
res = windll.user32.AttachThreadInput(thread_process_id, current_thread_id, True)
if __name__ == '__main__':
time.sleep(5) # time to switch to the target
# paste random 150 char string
paste(random_string(150))

How check application verision information in PySide

I created EXE file with Python (PySide) + PyInstaller. Once I try to use
print QtGui.QApplication.applicationVersion()
I don't see valid version in x.x.x.x format of the application.
Are there any built-in functions in PySide instead of this, or maybe should I use other library for it?
PS. I don't believe that Python doesn't have any methods to extract information about EXE :)

Have a look here.
There's everything you're looking for !
Hope this helps.

Found one way to do it:
import win32api
def get_version_info():
try:
filename = APP_FILENAME
return get_file_properties(filename)['FileVersion']
except BaseException, err:
log_error(err.message)
return '0.0.0.0'
def get_file_properties(filename):
"""
Read all properties of the given file return them as a dictionary.
"""
propNames = ('Comments', 'InternalName', 'ProductName',
'CompanyName', 'LegalCopyright', 'ProductVersion',
'FileDescription', 'LegalTrademarks', 'PrivateBuild',
'FileVersion', 'OriginalFilename', 'SpecialBuild')
props = {'FixedFileInfo': None, 'StringFileInfo': None, 'FileVersion': '0.0.0.0'}
try:
# backslash as parm returns dictionary of numeric info corresponding to VS_FIXEDFILEINFO struc
print filename
fixedInfo = win32api.GetFileVersionInfo(filename, '\\')
props['FixedFileInfo'] = fixedInfo
props['FileVersion'] = "%d.%d.%d.%d" % (fixedInfo['FileVersionMS'] / 65536,
fixedInfo['FileVersionMS'] % 65536, fixedInfo['FileVersionLS'] / 65536,
fixedInfo['FileVersionLS'] % 65536)
# \VarFileInfo\Translation returns list of available (language, codepage)
# pairs that can be used to retreive string info. We are using only the first pair.
lang, codepage = win32api.GetFileVersionInfo(filename, '\\VarFileInfo\\Translation')[0]
# any other must be of the form \StringfileInfo\%04X%04X\parm_name, middle
# two are language/codepage pair returned from above
strInfo = {}
for propName in propNames:
strInfoPath = u'\\StringFileInfo\\%04X%04X\\%s' % (lang, codepage, propName)
## print str_info
strInfo[propName] = win32api.GetFileVersionInfo(filename, strInfoPath)
props['StringFileInfo'] = strInfo
except BaseException, err:
log_error(err.message)
return props

decoding base64 guid in python

I am trying to convert a base64 string back to a GUID style hex number in python and having issues.
Base64 encoded string is: bNVDIrkNbEySjZ90ypCLew==
And I need to get it back to: 2243d56c-0db9-4c6c-928d-9f74ca908b7b
I can do it with the following PHP code but can't work out how to to it in Python
function Base64ToGUID($guid_b64) {
$guid_bin = base64_decode($guid_b64);
return join('-', array(
bin2hex(strrev(substr($guid_bin, 0, 4))),
bin2hex(strrev(substr($guid_bin, 4, 2))),
bin2hex(strrev(substr($guid_bin, 6, 2))),
bin2hex(substr($guid_bin, 8, 2)),
bin2hex(substr($guid_bin, 10, 6))
));
}
Here is the GUIDtoBase64 version:
function GUIDToBase64($guid) {
$guid_b64 = '';
$guid_parts = explode('-', $guid);
foreach ($guid_parts as $k => $part) {
if ($k < 3)
$part = join('', array_reverse(str_split($part, 2)));
$guid_b64 .= pack('H*', $part);
}
return base64_encode($guid_b64);
}
Here are some of the results using some of the obvious and not so obvious options:
import base64
import binascii
>>> base64.b64decode("bNVDIrkNbEySjZ90ypCLew==")
'l\xd5C"\xb9\rlL\x92\x8d\x9ft\xca\x90\x8b{'
>>> binascii.hexlify(base64.b64decode("bNVDIrkNbEySjZ90ypCLew=="))
'6cd54322b90d6c4c928d9f74ca908b7b'

Python port of the existing function (bitstring required)
import bitstring, base64
def base64ToGUID(b64str):
s = bitstring.BitArray(bytes=base64.b64decode(b64str)).hex
def rev2(s_):
def chunks(n):
for i in xrange(0, len(s_), n):
yield s_[i:i+n]
return "".join(list(chunks(2))[::-1])
return "-".join([rev2(s[:8]),rev2(s[8:][:4]),rev2(s[12:][:4]),s[16:][:4],s[20:]])
assert base64ToGUID("bNVDIrkNbEySjZ90ypCLew==") == "2243d56c-0db9-4c6c-928d-9f74ca908b7b"

First off, the b64 string and the resultant GUID doesn't match if we decode properly.
>>> import uuid
>>> import base64
>>> u = uuid.UUID("2243d56c-0db9-4c6c-928d-9f74ca908b7b")
>>> u
UUID('2243d56c-0db9-4c6c-928d-9f74ca908b7b')
>>> u.bytes
'"C\xd5l\r\xb9Ll\x92\x8d\x9ft\xca\x90\x8b{'
>>> base64.b64encode(u.bytes)
'IkPVbA25TGySjZ90ypCLew=='
>>> b = base64.b64decode('bNVDIrkNbEySjZ90ypCLew==')
>>> u2 = uuid.UUID(bytes=b)
>>> print u2
6cd54322-b90d-6c4c-928d-9f74ca908b7b
The base64 encoded version of the resultant GUID you posted is wrong. I'm not sure I understand the way you're encoding the GUID in the first place.
Python has in its arsenal all the tools required for you to be able to answer this problem. However, here's the rough scratching I did in a python terminal:
import uuid
import base64
base64_guid = "bNVDIrkNbEySjZ90ypCLew=="
bin_guid = base64.b64decode(base64_guid)
guid = uuid.UUID(bytes=bin_guid)
print guid
This code should give you enough of a hint to build your own functions. Don't forget, the python shell gives you a powerful tool to test and play with code and ideas. I would investigate using something like IPython notebooks.

I needed to do this to decode a BASE64 UUID that had been dumped from Mongodb. Originally the field had been created by Mongoose. The code I used, based on the code by #tpatja is here:
def base64ToGUID(b64str):
try:
bytes=base64.urlsafe_b64decode(b64str)
except Exception as e:
print("Can't decode base64 ", e)
s = bitstring.BitArray(bytes).hex
return "-".join([s[:8],s[8:][:4],s[12:][:4],s[16:][:4],s[20:]])

Based on good answers above, I wrote a version that does not require the bitstring package and includes validations and support for more input options.
import base64
import regex
import uuid
from typing import Optional
def to_uuid(obj) -> Optional[uuid.UUID]:
if obj is None:
return None
elif isinstance(obj, uuid.UUID):
return obj
elif isinstance(obj, str):
if regex.match(r'[0-9a-fA-F]{8}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{4}[-]{0,1}[0-9a-fA-F]{12}', obj):
return uuid.UUID(hex=obj)
elif regex.match(r'[0-9a-zA-Z\+\/]{22}[\=]{2}', obj):
b64_str = base64.b64decode(obj).hex()
uid_str = '-'.join([b64_str[:8], b64_str[8:][:4], b64_str[12:][:4], b64_str[16:][:4], b64_str[20:]])
return uuid.UUID(hex=uid_str)
raise ValueError(f'{obj} is not a valid uuid/guid')
else:
raise ValueError(f'{obj} is not a valid uuid/guid')

Extracting text from highlighted annotations in a PDF file

Since yesterday I'm trying to extract the text from some highlighted annotations in one pdf, using python-poppler-qt4.
According to this documentation, looks like I have to get the text using the Page.text() method, passing a Rectangle argument from the higlighted annotation, which I get using Annotation.boundary(). But I get only blank text. Can someone help me? I copied my code below and added a link for the PDF I am using. Thanks for any help!
import popplerqt4
import sys
import PyQt4
def main():
doc = popplerqt4.Poppler.Document.load(sys.argv[1])
total_annotations = 0
for i in range(doc.numPages()):
page = doc.page(i)
annotations = page.annotations()
if len(annotations) > 0:
for annotation in annotations:
if isinstance(annotation, popplerqt4.Poppler.Annotation):
total_annotations += 1
if(isinstance(annotation, popplerqt4.Poppler.HighlightAnnotation)):
print str(page.text(annotation.boundary()))
if total_annotations > 0:
print str(total_annotations) + " annotation(s) found"
else:
print "no annotations found"
if __name__ == "__main__":
main()
Test pdf:
https://www.dropbox.com/s/10plnj67k9xd1ot/test.pdf

Looking at the documentation for Annotations it seems that the boundary property Returns this annotation's boundary rectangle in normalized coordinates. Although this seems a strange decision we can simply scale the coordinates by the page.pageSize().width() and .height() values.
import popplerqt4
import sys
import PyQt4
def main():
doc = popplerqt4.Poppler.Document.load(sys.argv[1])
total_annotations = 0
for i in range(doc.numPages()):
#print("========= PAGE {} =========".format(i+1))
page = doc.page(i)
annotations = page.annotations()
(pwidth, pheight) = (page.pageSize().width(), page.pageSize().height())
if len(annotations) > 0:
for annotation in annotations:
if isinstance(annotation, popplerqt4.Poppler.Annotation):
total_annotations += 1
if(isinstance(annotation, popplerqt4.Poppler.HighlightAnnotation)):
quads = annotation.highlightQuads()
txt = ""
for quad in quads:
rect = (quad.points[0].x() * pwidth,
quad.points[0].y() * pheight,
quad.points[2].x() * pwidth,
quad.points[2].y() * pheight)
bdy = PyQt4.QtCore.QRectF()
bdy.setCoords(*rect)
txt = txt + unicode(page.text(bdy)) + ' '
#print("========= ANNOTATION =========")
print(unicode(txt))
if total_annotations > 0:
print str(total_annotations) + " annotation(s) found"
else:
print "no annotations found"
if __name__ == "__main__":
main()
Additionally, I decided to concatenate the .highlightQuads() to get a better representation of what was actually highlighted.
Please be aware of the explicit <space> I have appended to each quad region of text.
In the example document the returned QString could not be passed directly to print() or str(), the solution to this was to use unicode() instead.
I hope this helps someone as it helped me.
Note: Page rotation may affect the scaling values, I have not been able to test this.

Handling Python program arguments in a json file

I am a Python re-newbie. I would like advice on handling program parameters which are in a file in json format. Currently, I am doing something like what is shown below, however, it seems too wordy, and the idea of typing the same literal string multiple times (sometimes with dashes and sometimes with underscores) seems juvenile - error prone - stinky... :-) (I do have many more parameters!)
#!/usr/bin/env python
import sys
import os
import json ## for control file parsing
# control parameters
mpi_nodes = 1
cluster_size = None
initial_cutoff = None
# ...
#process the arguments
if len(sys.argv) != 2:
raise Exception(
"""Usage:
run_foo <controls.json>
Where:
<control.json> is a dictionary of run parameters
"""
)
# We expect a .json file with our parameters
controlsFileName = sys.argv[1]
err = ""
err += "" #validateFileArgument(controlsFileName, exists=True)
# read in the control parameters from the .json file
try:
controls = json.load(open(controlsFileName, "r"))
except:
err += "Could not process the file '" + controlsFileName + "'!\n"
# check each control parameter. The first one is optional
if "mpi-nodes" in controls:
mpi_nodes = controls["mpi-nodes"]
else:
mpi_nodes = controls["mpi-nodes"] = 1
if "cluster-size" in controls:
cluster_size = controls["cluster-size"]
else:
err += "Missing control definition for \"cluster-size\".\n"
if "initial-cutoff" in controls:
initial_cutoff = controls["initial-cutoff"]
else:
err += "Missing control definition for \"initial-cutoff\".\n"
# ...
# Quit if any of these things were not true
if len(err) > 0:
print err
exit()
#...
This works, but it seems like there must be a better way. I am stuck with the requirements to use a json file and to use the hyphenated parameter names. Any ideas?
I was looking for something with more static binding. Perhaps this is as good as it gets.

Usually, we do things like this.
def get_parameters( some_file_name ):
source= json.loads( some_file_name )
return dict(
mpi_nodes= source.get('mpi-nodes',1),
cluster_size= source['cluster-size'],
initial_cutoff = source['initial-cutoff'],
)
controlsFileName= sys.argv[1]
try:
params = get_parameters( controlsFileName )
except IOError:
print "Could not process the file '{0}'!".format( controlsFileName )
sys.exit( 1 )
except KeyError, e:
print "Missing control definition for '{0}'.".format( e.message )
sys.exit( 2 )
A the end params['mpi_nodes'] has the value of mpi_nodes
If you want a simple variable, you do this. mpi_nodes = params['mpi_nodes']
If you want a namedtuple, change get_parameters like this
def get_parameters( some_file_name ):
Parameters= namedtuple( 'Parameters', 'mpi_nodes, cluster_size, initial_cutoff' )
return Parameters( source.get('mpi-nodes',1),
source['cluster-size'],
source['initial-cutoff'],
)
I don't know if you'd find that better or not.

the argparse library is nice, it can handle most of the argument parsing and validation for you as well as printing pretty help screens
[1] http://docs.python.org/dev/library/argparse.html
I will knock up a quick demo showing how you'd want to use it this arvo.

Assuming you have many more parameters to process, something like this could work:
def underscore(s):
return s.replace('-','_')
# parameters with default values
for name, default in (("mpi-nodes", 1),):
globals()[underscore(name)] = controls.get(name, default)
# mandatory parameters
for name in ("cluster-size", "initial-cutoff"):
try:
globals()[underscore(name)] = controls[name]
except KeyError:
err += "Missing control definition for %r" % name
Instead of manipulating globals, you can also make this more explicit:
def underscore(s):
return s.replace('-','_')
settings = {}
# parameters with default values
for name, default in (("mpi-nodes", 1),):
settings[underscore(name)] = controls.get(name, default)
# mandatory parameters
for name in ("cluster-size", "initial-cutoff"):
try:
settings[underscore(name)] = controls[name]
except KeyError:
err += "Missing control definition for %r" % name
# print out err if necessary
mpi_nodes = settings['mpi_nodes']
cluster_size = settings['cluster_size']
initial_cutoff = settings['initial_cutoff']

I learned something from all of these responses - thanks! I would like to get feedback on my approach which incorporates something from each suggestion. In addition to the conditions imposed by the client, I want something:
1) that is fairly obvious to use and to debug
2) that is easy to maintain and modify
I decided to incorporate str.replace, namedtuple, and globals(), creating a ControlParameters namedtuple in the globals() namespace.
#!/usr/bin/env python
import sys
import os
import collections
import json
def get_parameters(parameters_file_name ):
"""
Access all of the control parameters from the json filename given. A
variable of type namedtuple named "ControlParameters" is injected
into the global namespace. Parameter validation is not performed. Both
the names and the defaults, if any, are defined herein. Parameters not
found in the json file will get values of None.
Parameter usage example: ControlParameters.cluster_size
"""
parameterValues = json.load(open(parameters_file_name, "r"))
Parameters = collections.namedtuple( 'Parameters',
"""
mpi_nodes
cluster_size
initial_cutoff
truncation_length
"""
)
parameters = Parameters(
parameterValues.get(Parameters._fields[0].replace('_', '-'), 1),
parameterValues.get(Parameters._fields[1].replace('_', '-')),
parameterValues.get(Parameters._fields[2].replace('_', '-')),
parameterValues.get(Parameters._fields[3].replace('_', '-'))
)
globals()["ControlParameters"] = parameters
#process the program argument(s)
err = ""
if len(sys.argv) != 2:
raise Exception(
"""Usage:
foo <control.json>
Where:
<control.json> is a dictionary of run parameters
"""
)
# We expect a .json file with our parameters
parameters_file_name = sys.argv[1]
err += "" #validateFileArgument(parameters_file_name, exists=True)
if err == "":
get_parameters(parameters_file_name)
cp_dict = ControlParameters._asdict()
for name in ControlParameters._fields:
if cp_dict[name] == None:
err += "Missing control parameter '%s'\r\n" % name
print err
print "Done"

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python UTF-8 Lowercase Turkish Specific Letter - python

You can just use .replace() function before changing to upper/lower. In your case: myCity.replace('I', 'ı').lower()

You need to set the proper locale (I'm guessing tr-TR) with locale.setLocale(). Otherwise the default upper-lower mappings will be used, and if that default is en-US, the lowercase version of I is i.

Related

Using SendMessage in python to set text in foreground window

How check application verision information in PySide

decoding base64 guid in python

Extracting text from highlighted annotations in a PDF file

Handling Python program arguments in a json file

Categories

Resources