I'm querying a database which, from MySQL Workbench, returns the following value:
Vitória da Conquista
which should be displayed as:
Vitória da Conquista
No matter what I've tried, I can't convert 'Vit\xc3\xb3ria da Conquista' into 'Vitória da Conquista'.
#Querying MySQL "world" database
print "====================================="
query = 'select name from city where id=283;'
cursor.execute(query)
cities = cursor.fetchall()
print cities
for city in cities:
    cs = str(city)
    cs = cs[3:-3].decode('utf-8')
    print cs
    print cs.decode('utf-8')
    print cs.encode('ascii','ignore')
the output of which looks like:
=====================================
[(u'Vit\xc3\xb3ria da Conquista',)]
Vit\xc3\xb3ria da Conquista
Vit\xc3\xb3ria da Conquista
Vit\xc3\xb3ria da Conquista
Well, this actually worked, though I'm not sure why. I am getting the correct value of Vitória da Conquista, but I would like to understand what is happening.
#Querying MySQL "world" database
query = 'SELECT CONVERT(CAST(Name as BINARY) USING utf8) from city where id = 283;'
cursor.execute(query)
cities = cursor.fetchall()
for tup in cities:
    cs = tup[0]
    print cs
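If I had to guess at what that CONVERT/CAST is doing (this is an assumption based on the escaped output above, not something I can confirm from here): the column bytes are UTF-8, but they come back through a latin-1 style decode, so the driver hands you a unicode string whose characters are really the raw UTF-8 bytes. CAST(... AS BINARY) drops that wrong interpretation and CONVERT(... USING utf8) re-decodes the bytes properly. The same repair can be sketched on the Python 2 side:
# Sketch only: undo a latin-1/UTF-8 mix-up on the client side (Python 2).
mangled = u'Vit\xc3\xb3ria da Conquista'           # what fetchall() handed back
fixed = mangled.encode('latin-1').decode('utf-8')  # back to raw bytes, then decode as UTF-8
print fixed                                        # Vitória da Conquista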
If the data coming in is UTF-8 (which it looks like it is), use unicode() (in Python 2) to convert it from bytes to a Python Unicode string:
cs = unicode(cs[3:-3], "utf-8")
Basic rule: inside your code, always use Unicode strings. Decode input data with unicode() and encode output data with encode().
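A minimal sketch of that rule (Python 2; the byte string below is just an illustrative value, not your actual data):
raw = 'Vit\xc3\xb3ria da Conquista'    # bytes arriving from outside (UTF-8 encoded)
text = unicode(raw, 'utf-8')           # decode once at the input boundary
# ... work with `text` (a unicode object) everywhere inside the program ...
out = text.encode('utf-8')             # encode once at the output boundary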
You are getting unicode strings back, stored in a list of tuples, which is what fetchall returns. So you don't need to encode or decode at all. Just try this:
#Querying MySQL "world" database
print "====================================="
query = 'select name from city where id=283;'
cursor.execute(query)
cities = cursor.fetchall()
for tup in cities:
    cs = tup[0]
    print cs
If this doesn't print correctly, then you probably have issues with your terminal, as mentioned by @Jarrod Roberson. The only other possibility is that the data was entered into, or is being returned from, the database with the wrong (unexpected) encoding.
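If you suspect the terminal, a quick check (just a sketch, assuming a UTF-8 capable terminal) is to see what encoding Python thinks stdout has, and to encode explicitly when printing:
import sys
print sys.stdout.encoding     # often 'UTF-8' on a correctly configured terminal, None when piped
print cs.encode('utf-8')      # encode explicitly for a UTF-8 terminal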
I'm trying, but failing, to use LIKE in my query.
This works fine, using ? and =:
def read_part_of_database(table_column, user_query):
    c_lime.execute("SELECT * FROM lime_database_table "
                   "WHERE {} = ? ORDER BY time_start".format(table_column), [user_query])
    for row in c_lime.fetchall():
        print(row)
But with this, the user needs to input the exact and full query as it is presented in the database. So instead I want to use LIKE. I have tried the following, but none of it seems to work:
c_lime.execute("SELECT * FROM lime_database_table WHERE {} LIKE %s"
"ORDER BY time_start".format(table_column), ["%" + user_query + "%"])
This gives me the error:
c_lime.execute("SELECT * FROM lime_database_table WHERE {} LIKE %s ORDER BY time_start".format(table_column), ["%" + user_query + "%"])
sqlite3.OperationalError: near "%": syntax error
I have tried a few more variations of this, taken from SO and other sources, but none of it seems to work. Is there something I'm doing wrong with the %s placeholder?
When using = ?, user_query is a date: 2/14/2015 1:00:00 PM; when using LIKE, I pass part of that: 2/14/2015.
To use the % character in a query with LIKE, enclose the pattern in single quotation marks:
c_lime.execute("""SELECT * FROM lime_database_table WHERE {} LIKE '%{}'
ORDER BY time_start""".format(table_column, user_query))
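For what it's worth, sqlite3 can also bind the LIKE pattern itself through a ? placeholder, which keeps the wildcards out of the SQL text; a sketch of that alternative (the column name still has to be interpolated, so it should come from a trusted whitelist):
pattern = "%" + user_query + "%"
c_lime.execute("SELECT * FROM lime_database_table WHERE {} LIKE ? "
               "ORDER BY time_start".format(table_column), [pattern])
for row in c_lime.fetchall():
    print(row)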
Error:
pymysql.err.InternalError: (1366, "Incorrect string value: '\\xEF\\xBF\\xBD 20...' for column 'history' at row 1")
I've received a few variations of this as I've tried to tweak my dictionary, always in the history column; the only variation is the characters it tells me are the issue.
I can't post the dictionary because it's got sensitive information, but here is the gist:
I started with 200 addresses (including state, zip, etc.) that needed to be validated, normalized, and standardized for DB insertion.
I spent a lot of time on Google Maps validating and standardizing.
I decided to get fancy and put all the crazy accented letters in the addresses of these world addresses (often copied from Google, because I don't know how to type an A with an o over it, lol), from Singapore to Brazil, everywhere.
I ended up with 120 unique addresses in my dictionary after processing.
Everything works 100% perfectly when INSERTING the data into SQLite and OUTPUTTING to a CSV. The issue is exclusively with MySQL and some sneaky un-viewable characters.
Note: I used this to remove the accents after 7 hours of copy/pasting to Notepad, encoding it with Notepad++, and just trying to process the data in a way that made it all the correct encoding. I think I did lose the version with the accents and only have this tool's output now.
I do not see "\xEF\xBF\xBD 20..." in my dictionary; I only see text. Currently I don't even see the "20"... those two characters helped me find the previous issues.
Code I can show:
def insert_tables(cursor, assets_final, ips_final):
    #Insert Asset data into asset table
    field_names_dict = get_asset_field_names(assets_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for key, row in assets_final.items():
        insert_sql = 'INSERT INTO asset(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in list(row.values())) + '")'
        print(insert_sql)
        cursor.execute(insert_sql)

    #Insert IP data into IP table
    field_names_dict = get_ip_field_names(ips_final)
    sql_field_names = ",".join(field_names_dict.keys())
    for hostname_key, ip_dict in ips_final.items():
        for ip_key, ip_row in ip_dict.items():
            insert_sql = 'INSERT INTO ip(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in list(ip_row.values())) + '")'
            print(insert_sql)
            cursor.execute(insert_sql)

def output_sqlite_db(sqlite_file, assets_final, ips_final):
    conn = sqlite3.connect(sqlite_file)
    cursor = conn.cursor()
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()

def output_mysql_db(assets_final, ips_final):
    conn = mysql.connect(host=config.mysql_ip, port=config.mysql_port, user=config.mysql_user, password=config.mysql_password, charset="utf8mb4", use_unicode=True)
    cursor = conn.cursor()
    cursor.execute('USE ' + config.mysql_DB)
    insert_tables(cursor, assets_final, ips_final)
    conn.commit()
    conn.close()
EDIT: Could this have something to do with the fact that I'm using Cygwin as my terminal? HA! I added this line and got a different message (now using the accented version again):
cursor.execute('SET NAMES utf8')
Error:
pymysql.err.InternalError: (1366, "Incorrect string value: '\\xC5\\x81A II...' for column 'history' at row 1")
I can shine a bit of light on the messages that you have supplied:
Case 1:
>>> import unicodedata as ucd
>>> s1 = b"\xEF\xBF\xBD"
>>> s1
b'\xef\xbf\xbd'
>>> u1 = s1.decode('utf8')
>>> u1
'\ufffd'
>>> ucd.name(u1)
'REPLACEMENT CHARACTER'
>>>
It looks like you obtained some bytes encoded in an encoding other than UTF-8 (e.g. cp1252), then tried bytes.decode(encoding='utf8', errors='strict'). This detected some errors. You then decoded again with errors='replace'. That raised no exceptions, but the error bytes in your data have been replaced by the replacement character (U+FFFD). Then you encoded your data using str.encode so that you could write it to a file or database. Each replacement character turns up as the 3 hex bytes EF BF BD.
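A tiny round trip that reproduces those three bytes (the cp1252-style input here is just an assumed example):
>>> raw = b"caf\xe9"                # 'café' in latin-1/cp1252; not valid UTF-8
>>> raw.decode("utf8", errors="replace").encode("utf8")
b'caf\xef\xbf\xbd'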
Case 2:
>>> s2 = b"\xC5\x81A II"
>>> s2
b'\xc5\x81A II'
>>> u2 = s2.decode('utf8')
>>> u2
'\u0141A II'
>>> ucd.name(u2[0])
'LATIN CAPITAL LETTER L WITH STROKE'
>>>
I'm wondering if you can help me. I'm trying to change the value in each column if the text matches a corresponding keyword. This is the loop:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike %s") %(column_names[i], search_terms[i])
The MySQL command works fine on its own, but not when I put it in the loop. It's giving an error at the first %s
Does anyone have any insights?
This is the error:
_mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s = 1 WHERE text rlike %s' at line 1")
The column names look like
column_names = ["col1","col2","col3"...]
Search terms look like
search_terms = ["'(^| |.|-)word1[;:,. ?-]'","'(^| |.|-)word2[;:,. ?-]'",...]
The right way to do this is to pass the values to the database driver as query parameters, which it will quote correctly.
adapted from voyager's post:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET {} = 1 WHERE text rlike %s".format(column_names[i]),
                (search_terms[i],),
                )
In this case it's confusing because the column_name isn't a value, it's part of the table structure, so it's inserted using good old string formatting. The search_term is a value, so is passed to cursor.execute() for correct, safe quoting.
(Don't use string manipulation to add the quotes -- you're exposing yourself to SQL injection.)
Missing quotes and wrong parenthesis placement...
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike '%s'"   # <-- quotes added around the second %s
                % (column_names[i], search_terms[i]))             # <-- the % formatting now happens inside execute(...)
Please note, this is not the right way of doing it if your string may itself contain quotes...
What about this instead:
for i in range(0, 20, 1):
    cur.execute("UPDATE table SET %s = 1 WHERE text rlike ?" % (column_names[i],),
                (search_terms[i],))
This uses the % operator to set the column name, but uses an execute parameter to bind the data, letting the DB driver escape whatever characters need it.
I'm working on a Python script to pull data from a sqlite3 database.
When I try this code:
#Pull the data from the database
c = con.cursor()
channelList = list()
channel_db = xbmc.translatePath(os.path.join('special://userdata/addon_data/script.tvguide', 'source.db'))
if os.path.exists(channel_db):
    c.execute('SELECT channel, title, start_date, stop_date FROM programs WHERE channel')
    for row in c:
        channel = row[0], row[1], row[2], row[3]
        channelList.append(channel)
        print channel
c.close()
I get the list of data back with the unicode u prefix and the long L suffix on the dates, like this:
20:52:01 T:5212 NOTICE: (u'101 ABC FAMILY ', u'Reba - Location, Location, Location', 20140522133000L, 20140522140000L)
20:52:01 T:5212 NOTICE: (u'101 ABC FAMILY ', u'Reba - Your Place or Mine', 20140522140000L, 20140522143000L)
20:52:01 T:5212 NOTICE: (u'101 ABC FAMILY ', u"Reba - She's Leaving Home, Bye, Bye", 20140522143000L, 20140522150000L)
20:52:01 T:5212 NOTICE: (u'101 ABC FAMILY ', u'Boy Meets World - No Such Thing as a Sure Thing', 20140522150000L, 20140522153000L)
I want to print the data without the u and L markers. Could you please tell me how I can do that?
The problem is that you are printing a tuple, the elements of which will be printed using __repr__ instead of __str__. To get each to be printed in a more natural way, try:
print row[0], row[1], row[2], row[3]
Explanation by example:
>>> print u'Hello'
Hello
>>> print (u'Hello', u'World')
(u'Hello', u'World')
>>> print u'Hello', u'World'
Hello World
Converting:
If you're interested in converting the data so that the strings are no longer unicode, and the dates are ints instead of longs, you can do the following:
>>> channel = row[0].encode('ascii'), row[1].encode('ascii'), int(row[2]), int(row[3])
>>> print channel
('101 ABC FAMILY ', 'Reba - Location, Location, Location', 20140522133000, 20140522140000)
Beware that encoding to ASCII will fail if the string contains a non-ASCII character, raising a UnicodeEncodeError. Casting the long to int will never raise an exception, but the result will simply be another long if the number is too large to be stored in an int. More about Python's long.
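If the strings might not be pure ASCII, a more forgiving variant of the same conversion (a sketch only) is to allow a lossy fallback rather than strict ASCII:
channel = (row[0].encode('ascii', 'replace'),   # non-ASCII characters become '?'
           row[1].encode('ascii', 'replace'),
           int(row[2]), int(row[3]))
print channel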
Text factory:
Another option is to use a sqlite3 feature called text_factory. Do this before the c.execute:
con.text_factory = lambda x: x.encode('ascii')
This will be automatically called when retrieving any text columns. Note that in this case, the UnicodeDecodeError will be raised by c.execute if the text can't be decoded properly.
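A self-contained illustration of that, using an in-memory database (the table and values are made up for the example):
import sqlite3

con = sqlite3.connect(':memory:')
con.text_factory = lambda x: x.encode('ascii')   # same factory as above; ASCII-only data assumed
c = con.cursor()
c.execute("CREATE TABLE programs (channel TEXT, title TEXT)")
c.execute("INSERT INTO programs VALUES ('101 ABC FAMILY ', 'Reba')")
c.execute("SELECT channel, title FROM programs")
print c.fetchone()                               # ('101 ABC FAMILY ', 'Reba') -- no u prefix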
I'm querying text from SQLite that contains \n, and I expected Python to convert it to a newline, but it doesn't. I also tried changing the \n in the database, but it still isn't converted.
cursor.execute('''SELECT test FROM table_name ''')
for row in cursor:
    self.ui.textEdit.append(row[0])
    # or
    print row[0]
I also tried unicode(row[0]), but it doesn't work. I'm surprised there isn't an easy solution for this on the web.
Neither SQLite nor Python converts escape sequences in strings (except for \ escapes in a Python string literal written in the source code).
Newlines work correctly if you handle them correctly:
>>> import sqlite3
>>> db = sqlite3.connect(':memory:')
>>> c = db.cursor()
>>> c.execute('create table t(x)')
>>> c.execute("insert into t values ('x\ny')")
>>> c.execute("insert into t values ('x\\ny')")
>>> c.execute("select * from t")
>>> for row in c:
... print row[0]
...
x
y
x\ny
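If the rows really do contain a literal backslash followed by n (like the last line above), the conversion has to be done explicitly; a minimal sketch:
text = row[0].replace('\\n', '\n')   # turn the two characters backslash + n into a real newline
print text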