I am having trouble with encoding in Python while using xlrd and MySQLdb.
I am reading an Excel file which contains Turkish characters in it.
When I print the value, like print sheet.cell(rownum,19).value, it writes İstanbul to the console, which is correct. (Win7, Lucida Console, console encoding cp1254.)
However, if I try to insert that value into the database like
sql = "INSERT INTO city (name) VALUES('"+sheet.cell(rownum,19).value+"')"
cursor.execute (sql)
db.commit()
it gives this error:
Traceback (most recent call last):
  File "excel_employer.py", line 112, in <module>
    cursor.execute (sql_deneme)
  File "C:\Python27\lib\site-packages\MySQLdb\cursors.py", line 157, in execute
    query = query.encode(charset)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0130' in position 41: ordinal not in range(256)
If I change the sql to
sql = "INSERT INTO city (name) VALUES('"+sheet.cell(rownum,19).value.encode('utf8')+"')"
the value is inserted without any error, but it becomes Ä°stanbul.
Could you give me any idea how I can store the value İstanbul in the database as it is?
Just as @Kazark said, maybe the encoding of your MySQL connector is not set.
conn = MySQLdb.connect(
host="localhost",
user="root",
passwd="root",
port=3306,
db="test1",
init_command="set names utf8"
)
Try this when you initialize your MySQL connector in Python. But make sure the content being inserted is UTF-8.
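As a sketch (untested against a real server), combining a connection-level charset with a parameterized INSERT avoids manual .encode() calls entirely; insert_city is a hypothetical helper, and the commented-out connection reuses the credentials from above:

```python
def insert_city(conn, name):
    """Insert one city name with a parameterized query; with
    charset='utf8' on the connection, MySQLdb encodes the unicode
    value itself, so no manual .encode('utf8') is needed."""
    cursor = conn.cursor()
    cursor.execute("INSERT INTO city (name) VALUES (%s)", (name,))
    conn.commit()

# hypothetical usage:
# conn = MySQLdb.connect(host="localhost", user="root", passwd="root",
#                        port=3306, db="test1", charset="utf8")
# insert_city(conn, sheet.cell(rownum, 19).value)
```

Parameterization also protects against SQL injection, which the string-concatenated query in the question is vulnerable to.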
I am trying to use ceODBC to improve some query times, but I have a problem right at the start with the ceODBC library.
I import the library, connect, and execute the SELECT statement, but when running cursor.fetchall() or similar, I get an error.
The error seems to happen only with labels that could contain spaces or special characters:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 26: invalid start byte
Sample:
import ceODBC
conn = ceODBC.connect('cnx string', autocommit=False)
cursor = conn.cursor()
cursor.execute("select [name], [label] from table1")
# error because of label having a "héllo world" value for eg
# none works
[print(row) for row in cursor]
print(cursor.fetchall())
I looked for decode/encode methods but found none in ceODBC.
I'm trying to get an admin account to edit a 'rank' (basically an access level) for one of the profiles in my database. The error is:
Traceback (most recent call last):
File "U:/A-level Computor Science/Y12-13/SQL/sqlite/Databases/ork task/Python for SQL V_2.py", line 154, in <module>
main()
File "U:/A-level Computor Science/Y12-13/SQL/sqlite/Databases/ork task/Python for SQL V_2.py", line 9, in main
start_menu()
File "U:/A-level Computor Science/Y12-13/SQL/sqlite/Databases/ork task/Python for SQL V_2.py", line 22, in start_menu
login()
File "U:/A-level Computor Science/Y12-13/SQL/sqlite/Databases/ork task/Python for SQL V_2.py", line 72, in login
Mek_menu()
File "U:/A-level Computor Science/Y12-13/SQL/sqlite/Databases/ork task/Python for SQL V_2.py", line 108, in Mek_menu
where Uzaname = %s""" % (NewRank, Findaname))
sqlite3.OperationalError: unrecognized token: "0rk_D4T4B453"
The code that seems to be the problem is:
cursor.execute(""" update 0rk_D4T4B453.Da_Boyz
set Rank = %s
where Uzaname = %s""" % (NewRank, Findaname))
Originally it was all on one line and it didn't work; now I've tried it on multiple lines and it still doesn't work, so I checked here to see if anyone could help.
EDIT1: Thanks for the suggestions. None of them have fixed the code, but I've narrowed the problem down to: where Uzaname = %s""" % (NewRank, Findaname))
Unless you use ATTACH, SQLite (a file-level database) does not recognize other databases. Usually server-level databases (Oracle, Postgres, SQL Server, etc.) use the database.schema.table reference. However, in SQLite the very database file you connect to is the main database in scope. But ATTACH allows you to connect to other SQLite databases and then recognizes database.table referencing.
Additionally, for best practices:
In sqlite3 and any other Python DB-API, use parameterization for literal values; do not format values into the SQL statement.
In general Python, stop using the de-emphasized (though not yet deprecated) string modulo operator, %. Use str.format or the more recent f-strings for string formatting. But neither is needed here.
Altogether, if you connect to the 0rk_D4T4B453 database, simply query without the database reference:
conn = sqlite3.connect('/path/to/0rk_D4T4B453.db')
cursor = conn.cursor()
# PREPARED STATEMENT WITH QMARK PLACEHOLDERS
sql = """UPDATE Da_Boyz
SET Rank = ?
WHERE Uzaname = ?"""
# BIND WITH TUPLE OF PARAMS IN SECOND ARG
cursor.execute(sql, (NewRank, Findaname))
conn.commit()
If you do connect to a different database, call ATTACH. Here, too, you can alias the other database with a better name instead of an identifier that starts with a digit.
cursor.execute("ATTACH '/path/to/0rk_D4T4B453.db' AS other_db")
sql = """UPDATE other_db.Da_Boyz
SET Rank = ?
WHERE Uzaname = ?"""
cursor.execute(sql, (NewRank, Findaname))
conn.commit()
cursor.execute("DETACH other_db")
I can read from a MSSQL database by sending queries in python through pypyodbc.
Mostly unicode characters are handled correctly, but I've hit a certain character that causes an error.
The field in question is of type nvarchar(50) and begins with this character "" which renders for me a bit like this...
-----
|100|
|111|
-----
If that number is hex 0x100111 then it's the character supplementary private use area-b u+100111. Though interestingly, if it's binary 0b100111 then it's an apostrophe, could it be that the wrong encoding was used when the data was uploaded? This field is storing part of a Chinese postal address.
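That reading is consistent with the error: in UTF-16, a supplementary character such as U+100111 is encoded as a surrogate pair (4 bytes), and cutting it off mid-pair produces exactly an "unexpected end of data" failure. A minimal sketch:

```python
# A supplementary character such as U+100111 needs a UTF-16 surrogate
# pair; truncating it mid-pair reproduces the error from the traceback.
ch = u"\U00100111"                    # supplementary private use area-B
encoded = ch.encode("utf-16-le")      # surrogate pair -> 4 bytes
assert len(encoded) == 4
try:
    encoded[:2].decode("utf-16-le")   # a lone high surrogate cannot decode
except UnicodeDecodeError as exc:
    assert "unexpected end of data" in str(exc)
```

This suggests the driver may be handing the decoder a buffer slice that splits the surrogate pair, rather than the data itself being unrepresentable.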
The error message includes
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
Here it is in full...
Traceback (most recent call last):
  File "question.py", line 19, in <module>
    results.fetchone()
  File "/VIRTUAL_ENVIRONMENT_DIR/local/lib/python2.7/site-packages/pypyodbc.py", line 1869, in fetchone
    value_list.append(buf_cvt_func(from_buffer_u(alloc_buffer)))
  File "/VIRTUAL_ENVIRONMENT_DIR/local/lib/python2.7/site-packages/pypyodbc.py", line 482, in UCS_dec
    uchar = buffer.raw[i:i + ucs_length].decode(odbc_decoding)
  File "/VIRTUAL_ENVIRONMENT_DIR/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
Here's some minimal reproducing code...
import pypyodbc
connection_string = (
"DSN=sqlserverdatasource;"
"UID=REDACTED;"
"PWD=REDACTED;"
"DATABASE=obi_load")
connection = pypyodbc.connect(connection_string)
cursor = connection.cursor()
query_sql = (
"SELECT address_line_1 "
"FROM address "
"WHERE address_id = 'REDACTED' ")
with cursor.execute(query_sql) as results:
row = results.fetchone() # This is the line that raises the error.
print row
Here is a chunk of my /etc/freetds/freetds.conf
[global]
; tds version = 4.2
; dump file = /tmp/freetds.log
; debug flags = 0xffff
; timeout = 10
; connect timeout = 10
text size = 64512
[sqlserver]
host = REDACTED
port = 1433
tds version = 7.0
client charset = UTF-8
I've also tried with client charset = UTF-16 and omitting that line all together.
Here's the relevant chunk from my /etc/odbc.ini
[sqlserverdatasource]
Driver = FreeTDS
Description = ODBC connection via FreeTDS
Trace = No
Servername = sqlserver
Database = REDACTED
Here's the relevant chunk from my /etc/odbcinst.ini
[FreeTDS]
Description = TDS Driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
UsageCount = 1
I can work around this issue by fetching results in a try/except block and throwing away any rows that raise a UnicodeDecodeError, but is there a better solution? Can I throw away just the undecodable character, or is there a way to fetch this row without raising an error?
It's not inconceivable that some bad data has ended up on the database.
I've Googled around and checked this site's related questions, but have had no luck.
I fixed the issue myself by using this:
conn.setencoding('utf-8')
immediately before creating a cursor.
Where conn is the connection object.
I was fetching tens of millions of rows with fetchall(), and in the middle of a transaction that would be extremely expensive to undo manually, so I couldn't afford to simply skip invalid ones.
Source where I found the solution: https://github.com/mkleehammer/pyodbc/issues/112#issuecomment-264734456
This problem was eventually worked around. I suspect that text in one encoding was hammered into a field with a different declared encoding by some hacky method when the table was being set up.
I do some bulk PostgreSQL 9.3.9 inserts in Python 3.4. I've been using SQLAlchemy, which works fine for normal data processing. For a while I've been using psycopg2 in order to utilize the copy_from function, which I found faster for bulk inserts. The issue is that when using copy_from, the bulk inserts fail when the data contains certain special characters. When I remove the highlighted line, the insert runs successfully.
Error
Traceback (most recent call last):
  File "/vagrant/apps/data_script/data_update.py", line 1081, in copy_data_to_db
    'surname', 'other_name', 'reference_number', 'balance'), sep="|", null='None')
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0x00
CONTEXT: COPY source_file_raw, line 98: "94|1|99|2015-09-03 10:17:34|False|True|John|Doe|A005-001\008020-01||||||..."
Code producing the error
cursor.copy_from(data_list, 'source_file_raw',
columns=('id', 'partner_id', 'pos_row', 'loaded_at', 'has_error',
'can_be_loaded', 'surname', 'other_name', 'reference_number', .............),
sep="|", null='None')
The db connection
import psycopg2
import psycopg2.extras

pg_conn_string = ("host='%s' port='%s' dbname='%s' user='%s' password='%s'"
                  % (con_host, con_port, con_db, con_user, con_pass))
conn = psycopg2.connect(pg_conn_string)
conn.set_isolation_level(0)
if cursor_type == 'dict':
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
else:
    cursor = conn.cursor()
return cursor
So the baffling thing is that SQLAlchemy can do the bulk inserts even when those "special characters" are present, but using psycopg2 directly fails. I'm thinking there must be a way for me to escape this, or to tell psycopg2 to find a smart way to do the insert, or am I missing a setting somewhere?
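One hedged workaround (a sketch, not necessarily what SQLAlchemy does internally): the error reports a NUL byte (0x00), which PostgreSQL cannot store in text columns at all, so the stream can be sanitized before handing it to copy_from. strip_nul is a hypothetical helper:

```python
import io

def strip_nul(text_stream):
    """Return a new stream with NUL (0x00) characters removed;
    PostgreSQL rejects 0x00 in text columns, which is what
    'invalid byte sequence for encoding "UTF8": 0x00' reports."""
    return io.StringIO(text_stream.read().replace("\x00", ""))

# hypothetical usage with the cursor from the question:
# cursor.copy_from(strip_nul(data_list), 'source_file_raw',
#                  columns=(...), sep="|", null='None')
```

For very large inputs, the same replacement could be applied chunk-by-chunk instead of reading the whole stream into memory.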
With this code I try to delete a table if it exists, but I need to do it by passing a variable.
import MySQLdb as mdb
conn = mdb.connect(host='db01.myhost.co.nl',
user='pdbois',
passwd='triplex',
db='myxxx')
cursor = conn.cursor()
# Without passing variables this works OK!
#cursor.execute("""drop table if exists testtable""")
# But this breaks
table_name = "testtable"
cursor.execute("""drop table if exists %s""",(table_name))
conn.close()
But why does the way I do it above break with this error?
File "test_mysql.py", line 63, in <module>
main()
File "test_mysql.py", line 59, in main
create_table()
File "test_mysql.py", line 25, in create_table
cursor.execute("""drop table if exists %s""",(table_name))
File "build/bdist.linux-x86_64/egg/MySQLdb/cursors.py", line 174, in execute
File "build/bdist.linux-x86_64/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
_mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''testtable'' at line 1")
What's the right way to do it?
Update:
Another problem is creating a table via a parameter.
sql = "create table %s(
first_name char(20) not null,
last_name char(20))" % mdb.escape_string(table_name)
cursor.execute(sql)
It gives `SyntaxError: EOL while scanning string literal`.
You cannot parameterize the table name; use string formatting and escape the value manually. (The SyntaxError in your update is unrelated: a single-quoted string cannot span multiple lines, so use triple quotes for multi-line SQL.)
cursor.execute("drop table if exists %s" % mdb.escape_string(table_name))
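To be safer with identifiers, you can additionally validate the name and backtick-quote it, since escape_string is meant for string literals rather than identifiers. A sketch; drop_table is a hypothetical helper:

```python
import re

def drop_table(cursor, table_name):
    """Identifiers cannot be bound as query parameters, so validate the
    name explicitly, then interpolate it backtick-quoted."""
    if not re.match(r"^[0-9A-Za-z$_]+$", table_name):
        raise ValueError("unsafe table name: %r" % table_name)
    cursor.execute("DROP TABLE IF EXISTS `%s`" % table_name)
```

The whitelist regex covers MySQL's basic unquoted-identifier characters; anything else is rejected instead of escaped.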