MySQLdb is a Python module for communicating with a MySQL database. escape_string is a function provided by MySQLdb that escapes special characters in SQL. For example, a statement like 'Update table Set col = "My"s"' will cause an error, so escape_string adds a '\' before the " in My"s.
However, in a multibyte encoding like GBK, which uses two bytes to store a Chinese character, escape_string searches for characters to escape one byte at a time, so some characters are escaped incorrectly. For example, the Chinese character '昞' is stored as the bytes '\x95\x5c', and the second byte, 0x5c, is also the code for a backslash. If the SQL is 'update table set col = "昞"', then MySQLdb.escape_string(sql) returns: update table set col = "昞\", which is wrong and cannot be executed correctly.
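To make the failure concrete, here is a small stdlib-only Python 3 sketch. naive_escape is a hypothetical stand-in for any charset-unaware escaper that inspects one byte at a time; it is not MySQLdb's actual implementation. charset_aware_escape (also hypothetical) decodes the GBK bytes first, so the 0x5c inside the two-byte character is never mistaken for a backslash.

```python
# Sketch: why byte-at-a-time escaping breaks GBK text.
# naive_escape and charset_aware_escape are hypothetical helpers, not MySQLdb APIs.

RAW = b"\x95\x5c"  # the two GBK bytes from the question; 0x5c is also ASCII "\"

SPECIAL = {0x5C, 0x27, 0x22}  # backslash, single quote, double quote


def naive_escape(data: bytes) -> bytes:
    """Put a backslash before every special byte, ignoring the charset."""
    out = bytearray()
    for b in data:
        if b in SPECIAL:
            out.append(0x5C)
        out.append(b)
    return bytes(out)


def charset_aware_escape(data: bytes, encoding: str = "gbk") -> bytes:
    """Decode to characters first, escape them, then re-encode."""
    text = data.decode(encoding)
    text = text.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')
    return text.encode(encoding)


broken = naive_escape(RAW).decode("gbk")
print(len(RAW.decode("gbk")))   # 1: a single Chinese character
print(len(broken))              # 2: the character plus a stray backslash
print(broken.endswith("\\"))    # True
print(charset_aware_escape(RAW) == RAW)  # True: nothing needed escaping
```

With a real connection, the charset-aware route corresponds to telling the driver which charset the connection uses and passing values as query parameters, as the answers below suggest.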
Has anyone else run into this problem?
P.S. I googled the problem and found that PHP has a function, mysqli_set_charset, that solves this case, so I wonder whether Python has an equivalent.
This problem is most likely caused by the default character set of your connection being latin1 rather than a Unicode charset. There are a couple of different things you can try. From this post:
import MySQLdb as mysql  # assumption: "mysql" is MySQLdb imported under an alias

conn = mysql.connect(host='127.0.0.1',
                     user='user',
                     passwd='passwd',
                     db='db',
                     charset='utf8',
                     use_unicode=True)
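Then run your query like this, passing the value as a parameter so that the driver, not your code, does the escaping:
cursor = conn.cursor()
cursor.execute('INSERT INTO mytable VALUES (null, %s)',
               (b'\x95\x5c'.decode('gbk'),))  # decode the GBK bytes to a Unicode string first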
Apparently a similar problem was solved by running the following query first:
SET NAMES 'gbk'
I am reading Scandinavian-language websites with a web crawler and want to insert the content into my PostgreSQL database.
Originally I set my PSQL DB's encoding to UTF-8, then manually tried to insert the problematic characters like this:
Insert into name (surname) VALUES ('Børre');
This was done in the Windows psql shell.
This gave me the following error: ERROR: invalid byte sequence for encoding "UTF8": 0x9b. So after some googling I changed the client encoding to latin1, and then the statement succeeded. The server encoding is still utf8.
When I do the same insert through my Python script, the name appears in my database as B°rre. If I change the client encoding back to utf8, I also get entries with wrong special characters.
My Python script is UTF-8 encoded and prints the name correctly.
Insert statement:
con = psycopg2.connect(*database details*)
print("Opened database successfully")
cur = con.cursor()
#INSERT NAME
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre')
cur.execute(query,data)
As previously stated, print(personObject.surname) gives 'Børre'
If I try the following:
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre'.encode('utf-8'))
cur.execute(query,data)
I get the following in my database:
\x62c383c2b8727265
psycopg2 doesn't parse PostgreSQL queries; it just converts the arguments you give it into their PostgreSQL representation.
If you give it a bytes object, it will convert it to a PostgreSQL BYTEA literal.
data = ('børre'.encode('utf-8')) gives you bytes,
so don't do that; use a string.
The code fragment you have at the top should work.
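One detail worth checking in the question's code: parentheses alone do not create a tuple, so data = ('børre') is just a str. psycopg2's cur.execute(query, vars) expects a sequence or mapping of parameters, so the usual form is a one-element tuple. This stdlib-only check shows which type each expression produces:

```python
# Which Python types the question's `data` expressions actually produce.
name = "børre"

not_a_tuple = (name)             # just a str: parentheses only group
one_tuple = (name,)              # the trailing comma makes a 1-tuple
as_bytes = name.encode("utf-8")  # bytes, which psycopg2 adapts to BYTEA

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(type(as_bytes).__name__)     # bytes
print(as_bytes.hex())              # 62c3b8727265
```

With a live connection, the call would then be cur.execute(query, (name,)).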
In the result I see ø encoded as the hex c383c2b8, which decodes from UTF-8 as two characters, Ã and ¸. It looks to me like Python thinks your script is not written in UTF-8 but in some other code page.
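The hex from the question can be reproduced in plain Python 3: encode 'børre' as UTF-8, misread those bytes as Latin-1 (an assumption; cp1252 maps 0xC3 and 0xB8 the same way and gives the same result), then encode as UTF-8 again. The last step sketches how such double-encoded text can be reversed.

```python
original = "børre"

utf8_bytes = original.encode("utf-8")     # ø becomes c3 b8
misread = utf8_bytes.decode("latin-1")    # each byte read as one character
double_encoded = misread.encode("utf-8")  # Ã -> c3 83, ¸ -> c2 b8

print(misread)               # bÃ¸rre
print(double_encoded.hex())  # 62c383c2b8727265, as in the question

# Undo the damage by reversing the steps.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired)              # børre
```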
You can also set the encoding with the client_encoding keyword, e.g.:
conn = psycopg2.connect("dbname='foo' user='dbuser' password='mypass' client_encoding='utf8'")
I'm using python to execute the following query in Oracle:
SELECT COUNT(*) FROM TABLE WHERE DATA = 'CAMIÓN'
I'm getting 0 when I should be getting a nonzero count, because there are rows where DATA is 'CAMIÓN'.
If you execute the query like this:
SELECT COUNT(*) FROM TABLE WHERE DATA = 'CAMIN'
This also gives 0, so I think the accent might be the cause: there is no error, so it seems Oracle is removing the troublesome character.
How does Oracle handle the accents? Does it remove those?
If alternative spelling is enabled (the ALTERNATE_SPELLING attribute of the Oracle Text lexer), Oracle treats a word with accents and its alternative spelling as equivalent.
For example:
Character | Alternative spelling
ä | ae
You can use ctx_ddl.set_attribute / ctx_ddl.unset_attribute to set or unset alternative spelling conventions.
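To illustrate the idea only (this is not Oracle Text's implementation), here is a toy Python sketch of alternative-spelling matching: the stored text and the search term are normalized through the same mapping before comparison. Only ä → ae comes from the table above; the ö and ü entries are assumptions added for illustration, and alt_spelling_key and matches are hypothetical helpers.

```python
# Toy illustration of "alternative spelling" equivalence; not Oracle Text itself.
ALTERNATE = {
    "ä": "ae",  # from the example table
    "ö": "oe",  # assumption, for illustration
    "ü": "ue",  # assumption, for illustration
}


def alt_spelling_key(word: str) -> str:
    """Lowercase and replace each accented letter with its alternative spelling."""
    return "".join(ALTERNATE.get(ch, ch) for ch in word.lower())


def matches(stored: str, query: str) -> bool:
    """True if both words normalize to the same key."""
    return alt_spelling_key(stored) == alt_spelling_key(query)


print(matches("Mädchen", "Maedchen"))  # True
print(matches("Mädchen", "Madchen"))   # False
```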
I am trying to insert some Arabic words into the arabic_word column of the hanswehr2 table in my MariaDB database, using the MySQLdb driver.
I was getting a latin-1 encode error, but after reading around I found out that the MySQLdb driver defaults to latin-1 and that I had to explicitly set utf-8 as my charset in the mariadb.connect() call.
The entire database is set to utf-8.
Code:
import sys
import MySQLdb as mariadb  # assumption: "mariadb" is MySQLdb imported under an alias

def insert_into_db(arabic_word, definition):
    try:
        conn = mariadb.connect('localhost', 'root', 'xyz1234passwd', 'hans_wehr', charset='utf-8', use_unicode=True)
        conn.autocommit(True)
        cur = conn.cursor()
        cur.execute("INSERT INTO hanswehr2 (arabic_word , definition) VALUES (%s,%s)", (arabic_word, definition,))
    except mariadb.Error, e:
        print e
        sys.exit(1)
However now I get the following error:
/usr/bin/python2.7 /home/heisenberg/hans_wehr/main.py
Total lines 87672
(2019, "Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/)")
Process finished with exit code 1
I have told the Python MySQL driver to use the utf-8 character set, but it seems to ignore this.
Any inputs would be highly appreciated.
The charset alias for UTF-8 in MySQL is utf8 (no hyphen).
See https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html for available charsets.
Note: if you need non-BMP Unicode code points, such as emoji, use utf8mb4 both for the connection charset and for the VARCHAR column's character set.
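A quick way to see why emoji need utf8mb4: MySQL's utf8 charset stores at most three bytes per character, while characters outside the Basic Multilingual Plane take four bytes in UTF-8. The Python 3 sketch below (stdlib only; needs_utf8mb4 is a hypothetical helper) checks a string for such characters. Arabic letters stay within two bytes each, so plain utf8 would be enough for the question's data.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character is outside the BMP (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


arabic = "كتاب"       # "book"
emoji = "\U0001F600"  # grinning face

print(needs_utf8mb4(arabic))                          # False
print(max(len(ch.encode("utf-8")) for ch in arabic))  # 2
print(needs_utf8mb4(emoji))                           # True
print(len(emoji.encode("utf-8")))                     # 4
```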
There are also collations, which define how the characters of a character set are compared and sorted, for example according to the rules of a specific language.
https://softwareengineering.stackexchange.com/questions/95048/what-is-the-difference-between-collation-and-character-set
I think you need to specify it when creating your database table or in the connection string. Refer to:
store arabic in SQL database
More on Python MySQL connections:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnection-set-charset-collation.html
I am trying to insert a string that starts with 0x into a SQL Server database, but the insert keeps failing. The characters after 0x are random characters from A-Z, a-z and 0-9, with no set length. I tried to work around it by adding a letter in front of the string and updating it afterwards, but that does not work. I am using
The SQL statement I am trying to mimic:
insert into [TestDB].[dbo].[S3_Files] ([Key],[IsLatest],[LastModified],[MarkedForDelete],[VersionID]) values ('pmtg-dox/CCM/Trades/Buy/Seller_Provided_-_Raw_Data/C''Ds_v2/NID3153422.pdf','1','2015-10-11','Yes', '0xih91kjhdaoi23ojsdpf')
Python Code
import pymssql as mssql
...
cursor.execute("insert into [TestDB].[dbo].[S3_Files] ([Key],[IsLatest],[LastModified],[MarkedForDelete],[VersionID]) values (%s,%s,%s,%s,%s)",(deleteitems['Key'],deleteitems['IsLatest'],deleteitems['LastModified'],MarkedforDelete, deleteitems['VersionId']))
conn_db.commit()
pymssql.ProgrammingError: (102, "Incorrect syntax near
'qb_QWQDrabGr7FTBREfhCLMZLw4ztx'.DB-Lib error message 20018, severity
15: General SQL Server error: Check messages from the SQL Server")
Is there a way to make Python (pymssql/MySQL) insert the string as-is? Is there a string manipulation technique I am missing? I have also tried pypyodbc, with no luck.
Edit: My current workaround is to alter the string and add a flag to the row so I remember that the string starts with 0x.
This is the solution that I came up with.
Since running the insert command with the appropriate values worked, I created a stored procedure in SQL to handle my request
USE [TestDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[stp_S3Files]
@Key_ varchar(4000),
@IsLatest_ varchar(200),
@Size_ varchar(200),
@LastModified_ varchar(200),
@MarkedForDelete_ varchar(200),
@VersionID_ varchar(200)
AS
insert into [TestDB].[dbo].[S3_Files] ([Key],[IsLatest],[Size(Bytes)],[LastModified],[MarkedForDelete],[VersionID]) values (@Key_, @IsLatest_, @Size_, @LastModified_, @MarkedForDelete_, @VersionID_)
Then I call it from Python:
modkey = deleteitems['Key'].replace("'", "''")
cursor.execute("""exec TestDB.dbo.stp_S3Files
@Key_ = '%s'
,@IsLatest_ = %s
,@Size_ = '%s'
,@LastModified_ = '%s'
,@MarkedForDelete_ = '%s'
,@VersionID_ = '%s' """ %(modkey, deleteitems['IsLatest'],deleteitems['Size'],deleteitems['LastModified'],MarkedforDelete,deleteitems['VersionId']))
conn_db.commit()
Note: the string replace doubles any ' in path names so that the character is escaped. I hope this helps someone who has the same issue down the road.
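Doubling a single quote is the standard way to escape it inside a SQL string literal, and it matches the C''Ds_v2 path in the question's SQL statement. As a runnable comparison, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (sqlite3 uses ? placeholders instead of pymssql's %s, and this does not reproduce the pymssql error). It builds literals with a hypothetical sql_literal helper and, separately, binds the same values as parameters, which avoids manual escaping altogether. The table and column names are simplified assumptions.

```python
import sqlite3


def sql_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


key = "pmtg-dox/CCM/Trades/Buy/Seller_Provided_-_Raw_Data/C'Ds_v2/NID3153422.pdf"
version_id = "0xih91kjhdaoi23ojsdpf"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s3_files (file_key TEXT, version_id TEXT)")

# 1) Manual escaping: values are embedded in the SQL text.
conn.execute(
    "INSERT INTO s3_files VALUES (%s, %s)" % (sql_literal(key), sql_literal(version_id))
)

# 2) Parameter binding: values travel separately from the SQL text.
conn.execute("INSERT INTO s3_files VALUES (?, ?)", (key, version_id))

rows = conn.execute("SELECT file_key, version_id FROM s3_files").fetchall()
print(rows[0] == rows[1] == (key, version_id))  # True
print(sql_literal("C'Ds_v2"))                   # 'C''Ds_v2'
```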
I'm using Python to access a MySQL database and I'm getting an "Unknown column ... in 'field list'" error because there are no quotes around the variable.
Code below:
cur = x.cnx.cursor()
cur.execute('insert into tempPDBcode (PDBcode) values (%s);' % (s))
rows = cur.fetchall()
How do I manually insert double or single quotes around the value of s?
I've tried using str() and manually concatenating quotes around s, but it still doesn't work.
The SQL statement itself works fine; I've double- and triple-checked my query.
You shouldn't use Python's string functions to build the SQL statement. You run the risk of leaving an SQL injection vulnerability. You should do this instead:
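cur.execute('insert into tempPDBcode (PDBcode) values (%s);', (s,))
Note the comma: the value is passed to execute() as a separate argument (here a one-element tuple) instead of being formatted into the string with %.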
Python will do this for you automatically if you use the database API:
cur = x.cnx.cursor()
cur.execute('insert into tempPDBcode (PDBcode) values (%s)', (s,))
Using the DB API means that the driver will figure out whether to use quotes or not, and also means that you don't have to worry about SQL injection attacks in case your s variable happens to contain, say,
value'); drop database; '
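To see the difference, here is a self-contained sketch using Python's built-in sqlite3 module as a stand-in for MySQL (sqlite3's placeholder is ? rather than MySQLdb's %s). The first statement is only printed, not executed, to show how string formatting splices the hostile value into the SQL; the second binds the same value as a parameter and reads it back unchanged.

```python
import sqlite3

s = "value'); drop database; '"  # hostile input from the answer above

conn = sqlite3.connect(":memory:")
conn.execute("create table tempPDBcode (PDBcode text)")

# Unsafe: string formatting makes the value part of the SQL text.
unsafe_sql = "insert into tempPDBcode (PDBcode) values ('%s');" % s
print(unsafe_sql)
# insert into tempPDBcode (PDBcode) values ('value'); drop database; '');

# Safe: the driver binds the value to the placeholder.
conn.execute("insert into tempPDBcode (PDBcode) values (?)", (s,))
stored = conn.execute("select PDBcode from tempPDBcode").fetchone()[0]
print(stored == s)  # True
```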
If this were purely a string-handling question, the answer would be to just put them in the string:
cur.execute('insert into tempPDBcode (PDBcode) values ("%s");' % (s))
That's the classic use case for why Python supports both kinds of quotes.
However as other answers & comments have pointed out, there are SQL-specific concerns that are relevant in this case.