inserting unicode values in mysql using python - python

I want to insert Unicode text into a MySQL table, so I have written the code below.
I am using the Flask framework.
import MySQLdb as ms
db = ms.connect("localhost","username","password","dbname")
cursor = db.cursor()
my_text = "का" #this is marathi lang word
enc = my_text.encode('utf-8') #after encoding still it shows me marathi word
db_insert = "INSERT INTO TEXT(text) VALUES '{0}'"
cursor.execute(db_insert,(enc))
db.commit()
It gives me the following error on the cursor.execute() line:
TypeError: not all arguments converted during string formatting
How can I fix this error?

Put this at the beginning of the source file:
# -*- coding: utf-8 -*-
Don't encode something that is already encoded: remove my_text.encode('utf-8').
Use charset="utf8", use_unicode=True in the connection call.
The CHARACTER SET of the table/column must be utf8 or utf8mb4; latin1 will not work correctly.
Python checklist

You need to pass a sequence (a list or tuple) as the params argument of cursor.execute:
db_insert = "INSERT INTO TEXT(text) VALUES (%s)"
# notice the trailing comma, which turns the value into a one-element tuple
cursor.execute(db_insert,(enc,))
Why the tuple? Because the DB API requires you to pass in any parameters as a sequence. You can read more about this in the documentation.
Alternatively, you could even use named parameters:
db_insert = "INSERT INTO TEXT(text) VALUES (%(my_text)s)"
#and then you can pass a dict with key and value
cursor.execute(db_insert, {'my_text': enc})
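The error message itself hints at the cause. Here is a minimal sketch, assuming the driver fills in parameters with Python's %-style formatting (the wording of the TypeError suggests this, but the driver's internals are not shown here): a query without a %s placeholder leaves the argument unused, and plain string formatting fails in the same way. The strings below are only formatted, never sent to a database.

```python
# Illustration only: plain %-formatting, no database involved.
bad_query = "INSERT INTO TEXT(text) VALUES '{0}'"   # '{0}' is not a %-style placeholder
good_query = "INSERT INTO TEXT(text) VALUES (%s)"   # %s marks where the value goes

params = ("का",)  # the trailing comma makes this a one-element tuple

try:
    bad_query % params
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With a %s placeholder the single argument is consumed.
# (A real driver also quotes and escapes the value; this sketch does not.)
formatted = good_query % params
print(formatted)
```

The point is only that the placeholder style has to match what the driver expects; the quoting of the value is best left to the driver.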

Related

MySQLdb adding character b in front of strings that have been escaped - Python

I am trying to write a simple Python script to bulk add movie titles into a local database, using the MySQLdb (mysqlclient) package. I am reading the titles from a TSV file. But when I go to sanitize the inputs using MySQLdb.escape_string(), I get the character b before my string. I believe this means that SQL is interpreting it as a bit value, and when I go to execute my query I get the following error:
You have an error in your SQL syntax; check the manual that
corresponds to your MariaDB server version for the right syntax to use
near 'b'Bowery to Bagdad',1955)' at line 1"
The INSERT statement in question:
INSERT INTO movies (imdb_id, title, release_year) VALUES ('tt0044388',b'Bowery to Bagdad',1955)
def TSV_to_SQL(file_to_open):
    from MySQLdb import _mysql
    db=_mysql.connect(host='localhost', user='root', passwd='', db='tutorialdb', charset='utf8')
    q = """SELECT * FROM user_id"""
    # MySQLdb.escape_string()
    # db.query(q)
    # results = db.use_result()
    # print(results.fetch_row(maxrows=0, how=1))
    print("starting?")
    with open(file_to_open, encoding="utf8") as file:
        tsv = csv.reader(file, delimiter="\t")
        count = 0
        for line in tsv:
            if count == 10:
                break
            # print(MySQLdb.escape_string(line[1]))
            statement = "INSERT INTO movies (imdb_id, title, release_year) VALUES ('{imdb_id}',{title},{year})\n".format(
                imdb_id=line[0], title=MySQLdb.escape_string(line[1]), year=line[2])
            # db.query(statement)
            print(statement)
            count = count + 1
I know a simple solution would be to just remove the character b from the start of the string, but I was wondering if there was a more proper way, or if I missed something in documentation.
The b in front of the string means that the value is a bytes object rather than a str.
If you call .decode() on it, you will get an ordinary string.
How to convert 'binary string' to normal string in Python3?
It's more common to let the connector perform the escaping automatically, by inserting placeholders in the SQL statement and passing a sequence (conventionally a tuple) of values as the second argument to cursor.execute.
conn = MySQLdb.connect(host='localhost', user='root', passwd='', db='tutorialdb', charset='utf8')
cursor = conn.cursor()
statement = """INSERT INTO movies (imdb_id, title, release_year) VALUES (%s, %s, %s)"""
cursor.execute(statement, (line[0], line[1], line[2]))
conn.commit()
The resulting code is more portable - apart from the connection it will work with all DB-API connectors*. Dropping down to low-level functions like _mysql.connect and escape_string is unusual in Python code (though you are perfectly free to code like this if you want, of course).
* Some connection packages may use a different placeholder instead of %s, but %s seems to be the favoured placeholder for MySQL connector packages.
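To see where the stray b comes from, here is a small sketch that needs no database. It assumes escape_string() returns a bytes object under Python 3, which is what the b'...' in the generated SQL suggests; a plain bytes literal stands in for its return value.

```python
# A bytes literal stands in for the value returned by escape_string() (assumption).
escaped_title = b"Bowery to Bagdad"

# str.format() inserts the text form of a bytes object, b prefix and quotes included.
template = "INSERT INTO movies (imdb_id, title, release_year) VALUES ('{imdb_id}',{title},{year})"
broken = template.format(imdb_id="tt0044388", title=escaped_title, year=1955)
print(broken)

# Decoding first yields a str, so no b prefix appears. The value would still
# need quoting, which is why placeholders are the more robust choice.
decoded_title = escaped_title.decode("utf-8")
print(type(decoded_title).__name__, decoded_title)
```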

python- Insert query to store regex expression in mysql database

How do I write an INSERT query for a string that contains a \" character, for a MySQL database?
This is the string I want to insert into my table:
reg="item-cell\"(.*?)</span></div>"
cur = db.cursor()
query='INSERT into table_name(col_name) values("%s")'%(reg)
cur.execute(query)
cur.close()
Below is the error:
_mysql_exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'(.*?)</span></div>")\' at line 1')
I know it's something related to escape characters, but I don't know how to make it work.
EDIT: The string reg is variable, i.e. I get it from an API and want to insert it into my database, so adding escape characters to the string literal by hand will not work in my case. I want something that handles single quotes, double quotes, or a single unmatched double quote (as in reg) in general.
I hope I made my point clear.
EDIT: This is how I get the value of reg (from a JSON file):
import urllib, json
import MySQLdb
url = "some_url"
response = urllib.urlopen(url)
data = json.loads(response.read())
for item in data["key1"]["key2"]["key3"]["key4"]:
    prop=str(item)
    reg=str(data["key1"]["key2"]["key3"]["key4"][prop]["regex"])
The problem is in the part where you convert the JSON value to a string. To do this without altering the string, use json.dumps, which returns the value wrapped in double quotes with any embedded quotes escaped. Leave the key itself unchanged, since it is still needed for the lookup:
for item in data["key1"]["key2"]["key3"]["key4"]:
    reg = json.dumps(data["key1"]["key2"]["key3"]["key4"][item]["regex"])
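However reg is obtained, a placeholder query sidesteps the quoting problem, because the string never becomes part of the SQL text. Below is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for MySQLdb so that it runs without a server. The table and column names are the placeholders from the question, and sqlite3 uses ? where MySQLdb uses %s.

```python
import sqlite3

reg = "item-cell\"(.*?)</span></div>"  # contains a double quote, as in the question

# Interpolating into the SQL text leaves an unbalanced quote inside the literal.
interpolated = 'INSERT into table_name(col_name) values("%s")' % (reg)
print(interpolated)

# In-memory SQLite database standing in for the MySQL table (assumption).
db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE table_name (col_name TEXT)")

# The value travels separately from the SQL, so no manual escaping is needed.
cur.execute("INSERT INTO table_name (col_name) VALUES (?)", (reg,))
db.commit()

cur.execute("SELECT col_name FROM table_name")
stored = cur.fetchone()[0]
print(stored == reg)
db.close()
```

With MySQLdb the same shape would be cur.execute('INSERT into table_name(col_name) values(%s)', (reg,)), with no quotes around the %s.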

Problems inserting utf8 data to PostgreSQL with Python

I am reading Scandinavian-language websites with a web crawler and want to insert them into my PostgreSQL database.
Originally I set my PSQL DB encoding to utf-8, then manually tried to insert the characters that could be a problem, like this:
Insert into name (surname) VALUES ('Børre');
This was done in the Windows PSQL shell.
This gave me the following error: ERROR: invalid byte sequence for encoding "UTF8": 0x9b. After some googling I changed the client encoding to latin1, and then the statement succeeded. The server encoding is still utf8.
When I do the same insert through my Python script, the name appears in my database as B°rre. If I change the client encoding back to utf8, I also get entries with wrong special characters.
My Python script is utf8 encoded, and it prints the name correctly.
Insert statement:
con = psycopg2.connect(*database details*)
print("Opened database successfully")
cur = con.cursor()
#INSERT NAME
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre')
cur.execute(query,data)
As previously stated, print(personObject.surname) gives 'Børre'
If I try the following:
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre'.encode('utf-8'))
cur.execute(query,data)
I get the following in my database:
\x62c383c2b8727265
psycopg2 doesn't understand PostgreSQL queries; it just converts the arguments it is given into their PostgreSQL representation.
If you give it an array of bytes, it will convert it to a PostgreSQL BYTEA literal.
data = ('børre'.encode('utf-8')) gets you a bytes object.
So don't do that; use a string.
The code fragment you have at the top should work.
In the stored value I see ø encoded as hex c383c2b8, which decodes from UTF-8 as two characters, Ã and ¸. It looks to me like Python thinks your script is written not in UTF-8 but in some other code page.
Use the client_encoding keyword,
e.g.: conn=psycopg2.connect("dbname='foo' user='dbuser' password='mypass' client_encoding='utf8'")
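The byte values quoted in the question can be reproduced without a database. The sketch below assumes the Windows console was using code page 850. The question does not state this, but it is consistent with both the 0x9b in the error message and the ° that ended up in the table.

```python
name = "børre"

# UTF-8 bytes decoded as Latin-1 and encoded to UTF-8 again: the double encoding seen above.
utf8_bytes = name.encode("utf-8")
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")
print(double_encoded.hex())

# In code page 850 (assumed console encoding) the letter ø is the single byte 0x9b,
# which is not valid UTF-8 on its own.
console_byte = "ø".encode("cp850")
print(console_byte)

# The Latin-1 byte for ø (0xf8) shows up as a degree sign when read as code page 850.
misread = "ø".encode("latin-1").decode("cp850")
print(misread)
```

In each case the bytes were produced in one encoding and interpreted in another, so the fix is to make the declared client encoding match what the client actually sends.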

MySQLdb can't initialize character set utf-8 error

I am trying to insert some Arabic words into the arabic_word column of the hanswehr2 table in my MariaDB database using the MySQLdb driver.
I was getting a latin-1 encode error. After reading around, I found out that the MySQLdb driver defaults to latin-1 and that I had to explicitly set utf-8 as my charset of choice in the mariadb.connect() call.
The entire database is set to utf-8.
Code:
def insert_into_db(arabic_word, definition):
    try:
        conn = mariadb.connect('localhost', 'root', 'xyz1234passwd', 'hans_wehr', charset='utf-8', use_unicode=True)
        conn.autocommit(True)
        cur = conn.cursor()
        cur.execute("INSERT INTO hanswehr2 (arabic_word , definition) VALUES (%s,%s)", (arabic_word, definition,))
    except mariadb.Error, e:
        print e
        sys.exit(1)
However now I get the following error:
/usr/bin/python2.7 /home/heisenberg/hans_wehr/main.py
Total lines 87672
(2019, "Can't initialize character set utf-8 (path: /usr/share/mysql/charsets/)")
Process finished with exit code 1
I have told the Python MySQL driver to use the utf-8 character set, but it seems to ignore this.
Any inputs would be highly appreciated.
The charset alias for UTF-8 in MySQL is utf8 (no hyphen).
See https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html for available charsets.
Note, if you need to use non-BMP Unicode points, such as emojis, use utf8mb4 for the connection charset and the varchar type.
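To check whether a given string needs utf8mb4, it is enough to look for code points above the Basic Multilingual Plane. Here is a small sketch; the helper name needs_utf8mb4 is made up for illustration and is not part of MySQLdb.

```python
def needs_utf8mb4(text):
    """Return True if text contains a code point outside the BMP (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

arabic_word = "كتاب"      # Arabic letters are inside the BMP
with_emoji = "كتاب 😀"    # U+1F600 is outside the BMP

print(needs_utf8mb4(arabic_word), needs_utf8mb4(with_emoji))

# Non-BMP characters take four bytes in UTF-8, one more than MySQL's utf8 stores per character.
print(len("😀".encode("utf-8")))
```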
There is a thing called a collation, which defines how the characters of a character set are compared and sorted for specific languages.
https://softwareengineering.stackexchange.com/questions/95048/what-is-the-difference-between-collation-and-character-set
I think you need to specify it when creating your database table or in the connection string. Refer to this:
store arabic in SQL database
More on the Python MySQL connection:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlconnection-set-charset-collation.html

Python mysql connector LIKE against unicode value

So here is my problem: I am trying to select a specific value from a table by comparing it with a unicode string. The value is also unicode. I am using mysql.connector. The server settings are all utf8 oriented. When I run the following query, I get an empty list. When I run it without the WHERE Title like '%s' part, I get the full set of values, and they are displayed properly in the output. The same query works on the command line on the server. The value is there for sure. What am I missing?
conn = sql.connect(host='xxxxxxx', user='xxx', password='xxx', database='db', charset="utf8")
cur = conn.cursor()
townQuery = (u"""SELECT * FROM Towns WHERE Title like '%s' """)
tqd = (u"%" +u"Серов"+u"%")
cur.execute(townQuery, tqd)
for i in cur:
    print i
When you use the 2-argument form of cur.execute (thus passing the arguments, tqd, to the parametrized sql, townQuery), the DB adaptor will quote the arguments for you. Therefore, remove the single quotes from around the %s in townQuery:
townQuery = u"""SELECT * FROM Towns WHERE Title like %s"""
tqd = [u"%Серов%"]
cur.execute(townQuery, tqd)
Also note that the second argument, tqd, must be a sequence such as a list or tuple. The square brackets around u"%Серов%" make [u"%Серов%"] a list. Parentheses around u"%Серов%" do NOT make (u"%Серов%") a tuple, because Python evaluates the expression in parentheses to a plain unicode string. To make it a tuple, add a comma before the closing parenthesis: (u"%Серов%",).
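The difference between the parenthesized string and the one-element tuple is easy to check, and the same query shape can be tried against an in-memory database. This is a sketch under assumptions: sqlite3 stands in for mysql.connector (so the placeholder is ? instead of %s), and the one-row Towns table is invented for the demonstration.

```python
import sqlite3

not_a_tuple = ("%Серов%")   # parentheses alone: still a str
one_tuple = ("%Серов%",)    # trailing comma: a tuple with one element
print(type(not_a_tuple).__name__, type(one_tuple).__name__)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Towns (Title TEXT)")
cur.execute("INSERT INTO Towns (Title) VALUES (?)", ("Серов",))

# No quotes around the placeholder: the driver handles the value itself.
cur.execute("SELECT Title FROM Towns WHERE Title LIKE ?", one_tuple)
rows = cur.fetchall()
print(rows)
conn.close()
```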
