Python psycopg2 not in utf-8 - python

I use Python to connect to my PostgreSQL database like this:
conn=psycopg2.connect(database="fedour", user="fedpur", password="***", host="127.0.0.1", port="5432")
That works without problems.
But when I run a query and print the cursor results, I get something like this:
"Fran\xc3\xa7ois" instead of "François", and it causes problems when I want to create an XML document from it.
I think it comes from my encoding, but I haven't found any solution. I tried encode('utf-8'), but it doesn't work.
I have also seen something like this, but it's only for MySQL:
MySQLdb.connect(charset='utf8', init_command='SET NAMES UTF8')
Can you help me? Thanks

Make sure you're using the right encoding by running print conn.encoding. If needed, you can set the client encoding with conn.set_client_encoding('UNICODE') or conn.set_client_encoding('UTF8').

Does the value get inserted correctly when the query is executed from the command prompt?
If yes, then the problem is with your cursor execution. Try cur.execute(u"querystring") (the u prefix makes the query a unicode string in Python 2).
If no, then you need to configure PostgreSQL to use UTF-8. See the documentation on character encoding in Postgres for how to set the default character encoding for a database and how on-the-fly conversion from one encoding to another works.
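As a small illustration of what the escaped output means, here is a minimal sketch in Python 3. The byte string is an assumed sample matching the value in the question, not real driver output. It shows that "Fran\xc3\xa7ois" is just the escaped display of correctly UTF-8 encoded bytes, and that decoding them gives the expected text:

```python
# Assumed sample: the UTF-8 bytes for the name from the question.
raw = b"Fran\xc3\xa7ois"

# repr() is what you see when printing a whole row or list of rows:
# non-ASCII bytes are shown as \x escapes.
escaped = repr(raw)

# Decoding the bytes as UTF-8 yields the intended text.
name = raw.decode("utf-8")

print(escaped)
print(name)
```

If the data decodes cleanly like this, the stored value is probably fine and only the display (or the missing decode step before building the XML) is at fault.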

Related

How to partially overwrite blob in an sqlite3 db in python via SQLAlchemy?

I have a db that contains a blob column with the binary representation as follows
The value that I'm interested in is encoded as a little-endian unsigned long long (8 byte) value in the marked bytes. Reading this value works fine like this:
p = session.query(Properties).filter((Properties.object_id==1817012) & (Properties.name.like("%OwnerUniqueID"))).one()
id = unpack("<Q", p.value[-8:])[0]
id in the above example is 1657266.
Now what I would like to do is the reverse. I have the row object p, I have a number in decimal format (using the same 1657266 for testing purposes) and I want to write that number in little-endian format to those same 8 bytes.
I've been trying to do so via an SQL statement:
UPDATE properties SET value = (SELECT substr(value, 1, length(value)-8) || x'b249190000000000' FROM properties WHERE object_id=1817012 AND name LIKE '%OwnerUniqueID%') WHERE object_id=1817012 AND name LIKE '%OwnerUniqueID%'
But when I do it like that I then can't read it anymore. At least not with SQLAlchemy. When I try the same code as above, I get the error message Could not decode to UTF-8 column 'properties_value' with text '☻' so it looks like it's written in a different format.
Interestingly using a normal select statement in DB Browser still works fine and the blob is still displayed exactly as in the screenshot above.
Now ideally I'd like to be able to write just those 8 bytes using the SQLAlchemy ORM but I'd settle for a raw SQL statement if that's what it takes.
I managed to get it to work with SQLAlchemy by basically reversing the process that I used to read it. In hindsight using the + to concatenate and the [:-8] to slice the correct part seems pretty obvious.
p = session.query(Properties).filter((Properties.object_id==1817012) & (Properties.name.like("%OwnerUniqueID"))).one()
p.value = p.value[:-8] + pack("<Q", 1657266)
By turning on ECHO for SQLAlchemy I got the following raw SQL statement:
UPDATE properties SET value=? WHERE properties.object_id = ? AND properties.name = ?
(<memory at 0x000001B93A266A00>, 1817012, 'BP_ThrallComponent_C.OwnerUniqueID')
Which is not particularly helpful if you want to do the same thing manually I suppose.
It's worth noting that the raw SQL statement in my question works not only when reading the value with DB Browser but also with the game client that uses the db in question. Only SQLAlchemy seems to have trouble, apparently because it tries to decode the value as UTF-8.
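For doing the same thing manually, here is a minimal sketch using the standard sqlite3 module with a bound parameter instead of SQLAlchemy. The in-memory table, its schema, and the placeholder header bytes are assumptions made up for the example. The last two queries hint at a likely cause of the UTF-8 error: in SQLite the || operator produces a TEXT value, so the concatenated result is no longer stored as a BLOB unless it is cast back.

```python
import sqlite3
from struct import pack, unpack

# Assumed stand-in for the real database: an in-memory table with a BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (object_id INTEGER, name TEXT, value BLOB)")

# Placeholder blob: some header bytes followed by an 8-byte little-endian value.
original = b"\x00\x01header" + pack("<Q", 1)
conn.execute(
    "INSERT INTO properties VALUES (?, ?, ?)",
    (1817012, "BP_ThrallComponent_C.OwnerUniqueID", original),
)

# Overwrite only the last 8 bytes and write the whole blob back as a parameter.
new_value = original[:-8] + pack("<Q", 1657266)
conn.execute(
    "UPDATE properties SET value = ? WHERE object_id = ? AND name LIKE ?",
    (new_value, 1817012, "%OwnerUniqueID"),
)

stored = conn.execute(
    "SELECT value FROM properties WHERE object_id = ?", (1817012,)
).fetchone()[0]
owner_id = unpack("<Q", stored[-8:])[0]

# The || operator yields TEXT; an explicit CAST turns the result back into a BLOB.
concat_type = conn.execute("SELECT typeof(x'01' || x'02')").fetchone()[0]
cast_type = conn.execute("SELECT typeof(CAST(x'01' || x'02' AS BLOB))").fetchone()[0]

print(owner_id, concat_type, cast_type)
```

Binding the bytes as a parameter keeps the column type as BLOB, which is also what the SQLAlchemy statement shown above does.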

Python Replace Quoted Values In External SQL Query

I use the simple query below to select from a table based on the date:
select * from tbl where date = '2019-10-01'
The simple query is part of a much larger query that extracts information from many tables on the same server. I don't have execute access on the server, so I can't install a stored procedure to make my life easier. Instead, I read the query into Python and try to replace certain values inside single quote strings, such as:
select * from tbl where date = '<InForceDate>'
I use a simple Python function (below) to replace the placeholder with another value like 2019-10-01, but the str.replace() function isn't replacing anything when I look at the output. However, I tried this with a placeholder that wasn't in quotes and it worked. I'm sure I'm missing something fundamental, but haven't uncovered why it works without quotes and fails with quotes.
Python:
def generate_sql(sql_path, inforce_date):
    # Open the path that was passed in (the original opened pd_sql_path,
    # a name that is not a parameter of this function).
    with open(sql_path, 'r') as sql_file:
        sql_string = sql_file.read()
    # str.replace matches the literal placeholder, quoted or not.
    sql_final = sql_string.replace('<InForceDate>', inforce_date)
    return sql_final
Can anyone point me in the right direction?
Nevermind folks -- problem solved, but haven't quite figured out why. File encoding is my guess.
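For reference, a minimal sketch showing that str.replace itself has no trouble with a placeholder inside single quotes. The temporary file stands in for the real SQL file, and the explicit utf-8 encoding is an assumption; if the real file was saved in another encoding, reading it with the wrong one could keep the placeholder from matching, which would fit the guess above.

```python
import os
import tempfile


def generate_sql(sql_path, inforce_date):
    # Read the query text, stating the encoding explicitly (assumed UTF-8 here).
    with open(sql_path, "r", encoding="utf-8") as sql_file:
        sql_string = sql_file.read()
    # Literal substitution; surrounding quotes do not matter to str.replace.
    return sql_string.replace("<InForceDate>", inforce_date)


# Hypothetical file contents, copied from the query in the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "query.sql")
    with open(path, "w", encoding="utf-8") as f:
        f.write("select * from tbl where date = '<InForceDate>'")
    result = generate_sql(path, "2019-10-01")

print(result)
```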

Python MySQL connector returns bytearray instead of regular string value

I am loading data from one table into pandas and then inserting that data into a new table. However, instead of a normal string value I am seeing a bytearray.
bytearray(b'TM16B0I8') when it should be TM16B0I8
What am I doing wrong here?
My code:
engine_str = 'mysql+mysqlconnector://user:pass@localhost/db'
engine = sqlalchemy.create_engine(engine_str, echo=False, encoding='utf-8')
connection = engine.connect()
th_df = pd.read_sql('select ticket_id, history_date', con=connection)
for row in th_df.to_dict(orient="records"):
    var_ticket_id = row['ticket_id']
    var_history_date = row['history_date']
    query = 'INSERT INTO new_table(ticket_id, history_date)....'
For some reason the Python MySQL connector only returns bytearrays (more info in "How return str from mysql using mysql.connector?"), but you can decode them into unicode strings with:
var_ticket_id = row['ticket_id'].decode()
var_history_date = row['history_date'].decode()
Make sure you are using the right collation and encoding. I happened to be using UTF8MB4_BIN for one of my website db tables. Changing it to utf8mb4_general_ci did the trick.
Producing a bytearray is now the expected behaviour.
It changed with mysql-connector-python 8.0.24 (2021-04-20). According to the v8.0.24 release notes, the "Binary columns were returned as strings instead of 'bytes' or 'bytearray'" behaviour was a bug that was fixed in that release.
So producing a Python bytearray is the correct behaviour if the database column is a binary type (e.g. binary or varbinary). Previously it produced a Python string, but now it produces a bytearray.
So either change the data type in the database to a non-binary data type, or convert the bytearray to a string in your code. If the column is nullable, you'll have to check for that first, since attempting to invoke the decode() method on None would produce an error. You'll also have to be sure the bytes represent a valid string in the character encoding being used for the decoding/conversion.
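The conversion described above, including the check for nullable columns, could be sketched like this. The helper name to_text and the default of UTF-8 are assumptions for illustration; use whatever encoding the column data is really stored in.

```python
def to_text(value, encoding="utf-8"):
    """Return a str for bytes-like values, passing None and other types through."""
    if value is None:
        # Nullable column: nothing to decode.
        return None
    if isinstance(value, (bytes, bytearray)):
        # Raises UnicodeDecodeError if the bytes are not valid in this encoding.
        return bytes(value).decode(encoding)
    return value


ticket_id = to_text(bytearray(b"TM16B0I8"))
missing = to_text(None)
print(ticket_id, missing)
```

Applied to the loop in the question, this would mean wrapping each row value, for example to_text(row['ticket_id']).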
Much easier: see
How to return str from MySQL using mysql.connector?
Adding mysql-connector-python==8.0.17 to requirements.txt resolved this issue for me.
Run "pip install mysql-connector-python" from a terminal.

pickle unicode strings with non-ascii characters to mysql in django

Consider that I have a dictionary that I want to store in the db using Python's pickle.
My question is: which Django model field should I use?
So far I've been using a CharField, but there seems to be an error:
I pickle a u'\xe9' (i.e. 'é'), and I get:
Incorrect string value: '\xE1, ist...' for column 'edition' at row 1
(the ", ist..." is there because I have more text after the 'é').
I'm using
data = dict()
data['foo'] = input_that_has_the_caracter
to_save_in_db = cPickle.dumps(data)
Should I use a binary field and pickle with a protocol that uses binary? I would have to change the db in order to do that, so it is better to be sure first...
You should check whether you are using a proper encoding for your table AND column in your database backend (I'm assuming MySQL since your error message seems to come from it). In MySQL, a column can have a different encoding than its table. Check that both are UTF-8.
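If changing the column to a binary type is not an option, one possible workaround is to make the pickled bytes ASCII-safe before storing them in a text column. This is a minimal Python 3 sketch using base64; the helper names dumps_text and loads_text and the sample dictionary are made up for the example, and it is not something the answers above propose.

```python
import base64
import pickle


def dumps_text(obj):
    # Pickle to bytes, then base64-encode so the result is plain ASCII text.
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def loads_text(text):
    # Reverse the steps: base64-decode, then unpickle.
    # Only unpickle data that your own application wrote.
    return pickle.loads(base64.b64decode(text.encode("ascii")))


data = {"foo": "\xe9dition, ist"}
to_save_in_db = dumps_text(data)
restored = loads_text(to_save_in_db)
print(to_save_in_db)
print(restored == data)
```

The stored string is roughly a third longer than the raw pickle, so the text field needs to be sized accordingly.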

Getting error when INSERT into MySQL

_mysql_exceptions.Warning: Incorrect string value: '\xE7\xB9\x81\xE9\xAB\x94...' for column 'html' at row 1
import random
import urllib2

def getSource(theurl, moved = 0):
    if moved == 1:
        theurl = urllib2.urlopen(theurl).geturl()
    urlReq = urllib2.Request(theurl)
    urlReq.add_header('User-Agent',random.choice(agents))
    urlResponse = urllib2.urlopen(urlReq)
    htmlSource = urlResponse.read()
    return htmlSource
new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource)
new_u.save()
Why is this happening?
I am basically downloading the source of a page at a URL and then saving it to a database using Django.
It only happens sometimes; at other times it works fine.
Edit: it seems like I have to set the database to UTF-8? What is the command to do that?
You basically need to ensure a proper string encoding. For example, if the string you provide to Django is not UTF-8 encoded, some characters can't be resolved.
Some helpful advice on how to find the encoding of the requested page can be found here: urllib2 read to Unicode
There are 2 ways to go if you want to alter the character set in MySQL.
First is the default of the database, see MySQL Alter database, and the second is per-table: MySQL Alter Table.
The database gives the default charset for, I believe, new tables. This can be overridden on a per-table basis, which you need to do since you already have tables. "utf8" is a supported character set.
Also have a look at Blog about UTF8 with django and MySQL.
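To make the two points above more concrete, here is a minimal Python 3 sketch. It shows that the bytes from the error message are valid UTF-8 text that a single-byte charset cannot represent, and what the two kinds of ALTER statements look like. latin-1 is only an assumed example of a non-UTF-8 column charset (the question doesn't say what the column uses), and mydb and mytable are hypothetical names.

```python
# The bytes quoted in the error message.
raw = b"\xe7\xb9\x81\xe9\xab\x94"

# They decode cleanly as UTF-8 (two CJK characters), so the page data itself is fine.
text = raw.decode("utf-8")

# A single-byte charset such as latin-1 (assumed example) cannot hold these characters.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# Hypothetical database and table names; run these in a MySQL client.
alter_database = "ALTER DATABASE mydb CHARACTER SET utf8"
alter_table = "ALTER TABLE mytable CONVERT TO CHARACTER SET utf8"

print(text, fits_latin1)
print(alter_database)
print(alter_table)
```

The first statement only changes the default for tables created later; the second converts an existing table and its text columns.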
