psycopg2.DataError: invalid byte sequence

psycopg2.DataError: invalid byte sequence - python

Here's what I am trying to do. I am trying to input an .csv file to be input into postgres database. I am using psycopg2 and cur_copy_export to do this. However, i'm hit with the error as below. What should i do to overcome this error?
Thanks in Advance
Error:
cur.copy_expert(sql=copy_sql, file=myfile)
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xdf 0x65
CONTEXT: COPY agents, line 1117
My code as below:
//open file from Amazon S3 Bucket
opener = urllib.URLopener()
myurl=("Amazon S3 bucket URL" + srcbucketid + "/" + file_name)
myfile=opener.open(myurl)
copy_sql = """ COPY agents (
UniqueId,
Code,
CountryCode,
DefaultCommissionRate,
ReportingName)
FROM stdin WITH CSV HEADER DELIMITER as ',' QUOTE '\b' NULL AS ''"""
cur.copy_expert(sql=copy_sql, file=myfile)
My database encoding is in "UTF8" format. I cannot change it as its a production database for now.

copy_source = {'Bucket': srcbucketid, 'Key': file_name}
client.copy(copy_source, srcbucketid, 'tmp/{}'.format(file_name))
key = ('s3://'+srcbucketid+'tmp/'+file_name)
print(key)
BLOCKSIZE = 1024*1024
with s3.open('s3://'+srcbucketid+'/'+file_name, 'rb') as inf:
with s3.open('s3://'+srcbucketid+'/tmp/'+file_name, 'wb') as ouf:
while True:
data = inf.read(BLOCKSIZE)
if not data: break
converted = data.decode('latin1').encode('utf-8')
ouf.write(converted)

Related

How to create a csv file virtually without any physical file / path defined?

Need to create a csv file and convert it into byte like data for sending as EDI doc. I am trying to achieve this without having a physical file because location/path is unknown. Let me know if there is anyway we could achieve.
with open(
"/home/some path/*.dat", "r+", newline="\n"
) as write_f:
data_file = csv.writer(write_f, delimiter=';')
header_vals = ["header values"]
query = """data fetching query"""
data_file.writerow(header_vals)
self.env.cr.execute(query)
data_vals = self.env.cr.fetchall()
data_file.writerows(data_vals)
po_data = write_f.read(1024)
return po_data
Try 1: Instead of path, tried IO objects(BytesIO/StringIO)
data_file = BytesIO()
data_write = csv.writer(data_file, delimiter=';')
header_vals = ["header values"]
query = """data fetching query"""
data_write.writerow(header_vals)
self.env.cr.execute(query)
data_vals = self.env.cr.fetchall()
data_write.writerows(data_vals)
Received the error at writerow: TypeError: a bytes-like object is required, not 'str'

BytesIO behaves like a file in binary (!) mode. You need to write bytes to it.
But a csv.writer cannot write bytes, it only writes strings. That's the error message you see.
from io import StringIO
buffer = StringIO()
writer = csv.writer(buffer, delimiter=';')
header_vals = ['column_1', 'column_2']
writer.writerow(header_vals)
print(buffer.getvalue())
# => 'column_1;column_2\r\n'

Writing string output to csv file. Getting error when working with temporary file

I'm iterating through an excel file that I'm pulling from S3. I want to append this data into one file. The data isn't enough to exceed lambda memory limits so I'm saving it into a variable and then converting the string into csv file that I'm looking to upload to S3. When I run a variation of this code locally it works perfectly, not sure what's going wrong when I'm converting it to AWS.
import csv
import boto3
import urllib3
import tempfile
s3 = boto3.client('s3')
bucket = os.environ['S3_BUCKET']
http = urllib3.PoolManager()
def lambda_handler(event, context):
file = readS3('example.xlsx') # load file with Boto3
latest_scan = openpyxl.load_workbook(io.BytesIO(file), data_only=True)
sh = latest_scan.active
a = []
for row in sh['A']:
r5 = http.request(
'GET',
'https://example.com/api/' + str(row.value),
headers={
'Accept': 'text/csv'
}
)
a.append(r5.data.decode('utf-8'))
s = ''.join(a)
temp = tempfile.TemporaryFile(mode='w+', suffix='.csv')
with open(temp, 'w', encoding="utf-8") as f:
for line in s:
f.write(line)
temp.seek(0)
s3.put_object(temp, Bucket = bucket, Key = 'test.csv')
temp.close()
I'm getting:
"errorMessage": "expected str, bytes or os.PathLike object, not _io.TextIOWrapper",
"errorType": "TypeError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line in lambda_handler\n with open(temp,
'w', encoding=\"utf-8\") as f:\n"
]

tempfile.TemporaryFile() opens the file, it doesn't return a filename. So just assign that to f.
with tempfile.TemporaryFile(mode='w+', suffix='.csv', encoding="utf-8") as f:

Store Gtk.Textbuffer in SQL database. Encoding troubles

I'm working on a note taking app using python2/Gtk3/Glade.
The notes are stored in a MySQL Database and displayed in a TextView widget.
I can load/store/display plain text fine. However I want the ability to add images to the note page, and store them in the Database.so the data has to be serialised and I'm having some trouble figuring out how to encode/decode the serialised data going in and out of the Database. I'm getting unicode start byte errors. If was working with files I could just open the file in binary mode, but I'm storing as a string in a Database. I've tried encoding/decoding as UTF-8 and ASCII using bytes() and string.encode()[see the sample code below] and a few other ways but none work.
I am using this function to add the image to the textview buffer:
def _AddImagetoNode(self,oWidget):
filenm = None
seliter = self.GetTreeSelection(self.treeview)
filenm = self.FileOpenDiag("Select an Image To Insert.","Image","*.png,*.jpg,*.bmp")
if filenm == None:
return()
#filenm = "/home/drift/Pictures/a.png"
buf = self.dataview.get_buffer()
pixbuf = GdkPixbuf.Pixbuf.new_from_file(filenm)
#pixbuf.scale_simple(dest_width, dest_height, gtk.gdk.INTERP_BILINEAR)
buf.insert_pixbuf(buf.get_end_iter(), pixbuf)
self.dataview.set_buffer(buf)
self.dataview.show()
This is the function that stores the textview buffer:
def SaveDataView(self):
global DataViewNode
global DataViewIsImage
if len(self.GetProjectName()) == 0:
return()
buf = self.dataview.get_buffer()
format = buf.register_serialize_tagset()
data2 = buf.serialize(buf, format, buf.get_start_iter(), buf.get_end_iter())
#convert bytes(data) to string
data = data2.decode(encoding='UTF-8') #<< i think my problem is here
print("save b4 decode >>>>>>:%s"%data2)
sql = "UPDATE " + self.GetProjectName() + " SET tDataPath=%s WHERE tNodeID=%s"
val = (data, DataViewNode)
self.cursor.execute(sql,val)
self.mariadb_connection.commit()
This is the function that loads the Buffer:
def UpdateDataView(self, nodeid):
global DataViewNode
#global DataViewIsFile
DataViewNode=nodeid
if self.GetProjectName() != None and DataViewNode != None:
self.dataview.set_sensitive(True)
else:
self.dataview.set_sensitive(False)
self.dataview.show()
return()
buf = self.dataview.get_buffer()
buf.set_text('')
enc = self.DbGetNodeData(nodeid)
#convert string(enc) to bytes
data = enc.encode(encoding='UTF-8')#<<< i think my problem is here
print("update after decode >>>>>>>>>: %s"%data)
########### load
format = buf.register_deserialize_tagset()
buf.deserialize(buf, format, buf.get_end_iter(),data)
#buf.set_text(enc)
self.dataview.set_buffer(buf)
self.dataview.show()
I'm using mysql.connector to connect to a mariadb.
This is the sql connection string:
self.mariadb_connection = mariadb.connect(user='box', password='box', host='localhost', database='Boxer',charset='utf8')
This is the error im getting.
Traceback (most recent call last): File "Boxer.py", line 402, in
_TreeSelectionChanged
self.SaveDataView() File "Boxer.py", line 334, in SaveDataView
data = data2.decode(encoding='UTF-8') UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 174: invalid start byte
Traceback (most recent call last): File "Boxer.py", line 398, in
_DataViewLostFocus
self.SaveDataView() File "Boxer.py", line 334, in SaveDataView
data = data2.decode(encoding='UTF-8') UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 174: invalid start byte
With this code I can add/edit plain text in the text view and successfully save/load it but as soon as I add the image, I'm get the encoding errors. Any help would be appreciated.

Here is a more complete example:
def example (self):
#retrieve info from first textview
buf = self.builder.get_object('textbuffer1')
format = buf.register_serialize_tagset()
data = buf.serialize(buf, format, buf.get_start_iter(), buf.get_end_iter())
#run db update to prove it can be inserted into a database
db = psycopg2.connect(database= 'silrep_restore3', host='192.168.0.101',
user='postgres', password = 'true',
port = '5432')
c = db.cursor()
c.execute("UPDATE products SET byt = %s WHERE id = 1", (psycopg2.Binary(data),))
#append info to second treeview as a proof of concept
c.execute("SELECT byt FROM products WHERE id = 1")
data = c.fetchone()[0]
buf = self.builder.get_object('textbuffer2')
format = buf.register_deserialize_tagset()
buf.deserialize(buf, format, buf.get_end_iter(), data)
Since you are using MySQL, I recommend reading this article about inserting and retrieving data like you are.
For my example I used a bytea column. In MySQL this is may be a BLOB or BINARY type.
P.S. Sorry for not having a complete MySQL example in my answer. I would have posted a comment, but comments are pathetic for proper formatting.

Got it workings. thanks to theGtknerd your answer was the key. for anyone else having trouble with this i ended up using the BLOB type for the MySQL field type for the column im working with. I tried BINARY[it returnd malformed serialize data] AND VARBINARY [wouldnt even allow me to create the table] so i ended up using the LONGBLOB type.
here is the working code for anyone that needs it.
def UpdateDataView(self, nodeid):
global DataViewNode
#global DataViewIsFile
DataViewNode=nodeid
if self.GetProjectName() != None and DataViewNode != None:
self.dataview.set_sensitive(True)
else:
self.dataview.set_sensitive(False)
self.dataview.show()
return()
buf = self.dataview.get_buffer()
buf.set_text('')
data = self.DbGetNodeData(nodeid)
if data =='':
return()
format = buf.register_deserialize_tagset()
buf.deserialize(buf, format, buf.get_end_iter(),data)
self.dataview.set_buffer(buf)
self.dataview.show()
def SaveDataView(self):
global DataViewNode
global DataViewIsImage
if len(self.GetProjectName()) == 0:
return()
buf = self.dataview.get_buffer()
enc = buf.get_text(buf.get_start_iter(),buf.get_end_iter(),False)
self.AddData2Db(DataViewNode,enc)
format = buf.register_serialize_tagset()
data = buf.serialize(buf, format, buf.get_start_iter(), buf.get_end_iter())
sql = "UPDATE " + self.GetProjectName() + " SET tDataPath=%s WHERE tNodeID=%s"
val = (data, DataViewNode)
self.cursor.execute(sql,val)
self.mariadb_connection.commit()
and im using this to create the table
sql = "CREATE TABLE %s (tParentNodeID TEXT,tNodeTxt TEXT,tNodeID TEXT,tDataPath LONGBLOB)" %pName
self.cursor.execute(sql)
self.mariadb_connection.commit()

Loading a json file in python

I've got multiple file to load as JSON, they are all formatted the same way but for one of them I can't load it without raising an exception. This is where you can find the file:
File
I did the following code:
def from_seed_data_extract_summoners():
summonerIds = set()
for i in range(1,11):
file_name = 'data/matches%s.json' % i
print file_name
with open(file_name) as data_file:
data = json.load(data_file)
for match in data['matches']:
for summoner in match['participantIdentities']:
summonerIds.add(summoner['player']['summonerId'])
return summonerIds
The error occurs when I do the following: json.load(data_file). I suppose there is a special character but I can't find it and don't know how to replace it. The error generated is:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xeb in position 6: invalid continuation byte
Do you know how I can get ride of it?

Your JSON is trying to force the data into unicode, not just a simple string. You've got some embedded character (probably a space or something not very noticable) that is not able to be forced into unicode.
How to get string objects instead of Unicode ones from JSON in Python?
That is a great thread about making JSON objects more manageable in python.

replace file_name = 'data/matches%s.json' % i with file_name = 'data/matches%i.json' % i
the right syntax is data = json.load(file_name) and not -
with open(file_name) as data_file:
data = json.load(data_file)
EDIT:
def from_seed_data_extract_summoners():
summonerIds = set()
for i in range(1,11):
file_name = 'data/matches%i.json' % i
with open(file_path) as f:
data = json.load(f, encoding='utf-8')
for match in data['matches']:
for summoner in match['participantIdentities']:
summonerIds.add(summoner['player']['summonerId'])
return summonerIds

Try:
json.loads(unicode(data_file.read(), errors='ignore'))
or :
json.loads(unidecode.unidecode(unicode(data_file.read(), errors='ignore')))
(for the second, you would need to install unidecode)

try :
json.loads(data_file.read(), encoding='utf-8')

Simple encryption and decryption of strings by using bz2 in Python

I am trying to create a file with encrypted username, password and computer name when the user logs in. And the same data I want to use again to authenticate the data but by decrypting them first. I am trying to use something that is built in python and simple.
import os
import bz2
os.chdir("D:/test")
encrypted_username = bz2.compress('username')
encrypted_password = bz2.compress('password')
computer_name = os.environ['COMPUTERNAME']
encrypted_computer_name = bz2.compress(computer_name)
f = open("Session.dat", "w")
f.write(encrypted_username + '\n')
f.write(encrypted_password + '\n')
f.write(encrypted_computer_name)
f.close()
f = open("Session.dat", "r")
data = f.read()
d_data = bz2.decompress(data)
f.close()
print(d_data)
But when I decrypt the data in the file and print it. I get the answer as below. Why am I not getting the password and computer name?? Thank you.
username

The code compressed the strings separately. You should read all lines and decompress them line by line as alecxe commented. But that is not practical because compressed data could contians newline(s).
Instead combine strings (In the following code, I used NULL byte \0 as separator), then compress the combined string.
Decompress: After decompress, split combined strings using the same separator.
import os
import bz2
#os.chdir("D:/test")
username = 'username'
password = 'password'
computer_name = os.environ['COMPUTERNAME']
compressed = bz2.compress(username + '\0' + password + '\0' + computer_name)
with open("Session.dat", "wb") as f:
f.write(compressed)
with open("Session.dat", "rb") as f:
d_data = bz2.decompress(f.read())
print(d_data.split('\0'))
BTW, you should use binary mode to read/write compressed data.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

psycopg2.DataError: invalid byte sequence - python

Related

How to create a csv file virtually without any physical file / path defined?

Writing string output to csv file. Getting error when working with temporary file

Store Gtk.Textbuffer in SQL database. Encoding troubles

Loading a json file in python

Simple encryption and decryption of strings by using bz2 in Python

Categories

Resources