MySQL JSON Query sends random numbers - python

I'm writing a MySQL Query in Python using pymysql to send JSON data to a MySQL table. When it sends the data, the following results are produced.
| id | data |
+----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 21 | 0x7B226775696C6473223A207B22383538393137313431303633343031353132223A207B226D656D62657273223A207B22343130373936333038323431303535373436223A207B22706F696E7473223A203137302C2022726166666C65735F776F6E223A20307D7D2C2022726166666C6573223A20307D7D7D |
The code I used to send the data is the following:
self.data_str = json.dumps(self.data)
sql = "insert into jsondata ( data) values ('" + self.data_str + "') "
mysql.exec_sql(sql)
mysql.close_db()
The exec_sql function is:
def exec_sql(self, sql):
    # sql is insert, delete or update statement
    cursor = self.db.cursor()
    try:
        cursor.execute(sql)
        # commit sql to mysql
        self.db.commit()
        cursor.close()
        return True
    except:
        self.db.rollback()
        return False
An example line of JSON data is
{"guilds": {"853317141063401512": {"members": {"410846308241055746": {"points": 250, "raffles_won": 0}}, "raffles": 0}}}
My SQL table was setup as follows:
| Field | Type | Null | Key | Default | Extra |
+-------+--------+------+-----+---------+----------------+
| id | int(6) | NO | PRI | NULL | auto_increment |
| data | blob | YES | | NULL | |
+-------+--------+------+-----+---------+----------------+

The bytes are not random. They are the hex representation of ASCII bytes in your JSON string. Observe:
mysql> select unhex('7B226775696C6473223A207B22383538393137313431303633343031353132223A207B226D656D62657273223A207B22343130373936333038323431303535373436223A207B22706F696E7473223A203137302C2022726166666C65735F776F6E223A20307D7D2C2022726166666C6573223A20307D7D7D') as j;
+--------------------------------------------------------------------------------------------------------------------------+
| j |
+--------------------------------------------------------------------------------------------------------------------------+
| {"guilds": {"858917141063401512": {"members": {"410796308241055746": {"points": 170, "raffles_won": 0}}, "raffles": 0}}} |
+--------------------------------------------------------------------------------------------------------------------------+
What you're seeing is that when you store a JSON string in a binary column (BLOB), MySQL treats it as raw bytes rather than text, and the mysql client prints the hex encoding of those bytes when you query it.
If you want to store JSON, then use the JSON data type, not BLOB.
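The same decoding can be done on the Python side. The following is a minimal sketch, not part of the original answer: the helper name blob_hex_to_json is hypothetical, and the sample dictionary is simply the example line from the question. It round-trips the JSON through the hex form that the mysql client prints for BLOB columns.

```python
import json


def blob_hex_to_json(hex_text):
    """Decode a hex dump such as '0x7B22...' back into a Python object.

    Hypothetical helper for illustration only.
    """
    if hex_text.startswith("0x"):
        hex_text = hex_text[2:]
    # bytes.fromhex accepts upper- and lower-case hex digits.
    return json.loads(bytes.fromhex(hex_text).decode("utf-8"))


# The example JSON line from the question.
sample = {"guilds": {"853317141063401512": {"members": {
    "410846308241055746": {"points": 250, "raffles_won": 0}}, "raffles": 0}}}

# Build the hex form the way the client displays a BLOB value.
hex_text = "0x" + json.dumps(sample).encode("utf-8").hex().upper()

decoded = blob_hex_to_json(hex_text)
print(hex_text[:18])
print(decoded == sample)
```

The hex text starts with 7B22, the ASCII codes for { and ", which is the same prefix that appears in the query output above.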

Related

What is the Postgres _text type?

I have a Postgres table with a _text type (note the underscore) and am unable to determine how to insert the string [] into that table.
Here is my table definition:
CREATE TABLE public.newtable (
column1 _text NULL
);
I have the postgis extension enabled:
CREATE EXTENSION IF NOT EXISTS postgis;
And my python code:
conn = psycopg2.connect()
conn.autocommit = True
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
rows = [("[]",)]
insert_query = f"INSERT INTO newtable (column1) values %s"
psycopg2.extras.execute_values(cur, insert_query, rows, template=None, page_size=100)
This returns the following error:
psycopg2.errors.InvalidTextRepresentation: malformed array literal: "[]"
LINE 1: INSERT INTO newtable (column1) values ('[]')
^
DETAIL: "[" must introduce explicitly-specified array dimensions.
How can I insert this data? What does this error mean? And what is a _text type in Postgres?
Pulling my comments together:
CREATE TABLE public.newtable (
column1 _text NULL
);
--_text gets transformed into text[]
\d newtable
Table "public.newtable"
Column | Type | Collation | Nullable | Default
---------+--------+-----------+----------+---------
column1 | text[] | | |
insert into newtable values ('{}');
select * from newtable ;
column1
---------
{}
In Python:
import psycopg2
con = psycopg2.connect(dbname="test", host='localhost', user='postgres')
cur = con.cursor()
cur.execute("insert into newtable values ('{}')")
con.commit()
cur.execute("select * from newtable")
cur.fetchone()
([],)
cur.execute("truncate newtable")
con.commit()
cur.execute("insert into newtable values (%s)", [[]])
con.commit()
cur.execute("select * from newtable")
cur.fetchone()
([],)
According to the psycopg2 docs on type adaptation, Postgres arrays are adapted to Python lists and vice versa.
UPDATE
You can find the _text type in the Postgres system catalog pg_type. In psql:
\x
Expanded display is on.
select * from pg_type where typname = '_text';
-[ RECORD 1 ]--+-----------------
oid | 1009
typname | _text
typnamespace | 11
typowner | 10
typlen | -1
typbyval | f
typtype | b
typcategory | A
typispreferred | f
typisdefined | t
typdelim | ,
typrelid | 0
typelem | 25
typarray | 0
typinput | array_in
typoutput | array_out
typreceive | array_recv
typsend | array_send
typmodin | -
typmodout | -
typanalyze | array_typanalyze
typalign | i
typstorage | x
typnotnull | f
typbasetype | 0
typtypmod | -1
typndims | 0
typcollation | 100
typdefaultbin | NULL
typdefault | NULL
typacl | NULL
Refer to the pg_type documentation for what each column means. One clue is the typcategory of A, which "Table 52.63. typcategory Codes" in that documentation maps to array types. The typinput, typoutput, etc. values (array_in, array_out, ...) are another, and typelem = 25 is the OID of the text type, i.e. the element type of the array.

Python 3 - How do I extract data from SQL database and process the data and append to pandas dataframe row by row?

I have a MySQL database, its columns are:
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| id | int unsigned | NO | PRI | NULL | auto_increment |
| artist | text | YES | | NULL | |
| title | text | YES | | NULL | |
| album | text | YES | | NULL | |
| duration | text | YES | | NULL | |
| artistlink | text | YES | | NULL | |
| songlink | text | YES | | NULL | |
| albumlink | text | YES | | NULL | |
| instrumental | tinyint(1) | NO | | 0 | |
| downloaded | tinyint(1) | NO | | 0 | |
| filepath | text | YES | | NULL | |
| language | json | YES | | NULL | |
| genre | json | YES | | NULL | |
| style | json | YES | | NULL | |
| artistgender | text | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
I need to extract data from it and process the data and add the data to a pandas DataFrame.
I know how to extract data from SQL database, and I have already implemented a way to pass the data to DataFrame, but it is extremely slow (about 30 seconds), whereas when I used a flat list of namedtuples the operation is tremendously faster (under 3 seconds).
Specifically, filepath defaults to NULL unless the file is downloaded (currently none of the songs are downloaded), so Python receives None for it, and I need that value to become ''.
Because MySQL doesn't have a true BOOLEAN type, I also need to cast the received ints to bool.
The language, genre and style fields are tags stored as JSON lists, and they are all currently NULL. When Python gets them they are strings, which I need to turn into lists using json.loads; if they are None, I need to append empty lists instead.
This is my inefficient solution to the problem:
import json
import mysql.connector
from pandas import *
fields = {
    "artist": str(),
    "album": str(),
    "title": str(),
    "id": int(),
    "duration": str(),
    "instrumental": bool(),
    "downloaded": bool(),
    "filepath": str(),
    "language": list(),
    "genre": list(),
    "style": list(),
    "artistgender": str(),
    "artistlink": str(),
    "albumlink": str(),
    "songlink": str(),
}
conn = mysql.connector.connect(
    user="Estranger", password=PWD, host="127.0.0.1", port=3306, database="Music"
)
cursor = conn.cursor()
def proper(x):
    return x[0].upper() + x[1:]
def fetchdata():
    cursor.execute("select {} from songs".format(', '.join(list(fields))))
    data = cursor.fetchall()
    dataframes = list()
    for item in data:
        entry = list(map(proper, item[0:3]))
        entry += [item[3]]
        for j in range(4, 7):
            cell = item[j]
            if isinstance(cell, int):
                entry.append(bool(cell))
            elif isinstance(cell, str):
                entry.append(cell)
        if item[7] is not None:
            entry.append(item[7])
        else:
            entry.append('')
        for j in range(8, 11):
            entry.append(json.loads(item[j])) if item[j] is not None else entry.append([])
        entry.append(item[11])
        entry += item[12:15]
        df = DataFrame(fields, index=[])
        row = Series(entry, index = df.columns)
        df = df.append(row, ignore_index=True)
        dataframes.append(df)
    songs = concat(dataframes, axis=0, ignore_index=True)
    songs.sort_values(['artist', 'album', 'title'], inplace=True)
    return songs
Currently there are 4464 songs in the database and the code takes about 30 seconds to finish.
I sorted my SQL database by artist and title, but for a QTreeWidget I need to re-sort the entries by artist, album and title. MySQL sorts data differently from Python, and I prefer Python's sorting.
In my testing, df.loc and df = df.append() are slow while pd.concat is fast. However, I don't know how to create single-row DataFrames from flat lists instead of a dictionary, whether there is something faster than pd.concat, or whether the operations in the for loop can be vectorized.
How can my code be improved?
I figured out how to create a DataFrame with a list of lists and specify column names, and it is tremendously faster, but I still don't know how to also specify the data types elegantly without the code throwing errors...
def fetchdata():
    cursor.execute("select {} from songs".format(', '.join(list(fields))))
    data = cursor.fetchall()
    for i, item in enumerate(data):
        entry = list(map(proper, item[0:3]))
        entry += [item[3]]
        for j in range(4, 7):
            cell = item[j]
            if isinstance(cell, int):
                entry.append(bool(cell))
            elif isinstance(cell, str):
                entry.append(cell)
        if item[7] is not None:
            entry.append(item[7])
        else:
            entry.append('')
        for j in range(8, 11):
            entry.append(json.loads(item[j])) if item[j] is not None else entry.append([])
        entry.append(item[11])
        entry += item[12:15]
        data[i] = entry
    songs = DataFrame(data, columns=list(fields), index=range(len(data)))
    songs.sort_values(['artist', 'album', 'title'], inplace=True)
    return songs
And I still need the type conversions, they are already pretty fast, but they don't look elegant.
You could make a list of conversion functions for each column:
funcs = [
str.capitalize,
str.capitalize,
str.capitalize,
int,
str,
bool,
bool,
lambda v: v if v is not None else '',
lambda v: json.loads(v) if v is not None else [],
lambda v: json.loads(v) if v is not None else [],
lambda v: json.loads(v) if v is not None else [],
str,
str,
str,
str,
]
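Now you can apply the matching function to each field of every row. Note that str.capitalize also lower-cases the rest of the string, unlike the proper function in the question, and that str turns a NULL (None) into the string 'None', so swap those in if you need the original behaviour:
for i, item in enumerate(data):
    row = [func(field) for field, func in zip(item, funcs)]
    data[i] = row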
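As a rough, self-contained sketch of the per-column converter idea (no database or pandas needed to run it): the helper names or_empty, json_list and convert_row are hypothetical, the sample row is made up for illustration, and proper mirrors the helper from the question. The column order is assumed to match the fields dictionary in the question.

```python
import json


def proper(text):
    """Upper-case only the first character, like the question's helper."""
    return text[:1].upper() + text[1:]


def or_empty(value):
    """Turn a NULL (None) into an empty string, keep other values."""
    return value if value is not None else ""


def json_list(value):
    """Parse a JSON list, or return an empty list for NULL (None)."""
    return json.loads(value) if value is not None else []


# One converter per column, in the order of the question's `fields` dict:
# artist, album, title, id, duration, instrumental, downloaded, filepath,
# language, genre, style, artistgender, artistlink, albumlink, songlink
CONVERTERS = [
    proper, proper, proper, int, str, bool, bool, or_empty,
    json_list, json_list, json_list,
    or_empty, or_empty, or_empty, or_empty,
]


def convert_row(row):
    """Apply the matching converter to each cell of one fetched row."""
    return [func(cell) for cell, func in zip(row, CONVERTERS)]


# Made-up row shaped like a tuple returned by cursor.fetchall().
sample_row = ("some artist", "some album", "some title", 1, "4:19",
              0, 0, None, None, '["rock"]', None, None, None, None, None)

converted = convert_row(sample_row)
print(converted)
```

A list of such converted rows can then be handed to DataFrame(rows, columns=list(fields)) in one call, as in the faster version from the question.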
For the first part of the question, for generic database 'history':
import pymysql
# open database (keyword arguments make the meaning of each value explicit)
connection = pymysql.connect(host="localhost", user="root", password="123456", database="blue")
# prepare a cursor object using cursor() method
cursor = connection.cursor()
# prepare SQL command
sql = "SELECT * FROM history"
try:
    cursor.execute(sql)
    data = cursor.fetchall()
    print("Last row uploaded", list(data[-1]))
except (pymysql.Error, IndexError):
    # IndexError covers the case where the table is empty
    print("Error: unable to fetch data")
# disconnect from server
connection.close()
You can simply fetch the data from the table and create a DataFrame using pandas.
import pymysql
import pandas as pd
# Fill in your own connection details; 3306 is MySQL's default port.
conn = pymysql.connect(host="", user="", password="", database="", port=3306, connect_timeout=10)
cursor = conn.cursor()
sql = "SELECT * FROM schema.table_name;"
cursor.execute(sql)
# cursor.description has one entry per column; its first item is the column name
columns = [col[0] for col in cursor.description]
data = pd.DataFrame(list(cursor.fetchall()), columns=columns)
conn.close()
# You can go ahead and create CSV text from this DataFrame
csv_gen = data.to_csv(index=False)

How to pass id from flask app to mysql database

I have a flask application that is connected to the MySQL database.
NOTE
database name = evaluation
table name = evaluation
columns = eval_id, eval_name, date
I have an 'evaluation table' with field eval_id, eval_name and date in it.
mysql> use evaluation;
mysql> describe evaluation;
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| eval_id | int(11) | NO | PRI | NULL | |
| eval_name | varchar(20) | NO | | NULL | |
| date | datetime(6) | NO | | NULL | |
+-----------+-------------+------+-----+---------+-------+
How can I write an API to get a particular evaluation by its id?
I tried the below, but it doesn't work.
@app.route('/getEval/<int:eval_id>', methods=['GET'])
def getEvalByID(eval_id):
    cur.execute('''select * from evaluation.evaluation where eval_id=eval_id''')
    res = cur.fetchall()
    return jsonify({'test':str(res)})
How can I correct this and get only the evaluation based on the eval_id mentioned in the app.route.
You need to pass the value of the eval_id variable into the query instead of writing the name eval_id inside the SQL string. As written, the condition eval_id=eval_id compares the column with itself, so it matches every row.
@app.route('/getEval/<int:eval_id>', methods=['GET'])
def getEvalByID(eval_id):
    cur.execute('select * from evaluation.evaluation where eval_id=' + str(eval_id))
    res = cur.fetchall()
    return jsonify({'test':str(res)})
Try with cur.execute('select * from evaluation.evaluation where eval_id={}'.format(eval_id))
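An alternative to building the SQL string by hand is to let the driver bind the value. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with an in-memory table as a stand-in for the MySQL evaluation table, so that it runs without a server, and the function name get_eval_by_id and the two sample rows are made up. sqlite3 uses ? as its placeholder, whereas the common MySQL drivers use %s.

```python
import sqlite3


def get_eval_by_id(conn, eval_id):
    """Return the row with the given eval_id, or None if there is none."""
    cur = conn.cursor()
    # The value is passed separately from the SQL text, so it is never
    # spliced into the query string.
    cur.execute(
        "SELECT eval_id, eval_name, date FROM evaluation WHERE eval_id = ?",
        (eval_id,),
    )
    return cur.fetchone()


# In-memory stand-in for the MySQL table (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE evaluation ("
    "eval_id INTEGER PRIMARY KEY, eval_name TEXT NOT NULL, date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO evaluation VALUES (?, ?, ?)",
    [(1, "first", "2021-01-01 00:00:00"), (2, "second", "2021-02-01 00:00:00")],
)

row = get_eval_by_id(conn, 2)
missing = get_eval_by_id(conn, 99)
print(row)
print(missing)
```

Inside the Flask view, the same pattern would be a single cur.execute call with the query text and a one-element tuple (eval_id,).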

Using MySQLdb to return values - Python

I have a table called coords and it is defined as:
mysql> describe coords;
+-----------+--------------+------+-----+------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| location | varchar(150) | NO | UNI | NULL | |
| latitude | float(20,14) | YES | | 0.00000000000000 | |
| longitude | float(20,14) | YES | | 0.00000000000000 | |
+-----------+--------------+------+-----+------------------+----------------+
I am using the MySQLdb import in my Python script. The purpose of this table is to store (as you can guess, but for clarity) location coordinates (but only when I do not have the coordinates already for a particular location).
I will be querying this table in my Python program to see if I already have coordinates for a pre-requested location. I'm doing this to speed up the use of the geopy package that interrogates Google's Geolocation Service.
How do I store the returned floats that correspond to a location? So far I have the following:
myVar = cur.execute("SELECT latitude, longitude FROM coords WHERE location ='" + jobLocation + "';")
if myVar == 1:
    print(cur.fetchone())
else:
    try:
        _place, (_lat, _lon) = geos.geocode(jobLocation, region='GB', exactly_one=False)
        print("%s: %.5f, %.5f" % _place, (_lat, _lon))
    except ValueError as err:
        print(err)
The code works (well, not really...) but I have no idea of how to get the returned coordinates into separate float variables.
Can you help?
When you do cur.fetchone(), you need to store the result somewhere:
row = cur.fetchone()
print(row[0], row[1])
Now row[0] will contain the latitude, and row[1] the longitude.
If you do this when connecting (assuming MySQLdb was imported as mdb):
cur = con.cursor(mdb.cursors.DictCursor)
you can then use a dictionary to refer to the columns by name:
row = cur.fetchone()
print(row["latitude"], row["longitude"])
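To get the two values into separate float variables, the fetched row can be unpacked directly. This is a minimal sketch rather than the answer's own code: it uses the built-in sqlite3 module with an in-memory coords table as a stand-in for MySQL, and the function name lookup_coords and the sample location are hypothetical. With MySQLdb the placeholder would be %s instead of ?.

```python
import sqlite3


def lookup_coords(conn, location):
    """Return (latitude, longitude) as floats, or None if not cached."""
    row = conn.execute(
        "SELECT latitude, longitude FROM coords WHERE location = ?",
        (location,),
    ).fetchone()
    if row is None:
        return None
    # A fetched row behaves like a tuple, so it can be unpacked.
    latitude, longitude = row
    return float(latitude), float(longitude)


# In-memory stand-in for the MySQL table (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coords ("
    "id INTEGER PRIMARY KEY, location TEXT UNIQUE NOT NULL, "
    "latitude REAL, longitude REAL)"
)
conn.execute(
    "INSERT INTO coords (location, latitude, longitude) VALUES (?, ?, ?)",
    ("Example Town", 51.5, -0.125),
)

found = lookup_coords(conn, "Example Town")
not_found = lookup_coords(conn, "Nowhere")
if found is not None:
    lat, lon = found
    print("%.5f, %.5f" % (lat, lon))
```

Returning None for an unknown location gives the caller a clear signal to fall back to the geocoding service and then store the result.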

MySQL Python Insert strange?

I don't see why this isn't working. I have created several databases and tables without any problem, but I am stuck with this table, which is created from a Django data model. To clarify what I have tried: when I create a new database and table from the mysql console and insert into it from Python, it works. This one, however, behaves strangely.
class Experiment(models.Model):
user = models.CharField(max_length=25)
filetype = models.CharField(max_length=10)
createddate= models.DateField()
uploaddate = models.DateField()
time = models.CharField(max_length=20)
size = models.CharField(max_length=20)
located= models.CharField(max_length=50)
Here is the table as shown in the mysql console:
mysql> describe pmass_experiment;
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user | varchar(25) | NO | | NULL | |
| filetype | varchar(10) | NO | | NULL | |
| createddate | date | NO | | NULL | |
| uploaddate | date | NO | | NULL | |
| time | varchar(20) | NO | | NULL | |
| size | varchar(20) | NO | | NULL | |
| located | varchar(50) | NO | | NULL | |
+-------------+-------------+------+-----+---------+----------------+
8 rows in set (0.01 sec)
The pmass_experiment table above was created by the Django ORM after running python manage.py syncdb.
Now I am trying to insert data into pmass_experiment through Python's MySQLdb:
import MySQLdb
import datetime,time
import sys
conn = MySQLdb.connect(
host="localhost",
user="root",
passwd="root",
db="experiment")
cursor = conn.cursor()
user='tchand'
ftype='mzml'
size='10MB'
located='c:\'
date= datetime.date.today()
time = str(datetime.datetime.now())[10:19]
#Insert into database
sql = """INSERT INTO pmass_experiment (user,filetype,createddate,uploaddate,time,size,located)
VALUES (user, ftype, date, date, time, size, located)"""
try:
    # Execute the SQL command
    cursor.execute(sql)
    # Commit your changes in the database
    conn.commit()
except:
    # Rollback in case there is any error
    conn.rollback()
# disconnect from server
conn.close()
Unfortunately, nothing is inserted. I am guessing it may be due to the primary key (id) in the table, which is not incrementing automatically.
mysql> select * from pmass_experiment;
Empty set (0.00 sec)
Can you point out my mistake?
Thanks
sql = """INSERT INTO pmass_experiment (user,filetype,createddate,uploaddate,time,size,located)
VALUES (user, ftype, date, date, time, size, located)"""
Parametrize your sql and pass in the values as the second argument to cursor.execute:
sql = """INSERT INTO pmass_experiment (user,filetype,createddate,uploaddate,time,size,located)
VALUES (%s, %s, %s, %s, %s, %s, %s)"""
try:
    # Execute the SQL command
    cursor.execute(sql,(user, ftype, date, date, time, size, located))
    # Commit your changes in the database
    conn.commit()
except Exception as err:
    # logger.error(err)
    # Rollback in case there is any error
    conn.rollback()
It is a good habit to always parametrize your sql since this will help prevent sql injection.
The original sql
INSERT INTO pmass_experiment (user,filetype,createddate,uploaddate,time,size,located)
VALUES (user, ftype, date, date, time, size, located)
seems to be valid. An experiment in the mysql shell shows it inserts a row of NULL values:
mysql> insert into foo (first,last,value) values (first,last,value);
Query OK, 1 row affected (0.00 sec)
mysql> select * from foo order by id desc;
+-----+-------+------+-------+
| id | first | last | value |
+-----+-------+------+-------+
| 802 | NULL | NULL | NULL |
+-----+-------+------+-------+
1 row in set (0.00 sec)
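So I'm not sure why you are not seeing any rows committed to the database table. Because the bare except: in your code rolls back silently, printing or logging the exception would show whether the INSERT is raising an error.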
Nevertheless, the original sql is probably not doing what you intend.
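The parametrized insert, together with an error that is reported rather than swallowed, can be sketched as follows. This is an illustration under assumptions, not the original setup: sqlite3 with an in-memory table stands in for MySQL so the snippet runs on its own, the function name insert_experiment is hypothetical, and the placeholder is ? here where MySQLdb uses %s.

```python
import datetime
import sqlite3

# In-memory stand-in for the Django-created table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE pmass_experiment (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user TEXT NOT NULL,
        filetype TEXT NOT NULL,
        createddate TEXT NOT NULL,
        uploaddate TEXT NOT NULL,
        time TEXT NOT NULL,
        size TEXT NOT NULL,
        located TEXT NOT NULL
    )"""
)


def insert_experiment(conn, values):
    """Insert one row; return None on success or the database error."""
    sql = (
        "INSERT INTO pmass_experiment "
        "(user, filetype, createddate, uploaddate, time, size, located) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)"
    )
    try:
        # As a context manager the connection commits on success and
        # rolls back if the block raises.
        with conn:
            conn.execute(sql, values)
        return None
    except sqlite3.Error as err:
        # Hand the error back instead of hiding it.
        return err


today = datetime.date.today().isoformat()
ok = insert_experiment(
    conn, ("tchand", "mzml", today, today, "12:00:00", "10MB", "c:\\"))
# A NULL in a NOT NULL column is rejected and reported.
bad = insert_experiment(
    conn, (None, "mzml", today, today, "12:00:00", "10MB", "c:\\"))
count = conn.execute("SELECT COUNT(*) FROM pmass_experiment").fetchone()[0]
print(ok, repr(bad), count)
```

Returning or logging the caught error makes a failed insert visible, which is the information the silent rollback in the question was hiding.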
