Inserting data into a MySQL table cell with Python (XAMPP)

I have created a MySQL database using XAMPP and created a table using Python:
self.sql = """CREATE TABLE Report_SV (
Date VARCHAR(160) ,
StartTime VARCHAR(160),
EndTime VARCHAR(160),
SystemName VARCHAR(160),
TestBench VARCHAR(160),
DomainName VARCHAR(100),
SourceFile CHAR(200),
Build_Info VARCHAR(200),
HLF VARCHAR(200),
TestcaseID VARCHAR(60),
TestCaseName VARCHAR(100),
Test_Step TEXT(1000),
Step_Result VARCHAR(500))"""
While inserting data into one of its columns, the data is not shown properly; only a truncated value is visible.
sql = "INSERT INTO report_sv (Test_Step) VALUES ('FES.statusFESVehicleDynamicSwitchHMI.requestSwitchDrivingExperienceMMIDisplay=1')";
What is the proper way to insert the complete data?
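A minimal sketch of a parameterized insert, assuming the mysql-connector-python driver; the connection details below are placeholders for a local XAMPP setup, not values from the original post:

import mysql.connector

# Placeholder credentials for a local XAMPP MySQL instance -- adjust to your setup
conn = mysql.connector.connect(host="localhost", user="root", password="", database="testdb")
cur = conn.cursor()

step_text = "FES.statusFESVehicleDynamicSwitchHMI.requestSwitchDrivingExperienceMMIDisplay=1"

# Let the driver handle quoting instead of embedding the value in the SQL string;
# a TEXT(1000) column can hold this value without truncation.
cur.execute("INSERT INTO Report_SV (Test_Step) VALUES (%s)", (step_text,))
conn.commit()  # without commit() the inserted row is not persisted
cur.close()
conn.close()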

Related

Dataframe to PostgreSQL DB

I query 4 hours of data from the source PLC MS SQL database, process it with Python, and write the data to the main PostgreSQL table.
When writing to the main Postgres table every hour, there are duplicate values (from the previous 3 hours), which cause a primary-key error that aborts the transaction and raises a Python error.
So, every hour:
I create a temp PostgreSQL table without any key
Then copy the pandas DataFrame to the temp table
Then insert rows from the temp table --> main PostgreSQL table
Then drop the temp PostgreSQL table
This Python script runs hourly in Windows Task Scheduler.
Below is my query.
engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
conn = engine.raw_connection()
cur = conn.cursor()
cur.execute("""CREATE TABLE public.table_temp
(
datetime timestamp without time zone NOT NULL,
tagid text COLLATE pg_catalog."default" NOT NULL,
mc text COLLATE pg_catalog."default" NOT NULL,
value text COLLATE pg_catalog."default",
quality text COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
ALTER TABLE public.table_temp
OWNER to postgres;""");
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, 'table_temp', null="")
cur.execute("""Insert into public.table_main select * From table_temp ON CONFLICT DO NOTHING;""");
cur.execute("""DROP TABLE table_temp CASCADE;""");
conn.commit()
I would like to know if there is a more efficient/faster way to do it.
If I'm correct in assuming that the data is in the DataFrame, you should just be able to do:
engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
df.drop_duplicates(subset=None) # Replace None with list of column names that define the primary key ex. ['column_name1', 'column_name2']
df.to_sql('table_main', engine, if_exists='append')
Edit due to comment:
If that's the case, you have the right idea. You can make it more efficient by using to_sql to insert the data into the temp table first, like so:
engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
df.to_sql('table_temp', engine, if_exists='replace')
cur.execute("""Insert into public.table_main select * From table_temp ON CONFLICT DO NOTHING;""");
# cur.execute("""DROP TABLE table_temp CASCADE;"""); # You can drop if you want to but the replace option in to_sql will drop and recreate the table
conn.commit()
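Put together, the edited flow might look like the following self-contained sketch. The connection-string placeholders and table names come from the question; index=False is an assumption so that the DataFrame index does not become an extra column in table_temp and break the SELECT * into table_main:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')

# df is the hourly DataFrame prepared earlier in the script
df.to_sql('table_temp', engine, if_exists='replace', index=False)

conn = engine.raw_connection()
try:
    cur = conn.cursor()
    # Copy new rows into the main table, skipping primary-key conflicts
    cur.execute("INSERT INTO public.table_main SELECT * FROM table_temp ON CONFLICT DO NOTHING;")
    conn.commit()
finally:
    conn.close()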

How to insert values into a PostgreSQL database with a serial id using SQLAlchemy

I have a function that I use to update tables in PostgreSQL. It works great to avoid duplicate insertions by creating a temp table and dropping it upon completion. However, I have a few tables with serial ids and I have to pass the serial id in a column. Otherwise, I get an error that the keys are missing. How can I insert values in those tables and have the serial key get assigned automatically? I would prefer to modify the function below if possible.
def export_to_sql(df, table_name):
    from sqlalchemy import create_engine
    engine = create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
    df.to_sql(con=engine,
              name='temporary_table',
              if_exists='append',
              index=False,
              method='multi')
    with engine.begin() as cnx:
        insert_sql = f'INSERT INTO {table_name} (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
        cnx.execute(insert_sql)
Code used to create the tables:
CREATE TABLE symbols
(
symbol_id serial NOT NULL,
symbol varchar(50) NOT NULL,
CONSTRAINT PK_symbols PRIMARY KEY ( symbol_id )
);
CREATE TABLE tweet_symols(
tweet_id varchar(50) REFERENCES tweets,
symbol_id int REFERENCES symbols,
PRIMARY KEY (tweet_id, symbol_id),
UNIQUE (tweet_id, symbol_id)
);
CREATE TABLE hashtags
(
hashtag_id serial NOT NULL,
hashtag varchar(140) NOT NULL,
CONSTRAINT PK_hashtags PRIMARY KEY ( hashtag_id )
);
CREATE TABLE tweet_hashtags
(
tweet_id varchar(50) NOT NULL,
hashtag_id integer NOT NULL,
CONSTRAINT FK_344 FOREIGN KEY ( tweet_id ) REFERENCES tweets ( tweet_id )
);
CREATE INDEX fkIdx_345 ON tweet_hashtags
(
tweet_id
);
The INSERT statement does not specify the target columns, so PostgreSQL will attempt to insert values into a column that was defined as SERIAL.
We can work around this by providing a list of target columns, omitting the serial types. To do this we use SQLAlchemy to fetch the metadata of the table that we are inserting into from the database, then make a list of target columns. SQLAlchemy doesn't tell us if a column was created using SERIAL, but we will assume that it is if it is a primary key and is set to autoincrement. Primary key columns defined with GENERATED ... AS IDENTITY will also be filtered out - this is probably desirable as they behave in the same way as SERIAL columns.
import sqlalchemy as sa

def export_to_sql(df, table_name):
    engine = sa.create_engine(f'postgresql://{user}:{password}@{host}:5432/{user}')
    df.to_sql(con=engine,
              name='temporary_table',
              if_exists='append',
              index=False,
              method='multi')
    # Fetch table metadata from the database
    table = sa.Table(table_name, sa.MetaData(), autoload_with=engine)
    # Get the names of columns to be inserted,
    # assuming auto-incrementing PKs are serial types
    column_names = ','.join(
        [f'"{c.name}"' for c in table.columns
         if not (c.primary_key and c.autoincrement)]
    )
    with engine.begin() as cnx:
        insert_sql = sa.text(
            f'INSERT INTO {table_name} ({column_names}) (SELECT * FROM temporary_table) ON CONFLICT DO NOTHING; DROP TABLE temporary_table'
        )
        cnx.execute(insert_sql)
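A hypothetical usage sketch, assuming the symbols table from the question and that user, password and host are defined elsewhere, as in the original function; the ticker values are made up for illustration:

import pandas as pd

# DataFrame without the serial symbol_id column; the database assigns it from the sequence
df_symbols = pd.DataFrame({'symbol': ['AAPL', 'TSLA', 'GOOG']})

export_to_sql(df_symbols, 'symbols')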

SQLite and Python - can't set primary keys

I'm trying to create tables using python but when I inspect the data structure in SQLite, the primary keys aren't being assigned. Here's the code for one of the tables. It seems to work as intended except for the primary key part. I'm new to Python and SQLite so I'm probably missing something very obvious but can't find any answers.
# Create a database and connect
conn = sql.connect('Coursework.db')
c = conn.cursor()
# Create the tables from the normalised schema
c.execute('CREATE TABLE IF NOT EXISTS room_host (room_ID integer PRIMARY KEY, host_ID integer)')
c.execute("SELECT count(name) from sqlite_master WHERE type='table' AND name='room_host'")
if c.fetchone()[0] == 1:
    c.execute("DROP TABLE room_host")
else:
    c.execute('CREATE TABLE room_host (room_ID integer PRIMARY KEY, host_ID integer)')
conn.commit()
# read data from csv
read_listings = pd.read_csv('listings.csv')
room_host = pd.DataFrame(read_listings, columns=['id', 'host_id'])
room_host.set_index('id')
room_host.to_sql("room_host", conn, if_exists='append', index=False)
c.execute("""INSERT INTO room_host (id, host_ID)
SELECT room_host.id, room_host.host_ID
FROM room_host
""")
I can't reproduce the issue with the primary key; the table is created as expected when I run that SQL statement.
Other than that, the detour through Pandas is not really necessary; the csv module plus .executemany() seems to me a much more straightforward way of loading data from a CSV into a table.
import csv
import sqlite3 as sql

conn = sql.connect('Coursework.db')
conn.executescript('CREATE TABLE IF NOT EXISTS room_host (room_ID integer PRIMARY KEY, host_ID integer)')
conn.commit()
with open('listings.csv', encoding='utf8', newline='') as f:
    # DictReader consumes the header row and lets us pick out just the two columns we need
    reader = csv.DictReader(f)
    conn.executemany(
        'INSERT INTO room_host (room_ID, host_ID) VALUES (?, ?)',
        ((row['id'], row['host_id']) for row in reader)
    )
conn.commit()

Compare data between MongoDB and MySQL using python script

I am working on a Django application that uses both MySQL and MongoDB to store its data. What I need to do is compare the data stored in a MongoDB collection with the data stored in a MySQL table.
For example, my MySQL database contains the table "relation", which is created using:
CREATE TABLE relations (service_id int, beneficiary_id int, PRIMARY KEY (service_id, beneficiary_id));
My MongoDB contains a collection called "relation", which is expected to store the same data as the relation table in MySQL. The following is one document of the collection "relation":
{'_id': 0, 'service_id': 1, 'beneficiary_id': 32}
I tried to create a Python script that compares the data between the relation table in MySQL and the relation collection in MongoDB. The script works as follows:
mysql_relations = Relations.objects.values('beneficiary_id', 'service_id')
mongo_relations_not_in_mysql = relations_mongodb.find({'$nor':list(mysql_relations)})
mongo_relations = relations_mongodb.find({}, {'_id': 0, 'beneficiary_id':1, 'service_id': 1})
filter_list = Q()
for mongo_relation in mongo_relations:
    filter_list &= Q(mongo_relation)
mysql_relations_not_in_mongo = Relations.objects.exclude(filter_list)
However, this code takes forever.
I think the main problem is that the primary key is composed of 2 columns, which requires the use of Q() and '$nor'.
What do you suggest?
Just in case someone is interested, I used the following solution to optimize the data comparison.
(The idea was to create a temporary MySQL table to store Mongo's data, then do the comparison between the two MySQL tables.) The code is below:
Get the relations from MongoDB:
mongo_relations = relations_mongodb.find({}, {'_id': 0, 'service_id': 1, 'beneficiary_id': 1})
Create a temporary MySQL table to store MongoDB's relations:
cursor = connection.cursor()
cursor.execute(
    "CREATE TEMPORARY TABLE temp_relations (service_id int, beneficiary_id int, INDEX `id_related` (`service_id`, `beneficiary_id`) );"
)
Insert MongoDB's relations into the temporary table just created:
cursor.executemany(
    'INSERT INTO temp_relations (service_id, beneficiary_id) values (%(service_id)s, %(beneficiary_id)s) ',
    list(mongo_relations)
)
Get the MongoDB relations that do not exist in MySQL:
cursor.execute(
    "SELECT service_id, beneficiary_id FROM temp_relations WHERE (service_id, beneficiary_id) NOT IN ("
    "SELECT service_id, beneficiary_id FROM relations);"
)
mongo_relations_not_in_mysql = cursor.fetchall()
Get the MySQL relations that do not exist in MongoDB:
cursor.execute(
    "SELECT id, service_id, beneficiary_id, date FROM relations WHERE (service_id, beneficiary_id) NOT IN ("
    "SELECT service_id, beneficiary_id FROM temp_relations);"
)
mysql_relations_not_in_mongo = cursor.fetchall()
cursor.close()  # Close the cursor

How to automap the result set of a custom SQL query in SQLAlchemy

I'd like to run raw SQL queries through SQLAlchemy and have the resulting rows use Python types which are automatically mapped from the database types. This AutoMap functionality is available for tables in the database. Is it available for any arbitrary result set?
As an example, we build a small SQLite database:
import sqlite3
con = sqlite3.connect('test.db')
cur = con.cursor()
cur.execute("CREATE TABLE Trainer (id INTEGER PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), dob DATE, tiger_skill FLOAT);")
cur.execute("INSERT INTO Trainer VALUES (1, 'Joe', 'Exotic', '1963-03-05', 0.6)")
cur.execute("INSERT INTO Trainer VALUES (2, 'Carole', 'Baskin', '1961-06-06', 0.3)")
cur.close()
con.commit()
con.close()
And using SQLAlchemy, I query the newly created database "test.db":
from sqlalchemy import create_engine
engine = create_engine("sqlite:///test.db")
connection = engine.connect()
CUSTOM_SQL_QUERY = "SELECT count(*) as total_trainers, min(dob) as first_dob from Trainer"
result = connection.execute(CUSTOM_SQL_QUERY)
for r in result:
    print(r)
>>> (2, '1961-06-06')
Notice that the second column in the result set is a Python string, not a Python datetime.date object. Is there a way for SQLAlchemy to automap an arbitrary result set? Or is this automap reflection capability limited to just actual tables in the database?
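One possible approach, sketched here under the assumption of SQLAlchemy 1.4+ and not taken from an accepted answer: text().columns() lets you declare the expected type of each result column, and those types are applied when rows are fetched, so the DATE string comes back as a datetime.date:

from sqlalchemy import create_engine, text, Integer, Date

engine = create_engine("sqlite:///test.db")

# Declare the result column types by hand; SQLAlchemy applies them to fetched rows
stmt = text(
    "SELECT count(*) AS total_trainers, min(dob) AS first_dob FROM Trainer"
).columns(total_trainers=Integer, first_dob=Date)

with engine.connect() as connection:
    for r in connection.execute(stmt):
        print(r)  # expected: (2, datetime.date(1961, 6, 6))

This still requires spelling out the types per query, so it is less "automatic" than automap over real tables, which appears to be the limitation the question is asking about.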
