Saving a pandas dataframe into sqlite with different column names? - python

I have a sqlite database and a dataframe with different column names, but they refer to the same thing. E.g.
My database Cars has the car Id, Name and Price.
My dataframe df has the car Identity, Value and Name.
Additional: I would also like to fill a 'date' column in the database that is not present in the df, based on the current date.
I would like to save the df in the database so that Id = Identity, Price = Value, Name = Name and date = something specified by the user or current
So I cannot do the usual df.to_sql (unless I rename the column names, but I wonder if there is a better way to do this).
I first tried to sync just the names, without the date column:
cur.execute("INSERT INTO Cars VALUES(?,?,?)", df.to_records(index=False))
However, the above does not work and gives me an error that the binding is incorrect. Also, the order of the columns in the DB and the df is different.
I'm not even sure how to handle the part where I have the extra date column, so any help would be great. Below is sample code to generate all the values.
import sqlite3 as lite
import pandas as pd

con = lite.connect('test.db')
with con:
    cur = con.cursor()
    cur.execute("CREATE TABLE Cars(Id INT, Name TEXT, Price INT, date TEXT)")

df = pd.DataFrame({'Identity': range(5), 'Value': range(5,10), 'Name': range(10,15)})

You can do:
cur.executemany("INSERT INTO Cars (Id, Price, Name) VALUES(?,?,?)", list(df.to_records(index=False)))
Besides, you should specify the dtype attribute of your dataframe as numpy.int32 to meet the INT constraints of table 'Cars':
import sqlite3
import numpy
import pandas

con = sqlite3.connect('test.db')
cur = con.cursor()
cur.execute("CREATE TABLE Cars(Id INT, Name TEXT, Price INT, date TEXT)")
df = pandas.DataFrame({'Identity': range(5), 'Value': range(5,10), 'Name': range(10,15)}, dtype=numpy.int32)
cur.executemany("INSERT INTO Cars (Id, Price, Name) VALUES(?,?,?)", list(df[['Identity', 'Value', 'Name']].to_records(index=False)))

query = "SELECT * FROM Cars"
cur.execute(query)
rows = cur.fetchall()
for row in rows:
    print(row)
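The above leaves out the date column from the question. One way to cover it, sketched under the assumption that the same Cars table exists and that a missing user value should fall back to today's date, is to append the date to each record before the executemany:
import sqlite3
from datetime import date
import pandas as pd

con = sqlite3.connect('test.db')
cur = con.cursor()
df = pd.DataFrame({'Identity': range(5), 'Value': range(5, 10), 'Name': range(10, 15)})

insert_date = date.today().isoformat()  # or a user-specified 'YYYY-MM-DD' string

# Reorder the df columns to match the table and append the date to every row;
# int() casts the numpy scalars into plain Python ints that sqlite3 accepts.
rows = [(int(i), int(n), int(v), insert_date)
        for i, v, n in df[['Identity', 'Value', 'Name']].itertuples(index=False)]

cur.executemany("INSERT INTO Cars (Id, Name, Price, date) VALUES (?,?,?,?)", rows)
con.commit()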

Related

Python store variable with two columns into table created with SQLite

I created a variable that stores patient IDs and a count of the number of missed appointments per patient. I created a table with SQLite, and I am trying to store my variable in that table, but I am getting the error "ValueError: parameters are of unsupported type". Here is my code so far:
import pandas as pd
import sqlite3
conn = sqlite3.connect('STORE')
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS PatientNoShow")
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" text)""")
df = pd.read_csv(r"C:\missedappointments.csv")
df2 = df[df['No-show']=="Yes"]
pt_counts = df2["PatientId"].value_counts()
c.executemany("INSERT OR IGNORE INTO PatientNoShow VALUES (?, ?)", pt_counts)
Thank you in advance for any help! Still learning, so any kind of "explain to me like I'm 5" answers will be appreciated! Also, once I create my tables and store info in them, how would I print or get a visual of the output?
You wrote that the two variables are of type text in
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" text)""")
but pt_counts contains integers, because it counts the values in the PatientId column; besides, .executemany() needs a sequence of parameter tuples to work properly.
This piece of code should work if PatientId is of string type:
import pandas as pd
import sqlite3

conn = sqlite3.connect('STORE')
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS PatientNoShow")
c.execute("""CREATE TABLE IF NOT EXISTS PatientNoShow ("PatientId" text, "No-show" integer)""")  # type changed
df = pd.read_csv(r"C:\missedappointments.csv")
df2 = df[df['No-show'] == "Yes"]
pt_counts = df2["PatientId"].value_counts()
c.executemany("INSERT OR IGNORE INTO PatientNoShow VALUES (?, ?)", pt_counts.items())  # items() yields a sequence of (id, count) tuples
conn.commit()

How to integrate python modules with mysql Alchemy using ORM library

I am trying to update a mysql database table.
I started by creating an ORM object to help me reduce the volume of an update query, using UPDATE and WHERE conditions.
First of all, I created an ORM variable; this ORM object is data filtered from a dataframe by using a condition from another pd.DataFrame read from a CSV.
This is my simple rule, so that it is easy to create conditions like this:
myOutlook_inBox = pd.read_csv(r'' + mydir + 'test.CSV',
                              usecols=['Subject', 'Body', 'From: (Name)', 'To: (Name)'],
                              encoding='latin-1')
This simple ORM extracts the codes from the myOutlook_inBox['Subject'] column read by pd.read_csv and writes them back:
replaced_sbj_value = myOutlook_inBox['Subject'].str.extract(pat='(L(?:DEL|CAI|SIN).\d{5})').dropna()
myOutlook_inBox["Subject"] = replaced_sbj_value
And this is a condition that I am using to filter specific rows:
frm_mwfy_to_te = myOutlook_inBox.loc[myOutlook_inBox['From: (Name)'].str.contains("mowafy", na=False)
                                     & myOutlook_inBox['To: (Name)'].str.contains("te", na=False)].drop_duplicates(keep=False)
frm_mwfy_to_te.Subject
And this variable selects the filtered rows from the mysql data by matching the site_code column:
filtered_data = all_data.loc[all_data.site_code.str.contains('|'.join(frm_mwfy_to_te.Subject))]
And this is my sql query. All I need now is a query that filters on the column called "site_code" and updates the column called "pending" in every matching row, replacing its value with 'TE':
update_db_query = engine.execute("UPDATE govtracker SET pending = 'TE' WHERE site_code = " + filtered_data)
I am thinking that I am on the wrong track here; any ideas to solve this?
Note: I don't need to reference the old value in my query; I just want to overwrite the value in the matching rows with the new value I mentioned in the query, according to the filtered dataframe.
For example, according to frm_mwfy_to_te.Subject (Subject is a column name in the csv file), let's say the output of this ORM frm_mwfy_to_te.Subject is:
Subject
LCAIN20804
LDELE30434
LSINI20260
and this is my whole code
from sqlalchemy import create_engine
import pandas as pd
import os
import csv
import MySQLdb
from sqlalchemy import types, create_engine

# MySQL Connection
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'Mharooney'
MYSQL_HOST_IP = '127.0.0.1'
MYSQL_PORT = 3306
MYSQL_DATABASE = 'mydb'

engine = create_engine('mysql+mysqlconnector://' + MYSQL_USER + ':' + MYSQL_PASSWORD
                       + '@' + MYSQL_HOST_IP + ':' + str(MYSQL_PORT) + '/' + MYSQL_DATABASE,
                       echo=False)
# engine = create_engine('mysql+mysqldb://root:@localhost:123456/myDB?charset=utf8mb4&binary_prefix=true', echo=False)

mydir = (os.getcwd()).replace('\\', '/') + '/'

all_data = pd.read_sql('SELECT * FROM govtracker', engine)
# .drop(['#'], axis=1)

myOutlook_inBox = pd.read_csv(r'' + mydir + 'test.CSV',
                              usecols=['Subject', 'Body', 'From: (Name)', 'To: (Name)'],
                              encoding='latin-1')
myOutlook_inBox.columns = myOutlook_inBox.columns.str.replace(' ', '')

# this extracts 5 chars and 5 numbers from a specific column in the csv
replaced_sbj_value = myOutlook_inBox['Subject'].str.extract(pat='(L(?:DEL|CAI|SIN).\d{5})').dropna()
# this is the column I want to filter on in the database
myOutlook_inBox["Subject"] = replaced_sbj_value

# these conditions filter out duplicated data from the exported outlook file
# Condition 1: any mail from mowafy to te
frm_mwfy_to_te = myOutlook_inBox.loc[myOutlook_inBox['From:(Name)'].str.contains("mowafy", na=False)
                                     & myOutlook_inBox['To:(Name)'].str.contains("te", na=False)].drop_duplicates(keep=False)
frm_mwfy_to_te.Subject

filtered_data = all_data.loc[all_data.site_code.str.contains('|'.join(frm_mwfy_to_te.Subject))]

print(myOutlook_inBox)
all_data.replace('\n', '', regex=True)
df = all_data.where((pd.notnull(all_data)), None)
print(df)
print("Success")
print(frm_mwfy_to_te.Subject)
print(filtered_data)

# rows = engine.execute("SELECT * FROM govtracker")  # .fetchall()
# print(rows)

update_db_query = engine.execute("UPDATE govtracker SET pending = 'TE' WHERE site_code = " + filtered_data)

"""engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('temp_table', engine, if_exists='replace')"""

# select_db_query = pd.read_sql("SELECT * FROM govtracker", con=engine)
# print(update_db_query)
Now let's say this is the output of my ORM. I will use it to filter and get the rows containing these three values from the mysql database, so as to update every row that contains them; the columns I want to update are called pending and pending_status in mysql.
and this is my database query
CREATE TABLE `mydb`.`govtracker` (
`id` INT,
`site_name` VARCHAR(255),
`region` VARCHAR(255),
`site_type` VARCHAR(255),
`site_code` VARCHAR(255),
`tac_name` VARCHAR(255),
`dt_readiness` DATE,
`rfs` VARCHAR(255),
`rfs_date` DATE,
`huawei_1st_submission_date` DATE,
`te_1st_submission_date` DATE,
`huawei_2nd_submission_date` DATE,
`te_2nd_submission_date` DATE,
`huawei_3rd_submission_date` DATE,
`te_3rd_submission_date` DATE,
`acceptance_date_opt` DATE,
`acceptance_date_plan` DATE,
`signed_sites` VARCHAR(255),
`as_built_date` DATE,
`as_built_status` VARCHAR(255),
`date_dt` DATE,
`dt_status` VARCHAR(255),
`shr_status` VARCHAR(255),
`dt_planned` INT(255),
`integeration_status` VARCHAR(255),
`comments_snags` LONGTEXT,
`cluster_name` LONGTEXT,
`type_standalone_colocated` VARCHAR(255),
`installed_type_standalone_colocated` VARCHAR(255),
`status` VARCHAR(255),
`pending` VARCHAR(255),
`pending_status` LONGTEXT,
`problematic_details` LONGTEXT,
`ets_tac` INT(255),
`region_r` VARCHAR(255),
`sf6_signed_date` DATE,
`sf6_signed_comment` LONGTEXT,
`comment_history` LONGTEXT,
`on_air_owner` VARCHAR(255),
`pp_owner` VARCHAR(255),
`report_comment` LONGTEXT,
`hu_opt_area_owner` VARCHAR(255),
`planning_owner` VARCHAR(255),
`po_number` VARCHAR(255),
`trigger_date` DATE,
`as_built_status_tr` VARCHAR(255)
) ENGINE = InnoDB;
Another important note:
In Excel, while using a filter on some column, it shows all the values in the selected column. Let's say Pending is the column I've selected, which has the values: Accepted & PAC in progress, Planning, TE, PP, DT, FM, Rollout, Integration, Opt Team.
All the rest of the columns have values like this as well.
So should I create a table called something like columns_values and fill it with all these values, since they are static? Would that make my case easier to solve?
Last note: This database was built from an existing xlsm file, but I pushed the data from the xlsm into mysql, and now mysql is my main database, not the Excel files. However, I am updating the mysql database from a csv file, not from within the database; the ORM object frm_mwfy_to_te.Subject is data extracted from the dataframe of that csv file.
Any ideas here?
I hope everything is clear enough.
Could this material help me or not?
https://auth0.com/blog/sqlalchemy-orm-tutorial-for-python-developers/#SQLAlchemy-ORM
It's called TL;DR
Important note: the value of filtered_data is actually a pandas DataFrame, but for one column only from the CSV file, because I want to filter on these dataframe column values, as posted before, to update some columns in my database. I just started by updating one column, called pending, so as to see the result; after that I'll update the other columns. The script I want to create searches the database for the values in filtered_data. For example, I have a value called LCAIN20804: I want to take this value and filter on it in the database table, then go to the column called huawei_1st_submission_date; if it isn't filled, fill it with the current date; if it is filled, go to the pending column and replace the old value with TE, then go to pending_status and replace the old value with waiting TE acceptance, and so on. That's a small part of the script I want to create.
I hope this is clear enough
If you want to turn a pandas DataFrame into a SQL update statement, it may be nice to first transform it into a list of tuples, where the tuples are the new column values, and then use engine.executemany (https://stackoverflow.com/a/27743541/5015356):
values = [tuple(x) for x in filtered_data.values]

query = """
UPDATE govtracker
SET pending = 'TE'
WHERE site_code = %s
"""

connection = engine.connect()
update_db_query = connection.execute(query, values)
For each tuple (<sitecode>,), this will execute the update statement. If you want to update more columns or expand the WHERE clause, just add the additional columns to filtered_data, and add a new %s where you want the other value to appear.
Just make sure you keep the columns in the correct order!
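A minimal sketch of that multi-column variant, assuming filtered_data carries the site_code column plus a hypothetical new_pending column holding the replacement value per row:
# new_pending is a hypothetical column added to filtered_data for illustration.
values = [(row.new_pending, row.site_code)
          for row in filtered_data.itertuples(index=False)]

query = """
UPDATE govtracker
SET pending = %s
WHERE site_code = %s
"""

connection = engine.connect()
connection.execute(query, values)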

Set Sqlite query results as variables [duplicate]

This question already has answers here:
How can I get dict from sqlite query?
(16 answers)
Closed 4 years ago.
Issue:
Hi, right now I am making queries to sqlite and assigning the result to variables like this:
Table structure: rowid, name, something
cursor.execute("SELECT * FROM my_table WHERE my_condition = 'ExampleForSO'")
found_record = cursor.fetchone()
record_id = found_record[0]
record_name = found_record[1]
record_something = found_record[2]
print(record_name)
However, it's very possible that someday I have to add a new column to the table. Let's put the example of adding that column:
Table structure: rowid, age, name, something
In that scenario, if we run the same code, name and something will be assigned wrongly and the print will not get me the name but the age, so I have to edit the code manually to fit the current index. However, I am working now with tables of more than 100 fields for a complex UI and doing this is tiresome.
Desired output:
I am wondering if there is a better way to catch results by using dicts or something like this:
Note for lurkers: the next snippet is made-up code that does not work; do not use it.
cursor.execute_to(my_dict,
'''SELECT rowid as my_dict["id"],
name as my_dict["name"],
something as my_dict["something"]
FROM my_table WHERE my_condition = "ExampleForSO"''')
print(my_dict['name'])
I am probably wrong with this approach, but that's close to what I want. That way I don't access the results by index, and if I add a new column, no matter where it is, the output would stay the same.
What is the correct way to achieve it? Is there any other alternatives?
You can use namedtuple and then specify connection.row_factory in sqlite. Example:
import sqlite3
from collections import namedtuple
# specify my row structure using namedtuple
MyRecord = namedtuple('MyRecord', 'record_id record_name record_something')
con = sqlite3.connect(":memory:")
con.isolation_level = None
con.row_factory = lambda cursor, row: MyRecord(*row)
cur = con.cursor()
cur.execute("CREATE TABLE my_table (record_id integer PRIMARY KEY, record_name text NOT NULL, record_something text NOT NULL)")
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something'))
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something too'))
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Adrika', 'This is new!'))
for row in cur.execute("SELECT * FROM my_table WHERE record_name LIKE 'A%'"):
    print(f'ID={row.record_id} NAME={row.record_name} SOMETHING={row.record_something}')
con.close()
Prints:
ID=1 NAME=Andrej SOMETHING=This is something
ID=2 NAME=Andrej SOMETHING=This is something too
ID=3 NAME=Adrika SOMETHING=This is new!
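Since the question asks for dict-style access, the built-in sqlite3.Row row factory is another option; it gives access by column name without declaring a namedtuple. A minimal sketch against a similar table:
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # rows now support access by column name

cur = con.cursor()
cur.execute("CREATE TABLE my_table (record_id integer PRIMARY KEY, record_name text, record_something text)")
cur.execute("INSERT INTO my_table (record_name, record_something) VALUES (?, ?)", ('Andrej', 'This is something'))

row = cur.execute("SELECT * FROM my_table").fetchone()
print(row['record_name'], row['record_something'])  # column order no longer matters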

How to add multiple Columns into Sqlite3 from a for loop in Python

Admittedly I am still very new to both Python and Sqlite3, and I am attempting to add the contents of two lists into a database so that one list is in the first column and the second list shows up in the second column. To this point, I have been unsuccessful. I am definitely making a fundamental error, and the error message that I get is this: "sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type."
my code is this:
import sqlite3

names = ['Tom', 'Dick', 'Harry']
ids = ['A452', 'B698', 'Kd9f']

conn = sqlite3.connect('testforinput.db')
c = conn.cursor()
c.execute("CREATE TABLE thetable(name TEXT, id TEXT)")

index = 0
for link in names:
    idofperson = ids[index]
    c.execute("INSERT INTO thetable(name, id) VALUES(?, ?)", ([link], idofperson))
    index += 1

conn.commit()
conn.close()
The error occurs because of the for loop, specifically the "idofperson" variable.
The desired outcome is that I would like to have two columns created in sql one being name and the other being id.
Any help would be greatly appreciated.
I think you just need to change
index = 0
for link in names:
    idofperson = ids[index]
    c.execute("INSERT INTO thetable(name, id) VALUES(?, ?)", ([link], idofperson))
to this (use enumerate, and change [link] to link, because you are passing a list where the column needs the TEXT type):
for index, link in enumerate(names):
    idofperson = ids[index]
    c.execute("INSERT INTO thetable(name, id) VALUES(?, ?)", (link, idofperson))
Your variable index is not increasing. Try using enumerate on the for loop, or just add index += 1 after the execute.
The error is occurring because of the unsupported data type you are trying to push in; you can't store a list as it is, you need to convert it to another supported data type. I like this solution, it worked for me: https://stackoverflow.com/a/18622264/6180263
for your problem, try this:
import sqlite3

names = ['Tom', 'Dick', 'Harry']
ids = ['A452', 'B698', 'Kd9f']
data = zip(names, ids)

conn = sqlite3.connect('testforinput.db')
c = conn.cursor()
c.execute("CREATE TABLE thetable(name TEXT, id TEXT)")

for d in data:
    sql = "INSERT INTO thetable (name, id) VALUES ('%s', '%s');" % d
    c.execute(sql)

conn.commit()
conn.close()
I suggest changing data to a list of dicts, like this: [{'name': 'Tom', 'id': 'A452'}, {'name': 'Dick', 'id': 'B698'}, ...]; you can then generate the insert sql from the data, which makes the insert more flexible.
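As a side note, the loop can be collapsed with executemany and ? placeholders, which also avoids interpolating values into the SQL string. A minimal sketch, assuming the same table:
import sqlite3

names = ['Tom', 'Dick', 'Harry']
ids = ['A452', 'B698', 'Kd9f']

conn = sqlite3.connect('testforinput.db')
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS thetable(name TEXT, id TEXT)")

# zip pairs each name with its id; executemany runs the insert once per pair.
c.executemany("INSERT INTO thetable (name, id) VALUES (?, ?)", zip(names, ids))

conn.commit()
conn.close()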

Replacing old data with new data using python

I have 2 tables TBL1 and TBL2.
TBL1 has 2 columns id, nSql.
TBL2 has 3 columns date, custId, userId.
I have 17 rows in TBL1 with id 1 to 17. Each nSql has a SQL query in it.
For example nSql for
id == 1 is: "select date, pId as custId, tId as userId from TBL3"
id == 2 is: "select date, qId as custId, rId as userId from TBL4" ...
The nSql result always has the same 3 columns.
The query below runs and puts data into TBL2. If there is already data in TBL2 for that day, I want the query to replace it with the new data.
If there is no data in TBL2, I want to insert it the normal way.
For example, if I run the query in the morning and run it again in the evening, I want the new data to replace the old data for that day, since data will be inserted into TBL2 every day.
It is also a precaution: if the data already exists (say the query was run by a coworker), I do not want duplicate data for that day.
How can I do it?
Thank you.
(I am new to python, I would appreciate if someone could explain in steps and show in the code)
import MySQLdb

# Open connection
con = MySQLdb.Connection(host="localhost", user="root", passwd="root", db="test")
# create a cursor object
cur = con.cursor()

selectStatement = ("select nSql from TBL1")
cur.execute(selectStatement)
res = cur.fetchall()

for outerrow in res:
    nSql = outerrow[0]
    cur.execute(nSql)
    reslt = cur.fetchall()
    for row in reslt:
        date = row[0]
        custId = row[1]
        userId = row[2]
        insertStatement = ("insert into TBL2 (date, custId, userId) values ('%s', %d, %d)" % (date, custId, userId))
        cur.execute(insertStatement)
con.commit()
Timestamp (using a datetime column) all data inserted into the table. Before inserting, delete from the table where the datetime's day is today.
For MySQL, you can use the function to_days() to get which day a datetime falls on: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_to-days
When inserting new rows, now() will give you the datetime value corresponding to the current time: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
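A minimal sketch of that delete-then-insert pattern, assuming TBL2 gains a hypothetical datetime column named insertedAt to carry the timestamp:
import MySQLdb

con = MySQLdb.Connection(host="localhost", user="root", passwd="root", db="test")
cur = con.cursor()

# Remove anything already stamped today, so a re-run replaces today's data.
cur.execute("DELETE FROM TBL2 WHERE to_days(insertedAt) = to_days(now())")

cur.execute("select nSql from TBL1")
for (nSql,) in cur.fetchall():
    cur.execute(nSql)
    for date, custId, userId in cur.fetchall():
        # now() stamps each row with the insertion time.
        cur.execute("insert into TBL2 (date, custId, userId, insertedAt) values (%s, %s, %s, now())",
                    (date, custId, userId))

con.commit()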
