Replacing old data with new data using Python

I have 2 tables TBL1 and TBL2.
TBL1 has 2 columns id, nSql.
TBL2 has 3 columns date, custId, userId.
I have 17 rows in TBL1 with id 1 to 17. Each nSql has a SQL query in it.
For example nSql for
id == 1 is: "select date, pId as custId, tId as userId from TBL3"
id == 2 is: "select date, qId as custId, rId as userId from TBL4" ...
The result of every nSql query always has the same 3 columns.
The script below runs each query and puts the data into TBL2. If TBL2 already has data for that day, I want the script to replace it with the new data.
If TBL2 has no data for that day, I want the data inserted the normal way.
For example, if I run the script in the morning and again in the evening, I want the new data to replace the old data for that day, since data is inserted into TBL2 every day.
It is also a precaution: if the data already exists (for example, because a coworker ran the script), I do not want duplicate data for that day.
How can I do it?
Thank you.
(I am new to Python; I would appreciate it if someone could explain the steps and show them in code.)
import MySQLdb
# Open connection
con = MySQLdb.Connection(host="localhost", user="root", passwd="root", db="test")
# create a cursor object
cur = con.cursor()
selectStatement = ("select nSql from TBL1")
cur.execute(selectStatement)
res = cur.fetchall()
for outerrow in res:
    nSql = outerrow[0]
    cur.execute(nSql)
    reslt = cur.fetchall()
    for row in reslt:
        date = row[0]
        custId = row[1]
        userId = row[2]
        insertStatement = ("insert into TBL2( date, custId, userId) values ('%s', %d, %d)" % (date, custId, userId))
        cur.execute(insertStatement)
con.commit()

Timestamp every row you insert into the table (using a datetime column). Before inserting, delete the rows whose datetime falls on today's date.
For MySQL, you can use the to_days() function to get the day a datetime falls on: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_to-days
When inserting new rows, now() gives you the datetime value for the current time: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
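A minimal sketch of the delete-then-insert idea, under a few assumptions: it uses the standard-library sqlite3 module with an in-memory table as a stand-in for MySQL, the helper name replace_rows_for_day is hypothetical, and the date column holds an ISO date string. With MySQLdb the placeholder would be %s instead of ?, and the day test could be written as TO_DAYS(date) = TO_DAYS(NOW()). The delete and the inserts share one transaction, so a failed run leaves the old data in place.

```python
import sqlite3
from datetime import date


def replace_rows_for_day(con, day, rows):
    """Delete the rows already stored for `day`, then insert `rows`, atomically."""
    cur = con.cursor()
    try:
        cur.execute("DELETE FROM TBL2 WHERE date = ?", (day,))
        cur.executemany(
            "INSERT INTO TBL2 (date, custId, userId) VALUES (?, ?, ?)", rows
        )
        con.commit()
    except Exception:
        con.rollback()  # keep the old data if anything fails
        raise


# Stand-in table (assumption: same three columns as TBL2 in the question).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TBL2 (date TEXT, custId INTEGER, userId INTEGER)")
con.commit()

today = date.today().isoformat()
replace_rows_for_day(con, today, [(today, 1, 10), (today, 2, 20)])  # morning run
replace_rows_for_day(con, today, [(today, 3, 30)])                  # evening run

rows_today = con.execute("SELECT date, custId, userId FROM TBL2").fetchall()
```

In the question's script, this would mean collecting the rows from all the nSql queries first and calling the helper once, so that a later query does not delete what an earlier one inserted.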

Related

Create a def to Calculate the average time - there are multiple columns - Python or SQL

I have two columns with the following data: the date of access to the site and the email of the client who accessed it. Many customers accessed the site on more than one day.
The business question I need to answer is: what is the average time of purchase interest per customer? That is, I need to compute the differences between consecutive access dates (in days) and create a column with the average for each customer. Some customers accessed the site on more than 3 different dates, so I believe a function is needed to solve this.
The name of my dataframe is dt, and the columns are: data and email_proposta
The date is in this format: 2016-01-01
Here I am making a temp table in a database. Of course make sure you do not break any production database.
Notice in the example that the third client does not show up, since he only visited the site once.
Using sqlite3:
import datetime
import sqlite3

conn = sqlite3.connect(db)  # aim at your database
c = conn.cursor()
# dangerQuery = 'DROP TABLE IF EXISTS temp'
# c.execute(dangerQuery)
query = 'CREATE TABLE temp (date text, email_proposta text);'
c.execute(query)
query = "INSERT INTO temp VALUES ('2016-01-01', 'first@client.net'), ('2016-01-30', 'first@client.net'), ('2016-02-05', 'first@client.net');"
c.execute(query)
query = "INSERT INTO temp VALUES ('2016-01-02', 'second@client.net'), ('2016-01-06', 'second@client.net'), ('2016-01-06', 'second@client.net');"
c.execute(query)
query = "INSERT INTO temp VALUES ('2016-01-30', 'first@client.net'), ('2016-01-30', 'first@client.net'), ('2016-02-05', 'first@client.net');"
c.execute(query)
query = "INSERT INTO temp VALUES ('2016-03-13', 'third@client.net');"
c.execute(query)
# DISTINCT drops repeated visits on the same day; newest dates come first per email
query_Select = "SELECT DISTINCT date, email_proposta FROM temp ORDER BY email_proposta,date DESC;"
c.execute(query_Select)
rows = c.fetchall()
date_format = '%Y-%m-%d'
visits_dic = {}  # email: [date1, date2, ..]
for row in rows:
    email = row[1]
    visit_date = datetime.datetime.strptime(row[0], date_format)
    if email not in visits_dic:
        visits_dic[email] = [visit_date]
    else:
        visits_dic[email].append(visit_date)
client_gap_dic = {}  # email: average days between visits
for email, dates in visits_dic.items():
    if len(dates) < 2:
        continue  # a single visit has no gap to average
    daysBetween = []
    i = 0
    while i < len(dates) - 1:
        # dates are sorted newest first, so each difference is positive
        days = dates[i] - dates[i+1]
        days = days.days
        daysBetween.append(days)
        i += 1
    average = sum(daysBetween) / len(daysBetween)
    # finally save your findings:
    client_gap_dic[email] = average
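Since the question asks for a def, the same calculation could be wrapped in a function. This is only a sketch under stated assumptions: the function name average_gap_days is hypothetical, the input is a list of (date, email) pairs rather than a dataframe or database table, and repeated visits on the same day count once, as with the DISTINCT query above.

```python
from datetime import date


def average_gap_days(visits):
    """Map each email to the mean number of days between consecutive distinct visit dates."""
    dates_by_email = {}
    for visit_date, email in visits:
        # a set drops repeated visits on the same day
        dates_by_email.setdefault(email, set()).add(visit_date)

    averages = {}
    for email, dates in dates_by_email.items():
        ordered = sorted(dates)
        if len(ordered) < 2:
            continue  # one visit only: no gap to average
        gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
        averages[email] = sum(gaps) / len(gaps)
    return averages


visits = [
    (date(2016, 1, 1), "first@client.net"),
    (date(2016, 1, 30), "first@client.net"),
    (date(2016, 2, 5), "first@client.net"),
    (date(2016, 1, 2), "second@client.net"),
    (date(2016, 1, 6), "second@client.net"),
    (date(2016, 1, 6), "second@client.net"),
    (date(2016, 3, 13), "third@client.net"),
]
averages = average_gap_days(visits)
```

For the sample data, the first client has gaps of 29 and 6 days and the second client a single gap of 4 days; the third client is skipped because there is only one visit.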

automatically insert only one row in table after calculating the sum of column

I have 3 tables in my database:
CREATE TABLE IF NOT EXISTS depances (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    type VARCHAR NOT NULL,
    nom VARCHAR,
    montant DECIMAL(100,2) NOT NULL,
    date DATE,
    temp TIME);
CREATE TABLE IF NOT EXISTS transactions (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    montant DECIMAL(100,2),
    medecin VARCHAR,
    patient VARCHAR,
    acte VARCHAR,
    date_d DATE,
    time_d TIME,
    users_id INTEGER);
CREATE TABLE IF NOT EXISTS total_jr (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    total_revenu DECIMAL(100,2),
    total_depance DECIMAL(100,2),
    total_différence DECIMAL(100,2),
    date DATE);
My idea is to insert different values into the tables depances and transactions using a GUI.
After that, I add the SUM of depances.montant to total_jr.total_depance
and the SUM of transactions.montant to total_jr.total_revenu, where all rows have the same date.
That's the easy part, using this code:
self.cur.execute('''SELECT SUM(montant) AS totalsum FROM depances WHERE date = %s''', (date,))
result = self.cur.fetchall()
for i in result:
    o = i[0]
self.cur_t = self.connection.cursor()
self.cur_t.execute('''INSERT INTO total_jr(total_depance)
    VALUES (%s)''', (o,))
self.connection.commit()
self.cur.execute('''UPDATE total_jr SET total_depance = %s WHERE date = %s''', (o, date))
self.connection.commit()
But every time, it adds a new row to the total_jr table.
How can I store the SUM(montant) value in the row that has the same date, so that each date has only one row, instead of adding a new row every time?
The result should look like this:
id | total_revenu  | total_depance | total_différence | date
---+---------------+---------------+------------------+-----------
1  | sum(montant1) | value         | value            | 08/07/2020
2  | sum(montant2) | value         | value            | 08/09/2020
3  | sum(montant3) | value         | value            | 08/10/2020
But it only gives me this result:
id | total_revenu  | total_depance | total_différence | date
---+---------------+---------------+------------------+-----------
1  | 1             | value         | value            | 08/07/2020
2  | 2             | value         | value            | 08/07/2020
3  | 3             | value         | value            | 08/07/2020
Any idea or hint would be helpful.
You didn't mention which DBMS or SQL module you're using so I'm guessing MySQL.
In your process, run the update first and check how many rows were changed. If zero rows changed, then insert a new row for that date.
self.cur.execute('''SELECT SUM(montant) AS totalsum FROM depances WHERE date = %s''', (date,))
result = self.cur.fetchall()
for i in result:
    o = i[0]
self.cur.execute('''UPDATE total_jr SET total_depance = %s WHERE date = %s''', (o, date))
rowcnt = self.cur.rowcount  # number of rows updated - psycopg2
self.connection.commit()
if rowcnt == 0:  # no rows updated, need to insert new row
    self.cur_t = self.connection.cursor()
    self.cur_t.execute('''INSERT INTO total_jr(total_depance, date)
        VALUES (%s, %s)''', (o, date))
    self.connection.commit()
I found a solution, for anyone who needs it in the future. First of all, we need to update the table definition:
create_table_total_jr = ''' CREATE TABLE IF NOT EXISTS total_jr (
    id SERIAL PRIMARY KEY UNIQUE NOT NULL,
    total_revenu DECIMAL(100,2),
    total_depance DECIMAL(100,2),
    total_différence DECIMAL(100,2),
    date DATE UNIQUE)'''  # add UNIQUE to the date
After that, we use UPSERT with ON CONFLICT:
self.cur_t.execute(''' INSERT INTO total_jr(date) VALUES (%s)
    ON CONFLICT (date) DO NOTHING''', (date,))
self.connection.commit()
With this code, inserting a row with a date that already exists does nothing.
After that, we update the value of the SUM:
self.cur.execute('''UPDATE total_jr SET total_depance = %s WHERE date = %s''', (o, date))
self.connection.commit()
Special thanks to Mike67 for his help.
You do not need 2 database calls for this. As @Mike67 suggested, UPSERT functionality is what you want. However, you need to send both date and total_depance. In SQL that becomes:
insert into total_jr(date, total_depance)
values (date_value, total_value)
on conflict (date)
do update
set total_depance = excluded.total_depance;
Or, if the input total_depance is just a single transaction value while total_depance in the table is a running total:
insert into total_jr(date, total_depance)
values (date_value, total_value)
on conflict (date)
do update
set total_depance = total_jr.total_depance + excluded.total_depance;
I believe your code then becomes something like this (assuming the first insert is correct):
self.cur_t.execute('''INSERT INTO total_jr(date, total_depance) VALUES (%s, %s)
    ON CONFLICT (date) DO UPDATE SET total_depance = excluded.total_depance''', (date, total_depance))
self.connection.commit()
But that could be off; you will need to verify it.
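The single-statement upsert can be tried out end to end with a small sketch. It rests on several assumptions: it uses the standard-library sqlite3 module (SQLite 3.24 or newer, which accepts the same ON CONFLICT ... DO UPDATE form) instead of Postgres, the tables are cut down to the columns involved, and the helper name refresh_total is hypothetical. With psycopg2 the placeholders would be %s rather than ?.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified stand-ins for the tables in the question.
con.execute("CREATE TABLE depances (montant REAL NOT NULL, date TEXT)")
con.execute(
    "CREATE TABLE total_jr ("
    "id INTEGER PRIMARY KEY, total_depance REAL, date TEXT UNIQUE)"
)


def refresh_total(con, day):
    """Recompute the day's expense total and store it in that day's single row."""
    (total,) = con.execute(
        "SELECT COALESCE(SUM(montant), 0) FROM depances WHERE date = ?", (day,)
    ).fetchone()
    con.execute(
        "INSERT INTO total_jr (date, total_depance) VALUES (?, ?) "
        "ON CONFLICT (date) DO UPDATE SET total_depance = excluded.total_depance",
        (day, total),
    )
    con.commit()
    return total


con.execute("INSERT INTO depances VALUES (10.5, '2020-08-07')")
refresh_total(con, "2020-08-07")  # first call inserts the row for that date
con.execute("INSERT INTO depances VALUES (4.5, '2020-08-07')")
refresh_total(con, "2020-08-07")  # second call updates the same row

totals = con.execute("SELECT date, total_depance FROM total_jr").fetchall()
```

Because the total is recomputed from depances on every call, the plain excluded.total_depance form is the one that fits here; the accumulating form would double count.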
Tip of the day: you should rename the column date to something else. Date is a reserved word in both Postgres and the SQL Standard, and it has predefined meanings based on its context. While you may get away with using it as a data name, Postgres still has the right to change that at any time without notice; that is unlikely, but still true. If it happens, your code (and most code using those tables) fails, and tracking down why becomes extremely difficult. Basic rule: do not use reserved words as data names. Using reserved words as data or database object names is a bug just waiting to bite.

Populating Table with Values from other table if ID not in DWH table

I am performing an ETL task where I am querying tables in a Data Warehouse to see if it contains IDs in a DataFrame (df) which was created by joining tables from the operational database.
The DataFrame only has ID columns from each joined table in the operational database. I have created a variable for each of these columns, e.g. 'billing_profiles_id' as below:
billing_profiles_dim_id = df['billing_profiles_dim_id']
I am attempting to iterate row by row to see if the ID here is in the 'billing_profiles_dim' table of the Data Warehouse. Where the ID is not present, I want to populate the DWH tables row by row using the matching ID rows in the ODB:
for key in billing_profiles_dim_id:
    sql = "SELECT * FROM billing_profiles_dim WHERE id = '"+str(key)+"'"
    dwh_cursor.execute(sql)
    result = dwh_cursor.fetchone()
    if result == None:
        sqlQuery = "SELECT * from billing_profile where id = '"+str(key)+"'"
        sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name"')
        op_cursor = op_connector.execute(sqlInsert)
        billing_profile = op_cursor.fetchone()
So far at least, I am receiving the following error:
SyntaxError: EOL while scanning string literal
This error message points at the closing bracket of
sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name"')
Which I am currently unable to solve. I'm also aware that this code may run into another problem or two. Could someone please see how I can solve the current issue and please ensure that I head down the correct path?
You are missing a + and a closing double quote:
sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name+"')"
But you should really switch to parameterized queries, like this (note that the placeholder is not wrapped in quotes):
sql = "SELECT * FROM billing_profiles_dim WHERE id = %s"
dwh_cursor.execute(sql, (str(key),))
...
sqlInsert = ('INSERT INTO billing_profile_dim VALUES '
             '(%s, %s)')
dwh_cursor.execute(sqlInsert, (str(key), billing_profile.name))
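As a rough sketch of the whole check-then-copy loop with parameterized queries, here is a self-contained version. The assumptions are significant: two in-memory sqlite3 databases stand in for the operational database and the warehouse, both tables are reduced to id and name columns, the helper name copy_missing_profiles is hypothetical, and sqlite3 uses ? as its placeholder where the driver above uses %s. It reads the existing warehouse IDs once instead of querying per key.

```python
import sqlite3

# Stand-ins for the operational database and the data warehouse.
op = sqlite3.connect(":memory:")
dwh = sqlite3.connect(":memory:")
op.execute("CREATE TABLE billing_profile (id INTEGER PRIMARY KEY, name TEXT)")
op.executemany(
    "INSERT INTO billing_profile VALUES (?, ?)",
    [(1, "basic"), (2, "pro"), (3, "enterprise")],
)
op.commit()
dwh.execute("CREATE TABLE billing_profiles_dim (id INTEGER PRIMARY KEY, name TEXT)")
dwh.execute("INSERT INTO billing_profiles_dim VALUES (1, 'basic')")
dwh.commit()


def copy_missing_profiles(op, dwh, ids):
    """Copy rows for IDs the warehouse lacks; return the IDs that were inserted."""
    existing = {row[0] for row in dwh.execute("SELECT id FROM billing_profiles_dim")}
    copied = []
    for key in dict.fromkeys(ids):  # drop duplicate IDs, keep their order
        if key in existing:
            continue
        row = op.execute(
            "SELECT id, name FROM billing_profile WHERE id = ?", (key,)
        ).fetchone()
        if row is not None:  # skip IDs the operational database does not have
            copied.append(row)
    dwh.executemany("INSERT INTO billing_profiles_dim (id, name) VALUES (?, ?)", copied)
    dwh.commit()
    return [row[0] for row in copied]


inserted = copy_missing_profiles(op, dwh, [1, 2, 2, 3])
dim_rows = dwh.execute("SELECT id, name FROM billing_profiles_dim ORDER BY id").fetchall()
```

Note the order of operations: the source row is fetched before the insert is built, which is the step the original loop has reversed.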

Update SQLITE DB with multiple python lists

I'm attempting to update my sqlite db with 2 Python lists. I have a sqlite db with three fields: name, number, date. I also have Python lists with similar names. I'm trying to figure out a way to update my sqlite db with data from these 2 lists. I can get the db created, and even get a single column filled, but I can't seem to update it correctly, or at all. Is there a way to INSERT both lists at once, rather than INSERT a single column and then UPDATE the db with the other?
Here is what I have so far:
name_list = []
number_list = []
date = now.date()
strDate = date.strftime("%B %Y")
tableName = strDate
sqlTable = 'CREATE TABLE IF NOT EXISTS ' + tableName + '(name text, number integer, date text)'
c.execute(sqlTable)
conn.commit()
for i in name_list:
    c.execute('INSERT INTO January2018(names) VALUES (?)', [i])
conn.commit()
I can't seem to get past this point. I still need to add another list of data (number_list) and attach the date to each row.
Here's what I have on that:
for i in number_list:
    c.execute('UPDATE myTable SET number = ? WHERE name', [i])
conn.commit()
Any help would be much appreciated. And if you need more information, please let me know.
You can use executemany with zip:
c.executemany('INSERT INTO January2018 (name, number) VALUES (?, ?)', zip(name_list, number_list))
conn.commit()
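To also attach the date to each row, the zipped pairs can be extended with a third value. The following is a sketch with assumed sample data: the list contents and the fixed date are made up for illustration, and the database is in memory. Because strftime("%B %Y") produces a name with a space, the table name is wrapped in double quotes here; table names cannot be passed as ? parameters, so it is concatenated into the SQL text.

```python
import sqlite3
from datetime import date

# Sample data (assumption: both lists have the same length and order).
name_list = ["alice", "bob"]
number_list = [3, 7]

str_date = date(2018, 1, 15).strftime("%B %Y")  # month name and year
table_name = '"' + str_date + '"'               # quoted because of the space

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute(
    "CREATE TABLE IF NOT EXISTS " + table_name
    + " (name TEXT, number INTEGER, date TEXT)"
)

# One tuple per row: (name, number, date)
rows = [(name, number, str_date) for name, number in zip(name_list, number_list)]
c.executemany(
    "INSERT INTO " + table_name + " (name, number, date) VALUES (?, ?, ?)", rows
)
conn.commit()

stored = c.execute(
    "SELECT name, number, date FROM " + table_name + " ORDER BY rowid"
).fetchall()
```

Keep in mind that zip stops at the shorter list, so a length mismatch silently drops rows.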

Saving a pandas dataframe into sqlite with different column names?

I have a sqlite database and a dataframe with different column names, but they refer to the same thing. E.g.
My database Cars has the car Id, Name and Price.
My dataframe df has the car Identity, Value and Name.
Additional: I would also like to add a 'date' column in the database that is not in the df, and fill it based on the current date.
I would like to save the df in the database so that Id = Identity, Price = Value, Name = Name and date = something specified by the user, or the current date.
So I cannot do the usual df.to_sql (unless I rename the columns, but I wonder if there is a better way to do this).
I first tried to sync the names without the date column:
cur.execute("INSERT INTO Cars VALUES(?,?,?)",df.to_records(index=False))
However, the above does not work and gives me an error that the binding is incorrect. Plus, the order of the columns in the DB and the DF is different.
I'm not even sure how to handle the part where I have the extra date column, so any help would be great. Below is sample code to generate all the values.
import sqlite3 as lite
import pandas as pd
con = lite.connect('test.db')
with con:
    cur = con.cursor()
    cur.execute("CREATE TABLE Cars(Id INT, Name TEXT, Price INT, date TEXT)")
df = pd.DataFrame({'Identity': range(5), 'Value': range(5,10), 'Name': range(10,15)})
You can do:
cur.executemany("INSERT INTO Cars (Id, Name, Price) VALUES(?,?,?)", list(df.to_records(index=False)))
Also, you should set the dtype of your dataframe to numpy.int32 to meet the constraints of the table 'Cars', and select the dataframe columns in the same order as the columns named in the INSERT:
import sqlite3
import numpy
import pandas
con = sqlite3.connect('test.db')
cur = con.cursor()
cur.execute("CREATE TABLE Cars(Id INT, Name TEXT, Price INT, date TEXT)")
df = pandas.DataFrame({'Identity': range(5), 'Value': range(5,10), 'Name': range(10,15)}, dtype=numpy.int32)
cur.executemany("INSERT INTO Cars (Id, Price, Name) VALUES(?,?,?)", list(df[['Identity', 'Value', 'Name']].to_records(index=False)))
query = "SELECT * from Cars"
cur.execute(query)
rows = cur.fetchall()
for row in rows:
    print(row)
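The extra date column is not covered above, so here is one possible sketch for it. It is deliberately library-free and makes several assumptions: the rows are plain dicts (the shape that df.to_dict('records') produces), the names COLUMN_MAP and insert_cars are hypothetical, and the date defaults to today when the caller does not supply one. An explicit mapping from table column to dataframe column removes any dependence on column order.

```python
import sqlite3
from datetime import date

# Table column -> dataframe column (hypothetical mapping for this example).
COLUMN_MAP = {"Id": "Identity", "Name": "Name", "Price": "Value"}


def insert_cars(con, records, day=None):
    """Insert dict records into Cars, renaming columns and adding a date."""
    day = day or date.today().isoformat()
    columns = ", ".join(list(COLUMN_MAP) + ["date"])
    placeholders = ", ".join("?" * (len(COLUMN_MAP) + 1))
    rows = [
        tuple(record[source] for source in COLUMN_MAP.values()) + (day,)
        for record in records
    ]
    con.executemany(
        "INSERT INTO Cars (%s) VALUES (%s)" % (columns, placeholders), rows
    )
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Cars(Id INT, Name TEXT, Price INT, date TEXT)")

records = [
    {"Identity": 0, "Value": 5, "Name": "10"},
    {"Identity": 1, "Value": 6, "Name": "11"},
]
insert_cars(con, records, day="2018-01-01")
cars = con.execute("SELECT Id, Name, Price, date FROM Cars ORDER BY Id").fetchall()
```

Only the column names are formatted into the SQL text, and they come from the fixed mapping; the values always go through ? placeholders.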
