I am trying to insert stock market CSV data that I downloaded from Yahoo Finance into a MySQL table named 'TEST' that is in a database named 'stocks', but I am getting an error from Python:
InternalError: (1292, "Incorrect date value: 'Date' for column 'date' at row 1")
The data that I am trying to insert has hundreds of rows that look something like this:
1995-03-31,0.141150,0.141150,0.141150,0.141150,0.105375,10000
The table that I am trying to insert this data into contains the following columns:
date DATE NOT NULL PRIMARY KEY,
open DECIMAL(10,6),
high DECIMAL(10,6),
low DECIMAL(10,6),
close DECIMAL(10,6),
adj_close DECIMAL(10,6),
volume INT
This is the Python code that I have used to insert the data into the table:
with open('/home/matt/Desktop/python_projects/csv_files/CH8_SG.csv',
          'r') as f:
    reader = csv.reader(f)
    data = next(reader)
    query = 'insert into TEST values (%s,%s,%s,%s,%s, %s, %s)'
    query = query.format(','.split('%s' * len(data)))
    cursor = connection.cursor()
    cursor.execute(query, data)
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
When I run the code above I get the following error:
InternalError: (1292, "Incorrect date value: 'Date' for column 'date' at row 1")
I really think that I am close but I do not know what is going on with that error.
Can anyone help me?
When you insert a date/time value from Python, you can have MySQL convert the string explicitly with STR_TO_DATE(), which takes a format string as its second argument. In your case:
query = 'insert into TEST values (%s,%s,%s,%s,%s,%s,%s)'
=> query = "insert into TEST values (STR_TO_DATE(%s, '%%Y-%%m-%%d'),%s,%s,%s,%s,%s,%s)"
(the %% doubling keeps the driver's own %-style formatting from choking on the literal % signs). Also make sure that there are no duplicates in the "date" field, since it is the primary key.
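For reference, the literal string 'Date' in the error message is the CSV header, which the code above executes as a data row (data = next(reader) followed by cursor.execute(query, data)). A minimal sketch that skips the header and commits on the connection, assuming the mysql.connector driver and placeholder credentials; since Yahoo's dates are already in YYYY-MM-DD form, a DATE column accepts them without STR_TO_DATE():

import csv
import mysql.connector  # assumption: adjust if you use MySQLdb or PyMySQL

connection = mysql.connector.connect(host='localhost', user='root',
                                     password='...', database='stocks')  # placeholder credentials
cursor = connection.cursor()

query = 'INSERT INTO TEST VALUES (%s, %s, %s, %s, %s, %s, %s)'

with open('/home/matt/Desktop/python_projects/csv_files/CH8_SG.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)                     # skip the header row ('Date,Open,High,...')
    for row in reader:
        cursor.execute(query, row)   # '1995-03-31' goes straight into the DATE column

connection.commit()                  # commit() is on the connection, not the cursor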
I have a pandas DataFrame, and I want to insert/update (upsert) it into a table. The conditions are:
New rows will be inserted (this scenario adds the current timestamp to the INSERT_TIMESTAMP column while inserting into the table).
This scenario keeps UPDATE_TIMESTAMP & PREVIOUS_UPDATE_TIMESTAMP blank.
For existing rows (a row counts as existing if its primary key is already in the table), the values of the existing row are updated, except for the INSERT_TIMESTAMP value.
This scenario adds the current timestamp to the UPDATE_TIMESTAMP column while updating.
It should also copy the pre-update UPDATE_TIMESTAMP value into PREVIOUS_UPDATE_TIMESTAMP.
Here is my code where I am trying to upsert a dataframe into a table that is already created.
CODE is the primary key.
Again, as mentioned: I want to insert the row if the CODE is not present in the table, adding the current time to INSERT_TIMESTAMP. If the CODE already exists in the table, then update the row excluding INSERT_TIMESTAMP, add the current time to UPDATE_TIMESTAMP, and copy the row's UPDATE_TIMESTAMP from before the update into PREVIOUS_UPDATE_TIMESTAMP.
This code is giving me a "tuple index out of range" error.
for index, row in dataframe.iterrows():
    print(row)
    cur.execute("""INSERT INTO TABLE_NAME
                       (CODE, NAME, CODE_GROUP,
                        INDICATOR, INSERT_TIMESTAMP,
                        UPDATE_SOURCE, IDD, INSERT_SOURCE)
                   VALUES (%s, %s, %s, %s, NOW(), %s, %s, %s)
                   ON CONFLICT(CODE)
                   DO UPDATE SET
                       NAME = %s,
                       CODE_GROUP = %s,
                       INDICATOR = %s,
                       UPDATE_TIMESTAMP = NOW(),
                       UPDATE_SOURCE = %s,
                       IDD = %s, INSERT_SOURCE = %s,
                       PREV_UPDATE_TIMESTAMP = EXCLUDED.UPDATE_TIMESTAMP""",
                (row["CODE"],
                 row['NAME'],
                 row['CODE_GROUP'],
                 row['INDICATOR'],
                 row['UPDATE_SOURCE'],
                 row['IDD'],
                 row['INSERT_SOURCE']))
conn.commit()
cur.close()
conn.close()
Please help me figure out where it is going wrong. Should I also add all the columns from the INSERT statement to the UPDATE clause? In the INSERT, UPDATE_TIMESTAMP & PREVIOUS_UPDATE_TIMESTAMP should be null, and in the UPDATE, INSERT_TIMESTAMP has to stay the same as it was before.
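One way to make the placeholder counts line up is to stop adding %s placeholders in the DO UPDATE SET clause and reference the incoming row via EXCLUDED instead, copying the pre-update timestamp from the table's own column. A sketch, assuming psycopg2 against PostgreSQL, the column names from the snippet above, and that PREV_UPDATE_TIMESTAMP is the column the question calls PREVIOUS_UPDATE_TIMESTAMP:

upsert_sql = """INSERT INTO TABLE_NAME
                    (CODE, NAME, CODE_GROUP, INDICATOR, INSERT_TIMESTAMP,
                     UPDATE_SOURCE, IDD, INSERT_SOURCE)
                VALUES (%s, %s, %s, %s, NOW(), %s, %s, %s)
                ON CONFLICT (CODE)
                DO UPDATE SET
                    NAME = EXCLUDED.NAME,
                    CODE_GROUP = EXCLUDED.CODE_GROUP,
                    INDICATOR = EXCLUDED.INDICATOR,
                    UPDATE_TIMESTAMP = NOW(),
                    UPDATE_SOURCE = EXCLUDED.UPDATE_SOURCE,
                    IDD = EXCLUDED.IDD,
                    INSERT_SOURCE = EXCLUDED.INSERT_SOURCE,
                    -- old value of the row being updated, not the incoming one
                    PREV_UPDATE_TIMESTAMP = TABLE_NAME.UPDATE_TIMESTAMP"""

for index, row in dataframe.iterrows():
    cur.execute(upsert_sql, (row['CODE'], row['NAME'], row['CODE_GROUP'],
                             row['INDICATOR'], row['UPDATE_SOURCE'],
                             row['IDD'], row['INSERT_SOURCE']))
conn.commit()

INSERT_TIMESTAMP is deliberately left out of the SET list, so it keeps its original value on an update, and UPDATE_TIMESTAMP / PREV_UPDATE_TIMESTAMP simply stay NULL on a fresh insert.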
I have the following dataframe:
data = [['Alex', 182.2],['Bob', 183.2],['Clarke', 188.4], ['Kelly', NA]]
df = pd.DataFrame(data, columns = ['Name', 'Height'])
I have the following SQL Server table:
create table dbo.heights (
    name varchar(10),
    height float
)
This is my code to upload the data to my table:
for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.heights(name, height) values (?, ?)', row.name, row.height)
cnxn.commit()
cursor.close()
cnxn.close()
I want to upload the dataframe into my SQL Server table, but it fails on the null value. I tried replacing the NA with an np.nan value and it still failed. I also tried changing the height column to an "object" and replacing the NA with None and that also failed.
Please use the following instead:
for index, row in df.iterrows():
    query = "INSERT INTO dbo.heights(name, height) values (?, ?)"
    # use row['Name'] / row['Height']: the columns are capitalized, and row.name is the row's index label
    data = [row['Name'], row['Height']]
    cursor.execute(query, data)
cursor.commit()
Or use the following:
query = "INSERT INTO dbo.heights(name, height) values (?, ?)"
data = [row.name, row.height for index, row in df.iterrows()]
cursor.executemany(query, data)
cursor.commit()
You'll see your None values as None in Python and as NULL in your database.
Regarding "I tried replacing the NA with an np.nan": in that case you have to first define the dataframe schema and make the column a nullable float.
"By default, SeriesSchema/Column objects assume that values are not nullable. In order to accept null values, you need to explicitly specify nullable=True, or else you’ll get an error."
Further Reading
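For completeness, a sketch of converting the NaN cells to None up front, assuming pyodbc against SQL Server and the df from the question; pyodbc sends Python None as SQL NULL but rejects float('nan'):

import numpy as np
import pandas as pd

data = [['Alex', 182.2], ['Bob', 183.2], ['Clarke', 188.4], ['Kelly', np.nan]]
df = pd.DataFrame(data, columns=['Name', 'Height'])

# Replace NaN with None so the driver writes NULL instead of choking on float('nan')
df = df.astype(object).where(pd.notnull(df), None)

query = "INSERT INTO dbo.heights (name, height) VALUES (?, ?)"
params = [(row['Name'], row['Height']) for _, row in df.iterrows()]

cursor.executemany(query, params)   # assumes the open pyodbc cursor from the question
cnxn.commit()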
Try it like this (pyodbc uses ? placeholders, and SQL Server does not use backticks):
for index, row in df.iterrows():
    cursor.execute("INSERT INTO dbo.heights (name, height) VALUES (?, ?)",
                   (row['Name'], row['Height']))
cnxn.commit()
cursor.close()
cnxn.close()
I have a CSV file of stock data that is updated every day.
I want to load this data into a table and just add the new data each day.
This is my code:
# -*- coding: utf-8 -*-
import csv
import mysql.connector
from datetime import datetime
cnx = mysql.connector.connect(host='localhost',
                              user='root',
                              passwd='pass',
                              db='stock')
cursor = cnx.cursor()
cursor.execute("""CREATE TABLE IF NOT EXISTS stock(id INT AUTO_INCREMENT,
    name VARCHAR(50), day DATE UNIQUE, open float, high float, low float,
    close float, vol float, PRIMARY KEY(id))""")
a = 0
with open("file path") as f:
    data = csv.reader(f)
    for row in data:
        if a == 0:
            a += 1
        else:
            cursor.execute('''INSERT INTO stock(name,day,open,high,low,close,vol)
                              VALUES("%s","%s","%s","%s","%s","%s","%s")''',
                           (row[0], int(row[1]), float(row[2]), float(row[3]), float(row[4]),
                            float(row[5]), float(row[6])))
cnx.commit()
cnx.close()
But I cannot prevent duplicate rows from being inserted.
Assuming that you want to avoid duplicates on (name, day), one approach would be to set a unique key on that pair of columns. You can then use insert ignore or, better yet, the on duplicate key update syntax to skip the duplicate rows.
You create the table like:
create table if not exists stock(
    id int auto_increment,
    name varchar(50),
    day date,
    open float,
    high float,
    low float,
    close float,
    vol float,
    primary key(id),
    unique (name, day) -- unique constraint
);
Then:
insert into stock(name,day,open,high,low,close,vol)
values(%s, %s, %s, %s, %s, %s, %s)
on duplicate key update name = values(name) -- dummy update
Notes:
you should not have double quotes around the %s placeholders
your original create table code had a unique constraint on column day; this does not fit with your question, as I understood it. In any case, you should put the unique constraint on the column (or set of columns) on which you want to avoid duplicates.
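Putting it together, a sketch of the loading loop with the on duplicate key approach, reusing the cursor and cnx from the snippet above and assuming the table just defined and the same CSV column order (name, day, open, high, low, close, vol):

insert_sql = """INSERT INTO stock(name, day, open, high, low, close, vol)
                VALUES (%s, %s, %s, %s, %s, %s, %s)
                ON DUPLICATE KEY UPDATE name = VALUES(name)"""   # dummy update

with open("file path") as f:         # path left as in the question
    reader = csv.reader(f)
    next(reader)                      # skip the header row instead of the 'a' counter
    for row in reader:
        # row[1] is assumed to already be in a format MySQL accepts for DATE, e.g. 2020-01-31
        cursor.execute(insert_sql, (row[0], row[1], float(row[2]), float(row[3]),
                                    float(row[4]), float(row[5]), float(row[6])))

cnx.commit()
cnx.close()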
I am performing an ETL task where I am querying tables in a Data Warehouse to see if it contains IDs in a DataFrame (df) which was created by joining tables from the operational database.
The DataFrame only has the ID columns from each joined table in the operational database. I have created a variable for each of these columns, e.g. 'billing_profiles_dim_id' as below:
billing_profiles_dim_id = df['billing_profiles_dim_id']
I am attempting to iterate row by row to see if each ID is in the 'billing_profiles_dim' table of the Data Warehouse. Where the ID is not present, I want to populate the DWH tables row by row using the matching ID rows in the ODB:
for key in billing_profiles_dim_id:
    sql = "SELECT * FROM billing_profiles_dim WHERE id = '"+str(key)+"'"
    dwh_cursor.execute(sql)
    result = dwh_cursor.fetchone()
    if result == None:
        sqlQuery = "SELECT * from billing_profile where id = '"+str(key)+"'"
        sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name"')
        op_cursor = op_connector.execute(sqlInsert)
        billing_profile = op_cursor.fetchone()
So far at least, I am receiving the following error:
SyntaxError: EOL while scanning string literal
This error message points at the closing bracket in
sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name"')
I am currently unable to solve this. I'm also aware that this code may run into another problem or two. Could someone please help me solve the current issue and make sure I am heading down the correct path?
You are missing a closing double quote and a +:
sqlInsert = "INSERT INTO billing_profile_dim VALUES ('"+str(key)+"','"+billing_profile.name+"')"
But you should really switch to prepared (parameterized) statements, like:
sql = "SELECT * FROM billing_profiles_dim WHERE id = %s"   # no quotes around the placeholder
dwh_cursor.execute(sql, (str(key),))
...
sqlInsert = ('INSERT INTO billing_profile_dim VALUES '
             '(%s, %s)')
dwh_cursor.execute(sqlInsert, (str(key), billing_profile.name))
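A sketch of the whole loop with parameters throughout. Note that in the original code the billing_profile row is fetched only after the INSERT that uses it, so the lookup has to come first; the two-column INSERT and billing_profile.name are carried over from the snippets above and may need adjusting to however op_connector actually returns rows:

for key in billing_profiles_dim_id:
    dwh_cursor.execute("SELECT * FROM billing_profiles_dim WHERE id = %s", (str(key),))
    if dwh_cursor.fetchone() is None:
        # Look up the matching row in the operational database first...
        op_cursor = op_connector.execute("SELECT * FROM billing_profile WHERE id = %s",
                                         (str(key),))
        billing_profile = op_cursor.fetchone()
        # ...then insert it into the warehouse dimension table
        # (billing_profile.name kept from the question; with a plain DB-API cursor
        #  this would be an index such as billing_profile[1])
        dwh_cursor.execute("INSERT INTO billing_profile_dim VALUES (%s, %s)",
                           (str(key), billing_profile.name))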
I have 2 tables TBL1 and TBL2.
TBL1 has 2 columns id, nSql.
TBL2 has 3 columns date, custId, userId.
I have 17 rows in TBL1 with id 1 to 17. Each nSql has a SQL query in it.
For example nSql for
id == 1 is: "select date, pId as custId, tId as userId from TBL3"
id == 2 is: "select date, qId as custId, rId as userId from TBL4" ...
The nSql result always has the same 3 columns.
The query below runs and puts data into TBL2. If there is already data in TBL2 for that day, I want the query to replace the old data with the new data.
If there is no data in TBL2, I want to insert the data in the normal way.
For example, if I run the query in the morning and then run it again in the evening, I want the new data to replace the old data for that day, since data will be inserted into TBL2 every day.
It is also a precaution: if the data already exists (say a coworker already ran it), I do not want duplicate data for that day.
How can I do it?
Thank you.
(I am new to Python; I would appreciate it if someone could explain the steps and show them in the code.)
import MySQLdb

# Open connection
con = MySQLdb.Connection(host="localhost", user="root", passwd="root", db="test")
# create a cursor object
cur = con.cursor()

selectStatement = ("select nSql from TBL1")
cur.execute(selectStatement)
res = cur.fetchall()

for outerrow in res:
    nSql = outerrow[0]
    cur.execute(nSql)
    reslt = cur.fetchall()
    for row in reslt:
        date = row[0]
        custId = row[1]
        userId = row[2]
        insertStatement = ("insert into TBL2 (date, custId, userId) values ('%s', %d, %d)"
                           % (date, custId, userId))
        cur.execute(insertStatement)
con.commit()
Timestamp (using a DATETIME column) all data inserted into the table. Before inserting, delete the rows whose timestamp falls on today.
For MySQL, you can use the TO_DAYS() function to get which day a datetime falls on: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_to-days
When inserting new rows, NOW() gives you the datetime value for the current time: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
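A sketch of that approach, assuming a DATETIME column is added to TBL2 to hold the load time (insertedAt is a made-up name here) and reusing the MySQLdb connection and cursor from the question:

# Remove anything loaded earlier today, so a second run replaces the morning's data
cur.execute("DELETE FROM TBL2 WHERE TO_DAYS(insertedAt) = TO_DAYS(NOW())")

cur.execute("select nSql from TBL1")
for (nSql,) in cur.fetchall():
    cur.execute(nSql)
    for row in cur.fetchall():
        # NOW() stamps each row with the current load time
        cur.execute("insert into TBL2 (date, custId, userId, insertedAt) "
                    "values (%s, %s, %s, NOW())", row)
con.commit()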