Adding SQL for loop in Python

I am a newbie to programming. I have a .db file with Date, Open, High, Low, Close data in it, in tables named 0001.HK, 0002.HK, 0003.HK, and so on.
I am trying to build a loop to pull some data out of the database:
import os
import sqlite3

import pandas as pd

conn = sqlite3.connect(os.path.join('data', 'hkprice.db'))

def read_price(stock_id):
    query = 'select Date, Open, High, Low, Close, Volume from ' + stock_id
    df = pd.read_sql(query, conn, index_col=['Date'], parse_dates=['Date'])

for y in range(1, 2):
    read_price(str(y).zfill(4) + '.HK')
When I run it, it fails with: Execution failed on sql 'select Date, Open, High, Low, Close, Volume from 0001.HK': unrecognized token: "0001.HK"
But the 0001.HK table does exist in the database.
What should I do?

If you want to use variables with a query, you would normally put a ? placeholder in the SQL and pass the values through the params kwarg of read_sql, like so:
df = pd.read_sql(query, conn, params=[some_value], index_col=['Date'], parse_dates=['Date'])
If you have multiple parameters and, hence, multiple ? placeholders, then the values you supply to params need to be in exactly the same order as the ? marks.
The catch in your particular case is that a placeholder can only bind a value, never an identifier: SQLite will not accept a table name supplied as a parameter. Your error actually comes from the table name itself; 0001.HK begins with digits and contains a dot, so it has to be quoted to be a valid identifier.
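A minimal sketch of the identifier-safe version, assuming the 0001.HK-style table names from the question: validate the name against a strict pattern, then quote it into the statement, keeping ? placeholders for any actual values.

import os
import re
import sqlite3

import pandas as pd

conn = sqlite3.connect(os.path.join('data', 'hkprice.db'))

def read_price(stock_id):
    # Placeholders cannot stand in for identifiers, so whitelist the
    # table name and quote it into the SQL string instead.
    if not re.fullmatch(r'\d{4}\.HK', stock_id):
        raise ValueError('unexpected table name: ' + stock_id)
    query = 'select Date, Open, High, Low, Close, Volume from "{}"'.format(stock_id)
    return pd.read_sql(query, conn, index_col=['Date'], parse_dates=['Date'])

for y in range(1, 2):
    df = read_price(str(y).zfill(4) + '.HK')

Quoting the name also cures the original unrecognized token error, since an unquoted SQLite identifier may not begin with a digit.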
EDIT:
For example, if I had a query where I wanted to get data between two dates, this is how I would do it:
start = '2021-01-01'  # some start date
end = '2021-02-01'    # some end date
query = """select *
           from table
           where start_date >= ? and
                 end_date < ?
        """
df = pd.read_sql_query(query, conn, params=[start, end])
The driver substitutes parameters positionally: the first item in params fills the first ?, and the second item fills the second ?. If there's a mismatch between the number of ? marks and the number of supplied params, it will throw an error.
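A self-contained illustration of that positional binding, using an in-memory SQLite table with made-up names and dates purely for demonstration:

import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('create table events (start_date text, end_date text, value integer)')
conn.execute("insert into events values ('2021-01-05', '2021-01-20', 42)")

query = '''select * from events
           where start_date >= ? and end_date < ?'''
# '2021-01-01' binds to the first ?, '2021-02-01' to the second
df = pd.read_sql_query(query, conn, params=['2021-01-01', '2021-02-01'])
print(df)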

Related

SQLite error: "OperationalError: no such column" when column does exist

I am trying to grab one year's worth of currency data from the TraderMade Python SDK and store that data in a SQLite database. I have already created the DB with the column headings I need (date, open, high, low, close).
I have written the following code to get and store the data:
import sqlite3 as db

import tradermade as tm

conn = db.connect("MarketData.db")
c = conn.cursor()

def data():
    tm.set_rest_api_key([MY API KEY])
    request = tm.timeseries(
        currency='EURUSD',
        start="2022-01-01",
        end="2022-12-31",
        interval="daily",
        fields=["open", "high", "low", "close"]
    )
    print(request)
    stmt = '''INSERT INTO eurusd VALUES (?,?,?,?,?), (date, open, high, low, close)'''
    c.executemany(stmt, request)
    return

data()
conn.commit()
print('complete')
The data is coming back from TraderMade fine, since 'request' prints as expected.
However, when I run the code I get an OperationalError saying the column does not exist.
I have a date column in my DB. Does anyone know what is wrong with my code? My only thought is that the data is not being pulled in a format suitable for storing, but I'm not sure why, since when 'request' prints it has five columns, all with the correct column headings.
The column names should come before the VALUES clause:
stmt = '''INSERT INTO eurusd (date, open, high, low, close) VALUES (?,?,?,?,?)'''
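As a hedged follow-up sketch, note that cursor.executemany expects an iterable of row tuples, one value per placeholder. The exact shape of the TraderMade response isn't shown in the question, so the conversion below assumes an iterable of dicts keyed by column name and is illustrative only:

stmt = '''INSERT INTO eurusd (date, open, high, low, close)
          VALUES (?, ?, ?, ?, ?)'''
# Assumed shape: request is an iterable of dicts keyed by column name
rows = [(r['date'], r['open'], r['high'], r['low'], r['close']) for r in request]
c.executemany(stmt, rows)
conn.commit()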

Iterate a SQL query via PYODBC and collect the results into a Pandas DF

I am trying to run a query over and over again for all dates in a date range and collect the results into a pandas DataFrame for each iteration.
I established a connection (pyodbc) and created a list of dates that I would like to run through the SQL query to aggregate into a DataFrame. I confirmed that the dates are a list.
link = pyodbc.connect( Connection Details )
date = [d.strftime('%Y-%m-%d') for d in pd.date_range('2020-10-01','2020-10-02')]
type(date)
I created an empty DF to collect the results for each iteration of the SQL query and checked the structure.
empty = pd.DataFrame(columns = ['Date', 'Balance'])
empty
I have the query set up like so:
sql = """
    Select dt as "Date", sum(BAL)/1000 as "Balance"
    From sales as bal
    where bal.item IN (1,2,3,4)
    AND bal.dt = '{}'
    group by "Date";
""".format(day)
I tried the following for loop in the hopes of aggregating the results of each query execution into the empty df, but I get a blank df:
for day in date:
    a = pd.read_sql_query(sql, link)
    empty.append(a)
Any ideas whether the issue is related to the SQL setup and/or the for loop? Or a better, more efficient way to tackle the problem?
Avoid the loop and run a single SQL query by adding the date as a GROUP BY column and passing the start and end dates as parameters for filtering; that way the table is scanned once rather than once per day. Also use the preferred parameterization method, which pandas.read_sql does support, instead of string formatting:
# PREPARED STATEMENT WITH ? PLACEHOLDERS
sql = """SELECT dt AS "Date"
              , SUM(BAL)/1000 AS "Balance"
         FROM sales
         WHERE item IN (1,2,3,4)
           AND dt BETWEEN ? AND ?
         GROUP BY dt;
      """

# BIND PARAMS TO QUERY, RETURN A SINGLE DATA FRAME
df = pd.read_sql(sql, link, params=['2020-10-01', '2020-10-02'])
Looks like you didn't define the day variable when you generated sql.
This may help:
def sql_gen(day):
    sql = """
        Select dt as "Date", sum(BAL)/1000 as "Balance"
        From sales as bal
        where bal.item IN (1,2,3,4)
        AND bal.dt = '{}'
        group by "Date";
    """.format(day)
    return sql

for day in date:
    a = pd.read_sql_query(sql_gen(day), link)
    # DataFrame.append returns a new frame rather than mutating in place
    empty = empty.append(a)
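As a side note on accumulating results: building a list of frames and concatenating once is the more idiomatic pattern, and DataFrame.append was deprecated in later pandas releases. A hedged sketch, reusing sql_gen, date, and link from above:

import pandas as pd

frames = [pd.read_sql_query(sql_gen(day), link) for day in date]
result = pd.concat(frames, ignore_index=True)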

Read from SQL Server with Python using a few parameters from a DataFrame

I need to read from a SQL Server database using these parameters:
a period of time taken from an uploaded DataFrame (the date of the order and the date one month after)
client ids from the same DataFrame
So I have something like this:
sql_sales = """
SELECT
dt,
clientID,
cost
WHERE
dt between %(date1)s AND %(date2)s
AND kod in %(client)s
"""
And I have a df with the columns:
clientsID
date of order
date after month
I can pass the list of clients, but the code needs to query the database with a few lists of parameters (two of them make up the period):
sales = sales.append(pd.read_sql(sql_sales, conn, params={'client': df['clientsID'].tolist()}))
The way I got something similar to work in the past was to put {} in the query and then use .format with the parameters listed in order; then you don't need the params argument at all. Also, if you are using IN with SQL, you need to turn the Python client list into a tuple. For the dt between '{}' AND '{}' line (quoted, since the dates are strings), you may also be able to write dt between ? AND ?.
client = tuple(df['clientsID'].tolist())
sql_sales = """
    SELECT
        dt,
        clientID,
        cost
    WHERE
        dt between '{}' AND '{}'
        AND kod in {}
""".format(date1, date2, client)
sales = sales.append(pd.read_sql(sql_sales, conn))
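A fully parameterized variant avoids both the quoting and the injection risk of string formatting. This is a hedged sketch, assuming the conn, df, date1, and date2 names from above and a hypothetical table name sales_table (the original query omits its FROM clause); it generates one ? per client id so the dates and the IN list are all bound as parameters:

import pandas as pd

clients = df['clientsID'].tolist()
placeholders = ','.join('?' * len(clients))  # one ? per client id
sql_sales = """
    SELECT dt, clientID, cost
    FROM sales_table  -- hypothetical name; the question omits the table
    WHERE dt BETWEEN ? AND ?
      AND kod IN ({})
""".format(placeholders)
sales = pd.read_sql(sql_sales, conn, params=[date1, date2] + clients)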

Why are my SQL query parameters not returning proper values?

I'm trying to create SQL queries for a large list of records (>42 million) to insert into a remote database. Right now I'm building queries in the format INSERT INTO tablename (columnnames) VALUES (values).
tablename, columnnames, and values are all of varying length, so I'm generating a number of placeholders equal to the number of values required.
The result is that I have a string called sqlcommand that looks like INSERT INTO ColName (?,?,?) VALUES (?,?,?); and a list of parameters that looks like ([Name1, Name2, Name3, Val1, Val2, Val3]).
When I try to execute the query as db.execute(sqlcommand, params), I get errors indicating that I'm trying to insert into columns "@P1", "@P2", "@P3", et cetera. Why aren't the values from my list being translated properly? Where is it getting "@P1" from? I know I don't have a column of that name, and as far as I can tell I'm not referencing a column of that name, yet the execute method is still trying to use it.
UPDATE: As per request, the full code is below, modified to avoid anything that might be private. The end result of this is to move data, row by row, from a sqlite3 db file to an AWS SQL Server.
import sqlite3
from datetime import datetime

import pyodbc

newDB = pyodbc.connect(newDataBase)
oldDB = sqlite3.connect(oldDatabase)

tables = oldDB.execute("SELECT * FROM sqlite_master WHERE type='table';").fetchall()
t0 = datetime.now()
for table in tables:
    print('Parsing:', str(table[1]))
    t1 = datetime.now()
    colInfo = oldDB.execute('PRAGMA table_info(' + table[1] + ');').fetchall()
    cols = list()
    cph = ""
    i = 0
    for col in colInfo:
        cph += "?,"
        cols.append(str(col[1]))
    rowCount = oldDB.execute("SELECT COUNT(*) FROM " + table[1] + " ;").fetchall()
    count = 0
    while count <= int(rowCount[0][0]):
        params = list()
        params.append(cols)
        count += 1
        row = oldDB.execute("SELECT * FROM " + table[1] + " LIMIT 1;").fetchone()
        ph = ""
        for val in row:
            ph += "?,"
            params = params.append(str(val))
        ph = ph[:-1]
        cph = cph[:-1]
        print(str(table[1]))
        sqlcommand = "INSERT INTO " + str(table[1]) + " (" + cph + ") VALUES (" + ph + ");"
        print(sqlcommand)
        print(params)
        newDB.execute(sqlcommand, params)
        sqlcommand = "DELETE FROM ? WHERE ? = ?;"
        oldDB.execute(sqlcommand, (str(table[1]), cols[0], vals[0],))
        newDB.commit()
Unbeknownst to me, column names can't be passed as parameters. Panagiotis Kanavos answered this in a comment. I guess I'll have to figure out a different way to generate the queries. Thank you all very much, I appreciate it.
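For anyone landing here later, a minimal sketch of that different way, assuming the SQLite-to-SQL-Server copy from the question (the file name and connection string are hypothetical): quote the identifiers into the SQL string after reading them from sqlite_master, and reserve the ? placeholders for the values only.

import sqlite3

import pyodbc

oldDB = sqlite3.connect('old.db')              # hypothetical source file
newDB = pyodbc.connect('<connection string>')  # hypothetical target

tables = oldDB.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
for (name,) in tables:
    cols = [c[1] for c in oldDB.execute("PRAGMA table_info('{}')".format(name))]
    col_list = ", ".join("[{}]".format(c) for c in cols)  # identifiers go in the string
    values = ", ".join("?" for _ in cols)                 # placeholders carry the values
    insert = "INSERT INTO [{}] ({}) VALUES ({});".format(name, col_list, values)
    rows = oldDB.execute("SELECT * FROM [{}];".format(name)).fetchall()
    newDB.cursor().executemany(insert, rows)
newDB.commit()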

How to make an efficient query for extracting entries of all days in a database in sets?

I have a database that includes 440 days of several series sampled every 5 seconds, with some missing data.
I want to calculate the daily average. The way I am doing it now is to issue 440 queries and do the averaging afterward. But this is very time-consuming, since every query searches the whole database for the relevant entries. I imagine there must be a more efficient way of doing this.
I am doing this in Python, and I am just learning SQL. Here's the query section of my code:
time_cur = date_begin
Data = numpy.zeros(shape=(N, NoFields - 1))
X = []
nN = 0
while time_cur < date_end:
    X.append(time_cur)
    cur = con.cursor()
    cur.execute("SELECT * FROM os_table \
                 WHERE EXTRACT(year from datetime_)=%s \
                 AND EXTRACT(month from datetime_)=%s \
                 AND EXTRACT(day from datetime_)=%s",
                (time_cur.year, time_cur.month, time_cur.day))
    Y = numpy.array([0] * (NoFields - 1))
    n = 0.0
    while True:
        n = n + 1
        row = cur.fetchone()
        if row is None:
            break
        Y = Y + numpy.array(row[1:])
    Data[nN][:] = Y / n
    nN = nN + 1
    time_cur = time_cur + datetime.timedelta(days=1)
And my data looks like this:
datetime_,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14
2012-11-13-00:07:53,42,30,0,0,1,9594,30,218,1,4556,42,1482,42
2012-11-13-00:07:58,70,55,0,0,2,23252,55,414,2,2358,70,3074,70
2012-11-13-00:08:03,44,32,0,0,0,11038,32,0,0,5307,44,1896,44
2012-11-13-00:08:08,36,26,0,0,0,26678,26,0,0,12842,36,1141,36
2012-11-13-00:08:13,33,26,0,0,0,6590,26,0,0,3521,33,851,33
I appreciate your suggestions.
Thanks
Iman
I don't know the numpy part of your code, so I don't understand exactly what you are averaging; if you show your table and the logic used to get the average, the query can be adjusted.
But this is how to get a daily average for a single column:
import psycopg2

conn = psycopg2.connect('host=localhost4 port=5432 dbname=cpn')
cursor = conn.cursor()
cursor.execute('''
    select
        datetime_::date as day,
        avg(c1) as c1_average,
        avg(c2) as c2_average
    from os_table
    where datetime_ between %s and %s
    group by 1
    order by 1
''', (time_cur, date_end))
rs = cursor.fetchall()
conn.close()

for day in rs:
    print(day[0], day[1], day[2])
This answer uses SQL Server syntax; I am not sure how different PostgreSQL is, but it should be fairly similar. You may find that things like the DATEADD, DATEDIFF, and CONVERT statements differ (almost certainly the CONVERT statement, but I am only using it to build a report name, so it is not vital). You should be able to follow the theory of this even if the code doesn't run in PostgreSQL without tweaking.
First, create a reports table (you will use this to link to the actual table you want to report on):
CREATE TABLE Report_Periods (
    report_name       VARCHAR(30) NOT NULL PRIMARY KEY,
    report_start_date DATETIME    NOT NULL,
    report_end_date   DATETIME    NOT NULL,
    CONSTRAINT date_ordering
        CHECK (report_start_date <= report_end_date)
)
Next, populate the reports table with the dates you need to report on. There are many ways to do this; the method I've chosen here only uses the days present in your data, but you could populate it with all dates you are ever likely to use, so you only have to do it once.
INSERT INTO Report_Periods (report_name, report_start_date, report_end_date)
SELECT CONVERT(VARCHAR, [DatePartOnly], 0) AS DateName,
       [DatePartOnly] AS StartDate,
       DATEADD(ms, -3, DATEADD(dd, 1, [DatePartOnly])) AS EndDate
FROM (SELECT DISTINCT DATEADD(DD, DATEDIFF(DD, 0, datetime_), 0) AS [DatePartOnly]
      FROM os_table) AS M
Note that in SQL Server the smallest time increment a DATETIME can store is about 3 milliseconds, so the statement above adds 1 day and then subtracts 3 milliseconds to create start and end datetimes covering a whole day. Again, PostgreSQL may have different values.
This means you can simply join the reports table back to your os_table to get averages, counts, etc. very simply:
SELECT AVG(value) AS AvgValue, COUNT(value) AS NumValues, R.report_name
FROM os_table AS T
JOIN Report_Periods AS R
  ON T.datetime_ >= R.report_start_date
 AND T.datetime_ <= R.report_end_date
GROUP BY R.report_name
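Since the question's code uses psycopg2-style %s placeholders against PostgreSQL, here is a hedged translation of the same idea: PostgreSQL can build the report periods inline with generate_series, so no helper table (and no 3-millisecond adjustment) is needed. The connection details and date range below are hypothetical placeholders:

import psycopg2

conn = psycopg2.connect('host=localhost port=5432 dbname=cpn')  # hypothetical
cur = conn.cursor()
cur.execute('''
    SELECT r.report_day::date AS report_name,
           AVG(t.c1)          AS avg_value,
           COUNT(t.c1)        AS num_values
    FROM generate_series(%s::date, %s::date, interval '1 day') AS r(report_day)
    JOIN os_table AS t
      ON t.datetime_ >= r.report_day
     AND t.datetime_ <  r.report_day + interval '1 day'
    GROUP BY r.report_day
    ORDER BY r.report_day
''', ('2012-11-13', '2014-01-26'))  # hypothetical 440-day range
for report_name, avg_value, num_values in cur.fetchall():
    print(report_name, avg_value, num_values)
conn.close()

The half-open join (>= the day, < the next day) sidesteps the end-of-day precision issue entirely, which is why no equivalent of the DATEADD(ms, -3, ...) trick appears here.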
