I am trying to grab one year's worth of currency data from the TraderMade Python SDK and store it in a SQLite database. I have already created the DB with the column headings I need (date, open, high, low, close).
I have written the following code to get and store the data:
conn = db.connect("MarketData.db")
c = conn.cursor()

def data():
    tm.set_rest_api_key([MY API KEY])
    request = tm.timeseries(
        currency='EURUSD',
        start="2022-01-01",
        end="2022-12-31",
        interval="daily",
        fields=["open", "high", "low", "close"]
    )
    print(request)
    stmt = '''INSERT INTO eurusd VALUES (?,?,?,?,?), (date, open, high, low, close)'''
    c.executemany(stmt, request)
    return

data()
conn.commit()
print('complete')
The data is coming back from TraderMade fine, as 'request' prints correctly.
However, I am getting an error when I run the code.
I have a date column in my DB. Does anyone know what is wrong with my code? My only thought is that the data is not being pulled in the correct format to store, but I'm not sure why, since when 'request' prints it has 5 columns, all with the correct column headings.
The column names should come before the VALUES clause:
stmt = '''INSERT INTO eurusd (date, open, high, low, close) VALUES (?,?,?,?,?)'''
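For reference, a minimal sketch of the corrected insert, assuming the rows arrive as (date, open, high, low, close) tuples; if the TraderMade SDK returns a pandas DataFrame instead, convert it to tuples first. The sample rows here are placeholders, and the eurusd table is assumed to exist already.

import sqlite3

conn = sqlite3.connect("MarketData.db")
c = conn.cursor()

stmt = "INSERT INTO eurusd (date, open, high, low, close) VALUES (?,?,?,?,?)"

# executemany expects an iterable of parameter tuples, one per row;
# these placeholder rows stand in for the SDK response
rows = [
    ("2022-01-03", 1.1372, 1.1379, 1.1290, 1.1297),
    ("2022-01-04", 1.1297, 1.1324, 1.1272, 1.1284),
]
c.executemany(stmt, rows)
conn.commit()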
I am trying to run a query repeatedly for all dates in a date range and collect the results into a pandas DataFrame on each iteration.
I established a connection (pyodbc) and created a list of dates I would like to run through the SQL query and aggregate into a DataFrame. I confirmed that the dates are a list.
link = pyodbc.connect( Connection Details )
date = [d.strftime('%Y-%m-%d') for d in pd.date_range('2020-10-01','2020-10-02')]
type(date)
I created an empty DF to collect the results for each iteration of the SQL query and checked the structure.
empty = pd.DataFrame(columns = ['Date', 'Balance'])
empty
I have the query set up like so:
sql = """
Select dt as "Date", sum(BAL)/1000 as "Balance"
From sales as bal
where bal.item IN (1,2,3,4)
AND bal.dt = '{}'
group by "Date";
""".format(day)
I tried the following for loop in the hopes of aggregating the results of each query execution into the empty df, but I get a blank df.
for day in date:
    a = pd.read_sql_query(sql, link)
    empty.append(a)
Any ideas whether the issue is related to the SQL setup and/or the for loop? Is there a better, more efficient way to tackle this?
Avoid the loop and run a single SQL query by adding the date as a GROUP BY column, passing start and end dates as parameters for filtering. Also, use the preferred parameterization method, which pandas.read_sql supports, instead of string formatting:
# PREPARED STATEMENT WITH ? PLACEHOLDERS
sql = """SELECT dt AS "Date"
              , SUM(BAL)/1000 AS "Balance"
         FROM sales
         WHERE item IN (1,2,3,4)
           AND dt BETWEEN ? AND ?
         GROUP BY dt;
      """

# BIND PARAMS TO QUERY, RETURN SINGLE DATA FRAME
df = pd.read_sql(sql, link, params=['2020-10-01', '2020-10-02'])
It looks like you didn't define the day variable when you generated sql.
This may help:
def sql_gen(day):
    sql = """
    Select dt as "Date", sum(BAL)/1000 as "Balance"
    From sales as bal
    where bal.item IN (1,2,3,4)
    AND bal.dt = '{}'
    group by "Date";
    """.format(day)
    return sql

for day in date:
    a = pd.read_sql_query(sql_gen(day), link)
    empty = empty.append(a)  # DataFrame.append returns a new frame, so reassign
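If you do keep the loop, note that DataFrame.append returns a new frame rather than modifying in place (and it has been removed in recent pandas). A more idiomatic pattern, assuming the link connection and date list from the question, is to collect the per-day frames and concatenate once:

frames = []
for day in date:
    frames.append(pd.read_sql_query(sql_gen(day), link))

# a single concat is cheaper than growing a DataFrame inside the loop
result = pd.concat(frames, ignore_index=True)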
I need to read from a SQL Server database using these parameters:
a period of time from an uploaded DataFrame (the date of the order and the date one month later)
client IDs from the same DataFrame
So I have something like this:
sql_sales = """
SELECT
dt,
clientID,
cost
WHERE
dt between %(date1)s AND %(date2)s
AND kod in %(client)s
"""
And I have a df with these columns:
clientsID
date of order
date after month
I can pass a list of clients, but the code needs to query the database with several lists of parameters (two of them forming the period).
sales = sales.append(pd.read_sql(sql_sales, conn, params={'client': df['clientsID'].tolist()}))
The way I got something similar to work in the past was to put {} placeholders in the query and then use .format with the parameters listed in order (note that string values such as dates need quotes around the placeholder). That way you don't need the params argument. Finally, if you are using IN with SQL, then in Python you need to create a tuple from the client list. For the line dt between '{}' AND '{}', you may also be able to do dt between ? AND ?.
client = tuple(df['clientsID'].tolist())
sql_sales = """
SELECT
dt,
clientID,
cost
WHERE
dt between '{}' AND '{}'
AND kod in {}
""".format(date1,date2,client)
sales = sales.append(pd.read_sql(sql_sales, conn))
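For completeness, a parameterized variant is also possible without string formatting, by generating one ? placeholder per client ID. This is only a sketch: it assumes a pyodbc connection conn and a sales table with the dt, clientID, cost, and kod columns implied by the question (the original query is missing its FROM clause).

clients = df['clientsID'].tolist()
placeholders = ','.join('?' * len(clients))  # e.g. "?,?,?"

sql_sales = """
SELECT dt, clientID, cost
FROM sales
WHERE dt BETWEEN ? AND ?
AND kod IN ({})
""".format(placeholders)

# bind the two dates first, then every client ID
sales = pd.read_sql(sql_sales, conn, params=[date1, date2] + clients)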
I'm trying to create SQL queries for a large list of records (>42 million) to insert into a remote database. Right now I'm building queries in the format INSERT INTO tablename (columnnames) VALUES (values).
tablename, columnnames, and values are all of varying length, so I'm generating a number of placeholders equal to the number of values required.
The result is that I have a string called sqlcommand that looks like INSERT INTO ColName (?,?,?) VALUES (?,?,?); and a list of parameters that looks like ([Name1, Name2, Name3, Val1, Val2, Val3]).
When I try to execute the query as db.execute(sqlcommand, params) I get errors indicating I'm trying to insert into columns "#P1", "#P2", "#P3", et cetera. Why aren't the values from my list properly translating? Where is it getting "#P1" from? I know I don't have a column of that name, and as far as I can tell I'm not referencing a column of that name, yet the execute method is still trying to use it.
UPDATE: As per request, the full code is below, modified to avoid anything that might be private. The end result of this is to move data, row by row, from an sqlite3 db file to an AWS SQL server.
newDB = pyodbc.connect(newDataBase)
oldDB = sqlite3.connect(oldDatabase)
tables = oldDB.execute("SELECT * FROM sqlite_master WHERE type='table';").fetchall()
t0 = datetime.now()
for table in tables:
    print('Parsing:', str(table[1]))
    t1 = datetime.now()
    colInfo = oldDB.execute('PRAGMA table_info('+table[1]+');').fetchall()
    cols = list()
    cph = ""
    i = 0
    for col in colInfo:
        cph += "?,"
        cols.append(str(col[1]))
    rowCount = oldDB.execute("SELECT COUNT(*) FROM "+table[1]+" ;").fetchall()
    count = 0
    while count <= int(rowCount[0][0]):
        params = list()
        params.append(cols)
        count += 1
        row = oldDB.execute("SELECT * FROM "+table[1]+" LIMIT 1;").fetchone()
        ph = ""
        for val in row:
            ph += "?,"
            params = params.append(str(val))
        ph = ph[:-1]
        cph = cph[:-1]
        print(str(table[1]))
        sqlcommand = "INSERT INTO "+str(table[1])+" ("+cph+") VALUES ("+ph+");"
        print(sqlcommand)
        print(params)
        newDB.execute(sqlcommand, params)
        sqlcommand = "DELETE FROM ? WHERE ? = ?;"
        oldDB.execute(sqlcommand, (str(table[1]), cols[0], vals[0],))
        newDB.commit()
Unbeknownst to me, column names can't be passed as parameters. Panagiotis Kanavos answered this in a comment. I guess I'll have to figure out a different way to generate the queries. Thank you all very much, I appreciate it.
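For anyone hitting the same wall, here is a minimal sketch of the workaround: identifiers (table and column names) have to be baked into the SQL text, and only the values stay as ? parameters. The table and column names below are hypothetical placeholders, and newDB is the pyodbc connection from the question.

# Hypothetical names for illustration; in the code above these come
# from sqlite_master and PRAGMA table_info
table_name = "my_table"
cols = ["colA", "colB", "colC"]
row = ("a", "b", "c")

# Identifiers go into the SQL string itself...
col_list = ", ".join(cols)
placeholders = ", ".join("?" * len(cols))
sqlcommand = "INSERT INTO " + table_name + " (" + col_list + ") VALUES (" + placeholders + ");"

# ...and only the values are bound as parameters
newDB.execute(sqlcommand, row)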
I have a database that includes 440 days of several series with a sampling time of 5 seconds. There is also missing data.
I want to calculate the daily average. The way I am doing it now, I run 440 queries and do the averaging afterward. This is very time consuming, since for every query the whole database is searched for related entries. I imagine there must be a more efficient way of doing this.
I am doing this in Python, and I am just learning SQL. Here's the query section of my code:
time_cur = date_begin
Data = numpy.zeros(shape=(N, NoFields - 1))
X = []
nN = 0
while time_cur < date_end:
    X.append(time_cur)
    cur = con.cursor()
    cur.execute("SELECT * FROM os_table \
                 WHERE EXTRACT(year from datetime_)=%s \
                 AND EXTRACT(month from datetime_)=%s \
                 AND EXTRACT(day from datetime_)=%s",
                (time_cur.year, time_cur.month, time_cur.day))
    Y = numpy.array([0]*(NoFields-1))
    n = 0.0
    while True:
        n = n + 1
        row = cur.fetchone()
        if row == None:
            break
        Y = Y + numpy.array(row[1:])
    Data[nN][:] = Y/n
    nN = nN + 1
    time_cur = time_cur + datetime.timedelta(days=1)
And, my data looks like this:
datetime_,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14
2012-11-13-00:07:53,42,30,0,0,1,9594,30,218,1,4556,42,1482,42
2012-11-13-00:07:58,70,55,0,0,2,23252,55,414,2,2358,70,3074,70
2012-11-13-00:08:03,44,32,0,0,0,11038,32,0,0,5307,44,1896,44
2012-11-13-00:08:08,36,26,0,0,0,26678,26,0,0,12842,36,1141,36
2012-11-13-00:08:13,33,26,0,0,0,6590,26,0,0,3521,33,851,33
I appreciate your suggestions. Thanks, Iman
I don't know the numpy code, so I don't understand exactly what you are averaging. If you show your table and the logic to get the average, I can be more specific...
But this is how to get a daily average per column:
import psycopg2

conn = psycopg2.connect('host=localhost4 port=5432 dbname=cpn')
cursor = conn.cursor()
cursor.execute('''
    select
        datetime_::date as day,
        avg(c1) as c1_average,
        avg(c2) as c2_average
    from os_table
    where datetime_ between %s and %s
    group by 1
    order by 1
''', (date_begin, date_end))
rs = cursor.fetchall()
conn.close()
for day in rs:
    print(day[0], day[1], day[2])
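If you need the averages for all 14 columns in the sample data, you can generate the select list rather than typing each avg() by hand. This is a sketch, assuming the c1..c14 columns from the question's data and an open psycopg2 cursor as above:

# build "avg(c1) as c1_average, ..., avg(c14) as c14_average";
# safe to format in because the names are generated, not user input
col_list = ', '.join('avg(c{0}) as c{0}_average'.format(i) for i in range(1, 15))
query = '''
    select datetime_::date as day, {}
    from os_table
    where datetime_ between %s and %s
    group by 1
    order by 1
'''.format(col_list)
cursor.execute(query, (date_begin, date_end))
rs = cursor.fetchall()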
This answer uses SQL Server syntax; I am not sure how different PostgreSQL is, but it should be fairly similar. You may find that things like the DATEADD, DATEDIFF and CONVERT statements differ (actually, almost certainly the CONVERT statement: just convert the date to a varchar instead; I am only using it as a report name, so it is not vital). You should be able to follow the theory of this even if the code doesn't run in PostgreSQL without tweaking.
First, create a reports table (you will use this to link to the actual table you want to report on):
CREATE TABLE Report_Periods (
    report_name       VARCHAR(30) NOT NULL PRIMARY KEY,
    report_start_date DATETIME    NOT NULL,
    report_end_date   DATETIME    NOT NULL,
    CONSTRAINT date_ordering
        CHECK (report_start_date <= report_end_date)
)
Next, populate the report table with the dates you need to report on. There are many ways to do this; the method I've chosen here will only use the days you need, but you could create this with all dates you are ever likely to use, so you only have to do it once.
INSERT INTO Report_Periods (report_name, report_start_date, report_end_date)
SELECT CONVERT(VARCHAR, [DatePartOnly], 0) AS DateName,
[DatePartOnly] AS StartDate,
DATEADD(ms, -3, DATEADD(dd,1,[DatePartOnly])) AS EndDate
FROM ( SELECT DISTINCT DATEADD(DD, DATEDIFF(DD, 0, datetime_), 0) AS [DatePartOnly]
FROM os_table ) AS M
Note that in SQL Server the DATETIME type is only accurate to about 3 milliseconds, so the statement above adds 1 day and then subtracts 3 milliseconds to create start and end datetimes spanning a whole day. Again, PostgreSQL may have different values.
This means you can simply join the reports table back to your os_table to get averages, counts, etc.:
SELECT AVG(value) AS AvgValue, COUNT(value) AS NumValues, R.report_name
FROM os_table AS T
JOIN Report_Periods AS R ON T.datetime_>= R.report_start_date AND T.datetime_<= R.report_end_date
GROUP BY R.report_name
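If you are driving this from Python as in the question, the final report query can be read straight into a pandas DataFrame. This is a sketch; the connection string is a placeholder you would replace with your own details:

import pandas as pd
import pyodbc

# Placeholder connection details
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes')

sql = """
    SELECT AVG(value) AS AvgValue, COUNT(value) AS NumValues, R.report_name
    FROM os_table AS T
    JOIN Report_Periods AS R
      ON T.datetime_ >= R.report_start_date
     AND T.datetime_ <= R.report_end_date
    GROUP BY R.report_name
"""
df = pd.read_sql(sql, conn)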