I am running a query that I plan to reuse many times. However, the job name ('my-job1a') has to be different on every run, so I was planning to derive it from the current date and time. Does anybody know how to implement this?
from google.cloud import bigquery
client = bigquery.Client('dataworks-356fa')
query = query
dataset = client.dataset('FirebaseArchive')
table = dataset.table(name='test1')
tbl = dataset.table(name='test12')
job = client.run_async_query('my-job1a', query)
job.destination = tbl
job.write_disposition= 'WRITE_TRUNCATE'
job.begin()
I believe "my-job1a" is a constant string, and you want a different string for each new query.
import datetime
# Replace the constant "my-job1a" with "my-job1a-" plus the current timestamp.
# BigQuery job IDs may contain only letters, digits, underscores and dashes, so the format avoids spaces and colons.
job = client.run_async_query("my-job1a-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"), query)
This name changes every second. If you need finer resolution, add %f (microseconds) to the strftime format. If you don't want such a long string, shorten the strftime format as you see fit.
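A minimal sketch of the idea above, assuming the job name only has to be unique and limited to letters, digits, underscores and dashes. The helper make_job_name and the pattern JOB_ID_PATTERN are hypothetical names for illustration, not part of the BigQuery client.

```python
import re
import uuid
from datetime import datetime, timezone

# Assumed rule for valid job names: letters, digits, underscores and dashes only
JOB_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def make_job_name(prefix="my-job1a"):
    """Hypothetical helper: build a job name that differs on every call."""
    # UTC timestamp down to microseconds, written with digits and dashes only
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S-%f")
    # Short random suffix in case two calls land on the same microsecond
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{stamp}-{suffix}"


name = make_job_name()
print(name)
```

The result could then be passed as the first argument in place of the constant string. The random suffix is optional; it only matters if several jobs may start at the same instant.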
I'm new to Python and I'd like to move all my current reporting to Python. My reports cover date ranges, so most of their SQL queries include a "Start_Date" and an "End_Date". I have been looking for a way to write the same thing in Python. If anyone has done this before, please help and share. My code is as follows:
import pandas as pd
import numpy as np
import cx_Oracle
import warnings
from datetime import date
from datetime import datetime as dtt
connstr = 'UN/PW#dbpath/DB' # this is hidden due to security
conn = cx_Oracle.connect(connstr)
today=date.today()
start_date = input("Enter start_date in MM/DD/YYYY format :")
month, day, year = map(int, start_date.split('/'))
end_date = input("Enter end_date in MM/DD/YYYY format :")
month, day, year = map(int, end_date.split('/'))
# the user is prompted to enter the start_date and the end_date manually
print (start_date)
print (end_date)
05/01/2021
05/31/2021
df=pd.read_sql_query("""select pr_no
, pr_task_no
, to_date(to_char(act_complete_date_time,'mm/dd/yyyy'),'mm/dd/yyyy') as act_complete_date_time
from pr_task
where pr_task_no = 100
and act_complete_date_time between to_date({start_date},'mm/dd/yyyy') and to_date({end_date},'mm/dd/yyyy')
""", conn)
The error that I'm getting is: DatabaseError: Execution failed on sql
': ORA-00936: missing expression
So Oracle does not recognize the date entered and does not run the query.
I have made multiple attempts to format the date so that the database recognizes it.
Can someone help me get this step working?
Thank you in advance!
I think you are missing an "f" before the query string, and the placeholders need single quotes so that Oracle receives them as string literals:
df=pd.read_sql_query(f"""select pr_no
, pr_task_no
, to_date(to_char(act_complete_date_time,'mm/dd/yyyy'),'mm/dd/yyyy') as act_complete_date_time
from pr_task
where pr_task_no = 100
and act_complete_date_time between to_date('{start_date}','mm/dd/yyyy') and to_date('{end_date}','mm/dd/yyyy')
""", conn)
Without the "f", you are sending the literal text {start_date} to the database as part of the query, and it is not replaced with the value of the variable of the same name. Without the quotes, Oracle would read 05/01/2021 as a division rather than a date string.
As a side note, this code is vulnerable if an untrusted user can set the dates: he or she could use SQL injection to alter your query (imagine if, instead of a date, they entered ;drop table pr_task).
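One way to avoid the injection risk is to validate the input as a date and pass it as a bind parameter instead of formatting it into the SQL text. The sketch below is only an illustration: an in-memory SQLite table stands in for Oracle's pr_task so that it runs anywhere, the column act_complete_date and the helper parse_us_date are made up for the example, and the dates are stored as ISO strings. As far as I know, cx_Oracle accepts the same :name placeholder style, and pandas.read_sql_query takes such a dictionary through its params argument.

```python
import sqlite3
from datetime import datetime

# Assumption: SQLite stands in for Oracle; table and data are invented for the demo
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table pr_task (pr_no integer, pr_task_no integer, act_complete_date text)"
)
conn.executemany(
    "insert into pr_task values (?, ?, ?)",
    [(1, 100, "2021-05-03"), (2, 100, "2021-06-10"), (3, 200, "2021-05-20")],
)


def parse_us_date(text):
    """Hypothetical helper: accept only MM/DD/YYYY, raise ValueError otherwise."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Named placeholders: the values travel separately from the SQL text
query = """
select pr_no, pr_task_no, act_complete_date
from pr_task
where pr_task_no = 100
  and act_complete_date between :start_date and :end_date
"""
params = {
    "start_date": parse_us_date("05/01/2021").isoformat(),
    "end_date": parse_us_date("05/31/2021").isoformat(),
}
rows = conn.execute(query, params).fetchall()
print(rows)
```

With this layout, an input such as ;drop table pr_task fails in parse_us_date before it ever reaches the database.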
I am using Python to connect to a SQL Server database and execute several 'select' queries that contain a date range written in a particular way. All these queries use the same date range, so instead of hard-coding it, I'd prefer to define it as a string and change it in one place only when needed.
So far, I have found that I can use the datetime module and the following logic to convert dates to strings:
from datetime import datetime
start_date = datetime(2020,1,1).strftime("%Y-%m-%d")
end_date = datetime(2020,1,31).strftime("%Y-%m-%d")
Example of the query:
select * from where xxx='yyy' and time between start_date and end_date
How can I make it work?
EDIT
my code:
import pandas as pd
import pyodbc
import sqlalchemy
from sqlalchemy import create_engine
from datetime import datetime
start_date = datetime(2020,1,1).strftime("%Y-%m-%d")
end_date = datetime(2020,1,31).strftime("%Y-%m-%d")
engine = create_engine("mssql+pyodbc://user:pwd#server/monitor2?driver=SQL+Server+Native+Client+11.0")
sql_query = """ SELECT TOP 1000
[mtime]
,[avgvalue]
FROM [monitor2].[dbo].[t_statistics_agg]
where place = 'Europe' and mtime between 'start_date' and 'end_date'
order by [mtime] asc;"""
df = pd.read_sql(sql_query, engine)
print(df)
Thank you all for your input, I have found the answer to make the query work. The variables should look like:
from datetime import date
start_date = date(2020, 1, 1)
end_date = date(2020, 1, 31)
and SQL query like:
sql_query = f""" SELECT TOP 1000
[mtime]
,[avgvalue]
FROM [monitor2].[dbo].[t_statistics_agg]
where place = 'Europe' and mtime between '{start_date}' and '{end_date}'
order by [mtime] asc;"""
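The reason this works is that an f-string calls str() on each value, and str() of a date object is its ISO form (YYYY-MM-DD). A small sketch of just that step, with where_clause as an illustrative variable name:

```python
from datetime import date

start_date = date(2020, 1, 1)
end_date = date(2020, 1, 31)

# The f-string inserts str(start_date) and str(end_date), i.e. the ISO dates
where_clause = f"mtime between '{start_date}' and '{end_date}'"
print(where_clause)

# Without the f prefix and braces, the names are sent as plain text
broken_clause = "mtime between 'start_date' and 'end_date'"
print(broken_clause)
```

Since the range is defined once as date objects, every query that needs it can reuse the same two variables.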
I have a Python 3 script that is supposed to run each morning. In it, I call some SQL. However, I'm getting an error message:
Error while connecting to PostgreSQL operator does not exist: date = integer
The SQL is based on the concatenation of a string:
ecom_dashboard_query = """
with
days_data as (
select
s.date,
s.user_type,
s.channel_grouping,
s.device_category,
sum(s.sessions) as sessions,
count(distinct s.dimension2) as daily_users,
sum(s.transactions) as transactions,
sum(s.transaction_revenue) as revenue
from ga_flagship_ecom.sessions s
where date = """ + run.start_date + """
group by 1,2,3,4
)
insert into flagship_reporting.ecom_dashboard
select *
from days_data;
"""
Here is the full error:
09:31:25 Error while connecting to PostgreSQL operator does not exist: date = integer
09:31:25 LINE 14: where date = 2020-01-19
09:31:25 ^
09:31:25 HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
I tried wrapping run.start_date within str like so: str(run.start_date) but I received the same error message.
I suspect it may have to do with the way I concatenate the SQL query string, but I am not sure.
The query runs fine in SQL directly with a hard coded date and no concatenation:
where date = '2020-01-19'
How can I get the query string to work correctly?
It's better to pass query parameters to the cursor.execute method. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
So instead of concatenating strings, pass run.start_date as the second argument of cursor.execute.
In your query, use %s in place of the concatenation:
where date = %s
group by 1,2,3,4
In your Python code, add the second argument to the execute call:
cur.execute(ecom_dashboard_query, (run.start_date,))
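To see why the concatenated version fails, it may help to look at what the database actually receives. The sketch below is an illustration only: it uses an in-memory SQLite table as a stand-in for PostgreSQL, so the placeholder is ? rather than %s, and the table contents are invented. SQLite silently returns no rows where PostgreSQL raises the "date = integer" error, but the cause is the same.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; one invented row of data
conn = sqlite3.connect(":memory:")
conn.execute("create table sessions (date text, sessions integer)")
conn.execute("insert into sessions values ('2020-01-19', 42)")

start_date = "2020-01-19"

# Concatenation puts 2020-01-19 into the SQL without quotes,
# so the database evaluates it as 2020 minus 1 minus 19
as_arithmetic = conn.execute("select " + start_date).fetchone()[0]
concatenated = conn.execute(
    "select sessions from sessions where date = " + start_date
).fetchall()

# A parameter is sent separately from the SQL text and stays a date string
parameterized = conn.execute(
    "select sessions from sessions where date = ?", (start_date,)
).fetchall()

print(as_arithmetic, concatenated, parameterized)
```

This also explains why wrapping the value in str() changed nothing: the value was already a string, and the missing piece was the quoting that the driver adds when it receives a parameter.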
Your statement is wrong:
where date = """ + run.start_date + """
This splices the date into the SQL without quotes, so PostgreSQL evaluates 2020-01-19 as integer arithmetic and then tries to compare a date with an integer, which is not possible. Parse run.start_date into a datetime first (note %Y for the four-digit year):
date_format = datetime.strptime(your_date_string, '%Y-%m-%d')
and then insert the converted date as a quoted literal:
where date = '{}'
Final code:
from datetime import datetime
date_format = datetime.strptime(your_date_string, '%Y-%m-%d')
ecom_dashboard_query = """
with
days_data as (
select
s.date,
s.user_type,
s.channel_grouping,
s.device_category,
sum(s.sessions) as sessions,
count(distinct s.dimension2) as daily_users,
sum(s.transactions) as transactions,
sum(s.transaction_revenue) as revenue
from ga_flagship_ecom.sessions s
where date = '{}'
group by 1,2,3,4
)
insert into flagship_reporting.ecom_dashboard
select *
from days_data;
""".format(date_format.date())
I am exploring ways to bring BigQuery data into Python, here is my code so far:
from google.cloud import bigquery
from pandas.io import gbq
client = bigquery.Client.from_service_account_json("path_to_my.json")
project_id = "my_project_name"
query_job = client.query("""
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
""")
results = query_job.result() # Waits for job to complete.
#for row in results:
# print("{}: {}".format(row.date, row.visits))
results_df = gbq.read_gbq(query_job,project_id=project_id)
The commented out lines: #for row in results:
print("{}: {}".format(row.date, row.visits))
return the correct results from my query, but they aren't usable in this form. As a next step I'd like to get them into a dataframe, but this code raises the error TypeError: Object of type 'QueryJob' is not JSON serializable.
Can anyone tell me what in my code causes this error, or perhaps suggest a better way to bring BigQuery data into a dataframe?
The read_gbq method expects the query as a str, not a QueryJob object.
Try running it like this instead:
query = """
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
"""
results_df = gbq.read_gbq(query, project_id=project_id, private_key='path_to_my.json')
I am trying to use Pandas and SQLAlchemy to run a query on a MySQL instance. In the actual query, there is a 'WHERE' clause referencing a specific date. I'd like to run this query separately for each date in a Python list, and append each date's dataframe iteratively to another master dataframe. My code right now looks like this (excluding SQLAlchemy engine creation):
dates = ['2016-01-12','2016-01-13','2016-01-14']
for day in dates:
    query="""SELECT * from schema.table WHERE date = '%s' """
    df = pd.read_sql_query(query,engine)
    frame.append(df)
My error is
/opt/rh/python27/root/usr/lib64/python2.7/site-packages/MySQLdb/cursors.pyc in execute(self, query, args)
157 query = query.encode(charset)
158 if args is not None:
--> 159 query = query % db.literal(args)
160 try:
161 r = self._query(query)
TypeError: not enough arguments for format string
I'm wondering what the best way to insert the string from the list into my query string is?
Use params to parameterize your query:
dates = ['2016-01-12', '2016-01-13', '2016-01-14']
query = """SELECT * from schema.table WHERE date = %s"""
for day in dates:
    df = pd.read_sql_query(query, engine, params=(day, ))
    frame.append(df)
Note that I've removed the quotes around the %s placeholder - the data type conversions would be handled by the database driver itself. It would put quotes implicitly if needed.
And, you can define the query before the loop once - no need to do it inside.
I also think that you may need to have a list of date or datetime objects instead of strings:
from datetime import date
dates = [
    date(year=2016, month=1, day=12),
    date(year=2016, month=1, day=13),
    date(year=2016, month=1, day=14),
]
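If the dates already exist as strings, they do not have to be retyped. A short sketch, assuming the strings are all in YYYY-MM-DD form (date_objects is just an illustrative name):

```python
from datetime import date, datetime

dates = ['2016-01-12', '2016-01-13', '2016-01-14']

# Parse each YYYY-MM-DD string and keep only the date part
date_objects = [datetime.strptime(d, "%Y-%m-%d").date() for d in dates]
print(date_objects)
```

The resulting list can be looped over in the same way as the original list of strings.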