I have a Postgres table called Records that contains several AppIds. I am writing a function to pull out the data for one AppId (ABC0123) and want to store the result in a dataframe df.
I created a variable AppId1 to hold my AppId (ABC0123) and passed it to the query.
The script ran without errors, but it did not create the dataframe.
def fun(AppId):
    AppId1=pd.read_sql_query(""" select "AppId" from "Records" where "AppId"='%s'""" %AppId,con=engine )
    query = """" SELECT AppId from "Records" where AppId='%s' """ % AppId1
    df=pd.read_sql_query(query,con=engine)
    return fun
fun('ABC0123')
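Change % AppId1 to % AppId: AppId1 is the dataframe returned by the first query, not the ID string, so that first lookup is not needed at all. Also return df instead of fun, and assign the result when you call the function. Note that Postgres identifiers take double quotes; single quotes create string literals.
def fun(AppId):
    # Quote the mixed-case identifiers with double quotes; the value goes in single quotes.
    query = """ SELECT "AppId" FROM "Records" WHERE "AppId"='%s' """ % AppId
    df = pd.read_sql_query(query, con=engine)
    return df
df = fun('ABC0123')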
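Formatting the ID into the SQL text with % works, but it breaks on values containing quotes and is open to SQL injection. A minimal sketch of the alternative, bound parameters, is shown here using the standard-library sqlite3 module as a stand-in for Postgres; the in-memory table and the fetch_records name are assumptions made for illustration only.

```python
import sqlite3

def fetch_records(conn, app_id):
    """Return all rows for one AppId, passing the value as a bound parameter."""
    # sqlite3 uses "?" placeholders; psycopg2 (Postgres) uses "%s" instead.
    query = 'SELECT "AppId" FROM "Records" WHERE "AppId" = ?'
    return conn.execute(query, (app_id,)).fetchall()

# Hypothetical in-memory table standing in for the Postgres "Records" table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Records" ("AppId" TEXT)')
conn.executemany(
    'INSERT INTO "Records" VALUES (?)',
    [("ABC0123",), ("XYZ9999",), ("ABC0123",)],
)

rows = fetch_records(conn, "ABC0123")
```

pandas accepts the same idea through the params argument of pd.read_sql_query, for example pd.read_sql_query(query, con=engine, params=(AppId,)); the placeholder syntax depends on the database driver in use.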
I have been trying to apply f-string formatting to the colname parameter inside the SQL query for a script I am building, but I keep getting a parse exception.
def expect_primary_key_have_relevant_foreign_key(spark_df1, spark_df2, colname):
    '''
    Check that all the primary keys have a relevant foreign key
    '''
    # Create Temporary View
    spark_df1.createOrReplaceTempView("spark_df1")
    spark_df2.createOrReplaceTempView("spark_df2")
    # Wrap Query in spark.sql
    result = spark.sql("""
        select df1.*
        from spark_df1 df1
        left join
        spark_df2 df2
        f"on trim(upper(df1.{colname})) = trim(upper(df2.{colname}))"
        f"where df2.{colname} is null"
        """)
    if result == 0:
        print("Validation Passed!")
    else:
        print("Validation Failed!")
    return result
I found the solution: the f prefix goes before the opening triple quotes, so that the whole string is an f-string:
# Wrap Query in spark.sql
result = spark.sql(f"""
    select df1.*
    from spark_df1 df1
    left join
    spark_df2 df2
    on trim(upper(df1.{colname})) = trim(upper(df2.{colname}))
    where df2.{colname} is null
    """)
The following API function works, but I would like to parameterize the query name so that I don't have to use if...else. Instead, I would like to take the parameter from the query URL, concatenate it to the query variable name, and execute that query.
I would like to build the name "qry_" + <reportId> and use it to refer to the query variables qry_R01, qry_R02, or qry_R03. Is this possible?
def get_report():
    reportId = request.args.get('reportId', '')
    qry_R01 = """
        SELECT
            column1,
            column2,
            column3
        FROM
            table1
    """
    qry_R02 = """
        SELECT
            column1,
            column2,
            column3
        FROM
            table2
    """
    qry_R03 = """
        SELECT
            column1,
            column2,
            column3
        FROM
            table3
    """
    db = OracleDB('DB_RPT')
    if (reportId == 'R01'):
        db.cursor.execute(qry_R01)
    elif (reportId == 'R02'):
        db.cursor.execute(qry_R02)
    elif (reportId == 'R03'):
        db.cursor.execute(qry_R03)
    json_data = db.render_json_data('json_arr')
    db.connection.close()
    return json_data
It seems that what you need in this case is a mapping from reportId to the table, since the rest of the query is identical. The solution below uses a dictionary and str.format():
def get_report():
    reportId = request.args.get('reportId', '')
    # this maps the valid report IDs to their table names
    reportTableMap = {
        'R01': 'table1',
        'R02': 'table2',
        'R03': 'table3',
    }
    # ensure this report ID is valid, else we'll end up with a KeyError later on
    if reportId not in reportTableMap:
        return 'Error: invalid report'
    baseQuery = '''
        SELECT
            column1,
            column2,
            column3
        FROM {table}
    '''
    db = OracleDB('DB_RPT')
    db.cursor.execute(baseQuery.format(table=reportTableMap[reportId]))
    json_data = db.render_json_data('json_arr')
    db.connection.close()
    return json_data
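This solution only works for fairly simple cases, where the queries differ in something like the table name. It is safe only because the table name comes from the dictionary and never directly from the request; formatting request values straight into the SQL text would open the door to a SQL injection attack. For values (as opposed to table or column names), a better solution is to use prepared statements with bind parameters, but the exact code to do that depends on the database driver being used.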
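When the reports differ by more than the table name, the same dictionary idea can hold the complete query text, which is the closest equivalent to looking up a variable named "qry_" + reportId. A minimal sketch; the REPORT_QUERIES and query_for_report names are hypothetical, and the queries simply reuse the placeholders from the question.

```python
# Hypothetical report queries keyed by report ID (table names taken from the question).
REPORT_QUERIES = {
    "R01": "SELECT column1, column2, column3 FROM table1",
    "R02": "SELECT column1, column2, column3 FROM table2",
    "R03": "SELECT column1, column2, column3 FROM table3",
}

def query_for_report(report_id):
    """Return the SQL text for a known report ID, or None if the ID is unknown."""
    return REPORT_QUERIES.get(report_id)
```

Inside get_report() the lookup result would be checked for None before calling db.cursor.execute(), so an unknown reportId never reaches the database.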
I am trying to write a simple AWS Lambda function that gets a few rows from Amazon RDS (MySQL db) and returns them in JSON format.
If I append the object instance itself, I get an error saying that an object of type XXX is not JSON serializable. If I do something like the code below, I get only the latest entry from the db. (This differs from what is shown in https://hackersandslackers.com/create-a-rest-api-endpoint-using-aws-lambda/).
def save_events(event):
    result = []
    conn = pymysql.connect(rds_host,user=name,passwd=password,db=db_name,connect_timeout=5)
    with conn:
        cur = conn.cursor()
        cur.execute("select * from tblEmployees")
        rows = cur.fetchall()
        for row in rows:
            employee = Employee(row)
            data['Id'] = employee.id
            data['Name']= employee.name
            result.append(data)
    return result
def main(event, context):
    data = save_events(event)
    return {
        "StatusCode":200,
        "Employee": data
    }
I understand that the content of the variable 'data' changes at runtime and that this affects result.append(). I have 4 entries in the table tblEmployees. The output above contains 4 entries in the result, but all four are identical (and equal to the latest record in the db).
json.dumps() didn't work because the data is in unicode format. I've already tried .toJSON() and byteify(), and neither worked.
Any help?
You should create a new data dict on every iteration so that you don't overwrite the values you have already appended:
for row in rows:
    employee = Employee(row)
    data = dict(Id=employee.id, Name=employee.name)
    result.append(data)
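The reason all four entries looked identical is that appending a dict stores a reference to it, not a copy. A small self-contained sketch of the difference; the rows list is made-up sample data standing in for the tblEmployees result set.

```python
import json

# Hypothetical (id, name) rows standing in for the tblEmployees result set.
rows = [(1, "Ann"), (2, "Bob"), (3, "Cy")]

# Reusing one dict: every list entry is a reference to the same object,
# so each update is visible through all of them.
shared = {}
aliased = []
for emp_id, name in rows:
    shared["Id"] = emp_id
    shared["Name"] = name
    aliased.append(shared)

# Creating a new dict per row: each list entry keeps its own values.
distinct = [{"Id": emp_id, "Name": name} for emp_id, name in rows]

# Plain dicts of ints and strings serialize without a custom encoder.
payload = json.dumps(distinct)
```

Here aliased ends up holding the last row three times, while distinct holds one dict per row and can be passed directly to json.dumps().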
I am trying to write code that lets a user select specific columns from a sqlite database and load them into a pandas data frame. I am using a test database titled test_database.db with a table titled test. The table has three columns: id, value_one, and value_two. The function I am showing lives in a class that establishes a connection to the database; the user only needs to pass the table name and a list of the columns they would like to extract. For instance, in command-line sqlite I might type select value_one, value_two from test if I wanted to read only the columns value_one and value_two from the table test. Typed into the command line, this works. However, here I use Python to build the text string that is fed into pandas.read_sql_query(), and the method does not work. My code is shown below.
class ReadSQL:
    def __init__(self, database):
        self.database = database
        self.conn = sqlite3.connect(self.database)
        self.cur = self.conn.cursor()
    def query_columns_to_dataframe(table, columns):
        query = 'select '
        for i in range(len(columns)):
            query = query + columns[I] + ', '
        query = query[:-2] + ' from ' + table
        # print(query)
        df = pd.read_sql_query(query, self.conn)
        return
    def close_database()
        self.conn.close
        return
test = ReadSQL(test_database.db)
df = query_columns_to_dataframe('test', ['value_one', 'value_two'])
I assume my problem has something to do with the way query_columns_to_dataframe() pre-processes the information, because if I uncomment the print command in query_columns_to_dataframe() I get a text string that looks identical to what works when I type it directly into the command line. Any help is appreciated.
I mopped up a few mistakes in your code to produce this, which works. Note that I inadvertently changed the names of the fields in your test db.
import sqlite3
import pandas as pd
class ReadSQL:
    def __init__(self, database):
        self.database = database
        self.conn = sqlite3.connect(self.database)
        self.cur = self.conn.cursor()
    def query_columns_to_dataframe(self, table, columns):
        query = 'select '
        for i in range(len(columns)):
            query = query + columns[i] + ', '
        # drop the trailing ', ' before adding the from clause
        query = query[:-2] + ' from ' + table
        #~ print(query)
        df = pd.read_sql_query(query, self.conn)
        return df
    def close_database(self):
        # close() must be called; the bare attribute does nothing
        self.conn.close()
        return
test = ReadSQL('test_database.db')
df = test.query_columns_to_dataframe('test', ['value_1', 'value_2'])
print(df)
Output:
   value_1  value_2
0        2        3
Your code has several syntax errors and other issues:
The return in query_columns_to_dataframe should be return df. This is the primary reason why your code does not return anything.
self.cur is not used
Missing self parameter when declaring query_columns_to_dataframe
Missing colon at the end of the line def close_database()
Missing self parameter when declaring close_database
Missing parentheses here: self.conn.close
This df = query_columns_to_dataframe should be df = test.query_columns_to_dataframe
Fix these errors and your code should work.
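Since the table and column names are concatenated into the SQL text, it can help to check them against the database schema first. A minimal sketch of that idea using only sqlite3; the build_select helper is hypothetical, and the in-memory table mirrors the test table described in the question.

```python
import sqlite3

def build_select(conn, table, columns):
    """Build 'select <columns> from <table>' after checking names against the schema."""
    if not columns:
        raise ValueError("at least one column is required")
    # Only accept tables that actually exist in this database.
    known_tables = {row[0] for row in conn.execute(
        "select name from sqlite_master where type = 'table'")}
    if table not in known_tables:
        raise ValueError(f"unknown table: {table!r}")
    # PRAGMA table_info returns one row per column; the name is the second field.
    known_columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    unknown = [c for c in columns if c not in known_columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    # str.join replaces the loop plus query[:-2] trimming.
    return "select " + ", ".join(columns) + " from " + table

conn = sqlite3.connect(":memory:")
conn.execute("create table test (id integer, value_one integer, value_two integer)")
query = build_select(conn, "test", ["value_one", "value_two"])
```

The returned string can then be passed to pd.read_sql_query(query, conn) exactly as in the class method.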
I have a database table with several similar columns called Email1, Email2, ... I have created a function to store data in these columns. However, when called, the function is only able to store/update one particular column and not the others. I have this code:
self.query = ("Update registration set Email1 = %s where Username =%s")
I am using this same code inside a function, so every call updates the same column, Email1, and never the other columns. Is there any way to change the column name within the query so that the same function and the same query can be called again and again without retyping the query? Thanks!
You could try something like this:
def update_registration(email1, email2, username):
    # IFNULL keeps the current value of a column when None is passed for it
    query = "Update registration set Email1 = IFNULL(%s, Email1), Email2 = IFNULL(%s, Email2) where Username = %s"
    data = (email1, email2, username)
    ...
    cursor.execute(query, data)
    ...
Call the function step by step, like this:
update_registration('aa#example.com', None, 'Test')
update_registration(None, 'bb#example.com', 'Test')
As a result, the user Test ends up with:
email1 = aa#example.com and email2 = bb#example.com
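If the goal is literally to choose the column name at call time, another option is to check the name against a fixed set and format only that name into the query, while the values stay bound parameters. A hedged sketch using sqlite3 as a stand-in for MySQL; the update_email helper, the ALLOWED_COLUMNS set, and the sample addresses are assumptions for illustration.

```python
import sqlite3

# Columns the caller may update; anything else is rejected before building SQL.
ALLOWED_COLUMNS = {"Email1", "Email2"}

def update_email(conn, column, email, username):
    """Update one whitelisted email column for a user; values are bound parameters."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    # Only the validated column name is interpolated. sqlite3 uses "?" placeholders
    # where MySQL drivers such as pymysql use "%s".
    query = f"UPDATE registration SET {column} = ? WHERE Username = ?"
    conn.execute(query, (email, username))

# In-memory table standing in for the registration table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registration (Username TEXT, Email1 TEXT, Email2 TEXT)")
conn.execute("INSERT INTO registration VALUES ('Test', NULL, NULL)")

update_email(conn, "Email1", "aa@example.com", "Test")
update_email(conn, "Email2", "bb@example.com", "Test")
row = conn.execute(
    "SELECT Email1, Email2 FROM registration WHERE Username = 'Test'"
).fetchone()
```

Compared with the IFNULL version, this touches only the column named in the call, at the cost of one statement per column.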