Getting no such table error using pandas and sqldf - python

I am getting a sqlite3 error:
OperationalError: no such table: Bills
I first create my dataframes with pandas and then reference those dataframes in my query, which works fine:
import pandas as pd
from pandasql import sqldf
Bills = pd.read_csv("Bills.csv")
Accessorials = pd.read_csv("Accessorials.csv")
q = """
Select
CityStateLane,
Count(BillID) as BillsCount,
Sum(BilledAmount) as BillsSum,
Count(Distinct CarrierName) as NumberOfCarriers,
Avg(BilledAmount) as BillsAverage,
Avg(BilledWeight) as WeightAverage
From
Bills
Where
Direction = 'THIRD PARTY'
Group by
CityStateLane
Order by
BillsCount DESC
"""
topCityStateLane = sqldf(q)
I then create another dataframe with a second query, but this one raises the error saying Bills is not there, even though I used it successfully in the previous query:
q = """
SELECT
Bills.BillID as BillID,
A2.TotalAcc as TotalAcc
FROM
(SELECT
BillID_Value,
SUM(PaidAmount_Value) as "TotalAcc"
FROM
Accessorials
GROUP BY
BillID_Value
) AS A2,
Bills
WHERE
A2.BillID_Value = Bills.BillID
"""
temp = sqldf(q)
Thank you for taking the time to read this.

Are you trying to join Bills with the A2 subquery? You can't select columns from both of them without joining them in your FROM clause.
q = """
SELECT
Bills.BillID as BillID,
A2.TotalAcc as TotalAcc
FROM
(SELECT
BillID_Value,
SUM(PaidAmount_Value) as "TotalAcc"
FROM
Accessorials
GROUP BY
BillID_Value
) AS A2
join Bills
on A2.BillID_Value = Bills.BillID
"""
temp = sqldf(q)

) AS A2,
Bills
I think this is where your issue is. You're not referencing the Bills table in your FROM clause; you're referencing the result set of the subquery you wrote, under the alias A2. In other words, your FROM clause is pointing at the A2 'table', not Bills. As Qianbo Wang mentioned, if you want to return output from these two separate tables you will have to join them together.

Related

Python Sqlite: how to select non-existing records(rows) based on a column?

Hope everyone's doing well.
Database:
Value Date
---------------------------------
3000 2019-12-15
6000 2019-12-17
What I hope to return:
"Data:3000 on 2019-12-15"
"NO data on 2019-12-16" (no row exists for this Date)
"Data:6000 on 2019-12-17"
I don't know how to detect non-existing records (rows) based on a column.
Possible boilerplate code:
import sqlite3

db = sqlite3.connect("Database1.db")
cursor = db.cursor()
cursor.execute("""
SELECT * FROM Table1
WHERE Date >= "2019-12-15" and Date <= "2019-12-17"
""")
entry = cursor.fetchall()
for i in entry:
    if i is None:
        print("No entry found:", i)
    else:
        print("Entry found")
db.close()
Any help is much appreciated!
The general way you might handle this problem uses something called a calendar table, which is just a table containing all dates you want to see in your report. Consider the following query:
SELECT
d.dt,
t.Value
FROM
(
SELECT '2019-12-15' AS dt UNION ALL
SELECT '2019-12-16' UNION ALL
SELECT '2019-12-17'
) d
LEFT JOIN yourTable t
ON d.dt = t.Date
ORDER BY
d.dt;
In practice, if you had a long-term need to do this and/or had a large number of dates to cover, you might set up a bona fide calendar table in your SQLite database for this purpose. The above query is only intended as a proof of concept.
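To tie this back to the question's own code, here is a minimal sketch (assuming the Database1.db file and Table1 name from the question) that runs the calendar-table query through sqlite3 and prints the requested lines:

import sqlite3

# Minimal sketch: an inline calendar LEFT JOINed to Table1; a real calendar
# table would be used the same way.
db = sqlite3.connect("Database1.db")
cursor = db.cursor()
cursor.execute("""
SELECT d.dt, t.Value
FROM (SELECT '2019-12-15' AS dt
      UNION ALL SELECT '2019-12-16'
      UNION ALL SELECT '2019-12-17') d
LEFT JOIN Table1 t ON d.dt = t.Date
ORDER BY d.dt
""")
for dt, value in cursor.fetchall():
    if value is None:
        print("NO data on " + dt)
    else:
        print("Data:" + str(value) + " on " + dt)
db.close()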

how to delete the only the rows in postgres but not to drop table using pandas read_sql_query method?

I want to delete all the rows in a Postgres table (but not drop the table) and then insert new rows into it, and I wanted to use the pd.read_sql_query() method from pandas:
qry = 'delete from "table_name"'
pd.read_sql_query(qry, conection, **kwargs)
But it throws the error 'ResourceClosedError: This result object does not return rows. It has been closed automatically.'
I can half expect this, because the method should return an empty dataframe, but it does not return an empty dataframe, only the above error. Could you please help me resolve it?
I use MySQL, but the logic is the same:
Query 1: select all ids from your table.
Query 2: delete all of those ids.
As a result you have:
DELETE FROM table_name WHERE id IN (SELECT id FROM table_name)
This statement does not return anything; it just deletes all rows with the selected ids. I recommend running the delete with psycopg only, not pandas.
Then you need another query to get something from the db, like:
pd.read_sql_query("SELECT * FROM table_name", conection, **kwargs)
Probably (I do not use pandas to read from a db) in this case you'll get an empty dataframe with the column names.
Probably you can combine all the actions in the following way:
pd.read_sql_query('''Delete FROM table_name WHERE id IN (Select id FROM table_name); SELECT * FROM table_name''', conection, **kwargs)
Please try and share your results.
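Since the answer recommends doing the delete with psycopg rather than pandas, here is a hedged sketch of that split (the connection string and table_name are placeholders, not from the original post):

import pandas as pd
import psycopg2

# Placeholder connection details; adjust for your database.
conn = psycopg2.connect("dbname=mydb user=myuser password=mypw host=localhost")
with conn:  # commits the transaction on success
    with conn.cursor() as cur:
        cur.execute('DELETE FROM table_name')  # removes the rows, keeps the table
# The table still exists, so a plain SELECT now returns an empty DataFrame
# with the original column names.
df = pd.read_sql_query('SELECT * FROM table_name', conn)
conn.close()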
You can follow these steps:
Check 'row existence' in the table first.
Then delete the rows.
Example code:
check_row_query = "select exists(select * from tbl_name limit 1)"
check_exist = pd.read_sql_query(check_row_query, con)
if check_exist.exists[0]:
    delete_query = 'DELETE FROM tbl_name WHERE condition(s)'
    con.execute(delete_query)  # delete the rows using a SQLAlchemy connection
    print('Deleted all rows!')
else:
    pass

Importing database takes a lot of time

I am trying to import a table that contains 81462 rows into a dataframe using the following code:
sql_conn = pyodbc.connect('DRIVER={SQL Server}; SERVER=server.database.windows.net; DATABASE=server_dev; uid=user; pwd=pw')
query = "select * from product inner join brand on Product.BrandId = Brand.BrandId"
df = pd.read_sql(query, sql_conn)
And the whole process takes a very long time. I think I am already 30 minutes in and it's still processing. I assume this is not quite normal, so how else should I import it so the processing time is quicker?
Thanks to @RomanPerekhrest. FETCH NEXT imported everything within 1-2 minutes.
SELECT product.Name, brand.Name as BrandName, description, size FROM Product inner join brand on product.brandid=brand.brandid ORDER BY Name OFFSET 1 ROWS FETCH NEXT 80000 ROWS ONLY
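For reference, a small sketch of how that paged query might be plugged back into the question's pd.read_sql call (reusing the sql_conn from the question; the column list and ORDER BY / FETCH NEXT clause are taken from the answer above):

# Hedged sketch: same connection as in the question, paged query from the answer.
query = """
SELECT product.Name, brand.Name as BrandName, description, size
FROM Product
INNER JOIN brand ON product.brandid = brand.brandid
ORDER BY Name
OFFSET 1 ROWS FETCH NEXT 80000 ROWS ONLY
"""
df = pd.read_sql(query, sql_conn)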

Looping Python Parameters Through SQL Code

I need to make the following report scalable:
query = """
(SELECT
'02/11/2019' as Week_of,
media_type,
campaign,
count(ad_start_ts) as frequency
FROM usotomayor.digital
WHERE ds between 20190211 and 20190217
GROUP BY 1,2,3)
UNION ALL
(SELECT
'02/18/2019' as Week_of,
media_type,
campaign,
count(ad_start_ts) as frequency
FROM usotomayor.digital
WHERE ds between 20190211 and 20190224
GROUP BY 1,2,3)
"""
#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2
However, as you can see I cannot make this report scalable if I have a long list of dates for each SQL query that I need to union.
My first attempt at looping a list of date variables into the SQL script is as follows:
dfys = ['20190217','20190224']
df2 = ['02/11/2019','02/18/2019']
for i in df2:
    date = i
    for j in dfys:
        date2 = j
query = f"""
SELECT
'{date}' as Week_of,
raw.media_type,
raw.campaign,
count(raw.ad_start_ts) as frequency
FROM usotomayor.digital raw
WHERE raw.ds between 20190211 and {date2}
GROUP BY 1,2,3
"""
#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2
However, this is not working for me. I think I need to loop through the sql query itself, but I don't know how to do this. Can someone help me?
As a commenter said, "this is not working for me" is not very specific, so let's start by specifying the problem: you need to execute a query for each pair of dates, which means running these queries in a loop and saving the results (or actually unioning them, but then you need to change your query logic).
You could do it like this:
dfys = ['20190217', '20190224']
df2 = ['02/11/2019', '02/18/2019']

query_results = list()
for week_of, end_date in zip(df2, dfys):
    query = f"""
    SELECT
        '{week_of}' as Week_of,
        raw.media_type,
        raw.campaign,
        count(raw.ad_start_ts) as frequency
    FROM usotomayor.digital raw
    WHERE raw.ds between 20190211 and {end_date}
    GROUP BY 1,2,3
    """
    query_results.append(spark.sql(query).toPandas())

query_results[0]
query_results[1]
Now you get a list of your results (query_results).
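If you want a single combined report rather than a list of per-week DataFrames, one option (a small sketch, not part of the original answer) is to concatenate them, which mirrors the UNION ALL in the original query:

import pandas as pd

# Stack the per-week results into one report DataFrame.
report = pd.concat(query_results, ignore_index=True)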

Can't convert this SQL String (with VALUES ... AS) to SQLAlchemy Code

The SQL query I have can identify the Max Edit Time from the 3 tables that it is joining together:
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
(SELECT Max(v)
FROM (VALUES (Students.Edit_DtTm), (Schedule.Edit_DtTm),
(Identity.Edit_DtTm)) AS value(v)) as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id
I need this to be in SQLAlchemy so I can reference the columns being used elsewhere in my code. Below is the simplest version of what I'm trying to do, but it doesn't work. I've tried changing how I query it, but I either get a SQL error that I'm using VALUES incorrectly, or it doesn't join properly and returns the actual highest value in those columns without matching it to the outer query.
max_edit_subquery = sa.func.values(Students.Edit_DtTm, Schedule.Edit_DtTm, Identity.Edit_DtTm)
base_query = (sa.select([Identity.SSN, Schedule.First_Class, Students.Last_Name,
                         (sa.select([sa.func.max(max_edit_subquery)]))]).
              select_from(Schedule.__table__.join(Students, Schedule.stdnt_id == Students.Student_Id).
                          join(Identity, Schedule.std_id == Identity.std_id)))
I am not an expert at SQLAlchemy, but you could replace VALUES with UNION ALL:
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
(SELECT Max(v)
FROM (SELECT Students.Edit_DtTm AS v
UNION ALL SELECT Schedule.Edit_DtTm
UNION ALL SELECT Identity.Edit_DtTm) s
) as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id;
Another approach is to use the GREATEST function (not available in T-SQL):
Select Identity.SSN, Schedule.First_Class, Students.Last_Name,
GREATEST(Students.Edit_DtTm, Schedule.Edit_DtTm,Identity.Edit_DtTm)
as [MaxEditDate]
FROM Schedule
LEFT JOIN Students ON Schedule.stdnt_id=Students.Student_Id
LEFT JOIN Identity ON Schedule.std_id=Identity.std_id;
I hope that it will help you to translate it to ORM version.
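If it helps with the ORM translation, here is a hedged sketch (SQLAlchemy 1.x style, matching the sa.select([...]) usage in the question) that emulates GREATEST with a CASE expression instead of VALUES; it assumes the Students, Schedule and Identity classes from the question and that the Edit_DtTm columns are never NULL:

import sqlalchemy as sa

# Per-row maximum of the three edit timestamps, emulated with CASE
# (assumes non-NULL Edit_DtTm values).
max_edit_date = sa.case(
    [
        (sa.and_(Students.Edit_DtTm >= Schedule.Edit_DtTm,
                 Students.Edit_DtTm >= Identity.Edit_DtTm),
         Students.Edit_DtTm),
        (Schedule.Edit_DtTm >= Identity.Edit_DtTm, Schedule.Edit_DtTm),
    ],
    else_=Identity.Edit_DtTm,
).label("MaxEditDate")

base_query = (
    sa.select([Identity.SSN, Schedule.First_Class, Students.Last_Name, max_edit_date])
    .select_from(
        Schedule.__table__
        .outerjoin(Students.__table__, Schedule.stdnt_id == Students.Student_Id)
        .outerjoin(Identity.__table__, Schedule.std_id == Identity.std_id)
    )
)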
I had a similar problem and solved it using the approach below. I have added the full code and the resulting query. The code was executed against an MSSQL server. In the snippet below I used different tables, masked here with the table and column names from your requirement.
from sqlalchemy import *
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.types import String
from sqlalchemy.sql.expression import FromClause

class values(FromClause):
    def __init__(self, *args):
        self.column_names = args

@compiles(values)
def compile_values(element, compiler, asfrom=False, **kwargs):
    values = "VALUES %s" % ", ".join("(%s)" % compiler.render_literal_value(elem, String()) for elem in element.column_names)
    if asfrom:
        values = "(%s)" % values
    return values

base_query = self.db_session.query(Schedule.Edit_DtTm.label("Schedule_Edit_DtTm"),
                                   Identity.Edit_DtTm.label("Identity_Edit_DtTm"),
                                   Students.Edit_DtTm.label("Students_Edit_DtTm"),
                                   Identity.SSN
                                   ).outerjoin(Students, Schedule.stdnt_id == Students.Student_Id
                                   ).outerjoin(Identity, Schedule.std_id == Identity.std_id).subquery()

values_at_from_clause = values(("Students_Edit_DtTm"), ("Schedule_Edit_DtTm"), ("Identity_Edit_DtTm")
                               ).alias('values(MaxEditDate)')

get_max_from_values = self.db_session.query(func.max(text('MaxEditDate'))
                                            ).select_from(values_at_from_clause)

output_query = self.db_session.query(get_max_from_values.subquery()
                                     ).label("MaxEditDate")
Printing output_query produces:
SELECT
anon_1.Schedule_Edit_DtTm AS anon_1_Schedule_Edit_DtTm,
anon_1.Students_Edit_DtTm AS anon_1_Students_Edit_DtTm,
anon_1.Identity_Edit_DtTm AS anon_1_Identity_Edit_DtTm,
anon_1.SSN AS anon_1_SSN,
(
SELECT
anon_2.max_1
FROM
(
SELECT
max( MaxEditDate ) AS max_1
FROM
(
VALUES (Students_Edit_DtTm),
(Schedule_Edit_DtTm),
(Identity_Edit_DtTm)
) AS values(MaxEditDate)
) AS anon_2
) AS MaxEditDate
FROM
(
SELECT
Schedule.Edit_DtTm AS Schedule_Edit_DtTm,
Students.Edit_DtTm AS Students_Edit_DtTm,
Identity.Edit_DtTm AS Identity_Edit_DtTm,
Identity.SSN AS SSN
FROM
Schedule WITH(NOLOCK)
LEFT JOIN Students WITH(NOLOCK) ON
Schedule.stdnt_id = Students.Student_Id
LEFT JOIN Identity WITH(NOLOCK) ON
Schedule.std_id = Identity.std_id
) AS anon_1
