pd.read_sql - Unsupported format character error (0x27)

As above, I'm trying to use pd.read_sql to query our MySQL database, and I'm getting an "unsupported format character" error that points at a quote character (0x27).
When I remove the % wildcards from the LIKE clauses (lines 84-87) the query runs, but they are needed. I know I need to format the strings somehow, but I don't know how to do that within such a big query.
Here's the query:
SELECT
s.offer_id,
s.cap_id,
vi.make,
vi.model,
vi.derivative,
i.vehicle_orders,
s.lowest_offer,
CASE
WHEN f.previous_avg = f.previous_low THEN "n/a"
ELSE FORMAT(f.previous_avg, 2)
END as previous_avg,
f.previous_low,
CASE
WHEN ( ( (s.lowest_offer - f.previous_avg) / f.previous_avg) * 100) = ( ( (s.lowest_offer - f.previous_low) / f.previous_low) * 100) THEN "n/a"
ELSE CONCAT(FORMAT( ( ( (s.lowest_offer - f.previous_avg) / f.previous_avg) * 100), 2), "%")
END as diff_avg,
CONCAT(FORMAT( ( ( (s.lowest_offer - f.previous_low) / f.previous_low) * 100), 2), "%") as diff_low,
s.broker,
CASE
WHEN s.in_stock = '1' THEN "In Stock"
ELSE "Factory Order"
END as in_stock,
CASE
WHEN s.special IS NOT NULL THEN "Already in Specials"
ELSE "n/a"
END as special
FROM
( SELECT o.id as offer_id,
o.cap_id as cap_id,
MIN(o.monthly_payment) as lowest_offer,
b.name as broker,
o.stock as in_stock,
so.id as special
FROM
offers o
INNER JOIN brands b ON ( o.brand_id = b.id )
LEFT JOIN special_offers so ON ( so.cap_id = o.cap_id )
WHERE
( o.date_modified >= DATE_ADD(NOW(), INTERVAL -1 DAY) OR o.date_created >= DATE_ADD(NOW(), INTERVAL -1 DAY) )
AND o.deposit_value = 9
AND o.term = 48
AND o.annual_mileage = 8000
AND o.finance_type = 'P'
AND o.monthly_payment > 100
GROUP BY
o.cap_id
ORDER BY
special DESC) s
INNER JOIN
( SELECT o.cap_id as cap_id,
AVG(o.monthly_payment) as previous_avg,
MIN(o.monthly_payment) as previous_low
FROM
offers o
WHERE
o.date_modified < DATE_ADD(NOW(), INTERVAL -1 DAY)
AND o.date_modified >= DATE_ADD(NOW(), INTERVAL -1 WEEK)
AND o.deposit_value = 9
AND o.term = 48
AND o.annual_mileage = 8000
AND o.finance_type = 'P'
AND o.monthly_payment > 100
GROUP BY
o.cap_id ) f ON ( s.cap_id = f.cap_id )
LEFT JOIN
( SELECT a.cap_id as cap_id,
v.manufacturer as make,
v.model as model,
v.derivative as derivative,
COUNT(*) as vehicle_orders
FROM
( SELECT o.id,
o.name as name,
o.email as email,
o.date_created as date,
SUBSTRING_INDEX(SUBSTRING(offer_serialized, LOCATE("capId", offer_serialized) +12, 10), '"', 1) as cap_id
FROM moneyshake.orders o
WHERE o.name NOT LIKE 'test%'
AND o.email NOT LIKE 'jawor%'
AND o.email NOT LIKE 'test%'
AND o.email NOT LIKE '%moneyshake%'
AND o.phone IS NOT NULL
AND o.date_created > DATE_ADD(NOW(), INTERVAL -1 MONTH)
) a JOIN moneyshake.vehicles_view v ON a.cap_id = v.id
GROUP BY
v.manufacturer,
v.model,
v.derivative,
a.cap_id) i ON ( f.cap_id = i.cap_id )
INNER JOIN
( SELECT v.id as id,
v.manufacturer as make,
v.model as model,
v.derivative as derivative
FROM moneyshake.vehicles_view v
GROUP BY v.id ) vi ON s.cap_id = vi.id
WHERE
( ( s.lowest_offer - f.previous_low ) / f.previous_low) * 100 <= -15
GROUP BY
s.cap_id
Thanks!

That error occurs when the DBAPI layer (e.g., mysqlclient) natively uses the "format" paramstyle and the percent sign (%) is misinterpreted as a format character instead of a LIKE wildcard.
The fix is to wrap the SQL statement in a SQLAlchemy text() object. For example, this will fail:
import pandas as pd
import sqlalchemy as sa
engine = sa.create_engine("mysql+mysqldb://scott:tiger@localhost:3307/mydb")
sql = """\
SELECT * FROM million_rows
WHERE varchar_col LIKE 'record00000%'
ORDER BY id
"""
df = pd.read_sql_query(sql, engine)
but simply changing the read_sql_query() call to
df = pd.read_sql_query(sa.text(sql), engine)
will work.
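To see where the message comes from, the sketch below reproduces it with plain Python strings. It assumes the driver fills in parameters with Python's % operator, which is roughly what a "format"-paramstyle driver does; no database is involved, and the table and column names are made up. It also shows that doubling the percent sign is another way to get a literal % through that formatting step.

```python
# Plain-Python illustration of the failure; no database or driver is used.
# The table and column names are hypothetical.
sql = "SELECT * FROM orders WHERE name NOT LIKE 'test%' AND id = %s"

try:
    # Roughly what a "format"-paramstyle driver does when it fills in parameters:
    # the sequence %' is parsed as a (non-existent) conversion specifier.
    sql % (42,)
except ValueError as exc:
    message = str(exc)  # names the offending character; ' is 0x27

# Doubling the percent sign turns it into a literal % after formatting.
escaped = "SELECT * FROM orders WHERE name NOT LIKE 'test%%' AND id = %s"
rendered = escaped % (42,)

print(message)
print(rendered)
```

Wrapping the statement in text() avoids having to touch the query at all, which is why it is the more convenient fix for a long statement.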

Related

Issue with parametrized queries using python

I am trying to run a parametrized query in Python, passing the parameters as a list. This is the code:
loan_records =['604150062','604150063','604150064','604150065','604150066','604150067','604150069','604150070']
borr_query = "select distinct a.nbr_aus, cast(a.nbr_trans_aus as varchar(50)) nbr_trans_aus, c.amt_finl_item, case when a.cd_idx in (-9999, 0) then null else a.cd_idx end as cd_idx, a.rate_curr_int, case when a.rate_gr_mrtg_mrgn = 0 then null else a.rate_gr_mrtg_mrgn end as rate_gr_mrtg_mrgn, a.rate_loln_max_cap, case when a.rate_perdc_cap = 0 then null else a.rate_perdc_cap end as rate_perdc_cap from db2mant.i_lp_trans a left join db2mant.i_lp_trans_borr b on a.nbr_aus = b.nbr_aus and a.nbr_trans_aus = b.nbr_trans_aus left join db2mant.i_lp_finl_item c on a.nbr_aus = c.nbr_aus and a.nbr_trans_aus = c.nbr_trans_aus where a.nbr_trans_aus in (?) and c.cd_finl_item = 189"
ODS.execute(borr_query, loan_records)
#PML.execute(PML_SUBMN_Query, (first_evnt, last_evnt, x))
ODS_records = ODS.fetchall()
ODS_records = pd.DataFrame(ODS_records, columns=['nbr_aus', 'nbr_trans_aus', 'amt_finl_item', 'cd_idx', 'rate_curr_int', 'rate_gr_mrtg_mrgn', 'rate_loln_max_cap', 'rate_perdc_cap'])
When I try to run this code, I get an error message (it was posted as an image and is not reproduced here).
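The error text is not available, so the following is only a guess at the cause: a single ? placeholder binds exactly one value, while the list supplies eight. A common pattern is to generate one placeholder per list element. The sketch below uses an in-memory SQLite database as a stand-in because it accepts the same ? placeholder style; the table name, its columns, and the sample rows are simplified assumptions, not the real schema.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption); it accepts
# the same "?" placeholder style as the query in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trans (nbr_trans_aus TEXT, cd_finl_item INTEGER)")
conn.executemany(
    "INSERT INTO trans VALUES (?, ?)",
    [("604150062", 189), ("604150063", 189), ("999999999", 189)],
)

loan_records = ["604150062", "604150063", "604150064"]

# Build one "?" per list element, e.g. "?, ?, ?".
placeholders = ", ".join("?" for _ in loan_records)
query = (
    "SELECT nbr_trans_aus FROM trans "
    f"WHERE nbr_trans_aus IN ({placeholders}) AND cd_finl_item = ? "
    "ORDER BY nbr_trans_aus"
)

# The parameter sequence must contain one value per placeholder, in order.
rows = conn.execute(query, [*loan_records, 189]).fetchall()
print(rows)
```

Only the placeholders are interpolated into the SQL text; the values themselves are still passed separately, so the query stays parametrized.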

How to serialize the complex query (peewee)

I am using peewee as my ORM, and my goal is to serialize the result of a complex query which also contains subqueries:
machine_usage_alias = RecordDailyMachineUsage.alias()
subquery = (
machine_usage_alias.select(
machine_usage_alias.machine_id,
fn.MAX(machine_usage_alias.date).alias('max_date'),
)
.group_by(machine_usage_alias.machine_id)
.alias('machine_usage_subquery')
)
record_subquery = RecordDailyMachineUsage.select(
RecordDailyMachineUsage.machine_id, RecordDailyMachineUsage.usage
).join(
subquery,
on=(
(RecordDailyMachineUsage.machine_id == subquery.c.machine_id)
& (RecordDailyMachineUsage.date == subquery.c.max_date)
),
)
query = (
Machine.select(
Machine.id, # 0
Machine.name,
Machine.location,
Machine.arch,
Machine.platform,
Machine.machine_version,
Machine.status,
record_subquery.c.usage.alias('usage'),
fn.GROUP_CONCAT(Tag.name.distinct()).alias('tags_list'),
fn.GROUP_CONCAT(Project.full_name.distinct()).alias('projects_list'),
) # 10
.join(MachineTag)
.join(Tag)
.switch(Machine)
.join(MachineProject)
.join(Project)
.join(
record_subquery,
JOIN.LEFT_OUTER,
on=(Machine.id == record_subquery.c.machine_id),
)
.where((Machine.id != 0) & (Machine.is_alive == 1))
.group_by(Machine.id)
)
I've tried to use the method model_to_dict:
jsonify({'rows': [model_to_dict(c) for c in query]})
But this gives me the columns and values from the Machine model only. My aim is to include all the columns from the select query.

It turned out that I had to use the dicts method of the query and jsonify the result.
machine_usage_alias = RecordDailyMachineUsage.alias()
subquery = (
machine_usage_alias.select(
machine_usage_alias.machine_id,
fn.MAX(machine_usage_alias.date).alias('max_date'),
)
.group_by(machine_usage_alias.machine_id)
.alias('machine_usage_subquery')
)
record_subquery = RecordDailyMachineUsage.select(
RecordDailyMachineUsage.machine_id, RecordDailyMachineUsage.usage
).join(
subquery,
on=(
(RecordDailyMachineUsage.machine_id == subquery.c.machine_id)
& (RecordDailyMachineUsage.date == subquery.c.max_date)
),
)
query = (
Machine.select(
Machine.id, # 0
Machine.name,
Machine.location,
Machine.arch,
Machine.platform,
Machine.machine_version,
Machine.status,
record_subquery.c.usage.alias('usage'),
fn.GROUP_CONCAT(Tag.name.distinct()).alias('tags_list'),
fn.GROUP_CONCAT(Project.full_name.distinct()).alias('projects_list'),
) # 10
.join(MachineTag)
.join(Tag)
.switch(Machine)
.join(MachineProject)
.join(Project)
.join(
record_subquery,
JOIN.LEFT_OUTER,
on=(Machine.id == record_subquery.c.machine_id),
)
.where((Machine.id != 0) & (Machine.is_alive == 1))
.group_by(Machine.id)
).dicts()
return jsonify({'rows': [c for c in query]})

Using Q expression in Case When is not returning expected results in Django. Seems like an inner join is used, not left join

I am trying to build a complex filter with the Django ORM and am running into an issue where objects without a foreign key are not being included. This is due to an inner join that is being generated and I believe it should be a left outer join. I have two models Report and Message. Report has a reference to a particular Message but that could also be null.
class Report(BaseModel):
    message = models.ForeignKey(
        Message,
        on_delete=models.SET_NULL,
        related_name="message_reports",
        null=True,
    )
    created_at = models.DateTimeField()
    # other fields

class Message(BaseModel):
    should_send_messages = models.BooleanField(
        default=True
    )
    num_hold_hours = models.IntegerField(
        default=None,
        null=True,
    )
    # other fields
Here is the filter that I am trying to use.
Report.objects.filter(
Q(message__isnull=True) | Q(message__should_send_messages=True),
created_at__lte=
Case(
When(
Q(message__isnull=True) | Q(message__num_hold_hours__isnull=True),
then=ExpressionWrapper(
timezone.now() - timedelta(hours=1) * Cast(
settings.NUM_HOURS_TO_NOTIFY
, output_field=IntegerField())
, output_field=DateTimeField())),
default=ExpressionWrapper(
timezone.now() - timedelta(hours=1) * Cast(
F('message__num_hold_hours')
, output_field=IntegerField())
, output_field=DateTimeField()),
output_field=DateTimeField()),
)
Here is the SQL that is generated as a result of that filter block. (I'm not sure why the datetimes look that ugly.)
SELECT "report"."message_id"
FROM "report"
INNER JOIN "message" ON (
"report"."message_id" = "message"."id"
)
WHERE (
(
"report"."message_id" IS NULL
OR "message"."should_send_messages" = True
)
AND "report"."created_at" <= (
CASE
WHEN (
"report"."message_id" IS NULL
OR "message"."num_hold_hours" IS NULL
) THEN (
2020 -11 -24 07 :09 :22.401276 + 00 :00 - (1 :00 :00 * (48)::integer)
)
ELSE (
2020 -11 -24 07 :09 :22.401833 + 00 :00 - (
1 :00 :00 * ("message"."num_hold_hours")::integer
)
)
END
)
)
ORDER BY "report"."created_at" DESC
I believe the generated SQL should instead be:
SELECT "report"."message_id"
FROM "report"
LEFT OUTER JOIN "message" ON (
"report"."message_id" = "message"."id"
)
WHERE (
(
"report"."message_id" IS NULL
OR "message"."should_send_messages" = True
)
AND "report"."created_at" <= (
CASE
WHEN (
"report"."message_id" IS NULL
OR "message"."num_hold_hours" IS NULL
) THEN (
2020 -11 -24 07 :09 :22.401276 + 00 :00 - (1 :00 :00 * (48)::integer)
)
ELSE (
2020 -11 -24 07 :09 :22.401833 + 00 :00 - (
1 :00 :00 * ("message"."num_hold_hours")::integer
)
)
END
)
)
ORDER BY "report"."created_at" DESC
Any thoughts?

Make SqlAlchemy put join result in separate classes

I have a condition in which I have joined a table with itself. For getting result rows I need to have two separate attributes in the result rows each of which contain model items.
order_sq1: Order = self._query_from(Order). \
filter(Order.status == OrderStatus.REGISTERED.value,
Order.dispatch_date + Order.dispatch_time > current_time,
Order.who_added_role != UserRole.SHIPPER.value).subquery('Order1')
order_sq2: Order = self._query_from(Order). \
filter(Order.status == OrderStatus.REGISTERED.value,
Order.dispatch_date + Order.dispatch_time > current_time,
Order.who_added_role != UserRole.SHIPPER.value).subquery('Order2')
query = self._query_from(order_sq1).join(
order_sq2,
and_(order_sq1.c.tracking_code != order_sq2.c.tracking_code,
func.ST_DistanceSphere(order_sq1.c.destination_location,
order_sq2.c.source_location) < 0.15 *
order_sq1.c.distance
, func.ST_DistanceSphere(order_sq2.c.destination_location,
order_sq1.c.source_location) < 0.15 *
order_sq1.c.distance
, between(((order_sq1.c.dispatch_date + order_sq1.c.dispatch_time) -
(order_sq2.c.dispatch_date + order_sq2.c.dispatch_time) -
(order_sq1.c.distance / 60) * timedelta(hours=1)),
timedelta(hours=2),
timedelta(hours=20))
, order_sq1.c.source_region_id != order_sq1.c.destination_region_id
, order_sq2.c.source_region_id != order_sq2.c.destination_region_id
, order_sq1.c.vehicle_type == order_sq2.c.vehicle_type
)).add_columns(order_sq2)
I need the result in the form of (Order1, Order2) fields, each as Order class.
I tried many ways with no success!

Group by column to get array results in Postgresql

I have a table called moviegenre which looks like:
moviegenre:
- movie (FK movie.id)
- genre (FK genre.id)
I have a query (ORM generated) which returns all movie.imdb and genre.id's which have genre.id's in common with a given movie.imdb_id.
SELECT "movie"."imdb_id",
"moviegenre"."genre_id"
FROM "moviegenre"
INNER JOIN "movie"
ON ( "moviegenre"."movie_id" = "movie"."id" )
WHERE ( "movie"."imdb_id" IN (SELECT U0."imdb_id"
FROM "movie" U0
INNER JOIN "moviegenre" U1
ON ( U0."id" = U1."movie_id" )
WHERE ( U0."last_ingested_on" IS NOT NULL
AND NOT ( U0."imdb_id" IN
( 'tt0169547' ) )
AND NOT ( U0."imdb_id" IN
( 'tt0169547' ) )
AND U1."genre_id" IN ( 2, 10 ) ))
AND "moviegenre"."genre_id" IN ( 2, 10 ) )
The problem is that I'll get results in the format:
[
('imdbid22', 'genreid1'),
('imdbid22', 'genreid2'),
('imdbid44', 'genreid1'),
('imdbid55', 'genreid8'),
]
Is there a way within the query itself to group all of the genre ids into a list under each movie.imdb_id? I'd like to do the grouping in the query.
Currently doing it in my web app code (Python) which is extremely slow when 50k+ rows are returned.
[
('imdbid22', ['genreid1', 'genreid2']),
('imdbid44', 'genreid1'),
('imdbid55', 'genreid8'),
]
thanks in advance!
edit:
here's the python code which runs against the current results
results_list = []
for item in movies_and_genres:
    genres_in_common = len(set([
        i['genre__id'] for i in movies_and_genres
        if i['movie__imdb_id'] == item['movie__imdb_id']
    ]))
    imdb_id = item['movie__imdb_id']
    if genres_in_common >= min_in_comon:
        result_item = {
            'movie.imdb_id': imdb_id,
            'count': genres_in_common
        }
        if result_item not in results_list:
            results_list.append(result_item)
return results_list

select m.imdb_id, array_agg(g.genre_id) as genre_id
from
moviegenre g
inner join
movie m on g.movie_id = m.id
where
m.last_ingested_on is not null
and not m.imdb_id in ('tt0169547')
and not m.imdb_id in ('tt0169547')
and g.genre_id in (2, 10)
group by m.imdb_id
array_agg will create an array of all the genre_ids of a certain imdb_id:
http://www.postgresql.org/docs/current/interactive/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE

I hope this Python code will be fast enough:
movielist = [
    ('imdbid22', 'genreid1'),
    ('imdbid22', 'genreid2'),
    ('imdbid44', 'genreid1'),
    ('imdbid55', 'genreid8'),
]
genres_by_movie = {}
for imdb_id, genre_id in movielist:
    # Create the list the first time a movie is seen, then append to it.
    genres_by_movie.setdefault(imdb_id, []).append(genre_id)
print(genres_by_movie)
Output:
{'imdbid22': ['genreid1', 'genreid2'], 'imdbid44': ['genreid1'], 'imdbid55': ['genreid8']}
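If you only need each movie's imdb_id and the number of matching genres, you don't need the Python code at all.
Change the select list of the original query as follows, keep its FROM and WHERE clauses, and add a GROUP BY: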
SELECT "movie"."imdb_id", count("moviegenre"."genre_id")
group by "movie"."imdb_id"
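As a rough illustration of counting in the database instead of in a Python loop, here is a sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL. The movie/moviegenre join is collapsed into a single table, the ids are invented, and the HAVING clause is an addition that mirrors the minimum-in-common threshold from the question's Python code.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); the schema is
# simplified to one table and the ids are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moviegenre (imdb_id TEXT, genre_id INTEGER)")
conn.executemany(
    "INSERT INTO moviegenre VALUES (?, ?)",
    [("tt01", 2), ("tt01", 10), ("tt02", 2), ("tt03", 10)],
)

min_in_common = 2  # hypothetical threshold, like the one in the question's loop

# Count matching genres per movie and keep only movies above the threshold.
rows = conn.execute(
    """
    SELECT imdb_id, COUNT(DISTINCT genre_id) AS genres_in_common
    FROM moviegenre
    WHERE genre_id IN (2, 10)
    GROUP BY imdb_id
    HAVING COUNT(DISTINCT genre_id) >= ?
    ORDER BY imdb_id
    """,
    (min_in_common,),
).fetchall()
print(rows)
```

The database returns one row per movie, so the application no longer has to scan all rows once per row.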
