I am trying to use python to use a parametrized query through a list. This is the following code:
loan_records =['604150062','604150063','604150064','604150065','604150066','604150067','604150069','604150070']
borr_query = "select distinct a.nbr_aus, cast(a.nbr_trans_aus as varchar(50)) nbr_trans_aus, c.amt_finl_item, case when a.cd_idx in (-9999, 0) then null else a.cd_idx end as cd_idx, a.rate_curr_int, case when a.rate_gr_mrtg_mrgn = 0 then null else a.rate_gr_mrtg_mrgn end as rate_gr_mrtg_mrgn, a.rate_loln_max_cap, case when a.rate_perdc_cap = 0 then null else a.rate_perdc_cap end as rate_perdc_cap from db2mant.i_lp_trans a left join db2mant.i_lp_trans_borr b on a.nbr_aus = b.nbr_aus and a.nbr_trans_aus = b.nbr_trans_aus left join db2mant.i_lp_finl_item c on a.nbr_aus = c.nbr_aus and a.nbr_trans_aus = c.nbr_trans_aus where a.nbr_trans_aus in (?) and c.cd_finl_item = 189"
ODS.execute(borr_query, loan_records)
#PML.execute(PML_SUBMN_Query, (first_evnt, last_evnt, x))
ODS_records = ODS.fetchall()
ODS_records = pd.DataFrame(ODS_records, columns=['nbr_aus', 'nbr_trans_aus', 'amt_finl_item', 'cd_idx', 'rate_curr_int', 'rate_gr_mrtg_mrgn', 'rate_loln_max_cap', 'rate_perdc_cap'])
When I try to run this code: this is the following error message:
error message
I'm new in python and sqlalchemy.
I already have a delete method working if I construct the where conditions by hand.
Now, I need to read the columns and values from an enter request in yaml format and create the where conditions.
#enter data as yaml
items:
- item:
table: [MyTable,OtherTable]
filters:
field_id: 1234
#other_id: null
Here is what I try and can't go ahead:
for i in use_case_cfg['items']:
item = i.get('item')
for t in item['table']:
if item['filters']:
filters = item['filters']
where_conditions = ''
count = 0
for column, value in filters.items():
aux = str(getattr(t, column) == bindparam(value))
if count == 0:
where_conditions += aux
else:
where_conditions += ', ' + aux
count += 1
to_delete = inv[t].__table__.delete().where(text(where_conditions))
#to_delete = t.__table__.delete().where(getattr(t, column) == value)
else:
to_delete = inv[t].__table__.delete()
CoreData.session.execute(to_delete)
To me, it looks ok, but when I run, I got the error below:
sqlalchemy.exc.StatementError: (sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter '9876'
[SQL: DELETE FROM MyTable WHERE "MyTable".field_id = %(1234)s]
[parameters: [{}]]
(Background on this error at: http://sqlalche.me/e/cd3x)
Can someone explain to me what is wrong or the proper way to do it?
Thanks.
There are two problems with the code.
Firstly,
str(getattr(t, column) == bindparam(value))
is binding the value as a placeholder, so you end up with
WHERE f2 = :Bob
but it should be the name that maps to the value in filters (so the column name in your case), so you end up with
WHERE f2 = :f2
Secondly, multiple WHERE conditions are being joined with a comma, but you should use AND or OR, depending on what you are trying to do.
Given a model Foo:
class Foo(Base):
__tablename__ = 'foo'
id = sa.Column(sa.Integer, primary_key=True)
f1 = sa.Column(sa.Integer)
f2 = sa.Column(sa.String)
Here's a working version of a segment of your code:
filters = {'f1': 2, 'f2': 'Bob'}
t = Foo
where_conditions = ''
count = 0
for column in filters:
aux = str(getattr(t, column) == sa.bindparam(column))
if count == 0:
where_conditions += aux
else:
where_conditions += ' AND ' + aux
count += 1
to_delete = t.__table__.delete().where(sa.text(where_conditions))
print(to_delete)
session.execute(to_delete, filters)
If you aren't obliged to construct the WHERE conditions as strings, you can do it like this:
where_conditions = [(getattr(t, column) == sa.bindparam(column))
for column in filters]
to_delete = t.__table__.delete().where(sa.and_(*where_conditions))
session.execute(to_delete, filters)
I'm trying to parse tables in a PDF using Camelot. The cells have multiple lines of texts in them, and some have an empty line separating portions of the text:
First line
Second line
Third line
I would expect this to be parsed as First line\nSecond line\n\nThird line (notice the double line breaks), but I get this instead: T\nFirst line\nSecond line\nhird line. The first character after a double-line-break moves to the beginning of the text, and I only get a single line-break instead.
I also tried using tabula, but that one messes up de entire table (data-frame actually) when there is an empty row in the table, and also in case of some words it puts a space between the characters.
EDIT:
My main issue is the removal of multiple line-breaks. The other one I could fix from code if I knew where the empty lines were.
my friend, can you check the example here
https://camelot-py.readthedocs.io/en/master/user/advanced.html#improve-guessed-table-rows
tables = camelot.read_pdf('group_rows.pdf', flavor='stream', row_tol=10)
tables[0].df
I solved the same problem with the code below
tables = camelot.read_pdf(file, flavor = 'stream', table_areas=['24,618,579,93'], columns=['67,315,369,483,571'], row_tol=10,strip_text='t\r\n\v')
I also encountered the same problem in case of a double line break. It was Switching Characters around as its doing in your case. I Spent some time looking at the code and i did some changes and fixed the issue. You can use the below code.
After Adding the below code, instead of using camelot.read_pdf, use the custom method i made read_pdf_custom()
And for a better experience, i suggest you using camelot v==0.8.2
import sys
import warnings
from camelot import read_pdf
from camelot import handlers
from camelot.core import TableList
from camelot.parsers import Lattice
from camelot.parsers.base import BaseParser
from camelot.core import Table
import camelot
from camelot.utils import validate_input, remove_extra,TemporaryDirectory,get_page_layout,get_text_objects,get_rotation,is_url,download_url,scale_image,scale_pdf,segments_in_bbox,text_in_bbox,merge_close_lines,get_table_index,compute_accuracy,compute_whitespace
from camelot.image_processing import (
adaptive_threshold,
find_lines,
find_contours,
find_joints,
)
class custom_lattice(Lattice):
def _generate_columns_and_rows(self, table_idx, tk):
# select elements which lie within table_bbox
t_bbox = {}
v_s, h_s = segments_in_bbox(
tk, self.vertical_segments, self.horizontal_segments
)
custom_horizontal_indexes=[]
custom_vertical_indexes=[]
for zzz in self.horizontal_text:
try:
h_extracted_text=self.find_between(str(zzz),"'","'").strip()
h_text_index=self.find_between(str(zzz),"LTTextLineHorizontal","'").strip().split(",")
custom_horizontal_indexes.append(h_text_index[1])
except:
pass
inserted=0
for xxx in self.vertical_text:
v_extracted_text=self.find_between(str(xxx),"'","'").strip()
v_text_index=self.find_between(str(xxx),"LTTextLineVertical","'").strip().split(",")
custom_vertical_indexes.append(v_text_index[1])
vertical_second_index=v_text_index[1]
try:
horizontal_index=custom_horizontal_indexes.index(vertical_second_index)
self.horizontal_text.insert(horizontal_index,xxx)
except Exception as exxx:
pass
self.vertical_text=[]
t_bbox["horizontal"] = text_in_bbox(tk, self.horizontal_text)
t_bbox["vertical"] = text_in_bbox(tk, self.vertical_text)
t_bbox["horizontal"].sort(key=lambda x: (-x.y0, x.x0))
t_bbox["vertical"].sort(key=lambda x: (x.x0, -x.y0))
self.t_bbox = t_bbox
cols, rows = zip(*self.table_bbox[tk])
cols, rows = list(cols), list(rows)
cols.extend([tk[0], tk[2]])
rows.extend([tk[1], tk[3]])
cols = merge_close_lines(sorted(cols), line_tol=self.line_tol)
rows = merge_close_lines(sorted(rows, reverse=True), line_tol=self.line_tol)
cols = [(cols[i], cols[i + 1]) for i in range(0, len(cols) - 1)]
rows = [(rows[i], rows[i + 1]) for i in range(0, len(rows) - 1)]
return cols, rows, v_s, h_s
def _generate_table(self, table_idx, cols, rows, **kwargs):
print("\n")
v_s = kwargs.get("v_s")
h_s = kwargs.get("h_s")
if v_s is None or h_s is None:
raise ValueError("No segments found on {}".format(self.rootname))
table = Table(cols, rows)
table = table.set_edges(v_s, h_s, joint_tol=self.joint_tol)
table = table.set_border()
table = table.set_span()
pos_errors = []
for direction in ["vertical", "horizontal"]:
for t in self.t_bbox[direction]:
indices, error = get_table_index(
table,
t,
direction,
split_text=self.split_text,
flag_size=self.flag_size,
strip_text=self.strip_text,
)
if indices[:2] != (-1, -1):
pos_errors.append(error)
indices = Lattice._reduce_index(
table, indices, shift_text=self.shift_text
)
for r_idx, c_idx, text in indices:
temp_text=text.strip().replace("\n","")
if len(temp_text)==1:
text=temp_text
table.cells[r_idx][c_idx].text = text
accuracy = compute_accuracy([[100, pos_errors]])
if self.copy_text is not None:
table = Lattice._copy_spanning_text(table, copy_text=self.copy_text)
data = table.data
table.df = pd.DataFrame(data)
table.shape = table.df.shape
whitespace = compute_whitespace(data)
table.flavor = "lattice"
table.accuracy = accuracy
table.whitespace = whitespace
table.order = table_idx + 1
table.page = int(os.path.basename(self.rootname).replace("page-", ""))
# for plotting
_text = []
_text.extend([(t.x0, t.y0, t.x1, t.y1) for t in self.horizontal_text])
_text.extend([(t.x0, t.y0, t.x1, t.y1) for t in self.vertical_text])
table._text = _text
table._image = (self.image, self.table_bbox_unscaled)
table._segments = (self.vertical_segments, self.horizontal_segments)
table._textedges = None
return table
class PDFHandler(handlers.PDFHandler):
def parse(
self, flavor="lattice", suppress_stdout=False, layout_kwargs={}, **kwargs
):
tables = []
with TemporaryDirectory() as tempdir:
for p in self.pages:
self._save_page(self.filepath, p, tempdir)
pages = [
os.path.join(tempdir, f"page-{p}.pdf") for p in self.pages
]
parser = custom_lattice(**kwargs) if flavor == "lattice" else Stream(**kwargs)
for p in pages:
t = parser.extract_tables(
p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs
)
tables.extend(t)
return TableList(sorted(tables))
def read_pdf_custom(
filepath,
pages="1",
password=None,
flavor="lattice",
suppress_stdout=False,
layout_kwargs={},
**kwargs
):
if flavor not in ["lattice", "stream"]:
raise NotImplementedError(
"Unknown flavor specified." " Use either 'lattice' or 'stream'"
)
with warnings.catch_warnings():
if suppress_stdout:
warnings.simplefilter("ignore")
validate_input(kwargs, flavor=flavor)
p = PDFHandler(filepath, pages=pages, password=password)
kwargs = remove_extra(kwargs, flavor=flavor)
tables = p.parse(
flavor=flavor,
suppress_stdout=suppress_stdout,
layout_kwargs=layout_kwargs,
**kwargs
)
return tables
I want to make a request to the database and I need to substitute a few arguments.
This is my query:
SUBSCRIPTIONS = '''
select
distinct si.id,
s.name as service_name,
si.request_id,
si.activation_time,
si.extra->>'MSISDN' as msisdn,
(case when t.status = 'success' then t.amount else 0 end) as initial_amount
from processing.service_instance si
inner join processing.service s on s.id = si.service_id
inner join processing.payer_payment_source pps on pps.payer_id = si.payer_id
left join processing.service_instance si2 on
si2.payer_id = pps.payer_id and
si2.id < si.id and
left join processing.transaction t on t.service_instance_id = si2.id
where
si.service_id in (
select s.id from processing.service s
inner join processing.flow f on f.id = s.flow_id
inner join processing.contractor c on c.id = s.contractor_id
where
c.slug = 'mts_bank'
)
and si.instance_details->>'operator' = %operator
'''
My function:
def get_subscriptions_by_date(cur, date, operator):
try:
cur.execute(SUBSCRIPTIONS, {'date': date, 'operator': operator})
except Exception as e:
log.error(f'{traceback.format_exc()}')
raise ValueError(str(e))
else:
while cur.rownumber != cur.rowcount:
yield cur.fetchone()
I get this error:
cur.execute(SUBSCRIPTIONS, {'date': date, 'operator': operator})
TypeError: 'dict' object does not support indexing
How to fix this?
You have to write the values you want to replace as below when passing a dictionary to execute:
%(date)s
%(operator)s
Documentation on it is here for psycopg
For a simple select, pagination works as implemented here:
mheader_dict = dict(request.headers)
no_of_pgs = 0
if 'Maxpage' in mheader_dict.keys():
max_per_pg = int(mheader_dict['Maxpage'])
else:
max_per_pg = 100
page_no = int(request.headers.get('Pageno', type=int, default=1))
offset1 = (page_no - 1) * max_per_pg
s = select[orders]
if s is not None:
s = s.limit(max_per_pg).offset(offset1)
rs = g.conn.execute(s)
Conn is the connection object above
When text is used in the select statement, How to specify the limit?.How to rectify in below:
s1 = text('select d.*, (select array(select localities.name from localities, localities_boys where localities.id = localities_boys.locality_id and localities_boys.boy_id = d.id and localities_boys.boy_id is not null )) from delivery_boys d order by d.id;')
page_no = int(request.headers.get('Pageno', type=int, default=1))
offset1 = (page_no - 1) * max_per_pg
s1 = s1.limit(max_per_pg).offset(offset1)
rs1 = g.conn.execute(s1)
If s1 = s1.compile(engine) is used, it returns sqlalchemy.dialects.postgresql.psycopg2.PGCompiler_psycopg2 object which doesn't have limit functionality
How to convert sqlalchemy.sql.elements.TextClause to sqlalchemy.sql.selectable.Select using sqlalchemy core 1.0.8 to solve the above?
using sqlalchemy core v. 1.0.8, python 2.7,flask 0.12
Converted the TextClause to Select as :
s1 = select([text('d.*, (select array(select localities.name from localities, localities_boys where localities.id = localities_boys.locality_id and localities_boys.boy_id = d.id and localities_boys.boy_id is not null)) from delivery_boys d')])
Hence able to use limit and offset on the generated Select object