Slow SQL Query (python/sqlalchemy)

Here is my query, written in Python with SQLAlchemy, but I don't think SQLAlchemy itself is the slow part; it's more likely that I don't know how to write fast queries. The query takes about 8 seconds and returns 45,000 results.
games = s.query(Box_Score, Game)\
    .join(Game, Box_Score.espn_game_id == Game.espn_game_id)\
    .filter(Game.total != 999)\
    .filter(Game.a_line != 999)\
    .order_by(Box_Score.date.desc()).all()
This is the query in regular SQL
SELECT box_scores.date AS box_scores_date, box_scores.id AS box_scores_id, box_scores.player_name AS box_scores_player_name, box_scores.team_name AS box_scores_team_name, box_scores.espn_player_id AS box_scores_espn_player_id, box_scores.espn_game_id AS box_scores_espn_game_id, box_scores.pass_attempt AS box_scores_pass_attempt, box_scores.pass_made AS box_scores_pass_made, box_scores.pass_yards AS box_scores_pass_yards, box_scores.pass_td AS box_scores_pass_td, box_scores.pass_int AS box_scores_pass_int, box_scores.pass_longest AS box_scores_pass_longest, box_scores.run_carry AS box_scores_run_carry, box_scores.run_yards AS box_scores_run_yards, box_scores.run_td AS box_scores_run_td, box_scores.run_longest AS box_scores_run_longest, box_scores.reception AS box_scores_reception, box_scores.reception_yards AS box_scores_reception_yards, box_scores.reception_td AS box_scores_reception_td, box_scores.reception_longest AS box_scores_reception_longest, box_scores.interception_lost AS box_scores_interception_lost, box_scores.interception_won AS box_scores_interception_won, box_scores.fg_attempt AS box_scores_fg_attempt, box_scores.fg_made AS box_scores_fg_made, box_scores.fg_longest AS box_scores_fg_longest, box_scores.punt AS box_scores_punt, box_scores.first_down AS box_scores_first_down, box_scores.penalty AS box_scores_penalty, box_scores.penalty_yards AS box_scores_penalty_yards, box_scores.fumbles AS box_scores_fumbles, box_scores.possession AS box_scores_possession, games.id AS games_id, games.espn_game_id AS games_espn_game_id, games.date AS games_date, games.status AS games_status, games.time AS games_time, games.season AS games_season, games.h_name AS games_h_name, games.a_name AS games_a_name, games.league AS games_league, games.h_q1 AS games_h_q1, games.h_q2 AS games_h_q2, games.h_q3 AS games_h_q3, games.h_q4 AS games_h_q4, games.h_ot AS games_h_ot, games.h_score AS games_h_score, games.a_q1 AS games_a_q1, games.a_q2 AS games_a_q2, games.a_q3 AS games_a_q3, games.a_q4 AS games_a_q4, games.a_ot AS games_a_ot, games.a_score AS games_a_score, games.possession_h2 AS games_possession_h2, games.d_yards_h1 AS games_d_yards_h1, games.f_yards_h1 AS games_f_yards_h1, games.h_ml AS games_h_ml, games.a_ml AS games_a_ml, games.h_h1_ml AS games_h_h1_ml, games.a_h1_ml AS games_a_h1_ml, games.h_q1_ml AS games_h_q1_ml, games.a_q1_ml AS games_a_q1_ml, games.h_h2_ml AS games_h_h2_ml, games.a_h2_ml AS games_a_h2_ml, games.h_line AS games_h_line, games.h_price AS games_h_price, games.a_line AS games_a_line, games.a_price AS games_a_price, games.h_open_line AS games_h_open_line, games.h_open_price AS games_h_open_price, games.a_open_line AS games_a_open_line, games.a_open_price AS games_a_open_price, games.h_h1_line AS games_h_h1_line, games.h_h1_price AS games_h_h1_price, games.a_h1_line AS games_a_h1_line, games.a_h1_price AS games_a_h1_price, games.h_q1_line AS games_h_q1_line, games.h_q1_price AS games_h_q1_price, games.a_q1_line AS games_a_q1_line, games.a_q1_price AS games_a_q1_price, games.h_h2_line AS games_h_h2_line, games.h_h2_price AS games_h_h2_price, games.a_h2_line AS games_a_h2_line, games.a_h2_price AS games_a_h2_price, games.total AS games_total, games.o_price AS games_o_price, games.u_price AS games_u_price, games.total_h1 AS games_total_h1, games.o_h1_price AS games_o_h1_price, games.u_h1_price AS games_u_h1_price, games.total_q1 AS games_total_q1, games.o_q1_price AS games_o_q1_price, games.u_q1_price AS games_u_q1_price, games.total_h2 AS games_total_h2, games.o_h2_price AS 
games_o_h2_price, games.u_h2_price AS games_u_h2_price
FROM box_scores JOIN games ON box_scores.espn_game_id = games.espn_game_id
WHERE games.total != :total_1 AND games.a_line != :a_line_1 ORDER BY box_scores.date DESC
Even this simpler query takes over 3 seconds and returns 55,000 results:
box_scores = s.query(Box_Score).all()
I must be doing something wrong. I know people regularly use databases with millions of entries, so I don't get why selecting 50,000 rows should be a big deal. I also tried joining on Box_Score instead of Game, and removing the order_by(), and neither sped up the query.
UPDATE: I am trying to learn what fragmentation is in order to answer the question below. I don't understand it yet, but I did run PRAGMA page_count -> 64,785, which doesn't seem like a big number. I also ran sqlite3 nfl.db "VACUUM"; and then ran the query again, and there was no performance improvement.
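One thing worth checking is whether the join and filter columns are indexed at all. A minimal sketch using the standard sqlite3 module (the index names are made up; nfl.db is the database file mentioned above):
import sqlite3

# Sketch: add indexes covering the join, filter and ORDER BY columns used above.
# Index names are assumptions, not taken from the original schema.
conn = sqlite3.connect("nfl.db")
conn.execute("CREATE INDEX IF NOT EXISTS ix_box_scores_game_date ON box_scores (espn_game_id, date)")
conn.execute("CREATE INDEX IF NOT EXISTS ix_games_game_line ON games (espn_game_id, total, a_line)")
conn.commit()
conn.close()
Much of the 8 seconds may also be ORM object construction for 45,000 rows rather than the SQL itself, so selecting only the columns you actually need (for example via with_entities) can help as well.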

Related

Odd behavior with begins_with and a binary column in DynamoDB

Summary
When querying a binary range key using begins_with, some results are not returned even though they begin with the value being queried. This appears to only happen with certain values, and only in DynamoDB-local - not the AWS hosted version of DynamoDB.
Here is a gist you can run that reproduces the issue: https://gist.github.com/pbaughman/922db7b51f7f82bbd9634949d71f846b
Details
I have a DynamoDB table with the following schema:
user_id - Primary Key - binary - Contains 16 byte UUID
project_id_item_id - Sort Key - binary - 32 bytes - two UUIDs concatenated
While running my unit tests locally against the dynamodb-local Docker image, I have observed some bizarre behavior.
I've inserted 20 items into my table like this:
table.put_item(
    Item={
        'user_id': user_id.bytes,
        'project_id_item_id': project_id.bytes + item_id.bytes
    }
)
Each item has the same user_id and the same project_id with a different item_id.
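For reference, the test data described above could be generated roughly like this (a sketch only; the uuid4 calls and the 20-item loop are assumptions, not copied from the gist):
import uuid

# Sketch of the setup: one user_id, one project_id, 20 distinct item_ids.
user_id = uuid.uuid4()
project_id = uuid.uuid4()
for _ in range(20):
    item_id = uuid.uuid4()
    table.put_item(
        Item={
            'user_id': user_id.bytes,
            'project_id_item_id': project_id.bytes + item_id.bytes,
        }
    )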
When I attempt to query the same data back out, sometimes (maybe 1 in 5 times that I run the test) I only get some of the items back out:
table.query(
    KeyConditionExpression=(
        Key('user_id').eq(user_id.bytes) &
        Key('project_id_item_id').begins_with(project_id.bytes)
    )
)
# Only returns 14 items
If I drop the 2nd condition from the KeyConditionExpression, I get all 20 items.
If I run a scan instead of a query with the same condition expression, I get all 20 items:
table.scan(
    FilterExpression=(
        Key('user_id').eq(user_id.bytes) &
        Key('project_id_item_id').begins_with(project_id.bytes)
    )
)
# 20 items are returned
If I print the project_id_item_id of every item in the table, I can see that they all start with the same project_id:
[i['project_id_item_id'].value.hex() for i in table.scan()['Items']]
# Result:
|---------Project Id-----------|
['76761923aeba4edf9fccb9eeb5f80cc40604481b26c84c73b63308dd588a4df1',
'76761923aeba4edf9fccb9eeb5f80cc40ec926452c294c909befa772b86e2175',
'76761923aeba4edf9fccb9eeb5f80cc460ff943b36ec44518175525d6eb30480',
'76761923aeba4edf9fccb9eeb5f80cc464e427afe84d49a5b3f890f9d25ee73b',
'76761923aeba4edf9fccb9eeb5f80cc466f3bfd77b14479a8977d91af1a5fa01',
'76761923aeba4edf9fccb9eeb5f80cc46cd5b7dec9514714918449f8b49cbe4e',
'76761923aeba4edf9fccb9eeb5f80cc47d89f44aae584c1c9da475392cb0a085',
'76761923aeba4edf9fccb9eeb5f80cc495f85af4d1f142608fae72e23f54cbfb',
'76761923aeba4edf9fccb9eeb5f80cc496374432375a498b937dec3177d95c1a',
'76761923aeba4edf9fccb9eeb5f80cc49eba93584f964d13b09fdd7866a5e382',
'76761923aeba4edf9fccb9eeb5f80cc4a6086f1362224115b7376bc5a5ce66b8',
'76761923aeba4edf9fccb9eeb5f80cc4b5c6872aa1a84994b6f694666288b446',
'76761923aeba4edf9fccb9eeb5f80cc4be07cd547d804be4973041cfd1529734',
'76761923aeba4edf9fccb9eeb5f80cc4c48daab011c449f993f061da3746a660',
'76761923aeba4edf9fccb9eeb5f80cc4d09bc44973654f39b95a91eb3e291c68',
'76761923aeba4edf9fccb9eeb5f80cc4d0edda3d8c6643ad8e93afe2f1b518d4',
'76761923aeba4edf9fccb9eeb5f80cc4d8d1f6f4a85e47d78e2d06ec1938ee2a',
'76761923aeba4edf9fccb9eeb5f80cc4dc7323adfa35423fba15f77facb9a41b',
'76761923aeba4edf9fccb9eeb5f80cc4f948fb40873b425aa644f220cdcb5d4b',
'76761923aeba4edf9fccb9eeb5f80cc4fc7f0583f593454d92a8a266a93c6fcd']
As a sanity check, here is the project_id I'm using in my query:
print(project_id)
76761923-aeba-4edf-9fcc-b9eeb5f80cc4 # Matches what's returned by scan above
Finally, the most bizarre part: if I match fewer bytes of the project ID, I see all 20 items, then zero items, then all 20 items again:
hash_key = Key('user_id').eq(user_id.bytes)
for n in range(1, 17):
    short_key = project_id.bytes[:n]
    range_key = Key('project_id_item_id').begins_with(short_key)
    count = table.query(KeyConditionExpression=hash_key & range_key)['Count']
    print("If I only query for 0x{:32} I find {} items".format(short_key.hex(), count))
Gets me:
If I only query for 0x76 I find 20 items
If I only query for 0x7676 I find 20 items
If I only query for 0x767619 I find 20 items
If I only query for 0x76761923 I find 20 items
If I only query for 0x76761923ae I find 20 items
If I only query for 0x76761923aeba I find 20 items
If I only query for 0x76761923aeba4e I find 20 items
If I only query for 0x76761923aeba4edf I find 0 items
If I only query for 0x76761923aeba4edf9f I find 20 items
If I only query for 0x76761923aeba4edf9fcc I find 0 items
If I only query for 0x76761923aeba4edf9fccb9 I find 20 items
If I only query for 0x76761923aeba4edf9fccb9ee I find 0 items
If I only query for 0x76761923aeba4edf9fccb9eeb5 I find 20 items
If I only query for 0x76761923aeba4edf9fccb9eeb5f8 I find 20 items
If I only query for 0x76761923aeba4edf9fccb9eeb5f80c I find 20 items
If I only query for 0x76761923aeba4edf9fccb9eeb5f80cc4 I find 15 items
I am totally dumbfounded by this pattern. If the range key I'm searching for is 8, 10 or 12 bytes long I get no matches. If it's 16 bytes long I get fewer than 20 but more than 0 matches.
Does anybody have any idea what could be going on here? The documentation indicates that the begins_with expression works with Binary data. I'm totally at a loss as to what could be going wrong. I wonder if DynamoDB-local is doing something like converting the binary data to strings internally to do the comparisons and some of these binary patterns don't convert correctly.
It seems like it might be related to the project_id UUID. If I hard-code it to 76761923-aeba-4edf-9fcc-b9eeb5f80cc4 in the test, I can make it miss items every time.
This may be a six-year-old bug in DynamoDB Local. I will leave this question open in case someone has more insight, and I will update this answer if I'm able to find out more information from Amazon.
Edit: As of June 23rd, they have managed to reproduce the issue and it is in the queue to be fixed in a future release.
2nd Edit: As of August 4th, they are investigating the issue, and a fix will be released shortly.
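In the meantime, a possible workaround for tests is to query on the partition key alone and do the prefix check client-side. A rough sketch (the helper name is made up; it relies only on standard Boto3 query pagination):
from boto3.dynamodb.conditions import Key

def query_items_with_prefix(table, user_id_bytes, prefix_bytes):
    """Hypothetical helper: query by partition key only, then filter the
    binary sort-key prefix in Python to sidestep the DynamoDB Local bug."""
    items = []
    kwargs = {'KeyConditionExpression': Key('user_id').eq(user_id_bytes)}
    while True:
        response = table.query(**kwargs)
        items.extend(
            i for i in response['Items']
            if bytes(i['project_id_item_id'].value).startswith(prefix_bytes)
        )
        if 'LastEvaluatedKey' not in response:
            return items
        kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']
In the test above this would be called as query_items_with_prefix(table, user_id.bytes, project_id.bytes).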

elasticsearch-dsl aggregations returns only 10 results. How to change this

I am using the elasticsearch-dsl Python library to connect to Elasticsearch and run aggregations.
I am using the following code:
search.aggs.bucket('per_date', 'terms', field='date')\
    .bucket('response_time_percentile', 'percentiles', field='total_time',
            percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()
This works fine, but returns only 10 results in response.aggregations.per_ts.buckets.
I want all the results.
I have tried one solution with size=0, as mentioned in this question:
search.aggs.bucket('per_ts', 'terms', field='ts', size=0)\
    .bucket('response_time_percentile', 'percentiles', field='total_time',
            percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()
But this results in an error:
TransportError(400, u'parsing_exception', u'[terms] failed to parse field [size]')
I had the same issue. I finally found this solution:
s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
a = A('terms', field='jks_title.keyword', size=999999)
s.aggs.bucket('by_title', a)
response = s.execute()
After 2.x, size=0 for all bucket results no longer works; please refer to this thread. In my example I just set size to 999999; you can pick a large number according to your case.
It is recommended to explicitly set a reasonable value for size, a number
between 1 and 2147483647.
Hope this helps.
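As a side note, on Elasticsearch 6.1+ a composite aggregation can page through all buckets without relying on one huge size. A rough sketch (the index name, field name, and helper name are placeholders, not from the original question):
from elasticsearch_dsl import Search

def iter_all_buckets(client, index, field, page_size=1000):
    """Sketch: page through every terms bucket with a composite aggregation.
    Assumes Elasticsearch >= 6.1; index and field names are placeholders."""
    after = None
    while True:
        s = Search(using=client, index=index).extra(size=0)
        params = {"sources": [{field: {"terms": {"field": field}}}], "size": page_size}
        if after is not None:
            params["after"] = after
        s.aggs.bucket("all_buckets", "composite", **params)
        result = s.execute().aggregations.all_buckets.to_dict()
        for bucket in result["buckets"]:
            yield bucket
        after = result.get("after_key")
        if after is None:
            break
Sub-aggregations such as the percentiles above can be nested under the composite bucket in the same way as under a terms bucket.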
This is a bit older, but I ran into the same issue. What I wanted was basically an iterator I could use to go through all the aggregations I got back (I also have a lot of unique results).
The best thing I found is to create a Python generator like this:
def scan_aggregation_results():
    i = 0
    partitions = 20
    while i < partitions:
        s = Search(using=elastic, index='my_index').extra(size=0)
        agg = A('terms', field='my_field.keyword', size=999999,
                include={"partition": i, "num_partitions": partitions})
        s.aggs.bucket('my_agg', agg)
        result = s.execute()
        for item in result.aggregations.my_agg.buckets:
            yield item.key
        i = i + 1

# in other parts of the code just do
for item in scan_aggregation_results():
    print(item)  # or do whatever you want with it
The magic here is that Elasticsearch automatically partitions the results into the number of partitions I define (20 here). I just have to set the size large enough to hold a single partition; in this case the result can be up to 20 million items (20 * 999999). If you have far fewer items to return, as I do (around 20,000), you will simply get about 1,000 results per query in your bucket, regardless of the much larger size you defined.
Using the generator construct outlined above, you can then create your own scanner, so to speak, iterating over all results individually, which is just what I wanted.
You should read the documentation.
So in your case, it should look like this:
search.aggs.bucket('per_date', 'terms', field='date')\
    .bucket('response_time_percentile', 'percentiles', field='total_time',
            percents=percentiles, hdr={"number_of_significant_value_digits": 1})[0:50]
response = search.execute()

Performance SQLAlchemy and or

I use the following SQLAlchemy code to retrieve some data from a database:
q = session.query(hd_tbl).\
    join(dt_tbl, hd_tbl.c['data_type'] == dt_tbl.c['ID']).\
    filter(or_(and_(hd_tbl.c['object_id'] == get_id(row['object']),
                    hd_tbl.c['data_type'] == get_id(row['type']),
                    hd_tbl.c['data_provider'] == get_id(row['provider']),
                    hd_tbl.c['data_account'] == get_id(row['account']))
               for index, row in data.iterrows())).\
    with_entities(hd_tbl.c['ID'], hd_tbl.c['object_id'],
                  hd_tbl.c['data_type'], hd_tbl.c['data_provider'],
                  hd_tbl.c['data_account'], dt_tbl.c['value_type'])
where hd_tbl and dt_tbl are two tables in the SQL database, and data is a pandas DataFrame typically containing around 1k-9k entries. hd_tbl currently contains around 90k rows.
The execution time seems to grow exponentially with the length of data. The corresponding SQL statement (generated by SQLAlchemy) looks as follows:
SELECT data_header.`ID`, data_header.object_id, data_header.data_type, data_header.data_provider, data_header.data_account, basedata_data_type.value_type
FROM data_header INNER JOIN basedata_data_type ON data_header.data_type = basedata_data_type.`ID`
WHERE data_header.object_id = %s AND data_header.data_type = %s AND data_header.data_provider = %s AND data_header.data_account = %s OR
data_header.object_id = %s AND data_header.data_type = %s AND data_header.data_provider = %s AND data_header.data_account = %s OR
...
data_header.object_id = %s AND data_header.data_type = %s AND data_header.data_provider = %s AND data_header.data_account = %s OR
The tables and columns are fully indexed, and performance is not satisfactory. Currently it is far faster to read all of hd_tbl and dt_tbl into memory and merge them with pandas' merge function. However, this seems suboptimal. Does anyone have an idea how to improve the SQLAlchemy call?
EDIT:
I was able to improve performance significantly by using SQLAlchemy's tuple_ in the following way:
header_tuples = [tuple([int(y) for y in tuple(x)]) for x in data_as_int.values]
q = session.query(hd_tbl). \
    join(dt_tbl, hd_tbl.c['data_type'] == dt_tbl.c['ID']). \
    filter(tuple_(hd_tbl.c['object_id'], hd_tbl.c['data_type'],
                  hd_tbl.c['data_provider'],
                  hd_tbl.c['data_account']).in_(header_tuples)). \
    with_entities(hd_tbl.c['ID'], hd_tbl.c['object_id'],
                  hd_tbl.c['data_type'], hd_tbl.c['data_provider'],
                  hd_tbl.c['data_account'], dt_tbl.c['value_type'])
with the corresponding query:
SELECT data_header.`ID`, data_header.object_id, data_header.data_type, data_header.data_provider, data_header.data_account, basedata_data_type.value_type
FROM data_header INNER JOIN basedata_data_type ON data_header.data_type = basedata_data_type.`ID`
WHERE (data_header.object_id, data_header.data_type, data_header.data_provider, data_header.data_account) IN ((%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s), (%(param_5)s, ...))
I'd recommend you create a composite index on the fields object_id, data_type, data_provider, ... in the same order in which they appear in the table, and make sure they follow the same order in your WHERE condition. It may speed up your queries a bit at the cost of disk space.
You may also issue several consecutive small SQL queries instead of one large query with a complex OR condition, accumulating the extracted data on the application side or, if the amount is large enough, in fast temporary storage (a temporary table, NoSQL, etc.).
In addition, you can check the MySQL configuration and increase the values related to memory per thread, per request, etc. A good idea is to check whether your composite index fits into available memory; if it does not, it will be of little use.
I guess DB tuning may help a lot here. Otherwise you may need to analyze your application's architecture to get more significant results.
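As a concrete sketch of the composite-index suggestion, the index can be declared against the existing hd_tbl definition (the index name is made up, and engine is assumed to be an Engine connected to the MySQL database):
from sqlalchemy import Index

# Sketch: composite index covering the columns used by the tuple_/OR lookup,
# in the same order as in the WHERE condition. The index name is an assumption.
ix = Index('ix_data_header_lookup',
           hd_tbl.c['object_id'],
           hd_tbl.c['data_type'],
           hd_tbl.c['data_provider'],
           hd_tbl.c['data_account'])
ix.create(bind=engine)  # emits CREATE INDEX against the database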

sqlalchemy query using joinedload exponentially slower with each new filter clause

I have this sqlalchemy query:
query = session.query(Store).options(joinedload('salesmen').
                                     joinedload('comissions').
                                     joinedload('orders')).\
    filter(Store.store_code.in_(selected_stores))
stores = query.all()
for store in stores:
    for salesman in store.salesmen:
        for comission in salesman.comissions:
            # generate html for comissions for each salesman in each store
            # print html document using PySide
            pass
This was working perfectly, however I added two new filter queries:
filter(Comissions.payment_status == 0).\
filter(Order.order_date <= self.dateEdit.date().toPython())
If I add just the first filter, the application hangs for a couple of seconds; if I add both, the application hangs indefinitely.
What am I doing wrong here? How do I make this query fast?
Thank you for your help
EDIT: This is the SQL generated. Unfortunately the class and variable names are in Portuguese; I translated them to English so it would be easier to understand:
Loja = Store, Vendedores = Salesmen, Pedidos = Orders, Comissao = Comission
Query generated:
SELECT "Loja"."CodLoja", "Vendedores_1"."CodVendedor", "Vendedores_1"."NomeVendedor", "Vendedores_1"."CodLoja", "Vendedores_1"."PercentualComissao",
"Vendedores_1"."Ativo", "Comissao_1"."CodComissao", "Comissao_1"."CodVendedor", "Comissao_1"."CodPedido",
"Pedidos_1"."CodPedido", "Pedidos_1"."CodLoja", "Pedidos_1"."CodCliente", "Pedidos_1"."NomeCliente", "Pedidos_1"."EnderecoCliente", "Pedidos_1"."BairroCliente",
"Pedidos_1"."CidadeCliente", "Pedidos_1"."UFCliente", "Pedidos_1"."CEPCliente", "Pedidos_1"."FoneCliente", "Pedidos_1"."Fone2Cliente", "Pedidos_1"."PontoReferenciaCliente",
"Pedidos_1"."DataPedido", "Pedidos_1"."ValorProdutos", "Pedidos_1"."ValorCreditoTroca",
"Pedidos_1"."ValorTotalDoPedido", "Pedidos_1"."Situacao", "Pedidos_1"."Vendeu_Teflon", "Pedidos_1"."ValorTotalTeflon",
"Pedidos_1"."DataVenda", "Pedidos_1"."CodVendedor", "Pedidos_1"."TipoVenda", "Comissao_1"."Valor", "Comissao_1"."DataPagamento", "Comissao_1"."StatusPagamento"
FROM "Comissao", "Pedidos", "Loja" LEFT OUTER JOIN "Vendedores" AS "Vendedores_1" ON "Loja"."CodLoja" = "Vendedores_1"."CodLoja"
LEFT OUTER JOIN "Comissao" AS "Comissao_1" ON "Vendedores_1"."CodVendedor" = "Comissao_1"."CodVendedor" LEFT OUTER JOIN "Pedidos" AS "Pedidos_1" ON "Pedidos_1"."CodPedido" = "Comissao_1"."CodPedido"
WHERE "Loja"."CodLoja" IN (:CodLoja_1) AND "Comissao"."StatusPagamento" = :StatusPagamento_1 AND "Pedidos"."DataPedido" <= :DataPedido_1
Your FROM clause is producing a Cartesian product and includes each table twice: once for filtering the result and once for eagerly loading the relationship.
To stop this, use contains_eager instead of joinedload in your options. It will look for the related attributes in the query's columns instead of constructing an extra join. You will also need to join to the other tables explicitly in your query, e.g.:
query = session.query(Store)\
    .join(Store.salesmen)\
    .join(Store.comissions)\
    .join(Store.orders)\
    .options(contains_eager('salesmen'),
             contains_eager('comissions'),
             contains_eager('orders'))\
    .filter(Store.store_code.in_(selected_stores))\
    .filter(Comissions.payment_status == 0)\
    .filter(Order.order_date <= self.dateEdit.date().toPython())
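To confirm the Cartesian product is gone, you can print the SQL SQLAlchemy now generates and check that each table appears only once in the FROM clause:
# str() on a Query renders the SQL it will emit
print(str(query))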

Filtering a set of data based on indices in line

I have a Python script that pulls data from an external server's SQL database and sums the values based on transaction numbers. I've gotten some assistance in cleaning up the result sets, which has been a huge help, but now I've hit another problem.
My original query:
SELECT th.trans_ref_no, th.doc_no, th.folio_yr, th.folio_mo, th.transaction_date, tc.prod_id, tc.gr_gals
FROM TransHeader th, TransComponents tc
WHERE th.term_id="%s" and th.source="L" and th.folio_yr="%s" and th.folio_mo="%s"
  and (tc.prod_id="TEXLED" or tc.prod_id="103349" or tc.prod_id="103360" or tc.prod_id="103370" or tc.prod_id="113107" or tc.prod_id="113093")
  and th.trans_ref_no=tc.trans_ref_no;
It returns a set of data; I've copied a snippet here:
"0520227370","0001063257","2014","01","140101","113107","000002000"
"0520227370","0001063257","2014","01","140101","TEXLED","000002550"
"0520227378","0001063265","2014","01","140101","113107","000001980"
"0520227378","0001063265","2014","01","140101","TEXLED","000002521"
"0520227380","0001063267","2014","01","140101","113107","000001500"
"0520227380","0001063267","2014","01","140101","TEXLED","000001911"
"0520227384","0001063271","2014","01","140101","113107","000003501"
"0520227384","0001063271","2014","01","140101","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000004000"
"0520227384","0001063271","2014","01","140101","TEXLED","000005103"
"0520227385","0001063272","2014","01","140101","113107","000007500"
"0520227385","0001063272","2014","01","140101","TEXLED","000009565"
"0520227388","0001063275","2014","01","140101","113107","000002000"
"0520227388","0001063275","2014","01","140101","TEXLED","000002553"
The updated query runs this twice and joins on trans_ref_no, which is the first position in the result set, so the first six lines get condensed into three and the last four lines get condensed into two. The problem I'm having is getting transaction number 0520227384 condensed into two lines.
SELECT t1.trans_ref_no, t1.doc_no, t1.folio_yr, t1.folio_mo, t1.transaction_date, t1.prod_id, t1.gr_gals, t2.prod_id, t2.gr_gals
FROM (SELECT th.trans_ref_no, th.doc_no, th.folio_yr, th.folio_mo, th.transaction_date, tc.prod_id, tc.gr_gals
      FROM Tms6Data.TransHeader th, Tms6Data.TransComponents tc
      WHERE th.term_id="00000MA" and th.source="L" and th.folio_yr="2014" and th.folio_mo="01"
        and (tc.prod_id="103349" or tc.prod_id="103360" or tc.prod_id="103370" or tc.prod_id="113107" or tc.prod_id="113093")
        and th.trans_ref_no=tc.trans_ref_no) t1
JOIN (SELECT th.trans_ref_no, th.doc_no, th.folio_yr, th.folio_mo, th.transaction_date, tc.prod_id, tc.gr_gals
      FROM Tms6Data.TransHeader th, Tms6Data.TransComponents tc
      WHERE th.term_id="00000MA" and th.source="L" and th.folio_yr="2014" and th.folio_mo="01"
        and tc.prod_id="TEXLED" and th.trans_ref_no=tc.trans_ref_no) t2
ON t1.trans_ref_no = t2.trans_ref_no;
Here is what the new query returns for transaction number 0520227384:
"0520227384","0001063271","2014","01","140101","113107","000003501","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000003501","TEXLED","000005103"
"0520227384","0001063271","2014","01","140101","113107","000004000","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000004000","TEXLED","000005103"
What I need to get out of this is a set of condensed lines where, in this group, the second and third rows are removed:
"0520227384","0001063271","2014","01","140101","113107","000003501","TEXLED","000004463"
"0520227384","0001063271","2014","01","140101","113107","000004000","TEXLED","000005103"
How can I go about filtering these lines from the updated query result set?
I think the answer is:
(... your heavy sql ..) group by 7
or
(... your heavy sql ..) group by t1.gr_gals
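If it is easier to handle on the Python side, another option is to pair the rows from the original single query yourself: group by transaction number, split the TEXLED rows from the product rows, and zip them positionally. A sketch, assuming rows is the list of tuples returned by the original query in the column order shown above (the positional pairing of product and TEXLED rows is an assumption):
from collections import defaultdict

def pair_components(rows):
    """Sketch: pair each product row with a TEXLED row from the same
    transaction, assuming they appear in matching order."""
    by_trans = defaultdict(lambda: {'prod': [], 'texled': []})
    for row in rows:
        prod_id = row[5]  # column order: trans_ref_no, doc_no, yr, mo, date, prod_id, gr_gals
        key = 'texled' if prod_id == 'TEXLED' else 'prod'
        by_trans[row[0]][key].append(row)
    paired = []
    for trans_ref_no, groups in by_trans.items():
        for prod_row, tex_row in zip(groups['prod'], groups['texled']):
            paired.append(prod_row + (tex_row[5], tex_row[6]))
    return paired
For transaction 0520227384 this yields exactly the two condensed lines shown above instead of the four-row cross product.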
