Filter SQL elements with adjacent ID - python

I don't really know how to properly state this question in the title.
Suppose I have a table Word like the following:
| id | text |
| --- | --- |
| 0 | Hello |
| 1 | Adam |
| 2 | Hello |
| 3 | Max |
| 4 | foo |
| 5 | bar |
Is it possible to query this table based on text and receive the objects whose primary key (id) is exactly one off?
So, if I do
Word.objects.filter(text='Hello')
I get a QuerySet containing the rows
| id | text |
| --- | --- |
| 0 | Hello |
| 2 | Hello |
but I want the rows
| id | text |
| --- | --- |
| 1 | Adam |
| 3 | Max |
I guess I could do
word_ids = Word.objects.filter(text='Hello').values_list('id', flat=True)
word_ids = [w_id + 1 for w_id in word_ids] # or use a numpy array for this
Word.objects.filter(id__in=word_ids)
but that doesn't seem overly efficient. Is there a straight SQL way to do this in one call? Preferably directly using Django's QuerySets?
EDIT: The idea is that in fact I want to filter those words that are in the second QuerySet. Something like:
Word.objects.filter(text__of__previous__word='Hello', text='Max')

In plain Postgres you could use the lag window function (https://www.postgresql.org/docs/current/static/functions-window.html)
SELECT
id,
name
FROM (
SELECT
*,
lag(name) OVER (ORDER BY id) as prev_name
FROM test
) s
WHERE prev_name = 'Hello'
The lag function adds a column with the text of the previous row. So you can filter by this text in a subquery.
I am not familiar with Django, but according to its documentation, support for window functions was added in version 2.0.
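To try the lag() approach on the question's own data without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module. Using SQLite instead of Postgres is an assumption made for the demo (window functions need SQLite 3.25 or newer), and the table name `word` and the helper `rows_following` are illustrative, not part of the original answer.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO word (id, text) VALUES (?, ?)",
    [(0, "Hello"), (1, "Adam"), (2, "Hello"), (3, "Max"), (4, "foo"), (5, "bar")],
)


def rows_following(conn, previous_text):
    """Return rows whose predecessor (in id order) has the given text."""
    query = """
        SELECT id, text
        FROM (
            SELECT id, text, lag(text) OVER (ORDER BY id) AS prev_text
            FROM word
        ) AS s
        WHERE prev_text = ?
        ORDER BY id
    """
    return conn.execute(query, (previous_text,)).fetchall()


following_hello = rows_following(conn, "Hello")
print(following_hello)
```

For the table in the question this should return the Adam and Max rows, which is the result the question asks for.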

If by "1 off" you mean that the difference is exactly 1, then you can do:
select w.*
from words w
where w.id in (select w2.id + 1 from words w2 where w2.text = 'Hello');
lag() is also a very reasonable solution. This seems like a direct interpretation of your question. If you have gaps (and the intention is + 1), then lag() is a bit trickier.
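The difference between the two approaches only shows up when ids have gaps. The sketch below, again using Python's sqlite3 as a stand-in database (an assumption, as is the sample data with id 3 missing), runs both queries side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, text TEXT)")
# Hypothetical data with a gap: there is no row with id 3.
conn.executemany(
    "INSERT INTO words (id, text) VALUES (?, ?)",
    [(0, "Hello"), (1, "Adam"), (2, "Hello"), (4, "Max")],
)

# Strict "id + 1" interpretation: only ids exactly one higher match.
plus_one = conn.execute(
    """
    SELECT w.id, w.text
    FROM words w
    WHERE w.id IN (SELECT w2.id + 1 FROM words w2 WHERE w2.text = ?)
    ORDER BY w.id
    """,
    ("Hello",),
).fetchall()

# lag() interpretation: the next existing row in id order, gaps ignored.
next_row = conn.execute(
    """
    SELECT id, text
    FROM (
        SELECT id, text, lag(text) OVER (ORDER BY id) AS prev_text
        FROM words
    ) AS s
    WHERE prev_text = ?
    ORDER BY id
    """,
    ("Hello",),
).fetchall()

print(plus_one)
print(next_row)
```

With the gap at id 3, the subquery version skips Max (id 4 is not 2 + 1), while the lag() version still returns it. Which one is right depends on what "one off" is meant to mean.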

Related

How to create a table from another table with GridDB?

I have a GridDB container where I have stored my database. I want to copy the table but this would exclude a few columns. The function I need should extract all columns matching a given keyword and then create a new table from that. It must always include the first column *id because it is needed on every table.
For example, in the table given below:
| employee_id | department_id | employee_first_name | employee_last_name | employee_gender |
| --- | --- | --- | --- | --- |
| 1 | 1 | John | Matthew | M |
| 2 | 1 | Alexandra | Philips | F |
| 3 | 2 | Hen | Lotte | M |
Suppose I need to get the first column and every other column starting with "employee". How can I do this through a Python function?
I am using GridDB Python client on my Ubuntu machine and I have already stored the database.csv file in the container. Thanks in advance for your help!
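The column-selection part of this can be sketched in plain Python, independent of any database client. The function names `select_columns` and `project_rows` are hypothetical, and reading rows from the GridDB container and writing the new table are deliberately left out because they depend on the client API.

```python
def select_columns(column_names, keyword):
    """Keep the first column plus every other column starting with keyword."""
    if not column_names:
        return []
    first, rest = column_names[0], column_names[1:]
    return [first] + [name for name in rest if name.startswith(keyword)]


def project_rows(column_names, rows, keyword):
    """Return the kept column names and the rows reduced to those columns.

    Assumes column names are unique.
    """
    kept = select_columns(column_names, keyword)
    indexes = [column_names.index(name) for name in kept]
    return kept, [[row[i] for i in indexes] for row in rows]


columns = [
    "employee_id",
    "department_id",
    "employee_first_name",
    "employee_last_name",
    "employee_gender",
]
rows = [
    [1, 1, "John", "Matthew", "M"],
    [2, 1, "Alexandra", "Philips", "F"],
    [3, 2, "Hen", "Lotte", "M"],
]

kept_columns, kept_rows = project_rows(columns, rows, "employee")
print(kept_columns)
print(kept_rows)
```

For the example table this drops only department_id. The resulting column list and rows would then be what gets written to the new container.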

delete duplicates between two rows Tableau

How can I remove consecutive duplicate values in Tableau, keeping only the first row of each run, for each user id?
For example, for a certain user:
| status | date |
| -------- | -------------- |
| success| 1/1/2022|
| fail| 1/2/2022|
| fail| 1/3/2022|
| fail| 1/4/2022|
| success| 1/5/2022|
I want the results to be:
| status | date |
| -------- | -------------- |
| success| 1/1/2022|
| fail| 1/2/2022|
| success| 1/5/2022|
In Python it would be like this:
edited_data = []
for key in d:
    # dup[i] is True when row i should be kept; the first row is always kept
    dup = [True]
    total_len = len(d[key].index)
    for i in range(1, total_len):
        # drop a row when its status repeats the previous row's status
        if d[key].iloc[i]['status'] == d[key].iloc[i-1]['status']:
            dup.append(False)
        else:
            dup.append(True)
    edited_data.append(d[key][dup])
One way you could do this is with the LOOKUP() function. Since this particular problem requires each row to know what came before it, it will be important to make sure your dates are sorted correctly and that the table calculation is computed correctly. Something like this should work:
IF LOOKUP(MIN([Status]),-1) = MIN([Status]) THEN "Hide" ELSE "Show" END
And then simply hide or exclude the "Hide" rows.
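The logic behind the LOOKUP(..., -1) comparison is "keep a row only if its status differs from the previous row's". A minimal plain-Python sketch of that rule, assuming the rows are already sorted by date and belong to a single user (the function name `drop_consecutive_repeats` is illustrative):

```python
def drop_consecutive_repeats(rows, key="status"):
    """Keep only rows whose value for `key` differs from the previous row's."""
    kept = []
    previous = object()  # sentinel that never equals a real status
    for row in rows:
        if row[key] != previous:
            kept.append(row)
        previous = row[key]
    return kept


user_rows = [
    {"status": "success", "date": "1/1/2022"},
    {"status": "fail", "date": "1/2/2022"},
    {"status": "fail", "date": "1/3/2022"},
    {"status": "fail", "date": "1/4/2022"},
    {"status": "success", "date": "1/5/2022"},
]

shown = drop_consecutive_repeats(user_rows)
for row in shown:
    print(row["status"], row["date"])
```

The rows that survive correspond to the ones the calculated field labels "Show".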

Calculate lag difference in group

I am trying to solve a problem using just SQL (I am able to do this when combining SQL and Python).
Basically what I want to do is to calculate score changes per candidate, where a score consists of joining a score lookup table and then summing these individual event scores. If a candidate fails, they are required to retake the events. Here is an example output:
| brandi_id | retest | total_score |
|-----------|--------|-------------|
| 1 | true | 128 |
| 1 | false | 234 |
| 2 | true | 200 |
| 2 | false | 230 |
| 3 | false | 265 |
What I want is to first only calculate a score change for those candidates who took a retest, where the score change will just be the difference in total_score for retest is true minus retest = false:
| brandi_id | difference |
|-----------|------------|
| 1 | 106 |
| 2 | 30 |
This is the SQL that I am using (with this I need to use Python)
select e.brandi_id, e.retest, sum(sl.scaled_score) as total_score
from event as e
left join apf_score_lookup as sl
on sl.asmnt_code = e.asmnt_code
and sl.raw_score = e.score
where e.asmnt_code in ('APFPS','APFSU','APF2M')
group by e.brandi_id, e.retest
order by e.brandi_id;
I think the solution involves using LAG and PARTITION but I cannot get it. Thanks!
If someone does the retest only once, then you can use a join:
select tc.*, tr.score, (tc.score - tr.score) as diff
from t tc join
t tr
on tc.brandi_id = tr.brandi_id and
tc.retest = 'true' and tr.retest = 'false';
You don't describe your table layout. If the results are from the query in your question, you can just plug that in as a CTE.
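As a rough illustration of the CTE idea, the sketch below loads the question's example output into an in-memory SQLite table via Python's sqlite3 and self-joins it. Several things here are assumptions: the `scores` table stands in for the question's aggregate query, retest is stored as the strings 'true' and 'false', and the subtraction is written as non-retest minus retest so that the result matches the expected table in the question (the join above subtracts in the other direction).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (brandi_id INTEGER, retest TEXT, total_score INTEGER)"
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (1, "true", 128),
        (1, "false", 234),
        (2, "true", 200),
        (2, "false", 230),
        (3, "false", 265),
    ],
)

query = """
    WITH t AS (
        -- stand-in for the question's aggregate query
        SELECT brandi_id, retest, total_score FROM scores
    )
    SELECT tr.brandi_id,
           tf.total_score - tr.total_score AS difference
    FROM t AS tr
    JOIN t AS tf
      ON tr.brandi_id = tf.brandi_id
     AND tr.retest = 'true'
     AND tf.retest = 'false'
    ORDER BY tr.brandi_id
"""

differences = conn.execute(query).fetchall()
print(differences)
```

Candidate 3 has no retest row, so the inner join drops them, which is the filtering the question asks for.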

flask-sqlalchemy count function

Consider a table named result with the following schema
+----+-----+---------+
| id | tag | user_id |
+----+-----+---------+
| 0 | A | 0 |
| 1 | A | 0 |
| 2 | B | 0 |
| 3 | B | 0 |
+----+-----+---------+
For the user with id=0, I would like to count the number of times a result with tag=A has appeared. For now I have implemented it using a raw SQL statement:
db.session.execute('select tag, count(tag) from result where user_id = :id group by tag', {'id':user.id})
How can I write it using flask-sqlalchemy APIs?
Most of the results I find mention the SQLAlchemy function db.func.count(), which is either not available in flask-sqlalchemy or lives at a different path that I am not aware of.
I was using PyCharm as my IDE and it was not showing module members correctly, hence I thought count is missing. Here is my solution for the above
user.results.add_columns(Result.tag, db.func.count(Result.tag)).group_by(Result.tag).all()

Identify duplicate values in dictionary and print in a table

I have a dictionary (d) where every key can have multiple values (appended as a list).
For example, the dictionary has the following two key/value pairs, where one has duplicate values while the other doesn't:
SPECIFIC-THREATS , ['5 SPECIFIC-THREATS Microsoft Windows print
spooler little endian DoS attempt', '4 SPECIFIC-THREATS obfuscated
RealPlayer Ierpplug.dll ActiveX exploit attempt', '4 SPECIFIC-THREATS
obfuscated RealPlayer Ierpplug.dll ActiveX exploit attempt']
and
TELNET , ['1 TELNET bsd exploit client finishing']
I want to go through the whole dictionary, check if any key has duplicate values and then print results in a table which has key, number of duplicate values, value (which appears multiple times) etc. as columns.
Here is what I have so far:
import texttable
import collections
def dupechecker():
    t = texttable.Texttable()
    for key, value in d.iteritems():
        for x, y in collections.Counter(value).items():
            if y > 1:
                t.add_rows([["Category", "Number of dupe values", "Value which appears multiple times"], [key, y, x]])
                print t.draw()
It works, but the keys which do not have any duplicate values (i.e. TELNET in this case) won't appear in the table output (since the table is printed inside the if statement). This is what I am getting:
+-------------------------+-------------------------+-------------------------+
| Category | Number of dupe values | Value which appears |
| | | multiple times |
+=========================+=========================+=========================+
| SPECIFIC-THREATS | 2 | 4 SPECIFIC-THREATS |
| | | obfuscated RealPlayer |
| | | Ierpplug.dll ActiveX |
| | | exploit attempt |
+-------------------------+-------------------------+-------------------------+
Is there any way I can keep track of the interesting parameters (number of duplicate values and the value which appears multiple times) for each key and then print them together? I want the output to be like:
+-------------------------+-------------------------+-------------------------+
| Category | Number of dupe values | Value which appears |
| | | multiple times |
+=========================+=========================+=========================+
| SPECIFIC-THREATS | 2 | 4 SPECIFIC-THREATS |
| | | obfuscated RealPlayer |
| | | Ierpplug.dll ActiveX |
| | | exploit attempt |
+-------------------------+-------------------------+-------------------------+
| TELNET | 0 | |
| | | |
| | | |
| | | |
+-------------------------+-------------------------+-------------------------+
UPDATE
Resolved
Just change your dupechecker to also add a row for keys without duplicates (only once per category), set the header before the loop, and print the table when you are done.
def dupechecker():
    t = texttable.Texttable()
    t.header(["Category", "Number of dupe values", "Value which appears multiple times"])
    for key, value in d.iteritems():
        has_dupe = False
        for x, y in collections.Counter(value).items():
            if y > 1:
                has_dupe = True
                t.add_row([key, y, x])
        # no duplicates for this key: add a single placeholder row
        if not has_dupe:
            t.add_row([key, 0, ''])
    print t.draw()
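The row-building logic can also be separated from the table rendering, which makes it easy to check on its own. Here is a minimal Python 3 sketch of that idea; the function name `dupe_rows` is illustrative, and rendering is left out so the example does not depend on texttable.

```python
from collections import Counter


def dupe_rows(d):
    """Build (category, count, value) rows, with a placeholder for clean keys."""
    rows = []
    for key, values in d.items():
        dupes = [(value, count) for value, count in Counter(values).items() if count > 1]
        if dupes:
            rows.extend((key, count, value) for value, count in dupes)
        else:
            # no duplicates for this key: one row with a zero count
            rows.append((key, 0, ""))
    return rows


d = {
    "SPECIFIC-THREATS": [
        "5 SPECIFIC-THREATS Microsoft Windows print spooler little endian DoS attempt",
        "4 SPECIFIC-THREATS obfuscated RealPlayer Ierpplug.dll ActiveX exploit attempt",
        "4 SPECIFIC-THREATS obfuscated RealPlayer Ierpplug.dll ActiveX exploit attempt",
    ],
    "TELNET": ["1 TELNET bsd exploit client finishing"],
}

rows = dupe_rows(d)
for row in rows:
    print(row)
```

Each resulting tuple could then be passed to t.add_row() after setting the header, as in the answer above.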
