I have an SQL database that I'm pulling values with a WHERE clause to filter. However, I want to see if an INTEGER field value is a substring of a TEXT field value. The caveat is that I need to use wildcards.
Some sample data:
id   components
123  '234,345'
234  NULL
345  NULL
456  NULL
So I thought to use CAST(id AS TEXT) and concatenate it with '%' wildcard characters (using ||) in case the id is in the middle of the components string. But this doesn't work, i.e.
SELECT t1.*, t2.* FROM table t1 JOIN table t2
WHERE (t1.components IS NULL OR
       (t1.components IS NOT NULL AND t1.components
        NOT LIKE '%' || CAST(t2.id AS TEXT) || '%'))
   OR (t2.components IS NULL OR
       (t2.components IS NOT NULL AND t2.components
        NOT LIKE '%' || CAST(t1.id AS TEXT) || '%'))
This should NOT return the record pairs [123,234] or [123,345].
In short, how do I get '%' || CAST(t2.id AS TEXT) || '%' read in as '%234%' in my query?
Thanks!
UPDATE 9-29-2014: OK, I figured it out; it does work. I just had a problem with my parentheses and had an OR where I should have had an AND. The new working code (though probably not optimal) is:
SELECT t1.*, t2.* FROM table t1 JOIN table t2
WHERE (t1.components IS NULL OR
       (t1.components IS NOT NULL AND t1.components
        NOT LIKE '%' || CAST(t2.id AS TEXT) || '%'))
  AND (t2.components IS NULL OR
       (t2.components IS NOT NULL AND t2.components
        NOT LIKE '%' || CAST(t1.id AS TEXT) || '%'))
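As a sanity check, the corrected predicate can be run against the sample data with Python's sqlite3 module. This is a sketch: the table is named t here (table is a reserved word), the bare JOIN is a cross join in SQLite, and the redundant IS NOT NULL branch is dropped since LIKE on a NULL column is never true anyway:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, components TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (123, "234,345"), (234, None), (345, None), (456, None),
])

# Keep a pair only when neither side lists the other's id in components.
rows = conn.execute("""
    SELECT t1.id, t2.id
    FROM t t1 JOIN t t2
    WHERE (t1.components IS NULL OR
           t1.components NOT LIKE '%' || CAST(t2.id AS TEXT) || '%')
      AND (t2.components IS NULL OR
           t2.components NOT LIKE '%' || CAST(t1.id AS TEXT) || '%')
""").fetchall()
```

One caveat with the bare wildcards: '%234%' would also match a components value like '1234'. If that matters, matching ',' || components || ',' against '%,' || CAST(id AS TEXT) || ',%' keeps the comparison to whole comma-separated items.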
I want to retrieve all the rows of a table where a substring "h" is contained in any of the columns.
I tried something like this:
list_of_columns = [c for c in my_table.columns]  # where my_table is of class Table

with my_engine.connect() as conn:
    result = conn.execute(select(my_table).where(
        list_of_columns.contains(q),
    ))
Of course this does not work, as "contains()" should be called on a single column...
Any idea?
P.S.: the retrieval of the columns must be dynamic; that is how my code has to work.
[EDIT]
An almost working solution:
with my_engine.connect() as conn:
    result = conn.execute(select(my_table).where(
        or_(
            list_of_columns[0].contains(q),
            list_of_columns[1].contains(q),
            ...
        )
    ))
But I need the listing of the columns to be dynamic.
[EDIT 2]
Here is the "computers1" table that I am trying to query, with two rows:
Here is the entire SQL statement sent (I forced a search for the string 'eee'):
[SQL: SELECT computers1.id, computers1.name, computers1.ip, computers1.options
FROM computers1
WHERE (computers1.id LIKE '%%' || %(id_1)s || '%%') OR (computers1.name LIKE '%%' || %(name_1)s || '%%') OR (computers1.ip LIKE '%%' || %(ip_1)s || '%%') OR (computers1.options LIKE '%%' || %(options_1)s || '%%')]
[parameters: {'id_1': 'eee', 'name_1': 'eee', 'ip_1': 'eee', 'options_1': 'eee'}]
But still, conn.execute(THE_SENTENCE).fetchall() returns no rows...
You can use a list comprehension to add all the columns to the query:
with my_engine.connect() as conn:
    result = conn.execute(select(my_table).where(
        or_(*[col.contains(q) for col in list_of_columns])
    ))
For this kind of search, you might also get better performance and results by using PostgreSQL's full-text search functionality and creating a tsvector that combines all of the columns: https://stackoverflow.com/a/42390204/16811479
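If the search term is a string but some columns are numeric (as with computers1.id), calling contains() on the raw column can fail or match nothing; one hedged fix is to cast every column to String first. A self-contained sketch against an in-memory SQLite database (the schema and data here are invented for the demo, loosely following the computers1 table):

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, cast,
                        create_engine, insert, or_, select)

# Demo schema loosely following the computers1 table; the data is invented.
metadata = MetaData()
computers = Table(
    "computers1", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("ip", String),
)

engine = create_engine("sqlite://")  # in-memory database for the demo
metadata.create_all(engine)

q = "eee"
with engine.connect() as conn:
    conn.execute(insert(computers), [
        {"id": 1, "name": "eee-pc", "ip": "10.0.0.1"},
        {"id": 2, "name": "laptop", "ip": "10.0.0.2"},
    ])
    # Cast each column to String so contains() also works on integer columns.
    rows = conn.execute(select(computers).where(
        or_(*[cast(col, String).contains(q) for col in computers.columns])
    )).fetchall()
```

cast(col, String) wraps each column in a CAST(... AS VARCHAR) before the LIKE comparison, so the same comprehension works whatever the column types are.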
I have this query written in SQL, and it works, but when I rewrite it in PySpark it does not return the desired output.
SQL code
select count(distinct(studen_num)) from user
where email is null
and matnumber is null
and (academic_history is null or exists (academic_history, x -> x.grade = "A" and
x.level is null))
and graduation_status is null
and exists (investigated_result, x -> x.panelseated[0].grade = "1" and x.level is null)
This is what I have done
df_count_student = df.filter(
    df.email.isNull() & df.matnumber.isNull()
    & (df.academic_history.isNull()
       | (array_contains(df.academic_history.grade, "A")
          & df.academic_history.level.isNull()))
    & (array_contains(df.investigated_result.panelseated[0].grade, "1")
       & df.investigated_result.level.isNull()))
display(df_count_student.select(countDistinct("studen_num")))
The problem is that my count displayed 0, but I got the right value when I tried the SQL code.
Hi, I'm trying to make a Python script that updates a line each time the script is run.
I'm having a bit of a head-scratcher over how best to tackle the part where I have to update each line (a select statement) based on the value of a dataframe.
Simplified: I have the string 'select null r_cnt, null t_cnt, null r_dur, null t_dur from my_table'
and a list containing the fields for this line: ['t_cnt', 'r_dur'].
I then want the new string to be the first string with the null removed in front of the values present in my list, while keeping null in front of those not in the list:
'select null r_cnt, t_cnt, r_dur, null t_dur from my_table'
My whole code looks something like this; below is the point where I'm stuck:
str_to_execute = f"select * from {db}.table_desc where grp_id in (400,500,300,1200) and id not in (127,140,125)"
cursor.execute(str_to_execute)
df = as_pandas(cursor)
for index, row in df.iterrows():
    # print(row['name'])
    str_to_execute = f"SHOW COLUMN STATS {db}.ctrl_{row['id']}"
    cursor.execute(str_to_execute)
    loop = as_pandas(cursor)
    for index, row in loop.iterrows():
        print(row['Column'])
        str_to_execute = f"select concat(cast(ctrl_id as string),cast(ctrl_date as string)) primarykey, ctrl_id, ctrl_date, null r_cnt, null t_cnt, null r_dur, null t_dur, null r_amt, null t_amt, null p_cnt, null p_dur, null p_amt, null ro_vol, null t_vol, null r_vol, null p_vol, null ro_amt, null ro_cnt from {db}.ctrl_{row['id']}"
        if ...:  # This is where I'm stuck
Try:
s = 'select null r_cnt, null t_cnt, null r_dur, null t_dur from my_table'
lst = ['t_cnt', 'r_dur']
checklist = ['null r_cnt', 'null t_cnt', 'null r_dur']
checkliststr = ','.join(checklist)
for itm in lst:
    if itm in checkliststr:
        print('null ' + itm)
        s = s.replace('null ' + itm, itm)
print(s)
You could split your string into its fixed parts (the leading select and the trailing from my_table) and the variable part in between, which is, if I understand you correctly, more or less a list of strings.
One possible solution would be the following:
Define or get your lists of strings:
origin_list = ["null r_cnt", "null t_cnt", "null r_dur", "null t_dur"]
goal_list = ["t_cnt", "r_dur"]
For each element of origin_list, you want to edit the element according to whether it appears in goal_list. I would do it with map and a lambda:
edited_list = list(map(lambda x: edit(x, goal_list), origin_list))
Now we have to define the edit function; from your post I derived the following logic, or something similar:
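A sketch of that edit function (the name and signature come from the map call above; the logic is my reading of the post: drop the "null " prefix exactly when the bare column name is wanted):

```python
def edit(item, goal_list):
    """Drop the 'null ' prefix when the bare column name is in goal_list."""
    name = item.replace("null ", "", 1)
    return name if name in goal_list else item

origin_list = ["null r_cnt", "null t_cnt", "null r_dur", "null t_dur"]
goal_list = ["t_cnt", "r_dur"]
edited_list = list(map(lambda x: edit(x, goal_list), origin_list))
# edited_list == ['null r_cnt', 't_cnt', 'r_dur', 'null t_dur']
```

Joining the result with ", ".join(edited_list) and re-attaching the select and from parts rebuilds the full statement.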
Now you have the adjusted list and can merge it back together with the fixed parts of your statement.
original data:
How the data should look after the required transformation:
I have tried the melt function in pandas, but I am only able to pivot on one column. I am sure I must be missing something.
Below is for BigQuery Standard SQL
execute immediate (
with types as (
select
array_to_string(types, ',') values_list,
regexp_replace(array_to_string(types, ','), r'([^,]+)', r'"\1"') columns_list
from (
select regexp_extract_all(to_json_string(t), r'"([^""]+)":') types
from (
select * except(Country, Branch, Category)
from `project.dataset.your_table` limit 1
) t
)
), categories as (
select distinct Category
from `project.dataset.your_table`
)
select '''
select Country, Branch, Output, ''' ||
(select string_agg('''
max(if(Category = "''' || Category || '''", val, null)) as ''' || Category )
from categories)
|| '''
from (
select Country, Branch, Category,
type[offset(offset)] Output, val
from `project.dataset.your_table` t,
unnest([''' || values_list || ''']) val with offset,
unnest([struct([''' || columns_list || '''] as type)])
)
group by Country, Branch, Output
'''
from types
);
If applied to the sample data in your question, the output is:
I have to check if there are any nulls in my database.
I need to check 11 columns (combined with OR), plus filter by year (e.g. LIKE '2017%').
def test():
    sql = "select date from a000760 where (total_assets is null or total_liabilities is null or sales_figures is null or sales_cost is null or business_profits is null or gross_margin is null or current_income is null or depreciation_expense_of_tangible_assets is null or liquid_asset is null or noncurrent_asset is null or liquid_liability is null) and (date like '2010%')"
    curs.execute(sql)
    #year = "2010"
    #curs.execute("select date from a000760 where (total_assets is null or total_liabilities is null or sales_figures is null or sales_cost is null or business_profits is null or gross_margin is null or current_income is null or depreciation_expense_of_tangible_assets is null or liquid_asset is null or noncurrent_asset is null or liquid_liability is null) and (date like %s)", year)
    result = curs.fetchall()
    if len(result) > 0:  # print shows () even when empty, so I check the length
        print "a000760"
        print "2010 null exists"
This is the test version for one table; I have to check more than 2000 tables.
The def works for this one table but shows the warning below.
It doesn't work across all the tables.
And I get this error:
Warning: (1292, "Incorrect date value: '2010%' for column 'date' at row 1")
I don't know how to fix it.
I've searched through the grammar, but when I type %2017% it doesn't work.
Do not use like with dates! Dates are not strings.
You can just do:
year(date) = 2010
Or:
date >= '2010-01-01' and date < '2011-01-01'
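The range form also works with a parameterized query, which sidesteps the string-interpolation issue from the commented-out attempt. A minimal sketch with sqlite3, ISO date strings, and a cut-down one-column version of the a000760 table (names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a000760 (date TEXT, total_assets REAL)")
conn.executemany("INSERT INTO a000760 VALUES (?, ?)", [
    ("2010-03-31", None),
    ("2010-06-30", 10.0),
    ("2011-03-31", None),
])

# A half-open range replaces LIKE '2010%': it is valid on real DATE columns
# and lets the database use an index on the date column.
rows = conn.execute(
    "SELECT date FROM a000760 "
    "WHERE total_assets IS NULL AND date >= ? AND date < ?",
    ("2010-01-01", "2011-01-01"),
).fetchall()
```

Only the null 2010 row comes back; the 2011 null row falls outside the range.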