Django - PostgreSQL - JSON field - #>> operator - Index out of range - Python

In a Django project that uses a PostgreSQL database, there is a table called 'table1' with a JSON field called 'data'. In this JSON field, we store an email under a dynamic key. For example:
ID | DATA
1 | '{"0_email": "user1@mail.com"}'
2 | '{"3_email": "user2@mail.com"}'
3 | '{"1_email": "user3@mail.com"}'
Problem Statement:
Find the rows in which "user2@mail.com" exists in the "data" field.
My Approach:
from django.db import connection, transaction

@transaction.atomic  # DECLARE ... CURSOR only works inside a transaction block
def run(given_email):
    with connection.cursor() as crsr:
        crsr.execute(
            """
            DECLARE mycursor CURSOR FOR
            SELECT id, data
            FROM table1
            WHERE
            data #>> '{}' like '%\"%s\"%'
            """,
            [given_email]
        )
        while True:
            crsr.execute("FETCH 10 FROM mycursor")
            chunk = crsr.fetchall()
            if not chunk:  # the cursor is exhausted
                break
            # DO SOME OPERATIONS...
Explanation for data #>> '{}' like '%\"%s\"%':
I am using the #>> operator to get the object at a specific JSON path as text.
I am passing '{}', an empty path, so that I get the complete JSON as text.
In this stringified JSON, I check whether given_email (user2@mail.com in the example above) is present.
I call this function from a Django API endpoint that receives given_email in its payload. When the function is triggered, I get the following error:
Traceback (most recent call last):
  File "project/lib/python3.9/site-packages/django_extensions/management/debug_cursor.py", line 49, in execute
    return utils.CursorWrapper.execute(self, sql, params)
  File "project/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
  File "project/lib/python3.9/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
  File "project/lib/python3.9/site-packages/django/db/backends/utils.py", line 87, in _execute
    self.db.validate_no_broken_transaction()
IndexError: tuple index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "", line 1, in
  File "project/lib/python3.9/site-packages/django_extensions/management/debug_cursor.py", line 54, in execute
    raw_sql = raw_sql[:truncate]
TypeError: 'NoneType' object is not subscriptable
Observations:
I don't think there is anything wrong with the query itself. In fact, I have run it in DBeaver and I get the expected response.
I suspect that the '{}' part of the query is causing the issue, so I have tried replacing it with '\\{\\}' and '{{}}', but neither worked.

First, '{}' is interpreted as text, whereas a JSON path is of type text[].
So you should cast it to text[] in your query: WHERE data #>> '{}'::text[] like '%\"%s\"%'
Then data #>> '{}'::text[] can be simplified to data::text, which gives the same result.
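A related detail about the question's code: whenever parameters are passed to cursor.execute(), the driver treats every % in the SQL text as placeholder syntax, so literal % characters must be doubled (%%). Another way around this is to build the whole LIKE pattern in Python and bind it as a single parameter. This is a minimal sketch assuming the table1/data schema from the question; escape_like and email_like_query are hypothetical helper names, and escaping % and _ assumes PostgreSQL's default LIKE escape character (backslash).

```python
def escape_like(value, escape_char="\\"):
    """Escape LIKE wildcards so they match literally.

    Assumes PostgreSQL's default LIKE escape character (backslash).
    """
    return (
        value.replace(escape_char, escape_char * 2)  # escape the escape char first
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")  # "_" is common in emails
    )


def email_like_query(given_email):
    """Return (sql, params) for cursor.execute(sql, params).

    The % wildcards and the surrounding JSON quotes live in the bound
    parameter, so the SQL text contains nothing but the %s placeholder.
    """
    sql = "SELECT id, data FROM table1 WHERE data::text LIKE %s"
    pattern = '%"' + escape_like(given_email) + '"%'
    return sql, [pattern]
```

Inside the question's run() function this could be used as sql, params = email_like_query(given_email) followed by crsr.execute("DECLARE mycursor CURSOR FOR " + sql, params).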
Finally, this converts the JSON data to text and then applies the LIKE pattern-matching operator. That can give unexpected results for values that contain the expected email as a substring without being equal to it.
To get exact matches for the expected email, use a jsonb function instead, for example:
WHERE jsonb_path_exists(data::jsonb, ('$.* ? (@ == "' || expected_email || '")')::jsonpath)
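The string concatenation above works in plain SQL, but when the query is sent from Python the email is better passed as a bound parameter. Assuming PostgreSQL 12 or later, jsonb_path_exists accepts a third "vars" argument, so the email can be referenced inside the path as $email instead of being spliced into the path text. This is a sketch only; EMAIL_FILTER_SQL and email_filter_query are hypothetical names, and the table and column come from the question.

```python
# Assumes PostgreSQL 12+ (jsonb_path_exists with a vars argument).
# '$.*' selects every top-level value; the filter keeps values equal to $email.
EMAIL_FILTER_SQL = """
    SELECT id, data
    FROM table1
    WHERE jsonb_path_exists(
        data::jsonb,
        '$.* ? (@ == $email)',
        jsonb_build_object('email', %s::text)
    )
"""


def email_filter_query(given_email):
    """Return (sql, params) ready for cursor.execute(sql, params).

    The email travels as a bound parameter, so the driver quotes it and
    the SQL text holds no literal % that would need escaping.
    """
    return EMAIL_FILTER_SQL, [given_email]
```

With Django this would be used as cursor.execute(*email_filter_query("user2@mail.com")) on a cursor from connection.cursor(). Unlike LIKE on the text form, it matches only values exactly equal to the email, never keys or substrings.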

Related

Convert json query to insert a variable and re-convert it to json query

I am kinda frustrated. I copied the following Metabase query string from the network tab in the browser:
query = "{\"database\":17,\"query\":{\"source-table\":963,\"filter\":[\"and\",[\"=\",[\"field\",17580,null],\"XXXXXX_XXXXXX\"],[\"=\",[\"field\",17599,null],\"**chl-43d813dd-05a7-45b8-a5b0-8eb960289aa5**\"]],\"fields\":[[\"field\",17579,null],[\"field\",17569,null],[\"field\",17572,null],[\"field\",17586,null],[\"field\",17592,{\"temporal-unit\":\"default\"}],[\"field\",17611,null],[\"field\",17582,null],[\"field\",17597,null],[\"field\",17603,null],[\"field\",17607,null],[\"field\",17576,null],[\"field\",17588,null],[\"field\",17596,null],[\"field\",17608,null],[\"field\",17587,{\"temporal-unit\":\"default\"}],[\"field\",17578,{\"temporal-unit\":\"default\"}],[\"field\",17602,null],[\"field\",17606,null],[\"field\",17605,{\"temporal-unit\":\"default\"}],[\"field\",17601,null],[\"field\",17590,null],[\"field\",17580,null],[\"field\",17598,{\"temporal-unit\":\"default\"}],[\"field\",17577,null],[\"field\",164910,null],[\"field\",46951,null],[\"field\",46952,{\"temporal-unit\":\"default\"}]]},\"type\":\"query\",\"middleware\":{\"js-int-to-string?\":true,\"add-default-userland-constraints?\":true}}"
As the next step, I wanted to convert it to a string so I could replace the bold reference with a variable.
The string looks like this:
query = '{"database\":17,\"query\":{\"source-table\":963,\"filter\":[\"and\",[\"=\",[\"field\",17580,null],\"XXXXXXXX-XXXXXXXX\"],[\"=\",[\"field\",17599,null],\"'+channelRef+'\"]],\"fields\":[[\"field\",17579,null],[\"field\",17569,null],[\"field\",17572,null],[\"field\",17586,null],[\"field\",17592,{\"temporal-unit\":\"default\"}],[\"field\",17611,null],[\"field\",17582,null],[\"field\",17597,null],[\"field\",17603,null],[\"field\",17607,null],[\"field\",17576,null],[\"field\",17588,null],[\"field\",17596,null],[\"field\",17608,null],[\"field\",17587,{\"temporal-unit\":\"default\"}],[\"field\",17578,{\"temporal-unit\":\"default\"}],[\"field\",17602,null],[\"field\",17606,null],[\"field\",17605,{\"temporal-unit\":\"default\"}],[\"field\",17601,null],[\"field\",17590,null],[\"field\",17580,null],[\"field\",17598,{\"temporal-unit\":\"default\"}],[\"field\",17577,null],[\"field\",164910,null],[\"field\",46951,null],[\"field\",46952,{\"temporal-unit\":\"default\"}]]},\"type\":\"query\",\"middleware\":{\"js-int-to-string?\":true,\"add-default-userland-constraints?\":true}}'
With
q = json.dumps(query)
the result looks exactly the way I want:
q = "{\"database\":17,\"query\":{\"source-table\":963,\"filter\":[\"and\",[\"=\",[\"field\",17580,null],\"XXXXXXXX-XXXXXXXX\"],[\"=\",[\"field\",17599,null],\"**chl-caabef81-f081-4532-9b6e-ac20b3d4c6cf**\"]],\"fields\":[[\"field\",17579,null],[\"field\",17569,null],[\"field\",17572,null],[\"field\",17586,null],[\"field\",17592,{\"temporal-unit\":\"default\"}],[\"field\",17611,null],[\"field\",17582,null],[\"field\",17597,null],[\"field\",17603,null],[\"field\",17607,null],[\"field\",17576,null],[\"field\",17588,null],[\"field\",17596,null],[\"field\",17608,null],[\"field\",17587,{\"temporal-unit\":\"default\"}],[\"field\",17578,{\"temporal-unit\":\"default\"}],[\"field\",17602,null],[\"field\",17606,null],[\"field\",17605,{\"temporal-unit\":\"default\"}],[\"field\",17601,null],[\"field\",17590,null],[\"field\",17580,null],[\"field\",17598,{\"temporal-unit\":\"default\"}],[\"field\",17577,null],[\"field\",164910,null],[\"field\",46951,null],[\"field\",46952,{\"temporal-unit\":\"default\"}]]},\"type\":\"query\",\"middleware\":{\"js-int-to-string?\":true,\"add-default-userland-constraints?\":true}}"
But when I use this query string to send an API request, I get the following error message(s):
{"via":[{"type":"java.lang.ClassCastException"}],"trace":[],"message":null}
Traceback (most recent call last):
  File "c:\Users\XXXX\Documents\XXXXXXXX\Test.py", line 308, in <module>
    main()
  File "c:\Users\XXXX\Documents\XXXXXXXX\Test.py", line 114, in main
    some_function(XXXX, window, selected_path)
  File "c:\Users\XXXX\Documents\XXXXXXXX\Test.py", line 290, in some_function
    dataframe = DataFrame(result)
  File "C:\Users\XXXX\AppData\Roaming\Python\Python310\site-packages\pandas\core\frame.py", line 756, in __init__
    raise ValueError("DataFrame constructor not properly called!")
ValueError: DataFrame constructor not properly called!
Does anyone have an idea?
Thank you very much in advance!
You can use the built-in json module:
import json
query = "{\"database\":17,\"query\":{\"source-table\":963,\"filter\":[\"and\",[\"=\",[\"field\",17580,null],\"XXXXXX_XXXXXX\"],[\"=\",[\"field\",17599,null],\"**chl-43d813dd-05a7-45b8-a5b0-8eb960289aa5**\"]],\"fields\":[[\"field\",17579,null],[\"field\",17569,null],[\"field\",17572,null],[\"field\",17586,null],[\"field\",17592,{\"temporal-unit\":\"default\"}],[\"field\",17611,null],[\"field\",17582,null],[\"field\",17597,null],[\"field\",17603,null],[\"field\",17607,null],[\"field\",17576,null],[\"field\",17588,null],[\"field\",17596,null],[\"field\",17608,null],[\"field\",17587,{\"temporal-unit\":\"default\"}],[\"field\",17578,{\"temporal-unit\":\"default\"}],[\"field\",17602,null],[\"field\",17606,null],[\"field\",17605,{\"temporal-unit\":\"default\"}],[\"field\",17601,null],[\"field\",17590,null],[\"field\",17580,null],[\"field\",17598,{\"temporal-unit\":\"default\"}],[\"field\",17577,null],[\"field\",164910,null],[\"field\",46951,null],[\"field\",46952,{\"temporal-unit\":\"default\"}]]},\"type\":\"query\",\"middleware\":{\"js-int-to-string?\":true,\"add-default-userland-constraints?\":true}}"
my_json = json.loads(query)
# make your edits here (my_json works like a dict)
query = json.dumps(my_json)
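To swap in a different channel reference, edit the parsed structure rather than the string. Note that json.dumps applied to a str that already contains JSON returns a quoted JSON string rather than an object, so the payload ends up encoded twice. The sketch below uses a trimmed copy of the question's query (most "fields" entries removed, ** markers dropped); set_filter_value and build_query are hypothetical names, and field id 17599 is taken from the question's filter.

```python
import json

# Trimmed copy of the Metabase query from the question.
TEMPLATE = (
    '{"database":17,"query":{"source-table":963,'
    '"filter":["and",["=",["field",17580,null],"XXXXXX_XXXXXX"],'
    '["=",["field",17599,null],"chl-43d813dd-05a7-45b8-a5b0-8eb960289aa5"]],'
    '"fields":[["field",17579,null]]},"type":"query"}'
)


def set_filter_value(query, field_id, new_value):
    """Set the value of every ["=", ["field", field_id, ...], value] clause."""
    for clause in query["query"]["filter"][1:]:  # skip the leading "and"
        if clause[0] == "=" and clause[1][:2] == ["field", field_id]:
            clause[2] = new_value
    return query


def build_query(channel_ref):
    query = json.loads(TEMPLATE)             # JSON text -> Python dict
    set_filter_value(query, 17599, channel_ref)
    return json.dumps(query)                 # dict -> JSON text, encoded once
```

For example, build_query("chl-caabef81-f081-4532-9b6e-ac20b3d4c6cf") returns a JSON object string with only the channel reference changed.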
I don't see a bold reference in your JSON string, but this is all handled by the json library:
import json
query = "YOUR QUERY STRING"
query_obj = json.loads(query)  # parse the JSON text into a dict
# Make your changes to the dict here
new_query = json.dumps(query_obj)  # serialize it back to JSON text

psycopg2 fetchone() method returning a single-element tuple containing a string representing the «result» tuple

I am trying to write chess features for a Discord bot.
One of them is the ability to play correspondence games.
I am using a PostgreSQL database to store all games.
The problem I have is that when I call cursor.fetchone() after executing a SELECT query, the returned object is a single-element tuple containing a string that represents the tuple I want.
For instance:
('(351817698172207105,"",1)',) instead of (351817698172207105,"",1)
I installed psycopg2 with pip3 (and use it with Python 3.6.7).
I worked around the problem by using ast.literal_eval in the first SELECT query (creating and accepting a challenge).
But the other query returns the PGN of the game, which contains many quotation marks, and that makes literal_eval fail.
I could manipulate the returned string, but I'm not sure that is the best option (and I'd like to understand why this happens).
The first «workaround»:
import psycopg2
from ast import literal_eval as make_tuple
cdesc = psycopg2.connect(**params_db)
curs = cdesc.cursor()
modele_req = "SELECT (id_j1, id_j2, id_challenge) FROM challenges WHERE (id_j2='{0}' OR id_j2 = '') AND id_challenge = {1};"
# id_acceptant and id_partie_acceptee are given
req = modele_req.format(id_acceptant, str(id_partie_acceptee))
curs.execute(req)
res_tuple = curs.fetchone()
# print(res_tuple) produces the following output:
# ('(351817698172207105,"",1)',)
(idj1, idj2, idpartie) = make_tuple(res_tuple[0])
Also, here is an example of the tuple returned by the "second" query:
( '("[Event ""?""]\n[Site ""?""]\n[Date ""????.??.??""]\n[Round ""?""]\n[White ""?""]\n[Black ""?""]\n[Result ""*""]\n\n*",351817698172207105,351817698172207105,t)' ,)
When using the make_tuple/literal_eval function, I (obviously) get the following error:
(...)
  File "/home/synophride/projets/discord_bot/bot/commandes_echecs.py", line 568, in move_bd
    (game_pgn, id_blanc, id_noir, joueur_jouant) = make_tuple(str_tuple)
  File "/usr/lib/python3.6/ast.py", line 85, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python3.6/ast.py", line 59, in _convert
    return tuple(map(_convert, node.elts))
  File "/usr/lib/python3.6/ast.py", line 84, in _convert
    raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Name object at 0x7fe1956f5978>
In short:
Is it normal that the cursor.fetchone() method returns that kind of tuple?
If not, what did I do wrong, and how can I fix it?
Thanks for reading, and sorry for my possibly dubious English.
When you wrap the column list in parentheses, PostgreSQL does not return separate columns: (id_j1, id_j2, id_challenge) is a row constructor, so the result is a single column holding one record, which psycopg2 returns as its text representation.
Your query should look like this to get all the columns:
modele_req = "SELECT id_j1, id_j2, id_challenge FROM challenges WHERE ..."
Then fetchone() returns a tuple of 3 items instead of a string:
(351817698172207105, '', 1)
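Putting the fix together with bound parameters (which also avoids building the SQL with str.format), a sketch might look like this. challenge_query and fetch_challenge are hypothetical helpers; the table and column names come from the question, and id_acceptant is passed as a string because the original query compared id_j2 against a quoted value.

```python
# Three separate columns (no parentheses, so no row constructor), and %s
# placeholders instead of str.format(), so psycopg2 quotes the values itself.
CHALLENGE_SQL = (
    "SELECT id_j1, id_j2, id_challenge "
    "FROM challenges "
    "WHERE (id_j2 = %s OR id_j2 = '') AND id_challenge = %s;"
)


def challenge_query(id_acceptant, id_partie_acceptee):
    """Return (sql, params) for cursor.execute(sql, params)."""
    return CHALLENGE_SQL, (str(id_acceptant), id_partie_acceptee)


def fetch_challenge(curs, id_acceptant, id_partie_acceptee):
    """Run the query on a DB-API cursor; return (idj1, idj2, idpartie) or None."""
    curs.execute(*challenge_query(id_acceptant, id_partie_acceptee))
    return curs.fetchone()  # a plain tuple of column values, no literal_eval
```

The same change (listing the PGN column and the others without parentheses) lets the second query return the PGN as an ordinary Python string.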

could not determine data type of parameter $1 in python-pgsql

I have a simple table named test:
id | integer
name | character varying(100)
intval | integer
When I try to use a prepared statement to update the name in Python like this (I am using python-pgsql, http://pypi.python.org/pypi/python-pgsql/), I get an error:
>>> for i in db.execute("select * from test"): print i
...
(1, 'FOO', None)
>>> query = "UPDATE test set name = '$1' where name = '$2'"
>>> cu.execute(query, "myname", "FOO")
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/pgsql.py", line 119, in execute
ret = self._source.execute(operation, params)
ProgrammingError: ERROR: could not determine data type of parameter $1
The demo file for the module can be seen at http://sprunge.us/VgLY?python.
My guess is that the problem is the single quotes around $1 and $2 in your query string. The list of main changes from PyGreSQL says:
support for bind parameters, alleviating the need for extensive,
expensive and vulnerable quoting of user-supplied data
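Inside single quotes, '$1' and '$2' are ordinary string literals, not parameter placeholders, so the server never sees $1 used anywhere and cannot determine its data type. Remove the quotes: query = "UPDATE test SET name = $1 WHERE name = $2"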

Error when using the Django cursor to save escaped characters

I have a URL that I want to save into a MySQL database using the cursor provided by Django, but I keep getting a "not enough arguments for format string" error because the URL contains percent-escaped (non-ASCII) characters. The test code is short:
test.py
import os
import runconfig #configuration file
os.environ['DJANGO_SETTINGS_MODULE'] = runconfig.django_settings_module
from django.db import connection,transaction
c = connection.cursor()
url = "http://www.academicjournals.org/ijps/PDF/pdf2011/18mar/G%C3%B3mez-Berb%C3%ADs et al.pdf"
dbquery = "INSERT INTO main_crawl_document SET url="+url
c.execute(dbquery)
transaction.commit_unless_managed()
The full error message is:
Traceback (most recent call last):
  File "./test.py", line 14, in <module>
    c.execute(dbquery)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/util.py", line 38, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "/usr/local/lib/python2.6/site-packages/django/db/backends/__init__.py", line 505, in last_executed_query
    return smart_unicode(sql) % u_params
TypeError: not enough arguments for format string
Can anybody help me?
You're opening yourself up to SQL injection. Instead, use c.execute() properly, with a placeholder and a separate parameter:
url = "http://www.academicjournals.org/ijps/PDF/pdf2011/18mar/G%C3%B3mez-Berb%C3%ADs et al.pdf"
dbquery = "INSERT INTO main_crawl_document SET url=%s"
c.execute(dbquery, (url,))
transaction.commit_unless_managed()
The .execute() method accepts a sequence of parameters, which the database driver quotes and escapes for you. Note that Django's cursor always uses %s placeholders, not ?, whatever the database backend.
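The traceback shows why the concatenated version fails: Django ran smart_unicode(sql) % u_params, and the percent-escapes inside the URL (%C3, %B3, ...) are read as format directives. The sketch below reproduces that failure with plain %-formatting and shows the parameterized alternative; concatenated_query_fails and insert_url_query are hypothetical names used only for illustration.

```python
# The URL from the question; it contains percent-escapes such as %C3.
URL = ("http://www.academicjournals.org/ijps/PDF/pdf2011/18mar/"
       "G%C3%B3mez-Berb%C3%ADs et al.pdf")


def concatenated_query_fails(url):
    """Reproduce the failure: %-format a query built by concatenation."""
    sql = "INSERT INTO main_crawl_document SET url=" + url
    try:
        _ = sql % ()  # same kind of step as `smart_unicode(sql) % u_params`
    except (TypeError, ValueError):
        return True   # '%C3...' was parsed as a format directive
    return False


def insert_url_query(url):
    """Return (sql, params): only %s appears in the SQL text."""
    return "INSERT INTO main_crawl_document SET url = %s", (url,)
```

Usage would be c.execute(*insert_url_query(url)); the URL never passes through %-formatting, so its own % characters are harmless.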

Python and MySQL STR_TO_DATE

When I execute the code below, I get the following error:
Traceback (most recent call last):
  File "/home/test/python/main.py", line 18, in <module>
    calls = tel.getCallRecordings(member['id'], member['start_date'])
  File "/home/test/python/teldb.py", line 49, in getCallRecordings
    'start_date': str(start_date),
  File "/usr/local/lib/python2.5/site-packages/MySQL_python-1.2.3-py2.5-freebsd-8.1-RELEASE-amd64.egg/MySQLdb/cursors.py", line 159, in execute
TypeError: int argument required
This is the code I use:
def getCallRecordings(self, member_id, start_date):
    self.cursor.execute("""
        SELECT var10 as filename,
        var9 as duration
        FROM csv_data c
        LEFT JOIN transcriptions t ON
        c.id=t.call_id
        WHERE member_id=%(member_id)s AND
        var10 IS NOT NULL AND
        var9>%(min_duration)d AND
        dialed_date>STR_TO_DATE(%(start_date)s, '%%Y-%%m-%%d %%H:%%i:%%s') AND
        t.call_id IS NULL
        ORDER BY dialed_date DESC
        """, {
        'member_id': member_id,
        'min_duration': MIN_DURATION,
        'start_date': str(start_date),
    })
    logging.debug("Executed query: %s" % self.cursor._executed)
    return self.cursor.fetchone()
Why do I get this error? Thanks.
You don't show what MIN_DURATION is, but you use %(min_duration)d, which requires an int.
You shouldn't use %d in a SQL query like this anyway. MySQLdb converts every parameter into a quoted SQL literal (a string) before substituting it into the query with Python %-formatting, so a %d placeholder fails even when MIN_DURATION is an int. Use %(min_duration)s for every parameter, whatever its type, and let the DB adapter render the value. Keeping MIN_DURATION an integer (e.g. int(MIN_DURATION)) ensures it is rendered as a number rather than a quoted string.
BTW: the stack trace points you at str(start_date), but only because that is the last source line of the statement that raised the error, so it's misleading.
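A corrected version of the query, as a sketch: every placeholder uses %(name)s, and the literal % signs for STR_TO_DATE stay doubled because MySQLdb applies Python %-formatting to the query. RECORDINGS_SQL and recordings_params are hypothetical names; the table and column names come from the question.

```python
# Every placeholder is %(name)s, whatever the value's type; the %% pairs
# become single % characters after MySQLdb's %-formatting step.
RECORDINGS_SQL = """
    SELECT var10 AS filename,
           var9 AS duration
    FROM csv_data c
    LEFT JOIN transcriptions t ON c.id = t.call_id
    WHERE member_id = %(member_id)s
      AND var10 IS NOT NULL
      AND var9 > %(min_duration)s
      AND dialed_date > STR_TO_DATE(%(start_date)s, '%%Y-%%m-%%d %%H:%%i:%%s')
      AND t.call_id IS NULL
    ORDER BY dialed_date DESC
"""


def recordings_params(member_id, start_date, min_duration):
    """Build the parameter dict; MySQLdb quotes each value itself."""
    return {
        "member_id": member_id,
        "min_duration": int(min_duration),  # numeric, but still a %s placeholder
        "start_date": str(start_date),
    }
```

In the question's method this would be self.cursor.execute(RECORDINGS_SQL, recordings_params(member_id, start_date, MIN_DURATION)).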
I ran into a similar issue when building SQL query strings that use the STR_TO_DATE function. I used Python's str.format() method to parametrize the SQL queries. Here is an example query that works fine:
# Takes an input date and returns the manager's name during that period
input_string = '"' + "{input_data}".format(input_data=input_data) + '"'
sql_query = '''select CONCAT(e.last_name, ",", e.first_name),
       STR_TO_DATE(from_date, '%Y-%m-%d'),
       STR_TO_DATE(to_date, '%Y-%m-%d')
from dept_manager dm
inner join employees e on e.emp_no = dm.emp_no
where STR_TO_DATE({input_string}, '%Y-%m-%d') > STR_TO_DATE(from_date, '%Y-%m-%d')
  and STR_TO_DATE({input_string}, '%Y-%m-%d') < STR_TO_DATE(to_date, '%Y-%m-%d');'''.format(input_string=input_string)
