Is there a tool to convert a SQL statement into Python, if that's possible? For example:
(CASE WHEN var = 2 then 'Yes' else 'No' END) custom_var
==>
customVar = 'Yes' if var == 2 else 'No'
I am trying to provide an API for ETL-like transformations from a JSON input. Here's an example of an input:
{
    "ID": 4,
    "Name": "David",
    "Transformation": "NewField = CONCAT(ID, Name)"
}
And we would translate this into:
{
    "ID": 4,
    "Name": "David",
    "NewField": "4David"
}
Or, is there a better transformation language that could be used here over SQL?
Is SET NewField = CONCAT(ID, Name) actually valid SQL? (If NewField is a variable, do you need to declare it and prefix it with "@"?) If you want to just execute arbitrary SQL, you could hack something together with sqlite3:
import sqlite3
import json

query = """
{
    "ID": "4",
    "Name": "David",
    "Transformation": "SELECT ID || Name AS NewField FROM inputdata"
}"""
query_dict = json.loads(query)

db = sqlite3.connect('mydb')
# Build a one-row table whose columns are the JSON keys.
db.execute('create table inputdata ({} VARCHAR(100));'.format(' VARCHAR(100), '.join(query_dict.keys())))
db.execute('insert into inputdata ({}) values ("{}")'.format(','.join(query_dict.keys()), '","'.join(query_dict.values())))
# Run the user-supplied transformation against that table.
r = db.execute(query_dict['Transformation'])
response = {}
response[r.description[0][0]] = r.fetchone()[0]
print(response)
# {'NewField': '4David'}
db.execute('drop table inputdata;')
db.close()
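A slightly safer variant of the same idea keeps the inserted values out of the SQL string by using ? placeholders (a sketch, reusing the query JSON string from above; note the table and column names still come from the input, so they must be trusted):

import sqlite3
import json

query_dict = json.loads(query)  # same JSON payload as above
db = sqlite3.connect(':memory:')  # throwaway in-memory database
db.execute('create table inputdata ({} VARCHAR(100));'.format(' VARCHAR(100), '.join(query_dict.keys())))
# The driver binds each value, so quotes in the data can't break the statement.
placeholders = ', '.join('?' for _ in query_dict)
db.execute('insert into inputdata ({}) values ({})'.format(','.join(query_dict.keys()), placeholders),
           list(query_dict.values()))
r = db.execute(query_dict['Transformation'])
print({r.description[0][0]: r.fetchone()[0]})  # {'NewField': '4David'}
db.close()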
I want to insert multiple entries into a table using a single JSON input, but I don't know where to go from here. Once the user submits something like
[
    {
        "VIN": "kjasdfh",
        "Make": "Toyota",
        "model": "Corolla",
        "Year": 1998
    },
    {
        "VIN": "wqeiryu",
        "Make": "Honda",
        "model": "Civic",
        "Year": 1997
    }
]
I want to make it so that each object becomes its own row: the first one gets its own entry and the second one gets another.
@app.route('/api/addcar', methods=['POST'])  # POST so the user can add info
def adding_stuff():
    request_data = request.get_json()  # parses the JSON request body
    new_vin = request_data['VIN']
    new_make = request_data['Make']
    new_year = request_data['Year']
    new_color = request_data['Color']
    # This SQL statement is then run against the database to add a new record
    sql = "INSERT INTO carsTEST (VIN, Make, Year, Color) VALUES ('%s', '%s', %s, '%s')" % (new_vin, new_make, new_year, new_color)
    conn = create_connection()
    execute_query(conn, sql)  # This will execute the query
    return 'Post worked'
You will have to turn your JSON array into a model array (with a serializer or whatever your framework offers), or try to build your raw SQL with something like this:
json_data = [
    {
        "VIN": "kjasdfh",
        "Make": "Toyota",
        "model": "Corolla",
        "Color": "red",
        "Year": 1998
    },
    {
        "VIN": "wqeiryu",
        "Make": "Honda",
        "model": "Civic",
        "Color": "white",
        "Year": 1997
    }
]
sql = "INSERT INTO carsTEST (VIN, Make, Year, Color, username) VALUES"
for jo in json_data:
new_vin = jo['VIN']
new_make = jo['Make']
new_year = jo['Year']
new_color = jo['Color']
value_sql = "('{}', '{}', '{}', '{}'),".format(new_vin, new_make, new_year, new_color)
sql = sql + value_sql
sql = sql.rstrip(',') + ";"
print(sql)
Edit1:
The method should be something like:
@app.route('/api/addcar', methods=['POST'])
def adding_stuff():
    request_data = request.get_json()  # post body must be a JSON array
    sql = "INSERT INTO carsTEST (VIN, Make, Year, Color) VALUES"
    for jo in request_data:
        new_vin = jo['VIN']
        new_make = jo['Make']
        new_year = jo['Year']
        new_color = jo['Color']
        value_sql = "('{}', '{}', '{}', '{}'),".format(new_vin, new_make, new_year, new_color)
        sql = sql + value_sql
    sql = sql.rstrip(',') + ";"
    conn = create_connection()
    execute_query(conn, sql)  # This will execute the query
    return 'Post worked'
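Note that both versions build the SQL string by hand, which breaks on quotes in the data and is open to SQL injection. A safer sketch, assuming create_connection() returns a DB-API connection whose driver uses %s placeholders (e.g. MySQL or Postgres; sqlite3 would use ? instead):

@app.route('/api/addcar', methods=['POST'])
def adding_stuff():
    request_data = request.get_json()  # post body must be a JSON array
    rows = [(jo['VIN'], jo['Make'], jo['Year'], jo['Color']) for jo in request_data]
    sql = "INSERT INTO carsTEST (VIN, Make, Year, Color) VALUES (%s, %s, %s, %s)"
    conn = create_connection()
    cur = conn.cursor()
    cur.executemany(sql, rows)  # the driver escapes each value
    conn.commit()
    return 'Post worked'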
The result I got from SQLite in Python looks like this:
{"John", "Alice"}, {"John", "Bob"}, {"Jogn", "Cook"} ......
I want to convert the result into JSON format like this:
{
    "Teacher": "John",
    "Students": ["Alice", "Bob", "Cook" .....]
}
I used GROUP_CONCAT to concatenate all the students' names, together with the following code:
row_headers = [x[0] for x in cursor.description]  # this will extract row headers
result = []
for res in cursor.fetchall():
    result.append(dict(zip(row_headers, res)))
I was able to get this result:
{
    "Teacher": "John",
    "Students": "Alice, Bob, Cook"
}
How can I make the students into array format?
If your version of sqlite has the JSON1 extension enabled, it's easy to do in pure SQL:
SELECT json_object('Teacher', teacher,
                   'Students', json_group_array(student)) AS result
FROM ex
GROUP BY teacher;
DB Fiddle example
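On the Python side, each row then holds a complete JSON string that you can parse directly (a sketch, assuming the ex table from the query above and a SQLite build with JSON1 enabled):

import json
import sqlite3

db = sqlite3.connect('mydb')
cur = db.execute("""
    SELECT json_object('Teacher', teacher,
                       'Students', json_group_array(student)) AS result
    FROM ex
    GROUP BY teacher;
""")
# Each row is a single JSON string; json.loads turns it into a dict.
results = [json.loads(row[0]) for row in cur.fetchall()]
# e.g. [{'Teacher': 'John', 'Students': ['Alice', 'Bob', 'Cook']}]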
You could just do result["Students"] = result["Students"].split(", ").
I'm trying to take a web API's JSON response and populate a SQL database with the results.
Part of the JSON response has this array:
"MediaLinks": [
{
"MediaType": "Datasheets",
"SmallPhoto": "",
"Thumbnail": "",
"Title": "SN54HC374, SN74HC374",
"Url": "http://www.ti.com/general/docs/suppproductinfo.tsp?distId=10&gotoUrl=http%3A%2F%2Fwww.ti.com%2Flit%2Fgpn%2Fsn74hc374"
},
{
"MediaType": "Product Photos",
"SmallPhoto": "http://media.digikey.com/photos/Texas%20Instr%20Photos/296-20-DIP_sml.jpg",
"Thumbnail": "http://media.digikey.com/photos/Texas%20Instr%20Photos/296-20-DIP_tmb.jpg",
"Title": "20-DIP,R-PDIP-Txx",
"Url": "http://media.digikey.com/photos/Texas%20Instr%20Photos/296-20-DIP.jpg"
},
{
"MediaType": "Featured Product",
"SmallPhoto": "",
"Thumbnail": "",
"Title": "Logic Solutions",
"Url": "https://www.digikey.com/en/product-highlight/t/texas-instruments/logic-solutions "
},
{
"MediaType": "Featured Product",
"SmallPhoto": "",
"Thumbnail": "",
"Title": "Analog Solutions",
"Url": "https://www.digikey.com/en/product-highlight/t/texas-instruments/analog-solutions "
},
{
"MediaType": "PCN Design/Specification",
"SmallPhoto": "",
"Thumbnail": "",
"Title": "Copper Bond Wire Revision A 04/Dec/2013",
"Url": "http://media.digikey.com/pdf/PCNs/Texas%20Instruments/PCN20120223003A_Copper-wire.pdf"
},
{
"MediaType": "PCN Design/Specification",
"SmallPhoto": "",
"Thumbnail": "",
"Title": "Material Set 30/Mar/2017",
"Url": "http://media.digikey.com/pdf/PCNs/Texas%20Instruments/PCN20170310000.pdf"
}
],
For testing, I've issued the request, written the response to a file, and I'm experimenting with that file to come up with the correct code:
conn.request("POST", "/services/partsearch/v2/partdetails", json.dumps(payload), headers)
res = conn.getresponse()
data = res.read()
data_return = json.loads(data)
print(json.dumps(data_return, indent=4))
with open(y["DigiKeyPartNumber"] + ".json", "w") as write_file:
    json.dump(data_return, write_file, indent=4, sort_keys=True)
# no explicit close needed: the with block closes the file
Then in my test code I've tried this:
import json

with open(r"C:\Users\george\OneDrive\Documents\296-1592-5-ND.json") as json_file:
    data = json.load(json_file)

values = ""
placeholder = '?'
thelist = list(data['PartDetails']['MediaLinks'])
print(type(thelist))
#print(thelist)
placeholders = ', '.join(placeholder for unused in data['PartDetails']['MediaLinks'])
query = 'INSERT INTO thetable VALUES(%s)' % placeholders
print(query)
But this just produces the following output:
<class 'list'>
INSERT INTO thetable VALUES(?, ?, ?, ?, ?, ?)
For reference, this creates what I think will work, except for the trailing comma:
if len(data['PartDetails']['MediaLinks']):
    print('The length is: ' + str(len(data['PartDetails']['MediaLinks'])))
    #print(type(data['PartDetails']['MediaLinks']))
    for mediadata in data['PartDetails']['MediaLinks']:
        #print(mediadata)
        for element in mediadata:
            #print(element + ' is "' + mediadata[element] + '"')
            values += '"' + mediadata[element] + '", '
        #print(list(data['PartDetails']['MediaLinks'][1]))
        print(values + "\n")
        values = ""
else:
    print('It is empty')
Which produces this:
The length is: 6
"Datasheets", "", "", "SN54HC374, SN74HC374", "http://www.ti.com/general/docs/suppproductinfo.tsp?distId=10&gotoUrl=http%3A%2F%2Fwww.ti.com%2Flit%2Fgpn%2Fsn74hc374",
"Product Photos", "http://media.digikey.com/photos/Texas%20Instr%20Photos/296-20-DIP_sml.jpg", "http://media.digikey.com/photos/Texas%20Instr%20Photos/296-20-DIP_tmb.jpg", "20-DIP,R-PDIP-Txx", "http://media.digikey.com/photos/Texas%20Instr%20Photos/296-20-DIP.jpg",
"Featured Product", "", "", "Logic Solutions", "https://www.digikey.com/en/product-highlight/t/texas-instruments/logic-solutions ",
"Featured Product", "", "", "Analog Solutions", "https://www.digikey.com/en/product-highlight/t/texas-instruments/analog-solutions ",
"PCN Design/Specification", "", "", "Copper Bond Wire Revision A 04/Dec/2013", "http://media.digikey.com/pdf/PCNs/Texas%20Instruments/PCN20120223003A_Copper-wire.pdf",
"PCN Design/Specification", "", "", "Material Set 30/Mar/2017", "http://media.digikey.com/pdf/PCNs/Texas%20Instruments/PCN20170310000.pdf",
In the SQL table I've created, the column names match the keys in the JSON array. There are several arrays in the JSON response, so I'm hoping to write a generic function that accepts a JSON array and creates the correct SQL INSERT statements to populate the tables with the JSON data. I'm planning on using pyodbc, and ideally the code would work under both Python 2.7 and 3.x.
Updated Information:
I found the following code snippet which comes very close:
for thedata in data['PartDetails']['MediaLinks']:
    keys, values = zip(*thedata.items())
    print(values)  # This will create the VALUES for the INSERT statement
    print(keys)    # This will create the COLUMNS; need to add the PartDetailsId field
I was trying to find a way to get the keys before I ran this for loop because I would have to replace the print statements with the actual SQL INSERT statement.
When I check type(newdata['PartDetails']['MediaLinks']) it returns <class 'list'> in Python 3.7.4, so even though it looks like a dictionary it's treated as a list, and calling .keys() on it fails.
Just for completeness, I want to post a formatted code snippet that is working for me. This would not have been possible without @barmar's help, so thanks again.
The end goal is to turn this into a function, so that I can pass in the arrays from a JSON response and have it populate the correct SQL tables with the data. It's close to complete but not quite there yet.
import json
import pyodbc

conn = pyodbc.connect(r'Driver={SQL Server};Server=GEORGE-OFFICE3\SQLEXPRESS01;Database=Components;')
cursor = conn.cursor()

with open(r"C:\Users\george\OneDrive\Documents\296-1592-5-ND.json") as json_file:
    data = json.load(json_file)

# Turn the keys of the first MediaLinks record into the column list,
# e.g. "MediaType, SmallPhoto, Thumbnail, Title, Url)" (the tuple's closing
# paren is reused to close the column list).
x = tuple(data['PartDetails']['MediaLinks'][0])
a = str(x).replace("'", "").replace("(", "")
query = "INSERT INTO MediaLinks (PartDetailsId, " + a + " VALUES(" + str(data['PartDetails']['PartId'])
b = ""
for i in range(len(x)):
    b += ", ?"
b += ")"
query += b
cursor.executemany(query, [tuple(d.values()) for d in data['PartDetails']['MediaLinks']])
cursor.commit()
conn.close()
Use cursor.executemany() to execute the query on all the rows in the MediaLinks list.
You can't pass the dictionaries directly, though, because iterating over a dictionary returns the keys, not the values. You need to convert this to a list of values, using one of the methods in How to convert list of dictionaries into list of lists
colnames = ", ".join(data['PartDetails']['MediaLinks'][0].keys())
placeholders = ", ".join(["?"] * len(data['PartDetails']['MediaLinks'][0]))
query = "INSERT INTO MediaLinks (" + colnames + ") VALUES (" + placeholders + ")"
cursor.executemany(query, [tuple(d.values()) for d in data['PartDetails']['MediaLinks']])
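Generalizing that into the kind of reusable helper the question asks for might look like this (a sketch: insert_json_array is a hypothetical name, and the table and column names come straight from the JSON keys, so they must be from a trusted source):

def insert_json_array(cursor, table, parent_id_col, parent_id, records):
    # Insert a list of same-shaped dicts into `table`, prefixing each row
    # with a parent id column (e.g. PartDetailsId). Assumes every dict has
    # the same keys in the same order (guaranteed by dicts in Python 3.7+).
    if not records:
        return
    colnames = ", ".join([parent_id_col] + list(records[0].keys()))
    placeholders = ", ".join(["?"] * (len(records[0]) + 1))
    query = "INSERT INTO " + table + " (" + colnames + ") VALUES (" + placeholders + ")"
    cursor.executemany(query, [(parent_id,) + tuple(d.values()) for d in records])

insert_json_array(cursor, "MediaLinks", "PartDetailsId",
                  data['PartDetails']['PartId'],
                  data['PartDetails']['MediaLinks'])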
I have JSON data with inconsistent fields.
{
    "Firsthouse": {
        "Doors": "10",
        "windows": "9"
    },
    "Secondhouse": {
        "doors": "1",
        "windows": "10",
        "pools": "2"
    }
}
In "Secondhouse" field "pools" is present while it is absent in "Firsthouse".
If I want to write an insert query, do I need to have 6 different queries for presence/absence of such fields, like below:
#This is a query when 3 fields are present
query = "insert into table (doors,windows,pools) values (%s,%s,%s)"
q_tup = data_list_3Fields
cursor.executemany(query, q_tup)
#This is a query when 4 fields are present
query = "insert into table (doors,windows,pools,floors) values (%s,%s,%s,%s)"
q_tup = data_list_4Fields
cursor.executemany(query, q_tup)
Is there a proper approach to do this?
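One common approach is to build the column list per record from the keys that are actually present, instead of writing one query per field combination. A sketch, assuming the JSON above is loaded into a dict called data, the target table is named houses (a placeholder), cursor is an open DB-API cursor, and the driver uses %s placeholders:

for house in data.values():
    fields = {k.lower(): v for k, v in house.items()}  # normalize key case ("Doors" vs "doors")
    columns = ", ".join(fields)
    placeholders = ", ".join(["%s"] * len(fields))
    query = "insert into houses ({}) values ({})".format(columns, placeholders)
    cursor.execute(query, list(fields.values()))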
This is a two-part question. If you're checking this out, thanks for your time!
Is there a way to make my query faster?
I previously asked a question here, and was eventually able to solve the problem myself.
However, the query I devised to produce my desired results is VERY slow (25+ minutes) when run against my database, which contains 40,000+ records.
The query is serving its purpose, but I'm hoping one of you brilliant people can point out to me how to make the query perform at a more preferred speed.
My query:
with dupe as (
    select
        json_document->'Firstname'->0->'Content' as first_name,
        json_document->'Lastname'->0->'Content' as last_name,
        identifiers->'RecordID' as record_id
    from (
        select *,
            jsonb_array_elements(json_document->'Identifiers') as identifiers
        from staging
    ) sub
    group by record_id, json_document
    order by last_name
)
select * from dupe da where (
    select count(*) from dupe db
    where db.record_id = da.record_id
) > 1;
Again, some sample data:
Row 1:
{
    "Firstname": "Bobb",
    "Lastname": "Smith",
    "Identifiers": [
        {
            "Content": "123",
            "RecordID": "123",
            "SystemID": "Test",
            "LastUpdated": "2017-09-12T02:23:30.817Z"
        },
        {
            "Content": "abc",
            "RecordID": "abc",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:21.598Z"
        },
        {
            "Content": "def",
            "RecordID": "def",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:21.598Z"
        }
    ]
}
Row 2:
{
    "Firstname": "Bob",
    "Lastname": "Smith",
    "Identifiers": [
        {
            "Content": "abc",
            "RecordID": "abc",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:26.020Z"
        }
    ]
}
If I were to bring in my query's results, or a portion of the results, into a Python environment where they could be manipulated using Pandas, how could I iterate over the results of my query (or the sub-query) in order to achieve the same end result as with my original query?
Is there an easier way, using Python, to iterate through my un-nested json array in the same way that Postgres does?
For example, after performing this query:
select
    json_document->'Firstname'->0->'Content' as first_name,
    json_document->'Lastname'->0->'Content' as last_name,
    identifiers->'RecordID' as record_id
from (
    select *,
        jsonb_array_elements(json_document->'Identifiers') as identifiers
    from staging
) sub
order by last_name;
How, using Python/Pandas, can I take that query's results and perform something like:
da = datasets[query_results] # to equal my dupe da query
db = datasets[query_results] # to equal my dupe db query
Then perform the equivalent of
select * from dupe da where (
    select count(*) from dupe db
    where db.record_id = da.record_id
) > 1;
in Python?
I apologize if I do not provide enough information here. I am a Python novice. Any and all help is greatly appreciated! Thanks!!
Try the following, which eliminates your count(*) and instead uses exists:
with dupe as (
    select id,
        json_document->'Firstname'->0->'Content' as first_name,
        json_document->'Lastname'->0->'Content' as last_name,
        identifiers->'RecordID' as record_id
    from (
        select *,
            jsonb_array_elements(json_document->'Identifiers') as identifiers
        from staging
    ) sub
    group by id, record_id, json_document
    order by last_name
)
select * from dupe da
where exists (
    select *
    from dupe db
    where db.record_id = da.record_id
      and db.id != da.id
);
Consider reading the raw, unqueried values of the Postgres json column and using pandas json_normalize() to bind them into a flat dataframe. From there, use pandas drop_duplicates().
To demonstrate, the code below parses your first JSON document into a three-row dataframe, one row per Identifiers record:
import json
import pandas as pd
json_str = '''
{
    "Firstname": "Bobb",
    "Lastname": "Smith",
    "Identifiers": [
        {
            "Content": "123",
            "RecordID": "123",
            "SystemID": "Test",
            "LastUpdated": "2017-09-12T02:23:30.817Z"
        },
        {
            "Content": "abc",
            "RecordID": "abc",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:21.598Z"
        },
        {
            "Content": "def",
            "RecordID": "def",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:21.598Z"
        }
    ]
}
'''
data = json.loads(json_str)
df = pd.io.json.json_normalize(data, 'Identifiers', ['Firstname','Lastname'])
print(df)
# Content LastUpdated RecordID SystemID Lastname Firstname
# 0 123 2017-09-12T02:23:30.817Z 123 Test Smith Bobb
# 1 abc 2017-09-13T10:10:21.598Z abc Test Smith Bobb
# 2 def 2017-09-13T10:10:21.598Z def Test Smith Bobb
For your database, consider connecting with a DB-API package such as psycopg2 or SQLAlchemy and parsing each json value as a string accordingly. Admittedly, there may be other ways to handle json, as seen in the psycopg2 docs, but the code below receives the data as text and parses it on the Python side:
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()
cur.execute("SELECT json_document::text FROM staging;")

df = pd.io.json.json_normalize([json.loads(row[0]) for row in cur.fetchall()],
                               'Identifiers', ['Firstname', 'Lastname'])
df = df.drop_duplicates(['RecordID'])

cur.close()
conn.close()
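If you want the pandas equivalent of the original count(*) > 1 filter rather than dropping duplicates, duplicated(keep=False) marks every row whose RecordID occurs more than once (a sketch building on the df above):

# Keep only rows whose RecordID appears in more than one record.
dupes = df[df.duplicated('RecordID', keep=False)]
print(dupes[['Firstname', 'Lastname', 'RecordID']])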