Handling Images in API with Python Flask/Connexion and Swagger

I tried to set up a very simple app. I wanted to build it as a full-stack app as training for future projects. So I wrote a backend in Python which provides data from a DB (SQLite) via an API (Flask/Connexion). The API is documented via Swagger. The DB should have a table where each row has 2 values:
1. name
2. images
I quickly faced a problem: I actually don't know how to handle images in APIs. Therefore I built the backend with a placeholder. Until now, images is just another string which is mostly empty. Everything works fine. But now I want to be able to receive images via the API and save them in the DB. I have absolutely no idea how to do this. Hope one of you can help me.
Here is my Code so far:
SqlliteHandler.py
import sqlite3

conn = sqlite3.connect('sprint_name.db')
c = conn.cursor()


def connect_db():
    global conn
    global c
    conn = sqlite3.connect('sprint_name.db')
    c = conn.cursor()
    c.execute("CREATE TABLE if not exists sprint_names ( name text, image text)")


def make_db_call(execute_statement, fetch_smth=""):
    global c
    connect_db()
    print(execute_statement)
    c.execute(execute_statement)
    response = ""
    if fetch_smth == "one":
        response = transform_tuple_to_dict(c.fetchone())
    if fetch_smth == "all":
        response_as_tuples = c.fetchall()
        response = []
        for sug in response_as_tuples:
            response.append(transform_tuple_to_dict(sug))
    conn.commit()
    conn.close()
    return response


def transform_tuple_to_dict(my_tuple):
    # fetchone() returns None when no row matches, so guard against that
    if my_tuple is None:
        return None
    return {"name": my_tuple[0], "image": my_tuple[1]}


def add_name(suggestion):
    name = suggestion.get("name")
    image = "" if suggestion.get("image") is None else suggestion.get("image")
    execute_statement = "SELECT * FROM sprint_names WHERE name='" + name + "'"
    print(execute_statement)
    alreadyexists = False if make_db_call(execute_statement, "one") is None else True
    print(alreadyexists)
    if not alreadyexists:
        execute_statement = "INSERT INTO sprint_names VALUES ('" + name + "', '" + image + "')"
        make_db_call(execute_statement)


def delete_name(suggestion_name):
    execute_statement = "DELETE FROM sprint_names WHERE name='" + suggestion_name + "'"
    print(execute_statement)
    make_db_call(execute_statement)


def delete_all():
    make_db_call("DELETE FROM sprint_names")


def get_all_names():
    return make_db_call("SELECT * FROM sprint_names", "all")


def get_name(suggestion_name):
    print(suggestion_name)
    execute_statement = "SELECT * FROM sprint_names WHERE name='" + suggestion_name + "'"
    print(execute_statement)
    return make_db_call(execute_statement, "one")


def update_image(suggestion_name, suggestion):
    new_name = suggestion.get("name")
    new_image = "" if suggestion.get("image") is None else suggestion.get("image")
    execute_statement = "UPDATE sprint_names SET name='" + new_name + "', image='" + new_image + "' WHERE name='" \
                        + suggestion_name + "'"
    make_db_call(execute_statement)
RunBackEnd.py
from flask import render_template
import connexion

# Create the application instance
app = connexion.App(__name__, specification_dir='./')

# Read the swagger.yml file to configure the endpoints
app.add_api('swagger.yml')


# Create a URL route in our application for "/"
@app.route('/')
def home():
    """
    This function just responds to the browser URL
    localhost:5000/
    :return: the rendered template 'home.html'
    """
    return render_template('home.html')


# If we're running in stand-alone mode, run the application
if __name__ == '__main__':
    app.run(port=5000)
Swagger.yml
swagger: "2.0"
info:
  description: This is the swagger file that goes with our server code
  version: "1.0.0"
  title: Swagger REST Article
consumes:
  - "application/json"
produces:
  - "application/json"

basePath: "/api"

# Paths supported by the server application
paths:
  /suggestions:
    get:
      operationId: SqlliteHandler.get_all_names
      tags:
        - suggestions
      summary: The names data structure supported by the server application
      description: Read the list of names
      responses:
        200:
          description: Successful read names list operation
          schema:
            type: array
            items:
              properties:
                name:
                  type: string
                image:
                  type: string
    post:
      operationId: SqlliteHandler.add_name
      tags:
        - suggestions
      summary: Create a name and add it to the names list
      description: Create a new name in the names list
      parameters:
        - name: suggestion
          in: body
          description: Suggestion you want to add to the sprint
          required: True
          schema:
            type: object
            properties:
              name:
                type: string
                description: Name you want to submit
              image:
                type: string
                description: path to the picture of that name
      responses:
        201:
          description: Successfully created name in list
  /suggestions/{suggestion_name}:
    get:
      operationId: SqlliteHandler.get_name
      tags:
        - suggestions
      summary: Read one name from the names list
      description: Read one name from the names list
      parameters:
        - name: suggestion_name
          in: path
          description: name of the sprint name to get from the list
          type: string
          required: True
      responses:
        200:
          description: Successfully read name from names list operation
          schema:
            type: object
            properties:
              name:
                type: string
              image:
                type: string
    put:
      operationId: SqlliteHandler.update_image
      tags:
        - suggestions
      summary: Update an image in the suggestion list via the name of the suggestion
      description: Update an image in the suggestion list
      parameters:
        - name: suggestion_name
          in: path
          description: Suggestion you want to edit
          type: string
          required: True
        - name: suggestion
          in: body
          schema:
            type: object
            properties:
              name:
                type: string
              image:
                type: string
      responses:
        200:
          description: Successfully updated suggestion in suggestion list
    delete:
      operationId: SqlliteHandler.delete_name
      tags:
        - suggestions
      summary: Delete a suggestion via its name from the suggestion list
      description: Delete a suggestion
      parameters:
        - name: suggestion_name
          in: path
          type: string
          required: True
      responses:
        200:
          description: Successfully deleted a suggestion from the list

To save an image in SQLite (not that it's recommended; it is better to save the image as a file and store the path in the DB), you save it as an array of bytes (storage type BLOB; note that the column does not have to be defined as a BLOB).
In SQL you specify an array of bytes as a hex string. So you read your image and build a hex string.
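For illustration, here is a minimal sketch of building such a hex literal with Python 3's bytes.hex() (the file name is just an example); as the documentation quoted below explains, bound parameters are the better route for real data:
# Read the raw image bytes and format them as an SQLite hex literal, e.g. x'fffe0045...'
with open("image.png", "rb") as f:
    hex_literal = "x'" + f.read().hex() + "'"

sql = "INSERT INTO sprint_names (name, image) VALUES ('SPRINT001', " + hex_literal + ")"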
Noting (from the SQLite documentation on limits):
Maximum length of a string or BLOB
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000). You can raise or lower this value at compile-time using a command-line option like this:
-DSQLITE_MAX_LENGTH=123456789
The current implementation will only support a string or BLOB length up to 2^31-1 or 2147483647. And some built-in functions such as hex() might fail well before that point. In security-sensitive applications it is best not to try to increase the maximum string and blob length. In fact, you might do well to lower the maximum string and blob length to something more in the range of a few million if that is possible.
During part of SQLite's INSERT and SELECT processing, the complete content of each row in the database is encoded as a single BLOB. So the SQLITE_MAX_LENGTH parameter also determines the maximum number of bytes in a row.
The maximum string or BLOB length can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_LENGTH,size) interface.
Also noting:
Maximum Length Of An SQL Statement
The maximum number of bytes in the text of an SQL statement is limited to SQLITE_MAX_SQL_LENGTH which defaults to 1000000. You can redefine this limit to be as large as the smaller of SQLITE_MAX_LENGTH and 1073741824.
If an SQL statement is limited to be a million bytes in length, then obviously you will not be able to insert multi-million byte strings by embedding them as literals inside of INSERT statements. But you should not do that anyway. Use host parameters for your data. Prepare short SQL statements like this:
INSERT INTO tab1 VALUES(?,?,?);
Then use the sqlite3_bind_XXXX() functions to bind your large string values to the SQL statement. The use of binding obviates the need to escape quote characters in the string, reducing the risk of SQL injection attacks. It also runs faster since the large string does not need to be parsed or copied as much.
The maximum length of an SQL statement can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_SQL_LENGTH,size) interface.
The resultant SQL would be along the lines of :-
INSERT INTO mytable (myimage) VALUES (x'fffe004577aabbcc33f1f8');
As a demo using your table (slightly modified to include the "correct" column type BLOB, which makes little difference) :-
DROP TABLE If EXISTS sprint_names;
CREATE TABLE if not exists sprint_names ( name text, image text, altimage BLOB);
INSERT INTO sprint_names VALUES
('SPRINT001',x'fffe004577aabbcc33f1f8',x'fffe004577aabbcc33f1f8'), -- obviously image would be larger
('SPRINT002',x'99008877665544332211f4d6e9c2aaa8b7b4',x'99008877665544332211f4d6e9c2aaa8b7b4')
;
SELECT * FROM sprint_names;
The result would be as shown in the screenshot of the query output (not reproduced here).
Note: Navicat was used to run/test the above. BLOBs are inherently difficult to display, hence the way they are rendered. However, what is shown is that the above obviously stores and retrieves the data.
As previously stated, it's much simpler to just store the path to the image file; when it boils down to it, there is likely very little need for the image as data. You're unlikely to be querying the data that the image is comprised of, whilst using naming standards could allow useful searches/queries of a stored name/path.
However, in contradiction of the above, SQLite can, in some circumstances (images with an average size of around 100k or less, maybe more), allow faster access than the file system; see the SQLite article "35% Faster Than The Filesystem".
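To tie that back to the question's SqlliteHandler.py, here is a minimal sketch of the bind-parameter approach using Python's built-in sqlite3 module (the helper names save_image/load_image are hypothetical; the sprint_names table is the one from the question). With ? placeholders the image bytes never have to be turned into a hex literal or escaped by hand:
import sqlite3


def save_image(db_path, name, image_path):
    # Hypothetical helper: store one image as a BLOB via bound parameters.
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    conn = sqlite3.connect(db_path)
    try:
        # The ? placeholders bind the bytes directly as a BLOB value,
        # so no hex string or quote escaping is needed.
        conn.execute(
            "INSERT INTO sprint_names (name, image) VALUES (?, ?)",
            (name, image_bytes),
        )
        conn.commit()
    finally:
        conn.close()


def load_image(db_path, name):
    # Hypothetical helper: read the BLOB back as bytes (None if no such row).
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT image FROM sprint_names WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()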

Related

Intake: catalogue level parameters

I am reading about "parameters" here and wondering whether I can define catalogue level parameters that I can later use in the definition of the catalogue's sources?
Consider a simple YAML-catalogue with two sources:
sources:
  data1:
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
  data2:
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
Note that both data sources (data1 and data2) make use of the snapshot_date parameter inside the urlpath argument. With this definition I can load data sources with:
cat = intake.open_catalog("./catalog.yaml")
cat.data1(snapshot_date="latest").read() # reads from data/latest/data1.csv
cat.data2(snapshot_date="20211029").read() # reads from data/20211029/data2.csv
Please note that cat.data1().read() will not work, since snapshot_date defaults to an empty string, so the csv driver cannot find the path "./data//data1.csv".
I can set the default value by adding a parameters section to every (!) source, as shown below.
sources:
  data1:
    parameters:
      snapshot_date:
        type: str
        default: "latest"
        description: ""
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
  data2:
    parameters:
      snapshot_date:
        type: str
        default: "latest"
        description: ""
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
But this looks complicated (too much repetitive code) and a little inconvenient for the end user -- if a user wants to load all data sources from a given date, they have to explicitly provide the snapshot_date parameter to every (!) data source at initialization. IMO, it would be nice if the user could provide this value once, when initializing the catalog.
Is there a way I can define the snapshot_date parameter at the catalog level? So that:
I can set a default value (e.g. "latest" in my example) in the YAML definition of the catalogue's parameter,
or I can pass the catalogue's parameter value at runtime during the call intake.open_catalog("./catalog.yaml", snapshot_date="20211029"),
and this value is accessible in the definition of the data sources of this catalog?
cat = intake.open_catalog("./catalog.yaml", snapshot_date="20211029")
cat.data1.read() # will return data from ./data/20211029/data1.csv
cat.data2.read() # will return data from ./data/20211029/data2.csv
cat.data2(snapshot_date="latest").read() # will return data from ./data/latest/data2.csv
cat = intake.open_catalog("./catalog.yaml")
cat.data1.read() # will return data from ./data/latest/data1.csv
cat.data2.read() # will return data from ./data/latest/data2.csv
Thanks in advance
This idea has been suggested before ( https://github.com/intake/intake/pull/562 , https://github.com/intake/intake/issues/511 ), and I have an inkling that maybe https://github.com/zillow/intake-nested-yaml-catalog supports something like you are asking.
However, I fully support adding this functionality in Intake, either based on #562, above, or otherwise. Adding it to the base Catalog and YAML file(s) catalog should be easy, but doing it so that it works for all subclasses might be tricky.
Currently, you can achieve what you want using environment variables, e.g., "{{snapshot_date}}" -> "{{env(SNAPSHOT_DATE)}}", but you would need to communicate to the user that this variable should be set. In addition, if the value is not to be used within a string, you would still need a parameter definition to cast it to the right type.
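For example, a minimal sketch of that environment-variable workaround, assuming the catalog's urlpath is changed to use {{env(SNAPSHOT_DATE)}} (the variable name is just a convention you would have to document for your users):
import os
import intake

# The user sets the variable once, before the catalog is opened.
os.environ["SNAPSHOT_DATE"] = "20211029"

cat = intake.open_catalog("./catalog.yaml")
df1 = cat.data1.read()  # would resolve to data/20211029/data1.csv
df2 = cat.data2.read()  # would resolve to data/20211029/data2.csv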
This is a bit of a hack, but consider a yaml file with this content:
global_params:
  snapshot_date: &global
    default: latest
    description: ''
    type: str
sources:
  data1:
    args:
      urlpath: '{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv'
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
    parameters:
      snapshot_date: *global
  data2:
    args:
      urlpath: '{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv'
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
    parameters:
      snapshot_date: *global
Now Intake will accept a keyword argument for snapshot_date for the specific sources.
Some relevant answers: 1 and 2.

How to get a CSV string from querying a relational DB?

I'm querying a relational database and I need the result as a CSV string. I can't save it on disk, as this is running in a serverless environment (I don't have access to disk).
Any idea?
My solution was using PyGreSQL library and defining this function:
import pg


def get_csv_from_db(query, cols):
    """
    Given the SQL #query and the expected #cols,
    a string formatted CSV (containing headers) is returned
    :param str query:
    :param list of str cols:
    :return str:
    """
    connection = pg.DB(
        dbname=my_db_name,
        host=my_host,
        port=my_port,
        user=my_username,
        passwd=my_password)

    header = ','.join(cols) + '\n'

    records_list = []
    for row in connection.query(query).dictresult():
        record = []
        for c in cols:
            record.append(str(row[c]))
        records_list.append(",".join(record))

    connection.close()
    return header + "\n".join(records_list)
Unfortunately this solution expects the column names as input (which is not too bad IMHO) and iterates over the dictionary result in Python code.
Other solutions (especially out-of-the-box ones) using other packages are more than welcome.
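For instance, here is a minimal sketch that skips the hand-built string and uses the standard library's csv module with io.StringIO and any DB-API cursor (the cursor/connection setup is assumed to exist already):
import csv
import io


def rows_to_csv(cursor, query):
    # Run the query and stream the result into an in-memory CSV buffer.
    cursor.execute(query)
    buf = io.StringIO()
    writer = csv.writer(buf)
    # cursor.description carries the column names, so no cols argument is needed.
    writer.writerow([desc[0] for desc in cursor.description])
    writer.writerows(cursor.fetchall())
    return buf.getvalue()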
This is another solution, based on psycopg2 and pandas:
import psycopg2
import pandas as pd


def get_csv_from_db(query):
    """
    Given the SQL #query a string formatted CSV (containing headers) is returned
    :param str query:
    :return str:
    """
    conn = psycopg2.connect(
        dbname=my_db_name,
        host=my_host,
        port=my_port,
        user=my_username,
        password=my_password)

    cur = conn.cursor()
    cur.execute(query)

    df = pd.DataFrame(cur.fetchall(), columns=[desc[0] for desc in cur.description])

    cur.close()
    conn.commit()

    return df.to_csv()
I haven't had a chance to test it yet, though.
Here is a different approach from the other answers, using pandas.
I suppose you have a database connection already; for example, I'm using an Oracle database, and the same can be done by using the respective library for your relational DB.
Only these 2 lines do the trick:
df = pd.read_sql(query, con)
df.to_csv("file_name.csv")
Here is a full example using Oracle database:
import cx_Oracle
import pandas as pd

dsn = cx_Oracle.makedsn(ip, port, service_name)
con = cx_Oracle.connect("user", "password", dsn)

query = """select * from YOUR_TABLE"""

df = pd.read_sql(query, con)
df.to_csv("file_name.csv")
PyGreSQL's Cursor has the method copy_to. It accepts as stream a file-like object (which must have a write() method). io.StringIO meets this condition and does not need access to disk, so it should be possible to do:
import io
csv_io = io.StringIO()
# here connect to your DB and get cursor
cursor.copy_to(csv_io, "SELECT * FROM table", format="csv", decode=True)
csv_io.seek(0)
csv_str = csv_io.read()
Explanation: many Python modules accept a file-like object, meaning you can use io.StringIO() or io.BytesIO() in place of true file handles. These mimic files opened in text and bytes mode respectively. As with files there is a read position, so I seek back to the beginning after usage. The last line creates csv_str, which is just a plain str. Remember to adjust the SQL query to your needs.
Note: I have not tested the above code; please try it yourself and write back if it works as intended.

How to make a hard-coded HTTP processing script dynamic?

I have a Jython 2.7 script that receives a URL and uses the parameters/values in the URL to create or update records.
Example URL: http://server:host/maximo/oslc/script/CREATEWO?&wonum=WO0001&description=Legacy&classstructureid=1666&wopriority=1&worktype=CM
Details:
Receive the URL and put the parameters/values in variables:
from psdi.server import MXServer
from psdi.mbo import MboSet
resp = {}
wonum = request.getQueryParam("wonum")
description = request.getQueryParam("description")
classstructureid = request.getQueryParam("classstructureid")
wopriority = request.getQueryParam("wopriority")
worktype = request.getQueryParam("worktype")
Some lines that aren't relevant to the question:
woset = MXServer.getMXServer().getMboSet("workorder",request.getUserInfo())
whereClause = "wonum= '" + wonum + "'"
woset.setWhere(whereClause)
woset.reset()
woMbo = woset.moveFirst()
Then use the values to either create a new record or update an existing record:
# If workorder already exists, update it:
if woMbo is not None:
    woMbo.setValue("description", description)
    woMbo.setValue("classstructureid", classstructureid)
    woMbo.setValue("wopriority", wopriority)
    woMbo.setValue("worktype", worktype)

    woset.save()
    woset.clear()
    woset.close()

    resp[0] = 'Updated workorder ' + wonum

# Else, create a new workorder
else:
    woMbo = woset.add()

    woMbo.setValue("wonum", wonum)
    woMbo.setValue("description", description)
    woMbo.setValue("classstructureid", classstructureid)
    woMbo.setValue("wopriority", wopriority)
    woMbo.setValue("worktype", worktype)

    woset.save()
    woset.clear()
    woset.close()

    resp[0] = 'Created workorder ' + wonum

responseBody = resp[0]
Question:
Unfortunately, the field names/values are hardcoded in 3 different places in the script.
I would like to enhance the script so that it is dynamic -- not hardcoded.
In other words, it would be great if the script could accept a list of parameters/values and simply loop through them to update or create a record in the respective fields.
Is it possible to do this?
You're using the Maximo Next Gen. REST API to execute an automation script that accepts an HTTP request with parameters and creates or updates a Work Order in the system. You want to make your script more generic (presumably to accept more parameters for the created/updated work order) and/or to handle other MBOs.
This can be achieved without developing automation scripts and just using the Next Gen. API you're already using to execute the script. The API already accepts create & update requests on the mxwo object structure with the ability to use all the fields, child objects, etc.
https://developer.ibm.com/static/site-id/155/maximodev/restguide/Maximo_Nextgen_REST_API.html#_creating_and_updating_resources
Assuming you are always working with the same query parameters, rather than defining individual variables, loop through a list of strings and collect them as key-value pairs.
To populate
items = ["wonum", "description"]
resp = {k: request.getQueryParam(k) for k in items}
Then to set
for i in items:
    woMbo.setValue(i, resp[i])
Otherwise, you are looking for URL parsing and the getQuery method, followed by a split("="), giving you ["wonum", "WO0001", "description", "Legacy"], for example, and you can loop over every other element to get your dynamic entries:
l = ["wonum", "WO0001", "description", "Legacy"]

for i in range(0, len(l) - 1, 2):
    print('key:{}\tvalue:{}'.format(l[i], l[i + 1]))  # str.format, since Jython 2.7 has no f-strings
key:wonum value:WO0001
key:description value:Legacy
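Building on that, here is a small sketch (a hypothetical continuation, reusing setValue as in the original script) that turns the flat key/value list into a dict and applies every field in one loop:
# Turn ["wonum", "WO0001", "description", "Legacy"] into a dict and apply
# every parsed field, instead of hard-coding each setValue call.
params = dict(zip(l[0::2], l[1::2]))  # {"wonum": "WO0001", "description": "Legacy"}
for field, value in params.items():
    woMbo.setValue(field, value)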
Note: this line is subject to SQL injection attacks and should be fixed:
whereClause = "wonum= '" + wonum + "'"

Python Flask Swagger Flasgger Download Excel

I am trying to return an Excel file from a Swagger API, built using Flask with a Swagger wrapper, Flasgger. Here's the code:
@app.route('/cluster', methods=['POST'])
def index():
    """
    This API will help you generate clusters based on keywords present in unstructured text
    Call this api passing the following parameters -
        Dataset Path - <hostname>\\<path to dataset>
        Column Name based on which clustering needs to be done
        Number of Clusters
    Sample URL: http://localhost:8180/cluster/clusters.csv?dataset=\\\\W1400368\\c$\\Users\\VK046010\\Documents\\Python%20Scripts\\RevCycle_PatientAcc.csv&ext=csv&col=SR_SUM_TXT&no_of_clusters=100
    ---
    tags:
      - Clustering API
    parameters:
      - name: dataset
        in: formData
        type: file
        required: true
        description: The fully qualified path of the dataset without the extension.
      - name: col
        in: query
        type: string
        required: true
        description: The column name on which the clustering needs to be done
      - name: no_of_clusters
        in: query
        type: integer
        required: true
        description: The number of clusters
    """
    global data
    data = data.fillna('NULL')

    output = StringIO.StringIO()
    data.to_csv(output, index=False)

    resp = Response(output.getvalue(), mimetype="text/csv")
    resp.headers["Accept"] = "text/csv"
    resp.headers['Access-Control-Allow-Origin'] = '*'
    resp.headers["Content-Disposition"] = "attachment; filename=clusters.csv"
    return resp
This returns a downloadable link which I have to rename to .csv to make it work.
Question: I am not able to do this for Excel files. No matter how I do it, once I download and rename, Excel says the file is corrupt and that's that.
I tried pyexcel and the pandas Excel writer; neither worked out. Please help!
To download Excel via Flasgger, you can change the response type to "application/octet-stream" to resolve it.
Did you try to change the mimetype? I think that the traditional mimetype for Excel is application/vnd.ms-excel.
You can find more details on mimetypes for Microsoft file formats here: What is a correct mime type for docx, pptx etc?
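As a hedged sketch of that idea, the workbook can be built entirely in memory with pandas and returned with the modern .xlsx mimetype rather than the legacy application/vnd.ms-excel (the helper name, sheet name and filename are just examples; openpyxl is assumed to be installed):
import io

import pandas as pd
from flask import Response


def excel_response(df):
    # Build the workbook in memory so no disk access or manual rename is needed.
    buf = io.BytesIO()
    with pd.ExcelWriter(buf, engine="openpyxl") as writer:
        df.to_excel(writer, index=False, sheet_name="clusters")
    buf.seek(0)

    resp = Response(
        buf.getvalue(),
        mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    )
    resp.headers["Content-Disposition"] = "attachment; filename=clusters.xlsx"
    return resp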

couchdb-python change notifications

I'm trying to use couchdb-python to create and update databases. I'd like to implement change notifications, preferably in continuous mode. Running the test code posted below, I don't see how the changes scheme works within Python.
import time

import couchdb
from couchdb.mapping import Document, IntegerField, TextField


class SomeDocument(Document):
    #############################################################################
    # def __init__ (self):
    intField = IntegerField()  # for now - this should be an integer
    textField = TextField()


couch = couchdb.Server('http://127.0.0.1:5984')
databasename = 'testnotifications'

if databasename in couch:
    print 'Deleting then creating database ' + databasename + ' from server'
    del couch[databasename]
    db = couch.create(databasename)
else:
    print 'Creating database ' + databasename + ' on server'
    db = couch.create(databasename)

for iii in range(5):
    doc = SomeDocument(intField=iii, textField='somestring' + str(iii))
    doc.store(db)
    print doc.id + '\t' + doc.rev

something = db.changes(feed='continuous', since=4, heartbeat=1000)

for iii in range(5, 10):
    doc = SomeDocument(intField=iii, textField='somestring' + str(iii))
    doc.store(db)
    time.sleep(1)

print something
print db.changes(since=iii - 1)
The value
db.changes(since=iii-1)
returns information that is of interest, but in a format from which I haven't worked out how to extract the sequence or revision numbers, or the document information:
{u'last_seq': 6, u'results': [{u'changes': [{u'rev': u'1-9c1e4df5ceacada059512a8180ead70e'}], u'id': u'7d0cb1ccbfd9675b4b6c1076f40049a8', u'seq': 5}, {u'changes': [{u'rev': u'1-bbe2953a5ef9835a0f8d548fa4c33b42'}], u'id': u'7d0cb1ccbfd9675b4b6c1076f400560d', u'seq': 6}]}
Meanwhile, the code I'm really interested in using:
db.changes(feed='continuous',since=4,heartbeat=1000)
Returns a generator object and doesn't appear to provide notifications as they come in, as the CouchDB guide suggests ....
Has anyone used changes in couchdb-python successfully?
I use long polling rather than continuous, and that works OK for me. In long-polling mode db.changes blocks until at least one change has happened, and then returns all the changes in a generator object.
Here is the code I use to handle changes. settings.db is my CouchDB Database object.
since = 1
while True:
    changes = settings.db.changes(since=since)
    since = changes["last_seq"]
    for changeset in changes["results"]:
        try:
            doc = settings.db[changeset["id"]]
        except couchdb.http.ResourceNotFound:
            continue
        else:
            pass  # process doc
As you can see it's an infinite loop where we call changes on each iteration. The call to changes returns a dictionary with two elements, the sequence number of the most recent update and the objects that were modified. I then loop through each result loading the appropriate object and processing it.
For a continuous feed, instead of the while True: line use for changes in settings.db.changes(feed="continuous", since=since).
I set up a mail spooler using something similar to this. You'll also need to load couchdb.Session(). I also use a filter for only receiving unsent emails on the spooler changes feed.
from couchdb import Server

s = Server('http://localhost:5984/')
db = s['testnotifications']

# the since parameter defaults to 'last_seq' when using continuous feed
ch = db.changes(feed='continuous', heartbeat='1000', include_docs=True)

for line in ch:
    doc = line['doc']
    # process doc here
    doc['priority'] = 'high'
    doc['recipient'] = 'Joe User'
    # doc['state'] + 'sent'
    db.save(doc)
This will allow you to access your doc directly from the changes feed, manipulate your data as you see fit, and finally update your document. I use a try/except block around the actual 'db.save(doc)' so I can catch when a document has been updated while I was editing and reload the doc before saving.
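For example, a minimal sketch of that retry pattern (the helper name is hypothetical; couchdb.http.ResourceConflict is the exception couchdb-python raises on a revision conflict):
import couchdb


def save_with_retry(db, doc):
    # Try to save; if the document changed underneath us, reload and reapply.
    try:
        db.save(doc)
    except couchdb.http.ResourceConflict:
        fresh = db[doc["_id"]]
        # Copy over everything except _id/_rev, then save the fresh revision.
        fresh.update({k: v for k, v in doc.items() if not k.startswith("_")})
        db.save(fresh)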
