Sharing large sqlalchemy objects between flask templates - python

I have a flask application, which I want to use to do some data visualization of a very large dataset that I have in a postgresql database.
I have sqlalchemy, plotly, google maps set up, but my current problem is that I want to fetch a rather large dataset from my database and pass that to a template that will take care of doing some different visualizations. I only want to load the data in once, when the site is started, and then avoid running it later on when navigating the other pages.
Which methods can I use to achieve this? And how can I access the fetched data from the other pages (templates)?
I read somewhere about session and tried to implement such a functionality, but then I got a "not serializable" error, which I fixed by converting my object to JSON, but then I got a "too large header" error. I feel like I'm sort of out of possibilities..
Just to show some code, here is what I'm currently messing around with.
Models.py:
from sqlalchemy import BigInteger, Column, JSON, Text
from app import db


class Table910(db.Model):
    __tablename__ = 'table_910'

    id = db.Column(BigInteger, primary_key=True)
    did = db.Column(Text)
    timestamp = db.Column(BigInteger)
    data = db.Column(JSON)
    db_timestamp = db.Column(BigInteger)

    def items(self):
        return {'id': self.id, 'did': self.did, 'timestamp': self.timestamp,
                'data': self.data, 'db_timestamp': self.db_timestamp}

    def __repr__(self):
        return '<1_Data %r, %r>' % (self.did, self.id)
Views.py:
from flask import render_template, session
from app import app, db, models
from datetime import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import json
import plotly


@app.route('/')
@app.route('/index')
def index():
    # fetching from the database (takes ~50 seconds to run)
    rows = models.Table910.query \
        .filter_by(did='357139052424715') \
        .filter((models.Table910.timestamp > 1466846920000) & (models.Table910.timestamp < 1467017760000)) \
        .order_by(models.Table910.timestamp.asc())

    #
    # managing and drawing data on plots
    #

    return render_template('index.html',
                           ids=ids,
                           graphJSON=graphJSON,
                           rows=rows,
                           routeCoordinates=routeCoordinates)
EDIT:
I might have explained my problem a little too vaguely, so to elaborate: I want to use a specific page to load the data, and then have another page take care of drawing the graphs. The best procedure I can think of currently would be to press a button to load the data and, once the data is loaded, press another button to draw the graphs. How can I achieve this?

Related

Import a JSON project-wide, so it loads just once

I have a Python project that performs a JSON validation against a specific schema.
It will run as a Transform step in GCP Dataflow, so it's very important that all dependencies are gathered before the run to avoid downloading the same file again and again.
The schema is placed in a separated Git repository.
The nature of the Transformer is that you receive a single record in your class and you work with it. The typical flow is that you load the JSON schema, you validate the record against it, and then you do stuff with the invalid and with the valid ones. Loading the schema in this way means that I download it from the repo for every record, and there could be hundreds of thousands of them.
The code gets "cloned" onto the workers, which then work more or less independently.
Inspired by the way Python loads the requirements at the beginning (one single time) and using them as imports, I thought I could add the repository (where the JSON schema lives) as a Python requirement, and then simply use it in my Python code. But of course, it's a JSON, not a Python module to be imported. How can it work?
An example would be something like:
requirements.txt
git+git://github.com/path/to/json/schema#41b95ec
dataflow_transformer.py
import apache_beam as beam
import the_downloaded_schema
from jsonschema import validate


class Verifier(beam.DoFn):
    def process(self, record: dict):
        validate(instance=record, schema=the_downloaded_schema)
        # ... more stuff
        yield record


class Transformer(beam.PTransform):
    def expand(self, record):
        return (
            record
            | "Verify Schema" >> beam.ParDo(Verifier())
        )
You can load the json schema once and use it as a side input.
An example:
import json
import requests

import apache_beam as beam

json_current = 'https://covidtracking.com/api/v1/states/current.json'


def get_json_schema(url):
    with requests.Session() as session:
        schema = json.loads(session.get(url).text)
        return schema


schema_json = get_json_schema(json_current)


def feed_schema(data, schema):
    yield {'record': data, 'schema': schema[0]}


# p is an existing beam.Pipeline
schema = p | beam.Create([schema_json])
data = p | beam.Create(range(10))

data_with_schema = data | beam.FlatMap(feed_schema, schema=beam.pvalue.AsSingleton(schema))

# Now do your schema validation
Just a demonstration of what the data_with_schema pcollection looks like
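Roughly, each element pairs one value from range(10) with the first entry of the fetched JSON (the exact schema contents depend on the live response):
{'record': 0, 'schema': {...first item of current.json...}}
{'record': 1, 'schema': {...first item of current.json...}}
...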
Why don't you just use a class for loading your resources that uses a cache in order to prevent double loading? Something along the lines of:
import os


class JsonLoader:
    def __init__(self):
        self.cache = set()

    def import_(self, filename):  # 'import' is a reserved keyword, so the method needs a different name
        filename = os.path.abspath(filename)
        if filename not in self.cache:
            self._load_json(filename)
            self.cache.add(filename)

    def _load_json(self, filename):
        ...
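A usage sketch, assuming _load_json parses the file and stores the result somewhere accessible (the file name below is hypothetical):
loader = JsonLoader()
loader.import_('schemas/record_schema.json')  # parses and caches on the first call
loader.import_('schemas/record_schema.json')  # skipped, already in the cache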

Stuck implementing elasticsearch in a flask app

I'm currently working through the Flask Mega-Tutorial (Part XVI) and have gotten stuck implementing elasticsearch. Specifically, I get this error when running the following from my flask shell command line:
>>> from app.search import add_to_index, remove_from_index, query_index
>>> for post in Post.query.all():
...     add_to_index('posts', post)
AttributeError: module 'flask.app' has no attribute 'elasticsearch'
I should mention that I did not implement the app restructuring from the previous lesson to use blueprints. Here's what my files look like:
__init__.py:
#
from elasticsearch import Elasticsearch
app.elasticsearch = Elasticsearch([app.config['ELASTICSEARCH_URL']]) \
    if app.config['ELASTICSEARCH_URL'] else None
config.py:
class Config(object):
    #
    ELASTICSEARCH_URL = 'http://localhost:9200'
search.py:
from flask import app


def add_to_index(index, model):
    if not app.elasticsearch:
        return
    payload = {}
    for field in model.__searchable__:
        payload[field] = getattr(model, field)
    app.elasticsearch.index(index=index, id=model.id, body=payload)


def remove_from_index(index, model):
    if not app.elasticsearch:
        return
    app.elasticsearch.delete(index=index, id=model.id)


def query_index(index, query, page, per_page):
    if not app.elasticsearch:
        return [], 0
    search = app.elasticsearch.search(
        index=index,
        body={'query': {'multi_match': {'query': query, 'fields': ['*']}},
              'from': (page - 1) * per_page, 'size': per_page})
    ids = [int(hit['_id']) for hit in search['hits']['hits']]
    return ids, search['hits']['total']['value']
I think I'm not importing elasticsearch correctly into search.py but I'm not sure how to represent it given that I didn't do the restructuring in the last lesson. Any ideas?
The correct way to write it in the search.py file should be from flask import current_app
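For example, a sketch of what add_to_index from the question looks like with that change (the other functions follow the same pattern):
from flask import current_app


def add_to_index(index, model):
    if not current_app.elasticsearch:
        return
    payload = {}
    for field in model.__searchable__:
        payload[field] = getattr(model, field)
    current_app.elasticsearch.index(index=index, id=model.id, body=payload)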
Not sure if you got this working, but the way I implemented it was by still using app.elasticsearch but instead within search.py do:
from app import app

Run Python script in Django template through a button

I have a very small project in Django where I get FX rates through a Python script (details below). These are used to convert amounts in different currencies to GBP. I would like to know how I can have a button on the website that refreshes this query, instead of having to run it manually in the IDE. Can I create a view for this? How would I show this in a template? Thanks
fx_rates.py
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE','fx_rates_project.settings')
import django
django.setup()
from fx_rates_app.models import fx_table
import pandas_datareader.data as web
from datetime import datetime
os.environ["ALPHAVANTAGE_API_KEY"] = '#########'
fx_gbp_to_eur = web.DataReader("GBP/EUR","av-forex")
eur = float(fx_gbp_to_eur[4:5].values[0][0])
fx_gbp_to_aud = web.DataReader("GBP/AUD","av-forex")
aud = float(fx_gbp_to_aud[4:5].values[0][0])
fx_gbp_to_usd = web.DataReader("GBP/USD","av-forex")
usd = float(fx_gbp_to_usd[4:5].values[0][0])
from datetime import datetime
now = datetime.now()
dt_string = now.strftime("%d/%m/%Y %H:%M:%S")
webpg1 = fx_table.objects.get_or_create(eur_to_gbp=eur,aud_to_gbp=aud,usd_to_gbp=usd,date_time=dt_string)[0]
In the template I included:
<form method="post">
    {% csrf_token %}
    <button type="submit" name="run_script">Refresh</button>
</form>
However, I don't think the best way to do this would be to copy the whole script into views.py under request.POST. Is there another way to leave the script in a separate file and just create a view to run it (e.g. if request.method == 'POST', run fx_rates.py)?
This can definitely be done by creating a function (say update_fx) to wrap your script and then calling that function in your view.
Now for the tricky part.
Do you need the values to render them or just to update them?
Rendering is easy: just return them from your function and you will be able to use them in your template. This, though, includes the waiting time to fetch those values: when your view is executed, you have to wait for the function to finish fetching before the response is sent.
The alternative is to "trigger" a second job from your view that updates the values. This pattern is called a task queue, and there are many existing solutions you can leverage, depending on how complicated things should be (what happens if the job has an error? how many times should we retry? ...). It also has the unfortunate effect that you now run a second Python process somewhere that listens for new tasks and executes them (a minimal sketch of this with Celery is included at the end of this answer).
As always, your choice is based on tradeoffs; personally, if the server has a good internet connection and the request completes fast, I would not mind putting it in my view as is.
Hope this is helpful.
Let's see an example
# update.py
from fx_rates_app.models import fx_table
import pandas_datareader.data as web
from django.conf import settings
from datetime import datetime
import os


def get_price():
    os.environ["ALPHAVANTAGE_API_KEY"] = '#########'
    fx_gbp_to_eur = web.DataReader("GBP/EUR", "av-forex")  # here we are reading stuff from the network I guess
    eur = float(fx_gbp_to_eur[4:5].values[0][0])
    fx_gbp_to_aud = web.DataReader("GBP/AUD", "av-forex")
    aud = float(fx_gbp_to_aud[4:5].values[0][0])
    fx_gbp_to_usd = web.DataReader("GBP/USD", "av-forex")
    usd = float(fx_gbp_to_usd[4:5].values[0][0])
    now = datetime.now()
    dt_string = now.strftime("%d/%m/%Y %H:%M:%S")

    # here you have two choices:
    # option 1 - return the raw values ...
    return {
        'timestamp': dt_string,
        'value': 1,
        'currency': 'gbp',
        'eur': eur,
        'aud': aud,
        'usd': usd,
    }

    # option 2 - ... or put them into a model (remove the return above to use this)
    webpg1 = fx_table.objects.get_or_create(eur_to_gbp=eur, aud_to_gbp=aud, usd_to_gbp=usd, date_time=dt_string)[0]
    return webpg1
# views.py
from django.http import HttpResponse
from fx_rates_app.models import fx_table
from update import get_price


def refresh_from_values(request):
    data = get_price()  # here we don't have a model, just data
    webpg1 = fx_table.objects.get_or_create(
        eur_to_gbp=data['eur'],
        aud_to_gbp=data['aud'],
        usd_to_gbp=data['usd'],
        date_time=data['timestamp'],
    )[0]
    # here you render the model using the template


def refresh_from_model(request):
    fx = get_price()  # here we get back the model instead
    # you can render the model again, as you can refresh the page
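To wire the template's Refresh button to one of these views, a minimal sketch of the URL configuration (the path, chosen view, and URL name here are assumptions, not part of the original answer):
# urls.py (hypothetical wiring)
from django.urls import path
from . import views

urlpatterns = [
    path('refresh/', views.refresh_from_values, name='refresh_rates'),
]
The form in the template can then post to {% url 'refresh_rates' %}, and the view can guard the work with if request.method == 'POST' before calling get_price().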
In the future you might be interested in running this thing continuously, and then you would have to move to an event-driven app, which is more complicated but can be very satisfying to get right. In Django everything would start with signals, and most likely you would have to write some JavaScript to update part of the page without refreshing it.
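For the task-queue alternative mentioned earlier, a minimal sketch using Celery (Celery is not part of the original answer; it is assumed to be installed and configured for the Django project):
# tasks.py (hypothetical)
from celery import shared_task

from update import get_price


@shared_task
def refresh_rates():
    # runs in a worker process, so the web request can return immediately
    return get_price()
The view would then call refresh_rates.delay() instead of get_price() directly, and the page can show the new rates on a later refresh.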

After insert, transform and insert into another table

I have two tables, Document and Picture. The relationship is that one document can have several pictures. What should happen is that once a document is uploaded to PostgreSQL, it should be downloaded, transformed into JPEG images, and then uploaded to the Picture table.
I'm using sqlalchemy and flask in my application.
So far I tried using events to trigger a method after insert. Unfortunately, I am receiving the error sqlalchemy.exc.ResourceClosedError: This transaction is closed when I commit.
The code:
from app.models.ex_model import Document, Picture
from pdf2image import convert_from_bytes
from sqlalchemy import event
import io
import ipdb
from app.core.app_setup import db


@event.listens_for(Document, 'after_insert')
def receive_after_insert(mapper, connection, target):
    document = target.document
    images = convert_from_bytes(document, fmt='jpeg')
    images_bytes = map(lambda img: convert_img_to_byte(img), images)
    map(lambda img_byte: upload_picture(img_byte, target.id), images_bytes)
    db.session.commit()


def convert_img_to_byte(img):
    img_byte = io.BytesIO()
    img.save(img_byte, format='jpeg')
    img_byte = img_byte.getvalue()
    return img_byte


def upload_picture(img_byte, document_id):
    picture = Picture(picture=img_byte, document_id=document_id)
    db.session.add(picture)
As the documentation for Session.add states:
"Its state will be persisted to the database on the next flush operation. Repeated calls to add() will be ignored."
So your add() calls should be followed by a session.flush() call:
...
def upload_picture(img_byte, document_id):
    picture = Picture(picture=img_byte, document_id=document_id)
    db.session.add(picture)
    db.session.flush()
Furthermore, I would pay attention to the performance of inserting records. There's a good article on this point in the official docs: https://docs.sqlalchemy.org/en/13/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
The current approach is not the fastest one, so I would choose either the SQLAlchemy ORM bulk-insert or the SQLAlchemy Core approach.
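A minimal sketch of the Core-style insert mentioned above, reusing the names from the question (whether the column names match depends on how Picture is defined):
# inside receive_after_insert: a single executemany instead of per-row session.add()
connection.execute(
    Picture.__table__.insert(),
    [{'picture': img_byte, 'document_id': target.id} for img_byte in images_bytes]
)
Using the connection passed to the event listener also avoids committing the session from inside the flush, which is a likely cause of the ResourceClosedError in the question.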

Creating a pandas dataframe from a JSON request

I am trying to create a machine learning application with Flask. I have created a POST API route that takes the data and transforms it into a pandas dataframe. Here is the Flask code that calls the Python function to transform the data.
from flask import Flask, abort, request
import json
import mlFlask as ml
import pandas as pd

app = Flask(__name__)


@app.route('/test', methods=['POST'])
def test():
    if not request.json:
        abort(400)
    print type(request.json)
    result = ml.classification(request.json)
    return json.dumps(result)
This is the file that contains the helper function.
import pandas as pd


def jsonToDataFrame(data):
    print type(data)
    df = pd.DataFrame.from_dict(data, orient='columns')
    return df
But I am getting an import error. Also, when I print it, the type of data is dict, so I don't know why it would cause an issue. It works when I orient the dataframe by index, but it doesn't work by column.
ValueError: If using all scalar values, you must pass an index
Here is the body of the request in JSON format.
{
    "updatedDate": "2012-09-30T23:51:45.778Z",
    "createdDate": "2012-09-30T23:51:45.778Z",
    "date": "2012-06-30T00:00:00.000Z",
    "name": "Mad Max",
    "Type": "SBC",
    "Org": "Private",
    "month": "Feb"
}
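For reference, a minimal sketch (not from the original post) of the difference described above, using a trimmed-down version of this payload:
import pandas as pd

data = {"name": "Mad Max", "Type": "SBC", "Org": "Private", "month": "Feb"}

pd.DataFrame.from_dict(data, orient='index')    # works: keys become the index, values form a single column
pd.DataFrame.from_dict(data, orient='columns')  # raises the ValueError: every value is a scalar, so pandas needs an index
pd.DataFrame([data])                            # one workaround: wrap the dict in a list to get a single-row frame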
What am I doing wrong here?
