Requests/urllib not working in Flask/Apache/mod_wsgi/Windows

Requests/urllib not working in Flask/Apache/mod_wsgi/Windows - python

I have a Flask app with code that processes data coming from a request to another web app hosted on a different server, and it works just fine in development, furthermore, the library that processes the request can be called and used perfectly fine from python in our Windows server... However when the library is called by the webapp in production using mod_wsgi it refuses to work, requests made by the server just... time out.
I have tried everything from moving my code to the file it's used in, to switching from requests to urllib... nothing, so long as they're made from mod_wsgi all requests I make time out.
Why is that? is it some weird apache configuration thing that I'm unaware of?
I'm posting the library below (sorry I have to censor it up a bit, but I promise it works)
import requests
import re
class CannotAccessServerException(Exception):
pass
class ServerItemNotFoundException(Exception):
pass
class Service():
REQUEST_URL = "http://server-ip/url?query={query}&toexcel=csv"
#classmethod
def fetch_info(cls, query):
# Get Approximate matches
try:
server_request = requests.get(cls.REQUEST_URL.format(query = query), timeout = 30).content
except:
raise CannotAccessServerException
# If you're getting ServerItemNotFoundException or funny values consistently maybe the server has changed their tables.
server_regex = re.compile('^([\d\-]+);[\d\-]+;[\d\-]+;[\d\-]+;[\d\-]+;[\-"\w]+;[\w"\-]+;{query};[\w"\-]+;[\w"\-]+;[\w"\-]+;[\w"\-]+;[\w\s:"\-]+;[\w\s"\-]+;[\d\-]+;[\d\-]+;[\d\-]+;([\w\-]+);[\w\s"\-]+;[\w\-]+;[\w\s"\-]+;[\d\-]+;[\d\-]+;[\d\-]+;([\w\-]+);[\d\-]+;[\d\-]+;[\w\-]+;[\w\-]+;[\w\-]+;[\w\-]+;[\w\s"\-]+$'.format(query = query), re.MULTILINE)
server_exact_match = server_regex.search(server_request.decode())
if server_exact_match is None:
raise ServerItemNotFoundException
result_json = {
"retrieved1": server_exact_match.group(1),
"retrieved2": server_exact_match.group(2),
"retrieved3": server_exact_match.group(3)
}
return result_json
if __name__ == '__main__':
print(Service.fetch_info(99999))
PS: I know it times out because one of the things I tried was capturing the error raised by requests.get and returning its repr esentation.

In case anybody's wondering, after lots of research, trying to run my module as a subprocess, and all kinds of experiments, I had to resort to replicating the entirety of the dataset I needed to query from the remote server to my database with a weekly crontab task and then querying that.
So... Yeah, I don't have a solution, to be frank, or an explanation of why this happens. But if this is happening to you, your best bet might sadly be replicating the entire dataset on your server.

Related

Running my Python script 24/7

Really newbie questions.
I made a Python bot which receives some data and has to analyze it,then prints everything. To use it, i need it to run for the whole day, the problem is that i can't leave my computer on 24/7, so i need a server or something similar for it and i need to be able to check what it prints whenever i want.
I made some research and found Heroku, but i'm having some problems understanding it: i tried to deploy it there and it's working but it prints all the stuff on a shell in the app's page and not on the webpage that heroku assigned to my app, so my problem is partially solved, since i can run it for the whole day but checking what it prints is way harder.
I was thinking of making it as a Telegram bot in order to have everything there but since it prints a lot of stuff, Telegram would not be the best platform for this kind of thing.
Is there another resource to deploy it and have it, for example, on a webpage?

You can look into renting a cheapish cloud server (from digitalocean for example).
There are multiple ways of transferring data from your python script to your bot, either directly, through a websocket, or a webpage that displays it in a JSON format or otherwise.
Since you're already using python you could look into running a flask app on your node alongside your script or even combine them together.
If ran separately you could modify your script to output it's content into a file and then read the file with your flask application to display it on a webpage. For example:
with open('/tmp/data.txt', 'w') as f:
f.write(yourdata)
then in your flask application:
from flask import Flask
app = Flask(__name__)
#app.route('/')
def show_data():
with open('/tmp/data.txt', 'r') as f:
data = f.read()
return data
There are way more efficient ways of transferring data. Example above is a quick and dirty solution I wouldn't recommend running it due to security reasons especially if you are transmitting sensitive data.

How can I test the functions in this Flask app- write unittests?

POOL = redis.ConnectionPool(host='localhost', port=6379, db=0)
app = Flask(__name__)
#app.route('/get_cohort_curve/', methods=['GET'])```
def get_cohort_curve():
curve = str(request.args.get('curve'))
cohort = str(request.args.get('cohort'))
key = curve+cohort
return get_from_redis(key)
def get_from_redis(key):
try:
my_server = redis.Redis(connection_pool=POOL)
return json.dumps(my_server.get(key))
except Exception, e:
logging.error(e)
app.run()
I need to write unit-tests for this.
How do I test just the route, i.e. a get request goes to the right place?
Do I need to create and destroy instances of the app in the test for each function?
Do I need to create a mock redis connection?

If you are interested in running something in Flask, you could create a virtual environment and test the whole shebang, but in my opinion that is THE HARDEST way to do it.
When I built my site installing Redis locally, setting the port and then depositing some data inside it with an appropriate key was essential. I did all of my development in iPython (jupyter) notebooks so that I could test the functions and interactions with Redis before adding the confounding layer of Flask.
Then you set up a flawless template, solid HTML around it and CSS to style the site. If it works without data as an html page, then you move on to the python part of Flask.
Solidify the Flask directories. Make sure that you house them in a virtual environment so that you can call them from your browser and it will treat your virtual environment as a server.
You create your app.py application. I created each one of the page functions one at a time. tested to see that it was properly pushing the variables to post on the page and calling the right template. After you get on up and running right, with photos and data, then at the next page's template, using #app.route
Take if very slow, one piece at a time with debugging on so that you can see where when and how you are going wrong. You will only get the site to run with redis server on and your virtual environment running.
Then you will have to shut down the VE to edit and reboot to test. At first it is awful, but over time it becomes rote.
EDIT :
If you really want to test just the route, then make an app.py with just the #app.route definition and return just the page (the template you are calling). You can break testing into pieces as small as you like, but you have to be sure that the quanta you pick are executable as either a python script in a notebook or commandline execution or as a compact functional self-contained website....unless you use the package I mentioned in the comment: Flask Unit Testing Applications
And you need to create REAL Redis connections or you will error out.

Bottle with CherryPy does not behave multi-threaded when same end-point is called

As far as I know Bottle when used with CherryPy server should behave multi-threaded. I have a simple test program:
from bottle import Bottle, run
import time
app = Bottle()
#app.route('/hello')
def hello():
time.sleep(5)
#app.route('/hello2')
def hello2():
time.sleep(5)
run(app, host='0.0.0.0', server="cherrypy", port=8080)
When I call localhost:8080/hello by opening 2 tabs and refreshing them at the same time, they don't return at the same time but one of them is completed after 5 seconds and the other is completed after 5 more seconds.
But when I call /hello in one tab and /hello2 in another at the same time they finish at the same time.
Why does Bottle not behave multi-threaded when the same end-point is called twice? Is there a way to make it multi-threaded?
Python version: 2.7.6
Bottle version: 0.12.8
CherryPy version: 3.7.0
OS: Tried on both Ubuntu 14.04 64-Bit & Windows 10 64-Bit

I already met this behaviour answering one question and it had gotten me confused. If you would have searched around for related questions the list would go on and on.
The suspect was some incorrect server-side handling of Keep-Alive, HTTP pipelining, cache policy or the like. But in fact it has nothing to do with server-side at all. The concurrent requests coming to the same URL are serialised because of a browser cache implementation (Firefox, Chromium). The best answer I've found before searching bugtrackers directly, says:
Necko's cache can only handle one writer per cache entry. So if you make multiple requests for the same URL, the first one will open the cache entry for writing and the later ones will block on the cache entry open until the first one finishes.
Indeed, if you disable cache in Firebug or DevTools, the effect doesn't persist.
Thus, if your clients are not browsers, API for example, just ignore the issue. Otherwise, if you really need to do concurrent requests from one browser to the same URL (normal requests or XHRs) add random query string parameter to make request URLs unique, e.g. http://example.com/concurrent/page?nocache=1433247395.

It's almost certainly your browser that's serializing the request. Try using two different ones, or better yet a real client. It doesn't reproduce for me using curl.

Web application: Hold large object between requests

I'm working on a web application related to genome searching. This application makes use of this suffix tree library through Cython bindings. Objects of this type are large (hundreds of MB up to ~10GB) and take as long to load from disk as it takes to process them in response to a page request. I'm looking for a way to load several of these objects once on server boot and then use them for all page requests.
I have tried using a remote manager / client setup using the multiprocessing module, modeled after this demo, but it fails when the client connects with an error message that says the object is not picklable.

I would suggest writing a small Flask (or even raw WSGI… But it's probably simpler to use Flask, as it will be easier to get up and running quickly) application which loads the genome database then exposes a simple API. Something like this:
app = Flask(__name__)
database = load_database()
#app.route('/get_genomes')
def get_genomes():
return database.all_genomes()
app.run(debug=True)
Or, you know, something a bit more sensible.
Also, if you need to be handling more than one request at a time (I believe that app.run will only handle one at a time), start by threading… And if that's too slow, you can os.fork() after the database is loaded and run multiple request handlers from there (that way they will all share the same database in memory).

Keeping concurrency in web.py applications on mod_wsgi

Sorry if this makes no sense. Please comment if clarification is needed.
I'm writing a small file upload app in web.py which I am deploying using mod_wsgi + apache. I have been having a problem with my session management and would like clarification on how the threading works in web.py.
Essentially I embed a code in a hidden field of the html page I render when someone accesses my page. The file upload is then done via a standard POST request containing both the file and the code. Then I retrieve the progress of the file by updating it in the file upload POST method and grabbing it with a GET request to a different class. The 'session' (apologies for it being fairly naive) is stored in a session object like this:
class session:
def __init__(self):
self.progress = 0
self.title = ""
self.finished = False
def advance(self):
self.progress = self.progress + 1
The sessions are all kept in a global dictionary within my app script and then accessed with my code (from earlier) as the key.
For some reason my progress seems to stay at 0 and never increments. I've been debugging for a couple hours now and I've found that the two session objects referenced from the upload class and the progress class are not the same. The two codes, however, are (as far as I can tell) equal. This is driving me mad as it worked without any problems on the web.py test server on my local machine.
EDIT: After some research it seems that the dictionary may get copied for every request. I've tried putting the dictionary in another and importing but this doesn't work. Is there some other way short of using a database to 'seperate' the sessions dictionary?

Apache/mod_wsgi can run in multiprocess configurations and possible your requests aren't even being serviced by the same process and never will if for that multiprocess configuration each process is single thread because while the upload is occuring no other requests can be handled by that same process. Read:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Possibly you should use mod_wsgi daemon mode with single multiple thread daemon process.

From PEP 333, defining WSGI:
Servers that can run multiple requests in parallel, should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server
Check the documentation of your WSGI server.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.