I'm aiming to apply the skills to hosting an app on Heroku with Python + SQLAlchemy + Xeround + Redis/Memcache.
What is the minimal software stack that I need? I'm looking at the following:
Python
A web application framework, like Flask
MySQL
MySQLdb <-- do I need this?
SQLAlchemy
It's obvious from the question that I don't know anything about SQL yet, and that it may seem preposterous to look at SQLAlchemy already. That's fine. I'm planning to learn the basics and then immediately apply them through a Python API, if "immediately" is possible at all.
What I have accomplished so far
For an idea of where I stand:
Hosted an app on Google App Engine, using my own custom Model and Property classes for the datastore, plus memcache and the task queue.
Hosted an app on Heroku, but I haven't used a database with it yet.
What I'm aiming for with this question
I want to know the software stack that I need to begin using MySQL. I just want to avoid installing stuff that I don't need.
For MySQL on Heroku, I suggest https://addons.heroku.com/cleardb
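As a rough sketch, connecting SQLAlchemy to ClearDB from a Heroku dyno could look like the following; ClearDB documents a CLEARDB_DATABASE_URL config var, but treat the variable name and the fallback URL here as assumptions to verify against your add-on:

    import os
    from sqlalchemy import create_engine, text

    # ClearDB publishes its connection string in an environment variable
    # (assumed to be CLEARDB_DATABASE_URL); fall back to a local MySQL
    # instance for development.
    url = os.environ.get('CLEARDB_DATABASE_URL',
                         'mysql://user:password@localhost/devdb')

    # This also answers the MySQLdb question: SQLAlchemy talks to MySQL
    # through a driver, and mysql:// URLs default to MySQLdb (pymysql is
    # a common alternative).
    engine = create_engine(url)

    with engine.connect() as conn:
        print(conn.execute(text("SELECT 1")).scalar())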
Related
I am trying to create a Django project which uses App Engine's task queue, and I would like to test it locally before deploying (using gcloud's dev_appserver.py).
I can't seem to find resources that help with local development, and the closest thing was a Medium article that helps with setting up Django with Datastore (https://medium.com/@bcrodrigues/quick-start-django-datastore-app-engine-standard-3-7-dev-appserver-py-way-56a0f90c53a3).
Does anyone have an example that I could look into for understanding how to start my implementation?
I don't think you can test it locally. I'd create a new app engine project and test it there. You should be able to stay within the free quota.
Once you have it working, you can write unit tests with mocks of task queue API calls.
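A minimal sketch of such a mock-based test, assuming the first-generation google.appengine.api.taskqueue API is importable; the wrapper function, URL, and queue name are hypothetical:

    import unittest
    from unittest import mock

    def enqueue_refresh():
        # Hypothetical helper that a Django view would call.
        from google.appengine.api import taskqueue
        taskqueue.add(url='/tasks/refresh', queue_name='default')

    class EnqueueTest(unittest.TestCase):
        @mock.patch('google.appengine.api.taskqueue.add')
        def test_enqueue_refresh(self, add):
            enqueue_refresh()
            add.assert_called_once_with(url='/tasks/refresh',
                                        queue_name='default')

This keeps the tests independent of dev_appserver.py entirely.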
I have a Django app just for CRUD of some daily data.
The model only has a price and a date.
I need to write some code that will automatically (daily) insert new data into my model.
I am planning to use BeautifulSoup for web page parsing.
So I have a few questions:
I am planning to use crontab (manually edited with crontab -e) to run the task once daily. Is there a smarter solution?
Should I use the Django ORM or just write SQL in a separate script?
I am looking for advice on what is better in the long run. I will have more tasks like this one.
Thanks
If you are already building supporting code in Django for your models and will be running the code on the same server your app is installed on, then you should probably use Django ORM.
See this page for help getting started writing command-line admin utilities that get run in the context of your Django app:
https://docs.djangoproject.com/en/dev/howto/custom-management-commands/
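A rough sketch of such a management command for this scraping job; the app name, model, URL, and CSS selector are all hypothetical placeholders (only the price and date fields come from the question):

    # myapp/management/commands/fetch_price.py (hypothetical path)
    import datetime

    import requests  # assumed HTTP client
    from bs4 import BeautifulSoup
    from django.core.management.base import BaseCommand

    from myapp.models import DailyPrice  # hypothetical model

    class Command(BaseCommand):
        help = "Scrape today's price and store it via the ORM"

        def handle(self, *args, **options):
            html = requests.get('https://example.com/prices').text
            soup = BeautifulSoup(html, 'html.parser')
            # Placeholder selector; adapt to the real page structure.
            price = float(soup.select_one('.price').get_text())
            DailyPrice.objects.create(price=price,
                                      date=datetime.date.today())
            self.stdout.write('stored %s' % price)

Your crontab entry then just runs python manage.py fetch_price once a day.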
This answer is more of a general architecture answer...
To start, everything can be done in django.
I would set up celery and periodic tasks: http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html
For the actual crawl, you will probably need to fan out on link discovery... you can use Celery for that too, using just the @task decorator.
Start the project using the django:// broker (the Django database transport). Once you get to size, move on to RabbitMQ.
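A minimal sketch of the periodic-task setup, assuming the django:// transport mentioned above; the task body and schedule are placeholders:

    from celery import Celery
    from celery.schedules import crontab

    app = Celery('scraper', broker='django://')

    @app.task
    def fetch_daily_prices():
        # Download and parse the page here, then save via the ORM.
        pass

    # Celery beat triggers the task every morning at 06:00.
    app.conf.beat_schedule = {
        'fetch-every-morning': {
            'task': 'scraper.fetch_daily_prices',
            'schedule': crontab(hour=6, minute=0),
        },
    }

Running celery -A scraper worker plus celery -A scraper beat then replaces the crontab entry.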
I am taking my first steps with the BlueBream framework. In my project I must get data from RDBMSes - MySQL, PostgreSQL and MS SQL Server. For now, I have worked through the simple hello-world tutorial :) I know how to write interfaces and implementations, etc.
My question is: how do I set up a connection to an RDBMS, and multiple connections? Could you give me a simple step-by-step tutorial?
Also, how do I create the database schema? Does BlueBream have something like Django's syncdb command?
Take a look at zope.sqlalchemy; it integrates SQLAlchemy into the Zope transaction manager. SQLAlchemy in turn lets you access many different databases, including MySQL, PostgreSQL and MS SQL Server.
The zope.sqlalchemy package explains how to obtain a SQLAlchemy session. From there on out you use bog-standard SQLAlchemy operations for which there are plenty of tutorials and help here on SO. SQLAlchemy can also take care of setting up the database schema for you, if that is needed.
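A rough sketch of that wiring, using the ZopeTransactionExtension hook that zope.sqlalchemy shipped at the time (newer releases expose zope.sqlalchemy.register instead); the connection URL is a placeholder:

    import transaction
    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker
    from zope.sqlalchemy import ZopeTransactionExtension

    # One engine per database; add more for PostgreSQL / MS SQL Server.
    engine = create_engine('mysql://user:password@localhost/mydb')

    # The extension joins each session to the Zope transaction manager,
    # so commit/abort follows the request transaction.
    Session = scoped_session(sessionmaker(
        bind=engine, extension=ZopeTransactionExtension()))

    session = Session()
    session.execute('SELECT 1')
    transaction.commit()

For schema creation (the syncdb question), SQLAlchemy's MetaData.create_all() plays that role.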
I am an experienced Python developer starting to work on a web service
backend system. The system feeds data (constantly) from the web to a
MySQL database. This data is later displayed by a frontend side (there
is no connection between the frontend and the backend). The backend
system constantly downloads flight information from the web (some of
the data is fetched via APIs, and some by downloading and parsing
text / xls files). I already have a script that downloads the data,
parses it, and inserts it to the MySQL db - all in a big loop. The
frontend side is just a bunch of php pages that properly display the
data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) Django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my frontend can use the API instead of actually running queries; see the sketch after this list)
3) SQLAlchemy as the DB layer (I don't like how little control you get with the Django ORM; I want to be able to run a more complex DB setup)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web (hopefully in some sort of organized task format). This is the part I am least sure of, and any pointers are appreciated.
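As a rough illustration of point 2, a django-piston handler might look like the following; Flight is a hypothetical model, and django-piston is old and unmaintained, so verify the details against its docs:

    from piston.handler import BaseHandler
    from myapp.models import Flight  # hypothetical model

    class FlightHandler(BaseHandler):
        allowed_methods = ('GET',)
        model = Flight

        def read(self, request, flight_id=None):
            # Piston serializes the return value (e.g. to JSON),
            # so the PHP frontend never touches MySQL directly.
            if flight_id is not None:
                return Flight.objects.get(pk=flight_id)
            return Flight.objects.all()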
This all sounds great. I have used Django before to write websites (i.e.
request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a constantly-feeding data backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If you are about to use SQLAlchemy, I would refrain from using Django: Django is fine if you are using the whole stack, but since you are about to rip the models layer out, I do not see much value in using it, and I would take a look at other options (perhaps Pylons or plain old CherryPy would do).
Even more so if the frontends will not run queries, but only talk to the API provider.
As for robustness, I am happier starting separate FastCGI processes with supervise and using a more lightweight web server (lighttpd / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you just like to skip the step or repeat it multiple times when source is back up?
Periodic Tasks might be good for former, while cron that would just spawn scraping tasks is better for latter.
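A sketch of the retry variant as a Celery task; the feed URL, retry policy, and parse_and_store helper are hypothetical:

    import urllib.request

    from celery import Celery

    app = Celery('feeder', broker='amqp://')  # RabbitMQ broker

    def parse_and_store(raw):
        # Hypothetical hook into the existing parse-and-insert script.
        pass

    @app.task(bind=True, max_retries=5, default_retry_delay=300)
    def pull_flight_data(self):
        try:
            data = urllib.request.urlopen('https://example.com/feed').read()
        except OSError as exc:
            # Source is down: re-queue this task in 5 minutes, up to 5 times.
            raise self.retry(exc=exc)
        parse_and_store(data)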
Is there a way to use a real database (SQLite, MySQL, or even some non-relational one) as the datastore for development, instead of the memory/file datastore that is provided?
I saw a few projects, GAE-SQLite (did not seem to be working), and one tip about accessing the production datastore using the remote API (still pretty slow for large datasets).
MongoDB works great for that. You will need:
The MongoDB stub: http://github.com/mongodb/mongo-appengine-connector
MongoDB: http://www.mongodb.org/display/DOCS/Downloads
Some code to set it up, like:

    import os

    import datastore_mongo_stub
    from google.appengine.api import apiproxy_stub_map

    # Any string works as the application id for local development.
    os.environ['APPLICATION_ID'] = 'test'

    # Register the MongoDB-backed stub in place of the default
    # file/memory datastore.
    datastore = datastore_mongo_stub.DatastoreMongoStub(
        os.environ['APPLICATION_ID'], 'woot', '', require_indexes=False)
    apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', datastore)
But if you're looking for truly faster development (like I was), the datastore is actually not the issue so much as the single-threaded web server. I tried to replace it with spawning, but that was a little too hard. You could also try to set up TyphoonAE, which mimics the App Engine stack with open alternatives.
Be aware that if you do any of these, you might lose some of the exact behavior the current tools provide, meaning that if you deploy you could get results you didn't expect. In other words: make sure you know what you're doing :-)
The Google App Engine SDK for Python now bundles support for SQLite. See the official docs for more information.
bdbdatastore is an alternative datastore backend that's considerably better than the one built into the development server, although the datastore is far from being the only problem with the dev server when it comes to handling large applications.