I want to write some unit tests for an application that uses MySQL. However, I do not want to connect to a real MySQL database, but rather to a temporary one that doesn't require any SQL server at all.
Is there any library for this (I could not find anything on Google)? Any design pattern? Note that dependency inversion (DIP) doesn't work, since I would still have to test the injected class.
There isn't a good way to do that. You want to run your queries against a real MySQL server; otherwise you don't know whether they will work or not.
However, that doesn't mean you have to run them against a production server. We have scripts that create a unit-test database and then tear it down once the unit tests have run. That way we don't have to maintain a static test database, but we still get to test against the real server.
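As a rough sketch of that pattern (using Python's unittest and the PyMySQL driver; the credentials, database name, and schema below are assumptions):

    # Sketch: create a throwaway test database before the suite and drop it afterwards.
    # Assumes a local MySQL server and the PyMySQL driver; credentials and schema are placeholders.
    import unittest
    import pymysql

    TEST_DB = "myapp_unittest"

    class MySQLTestCase(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            cls.admin = pymysql.connect(host="localhost", user="root", password="secret")
            with cls.admin.cursor() as cur:
                cur.execute("CREATE DATABASE " + TEST_DB)
            cls.conn = pymysql.connect(host="localhost", user="root",
                                       password="secret", database=TEST_DB)
            with cls.conn.cursor() as cur:
                cur.execute("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50))")
            cls.conn.commit()

        @classmethod
        def tearDownClass(cls):
            cls.conn.close()
            with cls.admin.cursor() as cur:
                cur.execute("DROP DATABASE " + TEST_DB)
            cls.admin.close()

        def test_insert_and_select(self):
            with self.conn.cursor() as cur:
                cur.execute("INSERT INTO users VALUES (1, 'alice')")
                cur.execute("SELECT name FROM users WHERE id = 1")
                self.assertEqual(cur.fetchone()[0], "alice")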
I've used python-mock and mox for such purposes (extremely lightweight tests that absolutely cannot require ANY SQL server), but for more extensive/in-depth testing, starting and populating a local instance of MySQL isn't too bad either.
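For the mock-based approach, a minimal sketch with the standard unittest.mock module (the myapp.db module and its get_user_name() function are hypothetical, just to show the shape of such a test):

    # Sketch: stub out the database driver entirely so no SQL server is needed.
    # myapp.db and get_user_name() are hypothetical names, for illustration only.
    import unittest
    from unittest import mock

    from myapp import db  # hypothetical module under test

    class GetUserNameTest(unittest.TestCase):
        def test_returns_name_from_cursor(self):
            fake_cursor = mock.MagicMock()
            fake_cursor.fetchone.return_value = ("alice",)
            fake_conn = mock.MagicMock()
            fake_conn.cursor.return_value = fake_cursor

            # Patch whatever function myapp.db uses to open a connection.
            with mock.patch("myapp.db.connect", return_value=fake_conn):
                name = db.get_user_name(user_id=1)

            self.assertEqual(name, "alice")
            fake_cursor.execute.assert_called_once()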
Unfortunately SQLite's SQL dialect and MySQL's differ enough that trying to use the former for tests is somewhat impractical, unless you're using some ORM (Django, SQLObject, SQLAlchemy, ...) that can hide the dialect differences.
I am attempting to make a small mafia-style game, and I am using Replit. Would there be a way to use a PHP server (or an HTML server) as a database that I can connect to from a Python project?
Neither HTML nor PHP is a database. The "LAMP" stack uses PHP with MySQL / MariaDB for its database, which might be what you're referring to... However, the "P" there could also be Python ¯\_(ツ)_/¯
What you need in Python is a data-persistence layer, which could just be a simple CSV / JSON file; however, the pickle module is easier for working with native Python types.
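For example, a tiny sketch of pickle-based persistence (the file name and data shape are arbitrary):

    # Sketch: persist native Python objects to a file with pickle.
    import pickle

    STATE_FILE = "game_state.pkl"  # arbitrary file name

    def save_state(state):
        with open(STATE_FILE, "wb") as f:
            pickle.dump(state, f)

    def load_state():
        try:
            with open(STATE_FILE, "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            return {}  # no saved state yet

    players = load_state()
    players["alice"] = {"role": "mafia", "alive": True}
    save_state(players)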
The sqlite3 module can be used if you want the data to be more portable to other frameworks/languages.
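The sqlite3 equivalent might look roughly like this (the table layout is just an example):

    # Sketch: the same idea with the standard-library sqlite3 module; data lives in one file.
    import sqlite3

    conn = sqlite3.connect("game.db")
    conn.execute("CREATE TABLE IF NOT EXISTS players (name TEXT PRIMARY KEY, role TEXT, alive INTEGER)")
    conn.execute("INSERT OR REPLACE INTO players VALUES (?, ?, ?)", ("alice", "mafia", 1))
    conn.commit()

    for name, role, alive in conn.execute("SELECT name, role, alive FROM players"):
        print(name, role, bool(alive))
    conn.close()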
And the final option is to actually run your own database server externally and expose it over a remote TCP / HTTP API connection (I don't think Repl.it supports running Docker containers).
If you have access to the actual machine, you can run something like SQLite on it. You are running dangerously close to needing more security and whatnot, though.
Security is important, but if you just want to "play around", running something like SQLite should be enough for a first pass.
I am working on a system where a bunch of modules connect to an MS SQL Server database to read/write data. Each of these modules is written in a different language (C#, Java, C++), as each language serves the purpose of its module best.
My question, however, is about the DB connectivity. As of now, all these modules use the language-specific SQL connectivity API to connect to the DB. Is this a good way of doing it?
Or alternatively, is it better to have a Python (or some other scripting language) script take over the responsibility of connecting to the DB? The modules would then send input parameters and the name of a stored procedure to the Python script, and the script would run it on the database and send the output back to the respective module.
Are there any advantages of the second method over the first?
Thanks for helping out!
If we assume that each language you use will have an optimized set of classes to interact with databases, then there shouldn't be a real need to pass all database calls through a centralized module.
Using a "middle-ware" for database manipulation does offer a very significant advantage. You can control, monitor and manipulate your database calls from a central and single location. So, for example, if one day you wake up and decide that you want to log certain elements of the database calls, you'll need to apply the logical/code change only in a single piece of code (the middle-ware). You can also implement different caching techniques using middle-ware, so if the different systems share certain pieces of data, you'd be able to keep that data in the middle-ware and serve it as needed to the different modules.
The above is a very advanced edge case and it's not commonly used in small applications, so please evaluate the need for it in your specific application and decide if that's the best approach.
Doing things the way you do them now is fine (if we follow the above assumption) :)
I have a Pylons application using SQLAlchemy with SQLite as the backend. I would like to know whether every read operation going to SQLite always leads to a hard disk read (which is very slow compared to RAM), or whether some caching mechanisms are already involved.
Does SQLite maintain a subset of the database in RAM for faster access?
Can the OS (Linux) do that automatically ?
How much speedup could I expect by using a production database (MySQL or PostgreSQL) instead of SQLite?
Yes, SQLite has its own memory cache. Check PRAGMA cache_size, for instance. Also, if you're looking for speedups, check PRAGMA temp_store. There is also an API for implementing your own cache.
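For example, those pragmas can be set per connection from Python (the values here are only illustrative):

    # Sketch: adjusting SQLite's page cache and temp storage from Python.
    # A negative cache_size is interpreted as kibibytes; the numbers are illustrative.
    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA cache_size = -64000")   # roughly 64 MB of page cache for this connection
    conn.execute("PRAGMA temp_store = MEMORY")   # keep temporary tables/indices in RAM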
The SQLite database is just a file to the OS. Nothing is 'automatically' done for it. To ensure caching does happen, there are sqlite.h defines and runtime pragma settings.
It depends, there are a lot of cases when you'll get a slowdown instead.
How much speedup could I expect by using a production database (MySQL or Postgres) instead of SQLite?
Are you using SQLite in a production server environment? You probably shouldn't be:
From Appropriate Uses For SQLite:
SQLite will normally work fine as the database backend to a website. But if your website is so busy that you are thinking of splitting the database component off onto a separate machine, then you should definitely consider using an enterprise-class client/server database engine instead of SQLite.
SQLite was not designed for, and was never intended to, scale well; it trades performance for convenience. If performance is a concern, you should consider another DBMS.
I have an application that needs to interface with another app's database. I have read access but not write.
Currently I'm using SQL statements via pyodbc to grab the rows and using Python to manipulate the data. Since I don't cache anything, this can be quite costly.
I'm thinking of using an ORM to solve my problem. The question is: if I use an ORM like SQLAlchemy, would it be smart enough to pick up changes in the other database?
E.g. SQLAlchemy accesses a table and retrieves a row. If that row got modified outside of SQLAlchemy, would it be smart enough to pick up the change?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Edit: To be more clear
I have one application that is simply a reporting tool; let's call it App A.
I have another application, App B, that handles various financial transactions.
A has access to B's database to retrieve the transactions and generates various reports. There are hundreds of thousands of transactions. We're currently caching this info manually in Python; if we need an updated report, we refresh the cache. If we get rid of the cache, the SQL queries combined with the calculations become unscalable.
I don't think an ORM is the solution to your performance problem. By default, ORMs tend to be less efficient than raw SQL because they might fetch data that you're not going to use (e.g. doing a SELECT * when you need only one field), although SQLAlchemy allows fine-grained control over the SQL generated.
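For example, a sketch of that fine-grained control (SQLAlchemy 2.x style; the model, columns, and connection URL are made up):

    # Sketch: fetching only the columns a report needs instead of SELECT *.
    # The Transaction model and the connection URL are hypothetical.
    from sqlalchemy import Column, Integer, Numeric, String, create_engine
    from sqlalchemy.orm import Session, declarative_base, load_only

    Base = declarative_base()

    class Transaction(Base):
        __tablename__ = "transactions"
        id = Column(Integer, primary_key=True)
        amount = Column(Numeric)
        description = Column(String)

    engine = create_engine("mssql+pyodbc://user:pass@AppB_DSN")  # assumed connection URL

    with Session(engine) as session:
        # Only id and amount are pulled from the database, not every column.
        rows = (
            session.query(Transaction)
            .options(load_only(Transaction.id, Transaction.amount))
            .all()
        )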
Now to implement a caching mechanism, depending on your application, you could use a simple dictionary in memory or a specialized system such as memcached or Redis.
To keep your cached data relatively fresh, you can poll the source at regular intervals, which might be OK if your application can tolerate a little delay. Otherwise you'll need the application that has write access to the db to notify your application or your cache system when an update occurs.
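A bare-bones version of such a polling cache might look like this (the refresh interval and the fetch function are placeholders for the real query code):

    # Sketch: an in-memory cache that re-reads the source at a fixed interval.
    # CACHE_TTL and fetch_transactions_from_db() are placeholders.
    import time

    CACHE_TTL = 300  # seconds the cached report data is considered fresh
    _cache = {"data": None, "loaded_at": 0.0}

    def fetch_transactions_from_db():
        ...  # run the expensive pyodbc/SQL query against app B's database here

    def get_transactions():
        now = time.time()
        if _cache["data"] is None or now - _cache["loaded_at"] > CACHE_TTL:
            _cache["data"] = fetch_transactions_from_db()
            _cache["loaded_at"] = now
        return _cache["data"]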
Edit: since you seem to have control over app B, and you've already got a cache system in app A, the simplest way to solve your problem is probably to create a callback in app A that app B can call to expire cached items. Both apps need to agree on a convention to identify cached items.
There's an API for Twisted apps to talk to a database in a scalable way: twisted.enterprise.adbapi
The confusing thing is, which database to pick?
The database will be used by a Twisted app that is mostly making inserts and updates and relatively few selects, and then other strictly read-only clients that access the database directly, making selects.
(The read-only users are not necessarily selecting the data that the Twisted app is inserting; it's not as though the database is being used as a message queue.)
My understanding - which I'd like corrected/advised on - is that:
Postgres is a great DB, but almost all the Python bindings - and there is a confusing maze of them - are abandonware
There is psycopg2 for Postgres, but that makes a lot of noise about doing its own connection pooling and such; does this coexist gracefully/usefully/transparently with the Twisted async database connection pooling?
SQLite is a great database for little things, but if used in a multi-user way it does whole-database locking, so performance would suck in the usage pattern I envisage; it also has different mechanisms for typing column values?
MySQL - after the Oracle takeover, who'd want to adopt it now or adopt a fork?
Is there anything else out there?
Scalability
twisted.enterprise.adbapi isn't necessarily an interface for talking to databases in a scalable way. Scalability is a problem you get to solve separately. The only thing twisted.enterprise.adbapi really claims to do is let you use DB-API 2.0 modules without the blocking that normally implies.
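In other words, adbapi wraps an ordinary DB-API 2.0 module in a thread pool and hands you Deferreds; a minimal sketch (the driver name, credentials, and query are assumptions):

    # Sketch: non-blocking access to a DB-API 2.0 module via adbapi's thread pool.
    from twisted.enterprise import adbapi
    from twisted.internet import reactor

    dbpool = adbapi.ConnectionPool("psycopg2", dbname="mydb", user="me", password="secret")

    def got_rows(rows):
        for row in rows:
            print(row)

    d = dbpool.runQuery("SELECT id, name FROM widgets WHERE price > %s", (10,))
    d.addCallback(got_rows)
    d.addErrback(lambda failure: failure.printTraceback())
    d.addBoth(lambda _: reactor.stop())

    reactor.run()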
Postgres
Yes. This is the correct answer. I don't think all of the Python bindings are abandonware - psycopg2, for example, seems to be actively maintained. In fact, they just added some new bindings for async access, which Twisted might eventually offer an interface to.
SQLite3 is pretty cool too. You might want to make it possible to use either Postgres or SQLite3 in your app; your unit tests will definitely be happier running against SQLite3, for example, even if you want to deploy against Postgres.
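One way to keep that flexibility is to pick the DB-API module from configuration, so tests run against SQLite while deployment uses Postgres; a sketch, with the environment variable and connection details as assumptions:

    # Sketch: choose the DB-API module from the environment so unit tests can use SQLite
    # while production uses Postgres. The variable name and credentials are assumptions.
    import os
    from twisted.enterprise import adbapi

    if os.environ.get("USE_SQLITE"):
        # sqlite3 objects to cross-thread use by default, hence check_same_thread=False.
        dbpool = adbapi.ConnectionPool("sqlite3", "test.db", check_same_thread=False)
    else:
        dbpool = adbapi.ConnectionPool("psycopg2", dbname="mydb", user="me", password="secret")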
Other?
It's hard to know if another database entirely (something non-relational, perhaps) would fit your application better than Postgres. That depends a lot on the specific data you're going to be storing and the queries you need to run against it. If there are interesting relationships in your database, Postgres does seem like a pretty good answer. If all your queries look like "SELECT foo, bar FROM baz" though, there might be a simpler, higher performance option.
There is the txpostgres library, which is a drop-in replacement for twisted.enterprise.adbapi: instead of a thread pool and blocking DB I/O, it is fully asynchronous, leveraging the built-in async capabilities of psycopg2.
We are using it in production in a big corporation and it's been serving us very well so far. Also, it's actively developed—a bug we reported recently was solved very quickly.
http://pypi.python.org/pypi/txpostgres
https://github.com/wulczer/txpostgres
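Typical usage looks roughly like this (the connection string and query are placeholders):

    # Sketch: fully asynchronous queries with txpostgres; connection details are placeholders.
    from twisted.internet import reactor
    from txpostgres import txpostgres

    conn = txpostgres.Connection()
    d = conn.connect("dbname=mydb user=me password=secret")

    # runQuery returns a Deferred that fires with the result rows; no thread pool involved.
    d.addCallback(lambda _: conn.runQuery("SELECT id, name FROM widgets"))
    d.addCallback(lambda rows: print(rows))
    d.addErrback(lambda failure: failure.printTraceback())
    d.addBoth(lambda _: reactor.stop())

    reactor.run()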
You could look at NoSQL databases like MongoDB or CouchDB with Twisted.
Scaling out could be rather easier with NoSQL databases than with MySQL or Postgres.