As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I want to ask you what programming language I should use to develop a horizontally scalable database. I don't care too much about performance.
Currently, I only know PHP and Python, but I wonder if Python is good for scalability.
Or is this even possible in Python?
The reason I don't use an existing system is that I need deep insight into the system, and no database out there can store indexes the way I want. (It's a mix of non-relational, sparse free multidimensional, and graph design.)
EDIT:
I already have most of the core code written in Python, and I have investigated ways to improve adding data for that type of database design, which limits the use of other databases even more.
EDIT 2:
Forgot to note, the database tables are several hundred gigabytes.
The development of a scalable database is language-independent. I cannot say much about PHP, but I can tell you good things about Python: it's easy to read, easy to learn, and so on. In my opinion it makes the code much cleaner than other languages do.
Between PHP and Python, definitely Python. Where I work, the entire system is written in Python and it scales quite well.
P.S.: Do take a look at MongoDB, though.
You're looking for MongoDB.
MongoDB has some excellent Python drivers. It is a joy to work with.
Since this is clearly a request for "opinion", I thought I'd offer my $.02
We looked at MongoDB 12 months ago and started to really like it, but for one issue. MongoDB limits the largest database to the amount of physical RAM installed on the MongoDB server. For our tests, this meant we were limited to 4 GB databases. This didn't fit our needs, so we walked away (too bad really, because Mongo looked great).
We moved back to home turf, and went with PostgreSQL for our project. It is an exceptional system, with lots to like.
But we've kept an eye on the NoSQL crowd ever since, and it looks like Riak is doing some really interesting work.
(FYI: it's also possible the MongoDB project has resolved the DB size issue. We haven't kept up with that project.)
Related
Closed 10 years ago.
I have huge tables of data that I need to manipulate (sort, calculate new quantities, select specific rows according to some conditions, and so on). So far I have been using spreadsheet software to do the job, but this is really time-consuming and I am trying to find a more efficient way.
I use Python but I could not figure out how to use it for such things. Can anybody suggest something to use? SQL?!
This is a very general question, but there are several things you can do to make your life easier.
1. CSV: CSV files are very useful if you are storing data that is ordered in columns and you want easy-to-read text files.
2. SQLite: SQLite is a database system that does not require a server (it uses a file instead) and is interacted with just like any other database system. However, it is not recommended for very large projects that handle massive amounts of data.
3. MySQL: MySQL is a database system that requires a server, but it can be tuned for very large projects as well as small ones.
There are many other types of systems, though, so I suggest you search around and find the one that fits best. If you want to experiment with SQLite or CSV, both the sqlite3 and csv modules are supplied in the standard library with Python 2.7 and 3.x, I believe.
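As a minimal sketch of the CSV route, the standard csv module can cover the operations from the question (calculating a new quantity, selecting rows, sorting). The column names and the data here are invented for illustration, and the in-memory text stands in for a real file:

```python
import csv
import io

# Hypothetical data; in practice this would be open("data.csv", newline="").
raw = io.StringIO(
    "name,mass,volume\n"
    "a,10.0,2.0\n"
    "b,3.0,1.5\n"
    "c,8.0,4.0\n"
)

# Each row becomes a dict keyed by the header line; all values are strings.
rows = list(csv.DictReader(raw))

# Calculate a new quantity for every row.
for row in rows:
    row["density"] = float(row["mass"]) / float(row["volume"])

# Select rows by a condition, then sort by the new column (largest first).
selected = [r for r in rows if float(r["mass"]) > 5.0]
selected.sort(key=lambda r: r["density"], reverse=True)

names = [r["name"] for r in selected]
print(names)
```

This keeps everything in memory, so it suits tables that fit in RAM; for larger tables a database is probably the better fit.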
You will probably appreciate the sqlite3 module in the Python standard library:
http://docs.python.org/library/sqlite3.html
You get a SQL database that's stored in a file on disk, with no need to configure a separate database server. It's not appropriate for multiple clients accessing at once, but for a single-threaded analysis application like yours, it's a good fit.
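A short sketch of what that could look like for sorting, selecting rows, and calculating a new quantity. The table name, columns, and values are made up for the example, and ":memory:" is used only to keep it self-contained; a filename would store the database on disk:

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; pass a filename to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (name TEXT, mass REAL, volume REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("a", 10.0, 2.0), ("b", 3.0, 1.5), ("c", 8.0, 4.0)],
)
conn.commit()

# Select rows by a condition, compute a new quantity, and sort, all in one query.
query = """
    SELECT name, mass / volume AS density
    FROM samples
    WHERE mass > ?
    ORDER BY density DESC
"""
result = conn.execute(query, (5.0,)).fetchall()
print(result)
conn.close()
```

The `?` placeholder lets sqlite3 handle quoting of the parameter, which is generally safer than building the SQL string by hand.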
Closed 10 years ago.
Do you know what the speed differences are between:
pickle
shelve
sqlite
some MySQL connector
MongoDB
...
If I wanted to store many dicts, which would be the preferred way, and what are the differences?
I don't want to drag out the comments, so I'll answer.
Given what you have said:
No server, no existing code. I only want to write a program that locally stores string to string dicts in a file or whatever. No more fancy than that.
I would say your best bet for something so simple is probably something like JSON.
However, if you need it to be super-fast, it may not be the best solution (or it may be; I honestly don't know how it performs in comparison). It's simple, and there are implementations of it for most platforms, which covers a lot of the ground you want. If you want the best speed possible, my advice would be to test it; that's the only way you'll know for sure. Of course, simple is usually a good sign for speed.
You haven't given enough information to know how important performance is here. Remember, unless you need the performance (provably) then don't bother optimising until you do. Go for something easy to read and maintain code-side, and easy to work with file-side. This is why I recommend JSON.
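A minimal sketch of the JSON approach using the standard json module. The helper names `save_dict` and `load_dict`, the file name, and the sample data are all hypothetical:

```python
import json
import os
import tempfile

def save_dict(path, data):
    """Write a string-to-string dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, sort_keys=True)

def load_dict(path):
    """Read the dict back from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "settings.json")
original = {"colour": "blue", "language": "python"}

save_dict(path, original)
restored = load_dict(path)
print(restored == original)
```

The whole dict is rewritten on every save, which is fine for small data but is worth measuring if the dicts grow large.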
For persistent string-to-string dicts, anydbm is pretty reasonable. bsddb can be used from anydbm, and is fast but a bit sensitive to being interrupted. gdbm can be used from anydbm, and is slower but not likely to yield a corrupted database.
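For reference, a small sketch of this style of storage. It assumes Python 3, where anydbm was renamed dbm and bsddb is no longer part of the standard library, so which backend gets used depends on what is installed. The path and the keys are invented for the example:

```python
import dbm
import os
import tempfile

# Hypothetical database path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "strings")

# "c" opens the database for reading and writing, creating it if needed.
with dbm.open(path, "c") as db:
    db["greeting"] = "hello"
    db["farewell"] = "goodbye"

# Reopen read-only; keys and values come back as bytes.
with dbm.open(path, "r") as db:
    greeting = db["greeting"].decode("utf-8")
    keys = sorted(k.decode("utf-8") for k in db.keys())

print(greeting, keys)
```

Unlike a JSON file, individual keys are read and written on disk without loading the whole mapping into memory.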
Also, if you want to read an entire dict into memory, make a lot of changes, and write the resulting dict back out, there's: http://stromberg.dnsalias.org/svn/dohdbm/trunk/ I'm using this one in a backup software project. It'll compress your dictionaries if you want, which can be a performance win if your I/O is particularly slow, or you have a lot of modifications to make.
Closed 10 years ago.
I am a newb coder in a startup and I am implementing search of documents in a directory in a web host.
I am comparing Lucene/Solr, Whoosh, Sphinx and Xapian. Whoosh is native Python, but I want your opinions on it too. Which of these have
mature interfaces with Python that are easy to use and install? (Whoosh is a no-brainer)
no chance of crashes, bottlenecks and other failures
the best-documented interface (I'm not reading PHP docs because the Python docs are sparse)
easiest to get up and running (only one has a quick-start tutorial)
Speaking for Apache Solr, Python has several Solr clients, which I've collected based on feedback from our customers at Websolr:
Haystack is very popular, and designed for seamless integration within Django apps. If you're developing a Django app, Haystack is for you.
Sunburnt looks to be more generic than Haystack, and is also very well documented. If you're doing plain ol' Python, Sunburnt is worth a look.
Other Python Solr clients that I've found, which seem a bit lower level...
solrpy
pysolr (I know, right?)
Insol
Some more details about how your app is built (in particular, is it a Django app?) would help narrow things down from here. Good luck finding the best fit for your app!
Use Whoosh if you don't need the speed or extra features of the alternatives. It's great, has a nice API, and has good documentation. My second choice would probably be Xapian, which is fast and has a fairly decent API. They are all fairly mature products. If you don't know what you really need, I'd just go with Whoosh for now.
If you want quick Python integration, try indextank. You can be up and running in 2 minutes, and it's free.
For the other alternatives, I'd go with Solr (provided you want to host the search servers yourself, or sign up for websolr).
Disclaimer: I work at indextank.
Closed 10 years ago.
I'm writing an app in Python and have been playing with various ORM setups and straight SQL, all of which are ugly as sin.
I have been looking at ZODB as an object store, and it looks like a promising alternative. Would you recommend it? What are your experiences, problems, and criticisms, particularly regarding the developer's perspective, scalability, integrity, long-term maintenance and alternatives? Did anyone start a project with it and ditch it? Why?
Whilst the ideas behind ZODB, Pypersyst and others are interesting, there seems to be a lack of enthusiasm around for them :(
I've used ZODB for more than ten years now, in Zope and outside. It's great if your data is hierarchical. The largest data store a customer operates has maybe, I don't know, 100 GB in it? Something on that order of magnitude anyway.
Here is a performance comparison against Postgres.
If you're writing a WSGI web app, these packages may be useful:
repoze.tm2 (docs)
repoze.zodbconn (docs)
Compared to "any key-value store", the key features for ZODB would be automatic integration of attribute changes with real ACID transactions, and clean, "arbitrary" references to other persistent objects.
The ZODB is bigger than just the FileStorage used by default in Zope:
The RelStorage backend lets you put your data in an RDBMS which can be backed up, replicated, etc. using standard tools.
ZEO allows easy scaling of appservers and off-line jobs.
The two-phase commit support allows coordinating transactions among multiple databases, including RDBMSes (assuming that they provide a TPC-aware layer).
Easy hierarchy based on object attributes or containment: you don't need to write recursive self-joins to emulate it.
Filesystem-based BLOB support makes serving large files trivial to implement.
Overall, I'm very happy using ZODB for nearly any problem where the shape of the data is not obviously "square".
I would recommend it.
I really don't have any criticisms. If it's an object store you're looking for, this is the one to use. I've stored 2.5 million objects in it before and didn't feel a pinch.
ZODB has been used for plenty of large databases.
Most ZODB users are or were probably Zope users, and they tend to migrate away from ZODB when they migrate away from Zope.
Performance is not as good as a relational database plus ORM, especially if you have lots of writes.
Long-term maintenance is not so bad. You want to pack the database from time to time, but that can be done live.
You have to use ZEO if you are going to use more than one process with your ZODB, which is quite a lot slower than using ZODB directly.
I have no idea how ZODB performs on flash disks.
With pickling you should be able to use any key value database in a similar fashion.
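To illustrate that last point, here is a rough sketch using the standard shelve module, which combines pickle with a dbm key-value file. The path and the stored record are invented for the example, and this is not ZODB's API:

```python
import os
import shelve
import tempfile

# Hypothetical store path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "objects")

# Values are pickled transparently, so nested Python structures can be stored.
with shelve.open(path) as store:
    store["alice"] = {"owner": "Alice", "balance": 100, "tags": ["new"]}

with shelve.open(path) as store:
    account = store["alice"]      # an unpickled copy of the stored object
    account["balance"] += 50
    store["alice"] = account      # changes must be written back explicitly

with shelve.open(path) as store:
    balance = store["alice"]["balance"]

print(balance)
```

The explicit write-back is the main difference in feel: a plain pickle-over-key-value store does not track attribute changes or give you transactions the way ZODB is described above.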
Closed 11 years ago.
web2py is a Python framework, but it shares the "convention over configuration" design that Ruby on Rails has. On the plus side, it packages a lot more functionality with its standard distribution, and we claim it is faster and easier to use.
Has any Rails user tried it? What is your impression?
No rants please. Just technical comments.
C'mon guys... your only arguments are "Technical differences are rather irrelevant." and "it don't matter what web framework you use"? I disagree. The size of the user base has more to do with marketing and how long a framework has been around. By that argument ASP and PHP are better than Rails.
Has anybody here used both Rails and web2py?
web2py runs on webfaction and any hosting provider that supports mod_proxy or mod_wsgi or mod_fcgi, and runs on Google App Engine (rails does not). There is also a dedicated web2py hosting provider (star-nix.com).
I found web2py much easier to learn... there are fewer scripts to run and abstractions. On the other hand, web2py's database layer isn't a real ORM... it's almost like writing raw SQL. Simple things end up taking many lines of code, just like SQL.
I would say the biggest "con" of using web2py over Rails is that there are a lot of Rails-specific hosting services around, and a huge community based around it (there are Rails plugins and tools for... everything). The same cannot be said for web2py.
It depends what you want to do with it - if it's something to write your personal site with, and you already have a server to host it on, use whatever you prefer. If it's something to distribute for others to run, Rails has more options for hosting, and a bigger community, so it may be a better choice.
Technical differences are rather irrelevant. Every framework can basically do the same (generate web-pages). What is important is community, ease of use, useful feature-sets, ability to host it and so on - and those are all really subjective.
I still use PHP quite often, not because "it's better", but because I can host it on a huge majority of web hosts. I also use Rails because it has a good and very active community. The actual technicalities of the framework were never a consideration, really.
I could probably put together a list of why web2py is "better" or "worse" than Rails (Rails may be 0.04 sec/request slower at generating templates containing loops, or web2py may have a good DB model generator, or some other technical reason), but those may not be relevant to you at all.