Querying Multiple Databases on the same server with the same table structure - python

I have a specific problem where I have to query several different databases to show results in a dashboard. The tables that I have to query in those databases are exactly the same. The number of databases can be at most 50 and at least 5.
NOTE: I can't put all the data in the same database.
I am using Postgres and Django. I am not able to work out how to query those databases to get the data. I also need to filter, aggregate, and sort the data and show 10-100 results depending on the search query params.
APPROACH that I have in mind:
Loop through all databases and fetch the data based on the search params, ordered by created date. After that, take 10-100 results as per the search params.
I am not able to work out what the correct approach should be and how it should be done considering speed and reliability.
I am open to using any other database for temporary storage, or to any other ideas.
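A minimal sketch of the loop-and-merge approach described above, assuming each database is registered under its own alias in Django's DATABASES setting and that the model (here a hypothetical Event with a created_at field) is identical in every database:

import heapq
from itertools import islice

from django.conf import settings

from myapp.models import Event  # hypothetical model; same schema in every database


def search_events(filters, limit=100):
    # Fetch at most `limit` rows from each database, newest first.
    per_db = []
    for alias in settings.DATABASES:
        qs = (Event.objects.using(alias)
              .filter(**filters)
              .order_by("-created_at")[:limit])
        per_db.append(list(qs))
    # Each list is already sorted newest-first, so merge them lazily
    # and take the first `limit` rows overall.
    merged = heapq.merge(*per_db, key=lambda e: e.created_at, reverse=True)
    return list(islice(merged, limit))

Capping each per-database query at the page size keeps the data transferred bounded at roughly 50 x limit rows in the worst case. If that is still too slow, periodically consolidating the data into one reporting database, or exposing the other databases through postgres_fdw, would avoid hitting every database on each request.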

Related

Super slow SQLAlchemy query to Google Cloud SQL on a very large table

I have a very large (and growing) table of URLs, and I want to query the table to check if an item exists and return that item so I can edit it, or choose to add a new item. The code below works but runs very slowly and, given the volume of queries I need to perform (several thousand per hour), is creating some issues. I haven't been able to find a better solution than the one below. I have a good sense of what is happening - it is loading the entire table every time - but there must be a faster way here.
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)

formatted_url = "%{}%".format(url)
matching_url = None
with Session.begin() as session:
    # LIKE with a leading wildcard cannot use a plain B-tree index
    matching_url = session.query(Link.id).filter(Link.URL.like(formatted_url)).yield_per(200).first()
This works well if the URL exists and was added recently, but if the URL isn't in the database at all, the process can take as long as one minute.
You are effectively running SELECT id FROM link WHERE url LIKE '%formatted_url%' LIMIT 1;
This requires a full table scan in the database.
If you are lucky, the rows are still in memory or in the cache.
If not, or if the URL does not exist at all, the database has to complete that full table scan.
If you are using PostgreSQL on Cloud SQL, this question will help you remediate the problem: PostgreSQL: Full Text Search - How to search partial words?
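If the lookup really needs a substring match, a trigram index is one common remedy on Postgres. A minimal sketch, assuming the table is named links with a text column url (adjust to your real schema) and that you can install the pg_trgm extension:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@host/dbname")

with engine.begin() as conn:
    # pg_trgm lets a GIN index accelerate LIKE '%...%' searches
    conn.execute(text("CREATE EXTENSION IF NOT EXISTS pg_trgm"))
    conn.execute(text(
        "CREATE INDEX IF NOT EXISTS links_url_trgm_idx "
        "ON links USING gin (url gin_trgm_ops)"
    ))

If the match is actually an exact URL lookup rather than a substring search, dropping the % wildcards and adding a plain index on the column is simpler and faster still.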

django-tables2 flooding database with queries

I'm using django-tables2 in order to show values from a database query, and everything works fine. I am now using django-debug-toolbar and was looking through my pages with it, more out of curiosity than for performance reasons. When I looked at the page with the table, I saw that the debug toolbar registered over 300 queries for a table with a little over 300 entries. I don't think flooding the DB with that many queries is a good idea, even if there is no performance impact (at least not yet). All the data should be coming from only one query.
Why is this happening, and how can I reduce the number of queries?
I'm posting this as a future reference for myself and others who might have the same problem.
After searching for a bit I found out that django-tables2 was sending a separate query for each row. The query was something like SELECT * FROM "table" LIMIT 1 OFFSET 1 with an increasing offset.
I reduced the number of SQL calls by calling query = list(query) before creating the table and passing it the query. By evaluating the query in the Python view code, the table now works with the evaluated data instead, and there is only one database call instead of hundreds.
This was a bug and has since been fixed; see https://github.com/bradleyayers/django-tables2/issues/427
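For anyone on an affected version, a minimal sketch of the workaround, assuming a hypothetical Person model and a django-tables2 table class for it:

import django_tables2 as tables
from django.shortcuts import render

from myapp.models import Person  # hypothetical model


class PersonTable(tables.Table):
    class Meta:
        model = Person


def person_list(request):
    queryset = Person.objects.all()
    rows = list(queryset)  # evaluate once: one SELECT instead of one query per row
    table = PersonTable(rows)
    return render(request, "person_list.html", {"table": table})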

Maximum number of rows in SQLite Database in Pycharm

I am trying to create an SQLite database for an app in Python.
So far my database contains one table, and I think it already contains 500 entries. I say this because that is what the built-in database tool in PyCharm tells me. I then created another table to contain the rest of the data. I think the rows did not go into that table because they weren't showing in the database tool. So I created another database to insert the rest of the data.
When I tried to delete some of the data from the first database, it was deleted but replaced with some of the data I previously thought hadn't been entered in the first place because of a 500-row limit. I did this in PyCharm, and all along it had thrown no exceptions. The driver used was the Xerial driver.
What am I doing wrong, and how can I get all the data into one table? The final table is going to have a little over 1000 entries.
SQLite has a theoretical maximum row count of 2^64 rows:
The theoretical maximum number of rows in a table is 2^64 (18446744073709551616 or about 1.8e+19). This limit is unreachable since the maximum database size of 140 terabytes will be reached first. A 140-terabyte database can hold no more than approximately 1e+13 rows, and then only if there are no indices and if each row contains very little data.
PyCharm displays database results in pages of a fixed size; use the paging controls (the left and right arrow buttons in the result page toolbar) to page through the results.
You can adjust the page size in your settings, see IDE settings -> database. I strongly suspect that the default is set to 500.
A more reliable way to count your current rows is to query the database:
SELECT COUNT(*) FROM <name_of_table>
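You can also run the count from Python instead of relying on the IDE's result grid; a minimal sketch, assuming the database file is app.db and the table is named entries:

import sqlite3

with sqlite3.connect("app.db") as conn:
    (count,) = conn.execute("SELECT COUNT(*) FROM entries").fetchone()
print("rows:", count)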

How can I query on MySQL data snapshot using Python?

I'd like to create a snapshot of a database periodically, execute some queries on the snapshot data to generate data for the next step, and finally discard the snapshot.
Currently I read and convert all the data from the database into an in-memory data structure (a Python dict) and execute the queries (implemented by my own code) against that data structure.
The program has a bottleneck at the "execute query" step now that the data size has grown.
How can I query a data snapshot elegantly? Thanks very much for any advice.
You can get all tables from your database with
SHOW TABLES FROM <yourDBname>
After that you can create copies of the tables in a new database via
CREATE TABLE copy.tableA AS SELECT * FROM <yourDBname>.tableA
Afterwards you can query the copy database instead of the real data.
If you run queries on the copied tables, please add indexes, since they are not copied.
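A sketch of automating that copy from Python, assuming the mysql-connector-python driver, a source schema named production, and a pre-created snapshot schema named snapshot (all names and credentials are placeholders):

import mysql.connector

SOURCE_DB = "production"   # placeholder
SNAPSHOT_DB = "snapshot"   # placeholder; run CREATE DATABASE snapshot; beforehand

conn = mysql.connector.connect(host="localhost", user="user", password="password")
cur = conn.cursor()

cur.execute(f"SHOW TABLES FROM {SOURCE_DB}")
tables = [row[0] for row in cur.fetchall()]

for table in tables:
    # Recreate each table in the snapshot schema from the current data.
    cur.execute(f"DROP TABLE IF EXISTS {SNAPSHOT_DB}.{table}")
    cur.execute(f"CREATE TABLE {SNAPSHOT_DB}.{table} AS SELECT * FROM {SOURCE_DB}.{table}")

cur.close()
conn.close()

To discard the snapshot afterwards, drop the copied tables (or the whole snapshot database) and recreate them on the next run.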

Large text database: Convert to SQL or use as is

My python project involves an externally provided database: A text file of approximately 100K lines.
This file will be updated daily.
Should I load it into an SQL database, and deal with the diff daily? Or is there an effective way to "query" this text file?
ADDITIONAL INFO:
Each "entry", or line, contains three fields - any one of which can be used as an index.
The update comes in the form of the entire database - I would have to generate a diff myself.
The queries are just looking up records and displaying the text.
Querying the database will be a fundamental task of the application.
How often will the data be queried? At one extreme, if it is queried only once per day, a sequential search might be more efficient than maintaining a database or an index.
For more queries and a daily update, you could build and maintain your own index for more efficient queries. Most likely, it would be worth a negligible (if any) sacrifice in speed to use an SQL database (or other database, depending on your needs) in return for simpler and more maintainable code.
What I've done before is create SQLite databases from txt files which were created from database extracts, one SQLite db for each day.
One can query across the SQLite databases to check the values, etc., and create additional tables of data.
I added an additional column of data that was the SHA1 of the text line so that I could easily identify lines that were different.
It worked in my situation and hopefully may form the barest sniff of an acorn of an idea for you.
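In the same spirit, a minimal sketch of loading the daily file into SQLite with an index on each of the three fields and a SHA1 per line for diffing; the field names and the tab delimiter are placeholders for whatever the real format is:

import hashlib
import sqlite3

conn = sqlite3.connect("daily.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS entries ("
    " field_a TEXT, field_b TEXT, field_c TEXT, line_sha1 TEXT)"
)
for col in ("field_a", "field_b", "field_c"):
    # Index every field that will be used for lookups.
    conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{col} ON entries ({col})")

rows = []
with open("database.txt", encoding="utf-8") as fh:
    for line in fh:
        line = line.rstrip("\n")
        a, b, c = line.split("\t")  # adjust the delimiter to the real format
        rows.append((a, b, c, hashlib.sha1(line.encode("utf-8")).hexdigest()))

conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
conn.commit()
conn.close()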
