Pymongo: keep creating new ids - python

In a pymongo project I'm working on, I have a collection where I keep inserting the name and age of people as they are entered. I have to identify each record with a unique id.
What I was planning to do is start the first id at 1. When inserting data, I first read the whole collection, find the number of records, and then save my record with the next id (e.g. if I read and find that there are 10 records, the id of my new record will be 11).
But is there any better way to do it?

MongoDB already assigns a unique _id to each document, and those ids can be sorted in ascending order. If you don't want to use that, you can create a separate collection containing just one document with a totalRecordsCount field. Increment it and read the latest number every time you add a new record. This is not the best way, but it lets you avoid reading the whole collection.
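A minimal sketch of that counter-collection approach with pymongo; the database, collection, and field names here are assumptions, not from the question:

from pymongo import MongoClient, ReturnDocument

client = MongoClient()
db = client["people_db"]  # assumed database name

def next_id():
    # Atomically increment the counter document and return the new value.
    counter = db.counters.find_one_and_update(
        {"_id": "person_id"},
        {"$inc": {"seq": 1}},
        upsert=True,
        return_document=ReturnDocument.AFTER,
    )
    return counter["seq"]

db.people.insert_one({"_id": next_id(), "name": "Alice", "age": 30})

Because find_one_and_update with $inc is atomic, two concurrent inserts cannot be handed the same id, which the read-count-then-insert approach cannot guarantee.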

Related

How does DynamoDB Query act when all items are in a single item collection?

I have DynamoDB and I use Lambda to query the tables using Python. My columns are:
product_id,
product_name,
create_at.
I'd like to be able to sort every column in descending or ascending order. From what I have read, I came to the conclusion that I need to create the first column as a partition key and give it the same value in every record, let's say "dummy". Moreover, I need to make create_at the sort key and, for the other columns, create a local secondary index for each of them. Then when I sort, I can do this:
response = table.query(
    KeyConditionExpression=Key('dummy_col').eq('dummy'),
    IndexName=product_name_index,
    ScanIndexForward=True,
)
What I don't understand is this: will my query go through all the records, like a scan, because of my dummy value in every record?
If you require all these access patterns, that's one way to design your table. It has some limitations though - you won't be able to have more than 10GB of data in total, because when you use Local Secondary Indexes, that limits the item collection (all items with the same PK) size to 10GB.
Each query reads and returns up to 1 MB of data (docs); afterwards you will get a token (LastEvaluatedKey) that you can pass in a new query (ExclusiveStartKey) to request the next 1 MB. You can also use the Limit parameter to limit how many items will be read and returned per Query. If you read the whole table based on that, you'll effectively have scanned it.
By filtering based on the sort key you can also define where it starts to read, so you don't have to read everything all the time.
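A rough sketch of that pagination loop with boto3, reusing the assumed table design and index name from the question:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("products")  # assumed table name

def query_all_sorted():
    # Page through the item collection roughly 1 MB at a time.
    items = []
    kwargs = {
        "KeyConditionExpression": Key("dummy_col").eq("dummy"),
        "IndexName": "product_name_index",  # assumed LSI name
        "ScanIndexForward": True,
    }
    while True:
        response = table.query(**kwargs)
        items.extend(response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key

Reading every page like this is effectively a scan of the whole item collection; a range condition on the sort key, or a Limit, keeps each call cheaper.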

What is the most efficient way to update multiple rows in a database and log it?

I am working on making a relatively simple inventory system in which data is stored and updated in MySQL, with Python connected to it. When adding to stock, an end user would input values into an interface and associate a purchase ticket number with that transaction. A log would then indicate that purchase ticket x added units to stock.
In a single purchase ticket, several items in stock may be increased, i.e. several rows within the stock table will need to be updated per purchase ticket.
However, I am having trouble conceptualizing an efficient way of updating multiple rows while still associating the purchase ticket number with the transaction. I was going to use a simple UPDATE statement, but can't figure out how to link the ticket number.
I was considering making a table for purchase tickets, but I figured it would be more efficient to just increment stock with UPDATEs alone; I appear to be wrong about that. I was going to use something like:
UPDATE stock SET count = count + x WHERE id = y;
Where x is how much the stock is being incremented by, and y is the specific product's unique ID.
TL;DR is there any efficient way to update multiple rows in a single column while also associating a user-inputted number with that transaction?
Don't update. Updates are slow because the transaction has to seek out each row to be updated. Just append to the end of the table. Then use the OUTPUT clause so you have access to what got inserted. Finally, you can join to those results, which will include the primary keys, and log those keys in another table.
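MySQL has no OUTPUT clause, so here is one MySQL-flavored sketch of the same append-only idea, using a ledger table keyed by the ticket number; all table and column names are assumptions:

import mysql.connector

conn = mysql.connector.connect(user="inventory", database="inventory")  # assumed credentials
cur = conn.cursor()

def record_purchase(ticket_no, lines):
    # lines is a list of (product_id, quantity) tuples for one purchase ticket.
    cur.executemany(
        "INSERT INTO stock_movements (ticket_no, product_id, quantity) "
        "VALUES (%s, %s, %s)",
        [(ticket_no, product_id, qty) for product_id, qty in lines],
    )
    conn.commit()

# Current stock per product is then just the sum of its movements:
# SELECT product_id, SUM(quantity) FROM stock_movements GROUP BY product_id;

The ledger doubles as the log, since every row already carries the ticket number that caused it.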

Delete previous line in PostgreSQL database

I'm doing a face recognition project, and the output of this project is a PostgreSQL database in which the name of a person is stored every time that person is identified. I use the Python programming language and the psycopg2 module to produce this output. What I need is that, when a new person is detected, the previous row in the database is deleted. Thank you in advance.
Thank you for your support.
This is my database: I have a table that stores the image paths with a limited number of ids.
I want to show the appropriate image when my system detects, for example, a happy man.
For this purpose, I want to join the two tables (the images table and the face classification table), but my table assigns the ids serially and I cannot join them.
Assuming that the order of the ids correlates with the order of insertion: you can fetch the ids lower than the last one added, in descending order, and take the first one.
SELECT id FROM person WHERE id < $1 ORDER BY id DESC LIMIT 1
where $1 stands for the id of the row inserted. With the fetched id you can delete the row.
A better way would be to store the times as timestamps instead of text.
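A minimal psycopg2 sketch of that select-then-delete step; the person table and its id column are assumptions carried over from the query above:

import psycopg2

conn = psycopg2.connect(dbname="faces")  # assumed connection parameters
cur = conn.cursor()

def delete_previous(new_id):
    # Find the row inserted just before new_id and delete it, if it exists.
    cur.execute(
        "SELECT id FROM person WHERE id < %s ORDER BY id DESC LIMIT 1",
        (new_id,),
    )
    row = cur.fetchone()
    if row is not None:
        cur.execute("DELETE FROM person WHERE id = %s", (row[0],))
    conn.commit()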

Counting the number of distinct strings given by a GQL Query in Python

Suppose I have the following GQL database:
class Signatories(db.Model):
    name = db.StringProperty()
    event = db.StringProperty()
This database holds information regarding events that people have signed up for. Say I have the following entries in the database in the format (name, event): (Bob, TestEvent), (Bob, TestEvent2), (Fred, TestEvent), (John, TestEvent).
But the dilemma here is that I cannot just aggregate all of Bob's events into one entity, because I'd like to query for all the people signed up for a specific event, and I'd also like to add such entries without having to manually update an existing entry every single time.
How could I count the number of distinct strings given by a GQL Query in Python (in my example, I am specifically trying to see how many people are currently signed up for events)?
I have tried using mcount = db.GqlQuery("SELECT name FROM Signatories").count(); however, this of course returns the total number of strings in the list, regardless of the uniqueness of each string.
I have also tried using count = len(member), where member = db.GqlQuery("SELECT name FROM Signatories"), but unfortunately, this only returns an error.
You can't - at least not directly. (By the way you don't have a GQL database).
If you have a small number of items, then fetch them into memory and use a set operation to produce the unique set, then count it (see the sketch after this answer).
If you have larger numbers of entities that make in-memory filtering and counting problematic, then your strategy will be to aggregate the count as you create them,
e.g.
create a separate entity, keyed on the pair of strings, each time someone signs up for an event. This way you will only have one entity in the datastore representing that specific pair. Then you can do a straight count.
However, as you get large numbers of these entities, you will need to start performing some additional work to count them, as a single query.count() will become too expensive. You then need to start looking at counting strategies using the datastore.
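For the small-data case, a minimal sketch of the in-memory set approach, assuming the Signatories model from the question:

from google.appengine.ext import db  # Signatories is assumed to be defined as above

# Fetch the entities and count distinct names with a set.
signatories = db.GqlQuery("SELECT * FROM Signatories").fetch(1000)  # assumed upper bound
distinct_names = {s.name for s in signatories}
print(len(distinct_names))

# Distinct people signed up for a single event works the same way:
event_signups = db.GqlQuery(
    "SELECT * FROM Signatories WHERE event = :1", "TestEvent"
).fetch(1000)
print(len({s.name for s in event_signups}))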

How to check if data has already been used

I have a Python script that retrieves the newest 5 records from a MySQL database and sends a user an email notification containing this information.
I would like the user to receive only new records and not old ones.
I can retrieve data from mysql without problems...
I've tried to store it in text files and compare the files but, of course, the text files containing freshly retrieved data will always have 5 records more than the old one.
So I have a logic problem here that, being a newbie, I can't tackle easily.
Using lists is also an idea but I am stuck in the same kind of problem.
The infamous 5 records can stay the same for one week and then we can have a new record or maybe 3 new records a day.
It's quite unpredictable but more or less that should be the behaviour.
Thank you so much for your time and patience.
Are you assigning a unique incrementing ID to each record? If you are, you can create a separate table that holds just the ID of the last record fetched; that way you can retrieve only records with IDs greater than this ID. Each time you fetch, you could update this table with the new latest ID.
Let me know if I misunderstood your issue, but saving the last fetched ID in the database could be a solution.
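A rough sketch of that last-seen-ID approach with mysql-connector; the table and column names here are assumptions:

import mysql.connector

conn = mysql.connector.connect(user="reports", database="app")  # assumed credentials
cur = conn.cursor()

def fetch_new_records():
    # Read the highest ID already emailed out (0 if nothing has been sent yet).
    cur.execute("SELECT last_id FROM notification_state LIMIT 1")
    row = cur.fetchone()
    last_id = row[0] if row else 0

    # Only records newer than that go into the notification.
    cur.execute(
        "SELECT id, payload FROM records WHERE id > %s ORDER BY id LIMIT 5",
        (last_id,),
    )
    new_rows = cur.fetchall()

    # Remember the newest ID so the same rows are never sent twice.
    if new_rows:
        cur.execute("UPDATE notification_state SET last_id = %s",
                    (max(r[0] for r in new_rows),))
        conn.commit()
    return new_rows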
