MongoDB: Can I insert only data that does not already exist? - python

I have a sample document, like:
data = {'tag': 'ball', 'color': 'red'}
I want to insert it into my collection, but if an identical one already exists, it should not be inserted.
I can do it in Python like this:
if not collection.find_one(data):
    collection.insert(data)
Or I can do it with update:
collection.update(data, data, upsert=True)
But my question is: does update write data every time?
Both methods search the same way for the duplicate, but the first one only writes when the document does not exist.
Does the second one mean: if the data does not exist, insert it; if it does exist, update it? So the database would be written to in every case?
So, which method is better, and why?

I recently came across a situation where I only wanted to write to the database if a record didn't exist. In that case your first method is better.
With the second method, the database is written to in every situation. However, if you want certain fields not to be updated when the document already exists, you can try $setOnInsert as explained here: https://docs.mongodb.com/manual/reference/operator/update/setOnInsert/
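For illustration, here is a minimal sketch of the upsert approach with $setOnInsert, assuming a recent pymongo where update() has been replaced by update_one(); the connection string, collection name, and the created_at field are just placeholders I made up:

import datetime
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')  # adjust the connection string as needed
collection = client['testdb']['items']

data = {'tag': 'ball', 'color': 'red'}

# If no matching document exists, it is inserted (together with the $setOnInsert fields);
# if it already exists, nothing is modified, so repeated runs do not rewrite it.
result = collection.update_one(
    data,
    {'$setOnInsert': {'created_at': datetime.datetime.utcnow()}},
    upsert=True,
)
print(result.upserted_id)  # the new _id on insert, None if the document already existed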

Related

Creating an empty database in SQLite from a template SQLite file

I have a question about a problem that I'm sure comes up pretty often, but I still couldn't find any satisfying answers.
Suppose I have a huge original SQLite database. Now, I want to create an empty database that is initialized with all the tables, columns, and primary/foreign keys, but has no rows in the tables. Basically, I want to create an empty database using the original one as a template.
The "DELETE from {table_name}" query for every table in the initial database won't do it because the resulting empty database ends up having a very large size, just like the original one. As I understand it, this is because all the logs about the deleted rows are still being stored. Also, I don't know exactly what else is being stored in the original SQLite file, I believe it may store some other logs/garbage things that I don't need, so I would prefer just creating an empty one from scratch if it is possible. FWIW, I eventually need to do it inside a Python script with sqlite3 library.
Would anyone please provide me with an example of how this can be done? A set of SQLite queries or a python script that would do the job?
You can use the command-line shell for SQLite:
sqlite3 old.db .schema | sqlite3 new.db
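Since the question mentions eventually doing this from a Python script with the sqlite3 library, here is a minimal sketch of the same idea in Python; the file names are placeholders, and it assumes the schema statements can be read back from sqlite_master:

import sqlite3

# read the schema (tables, indexes, triggers, views) from the original database,
# skipping SQLite's internal objects, which cannot be recreated directly
src = sqlite3.connect('old.db')
schema = [row[0] for row in src.execute(
    "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%'")]
src.close()

# replay the schema into a brand-new, empty database file
dst = sqlite3.connect('new.db')
for statement in schema:
    dst.execute(statement)
dst.commit()
dst.close()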

Updating a MySQL database using your findings in Python

I am new to Python, so this might have wrong syntax as well. I want to bulk-update my SQL database. I have already created a column in the database with null values. My aim is to compute a moving average in Python and then use those values to update the database. But the problem is that the moving averages I have computed are in a data-table format where the columns are time blocks and the rows are dates, while in the database dates and time blocks are two separate columns.
MA30 = pd.rolling_mean(df, 30)
cur.execute(""" UPDATE feeder_ndmc_copy
SET time_block = CASE date, (MA30)""")
db.commit()
This is the error I am getting.
self.errorhandler(self, exc, value)
I have seen a lot of other questions and answers, but there is no example of how to use values computed in Python to update the database. Any suggestions?
Ok, so it is very hard to give you a complete answer with what little information your question contains, but I'll try my best to explain how I would deal with this.
The easiest way is probably to automatically write a separate UPDATE query for each row you want to update. If I'm not mistaken, this will be relatively efficient on the database side of things, but it will produce some overhead in your Python program. I'm not a database guy, but since you didn't mention performance optimality in your question, I will assume that any solution that works will do for now.
I will be using sqlalchemy to handle interactions with the database. Take care that if you want to copy my code, you will need to install sqlalchemy and import the module in your code.
First, we will need to create a sqlalchemy engine. I will assume that you use a local database; if not, you will need to edit this part.
engine = sqlalchemy.create_engine('mysql://localhost/yourdatabase')
Now let's create a string containing all our queries (I don't know the names of the columns you want to update, so I'll use placeholders; I also don't know the format of your time index, so I'll have to guess):
queries = ''
for index, value in MA30.iterrows():
    queries += 'UPDATE feeder_ndmc_copy SET column_name = {} WHERE index_column_name = "{}";\n'.format(value, index.strftime('%Y-%m-%d %H:%M:%S'))
You will need to adapt this part heavily to conform with your requirements. I simply can't do any better without you supplying the proper schema of your database.
With the list of queries complete, we proceed to put the data into SQL:
with engine.connect() as connection:
    with connection.begin():
        connection.execute(queries)
Edit:
Obviously my solution does not deal in any way with cases where your pandas operations create data points for timestamps that are not in MySQL, etc. You need to keep that in mind. If that is a problem, you will need to use queries of the form
INSERT INTO table (id,Col1,Col2) VALUES (1,1,1),(2,2,3),(3,9,3),(4,10,12)
ON DUPLICATE KEY UPDATE Col1=VALUES(Col1),Col2=VALUES(Col2);
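As a variant of the same idea, here is a sketch that uses parameterized statements instead of concatenating raw SQL strings, which avoids the hand quoting above. It assumes the moving averages have already been flattened into a pandas Series indexed by timestamp (called ma_series here purely as a placeholder), and the column names are still placeholders for the real schema:

import sqlalchemy

engine = sqlalchemy.create_engine('mysql://localhost/yourdatabase')  # adjust as needed

# one parameter dictionary per row to update
params = [
    {'avg': float(value), 'ts': ts.strftime('%Y-%m-%d %H:%M:%S')}
    for ts, value in ma_series.items()
]

with engine.begin() as connection:  # commits on success, rolls back on error
    connection.execute(
        sqlalchemy.text(
            'UPDATE feeder_ndmc_copy SET column_name = :avg WHERE index_column_name = :ts'
        ),
        params,
    )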

Updating a database: how to check that it wrote the data

I am using Python to write a very large amount of data to a MongoDB database. Some of this data can actually overwrite old data already in the database. I am using pymongo and MongoClient with the update function.
Since I am writing tens of thousands of data points to the database, how can I make sure the data is actually being written properly, and how can I check whether any data has not been written to MongoDB? I don't want to add too much code, as it is already quite slow to download and write everything. If there is no easy answer, I will sacrifice speed, but I want to make sure everything goes into MongoDB.
When you execute an insert_one or insert_many instruction, you get back a result value. You can check this to make sure that the insert was successful.
result = posts.insert_many(new_posts)
print(result.inserted_ids)
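Since the question mentions using update rather than insert, here is a minimal sketch of checking the result of an update, assuming a reasonably recent pymongo with update_one; the database name, filter, and the doc_id/doc_fields variables are placeholders:

from pymongo import MongoClient, WriteConcern

client = MongoClient('mongodb://localhost:27017/')
# w=1 (the default) makes each write acknowledged, so the result counts are meaningful
posts = client['testdb'].get_collection('posts', write_concern=WriteConcern(w=1))

result = posts.update_one({'_id': doc_id}, {'$set': doc_fields}, upsert=True)
print(result.acknowledged)     # True if the server acknowledged the write
print(result.matched_count)    # documents matched by the filter
print(result.modified_count)   # documents actually changed
print(result.upserted_id)      # the new _id if an upsert inserted a document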

python unusual textfile to database: strategy

I am pretty new to Python, so I would like to ask for some advice about the right strategy.
I have a text file with fixed positions for the data, like this.
It can have more than 10000 rows. At the end the database (SQL) table should look like this. File & Table
The important column is nr. 42. It defines the kind of data in this row
(2 -> Title, 3 -> Text, 6 -> Amount and Price). So the data comes from different rows.
QUESTIONS:
Reading the data: since there are always more than 4 rows containing the data, should I process the file line by line and send each SQL statement as soon as it is complete? Or read all the lines into a list of lists and then iterate over those lists? Or read all the lines into one list and iterate?
Would it be better to convert the data into CSV or JSON instead of preparing SQL statements, and then use the database software to import it into the DB? (Or use a NoSQL DB?)
I hope I made my problems clear; if not, I will try again.
Any advice is really appreciated.
The problem is pretty simple, so perhaps you are overthinking it a bit. My suggestion is to use the simplest solution: read a line, parse it, prepare an SQL statement and execute it. If the database is around 10000 records, anything would work; e.g. SQLite would do just fine. The problem is in the form of a table already, so translation to a relational database like SQLite or MySQL is a pretty obvious and straightforward choice. If you need a different type of organization in your data, then you can look at other types of databases: don't do it only because it is "fashionable".
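As a rough illustration of the line-by-line approach, here is a minimal sketch using the sqlite3 module; the file name, column slices, and table layout are invented placeholders, since the real fixed-position layout is only shown in the linked image:

import sqlite3

conn = sqlite3.connect('records.db')
conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, body TEXT, amount_price TEXT)")

current = {}
with open('data.txt', encoding='utf-8') as f:
    for line in f:
        kind = line[41:42]            # column 42 says what kind of data the row holds (placeholder slice)
        payload = line[42:].strip()   # the rest of the row (placeholder slice)
        if kind == '2':
            current['title'] = payload
        elif kind == '3':
            current['body'] = payload
        elif kind == '6':
            # row type 6 completes one record, so write it out and start a new one
            conn.execute("INSERT INTO items VALUES (?, ?, ?)",
                         (current.get('title'), current.get('body'), payload))
            current = {}

conn.commit()
conn.close()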

add if no duplicate in collection mongodb python

I'm writing a script that puts a large number of XML files into MongoDB, so when I execute the script multiple times the same object is added many times to the same collection.
I looked for a way to stop this behaviour by checking for the existence of the object before adding it, but I can't find one.
Help!
The term for the operation you're describing is "upsert".
In MongoDB, the way to upsert is to use the update functionality with upsert=True.
You can index on one or more fields (not _id) of the document/XML structure. Then use a find query to check whether a document containing that indexed_field:value is already present in the collection. If it returns nothing, you can insert the new document into your collection. This will ensure only new docs are inserted when you re-run the script.
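Putting both answers together, here is a minimal pymongo sketch, assuming there is some field in the XML (called doc_key here purely as a placeholder) that uniquely identifies a document:

from pymongo import MongoClient, ASCENDING

client = MongoClient('mongodb://localhost:27017/')
collection = client['testdb']['xml_docs']

# a unique index makes the database itself reject duplicates of the identifying field
collection.create_index([('doc_key', ASCENDING)], unique=True)

def add_document(doc):
    # upsert on the identifying field: inserts the document the first time,
    # and simply rewrites the same content (no duplicate) on later runs
    collection.update_one({'doc_key': doc['doc_key']}, {'$set': doc}, upsert=True)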
