Insert into MongoDB if not duplicated - Python

I wrote this script to insert a document into MongoDB only if it is not already present:
import tldextract
from pymongo import MongoClient
client = MongoClient()
db = client.my_domains
collection = db.domain
with open('inputcut.csv', 'r') as f:
    for line in f:
        ext = tldextract.extract(line)
        domain = {"domain": ext.registered_domain}
        collection.update(domain, {'upsert': True})
When I run the script, no domains are inserted into the database.
I would like to insert a domain if it is not yet present in mongodb.
If the domain is already present, it should not be inserted again and the script should move on to the next one.
Thank you in advance for your help.

collection.update expects the filter, the update document, and then options such as upsert. In your call, {'upsert': True} is passed as the update document, so upsert never takes effect and nothing is inserted. In PyMongo, upsert is a keyword argument, so rewrite the call as follows:
collection.update(domain, {'$set': domain}, upsert=True)
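For completeness, here is a minimal sketch of the whole script using the newer update_one method, which PyMongo recommends over the deprecated update (same file and collection names as in the question):
import tldextract
from pymongo import MongoClient

client = MongoClient()
collection = client.my_domains.domain

with open('inputcut.csv', 'r') as f:
    for line in f:
        ext = tldextract.extract(line.strip())
        domain = {"domain": ext.registered_domain}
        # Insert the domain only if no document with this value exists yet;
        # if it already exists, the upsert matches it and nothing changes.
        collection.update_one(domain, {'$set': domain}, upsert=True)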

Related

Trying to check if document with fields exists and if so edit it in pymongo

I'm trying to work a bit with pymongo. I currently have a database that I need to look inside, and if a document with a specific field exists, that document should be updated.
First I created an entry by running this a few times:
import pymongo
client = pymongo.MongoClient()
mydb = client["mydb"]
data = {'name': "john"}
mycol = mydb['something']
mycol.insert_one(data)
Which works the way I want it to.
Now, I need to check whether or not an entry exists where name = "john".
I followed this tutorial, which basically just shows this snippet:
db.student.find({name:{$exists:true}})
I tried to implement this, so it now looks like this:
import pymongo
from pymongo import cursor
client = pymongo.MongoClient()
mydb = client["mydb"]
print(mydb.something.find({"name":{"john"}}))
and this just returns <pymongo.cursor.Cursor object at 0x7fbf266239a0>
which I don't really know what to do with.
I also looked at some similar questions here, and found some suggestions for something like this:
print(mydb.values.find({"name" : "john"}).limit(1).explain())
But this just gives me a long JSON-looking string, which by the way doesn't change if I put other things in for "john".
So how do I check whether a document where "name" = "john" exists? and perhaps also then edit the document?
EDIT
I now tried the following solution:
import pymongo
from pymongo import cursor
client = pymongo.MongoClient()
mydb = client["mydb"]
mycol = mydb['something']
name = "john"
print(mycol.find_one({name:{"$exists":True}}))
But it only prints None.
Change find() to find_one(), or if you're expecting more than one result, iterate the cursor using a for loop:
print(db.student.find_one({'name':{'$exists': True}}))
or
for student in db.student.find({'name': {'$exists': True}}):
    print(student)
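To get back to the original goal of checking for a document where name equals "john" and editing it, the query should use 'name' as the field name; the edited snippet in the question uses the variable name ("john") as the field name, which is why it prints None. A minimal sketch, reusing the database and collection from the question (the 'age' field added here is just a hypothetical edit):
import pymongo

client = pymongo.MongoClient()
mycol = client["mydb"]["something"]

# Look up a document whose 'name' field equals "john"
doc = mycol.find_one({"name": "john"})
if doc is None:
    print("no document with name 'john'")
else:
    # The document exists, so edit it ('age' is a made-up example field)
    mycol.update_one({"_id": doc["_id"]}, {"$set": {"age": 30}})
    print("updated document", doc["_id"])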

LOAD DATA LOCAL INFILE with incremental field

I have multiple unstructured txt files in a directory and I want to insert all of them into MySQL; basically, the entire content of each text file should be placed into one row. In MySQL, I have 2 columns: ID (auto increment) and LastName (nvarchar(45)). I used Python to connect to MySQL and LOAD DATA LOCAL INFILE to insert the whole content. But when I run the code and then check MySQL, I see nothing but a bunch of empty rows with IDs being automatically generated.
Here is the code:
import MySQLdb
import sys
import os

result = os.listdir("C:\\Users\\msalimi\\Google Drive\\s\\Discharge_Summary")
for x in result:
    db = MySQLdb.connect("localhost", "root", "Pass", "myblog")
    cursor = db.cursor()
    file1 = os.path.join(r'C:\\Discharge_Summary\\'+x)
    cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE clamp_test" %(file1,))
    db.commit()
    db.close()
Can someone please tell me what is wrong with the code? What is the right way to achieve my goal?
I edited my code with:
.....cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE clamp_test LINES TERMINATED BY '\r' (Lastname) SET id = NULL" %(file1,))
and it worked :)
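For reference, a minimal sketch of the corrected loop (directory, table, and column names are taken from the question; forward slashes are used in the path so MySQL does not treat backslashes as escape characters, and this assumes LOCAL INFILE is enabled on the server):
import os
import MySQLdb

src_dir = r"C:\Users\msalimi\Google Drive\s\Discharge_Summary"

db = MySQLdb.connect("localhost", "root", "Pass", "myblog")
cursor = db.cursor()

for x in os.listdir(src_dir):
    # Build the full path and switch to forward slashes for the SQL string
    file1 = os.path.join(src_dir, x).replace("\\", "/")
    # Put the whole file into the Lastname column; let MySQL generate the id
    cursor.execute(
        "LOAD DATA LOCAL INFILE '%s' INTO TABLE clamp_test "
        "LINES TERMINATED BY '\r' (Lastname) SET id = NULL" % (file1,)
    )
    db.commit()

db.close()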

How to write csv file into sql database with python

I have a CSV file that includes some information about computers, such as OS type, RAM, and CPU values, and I have a SQL database table that already holds the same kind of information. I want to update that table with a Python script. The database table and the CSV file both have a unique "id" column.
import csv
with open("Hypersanal.csv") as csvfile:
readCSV = csv.reader(csvfile, delimiter=';')
for row in readCSV:
print row
Depending on the type of database, there will be some slight adjustments to make to the code.
For this example, I'll use SQLAlchemy with the pymysql driver. To find out what the first part of the connection string should be (it depends on the kind of database you want to connect to), check the SQLAlchemy documentation about Dialects.
First, we import the necessary modules
from sqlalchemy import *
from sqlalchemy.orm import create_session
from sqlalchemy.ext.declarative import declarative_base
Then, we create the connection string
dialect_part = "mysql+pymysql://"
# username is the username we'll use to connect to the db, and password the corresponding password
# server_name is the name of the server where the db is. It INCLUDES the port number (eg : 'localhost:9080')
# database is the name of the db on the server we'll work on
connection_string = dialect_part+username+":"+password+"@"+server_name+"/"+database
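For instance, with made-up credentials the assembled string would look like this:
# Example only; substitute your own credentials, host, and database name
connection_string = "mysql+pymysql://myuser:mypassword@localhost:3306/mydb"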
Some more setup is needed for SQLAlchemy:
Base = declarative_base()
engine = create_engine(connection_string)
metadata = MetaData(bind=engine)
Now, we have a link to the db, but need some more work before being able to do anything to it.
We create a class corresponding to the table of the db we'll hit. This class will 'autofill' its columns according to how the table is defined in the db. You can also fill it in manually.
class TableWellHit(Base):
    __table__ = Table(name_of_the_table, metadata, autoload=True)
Now, to be able to interact with the table, we need to create a session :
session = create_session(bind=engine)
Now, we need to begin the session, and we'll be set.
Your code will now be used.
import csv

with open("Hypersanal.csv") as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        # print row
        # I chose to push each value to the db one by one
        # If you're sure there won't be any duplicates for the primary key, you can put the session.begin() before the for loop
        session.begin()
        # I create an element for the db
        new_element = TableWellHit(field_in_table=row[field])
As an example, imagine the table has required fields 'username' and 'password', and row is a dictionary containing 'user' and 'pass' as keys. The element would then be created by: TableWellHit(username=row['user'], password=row['pass'])
        # I add the element to the table
        # I choose to merge instead of add, so as to prevent duplicates, one more time
        session.merge(new_element)
        # Now, we commit our changes to the db
        # This also closes the session
        # if you put the session.begin() outside of the loop, do the same for the session.commit()
        session.commit()
Hope this answers your question, and if it does not, just let me know so I can correct my answer.
Edit:
For MSSQL:
- Install pymssql (pip install pymssql)
The connection_string should be of the following form, according to this SQLAlchemy page: mssql+pymssql://<username>:<password>@<freetds_name>/?charset=utf8
Using merge allows you to create or update a value, depending on whether or not it already exists.
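Putting the pieces above together, a minimal end-to-end sketch might look like this; the connection details, the computers table name, and the CSV column order (id;ostype;ram;cpu) are placeholders rather than details from the question:
import csv
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import create_session
from sqlalchemy.ext.declarative import declarative_base

# Placeholder connection details
connection_string = "mysql+pymysql://myuser:mypassword@localhost:3306/mydb"

Base = declarative_base()
engine = create_engine(connection_string)
metadata = MetaData(bind=engine)

class Computer(Base):
    # 'computers' is a placeholder table name; its columns are reflected from the db
    __table__ = Table('computers', metadata, autoload=True)

session = create_session(bind=engine)
session.begin()

with open("Hypersanal.csv") as csvfile:
    for row in csv.reader(csvfile, delimiter=';'):
        # Placeholder column order: id;ostype;ram;cpu
        # merge() updates the row if the id already exists, otherwise inserts it
        session.merge(Computer(id=row[0], ostype=row[1], ram=row[2], cpu=row[3]))

session.commit()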

comparing a given variable to data in a database and checking to see if it exists

I have a sqlite db of API keys and I want to make something that checks whether a given key is in the database. I'm generating the API keys using another Python script named apikeygen.py. I'm using Python 2.7 and pattern 2.6. This is going to be a data scraping/mining/filtering application that I'm doing just for fun, and it may have a future use for malware analysis.
I need help getting the main piece of code that we will call API.py to check and see if the given API key is in the database.
This is the code for the API.py file so far.
import os, sys; sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
import sqlite3 as lite
from pattern.server import App
from pattern.server import MINUTE, HOUR, DAY
app = App("api")
def search_db(key=''):
    con = lite.connect('apikeys.db')
    with con:
        cur = con.cursor()
        cur.execute("SELECT * FROM keys")
        while True:
            row = cur.fetchone()
            if row == None:
                break
            print row[2]
I'm still not really clear what you are asking. Why don't you explicitly query for the key, rather than iterating over your whole table?
cur.execute("SELECT * FROM keys WHERE key = ?", (key,))

Psycopg2 "copy_from" command, possible to ignore delimiter in quote (getting error)?

I am trying to load rows of data into Postgres in a csv-like structure using the copy_from command (a function that uses the COPY command in Postgres). My data is delimited with commas (and unfortunately, since I am not the data owner, I cannot just change the delimiter). I run into a problem when I try to load a row that has a value in quotes containing a comma (i.e. that comma should not be treated as a delimiter).
For example this row of data is fine:
",Madrid,SN,,SEN,,,SN,173,157"
This row of data is not fine:
","Dominican, Republic of",MC,,YUO,,,MC,65,162",
Some code:
conn = get_psycopg_conn()
cur = conn.cursor()
_io_buffer.seek(0) #This buffer is holding the csv-like data
cur.copy_from(_io_buffer, str(table_name), sep=',', null='', columns=column_names)
conn.commit()
It looks like copy_from doesn't expose the csv mode or quote options, which are available from the underlying PostgreSQL COPY command. So you'll need to either patch psycopg2 to add them, or use copy_expert.
I haven't tried it, but something like
curs.copy_expert("""COPY mytable FROM STDIN WITH (FORMAT CSV)""", _io_buffer)
might be sufficient.
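Adapted to the snippet in the question, that could look roughly like this (reusing the _io_buffer, table_name, column_names, cur, and conn variables from the question):
# CSV mode makes COPY honor quoted fields, so commas inside quotes
# are not treated as delimiters
_io_buffer.seek(0)
cur.copy_expert(
    "COPY {} ({}) FROM STDIN WITH (FORMAT CSV)".format(table_name, ", ".join(column_names)),
    _io_buffer
)
conn.commit()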
I had this same error and was able to get close to a fix based on the single line of code listed by craig-ringer. The other thing I needed was to quote the fields when building the initial object, using df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONNUMERIC, sep=','), and specifically quoting=csv.QUOTE_NONNUMERIC.
The full example of pulling one data source from MySQL and storing it in Postgres is below:
#run in python 3.6
import MySQLdb
import psycopg2
import os
from io import StringIO
import pandas as pd
import csv
mysql_db = MySQLdb.connect(host="host_address",  # your host, usually localhost
                           user="user_name",     # your username
                           passwd="source_pw",   # your password
                           db="source_db")       # name of the data base
postgres_db = psycopg2.connect("host=dest_address dbname=dest_db_name user=dest_user password=dest_pw")
my_list = ['1','2','3','4']
# you must create a Cursor object. It will let you execute all the queries you need
mysql_cur = mysql_db.cursor()
postgres_cur = postgres_db.cursor()
for item in my_list:
    # Pull cbi data for each state and write it to postgres
    print(item)
    mysql_sql = 'select * from my_table t \
        where t.important_feature = \'' + item + '\';'
    # Do something to create your dataframe here...
    df = pd.read_sql_query(mysql_sql, mysql_db)

    # Initialize a string buffer
    sio = StringIO()
    sio.write(df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONNUMERIC, sep=','))  # Write the Pandas DataFrame as a csv to the buffer
    sio.seek(0)  # Be sure to reset the position to the start of the stream

    # Copy the string buffer to the database, as if it were an actual file
    with postgres_db.cursor() as c:
        print(c)
        c.copy_expert("""COPY schema.new_table FROM STDIN WITH (FORMAT CSV)""", sio)
        postgres_db.commit()

mysql_db.close()
postgres_db.close()
