Python: Importing a CSV file into sqlite3 and removing duplicates - python

I have a CSV file and I want to import it into my sqlite3 database using Python. The column names of the CSV are the same as the column names of the database table. The following is the code I am using now:
import pandas
df = pandas.read_csv("Data.csv")
df.to_sql(table_name, conn, index=False)
However, this imports all of the data into the database. I want to insert only the rows that do not already exist in the table. Is there a way to do that without iterating over every row of the CSV or the database?

Use the if_exists parameter.
df = pandas.read_csv("Data.csv")
df.to_sql(table_name, conn, if_exists='append', index=False)
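Note that if_exists='append' on its own still appends every row, duplicates included. If you also need to skip rows that are already in the table, here is a minimal sketch; the database file, table name, and the unique key column id are all placeholders for whatever identifies a row in your data:
import sqlite3
import pandas as pd

conn = sqlite3.connect("my.db")
df = pd.read_csv("Data.csv")
# Pull the keys that are already stored, keep only the CSV rows whose key
# is not among them, then append the remainder.
existing = pd.read_sql("SELECT id FROM table_name", conn)
new_rows = df[~df["id"].isin(existing["id"])]
new_rows.to_sql("table_name", conn, if_exists="append", index=False)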

Related

Inserting specific columns of csv file into mongodb collection using python script

I have a Python script to insert a CSV file into a MongoDB collection:
import pymongo
import pandas as pd
import json
client = pymongo.MongoClient("mongodb://localhost:27017")
df = pd.read_csv("iris.csv")
data = df.to_dict(orient="records")
db = client["Database name"]
db.CollectionName.insert_many(data)
Here, all the columns of the CSV file get inserted into the Mongo collection. How can I insert only specific columns of the CSV file into the collection? What changes do I need to make to the existing code?
Let's say the database is already created in my Mongo instance. Will this command still work if the database already exists (db = client["Database name"])?
Have you checked out pymongoarrow? The latest release has write support, so you can import a CSV file into MongoDB. Here are the release notes and documentation. You can also use mongoimport to import a CSV file (documentation is here), but I can't see any way to exclude fields the way you can with pymongoarrow.
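If you'd rather stay with the plain pandas/pymongo code from the question, a small sketch that reads only the columns you want via read_csv's usecols (the column names below are hypothetical; swap in your own):
import pymongo
import pandas as pd

client = pymongo.MongoClient("mongodb://localhost:27017")
# usecols restricts the read to the listed columns, so only those end up
# in the documents that get inserted.
df = pd.read_csv("iris.csv", usecols=["sepal_length", "species"])
data = df.to_dict(orient="records")
# Accessing an existing database/collection works the same way; MongoDB only
# creates them if they don't exist yet.
db = client["Database name"]
db.CollectionName.insert_many(data)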

Load the data from oracle database to csv files using python

I have written a Python script that fetches data from an Oracle database and loads it into a CSV file:
import datetime
import pandas as pd
import cx_Oracle
con = cx_Oracle.connect('SYSTEM/oracle123@localhost:1521/xe')
c = con.cursor()
sql = "select * from covid_data"
res = c.execute(sql)
t = pd.read_sql(sql,con)
t.to_csv(r'C:\Users\abc\covid.csv')
I want the script to run every day and load the data into the CSV file. The challenge I am facing is that I fetch the data daily, but the CSV file should contain only that particular day's data; the previous day's contents should not be visible the next day.
I found a solution for this.
In the line below, we add write mode so that instead of appending to the CSV file, it overwrites it:
t.to_csv(r'C:\Users\abc\covid.csv', mode='w')
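Putting it together, a sketch of the daily export (same connection string, table, and output path as in the question, assumed unchanged); overwriting the same file on each run means only the current day's pull ever sits in the CSV:
import pandas as pd
import cx_Oracle

con = cx_Oracle.connect('SYSTEM/oracle123@localhost:1521/xe')
try:
    # Fetch today's snapshot and overwrite yesterday's file.
    t = pd.read_sql("select * from covid_data", con)
    t.to_csv(r'C:\Users\abc\covid.csv', mode='w', index=False)
finally:
    con.close()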

Pandas Dataframe to Postgres table conversion not working

I am converting a CSV file into a pandas DataFrame and then loading it into a Postgres table.
The problem is that I am able to create the table in Postgres, but I am unable to select columns by name when querying it.
This is the sample code I have:
import pandas as pd
from sqlalchemy import create_engine
import psycopg2

engine = create_engine('postgresql://postgres:pwd@localhost:5432/test')

def convertcsvtopostgres(csvfileloc, table_name, delimiter):
    data = pd.read_csv(csvfileloc, sep=delimiter, encoding='latin-1')
    data.head()
    data1 = data.rename(columns=lambda x: x.strip())
    data1.to_sql(table_name, engine, index=False)

convertcsvtopostgres("Product.csv", "t_product", "~")
I can do a select * from test.t_product; but I am unable to do a select product_id from test.t_product;
I am not sure whether this is happening because of the file's encoding and the resulting conversion. Is there any way around this, since I do not want to specify the table structure each time?
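One hedged guess, not confirmed by the question: to_sql creates the columns with quoted, case-sensitive names, so a header like Product_ID can only be queried as "Product_ID" in Postgres, while an unquoted select product_id is folded to lowercase and fails. A sketch that normalizes the headers before writing, so no identifier quoting is needed (connection string and file names copied from the question):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:pwd@localhost:5432/test')

def convertcsvtopostgres(csvfileloc, table_name, delimiter):
    data = pd.read_csv(csvfileloc, sep=delimiter, encoding='latin-1')
    # Strip stray whitespace and lower-case every header so Postgres can
    # resolve the columns without double-quoted identifiers.
    data.columns = [c.strip().lower() for c in data.columns]
    data.to_sql(table_name, engine, index=False)

convertcsvtopostgres("Product.csv", "t_product", "~")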

Pandas/Python/Dropna: Renaming header column names after a dropna takes place with intention to import to MySQL

With the code below, I've successfully removed rows where values may be blank in my CSV file, which consists of 33 columns.
import pandas as pd
from sqlalchemy import create_engine
data = pd.read_csv('TestCSV.csv', sep=',')
# dropna() returns a new DataFrame rather than modifying data in place,
# so write the cleaned copy out directly
data.dropna().to_csv('CleanCSV.csv', index=False)
Now the intention is to rename the 33 header columns to my own names, and then import the contents of the new file (with the newly named headers) into my MySQL database with the following code, which is still missing the renaming of the headers:
data = pd.read_csv('CleanCSV.csv', sep=',')
cnx = create_engine('mysql+pymysql://root:password@localhost:3306/schema', echo=False)
data.to_sql(name='t_database', con=cnx, if_exists='append', index=False)
I've read up slightly on DataFrames, but is this approach still valid when the contents are in a CSV file? If so, how do I assign the dropna'd contents to a DataFrame and, from there, rename the column headers before importing to MySQL?
Thank you in advance.
Before you create the new CSV, do this:
new_df = data.dropna().rename(columns={'oldcol1': 'newcol1', 'oldcol2': 'newcol2'})
The columns argument is a dictionary whose keys and values are the old and new column names, respectively.
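Putting the pieces together, a sketch of the full pipeline: drop the incomplete rows, rename the headers, and write straight to MySQL (the old/new column names are placeholders for your 33 real ones; connection string as in the question):
import pandas as pd
from sqlalchemy import create_engine

data = pd.read_csv('TestCSV.csv', sep=',')
# dropna() and rename() both return new DataFrames, so chain them and keep the result.
clean = data.dropna().rename(columns={'oldcol1': 'newcol1', 'oldcol2': 'newcol2'})
cnx = create_engine('mysql+pymysql://root:password@localhost:3306/schema', echo=False)
clean.to_sql(name='t_database', con=cnx, if_exists='append', index=False)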

Update a SQLite3 database using CSVs and script automation

I have a SQLite database that is populated with values from CSV files. I would like to create a script that, when run:
deletes the old tables
creates new tables with the same schema (with newly updated values)
I noticed that SQLite script files don't accept ".mode csv" or .import "csv". Is there a way to automate this with a script of some sort?
If you want a Python approach, you can use the to_sql method from the pandas package to write to SQLite. pandas can replace existing tables and automatically generate the schema from the CSV file it reads.
import sqlite3
import pandas as pd
conn = sqlite3.connect('my.db')
# read the csv file
df = pd.read_csv("my.csv")
# write to SQLite
df.to_sql("my_tbl", conn, if_exists="replace")
conn.close()
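If several CSV files feed the database, here is a sketch of automating the whole refresh; the data/ folder and the file-name-becomes-table-name convention are assumptions, so adjust them to your layout:
import sqlite3
from pathlib import Path
import pandas as pd

conn = sqlite3.connect('my.db')
# Rebuild one table per CSV file; if_exists="replace" drops the old table
# and recreates it with a schema inferred from the file.
for csv_path in Path('data').glob('*.csv'):
    df = pd.read_csv(csv_path)
    df.to_sql(csv_path.stem, conn, if_exists='replace', index=False)
conn.close()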
