Importing data from an excel file using python into SQL Server


I have found other questions with an error similar to the one I am getting, but I have not been able to work out how to resolve mine from their answers. I am trying to import an Excel file into SQL Server using Python. This is the code I wrote:
import pandas as pd
import numpy as np
import pandas.io.sql
import pyodbc
import xlrd
server = "won't disclose private info"
db = 'private info'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' +
                      db + ';Trusted_Connection=yes')
cursor = conn.cursor()
book = xlrd.open_workbook("Daily Flash.xlsx")
sheet = book.sheet_by_name("Sheet1")
query1 = """CREATE TABLE [LEAF].[MK] ([LEAF][Lease_Number] varchar(255),
[LEAF][Start_Date] varchar(255), [LEAF][Report_Status] varchar(255),
[LEAF][Status_Date] varchar(255), [LEAF][Current_Status] varchar(255),
[LEAF][Sales_Rep] varchar(255), [LEAF][Customer_Name] varchar(255),
[LEAF][Total_Finance] varchar(255), [LEAF][Rate_Class] varchar(255),
[LEAF][Supplier_Name] varchar(255), [LEAF][DecisionStatus] varchar(255))"""
query = """INSERT INTO [LEAF].[MK] (Lease_Number, Start_Date, Report_Status,
Status_Date, Current_Status, Sales_Rep, Customer_Name, Total_Finance,
Rate_Class, Supplier_Name, DecisionStatus)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""
for r in range(1, sheet.nrows):
    Lease_Number = sheet.cell(r,0).value
    Start_Date = sheet.cell(r,1).value
    Report_Status = sheet.cell(r,2).value
    Status_Date = sheet.cell(r,3).value
    Current_Status= sheet.cell(r,4).value
    Sales_Rep = sheet.cell(r,5).value
    Customer_Name = sheet.cell(r,6).value
    Total_Financed= sheet.cell(r,7).value
    Rate_Class = sheet.cell(r,8).value
    Supplier_Name = sheet.cell(r,9).value
    DecisionStatus= sheet.cell(r,10).value
    values = (Lease_Number, Start_Date, Report_Status, Status_Date,
              Current_Status, Sales_Rep, Customer_Name, Total_Financed, Rate_Class,
              Supplier_Name, DecisionStatus)
    cursor.execute(query1)
    cursor.execute(query, values)
database.commit()
database.close()
database.commit()
The error message I get is:
ProgrammingError                          Traceback (most recent call last)
<ipython-input-24-c525ebf0af73> in <module>()
     16
     17 # Execute sql Query
---> 18 cursor.execute(query, values)
     19
     20 # Commit the transaction
ProgrammingError: ('The SQL contains 0 parameter markers, but 11 parameters were supplied', 'HY000')
Can someone please explain the problem to me and how I can fix it? Thank you!
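The message means pyodbc found no parameter markers in the SQL: pyodbc uses the qmark parameter style (its module attribute `pyodbc.paramstyle` is `'qmark'`), so it does not treat `%s` as a placeholder and each value needs a `?` instead. Below is a minimal sketch that builds the INSERT with one `?` per column. It runs against an in-memory `sqlite3` database only because sqlite3 also binds `?` markers and needs no server; the table name, column subset and sample values are illustrative assumptions.

```python
import sqlite3

def build_insert(table, columns):
    """Return an INSERT statement with one qmark (?) placeholder per column."""
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"

columns = ["Lease_Number", "Start_Date", "Report_Status"]
query = build_insert("MK", columns)
print(query)

# sqlite3 stands in for pyodbc here: both bind values to ? markers
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MK (Lease_Number TEXT, Start_Date TEXT, Report_Status TEXT)")
conn.execute(query, ("L-001", "2018-07-09", "Approved"))  # illustrative values
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM MK").fetchone()[0])
```

With pyodbc, the same `query` string and `cursor.execute(query, values)` call apply unchanged, as long as `values` has one item per `?`.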
Update:
I got that error message to go away based on the comments below. I also modified my query, because the table I am trying to insert values into had not been created yet, so I updated my code in an attempt to create it.
However, now I am getting the error message:
ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The specified schema name "dbo" either does not exist or you do not have permission to use it. (2760) (SQLExecDirectW)')
I tried changing that slightly by writing CREATE [HELLO][MK] instead of just CREATE MK but that tells me that MK is already in the database... What steps should I take next?

Based on the conversation we had in our chat, here are a few takeaways:
After executing your CREATE TABLE query, make sure to commit it immediately before running any subsequent INSERT queries.
Use error catching for the case where the table already exists in the database. You asked whether the script would still run if you wanted to import more data into the table. Without that error handling it would not, because pyodbc raises an exception at cursor.execute(query1) once the table exists.
If you want to validate whether your insert operations were successful, you can do a simple record count check.
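For the "table already exists" case, here is a minimal sketch of the try/except idea. It uses `sqlite3` as a stand-in (sqlite3 raises `sqlite3.OperationalError` where pyodbc would raise `pyodbc.ProgrammingError`), and the helper name `create_table_once` is hypothetical. Unlike a bare `pass`, it only swallows the "already exists" error. On SQL Server itself, guarding the statement with `IF OBJECT_ID(N'LEAF.ZZZ', N'U') IS NULL` before `CREATE TABLE` is a common alternative.

```python
import sqlite3

def create_table_once(conn, ddl):
    """Run a CREATE TABLE statement; return False if the table already exists."""
    try:
        conn.execute(ddl)
        conn.commit()  # commit the DDL before any INSERTs run
        return True
    except sqlite3.OperationalError as exc:
        # only ignore the "already exists" case; re-raise anything else
        if "already exists" in str(exc):
            return False
        raise

conn = sqlite3.connect(":memory:")
ddl = "CREATE TABLE ZZZ (Lease_Number TEXT, Start_Date TEXT)"
first = create_table_once(conn, ddl)   # creates the table
second = create_table_once(conn, ddl)  # table exists, so the script keeps running
print(first, second)
```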
EDIT
Yesterday, when I had @mkheifetz test my code, he caught a minor bug: the validation check returned False because the database already contained records, so comparing the table's row count against only the data currently being imported failed. To fix the bug, I have modified the code again.
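The fix amounts to comparing the change in row count with the number of rows imported, rather than the total. A sketch of that check on an in-memory `sqlite3` table that already holds a record; `import_rows`, `count_rows` and the sample rows are hypothetical names and values for illustration.

```python
import sqlite3

def count_rows(cur, table):
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

def import_rows(conn, rows):
    """Insert rows and report whether exactly len(rows) new records appeared."""
    cur = conn.cursor()
    before = count_rows(cur, "ZZZ")
    cur.executemany("INSERT INTO ZZZ (Lease_Number, Start_Date) VALUES (?, ?)", rows)
    conn.commit()
    after = count_rows(cur, "ZZZ")
    # compare the difference, not the total, so pre-existing rows don't break the check
    return (after - before) == len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZZZ (Lease_Number TEXT, Start_Date TEXT)")
conn.execute("INSERT INTO ZZZ VALUES ('old', '2018-01-01')")  # existing record
conn.commit()
ok = import_rows(conn, [("L-1", "2018-07-01"), ("L-2", "2018-07-02")])
print(ok)
```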
Below is how I would modify your code:
import pandas as pd
import pyodbc
import xlrd
server = 'XXXXX'
db = 'XXXXXdb'
# create Connection and Cursor objects
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()
# read data
data = pd.read_excel('Flash Daily Apps through 070918.xls')
# rename columns
data = data.rename(columns={'Lease Number': 'Lease_Number',
                            'Start Date': 'Start_Date',
                            'Report Status': 'Report_Status',
                            'Status Date': 'Status_Date',
                            'Current Status': 'Current_Status',
                            'Sales Rep': 'Sales_Rep',
                            'Customer Name': 'Customer_Name',
                            'Total Financed': 'Total_Financed',
                            'Rate Class': 'Rate_Class',
                            'Supplier Name': 'Supplier_Name'})
# export
data.to_excel('Daily Flash.xlsx', index=False)
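# Open the workbook and define the worksheet
# (xlrd 2.0 and later only reads legacy .xls files, so opening an .xlsx file
#  this way needs an older xlrd release)
book = xlrd.open_workbook("Daily Flash.xlsx")
sheet = book.sheet_by_name("Sheet1")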
query1 = """
CREATE TABLE [LEAF].[ZZZ] (
Lease_Number varchar(255),
Start_Date varchar(255),
Report_Status varchar(255),
Status_Date varchar(255),
Current_Status varchar(255),
Sales_Rep varchar(255),
Customer_Name varchar(255),
Total_Finance varchar(255),
Rate_Class varchar(255),
Supplier_Name varchar(255),
DecisionStatus varchar(255)
)"""
query = """
INSERT INTO [LEAF].[ZZZ] (
Lease_Number,
Start_Date,
Report_Status,
Status_Date,
Current_Status,
Sales_Rep,
Customer_Name,
Total_Finance,
Rate_Class,
Supplier_Name,
DecisionStatus
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"""
# execute create table
try:
    cursor.execute(query1)
    conn.commit()
except pyodbc.ProgrammingError:
    # assume the table already exists and keep appending to it
    pass
# grab existing row count in the database for validation later
cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
before_import = cursor.fetchone()
for r in range(1, sheet.nrows):
    Lease_Number = sheet.cell(r,0).value
    Start_Date = sheet.cell(r,1).value
    Report_Status = sheet.cell(r,2).value
    Status_Date = sheet.cell(r,3).value
    Current_Status= sheet.cell(r,4).value
    Sales_Rep = sheet.cell(r,5).value
    Customer_Name = sheet.cell(r,6).value
    Total_Financed= sheet.cell(r,7).value
    Rate_Class = sheet.cell(r,8).value
    Supplier_Name = sheet.cell(r,9).value
    DecisionStatus= sheet.cell(r,10).value
    # Assign values from each row
    values = (Lease_Number, Start_Date, Report_Status, Status_Date, Current_Status,
              Sales_Rep, Customer_Name, Total_Financed, Rate_Class, Supplier_Name,
              DecisionStatus)
    # Execute sql Query
    cursor.execute(query, values)
# Commit the transaction
conn.commit()
# If you want to check if all rows are imported
cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
result = cursor.fetchone()
print((result[0] - before_import[0]) == len(data.index)) # should be True
# Close the database connection
conn.close()
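Inserting row by row works, but for larger sheets a single `executemany` call is usually much faster, and pyodbc additionally offers `cursor.fast_executemany = True` for SQL Server. A sketch assuming the rows are already a list of tuples in the same order as the INSERT's column list (for a DataFrame, `data.itertuples(index=False, name=None)` yields plain tuples); the `hasattr` guard exists only so the same hypothetical `bulk_insert` helper can be tried locally against `sqlite3`, whose cursors have no such attribute.

```python
import sqlite3

def bulk_insert(conn, query, rows):
    """Insert all rows with one executemany call and a single commit."""
    cur = conn.cursor()
    # pyodbc cursors expose fast_executemany; sqlite3 cursors do not
    if hasattr(cur, "fast_executemany"):
        cur.fast_executemany = True
    cur.executemany(query, rows)
    conn.commit()
    return len(rows)

# local check with sqlite3 (also ? placeholders); with pyodbc, pass the pyodbc
# connection and the INSERT INTO [LEAF].[ZZZ] query instead
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZZZ (Lease_Number TEXT, Start_Date TEXT)")
rows = [("L-1", "2018-07-01"), ("L-2", "2018-07-02"), ("L-3", "2018-07-03")]
inserted = bulk_insert(conn, "INSERT INTO ZZZ VALUES (?, ?)", rows)
```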

Related

ON CONFLICT DO UPDATE syntax and EXCLUDED error on cursor.executemany

I have a simplified PostgreSQL 13 table (below) whose rows I update from Python with psycopg2.
My problem is that when I update the price field of existing rows, the update fails with the ON CONFLICT DO UPDATE errors below. Without ON CONFLICT DO UPDATE the update works, but I want to use ON CONFLICT DO UPDATE because it avoids duplicate rows.
With ON CONFLICT DO UPDATE, I only need to update the "price" and "last_updated" fields, and only when a row matches on "id, item, original_price_date".
These are the errors I get with ON CONFLICT DO UPDATE:
Error : syntax error at or near "="
# update the prices within the existing data
df = pd.DataFrame(np.array([['5/3/2010', 'rock', 15],
['4/15/2010', 'paper', 11],
['2/3/2015', 'scissor', 13]]),
columns = ['original_price_date', 'item', 'price'])
tuples_for_dB = [tuple(x) for x in df.to_numpy()]
sql_script = '''INSERT INTO ''' + TABLE_ + ''' (
original_price_date, item, price, created_date, last_updated)
VALUES (%s, %s, %s, transaction_timestamp(), transaction_timestamp())
ON CONFLICT (id, item, original_price_date)
DO UPDATE SET (price, last_updated = EXCLUDED.price, EXCLUDED.transaction_timestamp());'''
Error : relation "price_data" does not exist
sql_script = '''INSERT INTO ''' + TABLE_ + ''' (
original_price_date, item, price, created_date, last_updated)
VALUES (%s, %s, %s, transaction_timestamp(), transaction_timestamp())
ON CONFLICT (id, item, original_price_date)
DO UPDATE SET (price, last_updated) = (EXCLUDED.price, EXCLUDED.transaction_timestamp());'''
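My original code for creating the data:
import numpy as np
import pandas as pd
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
# postGRESQL connection details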
DATABASE_INITIAL_ = 'postgres'
DATABASE_ = 'data'
TABLE_ = 'price_data'
USER_ = 'postgres'
SERVERNAME_ = 'localhost'
PASSWORD_ = password_
HOST_ = '127.0.0.1'
PORT_ = '5432'
#establishing the connection
conn = psycopg2.connect(database = DATABASE_,
                        user = USER_,
                        password = PASSWORD_,
                        host = HOST_,
                        port = PORT_);
conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT);
conn.autocommit = True
# Creating a cursor object using the cursor() method
cursor = conn.cursor()
sql = "SELECT 1 FROM pg_catalog.pg_database WHERE datname = " + "'" + DATABASE_ + "'"
cursor.execute(sql)
# If dB does not exist create the dB
exists = cursor.fetchone()
print(exists)
if not exists:
    print('does not exist')
    #Preparing query to create a database
    sql = '''CREATE database '''+DATABASE_;
    #Creating a database
    cursor.execute(sql)
# Creating the table
sql = '''CREATE TABLE IF NOT EXISTS ''' + TABLE_ + ''' (
id SERIAL PRIMARY KEY,
original_price_date DATE NOT NULL,
item TEXT NOT NULL,
price NUMERIC NULL DEFAULT NULL,
created_date TIMESTAMPTZ NULL DEFAULT TRANSACTION_TIMESTAMP(),
last_updated TIMESTAMPTZ NULL DEFAULT TRANSACTION_TIMESTAMP());'''
cursor.execute(sql)
# update the table with data
df = pd.DataFrame(np.array([['5/3/2010', 'rock', 0.9],
['4/15/2010', 'paper', 6.5],
['2/3/2015', 'scissor', 3.9],
['3/23/2017', 'ball', 1.1],
['4/7/2013', 'tire', 5.4]]),
columns = ['original_price_date', 'item', 'price'])
tuples_for_dB = [tuple(x) for x in df.to_numpy()]
sql_script = '''INSERT INTO ''' + TABLE_ + ''' (
original_price_date, item, price, created_date, last_updated)
VALUES (%s, %s, %s, transaction_timestamp(), transaction_timestamp());'''
try:
    cursor.executemany(sql_script, tuples_for_dB);
    success = True
except psycopg2.Error as e:
    error = e.pgcode
    print(f'Error : {e.args[0]}')
    success = False
if success:
    print(f'\nData inserted successfully........')
    print(f'Table INSERT sql commit comment :\n"{sql_script}"\n')
elif success == False:
    print(f'\nData NOT inserted successfully XXXXXX')
# Preparing query to drop a table
sql = '''DROP TABLE IF EXISTS ''' + TABLE_ + ";"
# Dropping the table (this removes price_data at the end of every run)
cursor.execute(sql)
conn.close()

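I added a unique constraint, CONSTRAINT com UNIQUE (original_price_date, item), to the CREATE TABLE statement, because PostgreSQL requires the ON CONFLICT target to match a unique constraint or unique index.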
sql = '''CREATE TABLE IF NOT EXISTS ''' + TABLE_ + ''' (
id SERIAL PRIMARY KEY,
original_price_date DATE NOT NULL,
item TEXT NOT NULL,
price NUMERIC NULL DEFAULT NULL,
created_date TIMESTAMPTZ NULL DEFAULT TRANSACTION_TIMESTAMP(),
last_updated TIMESTAMPTZ NULL DEFAULT TRANSACTION_TIMESTAMP(),
CONSTRAINT com UNIQUE (original_price_date,item));'''
With that constraint in place, the following statement inserts the data without creating duplicate (original_price_date, item) rows:
sql = '''INSERT INTO ''' + TABLE_ + '''(original_price_date, item, price)
VALUES (%s, %s, %s)
ON CONFLICT (original_price_date, item)
DO UPDATE
SET (price, last_updated) = (EXCLUDED.price,TRANSACTION_TIMESTAMP());'''
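The same pattern can be checked without a PostgreSQL server. The sketch below uses `sqlite3`, which has supported `INSERT ... ON CONFLICT (...) DO UPDATE` with an `excluded` pseudo-table since SQLite 3.24; it substitutes `CURRENT_TIMESTAMP` for `TRANSACTION_TIMESTAMP()` and `?` for psycopg2's `%s`, so treat it as an illustration of the upsert logic, not as PostgreSQL code. The prices are the sample values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_data (
        id INTEGER PRIMARY KEY,
        original_price_date TEXT NOT NULL,
        item TEXT NOT NULL,
        price NUMERIC,
        created_date TEXT DEFAULT CURRENT_TIMESTAMP,
        last_updated TEXT DEFAULT CURRENT_TIMESTAMP,
        CONSTRAINT com UNIQUE (original_price_date, item)
    )""")

# the conflict target matches the UNIQUE constraint above
upsert = """
    INSERT INTO price_data (original_price_date, item, price)
    VALUES (?, ?, ?)
    ON CONFLICT (original_price_date, item)
    DO UPDATE SET price = excluded.price, last_updated = CURRENT_TIMESTAMP"""

# initial load, then a second batch that should update prices in place
conn.executemany(upsert, [("5/3/2010", "rock", 0.9), ("4/15/2010", "paper", 6.5),
                          ("2/3/2015", "scissor", 3.9)])
conn.executemany(upsert, [("5/3/2010", "rock", 15), ("4/15/2010", "paper", 11),
                          ("2/3/2015", "scissor", 13)])
conn.commit()

rows = conn.execute("SELECT item, price FROM price_data ORDER BY id").fetchall()
print(rows)
```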

Error while updating MySQL DB from PostgreSQL DB

I need to update/insert rows in a MySQL database using data from a Postgres DB. Here is the script I'm using, but I get the error below when I schedule it in Jenkins.
Can anyone please advise what I can change to fix this?
File "signup.py", line 80, in <module>
11:59:27 cur_msql_1.execute(msql_insert_1, row)
11:59:27 File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 209, in execute
11:59:27 res = self._query(query)
11:59:27 File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 315, in _query
11:59:27 db.query(q)
11:59:27 File "/usr/local/lib/python3.5/dist-packages/MySQLdb/connections.py", line 239, in query
11:59:27 _mysql.connection.query(self, query)
11:59:27 MySQLdb._exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'"timestamp", ip, store_id, confirmed_at) SELECT \'user123@gmail.com\', 15463\' at line 2')
11:59:27 Build step 'Execute shell' marked build as failure
11:59:27 Finished: FAILURE
Below is the entire code:
import psycopg2
import os
import time
import MySQLdb
import sys
from pprint import pprint
from datetime import datetime
from utils.config import Configuration as Config
from utils.postgres_helper import get_connection
from utils.utils import get_global_config
# MySQLdb connection
try:
    source_host = 'magento'
    conf = get_global_config()
    cnx_msql = MySQLdb.connect(host=conf.get(source_host, 'host'),
                               user=conf.get(source_host, 'user'),
                               passwd=conf.get(source_host, 'password'),
                               port=int(conf.get(source_host, 'port')),
                               db=conf.get(source_host, 'db'))
    print('Magento MySQL DB Connected')
except MySQLdb.Error as e:
    # MySQLdb (not mysql.connector) raises MySQLdb.Error
    print("MYSQL: Unable to connect!", e)
    sys.exit(1)
# Postgresql connection
try:
    cnx_psql = get_connection(get_global_config(), 'pg_dwh')
    print('DWH PostgreSQL DB Connected')
except psycopg2.Error as e:
    print('PSQL: Unable to connect!\n{0}'.format(e))
    sys.exit(1)
# Cursors initializations
cur_msql = cnx_msql.cursor()
cur_msql_1 = cnx_msql.cursor()
cur_psql = cnx_psql.cursor()
cur_psql_1 = cnx_psql.cursor()
now = time.strftime('%Y-%m-%d %H:%M:%S')
##################################################################################
update_sql_base="""select gr.email from unsubscribed_contacts gr
INNER JOIN subscriber sn on sn.email=gr.email"""
msql_update_1="""UPDATE subscriber SET status=3,timestamp=CAST(TO_CHAR(now(),'YYYY-MM-DD HH24:MI:SS') AS TIMESTAMP) WHERE email='%s'"""
msql_update_2="""UPDATE n_subscriber SET subscriber_status=3,change_status_at=CAST(TO_CHAR(now(),'YYYY-MM-DD HH24:MI:SS') AS TIMESTAMP)
WHERE subscriber_email='%s';"""
cur_psql.execute(update_sql_base)
for row in cur_psql:
    email=row[0]
    cur_msql.execute(msql_update_1 %email)
    cnx_msql.commit()
    cur_msql.execute(msql_update_2 %email)
    cnx_msql.commit()
##################################################################################
insert_sql_base="""select gr.email,c.customer_id,'',3,'',CAST(TO_CHAR(now(),'YYYY-MM-DD HH24:MI:SS') AS TIMESTAMP),'','',CAST(TO_CHAR(now(),'YYYY-MM-DD HH24:MI:SS') AS TIMESTAMP)
from unsubscribed_contacts gr
LEFT JOIN n_subscriber sn on sn.email=gr.email
LEFT JOIN customers_2 c on c.customer_email=gr.email
WHERE sn.email IS NULL"""
msql_insert="""INSERT INTO n_subscriber(
email, customer_id, options, status, confirm_code, "timestamp", ip, store_id, confirmed_at) SELECT """
msql_insert_1="""INSERT INTO n_subscriber(
email, customer_id, options, status, confirm_code, "timestamp", ip, store_id, confirmed_at) SELECT %s, %s, %s, %s, %s, %s, %s, %s, %s"""
cur_psql_1.execute(insert_sql_base)
for row in cur_psql_1:
    print(msql_insert_1)
    cur_msql_1.execute(msql_insert_1, row)
    cnx_msql.commit()
## Closing cursors'
cur_msql.close()
cur_psql.close()
cur_psql_1.close()
cur_msql_1.close()
## Closing database connections
cnx_msql.close()
cnx_psql.close()
Python : 3.5
PostgreSQL: Version 11

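The main problem is the SQL syntax passed to cur_msql_1.execute(msql_insert_1, row): the column name "timestamp" is wrapped in double quotes, which MySQL treats as a string literal rather than a column name (unless the ANSI_QUOTES SQL mode is enabled). To explain, using a few tables: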
create table subscriber
(
customer_id int null,
email varchar(100) null,
timestamp int null
);
INSERT INTO subscriber (customer_id, email, timestamp) VALUES (1, 'test1@gmail.com', 1591187277);
INSERT INTO subscriber (customer_id, email, timestamp) VALUES (2, 'test2@gmail.com', 1591187303);
create table n_subscriber
(
customer_id int null,
email varchar(100) null,
timestamp int null
);
In your case, the code works roughly like this:
import MySQLdb
db = MySQLdb.connect(...)
cursor = db.cursor()
cursor.execute("SELECT customer_id, email, timestamp FROM subscriber")
for row in cursor:
    cursor.execute("""INSERT INTO n_subscriber(customer_id, email, "timestamp") SELECT %s, %s, %s""", row)
db.commit()
MySQLdb._exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'"timestamp") SELECT 1, \'test1@gmail.com\', 1591187277\' at line 1')
Correct syntax:
cursor.execute("INSERT INTO n_subscriber(customer_id, email, timestamp) VALUES (%s, %s, %s)", row)
Also you can do it using executemany():
cursor = db.cursor()
cursor.execute("SELECT customer_id, email, timestamp FROM subscriber")
data = cursor.fetchall()
cursor.executemany("INSERT INTO n_subscriber(customer_id, email, timestamp) VALUES (%s, %s, %s)", data)
db.commit()
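If you prefer to quote column names (for example because `timestamp` is also a type name), MySQL's default identifier quote character is the backtick; double quotes only act as identifier quotes when the `ANSI_QUOTES` SQL mode is enabled. A small hypothetical helper that builds the INSERT with backtick-quoted identifiers and MySQLdb-style `%s` placeholders. It only builds the string, so it can be checked without a server.

```python
def mysql_insert_sql(table, columns):
    """Build an INSERT for MySQLdb: backtick-quoted identifiers, %s placeholders."""
    def quote(name):
        # escape embedded backticks by doubling them, as MySQL expects
        return "`" + name.replace("`", "``") + "`"
    cols = ", ".join(quote(c) for c in columns)
    marks = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {quote(table)} ({cols}) VALUES ({marks})"

sql = mysql_insert_sql("n_subscriber", ["customer_id", "email", "timestamp"])
print(sql)
# INSERT INTO `n_subscriber` (`customer_id`, `email`, `timestamp`) VALUES (%s, %s, %s)
# usage with MySQLdb: cursor.executemany(sql, data)
```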
Hope this helps.

Get API-endpoint and store it in a SQLite (Python)

I am trying to fetch data from this API endpoint, https://api.coindesk.com/v1/bpi/currentprice.json, and store a few selected fields in SQLite.
When I try to save it in the database, I get this error:
Traceback (most recent call last):
File "bitcoin.py", line 41, in <module>
cur.execute("INSERT INTO COINS (Identifier, symbol, description) VALUES (?, ?, ?);", to_db)
sqlite3.ProgrammingError: Binding 1 has no name, but you supplied a dictionary (which has only names).
How can I store some of the data from the API endpoint in the database?
I'm doing this to learn programming and I'm still new to it, so hopefully you can point me in the right direction.
Here is what I have tried so far:
import requests
import sqlite3
con = sqlite3.connect("COINS.db")
cur = con.cursor()
cur.execute('DROP TABLE IF EXISTS COINS')
cur.execute(
"CREATE TABLE COINS (Identifier INTEGER PRIMARY KEY, symbol TEXT, description TEXT);"
)
r = requests.get('https://api.coindesk.com/v1/bpi/currentprice.json')
to_db = r.json() # I do not have to do it in json, CSV would also be another
# solution but the data that is been stored cannot be static.
# It has to automatically fetch the data from API-endpoint
cur.execute("INSERT INTO COINS (Identifier, symbol, description) VALUES (?, ?, ?);", to_db)
con.commit()
con.close()

import requests
import sqlite3
con = sqlite3.connect("COINS.db")
cur = con.cursor()
cur.execute('DROP TABLE IF EXISTS COINS')
cur.execute(
    "CREATE TABLE COINS (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, "
    "symbol TEXT, description TEXT);")
r = requests.get('https://api.coindesk.com/v1/bpi/currentprice.json')
to_db = r.json()
des = to_db['bpi']['USD']['description']
code = to_db['bpi']['USD']['code']
# pass a tuple of values (not the whole dict) so each ? gets one value;
# the code goes into symbol and the description into description
cur.execute("INSERT INTO COINS (symbol, description) VALUES (?, ?);",
            (code, des))
con.commit()
con.close()
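The original error comes from passing the whole JSON dict as the parameters: sqlite3 accepts a dict only with named placeholders such as `:symbol`, while `?` placeholders need a sequence. A sketch that stores every currency under `bpi` using named placeholders. `store_currencies` is a hypothetical helper, and `sample` is a hypothetical, trimmed dict shaped like the fields the answer reads (`bpi`, then currency, then `code` and `description`), used instead of a live request so the example runs offline.

```python
import sqlite3

def store_currencies(con, payload):
    """Insert one row per currency in payload['bpi'] using named placeholders."""
    rows = [
        {"symbol": info["code"], "description": info["description"]}
        for info in payload["bpi"].values()
    ]
    # :symbol and :description are looked up by key in each dict
    con.executemany(
        "INSERT INTO COINS (symbol, description) VALUES (:symbol, :description)", rows
    )
    con.commit()
    return len(rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE COINS (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "symbol TEXT, description TEXT)")
# hypothetical payload; in the real script this would be r.json()
sample = {"bpi": {"USD": {"code": "USD", "description": "United States Dollar"},
                  "EUR": {"code": "EUR", "description": "Euro"}}}
n = store_currencies(con, sample)
print(con.execute("SELECT symbol, description FROM COINS ORDER BY id").fetchall())
```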

pyodbc executemany on merge statement with returning values

I am using MS SQL Server 2014. I have the following code:
import pyodbc, time
cnxn = pyodbc.connect('DRIVER={ODBC Driver 11 for SQL Server};SERVER=192.xxx.xxx.xxx;DATABASE=test;UID=xx;PWD=xxxxx')
cursor = cnxn.cursor()
# cursor.fast_executemany = True
values = [(x, "one_{0}".format(x), "ONE_{0}".format(x), 100) for x in range(300000)]
cursor.execute("""
CREATE TABLE employees
( id INT NOT NULL PRIMARY KEY,
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50),
salary MONEY
);
""")
cursor.commit()
t1 = time.time()
cursor.executemany("""
MERGE employees USING (
VALUES
(?, ?, ?, ?)
) AS vals (id, last_name, first_name, salary)
ON employees.id = vals.id
WHEN MATCHED THEN
UPDATE SET
last_name = vals.last_name,
first_name = vals.first_name,
salary = vals.salary
WHEN NOT MATCHED THEN
INSERT (id, last_name, first_name, salary)
VALUES (vals.id, vals.last_name, vals.first_name, vals.salary)
OUTPUT inserted.id, inserted.last_name, inserted.first_name, inserted.salary;
""", values )
cursor.commit()
t2 = time.time()
print(t2 - t1)
If I measure the performance of the snippet above without fast_executemany, it takes 35-40 seconds; with fast_executemany enabled, it takes just 500 milliseconds.
If I try to fetch the results with cursor.fetchmany() or cursor.fetchall(), I get this error:
---------------------------------------------------------------------------
ProgrammingError Traceback (most recent call last)
<ipython-input-10-d84781edf8ac> in <module>()
----> 1 cursor.fetchmany()
ProgrammingError: No results. Previous SQL was not a query.
I want the values returned by the upsert above. My snippet uses 300,000 sample rows, which is my normal scenario.
Is this possible with MS SQL Server 2014 and pyodbc? Is there any other way to achieve this result in MS SQL Server?
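One possible workaround, offered as an untested sketch rather than a confirmed answer: keep `fast_executemany` for a plain INSERT into a staging table (which returns no rows), then run a single `MERGE ... OUTPUT` from that table with `execute()`, so the OUTPUT rows form one result set that `fetchall()` can read. The staging table `dbo.employees_staging` and the helper `upsert_with_output` are hypothetical names; the SQL mirrors the question's `employees` table.

```python
STAGE_DDL = """
IF OBJECT_ID(N'dbo.employees_staging', N'U') IS NULL
    CREATE TABLE dbo.employees_staging (
        id INT NOT NULL PRIMARY KEY,
        last_name VARCHAR(50) NOT NULL,
        first_name VARCHAR(50),
        salary MONEY
    );"""

STAGE_INSERT = ("INSERT INTO dbo.employees_staging (id, last_name, first_name, salary) "
                "VALUES (?, ?, ?, ?)")

MERGE_SQL = """
SET NOCOUNT ON;
MERGE employees AS tgt
USING dbo.employees_staging AS vals
    ON tgt.id = vals.id
WHEN MATCHED THEN
    UPDATE SET last_name = vals.last_name,
               first_name = vals.first_name,
               salary = vals.salary
WHEN NOT MATCHED THEN
    INSERT (id, last_name, first_name, salary)
    VALUES (vals.id, vals.last_name, vals.first_name, vals.salary)
OUTPUT $action, inserted.id, inserted.last_name, inserted.first_name, inserted.salary;
"""

def upsert_with_output(cnxn, values):
    """Bulk-load values into a staging table, then MERGE once and return OUTPUT rows."""
    cursor = cnxn.cursor()
    cursor.execute(STAGE_DDL)
    cursor.execute("TRUNCATE TABLE dbo.employees_staging")  # clear earlier batches
    cursor.fast_executemany = True      # fast path for the plain INSERT only
    cursor.executemany(STAGE_INSERT, values)
    cursor.execute(MERGE_SQL)           # single statement, single result set
    results = cursor.fetchall()
    cnxn.commit()
    return results
```

Usage with the question's objects would be `rows = upsert_with_output(cnxn, values)`; each returned row starts with `$action` (`'INSERT'` or `'UPDATE'`).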

Populating a MySQL table with scraped data

I'm using Python 3, MySQL, Sequel Pro and BeautifulSoup.
Put simply, I want to create a SQL table and then insert my downloaded data into that table.
I've used this answer as a template to build the SQL part Beautiful soup webscrape into mysql, but it won't work.
Errors thrown:
line 86 finally:
SyntaxError: invalid syntax
When I comment out this last finally: (just see if the rest of the code worked) I get:
InternalError: (1054, "Unknown column 'address' in 'field list'")
Another common error I got was:
ProgrammingError: (1146, "Table 'simple_scrape.simple3' doesn't exist",
though I can't remember the exact changes I made to end up with this error.
Finally, I started to learn programming (not just Python, but 'programming') less than four weeks ago, so if you're wondering why I've done something stupid or inefficient, it's almost certainly because that was the first way I got it to work!
Please help!
Code:
from selenium import webdriver
#Guess BER Number
for i in range(108053983,108053985):
    try:
        # ber_try = 100000000
        ber_try =+i
        #Open page & insert BER Number
        browser = webdriver.Firefox()
        type(browser)
        browser.get('https://ndber.seai.ie/pass/ber/search.aspx')
        ber_send = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_txtBERNumber')
        ber_send.send_keys(ber_try)
        #click search
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_Bottomsearch')
        form.click()
        #click intermediate page
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_gridRatings_gridview_ctl02_ViewDetails')
        form.click()
        #scrape the page
        import bs4
        soup = bs4.BeautifulSoup(browser.page_source)
        # First Section
        ber_dec = soup.find('fieldset', {'id':'ctl00_DefaultContent_BERSearch_fsBER'})
        address = ber_dec.find('div', {'id':'ctl00_DefaultContent_BERSearch_dfBER_div_PublishingAddress'})
        address = (address.get_text(', ').strip())
        print(address)
        date_issue = ber_dec.find('span', {'id':'ctl00_DefaultContent_BERSearch_dfBER_container_DateOfIssue'})
        date_issue = date_issue.get_text().strip()
        print(date_issue)
    except:
        print('Invalid BER Number:', ber_try)
        browser.quit()
    #connecting to mysql
    finally:
        import pymysql.cursors
        from pymysql import connect, err, sys, cursors
        #Making the connection
        connection = pymysql.connect(host = '127.0.0.1',
                                     port = 3306,
                                     user = 'root',
                                     passwd = 'root11',
                                     db = 'simple_scrape',
                                     cursorclass=pymysql.cursors.DictCursor);
        with connection.cursor() as cursor:
            sql= """CREATE TABLE `simple3`(
(
`ID` INT AUTO_INCREMENT NOT NULL,
`address` VARCHAR( 200 ) NOT NULL,
`date_issue` VARCHAR( 200 ) NOT NULL,
PRIMARY KEY ( `ID` )
)Engine = MyISAM)"""
            sql = "INSERT INTO `simple3` (`address`, `date_issue`) VALUES (%s, %s)"
            cursor.execute(sql, (address, date_issue))
        connection.commit()
    finally:
        connection.close()
        browser.quit()

Issues:
And actually create the table:
sql = """CREATE TABLE IF NOT EXISTS simple3 (
ID INT AUTO_INCREMENT NOT NULL,
address VARCHAR(200) NOT NULL,
date_issue VARCHAR(200) NOT NULL,
PRIMARY KEY (ID)
) ENGINE = MyISAM"""
# Added this line since your table was not being created.
cursor.execute(sql)
sql = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"
cursor.execute(sql, (address, date_issue))
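The `SyntaxError` itself comes from the second `finally:`, since a `try` statement can have only one `finally` clause. A restructured sketch: collect the scraped values in a list inside the loop, then open the database once, create the table if needed, insert everything, and close the connection in a single `finally`. It uses `sqlite3` (and `?` placeholders) as a stand-in so it runs without a MySQL server; with pymysql the placeholders would be `%s` and the DDL as in the answer. `scrape_one` is a hypothetical placeholder for the Selenium/BeautifulSoup code, and its odd/even rule is a demo stand-in for "invalid BER number".

```python
import sqlite3

def scrape_one(ber_number):
    """Hypothetical stand-in for the Selenium/BeautifulSoup scraping code."""
    if ber_number % 2:  # demo only: pretend odd numbers are invalid
        raise ValueError("Invalid BER Number")
    return (f"Address for {ber_number}", "01-01-2018")

scraped = []
for ber_try in range(108053983, 108053985):
    try:
        scraped.append(scrape_one(ber_try))
    except ValueError:
        print("Invalid BER Number:", ber_try)

connection = sqlite3.connect(":memory:")
try:
    cursor = connection.cursor()
    cursor.execute("""CREATE TABLE IF NOT EXISTS simple3 (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        address VARCHAR(200) NOT NULL,
        date_issue VARCHAR(200) NOT NULL)""")
    cursor.executemany("INSERT INTO simple3 (address, date_issue) VALUES (?, ?)", scraped)
    connection.commit()
    stored = cursor.execute("SELECT COUNT(*) FROM simple3").fetchone()[0]
finally:
    connection.close()  # the single finally clause
```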
