I have a Python script (written in a Jupyter notebook) that I would like to run in Azure. The script gets data from an API source (which is updated every 24 hours) and updates a SQL database hosted in Azure, so each automated run refreshes the database table.
Can someone please help me with this?
Below is the Python code I have written:
import pyodbc
import requests
import json
import pandas as pd

responses = requests.get("https://data.buffalony.gov/resource/d6g9-xbgu.json")
crime_data = json.loads(responses.text)
df = pd.DataFrame.from_dict(crime_data)
dff = df[['case_number','day_of_week','incident_datetime','incident_description','incident_id','incident_type_primary']].copy()

connection = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};Server=servername;Database=Databasename;UID=admin;PWD=admin')
cur = connection.cursor()

rows = []
for i in range(dff.shape[0]):
    rows.append(dff.iloc[i].tolist())

sql = '''\
INSERT INTO [dbo].[FF] ([case_number],[day_of_week],[incident_datetime],[incident_description],[incident_id],[incident_type_primary]) VALUES (?,?,?,?,?,?)
'''
for i in range(dff.shape[0]):
    cur.execute(sql, rows[i])
connection.commit()
I don't use Azure or Jupyter notebooks, but I think I have a solution.
If you can leave your computer running all night, change your code to this:
import time
import pyodbc
import requests
import json
import pandas as pd

while True:
    responses = requests.get("https://data.buffalony.gov/resource/d6g9-xbgu.json")
    crime_data = json.loads(responses.text)
    df = pd.DataFrame.from_dict(crime_data)
    dff = df[['case_number','day_of_week','incident_datetime','incident_description','incident_id','incident_type_primary']].copy()
    connection = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};Server=servername;Database=Databasename;UID=admin;PWD=admin')
    cur = connection.cursor()
    rows = []
    for i in range(dff.shape[0]):
        rows.append(dff.iloc[i].tolist())
    sql = '''\
INSERT INTO [dbo].[FF] ([case_number],[day_of_week],[incident_datetime],[incident_description],[incident_id],[incident_type_primary]) VALUES (?,?,?,?,?,?)
'''
    for i in range(dff.shape[0]):
        cur.execute(sql, rows[i])
    connection.commit()
    time.sleep(86400)  # wait 24 hours before the next refresh
If not, create a new Python program in your startup folder like this ("update hour" is a placeholder for the two-digit hour at which you want the update to run, and "path/to/any_file.txt" is a marker file used to remember the last run):

import time, os

while True:
    # run once the chosen hour is reached, and only if we haven't already run today
    # (time.ctime()[11:13] is the current hour; time.ctime()[0:4] is the weekday
    # abbreviation, which changes every day)
    if time.ctime()[11:13] >= "update hour" and time.ctime()[0:4] != open("path/to/any_file.txt").read():
        file = open("path/to/any_file.txt", "w")
        file.write(time.ctime()[0:4])
        file.close()
        os.system("python /path/to/file.py")
A task scheduler like Azure WebJobs will do this for you.
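If you deploy the script as a triggered Azure WebJob, the daily schedule can come from a settings.job file placed next to the script; a minimal sketch (assuming the six-field NCRONTAB format WebJobs uses, which here fires at midnight UTC each day):

{
  "schedule": "0 0 0 * * *"
}

With that in place, the while loop and time.sleep in the script become unnecessary; the scheduler starts the script once per day.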
I have the code below to connect to my database with a simple select statement, which works fine.
However, if I wanted to read from an SQL file (say the name is "sqlcodes.sql") and then execute the code to pull data, how do I do that?
import pandas as pd
import numpy as np
import sqlalchemy as sqla

def new_connection():
    print('creating new connection')
    return sqla.create_engine(r"mssql+pyodbc://PROD_DSN", echo=False).connect()

if __name__ == "__main__":
    # create DB connection
    conn = new_connection()
    sql = "select * from Exp.RISK_FACTOR"
    # read into a DataFrame
    df = pd.read_sql(sql, conn)
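One straightforward approach (a sketch, under the assumption that "sqlcodes.sql" holds a single SELECT statement) is to read the file into a string and pass it to pd.read_sql:

    # read the statement out of sqlcodes.sql and run it over the existing connection
    with open("sqlcodes.sql") as f:
        sql = f.read()
    df = pd.read_sql(sql, conn)

If the file contains several semicolon-separated statements, split them first (the Teradata answer further down does exactly that).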
I am on a Linux platform with a Cassandra database. I want to insert image data into the Cassandra database using Python code from a remote server. Previously, I wrote a Python script that inserts images' data into a MySQL database from a remote server. Please see the MySQL code below:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import MySQLdb

def read_image(i):
    # build the path to image im<i>.jpg and read it as raw bytes
    filename = "/home/faban/Downloads/Python/Python-Mysql/images/im" + str(i) + ".jpg"
    print(filename)
    fin = open(filename, 'rb')  # binary mode, so the image bytes are not mangled
    img = fin.read()
    fin.close()
    return img

con = MySQLdb.connect("192.168.50.12", "root", "faban", "experiments")
with con:
    print('connecting to database')
    range_from = int(input('Enter range from:'))
    range_till = int(input('Enter range till:'))
    for i in range(range_from, range_till):
        cur = con.cursor()
        data = read_image(i)
        cur.execute("INSERT INTO images VALUES(%s, %s)", (i, data))
        cur.close()
    con.commit()
con.close()
This code successfully inserts data into the MySQL database, which is located at .12.
I want to modify the same code to insert data into the Cassandra database, which is also located at .12.
Please help me out in this regard.
If I create a simple table like this:
CREATE TABLE stackoverflow.images (
name text PRIMARY KEY,
data blob);
I can load those images with Python code that is similar to yours, but with some minor changes to use the DataStax Python Cassandra driver (pip install cassandra-driver):
# imports for the DataStax Cassandra driver and sys
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import sys

# reading my hostname, username, and password from the command line; defining my Cassandra keyspace as a variable
hostname = sys.argv[1]
username = sys.argv[2]
password = sys.argv[3]
keyspace = "stackoverflow"

# adding my hostname to an array, setting up auth, and connecting to Cassandra
nodes = [hostname]
auth_provider = PlainTextAuthProvider(username=username, password=password)
cluster = Cluster(nodes, auth_provider=auth_provider)
session = cluster.connect(keyspace)

# setting my image name, loading the file in binary mode, and reading the data
name = "IVoidWarranties.jpg"
fileHandle = open("/home/aploetz/Pictures/" + name, 'rb')
imgData = fileHandle.read()
fileHandle.close()

# preparing and executing my INSERT statement
strCQL = "INSERT INTO images (name,data) VALUES (?,?)"
pStatement = session.prepare(strCQL)
session.execute(pStatement, [name, imgData])

# closing my connection
session.shutdown()
Hope that helps!
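For reference, the script reads the host, username, and password from the command line, so it would be run as something like (the file name and credentials here are hypothetical):

python load_image.py 192.168.50.12 cassandra cassandra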
Environment: Python 2.7.6, Pandas 0.18.1, MySQL 5.7.
import MySQLdb as dbapi
import pandas as pd

df = pd.read_csv('test.csv')
rows = df.apply(tuple, 1).unique().tolist()

db = dbapi.connect(host=dbServer, user=dbUser, passwd=dbPass)
cur = db.cursor()
for (CLIENT_ID, PROPERTY_ID, YEAR) in rows:
    INSERT_QUERY = ("INSERT INTO {DATABASE}.TEST SELECT * FROM {DATABASE}_{CLIENT_ID}.TEST WHERE PROPERTY_ID = {PROPERTY_ID} AND YEAR = {YEAR};".format(
        CLIENT_ID=CLIENT_ID,
        PROPERTY_ID=PROPERTY_ID,
        YEAR=YEAR,
        DATABASE=DATABASE
    ))
    print INSERT_QUERY
    cur.execute(INSERT_QUERY)
    db.query(INSERT_QUERY)
This prints out the query I am looking for; however, the INSERT INTO has no effect when I check the results in MySQL:
INSERT INTO test.TEST SELECT * FROM test_1.TEST WHERE PROPERTY_ID = 1 AND YEAR = 2015;
However, if I just copy and paste this query into the MySQL GUI, it executes without any problem. Could any guru enlighten?
I also tried the following:
cur.execute(INSERT_QUERY, multi=True)
which returns an error:
TypeError: execute() got an unexpected keyword argument 'multi'
The answer here is that we need to use mysql.connector and call commit() on the connection. Here is a good example:
http://www.mysqltutorial.org/python-mysql-insert/
import mysql.connector
import pandas as pd

df = pd.read_csv('test.csv')
rows = df.apply(tuple, 1).unique().tolist()

conn = mysql.connector.connect(host=dbServer, user=dbUser, port=dbPort, password=dbPass)
cursor = conn.cursor()
for (CLIENT_ID, PROPERTY_ID, YEAR) in rows:
    INSERT_QUERY = ("INSERT INTO {DATABASE}.TEST SELECT * FROM {DATABASE}_{CLIENT_ID}.TEST WHERE PROPERTY_ID = {PROPERTY_ID} AND YEAR = {YEAR};".format(
        CLIENT_ID=CLIENT_ID,
        PROPERTY_ID=PROPERTY_ID,
        YEAR=YEAR,
        DATABASE=DATABASE
    ))
    print INSERT_QUERY
    cursor.execute(INSERT_QUERY)
conn.commit()
Only with the commit will the database/table changes actually be persisted.
I was using a mysql-connector pool, trying to insert a new row into a table, and hit the same problem. Version info: MySQL 8, Python 3.7.
The solution is to call connection.commit() at the end, even if you didn't explicitly start a transaction.
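A minimal sketch of that fix (the pool configuration and table below are hypothetical, not from the original setup):

import mysql.connector.pooling

# hypothetical pool configuration
pool = mysql.connector.pooling.MySQLConnectionPool(
    pool_name="mypool", pool_size=3,
    host="localhost", user="user", password="pass", database="test")

cnx = pool.get_connection()
cur = cnx.cursor()
cur.execute("INSERT INTO t (col) VALUES (%s)", ("value",))
cnx.commit()  # without this, the insert is rolled back when the connection returns to the pool
cur.close()
cnx.close()   # returns the connection to the pool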
I am accessing an Oracle database and trying to update it using Python. Below is my code:
import cx_Oracle
import pandas as pd

conn = cx_Oracle.connect(conn_str)
c = conn.cursor()

def update_output_table(customer_id_list, column_name, column_value_list):
    num_rows_to_add = len(customer_id_list)
    conn = cx_Oracle.connect(conn_str)
    c = conn.cursor()
    for i in range(0, num_rows_to_add, 1):
        c.execute("""UPDATE output SET """ + column_name + """ = %s WHERE customer_id = %s""" % (column_value_list[i], customer_id_list[i]))

total_transaction_df = pd.read_sql("""select distinct b.customer_id,count(a.transaction_id) as total_transaction from transaction_fact a,customer_dim b where a.customer_id = b.CUSTOMER_ID group by b.CUSTOMER_ID""", conn)
# Update these details in the output table
update_output_table(list(total_transaction_df['CUSTOMER_ID']), 'TOTAL_TRANSACTION', list(total_transaction_df['TOTAL_TRANSACTION']))
conn.close()
My program executes completely, but I don't see my database table getting updated. Can someone suggest where I am going wrong?
Note: I am a newbie. Sorry for asking silly doubts. Thanks in advance.
You're missing conn.commit() before conn.close():
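Note that update_output_table opens its own connection, so the commit belongs on that connection, right after the UPDATE loop:

    for i in range(0, num_rows_to_add, 1):
        c.execute("""UPDATE output SET """ + column_name + """ = %s WHERE customer_id = %s""" % (column_value_list[i], customer_id_list[i]))
    conn.commit()  # persist the updates before the connection goes away
    conn.close()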
Here you will find some info on why you need it explicitly. Without a commit, your code does the update, but when the connection is closed all non-committed changes are rolled back, so you see no changes in the DB.
You can also set cx_Oracle.Connection.autocommit = 1, but this is not the recommended way, as you're losing control over transactions.
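For completeness, the autocommit variant looks like this (again, generally not recommended):

conn = cx_Oracle.connect(conn_str)
conn.autocommit = True  # every statement is committed as soon as it executes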
Has anyone found a way to read a Teradata query into a Pandas dataframe? It looks like SQLAlchemy does not have a Teradata dialect.
http://docs.sqlalchemy.org/en/latest/dialects/
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html
You can use sqlalchemy, but you will need to install sqlalchemy-teradata too. You can do that via pip:
pip install sqlalchemy-teradata
The rest of the code remains the same :)
from sqlalchemy import create_engine
import pandas as pd

user, pasw, host = 'username', 'userpass', 'hostname'

# connect
td_engine = create_engine('teradata://{}:{}@{}:22/'.format(user, pasw, host))

# execute sql
query = 'select * from dbc.usersV'
result = td_engine.execute(query)

# to read your query into Pandas
df = pd.read_sql(query, td_engine)
I did it using read_sql. Below is the code snippet:
import pandas as pd
import pyodbc

def dqm():
    conn_rw = create_connection()
    dataframes = []
    # read the whole file, then split it into individual ;-terminated queries
    srcfile = open('srcqueries.sql', 'r').read()
    querylist = srcfile.split(';')
    querylist.pop()  # drop the empty string after the final semicolon
    for query in querylist:
        dataframes.append(pd.read_sql(query, conn_rw))
    close_connection(conn_rw)
    return dataframes, querylist
You can create the connection as below:

def create_connection():
    conn = pyodbc.connect("DRIVER=Teradata;DBCNAME=tddb;UID=uid;PWD=pwd;QUIETMODE=YES", autocommit=True, unicode_results=True)
    return conn

def close_connection(conn):
    # assumed counterpart to create_connection (the full version is in the linked code)
    conn.close()
You can check the complete code here: GitHub Link
Let me know if this answers your query.