Hi everyone,
I'm trying to wrap my head around Microsoft SQL Server 2017 and Python scripts.
In general, I'm trying to take a table from a website (using bs4),
store it in a pandas DataFrame, and then simply put the results in a temp SQL table.
I entered the following code (I'm skipping parts of it because the Python script itself
does work in Python; keep in mind I'm calling the script from Microsoft SQL Server 2017):
CREATE PROC OTC
AS
BEGIN
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import bs4 as bs
import pandas as pd
import requests
....
r = requests.get(url, verify = False)
html = r.text
soup = bs.BeautifulSoup(html, "html.parser")
data_date = str(soup.find(id="ctl00_SPWebPartManager1_g_4be2cf24_5a47_472d_a6ab_4248c8eb10eb_ctl00_lDate").contents)
t_tab1 = soup.find(id="ctl00_SPWebPartManager1_g_4be2cf24_5a47_472d_a6ab_4248c8eb10eb_ctl00_NiaROGrid1_DataGrid1")
df = parse_html_table(1,t_tab1)
print(df)
OutputDataSet=df
'
END
I tried the Microsoft tutorials and simply couldn't understand how to
handle the inputs/outputs to get the result as a SQL table.
Furthermore, I get the error
"
import bs4 as bs
ImportError: No module named 'bs4'
"
I'm obviously missing a lot here.
What do I need to add to the SQL code?
Does SQL Server even support bs4, or only pandas?
If not, do I need to find another solution, such as writing to a CSV file?
Thanks for any help or advice you can offer.
To use pip to install a Python package on SQL Server 2017:
On the server, open a command prompt as administrator.
Then cd to {instance directory}\PYTHON_SERVICES\Scripts
(for example: C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\PYTHON_SERVICES\Scripts).
Then execute pip install {package name}.
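To confirm which interpreter sp_execute_external_script is actually running, and whether it can see a given package, a short check like the one below can be pasted into the @script body (or run with the python.exe in that PYTHON_SERVICES folder). It is a sketch using only the standard library; the helper name package_status is hypothetical.

```python
import importlib.util
import sys


def package_status(names):
    """Map each top-level module name to True if this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# sys.executable shows which python.exe is executing; for SQL Server 2017 it
# should sit under the instance's PYTHON_SERVICES folder.
print(sys.executable)
print(package_status(["bs4", "pandas", "requests"]))
```

If bs4 shows False here, the pip install went to a different Python installation than the one SQL Server uses.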
Once you have the necessary package(s) installed and the script executes successfully, setting the variable OutputDataSet to a pandas DataFrame will return the contents of that DataFrame as a result set from the stored procedure.
If you want to capture that result set in a table (perhaps a temporary table), you can use INSERT...EXEC (e.g. INSERT MyTable(Col1, Col2) EXEC sp_execute_external_script ...).
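If installing bs4 on the server is not an option, an HTML table can also be read with the standard library's html.parser. The class and function below are hypothetical helpers (not the elided parse_html_table from the question): they collect the cell text of every <tr> in the HTML, and they do not filter by element id or handle nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Inside @script, rows = table_rows(html) followed by OutputDataSet = pd.DataFrame(rows[1:], columns=rows[0]) would then hand the table back to SQL Server, assuming the first row holds the headers.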
Related
I'm working on a project where I'm trying to query an SSAS data source we have at work through Python. The connection currently lives in Excel files, but I'm trying to reverse-engineer the process in Python to automate part of the analysis I do day to day. I use the pyadomd library to connect to the data source; here's my code:
import clr
clr.AddReference(r"C:\Program Files (x86)\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft.NET\ADOMD.NET\130\Microsoft.AnalysisServices.AdomdClient.dll")
clr.AddReference('Microsoft.AnalysisServices.AdomdClient')
from Microsoft.AnalysisServices.AdomdClient import AdomdConnection, AdomdDataAdapter
from sys import path
# raw string: otherwise "\r" in "\root" becomes a carriage return and "\130" an octal escape
path.append(r'C:\Program Files (x86)\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft.NET\ADOMD.NET\130\Microsoft.AnalysisServices.AdomdClient.dll')
import pyadomd
from pyadomd import Pyadomd
from pyadomd._type_code import adomd_type_map, convert
constr = "connection string"
query = "query"
with Pyadomd(constr) as conn:
    with conn.cursor().execute(query) as cur:
        print(cur.fetchall())
This works in part: I seem to be able to connect to the SSAS data source. If I run conn = Pyadomd(constr), it no longer returns an error as it did before. The issue is that when I try to execute the query with the cursor, it returns an error saying:
File "C:\Users\User\Anaconda3\lib\site-packages\pyadomd\pyadomd.py", line 71, in execute
adomd_type_map[self._reader.GetFieldType(i).ToString()].type_name
KeyError: 'System.Object'
From a bit of research, I found that a KeyError means the code is trying to access a dictionary key that isn't present. By digging through my variables and the code, I realized that the line:
from pyadomd._type_code import adomd_type_map
creates this dictionary, whose keys are System.Boolean, System.DateTime, System.Decimal, System.Double, System.Int64 and System.String. I figured that "KeyError: 'System.Object'" refers to that dictionary. My question is: how can I add this System.Object key to that dictionary? From which library, module, or IronPython clr reference can I get it?
What I tried:
clr.AddReference("System.Object")
Gave me error message saying "Unable to find assembly 'System.Object'. at Python.Runtime.CLRModule.AddReference(String name)"
I also tried:
from System import Object #no error but didn't work
from System import System.Object #error saying invalid syntax
I think it has to do with some clr.AddReference IronPython thing that I am missing, but I've been looking everywhere and can't find it.
Thanks!
Glad that the newer version solved the problem.
A few comments on the code snippet above. It can be made a bit more concise 😊
Pyadomd will import the necessary classes from the AdomdClient, which means that the following lines can be left out.
clr.AddReference(r"C:\Program Files (x86)\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft.NET\ADOMD.NET\130\Microsoft.AnalysisServices.AdomdClient.dll")
clr.AddReference('Microsoft.AnalysisServices.AdomdClient')
from Microsoft.AnalysisServices.AdomdClient import AdomdConnection , AdomdDataAdapter
Your code will then look like this:
import pandas as pd
from sys import path
# folder that contains Microsoft.AnalysisServices.AdomdClient.dll
path.append(r'C:\Program Files (x86)\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft.NET\ADOMD.NET\130')
from pyadomd import Pyadomd
constr = "constring"
query = "query"
with Pyadomd(constr) as con:
    with con.cursor().execute(query) as cur:
        DF = pd.DataFrame(cur.fetchone(), columns=[i.name for i in cur.description])
The most important thing is to add the AdomdClient.dll to your path before importing the pyadomd package.
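A small sketch of that step, assuming the DLL keeps its usual file name: the hypothetical helper below checks that the DLL is really in the folder and only then appends the folder to sys.path, so a wrong path fails immediately with a clear message rather than later inside the import.

```python
import os
import sys


def add_adomd_dir_to_path(dll_dir, dll_name="Microsoft.AnalysisServices.AdomdClient.dll"):
    """Append the folder holding the ADOMD.NET client DLL to sys.path.

    Call this before `import pyadomd`. Raises FileNotFoundError if the DLL
    is not in `dll_dir`.
    """
    if not os.path.isfile(os.path.join(dll_dir, dll_name)):
        raise FileNotFoundError(f"{dll_name} not found in {dll_dir}")
    if dll_dir not in sys.path:
        sys.path.append(dll_dir)


# Use a raw string so sequences such as "\r" in "\root" are not treated as escapes:
# add_adomd_dir_to_path(r"C:\Program Files (x86)\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft.NET\ADOMD.NET\130")
# from pyadomd import Pyadomd
```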
Furthermore, the package is mainly meant to be used with CPython version 3.6 and 3.7.
Well, big problems require big solutions.
After endlessly searching the web, I went to https://pypi.org/project/pyadomd/ and directly contacted the author of the package (SCOUT). I emailed him the same question, and apparently there was a bug in the code, which he fixed overnight, releasing a new version of the package (from 0.0.5 to 0.0.6). In his words:
Hi,
Thanks for writing me 😊
I investigated the error, and you are correct, the type map doesn’t support converting System.Object.
That is a bug!
I have uploaded a new version of the Pyadomd package to Pypi which should fix the bug – Pyadomd will now just pass a System.Object type through as a .net object. Because Pyadomd doesn’t know the specifics of the System.Object type at runtime, you will then be responsible yourself to convert to a python type if necessary.
Please install the new version using pip.
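The fix the author describes amounts to falling back to a pass-through when a type name is missing from the map, instead of indexing the dictionary directly. The sketch below is a simplified, hypothetical model of that idea, not pyadomd's actual source; with the real package, System.Object values arrive as .NET objects that you convert yourself.

```python
# Hypothetical, simplified type map keyed by .NET type names (not pyadomd's code).
TYPE_MAP = {
    "System.Boolean": bool,
    "System.Double": float,
    "System.Int64": int,
    "System.String": str,
}


def convert_value(type_name, value):
    """Convert with the mapped function, or return the value unchanged."""
    converter = TYPE_MAP.get(type_name)  # .get returns None instead of raising KeyError
    return value if converter is None else converter(value)
```

A direct lookup such as TYPE_MAP["System.Object"] raises the same kind of KeyError as in the question, while convert_value("System.Object", obj) simply hands obj back.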
So after running a quick pip install pyadomd --upgrade, I restarted Spyder and retried the code, and it now works: I can query my SSAS cube! Hopefully this helps others.
Snippet of the code:
import pandas as pd
import clr
clr.AddReference(r"C:\Program Files (x86)\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft.NET\ADOMD.NET\130\Microsoft.AnalysisServices.AdomdClient.dll")
clr.AddReference('Microsoft.AnalysisServices.AdomdClient')
from Microsoft.AnalysisServices.AdomdClient import AdomdConnection, AdomdDataAdapter
from sys import path
path.append(r'C:\Program Files (x86)\Microsoft Office\root\vfs\ProgramFilesX86\Microsoft.NET\ADOMD.NET\130\Microsoft.AnalysisServices.AdomdClient.dll')
import pyadomd
from pyadomd import Pyadomd
constr= "constring"
query = "query"
and then as indicated on his package website:
with Pyadomd(constr) as con:
    with con.cursor().execute(query) as cur:
        DF = pd.DataFrame(cur.fetchone(), columns=[i.name for i in cur.description])
And bam! A 10,795-row by 39-column DataFrame. I haven't timed it precisely yet, but it looks good so far considering the amount of data.
I want to download data from SQL Server via Python, but instead of downloading the whole dataset I only need specific variables.
I am restricted to using only read_sql with a pyodbc connection.
My code is the following:
# call from SQL
import pandas as pd
import pyodbc
conn = pyodbc.connect(r"""DRIVER={SQL Server};
Server=BXTS131133.eu.rabonet.com\LWID_LAB_03;
Database=CORP_Modelling;
Trusted_connection=yes;""")
SQL1 = 'SELECT * FROM [CORP_Modelling].[LDM_Freeze_1].[JointObligorMonthly]'
However, suppose that I want to download only a few variables (columns) from SQL. For example, from the table specified in SQL1 I only want to download:
var_to_download = ['MeasurementPeriodID', 'JointObligorID' ]
I cannot understand how I can modify the above code in order to download only these variables.
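One way to do this is to list the wanted columns in the SELECT instead of using *, and pass that string to read_sql as before. The helper below is a hypothetical sketch: it wraps each name in T-SQL square brackets (doubling any closing bracket) and assumes the column list comes from your own code, since column names cannot be passed as bind parameters.

```python
def build_select(table, columns):
    """Return a SELECT that lists only `columns`, each quoted with T-SQL brackets."""
    # Doubling "]" keeps a column name from breaking out of its bracket quoting
    quoted = ", ".join("[" + c.replace("]", "]]") + "]" for c in columns)
    return f"SELECT {quoted} FROM {table}"


var_to_download = ['MeasurementPeriodID', 'JointObligorID']
SQL1 = build_select('[CORP_Modelling].[LDM_Freeze_1].[JointObligorMonthly]', var_to_download)
# With the connection from the question:
# df = pd.read_sql(SQL1, conn)
```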
Using the Benthic Golden6 "ImpExp6" tool, I can successfully import 122K+ rows of data from a CSV file.
When I try to automate this with a .py script, as I have with other smaller data sets, I get the "exceeded table space" error. I dropped everything owned by the user to maximize available space for test purposes and still get the error; however, I can use the import tool to import the 122K rows with no problems.
If I can import the file manually with no issues, shouldn't I also be able to do so via a Python script? Below is the script I am using.
Note: if I use lines = [] and then for line in reader: lines.append(line), it appends 5,556 rows of data, versus nothing with the script below. I'm using Python 2.7.
import cx_Oracle
import csv
connection = cx_Oracle.connect('myinfo')
cursor = connection.cursor()
L=[]
reader = csv.reader(open("myfile.csv","r"))
for row in reader:
L.append(row)
cursor.execute("ALTER SESSION SET NLS_DATE_FORMAT = 'MM/DD/YYYY'")
cursor.executemany("INSERT INTO BI_VANTAGE_TEST VALUES(:25,:24,:23,:22,:21,:20,:19,:18,:17,:16,:15,:14,:13,:12,:11,:10,:9,:8,:7,:6,:5,:4,:3,:2,:1)",L)
connection.commit()  # parentheses required: without them commit is never called
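A sketch of the same load with two changes: commit() is actually called, and rows are sent with executemany in fixed-size batches instead of one list holding the whole file. It is written for Python 3 and uses sqlite3 as a stand-in so it runs anywhere; with cx_Oracle you would pass the cx_Oracle connection and an INSERT using :1, :2, ... placeholders instead of ?. The helper name load_rows and the batch size are assumptions, and batching by itself does not address a table space error, which is raised by the database.

```python
import csv
import io
import sqlite3


def load_rows(conn, insert_sql, rows, batch_size=1000):
    """Insert rows with executemany in fixed-size batches, then commit once."""
    cur = conn.cursor()
    batch = []
    total = 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            cur.executemany(insert_sql, batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final, partial batch
        cur.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()  # note the parentheses; "conn.commit" alone does nothing
    return total


# Demo against an in-memory SQLite database with three CSV rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (a TEXT, b TEXT)")
reader = csv.reader(io.StringIO("x,1\ny,2\nz,3\n"))
inserted = load_rows(conn, "INSERT INTO demo VALUES (?, ?)", reader, batch_size=2)
```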
I was able to automate this import using an alternate method (note that the keystroke commands are specific to the steps I needed to complete within the tool I was using).
import time
from pywinauto.application import Application
import pyautogui
app = Application().start(r"C:\myprogram.exe")
pyautogui.typewrite(['enter', 'right', 'tab'])
pyautogui.typewrite('myfile.txt')
pyautogui.typewrite(['tab'])
pyautogui.typewrite('myoracletbl')
pyautogui.typewrite(['tab', 'tab', 'tab'])
pyautogui.typewrite(['enter'])
pyautogui.typewrite(['enter'])
WAIT_SECONDS = 60  # placeholder: set to however long the import takes
time.sleep(WAIT_SECONDS)
app.kill()  # current pywinauto name; older releases used Kill_()
I'm trying to pull some data from a DashDB database and analyze it in a Jupyter Notebook, all within Watson Studio. Ideally we would create a pandas DataFrame for analysis.
Here's how I was able to do it:
# First import the relevant libraries
import jaydebeapi
from ibmdbpy import IdaDataBase
from ibmdbpy import IdaDataFrame
Create a dictionary with the credentials:
credentials_dashdb = {
'host':'bluemix05.bluforcloud.com',
'port':'50000',
'user':'dash123456',
'password':"""mypassword""",
'database':'BLUDB'
}
Build the connection:
dsn = "DASHDB;Database=" + credentials_dashdb["database"] + ";Hostname=" + credentials_dashdb["host"] + ";Port=" + credentials_dashdb["port"] + ";PROTOCOL=TCPIP;UID=" + credentials_dashdb["user"] + ";PWD=" + credentials_dashdb["password"]
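A small hypothetical helper that assembles the same DSN from the credentials dictionary, taking every value (including the port) from the dictionary so nothing is hardcoded twice. It only builds the string; passing it to IdaDataBase works as in the step above.

```python
def build_dashdb_dsn(creds):
    """Assemble the DASHDB DSN string from a credentials dictionary."""
    parts = [
        "DASHDB",
        "Database=" + creds["database"],
        "Hostname=" + creds["host"],
        "Port=" + str(creds["port"]),
        "PROTOCOL=TCPIP",
        "UID=" + creds["user"],
        "PWD=" + creds["password"],
    ]
    return ";".join(parts)


# dsn = build_dashdb_dsn(credentials_dashdb)
# idadb = IdaDataBase(dsn)
```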
idadb=IdaDataBase(dsn)
Import the data:
# See all the table names in the database
df=idadb.show_tables(show_all = True)
# Show the table names
df.head(100)
# create an IdaDataFrame for the table (a pandas-like handle on data that stays in the database), show the first few rows
pandas_df = IdaDataFrame(idadb, 'MY_TABLE')
pandas_df.head()
Hope that helps someone. Big credit to Sven Hafeneger and this notebook for this solution!
Matt, you can drop the "import jaydebeapi" because you are using the dashDB ODBC driver under the hood with the dsn that you construct (which is also the recommended way to use ibmdbpy in DSX).
The link to Sven's notebook points to an R notebook. Not sure if that is what you intended. In any case, here is my official DSX sample notebook for ibmdbpy that highlights the solution that you described above.
I have used Python to parse a txt file for specific information (dates, $ amounts, lbs, etc) and now I want to export that data to an Oracle table that I made in SQL Developer.
I have successfully connected Python to Oracle with the cx_Oracle module, but I am struggling to export or even print any data to my database from Python.
I am not proficient in SQL; I know simple queries and that's about it. I have explored the Oracle docs and haven't found straightforward export commands. When exporting data to an Oracle table via Python, is it Python code I am going to be using or SQL code? Is it the same as importing a CSV file, for example?
I would like to understand how to write to an Oracle table from Python; I need to parse and export a very large amount of data so this won't be a one time export/import. I would also ideally like to have a way to preview my import to ensure it aligns correctly with my already created Oracle table, or if a simple undo action exists that would suffice.
If my problem is unclear I am more than happy to clarify it. Thanks for all help.
My code so far:
import cx_Oracle
dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
print (con.version)
#imp 'Book1.csv' [this didn't work]
cursor = con.cursor()
print (cursor)
con.close()
The question "Import a CSV file into Oracle using CX_Oracle & Python 2.7" shows the overall plan.
If you have already parsed your data into a CSV file, you can do it like this:
import cx_Oracle
import csv
dsnStr = cx_Oracle.makedsn("sole.wh.whoi.edu", "1526", "sole")
con = cx_Oracle.connect(user="myusername", password="mypassword", dsn=dsnStr)
print(con.version)
cursor = con.cursor()
print(cursor)
text_sql = '''
INSERT INTO tablename (firstfield, secondfield) VALUES(:1,:2)
'''
my_file = r'C:\CSVData\Book1.csv'
# Python 3: open in text mode with newline='' as the csv module expects
with open(my_file, newline='') as f:
    for row in csv.reader(f):
        print(row)
        cursor.execute(text_sql, row)
con.commit()  # without this, the inserts are rolled back when the connection closes
print('Imported')
con.close()
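On the "preview" and "undo" part of the question: the Python code sends SQL INSERT statements with bound values, and nothing is permanent until commit() is called, so you can insert, query the result, and then commit or roll back. The sketch below is hypothetical (insert_with_preview and the confirm callback are not part of any library) and uses sqlite3 as a stand-in; cx_Oracle connections also provide commit() and rollback(), and there you would use :1, :2 placeholders. Keep in mind that in Oracle, DDL statements such as CREATE TABLE commit implicitly, so rollback only undoes data changes made since the last commit.

```python
import sqlite3


def insert_with_preview(conn, insert_sql, rows, preview_sql, confirm):
    """Insert rows in one transaction, preview them, then commit or roll back.

    `confirm` receives the preview rows and returns True to keep the insert.
    """
    cur = conn.cursor()
    cur.executemany(insert_sql, rows)
    cur.execute(preview_sql)      # this session already sees its uncommitted rows
    preview = cur.fetchall()
    if confirm(preview):
        conn.commit()
        return True
    conn.rollback()               # the "undo": discards everything since the last commit
    return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (firstfield TEXT, secondfield TEXT)")
rows = [("2019-01-01", "12.50"), ("2019-01-02", "7.25")]
kept = insert_with_preview(
    conn,
    "INSERT INTO tablename (firstfield, secondfield) VALUES (?, ?)",
    rows,
    "SELECT firstfield, secondfield FROM tablename",
    confirm=lambda preview: len(preview) == len(rows),
)
```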