subunit2sql configuration for Azure database - python

I'm trying to set up an Azure-powered database to be used by subunit2sql.
In the subunit2sql db setup steps, it is mentioned to create the schema as:
subunit2sql-db-manage --database-connection mysql://subunit:pass@127.0.0.1/subunit upgrade head
After replacing the mysql prefix with mssql and creating an ODBC DSN connection to the Azure DB, I'm able to see the database.
However, subunit2sql connection always fails with the below trace.
My understanding is that I might not be using a proper config for SQLAlchemy/Python, but the messages are confusing me: if I use isql, for example, with the same DSN info, I can connect to it fine.
Command I'm running is:
subunit2sql-db-manage --verbose --database-connection mssql://user:'pwd'@remote_host/DB upgrade head
I do have the odbc config under /etc/odbcinst.ini and /etc/odbc.ini but the above fails with:
oslo_db.exception.DBError: (pyodbc.Error) ('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnect)')
(Sorry, I cannot add a subunit2sql tag here as I don't have enough reputation points.)
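For reference, the IM002 message means unixODBC could not match the name in the connection string to any configured data source. If the DSN really is defined in /etc/odbc.ini, one way to hand it to SQLAlchemy explicitly is a pyodbc URL built around odbc_connect. A minimal sketch, assuming a hypothetical DSN name subunit_azure (stdlib only, to show how the string is assembled):

```python
from urllib.parse import quote_plus

# "subunit_azure" is a hypothetical DSN name; it must match a section
# header in /etc/odbc.ini exactly, or unixODBC raises IM002.
odbc = "DSN=subunit_azure;UID=subunit;PWD=pass"
url = "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)
print(url)
```

The resulting url is what would go after --database-connection.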

Sharing my testing. Hope it helps.
According to the requirements.txt of the subunit2sql project on GitHub, the tool requires the SQLAlchemy package, so I followed the SQLAlchemy documentation on Microsoft SQL Server support via the pymssql driver to set the database connection with the Azure SQL Database connection string; please see below.
subunit2sql-db-manage --database-connection mssql+pymssql://<username>@<hostname>:<password>@<hostname>.database.windows.net:1433/<database> upgrade head
Note: You can find the variables username, password, hostname & database in the SQL Database connection string on Azure portal.
To connect to the SQL database successfully, you need to do the steps below before running the above command.
For reference, my environment is Ubuntu 14.04 LTS.
Refer to the document Configure development environment for Python development to install the required packages for Python.
Configure the /etc/odbcinst.ini file, for example as below.
[FreeTDS]
Driver=/usr/local/lib/libtdsodbc.so
Setup=/usr/local/lib/libtdsodbc.so
Server={hostname}.database.windows.net
UsageCount=1
Port=1433
Database={database}
User={username}@{hostname}
Password={password}
TDS_Version=7.2
client_charset=utf-8
Configure the firewall settings to add the client IP; please refer to https://azure.microsoft.com/en-us/documentation/articles/sql-database-configure-firewall-settings/.
Although I got the error information below, the SQL Database connection itself succeeded.
oslo_db.exception.DBError: (pymssql.ProgrammingError) (102, "Incorrect syntax near ''.DB-Lib error message 102, severity 15:\nGeneral SQL Server error: Check messages from the SQL Server\nDB-Lib error message 156, severity 15:\nGeneral SQL Server error: Check messages from the SQL Server\n") [SQL: 'INSERT INTO test_metadata_new (id,key`, value, test_id, new_test_id) SELECT tm.id, tm.key, tm.value, tm.test_id, tn.new_id FROM test_metadata tm INNER JOIN tests_new tn ON tn.id = tm.test_id']
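One detail worth noting about the connection string above: the Azure username itself contains an @ (<username>@<hostname>), which collides with the @ that separates credentials from host in a SQLAlchemy URL. A small sketch (placeholder credentials, stdlib only) of percent-encoding the credentials so the URL stays unambiguous:

```python
from urllib.parse import quote_plus

# Placeholder credentials; real values come from the Azure portal.
user = "myuser@myserver"   # Azure SQL expects user@server
password = "p@ss:w/rd"     # reserved URL characters must be escaped

url = ("mssql+pymssql://" + quote_plus(user) + ":" + quote_plus(password)
       + "@myserver.database.windows.net:1433/mydb")
print(url)
```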


Python psycopg2, no PostgreSQL service running on machine... Where is the postgres driver? Where is pg_hba.conf? (FATAL: no pg_hba.conf entry)

I am using, in Linux, Python 3 and psycopg2 to connect to some postgres databases:
import psycopg2 as pg

connection = None
try:
    connection = pg.connect(
        user="username",
        password="...",
        host="host_ip",
        port="5432",
        database="db_name",
    )
    cursor = connection.cursor()
    # Print PostgreSQL connection properties
    print(connection.get_dsn_parameters(), "\n")
    # Print PostgreSQL version
    cursor.execute("SELECT version();")
    record = cursor.fetchone()
    print("You are connected to - ", record, "\n")
    cursor.close()
    connection.close()
    print("PostgreSQL connection is closed")
except (Exception, pg.Error) as error:
    print("Error while connecting to PostgreSQL", error)
For one of the DBs this works, but for the other I am getting:
Error while connecting to PostgreSQL FATAL: no pg_hba.conf entry for host "XXX.XXX.XXX.XXX", user "YYYY", database "ZZZZ", SSL off
I have checked the web and Stack Overflow, and there are a lot of similar questions,
e.g. Psycopg2 reporting pg_hba.conf error
However, I am not root on the machine where I used pip/anaconda, and there seems to be no
SQL service or anything similar running:
$ sudo systemctl status postgres*
$ sudo systemctl status postgres
Unit postgres.service could not be found.
$ sudo systemctl status postgresql
Unit postgresql.service could not be found.
$ sudo systemctl status post*
So none of the answers seem to be relevant, because they assume either a running postgres service or the existence of pg_hba.conf, neither of which holds on my system. Note, though, that a sample is included in my envs/py3/share/ (where py3 is the name of my environment):
$ locate pg_hba.conf
/home/nick/anaconda3/envs/py3/share/pg_hba.conf.sample
My question here aims, apart from finding a way to solve my immediate problem, to understand what psycopg2 is and how it ends up involving pg_hba.conf, which seems to belong to a postgresql service that does not exist on my system:
Is psycopg2 a driver, or does it use one? Why does it seem to include pg_hba.conf.sample, and what is one supposed to do with it? Where should pg_hba.conf be placed (???) for psycopg2 to read it?
Notes / Info based on comments:
The DB is not locally hosted. It is running on a different server.
I am able to access that DB using DBeaver and my local Ubuntu Python, but a container (with the same psycopg2 version) is not, so I speculate it is not a DB server issue.
It seems pg_hba.conf is a file that should only be on the server? (If so, that actually is part of the answer I am looking for...)
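Partly answering the last point: pg_hba.conf is read only by the PostgreSQL server process, never by the client; psycopg2 is a wrapper around the libpq client library, which is why only a .sample ships with the conda package. The "SSL off" at the end of the FATAL line suggests the server's pg_hba.conf has an SSL-only (hostssl) rule for that host/user/database, so requesting SSL from the client side is worth trying. A sketch using the question's placeholder parameters (build_dsn is a helper invented here for illustration):

```python
def build_dsn(host, port, dbname, user, sslmode="require"):
    """Build a libpq key=value DSN string that psycopg2.connect() accepts."""
    parts = {"host": host, "port": port, "dbname": dbname,
             "user": user, "sslmode": sslmode}
    return " ".join(f"{k}={v}" for k, v in parts.items())

dsn = build_dsn("host_ip", 5432, "db_name", "username")
print(dsn)
# then: connection = pg.connect(dsn, password="...")
```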

How to read & write Local MySQL Server 8 from Google Colab with Pyspark?

I have been trying but failing to write/read tables from MySQL Server 8.0.19 on localhost on Windows 10 with pyspark from Google Colab. There are also a lot of similar questions with some suggested answers, but none of the solutions seem to work here. Here is my code:
<...installations ...>
from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .appName("Word Count")\
    .config("spark.driver.extraClassPath", "/content/spark-2.4.5-bin-hadoop2.7/jars/mysql-connector-java-8.0.19.jar")\
    .getOrCreate()
And here is the connection string:
MyjdbcDF = spark.read.format("jdbc")\
    .option("url", "jdbc:mysql://127.0.0.1:3306/mydb?user=testuser&password=pwtest")\
    .option("dbtable", "collisions")\
    .option("driver", "com.mysql.cj.jdbc.Driver")\
    .load()
I have also tried .option("driver", "com.mysql.jdbc.Driver") but still keep getting this error:
Py4JJavaError: An error occurred while calling o154.load.
com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
...
...
...
Caused by: java.net.ConnectException: Connection refused (Connection refused)
From this, I guess that MySQL Server is not reachable.
I have telnetted to port 3306 and it confirmed that MySQL Server is accepting connections from the client machine. I have read that running netsh advfirewall firewall add rule name="MySQL Server" action=allow protocol=TCP dir=in localport=3306 will add a firewall rule permitting MySQL Server traffic in case it was being blocked, yet no change.
Can somebody help out?
Here's how I install and set up MySQL on Colab:
# install, set connection
!apt-get install mysql-server > /dev/null
!service mysql start
!mysql -e "ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'root'"
!pip -q install PyMySQL
%load_ext sql
%config SqlMagic.feedback=False
%config SqlMagic.autopandas=True
%sql mysql+pymysql://root:root@/
# query using %sql or %%sql
df = %sql SELECT Host, User, authentication_string FROM mysql.user
df
You are trying to connect to a MySQL database installed on your local machine (i.e. Windows 10) from Google Colab as a localhost instance.
This is not possible because Google Colab spins up an instance of its own to execute your code; if you want to access your local MySQL, you need to host it on a server so that it is reachable over the internet.
Otherwise, you can install MySQL on Colab and use that to run your code for testing:
!apt-get -y install mysql-server
then configure it on the instance for use.
After several days of trials I discovered a solution, which is why I am going to answer my own question. I was able to connect using a WAMP server (thanks to @Shubham Jain for suggesting it) as well as without one. This answer is without a WAMP server.
Downloaded ngrok from https://ngrok.com/,
Unzipped it,
Saved it on my local Windows,
Authenticated with:
./ngrok authtoken xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(pretty straightforward instructions are available on the website)
Still on my local Windows, I copied and ran ngrok tcp 3306 on the command line
C:\Users\userMe> ngrok tcp 3306
and it gave something like:
ngrok by @inconshreveable
Session Status online
Account userMe (Plan: Free)
Version 2.3.35
Region United States (us)
Web Interface http://localhost:4041
Forwarding tcp://0.tcp.ngrok.io:17992 -> localhost:3306
Connections ttl opn rt1 rt5 p50 p90
0 0 0.00 0.00 0.00 0.00
Where 0.tcp.ngrok.io:17992 is the only thing I am interested in, and 3306 is MySQL's port, the only port I want to expose to the internet to link with my Google Colab.
So, at the end of the day, my PySpark READ connection will look like:
jdbcDF = spark.read.format("jdbc")\
    .option("url", "jdbc:mysql://0.tcp.ngrok.io:17992/mydb?user=testUser&password=pestpw")\
    .option("dbtable", "pipeLineTable")\
    .option("driver", "com.mysql.cj.jdbc.Driver")\
    .load()
The WRITE connection will be:
jdbcDF.write.mode("overwrite")\
    .format("jdbc")\
    .option("url", "jdbc:mysql://0.tcp.ngrok.io:17992/mydb")\
    .option("dbtable", "fromGcTable")\
    .option("user", "testUser")\
    .option("password", "testpw")\
    .option("driver", "com.mysql.cj.jdbc.Driver")\
    .save()
In both connection strings, note the 0.tcp.ngrok.io:17992 that replaces localhost:3306
Solution retried in Colab in 2021; this version currently works:
import pymysql

link = pymysql.connect(
    host="xx.tcp.ngrok.io",
    user="userName",
    password="userPassword",
    db="dbName",
    charset="utf8",
    port=xxxxx
)

MySQL: ERROR 1045 (28000): Access denied for user (using password: YES) using cloud_sql_proxy

I need to execute an SQL file using the mysql client in a Python script (I cannot execute the queries using a Python MySQL module). The database is a MySQL instance on GCloud, and I'm connecting to that instance using cloud_sql_proxy.
I launched cloud_sql_proxy using TCP like so:
./cloud_sql_proxy -instances=<my_instance>=tcp:12367
And I'm trying to execute the sql script like so:
gunzip -c <filename>.sql.gz 2>&1 | mysql --host=127.0.0.1 --port=12367 --user=<user> --password=<password> <dbname>
This produces the following error:
ERROR 1045 (28000): Access denied for user '<myuser>'@'cloudsqlproxy~<some ip>' (using password: YES)
I obviously checked that user/pwd/instance are correct.
The same command pointing to a MySQL instance actually hosted on localhost works.
With the same cloud_sql_proxy process running, I am able to connect using the Sequel Pro client with the same auth info, and I am able to connect to the db using the following command:
/rnd/pos/components/mysql --host=127.0.0.1 --port=12367 --user=<myuser> -p <dbname>
>>> Enter password:
>>> Welcome to the MySQL monitor. Commands....
When connected, this is the output (masked) of SELECT USER(), CURRENT_USER();
+--------------------------------------------+-------------------------------------------------+
| USER() | CURRENT_USER() |
+--------------------------------------------+-------------------------------------------------+
| <my user>@cloudsqlproxy~<same ip as above> | <my user>@cloudsqlproxy~<part of the same ip>.% |
+--------------------------------------------+-------------------------------------------------+
I tried with -p[password] and --password[=password], same output.
Why is --password blocking? Is it a cloud_sql_proxy config issue, or a MySQL instance setting?
I googled and searched Stack Overflow but can't find anything relevant to my case.
I'd suspect mysql looks for a socket by default, while cloud_sql_proxy is explicitly using TCP.
I believe adding --protocol=TCP should solve the problem.
At least it helped me when I faced this obstacle under the same circumstances.
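Folding the answer's suggestion into the pipeline from the question, the command would become (a sketch; placeholders as in the question):

```shell
# --protocol=TCP stops mysql from preferring a local socket,
# forcing it through cloud_sql_proxy's forwarded TCP port.
gunzip -c <filename>.sql.gz 2>&1 | mysql --protocol=TCP --host=127.0.0.1 --port=12367 --user=<user> --password=<password> <dbname>
```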

How to connect Scrapy spider deployed on Scrapinghub to a remote SQL server with SQLAlchemy and pyodbc?

After trying to solve this problem on my own, I need some help or a nudge in the right direction.
I wrote and deployed a Scrapy spider on Scrapinghub. This spider collects some data and, after finishing, saves that data to a remote Microsoft SQL Server. I use SQLAlchemy as the ORM and pyodbc as the driver.
For connecting to the DB in the spider code I use:
params = quote_plus('DRIVER={ODBC Driver 13 for SQL Server};SERVER="server";DATABASE="db";UID="user";PWD="pass"')
engine = create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
On my local PC with Win10 all works well: the spider successfully connects to the remote DB and saves data.
But if I try to run this spider on Scrapinghub, I get an error:
DBAPIError: (pyodbc.Error) ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 13 for SQL Server' : file not found (0) (SQLDriverConnect)")
It seems like a problem with the DRIVER part. I tried changing DRIVER={ODBC Driver 13 for SQL Server} to DRIVER={SQL Server} or DRIVER={FreeTDS}, but I still get the same error: can't open lib 'lib_name' : file not found.
Does Scrapinghub support connections to a Microsoft SQL Server at all? What driver parameters do I need to use to connect successfully?
Thank you!
Can't open lib 'ODBC Driver 13 for SQL Server' : file not found
The above error is usually related to a misconfigured or missing odbcinst.ini file.
Run odbcinst -j and verify that odbcinst.ini exists and has the right driver path, e.g.
[ODBC Driver 13 for SQL Server]
Description=Microsoft ODBC Driver 13 for SQL Server
Driver=/usr/local/lib/libmsodbcsql.13.dylib
Here is an example command creating the user's config file (~/.odbcinst.ini):
printf "[ODBC Driver 13 for SQL Server]\nDescription=Microsoft ODBC Driver 13 for SQL Server\nDriver=/usr/local/lib/libmsodbcsql.13.dylib\n" >> ~/.odbcinst.ini
See: Installing the Microsoft ODBC Driver for SQL Server on Linux and macOS.
If you're using Anaconda, check out this issue: ODBC Driver 13 for SQL Server can't open lib.
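A related sanity check: the DRIVER={...} value must match a section header in the odbcinst.ini that unixODBC actually reads, character for character. A stdlib sketch for listing the names that file defines (the default path here is an assumption; odbcinst -j prints the real one):

```python
import configparser

def installed_odbc_drivers(path="/etc/odbcinst.ini"):
    """Return driver names defined in an odbcinst.ini file; each name is
    what can appear inside DRIVER={...} in a connection string."""
    cfg = configparser.ConfigParser()
    cfg.read(path)  # quietly yields no sections if the file is missing
    return list(cfg.sections())

print(installed_odbc_drivers())
```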

mongodb refusing connection in python

I am using Windows 8 and write code in IDLE. I tried to connect Python to MongoDB, but when trying to get the collection names it gives an error.
ServerSelectionTimeoutError: localhost:20101: [Errno 10061] No connection could be made because the target machine actively refused it
This is the code for which I am getting the error:
from pymongo import MongoClient
connection = MongoClient('localhost', 20101)
db = connection['Bhautik']
collection = db['Student']
db.collection_names(include_system_collections=True)
Judging by the output message, you probably didn't set your mongo bind_ip or didn't set the dbpath. Try this:
mongod --dbpath <database_path> --bind_ip 127.0.0.1 --port 20101
It would be more helpful to put alongside your code some information regarding the MongoDB configuration, like the server port, whether you are using authentication, which dbpath you are using, and so on.
So include in your question your mongodb.conf (if you are using one) or the command you are using to start the mongo server.
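Errno 10061 ("actively refused") is raised at the TCP level, before MongoDB is involved at all, so a quick way to separate "nothing is listening on that port" from a MongoDB configuration problem is a plain socket probe (host and port taken from the question; stdlib only):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("localhost", 20101))  # stays False until mongod listens there
```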
If you are starting to use MongoDB right after installation, create C:/data/db because it is the default database directory which MongoDB uses.
To change the database directory, do type below:
C:\Program Files\MongoDB\Server\3.x\bin> mongod --dbpath "c:\custom_folder"
You can try running mongod like this:
"C:\\Program Files\\MongoDB\\Server\\3.6\\bin\\mongod.exe" --dbpath E:\\data\\db --port 27017 --bind_ip 127.0.0.1
where E:\data\db should be your data directory path.
Then your code will look like:
client = MongoClient("127.0.0.1", 27017)
db = client['addsome']
datas = db.follow_up
and if you want to access it from a distant machine, make sure you open port "27017" in the firewall.
Sometimes it gives this error when you forget to run the local server (if it runs against a local server).
To run it you need to type in your terminal:
mongod
or, if MongoDB is not in PATH, you can find it via this path on your computer:
C:\Program Files\MongoDB\Server\4.0\bin\mongod.exe
In order to run MongoDB:
You should have MongoDB installed on your OS; download it from https://www.mongodb.com/download-center/community?tck=docs_server
Add the installation's bin folder to your system environment variables.
Open up the terminal and check that the 'mongod' and 'mongo' commands are working.
Then try to rerun your Python script.
