SQLAlchemy create_engine not working in Google Cloud Function - python

I am trying to run a Google Cloud Function that collects data from a website and then inserts it into a Cloud SQL (MySQL) database, but I am having problems with SQLAlchemy in the Cloud Function that don't appear on my local machine. Any suggestions?
When I run the function locally against Python 3.7 (on a Mac, not using virtualenv), using the Cloud SQL Proxy and SQLAlchemy, I successfully connect to the database.
When running the Cloud Function, I use a connection string in this format: mysql+pymysql://<username>:<password>@/<dbname>?unix_socket=/cloudsql/<PROJECT-NAME>:<INSTANCE-REGION>:<INSTANCE-NAME>.
The Cloud Function keeps throwing the following exception from SQLAlchemy.create_engine. It does not appear to be related to connecting, but to instantiation.
Everything is in the same project.
I have also tried using the public IP and a connection string in the format mysql+pymysql://<username>:<password>@<public ip address>:3306/<dbname>, which made no difference.
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker_v2.py", line 449, in run_background_function
_function_handler.invoke_user_function(event_object)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker_v2.py", line 268, in invoke_user_function
return call_user_function(request_or_event)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker_v2.py", line 265, in call_user_function
event_context.Context(**request_or_event.context))
File "/user_code/main.py", line 14, in retrieve_and_log
engine = create_engine(connection_string,echo=True)
File "/env/local/lib/python3.7/site-packages/sqlalchemy/engine/__init__.py", line 500, in create_engine
return strategy.create(*args, **kwargs)
File "/env/local/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 56, in create
plugins = u._instantiate_plugins(kwargs)
AttributeError: 'Context' object has no attribute '_instantiate_plugins'
Here is a snippet of my code:
import requests
from bs4 import BeautifulSoup
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

def retrieve_and_log(store_string, connection_string='mysql+pymysql://<username>:<password>@/<dbname>?unix_socket=/cloudsql/<PROJECT-NAME>:<INSTANCE-REGION>:<INSTANCE-NAME>'):
    engine = create_engine(connection_string, echo=True)
    conn = engine.connect()
    # ....

If retrieve_and_log is the function you are trying to deploy as a background Cloud Function, it needs a signature like:
def retrieve_and_log(data, context):
    ...
It can't take arbitrary parameters.
See https://cloud.google.com/functions/docs/writing/background for more details.
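As a minimal sketch of that shape (assuming the connection string is supplied through an environment variable set at deploy time, which is not part of the original post):

import os
from sqlalchemy import create_engine

def retrieve_and_log(data, context):
    # CONNECTION_STRING is a hypothetical environment variable set at deploy time.
    connection_string = os.environ['CONNECTION_STRING']
    engine = create_engine(connection_string, echo=True)
    conn = engine.connect()
    # ... scrape the site and insert rows here ...
    conn.close()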

Related

InterfaceError using PyMySQL (database connection closes)

I use the PyMySQL library and Flask in my program. My view function accesses the database every time it is called. After some calls it breaks and raises InterfaceError(0, ''). All subsequent requests also raise InterfaceError (for any DB query, specifically).
Traceback (most recent call last):
(several files of mine and Flask)
File "/home/maxim/.local/lib/python3.7/site-packages/pymysql/cursors.py", line 170, in execute
result = self._query(query)
File "/home/maxim/.local/lib/python3.7/site-packages/pymysql/cursors.py", line 328, in _query
conn.query(q)
File "/home/maxim/.local/lib/python3.7/site-packages/pymysql/connections.py", line 516, in query
self._execute_command(COMMAND.COM_QUERY, sql)
File "/home/maxim/.local/lib/python3.7/site-packages/pymysql/connections.py", line 750, in _execute_command
raise err.InterfaceError("(0, '')")
pymysql.err.InterfaceError: (0, '')
I read the PyMySQL library code and saw that this error occurs if the connection's _sock variable is None (I think that means the connection is closed). But why does this happen?
I use one connection object for all view functions (i.e. it is defined outside the functions). Is that right, or must I make a new connection for every request? Or do I need to do something else to get rid of this error?
My code: https://pastebin.com/sy3xKtgB
Full traceback: https://pastebin.com/iTU75FUi
I solved my problem by creating a new connection to the database for every request.
import pymysql

def get_db():
    return pymysql.connect(
        'ip',
        'user',
        'password',
        'db_name',
        cursorclass=pymysql.cursors.DictCursor
    )
I call this function on every request:
from flask import Flask, request
from my_utils import get_db

app = Flask(__name__)

@app.route('/get', methods=['POST'])
def get():
    conn = get_db()
    with conn.cursor() as cur:
        pass
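A related Flask idiom (a sketch, not from the original post; the connection parameters are placeholders) is to keep the per-request connection on flask.g and close it when the request ends, so connections are never leaked:

import pymysql
from flask import Flask, g

app = Flask(__name__)

def get_db():
    # Open one connection per request and cache it on flask.g.
    if 'db' not in g:
        g.db = pymysql.connect(
            host='ip', user='user', password='password', database='db_name',
            cursorclass=pymysql.cursors.DictCursor,
        )
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # Close the connection when the request context is torn down.
    db = g.pop('db', None)
    if db is not None:
        db.close()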

How to use boto3 on EC2 instance without local configuration?

My goal is to be able to run a Python program using boto3 to access DynamoDB without any local configuration. I've been following this AWS document (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html) and it seems to be feasible using the 'IAM role' option (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#iam-role). This means I don't have anything configured locally.
However, after I attached a role with DynamoDB access permission to the EC2 instance the Python program runs on, and ran boto3.resource('dynamodb'), I kept getting the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/.local/lib/python3.6/site-packages/boto3/__init__.py", line 100, in resource
return _get_default_session().resource(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/boto3/session.py", line 389, in resource
aws_session_token=aws_session_token, config=config)
File "/home/ubuntu/.local/lib/python3.6/site-packages/boto3/session.py", line 263, in client
aws_session_token=aws_session_token, config=config)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/session.py", line 839, in create_client
client_config=config, api_version=api_version)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/client.py", line 86, in create_client
verify, credentials, scoped_config, client_config, endpoint_bridge)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/client.py", line 328, in _get_client_args
verify, credentials, scoped_config, client_config, endpoint_bridge)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/args.py", line 47, in get_client_args
endpoint_url, is_secure, scoped_config)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/args.py", line 117, in compute_client_args
service_name, region_name, endpoint_url, is_secure)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/client.py", line 402, in resolve
service_name, region_name)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/regions.py", line 122, in construct_endpoint
partition, service_name, region_name)
File "/home/ubuntu/.local/lib/python3.6/site-packages/botocore/regions.py", line 135, in _endpoint_for_partition
raise NoRegionError()
botocore.exceptions.NoRegionError: You must specify a region.
I've searched the internet, and most of the solutions point to having local configuration (e.g. ~/.aws/config, a boto3 config file, etc.).
Also, I have verified that from the EC2 instance I am able to get the region from the instance metadata:
$ curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document
{
...
"region" : "us-east-2",
...
}
My workaround right now is to provide the environment variable AWS_DEFAULT_REGION, passed via the Docker command line.
Here is the simple code I have to replicate the issue:
>>> import boto3
>>> dynamodb = boto3.resource('dynamodb')
I expected boto3 to somehow pick up the region that is already available in the EC2 instance.
There are two types of configuration data in boto3: credentials and non-credentials (including region). How boto3 reads them differs.
See:
Configuring Credentials
https://github.com/boto/boto3/issues/375
Specifically, boto3 retrieves credentials from the instance metadata service but not other configuration items (such as region).
So, you need to indicate which region you want. You can retrieve the current region from metadata and use it, if appropriate. Or use the environment variable AWS_DEFAULT_REGION.
You can pass region as a parameter to any boto3 resource.
dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
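For example, a minimal sketch (assuming the requests library is available) that reads the region from the same instance-identity document shown above and passes it to boto3:

import boto3
import requests

# Query the EC2 instance metadata service for the identity document,
# which contains the instance's current region.
doc = requests.get(
    'http://169.254.169.254/latest/dynamic/instance-identity/document',
    timeout=2,
).json()

dynamodb = boto3.resource('dynamodb', region_name=doc['region'])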

Google Cloud App Engine Flexible Python 2.7 Env Errors starting new threads

I know just enough devops to be dangerous. I've successfully deployed a VERY simple Python Flask app to App Engine that basically publishes received POST data as a message to Pub/Sub. It is almost identical to Google's sample code for doing so. The only difference is that it uses a service account, pushed with the app repository, to access Pub/Sub and circumvent this issue.
Works very well so far, but I've started seeing a very small number of errors around starting a new thread in threading.py:
1)
Traceback (most recent call last):
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 33, in grpc._cython.cygrpc._spawn_callback_async
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 24, in grpc._cython.cygrpc._spawn_callback_in_thread
File "/usr/lib/python2.7/threading.py", line 736, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
2)
Traceback (most recent call last):
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 33, in grpc._cython.cygrpc._spawn_callback_async
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 24, in grpc._cython.cygrpc._spawn_callback_in_thread
3)
Traceback (most recent call last):
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 33, in grpc._cython.cygrpc._spawn_callback_async
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 33, in grpc._cython.cygrpc._spawn_callback_async
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 24, in grpc._cython.cygrpc._spawn_callback_in_thread
File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 24, in grpc._cython.cygrpc._spawn_callback_in_thread
File "/usr/lib/python2.7/threading.py", line 736, in start
File "/usr/lib/python2.7/threading.py", line 736, in start
I have 2 questions, in order of importance:
This is an app that basically needs 100% uptime in order not to lose data (I'm not confident the clients retry if there is an error on my server side). Are these errors internal to how App Engine manages my app's resources, rather than errors handling actual requests? How can I determine whether I ever responded with an HTTP error or failed to handle a request? I don't see any errors in my nginx logs; is that where I need to look to see if anything failed?
Is there a way I can fix this error?
https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/pubsub/google/cloud/pubsub_v1/publisher/client.py#L143
It looks like publisher.publish(topic_path, data=data) is an async operation, returning a concurrent.futures.Future object.
Have you tried calling the Future's result()? https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future.result
This will block until the future succeeds, fails, or times out.
You could then forward that result as your HTTP response.
Hopefully, the result object will give you more information about the error.
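A rough sketch of that idea (the project and topic names are hypothetical):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'my-topic')  # hypothetical names

future = publisher.publish(topic_path, data=b'payload')
try:
    message_id = future.result(timeout=30)  # blocks until published or timed out
except Exception:
    # Surface the failure in the HTTP response instead of dropping it silently.
    raise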
Ended up changing the methodology a bit. Instead of posting a Pub/Sub message and then having Dataflow ingest through GCS to BigQuery, I decided to stream directly into BigQuery using the BigQuery Python client. I updated the dependencies for the Python Flask app to:
Flask==1.0.2
google-cloud-pubsub==0.39.1
gunicorn==19.9.0
google-cloud-bigquery==1.11.2
and I am no longer seeing any of those exceptions. It's worth noting that I'm still using a service account .json credentials file in the same directory as the app source, and I'm creating the BigQuery client with:
bq_client = bigquery.Client.from_service_account_json(BQ_SVC_ACCT_FILE)
For anyone else with similar issues, I'd recommend updating your dependencies (especially any Google Cloud client libraries) and creating the client you need from a local service account credentials file. I attempted to use the inherited Compute Engine environment credentials (basically the default project Compute Engine service account), but that was less stable than pushing up an actual credential file and using it locally. However, assess your own security needs before doing the same.
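As a sketch of that streaming approach (the table ID, row contents, and credentials path are placeholders, and insert_rows_json assumes a reasonably recent google-cloud-bigquery):

from google.cloud import bigquery

BQ_SVC_ACCT_FILE = 'service-account.json'  # hypothetical path
bq_client = bigquery.Client.from_service_account_json(BQ_SVC_ACCT_FILE)

# Stream one row into a table; returns a list of per-row errors (empty on success).
errors = bq_client.insert_rows_json(
    'my-project.my_dataset.my_table',
    [{'field': 'value'}],
)
if errors:
    raise RuntimeError('BigQuery insert failed: %s' % errors)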

Accessing HDInsight Hive with python

We have an HDInsight cluster with some tables in Hive. I want to query these tables from Python 3.6 on a client machine (outside Azure).
I have tried using PyHive, pyhs2 and also impyla, but I am running into various problems with all of them.
Does anybody have a working example of accessing HDInsight Hive from Python?
I have very little experience with this and don't know how to configure PyHive (which seems the most promising), especially regarding authorization.
With impyla:
from impala.dbapi import connect
conn = connect(host='redacted.azurehdinsight.net',port=443)
cursor = conn.cursor()
cursor.execute('SELECT * FROM cs_test LIMIT 100')
print(cursor.description) # prints the result set's schema
results = cursor.fetchall()
This gives:
Traceback (most recent call last):
File "C:/git/ml-notebooks/impyla.py", line 3, in <module>
cursor = conn.cursor()
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 125, in cursor
session = self.service.open_session(user, configuration)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 995, in open_session
resp = self._rpc('OpenSession', req)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 923, in _rpc
response = self._execute(func_name, request)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 954, in _execute
.format(self.retries))
impala.error.HiveServer2Error: Failed after retrying 3 times
With PyHive:
from pyhive import hive
conn = hive.connect(host="redacted.azurehdinsight.net", port=443, auth="NOSASL")
# also tried other auth types, but as I said, I have no clue here
This gives:
Traceback (most recent call last):
File "C:/git/ml-notebooks/PythonToHive.py", line 3, in <module>
conn = hive.connect(host="redacted.azurehdinsight.net",port=443,auth="NOSASL")
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 64, in connect
return Connection(*args, **kwargs)
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 164, in __init__
response = self._client.OpenSession(open_session_req)
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 187, in OpenSession
return self.recv_OpenSession()
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 134, in readMessageBegin
sz = self.readI32()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 217, in readI32
buff = self.trans.readAll(4)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TSocket.py", line 117, in read
buff = self.handle.recv(sz)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
According to the official document Understand and resolve errors received from WebHCat on HDInsight, it says the following:
What is WebHCat
WebHCat is a REST API for HCatalog, a table and storage management layer for Hadoop. WebHCat is enabled by default on HDInsight clusters, and is used by various tools to submit jobs, get job status, etc. without logging in to the cluster.
So a workaround is to use WebHCat to run the Hive QL from Python; please refer to the Hive documentation to learn how to use it. As a reference, there is a similar MSDN thread discussing it.
Hope it helps.
Technically you should be able to use the Thrift connector and PyHive, but I haven't had any success with this. However, I have successfully used the JDBC connector with JayDeBeApi.
First you need to download the JDBC driver and its dependencies:
http://central.maven.org/maven2/org/apache/hive/hive-jdbc/1.2.1/hive-jdbc-1.2.1-standalone.jar
http://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.4/httpclient-4.4.jar
http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.4.4/httpcore-4.4.4.jar
I put mine in /jdbc and used JayDeBeApi with the following connection string.
Edit: You need to add /jdbc/* to your CLASSPATH environment variable.
import jaydebeapi

conn = jaydebeapi.connect("org.apache.hive.jdbc.HiveDriver",
                          "jdbc:hive2://my_ip_or_url:443/;ssl=true;transportMode=http;httpPath=/hive2",
                          [username, password],
                          "/jdbc/hive-jdbc-1.2.1.jar")

sqlalchemy.exc.OperationalError: (OperationalError) unable to open database file None None

I am running a program written by someone else whom it is inconvenient to ask for help. The program is a website; the server end is written in Python using Flask (http://flask.pocoo.org/). The program runs successfully on the production server. What I need to do is modify something in it. Since I am not allowed to test on the production server, I tested it locally using Flask's development server. However, I could not even run the original program. Below is the Python output:
(venv)kevin#ubuntu:~/python/public_html$ python index.wsgi
Traceback (most recent call last):
File "index.wsgi", line 6, in
from app import app as application
File "/home/kevin/python/public_html/app.py", line 27, in <module>
app = create_app()
File "/home/kevin/python/public_html/app.py", line 12, in create_app
database.init_db()
File "/home/kevin/python/public_html/database.py", line 24, in init_db
Base.metadata.create_all(engine)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/schema.py", line 2793, in create_all
tables=tables)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1478, in _run_visitor
with self._optional_conn_ctx_manager(connection) as conn:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1471, in _optional_conn_ctx_manager
with self.contextual_connect() as conn:
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1661, in contextual_connect
self.pool.connect(),
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 272, in connect
return _ConnectionFairy(self).checkout()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 425, in __init__
rec = self._connection_record = pool._do_get()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 857, in _do_get
return self._create_connection()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 225, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 318, in __init__
self.connection = self.__connect()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 368, in __connect
connection = self.__pool._creator()
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 80, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 283, in connect
return self.dbapi.connect(*cargs, **cparams)
sqlalchemy.exc.OperationalError: (OperationalError) unable to open database file None None
In the config.py file
LOGFILE = '/tmp/ate.log'
DEBUG = True
TESTING = True
THREADED = True
DATABASE_URI = 'sqlite:////tmp/ate.db'
SECRET_KEY = os.urandom(24)
Hence, I created a folder called "tmp" under my user directory and an empty file called "ate.db", then ran it again. It said:
IOError: [Errno 2] No such file or directory: '/home/kevin/log/ate.log'
Then I created the log folder and the log file. I ran it again, but nothing happened:
(venv)kevin#ubuntu:~/python/public_html$ python index.wsgi
(venv)kevin#ubuntu:~/python/public_html$ python index.wsgi
(venv)kevin#ubuntu:~/python/public_html$
If it were successful, the website would be available at http://127.0.0.1:5000/. However, it did not work. Does anybody know why, and how to solve it? The code itself should be fine, since it is running online now; the problem should be local. Thank you so much for your help.
Here is the code where the program gets stuck:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

engine = None
db_session = None
Base = declarative_base()

def init_engine(uri, **kwargs):
    global engine
    engine = create_engine(uri, **kwargs)
    return engine

def init_db():
    global db_session
    db_session = scoped_session(sessionmaker(bind=engine))
    # import all modules here that might define models so that
    # they will be registered properly on the metadata. Otherwise
    # you will have to import them first before calling init_db()
    import models
    Base.metadata.create_all(engine)
Replace:
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////dbdir/test.db'
with:
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///dbdir/test.db'
With three slashes the path is interpreted relative to the working directory; with four it is an absolute path from the filesystem root.
Finally figured it out (with some help):
import os
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

# On Windows the backslash must be escaped inside the string literal.
file_path = os.path.abspath(os.getcwd()) + "\\database.db"

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + file_path
db = SQLAlchemy(app)
I had this issue with SQLite. The process trying to open the database file needs write access to the directory, as SQLite creates temporary/lock files alongside it.
The following structure worked for me to allow www-data to use the database.
%> ls -l
drwxrwxr-x 2 fmlheureux www-data 4096 Feb 17 13:24 database-dir
%> ls -l database-dir/
-rw-rw-r-- 1 fmlheureux www-data 40960 Feb 17 13:28 database.sqlite
My database URI started working after adding one dot between the slashes. I am on Windows 7, and the directory and DB file existed prior to calling this:
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///./dbdir/test.db'
I think I've seen errors like this where file permissions were wrong for the .db file or its parent directory. You might make sure that the process trying to access the database can do so by appropriate use of chown or chmod.
This is specifically about Django, but maybe still relevant: https://serverfault.com/questions/57596/why-do-i-get-sqlite-error-unable-to-open-database-file
I just hit this same problem and found that I had made a silly circular reference.
./data_model.py
from flask.ext.sqlalchemy import SQLAlchemy
from api.src.app import app

app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////database/user.db'
db = SQLAlchemy(app)
./app.py
...
from api.src.data_model import db
db.init_app(app)
Then I removed the db import from app.py and it worked.
For those looking for a solution to the OperationalError that is not necessarily caused by being unable to open the database file: you might try adding a pool_pre_ping=True argument to create_engine, i.e.
engine = create_engine("mysql+pymysql://user:pw@host/db", pool_pre_ping=True)
See the SQLAlchemy documentation:
Pessimistic testing of connections upon checkout is achievable by using the Pool.pre_ping argument, available from create_engine() via the create_engine.pool_pre_ping argument
The “pre ping” feature will normally emit SQL equivalent to “SELECT 1” each time a connection is checked out from the pool; if an error is raised that is detected as a “disconnect” situation, the connection will be immediately recycled, and all other pooled connections older than the current time are invalidated, so that the next time they are checked out, they will also be recycled before use.
You're not managing to find the path to the database from your current level. What you need to do is the following:
DATABASE_URI = 'sqlite:///../tmp/ate.db'
That means go up to the root level .. and then navigate down to the database (the relative path is /tmp/ate.db in this case).
I had this same issue when trying to start the central scheduler for luigi (python module) with task history enabled.
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
I was attempting to use the following configuration from their documentation:
[task_history]
db_connection = sqlite:////user/local/var/luigi-task-hist.db
However, /user/local/* did not exist on my machine and I had to change the configuration to:
[task_history]
db_connection = sqlite:////usr/local/var/luigi-task-hist.db
Kind of a dumb mistake, but easily overlooked. Might save someone some time. This change got rid of the error in my case and luigid started with no errors.
I am doing a Python course and had the same problem. Fortunately, the course showed the right way to determine the database URI path, so it works for me even in 2022.
You need to change:
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
to:
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///<name of database>.db'
I hope that it works for someone.
This problem is related to your file path. If you want to save the file in your root directory itself, write the file name right after the '/':
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///file_name.db'
I was able to overcome the same error by running sudo python :)
