Python - list of tuples as a function's parameter

I have a list of tuples that stores my DB credentials, later used by the DBClient class to perform a SELECT via get_data(). dd_client's post_sql_metrics method builds the entire data set, adds an env tag and injects it into the dashboard.
The issue I'm having is how to run the loop for each environment with its dedicated credentials.
envs_creds = [
    ('dev', 'xyz', 'db', 'usr', 'pass'),
    ('test', 'xyz', 'db', 'usr', 'pass'),
]

for i, e in enumerate(envs_creds):
    client = DBClient(envs_creds[0][1], envs_creds[0][2], envs_creds[0][3], envs_creds[0][4])
    sql_query, header = client.get_data()
    dd_client.post_sql_metrics(sql_query, header, envs_creds[0][0])
DBClient class:
import re

import pymssql


class DBClient:
    def __init__(self, server, database, username, password):
        self.server = server
        self.database = database
        self.username = username
        self.password = password

    def get_data(self):
        query = 'SELECT col1, colN FROM tbl'
        conn = pymssql.connect(server=self.server, user=self.username, password=self.password, database=self.database)
        cursor = conn.cursor()
        cursor.execute(query)
        res_list = cursor.fetchall()
        conn.close()
        # Extract the column names from the SELECT clause to use as the header
        header = re.search('SELECT(.*)FROM', query)
        header = [x.strip() for x in header.group(1).split(sep=',')]
        return res_list, header
Metrics post method:
def post_sql_metrics(self, tasks, header, env, metric_name="my_metric"):
    # Build column:value tag pairs for every row, then append the env tag
    tags = [[f'{a}:{b}' for a, b in zip(header, i)] for i in tasks]
    tags = [sub_lst + [f'env:{env}'] for sub_lst in tags]

    # Drop the tag for column index 2 (its value is used as the metric value below)
    col_to_remove = 2
    tags = [(x[0:col_to_remove] + x[col_to_remove + 1:]) for x in tags]

    series = [
        DDMetric(
            self.HOST,
            metric_name,
            record[2],
            tag,
        ).to_series() for record, tag in zip(tasks, tags)
    ]
    print(series)

Your problem is that you are constantly referring to element 0 of envs_creds rather than element i, so every iteration uses the 'dev' credentials. (Within each tuple, indices 1-4 are the right ones to pass to DBClient, since index 0 holds the environment name.)
# Using _ instead of e as e is not being used here
for i, _ in enumerate(envs_creds):
    client = DBClient(envs_creds[i][1], envs_creds[i][2], envs_creds[i][3], envs_creds[i][4])
    sql_query, header = client.get_data()
    dd_client.post_sql_metrics(sql_query, header, envs_creds[i][0])
Typically, in Python you can avoid indexing (especially in a for loop).
# e will be a single tuple: (env, server, db, usr, pass)
for e in envs_creds:
    client = DBClient(e[1], e[2], e[3], e[4])  # or more simply: DBClient(*e[1:])
    sql_query, header = client.get_data()
    dd_client.post_sql_metrics(sql_query, header, e[0])
When you are looping over a list with consistent elements you can also use tuple unpacking as demonstrated below.
# Assigning each element of the tuple to a named variable
for env, server, db, usr, pwd in envs_creds:
    client = DBClient(server, db, usr, pwd)
    sql_query, header = client.get_data()
    dd_client.post_sql_metrics(sql_query, header, env)
As an aside, you are going to have a serious security issue if you are storing your DB credentials in your code. You should read up on proper methods to store sensitive information.
Note: as mentioned in the comments, the credentials shown are placeholders to illustrate the structure and are encrypted/decrypted elsewhere.
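As an illustration of that aside, one common pattern is to keep only the non-secret structure in code and pull the secrets from the environment at runtime. A minimal sketch; the variable names DEV_DB_USER, DEV_DB_PASS, etc. are made up for this example:

import os

# Hypothetical environment variable names, e.g. DEV_DB_USER / DEV_DB_PASS
envs = ['dev', 'test']
envs_creds = [
    (env, 'xyz', 'db',
     os.environ[f'{env.upper()}_DB_USER'],
     os.environ[f'{env.upper()}_DB_PASS'])
    for env in envs
]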

Related

How can I bulk upload JSON records to AWS OpenSearch index using a python client library?

I have a sufficiently large dataset that I would like to bulk index the JSON objects in AWS OpenSearch.
I cannot see how to achieve this using any of: boto3, awswrangler, opensearch-py, elasticsearch, elasticsearch-py.
Is there a way to do this without issuing a raw HTTP request (PUT/POST) from Python directly?
Note that this is not for: ElasticSearch, AWS ElasticSearch.
Many thanks!
I finally found a way to do it using opensearch-py, as follows.
First, establish the client:
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# First fetch credentials from the environment defaults.
# If you can get this far you probably know how to tailor them
# for your particular situation. Otherwise SO is a safe bet :)
credentials = boto3.Session().get_credentials()
region = 'eu-west-2'  # for example

# Now set up the AWS 'Signer'
auth = AWSV4SignerAuth(credentials, region)

# And finally the OpenSearch client
host = f"...{region}.es.amazonaws.com"  # fill in your hostname (minus the https://) here
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
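As a quick sanity check that the client and signer are wired up correctly, you can ask the cluster for its metadata (optional; client.info() just returns basic cluster information):

# Should print cluster name, version, etc. if the connection and auth are OK
print(client.info())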
Phew! Let's create the data now:
# Spot the deliberate mistake(s) :D
document1 = {
    "title": "Moneyball",
    "director": "Bennett Miller",
    "year": "2011"
}
document2 = {
    "title": "Apollo 13",
    "director": "Richie Cunningham",
    "year": "1994"
}
data = [document1, document2]
TIP! Create the index if you need to -
my_index = 'my_index'

try:
    response = client.indices.create(my_index)
    print('\nCreating index:')
    print(response)
except Exception as e:
    # If, for example, my_index already exists, don't do much!
    print(e)
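If you would rather not lean on the exception, an alternative sketch is to check whether the index exists first (same client as above; the behaviour is equivalent):

# Only create the index when it is missing
if not client.indices.exists(index=my_index):
    response = client.indices.create(my_index)
    print(response)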
This is where things go a bit nutty. I hadn't realised that every single document in a bulk request needs an accompanying action, e.g. "index", "create", "delete", etc. - so let's define that now
action = {
    "index": {
        "_index": my_index
    }
}
You can read all about the bulk REST API in the OpenSearch documentation.
The next quirk is that the OpenSearch bulk API requires Newline Delimited JSON (see https://www.ndjson.org), which is basically JSON serialized as strings and separated by newlines. Someone wrote on SO that this "bizarre" API looked like one designed by a data scientist - far from taking offence, I think that rocks. (I agree ndjson is weird though.)
Hideously, now let's build up the full JSON string, combining the data and actions. A helper fn is at hand!
import json

def payload_constructor(data, action):
    # "All my own work"
    action_string = json.dumps(action) + "\n"

    payload_string = ""
    for datum in data:
        payload_string += action_string
        this_line = json.dumps(datum) + "\n"
        payload_string += this_line
    return payload_string
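For the two sample documents above, the constructed payload looks like this (one action line followed by one document line, each terminated by a newline):

{"index": {"_index": "my_index"}}
{"title": "Moneyball", "director": "Bennett Miller", "year": "2011"}
{"index": {"_index": "my_index"}}
{"title": "Apollo 13", "director": "Richie Cunningham", "year": "1994"}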
OK so now we can finally invoke the bulk API. I suppose you could mix in all sorts of actions (out of scope here) - go for it!
response = client.bulk(body=payload_constructor(data, action), index=my_index)
That's probably the most boring punchline ever but there you have it.
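It is worth inspecting the response rather than assuming success; the bulk response carries an errors flag and a per-document items list. A minimal sketch:

# 'errors' is True if any individual document failed to index
if response.get("errors"):
    failed = [item for item in response["items"] if "error" in item["index"]]
    print("{} document(s) failed to index".format(len(failed)))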
You can also just get (geddit) .bulk() to use the index= argument and set the action to:
action = {"index": {}}
Hey presto!
Now, choose your poison - the other solution looks crazily shorter and neater.
PS The well-hidden opensearch-py documentation on this is located here.
The other solution, using awswrangler:
import awswrangler as wr

# (Extract from a class; self.hosts, self.username, etc. are instance attributes.)
conn = wr.opensearch.connect(
    host=self.hosts,  # URL
    port=443,
    username=self.username,
    password=self.password
)

def insert_index_data(data, index_name='stocks', delete_index_data=False):
    """Bulk create.

    args: data [{doc1}, {doc2}, ...]
    """
    if delete_index_data:
        index_name = 'symbol'
        self.delete_es_index(index_name)

    resp = wr.opensearch.index_documents(
        self.conn,
        documents=data,
        index=index_name
    )
    print(resp)
    return resp
I have used the code below to bulk insert records from Postgres into OpenSearch (ES 7.2).
import sqlalchemy as sa
from sqlalchemy import text
import pandas as pd
import numpy as np
from opensearchpy import OpenSearch
from opensearchpy.helpers import bulk
import json

engine = sa.create_engine('postgresql+psycopg2://postgres:postgres@127.0.0.1:5432/postgres')

host = 'search-xxxxxxxxxx.us-east-1.es.amazonaws.com'
port = 443
auth = ('username', 'password')  # For testing only. Don't store credentials in code.

# Create the client with SSL/TLS enabled, but hostname verification disabled.
client = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_compress=True,
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    ssl_assert_hostname=False,
    ssl_show_warn=False
)

with engine.connect() as connection:
    result = connection.execute(text("select * from account_1_study_1.stg_pred where domain='LB'"))
    records = []
    for row in result:
        record = dict(row)
        # Flatten the nested 'item_dataset' column into top-level keys
        record.update(record['item_dataset'])
        del record['item_dataset']
        records.append(record)

df = pd.DataFrame(records)
#df['Date'] = df['Date'].astype(str)
df = df.fillna("null")
print(df.keys())

documents = df.to_dict(orient='records')

#bulk(es, documents, index='search-irl-poc-dump', raise_on_error=True)
#response = client.bulk(body=documents, index='sample-index')
bulk(client, documents, index='search-irl-poc-dump', raise_on_error=True, refresh=True)

SOAP operation name:import with Zeep

I have a problem with a WSDL operation named import. It is one of the most important remote operations; it updates the product list on the remote server.
The problem starts when I want to call the method:
client.service.import('ns0:Product_Import', _soapheaders=[header_value])

    node = client.service.import(product_name)
                          ^
SyntaxError: invalid syntax
because 'import' is a reserved keyword in Python. How can I call this method without it clashing with Python syntax?
The code below works fine; maybe someone will find it useful.
from zeep import Client
from zeep import xsd

loginIn = {'username': 'my_username', 'password': 'my_password'}

wsdl_auth = 'http://some-wsdl-service.com/auth/wsdl/'
wsdl_products = 'http://some-wsdl-service.com/products/wsdl/'

header = xsd.Element(
    '{http://some-wsdl-service.com/products/wsdl/}Header',
    xsd.ComplexType([
        xsd.Element(
            '{http://some-wsdl-service.com/products/wsdl/}sessionId',
            xsd.String()
        ),
    ])
)

client = Client(wsdl=wsdl_auth)
response = client.service.login(loginIn)
sid = response.sessionId
header_value = header(sessionId=sid)

client = Client(wsdl=wsdl_products)
list_of_products = client.service.get('ns0:Product_List',
                                       _soapheaders=[header_value])

client = Client(wsdl=wsdl_auth)
request_to_end = client.service.logout(_soapheaders=[header_value])
You can use getattr() to access methods in client.service
_import = getattr(client.service, 'import')
result = _import(product_name)
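Combining that with the session header from the question, the call would then look something like this (a sketch based on the question's own get call):

_import = getattr(client.service, 'import')
result = _import('ns0:Product_Import', _soapheaders=[header_value])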

BigQuery For Loop Results into Python smtplib Message Body

I want to grab data from a BigQuery database at a set interval and have the results emailed automatically using smtplib.
I have an example below. I can pull data from BigQuery just fine, and I can send email via smtplib just fine. What I need to do is combine the two: I want to store the results of a for loop in the body of the email message. I believe I do that by calling the function, but when I do, I receive the error below.
File "bqtest5.py", line 52, in server.sendmail(login_email,
recipients, query_named_params('corpus', 'min_word_count')) File
"/usr/lib/python2.7/smtplib.py", line 729, in sendmail
esmtp_opts.append("size=%d" % len(msg)) TypeError: object of type
'NoneType' has no len()
from google.cloud import bigquery
import smtplib

# Variables
login_email = 'MYEMAIL'
login_pwd = 'MYPASSWORD'
recipients = 'EMAILSENDINGTO'

# Create a function
# specifies we are going to add two parameters
def query_named_params(corpus, min_word_count):
    # Create a Client
    client = bigquery.Client()
    # Define the query
    query = """
        SELECT word, word_count
        FROM `bigquery-public-data.samples.shakespeare`
        WHERE corpus = @corpus
        AND word_count >= @min_word_count
        ORDER BY word_count DESC;
        """
    # Define the parameters
    query_params = [
        bigquery.ScalarQueryParameter('corpus', 'STRING', 'sonnets'),
        bigquery.ScalarQueryParameter(
            'min_word_count', 'INT64', 10)
    ]
    # Create job configuration
    job_config = bigquery.QueryJobConfig()
    # Add query parameters
    job_config.query_parameters = query_params
    # Run the query
    query_job = client.query(query, job_config=job_config)

    # Print the results.
    destination_table_ref = query_job.destination
    table = client.get_table(destination_table_ref)
    resulters = client.list_rows(table)
    for row in resulters:
        print("{} : {} views".format(row.word, row.word_count))

# -------------------- EMAIL PORTION -------------#
# smtplib connection
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
server.login(login_email, login_pwd)

msg = """
"""

server.sendmail(login_email, recipients, query_named_params('corpus',
                'min_word_count'))
server.quit()

if __name__ == '__main__':
    query_named_params('corpus', 'min_word_count')
Your function does not return any value, so None is passed as the message to server.sendmail.
Try this instead:
def query_named_params(corpus, min_word_count):
    (...)
    query_params = [
        bigquery.ScalarQueryParameter('corpus', 'STRING', corpus),
        bigquery.ScalarQueryParameter(
            'min_word_count', 'INT64', min_word_count)
    ]
    (...)
    s = ""
    for row in resulters:
        s += "{} : {} views\n".format(row.word, row.word_count)
    return s

(...)

server.sendmail(login_email, recipients, query_named_params('sonnets', 10))
This probably will not send a very readable message though. Depending on the complexity of the results from your BQ table, I'd recommend using Jinja2 to create an HTML template and then rendering it to be sent as the mail body.
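A minimal sketch of that suggestion, assuming Jinja2 is available and using email.mime to mark the body as HTML (the template and subject line here are illustrative only):

from email.mime.text import MIMEText
from jinja2 import Template

# Illustrative template; adapt the columns to your query
template = Template("""
<table>
  {% for row in rows %}
  <tr><td>{{ row.word }}</td><td>{{ row.word_count }} views</td></tr>
  {% endfor %}
</table>
""")

# resulters is the row iterator from the query above
html_body = template.render(rows=resulters)
msg = MIMEText(html_body, "html")
msg["Subject"] = "BigQuery results"
msg["From"] = login_email
msg["To"] = recipients
server.sendmail(login_email, recipients, msg.as_string())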

Insert into Odoo db with a specific id using cursor.commit and psycopg2

I'm trying to migrate some models from OpenERP 7 to Odoo 8 by code. I want to insert objects into the new table keeping their original id numbers, but it doesn't work.
I want to insert the new object including its id number.
My code:
import openerp
from openerp import api, modules
from openerp.cli import Command
import psycopg2


class ImportCategory(Command):
    """Import categories from source DB"""

    def process_item(self, model, data):
        if not data:
            return
        # Model structure
        model.create({
            'id': data['id'],
            'parent_id': None,
            'type': data['type'],
            'name': data['name']
        })
    def run(self, cmdargs):
        # Connection to the source database
        src_db = psycopg2.connect(
            host="127.0.0.1", port="5432",
            database="db_name", user="db_user", password="db_password")
        src_cr = src_db.cursor()
        try:
            # Query to retrieve source model data
            src_cr.execute("""
                SELECT c.id, c.parent_id, c.name, c.type
                FROM product_category c
                ORDER BY c.id;
            """)
        except psycopg2.Error as e:
            print e.pgerror

        openerp.tools.config.parse_config(cmdargs)
        dbname = openerp.tools.config['db_name']
        r = modules.registry.RegistryManager.get(dbname)
        cr = r.cursor()

        with api.Environment.manage():
            env = api.Environment(cr, 1, {})
            # Define target model
            product_category = env['product.category']

            id_ptr = None
            c_data = {}
            while True:
                r = src_cr.fetchone()
                if not r:
                    self.process_item(product_category, c_data)
                    break
                if id_ptr != r[0]:
                    self.process_item(product_category, c_data)
                    id_ptr = r[0]
                    c_data = {
                        'id': r[0],
                        'parent_id': r[1],
                        'name': r[2],
                        'type': r[3]
                    }
            cr.commit()
How do I do that?
The only way I could find was to use reference attributes in other objects to relate them in the new database. I mean creating relations over location code, client code, order number, etc., and when the records are created in the target database, looking them up and using their new IDs.
def run(self, cmdargs):
    # Connection to the source database
    src_db = psycopg2.connect(
        host="localhost", port="5433",
        database="bitnami_openerp", user="bn_openerp", password="bffbcc4a")
    src_cr = src_db.cursor()
    try:
        # Query to retrieve source model data
        src_cr.execute("""
            SELECT fy.id, fy.company_id, fy.create_date, fy.name,
                   p.id, p.code, p.company_id, p.create_date, p.date_start, p.date_stop, p.special, p.state,
                   c.id, c.name
            FROM res_company c, account_fiscalyear fy, account_period p
            WHERE p.fiscalyear_id = fy.id AND c.id = fy.company_id AND p.company_id = fy.company_id
            ORDER BY fy.id;
        """)
    except psycopg2.Error as e:
        print e.pgerror

    openerp.tools.config.parse_config(cmdargs)
    dbname = openerp.tools.config['db_name']
    r = modules.registry.RegistryManager.get(dbname)
    cr = r.cursor()

    with api.Environment.manage():
        env = api.Environment(cr, 1, {})
        # Define target models
        account_fiscalyear = env['account.fiscalyear']
        id_fy_ptr = None
        fy_data = {}
        res_company = env['res.company']

        while True:  # loop as in the question; the rest of the body is trimmed in this extract
            r = src_cr.fetchone()
            if not r:
                self.process_fiscalyear(account_fiscalyear, fy_data)
                break

            # Look the company up in the target database by name and use its new id
            company = res_company.search([('name', 'like', r[13])])
            print "Company id: {} | Company name: {}".format(company.id, company.name)
The previous code is only an extract from the whole source code.
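To make the "look them up and use the new ID" idea concrete, here is a minimal sketch (variable names are illustrative, not from the original code): keep a mapping from source ids to the newly created records and resolve relations through it, assuming parents are migrated before their children.

# Hypothetical illustration of the approach described above
id_map = {}  # source id -> newly created record in the target database

for src_id, parent_src_id, name, ctype in source_rows:  # rows fetched from the old DB, parents first
    new_rec = product_category.create({
        'name': name,
        'type': ctype,
        # Relate via the already-migrated parent and use its *new* id
        'parent_id': id_map[parent_src_id].id if parent_src_id in id_map else None,
    })
    id_map[src_id] = new_rec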

Inserting a value with SQLite3 and Python/Flask

I am working on a small website where I want to show, insert, edit and delete elements from the database.
I accomplished showing the data with this route:
@app.route('/objects')
def objects():
    g.db = sqlite3.connect(db_location)
    cur = g.db.execute('SELECT * FROM milkyway ORDER BY type')
    objects = [dict(id=row[0], type=row[1], name=row[2], description=row[3], size=row[4], mass=row[5], distance=row[6], discoverer=row[7], image_url=row[8]) for row in cur.fetchall()]
    g.db.close()
    return render_template("objects.html", objects=objects)
And now I am trying to insert an element, but I receive the error "AttributeError: '_AppCtxGlobals' object has no attribute 'db'".
@app.route('/insert_value', methods=['POST', 'GET'])
def insert_value():
    atype = request.form['type']
    name = request.form['name']
    description = request.form['description']
    size = request.form['size']
    mass = request.form['mass']
    distance = request.form['distance']
    discoverer = request.form['discoverer']
    image_url = request.form['image_url']
    g.db.execute('INSERT INTO milkyway (type,name,description,size,mass,distance,discoverer,image_url) VALUES (?,?,?,?,?,?,?,?)',
                 [atype, name, description, size, mass, distance, discoverer, image_url])
    g.db.commit()
    return redirect(url_for('objects'))
I've searched everywhere, but the thing is there are so many different ways to do this and I cannot make it work. I followed, for instance, this example: http://opentechschool.github.io/python-flask/extras/databases.html
The connection, g.db, needs to be added with each request. You only create it in objects, so it doesn't exist for insert_value. g is an object which is created in the context of each request.
As the example code shows, you should create the connection before each request and add it to g.
@app.before_request
def before_request():
    g.db = sqlite3.connect(db_location)
Be sure to also close it in @app.teardown_request.
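A minimal sketch of that teardown hook, assuming the same g.db attribute used above:

@app.teardown_request
def teardown_request(exception):
    # Close the per-request connection if it was opened
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()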
