Extract domain from a link in Python and SQL

I have a database that I connect to using Python, running SQL statements as follows:
import ibm_db
conn = ibm_db.connect("DATABASE=ABCD;HOSTNAME=dsomehostname.net;PORT=50001;PROTOCOL=TCPIP;UID=User1_id;PWD=Password; Security = SSL; ConnectTimeout = 30; sslConnection=TRUE","","")
connState = ibm_db.active(conn)
print(connState)
import ibm_db_dbi
# con = ibm_db_dbi.Connection(conn)
sql = "SELECT emails from Database1.Table1 WHERE TIMESTAMP>'2020-08-20' GROUP BY emails; "
stmt = ibm_db.exec_immediate(conn, sql)
dictionary = ibm_db.fetch_both(stmt)
It gives me emails in the following form:
**https://abc**.ind.analytics.google.com/bs/?perspective=story
**https://abc**.ind.analytics.google.com/bs/
**https://tmb**.ind.analytics.google.com/bs/?perspective=ca-modeller
**https://fgt**.ind.analytics.google.com/bs/?perspective=explore
(null)
**https://abc**.ind.analytics.google.com/bs/?perspective=home
(null)
**https://col**.ind.analytics.google.com/bs/?perspective=classicviewer
**https://prod**.ind.analytics.google.com/bs/
(null)
**https://fcv**.ind.analytics.google.com/bs/?perspective=home
**https://prod**.health-analytics.something-else.com/bs/
(null)
**https://fcv**.health-analytics.something-else?perspective=home
I only want the bold part, i.e. the part before "ind.analytics.google.com/bs/..." and NOT the part before "health-analytics.something-else.com/bs/...":
https://abc
https://tmb
https://fgt
Is there a way I can include a regex in this and fire the query? It would be great if someone could help me out with that.

You can iterate over the result rows and collect the data you want into a list. Note that `//` is not a comment in Python (use `#`), and `ibm_db.fetch_both` returns one row at a time, so call it in a loop:
email_list = []
row = ibm_db.fetch_both(stmt)
while row:
    email = row["EMAILS"]
    if email is not None:
        email_list.append(email.split('.')[0])  # the domain is the first element
    row = ibm_db.fetch_both(stmt)
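To keep only the ind.analytics.google.com hosts and drop the health-analytics ones, a regex on the Python side works well. A minimal sketch (the sample URLs mirror the output above; `extract_domains` is my own helper name):

```python
import re

# Capture scheme + subdomain only when the host is under ind.analytics.google.com
PATTERN = re.compile(r"^(https?://[^.]+)\.ind\.analytics\.google\.com/")

def extract_domains(emails):
    domains = []
    for email in emails:
        if not email:  # skip None / (null) rows
            continue
        match = PATTERN.match(email)
        if match:
            domains.append(match.group(1))
    return domains

urls = [
    "https://abc.ind.analytics.google.com/bs/?perspective=story",
    None,
    "https://prod.health-analytics.something-else.com/bs/",
    "https://tmb.ind.analytics.google.com/bs/?perspective=ca-modeller",
]
print(extract_domains(urls))  # ['https://abc', 'https://tmb']
```

Recent Db2 versions (11.1+) also provide REGEXP_SUBSTR, so a similar extraction could in principle be pushed into the SQL itself, but filtering in Python keeps the query simple.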

Related

Executing a PostgreSQL query with PL/pgSQL from SQLAlchemy

I can't find examples of PL/pgSQL in raw SQL executed by SQLAlchemy; these were the closest, but no PL/pgSQL:
How to execute raw SQL in Flask-SQLAlchemy app
how to set autocommit = 1 in a sqlalchemy.engine.Connection
I've done research and I'm not sure if this is possible. I'm trying to either INSERT or UPDATE a record, and there is no error. It must be failing silently, because no record is created/updated in the database even though I've explicitly set autocommit=True.
Python:
engine = db.create_engine(connstr, pool_size=20, max_overflow=0)
Session = scoped_session(sessionmaker(bind=engine, autocommit=True))
s = Session()
query = """DO $$
declare
    ppllastActivity date;
    percComplete numeric;
begin
    select lastactivity into ppllastActivity FROM feeds WHERE email = :e and courseName = :c and provider = :prov;
    IF COALESCE(ppllastActivity, '1900-01-01') = '1900-01-01' THEN
        INSERT INTO feeds (email, courseName, completedratio, lastActivity, provider) VALUES (:e, :c, :p, :l, :prov);
    ELSEIF ppllastActivity < :l THEN
        UPDATE feeds set completedratio = :p, lastActivity = :l WHERE email = :e and courseName = :c and provider = :prov;
    END if;
end; $$"""
params = {'e' : item.get('email').replace("'", "''").lower(), 'c' : item.get('courseName').replace("'", "''"), 'p' : item.get('progress'), 'l' : item.get('lastActivity'),'prov' : "ACG" }
result = s.execute(text(query),params)
I'm unable to troubleshoot since it doesn't give me any errors. Am I going down the wrong path? Should I just use psql.exe, or can you run PL/pgSQL in raw SQL with SQLAlchemy?
While typing this question up I found a solution or a bug.
The autocommit=True doesn't work; you have to begin a transaction:
with s.begin():
    result = s.execute(text(query), params)

How to retrieve all saved queries from MS Access database using python?

Basically, I'm comparing 2 Access databases using Python.
I do not have access to manually open any Access files; it must be done entirely within Python!
I need to retrieve the full list of
Query names
Associated query code
I will not know what the names of the queries are ahead of time.
I've tried a number of solutions that have nearly worked; I've outlined the 3 closest below.
Partial Solution 1
I nearly had it working using win32com and the CurrentDb.QueryDefs method to retrieve each query's code.
However, it appears that the order of the joins is not stored deterministically between 2 databases
(it appears to be dependent on the order of the entries in MSysQueries).
i.e. in one database, the text for the join could be
on Table1.ColumnA = Table2.ColumnA & Table1.ColumnB = Table2.ColumnB
and in another
on Table1.ColumnB = Table2.ColumnB & Table1.ColumnA = Table2.ColumnA
Obviously these result in the same type of join, but not the exact same query text.
If I compare the text directly, they do not match, and processing the text before comparing seems like a bad idea with lots of corner cases.
Sample Code
objAccess = Dispatch("Access.Application")
objAccess.Visible = False
counter = 0
query_dicts = {}
for database_path in (new_database_path, old_database_path):
    # Open each DB and pull its stored queries into a dict
    objAccess.OpenCurrentDatabase(database_path)
    objDB = objAccess.CurrentDb()
    db_query_dict = {}
    for stored_query in objDB.QueryDefs:
        db_query_dict[stored_query.name] = stored_query.sql
    query_dicts["New" if counter == 0 else "Old"] = db_query_dict
    objAccess.CloseCurrentDatabase()
    counter += 1
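Once query_dicts holds the {name: sql} mappings for both databases, the comparison itself is plain dictionary work. A minimal sketch (`diff_query_dicts` is my own helper name):

```python
def diff_query_dicts(new_queries, old_queries):
    """Compare two {query_name: sql_text} dicts and report the differences."""
    added = sorted(set(new_queries) - set(old_queries))
    removed = sorted(set(old_queries) - set(new_queries))
    changed = sorted(
        name
        for name in set(new_queries) & set(old_queries)
        if new_queries[name] != old_queries[name]
    )
    return {"added": added, "removed": removed, "changed": changed}

new_db = {"qryA": "SELECT 1", "qryB": "SELECT 2"}
old_db = {"qryA": "SELECT 1", "qryC": "SELECT 3"}
print(diff_query_dicts(new_db, old_db))
# {'added': ['qryB'], 'removed': ['qryC'], 'changed': []}
```

Note that, as described above, a plain text comparison of the SQL is fragile because the join ordering differs between databases; this diff only flags exact-text mismatches.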
Partial Solution 2
After the first solution failed, I tried to write a query on MSysQueries and force an ordering. However, pyodbc does not have read access to that system table!
It also appears you cannot grant read access from pyodbc itself, which is an issue (I could be wrong here).
Query:
SELECT MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2
FROM MSysObjects INNER JOIN MSysQueries ON MSysObjects.Id = MSysQueries.ObjectId
order by MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2
Partial Solution 3
Another thing I tried was to get Python to store a VBA module in the database that would write the meta info to a table, then read that table out via pyodbc.
I could add the module, but Access kept prompting for a name for it; I couldn't find documentation on how to name the module with a method call.
Sample Code:
import win32com.client as win32
import comtypes, comtypes.client
import win32api, time
from win32com.client import Dispatch
strDbName = r'C:\Users\Username\SampleDatabase.mdb'
objAccess = Dispatch("Access.Application")
# objAccess.Visible = False
objAccess.OpenCurrentDatabase(strDbName)
objDB = objAccess.CurrentDb()
xlmodule = objAccess.VBE.VbProjects(1).VBComponents.Add(1) # vbext_ct_StdModule
xlmodule.CodeModule.AddFromString(Constants.ACCESS_QUERY_META_INFO_MACRO)
objAccess.Run("CreateQueryMetaInfoTable")
objAccess.CloseCurrentDatabase()
objAccess.Quit()
The macro I was attempting to add (note that VBA comments use ', not #):
Sub CreateQueryMetaInfoTable()
    Dim sql_string As String
    ' Create empty table
    CurrentDb.Execute ("Create Table QueryMetaInfoTable (QueryName text, SqlCode text)")
    Dim qd As QueryDef
    For Each qd In CurrentDb.QueryDefs
        ' Insert values
        sql_string = "Insert into QueryMetaInfoTable (QueryName, SqlCode) values ('" & qd.Name & "', '" & qd.SQL & "')"
        CurrentDb.Execute sql_string
    Next
End Sub
With the help of Gord Thompson, I have a working solution now.
I needed to connect with OLEDB to grant the read access first, generate a non-system table with the needed info, then read that table back over ODBC via pandas.
CONNECTION_STRING_OLEDB = "PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE={};Jet OLEDB:System Database={};"
ACCESS_QUERY_META_INFO_CREATE = """SELECT MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2
INTO QueryMetaInfo
FROM MSysObjects INNER JOIN MSysQueries ON MSysObjects.Id = MSysQueries.ObjectId
order by MSysObjects.Name
, MSysQueries.Attribute
, MSysQueries.Expression
, MSysQueries.Flag
, MSysQueries.Name1
, MSysQueries.Name2"""
ACCESS_QUERY_META_INFO_READ = """select * from QueryMetaInfo
order by Name
, Attribute
, Expression
, Flag
, Name1
, Name2;"""
ACCESS_QUERY_META_INFO_DROP = "DROP TABLE QueryMetaInfo"
connection = win32com.client.Dispatch(r'ADODB.Connection')
DSN = CONNECTION_STRING_OLEDB.format(database_path, r"C:\Users\C218\AppData\Roaming\Microsoft\Access\System.mdw")
connection.Open(DSN)
cmd = win32com.client.Dispatch(r'ADODB.Command')
cmd.ActiveConnection = connection
cmd.CommandText = "GRANT SELECT ON MSysObjects TO Admin;"
cmd.Execute()
connection.Execute(ACCESS_QUERY_META_INFO_CREATE)
connection.Close()
# connect with odbc to read the query meta info into pandas
connection_string = Constants.CONNECTION_STRING_ACCESS.format(database_path)
access_con = pyodbc.connect(connection_string)
access_cursor = access_con.cursor()
df = pd.read_sql(ACCESS_QUERY_META_INFO_READ, access_con)
# drop table after read
access_cursor.execute(ACCESS_QUERY_META_INFO_DROP)
access_cursor.commit()

Number of entries in SAP R/3 table using Pyrfc

How do you use the Pyrfc Python library to query the number of entries in an SAP R/3 database table?
I know of three methods to do this using Pyrfc. Modify the following example with your SAP R/3 server connection settings and desired table name:
from pyrfc import Connection
params = dict(
    ashost="1.1.1.1",
    sysnr="1",
    client="100",
    user="username",
    passwd="password",
)
table = "MKAL"
with Connection(**params) as conn:
    # Method 1
    result = conn.call("RFC_GET_TABLE_ENTRIES", TABLE_NAME=table, MAX_ENTRIES=1)
    entries = result["NUMBER_OF_ENTRIES"]
    # Method 2
    result = conn.call("EM_GET_NUMBER_OF_ENTRIES", IT_TABLES=[{"TABNAME": table}])
    entries = result["IT_TABLES"][0]["TABROWS"]
    # Method 3
    short_field = "MANDT"  # table field with a short data length
    result = conn.call(
        "RFC_READ_TABLE",
        QUERY_TABLE=table,
        ROWCOUNT=0,
        FIELDS=short_field,
    )
    entries = len(result["DATA"])  # RFC_READ_TABLE returns the rows in its DATA table
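For context on Method 3: RFC_READ_TABLE returns each row as a fixed-width string in the WA field of its DATA table, alongside FIELDS metadata giving each field's offset and length. A sketch of slicing those rows (the field names, offsets, and sample row below are invented for illustration, and `parse_rfc_read_table` is my own helper name):

```python
def parse_rfc_read_table(fields, data):
    """Slice each fixed-width 'WA' row into a dict keyed by field name."""
    rows = []
    for entry in data:
        wa = entry["WA"]
        row = {}
        for f in fields:
            start = int(f["OFFSET"])
            end = start + int(f["LENGTH"])
            row[f["FIELDNAME"]] = wa[start:end].strip()
        rows.append(row)
    return rows

# Illustrative metadata and one fixed-width row (MANDT: 3 chars, MATNR: 18 chars)
fields = [
    {"FIELDNAME": "MANDT", "OFFSET": "0", "LENGTH": "3"},
    {"FIELDNAME": "MATNR", "OFFSET": "3", "LENGTH": "18"},
]
data = [{"WA": "100MATERIAL-001      "}]
print(parse_rfc_read_table(fields, data))
# [{'MANDT': '100', 'MATNR': 'MATERIAL-001'}]
```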

Using Boto3 to interact with amazon Aurora on RDS

I have set up a database in Amazon RDS using Amazon Aurora and would like to interact with the database using Python - the obvious choice is to use Boto.
However, their documentation is awful and does not cover ways in which I can interact with the database to:
Run queries with SQL statements
Interact with the tables in the database
etc
Does anyone have an links to some examples/tutorials, or know how to do these tasks?
When using Amazon RDS offerings (including Aurora), you don't connect to the database via any AWS API (including Boto). Instead you would use the native client of your chosen database. In the case of Aurora, you would connect using the MySQL Command Line client. From there, you can query it just like any other MySQL database.
There's a brief section of the "Getting Started" documentation that talks about connecting to your Aurora database:
Connecting to an Amazon Aurora DB Cluster
Here are a couple of examples:
INSERT example:
import boto3
sql = """
INSERT INTO YOUR_TABLE_NAME_HERE
(
your_column_name_1
,your_column_name_2
,your_column_name_3)
VALUES(
:your_param_1_name
,:your_param_2_name
,:your_param_3_name)
"""
param1 = {'name':'your_param_1_name', 'value':{'longValue': 5}}
param2 = {'name':'your_param_2_name', 'value':{'longValue': 63}}
param3 = {'name':'your_param_3_name', 'value':{'stringValue': 'para bailar la bamba'}}
param_set = [param1, param2, param3]
db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'
rds_data = boto3.client('rds-data')
response = rds_data.execute_statement(
resourceArn = db_clust_arn,
secretArn = db_secret_arn,
database = 'your_database_name_here',
sql = sql,
parameters = param_set)
print(str(response))
READ example:
import boto3
rds_data = boto3.client('rds-data')
db_clust_arn = 'your_db_cluster_arn_here'
db_secret_arn = 'your_db_secret_arn_here'
employee_id = 35853
get_vacation_days_sql = f"""
select vacation_days_remaining
from employees_tbl
where employee_id = {employee_id}
"""
response1 = rds_data.execute_statement(
resourceArn = db_clust_arn,
secretArn = db_secret_arn,
database = 'your_database_name_here',
sql = get_vacation_days_sql)
# recs is a list (of rows returned from the DB)
recs = response1['records']
print(f"recs === {recs}")
# recs === [[{'longValue': 57}]]
# single_row is a list of dictionaries, where each dictionary represents
# a column from that single row
for single_row in recs:
    print(f"single_row === {single_row}")
    # single_row === [{'longValue': 57}]
    # single_column_dict is a dictionary with one key-value pair, where the
    # key is the data type of the column and the value is the column's value;
    # each additional column is another dictionary
    for single_column_dict in single_row:
        print(f"one_dict === {single_column_dict}")
        # one_dict === {'longValue': 57}
        vacation_days_remaining = single_column_dict['longValue']
        print(f'vacation days remaining === {vacation_days_remaining}')
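That nested records structure can be flattened with a small helper. A sketch under the assumption that each column dictionary holds exactly one type-keyed entry, as in the responses shown, and that SQL NULLs arrive as {'isNull': True} (`unwrap_records` is my own helper name):

```python
def unwrap_records(records):
    """Convert RDS Data API 'records' into plain Python row tuples."""
    rows = []
    for row in records:
        values = []
        for column in row:
            # Each column is a single {type_hint: value} dictionary,
            # e.g. {'longValue': 57} or {'stringValue': 'abc'}
            type_hint, value = next(iter(column.items()))
            values.append(None if type_hint == "isNull" else value)
        rows.append(tuple(values))
    return rows

print(unwrap_records([[{'longValue': 57}]]))  # [(57,)]
```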
Source Link:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html#data-api.calling.python

why can't I fetch sql statements in python?

I have a very large table (374,870 rows), and when I run the following code, timestamps just ends up being a long int with the value 374870. I want to grab all the timestamps in the table, but all I get is a long int. :S
import MySQLdb
db = MySQLdb.connect(
host = "Some Host",
user = "SOME USER",
passwd = "SOME PASS",
db = "SOME DB",
port = 3306
)
sql = "SELECT `timestamp` from `table`"
timestamps = db.cursor().execute(sql)
Try this:
cur = db.cursor()
cur.execute(sql)
timestamps = []
for rec in cur:
    timestamps.append(rec[0])
You need to call fetchmany() on the cursor to fetch more than one row, or call fetchone() in a loop until it returns None.
Consider the possibility that the not-very-long integer that you are getting is the number of rows in your query result.
Consider reading the docs (PEP 249): (1) the return value of cursor.execute() is not defined; what you are seeing is particular to your database driver and, for portability's sake, should not be relied on. (2) You need to do results = cursor.fetch{one|many|all}(), or iterate over the cursor: for row in cursor: do_something(row).
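The same DB-API pattern applies across drivers. Here is a self-contained demonstration using the stdlib sqlite3 module as a stand-in (the table and data are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cur = db.cursor()
cur.execute("CREATE TABLE `table` (`timestamp` INTEGER)")
cur.executemany("INSERT INTO `table` VALUES (?)", [(100,), (200,), (300,)])

cur.execute("SELECT `timestamp` FROM `table` ORDER BY `timestamp`")
# Iterate the cursor (or call fetchall()); do not use execute()'s return value
timestamps = [rec[0] for rec in cur]
print(timestamps)  # [100, 200, 300]
```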
