How to query Spanner and get metadata, especially column names? - Python

I'm trying to run custom SQL queries on Spanner and convert the results into a Pandas DataFrame, so I need both the data and the column names, but I can't find a way to get the column names.
According to the documentation, I can get the columns through the metadata or fields properties, but this doesn't work.
I tried running the query in a transaction and also through a snapshot, but I only get data rows.
from google.cloud import spanner
from google.cloud.spanner_v1.streamed import StreamedResultSet

def query_transaction(instance_id, database_id, query_param):
    spanner_client = spanner.Client.from_service_account_json("PATH_XXXXX")
    database = spanner_client.instance(instance_id).database(database_id)

    def run_transaction(transaction):
        query = query_param
        results: StreamedResultSet = transaction.execute_sql(query)
        print("type", type(results))
        print("metadata", results.stats)
        for row in results:
            print(row)

    database.run_in_transaction(run_transaction)

def query_snapshot(instance_id, database_id, query):
    spanner_client = spanner.Client.from_service_account_json("PATH_XXXXX")
    database = spanner_client.instance(instance_id).database(database_id)
    with database.snapshot() as snapshot:
        results: StreamedResultSet = snapshot.execute_sql(query)
        print("metadata", results.metadata)
        print("type", type(results))
        for row in results:
            print(row)

spanner_id = "XXXXXXX"
base_id = "XXXXXXXX"
query = "SELECT * FROM XXXXX LIMIT 5"
query_snapshot(spanner_id, base_id, query)
query_transaction(spanner_id, base_id, query)
I can iterate over the results and get rows, but metadata is always None.

You must fetch at least one row before the metadata is available. So if you change the order of your code so that you first fetch the data (or at least some of it) and only then read the metadata, it should work. Change this:
results: StreamedResultSet = snapshot.execute_sql(query)
print("metadata", results.metadata)
for row in results:
    print(row)
Into this:
results: StreamedResultSet = snapshot.execute_sql(query)
for row in results:
    print(row)
print("metadata", results.metadata)
Then you should get the metadata.
Also note that result set statistics (results.stats) are only available when you profile a query. When you just execute the query, as in your example above, stats will always be empty.
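To get from there to the Pandas DataFrame mentioned in the question, you can read the column names once the stream has been consumed. A minimal sketch, assuming the snapshot setup from the question and the fields property the question already mentions (each entry exposes the column name via its name attribute):

import pandas as pd

with database.snapshot() as snapshot:
    results = snapshot.execute_sql(query)
    rows = list(results)  # consume the stream first so the metadata is populated
    columns = [field.name for field in results.fields]  # one field per column
    df = pd.DataFrame(rows, columns=columns)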

Related

Array Outputting result set with the same amount of rows in a sql database

I have a query that reaches into a MySQL database and grabs row data matching the column cab, which is a variable passed on from a previous HTML page. That variable is cabwrite.
The SQL response works just fine: it queries and matches the column cab with all the data points in the rows that match.
Once that happens, I remove the data I don't need: the line identifier and cab.
The output from that is result_set.
However, when I print the data to verify it's what I expect, I'm met with the same data for every row I have.
Example data:
The query finds 4 matching rows.
This is currently what I'm getting:
> data =
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
> ["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
Code:
cursor = connection1.cursor(MySQLdb.cursors.DictCursor)
cursor.execute("SELECT * FROM devices WHERE cab=%s", [cabwrite])
result_set = cursor.fetchall()
data = []
for row in result_set:
    localint = "('%s','%s','%s')" % (row["localint"], row["devicename"], row["hostname"])
    l = str(localint)
    data.append(l)
print(data)
This is what I want it to look like:
data = [(g11,none,tech11),(g2,none,tech13),(g3,none,tech15),(g4,none,tech31)]
["('Gi3/0/13','None','TECH2_HELP')", "('Gi3/0/7','None','TECH2_1507')", "('Gi1/0/11','None','TECH2_1189')", "('Gi3/0/35','None','TECH2_4081')", "('Gi3/0/41','None','TECH2_5625')", "('Gi3/0/25','None','TECH2_4598')", "('Gi3/0/43','None','TECH2_1966')", "('Gi3/0/23','None','TECH2_2573')", "('Gi3/0/19','None','TECH2_1800')", "('Gi3/0/39','None','TECH2_1529')"]
Thanks Tripleee, I did what you recommended and found my issue... a legacy FOR clause upstream in my code was causing it.
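As an aside, if the goal is a list of real tuples like data = [(g11,none,tech11), ...] rather than formatted strings, appending tuples directly avoids the string formatting step. A minimal sketch against the same result_set:

data = []
for row in result_set:
    # append an actual tuple instead of a formatted string
    data.append((row["localint"], row["devicename"], row["hostname"]))
print(data)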

Working with Selected IDs in SQL Alchemy

I have a database with two tables. The ssi_processed_files_prod table contains file information, including the created date and a boolean indicating whether the data has been deleted. The data table contains the actual data the boolean refers to.
I want to get a list of IDs for files older than 45 days from the file_info table, delete the associated rows from the data table, and then set the boolean in file_info to True to indicate the data has been deleted.
file_log_test = Table('ssi_processed_files_prod', metadata, autoload=True, autoload_with=engine)
stmt = select([file_log_test.columns.id])
stmt = stmt.where(func.datediff(text('day'),
                  file_log_test.columns.processing_end_time, func.getDate()) > 45)
connection = engine.connect()
results = connection.execute(stmt).fetchall()
This query returns the correct results; however, I have not been able to work with the output effectively.
For those who would like to know the answer: this was based on reading the Essential SQLAlchemy book. The initial block of code was correct, but I had to flatten the results into a list. From there I could use the in_() conjunction to work with the list of ids. This allowed me to delete rows from the relevant table and update the data status in another.
from sqlalchemy import Table, select, delete, func, text

file_log_test = Table('ssi_processed_files_prod', metadata, autoload=True,
                      autoload_with=engine)
stmt = select([file_log_test.columns.id])
stmt = stmt.where(func.datediff(text('day'),
                  file_log_test.columns.processing_end_time, func.getDate()) > 45)
connection = engine.connect()
results = connection.execute(stmt).fetchall()
ids_to_delete = [x[0] for x in results]  # flatten the result rows into a plain list of ids
d = delete(data).where(data.c.filename_id.in_(ids_to_delete))
connection.execute(d)
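The final step described above, setting the boolean in the file table, is then an update against the same id list. A sketch; the column name data_deleted is an assumption, since the original only mentions "a boolean indicating if the data has been deleted":

u = (file_log_test.update()
     .where(file_log_test.columns.id.in_(ids_to_delete))
     .values(data_deleted=True))  # data_deleted is a hypothetical column name
connection.execute(u)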

How to improve query performance to select info from Postgres?

I have a Flask app:
db = SQLAlchemy(app)

@app.route('/')
def home():
    query = "SELECT door_id FROM table WHERE id = 2422628557;"
    result = db.session.execute(query)
    return json.dumps([dict(r) for r in result])
When I execute curl http://127.0.0.1:5000/
I get the result very quickly: [{"door_id": 2063805}]
But when I reverse the query to query = "SELECT id FROM table WHERE door_id = 2063805;"
everything works very, very slowly.
I probably have an index on the id column and none on door_id.
How can I improve performance? How do I add an index on door_id?
If you want an index on that column, just create it:
create index i1 on table (door_id)
Then, depending on your settings, you might have to analyze the table to introduce the new index to the query planner, e.g.:
analyze table;
Keep in mind that all indexes require additional I/O on data manipulation.
Look at the explain plan for your query.
EXPLAIN
SELECT door_id FROM table WHERE id = 2422628557;
It is likely you are seeing something like this:
QUERY PLAN
------------------------------------------------------------
Seq Scan on table (cost=0.00..483.00 rows=99999 width=244)
Filter: (id = 2422628557)
The Seq Scan checks every single row in the table, and the result is then filtered by the id you are restricting on.
What you should do in this case is add an index to the column you are filtering on.
The plan will change to something like:
QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using [INDEX_NAME] on table (cost=0.00..8.27 rows=1 width=244)
Index Cond: (id = 2422628557)
The optimiser will use the index to reduce the row lookups for your query, which will speed it up.
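Applied to the slow query from the question, the same steps can be run straight from the Flask app; a sketch, where the index name is illustrative and "table" stands in for the real table name, as in the question:

# create the missing index on door_id and refresh planner statistics
db.session.execute("CREATE INDEX idx_table_door_id ON table (door_id);")
db.session.execute("ANALYZE table;")
db.session.commit()

# the plan for the reversed query should now show an Index Scan
plan = db.session.execute("EXPLAIN SELECT id FROM table WHERE door_id = 2063805;")
for line in plan:
    print(line[0])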

Put retrieved data from a MySQL query into a pandas DataFrame with a for loop

I have one database with two tables, both of which have a column called barcode. The aim is to retrieve a barcode from one table and search for the entries in the other table where extra information for that barcode is stored. I would like both sets of retrieved data to be saved in a DataFrame. The problem is that when I insert the data retrieved by the second query into a DataFrame, it stores only the last entry:
import mysql.connector
import pandas as pd

cnx = mysql.connector.connect(user=user, password=password, host=host, database=database)
query_barcode = "SELECT barcode FROM barcode_store"
cursor = cnx.cursor()
cursor.execute(query_barcode)
data_barcode = cursor.fetchall()
Up to this point everything works smoothly, and here is the part with the problem:
query_info = "SELECT product_code FROM product_info WHERE barcode=%s"
for each_barcode in data_barcode:
    cursor.execute(query_info, each_barcode)  # each_barcode is a one-element tuple
    pro_info = pd.DataFrame(cursor.fetchall())
pro_info contains only the last matching barcode's information, while I want to retrieve the information for every data_barcode match.
That's because you are overwriting the existing pro_info with new data on each loop iteration. You should rather do something like:
query_info = ("SELECT product_code FROM product_info")
cursor.execute(query_info)
pro_info = pd.DataFrame(cursor.fetchall())
Making so many SELECTs is redundant, since you can get all records in one SELECT and insert them into your DataFrame at once.
Edit: However, if you need the WHERE clause to fetch only specific products, you have to collect the records in a list before inserting them into the DataFrame. Your code would then look like:
pro_list = []
query_info = "SELECT product_code FROM product_info WHERE barcode=%s"
for each_barcode in data_barcode:
    cursor.execute(query_info, each_barcode)  # pass parameters instead of % interpolation
    pro_list.append(cursor.fetchone())
pro_info = pd.DataFrame(pro_list)
Cheers!
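A middle ground between the two approaches is a single query with an IN clause, so the filtering still happens in SQL but only one round trip is made. A sketch, assuming data_barcode is a non-empty list of one-element tuples as returned by fetchall():

# build one placeholder per barcode, then run a single parameterized query
barcodes = [b[0] for b in data_barcode]
placeholders = ", ".join(["%s"] * len(barcodes))
query_info = "SELECT product_code FROM product_info WHERE barcode IN (%s)" % placeholders
cursor.execute(query_info, barcodes)
pro_info = pd.DataFrame(cursor.fetchall())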

Fetch values from MySQL tables and render them as XML

I am fetching results out of a query from a table:
def getdata(self):
    self.cursor.execute("....")
    fetchall = self.cursor.fetchall()
    result = {}
    for row in fetchall:
        detail1 = row['mysite']
        details2 = row['url']
        result[detail1] = row
    return result
Now I need to process the result set as generated:
def genXML(self):
    data = self.getdata()
    doc = Document()  # create the XML tree structure
The idea is that data holds all the rows fetched by the query, so I can extract each column's values from it. Somehow I am not getting the desired output. My requirement is to fetch a result set via a DB query and store it in a placeholder so that I can easily access it later in other methods or locations.
================================================================================
I tried the technique below, but in the method getXML() I am still unable to get each dict row so that I can traverse and manipulate it:
fetchall = self.cursor.fetchall()
results = []
result = {}
for row in fetchall:
    result['mysite'] = row['mysite']
    result['mystart'] = row['mystart']
    ..................................
    results.append(result)
return results
def getXML(self):
    doc = Document()
    charts = doc.createElement("charts")
    doc.appendChild(charts)
    chartData = self.grabChartData()
    for site in chartData:
        print site[??]
So how do I get each chartData row's values so that I can loop over each one?
Note: I found that only the last fetched row's values are printed in chartData. Say the query returns 2 rows; then if I print the list in the getXML() method as below, both rows are the same:
chartData[0]
chartData[1]
How can I uniquely add each result to the list?
Here you are modifying and adding the same dict to results over and over again:
result = {}
for row in fetchall:
    result['mysite'] = row['mysite']
    result['mystart'] = row['mystart']
    ..................................
    results.append(result)
Create the dictionary inside the loop to solve this:
for row in fetchall:
    result = {}  # a fresh dict for each row
    ...
    results.append(result)
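With each row now a distinct dict in results, the getXML() method from the question can traverse them and emit one element per row. A sketch using xml.dom.minidom's Document as in the question; the per-row chart element and its attributes are illustrative:

from xml.dom.minidom import Document

def getXML(self):
    doc = Document()
    charts = doc.createElement("charts")
    doc.appendChild(charts)
    for row in self.grabChartData():
        chart = doc.createElement("chart")  # hypothetical per-row element name
        chart.setAttribute("mysite", str(row['mysite']))
        chart.setAttribute("mystart", str(row['mystart']))
        charts.appendChild(chart)
    return doc.toprettyxml(indent="  ")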
