How to read a SQL query into a pandas dataframe / python / django

I'm using the code below in views.py to fetch data for my app:
from django.db import connection
from django.shortcuts import render

def test(request):
    cursor = connection.cursor()
    sql = """
        SELECT x, n
        FROM table1
        LIMIT 10
    """
    cursor.execute(sql)
    rows = cursor.fetchall()
    # df1 = pd.read_sql_query(sql, cursor)  <== not working
    # df1.columns = cursor.keys()           <== not working
    return render(request, 'app/test.html', {"row": rows})
I am able to print rows in test.html and I get a list that looks something like this:
row((x1,yvalue1),(x2,yvalue2) , .... ))
But what I'm trying to do is get all the fetched data together with its column names and put it into a dataframe, hopefully so I can use something like this:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_query.html#pandas.read_sql_query

I think aus_lacy is a bit off in his solution - first you have to convert the QuerySet to a string containing the SQL backing the QuerySet
import pandas
from django.db import connection

query = str(ModelToRetrive.objects.all().query)
df = pandas.read_sql_query(query, connection)
There is also a less memory-efficient but still valid solution:
df = pandas.DataFrame(list(ModelToRetrive.objects.values('id', 'some_attribute_1', 'some_attribute_2')))
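If you prefer to keep the raw cursor from the original question, the standard DB-API cursor.description attribute gives you the column names; a minimal sketch (names follow the question's code):
import pandas as pd
from django.db import connection

cursor = connection.cursor()
cursor.execute("SELECT x, n FROM table1 LIMIT 10")
columns = [col[0] for col in cursor.description]  # column names from the DB-API metadata
df1 = pd.DataFrame(cursor.fetchall(), columns=columns)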

You need to use Django's built-in QuerySet API; more information on it can be found in the Django documentation. Once you create a QuerySet you can then use pandas' read_sql_query method to construct the data frame. The simplest way to construct a QuerySet is to simply query the entire table, which can be done like so:
db_query = YourModel.objects.all()
You can use filters, passed in as arguments when querying the database, to create different QuerySet objects depending on your needs.
Then using pandas you could do something like:
d_frame = pandas.read_sql_query(db_query, other_args...)

Related

How to get column names in a SQLAlchemy query?

I have a function in a remote database (no models for it in my app) that I am calling. Is it possible to get the column names using query rather than execute?
session = Session(bind=engine)
data = session.query(func.schema.func_name())
I am getting back an array of strings with the values; how do I get the keys? I want to generate a dict.
When I make a request with an execute, the dictionary is generated fine.
data = session.execute("select * from schema.func_name()")
result = [dict(row) for row in data]
You can do something like:
keys = session.execute("select * from schema.func_name()").keys()
Or try accessing it after the query:
data = session.query(func.schema.func_name()).all()
data[0].keys()
You can also use column_descriptions, which is available on the Query object itself (i.e. before calling .all()).
Documentation:
https://docs.sqlalchemy.org/en/14/orm/query.html
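Putting the pieces together, a minimal sketch (assuming SQLAlchemy 1.4, a bound Session, and the hypothetical server-side function schema.func_name() from the question):
from sqlalchemy import func, text

# Column metadata straight from the Query object, before fetching any rows
query = session.query(func.schema.func_name())
print([col["name"] for col in query.column_descriptions])

# Building a list of dicts from a raw execute(), using the result's keys()
result = session.execute(text("select * from schema.func_name()"))
keys = result.keys()
rows = [dict(zip(keys, row)) for row in result]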

while iterating over a pandas Series, query an SQLite database with each member of the Series

I have a pandas Series made from the following python dictionary, so:
gr8 = {'ERF13' : 'AT2G44840', 'BBX32' : 'AT3G21150', 'NAC061' : 'AT3G44350', 'NAC090' : 'AT5G22380', 'ERF019' : 'AT1G22810'}
gr8obj = pd.Series(gr8)
( where I have previously imported pandas as pd )
I have an SQLite database, AtRegnet.db
I want to iterate over the pandas Series, gr8obj, and query the database, AtRegnet.db, for each member of the series.
This is what I have tried:
for i in gr8obj:
    resdf = pd.read_sql('SELECT * FROM AtRegNet WHERE TargetLocus = ?' (i), con=sqlite3.connect("/home/anno/SQLiteDBs/AtRegnet.db"))
    fresdf = resdf.append(resdf)

fresdf
( the table in the AtRegnet.db that I want is AtRegNet and the column I am searching on is called TargetLocus. )
I know that when I work on the SQLite3 database directly with a SQL command,
select * from AtRegNet where TargetLocus="AT3G23230"
that I get back 80 lines from the database. (AT3G23230 is one of the members of gr8obj.)
You can try using an f-string. The value for TargetLocus in your query should also be in quotes, and read_sql needs the connection as well:
resdf = pd.read_sql(f"SELECT * FROM AtRegNet WHERE TargetLocus = '{i}'", con=sqlite3.connect("/home/anno/SQLiteDBs/AtRegnet.db"))
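A parameterized variant is also worth considering, since it avoids interpolating values into the SQL string; a minimal sketch reusing the database path and table name from the question:
import sqlite3
import pandas as pd

con = sqlite3.connect("/home/anno/SQLiteDBs/AtRegnet.db")
frames = [
    pd.read_sql("SELECT * FROM AtRegNet WHERE TargetLocus = ?", con, params=(locus,))
    for locus in gr8obj
]
fresdf = pd.concat(frames, ignore_index=True)
con.close()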

sqlite selecting multiple tables

I have a database in sqlite with c.300 tables. Currently I am iterating through a list and appending the data.
Is there a faster / more Pythonic way of doing this?
df = []
for i in Ave.columns:
    try:
        df2 = get_mcap(i)
        df.append(df2)
        # print(i)
    except:
        pass

df = pd.concat(df, axis=0)
Ave is a dataframe whose columns form the list I want to iterate through.
def get_mcap(Ticker):
    cnx = sqlite3.connect('Market_Cap.db')
    df = pd.read_sql_query("SELECT * FROM '%s'" % (Ticker), cnx)
    df.columns = ['Date', 'Mcap-Ave', 'Mcap-High', 'Mcap-Low']
    df = df.set_index('Date')
    df.index = pd.to_datetime(df.index)
    cnx.close()  # note: the original had cnx.close without parentheses, which never closes the connection
    return df
Before I post my solution, a quick warning: you should never use string manipulation to generate SQL queries unless it's absolutely unavoidable, and in such cases you need to be certain that you control the data used to format the strings, so that the query can't be made to do something unintended.
With that said, this seems like one of those situations where you do need to use string formatting, since you cannot pass table names as parameters. Just make sure there's no way that users can alter what is contained within your list of tables.
Onto the solution. It looks like you can get your list of tables using:
tables = Ave.columns.tolist()
For my simple example, I'm going to use:
tables = ['table1', 'table2', 'table3']
Then use the following code to generate a single query:
query_template = 'select * from {}'
query_parts = []
for table in tables:
    query = query_template.format(table)
    query_parts.append(query)

full_query = ' union all '.join(query_parts)
Giving:
'select * from table1 union all select * from table2 union all select * from table3'
You can then simply execute this one query to get your results:
cnx = sqlite3.connect('Market_Cap.db')
df = pd.read_sql_query(full_query, cnx)
Then from here you should be able to set the index, convert to datetime etc, but now you only need to do these operations once rather than 300 times. I imagine the overall runtime of this should now be much faster.
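For completeness, a minimal sketch of that one-time post-processing, reusing the column names from the question's get_mcap():
# Applied once to the combined frame instead of once per table
df.columns = ['Date', 'Mcap-Ave', 'Mcap-High', 'Mcap-Low']
df = df.set_index('Date')
df.index = pd.to_datetime(df.index)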

Using Unnest With psycopg2

I've built a Web UI to serve as an ETL application that allows users to select CSV and TSV files containing large numbers of records, which I am attempting to insert into a PostgreSQL database. As has already been well commented on, this process is kind of slow. After some research it looked like using the UNNEST function would be my answer, but I'm having trouble implementing it. Honestly, I just didn't find a great walk-through tutorial as I normally do when researching any data processing in Python.
Here's the SQL string as I store it (to be used in functions later):
salesorder_write = """
INSERT INTO api.salesorder (
site,
sale_type,
sales_rep,
customer_number,
shipto_number,
cust_po_number,
fob,
order_number
) VALUES (
UNNEST(ARRAY %s)
"""
I use this string along with a list of tuples like so:
tup_list = []
for order in orders:
    inputs = (
        order['site'],
        order['sale_type'],
        order['sales_rep'],
        order['customer_number'],
        order['shipto_number'],
        order['cust_po_number'],
        order['fob'],
        order['order_number']
    )
    tup_list.append(inputs)

cur.execute(strSQL, tup_list)
This gives me the error Not all arguments converted during string formatting. My first question is: how do I need to structure my SQL to be able to pass it my list of tuples? My second: can I use the existing dictionary structure in much the same way?
unnest is not superior to execute_values, which has been the canonical approach since Psycopg 2.7:
from psycopg2.extras import execute_values

orders = [
    dict(
        site='x',
        sale_type='y',
        sales_rep='z',
        customer_number=1,
        shipto_number=2,
        cust_po_number=3,
        fob=4,
        order_number=5
    )
]

salesorder_write = """
    insert into t (
        site,
        sale_type,
        sales_rep,
        customer_number,
        shipto_number,
        cust_po_number,
        fob,
        order_number
    ) values %s
"""

execute_values(
    cursor,
    salesorder_write,
    orders,
    template="""(
        %(site)s,
        %(sale_type)s,
        %(sales_rep)s,
        %(customer_number)s,
        %(shipto_number)s,
        %(cust_po_number)s,
        %(fob)s,
        %(order_number)s
    )""",
    page_size=1000
)
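For context, a minimal sketch of the surrounding connection handling (the connection string is a placeholder, not taken from the original post):
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
cursor = conn.cursor()
# ... execute_values(cursor, salesorder_write, orders, ...) as above ...
conn.commit()
cursor.close()
conn.close()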

how to use two Lists to insert multiple records in Django Model query

I have three tables:
table1, table2, table3
From table1 I get the data using:
table1queryset = table1.objects.filter(token=123)
It gives me 50 records.
From table2 I get the data using:
table2queryset = table2.objects.filter(name='andy')
It gives me 10 records.
The table3 structure is:
mytoken = models.ForeignKey(table1)
myname = models.ForeignKey(table2)
Now, for every table1 record I want to insert a table2 record into table3, like:
for eachT1 in table1queryset:
    for eachT2 in table2queryset:
        table3(mytoken=eachT1, myname=eachT2).save()
In my case it will insert 50*10 = 500 records.
What is the most efficient way of doing this?
Can I assign both querysets to the query, something like:
table3(mytoken=table1queryset,myname=table2queryset).save()
Look at RelatedManager.add. It is promised that "Using add() with a many-to-many relationship, however, will not call any save() methods, but rather create the relationships using QuerySet.bulk_create()."
If you define a ManyToManyField on the first model and tell Django that this relationship must work through table3 (via ManyToManyField.through), you can do:
t2s = [x.id for x in tq2]      # tq2 = table2queryset from the question
for t1 in tq1:                 # tq1 = table1queryset
    t1.my_m2m_field.add(*t2s)  # "my_m2m_field" stands for the ManyToManyField described above
This will probably be more efficient than creating separate instances of the table3 model.
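If you'd rather keep table3 as an explicit model instead of adding a ManyToManyField, a minimal sketch using bulk_create (the same mechanism the quoted documentation relies on), with the model and field names from the question:
rows = [
    table3(mytoken=each_t1, myname=each_t2)
    for each_t1 in table1queryset
    for each_t2 in table2queryset
]
table3.objects.bulk_create(rows)  # inserts all 500 rows in one (or a few) queries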
