I am trying to automatically process data in Python using a direct query to our SQL database. That part is done, but the code hard-codes the runnumber, so every time a new batch starts I have to look up the newest runnumber and paste it into the code before I can proceed - a recurring waste of time!
df = pd.read_sql(f"Select time, temp from datatable where machineid=84207 and runnumber=1616862158", conn)
I'd like the code to pick up the most recent runnumber automatically (the most recent runnumber is always the maximum value in the database for each machine) without having to look it up and type it in. There are many different machines all collecting data into the SQL database, hence specifying a dataframe per machineid. Again, that part of the code is finished and I can replicate it for every machine. I just don't want to hard-code the runnumber for each one, since it changes over time.
So I'm trying to find a way to look up the max runnumber automatically, for example like this:
pp1 = pd.read_sql(f"Select Max(runnumber) from datatable where machineid=84207", conn)
print(pp1)
max
0 1616862158
Is there a way I can substitute the value contained in "pp1" into the query string below, or set runnumber to that max directly? I'm not familiar enough with the syntax of the WHERE clause or the Python side to set this up. Can someone help?
df = pd.read_sql(f"Select time, temp from datatable where machineid=84207 and runnumber=1616862158", conn)
Thank you in advance!
You could add a subquery in the WHERE clause, like this:
select time, temp from datatable where machineid=84207 and runnumber=(select max(runnumber) from datatable where machineid=84207)
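If you want to do this directly from pandas, the same subquery can be dropped into read_sql; or, if you prefer the two-step approach from the question, you can pull the scalar out of pp1 and interpolate it. A sketch of both, not tested against your schema (the maxrun alias is only there so the column has a predictable name):

import pandas as pd

# Option 1: let the database resolve the latest runnumber via a subquery
df = pd.read_sql(
    "SELECT time, temp FROM datatable "
    "WHERE machineid = 84207 AND runnumber = ("
    "    SELECT MAX(runnumber) FROM datatable WHERE machineid = 84207)",
    conn,
)

# Option 2: fetch the max first, then substitute the scalar into the query string
pp1 = pd.read_sql("SELECT MAX(runnumber) AS maxrun FROM datatable WHERE machineid = 84207", conn)
latest_run = int(pp1["maxrun"].iloc[0])
df = pd.read_sql(
    f"SELECT time, temp FROM datatable WHERE machineid = 84207 AND runnumber = {latest_run}",
    conn,
)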
I am working with Cloud Firestore through its Python API. I need a where clause to extract the users whose account-processed date is earlier than their last-updated date.
The problem with these dates is that they are in tick format:
'last_processed': 637053568312425740,
'last_updated': 637053568312425740,
They appear to be int values, so I should be able to make a simple comparison and only take the rows that meet this condition, but it is not working. This is what I have done so far:
persons = db.collection(u'collections').where(u'last_processed', u'<', u'last_updated')
person_docs = persons.stream()
for person_doc in person_docs:
    print(u'{} => {}'.format(person_doc.id, person_doc.to_dict()))
I can extract all the content of this collection without the where clause, and where clauses on other fields work fine, so can someone explain why this one is not working?
This type of call can't be made using Firestore. This is more similar to a SQL statement that would run some operation before returning data.
With the type of operation you're trying to do here, you're better off doing it client side or storing an indicator with the data when you initially add it.
I would add a boolean marker that you can query on (e.g. everything from blanklabelcom_persons where shouldBeProcessed == true), and flip it to false once the document has been processed.
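A rough sketch of that approach with the Python client; the shouldBeProcessed field, the collection name and the process() call are placeholders from this answer, not part of your existing data model:

from google.cloud import firestore

db = firestore.Client()

# Query only the documents that still need processing
pending = db.collection(u'blanklabelcom_persons').where(u'shouldBeProcessed', u'==', True).stream()

for doc in pending:
    process(doc.to_dict())  # your own processing logic goes here
    # Clear the marker so the document is not picked up again next time
    db.collection(u'blanklabelcom_persons').document(doc.id).update({u'shouldBeProcessed': False})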
I am unable to fetch a limited number of rows; I always get an error whenever I attempt it.
Please note that I'm also trying to paginate the limited set of rows that is fetched.
The program works fine without the limit: it fetches randomly and paginates, but it throws an error when I try using .limit().
def question(page=1):
    # questions = Question.query.paginate(page, per_page=1)
    quess = Question.query.order_by(func.rand())
    quest = quess.limit(2)
    questions = quest.paginate(page, per_page=1)
This is the error I keep getting...
sqlalchemy.exc.InvalidRequestError
InvalidRequestError: Query.order_by() being called on a Query which already has LIMIT or OFFSET applied. To modify the row-limited results of a Query, call from_self() first. Otherwise, call order_by() before limit() or offset() are applied.
You can't call order_by() after limit() on the same query; SQLAlchemy doesn't allow that. Here the offending order_by() call comes from paginate() itself: when it counts the total rows it calls order_by(None) on your query, which by that point already has the LIMIT from .limit(2) applied, and that is what raises the error.
Try calling from_self() after limit() and before paginate(), so the limited query is wrapped in a subquery:
(Question.query.order_by(func.rand())
 .limit(2)
 .from_self()
 .paginate(page, per_page=1))
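Put back into the view, the whole thing might look like this (a sketch, assuming func is imported from sqlalchemy and Question is your existing model; note that the original function also never returned the pagination object):

from sqlalchemy import func

def question(page=1):
    # pick two random questions, then paginate the limited result one per page
    questions = (
        Question.query.order_by(func.rand())
        .limit(2)
        .from_self()
        .paginate(page, per_page=1)
    )
    return questions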
I am querying a postgresql database through python's psycopg2 package.
In short: the problem is that psycopg2's fetchmany() yields a different table every time I re-run the cursor's execute() command.
import psycopg2 as ps

conn = ps.connect(database='database', user='user')
crs = conn.cursor()  # create a (client-side) cursor on the connection

nlines = 1000
tab = "some_table"
statement = "SELECT * FROM " + tab + " LIMIT %d;" % (nlines)
crs.execute(statement)
Then I fetch the data in pieces. Running the following works fine, and each time I scroll back to the beginning I get the same results:
rec=crs.fetchmany(10)
crs.scroll(0, mode='absolute')
print rec[-1][-2]
However, if I run crs.execute(statement) again and then fetch the data, it yields completely different output. I tried running ps.connect again, doing conn.rollback(), conn.reset(), crs.close(), and nothing ever resulted in consistent output from the table. I also tried a named cursor with scrolling enabled:
crs = conn.cursor(name="cur1")
crs.scrollable = 1
...
crs.scroll(0, mode='absolute')
still no luck.
You don't have an ORDER BY clause in your query, and Postgres does not guarantee any particular row ordering without one. The ordering is particularly likely to change for tables with a lot of churn (i.e. lots of inserts/updates/deletes).
See the Postgres SELECT doc for more details, but the most salient snippet here is this:
If the ORDER BY clause is specified, the returned rows are sorted in
the specified order. If ORDER BY is not given, the rows are returned
in whatever order the system finds fastest to produce.
I wouldn't expect any query of this type, regardless of the kind of cursor used, to necessarily return the exact same result set each time.
What happens when you add an explicit ORDER BY?
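As a quick check with the snippet above, you could sort on the table's primary key; the id column here is only a placeholder, so use whatever key some_table actually has:

# same query as before, but with a deterministic ordering
statement = "SELECT * FROM " + tab + " ORDER BY id LIMIT %d;" % (nlines)
crs.execute(statement)
rec = crs.fetchmany(10)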
I am working on a temperature sensor network that uses 1-Wire temperature sensors and runs on a Raspberry Pi 2. I am following this tutorial, and as I went along I realized that its setup is for a single temperature sensor, whereas mine needs to work with multiple sensors.
Because I have multiple sensors, I also need to be able to tell them apart, so I want three columns in the SQLite table: the sensor readout, the date and time, and the sensor name. The problem is that when I configure the Python script to write these three values to the table, I get an error.
Here is the code that raises the error when executed:
#!/usr/bin/env python

import sqlite3
import os
import time
import glob

# global variables
speriod = (15*60) - 1
dbname = '/var/www/templog.db'

# store the temperature in the database
def log_temperature(temp):
    conn = sqlite3.connect(dbname)
    curs = conn.cursor()
    sensor1 = 'Sensor1'
    curs.execute("INSERT INTO temps values(datetime('now'), (?,?))" (temp, sensor1))
    # commit the changes
    conn.commit()
    conn.close()
"INSERT INTO temps values(datetime('now'), (?,?))" (temp, sensor1)
Breaking this down, you have a string literal immediately followed by a parenthesized expression, which Python parses as a function call. That is nonsensical here: you are trying to call a string as if it were a function, hence the error about str not being callable, which is admittedly cryptic if you are not experienced with Python. Essentially you are missing a comma:
curs.execute("INSERT INTO temps values(datetime('now'), (?,?))", (temp, sensor1))
Now you will get the ? placeholders correctly filled in.
Often the "str is not callable" error is the result of a typo like this, or of a duplicated variable name (you think you are calling a function but the variable actually holds a string), so start by looking for those problems when you see this type of error.
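A tiny, hypothetical illustration of what Python sees once the comma is gone:

s = "INSERT INTO temps values(datetime('now'), (?,?))"
s(temp, sensor1)  # TypeError: 'str' object is not callable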
You have to put a comma there:
curs.execute("INSERT INTO temps values(datetime('now'), (?,?))" , (temp, sensor1))
From the documentation
Put ? as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor’s execute() method
As you can see, you need to provide the tuple of values as the second argument to execute().
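For completeness, a sketch of the whole function with the comma in place; the sensor_name parameter is just one way to pass the name in, and note that if temps really has three columns (timestamp, temperature, sensor name) the extra parentheses around the placeholders would also need to go so that three values are supplied:

import sqlite3

dbname = '/var/www/templog.db'

def log_temperature(temp, sensor_name='Sensor1'):
    conn = sqlite3.connect(dbname)
    curs = conn.cursor()
    # three values for three columns: timestamp, reading, sensor name
    curs.execute("INSERT INTO temps VALUES (datetime('now'), ?, ?)", (temp, sensor_name))
    conn.commit()
    conn.close()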
I have a database with roughly 30 million entries, which is a lot, and I expect nothing but trouble working with that many rows.
But using py-postgresql and its .prepare() statement, I was hoping I could fetch entries on a "yield" basis and thus avoid filling up my memory with all the results from the database, which apparently I can't?
This is what I've got so far:
import postgresql

user = 'test'
passwd = 'test'
db = postgresql.open('pq://' + user + ':' + passwd + '@192.168.1.1/mydb')
results = db.prepare("SELECT time FROM mytable")
uniqueue_days = []

with db.xact():
    for row in results():
        if not row['time'] in uniqueue_days:
            uniqueue_days.append(row['time'])

print(uniqueue_days)
Before even getting to if not row['time'] in uniqueue_days: I run out of memory, which isn't so strange considering results() probably fetches all the rows before looping through them?
Is there a way to get the postgresql library to "page" or batch the results, say 60k per round, or perhaps rework the query so the database does more of the work?
Thanks in advance!
Edit: I should mention that the dates in the database are Unix timestamps, and I intend to convert them into %Y-%m-%d format before adding them to the uniqueue_days list.
If you were using the better-supported psycopg2 extension, you could loop over the cursor, or call fetchone, to get just one row at a time, since psycopg2 can back its cursor with a server-side portal (a named cursor).
If py-postgresql doesn't support something similar, you could always explicitly DECLARE a cursor on the database side and FETCH rows from it progressively. I don't see anything in the documentation that suggests py-postgresql can do this for you automatically at the protocol level like psycopg2 does.
Usually you can switch between database drivers fairly easily, but py-postgresql doesn't seem to follow the Python DB-API, so switching will take a few more changes. I'd still recommend it.
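A minimal sketch of what that would look like with psycopg2 and a named (server-side) cursor; the cursor name and itersize are arbitrary choices, and the connection details are taken from the question:

import psycopg2

conn = psycopg2.connect(host='192.168.1.1', dbname='mydb', user='test', password='test')

unique_days = set()
with conn.cursor(name='day_cursor') as crs:  # named cursor -> server-side portal
    crs.itersize = 60000                     # rows pulled from the server per round trip
    crs.execute("SELECT time FROM mytable")
    for (t,) in crs:
        unique_days.add(t)

print(len(unique_days))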
You could let the database do all the heavy lifting. For example, instead of reading all the data into Python and then calculating the unique dates, why not try something like this:
SELECT DISTINCT DATE(to_timestamp(time)) AS UNIQUE_DATES FROM mytable;
If you want to strictly enforce sort order on unique_dates returned then do the following:
SELECT DISTINCT DATE(to_timestamp(time)) AS UNIQUE_DATES
FROM mytable
order by 1;
Useful references for the functions used above:
Date/Time Functions and Operators
Data Type Formatting Functions
If you would like to read the data in chunks, you could use the dates returned by the query above to subset your results further down the line (a sketch follows below). For example:
'SELECT * FROM mytable WHERE time BETWEEN ' + UNIQUE_DATES[i] + ' AND ' + UNIQUE_DATES[j]
where UNIQUE_DATES[i] and UNIQUE_DATES[j] are values you would pass in from Python.
I will leave it to you to figure out how to convert the dates into Unix timestamps.
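As a rough sketch of that chunked follow-up: the day_bounds helper below is mine, it assumes the time column holds UTC Unix timestamps, the example date is arbitrary, and the parameter style shown is psycopg2's (crs is a fresh cursor on the connection from the sketch above), so adjust it to whichever driver you end up using:

import calendar
import datetime

def day_bounds(day):
    """Return (start, end) Unix timestamps covering one calendar day in UTC."""
    start = calendar.timegm(day.timetuple())
    return start, start + 86400

start_ts, end_ts = day_bounds(datetime.date(2014, 1, 1))  # one of the unique days

crs = conn.cursor()  # regular client-side cursor for the small per-day result
crs.execute("SELECT * FROM mytable WHERE time >= %s AND time < %s", (start_ts, end_ts))
rows = crs.fetchall()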