I have a table in PostGIS with several rasters that share the same spatial reference, but the TIFFs are from different dates. Now I am trying to access the column "rast" to detect changes between rows. My aim is to subtract the pixel values of the first row from the second, then the second from the third, and so on.
How can I iterate over the rows and subtract the pixel values of each row from the following row?
#!/usr/bin/python
# -*- coding: utf-8 -*-
import psycopg2
import sys
conn = None
conn = psycopg2.connect(database="postgres", user="postgres",host="localhost", password="password")
cur = conn.cursor()
cur.execute('SELECT * from my_table')
while True:
    row = cur.fetchone()
    if row is None:
        break
    rast_col = row[1]
I imported several rasters, which cover the same spatial area but have different dates, via the following command:
C:\Program Files\PostgreSQL\9.6\bin>raster2pgsql -s 4326 -F -I "C:\User\Desktop\Data\*.tif" public.all_data|psql -U User -h localhost -p 5432
This is the table that was created in PostgreSQL after importing the data: https://i.stack.imgur.com/uBHX3.jpg
Each row represents one raster image in TIFF format. The column "rast" contains the pixel values. My aim is to calculate the difference between adjacent rows, much like the lag() window function, but that does not work on a raster column type...
The only thing I managed so far was calculating the difference between two raster images. For that I had to create a separate table for each row, as you can see below:
CREATE TABLE table1 AS SELECT * FROM my_table WHERE rid=1;
CREATE TABLE table2 AS SELECT * FROM my_table WHERE rid=2;
And then I did a simple MapAlgebra Operation on both tables like this:
SELECT ST_MapAlgebra(t1.rast,t2.rast, '([rast1]-[rast2])') AS rast INTO diffrence FROM table1 t1, table2 t2;
But this is just the difference between two rasters, and for the MapAlgebra operation I had to create an extra table for each raster image. I have more than 40 raster images in one table, and I want to detect the change between all adjacent rows in the table.
The lag() window function should work on a raster column just like on any other column. It simply selects the value from the row that precedes the current one by some offset within the window frame.
You of course cannot just subtract rasters using plain PostgreSQL operators – not without operator overloading at least.
In order to calculate the differences between adjacent rasters ordered by rid, pass the lagged raster as an argument to ST_MapAlgebra:
SELECT ST_MapAlgebra(rast, lag(rast) OVER (ORDER BY rid DESC),
                     '[rast1] - [rast2]')
FROM my_table;
Since lag() selects rows before the current row in the partition, the rows are ordered by rid in descending order, so 2 comes before 1, and so on. Also, because a window frame by default consists only of rows that come before the current row, this is easier than using lead() with a frame clause that selects rows following the current one.
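For completeness, here is a minimal sketch (untested) of how the query could be run from the psycopg2 script in the question; the connection parameters and my_table come from the question, while the output table name rast_differences is just illustrative:

import psycopg2

conn = psycopg2.connect(database="postgres", user="postgres",
                        host="localhost", password="password")
cur = conn.cursor()
# Materialize one difference raster per adjacent pair of rids; the row with
# the highest rid has no lagged partner, so its difference is NULL.
cur.execute("""
    CREATE TABLE rast_differences AS
    SELECT rid,
           ST_MapAlgebra(rast, lag(rast) OVER (ORDER BY rid DESC),
                         '[rast1] - [rast2]') AS rast
    FROM my_table
""")
conn.commit()
cur.close()
conn.close()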
Disclaimer
I've not used rasters and you may have to fine tune the query to suit your specific needs.
Related
I have a PyTables table with 11 columns and 13,470,621 rows. The first column contains a unique identifier for each row (each identifier appears only once in the table).
This is how I select rows from the table at the moment:
my_annotations_table = h5r.root.annotations
# Loop through table and get rows that match gene identifiers (column labeled gene_id).
for record in my_annotations_table.where("(gene_id == b'gene_id_36624') | (gene_id == b'gene_id_14701') | (gene_id == b'gene_id_14702')"):
    # Do something with the data.
Now this works fine with small queries, but I will routinely need to perform queries in which I match many thousands of unique identifiers in the table's gene_id column. For these larger queries, the query string quickly gets very long and I get an exception:
File "/path/to/my/software/python/python-3.9.0/lib/python3.9/site-packages/tables/table.py", line 1189, in _required_expr_vars
cexpr = compile(expression, '<string>', 'eval')
RecursionError: maximum recursion depth exceeded during compilation
I've looked at this question (What is the PyTables counterpart of a SQL query "SELECT col2 FROM table WHERE col1 IN (val1, val2, val3...)"?), which is somehow similar to mine, but was not satisfactory.
I come from an R background where we often do these kinds of queries (i.e. my_data_frame[my_data_frame$gene_id %in% c("gene_id_1234", "gene_id_1235"),]) and was wondering if there is a comparable solution I could use with PyTables.
Thanks very much,
Another approach to consider is combining 2 functions: Table.get_where_list() with Table.read_coordinates()
Table.get_where_list(): gets the row coordinates fulfilling the given condition.
Table.read_coordinates(): Gets a set of rows given their coordinates (in a list), and returns as a (record) array.
The code would look something like this:
my_annotations_table = h5r.root.annotations
gene_name_list = ['gene_id_36624', 'gene_id_14701', 'gene_id_14702']

# Loop through gene names and get rows that match gene identifiers (column labeled gene_id)
gene_row_list = []
for gene_name in gene_name_list:
    gene_rows = my_annotations_table.get_where_list("gene_id == gene_name")
    gene_row_list.extend(gene_rows)

# Retrieve all of the data in one call
gene_data_arr = my_annotations_table.read_coordinates(gene_row_list)
Okay, I managed to make some satisfactory improvements on this.
1st: optimize the table (with the help of the documentation - https://www.pytables.org/usersguide/optimization.html)
Create the table. Make sure to specify the expectedrows=<int> argument, as it has the potential to increase the query speed.
table = h5w.create_table("/", 'annotations',
                         DataDescr, "Annotation table unindexed",
                         expectedrows=self._number_of_genes,
                         filters=tb.Filters(complevel=9, complib='blosc'))
# tb comes from import tables as tb ...
I also modified the input data so that the gene_id_12345 fields are simple integers (gene_id_12345 becomes 12345).
Once the table is populated with its 13,470,621 entries (i.e. rows),
I created a complete sorted index based on the gene_id column (Column.create_csindex()) and sorted it.
table.cols.gene_id.create_csindex()
table.copy(overwrite=True, sortby='gene_id', newname="Annotation table", checkCSI=True)
# Just make sure that the index is usable. Will print an empty list if not.
print(table.will_query_use_indexing('(gene_id == 57403)'))
2nd - The table is optimized, but I still couldn't query thousands of gene_ids at a time. So I simply split them into chunks of 31 gene_ids (yes, 31 was the absolute maximum; 32 was apparently too much).
I did not perform benchmarks, but querying ~8000 gene_ids now takes approximately 10 seconds which is acceptable for my needs.
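For reference, a minimal sketch (untested) of the chunked lookup described above; the chunk size of 31 and the integer gene_id values come from the text, while the function and variable names are just illustrative:

def read_genes_in_chunks(table, gene_ids, chunk_size=31):
    # Query at most chunk_size gene_ids per call to stay under the
    # expression-size limit, then concatenate the results.
    results = []
    for i in range(0, len(gene_ids), chunk_size):
        chunk = gene_ids[i:i + chunk_size]
        condition = " | ".join("(gene_id == {})".format(g) for g in chunk)
        results.extend(table.read_where(condition))
    return results

gene_data = read_genes_in_chunks(h5r.root.annotations, [57403, 36624, 14701])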
So currently I have one script inserting sensor data into my weather database every hour using Python.
I have now added a second script to add rainfall data into the same table, also every hour.
Now the problem: when the second script inserts, all the other values get 'zeroed', as displayed in Grafana.
Am I overwriting something somewhere? If someone could point me in the right direction, that would be great.
Weather sensors insert statement
sql=("INSERT INTO WEATHER_MEASUREMENT (AMBIENT_TEMPERATURE, AIR_PRESSURE, HUMIDITY) VALUES ({},{},{})".format(temperature,pressure,humidity))
mycursor.execute(sql)
weatherdb.commit()
Rainfall sensors insert
sql=("INSERT INTO WEATHER_MEASUREMENT (RAINFALL) VALUES ({})".format(rainfall))
mycursor.execute(sql)
weatherdb.commit()
Tell me if I understand it right:
Your table “WEATHER_MEASUREMENT” has 4 columns (apart from the ID): AMBIENT_TEMPERATURE, AIR_PRESSURE, HUMIDITY and RAINFALL.
When you add a RAINFALL value it creates a new row in your table with the other columns set to NULL, and this is the problem?
If this is the case, you probably want to update existing row with a query like:
sql = ("""
    UPDATE WEATHER_MEASUREMENT
    SET RAINFALL = {}
    WHERE id_of_the_row = {}
""".format(rainfall, id))
mycursor.execute(sql)
You will need to find a way to figure out the ID of the row you just created with your Weather sensor insert statement (maybe search for last inserted row if you are sure of timings).
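As a rough illustration of that last point, here is an untested sketch for the rainfall script, assuming MySQL, a driver that accepts %s placeholders (e.g. mysql.connector), and an AUTO_INCREMENT primary key column named ID (these are assumptions, not details from the question): instead of inserting a new row, update the most recently inserted one.

# Update the last row the weather script inserted (highest ID) instead of
# creating a new, mostly NULL row. The %s placeholder also avoids building
# SQL via string formatting.
sql = "UPDATE WEATHER_MEASUREMENT SET RAINFALL = %s ORDER BY ID DESC LIMIT 1"
mycursor.execute(sql, (rainfall,))
weatherdb.commit()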
I'm hoping to duplicate my techniques for looping through tables in R using Python in the ArcGIS/arcpy framework. Specifically, is there a practical way to loop through the rows of an attribute table using Python and copy that data based on values from previous rows?
For example, using R I would use code similar to the following to copy rows of data from one table that have unique values for a specific variable:
## table name: data
## variable of interest: variable
## new table: new.data
for (i in 2:nrow(data))
{
  if (data$variable[i] != data$variable[i-1])
  {
    new.data <- rbind(new.data, data[i,])
  }
}
If I've written the above code correctly, then in words this for-loop simply checks whether the current value in the table differs from the previous value, and adds all column values for that row to the new table if it is in fact a new value. Any help with this thought process would be great.
Thanks!
To just get the unique values of a field in a table in arcpy:
import arcpy
table = "mytable"
field = "my_field"
# ArcGIS 10.0
unique_values = set(row.getValue(field) for row in iter(arcpy.SearchCursor(table).next, None))
# ArcGIS 10.1+
unique_values = {row[0] for row in arcpy.da.SearchCursor(table, field)}
Yes, to loop through values in a table using arcpy you want to use a cursor. It's been a while since I've used arcpy, but if I recall correctly the one you want is a search cursor. In its simplest form, this is what it would look like:
import arcpy

curObj = arcpy.SearchCursor(r"C:/shape.shp")
row = curObj.next()
while row:
    columnValue = row.getValue("columnName")
    row = curObj.next()
As of version 10 (I think) they introduced a data access cursor, which is orders of magnitude faster. Data access (da) cursors require you to declare which columns you want returned when you create the cursor. Example:
import arcpy

columns = ['column1', 'something', 'someothercolumn']
curObj = arcpy.da.SearchCursor(r"C:/somefile.shp", columns)
for row in curObj:
    print 'column1 is', row[0]
    print 'someothercolumn is', row[2]
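To tie this back to the original R loop, here is a rough sketch (untested; arcpy 10.1+ da cursors, with placeholder table and field names, and the destination table assumed to already exist with the same schema): remember the previous value of the field of interest and copy a row into the new table whenever the value changes.

import arcpy

fields = ["my_field", "other_field"]      # columns to carry over
prev = object()                           # sentinel that never equals real data
with arcpy.da.SearchCursor("mytable", fields) as search_cur, \
        arcpy.da.InsertCursor("new_table", fields) as insert_cur:
    for row in search_cur:
        if row[0] != prev:                # value differs from the previous row
            insert_cur.insertRow(row)
        prev = row[0]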
I want to develop a script to update individual cells (the rows of a specific column) of an attribute table based on the value of the cell immediately before it, as well as data in other columns of the same row. I'm sure this can be done with cursors, but I'm having trouble conceptualizing exactly how to tackle it.
Essentially what I want to do is this:
If Column A, row 13 = a certain value AND Column B, row 13 = a certain value (but different from A), then change Column A, row 13 to be the same value as Column A, row 12.
If this can't be done with cursors, then maybe some kind of array or matrix, or list of lists, would be the way to go? I'm basically looking for the best direction to take with this. EDIT: My files are shapefiles, but I also have them in .csv format. My code is really basic right now:
import arcpy
from arcpy import env
env.workspace = "C:/All Data Files/My Documents All/My Documents/wrk"
inputLyr = "C:/All Data Files/My Documents All/My Documents/wrk/file.lyr"
fields = ["time", "lon", "activityIn", "time", "fixType"]
cursor180 = arcpy.da.SearchCursor(inputLyr, fields, """"lon" = -180""")
for row in cursor180:
    # Print the rows that have no data, along with activity intensity
    print row[0], row[1], row[2]
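Building on that starting point, a rough sketch (untested; the field names "colA"/"colB" and the trigger values are placeholders, not taken from the question) of how an arcpy.da.UpdateCursor could apply the rule described above: remember the previous row's value of column A and overwrite the current value when both conditions are met.

import arcpy

trigger_a = 0          # placeholder for "a certain value" in column A
trigger_b = "no_fix"   # placeholder for "a certain value" in column B

prev_a = None
with arcpy.da.UpdateCursor("file.lyr", ["colA", "colB"]) as cursor:
    for row in cursor:
        if prev_a is not None and row[0] == trigger_a and row[1] == trigger_b:
            row[0] = prev_a            # copy column A's value from the previous row
            cursor.updateRow(row)
        prev_a = row[0]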
I'm using cx_Oracle to select rows from one database and then insert those rows to a table in another database. The 2nd table's columns match the first select.
So I have (simplified):
db1_cursor.execute('select col1, col2 from tab1')
rows = db1_cursor.fetchall()
db2_cursor.bindarraysize = len(rows)
db2_cursor.setinputsizes(cx_Oracle.NUMBER, cx_Oracle.BINARY)
db2_cursor.executemany('insert into tab2 values (:1, :2)', rows)
This works fine, but my question is how to avoid the hard coding in setinputsizes (I have many more columns).
I can get the column types from db1_cursor.description, but I'm not sure how to feed those into setinputsizes. i.e. how can I pass a list to setinputsizes instead of arguments?
Hope this makes sense - I'm new to Python and cx_Oracle.
Just use tuple unpacking.
e.g.
db_types = (d[1] for d in db1_cursor.description)
db2_cursor.setinputsizes(*db_types)
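Putting the question and answer together, a short sketch (untested; the connection strings are placeholders, while the table and column names come from the question) of the whole flow:

import cx_Oracle

# Connection details are placeholders.
db1 = cx_Oracle.connect("user1/pass1@source_dsn")
db2 = cx_Oracle.connect("user2/pass2@target_dsn")
db1_cursor = db1.cursor()
db2_cursor = db2.cursor()

db1_cursor.execute('select col1, col2 from tab1')
rows = db1_cursor.fetchall()

# Derive the bind types for the target from the source cursor's description
# instead of hard-coding them in setinputsizes().
db_types = tuple(d[1] for d in db1_cursor.description)

db2_cursor.bindarraysize = len(rows)
db2_cursor.setinputsizes(*db_types)
db2_cursor.executemany('insert into tab2 values (:1, :2)', rows)
db2.commit()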