PostGIS inserts become slow after some time - Python

I have a location table with the following structure:
CREATE TABLE location
(
    id BIGINT,
    location GEOMETRY,
    CONSTRAINT location_pkey PRIMARY KEY (id, location),
    CONSTRAINT enforce_dims_geom CHECK (st_ndims(location) = 2),
    CONSTRAINT enforce_geotype_geom CHECK (geometrytype(location) = 'POINT'::TEXT OR location IS NULL),
    CONSTRAINT enforce_srid_geom CHECK (st_srid(location) = 4326)
)
WITH (
    OIDS=FALSE
);
CREATE INDEX location_geom_gist ON location
    USING GIST (location);
I run the following query to insert data:
def insert_location_data(msisdn, lat, lon):
    if not (lat and lon):
        return
    query = "INSERT INTO location (id, location) VALUES ('%s', ST_GeomFromText('POINT(%s %s)', 4326))"%(str(id), str(lat), str(lon))
    try:
        cur = get_cursor()
        cur.execute(query)
        conn.commit()
    except:
        tb = traceback.format_exc()
        Logger.get_logger().error("Error while inserting location in sql: %s", str(tb))
        return False
    return True
I run this block of code 10,000,000 times in a loop, but somewhere after 1 million inserts the insert speed drops drastically. The speed returns to normal when I restart the script, but it drops again after about a million inserts, and the same trend continues. I cannot figure out why.
Any help would be appreciated.

Here are a few tips.
Watch out for str(id): the question never defines a variable named id, so this refers to the built-in id() function and always produces the string '<built-in function id>'. You probably meant msisdn.
The correct axis order for PostGIS is (X Y), i.e. (lon lat), so the longitude has to come first in the POINT.
There are more efficient ways to insert points.
Don't format a string to insert
This is how to insert one point:
cur.execute(
    "INSERT INTO location (id, location) "
    "VALUES (%s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))",
    (msisdn, lon, lat))
And see executemany if you want to insert more records at a time, where you would prepare a list of parameters to insert (i.e. [(msisdn, lon, lat), (msisdn, lon, lat), ..., (msisdn, lon, lat)]).
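As a rough sketch of the executemany approach, assuming a DB-API cursor that uses the %s parameter style (as psycopg2 does): the helper name insert_locations and the shape of the points argument are made up for illustration, and committing is left to the caller.

```python
INSERT_SQL = (
    "INSERT INTO location (id, location) "
    "VALUES (%s, ST_SetSRID(ST_MakePoint(%s, %s), 4326))"
)


def insert_locations(cur, points):
    """Insert many (msisdn, lat, lon) tuples with one executemany call.

    `cur` is assumed to be a DB-API cursor with the %s paramstyle.
    Returns the number of parameter rows handed to the driver.
    """
    # Skip rows with a missing coordinate. Compare against None rather than
    # relying on truthiness, because 0.0 is a valid coordinate.
    # PostGIS expects (X Y), so longitude goes before latitude.
    params = [
        (msisdn, lon, lat)
        for msisdn, lat, lon in points
        if lat is not None and lon is not None
    ]
    if params:
        cur.executemany(INSERT_SQL, params)
    return len(params)
```

With this shape, a caller could collect a few thousand points, call insert_locations once, and then commit once for the whole batch instead of once per row.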

Related

Python sqlite3 - operationalerror near "2017"

I'm new to programming. I have a dictionary called record that receives various inputs such as 'Color', 'Type', 'quantity', etc. I tried to add a Date column and then insert into an SQLite table inside the if statement shown below, but I get an "Operational error near 2017", i.e. near the date.
Can anyone help, please? Thanks in advance.
Date = str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d'))
record['Date'] = Date
column = [record['Color'], Date]
values = [record['quantity'], record['Date']]
column = ','.join(column)
if record['Type'] == 'T-Shirts' and record['Style'] == 'Soft':
    stment = ("INSERT INTO xtrasmall (%s) values(?)" %column)
    c.execute(stment, values)
    conn.commit()
Updated
You can simplify the code as follows:
from datetime import datetime
date = datetime.now().date()
sql = "INSERT INTO xtrasmall (%s, Date) values (?, ?)" % record['Color']
c.execute(sql, (record['quantity'], date))
This substitutes the value of the selected color directly into the column names in the query string. Then the query is executed passing the quantity and date string as arguments. The date should automatically be converted to a string, but you could convert with str() if desired.
This does assume that the other colour columns have a default value (presumably 0), or permit null values.
Original answer
Because you are constructing the query with string interpolation (i.e. substituting %s for a string) your statement becomes something like this:
INSERT INTO xtrasmall (Red,2017-10-06) values(?)
which is not valid because 2017-10-06 is not a valid column name. Print out stment before executing it to see.
If you know what the column names are just specify them in the query:
values = ['Red', 2, Date]
c.execute("INSERT INTO xtrasmall (color, quantity, date) values (?, ?, ?)", values)
conn.commit()
You need to use a ? for each column that you are inserting.
It looks like you want to insert the dictionary using its keys and values. This can be done like this:
record = {'date':'2017-10-06', 'color': 'Red', 'quantity': 2}
columns = ','.join(record.keys())
placeholders = ','.join('?' * len(record.values()))
sql = 'INSERT INTO xtrasmall ({}) VALUES ({})'.format(columns, placeholders)
c.execute(sql, list(record.values()))
This code will generate the parameterised SQL statement:
INSERT INTO xtrasmall (date,color,quantity) VALUES (?,?,?)
and then execute it using the dictionary's values as the parameters.
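Here is a self-contained sketch of that dictionary-based insert against an in-memory database. The three-column table layout and the allowed set are assumptions made for the example, not the asker's real schema. Because column names cannot be bound as parameters, the sketch checks them against a known set before building the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Hypothetical table layout, for illustration only.
c.execute("CREATE TABLE xtrasmall (date TEXT, color TEXT, quantity INTEGER)")

record = {"date": "2017-10-06", "color": "Red", "quantity": 2}

# Only values can be passed as ? parameters; identifiers must be part of
# the SQL text, so validate them first.
allowed = {"date", "color", "quantity"}
unknown = set(record) - allowed
if unknown:
    raise ValueError("unexpected column(s): %s" % sorted(unknown))

columns = ",".join(record)
placeholders = ",".join("?" for _ in record)
sql = "INSERT INTO xtrasmall ({}) VALUES ({})".format(columns, placeholders)

# Pass a real sequence (list or tuple) of values to execute().
c.execute(sql, list(record.values()))
conn.commit()

rows = c.execute("SELECT date, color, quantity FROM xtrasmall").fetchall()
print(rows)
```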

No error, but no data written to SQLite DB

I have a python list called result that contains 7 items...
print result
returns
[u'2013:11:29', u'17:01:11', u'Apple', u'iPhone 5', -36.57033055555556, 174.68374722222222, '5095c554fef7d990a2c57e0e12b18854']
I have this code setting up my DB and writing the result into it:
conn = sqlite3.connect('geopic.sqlite')
cur = conn.cursor()
cur.execute('''DROP TABLE IF EXISTS Results''')
cur.execute('''CREATE TABLE Results (Date DATE, Time TIME, Make TEXT, Model TEXT, Latitude FLOAT, Longitude FLOAT, Hash VARCHAR(128))''')
for result in results:
    if result == None:
        continue
    else:
        print result #this prints as expected
        cur.execute('''INSERT INTO Results (Date, Time, Make, Model, Latitude, Longitude, Hash) VALUES (?,?,?,?,?,?,?)''', result)
This runs with no errors, but nothing gets written into the database. I'm stumped as to why!
Add conn.commit() at the end of your code, after the loop.
Example:
for result in results:
    if result == None:
        continue
    else:
        print result #this prints as expected
        cur.execute('''INSERT INTO Results (Date, Time, Make, Model, Latitude, Longitude, Hash) VALUES (?,?,?,?,?,?,?)''', result)
conn.commit()
OK - Simple Answer - I forgot to use
conn.commit()
To write the changes to the DB.
DOH!
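To see the effect of the missing commit, here is a small sketch using a temporary database file and two connections. The file name and the reduced two-column table are assumptions for the demo. It relies on the sqlite3 module's default behaviour of opening a transaction implicitly before an INSERT.

```python
import os
import sqlite3
import tempfile

# Temporary file so that two separate connections can share one database.
path = os.path.join(tempfile.mkdtemp(), "geopic_demo.sqlite")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE Results (Make TEXT, Model TEXT)")
writer.commit()

# This INSERT starts an implicit transaction that is not yet committed.
writer.execute(
    "INSERT INTO Results (Make, Model) VALUES (?, ?)", ("Apple", "iPhone 5")
)

# A second connection cannot see the uncommitted row.
reader = sqlite3.connect(path)
before = reader.execute("SELECT COUNT(*) FROM Results").fetchall()[0][0]

# After the commit the row is visible to other connections.
writer.commit()
after = reader.execute("SELECT COUNT(*) FROM Results").fetchall()[0][0]

print(before, after)

writer.close()
reader.close()
```

If the writer connection were closed without the commit, the pending insert would be rolled back, which matches the "no error, but no data" symptom in the question.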

Unrecognized token in SQLite statement

I am inserting data from one table to another, however for some reason I get "unrecognized token". This is the code:
cur.execute("INSERT INTO db.{table} SELECT distinct latitude, longitude, port FROM MessageType1 WHERE latitude>={minlat} AND latitude<={maxlat} AND longitude>= {minlong} AND longitude<= {maxlong}".format(minlat = bottomlat, maxlat = toplat, minlong = bottomlong, maxlong = toplong, table=tablename))
This translates to the following, with values:
INSERT INTO db.Vardo SELECT distinct latitude, longitude, port FROM MessageType1 WHERE latitude>=69.41 AND latitude<=70.948 AND longitude>= 27.72 AND longitude<= 28.416
The error code is the following:
sqlite3.OperationalError: unrecognized token: "70.948 AND"
Is the problem that there are three decimal places?
This is the create statement for the table:
cur.execute("CREATE TABLE {site} (latitude, longitude, port)".format(site = site))
Don't make your SQL queries via string formatting, use the driver's ability to prepare SQL queries and pass parameters into the query - this way you would avoid SQL injections and it would make handling of passing parameters of different types transparent:
query = """
INSERT INTO
db.{table}
SELECT DISTINCT
latitude, longitude, port
FROM
MessageType1
WHERE
latitude >= ? AND
latitude <= ? AND
longitude >= ? AND
longitude <= ?
""".format(table=tablename)
cur.execute(query, (bottomlat, toplat, bottomlong, toplong))
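Try using ? for your value parameters. A placeholder cannot stand in for a table name, so the table name still has to be formatted into the query string:
cur.execute("INSERT INTO db.{} SELECT distinct latitude, longitude, port FROM MessageType1 WHERE latitude>=? AND latitude<=? AND longitude>= ? AND longitude<= ?".format(tablename), (bottomlat, toplat, bottomlong, toplong))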
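A runnable sketch of this pattern, with the coordinate bounds bound as ? parameters and the table name checked before it is formatted into the SQL text. The KNOWN_TABLES set, the copy_bbox helper, the sample rows, and the attached in-memory database standing in for db are all assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the question's attached database named "db".
conn.execute("ATTACH DATABASE ':memory:' AS db")
cur = conn.cursor()
cur.execute("CREATE TABLE MessageType1 (latitude, longitude, port)")
cur.execute("CREATE TABLE db.Vardo (latitude, longitude, port)")
cur.executemany(
    "INSERT INTO MessageType1 VALUES (?, ?, ?)",
    [(70.0, 28.0, "A"), (70.0, 28.0, "A"), (60.0, 28.0, "B")],
)

# Hypothetical whitelist: identifiers cannot be bound as parameters, so
# only known table names are allowed into the SQL text.
KNOWN_TABLES = {"Vardo"}


def copy_bbox(cur, tablename, bottomlat, toplat, bottomlong, toplong):
    """Copy distinct rows inside the bounding box into db.<tablename>."""
    if tablename not in KNOWN_TABLES:
        raise ValueError("unknown table: %r" % tablename)
    query = (
        "INSERT INTO db.{table} "
        "SELECT DISTINCT latitude, longitude, port FROM MessageType1 "
        "WHERE latitude >= ? AND latitude <= ? "
        "AND longitude >= ? AND longitude <= ?"
    ).format(table=tablename)
    cur.execute(query, (bottomlat, toplat, bottomlong, toplong))


copy_bbox(cur, "Vardo", 69.41, 70.948, 27.72, 28.416)
conn.commit()

rows = cur.execute("SELECT latitude, longitude, port FROM db.Vardo").fetchall()
print(rows)
```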

Updating timestamp each time a row is added?

I have code that loops, adding a row of information on each iteration. However, I find that the rows do not each get a new timestamp; they all have the same one as the very first row, which leads me to believe that the value of current_timestamp is not updating each time. How can I fix this problem? Here is my code:
if __name__ == "__main__":
    main()
    deleteAll() # Clears current table
    ID = 0
    while ID < 100:
        insert(ID, 'current_date', 'current_timestamp')
        ID += 1
    conn.commit()
My insert function:
def insert(ID, date, timestamp): # Assumes table name is test1
    cur.execute(
        """INSERT INTO test1 (ID, date,timestamp) VALUES (%s, %s, %s);""", (ID, AsIs(date), AsIs(timestamp)))
This code is in Python, and it uses PostgreSQL as the database.
The immediate fix is to commit after each insert; otherwise all of the inserts are done inside a single transaction:
while ID < 100:
    insert(ID, 'current_date', 'current_timestamp')
    ID += 1
    conn.commit()
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT
Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the "current" time, so that multiple modifications within the same transaction bear the same time stamp.
Those functions should not be passed as parameters but included in the SQL statement:
def insert(ID): # Assumes table name is test1
    cur.execute("""
        INSERT INTO test1 (ID, date, timestamp)
        VALUES (%s, current_date, current_timestamp);
        """, (ID,)
    )
The best practice is to keep the commit outside of the loop, so that there is a single transaction:
while ID < 100:
    insert(ID)
    ID += 1
conn.commit()
and to use the statement_timestamp function which, as the name implies, returns the start time of the current statement instead of the start time of the transaction:
INSERT INTO test1 (ID, date, timestamp)
VALUES (%s, statement_timestamp()::date, statement_timestamp())

Insert points into SQL Server?

Brand new to python and loving it, and I imagine this might be a simple one.
I am currently inserting points into SQL Server 2008 via a Python script with the help of pymssql.
var1 = "hi"
lat = "55.92"
lon = "-3.29"
cursor.execute("INSERT INTO table (field1, x, y) VALUES(%s, %s, %s)",
               (var1 , lat, lon))
This all works fine.
I need to also insert those coordinates into a GEOGRAPHY type field (called geog).
geog_type = "geography::STGeomFromText('POINT(%s %s)',4326))" % (lat, lon)
cursor.execute("INSERT INTO table (field1, x, y, geog) VALUES(%s, %s, %s, %s)",
               (var1 , lat, lon, geog_type))
This throws the following exception:
The label geography::STGeomFro in the input well-known text (WKT) is
not valid. Valid labels are POINT, LINESTRING, POLYGON, MULTIPOINT,
MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION, CIRCULARSTRING,
COMPOUNDCURVE, CURVEPOLYGON and FULLGLOBE (geography Data Type only).
From SSMS I can run an insert statement on the table to insert a point fine.
USE [nosde]
INSERT INTO tweets (geog)
VALUES(
geography::STGeomFromText(
'POINT(55.9271035250276 -3.29431266523898)',4326))
Let me know in the comments if you need more details.
Some of my workings on pastebin.
Several issues - firstly, you're supplying the coordinates in the wrong order - the STPointFromText() method expects longitude first, then latitude.
Secondly, it may be easier to use the Point() method rather than a WKT-parsing method such as STPointFromText(). Point() doesn't require any string manipulation: you supply the numeric latitude and longitude (in that order) and the SRID directly. http://technet.microsoft.com/en-us/library/bb933811.aspx
But, from the error message, it appears that the value you're sending is attempting to be parsed as a WKT string. If this is the case, you don't want the extra geography::STGeomFromText and the SRID at the end anyway - these are assumed. So try just supplying:
geog_type = "'POINT(%s %s)'" % (lon, lat)
cursor.execute("INSERT INTO table (field1, x, y, geog) VALUES(%s, %s, %s, %s)",
               (var1 , lat, lon, geog_type))
I'm not sure if you need the extra single quotes in the first line or not, but don't have a system to test on at the moment.
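For the Point() route, a minimal sketch might look like the following. It has not been run against a real SQL Server. The helper name insert_point is made up, the table name tweets is borrowed from the SSMS example, and the column list is taken from the question. It assumes a pymssql-style cursor that accepts %s placeholders, and it keeps the function call in the SQL text so that only the numbers are passed as parameters.

```python
def insert_point(cursor, field1, lat, lon, srid=4326):
    """Insert one row, building the geography value with geography::Point.

    `cursor` is assumed to be a pymssql-style DB-API cursor (%s paramstyle).
    """
    # Convert to float so that string inputs such as "55.92" are sent as
    # numbers rather than quoted strings.
    lat = float(lat)
    lon = float(lon)
    cursor.execute(
        "INSERT INTO tweets (field1, x, y, geog) "
        # geography::Point takes latitude first, then longitude, then SRID.
        "VALUES (%s, %s, %s, geography::Point(%s, %s, %s))",
        (field1, lat, lon, lat, lon, srid),
    )
```

A call such as insert_point(cursor, "hi", "55.92", "-3.29") followed by a commit on the connection would then mirror the insert from the question.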
