I am using the SQLAlchemy package to issue queries to my PostGIS database, which is filled with .osm data of a city. I want to retrieve the longitude and latitude values from, let's say, the planet_osm_point table.
My SQL query looks like this:
SELECT st_y(st_asewkt(st_transform(way, 4326))) AS lat,
st_x(st_asewkt(st_transform(way, 4326))) AS lon,
"addr:housenumber" AS housenumber,
"addr:street" AS street,
"addr:postcode" AS postcode
FROM planet_osm_point
SQLAlchemy throws this error:
sqlalchemy.exc.InternalError: (psycopg2.InternalError) ERROR: Argument to ST_Y() must be a point
The only problem is with the ST_Y() and ST_X() functions.
ST_X()/ST_Y() return floats, which you can use directly or cast to text.
Using ST_AsEWKT inside them is the problem: ST_AsEWKT returns the text representation of the geometry, while ST_X()/ST_Y() expect a point geometry as their argument.
Use the floats you get:
SELECT st_y(st_transform(way, 4326)) AS lat,
st_x(st_transform(way, 4326)) AS lon,
"addr:housenumber" AS housenumber,
"addr:street" AS street,
"addr:postcode" AS postcode
FROM planet_osm_point
Or cast to text:
SELECT cast(st_y(st_transform(way, 4326)) AS text) AS lat,
cast(st_x(st_transform(way, 4326)) AS text) AS lon,
"addr:housenumber" AS housenumber,
"addr:street" AS street,
"addr:postcode" AS postcode
FROM planet_osm_point
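For reference, here is a sketch of how the corrected statement could be wrapped for SQLAlchemy. The connection string and execution are left as comments because they require a live PostGIS database; the sketch only builds and prints the statement.

```python
from sqlalchemy import text

# Corrected statement: ST_X/ST_Y are applied directly to the transformed
# geometry, not to its EWKT text form.
query = text(
    'SELECT ST_Y(ST_Transform(way, 4326)) AS lat, '
    'ST_X(ST_Transform(way, 4326)) AS lon, '
    '"addr:housenumber" AS housenumber, '
    '"addr:street" AS street, '
    '"addr:postcode" AS postcode '
    'FROM planet_osm_point'
)
# engine = create_engine("postgresql+psycopg2://user:pass@host/osmdb")  # placeholder
# rows = engine.connect().execute(query).fetchall()  # needs a live PostGIS db
print(str(query))
```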
I have a dataframe like this:
Name      age  city
John      31   London
Pierre    35   Paris
...
Kasparov  40   NYC
I would like to select data from a Redshift city table using SQL, where the city is one of the cities in the dataframe's city column:
query = select * from city where ....
Can you help me to accomplish this query?
Thank you
Jeril's answer is going in the right direction but is not complete: the result of df['city'].unique() is not a string, it is a NumPy array, and you need a string in your WHERE clause.
# create a string of quoted city names, the way SQL expects it
unique_cities = ','.join("'{0}'".format(c) for c in df['city'].unique())
# output: 'London','Paris'
# the sql query would be
query = f"select * from city where name in ({unique_cities})"
The code above assumes you are using Python 3.6+ (for the f-string).
Please let me know if this solves your issue
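Note that building the IN list by string formatting breaks on city names containing quotes and is open to SQL injection. A safer sketch uses SQLAlchemy's expanding bind parameters; an in-memory SQLite database stands in for Redshift here, but the binding works the same way against a Redshift engine:

```python
import pandas as pd
from sqlalchemy import bindparam, create_engine, text

# In-memory SQLite standing in for Redshift; table name mirrors the question.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE city (name TEXT)"))
    conn.execute(text("INSERT INTO city VALUES ('London'), ('Paris'), ('NYC')"))

df = pd.DataFrame({"city": ["London", "Paris", "London"]})

# An expanding bind parameter renders one placeholder per list element,
# so no manual quoting is needed.
query = text("SELECT name FROM city WHERE name IN :cities").bindparams(
    bindparam("cities", expanding=True))
with engine.connect() as conn:
    names = [row[0] for row in
             conn.execute(query, {"cities": list(df["city"].unique())})]
print(names)  # -> ['London', 'Paris']
```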
You can try the following:
unique_cities = df['city'].unique()
# sql query
select * from city where name in unique_cities
I have a database of every major airport's lat/long coords all across the world. I only need a portion of them (specifically in the USA) that are listed in a separate .csv file.
This CSV file has two columns from which I extracted data into two lists: the origin airport code (IATA) and the destination airport code (also IATA).
My database has a column for IATA, and essentially I'm trying to query this database to get the respective lat/long coords for each airport in the two lists I have.
Here's my code:
import os

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('sqlite:///airport_coordinates.db')
# The dataframe that contains the IATA codes for the airports I need
airport_relpath = "data/processed/%s_%s_combined.csv" % (file, airline)
script_dir = os.path.dirname(os.getcwd())
temp_file = os.path.join(script_dir, airport_relpath)
fields = ["Origin_Airport_Code", "Destination_Airport_Code"]
df_airports = pd.read_csv(temp_file, usecols=fields)
# the origin/destination IATA codes for the airports I need
origin = df_airports.Origin_Airport_Code.values
dest = df_airports.Destination_Airport_Code.values
# query the database for the lat/long coords of the airports I need
sql = ('SELECT lat, long FROM airportCoords WHERE iata IN %s' %(origin))
indexcols = ['lat', 'long']
df_origin = pd.read_sql(sql, engine)
# testing the origin coordinates
print(df_origin)
This is the error I'm getting:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such
table: 'JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 'MIA' [SQL: "SELECT lat, long
FROM airportCoords WHERE iata IN ['JFK' 'JFK' 'JFK' ... 'MIA' 'JFK'
'MIA']"] (Background on this error at: http://sqlalche.me/e/e3q8)
It's definitely because I'm not querying it correctly (since it thinks my query values are supposed to be tables).
I tried looping through the list to query each element individually, but the list contains 604,885 elements and my computer never produced any output.
Your error is in using string interpolation:
sql = ('SELECT lat, long FROM airportCoords WHERE iata IN %s' %(origin))
Because origin is a NumPy array, its string representation ends up as [....] in the query, which SQLite reads as identifier-quoting syntax; see the SQLite documentation:
If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:
[...]
[keyword] A keyword enclosed in square brackets is an identifier. [...]
You asked SQLite to check if iata is in a table named ['JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 'MIA'], because that's the string representation of a NumPy array.
Since you are already using SQLAlchemy, it would be easier to let that library generate all the SQL for you, including the IN (....) membership test:
from sqlalchemy import Float, String, literal_column, select, table
filter = literal_column('iata', String).in_(origin)
sql = select([
literal_column('lat', Float),
literal_column('long', Float),
]).select_from(table('airportCoords')).where(filter)
then pass sql in as the query.
I used literal_column() and table() objects here to shortcut directly to the names of the objects, but you could also ask SQLAlchemy to reflect your database table directly from the engine object you already created, then use the resulting table definition to generate the query:
metadata = MetaData()
airport_coords = Table('airportCoords', metadata, autoload=True, autoload_with=engine)
at which point the query would be defined as:
filter = airport_coords.c.iata.in_(origin)
sql = select([airport_coords.c.lat, airport_coords.c.long]).where(filter)
I'd also include the iata code in the output, otherwise you will have no path back to connecting IATA code to the matching coordinates:
sql = select([airport_coords.c.lat, airport_coords.c.long, airport_coords.c.iata]).where(filter)
Next, since you say you have 604,885 elements in the list, you probably want to load the CSV data into a temporary table to keep the query efficient:
engine = create_engine('sqlite:///airport_coordinates.db')
# code to read CSV file
# ...
df_airports = pd.read_csv(temp_file, usecols=fields)
# SQLAlchemy table wrangling
metadata = MetaData()
airport_coords = Table('airportCoords', metadata, autoload=True, autoload_with=engine)
temp = Table(
"airports_temp",
metadata,
*(Column(field, String) for field in fields),
prefixes=['TEMPORARY']
)
with engine.begin() as conn:
    # insert CSV values into a temporary table in SQLite
    temp.create(conn, checkfirst=True)
    df_airports.to_sql(temp.name, conn, if_exists='append', index=False)
    # Join the airport coords against the temporary table
    joined = airport_coords.join(temp, airport_coords.c.iata == temp.c.Origin_Airport_Code)
    # select coordinates per airport, include the iata code
    sql = select([airport_coords.c.lat, airport_coords.c.long, airport_coords.c.iata]).select_from(joined)
    df_origin = pd.read_sql(sql, conn)
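Under the stated assumptions (columns named iata, lat and long; sample coordinates invented purely for illustration), the temporary-table approach can be sketched end to end, with an in-memory SQLite database standing in for the real one:

```python
import pandas as pd
from sqlalchemy import Column, MetaData, String, Table, create_engine, select

# Self-contained sketch; table and column names mirror the question,
# the two airport rows are made-up sample data.
engine = create_engine("sqlite://")
metadata = MetaData()
airport_coords = Table("airportCoords", metadata,
                       Column("iata", String),
                       Column("lat", String),
                       Column("long", String))
fields = ["Origin_Airport_Code", "Destination_Airport_Code"]
temp = Table("airports_temp", metadata,
             *(Column(field, String) for field in fields),
             prefixes=["TEMPORARY"])

df_airports = pd.DataFrame({"Origin_Airport_Code": ["JFK", "MIA"],
                            "Destination_Airport_Code": ["MIA", "JFK"]})

with engine.begin() as conn:
    metadata.create_all(conn)
    conn.execute(airport_coords.insert(), [
        {"iata": "JFK", "lat": "40.64", "long": "-73.78"},
        {"iata": "MIA", "lat": "25.79", "long": "-80.29"},
    ])
    df_airports.to_sql(temp.name, conn, if_exists="append", index=False)
    joined = airport_coords.join(
        temp, airport_coords.c.iata == temp.c.Origin_Airport_Code)
    sql = select(airport_coords.c.lat, airport_coords.c.long,
                 airport_coords.c.iata).select_from(joined)
    df_origin = pd.read_sql(sql, conn)

print(sorted(df_origin["iata"]))  # -> ['JFK', 'MIA']
```

Everything runs on one connection (inside `engine.begin()`) so the temporary table stays visible for the join.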
I am using pandas and a DataFrame to deal with some data. I want to load the data into a MySQL database where one of the fields is a Point.
In the file I am parsing with Python I have the lat and lon of the points.
I have created a dataframe (df) with the point information (id and coords):
id  coords
A   GeomFromText( ' POINT(40.87 3.80) ' )
I have stored in coords the MySQL expression required to create a Point from text. However, when executing:
from sqlalchemy import create_engine
engine = create_engine(dbconnection)
df.to_sql("point_test",engine, index=False, if_exists="append")
I got the following error:
DataError: (mysql.connector.errors.DataError) 1416 (22003): Cannot get
geometry object from data you send to the GEOMETRY field
This is triggered because df.to_sql sends GeomFromText( ' POINT(40.87 3.80) ' ) as the literal string "GeomFromText( ' POINT(40.87 3.80) ' )", when it should be executed as the MySQL function GeomFromText().
Does anyone have a suggestion about how to insert geometry fields, originally in text form, into MySQL using a pandas DataFrame?
A workaround is to create a temporary table holding the geometry as a string, and then update the point_test table from it with a call to ST_GeomFromText().
Assuming a database with a table point_test with columns id (VARCHAR(5)) and coords (POINT):
a. Create an example dataframe df with points "A" and "B":
import numpy as np
import pandas as pd

dfd = np.array([['id', 'geomText'],
                ["A", "POINT( 50.2 5.6 )"],
                ["B", "POINT( 50.2 50.4 )"]])
df = pd.DataFrame(data=dfd[1:, :], columns=dfd[0, :])
b. Insert points "A" and "B" into point_test (the id only) and insert the geomText strings into temp_point_test:
df[['id']].to_sql("point_test",engine, index=False, if_exists="append")
df[['id', 'geomText']].to_sql("temp_point_test",engine, index=False, if_exists="append")
c. Update point_test with the points from temp_point_test, applying ST_GeomFromText() in the subselect; finally, drop temp_point_test:
conn = engine.connect()
conn.execute("update point_test pt set pt.coords=(select ST_GeomFromText(geomText) from temp_point_test tpt "+
"where pt.id=tpt.id)")
conn.execute("drop table temp_point_test")
conn.close()
I am inserting data from one table into another; however, for some reason I get an "unrecognized token" error. This is the code:
cur.execute("INSERT INTO db.{table} SELECT distinct latitude, longitude, port FROM MessageType1 WHERE latitude>={minlat} AND latitude<={maxlat} AND longitude>= {minlong} AND longitude<= {maxlong}".format(minlat = bottomlat, maxlat = toplat, minlong = bottomlong, maxlong = toplong, table=tablename))
This translates to the following, with values:
INSERT INTO db.Vardo SELECT distinct latitude, longitude, port FROM MessageType1 WHERE latitude>=69.41 AND latitude<=70.948 AND longitude>= 27.72 AND longitude<= 28.416
The error code is the following:
sqlite3.OperationalError: unrecognized token: "70.948 AND"
Is the problem that there are three decimal places?
This is the create statement for the table:
cur.execute("CREATE TABLE {site} (latitude, longitude, port)".format(site = site))
Don't build your SQL queries via string formatting; use the driver's ability to prepare queries and pass parameters into them. This way you avoid SQL injection, and handling parameters of different types becomes transparent:
query = """
INSERT INTO
db.{table}
SELECT DISTINCT
latitude, longitude, port
FROM
MessageType1
WHERE
latitude >= ? AND
latitude <= ? AND
longitude >= ? AND
longitude <= ?
""".format(table=tablename)
cur.execute(query, (bottomlat, toplat, bottomlong, toplong))
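A runnable sketch of the same pattern against an in-memory database (the sample rows are invented, and the "db." schema prefix is dropped since no separate database is attached here):

```python
import sqlite3

# Parameterised insert-select: values go in as ? placeholders,
# only the table name is formatted into the string.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE MessageType1 (latitude, longitude, port)")
cur.execute("CREATE TABLE Vardo (latitude, longitude, port)")
cur.executemany("INSERT INTO MessageType1 VALUES (?, ?, ?)",
                [(69.5, 28.0, "A"), (80.0, 10.0, "B")])

tablename = "Vardo"  # table names cannot be bound as parameters
bottomlat, toplat, bottomlong, toplong = 69.41, 70.948, 27.72, 28.416
query = """
    INSERT INTO {table}
    SELECT DISTINCT latitude, longitude, port
    FROM MessageType1
    WHERE latitude >= ? AND latitude <= ?
      AND longitude >= ? AND longitude <= ?
""".format(table=tablename)
cur.execute(query, (bottomlat, toplat, bottomlong, toplong))

rows = cur.execute("SELECT * FROM Vardo").fetchall()
print(rows)  # -> [(69.5, 28.0, 'A')]
```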
Try using ? placeholders for your values. Note that a table name cannot be bound as a parameter, so it still has to be formatted into the string:
cur.execute("INSERT INTO db.{0} SELECT distinct latitude, longitude, port FROM MessageType1 WHERE latitude>=? AND latitude<=? AND longitude>=? AND longitude<=?".format(tablename), (bottomlat, toplat, bottomlong, toplong))
Brand new to Python and loving it, and I imagine this might be a simple one.
I am currently inserting points into SQL Server 2008 via a Python script with the help of pymssql.
var1 = "hi"
lat = "55.92"
lon = "-3.29"
cursor.execute("INSERT INTO table (field1, x, y) VALUES(%s, %s, %s)",
(var1 , lat, lon))
This all works fine.
I need to also insert those coordinates into a GEOGRAPHY type field (called geog).
geog_type = "geography::STGeomFromText('POINT(%s %s)',4326))" % (lat, lon)
cursor.execute("INSERT INTO table (field1, x, y, geog) VALUES(%s, %s, %s, %s)",
(var1 , lat, lon, geog_type))
This throws the following exception:
The label geography::STGeomFro in the input well-known text (WKT) is
not valid. Valid labels are POINT, LINESTRING, POLYGON, MULTIPOINT,
MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION, CIRCULARSTRING,
COMPOUNDCURVE, CURVEPOLYGON and FULLGLOBE (geography Data Type only).
From SSMS I can run an insert statement on the table to insert a point fine.
USE [nosde]
INSERT INTO tweets (geog)
VALUES(
geography::STGeomFromText(
'POINT(55.9271035250276 -3.29431266523898)',4326))
Let me know in the comments if you need more details.
Some of my workings on pastebin.
Several issues. Firstly, you're supplying the coordinates in the wrong order: the STGeomFromText() method expects longitude first, then latitude.
Secondly, it may be easier to use the Point() method rather than STGeomFromText(), which doesn't require any string manipulation; just supply the two numeric coordinate parameters directly. http://technet.microsoft.com/en-us/library/bb933811.aspx
But, from the error message, it appears that the value you're sending is being parsed as a WKT string. If that is the case, you don't want the extra geography::STGeomFromText wrapper and the SRID at the end anyway; these are assumed. So try just supplying:
geog_type = "'POINT(%s %s)'" % (lon, lat)
cursor.execute("INSERT INTO table (field1, x, y, geog) VALUES(%s, %s, %s, %s)",
(var1 , lat, lon, geog_type))
I'm not sure if you need the extra single quotes in the first line or not, but I don't have a system to test on at the moment.
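A minimal sketch of the resulting value, using the coordinates from the question (assuming, as the error message suggests, that the driver wants a bare WKT string with longitude first):

```python
lat = "55.92"
lon = "-3.29"
# Geography WKT puts longitude before latitude.
geog = "POINT(%s %s)" % (lon, lat)
print(geog)  # -> POINT(-3.29 55.92)
# the insert itself would then be (requires SQL Server + pymssql):
# cursor.execute("INSERT INTO table (field1, x, y, geog) VALUES (%s, %s, %s, %s)",
#                (var1, lat, lon, geog))
```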