Can't Set Primary Key Via SQLAlchemy But Can Via PGAdmin4 - python

For the following dataframe:
LAD20CD LAD20NM BNG_E BNG_N ... LAT Shape__Area Shape__Length geometry
0 E07000154 Northampton 476495 260539 ... 52.237751 8.255064e+07 38381.688084 POLYGON ((-0.8091414142605670 52.2753276939684...
1 E07000246 Somerset West and Taunton 304960 130228 ... 51.063480 1.191178e+09 233156.429712 POLYGON ((-3.0538047490802600 51.2059417666536...
2 E07000040 East Devon 313790 96050 ... 50.757599 8.182959e+08 169999.596103 MULTIPOLYGON (((-3.0524230989883701 50.9082640...
3 E07000044 South Hams 270676 54036 ... 50.371948 8.921215e+08 234574.690559 POLYGON ((-3.5842498548751598 50.4777231181161...
4 E07000202 Ipswich 617161 244456 ... 52.055920 4.084468e+07 29187.875675 POLYGON ((1.1578388391924299 52.08875163594530...
5 E06000026 Plymouth 249945 58255 ... 50.404942 8.288777e+07 49419.795939 POLYGON ((-4.1230475222729899 50.3467427583020...
6 E07000079 Cotswold 402125 208209 ... 51.772549 1.167649e+09 275881.531075 POLYGON ((-1.6657543045486300 51.9874888219864...
7 E08000002 Bury 379658 410768 ... 53.593102 1.007527e+08 57024.343964 POLYGON ((-2.2717870687905002 53.6145142332618...
8 E07000084 Basingstoke and Deane 454508 151423 ... 51.259369 6.345992e+08 122971.049819 POLYGON ((-0.9861237505300590 51.3628482885656...
9 E07000078 Cheltenham 394925 222232 ... 51.898609 4.653884e+07 31000.684891 POLYGON ((-2.0102151915442801 51.9029244535680...
10 E07000126 South Ribble 352017 425840 ... 53.726749 1.151752e+08 66247.390716 POLYGON ((-2.5994877797848099 53.7814710235385...
11 E08000037 Gateshead 420168 559658 ... 54.931198 1.475563e+08 67934.528110 POLYGON ((-1.7697567363655600 54.9809837372463...
12 E07000068 Brentwood 558560 196070 ... 51.641079 1.530372e+08 62499.674509 POLYGON ((0.4023278825251010 51.65099490683400...
13 E08000026 Coventry 432807 279689 ... 52.414230 9.979901e+07 43481.405727 POLYGON ((-1.4590531741648900 52.4551580337384...
14 S12000029 South Lanarkshire 284634 636071 ... 55.604530 1.771616e+09 247590.081941 POLYGON ((-4.1070317994739796 55.8346525858565...
15 E07000029 Copeland 310871 508739 ... 54.466171 7.405896e+08 142439.232915 POLYGON ((-3.1671393240152499 54.4541106699468...
16 E08000034 Kirklees 414586 416223 ... 53.642330 4.053064e+08 106837.808449 POLYGON ((-1.6816208841975799 53.7564689245214...
17 E06000017 Rutland 492992 308655 ... 52.667648 3.921855e+08 96395.318751 POLYGON ((-0.4950258021289160 52.6402363852470...
18 E07000121 Lancaster 356896 464988 ... 54.079010 5.801983e+08 167797.392829 POLYGON ((-2.4608627348339200 54.2267161360627...
19 E08000025 Birmingham 408150 287352 ... 52.484039 2.690266e+08 88776.343219 POLYGON ((-1.7880812993329001 52.5878626088220..
I can successfully save the dataframe to Postgres using the following code:
# Imports assumed by this snippet.
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
from shapely import wkt

# CONNECT TO POSTGRES.
conn_params_dict = {"user": "postgres",
                    "password": "postgres",
                    # FOR host, USE THE POSTGRES INSTANCE CONTAINER NAME, AS THE CONTAINER IP CAN CHANGE.
                    "host": "postgres",
                    "database": "github_projects"}
connect_alchemy = "postgresql+psycopg2://%s:%s@%s/%s" % (
    conn_params_dict['user'],
    conn_params_dict['password'],
    conn_params_dict['host'],
    conn_params_dict['database']
)
# CREATE POSTGRES ENGINE (CONNECTION POOL).
engine = create_engine(connect_alchemy)
# CONVERT geometry COLUMN FROM DTYPE geometry TO DTYPE object TO ALLOW DATAFRAME TO BE SAVED TO POSTGRES.
lad_gdf['geometry'] = lad_gdf['geometry'].apply(lambda x: wkt.dumps(x))
pd.DataFrame(lad_gdf).to_sql("shapefile_lad20", con=engine, if_exists='replace', index=True,
                             dtype={"lad20code": sqlalchemy.types.Text,
                                    "lad20nm": sqlalchemy.types.Text,
                                    "bng_e": sqlalchemy.types.Integer,
                                    "bng_n": sqlalchemy.types.Integer,
                                    "long": sqlalchemy.types.Float,
                                    "lat": sqlalchemy.types.Float,
                                    "shape__area": sqlalchemy.types.Float,
                                    "shape__length": sqlalchemy.types.Float,
                                    "geometry": sqlalchemy.types.Text})
I then try to set the Primary Key using the following:
set_primary_key = engine.execute("""
    ALTER TABLE shapefile_lad20 ADD PRIMARY KEY (lad20cd)
""")
set_primary_key.close()
But this fails and gives the error:
ProgrammingError: (psycopg2.errors.UndefinedColumn) column "lad20cd" of relation "shapefile_lad20" does not exist
The lad20cd attribute very much does exist. I tried changing the case to LAD20CD in case that was the issue, but I got the same result.
Strangely, I can set LAD20CD as the primary key via the PGAdmin4 GUI, so I am not sure what the issue is here.
I have to convert a geometry column from dtype = geometry to dtype = object so that I can save the dataframe to Postgres - could this step possibly be the cause?
Thanks
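For reference, Postgres folds unquoted identifiers to lower case, while pandas' to_sql quotes the column names it creates and therefore preserves their original case. If to_sql wrote the column as "LAD20CD", neither lad20cd nor an unquoted LAD20CD in the ALTER TABLE would match it. A minimal sketch of the double-quoted variant, assuming that is what happened here:
# Assumption: to_sql created the column as "LAD20CD" (quoted, case preserved).
# Double-quoting stops Postgres from folding the identifier to lower case.
set_primary_key = engine.execute("""
    ALTER TABLE shapefile_lad20 ADD PRIMARY KEY ("LAD20CD")
""")
set_primary_key.close()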

Related

How to add Search plugin in folium for multiple fields?

I'm trying to add a search bar to a folium map using folium plugins.
Data:
import geopandas

states = geopandas.read_file(
    "https://raw.githubusercontent.com/PublicaMundi/MappingAPI/master/data/geojson/us-states.json",
    driver="GeoJSON",
)
states_sorted = states.sort_values(by="density", ascending=False)
states_sorted.head(5).append(states_sorted.tail(5))[["name", "density"]]

def rd2(x):
    return round(x, 2)

minimum, maximum = states["density"].quantile([0.05, 0.95]).apply(rd2)
mean = round(states["density"].mean(), 2)

import branca

colormap = branca.colormap.LinearColormap(
    colors=["#f2f0f7", "#cbc9e2", "#9e9ac8", "#756bb1", "#54278f"],
    index=states["density"].quantile([0.2, 0.4, 0.6, 0.8]),
    vmin=minimum,
    vmax=maximum,
)
colormap.caption = "Population Density in the United States"
id name density geometry
0 01 Alabama 94.650 POLYGON ((-87.35930 35.00118, -85.60667 34.984...
1 02 Alaska 1.264 MULTIPOLYGON (((-131.60202 55.11798, -131.5691...
2 04 Arizona 57.050 POLYGON ((-109.04250 37.00026, -109.04798 31.3...
3 05 Arkansas 56.430 POLYGON ((-94.47384 36.50186, -90.15254 36.496...
4 06 California 241.700 POLYGON ((-123.23326 42.00619, -122.37885 42.0...
Folium Map:
import folium
from folium.plugins import Search

m = folium.Map(location=[38, -97], zoom_start=4)

def style_function(x):
    return {
        "fillColor": colormap(x["properties"]["density"]),
        "color": "black",
        "weight": 2,
        "fillOpacity": 0.5,
    }

stategeo = folium.GeoJson(
    states,
    name="US States",
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=["name", "density"], aliases=["State", "Density"], localize=True
    ),
).add_to(m)

statesearch = Search(
    layer=stategeo,
    geom_type="Polygon",
    placeholder="Search for a US State",
    collapsed=False,
    search_label="name",
    weight=3,
).add_to(m)

folium.LayerControl().add_to(m)
colormap.add_to(m)
m
In the above map the user can search only by US state name. Is it possible to include multiple fields in the search, e.g. searching by density, id, or name?
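As far as I can tell, the Search plugin matches a single property via search_label, so one hedged workaround is to concatenate the fields of interest into a synthetic property and search on that. An untested sketch (the combined "search" column is an assumption, not a plugin feature):
# Untested sketch: build one synthetic property out of several fields and
# let the Search plugin match against it.
states["search"] = (
    states["name"].astype(str)
    + " " + states["id"].astype(str)
    + " " + states["density"].astype(str)
)

stategeo = folium.GeoJson(
    states,
    name="US States",
    style_function=style_function,
).add_to(m)

statesearch = Search(
    layer=stategeo,
    geom_type="Polygon",
    placeholder="Search by name, id or density",
    collapsed=False,
    search_label="search",  # the combined property defined above
    weight=3,
).add_to(m)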

Check if geographical polygon is valid

I have a df that looks like this:
coordinates={"type":"zone","bound":"POLYGON ((11.31767 43.32289, 11.32205 43.32467, 11.3235 43.32458, 11.32395 43.32474, 11.32411 43.32522, 11.32623 43.32516, 11.32647 43.32459, 11.32576 43.32435, 11.32581 43.32384, 11.32438 43.32332, 11.32803 43.32171, 11.32573 43.32016, 11.32571 43.31896, 11.32588 43.31844, 11.32319 43.31699, 11.32058 43.31589, 11.31782 43.31419, 11.3171 43.31093, 11.3166 43.31046, 11.31569 43.31045, 11.31344 43.31128, 11.31158 43.31121, 11.3097 43.31289, 11.30727 43.31445, 11.30414 43.31606, 11.3027 43.31726, 11.30154 43.31853, 11.29848 43.32291, 11.29457 43.3281, 11.29194 43.3313, 11.29289 43.33069, 11.29388 43.33036, 11.29505 43.33021, 11.29745 43.33008, 11.30058 43.33046, 11.3029 43.33021, 11.30485 43.33054, 11.30569 43.33197, 11.30626 43.33223, 11.30809 43.3325, 11.30907 43.33198, 11.31024 43.33192, 11.312 43.33134, 11.31369 43.32529, 11.31767 43.32289))"}
import pandas as pd

df = pd.DataFrame([coordinates])
I would love to know if the column "bound" is a valid polygon, and if it is not, I want to fix it.
I tried .is_valid but it doesn't seem to work.
For that you can use geopandas:
import geopandas as gpd
coordinates = {"type":"zone", "bound":"POLYGON ((11.31767 43.32289, 11.32205 43.32467, 11.3235 43.32458, 11.32395 43.32474, 11.32411 43.32522, 11.32623 43.32516, 11.32647 43.32459, 11.32576 43.32435, 11.32581 43.32384, 11.32438 43.32332, 11.32803 43.32171, 11.32573 43.32016, 11.32571 43.31896, 11.32588 43.31844, 11.32319 43.31699, 11.32058 43.31589, 11.31782 43.31419, 11.3171 43.31093, 11.3166 43.31046, 11.31569 43.31045, 11.31344 43.31128, 11.31158 43.31121, 11.3097 43.31289, 11.30727 43.31445, 11.30414 43.31606, 11.3027 43.31726, 11.30154 43.31853, 11.29848 43.32291, 11.29457 43.3281, 11.29194 43.3313, 11.29289 43.33069, 11.29388 43.33036, 11.29505 43.33021, 11.29745 43.33008, 11.30058 43.33046, 11.3029 43.33021, 11.30485 43.33054, 11.30569 43.33197, 11.30626 43.33223, 11.30809 43.3325, 11.30907 43.33198, 11.31024 43.33192, 11.312 43.33134, 11.31369 43.32529, 11.31767 43.32289))"}
foo = gpd.GeoDataFrame([coordinates])
foo['geometry'] = gpd.GeoSeries.from_wkt(foo['bound'])
foo.is_valid
0 True
dtype: bool
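If a geometry ever comes back invalid, a minimal repair sketch (assuming Shapely >= 1.8, which ships make_valid; buffer(0) is the older fallback):
from shapely import wkt
from shapely.validation import explain_validity, make_valid

geom = wkt.loads(coordinates["bound"])
if not geom.is_valid:
    print(explain_validity(geom))  # e.g. "Self-intersection at or near ..."
    geom = make_valid(geom)        # returns a repaired (possibly Multi-) geometry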

How to use values from a PANDAS data frame as filter params in regex?

I would like to use the values from a Pandas df as filter params in a SPARQL query.
I create the pandas dataframe by reading the data from an Excel file:
xls = pd.ExcelFile('excel/dataset_nuovo.xlsx')
df1 = pd.read_excel(xls, 'Sheet1')
print(df1)
Here is the resulting dataframe:
oggetto descrizione lenght label tipologia
0 #iccd4580759# Figure: putto. Oggetti: ghirlanda di fiori 6 Bad OpereArteVisiva
1 #iccd3636719# Decorazione plastica. 2 Bad OpereArteVisiva
2 #iccd3641475# Scultura.. Figure: angelo 3 Bad OpereArteVisiva
3 #iccd8282504# Custodia di reliquiario in legno intagliato e ... 8 Good OpereArteVisiva
4 #iccd3019633# Portale. 1 Bad OpereArteVisiva
... ... ... ... ... ...
59995 #iccd2274873# Ciotola media a larga tesa. Decorazione in cob... 35 Good OpereArteVisiva
59996 #iccd11189887# Il medaglione bronzeo, sormontato da un'aquila... 85 Good OpereArteVisiva
59997 #iccd4545324# Tessuto di fondo rosaceo. Disegno a fiori e fo... 49 Good OpereArteVisiva
59998 #iccd2934870# Sculture a tutto tondo in legno dipinto di bia... 28 Good OpereArteVisiva
59999 #iccd2685205# Calice con piede a base circolare e nodo ovoid... 14 Bad OpereArteVisiva
Then, for each record, I need to use the value from the oggetto column as a filter to retrieve the corresponding subject from a SPARQL endpoint.
By using this SPARQL query:
SELECT ?object ?description (group_concat(?subject;separator="|") as ?subjects)
WHERE { ?object a crm:E22_Man-Made_Object;
crm:P3_has_note ?description;
crm:P129_is_about ?concept;
crm:P2_has_type ?type.
?concept a crm:E28_Conceptual_Object;
rdfs:label ?subject.
filter( regex(str(?object), "#iccd4580759#" ))
}
I'm able to filter a single record:
object.type object.value ... subjects.type subjects.value
0 uri http://dati.culturaitalia.it/resource/oai-oaic... ... literal Putto con ghirlanda di fiori|Putto con ghirlan..
Since the dataset has 60k records, I would like to automate the process by looping through the dataframe, using each value as a filter, and building a new df with a corresponding subject column.
oggetto descrizione subject lenght label tipologia
0 #iccd4580759# Figure: putto. Oggetti: ghirlanda di fiori Putto con ghirlanda di fiori|Putto con ghirlan.. 6 Bad OpereArteVisiva
Here the entire script I wrote:
import xlrd
import pandas as pd
from pandas import json_normalize
from SPARQLWrapper import SPARQLWrapper, JSON
xls = pd.ExcelFile('excel/dataset_nuovo.xlsx')
df1 = pd.read_excel(xls, 'Sheet1')
print(df1)
def query_ci(sparql_query, sparql_service_url):
    sparql = SPARQLWrapper(sparql_service_url)
    sparql.setQuery(sparql_query)
    sparql.setReturnFormat(JSON)
    # ask for the result
    result = sparql.query().convert()
    return json_normalize(result["results"]["bindings"])
sparql_query = """ SELECT ?object ?description (group_concat(?subject;separator="|") as ?subjects)
WHERE { ?object a crm:E22_Man-Made_Object;
crm:P3_has_note ?description;
crm:P129_is_about ?concept;
crm:P2_has_type ?type.
?concept a crm:E28_Conceptual_Object;
rdfs:label ?subject.
filter( regex(str(?object), "#iccd4580759#" ))
}
"""
sparql_service_url = "http://dati.culturaitalia.it/sparql"
result_table = query_ci(sparql_query, sparql_service_url)
print (result_table)
result_table.to_excel("output.xlsx")
Is it possible to do that?
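It should be. A hedged looping sketch reusing query_ci from above, substituting each oggetto value into the FILTER with str.replace (str.format would require escaping every brace in the SPARQL body). Untested against the endpoint, and with 60k rows a batched VALUES clause would likely be far kinder to the server:
# Hedged sketch: run one query per oggetto value and stack the results.
results = []
for oggetto in df1["oggetto"]:
    # Swap the hard-coded filter value for the current one.
    query = sparql_query.replace("#iccd4580759#", oggetto)
    results.append(query_ci(query, sparql_service_url))

result_table = pd.concat(results, ignore_index=True)
result_table.to_excel("output.xlsx")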

How to know if a request is fulfilled on python?

I am making a request in Python to download some data from the Copernicus website. The thing is that I want to know when the request is fulfilled and when the download is finished.
Is it solid enough to work with two flags (request_finished and download_finished)?
import cdsapi

c = cdsapi.Client()

latitude = 43.1   # North, South
longitude = -1.5  # West, East
# str(latitude)+'/'+str(longitude)+'/'+str(latitude)+'/'+str(longitude)
r = c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'variable': [
            '100m_u_component_of_wind', '100m_v_component_of_wind',
            '10m_u_component_of_wind', '10m_v_component_of_wind',
            '2m_temperature', 'surface_pressure'
        ],
        'area': str(latitude)+'/'+str(longitude)+'/'+str(latitude)+'/'+str(longitude),  # North, West, South, East. Default: global
        'year': '2018',
        'grid': '0.1/0.1',  # Latitude/longitude grid in degrees: east-west (longitude) and north-south resolution (latitude). Default: reduced Gaussian grid
        'month': '01',
        'day': [
            '01', '02', '03', '04', '05', '06', '07', '08',
            '09', '10', '11', '12', '13', '14', '15', '16',
            '17', '18', '19', '20', '21', '22', '23', '24',
            '25', '26', '27', '28', '29', '30', '31'
        ],
        'time': [
            '00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00', '21:00', '22:00', '23:00'
        ],
        'format': 'netcdf'
    }
)
request_finished = 1
r.download('download_grid_reduction_one_month_point_limit.nc')
download_finished = 1
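For what it's worth, cdsapi's retrieve() polls the CDS queue and only returns once the request has been processed, and download() returns once the file has been written, so both calls block and the flags do mark real completion. What they miss is the failure path; a hedged sketch (request_params stands in for the dictionary above):
# Hedged sketch: the calls are blocking, so the flags flip only on success;
# wrap them so a failed request or download is also visible.
request_finished = False
download_finished = False
try:
    r = c.retrieve('reanalysis-era5-single-levels', request_params)
    request_finished = True
    r.download('download_grid_reduction_one_month_point_limit.nc')
    download_finished = True
except Exception as err:
    print('request or download failed:', err)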

python mysql update failing when the string is huge

I have a table where a text column needs to be updated. The column report_2_comments is the text column. When I update the column with a small string, "This is test message", I don't have any issues, but when I update it using the message given below, I get this error:
e == not enough arguments for format string
context = DbContext()
try:
    qry = """Update Report_Table
             set {2} = '{3}'
             where valid_flag = 'Y' and report_status = 'C'
             and report_name = '{0}'
             and date(report_run_date) = '{1}';
          """.format('Daily Errors Report',
                     '2016-07-09', 'report_2_comments',
                     '8-3-2016 01:00 EST Affected DC region 1,000 errors over 2.5 hours (2%) 8-3-2016 13:00 EST Affected Virginia 500 errors over 11 hours (2%) 1233 8-3-2016 13:00 EST Affected 212/1412121001 - Date/skljld (sdlkjd)NOT_FOUND) 90,800 errors over 11 hours (2%) sldkdsdsd Fiber cut 8-3-2016 17:00 EST Affected 16703 - sdsdsd, WV (Tune Error) 15,400 errors over 7.5 hours (0.6%) sdkjd dskdjhsd sdkjhd')
    print 'update qry == ', qry
    output = context.execute(qry, ())
except Exception as e:
    print 'e == ', e
The printed qry:
Update vbo.Report_Table
set report_2_comments = '8-3-2016 01:00 EST Affected DC region 1,000 errors over 2.5 hours (2%) 8-3-2016 13:00 EST Affected Virginia 500 errors over 11 hours (2%) 1233 8-3-2016 13:00 EST Affected 212/1412121001 - Date/skljld (sdlkjd)NOT_FOUND) 90,800 errors over 11 hours (2%) sldkdsdsd Fiber cut 8-3-2016 17:00 EST Affected 16703 - sdsdsd, WV (Tune Error) 15,400 errors over 7.5 hours (0.6%) sdkjd dskdjhsd sdkjhd'
where valid_flag = 'Y' and report_status = 'C'
and report_name = 'Daily Errors Report'
and date(report_run_date) = '2016-07-09';
Table definition:
CREATE TABLE Report_Table (
    id bigint(19) NOT NULL auto_increment,
    report_name varchar(200),
    report_run_date datetime,
    report_status char(25),
    valid_flag char(1),
    report_1_comments text(65535),
    report_2_comments text(65535),
    report_3_comments text(65535),
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
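A likely culprit: the comment string contains bare % characters ("(2%)", "(0.6%)"), and when execute() is given a parameter tuple, MySQL drivers run the query through %-style interpolation, so those stray % signs are read as format specifiers. A hedged sketch of the parameterized form (assuming context.execute forwards to cursor.execute; long_comment stands in for the big string above):
# Hedged sketch: bind the value instead of splicing it into the SQL, so the
# driver escapes quotes and stray % signs in the data never reach the formatter.
qry = """UPDATE Report_Table
         SET report_2_comments = %s
         WHERE valid_flag = 'Y' AND report_status = 'C'
           AND report_name = %s
           AND date(report_run_date) = %s;"""
output = context.execute(qry, (long_comment, 'Daily Errors Report', '2016-07-09'))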
