Shapely/Pyproj transformation throws OverflowError - determining length of LineString - python

I'm working in a Jupyter Notebook and deleted some code the other day that I thought I didn't need. Now I get an overflow error when running the notebook. I'm pretty sure the code used to work just fine and the problem is caused by me stupidly deleting stuff.
Anyway, I can't seem to find what is missing and would really appreciate help. I'm using a list of coordinates, converting them to a LineString and then transforming them. Finally, I look up the length.
import pyproj
from pyproj import Transformer
from shapely.ops import transform
from shapely.geometry import LineString
route = [[41.875562, -87.624421], [29.949932, -90.070116], [40.712728, -74.006015]]
ls = LineString(route)
project = pyproj.Transformer.from_proj(
    pyproj.Proj(init='epsg:4326'),
    pyproj.Proj(init='epsg:3857'))
ls_metric = transform(project.transform, ls)
ls_metric_length = round(ls_metric.length / 1000)
This returns
OverflowError: cannot convert float infinity to integer
The problem arises already with ls_metric which doesn't generate a LineString.

I ran your code and got this warning:
FutureWarning: '+init=<authority>:<code>' syntax is deprecated.
'<authority>:<code>' is the preferred initialization method
Sure enough I changed the pyproj Transformer and got a result:
project = pyproj.Transformer.from_proj(
    pyproj.Proj('epsg:4326'),
    pyproj.Proj('epsg:3857'))
gives a length of 3984 km.
I used the latest versions in a venv:
pyproj==2.6.0
Shapely==1.7.0
The warning above also gives another important note regarding axis order changes; in short:
pyproj.Proj('epsg:4326') works with [lat,lng], [lat,lng] ...
pyproj.Proj(init='epsg:4326') works with [lng,lat], [lng,lat] ...
the first one being the preferred way while the second is deprecated.
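A likely explanation for the infinity, sketched in plain Python without pyproj: with the deprecated init syntax the route is read as [lng, lat], so -90.070116 ends up as a latitude, which is outside the valid range. The helper names below (valid_as_lat_lng, valid_as_lng_lat) are hypothetical and only illustrate the axis-order check; they are not part of pyproj or Shapely.

```python
def valid_as_lat_lng(coords):
    """True if every pair is plausible when read as (lat, lng)."""
    return all(-90 <= lat <= 90 and -180 <= lng <= 180 for lat, lng in coords)


def valid_as_lng_lat(coords):
    """True if every pair is plausible when read as (lng, lat)."""
    return all(-180 <= lng <= 180 and -90 <= lat <= 90 for lng, lat in coords)


# Route from the question, written as [lat, lng]
route = [[41.875562, -87.624421], [29.949932, -90.070116], [40.712728, -74.006015]]

# Same route with the axes swapped to [lng, lat]
route_lng_lat = [[lng, lat] for lat, lng in route]

print(valid_as_lat_lng(route))          # order expected by Proj('epsg:4326')
print(valid_as_lng_lat(route))          # order expected by Proj(init='epsg:4326')
print(valid_as_lng_lat(route_lng_lat))  # swapped route read as [lng, lat]
```

So either keep the [lat, lng] route and use the preferred 'epsg:4326' syntax, or swap the axes if the data has to be passed as [lng, lat].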


How to add several new columns if not already present using if statement in geopandas

I'm new to coding and this is my first time posting, so let me know if there is something I haven't included.
I am trying to add new columns to a shapefile. I have a list of columns that are required and sometimes the shapefile will have only some of these. So I need to check which ones are present and add those which are not. I have tried to achieve this using the following code:
import geopandas as gpd
shape = gpd.read_file("shapefile.shp")
add_fields = [['GUI',
               'POLYGON',
               'ORIG_HAB',
               'ORIG_CLASS',
               'EUNIS_L3',
               'HAB_TYPE',
               'VERSION',
               'DET_MTHD',
               'DET_NAME',
               'TRAN_COM',
               'T_RELATE',
               'VAL_COMM',
               'DataAccess',
               'Three_Step',
               'MESH_Confi']]
field_name_list = shape.columns.tolist()
for fieldToAdd in add_fields:
    if fieldToAdd not in field_name_list:
        shape.reindex(shape.columns.tolist() + fieldToAdd, axis=1)
shape.to_file("outputshapefile.shp")
When I run the code there are no errors, but the shapefile appears unaltered. I have scoured the forums to see what to try next, and that's partly where I found .reindex. I think that I am not actually adding the new columns, but I'm not sure how to do this. If anyone can point me in the right direction, it would be very much appreciated.
I think you are looking for something like this:
import geopandas as gpd
shapefile = gpd.read_file("shapefile.shp")
add_fields = ['GUI',
              'POLYGON',
              'ORIG_HAB',
              'ORIG_CLASS',
              'EUNIS_L3',
              'HAB_TYPE',
              'VERSION',
              'DET_MTHD',
              'DET_NAME',
              'TRAN_COM',
              'T_RELATE',
              'VAL_COMM',
              'DataAccess',
              'Three_Step',
              'MESH_Confi']
field_name_list = shapefile.columns.tolist()
for fieldToAdd in add_fields:
    if fieldToAdd not in field_name_list:
        #shapefile.reindex(shapefile.columns.tolist().append(fieldToAdd), axis=1)
        shapefile[fieldToAdd] = None
shapefile.to_file("outputshapefile.shp")
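Note that I changed add_fields from a nested (2D) list to a flat (1D) list by removing the outer brackets. With the nested list, the loop ran once over the whole inner list instead of once per field name.
I also renamed the variable from shape to shapefile so that it is not confused with the DataFrame .shape property.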
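One more point that may explain why the original attempt changed nothing: reindex returns a new object rather than modifying the frame in place, so its result has to be assigned. The sketch below uses a plain pandas DataFrame as a stand-in for the GeoDataFrame (which subclasses it); the tiny frame and the shortened field list are made up for illustration.

```python
import pandas as pd

# Stand-in for the GeoDataFrame read from the shapefile (hypothetical data)
df = pd.DataFrame({"GUI": ["a", "b"], "VERSION": [1, 2]})

required = ["GUI", "POLYGON", "ORIG_HAB", "VERSION"]
missing = [name for name in required if name not in df.columns]

# reindex returns a new frame; df itself is left untouched
reindexed = df.reindex(columns=df.columns.tolist() + missing)
columns_after_reindex = df.columns.tolist()

# Assigning a column does modify df in place
for name in missing:
    df[name] = None

print(columns_after_reindex)
print(df.columns.tolist())
```

Either `df = df.reindex(...)` or the per-column assignment works; the assignment form is the one used in the answer above.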

Python DTW Package Correlation and Cosine Distance not Working

I am trying to use the dtw function like:
alignment = dtw(x, y, dist_method='correlation')
However, the 'cosine' and 'correlation' never worked regardless of my input data (other measures such as euclidean always work). The error message is:
ValueError: No warping path found compatible with the local constraints
So I tried all different combinations of other parameters:
open_begin/open_end: True or False
step_pattern: 'symmetric2' or 'asymmetric'
I still get the same error.
Any help is welcome. Thanks

Pandas: where is autocorrelation_plot?

I'm trying to plot an autocorrelation_plot() of a time series using pandas.
According to this SO post pandas.tools was removed in 0.24.0 and the autocorrelation_plot function can now be found in the pandas.plotting library. However the API shows no reference to this function.
I'm able to plot an autocorrelation by importing the function but where can I find the documentation?
import numpy as np
from pandas.plotting import autocorrelation_plot # works fine
slope = -1
offset = 250
noise_scale = 100
npts = 100
x = np.linspace(0, 100, npts)
y = slope*x + noise_scale*np.random.rand(npts) + offset
autocorrelation_plot(y)
Python: 3.7.2
Pandas: 0.24.1
I think this would probably be more appropriate as an issue in GitHub.
In any case, autocorrelation_plot and the similar plots (andrews_curves, radviz,...) are probably going to be moved out of pandas, into a separate package. So you can expect to have to call something like pandas_matplotlib.autocorrelation_plot() in the future (see #28177).
In the meantime, I'm adding it and some other missing functions to the documentation in #28179. When the pull request is merged, you'll be able to see the docs in https://dev.pandas.io. But there is nothing very interesting for autocorrelation_plot.
Have a look at:
https://github.com/pandas-dev/pandas/blob/v0.24.1/pandas/plotting/_misc.py#L600
Looks like it was buried in the plotting._misc source code.
You can at least find a reference and a short doc here: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#visualization-autocorrelation
Btw, you can search the docs for any keyword: https://pandas.pydata.org/pandas-docs/stable/search.html?q=autocorrelation_plot&check_keywords=yes&area=default#
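Until the function shows up in the API reference, the docstring that ships with the installed pandas can also be read locally. A minimal sketch, assuming a pandas version that exposes autocorrelation_plot under pandas.plotting (as in the question):

```python
import inspect

from pandas.plotting import autocorrelation_plot

# Pull the docstring straight from the installed function
doc = inspect.getdoc(autocorrelation_plot)
print(doc)
```

In an interactive session, help(autocorrelation_plot) or autocorrelation_plot? in IPython shows the same text.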

LinAlgError: SVD did not converge in Linear Least Squares when trying polyfit

If I try to run the script below I get the error: LinAlgError: SVD did not converge in Linear Least Squares. I have used the exact same script on a similar dataset and there it works. I have tried to search for values in my dataset that Python might interpret as a NaN but I cannot find anything.
My dataset is quite large and impossible to check by hand. (But I think my dataset is fine). I also checked the length of stageheight_masked and discharge_masked but they are the same. Does anyone know why there is an error in my script and what can I do about it?
import numpy as np
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from scipy import polyfit, polyval
kwargs = dict(delimiter='\t',
              skip_header=0,
              missing_values='NaN',
              converters={0: matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},
              dtype=float,
              names=True,
              )
rating_curve_Gillisstraat = np.genfromtxt('G:\Discharge_and_stageheight_Gillisstraat.txt',**kwargs)
discharge = rating_curve_Gillisstraat['discharge'] #change names of columns
stageheight = rating_curve_Gillisstraat['stage'] - 131.258
#mask NaN
discharge_masked = np.ma.masked_array(discharge,mask=np.isnan(discharge)).compressed()
stageheight_masked = np.ma.masked_array(stageheight,mask=np.isnan(discharge)).compressed()
#sort
sort_ind = np.argsort(stageheight_masked)
stageheight_masked = stageheight_masked[sort_ind]
discharge_masked = discharge_masked[sort_ind]
#regression
a1,b1,c1 = polyfit(stageheight_masked, discharge_masked, 2)
discharge_predicted = polyval([a1,b1,c1],stageheight_masked)
print 'regression coefficients'
print (a1,b1,c1)
#create upper and lower uncertainty
upper = discharge_predicted*1.15
lower = discharge_predicted*0.85
#create scatterplot
plt.scatter(stageheight,discharge,color='b',label='Rating curve')
plt.plot(stageheight_masked,discharge_predicted,'r-',label='regression line')
plt.plot(stageheight_masked,upper,'r--',label='15% error')
plt.plot(stageheight_masked,lower,'r--')
plt.axhline(y=1.6,xmin=0,xmax=1,color='black',label='measuring range')
plt.title('Rating curve Catsop')
plt.ylabel('discharge')
plt.ylim(0,2)
plt.xlabel('stageheight[m]')
plt.legend(loc='upper left', title='Legend')
plt.grid(True)
plt.show()
I don't have your data file, but it is almost always the case that when you get that error you have NaNs or infinities in your data. Look for both of those using pd.notnull or np.isfinite.
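Note that the script in the question masks both arrays with np.isnan(discharge) only, so a NaN in the stage column or an infinity in either column would still reach polyfit. A minimal sketch of a combined check with np.isfinite, using made-up sample values in place of the real data file:

```python
import numpy as np

# Made-up sample values standing in for the columns of the data file
stageheight = np.array([0.1, 0.2, np.nan, 0.4, 0.5, 0.6])
discharge = np.array([0.05, 0.12, 0.20, np.inf, 0.55, 0.75])

# Keep only the rows where BOTH values are finite (no NaN, no +/-inf)
ok = np.isfinite(stageheight) & np.isfinite(discharge)
print("rows dropped:", np.count_nonzero(~ok))

# Second-degree fit on the cleaned rows only
a1, b1, c1 = np.polyfit(stageheight[ok], discharge[ok], 2)
print(a1, b1, c1)
```

If "rows dropped" is nonzero for the real data, that is a strong hint about where the SVD failure comes from.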
As others have pointed out, the problem is likely that there are rows without numeric values for the algorithm to work with. This is an issue with most regressions.
That's the problem. The solution then, is to do something about that. And that depends on the data. Often, you can replace the NaNs with 0s, using Pandas .fillna(0) for example. Sometimes, you might have to interpolate missing values, and Pandas .interpolate() is probably the simplest solution to that as well. Or, when it's not a time series, you might be able to simply drop the rows with NaNs in them, using for example Pandas .dropna() method. Or, sometimes it's not about the NaNs, but about the infs or others, and then there are other solutions for that: https://stackoverflow.com/a/55293137/12213843
Exactly which way to go about it, is up to the data. And it's up to you to interpret the data. And domain knowledge goes a long way to interpret the data well.
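The three pandas options mentioned above can be compared side by side. This is only a sketch on a made-up frame; the column names are assumptions, and which option is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Made-up frame with one NaN and one inf in the discharge column
df = pd.DataFrame({
    "stage": [0.1, 0.2, 0.3, 0.4, 0.5],
    "discharge": [0.05, np.nan, 0.25, np.inf, 0.45],
})

# Treat infinities as missing so all three options handle them the same way
clean = df.replace([np.inf, -np.inf], np.nan)

filled = clean.fillna(0)            # replace missing values with 0
interpolated = clean.interpolate()  # linear interpolation between neighbours
dropped = clean.dropna()            # drop rows with any missing value

print(filled["discharge"].tolist())
print(interpolated["discharge"].tolist())
print(len(dropped))
```

For a rating curve, filling with 0 would pull the fit towards zero discharge, so dropping or interpolating is probably the safer choice there.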
As ski_squaw mentions the error is most of the time due to NaN's, however for me this error came after a windows update. I was using numpy version 1.16. Moving my numpy version to 1.19.3 solved the issue. (run pip install numpy==1.19.3 --user in the cmd)
This GitHub issue explains it in more detail:
https://github.com/numpy/numpy/issues/16744
Numpy 1.19.3 doesn't work on Linux and 1.19.4 doesn't work on Windows.
I developed my code on Windows 8.
Now I'm using Windows 10 and the problem popped up!
It was resolved as @Joris said.
pip install numpy==1.19.3
my example after fix:
import numpy as np

def calculating_slope(x):
    # x is a pandas Series; drop +/-inf and NaN before fitting
    x = x.replace(np.inf, np.nan).replace(-np.inf, np.nan).dropna()
    if len(x) > 1:
        slope = np.polyfit(range(len(x)), x, 1)[0]
    else:
        slope = 0
    return slope
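A short usage sketch for the function above, repeated here so the block runs on its own; the input Series are made up. One thing to be aware of: after the bad values are dropped, the fit uses positions 0, 1, 2, ... instead of the original index, so gaps left by removed values are closed up.

```python
import numpy as np
import pandas as pd


def calculating_slope(x):
    # x is a pandas Series; drop +/-inf and NaN before fitting
    x = x.replace(np.inf, np.nan).replace(-np.inf, np.nan).dropna()
    if len(x) > 1:
        slope = np.polyfit(range(len(x)), x, 1)[0]
    else:
        slope = 0
    return slope


# The inf is removed, so the fit runs on [1, 2, 4] at positions [0, 1, 2]
slope = calculating_slope(pd.Series([1.0, 2.0, np.inf, 4.0]))
print(slope)

# Fewer than two usable values: falls back to 0
fallback = calculating_slope(pd.Series([np.nan, 3.0]))
print(fallback)
```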

IPython and OS X terminal output is line wrapping before column limit

I'm using IPython with the pandas module and its DataFrame object. When I run some code, the DataFrame output wraps before it reaches the width of my terminal, even though the terminal is wide enough to accommodate it. This issue seems to be isolated to pandas Series and DataFrame objects and does not affect, say, a long list.
Running pip uninstall readline and then reinstalling readline through easy_install and restarting IPython did not solve the problem.
It would be helpful to see my data not broken up like that, but I honestly don't know where to begin to fix this. Any insight?
I found a workaround that allows the console output to be more readable. Calling to_string() on the DataFrame object returns a string representation of the object, skirting around whatever inherent formatting that DataFrame contains, especially since the goal is readability.
data = DataFrame(some_long_list)
print(data.to_string()) # outputs to console's full-width
EDIT:
From pandas docs: "New since 0.10.0, wide DataFrames will now be printed across multiple rows by default". I'm seeing that this helps as a default so that instead of cramming rows onto the next line, you'll see separation by column. There are two additional methods to configure output width:
import pandas as pd
pd.set_option('display.width', 40) # default is 80; early pandas releases called this option 'line_width'
or to turn off the wrap feature completely:
pd.set_option('expand_frame_repr', False)
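Both approaches can be tried on a deliberately wide frame. This is a sketch with made-up data; it uses pd.option_context so the settings apply only inside the with block, and it also lifts display.max_columns, which is an extra assumption beyond the answer above so that no columns are elided.

```python
import pandas as pd

# A frame that is far wider than an 80-column terminal (made-up data)
df = pd.DataFrame({f"col_{i}": range(3) for i in range(30)})

# to_string() ignores the display width and puts each row on one line
full = df.to_string()
print(len(full.splitlines()))

# Temporarily disable wrapping for the normal repr as well
with pd.option_context("display.expand_frame_repr", False,
                       "display.max_columns", None):
    unwrapped = repr(df)

print(unwrapped.splitlines()[0])
```

Using option_context instead of set_option keeps the change local, which is handy in a shared notebook or library code.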
