I am using code written by Victor Velasquez to extract data from raster files which contain daily precipitation data since 1981.
When I run the code, I get an error that some index is out of bounds. I did a little research and found that this is common and there are a lot of similar questions here, but I haven't been able to find the specific solution for this case.
The error:
IndexError Traceback (most recent call last)
<ipython-input-8-eff66ef74d73> in <module>
1 Pisco = Extract_Pisco()
----> 2 Pisco.DataPre()
3 Pisco.ExportExcel()
<ipython-input-7-6cf99336b9e1> in DataPre(self)
23 Band = Data.read(1)
24 X,Y = Data.index(self.x,self.y) #extract row/col
---> 25 Pre = Band[X,Y]
26 self.ListPre.append(Pre) #append to list
27
IndexError: index 158116290 is out of bounds for axis 0 with size 198
The part of the code pointed to by the traceback is:
def DataPre(self):
    os.chdir(path)
    fileDir = path
    fileExt = r".tif"
    Lis = [_ for _ in os.listdir(fileDir) if _.endswith(fileExt)]
    Lis.sort()  # sort the .tif files
    Inicio = '1981-01-01.tif'
    Fin = '2018-07-31.tif'
    Rini = Lis.index(Inicio)
    Rend = Lis.index(Fin)
    self.Lis = Lis[Rini:Rend+1]
    self.ListPre = []
    for i in tnrange(0, len(self.Lis), desc="!! Extracting data !!"):
        with rasterio.open(self.Lis[i]) as Data:
            Band = Data.read(1)
            X, Y = Data.index(self.x, self.y)
            Pre = Band[X, Y]
            self.ListPre.append(Pre)
Thank you very much!
It looks like the file you are reading does not contain the geospatial point you are trying to find data for. (If this is incorrect please let me know).
You can add a check that the point falls inside the data (note that the bounds have to come from the dataset, since the array returned by read() has a shape but no height or width attributes):
Band = Data.read(1)
X, Y = Data.index(self.x, self.y)
if 0 <= X < Data.height and 0 <= Y < Data.width:
    Pre = Band[X, Y]
    self.ListPre.append(Pre)
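If the computed index is enormous, like 158116290 for a raster with only 198 rows, the point is probably far outside the raster's extent, which often means self.x and self.y are in a different coordinate reference system than the file. As an alternative sketch, you could test the point against the dataset's georeferenced bounds before indexing (bounds is a standard rasterio dataset attribute; the variable names mirror the question's code):
Band = Data.read(1)
b = Data.bounds  # (left, bottom, right, top) in the dataset's CRS
if b.left <= self.x <= b.right and b.bottom <= self.y <= b.top:
    X, Y = Data.index(self.x, self.y)  # returns (row, col)
    Pre = Band[X, Y]
    self.ListPre.append(Pre)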
<ipython-input-46-4dc0f8b3e097> in number_of_real_and_fake_videos(data_list)
8 for i in data_list:
9 temp_video = i.split('/')[-1]
---> 10 label = lab.iloc[(labels.loc[labels["file"] == temp_video].index.values[0]),1]
11 if(label == 'FAKE'):
12 fake+=1
IndexError: index 0 is out of bounds for axis 0 with size 0
I am getting this error. When I use a small dataset it gives output, but for a large dataset it shows an index error.
It looks like when you run the line labels.loc[labels["file"] == temp_video].index.values[0] within your loop, you run into a situation where there are no values in labels["file"] matching temp_video. This leaves you with an empty array, which can't return anything for the element at index position 0. To simplify, here's your error:
import pandas as pd
empty_df = pd.DataFrame([])
empty_df.index.values[0]
which gives you:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [1], in <module>
----> 1 empty_df.index.values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
To fix this, you could try checking that there are matches before accessing the value:
for i in data_list:
    temp_video = i.split('/')[-1]
    temp_video_filter = labels.loc[labels["file"] == temp_video]
    if len(temp_video_filter) > 0:  # guard against empty matches
        label = lab.iloc[temp_video_filter.index.values[0], 1]
        if label == 'FAKE':
            fake += 1
I suspect there is a way to solve your problem with the pandas API without using the for loop, so you may want to look into that also.
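For instance, a rough vectorized sketch (assuming, as the indexing in your code suggests, that the "file" column holds the video names and column position 1 holds the label):
import pandas as pd

# file names extracted from the paths in data_list
file_names = pd.Series(data_list).str.split('/').str[-1]

# keep only the label rows that actually match a video in data_list
matched = labels[labels['file'].isin(file_names)]
fake = (matched.iloc[:, 1] == 'FAKE').sum()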
I'm trying to retrieve some data from the NOAA API, but there is an error that I'm not able to resolve.
location = []

def find_xy(Name, lat, long):
    api = url + str(lat) + ',' + str(long)
    r = requests.get(api).json()
    x = r['properties']['gridX']
    y = r['properties']['gridY']
    xy = (Name, str(lat), str(long), x, y)
    location.append(xy)

for i in dfgrid:
    Name = dfgrid['Name']
    lat = dfgrid['Lat']
    long = dfgrid['Long']
    find_xy(Name, lat, long)
There is a list of lats and longs in dfgrid; I'd like to loop through each coordinate and grab the gridX and gridY values from the NOAA API.
I'm able to pull this data for one example, but when I try to loop through the entire dfgrid I receive the following error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
in
      3 lat = dfgrid['Lat']
      4 long = dfgrid['Long']
----> 5 find_xy(Name,lat,long)

in find_xy(Name, lat, long)
      3 api = url+str(lat)+','+str(long)
      4 r = requests.get(api).json()
----> 5 x = r['properties']['gridX']
      6 y = r['properties']['gridY']
      7 xy=(Name, str(lat), str(long), x, y)

KeyError: 'properties'
Resolved this by
dfgrid = pd.DataFrame(df,columns=['Name','Lat','Long'])
dfgrid['Lat']=dfgrid['Lat'].astype(str)
dfgrid['Long']=dfgrid['Long'].astype(str)
dfgrid['coordinate']= dfgrid['Lat']+","+dfgrid['Long']
What I was doing before was concatenating the Lat and Long even though I wrapped both floats in str()... That gave the error on the API request. I'm still trying to work out why str(Lat)+","+str(Long) didn't work, but regardless I found a solution.
Thanks everyone who tried to help. Very much appreciated.
Mike
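For what it's worth, for i in dfgrid: iterates over the DataFrame's column names, so each of Name, lat, and long received the entire column; str() of a Series embeds the index and dtype into the URL, which is most likely why the API response had no 'properties' key. Iterating over rows passes one coordinate pair at a time; a sketch using the question's names:
# iterate row by row instead of over column names
for row in dfgrid.itertuples(index=False):
    find_xy(row.Name, row.Lat, row.Long)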
I'm creating a simple trading backtester for Bitcoin, yet I'm having trouble with the for loops in my code. The current code is based on 2 simple moving averages, q and z (currently, for learning purposes, no real strategy). info is a dataframe holding Bitcoin historical data from a csv file. There seems to be an out-of-bounds error and I can't figure it out. Any help would be appreciated.
import pandas as pd
import numpy as np

cash = 10000
file = 'BTC-USD.csv'
data = pd.read_csv(file)

y = data['Adj Close'][1000:]
x = data['Date'][1000:]
v = data['Volume'][1000:]
h = data['High'][1000:]
l = data['Low'][1000:]

def movAvg(values, time):
    times = np.repeat(1.0, time) / time
    sma = np.convolve(values, times, 'valid')
    return sma

z = movAvg(y, 12)
q = movAvg(y, 9)

SP = len(x[50-1:])

def AlgoCal(account, info):
    #i = 1050
    bought = False
    test = []
    for x in info.index:
        if q[x] < z[x]:
            if bought == False:
                temp = info[x]
                account = account - info[x]
                test.append(account)
                bought = True
        elif q[x] > z[x]:
            if bought == True:
                temp = info[x]
                account = account + info[x]
                test.append(account)
                bought = False
        else:
            print("Error")
    return test

money = AlgoCal(cash, y)
print(money)
Sample Data from Yahoo Bitcoin csv
Date,Open,High,Low,Close,Adj Close,Volume
2014-09-17,465.864014,468.174011,452.421997,457.334015,457.334015,21056800
2014-09-18,456.859985,456.859985,413.104004,424.440002,424.440002,34483200
........
........
2020-05-21,9522.740234,9555.242188,8869.930664,9081.761719,9081.761719,39326160532
2020-05-22,9080.334961,9232.936523,9008.638672,9182.577148,9182.577148,29810773699
2020-05-23,9185.062500,9302.501953,9118.108398,9209.287109,9209.287109,27727866812
2020-05-24,9196.930664,9268.914063,9165.896484,9268.914063,9268.914063,27658280960
Error:
Traceback (most recent call last):
File "main.py", line 47, in <module>
money = AlgoCal(cash,y)
File "main.py", line 31, in AlgoCal
if q[x]<z[x]:
IndexError: index 1066 is out of bounds for axis 0 with size 1066
Your moving averages have two different lengths. One uses a 12-period window and the other a 9-period window, and np.convolve in 'valid' mode returns len(values) - window + 1 points, so q and z are both shorter than y and shorter than each other. On top of that, y keeps its original dataframe index (starting at 1000 because of the slice), while q and z are plain arrays indexed from 0, so q[x] eventually runs past the end of the array and gives the out-of-bounds error.
If you are going to compare moving averages in this way, you need to align them so the comparison only runs where both averages are available.
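As a sketch of that alignment, you could compute both averages with pandas rolling() so they share y's index, and only loop over rows where both are defined (windows of 9 and 12 as in the question):
# rolling means share y's index; the first window-1 values are NaN
q = y.rolling(9).mean()
z = y.rolling(12).mean()

# restrict the loop to rows where both averages exist
valid = y.index[q.notna() & z.notna()]
for x in valid:
    if q[x] < z[x]:
        pass  # buy branch goes here
    elif q[x] > z[x]:
        pass  # sell branch goes here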
I'm trying to build a state map for data across a decade, with a slider to select the year displayed on the map. The sort of display where a user can pick 2014 and the map will show the data for 2014.
I merged the data I want to show with the appropriate shapefile. I end up with 733 rows and 5 columns - as many as 9 rows per county with the same county name and coordinates.
Everything seems to be okay until I try to build the map. This error message is returned:
OverflowError: Maximum recursion level reached
I've tried resetting the recursion limit using sys.setrecursionlimit but can't get past that error.
I haven't been able to find an answer on SO that I understand, so I'm hoping someone can point me in the right direction.
I'm using bokeh and json to build the map. I've tried using sys.setrecursionlimit but I get the same error message no matter how high I go.
I used the same code last week but couldn't get data from different years to display because I was using a subset of the data. Now that I've fixed that, I'm stuck on this error message.
def json_data(selectedYear):
    yr = selectedYear
    murders = murder[murder['Year'] == yr]
    merged = mergedfinal
    merged.fillna('0', inplace = True)
    merged_json = json.loads(merged.to_json())
    json_data = json.dumps(merged_json)
    return json_data

geosource = GeoJSONDataSource(geojson = json_data(2018))

palette = brewer['YlOrRd'][9]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 60, nan_color = '#d9d9d9')
hover = HoverTool(tooltips = [('County/City', '@NAME'), ('Victims', '@Victims')])
color_bar = ColorBar(color_mapper = color_mapper, label_standoff = 8, width = 500, height = 30,
                     border_line_color = None, location = (0,0),
                     orientation = 'horizontal')

p = figure(title = 'Firearm Murders in Virginia', plot_height = 600, plot_width = 950, toolbar_location = None, tools = [hover])
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.visible = False
p.yaxis.visible = False
p.patches('xs', 'ys', source = geosource, fill_color = {'field': 'Victims', 'transform': color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')

def update_plot(attr, old, new):
    year = slider.value
    new_data = json_data(year)
    geosource.geojson = new_data
    p.title.text = 'Firearm Murders in VA'

slider = Slider(title = 'Year', start = 2009, end = 2018, step = 1, value = 2018)
slider.on_change('value', update_plot)

layout = column(p, widgetbox(slider))
curdoc().add_root(layout)

output_notebook()
show(layout)
The same code worked well enough when I was using a more limited dataset. Here is the full context of the error message:
OverflowError Traceback (most recent call last)
<ipython-input-50-efd821491ac3> in <module>()
8 return json_data
9
---> 10 geosource = GeoJSONDataSource(geojson = json_data(2018))
11
12 palette=brewer['YlOrRd'][9]
<ipython-input-50-efd821491ac3> in json_data(selectedYear)
4 merged = mergedfinal
5 merged.fillna('0', inplace = True)
----> 6 merged_json = json.loads(merged.to_json())
7 json_data = json.dumps(merged_json)
8 return json_data
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
1087 force_ascii=force_ascii, date_unit=date_unit,
1088 default_handler=default_handler,
-> 1089 lines=lines)
1090
1091 def to_hdf(self, path_or_buf, key, **kwargs):
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
37 obj, orient=orient, date_format=date_format,
38 double_precision=double_precision, ensure_ascii=force_ascii,
---> 39 date_unit=date_unit, default_handler=default_handler).write()
40 else:
41 raise NotImplementedError("'obj' should be a Series or a DataFrame")
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in write(self)
83 date_unit=self.date_unit,
84 iso_dates=self.date_format == 'iso',
---> 85 default_handler=self.default_handler)
86
87
OverflowError: Maximum recursion level reached
I had a similar problem!
I narrowed my problem down to the .to_json step. For some reason when I merged my geopandas file on the right:
Neighbourhoods_merged = df_2016.merge(gdf_neighbourhoods, how = "left", on = "Neighbourhood#")
I ran into the recursion error. I found success by switching the two:
Neighbourhoods_merged = gdf_neighbourhoods.merge(df_2016, how = "left", on = "Neighbourhood#")
This is what worked for me. Infuriatingly I have no idea why this works, but I hope this might help someone else with the same error!
I solved this problem by changing the merge direction.
So, if you want to merge two dataframes A and B, where A is a geopandas.geodataframe.GeoDataFrame and B is a pandas.core.frame.DataFrame, you should merge them with pd.merge(A, B, on='some column'), not in the opposite direction.
I think the maximum recursion error comes from calling the .to_json() method on a plain pandas DataFrame that has a POLYGON-type column in it.
When you change the direction of the merge so the result keeps the GeoDataFrame type, .to_json() executes without problems even with the POLYGON column.
I spent 2 hours with this, and I hope this can help you.
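A minimal sketch of the difference (the file and column names here are hypothetical):
import geopandas as gpd
import pandas as pd

gdf = gpd.read_file('counties.shp')  # GeoDataFrame with a geometry column
df = pd.read_csv('victims.csv')      # plain DataFrame

# GeoDataFrame on the left: the result stays a GeoDataFrame,
# and its to_json() builds GeoJSON without recursing
merged = gdf.merge(df, on='NAME', how='left')
geojson = merged.to_json()

# df.merge(gdf, ...) would instead return a plain DataFrame, whose
# to_json() trips over the shapely POLYGON objects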
If you need a higher recursion depth, you can set it using sys:
import sys
sys.setrecursionlimit(1500)
That being said, your error is most likely the result of an infinite recursion, which may be the case if increasing the depth doesn't fix it.
I am trying to take data from a text file and calculate an average for every 600 lines of that file. I'm loading the text from the file, putting it into a numpy array, and enumerating it. I can get the average for the first 600 lines, but I'm not sure how to write a loop so that Python calculates an average for every 600 lines and then puts this into a new text file. Here is my code so far:
import numpy as np

#loads file and places it in array
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
shape = np.shape(data)

#creates array for u wind values
for i,d in enumerate(data):
    data[i] = (d[3])
    if i == 600:
        minavg = np.mean(data[i == 600])

#finds total u mean for day
ubar = np.mean(data)
Based on what I understand from your question, it sounds like you have some file that you want to take the mean of every line up to the 600th one, and repeat that multiple times till there is no more data. So at line 600 you average lines 0 - 600, at line 1200 you average lines 600 to 1200.
Modulus division would be one approach to taking the average when you hit every 600th line, without having to use a separate variable to keep count of how many lines you've looped through. Additionally, I used Numpy array slicing to create a view of the original data containing only the 4th column of the data set.
This example should do what you want, but it is entirely untested... I'm also not terribly familiar with numpy, so there may be better ways to do this, as mentioned in the other answers:
import numpy as np

#loads file and places it in array
data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)
shape = np.shape(data)

data_you_want = data[:,3]
daily_averages = list()

#creates array for u wind values
for i,d in enumerate(data_you_want):
    if (i % 600) == 0 and i != 0:  # skip i == 0 so we don't average an empty slice
        avg_for_day = np.mean(data_you_want[i - 600:i])
        daily_averages.append(avg_for_day)
You can either modify the example above to write the mean out to a new file, instead of appending to a list as I have done, or just write the daily_averages list out to whatever file you want.
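Writing the list out could be as simple as the following (the output filename is hypothetical):
import numpy as np

# one average per line in the output file
np.savetxt('daily_averages.txt', daily_averages)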
As a bonus, here is a Python solution using only the CSV library. It hasn't been tested much, but theoretically should work and might be fairly easy to understand for someone new to Python.
import csv

data = list()
daily_average = list()
num_lines = 600

with open('testme.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter="\t")
    for i,row in enumerate(reader):
        if (i % num_lines) == 0 and i != 0:
            average = sum(data[i - num_lines:i]) / num_lines
            daily_average.append(average)
        data.append(float(row[3]))  # float, since the u wind values are not integers
Hope this helps!
Simple solution would be:
import numpy as np

data = np.loadtxt('244UTZ10htz.txt', delimiter = '\t', skiprows = 2)

mydata = []; counter = 0
for i,d in enumerate(data):
    mydata.append(d[3])
    # Find the average of the previous 600 lines
    if counter == 600:
        minavg = np.mean(np.asarray(mydata))
        # reset the counter and start counting from 0
        counter = 0; mydata = []
    counter += 1
The following program uses array slicing to get the column, and then a list comprehension indexing into the column to get the means. It might be simpler to use a for loop for the latter.
Slicing / indexing into the array rather than creating new objects also has the advantage of speed as you're just creating new views into existing data.
import numpy as np
# test data
nr = 11
nc = 3
a = np.array([np.array(range(nc))+i*10 for i in range(nr)])
print a
# slice to get column
col = a[:,1]
print col
# comprehension to step through column to get means
numpermean = 2
means = [np.mean(col[i:(min(len(col), i+numpermean))]) \
for i in range(0,len(col),numpermean)]
print means
it prints
[[ 0 1 2]
[ 10 11 12]
[ 20 21 22]
[ 30 31 32]
[ 40 41 42]
[ 50 51 52]
[ 60 61 62]
[ 70 71 72]
[ 80 81 82]
[ 90 91 92]
[100 101 102]]
[ 1 11 21 31 41 51 61 71 81 91 101]
[6.0, 26.0, 46.0, 66.0, 86.0, 101.0]
Something like this works. Maybe not that readable. But should be fairly fast.
n = int(data.shape[0]/600)
interestingData = data[:,3]
daily_averages = np.mean(interestingData[:600*n].reshape(-1, 600), axis=1)
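To see what the reshape is doing, here is a quick illustration with a window of 3 instead of 600, on made-up data:
import numpy as np

col = np.arange(12, dtype=float)             # 12 values -> 4 windows of 3
means = np.mean(col.reshape(-1, 3), axis=1)  # one mean per window
print(means)                                 # [ 1.  4.  7. 10.]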