Error while plotting bar graph in Python

I have the code below. It is supposed to ask for a user name, print all the projects and hours associated with that user in the given CSV, and then plot a vertical bar graph with the project names as the x-values and the hours for each project as the y-values. I'm running into the error shown below. How do I pull the right values from my code and plot them accordingly? I'm very new to Python, so any help you can lend would be great.
Thanks in advance.
import csv
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt

with open('Report1.csv') as csvfile:
    hour_summation = {}
    read_csv = csv.reader(csvfile, delimiter=',')
    name_found = False
    while not name_found:
        # take user input for name
        full_name = input('Enter your full name: ').lower()
        # start search at top of file. Have to do this or each loop will
        # start at the end of the file after the first check.
        csvfile.seek(0)
        for row in read_csv:
            # check to see if row matches name
            if (' '.join((row[0], row[1]))).lower() == full_name.strip().lower():
                name_found = True
                hour_summation[row[2]] = hour_summation.get(row[2], 0) + int(float(row[3]))
        if not name_found:
            # name wasn't found, go back to start of while loop
            print('Name is not in system')
        else:
            # break out of while loop. Technically not required since while
            # will exit here anyway, but clarity is nice
            break

print('This is {} full hours report:'.format(full_name))
for k, v in hour_summation.items():
    print(k + ': ' + str(v) + ' hours')

for k, v in hour_summation.items():
    x = np.array(k)
    y = np.array(v)
    y_pos = np.arange(len(y))
    plt.bar(y_pos, y, align='center', alpha=0.5)
    plt.xticks(y_pos, x)
plt.ylabel('hours')
plt.title('Projects for: ', full_name)
plt.show()
Here is the error I'm getting. It is doing everything as it should until it reaches the graph section.
Enter your full name: david ayers
This is david ayers full hours report:
AD-0001 Administrative Time: 4 hours
AD-0002 Training: 24 hours
AD-0003 Personal Time: 8 hours
AD-0004 Sick Time: 0 hours
OPS-0001 General Operational Support: 61 hours
SUP-2507 NAFTA MFTS OS Support: 10 hours
ENG-2001_O Parts Maturity-Overhead: 1 hours
ENG-2006_O CHEC 2 Tool-Overhead: 19 hours
FIN-2005 BLU Lake BOM Analytics: 52 hours
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-c9c8a4aa3176> in <module>()
36 x = np.array(k)
37 y = np.array(v)
---> 38 y_pos = np.arange(len(y))
39 plt.bar(y_pos, y, align='center', alpha=0.5)
40 plt.xticks(y_pos, x)
TypeError: len() of unsized object
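For reference, a minimal sketch of how the collected dictionary could be plotted in a single call, rather than converting each key/value pair to a 0-d array inside the loop (which is what makes len() fail). The two-entry dict here is a stand-in shaped like hour_summation above:

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in for the hour_summation dict built above
hour_summation = {'AD-0001 Administrative Time': 4, 'AD-0002 Training': 24}

projects = list(hour_summation.keys())    # x labels
hours = list(hour_summation.values())     # bar heights
y_pos = range(len(projects))              # one slot per project

plt.bar(y_pos, hours, align='center', alpha=0.5)
plt.xticks(y_pos, projects, rotation=45, ha='right')
plt.ylabel('hours')
plt.title('Projects for: david ayers')    # note: title takes one string argument
plt.tight_layout()
plt.show()
```

Note that plt.title('Projects for: ', full_name) in the original would also raise, since the second positional argument of title is not a string to append.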


What should I put in x and y

Previously I got average prices for gas like this:
1993 1.0711538461538466
1994 1.0778653846153845
1995 1.1577115384615386
1996 1.2445283018867925
1997 1.2442499999999999
I want to get a graph, so I tried this code.
import matplotlib.pyplot as plt
import numpy as np

with open('c:/Gasprices.txt', 'r') as file:
    td = dict()
    for line in file:
        year = line[6:10]
        price = float(line[11:])
        td.setdefault(year, []).append(price)

for k, v in td.items():
    print(f'{k} {sum(v) / len(v):}')

x=[k], y=[sum(v) / len(v)]
plt.plot(x, y, 'o--')
plt.title('Average gas price per year in US')
plt.xlabel('year')
plt.ylabel('Avg.gas price per gallon[$]')
plt.grid()
plt.yticks(np.arange(1.0, 3.5, 0.5))
plt.tight_layout()
plt.show()
But I don't know what I should put in x and y.
I copied your code and sample data and here is my assessment and answer:
First of all, to answer your question in plain English: 'x' is usually used for time and 'y' for the values that change over time, although based on your code I think you already know that. So 'Year' would be x and 'Average Gas Price' would be y.
Next, I made some modifications to your code to make it work (which I will explain afterwards):
import matplotlib.pyplot as plt
import numpy as np

x = list()
y = list()

with open('e:/Gasprices.txt', 'r') as file:
    for line in file:
        line_striped = line.strip()
        if line_striped == '':
            continue
        line_splitted = line_striped.split(' ')
        x.append(int(line_splitted[0]))
        y.append(float(line_splitted[1]))

plt.plot(x, y, 'o--')
plt.title('Average gas price per year in US')
plt.xlabel('year')
plt.ylabel('Avg.gas price per gallon[$]')
plt.grid()
plt.yticks(np.arange(1.0, 3.5, 0.5))
plt.tight_layout()
plt.show()
So what's going on:
I created two empty lists before opening the file and then iterated through its lines.
Next, inside the loop I made sure to skip any empty line: .strip() removes whitespace from the beginning and end of a string, and if the result equals an empty string, the line is skipped entirely.
Then I split the stripped line on the whitespace between the values, which yields a list of two items: the first element is the year and the second is the average price.
Finally, I appended each value to its respective list: the year to x and the average price to y.
I hope it helped.
P.S. My English is not very good, so I hope my answer is readable.

Submitting queries to gaia.aip.de seems to no longer work

I wrote some code a month ago and have been consistently running and updating it. I uploaded the most recent version to GitHub, and I know it worked because I tested it over and over before uploading. Now I opened up the file, with nothing changed, and submitting queries no longer works: out of 150 queries only 2 succeed, whereas in my most recent run 104 of 150 worked. Does anyone know why this might be? My code is below.
"""
Imports needed for the code.
"""
"""
Script to get and clean data
"""
import numpy as np
import pandas as pd
from itertools import chain
from astroquery.gaia import Gaia
from pynverse import inversefunc
from astropy.io import ascii
import wget
import requests
import matplotlib.pyplot as plt
import numpy as np
import math
import pandas as pd
from sklearn.metrics import r2_score
from scipy import stats
import sklearn.metrics as sm
defaults = [0] * 3#needed for ignoring values that don't exsist
data = []#array for storing data
def reject_outliers(data):#Outlier Rejection Function
m = 2
u = np.mean(data)
s = np.std(data)
filtered = [e for e in data if (u - 2 * s < e < u + 2 * s)]
return filtered
def isNaN(num):#Checking if it is NaN(Not a Number)
return num != num
def HMS2deg(ra='', dec=''):#Convert from form RA to Degree RA(Gaia Form)
RA, DEC, rs, ds = '', '', 1, 1
if ra:
H, M, S, *_ = [float(i) for i in chain(ra.split(), defaults)]
if str(H)[0] == '-':
rs, H = -1, abs(H)
deg = (H*15) + (M/4)
RA = '{0}'.format(deg*rs)
if ra and dec:
return (RA, DEC)
else:
return RA or DEC
def HMS2degDEC(dec='', ra=''):#Convert from form Dec to Degree Dec(Gaia Form)
RA, DEC, rs, ds = '', '', 1, 1
if dec:
D, M, S, *_ = [float(i) for i in chain(dec.split(), defaults)]
S = S[0] if S else 0
if str(D)[0] == '-':
ds, D = -1, abs(D)
deg = D + (M/60) + (S/3600)
DEC = '{0}'.format(deg*ds)
if ra and dec:
return (RA, DEC)
else:
return RA or DEC
count=1
csv_file='test1.csv'#Data Storing File for Gaia
data = pd.read_csv(csv_file, error_bad_lines=False)#Ignore the bad lines
radata=data['R.A.']#get RA
decdata=data['Dec.']#get dec
agedata=data['Age(Myr)']#get Age
diamaterdata=data['Diameter']#get Diameter later converted to FOV
ra=[]#cleaned RA
dec=[]#cleaned Dec
age=[]#Cleaned age
csv_files=['M42.csv', 'Horsehead.csv', 'M93.csv', 'IrisTrain.csv']#Pre exsisting data
ages=[3, 6, 25, 0.055]#pre exsisting data's age
diameter=[]#Diameter cleaned data
gooddata=[]#Overall data storage for cleaned data
for i in range(len(radata)):#cleaning RA data and converting
if(isNaN(radata[i])):
ra.append(0)
else:
ra.append(HMS2deg(radata[i]))
print(ra)
for i in range(len(decdata)):#Cleaning Dec Data and converting
if(isNaN(decdata[i])):
dec.append(0)
else:
dec.append(HMS2degDEC(decdata[i]))
print(dec)
for i in range(len(diamaterdata)):#cleaning diameter data and converting to FOV
if(isNaN(diamaterdata[i])):
diameter.append(0)
else:
diameter.append(((diamaterdata[i])/3600)*100)
print(diameter)
for i in range(len(ra)):#Modified Query for each object
query1=""" SELECT bp_rp, parallax, pmra, pmdec, phot_g_mean_mag AS gp
FROM gaiadr2.gaia_source
WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
"""
query1=query1+" CIRCLE('ICRS'," +str(ra[i])+","+ str(dec[i])+","+str(diameter[i])+")"+")"
string2="""
AND phot_g_mean_flux_over_error > 50
AND phot_rp_mean_flux_over_error > 20
AND phot_bp_mean_flux_over_error > 20
AND visibility_periods_used > 8
"""
print(query1)
query1=query1+string2
try:#Try the following code
job = Gaia.launch_job(query1)#Launch query to gaia webpage
print(job)
results = job.get_results()#get results
ascii.write(results, 'values'+str(count)+'.csv', format='csv', fast_writer=False)
csv_files.append('values'+str(count)+'.csv')#store in CSV
ages.append(agedata[i])#append data
print(ages)
count+=1#avoid re-writing CSV file by creating different ones
except:#If the code throws any error, usually 'can't query' it will ignore the file, another filter to clean out any useless or bad data
continue
"""
End of Cleaning and Gathering Data
"""
"""
Training and Creating Model with the data
"""
arr2=[]
datasetY=[]
datasetX=[]
Y=[]
av=0
count=[]
count2=[]
MAD=[]
"""
def adjR(x, y, degree):
results = {}
coeffs = np.polyfit(x, y, degree)
p = np.poly1d(coeffs)
yhat = p(x)
ybar = np.sum(y)/len(y)
ssreg = np.sum((yhat-ybar)**2)
sstot = np.sum((y - ybar)**2)
results['r_squared'] = 1- (((1-(ssreg/sstot))*(len(y)-1))/(len(y)-degree-1)
return results
original accuracy calculation
"""
"""
def objective(x, a, b, c):
return a * x + b
needed for scipy modeling, polyfit was more accurate
"""
"""
Line 59-68 checks if CSV data is NAN if it is it will ignore the value and only take the data that can be used
"""
count=0
for i in range(len(csv_files)):
data=pd.read_csv(csv_files[i])
arr=data['gp']
arr2=data['bp_rp']
for i in range(len(arr2)):
if(isNaN(arr2[i])):
continue
elif(13<=arr[i]<=19):
datasetX.append(arr2[i])
datasetY.append(arr[i])
count+=1
mad=stats.median_absolute_deviation(datasetY)#Calculate MAD for Magnitude
mad2=stats.median_absolute_deviation(datasetX)#Calculate MAD for Color
madav=(mad+mad2)/2#Total MAD
MAD.append(count)#Appending to an Array for training and plotting
datasetX.clear()#Clearing for next Iteration
datasetY.clear()#Clearing for next Iteration
count=0
"""
Plotting data and Traning
"""
ages3=[]
MAD2=[]
ages2 = [4000 if math.isnan(i) else i for i in ages]#ignore any age nan values
print(len(ages3))
print(len(MAD))
MAD=[1.5 if math.isnan(i) else i for i in MAD]#ignore any MAD computation error values
for i in range(len(MAD)):
if(-500<=MAD[i]<=1500 and -25<=ages2[i]<170 or (100<=MAD[i]<=1262) and (278<=ages2[i]<=5067) or (-20<=MAD[i]<=20) and (3900<=ages2[i]<=4100) or (2642<=MAD[i]<=4750) and (0<=ages2[i]<=200) or (7800<=MAD[i]<=315800) and (0<=ages2[i]<=20)):
continue
else:
ages3.append(float(ages2[i]))
MAD2.append(float(MAD[i]))
fig = plt.figure()
ax1 = fig.add_subplot('111')
ax1.scatter(ages3, MAD2, color='blue')
plt.ylim(-7800,315800)
polyline = np.linspace(-5, 9000, 20)
mod1 = np.poly1d(np.polyfit(ages3, MAD2, 2))#Train for a function of degree 2
predict = np.poly1d(mod1)
ax1.plot(polyline,mod1(polyline), color='red')
print(np.interp(0.795, mod1(polyline),polyline))
print(mod1)#print model
plt.show()
"""
End of Training and Creating model/End of Script
"""
Please focus on this part, the querying section:
for i in range(len(ra)):  # modified query for each object
    query1 = """ SELECT bp_rp, parallax, pmra, pmdec, phot_g_mean_mag AS gp
    FROM gaiadr2.gaia_source
    WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
    """
    query1 = query1 + " CIRCLE('ICRS'," + str(ra[i]) + "," + str(dec[i]) + "," + str(diameter[i]) + ")" + ")"
    string2 = """
    AND phot_g_mean_flux_over_error > 50
    AND phot_rp_mean_flux_over_error > 20
    AND phot_bp_mean_flux_over_error > 20
    AND visibility_periods_used > 8
    """
    print(query1)
    query1 = query1 + string2
    try:
        job = Gaia.launch_job(query1)  # launch query to the Gaia service
        print(job)
        results = job.get_results()  # get results
        ascii.write(results, 'values' + str(count) + '.csv', format='csv', fast_writer=False)
        csv_files.append('values' + str(count) + '.csv')  # store in CSV
        ages.append(agedata[i])  # append data
        print(ages)
        count += 1  # avoid re-writing the CSV file by creating different ones
    except:  # if the query throws any error (usually "can't query"), skip this object
        continue
Thank you for your time. I know this is really unusual.
After removing the try/except, these are the errors:
Traceback (most recent call last):
File "read.py", line 120, in <module>
job = Gaia.launch_job(query1)#Launch query to gaia webpage
File "C:\ProgramData\Anaconda3\lib\site-packages\astroquery\gaia\core.py", line 846, in launch_job
return TapPlus.launch_job(self, query=query, name=name,
File "C:\ProgramData\Anaconda3\lib\site-packages\astroquery\utils\tap\core.py", line 344, in launch_job
results = utils.read_http_response(response, output_format)
File "C:\ProgramData\Anaconda3\lib\site-packages\astroquery\utils\tap\xmlparser\utils.py", line 42, in read_http_response
result = APTable.read(data, format=astropyFormat)
File "C:\ProgramData\Anaconda3\lib\site-packages\astropy\table\connect.py", line 61, in __call__
out = registry.read(cls, *args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\astropy\io\registry.py", line 520, in read
data = reader(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\astropy\io\votable\connect.py", line 116, in read_table_votable
raise ValueError("No table found")
ValueError: No table found
Please note, this has been resolved. The reason is posted on their website (https://www.cosmos.esa.int/web/gaia/news): planned maintenance. For future reference, if your code stops working and it involves querying, head to their website; they have probably posted about it.
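Since the bare except in the loop above was hiding the real failure, here is a hedged sketch (the retry count and delay are my own arbitrary choices, not part of the original code) of a wrapper that surfaces each error and retries transient ones, so an outage shows up as logged failures instead of silently missing result files:

```python
import time

def launch_with_retry(launch, query, attempts=3, delay=5.0):
    """Run launch(query), retrying on failure and reporting each error.

    `launch` stands in for a callable like Gaia.launch_job. Catching
    Exception (rather than a bare except) keeps KeyboardInterrupt working
    and lets the actual error message be printed for diagnosis.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return launch(query)
        except Exception as exc:
            last_error = exc
            print(f'attempt {attempt} failed: {exc!r}')
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f'query failed after {attempts} attempts') from last_error
```

With something like `job = launch_with_retry(Gaia.launch_job, query1)` in the loop, a service-maintenance window produces a clear per-object error log rather than 2 successes out of 150 with no explanation.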

Create multiple barplots based off groupby conditions

I am trying to create multiple horizontal barplots for a dataset. The data deals with race times from a running race.
The dataframe has the following columns: Name, Age Group, Finish Time, Finish Place, Hometown, Times Ran The Race. Sample data below.
Name    Age Group  Finish Time  Finish Place  Hometown       Times Ran The Race
John    30-39      15.5         1             New York City  2
Mike    30-39      17.2         2             Denver         1
Travis  40-49      20.4         1             Louisville     3
James   40-49      22.1         2             New York City  1
I would like to create a bar plot similar to what is shown below. There would be 1 bar chart per age group, fastest runner on bottom of chart, runner name with city and number of times ran the race below their name.
Do I need a for loop, or would a simple groupby work? The number and size of the age groups can vary from race to race, so they are not constant but depend on the dataframe used for each race.
I employed a looping process. I use the extraction by age group as a temporary dataframe, then accumulate the label information for the multi-line axis labels so it can be reused. The accumulated label information is decomposed into strings and stored in a new list. Finally, I draw a horizontal bar graph and update the labels on the axis.
for ag in df['Age Group'].unique():
    label_all = []
    tmp = df[df['Age Group'] == ag]
    labels = [[x, y, z] for x, y, z in zip(tmp.Name.values, tmp.Hometown.values, tmp['Times Ran The Race'].values)]
    for k in range(len(labels)):
        label_all.append(labels[k])
    l_all = []
    for l in label_all:
        lbl = l[0] + '\n' + l[1] + '\n' + str(l[2]) + ' Time'
        l_all.append(lbl)
    ax = tmp[['Name', 'Finish Time']].plot(kind='barh', legend=False)
    ax.set_title(ag + ' Age Group')
    ax.set_yticklabels([l_all[x] for x in range(len(l_all))])
    ax.grid(axis='x')
    for i in ['top', 'bottom', 'left', 'right']:
        ax.spines[i].set_visible(False)
Here's a quite compact solution. The only tricky part is the ordinal number, if you really want to have that; I copied the lambda solution from Ordinal numbers replacement.
Give this a try, and please upvote the answer if you like it.
import matplotlib.pyplot as plt

# note: integer division (//) is needed here so e.g. 11 maps to "11th", not "11st"
ordinal = lambda n: "{}{}".format(n, "tsnrhtdd"[(n // 10 % 10 != 1) * (n % 10 < 4) * n % 10::4])

for i, a in enumerate(df['Age Group'].unique()):
    plt.figure(i)
    dfa = df.loc[df['Age Group'] == a].copy()
    dfa['Info'] = dfa.Name + '\n' + dfa.Hometown + '\n' + \
                  [ordinal(row) for row in dfa['Times Ran The Race']] + ' Time'
    plt.barh(dfa.Info, dfa['Finish Time'])
    plt.title(f'{a} Age Group')
    plt.xlabel("Time (Minutes)")
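On the "for loop or groupby" question: groupby itself yields one sub-frame per age group, so it can drive the loop directly and the boolean filtering disappears. A sketch using the sample data from the question (column names assumed to match the table; the label format is simplified):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data mirroring the table in the question
df = pd.DataFrame({
    'Name': ['John', 'Mike', 'Travis', 'James'],
    'Age Group': ['30-39', '30-39', '40-49', '40-49'],
    'Finish Time': [15.5, 17.2, 20.4, 22.1],
    'Hometown': ['New York City', 'Denver', 'Louisville', 'New York City'],
    'Times Ran The Race': [2, 1, 3, 1],
})

# groupby yields (group key, sub-frame) pairs, one per age group, however
# many groups the race happens to have
for age_group, grp in df.groupby('Age Group'):
    # barh draws the first row at the bottom, so ascending sort puts the
    # fastest runner at the bottom of the chart
    grp = grp.sort_values('Finish Time')
    labels = (grp['Name'] + '\n' + grp['Hometown'] + '\n'
              + grp['Times Ran The Race'].astype(str) + ' Time(s)')
    plt.figure()
    plt.barh(labels, grp['Finish Time'])
    plt.title(f'{age_group} Age Group')
    plt.xlabel('Time (Minutes)')
```

The number of figures adapts automatically to however many distinct age groups the dataframe contains.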

BitCoin Algo not iterating through historical data correctly

I'm creating a simple trading backtester for Bitcoin, but I'm having trouble with the for loops in my code. The current code is based on two simple moving averages, q and z (currently for learning purposes, no real strategy). info is a dataframe holding Bitcoin historical data from a CSV file. There seems to be an out-of-bounds error and I can't figure it out. Any help would be appreciated.
import pandas as pd
import numpy as np

cash = 10000
file = 'BTC-USD.csv'
data = pd.read_csv(file)
y = data['Adj Close'][1000:]
x = data['Date'][1000:]
v = data['Volume'][1000:]
h = data['High'][1000:]
l = data['Low'][1000:]

def movAvg(values, time):
    times = np.repeat(1.0, time)/time
    sma = np.convolve(values, times, 'valid')
    return sma

z = movAvg(y, 12)
q = movAvg(y, 9)
SP = len(x[50-1:])

def AlgoCal(account, info):
    #i = 1050
    bought = False
    test = []
    for x in info.index:
        if q[x] < z[x]:
            if bought == False:
                temp = info[x]
                account = account - info[x]
                test.append(account)
                bought = True
        elif q[x] > z[x]:
            if bought == True:
                temp = info[x]
                account = account + info[x]
                test.append(account)
                bought = False
        else:
            print("Error")
    return (test)

money = AlgoCal(cash, y)
print(money)
Sample Data from Yahoo Bitcoin csv
Date,Open,High,Low,Close,Adj Close,Volume
2014-09-17,465.864014,468.174011,452.421997,457.334015,457.334015,21056800
2014-09-18,456.859985,456.859985,413.104004,424.440002,424.440002,34483200
........
........
2020-05-21,9522.740234,9555.242188,8869.930664,9081.761719,9081.761719,39326160532
2020-05-22,9080.334961,9232.936523,9008.638672,9182.577148,9182.577148,29810773699
2020-05-23,9185.062500,9302.501953,9118.108398,9209.287109,9209.287109,27727866812
2020-05-24,9196.930664,9268.914063,9165.896484,9268.914063,9268.914063,27658280960
Error:
Traceback (most recent call last):
File "main.py", line 47, in <module>
money = AlgoCal(cash,y)
File "main.py", line 31, in AlgoCal
if q[x]<z[x]:
IndexError: index 1066 is out of bounds for axis 0 with size 1066
Your moving averages have two different lengths. One is 12 periods and the other is 9 periods. When you try to compare them in AlgoCal your short one runs out and gives you the out of bounds error.
If you are going to compare moving averages in this way, you need to add a minimum period at the beginning to only start when both averages are available.
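A hedged sketch of that alignment (variable names follow the question, but the trim logic is my own suggestion, not the asker's code): np.convolve(..., 'valid') returns len(values) - window + 1 points, so the two averages can be right-aligned, since both end on the most recent price, and compared only over their common length.

```python
import numpy as np

def mov_avg(values, window):
    # 'valid' mode yields len(values) - window + 1 trailing averages
    kernel = np.repeat(1.0, window) / window
    return np.convolve(values, kernel, 'valid')

prices = np.arange(100, dtype=float)   # stand-in for the Adj Close series
slow = mov_avg(prices, 12)             # 100 - 12 + 1 = 89 points
fast = mov_avg(prices, 9)              # 100 - 9 + 1 = 92 points

# Both series end on the last price, so drop the extra leading points of
# the longer one and iterate only over the overlap.
n = min(len(slow), len(fast))
slow, fast = slow[-n:], fast[-n:]

for i in range(n):                     # safe: both arrays now have length n
    signal = 'buy' if fast[i] > slow[i] else 'sell'
```

Indexing both arrays by a shared 0-based position over the overlap also avoids the original mismatch between the sliced dataframe's index (starting at 1000) and the numpy arrays' positions (starting at 0).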

Is there a way to fix the maximum recursion level in Python 3?

I'm trying to build a state map for data across a decade, with a slider to select the year displayed on the map. The sort of display where a user can pick 2014 and the map will show the data for 2014.
I merged the data I want to show with the appropriate shapefile. I end up with 733 rows and 5 columns - as many as 9 rows per county with the same county name and coordinates.
Everything seems to be okay until I try to build the map. This error message is returned:
OverflowError: Maximum recursion level reached
I've tried resetting the recursion limit using sys.setrecursionlimit but can't get past that error.
I haven't been able to find an answer on SO that I understand, so I'm hoping someone can point me in the right direction.
I'm using bokeh and json to build the map. I've tried using sys.setrecursionlimit but I get the same error message no matter how high I go.
I used the same code last week but couldn't get data from different years to display because I was using a subset of the data. Now that I've fixed that, I'm stuck on this error message.
def json_data(selectedYear):
    yr = selectedYear
    murders = murder[murder['Year'] == yr]
    merged = mergedfinal
    merged.fillna('0', inplace=True)
    merged_json = json.loads(merged.to_json())
    json_data = json.dumps(merged_json)
    return json_data

geosource = GeoJSONDataSource(geojson=json_data(2018))

palette = brewer['YlOrRd'][9]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette=palette, low=0, high=60, nan_color='#d9d9d9')
hover = HoverTool(tooltips=[('County/City', '@NAME'), ('Victims', '@Victims')])
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8, width=500, height=30,
                     border_line_color=None, location=(0, 0),
                     orientation='horizontal')
p = figure(title='Firearm Murders in Virginia', plot_height=600, plot_width=950,
           toolbar_location=None, tools=[hover])
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.visible = False
p.yaxis.visible = False
p.patches('xs', 'ys', source=geosource,
          fill_color={'field': 'Victims', 'transform': color_mapper},
          line_color='black', line_width=0.25, fill_alpha=1)
p.add_layout(color_bar, 'below')

def update_plot(attr, old, new):
    year = slider.value
    new_data = json_data(year)
    geosource.geojson = new_data
    p.title.text = 'Firearm Murders in VA'

slider = Slider(title='Year', start=2009, end=2018, step=1, value=2018)
slider.on_change('value', update_plot)
layout = column(p, widgetbox(slider))
curdoc().add_root(layout)
output_notebook()
show(layout)
The same code worked well enough when I was using a more limited dataset. Here is the full context of the error message:
OverflowError Traceback (most recent call last)
<ipython-input-50-efd821491ac3> in <module>()
8 return json_data
9
---> 10 geosource = GeoJSONDataSource(geojson = json_data(2018))
11
12 palette=brewer['YlOrRd'][9]
<ipython-input-50-efd821491ac3> in json_data(selectedYear)
4 merged = mergedfinal
5 merged.fillna('0', inplace = True)
----> 6 merged_json = json.loads(merged.to_json())
7 json_data = json.dumps(merged_json)
8 return json_data
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
1087 force_ascii=force_ascii, date_unit=date_unit,
1088 default_handler=default_handler,
-> 1089 lines=lines)
1090
1091 def to_hdf(self, path_or_buf, key, **kwargs):
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
37 obj, orient=orient, date_format=date_format,
38 double_precision=double_precision, ensure_ascii=force_ascii,
---> 39 date_unit=date_unit, default_handler=default_handler).write()
40 else:
41 raise NotImplementedError("'obj' should be a Series or a DataFrame")
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in write(self)
83 date_unit=self.date_unit,
84 iso_dates=self.date_format == 'iso',
---> 85 default_handler=self.default_handler)
86
87
OverflowError: Maximum recursion level reached
I had a similar problem!
I narrowed my problem down to the .to_json step. For some reason when I merged my geopandas file on the right:
Neighbourhoods_merged = df_2016.merge(gdf_neighbourhoods, how = "left", on = "Neighbourhood#")
I ran into the recursion error. I found success by switching the two:
Neighbourhoods_merged = gdf_neighbourhoods.merge(df_2016, how = "left", on = "Neighbourhood#")
This is what worked for me. Infuriatingly I have no idea why this works, but I hope this might help someone else with the same error!
I solved this problem by changing the merge direction.
So, if you want to merge two dataframes A and B, where A is a geopandas.geodataframe.GeoDataFrame and B is a pandas.core.frame.DataFrame, you should merge them as pd.merge(A, B, on='some column'), not in the opposite direction.
I think the maximum recursion error comes from executing the .to_json() method on a plain pandas dataframe that has a POLYGON-type column in it.
When you change the merge direction so the result is a GeoDataFrame, .to_json() executes without problems even with a POLYGON-type column.
I spent 2 hours on this, and I hope it helps you.
If you need a higher recursion depth, you can set it using sys:
import sys
sys.setrecursionlimit(1500)
That being said, your error is most likely the result of an infinite recursion, which may be the case if increasing the depth doesn't fix it.
