I'm trying to build a state map for data across a decade, with a slider to select the year displayed on the map. The sort of display where a user can pick 2014 and the map will show the data for 2014.
I merged the data I want to show with the appropriate shapefile. I end up with 733 rows and 5 columns - as many as 9 rows per county with the same county name and coordinates.
Everything seems to be okay until I try to build the map. This error message is returned:
OverflowError: Maximum recursion level reached
I've tried raising the recursion limit with sys.setrecursionlimit, but I can't get past the error no matter how high I go.
I haven't been able to find an answer on SO that I understand, so I'm hoping someone can point me in the right direction.
I'm using bokeh and json to build the map.
I used the same code last week but couldn't get data from different years to display because I was using a subset of the data. Now that I've fixed that, I'm stuck on this error message.
def json_data(selectedYear):
    yr = selectedYear
    murders = murder[murder['Year'] == yr]
    merged = mergedfinal
    merged.fillna('0', inplace=True)
    merged_json = json.loads(merged.to_json())
    json_data = json.dumps(merged_json)
    return json_data
geosource = GeoJSONDataSource(geojson = json_data(2018))
palette=brewer['YlOrRd'][9]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 60, nan_color = '#d9d9d9')
hover = HoverTool(tooltips = [('County/City', '@NAME'), ('Victims', '@Victims')])
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8, width=500, height=30,
                     border_line_color=None, location=(0, 0),
                     orientation='horizontal')
p = figure(title = 'Firearm Murders in Virginia', plot_height = 600 , plot_width = 950, toolbar_location = None, tools = [hover])
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.visible=False
p.yaxis.visible=False
p.patches('xs', 'ys', source=geosource,
          fill_color={'field': 'Victims', 'transform': color_mapper},
          line_color='black', line_width=0.25, fill_alpha=1)
p.add_layout(color_bar, 'below')
def update_plot(attr, old, new):
    year = slider.value  # the slider instance, not the Slider class
    new_data = json_data(year)
    geosource.geojson = new_data
    p.title.text = 'Firearm Murders in VA'
slider = Slider(title = 'Year', start = 2009, end = 2018, step = 1, value = 2018)
slider.on_change('value', update_plot)
layout = column(p,widgetbox(slider))
curdoc().add_root(layout)
output_notebook()
show(layout)
The same code worked well enough when I was using a more limited dataset. Here is the full context of the error message:
OverflowError Traceback (most recent call last)
<ipython-input-50-efd821491ac3> in <module>()
8 return json_data
9
---> 10 geosource = GeoJSONDataSource(geojson = json_data(2018))
11
12 palette=brewer['YlOrRd'][9]
<ipython-input-50-efd821491ac3> in json_data(selectedYear)
4 merged = mergedfinal
5 merged.fillna('0', inplace = True)
----> 6 merged_json = json.loads(merged.to_json())
7 json_data = json.dumps(merged_json)
8 return json_data
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
1087 force_ascii=force_ascii, date_unit=date_unit,
1088 default_handler=default_handler,
-> 1089 lines=lines)
1090
1091 def to_hdf(self, path_or_buf, key, **kwargs):
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
37 obj, orient=orient, date_format=date_format,
38 double_precision=double_precision, ensure_ascii=force_ascii,
---> 39 date_unit=date_unit, default_handler=default_handler).write()
40 else:
41 raise NotImplementedError("'obj' should be a Series or a DataFrame")
/Users/mcuddy/anaconda/lib/python3.6/site-packages/pandas/io/json.py in write(self)
83 date_unit=self.date_unit,
84 iso_dates=self.date_format == 'iso',
---> 85 default_handler=self.default_handler)
86
87
OverflowError: Maximum recursion level reached
I had a similar problem!
I narrowed my problem down to the .to_json step. For some reason when I merged my geopandas file on the right:
Neighbourhoods_merged = df_2016.merge(gdf_neighbourhoods, how = "left", on = "Neighbourhood#")
I ran into the recursion error. I found success by switching the two:
Neighbourhoods_merged = gdf_neighbourhoods.merge(df_2016, how = "left", on = "Neighbourhood#")
This is what worked for me. Infuriatingly I have no idea why this works, but I hope this might help someone else with the same error!
I solved this problem by changing the merge direction.
So, if you want to merge two dataframes A and B, where A is a geopandas.geodataframe.GeoDataFrame and B is a pandas.core.frame.DataFrame, you should merge them as pd.merge(A, B, on='some_column'), not in the opposite direction.
I think the maximum recursion error comes from calling the .to_json() method on a plain pandas dataframe that has a POLYGON column in it.
When you change the direction of the merge so the result is a GeoDataFrame, .to_json() executes without problems even though it has a POLYGON column in it.
I spent 2 hours on this, and I hope this can help you.
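Why the left frame matters can be sketched with plain pandas, no geopandas required: pandas builds the merge result with the left frame's _constructor, so whichever class sits on the left is the class you get back. GeoLike below is a hypothetical stand-in for GeoDataFrame, not a real geopandas class.

```python
import pandas as pd

# Hypothetical stand-in for GeoDataFrame: a DataFrame subclass that
# tells pandas to keep its own class for derived frames.
class GeoLike(pd.DataFrame):
    @property
    def _constructor(self):
        return GeoLike

gdf = GeoLike({"key": [1, 2], "geometry": ["poly_a", "poly_b"]})
df = pd.DataFrame({"key": [1, 2], "value": [10, 20]})

# The left frame's class wins: a subclass on the left is preserved,
# a plain DataFrame on the left produces a plain DataFrame.
assert type(gdf.merge(df, on="key")).__name__ == "GeoLike"
assert type(df.merge(gdf, on="key")).__name__ == "DataFrame"
```

The same logic applies to pd.merge(A, B, ...), which treats A as the left frame.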
If you need a higher recursion depth, you can set it using sys:
import sys
sys.setrecursionlimit(1500)
That being said, your error is most likely the result of an infinite recursion, which may be the case if increasing the depth doesn't fix it.
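A small sketch (with a hypothetical count_down helper) of what the limit does: raising it helps recursion that is deep but bounded, while infinite recursion exhausts any limit. Note also that in the question above the OverflowError comes out of pandas' C JSON serializer, which appears to enforce its own nesting cap that sys.setrecursionlimit does not control.

```python
import sys

def count_down(n):
    # bounded recursion: n nested calls, then the stack unwinds
    return 0 if n == 0 else 1 + count_down(n - 1)

sys.setrecursionlimit(5000)
print(count_down(2000))  # 2000 -- would raise RecursionError at the
                         # default limit of 1000
```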
Related
Does anybody know if it's possible to switch the colors, so that I can distinguish every row instead of every column? And how do I add a legend where I can see which player (one color for each player) has e.g. which pace?
My code is:
feldspieler = feldspieler["sofifa_id"]
skills = ['pace','shooting','passing','dribbling','defending','physic']
diagramm = plt.figure(figsize=(40,20))
plt.xticks(rotation=90,fontsize=20)
plt.yticks(fontsize=20)
plt.xlabel('Skills', fontsize=30)
plt.ylabel('Skill value', fontsize=30)
plt.title('Spielervergleich', fontsize = 40)
sns.set_palette("pastel")
for i in feldspieler:
    i = fifa_21.loc[fifa_21['sofifa_id'] == i]
    i = pd.DataFrame(i, columns=skills)
    sns.swarmplot(data=i, size=12)
Thanks a lot @Trevis.
Unfortunately, it still does not work.
Here you can find a screenshot of the dataset and the code that the graphic accesses.
while True:
    team = input("Welches Team suchen Sie?: ")  # "Which team are you looking for?"
    if team in fifa_21.values:
        break
    else:
        print("Dieser Verein existiert nicht. Bitte achten Sie auf eine korrekte Schreibweise.")  # "This club does not exist. Please check the spelling."
gesuchtes_team = fifa_21.loc[(fifa_21['club_name'] == team)]
spieler_verein = gesuchtes_team[["sofifa_id","short_name","nationality","age","player_positions","overall","value_eur"]]
spieler_verein = pd.DataFrame(spieler_verein)
spieler_verein = spieler_verein.reset_index(drop=True)
spieler_verein
feldspieler = spieler_verein.loc[spieler_verein.player_positions != "GK", :]
feldspieler = feldspieler.reset_index(drop=True)
feldspieler
feldspieler = feldspieler["sofifa_id"]
skills = ['pace','shooting','passing','dribbling','defending','physic']
diagramm = plt.figure(figsize=(40,20))
plt.xticks(rotation=90,fontsize=20)
plt.yticks(fontsize=20)
plt.xlabel('Skills', fontsize=30)
plt.ylabel('Skill value', fontsize=30)
plt.title('Spielervergleich', fontsize = 40)
sns.set_palette("pastel")
for i in feldspieler:
    i = fifa_21.loc[fifa_21['sofifa_id'] == i]
    i = pd.DataFrame(i, columns=skills)
    sns.swarmplot(data=fifa_21, x="skills", y="skill_value", hue="sofifa_id")
    #sns.swarmplot(x=skills, y=pd.DataFrame(i, columns == skills), hue="sofifa_id", data=i, size=12)
Set the hue parameter to the value of the column you're interested in (sofifa_id). You can then provide the whole dataset at once to plot the data. The legend will be added automatically.
So you should have a DataFrame with a 'skills' column containing the different skills you have in x-axis here. If necessary, see the documentation for pd.melt, in particular the third example.
Then, assuming the default column name value for the melted values, call
sns.swarmplot(data=fifa_21, x="skills", y="value", hue="sofifa_id")
This is from the official swarmplot function documentation (here).
Edit: So, seeing your data, you should really use pd.melt like this:
(I'm considering one row per player, with distinct short_name values).
data = pd.melt(fifa_21, id_vars='short_name', var_name='skill',
               value_vars=['pace', 'shooting', 'passing', 'dribbling',
                           'defending', 'physic'])
sns.swarmplot(x='skill', y='value', hue='short_name', data=data)
melt will transform the columns and values from a wide format

short_name  pace  shooting
a_name      85    92

to a long table format

short_name  skill     value
a_name      pace      85
a_name      shooting  92
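The wide-to-long step above can be reproduced as a tiny runnable sketch, using the example values from the tables:

```python
import pandas as pd

# wide format: one row per player, one column per skill
wide = pd.DataFrame({"short_name": ["a_name"],
                     "pace": [85],
                     "shooting": [92]})

# long format: one row per (player, skill) pair
long_df = pd.melt(wide, id_vars="short_name", var_name="skill",
                  value_vars=["pace", "shooting"])
# rows: (a_name, pace, 85) and (a_name, shooting, 92)
print(long_df)
```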
I am using code written by Victor Velasquez to extract data from raster files which contain daily precipitation data since 1981.
When I run the code, I get an error that some index is out of bounds. I did a little research and found that this is common and there are a lot of similar questions here, but I haven't been able to find the specific solution for this case.
The error:
IndexError Traceback (most recent call last)
<ipython-input-8-eff66ef74d73> in <module>
1 Pisco = Extract_Pisco()
----> 2 Pisco.DataPre()
3 Pisco.ExportExcel()
<ipython-input-7-6cf99336b9e1> in DataPre(self)
23 Band = Data.read(1)
24 X,Y = Data.index(self.x,self.y) #extraigo
---> 25 Pre = Band[X,Y]
26 self.ListPre.append(Pre) #agrego a lista
27
IndexError: index 158116290 is out of bounds for axis 0 with size 198
The part of the code pointed by the traceback is:
def DataPre(self):
    os.chdir(path)
    fileDir = path
    fileExt = r".tif"
    Lis = [_ for _ in os.listdir(fileDir) if _.endswith(fileExt)]
    Lis.sort()  # sort the .tif files
    Inicio = '1981-01-01.tif'
    Fin = '2018-07-31.tif'
    Rini = Lis.index(Inicio)
    Rend = Lis.index(Fin)
    self.Lis = Lis[Rini:Rend+1]
    self.ListPre = []
    for i in tnrange(0, len(self.Lis), desc="!! Extrayendo Datos !!"):  # "Extracting data"
        with rasterio.open(self.Lis[i]) as Data:
            Band = Data.read(1)
            X, Y = Data.index(self.x, self.y)  # extract
            Pre = Band[X, Y]
            self.ListPre.append(Pre)  # append to list
Thank you very much!
It looks like the file you are reading does not contain the geospatial point you are trying to find data for. (If this is incorrect please let me know).
You can add a statement to catch if a point is contained in the data:
Band = Data.read(1)
X, Y = Data.index(self.x, self.y)
# Band is a numpy array, so the bounds come from its shape (rows, cols)
if 0 <= X < Band.shape[0] and 0 <= Y < Band.shape[1]:
    Pre = Band[X, Y]
    self.ListPre.append(Pre)
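The guard can be exercised without rasterio; this sketch assumes Data.read(1) returns a 2-D numpy array whose axis 0 is rows (the traceback's "axis 0 with size 198"):

```python
import numpy as np

band = np.arange(6).reshape(3, 2)  # toy raster: 3 rows x 2 columns

def sample(band, row, col):
    # Return the pixel only when (row, col) lies inside the array,
    # mirroring the bounds check suggested above.
    if 0 <= row < band.shape[0] and 0 <= col < band.shape[1]:
        return band[row, col]
    return None

print(sample(band, 158116290, 0))  # None: row index far outside the raster
print(sample(band, 1, 1))          # 3
```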
I have a table that contains the number of times a student accessed an activity.
df_act5236920.head()
   activities      studs
0         3.0  student 1
1         4.0  student 10
2         5.0  student 11
3         6.0  student 12
4         2.0  student 13
5         4.0  student 14
6        19.0  student 15
If I try to add the hover tool to the bar chart created by this dataframe through the code below:
from bokeh.charts import Bar
from bokeh.models import Legend, HoverTool
from collections import OrderedDict
TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"
bar = Bar(df_act5236920, values='activities', label='studs',
          title="Activity 5236920 performed by students",
          xlabel="Students", ylabel="Activity", legend=False, tools=TOOLS)
hover = bar.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
    ("Student Name", "@studs"),
    ("Access Count", "@activities"),
])
show(bar)
When I hover over the bar chart, it shows the student value but not the activities values, I even tried using "$activities" but the result is still the same.
I tried using ColumnDataSource instead of DataFrame based on other stack overflow questions I read, as is apparent in the code below:
source = ColumnDataSource(ColumnDataSource.from_df(df_act5236920))
from collections import OrderedDict
TOOLS = "pan,wheel_zoom,box_zoom,reset,hover,save"
bar = Bar('studs','activities',source=source, title = "Activity 5236920 performed by students",tools=TOOLS)
hover = bar.select_one(HoverTool)
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
    ("Student Name", "@studs"),
    ("Access Count", "@activities"),
])
show(bar)
It gives me the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-76-81505464c390> in <module>()
3 # bar = Bar(df_act5236920,values='activities',label='studs',title = "Activity 5236920 performed by students",
4 # xlabel="Students",ylabel="Activity",legend=False,tools=TOOLS)
----> 5 bar = Bar('studs','activities',source=source, title = "Activity 5236920 performed by students",tools=TOOLS)
6 hover = bar.select_one(HoverTool)
7 hover.point_policy = "follow_mouse"
C:\Anaconda2\lib\site-packages\bokeh\charts\builders\bar_builder.pyc in Bar(data, label, values, color, stack, group, agg, xscale, yscale, xgrid, ygrid, continuous_range, **kw)
319 kw['y_range'] = y_range
320
--> 321 chart = create_and_build(BarBuilder, data, **kw)
322
323 # hide x labels if there is a single value, implying stacking only
C:\Anaconda2\lib\site-packages\bokeh\charts\builder.pyc in create_and_build(builder_class, *data, **kws)
66 # create the new builder
67 builder_kws = {k: v for k, v in kws.items() if k in builder_props}
---> 68 builder = builder_class(*data, **builder_kws)
69
70 # create a chart to return, since there isn't one already
C:\Anaconda2\lib\site-packages\bokeh\charts\builder.pyc in __init__(self, *args, **kws)
292 # make sure that the builder dimensions have access to the chart data source
293 for dim in self.dimensions:
--> 294 getattr(getattr(self, dim), 'set_data')(data)
295
296 # handle input attrs and ensure attrs have access to data
C:\Anaconda2\lib\site-packages\bokeh\charts\properties.pyc in set_data(self, data)
170 data (`ChartDataSource`): the data source associated with the chart
171 """
--> 172 self.selection = data[self.name]
173 self._chart_source = data
174 self._data = data.df
TypeError: 'NoneType' object has no attribute '__getitem__'
I even tried creating the ColumnDataSource from scratch by passing the columns of the dataframe to it in the form of a list of values, but I still got the same error as the one shown above
source = ColumnDataSource(data=dict(
    studs=students,
    activities=activity_5236920,
))
I'm having the same issue when I try to implement the hover tool on a heatmap as well. Can anyone help with how to fix this?
So, after going through a lot of the documentation, I've finally figured out a few things.
Firstly, the NoneType error was due to the fact that for a bar chart you need to pass the dataframe as well as the ColumnDataSource for the chart to display.
So the code needed to be:
bar = Bar(df_act5236920, values='activities', label='studs',
          title="Activity 5236920 performed by students",
          xlabel="Students", ylabel="Activity", legend=False, tools=TOOLS, source=source)
Notice how the dataframe name and source=source are both mentioned in the Bar() method.
For the second issue of the value not being displayed, I used @height, which essentially displayed the height of the selected bar, which in this case was the count value.
hover.tooltips = OrderedDict([
    ("Student Name", "@studs"),
    ("Access Count", "@height"),
])
For the student name value, @x and @studs both work. But the one thing I still couldn't resolve is that although I have passed the ColumnDataSource "source", it didn't really accomplish anything for me: when I try to use @activities in hover.tooltips, it still gives me a response of "???". So I'm not sure what that is all about, and it is an issue I'm still struggling with in another time series visualisation that I'm trying to build.
I have two GeoDataFrames. One is of the state of Iowa, while the other is of forecast rain over the next 72 hours for North America. I want to create a GeoDataFrame of the rain forecast where it overlies the state of Iowa. But I get an error.
state_rain = gpd.overlay(NA_rain,iowa,how='intersection')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-ba8264ed63c2> in <module>()
3 #ws_usa[['WTRSHD_ID','QPF']].groupby('WTRSHD_ID').max().reset_index()
4 #state_rain = sjoin(usa_r,usa,how='inner',op='intersects')
----> 5 state_rain = gpd.overlay(usa_r,joined_states,how='intersection')
6 ws_state = gpd.overlay(ws,joined_states,how='intersection')
7 #print ws_usa.loc[ws_usa.WTRSHD_ID == 'IA-04']['QPF']
C:\Anaconda2\lib\site-packages\geopandas\tools\overlay.pyc in overlay(df1, df2, how, use_sindex)
95
96 # Collect the interior and exterior rings
---> 97 rings1 = _extract_rings(df1)
98 rings2 = _extract_rings(df2)
99 mls1 = MultiLineString(rings1)
C:\Anaconda2\lib\site-packages\geopandas\tools\overlay.pyc in _extract_rings(df)
50 # geom from layer is not valid attempting fix by buffer 0"
51 geom = geom.buffer(0)
---> 52 rings.append(geom.exterior)
53 rings.extend(geom.interiors)
54
AttributeError: 'MultiPolygon' object has no attribute 'exterior'
I checked for type == 'MultiPolygon', but neither GeoDataFrame contains any.
print NA_rain[NA_rain.geometry.type == 'MulitPolygon']
print iowa[iowa.geometry.type == 'MultiPolygon']
Empty GeoDataFrame
Columns: [END_TIME, ID, ISSUE_TIME, PRODUCT, QPF, START_TIME, UNITS, VALID_TIME, geometry]
Index: []
Empty GeoDataFrame
Columns: [sid, AFFGEOID, ALAND, AWATER, GEOID, LSAD, NAME, STATEFP, STATENS, STUSPS, geometry]
Index: []
If I do the following, the intersection works.
NA_rain.geometry = NA_rain.geometry.map(lambda x: x.convex_hull)
My question is twofold: 1. Why don't any MultiPolygons show up in my NA_rain GeoDataFrame? 2. Besides turning every Polygon into a convex_hull, which ruins the detailed contours of the Polygon, how would you suggest dealing with the MultiPolygon issue?
I agree with @jdmcbr. I suspect that at least one of the features in NA_rain is a MultiPolygon which did not get detected, since the condition you showed is misspelled (MulitPolygon instead of MultiPolygon).
If your dataframe has MultiPolygons, you can convert all of them to Polygons. One dirty way is to apply the list() function to each MultiPolygon and then explode into multiple rows:
geom = NA_rain.pop('geometry')
geom = geom.apply(lambda x: list(x) if isinstance(x, MultiPolygon) else x).explode()
NA_rain = NA_rain.join(geom, how='inner')
Note that the joining in line 3 duplicates the other attributes of the dataframe for all Polygons of the MultiPolygon, including feature identifiers, which you may want to change later, depending on your task.
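The list-then-explode idea can be sketched with plain pandas, using lists of strings as stand-ins for MultiPolygon parts (geopandas also ships a GeoDataFrame.explode() method that performs this directly on geometries):

```python
import pandas as pd

df = pd.DataFrame({"id": ["A", "B"],
                   "geometry": [["part1", "part2"], ["part3"]]})

# one output row per geometry part; the other columns are duplicated
exploded = df.explode("geometry", ignore_index=True)

assert len(exploded) == 3
assert list(exploded["id"]) == ["A", "A", "B"]
```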
I'm using Pandas version 0.12.0 on Ubuntu 13.04. I'm trying to create a 5D panel object to contain some EEG data split by condition.
How I'm choosing to structure my data:
Let me begin by demonstrating my use of pandas.core.panelnd.create_nd_panel_factory.
Subject = panelnd.create_nd_panel_factory(
    klass_name='Subject',
    axis_orders=['setsize', 'location', 'vfield', 'channels', 'samples'],
    axis_slices={'labels': 'location',
                 'items': 'vfield',
                 'major_axis': 'major_axis',
                 'minor_axis': 'minor_axis'},
    slicer=pd.Panel4D,
    axis_aliases={'ss': 'setsize',
                  'loc': 'location',
                  'vf': 'vfield',
                  'major': 'major_axis',
                  'minor': 'minor_axis'}
    # stat_axis=2 # dafuq is this?
)
Essentially, the organization is as follows:
setsize: an experimental condition, can be 1 or 2
location: an experimental condition, can be "same", "diff" or None
vfield: an experimental condition, can be "lvf" or "rvf"
The last two axes correspond to a DataFrame's major_axis and minor_axis. They have been renamed for clarity:
channels: columns, the EEG channels (129 of them)
samples: rows, the individual samples. samples can be thought of as a time axis.
What I'm trying to do:
Each experimental condition (subject x setsize x location x vfield) is stored in its own tab-delimited file, which I am reading in with pandas.read_table, obtaining a DataFrame object. I want to create one 5-dimensional panel (i.e. Subject) for each subject, which will contain all experimental conditions (i.e. DataFrames) for that subject.
To start, I'm building a nested dictionary for each subject/Subject:
# ... do some boring stuff to get the text files, etc...
for _, factors in df.iterrows():
    # `factors` is a 5-tuple containing
    # (subject number, setsize, location, vfield,
    # and path to the tab-delimited file).
    sn, ss, loc, vf, path = factors
    eeg = pd.read_table(path, sep='\t', names=range(1, 129) + ['ref'], header=None)
    # build nested dict
    subjects.setdefault(sn, {}).setdefault(ss, {}).setdefault(loc, {})[vf] = eeg

# and now attempt to build `Subject`
for sn, d in subjects.iteritems():
    subjects[sn] = Subject(d)
Full stack trace
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-2-831fa603ca8f> in <module>()
----> 1 import_data()
/home/louist/Dropbox/Research/VSTM/scripts/vstmlib.py in import_data()
64
65 import ipdb; ipdb.set_trace()
---> 66 for sn, d in subjects.iteritems():
67 subjects[sn] = Subject(d)
68
/usr/local/lib/python2.7/dist-packages/pandas/core/panelnd.pyc in __init__(self, *args, **kwargs)
65 if 'dtype' not in kwargs:
66 kwargs['dtype'] = None
---> 67 self._init_data(*args, **kwargs)
68 klass.__init__ = __init__
69
/usr/local/lib/python2.7/dist-packages/pandas/core/panel.pyc in _init_data(self, data, copy, dtype, **kwargs)
250 mgr = data
251 elif isinstance(data, dict):
--> 252 mgr = self._init_dict(data, passed_axes, dtype=dtype)
253 copy = False
254 dtype = None
/usr/local/lib/python2.7/dist-packages/pandas/core/panel.pyc in _init_dict(self, data, axes, dtype)
293 raxes = [self._extract_axis(self, data, axis=i)
294 if a is None else a for i, a in enumerate(axes)]
--> 295 raxes_sm = self._extract_axes_for_slice(self, raxes)
296
297 # shallow copy
/usr/local/lib/python2.7/dist-packages/pandas/core/panel.pyc in _extract_axes_for_slice(self, axes)
1477 """ return the slice dictionary for these axes """
1478 return dict([(self._AXIS_SLICEMAP[i], a) for i, a
-> 1479 in zip(self._AXIS_ORDERS[self._AXIS_LEN - len(axes):], axes)])
1480
1481 #staticmethod
KeyError: 'location'
I understand that panelnd is an experimental feature, but I'm fairly certain that I'm doing something wrong. Can somebody please point me in the right direction? If it is a bug, is there something that can be done about it?
As usual, thank you very much in advance!
Working example. You needed to specify the mapping of your axes to the internal axes names via the slices. This fiddles with the internal structure, but the fixed names of pandas still exist (and are somewhat hardcoded via Panel/Panel4D), so you need to provide the mapping.
I would create a Panel4D first, then your Subject as I did below.
Please post on GitHub / here if you find more bugs. This is not a heavily used feature.
Output
<class 'pandas.core.panelnd.Subject'>
Dimensions: 3 (setsize) x 1 (location) x 1 (vfield) x 10 (channels) x 2 (samples)
Setsize axis: level0_0 to level0_2
Location axis: level1_0 to level1_0
Vfield axis: level2_0 to level2_0
Channels axis: level3_0 to level3_9
Samples axis: level4_1 to level4_2
Code
import pandas as pd
import numpy as np
from pandas.core import panelnd
Subject = panelnd.create_nd_panel_factory(
    klass_name='Subject',
    axis_orders=['setsize', 'location', 'vfield', 'channels', 'samples'],
    axis_slices={'location': 'labels',
                 'vfield': 'items',
                 'channels': 'major_axis',
                 'samples': 'minor_axis'},
    slicer=pd.Panel4D,
    axis_aliases={'ss': 'setsize',
                  'loc': 'labels',
                  'vf': 'items',
                  'major': 'major_axis',
                  'minor': 'minor_axis'})
subjects = dict()
for i in range(3):
    eeg = pd.DataFrame(np.random.randn(10, 2),
                       columns=['level4_1', 'level4_2'],
                       index=["level3_%s" % x for x in range(10)])
    loc, vf = ('level1_0', 'level2_0')
    subjects["level0_%s" % i] = pd.Panel4D({loc: {vf: eeg}})
print Subject(subjects)