Number of labels does not match samples on decision tree regression

Number of labels does not match samples on decision tree regression - python

Trying to run a decision tree regressor on my data, but whenever I try and run my code, I get this error
ValueError: Number of labels=78177 does not match number of samples=312706
#feature selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
target = ['sale_price']
train, test = train_test_split(housing_data, test_size=0.2)
regression_tree = DecisionTreeRegressor(criterion="entropy",random_state=100,
max_depth=4,min_samples_leaf=5)
regression_tree.fit(train,test)
I have added a sample of my code, hopefully this gives you more context to help better understand my question and problem:
{'Age of House at Sale': {0: 6,
1: 2016,
2: 92,
3: 42,
4: 90,
5: 2012,
6: 89,
7: 3,
8: 2015,
9: 104},
'AreaSource': {0: 2.0,
1: 7.0,
2: 2.0,
3: 2.0,
4: 2.0,
5: 2.0,
6: 2.0,
7: 2.0,
8: 2.0,
9: 2.0},
'AssessLand': {0: 9900.0,
1: 1571850.0,
2: 1548000.0,
3: 36532350.0,
4: 2250000.0,
5: 3110400.0,
6: 2448000.0,
7: 1354500.0,
8: 1699200.0,
9: 1282500.0},
'AssessTot': {0: 34380.0,
1: 1571850.0,
2: 25463250.0,
3: 149792400.0,
4: 27166050.0,
5: 5579990.0,
6: 28309500.0,
7: 23965650.0,
8: 3534300.0,
9: 11295000.0},
'BldgArea': {0: 2688.0,
1: 0.0,
2: 304650.0,
3: 2548000.0,
4: 356000.0,
5: 382746.0,
6: 290440.0,
7: 241764.0,
8: 463427.0,
9: 547000.0},
'BldgClass': {0: 72,
1: 89,
2: 80,
3: 157,
4: 150,
5: 44,
6: 92,
7: 43,
8: 39,
9: 61},
'BldgDepth': {0: 50.0,
1: 0.0,
2: 92.0,
3: 0.0,
4: 100.33,
5: 315.0,
6: 125.0,
7: 100.0,
8: 0.0,
9: 80.92},
'BldgFront': {0: 20.0,
1: 0.0,
2: 335.0,
3: 0.0,
4: 202.0,
5: 179.0,
6: 92.0,
7: 500.0,
8: 0.0,
9: 304.0},
'BsmtCode': {0: 5.0,
1: 5.0,
2: 5.0,
3: 5.0,
4: 2.0,
5: 5.0,
6: 2.0,
7: 2.0,
8: 5.0,
9: 5.0},
'CD': {0: 310.0,
1: 302.0,
2: 302.0,
3: 318.0,
4: 302.0,
5: 301.0,
6: 302.0,
7: 301.0,
8: 301.0,
9: 302.0},
'ComArea': {0: 0.0,
1: 0.0,
2: 304650.0,
3: 2548000.0,
4: 30000.0,
5: 11200.0,
6: 290440.0,
7: 27900.0,
8: 4884.0,
9: 547000.0},
'CommFAR': {0: 0.0,
1: 2.0,
2: 2.0,
3: 2.0,
4: 0.0,
5: 0.0,
6: 10.0,
7: 2.0,
8: 0.0,
9: 2.0},
'Council': {0: 41.0,
1: 33.0,
2: 33.0,
3: 46.0,
4: 33.0,
5: 33.0,
6: 33.0,
7: 33.0,
8: 33.0,
9: 35.0},
'Easements': {0: 0.0,
1: 0.0,
2: 0.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0},
'ExemptLand': {0: 0.0,
1: 1571850.0,
2: 0.0,
3: 0.0,
4: 2250000.0,
5: 0.0,
6: 0.0,
7: 932847.0,
8: 0.0,
9: 0.0},
'ExemptTot': {0: 0.0,
1: 1571850.0,
2: 0.0,
3: 0.0,
4: 27166050.0,
5: 0.0,
6: 11304900.0,
7: 23543997.0,
8: 0.0,
9: 0.0},
'FacilFAR': {0: 0.0,
1: 6.5,
2: 0.0,
3: 0.0,
4: 4.8,
5: 4.8,
6: 10.0,
7: 3.0,
8: 5.0,
9: 4.8},
'FactryArea': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 547000.0},
'GarageArea': {0: 0.0,
1: 0.0,
2: 0.0,
3: 1285000.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 22200.0,
8: 0.0,
9: 0.0},
'HealthArea': {0: 6410.0,
1: 1000.0,
2: 2300.0,
3: 8822.0,
4: 2300.0,
5: 400.0,
6: 2300.0,
7: 700.0,
8: 500.0,
9: 9300.0},
'HealthCent': {0: 35.0,
1: 36.0,
2: 38.0,
3: 35.0,
4: 38.0,
5: 30.0,
6: 38.0,
7: 30.0,
8: 30.0,
9: 36.0},
'IrrLotCode': {0: 1, 1: 1, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0, 7: 1, 8: 0, 9: 0},
'LandUse': {0: 2.0,
1: 10.0,
2: 5.0,
3: 5.0,
4: 8.0,
5: 4.0,
6: 5.0,
7: 3.0,
8: 3.0,
9: 6.0},
'LotArea': {0: 2252.0,
1: 134988.0,
2: 32000.0,
3: 905000.0,
4: 20267.0,
5: 57600.0,
6: 12500.0,
7: 50173.0,
8: 44704.0,
9: 113800.0},
'LotDepth': {0: 100.0,
1: 275.33,
2: 335.92,
3: 859.0,
4: 100.33,
5: 320.0,
6: 125.0,
7: 200.0,
8: 281.86,
9: 204.0},
'LotFront': {0: 24.0,
1: 490.5,
2: 92.42,
3: 930.0,
4: 202.0,
5: 180.0,
6: 100.0,
7: 521.25,
8: 225.08,
9: 569.0},
'LotType': {0: 5.0,
1: 5.0,
2: 3.0,
3: 3.0,
4: 3.0,
5: 3.0,
6: 3.0,
7: 1.0,
8: 5.0,
9: 3.0},
'NumBldgs': {0: 1.0,
1: 0.0,
2: 1.0,
3: 4.0,
4: 1.0,
5: 1.0,
6: 1.0,
7: 1.0,
8: 2.0,
9: 13.0},
'NumFloors': {0: 2.0,
1: 0.0,
2: 13.0,
3: 2.0,
4: 15.0,
5: 0.0,
6: 37.0,
7: 6.0,
8: 20.0,
9: 8.0},
'OfficeArea': {0: 0.0,
1: 0.0,
2: 264750.0,
3: 0.0,
4: 30000.0,
5: 1822.0,
6: 274500.0,
7: 4200.0,
8: 0.0,
9: 0.0},
'OtherArea': {0: 0.0,
1: 0.0,
2: 39900.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0},
'PolicePrct': {0: 70.0,
1: 84.0,
2: 84.0,
3: 63.0,
4: 84.0,
5: 90.0,
6: 84.0,
7: 94.0,
8: 90.0,
9: 88.0},
'ProxCode': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 1.0,
8: 0.0,
9: 0.0},
'ResArea': {0: 2172.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 371546.0,
6: 0.0,
7: 213864.0,
8: 458543.0,
9: 0.0},
'ResidFAR': {0: 2.0,
1: 7.2,
2: 0.0,
3: 0.0,
4: 2.43,
5: 2.43,
6: 10.0,
7: 3.0,
8: 5.0,
9: 0.0},
'RetailArea': {0: 0.0,
1: 0.0,
2: 0.0,
3: 1263000.0,
4: 0.0,
5: 9378.0,
6: 15940.0,
7: 0.0,
8: 4884.0,
9: 0.0},
'SHAPE_Area': {0: 2316.8863224,
1: 140131.577176,
2: 34656.4472405,
3: 797554.847834,
4: 21360.1476315,
5: 58564.8643115,
6: 12947.145471,
7: 50772.624868800005,
8: 47019.5677861,
9: 118754.78573699998},
'SHAPE_Leng': {0: 249.41135038849998,
1: 1559.88914353,
2: 890.718521021,
3: 3729.78685686,
4: 620.761169374,
5: 1006.33799946,
6: 460.03168012300006,
7: 1385.27352839,
8: 992.915660585,
9: 1565.91477261},
'SanitDistr': {0: 10.0,
1: 2.0,
2: 2.0,
3: 18.0,
4: 2.0,
5: 1.0,
6: 2.0,
7: 1.0,
8: 1.0,
9: 2.0},
'SanitSub': {0: 21,
1: 23,
2: 31,
3: 22,
4: 31,
5: 21,
6: 23,
7: 7,
8: 12,
9: 22},
'SchoolDist': {0: 19.0,
1: 13.0,
2: 13.0,
3: 22.0,
4: 13.0,
5: 14.0,
6: 13.0,
7: 14.0,
8: 14.0,
9: 14.0},
'SplitZone': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1},
'StrgeArea': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 1500.0,
8: 0.0,
9: 0.0},
'UnitsRes': {0: 2.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 522.0,
6: 0.0,
7: 234.0,
8: 470.0,
9: 0.0},
'UnitsTotal': {0: 2.0,
1: 0.0,
2: 0.0,
3: 123.0,
4: 1.0,
5: 525.0,
6: 102.0,
7: 237.0,
8: 472.0,
9: 1.0},
'YearAlter1': {0: 0.0,
1: 0.0,
2: 1980.0,
3: 0.0,
4: 1998.0,
5: 0.0,
6: 2009.0,
7: 2012.0,
8: 0.0,
9: 0.0},
'YearAlter2': {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 2000.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0},
'ZipCode': {0: 11220.0,
1: 11201.0,
2: 11201.0,
3: 11234.0,
4: 11201.0,
5: 11249.0,
6: 11241.0,
7: 11211.0,
8: 11249.0,
9: 11205.0},
'ZoneDist1': {0: 24,
1: 76,
2: 5,
3: 64,
4: 24,
5: 24,
6: 30,
7: 74,
8: 45,
9: 27},
'ZoneMap': {0: 3,
1: 19,
2: 19,
3: 22,
4: 19,
5: 19,
6: 19,
7: 2,
8: 19,
9: 19},
'building_class': {0: 141,
1: 97,
2: 87,
3: 176,
4: 168,
5: 8,
6: 102,
7: 46,
8: 97,
9: 66},
'building_class_at_sale': {0: 143,
1: 98,
2: 89,
3: 179,
4: 171,
5: 7,
6: 103,
7: 49,
8: 98,
9: 69},
'building_class_category': {0: 39,
1: 71,
2: 31,
3: 38,
4: 86,
5: 40,
6: 80,
7: 75,
8: 71,
9: 41},
'commercial_units': {0: 1,
1: 0,
2: 0,
3: 123,
4: 1,
5: 0,
6: 102,
7: 3,
8: 0,
9: 1},
'gross_sqft': {0: 0.0,
1: 0.0,
2: 304650.0,
3: 2548000.0,
4: 356000.0,
5: 0.0,
6: 290440.0,
7: 241764.0,
8: 0.0,
9: 547000.0},
'land_sqft': {0: 0.0,
1: 134988.0,
2: 32000.0,
3: 905000.0,
4: 20267.0,
5: 57600.0,
6: 12500.0,
7: 50173.0,
8: 44704.0,
9: 113800.0},
'neighborhood': {0: 43,
1: 48,
2: 6,
3: 44,
4: 6,
5: 40,
6: 6,
7: 28,
8: 40,
9: 56},
'residential_units': {0: 0,
1: 0,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 234,
8: 0,
9: 0},
'sale_date': {0: 2257,
1: 4839,
2: 337,
3: 638,
4: 27,
5: 1458,
6: 2450,
7: 3276,
8: 5082,
9: 1835},
'sale_price': {0: 499401179.0,
1: 345000000.0,
2: 340000000.0,
3: 276947000.0,
4: 202500000.0,
5: 185445000.0,
6: 171000000.0,
7: 169000000.0,
8: 165000000.0,
9: 161000000.0},
'tax_class': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 7, 8: 3, 9: 3},
'total_units': {0: 1,
1: 0,
2: 0,
3: 123,
4: 1,
5: 0,
6: 102,
7: 237,
8: 0,
9: 1},
'zip_code': {0: 11201,
1: 11201,
2: 11201,
3: 11234,
4: 11201,
5: 11249,
6: 11241,
7: 11211,
8: 11249,
9: 11205}}

Related

How to resample hourly this dataframe?

I have this dataset, I'm trying to have a mean of "AC_POWER" every hour but isn't working properly. The dataset have 20-22 value every 15 minutes. I want to have something like this:
DATE AC_POWER
'15-05-2020 00:00' 400
'15-05-2020 01:00' 500
'15-05-2020 02:00' 500
'15-05-2020 03:00' 500
How to solve this?
import pandas as pd
df = pd.read_csv('dataset.csv')
df = df.reset_index()
df['DATE_TIME'] = df['DATE_TIME'].astype('datetime64[ns]')
df = df.resample('H', on='DATE_TIME').mean()
>>> df.head(10).to_dict()
{'AC_POWER': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'DAILY_YIELD': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'DATE_TIME': {0: '15-05-2020 00:00', 1: '15-05-2020 00:00', 2: '15-05-2020 00:00', 3: '15-05-2020 00:00',
4: '15-05-2020 00:00', 5: '15-05-2020 00:00', 6: '15-05-2020 00:00', 7: '15-05-2020 00:00',
8: '15-05-2020 00:00', 9: '15-05-2020 00:00'},
'DC_POWER': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'PLANT_ID': {0: 4135001, 1: 4135001, 2: 4135001, 3: 4135001, 4: 4135001, 5: 4135001,
6: 4135001, 7: 4135001, 8: 4135001, 9: 4135001},
'SOURCE_KEY': {0: '1BY6WEcLGh8j5v7', 1: '1IF53ai7Xc0U56Y', 2: '3PZuoBAID5Wc2HD', 3: '7JYdWkrLSPkdwr4',
4: 'McdE0feGgRqW7Ca', 5: 'VHMLBKoKgIrUVDU', 6: 'WRmjgnKYAwPKWDb', 7: 'ZnxXDlPa8U1GXgE',
8: 'ZoEaEvLYb1n2sOq', 9: 'adLQvlD726eNBSB'},
'TOTAL_YIELD': {0: 6259559.0, 1: 6183645.0, 2: 6987759.0, 3: 7602960.0, 4: 7158964.0,
5: 7206408.0, 6: 7028673.0, 7: 6522172.0, 8: 7098099.0, 9: 6271355.0}}
EDIT: I tried with a different dataset and the same code I've posted and it worked!

You need to set your date as an index first, the following does this and computes the mean for windows of 15 minutes:
df.set_index('DATE_TIME').resample('15T').mean()
Also, make sure your date vector is correctly formated.

I think you're looking for DataFrame.resample:
df.resample(rule='H', on='DATE_TIME')['AC_POWER'].mean()

Categorial area stackplot in pandas grouped by date

I found the way to implement the stackplot if my x-axis is just a list of numbers.
import pandas as pd
import matplotlib.pyplt as plt
d = {'time_key': {0: '2021-03-01',
1: '2021-03-01',
2: '2021-03-01',
3: '2021-03-01'},
'target': {0: 2, 1: 1, 2: 0, 3: 3},
'count': {0: 400, 1: 300, 2: 200, 3: 100},
'fraction': {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}}
df = pd.DataFrame(d)
plt.stackplot(range(2), s[s.target==0].fraction, s[s.target==1].fraction,
s[s.target==2].fraction, s[s.target==3].fraction)
But I want to generalize the plot to many dates list.
d = {'time_key': {0: '2021-03-01',
1: '2021-03-01',
2: '2021-03-01',
3: '2021-03-01',
4: '2021-04-01',
5: '2021-04-01',
6: '2021-04-01',
7: '2021-04-01',
8: '2021-05-01',
9: '2021-05-01',
10: '2021-05-01',
11: '2021-05-01'},
'target': {0: 2,
1: 1,
2: 0,
3: 3,
4: 2,
5: 1,
6: 0,
7: 3,
8: 2,
9: 1,
10: 0,
11: 3},
'count': {0: 163,
1: 110,
2: 90,
3: 38,
4: 113,
5: 97,
6: 56,
7: 34,
8: 85,
9: 57,
10: 42,
11: 16},
'fraction': {0: 0.18091009988901222,
1: 0.1220865704772475,
2: 0.09988901220865705,
3: 0.042175360710321866,
4: 0.12541620421753608,
5: 0.1076581576026637,
6: 0.06215316315205328,
7: 0.03773584905660377,
8: 0.09433962264150944,
9: 0.06326304106548279,
10: 0.04661487236403995,
11: 0.017758046614872364}}
And I'd like to assign dates to x-axis in ascending order to see dynamics of the proportions.
Is this a way to implement it in a proper way?
The approximate desired output plot (I need time_key x-axis though):

Try:
dfp = df.set_index(['time_key','target'])['count'].unstack()
dfp.div(dfp.sum(axis=1), axis=0).plot.bar(stacked=True)
Output:

Also useful solution is
d = {0: {'2021-03-01': 0.2, '2021-04-01': 0.25, '2021-05-01': 0.3},
1: {'2021-03-01': 0.3, '2021-04-01': 0.25, '2021-05-01': 0.3},
2: {'2021-03-01': 0.4, '2021-04-01': 0.25, '2021-05-01': 0.3},
3: {'2021-03-01': 0.1, '2021-04-01': 0.25, '2021-05-01': 0.1}}
df = pd.DataFrame(d)
fig, ax = plt.subplots(figsize=(9, 6))
plt.style.use('classic')
df.plot.area(ax=ax)

Bokeh Hovertool stacked barchart

I have constructed a Bokeh stacked barchart by the code below. The chart shows the different tree types for the districts of Copenhagen. At the moment I have a hoverTool which shows the excat amount of trees (corrosponding to the columns with the tree names) for the tree type but I also want it to show the percentage (the columns with _pat the end), but how can I do this with the stacked bar chart?
A reduced part of the data frame:
temp=pd.DataFrame( {'bydelsnavn': {0: 'Amager Vest', 1: 'Amager Øst', 2: 'Bispebjerg', 3: 'Brønshøj-Husum', 4: 'Indre By', 5: 'Nørrebro', 6: 'Valby', 7: 'Vanløse', 8: 'Vesterbro', 9: 'Østerbro'}, 'Alder': {0: 53.0, 1: 21.0, 2: 1.0, 3: 9.0, 4: 4.0, 5: 2.0, 6: 3.0, 7: 44.0, 8: 46.0, 9: 59.0}, 'Alderm': {0: 63.0, 1: 32.0, 2: 49.0, 3: 13.0, 4: 45.0, 5: 55.0, 6: 104.0, 7: 0.0, 8: 50.0, 9: 4.0}, 'Apple': {0: 94.0, 1: 109.0, 2: 115.0, 3: 12.0, 4: 22.0, 5: 81.0, 6: 41.0, 7: 3.0, 8: 132.0, 9: 51.0}, 'Alder_p': {0: 21.9, 1: 8.68, 2: 0.41, 3: 3.72, 4: 1.65, 5: 0.83, 6: 1.24, 7: 18.18, 8: 19.01, 9: 24.38}, 'Alderm_p': {0: 15.18, 1: 7.71, 2: 11.81, 3: 3.13, 4: 10.84, 5: 13.25, 6: 25.06, 7: 0.0, 8: 12.05, 9: 0.96}, 'Apple_p': {0: 14.24, 1: 16.52, 2: 17.42, 3: 1.82, 4: 3.33, 5: 12.27, 6: 6.21, 7: 0.45, 8: 20.0, 9: 7.73}})
My code:
treeName = ['Alder','Alderm','Apple']
treeName_p = ['Alder_p','Alderm_p','Apple_p']
colornames = named.__all__
colornames = colornames[:len(treeName)]
# Create an empty figure
p = figure(x_range = temp['bydelsnavn'].values,plot_width = 700, plot_height=400,
title='Tree pr. district', toolbar_sticky = False,
tools = 'pan,wheel_zoom,reset')
# Stacked bar chart
renderers = p.vbar_stack(stackers=treeName,x='bydelsnavn',source=temp,
width=0.8, color = colornames)
# Add the hover tool
for r in renderers:
tree = r.name
hover = HoverTool(tooltips=[
("%s" % tree, "#{%s}" % tree)
], renderers = [r])
p.add_tools(hover)
# remove the grid
p.xgrid.grid_line_color=None
p.ygrid.grid_line_color=None
# Make sure bars stat at 0
p.y_range.start = 0
# remove - y-axis
p.yaxis.visible = False
# Remove the grey box around the plot
p.outline_line_color = None
# Turn the x-labels
p.xaxis.major_label_orientation = 0.5
# Remove tool bar logo
p.toolbar.logo = None
# Move the border of the left side to show "Amager"
p.min_border_left = 30
show(p)
My current chart looks like this:

Assuming that the values of the _p columns are actually in the data source, you can just add another tooltip to the HoverTool:
for r in renderers:
tree = r.name
p.add_tools(HoverTool(tooltips=[(tree, "#$name"),
(f"{tree} %", f"#{tree}_p")],
renderers=[r]))
Notice how #$name is used in there - not that necessary in this particular case but sometimes comes in handy.

Bokeh: remove Hovertool from toolbar in stacked bar chart

In my stacked barchart I have specified which tools I want in the toolbar. But when I add a hovertool this seems to overwrite my command and adds a hovertool for every element in the stacked barchart. How can I remove the hovertool tool in the toolbar?
Data example:
temp=pd.DataFrame( {'bydelsnavn': {0: 'Amager Vest', 1: 'Amager Øst', 2: 'Bispebjerg', 3: 'Brønshøj-Husum', 4: 'Indre By', 5: 'Nørrebro', 6: 'Valby', 7: 'Vanløse', 8: 'Vesterbro', 9: 'Østerbro'}, 'Alder': {0: 53.0, 1: 21.0, 2: 1.0, 3: 9.0, 4: 4.0, 5: 2.0, 6: 3.0, 7: 44.0, 8: 46.0, 9: 59.0}, 'Alderm': {0: 63.0, 1: 32.0, 2: 49.0, 3: 13.0, 4: 45.0, 5: 55.0, 6: 104.0, 7: 0.0, 8: 50.0, 9: 4.0}, 'Apple': {0: 94.0, 1: 109.0, 2: 115.0, 3: 12.0, 4: 22.0, 5: 81.0, 6: 41.0, 7: 3.0, 8: 132.0, 9: 51.0}, 'Alder_p': {0: 21.9, 1: 8.68, 2: 0.41, 3: 3.72, 4: 1.65, 5: 0.83, 6: 1.24, 7: 18.18, 8: 19.01, 9: 24.38}, 'Alderm_p': {0: 15.18, 1: 7.71, 2: 11.81, 3: 3.13, 4: 10.84, 5: 13.25, 6: 25.06, 7: 0.0, 8: 12.05, 9: 0.96}, 'Apple_p': {0: 14.24, 1: 16.52, 2: 17.42, 3: 1.82, 4: 3.33, 5: 12.27, 6: 6.21, 7: 0.45, 8: 20.0, 9: 7.73}})
My code:
treeName = ['Alder','Alderm','Apple']
treeName_p = ['Alder_p','Alderm_p','Apple_p']
colornames = named.__all__
colornames = colornames[:len(treeName)]
# Create an empty figure
p = figure(x_range = temp['bydelsnavn'].values,plot_width = 700, plot_height=400,
title='Tree pr. district', toolbar_sticky = False,
tools = 'pan,wheel_zoom,reset')
# Stacked bar chart
renderers = p.vbar_stack(stackers=treeName,x='bydelsnavn',source=temp,
width=0.8, color = colornames)
# Add the hover tool
for r in renderers:
tree = r.name
hover = HoverTool(tooltips=[
("%s" % tree, "#{%s}" % tree)
], renderers = [r])
p.add_tools(hover)
# remove the grid
p.xgrid.grid_line_color=None
p.ygrid.grid_line_color=None
# Make sure bars stat at 0
p.y_range.start = 0
# remove - y-axis
p.yaxis.visible = False
# Remove the grey box around the plot
p.outline_line_color = None
# Turn the x-labels
p.xaxis.major_label_orientation = 0.5
# Remove tool bar logo
p.toolbar.logo = None
# Move the border of the left side to show "Amager"
p.min_border_left = 30
show(p)
In the picture the tools I would like to aviod are pointed out:

You can hide the tool buttons completely by passing toggleable=False to HoverTool.

pandas: columns must be same length as key

I'm trying to re-format several columns into strings (they contain NaNs, so I can't just read them in as integers). All of the columns are currently float64, and I want to make it so they don't have decimals.
Here is the data:
{'crash_id': {0: 201226857.0,
1: 201226857.0,
2: 2012272611.0,
3: 2012272611.0,
4: 2012298998.0},
'driver_action1': {0: 1.0, 1: 1.0, 2: 29.0, 3: 1.0, 4: 3.0},
'driver_action2': {0: 99.0, 1: 99.0, 2: 1.0, 3: 99.0, 4: 99.0},
'driver_action3': {0: 99.0, 1: 99.0, 2: 99.0, 3: 99.0, 4: 99.0},
'driver_action4': {0: 99.0, 1: 99.0, 2: 99.0, 3: 99.0, 4: 99.0},
'harmful_event1': {0: 14.0, 1: 14.0, 2: 14.0, 3: 14.0, 4: 14.0},
'harmful_event2': {0: 99.0, 1: 99.0, 2: 99.0, 3: 99.0, 4: 99.0},
'harmful_event3': {0: 99.0, 1: 99.0, 2: 99.0, 3: 99.0, 4: 99.0},
'harmful_event4': {0: 99.0, 1: 99.0, 2: 99.0, 3: 99.0, 4: 99.0},
'most_damaged_area': {0: 14.0, 1: 2.0, 2: 14.0, 3: 14.0, 4: 3.0},
'most_harmful_event': {0: 14.0, 1: 14.0, 2: 14.0, 3: 14.0, 4: 14.0},
'point_of_impact': {0: 15.0, 1: 1.0, 2: 14.0, 3: 14.0, 4: 1.0},
'vehicle_id': {0: 20121.0, 1: 20122.0, 2: 20123.0, 3: 20124.0, 4: 20125.0},
'vehicle_maneuver': {0: 3.0, 1: 1.0, 2: 4.0, 3: 1.0, 4: 1.0}}
When I try to convert those columns to string, this is what happens:
>> df[['crash_id','vehicle_id','point_of_impact','most_damaged_area','most_harmful_event','vehicle_maneuver','harmful_event1','harmful_event2','harmful_event3','harmful_event4','driver_action1','driver_action2','driver_action3','driver_action4']] = df[['crash_id','vehicle_id','point_of_impact','most_damaged_area','most_harmful_event','vehicle_maneuver','harmful_event1','harmful_event2','harmful_event3','harmful_event4','driver_action1','driver_action2','driver_action3','driver_action4']].applymap(lambda x: '{:.0f}'.format(x))
File "C:\Users\<name>\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2376, in _setitem_array
raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key
I've never seen this error before and feel like this is something simple...what am I doing wrong?

Your code runs for me with the dictionary you provided. Try creating a function to deal with the NaN cases separately; I think they are causing your issues.
Something basic like below:
def formatter(x):
if x == None:
return None
else:
return '{:.0f}'.format(x)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Number of labels does not match samples on decision tree regression - python

Related

How to resample hourly this dataframe?

Categorial area stackplot in pandas grouped by date

Bokeh Hovertool stacked barchart

Bokeh: remove Hovertool from toolbar in stacked bar chart

pandas: columns must be same length as key

Categories

Resources