Interactive data table visual in Power BI via R/Python

I am trying to display a table in Power BI that includes a checkbox field the user can tick or untick.
So far, the closest I've come is creating such a table in R with the rhandsontable library.
'''
library(rhandsontable)

DF = data.frame(
  integer = 1:10,
  numeric = rnorm(10),
  logical = rep(TRUE, 10),
  character = LETTERS[1:10],
  factor = factor(letters[1:10], levels = letters[10:1], ordered = TRUE),
  factor_allow = factor(letters[1:10], levels = letters[10:1], ordered = TRUE),
  date = seq(from = Sys.Date(), by = "days", length.out = 10),
  stringsAsFactors = FALSE
)

rhandsontable(DF, width = 600, height = 300) %>%
  hot_col("factor_allow", allowInvalid = TRUE)
'''
This solution works in R. However, in Power BI (through the R script visual), it fails with the error "cannot display visual".
From what I can tell, this is because Power BI cannot display dynamic HTML visuals. If this is correct, is there a way around it or an alternative solution?
Any suggestions will be greatly appreciated!

Related

Retrieving data from the Air Quality Index (AQI) website through the API and only receiving a small number of stations

I'm working on a personal project and I'm trying to retrieve air quality data from the https://aqicn.org website using their API.
I've used this code, which I've copied and adapted for the city of Bucharest as follows:
import pandas as pd
import folium
import requests
# GET data from AQI website through the API
base_url = "https://api.waqi.info"
path_to_file = "~/path"
# Got token from:- https://aqicn.org/data-platform/token/#/
with open(path_to_file) as f:
    contents = f.readlines()
key = contents[0].strip()  # drop the trailing newline from the token line
# (lat, long)-> bottom left, (lat, lon)-> top right
latlngbox = "44.300264,25.920181,44.566991,26.297836" # For Bucharest
trail_url=f"/map/bounds/?token={key}&latlng={latlngbox}" #
my_data = pd.read_json(base_url + trail_url) # Joined parts of URL
print('columns->', my_data.columns) #2 cols ‘status’ and ‘data’ JSON
### Built a dataframe from the json file
all_rows = []
for each_row in my_data['data']:
    all_rows.append([each_row['station']['name'],
                     each_row['lat'],
                     each_row['lon'],
                     each_row['aqi']])
df = pd.DataFrame(all_rows, columns=['station_name', 'lat', 'lon', 'aqi'])
# Cleaned the DataFrame
df['aqi'] = pd.to_numeric(df.aqi, errors='coerce') # Invalid parsing to NaN
# Remove NaN entries in col
df1 = df.dropna(subset = ['aqi'])
Unfortunately it only retrieves 4 stations whereas there are many more available on the actual site. In the API documentation the only limitation I saw was for "1,000 (one thousand) requests per second" so why can't I get more of them?
Also, I've tried to modify the lat-long values and managed to get more stations, but they were outside the city I was interested in.
Here is a view of the actual perimeter I've used in the embedded code.
If you have any suggestions as to how I can solve this issue, I'd be very happy to read your thoughts. Thank you!
Try using waqi through aqicn. It's not exactly a clean API, but I found it to work quite well:
import pandas as pd
url1 = 'https://api.waqi.info'
# Get token from:- https://aqicn.org/data-platform/token/#/
token = 'XXX'
box = '113.805332,22.148942,114.434299,22.561716' # polygon around HongKong via bboxfinder.com
url2=f'/map/bounds/?latlng={box}&token={token}'
my_data = pd.read_json(url1 + url2)
all_rows = []
for each_row in my_data['data']:
    all_rows.append([each_row['station']['name'],each_row['lat'],each_row['lon'],each_row['aqi']])
df = pd.DataFrame(all_rows,columns=['station_name', 'lat', 'lon', 'aqi'])
From there it's easy to plot:
import folium
from folium.plugins import HeatMap
df['aqi'] = pd.to_numeric(df.aqi,errors='coerce')
print('with NaN->', df.shape)
df1 = df.dropna(subset = ['aqi'])
df2 = df1[['lat', 'lon', 'aqi']]
init_loc = [22.396428, 114.109497]
max_aqi = int(df1['aqi'].max())
print('max_aqi->', max_aqi)
m = folium.Map(location = init_loc, zoom_start = 5)
heat_aqi = HeatMap(df2, min_opacity = 0.1, max_val = max_aqi,
radius = 60, blur = 20, max_zoom = 2)
m.add_child(heat_aqi)
m
Or, using markers:
centre_point = [22.396428, 114.109497]
m2 = folium.Map(location = centre_point,tiles = 'Stamen Terrain', zoom_start= 6)
for idx, row in df1.iterrows():
    lat = row['lat']
    lon = row['lon']
    station = row['station_name'] + ' AQI=' + str(row['aqi'])
    station_aqi = row['aqi']
    if station_aqi > 300:
        pop_color = 'red'
    elif station_aqi > 200:
        pop_color = 'orange'
    else:
        pop_color = 'green'
    folium.Marker(location= [lat, lon],
                  popup = station,
                  icon = folium.Icon(color = pop_color)).add_to(m2)
m2
Checking for stations within HK returns 19:
df[df['station_name'].str.contains('HongKong')]
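Since the number of stations returned depends on the bounding box, it can help to make the corner order explicit when building the request. Below is a minimal sketch, assuming the (lat, lon) bottom-left / top-right order described in the question's comment. `bounds_url` is a hypothetical helper name, not part of any WAQI client, and "XXX" stands in for a real token.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.waqi.info"


def bounds_url(bottom_left, top_right, token):
    """Build the /map/bounds/ URL from two (lat, lon) corners.

    Hypothetical helper for illustration; not part of any WAQI client.
    """
    lat1, lon1 = bottom_left
    lat2, lon2 = top_right
    # Guard against swapped corners, which would not describe a valid box.
    if lat1 > lat2 or lon1 > lon2:
        raise ValueError("bottom_left must be south-west of top_right")
    latlng = f"{lat1},{lon1},{lat2},{lon2}"
    # safe="," keeps the commas in latlng readable instead of %2C.
    query = urlencode({"token": token, "latlng": latlng}, safe=",")
    return f"{BASE_URL}/map/bounds/?{query}"


# Bucharest corners taken from the question; "XXX" is a placeholder token.
url = bounds_url((44.300264, 25.920181), (44.566991, 26.297836), "XXX")
print(url)
```

The resulting string can be passed to `pd.read_json` in the same way as the hand-built URLs above. The check only catches swapped corners; it does not say anything about how many stations the service will return for a given box.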

DashTable not updating with DatePickerSingle input in Callback

I am pretty new to Dash and have tried to read as much as I can to understand what the issue might be. In a nutshell, I have a single date picker that is an input to both the DataTable callback and the Graph callback. The graph callback works fine, so it is only the DataTable that is causing problems. I also tried a single callback with one input and multiple outputs, but that didn't work either. My code is below:
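# Imports inferred from the names used below (assumes Dash 2.x with jupyter-dash)
import os
from datetime import datetime as dt
from pathlib import Path
import pandas as pd
import plotly.express as px
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash
app = JupyterDash()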
folder = os.getcwd()
portfolio_returns_table = pd.read_csv(Path(folder, 'portfolioreturns_maria.csv'), parse_dates=[0])
portfolio_returns_table = portfolio_returns_table.set_index('Unnamed: 0')
name_portfolioID_table = pd.read_csv(Path(folder, 'name_portfolioID.csv'))
#Calculate portfolio cumulative returns
df_cumret = (portfolio_returns_table+1).cumprod().round(5)
df_cumret.index = pd.to_datetime(df_cumret.index)
app.layout = html.Div(html.Div([
    dcc.DatePickerSingle(
        id='my-date-picker-single',
        min_date_allowed=dt.date(df_cumret.index.min()),
        max_date_allowed=dt.date(df_cumret.index.max()),
        initial_visible_month=dt.date(df_cumret.index.max()),
        date=dt.date(df_cumret.index.max()),
        display_format='Y-MM-DD',
        clearable=True),
    html.Div(id='output-container-date-picker-single'),
    html.Div(dash_table.DataTable(id='data_table',
                                  data={},
                                  fixed_rows={'headers': True},
                                  style_cell={'textAlign': 'left'},
                                  style_table={'height': 400})),
    html.Div(dcc.Graph('my_graph'))
]))
@app.callback([Output('data_table','data'),Output('data_table','columns')],
              [Input('my-date-picker-single','date')])
def update_leader_table(date):
    #Get data for the selected date and transpose
    df_T = df_cumret.loc[[date]].T
    #Sort the table to reveal the top leaders
    df_Top = df_T.sort_values(df_T.columns[0], ascending=False)[:10]
    #Convert the index to an integer
    df_Top.index = df_Top.index.astype(int)
    #Generate the leaderboard to given date
    df_leader = pd.merge(df_Top,name_portfolioID_table,
                         left_index=True,right_index=True, how = 'left')
    #Create the col rank
    df_leader['Rank'] = range(1,len(df_leader)+1)
    df_leader.columns = ['Cum Return', 'Investor','Rank']
    df_leader.reset_index(drop = True, inplace = True)
    data = df_leader.to_dict('records')
    columns= [{'id': c, 'name': c, "selectable": True} for c in
              df_leader.columns]
    return (data,columns)
#callback to link calendar to graph
@app.callback(Output('my_graph','figure'),[Input('my-date-picker-single','date')])
def update_graph(date):
    #date filter
    df_T = df_cumret.loc[:date].T
    #Sort the table to reveal the top leaders & filter for leaderboard
    df_Top = df_T.sort_values(df_T.columns[-1], ascending=False)[:10]
    #Transpose to have date as index
    df_top_graph = df_Top.T
    #set the columns as an Int
    df_top_graph.columns = df_top_graph.columns.astype(int)
    #Rename columns
    df_top_graph.rename(columns=dict(zip(name_portfolioID_table.index,
                                         name_portfolioID_table.name)),
                        inplace=True)
    #Generate graph
    fig = px.line(df_top_graph, x = df_top_graph.index, y = df_top_graph.columns,
                  title='ETF LEADERBOARD PERFORMANCE: '+date,
                  labels={'Unnamed: 0':'Date','value':'Cumulative Returns'})
    fig.update_layout(hovermode = 'x unified')
    fig.update_traces(hovertemplate='Return: %{y} <br>Date: %{x}')
    fig.update_layout(legend_title_text = 'Investor')
    return fig
if __name__ == '__main__':
    app.run_server(mode = 'inline',debug=True, port = 65398)
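One way to narrow down a table that does not update is to run the callback body outside Dash and inspect what it returns. The following is only a sketch under stated assumptions: `leaderboard` is a hypothetical helper, the sample returns are made up, and the name lookup from the original code is left out to keep it short.

```python
import pandas as pd


def leaderboard(cumret, date, top_n=10):
    """Return (data, columns) as a list of row dicts and a list of column dicts.

    Hypothetical helper: `cumret` has dates as its index and one column
    per portfolio.
    """
    # Select the row for the chosen date and rank portfolios by value.
    row = cumret.loc[pd.Timestamp(date)]
    top = row.sort_values(ascending=False).head(top_n)
    table = pd.DataFrame({
        "Rank": list(range(1, len(top) + 1)),
        "Portfolio": list(top.index),
        "Cum Return": top.tolist(),
    })
    data = table.to_dict("records")  # one dict per table row
    columns = [{"id": c, "name": c} for c in table.columns]
    return data, columns


# Made-up sample data, for illustration only.
sample = pd.DataFrame(
    {"101": [1.00, 1.05], "102": [1.00, 1.10], "103": [1.00, 0.98]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)
data, columns = leaderboard(sample, "2020-01-03", top_n=2)
print(data)
print(columns)
```

If a function like this returns the expected rows when called directly, the remaining suspects are on the Dash side, for example the decorator wiring or the value the date picker sends. Note that `DatePickerSingle` passes `date` as a string, and as `None` once the picker is cleared, so a callback may need to handle that case.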

Vertical positioning of nodes in Sankey diagram to avoid collision with links

I'm trying to make a Sankey plot using Plotly that follows the filtering of documents into either in scope or out of scope, i.e. 1 source and 2 targets. However, some documents are filtered out during step 1, some during step 2, and so on. This leads to the following Sankey plot:
Current output
Now what I would ideally like is for it to look something like this:
Ideal output
I've already looked through the documentation at https://plot.ly/python/reference/#sankey but couldn't find what I'm looking for. Ideally, I would like a way to prevent nodes and links from overlapping.
This is the code I'm using to generate the plot object:
import math
from random import shuffle
import pandas as pd
import plotly.graph_objects as go

def genSankeyPlotObject(df, cat_cols=[], value_cols='', visible = False):
    ### COLOR PALETTE TO USE
    colorPalette = ['472d3c', '5e3643', '7a444a', 'a05b53', 'bf7958', 'eea160', 'f4cca1', 'b6d53c', '71aa34', '397b44',
                    '3c5956', '302c2e', '5a5353', '7d7071', 'a0938e', 'cfc6b8', 'dff6f5', '8aebf1', '28ccdf', '3978a8',
                    '394778', '39314b', '564064', '8e478c', 'cd6093', 'ffaeb6', 'f4b41b', 'f47e1b', 'e6482e', 'a93b3b',
                    '827094', '4f546b']
    ### CREATES LABELLIST FROM DEFINED COLUMNS
    labelList = []
    for catCol in cat_cols:
        labelListTemp = list(set(df[catCol].values))
        labelList = labelList + labelListTemp
    labelList = list(dict.fromkeys(labelList))
    ### DEFINES THE NUMBER OF COLORS IN THE COLOR PALETTE
    colorNum = len(df[cat_cols[0]].unique()) + len(df[cat_cols[1]].unique()) + len(df[cat_cols[2]].unique())
    # Repeat the palette often enough to cover colorNum entries
    TempcolorPallet = colorPalette * math.ceil(colorNum/len(colorPalette))
    shuffle(TempcolorPallet)
    colorList = TempcolorPallet[0:colorNum]
    ### TRANSFORMS DF INTO SOURCE -> TARGET PAIRS
    for i in range(len(cat_cols)-1):
        if i==0:
            sourceTargetDf = df[[cat_cols[i],cat_cols[i+1],value_cols]]
            sourceTargetDf.columns = ['source','target','count']
        else:
            tempDf = df[[cat_cols[i],cat_cols[i+1],value_cols]]
            tempDf.columns = ['source','target','count']
            sourceTargetDf = pd.concat([sourceTargetDf,tempDf])
    sourceTargetDf = sourceTargetDf.groupby(['source','target']).agg({'count':'sum'}).reset_index()
    ### ADDING INDEX TO SOURCE -> TARGET PAIRS
    sourceTargetDf['sourceID'] = sourceTargetDf['source'].apply(lambda x: labelList.index(x))
    sourceTargetDf['targetID'] = sourceTargetDf['target'].apply(lambda x: labelList.index(x))
    ### CREATES THE SANKEY PLOT OBJECT
    data = go.Sankey(node = dict(pad = 15,
                                 thickness = 20,
                                 line = dict(color = "black",
                                             width = 0.5),
                                 label = labelList,
                                 color = colorList),
                     link = dict(source = sourceTargetDf['sourceID'],
                                 target = sourceTargetDf['targetID'],
                                 value = sourceTargetDf['count']),
                     valuesuffix = ' ' + value_cols,
                     visible = visible)
    return data
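Plotly's Sankey trace accepts normalized `x` and `y` lists under `node`, together with an `arrangement` setting, so one possible direction is to compute node positions yourself. The sketch below only shows the position calculation, in plain Python. `node_positions`, its even-spacing rule, and the example labels are illustrative assumptions, not a layout algorithm taken from Plotly.

```python
def node_positions(columns):
    """Spread nodes over normalized x/y coordinates, one list per column.

    `columns` is a list of lists of node labels, ordered left to right.
    The helper name and the even-spacing rule are illustrative assumptions.
    """
    labels, xs, ys = [], [], []
    n_cols = len(columns)
    for col_idx, col in enumerate(columns):
        # Keep x strictly inside (0, 1) rather than on the plot edges.
        x = 0.05 + 0.9 * col_idx / max(n_cols - 1, 1)
        for row_idx, label in enumerate(col):
            # Centre each node in its own vertical slot of the column.
            y = (row_idx + 0.5) / len(col)
            labels.append(label)
            xs.append(round(x, 3))
            ys.append(round(y, 3))
    return labels, xs, ys


# Made-up labels for a three-step filter.
labels, xs, ys = node_positions([
    ["All documents"],
    ["Step 1 pass", "Out of scope"],
    ["In scope"],
])
print(list(zip(labels, xs, ys)))
```

The lists could then be passed as `node = dict(label = labels, x = xs, y = ys, ...)` with `arrangement = 'snap'` on `go.Sankey`, provided the label order matches the indices used for the links. Whether this fully avoids nodes overlapping links depends on the data, so treat it as a starting point.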

Updating Data in Bokeh

I'm new to Bokeh and not sure how to get my plot data and how to update it.
My code is following these instructions:
https://github.com/WillKoehrsen/Bokeh-Python-Visualization/blob/master/interactive/exploration/interactive_development.ipynb
Unfortunately, the update method doesn't seem to work for me. I tried to find documentation for this method but couldn't find any. Can anyone assist?
Basically, I've generated a pandas DataFrame from my data and converted it to a ColumnDataSource. Now I want to add or remove data from it, but the new ColumnDataSource does not update the old one.
Any help would be most appreciated! This is my code as of now; it still won't update properly:
def update(attr, old, new):
    stations_to_plot = [int(station_selection.labels[i]) for i in station_selection.active]
    by_station = pd.DataFrame(data=no_bikes.loc[stations_to_plot,:'23 PM'].values,
                              index=list(map(str,initial_stations)),
                              columns=no_bikes.loc[:,:'23 PM'].columns.tolist())
    new_src = ColumnDataSource(by_station.T)
    r.data_source.data.update(new_src.data)
stations=list(map(str,no_bikes.loc[:,'station_avg'].nlargest(10).index.tolist()))
station_selection = CheckboxGroup(labels=stations, active = [0,1,3])
station_selection.on_change('active', update)
initial_stations = [int(station_selection.labels[i]) for i in station_selection.active]
range_select = RangeSlider(start = 0, end = 23, value = (0, 23),step = 1, title = 'Hours to plot')
range_select.on_change('value', update)
by_station = pd.DataFrame(data=no_bikes.loc[initial_stations,:'23 PM'].values,index=list(map(str,initial_stations))
,columns=no_bikes.loc[:,:'23 PM'].columns.tolist())
src = ColumnDataSource(by_station.T)
p = figure(plot_width = 500, plot_height = 500, title = 'Chances not to find bikes',
x_axis_label = 'Hour of the day', y_axis_label = 'Proportion',x_range=src.data['index'])
for i,station in enumerate(src.data.keys()):
    if station in list(map(str, initial_stations)):
        r=p.line(x='index',y=station,source =src, legend='Station N.'+station, color=Category20_16[i], line_width=5)
controls = WidgetBox(station_selection,range_select)
layout = row(controls, p)
curdoc().add_root(layout)

Python / Pandas Dataframe: Automatically fill in missing rows

My goal is to ultimately create a scatter plot with date on the x-axis and won delegates (of each candidate) on the y-axis. I'm unsure of how to "fill in the blanks" when it comes to missing dates. I've attached a picture of the table I get.
For example, I'm trying to put March 1 as the date for Alaska, Arkansas, etc. to make it possible to plot the data.
# CREATE DATAFRAME WITH DELEGATE WON/TARGET INFORMATION
import requests
from lxml import html
import pandas
url = "http://projects.fivethirtyeight.com/election-2016/delegate-targets/"
response = requests.get(url)
doc = html.fromstring(response.text)
tables = doc.findall('.//table[@class="delegates desktop"]')
election = tables[0]
election_rows = election.findall('.//tr')
def extractCells(row, isHeader=False):
    if isHeader:
        cells = row.findall('.//th')
    else:
        cells = row.findall('.//td')
    return [val.text_content() for val in cells]
def parse_options_data(table):
    rows = table.findall(".//tr")
    header = extractCells(rows[1], isHeader=True)
    data = [extractCells(row, isHeader=False) for row in rows[2:]]
    trumpdata = "Trump Won Delegates"
    cruzdata = "Cruz Won Delegates"
    kasichdata = "Kasich Won Delegates"
    data = pandas.DataFrame(data, columns=["Date", "State or Territory", "Total Delegates", trumpdata, cruzdata, kasichdata, "Rubio"])
    # Trailing digits are the target; expand=False keeps each result a Series
    data.insert(4, "Trump Target Delegates", data[trumpdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(6, "Cruz Target Delegates", data[cruzdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(8, "Kasich Target Delegates", data[kasichdata].str.extract(r'(\d{0,3}$)', expand=False))
    data = data.drop(columns='Rubio')
    # Leading digits are the delegates won so far
    data[trumpdata] = data[trumpdata].str.extract(r'(^\d{0,3})', expand=False)
    data[cruzdata] = data[cruzdata].str.extract(r'(^\d{0,3})', expand=False)
    data[kasichdata] = data[kasichdata].str.extract(r'(^\d{0,3})', expand=False)
    return data  # the original returned the undefined name df
election_data = parse_options_data(election)
df = pandas.DataFrame(election_data)
df
You could do,
data.fillna('March 1')
I would advise you to go through the documentation
http://pandas.pydata.org/pandas-docs/stable/10min.html
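Note that `fillna` with a constant puts that same value into every missing cell, so it only fits if all the gaps belong to March 1. If each blank should instead take the date of the row above it, a forward fill may be closer to what is wanted. Here is a minimal sketch, assuming the blanks are empty strings, as is typical for scraped cells; the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows mimicking the scraped table: the date appears only on the
# first row of each group and is an empty string elsewhere.
data = pd.DataFrame({
    "Date": ["Feb. 1", "March 1", "", "", "March 5", ""],
    "State or Territory": ["Iowa", "Alabama", "Alaska", "Arkansas",
                           "Kansas", "Kentucky"],
})

# Turn blank or whitespace-only cells into NaN, then carry the last
# seen date forward into the gaps.
dates = data["Date"].str.strip()
data["Date"] = dates.mask(dates == "").ffill()
print(data)
```

If the blanks are already NaN rather than empty strings, `data["Date"].ffill()` on its own should be enough.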
