Plotly: How to use dropdowns with pandas dataframes? - python

I'm writing a small script to plot material data using plotly. I tried to use a dropdown to select which column of my dataframe to display. I can do this by defining the columns one by one, but the dataframe will change in size, so I want to make it flexible.
I've tried a few things and got to this:
for i in df.columns:
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df.index, y=df[i]))
    fig.update_layout(
        updatemenus=[
            go.layout.Updatemenu(
                buttons=list([
                    dict(
                        args=["values", i],
                        label=i,
                        method="update"
                    ),
                ]),
                direction="down",
                pad={"r": 10, "t": 10},
                showactive=True,
                x=0.1,
                xanchor="left",
                y=1.1,
                yanchor="top"
            ),
        ]
    )
fig.update_layout(
    annotations=[
        go.layout.Annotation(text="Material:", showarrow=False,
                             x=0, y=1.085, yref="paper", align="left")
    ]
)
fig.show()
The chart shows only the final column of the df, and the dropdown holds only the last column header.
My data looks something like this, where i is the index; the chart and dropdown show column 'G':
i A B C D E F G
1 8 2 4 5 4 9 7
2 5 3 7 7 6 7 3
3 7 4 9 3 7 4 6
4 3 9 3 6 3 3 4
5 1 7 6 9 9 1 9

The following suggestion should be exactly what you're asking for. It doesn't go far beyond plotly's built-in functionality, though, since you can already subset the figure by clicking the legend. Anyway, let me know if this is something you can use or if it needs some adjustments.
Plot 1 - State of plot on first execution:
Click the dropdown in the upper left corner and select, for example, D to get:
Plot 2 - State of plot after selecting D from dropdown:
Code:
# imports
import plotly.graph_objs as go
import pandas as pd
# data sample
df = pd.DataFrame({'i': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
                   'A': {0: 8, 1: 5, 2: 7, 3: 3, 4: 1},
                   'B': {0: 2, 1: 3, 2: 4, 3: 9, 4: 7},
                   'C': {0: 4, 1: 7, 2: 9, 3: 3, 4: 6},
                   'D': {0: 5, 1: 7, 2: 3, 3: 6, 4: 9},
                   'E': {0: 4, 1: 6, 2: 7, 3: 3, 4: 9},
                   'F': {0: 9, 1: 7, 2: 4, 3: 3, 4: 1},
                   'G': {0: 7, 1: 3, 2: 6, 3: 4, 4: 9}})
df = df.set_index('i')
# plotly figure setup
fig = go.Figure()
# one trace for each df column
for col in df.columns:
    fig.add_trace(go.Scatter(x=df.index, y=df[col].values,
                             name=col,
                             mode='lines')
                  )
# one button for each df column
buttons = []
for col in df.columns:
    buttons.append(dict(method='restyle',
                        label=col,
                        args=[{'y': [df[col].values]}])
                   )
# some adjustments to the updatemenu
updatemenu = []
your_menu = dict()
updatemenu.append(your_menu)
updatemenu[0]['buttons'] = buttons
updatemenu[0]['direction'] = 'down'
updatemenu[0]['showactive'] = True
# update layout and show figure
fig.update_layout(updatemenus=updatemenu)
fig.show()
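A possible variation, not the approach used in the answer above: instead of overwriting the y data, each button could toggle the `visible` attribute of the traces so that only the selected column is drawn. The sketch below only builds the button list in plain Python; `visibility_buttons` is a hypothetical helper name, and `columns` stands in for `list(df.columns)`.

```python
def visibility_buttons(columns):
    """Build one dropdown button per column; each shows only its own trace."""
    buttons = []
    for i, col in enumerate(columns):
        # one boolean per trace: True only for the trace that matches this button
        mask = [j == i for j in range(len(columns))]
        buttons.append(dict(method="restyle",
                            label=col,
                            args=[{"visible": mask}]))
    return buttons


columns = ["A", "B", "C"]  # stand-in for list(df.columns)
buttons = visibility_buttons(columns)
print(buttons[1])
# With a figure that holds one trace per column, in the same order:
# fig.update_layout(updatemenus=[dict(buttons=buttons, direction="down", showactive=True)])
```

This assumes the traces were added in the same order as `columns`, since the mask is matched to traces by position.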

Related

How to highlight a row in pandas df if it is in another df?

I'm trying to highlight any row in one column in my dataframe if it exists in another.
I have tried:
apics_tonal_features.style.apply(lambda x: ["background: red" if v.isin(blasi_final_features['x']) else "" for v in x], axis = 1)
but since I am comparing strings, it gives me the error
AttributeError: 'str' object has no attribute 'isin'
Here's a little bit of reproducible code for the data frames I am using
apics_tonal_features = pd.DataFrame({'Feature Name': ['Tone', 'Para-linguistic usages of clicks'],
                                     'Feature ID': ['120', '108'],
                                     'Number of Languages': ['74', '64'],
                                     'Number of Variance': ['5', '4'],
                                     'WALS Equivalent': ['WALS 13A', 'WALS 142A']})
blasi_final_features = pd.DataFrame({'x': ['order.of.subject.object.verb', 'Order.of.genitive.and.noun', 'Tone', 'Vowel nasalization']})
Please consider the df.
A B C D
0 1 0 1 1
1 1 2 4 7
2 2 3 5 8
3 3 4 6 9
4 4 5 7 10
5 2 11 2 7
The output
A B C D
1: Red 0 1: Red 1: Red
1: Red 2: Red 4: Red 7: Red
2: Red 3: Red 5: Red 8
3: Red 4: Red 6 9
4: Red 5: Red 7: Red 10
2: Red 11 2: Red 7: Red
Explanation:
A: all values exist in other columns.
B: values 0 and 11 are not present in other columns.
C: value 6 is not present in other columns.
D: values 8, 9, 10 are not present in other columns.
Code to achieve that
import pandas as pd
df = pd.DataFrame({
    "A": [1, 1, 2, 3, 4, 2],
    "B": [0, 2, 3, 4, 5, 11],
    "C": [1, 4, 5, 6, 7, 2],
    "D": [1, 7, 8, 9, 10, 7]
})
# mark every value that also occurs in at least one other column
df = df.apply(
    lambda x: [f"{elem}: Red" if
               any(elem in df[column].tolist() for column in df.columns if column != x.name)
               else elem for elem in x], axis=0)
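For the two frames in the question, the AttributeError can be avoided by calling `isin` on the whole column instead of on each string. The sketch below assumes the goal is to mark cells of 'Feature Name' whose value appears in `blasi_final_features['x']`; `highlight_known` is a hypothetical helper name, and the Styler call is left as a comment because rendering it needs a notebook or an HTML export.

```python
import pandas as pd

apics_tonal_features = pd.DataFrame({
    'Feature Name': ['Tone', 'Para-linguistic usages of clicks'],
    'Feature ID': ['120', '108'],
})
blasi_final_features = pd.DataFrame({
    'x': ['order.of.subject.object.verb', 'Order.of.genitive.and.noun',
          'Tone', 'Vowel nasalization'],
})


def highlight_known(column, known):
    """Return one CSS string per cell: red if the value occurs in `known`."""
    # Series.isin is vectorised, so it works on the column, not on single strings
    return ["background: red" if hit else "" for hit in column.isin(known)]


styles = highlight_known(apics_tonal_features['Feature Name'],
                         blasi_final_features['x'])
print(styles)
# To render the highlighting (assumption: run in a notebook):
# apics_tonal_features.style.apply(highlight_known,
#                                  known=blasi_final_features['x'],
#                                  subset=['Feature Name'])
```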

How to insert / transplant columns from one dataframe to another at a certain position?

I have two dataframes:
print(df1)
A B C
0 1 5 9
1 2 6 8
2 3 7 7
3 4 8 6
print(df2)
D E F
0 1 5 9
1 2 6 8
2 3 7 7
3 4 8 6
I want to insert columns D and E from df2 into df1 after column B.
The end result should be like this:
A B D E C
0 1 5 1 5 9
1 2 6 2 6 8
2 3 7 3 7 7
3 4 8 4 8 6
I know there's already a solution with the insert method with pandas:
df1.insert(1, "D", df2["D"])
df1.insert(2, "E", df2["E"])
However I would like to insert D and E at the same time. Like "transplant" it into df1, rather than having multiple inserts. (in real life the data to be transplanted is bigger which is why I want to avoid all the inserts)
My dataframes in dict format, so you can use DataFrame.from_dict():
# df1
{'A': {0: 1, 1: 2, 2: 3, 3: 4},
'B': {0: 5, 1: 6, 2: 7, 3: 8},
'C': {0: 9, 1: 8, 2: 7, 3: 6}}
# df2
{'D': {0: 1, 1: 2, 2: 3, 3: 4},
'E': {0: 5, 1: 6, 2: 7, 3: 8},
'F': {0: 9, 1: 8, 2: 7, 3: 6}}
You can slice the dataframe df1 into two parts based on the location of column B, then concat these slices with columns D and E along the columns axis:
i = df1.columns.get_loc('B') + 1
pd.concat([df1.iloc[:, :i], df2[['D', 'E']], df1.iloc[:, i:]], axis=1)
A B D E C
0 1 5 1 5 9
1 2 6 2 6 8
2 3 7 3 7 7
3 4 8 4 8 6
I think your solution is optimal. Alternatively you can:
df1[["D", "E"]] = df2[["D", "E"]]
and then change the column order
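The reordering step is not spelled out, so here is one way it might look. This is a sketch under the assumption that the new columns should land directly after column B; `new_cols`, `cols`, and `pos` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 8, 7, 6]})
df2 = pd.DataFrame({'D': [1, 2, 3, 4], 'E': [5, 6, 7, 8], 'F': [9, 8, 7, 6]})

new_cols = ['D', 'E']
df1[new_cols] = df2[new_cols]  # the new columns are appended at the end

# rebuild the order: columns up to and including B, the new columns, then the rest
cols = [c for c in df1.columns if c not in new_cols]
pos = cols.index('B') + 1
df1 = df1[cols[:pos] + new_cols + cols[pos:]]
print(df1.columns.tolist())
# ['A', 'B', 'D', 'E', 'C']
```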
import pandas as pd
df1 = pd.DataFrame.from_dict({'A': {0: 1, 1: 2, 2: 3, 3: 4},
                              'B': {0: 5, 1: 6, 2: 7, 3: 8},
                              'C': {0: 9, 1: 8, 2: 7, 3: 6}})
df2 = pd.DataFrame.from_dict({'D': {0: 1, 1: 2, 2: 3, 3: 4},
                              'E': {0: 5, 1: 6, 2: 7, 3: 8},
                              'F': {0: 9, 1: 8, 2: 7, 3: 6}})
# join on the index; passing on=df1.index would add an extra key_0 column
df1.merge(df2[['D', 'E']], left_index=True, right_index=True)
You can then reorder the columns as required.

Matplotlib how to count the occurence of specific value

I have a DataFrame like this:
Apples Oranges
0 1 1
1 2 1
2 3 2
3 2 3
4 1 2
5 2 3
I'm trying to count the occurrences of values for both Apples and Oranges (how often the values 1, 2 and 3 occur in the data frame for each fruit). I want to draw a bar chart using Matplotlib, but so far I have not been successful. I have tried:
plt.bar(2,['Apples', 'Oranges'], data=df)
plt.show()
But the output is very weird. Could I have some advice? Thanks in advance.
Edit: I'm expecting result like this:
You can use the value_counts method together with pandas plotting:
import pandas as pd
# Sample data
d = {'apples': [1, 2, 3, 2, 1, 2], 'oranges': [1, 1, 2, 3, 2, 3]}
df = pd.DataFrame(data=d)
# Calculate the frequencies of each value
df2 = pd.DataFrame({
    "apples": df["apples"].value_counts(),
    "oranges": df["oranges"].value_counts()
})
# Plot
df2.plot.bar()
You will get:
Here is another one
import pandas as pd
df = pd.DataFrame({'Apples': [1, 2, 3, 2, 1, 2], 'Oranges': [1, 1, 2, 3, 2, 3]})
df.apply(pd.Series.value_counts).plot.bar()
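To see what the bar chart is built from, it can help to look at the intermediate count table before plotting. The sketch below repeats the same `value_counts` step without the plot; the `fillna(0)` call is an added assumption to cover values that occur in one column but not the other, which does not happen in this sample.

```python
import pandas as pd

df = pd.DataFrame({'Apples': [1, 2, 3, 2, 1, 2], 'Oranges': [1, 1, 2, 3, 2, 3]})

# one row per distinct value, one column per fruit
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int).sort_index()
print(counts)
# Apples:  value 1 occurs 2 times, value 2 occurs 3 times, value 3 occurs once
# Oranges: values 1, 2 and 3 occur 2 times each
```

Calling `counts.plot.bar()` on this table gives one group of bars per value, with one bar per fruit.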
You can use hist from matplotlib:
import pandas as pd
from matplotlib import pyplot as plt
d = {'apples': [1, 2, 3, 2, 1, 2], 'oranges': [1, 1, 2, 3, 2, 3]}
df = pd.DataFrame(data=d)
plt.hist([df.apples, df.oranges], label=['apples', 'oranges'])
plt.legend()
plt.show()
This will give the output:
Whenever you are plotting the count or frequency of something, you should look into a histogram:
from matplotlib import pyplot as plt
import pandas as pd
df = pd.DataFrame({'Apples': {0: 1, 1: 2, 2: 3, 3: 2, 4: 1, 5: 2}, 'Oranges': {0: 1, 1: 1, 2: 2, 3: 3, 4: 2, 5: 3}})
plt.hist([df.Apples,df.Oranges], bins=3,range=(0.5,3.5),label=['Apples', 'Oranges'])
plt.xticks([1,2,3])
plt.yticks([1,2,3])
plt.legend()
plt.show()

Plotly: How to subset a pandas dataframe at specific values in a column?

If I have a dataframe that looks like the following:
Time Wavelength Absorption
1 100 0.123
1 101 0.456
1 102 0.798
2 100 0.101
2 101 0.112
2 101 0.131
I want to create a new dataframe that only contains the rows when Time = 1, and only has the Wavelength and Absorption columns:
Wavelength Absorption
100 0.123
101 0.456
102 0.798
How would I go about doing this?
Sample data:
import pandas as pd
df = pd.DataFrame({'Time': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
'Wavelength': {0: 100, 1: 101, 2: 102, 3: 100, 4: 101, 5: 101},
'Absorption': {0: 0.123,
1: 0.456,
2: 0.798,
3: 0.101,
4: 0.112,
5: 0.131}})
You seem happy with the help you've already gotten in the comments, but since you've tagged your question with [plotly], I thought you might be interested in how to set up a table that lets you select a subset for any unique value of Time ([1, 2] here), or the data as it is in your input dataframe.
Table 1 - Raw data
Table 2 - Subset with Time = 1
The comments in the code section should explain every step pretty clearly. Don't hesitate to let me know if anything should prove unclear.
Complete code:
import plotly.graph_objects as go
import pandas as pd
# input data
df = pd.DataFrame({'Time': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
'Wavelength': {0: 100, 1: 101, 2: 102, 3: 100, 4: 101, 5: 101},
'Absorption': {0: 0.123,
1: 0.456,
2: 0.798,
3: 0.101,
4: 0.112,
5: 0.131}})
# plotly table setup
fig = go.Figure(go.Table(header=dict(values=df.columns.to_list()),
                         cells=dict(values=[df[col].to_list() for col in df.columns])
                         )
                )
# set up buttons as list where the first element
# is an option for showing the raw data
buttons = [{'method': 'restyle',
'label': 'Raw data',
'args': [{'cells.values': [[df[col].to_list() for col in df.columns]],
'header.values': [df.columns.to_list()]},
],
}]
# create a dropdown option for each unique value of df['Time']
# which in this case is [1, 2]
# and extend the buttons list accordingly
for i, option in enumerate(df['Time'].unique()):
    df_subset = (df[df['Time'] == option][['Wavelength', 'Absorption']])
    buttons.extend([{'method': 'restyle',
                     'label': 'Time = ' + str(option),
                     'args': [{'cells.values': [[df_subset[col].to_list() for col in df_subset.columns]],
                               'header.values': [df_subset.columns.to_list()]},
                              ],
                     },])
# configure updatemenu and add constructed buttons
updatemenus = [{'buttons': buttons,
'direction': 'down',
'showactive': True,}]
# update layout with buttons, and show the figure
fig.update_layout(updatemenus=updatemenus)
fig.show()
This answer from @Epsi95 worked for me. Posting it here as an answer rather than a comment:
df[df['Time'] == 1][['Wavelength', 'Absorption']]
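An equivalent way to write the same selection is a single `.loc` call, which filters the rows and picks the columns in one step. This is a sketch using the sample data from the question; `subset` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Time': [1, 1, 1, 2, 2, 2],
                   'Wavelength': [100, 101, 102, 100, 101, 101],
                   'Absorption': [0.123, 0.456, 0.798, 0.101, 0.112, 0.131]})

# boolean mask selects the rows, the list selects the columns
subset = df.loc[df['Time'] == 1, ['Wavelength', 'Absorption']]
print(subset)
```

Add `.reset_index(drop=True)` if the new dataframe should get a fresh index.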

Select only one unique element from multiple lists in Python

This is not homework; I am trying to solve a problem (here is the link if interested: https://open.kattis.com/problems/azulejos).
You don't actually have to understand the problem. What I would like to accomplish is to select exactly one element from each of several lists so that the selected elements do not overlap with each other.
For example, at the end of my code, I get an output:
{1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
I would like to transform this into, for example,
{3: 4, 2: 2, 4:1, 1: 3} -- which is the sample answer that is in the website.
But from my understanding, it can also be simply
{1: 3, 2: 2, 3: 4, 4: 1}
I am struggling to select only one integer from each list so that it does not overlap with the others. The dictionary I produce in my code contains lists with multiple integers, and I would like to pick exactly one from each list so that all picked values are unique.
import sys
n_tiles_row = int(sys.stdin.readline().rstrip())
# print(n_tiles_row) ==> 4
# BACK ROW - JOAO
back_row_price = sys.stdin.readline().rstrip()
# print(back_row_price) ==> 3 2 1 2
back_row_height = sys.stdin.readline().rstrip()
# print(back_row_height) ==> 2 3 4 3
# FRONT ROW - MARIA
front_row_price = sys.stdin.readline().rstrip()
# print(front_row_price) ==> 2 1 2 1
front_row_height = sys.stdin.readline().rstrip()
# print(front_row_height) ==> 2 2 1 3
br_num1_price, br_num2_price, br_num3_price, br_num4_price = map(int, back_row_price.split())
# br_num1_price = 3; br_num2_price = 2; br_num3_price = 1; br_num4_price = 2;
br_num1_height, br_num2_height, br_num3_height, br_num4_height = map(int, back_row_height.split())
# 2 3 4 3
fr_num1_price, fr_num2_price, fr_num3_price, fr_num4_price = map(int, front_row_price.split())
# 2 1 2 1
fr_num1_height, fr_num2_height, fr_num3_height, fr_num4_height = map(int, front_row_height.split())
# 2 2 1 3
back_row = {1: [br_num1_price, br_num1_height],
2: [br_num2_price, br_num2_height],
3: [br_num3_price, br_num3_height],
4: [br_num4_price, br_num4_height]}
# {1: [3, 2], 2: [2, 3], 3: [1, 4], 4: [2, 3]}
front_row = {1: [fr_num1_price, fr_num1_height],
2: [fr_num2_price, fr_num2_height],
3: [fr_num3_price, fr_num3_height],
4: [fr_num4_price, fr_num4_height]}
# {1: [2, 2], 2: [1, 2], 3: [2, 1], 4: [1, 3]}
_dict = {1: [],
2: [],
3: [],
4: []
}
for i in range(n_tiles_row):
    _list = []
    for n in range(n_tiles_row):
        if(list(back_row.values())[i][0] >= list(front_row.values())[n][0]
                and list(back_row.values())[i][1] >= list(front_row.values())[n][1]):
            _list.append(list(front_row.keys())[n])
    _dict[list(back_row.keys())[i]] = _list
print(_dict)
# {1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
Please let me know if there is another approach to this problem.
Here is a solution using the same syntax as the code you provided.
The trick here was to order the tiles first by price ascending (the question asked for non-descending), then by height descending, so that the tallest tile of the next lowest price in the back row is matched with the tallest tile of the next lowest price in the front row.
To do this sorting I utilized Python's sorted() function. See a Stack Overflow example here.
I assumed that if there is no such match, the loop should break immediately and print "impossible", as the problem you linked requires.
As a side note, you had originally claimed that a python dictionary
{3: 4, 2: 2, 4:1, 1: 3} was equivalent to {1: 3, 2: 2, 3: 4, 4: 1}. You are correct: two dictionaries with the same key-value pairs compare equal regardless of order. Remember, though, that Python dictionaries are not sorted by key (since Python 3.7 they keep insertion order), so the printed order can differ from the sample answer.
import sys
n_tiles_row = int(sys.stdin.readline().rstrip())
# print(n_tiles_row) ==> 4
# BACK ROW - JOAO
back_row_price = sys.stdin.readline().rstrip()
# print(back_row_price) ==> 3 2 1 2
back_row_height = sys.stdin.readline().rstrip()
# print(back_row_height) ==> 2 3 4 3
# FRONT ROW - MARIA
front_row_price = sys.stdin.readline().rstrip()
# print(front_row_price) ==> 2 1 2 1
front_row_height = sys.stdin.readline().rstrip()
# print(front_row_height) ==> 2 2 1 3
# preprocess data into lists of ints
back_row_price = [int(x) for x in back_row_price.strip().split(' ')]
back_row_height = [int(x) for x in back_row_height.strip().split(' ')]
front_row_price = [int(x) for x in front_row_price.strip().split(' ')]
front_row_height = [int(x) for x in front_row_height.strip().split(' ')]
# store each tile into lists of tuples
front = list()
back = list()
for i in range(n_tiles_row):
    back.append((i, back_row_price[i], back_row_height[i]))  # tuples of (tile_num, price, height)
    front.append((i, front_row_price[i], front_row_height[i]))
# sort tiles by price first (as the price must be non-descending) then by height descending
back = sorted(back, key=lambda x: (x[1], -x[2]))
front = sorted(front, key=lambda x: (x[1], -x[2]))
# print(back) ==> [(2, 1, 4), (1, 2, 3), (3, 2, 3), (0, 3, 2)]
# print(front) ==> [(3, 1, 3), (1, 1, 2), (0, 2, 2), (2, 2, 1)]
possible_back_tile_order = list()
possible_front_tile_order = list()
for i in range(n_tiles_row):
    if back[i][2] > front[i][2]:  # if next lowest priced back tile is taller than next lowest priced front tile
        possible_back_tile_order.append(back[i][0])
        possible_front_tile_order.append(front[i][0])
    else:
        break
if len(possible_back_tile_order) < n_tiles_row:  # check that all tiles had matching pairs in back and front
    print("impossible")
else:
    print(possible_back_tile_order)
    print(possible_front_tile_order)
A possibly inefficient way of solving the issue is to generate all possible "solutions" (with values potentially not present in the lists corresponding to a specific key) and settle for a "valid" one (for which all values are present in the corresponding lists).
One way of doing this with itertools.permutations (which computes all possible solutions satisfying the uniqueness constraint) would be:
import itertools
def gen_valid(source):
    keys = source.keys()
    possible_values = set(x for k, v in source.items() for x in v)
    for values in itertools.permutations(possible_values):
        result = {k: v for k, v in zip(keys, values)}
        # check that `result` is valid
        if all(v in source[k] for k, v in result.items()):
            yield result
d = {1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
next(gen_valid(d))
# {1: 1, 2: 2, 3: 4, 4: 3}
list(gen_valid(d))
# [{1: 1, 2: 2, 3: 4, 4: 3},
# {1: 1, 2: 3, 3: 2, 4: 4},
# {1: 1, 2: 3, 3: 4, 4: 2},
# {1: 1, 2: 4, 3: 2, 4: 3},
# {1: 2, 2: 1, 3: 4, 4: 3},
# {1: 2, 2: 3, 3: 4, 4: 1},
# {1: 3, 2: 1, 3: 2, 4: 4},
# {1: 3, 2: 1, 3: 4, 4: 2},
# {1: 3, 2: 2, 3: 4, 4: 1},
# {1: 3, 2: 4, 3: 2, 4: 1}]
This generates n! candidate solutions.
The "brute force" approach using a Cartesian product over the lists produces prod(n_k) = n_1 * n_2 * ... * n_k candidate solutions (with n_k the length of each list). In the worst case scenario (maximum density) this is n ** n candidates, which is asymptotically much worse than the factorial.
In the best case scenario (minimum density) this is 1 solution only.
In general, this can be either slower or faster than the "permutation solution" proposed above, depending on the "sparsity" of the lists.
For an average n_k of approx. n / 2, n! is smaller/faster for n >= 6.
For an average n_k of approx. n * (3 / 4), n! is smaller/faster for n >= 4.
In this example there are 4! == 4 * 3 * 2 * 1 == 24 permutation solutions, and 3 * 4 * 2 * 4 == 96 Cartesian product solutions.
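For comparison, the Cartesian product variant described above could be sketched as follows. This is an illustration rather than code from the answer; `gen_valid_product` is a hypothetical name. Every candidate already takes its value from the right list, so the only check left is uniqueness.

```python
import itertools


def gen_valid_product(source):
    """Yield every assignment that picks one value per key with no repeats."""
    keys = list(source)
    # one candidate per element of the Cartesian product of the lists
    for values in itertools.product(*(source[k] for k in keys)):
        # keep the candidate only if all picked values are distinct
        if len(set(values)) == len(values):
            yield dict(zip(keys, values))


d = {1: [1, 2, 3], 2: [1, 2, 3, 4], 3: [2, 4], 4: [1, 2, 3, 4]}
solutions = list(gen_valid_product(d))
print(len(solutions))
# 10
```

It walks through the 96 candidates mentioned above and finds the same 10 valid assignments as the permutation version.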
