Repeated values in date axis - python

I'm trying to make a bar chart with a temporal X axis. I bin the dates using 'utcyearmonth', but the X-axis labels repeat values.
Any ideas on what I'm doing wrong?
Thank you in advance!
This is what happens to the graph
The code I've written is:
serie_temporal = alt.Chart(base_filtrada).mark_bar(size=60).encode(
    x=alt.X(
        'utcyearmonth(Data):T',
        axis=alt.Axis(format="%Y %B"),
        scale=alt.Scale(nice={'interval': 'month', 'step': 1})
    ),
    y='sum(Valor):Q',
    color='Tipo de despesa'
).properties(
    height=400
)
I already tried changing the axis type from time to nominal, ordinal and quantitative and removing the scale configuration.


Understanding the interaction between mark_line point overlay and legend

I have found some unintuitive behavior in the interaction between the point property of mark_line and the appearance of the color legend for Altair/Vega-Lite. I ran into this when attempting to create a line with very large and mostly-transparent points in order to increase the area that would trigger the line's tooltip, but was unable to preserve a visible type=gradient legend.
The following code is an MRE for this problem, showing 6 cases: the use of [False, True, and a custom OverlayMarkDef] for the point property and the use of plain and customized color encoding.
import pandas as pd
import altair as alt
# create data
df = pd.DataFrame()
df['x_data'] = [0, 1, 2] * 3
df['y2'] = [0] * 3 + [1] * 3 + [2] * 3
# initialize
base = alt.Chart(df)
markdef = alt.OverlayMarkDef(size=1000, opacity=.001)
color_encode = alt.Color(shorthand='y2', legend=alt.Legend(title='custom legend', type='gradient'))
marks = [False, True, markdef]
encodes = ['y2', color_encode]
plots = []
for i, m in enumerate(marks):
    for j, c in enumerate(encodes):
        plot = base.mark_line(point=m).\
            encode(x='x_data', y='y2', color=c, tooltip=['x_data','y2']).\
            properties(title=', '.join([['False', 'True', 'markdef'][i], ['plain encoding', 'custom encoding'][j]]))
        plots.append(plot)
combined = alt.vconcat(
    alt.hconcat(*plots[:2]).resolve_scale(color='independent'),
    alt.hconcat(*plots[2:4]).resolve_scale(color='independent'),
    alt.hconcat(*plots[4:]).resolve_scale(color='independent')
).resolve_scale(color='independent')
The resulting plot (the interactive tooltips work as expected):
The color data is the same for each of these plots, and yet the color legend is all over the place. In my real case, the gradient is preferred (the data is quantitative and continuous).
With no point on the mark_line, the legend is correct.
Adding point=True converts the legend to a symbol type - I'm not sure why this is the case since the default legend type is gradient for quantitative data (as seen in the first row) and this is the same data - but can be forced back to gradient by the custom encoding.
Attempting to make a custom point via OverlayMarkDef however renders the forced gradient colorbar invisible - matching the opacity of the OverlayMarkDef. But it is not simply a matter of the legend always inheriting the properties of the point, because the symbol legend does not attempt to reflect the opacity.
I would like to have the normal gradient colorbar available for the custom OverlayMarkDef, but I would also love to build up some intuition for what is going on here.
The transparency issue with the bottom-right plot has been fixed as of Altair 4.2.0, so now every case that includes a point on the line changes the legend to 'Ordinal' instead of 'Quantitative'.
I believe the reason the legend is converted to a symbol legend instead of a gradient is that you are adding filled points, and the fill channel is not set to a quantitative field, so it defaults to either ordinal or nominal with a sort:
plot = base.mark_line().encode(
    x='x_data',
    y='y2',
    color='y2',
)
plot + plot.mark_circle(opacity=1)
mark_point gives a gradient legend since it has no fill, and if we set the fill for mark_circle explicitly we also get a gradient legend (one for fill and one for color):
plot = base.mark_line().encode(
    x='x_data',
    y='y2',
    color='y2',
    fill='y2'
)
plot + plot.mark_circle(opacity=1)
I agree with you that this is a bit unexpected and it would be more convenient if the encoding type of point=True was set to the same as that used for the lines. You might suggest this as an enhancement in VegaLite together with reporting the apparent bug that you can't override the legend type via type='gradient'.

Adjust y axis when using parallel_coordinates

I'm working on plotting some data with pandas in the form of parallel coordinates, and I'm not too sure how to go about setting the y-axis scaling.
here is my code:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

def show_means(df: pd.DataFrame):
    plt.figure('9D-parallel_coordinates')
    plt.title('continents & features')
    parallel_coordinates(df, 'continent', color=['blue', 'green', 'red', 'yellow', 'orange', 'black'])
    plt.show()
and I got this:
As the graph shows, the values of "tempo" are far larger than the others. I want to scale all feature values to between 0 and 1 and get a line chart. How can I do that? I also want to make the axis labels vertical so that readers can understand them more easily.
this is my data frame:
Thanks
To normalize your values between 0 and 1, you have several options. One of them is min-max scaling (what MinMaxScaler does): the lowest value of each column becomes 0 and the highest becomes 1:
df = (df - df.min()) / (df.max() - df.min())
To get vertical labels, use df.plot(rot=90).
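Here is a minimal sketch of the min-max formula above applied to a frame that also holds a text class column. The helper names `min_max_normalize` and `plot_means` and the sample values are made up for illustration, and the sketch assumes that only numeric columns should be scaled (subtracting across a string column such as 'continent' would fail). The labels are rotated through the Matplotlib axes that `parallel_coordinates` returns rather than through `df.plot`.

```python
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Scale every numeric column to [0, 1]; other columns are left as they are."""
    numeric_cols = list(df.select_dtypes(include="number").columns)
    col_min = df[numeric_cols].min()
    col_range = df[numeric_cols].max() - col_min
    scaled = df.copy()
    # Note: a constant column has a range of 0 and ends up as NaN here.
    scaled[numeric_cols] = (df[numeric_cols] - col_min) / col_range
    return scaled


def plot_means(df: pd.DataFrame, class_column: str):
    """Draw parallel coordinates on normalized data with vertical feature labels."""
    # Imported here so the normalization helper is usable without matplotlib.
    from pandas.plotting import parallel_coordinates

    ax = parallel_coordinates(min_max_normalize(df), class_column)
    ax.tick_params(axis="x", labelrotation=90)
    return ax


# Hypothetical sample data standing in for the per-continent means.
means = pd.DataFrame({
    "continent": ["Africa", "Asia", "Europe"],
    "tempo": [120.0, 110.0, 130.0],
    "energy": [0.6, 0.8, 0.7],
})
normalized = min_max_normalize(means)
```

Calling `plot_means(means, "continent")` followed by `plt.show()` would then draw the chart with all features on a common 0 to 1 scale.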

How to display a dataframe with one field through an altair chart?

I have a dataframe like this one, and I don't quite understand how to display it as a regular line chart.
st.write(data_I)
model_graph = alt.Chart(data_I).transform_filter(
    size
).mark_line().encode(
    x=alt.X('index'),
    y=alt.Y('confirmed:Q', title='Quantity'),
).properties(
    width=820,
    height=500
).configure_axis(
    labelFontSize=17,
    titleFontSize=20
)
st.altair_chart(model_graph)
Specifically, I do not understand what to specify for the X axis.
See the "Including Index Data" section of the Altair docs; briefly, you can start with
alt.Chart(data_I.reset_index())
and then the index will be accessible as a normal dataframe column.

Adding X-Y offsets to data points

I'm looking for a way to specify an X-Y offset to plotted data points. I'm just getting into Altair, so please bear with me.
The situation: I have a dataset recording daily measurements for 30 people. Every person can register several different types of measurements every day.
Example dataset & plot, with 2 people and 2 measurement types:
import numpy as np
import pandas as pd
# pd.np is no longer available in recent pandas; use numpy directly
df = pd.DataFrame.from_dict({"date": pd.to_datetime(pd.date_range("2019-12-01", periods=5).repeat(4)),
                             "person": np.tile(["Bob", "Amy"], 10),
                             "measurement_type": np.tile(["score_a", "score_a", "score_b", "score_b"], 5),
                             "value": 20.0*np.random.random(size=20)})
import altair as alt
alt.Chart(df, width=600, height=100) \
    .mark_circle(size=150) \
    .encode(x = "date",
            y = "person",
            color = alt.Color("value"))
This gives me this graph:
In the example above, the 2 measurement types are plotted on top of each other. I would like to add an offset to the circles depending on the "measurement_type" column, so that they can all be made visible around the date-person location in the graph.
Here's a mockup of what I want to achieve:
I've been searching the docs but haven't figured out how to do this - been experimenting with the "stack" option, with the dx and dy options, ...
I have a feeling this should just be another encoding channel (offset or alike), but that doesn't exist.
Can anyone point me in the right direction?
There is currently no concept of an offset encoding in Altair, so the best approach is to combine a row encoding with a y encoding, similar to the Grouped Bar Chart example in Altair's documentation:
alt.Chart(df,
    width=600, height=100
).mark_circle(
    size=150
).encode(
    x = "date",
    row='person',
    y = "measurement_type",
    color = alt.Color("value")
)
You can then fine-tune the look of the result using standard chart configuration settings:
alt.Chart(df,
    width=600, height=alt.Step(25)
).mark_circle(
    size=150
).encode(
    x = "date",
    row='person',
    y = alt.Y("measurement_type", title=None),
    color = alt.Color("value")
).configure_facet(
    spacing=10
).configure_view(
    strokeOpacity=0
)
Well, I don't know what result you are getting so far, but maybe write a function with parameters like def chart(DotsOnXAxis, FirstDotsOnYAxis, SecondDotsOnYAxis, OffsetAmount)
and then put those variables in the right places.
If you want to offset the dots, maybe use a scheme like: SecondDotsOnYAxis = FirstDotsOnYAxis + OffsetAmount

How can I exclude certain dates (e.g., weekends) from time series plots?

In the following example, I'd like to exclude weekends and plot Y as a straight line, and specify some custom frequency for major tick labels since they would be a "broken" time series (e.g., every Monday, a la matplotlib's set_major_locator).
How would I do that in Altair?
import altair as alt
import numpy as np
import pandas as pd
index = pd.date_range('2018-01-01', '2018-01-31', freq='B')
df = pd.DataFrame(np.arange(len(index)), index=index, columns=['Y'])
alt.Chart(df.reset_index()).mark_line().encode(
    x='index',
    y='Y'
)
A quick way to do that is to specify the axis as an ordinal field. This would produce a very ugly axis, with the hours specified for every tick. To change that, I add a column to the dataframe with a given label. I also added the grid, as by default it is removed for an ordinal encoding, and set the labelAngle to 0.
df2 = df.assign(label=index.strftime('%b %d %y'))
alt.Chart(df2).mark_line().encode(
    x=alt.X('label:O', axis=alt.Axis(grid=True, labelAngle=0)),
    y='Y:Q'
)
Beware that this drops any missing points from the axis, so you may want to add a tooltip. This is discussed in the Altair documentation.
You can also play with labelOverlap in the axis settings, depending on what you want.
To customize the axis, we can build one up using mark_text and bring back the grid with mark_rule and a custom dataframe. It does not necessarily scale up well, but it can give you some ideas.
df3 = df2.loc[df2.index.dayofweek == 0, :].copy()
df3["Y"] = 0
text_chart = alt.Chart(df3).mark_text(dy=15).encode(
    x=alt.X('label:O', axis=None),
    y=alt.Y('Y:Q'),
    text=alt.Text('label:O')
)
tick_chart = alt.Chart(df3).mark_rule(color='grey').encode(
    x=alt.X('label:O', axis=None),
)
line_chart = alt.Chart(df2).mark_line().encode(
    x=alt.X('label:O', axis=None),
    y='Y:Q'
)
# alt.Scale(rangeStep=15) is not accepted by Altair 4+; a step-based width replaces it
(text_chart + tick_chart + line_chart).properties(width=alt.Step(15))
