How to display a dataframe with one field through an altair chart? - python

I have a dataframe like this one, and I don't quite understand how to display it as a regular line chart.
import altair as alt
import streamlit as st
st.write(data_I)
model_graph = alt.Chart(data_I).transform_filter(
    size  # presumably a filter defined elsewhere in the original code (not shown)
).mark_line().encode(
    x=alt.X('index'),  # what should go here?
    y=alt.Y('confirmed:Q', title='Count'),
).properties(
    width=820,
    height=500
).configure_axis(
    labelFontSize=17,
    titleFontSize=20
)
st.altair_chart(model_graph)
In particular, I don't understand what to specify for the x-axis.

See "Including Index Data" in the Altair documentation. Briefly, you can start with
alt.Chart(data_I.reset_index())
and then the index will be accessible as an ordinary dataframe column (named index, unless the index already has a name).

Related

How to replicate a plot: density bar plot in Python

I'm working on a project and would like to plot my data in a similar way to this example from a book:
So I would like to create a density histogram for my categorical features (left image) and then add a separate column for each value of another feature (middle and right images).
In my case the feature I want to plot is called [district_code], and I would like to create columns based on a feature called [status_group].
What I've tried so far:
sns.kdeplot(data=raw, x="district_code"): problem, it is a line plot, not a histogram
sns.kdeplot(data=raw, x="district_code", col="status_group"): problem, you can't use the col argument for this plot type
sns.displot(raw, x="district_code", col="status_group"): problem, the col argument works, but it creates a count histogram, not a density plot
I would really appreciate some suggestions about the correct code I could use.
This is just an example for one of my categorical features, but I have many more I would like to plot. Any suggestions on how to turn this into a function where I could run the code for a list of categorical features would be highly appreciated.
UPDATE:
sns.displot(raw, x="source_class", stat="density", col="status_group", color="black"): works, but looks a bit awkward for some features.
How could I improve this?
Good:
Not so good:

Encoding a list column to the legend of a plot

Apologies in advance, I am not sure how to word this question best:
I am working with a large dataset, and I would like to plot latitude and longitude where the colour of the points (actually the opacity) is encoded by a 'featureType' column bound to the legend. This way I can use the legend to highlight the various features I am looking for on my map.
Here is a picture of my map and legend so far
The problem is that in my dataset, the featureType column is a comma-separated list of the features that can be found at that location (e.g. arch, bridge).
How can I make a point show up under both arch and bridge? At the moment each combination (e.g. "arch,bridge") becomes its own category, leading to over 300 combinations of about 20 different feature types.
The dataset can be found at http://atlantides.org/downloads/pleiades/dumps/pleiades-locations-latest.csv.gz
N.B.: I am using Altair and pandas.
import altair as alt
import pandas as pd
from vega_datasets import data
df = pd.read_csv('C://path/pleiades-locations.csv')
alt.data_transformers.enable('json')
countries = alt.topo_feature(data.world_110m.url, 'countries')
# clicking a legend entry selects that featureType
selection = alt.selection_multi(fields=['featureType'], bind='legend')
brush = alt.selection(type='interval', encodings=['x'])  # defined but not used below
world_map = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white'
).project('equirectangular').properties(
    width=500,
    height=300
)
points = alt.Chart(df).mark_circle().encode(
    alt.Latitude('reprLat:Q'),
    alt.Longitude('reprLong:Q'),
    alt.Color('featureType:N'),
    tooltip=['featureType', 'timePeriodsKeys:N'],
    # selected feature types are fully opaque, the rest invisible
    opacity=alt.condition(selection, alt.value(1), alt.value(0.0))
).add_selection(
    selection
)
world_map + points
It is not possible for Altair to generate the labels you want from your current column format. You will need to turn your comma-separated string labels into lists and then explode the column so that you get one row per item in the list:
import altair as alt
import pandas as pd
from vega_datasets import data
# serve the data from a local server instead of embedding it (needs the altair_data_server package)
alt.data_transformers.enable('data_server')
df = pd.read_csv('http://atlantides.org/downloads/pleiades/dumps/pleiades-locations-latest.csv.gz')[['reprLong', 'reprLat', 'featureType']]
# turn each comma-separated string into a list of feature names...
df['featureType'] = df['featureType'].str.split(',')
# ...then give each feature in the list its own row
df = df.explode('featureType')
countries = alt.topo_feature(data.world_110m.url, 'countries')
world_map = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white')
points = alt.Chart(df).mark_circle(size=10).encode(
    alt.Latitude('reprLat:Q'),
    alt.Longitude('reprLong:Q'),
    alt.Color('featureType:N', legend=alt.Legend(columns=2)))
world_map + points
Note that having this many legend entries is not very meaningful, since the colors repeat. The interactivity would help with that somewhat, but I would consider splitting this up into multiple charts. I am not sure whether the legend can be expanded to show the 81 hidden entries. Also double-check that the latitude/longitude positions line up with the world map projection you are using; they seemed to move around when I changed the projection.

Which parts of my dataframe are being plotted?

The goal is to plot the data frame I'm working with on a single chart, with a line for each value of init_population where the y-axis is count and x-axis is tick_number.
I've figured out how to use groupby() and plot() together to make this:
As you can see, all the lines are there nicely, but I'm pretty confident that the blue line at the top, which doesn't follow the same relationship as the other lines, is actually a different column of data.
So that this is reproducible, the data is available here.
import pandas as pd
max_runs_data = pd.read_csv('clean_table.csv')
del max_runs_data['visualization']
max_runs_data.columns = ['run_number','init_population', 'tick', 'turtle_count']
max_runs_data.set_index('tick', inplace = True)
test_plot_1 = max_runs_data.groupby('init_population')['turtle_count'].plot()
test_plot_2 = max_runs_data.groupby('init_population').plot(y='turtle_count')
test_plot_1 produces the linked image; test_plot_2 produces a separate plot for each group.
Is there an obvious way to specify the x and y columns without losing the grouping on a single chart?
Thanks
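No answer is included here. One option, sketched with a made-up stand-in for clean_table.csv, is to loop over the groups and draw each one on a shared Axes, naming the x and y columns explicitly instead of relying on the index. If each init_population has several run_number values, grouping by both columns would avoid joining the end of one run to the start of the next within a single line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for clean_table.csv, with the renamed columns from the question.
max_runs_data = pd.DataFrame({
    "run_number": [1, 1, 1, 2, 2, 2],
    "init_population": [10, 10, 10, 50, 50, 50],
    "tick": [0, 1, 2, 0, 1, 2],
    "turtle_count": [10, 12, 15, 50, 47, 45],
})

fig, ax = plt.subplots()
# Each group is drawn onto the same Axes with explicit x and y columns.
for init_pop, group in max_runs_data.groupby("init_population"):
    group.plot(x="tick", y="turtle_count", ax=ax,
               label=f"init_population={init_pop}")
ax.set_xlabel("tick")
ax.set_ylabel("turtle_count")
```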

Matplotlib 3D plot colors from different classes from Dataframe

I am trying to make a 3D plot in Matplotlib from point-cloud data that comes from two different classes.
However, I cannot get the classes to show up in different colors. My code is below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# x, y, z and target hold the point-cloud coordinates and class labels (loading not shown)
x = pd.DataFrame(np.array(x).reshape(-1, 1))
y = pd.DataFrame(np.array(y).reshape(-1, 1))
z = pd.DataFrame(np.array(z).reshape(-1, 1))
target = pd.DataFrame(np.array(target).reshape(-1, 1))
new_data = [x, y, z, target]
new_data = pd.concat(new_data, axis=1, ignore_index=True)
new_data.columns = ['x', 'y', 'z', 'target']
colors = []
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(new_data.x, new_data.y, new_data.z, color='target')
I can't link the color argument to the class in the target column of my dataframe. Is there something I am missing?
I found the answer myself: map each class in the target column to a color and pass the result to the color argument:
col = new_data['target'].map({'Variable1': 'r', 'Variable2': 'g', 'Variable3': 'b'})
ax.scatter(new_data.x, new_data.y, new_data.z, color=col)
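With color='target' you are passing the literal string 'target', not the values in that column. Pass the column instead, e.g. c=new_data.target. This works directly when target holds numeric class codes, which are mapped through the colormap; string labels first need to be mapped to colors, as in the answer above.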
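A minimal sketch of an alternative that also produces a legend: one scatter call per class, so Matplotlib assigns each class its own color from the default cycle. The point cloud and the class_a/class_b labels are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Made-up point cloud with two string-labelled classes.
new_data = pd.DataFrame({
    "x": rng.normal(size=20),
    "y": rng.normal(size=20),
    "z": rng.normal(size=20),
    "target": ["class_a"] * 10 + ["class_b"] * 10,
})

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection="3d")
# One scatter per class: each call gets the next color and its own legend label.
for label, group in new_data.groupby("target"):
    ax.scatter(group.x, group.y, group.z, label=label)
ax.legend()
```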

Python scatterplot design - select specific values of a variable for the x axis based on another columns values

I am relatively new to Python and am currently trying to generate a scatterplot from some data using pandas and seaborn.
The data I'm using ('ErrorMedianScatter') is as follows (apologies for the link, I have yet to get permissions to embed images!):
Image of data
Each participant has two data points of interest: the mean when MissingLimb = 0 and the mean when MissingLimb = 1.
I want to create a scatterplot for participants where the x-axis represents their value for 'mean' when 'MissingLimb' = 0, and the y-axis represents their value for 'mean' when 'MissingLimb' = 1.
I am using the following code so far to create a scatterplot:
import seaborn as sns
sns.lmplot(x="mean",
           y="mean",
           data=ErrorMedianScatter,
           fit_reg=False,
           hue="participant")
This generates a perfectly functional, but very uninteresting, scatterplot. What I'm stuck on is creating x- and y-axis variables that let me specify that I'm interested in a participant's mean for a given value of the 'MissingLimb' column.
Many thanks in advance!
There are most likely multiple ways to solve your problem. The method I'd take is to first transform your dataset so that there is a single row (observation) for each participant, with one column that reports the mean where MissingLimb is 0 and another column that reports the mean where MissingLimb is 1.
You can accomplish this data transformation with this code:
df = pd.pivot_table(ErrorMedianScatter,
values='mean',
index='participant',
columns='MissingLimb')
df.columns = ['MissingLimb 0', 'MissingLimb 1']
You can then use this (transformed) dataframe to create the scatterplot:
sns.lmplot(data=df, x='MissingLimb 0', y='MissingLimb 1')
Notice that in addition to specifying the data to plot (using the data parameter), I also specified the data to plot on the x- and y-axis (using the x and y parameters, respectively). You can add additional arguments to the sns.lmplot call and customize the plot to your specifications.
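A self-contained sketch of the same pivot-then-plot steps, using made-up values for ErrorMedianScatter. It keeps fit_reg=False and hue="participant" from the question, so reset_index() is used to turn participant back into a column; rename is used instead of assigning df.columns so the labels follow the actual MissingLimb values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd
import seaborn as sns

# Made-up long-format data: two rows (MissingLimb 0 and 1) per participant.
ErrorMedianScatter = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2", "p3", "p3"],
    "MissingLimb": [0, 1, 0, 1, 0, 1],
    "mean": [1.0, 2.0, 1.5, 2.5, 0.5, 3.0],
})

# Long to wide: one row per participant, one column per MissingLimb value.
wide = (
    pd.pivot_table(ErrorMedianScatter, values="mean",
                   index="participant", columns="MissingLimb")
      .rename(columns=lambda v: f"MissingLimb {v}")
      .reset_index()
)

g = sns.lmplot(data=wide, x="MissingLimb 0", y="MissingLimb 1",
               hue="participant", fit_reg=False)
```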
