Reformatting y axis values in a multi-line plot in Python - python

Updated with more info
I've seen this answered on here for single line plots, but I need help with a plot showing two variables, if that matters at all... I am fairly new to python in general. My line graph shows two different departments' funding over the years. I just want to reformat the y axis to display as a number in the hundreds of millions.
Using a csv for the general public funding report of Minneapolis.
msp_df = pd.read_csv('Minneapolis_Data_Snapshot_v2.csv',error_bad_lines=False)
msp_df.info()
Saved just the two depts I was interested in, to a dataframe.
CPED_df = (msp_df['Unnamed: 0'] == 'CPED')
msp_df.iloc[CPED_df.values]
police_df = (msp_df['Unnamed: 0'] == 'Police')
msp_df.iloc[police_df.values]
("test" is the new name of my data frame containing all the info as seen below.)
test = pd.DataFrame({'Year': range(2014,2021),
'CPED': msp_df.iloc[CPED_df.values].T.reset_index(drop=True).drop(0,0)[5].tolist(),
'Police': msp_df.iloc[police_df.values].T.reset_index(drop=True).drop(0,0)[4].tolist()})
(The numbers from the original dataset were being read as strings because of the commas, so I had to fix that first.)
test['Police2'] = test['Police'].str.replace(',','').astype(int)
test['CPED2'] = test['CPED'].str.replace(',','').astype(int)
And here is my code for the plot. It executes; I just want to reformat the y axis number scale. Right now it just shows up as a decimal. (I've already imported pandas, seaborn and matplotlib.)
plt.plot(test.Year, test.Police2, test.Year, test.CPED2)
plt.ylabel('Budget in Hundreds of Millions')
plt.xlabel('Year')
Current plot
Any help super appreciated! Thanks :)

The easiest way to reformat the y axis and force it to show specific values is to use
plt.yticks(ticks, labels)
For example, if you want to display only values from 0 to 1, you can do:
plt.yticks([0,0.2,0.5,0.7,1], ['a', 'b', 'c', 'd', 'e'])
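Applied to the budget plot in the question, the same plt.yticks(ticks, labels) call could look roughly like the sketch below. It is only an illustration: the helper names million_ticks and plot_budgets are made up here, and the 0 to 200 million range with a 50 million step is an assumption about the data, not something taken from the CSV.

```python
def million_ticks(low, high, step):
    """Return tick positions and matching labels expressed in millions."""
    ticks = list(range(low, high + 1, step))
    labels = [f"{t // 1_000_000}M" for t in ticks]
    return ticks, labels


def plot_budgets(years, police, cped):
    """Plot both departments and relabel the y axis in millions."""
    # matplotlib is imported here so million_ticks stays usable on its own
    import matplotlib.pyplot as plt

    # Assumed range and step; adjust them to the real budget values
    ticks, labels = million_ticks(0, 200_000_000, 50_000_000)
    plt.plot(years, police, label="Police")
    plt.plot(years, cped, label="CPED")
    plt.yticks(ticks, labels)
    plt.ylabel("Budget (millions of dollars)")
    plt.xlabel("Year")
    plt.legend()
    plt.show()
```

With the dataframe from the question, the call would presumably be plot_budgets(test.Year, test.Police2, test.CPED2).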

Related

PPTX Python - How to fix ValueError: chart data contains no categories for a LineChart?

I'm trying to replace data in an existing line chart in Python PPTX. Here's the code I'm using:
for chart in charts:
    chart_data = CategoryChartData()
    chart_index = list(charts).index(chart)
    scenario_no = chart_index + 1
    sc_df = wrk_df[wrk_df['Scenario No'] == scenario_no]
    for category in sc_df['categories'].tolist():
        chart_data.add_category(category)
    chart_data.add_series('Volume', sc_df['Series 1'].tolist())
    chart_data.add_series('Value', sc_df['Series 2'].tolist())
    chart.replace_data(chart_data)
Basically there are several charts on the slide, through which the code iterates and replaces the data. The charts themselves have a numeric x axis and two series.
When I run this code I get the following error:
ValueError: chart data contains no categories
I've already tried converting the new categories into string, however, it doesn't work with any data type.
I'm also able to print original category labels in the existing chart, which means it does have categories.
I can't think of what's going wrong here. Does anyone have any solution for this or at least any knowledge of why this is happening?
It turned out that the data being passed as categories was empty
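A guard like the one sketched below would surface that situation before replace_data() is called. This is a hedged illustration, not the asker's actual fix: rows_for_scenario and fill_chart are hypothetical helpers, and the rows are assumed to be a list of dicts (for example from wrk_df.to_dict("records")). The demo at the end shows one possible cause, a scenario number stored as text, which is purely an assumption.

```python
def rows_for_scenario(rows, scenario_no):
    """Return the rows whose 'Scenario No' equals scenario_no."""
    return [row for row in rows if row["Scenario No"] == scenario_no]


def fill_chart(chart, rows, scenario_no):
    """Replace the chart data, skipping charts that have no matching rows."""
    # python-pptx is imported lazily; only this function needs it
    from pptx.chart.data import CategoryChartData

    selected = rows_for_scenario(rows, scenario_no)
    if not selected:
        # Without this check replace_data() raises
        # "ValueError: chart data contains no categories"
        print(f"No rows for scenario {scenario_no}; chart left unchanged")
        return False
    chart_data = CategoryChartData()
    chart_data.categories = [row["categories"] for row in selected]
    chart_data.add_series("Volume", [row["Series 1"] for row in selected])
    chart_data.add_series("Value", [row["Series 2"] for row in selected])
    chart.replace_data(chart_data)
    return True


# Hypothetical data: the scenario number is the string "1", so filtering
# on the integer 1 silently selects nothing
demo_rows = [{"Scenario No": "1", "categories": 10, "Series 1": 5, "Series 2": 7}]
print(len(rows_for_scenario(demo_rows, 1)))
```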

Time series plot showing unique occurrences per day

I have a dataframe, where I would like to make a time series plot with three different lines that each show the daily occurrences (the number of rows per day) for each of the values in another column.
To give an example, for the following dataframe, I would like to see the development for how many a's, b's and c's there have been each day.
df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
When I try the command below (my best guess so far), however, it does not filter for the different dates (I would like three lines representing each of the letters).
Any ideas on how to solve this?
df.groupby(['date']).count().plot()['letter']
I have also tried a solution in Matplotlib, though this one gives an error.
fig, ax = plt.subplots()
ax.plot(df['date'], df['letter'].count())
Based on your question, I believe you are looking for a line plot with dates on the X-axis and the counts of letters on the Y-axis. To achieve this, follow these steps:
1. Group the dataframe by date and then letter, and get the number of rows in each group using size().
2. Flatten the grouped dataframe using reset_index(), rename the new column to Counts, and sort by the letter column (so that the legend lists the letters alphabetically). These steps mostly keep the new dataframe and the graph clean and presentable. I would suggest running each step separately and printing the result, so that you know what is happening at each stage.
3. Plot each line separately by filtering the dataframe for each specific letter.
4. Show the legend and rotate the dates so that they are easier to read.
The code is shown below.
df = pd.DataFrame({'date':pd.to_datetime(['2019-10-10','2019-10-14','2019-10-09','2019-10-10','2019-10-08','2019-10-14','2019-10-10','2019-10-08','2019-10-08','2019-10-13','2019-10-08','2019-10-12','2019-10-11','2019-10-09','2019-10-08']),
'letter':['a','b','c','a','b','b','b','b','c','b','b','a','b','a','c']})
df_grouped = df.groupby(by=['date', 'letter']).size().reset_index() ## New DF for grouped data
df_grouped.rename(columns = {0 : 'Counts'}, inplace = True)
df_grouped.sort_values(['letter'], inplace=True)
colors = ['r', 'g', 'b'] ## New list for each color, change as per your preference
for i, ltr in enumerate(df_grouped.letter.unique()):
    plt.plot(df_grouped[df_grouped.letter == ltr].date, df_grouped[df_grouped.letter == ltr].Counts, '-o', label=ltr, c=colors[i])
plt.gcf().autofmt_xdate() ## Rotate X-axis so you can see dates clearly without overlap
plt.legend() ## Show legend
Output graph

plotly python lines going backwards even after sorting values

I am trying to create a plot which shows each individual's trajectory as well as the mean. This is working OK except that there appear to be extra lines and the lines go backwards, even after sorting the values.
Example:
import pandas as pd
import plotly.graph_objects as go
df = pd.DataFrame({"id": [1,1,1,1,2,2,2,2],
"months": [0,1,2,3,0,1,2,3],
"outcome":[5,2,7,11,18,3,15,3]})
#sort by each individual and the months ie. time column
df.sort_values(by=["id", "months"], inplace=True)
#create mean to overlay on plot
grouped = df.groupby("months")["outcome"].mean().reset_index()
#create plot
fig = go.Figure()
fig.add_trace(go.Scatter(x= df['months'], y= df['outcome'], name = "Individuals"))
fig.add_trace(go.Scatter(x=grouped['months'], y=grouped['outcome'], name = "Mean"))
fig.write_image("test.jpeg", scale = 2)
fig.show()
Now that I'm looking at it, it actually looks like it's just creating one giant line for all IDs together, whereas I'd like one line for ID 1 and one line for ID 2.
Any help much appreciated. Thanks in advance.
I believe the issue is in your x-values. In Pycharm, I looked at the dataframe and it looks like this:
Your months go from 0-3 and then back to 0-3. I'm a little unclear on what you want to do though - do you want to display only the ones with IDs that match? Such as all the ID with 1 and ID with 2?
Let us know what you expect to see given this dataframe I'm showing, it would be helpful.
EDIT So, I couldn't read the original question. Looking at it more, I believe I can at least answer the first portion however that led me to another bug. The line in question should be changed like so:
fig.add_trace(go.Scatter(x=df['months'][df['id'] == 1], y=df['outcome'][df['id'] == 1], name="Individuals"))
This will pull from the dataframe only where the id == 1, however this then won't show on your graph since your grouped dataframe doesn't fall within the same bounds.
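To get one line per ID, as the question asks, the same filtering idea can be repeated for every id by adding one trace per group. The sketch below is an assumption-laden illustration rather than the answerer's code: split_by_id and build_figure are hypothetical helper names, and the grouping is written in plain Python (with a DataFrame, df.groupby("id") would give the same groups).

```python
from collections import defaultdict


def split_by_id(ids, months, outcomes):
    """Group (month, outcome) points by id, each group sorted by month."""
    groups = defaultdict(list)
    for person_id, month, outcome in zip(ids, months, outcomes):
        groups[person_id].append((month, outcome))
    return {pid: sorted(points) for pid, points in groups.items()}


def build_figure(ids, months, outcomes):
    """Return a figure with one line trace per id."""
    # plotly is imported lazily so split_by_id has no third-party dependency
    import plotly.graph_objects as go

    fig = go.Figure()
    for pid, points in split_by_id(ids, months, outcomes).items():
        xs, ys = zip(*points)
        fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name=f"ID {pid}"))
    return fig


# Same values as the dataframe in the question
ids = [1, 1, 1, 1, 2, 2, 2, 2]
months = [0, 1, 2, 3, 0, 1, 2, 3]
outcomes = [5, 2, 7, 11, 18, 3, 15, 3]
groups = split_by_id(ids, months, outcomes)
```

With the question's dataframe, build_figure(df["id"], df["months"], df["outcome"]) should return a figure to which the mean trace can then be added as before.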

how to plot a column of arrays where i need to count how many times a different values come up in all these arrays

I'm working with a database of shows and I want to plot on a bar graph how many times each genre is used across all the shows, so I can show the most popular genres. The issue I'm having is that a show (a show is a row in the database) usually has more than one genre (for example: ['Comedy', 'Drama','Sci-Fi'] might be the genres of one show). I'd like to display genres on their own (I'm using Jupyter with pandas, matplotlib,...).
This is the code I've made so far:
bar_data = content2['genre'].value_counts().sort_values().tail(20)
bar_plot = bar_data.plot.barh(figsize=(20, 12))
bar_plot.set_title("genre popularity")
bar_plot.set_xlabel("amount of times genre is used")
bar_plot.set_ylabel("genres")
plt.show()
I've tried to solve this by splitting at the ',' but this doesn't work (probably because it's not a string).
So could anyone help me figure out how to plot a column of arrays like this?
The end result should be something like this, but in a bar graph:
Comedy: 800
Adventure: 756
Sci-Fi: 698
Kids: 630
Thank you very much for your time and help
If you have each genre as a list, you can use explode() to get the individual strings from it. Then use value_counts().
content2['genre'].explode().value_counts()
Hope this helps.
Update
It looks like each row is a string. So you will first have to strip (strip('[]')) the '[]', then split the string by commas (split(',')) to get the genre names. This can be done using the following code snippet.
content2['genre'].str.strip('[]').str.split(',')
Hope this helps.
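One caveat with strip and split: the pieces keep their quotes and leading spaces (for example "'Comedy'" and " 'Drama'"), so they would need further cleaning before counting. If each cell really is the text form of a Python list, which is an assumption about the data, ast.literal_eval can parse it directly. The sketch below uses a hypothetical helper, count_genres, and made-up sample rows.

```python
import ast
from collections import Counter


def count_genres(cells):
    """Count every genre across cells that hold list-like strings."""
    counts = Counter()
    for cell in cells:
        # literal_eval safely turns "['Comedy', 'Drama']" into a real list
        counts.update(ast.literal_eval(cell))
    return counts


# Hypothetical sample rows standing in for content2['genre']
sample = ["['Comedy', 'Drama', 'Sci-Fi']", "['Comedy', 'Kids']", "['Sci-Fi']"]
genre_counts = count_genres(sample)
print(genre_counts.most_common())
```

The pandas equivalent would be along the lines of content2['genre'].apply(ast.literal_eval).explode().value_counts().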

Preparing Data-frame for Bokeh Consumption

Trying to plot with Bokeh using a data-frame but plot is displaying empty. Beginner here; missing something fundamental.
My plot works if I hard code some basic X and Y variables so I know the issue has to do with the data-frame I'm trying to use as a source.
...
df = pd.DataFrame(j)
df.columns = ['Team','Type','Date','SLA_MET']
df['SLA_MET']= df['SLA_MET'].round(2)
pd.set_option('display.max_columns', 10)
print(df)
source = ColumnDataSource(df)
p = figure(background_fill_color='gray',
           background_fill_alpha=0.5,
           border_fill_color='blue',
           border_fill_alpha=0.25,
           plot_height=600,
           plot_width=1000,
           x_axis_label='Month',
           x_axis_location='below',
           y_axis_label='% SLA Met',
           y_axis_location='left',
           title='Percentage of SLA Met',
           title_location='above',
           toolbar_location='below',
           tools='save')
p.line(source=source,x='Date',y='SLA_MET')
show(p)
Decided to pass clean lists to plot
for index, row in df.iterrows():
if row[2] =='Service Request':
sr_list.append(row[3])
else:
inc_list.append(row[3])
date_list.append(row[1]) # Only need 1 list of dates
The problem is that the dates are displayed in scientific notation and are not in order.
Bokeh does not know what to do with the strings in your Date column. You have two options:
convert this column to real python/numpy/pandas (numeric) datetime values, and also set x_axis_type="datetime" in your figure call, or
use the string values as categorical factors
It's not clear what your intention is, so I can't recommend one vs the other.
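If the first option is the one you want, a rough sketch could look like the following. The date format "%Y-%m-%d" and the sample values are assumptions, since the real contents of the Date column are not shown, and parse_and_sort and plot_sla are hypothetical helper names. Sorting by date also addresses the out-of-order points mentioned in the question.

```python
from datetime import datetime


def parse_and_sort(dates, values, fmt="%Y-%m-%d"):
    """Convert date strings to datetime objects and sort both lists by date."""
    pairs = sorted(zip((datetime.strptime(d, fmt) for d in dates), values))
    xs, ys = zip(*pairs)
    return list(xs), list(ys)


def plot_sla(dates, values):
    """Draw the SLA line on a datetime x axis."""
    # bokeh is imported lazily; only the plotting step needs it
    from bokeh.plotting import figure, show

    xs, ys = parse_and_sort(dates, values)
    p = figure(x_axis_type="datetime", x_axis_label="Month",
               y_axis_label="% SLA Met", title="Percentage of SLA Met")
    p.line(x=xs, y=ys)
    show(p)


# Hypothetical sample values; the real Date format in the question is unknown
xs, ys = parse_and_sort(["2020-03-01", "2020-01-01", "2020-02-01"],
                        [0.91, 0.85, 0.88])
```

When working directly on the dataframe, pd.to_datetime(df['Date']) followed by df.sort_values('Date') would do the same conversion and ordering before building the ColumnDataSource.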
