Unintended Additional line drawn by Plotly express in Python - python

Plotly draws an extra diagonal line from the start to the endpoint of the original line graph.
Other data, other graphs work fine.
Only this data adds the line.
Why does this happen?
How can I fix this?
Below is the code
temp = pd.DataFrame(df[{KEY_WORD}])
temp['date'] = temp.index
fig=px.line(temp.melt(id_vars="date"), x='date', y='value', color='variable')
fig.show()
plotly.offline.plot(fig,filename='Fig_en1')

Just had the same issue -- try checking for duplicate values on the X axis. I was using the following code:
fig = px.line(df, x="weekofyear", y="interest", color="year")
fig.show()
That created the following plot:
I realised that this was because, in certain years, some of the week numbers for my dates belonged to the previous year's weeks 52/53 and therefore created duplicates, e.g. index 93 and 145 below:
     date        interest  query           year  weekofyear
39   2015-12-20  44        home insurance  2015  51
40   2015-12-27  55        home insurance  2015  52
41   2016-01-03  69        home insurance  2016  53
92   2016-12-25  46        home insurance  2016  51
93   2017-01-01  64        home insurance  2017  52
144  2017-12-24  51        home insurance  2017  51
145  2017-12-31  79        home insurance  2017  52
196  2018-12-23  46        home insurance  2018  51
197  2018-12-30  64        home insurance  2018  52
248  2019-12-22  57        home insurance  2019  51
249  2019-12-29  73        home insurance  2019  52
By amending these (for week numbers that are high for dates in Jan, I subtracted 1 from the year column) I seem to have got rid of the phenomenon:
NB: there may be some other differences between the charts due to the dataset being somewhat fluid.
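The check and the fix described above can be sketched with pandas alone. This is only a minimal sketch, not the code I actually ran: the four sample rows are copied from the table above, and the `duplicated` check and the boolean mask `january_high_week` are just one assumed way of expressing "high week number on a January date".

```python
import pandas as pd

# Four rows copied from the table above (only the relevant columns).
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-12-25", "2017-01-01", "2017-12-24", "2017-12-31"]),
    "interest": [46, 64, 51, 79],
    "year": [2016, 2017, 2017, 2017],
    "weekofyear": [51, 52, 51, 52],
})

# Rows that share an x value within one colour group are what make the line double back.
dupes = df[df.duplicated(subset=["year", "weekofyear"], keep=False)]
print(dupes)

# The manual fix: a high week number on a January date belongs to the
# previous year, so subtract 1 from the year column for those rows.
january_high_week = (df["date"].dt.month == 1) & (df["weekofyear"] >= 52)
df.loc[january_high_week, "year"] -= 1

# Count how many duplicated (year, weekofyear) pairs are left after the fix.
remaining = df.duplicated(subset=["year", "weekofyear"]).sum()
print(remaining)
```

If your pandas version has `Series.dt.isocalendar()` (1.1 or later), it returns the ISO year together with the ISO week, which should avoid the manual adjustment altogether.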

A similar question has been asked and answered in the post How to disable trendline in plotly.express.line?, but in your case I'm pretty sure the problem lies in temp.melt(id_vars="date"), x='date', y='value', color='variable'. It seems you're transforming your data from a wide to a long format. You're using color='variable' without specifying that in temp.melt(id_vars="date"). And when the color specification does not properly correspond to the structure of your dataset, an extra line like yours can occur. Just take a look at this:
Command 1:
fig = px.line(data_frame=df_long, x='Timestamp', y='value', color='stacked_values')
Plot 1:
Command 2:
fig = px.line(data_frame=df_long, x='Timestamp', y='value')
Plot 2:
See the difference? That's why I think there's a mis-specification in your fig=px.line(temp.melt(id_vars="date"), x='date', y='value', color='variable').
So please share your data, or a sample of your data that reproduces the problem, and I'll have a better chance of verifying your problem.
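In the meantime, one way to check whether the color argument matches the reshaped data is to inspect what melt returns before plotting. The sketch below uses a small hypothetical wide frame (the column names `a` and `b` and all values are made up), so it only shows the shape of the long format and does not reproduce your data.

```python
import pandas as pd

# Hypothetical wide frame: one column per series, dates as the index.
wide = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
wide["date"] = wide.index

# With no var_name/value_name given, melt names the new columns 'variable' and 'value'.
long = wide.melt(id_vars="date")
print(long.columns.tolist())

# Rows per (series, date) pair; anything above 1 means a repeated x value within one line.
rows_per_point = long.groupby(["variable", "date"]).size()
print(rows_per_point.max())

# Sorting by series and date keeps each line drawn from left to right.
long = long.sort_values(["variable", "date"])
```

If `rows_per_point.max()` is greater than 1 on the real data, or the dates are not in order within a series, that would be the first thing to look at.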

Related

Missing xticks on chart for matplotlib on Python 3

I am following this section. I realize this code was written for Python 2, but they have xticks showing on the 'Start Date' axis and I do not. My chart shows only the 'Start Date' label, with no dates along the axis.
# Set as_index=False to keep the 0,1,2,... index. Then we'll take the mean of the polls on that day.
poll_df = poll_df.groupby(['Start Date'],as_index=False).mean()
# Let's go ahead and see what this looks like
poll_df.head()
   Start Date  Number of Observations  Obama  Romney  Undecided  Difference
0  2009-03-13  1403                    44     44      12         0.00
1  2009-04-17  686                     50     39      11         0.11
2  2009-05-14  1000                    53     35      12         0.18
3  2009-06-12  638                     48     40      12         0.08
4  2009-07-15  577                     49     40      11         0.09
Great! Now plotting the Difference versus time should be straightforward.
# Plotting the difference in polls between Obama and Romney
fig = poll_df.plot('Start Date','Difference',figsize=(12,4),marker='o',linestyle='-',color='purple')
https://nbviewer.jupyter.org/github/jmportilla/Udemy-notes/blob/master/Data%20Project%20-%20Election%20Analysis.ipynb

Plotting multiple scatter plots of multiple years in Python

I have a dataframe that looks like:
Date Faculty Target Avg
2012-01-01 Arts 80 60
2012-01-01 Science 70 60
2012-02-01 Arts 91 89
2012-02-01 Gym 80 89
.
.
2012-07-01 Arts 83 67
2012-07-01 Science 72 67
2012-08-01 Arts 81 83
2012-08-01 Science 70 83
I want to plot all Faculty on a single scatter plot with each of their respective Target values (Y-Axis) and Avg values (X-Axis).
I'm trying to use (pseudo code) a scatterplot like:
ax1 = data.plot(kind='scatter', x='Avg', y='Target(Arts)', color='r', label='Arts')
ax2 = data.plot(kind='scatter', x='Avg', y='Target(Science)', color='g', ax=ax1, label='Science')
ax3 = data.plot(kind='scatter', x='Avg', y='Target(Gym)', color='b', ax=ax1, label='Gym')
I'd like all Faculties (there are 28 of them total) on the same plot for every Target value (marked by different colors) but there are too many to manually enter with loc (or at least I'd like to avoid this). I can't use iloc to count by index because each number of Faculty counts is different on each date.
Is there a simple way to do this?
You can groupby the Faculty, and iterate over the groups, plotting each one:
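import matplotlib.pyplot as plt
# One scatter call per faculty, so each gets its own colour and legend entry.
g = df.groupby('Faculty')
for faculty, data in g:
    plt.scatter(data['Avg'], data['Target'], label=faculty)
plt.xlabel('Avg')
plt.ylabel('Target')
plt.legend()
plt.show()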

Multiple column plotting Python

I've got data in the form:
Year Month State Value
2001 Jan AK 80
2001 Feb AK 40
2001 Mar AK 60
2001 Jan LA 70
2001 Feb LA 79
2001 Mar LA 69
2001 Jan KS 65
.
.
This data is only for Year 2001 and Months repeat on each State.
I want a basic graph with this data together in one based off the State with X-Axis being Month and Y-Axis being the Value.
When I plot with:
g = df.groupby('State')
for state, data in g:
    plt.plot(df['Month'], df['Value'], label=state)
plt.show()
I get a very wonky looking graph.
I know based off plotting these individually they aren't extremely different in their behaviour but they are not even close to being this much overlapped.
Is there a way of building more of a continuous plot?
Your problem is that inside your for loop you're referencing df, which still has the data for all the states. Try:
for state, data in g:
    # Plot the per-state group `data`, not the full `df`.
    plt.plot(data['Month'], data['Value'], label=state)
plt.legend()
plt.show()
Hopefully this helps!
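For anyone who wants to try this without the original data, here is a self-contained sketch. The AK and LA rows are copied from the question; the `Agg` backend line is only there so the sketch runs without a display, and the `points_per_line` check at the end is an assumed way to confirm that each state gets its own three-point line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# AK and LA rows copied from the question.
df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar"] * 2,
    "State": ["AK"] * 3 + ["LA"] * 3,
    "Value": [80, 40, 60, 70, 79, 69],
})

fig, ax = plt.subplots()
for state, data in df.groupby("State"):
    # `data` holds only this state's rows; plotting `df` here would
    # redraw every state's rows on each pass through the loop.
    ax.plot(data["Month"], data["Value"], label=state)
ax.legend()

# Each line should contain exactly the three months of one state.
points_per_line = [len(line.get_xdata()) for line in ax.lines]
print(points_per_line)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.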

Plotting multiple lines in one graph with pandas and matplotlib, using climate data

I'm trying to create a graph that shows whether or not average temperatures in my city are increasing. I'm using data provided by NOAA and have a DataFrame that looks like this:
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1939-11 53.1 11 1939
5 1939-12 52.5 12 1939
This is saved in a variable called "avgs", and I then use groupby and plot functions like so:
avgs.groupby(["YEAR"]).plot(kind='line',x='MONTH', y='TAVG')
This produces a line graph (see below for example) for each year that shows the average temperature for each month. That's great stuff, but I'd like to be able to put all of the yearly line graphs into one graph, for the purposes of visual comparison (to see if the monthly averages are increasing).
Example output
I'm a total noob with matplotlib and pandas, so I don't know the best way to do this. Am I going wrong somewhere and just don't realize it? And if I'm on the right track, where should I go from here?
Very similar to the other answer (by Anake), but here you get control over the legend (in the other answer, the legend entries for all years will be "TAVG"). I added entries for a new year to your data just to show this.
import io
import pandas as pd
import matplotlib.pyplot as plt
raw = '''
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1940-11 53.1 11 1940
5 1940-12 52.5 12 1940
'''
# The header has one fewer field than the rows, so pandas uses the first column as the index.
avgs = pd.read_csv(io.StringIO(raw), sep=r'\s+')
ax = plt.subplot()
for key, group in avgs.groupby("YEAR"):
    # One line per year, labelled with the year itself.
    ax.plot(group.MONTH, group.TAVG, label=key)
ax.set_xlabel('Month')
ax.set_ylabel('TAVG')
plt.legend()
plt.show()
will result in
You can do:
ax = None
for group in df.groupby("YEAR"):
    ax = group[1].plot(x="MONTH", y="TAVG", ax=ax)
plt.show()
Each plot() returns the matplotlib Axes instance where it drew the plot. So by feeding that back in each time, you can repeatedly draw on the same set of axes.
I don't think you can do that directly in the functional style as you have tried unfortunately.
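Here is a self-contained sketch of that Axes-reuse pattern, using the sample rows from the other answer. Passing `label=str(year)` is my own addition (an assumption, not part of the snippet above) so that the legend shows the year instead of "TAVG", and the `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Sample rows from the other answer (two years, so two lines appear).
avgs = pd.DataFrame({
    "MONTH": [7, 8, 9, 10, 11, 12],
    "TAVG": [86.0, 84.8, 82.2, 68.0, 53.1, 52.5],
    "YEAR": [1939, 1939, 1939, 1939, 1940, 1940],
})

ax = None
for year, group in avgs.groupby("YEAR"):
    # On the first pass ax is None, so pandas creates a new Axes;
    # on later passes it draws on the Axes that was passed back in.
    ax = group.plot(x="MONTH", y="TAVG", ax=ax, label=str(year))

# Both years end up as lines on the same Axes.
print(len(ax.lines))
```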

Matplotlib Plot confusion

I'm not sure what is going on here but I have two seemingly similar bits of code designed to produce graphs in the same format:
apple_fcount = apple_f1.groupby("Year")["Domain Category"].nunique("count")
plt.figure(1); apple_fcount.plot(figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Number of Fungicides used')
plt.title('Number of fungicides used on apples in the US')
plt.savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/Apple/apple fcount')
This one produces the graph how I would like it to be seen; y axis shows number of fungicides, x axis shows the respective years. However, the following code, on a different dataset prints a usable graph but the x axis shows years as '1, 2, 3, ...' instead of the actual years.
apple_yplot = apple_y1.groupby('Year')['Value'].sum()
plt.figure(3); apple_yplot.plot(figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Yield / lb per acre')
plt.title('Graph of apple yield in the US over time')
plt.savefig('C:/Users/User/Documents/Work/Year 3/Project/Plots/Apple/Yield.png')
The only discernible difference I see in the code is that the first counts .nunique() datapoints, whilst the second is a .sum() of all data in the year. I can't imagine that's the reason behind this issue though. Both .groupby() lines print in the same format, with year being correctly displayed there.
apple_fcount:
Year
1991 19
1993 19
1995 21
1997 26
1999 28
2001 27
2003 31
2005 37
2007 30
2009 35
2011 32
Name: Domain Category, dtype: int64
apple_yplot:
Year
2007 405399
2008 541180
2009 483130
2010 473150
2011 468120
2012 417710
2013 529470
2014 510700
Name: Value, dtype: float64
