Highlight specific points in matplotlib scatterplot - python

I have a CSV with 12 columns of data. I'm focusing on these 4 columns.
Right now I've plotted "Pass def" against "Rush def". I want to be able to highlight specific points on the scatter plot. For example, I want to highlight the 1995 DAL point and color it yellow.
I've started with a for loop, but I'm not sure where to go from there. Any help would be great.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import csv
import random
df = pd.read_csv('teamdef.csv')
x = df["Pass Def."]
y = df["Rush Def."]
z = df["Season"]
points = []
for point in df["Season"]:
    if point == 2015.0:
        print(point)
plt.figure(figsize=(19,10))
plt.scatter(x,y,facecolors='black',alpha=.55, s=100)
plt.xlim(-.6,.55)
plt.ylim(-.4,.25)
plt.xlabel("Pass DVOA")
plt.ylabel("Rush DVOA")
plt.title("Pass v. Rush DVOA")
plt.show()

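You can layer multiple scatters, so the easiest way is probably to plot every point first and then plot the points you want to highlight again on top, selected with a boolean mask:
plt.scatter(x,y,facecolors='black',alpha=.55, s=100)
highlight = df["Season"] == 2015.0
plt.scatter(x[highlight], y[highlight], color="yellow", s=100)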
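To highlight one team in one season, such as the 1995 DAL point from the question, the rows can be selected with a boolean mask and drawn in a second scatter call on top of the first. A minimal sketch with made-up data follows; the "Team" column name is an assumption, since the question doesn't show what the CSV calls that column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for teamdef.csv; the "Team" column name is an assumption.
df = pd.DataFrame({
    "Season": [1995, 1995, 2015, 2015],
    "Team": ["DAL", "SF", "DAL", "DEN"],
    "Pass Def.": [-0.10, -0.25, 0.05, -0.30],
    "Rush Def.": [-0.05, -0.15, 0.02, -0.20],
})

# Boolean mask that is True only for the rows to highlight.
mask = (df["Season"] == 1995) & (df["Team"] == "DAL")

fig, ax = plt.subplots()
# Draw all other points first, then the highlighted ones on top.
ax.scatter(df.loc[~mask, "Pass Def."], df.loc[~mask, "Rush Def."],
           color="black", alpha=0.55, s=100)
ax.scatter(df.loc[mask, "Pass Def."], df.loc[mask, "Rush Def."],
           color="yellow", edgecolors="black", s=100, label="1995 DAL")
ax.set_xlabel("Pass DVOA")
ax.set_ylabel("Rush DVOA")
ax.legend()
```

Call plt.show() afterwards to display the figure. The black edge on the yellow marker is only there to keep it visible on a white background.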

Related

How to get lines in matplotlib plot to continue past x-axis limit (python)

My goal is to plot lines that represent certain days of the year. Note: I do not want to use a package like date time -- I am just trying to do this with the data points as integer values. So in the plot below, the event at y=3 lasts between day 123 to 189 and event at y=2 lasts between days 214 and 365.
Where I run into problems is with event at y=1, which should go from day 205 to 22 (that's what's in the data dataframe). However, the plot does not know that I am plotting days of the year (obviously) and so it stretches from day 0 to 205, which is wrong. Instead, it should start at 205, stretch to the right and then end at the value of 22. I've hand drawn in blue what it should look like.
example figure
Help?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.DataFrame(np.array([[3,123,189],[2,214,365],[1,205,22]]), columns=['name','start','end'])
plt.hlines(data['name'],data['start'],data['end'],linewidth=1)
plt.xlabel('Date')
plt.ylabel('Event')
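How about this: split every event whose start is later than its end into two lines, one from the start to day 365 and one from day 0 to the end.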
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.DataFrame(np.array([[3,123,189],[2,214,365],[1,205,22]]), columns=['name','start','end'])
endstart = (data.index[data['start'] > data['end']].tolist())
startend = ([i for i in range(len(data)) if i not in endstart])
fig, ax = plt.subplots()
# plot startend
ax.hlines(data['name'][startend],data['start'][startend],data['end'][startend],linewidth=1)
# plot endstart
ax.hlines(data['name'][endstart],data['start'][endstart],365,linewidth=1)
ax.hlines(data['name'][endstart],0,data['end'][endstart],linewidth=1)
ax.set_xlabel('Date')
ax.set_ylabel('Event')
Output:
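The same idea can be packed into a small helper that turns each event into one or two segments before plotting. This is only a sketch: the function name split_wrapping and the choice of day 1 to day 365 as the year boundaries are assumptions, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

def split_wrapping(name, start, end, first_day=1, last_day=365):
    """Return (name, start, end) segments; split in two if the event wraps past year end."""
    if start <= end:
        return [(name, start, end)]
    return [(name, start, last_day), (name, first_day, end)]

data = pd.DataFrame([[3, 123, 189], [2, 214, 365], [1, 205, 22]],
                    columns=["name", "start", "end"])

# Flatten every event into its one or two plottable segments.
segments = [seg
            for name, start, end in data[["name", "start", "end"]].values.tolist()
            for seg in split_wrapping(name, start, end)]

names, starts, ends = zip(*segments)
fig, ax = plt.subplots()
ax.hlines(names, starts, ends, linewidth=1)
ax.set_xlim(1, 365)
ax.set_xlabel("Date")
ax.set_ylabel("Event")
```

Because both halves of a wrapped event share the same y value, they read as one event that leaves the plot on the right and re-enters on the left.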

What can I do to make this matplotlib .bar contribution-chart-code in python work?

I am trying to plot a financial contribution analysis, one bar chart with two vertical bars, one representing a portfolio's gain contributed by equity and the other by Fixed Income (bonds) over a certain time period.
%matplotlib inline
import numpy as np
import pandas as pd
import itertools
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.lines as mlines
import matplotlib.dates as mdates
import matplotlib.font_manager as fm
import matplotlib.patches as mpatches
import matplotlib.transforms as mtrans
d = {'col1': [0.006269, 0.003842, 0.002237, 0.001448, 0.000752, 0.000166]}
equity = pd.DataFrame(data=d, index=['Aktien Nordamerika', 'Gold', 'Aktien Flexibel', 'Aktien Europa', 'Aktien Schwellenlaender', 'Aktien Pazifik'])
d2 = {'col1': [0.009533, 0.003879, 0.001926, 0.000714]}
bonds = pd.DataFrame(data=d2, index=['Anleihen Investmentgrade', 'Hedgefonds', 'Hochzinsanleihen', 'Anleihen Schwellenlaender'])
The vertical bars starting at the x-axis should in sum represent the overall gain from ['equity', 'bonds'], but each bar should be divided into segments by the index defined in its DataFrame.
I tried to do that by using the iterator building block itertools.zip_longest to assign the values of the incrementally gaining bar chart and then the color for each section.
fig, ax = plt.subplots()
fig.set_size_inches(4, 4.3)
bar_width = 0.26
x_values = np.array([0, 1.2])
x_pos = [list(x_values)]*2 + [x_values[0]]*4
pl = [(p1,p2) if p2 is not None else p1 for p1, p2 in itertools.zip_longest(list(equity['col1'].values), list(bonds['col1'].values))]
colors = np.array((
['#FF7600', '#A9A9A9', '#1778A6', '#146189', '#5794B9', '#B0D2E7'],
['#004232', '#3AC2A0', '#007558', '#2A8C74']))
colormap = [(c1,c2) if c2 is not None else c1 for c1, c2 in itertools.zip_longest(colors[0],colors[1])]
Then I would just use matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs) to create the desired bar chart:
for x, p, c in zip(x_pos, pl, colormap):
    ls_sub_aktien = plt.bar(x, p, align='edge', width=bar_width, linewidth=0.0001, color=c)
But the output does not split the ['bonds'] bar into all four of its segments; only two of the row items are colored as defined.
The output I got, with the 4-color split missing on the right bar:
Also, the left bar appears to include a color that should actually be in the bar on the right-hand side.
It would be amazing if you have had a similar issue and remember how you solved it, or if you have suggestion on how I could create this chart using a different approach.
Sorry for my English and if anything is unclear let me know. Try it out, please. I'm wondering if you can solve it, thank you!
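One possible approach, sketched here rather than offered as a definitive fix, is to skip zip_longest and stack each bar separately with the bottom argument of bar. In the code above, zip_longest yields four (equity, bond) pairs followed by two single equity values, while x_pos starts with only two two-position entries, so positions and heights no longer line up; in addition, no bottom is passed, so every segment starts at zero and overlaps the others. The helper name stacked_bar and the tick labels are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

equity = pd.Series(
    [0.006269, 0.003842, 0.002237, 0.001448, 0.000752, 0.000166],
    index=["Aktien Nordamerika", "Gold", "Aktien Flexibel", "Aktien Europa",
           "Aktien Schwellenlaender", "Aktien Pazifik"])
bonds = pd.Series(
    [0.009533, 0.003879, 0.001926, 0.000714],
    index=["Anleihen Investmentgrade", "Hedgefonds", "Hochzinsanleihen",
           "Anleihen Schwellenlaender"])
equity_colors = ['#FF7600', '#A9A9A9', '#1778A6', '#146189', '#5794B9', '#B0D2E7']
bond_colors = ['#004232', '#3AC2A0', '#007558', '#2A8C74']

def stacked_bar(ax, x, values, colors, width=0.26):
    """Stack one segment per value at position x and return the total height."""
    bottom = 0.0
    for label, height, color in zip(values.index, values.to_numpy(), colors):
        # Each segment starts where the previous one ended.
        ax.bar(x, height, width=width, bottom=bottom, color=color, label=label)
        bottom += height
    return bottom

fig, ax = plt.subplots(figsize=(4, 4.3))
equity_total = stacked_bar(ax, 0.0, equity, equity_colors)
bonds_total = stacked_bar(ax, 1.2, bonds, bond_colors)
ax.set_xticks([0.0, 1.2])
ax.set_xticklabels(["Equity", "Fixed Income"])
```

Keeping the two series separate avoids having to pad the shorter one, and the running bottom makes each bar's total equal to the sum of its contributions.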

Y-axis values not displaying correctly in double bar graph

I don't know why the y-axis values are not displayed correctly. Please check this. Thanks in advance.
Code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
def dbargraph():
    # raw string so the backslashes in the Windows path are taken literally
    df=pd.read_csv(r'C:\Users\Bhuwan Bhatt\Desktop\IP PROJECT\olymp_data.csv',encoding='cp1252')
    df.sort_values(by='TotalMedal',ascending=False,inplace=True)
    df1=df.head(n=10)
    x=np.arange(len(df1))
    Countries=df1['Country']
    Summermedal=df1['SummerTotal']
    Wintermedal=df1['WinterTotal']
    plt.bar(x-0.2,Summermedal,label='Total Medals won by Top 10 Countries IN SUMMER',width=0.4,color='orangered')
    plt.bar(x+0.2,Wintermedal,label='Total Medals won by Top 10 Countries IN WINTER',width=0.4,color='grey')
    plt.xticks(x,Countries,rotation=20)
    plt.title('Olympic Medal Analysis by Top 10 Countries',color='navy',fontsize=16)
    plt.xlabel('Countries~~~~>',fontsize=12,color='r')
    plt.ylabel('No. of Medals~~~~>',fontsize=12,color='r')
    plt.grid()
    plt.legend()
    plt.show()
dbargraph()
Problem: the y-axis values are not displayed properly.
Did you try setting the ticks manually with set_yticks()?
I'm not sure, but you may need to create the axes first, i.e.
axes = plt.axes()
and then specify the range of the x and y ticks. That may help.
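Another thing worth checking, and this is only a guess because the CSV isn't shown: if the medal columns were read as text (for example because of thousands separators), matplotlib treats each value as a category label and the y-axis ticks come out unsorted. The sketch below uses made-up rows and a hypothetical helper to_number to convert such columns before plotting.

```python
import pandas as pd

# Made-up rows standing in for olymp_data.csv; values are stored as text on purpose.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "SummerTotal": ["2,400", "1,010", "780"],
    "WinterTotal": ["282", "194", "0"],
})

def to_number(column):
    """Strip thousands separators and convert text to numbers (NaN if unparsable)."""
    cleaned = column.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")

for col in ["SummerTotal", "WinterTotal"]:
    df[col] = to_number(df[col])

print(df.dtypes)
```

If print(df.dtypes) already shows numeric types for the real file, this is not the cause. When separators are the issue, passing thousands="," to pd.read_csv may be a simpler alternative.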

How to plot coordinates (1,2) against time (0.5) in python

I am trying to plot vehicle position (x, y coordinates) against time (1 s, 2 s, 3 s, ...). I tried with matplotlib but did not succeed. I am new to Python. Could anyone help me, please?
My code:
import matplotlib.pyplot as plt
import numpy as np
coordinate = [[524.447876,1399.091919], [525.1377563,1399.95105], [525.7932739,1400.767578], [526.4627686,1401.601563],
[527.2360229,1402.564575], [527.8989258,1403.390381], [528.5689697,1404.224854]]
timestamp =[0,0.05,0.1,0.15,0.2,0.25,0.3]
plt.plot(coordinate,timestamp)
The plot comes out like this, but it is wrong:
What I expect is that at each timestamp (e.g. 1 s) the vehicle position is (x, y), so there should be a single line, like the vehicle's trajectory.
Thanks.
I believe this is the output you're looking for:
import matplotlib.pyplot as plt
import numpy as np
coordinate = [[524.447876,1399.091919],
[525.1377563,1399.95105],
[525.7932739,1400.767578],
[526.4627686,1401.601563],
[527.2360229,1402.564575],
[527.8989258,1403.390381],
[528.5689697,1404.224854]]
v1 = [y[1] for y in coordinate]
v2 = [y[0] for y in coordinate]
x = [0,0.05,0.1,0.15,0.2,0.25,0.3]
plt.plot(x,v1)
plt.plot(x,v2,'--')
plt.ylim(0,1500)
plt.show()
Does something simple like this meet your needs:
import matplotlib.pyplot as plt
coordinates = [
(524.447876,1399.091919),
(525.1377563,1399.95105),
(525.7932739,1400.767578),
(526.4627686,1401.601563),
(527.2360229,1402.564575),
(527.8989258,1403.390381),
(528.5689697,1404.224854),
]
timestamp = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]
x, y = zip(*coordinates)
ax = plt.axes(projection="3d")
ax.plot(x, y, timestamp);
plt.show()
Matplotlib will let you rotate the image with the mouse to view it from various angles.
Hi, I think the problem here is that you are using a two-dimensional list, so matplotlib plots the coordinates and not the timestamp.
Code:
import matplotlib.pyplot as plt
import numpy as np
coordinate = np.array([[524.447876,1399.091919], [525.1377563,1399.95105], [525.7932739,1400.767578], [526.4627686,1401.601563], [527.2360229,1402.564575], [527.8989258,1403.390381], [528.5689697,1404.224854]])
timestamp =np.array([0,0.05,0.1,0.15,0.2,0.25,0.3])
plt.plot(coordinate)
Output:
You have to convert it into a single dimension list like this:
coordinate_new = np.array([524.447876,525.1377563,1399.95105, 525.7932739,1400.767578, 526.4627686,1401.601563])
timestamp =np.array([0,0.05,0.1,0.15,0.2,0.25,0.3])
plt.plot(coordinate_new, timestamp)
Then the output will be:
Hope I could help!!
If you want to plot it in 3-d, here is what you can do:
import matplotlib.pyplot as plt
# importing matplotlib
fig = plt.figure()  # adding a figure
ax_3d = plt.axes(projection="3d")  # adding 3-d axes
coordinate_x = [524.447876, 525.137756, 525.7932739, 526.4627686, 527.2360229, 527.8989258, 528.5689697]
coordinate_y = [1399.091919, 1399.95105,1400.767578,1401.601563,1402.564575,1403.390381,1404.224854]
timestamp =[0,0.05,0.1,0.15,0.2,0.25,0.3]
# defining the variables
ax_3d.plot(coordinate_x, coordinate_y, timestamp)  # use the 3-d axes created above
# plotting them
Output:
All the Best!
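If a flat 2-D view is preferred, another option is to draw the trajectory in the x-y plane and encode time as color. This is a sketch of that alternative, not something taken from the answers above; the axis labels and colormap are arbitrary choices.

```python
import matplotlib.pyplot as plt

coordinates = [
    (524.447876, 1399.091919),
    (525.1377563, 1399.95105),
    (525.7932739, 1400.767578),
    (526.4627686, 1401.601563),
    (527.2360229, 1402.564575),
    (527.8989258, 1403.390381),
    (528.5689697, 1404.224854),
]
timestamps = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]

# Split the (x, y) pairs into two parallel sequences.
xs, ys = zip(*coordinates)

fig, ax = plt.subplots()
ax.plot(xs, ys, color="lightgray", zorder=1)            # the path itself
points = ax.scatter(xs, ys, c=timestamps, cmap="viridis", zorder=2)  # time as color
fig.colorbar(points, ax=ax, label="time (s)")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Call plt.show() to display it. The colorbar then tells you when the vehicle was at each point along the line.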

How to make horizontal linechart with categorical variables and timeseries?

I want to replicate plots from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5000555/pdf/nihms774453.pdf I'm particularly interested in the plot on page 16, right panel. I tried to do this in matplotlib, but it seems to me that there is no way to access individual lines in a LineCollection.
I don't know how to change the color of each line according to the value at every index. I'd like to eventually get something like this: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/multicolored_line.html but for every line, according to the data.
This is what I tried:
the data in numpy array: https://pastebin.com/B1wJu9Nd
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib import colors as mcolors
%matplotlib inline
base_range = np.arange(qq.index.max()+1)
fig, ax = plt.subplots(figsize=(12,8))
ax.set_xlim(qq.index.min(), qq.index.max())
# ax.set_ylim(qq.columns[0], qq.columns[-1])
ax.set_ylim(-5, len(qq.columns) +5)
line_segments = LineCollection([np.column_stack([base_range, [y]*len(qq.index)]) for y in range(len(qq.columns))],
cmap='viridis',
linewidths=(5),
linestyles='solid',
)
line_segments.set_array(base_range)
ax.add_collection(line_segments)
axcb = fig.colorbar(line_segments)
plt.show()
my result:
what I want to achieve:
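One way this might be done, sketched here with random stand-in data because the pastebin array isn't reproduced, is to cut every horizontal line into one short segment per time step and pass one data value per segment to set_array. The array name values and its layout (rows are time steps, columns are categories) are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# Random stand-in for the real data: rows are time steps, columns are categories.
rng = np.random.default_rng(0)
values = rng.random((50, 4))
n_times, n_cats = values.shape
t = np.arange(n_times)

segments = []   # one short horizontal segment per (category, time step)
colors = []     # the data value that colors each segment
for row in range(n_cats):
    for i in range(n_times - 1):
        segments.append([(t[i], row), (t[i + 1], row)])
        colors.append(values[i, row])

lc = LineCollection(segments, cmap="viridis", linewidths=5)
lc.set_array(np.asarray(colors))   # one scalar per segment drives the colormap

fig, ax = plt.subplots(figsize=(12, 8))
ax.add_collection(lc)
ax.set_xlim(t.min(), t.max())
ax.set_ylim(-1, n_cats)
ax.set_yticks(range(n_cats))
fig.colorbar(lc, ax=ax)
```

In the attempt above, each of the collection's lines is one whole row and set_array receives base_range, so the colors follow the x positions rather than the data. Splitting the rows into per-step segments gives every piece its own color value.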
