I've created a dataframe with stock information. When I go to create a scatter plot and annotate labels, not all of the labels are included. I'm only getting 3 labels out of 50 or so points. I can't figure out why it's not plotting all labels.
My Table:
Dividend ExpenseRatio Net_Assets PriceEarnings PriceSales
Ticker
FHLC 0.0128 0.08 6.056 22.95 1.78
ONEQ 0.0083 0.21 6.284 20.24 2.26
FTEC 0.0143 0.08 3.909 20.83 2.73
FDIS 0.0144 0.08 2.227 20.17 1.36
FENY 0.0262 0.08 4.386 25.97 1.34
My plotting code:
for ticker,row in df.iterrows():
    plt.scatter(row['PriceSales'], row['PriceEarnings'], c = np.random.rand(3,1), s = 300)
for i, txt in enumerate(ticker):
    plt.annotate(df.index[i-1], (df.PriceSales[i-1], df.PriceEarnings[i-1]))
plt.xlabel('PriceSales')
plt.ylabel('PriceEarnings')
plt.show()
My graph image:
Once the first loop has finished, ticker holds the ticker of the last row only; e.g., "FENY" in your sample. When you call enumerate(ticker), it yields one item per character of that string, so it sounds like the last ticker in your full data has 3 characters, which is why you get only 3 labels.
I think you can annotate points in the same loop as the scatter plot:
for ticker,row in df.iterrows():
    plt.scatter(row['PriceSales'], row['PriceEarnings'], c = np.random.rand(3,1), s = 300)
    plt.annotate(ticker, (row['PriceSales'], row['PriceEarnings']))
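As a minimal sketch of the same idea, the version below draws all points with a single scatter call and then adds one annotation per row. It uses the five sample rows from the question; passing one random RGB color per row (the colors array) is my own substitution for the per-point np.random.rand(3,1), which may not be accepted by every matplotlib version.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The five sample rows from the question (only the two plotted columns)
df = pd.DataFrame(
    {
        "PriceEarnings": [22.95, 20.24, 20.83, 20.17, 25.97],
        "PriceSales": [1.78, 2.26, 2.73, 1.36, 1.34],
    },
    index=pd.Index(["FHLC", "ONEQ", "FTEC", "FDIS", "FENY"], name="Ticker"),
)

fig, ax = plt.subplots()

# One random RGB color per row (assumption: any per-point coloring will do)
rng = np.random.default_rng(0)
colors = rng.random((len(df), 3))
ax.scatter(df["PriceSales"], df["PriceEarnings"], c=colors, s=300)

# One label per row: the index value is the ticker
for ticker, row in df.iterrows():
    ax.annotate(ticker, (row["PriceSales"], row["PriceEarnings"]))

ax.set_xlabel("PriceSales")
ax.set_ylabel("PriceEarnings")
```

Call plt.show() afterwards to display the figure. Every row gets a label because the annotation loop runs over the rows of df, not over the characters of one ticker.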
I'm not very confident with Python; most of the time I get code from the web (frequently from here). I have a dataframe that I read with pandas and I just can't plot it the way I want (I can't plot it at all, by the way).
Here is a sample of my df:
date 00:00 00:30 01:00 01:30 02:00 02:30
16/03/2021 0.518 0.679 0.516 0.633 0.606 0.775
17/03/2021 0.751 0.956 0.823 0.975 1.5 0.73
18/03/2021 0.925 0.733 0.825 1.489 0.762 0.686
19/03/2021 0.834 0.726 0.887 0.712 0.769 0.713
20/03/2021 0.735 0.799 0.732 0.803 0.629 0.811
21/03/2021 0.61 0.425 0.645 0.518 0.451 0.629
22/03/2021 0.401 0.525 0.472 0.518 0.508 0.432
As you can see, the very first column holds the dates and the very first row holds the times of day (in steps of 30 minutes). The original file covers all 24 hours and is separated by spaces (" ").
I want to plot one curve per day with, for example, 3 days per plot (I'll make some subplots; I have done that before, so it should be OK...).
Here is the beginning of my code:
import pandas as pd
df = pd.read_csv("file.csv", sep=" ",on_bad_lines='skip',usecols=range(48))
Yeah, that's not much, I know.
I have since spent a lot of time learning Python, and I now have a solution for this post, which I had forgotten about. In case someone needs it some day, here is code that reads the dataframe, takes it one row (day) at a time, and plots each row in its own subplot (all the data can easily be put on the same plot instead):
import pandas as pd
import matplotlib.pyplot as plt

# Read the file; the first column (the dates) becomes the index
df_all = pd.read_csv('df_test.csv', index_col=0)

n_days = 6  # number of rows (days) to plot
fig, ax = plt.subplots(n_days, 1, sharex=True, sharey=True, figsize=(18, 15), dpi=200)

# One subplot per day: x = times of day (column labels), y = that day's values
for i in range(n_days):
    day = df_all.iloc[i]
    ax[i].plot(day.index, day.values)
plt.show()
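For the "3 days per plot" variant mentioned in the question, a short sketch follows. It assumes the file is whitespace-separated with the dates in the first column, as in the sample, and reads the first three sample rows from a string instead of from the real file. Transposing puts the times of day on the x axis, so each date becomes one line.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the sample from the question, as whitespace-separated text
raw = """date 00:00 00:30 01:00 01:30 02:00 02:30
16/03/2021 0.518 0.679 0.516 0.633 0.606 0.775
17/03/2021 0.751 0.956 0.823 0.975 1.5 0.73
18/03/2021 0.925 0.733 0.825 1.489 0.762 0.686
"""

# sep=r"\s+" splits on any run of whitespace; the dates become the index
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="date")

fig, ax = plt.subplots()

# After transposing, rows are times of day and columns are dates,
# so DataFrame.plot draws one line per date on the same axes
df.iloc[:3].T.plot(ax=ax)
ax.set_xlabel("time of day")
```

To cover more days, the same call can be repeated on further slices (df.iloc[3:6], and so on), each with its own axes from plt.subplots.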
I am trying to use Seaborn to plot a simple bar plot using data that was transformed. The data started out looking like this (text follows):
element 1 2 3 4 5 6 7 8 9 10 11 12
C 95.6 95.81 96.1 95.89 97.92 96.71 96.1 96.38 96.09 97.12 95.12 95.97
N 1.9 1.55 1.59 1.66 0.53 1.22 1.57 1.63 1.82 0.83 2.37 2.13
O 2.31 2.4 2.14 2.25 1.36 1.89 2.23 1.8 1.93 1.89 2.3 1.71
Co 0.18 0.21 0.16 0.17 0.01 0.03 0.13 0.01 0.02 0.01 0.14 0.01
Zn 0.01 0.03 0.02 0.03 0.18 0.14 0.07 0.17 0.14 0.16 0.07 0.18
and after importing using:
df1 = pd.read_csv(r"C:\path.txt", sep='\t',header = 0, usecols=[0, 1, 2,3,4,5,6,7,8,9,10,11,12], index_col='element').transpose()
display(df1)
When I plot the values of an element versus the first column (which represents an observation), the first column of data corresponding to 'C' is used instead. What am I doing wrong and how can I fix it?
I also tried importing, then pivoting the dataframe, which resulted in an undesired shape that repeated the element set as columns 12 times.
ax = sns.barplot(x=df1.iloc[:,0], y='Zn', data=df1)
Edited to add: I am not married to any particular package or technique. I just want to be able to use my data to build a bar plot with 1-12 on the x axis and elemental compositions on the y axis.
You have several possibilities here. The problem is that the observation numbers (shown under 'element') are the index of your dataframe, not a column, so x=df1.iloc[:,0] is the 'C' column.
1) Use the index directly:
ax = sns.barplot(x=df1.index, y='Zn', data=df1)
2) Or turn the index into a regular column first:
df1.reset_index(inplace=True) # now the former index is the first column of df1
ax = sns.barplot(x=df1.iloc[:,0], y='Zn', data=df1)
# equivalent to
ax = sns.barplot(x=df1.columns[0], y='Zn', data=df1)
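To see why the first positional column is 'C', it can help to inspect the transposed frame. The sketch below rebuilds a small part of the table from the question (three observations, two elements) and moves the index into a named column. The name "observation" is my own choice for illustration, not something from the original data.

```python
import io

import pandas as pd

# A small tab-separated excerpt of the table from the question
raw = "element\t1\t2\t3\nC\t95.6\t95.81\t96.1\nZn\t0.01\t0.03\t0.02\n"

df1 = pd.read_csv(io.StringIO(raw), sep="\t", header=0, index_col="element").transpose()

# After the transpose, the observation labels are the index
# and the elements are the columns, so column 0 is 'C'
first_column = df1.columns[0]

# Give the index a name (hypothetical: "observation") and make it a column
tidy = df1.rename_axis("observation").reset_index()
```

With tidy in hand, a call such as sns.barplot(x="observation", y="Zn", data=tidy) refers to both axes by column name.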
I need to plot a chart from a multi-indexed pivot table. This is how the pivot table is built: multi_index = pd.pivot_table(df_new, index = ['Device_ID', 'Temp' ,'Supply'],columns = 'Frequency', values = 'NoiseLevel',)
I used Plotly, but the result comes out as a single straight line. I am expecting two zig-zag lines, one for frequency 0.8 and the other for 1.6, as shown in the first figure. Could you please tell me where I went wrong? My code is below. I don't know where to put columns = 'Frequency'; I think it needs to go on the y axis.
My data frame (the pivot table) is below:
Frequency 0.8 1.6
Device_ID Temp Supply
FF_2649 -40.0 1.65 -100.72 -101.35
1.71 -100.61 -101.74
1.80 -100.74 -101.64
1.89 -100.63 -101.69
3.60 -100.60 -101.46
... ... ...
TT_2441 85.0 1.65 -94.99 -94.97
1.71 -94.85 -95.24
1.80 -95.02 -94.97
1.89 -94.69 -96.20
3.60 -94.90 -94.91
data=[go.Scatter(
    x=multi_index.index,
    y=multi_index.values,
    mode='lines',
    name='Noise Level'
)]
layout=go.Layout(title='Noise Level')
figure=go.Figure(data=data,layout=layout)
pyo.plot(figure)
plotly does not directly support multi-index
concat values in multi-index to a string that identifies it
generate a plotly scatter per column
import io
import pandas as pd
import plotly.graph_objects as go

# rebuild the pivot table from text; "-" stands for a repeated index value
df = pd.read_csv(io.StringIO("""Device_ID Temp Supply 0.8 1.6
FF_2649 -40.0 1.65 -100.72 -101.35
- - 1.71 -100.61 -101.74
- - 1.80 -100.74 -101.64
- - 1.89 -100.63 -101.69
- - 3.60 -100.60 -101.46
TT_2441 85.0 1.65 -94.99 -94.97
- - 1.71 -94.85 -95.24
- - 1.80 -95.02 -94.97
- - 1.89 -94.69 -96.20
- - 3.60 -94.90 -94.91"""), sep=r"\s+", na_values="-").ffill().set_index(["Device_ID","Temp","Supply"])
# generate a line for each column of the dataframe. concat values of multi-index to make it work with plotly
data = [go.Scatter(x=pd.Series(df.index.values).apply(lambda i: " ".join(map(str, i))), y=df[c],
                   mode='lines', name=f'Noise Level {c}')
        for c in df.columns]
layout=go.Layout(title='Noise Level')
figure=go.Figure(data=data,layout=layout)
figure
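The same flattening can be applied directly to a pivot table built with pd.pivot_table, as in the question. The sketch below is pandas-only and stops short of the Plotly call. The long-format frame df_new is reconstructed from a few rows of the table shown in the question, and the names labels and traces are hypothetical helpers of mine.

```python
import pandas as pd

# Long-format rows reconstructed from the pivot table in the question
df_new = pd.DataFrame({
    "Device_ID": ["FF_2649"] * 4 + ["TT_2441"] * 4,
    "Temp": [-40.0] * 4 + [85.0] * 4,
    "Supply": [1.65, 1.65, 1.71, 1.71] * 2,
    "Frequency": [0.8, 1.6] * 4,
    "NoiseLevel": [-100.72, -101.35, -100.61, -101.74,
                   -94.99, -94.97, -94.85, -95.24],
})

multi_index = pd.pivot_table(
    df_new,
    index=["Device_ID", "Temp", "Supply"],
    columns="Frequency",
    values="NoiseLevel",
)

# Join each index tuple into one string so it can serve as an x value
labels = [" ".join(map(str, key)) for key in multi_index.index]

# One y series per Frequency column (0.8 and 1.6)
traces = {freq: multi_index[freq].tolist() for freq in multi_index.columns}
```

Each entry of traces can then be passed as y, with labels as x, to its own go.Scatter, which gives one line per frequency.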
I'm trying to load data from an Excel sheet and plot it all on the same plot, but I'm a little inexperienced with plotting multiple lines on a single plot. The columns come in pairs, Time Elapsed and the corresponding Residual Act., and several columns have the same name. The times differ between pairs, hence the multiple time columns. Right now the code just outputs 4 separate plots. Can someone tell me how to do this without overcomplicating things? I will have to plot multiple files in the future and would like an easy way.
import pandas as pd
import matplotlib.pyplot as plt
HD = pd.read_excel('C:\\Users\\azizl\\Desktop\\HDPD_Data.xlsx')
HD.plot(x='Time Elapsed', y= 'Residual Act.' , label='24')
HD.plot(x='Time Elapsed.1', y= 'Residual Act..1', label='48')
HD.plot(x='Time Elapsed.2', y= 'Residual Act..2', label='normal')
HD.plot(x='Time Elapsed.3', y= 'Residual Act..3', label='physical')
plt.show()
HD.head()
Assuming that you have already read HD, try the following code to generate your plot:
import numpy as np
import matplotlib.pyplot as plt
labels = ['24', '48', 'normal', 'physical']
for slc in np.split(HD.values, HD.shape[1] // 2, axis=1):
    plt.plot(slc[:, 0], slc[:, 1], 'o-')
plt.xlabel('Time elapsed')
plt.ylabel('Residual Act.')
plt.legend(labels)
plt.show()
The idea is to:
Divide the source DataFrame into 2-column slices (np.split).
Plot each slice, with x and y data from the current
slice (plt.plot).
Create the legend from an external list.
To test the above code I created the following DataFrame:
Time Elapsed Residual Act. Time Elapsed.1 Residual Act..1 Time Elapsed.2 Residual Act..2 Time Elapsed.3 Residual Act..3
0 1.00 4.15 1.10 4.10 1.15 3.50 1.05 3.76
1 1.15 4.01 1.27 3.90 1.30 3.20 1.20 3.00
2 1.80 3.40 1.90 3.50 2.11 3.00 2.00 2.90
3 2.20 3.00 2.50 3.05 2.47 2.88 2.30 2.70
4 2.90 2.50 2.95 2.20 2.90 2.40 3.10 2.30
5 3.60 2.00 4.00 1.70 3.86 2.20 4.05 2.00
The original Excel file contains pairs of Time Elapsed and
Residual Act. columns, but read_excel adds .1, .2, ...
to repeating column names.
Time Elapsed is expressed in seconds and Residual Act. in whatever
unit of choice.
For the above data I got the following picture:
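An alternative that stays closer to the code in the question is to create one Axes object and pass it to every HD.plot call through the ax parameter, so that all lines land on the same plot. This is a sketch using the first three rows of the test DataFrame above. The loop builds the column names from the .1, .2, ... suffixes that read_excel adds.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the test DataFrame shown above
HD = pd.DataFrame({
    "Time Elapsed": [1.00, 1.15, 1.80], "Residual Act.": [4.15, 4.01, 3.40],
    "Time Elapsed.1": [1.10, 1.27, 1.90], "Residual Act..1": [4.10, 3.90, 3.50],
    "Time Elapsed.2": [1.15, 1.30, 2.11], "Residual Act..2": [3.50, 3.20, 3.00],
    "Time Elapsed.3": [1.05, 1.20, 2.00], "Residual Act..3": [3.76, 3.00, 2.90],
})

labels = ["24", "48", "normal", "physical"]
fig, ax = plt.subplots()

for k, label in enumerate(labels):
    # The first pair has no suffix; later pairs end in .1, .2, .3
    suffix = "" if k == 0 else f".{k}"
    HD.plot(x=f"Time Elapsed{suffix}", y=f"Residual Act.{suffix}",
            label=label, ax=ax)  # ax=ax reuses the same axes each time

ax.set_xlabel("Time elapsed")
ax.set_ylabel("Residual Act.")
```

Call plt.show() at the end to display the single combined figure.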
I have a table of values which aren't logs but to find their relation I think I need to create a log-log plot. The values I have are:
R C
----------
0.2 103
2 13.9
20 2.72
200 0.800
2000 0.401
20000 0.433
How do I plot the logs of these values?
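One possible approach, sketched here under the assumption that matplotlib and NumPy are acceptable tools: either plot the raw values on logarithmic axes with loglog, or take the logarithms yourself with np.log10 and plot those on ordinary axes. Both show the same shape. The values are the ones from the table above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Values from the table in the question
R = np.array([0.2, 2, 20, 200, 2000, 20000])
C = np.array([103, 13.9, 2.72, 0.800, 0.401, 0.433])

# Option 1: raw values, both axes on a log scale
fig, ax = plt.subplots()
ax.loglog(R, C, "o-")
ax.set_xlabel("R")
ax.set_ylabel("C")

# Option 2: take base-10 logs explicitly and plot them on linear axes
log_R = np.log10(R)
log_C = np.log10(C)
fig2, ax2 = plt.subplots()
ax2.plot(log_R, log_C, "o-")
ax2.set_xlabel("log10(R)")
ax2.set_ylabel("log10(C)")
```

Call plt.show() to display both figures. A straight segment in either plot would suggest a power-law relation over that range of R.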