Plotting Data from Excel file - python

I'm trying to load data from an Excel sheet and plot everything on the same plot, but I'm a little inexperienced with plotting multiple lines on a single plot. The columns come in pairs of Time Elapsed and the corresponding Residual Act., and several columns share the same name. The time values differ between pairs, hence the multiple time columns. Right now the code just outputs 4 separate plots. Can someone tell me how to do this without overcomplicating things? I have to plot multiple files in the future and would like an easy way.
import pandas as pd
import matplotlib.pyplot as plt
HD = pd.read_excel('C:\\Users\\azizl\\Desktop\\HDPD_Data.xlsx')
HD.plot(x='Time Elapsed', y='Residual Act.', label='24')
HD.plot(x='Time Elapsed.1', y='Residual Act..1', label='48')
HD.plot(x='Time Elapsed.2', y='Residual Act..2', label='normal')
HD.plot(x='Time Elapsed.3', y='Residual Act..3', label='physical')
plt.show()
HD.head()
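One minimal fix, sketched here under the assumption that the file path and the suffixed column names are exactly as above, is to reuse a single Axes by passing ax= to every DataFrame.plot call, so all four curves land on the same plot:
import pandas as pd
import matplotlib.pyplot as plt

HD = pd.read_excel('C:\\Users\\azizl\\Desktop\\HDPD_Data.xlsx')
ax = HD.plot(x='Time Elapsed', y='Residual Act.', label='24')
HD.plot(x='Time Elapsed.1', y='Residual Act..1', label='48', ax=ax)
HD.plot(x='Time Elapsed.2', y='Residual Act..2', label='normal', ax=ax)
HD.plot(x='Time Elapsed.3', y='Residual Act..3', label='physical', ax=ax)
plt.show()
Every call after the first then draws onto the same Axes instead of opening a new figure.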

Assuming that you have read HD before, to generate your plot, try the following code:
import numpy as np
import matplotlib.pyplot as plt

labels = ['24', '48', 'normal', 'physical']
# split the DataFrame into consecutive 2-column (time, value) slices and plot each pair
for slc in np.split(HD.values, HD.shape[1] // 2, axis=1):
    plt.plot(slc[:, 0], slc[:, 1], 'o-')
plt.xlabel('Time elapsed')
plt.ylabel('Residual Act.')
plt.legend(labels)
plt.show()
The idea is to:
Divide the source DataFrame into 2-column slices (np.split).
Plot each slice, with x and y data from the current
slice (plt.plot).
Create the legend from an external list.
To test the above code I created the following DataFrame:
Time Elapsed Residual Act. Time Elapsed.1 Residual Act..1 Time Elapsed.2 Residual Act..2 Time Elapsed.3 Residual Act..3
0 1.00 4.15 1.10 4.10 1.15 3.50 1.05 3.76
1 1.15 4.01 1.27 3.90 1.30 3.20 1.20 3.00
2 1.80 3.40 1.90 3.50 2.11 3.00 2.00 2.90
3 2.20 3.00 2.50 3.05 2.47 2.88 2.30 2.70
4 2.90 2.50 2.95 2.20 2.90 2.40 3.10 2.30
5 3.60 2.00 4.00 1.70 3.86 2.20 4.05 2.00
The original Excel file contains pairs of Time Elapsed and
Residual Act. columns, but read_excel adds .1, .2, ...
to repeating column names.
Time Elapsed is expressed in seconds and Residual Act. in whatever
unit of choice.
For the above data I got the following picture:
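An alternative sketch that avoids np.split and instead pairs the suffixed column names directly (assuming the same HD DataFrame as above):
import matplotlib.pyplot as plt

labels = ['24', '48', 'normal', 'physical']
fig, ax = plt.subplots()
for i, label in enumerate(labels):
    suffix = '' if i == 0 else f'.{i}'   # read_excel suffixes: '', '.1', '.2', '.3'
    ax.plot(HD[f'Time Elapsed{suffix}'], HD[f'Residual Act.{suffix}'], 'o-', label=label)
ax.set_xlabel('Time elapsed [s]')
ax.set_ylabel('Residual Act.')
ax.legend()
plt.show()
This keeps working even if the columns are reordered, since each pair is looked up by name rather than by position.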

Related

Derivative of a dataset using python or pandas

I have a tab-delimited csv file with a dataset of 2 columns (time and value), both of type float. I have hundreds of these files from lab equipment. An example set is shown below.
3.64 1.22e-11
4.14 2.44e-11
4.64 1.22e-11
5.13 2.44e-11
5.66 1.22e-11
6.17 1.22e-11
6.67 2.44e-11
7.17 2.44e-11
7.69 1.22e-11
8.20 2.44e-11
8.70 1.22e-11
9.20 2.44e-11
9.72 2.44e-11
10.22 1.22e-11
10.72 1.22e-11
11.22 1.22e-11
11.72 1.22e-11
12.22 1.22e-11
12.70 -1.95e-10
13.22 -1.57e-09
13.73 -3.04e-09
14.25 -4.39e-09
14.77 -5.73e-09
15.28 -7.02e-09
15.80 -8.26e-09
16.28 -8.61e-09
16.83 -8.70e-09
17.31 -8.76e-09
17.81 -8.80e-09
18.31 -8.83e-09
18.83 -8.91e-09
19.33 -8.98e-09
19.84 -9.02e-09
20.34 -9.05e-09
20.84 -9.06e-09
21.34 -9.07e-09
21.88 -9.08e-09
22.39 -9.08e-09
22.89 -9.09e-09
23.39 -9.09e-09
23.89 -9.09e-09
24.41 -9.09e-09
I want to trim the data so that time (x, the 1st column) is reset to 0 at the point where the value (y, the 2nd column) starts to change, and also trim off the rows after the value plateaus.
For the 1st derivative, if I use numpy.gradient I can see where the data changes, but I couldn't find a similar function for pandas.
Any suggestions?
Added: the desired output (done manually in Excel) is shown below; in this case the first 18 rows and the last 3 are removed, and both columns are re-zeroed by subtracting the values of the first retained row.
0.00 0.000000000000
0.52 -0.000000001375
1.03 -0.000000002845
1.55 -0.000000004195
2.07 -0.000000005535
2.58 -0.000000006825
3.10 -0.000000008065
3.58 -0.000000008415
4.13 -0.000000008505
4.61 -0.000000008565
5.11 -0.000000008605
5.61 -0.000000008635
6.13 -0.000000008715
6.63 -0.000000008785
7.14 -0.000000008825
7.64 -0.000000008855
8.14 -0.000000008865
8.64 -0.000000008875
9.18 -0.000000008885
9.69 -0.000000008885
10.19 -0.000000008895
What I have tried is using python and pandas to differentiate and then remove the rows where the derivative is 0, but that also removes data points inside the output I want.
dfT = df1[df1.dB != 0]
dfT = dfT[df1.dB >= 0]
dfT = dfT.dropna()
dfT = dfT.reset_index(drop=True)
dfT
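As a side note on the pandas part of the question: pandas does have Series.diff(), which returns the simple row-to-row difference (np.gradient returns a centered estimate instead). A minimal sketch, assuming the two columns are named time and value as in the sample:
df1['dvalue'] = df1['value'].diff()                       # change from the previous row
df1['slope'] = df1['value'].diff() / df1['time'].diff()   # approximate derivative dvalue/dtime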
Why not use what is already working, i.e. np.gradient, and put the result back into your dataframe? I am not able to create your final desired output, however, since it looks like you rely on more than just filtering out gradient = 0. Open to fixing it once the logic is a bit clearer.
### imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"  # show the output of every expression, not just the last one
### data
time = [3.64,4.14,4.64,5.13,5.66,6.17,6.67,7.17,7.69,8.2,8.7,9.2,9.72,10.22,10.72,11.22,11.72,12.22,12.7,13.22,13.73,14.25,14.77,15.28,15.8,16.28,16.83,17.31,17.81,18.31,18.83,19.33,19.84,20.34,20.84,21.34,21.88,22.39,22.89,23.39,23.89,24.41,21.88,22.39,22.89,23.39,23.89,24.41]
value = [0.0000000000122,0.0000000000244,0.0000000000122,0.0000000000244,0.0000000000122,0.0000000000122,0.0000000000244,0.0000000000244,0.0000000000122,0.0000000000244,0.0000000000122,0.0000000000244,0.0000000000244,0.0000000000122,0.0000000000122,0.0000000000122,0.0000000000122,0.0000000000122,-0.000000000195,-0.00000000157,-0.00000000304,-0.00000000439,-0.00000000573,-0.00000000702,-0.00000000826,-0.00000000861,-0.0000000087,-0.00000000876,-0.0000000088,-0.00000000883,-0.00000000891,-0.00000000898,-0.00000000902,-0.00000000905,-0.00000000906,-0.00000000907,-0.00000000908,-0.00000000908,-0.00000000909,-0.00000000909,-0.00000000909,-0.00000000909,-0.00000000908,-0.00000000908,-0.00000000909,-0.00000000909,-0.00000000909,-0.00000000909]
### dataframe creation
# df = pd.read_csv('test.csv', sep='\t', names=["time", "value"])  # for the real tab-delimited files
df = pd.DataFrame({'time': time, 'value': value})
plt.plot(df.time,df.value)
Outputs:
Next you can differentiate; as you can see, within the first 18 rows you mentioned there are already multiple points where the gradient is greater than 0:
df['gradient'] = np.gradient(df.value.values)
df
plt.plot(df.time,df.gradient)
Outputs:
Next, filter out the rows where the value does not change and add a new time column:
### filter the rows where the gradient is not 0 and rebuild the time axis
df_filtered = df[df.gradient != 0].copy()   # .copy() avoids SettingWithCopyWarning below
df_filtered['time_difference'] = df_filtered.time.diff().fillna(0)
df_filtered['new_time'] = df_filtered['time_difference'].cumsum()
df_filtered.reset_index(drop=True, inplace=True)
df_filtered
Outputs:
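To get one step closer to the desired output, here is a hedged sketch that also trims the flat tail and re-zeros both columns. The absolute-gradient threshold is an assumption: pick something between the noise floor (about 1e-11 here) and the real signal (about 1e-9), and note the exact cut rows can differ by one because np.gradient is a centered estimate.
threshold = 1e-10                                            # assumed noise threshold
changing = df.gradient.abs() > threshold
first = changing.values.argmax()                             # first row above the threshold
last = len(changing) - 1 - changing.values[::-1].argmax()    # last row above the threshold

trimmed = df.iloc[first:last + 1].copy()
trimmed['time'] = trimmed['time'] - trimmed['time'].iloc[0]      # re-zero time
trimmed['value'] = trimmed['value'] - trimmed['value'].iloc[0]   # re-zero value
trimmed.reset_index(drop=True, inplace=True)
trimmed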

Plot dataframe: use the row as the X axis and every line on the Y axis

I'm not very confident with Python; most of the time I get code from the web (frequently here). I have a dataframe that I read with pandas and I just can't plot it the way I want (I can't plot it at all, by the way).
Here is a sample of my df:
date 00:00 00:30 01:00 01:30 02:00 02:30
16/03/2021 0.518 0.679 0.516 0.633 0.606 0.775
17/03/2021 0.751 0.956 0.823 0.975 1.5 0.73
18/03/2021 0.925 0.733 0.825 1.489 0.762 0.686
19/03/2021 0.834 0.726 0.887 0.712 0.769 0.713
20/03/2021 0.735 0.799 0.732 0.803 0.629 0.811
21/03/2021 0.61 0.425 0.645 0.518 0.451 0.629
22/03/2021 0.401 0.525 0.472 0.518 0.508 0.432
As you see, the very first column holds the dates and the very first row holds the hours of these dates (in steps of 30 minutes). The original file covers all 24 hours and is separated by spaces (" ").
I want to plot one curve per day and, for example, 3 days per plot (and I'll make some subplots; I have already done that before, so it should be ok...).
Here is the beginning of my code:
import pandas as pd
df = pd.read_csv("file.csv", sep=" ",on_bad_lines='skip',usecols=range(48))
Yeah, that's not so much, I know.
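For the "3 days per plot" part, a minimal sketch under the assumption that the first column is named date and the file is space-separated as in the sample:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("file.csv", sep=" ", on_bad_lines='skip', index_col='date')
ax = df.iloc[:3].T.plot(figsize=(10, 5))   # transpose: hours on the x axis, one line per day
ax.set_xlabel('hour')
plt.show()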
These days I have spent a lot of time learning Python, and I now have a solution for this post that I had since forgotten. If someone needs it some day, here is code that creates a dataframe line per line, then plots each line on a figure with subplots (but all the data can easily be put on the same plot when needed):
import pandas as pd
import matplotlib.pyplot as plt

df_all = pd.read_csv('df_test.csv')
df_all.index = df_all['Unnamed: 0']
df_all.drop(['Unnamed: 0'], axis=1, inplace=True)

n_rows = 6
# one Series (df0, df1, ...) per row of the original dataframe
for x in range(n_rows):
    globals()[f"df{x}"] = df_all.iloc[x]

fig, ax = plt.subplots(n_rows, 1, sharex=True, sharey=True, figsize=(18, 15), dpi=200)
for y in range(n_rows):
    ax[y].plot(globals()[f"df{y}"].index, globals()[f"df{y}"].iloc[0:])
plt.show()
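For comparison, a sketch of the same idea without globals(), assuming the same df_test.csv layout (first column holds the row labels):
import pandas as pd
import matplotlib.pyplot as plt

df_all = pd.read_csv('df_test.csv', index_col=0)
fig, axes = plt.subplots(len(df_all), 1, sharex=True, sharey=True, figsize=(18, 15), dpi=200)
for ax, (label, row) in zip(axes, df_all.iterrows()):
    ax.plot(row.index, row.values)   # one subplot per row of the dataframe
    ax.set_ylabel(label)
plt.show()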

transposed Pandas data frame zeroth column not callable as [:, 0]

I am trying to use Seaborn to plot a simple bar plot using data that was transformed. The data started out looking like this (text follows):
element 1 2 3 4 5 6 7 8 9 10 11 12
C 95.6 95.81 96.1 95.89 97.92 96.71 96.1 96.38 96.09 97.12 95.12 95.97
N 1.9 1.55 1.59 1.66 0.53 1.22 1.57 1.63 1.82 0.83 2.37 2.13
O 2.31 2.4 2.14 2.25 1.36 1.89 2.23 1.8 1.93 1.89 2.3 1.71
Co 0.18 0.21 0.16 0.17 0.01 0.03 0.13 0.01 0.02 0.01 0.14 0.01
Zn 0.01 0.03 0.02 0.03 0.18 0.14 0.07 0.17 0.14 0.16 0.07 0.18
and after importing using:
df1 = pd.read_csv(r"C:\path.txt", sep='\t', header=0, usecols=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], index_col='element').transpose()
display(df1)
When I plot the values of an element versus the first column (which represents an observation), the first column of data corresponding to 'C' is used instead. What am I doing wrong and how can I fix it?
I also tried importing, then pivoting the dataframe, which resulted in an undesired shape that repeated the element set as columns 12 times.
ax = sns.barplot(x=df1.iloc[:,0], y='Zn', data=df1)
Edited to add: I am not married to using any particular package or technique. I just want to be able to use my data to build a bar plot with 1-12 on the x axis and elemental compositions on the y.
You have different possibilities here. The problem is that, after the transpose, the observations form the index of your dataframe and the elements are the columns, so df1.iloc[:, 0] is the 'C' column rather than the observation numbers.
1)
ax = sns.barplot(x=df1.index, y='Zn', data=df1)
2)
df1.reset_index(inplace=True)  # now the observation labels are the first column of df1
ax = sns.barplot(x=df1.iloc[:, 0], y='Zn', data=df1)
# equivalent, if that first column is named 'element':
ax = sns.barplot(x='element', y='Zn', data=df1)
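A minimal end-to-end sketch of option 1, assuming the same tab-separated layout as the sample above (the file name here is a placeholder):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df1 = pd.read_csv("elements.txt", sep='\t', header=0, index_col='element').transpose()
ax = sns.barplot(x=df1.index, y='Zn', data=df1)   # observations 1-12 on x, Zn composition on y
ax.set_xlabel('observation')
plt.show()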

Why is the pivot chart coming out as a straight line when using Plotly

I need to plot a chart from a multi-indexed pivot table. The pivot table is built as: multi_index = pd.pivot_table(df_new, index=['Device_ID', 'Temp', 'Supply'], columns='Frequency', values='NoiseLevel')
When I use Plotly, the chart comes out as a single straight line. I am expecting two zig-zag lines, one for frequency 0.8 and the other for 1.6, as shown in the first figure. Could you please tell me where I went wrong? Please see my code below. I don't know where I need to put columns='Frequency'; I think it needs to go on the Y axis.
Please see my data frame (pivot table) below:
Frequency 0.8 1.6
Device_ID Temp Supply
FF_2649 -40.0 1.65 -100.72 -101.35
1.71 -100.61 -101.74
1.80 -100.74 -101.64
1.89 -100.63 -101.69
3.60 -100.60 -101.46
... ... ...
TT_2441 85.0 1.65 -94.99 -94.97
1.71 -94.85 -95.24
1.80 -95.02 -94.97
1.89 -94.69 -96.20
3.60 -94.90 -94.91
import plotly.graph_objs as go
import plotly.offline as pyo

data = [go.Scatter(
    x=multi_index.index,
    y=multi_index.values,
    mode='lines',
    name='Noise Level'
)]
layout = go.Layout(title='Noise Level')
figure = go.Figure(data=data, layout=layout)
pyo.plot(figure)
Plotly does not directly support a multi-index. The approach here is to concat the values of the multi-index into a string that identifies each row, then generate one Plotly scatter per column.
import io
import numpy as np
import pandas as pd
import plotly.graph_objs as go

df = pd.read_csv(io.StringIO(""" Device_ID Temp Supply 0.8 1.6
FF_2649 -40.0 1.65 -100.72 -101.35
- - 1.71 -100.61 -101.74
- - 1.80 -100.74 -101.64
- - 1.89 -100.63 -101.69
- - 3.60 -100.60 -101.46
TT_2441 85.0 1.65 -94.99 -94.97
- - 1.71 -94.85 -95.24
- - 1.80 -95.02 -94.97
- - 1.89 -94.69 -96.20
- - 3.60 -94.90 -94.91"""), sep=r"\s+")
# rebuild the multi-index: '-' placeholders take the value from the row above
df = (df.replace({"-": np.nan})
        .fillna(method="ffill")
        .apply(pd.to_numeric, errors="ignore")
        .set_index(["Device_ID", "Temp", "Supply"]))
# generate a line for each dataframe column; concat the multi-index values so the x labels work with plotly
data = [go.Scatter(x=pd.Series(df.index.values).apply(lambda i: " ".join(map(str, i))),
                   y=df[c],
                   mode='lines', name=f'Noise Level {c}')
        for c in df.columns]
layout = go.Layout(title='Noise Level')
figure = go.Figure(data=data, layout=layout)
figure
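An alternative sketch with plotly.express, assuming the df with the 3-level index built above: flatten the index into a label and melt the two frequency columns into long form, so each frequency automatically becomes its own line:
import plotly.express as px

long_df = df.copy()
long_df.index = [' '.join(map(str, i)) for i in long_df.index]   # flatten the multi-index
long_df = (long_df.rename_axis('label')
                  .reset_index()
                  .melt(id_vars='label', var_name='Frequency', value_name='NoiseLevel'))
fig = px.line(long_df, x='label', y='NoiseLevel', color='Frequency', title='Noise Level')
fig.show()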

.annotate() is not plotting all labels

I've created a dataframe with stock information. When I go to create a scatter plot and annotate labels, not all of the labels are included. I'm only getting 3 labels out of 50 or so points. I can't figure out why it's not plotting all labels.
My Table:
Dividend ExpenseRatio Net_Assets PriceEarnings PriceSales
Ticker
FHLC 0.0128 0.08 6.056 22.95 1.78
ONEQ 0.0083 0.21 6.284 20.24 2.26
FTEC 0.0143 0.08 3.909 20.83 2.73
FDIS 0.0144 0.08 2.227 20.17 1.36
FENY 0.0262 0.08 4.386 25.97 1.34
My plotting code:
for ticker, row in df.iterrows():
    plt.scatter(row['PriceSales'], row['PriceEarnings'], c=np.random.rand(3, 1), s=300)

for i, txt in enumerate(ticker):
    plt.annotate(df.index[i-1], (df.PriceSales[i-1], df.PriceEarnings[i-1]))

plt.xlabel('PriceSales')
plt.ylabel('PriceEarnings')
plt.show()
My graph image:
ticker here is going to have the value of the ticker of the last row, e.g. "FENY". When you call enumerate(ticker), it will generate an item for each character, so it sounds like your last ticker has 3 characters.
I think you can annotate points in the same loop as the scatter plot:
for ticker, row in df.iterrows():
    plt.scatter(row['PriceSales'], row['PriceEarnings'], c=np.random.rand(3, 1), s=300)
    plt.annotate(ticker, (row['PriceSales'], row['PriceEarnings']))
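If you also want one random color per point, here is a sketch of the full loop; note that np.random.rand(3) is a plain RGB triple, which should avoid the ambiguous-shape warning that np.random.rand(3, 1) can trigger when passed as c:
import numpy as np
import matplotlib.pyplot as plt

for ticker, row in df.iterrows():
    plt.scatter(row['PriceSales'], row['PriceEarnings'], color=np.random.rand(3), s=300)
    plt.annotate(ticker, (row['PriceSales'], row['PriceEarnings']))
plt.xlabel('PriceSales')
plt.ylabel('PriceEarnings')
plt.show()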
