I need to plot a chart from a multi-indexed pivot table. This is how I build the pivot table: "multi_index = pd.pivot_table(df_new, index = ['Device_ID', 'Temp' ,'Supply'],columns = 'Frequency', values = 'NoiseLevel',)"
When I plot it with Plotly, I get a single straight line. I am expecting two zig-zag lines, one for frequency 0.8 and the other for 1.6, as shown in the first figure. Could you please tell me where I went wrong? Please see my code below. I don't know where to put the "columns = 'Frequency'"; I think it needs to go on the Y axis.
Please see my data frame (pivot table) below:
Frequency 0.8 1.6
Device_ID Temp Supply
FF_2649 -40.0 1.65 -100.72 -101.35
1.71 -100.61 -101.74
1.80 -100.74 -101.64
1.89 -100.63 -101.69
3.60 -100.60 -101.46
... ... ...
TT_2441 85.0 1.65 -94.99 -94.97
1.71 -94.85 -95.24
1.80 -95.02 -94.97
1.89 -94.69 -96.20
3.60 -94.90 -94.91
data=[go.Scatter(
x=multi_index.index,
y=multi_index.values,
mode='lines',
name='Noise Level'
)]
layout=go.Layout(title='Noise Level')
figure=go.Figure(data=data,layout=layout)
pyo.plot(figure)
Plotly does not directly support a multi-index on an axis.
Concatenate the values of each multi-index entry into a string that identifies it.
Generate one Plotly scatter trace per column.
import io
import numpy as np
import pandas as pd
import plotly.graph_objects as go
df = pd.read_csv(io.StringIO(""" Device_ID Temp Supply 0.8 1.6
FF_2649 -40.0 1.65 -100.72 -101.35
- - 1.71 -100.61 -101.74
- - 1.80 -100.74 -101.64
- - 1.89 -100.63 -101.69
- - 3.60 -100.60 -101.46
TT_2441 85.0 1.65 -94.99 -94.97
- - 1.71 -94.85 -95.24
- - 1.80 -95.02 -94.97
- - 1.89 -94.69 -96.20
- - 3.60 -94.90 -94.91"""), sep=r"\s+").replace({"-": np.nan}).ffill().astype({"Temp": float}).set_index(["Device_ID", "Temp", "Supply"])
# generate a line for each column dataframe. concat values of multi-index to make it work with plotly
data = [go.Scatter(x=pd.Series(df.index.values).apply(lambda i: " ".join(map(str, i))), y=df[c],
mode='lines', name=f'Noise Level {c}')
for c in df.columns]
layout=go.Layout(title='Noise Level')
figure=go.Figure(data=data,layout=layout)
figure
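The label-joining step can be checked on its own, without Plotly. The following is a minimal sketch on a hypothetical two-row pivot table (the `pivot`, `labels` and `traces` names are made up for illustration); it only shows how each index tuple becomes one x-axis string and how each frequency column becomes its own series.

```python
import pandas as pd

# Hypothetical two-row excerpt of the pivot table from the question
idx = pd.MultiIndex.from_tuples(
    [("FF_2649", -40.0, 1.65), ("FF_2649", -40.0, 1.71)],
    names=["Device_ID", "Temp", "Supply"],
)
pivot = pd.DataFrame({0.8: [-100.72, -100.61], 1.6: [-101.35, -101.74]}, index=idx)

# One string per multi-index entry, usable as a categorical x value
labels = pivot.index.map(lambda t: " ".join(map(str, t)))

# One y series per frequency column, keyed by the trace name
traces = {f"Noise Level {c}": pivot[c].tolist() for c in pivot.columns}
```

Each entry of `traces` would then feed one `go.Scatter(x=labels, y=...)` call, as in the answer above.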
How can I combine multiple txt files into one merged file? Each file contains a different number of columns (usually float values), and I need one merged file with all the columns, as follows:
EDIT:
There is one rule: if a value is non-numeric ("Nan", for example), I need to pad it with the last numeric value before it.
file1.txt
1.04
2.26
3.87
file2.txt
5.44 4.65 9.86
8.67 Nan 7.45
8.41 6.54 6.21
file3.txt
6.98 6.52
4.45 8.74
0.58 4.12
merged.txt
1.04 5.44 4.65 9.86 6.98 6.52
2.26 8.67 8.67 7.45 4.45 8.74
3.87 8.41 6.54 6.21 0.58 4.12
I saw an answer here for the case of one column in each file.
How can I do this for multiple columns?
The simplest way is probably using numpy:
import numpy as np
filenames = ["file1.txt", "file2.txt", "file3.txt"]
fmt = '%.2f' # assuming format is known in advance
all_columns = []
for filename in filenames:
    all_columns.append(np.genfromtxt(filename))
arr_out = np.column_stack(tuple(all_columns)) # Stack columns
# Fill NaN-elements with last numeric value
arr_1d = np.ravel(arr_out) # "flat reference" to arr_out
nan_indices = np.where(np.isnan(arr_1d))
while len(nan_indices[0]):
    # Copy the preceding element into each NaN slot; repeat until no NaN is left
    new_indices = tuple([i-1 for i in nan_indices])
    arr_1d[nan_indices] = arr_1d[new_indices]
    nan_indices = np.where(np.isnan(arr_1d))
np.savetxt("merged.txt", arr_out, fmt=fmt)
One problem (if it is one for you) that might occur is that the very first, i.e. the upper-left element, is non-numeric. In that case, the last (lower-right) value or the last numeric value before that would be used.
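As an alternative to the loop, pandas can do the padding row by row. This is a minimal sketch, not the answerer's method: it swaps in `DataFrame.ffill(axis=1)` and uses in-memory strings (the hypothetical `texts` list) as stand-ins for the three files in the question. Unlike the loop above, a NaN in the first column of a row would stay NaN here instead of wrapping around.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-ins for file1.txt, file2.txt and file3.txt from the question
texts = [
    "1.04\n2.26\n3.87\n",
    "5.44 4.65 9.86\n8.67 Nan 7.45\n8.41 6.54 6.21\n",
    "6.98 6.52\n4.45 8.74\n0.58 4.12\n",
]

# genfromtxt turns "Nan" into a float NaN; column_stack glues the blocks side by side
columns = [np.genfromtxt(io.StringIO(t)) for t in texts]

# Forward-fill along each row so a NaN takes the value to its left
merged = pd.DataFrame(np.column_stack(columns)).ffill(axis=1)
```

With real files, `np.genfromtxt(filename)` would replace the `io.StringIO` wrapper, and `np.savetxt("merged.txt", merged.to_numpy(), fmt="%.2f")` would write the result.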
I am trying to use Seaborn to plot a simple bar plot using data that was transformed. The data started out looking like this (text follows):
element 1 2 3 4 5 6 7 8 9 10 11 12
C 95.6 95.81 96.1 95.89 97.92 96.71 96.1 96.38 96.09 97.12 95.12 95.97
N 1.9 1.55 1.59 1.66 0.53 1.22 1.57 1.63 1.82 0.83 2.37 2.13
O 2.31 2.4 2.14 2.25 1.36 1.89 2.23 1.8 1.93 1.89 2.3 1.71
Co 0.18 0.21 0.16 0.17 0.01 0.03 0.13 0.01 0.02 0.01 0.14 0.01
Zn 0.01 0.03 0.02 0.03 0.18 0.14 0.07 0.17 0.14 0.16 0.07 0.18
and after importing using:
df1 = pd.read_csv(r"C:\path.txt", sep='\t',header = 0, usecols=[0, 1, 2,3,4,5,6,7,8,9,10,11,12], index_col='element').transpose()
display(df1)
When I plot the values of an element versus the first column (which represents an observation), the first column of data corresponding to 'C' is used instead. What am I doing wrong and how can I fix it?
I also tried importing, then pivoting the dataframe, which resulted in an undesired shape that repeated the element set as columns 12 times.
ax = sns.barplot(x=df1.iloc[:,0], y='Zn', data=df1)
edited to add that I am not married to using any particular package or technique. I just want to be able to use my data to build a bar plot with 1-12 on the x axis and elemental compositions on the y.
You have a few options here. The problem is that after the transpose the observation numbers (1 to 12) are the index of your dataframe rather than a column, so x=df1.iloc[:,0] is the 'C' column.
1)
ax = sns.barplot(x=df1.index, y='Zn', data=df1)
2)
df1.reset_index(inplace=True) #now the observation numbers are the first column of df1, named 'index'
ax = sns.barplot(x=df1.iloc[:,0], y='Zn', data=df1)
#equal to
ax = sns.barplot(x='index', y='Zn', data=df1)
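To see where the observation numbers end up, the transpose can be inspected without plotting. This is a minimal sketch on a hypothetical three-column excerpt of the data (the `raw` string and the `observation` column name are assumptions made for illustration).

```python
import io

import pandas as pd

# Hypothetical tab-separated excerpt of the file from the question
raw = "element\t1\t2\t3\nC\t95.6\t95.81\t96.1\nZn\t0.01\t0.03\t0.02\n"
df1 = pd.read_csv(io.StringIO(raw), sep="\t", index_col="element").transpose()

# After the transpose the observation labels are the index, not a column,
# so position 0 is the 'C' column
first_column = df1.iloc[:, 0].tolist()

# Give the index a name and turn it into a regular column
tidy = df1.rename_axis("observation").reset_index()
```

With `tidy` in hand, a call such as `sns.barplot(x='observation', y='Zn', data=tidy)` refers to both axes by column name.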
I'm trying to load data from an Excel sheet and plot everything on the same plot, but I am a little inexperienced with plotting multiple lines on a single plot. The sheet holds pairs of columns, Time Elapsed and the corresponding Residual Act., and several columns share the same name. The time values differ from pair to pair, hence the multiple time columns. Right now the code outputs 4 separate plots. Can someone tell me how to do this without overcomplicating things? I will have to plot multiple files in the future and would like an easy way.
import pandas as pd
import matplotlib.pyplot as plt
HD = pd.read_excel('C:\\Users\\azizl\\Desktop\\HDPD_Data.xlsx')
HD.plot(x='Time Elapsed', y= 'Residual Act.' , label='24')
HD.plot(x='Time Elapsed.1', y= 'Residual Act..1', label='48')
HD.plot(x='Time Elapsed.2', y= 'Residual Act..2', label='normal')
HD.plot(x='Time Elapsed.3', y= 'Residual Act..3', label='physical')
plt.show()
HD.head()
Assuming that you have already read HD, try the following code
to generate your plot:
import numpy as np
import matplotlib.pyplot as plt
labels = ['24', '48', 'normal', 'physical']
# Split the values into consecutive 2-column blocks: (time, residual activity)
for slc in np.split(HD.values, HD.shape[1] // 2, axis=1):
    plt.plot(slc[:, 0], slc[:, 1], 'o-')
plt.xlabel('Time elapsed')
plt.ylabel('Residual Act.')
plt.legend(labels)
plt.show()
The idea is to:
Divide the source DataFrame into 2-column slices (np.split).
Plot each slice, with x and y data from the current
slice (plt.plot).
Create the legend from an external list.
To test the above code I created the following DataFrame:
Time Elapsed Residual Act. Time Elapsed.1 Residual Act..1 Time Elapsed.2 Residual Act..2 Time Elapsed.3 Residual Act..3
0 1.00 4.15 1.10 4.10 1.15 3.50 1.05 3.76
1 1.15 4.01 1.27 3.90 1.30 3.20 1.20 3.00
2 1.80 3.40 1.90 3.50 2.11 3.00 2.00 2.90
3 2.20 3.00 2.50 3.05 2.47 2.88 2.30 2.70
4 2.90 2.50 2.95 2.20 2.90 2.40 3.10 2.30
5 3.60 2.00 4.00 1.70 3.86 2.20 4.05 2.00
The original Excel file contains pairs of Time Elapsed and
Residual Act. columns, but read_excel adds .1, .2, ...
to repeating column names.
Time Elapsed is expressed in seconds and Residual Act. in whatever
unit of choice.
For the above data I got the following picture:
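The slicing step from the answer above can be checked without drawing anything. This is a minimal sketch on a hypothetical two-pair, two-row excerpt of the test DataFrame; it only shows what `np.split` hands to each `plt.plot` call.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt: two (time, residual) column pairs, two rows each
HD = pd.DataFrame({
    "Time Elapsed": [1.00, 1.15], "Residual Act.": [4.15, 4.01],
    "Time Elapsed.1": [1.10, 1.27], "Residual Act..1": [4.10, 3.90],
})

# One 2-column block per pair: column 0 is the time, column 1 the residual activity
pairs = np.split(HD.values, HD.shape[1] // 2, axis=1)
second_x = pairs[1][:, 0].tolist()
second_y = pairs[1][:, 1].tolist()
```

Note that `np.split` requires the number of columns to be even here, since every time column must have a matching value column.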
I am merging one column from a DataFrame (df2) into another DataFrame (df1), where both have the same index. The result of this operation gives me a lot more rows than I started with (duplicates). Is there a way to avoid duplicates? Please see the example code below to replicate my issue.
df1 = pd.DataFrame([[1, 1.0, 2.3,0.2,0.53], [2, 3.35, 2.0,0.2,0.65], [2,3.4,
2.0,0.25,0.55]],
columns=["Sample_ID", "NaX", "NaU","OC","EC"])\
.set_index('Sample_ID')
df2 = pd.DataFrame([[1,0.2, 1.5, 82], [2, 3.35,2.4,92],[2, 3.4, 2.0,0.25]],
columns=["Sample_ID", "OC","Flow", "Diameter"])\
.set_index('Sample_ID')
df1 = pd.merge(df1,df2['Flow'].to_frame(), left_index=True,right_index=True)
My result (below) has two entries for sample "2" starting with 3.35 and then two entries for "2" starting with 3.40.
What I was expecting was just two entries for "2", one starting with 3.35 and the other line for "2" starting with 3.40. So the total number of rows should be only three, while I have a total of 5 rows of data now.
Can you please see what the reason for this is? Thanks for your help!
NaX NaU OC EC Flow
Sample_ID
1 1.00 2.3 0.20 0.53 1.5
2 3.35 2.0 0.20 0.65 2.4
2 3.35 2.0 0.20 0.65 2.0
2 3.40 2.0 0.25 0.55 2.4
2 3.40 2.0 0.25 0.55 2.0
What you want to do is concatenate as follows:
pd.concat([df1, df2['Flow'].to_frame()], axis=1)
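...which returns your desired output. The axis=1 argument lets you "glue on" extra columns. Note that with the duplicated Sample_ID values this only works because both frames have identical indexes in the same row order.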
As to why your join is returning twice as many entries for Sample_ID = 2, you can read through the docs on joins. The relevant portion is:
In SQL / standard relational algebra, if a key combination appears more than once in both tables, the resulting table will have the Cartesian product of the associated data.
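The row counts can be reproduced with the frames from the question. The sketch below also shows one possible way to keep a key-based join without the Cartesian product, under the assumption that rows with the same Sample_ID should be paired in order of appearance; the `occurrence` helper level is hypothetical and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65], [2, 3.4, 2.0, 0.25, 0.55]],
    columns=["Sample_ID", "NaX", "NaU", "OC", "EC"],
).set_index("Sample_ID")
df2 = pd.DataFrame(
    [[1, 0.2, 1.5, 82], [2, 3.35, 2.4, 92], [2, 3.4, 2.0, 0.25]],
    columns=["Sample_ID", "OC", "Flow", "Diameter"],
).set_index("Sample_ID")

# Sample_ID 2 appears twice on each side, so the merge yields 1 + 2 * 2 = 5 rows
merged = pd.merge(df1, df2[["Flow"]], left_index=True, right_index=True)

# Number the repeats of each Sample_ID so that (Sample_ID, occurrence) is unique
left = df1.assign(
    occurrence=df1.groupby(level=0).cumcount().to_numpy()
).set_index("occurrence", append=True)
right = df2[["Flow"]].assign(
    occurrence=df2.groupby(level=0).cumcount().to_numpy()
).set_index("occurrence", append=True)

# Join on the now-unique key, then drop the helper level again
paired = left.join(right).droplevel("occurrence")
```

Unlike the plain concat, this pairing does not depend on the two frames being sorted identically, only on the order of the repeats within each Sample_ID.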
I've created a dataframe with stock information. When I go to create a scatter plot and annotate labels, not all of the labels are included. I'm only getting 3 labels out of 50 or so points. I can't figure out why it's not plotting all labels.
My Table:
Dividend ExpenseRatio Net_Assets PriceEarnings PriceSales
Ticker
FHLC 0.0128 0.08 6.056 22.95 1.78
ONEQ 0.0083 0.21 6.284 20.24 2.26
FTEC 0.0143 0.08 3.909 20.83 2.73
FDIS 0.0144 0.08 2.227 20.17 1.36
FENY 0.0262 0.08 4.386 25.97 1.34
My plotting code:
for ticker,row in df.iterrows():
    plt.scatter(row['PriceSales'], row['PriceEarnings'], c = np.random.rand(3,1), s = 300)
for i, txt in enumerate(ticker):
    plt.annotate(df.index[i-1], (df.PriceSales[i-1], df.PriceEarnings[i-1]))
plt.xlabel('PriceSales')
plt.ylabel('PriceEarnings')
plt.show()
My graph image:
After the first loop finishes, ticker holds the ticker of the last row, e.g., "FENY". When you call enumerate(ticker), it generates one item per character of that string, so it sounds like your last ticker has 3 characters.
I think you can annotate points in the same loop as the scatter plot:
for ticker,row in df.iterrows():
    plt.scatter(row['PriceSales'], row['PriceEarnings'], c = np.random.rand(3,1), s = 300)
    plt.annotate(ticker, (row['PriceSales'], row['PriceEarnings']))
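The character-by-character behaviour can be seen without matplotlib. This is a minimal sketch on a hypothetical two-row excerpt of the table; the `chars` and `annotations` names are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row excerpt of the table from the question
df = pd.DataFrame(
    {"PriceEarnings": [22.95, 20.24], "PriceSales": [1.78, 2.26]},
    index=pd.Index(["FHLC", "ONEQ"], name="Ticker"),
)

for ticker, row in df.iterrows():
    pass  # after the loop, `ticker` still holds the last index label

# enumerate() over a string yields one (position, character) pair per letter
chars = list(enumerate(ticker))

# One (label, (x, y)) pair per row, which is what each annotate call needs
annotations = [(t, (r["PriceSales"], r["PriceEarnings"])) for t, r in df.iterrows()]
```

Each entry of `annotations` corresponds to one `plt.annotate(label, xy)` call, so every point gets its own label.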