How can I loop through the title of a plot matplotlib? - python

I'm new to python and trying to plot the PSD in separate plots for each electrode of my EEG dataset via a for loop. The title of the plot should include the respective electrode name.
Here is the code I use to load the data from a .txt file:
k = pd.read_csv(r'C:\Users\LPC\Desktop\rest txt 7min\AB24_rest_asr_ICA_MARA_7min.txt',usecols=['AFp2','F9','AFF5h','AFF1h','AFF2h','AFF6h','F10','FFT9h','FFT7h','FFC5h','FFC3h','FFC1h','FFC2h','FFC4h','FFC6h','FFT8h','FFT10h','FC1','FCz','FC2','FTT9h','FTT7h','FCC5h','FCC3h','FCC1h','FCC2h','FCC4h','FCC6h','FTT8h','FTT10h','Cz','TTP7h','CCP5h','CCP3h','CCP1h','CCP2h','CCP4h','CCP6h','TTP8h','CPz','TPP9h','TPP7h','CPP5h','CPP3h','CPP1h','CPP2h','CPP4h','CPP6h','TPP8h','TPP10h','Pz','PPO1h','PPO2h','P9','PPO9h','POO1','POO2','PPO10h','P10','POO9h','OI1h','OI2h','POO10h'], sep=",")
k.columns = ['AFp2','F9','AFF5h','AFF1h','AFF2h','AFF6h','F10','FFT9h','FFT7h','FFC5h','FFC3h','FFC1h','FFC2h','FFC4h','FFC6h','FFT8h','FFT10h','FC1','FCz','FC2','FTT9h','FTT7h','FCC5h','FCC3h','FCC1h','FCC2h','FCC4h','FCC6h','FTT8h','FTT10h','Cz','TTP7h','CCP5h','CCP3h','CCP1h','CCP2h','CCP4h','CCP6h','TTP8h','CPz','TPP9h','TPP7h','CPP5h','CPP3h','CPP1h','CPP2h','CPP4h','CPP6h','TPP8h','TPP10h','Pz','PPO1h','PPO2h','P9','PPO9h','POO1','POO2','PPO10h','P10','POO9h','OI1h','OI2h','POO10h']
I don't know whether this is a sensible approach, but the idea is for k to hold the data and for k.columns to give me the column names.
Then I use the following for loop:
for columns in k:
    freqs, psd = signal.welch(k[columns], fs=500,
                              window='hanning', nperseg=40, noverlap=20, scaling='density', average='mean')
    plt.figure(figsize=(5, 4))
    plt.plot(freqs, psd)
    plt.title('PSD: power spectral density')
    plt.xlabel('Frequency')
    plt.ylabel('Power')
    plt.axis([0,50, -1, 5])
    plt.show()
How can I make the title of each plot include the name of the electrode being plotted in that loop iteration?
Thank you very much for your precious help! :)

The response from @Mr.T is really helpful!!
Use f-string formatting plt.title(f'PSD: power spectral density for {columns}')? You probably will also benefit from getting familiar with subplots and axis objects. – Mr. T
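A minimal sketch of that suggestion, combining the f-string title with the subplots/axes interface mentioned in the comment. The synthetic signal, the three-electrode subset, and the helper name plot_psd_per_electrode are assumptions for illustration only; window="hann" is SciPy's canonical name for the Hann window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal


def plot_psd_per_electrode(data, fs=500):
    """Draw one PSD figure per column and return the titles that were set."""
    titles = []
    for electrode in data.columns:
        freqs, psd = signal.welch(data[electrode], fs=fs, window="hann",
                                  nperseg=40, noverlap=20, scaling="density")
        fig, ax = plt.subplots(figsize=(5, 4))
        ax.plot(freqs, psd)
        # The loop variable is interpolated into the title via an f-string.
        ax.set_title(f"PSD: power spectral density for {electrode}")
        ax.set_xlabel("Frequency")
        ax.set_ylabel("Power")
        ax.set_xlim(0, 50)
        titles.append(ax.get_title())
        plt.close(fig)  # swap for plt.show() when working interactively
    return titles


# Synthetic stand-in for the EEG recording (hypothetical values).
rng = np.random.default_rng(0)
k = pd.DataFrame(rng.normal(size=(2000, 3)), columns=["AFp2", "F9", "Cz"])
titles = plot_psd_per_electrode(k)
```

Iterating over data.columns makes it explicit that the loop variable is a column name, which is why it can go straight into the title.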

Related

Plot time series of paired columns

I have an Excel file and would like to create time series plots. For a quick view of the data for one site, a captured image is given below. I would like to plot two categories of data: one is modelled and the other is monitoring data. For example, "pH" and "pH data" on one plot, "WQ[SO4,Dissolved]" and "WQ data[SO4,Dissolved]" on one plot, and so on for all the remaining 30 pairs. That means 60 columns of data to plot.
[Image: screenshot of the data for one site]
My approach was:
1) read the Excel data into a DataFrame;
2) create a list for each category of parameters to plot;
3) use the "zip" function to create a parallel list: parameters_pair = zip(parameters_model,parameters_monitor)
4) plot; part of the code is shown below.
for i,j in parameters_pair:
    fig = plt.figure(figsize=(10, 7))
    ax.plot(df_Plot['Tstamps'], df_Plot[i],
            label=site, color='blue', linestyle='solid') #fillStyle='none'
    ax.plot(df_Plot['Tstamps'], df_Plot[j],
            label=site, color='orange', marker='s', markersize='4', linestyle='')
My code can plot i or j individually, but it does not put the modelled and monitoring data on one plot and iterate over the pairs in parallel as expected. Could you please suggest which functions to use to solve this issue? Thank you very much.
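One way the paired loop could be written is sketched below. It assumes the missing piece is that each iteration needs its own axes object, created together with the figure. The data values and the helper name plot_pairs are hypothetical. Note also that in Python 3 zip returns a one-shot iterator, so if it is consumed before the loop (for example by printing list(parameters_pair)), the loop body never runs; wrapping it in list() avoids that.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_pairs(df, time_col, pairs):
    """Create one figure per (modelled, monitored) pair and return the figures."""
    figures = []
    for model_col, monitor_col in pairs:
        # A new figure AND a new axes for every pair of columns.
        fig, ax = plt.subplots(figsize=(10, 7))
        ax.plot(df[time_col], df[model_col], label=model_col,
                color="blue", linestyle="solid")
        ax.plot(df[time_col], df[monitor_col], label=monitor_col,
                color="orange", marker="s", markersize=4, linestyle="")
        ax.set_title(model_col)
        ax.legend()
        figures.append(fig)
    return figures


# Hypothetical data standing in for the Excel sheet.
df_plot = pd.DataFrame({
    "Tstamps": pd.date_range("2020-01-01", periods=5, freq="D"),
    "pH": [7.0, 7.1, 7.2, 7.1, 7.0],
    "pH data": [7.05, np.nan, 7.15, np.nan, 6.95],
    "WQ[SO4,Dissolved]": [10.0, 11.0, 12.0, 13.0, 14.0],
    "WQ data[SO4,Dissolved]": [10.5, np.nan, 12.5, np.nan, 13.5],
})
parameters_model = ["pH", "WQ[SO4,Dissolved]"]
parameters_monitor = ["pH data", "WQ data[SO4,Dissolved]"]
# list(...) makes the pairs reusable; a bare zip object can be iterated only once.
parameters_pair = list(zip(parameters_model, parameters_monitor))
figures = plot_pairs(df_plot, "Tstamps", parameters_pair)
```

Each returned figure can then be saved with fig.savefig(...) or shown with plt.show().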

How to align y labels when using 'add_axes'?

I have created a plot with some data points (in blue), a fit (in purple) and I have managed to include the fit residuals (fit-datapoints) by using 'add_axes' as shown below:
#Plot and fit:
fig1 = plt.figure(1)
frame1 = fig1.add_axes((.1,.3,.8,.6))
plt.scatter(a/nshots,m/nshots,zorder=-1,s=1)
plt.plot(a/nshots,fit(a/nshots),color='purple')
plt.xlabel(r'$a_i/N_s$ (mV)')
plt.ylabel(r'$m_i/N_s$ (count/$N_s$)')
plt.tick_params(axis='both',which='both',direction='in',right=True,top=True)
#Residuals:
frame2=fig1.add_axes((.1,.1,.8,.2))
plt.scatter(a/nshots,m/nshots-fit(a/nshots),zorder=-1,s=1,color='pink')
plt.xlabel(r'$a_i/N_s$ (mV)')
plt.ylabel(r'residuals')
plt.tick_params(axis='both',which='both',direction='in',right=True,top=True)
However, I cannot seem to align the y labels on the resulting figure:
I have tried using things like plt.gca().yaxis.set_label_coords(-0.1,0.1) and plt.gca().yaxis.labelpad=20 but I would very much prefer an approach where alignment is automated and I need not align the labels by hand.
Thank you very much for your help.
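A possible automated approach is Figure.align_ylabels. It is designed for axes laid out on a subplot grid, so this sketch swaps add_axes for plt.subplots with height_ratios, which is a change from the code above. The data and the linear fit are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for a/nshots, m/nshots and the fit.
x = np.linspace(0, 10, 200)
rng = np.random.default_rng(1)
y = 1000 * x + rng.normal(scale=50, size=x.size)
fit = np.poly1d(np.polyfit(x, y, 1))

# Two stacked axes on a grid: main panel three times as tall as the residual panel.
fig, (ax_main, ax_res) = plt.subplots(
    2, 1, sharex=True,
    gridspec_kw={"height_ratios": [3, 1], "hspace": 0},
)
ax_main.scatter(x, y, s=1, zorder=-1)
ax_main.plot(x, fit(x), color="purple")
ax_main.set_ylabel(r"$m_i/N_s$ (count/$N_s$)")

ax_res.scatter(x, y - fit(x), s=1, color="pink", zorder=-1)
ax_res.set_xlabel(r"$a_i/N_s$ (mV)")
ax_res.set_ylabel("residuals")

for ax in (ax_main, ax_res):
    ax.tick_params(axis="both", which="both", direction="in", right=True, top=True)

# Ask matplotlib to place both y labels at the same horizontal offset.
fig.align_ylabels([ax_main, ax_res])
fig.canvas.draw()  # label positions are only computed when the figure is drawn
```

The alignment is recomputed at every draw, so it keeps working if the tick labels change width later.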

Creating a seaborn factor plot using two data sets?

I have two data sets:
https://storage.googleapis.com/hewwo/NCHS_-_Leading_Causes_of_Death__United_States.csv
https://storage.googleapis.com/hewwo/BRFSS_Prevalence_and_Trends_Data__Tobacco_Use_-_Four_Level_Smoking_Data_for_2011.csv
From these two data sets, I have created two factor plots
sns.factorplot(x='state', y='deaths', data=death, aspect = 4)
plt.xticks(rotation=90)
Result:
smoking['total_smoker'] = smoking['smoke_everyday'] + smoking['smoke_some_days']
sns.factorplot(x='state', y='total_smoker', data=smoking.sort_values("state"), aspect = 3)
plt.xticks(rotation = 90)
Result:
I am looking for a way to visually compare the two lines. Is there a way to create one factor plot using data from both sets in order to better compare the data per state? Is there maybe a better way to show this visualization than what I am thinking of? Apologies if my question is unclear; my experience with these tools is still lacking.
Just use two lineplot calls on the same axes:
fig, ax = plt.subplots()
sns.lineplot(..., ax=ax) # first dataset
sns.lineplot(..., ax=ax) # second dataset

Graph matplotlib

I'm working on my first big data project for my university. My dataset is this one: https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset
In this part I'd like to:
Take only the top 20 values of that particular column (IMDB Score and Gross)
Plot everything to see the graph.
With this code I can see the graph as shown
Top20 = newmovieDef[['IMDB Score', 'Gross']].sort_values('IMDB Score', ascending=False).nlargest(20, 'IMDB Score')
newmovieDef[['IMDB Score', 'Gross']].sort_values('IMDB Score', ascending=False).nlargest(20, 'IMDB Score')
#visualizing top 20 in plot
plt.figure(figsize=(7,7))
x = Top20["IMDB Score"]
y = Top20["Gross"]
plt.bar(x, y, color="purple")
plt.show()
But if then I write this:
#GROSS-DURATION --- PLOT PROBLEM
Top20 = newmovieDef[['Gross', 'Duration']].sort_values('Gross', ascending=False).nlargest(20, 'Gross')
newmovieDef[['Gross', 'Duration']].sort_values('Gross', ascending=False).nlargest(20, 'Gross')
#visualizing top 20 in plot
plt.figure(figsize=(7,7))
x = Top20["Gross"]
y = Top20["Duration"]
plt.bar(x, y, color="green")
plt.show()
it gives me a blank graph.
Gross and Duration are continuous variables, so a bar chart with Gross on the x-axis and Duration on the y-axis is not the right choice of visualization. To see the relationship between two continuous variables (in this case Gross and Duration), a scatter (X-Y) plot is generally used.
From this source, "Bar graphs are used to compare things between different groups or to track changes over time." The key word here is groups which means discrete variables (usually represented as strings in Python).
From the same source, "X-Y plots are used to determine relationships between the two different things. The x-axis is used to measure one event (or variable) and the y-axis is used to measure the other."
You can modify your code to show a scatter (X-Y) plot as follows:
plt.figure(figsize=(7,7))
x = Top20["Gross"]
y = Top20["Duration"]
# Scatter plot (plt.scatter draws unconnected markers; plt.plot would draw a line)
plt.scatter(x, y, color="green")
plt.show()
If you really want a bar plot, then I would suggest binning your continuous data. This breaks a continuous variable into discrete groups which can then be shown on a bar graph although this is still not the best choice for the visualization.
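The binning idea can be sketched as follows, assuming pandas.cut with four equal-width bins and the mean Duration per bin as the bar height. The numbers are made up and only stand in for the Gross and Duration columns. Using string labels on the x-axis also sidesteps a likely cause of the blank chart: bars have a default width of 0.8 data units, which is invisible when x spans hundreds of millions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for the Gross and Duration columns.
movies = pd.DataFrame({
    "Gross": [760e6, 658e6, 652e6, 623e6, 560e6, 474e6, 460e6, 458e6],
    "Duration": [178, 194, 124, 173, 152, 136, 169, 141],
})

# Split Gross into 4 equal-width intervals and average Duration within each.
bins = pd.cut(movies["Gross"], bins=4)
mean_duration = movies.groupby(bins, observed=False)["Duration"].mean()

# Turn each interval into a short text label in millions; text labels give
# evenly spaced, categorical bars.
labels = [f"{iv.left / 1e6:.0f}-{iv.right / 1e6:.0f}M" for iv in mean_duration.index]

fig, ax = plt.subplots(figsize=(7, 7))
ax.bar(labels, mean_duration.to_numpy(), color="green")
ax.set_xlabel("Gross (binned)")
ax.set_ylabel("Mean duration")
```

The number of bins is a free choice; with real data it is worth trying a few values to see how sensitive the picture is.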
This book is an exceptional (free) resource for data visualization. It's written with the R programming language, but the general principles still apply.

df.corr showing all data as 1 when read from compiled csv, even though there is data that is different

I have a compiled data frame of S&P 500 stock data in which I am trying to find correlations using df.corr(), but every pair is reported as having a correlation of '1' when I run the program. When I use a heat map to visualize the data, it shows an entirely green chart, when there should be many different positive and negative correlations.
Using Python 3.6 and Spyder
here is the code I am using:
def visualize_data():
    df = pd.read_csv('sp500_joined_closes.csv')
    pd.options.display.float_format = '{:.5f}'.format
    #df['AAPL'].plot()
    #plt.show()
    df_corr = df.corr() #creates a correlation table of our data frame. Generates correlation values
    print(df_corr.head())
    data1 = df_corr.values #gets inner values of our data frame
    fig1 = plt.figure() #specify our figures
    ax1 = fig1.add_subplot(1,1,1) #defined axis 1 by 1 plot 1
    heatmap1 = ax1.pcolor(data1, cmap=plt.cm.RdYlGn) #sets the color paramater of heat map (negative,neutral,positive)
    fig1.colorbar(heatmap1)
    ax1.set_xticks(np.arange(data1.shape[0]) + 0.5, minor=False) #sets x ticks for heat map, arranging ticks at every 0.5(half-mark)
    ax1.set_yticks(np.arange(data1.shape[1]) + 0.5, minor=False) #sets y ticks for heat map
    ax1.invert_yaxis() #removes random gap from the top of graph
    ax1.xaxis.tick_top() #moves x axis ticks to the top (meant to look more like a table)
    column_labels = df_corr.columns
    row_labels = df_corr.index
    ax1.set_xticklabels(column_labels)
    ax1.set_yticklabels(row_labels)
    plt.xticks(rotation=90)
    heatmap1.set_clim(-1,1)
    plt.tight_layout()
    #plt.savefig("correlations.png", dpi = (300))
    plt.show()

visualize_data()
The interesting thing is that I searched all over for anyone having a similar error, and I cannot seem to find any answers. Could it be that the ticker symbols could be considered categorical and therefore something is getting skewed? I'm not quite sure here, to be honest.
Even when I tried to plot the correlations for one single company against all the data as seen by #df['AAPL'].plot() and #plt.show() the same exact thing happened where the data is only registering a correlation value of 1.0000 to all of the data.
I initially thought it was a rounding error due to significant figures, so I put in pd.options.display.float_format = '{:.5f}'.format, but that didn't work and I am still getting the skewed correlation.
Here is a screenshot of the issue and the subsequent heat map
Here is a screenshot of part of the data, confirming that it isn't all the same and that it has not become corrupted in some way
The issue was with sourcing the data through the Google Finance API. There seems to have been an error downloading one of the dates for one of the S&P 500 companies, and when I compiled all of the data, including those few missing dates, it could only produce one line of data for some reason. This led to a correlation of '1', since all the data was exactly the same. I found the specific dates, added them manually, and now the program runs as it should. Thank you.
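A quick sanity check along these lines might catch this kind of problem before plotting. This is only a sketch: the helper name diagnose_closes, the ticker names, and the values are hypothetical, and it merely reports the row count, columns with missing values, and columns that are exact copies of an earlier column.

```python
import numpy as np
import pandas as pd


def diagnose_closes(df):
    """Summarise problems that can make correlations look suspiciously uniform."""
    numeric = df.select_dtypes(include="number")
    has_gap = numeric.isna().any().to_numpy()      # True where a column has NaN
    is_copy = numeric.T.duplicated().to_numpy()    # True where a column repeats an earlier one
    return {
        "n_rows": len(numeric),
        "columns_with_gaps": numeric.columns[has_gap].tolist(),
        "duplicate_columns": numeric.columns[is_copy].tolist(),
    }


# Hypothetical joined table: 'BBB' is an exact copy of 'AAA', 'CCC' has a missing date.
closes = pd.DataFrame({
    "AAA": [10.0, 10.5, 11.0, 10.8],
    "BBB": [10.0, 10.5, 11.0, 10.8],
    "CCC": [20.0, np.nan, 19.5, 19.0],
})
report = diagnose_closes(closes)
corr = closes.corr()  # identical columns correlate at 1 with each other
```

A very small n_rows or a long list of duplicate columns would point at the data join rather than at df.corr() itself.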
