Generate and save plots from rows of a data frame - python

Python newbie here.
My problem is the following. I have this (80, 1002) DataFrame of continuous data loaded from a .csv file. My goal is to go through every row of this df (80) and plot each row with a basic pyplot.plot. The first 2 columns are to be used as the title so that each plot has its own name (here, the time of the recording and the name of the electrode).
What I did to plot one row is :
import matplotlib.pyplot as plt
import pandas as pd
Location = r'/pathtothefile/name.csv'
df=pd.read_csv(Location,sep=';')
time=range(1,1001);
plt.plot(time,df.loc[0, "0":"999"],'g')
plt.axhline(0, color='black',linewidth=0.5)
plt.xlabel('Time (ms)')
plt.ylabel('Power (mV)')
plt.axis([1, 1000, -5, 5])
plt.title(str(df.iloc[0,0]) + str(df.iloc[0,1]))
# Save before plt.show(); 'row' was undefined, so save through pyplot instead
plt.savefig('/pathwheretosave/name.eps', format='eps', dpi=1000)
plt.show()
The "time" variable is plotted against each row's data. From here I want to loop over the rows of the data frame and plot and save each row in a separate file, but so far I have failed. Any idea how to do that?
Ideally I want to write the title of the plot in the name of the file to be saved.

You will want to loop over each row. This can be achieved by using the itertuples method as follows.
Example data:
sales = [{'values': [1,2,3,4], 'title': 'title 1'},
{'values': [2,3,5,7], 'title': 'title 2'},
{'values': [4,5,5,7], 'title': 'title 3'}]
df = pd.DataFrame(sales)
Produce a plot for the values in each row with the title specified in each row
for row in df.itertuples():
    plt.plot(row.values,marker='o')
    plt.title(row.title)
    plt.savefig(row.title + '.png')
    plt.clf()
The output of this is 3 separate plots (one for each row in the dataframe).
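Applied to the layout described in the question (two title columns followed by the samples), the same loop might look like the sketch below. The stand-in data, the column names `recording` and `electrode`, the PNG format, and the temporary output directory are all assumptions made for illustration. If the real titles contain characters that are not valid in file names, they would need to be cleaned first.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the loop runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data (assumption): 2 title columns followed by 1000 samples per row
rng = np.random.default_rng(0)
n_samples = 1000
df = pd.DataFrame(rng.normal(0, 1, (3, n_samples)),
                  columns=[str(i) for i in range(n_samples)])
df.insert(0, 'electrode', ['Fz', 'Cz', 'Pz'])
df.insert(0, 'recording', ['rec1', 'rec1', 'rec2'])

out_dir = tempfile.mkdtemp()           # hypothetical output folder
time = np.arange(1, n_samples + 1)     # x values shared by every row
saved = []

for i in range(len(df)):
    # Build the title from the first two columns and reuse it as the file name
    title = f"{df.iloc[i, 0]}_{df.iloc[i, 1]}"
    values = df.iloc[i, 2:].to_numpy(dtype=float)

    fig, ax = plt.subplots()
    ax.plot(time, values, 'g')
    ax.axhline(0, color='black', linewidth=0.5)
    ax.set_xlabel('Time (ms)')
    ax.set_ylabel('Power (mV)')
    ax.set_title(title)

    path = os.path.join(out_dir, title + '.png')
    fig.savefig(path)
    plt.close(fig)                     # free the figure before the next row
    saved.append(path)
```

Creating and closing one figure per row keeps memory use flat even with many rows.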

How about this? It will get a little more complicated if you want your x-axis to use time, rather than timestamp-labels, but it sounds like you are taking 1 measurement per second, across your electrodes.
import pandas as pd
import datetime
import matplotlib.pyplot as plt
# Make a sample DataFrame
ts = datetime.datetime.now()
df = pd.DataFrame({'time': [ts, ts, ts, ts],
'electrode': [1, 2, 3, 4],
1: [0.1, 0.1, 0.1, 0.1],
2: [0.22, 0.2, 0.2, 0.2],
3: [0.37, 0.3, 0.3, 0.3]},
columns = ['time', 'electrode', 1, 2, 3] )
number_of_measurements = df.shape[1] - 2
for i in range(0, len(df)):
    fig, ax = plt.subplots(figsize=(8, 8))
    df.iloc[i][2:].plot(ax=ax, xticks=range(1, number_of_measurements + 1, 1))
    # Label the axes through the Axes object ('plot' was undefined)
    ax.set_xlabel("Time (ms)")
    ax.set_ylabel("Power (mV)")
    fig.suptitle('{} electrode:{}'.format(df.iloc[i].time, df.iloc[i].electrode))
    fig.savefig('plot{}-{}.png'.format(df.iloc[i].time, df.iloc[i].electrode), bbox_inches='tight')

Related

Sorting the bars in the barchart based on the values in y axis

I am doing a guided project on DataCamp that explores the market cap of various cryptocurrencies over time. Even though I know other ways to get the output, I am sticking to the proposed method.
I need to make a bar graph of the top 10 cryptocurrencies (x axis) and their share of market cap (y axis). I am able to get the desired output, but I want to go one step further and sort the bars in descending order. Right now, they are sorted by the first letter of each cryptocurrency's symbol. Here is the code:
#Declaring these now for later use in the plots
TOP_CAP_TITLE = 'Top 10 market capitalization'
TOP_CAP_YLABEL = '% of total cap'
# Selecting the first 10 rows and setting the index
cap10 = cap.iloc[:10,]
# Calculating market_cap_perc
cap10 = cap10.assign(market_cap_perc = round(cap10['market_cap_usd']/sum(cap['market_cap_usd'])*100,2))
# Plotting the barplot with the title defined above
fig, ax = plt.subplots(1,1)
ax.bar(cap10['symbol'], cap10['market_cap_perc'])
ax.set_title(TOP_CAP_TITLE)
ax.set_ylabel(TOP_CAP_YLABEL)
plt.show()
I've replicated your code with dummy data and produced the plot. Is this the sorted plot you're looking for? You only need to sort the DataFrame using df.sort_values():
import pandas as pd
import matplotlib.pyplot as plt
d = {'BCH': 8, 'BTC': 55, 'ETH': 12, 'MIOTA': 4, 'ADA': 0.5, 'BTG': 0.8, 'XMR': 0.7, 'DASH': 1, 'LTC': 0.99, 'XRP': 2.5}
cap = pd.DataFrame({'symbol': list(d.keys()), 'market_cap_perc': list(d.values())})
#Declaring these now for later use in the plots
TOP_CAP_TITLE = 'Top 10 market capitalization'
TOP_CAP_YLABEL = '% of total cap'
# Selecting the first 10 rows and setting the index
cap10 = cap.iloc[:10,]
# Calculating market_cap_perc
# cap10 = cap10.assign(market_cap_perc = round(cap10['market_cap_usd']/sum(cap['market_cap_usd'])*100,2))
cap10 = cap10.sort_values('market_cap_perc', ascending=False) #add this line
# Plotting the barplot with the title defined above
fig, ax = plt.subplots(1,1)
ax.bar(cap10['symbol'], cap10['market_cap_perc'])
ax.set_title(TOP_CAP_TITLE)
ax.set_ylabel(TOP_CAP_YLABEL)
plt.show()
You can sort cap10 before plotting:
cap10 = cap10.sort_values(by='market_cap_perc', ascending=False)
fig, ax = plt.subplots(1,1)
...

python plotly - multiple fragmented lines in same trace

I currently have the following code:
import pandas as pd
df = pd.DataFrame({'x1': [1,7,15], 'x2': [5,10,20]})
df
import plotly.graph_objects as go
fig = go.Figure()
for row in df.iterrows():
    row_data = row[1]
    fig.add_trace(go.Scatter(x=[row_data['x1'], row_data['x2']], y=[0,0], mode='lines',
                             line={'color': 'black'}))
fig.update_layout(showlegend=False)
fig.show()
This produces the required result. However, if I have 30k traces, things start to get pretty slow, both when rendering and when working with the plot (zooming, panning). So I'm trying to figure out a better way to do it. I thought of using shapes, but then I lose some functionality that only traces have (e.g. hover information), and I'm also not sure it would be faster. Is there some other way to produce fragmented (non-overlapping) lines within one trace?
Thanks!
Update:
Based on the accepted answer by @Mangochutney, here is how I was able to produce the same plot using a single trace:
import numpy as np
import plotly.graph_objects as go
x = [1, 5, np.nan, 7, 10, np.nan, 15, 20]
y = [0, 0, np.nan, 0, 0, np.nan, 0, 0]
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='lines'))
fig.update_layout(showlegend=True)
fig.show()
By default, you can introduce gaps in a go.Scatter line plot by adding np.nan entries where you need them. This behavior is controlled by the connectgaps parameter (see the docs).
E.g.: go.Scatter(x=[0,1,np.nan, 2, 3], y=[0,0,np.nan,0,0], mode='lines')
should create one line segment between 0 and 1 and another between 2 and 3.
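For a data frame with many segments, the NaN-separated arrays can be built without a Python loop. The following is a minimal sketch, assuming the `x1`/`x2` columns from the question; the helper name `nan_separated` is made up for illustration and is not part of Plotly or pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x1': [1, 7, 15], 'x2': [5, 10, 20]})

def nan_separated(starts, ends):
    """Interleave starts and ends with NaN separators: s0, e0, nan, s1, e1, nan, ..."""
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    out = np.full(3 * len(starts), np.nan)
    out[0::3] = starts   # every third slot, beginning at 0, holds a segment start
    out[1::3] = ends     # the slot after it holds the segment end
    return out[:-1]      # the trailing NaN is not needed

x = nan_separated(df['x1'], df['x2'])
y = np.where(np.isnan(x), np.nan, 0.0)  # all segments lie on y = 0, gaps stay NaN

# x and y can then be passed to a single trace,
# e.g. go.Scatter(x=x, y=y, mode='lines')
```

This reproduces the hand-written `x` and `y` lists from the update in the question.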
You need first to find the overlapping lines. Then you can reduce the size of the data frame drastically. First, let us define a sample data frame like yours:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
x_upperbound = 100_000
data = {'x1': [], 'x2': []}
for i in range(30_000):
    start = np.random.randint(1, x_upperbound-10)
    end = np.random.randint(start, start+4)
    data['x1'].append(start)
    data['x2'].append(end)
df = pd.DataFrame(data)
Then, using the following code, we can find a reduced (by one third) but equivalent version of the original data frame introduced above:
l = np.zeros(x_upperbound+2)
for i, row in enumerate(df.iterrows()):
    l[row[1]['x1']] += 1
    l[row[1]['x2']+1] -= 1
cumsum = np.cumsum(l)
new_data = {'x1': [], 'x2': []}
flag = False
for i in range(len(cumsum)):
    if cumsum[i]:
        if flag:
            continue
        new_data['x1'].append(i)
        flag = True
    else:
        if flag:
            new_data['x2'].append(i-1)
            flag = False
optimized_df = pd.DataFrame(new_data)
And now is show time. Using this code, you can show the exact result you would have gotten if you had graphed the original data frame:
fig = go.Figure()
for row in optimized_df.iterrows():
    row_data = row[1]
    fig.add_trace(go.Scatter(x=[row_data['x1'], row_data['x2']], y=[0,0], mode='lines',
                             line={'color': 'black'}))
fig.update_layout(showlegend=False)
fig.show()
It takes more time if either the distance between any x1 and its respective x2 decreases or their domain expands further.
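The same difference-array idea can be written with vectorized NumPy calls instead of Python loops. This is only a sketch under the same assumptions as the code above (non-negative integer endpoints with x1 <= x2); the function name `merge_integer_intervals` is hypothetical. Like the loop version, it also merges intervals that merely touch on the integer grid, such as [1, 3] and [4, 5].

```python
import numpy as np

def merge_integer_intervals(x1, x2):
    """Merge overlapping (or touching) integer intervals [x1, x2]."""
    x1 = np.asarray(x1, dtype=np.int64)
    x2 = np.asarray(x2, dtype=np.int64)

    # +1 where an interval starts, -1 just after it ends
    delta = np.zeros(x2.max() + 2, dtype=np.int64)
    np.add.at(delta, x1, 1)        # np.add.at handles repeated indices correctly
    np.add.at(delta, x2 + 1, -1)

    # A position is covered when the running sum is positive
    covered = np.cumsum(delta) > 0

    # Locate where coverage switches on and off
    padded = np.concatenate(([False], covered, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts = edges[0::2]           # coverage switches on
    ends = edges[1::2] - 1         # last covered position before it switches off
    return starts, ends

starts, ends = merge_integer_intervals([1, 7, 15, 3], [5, 10, 20, 8])
```

The merged `starts` and `ends` could then be plotted as a single NaN-separated trace rather than one trace per interval.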

How to annotate points in a scatterplot based on a pandas column

I want 'Age' on the x-axis, 'Pos' on the y-axis, and the 'Player' names as labels, but for some reason I am not able to label the points.
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import adjustText as at
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
y='Pos',
c='DarkBlue', xticks=([15,20,25,30,35,40]))
y = df.Player
texts = []
for i, txt in enumerate(y):
    plt.text()
at.adjust_text(texts, arrowprops=dict(arrowstyle="simple, head_width=0.25, tail_width=0.05", color='black', lw=0.5, alpha=0.5))
plt.show()
Summary of the data :
df.head()
             Player Pos  Age
0  Thibaut Courtois  GK   28
1     Karim Benzema  FW   32
2      Sergio Ramos  DF   34
3    Raphael Varane  DF   27
4       Luka Modric  MF   35
Error :
ConversionError: Failed to convert value(s) to axis units: 'GK'
This is the plot so far; not able to label these points:
EDIT:
This is what I wanted but of all points:
Also, could anyone help me reorder the labels on the y-axis?
I wanted FW, MF, DF, GK as my order, but the plot shows MF, DF, FW, GK.
Thanks.
A similar solution was described here. Essentially, you want to annotate the points in your scatter plot.
I have stripped down your code. Note that you need to plot the data with matplotlib directly (and not through pandas) after building df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age']). That way, you can use the annotate() method.
import matplotlib.pyplot as plt
import pandas as pd
# build data
data = [
['Thibaut Courtois', 'GK', 28],
['Karim Benzema', 'FW', 32],
['Sergio Ramos','DF', 34],
['Raphael Varane', 'DF', 27],
['Luka Modric', 'MF', 35],
]
# create pandas DataFrame
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
# open figure + axis
fig, ax = plt.subplots()
# plot
ax.scatter(x=df['Age'],y=df['Pos'],c='DarkBlue')
# set labels
ax.set_xlabel('Age')
ax.set_ylabel('Pos')
# annotate points in axis
for idx, row in df.iterrows():
    ax.annotate(row['Player'], (row['Age'], row['Pos']) )
# force matplotlib to draw the graph
plt.show()
This is what you'll get as output:
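The question also asks about the order of the categories on the y-axis. One possible approach, sketched here, is to map each position to an integer in the desired order and relabel the ticks. The `order` list (read bottom to top) and the small label offset are assumptions, since the question does not say which end of the axis FW should sit on.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [['Thibaut Courtois', 'GK', 28],
     ['Karim Benzema', 'FW', 32],
     ['Sergio Ramos', 'DF', 34],
     ['Raphael Varane', 'DF', 27],
     ['Luka Modric', 'MF', 35]],
    columns=['Player', 'Pos', 'Age'])

# Desired category order, bottom to top (assumption)
order = ['GK', 'DF', 'MF', 'FW']
pos_to_y = {pos: i for i, pos in enumerate(order)}
y = df['Pos'].map(pos_to_y)

fig, ax = plt.subplots(figsize=(7, 3))
ax.scatter(df['Age'], y, c='DarkBlue')

# Show the category names instead of the integer codes
ax.set_yticks(range(len(order)))
ax.set_yticklabels(order)
ax.set_xlabel('Age')
ax.set_ylabel('Pos')

# Label every point, nudged a few points away from the marker
for row in df.itertuples():
    ax.annotate(row.Player, (row.Age, pos_to_y[row.Pos]),
                xytext=(5, 5), textcoords='offset points')
```

Reversing `order` flips the axis if FW should be at the bottom instead.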

How to make a 1-cell table headers with Matplotlib?

I am using Matplotlib's PdfPages to plot various figures and tables from queried data and generate a PDF. I want to group plots into sections such as "Stage 1", "Stage 2", and "Stage 3" by essentially creating section headers. For example, in a Jupyter notebook I can make cells Markdown and create bold headers. However, I am not sure how to do something similar with PdfPages. One idea I had was to generate a 1-cell table containing the section title, but instead of a 1-cell table, I get one cell per character in the title.
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 2))
ax = plt.subplot(111)
ax.axis('off')
tab = ax.table(cellText=['Stage 1'], bbox=[0, 0, 1, 1])
tab.auto_set_font_size(False)
tab.set_fontsize(24)
This results in the following output:
If anyone has suggestions for how to create section headers or at least fix the cell issue in the table I created, I would appreciate your input. Thanks!
You need to use colLabels to name the columns and pass cellText with a corresponding shape:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 2))
ax = plt.subplot(111)
ax.axis('off')
length = 7
colLabels = ['Stage %s' %i for i in range(1,length+1)] # <--- 1 row, 7 columns
cellText = np.random.randint(0, 10, (1,length))
tab = ax.table(cellText=cellText, colLabels=colLabels, bbox=[0, 0, 1, 1], cellLoc = 'center')
tab.auto_set_font_size(False)
tab.set_fontsize(14)
Table with multiple rows
cellText = np.random.randint(0, 10, (3,length)) # <--- 3 rows, 7 columns
tab = ax.table(cellText=cellText, colLabels=colLabels, bbox=[0, 0, 1, 1], cellLoc = 'center')
To get a single row with multiple columns starting from 2 rows, 7 columns
tab = ax.table(cellText=[['']*length], colLabels=colLabels, bbox=[0, 0, 1, 1], cellLoc = 'center')
cells=tab.get_celld()
for i in range(length):
    cells[(1,i)].set_height(0)
To get a single column, use the following in the code above:
length = 1
which produces:
A table expects two dimensional cellText. I.e. the mth column of the nth row has the content cellText[n][m]. If cellText=['Stage 1'], cellText[0][0] will evaluate to "S", because there is one row and the string inside is indexed as the columns. Instead you probably want to use
ax.table(cellText=[['Stage 1']])
i.e. the whole text as the first column of the first row.
Now the underlying question seems to be how to add a section title, and maybe using a table for that is not the best approach? At least a similar result could be achieved with a usual text,
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 2))
ax.tick_params(labelleft=False, left=False, labelbottom=False, bottom=False)
ax.annotate('Stage 1', (.5,.5), ha="center", va="center", fontsize=24)
plt.show()
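Following the plain-text idea, a section header can be written to the PDF as a small page of its own before the plots of that section. This is a sketch rather than a definitive layout: the helper `add_section_page`, the file name `sections_demo.pdf`, the temporary directory, and the dummy plots are all assumptions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def add_section_page(pdf, title, figsize=(12, 2)):
    """Write a page that contains only a centered, bold section title."""
    fig = plt.figure(figsize=figsize)
    fig.text(0.5, 0.5, title, ha="center", va="center",
             fontsize=24, fontweight="bold")
    pdf.savefig(fig)
    plt.close(fig)

pdf_path = os.path.join(tempfile.mkdtemp(), "sections_demo.pdf")

with PdfPages(pdf_path) as pdf:
    for stage in ("Stage 1", "Stage 2"):
        add_section_page(pdf, stage)          # header page for this section

        fig, ax = plt.subplots()              # dummy plot standing in for real content
        ax.plot([0, 1, 2], [0, 1, 4])
        ax.set_title(f"{stage}: example plot")
        pdf.savefig(fig)
        plt.close(fig)

    page_count = pdf.get_pagecount()
```

Each page in the PDF takes the size of the figure saved to it, so the header pages can be much shorter than the plot pages.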
I may be misunderstanding your question, but if your ultimate goal is to group multiple plots together in PDF, one solution is to make each of your plots a subplot of the same figure. For example:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import random
# Declare the PDF file and the single figure
pp = PdfPages('test.pdf')
thefig = plt.figure()
thefig.suptitle("Group 1")
# Generate 4 subplots for the same figure, arranged in a 2x2 grid
subplots = [ ["Plot One", 221], ["Plot Two", 222],
["Plot Three", 223], ["Plot Four", 224] ]
for [subplot_title, grid_position] in subplots:
    plt.subplot(grid_position)
    plt.title(subplot_title)
    # Make a random bar graph:
    plt.bar(range(1,11), [ random.random() for i in range(10) ])
# Add some spacing, so that the writing doesn't overlap
plt.subplots_adjust(hspace=0.35, wspace=0.35)
# Finish
pp.savefig()
pp.close()
When I do this, I get something like the following:

Sharing two y axes on multiple matplotlib subplots [duplicate]

This question already has an answer here:
How to share secondary y-axis between subplots in matplotlib
(1 answer)
Closed 5 years ago.
My goal is to have two rows and three columns of plots using matplotlib. Each graph in the top row will contain two data series, and two y-axes. I want to make the scales on each axis line up so that the corresponding data series are directly comparable. Right now I have it so that the primary y-axis on each graph is aligned, but I can't get the secondary y-axes to align. Here is my current code:
import matplotlib.pyplot as plt
import pandas as pd
excel_file = 'test_data.xlsx'
sims = ['Sim 02', 'Sim 01', 'Sim 03']
if __name__ == '__main__':
    data = pd.read_excel(excel_file, skiprows=[0, 1, 2, 3], sheetname=None, header=1, index_col=[0, 1], skip_footer=10)
    plot_cols = len(sims)
    plot_rows = 2
    f, axes = plt.subplots(plot_rows, plot_cols, sharex='col', sharey='row')
    secondary_ax = []
    for i, sim in enumerate(sims):
        df = data[sim]
        modern = df.loc['Modern']
        traditional = df.loc['Traditional']
        axes[0][i].plot(modern.index, modern['Idle'])
        secondary_ax.append(axes[0][i].twinx())
        secondary_ax[i].plot(modern.index, modern['Work'])
        axes[1][i].bar(modern.index, modern['Result'])
        axes[0][i].set_xlim(12, 6)
        if i > 0:
            secondary_ax[0].get_shared_y_axes().join(secondary_ax[0], secondary_ax[i])
    # secondary_ax[0].get_shared_y_axes().join(x for x in secondary_ax)
    plt.show()
The approaches I tried (both the line in the if statement and the last line before plt.show()) came from answers to similar questions, but they didn't resolve my issue. Nothing breaks; the secondary axes just aren't aligned.
I also tried adding an extra row in the plt.subplots call and using twinx() to combine the first two rows, but it created an empty second row of plots nonetheless.
As a fallback I think I could manually check each axes for its max and min and loop through them to update the limits, but I'd love to find a cleaner solution if one is out there and anyone has any insight. Thanks.
You just need to share the y axes before plotting your data:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# excel_file = 'test_data.xlsx'
sims = ['Sim 02', 'Sim 01', 'Sim 03']
if __name__ == '__main__':
    # data = pd.read_excel(excel_file, skiprows=[0, 1, 2, 3], sheetname=None, header=1, index_col=[0, 1], skip_footer=10)
    modern = pd.DataFrame(np.random.randint(0, 100, (100, 3)), columns=sims)
    traditional = pd.DataFrame(np.random.randint(10, 30, (100, 3)), columns=sims)
    traditional[sims[1]] = traditional[sims[1]] + 40
    traditional[sims[2]] = traditional[sims[2]] - 40
    data3 = pd.DataFrame(np.random.randint(0, 100, (100, 3)), columns=sims)
    plot_cols = len(sims)
    plot_rows = 2
    f, axes = plt.subplots(plot_rows, plot_cols, sharex='col', sharey='row', figsize=(30, 10))
    secondary_ax = []
    for i, sim in enumerate(sims):
        # df = data[sim]  # disabled: 'data' only exists when the Excel file is loaded
        modern_series = modern[sim]
        traditional_series = traditional[sim]
        idle = data3
        axes[0][i].plot(modern_series.index, modern_series)
        secondary_ax.append(axes[0][i].twinx())
        # Share the secondary y axes before plotting on them
        if i > 0:
            secondary_ax[0].get_shared_y_axes().join(secondary_ax[0], secondary_ax[i])
        secondary_ax[i].plot(traditional_series.index, traditional_series)
        # axes[1][i].bar(data3.index, data3)
        axes[0][i].set_xlim(12, 6)
    plt.show()
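In newer Matplotlib releases, `get_shared_y_axes().join(...)` is, as far as I know, deprecated and may no longer be available. The sketch below uses `Axes.sharey` for the same purpose, assuming Matplotlib 3.3 or later. The random data and the single-row layout are stand-ins for illustration, not the question's actual data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(1, 3, sharey=True, figsize=(12, 4))

twins = []
for i, ax in enumerate(axes):
    # Primary series on the left axis (already shared through sharey=True)
    ax.plot(rng.integers(0, 100, 50))

    # Secondary axis on the right; tie it to the first twin before plotting
    twin = ax.twinx()
    if twins:
        twin.sharey(twins[0])
    twin.plot(rng.integers(10, 30, 50) + 40 * i, color='tab:orange')
    twins.append(twin)

# All secondary axes now report the same limits
limits = [t.get_ylim() for t in twins]
```

Since the twins share one y-axis, autoscaling takes the data from all three panels into account, which keeps the right-hand scales directly comparable.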
