Matplotlib: Same title for 8 plots plotted using loop - python

I have the following code, which generates 8 plots. I want to put the phases as titles, one per plot. I have succeeded in putting a phase on each plot, but instead of the corresponding phase it always shows the last one. The 8phases.txt file has the following 8 lines, which I want to use as the titles:
-1 1 -1
-1 1 1
1 1 1
1 -1 1
-1 -1 -1
1 1 -1
1 -1 -1
-1 -1 1
Here is the code:
import numpy as np
import matplotlib.pyplot as plt

D = 12
n = np.arange(1, 4)
x = np.linspace(-D/2, D/2, 3000)
I = np.array([125, 300, 75])
phase = np.genfromtxt('8phases.txt')
I_phase = I * phase

for i in I_phase:
    F = sum(m*np.cos(2*np.pi*l*x/D) for m, l in zip(i, n))
    f, (ax1, ax2) = plt.subplots(2)
    for row in phase:
        ax1.plot(x, F, 'g')
        ax1.set_title(row)
plt.show()

I think your inner-most loop is unnecessary: it recreates the same plot 8 times and sets the title 8 times, once for each of the 8 values, so every plot ends up showing the last phase.
If I understood what you are asking for, I believe this gives the correct result:
...
for index, i in enumerate(I_phase):
    F = sum(m*np.cos(2*np.pi*l*x/D) for m, l in zip(i, n))
    f, (ax1, ax2) = plt.subplots(2)
    ax1.plot(x, F, 'g')
    ax1.set_title(phase[index])
...
(I would normally use "i" instead of "index", but you had already used "i")
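For reference, a complete, self-contained version of that fix; it is just the question's own setup with the corrected loop, nothing else changed:

import numpy as np
import matplotlib.pyplot as plt

D = 12
n = np.arange(1, 4)
x = np.linspace(-D/2, D/2, 3000)
I = np.array([125, 300, 75])
phase = np.genfromtxt('8phases.txt')
I_phase = I * phase

# One figure per phase row; the title comes from the same row
# that produced the curve, so titles and plots stay in sync.
for index, i in enumerate(I_phase):
    F = sum(m*np.cos(2*np.pi*l*x/D) for m, l in zip(i, n))
    f, (ax1, ax2) = plt.subplots(2)
    ax1.plot(x, F, 'g')
    ax1.set_title(phase[index])
plt.show()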

Related

How do I plot an interaction graph, like a schemaball, from a table showing correlation/interaction data in python?

I've got a table with data from which I'd like to show the interactions in an informative way.
I have counted the interactions between different people and entered the counts in a table, which looks like the one below.
Ideally, I'd like to visualise this data in interesting ways (if you know more, please let me know!). I found some examples, such as schemaball plots, and I'd like to create one from this data myself.
I found some tutorials online; however, I can't get them to work, as I am unable to input my data the right way into an NX graph: when iterating through the table, I end up attaching wrong ends to each other or skipping data.
data:

    A  B  C  D  E  F
A   x  2  1  3  0  0
B   2  x  0  4  5  1
C   1  0  x  3  0  2
D   3  4  3  x  1  1
E   0  5  0  1  x  1
F   0  1  2  1  1  x
Best-effort code:

import matplotlib.pyplot as plt
import networkx as nx
import matplotlib

namelist = []
for i in range(0, len(systeem)):
    namelist.append(systeem.iloc[i, 0])

G = nx.Graph()
G.add_nodes_from(namelist)

weightlist = []
for i in range(0, len(namelist)):
    for j in range(1, len(namelist)):
        if int(systeem.iloc[i, j]) > 0:
            W = int(systeem.iloc[i, j])
            weightlist.append(W)
            G.add_edge(namelist[i-1], namelist[j], weight=W)
        else:
            continue

plt.figure(figsize=(40, 40))
pos = nx.circular_layout(G)
cmap = matplotlib.cm.get_cmap('plasma_r')
nx.draw_networkx(G, pos, width=1, node_color="blue", edge_cmap=cmap, with_labels=False)
labels_pos = {name: [pos_list[0], pos_list[1]-0.04] for name, pos_list in pos.items()}
nx.draw_networkx_labels(G, labels_pos, font_size=40, font_family="sans-serif", font_color="#000000", font_weight="bold")
ax = plt.gca()
ax.margins(0.25)
plt.axis("equal")
plt.tight_layout()
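No answer was posted here, but from the description the indexing looks like the likely culprit: the inner loop starts at 1, and the edge uses namelist[i-1], so edges end up shifted by one. A minimal sketch of a corrected loop, assuming systeem is a DataFrame whose first column holds the names, whose count columns follow in the same order, and whose diagonal holds 'x':

G = nx.Graph()
G.add_nodes_from(namelist)
for i in range(len(namelist)):
    for j in range(i + 1, len(namelist)):  # upper triangle: no self-loops, no duplicate edges
        cell = systeem.iloc[i, j + 1]      # j + 1 skips the name column
        if str(cell) != 'x' and int(cell) > 0:
            G.add_edge(namelist[i], namelist[j], weight=int(cell))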

Determine number of consecutive identical points in a grid

I have a dataframe, and the grid size is 12*8.
I want to calculate the number of consecutive red dots (in the vertical direction only) and put it in a new column (consecutive red); for blue dots it will be zero.
For example:

X  y  red/blue  consecutive red
1  1  blue      0
1  3  red       3
1  4  red       3
1  2  blue      0
1  5  red       3
9  4  red       5
I already have the data for the first 3 columns. My attempt so far:

from sklearn.neighbors import BallTree

red_points = df[df['red/blue'] == 'red']
blue_points = df[df['red/blue'] != 'red']
tree = BallTree(red_points[['x', 'y']], leaf_size=40, metric='minkowski')
distance, index = tree.query(df[['x', 'y']], k=2)
I am not aware of such an algorithm (there may very well be one), but writing it yourself isn't that hard (I work with numpy because I'm used to it, because it is easy to accelerate with CUDA, and because it ports to other Python data science tools).
The data (0=blue, 1=red):
import numpy as np
import pandas as pd
# Generating dummy data for testing
ROWS=10
COLS=20
X = np.random.randint(2, size=(ROWS, COLS))
# Visualizing
df = pd.DataFrame(data=X)
bg='background-color: '
df.style.apply(lambda x: [bg+'red' if v>=1 else bg+'blue' for v in x])
The algorithm:

result = np.zeros((ROWS, COLS), dtype=int)
for y, x in np.ndindex(X.shape):
    if X[y, x] == 0:
        continue
    cons = 1  # consecutive count, including the current cell
    # Going backward while we can
    prev = y - 1
    while prev >= 0:
        if X[prev, x] == 0:
            break
        cons += 1
        prev -= 1
    # Going forward while we can
    nxt = y + 1
    while nxt <= ROWS - 1:
        if X[nxt, x] == 0:
            break
        cons += 1
        nxt += 1
    result[y, x] = cons

df2 = pd.DataFrame(data=result)
df2.style.apply(lambda x: [bg+'red' if v >= 1 else bg+'blue' for v in x])
And the result:
Please note that in numpy the first coordinate is the row index (y in your case) and the second is the column (x in your case); you can transpose your data if you want to swap to x, y.
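If you then want the counts as the new column from your example, a minimal sketch, assuming your dataframe's x and y columns are zero-based indices into the same grid:

# Hypothetical: look up each point's count in the result grid
df['consecutive red'] = [result[y, x] for x, y in zip(df['x'], df['y'])]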

Calculate gap between two datasets (pandas, matplotlib, fill_between already used)

I'd like to ask for suggestions on how to calculate the length of the gap between two datasets plotted with matplotlib from a pandas dataframe. Ideally, I would like to have these gap values written on the plot and, if possible, also included in the dataframe.
Here is my simplified example of dataframe:
import pandas as pd
d = {'Mean-1': [0.195842, 0.295069, 0.321345, 0.773725], 'SEM-1': [0.001216, 0.002687, 0.005267, 0.029974], 'Mean-2': [0.143103, 0.250505, 0.305767, 0.960804],'SEM-2': [0.000959, 0.001368, 0.003722, 0.150025], 'Atom Number': [1, 3, 5, 7]}
df=pd.DataFrame(d)
df
Mean-1 SEM-1 Mean-2 SEM-2 Atom Number
0 0.195842 0.001216 0.143103 0.000959 1
1 0.295069 0.002687 0.250505 0.001368 3
2 0.321345 0.005267 0.305767 0.003722 5
3 0.773725 0.029974 0.960804 0.150025 7
Then I made the plot, where we can see two lines representing Mean-1 and Mean-2, each with a shaded area around it representing the standard error of the mean. This is done for the selected atom numbers.
import matplotlib.pyplot as plt
ax = df.plot(x='Atom Number', y=['Mean-1','Mean-2'])
y_1 = df['Mean-1']
y_2 = df['Mean-2']
x = df['Atom Number']
error_1 = df['SEM-1']
error_2 = df['SEM-1']  # note: probably intended df['SEM-2']
ax.fill_between(df['Atom Number'], y_1-error_1, y_1+error_1, alpha=0.2, edgecolor='#CC4F1B', facecolor='#FF9848')
ax.fill_between(df['Atom Number'], y_2-error_2, y_2+error_2, alpha=0.2, edgecolor='#3F7F4C', facecolor='#7EFF99')
plt.xticks(x)
What I would like to do further is calculate the gap for each residue. The gap is the white space only, i.e. the space where neither the lines nor the shaded areas (SEMs) overlap.
I would also like to know if I can somehow print the gap values on the plot, and save them into a column. Thank you for suggestions.
It's not a compact solution, but you could try something like this (check the order of things). First calculate all the positions (each y_i with its upper and lower limits):
import numpy as np
df['y1_upper'] = y_1+error_1
df['y1_lower'] = y_1-error_1
df['y2_upper'] = y_2+error_2
df['y2_lower'] = y_2-error_2
which gives
Mean-1 SEM-1 Mean-2 SEM-2 Atom Number y1_upper y1_lower \
0 0.195842 0.001216 0.143103 0.000959 1 0.197058 0.194626
1 0.295069 0.002687 0.250505 0.001368 3 0.297756 0.292382
2 0.321345 0.005267 0.305767 0.003722 5 0.326612 0.316078
3 0.773725 0.029974 0.960804 0.150025 7 0.803699 0.743751
y2_upper y2_lower
0 0.144319 0.141887
1 0.253192 0.247818
2 0.311034 0.300500
3 0.990778 0.930830
The distances (gaps) are calculated differently depending on whether y_1 is above y_2 or vice versa, so use conditions on the upper and lower limits, and use linalg.norm to compute the distance.
conditions = [
    (df['y1_lower'] >= df['y2_upper']),
    (df['y1_lower'] < df['y2_upper'])]
choices = [np.linalg.norm(df['y1_lower']-df['y2_upper']), np.linalg.norm(df['y2_lower']-df['y1_upper'])]
df['dist'] = np.select(conditions, choices)
This gives
Mean-1 SEM-1 Mean-2 SEM-2 Atom Number y1_upper y1_lower \
0 0.195842 0.001216 0.143103 0.000959 1 0.197058 0.194626
1 0.295069 0.002687 0.250505 0.001368 3 0.297756 0.292382
2 0.321345 0.005267 0.305767 0.003722 5 0.326612 0.316078
3 0.773725 0.029974 0.960804 0.150025 7 0.803699 0.743751
y2_upper y2_lower dist
0 0.144319 0.141887 0.255175
1 0.253192 0.247818 0.255175
2 0.311034 0.300500 0.255175
3 0.990778 0.930830 0.149605
As I said, check the order, but this is a possible solution.
IIUC, you want something like this:
import matplotlib.pyplot as plt

ax = df.plot(x='Atom Number', y=['Mean-1', 'Mean-2'], figsize=(15, 8))
y_1 = df['Mean-1']
y_2 = df['Mean-2']
x = df['Atom Number']
error_1 = df['SEM-1']
error_2 = df['SEM-1']  # note: carried over from the question; probably intended df['SEM-2']
ax.fill_between(df['Atom Number'], y_1-error_1, y_1+error_1, alpha=0.2, edgecolor='#CC4F1B', facecolor='#FF9848')
ax.fill_between(df['Atom Number'], y_2-error_2, y_2+error_2, alpha=0.2, edgecolor='#3F7F4C', facecolor='#7EFF99')
ax.fill_between(df['Atom Number'], y_1+error_1, y_2-error_2, alpha=.2, edgecolor='k', facecolor='blue')
for i in range(len(x)):
    gap = y_1[i] + error_1[i] - y_2[i] - error_2[i]
    ylabel = min(y_1[i], y_2[i]) + abs(gap) / 2
    _ = ax.annotate(f'{gap:0.4f}', xy=(x[i], ylabel), xytext=(x[i]-.14, y_1[i]+gap/abs(gap)*.2), arrowprops=dict(arrowstyle="-"))
plt.xticks(x);
Output: (plot of both means with their shaded SEM bands, the gap region shaded in blue, and each gap value annotated)
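If you also want those values in the dataframe (as the question asks), the same formula from the loop can be applied column-wise; a minimal sketch:

df['gap'] = (y_1 + error_1) - (y_2 + error_2)  # same quantity as `gap` in the loop above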

with VTK Python API, add multiple scalars to unstructured grid cells

This Python script:
import numpy as np
import vtk
from vtk.util.numpy_support import numpy_to_vtk
# Open a file, and create an unstructured grid.
filename = 'example.vtk'
writer = vtk.vtkUnstructuredGridWriter()
writer.SetFileName(filename)
grid = vtk.vtkUnstructuredGrid()
# Create 3 points
A,B,C = (0,0,0), (0,1,0), (1,0,0)
points = np.array( (A,B,C) )
vtk_points = vtk.vtkPoints()
vtk_points.SetData( numpy_to_vtk(points) )
grid.SetPoints(vtk_points)
# Cells: just 1 triangle
ntriangles = 1
npoints_per_triangle = 3
cells = np.array( [npoints_per_triangle, 0, 1, 2] )
vtk_cells = vtk.vtkCellArray()
id_array = vtk.vtkIdTypeArray()
id_array.SetVoidArray(cells, len(cells), 1)
vtk_cells.SetCells(ntriangles, id_array)
# Cell types: just 1 triangle.
cell_types = np.array( [vtk.VTK_TRIANGLE] , 'B')
vtk_cell_types = numpy_to_vtk(cell_types)
# Cell locations: the triangle is in `cells` at index 0.
cell_locations = np.array( [0,])
vtk_cell_locations = numpy_to_vtk(cell_locations, deep=1,
                                  array_type=vtk.VTK_ID_TYPE)
# Cells: add to grid
grid.SetCells(vtk_cell_types, vtk_cell_locations, vtk_cells)
data = grid.GetCellData()
# Add scalar data to the triangle
data.SetActiveScalars('foo')
foo = np.array( [11.,] )
vtk_foo = numpy_to_vtk(foo)
vtk_foo.SetName("foo")
data.SetScalars(vtk_foo)
# Add other scalar data to the triangle
data.SetActiveScalars('bar')
bar = np.array( [12.,] )
vtk_bar = numpy_to_vtk(bar)
vtk_bar.SetName("bar")
data.SetScalars(vtk_bar)
# Write to file.
writer.SetInput(grid)
writer.Write()
print open(filename).read()
It produces this file:
# vtk DataFile Version 3.0
vtk output
ASCII
DATASET UNSTRUCTURED_GRID
POINTS 3 long
0 0 0 0 1 0 1 0 0
CELLS 1 4
3 0 1 2
CELL_TYPES 1
5
CELL_DATA 1
SCALARS bar double
LOOKUP_TABLE default
12
FIELD FieldData 1
foo 1 1 double
11
But I want the CELL_DATA section to be:
CELL_DATA 1
SCALARS foo double
LOOKUP_TABLE default
11
SCALARS bar double
LOOKUP_TABLE default
12
Edit
Looking at the source code (WriteCellData, WriteScalarData and deeper), it seems impossible.
You can add as many arrays as you want by using AddArray instead of SetActiveScalars.
See also http://public.kitware.com/pipermail/vtkusers/2004-August/026366.html
http://www.vtk.org/doc/nightly/html/classvtkCellData-members.html
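As an illustration, a minimal sketch of that suggestion applied to the script above (reusing its vtk_foo and vtk_bar arrays); note the next answer's caveat that the legacy writer may still emit only the active scalars as a SCALARS section:

data = grid.GetCellData()
vtk_foo.SetName("foo")
data.AddArray(vtk_foo)  # attach without making it the active scalars
vtk_bar.SetName("bar")
data.AddArray(vtk_bar)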
From what I've read, VTK can't write multiple SCALARS sections, but it can read them. (What a good API!)
I'll continue using the good old pyvtk (which also has the advantage of being readable):
import pyvtk
filename = 'example.vtk'
title = 'Unstructured Grid Example'
points = [[0,0,0],[0,1,0],[0,0,1]]
triangles = [[0,1,2]]
grid = pyvtk.UnstructuredGrid(points, triangle=triangles)
celldata = pyvtk.CellData(pyvtk.Scalars([11.,], name="foo"),
                          pyvtk.Scalars([12.,], name="bar"))
vtk = pyvtk.VtkData(grid, celldata, title)
vtk.tofile(filename)
print open(filename).read()
Which produces:
# vtk DataFile Version 2.0
Unstructured Grid Example
ASCII
DATASET UNSTRUCTURED_GRID
POINTS 3 int
0 0 0
0 1 0
0 0 1
CELLS 1 4
3 0 1 2
CELL_TYPES 1
5
CELL_DATA 1
SCALARS foo float 1
LOOKUP_TABLE default
11.0
SCALARS bar float 1
LOOKUP_TABLE default
12.0

Memory leak when using matplotlib.collection.LineCollection

I am using the following code to create a collection of color-coded line plots:

for j in idlist[i]:
    single_traj(lonarray, latarray, parray)
plt.savefig(savename, dpi=400)
plt.close('all')
plt.clf()
where:
def single_traj(lonarray, latarray, parray, linewidth=0.7):
    """
    Plots XY Plot of one trajectory, with color as a function of p
    Helper Function for DrawXYTraj
    """
    global lc
    x = lonarray
    y = latarray
    p = parray
    points = np.array([x, y]).T.reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = col.LineCollection(segments, cmap=plt.get_cmap('Spectral'),
                            norm=plt.Normalize(100, 1000), alpha=0.8)
    lc.set_array(p)
    lc.set_linewidth(linewidth)
    plt.gca().add_collection(lc)
Somehow, this loop uses a lot of memory (> ~10 GB), which is still in use after the plot is saved.
I used hpy to look at the memory usage:
Partition of a set of 27472988 objects. Total size = 10990671168 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 8803917 32 9226505016 84 9226505016 84 dict of matplotlib.path.Path
1 8888542 32 711083360 6 9937588376 90 numpy.ndarray
2 8803917 32 563450688 5 10501039064 96 matplotlib.path.Path
3 11 0 219679112 2 10720718176 98 guppy.sets.setsc.ImmNodeSet
4 25407 0 77593848 1 10798312024 98 list
5 89367 0 28232616 0 10826544640 99 dict (no owner)
6 7642 0 25615984 0 10852160624 99 dict of matplotlib.collections.LineCollection
7 15343 0 16079464 0 10868240088 99 dict of matplotlib.transforms.CompositeGenericTransform
8 15327 0 16062696 0 10884302784 99 dict of matplotlib.transforms.Bbox
9 53741 0 15047480 0 10899350264 99 dict of weakref.WeakValueDictionary
At this point the plot is already saved, so all matplotlib-related objects should be gone... but I can't "find" these objects, which means I don't know how to delete them.
EDIT:
Here is a stand-alone example which reproduces the leak (savefig throws an error for some reason, but that isn't relevant anyway):
# Memory leak test!
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.collections as col

def draw():
    x = range(1000)
    y = range(1000)
    p = range(1000)
    fig = plt.figure(figsize=(12, 8))
    ax = plt.gca()
    ax.set_aspect('equal')
    for i in range(1000):
        if i % 100 == 0:
            print i
        points = np.array([x, y]).T.reshape(-1, 1, 2)
        segments = np.concatenate([points[:-1], points[1:]], axis=1)
        lc = col.LineCollection(segments, cmap=plt.get_cmap('Spectral'),
                                norm=plt.Normalize(0, 1000), alpha=0.8)
        lc.set_array(p)
        lc.set_linewidth(0.7)
        plt.gca().add_collection(lc)
    cb = fig.colorbar(lc, shrink=0.7)
    cb.set_label('p')
    cb.ax.invert_yaxis()
    plt.tight_layout()
    #plt.savefig('./mem_test.png', dpi=400)
    plt.close('all')
    plt.clf()

draw()
a = input('Wait...')
The draw() function should delete all plt objects, but they still use up memory after the function is called. I just checked it with top/htop!
It seems from your hpy dump that the memory hog is a large number of matplotlib.path.Path objects. This may be due to your variable lc. Have you tried del lc? It may be that plt.close is not (at least should not be!) able to delete them, as they are held by your global variable lc.
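A minimal sketch of that suggestion, assuming the setup from the question: drop the global, pass the Axes in explicitly, and return the collection so the caller controls the last reference.

def single_traj(ax, lonarray, latarray, parray, linewidth=0.7):
    # Same plotting logic as above, but without `global lc`
    points = np.array([lonarray, latarray]).T.reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = col.LineCollection(segments, cmap=plt.get_cmap('Spectral'),
                            norm=plt.Normalize(100, 1000), alpha=0.8)
    lc.set_array(parray)
    lc.set_linewidth(linewidth)
    ax.add_collection(lc)
    return lc  # the caller decides how long to keep this reference

ax = plt.gca()
for j in idlist[i]:
    lc = single_traj(ax, lonarray, latarray, parray)
plt.savefig(savename, dpi=400)
plt.close('all')
del lc  # drop the last reference so the collections can be garbage-collected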
