How to format a larger dataframe better when writing it to a PDF with matplotlib - Python

When a large dataframe is converted to a PDF, matplotlib tries to fit too much onto one page, so the information looks small in the PDF. How can I fix this so it formats correctly for both small and large dataframes?
fig, ax = plt.subplots(figsize=(12, 4))
ax.axis('tight')
ax.axis('off')
ax.set_title(title)
ax.table(cellText=panda_df.values, colLabels=panda_df.columns, loc='center')
pdf.savefig(fig,bbox_inches='tight')
plt.close()
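One possible approach, sketched here under assumptions rather than taken from an accepted answer, is to split the dataframe into chunks of rows and write one table per page, letting the figure height grow with the number of rows. The function name dataframe_to_pdf, the rows_per_page parameter, and the 0.3 inch-per-row factor are all hypothetical choices for illustration, and the sketch assumes a non-empty dataframe.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def dataframe_to_pdf(df, path, title="", rows_per_page=25):
    """Write df to a PDF with at most rows_per_page rows per page.

    rows_per_page is a hypothetical tuning knob, not a matplotlib setting.
    Returns the number of pages written.
    """
    n_pages = math.ceil(len(df) / rows_per_page)
    with PdfPages(path) as pdf:
        for page in range(n_pages):
            chunk = df.iloc[page * rows_per_page:(page + 1) * rows_per_page]
            # Scale the figure height with the row count so text size stays similar.
            fig, ax = plt.subplots(figsize=(12, 1 + 0.3 * len(chunk)))
            ax.axis("off")
            ax.set_title(f"{title} (page {page + 1} of {n_pages})")
            ax.table(cellText=chunk.values, colLabels=chunk.columns, loc="center")
            pdf.savefig(fig, bbox_inches="tight")
            plt.close(fig)
    return n_pages


# Small made-up dataframe, only to exercise the function.
demo = pd.DataFrame({"a": range(60), "b": range(60)})
pages = dataframe_to_pdf(demo, "table_demo.pdf", title="Demo")
```

A small dataframe ends up on a single short page, while a larger one is spread over several pages instead of being shrunk to fit.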

Related

Need help plotting a binary matrix in Python

I'm asking for help with Python.
I have a pandas DataFrame where the rows are the "people" and the columns are the subjects. A value of 1 means there is a relationship between the two, and 0 means there is none. That simple.
I want to plot this relationship as a binary matrix, with the people on the y axis and the subjects on the x axis.
The problem is that I can't make this plot "smaller", as in the photo of the objective. I always end up with the "trace".
Example code:
matrixNumpy = matrix.to_numpy()
fig=plt.figure(figsize=(20, 20))
fig.add_subplot(2, 4, 1)
plt.imshow(matrixNumpy, aspect='auto', interpolation='none', cmap='Greys')
[Images: the pandas DataFrame, the objective, how the plot currently looks, and new photos]
G is a bipartite graph. The code below creates the matrix and then plots it:
matrix = bipartite.biadjacency_matrix(G, Hash, assunto).todense()
matrix = pd.DataFrame(matrix, index=Hash, columns=assunto)
matrix = matrix.squeeze()
matrix
matrixNumpy = matrix.to_numpy()
matrixNumpy.shape
fig, axes = plt.subplots(1,2, figsize=(15,15))
ax = axes[0]
ax.imshow(matrixNumpy, aspect='auto', cmap='Greys', )
ax = axes[1]
ax.imshow(total_sort_mat(matrixNumpy), aspect='auto', cmap='Greys',)
TY
It's hard to copy the data from a screenshot, so here is my attempt to help you out.
Since you end up with a 2D NumPy array, let's go with a toy example:
import numpy as np
import matplotlib.pyplot as plt
mat = np.random.choice([0, 1], size=(45,), p=[1./3, 2./3]).reshape((3,15))
If we plot this using aspect='auto', we get a result which is similar to what you don't want
plt.figure(figsize=(2,2))
plt.imshow(mat, aspect='auto', interpolation='none', cmap='Greys')
If you use aspect='equal', it returns
plt.imshow(mat, aspect='equal', interpolation='none', cmap='Greys')
Other possible reasons why it is not working:
Since you mentioned in your comment that you get an empty plot with aspect='auto', change figsize=(15,15) to a smaller value such as figsize=(1,1).
If you still get an empty plot after changing the figsize, the matrix may be too large to render. Try plotting a small portion first.
If you are in a Jupyter notebook, check whether previously executed cells are affecting your variables.
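As a rough sketch of how the figure size and aspect ratio interact, the figure can be sized from the matrix shape so that each cell stays a small square. The cell size of 0.4 inches and the output file name are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy binary matrix: 3 "people" (rows) by 15 "subjects" (columns).
rng = np.random.default_rng(0)
mat = rng.choice([0, 1], size=(3, 15), p=[1 / 3, 2 / 3])

cell = 0.4  # hypothetical size of one cell, in inches
n_rows, n_cols = mat.shape

# Size the figure from the matrix shape instead of a fixed (15, 15).
fig, ax = plt.subplots(figsize=(n_cols * cell, n_rows * cell))
im = ax.imshow(mat, aspect="equal", interpolation="none", cmap="Greys")
ax.set_xticks(range(n_cols))
ax.set_yticks(range(n_rows))
fig.savefig("binary_matrix.png", bbox_inches="tight")
plt.close(fig)
```

With aspect="equal" every cell is drawn as a square, so a wide, short matrix gives a wide, short image rather than being stretched to fill the axes.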

How to make two separate plots in seaborn from the same dataframe in pandas?

I have a dataframe in pandas that I'm trying to create two separate plots from in the same function, one is an ordinary boxplot w/ jitter and the other is a violin plot.
I've tried saving them to two separate variables and then saving each of those to their own image files, but in each of those files, the plots seem to contain an overlay of both of them rather than each containing their own separate plot. Here's what the code looks like:
final_boxplot = sns.boxplot(data = df)
final_violin = sns.violinplot(data = df)
final_boxplot.figure.savefig('boxplot.png')
final_violin.figure.savefig('violin.png')
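Any ideas on what I might be doing wrong, or any alternatives?
Without an explicit ax argument, both seaborn calls draw on the same current axes, which is why each file shows both plots. Create a separate figure for each plot and save each one: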
fig,ax = plt.subplots()
sns.boxplot(data=df, ax=ax)
fig.savefig('boxplot.png')
fig, ax = plt.subplots()
sns.violinplot(data=df, ax=ax)
fig.savefig('violin.png')

Data not appearing on a python plot

I have a dataframe with date as index, floats as columns, filled with mostly NaN and a few floats.
I am plotting this dataframe using:
fig, ax = plt.subplots()
plot(df2[11][2:], linestyle='dashed',linewidth=2,label='xx')
ax.set(xlabel='xx', ylabel='xx', title='xx')
ax.grid()
ax.legend()
The plot window opens, but no data appears. If I use markers instead of a line, the data points do appear.
What should I change to plot my graphs as lines?
Edit: Thanks, it worked like this:
s1 = np.isfinite(df2[11][2:])
fig, ax = plt.subplots()
plot(df2.index[2:][s1],df2[11][2:].values[s1], linestyle='-',linewidth=2,label='xx')
ax.set(xlabel='xx', ylabel='xx',title='xx')
ax.grid()
ax.legend()
Try
import matplotlib.pyplot as plt
fig = plt.figure()
plt.plot(df2[11][2:], linestyle='dashed',linewidth=2,label='xx')
plt.gca().set(xlabel='xx', ylabel='xx', title='xx')  # pyplot has no set(); call it on the current Axes
plt.grid()
plt.legend()
plt.show()
In your case matplotlib won't draw a line between points separated by NaNs. You can mask NaNs or get rid of them. Have a look at the link below, there are some solutions to draw lines skipping NaNs.
matplotlib: drawing lines between points ignoring missing data
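A minimal sketch of the masking idea, using a made-up series in place of df2[11]: a line segment is only drawn between two consecutive finite values, so isolated points surrounded by NaN show up with markers but not as a line. Dropping the NaN values before plotting connects the remaining points.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: a date index with mostly NaN and a few floats.
index = pd.date_range("2020-01-01", periods=7, freq="D")
series = pd.Series([1.0, np.nan, np.nan, 2.5, np.nan, 1.5, np.nan], index=index)

# Keep only the finite values; the index stays aligned with them.
finite = series.dropna()

fig, ax = plt.subplots()
(line,) = ax.plot(finite.index, finite.values,
                  linestyle="dashed", linewidth=2, label="xx")
ax.set(xlabel="xx", ylabel="xx", title="xx")
ax.grid()
ax.legend()
fig.savefig("nan_line.png")
plt.close(fig)
```

Note that this draws a straight line across the gaps, which may or may not be what the data should suggest.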

Saving matplotlib subplot figure to image file

I'm fairly new to matplotlib and am limping along. That said, I haven't found an obvious answer to this question.
I have a scatter plot I wanted colored by groups, and it looked like plotting via a loop was the way to roll.
Here is my reproducible example, based on the first link above:
import matplotlib.pyplot as plt
import pandas as pd
from pydataset import data
df = data('mtcars').iloc[0:10]
df['car'] = df.index
fig, ax = plt.subplots(1)
plt.figure(figsize=(12, 9))
for ind in df.index:
    ax.scatter(df.loc[ind, 'wt'], df.loc[ind, 'mpg'], label=ind)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
# plt.show()
# plt.savefig('file.png')
Uncommenting plt.show() yields what I want:
Searching around, it looked like plt.savefig() is the way to save a file; if I re-comment out plt.show() and run plt.savefig() instead, I get a blank white picture. This question suggests this is caused by calling show() before savefig(), but I have it entirely commented out. Another question has a comment suggesting I can save the ax object directly, but that cuts off my legend:
The same question has an alternative that uses fig.savefig() instead. I get the same chopped legend.
There's this question which seems related, but I'm not plotting a DataFrame directly so I'm not sure how to apply the answer (where dtf is the pd.DataFrame they're plotting):
plot = dtf.plot()
fig = plot.get_figure()
fig.savefig("output.png")
Thanks for any suggestions.
Edit: to test the suggestion below to try tight_layout(), I ran this and still get a blank white image file:
fig, ax = plt.subplots(1)
plt.figure(figsize=(12, 9))
for ind in df.index:
    ax.scatter(df.loc[ind, 'wt'], df.loc[ind, 'mpg'], label=ind)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
fig.tight_layout()
plt.savefig('test.png')
Remove the line plt.figure(figsize=(12, 9)) and it will work as expected, as long as you call savefig before show. To keep the figure size, pass figsize=(12, 9) to plt.subplots instead.
The problem is that the figure being saved is the one created by plt.figure(), while all the data is plotted to ax which is created before that (and in a different figure, which is not the one being saved).
To save the figure including the legend, use the bbox_inches="tight" option:
plt.savefig('test.png', bbox_inches="tight")
Of course, saving the figure object directly is equally possible:
fig.savefig('test.png', bbox_inches="tight")
For a deeper understanding on how to move the legend out of the plot, see this answer.
As an addition to @ImportanceOfBeingErnest's answer: when using bbox_inches='tight', pad_inches (0.1 by default) may need to be set to a larger value.
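Putting the answers together, a self-contained sketch might look like the following. The three (wt, mpg) pairs are made-up stand-ins for the mtcars rows so that the example does not depend on pydataset, and pad_inches=0.3 is an arbitrary illustrative value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up (wt, mpg) pairs standing in for the mtcars rows.
cars = {"car A": (2.6, 21.0), "car B": (3.2, 21.4), "car C": (3.4, 18.7)}

# Set the size on the figure that actually holds the axes,
# instead of creating a second, empty figure with plt.figure().
fig, ax = plt.subplots(figsize=(12, 9))
for name, (wt, mpg) in cars.items():
    ax.scatter(wt, mpg, label=name)
legend = ax.legend(bbox_to_anchor=(1.05, 1), loc=2)

# bbox_inches="tight" expands the saved area to include the outside legend.
fig.savefig("test.png", bbox_inches="tight", pad_inches=0.3)
same_figure = ax.figure is fig
plt.close(fig)
```

Calling fig.savefig rather than plt.savefig avoids any dependence on which figure happens to be current.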

Matplotlib savefig to eps ignores visibility=False

I'm trying to save a streamline plot as an EPS, and then convert it to PDF using epstopdf, since this gives a much smaller filesize.
I use multiple subplots that share their x and y axis. I add one overarching subplot, so I can easily add a xlabel and ylabel. I set frameon=False, so it doesn't appear. Afterward, I set the spines and ticks of this axis off. When the figure is displayed, I do not see anything from the big axis. So far, so good.
The problem appears when I save the figure. Saving to EPS and then converting to PDF makes the ticklabels appear, and interfere with my text. Removing the ticklabels outright is also no good, since the spacing then places the labels among the ticklabels of the plots that I do want to see. Curiously, saving to pdf does not have this problem, but the file size is 11 times greater.
Does anyone know what I'm doing wrong, or what is going on?
Working example:
import matplotlib.pyplot as plt
import numpy as np
import subprocess
fig, ax = plt.subplots(2, 2, sharex=True, sharey=True)
ax = ax.flatten()
ax = np.append(ax, fig.add_subplot(1, 1, 1, frameon=False))
ax[-1].spines['top'].set_color('none')
ax[-1].spines['bottom'].set_color('none')
ax[-1].spines['left'].set_color('none')
ax[-1].spines['right'].set_color('none')
ax[-1].tick_params(
labelcolor='none', top='off', bottom='off', left='off', right='off')
ax[-1].set_xlabel('$u$', fontsize=14)
ax[-1].set_ylabel('$v$', fontsize=14)
plt.setp(ax[-1].get_xticklabels(), visible=False)
fig.savefig('TestPdf.pdf')
fig.savefig('TestEps.eps')
subprocess.check_call(['epstopdf', 'TestEps.eps'])
plt.show()
You can try some other backends. For example, the pgf backend (available in matplotlib 1.3+) is also capable of producing PDF:
import matplotlib
matplotlib.use("pgf")
You can get a list of backends available with:
matplotlib.rcsetup.all_backends
And you can check whether a backend supports EPS or PDF with:
import matplotlib
matplotlib.use("BACKEND")
import matplotlib.pyplot as plt
fig = plt.figure()
print(fig.canvas.get_supported_filetypes())  # Python 3 print function
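A runnable version of that check, using the non-interactive Agg backend as an assumed stand-in for "BACKEND":

```python
import matplotlib
matplotlib.use("Agg")  # assumed backend for this sketch; substitute the one to test
import matplotlib.pyplot as plt

fig = plt.figure()

# Maps file extension -> human-readable description of the format.
filetypes = fig.canvas.get_supported_filetypes()
for ext, description in sorted(filetypes.items()):
    print(f"{ext}: {description}")

supports_eps_and_pdf = "eps" in filetypes and "pdf" in filetypes
plt.close(fig)
```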
