pdflatex hang after large number of figures

I have a script that generates a number of figures and puts them in the appendix of a report, e.g.
Appendix
********
.. figure:: images/generated/image_1.png
.. figure:: images/generated/image_2.png
.. figure:: images/generated/image_3.png
... etc
It looks like after a large number (~50) of images, my pdflatex command hangs and points to one of the graphics in my .tex file, around here:
...
\begin{figure}[htbp]
\centering
\noindent\sphinxincludegraphics{{image_49}.png}
\end{figure}
\begin{figure}[htbp]
\centering
\noindent\sphinxincludegraphics{{image_50}.png} <--- here
\end{figure}
\begin{figure}[htbp]
\centering
\noindent\sphinxincludegraphics{{image_51}.png}
\end{figure}
...
When pdflatex fails, I can't work out what to make of the console output. I get a number of these lines, which seem to be good news:
<image_48.png, id=451, 411.939pt x 327.3831pt>
File: image_48.png Graphic file (type png)
<use image_48.png>
Package pdftex.def Info: image_48.png used on input line 1251.
(pdftex.def) Requested size: 411.93797pt x 327.3823pt.
<image_49.png, id=452, 411.939pt x 327.3831pt>
File: image_49.png Graphic file (type png)
<use image_49.png>
Package pdftex.def Info: image_49.png used on input line 1257.
(pdftex.def) Requested size: 411.93797pt x 327.3823pt.
Then after the last successful image (~50) it starts outputting
! Output loop---100 consecutive dead cycles.
\end@float ...loatpenalty <-\@Mii \penalty -\@Miv
\@tempdima \prevdepth \vbo...
l.1258 \end{figure}
I've concluded that your \output is awry; it never does a
\shipout, so I'm shipping \box255 out myself. Next time
increase \maxdeadcycles if you want me to be more patient!
[9
! Undefined control sequence.
\reserved@a ->\@nil
l.1258 \end{figure}
The control sequence at the end of the top line
of your error message was never \def'ed. If you have
misspelled it (e.g., `\hobx'), type `I' and the correct
spelling (e.g., `I\hbox'). Otherwise just continue,
and I'll forget about whatever was undefined.
If all I do is reduce the number of figures, it will run and produce a pdf without issue. Is there a hard limit to the number of images a section can have? Is there somewhere else I can look in the build log to narrow down why this is happening?

This seemed to be a combination of a couple of things.
The first symptom was essentially an error caused by too many unprocessed floats. The fix for this was to add the following to the babel element of latex_elements
\usepackage[maxfloats=256]{morefloats}
The second symptom was the complaint about Output loop---100 consecutive dead cycles, so the fix was simply to increase the number of cycles:
\maxdeadcycles=1000
After these two adjustments, the pdflatex command will finish successfully now, even with a large number of figures.
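For reference, here is a minimal sketch of how both adjustments might look in a Sphinx conf.py. Note the assumption: the answer above placed the line under the babel key, whereas this sketch uses the preamble key of latex_elements, which Sphinx also copies into the generated .tex file. Adapt it to whichever key your project already uses.

```python
# conf.py (Sphinx): sketch only. Using the 'preamble' key is an assumption;
# the answer above put the morefloats line under the 'babel' key instead.
latex_elements = {
    "preamble": "\n".join(
        [
            # Allow more unprocessed floats than the LaTeX default.
            r"\usepackage[maxfloats=256]{morefloats}",
            # Let the output routine try more times before giving up.
            r"\maxdeadcycles=1000",
        ]
    ),
}
```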

I had this problem and the above suggestions did not work. However, I was able to get it to run just fine by inserting subsections, which may or may not be compatible with your objectives. The script generates code like the following, which is then input into another code snippet to preview the generated images.
(I'm generating SVG plots from C++, converting them to PNG, and previewing essentially raw data for selection into later plots that go into an actual document, not just a collection of images.)
\subsection{svghappy2.tyrosine.png}
\begin{figure}[htbp]
\testplot{svghappy2_tyrosine.png}
\caption{svghappy2.tyrosine.png}
\end{figure}
\subsection{svghappy2.valine.png}
\begin{figure}[htbp]
\testplot{svghappy2_valine.png}
\caption{svghappy2.valine.png}
\end{figure}

The problem arises because the compiler has a hard time placing all the images, so splitting them up helps. As @mike-marchywka noted, sections may do the trick, but so would other things, such as \pagebreak or \FloatBarrier from the placeins package.
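Since the figures in the question come from a script that writes reStructuredText, one way to apply this idea is to have the script emit a raw LaTeX \FloatBarrier every few figures. The following is only a sketch: build_appendix and barrier_every are hypothetical names, the value 10 is arbitrary, and it assumes \usepackage{placeins} has been added to the LaTeX preamble.

```python
def build_appendix(image_paths, barrier_every=10):
    """Return reST for an appendix, with a float barrier every few figures."""
    lines = ["Appendix", "********", ""]
    for index, path in enumerate(image_paths, start=1):
        lines += [f".. figure:: {path}", ""]
        if index % barrier_every == 0:
            # Raw LaTeX passthrough; only the LaTeX builder acts on it.
            # Assumes the placeins package is loaded in the preamble.
            lines += [".. raw:: latex", "", r"   \FloatBarrier", ""]
    return "\n".join(lines)


paths = [f"images/generated/image_{n}.png" for n in range(1, 26)]
appendix = build_appendix(paths, barrier_every=10)
```

With 25 figures and barrier_every=10, the generated text contains a barrier after the 10th and the 20th figure.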

Related

Merge audio (m4s) segments into one

I recently started learning Laravel, and currently watching an online course. Online courses are fine, but I like to have local copies, so I'm trying to download/merge segmented audio from Laracasts: Laravel 8 From Scratch series.
I've written some scripts (in Python) that do the following:
1. Download the master.json
2. Read master.json and download audio segments
3. Merge the segments into a single file (the file is not playable yet)
4. Process the audio file via ffmpeg (now it's playable, but has issues)
I think there's a problem with step 3 and/or 4.
In step/script 3, I create a new file, and add the contents of the segments to the file in binary.
Then (step/script 4), run a ffmpeg command in python: ffmpeg -i merged-file.mp4 -c copy processed-file.mp4
However, the final file doesn't work/play as expected. There's a delay in the beginning, and some parts seem to be cut off/skipped.
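For reference, the binary merge described in step 3 can be sketched roughly as below. This is not the actual script from the repository; merge_segments is a hypothetical name, and the sketch assumes the caller passes the segments in playback order (with any initialization segment first, if the stream has one).

```python
import shutil
from pathlib import Path


def merge_segments(segment_paths, output_path):
    """Append each segment's bytes to output_path, in the order given.

    Returns the size of the merged file in bytes.
    """
    with open(output_path, "wb") as merged:
        for segment in segment_paths:
            with open(segment, "rb") as part:
                # Stream the bytes across without loading whole files.
                shutil.copyfileobj(part, merged)
    return Path(output_path).stat().st_size
```

Note that sorting file names as plain strings would put segment-10 before segment-2, so the order should be derived from the segment numbers rather than from the names.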
There are three possibilities:
Segment files are problematic (not likely?)
I'm doing the merging wrong
I'm doing the ffmpeg processing wrong
Can someone guide me here?
The issues/colored parts in the ffmpeg output are:
...
[mov,mp4,m4a,3gp,3g2,mj2 @ 000001cfbc0de780] could not find corresponding track id 2
[mov,mp4,m4a,3gp,3g2,mj2 @ 000001cfbc0de780] could not find corresponding trex (id 2)
...
[aac @ 000001cfbc0f0380] Number of bands (31) exceeds limit (6).
...
[mp4 @ 000001cfbc20ecc0] track 0: codec frame size is not set
...
[mp4 @ 000001cfbc20ecc0] Non-monotonous DTS in output stream 0:0; previous: 318318, current: 286286; changing to 318319. This may result in incorrect timestamps in the output file.
...
Everything required for a test case is located in GitHub (akinuri/dump/m4s-segments/).
Note: there are two types/formats of audio in the master.json: mp42 and dash. dash works as expected, but seems to be used in only a limited number of videos/courses. mp42, on the other hand, appears more often, so I need a way to make mp42 work.

How to find objects in floor plan image in Tkinter python through svg file?

I have a vectorized floor plan image. I want to identify the objects in the image through the vector data in its SVG file. The SVG path data does not have any closepath (z) commands between objects, so I can't tell where one object ends and the next begins. Can somebody help me, please?
I have very little knowledge of SVG files and of using them in Tkinter, so any help or suggestions on what I can do would be appreciated.
This is the vector data of the image.
vector data of the image

use in conjunction with SO floorplan question.
Jump to z_final_floorplan.svg for final file.
A
Create 4 files:
w_original_floorplan.svg
x_rough_static_floorplan.svg
y_rough_live_floorplan.svg
z_final_floorplan.svg
w_original_floorplan.svg and x_rough_static_floorplan.svg are identical apart from filename.
y_rough_live_floorplan.svg and z_final_floorplan.svg are empty; to be populated.
Copy x_rough_static_floorplan.svg to y_rough_live_floorplan.svg.
Open y_rough_live_floorplan.svg on browser using server.
In x_rough_static_floorplan.svg, find every M (case sensitive) and replace it with two newlines followed by /M (in the replace field: shift + enter, shift + enter, /M).
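The find-and-replace on M works because each uppercase M (absolute moveto) in SVG path data starts a new subpath, which is usually where a new shape begins. The same split can be sketched in Python. This is an illustration only: split_subpaths is a hypothetical helper and the path data is made up. Like the manual step, it is case sensitive and ignores the relative moveto m.

```python
import re


def split_subpaths(path_data):
    """Split SVG path data into chunks that each start at an absolute moveto (M)."""
    return [chunk.strip() for chunk in re.findall(r"M[^M]*", path_data)]


# Made-up path data with two subpaths and no closepath (z) commands.
d = "M10 10L50 10L50 40 M100 100L120 100L120 130"
subpaths = split_subpaths(d)
```

Each chunk can then be inspected, labelled and written out as its own path element.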
B
[this section takes the time]
Take away 1st '/' in path in y_rough_live_floorplan.svg [shows blackout_floorplan]
Label x_rough_static_floorplan.svg code section blackout_floorplan where code is.
(this file is used as rough-work, so being xml / svg valid is irrelevant)
In y_rough_live_floorplan.svg find next '/' and delete it [shows floorplan_top_left_whiteout]
Label x_rough_static_floorplan.svg code section floorplan_top_left_whiteout where code is.
Have x_rough_static_floorplan.svg and y_rough_live_floorplan.svg open in two windows, as you will be going back and forth between them. Keep repeating until you reach the end.
(Hint: in VS Code the find tool seems to stay active when switching between files, so you can search for / and jump to the next match with cmd + g.) It may be handy to have a paper printout of the original SVG as a reference and to label the objects you create, e.g. bath, sink, table, as you go along. Decide on a naming scheme early: if one table is 'table', is the second chair chair2, chair_2 or chair_two?
C
Reorder all the labels and their corresponding code in the path in x_rough_static_floorplan.svg so that the pieces belonging to the same label sit next to each other, keeping the labels in the order in which they are found in the path:
e.g.
…
floorplan
bath
sink
table_chairs
sofa
…
Use the 'find' tool here. This process will itself require a temp file to copy and paste into, rather than reordering within the file you are working on; afterwards, write the temp file back over the working file. It might be a good idea to create a checklist of objects and cross them off as you go.
E.g. floorplan, bath, table_chairs, sink…
D
Create path elements from your grouped objects, giving each one an id such as id="floorplan_main", id="bath", id="sink", etc.
Bear in mind that the data describing how this is drawn is really, really bad. Rectangles should be drawn with rect elements where possible, and a lot of the path data is unnecessary, but that's evidently how the application generates the SVG.

Digital cursor movement recorded in PDF creation visible in CorelDraw

I am using CairoSVG to turn an SVG into a PDF. I have found a bug where each line of text has a 1 dimensional mark at the end when the PDF is loaded into CorelDraw.
I find this is true even with the simplest and barest of SVGs. I then decided to dig into the Python source of CairoSVG to fix the issue at its root. I have found text.py to be the place where the issue occurs.
https://github.com/Kozea/CairoSVG/blob/master/cairosvg/text.py
Specifically, the for [x, y, dx, dy, r], letter in letters_positions: for loop. I can actually manipulate the mark by doing this after the for loop.
surface.context.move_to( 100, 100 )
surface.context.save()
surface.context.text_path(' ')
surface.context.restore()
Depending on the values I enter, it will move one of the points (in this case the lower right point of the mark). This suggests that when CairoSVG is done with a line of text, the surface will move the cursor to the start of the last letter it was drawing. For some reason, the last time it does this is actually recorded into the PDF, although for most PDF viewers it does not render. I have tried to fine tune this mark so the two points are the same, but because this location is different each time depending on letter and location and more, I can't fine tune this number to make the mark disappear, even when using available width and height variables
Note that surface.context is not a file or function and surface is pprinted as <cairosvg.surface.PDFSurface object at 0x7f647802ec90>, it's some sort of Cairo interface for interacting with the PDF file directly through some sort of PDF API, although I haven't found any documentation on it.
I have tried lots of cheap workarounds, like saving the PDF in another program and then opening it in Corel, but elements are lost or altered in ways that are not acceptable.
I have also of course tried commenting out this file 1 line at a time, then chunks of code at a time, but with no luck.
I have also looked into other files such as path.py, parser.py, and more, but text.py seems to be the most promising location.
How can I prevent this mark from appearing in my PDF files?

I have found I could get the desired results by going into the CairoSVG code and using a different function to create the text elements onto the PDF.
This collects the letters into a variable.
# Collect entire line instead of letter
text_line_letters = ''
for [x, y, dx, dy, r], letter in letters_positions:
    text_line_letters = text_line_letters + letter
    # Skip rest of for loop
    continue
This uses the new function to create the actual text on the PDF.
# Create Text with glyph_path
surface.context.save()
font = surface.context.get_scaled_font()
glyphs, clusters, is_backwards = font.text_to_glyphs(
    0, -5, text_line_letters, with_clusters=True)
surface.context.glyph_path(glyphs)
surface.context.restore()
The actual root cause of the issue is unknown, but likely deeper inside the Cairo library itself.
https://github.com/bonzini/cairo/blob/9099c7e7307a39bc630919faa65bba089fd15104/src/cairo.c#L3349

Convolving Room Impulse Response with a Wav File (python)

I have written the following code which is supposed to put echo over an available sound file. Unfortunately the output is a very noisy result which I don't exactly understand. Can anybody help me with regard to this? Is there any skipped step?
#convolving a room impulse response function with a sound sample both of stereo type
import sys
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve
inp=wavfile.read(sound_path+sound_file_name)
IR=wavfile.read(IR_path+IR_file_name)
if inp[0]!=IR[0]:
    print("Size mismatch")
    sys.exit(-1)
else:
    rate=inp[0]
print(sound_file_name)
out_0=fftconvolve(inp[1][:,1],IR[1][:,0])
out_1=fftconvolve(inp[1][:,1],IR[1][:,1])
in_counter+=1
out=np.vstack((out_0,out_1)).T
out[:inp[1].shape[0]]=out[:inp[1].shape[0]]+inp[1]
wavfile.write(sound_path+sound_file_name+'_echoed.wav',rate,out)

Adding echo to a sound file is just that... adding echo. Your code doesn't look like it's adding two sounds together; it looks like it's transforming the input sound into something else.
Your data flow should look something like this:
source sound ------------------------------------>|
      |                                           | + ----------> target sound
      ---------> convolution echo --------------->|
Note that your echo sound is going to be longer than your original sound (i.e. it has a "tail.")
Adding two sounds together is simply a matter of adding each of the individual samples together from both sounds to produce a new output wave. I don't think vstack does that.
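As a rough sketch of that data flow for a single (mono) channel: convolve the source with the impulse response, then add the original samples onto the start of the longer result. The name add_echo and the wet_gain parameter are hypothetical, and np.convolve is used here instead of scipy's fftconvolve only to keep the example small.

```python
import numpy as np


def add_echo(dry, impulse_response, wet_gain=0.5):
    """Mix a mono signal with its convolution against an impulse response."""
    dry = np.asarray(dry, dtype=np.float64)
    ir = np.asarray(impulse_response, dtype=np.float64)
    # Full convolution: the result is len(dry) + len(ir) - 1 samples long.
    mixed = wet_gain * np.convolve(dry, ir)
    # The dry signal only covers the start; the rest is the echo "tail".
    mixed[: len(dry)] += dry
    return mixed


out = add_echo([1.0, 0.0, 0.0], [0.0, 1.0], wet_gain=0.5)
```

For a stereo file the same function would be applied to each channel separately.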

Apparently WAV files are read as int16 data, and any modification should be done after converting them to floats:
http://nbviewer.ipython.org/github/mgeier/python-audio/blob/master/audio-files/audio-files-with-pysoundfile.ipynb
After the convolution, one needs to renormalize again. And that's it.
Hope this helps the others too.
import sys
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve
from utility import pcm2float,float2pcm
# read both files and convert the int16 PCM data to float32 before processing
input_rate,input_sig=wavfile.read(sound_path+sound_file_name)
input_sig=pcm2float(input_sig,'float32')
IR_rate,IR_sig=wavfile.read(IR_path+IR_file_name)
IR_sig=pcm2float(IR_sig,'float32')
if input_rate!=IR_rate:
    print("Size mismatch")
    sys.exit(-1)
else:
    rate=input_rate
print(sound_file_name)
con_len=-1
# convolve each channel with the matching IR channel, then renormalize to a peak of 1.0
out_0=fftconvolve(input_sig[:con_len,0],IR_sig[:con_len,0])
out_0=out_0/np.max(np.abs(out_0))
out_1=fftconvolve(input_sig[:con_len,1],IR_sig[:con_len,1])
out_1=out_1/np.max(np.abs(out_1))
in_counter+=1
out=np.vstack((out_0,out_1)).T
wavfile.write(sound_path+sound_file_name+'_'+IR_file_name+'_echoed.wav',rate,float2pcm(out,'int16'))
One can download utility from the above link.
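If downloading the utility module is not convenient, the two conversions can be approximated with a few lines of NumPy. This is a simplified sketch and not the code from the linked notebook: int16_to_float and float_to_int16 are hypothetical names, and only 16-bit PCM is handled.

```python
import numpy as np


def int16_to_float(samples):
    """Scale int16 PCM samples to float32 in the range [-1.0, 1.0)."""
    return np.asarray(samples, dtype=np.int16).astype(np.float32) / 32768.0


def float_to_int16(samples):
    """Scale floats back to int16 PCM, clipping anything out of range."""
    scaled = np.asarray(samples, dtype=np.float32) * 32768.0
    return np.clip(scaled, -32768, 32767).astype(np.int16)
```

The clipping matters after normalizing to a peak of exactly 1.0, because 1.0 * 32768 does not fit in int16.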
UPDATE: Although it generates a working output, it's still not as good as the result from convolving on the original website, Openair.

How to unit test a Python function that draws PDF graphics?

I'm writing a CAD application that outputs PDF files using the Cairo graphics library. A lot of the unit testing does not require actually generating the PDF files, such as computing the expected bounding boxes of the objects. However, I want to make sure that the generated PDF files "look" correct after I change the code. Is there an automated way to do this? How can I automate as much as possible? Do I need to visually inspect each generated PDF? How can I solve this problem without pulling my hair out?

(See also update below!)
I'm doing the same thing using a shell script on Linux that wraps
ImageMagick's compare command
the pdftk utility
Ghostscript (optionally)
(It would be rather easy to port this to a .bat Batch file for DOS/Windows.)
I have a few reference PDFs created by my application which are "known good". Newly generated PDFs after code changes are compared to these reference PDFs. The comparison is done pixel by pixel and is saved as a new PDF. In this PDF, all unchanged pixels are painted in white, while all differing pixels are painted in red.
Here are the building blocks:
pdftk
Use this command to split multipage PDF files into multiple singlepage PDFs:
pdftk reference.pdf burst output somewhere/reference_page_%03d.pdf
pdftk comparison.pdf burst output somewhere/comparison_page_%03d.pdf
compare
Use this command to create a "diff" PDF page for each of the pages:
compare \
-verbose \
-debug coder -log "%u %m:%l %e" \
somewhere/reference_page_001.pdf \
somewhere/comparison_page_001.pdf \
-compose src \
somewhereelse/reference_diff_page_001.pdf
Ghostscript
Because of automatically inserted meta data (such as the current date+time), PDF output is not working well for MD5hash-based file comparisons.
If you want to automatically discover all cases which consist of purely white pages, you could also convert to a meta-data free bitmap format using the bmp256 output device. You can do that for the original PDFs (reference and comparison), or for the diff-PDF pages:
gs \
-o reference_diff_page_001.bmp \
-r72 \
-g595x842 \
-sDEVICE=bmp256 \
reference_diff_page_001.pdf
md5sum reference_diff_page_001.bmp
If the MD5sum is what you expect for an all-white page of 595x842 PostScript points, then your unit test passed.
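If the test suite itself is written in Python, the final MD5 check could be done there instead of calling md5sum. A minimal sketch, assuming you keep a known all-white bitmap of the same page size to compare against (file_md5 and page_unchanged are hypothetical helper names):

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def page_unchanged(diff_bmp_path, all_white_bmp_path):
    """True when the diff bitmap is byte-identical to a known all-white page."""
    return file_md5(diff_bmp_path) == file_md5(all_white_bmp_path)
```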
Update:
I don't know why I didn't previously think of generating a histogram output from the ImageMagick compare...
The following is a command pipeline chaining 2 different commands:
the first one is the same as the above compare which generates the 'white pixels are equal, red pixels are differences'-format, only it outputs the ImageMagick internal miff format. It doesn't write to a file, but to stdout.
the second one uses convert to read stdin, generate a histogram and output the result in text form. There will be two lines:
one indicating the number of white pixels
the other one indicating the number of red pixels.
Here it goes:
compare \
reference.pdf \
current.pdf \
-compose src \
miff:- \
| \
convert \
- \
-define histogram:unique-colors=true \
-format %c \
histogram:info:-
Sample output:
56934: (61937, 0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)
(Sample output was generated by using these reference.pdf and current.pdf files.)
I think this type of output is really well suited for automatic unit testing. If you evaluate the two numbers, you can easily compute the "red pixel" percentage and you could even decide to return PASSED or FAILED based on a certain threshold (if you don't necessarily need "zero red" for some reason).
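As an illustration, the two histogram lines could be evaluated in Python along these lines. This is a sketch that assumes the output looks like the sample above (a pixel count, then a colour ending in srgba(r,g,b,a)); other ImageMagick versions may format the histogram differently. differing_fraction and the 1% threshold are hypothetical choices.

```python
import re

# Pixel count at the start of the line, RGB values from the (s)rgb(a) part.
LINE = re.compile(r"^\s*(\d+):.*?s?rgba?\((\d+),\s*(\d+),\s*(\d+)")


def differing_fraction(histogram_text):
    """Return the share of pixels that are not pure white (0.0 to 1.0)."""
    total = 0
    white = 0
    for line in histogram_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        count = int(match.group(1))
        total += count
        if match.group(2, 3, 4) == ("255", "255", "255"):
            white += count
    return (total - white) / total if total else 0.0


sample = """\
 56934: (61937, 0, 7710,52428) #F1F100001E1ECCCC srgba(241,0,30,0.8)
444056: (65535,65535,65535,52428) #FFFFFFFFFFFFCCCC srgba(255,255,255,0.8)
"""
fraction = differing_fraction(sample)
passed = fraction <= 0.01  # hypothetical threshold: at most 1% differing pixels
```

For the sample output, roughly 11% of the pixels differ, so a test with this threshold would report FAILED.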

You could capture the PDF as a bitmap (or at least a losslessly-compressed) image, and then compare the image generated by each test with a reference image of what it's supposed to look like. Any differences would be flagged as an error for the test.

The first idea that pops in my head is to use a diff utility. These are generally used to compare texts of documents but they might also compare the layout of the PDF. Using it, you can compare the expected output with the output supplied.
The first result Google gives me is this. Although it is commercial, there might be other free/open source alternatives.

I would try this using xpresser (https://wiki.ubuntu.com/Xpresser). You can try to match images to similar images, not exact copies, which is the problem in these cases.
I don't know if xpresser is being actively developed, or if it can be used with standalone image files (I think so). Anyway, it takes its ideas from the Sikuli project (which is Java with a Jython front end, while xpresser is Python).

I wrote a tool in Python to validate PDFs for my employer's documentation. It has the capability to compare individual pages to master images. I used a library I found called swftools to export the page to PNG, then used the Python Imaging Library to compare it with the master.
The relevant code looks something like this (this won't run as there are some dependencies on other parts of the script, but you should get the idea):
# exporting
gfxpdf = gfx.open("pdf", self.pdfpath)
if os.path.isfile(pngPath):
    os.remove(pngPath)
page = gfxpdf.getPage(pagenum)
img = gfx.ImageList()
img.startpage(page.width, page.height)
page.render(img)
img.endpage()
img.save(pngPath)
return os.path.isfile(pngPath)
# comparing
outPng = os.path.join(outpath, pngname)
masterPng = os.path.join(outpath, "_master", pngname)
if os.path.isfile(masterPng):
    output = Image.open(outPng).convert("RGB") # discard alpha channel, if any
    master = Image.open(masterPng).convert("RGB")
    mismatch = any(x[1] for x in ImageChops.difference(output, master).getextrema())
