Get 'processed' terminal output in Python - python

I have some bytes that have escape sequences that move the cursor around, which causes the terminal to overwrite some text in the output. I want to "process" the bytes so that I only get the final output that appears when I print them out. For example:
b = b'Using default tag: latest\r\nlatest: Pulling from library/hello-world\r\n\r\n\x1b[1A\x1b[2K\r0e03bdcc26d7: Pulling fs layer \r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Downloading 423B/2.529kB\r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Download complete \r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Extracting 2.529kB/2.529kB\r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Extracting 2.529kB/2.529kB\r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Pull complete \r\x1b[1BDigest: sha256:31b9c7d48790f0d8c50ab433d9c3b7e17666d6993084c002c2ff1ca09b96391d\r\nStatus: Downloaded newer image for hello-world:latest\r\ndocker.io/library/hello-world:latest\r\n'
s = b.decode("utf-8")
print(s)
The output of the above script is:
Using default tag: latest
latest: Pulling from library/hello-world
Digest: sha256:31b9c7d48790f0d8c50ab433d9c3b7e17666d6993084c002c2ff1ca09b96391d
Status: Downloaded newer image for hello-world:latest
docker.io/library/hello-wo
I want to get the above output as a string inside python, as the string you see above, instead of with escape sequences and overwritten text. You can see the difference when you compare the lengths:
b = b'Using default tag: latest\r\nlatest: Pulling from library/hello-world\r\n\r\n\x1b[1A\x1b[2K\r0e03bdcc26d7: Pulling fs layer \r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Downloading 423B/2.529kB\r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Download complete \r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Extracting 2.529kB/2.529kB\r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Extracting 2.529kB/2.529kB\r\x1b[1B\x1b[1A\x1b[2K\r0e03bdcc26d7: Pull complete \r\x1b[1BDigest: sha256:31b9c7d48790f0d8c50ab433d9c3b7e17666d6993084c002c2ff1ca09b96391d\r\nStatus: Downloaded newer image for hello-world:latest\r\ndocker.io/library/hello-world:latest\r\n'
s = b.decode("utf-8")
desired_output = """Using default tag: latest
latest: Pulling from library/hello-world
Digest: sha256:31b9c7d48790f0d8c50ab433d9c3b7e17666d6993084c002c2ff1ca09b96391d
Status: Downloaded newer image for hello-world:latest
docker.io/library/hello-world:latest"""
print(len(s), len(desired_output))
Output:
544 238
One is almost two times longer than the other, even though print(s) and print(desired_output) result in the same text appearing in the terminal.
Initially I used a pyte, a terminal emulator library, where I could feed the bytes into a emulated terminal and "screengrab" the terminal for raw text afterwards. Unfortunately this method was really slow (up to 7s for 1 million bytes).
There must be a faster way, just because my actual terminal is able to process that many bytes in much less time.
I've been searching for a solution for a while now, so I would also really appreciate any leads or partial answers.

Related

Adding custom extratags with tifffile

I'm trying to write a script to simplify my everyday life in the lab. I operate one ThermoFisher / FEI scanning electron microscope and I save all my pictures in the TIFF format.
The microscope software is adding an extensive custom TiffTag (code 34682) containing all the microscope / image parameters.
In my script, I would like to open an image, perform some manipulations and then save the data in a new file, including the original FEI metadata. To do so, I would like to use a python script using the tifffile module.
I can open the image file and perform the needed manipulations without problems. Retrieving the FEI metadata from the input file is also working fine.
I was thinking to use the imwrite function to save the output file and using the extratags optional argument to transfer to the output file the original FEI metadata.
This is an extract of the tifffile documentation about the extratags:
extratags : sequence of tuples
Additional tags as [(code, dtype, count, value, writeonce)].
code : int
The TIFF tag Id.
dtype : int or str
Data type of items in 'value'. One of TIFF.DATATYPES.
count : int
Number of data values. Not used for string or bytes values.
value : sequence
'Count' values compatible with 'dtype'.
Bytes must contain count values of dtype packed as binary data.
writeonce : bool
If True, the tag is written to the first page of a series only.
Here is a snippet of my code.
my_extratags = [(input_tags['FEI_HELIOS'].code,
input_tags['FEI_HELIOS'].dtype,
input_tags['FEI_HELIOS'].count,
input_tags['FEI_HELIOS'].value, True)]
tifffile.imwrite('output.tif', data, extratags = my_extratags)
This code is not working and complaining that the value of the extra tag should be ASCII 7-bit encoded. This looks already very strange to me because I haven't touched the metadata and I am just copying it to the output file.
If I convert the metadata tag value in a string as below:
my_extratags = [(input_tags['FEI_HELIOS'].code,
input_tags['FEI_HELIOS'].dtype,
input_tags['FEI_HELIOS'].count,
str(input_tags['FEI_HELIOS'].value), True)]
tifffile.imwrite('output.tif', data, extratags = my_extratags)
the code is working, the image is saved, the metadata corresponding to 'FEI_HELIOS' is created but it is empty!
Can you help me in finding what I am doing wrongly?
I don't need to use tifffile, but I would prefer to use python rather than ImageJ because I have already several other python scripts and I would like to integrate this new one with the others.
Thanks a lot in advance!
toto
ps. I'm a frequent user of stackoverflow, but this is actually my first question!
In principle the approach is correct. However, tifffile parses the raw values of certain tags, including FEI_HELIOS, to dictionaries or other Python types. To get the raw tag value for rewriting, it needs to be read from file again. In these cases, use the internal TiffTag._astuple function to get an extratag compatible tuple of the tag, e.g.:
import tifffile
with tifffile.TiffFile('FEI_SEM.tif') as tif:
assert tif.is_fei
page = tif.pages[0]
image = page.asarray()
... # process image
with tifffile.TiffWriter('copy1.tif') as out:
out.write(
image,
photometric=page.photometric,
compression=page.compression,
planarconfig=page.planarconfig,
rowsperstrip=page.rowsperstrip,
resolution=(
page.tags['XResolution'].value,
page.tags['YResolution'].value,
page.tags['ResolutionUnit'].value,
),
extratags=[page.tags['FEI_HELIOS']._astuple()],
)
This approach does not preserve Exif metadata, which tifffile cannot write.
Another approach, since FEI files seem to be written uncompressed, is to directly memory map the image data in the file to a numpy array and manipulate that array:
import shutil
import tifffile
shutil.copyfile('FEI_SEM.tif', 'copy2.tif')
image = tifffile.memmap('copy2.tif')
... # process image
image.flush()
Finally, consider tifftools for rewriting TIFF files where tifffile is currently failing, e.g. Exif metadata.

Merge audio (m4s) segments into one

I recently started learning Laravel, and currently watching an online course. Online courses are fine, but I like to have local copies, so I'm trying to download/merge segmented audio from Laracasts: Laravel 8 From Scratch series.
I've written some scripts (in Python) that does the following:
Download the master.json
Read master.json and download audio segments
Merge the segments into a single file (the file is not playable yet)
Process the audio file via ffmpeg (now it's playable, but has issues)
I think there's a problem with the step 3 and/or 4.
In step/script 3, I create a new file, and add the contents of the segments to the file in binary.
Then (step/script 4), run a ffmpeg command in python: ffmpeg -i merged-file.mp4 -c copy processed-file.mp4
However, the final file doesn't work/play as expected. There's a delay in the beginning, and some parts seem to be cut off/skipped.
There are three possibilities:
Segment files are problematic (not likely?)
I'm doing the merging wrong
I'm doing the ffmpeg processing wrong
Can someone guide me here?
The issues/colored parts in the ffmpeg output are:
...
[mov,mp4,m4a,3gp,3g2,mj2 # 000001cfbc0de780] could not find corresponding track id 2
[mov,mp4,m4a,3gp,3g2,mj2 # 000001cfbc0de780] could not find corresponding trex (id 2)
...
[aac # 000001cfbc0f0380] Number of bands (31) exceeds limit (6).
...
[mp4 # 000001cfbc20ecc0] track 0: codec frame size is not set
...
[mp4 # 000001cfbc20ecc0] Non-monotonous DTS in output stream 0:0; previous: 318318, current: 286286; changing to 318319. This may result in incorrect timestamps in the output file.
...
Everything required for a test case is located in GitHub (akinuri/dump/m4s-segments/). Screenshot of the contents:
Note: there are two types/formats of audio in the master.json: mp42 and dash. dash works as expected, and seem to be used in limited videos/courses. On the other hand, mp42 appears more. So I need a way to make mp42 work.

MoviePy - getting progress bar values

I am running a python script which converts a video file to a audio clip using moviepy.
def convert(mp3_file,mp4_file):
videoclip = VideoFileClip(mp4_file)
audioclip = videoclip.audio
audioclip.write_audiofile(mp3_file)
audioclip.close()
videoclip.close()
I found out that Moviepy uses a library called Proglog to print a command line progress bar.
How do I get these process completion percentage values?
WARNING: Bruteforce ahead!
It is possible, however, moviepy has no official method to do this.
We can get the progress inside the source code of the module
we need to edit the module code and follow the steps. [This is for ubuntu linux]
Step 1 -
go to the ffmpg_writter.py [usually in /home/anipr/.local/lib/python3.10/site-packages/moviepy/video/io/ffmpeg_writer.py]
Step 2 -
go to line 221 and add calculate the percentage based on duration - get_the_persentage_of_progress = (t/clip.duration)*100
Step - 3
do anything with the variable get_the_persentage_of_progress
however, getting the variable back to our original code is a bit tricky.
here are a few ideas -
write it to a file.txt
insert into database
[let me know in comments if any better idea]

Error in writing video file using moviepy and ffmpeg

I have been working with moviepy library for the first time. I have a video clip around 7 hours long, and I'd like to clip it into small clips. I have a list of start and end time.
video = VideoFileClip("videoFile.mp4")
clips = []
for cut in cuts:
clip = video.subclip(cut[0], cut[1])
clips.append(clip)
clips
clip = video.subclip("7:32:18", "7:38:38")
clips.append(clip)
for clip, title in zip(clips, title_list):
clip.write_videofile(title + '.mp4', threads=8, fps=24, audio=True, codec='libx264',preset=compression)
video.close()
clips[] contain the start and end time for clipping. I have a list of title too which I have scraped from youtube. I have not included the two lists here but a small example could be:
cuts = [('0:00', '2:26'),
('2:26', '5:00'),
('5:00', '7:15'),
('7:15', '10:57'),
('10:57', '18:00'),
('18:00', '18:22'),
('18:22', '19:57'),
('19:57', '20:37'),
('20:37', '28:27'),
('28:27', '40:32'),
('40:32', '49:57'),...
title_list = ['Introduction (What is Todoist?), tech stack talk', 'Showing the final application (with dark mode!)', 'Installing create react app', "Clearing out what we don't need from create react app", "Let's get building our components!", 'Installing packages using Yarn', 'Building the Header component', 'Building the Content component',...
OSError: [Errno 32] Broken pipe
MoviePy error: FFMPEG encountered the following error while writing file Introduction(WhatisTodoist?),techstacktalkTEMP_MPY_wvf_snd.mp3:
b'Introduction(WhatisTodoist?),techstacktalkTEMP_MPY_wvf_snd.mp3: Invalid argument\r\n'
In case it helps, make sure you are using a recent version of FFMPEG (the versions in the Ubuntu/Debian repos are deprecated).
Above is the error I am getting after running the write_videofile(). I have looked at the documentation and the issues on github, I tried updating the ffmpeg through pip too. I don't know why it can't write the audio file.
The reason could be the special characters in the file name.
At least on windows you can not have '?' in the file name.
You can try other names to check if this is the problem.

pdflatex hang after large number of figures

I have a script that generates a number of figures and puts them in the appendix of a report, e.g.
Appendix
********
.. figure:: images/generated/image_1.png
.. figure:: images/generated/image_2.png
.. figure:: images/generated/image_3.png
... etc
It looks like after a large number (~50) of images, my pdflatex command will hang, and point to one of the graphics in my .tex file around here
...
\begin(figure)[htbp]
\centering
\noindent\sphinxincludegraphics{{image_49}.png}
\end{figure}
\begin(figure)[htbp]
\centering
\noindent\sphinxincludegraphics{{image_50}.png} <--- here
\end{figure}
\begin(figure)[htbp]
\centering
\noindent\sphinxincludegraphics{{image_51}.png}
\end{figure}
...
When pdflatex fails I can't really figure out what to make from the console output, I get a number of these lines which seem to be good news
<image_48.png, id=451, 411.939pt x 327.3831pt>
File: image_48.png Graphic file (type png)
<use image_48.png>
Package pdftex.def Info: image_48.png used on input line 1251.
(pdftex.def) Requested size: 411.93797pt x 327.3823pt.
<image_49.png, id=452, 411.939pt x 327.3831pt>
File: image_49.png Graphic file (type png)
<use image_49.png>
Package pdftex.def Info: image_49.png used on input line 1257.
(pdftex.def) Requested size: 411.93797pt x 327.3823pt.
Then after the last successful image (~50) it starts outputting
! Output loop---100 consecutive dead cycles.
\end#float ...loatpenalty <-\#Mii \penalty -\#Miv
\#tempdima \prevdepth \vbo...
l.1258 \end{figure}
I've concluded that your \output is awry; it never does a
\shipout, so I'm shipping \box255 out myself. Next time
increase \maxdeadcycles if you want me to be more patient!
[9
! Undefined control sequence.
\reserved#a ->\#nil
l.1258 \end{figure}
The control sequence at the end of the top line
of your error message was never \def'ed. If you have
misspelled it (e.g., `\hobx'), type `I' and the correct
spelling (e.g., `I\hbox'). Otherwise just continue,
and I'll forget about whatever was undefined.
If all I do is reduce the number of figures, it will run and produce a pdf without issue. Is there a hard limit to the number of images a section can have? Is there somewhere else I can look in the build log to narrow down why this is happening?
This seemed to be a combination of a couple things.
The first symptom was essentially an error caused by too many unprocessed floats. The fix for this was to add the following to the babel element of latex_elements
\usepackage[maxfloats=256]{morefloats}
The second symptom was complaining about Output loop---100 consecutive dead cycles. so the fix was simply to increase the number of cycles
\maxdeadcycles=1000
After these two adjustments, the pdflatex command will finish successfully now, even with a large number of figures.
I had this problem and the above suggestions did not work. I was however able to get it to run just fine by inserting subsections which may or may not compatible with your objectives. The script generates code as follows which is then input into another code snippet to preview the generated images,
( I'm generating svg plots from c++, converting to png, and previewing essentially raw data for selection into later plots that go into an actual document not just a collection of images )
\subsection{svghappy2.tyrosine.png}
\begin{figure}[htbp]
\testplot{svghappy2_tyrosine.png}
\caption{svghappy2.tyrosine.png}
\end{figure}
\subsection{svghappy2.valine.png}
\begin{figure}[htbp]
\testplot{svghappy2_valine.png}
\caption{svghappy2.valine.png}
\end{figure}
As the problem arises from the compiler having hard time to set all the images. Splitting between them would help. As #mike-marchywka noted sections may do the trick, but so would other things, such as \pagebreak or \FloatBarrier from placeins

Categories