Select slides from pptx using python automation - python

Hello,
To print specific slides from a PowerPoint file based on a list of items, we currently search with Ctrl+F in PowerPoint: find the first item, note the slide ID, move to the next item, note the second slide ID, and so on, then print using those IDs. This takes a lot of work.
Based on the above, we would like to automate this task with a Python script.
This is my attempt:
from pptx import Presentation
filename = "C:/Users/RElKassah/Desktop/test.pptx"
prs = Presentation(filename)
text="test"
for slide in prs.slides:
    if slide.shapes ==text:
        title = slide.shapes.text.find = 'test'
        print(title)
Thank you very much and best regards.

Here is a simple loop that finds text inside a pptx file and prints the number of each slide that contains that text. Hope it helps.
from pptx import Presentation
filename = 'test.pptx'
prs = Presentation(filename)
text="test"
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    if text in run.text:
                        print(prs.slides.index(slide)+1)

Related

Copy PowerPoint file with Python pptx

I'm trying to write Python code to copy a PowerPoint file. I have the file test.pptx, which contains pictures (these can be ignored) and text, on slides with different layouts (title, title and content, etc.). I need to copy the text from this file and create a new .pptx file containing the text in the same format.
I already have code to extract the text from the file, but I have no idea what to do next. Any ideas? Thank you.
This is the code to read the text from the file:
import collections.abc
from pptx import Presentation
prs = Presentation('test.pptx')
text_runs = []
for slide in prs.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                text_runs.append(run.text)

How to extract ALL IMAGES and text from all pptx file slides using python?

I'm able to read images from a pptx file, but not all of them. I can't extract the images on slides that also have a title or other text. Here is my code; please help.
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
import glob
import os
import codecs
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '/usr/local/Cellar/tesseract/4.1.1/bin/tesseract'
from pytesseract import image_to_string
n=0
def write_image(shape):
    global n
    image = shape.image
    # get image
    image_bytes = image.blob
    # assigning file name, e.g. 'image.jpg'
    image_filename = fname[:-5]+'{:03d}.{}'.format(n, image.ext)
    n += 1
    print(image_filename)
    os.chdir("directory_path/readpptx/images")
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)
    os.chdir("directory_path/readpptx")
def visitor(shape):
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        write_image(shape)
def iter_picture_shapes(prs1):
    for slide in prs1.slides:
        for shape in slide.shapes:
            visitor(shape)
file = open("directory_path/MyFile.txt","a+")
for each_file in glob.glob("directory_path/*.pptx"):
    fname = os.path.basename(each_file)
    file.write("-------------------"+fname+"----------------------\n")
    prs = Presentation(each_file)
    print("---------------"+fname+"-------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            if hasattr(shape, "text"):
                print(shape.text)
                file.write(shape.text+"\n")
    iter_picture_shapes(prs)
file.close()
The code above extracts images from slides that have no text or title, but not from slides that have text or a title.
Try also iterating over the slide masters and slide layouts. If there are "background" images, that's where they will be. The same "for shape in slide.shapes:" mechanism works on slide masters and slide layouts; they are variants of the polymorphic Slide object with the same shape-access semantics.
I don't think your problem is strictly related to the presence of a title or text on the slide. Perhaps those particular slides use a layout that includes some background images. If you open the slide and clicking on the image does not select it (give it a bounding box), that indicates it is a background image and resides on the slide layout or possibly the slide master. This is how logos are commonly implemented so that they show up on every slide.
You may also want to consider iterating over the notes slide for each slide when it has one, if it contains text and/or images you are interested in. It is uncommon to find images in the slide notes, but PowerPoint supports it.
Another approach is to traverse the underlying .pptx package (as a Zip archive) and extract the images that way.
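The Zip approach needs only the standard library. A minimal sketch, assuming the images are stored under ppt/media/ inside the archive, which is where PowerPoint and python-pptx normally put them. The function name read_media is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def read_media(pptx_file):
    """Return {file name: bytes} for every entry under ppt/media/ in the package."""
    media = {}
    with zipfile.ZipFile(pptx_file) as archive:
        for name in archive.namelist():
            if name.startswith("ppt/media/") and not name.endswith("/"):
                media[PurePosixPath(name).name] = archive.read(name)
    return media


# Usage:
#     for name, data in read_media("test.pptx").items():
#         with open(name, "wb") as f:
#             f.write(data)
```

This finds every media file in the package regardless of whether it sits on a slide, a layout or a master, but it does not say which slide uses which image.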

How to set a background image in python pptx

I am currently working on a project that creates a PowerPoint presentation with python-pptx. I am trying to set an image as the background of a slide, but I can't find a way to do it in the python-pptx docs. Is it possible to set an image as the background? If so, can someone help me? If not, does anyone know another solution using Python?
Thank you
import os
import fnmatch
from pptx import Presentation
# Create the presentation and use the blank layout (6)
prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
# Find the number of slides to create
# First method = count the number of images among the screenshot files (change the path depending on the user)
nbSlide = len(fnmatch.filter(os.listdir("mypath"), '*.jpeg'))
# Loop to create as many slides as there are report pages
for i in range(nbSlide):
    slide = prs.slides.add_slide(blank_slide_layout)
    # change the background of the slide to an image …
    background=slide.background
# Final step = save the pptx
prs.save('test.pptx')
"Is it possible to set an image as the background?"
So far, python-pptx offers no direct way to set an image as the background of a slide.
"If not, does anyone know another solution using Python?"
You could insert the picture into the slide as a regular picture shape, choosing the position and width/height parameters so that it covers the slide:
# Loop to create as many slides as there are report pages
for i in range(nbSlide):
    slide = prs.slides.add_slide(blank_slide_layout)
    # cover the slide with an image instead of changing the background …
    left = top = 0
    pic = slide.shapes.add_picture('/your_file.jpeg', left-0.1*prs.slide_width, top, height = prs.slide_height)

Unable to delete PowerPoint Slides using Python-pptx

I am trying to delete PowerPoint slides that contain a specific keyword using python-pptx. If the keyword is present anywhere in a slide, that slide should be deleted. My code is given below:
from pptx import Presentation
String = 'Macro'
ppt = Presentation('D:\\Shaon\\pptss\\Regional.pptx')
for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            shape.text = String
            slide.delete(slide)
ppt.save('BODd.pptx')
After execution I get a memory error and have no clue how to resolve it. How can I delete slides that contain a specific keyword?
You can delete the slide at a specific index with the following code using the pptx library:
from pptx import Presentation
# create slides ------
presentation = Presentation('new.pptx')
index = 0  # zero-based position of the slide to delete
xml_slides = presentation.slides._sldIdLst
slides = list(xml_slides)
xml_slides.remove(slides[index])
So to delete the first slide, index would be 0.
You can delete all slides with the following code. Run it before generating new slides to start from a clean, empty PowerPoint file. By changing the index, you can also delete specific slides.
import os
import pptx.util
from pptx import Presentation
cwd = os.getcwd()
prs = Presentation(cwd + '\\ppt.pptx')
# Walk backwards so that deleting an entry does not shift the remaining indexes
for i in range(len(prs.slides)-1, -1, -1):
    rId = prs.slides._sldIdLst[i].rId
    prs.part.drop_rel(rId)
    del prs.slides._sldIdLst[i]
I was trying to delete all slides but the cover from a pptx file in order to reuse the layouts, and the best solution I found was to loop over Ferhat's answer.
It worked :)
# Ferhat's answer showed us how to list the slides:
xml_slides = prs.slides._sldIdLst
slides = list(xml_slides)
# Then I loop over all of them except the first (index 0):
for index in range(1,len(slides)):
    xml_slides.remove(slides[index])

How to set slide number in generated slide using python-pptx

I am generating new slides using python-pptx and I see no way to update the slide number. I have set the footer in the master layout, but when I read the shapes of the new slide, I don't see that component there.
My code:
from pptx import Presentation
prs = Presentation('sample_ppt.pptx')
title_slide_layout = prs.slide_layouts[14]
slide = prs.slides.add_slide(title_slide_layout)
for shape in slide.shapes:
    print(shape.name)
prs.save('hh.pptx')
This is a common confusion, possibly because the way PowerPoint handles footers is, well, confusing :)
The short answer is that you need to put a plain-textbox shape (not a footer placeholder) on the master slide, and insert into that textbox a slide-number field, using Insert > Slide Number from the menu. On the master, it will appear something like <#>, but on the slide that inherits from that master, it will appear as the slide number.
