Copy PowerPoint file with Python pptx - python

I'm trying to write Python code to copy a PowerPoint file. I have the file test.pptx, which contains pictures (these can be ignored entirely) and text, on slides with different layouts (title, title and content, etc.). I need to copy the text from this file and create a new .pptx file containing the text in the same format.
I already have code to extract the text from the file, but I have no idea what to do next. Any ideas? Thank you.
This is the code that reads the text from the file:
import collections.abc
from pptx import Presentation
prs = Presentation('test.pptx')
text_runs = []
for slide in prs.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                text_runs.append(run.text)
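One possible next step, offered only as a sketch, is to keep the extracted text grouped by slide and shape instead of in one flat list, so that the structure is still available when building the new file. The function name collect_slide_text is made up for this illustration; it relies only on the python-pptx attributes already used above plus shape.name and paragraph.text.

```python
def collect_slide_text(prs):
    """Return one entry per slide: a list of (shape_name, [paragraph_text, ...])."""
    slides = []
    for slide in prs.slides:
        shapes = []
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures and other shapes without text are skipped
            paragraphs = [p.text for p in shape.text_frame.paragraphs]
            shapes.append((shape.name, paragraphs))
        slides.append(shapes)
    return slides


# Usage (assumes python-pptx is installed and test.pptx exists):
#   from pptx import Presentation
#   for number, shapes in enumerate(collect_slide_text(Presentation("test.pptx")), 1):
#       print(number, shapes)
```

If the goal is simply a second file with identical content and formatting, loading the presentation and calling prs.save('copy.pptx') writes it out under a new name without any rebuilding.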

Related

Select slides from pptx using python automation

Hello,
To print specific slides from a PowerPoint file based on a list of items, we currently use PowerPoint's built-in search (Ctrl+F) and follow this simple process: find a match, note the slide ID, go to the next match, note the second slide ID, and so on, then print the slides with those IDs. This takes a lot of work.
Based on the above, we would like to automate this task with a Python script.
This is my attempt:
from pptx import Presentation
filename = "C:/Users/RElKassah/Desktop/test.pptx"
prs = Presentation(filename)
text="test"
for slide in prs.slides:
    if slide.shapes ==text:
        title = slide.shapes.text.find = 'test'
        print(title)
Thank you very much and best regards.
Here is a simple loop that finds text inside a pptx file and prints the number of each slide that contains that text. Hope it helps.
from pptx import Presentation
filename = 'test.pptx'
prs = Presentation(filename)
text="test"
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    if text in run.text:
                        print(prs.slides.index(slide)+1)
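The loop above prints a slide number once per matching run, so a slide with several hits appears several times. A small variation, sketched here under the assumption that a list of unique slide numbers is what the printing step needs, collects each slide at most once. The name slides_containing is invented for this example; it checks shape.text_frame.text, so a phrase that PowerPoint has split across several runs is still found.

```python
def slides_containing(prs, needle):
    """Return the 1-based numbers of slides whose text contains `needle`."""
    matches = []
    for number, slide in enumerate(prs.slides, start=1):
        for shape in slide.shapes:
            if shape.has_text_frame and needle in shape.text_frame.text:
                matches.append(number)
                break  # one hit is enough for this slide
    return matches


# Usage (assumes python-pptx is installed and test.pptx exists):
#   from pptx import Presentation
#   print(slides_containing(Presentation("test.pptx"), "test"))
```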

How to save your tkinter text file as a image(.png)?

I made a Tkinter GUI, added a Text widget to it, and added a save button. When I click the save button, the content is saved as a text file (.txt), but I want it saved as a read-only picture (.png). Can anybody help?
In simple words:
What's happening: the user saves the file and it is saved as a .txt file.
What I want: the user saves the file and it is saved as a .png image.
It is a Text widget, not a Canvas.
(Originally I wanted to save it as a PDF, because I have an option that changes the color of the text according to the user's choice, but that didn't work. If you can do that, it would also work.) :)
I think this will help you:
from PIL import Image, ImageDraw
img = Image.new('RGB', (100, 30))
d = ImageDraw.Draw(img)
d.text((10,10), "Hello", fill=(255,255,0))
img.save('text.png')
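This example uses Pillow. First we create a new image, then we draw the text onto it with the chosen fill color, and finally we save the image.
Write your code in a text file, then go to the File menu, click Save As, and save the file with a .png extension (for example, cod.png). Note that this only changes the file name; the contents are still plain text, so image viewers will not display it as a picture.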

How to extract ALL IMAGES and text from all pptx file slides using python?

I'm able to read images from a pptx file, but not all of them. I can't extract the images on slides that also have a title or other text. Here is my code; please help.
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
import glob
import os
import codecs
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '/usr/local/Cellar/tesseract/4.1.1/bin/tesseract'
from pytesseract import image_to_string
n=0
def write_image(shape):
    global n
    image = shape.image
    # get image
    image_bytes = image.blob
    # assigning file name, e.g. 'image.jpg'
    image_filename = fname[:-5]+'{:03d}.{}'.format(n, image.ext)
    n += 1
    print(image_filename)
    os.chdir("directory_path/readpptx/images")
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)
    os.chdir("directory_path/readpptx")
def visitor(shape):
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        write_image(shape)
def iter_picture_shapes(prs1):
    for slide in prs1.slides:
        for shape in slide.shapes:
            visitor(shape)
file = open("directory_path/MyFile.txt","a+")
for each_file in glob.glob("directory_path/*.pptx"):
    fname = os.path.basename(each_file)
    file.write("-------------------"+fname+"----------------------\n")
    prs = Presentation(each_file)
    print("---------------"+fname+"-------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            if hasattr(shape, "text"):
                print(shape.text)
                file.write(shape.text+"\n")
    iter_picture_shapes(prs)
file.close()
The code above extracts images from slides that have no text or title, but not from slides that do have text or a title.
Try also iterating over the slide masters and slide layouts. If there are "background" images, that's where they will be. The same "for shape in slide.shapes:" mechanism works on slide masters and slide layouts; they are variants of the polymorphic Slide object with the same shape-access semantics.
I don't think your problem is strictly related to the presence of a title or text on the slide. Perhaps those particular slides use a layout that includes some background images. If you open the slide and clicking on the image does not select it (that is, give it a bounding box), it is a background image and resides on the slide layout or possibly the slide master. This is how logos are commonly made to show up on every slide.
You may also want to iterate over the notes slide of each slide that has one, if it contains text or images you are interested in. It is uncommon to find images in the slide notes, but PowerPoint supports it.
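The places mentioned above can be walked with one generator. This is only a sketch: iter_shapes_everywhere is a name invented here, and it assumes the python-pptx attributes prs.slide_masters, master.slide_layouts, slide.has_notes_slide and slide.notes_slide behave as documented. Checking has_notes_slide first matters because accessing notes_slide on a slide without notes creates an empty one.

```python
def iter_shapes_everywhere(prs):
    """Yield (source, shape) for slides, notes slides, masters and layouts."""
    for number, slide in enumerate(prs.slides, start=1):
        for shape in slide.shapes:
            yield f"slide {number}", shape
        if slide.has_notes_slide:  # avoids creating an empty notes slide
            for shape in slide.notes_slide.shapes:
                yield f"notes {number}", shape
    for master in prs.slide_masters:
        for shape in master.shapes:
            yield "master", shape
        for layout in master.slide_layouts:
            for shape in layout.shapes:
                yield f"layout {layout.name}", shape


# Usage with the visitor() function from the question:
#   for source, shape in iter_shapes_everywhere(prs):
#       visitor(shape)
```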
Another approach is to traverse the underlying .pptx package (as a Zip archive) and extract the images that way.
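A minimal sketch of the Zip approach, using only the standard library. It assumes the usual package layout in which PowerPoint stores embedded media under ppt/media/; that folder can also hold audio or video, so filter by extension if only images are wanted. The function name extract_media and its parameters are made up for this example.

```python
import zipfile
from pathlib import Path


def extract_media(pptx_path, out_dir):
    """Copy every file under ppt/media/ in the package into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as package:
        for name in package.namelist():
            # Skip directory entries and everything outside the media folder.
            if name.startswith("ppt/media/") and not name.endswith("/"):
                target = out_dir / Path(name).name
                target.write_bytes(package.read(name))
                written.append(target.name)
    return sorted(written)
```

This finds every embedded image regardless of whether it sits on a slide, a layout, a master or a notes page, but it does not tell you which slide an image belongs to.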

Unable to delete PowerPoint Slides using Python-pptx

I am trying to delete PowerPoint slides containing a specific keyword using python-pptx. If the keyword is present anywhere in a slide, that slide should be deleted. My code is given below:
from pptx import Presentation
String = 'Macro'
ppt = Presentation('D:\\Shaon\\pptss\\Regional.pptx')
for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            shape.text = String
            slide.delete(slide)
ppt.save('BODd.pptx')
After execution I am getting a memory error. No clue how to resolve this issue. How can I delete ppt slides using some specific keywords?
You can delete a slide with a specific index value with the following code using the pptx library:
from pptx import Presentation
# load the presentation
presentation = Presentation('new.pptx')
# index of the slide to delete (0-based)
index = 0
xml_slides = presentation.slides._sldIdLst
slides = list(xml_slides)
xml_slides.remove(slides[index])
So to delete the first slide, index would be 0.
You can delete all the slides with the following code. Run it before generating new slides to start from a clean, empty PowerPoint file. By changing the index, you can also delete specific slides.
import os
from pptx import Presentation
cwd = os.getcwd()
prs = Presentation(os.path.join(cwd, 'ppt.pptx'))
# walk backwards so the remaining indices stay valid
for i in range(len(prs.slides)-1, -1, -1):
    rId = prs.slides._sldIdLst[i].rId
    prs.part.drop_rel(rId)
    del prs.slides._sldIdLst[i]
I was trying to delete all slides except the cover from a pptx file in order to reuse the layouts, and the best solution I found was to loop over Ferhat's answer.
It worked :)
# Ferhat showed us how to list the slides:
xml_slides = prs.slides._sldIdLst
slides = list(xml_slides)
# Then I loop over all of them except the first (index 0):
for index in range(1,len(slides)):
    xml_slides.remove(slides[index])
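Putting the answers together for the original question (delete every slide that contains a keyword) might look like the sketch below. It is not an official python-pptx feature: it leans on the same private _sldIdLst attribute and prs.part.drop_rel call used above, which may change between library versions, and the name delete_slides_containing is invented for this example. The matching indices are collected first and removed from the end, so earlier indices stay valid.

```python
def delete_slides_containing(prs, keyword):
    """Remove every slide whose text contains `keyword`; return removed indices."""
    doomed = [
        index
        for index, slide in enumerate(prs.slides)
        if any(
            shape.has_text_frame and keyword in shape.text_frame.text
            for shape in slide.shapes
        )
    ]
    sld_id_lst = prs.slides._sldIdLst  # private API, may change between versions
    # Delete from the end so the indices still to be processed do not shift.
    for index in reversed(doomed):
        prs.part.drop_rel(sld_id_lst[index].rId)
        del sld_id_lst[index]
    return doomed


# Usage (assumes python-pptx is installed; file names are placeholders):
#   from pptx import Presentation
#   ppt = Presentation("Regional.pptx")
#   delete_slides_containing(ppt, "Macro")
#   ppt.save("BODd.pptx")
```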

Extract image position from .docx file using python-docx

I'm trying to get the index of each image in a .docx file using the python-docx library. I'm able to extract the image name, height and width, but not the position of the image in the Word file.
import docx
doc = docx.Document(filename)
for s in doc.inline_shapes:
    print (s.height.cm,s.width.cm,s._inline.graphic.graphicData.pic.nvPicPr.cNvPr.name)
Output:
21.228 15.920 IMG_20160910_220903848.jpg
I would also like to know whether there is a simpler way to get the image name, the way s.height.cm gives me the height in cm. My primary requirement is to find out where the image is in the document, because I need to extract the image, do some work on it, and then put it back in the same location.
This operation is not directly supported by the API.
However, if you're willing to dig into the internals a bit and use the underlying lxml API it's possible.
The general approach would be to access the ImagePart instance corresponding to the picture you want to inspect and modify, then read and write the ._blob attribute (which holds the image file as bytes).
This specimen XML might be helpful:
http://python-docx.readthedocs.io/en/latest/dev/analysis/features/shapes/picture.html#specimen-xml
From the inline shape containing the picture, you get the <a:blip> element with this:
blip = inline_shape._inline.graphic.graphicData.pic.blipFill.blip
The relationship id (r:id generally, but r:embed in this case) is available at:
rId = blip.embed
Then you can get the image part from the document part
document_part = document.part
image_part = document_part.related_parts[rId]
And then the binary image is available for read and write on ._blob.
If you write a new blob, it will replace the prior image when saved.
You probably want to get it working with a single image and get a feel for it before scaling up to multiple images in a single document.
There might be one or two image characteristics that are cached, so you might not get all the finer points working until you save and reload the file, so just be alert for that.
Not for the faint of heart as you can see, but should work if you want it bad enough and can trace through the code a bit :)
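The steps above can be strung together roughly as follows. This is a sketch built from the attribute path described in this answer, not a supported API: inline_image_blobs is a made-up name, and the code assumes each picture is embedded rather than linked (inline shapes with no pic element or no r:embed value are skipped).

```python
def inline_image_blobs(document):
    """Return (picture_name, rId, image_bytes) for each embedded inline picture."""
    results = []
    for inline_shape in document.inline_shapes:
        pic = inline_shape._inline.graphic.graphicData.pic
        if pic is None:
            continue  # e.g. an inline chart rather than a picture
        rId = pic.blipFill.blip.embed  # r:embed attribute on <a:blip>
        if rId is None:
            continue  # linked picture, no image part inside the package
        image_part = document.part.related_parts[rId]
        results.append((pic.nvPicPr.cNvPr.name, rId, image_part._blob))
    return results


# Usage (assumes python-docx is installed; the file name is a placeholder):
#   import docx
#   for name, rId, blob in inline_image_blobs(docx.Document("report.docx")):
#       print(name, rId, len(blob))
```

To replace a picture, assign new bytes to image_part._blob for the chosen rId and save the document, as described above.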
You can also inspect the paragraphs with a simple loop and check which ones contain an image in their XML (for example, whether the XML contains "graphicData"); those paragraphs are the image containers. You can do the same with runs:
from docx import Document
image_paragraphs = []
doc = Document(path_to_docx)
for par in doc.paragraphs:
    if 'graphicData' in par._p.xml:
        image_paragraphs.append(par)
Then unzip the docx file: the images are in the word/media folder, and they are usually in the same order as in the image_paragraphs list. Each paragraph element gives you many options for changing it. If you want to extract an image, process it, and then insert it in the same place, then:
paragraph.clear()
paragraph.add_run('your description, if needed')
run = paragraph.runs[0]
run.add_picture(path_to_pic, width, height)
I've never really written any answers here, but I think this might be the solution to your problem. With this little piece of code you can see the positions of your images among all the paragraphs. Hope it helps.
import docx
doc = docx.Document(filename)
paraGr = []
index = []
par = doc.paragraphs
for i in range(len(par)):
    paraGr.append(par[i].text)
    if 'graphicData' in par[i]._p.xml:
        index.append(i)
If you are using Python 3, install the library first:
pip install python-docx
Then:
import docx
doc = docx.Document(document_path)
P = []
I = []
par = doc.paragraphs
for i in range(len(par)):
    P.append(par[i].text)
    if 'graphicData' in par[i]._p.xml:
        I.append(i)
print(I)
# prints the list of paragraph indices that contain an image
