Pulling PowerPoint Text Attributes Through Python - python

I am trying to pull in the attributes associated with my text in PowerPoint and am getting unexpected output: the value returned by shape.fill is not what I expected. I am also curious how to get other attributes such as shape.font and the position of the shape - is this possible?
Issue:
f = shape.fill
Output: <pptx.dml.fill.FillFormat object at 0x00000215C4D6DD90>
Code:
import glob
import os

import nltk
import pandas as pd
from pptx import Presentation

# direct, terms and phrases are assumed to be defined earlier in the script
mylist = []
mylist2 = []
mylist3 = []
mylist4 = []
mylist5 = []
mylist6 = []
mylist7 = []
for eachfile in glob.glob(direct):
    s = 1
    file = os.path.basename(eachfile)
    try:
        prs = Presentation(eachfile)
        for slide in prs.slides:
            for shape in slide.shapes:
                if hasattr(shape, "text"):
                    x = nltk.word_tokenize(shape.text)
                    t = shape.text
                    f = shape.fill
                    print(f)
                    mylist4.append(file)
                    mylist5.append(t)
                    mylist7.append(f)
                    mylist6.append('Slide: ' + str(s))
                    # x = shape.text.split()  # looks for words with punctuation included
                    for word in x:
                        word = word.lower()
                        if word in terms:
                            mylist.append("Slide " + str(s))
                            mylist2.append(file)
                            mylist3.append(word)
            s = s + 1
    except:
        pass
#mylist = list(dict.fromkeys(mylist))
d = {'FileName':mylist2, 'Slide':mylist, 'Match':mylist3}
d2 = {'FileName':mylist4, 'Slide':mylist6, 'Text':mylist5, 'Color':mylist7}
search = phrases + terms
d3 = {'Text':search}
df = pd.DataFrame(d)
df = df.drop_duplicates()

<pptx.dml.fill.FillFormat object at 0x00000215C4D6DD90> is a Python object. You need to look up the documentation for this type of object and use its attributes in order to get information out of it.
The only documentation I could find for this type of object is the source code itself rather than "normal" reference documentation. The attributes you can use are defined inside the FillFormat class, starting with back_color(self, ...).

The API documentation describes what you should expect on any given attribute. For example, here: https://python-pptx.readthedocs.io/en/latest/api/dml.html#fillformat-objects
you can find out how to interrogate the FillFormat object that Shape.fill returns.
In many cases, things are substantially more complex than the common cases and the API will reflect that. For example, fills come in several varieties: an RGB color (most common), a pattern (repeated bitmap mask), an image (either tiled or fit in a variety of ways), and a "null" fill. Accommodating all these options requires you to learn more about PowerPoint than you probably originally wanted to know :)
The overall API documentation is here: https://python-pptx.readthedocs.io/en/latest/#api-documentation
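To make that concrete, here is a minimal sketch (the file name is hypothetical) of how you might interrogate the fill, the run-level font, and the shape position; which branches actually fire depends on how each shape in your deck is formatted:
from pptx import Presentation
from pptx.enum.dml import MSO_FILL

prs = Presentation("example.pptx")  # hypothetical path
for slide_no, slide in enumerate(prs.slides, start=1):
    for shape in slide.shapes:
        # Position and size are EMU Length objects (may be None for some placeholders).
        if shape.left is not None:
            print(slide_no, shape.shape_type, shape.left.inches, shape.top.inches)

        # Only solid fills carry a fore_color; pattern, picture and
        # background fills need different handling.
        fill = shape.fill
        if fill.type == MSO_FILL.SOLID:
            try:
                print("fill RGB:", fill.fore_color.rgb)
            except AttributeError:
                # not a plain RGB color (e.g. a theme color)
                print("fill color type:", fill.fore_color.type)

        # There is no shape.font; fonts live on the runs inside the text frame.
        if shape.has_text_frame:
            for para in shape.text_frame.paragraphs:
                for run in para.runs:
                    print(run.text, run.font.name, run.font.size)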

Related

How to read element in list item in Python?

I have the following output from a function and I need to read shape, labels, and domain from this stream.
[Annotation(shape=Rectangle(x=0.0, y=0.0, width=1.0, height=1.0), labels=[ScoredLabel(62282a1dc79ed6743e731b36, name=GOOD, probability=0.5143796801567078, domain=CLASSIFICATION, color=Color(red=233, green=97, blue=21, alpha=255), hotkey=ctrl+3)], id=622cc4d962f051a8f41ddf35)]
I need them as follows
shp = Annotation.shape
lbl = Annotation.labels
dmn = domain
It seems simple but I could not figure it out yet.
Given output as a list of Annotation objects:
output = [Annotation(...)]
you ought to be able to simply do:
shp = output[0].shape
lbl = output[0].labels
dmn = lbl[0].domain
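If you need these fields for every annotation rather than just the first, a small sketch along the same lines (assuming output is the list your function returns, as in the repr above):
shapes = [ann.shape for ann in output]
all_labels = [ann.labels for ann in output]
# domain sits on the ScoredLabel objects inside each annotation's labels list
domains = [lbl.domain for ann in output for lbl in ann.labels]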

How to iterate over and download each image in an image collection from the Google Earth Engine python api

I am new to google earth engine and was trying to understand how to use the Google Earth Engine python api. I can create an image collection, but apparently the getdownloadurl() method operates only on individual images. So I am trying to understand how to iterate over and download all of the images in the collection.
Here is my basic code. I broke it out in great detail for some other work I am doing.
import ee
ee.Initialize()
col = ee.ImageCollection('LANDSAT/LC08/C01/T1')
col.filterDate('1/1/2015', '4/30/2015')
pt = ee.Geometry.Point([-2.40986111110000012, 26.76033333330000019])
buff = pt.buffer(300)
region = ee.Feature.bounds(buff)
col.filterBounds(region)
So I pulled the Landsat collection, filtered by date and a buffer geometry. So I should have something like 7-8 images in the collection (with all bands).
However, I could not seem to get iteration to work over the collection.
for example:
for i in col:
    print(i)
The error indicates TypeError: 'ImageCollection' object is not iterable
So if the collection is not iterable, how can I access the individual images?
Once I have an image, I should be able to use the usual
path = col[i].getDownloadUrl({
    'scale': 30,
    'crs': 'EPSG:4326',
    'region': region
})
It's a good idea to use ee.batch.Export for this. Also, it's good practice to avoid mixing client and server functions (reference). For that reason, a for-loop can be used, since Export is a client function. Here's a simple example to get you started:
import ee
ee.Initialize()
rectangle = ee.Geometry.Rectangle([-1, -1, 1, 1])
sillyCollection = ee.ImageCollection([ee.Image(1), ee.Image(2), ee.Image(3)])
# This is OK for small collections
collectionList = sillyCollection.toList(sillyCollection.size())
collectionSize = collectionList.size().getInfo()
for i in range(collectionSize):
    ee.batch.Export.image.toDrive(
        image = ee.Image(collectionList.get(i)).clip(rectangle),
        fileNamePrefix = 'foo' + str(i + 1),
        dimensions = '128x128').start()
Note that converting a collection to a list in this manner is also dangerous for large collections (reference). However, this is probably the most scalable method if you really need to download.
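As a sketch of the same toList-plus-loop idea applied to the original getDownloadURL question (note that filterDate/filterBounds return new collections, so the results have to be reassigned; the collection is assumed small enough to list):
import ee
ee.Initialize()

pt = ee.Geometry.Point([-2.40986111110000012, 26.76033333330000019])
region = pt.buffer(300).bounds()

# filterDate/filterBounds return new collections, so keep the result
col = (ee.ImageCollection('LANDSAT/LC08/C01/T1')
       .filterDate('2015-01-01', '2015-04-30')
       .filterBounds(region))

col_list = col.toList(col.size())  # only safe for small collections
n = col_list.size().getInfo()

for i in range(n):
    img = ee.Image(col_list.get(i))
    url = img.getDownloadURL({
        'scale': 30,
        'crs': 'EPSG:4326',
        'region': region,
    })
    print(url)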
Here is my solution:
import ee
ee.Initialize()
pt = ee.Geometry.Point([-2.40986111110000012, 26.76033333330000019])
region = pt.buffer(10)
col = ee.ImageCollection('LANDSAT/LC08/C01/T1')\
    .filterDate('2015-01-01','2015-04-30')\
    .filterBounds(region)
bands = ['B4','B5']  # Change it!

def accumulate(image, img):
    name_image = image.get('system:index')
    image = image.select([0], [name_image])
    cumm = ee.Image(img).addBands(image)
    return cumm

for band in bands:
    col_band = col.map(lambda img: img.select(band)\
        .set('system:time_start', img.get('system:time_start'))\
        .set('system:index', img.get('system:index')))
    # ImageCollection to List
    col_list = col_band.toList(col_band.size())
    # Define the initial value for iterate.
    base = ee.Image(col_list.get(0))
    base_name = base.get('system:index')
    base = base.select([0], [base_name])
    # Eliminate the image 'base'.
    new_col = ee.ImageCollection(col_list.splice(0, 1))
    img_cummulative = ee.Image(new_col.iterate(accumulate, base))
    task = ee.batch.Export.image.toDrive(
        image = img_cummulative.clip(region),
        folder = 'landsat',
        fileNamePrefix = band,
        scale = 30).start()
    print('Export Image ' + band + ' was submitted, please wait ...')

img_cummulative.bandNames().getInfo()
A reproducible example can be found here: https://colab.research.google.com/drive/1Nv8-l20l82nIQ946WR1iOkr-4b_QhISu
You could possibly use ee.ImageCollection.iterate() with a function that gets the image and adds it to a list.
import ee
def accumulate_images(image, images):
    images.append(image)
    return images

for img in col.iterate(accumulate_images, []):
    url = img.getDownloadURL(dict(scale=30, crs='EPSG:4326', region=region))
Unfortunately I am not able to test this code as I do not have access to the API, but it might help you arrive at a solution.
I have a similar problem and was not able to solve it with the presented solutions, so I have put together a sample code for this purpose. It iterates over an image collection on the client side, so it is not affected by the (server-side only) limitations of .map() or .iterate().
It is possible to download the code and see its explanation here
It basically transforms the ImageCollection into a list (ic.toList()). Then it performs a standard loop, and each individual image can be converted back with ee.Image(list.get(i)) and processed one by one, covering every image in the collection.
In your particular case, to download each image, the function to be called within the loop could be getDownloadURL() or getThumbURL():
var url = imgNew.getDownloadURL({
    region: geometry,
});
var thumbURL = imgNew.getThumbURL({region: geometry, dimensions: 512, format: 'png'});
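Since the question is about the Python API, those two calls translate roughly as follows (a sketch; imgNew and geometry are assumed to already exist):
url = imgNew.getDownloadURL({'region': geometry, 'scale': 30})
thumb_url = imgNew.getThumbURL({'region': geometry, 'dimensions': 512, 'format': 'png'})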

Repeating code when sorting information from a txt-file - Python

Like the title says, I'm having trouble keeping my code from repeating itself when I import data from a txt file. My question is whether there is a smarter way to loop the function. I'm still very new to Python in general, so I don't have good knowledge in this area.
The code that I'm using is the following
with open("fundamenta.txt") as fundamenta:
fundamenta_list = []
for row in fundamenta:
info_1 = row.strip()
fundamenta_list.append(info_1)
namerow_1 = fundamenta_list[1]
sol_1 = fundamenta_list[2]
pe_1 = fundamenta_list[3]
ps_1 = fundamenta_list[4]
namerow_2 = fundamenta_list[5]
sol_2 = fundamenta_list[6]
pe_2 = fundamenta_list[7]
ps_2 = fundamenta_list[8]
namerow_3 = fundamenta_list[9]
sol_3 = fundamenta_list[10]
pe_3 = fundamenta_list[11]
ps_3 = fundamenta_list[12]
So when the code is reading from "fundamenta_list" how do I change to prevent code repetition?
It looks to me like your input file has records as blocks of 4 rows - namerow, sol, pe, ps in turn - and you'll be creating objects that take these 4 fields. Assuming your object is called MyObject, you can do something like:
with open("test.data") as f:
objects = []
while f:
try:
(namerow, sol, pe, ps) = next(f).strip(), next(f).strip(), next(f).strip(), next(f).strip()
objects.append(MyObject(namerow, sol, pe, ps))
except:
break
then you can access your objects as objects[0] etc.
You could even make it into a function returning the list of objects like in Moyote's answer.
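For reference, a sketch of that function form (MyObject is still assumed to be your own record class):
def read_records(file_name):
    objects = []
    with open(file_name) as f:
        while True:
            try:
                namerow = next(f).strip()
                sol = next(f).strip()
                pe = next(f).strip()
                ps = next(f).strip()
            except StopIteration:
                break  # ran out of complete 4-line records
            objects.append(MyObject(namerow, sol, pe, ps))
    return objects

records = read_records("test.data")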
If I understood your question correctly, you may want to make a function out of your code, so you can avoid repeating the same code.
You can do this:
def read_file_and_save_to_list(file_name):
    with open(file_name) as f:
        list_to_return = []
        for row in f:
            list_to_return.append(row.strip())
        return list_to_return
Then afterwards you can call the function like this:
fundamenta_list = read_file_and_save_to_list("fundamenta.txt")
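If you would rather keep your original list-based approach, here is a sketch of how you could group the stripped lines into records of four instead of numbering the variables by hand (this assumes the file really is name, sol, pe, ps repeated, and that your data starts at index 1 as in your snippet):
fundamenta_list = read_file_and_save_to_list("fundamenta.txt")

records = []
for i in range(1, len(fundamenta_list) - 3, 4):
    namerow, sol, pe, ps = fundamenta_list[i:i + 4]
    records.append({"name": namerow, "sol": sol, "pe": pe, "ps": ps})

# records[0]["name"] replaces namerow_1, records[1]["sol"] replaces sol_2, and so on.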

dynamic array of parameters to function python

I have the following functions in python:
def extractParam(word, varName, stringToReplace):
    if word.startswith(stringToReplace):
        varName = int(word.replace(stringToReplace, ''))
    return varName

def getMParams(line):
    l = r = b = t = 0
    sline = line.strip().split()
    for i in range(len(sline)):
        l = extractParam(sline[i], l, "l=")
        r = extractParam(sline[i], r, "r=")
        b = extractParam(sline[i], b, "b=")
        t = extractParam(sline[i], t, "t=")
    return l, r, b, t

def getIterParams(line):
    width = height = stride = x = y = 0
    sline = line.strip().split()
    for i in range(len(sline)):
        width = extractParam(sline[i], width, "width=")
        height = extractParam(sline[i], height, "height=")
        stride = extractParam(sline[i], stride, "stride=")
        x = extractParam(sline[i], x, "x=")
        y = extractParam(sline[i], y, "y=")
    return width, height, stride, x, y
The functions getMParams and getIterParams are nearly identical. My question is whether there is a way to create a single function that replaces both of them. I was thinking about something like this:
def func(line, params):
    # params is a list of parameter names (e.g. [l, r, b, t] or [width, height, stride, x, y])
    # init all params
    sline = line.strip().split()
    for i in range(len(sline)):
        # for every param:
        param = extractParam(sline[i], param, "param=")
Is this possible, or is there another way to do it?
First off, some style points:
The way you handle varName in extractParam is ugly and confusing. It took me a while to figure out what you are trying to do (i.e., allow for the fact that extractParam might not find any data). For now, this is better handled directly instead of trying to call out to a function.
That range(len( thing you're doing has to stop.
You do not need to strip the line before splitting it - any leading and trailing whitespace will disappear during the splitting operation. You will not end up with any extra empty strings in the result.
The name sline is just plain ugly. You've split the line up into words; why not refer to the words as, well, words? (And in any case, don't use abbreviations and jumble things up. Otherwise you get things like sline that are not actually words.)
We don't use namesLikeThis for functions (or anything else; although we do use NamesLikeThis, with a starting capital letter, for classes) in Python. We use names_like_this.
Also, it looks as though you are repeatedly trying to replace
That said, your proposed approach is fine. Note that since we don't know ahead of time how many items will be extracted, we can't just toss each one into a separate variable. But we can solve this easily by returning a dict.
My approach is as follows: I iterate over the names, and for each attempt to find the corresponding word in the line. Upon finding one, I replace the default value of 0, and after this check is done, I insert the corresponding key-value pair into the returned result. I also take a simpler approach to cutting up the word around the equals sign.
def extract_params(line, names):
    words = line.split()
    result = {}
    for name in names:
        value = 0
        for word in words:
            maybe_name, equals, maybe_value = word.partition('=')
            if maybe_name == name and equals == '=':
                value = maybe_value
        result[name] = value
    return result
This could potentially be improved quite a bit, but much depends on your exact specifications. I tried to create something that follows your basic logic as closely as possible.
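A quick usage sketch for the extract_params above (the example lines are made up):
line = "l=1 r=2 b=3 t=4"
print(extract_params(line, ["l", "r", "b", "t"]))
# -> {'l': '1', 'r': '2', 'b': '3', 't': '4'}  (values stay strings here;
#    convert with int() if you need the original int behaviour)

iter_line = "width=640 height=480 stride=2048 x=10 y=20"
print(extract_params(iter_line, ["width", "height", "stride", "x", "y"]))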

Python: Joining and writing (XML.etrees) trees stored in a list

I'm looping over some XML files and producing trees that I would like to store in a defaultdict(list). On each loop, the next child found will be stored in a separate part of the dictionary.
d = defaultdict(list)
counter = 0
for child in root.findall(something):
    tree = ET.ElementTree(something)
    d[int(x)].append(tree)
    counter += 1
So then repeating this for several files would result in nicely indexed results; a set of trees that were in position 1 across different parsed files and so on. The question is, how do I then join all of d, and write the trees (as a cumulative tree) to a file?
I can loop through the dict to get each tree:
for x in d:
    for y in d[x]:
        print(y)
This gives a complete list of trees that were in my dict. Now, how do I produce one massive tree from this?
Sample input file 1
Sample input file 2
Required results from 1&2
Given the apparent difficulty in doing this, I'm happy to accept more general answers that show how I can otherwise get the result I am looking for from two or more files.
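In that general spirit, one direct sketch with ElementTree itself would be to append every stored tree's root under a new wrapper element; whether this matches the exact merged structure you need depends on the sample files, which aren't shown. It assumes d is the defaultdict of trees built in the question's snippet:
import xml.etree.ElementTree as ET

combined_root = ET.Element("combined")  # hypothetical wrapper tag
for x in sorted(d):
    for tree in d[x]:
        combined_root.append(tree.getroot())

ET.ElementTree(combined_root).write("combined.xml", encoding="utf-8", xml_declaration=True)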
Use Spyne:
from spyne.model.primitive import *
from spyne.model.complex import *

class GpsInfo(ComplexModel):
    UTC = DateTime
    Latitude = Double
    Longitude = Double
    DopplerTime = Double
    Quality = Unicode
    HDOP = Unicode
    Altitude = Double
    Speed = Double
    Heading = Double
    Estimated = Boolean

class Header(ComplexModel):
    Name = Unicode
    Time = DateTime
    SeqNo = Integer

class CTrailData(ComplexModel):
    index = UnsignedInteger
    gpsInfo = GpsInfo
    Header = Header

class CTrail(ComplexModel):
    LastError = AnyXml
    MaxTrial = Integer
    Trail = Array(CTrailData)

from lxml import etree
from spyne.util.xml import *

file_1 = get_xml_as_object(etree.fromstring(open('file1').read()), CTrail)
file_2 = get_xml_as_object(etree.fromstring(open('file2').read()), CTrail)
file_1.Trail.extend(file_2.Trail)
file_1.Trail.sort(key=lambda x: x.index)
elt = get_object_as_xml(file_1, no_namespace=True)
print(etree.tostring(elt, pretty_print=True))
While doing this, Spyne also converts the data fields from strings to their native Python types, so it'll be much easier for you to work with the data from this XML document.
Also, if you don't mind using the latest version from git, you can do e.g.:
class GpsInfo(ComplexModel):
    # (...)
    doppler_time = Double(sub_name="DopplerTime")
    # (...)
so that you can get data from the CamelCased tags without having to violate PEP8.
Use lxml.objectify:
from lxml import etree, objectify
obj_1 = objectify.fromstring(open('file1').read())
obj_2 = objectify.fromstring(open('file2').read())
obj_1.Trail.CTrailData.extend(obj_2.Trail.CTrailData)
# .sort() won't work as objectify's lists are not regular python lists.
obj_1.Trail.CTrailData = sorted(obj_1.Trail.CTrailData, key=lambda x: x.index)
print(etree.tostring(obj_1, pretty_print=True))
It doesn't do the additional conversion work that the Spyne variant does, but for your use case, that might be enough.
