There is a Python hello-world example for LibreOffice/OpenOffice Writer (included in LibreOffice 4.1.6.2):
def HelloWorldWriter( ):
    """Prints the string 'Hello World(in Python)' into the current document"""
    #get the doc from the scripting context which is made available to all scripts
    desktop = XSCRIPTCONTEXT.getDesktop()
    model = desktop.getCurrentComponent()
    #check whether there's already an opened document. Otherwise, create a new one
    if not hasattr(model, "Text"):
        model = desktop.loadComponentFromURL(
            "private:factory/swriter","_blank", 0, () )
    #get the XText interface
    text = model.Text
    #create an XTextRange at the end of the document
    tRange = text.End
    #and set the string
    tRange.String = "Hello World (in Python)"
    return None
That script checks for an open writer document, creates a new one if one does not exist and outputs a string into that document.
Is there something similar for Libreoffice/Openoffice-calc?
Ideally, it should include:
· Read a table cell
· Write a table cell
· Save as ODT/XLS/CSV
For OpenOffice, have a look at http://stuvel.eu/ooo-python and http://www.apidev.fr/blog/2011/07/18/utiliser-openoffice-avec-python/ (the explanation is in French, but the code is worth reading):
import os, sys

if sys.platform == 'win32':
    #This is required in order to make pyuno usable with the default python interpreter under windows
    #Some environment variables must be modified

    #get the install path from registry
    #(_winreg is the Python 2 name; the module is called winreg on Python 3)
    import _winreg
    value = _winreg.QueryValue(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\OpenOffice.org\UNO\InstallPath')
    install_folder = '\\'.join(value.split('\\')[:-1])

    #modify the environment variables
    os.environ['URE_BOOTSTRAP'] = 'vnd.sun.star.pathname:{0}\\program\\fundamental.ini'.format(install_folder)
    os.environ['UNO_PATH'] = install_folder+'\\program\\'
    sys.path.append(install_folder+'\\Basis\\program')

    paths = ''
    for path in ("\\URE\\bin;", "\\Basis\\program;"):
        paths += install_folder + path
    os.environ['PATH'] = paths + os.environ['PATH']

import uno
Using Calc:
class UnoClient:
    def __init__(self):
        localContext = uno.getComponentContext()
        resolver = localContext.ServiceManager.createInstanceWithContext(
            "com.sun.star.bridge.UnoUrlResolver", localContext)
        self.smgr = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ServiceManager")

    def create_document(self, app):
        remoteContext = self.smgr.getPropertyValue("DefaultContext")
        desktop = self.smgr.createInstanceWithContext( "com.sun.star.frame.Desktop",remoteContext)
        url = "private:factory/{0}".format(app)
        return desktop.loadComponentFromURL(url,"_blank", 0, () )
Use it like this:
calc = UnoClient().create_document('scalc') # create a new spreadsheet
sheet = calc.getSheets().getByIndex(0) # first sheet of the spreadsheet
sheet.getCellByPosition(0, 0).setString("Salut") # a text
sheet.getCellByPosition(0, 1).setValue(3.14) # a number
sheet.getCellByPosition(0, 2).setFormula("=SUM(2+2)") # a formula
sheet.getCellByPosition(0, 2).CellBackColor = int("ff7f00", 16) # RGB background color
sheet.getCellByPosition(0, 2).CharUnderline = 1 # underline
sheet.getCellByPosition(0, 2).CharHeight = 16 # font size
sheet.getCellByPosition(0, 2).CharWeight = 150 # bold
sheet.getCellByPosition(0, 2).CharPosture = 2 # italic
Also have a look at OOSheet: http://oosheet.hacklab.com.br/
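The question also asks about reading a cell and saving in different formats, which the snippets above do not show. Here is a minimal sketch of how that might look. The helper names (CALC_FILTERS, filter_for, read_cell, save_as) are made up for illustration and are not part of UNO. read_cell and save_as assume a Calc document obtained as above and a Python interpreter where pyuno can be imported.

```python
import os

# Export filter names understood by LibreOffice/OpenOffice Calc.
CALC_FILTERS = {
    ".ods": "calc8",
    ".xls": "MS Excel 97",
    ".csv": "Text - txt - csv (StarCalc)",
}


def filter_for(path):
    """Return the Calc export filter name matching the file extension of path."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in CALC_FILTERS:
        raise ValueError("no Calc filter known for extension %r" % ext)
    return CALC_FILTERS[ext]


def read_cell(sheet, col, row):
    """Return the displayed text of the cell at (col, row), both zero-based."""
    return sheet.getCellByPosition(col, row).getString()


def save_as(doc, path):
    """Store a copy of the Calc document at path, format chosen by extension."""
    # Imported here so the pure-Python helpers above work without pyuno.
    import uno
    from com.sun.star.beans import PropertyValue

    prop = PropertyValue()
    prop.Name = "FilterName"
    prop.Value = filter_for(path)
    url = uno.systemPathToFileUrl(os.path.abspath(path))
    doc.storeToURL(url, (prop,))
```

With the calc and sheet objects from the answer, read_cell(sheet, 0, 0) would return the text of the first cell, and save_as(calc, "out.csv") would export a CSV copy.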
Is there any Python way to identify if the PDF has been OCR’d (the quality of the text is bad) vs a searchable PDF (the quality of the text is perfect)?
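Using the PDF metadata:
import pprint
import PyPDF2

def get_doc_info(path):
    pp = pprint.PrettyPrinter(indent=4)
    # open the file in binary mode and let PyPDF2 read from the file object
    with open(path, 'rb') as f:
        pdf_file = PyPDF2.PdfFileReader(f)
        doc_info = pdf_file.getDocumentInfo()
    pp.pprint(doc_info)
    return doc_info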
I get:
result = get_doc_info('PDF_SEARCHABLE_HAS_BEEN_OCRD.pdf')
{   '/Author': 'NAPS2',
    '/CreationDate': "D:20200701104101+02'00'",
    '/Creator': 'NAPS2',
    '/Keywords': '',
    '/ModDate': "D:20200701104101+02'00'",
    '/Producer': 'PDFsharp 1.50.4589 (www.pdfsharp.com)'}
result = get_doc_info('PDF_SEARCHABLE_TRUE.pdf')
{   '/CreationDate': 'D:20210802122000Z',
    '/Creator': 'Quadient CXM AG~Inspire~14.3.49.7',
    '/Producer': ''}
Can I check the type of the PDF (true PDF or OCR PDF) using the Creator field from the PDF metadata?
Is there another way to do this with Python?
If there is no solution to the problem, how can I use deep learning/machine learning to detect the type of a searchable PDF (true or OCR)?
This video explains the difference between a true PDF and an OCR PDF: https://www.youtube.com/watch?v=xs8KQbxsMcw
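To illustrate the metadata idea, here is a rough sketch that looks for the name of a scanning tool in the metadata fields. The function name and the scanner_tools list are assumptions made for this example. Only 'NAPS2' comes from the output above, and metadata can be missing or rewritten, so this gives at best a hint.

```python
def scanner_hint_from_metadata(doc_info, scanner_tools=("NAPS2",)):
    """Return True if /Creator, /Producer or /Author names a known scanning tool.

    doc_info is a dict-like object such as the one printed by get_doc_info.
    scanner_tools is a hypothetical list of tool names to look for.
    """
    fields = ("/Creator", "/Producer", "/Author")
    values = [str(doc_info.get(key) or "").lower() for key in fields]
    return any(tool.lower() in value
               for tool in scanner_tools
               for value in values)


ocr_info = {"/Author": "NAPS2", "/Creator": "NAPS2",
            "/Producer": "PDFsharp 1.50.4589 (www.pdfsharp.com)"}
true_info = {"/Creator": "Quadient CXM AG~Inspire~14.3.49.7", "/Producer": ""}

print(scanner_hint_from_metadata(ocr_info))   # True
print(scanner_hint_from_metadata(true_info))  # False
```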
Not long ago I ran into the same problem!
I developed this function (based on an SO post I cannot recall):
import fitz  # PyMuPDF

# Note: newer PyMuPDF releases rename the camelCase methods used here
# (for example getText -> get_text, getPixmap -> get_pixmap).
def get_scanned_pages_percentage(filepath: str) -> float:
    """
    INPUT: path to a pdf file
    OUTPUT: % of pages OCR'd which include text
    """
    total_pages = 0
    total_scanned_pages = 0
    with fitz.open(filepath) as doc:
        for page in doc:
            text = page.getText().strip()
            if len(text) == 0:
                # Ignore "empty" pages
                continue
            total_pages += 1
            pix1 = page.getPixmap(alpha=False)  # render page to an image
            remove_all_text(doc, page)
            pix2 = page.getPixmap(alpha=False)
            img1 = pix1.getImageData("png")
            img2 = pix2.getImageData("png")
            if img1 == img2:
                # print(f"{page.number} was scanned or has no text")
                if len(text) > 0:
                    # print(f"\tHas text of length {len(text):,} characters")
                    total_scanned_pages += 1
            else:
                pass
    if total_pages == 0:
        return 0
    return (total_scanned_pages / total_pages) * 100
This function returns 100 (or close to it) if the PDF is a scanned image with an OCR'd text layer, and 0 if it is a native digital PDF.
The remove_all_text helper:
def remove_all_text(doc, page):
    """Removes all text from a doc pdf page (metadata)"""
    page.cleanContents()  # syntax cleaning of page appearance commands
    # xref of the cleaned command source (bytes object)
    xref = page.getContents()[0]
    cont = doc.xrefStream(xref)  # read it
    # The command stream is read as bytes; we then search for the operators that delimit text objects and delete them.
    ba_cont = bytearray(cont)  # a modifiable version
    pos = 0
    changed = False  # switch indicates changes
    while pos < len(cont) - 1:
        pos = ba_cont.find(b"BT\n", pos)  # begin text object
        if pos < 0:
            break  # not (more) found
        pos2 = ba_cont.find(b"ET\n", pos)  # end text object
        if pos2 <= pos:
            break  # major error in PDF page definition!
        ba_cont[pos: pos2 + 2] = b""  # remove text object
        changed = True
    if changed:  # we have indeed removed some text
        doc.updateStream(xref, ba_cont)  # write back command stream w/o text
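The byte-level step in remove_all_text can be tried on its own, without PyMuPDF. The sketch below applies the same BT/ET search to a small hand-written content stream. The function name strip_text_objects and the sample stream are invented for illustration. Like the helper above, it assumes the stream has been cleaned so that each operator sits on its own line.

```python
def strip_text_objects(content: bytes) -> bytes:
    """Remove every BT ... ET text object from a PDF content stream."""
    data = bytearray(content)
    pos = 0
    while True:
        pos = data.find(b"BT\n", pos)   # begin text object
        if pos < 0:
            break                        # no (more) text objects
        end = data.find(b"ET\n", pos)   # end text object
        if end < 0:
            break                        # unterminated text object, stop
        del data[pos:end + 2]            # drop "BT ... ET", keep the newline
    return bytes(data)


# A tiny stream: save state, draw some text, restore state.
sample = b"q\nBT\n/F1 12 Tf\n(Hi) Tj\nET\nQ\n"
print(strip_text_objects(sample))  # b'q\n\nQ\n'
```

If rendering a page before and after this removal gives identical images, the visible content does not come from the text layer, which is the signal the percentage function relies on.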
Here is a simple example from an application of mine.
I am developing a GUI with tkinter, and my work is split across three files.
The first holds the constants:
CostoMedioEnergia = float(0.2)
IncentivoAutFisico_Prop_Impianti = float(0.2)
IncentivoAutFisico_DirittoSup = float(0.06)
IncentivoAutFisico_Finanziatore = float(0.140)
CostoFotovoltaico = float(1000)
ProduzioneFotovoltaico = float(1300)
IncentivoRitDedicato = float(0.045)
IncentivoMise = float(0.110)
IncentivoArera = float(0.009)
another file where I have the prototypes of the classes of users (here an example):
class ProprietarioImpianti(Parametri):
    def __init__(self, ConsumiPrevisti, PotenzaContatore, PotenzaFotovoltaicoInstallato, PercEnerAutoconsumo, PercIncentivoAutconsumo, PercEnerCondivisa, PercMise, PercArera, PercIncentivoRitDedicato, IndiceUtente):
        self.ConsumiPrevisti = ConsumiPrevisti
        self.PotenzaContatore = PotenzaContatore
        self.PotenzaFotovoltaicoInstallato = PotenzaFotovoltaicoInstallato
        self.PercEnerCondivisa = PercEnerCondivisa  # Energy in shared self-consumption (percentage of the total energy available for sharing)
        self.PercMise = PercMise  # Percentage of the Mise incentive assigned to the user
        self.PercArera = PercArera  # Percentage of the Arera incentive assigned to the user
        self.PercEnerAutoconsumo = PercEnerAutoconsumo  # Percentage of the produced energy that is physically self-consumed
        self.PercIncentivoAutoconsumo = PercIncentivoAutconsumo  # Percentage of the self-consumption incentive assigned to the user
        self.PercIncentivoRitDedicato = PercIncentivoRitDedicato
        self.IndiceUtente = IndiceUtente

    @property
    def CostoMedioBolletta(self):
        return self.ConsumiPrevisti * self.PotenzaContatore * CostoMedioEnergia
As you can see in the last line, I'm using "CostoMedioEnergia", which is a constant imported from the previous file. The third and last file is my tkinter GUI (you don't necessarily need it), where I input some data and instantiate classes accordingly.
My question is: I want to enter those constants from the first file (such as CostoMedioEnergia) through the tkinter GUI. What would be the best way to update my code so that I can use parameters from user input? I thought of creating a new class in the second file (where all my other classes are) and using inheritance. But that way I'd add a lot of attributes to my classes, when I only need to store that information once and make it accessible to all classes without creating "copies" of it in my subclasses.
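One possible approach, shown here only as a sketch, is to keep the values in a single shared object that the GUI updates and that the classes read at call time, so nothing is copied into subclasses. The names Costanti, costanti and aggiorna_costante are hypothetical, and ProprietarioImpianti is reduced to two attributes to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Costanti:
    """Holds the values that used to be module-level constants."""
    CostoMedioEnergia: float = 0.2
    CostoFotovoltaico: float = 1000.0
    ProduzioneFotovoltaico: float = 1300.0


# One shared instance, created once and imported wherever it is needed.
costanti = Costanti()


def aggiorna_costante(nome, testo):
    """Update one value from a string, e.g. the text read from an Entry widget."""
    if not hasattr(costanti, nome):
        raise AttributeError("unknown constant: %s" % nome)
    setattr(costanti, nome, float(testo))


class ProprietarioImpianti:
    def __init__(self, ConsumiPrevisti, PotenzaContatore):
        self.ConsumiPrevisti = ConsumiPrevisti
        self.PotenzaContatore = PotenzaContatore

    @property
    def CostoMedioBolletta(self):
        # The shared value is looked up on every access, so GUI changes apply.
        return self.ConsumiPrevisti * self.PotenzaContatore * costanti.CostoMedioEnergia
```

Because the lookup happens when the property is read, objects created before the user edits a value still pick up the new value afterwards.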
I'm new to VBA but have some experience with Python, and I'm working on a project that needs to scrape a webpage in order to get some info. Once this info is fetched, it must fill a worksheet cell.
I've found some help here and here (mainly the second link), but I guess I'm missing something, because the Python shell window blinks quickly on the screen and then nothing happens. I've used a MsgBox to "print" the return value and got nothing as well, as if the script wasn't running.
Disclaimer: I'm not using Shell.Run because I want to receive my Python script's return value in VBA, so I can fill the cells one by one accordingly.
Here's the code:
Private Sub CommandButton1_Click()
    Dim codigo As String
    codigo = InputBox("Leia o QRCode.", "...")

    'Date
    Dim dateString As String
    code1 = Now

    'Hour
    Dim hourString As String
    code2 = Hour(Now)

    'Model
    Dim theShell As Object
    Dim theExec As Object
    Dim runPath As String
    Dim modelName As String
    Dim model As String
    Dim theOutput As Object

    Set theShell = CreateObject("WScript.Shell")
    runPath = "python " & ActiveWorkbook.Path & "\get_modelo.py " & "'" & codigo & "'"
    Set theExec = theShell.Exec(runPath)
    Set theOutput = theExec.StdOut
    modelName = theOutput.ReadLine

    While Not theOutput.AtEndOfStream
        modelName = theOutput.ReadLine
        MsgBox modelName
        If modelName <> "" Then model = model & modelName
    Wend

    Set theShell = Nothing
    Set theExec = Nothing
    Set theOutput = Nothing

    'We need to find some way to get
    'the link when reading the QR code, or we will
    'have to generate the link in our script as well

    'Batch
    Worksheets("database").Range("A2").Value = dateString
    Worksheets("database").Range("B2").Value = hourString
    Worksheets("database").Range("C2").Value = model
End Sub
My Python code:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import webbrowser
import sys

def get_modelo(address_string):
    driver = webdriver.Chrome(ChromeDriverManager().install())
    driver.set_window_position(-10000, 0)
    driver.get(address_string)
    model = driver.find_element_by_xpath(r'//*[@id="top90"]/h1')
    return model.text

print(get_modelo(sys.argv[1]))
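One detail that may matter here: the VBA code wraps the argument in single quotes, and on Windows single quotes are generally not treated as quoting characters on a command line, so the script may receive them as part of sys.argv[1]. Below is a small sketch of how the Python side could guard against that. The helper name clean_argument is made up for this example, and this is only one possible cause, not a confirmed diagnosis.

```python
def clean_argument(arg):
    """Strip surrounding whitespace and one pair of matching quotes."""
    arg = arg.strip()
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "'\"":
        arg = arg[1:-1]
    return arg


# What the script might see if the quotes are passed through literally:
print(clean_argument("'http://example.com/page'"))  # http://example.com/page
```

In the script above this would be used as get_modelo(clean_argument(sys.argv[1])).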
import time

def enr_score(nom_jouer, n, con):
    # append to Score.txt; the relative path is resolved against the current working directory
    with open('Score.txt', 'a') as score:
        seconds = time.time()
        local_time = time.ctime(seconds)
        score.write("Temp de registration: ")
        score.write(local_time)
        score.write("JOUER: ")
        score.write(nom_jouer)
        score.write("\n")
        score.write("Il/Elle a fini le jeu avec - ")
        score.write(n)
        score.write(" - disques apres - ")
        score.write(con)
        score.write(" - tentatives.")
        score.write("\n")
    return "Ton score a ete enregistre!"
I got this code but for some reason when I check the Score.txt file it's empty. Shouldn't something be written in it?
There's no errors btw
This is the code that calls the function btw
nom_jouer = input("\nComment vous appelez vous? \n \nUSERNAME: ") # ask for the player's name
from Partie_E import enr_score
nr_disq = str(n)
tent = str(con)
enr_score(nom_jouer, nr_disq, tent)
Ok. So, basically I was a bit dumb (I'm relatively new to programming) and didn't realize I was editing another file than the one I was looking at.
It created the file in an outside folder and I thought it would edit the already existing one in the same folder as the .py files.
Sorry for the mind-F.
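For anyone hitting the same confusion: a relative name like 'Score.txt' is resolved against the current working directory, not the folder of the .py file. Here is a small sketch that pins the file next to the script instead. The helper names score_path_for and append_line are invented for this example.

```python
from pathlib import Path


def score_path_for(script_file, name="Score.txt"):
    """Return the path of `name` inside the folder that contains script_file."""
    return Path(script_file).resolve().parent / name


def append_line(path, line):
    """Append one line to the file and return its absolute path."""
    with open(path, "a", encoding="utf-8") as score:
        score.write(line + "\n")
    return Path(path).resolve()
```

Inside Partie_E.py this could be called as append_line(score_path_for(__file__), "JOUER: ..."). Printing the returned path shows which file was actually written.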
I've got a problem with updating the table of contents in a docx file generated by python-docx on Linux. In general, it is not difficult to create a TOC (thanks to this answer https://stackoverflow.com/a/48622274/9472173 and this thread https://github.com/python-openxml/python-docx/issues/36):
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
paragraph = self.document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar') # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
instrText.text = r'TOC \o "1-3" \h \z \u' # raw string so the backslashes are kept; change 1-3 depending on heading levels you need
fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
fldChar3 = OxmlElement('w:t')
fldChar3.text = "Right-click to update field."
fldChar2.append(fldChar3)
fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')
r_element = run._r
r_element.append(fldChar)
r_element.append(instrText)
r_element.append(fldChar2)
r_element.append(fldChar4)
p_element = paragraph._p
But to make the TOC visible afterwards, the fields have to be updated. The solution above requires updating it manually (right-click on the TOC hint and choose 'update fields'). For automatic updating, I found the following solution, which automates the Word application (thanks to this answer https://stackoverflow.com/a/34818909/9472173):
import win32com.client
import inspect, os

def update_toc(docx_file):
    word = win32com.client.DispatchEx("Word.Application")
    doc = word.Documents.Open(docx_file)
    doc.TablesOfContents(1).Update()
    doc.Close(SaveChanges=True)
    word.Quit()

def main():
    script_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
    file_name = 'doc_with_toc.docx'
    file_path = os.path.join(script_dir, file_name)
    update_toc(file_path)

if __name__ == "__main__":
    main()
It works well on Windows, but obviously not on Linux. Does anyone have an idea how to provide the same functionality on Linux? My only idea is to use local URLs (anchors) to every heading, but I am not sure whether that is possible with python-docx, and I'm not very familiar with these OpenXML features. I would appreciate any help.
I found a solution in this GitHub issue. It works on Ubuntu.
import lxml.etree
from docx import Document

def set_updatefields_true(docx_path):
    namespace = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
    doc = Document(docx_path)
    # add child to doc.settings element
    element_updatefields = lxml.etree.SubElement(
        doc.settings.element, f"{namespace}updateFields"
    )
    element_updatefields.set(f"{namespace}val", "true")
    doc.save(docx_path)
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

def update_table_of_contents(doc):
    # Find the settings element in the document
    settings_element = doc.settings.element
    # Create an "updateFields" element and set its "val" attribute to "true"
    update_fields_element = OxmlElement('w:updateFields')
    update_fields_element.set(qn('w:val'), 'true')
    # Add the "updateFields" element to the settings element
    settings_element.append(update_fields_element)
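Both snippets add a w:updateFields element to the document settings. To see what that produces without python-docx installed, here is a sketch using only the standard library on a stand-alone settings element. The helper name add_update_fields is made up, and the check for an existing element is an addition of this sketch, so that calling it twice does not leave two copies.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
UPDATE_FIELDS_TAG = "{%s}updateFields" % W_NS


def add_update_fields(settings_root):
    """Ensure settings_root has exactly one w:updateFields child with w:val="true"."""
    element = settings_root.find(UPDATE_FIELDS_TAG)
    if element is None:
        element = ET.SubElement(settings_root, UPDATE_FIELDS_TAG)
    element.set("{%s}val" % W_NS, "true")
    return element


ET.register_namespace("w", W_NS)
settings = ET.Element("{%s}settings" % W_NS)  # stand-in for doc.settings.element
add_update_fields(settings)
add_update_fields(settings)  # second call reuses the existing element
print(ET.tostring(settings, encoding="unicode"))
```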