Python - PyGithub: if file exists then update, else create

I'm using the PyGithub library to update files, and I'm facing an issue with that. Generally, we have two options: we can create a new file, or, if the file exists, we can update it.
Doc Ref: https://pygithub.readthedocs.io/en/latest/examples/Repository.html#create-a-new-file-in-the-repository
But the problem is, I want to create a new file if it doesn't exist, and use the update option if it does.
Is this possible with PyGithub?

I followed The Otterlord's suggestion and achieved this. Sharing my code here; it may be helpful to someone.
from github import Github

GITHUB_REPO = "your-repo-name"  # placeholder: name of the target repository
# The GitHub API no longer accepts account passwords; use a personal access token
g = Github("your-access-token")
repo = g.get_user().get_repo(GITHUB_REPO)

# Walk the whole repository and collect every file path
all_files = []
contents = repo.get_contents("")
while contents:
    file_content = contents.pop(0)
    if file_content.type == "dir":
        contents.extend(repo.get_contents(file_content.path))
    else:
        all_files.append(file_content.path)

with open('/tmp/file.txt', 'r') as f:
    content = f.read()

# Upload to GitHub: update if the path already exists, otherwise create it
git_prefix = 'folder1/'
git_file = git_prefix + 'file.txt'
if git_file in all_files:
    contents = repo.get_contents(git_file)
    repo.update_file(contents.path, "committing files", content, contents.sha, branch="master")
    print(git_file + ' UPDATED')
else:
    repo.create_file(git_file, "committing files", content, branch="master")
    print(git_file + ' CREATED')
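Walking the repository with `get_contents()` costs one API request per directory. A sketch of an alternative, assuming the repository is small enough that GitHub does not truncate a recursive tree listing: look up the branch head with `get_branch()` and list every path with a single `get_git_tree(..., recursive=True)` call. The helper name `list_repo_paths` is hypothetical.

```python
def list_repo_paths(repo, branch="master"):
    """Return the set of file paths on `branch`, fetched with one recursive tree call.

    `repo` is a PyGithub Repository; this sketch assumes the tree is not truncated.
    """
    head_sha = repo.get_branch(branch).commit.sha
    tree = repo.get_git_tree(head_sha, recursive=True)
    # Tree entries of type "blob" are files; "tree" entries are directories
    return {element.path for element in tree.tree if element.type == "blob"}
```

With this, the `while contents:` loop above could be replaced by `all_files = list_repo_paths(repo)`, and the `git_file in all_files` check works unchanged.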

If you are interested in knowing whether a single path exists in your repo, you can use something like this and then branch your logic on the result.
import github
from github.Repository import Repository

def does_object_exists_in_branch(repo: Repository, branch: str, object_path: str) -> bool:
    try:
        repo.get_contents(object_path, ref=branch)
        return True
    except github.UnknownObjectException:
        return False
The method in the approved answer is more efficient if you want to check whether multiple files exist in a given path, but I wanted to provide this as an alternative.
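The two answers can be combined into one create-or-update helper. A minimal sketch with the hypothetical name `create_or_update_file`: it tries `get_contents()` on the branch and falls back to `create_file()` when GitHub answers 404. To keep the sketch importable without PyGithub installed, it checks the exception's `status` attribute (which PyGithub's `GithubException`, and therefore `UnknownObjectException`, carries) rather than catching `UnknownObjectException` by name; catching it by name, as in the answer above, works just as well.

```python
def create_or_update_file(repo, path, message, content, branch="master"):
    """Update `path` on `branch` if it exists, otherwise create it.

    Returns "updated" or "created". Any error other than a 404 is re-raised.
    Assumes `path` names a file (get_contents returns a list for directories).
    """
    try:
        existing = repo.get_contents(path, ref=branch)
    except Exception as exc:
        if getattr(exc, "status", None) != 404:
            raise  # not a "file missing" error, so don't hide it
        repo.create_file(path, message, content, branch=branch)
        return "created"
    # Updating requires the blob SHA of the current version
    repo.update_file(existing.path, message, content, existing.sha, branch=branch)
    return "updated"
```

For example, `create_or_update_file(repo, "folder1/file.txt", "committing files", content)` replaces the `if git_file in all_files:` branch from the first answer.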

Related

How to filter filenames with extension on API call?

I was working with the Python Confluence API to download attachments from a Confluence page, and I need to download only files with the .mpp extension. I tried glob and passing parameters directly, but neither worked.
Here is my code:
file_name = glob.glob("*.mpp")
attachments_container = confluence.get_attachments_from_content(page_id=33110, start=0, limit=1,filename=file_name)
print(attachments_container)
attachments = attachments_container['results']
for attachment in attachments:
    fname = attachment['title']
    download_link = confluence.url + attachment['_links']['download']
    r = requests.get(download_link, auth = HTTPBasicAuth(confluence.username,confluence.password))
    if r.status_code == 200:
        if not os.path.exists('phoenix'):
            os.makedirs('phoenix')
        fname = ".\\phoenix\\" +fname
glob.glob() operates on your local folder, so you can't use it as a filter for get_attachments_from_content(). Also, don't specify a limit of 1, since that gets you just the first attachment. Specify a high limit, or whatever default will include all of them. (You may have to paginate results.)
However, you can exclude the files you don't want by checking the title of each attachment before you download it, which you have as fname = attachment['title'].
attachments_container = confluence.get_attachments_from_content(page_id=33110, limit=1000)
attachments = attachments_container['results']
for attachment in attachments:
    fname = attachment['title']
    if not fname.lower().endswith('.mpp'):
        # skip file if it's not got that extension
        continue
    download_link = ...
    # rest of your code here
Also, your code looks like a copy-paste from this answer but you've changed the actual "downloading" part of it. So if your next StackOverflow question is going to be "how to download a file from confluence", use that answer's code.
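If a single high `limit` is not enough, the results can be paged. A sketch, assuming (as in the answer above) that `get_attachments_from_content()` returns a dict with a `'results'` list and accepts `start` and `limit`, and assuming a page shorter than `page_size` means there are no more attachments. The generator name is hypothetical.

```python
def iter_attachments_with_extension(confluence, page_id, extension, page_size=50):
    """Yield attachments on `page_id` whose title ends with `extension` (case-insensitive).

    Pages through the results `page_size` at a time; stops after the first short page.
    """
    extension = extension.lower()
    start = 0
    while True:
        container = confluence.get_attachments_from_content(
            page_id=page_id, start=start, limit=page_size)
        results = container.get('results', [])
        for attachment in results:
            # Filter client-side on the title, as the answer does
            if attachment['title'].lower().endswith(extension):
                yield attachment
        if len(results) < page_size:
            break  # last page reached
        start += page_size
```

The download loop then becomes `for attachment in iter_attachments_with_extension(confluence, 33110, '.mpp'):` with the rest of the body unchanged.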

Take uploaded files on plone and download them via a python script?

I created a document site in Plone where files can be uploaded. I saw that Plone saves them on the filesystem as blobs; now I need to get them with a Python script that will process the downloaded PDFs with OCR. Does anyone have any idea how to do it? Thank you
Not sure how to extract PDFs from BLOB-storage or if it's possible at all, but you can extract them from a running Plone-site (e.g. executing the script via a browser-view):
import os
from Products.CMFCore.utils import getToolByName

def isPdf(search_result):
    """Check mime_type for Plone >= 5.1, otherwise check the file extension."""
    if mimeTypeIsPdf(search_result) or search_result.id.endswith('.pdf'):
        return True
    return False

def mimeTypeIsPdf(search_result):
    """
    Plone 5.1 introduced the mime_type attribute on files.
    Try to get it; if it doesn't exist, fail silently.
    Return True if mime_type exists and is PDF, otherwise False.
    """
    try:
        mime_type = search_result.mime_type
        if mime_type == 'application/pdf':
            return True
    except AttributeError:
        pass
    return False

def exportPdfFiles(context, export_path):
    """
    Get all PDF files of the site and write them to export_path on the filesystem,
    preserving the site's folder structure.
    """
    catalog = getToolByName(context, 'portal_catalog')
    search_results = catalog(portal_type='File', Language='all')
    for search_result in search_results:
        # For each PDF file:
        if isPdf(search_result):
            file_path = export_path + search_result.getPath()
            file_content = search_result.getObject().data
            parent_path = '/'.join(file_path.split('/')[:-1])
            # Create missing directories on the fly:
            if not os.path.exists(parent_path):
                os.makedirs(parent_path)
            # Write the PDF (binary mode, since PDFs are not text):
            with open(file_path, 'wb') as fil:
                fil.write(file_content)
            print('Wrote ' + file_path)
    print('Finished exporting PDF files to ' + export_path)
The example keeps the folder-structure of the Plone-site in the export-directory. If you want them flat in one directory, a handler for duplicate file-names is needed.
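A sketch of such a duplicate-name handler, kept to `%` formatting so it also runs on the Python 2 interpreters older Plone versions use. The name `unique_path` and the `_1`, `_2` suffix scheme are assumptions for illustration.

```python
import os

def unique_path(directory, filename):
    """Return a path in `directory` for `filename` that does not exist yet.

    If "report.pdf" is taken, tries "report_1.pdf", "report_2.pdf", and so on.
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, "%s_%d%s" % (stem, counter, ext))
        counter += 1
    return candidate
```

For a flat export, you could build `file_path = unique_path(export_path, search_result.id)` instead of appending `search_result.getPath()`, and drop the per-folder directory creation.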

How can I change a part of a filename with python

I am brand new to Python (and coding in general) and I've been unable to find a solution to my specific problem online. I am currently creating a tool that will allow a user to save a file to a network location. The file will have a version number. What I would like is to have the script automatically version up before it saves. I have the rest of the script done, but the auto-versioning is what I'm having issues with. Here's what I have so far:
import re
import os

def main():
    wip_folder = "L:/xxx/xxx/xxx/scenes/wip/"
    _file_list = os.listdir('%s' % wip_folder)
    if os.path.exists('%s' % wip_path):
        for file in _file_list:
            versionPattern = re.compile(r'_v\d{3}')
            curVersions = versionPattern.findall('%s' % wip_folder)
            curVersions.sort()
            nextVersion = '_v%03d' % (int(curVersions[-1][2:]) + 1)
            return nextVersion
    else:
        nextVersion = '_v001'
    name = xxx_xxx_xx
    name += '%s' % nextVersion
    name += '_xxx_wip'
I should probably point out that main() is going to be called by a QPushButton in another module. Also, wip_path will most likely contain several versions of a single file, so if there are 10 versions of this file in wip_path, this save should be v011. I apologize if this question makes no sense. Any help would be appreciated. Thank you!
You don't need re at all; programming is about simplification, and you are overcomplicating it! I chose to return the new name, but anything in this function can be changed to whatever you specifically need it to do. Just read the comments, and good luck!
import os

def getVersioning(path):
    # path: the folder to look in
    count = 1  # start counting at version 1
    versionStr = 'wip_path_v'
    try:  # in case the path doesn't exist
        for i in os.listdir(path):  # loop through the files in the passed directory
            if versionStr in i:  # only count files containing the versioning string, e.g. wip_path_v (other files may live in the same dir)
                count += 1  # increment count
    except OSError:
        os.mkdir(path)  # create the directory if it doesn't exist, for future use
    newName = versionStr + str(count)  # new versioned file name
    return newName  # return the new versioned name

print(getVersioning('tstFold'))
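Counting matching files gives the right number only if no version was ever deleted, and it does not zero-pad. A sketch of the regex approach the question started with, which instead finds the highest existing version number and pads the next one, so files v001 to v010 yield v011 as the question asks. The naming scheme `<base>_vNNN<suffix>` and the function name `next_versioned_name` are assumptions for illustration.

```python
import os
import re

def next_versioned_name(folder, base, suffix="_wip", digits=3):
    """Return e.g. "shot_v011_wip" when the highest existing file in `folder` is shot_v010...

    Returns version 1 if `folder` is missing or contains no matching files.
    """
    pattern = re.compile(re.escape(base) + r"_v(\d+)")
    highest = 0
    if os.path.isdir(folder):
        for name in os.listdir(folder):
            match = pattern.match(name)  # must start with "<base>_v<digits>"
            if match:
                highest = max(highest, int(match.group(1)))
    return f"{base}_v{highest + 1:0{digits}d}{suffix}"
```

Because it compares numbers rather than counting files, a gap in the sequence (say, a deleted v005) does not cause the next save to overwrite an existing version.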

sublime text 3 auto-complete plugin not working

I'm trying to write a plugin that gets all the classes in the current folder to do an auto-complete injection.
The following code is in my Python file:
import os
import sublime_plugin

class FolderPathAutoComplete(sublime_plugin.EventListener):
    def on_query_completions(self, view, prefix, locations):
        folders = view.window().folders()
        results = get_path_classes(folders)
        all_text = ""
        for result in results:
            all_text += result + "\n"
        #sublime.error_message(all_text)
        return results

def get_path_classes(folders):
    classesList = []
    for folder in folders:
        for root, dirs, files in os.walk(folder):
            for filename in files:
                filepath = root + "/" + filename
                if filepath.endswith(".java"):
                    filepath = filepath.replace(".java", "")
                    filepath = filepath[filepath.rfind("/"):]
                    filepath = filepath[1:]
                    classesList.append(filepath)
    return classesList
But when I work in a folder with a class named "LandingController.java" and try to get the result, the auto-complete doesn't work at all.
However, as you may have noticed, I output all the contents I got with error_message, and there is actually a list of class names found.
Can anyone help me solve this? Thank you!
It turns out that the format Sublime Text actually accepts is: [(word, word), ...]
Thanks to MattDMo, who pointed out the documentation, since the official documentation says nothing about the auto-complete part.
For a better understanding of the auto-complete injection API, you could follow Zinggi's plugin DictionaryAutoComplete; here is the GitHub link.
So for a standard solution:
class FolderPathAutoComplete(sublime_plugin.EventListener):
    def on_query_completions(self, view, prefix, locations):
        suggestlist = self.get_autocomplete_list(prefix)
        return suggestlist

    def get_autocomplete_list(self, word):
        # classesList (module-level) and parse_class_path_only_name() are not shown
        # in this answer; they are presumably defined elsewhere in the plugin
        global classesList
        autocomplete_list = []
        uniqueautocomplete = set()
        # filter relevant items:
        for w in classesList:
            try:
                if word.lower() in w.lower():
                    actual_class = parse_class_path_only_name(w)
                    if actual_class not in uniqueautocomplete:
                        uniqueautocomplete.add(actual_class)
                        # Sublime Text expects (trigger, contents) pairs
                        autocomplete_list.append((actual_class, actual_class))
            except UnicodeDecodeError:
                print(w)  # actual_class may not be assigned yet at this point
                # autocomplete_list.append((w, w))
                continue
        return autocomplete_list
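A self-contained sketch of the same idea that does not rely on the unshown `classesList` global or `parse_class_path_only_name()`: collect class names with `os.walk` and `os.path.splitext`, then return `(trigger, contents)` pairs. It uses Sublime's convention that text after a tab in the trigger is shown as a hint; the function names and the `"class"` hint are assumptions for illustration. The pure-Python part is kept separate from `sublime_plugin` so it can be run outside the editor.

```python
import os

def java_class_names(folders):
    """Return sorted, de-duplicated class names for every .java file under `folders`."""
    names = set()
    for folder in folders:
        for root, dirs, files in os.walk(folder):
            for filename in files:
                stem, ext = os.path.splitext(filename)
                if ext == ".java":
                    names.add(stem)
    return sorted(names)

def completions_for(prefix, class_names):
    """Build Sublime completions: (trigger + "\\t" + hint, text inserted on accept)."""
    prefix = prefix.lower()
    return [(name + "\tclass", name) for name in class_names if prefix in name.lower()]
```

Inside `on_query_completions` you would then `return completions_for(prefix, java_class_names(view.window().folders()))`.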

.doc to pdf using python

I'm tasked with converting tons of .doc files to .pdf, and the only way my supervisor wants me to do this is through MS Word 2010. I know I should be able to automate this with Python COM automation. The only problem is I don't know how or where to start. I tried searching for some tutorials but wasn't able to find any (maybe I did, but I don't know what I'm looking for).
Right now I'm reading through this. I don't know how useful it's going to be.
A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:
import sys
import os
import comtypes.client
wdFormatPDF = 17
in_file = os.path.abspath(sys.argv[1])
out_file = os.path.abspath(sys.argv[2])
word = comtypes.client.CreateObject('Word.Application')
doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()
You could also use pywin32, which would be the same except for:
import win32com.client
and then:
word = win32com.client.Dispatch('Word.Application')
You can use the docx2pdf python package to bulk convert docx to pdf. It can be used as both a CLI and a python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.
from docx2pdf import convert
convert("input.docx")
convert("input.docx", "output.pdf")
convert("my_docx_folder/")
pip install docx2pdf
docx2pdf input.docx output.pdf
Disclaimer: I wrote the docx2pdf package. https://github.com/AlJohri/docx2pdf
I have tested many solutions, but none of them works efficiently on Linux distributions.
I recommend this solution:
import sys
import subprocess
import re

def convert_to(folder, source, timeout=None):
    args = [libreoffice_exec(), '--headless', '--convert-to', 'pdf', '--outdir', folder, source]
    process = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout)
    filename = re.search('-> (.*?) using filter', process.stdout.decode())
    if filename is None:
        # No "-> ... using filter" line in the output: the conversion most likely failed
        raise RuntimeError('Conversion failed: ' + process.stderr.decode())
    return filename.group(1)

def libreoffice_exec():
    # TODO: Provide support for more platforms
    if sys.platform == 'darwin':
        return '/Applications/LibreOffice.app/Contents/MacOS/soffice'
    return 'libreoffice'
Then call the function:
result = convert_to('TEMP Directory', 'Your File', timeout=15)
All resources:
https://michalzalecki.com/converting-docx-to-pdf-using-python/
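For "tons" of files, starting LibreOffice once per file is slow. LibreOffice's `--convert-to` also accepts several source files in one invocation, so a batch variant can pass them all and collect every output path with `re.findall` on the same `-> ... using filter` pattern used above. A sketch; the function names are hypothetical, and the `executable` default mirrors `libreoffice_exec()` on non-macOS systems.

```python
import re
import subprocess

# LibreOffice reports each conversion on stdout as
# "convert <source> -> <target> using filter : <filter>"
_CONVERTED = re.compile(r"-> (.*?) using filter")

def parse_converted_paths(stdout_text):
    """Return every output path LibreOffice reported in `stdout_text`."""
    return _CONVERTED.findall(stdout_text)

def convert_many_to_pdf(folder, sources, executable="libreoffice", timeout=None):
    """Convert all `sources` to PDF in `folder` with a single LibreOffice run."""
    args = [executable, "--headless", "--convert-to", "pdf", "--outdir", folder, *sources]
    process = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout)
    return parse_converted_paths(process.stdout.decode())
```

Comparing the length of the returned list with `len(sources)` is a cheap way to spot files that failed to convert.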
I worked on this problem for half a day, so I think I should share some of my experience. Steven's answer is right, but it failed on my computer. There are two key points to fix it:
(1) The first time I created the 'Word.Application' object, I had to make it (the Word app) visible before opening any documents. (Honestly, I can't explain why this works. If I don't do this on my computer, the program crashes when I try to open a document while Word is invisible, and then the 'Word.Application' object is deleted by the OS.)
(2) After doing (1), the program works sometimes but still fails often. The error "COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))" means that the COM server may not be able to respond that quickly, so I added a delay before opening a document.
After these two steps, the program works perfectly with no more failures. The demo code is below. If you encounter the same problems, try these two steps. Hope it helps.
import os
import comtypes.client
import time
wdFormatPDF = 17
# absolute path is needed
# be careful about the slash '\', use '\\' or '/' or raw string r"..."
in_file=r'absolute path of input docx file 1'
out_file=r'absolute path of output pdf file 1'
in_file2=r'absolute path of input docx file 2'
out_file2=r'absolute path of output pdf file 2'
# print out filenames
print(in_file)
print(out_file)
print(in_file2)
print(out_file2)
# create COM object
word = comtypes.client.CreateObject('Word.Application')
# key point 1: make word visible before open a new document
word.Visible = True
# key point 2: wait for the COM Server to prepare well.
time.sleep(3)
# convert docx file 1 to pdf file 1
doc=word.Documents.Open(in_file) # open docx file 1
doc.SaveAs(out_file, FileFormat=wdFormatPDF) # conversion
doc.Close() # close docx file 1
word.Visible = False
# convert docx file 2 to pdf file 2
doc = word.Documents.Open(in_file2) # open docx file 2
doc.SaveAs(out_file2, FileFormat=wdFormatPDF) # conversion
doc.Close() # close docx file 2
word.Quit() # close Word Application
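The fixed `time.sleep(3)` in key point 2 is a guess at how long the COM server needs; -2147418111 is the HRESULT `RPC_E_CALL_REJECTED`. A sketch of an alternative, not verified against Word here: retry the rejected call a few times with a growing delay. The helper `call_with_retry` is hypothetical; pass whichever COM error type your binding raises as `retry_on` (for comtypes that is `COMError`, importable from `_ctypes` on Windows; for pywin32 it is `pywintypes.com_error`).

```python
import time

def call_with_retry(func, retry_on, attempts=5, delay=1.0):
    """Call `func()`; on an exception listed in `retry_on`, wait and try again.

    Waits delay, 2*delay, 3*delay, ... between attempts and re-raises the last
    error once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)

# Usage on Windows with comtypes (sketch):
#   from _ctypes import COMError
#   doc = call_with_retry(lambda: word.Documents.Open(in_file), retry_on=(COMError,))
```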
unoconv (written in Python) with OpenOffice running as a headless daemon.
https://github.com/unoconv/unoconv
http://dag.wiee.rs/home-made/unoconv/
Works very nicely for doc, docx, ppt, pptx, xls, xlsx.
Very useful if you need to convert docs or save/convert to certain formats on a server.
As an alternative to the SaveAs function, you could also use ExportAsFixedFormat which gives you access to the PDF options dialog you would normally see in Word. With this you can specify bookmarks and other document properties.
doc.ExportAsFixedFormat(OutputFileName=pdf_file,
                        ExportFormat=17,  # 17 = PDF output, 18 = XPS output
                        OpenAfterExport=False,
                        OptimizeFor=0,  # 0 = Print (higher res), 1 = Screen (lower res)
                        CreateBookmarks=1,  # 0 = No bookmarks, 1 = Heading bookmarks only, 2 = bookmarks match Word bookmarks
                        DocStructureTags=True)
The full list of function arguments is: 'OutputFileName', 'ExportFormat', 'OpenAfterExport', 'OptimizeFor', 'Range', 'From', 'To', 'Item', 'IncludeDocProps', 'KeepIRM', 'CreateBookmarks', 'DocStructureTags', 'BitmapMissingFonts', 'UseISO19005_1', 'FixedFormatExtClassPtr'
It's worth noting that Steven's answer works, but if you use a for loop to export multiple files, make sure to place the CreateObject or Dispatch statement before the loop; the Word object only needs to be created once. See my problem: Python win32com.client.Dispatch looping through Word documents and export to PDF; fails when next loop occurs
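A sketch of that pattern: one Word instance converts every file, and each document is closed even if saving fails. The `word` object is passed in rather than created inside the function, so the caller creates it once and quits it once; `convert_documents` and `WD_FORMAT_PDF` are names chosen for this sketch, with 17 being the `wdFormatPDF` value used in the answers above.

```python
import os

WD_FORMAT_PDF = 17  # wdFormatPDF, as used in the answers above

def convert_documents(word, in_files, out_dir):
    """Convert each file in `in_files` to a PDF in `out_dir` using one Word instance.

    Returns the list of PDF paths written.
    """
    written = []
    for in_file in in_files:
        stem = os.path.splitext(os.path.basename(in_file))[0]
        out_file = os.path.join(out_dir, stem + ".pdf")
        doc = word.Documents.Open(os.path.abspath(in_file))  # Word needs absolute paths
        try:
            doc.SaveAs(os.path.abspath(out_file), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close()  # release the document even if SaveAs fails
        written.append(out_file)
    return written

# Usage on Windows (create Word once, quit it once):
#   word = comtypes.client.CreateObject('Word.Application')
#   try:
#       convert_documents(word, ["a.doc", "b.docx"], r"C:\out")
#   finally:
#       word.Quit()
```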
If you don't mind using PowerShell, have a look at this Hey, Scripting Guy! article. The code presented could be adapted to use the wdFormatPDF enumeration value of WdSaveFormat (see here).
This blog article presents a different implementation of the same idea.
I have modified it to support PowerPoint files as well. My solution supports all the extensions listed below.
word_extensions = [".doc", ".odt", ".rtf", ".docx", ".dotm", ".docm"]
ppt_extensions = [".ppt", ".pptx"]
My Solution: Github Link
I have modified code from Docx2PDF
I tried the accepted answer but wasn't particularly keen on the bloated PDFs Word was producing, which were usually an order of magnitude bigger than expected. While looking into how to disable the dialogs when using a virtual PDF printer, I came across Bullzip PDF Printer, and I've been rather impressed with its features. It's now replaced the other virtual printers I used previously. You'll find a "free community edition" on their download page.
The COM API can be found here and a list of the usable settings can be found here. The settings are written to a "runonce" file which is used for one print job only and then removed automatically. When printing multiple PDFs we need to make sure one print job completes before starting another to ensure the settings are used correctly for each file.
import os, re, time, datetime, win32com.client

def print_to_Bullzip(file):
    util = win32com.client.Dispatch("Bullzip.PDFUtil")
    settings = win32com.client.Dispatch("Bullzip.PDFSettings")
    settings.PrinterName = util.DefaultPrinterName  # make sure we're controlling the right PDF printer

    outputFile = re.sub(r"\.[^.]+$", ".pdf", file)
    statusFile = re.sub(r"\.[^.]+$", ".status", file)

    settings.SetValue("Output", outputFile)
    settings.SetValue("ConfirmOverwrite", "no")
    settings.SetValue("ShowSaveAS", "never")
    settings.SetValue("ShowSettings", "never")
    settings.SetValue("ShowPDF", "no")
    settings.SetValue("ShowProgress", "no")
    settings.SetValue("ShowProgressFinished", "no")  # disable balloon tip
    settings.SetValue("StatusFile", statusFile)  # created after print job
    settings.WriteSettings(True)  # write settings to the runonce.ini
    util.PrintFile(file, util.DefaultPrinterName)  # send to Bullzip virtual printer

    # wait until print job completes before continuing
    # otherwise settings for the next job may not be used
    timestamp = datetime.datetime.now()
    while (datetime.datetime.now() - timestamp).seconds < 10:
        if os.path.exists(statusFile) and os.path.isfile(statusFile):
            error = util.ReadIniString(statusFile, "Status", "Errors", '')
            if error != "0":
                raise IOError("PDF was created with errors")
            os.remove(statusFile)
            return
        time.sleep(0.1)
    raise IOError("PDF creation timed out")
I was working with this solution but I needed to search all .docx, .dotm, .docm, .odt, .doc or .rtf and then turn them all to .pdf (python 3.7.5). Hope it works...
import os
import win32com.client

wdFormatPDF = 17
extensions = (".doc", ".odt", ".rtf", ".docx", ".dotm", ".docm")

word = win32com.client.Dispatch('Word.Application')  # start Word once and reuse it
word.Visible = False
try:
    for root, dirs, files in os.walk(r'your directory here'):
        for f in files:
            stem, ext = os.path.splitext(f)
            if ext.lower() not in extensions:
                continue
            in_file = os.path.join(root, f)
            try:
                print(f)
                doc = word.Documents.Open(in_file)
                doc.SaveAs(os.path.join(root, stem + '.pdf'), FileFormat=wdFormatPDF)
                doc.Close()
                print('done')
                os.remove(in_file)  # deletes the original document once the PDF is saved
            except Exception:
                print('could not open')
finally:
    word.Quit()
The try/except is there for documents that can't be opened: the script skips them and keeps going until the last document instead of exiting.
You should start by investigating so-called virtual PDF print drivers.
Once you find one, you should be able to write a batch file that prints your DOC files into PDF files. You can probably do this in Python too (set up the printer driver output and issue the document/print command in MS Word; the latter can be done from the command line, AFAIR).
import os
import docx2txt
from win32com import client

files_from_folder = r"c:\doc"
txt_folder = os.path.join(files_from_folder, 'txt_files')
os.makedirs(txt_folder, exist_ok=True)  # create the output folder up front

amount = 1
word = client.DispatchEx("Word.Application")
word.Visible = True
for filename in os.listdir(files_from_folder):
    print(filename)
    if filename.endswith('.docx'):
        # .docx files can be read directly, without Word
        text = docx2txt.process(os.path.join(files_from_folder, filename))
    elif filename.endswith('.doc'):
        # legacy .doc files are opened through Word via COM
        doc = word.Documents.Open(os.path.join(files_from_folder, filename))
        text = doc.Range().Text
        doc.Close()
    else:
        continue
    print(f'{filename} transferred ({amount})')
    amount += 1
    new_filename = os.path.splitext(filename)[0] + '.txt'
    with open(os.path.join(txt_folder, new_filename), 'w', encoding='utf-8') as t:
        t.write(text)
word.Quit()
The Source Code, see here:
https://neculaifantanaru.com/en/python-full-code-how-to-convert-doc-and-docx-files-to-pdf-from-the-folder.html
I would suggest ignoring your supervisor and using OpenOffice, which has a Python API. OpenOffice has built-in support for Python, and someone created a library specifically for this purpose (PyODConverter).
If he isn't happy with the output, tell him it could take you weeks to do it with word.
