I am checking content links in a user-uploaded zip file with Python's zipfile and BeautifulSoup modules.
In the zip file there is a file "a.html" whose full path inside the archive is "content/product1/component1/a.html". It contains an <a href="../../product2/component2/b.html"> link to another HTML file.
I want to know how to combine the path "content/product1/component1/a.html" with "../../product2/component2/b.html" to get the correct path, "content/product2/component2/b.html", so that I can check whether this file exists.
I tried os.path.join("content/product1/component1/a.html", "../../product2/component2/b.html"), but I don't get "content/product2/component2/b.html". Does anyone know how to do that?
You need to extract the directory part of "content/product1/component1/a.html", join it with the "../../product2/component2/b.html" href, and then normalize the result.
import os.path
src = "content/product1/component1/a.html"
srcdir = os.path.dirname(src)
href = "../../product2/component2/b.html"
url = os.path.normpath(os.path.join(srcdir, href))
print(url)
output
content/product2/component2/b.html
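One caveat with the snippet above: os.path follows the conventions of the operating system it runs on, so on Windows normpath returns backslashes, while member names inside a zip archive always use forward slashes. A minimal sketch that uses posixpath instead and then checks the archive's name list; the helper names resolve_href and link_target_exists are made up for illustration, and hrefs carrying a query string or fragment are not handled:

```python
import io
import posixpath
import zipfile


def resolve_href(member_path, href):
    """Resolve a relative href against the zip member that contains it."""
    base_dir = posixpath.dirname(member_path)
    return posixpath.normpath(posixpath.join(base_dir, href))


def link_target_exists(archive, member_path, href):
    """Return True if the resolved href is a member of the open ZipFile."""
    return resolve_href(member_path, href) in archive.namelist()


# Build a tiny in-memory archive to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("content/product1/component1/a.html", "<a href='...'>b</a>")
    zf.writestr("content/product2/component2/b.html", "<p>b</p>")

with zipfile.ZipFile(buffer) as zf:
    found = link_target_exists(
        zf,
        "content/product1/component1/a.html",
        "../../product2/component2/b.html",
    )
print(found)
```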
You might want to try using str.split() (with / as the separator) and then use os.path.join() on the parts you need.
Consider two folders, 'Mandar' and 'html', on your desktop. Put a PDF file named 'dell' in the 'html' folder and create a demo.py file in the 'Mandar' folder. Then create a few (2-4) txt files, so that the 'html' folder contains some txt files and exactly one PDF file.
import os
import PyPDF2 # install via 'pip install PyPDF2'
# Put location of your pdf file i.e. dell.pdf in 'location' variable
location = "C:/Users/Desktop/html/"
n = "dell.pdf"
path = os.path.join(location, n)
reader = PyPDF2.PdfReader(path)
pages = len(reader.pages)
print(f"The no. of pages in {n} is {pages}.")
Now run the program and you will see:
"The no. of pages in dell.pdf is NUM." (NUM is the number of pages in your PDF)
Now assume the 'html' folder always contains exactly one PDF file, whose name may be anything (dell, ecc, or any other name). I want the variable 'n' to pick up the name of that one PDF file automatically, so that the program runs and prints the same kind of result for whatever PDF is there.
Give glob in the standard library a shot. It'll get you a list of all the matching PDF files in that directory.
import os
import PyPDF2
...
import glob
Location='C:/Users/Desktop/html/'
candidates = glob.glob(os.path.join(Location, '*.pdf'))
if len(candidates) == 0:
    raise Exception('No PDFs found')
File=open(candidates[0],'rb')
...
You're looking for globbing. You can do that with pathlib:
from pathlib import Path
root = Path(location)
pdf_files = root.glob("*.pdf")
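Note that Path.glob returns a lazy iterator rather than a list, so it has to be consumed before a file name can be taken from it. A small sketch under the question's assumption that the folder holds exactly one PDF; find_single_pdf is a hypothetical helper name:

```python
from pathlib import Path


def find_single_pdf(folder):
    """Return the only *.pdf file in folder, or raise if there is not exactly one."""
    pdf_files = sorted(Path(folder).glob("*.pdf"))
    if len(pdf_files) != 1:
        raise FileNotFoundError(
            f"expected exactly one PDF in {folder}, found {len(pdf_files)}"
        )
    return pdf_files[0]
```

The result is a Path: its .name attribute gives the file name for n, and str() of it gives the full path to open.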
I am using Python 3.10. I have a zip file in the folder 'wowo' that I want to unzip. If I set the folder as a path and pass only the file name, the code doesn't work, but when I give the full path plus file name it works. I don't want to give the full path and file name together; I want to define the path separately.
zipdata = zipfile.ZipFile('/Volumes/MacHD/MYPY/wowo/NST_cm.zip')
zipinfos = zipdata.infolist()
for zipinfo in zipinfos:
    zipinfo.filename = 'Nst.csv'
    zipdata.extract(path=path, member=zipinfo)
You could join the two strings in order to form the full filepath.
filepath = os.path.join(path, filename)
zipfile.ZipFile(filepath)
Note that ZipFile cannot take the folder and the file name as two separate arguments: its second positional parameter is the mode ('r', 'w', 'x' or 'a'), so a call like zipfile.ZipFile(path, 'filename') raises an error. Join the two parts first, as above.
You can use pathlib and join the folder path and the file name before passing the result to zipfile.ZipFile:
import pathlib
import zipfile
path = pathlib.Path('PATH/TO/FOLDER')
zipfile.ZipFile(path / 'filename')
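Putting the pieces together for the original question: a sketch that keeps the folder and the archive name separate and joins them only when opening the archive. The function name extract_all and its parameters are illustrative, and extractall is used here instead of the question's per-member renaming:

```python
import zipfile
from pathlib import Path


def extract_all(folder, archive_name, target_dir):
    """Open folder/archive_name and extract every member into target_dir."""
    archive_path = Path(folder) / archive_name
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(path=target_dir)
        # Return the member names so the caller can see what was extracted.
        return archive.namelist()
```

With the question's layout this would be called as extract_all('/Volumes/MacHD/MYPY/wowo', 'NST_cm.zip', path).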
I am trying to find if a given file pattern is available in a directory. If found, I would want to open and load as JSON. If not found, I would like to send an email saying that the file is not present.
File name: test_09082021.txt
import os
path ='/dev/stage/'
file_pattern = 'test*'
file = path + file_pattern
if os.path.isfile(file):
    with open(file) as f:
        data = json.load(f)
else:
    mailserver.sendmail('abc#gmail.com','def#gmail.com',msg.as_string())
But the application couldn't find the file and is returning false in the IF statement.
Could you please advise how I can check for the file pattern to make this work?
I would use the glob library:
import glob
import json
path ='/dev/stage/'
file_pattern = 'test*'
pattern = path + file_pattern
matched_files = glob.glob(pathname=pattern)
try:
    with open(matched_files[0]) as f:
        data = json.load(f)
except IndexError:
    mailserver.sendmail('abc#gmail.com','def#gmail.com',msg.as_string())
Use a try/except block, following Python's "ask forgiveness, not permission" principle.
glob() returns a list of the files that match the pattern. If no file matches, the list is empty, so matched_files[0] raises IndexError and the email is sent. Otherwise the first match is opened and parsed as JSON.
Your code doesn't work because it checks whether a file literally named "test*" exists, which is not the same as building a list of all files that match a pattern.
Have a look at the following for an example of using file patterns:
Find all files in a directory with extension .txt in Python
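An alternative sketch that tests the list explicitly instead of catching IndexError, which keeps errors raised while reading or parsing the file separate from the "no file" case. load_first_match is a hypothetical helper; the caller would send the email when it returns None:

```python
import glob
import json
import os


def load_first_match(directory, file_pattern):
    """Load the first file matching the pattern as JSON, or return None if none match."""
    # Sorting makes the choice deterministic when several files match.
    matches = sorted(glob.glob(os.path.join(directory, file_pattern)))
    if not matches:
        return None
    with open(matches[0]) as f:
        return json.load(f)
```

One limitation of this shape: a file whose whole content is the JSON value null also yields None, so it fits best when the files are expected to hold objects or arrays.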
I need to add a prefix to the name of an existing file.
An example would look as follows:
from pathlib import Path
file = Path('/Users/my_name/PYTHON/Playing_Around/testing_lm.py')
# Current file with destination
print(file)
# Prefix to be used
file_prexif = 'A'
# Hardcoding wanted results.
Path('/Users/my_name/PYTHON/Playing_Around/A_testing_lm.py')
As can be seen, hardcoding it is easy. However, is there a way to automate this step?
Here is a pseudo-code idea of what I want to do:
str(file).split('/')[-1] = str(file_prexif) + str('_') + str(file).split('/')[-1]
I only want to change the last element of the PosixPath file. However, it is not possible to assign to only the last element like this.
file.stem accesses the base name of the file without extension.
file.with_stem() (added in Python 3.9) returns an updated Path with a new stem:
from pathlib import Path
file = Path('/Users/my_name/PYTHON/Playing_Around/testing_lm.py')
print(file.with_stem(f'A_{file.stem}'))
\Users\my_name\PYTHON\Playing_Around\A_testing_lm.py
Use file.parent to get the parent of the path and file.name to get the final path component, excluding the drive and root.
from pathlib import Path
file = Path('/Users/my_name/PYTHON/Playing_Around/testing_lm.py')
file_prexif_lst = ['A','B','C']
for prefix in file_prexif_lst:
    p = file.parent.joinpath(f'{prefix}_{file.name}')
    print(p)
/Users/my_name/PYTHON/Playing_Around/A_testing_lm.py
/Users/my_name/PYTHON/Playing_Around/B_testing_lm.py
/Users/my_name/PYTHON/Playing_Around/C_testing_lm.py
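The same result can be had with Path.with_name, which replaces the final path component and is also available on Python versions older than 3.9. A short sketch; PurePosixPath is used only so the printed form is the same on every platform, and add_prefix is an illustrative name:

```python
from pathlib import PurePosixPath


def add_prefix(path, prefix):
    """Return path with 'prefix_' prepended to its final component."""
    return path.with_name(f"{prefix}_{path.name}")


file = PurePosixPath('/Users/my_name/PYTHON/Playing_Around/testing_lm.py')
print(add_prefix(file, 'A'))
# /Users/my_name/PYTHON/Playing_Around/A_testing_lm.py
```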
I am getting a file posted from a form:
file = request.post['ufile']
I want to get the path. How can I get it?
You have to use the request.FILES dictionary.
Check out the official documentation about the UploadedFile object. You can call the temporary_file_path() method, but beware that only files uploaded to disk expose it (that is, normally, when the TemporaryFileUploadHandler upload handler is used).
upload = request.FILES['ufile']
path = upload.temporary_file_path()
In the normal case, though, you would use the file object directly:
upload = request.FILES['ufile']
content = upload.read() # For small files
# ... or ...
for chunk in upload.chunks():
    do_something_with_chunk(chunk) # For bigger files
You should use request.FILES['ufile'].file.name
You will get something like /var/folders/v7/1dtcydw51_s1ydkmypx1fggh0000gn/T/tmpKGp4mX.upload
For file.name to be a path on disk, the uploaded file has to be bigger than 2.5 MB.
If you want to change this, see File Upload Settings.
We cannot get the file path from the POST request, only the filename, because Flask doesn't have access to the client's file system. If you need the file on disk to perform some operations on it, you can create a temp directory and save the file there; that also gives you a path.
import tempfile
import shutil
dirpath = tempfile.mkdtemp()
# perform some operations if needed
shutil.rmtree(dirpath) # remove the temp directory
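The snippet above only creates and removes the directory. A framework-independent sketch of the save step, assuming the upload is available as a binary file-like object (an io.BytesIO stands in for it here); save_to_temp_dir and the sample file name are hypothetical:

```python
import io
import os
import shutil
import tempfile


def save_to_temp_dir(stream, filename, dirpath):
    """Copy a binary file-like object into dirpath and return the saved path."""
    # Keep only the base name; a real application should sanitize further.
    safe_name = os.path.basename(filename)
    target = os.path.join(dirpath, safe_name)
    with open(target, "wb") as out:
        shutil.copyfileobj(stream, out)
    return target


with tempfile.TemporaryDirectory() as dirpath:
    upload = io.BytesIO(b"example upload body")  # stand-in for the posted file
    saved_path = save_to_temp_dir(upload, "report.txt", dirpath)
    saved_size = os.path.getsize(saved_path)
# The directory and the saved file are removed when the with block exits.
print(saved_size)
```

TemporaryDirectory is used here in place of the mkdtemp/rmtree pair so that cleanup also happens if an operation on the file raises.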