python split string and list full path name - python

I am using glob to get a list of all PDF files in a folder (I need the full path names to upload the files to the cloud).
During the upload I also need to assign a "title" to each file, which will be the item's name in the cloud.
I need to split at the last "\" and at the "." and get the value in between. For example:
pdf_list = glob.glob(r'C:\User\username\Desktop\pdf\*.pdf')
An item in the list will be: "c:\User\username\Desktop\pdf\4434343434331.pdf"
I am using a for loop to upload each PDF (using the file path), and I need a pythonic way to grab the PDF's file name in a separate variable while still inside the loop:
for file in pdf_list:
    upload.file
    file.title(file.split(".")[0])
However, the split above does not return my desired result: it gives the whole path without the extension ("c:\User\username\Desktop\pdf\4434343434331") instead of just "4434343434331".

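Actually, there is already a function for this, os.path.basename. Combine it with os.path.splitext to drop the extension:
import os
for file in pdf_list:
    file_name = os.path.basename(file)       # '4434343434331.pdf'
    title = os.path.splitext(file_name)[0]   # '4434343434331'
    upload.file(file)  # upload using the full path; use title as the item name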
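One caveat when trying this: os.path follows the rules of the machine the script runs on, so os.path.basename only splits at backslashes on Windows. The sketch below imports ntpath (the Windows implementation of os.path, which can be imported on any OS) so the path from the question behaves the same everywhere. title_from_path is a hypothetical helper name, and the upload call is left out because its API isn't shown in the question.

```python
import ntpath  # Windows flavour of os.path; available on every platform


def title_from_path(path):
    """Return the text between the last backslash and the last dot."""
    file_name = ntpath.basename(path)     # '4434343434331.pdf'
    return ntpath.splitext(file_name)[0]  # '4434343434331'


pdf_list = [r"c:\User\username\Desktop\pdf\4434343434331.pdf"]
for file in pdf_list:
    title = title_from_path(file)
    # file is still the full path (for the upload); title is the cloud item name
    print(file, "->", title)
```

On Windows itself, plain os.path gives the same result, since os.path is ntpath there.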

You can use pathlib, for example:
from pathlib import Path
p = list(Path('C:/User/username/Desktop/pdf').glob('*.pdf'))
first_filename = p[0].name   # '4434343434331.pdf'
first_title = p[0].stem      # '4434343434331' (the name without its suffix)
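Extended to the whole loop, a sketch that keeps both values for each file: str(path) for the upload and path.stem for the title. The temporary folder and its file names are stand-ins for C:/User/username/Desktop/pdf, and the uploads list only records what would be passed to the (unspecified) upload call.

```python
import tempfile
from pathlib import Path

# Stand-in for C:/User/username/Desktop/pdf so the sketch runs anywhere
folder = Path(tempfile.mkdtemp())
for name in ("4434343434331.pdf", "notes.txt"):
    (folder / name).touch()

uploads = []  # (full path, title) pairs that would go to the upload call
for pdf in sorted(folder.glob("*.pdf")):
    full_path = str(pdf)  # full path, because folder itself is absolute
    title = pdf.stem      # file name without '.pdf'
    uploads.append((full_path, title))
    print(full_path, title)
```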

Related

Bulk convert file extensions using Python

I am trying to develop a bulk WebP to PNG converter using Python.
I am using the webptools library (https://pypi.org/project/webptools/).
The documentation above only shows how to convert one file at a time, and it requires the user to input the file name.
So what I am trying to do is scan the folder for *.webp files and convert each one to *.png with the original file name. I couldn't solve the output file names: I suppose that with the current code it keeps overwriting the same file, x.png, so it ends up with just one output file. I can't figure out how to fix this.
I am new to Python and hope to get some guidance or help here. Thank you very much.
from webptools import dwebp
import os, glob
os.chdir("./images") # working directory
webp_list = []
for file in glob.glob("*.webp"):
    webp_list = file
print([webp_list])
for files in webp_list:
    print(dwebp(input_image=webp_list, output_image="x.png", option="-o", logging="-v"))
# documentation - code allows only 1 input and 1 output
# print(dwebp(input_image="sample.webp", output_image="sample.png", option="-o", logging="-v"))
After you do
webp_list = []
for file in glob.glob("*.webp"):
    webp_list = file
print([webp_list])
webp_list is the name of the last matching file rather than a list of file names (so your second loop, for files in webp_list, iterates over the characters of that one string). glob.glob itself will
"Return a possibly-empty list of path names that match pathname" (from the glob documentation)
so there is no need for such contortion and you can simply do
webp_list = glob.glob("*.webp")
instead. Then you need a different output file name for each input, for which I propose the following solution:
for filename in webp_list:
    outname = filename[:-4] + "png"
    dwebp(input_image=filename, output_image=outname, option="-o", logging="-v")
filename[:-4] is the file name without its last 4 characters ("webp" in this case), which is then concatenated with "png".
I've never used this library before, so my suggestion is based only on how I guess it should work:
from webptools import dwebp
import os, glob
os.chdir("./images") # working directory
for file in glob.glob("*.webp"):
    output_file = file[:-4] + 'png'  # 'photo.webp' -> 'photo.png'
    dwebp(input_image=file, output_image=output_file, option="-o", logging="-v")
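Both answers slice off a fixed number of characters, which assumes the extension is exactly four letters long. A sketch of the same loop using pathlib's with_suffix, which replaces whatever the last suffix is. png_name is a hypothetical helper, and the dwebp call is copied from the answers above and left commented out so the sketch runs without webptools installed.

```python
from pathlib import Path


def png_name(webp_file):
    """'photo.webp' -> 'photo.png'; only the last suffix is replaced."""
    return str(Path(webp_file).with_suffix(".png"))


for src in sorted(Path("./images").glob("*.webp")):
    dst = png_name(src)
    print(src, "->", dst)
    # With webptools installed, convert using the call from the answers:
    # dwebp(input_image=str(src), output_image=dst, option="-o", logging="-v")
```

Because each output name is derived from its own input, every file gets its own PNG instead of overwriting x.png.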

How to use Python to create a list file for every word?

Here are the wav and image files; you can download them here: https://www.dropbox.com/s/iuwt6boc2r2fotc/word_images_file.zip?dl=0
Step 1: create a word-list txt file for every word.
Put the image names in the list, and name the list after the word.
But I don't know how to write Python code that creates the image list for every word.
Example:
accordion-word.txt
file 'accordion_1_musical_instruments.jpg'
file 'accordion_2_musical_instruments.jpg'
file 'accordion_3_musical_instruments.jpg'
file 'accordion_musical_instruments.jpg'
Step 2: create the audio file lists.
I don't know how to write Python code that creates the list of audio files for every word.
accordion-audio.txt
file 'slience_2sec.mp3'
file 'This_is_.mp3'
file 'slience_2sec.mp3'
file 'accordion.mp3'
Thank you!
I prefer os.listdir when I only need file names, rather than the full paths that glob returns when it is given an absolute path.
I'm assuming that the image name without a number has the same prefix as the numbered ones; without this premise the regex got out of hand.
Here's the full code that does what you need:
from os import listdir
import re
# Getting list of all image files in directory
location = 'X:test folder/'
image_list = [name for name in listdir(location) if name.endswith('.jpg')]
# Fetching all image keywords (the text before the first '_<digit>'), ignoring files without it.
# A set removes duplicates, so each -word.txt file is written only once.
reg = re.compile(r'(^[^0-9]*(?=_[0-9]))')
keywords = sorted({reg.match(name).group(0) for name in image_list if reg.match(name)})
# Create one txt file per keyword
for keyword in keywords:
    filtered = [f"file '{name}'" for name in image_list if name.startswith(keyword)]
    with open(location + keyword + '-word.txt', 'w') as file:
        file.write('\n'.join(filtered))
# Fetching .wav audio clips
audios = [f"file '{name}'" for name in listdir(location) if name.endswith('.wav')]
# Saving audio clips list
with open(location + 'audio.txt', 'w') as file:
    file.write('\n'.join(audios))
You can make the keyword section much simpler by changing the file names,
for example air-conditioner_1 instead of air_conditioner_1. Then we know we need to split at the first underscore in every file name to get the keywords. Much simpler.
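With that naming convention the regex is no longer needed: split each name once at the first underscore. A sketch, where group_by_keyword is a hypothetical helper and air-conditioner_1_appliances.jpg is an invented file name following the renamed scheme. It returns the lines for each <keyword>-word.txt rather than writing files, so the file-writing loop from the answer can be reused unchanged.

```python
from collections import defaultdict


def group_by_keyword(image_names):
    """Group names like 'air-conditioner_1_x.jpg' by the text before the first '_'."""
    groups = defaultdict(list)
    for name in sorted(image_names):
        keyword = name.split("_", 1)[0]  # 'air-conditioner'
        groups[keyword].append(f"file '{name}'")
    return dict(groups)


names = [
    "accordion_1_musical_instruments.jpg",
    "accordion_musical_instruments.jpg",
    "air-conditioner_1_appliances.jpg",  # hypothetical renamed file
]
for keyword, lines in group_by_keyword(names).items():
    print(keyword + "-word.txt")
    print("\n".join(lines))
```

The un-numbered accordion_musical_instruments.jpg lands in the same group without any extra assumption, because it also starts with "accordion_".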
You could use Python's built-in glob module.
For example, to get a list of all mp3 files:
glob.glob(r'C:\Downloads\*.mp3')
Note that the path format in the example above is for Windows; the r prefix (a raw string) stops Python from treating the backslashes as escape sequences.
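For the question's second step (one <word>-audio.txt per word), the same glob idea can find the word clips. A sketch, assuming every word has a <word>.mp3 clip in the same folder as the two shared clips from the question's example (slience_2sec.mp3 and This_is_.mp3); words_in, audio_list and write_audio_lists are hypothetical names.

```python
from pathlib import Path

# Shared clips, named as in the question's accordion-audio.txt example
SILENCE = "slience_2sec.mp3"
INTRO = "This_is_.mp3"


def words_in(folder):
    """Stems of the per-word clips, e.g. 'accordion' for accordion.mp3."""
    shared = {SILENCE, INTRO}
    return sorted(p.stem for p in Path(folder).glob("*.mp3") if p.name not in shared)


def audio_list(word):
    """Text of <word>-audio.txt: silence, intro, silence, then the word clip."""
    clips = [SILENCE, INTRO, SILENCE, f"{word}.mp3"]
    return "\n".join(f"file '{clip}'" for clip in clips)


def write_audio_lists(folder):
    """Write one <word>-audio.txt per word clip found in folder."""
    for word in words_in(folder):
        (Path(folder) / f"{word}-audio.txt").write_text(audio_list(word) + "\n")
```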

Python string alphabet removal?

In my program, I am reading in files and processing them.
My output should show just the file name and then display some data.
When I loop through the files and print the output by name and data, it displays, for example, myfile.txt. I don't want the .txt part, just myfile.
How can I remove the .txt from the end of this string?
The best way to do it is with os.path.splitext, for example:
import os
filename = 'myfile.txt'
print(filename)                       # myfile.txt
print(os.path.splitext(filename))     # ('myfile', '.txt')
print(os.path.splitext(filename)[0])  # myfile
More info about this very useful built-in module:
https://docs.python.org/3.8/library/os.path.html
The answers given are totally right, but if you have other possible extensions, or don't want to import anything, try this:
name = file_name.rsplit(".", 1)[0]
You can use pathlib.Path which has a stem attribute that returns the filename without the suffix.
>>> from pathlib import Path
>>> Path('myfile.txt').stem
'myfile'
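Well, if you only have .txt files you can do this:
file_name = "myfile.txt"
file_name = file_name.replace('.txt', '')  # strings are immutable, so keep the result
This uses the built-in str.replace method. Note that it removes '.txt' wherever it appears in the string, not only at the end.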
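The approaches above differ on names with several dots or none. A small comparison sketch: strip_variants is a hypothetical helper, and str.removesuffix (not mentioned in the answers) needs Python 3.9 or later.

```python
import os
from pathlib import Path


def strip_variants(name):
    """Result of several ways to remove an extension from a file name."""
    return {
        "splitext": os.path.splitext(name)[0],      # drops the last extension only
        "rsplit": name.rsplit(".", 1)[0],           # cuts at the last dot, whatever it is
        "stem": Path(name).stem,                    # pathlib's equivalent of splitext(...)[0]
        "replace": name.replace(".txt", ""),        # removes '.txt' anywhere in the name
        "removesuffix": name.removesuffix(".txt"),  # Python 3.9+: only a trailing '.txt'
    }


for name in ("myfile.txt", "archive.tar.gz", "notes.txt.bak", ".bashrc", "README"):
    print(name, strip_variants(name))
```

For example, rsplit turns ".bashrc" into an empty string, while splitext and stem leave hidden-file names like that unchanged.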

Removing file extension from filename with file handle as input

I have the following code: f = open('01-01-2017.csv')
From the f variable, I need to remove the ".csv" and assign the remaining "01-01-2017" to a variable called "date". What is the best way to accomplish this?
Just retrieve the name of the file using f.name, apply os.path.splitext, and keep the left part:
import os
date = os.path.splitext(os.path.basename(f.name))[0]
(I've used os.path.basename in case the file was opened with an absolute path.)
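A runnable version of this, with pathlib's Path.stem as an equivalent one-liner. The temporary folder is only there so the sketch can create 01-01-2017.csv without touching real files. f.name is whatever string was passed to open, which is why basename matters when that string is a full path, as it is here.

```python
import os
import tempfile
from pathlib import Path

# Create 01-01-2017.csv in a throwaway folder so the example is self-contained
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "01-01-2017.csv")
with open(csv_path, "w") as out:
    out.write("a,b\n")

with open(csv_path) as f:
    # f.name is the full path here, so strip the directory, then the extension
    date = os.path.splitext(os.path.basename(f.name))[0]
    date_via_pathlib = Path(f.name).stem  # same result with pathlib

print(date, date_via_pathlib)
```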

Extracting all file names in python

I have an application that converts one photo format to another when I enter the following in cmd.exe: "AppConverter.exe" "file.tiff" "file.jpeg"
But since I don't want to type this every time I want a photo converted, I would like a script that converts all the files in the folder. So far I have this:
def start(self):
    for root, dirs, files in os.walk("C:\\Users\\x\\Desktop\\converter"):
        for file in files:
            if file.endswith(".tiff"):
                subprocess.run(['AppConverter.exe', '.tiff', '.jpeg'])
So how do I get the names of all the files and pass them to subprocess? I am thinking of taking the base name (without the extension) of every file and putting it in front of .tiff and .jpeg, but I'm lost on how to do it.
I think the fastest way would be to use the glob module with a wildcard pattern (run the script from the folder that contains the images, since the pattern is relative to the current working directory):
import glob
import subprocess
for file in glob.glob("*.tiff"):
    subprocess.run(['AppConverter.exe', file, file[:-5] + '.jpeg'])
    # file will be like 'test.tiff'
    # file[:-5] will be 'test' (we remove the last 5 characters, i.e. '.tiff')
    # we add '.jpeg' to our extension-less string
All of this information is in the post I linked in the comments on your original question.
You could try looking into os.path.splitext(). It splits a file name into a tuple containing the name without its extension and the extension itself, e.g. ('file', '.tiff'). That might help.
https://docs.python.org/3/library/os.path.html
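Combining the question's os.walk loop with os.path.splitext gives full input and output paths, so it works from any working directory and also descends into subfolders. A sketch: tiff_jobs and convert_all are hypothetical names, the converter is assumed to take an input path and an output path exactly as in the question's cmd.exe example, and the case-insensitive extension check is an extra assumption.

```python
import os
import subprocess


def tiff_jobs(top):
    """Yield (input, output) path pairs for every .tiff file under top."""
    for root, dirs, files in os.walk(top):
        for name in files:
            base, ext = os.path.splitext(name)  # ('photo', '.tiff')
            if ext.lower() == ".tiff":
                yield os.path.join(root, name), os.path.join(root, base + ".jpeg")


def convert_all(top, converter="AppConverter.exe"):
    """Run the converter once per file, like: AppConverter.exe in.tiff out.jpeg"""
    for src, dst in tiff_jobs(top):
        subprocess.run([converter, src, dst], check=True)
```

Usage would be convert_all(r"C:\Users\x\Desktop\converter"); check=True makes subprocess.run raise CalledProcessError if the converter exits with a non-zero status, so a failed conversion is not silently skipped.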
