How to use python to create every word list? - python

Here is wav and image file . and you can donwload it - https://www.dropbox.com/s/iuwt6boc2r2fotc/word_images_file.zip?dl=0
1st step create word list txt file for every word.
put image name to list , and the list name is every word.
but I don't know how to write python code for create every word image list .
example:
accordion-word.txt
file 'accordion_1_musical_instruments.jpg'
file 'accordion_2_musical_instruments.jpg'
file 'accordion_3_musical_instruments.jpg'
file 'accordion_musical_instruments.jpg'
2nd step create audio file list
don't know how to use python write code to create list for every word audio.
accordion-audio.txt
file 'slience_2sec.mp3'
file 'This_is_.mp3'
file 'slience_2sec.mp3'
file 'accordion.mp3'
Thank you !

I prefer os.listdir when I only need file names - Compared to full path that glob returns when absolute path is supplied to it.
I'm making a guess that images' name without numbers in it has same prefix with numbered ones. Regex went crazy without this premis.
Here's full code that does your stuff:
from os import listdir
import re
# Getting list of all image files in directory
location = 'X:test folder/'
image_list = [name for name in listdir(location) if name.endswith('.jpg')]
# Fetching all image keywords, separated by '_x'. ignoring file without it.
reg = re.compile(r'(^[^0-9]*(?=_[0-9]))')
keywords = [reg.match(name).group(0) for name in image_list if reg.match(name)]
# Create txt files per keywords
for keyword in keywords:
filtered = [f"file '{name}'" for name in image_list if name.startswith(keyword)]
with open(location + keyword + '-word.txt', 'w') as file:
file.write('\n'.join(filtered))
# Fetching .wav audio clips
audios = [f"file '{name}'" for name in listdir(location) if name.endswith('.wav')]
# Saving audio clips list
with open(location + 'audio.txt', 'w') as file:
file.write('\n'.join(audios))
Results: On My server.
You can make keyword section way simpler by using changing file names.
for example, air-conditioner_1 instead of air_conditioner_1. Then we know we need to separate at first underscore for all files to get keywords. Much simpler.

You could use python built-in module glob.
For example to get a list of all mp3 file:
glob.glob('C:\Downloads\*.mp3')
Note that the path format in the example above is for window.

Related

Copying files from one location of a server to another using python

Say I have a file that contains the different locations where some '.wav' files are present on a server. For example say the content of the text file location.txt containing the locations of the wav files is this
/home/user/test_audio_folder_1/audio1.wav
/home/user/test_audio_folder_2/audio2.wav
/home/user/test_audio_folder_3/audio3.wav
/home/user/test_audio_folder_4/audio4.wav
/home/user/test_audio_folder_5/audio5.wav
Now what I want to do is that I want to copy these files from different locations within the server to a particular directory within that server, for example say /home/user/final_audio_folder/ and this directory will contain all the audio files from audio1.wav to audio5.wav
I am trying to perform this task by using shutil, but the main problem with shutil that I am facing is that while copying the files, I need to name the file. I have written a demo version of what I am trying to do, but dont know how to scale it when I will be reading the paths of the '.wav' files from the txt file and copy them to my desired location using a loop.
My code for copying a single file goes as follows,
import shutil
original = r'/home/user/test_audio_folder_1/audio1.wav'
target=r'/home/user/final_audio_folder_1/final_audio1.wav'
shutil.copyfile(original,target)
Any suggestions will be really helpful. Thank you.
import shutil
i=0
with open(r'C:/Users/turing/Desktop/location.txt', "r") as infile:
for t in infile:
i+=1
x="audio"+str(i)+".wav"
t=t.rstrip('\n')
original= r'{}'.format(t)
target=r'C:/Users/turing/Desktop/audio_in/' + x
shutil.copyfile(original, target)
Use the built-in string's split() method within a for loop on the location.txt contents & split the name of the directory on the '/' character, then the last element in a new list would be your filename.

Bulk convert files extension using Python

Trying to develop a bulk webp to png converter using python.
Am using the webptools library (https://pypi.org/project/webptools/)
the documentation above only shows how to convert one file at each time and require user input of the file name.
So, what I am trying to do is to scan the folder for *.webp and then convert it to *.png with the original filename. I couldn't solve the output file names. I suppose with the current codes, it keeps overwriting the same file x.png, so it ended up with just 1 output file. I can't figure out how to fix this.
I am new to python. hope to get some guidance or help here. Thank you very much.
from webptools import dwebp
import os, glob
os.chdir("./images") # working directory
webp_list = []
for file in glob.glob("*.webp"):
webp_list = file
print([webp_list])
for files in webp_list:
print(dwebp(input_image=webp_list, output_image="x.png", option="-o", logging="-v"))
# documentation - code allows only 1 input and 1 output
# print(dwebp(input_image="sample.webp", output_image="sample.png", option="-o", logging="-v"))
After you do
webp_list = []
for file in glob.glob("*.webp"):
webp_list = file
print([webp_list])
webp_list is name of last file which matches, rather list of file names. glob.glob itself
Return a possibly-empty list of path names that match pathname(...)
so there is no need for such conhortion and you can simply do
webp_list = glob.glob("*.webp")
instead, then you need different output filename, for which I propose following solution
for filename in webp_list:
outname = filename[:-4] + "png"
dwebp(input_image=filename, output_image=outname, option="-o", logging="-v")
filename[:-4] means filename without 4 last characters (webp in this case), which is then concatenated with png.
I've never used this library before, so my suggestion is based just on how I guess it should work:
from webptools import dwebp
import os, glob
os.chdir("./images") # working directory
webp_list = []
for file in glob.glob("*.webp"):
output_file = file[:-4] + 'png'
dwebp(input_image=file, output_image=output_file, option="-o", logging="-v")

python split string and list full path name

I am using glob to get a list of all PDF files in a folder (I need full path names to upload file to cloud)
also, during the upload I need to assign a "title" to the file which we be the items name in the cloud.
I need to split the last "\" and the "." and get the values in between. for example:
pdf_list = glob.glob(r'C:\User\username\Desktop\pdf\*.pdf')
a item in the list will be: "c:\User\username\Desktop\pdf\4434343434331.pdf"
I need another pythonic way to grab the pdfs file name in a separate variable while still in the for loop.
for file in pdf_list:
upload.file
file.title(file.split(".")[0]
however the above split will not return my desired results but something along those lines
I am using a for loop to upload each pdf (using file path)
Actually, there is a function for this already:
for file in pdf_list:
file_name = os.path.basename(file)
upload.file(file_name)
You can use pathlib, for example:
from pathlib import Path
p = list(Path('C:/User/username/Desktop/pdf').glob('*.pdf'))
first_filename = p[0].name

Iterating through subdirectories to add unique strings to each file

My goal: To build a program that:
Opens a folder (provided by the user) from the user's computer
Iterates through that folder, opening each document in each subdirectory (named according to language codes; "AR," "EN," "ES," etc.)
Substitutes a string in for another string in each document. Crucially, the new string will change with each document (though the old string will not), according to the language code in the folder name.
My level of experience: Minimal; been learning python for a few months but this is the first program I'm building that's not paint-by-numbers. I'm building it to make a process at work faster. I'm sure I'm not building this as efficiently as possible; I've been throwing it together from my own knowledge and from reading stackexchange religiously while building it.
Research I've done on my own: I've been living in stackexchange the past few days, but I haven't found anyone doing quite what I'm doing (which was very surprising to me). I'm not sure if this is just because I lack the vocabulary to search (tried out a lot of search terms, but none of them totally match what I'm doing) or if this is just the wrong way of going about things.
The issue I'm running into:
I'm getting this error:
Traceback (most recent call last):
File "test5.py", line 52, in <module>
for f in os.listdir(src_dir):
OSError: [Errno 20] Not a directory: 'ExploringEduTubingEN(1).txt'
I'm not sure how to iterate through every file in the subdirectories and update a string within each file (not the file names) with a new and unique string. I thought I had it, but this error has totally thrown me off. Prior to this, I was getting an error for the same line that said "Not a file or directory: 'ExploringEduTubingEN(1).txt'" and it's surprising to me that the first error could request a file or a directory, and once I fixed that, it asked for just a directory; seems like it should've just asked for a directory at the beginning.
With no further ado, the code (placing at bottom because it's long to include context):
import os
ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
#Asking for a PDF to which we'll iteratively append the language codes from below.
lst = ['_ar.pdf', '_cs.pdf', '_de.pdf', '_el.pdf', '_en_gb.pdf', '_es.pdf', '_es_419.pdf',
'_fr.pdf', '_id.pdf', '_it.pdf', '_ja.pdf', '_ko.pdf', '_nl.pdf', '_pl.pdf', '_pt_br.pdf', '_pt_pt.pdf', '_ro.pdf', '_ru.pdf',
'_sv.pdf', '_th.pdf', '_tr.pdf', '_vi.pdf', '_zh_tw.pdf', '_vn.pdf', '_zh_cn.pdf']
#list of language code PDF appending strings.
pdf_list=open('pdflist.txt','w+')
#creating a document to put this group of PDF filepaths in.
pdf2='pdflist.txt'
#making this an actual variable.
for word in lst:
pdf_list.write(ex + word + "\n")
#creating a version of the PDF example for every item in the language list, and then appending the language codes.
pdf_list.seek(0)
langlist=pdf_list.readlines()
#creating a list of the PDF paths so that I can use it below.
for i in langlist:
i=i.rstrip("\n")
#removing the line breaks.
pdf_list.close()
#closing the file after removing the line breaks.
file1=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
#the folder provided by the user to iterate through.
folder1=os.listdir(file1)
#creating a list of the files within the folder
pdfpath1="example.pdf"
langfile="example2.pdf"
#setting variables for below
#my thought here is that i'd need to make the variable the initial folder, then make it a list, then iterate through the list.
for ogfile in folder1:
#want to iterate through all the files in the directory, including in subdirectories
src_dir=ogfile.split("/",6)
src_dir="/".join(src_dir[:6])
#goal here is to cut off the language code folder name and then join it again, w/o language code.
for f in os.listdir(src_dir):
f = os.path.join(src_dir, f)
#i admit this got a little convoluted–i'm trying to make sure the files put the right code in, I.E. that the document from the folder ending in "AR" gets the PDF that will now end in "AR"
#the perils of pulling from lots of different questions in stackexchange
with open(ogfile, 'r+') as f:
content = f.read()
f.seek(0)
f.truncate()
for langfile in langlist:
f.write(content.replace(pdfpath1, langfile))
#replacing the placeholder PDF link with the created PDF links from the beginning of the code
If you read this far, thanks. I've tried to provide as much information as possible, especially about my thought process. I'll keep trying things and reading, but I'd love to have more eyes on it.
You have to specify the full path to your directories/files. Use os.path.join to create a valid path to your file or directory (and platform-independent).
For replacing your string, simply modify your example string using the subfolder name. Assuming that ex as the format filename.pdf, you could use: newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'. That way, you do not have to specify the list of replacement strings nor loop through this list.
Solution
To iterate over your directory and replace the content of your files as you'd like, you can do the following:
# Get the name of the file: "example.pdf" (note the .pdf is assumed here)
ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
# Get the folder to go through
folderpath=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
# Get all subfolders and go through them (named: 'AR', 'DE', etc.)
subfolders=os.listdir(folderpath)
for subfolder in subfolders:
# Get the full path to the subfolder
fullsubfolder = os.path.join(folderpath,subfolder)
# If it is a directory, go through it
if os.path.isdir(fullsubfolder):
# Find all files in subdirectory and go through each of them
files = os.listdir(fullsubfolder)
for filename in files:
# Get full path to the file
fullfile = os.path.join(fullsubfolder, filename)
# If it is a file, process it (note: we do not check if it is a text file here)
if os.path.isfile(fullfile):
with open(fullfile, 'r+') as f:
content = f.read()
f.seek(0)
f.truncate()
# Create the replacing string based on the subdirectory name. Ex: 'example_ar.pdf'
newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'
f.write(content.replace(ex, newstring))
Note
Instead of asking the user to find write the folder, you could ask him to open the directory with a dialog box. See this question for more info: Use GUI to open directory in Python 3

Extracting all file names in python

I have a application that converts from one photo format to another by inputting in cmd.exe following: "AppConverter.exe" "file.tiff" "file.jpeg"
But since i don't want to input this every time i want a photo converted, i would like a script that converts all files in the folder. So far i have this:
def start(self):
for root, dirs, files in os.walk("C:\\Users\\x\\Desktop\\converter"):
for file in files:
if file.endswith(".tiff"):
subprocess.run(['AppConverter.exe', '.tiff', '.jpeg'])
So how do i get the names of all the files and put them in subprocess. I am thinking taking basename (no ext.) for every file and pasting it in .tiff and .jpeg, but im at lost on how to do it.
I think the fastest way would be to use the glob module for expressions:
import glob
import subprocess
for file in glob.glob("*.tiff"):
subprocess.run(['AppConverter.exe', file, file[:-5] + '.jpeg'])
# file will be like 'test.tiff'
# file[:-5] will be 'test' (we remove the last 5 characters, so '.tiff'
# we add '.jpeg' to our extension-less string
All those informations are on the post I've linked in the comments o your original question.
You could try looking into os.path.splitext(). That allows you to split the file name into a tuple containing the basename and extension. That might help...
https://docs.python.org/3/library/os.path.html

Categories