Bulk convert files extension using Python - python

I am trying to develop a bulk webp to png converter using Python.
I am using the webptools library (https://pypi.org/project/webptools/).
The documentation above only shows how to convert one file at a time, and it requires the user to enter the file name.
So what I am trying to do is scan the folder for *.webp files and convert each one to *.png, keeping the original filename. I couldn't solve the output file names. I suppose that with the current code it keeps overwriting the same file, x.png, so I end up with just 1 output file. I can't figure out how to fix this.
I am new to Python and hope to get some guidance or help here. Thank you very much.
from webptools import dwebp
import os, glob
os.chdir("./images") # working directory
webp_list = []
for file in glob.glob("*.webp"):
    webp_list = file
    print([webp_list])
for files in webp_list:
    print(dwebp(input_image=webp_list, output_image="x.png", option="-o", logging="-v"))
# documentation - code allows only 1 input and 1 output
# print(dwebp(input_image="sample.webp", output_image="sample.png", option="-o", logging="-v"))

After you do
webp_list = []
for file in glob.glob("*.webp"):
    webp_list = file
    print([webp_list])
webp_list is the name of the last matching file rather than a list of file names. According to its documentation, glob.glob itself will
Return a possibly-empty list of path names that match pathname(...)
so there is no need for such a contortion and you can simply do
webp_list = glob.glob("*.webp")
instead. You then need a different output filename for each file, for which I propose the following solution:
for filename in webp_list:
    outname = filename[:-4] + "png"
    dwebp(input_image=filename, output_image=outname, option="-o", logging="-v")
filename[:-4] is the filename without its last 4 characters (webp in this case), which is then concatenated with png.
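As a variation on the slicing above, the output name can also be derived with pathlib, which does not depend on the extension being exactly four characters long. This is only a sketch: png_name and conversion_pairs are hypothetical helper names, and the block deliberately stops short of calling webptools.

```python
from pathlib import Path

def png_name(webp_name):
    """Return the .png name for a .webp file, e.g. 'photo.webp' -> 'photo.png'."""
    return str(Path(webp_name).with_suffix(".png"))

def conversion_pairs(directory="."):
    """Return (input, output) file name pairs for every .webp file in directory."""
    webp_files = sorted(Path(directory).glob("*.webp"))
    return [(p.name, png_name(p.name)) for p in webp_files]

# Each pair could then be handed to the call shown in the question, e.g.
# dwebp(input_image=src, output_image=dst, option="-o", logging="-v")
```

Only the last suffix is replaced, so a name such as my.holiday.webp becomes my.holiday.png.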

I've never used this library before, so my suggestion is based just on how I guess it should work:
from webptools import dwebp
import os, glob
os.chdir("./images") # working directory
webp_list = []
for file in glob.glob("*.webp"):
    output_file = file[:-4] + 'png'
    dwebp(input_image=file, output_image=output_file, option="-o", logging="-v")

Related

Changing filenames in folders

I have a folder that contains a lot of files, many of which have copies whose names make them unreadable.
Example:
cow.txt
cow.txt(1)
cow.txt(2)
cow.txt(3)
dog.txt
dog.txt(1)
I would like to have all the files renamed in a way that lets them be opened. Example:
cow.txt
cow(1).txt
cow(2).txt
cow(3).txt
dog.txt
dog(1).txt
Any help you can provide would be greatly appreciated. I am just looking to make sure their names are changed, and am not looking to read each individual file. In addition, if possible, I would like to break up the files into 20k blocks. Thank you in advance.
I have tried using os.rename to simply rename the files, but I am confused about how to do this efficiently, as the numbers come after the .txt. I then decided to read all the file names into a pandas data frame and fix them that way. However, I am confused about how to pull the files and save them under the new names.
list_of_files = os.listdir()
df = pd.DataFrame(list_of_files, columns = ['File_Name'])
df['.txt_removed'] = df.replace(to_replace = '.txt', value = '', regex = True)
df['txt_add'] = df['.txt_removed'] + '.txt'
To pull the files I would do something like this:
for filewant in df['txt_add']:
    if filewant in os.listdir():
        shutil.copy(os.path.join(filewant), 'new location')
I do not think this option will work, even though it gives me the intended names, because I would like to rename the files themselves.
You can use Python's standard library: the os module has the os.rename function.
It works like this:
os.rename('cow.txt(1)', 'cow(1).txt')
Create a .py file, paste the code below, and run it. Replace /mydir with the path to the directory containing the files. The code loops through the directory, finds every file whose name has a copy number after .txt, and renames it so that the number comes before the extension. I hope it works.
import glob, os
os.chdir("/mydir")
# match only the copies, e.g. cow.txt(1); a plain cow.txt is already fine
for file in glob.glob("*.txt(*)"):
    file_name = os.path.basename(file)
    # split at the first ".txt": 'cow.txt(1)' -> ['cow', '(1)']
    part_name = file_name.split(".txt", 1)
    new_name = part_name[0] + part_name[1] + '.txt'
    os.rename(file, new_name)
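The renaming rule from the question (cow.txt(1) becomes cow(1).txt) can also be written as a small pure function, which makes it easy to check the new names before touching any file. This is a sketch under the assumption that every copy is named exactly like the examples in the question; COPY_PATTERN, fixed_name and plan_renames are hypothetical names.

```python
import re

# Matches names such as 'cow.txt(1)': a stem, '.txt', then a number in parentheses.
COPY_PATTERN = re.compile(r"^(?P<stem>.+)\.txt\((?P<num>\d+)\)$")

def fixed_name(name):
    """Return 'cow(1).txt' for 'cow.txt(1)'; return any other name unchanged."""
    match = COPY_PATTERN.match(name)
    if match is None:
        return name
    return f"{match['stem']}({match['num']}).txt"

def plan_renames(names):
    """Return (old, new) pairs, skipping any target name that is already taken."""
    taken = set(names)
    plan = []
    for old in names:
        new = fixed_name(old)
        if new != old and new not in taken:
            plan.append((old, new))
            taken.add(new)
    return plan
```

Each pair returned by plan_renames(os.listdir()) could then be passed to os.rename(old, new), which keeps the decision about names separate from the renaming itself.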

How to use python to create every word list?

Here are the wav and image files; you can download them from https://www.dropbox.com/s/iuwt6boc2r2fotc/word_images_file.zip?dl=0
Step 1: create a word list txt file for every word.
Each list contains the image names for one word, and the list file is named after that word.
But I don't know how to write Python code that creates an image list for every word.
example:
accordion-word.txt
file 'accordion_1_musical_instruments.jpg'
file 'accordion_2_musical_instruments.jpg'
file 'accordion_3_musical_instruments.jpg'
file 'accordion_musical_instruments.jpg'
Step 2: create the audio file list.
I don't know how to write Python code that creates an audio list for every word.
accordion-audio.txt
file 'slience_2sec.mp3'
file 'This_is_.mp3'
file 'slience_2sec.mp3'
file 'accordion.mp3'
Thank you !
I prefer os.listdir when I only need file names, as opposed to the full paths that glob returns when an absolute path is supplied to it.
I'm guessing that the image names without a number share the same prefix as the numbered ones. The regex got out of hand without this premise.
Here's the full code that does what you asked:
from os import listdir
import re
# Getting list of all image files in directory
location = 'X:test folder/'
image_list = [name for name in listdir(location) if name.endswith('.jpg')]
# Fetching all image keywords, separated by '_x'. ignoring file without it.
reg = re.compile(r'(^[^0-9]*(?=_[0-9]))')
keywords = [reg.match(name).group(0) for name in image_list if reg.match(name)]
# Create txt files per keywords
for keyword in keywords:
    filtered = [f"file '{name}'" for name in image_list if name.startswith(keyword)]
    with open(location + keyword + '-word.txt', 'w') as file:
        file.write('\n'.join(filtered))
# Fetching .wav audio clips
audios = [f"file '{name}'" for name in listdir(location) if name.endswith('.wav')]
# Saving audio clips list
with open(location + 'audio.txt', 'w') as file:
    file.write('\n'.join(audios))
Results: On My server.
You can make the keyword section much simpler by changing the file names.
For example, use air-conditioner_1 instead of air_conditioner_1. Then we know that splitting at the first underscore gives the keyword for every file. Much simpler.
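If the files were renamed as suggested, the grouping could look roughly like the sketch below. It assumes every image name starts with the keyword followed by an underscore; group_by_keyword and list_file_text are hypothetical helper names, and writing the text to the -word.txt files is left out.

```python
from collections import defaultdict

def group_by_keyword(image_names):
    """Group image names by the text before their first underscore."""
    groups = defaultdict(list)
    for name in image_names:
        keyword = name.split("_", 1)[0]
        groups[keyword].append(name)
    return dict(groups)

def list_file_text(names):
    """Build the contents of one list file, one "file '...'" line per image."""
    return "\n".join(f"file '{name}'" for name in sorted(names))
```

For each keyword in the result, list_file_text(groups[keyword]) would be the text to write to keyword + '-word.txt'.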
You could use Python's built-in glob module.
For example, to get a list of all mp3 files:
glob.glob(r'C:\Downloads\*.mp3')
Note that the path format in the example above is for Windows.

How to load data set having multiple 'No-extension files' in python?

I am trying to load a dataset for my machine learning project and it requires me to load files having no extensions.
I tried :
import os
import glob
files = filter(os.path.isfile, glob.glob("./[0-9]*"))
for name in files:
    with open(name) as fh:
        contents = fh.read()
But it doesn't return anything, mainly because the glob call itself returns nothing.
Also tried :
import os
import glob
path = './dataset1/training_validation/2012-07-10/'
for infile in glob.glob(os.path.join(path, '*')):
    print("test")
    file = open(infile, 'r')
    print(file)
but this returns [] because of that glob command.
I'm stuck in here and couldn't find anything over the internet.
My actual problem is to load 'no extension files in a training and testing set' from two folders, validation, and the test itself. I can iterate through the folder but don't know how to handle those file types.
When I open those files in a text editor. it shows me something like this.
So I know that it's a binary format of an image, but have no idea how can I store and train them.
Any help would be appreciated. Thanks.
Two things:
File extensions (.txt, .dat, .bat, .f90, etc.) are not meaningful to Python, at least when using glob or numpy or something of the sort, because the extension is just part of a string. Some of us are raised (within Windows) to believe that file extensions mean something (I too fell for it).
The file you are looking at is a text file containing the ASCII representation of a binary image made of 0's and 1's. So it's not a binary file, and it's not an image file (per se), but it is a text file, which means we can read it as such from Python.
To read this in, you could do either:
1. Use numpy to do data = numpy.loadtxt(<filename>), however you might have trouble delimiting the digits.
2. Use Python's standard open function on the file, and loop through each line using for line in <file_handle>:. This way, each row of data is a string, which can be parsed easily (see documentation on string indexing).
Good luck!
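The second option could be sketched as follows. It assumes each line of the file is a run of 0 and 1 characters with no separators, which is a guess since the file contents are not shown here; parse_bitmap and load_bitmap are hypothetical names.

```python
def parse_bitmap(lines):
    """Turn lines such as '00110' into rows of integers, skipping blank lines."""
    rows = []
    for line in lines:
        digits = line.strip()
        if digits:
            rows.append([int(ch) for ch in digits])
    return rows

def load_bitmap(path):
    """Read one extension-less text file and return its rows of 0/1 values."""
    with open(path) as fh:
        return parse_bitmap(fh)
```

The nested list returned by load_bitmap can be converted with numpy.array(...) if an array is needed for training.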
IMO this simply means that your path does not exist.
As a first test, perhaps try an absolute path to your folder, as you may have confused the position of the folder relative to your current working directory.
I got it to work with the following code.
import random
from os import listdir
from os.path import isfile, join
fileNames = [f for f in listdir(dirName) if isfile(join(dirName, f))]
random.shuffle(fileNames)
for files in fileNames:
    data = open(dirName+'/'+files,'r')
Thanks for your responses.

Run only if "if " statement is true.!

I have a question. I'm reading a FITS file and then using information from its header to determine the other files that are related to it. But for some of the FITS files, the other files (blaze_file, bis_file, ccf_table) are not available, and because of that my code fails with the obvious error: No such file or directory.
import pandas as pd
import sys, os
import numpy as np
from glob import glob
from astropy.io import fits
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        e2ds_hdu = fits.open(filename)
        e2ds_header = e2ds_hdu[0].header
        date = e2ds_header['DATE-OBS']
        date2 = date = date[0:19]
        blaze_file = e2ds_header['HIERARCH ESO DRS BLAZE FILE']
        bis_file = glob('HARPS.' + date2 + '*_bis_G2_A.fits')
        ccf_table = glob('HARPS.' + date2 + '*_ccf_G2_A.tbl')
        if not all(file in os.listdir(PATH) for file in [blaze_file,bis_file,ccf_table]):
            continue
What I want is for my code to run only if all the files are available, and otherwise not. The problem is that I'm defining the other files as variables inside the for loop, since I'm using the header information. So how can I define them before the for loop, and then use something like
Can anyone help me out with this?
The filenames returned by os.listdir() are always relative to the path given there.
In order to be used, they have to be joined with this path.
Example:
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        filepath = os.path.join(PATH, filename)
        e2ds_hdu = fits.open(filepath)
        …
Let the filenames be ['a', 'b', 'a_e2ds_A.fits', 'b_e2ds_A.fits']. The code now skips the first two names and prepends the directory path to the remaining two.
a_e2ds_A.fits becomes home/Desktop/2d_spectra/a_e2ds_A.fits and
b_e2ds_A.fits becomes home/Desktop/2d_spectra/b_e2ds_A.fits.
Now they can be opened from the directory the script was started in, not just from inside the given directory.
I should become accustomed to reading a question in full before trying to answer it.
The problem I mentioned only matters if you start the script from a path outside the said directory. Nevertheless, fixing it will make your code much more consistent.
Your real problem, however, lies somewhere else: you examine a file and then, after checking its contents, want to read files whose names depend on information from that first file.
There are several ways to accomplish your goal:
1. Just extend your loop with the proper tests.
Pseudo code:
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if all files exist:
            proceed
or
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if not all files exist:
            continue # actual keyword, no pseudo code!
        proceed
2. Put some functionality into functions (variation of 1.)
3. Create a loop in a generator function which yields the "interesting information" of one fits file (or alternatively nothing) and have another loop run over them to actually work with the data.
If I am still missing some points or am not detailed enough, please let me know.
Since you have to read the FITS file to know the names of the dependent files, there's no way to avoid reading the FITS file first. The only thing you can do is test for the existence of the dependent files before trying to read them, and skip the rest of the loop (using continue) if any of them is missing.
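The existence test could be sketched like this. It assumes the blaze file name comes straight from the header and that the other two names follow the HARPS patterns used in the question; first_match and companion_files are hypothetical helper names, and reading the FITS header itself is left out.

```python
import glob
import os

def first_match(directory, pattern):
    """Return the first path in directory matching pattern, or None."""
    full_pattern = os.path.join(glob.escape(directory), pattern)
    matches = sorted(glob.glob(full_pattern))
    return matches[0] if matches else None

def companion_files(directory, blaze_name, date_prefix):
    """Return (blaze, bis, ccf) paths, or None if any of the three is missing."""
    prefix = "HARPS." + glob.escape(date_prefix)
    blaze = os.path.join(directory, blaze_name)
    bis = first_match(directory, prefix + "*_bis_G2_A.fits")
    ccf = first_match(directory, prefix + "*_ccf_G2_A.tbl")
    if not os.path.isfile(blaze) or bis is None or ccf is None:
        return None
    return blaze, bis, ccf
```

Inside the loop from the question this would be used as paths = companion_files(PATH, blaze_file, date2), followed by continue when paths is None.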
Edit this line
e2ds_hdu = fits.open(filename)
And replace with
e2ds_hdu = fits.open(os.path.join(PATH, filename))

Extracting all file names in python

I have an application that converts from one photo format to another when the following is entered in cmd.exe: "AppConverter.exe" "file.tiff" "file.jpeg"
But since I don't want to type this every time I want a photo converted, I would like a script that converts all files in the folder. So far I have this:
def start(self):
    for root, dirs, files in os.walk("C:\\Users\\x\\Desktop\\converter"):
        for file in files:
            if file.endswith(".tiff"):
                subprocess.run(['AppConverter.exe', '.tiff', '.jpeg'])
So how do I get the names of all the files and pass them to subprocess? I am thinking of taking the basename (without extension) of every file and appending .tiff and .jpeg to it, but I'm at a loss on how to do it.
I think the fastest way would be to use the glob module for expressions:
import glob
import subprocess
for file in glob.glob("*.tiff"):
    subprocess.run(['AppConverter.exe', file, file[:-5] + '.jpeg'])
    # file will be like 'test.tiff'
    # file[:-5] will be 'test' (we remove the last 5 characters, i.e. '.tiff')
    # then we add '.jpeg' to the extension-less string
All of this information is in the post I linked in the comments on your original question.
You could try looking into os.path.splitext(). That allows you to split the file name into a tuple containing the basename and extension. That might help...
https://docs.python.org/3/library/os.path.html
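A minimal sketch of the os.path.splitext() idea, building the argument list for the converter from the question. convert_command is a hypothetical helper name, and the command is only constructed here, not executed.

```python
import os

def convert_command(tiff_name, converter="AppConverter.exe"):
    """Build the argument list that converts tiff_name to a .jpeg of the same name."""
    root, _extension = os.path.splitext(tiff_name)
    return [converter, tiff_name, root + ".jpeg"]
```

The returned list can be passed directly to subprocess.run(...) for every .tiff file found in the folder. Unlike slicing off a fixed number of characters, this also works when the extension is .tif.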
