Save part of a file name in separate pandas df columns

I would like to save different positions of a file name in different pandas DataFrame columns.
For example, my file names look like this:
001015io.png
positions 0-2 go in column 'y position', in this case '001'
positions 3-5 go in column 'x position', in this case '015'
positions 6-7 go in column 'status', in this case 'io'
My folder contains about 400 of these picture files. I'm a beginner in programming, so I don't know how to start solving this.

If the parts of the file names that you need are consistent (same position and length in all files), you can use string slicing to create new columns from the pieces of the file name like this:
import pandas as pd
df = pd.DataFrame({'file_name': ['001015io.png']})
df['y position'] = df['file_name'].str[0:3]
df['x position'] = df['file_name'].str[3:6]
df['status'] = df['file_name'].str[6:8]
This results in the dataframe:
      file_name y position x position status
0  001015io.png        001        015     io
Note that when you slice a string you give a start position and a stop position like [0:3]. The start position is inclusive, but the stop position is not, so [0:3] gives you the substring from 0-2.
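As a hedged alternative to slicing each column separately, pandas' str.extract can pull all three parts out with a single regular expression. The sketch below assumes every name follows the layout from the question (three digits, three digits, two letters). The helper name split_file_names and the sample names other than 001015io.png are made up for illustration.

```python
import pandas as pd


def split_file_names(file_names):
    """Build a DataFrame with one row per file name and one column per part."""
    df = pd.DataFrame({"file_name": list(file_names)})
    # Named groups become column names; names that do not match yield NaN.
    parts = df["file_name"].str.extract(
        r"^(?P<y>\d{3})(?P<x>\d{3})(?P<status>[A-Za-z]{2})\."
    )
    df["y position"] = parts["y"]
    df["x position"] = parts["x"]
    df["status"] = parts["status"]
    return df


# "002120ni.png" and "notes.txt" are hypothetical sample names.
df = split_file_names(["001015io.png", "002120ni.png", "notes.txt"])
print(df)
```

Compared with plain slicing, the pattern also flags names that do not fit the layout, because their new columns are left empty instead of holding arbitrary characters.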

You can do this with slicing. A string is a sequence of characters, so you can slice it into the parts you need, just like a list. See the example below.
filename = '001015io.png'
y = filename[0:3]       # positions 0-2: y position
x = filename[3:6]       # positions 3-5: x position
status = filename[6:8]  # positions 6-7: status
print(y, x, status)
Output:
001 015 io
As for getting the list of files, there's an absurdly complete answer for that on Stack Overflow (linked in the docstring below).
I have this function below in my personal library which I reuse whenever I need to generate a list of files.
import os

def get_files_from_path(path: str = ".", ext=None) -> list:
    """Find files in path and return them as a list.

    Gets all files in folders and subfolders.
    See the answer on the link below for a ridiculously
    complete answer for this. I tend to use this one.
    Note that it also goes into subdirs of the path.
    https://stackoverflow.com/a/41447012/9267296

    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension.
            Defaults to None.

    Returns:
        list: list of full file paths
    """
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = os.path.join(subdir, fname)
            if ext is None:
                result.append(filepath)
            elif isinstance(ext, str) and fname.lower().endswith(ext.lower()):
                result.append(filepath)
            elif isinstance(ext, list):
                # Append once, even if several extensions in the list match.
                if any(fname.lower().endswith(item.lower()) for item in ext):
                    result.append(filepath)
    return result
One thing to take into account: this function returns the full file path, e.g. path/to/file/001015io.png
You can use the code below to get just the file name:
import os
print(os.path.basename('path/to/file/001015io.png'))
Output:
001015io.png
Then use what Bill the Lizard said to turn the names into a DataFrame.
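The pieces above (listing the folder, taking the base name, slicing) can be tied together roughly as follows. This is a minimal sketch, assuming all pictures sit directly in one folder and follow the fixed-width layout from the question. The helper names parse_name and records_from_folder are illustrative, and the demo creates two empty files in a temporary folder instead of touching real data.

```python
import tempfile
from pathlib import Path


def parse_name(file_name):
    """Slice a name such as '001015io.png' into its three fixed-width parts."""
    return {
        "file_name": file_name,
        "y position": file_name[0:3],
        "x position": file_name[3:6],
        "status": file_name[6:8],
    }


def records_from_folder(folder, pattern="*.png"):
    """Return one record per matching file in folder, sorted by name."""
    return [parse_name(p.name) for p in sorted(Path(folder).glob(pattern))]


# Demo against a throwaway folder; "002003io.png" is a made-up second file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("001015io.png", "002003io.png"):
        (Path(tmp) / name).touch()
    records = records_from_folder(tmp)

print(records)
```

A list of dictionaries like this can be passed straight to pd.DataFrame(records) to get one column per key.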

Related

loop through folder and find files missing python

I am trying to loop through a folder for a given date. There will be 3-4 files in the folder every day. I have to check that each file is present in the folder for a given date. I cannot hardcode the file names because they contain a sequence number that can vary.
I got the snippet below from Stack Overflow, and it works if my file names are consistent.
import glob
source = './'
date = '20221102'  # a string, so it can be concatenated into the patterns
paths = ["file1_"+date+"*.json", "file2_"+date+"*.json", "file3_"+date+"*.json"]
for path in paths:
    print(f"looking for {path} with {source+'**/'+path}")
    print(glob.glob(source+"**/"+path, recursive=True))
mp = [path for path in paths if not glob.glob(source+"**/"+path, recursive=True)]
for nl in mp:
    print(f'{nl}... is missing')
I would just like to loop over all files in a folder and throw an exception if any one of the files is missing.
Can anyone please help?

Here is how I handle the issue. I grab a list of files from the path that contain the date and file type. I then split each name on any delimiter that isn't alphanumeric to create a list of lists, and flatten that to a single list.
Finally, I compare the lists for matches and return a boolean.
import re
from pathlib import Path

def flatten_nested(nested) -> list:
    """Recursively flatten nested lists, tuples, and sets into one list."""
    flattened = []
    for x in nested:
        if isinstance(x, (list, tuple, set)):
            flattened.extend(flatten_nested(x))
        else:
            flattened.append(x)
    return flattened

def find_files(file_path: str, file_type: str, value: str) -> list:
    """Return the name parts of every matching file under file_path."""
    files = Path(file_path).rglob(f"*{value}*.{file_type}")
    # Runs of non-alphanumeric characters (including "_") between two alphanumerics
    delimiter = r"(?<=[^\W_])[\W_]+(?=[^\W_])"
    return flatten_nested([re.sub(delimiter, " ", x.stem).split(" ") for x in files if x.is_file()])

source = "./"
to_check = ["file1", "file2", "file3"]
found = find_files(source, "json", "20221102")  # scan the folder only once
check = all(x in found for x in to_check)
print(check)
Output (when all three files are present):
True
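Since the question asks for an exception when a file is missing, here is a hedged sketch of a more direct variant: one recursive glob per expected prefix, then a FileNotFoundError listing whatever was not found. It assumes the names look like `<prefix>_<date><anything>.json`, as in the question's patterns. The function names and the sample file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_prefixes(folder, prefixes, date, ext="json"):
    """Return the prefixes that have no '<prefix>_<date>*.<ext>' file under folder."""
    root = Path(folder)
    return [
        prefix
        for prefix in prefixes
        # rglob also searches sub-folders; any() stops at the first hit.
        if not any(root.rglob(f"{prefix}_{date}*.{ext}"))
    ]


def require_all(folder, prefixes, date, ext="json"):
    """Raise FileNotFoundError if any expected file is absent."""
    missing = missing_prefixes(folder, prefixes, date, ext)
    if missing:
        raise FileNotFoundError(f"missing for {date}: {', '.join(missing)}")


# Demo in a throwaway folder: file3 is deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("file1_20221102_001.json", "file2_20221102_17.json"):
        (Path(tmp) / name).touch()
    missing = missing_prefixes(tmp, ["file1", "file2", "file3"], "20221102")
    try:
        require_all(tmp, ["file1", "file2", "file3"], "20221102")
        raised = False
    except FileNotFoundError:
        raised = True

print(missing, raised)
```

Keeping the check (missing_prefixes) separate from the raise (require_all) makes it easy to log the missing names instead of failing, if that turns out to be preferable.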

Comparing incrementing filenames in Python and check which one is missing

I'm iterating through a folder with files and adding each of the file's path to a list. The folder contains files with incrementing file names, such as 00-0.txt, 00-1.txt, 00-2.txt, 01-0.txt, 01-1.txt, 01-2.txt and so on.
The number of files is not fixed and always varies. Also, sometimes a file could be missing. This means that I will sometimes get this list instead:
00-0.txt, 00-1.txt, 01-0.txt, 01-1.txt, 01-2.txt.
However, in my final list, I should always have groups of 9 (so 00-0, 00-1, 00-2 and so on until 00-8 is one group). If a file is missing, then I will append 'is missing' string text in the new list instead.
What I was thinking to do is the following:
Get the last character of the filename (for ex. '3')
Check if its value is the same as the previous index + 1.
If it's not, then append the 'is missing' string
If it's the same, then append the file name
In pseudo-code (please don't mind the syntax errors, I'm mainly looking for high level advice), it would be something like this:
empty_list = []
list_with_paths = glob.glob("/path/to/dir/*.txt")
for index, item in enumerate(list_with_paths):
    basename = os.path.basename(item)
    filename = os.path.splitext(basename)[0]
    if index == 0 and int(filename[-1]) != 0:
        empty_list.append('is missing')
    elif filename[-1] != empty_list[index - 1] + 1:
        empty_list.append('is missing')
    else:
        empty_list.append(filename)
I'm sure there is a more optimal solution in order to achieve this.

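Once you have the set of actual paths, just iterate over the expected paths until you have accounted for all of the actual paths.
import glob
import os
from itertools import count

directory = "/path/to/dir"
list_with_paths = set(glob.glob(os.path.join(directory, "*.txt")))
results = []
for g in count():
    if not list_with_paths:
        break
    for i in range(9):
        expected = "{:02}-{}.txt".format(g, i)
        full_path = os.path.join(directory, expected)
        if full_path in list_with_paths:
            # Remove the full path: the set holds full paths, not bare names.
            list_with_paths.remove(full_path)
        else:
            expected = "is missing"
        results.append(expected)
Note that the loop only ends once every path in the set has been matched, so the directory must contain nothing but .txt files that follow the NN-N.txt pattern.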
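If the folder may contain stray files, or the group numbers do not start at 00, a variant that derives the groups from the names actually present avoids the open-ended counter. This is only a sketch under the assumption that names look like `<digits>-<digits>.txt`. The function name fill_missing is made up, and the demo uses groups of 3 instead of 9 to keep the output short.

```python
import os
import re


def fill_missing(paths, group_size=9):
    """List every expected 'GG-N' name per group, or 'is missing' for gaps."""
    present = set()
    groups = set()
    for path in paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        match = re.fullmatch(r"(\d+)-(\d+)", stem)
        if match:  # ignore files that do not follow the naming scheme
            present.add(stem)
            groups.add(match.group(1))
    results = []
    # Lexical sorting is fine as long as the group numbers are zero-padded.
    for group in sorted(groups):
        for i in range(group_size):
            name = f"{group}-{i}"
            results.append(name if name in present else "is missing")
    return results


paths = ["/path/to/dir/00-0.txt", "/path/to/dir/00-2.txt", "/path/to/dir/01-1.txt"]
print(fill_missing(paths, group_size=3))
```

One limitation of this approach: a group in which every file is missing leaves no trace in the folder, so it is not reported at all.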

How to add same number to same string in python?

I want to give the same number to the same string name and save it to a text file.
For example, if several file names contain the string "Ball", I will give them the number 0. Likewise, if several file names contain the string "Square", I will give them the number 1, and so on.
I tried using os.walk and splitting the text, but I still have no idea how to add the number and save it to a text file.
with open("check.txt", "w") as a:
    for path, subdirs, files in os.walk(path):
        for i, filename in enumerate(files):
            # the filename uses underscores as separators,
            # for example Ball_red_move
            mylist = filename.split("_")
            # I tried to take only the first string after splitting,
            # here for example "Ball"
            k = mylist[0]
            # After this I have no idea how to add the number when the string
            # name is the same, and also save it to the txt file with the directory name
This is my expected result:
Check/Ball_red_move_01 0
Check/Ball_red_move_02 0
Check/Ball_red_move_03 0
Check/Square_jump_forward_01 1
Check/Square_jump_forward_02 1
Check/Square_jump_forward_03 1

You might like to do something like this:
Prepare a dictionary that maps each string to a label number, then check whether the string is present in the name.
object_map = {'Ball': 0, 'Square': 1}

def get_num_from_string(x):
    for i in object_map:
        if i in x:
            return object_map[i]

A = ['Check/Ball_red_move_01', 'Check/Square_jump_forward_01']
for i in A:
    print(i + ' ' + str(get_num_from_string(i)))
This produces
Check/Ball_red_move_01 0
Check/Square_jump_forward_01 1
A few things for you to consider: what do you want to do if none of the strings appears, and what do you want to do if multiple strings appear? As written, the function returns None in the first case and the number of the first matching key in the second.
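If the set of names is not known in advance, the numbers can be handed out in order of first appearance instead of being listed by hand. The sketch below assumes the label is the first underscore-separated word of the file name, as in the question. The function name label_names is illustrative.

```python
import os


def label_names(names):
    """Number each name by its first underscore-separated word, in order of first appearance."""
    labels = {}
    lines = []
    for name in names:
        key = os.path.basename(name).split("_")[0]
        # A new key gets the next free number; a known key keeps its number.
        number = labels.setdefault(key, len(labels))
        lines.append(f"{name} {number}")
    return lines


names = [
    "Check/Ball_red_move_01",
    "Check/Ball_red_move_02",
    "Check/Square_jump_forward_01",
    "Check/Square_jump_forward_02",
]
lines = label_names(names)
print("\n".join(lines))
```

To save the result, the joined lines can be written out with an ordinary open("check.txt", "w") block.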

Script for separating series of photos

I need to visually separate photos (JPEGs) in a folder by placing black placeholder pictures between series with identical file names (only the last two digits of the file names differ). The folder typically contains single (stand-alone) photos, named something like 03-12345-randomfilename.jpg, and series named 03-12345-file01.jpg, 03-12345-file02.jpg, ..03, ..04, etc.
The singles should be left alone, but I need to place a black picture before and after all series.
I have the following Python script (originally written by someone else) that is intermittently failing for no apparent reason. It usually works, but sometimes it will overwrite files in the middle of a series, or more typically, it will fail to place a black picture after the last photo in a series. I've spent hours trying to figure out what's going on, but I'm stuck.
Any suggestions most appreciated.
def blackJPG(directory):
    # iterate over every file name in the directory
    blackJPG = '/Users/username/black.jpg'
    filelist = {}
    for file_name in os.listdir(directory):
        filename, file_extension = os.path.splitext(file_name)
        stringmatch = re.compile(r'(\d{2})(.*?)(\d+)(.*?)(([A-Za-z]+))(.*?)(\d+)')
        m = stringmatch.search(file_name)
        #Create search table
        if m:
            sequence = int(m.group(8))
            filename_without_sequence = "{0}{1}{2}{3}{4}{5}".format(m.group(1),m.group(2),m.group(3),m.group(4),m.group(6),m.group(7))
            filelist.update({filename_without_sequence: (sequence)})
    for key, value in filelist.iteritems():
        if value > 1:
            newJPG = "{0}/{1}00.jpg".format(directory, key)
            if value >= 10:
                lastJPG = "{0}/{1}{2}.jpg".format(directory, key, value+1)
            else:
                lastJPG = "{0}/{1}0{2}.jpg".format(directory, key, value+1)
            #Create first blackJPG
            shutil.copyfile(blackJPG, newJPG)
            #Create last blackJPG
            shutil.copyfile(blackJPG, lastJPG)
    return "Done"

If the variation is always the last 2 characters, then you can grab the part that doesn't change (the prefix), count the files per prefix, and create a placeholder for each prefix that has more than one file:
import os
import shutil

def add_black_jpg(directory, black_jpg_location):
    series_count = {}
    for file in os.listdir(directory):
        name, ext = os.path.splitext(file)
        prefix = name[:-2]
        series_count[prefix] = series_count.get(prefix, 0) + 1
    for prefix, count in series_count.items():
        if count > 1:
            # Write the leading placeholder into the same directory as the series.
            shutil.copyfile(black_jpg_location, os.path.join(directory, f"{prefix}00.jpg"))
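One possible cause of the intermittent failures in the question's script is that filelist keeps whichever sequence number os.listdir happened to return last for a series, not necessarily the highest one, because os.listdir does not guarantee any order. The sketch below tracks the highest number per series and computes both placeholder names, the leading 00 and the one after the last photo. It only returns the names and leaves the copying to the caller. The function name placeholder_names is made up, and it assumes series names end in exactly two digits before .jpg.

```python
import os
import re
from collections import defaultdict


def placeholder_names(file_names):
    """Return the leading and trailing placeholder names for every series."""
    series = defaultdict(list)
    for file_name in file_names:
        stem, ext = os.path.splitext(file_name)
        # A series member ends in exactly two digits preceded by a non-digit.
        match = re.fullmatch(r"(.*\D)(\d{2})", stem)
        if match and ext.lower() == ".jpg":
            series[match.group(1)].append(int(match.group(2)))
    names = []
    for prefix, numbers in sorted(series.items()):
        if len(numbers) > 1:  # singles are left alone
            names.append(f"{prefix}00.jpg")
            names.append(f"{prefix}{max(numbers) + 1:02d}.jpg")
    return names


files = ["03-12345-file02.jpg", "03-12345-randomfilename.jpg", "03-12345-file01.jpg"]
print(placeholder_names(files))
```

Each returned name can then be joined with the directory and passed to shutil.copyfile together with the black picture. Using max() makes the result independent of the order in which the files are listed.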

I have a list of a part of a filename, for each one I want to go through the files in a directory that matches that part and return the filename

So, let's say I have a directory with a bunch of filenames.
for example:
Scalar Product or Dot Product (Hindi)-fodZTqRhC24.m4a
AP Physics C - Dot Product-Wvhn_lVPiw0.m4a
An Introduction to the Dot Product-X5DifJW0zek.m4a
Now let's say I have a list, of only the keys, which are at the end of the file names:
['fodZTqRhC24', 'Wvhn_lVPiw0', 'X5DifJW0zek']
How can I iterate through my list to go into that directory and search for a file name containing that key, and then return me the filename?
Any help is greatly appreciated!

I thought about it, I think I was making it harder than I had to with regex. Sorry about not trying it first. I have done it this way:
audio = ['Scalar Product or Dot Product (Hindi)-fodZTqRhC24.m4a',
         'An Introduction to the Dot Product-X5DifJW0zek.m4a',
         'AP Physics C - Dot Product-Wvhn_lVPiw0.m4a']
keys = ['fodZTqRhC24', 'Wvhn_lVPiw0', 'X5DifJW0zek']
file_names = []
for Id in keys:
    for name in audio:
        if Id in name:
            file_names.append(name)
# zip returns an iterator in Python 3, so build a list to inspect the pairs
combined = list(zip(keys, file_names))
print(combined)
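Pairing keys and names with zip only lines up when every key matches exactly one file. A hedged alternative is to build a dictionary from key to matching names, which stays correct when a key has no file or several. The sketch assumes the key sits at the very end of the name, just before the extension, as in the question. The function name files_by_key and the key 'nokeyhere' are made up.

```python
import os


def files_by_key(keys, file_names):
    """Map each key to the file names whose stem ends with that key."""
    result = {key: [] for key in keys}
    for name in file_names:
        stem = os.path.splitext(name)[0]
        for key in keys:
            if stem.endswith(key):
                result[key].append(name)
    return result


audio = [
    "Scalar Product or Dot Product (Hindi)-fodZTqRhC24.m4a",
    "An Introduction to the Dot Product-X5DifJW0zek.m4a",
    "AP Physics C - Dot Product-Wvhn_lVPiw0.m4a",
]
# 'nokeyhere' is a hypothetical key with no matching file.
matches = files_by_key(["fodZTqRhC24", "Wvhn_lVPiw0", "nokeyhere"], audio)
print(matches)
```

For a real folder, the audio list would come from os.listdir on that directory.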

Here is an example:
ls: list of files in a given directory
names: list of strings to search for
import os
ls = os.listdir("/any/folder")
names = ['Py', 'sql']
for file in ls:
    for name in names:
        if name in file:
            print(file)
Results:
.PyCharm50
.mysql_history
zabbix2.sql
.mysql
PycharmProjects
zabbix.sql

Assuming you know which directory that you will be looking in, you could try something like this:
import os
to_find = ['word 1', 'word 2']  # list containing words that you are searching for
all_files = os.listdir('/path/to/file')  # creates list with files from given directory
for file in all_files:  # loops through all files in directory
    for word in to_find:  # loops through target words
        if word in file:
            print(file)  # prints file name if the target word is found
I tested this in my directory which contained these files:
Helper_File.py
forms.py
runserver.py
static
app.py
templates
I set to_find to ['runserver', 'static'], and when I ran this code it returned:
runserver.py
static
For future reference, you should make at least some sort of attempt at solving a problem prior to posting a question on Stackoverflow. It's not common for people to assist you like this if you can't provide proof of an attempt.

Here's a way to do it that lets you select whether to match based on the placement of the text.
import os

def scan(dir, match_key, bias=2):
    '''
    bias selects where match_key must appear in the file name:
    :0 startswith
    :1 contains
    :2 endswith
    '''
    matches = []
    if not isinstance(match_key, (tuple, list)):
        match_key = [match_key]
    if os.path.exists(dir):
        for file in os.listdir(dir):
            for match in match_key:
                if (file.startswith(match) and bias == 0) or (file.endswith(match) and bias == 2) or (match in file and bias == 1):
                    matches.append(file)
                    break  # one hit is enough; avoids listing a file twice
    return matches

print(scan(os.curdir, '.py'))
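The same start/contains/end choice can also be expressed with shell-style wildcards from the standard fnmatch module, which some may find easier to read than a numeric bias. This is only a sketch: filter_names and the mode strings are hypothetical names, and keys containing *, ? or [ would be interpreted as wildcards rather than literal text.

```python
import fnmatch

# One wildcard template per matching mode.
PATTERNS = {"startswith": "{}*", "contains": "*{}*", "endswith": "*{}"}


def filter_names(names, keys, mode="endswith"):
    """Return the names that match any key at the position selected by mode."""
    if isinstance(keys, str):
        keys = [keys]
    patterns = [PATTERNS[mode].format(key) for key in keys]
    return [
        name
        for name in names
        # fnmatchcase compares case-sensitively on every platform.
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    ]


names = ["Helper_File.py", "forms.py", "runserver.py", "static", "app.py", "templates"]
print(filter_names(names, ".py"))
print(filter_names(names, ["run", "st"], mode="startswith"))
```

Passing the result of os.listdir as names gives the same kind of directory scan as the function above.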
