How to add same number to same string in python? - python

I want to assign the same number to every occurrence of the same string name and save the result to a text file.
For example, if several filenames start with the string "Ball", each of them gets the number 0. If several filenames start with "Square", each of them gets the number 1, and so on.
I tried using os.walk and splitting the filename, but I still have no idea how to add the number and save it to a text file:
import os
with open("check.txt", "w") as a:
    for path, subdirs, files in os.walk(path):
        for i, filename in enumerate(files):
            # the filenames use underscores in place of spaces,
            # for example Ball_red_move
            mylist = filename.split("_")
            # take only the first string name after splitting,
            # for example "Ball"
            k = mylist[0]
            # After this I have no idea how to add a number when the string
            # name is the same, and how to save it to the text file together
            # with the directory name
This is my expected result:
Check/Ball_red_move_01 0
Check/Ball_red_move_02 0
Check/Ball_red_move_03 0
Check/Square_jump_forward_01 1
Check/Square_jump_forward_02 1
Check/Square_jump_forward_03 1

You might like to do something like this:
Prepare a dictionary that maps each string to a label number, then check which string is present in the name.
object_map = {'Ball': 0, 'Square': 1}
def get_num_from_string(x):
    for i in object_map:
        if i in x:
            return object_map[i]
A = ['Check/Ball_red_move_01', 'Check/Square_jump_forward_01']
for i in A:
    print(i + ' ' + str(get_num_from_string(i)))
This produces
Check/Ball_red_move_01 0
Check/Square_jump_forward_01 1
A few things for you to consider: what do you want to do if none of the strings appears, and what do you want to do if multiple strings appear?
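If the labels should not be hard-coded, one possible variation is to hand out a number the first time each prefix is seen. The sketch below is not part of the answer above. It assumes the label is the text before the first underscore of the file name and that names are processed in sorted order; the helper name assign_labels is made up for illustration.

```python
import os

def assign_labels(paths):
    """Return (path_without_extension, label) pairs.

    Assumption: the label key is the part of the file name before the first
    underscore; numbers are handed out in order of first appearance.
    """
    labels = {}  # prefix -> number
    result = []
    for path in sorted(paths):
        stem = os.path.splitext(path)[0]
        prefix = os.path.basename(stem).split("_")[0]
        # setdefault keeps an existing number, or stores the next free one
        number = labels.setdefault(prefix, len(labels))
        result.append((stem, number))
    return result

names = ["Check/Square_jump_forward_01.png", "Check/Ball_red_move_01.png",
         "Check/Ball_red_move_02.png"]
for stem, number in assign_labels(names):
    print(stem, number)
```

Each pair could then be written to check.txt as one line, for example with a.write(f"{stem} {number}\n") inside the with block from the question.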

Related

Save part of a file name in separate pandas df columns

I would like to save different positions of a file name in different pandas df columns.
For example my file names look like this:
001015io.png
position 0-2 in column 'y position' in this case '001'
position 3-5 in column 'x position' in this case '015'
position 6-7 in column 'status' in this case 'io'
My folder contains about 400 of these picture files. I'm a beginner in programming, so I don't know how I should start to solve this.
If the parts of the file names that you need are consistent (same position and length in all files), you can use string slicing to create new columns from the pieces of the file name like this:
import pandas as pd
df = pd.DataFrame({'file_name': ['001015io.png']})
df['y position'] = df['file_name'].str[0:3]
df['x position'] = df['file_name'].str[3:6]
df['status'] = df['file_name'].str[6:8]
This results in the dataframe:
      file_name y position x position status
0  001015io.png        001        015     io
Note that when you slice a string you give a start position and a stop position like [0:3]. The start position is inclusive, but the stop position is not, so [0:3] gives you the substring from 0-2.
You can do this with slicing. A string is basically a sequence of characters, so you can slice this string into the parts you need. See the example below.
filename = '001015io.png'
y = filename[0:3]
x = filename[3:6]
status = filename[6:8]
print(y, x, status)
output
001 015 io
As for getting the list of files, there's an absurdly complete answer for that here.
I have this function below in my personal library which I reuse whenever I need to generate a list of files.
import os
def get_files_from_path(path: str = ".", ext=None) -> list:
    """Find files in path and return them as a list.
    Gets all files in folders and subfolders
    See the answer on the link below for a ridiculously
    complete answer for this. I tend to use this one.
    note that it also goes into subdirs of the path
    https://stackoverflow.com/a/41447012/9267296
    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension.
            Defaults to None.
    Returns:
        list: list of full file paths
    """
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = f"{subdir}{os.sep}{fname}"
            if ext is None:
                result.append(filepath)
            elif isinstance(ext, str) and fname.lower().endswith(ext.lower()):
                result.append(filepath)
            elif isinstance(ext, list):
                # append only once, even if several extensions match
                if any(fname.lower().endswith(item.lower()) for item in ext):
                    result.append(filepath)
    return result
There's one thing you need to take into account here: this function returns the full file path, for example path/to/file/001015io.png
You can use the code below to get just the filename:
import os
print(os.path.basename('path/to/file/001015io.png'))
output
001015io.png
Use what Bill the Lizard said to turn it into a df.
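To connect the two answers, the slicing can be applied to each base name before building the dataframe. This is only a sketch under the assumption that every file follows the fixed layout from the question; the helper name parse_name and the second path in the sample list are hypothetical.

```python
import os

def parse_name(path):
    """Split a name like '001015io.png' into its fixed-width fields.

    Assumption: every file follows the layout from the question
    (3 characters y, 3 characters x, 2 characters status).
    """
    name = os.path.basename(path)
    return {
        "file_name": name,
        "y position": name[0:3],
        "x position": name[3:6],
        "status": name[6:8],
    }

# the second path is a made-up example
paths = ["path/to/file/001015io.png", "path/to/file/002030ok.png"]
records = [parse_name(p) for p in paths]
print(records[0])
```

A list of dictionaries like records can be passed straight to pd.DataFrame(records), which gives one row per file and one column per key.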

Comparing incrementing filenames in Python and check which one is missing

I'm iterating through a folder with files and adding each of the file's path to a list. The folder contains files with incrementing file names, such as 00-0.txt, 00-1.txt, 00-2.txt, 01-0.txt, 01-1.txt, 01-2.txt and so on.
The number of files is not fixed and always varies. Also, sometimes a file could be missing. This means that I will sometimes get this list instead:
00-0.txt, 00-1.txt, 01-0.txt, 01-1.txt, 01-2.txt.
However, in my final list, I should always have groups of 9 (so 00-0, 00-1, 00-2 and so on until 00-8 is one group). If a file is missing, I will append the string 'is missing' to the new list instead.
What I was thinking to do is the following:
Get the last character of the filename (for example '3')
Check if its value is the same as the previous index + 1.
If it's not, then append the 'is missing' string
If it's the same, then append the file name
In pseudo-code (please don't mind the syntax errors, I'm mainly looking for high level advice), it would be something like this:
import glob
import os
empty_list = []
list_with_paths = glob.glob("/path/to/dir/*.txt")
for index, item in enumerate(list_with_paths):
    basename = os.path.basename(item)
    filename = os.path.splitext(basename)[0]
    if index == 0 and int(filename[-1]) != 0:
        empty_list.append('is missing')
    elif filename[-1] != empty_list[index - 1] + 1:
        empty_list.append('is missing')
    else:
        empty_list.append(filename)
I'm sure there is a more optimal solution in order to achieve this.
Once you have the set of actual paths, just iterate over the expected paths until you have accounted for all of the actual paths.
import glob
from itertools import count
directory = "/path/to/dir/"
list_with_paths = set(glob.glob(directory + "*.txt"))
groups = count()
results = []
for g in groups:
    if not list_with_paths:
        break
    for i in range(0, 9):
        expected = "{:02}-{}.txt".format(g, i)
        if directory + expected in list_with_paths:
            # remove the full path, since that is what the set stores
            list_with_paths.remove(directory + expected)
        else:
            expected = "is missing"
        results.append(expected)
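One caveat with looping until the set is empty: a file whose name does not follow the expected pattern is never removed, so the loop would not terminate. A possible variation, sketched here on bare file names rather than full paths, is to parse the names first and stop at the highest group found. The function name fill_missing and the decision to ignore non-matching names are assumptions, not part of the answer above.

```python
import re

def fill_missing(names, group_size=9):
    """Return the expected names in order, with 'is missing' for gaps.

    Assumption: names look like '00-0.txt' (group number, dash, index).
    Names that do not match this layout are ignored.
    """
    present = set()
    for name in names:
        m = re.fullmatch(r"(\d+)-(\d+)\.txt", name)
        if m:
            present.add((int(m.group(1)), int(m.group(2))))
    if not present:
        return []
    last_group = max(g for g, _ in present)
    results = []
    for g in range(last_group + 1):
        for i in range(group_size):
            if (g, i) in present:
                results.append("{:02}-{}.txt".format(g, i))
            else:
                results.append("is missing")
    return results

# group_size=3 only to keep the printed list short
print(fill_missing(["00-0.txt", "00-1.txt", "01-0.txt"], group_size=3))
```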

How to get the number of occurrences of an expression in a file using Python

I have code that reads files, finds the expressions that match the user input using the findall function of the re module, and highlights them.
I am also trying to save several pieces of information about the match in a JSON file, such as:
file name
matching expression
number of occurrences
The problem is that the program reads the file and displays the text with the highlighted expression, but in the JSON file it saves the number of lines as the number of occurrences.
In this example the searched word is "this", which occurs twice in the text file.
The result in the JSON file is 12, which is the number of text lines.
result of the json file and the highlighted text
code:
def MatchFunc(self):
    self.textEdit_PDFpreview.clear()
    x = self.lineEditSearch.text()
    TextString=self.ReadingFileContent(self.FileListSelected())
    d = defaultdict(list)
    filename = os.path.basename(self.FileListSelected())
    RepX='<u><b style="color:#FF0000">'+x+'</b></u>'
    for counter , myLine in enumerate(filename):
        self.textEdit_PDFpreview.clear()
        thematch=re.sub(x,RepX,TextString)
        thematchFilt=re.findall(x,TextString,re.M|re.I)
        if thematchFilt:
            d[thematchFilt[0]].append(counter + 1)
            self.textEdit_PDFpreview.insertHtml(str(thematch))
        else:
            self.textEdit_PDFpreview.insertHtml('No Match Found')
    OutPutListMetaData = []
    for match , positions in d.items():
        print ("this is match {}".format(match))
        print("this is position {}".format(positions))
        listMetaData = {"File Name":filename,"Searched Word":match,"Number Of Occurence":len(positions)}
        OutPutListMetaData.append(listMetaData)
        for p in positions:
            print("on line {}".format(p))
    jsondata = json.dumps(OutPutListMetaData,indent=4)
    print(jsondata)
    folderToCreate = "search_result"
    today = time.strftime("%Y%m%d__%H-%M")
    jsonFileName = "{}_searchResult.json".format(today)
    if not(os.path.exists(os.getcwd() + os.sep + folderToCreate)):
        os.mkdir("./search_result")
    fpJ = os.path.join(os.getcwd()+os.sep+folderToCreate,jsonFileName)
    print(fpJ)
    with open(fpJ,"a") as jsf:
        jsf.write(jsondata)
    print("finish writing")
It's straightforward using collections.Counter. Once you pass it an iterable, it maps each distinct item to its number of occurrences; Counter.items() returns them as (item, count) tuples.
As the re.findall function returns a list, you can just use len(result).
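Both suggestions can be combined in a few lines. The sketch below assumes the user input should be treated as a literal word rather than as a regular expression, which is why it goes through re.escape; the function name count_word and the sample text are made up.

```python
import re
from collections import Counter

def count_word(word, text):
    """Count case-insensitive occurrences of a literal word in text."""
    # re.escape treats the user input as plain text, not as a regex
    matches = re.findall(re.escape(word), text, re.M | re.I)
    # Counter groups the matches by the exact spelling that was found
    return len(matches), Counter(matches)

text = "This is line one.\nAnd this is line two."
total, by_spelling = count_word("this", text)
print(total)  # 2
print(by_spelling)
```

The first return value is the figure that belongs under "Number Of Occurence" in the JSON output. It is computed once for the whole text, so it does not depend on any line counter.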

How to organize file names that were named as only numbers [duplicate]

Let's say I have three files in a folder: file9.txt, file10.txt and file11.txt, and I want to read them in this particular order. Can anyone help me with this?
Right now I am using the code
import glob, os
for infile in glob.glob(os.path.join( '*.txt')):
    print("Current File Being Processed is: " + infile)
and it reads first file10.txt, then file11.txt, and then file9.txt.
How can I get the right order?
Files on the filesystem are not sorted. You can sort the resulting filenames yourself using the sorted() function:
for infile in sorted(glob.glob('*.txt')):
    print("Current File Being Processed is: " + infile)
Note that the os.path.join call in your code is a no-op; with only one argument it doesn't do anything but return that argument unaltered.
Note that your files will sort in alphabetical ordering, which puts 10 before 9. You can use a custom key function to improve the sorting:
import glob
import re
numbers = re.compile(r'(\d+)')
def numericalSort(value):
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts
for infile in sorted(glob.glob('*.txt'), key=numericalSort):
    print("Current File Being Processed is: " + infile)
The numericalSort function splits out any digits in a filename, turns it into an actual number, and returns the result for sorting:
>>> files = ['file9.txt', 'file10.txt', 'file11.txt', '32foo9.txt', '32foo10.txt']
>>> sorted(files)
['32foo10.txt', '32foo9.txt', 'file10.txt', 'file11.txt', 'file9.txt']
>>> sorted(files, key=numericalSort)
['32foo9.txt', '32foo10.txt', 'file9.txt', 'file10.txt', 'file11.txt']
You can wrap your glob.glob( ... ) expression inside a sorted( ... ) statement and sort the resulting list of files. Example:
for infile in sorted(glob.glob('*.txt')):
You can give sorted a comparison function (Python 2 only; in Python 3 it has to be wrapped with functools.cmp_to_key) or, better, use the key= ... argument to give it a custom key that is used for sorting.
Example:
There are the following files:
x/blub01.txt
x/blub02.txt
x/blub10.txt
x/blub03.txt
y/blub05.txt
The following code will produce the following output:
for filename in sorted(glob.glob('[xy]/*.txt')):
    print(filename)
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# x/blub10.txt
# y/blub05.txt
Now with key function:
def key_func(x):
    return os.path.split(x)[-1]
for filename in sorted(glob.glob('[xy]/*.txt'), key=key_func):
    print(filename)
# x/blub01.txt
# x/blub02.txt
# x/blub03.txt
# y/blub05.txt
# x/blub10.txt
EDIT:
Possibly this key function can sort your files:
pat=re.compile(r"(\d+)\D*$")
...
def key_func(x):
    mat=pat.search(os.path.split(x)[-1]) # match last group of digits
    if mat is None:
        return x
    return "{:>10}".format(mat.group(1)) # right align to 10 digits.
It sure can be improved, but I think you get the point. Paths without numbers will be left alone, paths with numbers will be converted to a string that is 10 digits wide and contains the number.
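One way this could be improved is to compare real integers instead of padded strings, so numbers longer than 10 digits still sort correctly. The sketch below is a variation on the key function above, not the original author's code. It wraps the result in a tuple so that names with and without digits can still be compared with each other; the name numeric_key and the file x/notes.txt are hypothetical.

```python
import os
import re

_LAST_DIGITS = re.compile(r"(\d+)\D*$")

def numeric_key(path):
    """Sort key: the last run of digits in the file name, as an integer."""
    name = os.path.basename(path)
    match = _LAST_DIGITS.search(name)
    if match is None:
        # no digits: sort after all numbered names, then alphabetically
        return (1, 0, name)
    return (0, int(match.group(1)), name)

files = ["x/blub10.txt", "x/blub02.txt", "y/blub05.txt", "x/notes.txt"]
print(sorted(files, key=numeric_key))
```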
You need to change the sort from 'ASCIIbetical' to numeric by isolating the number in the filename. You can do that like so:
import re
def keyFunc(afilename):
    nondigits = re.compile(r"\D")
    return int(nondigits.sub("", afilename))
filenames = ["file10.txt", "file11.txt", "file9.txt"]
for x in sorted(filenames, key=keyFunc):
    print(x)
Here you can set filenames to the result of glob.glob("*.txt").
Additionally, the keyFunc function assumes that the filename has a number in it and that the only digits are in the filename itself. You can change that function to be as complex as you need to isolate the number you want to sort on.
glob.glob(os.path.join( '*.txt'))
returns a list of strings, so you can easily sort the list using Python's sorted() function.
sorted(glob.glob(os.path.join( '*.txt')))
for fname in ['file9.txt','file10.txt','file11.txt']:
    with open(fname) as f: # default open mode is for reading
        for line in f:
            pass # do something with line

What's the best way of determining if an image is part of a sequence

I have an image file and I'd like to check if it's part of an image sequence using Python.
For example, I start with this file:
/projects/image_0001.jpg
and I want to check if the file is part of a sequence, i.e.
/projects/image_0001.jpg
/projects/image_0002.jpg
/projects/image_0003.jpg
...
Checking whether there is a sequence of images seems simple if I can determine whether the file name could be part of a sequence, i.e. whether there is a sequence of digits in the file name.
My first thought was to ask the user to add #### to the file path where the numbers should be and to input a start and end frame number to replace the hashes with, but this is obviously not very user friendly. Is there a way to check for a sequence of numbers in a string with regular expressions or something similar?
It's relatively easy to use Python's re module to see if a string contains a sequence of digits. You could do something like this:
mo = re.findall(r'\d+', filename)
This will return a list of all digit sequences in filename. If:
There is a single result (that is, the filename contains only a single sequence of digits), AND
A subsequent filename has a single digit sequence of the same length, AND
The second digit sequence is 1 greater than the previous
...then maybe they're part of a sequence.
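The three conditions above can be sketched as a small function. The check that the text around the digits is identical is an extra assumption added here so that unrelated names are not paired; the function name is_next_in_sequence is made up.

```python
import re

def is_next_in_sequence(first, second):
    """Return True if `second` looks like the frame right after `first`."""
    a = re.findall(r"\d+", first)
    b = re.findall(r"\d+", second)
    # each name must contain exactly one run of digits
    if len(a) != 1 or len(b) != 1:
        return False
    # same padding width, e.g. 0001 and 0002
    if len(a[0]) != len(b[0]):
        return False
    # extra assumption: the text around the digits must be identical
    if re.sub(r"\d+", "#", first) != re.sub(r"\d+", "#", second):
        return False
    return int(b[0]) == int(a[0]) + 1

print(is_next_in_sequence("/projects/image_0001.jpg", "/projects/image_0002.jpg"))
```

Note that the single-run condition applies to whatever string is passed in, so a directory name containing digits would make the check fail; passing os.path.basename(path) instead avoids that.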
I'm assuming the problem is more about being able to differentiate between sequenced files on disk than about knowing any particular information about the filenames themselves.
If that's the case, and what you're looking for is something that is smart enough to take a list like:
/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
...
/path/to/file_10.png
/path/to/image_1.png
/path/to/image_2.png
...
/path/to/image_10.png
And get back a result saying "I have 2 sequences of files: /path/to/file_#.png and /path/to/image_#.png", you are going to need 2 passes: a first pass to determine valid expressions for files, and a second pass to figure out which other files meet that requirement.
You'll also need to know whether you're going to support gaps (that is, whether the numbering is required to be sequential):
/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
/path/to/file_5.png
/path/to/file_6.png
/path/to/file_7.png
Is this 1 sequence (/path/to/file_#.png) or 2 sequences (/path/to/file_1-3.png, /path/to/file_5-7.png)?
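If gaps should split a sequence, the frame numbers can be grouped into consecutive runs once they have been extracted. This is a minimal sketch that works on plain integers; the function name consecutive_runs is made up, and extracting the numbers from the file names is left out.

```python
def consecutive_runs(numbers):
    """Group integers into runs of consecutive values."""
    runs = []
    for n in sorted(set(numbers)):
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)   # extends the current run
        else:
            runs.append([n])     # a gap starts a new run
    return runs

print(consecutive_runs([1, 2, 3, 5, 6, 7]))  # [[1, 2, 3], [5, 6, 7]]
```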
Also - how do you want to handle numeric files in sequences?
/path/to/file2_1.png
/path/to/file2_2.png
/path/to/file2_3.png
etc.
With that in mind, this is how I would accomplish it:
import functools
import os.path
import projex.sorting
import re
def find_sequences( filenames ):
    """
    Parse a list of filenames into a dictionary of sequences. Filenames not
    part of a sequence are returned in the None key
    :param filenames | [<str>, ..]
    :return {<str> sequence: [<str> filename, ..], ..}
    """
    local_filenames = filenames[:]
    sequence_patterns = {}
    sequences = {None: []}
    # sort the files (by natural order) so we always generate a pattern
    # based on the first potential file in a sequence
    # (projex.sorting.natural is passed as a comparison function, as in the
    # original Python 2 call, so it is wrapped with cmp_to_key for Python 3)
    local_filenames.sort(key=functools.cmp_to_key(projex.sorting.natural))
    # create the expression to determine if a sequence is possible
    # we are going to assume that its always going to be the
    # last set of digits that makes a sequence, i.e.
    #
    # test2_1.png
    # test2_2.png
    #
    # test2 will be treated as part of the name
    #
    # test1.png
    # test2.png
    #
    # whereas here the 1 and 2 are part of the sequence
    #
    # more advanced expressions would be needed to support
    #
    # test_01_2.png
    # test_02_2.png
    # test_03_2.png
    #
    # the prefix group is non-greedy so that the whole last run of digits
    # (e.g. the 0001 in image_0001.jpg) ends up in the number group
    pattern_expr = re.compile(r'^(.*?)(\d+)(\D*)$')
    # process the inputted files for sequences (in sorted order)
    for filename in local_filenames:
        # patterns are built from base names, so match against the base name
        basename = os.path.basename(filename)
        # first, check to see if this filename matches a sequence
        found = False
        for key, pattern in sequence_patterns.items():
            match = pattern.match(basename)
            if not match:
                continue
            sequences[key].append(filename)
            found = True
            break
        # if we've already been matched, then continue on
        if found:
            continue
        # next, see if this filename should start a new sequence
        pattern_match = pattern_expr.match(basename)
        if pattern_match:
            opts = (pattern_match.group(1), pattern_match.group(3))
            key = '%s#%s' % opts
            # create a new pattern based on the filename, escaping the
            # literal parts so that '.' only matches a dot
            escaped = (re.escape(opts[0]), re.escape(opts[1]))
            sequence_pattern = re.compile(r'^%s\d+%s$' % escaped)
            sequence_patterns[key] = sequence_pattern
            sequences[key] = [filename]
            continue
        # otherwise, add it to the list of non-sequences
        sequences[None].append(filename)
    # now that we have grouped everything, we'll merge back filenames
    # that were potential sequences, but only contain a single file to the
    # non-sequential list (iterate over a copy, since entries are removed)
    for key, filenames in list(sequences.items()):
        if key is None or len(filenames) > 1:
            continue
        sequences.pop(key)
        sequences[None] += filenames
    return sequences
And an example usage:
>>> test = ['test1.png','test2.png','test3.png','test4.png','test2_1.png','test2_2.png','test2_3.png','test2_4.png']
>>> results = find_sequences(test)
>>> list(results.keys())
[None, 'test#.png', 'test2_#.png']
There is a method in there that refers to natural sorting, which is a separate topic. I just used the natural sort method from my projex library. It is open source, so if you want to use or see it, it's here: http://dev.projexsoftware.com/projects/projex
That topic has been covered elsewhere on the forums, so I just used the method from the library.
