New to python struggling with append - python

Hey, I've looked around but can't seem to find an answer. I want to count the files in a list and print their names, but I keep running into an error. I am new to Python, so I am quite sure I got something wrong, and I apologize if this is a stupid question. Below is the code I have so far:
import os
folderpath = "C:\Users\Michaelf\Desktop\GEOG M173\LabData"
filelist = os.listdir(folderpath)
print filelist
Counter_Shapefiles = 0
Names_of_Shapefiles = 0
for the_file_name in filelist:
    File_Extension = the_file_name[-4:]
    if "file_Extension == .shp":
        Counter_Shapefiles= Counter_Shapefiles + 1
        Names_of_Shapefiles.append

To use append you need a list, not an int, so
Names_of_Shapefiles = 0
should be
Names_of_Shapefiles = []
Second, append takes the item to add as its argument: Names_of_Shapefiles.append(the_file_name)

Names_of_Shapefiles is an int. Change it to a list, and pass the value you want to add as the argument to append.
Also, when posting a question, include the exact error message you get.

import os
# Raw string so the backslashes in the Windows path are taken literally
folderpath = r"C:\Users\Michaelf\Desktop\GEOG M173\LabData"
filelist = os.listdir(folderpath)
print(filelist)
Counter_Shapefiles = 0
Names_of_Shapefiles = []  # a list, so that .append() works
for the_file_name in filelist:
    File_Extension = the_file_name[-4:]  # last four characters, e.g. ".shp"
    if File_Extension == ".shp":
        Counter_Shapefiles = Counter_Shapefiles + 1
        Names_of_Shapefiles.append(the_file_name)
Have a look at the changes that I've made to your code.
For if statements, the condition must not be in quotation marks, because that turns it into a string, and a non-empty string always counts as true. You can wrap the condition in parentheses if you want to set it off visually, but that isn't necessary.
In that same if statement you typed file_Extension with a lowercase f, which is not the same name as File_Extension. Python names are case-sensitive, so once the quotation marks are removed this would raise a NameError.
The ".shp" string does need to be in quotation marks, to make it clear that it's a string.
When defining Names_of_Shapefiles, use square brackets to make it a list; assigning 0 makes it an integer, which has no append method.
.append is a method that takes an argument; how else would your program know what to append to the Names_of_Shapefiles list? That is why the item you want to append goes inside the parentheses.
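For comparison, here is one possible way to write the same count with os.path.splitext instead of slicing the last four characters. It is only a sketch: the function name find_shapefiles and the demo file names are made up for illustration, and it assumes that matching the extension case-insensitively is what you want.

```python
import os
import tempfile


def find_shapefiles(folderpath):
    """Return a sorted list of the names in folderpath that end in .shp."""
    names = []
    for name in os.listdir(folderpath):
        # splitext splits "roads.shp" into ("roads", ".shp")
        extension = os.path.splitext(name)[1]
        if extension.lower() == ".shp":
            names.append(name)
    return sorted(names)


# Demo on a throwaway folder with made-up file names.
demo_dir = tempfile.mkdtemp()
for demo_name in ["roads.shp", "roads.dbf", "parcels.SHP", "notes.txt"]:
    open(os.path.join(demo_dir, demo_name), "w").close()

shapefiles = find_shapefiles(demo_dir)
print(len(shapefiles))
print(shapefiles)
```

Because the names end up in a list, len() gives the count and no separate counter variable is needed.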

Related

How do I use a list or set as keys in file renaming

Is something like this possible? I'd like to use a dictionary or set as the key for my file renamer. I have a lot of keywords that I'd like to filter out of the file names, but the only way I've found so far is to search for a single string, such as key720 = "720". This works correctly but creates bloat: I need a copy of the code at the bottom for each keyword I want to remove.
How do I get the list to work as keys in the search?
I tried to turn the list into a string with:
str1 = ""
keyres = (str1.join(keys))
This was closer, but I think it makes one string out of all the entries, and it didn't pick up any keywords.
So I've come to this at the moment:
keys = ["720p", "720", "1080p", "1080"]
for filename in os.listdir(dirName):
    if keys in filename:
        filepath = os.path.join(dirName, filename)
        newfilepath = os.path.join(dirName, filename.replace(keys,""))
        os.rename(filepath, newfilepath)
Is there a way to go through the list by index, one entry at a time? Would that allow the strings in the list to be used as strings?
What I'm trying to do is take a file name and rename it by removing all occurrences of the keywords.
How about using regular expressions, specifically the sub function?
from re import sub
KEYS = ["720p", "720", "1080p", "1080"]
old_filename = "filename1080p.jpg"
# Join the keywords with "|" so the pattern matches any one of them
new_filename = sub('|'.join(KEYS), '', old_filename)
print(new_filename)  # filename.jpg
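To connect this back to the renaming loop in the question, the same idea could be applied to a whole folder roughly as follows. This is a sketch, not a drop-in solution: strip_keywords and the demo file names are invented here, and sorting the keywords longest-first plus re.escape are precautions I'm adding, not something the original answer requires.

```python
import os
import re
import tempfile

KEYS = ["720p", "720", "1080p", "1080"]

# Longest keywords first, so "1080p" is tried before "1080";
# re.escape guards against keywords containing regex metacharacters.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(KEYS, key=len, reverse=True))
)


def strip_keywords(dir_name):
    """Rename every file in dir_name, removing all keyword occurrences."""
    for filename in os.listdir(dir_name):
        new_name = PATTERN.sub("", filename)
        if new_name != filename:
            os.rename(os.path.join(dir_name, filename),
                      os.path.join(dir_name, new_name))


# Demo on a throwaway folder with made-up file names.
demo_dir = tempfile.mkdtemp()
for name in ["holiday1080p.mp4", "clip720.mp4", "plain.mp4"]:
    open(os.path.join(demo_dir, name), "w").close()

strip_keywords(demo_dir)
renamed = sorted(os.listdir(demo_dir))
print(renamed)
```

One thing this sketch does not handle is two files collapsing to the same new name, so on real data it may be worth checking whether the target already exists before renaming.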

Can you spot the problem with this REGEX statement?

I'm running .txt files through a for loop which should slice out keywords and append them to lists. For some reason my regex statements are returning really odd results.
My first statement, which iterates through the full filenames and slices out the keyword, works well.
import os
import re

# Creates a workflow list of file names within target directory for further iteration
stack = os.listdir(
    "/Users/me/Documents/software_development/my_python_code/random/countries"
)
# declares list, to be filled, and their associated regular expression, to be used,
# in the primary loop
names = []
name_pattern = r"-\s(.*)\.txt"
# PRIMARY LOOP
for entry in stack:
    if entry == ".DS_Store":
        continue
    # extraction of country name from file name into `names` list
    name_match = re.search(name_pattern, entry)
    name = name_match.group(1)
    names.append(name)
This works fine and creates the list that I expect.
However, once I move on to a similar process with the actual contents of the files, it no longer works.
religions = []
reli_pattern = r"religion\s=\s(.+)."
# PRIMARY LOOP
for entry in stack:
    if entry == ".DS_Store":
        continue
    # opens and reads file within `contents` variable
    file_path = (
        "/Users/me/Documents/software_development/my_python_code/random/countries" + "/" + entry
    )
    selection = open(file_path, "rb")
    contents = str(selection.read())
    # extraction of religion type and placement into `religions` list
    reli_match = re.search(reli_pattern, contents)
    religion = reli_match.group(1)
    religions.append(religion)
The results should be something like: "therevada", "catholic", "sunni", etc.
Instead I'm getting seemingly random pieces of text from the document that have nothing to do with my regex, like ruler names and stat values that do not contain the word "religion".
To try and figure this out, I isolated some of the code in the following way:
contents = "religion = catholic"
reli_pattern = r"religion\s=\s(.*)\s"
reli_match = re.search(reli_pattern, contents)
print(reli_match)
And None is printed to the console so I am assuming the problem is with my REGEX. What silly mistake am I making which is causing this?
Your regular expression (religion\s=\s(.*)\s) requires a trailing whitespace character (the final \s). Since your string doesn't have one, the search finds nothing and re.search returns None.
You should either:
Change your regex to r"religion\s=\s(.*)", or
Change the string you're searching to have trailing whitespace (i.e. 'religion = catholic' becomes 'religion = catholic ')
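Both options can be checked side by side with the isolated string from the question. The variable names below are mine, chosen only to label the three cases.

```python
import re

contents = "religion = catholic"

# Pattern from the question: the final \s demands whitespace after the value.
with_trailing_space = re.search(r"religion\s=\s(.*)\s", contents)

# Pattern from the answer: no trailing \s required.
without_trailing_space = re.search(r"religion\s=\s(.*)", contents)

# Same pattern as the question, but the text now ends in a space.
padded = re.search(r"religion\s=\s(.*)\s", contents + " ")

print(with_trailing_space)              # None
print(without_trailing_space.group(1))  # catholic
print(padded.group(1))                  # catholic
```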

python openpyxl.load_workbook(r"mypath")

I want to use openpyxl.load_workbook(r"mypath"), with the only difference being that mypath is a variable holding a path that changes on every iteration of a loop over different folders.
PathsList = []
for folderName, subFolders, fileNames in os.walk
        fileNamesList.append(os.path.basename(fileName))
        PathsList.append(os.path.abspath(fileName))
for i in range(len(fileNamesList)):
    j = 1
    while j < len(fileNamesList):
        if(first3isdigit(fileNamesList[i])) == (first3isdigit(fileNamesList[j])):
            if(in_fileName_DOORS in str(fileNamesList[i]) and in_fileName_TAF in str(fileNamesList[j])):
                mypath = PathsList[i]
                File = openpyxl.load_workbook(r'mypath ')
                wsFile = File.active
mypath is not read as a variable. Is there any solution?
Edit 1: I also thought about
File = openpyxl.load_workbook(exec(r'%s' % (mypath))
but couldn't, since exec can't be used inside the parentheses.
This code
File = openpyxl.load_workbook(r'mypath ')
passes the literal raw string 'mypath ' as the argument to load_workbook, not the contents of your variable.
If you want to pass the contents of the mypath variable to the function, remove the quotation marks and the r prefix:
File = openpyxl.load_workbook(mypath)
This is basic Python syntax. You can read more about it in the documentation.
Please let me know if this is what you needed.
Edit:
If the backslashes are a concern, you can do the following:
File = openpyxl.load_workbook(mypath.replace('\\','/'))
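The difference between the quoted text and the variable can be seen without openpyxl at all. In this small sketch the path is a made-up stand-in for one entry of PathsList.

```python
import os

# Hypothetical path, standing in for one entry of PathsList.
mypath = os.path.join("reports", "123_DOORS.xlsx")

literal = r'mypath '    # a 7-character string; the variable is never consulted
from_variable = mypath  # the actual path stored in the variable

print(repr(literal))
print(repr(from_variable))

# The r prefix only changes how backslashes inside a literal are read.
print(r'C:\new' == 'C:\\new')
```

A path that is already stored in a variable has no use for the r prefix, since the prefix only applies to string literals written in the source code.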

How to remove .txt or .docx at end of string in python

I am trying to create a list of all file names from a specific directory. My code is below:
import os
#dir = input('Enter the directory: ')
dir = 'C:/Users/brian/Documents/Moeller'
r = os.listdir(dir)
for fnam in os.listdir(dir):
    print(fnam.split())
    sep = fnam.split()
My output is:
['50', 'OP', '856101P02.txt']
['856101P02', 'OP', '040.txt']
['856101P02', 'OP', '50.txt']
['OP', '040', '856101P02.txt']
How would I be able to remove anything to the right of a "." in a string, while keeping the text to the left of the period?
Basically, what you do is start splitting from the right with rsplit and then instruct it to split only once.
print("a.b.c.d".rsplit('.', 1)[0])
prints a.b.c
You can use os.path.splitext to split a filename into two parts, keeping only the extension on the right and everything else on the left.
For example, a path like some/path/file.tar.gz will be split into some/path/file.tar and .gz:
base, ext = os.path.splitext('path/to/hello.tar.gz')
If you want to get rid of the . in the ext part, simply use ext[1:].
If the file has no extension, for example path/to/file, then the ext part will be the empty string. This is a nice feature: os.path.splitext always returns a tuple of two elements, so the base, ext = ... example above always works.
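As a quick illustration of those cases, here is a short sketch that runs os.path.splitext over a multi-dot name, one of the file names from the question, and a name with no extension. The results dictionary is only there to collect the output.

```python
import os

examples = ["path/to/hello.tar.gz", "856101P02 OP 040.txt", "path/to/file"]

results = {}
for path in examples:
    base, ext = os.path.splitext(path)
    results[path] = (base, ext[1:])  # ext[1:] drops the leading dot
    print(repr(base), repr(ext))
```

Only the last extension is split off, so hello.tar.gz keeps its .tar in the base part.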
I am trying to create a list of all file names from a specific directory.
[...]
How would I be able to remove anything to the right of a "." in a string, while keeping the text to the left of the period?
To get the base names (filenames without the extension) of a specific directory somedir, you could use this list comprehension:
basenames = [os.path.splitext(f)[0] for f in os.listdir(somedir)]
From there, find the last period and take everything up to that position. In simple steps ...
for fnam in os.listdir(dir):
    nam_split = fnam.split()  # "sep" is usually the separator character
    print(nam_split)
    ext_split = fnam.rsplit('.', 1)  # Split the name itself (a string) at only one dot, from the right
    file_no_ext = ext_split[0]  # The first part of the split is the file name
    print(file_no_ext)

Regex Doesn't Match Beyond 1st Result

I'm using Python to match against a list (array), but I'm sure the problem lies in the regex itself.
Assuming I have the following:
foo.html
bar.html
teep.html
And I use the following regex: .*(?=.html)
where .* will match anything and (?=.html) requires the string to be present but does not include it in the results.
Thus, I should just be left with what's before .html.
When I check, it only matches the first item in the array (in this case foo), but why not the others?
my_regex = re.compile('.html$')
r2 = re.compile('.*(?=.html)')
start = '/path/to/folder'
os.chdir(start)
a = os.listdir(start)
for item in a:
    if my_regex.search(item) != None and os.path.isdir(item):
        print 'FOLDER MATCH: '+ item # this is a folder and not a file
        starterPath = os.path.abspath(item)
        outer_file = starterPath + '/index.html'
        outer_js = starterPath + '/outliner.js'
        if r2.match(item) != None:
            filename = r2.match(item).group() # should give me everything before .html
            makePage(outer_file, outer_js, filename) # self defined function
    else:
        print item + ': no'
filename = r2.match(item).group()
should be
filename = r2.match(item).groups() # plural !
According to the documentation, group will return one or more subgroups, whereas groups will return them all.
Figured out the problem. In my function, I changed directories but never changed back. So when the function ended and control returned to the for loop, it was looking for the folder name in the wrong location. The fix is as simple as
def makePage(arg1, arg2, arg3):
    os.chdir('path/to/desktop')
    # write file to new location
    os.chdir(start) # go back to start and continue original search
    return
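A slightly more defensive variant of the same fix, sketched here with try/finally so the working directory is restored even if the write fails. The function name make_page, its parameters and the demo file are hypothetical and are not the makePage from the question.

```python
import os
import tempfile


def make_page(target_dir, filename, text):
    """Write text to filename inside target_dir, then restore the old cwd."""
    previous_dir = os.getcwd()
    os.chdir(target_dir)
    try:
        with open(filename, "w") as handle:
            handle.write(text)
    finally:
        # Runs even if writing fails, so the caller's loop keeps its directory.
        os.chdir(previous_dir)


start_dir = os.getcwd()
demo_dir = tempfile.mkdtemp()
make_page(demo_dir, "foo.txt", "hello")
print(os.getcwd() == start_dir)
```

Another option would be to skip os.chdir entirely and build the output path with os.path.join, which leaves the working directory untouched.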
Also, .group() worked for me and returned everything in the folder name before the string .html, whereas .groups() just returned ().
The code in the original post stayed the same. Something so simple, causing all this headache.
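The group() versus groups() behaviour described above can be reproduced in a few lines. Note one deliberate difference from the question: the dot before html is escaped here so that it matches a literal period only.

```python
import re

r2 = re.compile(r'.*(?=\.html)')  # dot escaped so it matches a literal "."

names = ["foo.html", "bar.html", "teep.html"]
matched = [r2.match(name).group() for name in names]
print(matched)

# groups() lists only the capturing groups; this pattern has none.
print(r2.match("foo.html").groups())
```

So the regex matches every name in the list, which fits the conclusion that the directory change, not the pattern, was the cause.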
