Python: Need to add chosen filenames into an array

The idea is simple: there is a directory with 2 or more files *.txt. My script should look in the directory and get filenames in order to copy them (if they exist) over the network.
As a Python newbie, I am facing problems which I have not been able to resolve so far.
My code:
import os
files = os.listdir('c:\\Python34\\')
for f in files:
    if f.endswith(".txt"):
        print(f)
This example returns 3 files:
LICENSE.txt
NEWS.txt
README.txt
Now I need to use every filename in order to do a SCP. The problem is that when I try to get the first filename with:
print(f[0])
I am receiving just the first letters from each file in the list:
L
N
R
How can I add the filenames to an array so that I can use them later as array elements?

You can also use the list's extend method:
x = []
for f in files:
    if f.endswith(".txt"):
        x.extend([f])
This adds each matching filename f to the end of the list x. For a single item, x.append(f) does the same thing.

If you want a list of matching file names, then instead of using os.listdir and filtering, use glob.glob with a suitable pattern.
import glob
files = glob.glob('C:\\python34\\*.txt')
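Then you can access files[0] and so on. Note that glob.glob returns each match with the directory part of the pattern included, not the bare file name.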

The array of files is files. In the loop, f is a single file name (a string) so f[x] gets the xth character of a filename. Do files[0] instead of f[0].
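Putting the answers above together, here is a minimal sketch that collects the matching names into a list first and indexes the list afterwards. The helper name list_txt_files is made up for illustration, and the sorting is an added assumption so that the order is predictable (os.listdir returns names in arbitrary order).

```python
import os


def list_txt_files(directory):
    """Return the sorted names of all regular .txt files in `directory`."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(".txt")
        and os.path.isfile(os.path.join(directory, name))
    )


if __name__ == "__main__":
    txt_files = list_txt_files(".")
    if txt_files:
        # Index the list, not the loop variable, to get a whole filename.
        print(txt_files[0])
    for name in txt_files:
        print(name)
```

Each element of the returned list is a complete filename, so it can be handed on to whatever does the copy later.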

Related

Using regex to move multiple files

I'm really new to python and looking to organize hundreds of files and want to use regex to move them to the correct folders.
Example: I would like to move 4 files into different folders.
File A has "USA" in the name
File B has "Europe" in the name
File C has both "USA" and "Europe" in the name
File D has "World" in the name
Here is what I am thinking but I don't think this is correct
shutil.move('Z:\local 1\[.*USA.*]', 'Z:\local 1\USA')
shutil.move('Z:\local 1\[.*\(Europe\).*]', 'Z:\local 1\Europe')
shutil.move('Z:\local 1\[.*World.*]', 'Z:\local 1\World')

You can list all the files in a directory and move each one to a new folder if its name matches a given regular expression, as follows:
import os
import re
import shutil
source = 'path/to/some/directory'
for filename in os.listdir(source):
    # os.listdir returns bare names, so match against the name only, not a full path
    if re.search(r'USA', filename):
        shutil.move(os.path.join(source, filename), r'Z:\local 1\USA')
    elif re.search(r'Europe', filename):
        shutil.move(os.path.join(source, filename), r'Z:\local 1\Europe')
    # and so forth
However, os.listdir returns only the immediate subfolders and files of a directory; it does not descend into subfolders. If you want to process all the files in a given folder recursively, use os.walk.

According to the documentation of shutil.move, it needs two things:
src, which is the path of the source file
dst, which is the path to the destination folder.
So src and dst must be paths, not regular expressions.
What you do have is os.listdir(), which lists the files in a directory.
So what you need to do is to list files, then try to match file names against regular expressions. If you get a match, then you know where the file should go.
That said, you still need to decide what to do with option C that matches both 'USA' and 'Europe'.
For added style points you can put pairs of (regex, destination_path) into an array, tuple or map; in this case you can add any number of rules without changing or duplicating the logic.
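The rule-table idea could be sketched roughly like this. The folder names, the RULES list and both helper functions are hypothetical, and the sketch assumes the destination folders live inside the source directory and that the first matching rule wins (so a file with both "USA" and "Europe" in its name lands in USA); adjust that to whatever you decide for option C.

```python
import os
import re
import shutil

# Hypothetical rules: (compiled pattern, destination folder name).
# Order matters: the first pattern that matches decides the destination.
RULES = [
    (re.compile(r"USA"), "USA"),
    (re.compile(r"Europe"), "Europe"),
    (re.compile(r"World"), "World"),
]


def destination_for(filename, rules=RULES):
    """Return the folder name for `filename`, or None if no rule matches."""
    for pattern, folder in rules:
        if pattern.search(filename):
            return folder
    return None


def sort_files(source_dir, rules=RULES):
    """Move every matching file in `source_dir` into its subfolder."""
    moved = []
    for filename in os.listdir(source_dir):
        src = os.path.join(source_dir, filename)
        if not os.path.isfile(src):
            continue  # skip subfolders, including the destination folders
        folder = destination_for(filename, rules)
        if folder is None:
            continue  # no rule matched, leave the file where it is
        dst_dir = os.path.join(source_dir, folder)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.move(src, os.path.join(dst_dir, filename))
        moved.append((filename, folder))
    return moved
```

Adding a new rule is then one more tuple in RULES, with no change to the loop.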

How to search files from a list?

I have a list
fileslist=[1.jpg,2.xml,3.png]
I want to search for the files in this list in the current working directory.
I have tried
listingdir=os.getcwd()
for rootpath,directories,files in os.walk(listingdir):
    for file in fileslist:
        if file in files:
            print("file:{} found".format(file))
I also tried
list=(set(files).intersection(fileslist))
but it did not work, because the files have more than one type of extension.
When I used set, it created a list like the following and I did not get the results:
f=set(files)
print(f)
#result is
[[1.jpg,2.jpg,....],[1.png,2.png,...],[1.xml,2.xml,.......]]

If you only want to search through the current directory, you can do something like:
files = [f for f in os.listdir() if os.path.isfile(f)]
fileslist = ['1.jpg','2.xml','3.png']
found = set(files).intersection(fileslist)  # named found, so the built-in list is not shadowed
print(found)
Output:
{'3.png'} # it won't always be this, just an example.

You may use os.path.isfile(...). It will check if a certain file exists. It may accept a full path or a filename only (then it will check if the file exists in the current working directory).
import os.path
fileslist=['1.jpg','2.xml','3.png'] # no, it won't work without the quotes!
for f in fileslist:
    if os.path.isfile(f):
        print("file:{} found".format(f))
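If the files may also sit in subdirectories, as the os.walk attempt in the question suggests, the set intersection can be applied per directory. This is only a sketch: find_files is a made-up helper name, and returning sorted full paths is an assumption about what is useful.

```python
import os


def find_files(wanted_names, top="."):
    """Return the paths of files under `top` whose name is in `wanted_names`."""
    wanted = set(wanted_names)
    found = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # so `filenames` is a flat list of names for that one directory.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in wanted.intersection(filenames):
            found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_files(["1.jpg", "2.xml", "3.png"]):
        print("file:{} found".format(path))
```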

Errors with Glob while outputting file names

I am combining two questions here because they are related to each other.
Question 1: I am trying to use glob to open all the files in a folder but it is giving me "Syntax Error". I am using Python 3.xx. Has the syntax changed for Python 3.xx?
Error Message:
File "multiple_files.py", line 29
files = glob.glob(/src/xyz/rte/folder/)
SyntaxError: invalid syntax
Code:
import csv
import os
import glob
from pandas import DataFrame, read_csv
#extracting
files = glob.glob(/src/xyz/rte/folder/)
for fle in files:
    with open(fle) as f:
        print("output" + fle)
        f_read.close()
Question 2: I want to read input files, append "output" to the names and print out the names of the files. How can I do that?
Example: Input file name would be - xyz.csv and the code should print output_xyz.csv .
Your help is appreciated.

Your first problem is that strings, including pathnames, need to be in quotes. This:
files = glob.glob(/src/xyz/rte/folder/)
… is trying to divide a bunch of variables together, but the leftmost and rightmost divisions are missing operands, so you've confused the parser. What you want is this:
files = glob.glob('/src/xyz/rte/folder/')
Your next problem is that this glob pattern doesn't have any globs in it, so the only thing it's going to match is the directory itself.
That's perfectly legal, but kind of useless.
And then you try to open each match as a text file. Which you can't do with a directory, hence the IsADirectoryError.
The answer here is less obvious, because it's not clear what you want.
Maybe you just wanted all of the files in that directory? In that case, you don't want glob.glob, you want listdir (or maybe scandir): os.listdir('/src/xyz/rte/folder/').
Maybe you wanted all of the files in that directory or any of its subdirectories? In that case, you could do it with Path.rglob from pathlib (or glob.glob with recursive=True and a ** pattern), but os.walk is probably clearer.
Maybe you did want all the files in that directory that match some pattern, so glob.glob is right—but in that case, you need to specify what that pattern is. For example, if you wanted all .csv files, that would be glob.glob('/src/xyz/rte/folder/*.csv').
Finally, you say "I want to read input files, append "output" to the names and print out the names of the files". Why do you want to read the files if you're not doing anything with the contents? You can do that, of course, but it seems pretty wasteful. If you just want to print out the filenames with output_ put in front, as in your output_xyz.csv example, that's easy:
for filename in os.listdir('/src/xyz/rte/folder/'):
    print('output_' + filename)
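If only the .csv files are wanted, the glob variant might look like the sketch below. The helper name output_names and the default pattern are assumptions made for illustration; os.path.basename strips the directory that glob.glob includes in each match.

```python
import glob
import os


def output_names(folder, pattern="*.csv"):
    """Return 'output_<name>' for every file in `folder` matching `pattern`."""
    names = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # glob returns the folder as part of each match; keep only the name.
        names.append("output_" + os.path.basename(path))
    return names


if __name__ == "__main__":
    for name in output_names("/src/xyz/rte/folder/"):
        print(name)
```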

This works in http://pyfiddle.io:
Docs: https://docs.python.org/3/library/glob.html
import os
import glob
# create some files
for n in ["a","b","c","d"]:
    with open('{}.txt'.format(n),"w") as f:
        f.write(n)
print("\nFiles before")
# get all files
files = glob.glob("./*.*")
for fle in files:
    print(fle) # print file
    path,fileName = os.path.split(fle) # split name from path
    # open file for read and second one for write with modified name
    with open(fle) as f, open(os.path.join(path, 'output_' + fileName), "w") as w:
        content = f.read() # read all
        w.write(content.upper()) # write all modified
# check files afterwards
print("\nFiles after")
files = glob.glob("./*.*") # pattern for all files
for fle in files:
    print(fle)
Output:
Files before
./d.txt
./main.py
./c.txt
./b.txt
./a.txt
Files after
./d.txt
./output_c.txt
./output_d.txt
./main.py
./output_main.py
./c.txt
./b.txt
./output_b.txt
./a.txt
./output_a.txt
I am on Windows and would use os.walk (docs) instead.
for d,subdirs,files in os.walk("./"): # unpack the current directory, its subdirs and its files
    print("Current dir:", d)
    print("Subdirs:", subdirs)
    print("Files:", files)
Output:
Current dir: ./
Subdirs: []
Files: ['d.txt', 'output_c.txt', 'output_d.txt', 'main.py', 'output_main.py',
'c.txt', 'b.txt', 'output_b.txt', 'a.txt', 'output_a.txt']
It also recurses into subdirs.

How to iterate through a list of file path names and delete each one?

I have a script that creates a list of local files by path name that I would like to see deleted. The essence of my problem is in the code below.
If it's easier just to move these files rather than delete them, that's an option. I've seen that I might need to set the directory before I can get it to delete, but I'm hoping for a more efficient function that will just read the paths and deal with them.
I don't need any function to discriminate between any file path names stored in the list. I want each file stored in the list, OUT.
The code as is now gives the error:
TypeError: remove: illegal type for path parameter
Code:
import os
files = ['/users/computer/site/delete/photo1.jpg', '/users/computer/site/delete/photo3.jpg']
os.remove(files)

os.remove() takes a single path as argument, not a list of paths. You have to do something like:
for f in files:
    os.remove(f)

You could use a list comprehension
[os.remove(f) for f in ['/users/computer/site/delete/photo1.jpg', '/users/computer/site/delete/photo3.jpg']]

For starters, you are calling os.remove on the whole list called files.
You want to iterate through the files and call os.remove on each individual file.
for file in files:
    os.remove(file)

You can't delete the list at once. You must iterate over all of the files and delete each one.
The code for removing files from the list -
import os
files = ['/users/computer/site/delete/photo1.jpg', '/users/computer/site/delete/photo3.jpg']
for f in files:
    os.remove(f)
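One thing the loops above do not cover is a path that no longer exists, in which case os.remove raises FileNotFoundError and the loop stops. A minimal sketch that keeps going and reports the missing paths, assuming that is the behaviour you want (remove_all is a made-up name):

```python
import os


def remove_all(paths):
    """Delete each path in `paths`; return the ones that were already missing."""
    missing = []
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            # The file is already gone; remember it and carry on.
            missing.append(path)
    return missing
```

Other errors, such as a permission problem, are deliberately not caught here, so they still surface.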

Python2.7 search zipfiles for .kml containing string without unzipping

I am trying to write my first python script below. I want to search through a read only archive on an HPC to look in zipfiles contained within folders with a variety of other folder/file types. If the zip contains a .kml file I want to print the line in there starting with the string <coordinates>.
import zipfile as z
kfile = file('*.kml') #####breaks here#####
folderpath = '/neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/01/21' # folder with multiple folders and .zips
for zipfile in folderpath: # am only interested in the .kml files within the .zips
    if kfile in zipfile:
        with read(kfile) as k:
            for line in k:
                if '<coordinates>' in line: # only want the coordinate line
                    print line # print the coordinates
            k.close()
Eventually I want to loop this through multiple folders rather than pointing to the exact folder location, i.e. loop through every subfolder in /neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/, but this is a starting point for me to try and understand how Python works.
I am sure there are many problems with this script before it will run but the current one I have is
kfile = file('*.kml')
IOError: [Errno 22] invalid mode ('r') or filename: '*.kml'
Process finished with exit code 1
Any help appreciated to get this simple process script working.

When you run:
kfile = file('*.kml')
You are trying to open a single file named exactly *.kml, which is not what you want. If you want to process all *.kml files, you will need to (a) get a list of matching files and then (b) process those files in a list.
There are a number of ways to accomplish the above; the easiest is probably the glob module, which can be used something like this:
import glob
for kfilename in glob.glob('*.kml'):
    print kfilename
However, if you are trying to process a directory tree, rather than a single directory, you may instead want to investigate the os.walk function. From the docs:
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
A simple example might look something like this:
import os
for root, dirs, files in os.walk('topdir/'):
    kfilenames = [fn for fn in files if fn.endswith('.kml')]
    for kfilename in kfilenames:
        print kfilename
Additional commentary
Iterating over strings
Your script has:
for zipfile in folderpath:
That will simply iterate over the characters in the string folderpath. E.g., the output of:
folderpath = '/neodc/sentinel1a/data/IW/L1_GRD/h/IPF_v2/2015/01/21'
for zipfile in folderpath:
    print zipfile
Would be:
/
n
e
o
d
c
/
s
e
n
t
i
n
e
l
1
a
/
...and so forth.
read is not a context manager
Your code has:
with read(kfile) as k:
There is no read built-in, and the .read method on files cannot be used as a context manager.
KML is XML
You're looking for "lines beginning with <coordinates>", but KML files are not line based. An entire KML could be a single line and it would still be valid.
You are much better off using an XML parser to parse XML.
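To tie the points above together, here is a rough sketch of reading the KML members of one zip archive without extracting them to disk and pulling out the coordinates elements with an XML parser. It is written for Python 3 rather than the 2.7 used in the question, and coordinates_in_zip is a made-up helper name. Matching on the local tag name, so that the KML namespace is ignored, is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
import zipfile


def coordinates_in_zip(zip_path):
    """Return (member name, coordinates text) pairs for each .kml in the zip."""
    results = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".kml"):
                continue
            # archive.open gives a file-like object, so nothing is unzipped to disk.
            with archive.open(member) as kml:
                tree = ET.parse(kml)
            for element in tree.iter():
                # Tags look like '{namespace}coordinates'; compare the local part only.
                if element.tag.rsplit("}", 1)[-1] == "coordinates":
                    results.append((member, (element.text or "").strip()))
    return results
```

To cover a whole directory tree, the function could be called on every path ending in .zip that os.walk yields.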
