Getting data from different folders - python

I have two sets of JSON files stored in two separate folders (named firstdata and seconddata). I am trying to read all the files in those two folders and put them into two separate arrays. Here is the code I wrote:
import os
directory = os.path.normpath("D:\Python\project")
for subdir, dir, file in os.walk(directory):
    if subdir == 'D:\Python\project\firstdata':
        for f in file:
            if f.endswith(".json"):
                fread=open(os.path.join(subdir, f),'r')
                a = next(fread).replace('\n','').split(',')
                for line in a:
                    b = line.replace('.','').replace('\n','').replace('"','').split(': ')
                    print("___________________________________________________________________")
                fread.close()
However, the check (if subdir == 'D:\Python\project\firstdata':) never matches, so I get nothing at the end. Can anyone help?

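Check the docs for os.walk first.
Each iteration yields three values, conventionally named root, dirs, and files, in that order.
root is the directory currently being visited; dirs and files are lists of the subdirectory names and file names inside it.
Your loop unpacks them in the right order (as subdir, dir, file), so the walk itself is fine. The comparison fails because of the string literal: in 'D:\Python\project\firstdata' the sequence \f is read as a form-feed character, so the literal never equals the path that os.walk reports. Write it as a raw string, r'D:\Python\project\firstdata', or compare os.path.basename(subdir) with 'firstdata'.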
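As a rough sketch of the above, the folder check can be done on the last path component only, which avoids drive letters and backslash escapes altogether. The function name read_json_texts and the idea of returning a dict keyed by folder name are assumptions made for illustration, not something from the question; the sketch also stores each file's raw text rather than reproducing the asker's line splitting.

```python
import os


def read_json_texts(project_dir, folder_names):
    """Return {folder_name: [file text, ...]} for .json files in the named folders.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    wanted = set(folder_names)
    contents = {name: [] for name in folder_names}
    for root, dirs, files in os.walk(project_dir):
        # Compare only the last path component, so no escaping is involved.
        name = os.path.basename(root)
        if name not in wanted:
            continue
        for f in sorted(files):
            if f.endswith(".json"):
                with open(os.path.join(root, f), "r") as fread:
                    contents[name].append(fread.read())
    return contents
```

With the layout from the question, a call could look like read_json_texts(r"D:\Python\project", ["firstdata", "seconddata"]); note the raw string for the Windows path.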

Related

Loop through folders of a directory and create an output after each one in python

I'm trying to combine all .txt files in each folder of a directory and print the combined output after the last file of that folder.
I have the following file system structure:
objects
    object1
        attribute1.txt
        attribute2.txt
        attribute3.txt
    object2
        attribute1.txt
        attribute2.txt
        attribute3.txt
    and so on...
I've looked up this code
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # collect the information
I am looking for a for loop like
for subdir, dirs, files in os.walk(rootdir): # <- what do I need to change here?
    for object in objects: # <- how to implement this line correctly?
        for file in files:
            # collect the information
        print(information)
But I have no idea how to do that, since I am very new to Python.
EDIT:
Python concatenate text files does not answer my question, since there is not an actual loop but only an Array with file names.
The first loop you show will already go through all the files, and the directory that contains each file is stored in subdir.
If you run:
for subdir, dirs, files in os.walk(rootdir):
    block = ""
    for file in files:
        # One full path per line, so the output stays readable.
        block += subdir + "/" + file + "\n"
    print(block)
You'll see that you get all your files. So now, instead of printing, put the code that reads each file and stores its content in a variable (open the files with the path subdir+"/"+file).
You need to store the output of your for loop somewhere. This is an important thing to learn =)
my_files = []  # empty list
for file in directory:
    my_files.append(file)  # each iteration adds to the list
print(my_files)
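Putting the two answers together, one possible sketch collects the text of every .txt file in a folder and hands back the combined result once that folder is finished. The generator name combine_per_folder is hypothetical, and sorting the names is an assumption made here so the order is predictable.

```python
import os


def combine_per_folder(rootdir):
    """Yield (folder_path, combined_text) for each folder that directly holds .txt files.

    Hypothetical helper, named for this example only.
    """
    for subdir, dirs, files in os.walk(rootdir):
        dirs.sort()  # visit object1, object2, ... in a predictable order
        parts = []
        for name in sorted(files):
            if name.endswith(".txt"):
                with open(os.path.join(subdir, name), "r") as handle:
                    parts.append(handle.read())
        if parts:
            # Every file of this folder has been read, so its output is ready.
            yield subdir, "".join(parts)
```

Looping with for folder, text in combine_per_folder(rootdir): print(text) would then print one combined block per object folder.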

Reading all files that start with a certain string in a directory

Say I have a directory.
In this directory there are single files as well as folders.
Some of those folders could also have subfolders, etc.
What I am trying to do is find all of the files in this directory that start with "Incidences" and read each csv into a pandas data frame.
I am able to loop through all the files and get the names, but cannot read them into data frames.
I am getting the error that "___.csv" does not exist, as it might not be directly in the directory, but rather in a folder in another folder in that directory.
I have been trying the attached code.
inc_files2 = []
pop_files2 = []
for root, dirs, files in os.walk(directory):
    for f in files:
        if f.startswith('Incidence'):
            inc_files2.append(f)
        elif f.startswith('Population Count'):
            pop_files2.append(f)
for file in inc_files2:
    inc_frames2 = map(pd.read_csv, inc_files2)
for file in pop_files2:
    pop_frames2 = map(pd.read_csv, pop_files2)
You are adding only the file names to the lists, not their paths. You can use something like this to add the paths instead:
    inc_files2.append(os.path.join(root, f))
You have to prepend the directory you are currently in (root) to each file name.
Append the entire pathname, not just the bare filename, to inc_files2.
Note that os.path.abspath(f) does not help here: it resolves the bare file name against the current working directory, not against the folder that os.walk is visiting.
Build the path from root instead, by making the following changes to your code.
for root, dirs, files in os.walk(directory):
    for f in files:
        # root is the folder that actually contains f.
        f_path = os.path.join(root, f)
        if f.startswith('Incidence'):
            inc_files2.append(f_path)
        elif f.startswith('Population Count'):
            pop_files2.append(f_path)
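The path handling can also be wrapped in a small helper. This is only a sketch: find_files and its suffix parameter are names invented here, and the pandas step is left out so the example needs nothing beyond the standard library.

```python
import os


def find_files(directory, prefix, suffix=".csv"):
    """Return full paths of files under `directory` whose names start with `prefix`.

    Hypothetical helper; the `suffix` filter is an added assumption.
    """
    matches = []
    for root, dirs, files in os.walk(directory):
        for f in files:
            if f.startswith(prefix) and f.endswith(suffix):
                # Join with root so files in nested folders can be opened later.
                matches.append(os.path.join(root, f))
    return sorted(matches)
```

Assuming pandas is imported as pd, the frames could then be built with [pd.read_csv(p) for p in find_files(directory, 'Incidence')], which also avoids the lazy map object that Python 3 returns.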

Walking into subdirectories not working

I'm trying to export all of my maps that are in my subdirectories.
I have the code to export, but I cannot figure out where to add the loop that will make it do this for all subdirectories. As of right now, it is exporting the maps in the directory, but not the subfolders.
import arcpy, os
arcpy.env.workspace = ws = r"C:\Users\162708\Desktop\Burn_Zones"
for subdir, dirs, files in os.walk(ws):
    for file in files:
        mxd_list = arcpy.ListFiles("*.mxd")
        for mxd in mxd_list:
            current_mxd = arcpy.mapping.MapDocument(os.path.join(ws, mxd))
            pdf_name = mxd[:-4] + ".pdf"
            arcpy.mapping.ExportToPDF(current_mxd, pdf_name)
        del mxd_list
What am I doing wrong that it isn't able to iterate through the subfolders?
Thank you!
Iterating over os.walk gives you tuples of (path, dirs, files); the first element is the directory currently being visited, which is why I tend to name it path. The current directory does not change automatically, so you need to incorporate path into the pattern you give to arcpy.ListFiles, like this:
arcpy.ListFiles(os.path.join(path, "*.mxd"))
You should also remove the loop for file in files. It seems like you're exporting the files per directory so why export the whole directory every time for each file?
Also you should change arcpy.mapping.MapDocument(os.path.join(ws, mxd)) to arcpy.mapping.MapDocument(os.path.join(path, mxd)) where path is again the first element from os.walk.
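If you would rather not depend on the workspace at all, the file names that os.walk already provides can be filtered directly. The sketch below uses only the standard library; plan_exports is a hypothetical name, and it only computes the input and output paths without calling arcpy.

```python
import fnmatch
import os


def plan_exports(ws, pattern="*.mxd"):
    """Return (mxd_path, pdf_path) pairs for every matching file under `ws`.

    Hypothetical helper: it plans the exports but does not run them.
    """
    jobs = []
    for path, dirs, files in os.walk(ws):
        # files holds the names in `path`, so no second listing call is needed.
        for mxd in fnmatch.filter(files, pattern):
            mxd_path = os.path.join(path, mxd)
            pdf_path = os.path.splitext(mxd_path)[0] + ".pdf"
            jobs.append((mxd_path, pdf_path))
    return sorted(jobs)
```

Each pair could then be passed to arcpy.mapping.MapDocument and arcpy.mapping.ExportToPDF as in the question's code, which writes every PDF next to its map document.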

iterating through folders and from each use one specific file in a method python

What I want to do is iterate through folders in a directory and in each folder find a file 'fileX' which I want to give to a method which itself needs the file name as a parameter to open it and get a specific value from it. So 'method' will extract some value from 'fileX' (the file name is the same in every folder).
My code looks something like this but I always get told that the file I want doesn't exist which is not the case:
import os
import xy
rootdir =r'path'
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(fileX)
        print(gain)
Also my folders I am iterating through are named like 'folderX0', 'folderX1',..., 'folderX99', meaning they all have the same name with increasing ending numbers. It would be nice if I could tell the program to ignore every other folder which might be in 'path'.
Thanks for the help!
os.walk yields bare file and directory names, relative to the directory (root) it reports for each iteration. You can combine them with os.path.join:
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)
See the documentation for os.walk for details:
To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
To trim it to ignore any folders but those named folderX, you could do something like the following. When doing os.walk top down (the default), you can delete items from the dirs list to prevent os.walk from looking in those directories.
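import re

for root, dirs, files in os.walk(rootdir):
    # Replace the list contents in place; calling dirs.remove() while looping
    # over dirs would skip entries.
    dirs[:] = [d for d in dirs if re.match(r'folderX[0-9]+$', d)]
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)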
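To pick out just the one file per folder, the pruning can be combined with a membership test on files. This is a sketch under assumptions: find_target_files is a made-up name, and 'fileX' stands in for the real file name as it does in the question.

```python
import os
import re

FOLDER_PATTERN = re.compile(r"folderX[0-9]+$")


def find_target_files(rootdir, filename="fileX"):
    """Return the path to `filename` in each folderX<number> under rootdir.

    Hypothetical helper; the pruning applies at every level of the walk.
    """
    found = []
    for root, dirs, files in os.walk(rootdir):
        # Slice assignment edits the list os.walk holds, so folders that do
        # not match are never entered.
        dirs[:] = sorted(d for d in dirs if FOLDER_PATTERN.match(d))
        if root != rootdir and filename in files:
            found.append(os.path.join(root, filename))
    return found
```

Each returned path can be handed to xy.method, for example with for path in find_target_files(rootdir): print(xy.method(path)).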

os.walk on non C drive directory

I know there are several posts that touch on this, but I haven't found one that works for me yet. I need to create a list of files with an .mxd extension by searching an entire mapped directory. I used this code and it works:
import os
file_list = []
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))
However, it only works on the C drive. I need to be able to search for these files on a mapped drive of my choosing. Here's the script I'm using, but it doesn't work. I know there is an mxd file in the subdirectory of this drive, but it isn't being reported in the file list. In fact, the file list is totally empty, and it shouldn't be.
import os
path = r"U:/TEST/"
filenamelist = []
for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
Does someone see anything wrong in my second block of code that would prevent it from iterating through the subdirectories at the given path and reporting back files with an .mxd extension?
os.walk(path) yields a 3-tuple which is commonly unpacked as root, dirs, files. So instead of
for files in os.walk(path):
    ...
use
for root, dirs, files in os.walk(path):
    for filename in files:
        if filename.endswith(".mxd"):
            # Store the full path, as your first (working) block does.
            filenamelist.append(os.path.join(root, filename))
Does someone see anything wrong in my second block of code that would prevent it from iterating through the subdirectories at the given path and reporting back files with an .mxd extension?
Yes. Compare and contrast your two loops:
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))
for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
You're trying to do the exact same thing, but not using even remotely the same code.
In the first one, you loop over the walk, storing each tuple in (paths, dirs, files), then loop over files.
In the second one, you loop over the walk, storing each tuple in files, don't loop over anything, and then just use some variable named file left over from some earlier code.
And then, even if that part worked, you end up appending files (which, remember, is a tuple of the directory path and two lists) rather than file (or, probably better, os.path.join(paths, file)) to the list.
Just make the second one look like the first. Or, better, put it in a function and call it twice, instead of copying and pasting it.
Here's the final script that worked for crawling on a drive root other than C:
import os
file_list = []
path = r'U:\\'
for (dirpath, subdirs, files) in os.walk(path):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(dirpath, file))
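Following the earlier advice to put the loop in a function instead of copying it, a minimal sketch might look like this. The name list_files_with_extension, the case-insensitive match, and the up-front directory check are all assumptions added here; the check is included because os.walk by default ignores listing errors and simply yields nothing for a path it cannot read, which looks the same as an empty result.

```python
import os


def list_files_with_extension(folder, extension=".mxd"):
    """Return the full path of every file under `folder` ending with `extension`.

    Hypothetical helper; matching ignores case, which is an added assumption.
    """
    if not os.path.isdir(folder):
        # Fail loudly instead of returning an empty list for a bad path.
        raise ValueError("not a directory: %r" % (folder,))
    file_list = []
    for dirpath, subdirs, files in os.walk(folder):
        for file in files:
            if file.lower().endswith(extension.lower()):
                file_list.append(os.path.join(dirpath, file))
    return sorted(file_list)
```

The same function can then be called once per drive or folder, for example list_files_with_extension('U:\\') and list_files_with_extension(folder).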
