How to put output data into directory specified in path1?

I want to write my output files into the directory given in path1, but when I try to do that, an error occurs.
import os
import sys
path = 'ODOGS/LAB'
path1= 'ODOGS/lab_res_voiced'
for file in os.listdir(path):
    current = os.path.join(path, file)
    name_output=file #name of the current file must be the same for the output file
    index_list_B = []
    line_list_B = []
    line_list_linije = []
    with open(current) as f1:
        for index1, line1 in enumerate(f1):
            strings_B = ("p","P","t","T") #find lines which contain this str
            line_list_B.append(line1) #put the line in line_list_B
            if any(s in line1 for s in strings_B):
                line_list_linije.append(line1)
                print(line_list_linije)
                index_list_B.append(index1) #positions of lines
    with open(path1.join(name_output).format(index1), 'w') as output1:
        for index1 in index_list_B:
            print(line_list_B[index1])
            output1.writelines([line_list_B[index11],
                                line_list_B[index1],','])
This code goes through the text files in the 'ODOGS/LAB' directory and looks for lines that contain certain strings. After it finds all the lines that match the condition, it writes them to a new file with the same name as the input file. That part of the code works just fine.
I want to put all of the output files into another directory, path1, but the with statement doesn't work.
I get an error:
FileNotFoundError: [Errno 2] No such file or directory: 'sODOGS/lab_res_voicedzODOGS...
It works when the with statement is:
    with open(name_output.format(index1), 'w') as output1:
but then all the files end up in the root folder, which I don't want.
My question is: how can I put my output files into the directory in path1?

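There's an error in forming the output path: path1.join(name_output) is str.join, which inserts path1 between the characters of name_output instead of joining two path components.
Instead of
    with open(path1.join(name_output).format(index1), 'w') as output1:
you want
    with open(os.path.join(path1, name_output), 'w') as output1: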
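The two calls can be compared directly. The sketch below is only an illustration: the file name abc.lab is hypothetical, and the os.makedirs step (which creates the output directory if it is missing) is an assumption that is not part of the answer above. The write is done inside a temporary directory so that nothing is left behind.

```python
import os
import tempfile

path1 = 'ODOGS/lab_res_voiced'
name_output = 'abc.lab'  # hypothetical file name, for illustration only

# str.join uses path1 as a separator between the characters of name_output
wrong = path1.join(name_output)

# os.path.join appends the file name to the directory
right = os.path.join(path1, name_output)

print(wrong)
print(right)

# Writing into a directory that may not exist yet (done in a temp dir here)
with tempfile.TemporaryDirectory() as base:
    out_dir = os.path.join(base, path1)
    os.makedirs(out_dir, exist_ok=True)  # no error if it already exists
    out_file = os.path.join(out_dir, name_output)
    with open(out_file, 'w') as output1:
        output1.write('example line\n')
    written = os.path.exists(out_file)
```

The first printed string has the same shape as the path in the FileNotFoundError from the question, which is why open() could not find it.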

Related

Filter Directory using Regex and output filtered files to another directory

I am trying to create a Python 3 program that runs through all the .sql files in a specific directory, applies a regex that adds ; after a certain pattern, and writes the changed content to a separate directory, keeping each file's name the same.
So, if I had file1.sql and file2.sql in the "/home/files" directory, after I run the program it should write those two files to "/home/new_files" without changing the content of the original files.
Here is my code:
import glob
import re
folder_path = "/home/files/d_d"
file_pattern = "/*sql"
folder_contents = glob.glob(folder_path + file_pattern)
for file in folder_contents:
    print("Checking", file)
for file in folder_contents:
    read_file = open(file, 'rt',encoding='latin-1').read()
    #words=read_file.split()
    with open(read_file,"w") as output:
        output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))
I receive the error File name too long: "CREATE EXTERNAL TABLe", and I am also not sure where I would put my output path (/home/files/new_dd) in my code.
Any ideas or suggestions?

With read_file = open(file, 'rt',encoding='latin-1').read(), the whole content of the file was being passed to open() as the file name. The code below iterates over the file names found with the glob.glob pattern, opens each file to read it, processes the data, and opens a file of the same name in the output folder to write the result. It assumes that the folder newfile_sqls already exists; if it does not, FileNotFoundError: [Errno 2] No such file or directory is raised.
import glob
import os
import re
folder_path = "original_sqls"
#original_sqls\file1.sql, original_sqls\file2.sql, original_sqls\file3.sql
file_pattern = "*sql"
# new/modified files folder
output_path = "newfile_sqls"
folder_contents = glob.glob(os.path.join(folder_path,file_pattern))
# iterate over file names
for file_ in [os.path.basename(f) for f in folder_contents]:
    # open to read
    with open(os.path.join(folder_path,file_), "r") as inputf:
        read_file = inputf.read()
    # use variable 'read_file' here
    tmp = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', read_file, flags=re.DOTALL)
    # open to write to (previously created) new folder
    with open(os.path.join(output_path,file_), "w") as output:
        # tmp is a single string, so write() is enough
        output.write(tmp)
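If the output folder may not exist yet, one option is to create it before writing. The following is a minimal sketch of that variant, not the answer's own code: the function name rewrite_sql_files is hypothetical, the latin-1 encoding is taken from the question, and the demonstration runs in a temporary directory with a made-up SQL statement.

```python
import glob
import os
import re
import tempfile

PATTERN = re.compile(r'(TBLPROPERTIES \(.*?\))', flags=re.DOTALL)

def rewrite_sql_files(folder_path, output_path):
    """Write a copy of every .sql file to output_path with ';' added after TBLPROPERTIES (...)."""
    os.makedirs(output_path, exist_ok=True)  # create the target folder if needed
    written = []
    for src in glob.glob(os.path.join(folder_path, '*.sql')):
        with open(src, 'r', encoding='latin-1') as inputf:
            text = inputf.read()
        dst = os.path.join(output_path, os.path.basename(src))
        with open(dst, 'w', encoding='latin-1') as output:
            output.write(PATTERN.sub(r'\1;', text))
        written.append(dst)
    return written

# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as base:
    src_dir = os.path.join(base, 'original_sqls')
    dst_dir = os.path.join(base, 'newfile_sqls')
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, 'file1.sql'), 'w') as f:
        f.write("CREATE EXTERNAL TABLE t (id INT)\nTBLPROPERTIES ('a'='b')\n")
    results = rewrite_sql_files(src_dir, dst_dir)
    with open(results[0]) as f:
        new_text = f.read()
    with open(os.path.join(src_dir, 'file1.sql')) as f:
        old_text = f.read()
```

The original files are only opened for reading, so their content stays unchanged.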

Browse all folders and subfolders to delete all files which contain a certain string

I have several folders, each containing several subfolders, which in turn contain 5-6 .txt files each with lists of fruits (apples, pears, grapes, etc.). A handful of random .txt files, however, contain "chicken" and must be deleted.
I am trying to write a program that will browse each folder and subfolder and delete the files that contain the string "chicken", but it does not seem to be working for some reason.
The following is the code I have thus far:
import os
DIR = r'C:\Users\Steve\AppData\Local\Programs\Python\Python37-32\fruits'
for parent, dirnames, filenames in os.walk(DIR):
    for fn in filenames:
        found = False
        with open(os.path.join(DIR,filename)) as f:
            for line in f:
                if 'chicken' in line:
                    found = True
                    break
        if found:
            os.remove(os.path.join(DIR, fn))
I am getting errors such as
File <stdin>, line 4, in <module>
FileNotFoundError: [errno 2] No such file or directory:
and I'm not sure why.
Any suggestions on how to make the code run smoothly are much appreciated!

I am not sure why you break and then delete when you can remove the file directly. You are on the right track, but the structure and indentation of your code were wrong. I hope this helps solve your issue.
import os
root = r'C:\Users\Steve\AppData\Local\Programs\Python\Python37-32\fruits'
for path, subdirs, files in os.walk(root):
    for name in files:
        # get file path
        file_path = os.path.join(path, name)
        # read content of file
        with open(file_path) as f:
            content = f.readlines()
        # delete if it includes the keyword
        for line in content:
            if "chicken" in line:
                os.remove(file_path)
                break
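The key difference from the question's code is that each file name is joined with the directory os.walk is currently visiting, not with the top-level folder. A minimal sketch of the same idea wrapped in a function follows; the function name remove_files_containing and the sample folder layout are hypothetical, and the demonstration runs in a temporary directory rather than on real data.

```python
import os
import tempfile

def remove_files_containing(root, needle):
    """Delete every file under root whose text contains needle; return the deleted paths."""
    removed = []
    for parent, dirnames, filenames in os.walk(root):
        for fn in filenames:
            # Join with the folder being visited, not with the top-level root
            file_path = os.path.join(parent, fn)
            with open(file_path, errors='ignore') as f:
                found = any(needle in line for line in f)
            # The file is closed at this point, so it can be removed safely
            if found:
                os.remove(file_path)
                removed.append(file_path)
    return removed

# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as base:
    sub = os.path.join(base, 'basket', 'inner')
    os.makedirs(sub)
    with open(os.path.join(sub, 'good.txt'), 'w') as f:
        f.write('apples\npears\n')
    with open(os.path.join(sub, 'bad.txt'), 'w') as f:
        f.write('grapes\nchicken\n')
    removed = remove_files_containing(base, 'chicken')
    remaining = sorted(os.listdir(sub))
```

Returning the list of deleted paths makes it easy to log or review what was removed.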

You have an indentation issue. Use the code below inside the for loop:
    for line in f:
        if 'chicken' in line:
            found = True
            break

Reading all json files in a directory

I have multiple (400) json files containing a dict in a directory that I want to read and append to a list. I've tried looping over all the files in the directory like this:
path_to_jsonfiles = 'TripAdvisorHotels'
alldicts = []
for file in os.listdir(path_to_jsonfiles):
    with open(file,'r') as fi:
        dict = json.load(fi)
alldicts.append(dict)
I keep getting the following error:
FileNotFoundError: [Errno 2] No such file or directory
However, when I look at the files in the directory, it gives me all the right files.
for file in os.listdir(path_to_jsonfiles):
    print(file)
Just opening one of them with the file name works as well.
with open('AWEO-q_GiWls5-O-PzbM.json','r') as fi:
    data = json.load(fi)
Where in the loop is it going wrong?

Your code has two errors:
1. file is only the file name. You have to pass the full file path (including its folder) to open().
2. You have to call append inside the loop.
To sum up, this should work:
alldicts = []
for file in os.listdir(path_to_jsonfiles):
    full_filename = "%s/%s" % (path_to_jsonfiles, file)
    with open(full_filename,'r') as fi:
        dict = json.load(fi)
    alldicts.append(dict)
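A variant of the same fix, sketched under a few assumptions: it builds the path with os.path.join instead of string formatting, skips entries that do not end in .json, and avoids naming a variable dict, which would shadow the built-in type. The function name load_json_dir and the sample files are hypothetical.

```python
import json
import os
import tempfile

def load_json_dir(path_to_jsonfiles):
    """Return a list with the parsed content of every .json file in the directory."""
    alldicts = []
    for name in sorted(os.listdir(path_to_jsonfiles)):
        if not name.endswith('.json'):
            continue  # skip anything that is not a JSON file
        full_filename = os.path.join(path_to_jsonfiles, name)
        with open(full_filename, 'r') as fi:
            alldicts.append(json.load(fi))  # append inside the loop
    return alldicts

# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as base:
    for i in range(3):
        with open(os.path.join(base, 'hotel%d.json' % i), 'w') as fo:
            json.dump({'id': i}, fo)
    with open(os.path.join(base, 'notes.txt'), 'w') as fo:
        fo.write('not json')
    loaded = load_json_dir(base)
```

Opening a single file by name worked in the question only because that file happened to be reachable from the current working directory; os.listdir returns bare names, not paths.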

Python - loop through subfolders and files in a directory without ignoring the subfolder

I have read all the Stack Exchange help on looping through subfolders, as well as the os documentation, but I am still stuck. I am trying to loop over files in subfolders, open each file, extract the first number in the first line, copy the file to a different subfolder (with the same name, but in the output directory), and rename the copy with the number as a suffix.
import os
import re
outputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers"
inputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/FRUS_Wisconsin"
suffix=".txt"
for root, dirs, files in os.walk(inputpath):
    for file in files:
        file_path = os.path.join(root, file)
        foldername=os.path.split(os.path.dirname(file_path))[1]
        filebname=os.path.splitext(file)[0]
        filename=filebname + "_"
        f=open(os.path.join(root,file),'r')
        data=f.readlines()
        if data is None:
            f.close()
        else:
            with open(os.path.join(root,file),'r') as f:
                for line in f:
                    s=re.search(r'\d+',line)
                    if s:
                        pagenum=(s.group())
                        break
            with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
                with open(os.path.join(root,file),'r') as f:
                    for line in f:
                        f1.write(line)
I expect the result to be copies of the files in the input directory, placed in the corresponding subfolder in the output directory and renamed with a suffix, such as "005_2", where 005 is the original file name and 2 is the number the Python code extracted from it.
The error I get seems to indicate that I am not looping through the files correctly. I know the code for extracting the first number and renaming the file works because I tested it on a single file. But using os.walk to loop through multiple subfolders is not working, and I can't figure out what I am doing wrong. Here is the error:
File "<ipython-input-1-614e2851f16a>", line 23, in <module>
with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
IOError: [Errno 2] No such file or directory: 'C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers\\FRUS_Wisconsin\\.dropbox_1473986809.txt'

Well, this isn't elegant, but it worked:
import os
import re
from glob import glob
folderlist=glob("C:\\...FRUS_Wisconsin*\\")
outputpath = "C:\\..\\Wisconsin_Copies_With_PageNumbers"
suffix=".txt"
for folder in folderlist:
    foldername = str(folder.split('\\')[7])
    for root, dirs, files in os.walk(folder):
        for file in files:
            filebname=os.path.splitext(file)[0]
            filename=filebname + "_"
            # skip hidden files whose names start with "."
            if not filename.startswith('._'):
                with open(os.path.join(root,file),'r') as f:
                    for line in f:
                        s=re.search(r'\d+',line)
                        if s:
                            pagenum=(s.group())
                            break
                with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
                    with open(os.path.join(root,file),'r') as f:
                        for line in f:
                            f1.write(line)
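The traceback in the question suggests that the failing file was .dropbox, which sits directly in the input folder, so its target subfolder did not exist under the output path. A more general sketch is shown below. It is not the code from the answer above: it mirrors the folder tree with os.path.relpath, creates missing target folders with os.makedirs, skips hidden files, and copies with shutil.copyfile. The function name copy_with_page_numbers and the sample folder names are hypothetical, and the output folder is assumed to lie outside the input folder.

```python
import os
import re
import shutil
import tempfile

def copy_with_page_numbers(inputpath, outputpath, suffix='.txt'):
    """Copy files into a mirrored folder tree, appending the first number found in each."""
    copied = []
    for root, dirs, files in os.walk(inputpath):
        # Path of the current folder relative to the input root
        rel = os.path.relpath(root, inputpath)
        target_dir = os.path.normpath(os.path.join(outputpath, rel))
        for name in files:
            if name.startswith('.'):
                continue  # skip hidden files such as .dropbox
            src = os.path.join(root, name)
            pagenum = None
            with open(src, 'r', errors='ignore') as f:
                for line in f:
                    s = re.search(r'\d+', line)
                    if s:
                        pagenum = s.group()
                        break
            if pagenum is None:
                continue  # no number found, so there is nothing to name the copy after
            os.makedirs(target_dir, exist_ok=True)
            base = os.path.splitext(name)[0]
            dst = os.path.join(target_dir, base + '_' + pagenum + suffix)
            shutil.copyfile(src, dst)
            copied.append(dst)
    return copied

# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src_root = os.path.join(tmp, 'FRUS_Wisconsin')
    dst_root = os.path.join(tmp, 'Copies')
    os.makedirs(os.path.join(src_root, 'vol1'))
    with open(os.path.join(src_root, 'vol1', '005.txt'), 'w') as f:
        f.write('page 2 of the volume\nmore text 7\n')
    with open(os.path.join(src_root, '.dropbox'), 'w') as f:
        f.write('12345')
    copied = copy_with_page_numbers(src_root, dst_root)
    names = [os.path.relpath(p, dst_root) for p in copied]
```

Resetting pagenum for every file also avoids reusing the number from a previous file when the current one contains no digits.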

how do I access information from files in subdirectories with os.walk?

I want to access and process information from files in sub-directories from a root directory. I have tried using os.walk, which gets me to the files, but how do I access their contents? I want specific files in these sub-directories that all have the same name, but there are other files in these sub-directories. This is what I have tried:
import os
import numpy as np
for root, dirs, files in os.walk("/rootDir/"):
    for file in files:
        if file.endswith(('sum.txt')):
            print file #Here, the desired file name is printed
            PIs = []
            for line in file:
                print line #Here, I only get 's' printed, which I believe is the first letter in 'sum.txt'
                line = line.rstrip()
                line = line.split('\t')
                PIs.append(line[2])
            print PIs #nothing is collected so nothing is printed
How do I loop over the lines in the desired files in these sub-directories in the root directory?
ADDED PROBLEM:
I got the answer to my first question, now I have another. In the directories under the root there are many sub-directories. I want to access information from only one sub-directory that has the same name in all directories. This is what I tried:
for root, dirs, files in os.walk("/rootPath/"):
    for dname in dirs:
        #print dname, type(dname)
        allPIs = []
        allDirs = []
        if dname.endswith('code_output'): #I only want to access information from one file in sub-directories with this name
            ofh = open("sumPIs.txt", 'w')
            ofh.write("path\tPIs_mean\n")
            for fname in files: #Here i want to be in the code_output sub-directory
                print fname #here I only want to see files in the sub-directory with the 'code_output' end of a name, but I get all files in the directory AND sub-directory
                if fname.endswith('sumAll.txt'):
                    PIs = []
                    with open(os.path.join(root,fname), 'r') as fh_in:
                        for line in fh_in:
                            line = line.rstrip()
                            line = line.split('\t')
                            PIs.append(int(line[2]))
                    PIs_mean = numpy.mean(PIs)
                    allPIs.append(PIs_mean)
                    allDirs.append(filePath)
Why does this loop over ALL files in the directory and not only the sub-directory with the name ending 'code_output'?

Use the with statement (a context manager) to open the files. The file handle is closed when you exit the with block, so you don't accidentally leave lots of file handles open.
Also, file is a built-in type in Python 2, so it is probably best not to use it as the name of a variable.
import os
PIs = []
for root, dirs, files in os.walk("/rootDir/"):
    for fname in files:
        if fname.endswith('sum.txt'):
            print(fname)  # the wanted file name is printed
            # open the file itself; fname alone is only a string
            with open(os.path.join(root,fname), 'r') as fh_in:
                for line in fh_in:
                    print(line)  # each line of the file is printed
                    line = line.rstrip()
                    line = line.split('\t')
                    PIs.append(line[2])
print(PIs)  # third column of every matching file

Try not to use built-in names such as file for variable names. Use f, file_, etc.
file is a string (the file name), so iterating over it yields its characters. Replace the line
    for line in file_
with
    for line in open(os.path.join(root, file_)).readlines()
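For the added problem in the question: in each os.walk iteration, files lists the files inside root, not inside the entries of dirs, which is why every file in the parent directory was visited. One way around this is to test the name of root itself. The sketch below is written for Python 3 and rests on assumptions: the function name mean_third_column is hypothetical, the third tab-separated column is assumed to hold integers (as in the question's code), and the mean is computed without numpy.

```python
import os
import tempfile

def mean_third_column(root_dir, dir_suffix='code_output', file_suffix='sumAll.txt'):
    """Map each matching file path to the mean of its third tab-separated column."""
    means = {}
    for root, dirs, files in os.walk(root_dir):
        # 'files' belongs to 'root', so check the name of 'root' itself
        if not os.path.basename(root).endswith(dir_suffix):
            continue
        for fname in files:
            if not fname.endswith(file_suffix):
                continue
            file_path = os.path.join(root, fname)
            with open(file_path, 'r') as fh_in:
                values = [int(line.rstrip('\n').split('\t')[2])
                          for line in fh_in if line.strip()]
            if values:
                means[file_path] = sum(values) / len(values)
    return means

# Small demonstration in a temporary directory
with tempfile.TemporaryDirectory() as base:
    wanted = os.path.join(base, 'run1', 'a_code_output')
    other = os.path.join(base, 'run1', 'other')
    os.makedirs(wanted)
    os.makedirs(other)
    for folder in (wanted, other):
        with open(os.path.join(folder, 'sumAll.txt'), 'w') as fo:
            fo.write('x\ty\t2\nx\ty\t4\n')
    means = mean_third_column(base)
    result = {os.path.relpath(p, base): m for p, m in means.items()}
```

Only the file inside the folder whose name ends in code_output is read; the identically named file in the sibling folder is ignored.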
