I have many files in a directory, like ['FredrikstadAvst1.dbf', 'FredrikstadAvst2.dbf', ...]. I want to write a Python script to concatenate these files into a new "*.dbf" file.
I have a Python script that almost does the job, but it keeps overwriting the output file. So when the job is finished, the output file contains only the last file in my directory.
Here is my script:
import os, glob, shutil
folder_path = r'C:\Tom\Oppdrag_2019\Pendle\2018'
for filename in glob.glob(os.path.join(folder_path, '*.dbf')):
    fd = open(filename, 'r')
    List = []
    List.append(fd)
    print filename
wfd = open(r"C:\Tom\Oppdrag_2019\Pendle\FredrikstadAvst.dbf",'a')
shutil.copyfileobj(fd, wfd, 1024*1024*10)
Consider the following:
import os, glob, shutil
folder_path = r'C:\Tom\Oppdrag_2019\Pendle\2018'
# Open the output once, before the loop; binary mode because .dbf files are not text
wfd = open(r"C:\Tom\Oppdrag_2019\Pendle\FredrikstadAvst.dbf", 'wb')
for filename in glob.glob(os.path.join(folder_path, '*.dbf')):
    fd = open(filename, 'rb')
    shutil.copyfileobj(fd, wfd, 1024*1024*10)  # copy in chunks of 10 MB
    fd.close()
wfd.close()
By opening the output file before the loop and closing it only after iterating over every .dbf file, it shouldn't be overwritten. I removed List because I can't see what it's being used for here. (List is not a reserved keyword, but it is easy to confuse with the built-in list, so try not to use it as a name.)
It almost works now, but the header is written for every file. I want the header to be written only the first time. How can I skip the header for each subsequent file?
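One possible way to skip the repeated headers, as a rough sketch rather than a tested recipe for these particular files: it assumes every input is a dBase III-style .dbf with exactly the same field layout, where bytes 4-7 of the header hold the record count, bytes 8-9 the header length and bytes 10-11 the record length. The names read_dbf_header and merge_dbf are made up for this example. The header of the first file is kept, only the records of the other files are appended, and the record count in the output header is patched at the end.

```python
import struct


def read_dbf_header(fd):
    """Return (header_bytes, record_count, record_len) for an open .dbf file.

    Assumes the dBase III layout: record count in bytes 4-7, header length
    in bytes 8-9, record length in bytes 10-11, all little-endian.
    """
    prefix = fd.read(32)
    record_count, header_len, record_len = struct.unpack('<IHH', prefix[4:12])
    fd.seek(0)
    header = fd.read(header_len)  # leaves fd positioned at the first record
    return header, record_count, record_len


def merge_dbf(input_paths, output_path):
    """Concatenate the records of several .dbf files under a single header."""
    if not input_paths:
        raise ValueError('no input files given')
    total_records = 0
    first_layout = None
    with open(output_path, 'wb') as out:
        for path in input_paths:
            with open(path, 'rb') as fd:
                header, record_count, record_len = read_dbf_header(fd)
                layout = header[32:]  # field descriptors plus terminator
                if first_layout is None:
                    first_layout = layout
                    out.write(header)  # only the first header is written
                elif layout != first_layout:
                    raise ValueError('field layout differs in %s' % path)
                # Copy the records only; the trailing end-of-file byte is skipped
                out.write(fd.read(record_count * record_len))
                total_records += record_count
        out.write(b'\x1a')  # single end-of-file marker
        out.seek(4)
        out.write(struct.pack('<I', total_records))  # patch the record count
    return total_records


# Possible usage, with the output kept outside the input folder:
# import glob
# merge_dbf(sorted(glob.glob(r'C:\Tom\Oppdrag_2019\Pendle\2018\*.dbf')),
#           r'C:\Tom\Oppdrag_2019\Pendle\FredrikstadAvst.dbf')
```

This reads each file's records into memory in one go, which keeps the sketch short. If the files differ in layout, or come from a tool that stores extra data after the field descriptors, a dedicated DBF library is probably the safer choice.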
Related
I have a folder which has text files in it. I want to be able to put in a path to this folder and have Python go through the folder, open each file and append its content to a list.
import os
folderpath = "/Users/myname/Downloads/files/"
inputlst = [os.listdir(folderpath)]
filenamelist = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        filenamelist.append(filename)
print(filenamelist)
So far this outputs:
['test1.txt', 'test2.txt', 'test3.txt', 'test4.txt', 'test5.txt', 'test6.txt', 'test7.txt', 'test8.txt', 'test9.txt', 'test10.txt']
I want the code to take each of these files, open it and put all of its content into a single huge list, not just print the file names. Is there any way to do this?
You should use the built-in open() for this.
See its documentation for the advanced options.
Anyway, here is one way you can do it:
import os
folderpath = r"yourfolderpath"
filenamecontent = []
for filename in os.listdir(folderpath):
    if filename.endswith(".txt"):
        # the with block closes each file once it has been read
        with open(os.path.join(folderpath, filename), 'r') as f:
            filenamecontent.append(f.read())
print(filenamecontent)
If you are using Python 3, you can use:
for filename in filename_list:
    with open(filename, "r") as file_handler:
        data = file_handler.read()
Please keep in mind that filename needs to contain the full (either relative or absolute) path to your file.
This way, your file handler will be closed automatically when you leave the with block.
More information here: https://docs.python.org/fr/3/library/functions.html#open
On a side note, in order to list the files, you might want to have a look at glob and use:
filename_list = glob.glob("/path/to/files/*.txt")
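Putting the two suggestions together, here is a minimal sketch that collects the content of every matching file into one list. The function name read_text_files and its parameters are made up for this example; glob.glob returns full paths when it is given a full pattern, so no extra path joining is needed inside the loop.

```python
import glob
import os


def read_text_files(folder, pattern="*.txt"):
    """Return a list holding the full text of every file in folder that matches pattern."""
    contents = []
    # sorted() makes the order predictable; glob itself gives no ordering guarantee
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, "r") as handle:  # closed automatically after the block
            contents.append(handle.read())
    return contents


# Possible usage:
# all_texts = read_text_files("/Users/myname/Downloads/files/")
```

If one flat list of lines is wanted instead of one string per file, contents.extend(handle.read().splitlines()) could replace the append call.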
You can use fileinput
Code:
import fileinput
import os
folderpath = "your_path_to_directory_where_files_are_stored"
# Full paths of all the files in the folder which are in .txt format
file_list = [os.path.join(folderpath, a) for a in os.listdir(folderpath) if a.endswith(".txt")]
get_all_files = fileinput.input(file_list)
with open("alldata.txt", 'a') as writefile:
    for line in get_all_files:
        writefile.write(line)  # each line already ends with its own newline
The above code reads all the data from the .txt files in the specified directory (folderpath) and stores it in alldata.txt. So the long list you wanted is now stored in a .txt file; if you don't want that, you can remove the write step.
Links:
https://docs.python.org/3/library/fileinput.html
https://docs.python.org/3/library/functions.html#open
I am simply trying to create a Python 3 program that runs through all .sql files in a specific directory, applies my regex (which adds ; after a certain instance), and writes the modified content to a separate directory, keeping the respective file names the same.
So, if I had file1.sql and file2.sql in the "/home/files" directory, then after I run the program, the output should be those two files written to "/home/new_files", without changing the content of the original files.
Here is my code:
import glob
import re
folder_path = "/home/files/d_d"
file_pattern = "/*sql"
folder_contents = glob.glob(folder_path + file_pattern)
for file in folder_contents:
    print("Checking", file)
for file in folder_contents:
    read_file = open(file, 'rt',encoding='latin-1').read()
    #words=read_file.split()
    with open(read_file,"w") as output:
        output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))
I receive an error of File name too long: "CREATE EXTERNAL TABLe", and I am also not too sure where I would put my output path (/home/files/new_dd) in my code.
Any ideas or suggestions?
With read_file = open(file, 'rt', encoding='latin-1').read(), read_file holds the whole content of the file, and that content was then passed to open() as the file name. The code provided here iterates over the file names found with the glob.glob pattern, opens each one for reading, processes the data, and opens a file for writing (assuming that the folder newfile_sqls already exists; if not, a FileNotFoundError: [Errno 2] No such file or directory is raised).
import glob
import os
import re
folder_path = "original_sqls"
#original_sqls\file1.sql, original_sqls\file2.sql, original_sqls\file3.sql
file_pattern = "*sql"
# new/modified files folder
output_path = "newfile_sqls"
folder_contents = glob.glob(os.path.join(folder_path,file_pattern))
# iterate over file names
for file_ in [os.path.basename(f) for f in folder_contents]:
    # open to read
    with open(os.path.join(folder_path,file_), "r") as inputf:
        read_file = inputf.read()
    # use variable 'read_file' here
    tmp = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', read_file, flags=re.DOTALL)
    # open to write to (previously created) new folder
    with open(os.path.join(output_path,file_), "w") as output:
        output.write(tmp)
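As a variation on the same idea, here is a hedged Python 3 sketch that wraps the steps in two functions and creates the output folder when it is missing, so the FileNotFoundError mentioned above does not come up. The function names add_semicolons and convert_folder are invented for this example, and latin-1 is assumed only because the question opens its files with that encoding.

```python
import glob
import os
import re

# Same pattern as in the question: a TBLPROPERTIES (...) clause, possibly spanning lines
TBLPROPERTIES = re.compile(r'(TBLPROPERTIES \(.*?\))', flags=re.DOTALL)


def add_semicolons(sql_text):
    """Return sql_text with ';' appended after every TBLPROPERTIES (...) clause."""
    return TBLPROPERTIES.sub(r'\1;', sql_text)


def convert_folder(input_dir, output_dir, encoding='latin-1'):
    """Write a modified copy of every .sql file in input_dir to output_dir."""
    os.makedirs(output_dir, exist_ok=True)  # create the output folder if needed
    written = []
    for path in sorted(glob.glob(os.path.join(input_dir, '*.sql'))):
        with open(path, 'r', encoding=encoding) as source:
            text = source.read()
        target = os.path.join(output_dir, os.path.basename(path))
        with open(target, 'w', encoding=encoding) as output:
            output.write(add_semicolons(text))
        written.append(target)
    return written


# Possible usage:
# convert_folder("/home/files", "/home/new_files")
```

Note that the non-greedy pattern stops at the first closing parenthesis, so a property value that itself contains ")" would get the semicolon in the wrong place.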
I'd like to read the contents of every file in a folder/directory and then print them at the end (I eventually want to pick out bits and pieces from the individual files and put them in a separate document)
So far I have this code
import os
path = 'results/'
fileList = os.listdir(path)
for i in fileList:
    file = open(os.path.join('results/'+ i), 'r')
allLines = file.readlines()
print(allLines)
At the end I don't get any errors, but it only prints the contents of the last file in my folder as a series of strings, and I want to make sure it's reading every file so I can then access the data I want from each file. I've looked online and I can't find where I'm going wrong. Is there any way of making sure the loop is iterating over all my files and reading all of them?
Also, I get the same result when I use
file = open(os.path.join('results/',i), 'r')
in the 5th line
Please help I'm so lost
Thanks!!
Separate the different functions of the thing you want to do.
Use generators wherever possible, especially if there are a lot of files or large files.
Imports
from pathlib import Path
import sys
Deciding which files to process:
source_dir = Path('results/')
files = source_dir.iterdir()
[Optional] Filter files
For example, if you only need files with extension .ext
files = source_dir.glob('*.ext')
Process files
def process_files(files):
    for file in files:
        with file.open('r') as file_handle:
            for line in file_handle:
                # do your thing
                yield line
Save the lines you want to keep
def save_lines(lines, output_file=sys.stdout):
    for line in lines:
        output_file.write(line)
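The pieces above can be chained so that each line flows through the generators one at a time. The following is only a sketch of how that might look: iter_lines and keep_matching are hypothetical names, and the keyword filter stands in for whatever "do your thing" ends up being. Directories are skipped because iterdir() returns them as well.

```python
from pathlib import Path
import sys


def iter_lines(files):
    """Yield every line of every regular file in files, one line at a time."""
    for file in files:
        if not file.is_file():  # iterdir() also yields sub-directories
            continue
        with file.open('r') as file_handle:
            for line in file_handle:
                yield line


def keep_matching(lines, keyword):
    """Yield only the lines that contain keyword (a stand-in for real filtering)."""
    for line in lines:
        if keyword in line:
            yield line


def save_lines(lines, output_file=sys.stdout):
    """Write the lines to output_file, which defaults to standard output."""
    for line in lines:
        output_file.write(line)


# Possible usage: print every line containing "energy" from the files in results/
# save_lines(keep_matching(iter_lines(sorted(Path('results/').iterdir())), 'energy'))
```

Because every stage is a generator, only one line is held in memory at a time, regardless of how many files there are.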
You forgot the indentation on this line: allLines = file.readlines()
Maybe you can try this:
import os
allLines = []
path = 'results/'
fileList = os.listdir(path)
for i in fileList:
    file = open(os.path.join(path, i), 'r')
allLines.append(file.read())
print(allLines)
You forgot to indent this line: allLines.append(file.read()).
Because it was outside the loop, it ran only once, after the for loop had finished, so it appended only the content of the last file that the file variable still referred to. Also, there is no need to use readlines() here; just use read() instead:
import os
allLines = []
path = 'results/'
fileList = os.listdir(path)
for name in fileList:
    # read each file inside the loop and close it right away
    with open(os.path.join(path, name), 'r') as file:
        allLines.append(file.read())
print(allLines)
This also creates a file containing the contents of all the files you wanted to print.
import os
rootdir = 'C:\\Users\\you\\folder\\'  # your folder
f = open('final_file.txt', 'a')
for root, dirs, files in os.walk(rootdir):
    for filename in files:
        full_name = os.path.join(root, filename)  # os.walk yields bare file names
        data = open(full_name).read()
        f.write(data + "\n")
f.close()
This is a similar case, with more features: Copying selected lines from files in different directories to another file
I am using os.walk in python 2.7 to open multiple files, then, add all lines of interest of those files to a list. Later I'd want to edit those lines with fileinput and close it. How can I achieve this? Using the code below is how I'm opening the files:
import os
import fnmatch
import fileinput
lines = []
def openFiles():
    for root, dirs, files in os.walk('/home/test1/'):
        for lists in fnmatch.filter(files, "*.txt"):
            filepath = os.path.join(root, lists)
            print filepath
            with open(filepath, "r") as sources:#opens 8 files and read their lines
                #edit = fileinput.input(filepath, inplace=1)
                for line in sources:
                    if line.startswith('xe') :
                        lines.append(line)
Then later, for each line that starts with xe, I'd like to add a # in front of it and then close that file. I'd like to do that in a different function.
Here's the way I do it, adding to your code:
import os
import fnmatch
import fileinput
def openFiles(dir):
    filePaths = []
    for root, dirs, files in os.walk(dir):
        for textFile in fnmatch.filter(files, "*.txt"):
            filepath = os.path.join(root, textFile)
            filePaths.append(filepath)
    return filePaths
def prefixLines(filepaths, chartoPrefix, prefixWith):
    res = ''
    for filepath in filepaths:
        # Read file
        with open(filepath, 'r') as f:
            for line in f:
                if line.startswith(chartoPrefix):
                    res += prefixWith + line
                else:
                    res += line
        # Write to file
        with open(filepath, 'w') as f:
            f.write(res)
        res = '' # Reset res
prefixLines(openFiles(r'/home/test1/'), 'xe', '#')
prefixLines suffers from some shortcomings:
1. Because we read all the lines of a file and store them in res, we may run out of memory for large files.
2. If the programmer somehow forgot to indent res = '' in the right block, or if resetting res was omitted completely, and the code ran on actual files that the user needs, you would end up writing the contents of the previously read file to the next file, and the last file would have the contents of all the read files. That's why you have to use this code in a testing environment or use it cautiously.
This code only serves to demonstrate how you could achieve the desired effect: prefixing the lines of a file that start with one string with another string. A slight improvement of this code is therefore recommended. For example, instead of reading all the contents of the file and storing them in res, you could simply save the numbers of the lines that need to be prefixed, thus eliminating the need to load all the data into memory. enumerate could also be helpful for getting the line numbers; it returns an iterator in 2.7. By obviating res, we not only save memory but also eliminate the shortcoming in bullet 2.
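One way to get rid of res altogether, shown here as a sketch and not as the approach described in this answer, is to stream each file line by line into a temporary file and then swap the temporary file into place. The function name prefix_lines_streaming is made up, and the code assumes Python 3, because os.replace does not exist in 2.7.

```python
import os
import tempfile


def prefix_lines_streaming(filepath, starts_with, prefix):
    """Rewrite filepath, adding prefix to each line that starts with starts_with.

    Only one line is held in memory at a time. Returns the number of changed lines.
    """
    directory = os.path.dirname(os.path.abspath(filepath))
    changed = 0
    # The temporary file lives in the same folder so the final swap stays on one disk
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as temp, open(filepath, 'r') as source:
            for line in source:
                if line.startswith(starts_with):
                    line = prefix + line
                    changed += 1
                temp.write(line)
        os.replace(temp_path, filepath)  # swap the rewritten file into place
    except Exception:
        os.remove(temp_path)  # do not leave a stray temporary file behind
        raise
    return changed


# Possible usage with the openFiles helper from the answer above:
# for path in openFiles(r'/home/test1/'):
#     prefix_lines_streaming(path, 'xe', '#')
```

Since nothing is accumulated between files, there is no shared buffer that could leak the contents of one file into the next. One side effect to be aware of: the rewritten file gets the permissions of the temporary file, which mkstemp creates as readable and writable by the owner only.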
I ended up doing it this way. I'm using classes in my main code, so there it's split into two functions instead of one. In my main code, I use a list to hold all the file paths and open each path from the list with fileinput, like this: for line in fileinput.FileInput(pathlist, inplace=1): do something. I do thank @direprobs for her answer, as she shed some light on how I'm supposed to do this.
import fnmatch
import fileinput
import os
import sys
def openFiles():
    for dirpath, dirs, files in os.walk('/home/test1/'):
        for filename in fnmatch.filter(files, "*.txt"):
            filepaths = os.path.join(dirpath, filename)
            for line in fileinput.FileInput(filepaths, inplace=1):
                if line.startswith("xe"):
                    add = "# {}".format(line)
                    line = line.replace(line, add)
                sys.stdout.write(line)
            fileinput.close()
openFiles()
I have a directory /directory/some_directory/ and in that directory I have a set of files. Those files are named in the following format: <letter>-<number>_<date>-<time>_<dataidentifier>.log, for example:
ABC1-123_20162005-171738_somestring.log
DE-456_20162005-171738_somestring.log
ABC1-123_20162005-153416_somestring.log
FG-1098_20162005-171738_somestring.log
ABC1-123_20162005-031738_somestring.log
DE-456_20162005-171738_somestring.log
I would like to read a subset of those files (for example, only the files named ABC1-123*.log) and export all their contents to a single CSV file (for example, output.csv), that is, a CSV file that holds the data from all the individual files together.
The code that I have written so far:
#!/usr/bin/env python
import os
file_directory=os.getcwd()
m_class="ABC1"
m_id="123"
device=m_class+"-"+m_id
for data_file in sorted(os.listdir(file_dir)):
    if str(device)+"*" in os.listdir(file_dir):
        print data_file
I don't know how to read only a subset of filtered files, nor how to export them to a common CSV file.
How can I achieve this?
Just use the re library to match the file name pattern, and use the csv library to export.
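A minimal Python 3 sketch of that suggestion follows. Since the question does not say what the log lines look like, the CSV layout here (one row per log line, with the source file name in the first column) is purely an assumption, and export_logs_to_csv is a made-up name.

```python
import csv
import os
import re


def export_logs_to_csv(directory, device, output_path):
    """Write every line of the .log files for device into one CSV file.

    Returns the list of matched file names, in sorted order.
    """
    # Matches e.g. ABC1-123_20162005-171738_somestring.log for device "ABC1-123"
    pattern = re.compile(r'^' + re.escape(device) + r'_.*\.log$')
    names = sorted(n for n in os.listdir(directory) if pattern.match(n))
    with open(output_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['source_file', 'line'])  # assumed column layout
        for name in names:
            with open(os.path.join(directory, name), 'r') as log_file:
                for line in log_file:
                    writer.writerow([name, line.rstrip('\n')])
    return names


# Possible usage:
# export_logs_to_csv(os.getcwd(), 'ABC1-123', 'output.csv')
```

The underscore after the device name in the pattern keeps a device such as ABC1-1234 from being picked up by accident. If the log lines are already comma-separated, splitting each line into columns before calling writerow may be more useful than storing it as a single field.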
Only a few adjustments; you were close.
# device is built as in the question: device = m_class + "-" + m_id
filesFromDir = os.listdir(os.getcwd())
fileList = [file for file in filesFromDir if file.startswith(device)]
f = open("LogOutput.csv", "ab")
for file in fileList:
    #print "Processing", file
    with open(file, "rb") as log_file:
        txt = log_file.read()
        f.write(txt)
        f.write(b"\n")  # bytes literal, because the output is opened in binary mode
f.close()
Your question could be better stated. Based on your current code snippet, I'll assume that you want to:
1. Filter files in a directory based on a glob pattern.
2. Concatenate their contents into a file named output.csv.
In Python you can achieve (1.) by using glob to list the file names.
import glob
for filename in glob.glob('foo*bar'):
    print filename
That would print all files starting with foo and ending with bar in
the current directory.
For (2.) you just read the file and write its content to your desired
output, using python's open() builtin function:
open('filename', 'r')
(Using 'r' as the mode you are asking python to open the file for
"reading", using 'w' you are asking python to open the file for
"writing".)
The final code would look like the following:
import glob
device = 'ABC1-123'
with open('output.csv', 'w') as output:
    for filename in glob.glob(device+'*'):
        with open(filename, 'r') as log_file:
            output.write(log_file.read())
You can use the os module to list the files.
import os
files = os.listdir(os.getcwd())
m_class = "ABC1"
m_id = "123"
device = m_class + "-" + m_id
file_extension = ".log"
# filter the files by their extension and the starting name
files = [x for x in files if x.startswith(device) and x.endswith(file_extension)]
f = open("output.csv", "a")
for file in files:
    with open(file, "r") as data_file:
        f.write(data_file.read())
        f.write(",\n")
f.close()