I have 100 text files in a folder that I want to load into a list.
I was only able to load one file; how can I load all of them?
Here is what I did:
with open('varmodel/varmodel_2.var') as f:
    varmodel_2 = f.read()
print(varmodel_2)
However, instead of just _2, the files are numbered from 1 to 100.
You can use the glob module to do just that. It gives you a list of all files/folders matching a pattern in a directory. Here is the code you would use to get a single string containing all of the file contents:
import glob

string = ""
for filename in glob.glob("*"):
    with open(filename, "r") as f:
        string += f.read()
print(string)
Adapted to your varmodel folder, collecting each file's contents into a list:
all_files = []
for path in glob.glob('varmodel/*'):
    with open(path) as f:
        varmodel = f.read()
        # not sure about the txt file content;
        # it may need preprocessing before being put into the list
        all_files.append(varmodel)
print(all_files)
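If the numeric order matters, note that glob returns paths in arbitrary order, and a lexical sort would put varmodel_10 before varmodel_2. Here is a sketch that sorts numerically instead, assuming the files are named varmodel_1.var through varmodel_100.var as in your snippet:
import glob, re

def numeric_key(path):
    # pull the first run of digits out of e.g. 'varmodel/varmodel_17.var'
    return int(re.search(r'\d+', path).group())

all_files = []
for path in sorted(glob.glob('varmodel/*.var'), key=numeric_key):
    with open(path) as f:
        all_files.append(f.read())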
I have a directory of 50 txt files. I want to combine the contents of each file into a Python list.
Each file looks like:
line1
line2
line3
I am putting the file paths into a list with this code; I just need to loop through file_list and append the content of each txt file to a list.
from pathlib import Path

def searching_all_files(dirpath=Path(r'C:\num')):
    assert dirpath.is_dir()
    file_list = []
    for x in dirpath.iterdir():
        if x.is_file():
            file_list.append(x)
        elif x.is_dir():
            # recurse into subdirectories
            file_list.extend(searching_all_files(x))
    return file_list
But I am unsure of the best method.
Maybe loop something close to this?
NOTE: NOT REAL CODE!!!! JUST A THOUGHT PULLED FROM THE AIR. THE QUESTION ISN'T HOW TO FIX THIS. I AM JUST SHOWING IT AS A THOUGHT. ALL METHODS WELCOME.
file_path = Path(r'.....')
with open(file_path) as f:
    source_path = f.read().splitlines()
    source_nospaces = [x.strip(' ') for x in source_path]
    return source_nospaces
You could make use of pathlib's Path.rglob in order to search for all files in a directory recursively, and readlines() to append the contents to a list:
from pathlib import Path

files = Path('/tmp/text').rglob('*.txt')
res = []
for file in files:
    with open(file) as f:
        res += f.readlines()
print(res)
Out:
['file_content2\n', 'file_content3\n', 'file_content1\n']
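Note that rglob yields files in no guaranteed order (the sample output above is not sorted); if a deterministic order matters, one option is to sort the paths first:
files = sorted(Path('/tmp/text').rglob('*.txt'))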
Let's say I have multiple text files in my path C:/Users/text_file/.
I want to process them in a loop and set a variable for each processed text file, named after the filename.
To give an idea, if I have in the text_file folder:
readfile_1.txt, readfile_2.txt, readfile_3.txt, ..., readfile_n.txt
and I want to preprocess them with:
with open(file_path, 'r', encoding='utf8') as f:
    processed = [x.strip() for x in f]
I did:
import glob, os

path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))

txtfiles = {}
for file in files:
    file_path = path + file
    print('Processing...' + file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    txtfiles[file_path] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)
But what I want from the loop is variables with the prefix cc, i.e. cc_readfile_1, cc_readfile_2 and cc_readfile_3,
so that whenever I call cc_readfile_1 or cc_readfile_2, the output is as it would be if done one by one, i.e.:
with open(r'C:\Users\text_file\readfile_1.txt', 'r', encoding='utf8') as f:
    cc_readfile_1 = [x.strip() for x in f]
print(cc_readfile_1)
If you want to know why I need this: I have over 100 text files which I need to process and keep in variables in a Python notebook for further analysis. I do not want to execute the code 100 times, renaming file names and variables each time.
You can use f-strings to generate the correct key; you will then be able to access them in the dictionary:
import glob, os

path = 'C:/Users/text_file/'
files = os.listdir(path)
print(len(files))

txtfiles = {}
for file in files:
    file_path = path + file
    print('Processing...' + file_path)
    with open(file_path, 'r', encoding='utf8') as f:
        processed = [x.strip() for x in f]
    # key by the filename without extension, so the key reads cc_readfile_1 rather than the full path
    txtfiles[f"cc_{os.path.splitext(file)[0]}"] = processed

for filename, contents in txtfiles.items():
    print(filename, contents)
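For example, to read back what would have been cc_readfile_1 (assuming a readfile_1.txt exists in the folder):
print(txtfiles["cc_readfile_1"])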
Use a dictionary where the keys are the files' basenames without extension. There's no real point in adding a constant prefix (cc_).
So, for example, if the filename is readfile_1.txt, then the key would simply be readfile_1.
The value associated with each key should be a list of all of the (stripped) lines in the file.
from os.path import join, basename, splitext
from glob import glob

PATH = 'C:/Users/text_file'
EXT = '*.txt'

all_files = dict()
for file in glob(join(PATH, EXT)):
    with open(file) as infile:
        key = splitext(basename(file))[0]
        all_files[key] = list(map(str.strip, infile))
Subsequently, to access the lines from readfile_1.txt it's just:
all_files['readfile_1']
I am new to Python. I am looking for the number of occurrences of a text string across all the text files in a given folder, i.e. the total count of that particular string.
def errors():
    errors = 0
    file = open("\\d:\\myfolder\\*.txt", "r")
    data = file.read()
    errors = data.count("errors")
    return errors

print("Errors:", errors)
Your code won't work as written (open() does not accept wildcards), but if I understand what you want to do, then here's some code to get you going:
from glob import glob

text_file_paths = glob("d:\\myfolder\\*.txt")
error_counting = 0
for file_path in text_file_paths:
    with open(file_path, 'r') as f:
        all_file_lines = f.readlines()
    # count occurrences of 'errors' in every line of this file
    error_counting += sum(line.count('errors') for line in all_file_lines)

print(error_counting)
Does that help?
I am trying to save the output from x .txt files into just one .txt file.
The .txt file should look like the output you can see in the picture below.
What this program actually does is read a couple of .txt files with tons of data which I filter out using regex.
My source code:
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')

for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename) as lines:
        for line in lines:
            match = values_re.search(line)
            if match:
                values = match.group(0).split('\t')
                assert values[0] == ''
                values = values[1:]
                print(values)
Thank you for your time! :)
Then you just need to open a file and write the values to it. Try this; you might need to adjust the formatting (I cannot test it since I don't have your text files, so I am assuming the output you have in values is correct). Keep in mind that the file is opened in append mode, so if you run this more than once you will get duplicates.
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')

with open("myOutFile.txt", "a") as outF:
    for filename in glob.glob(os.path.join(folder_path, '*.txt')):
        with open(filename) as lines:
            for line in lines:
                match = values_re.search(line)
                if match:
                    values = match.group(0).split('\t')
                    assert values[0] == ''
                    values = values[1:]
                    # values is a list, so join it back into one tab-separated line before writing
                    outF.write('\t'.join(values) + '\n')
                    print(values)
Please help me: I have some txt files in a folder, and I want to read and summarize all their data into one txt file. How can I do it with Python?
For example:
folder name: data
file names in that folder: log1.txt, log2.txt, log3.txt, log4.txt
data in log1.txt: Size: 1,116,116,306 bytes
data in log2.txt: Size: 1,116,116,806 bytes
data in log3.txt: Size: 1,457,116,806 bytes
data in log4.txt: Size: 1,457,345,000 bytes
My expected output is a file result.txt whose data is:
1,116,116,306
1,116,116,806
1,457,116,806
1,457,345,000
Did you mean that you want to read the contents of each file and write all of them into a different file?
import os

# returns the names of the files in the directory data as a list
list_of_files = os.listdir("data")
lines = []
for file in list_of_files:
    f = open(os.path.join("data", file), "r")
    # collect every line in the file
    lines.extend(f.readlines())
    f.close()

# write the collected lines to result.txt
result = open("result.txt", "w")
result.writelines(lines)
result.close()
If you are looking for the size of each file instead of the contents, change the two lines:
f = open(os.path.join("data", file), "r")
lines.extend(f.readlines())
to:
lines.append(str(os.stat(os.path.join("data", file)).st_size) + '\n')
(writelines() expects strings and does not add newlines, hence the str(...) + '\n'.)
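Put together, the size variant would look something like this (a sketch under the same assumptions, still reading file names from the data folder):
import os

list_of_files = os.listdir("data")
lines = []
for file in list_of_files:
    # record each file's size in bytes, one per line
    lines.append(str(os.stat(os.path.join("data", file)).st_size) + '\n')

with open("result.txt", "w") as result:
    result.writelines(lines)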
File concat.py:
#!/usr/bin/env python
import sys, os

def main():
    folder = sys.argv[1]  # first argument contains the path
    with open('result.txt', 'w') as result:  # the result file is written to the current working directory
        for path in next(os.walk(folder))[2]:  # list all files directly inside the provided path
            with open(os.path.join(folder, path), 'r') as source:
                result.write(source.read())  # append each file's content to the result

main()
Usage: concat.py <your path>
Import os. Then list the folder contents using os.listdir('data') and store them in an array. For each entry you can get the size by calling os.stat(entry).st_size. Each of these entries can then be written to a file.
Combined:
import os

outfile = open('result.txt', 'w')
path = 'data'
files = os.listdir(path)
for file in files:
    outfile.write(str(os.stat(path + "/" + file).st_size) + '\n')
outfile.close()
You have to find all files that you are going to read:
path = "data"
files = os.listdir(path)
You have to read all the files and, for each of them, collect the size and the content:
all_sz = {i:os.path.getsize(path+'/'+i) for i in files}
all_data = ''.join([open(path+'/'+i).read() for i in files])
You need a formatted print:
msg = 'this is ...;'
sp2 = ' ' * 4
sp = ' ' * len(msg) + sp2
print(msg + sp2, end='')
for i in all_sz:
    print(sp, "{:,}".format(all_sz[i]))
If one needs to merge sorted files so that the output file is sorted too,
they can use the merge method from the heapq standard library module.
from heapq import merge
from os import listdir
from os.path import join

# listdir returns bare file names, so join them with the directory path
files = [open(join(path, f)) for f in listdir(path)]
with open(outfile, 'w') as out:
    for rec in merge(*files):
        out.write(rec)
Records are kept sorted in lexical order; if one needs something different, merge accepts a key=... optional argument to specify a different ordering function.
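For instance, a small sketch (with made-up records, not from the question) that merges two sorted lists numerically by each line's leading integer:
from heapq import merge

a = ['1 apple\n', '3 cherry\n']
b = ['2 banana\n', '4 date\n']
# key= lets merge compare records by something other than the raw string
for rec in merge(a, b, key=lambda line: int(line.split()[0])):
    print(rec, end='')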