I have a number of files that I am reading in. I would like to have a list that contains the file contents. I read the whole content into a string.
I need a list that looks like this:
["Content of the first file", "content of the second file",...]
I have tried various methods like append, extend and insert, but they all seem to expect a list as a parameter rather than a str, so I end up getting this:
[["Content of the first file"], ["content of the second file"],...]
How can I get a list that contains strings and then add strings without turning it into a list of lists?
EDIT
Some more code
linesNeg = []
linesPos = []

for file in os.listdir("neg"):
    with open("neg\\" + file, 'r', encoding="utf-8") as f:
        linesNeg.append(f.read().splitlines())

for file in os.listdir("pos"):
    with open("pos\\" + file, 'r', encoding="utf-8") as f:
        linesPos.append(f.read().splitlines())

listTotal = linesNeg + linesPos
contents_list = []
for filename in filename_list:
    with open(filename) as f:
        contents_list.append(f.read())
There's definitely more than one way to do it. Assuming you have the opened files as file objects f1 and f2:
alist = []
alist.extend([f1.read(), f2.read()])
or
alist = [f.read() for f in (f1, f2)]
Personally I'd do something like this, but there's more than one way to skin this cat.
file_names = ['foo.txt', 'bar.txt']

def get_string(file_name):
    with open(file_name, 'r') as fh:
        contents = fh.read()
    return contents

strings = [get_string(f) for f in file_names]
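On Python 3.4+ the same thing can be written with pathlib, which handles opening and closing for you. A self-contained sketch (the sample files and their contents are made up for illustration):

```python
from pathlib import Path

# create two small sample files so the sketch is self-contained
Path('foo.txt').write_text('Content of the first file', encoding='utf-8')
Path('bar.txt').write_text('content of the second file', encoding='utf-8')

file_names = ['foo.txt', 'bar.txt']
# Path.read_text opens, reads, and closes each file in one call
strings = [Path(name).read_text(encoding='utf-8') for name in file_names]
print(strings)  # ['Content of the first file', 'content of the second file']
```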
Related
I have 2 files with email addresses in them; some of these addresses are the same and some aren't. I need to see which of the email addresses in file1 aren't in file2. How can I do that? Also, it would be great if I can put them in a list too.
here's what I got:
file1 = open("competitor_accounts.txt")
file2 = open("accounts.txt")
I know it ain't much, but I need help getting started
I thought maybe using a for loop with if statements? but I just don't know how.
You can read each file's contents to a separate list and then compare the lists to each other like so
with open('accounts.txt') as f:
    accounts = [line for line in f]

with open('competitor_accounts.txt') as f:
    competitors = [line for line in f]

accounts_not_competitors = [line for line in accounts if line not in competitors]
competitors_not_accounts = [line for line in competitors if line not in accounts]
You could use open with readlines() as well, but using with is generally preferred since you don't need to explicitly close() the file after you're done reading it:
file_a = open('accounts.txt')
accounts = file_a.readlines()
file_a.close()
The two lines at the end are list comprehensions that build a new list from the matches between the existing lists. They can be written out in a longer, more explicit form:

accounts_not_competitors = []
for line in accounts:
    if line not in competitors:
        accounts_not_competitors.append(line)
I believe this should be enough to get you started with the syntax and functionality in case you wanted to do some other comparisons between the two.
Assuming that each file has only one email address per line:
First save each file's lines into a list, and create another list to hold the difference.
Then loop through the file1 list and check whether each item is present in the file2 list; if not, add that item to the diff list.
f1_list = []
f2_list = []
diff = []

with open(file1name, 'r', encoding='utf-8') as f1:
    for line in f1:
        f1_list.append(line)

with open(file2name, 'r', encoding='utf-8') as f2:
    for line in f2:
        f2_list.append(line)

for email in f1_list:
    if email not in f2_list:
        diff.append(email)

print(diff)
You can use a set:
with open('competitor_accounts.txt', 'r') as file:
    competitor_accounts = set([mail for mail in file])

with open('accounts.txt', 'r') as file:
    accounts = set([mail for mail in file])

result = list(competitor_accounts - accounts)
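One caveat with the set approach: each line read from a file keeps its trailing newline, so 'a@b.com\n' and 'a@b.com' will not compare equal. A self-contained sketch that strips whitespace first (the sample addresses are made up for illustration):

```python
# write two tiny sample files so the sketch is self-contained
with open('competitor_accounts.txt', 'w') as f:
    f.write('alice@example.com\nbob@example.com\n')
with open('accounts.txt', 'w') as f:
    f.write('bob@example.com\n')

# strip() drops the trailing newline before the addresses go into each set
with open('competitor_accounts.txt') as f:
    competitor_accounts = {line.strip() for line in f}
with open('accounts.txt') as f:
    accounts = {line.strip() for line in f}

# set difference: addresses in the competitor file that are not in accounts
result = sorted(competitor_accounts - accounts)
print(result)  # ['alice@example.com']
```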
Unfortunately I have the problem that I cannot read strings from files. The relevant part of the code is:
names = ["Johnatan", "Jackson"]
I tried it with this:
open("./names.txt", "r")
instead of the list of names above, but unfortunately this does not work; if I run it without the file, it works without problems.
I would be very happy if someone could help me and tell me exactly where the problem is.
f = open('./names.txt', 'r', encoding='utf-8')
words_string = f.read()
words = words_string.split(',')
f.close()
print(words)
I hope it helps. As your file contains all the words comma-separated, read() gives us the whole file as a single string, and then we simply split that string using .split(','). This results in the list of words that you need.
Try reading the data in the file like this:
with open(file_path, "r") as f:
    data = f.readlines()
data is now a list of lines, e.g. ['maria, Johnatan, Jackson']; you can parse each line using split(",").
You can split the content into names if the file contains, for example, Johnatan,Jackson,Maria, by doing:
with open("./names.txt", "r", encoding='utf-8') as fp:
    content = fp.read()
names = content.split(",")
You can also do:
names = open("./names.txt", "r", encoding='utf-8').read().split(",")
if you want it to be a one-liner.
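If the file has spaces after the commas (as in maria, Johnatan, Jackson), each piece keeps the stray space after splitting; stripping each piece fixes that. A self-contained sketch with a made-up names.txt:

```python
# create a sample names.txt so the sketch is self-contained
with open('names.txt', 'w', encoding='utf-8') as fp:
    fp.write('maria, Johnatan, Jackson')

with open('names.txt', 'r', encoding='utf-8') as fp:
    # strip() removes the stray spaces (and any newline) around each name
    names = [name.strip() for name in fp.read().split(',')]
print(names)  # ['maria', 'Johnatan', 'Jackson']
```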
I want to "read" the content of many txt files I have in a dir, to a list.
The thing is that I want every object in the list to be a list too.
I'd like to be able to access each "file" (or the content of a file) by index, in order to later train an NLP model on it. That's also why I used line.strip(): I need each file's content broken into stripped "lines".
Here is the code I tried; however, I get this error:
IndexError: list index out of range
os.chdir(r'C:\Users\User1\Article\BBC\bbc\entertainment')
ent_txts = glob.glob('*.txt')

ent_docs = []
d = 0
for i in ent_txts:
    with open(i, 'r') as f:
        for line in f:
            ent_docs[d].append(line.strip())
    d += 1
I think the problem is with the fact that I'm trying to address a list index that hasn't been created.
I'm sure there's must be a simple way to do it, though I can't find it.
I'd be glad for any help!
The error occurs because you don't have any inner list to append to. I would fix it like so:
for i in ent_txts:
    with open(i, 'r') as f:
        file_lines = [line.strip() for line in f]
    ent_docs.append(file_lines)
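Put together, a self-contained version of this fix (the file names and contents here are made up for illustration); each file ends up as one inner list you can access by index:

```python
# two tiny sample files stand in for the real .txt documents
with open('doc0.txt', 'w') as f:
    f.write('first line\nsecond line\n')
with open('doc1.txt', 'w') as f:
    f.write('only line\n')

ent_txts = ['doc0.txt', 'doc1.txt']  # glob.glob('*.txt') in the real code

ent_docs = []
for name in ent_txts:
    with open(name) as f:
        # one inner list of stripped lines per file
        ent_docs.append([line.strip() for line in f])

print(ent_docs[0])  # ['first line', 'second line']
```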
Alternatively, you can keep your original loop structure by using a defaultdict, which creates the inner list the first time each key is accessed:
from collections import defaultdict

os.chdir(r'C:\Users\User1\Article\BBC\bbc\entertainment')
ent_txts = glob.glob('*.txt')

ent_docs = defaultdict(list)
d = 0
for i in ent_txts:
    with open(i, 'r') as f:
        for line in f:
            ent_docs[d].append(line.strip())
    d += 1
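Note that with the defaultdict approach ent_docs is a dict keyed by integer index rather than a list; if you later need a plain list of lists, you can convert it in key order. A small sketch with made-up lines:

```python
from collections import defaultdict

ent_docs = defaultdict(list)
ent_docs[0].append('first line')   # accessing a missing key creates an empty list
ent_docs[0].append('second line')
ent_docs[1].append('only line')

# convert back to a plain list of lists, in key order
ent_list = [ent_docs[k] for k in sorted(ent_docs)]
print(ent_list)  # [['first line', 'second line'], ['only line']]
```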
I currently have the below code in Python 3.x:
lst_exclusion_terms = ['bob', 'jenny', 'michael']
file_list = ['1.txt', '2.txt', '3.txt']

for f in file_list:
    with open(f, "r", encoding="utf-8") as file:
        content = file.read()
        if any(entry in content for entry in lst_exclusion_terms):
            print(content)
What I am aiming to do is to review the content of each file in the list file_list. When reviewing the content, I then want to check to see if any of the entries in the list lst_exclusion_terms exists. If it does, I want to remove that entry from the list.
So, if 'bob' is within the content of 2.txt, this will be removed (popped) out of the list.
I am unsure how to replace my print(content) with the command to identify the current index number for the item being examined and then remove it.
Any suggestions? Thanks
You want to filter a list of files based on whether they contain some piece(s) of text.
There is a Python built-in function filter which can do that. filter takes a function that returns a boolean, and an iterable (e.g. a list), and returns an iterator over the elements from the original iterable for which the function returns True.
So first you can write that function:
def contains_terms(filepath, terms):
    with open(filepath) as f:
        content = f.read()
    return any(term in content for term in terms)
Then use it in filter, and construct a list from the result:
file_list = list(filter(lambda f: not contains_terms(f, lst_exclusion_terms), file_list))
Of course, the lambda is required because contains_terms takes two arguments and returns True if the terms are in the file, which is roughly the opposite of what you want (though it arguably makes more sense from the function's own point of view). You could specialise the function to your use case and remove the need for the lambda.
def is_included(filepath):
    with open(filepath) as f:
        content = f.read()
    return all(term not in content for term in lst_exclusion_terms)
With this function defined, the call to filter is more concise:
file_list = list(filter(is_included, file_list))
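Equivalently, the same filtering can be written as a list comprehension, which some find more readable than filter. A self-contained sketch using the same file names as the question, with made-up contents:

```python
lst_exclusion_terms = ['bob', 'jenny', 'michael']

# sample files so the sketch runs on its own; contents are made up
for name, text in [('1.txt', 'abcd'), ('2.txt', '5/12/2021'), ('3.txt', 'bob')]:
    with open(name, 'w') as f:
        f.write(text)

def is_included(filepath):
    with open(filepath) as f:
        content = f.read()
    return all(term not in content for term in lst_exclusion_terms)

# keep only the files that contain none of the exclusion terms
file_list = [f for f in ['1.txt', '2.txt', '3.txt'] if is_included(f)]
print(file_list)  # ['1.txt', '2.txt']
```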
I've had a desire like this before, where I needed to delete a list item when iterating over it. It is often suggested to just recreate a new list with the contents you wanted as suggested here
However, here is a quick and dirty approach that can remove the file from the list:
lst_exclusion_terms = ['bob', 'jenny', 'michael']
file_list = ['1.txt', '2.txt', '3.txt']

print("Before removing item:")
print(file_list)

flag = True
while flag:
    flag = False
    for i, f in enumerate(file_list):
        with open(f, "r", encoding="utf-8") as file:
            content = file.read()
        if any(entry in content for entry in lst_exclusion_terms):
            file_list.pop(i)
            flag = True  # rescan from the start until a full pass removes nothing
            break

print("After removing item")
print(file_list)
In this case, file 3.txt was removed from the list since it matched one of the lst_exclusion_terms.
The following were the contents used in each file:
#1.txt
abcd
#2.txt
5/12/2021
#3.txt
bob
jenny
michael
I have a file list.txt that contains a single list only e.g.
[asd,ask,asp,asq]
The list might be a very long one. I want to create a python program len.py that reads list.txt and writes the length of the list within it to the file num.txt. Something like the following:
fin = open("list.txt", "rt")
fout = open("num.txt", "wt")
for list in fin:
    fout.write(len(list))
fin.close()
fout.close()
However this does not work. Can someone point out what needs to be changed? Many thanks.
Use:
with open("list.txt") as f1, open("num.txt", "w") as f2:
for line in f1:
line = line.strip('\n[]')
f2.write(str(len(line.split(','))) + '\n')
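To see what this does on the example input, here is a self-contained run that writes a sample list.txt first (the sample line matches the question):

```python
# sample input file matching the question
with open('list.txt', 'w') as f:
    f.write('[asd,ask,asp,asq]\n')

with open('list.txt') as f1, open('num.txt', 'w') as f2:
    for line in f1:
        # drop the newline and brackets, leaving 'asd,ask,asp,asq'
        line = line.strip('\n[]')
        # count the comma-separated items and write the count as a string
        f2.write(str(len(line.split(','))) + '\n')

with open('num.txt') as f:
    print(f.read().strip())  # 4
```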
with open("list.txt") as fin, open("num.txt", "w") as fout:
input_data = fin.readline()
# check if there was any info read from input file
if input_data:
# split string into list on comma character
strs = input_data.replace('[','').split('],')
lists = [map(int, s.replace(']','').split(',')) for s in strs]
print(len(lists))
fout.write(str(len(lists)))
I updated the code to use the with statement from another answer. I also adapted some code from this answer (How can I convert this string to list of lists?) to more correctly count nested lists.
When Python reads a file with the default methods, it treats the file's content as a string. So the first step is to convert that string into the appropriate content type, and for that you cannot use the default type-casting functions.
You can use the standard-library module ast to parse the data:
import ast

fin = open("list.txt", "r")
fout = open("num.txt", "w")
for line in fin.readlines():
    fout.write(str(len(ast.literal_eval(line))))
fin.close()
fout.close()
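One caveat: ast.literal_eval only accepts valid Python literals, so a line like [asd,ask,asp,asq] with unquoted words raises a ValueError; this approach requires the items in list.txt to be quoted. A self-contained sketch under that assumption, with write() given a str instead of an int:

```python
import ast

# sample list.txt with quoted items, which literal_eval can parse
with open('list.txt', 'w') as f:
    f.write("['asd', 'ask', 'asp', 'asq']\n")

with open('list.txt') as fin, open('num.txt', 'w') as fout:
    for line in fin:
        items = ast.literal_eval(line)  # parses the line into a real list
        fout.write(str(len(items)) + '\n')  # write() needs a str, not an int

with open('num.txt') as f:
    print(f.read().strip())  # 4
```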