Adding text files content into a list of lists - python

I want to read the contents of many txt files in a directory into a list.
The thing is that I want every object in the list to be a list too.
I'd like to access each file's content by index so that I can later train an NLP model on it. That is also why I used line.strip(): I need each file's content split into lines.
Here is the code I tried; however, I get this error:
IndexError: list index out of range
os.chdir(r'C:\Users\User1\Article\BBC\bbc\entertainment')
ent_txts = glob.glob('*.txt')
ent_docs = []
d = 0
for i in ent_txts:
    with open(i, 'r') as f:
        for line in f:
            ent_docs[d].append(line.strip())
    d += 1
I think the problem is that I'm trying to access a list index that hasn't been created yet.
I'm sure there must be a simple way to do it, but I can't find it.
I'd be glad for any help!

The error occurs because ent_docs is empty, so there is no inner list at ent_docs[d] to append to. I would fix it like so:
for i in ent_txts:
    with open(i, 'r') as f:
        file_lines = [line.strip() for line in f]
    ent_docs.append(file_lines)
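The same idea can be wrapped in a small function. This is only a sketch, not the answerer's code: the name read_docs and the use of pathlib are assumptions added here, and the file names are sorted so that each index refers to the same file on every run.

```python
from pathlib import Path


def read_docs(directory):
    """Return one list of stripped lines per .txt file in directory.

    read_docs is a hypothetical helper name, not part of the original answer.
    """
    docs = []
    # sorted() gives a stable file order, so docs[0] is always the same file
    for path in sorted(Path(directory).glob('*.txt')):
        with open(path, 'r') as f:
            docs.append([line.strip() for line in f])
    return docs
```

Called with the directory from the question, read_docs would return a list whose element i holds the lines of the i-th file, without needing os.chdir().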

import glob
import os
from collections import defaultdict
os.chdir(r'C:\Users\User1\Article\BBC\bbc\entertainment')
ent_txts = glob.glob('*.txt')
# a defaultdict creates the inner list the first time a key is accessed
ent_docs = defaultdict(list)
d = 0
for i in ent_txts:
    with open(i, 'r') as f:
        for line in f:
            ent_docs[d].append(line.strip())
    d += 1
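This answer keeps the original loop and only swaps the list for a defaultdict. The sketch below tries to show why that avoids the IndexError, using made-up in-memory data (sample_files is an assumption for illustration) instead of real files.

```python
from collections import defaultdict

# Made-up stand-in for the lines of two files (an assumption for illustration)
sample_files = [['first line', 'second line'], ['only line']]

as_list = []
try:
    as_list[0].append('x')  # no element 0 exists yet, so this raises
except IndexError as exc:
    list_error = str(exc)

as_dict = defaultdict(list)
for d, lines in enumerate(sample_files):
    for line in lines:
        # accessing a missing key creates an empty list first
        as_dict[d].append(line)

# A defaultdict is still a dict: convert it if a real list of lists is needed
docs = [as_dict[d] for d in sorted(as_dict)]
```

One caveat: with this approach an empty file never touches its key, so that key is simply missing from the dictionary and the converted list would be one element shorter.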

Related

python how to put difference of 2 files in a list?

I have 2 files with email addresses in them; some of the addresses appear in both files and some don't. I need to see which of the email addresses in file1 aren't in file2. How can I do that? It would also be great if I could put them in a list.
here's what I got:
file1 = open("competitor_accounts.txt")
file2 = open("accounts.txt")
I know it ain't much, but I need help getting started
I thought maybe using a for loop with if statements? but I just don't know how.
You can read each file's contents into a separate list and then compare the lists to each other like so:
with open('accounts.txt') as f:
    accounts = [line.strip() for line in f]
with open('competitor_accounts.txt') as f:
    competitors = [line.strip() for line in f]
accounts_not_competitors = [line for line in accounts if line not in competitors]
competitors_not_accounts = [line for line in competitors if line not in accounts]
You can also call open() directly and use readlines(), but the with statement is generally preferred because you don't need to explicitly close() the file after you're done reading it.
file_a = open('accounts.txt')
accounts = file_a.readlines()
file_a.close()
The last two lines of the first snippet are list comprehensions: each builds a new list from the items of one list that are missing from the other. They can be written out in a longer form that is easier to read:
accounts_not_competitors = []
for line in accounts:
    if line not in competitors:
        accounts_not_competitors.append(line)
I believe this should be enough to get you started with the syntax and functionality in case you wanted to do some other comparisons between the two.
Assuming that each line in each file holds exactly one email address:
First read each file into a list and create another list to hold the difference.
Then loop through the file1 list and check whether each item is present in the file2 list; if not, add that item to the diff list.
f1_list = []
f2_list = []
diff = []
with open(file1name, 'r', encoding='utf-8') as f1:
    for line in f1:
        f1_list.append(line)
with open(file2name, 'r', encoding='utf-8') as f2:
    for line in f2:
        f2_list.append(line)
for email in f1_list:
    if email not in f2_list:
        diff.append(email)
print(diff)
You can use a set:
with open('competitor_accounts.txt', 'r') as file:
    competitor_accounts = set(mail.strip() for mail in file)
with open('accounts.txt', 'r') as file:
    accounts = set(mail.strip() for mail in file)
result = list(competitor_accounts - accounts)
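Set subtraction is fast but does not keep the order of the file. If the order matters, one possible variant (a sketch only; lines_only_in_first is a hypothetical name, not from any answer) uses a set for the lookups and a list for the result:

```python
def lines_only_in_first(path_a, path_b):
    """Return stripped, non-empty lines of path_a that do not occur in path_b."""
    with open(path_b, 'r') as f:
        # a set makes each membership test cheap
        seen = {line.strip() for line in f}
    with open(path_a, 'r') as f:
        return [line.strip() for line in f
                if line.strip() and line.strip() not in seen]
```

For the question's files this would be called as lines_only_in_first('competitor_accounts.txt', 'accounts.txt'). Duplicates inside the first file are kept as they are.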

Python3 - list index out of range - extracting data from file

I want to extract data from a file and change the value of an entry with a 'for-loop'.
f = open(r"C:\Users\Measurement\LOGGNSS.txt", "r")
x = 0
content = [[],[]]
for line in f:
    actualline = line.strip()
    content.append(actualline.split(","))
    x += 1
f.close
print(x)
for z in range(x):
    print(z)
    print(content[z][1])
IndexError: list index out of range
Using a literal index instead of the variable 'z' works fine, but I need to change the first entry of every row in the whole 2D array.
Why doesn't it work?
Your code has several problems.
First of all, use the with statement to open/close files correctly.
Then, you don't need to use a variable like x to keep track of the number of lines, just use enumerate() instead!
Here is how I would refactor your code to make it slimmer and more readable.
input_file = r"C:\Users\Measurement\LOGGNSS.txt"
content = []
with open(input_file, 'r') as f:
    for line in f:
        clean_line = line.strip().split(",")
        content.append(clean_line)
for z, data in enumerate(content):
    print(z, '\n', data)
Note that you could print the content while reading the file in one single loop.
with open(input_file, 'r') as f:
    for z, line in enumerate(f):
        clean_line = line.strip().split(",")
        content.append(clean_line)
        print(z, '\n', clean_line)
Finally, if you are dealing with a plain and simple csv file, then use the csv module from the standard library.
import csv
with open(input_file, 'r', newline='') as f:
    # csv.reader is lazy, so build the list while the file is still open
    content = list(csv.reader(f, delimiter=','))
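csv.reader yields rows lazily, so the rows have to be collected into a list before they can be indexed or changed. The sketch below uses a made-up in-memory sample in place of LOGGNSS.txt (whose real columns are not shown in the question) and replaces the first entry of every row, which is what the question is ultimately after. The upper-casing is only a placeholder for the real change.

```python
import csv
import io

# Made-up sample standing in for LOGGNSS.txt; the real file's columns are unknown
sample = io.StringIO("a,1,x\nb,2,y\n")

# Collect all rows into a list of lists so they can be indexed and modified
content = list(csv.reader(sample, delimiter=','))

# Replace the first entry of every row (upper-casing is just a placeholder)
for row in content:
    row[0] = row[0].upper()
```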
You initialize content with two empty lists, which sit at content[0] and content[1]. An empty list has no index 1, so content[z][1] raises the IndexError for z = 0 and z = 1. Just initialize it with an empty list:
content = []
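A small reproduction may make this easier to see. The sample lines below are made up for illustration; everything else follows the question's code.

```python
lines = ["a,1", "b,2"]  # made-up sample lines

content = [[], []]  # the question's initial value
for line in lines:
    content.append(line.strip().split(","))
# The two empty lists are still at the front, before the parsed rows

try:
    content[0][1]  # an empty list has no index 1
except IndexError as exc:
    message = str(exc)

# Starting from an empty list, every row is a parsed line
fixed = [line.strip().split(",") for line in lines]
second_entries = [row[1] for row in fixed]
```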

importing from a text file to a dictionary

filename:dictionary.txt
YAHOO:YHOO
GOOGLE INC:GOOG
Harley-Davidson:HOG
Yamana Gold:AUY
Sotheby’s:BID
inBev:BUD
code:
infile = open('dictionary.txt', 'r')
content = infile.readlines()
infile.close()
counters = {}
for line in content:
    counters.append(content)
print(counters)
I am trying to import the contents of the file into a dictionary. I have searched through Stack Overflow, but please give an answer in a simple way (not with open...).
First off, instead of opening and closing the file explicitly, you can use the with statement, which closes the file automatically at the end of the block.
Secondly, since file objects are iterators (one-shot iterables), you can loop over the lines and split each one on the : character. You can do all of this as a generator expression inside the dict function:
with open('dictionary.txt') as infile:
    my_dict = dict(line.strip().split(':') for line in infile)
I assume that you don't have colons in your keys.
In that case you should:
# read the lines from your file (splitlines() leaves no empty trailing entry)
lines = open('dictionary.txt').read().splitlines()
# create an empty dictionary (named result so the built-in dict is not shadowed)
result = {}
# split every line at ':' and use the left element as the key for the right value
for l in lines:
    content = l.split(':')
    result[content[0]] = content[1]
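Both answers break on a blank line or on a line with no colon. A slightly more forgiving sketch, under the assumption that such lines can simply be skipped (parse_pairs is a hypothetical name, not from either answer), splits at the first colon only:

```python
def parse_pairs(lines):
    """Build a dict from 'KEY:VALUE' lines, skipping lines without a colon."""
    pairs = {}
    for line in lines:
        # partition splits at the first ':' only, so values may contain colons
        key, sep, value = line.strip().partition(':')
        if sep:  # sep is '' when the line contains no colon
            pairs[key] = value
    return pairs
```

Any iterable of strings works as input, so an open file object for dictionary.txt or the list returned by readlines() could be passed in directly.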

python: create a list of strings

I have a number of files that I am reading in. I would like to have a list that contains the file contents. I read the whole content into a string.
I need a list that looks like this:
["Content of the first file", "content of the second file",...]
I have tried various ways like append, extend or insert, but they all expect a list as parameter and not a str so I end up getting this:
[["Content of the first file"], ["content of the second file"],...]
How can I get a list that contains strings and then add strings without turning it into a list of lists?
EDIT
Some more code
for file in os.listdir("neg"):
    with open("neg\\"+file,'r', encoding="utf-8") as f:
        linesNeg.append(f.read().splitlines())
for file in os.listdir("pos"):
    with open("pos\\"+file,'r', encoding="utf-8") as f:
        linesPos.append(f.read().splitlines())
listTotal = linesNeg + linesPos
contents_list = []
for filename in filename_list:
    with open(filename) as f:
        contents_list.append(f.read())
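The nesting in the question comes from appending the result of splitlines(), which is itself a list. The sketch below compares the three combinations on a made-up string standing in for one file's content:

```python
text = "line one\nline two\n"  # made-up file content

as_strings = []
as_strings.append(text)               # one string per file, no nesting

as_nested = []
as_nested.append(text.splitlines())   # one inner list per file

as_flat = []
as_flat.extend(text.splitlines())     # one string per line, no nesting
```

So append(f.read()) gives one string per file, while extend(f.read().splitlines()) gives one string per line across all files.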
There's definitely more than one way to do it. Assuming you have the opened files as file objects f1 and f2:
alist = []
alist.extend([f1.read(), f2.read()])
or
alist = [f.read() for f in (f1, f2)]
Personally I'd do something like this, but there's more than one way to skin this cat.
file_names = ['foo.txt', 'bar.txt']
def get_string(file_name):
    with open(file_name, 'r') as fh:
        contents = fh.read()
    return contents
strings = [get_string(f) for f in file_names]

Opening a file in Python

Question:
How can I open a file in Python that contains one integer value per line, have Python read the file, store the data in a list, and then print the list?
I have to ask the user for a file name and then do everything above. The file entered by the user will be used as 'alist' in the function below.
Thanks
def selectionSort(alist):
    for index in range(0, len(alist)):
        ismall = index
        for i in range(index,len(alist)):
            if alist[ismall] > alist[i]:
                ismall = i
        alist[index], alist[ismall] = alist[ismall], alist[index]
    return alist
I think this is exactly what you need:
file = open('filename.txt', 'r')
lines = [int(line.strip()) for line in file.readlines()]
print(lines)
I didn't use a with statement here, as I was not sure whether or not you intended to use the file further in your code.
EDIT: You can just assign an input to a variable...
filename = input('Enter file path: ')
And then the above stuff, except open the file using that variable as a parameter...
file = open(filename, 'r')
Finally, submit the list lines to your function, selectionSort.
selectionSort(lines)
Note: This will only work if the file already exists, but I am sure that is what you meant as there would be no point in creating a new one as it would be empty. Also, if the file specified is not in the current working directory you would need to specify the full path- not just the filename.
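Put together, the steps might look like the sketch below. The names read_ints and selection_sort are assumptions made for this example, and skipping blank lines is an added choice, not something the question asks for.

```python
def read_ints(path):
    """Read one integer per line from path, ignoring blank lines."""
    with open(path, 'r') as f:
        # int() tolerates surrounding whitespace, including the trailing '\n'
        return [int(line) for line in f if line.strip()]


def selection_sort(values):
    """In-place selection sort, the same algorithm as selectionSort above."""
    for index in range(len(values)):
        smallest = index
        for i in range(index + 1, len(values)):
            if values[i] < values[smallest]:
                smallest = i
        values[index], values[smallest] = values[smallest], values[index]
    return values
```

With these two functions, the whole task would be print(selection_sort(read_ints(input('Enter file path: ')))).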
Easiest way to open a file in Python and store its contents in a string:
with open('file.txt') as f:
    contents = f.read()
For your problem:
with open('file.txt') as f:
    values = [int(line) for line in f.readlines()]
print(values)
Edit: As noted in one of the other answers, the file f is only open within the indented with-block. This construction also closes the file automatically if an error occurs, which you would otherwise have to handle with a try/finally construct.
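To make the comparison concrete, here is a sketch of the two forms side by side. The function names are made up for this example; both functions should behave the same for a readable file.

```python
def read_with_finally(path):
    """Read a file and always close it, without using a with statement."""
    f = open(path, 'r')
    try:
        return f.read()
    finally:
        f.close()  # runs whether read() succeeded or raised


def read_with_with(path):
    """Same behavior, with the closing handled by the with statement."""
    with open(path, 'r') as f:
        return f.read()
```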
You can read the file's contents into either a single string or a list.
file = open('file.txt', mode = 'r')
values = file.read()
values will be a single string, which can be printed directly.
file = open('file.txt', mode = 'r')
values = file.readlines()
values will be a list with one string per line. Each string still ends with '\n', so convert the items with int() before using them as numbers.
f.readlines() reads all the lines of your file into memory at once, but what if your file contains a lot of lines?
You can try this instead:
new_list = []  ## start a list variable
with open('filename.txt', 'r') as f:
    for line in f:
        ## remove '\n' from the end of the line
        line = line.strip()
        ## store each line as an integer in the list variable
        new_list.append(int(line))
print(new_list)
