Print lists that contain a second element only - python

I am working on counting the length of split sentences, but I always get an index out of range error when trying to print the lines/lists that have a [1] element.
The code:
for line in open("testing.txt"):
    strip = line.rstrip()
    words = strip.split(';')
    first = words[0]
    for test in words:
        if words[1] in words:
            print(words)
        else:
            continue
The split output of the sample .txt file is, for example:
['"What does Bessie say I have done?" I asked.']
['Be seated somewhere', ' and until you can speak pleasantly, remain silent."']
['Of farthest Thule', ' and the Atlantic surge']
['Pours in among the stormy Hebrides."']
['"Alright, let's get out of here!" I yelled.']
So some sentences only have a [0] element, while the ones with a [1] element are the sentences I am trying to print (the current if/else statement doesn't work).
The expected output (basically any split sentence/list that has a second element):
['Be seated somewhere', ' and until you can speak pleasantly, remain silent."']
['Of farthest Thule', ' and the Atlantic surge']

You're getting this error because you try to access the second element of a list that contains only one string. In this case you want to check the length of the list instead:
for line in open("testing.txt"):
    strip = line.rstrip()
    words = strip.split(';')
    for test in words:
        if len(words) > 1:
            print(words)
        else: # this else is not necessary
            continue
Edit: If you want to print each sentence containing at least one ';' only once, you don't actually need the inner for loop. One concise way to get the desired output would be this:
for line in open("testing.txt"):
    strip = line.rstrip()
    words = strip.split(';')
    if len(words) > 1:
        print(words)

As far as I understand from your question, you are only trying to print the words lists that have more than one element.
One simple way to do it is:
for line in open("testing.txt"):
    strip = line.rstrip()
    words = strip.split(';')
    # first = words[0]
    for test in words:
        if len(words) > 1:
            print(words)
Here you are just checking whether the length of words is greater than 1 and printing it if that is the case.
EDIT:
I think the inner for loop is unnecessary. All you want is to print the lists of words whose length is greater than 1.
So for that purpose:
for line in open("testing.txt"):
    strip = line.rstrip()
    words = strip.split(';')
    if len(words) > 1:
        print(words)
Here you are just splitting the sentences on ; and then checking after splitting if the length of the list (named words) is greater than 1; if so you are printing the list named words.
EDIT2:
As S3DEV pointed out, you are opening the file inside the for statement, which won't close it automatically once the loop finishes. As a result the file handle may stay open until the program exits, which might cause odd issues. The best practice is to use the with statement: it closes the file automatically once the block finishes executing, so you won't face issues from keeping a file handle open.
with open("testing.txt", "r") as f: # open the file as f in read-only mode
    for line in f:
        strip = line.rstrip()
        words = strip.split(';')
        if len(words) > 1:
            print(words)
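For anyone who wants to try the length check without a file on disk, here is a minimal sketch that wraps it in a function and runs it on in-memory lines. The function name split_lines_with_second_part and the sample list are made up for illustration; they are not from the question.

```python
def split_lines_with_second_part(lines, sep=";"):
    """Return the split form of every line that contains at least one separator."""
    result = []
    for line in lines:
        words = line.rstrip().split(sep)
        # a list of length 1 means the separator never occurred in the line
        if len(words) > 1:
            result.append(words)
    return result


# hypothetical sample data standing in for testing.txt
sample = [
    '"What does Bessie say I have done?" I asked.\n',
    'Of farthest Thule; and the Atlantic surge\n',
    'Pours in among the stormy Hebrides."\n',
]

for words in split_lines_with_second_part(sample):
    print(words)
```

Any iterable of strings works here, so an open file object can be passed in place of the sample list.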

Related

an item of list is not being equal to one another

I am creating a game project that involves palindromes.
I have a list of all the words in English, and I want to check every word in the list and find the ones that are equal to their own reverse.
file1 = open('words.txt')
file2reversed = open('words.txt')
words = file1.readlines()
print(words[3][::-1])
print()
if words[3][::-1] == words[3]:
    print("equal")
else:
    print("not")
My code looks like this. I wrote a palindrome as the word at index 3 to check whether it works, and the output looks like this:
aaa
aaa
not
why is words[3][::-1] not equal to words[3] even if it is a palindrome word?
Use file.read().splitlines() instead. file.readlines() keeps the trailing newline on each string, so the reversed string starts with that newline and the comparison fails: '\naaa' != 'aaa\n'.
More cleanly:
file = open('words.txt')
text = file.read()
words = text.splitlines()
# words is a list of strings without '\n' at the end of each line.
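To see the difference directly, here is a small sketch that uses io.StringIO as a stand-in for words.txt (the four words are invented for the example). It compares readlines() with read().splitlines() and then filters the palindromes.

```python
import io

# hypothetical stand-in for words.txt, one word per line
fake_file = io.StringIO("hello\nlevel\nworld\naaa\n")

with_newlines = fake_file.readlines()              # each item still ends with '\n'
fake_file.seek(0)                                  # rewind to read the same data again
without_newlines = fake_file.read().splitlines()   # newlines removed

# repr() makes the hidden newline visible
print(repr(with_newlines[3]), repr(with_newlines[3][::-1]))
print(with_newlines[3] == with_newlines[3][::-1])
print(without_newlines[3] == without_newlines[3][::-1])

# keep only the words that read the same in both directions
palindromes = [w for w in without_newlines if w == w[::-1]]
print(palindromes)
```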

read words from file, line by line and concatenate to paragraph

I have a really long list of words, one per line. How do I make a program that takes all of them in and prints them side by side?
I tried making the word an element of a list, but I don't know how to proceed.
Here's the code I've tried so far:
def convert(lst):
    return([i for item in lst for i in item.split()])
lst = [''' -The list of words come here- ''']
print(convert(lst))
If you already have the words in a list, you can use the str.join() method to concatenate them. See https://docs.python.org/3/library/stdtypes.html#str.join
# read().split() drops the newline that readlines() would keep on each word
words = open('your_file.txt').read().split()
separator = ' '
print(separator.join(words))
Another, slightly more cumbersome method is to print the words with the built-in print() function and replace the newline that print() normally appends with a space.
words = open('your_file.txt').readlines()
for word in words:
    # strip the newline read from the file, then end with a space instead
    print(word.strip(), end=' ')
Try this; example.txt just has one word per line.
with open("example.txt", "r") as a_file:
    sentence = ""
    for line in a_file:
        stripped_line = line.strip()
        sentence = sentence + f"{stripped_line} "
    print(sentence)
If your input file is really large and you can't fit it all in memory, you can read the words lazily and write them to disk instead of holding the whole output in memory.
# create a generator that yields each individual line
lines = (l for l in open('words'))
with open("output", "w+") as writer:
    # read the file line by line to avoid memory issues
    while True:
        try:
            line = next(lines)
            # add to the paragraph in the out file
            writer.write(line.replace('\n', ' '))
        except StopIteration:
            break
You can check the working example here: https://replit.com/#bluebrown/readwritewords#main.py
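As a compact variation on the answers above, str.join() can also consume a generator that strips each line as it is read. This is only a sketch: lines_to_paragraph is a made-up helper name, and io.StringIO stands in for the real word file.

```python
import io


def lines_to_paragraph(file_obj, separator=" "):
    """Join the non-empty, stripped lines of an open file into one string."""
    stripped = (line.strip() for line in file_obj)
    # skip blank lines so they do not produce doubled separators
    return separator.join(word for word in stripped if word)


# hypothetical stand-in for the word file, including one blank line
fake_file = io.StringIO("alpha\nbeta\n\ngamma\n")
paragraph = lines_to_paragraph(fake_file)
print(paragraph)
```

The input is read lazily, but the joined result is still one string held in memory, so for very large files writing piece by piece (as in the previous answer) remains the safer choice.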

How to make a Python program that recognizes words common to 2 text files?

So, I'm making a Python program that will read the contents of a .txt file (source.txt) and see whether it contains any words from a certain wordlist (words.txt). I also need it to tell me which words are common to both.
So, any idea how to do this?
Text File:-
Hello, How are you today
I am doing very fine fine
I am also very cool
My friends are cool too
We are all very cool
Code: -
Not using any list comprehensions deliberately.
index = [] #Empty List
check = ['fine', 'cool'] #Words to check for
with open('Sample', 'r') as file: #Open Text File
    for line in file: #Line in text file
        for word in line.split(): #Split the line into words
            for i in range(len(check)): #Check if words from check match the words in the line
                if word == check[i]: #i equals the index of the word in the list "check"
                    index.append(i) #We add the index to our index list
#Find the most common index in our index list
max = 0
res = index[0]
for i in index:
    freq = index.count(i)
    if freq > max:
        max = freq
        res = i #The element with this index in "check" is the most common
print("The most common word is :", check[res],"It occurs", max, "times in the file")
Output:
The most common word is : cool It occurs 3 times in the file
Read the source text file and use either a regular expression or split() to get the list of words from it. Methods may vary.
Do the same thing with your words.txt.
Use the set & (intersection) operator on the two results.
Below is a crude but working example:
f = open('./source.txt').read()
f2 = open('./words.txt').read()
a = set(' '.join(f.split('\n')).split(' '))
b = set(' '.join(f2.split('\n')).split(' '))
print (a&b)
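A slightly more robust version of the same idea might tokenize with a regular expression so that punctuation and letter case do not prevent a match. This is a sketch under assumptions: the helper name words_in, the regular expression, and the two sample strings are illustrative and stand in for the contents of source.txt and words.txt.

```python
import re


def words_in(text):
    """Lower-case the text and return its set of words (letters and apostrophes only)."""
    return set(re.findall(r"[a-z']+", text.lower()))


# hypothetical stand-ins for the contents of source.txt and words.txt
source_text = "Hello, How are you today\nI am doing very fine fine\nI am also very cool\n"
wordlist_text = "fine\ncool\nbanana\n"

# set intersection keeps only the words present in both texts
common = words_in(source_text) & words_in(wordlist_text)
print(sorted(common))
```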

Python duplicate words written into an outFile - where to define "i"

I sincerely apologize if this is the incorrect way to ask my question. This is my first time posting in Stack.
My inFile is six edited lines of the poem "Do not go gentle into that good night". The program should write an outFile containing the lines that have a duplicated word longer than 3 letters. For example, "rage rage against the dying of the light" would be written because of "rage".
edit: When I run this it gives me an error saying "i" is undefined.
Oh, and I can't use any modules.
Here is my code:
def duplicateWordLines(inFile,outFile):
    inFile=open(inFileName, "r")
    outFile=open(outFileName, "w")
    for line in inFile:
        words=line.split() #split the lines
        og=[] #original words
        dups=[] #duplicate words
        for word in words: #for each word in words
            if og.count(i)>0 and line not in dups: #if the word appears more than once and not already in duplicates
                dups.append(line) #add to duplicates
            else: #if not a duplicate
                og.append(i) #add to the original list - not to worry about it
        for line in dups: #for the newly appended lines
            outFile.write(line+'\n') #write in the outFile
#test case
inFileName="goodnightPoem.txt"
outFileName="goodnightPoemDUP.txt"
duplicateWordLines(inFileName,outFileName)
#should print
#rage rage against the dying of the light
#do not go gentle into that good good night
Thank you!
Try this out...
def duplicateWordLines(inFile,outFile):
    inFile=open(inFileName, "r")
    outFile=open(outFileName, "w")
    for line in inFile:
        # split the lines
        words=line.split()
        # remove all words of 3 characters or fewer
        words = [word for word in words if len(word)>3]
        # make the list a set, so all duplicates are removed
        no_dups = set(words)
        # if there are more words in the words list than the
        # no duplicate list, we must have a duplicate, so
        # write the line
        if len(words) > len(no_dups):
            outFile.write(line+'\n') #write in the outFile
    # close both files so the output is flushed to disk
    inFile.close()
    outFile.close()
#test case
inFileName="file.txt"
outFileName="file_1.txt"
duplicateWordLines(inFileName,outFileName)
Regarding the i is undefined error, let's look at your for loop
for word in words: #for each word in words
    if og.count(i)>0 and line not in dups: #if the word appears more than once and not already in duplicates
        dups.append(line) #add to duplicates
    else: #if not a duplicate
        og.append(i) #add to the original list - not to worry about it
You don't actually define i anywhere; your loop defines word. You are blending a for-each loop, i.e. for word in words, with a range loop, like for i in range(0,len(words)). If we were to fix your loop, I think it would look something like this...
for word in words: #for each word in words
    if og.count(word)>0 and line not in dups: #if the word appears more than once and not already in duplicates
        dups.append(line) #add to duplicates
    else: #if not a duplicate
        og.append(word) #add to the original list - not to worry
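Since the question rules out modules, the per-line check can also be isolated in a small helper that uses only built-ins. This is a rough sketch, not the asker's code: has_long_duplicate and min_length are made-up names, and the three poem lines are typed in directly instead of being read from the inFile.

```python
def has_long_duplicate(line, min_length=4):
    """Return True if a word of at least min_length letters occurs twice in line."""
    seen = []
    for word in line.split():
        if len(word) < min_length:
            continue  # words of 3 letters or fewer are ignored
        if word in seen:
            return True
        seen.append(word)
    return False


# sample lines typed in by hand; the real program would read them from inFile
poem = [
    "rage rage against the dying of the light",
    "do not go gentle into that good good night",
    "old age should burn and rave at close of day",
]

for line in poem:
    if has_long_duplicate(line):
        print(line)
```

Note that "the" appears twice in the first line but is skipped because it has only three letters; the line is reported because of "rage".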

python -- trying to count the length of the words from a file with dictionaries

def myfunc(filename):
    filename=open('hello.txt','r')
    lines=filename.readlines()
    filename.close()
    lengths={}
    for line in lines:
        for punc in ".,;'!:&?":
            line=line.replace(punc," ")
        words=line.split()
        for word in words:
            length=len(word)
            if length not in lengths:
                lengths[length]=0
            lengths[length]+=1
    for length,counter in lengths.items():
        print(length,counter)
    filename.close()
Use Counter. (<2.7 version)
You are counting the frequency of words in a single line.
for line in lines:
    for word in length.keys():
        print(wordct,length)
length is a dict of all distinct words plus their frequency, not their length:
length.get(word,0)+1
so you probably want to replace the above with
for line in lines:
    ....
#keep this at this indentation - will have a very large dict but of all words
for word in sorted(length.keys(), key=lambda x:len(x)):
    #word, freq, length
    print(word, length[word], len(word), "\n")
I would also suggest:
Don't bring the whole file into memory like that; file objects are iterators now and are well optimised for reading line by line.
Drop the wordct and so on in the main lines loop.
Rename length to something else, perhaps words or dict_words.
Errr, maybe I misunderstood: are you trying to count the number of distinct words in the file (in which case use len(length.keys())), or the length of each word in the file, presumably ordered by length?
The question has been more clearly defined now so replacing the above answer
The aim is to get a frequency of word lengths throughout the whole file.
I would not even bother with line by line but use something like:
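with open('hello.txt') as fo:
    # treat newlines as spaces so words on different lines stay separate
    text = fo.read().replace("\n", " ")
d_freq = {}
st = 0
while st < len(text):
    next_space_index = text.find(" ", st)
    if next_space_index == -1:  # no more spaces: the last word runs to the end
        next_space_index = len(text)
    word_len = next_space_index - st
    if word_len > 0:  # skip empty stretches between consecutive spaces
        d_freq[word_len] = d_freq.get(word_len, 0) + 1
    st = next_space_index + 1
print(d_freq)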
I think that will work, not enough time to try it now. HTH
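The Counter suggestion at the top of this answer can be sketched as follows. It assumes Python 2.7 or later (where collections.Counter is available), reuses the punctuation string from the question, and works on a made-up sample string instead of hello.txt; word_length_counts is a hypothetical helper name.

```python
from collections import Counter


def word_length_counts(text, punctuation=".,;'!:&?"):
    """Count how many words of each length occur in text."""
    # replace punctuation with spaces, as the question's code does
    for punc in punctuation:
        text = text.replace(punc, " ")
    # split() with no arguments handles spaces and newlines alike
    return Counter(len(word) for word in text.split())


# hypothetical sample text standing in for the contents of hello.txt
sample_text = "Hello, world! Hi there.\nOne two three"
counts = word_length_counts(sample_text)

for length, count in sorted(counts.items()):
    print(length, count)
```

For a real file, the text could be obtained with open('hello.txt').read(), or the Counter could be updated line by line to avoid loading everything at once.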
