how to add numbering inside the for loop? - python

I have a problem printing each line with a number. How do I add numbering before each line? I marked the relevant part with a comment.
f = open('filename', "r")
lines = f.readlines()
for line in lines:
    synonyms = []
    print(line)  # I want my interface to be, 1. word 2. word, and so on
    answer = input("Answer: ").lower()
    for syn in wordnet.synsets(line.strip()):
        for l in syn.lemmas():
            synonyms.append(l.name())
My code is just printing
word1
Answer:
word2
Answer:
My ideal output is:
1.word1
Answer:
2.word2
Answer:

Replace your loop with:
for i, line in enumerate(lines):
    print(str(i + 1) + '. ' + str(line))
Here i + 1 is the number you want, since enumerate starts counting at 0.
If you are on Python 3.6 or later, you could use an f-string instead:
    print(f'{i + 1}. {line}')
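A minimal sketch of the same idea, assuming the words are already in a list. The helper name number_lines and the sample words are made up for illustration. enumerate accepts a start argument, so the + 1 is not needed, and strip() removes the trailing newline that readlines() keeps on each line:

```python
def number_lines(lines):
    """Return each line prefixed with a 1-based number, e.g. '1. word'."""
    return [f"{i}. {line.strip()}" for i, line in enumerate(lines, start=1)]


# Sample data standing in for f.readlines(); note the trailing newlines.
for text in number_lines(["word1\n", "word2\n"]):
    print(text)
```

Stripping the newline also avoids the blank line that print(line) would otherwise leave before each "Answer:" prompt.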

Instead of iterating over the lines themselves, iterate over the indexes of the list and print index + 1, because list indexes start at 0 but the numbering should start at 1. So instead of printing only the line, we print line_no: line.
f = open('filename', "r")
lines = f.readlines()
for line_no in range(len(lines)):
    synonyms = []
    print(f"{line_no+1}: {lines[line_no]}")  # I want my interface to be, 1. word 2. word, and so on
    answer = input("Answer: ").lower()
    # there is no `line` variable in this loop, so index into the list instead
    for syn in wordnet.synsets(lines[line_no].strip()):
        for l in syn.lemmas():
            synonyms.append(l.name())
Hope my answer helped :D

Try this:
f = open('filename', "r")
lines = f.readlines()
for line_no, line in enumerate(lines):
    synonyms = []
    print(str(line_no+1)+'.'+line)  # I want my interface to be, 1. word 2. word, and so on
    answer = input("Answer: ").lower()
    for syn in wordnet.synsets(line.strip()):
        for l in syn.lemmas():
            synonyms.append(l.name())

Add a variable called count and increment it with each iteration of the for loop.
Code:
count = 1
f = open('filename', "r")
lines = f.readlines()
for line in lines:
    synonyms = []
    print(str(count) + ". " + line)
    count += 1
    answer = input("Answer: ").lower()
    for syn in wordnet.synsets(line.strip()):
        for l in syn.lemmas():
            synonyms.append(l.name())

Related

Q: How to find the next word of a specific word in a txt-file

New here!
I am looking for the word that follows the word "I". For example, in "I am new here" the next word is "am".
import re
word = 'i'
with open('tedtalk.txt', 'r') as words:
    pat = re.compile(r'\b{}\b \b(\w+)\b'.format(word))
    print(pat.findall(words))
with open('tedtalk.txt','r') as f:
    for line in f:
        phrase = 'I'
        if phrase in line:
            next(f)
This is the code I have developed so far, but I am kind of stuck. Thanks in advance!
You have two options.
First, with split:
with open('tedtalk.txt','r') as f:
    data = f.read()
search_word = "I"
list_of_words = data.split()
next_word = list_of_words[list_of_words.index(search_word) + 1]
Second, with regex:
import re
regex = re.compile(r"\bI\b\s*?\b(\w+)\b")
with open('tedtalk.txt','r') as f:
    data = f.read()  # findall needs a string, not the list that readlines() returns
result = regex.findall(data)
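Note that list.index only returns the position of the first match, so the split option yields a single next word. A minimal sketch of a split-based variant that collects the word after every occurrence; the function name next_words and the sample text are assumptions made for illustration:

```python
def next_words(text, search_word):
    """Return the word that follows each occurrence of search_word in text."""
    words = text.split()
    # Pair every word with its successor and keep the successors of matches.
    return [nxt for cur, nxt in zip(words, words[1:]) if cur == search_word]


sample = "I am1 new here, I\nam2 new here, I am3 new here"
print(next_words(sample, "I"))
```

Because split() treats newlines like any other whitespace, a search word at the end of one line is paired with the first word of the next line.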
In your first piece of code, words is a file object rather than a string, and checking line by line has a problem of its own. For example, in the following case, am2 may not be found.
tedtalk.txt
I am1 new here, I
am2 new here, I am3 new here
So I modified the program to read 4096 characters at a time, so that a very large file does not exhaust memory.
To avoid missing a match when a read cuts the data off, each chunk is searched from the end for the word I; if it is found, the data from that point on is cut off and prepended to the next read.
import re
regex = re.compile(r"\bI\b\s*?\b(\w+)\b")
def find_index(data, target_value="I"):
    """Look for spaces from the back: if the value between the last space and an
    earlier space equals `target_value`, return the index of that earlier space."""
    index = data.rfind(" ")
    if index != -1:
        index2 = index
        while True:
            index2 = data.rfind(" ", 0, index2)
            if index2 == -1:
                break
            t = index - index2
            # two adjacent spaces
            if t == 1:
                continue
            elif t == 2 and data[index2 + 1: index] == target_value:
                return index2
result = []
with open('tedtalk.txt', 'r') as f:
    # Save data that might have been truncated last time.
    prev_data = ""
    while True:
        chunk = f.read(4096)
        once_read_data = prev_data + chunk
        if not chunk:
            # End of file: search whatever was carried over, then stop.
            # (Without this, a file ending in " I word" would loop forever.)
            result += regex.findall(once_read_data)
            break
        index = find_index(once_read_data)
        if index is not None:
            # Slicing based on the found index.
            prev_data = once_read_data[index:]
            once_read_data = once_read_data[:index]
        else:
            prev_data = ""
        result += regex.findall(once_read_data)
print(result)
Output:
['am1', 'am2', 'am3']
search_word = 'I'
prev_data = ""
result = []
with open('tedtalk.txt', 'r') as f:
    while True:
        data = prev_data + f.readline()
        if data == prev_data:  # reached eof
            break
        list_of_words = data.split()
        for word_pos, word in enumerate(list_of_words[:-1]):
            if word == search_word:
                result.append(list_of_words[word_pos+1])
        if list_of_words:  # leading blank lines yield no words to carry over
            # carry the last word over so it can pair with the next line's first word
            prev_data = list_of_words[-1] + ' '
print(result)
I modified the code to read the text file line by line, so it should handle arbitrarily large files. The code also addresses the case where the search word is the last word on a line, by taking the first word of the next line as the next word.
If you would rather treat each line independently and ignore the search word when it is the last word on a line, the code can be simplified as follows:
search_word = 'I'
result = []
with open('tedtalk.txt', 'r') as f:
    while True:
        data = f.readline()
        if not data:  # reached eof
            break
        list_of_words = data.split()
        for word_pos, word in enumerate(list_of_words[:-1]):
            if word == search_word:
                result.append(list_of_words[word_pos+1])
print(result)

Reading a text file and counting how many times a word is repeated using .split(); now I want it to ignore case

I am getting the desired output so far.
The program prompts the user for a word to search for.
The user enters it, and the program reads the file and prints the count, for example:
'ashwin: 2'
Now I want it to ignore case. For example, "Ashwin" and "ashwin" should both return 2, since the text file contains "ashwin" twice.
def word_count():
    file = "test.txt"
    word = input("Enter word to be searched:")
    k = 0
    with open(file, 'r') as f:
        for line in f:
            words = line.split()
            for i in words:
                if i == word:
                    k = k + 1
    print(word + ": " + str(k))
word_count()
You could use lower() on both strings when comparing them, that is, if i.lower() == word.lower():
For example:
def word_count():
    file = "test.txt"
    word = input("Enter word to be searched:")
    k = 0
    with open(file, 'r') as f:
        for line in f:
            words = line.split()
            for i in words:
                if i.lower() == word.lower():
                    k = k + 1
    print(word + ": " + str(k))
word_count()
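You can either call .lower() on both the line and the word to eliminate case differences,
or use the built-in re module:
len(re.findall(word, text, flags=re.IGNORECASE))
Note that, as written, this pattern also counts the word when it appears inside a longer word.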
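A minimal sketch of the re approach that counts whole words only, assuming the file contents are already in a string. The function name count_word and the sample text are made up for illustration. re.escape keeps any special characters in the user's input from being read as regex syntax, and \b anchors the match at word boundaries:

```python
import re


def count_word(text, word):
    """Count case-insensitive whole-word occurrences of word in text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "Ashwin went home.\nashwin, again; ashwins"
print(count_word(sample, "ASHWIN"))
```

Unlike a comparison on split() tokens, this also counts a word that is directly followed by punctuation, such as "ashwin," in the sample.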
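Use the Counter class from collections, which returns a dictionary-like object whose key-value pairs can be looked up in O(1) time on average.
from collections import Counter
def word_count():
    file = "test.txt"
    with open(file, 'r') as f:
        # split() already treats newlines as whitespace; removing them first
        # would glue the last word of a line to the first word of the next
        words = f.read().lower().split()
    count = Counter(words)
    word = input("Enter word to be searched:")
    # a Counter returns 0 for missing keys, whereas get() would return None
    print(word, ":", count[word.lower()])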

Split string within list into words in Python

I'm a newbie in Python, and I need to write code that reads a text file, splits it into words, sorts them, and prints them out.
Here is the code I wrote:
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
words = list()
for line in fh:
    line = line.strip()
    line.split()
    lst.append(line)
lst.sort()
print lst
That's my output:
['Arise fair sun and kill the envious moon', 'But soft what light through yonder window breaks', 'It is the east and Juliet is the sun', 'Who is already sick and pale with grief']
However, when I try lst.split(), it says:
List object has no attribute split
Please help!
You should extend the new list with the split line, rather than attempt to split the strings after appending:
for line in fh:
    line = line.strip()
    lst.extend(line.split())
The issue is that split() does not magically mutate the string into a list. You have to do something with the return value.
for line in fh:
    # line.split()  # expression has no effect
    line = line.split()  # statement does
    # lst += line  # shortcut for loop underneath
    for token in line:
        lst = lst + [token]
        # lst += [token]  # equivalent in-place form; use one or the other
The above is a solution that uses a nested loop and avoids append and extend. The whole line-by-line splitting and sorting can be done very concisely, however, with a nested generator expression:
print sorted(word for line in fh for word in line.strip().split())
You can do:
fname = raw_input("Enter file name: ")
fh = open(fname, "r")
lines = list()
words = list()
for line in fh:
    # get a list of words for this line
    words = line.split()
    for w in words:
        lines.append(w)
lines.sort()
print lines
To avoid duplicates:
no_dups_list = list()
for w in lines:
    if w not in no_dups_list:
        no_dups_list.append(w)
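A minimal sketch of an alternative that splits, de-duplicates, and sorts in one step by collecting the words in a set. The function name sorted_unique_words and the sample lines are assumptions made for illustration; with a real file, the open file object could be passed in place of the list:

```python
def sorted_unique_words(lines):
    """Split every line into words and return the distinct words, sorted."""
    unique = {word for line in lines for word in line.split()}
    return sorted(unique)


sample = ["But soft what light", "It is the east and Juliet is the sun"]
print(sorted_unique_words(sample))
```

The default sort is case-sensitive, so capitalized words come before lowercase ones. A set membership test is also much cheaper than the `w not in no_dups_list` check when the list grows large.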

Print out the character, word, and line amounts using Python

This is what I have so far:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))
When executed, it returns the number of lines properly but returns 0 for the word and character counts. Not sure why...
You can iterate through the file once and count lines, words and chars without seeking back to the beginning multiple times, which you would need to do with your approach because you exhaust the iterator when counting lines:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = chars = 0
    words = []
    with open(filename) as infile:
        for line in infile:
            lines += 1
            words.extend(line.split())
            chars += len(line)
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
Alternatively, use the side effect that the file position is at the end after the loop to get the character count. Note that tell() reports a position in the underlying file, so it matches the character count only for single-byte text without newline translation:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    words = []
    with open(filename) as infile:
        for lines, line in enumerate(infile, 1):
            words.extend(line.split())
        chars = infile.tell()
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
You need to go back to the beginning of the file with infile.seek(0). After a read, the position is at the end of the file; seek(0) resets it to the start so that you can read again.
infile = open('data')
lines = infile.readlines()
infile.seek(0)
print(lines)
words = infile.read()
infile.seek(0)
chars = infile.read()
infile.close()
print("line count:", len(lines))
print("word count:", len(words.split()))
print("character counter:", len(chars))
Output:
line count: 2
word count: 19
character counter: 113
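A small sketch of why the second read in the question came back empty, using io.StringIO as an in-memory stand-in for a real file (the sample text is made up). Once readlines() has consumed the stream, read() has nothing left to return until seek(0) rewinds it:

```python
import io

f = io.StringIO("one two\nthree\n")  # stand-in for open('data')

lines = f.readlines()   # consumes the whole stream
exhausted = f.read()    # position is at the end, so this is ''
f.seek(0)               # rewind to the start
text = f.read()         # now the full contents are available again

print(len(lines), repr(exhausted), len(text.split()), len(text))
```

A real file object opened with open() behaves the same way with respect to readlines(), read(), and seek(0).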
Another way of doing it:
from collections import Counter
from itertools import chain
infile = open('data')
lines = infile.readlines()
cnt_lines = len(lines)
words = list(chain.from_iterable([x.split() for x in lines]))
cnt_words = len(words)
cnt_chars = len([ c for word in words for c in word])
# show words frequency
print(Counter(words))
You have exhausted the iterator after the call to readlines. You can seek back to the start, but really you don't need to read the whole file into memory at all:
def stats(filename):
    chars, words, dupes = 0, 0, False
    seen = set()
    with open(filename) as f:
        for i, line in enumerate(f, 1):
            chars += len(line)
            spl = line.split()
            words += len(spl)
            if dupes or not seen.isdisjoint(spl):
                dupes = True
            elif not dupes:
                seen.update(spl)
    return i, chars, words, dupes
Then assign the values by unpacking:
no_lines, no_chars, no_words, has_dupes = stats("your_file")
You may want to use chars += len(line.rstrip()) if you don't want to include the line endings. This code stores only the data it needs; using readlines, read, dicts of the full data, etc. means that your code won't be very practical for large files.
File_Name = 'file.txt'
line_count = 0
word_count = 0
char_count = 0
with open(File_Name,'r') as fh:
    # This will produce a list of lines.
    # Each line of the file will be an element of the list.
    data = fh.readlines()
    # Count of total number for list elements == total number of lines.
    line_count = len(data)
    for line in data:
        word_count = word_count + len(line.split())
        char_count = char_count + len(line)
print('Line Count : ' , line_count )
print('Word Count : ', word_count)
print('Char Count : ', char_count)

Python Count not resetting?

I'm trying to insert an incrementing number after each occurrence of ~||~ in my .txt file. I have this working; however, I want the count to start over at 1 after each semicolon.
So far I have the following, which does everything except restart at semicolons.
inputfile = "output2.txt"
outputfile = "/output3.txt"
f = open(inputfile, "r")
words = f.read().split('~||~')
f.close()
count = 1
for i in range(len(words)):
    if ';' in words [i]:
        count = 1
    words[i] += "~||~" + str(count)
    count = count + 1
f2 = open(outputfile, "w")
f2.write("".join(words))
Why not first split the file on the semicolons, then count the occurrences of '~||~' in each segment?
with open(inputfile) as f:
    semicolon_separated_chunks = f.read().split(';')
# str.count matches the literal marker; in a regex, '|' would mean alternation
counts = [chunk.count('~||~') for chunk in semicolon_separated_chunks]
# if file text is 'hello there ~||~ what is that; what ~||~ do you ~|| mean; nevermind ~||~'
# then counts = [1, 1, 1]
Instead of resetting the counter the way you are now, you could do the initial split on ;, and then split the substrings on ~||~. You'd have to store your words another way, since you're no longer doing words = f.read().split('~||~'), but it's safer to make an entirely new list anyway.
inputfile = "output2.txt"
outputfile = "/output3.txt"
all_words = []
f = open(inputfile, "r")
lines = f.read().split(';')
f.close()
for line in lines:
    count = 1
    words = line.split('~||~')
    for word in words:
        all_words.append(word + "~||~" + str(count))
        count += 1
f2 = open(outputfile, "w")
f2.write("".join(all_words))
See if this works for you. You also may want to put some strategically-placed newlines in there, to make the output more readable.
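A minimal sketch of the split-on-semicolon idea that numbers only the markers actually present and keeps the semicolons in the output. It works on a string rather than a file, and the function name number_markers, the constant MARKER, and the sample text are assumptions made for illustration. It also assumes the number should go immediately after each marker:

```python
MARKER = "~||~"


def number_markers(text):
    """Append a counter after each MARKER, restarting at 1 after every ';'."""
    numbered_chunks = []
    for chunk in text.split(";"):
        pieces = chunk.split(MARKER)
        # pieces[0] precedes the first marker; each later piece follows one.
        numbered = pieces[0]
        for count, piece in enumerate(pieces[1:], start=1):
            numbered += MARKER + str(count) + piece
        numbered_chunks.append(numbered)
    # Re-join with ';' so the separators removed by split() are restored.
    return ";".join(numbered_chunks)


print(number_markers("a~||~b~||~c;d~||~e"))
```

To apply it to the files from the question, the text could be read with f.read(), passed through the function, and the result written to the output file.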
