Counting number of words and lines in a file in Python

How can I write a function that reads a file and appends the line number and the number of words in the line to the end of each line?
The expected output should look something like this:
Hello world, how are you? 1 5 # first line, 5 words
I am good. 2 3 # second line, 3 words

Is it possible in Python to have a def w() that opens a file and shows a word count for every line and a line counter while still keeping the original text from the file?
Yes.

I'll give you a more concise answer than @ironkey; however, if you are new to Python you might find theirs clearer.
with open('data.text', 'r') as f:
    for line_cnt, line in enumerate(f, start=1):
        # split() with no argument ignores repeated spaces and the trailing newline
        word_cnt = len(line.split())
        # strip the newline so the counts stay on the same line as the text
        print(line.rstrip("\n"), f"| {line_cnt} | {word_cnt}")
Edit: Used enumerate as suggested to make it even shorter.
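As a rough sketch of the function the question asks for, the same loop could be wrapped so that it returns the annotated lines instead of printing them. The name annotate_lines and the space-separated layout (text, line number, word count) are assumptions chosen to match the expected output in the question.

```python
def annotate_lines(path):
    """Return each line of the file followed by its line number and word count."""
    annotated = []
    with open(path, "r") as f:
        for line_cnt, line in enumerate(f, start=1):
            text = line.rstrip("\n")          # drop the newline before appending counts
            word_cnt = len(text.split())      # whitespace-separated words
            annotated.append(f"{text} {line_cnt} {word_cnt}")
    return annotated
```

Calling print("\n".join(annotate_lines("data.text"))) would then show the original text with both counters at the end of every line.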

Related

Pasting and working on value from a file to another file

I am writing a function that reads text.txt and then pastes its contents into n_text.txt. In the new file I need to add a number in front of every line. Example:
input:
text.txt
this is my
txt file
use for the code
output:
n_text.txt
1 this is my
2 txt file
3 use for the code
I have tried code like this:
with open('text.txt') as file, open('n_text.txt') as file2:
    lines = file.readlines()
    numb = 0
    for line in lines:
        numb += 1
        file2.write(str(numb)+ .join(line))
and I get an invalid syntax error. I don't know what I should do or how I should fix my code. I did try researching it but didn't find anything useful.

Try this:
with open('text.txt') as file, open('n_text.txt', mode="w") as file2:
    for numb, line in enumerate(file.readlines()):
        file2.write(f"{numb+1} {line}")
Using enumerate means we don't have to count the lines ourselves, we use f-strings to simplify the output text, and we open n_text.txt in write mode.
Hope this helps :)
enumerate iterates over items, yielding an index (starting from 0) and the item.
e.g.:
fruits = ['apple', 'pear', 'banana']
for index, fruit in enumerate(fruits):
    print(index, fruit)
Output:
0 apple
1 pear
2 banana
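Since enumerate also accepts a start argument, the numb+1 step can be dropped, and the file object can be iterated directly instead of calling readlines(). A minimal sketch, assuming a hypothetical helper named number_lines that takes the two file paths:

```python
def number_lines(src_path, dst_path):
    """Copy src_path to dst_path, prefixing every line with its 1-based number."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # start=1 makes enumerate count from 1; iterating src reads one line at a time
        for numb, line in enumerate(src, start=1):
            dst.write(f"{numb} {line}")
```

For the files in the question this would be called as number_lines('text.txt', 'n_text.txt').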

The syntax error returned should tell you more about where the error is. It seems to be on the last line:
file2.write(str(numb)+ .join(line))
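When you use .join(), you use it incorrectly: it is a string method, so it has to be called on a separator string, and in + .join(line) there is nothing in front of the dot. I also don't think join will be useful in solving your problem. Here is how it can be used, and the output it gives: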
>>> 'hello'.join('hi')
'hhelloi'
Instead, you may want to add (+) the strings together directly.
>>> 'hello' + 'hi'
'hellohi'
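For comparison, join is more commonly called on a separator and given a list of strings to glue together. The short sketch below reuses the variable names numb and line from the question (the sample values are assumed) and shows that, for this task, it produces the same text as plain concatenation:

```python
numb = 1
line = "this is my\n"

# join is called on the separator and receives an iterable of strings
joined = " ".join([str(numb), line])

# direct concatenation builds the same string here
concatenated = str(numb) + " " + line

print(repr(joined))
print(joined == concatenated)
```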

Why not try using the format function to achieve your desired result?
with open('text.txt') as file, open('n_text.txt', 'w') as file2:
    lines = file.readlines()
    numb = 0
    for line in lines:
        numb += 1
        file2.write("{} {}".format(numb, line))

Finding the Characters in a Specific Line in a File

I am trying to count the characters in one specific line of my file. Say the line is 4.
My file consists of:
1. randomname
2. randomname
3.
4. 34
5. 12202018
My code consists of:
with open('/Users/eviemcmahan/PycharmProjects/eCYBERMISSION/eviemcmahan', 'r') as my_file:
    data = my_file.readline(4)
    characters = 0
    for data in my_file:
        words = data.split(" ")
        for i in words:
            characters += len(i)
    print(characters)
I am not getting an error; I am just getting the number "34".
I would appreciate any help on how to get the correct number of characters for line 4.

my_file.readline(4) does not read the 4th line; it reads from the next line but returns at most its first 4 characters. To read a specific line you can, for example, read all the lines into a list. Then it is easy to get the line you want. You could also read line by line and stop when you reach the line you want.
Going with the first approach and using the count method of strings, it is straightforward to count any character on a specific line. For example:
line_number = 3  # Starts with 0
with open('test.txt', 'r') as my_file:
    lines = my_file.readlines()  # List containing all the lines as elements of the list
print(lines[line_number].count('0'))  # 0
print(lines[line_number].count('4'))  # 2
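The second approach (reading line by line and stopping at the wanted line) avoids loading the whole file. A minimal sketch using itertools.islice; the helper name read_line and the 1-based numbering are assumptions made for this example:

```python
from itertools import islice


def read_line(path, line_number):
    """Return line `line_number` (1-based) without its newline, or None if the file is shorter."""
    with open(path, "r") as my_file:
        # islice skips the first line_number - 1 lines and yields at most one line
        for line in islice(my_file, line_number - 1, line_number):
            return line.rstrip("\n")
    return None
```

The total number of characters on line 4 would then be len(read_line('test.txt', 4)), provided the file has at least four lines.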

Python 2: Using regex to pull out whole lines from text file with substring from another

I have a noob question. I am using python 2.7.6 on a Linux system.
What I am trying to achieve is to use specific numbers in a list, which correspond to the last number in a database text file, to pull out the whole line in the database text file and print it (going to write the line to another text file later).
Code I am currently trying to use:
reg = re.compile(r'(\d+)$')
for line in "text file database":
    if list_line in reg.findall(line):
        print line
What I have found is that I can input a string like
list_line = "9"
and it will output the whole line of the corresponding database entry just fine. But trying to use the list_line to input strings one by one in a loop doesn't work.
Can anyone please help me out or direct me to a relevant source?
Appendix:
The database text file contains data similar to this:
gnl Acep_1.0 ACEP10001-PA 1
gnl Acep_1.0 ACEP10002-PA 2
gnl Acep_1.0 ACEP10003-PA 3
gnl Acep_1.0 ACEP10004-PA 4
gnl Acep_1.0 ACEP10005-PA 5
gnl Acep_1.0 ACEP10006-PA 7
gnl Acep_1.0 ACEP10007-PA 6
gnl Acep_1.0 ACEP10008-PA 8
gnl Acep_1.0 ACEP10009-PA 9
gnl Acep_1.0 ACEP10010-PA 10
The search text file list_line looks similar to this:
2
5
4
6
Updated original code:
# Import extensions
import linecache
import re

# Set re.compile parameters
reg = re.compile(r'(\d+)$')

# Designate and open list file
in_list = raw_input("list input: ")
open_list = open(in_list, "r")

# Count lines in list file
total_lines = sum(1 for line in open_list)
print total_lines

# Open out file in write mode
outfile = raw_input("output: ")
open_outfile = open(outfile, "w")

# Designate db string
db = raw_input("db input: ")
open_db = open(db, "r")
read_db = open_db.read()
split_db = read_db.splitlines()
print split_db

# Set line_number value to 0
line_number = 0

# Count through line numbers and print line
while line_number < total_lines:
    line_number = line_number + 1
    print line_number
    list_line = linecache.getline(in_list, line_number)
    print list_line
    for line in split_db:
        if list_line in reg.findall(line):
            print line

# Close files
open_list.close()
open_outfile.close()
open_db.close()

Short version: your for loop is going through the "database" file once, looking for the corresponding text and stopping. So if you have multiple lines you want to pull out, like in your list_line file, you'll only end up pulling out a single line.
Also, the way you're looking for the line number isn't a great idea. What happens if you're looking for line 5, but the second line just happens to have the digit 5 somewhere in its data? E.g., if the second line looks like:
gnl Acep_1.0 ACEP15202-PA 2
Then searching for "5" will return that line instead of the one you intended. Instead, since you know the line number is going to be the last number on the line, you should take advantage of Python's str.split() method (which splits a string on whitespace and returns a list of the pieces) and the fact that you can use -1 as a list index to get the last item of a list, like so:
def get_one_line(line_number_string):
    with open("database_file.txt", "r") as datafile:  # Open file for reading
        for line in datafile:  # This is how you get one line at a time in Python
            items = line.rstrip().split()
            # "items and" skips blank lines, where items[-1] would raise IndexError
            if items and items[-1] == line_number_string:
                return line
One thing I haven't talked about is the rstrip() method. When you iterate over a file in Python, you get each line as-is, with its newline character still intact. When you print it later, you'll probably be using print -- but print also prints a newline character at the end of what you give it. So unless you use rstrip() you'll end up with two newline characters instead of one, resulting in an extra blank line between every line of your output.
The other thing you're probably not familiar with there is the with statement. Without going into too much detail, that ensures that your database file will be closed when the return line statement is executed. The details of how with works are interesting reading for someone who knows a lot about Python, but as a Python newbie you probably won't want to dive into that just yet. Just remember that when you open a file, try to use with open("filename") as some_variable: and Python will Do The Right Thing™.
Okay. So now that you have that get_one_line() function, you can use it like this:
with open("list_line.txt", "r") as line_number_file:
    for line in line_number_file:
        line_number_string = line.rstrip()  # Don't want the newline character
        database_line = get_one_line(line_number_string)
        print database_line  # Or do whatever you need to with it
NOTE: If you're using Python 3, replace print database_line with print(database_line): in Python 3, the print statement became a function.
There's more that you could do with this code (for example, opening the database file every single time you look for a line is kind of inefficient -- reading the whole thing into memory once and then looking for your data afterwards would be better). But this is good enough to get started with, and if your database file is small, the time you'd lose worrying about efficiency would be far more than the time you'd lose just doing it the simple-but-slower way.
So see if this helps you, then come back and ask more questions if there's something you don't understand or that isn't working.
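The read-it-into-memory-once variant could look roughly like the following sketch. It is written for Python 3, and the function names and the choice of a dict keyed by the last field are assumptions, not part of the answer above. Note that the keys read from the list file are stripped before the lookup; linecache.getline, as used in the question, returns each line with its trailing newline still attached, which is a likely reason an exact comparison finds nothing.

```python
def load_database(db_path):
    """Map the last whitespace-separated field of each line to the full line."""
    index = {}
    with open(db_path, "r") as datafile:
        for line in datafile:
            items = line.split()
            if items:  # skip blank lines
                index[items[-1]] = line.rstrip("\n")
    return index


def find_lines(db_path, list_path):
    """Return the database lines whose last field appears in the list file, in list order."""
    index = load_database(db_path)
    found = []
    with open(list_path, "r") as line_number_file:
        for line in line_number_file:
            key = line.strip()  # drop the newline before looking the key up
            if key in index:
                found.append(index[key])
    return found
```

The database file is read only once here, so the cost of each lookup no longer grows with the size of the database.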

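You can build your regex pattern from the content of the list_line file:
import re

with open('list_line.txt') as list_line:
    # split() drops the trailing newline, which would otherwise add an
    # empty alternative to the pattern and make it match every line
    numbers = [re.escape(n) for n in list_line.read().split()]
# Require whitespace before the number so "2" does not match a line ending in "12"
regex = re.compile(r'\s(' + '|'.join(numbers) + r')$')
print('pattern = ' + regex.pattern)

with open('database.txt') as database:
    for line in database:
        if regex.search(line):
            print(line.rstrip('\n'))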

Reset iteration index after using next() Python [duplicate]

I am trying to edit a text file using fileinput.input(filename, inplace=1).
The text file has, say, 5 lines:
line 0
line 1
line 2
line 3
line 4
I wish to change data of line 1 based on info in line 2.
So I use a for loop
infile = fileinput.input(filename, inplace=1)
for line in infile:
    if(line2Data):
        #do something on line1
        print line,
    else:
        line1=next(infile)
        line2=next(infile)
        #do something with line2
Now my problem is that after the 1st iteration line is set to line2, so in the 2nd iteration line is set to line3. I want line to be set to line1 in the 2nd iteration. I have tried line = line but it doesn't work.
Can you please let me know how I can reset the iteration index on line, which gets changed by next?
PS: This is a simple example of a huge file and function I am working on.

As far as I know (and that is not much) there is no way to reset an iterator. This SO question may be useful. Since you say the file is huge, what I can think of is to process only part of the data. Following nosklo's answer in this SO question, I would try something like this (but that is really just a first guess):
while True:
    for line in open('really_big_file.dat'):
        process_data(line)
    if some_condition==True:
        break
OK, your requirement that you might want to start again from the previous index is not covered by this attempt.

There is no way to reset the iterator, but there is nothing stopping you from doing some of your processing before you start your loop:
import fileinput
infile = fileinput.input("foo.txt")
first_lines = [next(infile) for x in range(3)]
first_lines[1] = first_lines[1].strip() + " this is line2 > " + first_lines[2]
# Each line still ends with its own newline, so join without a separator
# and use a trailing comma to stop print from adding another one
print "".join(first_lines),
for line in infile:
    print line,
This uses next() to read the first 3 lines into a list. It then updates line1 based on line2 and prints all of them. It then continues to print the rest of the file using a normal loop.
For your sample, the output would be:
line 0
line 1 this is line2 > line 2
line 2
line 3
line 4
Note, if you are trying to modify the first lines of the file itself, rather than just display them, you would need to write the whole file out to a new file. Writing to a file does not work like in a word processor where all the lines move down when a line or character is added. It works as if you were in overwrite mode.
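If a line has to be changed based on the line that follows it throughout a large file, one alternative to calling next() inside the loop is a generator that keeps one line of lookahead, so each line is seen together with its successor and nothing needs to be reset. A minimal Python 3 sketch; with_next is a hypothetical helper name, and None is an assumed marker for "no following line":

```python
def with_next(iterable):
    """Yield (current, next) pairs; the last item is paired with None."""
    iterator = iter(iterable)
    try:
        current = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for following in iterator:
        yield current, following
        current = following
    yield current, None
```

It works on any iterable, including an open file or a fileinput object, for example: for line, next_line in with_next(infile): ...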

What qualifies collection of strings to become a line?

The following code takes every character and runs the loop once per character. But when I save the same line in a text file and perform the same operation, the loop runs only once for the one line. It is a bit confusing. The possible reason I can think of is that the first method runs the loop by treating "a" as a list. Kindly correct me if I am wrong. Also, let me know how to create a line in the code itself rather than first saving it in a file and then using it.
>>> a="In this world\n"
>>> i=0
>>> for lines in a:
...     i=i+1
...     print i
...
1
2
3
4
5
6
7
8
9
10
11
12
13
14

You're trying to loop over a, which is a string. Regardless of how many newlines you have in a string, when you loop over it, you're going to go character by character.
If you want to loop through a bunch of lines, you have to use a list:
lines = ["this is line 1", "this is another line", "etc"]
for line in lines:
    print line
If you have a string containing a bunch of newlines and want to convert it to a list of lines, use the split method:
text = "This is line 1\nThis is another line\netc"
lines = text.split("\n")
for line in lines:
    print line
The reason why you go line by line when reading from a file is because the people who implemented Python decided that it would be more useful if iterating over a file yielded a collection of lines instead of a collection of characters.
However, a file and a string are different things, and you should not necessarily expect that they work in the same way.
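To create lines in code without saving a file first, the three behaviours can be compared side by side. This is a small Python 3 sketch; the sample text is made up, and io.StringIO is used here as an in-memory stand-in for a file:

```python
import io

text = "In this world\nSecond line\n"

# Iterating a str yields single characters, newlines included
chars = [c for c in text]

# splitlines() turns the string into a list of lines with the newlines removed
lines = text.splitlines()

# A file-like object yields whole lines, each still ending in its newline
file_like_lines = list(io.StringIO(text))

print(len(chars), lines, file_like_lines)
```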

Just change the name of the variable when looping on the line:
i = 0
worldLine ="In this world\n"
for character in worldLine:
    i=i+1
    print i
count = 0
readFile = open('myFile','r')
for line in readFile:
    count += 1
Now it should be clear what's going on.
Keeping meaningful names will save you a lot of debugging time.
Consider doing the following:
i = 0
worldLine =["In this world\n"]
for line in worldLine:
    i=i+1
    print i
if you want to loop on a list of lines consisting of worldLine only.
