Pasting and working on value from a file to another file - python

I wrote a function that reads text.txt and copies its contents to n_text.txt. In the new file I need to add a number in front of every line. Example:
input:
text.txt
this is my
txt file
use for the code
output:
n_text.txt
1 this is my
2 txt file
3 use for the code
I tried this code:
with open('text.txt') as file, open('n_text.txt') as file2:
    lines = file.readlines()
    numb = 0
    for line in lines:
        numb += 1
        file2.write(str(numb)+ .join(line))
and I get an "invalid syntax" error. I don't know how to fix my code; I did some research but didn't find anything useful.

Try this:
with open('text.txt') as file, open('n_text.txt', mode="w") as file2:
    for numb, line in enumerate(file.readlines()):
        file2.write(f"{numb+1} {line}")
Using enumerate means we don't have to count the lines ourselves, we use f-strings to simplify the output text, and we open n_text.txt in write mode.
Hope this helps :)
enumerate iterates over items, yielding an index (starting from 0) and the item.
e.g.:
fruits = ['apple', 'pear', 'banana']
for index, fruit in enumerate(fruits):
    print(index, fruit)
Output:
0 apple
1 pear
2 banana
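Since the line numbers in the question start at 1, enumerate's optional start argument can replace the manual numb+1. A minimal sketch, where number_lines is a hypothetical helper name and the sample lines are taken from the question:

```python
def number_lines(lines, start=1):
    """Prefix each line with its position (hypothetical helper, not from the answer)."""
    # enumerate's start argument sets the first index, so no "+ 1" is needed
    return [f"{index} {line}" for index, line in enumerate(lines, start=start)]

numbered = number_lines(["this is my\n", "txt file\n", "use for the code\n"])
print("".join(numbered), end="")
```

The same comprehension works directly on an open file object, because iterating over a file yields its lines one at a time.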

The syntax error returned should tell you more about where the error is. It seems to be on the last line:
file2.write(str(numb)+ .join(line))
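You are using .join() incorrectly: it is a string method, so it must be called on a string (the separator), and a bare .join(line) with nothing before the dot is what causes the syntax error. I also don't think join will be useful in solving your problem. Here is how the function can be used, and the output it would give.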
>>> 'hello'.join('hi')
'hhelloi'
Instead, you may want to add (+) the strings together directly.
>>> 'hello' + 'hi'
'hellohi'
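Applied to the question's loop, the two approaches might look like the sketch below. The values of numb and line are made up for illustration; the point is that the number has to be converted with str() before concatenating, and that join is called on the separator and takes an iterable of strings.

```python
numb = 1
line = "this is my\n"

# Plain concatenation: the integer must be converted to str first
numbered = str(numb) + " " + line

# str.join is called on the separator string and joins the items of an iterable
joined = " ".join([str(numb), line])

print(repr(numbered))
print(repr(joined))
```

Both expressions build the same string here, so concatenation is the simpler choice.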

Why not try using the format function to achieve your desired result?
with open('text.txt') as file, open('n_text.txt', 'w') as file2:
    lines = file.readlines()
    numb = 0
    for line in lines:
        numb += 1
        file2.write("{} {}".format(numb, line))

Related

Counting number of words and lines in a file in Python

How to write a function that reads a file and appends line number and the number of words in the line at the end of each line?
The expected output should be something like the following
Hello world, how are you? 1 5 # first line, 5 words
I am good. 2 3 #second line 3 words
Is it possible in Python to have a def w() that opens a file and produces a word count for every line and a line counter while still keeping the original text from the file?
Yes.
I'll give you a more concise answer than #ironkey; however, if you are new to Python you might find theirs clearer.
with open('data.text', 'r') as f:
    for line_cnt, line in enumerate(f, start=1):
        text = line.rstrip("\n")  # drop the newline so the counts stay on the same line
        word_cnt = len(text.split())  # split() with no argument splits on any whitespace
        print(text, f"| {line_cnt} | {word_cnt}")
Edit: Used enumerate as suggested to make it even shorter.
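To match the format in the question (text, then line number, then word count) and to get a reusable function, the same idea could be wrapped as in the sketch below. annotate_lines is a hypothetical name, and it takes any iterable of lines, so an open file object can be passed in place of the sample list.

```python
def annotate_lines(lines):
    """Yield each line followed by its line number and its word count."""
    for line_cnt, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        word_cnt = len(text.split())  # split() with no argument splits on any whitespace
        yield f"{text} {line_cnt} {word_cnt}"

sample = ["Hello world, how are you?\n", "I am good.\n"]
result = list(annotate_lines(sample))
for annotated in result:
    print(annotated)
```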

How do I convert each of the words to a number?

I am trying to read a file and overwrite its contents with numbers. That means for the first word it would be 1, for the second word it would be 2, and so on.
This is my code:
file=open("reviews.txt","r+")
i=1
for x in file:
    line=file.readline()
    word=line.split()
    file.write(word.replace(word,str(i)))
    i+=1
file.close()
Input file:
This movie is not so good
This movie is good
Expected output file:
1 2 3 4 5 6
7 8 9 10
When I run it I keep getting this error: AttributeError: 'list' object has no attribute 'replace'. Which one is the list object? All the variables are strings as far as I know. Please help me.
It may be simpler to build the whole output first, with any method you like, and then write it to the file once. Calling file.write inside the loop isn't really necessary.
Steps
We open the file, read all of its content, and close it.
Using the re module in DOTALL mode, we capture whatever we want to replace in the first capturing group, here with (\S+) (or (\w+), etc.), and every other character in the second capturing group with (.+?). re.findall then returns a list of two-element tuples, in which we want to replace the first element.
We then loop over that list, replacing the first group with an incrementing counter and leaving the second group untouched, and concatenate the pieces step by step into string_out, our new content.
Finally we reopen the file in write mode (which empties it), write string_out, and close it.
Test
import re
file = open("reviews.txt","r+")
word_finder, counter, string_out = re.findall(r"(\S+)|(.+?)", file.read(), re.DOTALL), 0, ''
file.close()
for item in word_finder:
    if item[0]:
        counter += 1
        string_out += str(counter)
    else:
        string_out += item[1]
try:
    file = open("reviews.txt","w")
    file.write(string_out)
finally:
    file.close()
Output
1 2 3 4 5 6
7 8 9 10
RegEx
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
Reference
re — Regular expression operations
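As an alternative to looping over the findall tuples, re.sub accepts a function as its replacement argument and calls it once per match. The sketch below is not the answer's method, only a shorter variant of the same idea; number_words is a hypothetical helper name and the sample text is the input from the question.

```python
import itertools
import re

def number_words(text):
    """Replace every run of non-whitespace characters with a running counter."""
    counter = itertools.count(1)
    # The replacement function is called once per match, from left to right
    return re.sub(r"\S+", lambda match: str(next(counter)), text)

result = number_words("This movie is not so good\nThis movie is good\n")
print(result, end="")
```

Because only the words are matched, all whitespace, including the line breaks, is left exactly as it was.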
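The call to split returns a list, so you need to iterate over it to replace each word. Writing to the file while you are still reading from it can also overwrite text that has not been read yet, so read all the lines first, then rewind and overwrite:
with open("reviews.txt", "r+") as file:
    lines = file.readlines()  # read everything before writing
    file.seek(0)  # go back to the start of the file
    i = 1
    for line in lines:
        numbers = []
        for item in line.split():
            numbers.append(str(i))
            i += 1
        file.write(' '.join(numbers) + '\n')
    file.truncate()  # drop whatever is left of the old, longer content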

how to make python print the whole file instead of just the single line

I have a .csv file that has just one column of numbers and I want to read each number in the column and print it in the console like this:
1
2
3
4
here is the code that I have used:
file_reference2 = open("file1.csv", "r")
read_lines1 = file_reference1.readlines()
for line1 in read_lines1:
    print(line1)
file_reference1.close()
what I expect is:
1
2
3
in the console.
But what I get is:
1
And the program stops. How do I make it print the whole file?
You create a variable file_reference2, but later call file_reference1.readlines() (notice the difference in variable names). You are probably reading lines from the wrong file, as this code works well for me if I change that line to file_reference2.readlines() like this:
file_reference2 = open("file1.csv", "r")
read_lines1 = file_reference2.readlines()
for line1 in read_lines1:
    print(line1)
file_reference2.close()

Python 2: Using regex to pull out whole lines from text file with substring from another

I have a noob question. I am using python 2.7.6 on a Linux system.
What I am trying to achieve is to use specific numbers in a list, which correspond to the last number in a database text file, to pull out the whole line in the database text file and print it (going to write the line to another text file later).
Code I am currently trying to use:
reg = re.compile(r'(\d+)$')
for line in "text file database":
    if list_line in reg.findall(line):
        print line
What I have found is that I can input a string like
list_line = "9"
and it will output the whole line of the corresponding database entry just fine. But trying to use the list_line to input strings one by one in a loop doesn't work.
Can anyone please help me out or direct me to a relevant source?
Appendix:
The database text file contains data similar to this:
gnl Acep_1.0 ACEP10001-PA 1
gnl Acep_1.0 ACEP10002-PA 2
gnl Acep_1.0 ACEP10003-PA 3
gnl Acep_1.0 ACEP10004-PA 4
gnl Acep_1.0 ACEP10005-PA 5
gnl Acep_1.0 ACEP10006-PA 7
gnl Acep_1.0 ACEP10007-PA 6
gnl Acep_1.0 ACEP10008-PA 8
gnl Acep_1.0 ACEP10009-PA 9
gnl Acep_1.0 ACEP10010-PA 10
The search text file list_line looks similar to this:
2
5
4
6
Updated original code:
#import extensions
import linecache
import re
#set re.compiler parameters
reg = re.compile(r'(\d+)$')
#Designate and open list file
in_list = raw_input("list input: ")
open_list = open(in_list, "r")
#Count lines in list file
total_lines = sum(1 for line in open_list)
print total_lines
#Open out file in write mode
outfile = raw_input("output: ")
open_outfile = open(outfile, "w")
#Designate db string
db = raw_input("db input: ")
open_db = open(db, "r")
read_db = open_db.read()
split_db = read_db.splitlines()
print split_db
#Set line_number value to 0
line_number = 0
#Count through line numbers and print line
while line_number < total_lines:
    line_number = line_number + 1
    print line_number
    list_line = linecache.getline(in_list, line_number)
    print list_line
    for line in split_db:
        if list_line in reg.findall(line) :
            print line
#close files
open_list.close()
open_outfile.close()
open_db.close()
Short version: your for loop is going through the "database" file once, looking for the corresponding text and stopping. So if you have multiple lines you want to pull out, like in your list_line file, you'll only end up pulling out a single line.
Also, the way you're looking for the line number isn't a great idea. What happens if you're looking for line 5, but the second line just happens to have the digit 5 somewhere in its data? E.g., if the second line looks like:
gnl Acep_1.0 ACEP15202-PA 2
Then searching for "5" will return that line instead of the one you intended. Instead, since you know the line number is going to be the last number on the line, you should take advantage of Python's str.split() function (which splits a string on whitespace and returns a list of the pieces) and the fact that you can use -1 as a list index to get the last item of a list, like so:
def get_one_line(line_number_string):
    with open("database_file.txt", "r") as datafile: # Open file for reading
        for line in datafile: # This is how you get one line at a time in Python
            items = line.rstrip().split()
            if items[-1] == line_number_string:
                return line
One thing I haven't talked about is the rstrip() function. When you iterate over a file in Python, you get each line as-is, with its newline character still intact. When you print it later, you'll probably be using print, but print also adds a newline character at the end of what you give it. So unless you use rstrip() you'll end up with two newline characters instead of one, resulting in an extra blank line between every line of your output.
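To make the effect of rstrip(), split() and the -1 index concrete, here is a small sketch written for Python 3 (the code in this thread is Python 2, where print is a statement). The sample line is taken from the question's database file.

```python
line = "gnl Acep_1.0 ACEP10002-PA 2\n"  # a line as read from the file, newline included

stripped = line.rstrip()  # trailing whitespace, including "\n", is removed
items = stripped.split()  # split on runs of whitespace
last_field = items[-1]    # -1 indexes the last element of the list

print(repr(line))
print(repr(stripped))
print(last_field)
```

Using repr() makes the trailing newline visible, which is handy when a comparison such as items[-1] == line_number_string unexpectedly fails.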
The other thing you're probably not familiar with there is the with statement. Without going into too much detail, that ensures that your database file will be closed when the return line statement is executed. The details of how with works are interesting reading for someone who knows a lot about Python, but as a Python newbie you probably won't want to dive into that just yet. Just remember that when you open a file, try to use with open("filename") as some_variable: and Python will Do The Right Thing™.
Okay. So now that you have that get_one_line() function, you can use it like this:
with open("list_line.txt", "r") as line_number_file:
    for line in line_number_file:
        line_number_string = line.rstrip() # Don't want the newline character
        database_line = get_one_line(line_number_string)
        print database_line # Or do whatever you need to with it
NOTE: If you're using Python 3, replace print line with print(line): in Python 3, the print statement became a function.
There's more that you could do with this code (for example, opening the database file every single time you look for a line is kind of inefficient -- reading the whole thing into memory once and then looking for your data afterwards would be better). But this is good enough to get started with, and if your database file is small, the time you'd lose worrying about efficiency would be far more than the time you'd lose just doing it the simple-but-slower way.
So see if this helps you, then come back and ask more questions if there's something you don't understand or that isn't working.
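The more efficient variant mentioned above, reading the database into memory once and then looking things up, might be sketched like this in Python 3. build_index is a hypothetical helper name, the in-memory lists stand in for the two files, and the sketch assumes the last field of each database line is unique.

```python
def build_index(database_lines):
    """Map the last field of each line to the full line (hypothetical helper)."""
    index = {}
    for line in database_lines:
        items = line.split()
        if items:  # skip blank lines
            index[items[-1]] = line.rstrip("\n")
    return index

# Stand-ins for the database file and the list file
database_lines = [
    "gnl Acep_1.0 ACEP10001-PA 1\n",
    "gnl Acep_1.0 ACEP10002-PA 2\n",
    "gnl Acep_1.0 ACEP10005-PA 5\n",
]
wanted = ["2\n", "5\n", "9\n"]

index = build_index(database_lines)
# Keys missing from the database (here "9") are silently skipped
found = [index[key.strip()] for key in wanted if key.strip() in index]
for database_line in found:
    print(database_line)
```

With a dictionary, each lookup takes roughly constant time, so the database is scanned once rather than once per requested number.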
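You can build your regex pattern from the content of the list_line file. Join the numbers with '|' rather than replacing newlines, so that a trailing newline does not leave an empty alternative that matches every line, and anchor the group with \b so that 2 does not also match 12:
import re
with open('list_line.txt') as list_line:
    pattern = '|'.join(list_line.read().split())
regex = re.compile(r'\b(' + pattern + r')$')
print('pattern = ' + regex.pattern)
with open('database.txt') as database:
    for line in database:
        if regex.search(line):
            print(line.rstrip())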

How to find and replace multiple lines in text file?

I am running Python 2.7.
I have three text files: data.txt, find.txt, and replace.txt. Now, find.txt contains several lines that I want to search for in data.txt and replace that section with the content in replace.txt. Here is a simple example:
data.txt
pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit
find.txt
apple
banana
cherry
replace.txt
1
2
3
So, in the above example, I want to search for all occurrences of apple, banana, and cherry in the data and replace those lines with 1, 2, 3.
I am having some trouble with the right approach to this as my data.txt is about 1MB so I want to be as efficient as possible. One dumb way is to concatenate everything into one long string and use replace, and then output to a new text file so all the line breaks will be restored.
import re
data = open("data.txt", 'r')
find = open("find.txt", 'r')
replace = open("replace.txt", 'r')
data_str = ""
find_str = ""
replace_str = ""
for line in data: # concatenate it into one long string
    data_str += line
for line in find: # concatenate it into one long string
    find_str += line
for line in replace:
    replace_str += line
new_data = data_str.replace(find_str, replace_str)
new_file = open("new_data.txt", "w")
new_file.write(new_data)
But this seems so convoluted and inefficient for a large data file like mine. Also, the replace function appears to be deprecated so that's not good.
Another way is to step through the lines and keep a track of which line you found a match.
Something like this:
location = 0
LOOP1:
for find_line in find:
    for i, data_line in enumerate(data).startingAtLine(location):
        if find_line == data_line:
            location = i # found possibility
    for idx in range(NUMBER_LINES_IN_FIND):
        if find_line[idx] != data_line[idx+location] # compare line by line
            #if the subsequent lines don't match, then go back and search again
            goto LOOP1
Not fully formed code, I know. I don't even know if it's possible to search through a file from a certain line on or between certain lines but again, I'm just a bit confused in the logic of it all. What is the best way to do this?
Thanks!
If the file is large, you want to read and write one line at a time, so the whole thing isn't loaded into memory at once.
# create a dict of find keys and replace values
findlines = open('find.txt').read().split('\n')
replacelines = open('replace.txt').read().split('\n')
find_replace = dict(zip(findlines, replacelines))
with open('data.txt') as data:
    with open('new_data.txt', 'w') as new_data:
        for line in data:
            for key in find_replace:
                if key in line:
                    line = line.replace(key, find_replace[key])
            new_data.write(line)
Edit: I changed the code to use read().split('\n') instead of readlines() so \n isn't included in the find and replace strings.
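Note that the loop above replaces each key wherever it occurs, one line at a time. If apple, banana, cherry should only be replaced when they appear as a consecutive block, as in the question's example, one possible approach is to compare a window of lines against the find lines. The sketch below is Python 3, replace_block is a hypothetical helper name, and it assumes the data fits in memory (the question mentions about 1 MB).

```python
def replace_block(data_lines, find_lines, replace_lines):
    """Replace each consecutive run equal to find_lines with replace_lines."""
    result = []
    i = 0
    n = len(find_lines)
    while i < len(data_lines):
        # Compare the next n lines with the block we are looking for
        if n and data_lines[i:i + n] == find_lines:
            result.extend(replace_lines)
            i += n  # skip past the matched block
        else:
            result.append(data_lines[i])
            i += 1
    return result

data = ["pumpkin", "apple", "banana", "cherry", "himalaya", "skeleton",
        "apple", "banana", "cherry", "watermelon", "fruit"]
new_data = replace_block(data, ["apple", "banana", "cherry"], ["1", "2", "3"])
print("\n".join(new_data))
```

With real files, the three lists could be obtained with read().splitlines(), and the result written back with "\n".join(new_data).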
A couple of things here:
replace is not deprecated, see this discussion for details:
Python 2.7: replace method of string object deprecated
If you are worried about reading data.txt into memory all at once, you should be able to just iterate over data.txt one line at a time:
data = open("data.txt", 'r')
for line in data:
    # fix the line
So all that's left is coming up with a whole bunch of find/replace pairs and fixing each line. Check out the zip function for a handy way to do that:
find = open("find.txt", 'r').read().splitlines()
replace = open("replace.txt", 'r').read().splitlines()
with open("data.txt", 'r') as data, open("new_data.txt", 'w') as new_data:
    for line in data:
        # apply every find/replace pair to the current line
        for find_token, replace_token in zip(find, replace):
            line = line.replace(find_token, replace_token)
        new_data.write(line)
