Extracting the data from the same position over multiple lines in a string - python

Fairly simple question, but I can't figure out where I'm going wrong. I have a text file which I have split into multiple lines. I want to print a certain slice of each line, characters 14 to 20, but when I run the code below it prints only blank lines.
with open('filetxt', 'r') as file:
    data = file.read().rstrip()
    for line in data:
        print(line[14:20])

If you want to read the file line by line, try:
with open('filetxt', 'r') as file:
    for line in file:
        print(line[14:20])

I think you're using the wrong method. read() returns the whole file as a single string, so the for loop iterates over individual characters, and slicing a one-character string with [14:20] gives an empty string. You might want to use readlines(), which returns a list of the lines, i.e.:
with open('filetxt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        print(line[14:20])
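To see why the code in the question prints blanks, here is a minimal sketch. The sample text and the helper names `slices_by_char` and `slices_by_line` are made up for illustration, and `io.StringIO` stands in for the real file.

```python
import io

# Hypothetical sample text standing in for the file contents.
sample = "0123456789abcdefghijKLMNOP\nABCDEFGHIJKLMNOPQRSTUVWXYZ\n"

def slices_by_char(text, start=14, stop=20):
    # Iterating over a string yields one character at a time,
    # so every slice past index 0 is empty.
    return [ch[start:stop] for ch in text]

def slices_by_line(text, start=14, stop=20):
    # Iterating over a file-like object yields whole lines.
    return [line[start:stop] for line in io.StringIO(text)]

char_result = slices_by_char(sample)
line_result = slices_by_line(sample)
print(set(char_result))  # only the empty string
print(line_result)       # one slice per line
```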

Related

to change a text file containing multiline strings

I have a text file consisting of multiline strings (hundreds of lines, actually). Each of the strings starts with an '&' sign. I want to change my text file so that only the first 300 characters of each string remain in the new file. How can I do this using Python?
You can read the file and loop over its lines. Strings are easily sliceable in Python, so you can take the first 300 characters of each line and write them to another file.
# path and newPath are the input and output file names
with open(path, "r") as file, open(newPath, "w") as newFile:
    for line in file:
        newLine = line[:300]  # first 300 characters (indices 0 to 299)
        newFile.write(newLine)
Hope this is what you meant
You could do something like this:
# Open output file in append mode
with open('output.txt', 'a') as out_file:
    # Open input file in read mode
    with open("input.txt", "r") as in_file:
        for line in in_file:
            # Take first 300 characters from line
            # I believe this works even when line is < 300 characters
            new_line = line[0:300]
            # Write new line to output
            # (You might need to add '\n' for new lines)
            out_file.write(new_line)
            print(new_line)
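On the comment about adding '\n': a line longer than the limit loses its trailing newline when it is sliced. One possible way to handle that is sketched below; the helper name `truncate_line` is hypothetical and not part of the answer above.

```python
def truncate_line(line, limit=300):
    """Keep at most `limit` characters of content and preserve the line ending."""
    body = line.rstrip("\n")       # content without the trailing newline
    ending = line[len(body):]      # "" or "\n", whatever was stripped
    return body[:limit] + ending

short_limit_result = truncate_line("abcdef\n", limit=3)
print(repr(short_limit_result))
```

In the loop above, `out_file.write(truncate_line(line))` could then replace the plain slice.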
You can use the string method split to split the text on '&', then use slices to keep only the first 300 characters of each piece.
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for line in old_file.read().split("&"):
        new_file.write("&{}\n".format(line[:300]))
This version preserves ends of line \n within your strings.
If you want to remove ends of line in each individual string, you can use replace:
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for line in old_file.read().split("&"):
        new_file.write("&{}\n".format(line.replace("\n", "")[:300]))
Note that your new file will end with an empty line.
Also, depending on the size of your file, you may prefer a generator function over split, which loads the whole file content into memory as a list of strings.
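A generator version along those lines might look like the sketch below. It assumes every string begins with '&' at the start of a line, which the question suggests but does not guarantee. The names `iter_records` and `truncate_records` are invented for this example, and `io.StringIO` stands in for the input file.

```python
import io

def iter_records(lines, marker="&"):
    """Yield each marker-delimited record as one string, reading lazily."""
    current = []
    for line in lines:
        # A line beginning with the marker starts a new record.
        if line.startswith(marker) and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)

def truncate_records(lines, limit=300):
    """Yield the first `limit` characters of each record (marker included)."""
    for record in iter_records(lines):
        yield record[:limit]

records = list(iter_records(io.StringIO("&abc\ndef\n&ghi\n")))
short = list(truncate_records(io.StringIO("&abcdef\n&xy\n"), limit=3))
print(records)
print(short)
```

Only one record is held in memory at a time, so this should scale to large files better than read().split("&").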

Combined effect of reading lines twice?

As practice, I am learning to read a file.
As is hopefully obvious from the code, I have a file in the working directory. I need to read it and print it.
my_file=open("new.txt","r")
lengt=sum(1 for line in my_file)
for i in range(0,lengt-1):
    myline=my_file.readlines(1)[0]
    print(myline)
my_file.close()
This raises an error saying the index is out of range.
The text file simply contains statements like
line one
line two
line three
.
.
.
With everything else the same, I tried myline=my_file.readline() instead. I get 7 empty lines.
My guess is that by using for line in my_file, I have already read all the lines and reached the end of the file. How do I overcome this to get the result I want?
P.S. If it matters, it's Python 3.3.
No need to count along. Python does it for you:
my_file = open("new.txt","r")
for myline in my_file:
    print(myline)
Details:
my_file is an iterator. This is a special object that you can iterate over.
You can also access a single line:
line1 = next(my_file)
gives you the first line, assuming you just opened the file. Doing it again:
line2 = next(my_file)
you get the second line. If you now iterate over it:
for myline in my_file:
    pass  # do something
it will start at line 3.
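The same behaviour explains the error in the question: counting the lines consumes the iterator, so nothing is left for the later reads. A small sketch, using `io.StringIO` with made-up contents in place of new.txt, shows this and one way to rewind with seek(0):

```python
import io

# io.StringIO stands in for open("new.txt"); the contents are made up.
my_file = io.StringIO("line one\nline two\nline three\n")

count = sum(1 for _ in my_file)   # consumes every line
leftover = my_file.readlines()    # nothing left to read, so this is empty
my_file.seek(0)                   # rewind to the start of the file
first = next(my_file)             # first line again
rest = list(my_file)              # continues from the second line

print(count, leftover, repr(first), rest)
```

Indexing `leftover[0]` on the empty list is what raises the out-of-range error.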
Strange extra lines?
print(myline)
will likely print an extra empty line. This is due to a newline read from the file and a newline added by print(). Solution:
Python 3:
print(myline, end='')
Python 2:
print myline, # note the trailing comma.
Playing it safe
Using the with statement like this:
with open("new.txt", "r") as my_file:
    for myline in my_file:
        print(myline)
    # my_file is open here
# my_file is closed here
you don't need to close the file, as that is done as soon as you leave the context, i.e. as soon as you continue with your code at the same level as the with statement.
You can actually take care of all of this at once by iterating over the file contents:
my_file = open("new.txt", "r")
length = 0
for line in my_file:
    length += 1
    print(line)
my_file.close()
At the end, you will have printed all of the lines, and length will contain the number of lines in the file. (If you don't specifically need to know length, there's really no need for it!)
Another way to do it, which will close the file for you (and, in fact, will even close the file if an exception is raised):
length = 0
with open("new.txt", "r") as my_file:
    for line in my_file:
        length += 1
        print(line)

How can I read a file to a string starting at a given word without knowing the line number?

I have test results in a log file that are formatted like:
useless info
useless info
======================
useful info
useful info
======================
test success
The number of lines in each section can vary, so I want to check for the first appearance of '==' and read from that line until the end of the file into a string. Currently I'm using the following code to read the whole file into the string.
with open("Report.txt", "r") as myfile:
    data = myfile.read()
Thanks for the help!
useful = []
with open("Report.txt", "r") as myfile:
    for line in myfile:
        if "===" in line:
            break
    for line in myfile:
        useful.append(line)
a_string = "".join(useful)
I would however prefer to hide it away in a generator, like this:
def report_iterator():
    with open("Report.txt", "r") as myfile:
        for line in myfile:
            if "===" in line:
                break
        for line in myfile:
            yield line

for line in report_iterator():
    pass  # do stuff with line
All the filtering and nitpicking is done in the generator function, and you can separate the logic of "filtering input" from the logic of "working with the input".
You could read line by line and, by default, not store the lines. Once you reach the line starting with '==', store every line you read in your string or list until you reach the second '==' line.
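That description could be sketched roughly as follows. The function name `lines_between_markers` is made up, the sample report mirrors the one in the question, and `io.StringIO` replaces the real Report.txt.

```python
import io

def lines_between_markers(lines, marker="=="):
    """Collect the lines between the first and second marker lines."""
    collected = []
    inside = False
    for line in lines:
        if line.startswith(marker):
            if inside:
                break          # second marker line: stop collecting
            inside = True      # first marker line: start collecting
            continue
        if inside:
            collected.append(line)
    return "".join(collected)

report = io.StringIO(
    "useless info\n"
    "useless info\n"
    "======================\n"
    "useful info\n"
    "useful info\n"
    "======================\n"
    "test success\n"
)
useful = lines_between_markers(report)
print(useful)
```

If the file has only one marker line, this version returns everything after it.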
If you've got the whole file in memory, you can get "everything but the first section" like this:
useful = data.split('======================\n',1)[1]
That splits the data on the first occurrence of your delimiter, returning everything after the delimiter.
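with open("Report.txt", "r") as myfile:
    # skip lines up to and including the first one that starts with '=='
    for line in myfile:
        if line[:2] == '==':
            break
    # join the remaining lines instead of overwriting data on every pass
    data = "".join(myfile)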

Remove whitespaces in the beginning of every string in a file in python?

How can I remove whitespace at the beginning of every line in a file with Python?
I have a file myfile.txt with the strings as shown below in it:
_ _ Amazon.inc
Arab emirates
_ Zynga
Anglo-Indian
Those underscores are spaces.
The code must go through every line of the file and remove all the whitespace at the beginning of each line.
I've tried using lstrip, but that's not working for multiple lines, and neither is readlines().
Would using a for loop make it better?
All you need to do is read the lines of the file one by one and remove the leading whitespace from each line. After that, you can join the lines again and you'll get back the original text without the leading whitespace:
with open('myfile.txt') as f:
    line_lst = [line.lstrip() for line in f.readlines()]
lines = ''.join(line_lst)
print(lines)
Assuming that your input data is in infile.txt, and you want to write this file to output.txt, it is easiest to use a list comprehension:
inf = open("infile.txt")
stripped_lines = [l.lstrip() for l in inf.readlines()]
inf.close()
# write the new, stripped lines to a file
outf = open("output.txt", "w")
outf.write("".join(stripped_lines))
outf.close()
To read the lines from myfile.txt and write them to output.txt, use
with open("myfile.txt") as input:
    with open("output.txt", "w") as output:
        for line in input:
            output.write(line.lstrip())
That will make sure that you close the files after you're done with them, and it'll make sure that you only keep a single line in memory at a time.
The above code works in Python 2.5 and later because of the with keyword. For Python 2.4 you can use
input = open("myfile.txt")
output = open("output.txt", "w")
for line in input:
    output.write(line.lstrip())
if this is just a small script where the files will be closed automatically at the end. If this is part of a larger program, then you'll want to explicitly close the files like this:
input = open("myfile.txt")
try:
    output = open("output.txt", "w")
    try:
        for line in input:
            output.write(line.lstrip())
    finally:
        output.close()
finally:
    input.close()
You say you already tried lstrip and that it didn't work for multiple lines. The "trick" is to run lstrip on each individual line, like I do above. You can try the code out online if you want.
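One detail worth checking when applying lstrip per line: with no argument it strips every leading whitespace character, including the newline of a line that contains only spaces, so such lines disappear from the output. The sketch below compares that with passing an explicit character set. The helper `strip_leading` and the sample lines are illustrative only.

```python
def strip_leading(lines, chars=None):
    """Return the lines with leading characters removed via str.lstrip."""
    # chars=None means "all whitespace", the same as calling lstrip().
    return [line.lstrip(chars) for line in lines]

sample = ["  Amazon.inc\n", "   \n", " Zynga\n"]

default_result = strip_leading(sample)       # whitespace-only line collapses to ""
spaces_only = strip_leading(sample, " \t")   # the "\n" of that line survives

print(default_result)
print(spaces_only)
```

Which behaviour is wanted depends on whether blank lines in the file should be preserved.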

How to delete parts of a file in python?

I have a file named a.txt which looks like this:
I'm the first line
I'm the second line.
There may be more lines here.
I'm below an empty line.
I'm a line.
More lines here.
Now, I want to remove the contents above the empty line(including the empty line itself).
How could I do this in a Pythonic way?
Basically you can't delete stuff from the beginning of a file, so you will have to write to a new file.
I think the pythonic way looks like this:
# get an iterator over the lines in the file:
with open("input.txt", 'rt') as lines:
    # while the line is not empty drop it
    for line in lines:
        if not line.strip():
            break
    # now lines is at the point after the first paragraph
    # so write out everything from here
    with open("output.txt", 'wt') as out:
        out.writelines(lines)
Here is a simpler version of this, without the with statement, for older Python versions:
lines = open("input.txt", 'rt')
for line in lines:
    if not line.strip():
        break
open("output.txt", 'wt').writelines(lines)
and a very straightforward version that simply splits the file at the empty line:
# first, read everything from the old file
text = open("input.txt", 'rt').read()
# split it at the first empty line ("\n\n")
first, rest = text.split('\n\n',1)
# make a new file and write the rest
open("output.txt", 'wt').write(rest)
Note that this can be pretty fragile. For example, Windows often uses \r\n as a single line break, so an empty line would be \r\n\r\n instead. But often you know that the file uses only one kind of line break, so this could be fine.
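If the line-break style is not known in advance, one option is to split with a regular expression that accepts both forms. This is only a sketch: the function name `after_first_blank_line` is made up, and it treats an empty line strictly as two consecutive line breaks.

```python
import re

def after_first_blank_line(text):
    """Return the text following the first empty line, or '' if there is none."""
    # \r?\n matches both "\n" and "\r\n" line breaks.
    parts = re.split(r"\r?\n\r?\n", text, maxsplit=1)
    return parts[1] if len(parts) == 2 else ""

unix_rest = after_first_blank_line("a\nb\n\nc\nd\n")
windows_rest = after_first_blank_line("a\r\nb\r\n\r\nc\r\n")
print(repr(unix_rest), repr(windows_rest))
```

Unlike the two-name unpacking above, this returns an empty string instead of raising when the text has no empty line.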
Naive approach by iterating over the lines in the file one by one top to bottom:
#!/usr/bin/env python
with open("4692065.txt", 'r') as src, open("4692065.cut.txt", "w") as dest:
    keep = False
    for line in src:
        if keep: dest.write(line)
        if line.strip() == '': keep = True
The fileinput module (from the standard library) is convenient for this kind of thing. It sets things up so you can act as though you are editing the file "in-place":
import fileinput
import sys
fileobj=iter(fileinput.input(['a.txt'], inplace=True))
# iterate through the file until you find an empty line.
for line in fileobj:
    if not line.strip():
        break
# Iterators (like `fileobj`) pick up where they left off.
# Starting a new for-loop saves you one `if` statement and boolean variable.
for line in fileobj:
    sys.stdout.write(line)
Any idea how big the file is going to be?
You could read the file into memory:
f = open('your_file', 'r')
lines = f.readlines()
which will read the file line by line and store those lines in a list (lines).
Then, close the file and reopen with 'w':
f.close()
f = open('your_file', 'w')
for line in lines:
    if your_if_here:
        f.write(line)
f.close()
This will overwrite the current file. You can pick and choose which lines from the list you want to write back. It is probably not a very good idea if the file gets too large, though, since the entire file has to reside in memory. But it doesn't require that you create a second file to dump your output.
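For files too large to hold in memory, a possible alternative is to stream the kept lines into a temporary file and then swap it into place. The sketch below is not from the answer above: `rewrite_filtered` and `keep_line` are hypothetical names, and it assumes Python 3.3 or later for os.replace.

```python
import os
import tempfile

def rewrite_filtered(path, keep_line):
    """Rewrite `path`, keeping only lines for which keep_line(line) is true."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, one line at a time.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        with open(path, "r") as src:
            for line in src:
                if keep_line(line):
                    tmp.write(line)
    # Replace the original file with the filtered copy.
    os.replace(tmp.name, path)
```

A call such as `rewrite_filtered('your_file', lambda line: line.strip() != '')` would drop the blank lines, for example. A second file does exist briefly, but only one line is in memory at a time.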
from itertools import dropwhile, islice

def content_after_emptyline(file_object):
    # drop lines until the first empty one, then skip that empty line too
    return islice(dropwhile(lambda line: line.strip(), file_object), 1, None)

with open("filename") as f:
    for line in content_after_emptyline(f):
        print(line, end="")
You could do a little something like this:
with open('a.txt', 'r') as file:
    lines = file.readlines()
blank_line = lines.index('\n')  # index of the first blank line
lines = lines[blank_line+1:]  # keep everything after the blank line
with open('a.txt', 'w') as file:
    file.writelines(lines)  # each line already ends with '\n'
and that makes the job much simpler.
