How to change a text file containing multiline strings - python

I have a text file consisting of multiline strings (hundreds of lines, actually). Each of the strings starts with an '&' sign. I want to change my text file so that only the first 300 characters of each string remain in the new file. How can I do this using Python?

You can read the file and loop over its lines to do what you want. Strings are easily sliceable in Python, so you can take the first 300 characters of each and write them to another file.
file = open(path, "r")
lines = file.readlines()
newFile = open(newPath, "w")
for line in lines:
    newLine = line[0:300]  # keep only the first 300 characters
    newFile.writelines([newLine])
file.close()
newFile.close()
Hope this is what you meant

You could do something like this:
# Open output file in append mode
with open('output.txt', 'a') as out_file:
    # Open input file in read mode
    with open("input.txt", "r") as in_file:
        for line in in_file:
            # Take first 300 characters from line
            # (slicing works even when the line is shorter than 300 characters)
            new_line = line[0:300]
            # Write new line to output
            # (You might need to add '\n' for new lines)
            out_file.write(new_line)
            print(new_line)

You can use the string method split to split your text on '&', then use slices to keep only the first 300 characters of each resulting string.
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for line in old_file.read().split("&"):
        new_file.write("&{}\n".format(line[:300]))
This version preserves the end-of-line \n characters within your strings.
If you want to remove the end-of-line characters in each individual string, you can use replace:
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for line in old_file.read().split("&"):
        new_file.write("&{}\n".format(line.replace("\n", "")[:300]))
Note that your new file will end with an empty line.
Another note: depending on the size of your file, you may prefer a generator-based version, since split loads the whole file content into memory as a list of strings; see the sketch below.
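For example, here is a minimal sketch of such a generator, reading the file in fixed-size chunks instead of all at once (the helper name records and the 4096-byte chunk size are illustrative choices, not part of the original answer):
def records(file_obj, delimiter="&", chunk_size=4096):
    """Yield one delimiter-separated record at a time, without loading the whole file."""
    buffer = ""
    for chunk in iter(lambda: file_obj.read(chunk_size), ""):
        buffer += chunk
        while delimiter in buffer:
            record, buffer = buffer.split(delimiter, 1)
            if record:  # skip the empty record before the first '&'
                yield record
    if buffer:  # whatever remains after the last '&'
        yield buffer

with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for record in records(old_file):
        new_file.write("&{}\n".format(record.replace("\n", "")[:300]))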

Related

Extracting the data from the same position over multiple lines in a string

Fairly simple question, but I can't figure out where I'm going wrong. I have a text file which I have split into multiple lines. I want to print a certain location from each line, characters 14 to 20, but when I run the code below it prints a blank set of characters.
with open('filetxt', 'r') as file:
    data = file.read().rstrip()
    for line in data:
        print(line[14:20])
The problem is that data is a single string, so for line in data iterates over individual characters, and slicing a one-character string with [14:20] always gives an empty result. If you want to read the file line by line, try:
with open('filetxt', 'r') as file:
    for line in file:
        print(line[14:20])
I think you're using the wrong read() method. read() reads the whole file at once; you might want to use readlines(), which returns a list of the read lines. I.e.:
with open('filetxt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        print(line[14:20])

Replace an arrow character, repeating headers and blank lines in text file and paste the data cleanly in Excel sheet

My attempt to remove the arrow character, blank lines and headers from this text file is as below -
I am trying to ignore the arrow character and blank lines and write the rest to the new file MICnew.txt, but my code doesn't do it. Nothing changes in the new file.
Please help, Thanks so much
I have attached sample file as well.
import re
with open('MIC.txt') as oldfile, open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        newfile.write(re.sub(r'[^\x00-\x7f]', r' ', line))

with open('MICnew.txt', 'r+') as file:
    for line in file:
        if not line.isspace():
            file.write(line)
You can't read from and write to the same file simultaneously. When you open a file with mode r+, the I/O pointer is initially at the beginning but reading will push it to the end (as explained in this answer). So in your case, you read the first line of the file, which moves the pointer to the end of the file. Then you write out that line (unless it's all whitespace) but crucially, the pointer stays at the end. That means on the next iteration of the loop you will have reached the end of the file and your program stops.
To avoid this, read in all the contents of the file first, then loop over them and write out what you want:
from pathlib import Path

file_data = Path('MICnew.txt').read_text()
with open('MICnew.txt', 'w') as out_handle:  # THIS WILL OVERWRITE THE FILE!
    for line in file_data.splitlines():
        if not line.isspace():
            out_handle.write(line + '\n')  # splitlines() strips the newlines, so add them back
But that double loop is a bit clumsy and you can instead combine the two steps into one:
import re

with open('MIC.txt', errors='ignore') as oldfile, \
        open('MICnew.txt', 'w') as newfile:
    for line in oldfile:
        clean_line = re.sub(r'[^\x00-\x7f]', ' ', line.strip('\x0c'))
        if not clean_line.isspace():
            newfile.write(clean_line)
To drop characters that cannot be decoded, the file is opened with errors='ignore', which omits the improperly encoded characters. Since the sample file contains a number of rogue form-feed characters throughout, the code also strips them explicitly (ASCII code 12, or \x0c in hex).

Find, Replace inline file from multiple lists in Python

I have three python lists:
filePaths
textToFind
textToReplace
The lists are always equal lengths and in the correct order.
I need to open each file in filePaths, find the corresponding line from textToFind, and replace it with the corresponding line from textToReplace. I have all the code that populates the lists; I am stuck on making the replacements. I have tried:
for line in fileinput.input(filePath[i], inplace=1):
    sys.stdout.write(line.replace(find[i], replace[i]))
How do I iterate over each file to make the text replacements on each line that matches find?
When you need to use the indices of the items in a sequence while iterating over that sequence, use enumerate.
for i, path in enumerate(filePath):
    for line in fileinput.input(path, inplace=1):
        sys.stdout.write(line.replace(find[i], replace[i]))
Another option would be to use zip, which will give you one item from each sequence in order.
for path, find_text, replace_text in zip(filePath, textToFind, textToReplace):
    for line in fileinput.input(path, inplace=1):
        sys.stdout.write(line.replace(find_text, replace_text))
Note that on Python 2.x, zip produces a new list that can be iterated, so if the sequences you are zipping are huge it will consume memory; a lazy Python 2 alternative is sketched below. Python 3.x zip produces an iterator, so it doesn't have that problem.
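For very large lists on Python 2, itertools.izip yields the triples lazily instead of building a full list (a sketch under that assumption; on Python 3 the built-in zip already behaves this way):
import fileinput
import sys
from itertools import izip  # Python 2 only; on Python 3 use the built-in zip

for path, find_text, replace_text in izip(filePath, textToFind, textToReplace):
    for line in fileinput.input(path, inplace=1):
        sys.stdout.write(line.replace(find_text, replace_text))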
With a normal file object, you could also read the entire file into a variable and perform the string replacement on the whole file at once.
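A minimal sketch of that whole-file approach, combined with zip (it assumes the asker's filePaths, textToFind and textToReplace lists and overwrites each file in place):
for path, find_text, replace_text in zip(filePaths, textToFind, textToReplace):
    with open(path) as f:
        contents = f.read()  # read the entire file into memory
    with open(path, 'w') as f:  # reopen for writing, which truncates the file
        f.write(contents.replace(find_text, replace_text))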
Without more information, I might do something like this:
for my_file in file_paths:
    with open(my_file, 'r') as cin:
        lines = cin.readlines()  # store the file in memory so I can overwrite it
    with open(my_file, 'w') as cout:
        for line in lines:
            line = line.replace(find, replace)  # change as needed
            cout.write(line)
Iterate over all the file paths, read each file's lines into a variable first (since this code overwrites the original file), then reopen the file for writing. Do your replace; remember that if there is nothing to replace, Python just leaves the line alone. Then write each line back to the file.
You can read the file into a temporary variable, make the changes, and then write it back:
with open('file', 'r') as f:
    text = f.read()

with open('file', 'w') as f:
    f.write(text.replace('aaa', 'bbb'))

Replace certain element in only first line of the text file

I have a text file and would like to replace certain elements, which are "NaN".
I have usually used the file.replace function to change the NaNs to a certain number throughout the entire text file.
Now I would like to replace the NaNs with a certain number in only the first line of the text file, not the whole text.
Would you give me a hint for this problem?
You can simply read the whole file, call .replace() on the first line and write everything out to a new file.
with open('in.txt') as fin:
    lines = fin.readlines()

lines[0] = lines[0].replace('old_value', 'new_value')

with open('out.txt', 'w') as fout:
    for line in lines:
        fout.write(line)
If your file isn't really big, you can use just .join():
with open('out.txt', 'w') as fout:
    fout.write(''.join(lines))
And if it is really big, you would probably be better off reading and writing lines one at a time, as in the sketch below.
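A minimal sketch of that streaming version, assuming the same in.txt/out.txt names and old/new values as above:
with open('in.txt') as fin, open('out.txt', 'w') as fout:
    first_line = next(fin, '')  # read only the first line (empty file -> empty string)
    fout.write(first_line.replace('old_value', 'new_value'))
    for line in fin:  # copy the remaining lines unchanged
        fout.write(line)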
You can hack this, provided you accept a few constraints. The replacement string needs to be of equal length to the original string. If the replacement string is shorter than the original, pad the shorter string with spaces to make it of equal length (this only works if extra spaces in your data are acceptable). If the replacement string is longer than the original, you cannot do the replacement in place and need to follow Harold's answer.
with open('your_file.txt', 'r+') as f:
    line = next(f)  # grab first line
    old = 'NaN'
    new = '0  '  # padded with spaces to make it the same length as old
    f.seek(0)  # move file pointer back to the beginning of the file
    f.write(line.replace(old, new))
This will be fast on any length file.

Remove whitespaces in the beginning of every string in a file in python?

How do I remove the whitespace at the beginning of every string in a file with Python?
I have a file myfile.txt with the strings as shown below in it:
_ _ Amazon.inc
Arab emirates
_ Zynga
Anglo-Indian
Those underscores are spaces.
The code must go through each and every line of the file and remove all the whitespace at the beginning of each line.
I've tried using lstrip, but that's not working for multiple lines, and readlines() too.
Would using a for loop make it better?
All you need to do is read the lines of the file one by one and remove the leading whitespace from each line. After that, you can join the lines again and you'll get back the original text without the whitespace:
with open('myfile.txt') as f:
    line_lst = [line.lstrip() for line in f.readlines()]
    lines = ''.join(line_lst)
    print(lines)
Assuming that your input data is in infile.txt, and you want to write this file to output.txt, it is easiest to use a list comprehension:
inf = open("infile.txt")
stripped_lines = [l.lstrip() for l in inf.readlines()]
inf.close()
# write the new, stripped lines to a file
outf = open("output.txt", "w")
outf.write("".join(stripped_lines))
outf.close()
To read the lines from myfile.txt and write them to output.txt, use
with open("myfile.txt") as input:
with open("output.txt", "w") as output:
for line in input:
output.write(line.lstrip())
That will make sure that you close the files after you're done with them, and it'll make sure that you only keep a single line in memory at a time.
The above code works in Python 2.5 and later because of the with keyword. For Python 2.4 you can use
input = open("myfile.txt")
output = open("output.txt", "w")
for line in input:
    output.write(line.lstrip())
if this is just a small script where the files will be closed automatically at the end. If this is part of a larger program, then you'll want to explicitly close the files like this:
input = open("myfile.txt")
try:
    output = open("output.txt", "w")
    try:
        for line in input:
            output.write(line.lstrip())
    finally:
        output.close()
finally:
    input.close()
You say you already tried lstrip and that it didn't work for multiple lines. The "trick" is to run lstrip on each individual line, like I do above. You can try the code out online if you want.
