Splitting text at a determined character using Python


I'm trying to write a program that takes a .txt file containing messy text, reads it, and starts a new line every time it comes across a full stop (.), essentially breaking every paragraph into several lines. However, I'm struggling to find something that will actually look for the specified character within the text.
I was thinking about having the program read the text character by character, write each character to a different file, and add a "\n" whenever it runs across a ".", but I'm having trouble implementing it. This is roughly what I have:
with open("test.txt", "r+") as f:
    while True:
        char = f.read(1)
        if not char:
            break
        else:
            if char==("."):
                f.write(char + "\n")
            else:
                f.write(char)
                break
I'm guessing this particular piece of code is a bloody mess, but I've been struggling with this problem for some time and at this point I'm trying pretty much anything I can think of.

Try the following:
with open("test.txt", "r") as f:
    data = f.read().replace('.', '.\n')
with open("test.txt", "w") as f:
    f.write(data)  # write the modified text back to the file
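A variation on the same idea, as a minimal sketch: it assumes the result should go to a separate file so the original stays untouched. The names split_sentences and split_file and both file paths are made up for illustration.

```python
def split_sentences(text):
    """Return text with a newline inserted after every full stop."""
    return text.replace(".", ".\n")


def split_file(src_path, dst_path):
    """Read src_path, break it after each '.', and write the result to dst_path."""
    with open(src_path, "r") as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        dst.write(split_sentences(text))
```

Calling split_file("test.txt", "test_split.txt") would then produce the broken-up copy. Note that a space following a full stop ends up at the start of the next line; replacing ". " instead of "." is one way to avoid that.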

Related

Modifying letters in a file

I'm new to programming so I'm pretty lost. I'm currently learning Python and I need to open a text file and change every letter to the next one in the alphabet (e.g. a -> b, b -> c, etc.). How would I go about writing code like this?

This sounds like a neat problem to work on for a beginner.
Things you may want to look at:
The open() function, which allows you to open files and read/write to them:
https://docs.python.org/3/library/functions.html#open
For example:
with open('test.out', 'r+') as fi:
    all_lines = fi.readlines()  # Read all lines from the file
    fi.write('this string will be written to the file')  # Lands at the end, because reading left the position there
# The file is closed at this point in the code; `with` uses a context manager, look that up
The os.replace() function, which lets you overwrite one file with another. You might try reading the input file, writing to a new output file, then overwriting the input file with the new output file; this will let you do that.
https://docs.python.org/3/library/os.html#os.replace
Replacing a character with the next character in sequence is an interesting twist, as it's not something that a lot of Python programmers have to deal with. Here's one way to increment a character:
x = 'c'
print(chr(ord(x) + 1)) # will print 'd'
Without just giving away the answer, this should give you the pieces that you need to get started, feel free to ask more questions.

I think this will work well. The code can probably be shortened, but I'm still not sure how; I'm not an expert with with open statements.
with open("(your text file path)", "r") as f:
    data = f.read()  # read the whole file, not just the first line
new_data = ""
for ch in data:
    if ch.isalpha():
        ch = chr(ord(ch) + 1)  # letter -> number, add one, number -> letter
    new_data += ch  # spaces, digits and newlines are copied unchanged
print(new_data)
with open("(your text file path)", "w") as f:
    f.write(new_data)
You must change your letters to numbers so that you can increment them by one, and then change them back to letters. This should work.
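One gap in the plain ord()/chr() approach is the end of the alphabet: adding one to 'z' gives '{', not 'a'. A minimal sketch that wraps around, assuming only ASCII letters should change and everything else is copied as-is. The function names next_letter and shift_text are hypothetical.

```python
def next_letter(ch):
    """Shift one ASCII letter forward by one, wrapping z -> a and Z -> A.

    ch is expected to be a single character; anything that is not an
    ASCII letter is returned unchanged.
    """
    if "a" <= ch <= "z":
        base = ord("a")
    elif "A" <= ch <= "Z":
        base = ord("A")
    else:
        return ch
    # Position in the alphabet (0-25), plus one, wrapped back into range.
    return chr((ord(ch) - base + 1) % 26 + base)


def shift_text(text):
    """Apply next_letter to every character of text."""
    return "".join(next_letter(ch) for ch in text)
```

The result of shift_text() on the file contents can then be written back in the same way as new_data above.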

Reading through a .m File and Python keeps reading a character in the .m File as a line?

I am trying to read the text within a .m file in Python and Python keeps reading a single character within the .m file as a line when I use file.readline(). I've also had issues with trying to remove certain parts of the line before adding it to a list.
I've tried adjusting where the readline is on for loops that I have set up since I have to read through multiple files in this program. No matter where I put it, the string always comes out separated by character. I'm new to Python so I'm trying my best to learn what to do.
# Example of what I did
with open('MyFile.m') as f:
    for line in f:
        text = f.readline()
        if text.startswith('%'):
            continue
        else:
            my_string = text.strip("=")
            my_list.append(my_string)
This has only partially worked as it will still return parts of lines that I do not want and when trying to format the output by putting spaces between new lines it output like so:
Expected: "The String"
What happened: "T h e S t r i n g"

Without your input file I've had to make some guesses here
Input file:
%
The
%
String
%
Solution:
my_list = []
with open('MyFile.m') as f:
    for line in f:
        if not line.startswith('%'):
            my_list.append(line.strip("=").strip())
print(' '.join(my_list))
The readline() call was unnecessary, as the for loop already gives you each line. The empty if branch was negated so that it only catches the part you care about. Without your actual input file I can't help with the '=' part. If you have any clarifications I'd be glad to help further.
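One possible cause of the "T h e S t r i n g" output (an assumption, since the formatting code was not posted) is calling join() on a string instead of a list. join() iterates over its argument, and iterating over a string yields single characters. A small sketch of the difference:

```python
words = ["The", "String"]   # a list of cleaned lines
text = "The String"         # a single string

joined_list = " ".join(words)    # separator goes between list items
joined_string = " ".join(text)   # separator goes between every character

print(joined_list)
print(joined_string)
```

If the second form matches what you see, check whether the value being joined is really a list.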

As suggested by Xander, you shouldn't call readline since the for line in f does that for you.
my_list = []
with open('MyFile.m') as f:
    for line in f:
        line = line.strip()  # lose the \n if you want to
        if line.startswith('%'):
            continue
        else:
            my_string = line.strip("=")
            my_list.append(my_string)
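Keep in mind that strip("=") only removes = characters at the very start and end of a string, never in the middle. The sketch below shows the difference on a made-up MATLAB-style assignment line. Splitting at the first = with partition() is just one assumed way of separating the two sides, since the real file contents were not posted.

```python
line = "x = 'The String';\n"    # hypothetical line from a .m file

stripped = line.strip("=")      # no '=' at either end, so nothing changes

# partition() splits once, at the first '=' it finds.
name, _, value = line.strip().partition("=")

print(repr(stripped))
print(name.strip(), "->", value.strip())
```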

python - Trying to do a program to replace a given line, by the same line but all CAPS

Trying to do a college exercise where I'm supposed to replace a given line in a file, by the same line but written in all caps. The problem is we can only write in the same file, and in that exact line, we can't write in the rest of the file.
This is the code I have so far, but I can't figure out how to go to the line I want
def upper(n):
    count=0
    with open("upper.txt", "r+") as file:
        lines = file.readlines()
        file.seek(0)
        for line in file.readlines():
            if count == n:
                pos = file.tell()
                line1 = str(line.upper())
            count += 1
        file.seek(pos)
        file.write(line1)
Help appreciated!

The problem is that your readlines call has already read the entire file, so the position of the "file cursor" is always at the end of the file. In theory, a simple fix should be:
Initialize pos to 0.
Read a single line.
If the current line counter indicates this is the one you want, set the position to pos again, update that line, and exit.
Update pos to point to the end of this line (so it points to the start of the next line).
Loop until satisfied.
In code, that would be this:
def upper(n):
    count=0
    with open("text.txt", "r+") as file:
        pos = 0
        for line in file.readlines():
            if count == n:
                line1 = line.upper()
                break
            pos = file.tell()
            count += 1
        file.seek(pos)
        file.write(line1)
upper(5)
However! There is a snag. File operations are heavily buffered. readlines() reads the entire file before the loop body runs even once, so inside the loop tell() always reports the end of the file. Looping over the file object itself does not help either: for efficiency it reads as much as possible but only "returns" the next line to your program, and on the next pass it simply checks whether it has already read enough of the file to return the following line, refilling its internal buffer if not. So even where tell() correctly reports the external file position (the value you see), it does not reflect the "cursor" position of what you are processing at the time.
One way to circumvent this is to mimic by hand what readlines does: read a single byte at a time, determine whether you have read an entire line (the byte would then be \n), and update your position and status based on this.
However, a more proper way of updating a file is to read it into memory in its entirety, change it, and write it back to disk. When changing part of an existing file with "r+", binary mode is usually recommended (there the position of each byte is known beforehand); admittedly, in theory your method should have worked as well, but as you see the file buffering defeats this.
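For completeness, here is a minimal sketch of the in-place variant in binary mode. It uses explicit readline() calls rather than a loop over the file, so tell() returns the offset of the next line. It assumes ASCII or UTF-8 text, where bytes.upper() changes only ASCII letters and never the byte length. The function name upper_line_in_place is hypothetical.

```python
def upper_line_in_place(path, n):
    """Upper-case line n (0-based) of the file at path, leaving other lines alone.

    Returns True if the line existed, False otherwise.
    """
    with open(path, "r+b") as f:
        for _ in range(n):              # skip the first n lines
            if not f.readline():
                return False            # file has fewer than n lines
        pos = f.tell()                  # byte offset where line n starts
        line = f.readline()
        if not line:
            return False
        f.seek(pos)                     # go back to the start of line n
        f.write(line.upper())           # same length, so nothing else moves
        return True
```

Because the replacement occupies exactly the same bytes as the original line, the rest of the file is never rewritten.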
Reading, changing, and writing the file entirely is as simple as this:
def better_upper(n):
    count=0
    with open("text.txt", "r") as file:
        lines = file.readlines()
    lines[n] = lines[n].upper()
    with open("text.txt", "w") as file:
        file.writelines(lines)
better_upper(5)
(Where the only caveat is that it always overwrites the original file. That is: if something unexpected goes wrong, it will probably erase text.txt. If you want a belt-and-suspenders approach, write to a new file, then check if it got written correctly. If it did, delete the old file and rename the new one. Left as an exercise to the reader.)
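One way the belt-and-suspenders version might look, as a sketch rather than a definitive implementation: the new contents go to a temporary file in the same directory, and os.replace() then swaps it in, which covers the "delete the old file and rename the new one" step in a single call. The function name upper_line_safely is made up.

```python
import os
import tempfile


def upper_line_safely(path, n):
    """Upper-case line n (0-based) by writing a new file and swapping it in."""
    with open(path, "r") as src:
        lines = src.readlines()
    lines[n] = lines[n].upper()

    # Write to a temporary file in the same directory, so the final
    # rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.writelines(lines)
        tmp_path = tmp.name

    # Only now replace the original; until this call it is untouched.
    os.replace(tmp_path, path)
```

If anything fails before the os.replace() call, the original file is still intact, although a leftover temporary file may need cleaning up.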

Write a program in Python 3.5 that reads a file, then writes a different file with the same text that was in the first one as well as more?

The exact question to this problem is:
Create a file with 20 lines of text and name it “lines.txt”. Write a program to read this file “lines.txt” and write the text to a new file, “numbered_lines.txt”, that will also have line numbers at the beginning of each line.
Example:
Input file: “lines.txt”
Line one
Line two
Expected output file:
1 Line one
2 Line two
I am stuck, and this is what I have so far. I am a true beginner to Python and my instructor does not make things very clear. Critique and help much appreciated.
file_object=open("lines.txt",'r')
for ln in file_object:
    print(ln)
count=1
file_input=open("numbered_lines.txt",'w')
for Line in file_object:
    print(count,' Line',(str))
    count=+1
file_object.close
file_input.close
All I get for output is the .txt file I created stating lines 1-20. I am very stuck and honestly have very little idea about what I am doing. Thank you

You have all the right parts, and you're almost there:
When you do
for ln in file_object:
    print(ln)
you've exhausted the contents of that file, and you won't be able to read them again, like you try to do later on.
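A small sketch of that exhaustion, and of how seek(0) rewinds the same object if you do want a second pass. io.StringIO stands in for the opened lines.txt here so that the example runs without any file on disk; a real file object behaves the same way in this respect.

```python
import io

# Stand-in for open("lines.txt"), so the sketch is self-contained.
file_object = io.StringIO("Line one\nLine two\n")

first_pass = [ln for ln in file_object]    # reads to the end
second_pass = [ln for ln in file_object]   # nothing left to read
file_object.seek(0)                        # rewind to the beginning
third_pass = [ln for ln in file_object]    # the same lines again
```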
Also, print writes to the screen by default, not to your file; you want file_input.write(...)
This should fix all of that:
infile = open("lines.txt", 'r')
outfile = open("numbered_lines.txt", 'w')
line_number = 1
for line in infile:
    outfile.write(str(line_number) + " " + line)
    line_number += 1  # advance the counter for the next line
infile.close()
outfile.close()
However, here is a more pythonic way to do it:
with open("lines.txt") as infile, open("numbered_lines.txt", 'w') as outfile:
    for i, line in enumerate(infile, 1):
        outfile.write("{} {}".format(i, line))

Good first try, and with that, I can go through your code and explain what you did right (or wrong)
file_object=open("lines.txt",'r')
for ln in file_object:
    print(ln)
This is fine, though generally you want to put a space before and after assignments (you are assigning the result of open to file_object) and add a space after a comma when separating arguments, so you might want to write it like so:
file_object = open("lines.txt", 'r')
for ln in file_object:
    print(ln)
However, at this point the internal position of file_object has reached the end of the file, so if you wish to reuse the same object, you need to seek back to the beginning, which is position 0. As your assignment only says to write to the file (and not to the screen), the above loop should be left out of the program (I get that you want to see the contents of the file immediately, but instructors are sometimes pretty strict about what they accept). Moving on:
count=1
file_input=open("numbered_lines.txt",'w')
for Line in file_object:
Looks pretty normal so far, again, minor formatting issues. In Python, typically we name all variables lower-case, as names with Capitalization are generally reserved for class names (if you wish to, you may read about them). Now we enter into the loop you got
print(count,' Line',(str))
This does not print quite what you want. As ' Line' is enclosed in quotes, it is treated as a string literal, so it's taken literally as text and not as code. Given that you had assigned Line, you want to take out the quotes. The (str) at the end simply prints out the str type object, which is definitely not what you want. Also, you forgot to specify the file you want to print to. By default it will print to the screen, but you want to print to the numbered_lines.txt file, which you had opened and assigned to file_input. We will correct this later.
count=+1
If you format this differently, you are assigning +1 to count. I am guessing you wanted to use the += operator to increment it. Remember this on your quiz/tests.
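The difference is easy to check in isolation. In this short sketch the names after_assign and after_increment are only there to record the intermediate values:

```python
count = 1
count =+ 1      # parsed as count = (+1), so count is still 1
after_assign = count

count += 1      # augmented assignment, so count becomes 2
after_increment = count
```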
Finally:
file_object.close
file_input.close
They are meant to be called as functions, you need to invoke them by adding parentheses at the end with arguments, but as close takes no arguments, there will be nothing inside the parentheses. Putting everything together, the complete corrected code for your program should look like this
file_object = open("lines.txt", 'r')
count = 1
file_input = open("numbered_lines.txt", 'w')
for line in file_object:
    print(count, line, file=file_input)
    count += 1
file_object.close()
file_input.close()
Run the program. You will notice that there is an extra empty line between every line of text. This is because by default the print function adds a newline character at the end; the line you got from the file already included a newline character at the end (that's what makes them lines, right?), so we don't have to add our own here. You can of course change the ending to an empty string. That line will look like this.
print(count, line, file=file_input, end='')
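To see the effect without touching any files, here is a small sketch that prints into io.StringIO buffers, which stand in for the output file:

```python
import io

line = "Line one\n"          # lines read from a file keep their newline

with_default = io.StringIO()
print(1, line, file=with_default)            # print appends its own "\n"

with_end = io.StringIO()
print(1, line, file=with_end, end="")        # nothing extra is appended
```

with_default.getvalue() ends in two newline characters, which is the empty line you see, while with_end.getvalue() ends in just one.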
Naturally, other Python programmers will tell you that there are Pythonic ways, but you are just starting out, don't worry too much about them (although you can definitely pick up on this later and I highly encourage you to!)

The right way to open a file is using a with statement:
with open("lines.txt",'r') as file_object:
    ... # do something
That way, the context manager introduced by with will close your file at the end of "something" or in case of an exception.
Of course, you can close the file yourself if you are not familiar with that. Note that close is a method: to call it you need parentheses:
file_object.close()
See the chapter 7.2. Reading and Writing Files, in the official documentation.
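A minimal sketch of the "in case of an exception" part. io.StringIO stands in for the opened file so that the example is self-contained, and the ValueError is a simulated failure:

```python
import io

stream = io.StringIO("Line one\n")   # stands in for open("lines.txt")
try:
    with stream as file_object:
        file_object.readline()
        raise ValueError("something went wrong")   # simulated failure
except ValueError:
    pass

# The with block closed the stream on the way out, despite the error.
closed_after_error = stream.closed
```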

In the first loop you're printing the contents of the input file. This means that the file contents have already been consumed when you get to the second loop. (Plus the assignment didn't ask you to print the file contents.)
In the second loop you're using print() instead of writing to a file. Try file_input.write(str(count) + " " + Line). (And file_input seems like a bad name for a file that you will be writing to.)
count=+1 sets count to +1, i.e. positive one. I think you meant count += 1 instead.
At the end of the program you're calling .close instead of .close(). The parentheses are important!
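A quick way to convince yourself about the parentheses, again with io.StringIO standing in for an open file:

```python
import io

file_object = io.StringIO("Line one\n")   # stands in for an open file

file_object.close                # only looks up the method; nothing is called
still_open = not file_object.closed

file_object.close()              # actually calls it
now_closed = file_object.closed
```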

Python: How to write a list of strings on separate lines but without a blank line

EDIT: See bottom of post for the entire code
I am new to this forum and I have an issue that I would be grateful for any help solving.
Situation and goal:
- I have a list of strings. Each string is one word, like this: ['WORD', 'LINKS', 'QUOTE' ...] and so on.
- I would like to write this list of words (strings) on separate lines in a new text file.
- One would think the way to do this would be by appending the '\n' to every item in the list, but when I do that, I get a blank line between every list item. WHY?
Please have a look at this simple function:
def write_new_file(input_list):
    with open('TEKST\\TEKST_ny.txt', mode='wt') as output_file:
        for linje in input_list:
            output_file.write(linje + '\n')
This produces a file that looks like this:
WORD
(blank line)
LINKS
(blank line)
QUOTE
If I remove the '\n', then the file looks like this:
WORDLINKSQUOTE
Instead, the file should look like this:
WORD
LINKS
QUOTE
I am obviously doing something wrong, but after a lot of experimenting and reading around the web, I can't seem to get it right.
Any help would be deeply appreciated, thank you!
Response to link to thread about write() vs. writelines():
Writelines() doesn't fix this by itself, it produces the same result as write() without the '\n'. Unless I add a newline to every list item before passing it to the writelines(). But then we're back at the first option and the blank lines...
I tried to use one of the answers in the linked thread, using '\n'.join() and then write(), but I still get the blank lines.
It comes down to this: For some reason, I get two newlines for every '\n', no matter how I use it. I am .strip()'ing the list items of newline characters to be sure, and without the nl everything is just one massive block of texts anyway.
On using another editor: I tried opening the txt file in Windows Notepad and in Notepad++. Any reason why these programs wouldn't display it correctly?
EDIT: This is the entire code. Sorry for the Norwegian naming. The purpose of the program is to read and clean up a text file and return the words first as a list and ultimately as a new file with each word on a new line. The text file is a list of Scrabble words, so it's rather big (9 MB or something). PS: I don't advocate Scrabble cheating, this is just a programming exercise :)
def renskriv(opprinnelig_ord):
    nytt_ord = ''
    for bokstav in opprinnelig_ord:
        if bokstav.isupper() == True:
            nytt_ord = nytt_ord + bokstav
    return nytt_ord
def skriv_ny_fil(ny_liste):
    with open('NSF\\NSF_ny.txt', 'w') as f:
        for linje in ny_liste:
            f.write(linje + '\n')
def behandle_kildefil():
    innfil = open('NSF\\NSF_full.txt', 'r')
    f = innfil.read()
    kildeliste = f.split()
    ny_liste = []
    for item in kildeliste:
        nytt_ord = renskriv(item)
        nytt_ord = nytt_ord.strip('\n')
        ny_liste.append(nytt_ord)
    skriv_ny_fil(ny_liste)
    innfil.close()
def main():
    behandle_kildefil()
if __name__ == '__main__':
    main()

I think there must be some '\n' among your lines; try skipping empty lines.
I suggest this code:
def write_new_file(input_list):
    with open('TEKST\\TEKST_ny.txt', 'w') as output_file:
        for linje in input_list:
            if linje.strip():  # skip items that are empty or only whitespace
                output_file.write(linje.strip() + '\n')
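One possible source of such empty items is the cleaning step itself. This is an assumption based on the posted code, not something confirmed in the thread. renskriv() keeps only upper-case letters, so any token without one comes back as an empty string, and writing '' + '\n' produces an empty line. The sketch below filters those out before writing. clean_word and clean_words are hypothetical stand-ins for the poster's functions, and the sample tokens are invented.

```python
def clean_word(token):
    """Keep only the upper-case letters of token (same idea as renskriv)."""
    return "".join(ch for ch in token if ch.isupper())


def clean_words(tokens):
    """Clean every token and drop the ones that end up empty."""
    cleaned = (clean_word(t) for t in tokens)
    return [w for w in cleaned if w]


# Made-up input: two of the five tokens contain no upper-case letters.
words = clean_words(["WORD", "n.", "LINKS", "12", "QUOTE"])
output = "".join(w + "\n" for w in words)
```

Without the filter in clean_words, the two empty results would each be written as a line of their own.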

You've said in the comments that Python is writing two carriage return ('\r') characters for each line feed ('\n') character you write. It's a bit bizarre that Python is replacing each line feed with two carriage returns, but this is a feature of opening a file in text mode (normally the translation would be to something more useful). If instead you open your file in binary mode then this translation will not be done and the file should display as you wish in Notepad++. NB. Using binary mode may cause problems if you need characters outside the ASCII range; ASCII is basically just Latin letters (no accents), digits and a few symbols.
For python 2 try:
filename = "somefile.txt"
with open(filename, mode="wb") as outfile:
    outfile.write("first line")
    outfile.write("\n")
    outfile.write("second line")
Python 3 will be a bit more tricky. Each string literal you wish to write must be prefixed with a b (for bytes). Any string you don't have immediate access to, or don't wish to change to a bytes literal, must be encoded using the encode() method on the string, e.g.
filename = "somefile.txt"
with open(filename, mode="wb") as outfile:
    outfile.write(b"first line")
    outfile.write(b"\n")
    some_text = "second line"
    outfile.write(some_text.encode())
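In Python 3 there is also a way to stay in text mode. open() accepts a newline argument, and with newline='\n' (or '') the '\n' characters you write are passed through as-is, without platform translation. A minimal sketch, where the function name write_lines and the choice of UTF-8 are assumptions made for illustration:

```python
def write_lines(path, words):
    """Write one word per line with plain '\\n' line endings on every platform."""
    with open(path, "w", newline="\n", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")
```

This keeps ordinary str values and non-ASCII characters working, which the binary-mode version above makes harder.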
