python text splitter program


I have a program that splits a text into sentences using the following rule:
Sentence boundaries occur at ".", "?" and "!" except for:
A) Periods followed by a digit with no intervening whitespace.
B) Periods followed by whitespace and then a lowercase letter.
C) Periods not followed by whitespace.
D) Periods preceded by a title.
My code is given below:
file_name = raw_input("Enter the name of the text file: ")
txt_file = open('%s.txt' % file_name, 'r+')
text = txt_file.readline()
print; print "Original text is: "; print
print text
new_wrd = []
new_line = []
new_txt = []
while len(text.strip()) != 0:
    for index, char in enumerate(text):
        print char
        if char == "." or char == "?" or char == "!":
            if text[index+1] == " ":
                if ("".join(new_wrd) == "Mrs" or "".join(new_wrd) == "Mr" or "".join(new_wrd) == "Ms"
                        or "".join(new_wrd) == "Dr" or "".join(new_wrd) == "Jr"):
                    new_wrd.append(char)
                else:
                    if text[index+2].isupper():
                        new_line.append("".join(new_wrd))
                        new_line.append(char)
                        new_txt.append("".join(new_line))
                        new_line = []
                        new_wrd = []
                    else:
                        new_line.append("".join(new_wrd))
                        new_line.append(char + " ")
                        new_wrd = []
            else:
                new_wrd.append(char)
        elif char == " ":
            if ("".join(new_wrd) == "Mrs." or "".join(new_wrd) == "Mr." or "".join(new_wrd) == "Ms."
                    or "".join(new_wrd) == "Dr." or "".join(new_wrd) == "Jr.") or new_wrd != []:
                new_line.append("".join(new_wrd))
                new_line.append(" ")
                new_wrd = []
        else:
            new_wrd.append(char)
    text = txt_file.readline()
for txt in new_txt:
    print txt
    txt_file.write(txt)
For the given example:
Mr. XYZ is a good boy. He has just pass his B.Tech degree from ABC, Lmnop... At least, he has passed the degree.
The output should show:
Mr. XYZ is a good boy.
He has just pass his B.Tech degree from ABC, Lmnop...
At least, he has passed the degree.
But instead, it shows:
Mr. XYZ is a good boy.
He has just pass his B.Tech degree from ABC, Lmnop...
What corrections are needed to get the proper output?
Also, the line
txt_file.write(txt)
is not working. Why?
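
One possible approach, sketched in Python 3 rather than the Python 2 of the question: treat every run of ".", "?" or "!" that is followed by whitespace or by the end of the text as a candidate boundary, then reject the candidates that fall under rule B or rule D. Rules A and C hold automatically, because a terminator followed directly by a non-space character is never a candidate. The names split_sentences and TITLES are illustrative, and applying rule B to "?" and "!" as well as to periods is an assumption.

```python
import re

# Titles from the question's code; extend as needed (assumption: no trailing period here).
TITLES = ("Mr", "Mrs", "Ms", "Dr", "Jr")


def split_sentences(text):
    """Split text into sentences following rules A-D from the question."""
    sentences = []
    start = 0
    # Candidate boundary: a run of terminators followed by whitespace or end of text.
    # A terminator followed by any other character (rules A and C) never matches.
    for match in re.finditer(r"[.?!]+(?=\s|$)", text):
        end = match.end()
        following = text[end:].lstrip()
        # Rule B: the next word starts with a lowercase letter, so keep going.
        if following and following[0].islower():
            continue
        # Rule D: a single period right after a title is not a boundary.
        words = text[start:match.start()].split()
        if match.group() == "." and words and words[-1] in TITLES:
            continue
        sentences.append(text[start:end].strip())
        start = end
    # Keep any trailing text that has no terminator.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = ("Mr. XYZ is a good boy. He has just pass his B.Tech degree "
          "from ABC, Lmnop... At least, he has passed the degree.")
for sentence in split_sentences(sample):
    print(sentence)
```

For the sample text this prints the three expected sentences, including the last one, because the end of the text counts as a boundary. To save the result, a simple option is to write the sentences to a separate output file inside a with block, which guarantees the file is flushed and closed. Note that write() does not add a newline, so each sentence needs an explicit "\n".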

Related

Python strings formatting

Given a string containing at least one space character.
Output the substring located between the first and second spaces of the source string. If the string contains only one space, then output an empty string.
My attempt is below, but the output is incorrect. For example, for user_input = "Hello World my name" the output is "World my". I don't know why. Can you help me?
user_input = input("Enter your string: ")
space_counter = 0
for char in user_input:
    if char == " ":
        space_counter += 1
if space_counter > 1:
    start_space_index = None
    for i in range(len(user_input)):
        if user_input[i] == " ":
            start_space_index = i
            break
    second_space_index = None
    for i in range(len(user_input)-1, -1, -1):
        if user_input[i] == " ":
            second_space_index = i
            break
    print(user_input[start_space_index+1: second_space_index])
else:
    print("Empty string")

Example 1:
Assuming the input is:
hello my name is abc
the output should be:
hello my
Example 2:
Input:
hello my
Output:
None
Code:
a = 'hello my name is abc'
obj = a.split(" ")  # splits into ['hello', 'my', 'name', 'is', 'abc']
if len(obj) > 2:
    print(obj[0], obj[1])
else:
    print(None)

Here it is:
user_input = input("Enter your string: ")
Lst = user_input.split(" ")
space_counter = 0
for char in user_input:
    if char == " ":
        space_counter += 1
if space_counter > 1:
    start_space_index = None
    for i in range(len(user_input)):
        if user_input[i] == " ":
            start_space_index = i
            break
    second_space_index = None
    for i in range(len(user_input)-1, -1, -1):
        if user_input[i] == " ":
            second_space_index = i
            break
    if user_input[0] == " ":
        print(Lst[0])
    else:
        print(Lst[1])
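
The code in the question prints "World my" because its second loop scans from the end of the string, so it finds the last space rather than the second one. A minimal sketch that instead looks for the second space starting just after the first, using str.find (the function name between_first_two_spaces is illustrative, not from the question):

```python
def between_first_two_spaces(s):
    """Return the text between the first and second space, or "" if there is no second space."""
    first = s.find(" ")
    if first == -1:
        return ""
    # Start the second search one character past the first space.
    second = s.find(" ", first + 1)
    if second == -1:
        return ""
    return s[first + 1:second]


print(between_first_two_spaces("Hello World my name"))  # World
```

This walks the string at most once and needs no space counter, since find returns -1 when there is no further space.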

Why my each word reversing code is not reversing some words?

What I want to do
I am trying to write a program that reverses each word but does not reverse the contents of tags.
Example input and output:
Input:
Thank you stack overflow
Output:
knahT uoy kcats wolfrevo
Text inside a tag should not be reversed. Like this:
Input:
<tag>something
Output:
<tag>gnihtemos
My code
I tried to solve this using a stack.
s = input()
def stackprint(st):
    while st != []:
        print(st.pop(), end="")
stack = []
tag = False
for ch in s:
    if ch == '<':
        stackprint(stack)
        tag = True
        print(ch, end="")
    elif ch == '>':
        tag = False
        print(ch, end="")
    elif tag:
        print(ch, end="")
    else:
        if ch == ' ':
            stackprint(stack)
            print(ch, end="")
        else:
            stack.append(ch)
print("".join(stack))
The problem
But my code does not work if there is only one word or if there is no tag. When there is no tag, the last word is not reversed, and when there is only one word, it is not reversed at all.
The output now:
First
Input:
<tag>something
Output:
<tag>something
^ I need something to be reversed.
Second
Input:
Thank you stack overflow
Output:
knahT uoy kcats overflow
^ I need overflow to be reversed.
Important
Whatever is inside < > should not be reversed. For example:
Input:
<tag>word<tag>
Output:
<tag>drow<tag>
There will be no space between a tag and a word:
Thank you <tag>stack overflow
knahT uoy <tag>kcats wolfrevo

As I've mentioned in the comment section, instead of printing the stack with the join method, calling the stackprint method to ensure that the stack is emptied will give you the desired result.
s = input()
def stackprint(st):
    while st != []:
        print(st.pop(), end="")
stack = []
tag = False
for ch in s:
    if ch == '<':
        stackprint(stack)
        tag = True
        print(ch, end="")
    elif ch == '>':
        tag = False
        print(ch, end="")
    elif tag:
        print(ch, end="")
    else:
        if ch == ' ':
            stackprint(stack)
            print(ch, end="")
        else:
            stack.append(ch)
stackprint(stack)

This seems to work with the examples you have provided:
def revSetence(sentence):
    sentence = sentence + " "
    flag = False
    final_sentence = ""
    word = ""
    for letter in sentence:
        if letter == "<":
            flag = True
        if letter == ">":
            flag = False
        if letter.isalpha():
            if flag:
                final_sentence = final_sentence + letter
            else:
                word = word + letter
        else:
            if len(word) > 0:
                final_sentence = final_sentence + word[::-1]
            final_sentence = final_sentence + letter
            word = ""
    return final_sentence
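
As an alternative to tracking state by hand, the same behaviour can be sketched with re.sub: match either a whole tag or a run of characters outside a tag, and reverse only the latter. The function name reverse_words_outside_tags is illustrative, and the pattern assumes tags are well formed (every "<" has a closing ">").

```python
import re


def reverse_words_outside_tags(text):
    """Reverse each word in text while leaving <...> tags untouched."""
    def flip(match):
        token = match.group()
        # Tokens that start with "<" are complete tags and are kept as they are.
        return token if token.startswith("<") else token[::-1]

    # First alternative: a complete tag. Second: a run without whitespace or "<".
    return re.sub(r"<[^>]*>|[^\s<]+", flip, text)


print(reverse_words_outside_tags("Thank you <tag>stack overflow"))
```

Because whitespace is never part of a match, spaces between words are preserved exactly and no trailing space is added.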

Isolating the first word of a string

This is the code I've written so far:
def first_word(text: str) -> str:
    while text.find(' ') == 0:
        text = text[1:]
    while text.find('.') == 0:
        text = text[1:]
    while text.find(' ') == 0:
        text = text[1:]
    while text.find('.') == 0:
        text = text[1:]
    if text.find('.') != -1:
        text = text.split('.')
    elif text.find(',') != -1:
        text = text.split(',')
    elif text.find(' ') != -1:
        text = text.split(' ')
    text = text[0]
    return text
It's supposed to isolate the first word in a string: it should delete any ".", " " and "," and keep only the word itself.

Using re, and split():
import re
ss = 'ba&*(*seball is fun'
print(''.join(re.findall(r'(\w+)', ss.split()[0])))
Output:
baseball

sentence="bl.a, bla bla"
first_word=sentence.split(" ")[0]
first_word=first_word.replace(".","").replace(",","")
print(first_word)
Or you could try a list comprehension:
sentence="bl.a, bla bla"
first_word=sentence.split(" ")[0]
first_word=''.join([e for e in first_word if e not in ".,"]) #or any other punctuation
print(first_word)
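
Another option is a single regular-expression search. This sketch assumes, as the question's code does when it splits on "." and ",", that those characters end a word rather than being removed from inside it, so "bl.a, bla bla" gives "bl" here instead of "bla".

```python
import re


def first_word(text: str) -> str:
    """Return the first run of characters that are not whitespace, '.' or ','."""
    match = re.search(r"[^\s.,]+", text)
    # An input made only of separators has no word at all.
    return match.group() if match else ""


print(first_word("... and so on ..."))  # and
```

Leading spaces and periods are skipped by the search itself, so the repeated while loops are not needed.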

Looping through a string and only returning certain characters. Python

I am having trouble writing a function that should return the lowercase letters, "_" and "." first, followed by the uppercase letters, " " and "|". My version also returns digits and special characters such as <, > and #, which I don't want. The function is also supposed to read through the input string only once, and I don't know whether my code achieves that.
My code is:
def split_iterative(n):
    splitted_first = ""
    splitted_second = ""
    for i in n:
        if i == i.lower() or i == "_" or i == ".":
            splitted_first = splitted_first + i
        elif i == i.upper() or i == " " or i == "|":
            splitted_second = splitted_second + i
    return splitted_first + splitted_second
If I call split_iterative("'lMiED)teD5E,_hLAe;Nm,0#Dli&Eg ,#4aI?rN#T§&e7#4E #<(S0A?<)NT8<0'") it returns "'li)te5,_he;m,0#li&g ,#4a?r#§&e7#4 #<(0?<)8<0'MEDDELANDEINTESANT", which is incorrect: it should drop all the special characters and digits. How do I fix this? It should return ('lite_hemligare', 'MEDDELANDE INTE SANT').

You could try this:
def f(input_string):
    str1 = str2 = ""
    for character in input_string:
        if character.isalpha():
            if character.islower():
                str1 += character
            else:
                str2 += character
        elif character in "_.":
            str1 += character
        elif character in " |":
            str2 += character
    return str1, str2
Output:
>>> input_string = "'lMiED)teD5E,_hLAe;Nm,0#Dli&Eg ,#4aI?rN#T§&e7#4E #<(S0A?<)NT8<0'"
>>> print(f(input_string))
('lite_hemligare', 'MEDDELANDE INTE SANT')

This happens because characters that are not letters have no case: lowercasing a special character returns the same character, e.g. '#'.lower() == '#'. So '#' and every other special character passes your first test. You should check explicitly for letters using the isalpha() string method:
(i.isalpha() and i.lower() == i) or i == '_' or i == '.'
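
A quick check makes the difference visible. For characters without case, such as "#" or "5", the comparison i == i.lower() is true, while str.islower() is false. The helper name looks_lowercase is illustrative only.

```python
def looks_lowercase(ch):
    """The comparison used in the question: also true for digits and symbols."""
    return ch == ch.lower()


# Print the character, the question's test, and the stricter islower() test.
for ch in "a#5A":
    print(repr(ch), looks_lowercase(ch), ch.islower())
```

Only "a" passes both tests. "#" and "5" pass the comparison but fail islower(), which is why they ended up in the first string.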

First, to make it return a list, return a list of the two strings instead of the concatenated string.
Second, you are not filtering out the unwanted characters. One way to do that is to check whether the character is a letter using the isalpha() method.
Something like this:
def split_iterative(n):
    splitted_first = ""
    splitted_second = ""
    for i in n:
        if (i.isalpha() and i == i.lower()) or i == "_" or i == ".":
            splitted_first = splitted_first + i
        elif (i.isalpha() and i == i.upper()) or i == " " or i == "|":
            splitted_second = splitted_second + i
    # returns a list; you can assign it to a variable if you need
    return [splitted_first, splitted_second]

You can use ASCII values for the filtering of characters:
def split_iterative(n):
    splitted_first = ""
    splitted_second = ""
    for i in n:
        # range() excludes its end value, so 123 and 91 are needed to include 'z' and 'Z'
        if ord(i) in range(97, 123) or i == "_" or i == ".":
            splitted_first = splitted_first + i
        elif ord(i) in range(65, 91) or i == " " or i == "|":
            splitted_second = splitted_second + i
    return (splitted_first, splitted_second)

You can make use of two lists while walking through characters of your text.
You can append lowercase, underscore, and stop characters to one list then uppercase, space and pipe characters to the other.
Finally return a tuple of each list joined as strings.
def splittext(txt):
    slug, uppercase_letters = [], []
    slug_symbols = {'_', '.'}
    uppercase_symbols = {' ', '|'}
    for letter in txt:
        if letter.islower() or letter in slug_symbols:
            slug.append(letter)
        if letter.isupper() or letter in uppercase_symbols:
            uppercase_letters.append(letter)
    return ''.join(slug), ''.join(uppercase_letters)
txt="'lMiED)teD5E,_hLAe;Nm,0#Dli&Eg ,#4aI?rN#T§&e7#4E #<(S0A?<)NT8<0'"
assert splittext(txt) == ("lite_hemligare", "MEDDELANDE INTE SANT")

compress and decompress text files in python

I need to fix this program so that it removes the space before punctuation in the decompressed file. For example, when the original text is decompressed, there is a space between a word and the punctuation that follows it.
Example: cheese ,
should return cheese,
def RemoveSpace(ln): #subroutine used to remove the spaces after the punctuation
    line = ""
    line2 = ""
    puncpst = []
    for g in range(1, len(line)):
        if line[g] == "." or line[g] == "," or line[g] == "!" or line[g] == "?":
            puncpst.append(g) #get the positions of punctuation marks in a list
    for b in range(len(line)):
        if b + 1 not in puncpst:
            line2 = line2 + line[b]
    return line2

The code does not work because of the indentation after the if statement. Please correct the indentation as below:
if b+1 not in puncpst:
    line2 = line2+line[b]
Another way to handle it is to replace the space directly in the string. Strings are immutable, so assign the result back:
line = line.replace(" .", ".")
line = line.replace(" ,", ",")

It sounds like your program should be like this:
def RemoveSpace(line):
    puncpst = []
    for g in range(1, len(line)):
        if line[g] == "." or line[g] == "," or line[g] == "!" or line[g] == "?":
            puncpst.append(g) #get the positions of punctuation marks in a list
    ret = ""
    for b in range(len(line)):
        if b + 1 not in puncpst:
            ret += line[b]
    return ret
Your original had def RemoveSpace(ln): where ln was never used.
An improved version, taking a lead from @v.coder, might be like this:
def RemoveSpace2(line):
    punctuation = ['.', ',', '!', '?']
    for p in punctuation:
        original = ' ' + p
        line = line.replace(original, p)
    return line
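
The same idea can also be sketched as one regular-expression substitution that removes any run of spaces directly before one of the four punctuation marks. The function name remove_space_before_punctuation is illustrative. Unlike the index-based RemoveSpace above, which drops whatever character precedes a punctuation mark, this version only ever removes spaces.

```python
import re


def remove_space_before_punctuation(line):
    """Delete spaces that sit directly before '.', ',', '!' or '?'."""
    # The punctuation mark is captured and written back; only the spaces are dropped.
    return re.sub(r" +([.,!?])", r"\1", line)


print(remove_space_before_punctuation("cheese , and wine ."))  # cheese, and wine.
```

Text that is already correctly spaced, such as "cheese,", passes through unchanged.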
