compress and decompress text files in python


I need to fix this program so that it removes the space before punctuation in the decompressed file. For example, when the original text is decompressed, there is a space between a word and the punctuation that follows it.
Example: cheese ,
should return: cheese,
def RemoveSpace(ln):  # subroutine used to remove the spaces after the punctuation
    line = ""
    line2 = ""
    puncpst = []
    for g in range(1, len(line)):
        if line[g] == "." or line[g] == "," or line[g] == "!" or line[g] == "?":
            puncpst.append(g)  # get the positions of punctuation marks in a list
    for b in range(len(line)):
        if b + 1 not in puncpst:
        line2 = line2 + line[b]
    return line2

The code does not work because of the indentation after the if statement. Please correct the indentation as below:
    if b + 1 not in puncpst:
        line2 = line2 + line[b]
Another way to handle it is to replace the space directly in the string. Note that str.replace returns a new string, so the result has to be assigned back:
line = line.replace(" .", ".")
line = line.replace(" ,", ",")

It sounds like your program should be like this:
def RemoveSpace(line):
    puncpst = []
    for g in range(1, len(line)):
        if line[g] == "." or line[g] == "," or line[g] == "!" or line[g] == "?":
            puncpst.append(g)  # get the positions of punctuation marks in a list
    ret = ""
    for b in range(len(line)):
        # drop a character only if it is a space directly before a punctuation mark
        if not (line[b] == " " and b + 1 in puncpst):
            ret += line[b]
    return ret
Your original had def RemoveSpace(ln): where ln was never used, and line was reset to an empty string inside the function.
An improved version, taking a lead from @v.coder, might be like this:
def RemoveSpace2(line):
    punctuation = ['.', ',', '!', '?']
    for p in punctuation:
        original = ' ' + p
        line = line.replace(original, p)  # str.replace returns a new string
    return line
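For comparison, the same clean-up can be sketched with a single regular expression. This is only an illustration, assuming the punctuation set is the four marks used above; the function name remove_space_before_punctuation is made up for this example.

```python
import re

def remove_space_before_punctuation(line):
    # \s+ matches the run of whitespace; the captured mark is written back by \1.
    # Assumption: only . , ! ? count as punctuation, as in the answers above.
    return re.sub(r"\s+([.,!?])", r"\1", line)

print(remove_space_before_punctuation("cheese , wine and bread !"))
# cheese, wine and bread!
```

Unlike the replace loop, this also removes several spaces in a row before a mark.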

Related

Best approach for converting lowerCamelCase to snake_case

I came across the following scenario:
Input:
parselTongue
Expected output:
parsel_tongue
My code:
empty_string = ""
word = input()
if word.islower() == 1:
    empty_string = empty_string + word
    print(empty_string)
else:
    for char in word:
        char = str(char)
        if char.isupper() == 1:
            x = char
            y = word.find(x)
            print(char.replace(char, word[0:y] + "_" + char.lower() + word[y:]))
My output:
parsel_tTongue
Please advise where I am going wrong, as my output is "parsel_tTongue" and not "parsel_tongue".

A more elegant solution is to implement the logic with a comprehension (a generator expression passed to join):
word = input()
output = ''.join(c if not c.isupper() else f'_{c.lower()}' for c in word)
# output for the input parselTongue: 'parsel_tongue'

I believe this approach could be better.
It also copes with words that contain special characters or digits as well as letters.
word = "camelCaseWord"
res = ""  # snake case word
# handle a leading uppercase character
if word[0].isupper():
    word = word[0].lower() + word[1:]
for w in word:
    # only letters can be uppercase
    if w.isupper():
        res += "_" + w.lower()
    else:
        res += w
print(res)
# prints: camel_case_word
With word = "camelCase3Wor&" this prints camel_case3_wor&

No need for a loop; use a regex:
import re
name = 'parselTongue'
# insert "_" before every capital letter that is not at the start, then lowercase
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name)  # parsel_tongue

Adjust the slice on word so that the original uppercase letter is skipped (word[y+1:] instead of word[y:]):
empty_string = ""
word = input()
if word.islower() == 1:
    empty_string = empty_string + word
    print(empty_string)
else:
    for char in word:
        char = str(char)
        if char.isupper() == 1:
            x = char
            y = word.find(x)
            print(char.replace(char, word[0:y] + "_" + char.lower() + word[y+1:]))
This prints the following for the input parselTongue:
parselTongue
parsel_tongue

The best practice may be to use a regex:
import re
# fooBarBaz -> foo_bar_baz
re.sub(r'([A-Z])', lambda match: '_' + match.group(1).lower(), 'fooBarBaz')
# foo_bar_baz -> fooBarBaz
re.sub(r'_([a-z])', lambda match: match.group(1).upper(), 'foo_bar_baz')

import re
camel_case = 'miaBau'
snake_case = re.sub(r'([A-Z])', r'_\1', camel_case).lower()
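The single-substitution patterns above put an underscore in front of every capital, so a leading capital or a run of capitals (an acronym) gets split letter by letter. A two-pass regex is one common way around that. The sketch below is not taken from any of the answers; the function name camel_to_snake and the acronym example are assumptions made for illustration.

```python
import re

def camel_to_snake(name):
    # Pass 1: put "_" before a capital that starts a capitalised word,
    # e.g. "HTTPResponse" becomes "HTTP_Response".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Pass 2: put "_" between a lowercase letter or digit and a capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()

print(camel_to_snake("parselTongue"))     # parsel_tongue
print(camel_to_snake("getHTTPResponse"))  # get_http_response
```

Because the first group requires a preceding character, a word that starts with a capital does not gain a leading underscore.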

Replace a sequence of characters by another one

I have a sequence of characters '-------' and I want to replace each '-' in it with the corresponding letter of 'jaillir', at the correct position.
How do I do that?
Here is my code:
import random
with open("lexique.txt", "r", encoding="utf8") as a:
    words = []
    letters = []
    tirets = []
    for line in a:
        ligne = line[:-1]
        words.append(ligne)
choix = random.choice(words)
tiret = '-' * len(choix)
print(tiret)
print(choix)
accompli = False
while not accompli:
    lettre = input("Entrez une lettre du mot ")  # "Enter a letter of the word"
    for t in range(len(tiret)):
        if lettre in choix:
            tiret.replace(tiret[t], lettre[t])
    print(tiret)

I think you need to fix your file reading code, even though it is not the question, as below:
with open('lexique.txt', 'r', encoding='utf8') as f:
    text = f.read()  # get file contents
Next, to replace the ---- with a word, I am assuming that the dashes in your text will only ever be the same length as the word, so:
word = 'word'  # any string e.g. word
dashes = '-' * len(word)
So now you can use Python's str.replace method like so:
text = text.replace(dashes, word)  # every time it finds the sequence of dashes it will be replaced by your word
With a for loop (gradual replacement):
word = 'word'  # any word
tempword = word  # letters of the word that are still to be placed
new = ''
for i, letter in enumerate(text):
    # replace a dash only while enough dashes remain for the rest of the word
    if letter == '-' and text[i:i + len(tempword)] == '-' * len(tempword):
        new += tempword[0]
        tempword = tempword[1:] or word  # start over once the word is used up
    else:
        new += letter
print(new)
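The question's loop reads like a hangman game, where each guessed letter should be uncovered in the dashed string. Since Python strings are immutable, one option is to build a new masked string on every guess. The sketch below assumes that reading of the question; the helper name reveal is hypothetical.

```python
def reveal(secret, masked, guess):
    """Return a copy of masked with every occurrence of guess uncovered."""
    # Walk both strings in parallel: show the secret letter where it
    # equals the guess, otherwise keep what the mask already shows.
    return "".join(s if s == guess else m for s, m in zip(secret, masked))

masked = "-" * len("jaillir")
masked = reveal("jaillir", masked, "i")
print(masked)  # --i--i-
masked = reveal("jaillir", masked, "l")
print(masked)  # --illi-
```

In the game loop, the returned string would be assigned back (tiret = reveal(choix, tiret, lettre)), and the loop can stop once tiret == choix.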

How to count characters in a file using JES (Python for students)

How to have python read a text file and return the number of lines, characters, vowels, consonants, lowercase letters, and uppercase letters
Write a program that accepts the name of a file as a command-line argument. (You can assume that the input file will be a plain-text file.) If the user forgets to include the command line argument, your program should exit with an appropriate error message.
Otherwise, your program should print out:
The number of lines in the file
The number of characters in the file
The number of vowels in the file (For the purposes of this assignment, treat "y" as a consonant.)
The number of consonants in the file
The number of lowercase letters in the file
The number of uppercase letters in the file
I am at a loss. How would I do this? I'm pretty sure there are commands that can do this, but I don't know what they are. Thanks for your help :)
EDIT
This is my final program and it works perfectly. Thank you all for your help. Special thanks to Bentaye :)
import sys
def text():
    countV = 0
    countC = 0
    lines = 0
    countU = 0
    countL = 0
    characters = 0
    vowels = set("AEIOUaeiou")
    cons = set("bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ")
    upper = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    lower = set("abcdefghijklmnopqrstuvwxyz")
    with open(sys.argv[1]) as file:
        fileLines = file.readlines()
    for line in fileLines:
        lines = lines + 1
        characters = characters + len(line)
        for char in line:
            if char in vowels:
                countV = countV + 1
            elif char in cons:
                countC = countC + 1
        for char in line:
            if char in upper:
                countU = countU + 1
            elif char in lower:
                countL = countL + 1
    print("Lines: " + str(lines))
    print("Characters: " + str(characters))
    print("Vowels: " + str(countV))
    print("Consonants: " + str(countC))
    print("Lowercase: " + str(countL))
    print("Uppercase: " + str(countU))
text()

This fixes your problem; you can build on it for upper and lower case.
Use sys.argv[1] to capture the argument (you need to import sys; sys.argv[0] is the name of the script itself).
Then use file.readlines() to get a list of lines (as strings).
Code
import sys
countV = 0
countC = 0
lines = 0
characters = 0
vowels = set("AEIOUaeiou")
cons = set("bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ")
with open(sys.argv[1]) as file:
    fileLines = file.readlines()
for line in fileLines:
    lines = lines + 1
    characters = characters + len(line)
    for char in line:
        if char in vowels:
            countV = countV + 1
        elif char in cons:
            countC = countC + 1
print("Lines: " + str(lines))
print("Characters: " + str(characters))
print(countV)
print(countC)
You call it this way:
python test.py yourFile.txt
Complete answer for reference
import sys
vowels = "aeiou"
cons = "bcdfghjklmnpqrstvwxyz"
# sys.argv[1] is the first command-line argument (sys.argv[0] is the script name)
with open(sys.argv[1]) as file:
    fileLines = file.readlines()
countVowels = 0
countConsonants = 0
countUpperCase = 0
countLowerCase = 0
countLines = 0
countCharacters = 0
countNonLetters = 0
for line in fileLines:
    countLines += 1
    countCharacters = countCharacters + len(line)
    for char in line:
        if char.isalpha():
            if char.lower() in vowels:
                countVowels += 1
            elif char.lower() in cons:
                countConsonants += 1
            if char.isupper():
                countUpperCase += 1
            elif char.islower():
                countLowerCase += 1
        else:
            countNonLetters += 1
print("Lines: " + str(countLines))
print("Characters: " + str(countCharacters))
print("Vowels: " + str(countVowels))
print("Consonants: " + str(countConsonants))
print("Upper case: " + str(countUpperCase))
print("Lower case: " + str(countLowerCase))
print("Non letters: " + str(countNonLetters))
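The assignment also asks the program to exit with an error message when the command-line argument is missing, which none of the listings above does. The following is a minimal sketch of that check, not a replacement for the answers: the names count_stats and main and the script name count.py in the usage message are made up for this example, and every letter that is not a vowel (including accented letters) is treated as a consonant.

```python
import sys

VOWELS = set("aeiou")  # "y" is treated as a consonant, as the assignment says

def count_stats(text):
    """Count lines, characters, vowels, consonants and letter case in text."""
    stats = {"lines": len(text.splitlines()), "characters": len(text),
             "vowels": 0, "consonants": 0, "lowercase": 0, "uppercase": 0}
    for char in text:
        if not char.isalpha():
            continue  # digits, spaces and punctuation are not letters
        if char.lower() in VOWELS:
            stats["vowels"] += 1
        else:
            stats["consonants"] += 1  # assumption: any other letter is a consonant
        if char.isupper():
            stats["uppercase"] += 1
        elif char.islower():
            stats["lowercase"] += 1
    return stats

def main(argv):
    # argv[0] is the script name, so a file name needs at least two entries
    if len(argv) < 2:
        print("usage: python count.py <file>", file=sys.stderr)
        return 1
    with open(argv[1]) as handle:
        stats = count_stats(handle.read())
    for name, value in stats.items():
        print(f"{name}: {value}")
    return 0

# In a real script, finish with: sys.exit(main(sys.argv))
```

Keeping the counting in a function that takes a string makes it possible to try it on small inputs without creating a file.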

Looping through a string and only returning certain characters. Python

I have a problem when creating a function that's supposed to first return lowercase letters, "_" and ".", and then uppercase letters, " " and "|", in that order. My version also returns digits and special characters like <>#, which I don't want. It is only supposed to read through the input string once, and I don't know whether my code achieves that.
My code is:
def split_iterative(n):
    splitted_first = ""
    splitted_second = ""
    for i in n:
        if i == i.lower() or i == "_" or i == ".":
            splitted_first = splitted_first + i
        elif i == i.upper() or i == " " or i == "|":
            splitted_second = splitted_second + i
    return splitted_first + splitted_second
If I call split_iterative("'lMiED)teD5E,_hLAe;Nm,0#Dli&Eg ,#4aI?rN#T§&e7#4E #<(S0A?<)NT8<0'") it returns "'li)te5,_he;m,0#li&g ,#4a?r#§&e7#4 #<(0?<)8<0'MEDDELANDEINTESANT", which is incorrect: it should eliminate all those special characters and digits. How do I fix this? It should return ('lite_hemligare', 'MEDDELANDE INTE SANT')

You could try this:
def f(input_string):
    str1 = str2 = ""
    for character in input_string:
        if character.isalpha():
            if character.islower():
                str1 += character
            else:
                str2 += character
        elif character in "_.":
            str1 += character
        elif character in " |":
            str2 += character
    return str1, str2
Output:
>>> input_string = "'lMiED)teD5E,_hLAe;Nm,0#Dli&Eg ,#4aI?rN#T§&e7#4E #<(S0A?<)NT8<0'"
>>> print(f(input_string))
('lite_hemligare', 'MEDDELANDE INTE SANT')

This happens because the lowercase form of a special character is the character itself, i.e. '#'.lower() == '#'. The first test therefore succeeds for '#' and every other special character or digit, and they end up in the result. You should explicitly check for letters using the isalpha() method on strings:
(i.isalpha() and i.lower() == i) or i == '_' or i == '.'
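A quick way to see the difference is to compare the two tests on a few sample characters. The snippet below is only an illustration; the sample list is arbitrary.

```python
samples = ["a", "A", "#", "5"]

# For each character: (result of ch == ch.lower(), result of ch.islower())
results = {ch: (ch == ch.lower(), ch.islower()) for ch in samples}

for ch, (equals_lower, is_lower) in results.items():
    print(repr(ch), equals_lower, is_lower)
# 'a' True True
# 'A' False False
# '#' True False
# '5' True False
```

Only islower() (or isalpha() combined with the comparison) rejects '#' and '5'.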

First, to make it return a list, don't return the concatenated string; return a list instead.
Second, you are not filtering out the unwanted characters. One way is to check whether the character is a letter using the isalpha() method.
Something like this:
def split_iterative(n):
    splitted_first = ""
    splitted_second = ""
    for i in n:
        if (i.isalpha() and i == i.lower()) or i == "_" or i == ".":
            splitted_first = splitted_first + i
        elif (i.isalpha() and i == i.upper()) or i == " " or i == "|":
            splitted_second = splitted_second + i
    # returns a list; you can assign it to a variable if you need
    return [splitted_first, splitted_second]

You can use ASCII values for the filtering of characters:
def split_iterative(n):
    splitted_first = ""
    splitted_second = ""
    for i in n:
        # range() excludes its upper bound, so 123 is needed to include "z" (122)
        if ord(i) in range(97, 123) or i == "_" or i == ".":
            splitted_first = splitted_first + i
        # likewise 91 is needed to include "Z" (90)
        elif ord(i) in range(65, 91) or i == " " or i == "|":
            splitted_second = splitted_second + i
    return (splitted_first, splitted_second)

You can make use of two lists while walking through characters of your text.
You can append lowercase, underscore, and stop characters to one list then uppercase, space and pipe characters to the other.
Finally return a tuple of each list joined as strings.
def splittext(txt):
    slug, uppercase_letters = [], []
    slug_symbols = {'_', '.'}
    uppercase_symbols = {' ', '|'}
    for letter in txt:
        if letter.islower() or letter in slug_symbols:
            slug.append(letter)
        if letter.isupper() or letter in uppercase_symbols:
            uppercase_letters.append(letter)
    return ''.join(slug), ''.join(uppercase_letters)
txt = "'lMiED)teD5E,_hLAe;Nm,0#Dli&Eg ,#4aI?rN#T§&e7#4E #<(S0A?<)NT8<0'"
assert splittext(txt) == ("lite_hemligare", "MEDDELANDE INTE SANT")

python text splitter program

I have a program that splits a text into sentences using the following rule:
Sentence boundaries occur at ".", "?" and "!" except:
A) Periods followed by a digit with no intervening whitespace.
B) Periods followed by whitespace followed by a lowercase letter.
C) Periods with no following whitespace.
D) Periods preceded by titles.
My code is given below:
file_name = raw_input("Enter the name of the text file: ")
txt_file = open('%s.txt' % file_name, 'r+')
text = txt_file.readline()
print; print "Original text is: "; print
print text
new_wrd = []
new_line = []
new_txt = []
while len(text.strip()) != 0:
    for index, char in enumerate(text):
        print char
        if char == "." or char == "?" or char == "!":
            if text[index+1] == " ":
                if ("".join(new_wrd) == "Mrs" or "".join(new_wrd) == "Mr" or "".join(new_wrd) == "Ms"
                        or "".join(new_wrd) == "Dr" or "".join(new_wrd) == "Jr"):
                    new_wrd.append(char)
                else:
                    if text[index+2].isupper():
                        new_line.append("".join(new_wrd))
                        new_line.append(char)
                        new_txt.append("".join(new_line))
                        new_line = []
                        new_wrd = []
                    else:
                        new_line.append("".join(new_wrd))
                        new_line.append(char + " ")
                        new_wrd = []
            else:
                new_wrd.append(char)
        elif char == " ":
            if ("".join(new_wrd) == "Mrs." or "".join(new_wrd) == "Mr." or "".join(new_wrd) == "Ms."
                    or "".join(new_wrd) == "Dr." or "".join(new_wrd) == "Jr.") or new_wrd != []:
                new_line.append("".join(new_wrd))
                new_line.append(" ")
                new_wrd = []
        else:
            new_wrd.append(char)
    text = txt_file.readline()
for txt in new_txt:
    print txt
    txt_file.write(txt)
For the given example:
Mr. XYZ is a good boy. He has just pass his B.Tech degree from ABC, Lmnop... At least, he has passed the degree.
The output should show:
Mr. XYZ is a good boy.
He has just pass his B.Tech degree from ABC, Lmnop...
At least, he has passed the degree.
But instead, it shows:
Mr. XYZ is a good boy.
He has just pass his B.Tech degree from ABC, Lmnop...
What corrections could be made to get the proper output?
Also, the code:
txt_file.write(txt)
is not working. Why?
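No answer is recorded for this question. As a rough illustration only, the four exceptions can be checked at each run of terminators instead of character by character. The sketch below is written for Python 3 rather than the Python 2 of the question; the function name split_sentences is invented here, and the title list is the one from the question's code.

```python
import re

TITLES = {"Mr", "Mrs", "Ms", "Dr", "Jr"}  # titles taken from the question's code

def split_sentences(text):
    sentences = []
    start = 0
    # Treat a run such as "..." or "?!" as one candidate boundary.
    for match in re.finditer(r"[.?!]+", text):
        end = match.end()
        rest = text[end:]
        # The exceptions only concern periods that are not at the end of the text.
        if rest and match.group().endswith("."):
            if not rest[0].isspace():
                continue  # rules A and C: a digit or any non-space follows
            if rest.lstrip()[:1].islower():
                continue  # rule B: whitespace, then a lowercase letter
            words = text[start:match.start()].split()
            if words and words[-1] in TITLES:
                continue  # rule D: the period belongs to a title
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # text after the last terminator, if any
    return sentences

example = ("Mr. XYZ is a good boy. He has just pass his B.Tech degree "
           "from ABC, Lmnop... At least, he has passed the degree.")
for sentence in split_sentences(example):
    print(sentence)
```

Because the last sentence is added when the terminator is the final character of the text, the third sentence of the example is not lost. Writing the result to a separate output file, rather than back to the handle that was opened with 'r+' and read to the end, is one way to avoid mixing reads and writes on the same file object.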
