I have been working on calculating the percentage of punctuation in a sentence. For some reason, my function works when the text is double spaced, but it counts all the characters and the white space. For example, I have the text DEACTIVATE: OK, so the total length is 14; when I subtract the punctuation the length is 13, so the percentage should be 1/13 = 7.69%. However, my function gives me 7.14%, which is basically 1/14.
On the other hand, if I have just one white space, my function throws an error:
"ZeroDivisionError: division by zero".
Here is my code for your reference, along with some simple text samples:
text= "Centre to position, remaining shift is still larger than maximum (retry nbr=1, centring_stroke.r=2.7662e-05, max centring stroke.r=2.5e-05)"
text2= "DEACTIVATE: KU-1421"
import string
def count_punct(text):
    count = sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text) - text.count(" ")), 3)*100
df_sub['punct%'] = df_sub['Err_Text2'].apply(lambda x: count_punct(x))
df_sub.head(20)
Make these small changes and your count_punct function should be up and running. Your code was breaking because you were checking for ___ instead of _, i.e. three consecutive spaces instead of one space. That is why the difference always resulted in the same value.
import string
def count_punct(text):
    if text.strip() == "":  # Take care of all-space input
        return 0
    count = sum([1 if char in string.punctuation else 0 for char in text])
    spaces = text.count(" ")  # Your error is here: only check for 1 space instead of 3 spaces
    total_chars = len(text) - spaces
    return round(count / total_chars, 3)*100
text = "DEACTIVATE: OK"
print(count_punct(text))
Outputs:
7.7
As for the ZeroDivisionError: it is a logic error that occurs when total_chars is 0, because the length of the string and the number of spaces are equal, so the difference is 0.
To fix this you can simply add an if statement (already added above):
if text.strip() == "":
    return 0
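As a rough sketch of the same idea, the version below counts every kind of whitespace (not only the space character) and guards against the empty divisor in one place. The name punct_percent is made up for illustration and is not part of the original code.

```python
import string

def punct_percent(text):
    """Percentage of non-whitespace characters that are punctuation."""
    # Drop every whitespace character (spaces, tabs, newlines) up front.
    non_space = [ch for ch in text if not ch.isspace()]
    if not non_space:
        # Empty or whitespace-only input: avoid dividing by zero.
        return 0.0
    punct = sum(ch in string.punctuation for ch in non_space)
    return round(100 * punct / len(non_space), 2)

print(punct_percent("DEACTIVATE: OK"))       # 7.69
print(punct_percent("DEACTIVATE: KU-1421"))  # 11.11
print(punct_percent("   "))                  # 0.0
```

With a DataFrame like the one in the question, this could presumably be applied as df_sub['Err_Text2'].apply(punct_percent).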
I have been working on a program which will take a hex file, and if the file name starts with "CID", then it should remove the first 104 characters; after that point there are a few words. I also want to remove everything after the words, but the problem is that the part I want to isolate varies in length.
My code is currently like this:
y = 0
import os
files = os.listdir(".")
filenames = []
for names in files:
    if names.endswith(".uexp"):
        filenames.append(names)
        y +=1
print(y)
print(filenames)
for x in range(1,y):
    filenamestart = (filenames[x][0:3])
    print(filenamestart)
    if filenamestart == "CID":
        openFile = open(filenames[x],'r')
        fileContents = (openFile.read())
        ItemName = (fileContents[104:])
        print(ItemName)
Input Example file (pulled from HxD):
.........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž
I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so I am left with only that part.
I appreciate any help that anyone can give me.
One way to remove non-alphabetic characters in a string is to use regular expressions [1].
>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'
EDIT
The first argument r'[^a-z]' is the pattern that captures what will be removed (here, by replacing it with an empty string ''). The square brackets denote a character class (the pattern matches any character in this class), the ^ is a "not" operator, and a-z denotes all the lowercase alphabetic characters. More information here:
https://docs.python.org/3/library/re.html#regular-expression-syntax
So for instance, to keep also capital letters and spaces it would be:
>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'
However from the data you give in your question the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
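One possible direction, sketched under assumptions: read the file as bytes and pull out runs of printable ASCII, similar to what the Unix strings tool does, then discard the 32-character hex IDs. The byte values in sample below are invented to resemble the dump in the question (the real file's non-printable bytes are unknown), and printable_runs is a hypothetical helper name.

```python
import re

def printable_runs(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = ("[ -~]{%d,}" % min_len).encode("ascii")
    return [run.decode("ascii") for run in re.findall(pattern, data)]

# Invented sample: the separator bytes are assumptions, not taken from a real file.
sample = (
    b"\x00\x00!\x00\x00\x00"
    b"1AC9816A4D34966936605BB7EFBC0841"
    b"\x00\x13\x00\x00\x00"
    b"Sun Tan Specialist"
    b"\x00\x00\x00"
    b"9658361F4EFF6B98FF153898E58C9D52"
    b"\x00\x07\x00\x00\x00"
    b"Outfit\x00"
)

runs = printable_runs(sample)
# Drop the 32-character uppercase hex identifiers, keep the readable phrases.
names = [r for r in runs if not re.fullmatch(r"[0-9A-F]{32}", r)]
print(names)  # ['Sun Tan Specialist', 'Outfit']
```

For a real file, the bytes would come from open(path, "rb").read() rather than a literal.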
You can use filter:
import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
You mentioned in a comment that you got the string down to Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned
Essentially your goal at this point is to remove any uppercase letter that isn't immediately followed by a lowercase letter because Upper Lower indicates the start of a phrase. You can use a for loop to do this.
h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"
output = ""
for i in range(len(h)):
    # Keep spaces
    if h[i] == " ":
        output += h[i]
    # Start of a phrase found, so separate with a space (unless one is already there) and store the character
    elif h[i].isupper() and i + 1 < len(h) and h[i + 1].islower():
        if output and not output.endswith(" "):
            output += " "
        output += h[i]
    # We want all lowercase characters
    elif h[i].islower():
        output += h[i]
print(output)
# If you don't care about "Outfit" since it is always there, remove it
print(output.replace("Outfit ", ""))
Output:
Sun Tan Specialist Outfit Dont get burned
Sun Tan Specialist Dont get burned
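The same "uppercase followed by lowercase starts a phrase" rule can also be sketched with a regular expression. This is only an illustration on the example string from the comment; the pattern assumes phrases are made of words separated by single spaces and would need adjusting for other data.

```python
import re

h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"

# A phrase starts with an uppercase letter followed by lowercase letters,
# optionally continued by more space-separated words.
phrases = re.findall(r"[A-Z][a-z]+(?: [A-Za-z][a-z]+)*", h)
print(phrases)  # ['Sun Tan Specialist', 'Outfit', 'Dont get burned']

# Drop the constant "Outfit" entry if it is not needed.
wanted = [p for p in phrases if p != "Outfit"]
print(wanted)   # ['Sun Tan Specialist', 'Dont get burned']
```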
I am trying to create a for loop where the user inputs a number n and the output provides the range of values from n to n+6. This needs to all be printed in one row and be right aligned with spaces in between value outputs but no space at the end or start of the output.
So far this is what I've come up with:
n=eval(input("Enter the start number: "))
for n in range(n,n+7):
    print("{0:>2}".format(n),end=" ")
However, this results in the following output:
-2 -1 0 1 2 3 4 <EOL>
When the output I want needs to look similar but without the space at the end, like so:
-2 -1 0 1 2 3 4<EOL>
How can I add spaces between values without adding an additional space to the final term?
There are 3 recommendations I could make:
1. Use end="" and insert the whitespace manually.
2. Create a string and print it after the loop:
s = ""
for i in range(n, n+7):
    s += str(i) + " "
s = s[:-1]  # remove the trailing whitespace
print(s)
3. Which I recommend: use sys.stdout.write instead of print:
With end=" ", the output may be line-buffered, so it might not be displayed until a line break is printed. If there is a long calculation in the loop, you may only see the result at the end of all calculations. Use sys.stdout.write and flush instead:
import sys
for i in range(n, n+7):
    if i < n+6:
        sys.stdout.write(str(i) + " ")
    else:
        sys.stdout.write(str(i))
    sys.stdout.flush()  # flush output to console
Edit: I evolved a bit and this is what I'd use nowadays:
4. message = " ".join(str(i) for i in range(n, n+7))
This puts spaces between all elements of a sequence of strings (join only accepts strings, so each number is converted first). You can choose any separator instead of a space (or multiple characters).
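To also keep the right alignment asked for in the question, the join approach can be combined with the original format spec. A minimal sketch; format_row and its parameters are illustrative names, not from the original post.

```python
def format_row(start, count=7, width=2):
    """Right-align each value to `width` columns and join with single spaces."""
    return " ".join("{0:>{1}}".format(i, width) for i in range(start, start + count))

row = format_row(-2)
print(row)        # -2 -1  0  1  2  3  4
print(repr(row))  # shows that there is no trailing space
```

If alignment is not needed, print(*range(n, n+7)) gives the same spacing behaviour, because print separates its arguments with a single space and adds nothing after the last one except the newline.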
I'm building an analyzer for a series of strings.
I need to check how much each line is indented (either by tabs or by spaces).
Each line is just a string in a text editor.
How do I check by how much a string is indented?
Or rather, maybe I could check how much whitespace or \t are before a string, but I'm unsure of how.
To count the number of spaces at the beginning of a string you could do a comparison between the left stripped (whitespace removed) string and the original:
a = "    indented string"
leading_spaces = len(a) - len(a.lstrip())
print(leading_spaces)
# >>> 4
Tab indent is context specific... it changes based on the settings of whatever program is displaying the tab characters. This approach will only tell you the total number of whitespace characters (each tab will be considered one character).
Or to demonstrate:
a = "\t\tindented string"
leading_spaces = len(a) - len(a.lstrip())
print(leading_spaces)
# >>> 2
EDIT:
If you want to do this to a whole file you might want to try
with open("myfile.txt") as afile:
    line_lengths = [len(line) - len(line.lstrip()) for line in afile]
I think Gizmo's basic idea is good, and it's relatively easy to extend it to handle any mixture of leading tabs and spaces by using a string object's expandtabs() method:
def indentation(s, tabsize=4):
    sx = s.expandtabs(tabsize)
    return 0 if sx.isspace() else len(sx) - len(sx.lstrip())
print(indentation(" tindented string"))
print(indentation("\t\tindented string"))
print(indentation(" \t \tindented string"))
The last two print statements will output the same value.
Edit: I modified it to check and return 0 if a line of all tabs and spaces is encountered.
The len() function counts a tab (\t) as one character, which in some cases is not what you expect. My approach is to use re.sub to isolate the leading whitespace and then count the spaces in it.
import re
indent_count = re.sub(r'^(\s*).*$', r'\g<1>', line).count(' ')
def count_indentation(line):
    # Count the leading tab characters
    count = 0
    try:
        while line[count] == "\t":
            count += 1
        return count
    except IndexError:  # the line is empty or consists only of tabs
        return count
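The approaches above can be combined into one small helper that reports tabs, spaces, and the expanded width separately. This is a sketch under the assumption that only tabs and spaces count as indentation; describe_indent and tabsize=4 are illustrative choices.

```python
import re

def describe_indent(line, tabsize=4):
    """Return (tabs, spaces, width) for the leading whitespace of line."""
    # The pattern can match an empty prefix, so re.match never returns None here.
    prefix = re.match(r"[ \t]*", line).group(0)
    # expandtabs() advances each tab to the next multiple of tabsize.
    width = len(prefix.expandtabs(tabsize))
    return prefix.count("\t"), prefix.count(" "), width

for sample in ["    x", "\t\tx", " \t \tx", "x"]:
    print(repr(sample), describe_indent(sample))
```

Keeping the raw counts alongside the expanded width makes it easier to spot lines that mix tabs and spaces.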
There is a string, for example, EXAMPLE.
How can I remove the middle character, i.e., M from it? I don't need the code. I want to know:
Do strings in Python end in any special character?
Which is a better way - shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?
In Python, strings are immutable, so you have to create a new string. You have a few options of how to create the new string. If you want to remove the 'M' wherever it appears:
newstr = oldstr.replace("M", "")
If you want to remove the central character:
midlen = len(oldstr) // 2
newstr = oldstr[:midlen] + oldstr[midlen+1:]
You asked if strings end with a special character. No, you are thinking like a C programmer. In Python, strings are stored with their length, so any byte value, including \0, can appear in a string.
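A quick way to see that \0 is an ordinary character rather than a terminator (the strings here are just examples):

```python
s = "EX\0AMPLE"           # a NUL character embedded in the middle
print(len(s))             # 8: the NUL is counted like any other character
print(s.endswith("PLE"))  # True: nothing is cut off after the NUL

cleaned = s.replace("\0", "")
print(cleaned)            # EXAMPLE
```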
To remove the character at a specific position:
s = s[:pos] + s[(pos+1):]
To remove a specific character:
s = s.replace('M','')
This is probably the best way:
original = "EXAMPLE"
removed = original.replace("M", "")
Don't worry about shifting characters and such. Most Python code takes place on a much higher level of abstraction.
Strings are immutable. But you can convert them to a list, which is mutable, and then convert the list back to a string after you've changed it.
s = "this is a string"
l = list(s) # convert to list
l[1] = "" # "delete" letter h (the item actually still exists but is empty)
l[1:2] = [] # really delete letter h (the item is actually removed from the list)
del(l[1]) # another way to delete it
p = l.index("a") # find position of the letter "a"
del(l[p]) # delete it
s = "".join(l) # convert back to string
You can also create a new string, as others have shown, by taking everything except the character you want from the existing string.
How can I remove the middle character, i.e., M from it?
You can't, because strings in Python are immutable.
Do strings in Python end in any special character?
No. They are similar to lists of characters; the length of the list defines the length of the string, and no character acts as a terminator.
Which is a better way - shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?
You cannot modify the existing string, so you must create a new one containing everything except the middle character.
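A short illustration of both points: in-place modification fails, while slicing builds a new string and leaves the original untouched.

```python
s = "EXAMPLE"

try:
    s[3] = ""  # strings do not support item assignment
except TypeError as exc:
    error = str(exc)
    print(error)

mid = len(s) // 2
t = s[:mid] + s[mid + 1:]  # new string without the middle character
print(t)  # EXAPLE
print(s)  # EXAMPLE (unchanged)
```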
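Use the translate() method. In Python 2:
>>> s = 'EXAMPLE'
>>> s.translate(None, 'M')
'EXAPLE'
In Python 3, str.translate() takes a single table argument, so build it with str.maketrans():
>>> s.translate(str.maketrans('', '', 'M'))
'EXAPLE'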
def kill_char(string, n):  # n = position of the character you want to remove
    begin = string[:n]  # from beginning to n (n not included)
    end = string[n+1:]  # n+1 through end of string
    return begin + end
print(kill_char("EXAMPLE", 3))  # "M" removed
I have seen this somewhere here.
card = random.choice(cards)
cardsLeft = cards.replace(card, '', 1)
How to remove one character from a string:
Here is an example where there is a stack of cards represented as characters in a string.
One of them is drawn (import random module for the random.choice() function, that picks a random character in the string).
A new string, cardsLeft, is created to hold the remaining cards given by the string function replace() where the last parameter indicates that only one "card" is to be replaced by the empty string...
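A runnable version of that idea might look like the following. The deck contents are made up, and it includes a duplicate card to show that only one copy is removed.

```python
import random

cards = "AAKQJ"                          # hypothetical deck, one character per card
card = random.choice(cards)              # draw one card at random
cards_left = cards.replace(card, "", 1)  # remove only the first matching card

print(card, cards_left)
```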
On Python 2, you can use UserString.MutableString to do it in a mutable way:
>>> import UserString
>>> s = UserString.MutableString("EXAMPLE")
>>> type(s)
<class 'UserString.MutableString'>
>>> del s[3] # Delete 'M'
>>> s = str(s) # Turn it into an immutable value
>>> s
'EXAPLE'
MutableString was removed in Python 3.
Another way is with a function.
Below is a way to remove all vowels from a string, just by calling the function:
def disemvowel(s):
    # Python 3 form; in Python 2 this was s.translate(None, "aeiouAEIOU")
    return s.translate(str.maketrans("", "", "aeiouAEIOU"))
Here's what I did to slice out the "M":
s = 'EXAMPLE'
s1 = s[:s.index('M')] + s[s.index('M')+1:]
To delete a char or a sub-string once (only the first occurrence):
main_string = main_string.replace(sub_str, replace_with, 1)
NOTE: Here 1 can be replaced with any int for the number of occurrences you want to replace.
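For instance, with an arbitrary example string, the count argument behaves like this:

```python
text = "banana"

first_only = text.replace("a", "", 1)  # remove the first "a" only
first_two = text.replace("a", "", 2)   # remove the first two
all_of_them = text.replace("a", "")    # no count: remove every occurrence

print(first_only)   # bnana
print(first_two)    # bnna
print(all_of_them)  # bnn
```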
You can simply use list comprehension.
Assume that you have the string "my name is" and you want to remove the character m. Use the following code (compare with !=, not "is not", which tests identity rather than equality):
"".join([x for x in "my name is" if x != 'm'])
If you want to delete/ignore characters in a string, and, for instance, you have this string,
"[11:L:0]"
from a web API response or something similar, such as a CSV file, let's say you are using requests:
import requests
udid = 123456
url = 'http://webservices.yourserver.com/action/id-' + str(udid)
s = requests.Session()
s.verify = False
resp = s.get(url, stream=True)
content = resp.content
Loop over the lines and get rid of the unwanted characters:
for line in resp.iter_lines():
    line = line.decode("utf-8")  # iter_lines() yields bytes; UTF-8 is assumed here
    line = line.replace("[", "")
    line = line.replace("]", "")
    line = line.replace('"', "")
Optionally split, and you will be able to read the values individually:
listofvalues = line.split(':')
Now accessing each value is easier:
print(listofvalues[0])
print(listofvalues[1])
print(listofvalues[2])
This will print
11
L
0
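Without the network part, the cleanup itself can be sketched more compactly. The literal below stands in for one line of the response; whether real responses always look like this is an assumption.

```python
raw = '"[11:L:0]"'  # stand-in for one line of the response

# strip() removes any of the listed characters from both ends of the string.
values = raw.strip('"[]').split(":")
print(values)  # ['11', 'L', '0']

# translate() deletes the characters wherever they occur, not only at the ends.
cleaned = raw.translate(str.maketrans("", "", '[]"'))
print(cleaned)  # 11:L:0
```

strip() is enough when the unwanted characters only wrap the value; translate() is the safer choice if they can also appear in the middle.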
Two new string removal methods were introduced in Python 3.9:
# str.removeprefix("prefix_to_be_removed")
# str.removesuffix("suffix_to_be_removed")
s = 'EXAMPLE'
In this case the position of 'M' is 3:
s = s[:3] + s[3:].removeprefix('M')
OR
s = s[:4].removesuffix('M') + s[4:]
# output: 'EXAPLE'
from random import randint
def shuffle_word(word):
    newWord = ""
    for i in range(0, len(word)):
        pos = randint(0, len(word)-1)
        newWord += word[pos]
        word = word[:pos] + word[pos+1:]
    return newWord
word = "Sarajevo"
print(shuffle_word(word))
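The function above removes one randomly chosen character per iteration by slicing. For comparison, a shorter sketch using the standard library's random.sample, which draws all characters without replacement in one call (shuffle_word_sample is an illustrative name):

```python
import random

def shuffle_word_sample(word):
    """Return the characters of word in random order."""
    # Sampling len(word) items without replacement yields a random permutation.
    return "".join(random.sample(word, len(word)))

shuffled = shuffle_word_sample("Sarajevo")
print(shuffled)
```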
Strings are immutable in Python so both your options mean the same thing basically.