Python expandtabs string operation - python

I am learning about Python and got to the expandtabs command in Python.
This is the official definition in the docs:
string.expandtabs(s[, tabsize])
Expand tabs in a string replacing them by one or more spaces, depending on the current column and the given tab size. The column number is reset to zero after each newline occurring in the string. This doesn’t understand other non-printing characters or escape sequences. The tab size defaults to 8.
So what I understood from that is that the default size of tabs is 8 and, to increase that, we can use other values.
So when I tried that in the shell, I tried the following inputs:
>>> str = "this is\tstring"
>>> print str.expandtabs(0)
this isstring
>>> print str.expandtabs(1)
this is string
>>> print str.expandtabs(2)
this is string
>>> print str.expandtabs(3)
this is  string
>>> print str.expandtabs(4)
this is string
>>> print str.expandtabs(5)
this is   string
>>> print str.expandtabs(6)
this is     string
>>> print str.expandtabs(7)
this is       string
>>> print str.expandtabs(8)
this is string
>>> print str.expandtabs(9)
this is  string
>>> print str.expandtabs(10)
this is   string
>>> print str.expandtabs(11)
this is    string
So here,
0 removes the tab character entirely,
1 is exactly like the default 8,
but 2 is exactly like 1, and then
3 is different,
and then again 4 is like using 1,
and after that it increases up to 8, which is the default, and then increases again after 8. But why the weird pattern in the numbers from 0 to 8? I know it is supposed to start from 8, but what is the reason?

str.expandtabs(n) is not equivalent to str.replace("\t", " " * n).
str.expandtabs(n) keeps track of the current cursor position on each line, and replaces each tab character it finds with the number of spaces from the current cursor position to the next tab stop. The tab stops are taken to be every n characters.
This is fundamental to the way tabs work, and is not specific to Python. See this answer to a related question for a good explanation of tab stops.
str.expandtabs(n) is roughly equivalent (for n > 0) to:
def expandtabs(string, n):
    result = ""
    pos = 0
    for char in string:
        if char == "\t":
            # instead of the tab character, append the
            # number of spaces to the next tab stop
            char = " " * (n - pos % n)
            # we are now exactly on a tab stop, so the offset
            # within the current tab interval restarts at zero
            pos = 0
        elif char == "\n":
            pos = 0
        else:
            pos += 1
        result += char
    return result
And an example of use:
>>> input = "123\t12345\t1234\t1\n12\t1234\t123\t1"
>>> print(expandtabs(input, 10))
123       12345     1234      1
12        1234      123       1
Note how each tab character ("\t") has been replaced with the number of spaces that causes it to line up with the next tab stop. In this case, there is a tab stop every 10 characters because I supplied n=10.
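To make the contrast with a plain replace concrete, here is a small sketch. The sample string and the tab size of 4 are made up for illustration only.

```python
# Sample string chosen for illustration only
s = "ab\tcd\tefgh\ti"

# Fixed-width substitution: every tab becomes exactly four spaces
replaced = s.replace("\t", " " * 4)

# Tab-stop expansion: every tab pads up to the next multiple of four
expanded = s.expandtabs(4)

print(repr(replaced))  # 'ab    cd    efgh    i'
print(repr(expanded))  # 'ab  cd  efgh    i'
```

The two results only agree when a tab happens to start exactly on a tab stop, as the last tab does here.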

The expandtabs method replaces each \t with whitespace up to the next multiple of the tabsize parameter, i.e., the next tab position.
For example, take str.expandtabs(5):
'this (5)is(7)\tstring' so the '\t' is replaced with whitespace up to index 10 and the following string is moved forward. So you see 10-7=3 whitespace characters.
(The numbers in brackets are index numbers.)
Second example, str.expandtabs(4):
'this(4) is(7)\tstring' here the '\t' is replaced with whitespace up to index 8, so you see only one whitespace character.
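The same arithmetic can be checked for every tab size from the question. This is only a sketch: it assumes the question's string, where the tab sits at column 7, and compares the built-in result with the formula n - col % n.

```python
s = "this is\tstring"
col = s.index("\t")  # the tab sits at column 7

paddings = []
for n in range(1, 12):
    expanded = s.expandtabs(n)
    # every character except the tab is copied unchanged,
    # so the length difference is the number of inserted spaces
    padding = len(expanded) - (len(s) - 1)
    # the padding runs up to the next multiple of n
    assert padding == n - col % n
    paddings.append(padding)

print(paddings)  # [1, 1, 2, 1, 3, 5, 7, 1, 2, 3, 4]
```

This reproduces the pattern from the question: sizes 1, 2, 4 and 8 all give a single space because 8 is a multiple of each of them.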

Related

calculate punctuation percentage in a string in Python

I have been working on calculating the percentage of punctuation in a sentence. For some reason, my function works with double spacing, but it counts all the characters including the white space. For example, I have the text DEACTIVATE: OK, so the total length is 14; when I subtract the white space the length is 13, so the percentage should be 1/13 = 7.69%. However, my function gives me 7.14%, which is basically 1/14 = 7.14%.
On the other hand, if I have just one white space, my function throws an error:
"ZeroDivisionError: division by zero".
Here is my code for your reference and some simple text samples:
text= "Centre to position, remaining shift is still larger than maximum (retry nbr=1, centring_stroke.r=2.7662e-05, max centring stroke.r=2.5e-05)"
text2= "DEACTIVATE: KU-1421"
import string
def count_punct(text):
    count = sum([1 for char in text if char in string.punctuation])
    return round(count/(len(text) - text.count("   ")), 3)*100
df_sub['punct%'] = df_sub['Err_Text2'].apply(lambda x: count_punct(x))
df_sub.head(20)
Make these small changes and your count_punct function should be up and running. Your code was breaking because you were counting "   " (three consecutive spaces) instead of " " (a single space). That is why the difference always resulted in the same value as the full length.
import string
def count_punct(text):
    if text.strip() == "": # To take care of all-space input
        return 0
    count = sum([1 if char in string.punctuation else 0 for char in text])
    spaces = text.count(" ") # Your error is here: count single spaces, not runs of 3 spaces
    total_chars = len(text) - spaces
    return round(count / total_chars, 3)*100
text = "DEACTIVATE: OK"
print(count_punct(text))
Outputs:
7.7
As for the ZeroDivisionError: it is a logic error that occurs when total_chars is 0, because the length of the string and the number of spaces are equal, so the difference is 0.
To fix this you can simply add an if statement (already added above):
if text.strip() == "":
    return 0
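As a variation on the fix above, the sketch below ignores every kind of whitespace (not only the space character) and guards against empty input. The name punct_percent is hypothetical and not part of the original code.

```python
import string

def punct_percent(text):
    """Percentage of punctuation among the non-whitespace characters."""
    # Drop spaces, tabs and newlines before counting
    non_space = [ch for ch in text if not ch.isspace()]
    if not non_space:
        # Nothing but whitespace: avoid dividing by zero
        return 0.0
    punct = sum(1 for ch in non_space if ch in string.punctuation)
    return round(100.0 * punct / len(non_space), 2)

print(punct_percent("DEACTIVATE: OK"))  # 7.69
print(punct_percent("   "))             # 0.0
```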

How to remove all non-alphabetic characters from a string?

I have been working on a program which will take a hex file, and if the file name starts with "CID", then it should remove the first 104 characters; after that point there are a few words. I also want to remove everything after the words, but the problem is that the part I want to isolate varies in length.
My code is currently like this:
y = 0
import os
files = os.listdir(".")
filenames = []
for names in files:
    if names.endswith(".uexp"):
        filenames.append(names)
        y +=1
print(y)
print(filenames)
for x in range(1,y):
    filenamestart = (filenames[x][0:3])
    print(filenamestart)
    if filenamestart == "CID":
        openFile = open(filenames[x],'r')
        fileContents = (openFile.read())
        ItemName = (fileContents[104:])
        print(ItemName)
Input Example file (pulled from HxD):
.........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž
I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so I am left with only that part.
I appreciate any help that anyone can give me.
One way to remove non-alphabetic characters in a string is to use regular expressions [1].
>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'
EDIT
The first argument r'[^a-z]' is the pattern that captures what will be removed (here, by replacing it with an empty string ''). The square brackets denote a character class (the pattern matches any character in the class), the ^ is a "not" operator, and a-z denotes all the lowercase alphabetic characters. More information here:
https://docs.python.org/3/library/re.html#regular-expression-syntax
So for instance, to keep also capital letters and spaces it would be:
>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'
However from the data you give in your question the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
You can use filter:
import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
You mentioned in a comment that you got the string down to Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned
Essentially your goal at this point is to remove any uppercase letter that isn't immediately followed by a lowercase letter because Upper Lower indicates the start of a phrase. You can use a for loop to do this.
h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"
output = ""
for i in range(0, len(h)):
    # Keep spaces
    if h[i] == " ":
        output += h[i]
    # Start of a phrase found, so separate with space and store character
    elif h[i].isupper() and i + 1 < len(h) and h[i+1].islower():
        output += " " + h[i]
    # We want all lowercase characters
    elif h[i].islower():
        output += h[i]
# [1:] because we appended a space to the start of every word
print(output[1:])
# If you dont care about Outfit since it is always there, remove it
print(output[1:].replace("Outfit", ""))
Output:
Sun Tan Specialist Outfit Dont get burned
Sun Tan Specialist Dont get burned
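Another possible approach for the data in the question is to pull out the readable phrases directly with a regular expression instead of stripping characters step by step. This is only a sketch: extract_phrases is a hypothetical helper, the sample is a shortened copy of the HxD paste (with dots standing in for the non-printable bytes), and it assumes that real phrases contain at least one lowercase letter while the hexadecimal IDs do not.

```python
import re

def extract_phrases(data):
    """Return runs of letters, spaces and apostrophes that contain a lowercase letter."""
    # Candidate runs: letters, spaces and apostrophes only
    runs = re.findall(r"[A-Za-z' ]+", data)
    phrases = [run.strip() for run in runs]
    # Fragments of the uppercase hex IDs (e.g. "EFBC") have no lowercase letters
    return [p for p in phrases if any(ch.islower() for ch in p)]

# Shortened sample based on the question's input (dots are placeholders)
sample = (
    "...!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist"
    "......9......!...9658361F4EFF6B98FF153898E58C9D52.....Outfit"
    "......D......!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned......"
)

print(extract_phrases(sample))
# ['Sun Tan Specialist', 'Outfit', "Don't get burned"]
```

The first element of the result is the item name the question asks for, so no fixed offset for the end of the phrase is needed.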

Add a space in between loop outputs using .format: Python 3

I am trying to create a for loop where the user inputs a number n and the output provides the range of values from n to n+6. This needs to all be printed in one row and be right aligned with spaces in between value outputs but no space at the end or start of the output.
So far this is what I've come up with:
n=eval(input("Enter the start number: "))
for n in range(n,n+7):
    print("{0:>2}".format(n),end=" ")
However, this results in the following output:
-2 -1  0  1  2  3  4 <EOL>
When the output I want needs to look similar but without the space at the end, like so:
-2 -1  0  1  2  3  4<EOL>
How can I add spaces between values without adding an additional space to the final term?
There are 3 recommendations I could make:
1. Use end="" and insert the whitespace manually.
2. Create a string and print it after the loop:
s = ""
for i in range(n, n+7):
    s += str(i) + " "
s = s[:-1] # remove the trailing whitespace
print(s)
3. Which I recommend: use sys.stdout.write instead of print:
Output written with end=" " may sit in the buffer until a line break is printed. So if there is a long calculation in the loop, you may only see the result at the end of all calculations. Use sys.stdout instead and flush it yourself:
import sys
last = n + 6
for i in range(n, n+7):
    if i < last:
        sys.stdout.write(str(i) + " ")
    else:
        sys.stdout.write(str(i))
    sys.stdout.flush() # flush output to console
Edit: I evolved a bit and this is what I'd use nowadays:
4. message = " ".join(str(i) for i in range(n, n+7))
This puts spaces between all elements of a list. You can choose any separation character instead of a space (or multiple characters). Note that join only accepts strings, so the numbers are converted with str first.
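Since the question also asks for right-aligned values, the join idea can be combined with the format specification from the question. A minimal sketch, where format_row and its count and width parameters are hypothetical names chosen for illustration:

```python
def format_row(start, count=7, width=2):
    """Right-align each value to `width` and join with single spaces."""
    return " ".join(
        "{0:>{1}}".format(value, width) for value in range(start, start + count)
    )

row = format_row(-2)
print(repr(row))  # '-2 -1  0  1  2  3  4'
```

Because join only places the separator between items, there is no trailing space to strip afterwards.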

Check string indentation?

I'm building an analyzer for a series of strings.
I need to check how much each line is indented (either by tabs or by spaces).
Each line is just a string in a text editor.
How do I check by how much a string is indented?
Or rather, maybe I could check how much whitespace or \t are before a string, but I'm unsure of how.
To count the number of spaces at the beginning of a string you could do a comparison between the left stripped (whitespace removed) string and the original:
a = "    indented string"
leading_spaces = len(a) - len(a.lstrip())
print(leading_spaces)
# >>> 4
Tab indent is context specific... it changes based on the settings of whatever program is displaying the tab characters. This approach will only tell you the total number of whitespace characters (each tab will be considered one character).
Or to demonstrate:
a = "\t\tindented string"
leading_spaces = len(a) - len(a.lstrip())
print(leading_spaces)
# >>> 2
EDIT:
If you want to do this to a whole file you might want to try
with open("myfile.txt") as afile:
    line_lengths = [len(line) - len(line.lstrip()) for line in afile]
I think Gizmo's basic idea is good, and it's relatively easy to extend it to handle any mixture of leading tabs and spaces by using a string object's expandtabs() method:
def indentation(s, tabsize=4):
    sx = s.expandtabs(tabsize)
    return 0 if sx.isspace() else len(sx) - len(sx.lstrip())
print(indentation(" tindented string"))
print(indentation("\t\tindented string"))
print(indentation(" \t \tindented string"))
The last two print statements will output the same value.
Edit: I modified it to check and return 0 if a line of all tabs and spaces is encountered.
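To illustrate why the tab size matters, the sketch below measures the same line under several tab sizes. The helper name indent_width and the sample line are assumptions made for this example.

```python
def indent_width(line, tabsize=8):
    """Width of the leading whitespace after expanding tabs."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip())

# One space followed by one tab, then the text
line = " \tcode"
widths = [indent_width(line, tabsize) for tabsize in (2, 4, 8)]
print(widths)  # [2, 4, 8]
```

The single tab pads up to the next tab stop, so the reported indentation depends on the tab size you assume for the file.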
The len() function counts a tab (\t) as one character, so in some cases it will not behave as expected. My way is to use re.sub to isolate the leading whitespace and then count the space(s) in it, so tabs are ignored:
indent_count = re.sub(r'^(\s*).*$', r'\g<1>', line).count(' ')
def count_indentation(line):
    count = 0
    try:
        while line[count] == "\t":
            count += 1
        return count
    except IndexError:
        # the line consists only of tabs
        return count

How to delete a character from a string using Python

There is a string, for example. EXAMPLE.
How can I remove the middle character, i.e., M from it? I don't need the code. I want to know:
Do strings in Python end in any special character?
Which is a better way - shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?
In Python, strings are immutable, so you have to create a new string. You have a few options of how to create the new string. If you want to remove the 'M' wherever it appears:
newstr = oldstr.replace("M", "")
If you want to remove the central character:
midlen = len(oldstr) // 2
newstr = oldstr[:midlen] + oldstr[midlen+1:]
You asked if strings end with a special character. No, you are thinking like a C programmer. In Python, strings are stored with their length, so any byte value, including \0, can appear in a string.
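A quick way to see that no terminator is involved is to put a NUL character in the middle of a string. The sample value below is made up for illustration.

```python
# A NUL character in the middle of an ordinary Python string
s = "EX\0AMPLE"

print(len(s))                      # 8, the NUL counts like any other character
print(s.index("\0"))               # 2
print(repr(s.replace("\0", "")))   # 'EXAMPLE'
```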
To remove the character at a specific position:
s = s[:pos] + s[(pos+1):]
To remove a specific character:
s = s.replace('M','')
This is probably the best way:
original = "EXAMPLE"
removed = original.replace("M", "")
Don't worry about shifting characters and such. Most Python code takes place on a much higher level of abstraction.
Strings are immutable. But you can convert them to a list, which is mutable, and then convert the list back to a string after you've changed it.
s = "this is a string"
l = list(s) # convert to list
l[1] = "" # "delete" letter h (the item actually still exists but is empty)
l[1:2] = [] # really delete letter h (the item is actually removed from the list)
del(l[1]) # another way to delete it
p = l.index("a") # find position of the letter "a"
del(l[p]) # delete it
s = "".join(l) # convert back to string
You can also create a new string, as others have shown, by taking everything except the character you want from the existing string.
How can I remove the middle character, i.e., M from it?
You can't, because strings in Python are immutable.
Do strings in Python end in any special character?
No. They are similar to lists of characters; the length of the list defines the length of the string, and no character acts as a terminator.
Which is a better way - shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?
You cannot modify the existing string, so you must create a new one containing everything except the middle character.
Use the translate() method (this two-argument form works on Python 2 str only):
>>> s = 'EXAMPLE'
>>> s.translate(None, 'M')
'EXAPLE'
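On Python 3, str.translate takes a single table argument, so the call above needs a translation table. A minimal sketch using str.maketrans, whose third argument lists the characters to delete (the variable names are illustrative):

```python
s = "EXAMPLE"

# Build a table that maps 'M' to None, i.e. deletes it
remove_m = str.maketrans("", "", "M")
print(s.translate(remove_m))  # EXAPLE

# The same idea deletes several characters at once
remove_vowels = str.maketrans("", "", "aeiouAEIOU")
print(s.translate(remove_vowels))  # XMPL
```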
def kill_char(string, n): # n = position of which character you want to remove
    begin = string[:n] # from beginning to n (n not included)
    end = string[n+1:] # n+1 through end of string
    return begin + end
print(kill_char("EXAMPLE", 3)) # "M" removed
I have seen this somewhere here.
card = random.choice(cards)
cardsLeft = cards.replace(card, '', 1)
How to remove one character from a string:
Here is an example where there is a stack of cards represented as characters in a string.
One of them is drawn (import random module for the random.choice() function, that picks a random character in the string).
A new string, cardsLeft, is created to hold the remaining cards given by the string function replace() where the last parameter indicates that only one "card" is to be replaced by the empty string...
On Python 2, you can use UserString.MutableString to do it in a mutable way:
>>> import UserString
>>> s = UserString.MutableString("EXAMPLE")
>>> type(s)
<class 'UserString.MutableString'>
>>> del s[3] # Delete 'M'
>>> s = str(s) # Turn it into an immutable value
>>> s
'EXAPLE'
MutableString was removed in Python 3.
Another way is with a function.
Below is a way to remove all vowels from a string, just by calling the function (Python 2 form of translate):
def disemvowel(s):
    return s.translate(None, "aeiouAEIOU")
Here's what I did to slice out the "M":
s = 'EXAMPLE'
s1 = s[:s.index('M')] + s[s.index('M')+1:]
To delete a char or a sub-string once (only the first occurrence):
main_string = main_string.replace(sub_str, replace_with, 1)
NOTE: Here 1 can be replaced with any int for the number of occurrences you want to replace.
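A short illustration of the count argument, using a made-up sample word:

```python
word = "banana"

first_only = word.replace("a", "", 1)  # remove only the first 'a'
first_two = word.replace("a", "", 2)   # remove the first two
all_of_them = word.replace("a", "")    # no count: remove every 'a'

print(first_only, first_two, all_of_them)  # bnana bnna bnn
```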
You can simply use list comprehension.
Assume that you have the string "my name is" and you want to remove the character m. Use the following code:
"".join([x for x in "my name is" if x != 'm'])
If you want to delete/ignore characters in a string, and, for instance, you have this string,
"[11:L:0]"
from a web API response or something like that, like a CSV file, let's say you are using requests
import requests
udid = 123456
url = 'http://webservices.yourserver.com/action/id-' + str(udid)
s = requests.Session()
s.verify = False
resp = s.get(url, stream=True)
content = resp.content
Loop and get rid of unwanted chars:
for line in resp.iter_lines():
    line = line.decode("utf-8") # iter_lines() yields bytes; UTF-8 is assumed here
    line = line.replace("[", "")
    line = line.replace("]", "")
    line = line.replace('"', "")
Optional split, and you will be able to read values individually:
listofvalues = line.split(':')
Now accessing each value is easier:
print(listofvalues[0])
print(listofvalues[1])
print(listofvalues[2])
This will print
11
L
0
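The three chained replace calls can also be collapsed into one substitution with a character class. A sketch without the network part, assuming the line has already been decoded to text:

```python
import re

line = '"[11:L:0]"'

# Remove every '[', ']' and '"' in one pass
cleaned = re.sub(r'[\[\]"]', "", line)
values = cleaned.split(":")

print(values)  # ['11', 'L', '0']
```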
Two new string removal methods were introduced in Python 3.9:
# str.removeprefix("prefix_to_be_removed")
# str.removesuffix("suffix_to_be_removed")
s = 'EXAMPLE'
In this case the position of 'M' is 3:
s = s[:3] + s[3:].removeprefix('M')
OR
s = s[:4].removesuffix('M') + s[4:]
# output: 'EXAPLE'
from random import randint
def shuffle_word(word):
    newWord=""
    for i in range(0,len(word)):
        pos=randint(0,len(word)-1)
        newWord += word[pos]
        word = word[:pos]+word[pos+1:]
    return newWord
word = "Sarajevo"
print(shuffle_word(word))
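The same shuffle can be sketched without removing characters one at a time, by letting random.sample pick every position exactly once. The function name shuffle_with_sample is hypothetical.

```python
import random

def shuffle_with_sample(word):
    """Return the characters of `word` in random order."""
    # Sampling len(word) items without replacement yields a permutation
    return "".join(random.sample(word, len(word)))

shuffled = shuffle_with_sample("Sarajevo")
print(shuffled)
```

The result differs from run to run, but it always contains exactly the original characters.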
Strings are immutable in Python so both your options mean the same thing basically.
