Splitting a string into smaller strings using split() - python

I'm trying to write a D&D dice roller program in Python. I'd like the input to be typed in the form "xdy+z" (e.g. 4d6+12, meaning roll 4 six-sided dice and add 12 to the result), and have the program "roll the dice", add the modifier, and output the result. I'm trying to figure out how to split the string into numbers so I can have the program do the math.
I know of the split() function, and I'm trying to use it. When I input the example above, I get the string [4 6 12], but when I have the program print string[1], I get a white space because it's still one full string. I'd either like to figure out how to get the program to identify the individual numbers in the string, or a way to split the full string into smaller strings (like string1 = [4], string2 = [6], string3 = [12]).
Yes, I tried Google and searching this site, but I'm not sure what the terminology for this type of process is, so it's been hard to find help.
Here's the relevant code:
separators = ["d", "+", "-"]
for sep in separators:
    inputText = inputText.replace(sep, ' ')

Solution without regular expressions, using find, slicing and split. First, find the position of + or - to get the last number. Then, split the rest at d. Also, convert the partial strings to int:
s = "4d6+12"
if "-" in s:
    pos = s.find("-")
    add = int(s[pos:])
elif "+" in s:
    pos = s.find("+")
    add = int(s[pos:])
else:
    add = 0
    pos = len(s)
rolls, sides = map(int, s[:pos].split("d"))
print(f"{rolls=} {sides=} {add=}")

You can use a regular expression for this instead of splitting the string up. Just capture each numeric position with a group. Check out the docs on the re package for details.
Here's a sample:
import re
pattern = r"(?P<rolls>\d+)d(?P<sides>\d+)\+(?P<add>\d+)"
match = re.match(pattern, input_text)
rolls = int(match["rolls"])
sides = int(match["sides"])
add = int(match["add"])
With input_text as "4d6+12" as in your example, the resulting values are:
print(rolls) # 4
print(sides) # 6
print(add) # 12
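The pattern above only accepts a "+" modifier. A minimal sketch of a full roller that also accepts "-" or no modifier at all could look like this; the names DICE_RE, parse_dice and roll are made up for illustration, and it assumes the whole input is a single xdy+z expression:

```python
import random
import re

# Hypothetical names for illustration. The modifier group is optional
# and carries its own sign, so int() can convert it directly.
DICE_RE = re.compile(r"(?P<rolls>\d+)d(?P<sides>\d+)(?P<add>[+-]\d+)?")

def parse_dice(text):
    """Split 'xdy+z' into (rolls, sides, add); add is 0 when omitted."""
    match = DICE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in xdy+z form: {text!r}")
    rolls = int(match["rolls"])
    sides = int(match["sides"])
    add = int(match["add"] or 0)  # the group is None when there is no modifier
    return rolls, sides, add

def roll(text):
    """Roll the dice described by text and apply the modifier."""
    rolls, sides, add = parse_dice(text)
    return sum(random.randint(1, sides) for _ in range(rolls)) + add

print(parse_dice("4d6+12"))
print(roll("4d6+12"))
```

Using fullmatch rather than match means trailing junk such as "4d6+12x" is rejected instead of silently ignored.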

Related

Change string for defined pattern (Python)

Learning Python, came across a demanding beginner's exercise.
Let's say you have a string constituted by "blocks" of characters separated by ';'. An example would be:
cdk;2(c)3(i)s;c
And you have to return a new string based on old one but in accordance to a certain pattern (which is also a string), for example:
c?*
This pattern means that each block must start with a 'c', the '?' character must be replaced by some other letter, and finally '*' by an arbitrary number of letters.
So when the pattern is applied you return something like:
cdk;cciiis
Another example:
string: 2(a)bxaxb;ab
pattern: a?*b
result: aabxaxb
My very crude attempt resulted in this:
def switch(string,pattern):
    d = []
    for v in range(0,string):
        r = float("inf")
        for m in range (0,pattern):
            if pattern[m] == string[v]:
                d.append(pattern[m])
            elif string[m]==';':
                d.append(pattern[m])
            elif (pattern[m]=='?' & Character.isLetter(string.charAt(v))):
                d.append(pattern[m])
    return d
Tips?
To split a string you can use the split() function.
For pattern detection in strings you can use regular expressions (regex) with the re library.
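As a rough sketch of how those two tips could combine for this exercise: split on ';', expand each block, translate the pattern into a regular expression, and keep the blocks that match. This is one reading of the exercise inferred from the two examples in the question (a number followed by parentheses repeats the group, and non-matching blocks are dropped); the helper names expand and pattern_to_regex are made up.

```python
import re

def expand(block):
    """Expand counted groups, e.g. '2(c)3(i)s' -> 'cciiis'."""
    return re.sub(r"(\d+)\(([^)]*)\)",
                  lambda m: m.group(2) * int(m.group(1)),
                  block)

def pattern_to_regex(pattern):
    """Translate the exercise's pattern into a compiled regex.

    Assumption: '?' stands for exactly one letter and '*' for zero or
    more letters; every other character must match literally.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append("[a-zA-Z]")
        elif ch == "*":
            parts.append("[a-zA-Z]*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def switch(string, pattern):
    regex = pattern_to_regex(pattern)
    expanded = (expand(block) for block in string.split(";"))
    # Keep only the blocks that match the whole pattern
    return ";".join(block for block in expanded if regex.fullmatch(block))

print(switch("cdk;2(c)3(i)s;c", "c?*"))   # cdk;cciiis
print(switch("2(a)bxaxb;ab", "a?*b"))     # aabxaxb
```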

Python: How to move the position of an output variable using the split() method

This is my first SO post, so go easy! I have a script that counts how many matches occur in a string named postIdent for the substring ff. Based on this it then iterates over postIdent and extracts all of the data following it, like so:
substring = 'ff'
global occurences
occurences = postIdent.count(substring)
x = 0
while x <= occurences:
    for i in postIdent.split("ff"):
        rawData = i
        required_Id = rawData[-8:]
        x += 1
To explain further, if we take the string "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff", it is clear there are 3 instances of ff. I need to get the 8 preceding characters at every instance of the substring ff, so for the first instance this would be 909a9090.
With the rawData, I essentially need to offset the variable required_Id by -1 when I get the data out of the split() method, as I am currently getting the last 8 characters of the current string, not the string I have just split. Another way of doing it could be to pass the current required_Id to the next iteration, but I've not been able to do this.
The split method gets everything after the matching string ff.
Using the partition method can get me the data I need, but does not allow me to iterate over the string in the same way.
Get the last 8 characters of each split using a slice operation in a list comprehension:
s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print([x[-8:] for x in s.split('ff') if x])
# ['909a9090', '90434390', 'sdfs9000']
Not a difficult problem, but tricky for a beginner.
If you split the string on 'ff' then you appear to want the eight characters at the end of every substring but the last. The last eight characters of string s can be obtained using s[-8:]. All but the last element of a sequence x can similarly be obtained with the expression x[:-1].
Putting both those together, we get
subject = '090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff'
for x in subject.split('ff')[:-1]:
    print(x[-8:])
This should print
909a9090
90434390
sdfs9000
I wouldn't do this with split myself, I'd use str.find. This code isn't fancy but it's pretty easy to understand:
fullstr = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
search = "ff"
found = None  # offset of the next match
last = 0      # where the next search starts
l = 8         # number of preceding characters to extract
print(fullstr)
while True:
    found = fullstr.find(search, last)
    if found == -1:
        break
    # max() keeps the slice start from going negative near the beginning
    preceding = fullstr[max(0, found - l):found]
    print("At position {} found preceding characters '{}' ".format(found, preceding))
    last = found + len(search)
Overall I like Austin's answer more; it's a lot more elegant.

How to remove all non-alphabetic characters from a string?

I have been working on a program which will take a hex file, and if the file name starts with "CID", then it should remove the first 104 characters, after which there are a few words. I also want to remove everything after the words, but the problem is that the part I want to isolate varies in length.
My code is currently like this:
y = 0
import os
files = os.listdir(".")
filenames = []
for names in files:
    if names.endswith(".uexp"):
        filenames.append(names)
        y +=1
print(y)
print(filenames)
for x in range(1,y):
    filenamestart = (filenames[x][0:3])
    print(filenamestart)
    if filenamestart == "CID":
        openFile = open(filenames[x],'r')
        fileContents = (openFile.read())
        ItemName = (fileContents[104:])
        print(ItemName)
Input Example file (pulled from HxD):
.........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž
I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so I am left with only that part.
I appreciate any help that anyone can give me.
One way to remove non-alphabetic characters in a string is to use regular expressions [1].
>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'
EDIT
The first argument r'[^a-z]' is the pattern that captures what will be removed (here, by replacing it with an empty string ''). The square brackets denote a character class (the pattern will match anything in this class), the ^ is a "not" operator and the a-z denotes all the lowercase alphabetic characters. More information here:
https://docs.python.org/3/library/re.html#regular-expression-syntax
So for instance, to keep also capital letters and spaces it would be:
>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'
However from the data you give in your question the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
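One way to approach that more complicated process, sketched here under assumptions that go beyond the question: read the file as bytes rather than text, pull out runs of printable ASCII, and discard the 32-character uppercase-hex IDs that appear in the sample. The function names and the min_len threshold are made up, and the sample bytes below are a hand-built stand-in for a real .uexp file.

```python
import re

def printable_runs(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]

def item_strings(data):
    # Assumption: a run of exactly 32 uppercase hex digits is an ID, not text
    return [s for s in printable_runs(data)
            if not re.fullmatch(r"[0-9A-F]{32}", s)]

# Hand-built sample; for a real file use: data = open(path, "rb").read()
sample = (b"\x00\x00!\x00\x00\x001AC9816A4D34966936605BB7EFBC0841\x00\x00"
          b"Sun Tan Specialist\x00\x009\x00!\x00"
          b"9658361F4EFF6B98FF153898E58C9D52\x00Outfit\x00")

print(item_strings(sample))  # ['Sun Tan Specialist', 'Outfit']
```

Working on bytes avoids decoding errors from the non-text parts of the file, and the variable-length name no longer matters because each run ends at the first non-printable byte.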
You can use filter:
import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
You mentioned in a comment that you got the string down to Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned
Essentially your goal at this point is to remove any uppercase letter that isn't immediately followed by a lowercase letter because Upper Lower indicates the start of a phrase. You can use a for loop to do this.
h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"
output = ""
for i, ch in enumerate(h):
    # Keep spaces
    if ch == " ":
        output += ch
    # Start of a phrase found: uppercase immediately followed by lowercase
    elif ch.isupper() and i + 1 < len(h) and h[i + 1].islower():
        # Separate from the previous phrase unless a space is already there
        if output and not output.endswith(" "):
            output += " "
        output += ch
    # We want all lowercase characters
    elif ch.islower():
        output += ch
print(output)
# If you don't care about Outfit since it is always there, remove it
print(output.replace(" Outfit", ""))
Output:
Sun Tan Specialist Outfit Dont get burned
Sun Tan Specialist Dont get burned

Taking a specific character in the string for a list of strings in python

I have a list of 22000 strings like abc.wav. I want to take out a specific character from each of them, for example the character just before .wav. How can I do that in Python?
Finding the position of a character could be done with .split(), but if you want to pull out the character at a specific position in a string, you could use list[stringNum][letterNum]. list[stringNum].split("a") would then give two or more separate strings from either side of the letter "a". You could work out the position by comparing the lengths of those pieces with the length of the full string. Just a simple algorithm idea, I guess; you'd have to play around with it.
I am assuming you are trying to reconstruct the same string without the letter before the extension.
resultList = []
for item in names:  # names is your list of strings
    newstr = item.split('.')[0]
    extstr = item.split('.')[1]
    locstr = newstr[:-1]  # change the selection here depending on the char you want to remove
    endstr = locstr + '.' + extstr  # put the dot back before the extension
    resultList.append(endstr)
If you are trying to just save a list of the letters you remove only, do the following:
resultList = []
for item in names:
    newstr = item.split('.')[0]
    endstr = newstr[-1]
    resultList.append(endstr)
import pandas as pd
df = pd.DataFrame({'something': ['asb.wav', 'xyz.wav']})
df.something.str.extract(r"(\w*)(\.wav$)", expand=True)
Gives:
     0     1
0  asb  .wav
1  xyz  .wav
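As an alternative to splitting on '.', here is a small sketch using os.path.splitext, which separates only the final extension. The names list is a stand-in for the real 22000 entries, and the code assumes every name has at least one character before its extension.

```python
import os

names = ["abc.wav", "xyz.wav"]  # stand-in for the real list of file names

# Character immediately before the extension
last_chars = [os.path.splitext(name)[0][-1] for name in names]

# The same names with that character removed
trimmed = []
for name in names:
    stem, ext = os.path.splitext(name)  # "abc.wav" -> ("abc", ".wav")
    trimmed.append(stem[:-1] + ext)

print(last_chars)  # ['c', 'z']
print(trimmed)     # ['ab.wav', 'xy.wav']
```

Unlike split('.'), this keeps working when the stem itself contains dots, such as "take.1.wav".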

How to remove special characters from a string

I've been looking into how to create a program that removes any whitespace and special characters from user input. I want to be left with just a string of numbers, but I've not been able to work out quite how to do this. Can anyone help?
x = (input("Enter a debit card number: "))
x.translate(None, '!.;,')
print(x)
The code I have created is possibly too basic, and it also doesn't work. Can anyone please help? :) I'm using Python 3.
The way str.translate works is different in Py 3.x - it requires mapping a dictionary of ordinals to values, so instead you use:
x = input("Enter a debit card number: ")
result = x.translate(str.maketrans({ord(ch):None for ch in '!.;,'}))
Although you're better off just removing all non digits:
import re
result = re.sub('[^0-9]', '', x)
Or using builtins:
result = ''.join(ch for ch in x if ch.isdigit())
It's important to note that strings are immutable and their methods return a new string - be sure to either assign back to the object or some other object to retain the result.
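A short illustration of that point, using a made-up card number: calling translate without assigning the result leaves the original string untouched.

```python
x = "1234-5678 9012;3456"                # made-up input for illustration
remove = str.maketrans("", "", "-; ")    # third argument: characters to delete

x.translate(remove)           # the returned string is discarded
unchanged = x                 # x still holds the original text

digits = x.translate(remove)  # assign the result to keep it

print(unchanged)  # 1234-5678 9012;3456
print(digits)     # 1234567890123456
```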
You don't need translate for this. Instead, you can use a regex:
import re
x = input("Enter a debit card number: ")
x = re.sub(r'[\s!.;,]*','',x)
[\s!.;,]* matches a single character present in the list below:
    Quantifier *: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
    \s matches any whitespace character [\r\n\t\f ]
    !.;, matches a single character in the list !.;, literally
re.sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.
Assuming that you want only numbers (a credit card number):
import re  # or 'from re import sub'
s = re.sub('[^0-9]+', "", my_str)
I've used this as an input:
my_str = "663388191712-483498-39434347!2848484;290850 2332049832048 23042 2 2"
What I've got is only numbers (because you mentioned that you want to be left with only numbers):
66338819171248349839434347284848429085023320498320482304222
