Which index of string is an int in python - python

I am reading a text file with high scores and trying to find which index of the string is where the name stops, and the score starts. This is the format of the file:
John 15
bob 27
mary 72
videogameplayer99 99
guest 71
How can I do this?

If you are looking to find the index to split the string into 2 separate parts, then you can just use [string].split() (where string is an individual line). If you need to find the index of the space for some other reason, use: [string].index(" ").

You can split the line on the space. This gives a list containing the 2 'words' in the line; in this case they are the name and the score (as a string). You can get them using:
result = line.split()
name = result[0]
score = int(result[1])

In this case, for each line, you would be looking for the index of the first space character " ". In Python, you can find it with the find method of a string. For example, if you have a string s = "videogameplayer99 99", then s.find(" ") will return 17.
If you are using this method to split a name from a number, I would instead recommend the split method, which splits a string on a delimiter. For example, s.split(" ") returns ["videogameplayer99", "99"].
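As a rough sketch of the split approach applied to every line: the sample lines come from the question, while the helper name parse_score_line is made up for illustration. It uses rsplit with a limit of 1 so that a name containing spaces would still parse, which is an assumption beyond what the question states.

```python
def parse_score_line(line):
    """Hypothetical helper: turn 'name score' into (name, int(score))."""
    # rsplit(None, 1) splits once, on the last run of whitespace
    name, score = line.rsplit(None, 1)
    return name, int(score)


lines = ["John 15", "bob 27", "mary 72", "videogameplayer99 99", "guest 71"]
scores = [parse_score_line(line) for line in lines]
print(scores[3])  # ('videogameplayer99', 99)
```

If you are reading from a real file, you would loop over the file object instead of the hard-coded list.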

Related

using .find to find the space in a line Python

I'm currently using this code
position = line.find(" ")
to find a position, for example in a name like: Coner John Smith
How do I single out the Smith? My line of code won't help because it stops at the first space it finds. How do I rewrite the code to find the second space in a line?
Use find() twice:
firstspace = line.find(" ")
if firstspace != -1:
    secondspace = line.find(" ", firstspace+1)
The second argument to str.find() is the position to start searching from.
The answer to your literal question is:
position= line.find(" ", line.find(" ") + 1)
What this does is find the first space in line, and then find the next space in line, starting the search one position past the first space. That gives you the position of the second space.
But as @barmar points out, this is probably an example of an XY problem.
Do you want the second space? Or the last space (e.g. 'Mary Jo Suzy Smith')?
Do you want the position of the space? Or are you really just after the last word in the string? Do you care about any punctuation following the last word in the string (e.g. 'John Smith!')?
Every case has a slightly different better answer.
To get the last word, you could:
last_word = line.split(' ')[-1]
To find the last space, if you need it for something else:
last_space = line.rfind(' ')
Etc.
If you want the last space, use rfind; if you really want the last word, use .rsplit():
>>> "Coner John Smith".rfind(" ")
10
>>> "Coner John Smith".rsplit(" ", 1)[-1]
'Smith'
If you want the Nth space, you can repeatedly locate with .find(), which can accept a start arg for where to begin finding!
source_text = "Coner John Smith"
N = 2
index = -1  # search from index 0
for _ in range(N):
    index = source_text.find(" ", index + 1)
    # opportunity to discover if missing or `break` on last, etc.
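The loop above can be wrapped in a small function. This is only a sketch: find_nth is a made-up name, and returning -1 for a missing occurrence is a choice made here to mirror str.find. The early return matters, because once find() returns -1 the next call would start again from index 0.

```python
def find_nth(text, sub, n):
    """Return the index of the n-th (1-based) occurrence of sub, or -1."""
    index = -1
    for _ in range(n):
        index = text.find(sub, index + 1)
        if index == -1:
            # stop here; otherwise the next search would restart at index 0
            return -1
    return index


print(find_nth("Coner John Smith", " ", 2))  # 10
print(find_nth("Coner John Smith", " ", 3))  # -1
```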

Python: How to move the position of an output variable using the split() method

This is my first SO post, so go easy! I have a script that counts how many matches occur in a string named postIdent for the substring ff. Based on this it then iterates over postIdent and extracts all of the data following it, like so:
substring = 'ff'
global occurences
occurences = postIdent.count(substring)
x = 0
while x <= occurences:
    for i in postIdent.split("ff"):
        rawData = i
        required_Id = rawData[-8:]
    x += 1
To explain further, if we take the string "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff", it is clear there are 3 instances of ff. I need to get the 8 preceding characters at every instance of the substring ff, so for the first instance this would be 909a9090.
With the rawData, I essentially need to offset the variable required_Id by -1 when I get the data out of the split() method, as I am currently getting the last 8 characters of the current string, not the string I have just split. Another way of doing it could be to pass the current required_Id to the next iteration, but I've not been able to do this.
The split method gets everything after the matching string ff.
Using the partition method can get me the data I need, but does not allow me to iterate over the string in the same way.
Get the last 8 characters of each split using a slice operation in a list comprehension:
s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print([x[-8:] for x in s.split('ff') if x])
# ['909a9090', '90434390', 'sdfs9000']
Not a difficult problem, but tricky for a beginner.
If you split the string on 'ff' then you appear to want the eight characters at the end of every substring but the last. The last eight characters of string s can be obtained using s[-8:]. All but the last element of a sequence x can similarly be obtained with the expression x[:-1].
Putting both those together, we get
subject = '090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff'
for x in subject.split('ff')[:-1]:
    print(x[-8:])
This should print
909a9090
90434390
sdfs9000
I wouldn't do this with split myself, I'd use str.find. This code isn't fancy but it's pretty easy to understand:
fullstr = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
search = "ff"
found = None  # offset of the current match
last = 0  # where the next search starts
l = 8  # number of preceding characters to keep
print(fullstr)
while True:
    found = fullstr.find(search, last)
    if found == -1:
        break
    preceding = fullstr[max(0, found - l):found]  # max() keeps the start from going negative
    print("At position {} found preceding characters '{}'".format(found, preceding))
    last = found + len(search)
Overall I like Austin's answer more; it's a lot more elegant.
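Another option, not taken from any of the answers above, is a regular expression that captures the 8 characters in front of each ff. This is a sketch under two assumptions: matches do not overlap, and an ff with fewer than 8 characters in front of it is skipped rather than reported.

```python
import re

s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"

# (.{8}) captures exactly eight characters that are immediately followed by "ff"
preceding = re.findall(r"(.{8})ff", s)
print(preceding)  # ['909a9090', '90434390', 'sdfs9000']
```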

Taking a specific character in the string for a list of strings in python

I have a list of 22000 strings like abc.wav. I want to take a specific character out of each one in Python, such as the character just before .wav. How can I do that?
You could find the position of a character with .split(), but if you want the character at a specific position in a string, you can use list[stringNum][letterNum]. list[stringNum].split("a") would give you two or more separate strings from either side of the letter "a". By comparing the lengths of those strings with the length of the whole string, you could work out where the "a" was. Just a simple algorithm idea; you'd have to play around with it.
I am assuming you are trying to reconstruct the same string without the letter before the extension.
resultList = []
for item in file_names:  # file_names is your list of strings, e.g. ['abc.wav']
    newstr, extstr = item.rsplit('.', 1)
    locstr = newstr[:-1]  # change the slice here depending on the char you want to remove
    endstr = locstr + '.' + extstr
    resultList.append(endstr)
If you are trying to just save a list of the letters you remove only, do the following:
resultList = []
for item in file_names:  # file_names is your list of strings, e.g. ['abc.wav']
    newstr = item.rsplit('.', 1)[0]
    endstr = newstr[-1]
    resultList.append(endstr)
import pandas as pd
df = pd.DataFrame({'something': ['asb.wav', 'xyz.wav']})
df.something.str.extract(r"(\w*)(\.wav$)", expand=True)
Gives:
     0     1
0  asb  .wav
1  xyz  .wav
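If the strings really are file names, the standard library's os.path.splitext is one more way to separate the extension before picking a character. A minimal sketch follows; char_before_extension is a hypothetical helper name, and returning an empty string when there is nothing before the extension is a choice made here.

```python
import os.path


def char_before_extension(filename):
    """Hypothetical helper: last character of the name, before the extension."""
    stem, _ext = os.path.splitext(filename)  # 'abc.wav' -> ('abc', '.wav')
    return stem[-1] if stem else ""


names = ["abc.wav", "xyz.wav"]
print([char_before_extension(n) for n in names])  # ['c', 'z']
```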

Python Regex: how to not select whitespace before last string?

I am a newbie struggling to separate a database into columns with re.findall().
I want to separate these Dutch street names into name and number.
Roemer Visscherstraat 15
Vondelstraat 102-huis
For the number I use
\S*$
Which works just fine. For the street name I use
^\S.+[^\S$]
Or: use everything but the last element, which may be a number or a combination of a number and something else.
Problem is: Python then also keeps the last whitespace after the last name, so I get:
'Roemer Visscherstraat '
Any way I can stop this from happening?
Also, findall returns a list consisting of the bit of database I wanted, and an empty string. How does this happen and can I prevent it somehow?
Thanks so much in advance for your help.
You can rstrip() the name to remove any spaces at the end of it:
>>> 'Roemer Visscherstraat '.rstrip()
'Roemer Visscherstraat'
But if the input is similar to the one you posted, you can simply use split() instead of regex, for example:
st = 'Roemer Visscherstraat 15'
data = st.split()
num = data[-1]
name = ' '.join(data[:-1])
print('Name: {}, Number: {}'.format(name, num))
output:
Name: Roemer Visscherstraat, Number: 15
For the number you should use the following:
\S+$
Using a + instead of a * will ensure that you have at least one character in the match.
For the street name you can use the following:
^.+(?=\s\S+$)
What this does is selects text up until the number.
However, what you may consider doing is using one regex match with capture groups instead. The following would work:
^(.+(?=\s\S+$))\s(\S+$)
In this case, the first capture group gives you the street name, and the second gives you the number.
([^\d]*)\s+(\d.*)
In this regex, the first group captures everything before a space followed by a digit, and the second group gives the desired number.
My assumption is that the number begins with a digit and the name has no digit in it.
Take a look at https://regex101.com/r/eW0UP2/1
Roemer Visscherstraat 15
Full match 0-24 `Roemer Visscherstraat 15`
Group 1. 0-21 `Roemer Visscherstraat`
Group 2. 22-24 `15`
Vondelstraat 102-huis
Full match 24-46 `Vondelstraat 102-huis`
Group 1. 24-37 `Vondelstraat`
Group 2. 38-46 `102-huis`
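To tie the capture-group idea to Python code, here is a small sketch using the re module. The pattern is a simplified variant rather than any of the exact patterns above, and split_address is a made-up name. It assumes the house number is the last whitespace-separated token on the line.

```python
import re

# lazy name, then whitespace, then the final non-space token
ADDRESS = re.compile(r"^(.+?)\s+(\S+)$")


def split_address(line):
    """Hypothetical helper: return (street, number), or None if no match."""
    match = ADDRESS.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_address("Roemer Visscherstraat 15"))  # ('Roemer Visscherstraat', '15')
print(split_address("Vondelstraat 102-huis"))     # ('Vondelstraat', '102-huis')
```

Because the whitespace sits outside both groups, the street name comes back without a trailing space.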

Extracting part of a string based on its naming convention

I'm trying to extract a piece of information about a certain file. The file name is extracted from an xml file.
The information I want is stored in the name of the file, I want to know how to extract the letters between the 2nd and 3rd period in the string.
Eg. name is extracted from the xml, it is stored as a string that looks something like this "aa.bb.cccc.dd.ee" and I need to find what "cccc" actually is in each of the strings I extract (~50 of them).
I've done some searching and some playing around with slicing etc. but I can't get even close.
I can't just specify the letter in the range [6:11] because the length of the string varies as does the number of characters before the part I want to find.
UPDATE: Solution Added.
Because the data I was trying to split came from an XML file, it was being stored as an element.
I iterated through the list of Estate Names and stored the EstateName attribute of each one in a variable:
for element in EstateList:
    EstateStr = element.getAttribute('EstateName')
I then used the split on this new variable which contains strings rather than elements and wrote them to the desired text file:
asset = EstateStr.split('.', 3)[2]
z.write(asset + "\n")
If you are certain it will always have this format (5 blocks of characters, separated by 4 periods) you can split on '.' then index the third element [2].
>>> 'aa.bb.cccc.dd.ee'.split('.')[2]
'cccc'
This works for various string lengths so you don't have to worry about the absolute position using slicing as your first approach mentioned.
>>> 'a.b.c.d.e'.split('.')[2]
'c'
>>> 'eeee.ddddd.ccccc.bbbbb.aaaa'.split('.')[2]
'ccccc'
Split the string on the period:
third_part = inputstring.split('.', 3)[2]
I've used str.split() with a limit here for efficiency; no point in splitting the dd.ee part here, for example.
The [2] index then picks out the third result from the split, your cccc string:
>>> "aa.bb.cccc.dd.ee".split('.', 3)[2]
'cccc'
You could use the re module to extract the string between the second and third dots.
>>> import re
>>> re.search(r'^[^.]*\.[^.]*\.([^.]*)\..*', "aa.bb.cccc.dd.ee").group(1)
'cccc'
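If some of the ~50 names might have fewer parts than expected, indexing with [2] raises an IndexError. A minimal sketch of a guarded version follows; nth_field is a hypothetical name, and returning None for a missing field is a choice made here, not something from the answers above.

```python
def nth_field(text, n, sep="."):
    """Return the n-th (0-based) field of text split on sep, or None."""
    parts = text.split(sep)
    if n >= len(parts):
        return None  # not enough fields in this name
    return parts[n]


print(nth_field("aa.bb.cccc.dd.ee", 2))  # cccc
print(nth_field("aa.bb", 2))             # None
```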
