Today I learned that str.partition(sep) returns a 3-tuple: the part before the separator, the separator itself, and the part after it.
Let's say I have this:
string = 'Plans for this weekend include turning wine into water.'
print(string.partition('weekend '))
and it prints out:
('Plans for this ', 'weekend ', 'include turning wine into water.')
How do I grab the third value (index 2)?
Thanks in advance! (I'm pretty new to Python :)
Just store the result in a new variable and access it by index.
string = 'Plans for this weekend include turning wine into water.'
res = string.partition('weekend ')
print(res[2])
string = 'Plans for this weekend include turning wine into water.'
print(string.partition('weekend ')[2][0])
This will print out i
print(string.partition('weekend ')[2])
Output: include turning wine into water.
Tuple indexing works just like list indexing.
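If it helps, here's a minimal sketch of an alternative: since partition() always returns exactly three items, you can unpack them into names instead of indexing. The names before, sep, after and missing are just my own choices.

```python
string = 'Plans for this weekend include turning wine into water.'

# partition() always returns a 3-tuple, so it can be unpacked directly
before, sep, after = string.partition('weekend ')
print(after)

# If the separator is not found, the whole string lands in the first slot
# and the other two slots are empty strings
missing = string.partition('holiday ')
print(missing)
```

Unpacking avoids having to remember which index holds which part.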
I'd like to use part of a string ('project') that is returned from an API. The string looks like this:
{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}
I'd like to store the 'LS003942_EP... ' part in a new variable called foldername. I thought a good way would be to use a regex to find the text after Title. Here's my code:
orders = api.get_all(view='Folder', fields='Project Title', maxRecords=1)
for new in orders:
    print("Found 1 new project")
    print(new['fields'])
    project = (new['fields'])
    s = re.search('Title(.+?)', result)
    if s:
        foldername = s.group(1)
        print(foldername)
This gives me an error -
TypeError: expected string or bytes-like object.
I'm hoping for foldername = 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'
You can use ast.literal_eval to safely evaluate a string containing a Python literal:
import ast
s = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"
print(ast.literal_eval(s)['Project Title'])
# LS003942_EP - 5 Random Road, Sunny Place, SA 5000
It seems (to me) that you have a dictionary and not a string. If so, you can index it directly by key:
s = {'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}
print(s['Project Title'])
If you have time, take a look at dictionaries.
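Applied to the loop in the question, a rough sketch might look like this. I don't have access to the API, so the orders list below is a hypothetical stand-in for whatever api.get_all() returns; I'm assuming each record carries a 'fields' dict, as the printed output suggests.

```python
# Hypothetical stand-in for the API response (assumed shape, not the real API)
orders = [
    {'fields': {'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}},
]

for new in orders:
    fields = new['fields']
    # .get() returns None instead of raising KeyError if the key is absent
    foldername = fields.get('Project Title')
    print(foldername)
```

No regex is needed in that case, which also avoids the TypeError from passing a dict to re.search().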
I don't think you need a regex here:
string = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"
foldername = string[string.index(":") + 3:-2]
Essentially, I'm finding the position of the first colon, then adding 3 to skip the colon, the space, and the opening apostrophe, which gives the starting index of your foldername. Then I slice from that index up to, but not including, the last two characters (the closing apostrophe and the closing brace).
However, if your string is always going to be in the form of a valid Python dict, you could simply do foldername = list(eval(string).values())[0]. Here, I'm treating your string as a dict and getting the first value from it, which is your desired foldername. But, as @AKX notes in the comments, eval() isn't safe, as somebody could pass malicious code as a string. Unless you're sure that your input strings won't contain code (which is unlikely), it's best to use ast.literal_eval(), as it only evaluates literals.
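But, as @MaximilianPeters notes in the comments, if your API response is actually JSON, you could easily parse it using json.loads(). Note that JSON requires double quotes, so the single-quoted string shown above would be rejected by json.loads() as written.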
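To make the difference between the two parsers concrete, here's a small sketch. In Python the JSON parsing function is json.loads(). The json_style string is my own double-quoted variant of the example, not something from the question.

```python
import ast
import json

python_style = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"
json_style = '{"Project Title": "LS003942_EP - 5 Random Road, Sunny Place, SA 5000"}'

# A Python-literal string (single quotes) goes through ast.literal_eval
from_literal = ast.literal_eval(python_style)['Project Title']

# Real JSON (double quotes) goes through json.loads
from_json = json.loads(json_style)['Project Title']

# json.loads rejects the single-quoted form; the error is a ValueError subclass
try:
    json.loads(python_style)
    single_quotes_ok = True
except ValueError:
    single_quotes_ok = False

print(from_literal)
print(from_json)
print(single_quotes_ok)
```

Which one applies depends on what the API really sends back, so it is worth checking the raw response first.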
You could try this pattern: (?<='Project Title': )[^}]+
Explanation: it uses a positive lookbehind to ensure that the match starts right after 'Project Title': and then matches everything up to the closing } with [^}]+. Note that the match includes the quotes around the value.
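Here's a hedged sketch of how that pattern could be used from Python, assuming the input really is a string (not a dict). The strip("'") call is my own addition to drop the quotes that the pattern leaves around the value.

```python
import re

s = "{'Project Title': 'LS003942_EP - 5 Random Road, Sunny Place, SA 5000'}"

# The lookbehind has a fixed width, which Python's re module requires
m = re.search(r"(?<='Project Title': )[^}]+", s)

# Guard against no match, then remove the surrounding apostrophes
foldername = m.group(0).strip("'") if m else None
print(foldername)
```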
Suppose I have this following list
[('2015-2016-regular', '2016-playoff'), ('2016-2017-regular', '2017-playoff'), ('2017-2018-regular',)]
which represents the two previous complete NHL years and the current one.
I would like to convert it so that it gives me
[('Regular Season 2015-2016 ', 'Playoff 2016'), ('Regular Season 2016-2017', 'Playoff 2017'), ('Regular Season 2017-2018 ',)]
My English is not great, and these strings will be used as titles. Are there any errors in the last list?
How could I write a function that does this conversion while keeping lines within the 80-character limit?
This is a little hacky, but it's an odd question and use case so oh well. Since you have a really limited set of replacements, you can just use a dict to define them and then use a list comprehension with string formatting:
repl_dict = {
    '-regular': 'Regular Season ',
    '-playoff': 'Playoff '
}
new_list = [
    tuple(
        '{}{}'.format(repl_dict[name[name.rfind('-'):]], name[:name.rfind('-')])
        for name in tup
    )
    for tup in url_list
]
I tried this. I unpack each tuple, split each string on '-', and join the parts in the order I need. The capitalize() function makes the first letter uppercase. I also need to check whether the tuple has one or two elements.
l = [('2015-2016-regular', '2016-playoff'), ('2016-2017-regular', '2017-playoff'), ('2017-2018-regular',)]
ans = []
for i in l:
    if len(i) == 2:
        # two entries: regular season and playoff
        fir = i[0].split('-')
        sec = i[1].split('-')
        ans.append((fir[2].capitalize() + " " + fir[0] + '-' + fir[1], sec[1].capitalize() + " " + sec[0]))
    else:
        # one entry: regular season only
        fir = i[0].split('-')
        ans.append((fir[2].capitalize() + " " + fir[0] + '-' + fir[1],))
print(ans)
Output:
[('Regular 2015-2016', 'Playoff 2016'), ('Regular 2016-2017', 'Playoff 2017'), ('Regular 2017-2018',)]
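Since the question asked for a function, here is one possible sketch that keeps every line under 80 characters. The names LABELS, format_title and format_seasons are my own, and I'm assuming every entry ends in either -regular or -playoff.

```python
# Assumed mapping from the suffix to the wording wanted in the title
LABELS = {'regular': 'Regular Season', 'playoff': 'Playoff'}


def format_title(name):
    # rpartition splits on the LAST '-', so '2015-2016' stays intact
    years, _, kind = name.rpartition('-')
    return '{} {}'.format(LABELS[kind], years)


def format_seasons(seasons):
    # Works for tuples of any length, so no special case for one element
    return [tuple(format_title(name) for name in season)
            for season in seasons]


seasons = [('2015-2016-regular', '2016-playoff'),
           ('2016-2017-regular', '2017-playoff'),
           ('2017-2018-regular',)]
titles = format_seasons(seasons)
print(titles)
```

An unknown suffix raises a KeyError here, which seemed preferable to silently producing a wrong title.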
I'm using Python 2.7. I've got a bit of code to extract certain MP3 tags, like this:
mp3info = EasyID3(fileName)
print mp3info
print mp3info['genre']
print mp3info.get('genre', default=None)
print str(mp3info['genre'])
print repr(mp3info['genre'])
genre = unicode(mp3info['genre'])
print genre
I have to use the name ['genre'] instead of [2] as the order can vary between tracks. It produces output like this
{'artist': [u'Really Cool Band'], 'title': [u'Really Cool Song'], 'genre': [u'Rock'], 'date': [u'2005']}
[u'Rock']
[u'Rock']
[u'Rock']
[u'Rock']
[u'Rock']
At first I was like, "Why thank you, I do rock" but then I got on with trying to debug the code. As you can see, I've tried a few different approaches, but none of them work. All I want is for it to output
Rock
I reckon I could possibly use split, but that could get very messy very quickly as there's a distinct possibility that artist or title could contain '
Any suggestions?
It's not a string that you can use split on; it's a list. That list usually (always?) contains one item, so you can get that first item:
genre = mp3info['genre'][0]
[u'Rock']
is a list of length 1; its single element is a Unicode string.
Try
print mp3info['genre'][0]
to print only the first element of the list.
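If some tracks might lack a tag entirely, a small helper could cover both cases. This is only a sketch: the plain dict below is a stand-in for the EasyID3 object (I'm assuming it supports .get() like a dict, as the question's code suggests), and first_tag is a hypothetical name.

```python
# Stand-in for the EasyID3 object: each tag maps to a list of values
tags = {
    'artist': [u'Really Cool Band'],
    'title': [u'Really Cool Song'],
    'genre': [u'Rock'],
}


def first_tag(tags, key, default=None):
    # Return the first value for the tag, or the default if it is
    # missing or the list is empty
    values = tags.get(key)
    return values[0] if values else default


print(first_tag(tags, 'genre'))
print(first_tag(tags, 'date', default='unknown'))
```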
import urllib2,sys
from bs4 import BeautifulSoup,NavigableString
obama_4427_url = 'http://www.millercenter.org/president/obama/speeches/speech-4427'
obama_4427_html = urllib2.urlopen(obama_4427_url).read()
obama_4427_soup = BeautifulSoup(obama_4427_html)
# find the speech itself within the HTML
obama_4427_div = obama_4427_soup.find('div',{'id': 'transcript'},{'class': 'displaytext'})
# convert soup to string for easier processing
obama_4427_str = str(obama_4427_div)
# list of characters to be removed from obama_4427_str
remove_char = ['<br/>','</p>','</div>','<div class="indent" id="transcript">','<h2>','</h2>','<p>']
remove_char
for char in obama_4427_str:
    if char in obama_4427_str:
        obama_4427_replace = obama_4427_str.replace(remove_char,'')
obama_4427_replace = obama_4427_str.replace(remove_char,'')
print(obama_4427_replace)
Using BeautifulSoup, I scraped one of Obama's speeches off of the above website. Now, I need to replace some residual HTML in an efficient manner. I've stored a list of elements I'd like to eliminate in remove_char. I'm trying to write a simple for statement, but am getting the error: TypeError: expected a character object buffer. It's a beginner question, I know, but how can I get around this?
Since you are already using BeautifulSoup, you can use obama_4427_div.text directly instead of str(obama_4427_div) to get the text content. The text you get that way will not contain any residual HTML tags.
Example -
>>> obama_4427_div = obama_4427_soup.find('div',{'id': 'transcript'},{'class': 'displaytext'})
>>> obama_4427_str = obama_4427_div.text
>>> print(obama_4427_str)
Transcript
To Chairman Dean and my great friend Dick Durbin; and to all my fellow citizens of this great nation;
With profound gratitude and great humility, I accept your nomination for the presidency of the United States.
Let me express my thanks to the historic slate of candidates who accompanied me on this ...
...
...
...
Thank you, God Bless you, and God Bless the United States of America.
For completeness: the TypeError occurs because str.replace() expects a string as its first argument, but the code passes the whole remove_char list. To remove several substrings, keep a list of them (like the remove_char list you have created) and call str.replace() on the string once for each element in the list. Example -
obama_4427_str = str(obama_4427_div)
remove_char = ['<br/>','</p>','</div>','<div class="indent" id="transcript">','<h2>','</h2>','<p>']
for char in remove_char:
    obama_4427_str = obama_4427_str.replace(char,'')
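As a variation on the loop, and not what the answer itself does, the same removal can be done in a single pass by joining the escaped substrings into one regex alternation. The sample string below is made up for illustration; it is not the actual page content.

```python
import re

remove_char = ['<br/>', '</p>', '</div>',
               '<div class="indent" id="transcript">',
               '<h2>', '</h2>', '<p>']

# re.escape keeps characters like '/' and '"' literal inside the pattern
pattern = re.compile('|'.join(re.escape(tag) for tag in remove_char))

# Made-up sample markup, standing in for str(obama_4427_div)
sample = ('<div class="indent" id="transcript"><h2>Transcript</h2>'
          '<p>Thank you.<br/>God bless.</p></div>')

cleaned = pattern.sub('', sample)
print(cleaned)
```

For real HTML the .text approach is still the more robust choice, since a fixed list of tags will miss anything not listed.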
I'm trying to get the "real" name of a movie from its name when you download it.
So for instance, I have
Star.Wars.Episode.4.A.New.Hope.1977.1080p.BrRip.x264.BOKUTOX.YIFY
and would like to get
Star Wars Episode 4 A New Hope
So I'm using this regex:
.*?\d{1}?[ .a-zA-Z]*
which works fine, but only for a movie with a number, as in 'Iron Man 3' for example.
I'd like to be able to get movies like 'Interstellar' from
Interstellar.2014.1080p.BluRay.H264.AAC-RARBG
and I currently get
Interstellar 2
I tried several ways, and spent quite a lot of time on it already, but figured it wouldn't hurt asking you guys if you had any suggestion/idea/tip on how to do it...
Thanks a lot!
Given your examples and assuming you always download in 1080p (or know that field's value):
x = 'Interstellar.2014.1080p.BluRay.H264.AAC-RARBG'
y = x.split('.')
print(" ".join(y[:y.index('1080p')-1]))
Forget the regex (for now anyway!) and work with the fixed field layout. Find a field you know (1080p) and remove the information you don't want (the year). Recombine the results and you get "Interstellar" and "Star Wars Episode 4 A New Hope".
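If the resolution is not always 1080p, one possible variation is to look for the year field instead. This is a sketch under my own assumptions: the year is a separate dot-delimited field starting with 19 or 20, and movie_title is a hypothetical name. It needs Python 3.4+ for re.fullmatch.

```python
import re


def movie_title(filename):
    fields = filename.split('.')
    for i, field in enumerate(fields):
        # Skip the first field so a title that starts with a year-like
        # number is not cut down to nothing
        if i > 0 and re.fullmatch(r'(?:19|20)\d{2}', field):
            return ' '.join(fields[:i])
    # No year field found: keep every field
    return ' '.join(fields)


names = [
    'Star.Wars.Episode.4.A.New.Hope.1977.1080p.BrRip.x264.BOKUTOX.YIFY',
    'Interstellar.2014.1080p.BluRay.H264.AAC-RARBG',
]
for name in names:
    print(movie_title(name))
```

A title that contains a year-like number in the middle would still be cut short, so treat this as a starting point.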
The following regex would work (assuming the format is something like moviename.year.1080p.anything or moviename.year.720p.anything):
.*(?=.\d{4}.*\d{3,}p)
\.(?=.*?(?:19|20)\d{2}\b)|(?:19|20)\d{2}\b.*$
Try this with re.sub. See the demo:
https://regex101.com/r/hR7tH4/10
import re
p = re.compile(r'\.(?=.*?(?:19|20)\d{2}\b)|(?:19|20)\d{2}\b.*$', re.MULTILINE)
test_str = "Star.Wars.Episode.4.A.New.Hope.1977.1080p.BrRip.x264.BOKUTOX.YIFY\nInterstellar.2014.1080p.BluRay.H264.AAC-RARBG\nIron Man 3"
subst = " "
result = re.sub(p, subst, test_str)
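For what it's worth, here is how that substitution could be wrapped in a helper for one name at a time. clean_title is a hypothetical name. I dropped re.MULTILINE because each name is a single-line string here, and added strip() because the substitution leaves trailing spaces where the year used to be.

```python
import re

# Same pattern as above: dots that precede a year, or the year and
# everything after it
p = re.compile(r'\.(?=.*?(?:19|20)\d{2}\b)|(?:19|20)\d{2}\b.*$')


def clean_title(name):
    # Replace each match with a space, then trim the leftover spaces
    return p.sub(' ', name).strip()


for name in ['Star.Wars.Episode.4.A.New.Hope.1977.1080p.BrRip.x264.BOKUTOX.YIFY',
             'Interstellar.2014.1080p.BluRay.H264.AAC-RARBG',
             'Iron Man 3']:
    print(clean_title(name))
```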
Assuming there is always a four-digit year or a four-digit resolution in the movie's file name, a simple solution is to replace the unwanted parts matched by
"(?:\.|\d{4,4}.+$)"
with a blank and strip() the result afterwards.
For example:
import re
test1 = "Star.Wars.Episode.4.A.New.Hope.1977.1080p.BrRip.x264.BOKUTOX.YIFY"
test2 = "Interstellar.2014.1080p.BluRay.H264.AAC-RARBG"
res1 = re.sub(r"(?:\.|\d{4,4}.+$)",' ',test1).strip()
res2 = re.sub(r"(?:\.|\d{4,4}.+$)",' ',test2).strip()
print(res1, res2, sep='\n')
Output:
Star Wars Episode 4 A New Hope
Interstellar