How do I extract "e2f64fd6-13aa-5c6c-932a-c366a4f56076" from the API response below in Python?
{"message": "Rendition service output e2f64fd6-13aa-5c6c-932a-c366a4f56076/ae8f5aae-4d6a-5a17-9f95-d918634a668c has been created successfully."}
Assuming the message always follows the same format, there are several ways.
First I save the message in a variable d so I can work with it:
d = {"message": "Rendition service output e2f64fd6-13aa-5c6c-932a-c366a4f56076/ae8f5aae-4d6a-5a17-9f95-d918634a668c has been created successfully."}
Solution 1:
d['message'][25:].split('/')[0]
'e2f64fd6-13aa-5c6c-932a-c366a4f56076'
Solution 2 (I like this one more):
d['message'].split(' ')[3].split('/')[0]
'e2f64fd6-13aa-5c6c-932a-c366a4f56076'
If the format of the API response is fixed, you can use regex to extract your data.
import re
message = "Rendition service output e2f64fd6-13aa-5c6c-932a-c366a4f56076/ae8f5aae-4d6a-5a17-9f95-d918634a668c has been created successfully."
# Capture everything between "output " and the first "/"
m = re.search('output (.+?)/', message)
if m:
    print(m.group(1))
# prints e2f64fd6-13aa-5c6c-932a-c366a4f56076
If you don't want to use regex you could do:
start = message.find('output ') + len('output ') # Index of the first character after 'output '
end = message.find('/', start)
print(message[start:end])
# prints e2f64fd6-13aa-5c6c-932a-c366a4f56076
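If the surrounding wording of the message might change, one option is to match the shape of the ID itself rather than its position. The following is only a sketch: it assumes the ID is always a standard 8-4-4-4-12 hexadecimal UUID and that the one you want comes first in the message. The names UUID_RE and first_uuid are made up for this example.

```python
import re

# Assumption: the ID is a canonical 8-4-4-4-12 hexadecimal UUID.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def first_uuid(message):
    """Return the first UUID-shaped token in message, or None if there is none."""
    match = UUID_RE.search(message)
    return match.group(0) if match else None

response = {"message": "Rendition service output e2f64fd6-13aa-5c6c-932a-c366a4f56076/ae8f5aae-4d6a-5a17-9f95-d918634a668c has been created successfully."}
print(first_uuid(response["message"]))
# prints e2f64fd6-13aa-5c6c-932a-c366a4f56076
```

Returning None instead of raising lets the caller decide what to do when the response does not contain an ID at all.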
Found the information here
I have a Python string that arrives in a standard format, and I want to extract a piece of it.
The string looks like this:
logs(env:production service:FourDS3.Expirer #Properties.NewStatus:(ChallengeAbandoned OR Expired) #Properties.Source:Session).index(processing).rollup(count).by(#Properties.AcsInfo.Host).last(15m) > 60
I want to extract everything inside logs(...); that is, I need to get: env:production service:FourDS3.Expirer #Properties.NewStatus:(ChallengeAbandoned OR Expired) #Properties.Source:Session
I have tried the regexes below, but they are not working:
result = re.search('logs((.+?)).', message.strip())
return result.group(1)
result = re.search('logs((.*?)).', message.strip())
return result.group(1)
Can someone please help me ?
Conclusion first:
import pyparsing as pp
txt = 'logs(env:production service:FourDS3.Expirer #Properties.NewStatus:(ChallengeAbandoned OR Expired) #Properties.Source:Session).index(processing).rollup(count).by(#Properties.AcsInfo.Host).last(15m) > 60'
pattern = pp.Regex(r'.*?logs(?=\()') + pp.original_text_for(pp.nested_expr('(', ')'))
result = pattern.parse_string(txt)[1][1:-1]
print(result)
* You can install pyparsing with pip install pyparsing
If you insist on using regex, this answer may not suit you.
According to this post, however, nested parentheses like these are difficult to parse with regex, so I used pyparsing to handle your case.
Other examples:
The following examples work fine as well:
txt = 'logs(a(bc)d)e'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # a(bc)d
txt = 'logs(a(b(c)d)e(f)g)h(ij(k)l)m'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # a(b(c)d)e(f)g
Note:
Unfortunately, if a pair of parentheses gets broken inside logs(), an unexpected result is obtained or IndexError is raised. So you have to be careful about what kind of text comes in:
txt = 'logs(a)b)c'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # a
txt = 'logs(a(b)c'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # IndexError
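If you would rather avoid a dependency, the same idea can be sketched in plain Python by counting parenthesis depth. This is not what pyparsing does internally, just a hand-rolled alternative; the function name extract_call_args is made up for this example, and it raises ValueError on unbalanced input instead of IndexError.

```python
def extract_call_args(text, name="logs"):
    """Return the text inside the balanced parentheses that follow name."""
    start = text.find(name + "(")
    if start == -1:
        raise ValueError(name + "( not found")
    open_idx = start + len(name)  # index of the opening parenthesis
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                # Matching close found: return what lies between the pair.
                return text[open_idx + 1:i]
    raise ValueError("unbalanced parentheses")

print(extract_call_args('logs(a(bc)d)e'))                  # a(bc)d
print(extract_call_args('logs(a(b(c)d)e(f)g)h(ij(k)l)m'))  # a(b(c)d)e(f)g
print(extract_call_args('logs(a)b)c'))                     # a
```

As with the pyparsing version, a stray closing parenthesis such as in 'logs(a)b)c' still yields 'a', so malformed input is only detected when a closing parenthesis is missing.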
If that input string is always in exactly the same format, then you could use the fact that the closing bracket for logs is followed by a .:
original = '''logs(env:production service:FourDS3.Expirer #Properties.NewStatus:(ChallengeAbandoned OR Expired)#Properties.Source:Session).index(processing).rollup(count).by(#Properties.AcsInfo.Host).last(15m) > 60'''
extracted = original.split('logs(')[1].split(').')[0]
print(extracted)
Which gives you this, without the need for regex:
'env:production service:FourDS3.Expirer #Properties.NewStatus:(ChallengeAbandoned OR Expired)#Properties.Source:Session'
You can achieve the result via regex like this:
import re
text = "logs(env:production service:FourDS3.Expirer #Properties.NewStatus:(ChallengeAbandoned OR Expired) #Properties.Source:Session).index(processing).rollup(count).by(#Properties.AcsInfo.Host).last(15m) > 60"
# The dot before "index" is escaped so it only matches a literal "."
pattern = r'logs\((?P<log>.*)\)\.index'
print(re.search(pattern, text).group('log'))
# which prints:
# env:production service:FourDS3.Expirer #Properties.NewStatus:(ChallengeAbandoned OR Expired) #Properties.Source:Session
The (?P<log>...) syntax defines a named group, which you access by calling group() with the name given inside the angle brackets.
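To illustrate how a named group behaves, here is a small sketch on a shortened, made-up input string. It shows that the same group can be read by name, by position, or through groupdict().

```python
import re

# Shortened stand-in for the real query string (an assumption for illustration).
text = "logs(a(bc)d).index(processing).rollup(count)"
match = re.search(r"logs\((?P<log>.*)\)\.index", text)
if match:
    print(match.group("log"))   # access by name
    print(match.group(1))       # the same group, by position
    print(match.groupdict())    # all named groups as a dict
```

Note that this relies on the literal ").index" following the closing parenthesis, so it only works while the input keeps that shape.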
I am trying to use the Google Translate API as below. The translation seems fine except for apostrophe characters, which are returned as &#39; instead.
Is it possible to fix this? I could of course post-process the output, but I don't know whether other special characters have the same problem.
This is how I perform translation right now:
import pandas as pd
import six
from google.cloud import translate
# Instantiates a client
#translate_client = translate.Client()
"""Translates text into the target language.
Target must be an ISO 639-1 language code.
See https://g.co/cloud/translate/v2/translate-reference#supported_languages
"""
translate_client_en_de = translate.Client(target_language="de")
translate_client_de_en = translate.Client(target_language="en")
target1="de"
target2="en"
#if isinstance(text, six.binary_type):
# text = text.decode('utf-8')
fname ='fname.tsv'
df = pd.read_table(fname,sep='\t')
for i, row in df.iterrows():
    text = row['Text']
    de1 = translate_client_en_de.translate(
        text, target_language=target1)
    text2 = de1['translatedText']
    en2 = translate_client_de_en.translate(
        text2, target_language=target2)
    text3 = en2['translatedText']
    print(text)
    print(text2)
    print(text3)
    print('----------')
    break
Sample output:
Simon's advice after he wouldn't
Simon's advice after
I solved it as follows:
Problem:
The problem is that you need to specify that you are using plain text and not HTML text.
See the documentation here: https://googleapis.dev/python/translation/latest/client.html. Look for the translate method and its format_ parameter.
Solution:
Just add the parameter format_='text'. In my case I wrote it like this:
result = translate_client.translate(text, target_language=target, format_='text')
and it works well; the API now returns the apostrophe correctly:
Before I got: 'Hello, we haven&#39;t seen each other in a long time'.
Now I get: 'Hello, we haven't seen each other in a long time'
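If for some reason you have to keep the HTML format, post-processing is still an option, and you do not need to list the special characters yourself. As a sketch, the standard library's html.unescape decodes numeric and named entities alike. The wrapper name decode_entities is made up for this example.

```python
import html

def decode_entities(translated_text):
    """Turn HTML entities such as &#39; or &amp; back into plain characters."""
    return html.unescape(translated_text)

print(decode_entities("Hello, we haven&#39;t seen each other in a long time"))
# prints Hello, we haven't seen each other in a long time
```

This only undoes entity encoding; it does not remove any HTML tags that may be present in the text.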
In this snippet of code I am trying to obtain the links to images posted in a groupchat by a certain user:
import groupy
from groupy import Bot, Group, Member
prog_group = Group.list().first
prog_members = prog_group.members()
prog_messages = prog_group.messages()
rojer = str(prog_members[4])
rojer_messages = ['none']
rojer_pics = []
links = open('rojer_pics.txt', 'w')
print(prog_group)
for message in prog_messages:
    if message.name == rojer:
        rojer_messages.append(message)
        if message.attachments:
            links.write(str(message) + '\n')
links.close()
The issue is that in the links file it prints the entire message: ("Rojer Doewns: Heres a special one +https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12')>"
What I want to do is get rid of the characters that aren't part of the URL, so that it is written like so:
"https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12"
Are there any methods in Python that can manipulate a string like this?
I just used str.split() and split the message on the single-quote character, which leaves the URL in the second piece:
for message in prog_messages:
    if message.name == rojer:
        rojer_messages.append(message)
        if message.attachments:
            link = str(message).split("'")
            rojer_pics.append(link[1])
            links.write(str(link[1]) + '\n')
This can be done using string indices and the string method .find():
>>> url = "(\"Rojer Doewns: Heres a special one +https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12')"
>>> url = url[url.find('+')+1:-2]
>>> url
'https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12'
>>>
>>> string = '("Rojer Doewns: Heres a special one +https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12\')>"'
>>> string.split('+')[1][:-4]
'https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12'
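The slicing approaches above depend on the exact number of trailing characters. As an alternative sketch, a regular expression can pick out the URL itself, assuming the URL never contains whitespace, quotes, parentheses, or angle brackets. The names URL_RE and first_url are made up for this example.

```python
import re

# Assumption: a URL runs until the first whitespace, quote, parenthesis, or angle bracket.
URL_RE = re.compile(r"https?://[^\s'\"()<>]+")

def first_url(text):
    """Return the first http(s) URL found in text, or None."""
    match = URL_RE.search(text)
    return match.group(0) if match else None

raw = "(\"Rojer Doewns: Heres a special one +https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12')>\""
print(first_url(raw))
# prints https://i.groupme.com/406x1199.png.7679b4f1ee964656bde93448ff9cee12
```

Because it does not count characters, this keeps working if the text before or after the link changes.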
I'm trying to make a simple script which tells me when a Twitter account has a new tweet.
import urllib
def CurrentP(array, string):
    count = 0
    for a_ in array:
        if a_ == string:
            return count
        count = count + 1
twitters = ["troyhunt", "codinghorror"]
last = []
site = "http://twitter.com/"
for twitter in twitters:
    source = site+twitter
    for line in urllib.urlopen(source):
        if line.find(twitter+"/status") != -1:
            id = line.split('/')[3]
            if id != last[CurrentP(twitters,twitter)]:
                print "[+] New tweet + " + twitter
                last[CurrentP(twitters,twitter)] = id
But I get this error when I try to run the script:
File "twitter.py", line 16 in ?
for line in urllib.urlopen(source):
TypeError: iteration over non-sequence
What did I do wrong?
Web scraping is not the most economical way to retrieve this data. Twitter provides its own API, which returns data as JSON that is easy to parse for the relevant information. There are also many Python libraries that do this for you, such as Tweepy, which make the data extraction as simple as this example.
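However the tweets are fetched, the bookkeeping in the question (an empty list indexed by position) can be simplified with a dictionary keyed by account name. The sketch below deliberately leaves out the network part: fetch_latest_id is a hypothetical callable that you would supply yourself, for example a thin wrapper around whichever API client you use, and the IDs are made-up stand-ins.

```python
def check_new_tweets(accounts, last_seen, fetch_latest_id):
    """Return accounts whose latest tweet ID changed, updating last_seen in place."""
    changed = []
    for account in accounts:
        latest = fetch_latest_id(account)  # hypothetical callable, supplied by the caller
        if latest is not None and last_seen.get(account) != latest:
            changed.append(account)
            last_seen[account] = latest
    return changed

# Stand-in data instead of a real network call.
fake_ids = {"troyhunt": "101", "codinghorror": "202"}
last_seen = {}
print(check_new_tweets(["troyhunt", "codinghorror"], last_seen, fake_ids.get))
fake_ids["troyhunt"] = "102"
print(check_new_tweets(["troyhunt", "codinghorror"], last_seen, fake_ids.get))
```

On the first call every account counts as new, because nothing has been seen yet; on later calls only accounts whose latest ID changed are reported.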
I am looking for all the feature values that a YouTube URL can have.
http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=related
So far I have seen feature=relmfu, related, fvst, and fvwrel. Is there a list of these somewhere? Also, my ultimate aim is to extract the video ID (6FWUjJF1ai0) from all possible YouTube URLs. How can I do that? It seems difficult. Has anyone already done this?
You can use urlparse to get the query string from your url, then you can use parse_qs to get the video id from the query string.
I wrote the code to help you out; credit for the solution goes entirely to Frank, though.
# In Python 3, urlparse and parse_qs live in urllib.parse
import urllib.parse as ups
m = ups.urlparse('http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=related')
print(ups.parse_qs(m.query)['v'])
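Wrapped up as a function, the same approach might look like the sketch below. It only covers watch-style URLs that carry the ID in the v query parameter; the function name video_id_from_watch_url is made up for this example.

```python
from urllib.parse import urlparse, parse_qs

def video_id_from_watch_url(url):
    """Return the value of the v query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")  # parse_qs maps each key to a list of values
    return values[0] if values else None

print(video_id_from_watch_url("http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=related"))
# prints 6FWUjJF1ai0
```

Since parse_qs ignores parameters you do not ask for, the feature value never needs to be enumerated.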
I ran the regex from the following answer, https://stackoverflow.com/a/43490746/8534966, against 55 different test cases, and it matched 51 of them. See my tests.
So I wrote some if/else code to handle the rest:
import re
# Get YouTube video ID
if "watch%3Fv%3D" in youtube_url:
    # e.g.: https://www.youtube.com/attribution_link?a=8g8kPrPIi-ecwIsS&u=/watch%3Fv%3DyZv2daTWRZU%26feature%3Dem-uploademail
    search_pattern = re.search("watch%3Fv%3D(.*?)%", youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
elif "watch?v%3D" in youtube_url:
    # e.g.: http://www.youtube.com/attribution_link?a=JdfC0C9V6ZI&u=%2Fwatch%3Fv%3DEhxJLojIE_o%26feature%3Dshare
    search_pattern = re.search("v%3D(.*?)&format", youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
elif "/e/" in youtube_url:
    # e.g.: http://www.youtube.com/e/dQw4w9WgXcQ
    youtube_url += " "
    search_pattern = re.search("/e/(.*?) ", youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
else:
    # All else. Raw string so the backslashes reach the regex engine unchanged.
    search_pattern = re.search(r"(?:[?&]vi?=|\/embed\/|\/\d\d?\/|\/vi?\/|https?:\/\/(?:www\.)?youtu\.be\/)([^&\n?#]+)",
                               youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
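For the attribution_link cases specifically, an alternative to matching the percent-encoded text is to let urllib.parse decode it. The sketch below assumes the inner watch URL is always carried in the u query parameter, as in the two example URLs in the comments above; the function name video_id_from_attribution_link is made up for this example.

```python
from urllib.parse import urlparse, parse_qs

def video_id_from_attribution_link(url):
    """Decode the inner URL held in the u parameter and read its v parameter."""
    outer = parse_qs(urlparse(url).query)   # parse_qs percent-decodes the values
    inner_values = outer.get("u")
    if not inner_values:
        return None
    inner = parse_qs(urlparse(inner_values[0]).query)
    values = inner.get("v")
    return values[0] if values else None

print(video_id_from_attribution_link(
    "https://www.youtube.com/attribution_link?a=8g8kPrPIi-ecwIsS&u=/watch%3Fv%3DyZv2daTWRZU%26feature%3Dem-uploademail"))
# prints yZv2daTWRZU
```

This handles both the "/watch" and "%2Fwatch" variants, because the decoding happens before the inner URL is parsed.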
You may want to consider a more comprehensive URL parser, such as the one suggested in this Gist.
It handles more cases than urlparse can.