Sketch Engine API search query: How to get frequency?


Is anyone familiar with the Python Sketch Engine API, and could you tell us how to get the frequency of an n-gram?
So far we have this (example from website):
import requests
base_url = 'https://api.sketchengine.co.uk/bonito/run.cgi'
data = {
    'corpname': 'bnc2',
    'format': 'json',
    'lemma': 'book',
    'lpos': '-v',
    'username': '...',
    'api_key': '...'
    # get it here: https://the.sketchengine.co.uk/auth/api_access/
}
d = requests.get(base_url + '/wsketch', params=data).json()
print("frequency=", d['freq'])
This gives us the frequency of a lemma, but not an n-gram.

The /wsketch endpoint only takes a single lemma as input. To work with n-grams, use a different endpoint, for example /view.
import requests
base_url = 'https://api.sketchengine.co.uk/bonito/run.cgi'
data = {
    'corpname': 'bnc2',
    'format': 'json',
    'q': 'q[lemma="read"][lemma="book"]',
    'username': '...',
    'api_key': '...'
    # get it here: https://the.sketchengine.co.uk/auth/api_access/
}
d = requests.get(base_url + '/view', params=data).json()
print("frequency=", d['relsize'])
Here 'relsize' refers to frequency per million.
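The query string passed as 'q' above can also be assembled from a list of lemmas, which helps when you need to look up many n-grams. This is a minimal sketch, assuming every token is matched by lemma exactly as in the example; build_cql is a hypothetical helper, not part of the Sketch Engine API:

```python
def build_cql(lemmas):
    """Build the 'q' parameter for an n-gram, one [lemma="..."] token per word.

    Hypothetical helper: it only reproduces the query format used in the
    example above and does not escape special characters in the lemmas.
    """
    tokens = ''.join('[lemma="{}"]'.format(lemma) for lemma in lemmas)
    return 'q' + tokens


print(build_cql(['read', 'book']))  # q[lemma="read"][lemma="book"]
```

The returned string can be used as the value of 'q' in the data dictionary.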

Related

Gravity form API with python

The API documentation is here, and I am trying to implement this request in Python:
//retrieve entries created on a specific day (use the date_created field)
//this example returns entries created on September 10, 2019
https://localhost/wp-json/gf/v2/entries?search={"field_filters": [{"key":"date_created","value":"09/10/2019","operator":"is"}]}
But when I try it in Python with the following code, I get an error:
import json
import oauthlib
from requests_oauthlib import OAuth1Session
consumer_key = ""
client_secret = ""
session = OAuth1Session(consumer_key,
                        client_secret=client_secret,
                        signature_type=oauthlib.oauth1.SIGNATURE_TYPE_QUERY)
url = 'https://localhost/wp-json/gf/v2/entries?search={"field_filters": [{"key":"date_created","value":"09/01/2023","operator":"is"}]}'
r = session.get(url)
print(r.content)
The error message is:
ValueError: Error trying to decode a non urlencoded string. Found invalid characters: {']', '['} in the string: 'search=%7B%22field_filters%22:%20[%7B%22key%22:%22date_created%22,%22value%22:%2209/01/2023%22,%22operator%22:%22is%22%7D]%7D'. Please ensure the request/response body is x-www-form-urlencoded.
One solution is to parameterize the url:
import requests
import json
url = 'https://localhost/wp-json/gf/v2/entries'
params = {
"search": {"field_filters": [{"key":"date_created","value":"09/01/2023","operator":"is"}]}
}
headers = {'Content-type': 'application/json'}
response = session.get(url, params=params, headers=headers)
print(response.json())
However, the retrieved entries are not filtered by the specified date.
The official documentation gives the date in the format "09/01/2023", but in my dataset the format is "2023-01-10 19:16:59".
Do I have to convert the format? I tried a different format for the date:
date_created = "09/01/2023"
date_created = datetime.strptime(date_created, "%d/%m/%Y").strftime("%Y-%m-%d %H:%M:%S")
What other solutions can I try?

What if you use the urllib.parse.urlencode function? Your code would then look like this:
import json
import oauthlib
from requests_oauthlib import OAuth1Session
import urllib.parse
consumer_key = ""
client_secret = ""
session = OAuth1Session(consumer_key,
                        client_secret=client_secret,
                        signature_type=oauthlib.oauth1.SIGNATURE_TYPE_QUERY)
params = {
"search": {"field_filters": [{"key":"date_created","value":"09/01/2023","operator":"is"}]}
}
encoded_params = urllib.parse.urlencode(params)
url = f'https://localhost/wp-json/gf/v2/entries?{encoded_params}'
r = session.get(url)
print(r.content)
Hope that helps.

I had the same problem and found a solution with this code:
params = {
'search': json.dumps({
'field_filters': [
{ 'key': 'date_created', 'value': '2023-01-01', 'operator': 'is' }
],
'mode': 'all'
})
}
encoded_params = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
url = 'http://localhost/depot_git/wp-json/gf/v2/forms/1/entries?' + encoded_params + '&paging[page_size]=999999999' # number of entries per page, forced manually
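I'm an absolute beginner with Python, so I'm not entirely sure why this works, but I found that the JSON in the URL needs double quotes ( " ) instead of single quotes ( ' ). json.dumps produces them, whereas passing the nested dict straight to urlencode encodes Python's own string representation of the dict, which uses single quotes, so the solution by William Castrillon wasn't enough.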
As for the date format, Gravity Forms seems to understand DD/MM/YYYY. It doesn't need a time either.
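The difference between the two answers can be checked without sending a request. This is a minimal sketch using only the standard library, with the filter values taken from the answer above: json.dumps serializes the filter with double quotes before it is URL-encoded, while passing the nested dict directly makes urlencode encode its Python representation, single quotes included.

```python
import json
import urllib.parse

search = {
    'field_filters': [
        {'key': 'date_created', 'value': '2023-01-01', 'operator': 'is'}
    ],
    'mode': 'all',
}

# Nested dict passed directly: urlencode encodes str(search), single quotes included.
raw = urllib.parse.urlencode({'search': search}, quote_via=urllib.parse.quote)

# Serialized first: the value is valid JSON with double quotes.
as_json = urllib.parse.urlencode({'search': json.dumps(search)},
                                 quote_via=urllib.parse.quote)

print('%27' in raw)      # True: %27 is an encoded single quote
print('%22' in as_json)  # True: %22 is an encoded double quote

# Decoding the query string gives back the original filter.
decoded = urllib.parse.parse_qs(as_json)['search'][0]
print(json.loads(decoded) == search)  # True
```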

Getting Deeper Key Value Pairs from a Dictionary in Python

I'm trying to get particular values out of a large dictionary and I'm having trouble doing so. I'm parsing through data from an API and attempting to get just the name attribute from my response. This is the format of the response I'm getting back:
{'data':
[{'id': '5555', 'type': 'proj-pha', 'links': {'self': '{sensitive_url}'}, 'attributes':
{'active': True, 'name': 'Plan', 'language': 'plan_l', 'position': 1},
'relationships': {'account': {'links': {'self': '{sensitive_url}', 'related':
'{sensitive_url}'}}, 'pro-exp': {'links': {'self':
'{sensitive_url}', 'related': '{sensitive_url}'}}}}
To clarify, I'm printing out the API response as a dictionary using:
print(response.json())
Here is some general code from my script for context:
params = {
"client_id": CLIENT_ID,
"client_secret": CLIENT_SECRET,
"redirect_uri": REDIRECT_URI,
"response_type": RESPONSE_TYPE
}
token = '{sensitive}'
print("Bearer Token: " + token)
session = requests.session()
session.headers = {"authorization": f"Bearer {token}"}
base_api_endpoint = "{sensitive}"
response = session.get(base_api_endpoint)
print(response.json())
What I want is just the 'name': 'Plan' attribute and that's all. The data repeats in the same shape for the next "id" and so on until all the items have been listed, and I'd like to end up with a list of all the names. I'm not specifically asking how to loop through the items to get all of them, though that would be helpful; I'm more focused on being able to pick out the "name" value by itself.
Thanks!

To get all the names, use a list comprehension on the parsed response (here data = response.json()):
[item['attributes']['name'] for item in data['data']]
If you only want the name of the i-th item:
data['data'][i]['attributes']['name']
And finally, if you want the name for a specific id:
def name_by_id(data, item_id):
    for item in data['data']:
        if item['id'] == item_id:
            return item['attributes']['name']
    return None
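The lookups above can be tried on a small stand-in for the API response. This is a minimal sketch: the sample dictionary is hypothetical (made-up values shaped like the response in the question), and names_by_id is an illustrative variable name.

```python
# Hypothetical sample shaped like the response in the question (values made up).
data = {
    'data': [
        {'id': '5555', 'type': 'proj-pha',
         'attributes': {'active': True, 'name': 'Plan', 'position': 1}},
        {'id': '5556', 'type': 'proj-pha',
         'attributes': {'active': True, 'name': 'Build', 'position': 2}},
    ]
}

names = [item['attributes']['name'] for item in data['data']]
print(names)  # ['Plan', 'Build']

# .get() avoids a KeyError when an item lacks 'attributes' or 'name'.
safe_names = [item.get('attributes', {}).get('name') for item in data['data']]
print(safe_names == names)  # True

# Build an id -> name lookup once instead of scanning the list for every query.
names_by_id = {item['id']: item['attributes']['name'] for item in data['data']}
print(names_by_id.get('5555'))  # Plan
```

The dictionary lookup is worth it when many ids have to be resolved; for a single lookup the loop is just as good.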

How can we reach the information with the opensubtitles API?

I'm trying to get the download link of the first 'srt' zip file; I don't need information about more than one file. My code worked when I tried a lesser-known movie such as Shame (2011), but it doesn't work for Avatar. I think the code tries to fetch information about a lot of 'srt' files, and the API then blocks the request.
How can I get the download link of the first English srt file?
from xmlrpc.client import ServerProxy
from pprint import pprint
imdb='tt0499549'#-->Avatar
#'tt1723811'-->Shame 2011
server = ServerProxy("http://api.opensubtitles.org/xml-rpc")
token = server.LogIn('yourusername', 'yourpassword', 'eng', 'TemporaryUserAgent')['token']
response = server.SearchSubtitles(token, [{'sublanguageid': 'eng', 'query':imdb }])#'moviehash':"0"
pprint(response)
You only have five attempts with TemporaryUserAgent.

Check out OpenSubtitles' new API - here's the documentation. It's much easier to use than the older API.
Grabbing subtitles is as easy as:
headers = {
'Api-Key': api_key,
}
params = (
('imdb_id', movie_id),
)
response = requests.get('https://www.opensubtitles.com/api/v1/subtitles', headers=headers, params=params)
Where api_key is your api_key from their website, and movie_id is the movie's IMDB id (e.g., Titanic's ID is 0120338, and can be found within the URL of its movie page on IMDb - https://www.imdb.com/title/tt0120338/)
An example of the response returned looks like this:
{'id': '5164746',
'type': 'subtitle',
'attributes': {'subtitle_id': '5164746',
'language': 'en',
'download_count': 9608,
'new_download_count': 46,
'hearing_impaired': False,
'hd': True,
'format': None,
'fps': 23.976,
'votes': 0,
'points': 0,
'ratings': 0.0,
'from_trusted': False,
'foreign_parts_only': False,
'auto_translation': False,
'ai_translated': False,
'machine_translated': None,
'upload_date': '2020-02-09T13:59:42Z',
'release': '2160p.4K.BluRay.x265.10bit.AAC5.1-[YTS.MX]',
'comments': "Slightly resynced the 1080p.WEBRip.x264-[YTS.LT] version by explosiveskull to this 4K release. HI removed. I didn't do 4K sync for Infinity War, as they're already on site here:\r\nHi: https://www.opensubtitles.org/en/subtitles/7436082/avengers-infinity-war-en\r\nNo HI: https://www.opensubtitles.org/en/subtitles/7436058/avengers-infinity-war-en",
'legacy_subtitle_id': 8092829,
'uploader': {'uploader_id': 66694,
'name': 'pooond',
'rank': 'bronze member'},
'feature_details': {'feature_id': 626618,
'feature_type': 'Movie',
'year': 2019,
'title': 'Avengers: Endgame',
'movie_name': '2019 - Avengers: Endgame',
'imdb_id': 4154796,
'tmdb_id': 299534},
'url': 'https://www.opensubtitles.com/en/subtitles/legacy/8092829',
'related_links': {'label': 'All subtitles for Avengers: Endgame',
'url': 'https://www.opensubtitles.com/en/movies/2019-untitled-avengers-movie',
'img_url': 'https://s9.osdb.link/features/8/1/6/626618.jpg'},
'files': [{'file_id': 5274788,
'cd_number': 1,
'file_name': 'Avengers.Endgame.2019.2160p.4K.BluRay.x265.10bit.AAC5.1-[YTS.MX].srt'}]}}
To download a file you would take the 'file_id' and input it into a download request to the Open Subtitle API like this:
headers = {
'Api-Key': api_key,
'Authorization': auth,
'Content-Type': 'application/json',
}
data = '{"file_id":5274788}'
response = requests.post('https://www.opensubtitles.com/api/v1/download', headers=headers, data=data)
Where auth is the authorization key you get from their API (/api/v1/login endpoint):
headers = {
'Api-Key': api_key,
'Content-Type': 'application/json',
}
data = '{"username":"__USERNAME","password":"__PASSWORD"}'
response = requests.post('https://www.opensubtitles.com/api/v1/login', headers=headers, data=data)
Here __USERNAME and __PASSWORD are your account's username and password.
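To pick only the first English file, the search result can be filtered before making the download request. This is a minimal sketch, assuming the search response holds its entries in a 'data' list and that each entry is shaped like the example above ('attributes' with 'language' and 'files'); first_file_id and the sample data are hypothetical, and no request is sent:

```python
def first_file_id(search_result, language='en'):
    """Return the first file_id for the given language, or None if absent."""
    for subtitle in search_result.get('data', []):
        attributes = subtitle.get('attributes', {})
        if attributes.get('language') != language:
            continue
        files = attributes.get('files', [])
        if files:
            return files[0]['file_id']
    return None


# Hypothetical, trimmed-down search result for illustration.
sample = {
    'data': [
        {'attributes': {'language': 'fr', 'files': [{'file_id': 111}]}},
        {'attributes': {'language': 'en', 'files': []}},
        {'attributes': {'language': 'en', 'files': [{'file_id': 5274788}]}},
    ]
}
print(first_file_id(sample))  # 5274788
```

The returned value is what goes into the "file_id" field of the download request.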

Here is a complete solution:
import requests
import json
from pprint import pprint
url = "https://www.opensubtitles.com/api/v1/login"
headers = {'api-key':'YOUR API KEY', 'content-type': 'application/json'}
user = {'username': 'YOUR USERNAME', 'password': "YOUR USER PASSWORD"}
try:
    login_response = requests.post(url, data=json.dumps(user), headers=headers)
    login_response.raise_for_status()
    login_json_response = login_response.json()
    login_token = login_json_response['token']
except:
    print("Something wrong check again...")
imdb_id="tt0499549"
headers = {
'Api-Key': 'YOUR API KEY',
}
params = (
('imdb_id', imdb_id),
)
query_response = requests.get('https://www.opensubtitles.com/api/v1/subtitles?', params=params, headers=headers)
query_json_response = query_response.json()
print("Report:",query_response)
#pprint(query_json_response)# All data here...
query_file_name = query_json_response['data'][0]['attributes']['files'][0]['file_name']
query_file_no = query_json_response['data'][0]['attributes']['files'][0]['file_id']
movie_img = query_json_response['data'][0]['attributes']['related_links']['img_url']
print ("Movie Image url:",movie_img)
print("File Number:",query_file_no)
print("Subtitle File Name:",query_file_name)
download_url = "https://www.opensubtitles.com/api/v1/download"
download_headers = {'api-key': 'YOUR API KEY',
'authorization':login_token,
'content-type': 'application/json'}
download_file_id = {'file_id': query_file_no}
download_response = requests.post(download_url, data=json.dumps(download_file_id), headers=download_headers)
download_json_response = download_response.json()
print("Report:",download_response)
print(download_json_response)
link=download_json_response['link']
saved_file_name = "subtitle.srt"
r = requests.get(link)
with open(saved_file_name, 'wb') as f:
    f.write(r.content)

How not to hardcode the value of some correlation_id within headers to get required response?

I'm trying to grab different product names from this webpage. The product names, as in 0041-5053-005 generate dynamically. I can however scrape them using xhr with appropriate parameters.
The following keys and values must be present in the headers to get the required data.
headers = {
'client_secret': '',
'client_id': '',
'correlation_id': '0196e1f2-fb29-0modod-6125-fcbb6c2c69c1',
}
This is how I scraped the titles:
import requests
link = "https://es-be-ux-search.cloudhub.io/api/ux/v2/search?"
payload = {
'queryText': '*:*',
'role': 'rockwell-search',
'spellingCorrect': 'true',
'spellcheckPremium': '10',
'segments': 'Productsv4',
'startIndex': 0,
'numResults': 10,
'facets': '',
'languages': 'en',
'locales': 'en_GLOBAL,en-US',
'sort': 'cat_a',
'collections': 'Literature,Web,Sample_Code',
'site': 'RA'
}
with requests.Session() as s:
    r = s.get(link, params=payload, headers=headers)
    for item in r.json()['response']['docs']:
        print(item['catalogNumber'])
I've noticed that the values of client_secret and client_id are static, but the value of correlation_id changes.
How can I use the value of correlation_id within the headers without hardcoding?

The correlation ID is used to correlate HTTP requests between a client and server. See this article for details on how that works. It seems as though this API requires the correlation ID to be present in the HTTP headers, but doesn't change the response based on its value. The response is the same if you give an empty string:
headers = {
'client_secret': '...',
'client_id': '...',
'correlation_id': '',
}
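If you would rather not send an empty value, a fresh identifier can be generated for every request. This is a minimal sketch, assuming the server accepts any UUID-formatted string as correlation_id (not verified against this API); make_headers is a hypothetical helper:

```python
import uuid


def make_headers(client_id, client_secret):
    """Return request headers with a newly generated correlation_id."""
    return {
        'client_secret': client_secret,
        'client_id': client_id,
        # Assumption: any random UUID is acceptable to the server.
        'correlation_id': str(uuid.uuid4()),
    }


headers = make_headers('...', '...')
print(headers['correlation_id'])
```

Each call produces a different correlation_id, so nothing has to be hardcoded.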

Extracting Splunk data from Python script

I am trying to retrieve data from Splunk through Python, but I get a syntax error, whereas the curl command gives me output.
import requests
baseurl = 'https://*****/services/search/jobs/export'
headers = {
"Content-Type": "application/json",
}
data = {
'username': '****',
'password': '*******',
"search": "search index=sso-fed-prod source="/app/pingfederate-9.3.2/pingfederate/log/splunk-audit.log" event=SSO OR AUTHN_ATTEMPT OR OAuth connectionid status=success",
}
r = requests.get(baseurl, data=json.dumps(data), headers=headers)
print(r.json())
Output:
edwops#ip-10-94-202-253:/app/edwops/scripts/python > python splunk_extract.py
File "splunk_extract.py", line 12
"search": "search index=sso-fed-prod source="/app/pingfederate-9.3.2/pingfederate/log/splunk-audit.log" event=SSO OR AUTHN_ATTEMPT OR OAuth connectionid status=success",
^
SyntaxError: invalid syntax

The problem is in the construction of the data dict. Try:
data = {
'username': '****',
'password': '*******',
"search": 'search index=sso-fed-prod source="/app/pingfederate-9.3.2/pingfederate/log/splunk-audit.log" event=SSO OR AUTHN_ATTEMPT OR OAuth connectionid status=success',
}
Explanation: when you set the value for the search key, the quotation mark " closes the string here:
search index=sso-fed-prod source="
so everything after it is parsed by the interpreter as code rather than as part of the string, as intended. Using single quotation marks ' around the whole search value fixes this.
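There is more than one way to put double quotes inside a Python string. This is a minimal sketch of three equivalent options, using a shortened version of the search string from the question:

```python
path = '/app/pingfederate-9.3.2/pingfederate/log/splunk-audit.log'

# Option 1: single quotes outside, double quotes inside.
single = 'search index=sso-fed-prod source="/app/pingfederate-9.3.2/pingfederate/log/splunk-audit.log"'

# Option 2: keep double quotes outside and escape the inner ones.
escaped = "search index=sso-fed-prod source=\"/app/pingfederate-9.3.2/pingfederate/log/splunk-audit.log\""

# Option 3: build the string with an f-string.
formatted = f'search index=sso-fed-prod source="{path}"'

print(single == escaped == formatted)  # True
```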
