Send a string of OCR text to resAPI - python

I am trying to work with RestfulAPI's on python.
After OCR a pdf, I want to send the text to an restfulAPI to get back retrieve specific words along with their position within the text. I have not manage to send the string of text to the API yet.
Code follows:
import requests
import PyPDF2
import json
url = "http://xxapi.xxapi.org/xxx.util.json"
pdfFileObj = open('/Users/xxx/pdftoOCR.pdf','rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
pageObj = pdfReader.getPage(1) # To try with the text found in the first page
data = {"text": pageObj.extractText()}
data_json = json.dumps(data)
params = {'text':'string'}
r = requests.post(url, data=data_json, params=params)
r1 = json.loads(r.text)
Although I get a response 200 from the request, The data should come in Json format with the need to poll some token URL (Which I don`t know how to do it either) Also I don't think the request is correct as when I paste the token url to the browser I see an empty Json file (No words, no position) even if I know the piece of text I'm trying to send contains the desired words.
Thanks in advance! I work with OS X , python 3.5

Well, many thanks to #Jose.Cordova.Alvear for resolving this issue
import json
import requests
pdf= open('test.pdf','rb')
url = "http://xxapi.xxapi.org/xxx.util.json"
payload = {
'file' :pdf
}
response = requests.post(url, files=payload)
print response.json()

Related

Can't find a way to get Twitch API information through requests.py

I am stuck on this API thing. I want to print the incoming json file (channel points in specific) but it just prints the whole page in html format. Here is my code:
import requests
import json
client_id = secret
oauth_token = secret
my_uri = 'https://localhost'
header = {"Authorization": f"Bearer {oauth_token}"}
url = f'https://id.twitch.tv/oauth2/authorize?client_id={client_id}&redirect_uri={my_uri}&response_type=id_token&scope=channel:read:redemptions+openid&state=c3ab8aa609ea11e793ae92361f002671&claims={"id_token":{"email_verified":null}}'
response = requests.get(url, headers=header)
print(response.text)
My hypothesis is that either the url or the header is the problem. The twitch API is made for c# or js originally but I don't know how to convert that information to python.
I would also like to know how to do the "PING" and "PONG" thing that Twitch is writing about in the API
response.text show the html text
response.json show the json
tell me if it work now
response.json() will return you the data in JSON format

How to use Python requests to fill out a date option as well as download from a radiobutton

I'm trying to make a python script that will scrape data from this website
https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315
and download the CSV from yesterday. As you can see, it's got two menu options for dates, a radio button for CSV and then a submit button.
I thought perhaps I could use the requests library? Not looking for someone to do it for me, but if anyone could point me in the right direction that would be great!
I know this is too simple but here is what I have so far:
import requests
print('Download Starting...')
url = 'https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315'
r = requests.get(url)
filename = url.split('/')[-1] # this will take only -1 splitted part of the url
with open(filename,'wb') as output_file:
output_file.write(r.content)
print('done')
You need first to use requests.Session() in order to store cookies and re-send them in subsequent requests. The process is the following :
get the original URL first to get the cookies (session id)
make a request on POST /reports/ci_report/server/request.php with some parameters including date and output format. The result is a json with an id like this :
{'jrId': 'jr_13879611'}
make a request on GET /reports/ci_report/server/streamReport.php?jrId=jr_13879611 which gives the csv data
There is a parameter in the POST request where we need the menuitem query param value from your original url, so we parse the query params to get it using urlparse :
import requests
import time
import urllib.parse as urlparse
from urllib.parse import parse_qs
from datetime import datetime,timedelta
yesterday = datetime.now() - timedelta(1)
yesterday_date = f'{yesterday.strftime("%d")}-{yesterday.strftime("%B")[:3]}-{yesterday.strftime("%Y")}'
original_url = "https://noms.wei-pipeline.com/reports/ci_report/launch.php?menuitem=2600315"
parsed = urlparse.urlparse(original_url)
target_url = "https://noms.wei-pipeline.com/reports/ci_report/server/request.php"
stream_report_url = "https://noms.wei-pipeline.com/reports/ci_report/server/streamReport.php"
s = requests.Session()
# load the cookies
s.get(original_url)
#get id
r = s.post(target_url,
params = {
"request.preventCache": int(round(time.time() * 1000))
},
data = {
"ReportProc": "CIPR_DAILY_BULLETIN",
"p_ci_id": parse_qs(parsed.query)['menuitem'][0],
"p_opun": "PL",
"p_gas_day_from": yesterday_date,
"p_gas_day_to": yesterday_date,
"p_output_option": "CSV"
})
r = s.get(stream_report_url, params = r.json())
print(r.text)
Try this on repl.it

PUT method using python3 and urbllib - headers

So I am trying to just receive the data from this json. I to use POST, GET on any link but the link I am currently trying to read. It needs [PUT]. So I wanted to know if I was calling this url correctly via urllib or am I missing something?
Request
{"DataType":"Word","Params":["1234"], "ID":"22"}
Response {
JSON DATA IN HERE
}
I feel like I am doing the PUT method call wrong since it is wrapped around Request{}.
import urllib.request, json
from pprint import pprint
header = {"DataType":"Word","Params":"[1234"]", "ID":"22"}
req = urllib.request.Request(url = "website/api/json.service", headers =
heaer, method = 'PUT')
with urllib.request.urlopen(req) as url:
data = json.loads(url.read(), decode())
pprint(data)
I am able to print json data as long as its anything but PUT. As soon as I get a site with put on it with the following JSON template I get an Internal Error 500. So I assumed it was was my header.
Thank you in advanced!

urllib request for json does not match the json in browser

This is my code thus far.
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = urllib.urlopen(url)
print response
data = json.load(response)
print data
The problem is that when I look at the json in the browser it is long and contains more features than I see when printing it.
To be more exact, I'm looking for the 'points' part which should be
data['points']['points']
however
data['points']
has only 2 attributes and doesn't contain the second 'points' that I do see in the url in the browser.
Could it be that I can only load 1 "layer" deep and not 2?
You need to add a user-agent to your request.
Using requests (which urllib documentation recommends over directly using urllib), you can do:
import requests
url = 'https://www.endomondo.com/rest/v1/users/3014732/workouts/357031682'
response = requests.get(url, headers={'user-agent': 'Mozilla 5.0'})
print(response.json())
# long output....

Using requests function in python to submit data to a website and call back a response

I am trying to use the requests function in python to post the text content of a text file to a website, submit the text for analysis on said website, and pull the results back in to python. I have read through a number of responses here and on other websites, but have not yet figured out how to correctly modify the code to a new website.
I'm familiar with beautiful soup so pulling in webpage content and removing HTML isn't an issue, its the submitting the data that I don't understand.
My code currently is:
import requests
fileName = "texttoAnalyze.txt"
fileHandle = open(fileName, 'rU');
url_text = fileHandle.read()
url = "http://www.webpagefx.com/tools/read-able/"
payload = {'value':url_text}
r = requests.post(url, payload)
print r.text
This code comes back with the html of the website, but hasn't recognized the fact that I'm trying to a submit a form.
Any help is appreciated. Thanks so much.
You need to send the same request the website is sending, usually you can get these with web debugging tools (like chrome/firefox developer tools).
In this case the url the request is being sent to is: http://www.webpagefx.com/tools/read-able/check.php
With the following params: tab=Test+by+Direct+Link&directInput=SOME_RANDOM_TEXT
So your code should look like this:
url = "http://www.webpagefx.com/tools/read-able/check.php"
payload = {'directInput':url_text, 'tab': 'Test by Direct Link'}
r = requests.post(url, data=payload)
print r.text
Good luck!
There are two post parameters, tab and directInput:
import requests
post = "http://www.webpagefx.com/tools/read-able/check.php"
with open("in.txt") as f:
data = {"tab":"Test by Direct Link",
"directInput":f.read()}
r = requests.post(post, data=data)
print(r.content)

Categories