I tried to get a URL as input for my Python program, but it isn't working:
import request
get_url=raw_input(" ")
Page = get.request('get_url')
But
Page = get.request('www.armagsolutions.com')
is working.
Can someone help me with an example of how to read a URL as input?
In this line
Page = get.request('get_url')
you pass the string literal 'get_url' as the argument instead of the variable get_url. Use this instead:
Page = get.request(get_url)
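Putting it together, a minimal sketch (assuming the intended library is requests, whose actual call is requests.get, and Python 3, where raw_input has become input):

import requests

get_url = input("Enter a URL: ")   # e.g. http://www.armagsolutions.com
page = requests.get(get_url)       # pass the variable, not the string 'get_url'
print(page.status_code)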
(This is my first post on Stack Overflow.)
I'm trying to download the source code of the page "https://nyaa.crystalyx.net/search?q=Kuzu+no+Honkai" using urllib3 on Python 3.7.1.
I created the following function to save the source code to a file:
import os
import urllib3

def get_source_code(url: str):
    print(url, len(url))
    os.system("pause")
    http = urllib3.PoolManager()
    r = http.request('GET', url)
    content = str(r.data)
    #print(content)
    # Saves the source code in a file
    source_code = open("source_code.txt", "w+")
    for letter in content:
        source_code.write(letter)
    source_code.close()
    # Reads the source code back as a list of elements split on "\n", then deletes the initial file
    source_code = open("source_code.txt", "r+")
    content = (source_code.readline()).split("\\n")
    source_code.close()
    #os.system("pause")
    os.remove("source_code.txt")
    # Creates a new file containing the source code correctly displayed
    source_code = open("source_code.txt", "w+")
    for element in content:
        source_code.write(element + '\n')
    source_code.close()
Everything works well when I call my function like this:
get_source_code("https://nyaa.crystalyx.net/search?q=Kuzu+no+Honkai")
(you can check the output here https://pastebin.com/SBumCH3b)
So I tried calling my function in a more user-friendly way by using input():
to_download = str(input("Enter the name of the anime you wanna download: "))
to_download = to_download.replace(" ","+")
to_download = str("https://nyaa.crystalyx.net/search?q=") + str(to_download)
get_source_code(to_download)
This ends up giving me a very different and incomplete source code in my file
(you can check the output here https://pastebin.com/bq0dqeZw)
I've already checked that the two strings given to get_source_code() are the same and have the same length.
If anyone can help me it'd be cool.
Thanks.
The search term is spelled wrong in your second query, hence the difference. In the first output you pasted, the query is
required type="search" value="Kuzu no Honkai">
In the second query, where you take an input, it is spelled slightly differently:
required type="search" value="Kozu no Honkai">
Notice how it says Kozu instead of Kuzu. It looks like you misspelled it when entering the input.
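If you want to be safer about what goes into the query string in the first place, here is a minimal sketch (assuming Python 3's urllib.parse alongside urllib3) that builds the URL from the input and writes the decoded response in one pass:

import urllib3
from urllib.parse import quote_plus

to_download = input("Enter the name of the anime you wanna download: ")
# quote_plus turns spaces into '+' and escapes any other special characters
url = "https://nyaa.crystalyx.net/search?q=" + quote_plus(to_download)

http = urllib3.PoolManager()
r = http.request('GET', url)

# decode the bytes instead of wrapping them in str(), so '\n' stays a real newline
with open("source_code.txt", "w", encoding="utf-8") as source_code:
    source_code.write(r.data.decode("utf-8"))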
I am a beginner in web scraping and have become very interested in the process.
I set myself a project that can keep me motivated until I complete it.
My Project
My aim is to write a Python program that goes to my university results page (which happens to be an "xx.asp" page) and enters my
EXAM NO
COURSE
SEMESTER
and submits them to the website.
Clicking the submit button leads to another "yy.asp" page on which my results are displayed. But I am having a lot of trouble doing this.
Some Sample Data to try it out
The Results Website: http://result.pondiuni.edu.in/candidate.asp
Register Number: 15te1218
Degree: BTHEE
Exam: Second
Could anyone give me directions on how to accomplish this task?
I have written a sample program that I am not really proud of, and it does not work as I wanted. The following is the code I wrote. I am a beginner, so I'm sorry if I did something terribly wrong. Please correct me; it would be awesome if you could guide me toward solving the problem.
The website is a .asp website, not .aspx.
I have provided sample data so that you can see what happens when we submit a request to the website.
The Code
import requests

with requests.Session() as c:
    url = 'http://result.pondiuni.edu.in/candidate.asp'
    url2 = 'http://result.pondiuni.edu.in/ResultDisp.asp'
    TXTREGNO = '15te1218'
    CMBDEGREE = 'BTHEE~\BTHEE\result.mdb'
    CMBEXAMNO = 'B'
    DPATH = '\BTHEE\result.mdb'
    DNAME = 'BTHEE'
    TXTEXAMNO = 'B'
    c.get(url)
    payload = {
        'txtregno': TXTREGNO,
        'cmbdegree': CMBDEGREE,
        'cmbexamno': CMBEXAMNO,
        'dpath': DPATH,
        'dname': DNAME,
        'txtexamno': TXTEXAMNO
    }
    post_request = requests.post(url, data=payload)
    page = c.get(url2)
I have no idea what to do next to retrieve my results page (url2 in the code). All the data is entered at url in the program (the starting page where all the info is entered), and submitting it takes you to url2, the results page.
Please help me make this program.
I took all the post form parameters from Chrome's Network Tab.
You are way overcomplicating it, and you have carriage returns in your post data (the \r in "\result.mdb"), so it could never work:
In [1]: s = "BTHEE~\BTHEE\result.mdb"
In [2]: print(s) # where did "\result.mdb" go?
esult.mdbHEE
In [3]: s = r"BTHEE~\BTHEE\result.mdb" # raw string
In [4]: print(s)
BTHEE~\BTHEE\result.mdb
So fix your form data and just post directly to get your results:
import requests

data = {"txtregno": "15te1218",
        "cmbdegree": r"BTHEE~\BTHEE\result.mdb",  # use raw strings
        "cmbexamno": "B",
        "dpath": r"\BTHEE\result.mdb",
        "dname": "BTHEE",
        "txtexamno": "B"}

results_page = requests.post("http://result.pondiuni.edu.in/ResultDisp.asp", data=data).content
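If you want to see what came back, a quick sketch (just one way you might inspect it) is to dump the response to a file and open it in a browser:

# results_page is bytes (.content), so write in binary mode
with open("results.html", "wb") as f:
    f.write(results_page)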
To add to the answer already given, you can use bs4.BeautifulSoup to find the data you need in the result page afterwards.
#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup

payload = {'txtregno': '15te1218',
           'cmbdegree': r'BTHEE~\BTHEE\result.mdb',
           'cmbexamno': 'B',
           'dpath': r'\BTHEE\result.mdb',
           'dname': 'BTHEE',
           'txtexamno': 'B'}

# submit the form data with POST, as in the answer above
results_page = requests.post('http://result.pondiuni.edu.in/ResultDisp.asp', data=payload)

soup = BeautifulSoup(results_page.text, 'html.parser')

SubjectElem = soup.select("td[width='66%'] font")
MarkElem = soup.select("font[color='DarkGreen'] b")

Subject = []
Mark = []
for i in range(len(SubjectElem)):
    Subject.append(SubjectElem[i].text)
    Mark.append(MarkElem[i].text)

Transcript = dict(zip(Subject, Mark))
This will give you a dictionary with the subjects as keys and the marks as values.
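For example (assuming the selectors above matched something on the page), you can then print the transcript like this:

# print each subject with its mark
for subject, mark in Transcript.items():
    print(subject + ": " + mark)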
I'm trying to manipulate a dynamic JSON from this site:
http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do
It has 3 elements: imagem, a base64-encoded image; labelValorCaptcha, just a message; and uuidCaptcha, a value to pass as a parameter to play a sound at the link below:
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_e7b072e1fce5493cbdc46c9e4738ab8a
When I open the first site in a browser and put the uuidCaptcha value into the second link after the equals sign ("...uuidCaptcha="), the sound plays normally. I wrote a simple piece of code to catch these elements:
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
But I don't know what's happening: the uuidCaptcha value caught this way doesn't work; it opens an error web page.
Does anyone know why?
Thanks!
It works for me.
$ cat a.py
#!/usr/bin/env python
# encoding: utf-8
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
$ python a.py
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_efc8d4bc3bdb428eab8370c4e04ab42c
As I said to @Charlie Harding, the best way is to download the page and get the JSON values from it, because this JSON is dynamic and needs an open web session to exist.
More info here.
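A minimal sketch of that idea (Python 2, as in the snippets above; the output file name is just an assumption), fetching the JSON and using the uuid in the same run:

import urllib, json

url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
data = json.loads(urllib.urlopen(url).read())

# build the sound URL with the uuid we just received and fetch it immediately,
# so the uuid is still tied to a live captcha
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=" + data['uuidCaptcha']
sound = urllib.urlopen(urlSound).read()

with open("captcha_sound", "wb") as f:
    f.write(sound)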
Hi friends.
I'm trying to rewrite one of my little tools. Basically, it gets an input from the user, and if that input doesn't contain the "base URL", a function turns that input into a valid URL for the other parts of the program to work on.
If I write it so the program only accepts a valid URL as input, it works; however, if I pass a short string and construct the URL from it, urllib2.urlopen() fails, and I have no idea why, since the returned value is exactly the same str value...
import urllib2
import re

class XunLeiKuaiChuan:
    kuaichuanBaseAddress = 'http://kuaichuan.xunlei.com/d/'
    regexQuery = 'file_name=\"(.*?)\"\sfile_url=\"(.*?)\sfile_size=\"(.*?)\"'
    agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)'

    def buildLink(self, aLink):
        if aLink == '':
            return
        if 'xunlei.com' not in aLink:
            aLink = self.kuaichuanBaseAddress + aLink
        return aLink

    def decodeLink(self, url):
        url = self.buildLink(url)  # it will return correct url with the value provided
        print 'in decodeLink ' + url
        urlReq = urllib2.Request(url)
        urlReq.add_header('User-agent', self.agent)
        pageContent = urllib2.urlopen(urlReq).read()
        realLinks = re.findall(self.regexQuery, pageContent)
        return realLinks

test = XunLeiKuaiChuan()
link = 'y7L1AwKuOwDeCClS528'
link2 = 'http://kuai.xunlei.com/d/y7L1AwKuOwDeCClS528'
s = test.decodeLink(link2)
print s
When I call it with link2 it functions as expected, but it fails when I use link. Can someone tell me what I'm missing here? My "old version" works, but it only accepts a full URL, and this unknown behavior is killing me... Thank you.
By the way, if the full URL returns an empty list, just open the URL and enter the captcha on the page; they do it to prevent some kind of 'attacks'.
Never mind, I got the hostname in the code wrong.
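For anyone landing here later, the fix was a one-line change (assuming the host in link2 is the correct one):

kuaichuanBaseAddress = 'http://kuai.xunlei.com/d/'  # 'kuai', not 'kuaichuan', to match link2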
I am trying to write a class in Python that opens a given URL and returns the data from that URL...
class Openurl:
    def download(self, url):
        req = urllib2.Request(url)
        content = urllib2.urlopen(req)
        data = content.read()
        content.close()
        return data

url = 'www.somesite.com'
dl = openurl()
data = dl.download(url)
Could someone correct my approach? I know one might ask why not just directly open it, but I want to show a message while it is being downloaded. The class will only have one instance.
You have a few problems.
One that I'm sure is not in your original code is the failure to import urllib2.
The second problem is that dl = openurl() should be dl = Openurl(). This is because Python is case sensitive.
The third problem is that your URL needs http:// before it. This gets rid of an unknown url type error. After that, you should be good to go!
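Putting those three fixes together, a sketch of the corrected code (the print line is just one way to show a message while the page is downloading):

import urllib2

class Openurl:
    def download(self, url):
        print 'Downloading ' + url + '...'   # message shown while the download runs
        req = urllib2.Request(url)
        content = urllib2.urlopen(req)
        data = content.read()
        content.close()
        return data

url = 'http://www.somesite.com'   # scheme added to avoid the "unknown url type" error
dl = Openurl()                    # capital O: Python is case sensitive
data = dl.download(url)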
It should be dl = Openurl(); Python is case sensitive.