My current test Python script does something like this:
#!/usr/bin/env python
import sys
import json

data = sys.stdin.read()
myjson = json.loads(data)
The problem is that even though this works in some cases, in others it seems to block, probably at the read().
For unrelated reasons I am forced to run the CGI scripts under Tomcat; I am not sure whether that matters here.
You'll need to check the Content-Length before reading and limit the number of bytes read by sys.stdin.read(). See cgi.parse_header().
Update:
Your incoming data comes through the environment which is populated by the web server. It is accessible in os.environ.
import os
from cgi import parse_header
os.environ['CONTENT_TYPE'] = 'text/html; charset=utf-8'
parse_header(os.environ['CONTENT_TYPE'])
# returns ('text/html', {'charset': 'utf-8'})
So in your CGI script you need (roughly):
import os, sys
from cgi import parse_header

# CGI exposes the request headers as CONTENT_LENGTH / CONTENT_TYPE environment variables
cl, _ = parse_header(os.environ['CONTENT_LENGTH'])
data = sys.stdin.read(int(cl))
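Tying this back to the original script, here is a rough sketch of a complete handler that reads a JSON body without blocking (defaulting a missing CONTENT_LENGTH to 0 is my own assumption, not part of the original answer):

#!/usr/bin/env python
import json
import os
import sys

# Read only as many bytes as the web server announced, so
# sys.stdin.read() cannot block waiting for more input.
content_length = int(os.environ.get('CONTENT_LENGTH', 0))
data = sys.stdin.read(content_length)
myjson = json.loads(data) if data else None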
Related
I am trying to understand HTTP queries and was successfully getting the data from GET requests through the environment variables, by first looking through the keys of the environment variables and then accessing 'QUERY_STRING' to get the actual data.
like this:
#!/usr/bin/python3
import sys
import cgi
import os
inputVars = cgi.FieldStorage()
f = open('test','w')
f.write(str(os.environ['QUERY_STRING'])+"\n")
f.close()
Is there a way to get the POST data (the equivalent of 'QUERY_STRING' for POST, so to speak) as well, or is it not accessible because the POST data is sent in its own package? The keys of the environment variables did not give me any hint so far.
The possible-duplicate link solved it, as syntonym pointed out in the comments and as user Schien explains in one of the answers to the linked question:
the raw HTTP POST data (the stuff after the query) can be read through stdin.
So the sys.stdin.read() method can be used.
My code now works, looking like this:
#!/usr/bin/python3
import sys
import os
f = open('test','w')
f.write(str(sys.stdin.read()))
f.close()
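Note that, just like in the first question above, an unbounded sys.stdin.read() can block if the client keeps the connection open. A safer sketch (the CONTENT_LENGTH handling is my own addition, not part of the original answer):

#!/usr/bin/python3
import os
import sys

# Limit the read to the body size announced by the web server.
length = int(os.environ.get('CONTENT_LENGTH', 0))
with open('test', 'w') as f:
    f.write(sys.stdin.read(length))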
I have been trying to write a Python script that checks IP addresses against www.blacklistalert.com. The goal is to run the script on one or more IP addresses passed in as input, and then print the site's response on standard output.
import urllib.parse
import urllib.request
ip = str(urllib.request'[$1]')
url = 'http://www.blacklistalert.org/'
values = { 'query': '$1' }
data = urllib.parse.urlencode(values)
url = '?'.join([url, data])
req = urllib.request.Request(url, binary_data)
response = urllib.request.urlopen(req)
the_page = response.read()
print (the_page)
I am running into problems sending the query to the page and getting the results, which come back on a separate page. I am currently getting this error:
ip = str(urllib.request'[$1]')
SyntaxError: invalid syntax
What's the best approach to run the IP address query and get the response on standard output? Thanks in advance.
Python is not bash, and it does not automatically turn command-line arguments into variables like $1 ($ has no special meaning in Python). Instead, Python places command-line arguments in the sys.argv list. You can either read them from there (sys.argv[0] is the script's name, sys.argv[1] and onwards are the command-line parameters; read the docs for the full story), or use something like the argparse module, which I recommend. There's a bit more code involved, but it'll be worth it in the end.
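For illustration, here is a rough sketch of the corrected script. It reads the IP addresses from sys.argv and sends each one as a GET query; the field name 'query' is taken from the original code, and whether the site actually expects a GET rather than a POST is an assumption:

#!/usr/bin/env python3
import sys
import urllib.parse
import urllib.request

# IP addresses come from the command line, not from a bash-style $1.
for ip in sys.argv[1:]:
    data = urllib.parse.urlencode({'query': ip})   # field name taken from the original code
    url = '?'.join(['http://www.blacklistalert.org/', data])
    response = urllib.request.urlopen(url)
    print(response.read().decode('utf-8', errors='replace'))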
OK, this is driving me nuts.
I am trying to read from the Crunchbase API using Python's urllib2 library. Relevant code:
import urllib2

api_url = "http://api.crunchbase.com/v/1/financial-organization/venrock.js"
print len(urllib2.urlopen(api_url).read())
The result is either 73493 or 69397. The actual length of the document is much longer. When I try this on a different computer, the length is either 44821 or 40725. I've tried changing the user agent, using urllib, increasing the timeout to a very large number, and reading small chunks at a time. Always the same result.
I assumed it was a server problem, but my browser reads the whole thing.
Python 2.7.2, OS X 10.6.8 for the ~40k lengths. Python 2.7.1 running under IPython for the ~70k lengths, OS X 10.7.3. Thoughts?
There is something kooky with that server. It might work if you, like your browser, request the file with gzip encoding. Here is some code that should do the trick:
import urllib2, gzip
api_url='http://api.crunchbase.com/v/1/financial-organization/venrock.js'
req = urllib2.Request(api_url)
req.add_header('Accept-encoding', 'gzip')
resp = urllib2.urlopen(req)
data = resp.read()
>>> print len(data)
26610
The problem then is to decompress the data.
from StringIO import StringIO
if resp.info().get('Content-Encoding') == 'gzip':
    g = gzip.GzipFile(fileobj=StringIO(data))
    data = g.read()
>>> print len(data)
183159
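Putting the two pieces together, a small helper along these lines should do the whole job (the function name fetch_gzipped is mine, not part of any library):

import gzip
import urllib2
from StringIO import StringIO

def fetch_gzipped(url):
    # Ask the server for gzip, then transparently decompress if it complied.
    req = urllib2.Request(url)
    req.add_header('Accept-encoding', 'gzip')
    resp = urllib2.urlopen(req)
    data = resp.read()
    if resp.info().get('Content-Encoding') == 'gzip':
        data = gzip.GzipFile(fileobj=StringIO(data)).read()
    return data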
I'm not sure if this is a valid answer, since it's a different module entirely, but using the requests module I get a ~183k response:
import requests
url = r'http://api.crunchbase.com/v/1/financial-organization/venrock.js'
r = requests.get(url)
print len(r.text)
>>>183159
So if it's not too late into the project, check it out here: http://docs.python-requests.org/en/latest/index.html
edit: Using the code you provided, I also get a len of ~36k
Did a quick search and found this: urllib2 not retrieving entire HTTP response
I'm trying to extract data from the following page:
http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#
Which, conveniently and inefficiently enough, includes all the data embedded as a CSV file in the page header, set as a JavaScript variable called gs_csv.
How do I extract this? document.body.innerHTML skips the header where the data is; what is the alternative that includes the header (or, better yet, the value associated with gs_csv)?
(Sorry, new to all this, I've been searching through loads of documentation, and trying a lot of them, but nothing so far has worked).
Thanks to Sinan (this is mostly his solution transcribed into Python).
import win32com.client
import time
import os
import os.path

ie = win32com.client.Dispatch("InternetExplorer.Application")
ie.Visible = False
ie.Navigate("http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#")
time.sleep(20)

webpage = ie.document.body.innerHTML
s1 = ie.document.scripts(1).text
s1 = s1[s1.find("gs_csv")+8:-11]

# raw string so the backslashes are not treated as escape sequences
scriptfilepath = r"c:\FO Share\bmreports\script.txt"
scriptfile = open(scriptfilepath, 'wb')
# turn the escaped newlines inside the JavaScript string into real newlines
scriptfile.write(s1.replace(r'\n', '\n'))
scriptfile.close()
ie.Quit()
Untested: Did you try looking at what Document.scripts contains?
UPDATE:
For some reason, I am having immense difficulty getting this to work using the Windows Scripting Host (but then, I don't use it very often, apologies). Anyway, here is the Perl source that works:
use strict;
use warnings;
use Win32::OLE qw(in);
$Win32::OLE::Warn = 3;
my $ie = get_ie();
$ie->{Visible} = 1;
$ie->Navigate(
'http://www.bmreports.com/servlet/com.logica.neta.bwp_PanBMDataServlet?'
.'param1=&param2=&param3=&param4=&param5=2009-04-22&param6=37#'
);
sleep 1 until is_ready( $ie );
my $scripts = $ie->Document->{scripts};
for my $script (in $scripts ) {
print $script->text;
}
sub is_ready { $_[0]->{ReadyState} == 4 }
sub get_ie {
Win32::OLE->new('InternetExplorer.Application',
sub { $_[0] and $_[0]->Quit },
);
}
__END__
C:\Temp> ie > output
output now contains everything within the script tags.
Fetch the source of that page using Ajax, and parse the response text like XML using jQuery. It should be simple enough to get the text of the first script tag you encounter inside the head.
I'm out of touch with jQuery, or I would have posted code examples.
EDIT: I assume you are talking about fetching the CSV on the client side.
If this is just a one-off script, then extracting this CSV data is as simple as this:
import urllib2
response = urllib2.urlopen('http://www.bmreports.com/foo?bar?')
html = response.read()
csv = html.split('gs_csv=')[1].split('</SCRIPT>')[0]
# process csv data here
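If you then need rows and columns rather than one big string, a sketch using the standard csv module (reusing html from the snippet above, and renaming the extracted text to csv_text so it does not shadow the csv module; whether the extracted block really is plain comma-separated text is an assumption):

import csv
from StringIO import StringIO

# Parse the extracted text; each row comes back as a list of column values.
csv_text = html.split('gs_csv=')[1].split('</SCRIPT>')[0]
rows = list(csv.reader(StringIO(csv_text)))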
I'm currently trying to initiate a file upload with urllib2 and the urllib2_file library. Here's my code:
import sys
import urllib2_file
import urllib2
URL='http://aquate.us/upload.php'
d = [('uploaded', open(sys.argv[1:]))]
req = urllib2.Request(URL, d)
u = urllib2.urlopen(req)
print u.read()
I've placed this .py file in my My Documents directory and placed a shortcut to it in my Send To folder (the shortcut URL is ).
When I right-click a file, choose Send To, and select Aquate (my Python script), it opens a command prompt for a split second and then closes it. Nothing gets uploaded.
I knew there was probably an error going on, so I typed the code into the command-line Python interpreter, line by line.
When I ran the u = urllib2.urlopen(req) line, I didn't get an error (screenshot: http://www.aquate.us/u/55245858877937182052.jpg); instead, the cursor simply started blinking on a new line beneath that line. I waited a couple of minutes to see if something would happen, but it just stayed like that. To get it to stop, I had to press Ctrl+Break.
What's up with this script?
Thanks in advance!
[Edit]
Forgot to mention -- when I ran the script without the request data (the file) it ran like a charm. Is it a problem with urllib2_file?
[edit 2]:
import MultipartPostHandler, urllib2, cookielib,sys
import win32clipboard as w
cookies = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),MultipartPostHandler.MultipartPostHandler)
params = {"uploaded" : open("c:/cfoot.js") }
a=opener.open("http://www.aquate.us/upload.php", params)
text = a.read()
w.OpenClipboard()
w.EmptyClipboard()
w.SetClipboardText(text)
w.CloseClipboard()
That code works like a charm if you run it through the command line.
If you're using Python 2.5 or newer, urllib2_file is both unnecessary and unsupported, so check which version you're using (and perhaps upgrade).
If you're using Python 2.3 or 2.4 (the only versions supported by urllib2_file), try running the sample code and see if you have the same problem. If so, there is likely something wrong either with your Python or urllib2_file installation.
EDIT:
Also, you don't seem to be using either of urllib2_file's two supported formats for POST data, and open() needs a single filename such as sys.argv[1], not the list sys.argv[1:]. Try using one of the following two lines instead:
d = ['uploaded', open(sys.argv[1])]
## --OR-- ##
d = {'uploaded': open(sys.argv[1])}
First, there's a third way to run Python programs.
From cmd.exe, type python myprogram.py. You get a nice log. You don't have to type stuff one line at a time.
Second, check the urllib2 documentation. You'll also need to look at urllib.
A Request requires a URL and a urlencoded buffer of data.
data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.
You need to encode your data.
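As a rough sketch of what that encoding looks like (note that a plain urlencoded body only carries form fields; it does not produce the multipart/form-data encoding that an actual file upload needs, which is what the MultipartPostHandler answers below take care of):

import urllib
import urllib2

# urlencode turns a dict (or a sequence of 2-tuples) into an
# application/x-www-form-urlencoded string for the POST body.
values = {'uploaded': 'hello world'}   # placeholder field value, not a real file
data = urllib.urlencode(values)
req = urllib2.Request('http://aquate.us/upload.php', data)
response = urllib2.urlopen(req)
print response.read()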
If you're still on Python 2.5, what worked for me was to download the code here:
http://peerit.blogspot.com/2007/07/multipartposthandler-doesnt-work-for.html
and save it as MultipartPostHandler.py
then use:
import urllib2, MultipartPostHandler
opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler())
opener.open(url, {"file":open(...)})
or if you need cookies:
import urllib2, MultipartPostHandler, cookielib
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MultipartPostHandler.MultipartPostHandler())
opener.open(url, {"file":open(...)})