I have a Python CGI script running on my Apache server. From a webpage, the user enters a word into a form, and that word is passed to the script. The word is then used to query the Twitter Search API and return all the tweets for that word. The issue is, I'm running this query in a loop so I get three pages of results returned (approximately 300 tweets). But when I call the script (which prints all the tweets into an HTML page), the page will sometimes display 5 tweets, sometimes 18, completely random numbers. Is this a timeout issue, or am I missing something basic in my code? The Python CGI script is posted below, thanks in advance.
#!/usr/bin/python
# Import modules for CGI handling
import cgi, cgitb
import urllib
import json
# Create instance of FieldStorage
form = cgi.FieldStorage()
# Get data from fields
topic = form.getvalue('topic')
results = []
for x in range(1, 3):
    response = urllib.urlopen("http://search.twitter.com/search.json?q="+topic+"&rpp=100&include_entities=true&result_type=mixed&lang=en&page="+str(x))
    pyresponse = json.load(response)
    results = results + pyresponse["results"]
print "Content-type:text/html\r\n\r\n"
print "<!DOCTYPE html>"
print "<html>"
print "<html lang=\"en\">"
print "<head>"
print "<meta charset=\"utf-8\" />"
print "<meta name=\"description\" content=\"\"/>"
print "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"/>"
print "<title>Data analysis for %s </title>" %(topic)
print "</head>"
print "<body>"
print "<label>"
for i in range(len(results)):
    print str(i) + ": " + results[i]["text"] + "<br></br>"
print "</label>"
print "</body>"
print "</html>"
First of all, I would point out that range(1, 3) will not get you three pages as you expect; it only yields 1 and 2, so you would need range(1, 4) for three pages.
However, running your Python code in an interpreter encountered an exception at this point:
>>> for i in range(len(results)):
... print str(i) + ": "+ results[x]["text"]
<a few results print successfully>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\U0001f611' in position 121: ordinal not in range(256)
Encoding the text as UTF-8 then prints them all:
>>> for i in range(len(results)):
... print str(i) + ": "+ results[i]["text"].encode('utf-8')
<success>
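Putting the page-range fix and the encoding fix together, a minimal sketch of the corrected loop might look like this. It keeps the question's now-retired search.twitter.com endpoint, and adds urllib.quote around the topic, which the original code did not do:
results = []
for x in range(1, 4):  # pages 1, 2 and 3
    response = urllib.urlopen(
        "http://search.twitter.com/search.json?q=" + urllib.quote(topic) +
        "&rpp=100&include_entities=true&result_type=mixed&lang=en&page=" + str(x))
    pyresponse = json.load(response)
    results += pyresponse["results"]

for i, tweet in enumerate(results):
    # encode as UTF-8 so characters outside Latin-1 don't raise UnicodeEncodeError
    print str(i) + ": " + tweet["text"].encode('utf-8') + "<br/>"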
Ok, got it.
It was actually a really simple fix: since Python is parsing the JSON, the tweet text needs to be encoded as UTF-8 before printing so it displays correctly.
print str(i)+": "+results[i]["text"].encode('utf-8')+ "<br></br>"
Nothing to do with the script or server itself.
In Python, how do I parse this into strings?
I expect the output to print each line; lines are separated by a newline (\n) delimiter, but all I get are individual characters. For example, if the server sends "This is a string
this is another one" I get
"T
h
i
s
..."
And so on.
from time import sleep
from telnetlib import Telnet

tn = Telnet('myhost', port)
sleep(0.5)
response = tn.read_very_eager()
#How do I do something like this? I tried parsing it using string.split,
#all I got was individual characters.
foreach (line in response):
print line, "This is a new line"
tn.close()
If I understood your question correctly, it should look like this:
from time import sleep
from telnetlib import Telnet

tn = Telnet('myhost', port)
sleep(0.5)
response = tn.read_very_eager()

# splitlines() breaks the response on newline boundaries
for line in response.splitlines():
    # Python 3.x print
    print(line)
    # Python 2.x print
    # print line

tn.close()
UPD: Updated answer according to the comment from OP.
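For illustration, this is how splitlines() handles the two-line example from the question (a hard-coded string standing in for the real telnet response):
response = "This is a string\nthis is another one"
for line in response.splitlines():
    print line, "This is a new line"
# prints:
# This is a string This is a new line
# this is another one This is a new line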
def scrapeFacebookPageFeedStatus(page_id, access_token):
    # -*- coding: utf-8 -*-
    with open('%s_facebook_statuses.csv' % page_id, 'wb') as file:
        w = csv.writer(file)
        w.writerow(["status_id", "status_message", "link_name", "status_type", "status_link",
                    "status_published", "num_likes", "num_comments", "num_shares"])

        has_next_page = True
        num_processed = 0  # keep a count on how many we've processed
        scrape_starttime = datetime.datetime.now()

        print "Scraping %s Facebook Page: %s\n" % (page_id, scrape_starttime)

        statuses = getFacebookPageFeedData(page_id, access_token, 100)

        while has_next_page:
            for status in statuses['data']:
                w.writerow(processFacebookPageFeedStatus(status))

                # output progress occasionally to make sure code is not stalling
                num_processed += 1
                if num_processed % 1000 == 0:
                    print "%s Statuses Processed: %s" % (num_processed, datetime.datetime.now())

            # if there is no next page, we're done.
            if 'paging' in statuses.keys():
                statuses = json.loads(request_until_succeed(statuses['paging']['next']))
            else:
                has_next_page = False

        print "\nDone!\n%s Statuses Processed in %s" % (num_processed, datetime.datetime.now() - scrape_starttime)

scrapeFacebookPageFeedStatus(page_id, access_token)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-43: ordinal not in range(128)
I'm writing code to scrape through Facebook pages and gather all the posts into a CSV file.
The code works properly when the posts are only in English, but
the error above appears when I try to scrape pages that post in Arabic.
I know the solution is to use UTF-8, but I don't know how to implement it in the code.
Your problem is probably not in this code; I suspect it's in your processFacebookPageFeedStatus function. When you format your fields, you'll want to make sure that any field which may contain Unicode characters is encoded (or decoded, as appropriate) as UTF-8.
field_a = u"some unicode text in here"
field_a.encode('utf-8')                  # unicode object -> UTF-8 byte string (what csv can write)
field_a.encode('utf-8').decode('utf-8')  # UTF-8 byte string -> back to the original unicode object
Python 2's csv module doesn't handle unicode objects, so you need to encode each field in your source data to a UTF-8 byte string before writing it.
Debugging unicode is a pain, but there are a lot of SO posts about different problems related to encoding/decoding unicode.
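As a rough sketch of what that encoding step could look like inside processFacebookPageFeedStatus (the field names here are assumptions, since that function isn't shown in the question):
def encode_if_unicode(value):
    # Python 2's csv module wants byte strings, so encode unicode values as UTF-8
    if isinstance(value, unicode):
        return value.encode('utf-8')
    return value

def processFacebookPageFeedStatus(status):
    # hypothetical field extraction -- the real function isn't shown in the question
    row = [status.get('id', ''), status.get('message', ''), status.get('type', '')]
    return [encode_if_unicode(field) for field in row]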
import sys
reload(sys).setdefaultencoding("utf-8")
I added this piece of code and it works fine when I open the file in pandas.
There are no other errors whatsoever for now.
I have read the other posts on this, but my situation is a bit unique, I think. I am trying to use Python to read my grades off of the school's Home Access Center website, but I think there is something peculiar in the way they have it programmed. Here is the code I am using:
import urllib

def WebLogin(password):
    params = urllib.urlencode(
        {'txtLogin': username,
         'txtPassword': password})
    f = urllib.urlopen("http://home.tamdistrict.org/homeaccess/Student/Assignments.aspx", params)
    if "The following errors occurred while attempting to log in:" in f.read():
        print "Login failed."
        print f.read()
    else:
        print "Correct!"
        print f.read()
It always prints "Correct" no matter what I enter for the username and password. Each f.read() returns only a blank line. I am really stuck here, thanks for all of your help!
urlopen returns a file-like object. In particular, you can only call read() once with no arguments (you can read in chunks by passing a size to read); subsequent calls to read() will return an empty string because you've exhausted the response (and unlike regular file objects, there is no seek method). You should store the result in a variable.
content = f.read()
if "The following errors occurred while attempting to log in:" in content:
    print "Login failed."
    print content
else:
    print "Correct!"
    print content
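To see why the original version misbehaves, you can check what repeated reads return (an illustrative snippet, not part of the original script):
f = urllib.urlopen("http://www.example.com/")
first = f.read()    # the full response body
second = f.read()   # empty string: the response has already been exhausted
print len(first), len(second)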
I'm trying to read and print the result from a Google URL in GAE. When I ran the first program, the output was blank. Then I added a print statement before printing the URL result and ran it again; now I get the result.
Why doesn't Program 1 give any output?
Program 1
import urllib

class MainHandler(webapp.RequestHandler):
    def get(self):
        url = urllib.urlopen("http://www.google.com/ig/calculator?hl=en&q=100EUR%3D%3FAUD")
        result = url.read()
        print result
Program 2
import urllib

class MainHandler(webapp.RequestHandler):
    def get(self):
        # Print something before printing the urllib result
        print "Result -"
        url = urllib.urlopen("http://www.google.com/ig/calculator?hl=en&q=100EUR%3D%3FAUD")
        result = url.read()
        print result
You're using print from inside a WSGI application. Never, ever use print from inside a WSGI application.
What's happening is that your text is being output in the place where the webserver expects to see headers, so your output is not displayed as you expect.
Instead, you should use self.response.out.write() to send output to the user, and logging.info and friends for debug output.
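A minimal sketch of the handler rewritten that way (assuming the webapp import that the original code already relies on):
import urllib
import logging

class MainHandler(webapp.RequestHandler):
    def get(self):
        url = urllib.urlopen("http://www.google.com/ig/calculator?hl=en&q=100EUR%3D%3FAUD")
        result = url.read()
        logging.info('Calculator API returned: %s', result)  # goes to the app logs
        self.response.out.write(result)                       # goes to the user's browser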
I've met this issue before, but cannot find an exact answer for it yet.
Maybe an output buffering mechanism causes this issue; I'm not sure.
You need to flush the output to print the data:
import sys
sys.stdout.flush()
or just do it the way you did:
print "*" * 10
print data
I think you'll like logging when you are debugging:
logging.debug('A debug message here')
or
logging.info('The result is: %s', yourResultData)
I've got a program I would like to use to accept a password and one or more strings from a web page. The program takes the strings and writes them to a time-stamped text file, but only if the password matches the stored MD5 hash.
The problems I'm having here are that
I don't know how to get this code on the web. I have a server, but is it as easy as throwing pytext.py onto my server?
I don't know how to write a form for the input to this script and how to get the HTML to work with this program. If possible, it would be nice to make it a multi-line input box... but it's not necessary.
I want to return a value to a web page to let the user know if the password authenticated successfully or failed.
import sys
import time
import getopt
import hashlib

h = hashlib.new('md5')
var = sys.argv[1]
print "Password: ", var
h.update(var)
print h.hexdigest()
trial = h.hexdigest()
check = "86fe2288ac154c500983a8b89dbcf288"

if trial == check:
    print "Password success"
    time_stamp = time.strftime('%Y-%m-%d_%H-%M-%S', (time.localtime(time.time())))
    strFile = "txt_" + str(time_stamp) + ".txt"
    print "File created: txt_" + str(time_stamp) + ".txt"
    #print 'The command line arguments are:'
    #for i in sys.argv:
    #    print i
    text_file = open(strFile, "w")
    text_file.write(str(time_stamp) + "\n")
    for i in range(2, len(sys.argv)):
        text_file.write(sys.argv[i] + "\n")
        #print 'Debug to file:', sys.argv[i]
    text_file.close()
else:
    print "Password failure"
You'll need to read up on mod_python (if you're using Apache) and the Python CGI module.
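As a rough illustration of the CGI route, a minimal sketch might look like this; the form field names (password, lines) and the output path are assumptions, not something from the original script:
#!/usr/bin/python
# CGI sketch adapting the question's MD5 check to a web form (field names are illustrative).
import cgi
import hashlib
import time

form = cgi.FieldStorage()
password = form.getvalue('password', '')
lines = form.getvalue('lines', '')

check = "86fe2288ac154c500983a8b89dbcf288"

print "Content-type: text/html\r\n\r\n"
if hashlib.md5(password).hexdigest() == check:
    time_stamp = time.strftime('%Y-%m-%d_%H-%M-%S')
    with open("txt_" + time_stamp + ".txt", "w") as text_file:
        text_file.write(lines + "\n")
    print "<p>Password success: wrote txt_%s.txt</p>" % time_stamp
else:
    print "<p>Password failure</p>"
The matching HTML form only needs a password input named password and a textarea named lines that POSTs to this script.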
Take a look at Django. It's an excellent web framework that can accomplish exactly what you are asking, and it has an authentication module that handles password hashing and logins for you.
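If you do go the Django route, a hypothetical view along these lines would replace the hard-coded MD5 check with Django's own auth system (the view name, field names, and URL wiring are all assumptions):
from django.contrib.auth import authenticate
from django.http import HttpResponse
import time

def save_text(request):
    # authenticate() checks the credentials against Django's user database
    user = authenticate(username=request.POST.get('username', ''),
                        password=request.POST.get('password', ''))
    if user is None:
        return HttpResponse("Password failure")
    time_stamp = time.strftime('%Y-%m-%d_%H-%M-%S')
    with open("txt_" + time_stamp + ".txt", "w") as text_file:
        text_file.write(request.POST.get('lines', '') + "\n")
    return HttpResponse("Password success: wrote txt_%s.txt" % time_stamp)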