I'm looking for a Python wrapper for the current Twitter API, just to perform simple searches, but the wrappers I've tried (twython, python-twitter, etc.) are all outdated and cannot perform searches against the current API.
Is there a Python wrapper that is up to date and works with the current Twitter API?
If you only need search, take one of Python's OAuth libraries (this one, for example) and make the calls yourself. For a small subset of requests, that will be a faster solution than searching for and evaluating different libraries.
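For example, here is a minimal sketch using the requests and requests-oauthlib packages. The credentials are placeholders, and the 1.1 search endpoint shown here may have changed since, so check the current API documentation:

# Minimal sketch: a signed GET against Twitter's search endpoint.
# All credentials below are placeholders; the endpoint path may differ
# depending on the API version you have access to.
import requests
from requests_oauthlib import OAuth1

auth = OAuth1(
    client_key="CONSUMER_KEY",
    client_secret="CONSUMER_SECRET",
    resource_owner_key="ACCESS_TOKEN",
    resource_owner_secret="ACCESS_TOKEN_SECRET",
)

resp = requests.get(
    "https://api.twitter.com/1.1/search/tweets.json",
    params={"q": "python", "count": 10},
    auth=auth,
)
resp.raise_for_status()
for tweet in resp.json().get("statuses", []):
    print(tweet["text"])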
I used http://pypi.python.org/pypi/python-twitter/ about a year or two ago, and it is still at the same version, 0.8.2. It served me well for all I needed at that point, which was just posting tweets. If you're about to do something simple, it may suffice.
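For what it's worth, posting a tweet with that 0.8.x version looked roughly like this (the credentials are placeholders, and the constructor arguments may differ in other versions):

import twitter   # python-twitter 0.8.x

api = twitter.Api(
    consumer_key="CONSUMER_KEY",                 # placeholder credentials
    consumer_secret="CONSUMER_SECRET",
    access_token_key="ACCESS_TOKEN",
    access_token_secret="ACCESS_TOKEN_SECRET",
)
status = api.PostUpdate("Hello from python-twitter")
print(status.text)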
I figured that, to perform a simple search, it's better not to go through all the wrappers. You can simply use an HTTP GET request as follows:
import urllib2  # Python 2 standard library; on Python 3 use urllib.request

# Fetch the tweet with the given id as JSON
response = urllib2.urlopen("https://api.twitter.com/1/statuses/show.json?id=291095878196920320&include_entities=true").read()

You can of course write the response to a file, parse the JSON, and so on.
I have the problem that, for a project, I need to work with a Python framework that has poor documentation. I know what it does, since it is the back end of a running application. I also know that no framework is good if the documentation is bad, and that I should probably code it myself, but I have a time constraint. Therefore my question is: is there a recipe for how to understand a poorly documented framework?
What I have tried so far is checking some functions and identifying the organizational units in the framework, but I lack a system for doing this more effectively.
If I were you, with time constraints and bound to use a specific framework, I would go about it in the following manner:
List the use cases I want to implement using the framework
Identify the APIs provided by the framework that help me implement those use cases
Prototype the use cases based on the available documentation and reading
Prototyping does not mean implementing the entire use case, but identifying the building blocks around it and implementing them. For example, if my use case were to fetch students along with their courses, and I were using Hibernate, I would prototype the database access, validating how easily I can access the database using Hibernate, or how easily I can get the relational data by means of joins/aggregation, etc.
Prototyping will help me figure out possible limitations or bugs in the framework. If the limitations are show-stoppers, I will implement the supporting APIs myself, or I can decide to scrap the framework entirely and write my own, whichever makes more sense.
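As a rough Python analogue of that Hibernate example, a prototype for "fetch students along with their courses" might look like the sketch below. SQLAlchemy is just one possible choice here, the model and field names are invented purely for illustration, and it assumes a reasonably recent SQLAlchemy:

# Hypothetical prototype: can I fetch students together with their courses?
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Student(Base):
    __tablename__ = "students"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    courses = relationship("Course", back_populates="student")

class Course(Base):
    __tablename__ = "courses"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    student_id = Column(Integer, ForeignKey("students.id"))
    student = relationship("Student", back_populates="courses")

engine = create_engine("sqlite:///:memory:")   # throwaway database for the prototype
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Student(name="Ada", courses=[Course(title="Databases")]))
session.commit()

for student in session.query(Student).all():
    print(student.name, [c.title for c in student.courses])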
You may also use Python's debugging library, pdb. After importing it with import pdb, you can set traces in the body of functions and classes with pdb.set_trace(). Execution will then stop at that line, and you can inspect the existing variables and step through what the code is doing.
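For example, you can drop a trace right before the framework call you want to understand (load_students is a hypothetical stand-in here):

import pdb

def load_students():
    # stand-in for the poorly documented framework call you are exploring
    return [{"name": "Ada", "courses": ["Databases"]}]

def investigate():
    pdb.set_trace()   # execution pauses here; inspect locals, step with 'n'/'s', continue with 'c'
    students = load_students()
    return students

investigate()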
My question is quite specific to the Betfair API; however, I would like to know how to use APIs more generally. I'm quite new to this sort of thing, so I don't really know how it works. I've just downloaded this package: https://github.com/jmcarp/betfair.py
My question is, how am I supposed to know the functions that come with it? How am I supposed to know how to pull the data I want from any given website without a resource describing the functionality of the API?
This library is only a binding (with several errors) to the Betfair APIs.
You can find documentation about the API, and therefore about this library, on the Betfair developer site.
If you're interested in how to use the Betfair APIs from Python, you can take a look at Betfair's code samples.
I am trying to do some research on Wikipedia data, and I am comfortable with Python.
I came across this library, which seems nice: https://pypi.python.org/pypi/wikipedia/
I don't want to hit Wikipedia directly, as that is slow, and since I am trying to access a lot of data I might also run into their API limits.
Can I somehow hack this library to make it read from a local copy of the Wikipedia data? I know I could run a whole Wikipedia server and point it at that, but that seems a roundabout way.
Is there a way to just point it at a folder of dump files and have the library work as it does now? Or are you aware of any other libraries that do this?
Thank you.
I figured out what I need. I shouldn't have been searching for an API; what I am looking for is a parser. Here are a couple of options I have narrowed down so far. Both seem like solid starting points:
wikidump:
https://pypi.python.org/pypi/wikidump/0.1.2
mwlib:
https://pypi.python.org/pypi/mwlib/0.15.14
Update: While these are good parsers for Wikipedia data, I found them too limiting in one way or another, not to mention the lack of documentation. So I eventually went with good old Python ElementTree and worked with the XML directly.
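For reference, a minimal sketch of that ElementTree approach might look like this. The file name is a placeholder, and the namespace URI is the one MediaWiki export dumps normally declare, so verify it against your own dump:

# Stream a MediaWiki XML dump and pull out page titles and text.
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"   # check the xmlns in your dump

for event, elem in ET.iterparse("enwiki-pages-articles.xml", events=("end",)):
    if elem.tag == NS + "page":
        title = elem.findtext(NS + "title")
        text = elem.findtext(NS + "revision/" + NS + "text")
        print(title, len(text or ""))
        elem.clear()   # free memory as you go; full dumps are huge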
I would appreciate some help here.
Google Checkout has many ways to send it checkout data. I am using the XML server-to-server method.
I have everything ready and now I want to send some XML to Google. I have been doing some reading and know of a couple of ways to do this, one with urllib, another with pycurl, but I am using Django here, and I searched the Django API for some way to POST data to another site and haven't found anything. I would really like to use the Django way, if there is one, because I feel it would be more fluid and correct, but if you don't know of any, I will probably use urllib.
urllib2 is the appropriate way to POST data if you're looking for the Python standard library. Django doesn't provide a specific method to do this (nor should it). Django goes out of its way not to reinvent tools that already exist in the standard library (except email...), so you should never really fear using something from the Python standard library.
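For example, a rough sketch of POSTing an XML body with urllib2 (the URL and the payload are placeholders, not real Google Checkout values):

import urllib2

xml_body = "<checkout-shopping-cart>...</checkout-shopping-cart>"   # placeholder payload
request = urllib2.Request(
    "https://checkout.example.com/api/checkout/v2/merchantCheckout",   # placeholder URL
    data=xml_body,
    headers={"Content-Type": "application/xml; charset=UTF-8"},
)
response = urllib2.urlopen(request)
print(response.read())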
requests is also great, but not standard library. Nothing wrong with that though.
I'm writing a spider in Python to crawl a site. Trouble is, I need to examine about 2.5 million pages, so I could really use some help making it optimized for speed.
What I need to do is examine the pages for a certain number and, if it is found, record the link to that page. The spider is very simple; it just needs to sort through a lot of pages.
I'm completely new to Python, but have used Java and C++ before. I have yet to start coding it, so any recommendations on libraries or frameworks to include would be great. Any optimization tips are also greatly appreciated.
You could use MapReduce like Google does, either via Hadoop (specifically with Python: 1 and 2), Disco, or Happy.
The traditional line of thought is: write your program in standard Python; if you find it is too slow, profile it, and optimize the specific slow spots. You can make those slow spots faster by dropping down to C, using C/C++ extensions, or even ctypes.
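For the profiling step, something like the following is usually enough to find the slow spots (crawl() here is just a hypothetical stand-in for the spider's entry point):

import cProfile
import pstats

def crawl():
    # stand-in for the real spidering code
    return sum(i * i for i in range(100000))

cProfile.run("crawl()", "crawl.prof")           # profile a full run and save the stats
stats = pstats.Stats("crawl.prof")
stats.sort_stats("cumulative").print_stats(10)  # show the 10 most expensive calls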
If you are spidering just one site, consider using wget -r (an example).
Where are you storing the results? You can use PiCloud's cloud library to parallelize your scraping easily across a cluster of servers.
As you are new to Python, I think the following may be helpful for you :)
If you are writing a regex to search for a certain pattern in the pages, compile the regex once and reuse the compiled object (see the sketch after these tips).
BeautifulSoup is an HTML/XML parser that may be of some use for your project.
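Here is the sketch for the compiled-regex tip; the target number and page text are made up:

import re

NUMBER_RE = re.compile(r"\b4815162342\b")   # compile once, reuse for every page

pages = ["some page text 4815162342 more text", "another page without the number"]
for i, page in enumerate(pages):
    if NUMBER_RE.search(page):
        print("found on page", i)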
Spidering somebody's site with millions of requests isn't very polite. Can you instead ask the webmaster for an archive of the site? Once you have that, it's a simple matter of text searching.
You waste a lot of time waiting for network requests when spidering, so you'll definitely want to make your requests in parallel. I would probably save the result data to disk and then have a second process looping over the files searching for the term. That phase could easily be distributed across multiple machines if you needed extra performance.
What Adam said. I did this once to map out Xanga's network. The way I made it faster was by having a thread-safe set containing all the usernames I had to look up. Then I had 5 or so threads making requests at the same time and processing them. You're going to spend way more time waiting for the page to download than processing the text (most likely), so just find ways to increase the number of requests you can make at the same time.
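A rough sketch of that worker-thread pattern, with placeholder URLs and a made-up target number. It uses Python 2's Queue and urllib2 to match the rest of this thread; on Python 3 the modules are queue and urllib.request:

import threading
import urllib2
from Queue import Queue   # "queue" on Python 3

TARGET = "4815162342"                            # made-up number to look for
work = Queue()
hits = []
hits_lock = threading.Lock()

for n in range(100):
    work.put("http://example.com/page/%d" % n)   # placeholder URLs

def worker():
    while True:
        url = work.get()
        try:
            body = urllib2.urlopen(url).read()
            if TARGET in body:
                with hits_lock:
                    hits.append(url)
        except Exception:
            pass                                 # skip pages that fail to download
        finally:
            work.task_done()

for _ in range(5):                               # 5 or so threads, as described above
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

work.join()
print(hits)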