How to diff between two HTML codes? - python

I need to diff two HTML page sources in order to strip out all the generated data (user session tokens, etc.).
I am wondering whether there is a Python module that can do the diff and return the element that contains the difference, so that I can remove that element from other sources in the rest of my code.

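You can use the difflib module, which is part of the Python standard library. Note that it compares sequences (typically lines of text), so it reports the lines that differ rather than HTML elements.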
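As a minimal sketch of the difflib approach, the snippet below compares two HTML sources line by line and keeps only the lines that differ. The helper name `changed_lines` and the two sample pages are made up for illustration. Mapping a changed line back to its enclosing element would need an HTML parser, which is not shown here.

```python
import difflib


def changed_lines(old_html, new_html):
    """Return the lines that were removed ("- ") or added ("+ ")."""
    diff = difflib.ndiff(old_html.splitlines(), new_html.splitlines())
    # ndiff also yields unchanged lines ("  ") and hint lines ("? "); skip those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Two made-up pages that differ only in a generated session value.
page_a = "<html>\n<body>\n<span id='session'>abc123</span>\n<p>Hello</p>\n</body>\n</html>"
page_b = "<html>\n<body>\n<span id='session'>xyz789</span>\n<p>Hello</p>\n</body>\n</html>"

for line in changed_lines(page_a, page_b):
    print(line)
```

Lines starting with "- " exist only in the first page and lines starting with "+ " only in the second, so together they point at the generated data.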

Related

integrating a web server into a python script

I have written a program to generate sequences that pass certain filters (the exact sequences etc don't matter). Each sequence is generated by making a random string of 40 characters made up of C, G, T or A. When each string is generated, it is put through a set of filters, and if it passes the filters it is saved to a list.
I am trying to make one of those filters include an online tool, BPROM, which doesn't appear to have a python library implementation. This means I will need to get my python script to send the sequence string described above to the online tool, and save the output as a python variable.
My question is: if I have a URL to the tool (http://www.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb), how can I interface my script that generates the sequences with the online tool? Is there a way to send data to the web tool and save the tool's output as a variable? I've been looking into requests but I'm not sure it is the right way to approach this (as a massive Python/coding noob).
Thanks for reading, I'm a bit brain dead so I hope this made sense :P
Yes, you can use requests or urllib.
Here is demo code using urllib:
import urllib.request
# A plain GET like this only downloads the page itself.
with urllib.request.urlopen('http://www.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb') as response:
    html = response.read()
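To actually send a sequence to a web form, the usual route is a POST request carrying the form fields. The sketch below shows the general shape with urllib only. `FORM_ACTION_URL` and `SEQUENCE_FIELD` are placeholders, not the real BPROM values: the actual submission URL and field names would have to be read from the form's HTML source.

```python
import urllib.parse
import urllib.request

# Hypothetical values: take the real ones from the form's HTML source.
FORM_ACTION_URL = "http://example.com/form-action"
SEQUENCE_FIELD = "sequence"


def build_request(sequence):
    """Build a POST request carrying the sequence as form data."""
    form_data = urllib.parse.urlencode({SEQUENCE_FIELD: sequence}).encode("ascii")
    # Passing data makes urllib send a POST instead of a GET.
    return urllib.request.Request(FORM_ACTION_URL, data=form_data)


def run_tool(sequence):
    """Submit the sequence and return the response body as text."""
    with urllib.request.urlopen(build_request(sequence)) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `run_tool(my_sequence)` would then return the tool's output page as a string, which still has to be parsed to pull out the result.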

Converting Python scripts to APIs

I have a Python script that extracts certain consumer product aspects from customer reviews using LinearSVC, but I am trying to convert this script into some sort of API to use for new reviews. Is there an easy way to do this? I am very new to the whole concept of APIs.
In your case, an API is just a library that you import once it is reachable by the interpreter. So any import in Python is you calling on a library/API.
So if your script is called foobar.py, for example, and it is in the same directory as your other Python files, putting
import foobar
at the top of a Python file should allow you to reference any functions defined in your original script.
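For the import to be useful, the script's logic should live in functions rather than at module top level. Below is a sketch of how foobar.py could be laid out. The function name `extract_aspects` and its keyword-matching body are stand-ins for illustration, not the real LinearSVC code.

```python
# foobar.py (hypothetical layout)

KNOWN_ASPECTS = {"battery", "screen", "price"}  # made-up example data


def extract_aspects(review):
    """Stand-in for the real extraction; the trained model would be called here."""
    return [word for word in review.lower().split() if word in KNOWN_ASPECTS]


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not on "import foobar".
    print(extract_aspects("Great battery but the screen is dim"))
```

Another file in the same directory could then do `import foobar` and call `foobar.extract_aspects(new_review)`.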

Collect calls and save them to csv

If I have a list of URLs, is it possible to parse them in Python and extract the key/value pairs of these server calls, without opening a browser manually, and save them to a local file?
The only library I found for CSV is pandas, but I found nothing for the first part. Any example would be perfect for me.
You can investigate one of the built-in or third-party libraries that let Python perform browser-like operations and record the results. You can then filter the results and use the built-in csv module to write them out.
You will probably need one of the lower-level libraries:
urllib (urllib2 in Python 2) or the third-party urllib3
You may also need to override one or more of their methods to record the transaction data that you are looking for.
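If "key/values" means the query-string parameters of each URL (an assumption, since the question does not say), the standard library alone is enough. This sketch parses them with urllib.parse and writes them with the csv module. The sample URLs are made up, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io
from urllib.parse import parse_qsl, urlsplit


def query_rows(urls):
    """Yield one row per query-string key/value pair found in each URL."""
    for url in urls:
        pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        for key, value in pairs:
            yield {"url": url, "key": key, "value": value}


def write_csv(urls, out):
    """Write the key/value rows to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=["url", "key", "value"])
    writer.writeheader()
    writer.writerows(query_rows(urls))


# Made-up example URLs.
urls = [
    "http://example.com/track?event=click&id=42",
    "http://example.com/track?event=view",
]
buffer = io.StringIO()
write_csv(urls, buffer)
print(buffer.getvalue())
```

To save to disk, pass `open("calls.csv", "w", newline="")` instead of the buffer. The file name is only an example.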

Can I use doxygen to document a command-line program?

I contribute to a large code project that uses Doxygen to document a series of C libraries. We are also starting to use doxygen with doxypy for associated python modules.
Is there an easy way to document command-line programs (in python or C), and their command line options, (automatically) using doxygen?
In order to generate man pages, you need to set the GENERATE_MAN tag to YES in the Doxyfile.
By default, a sub-folder named man is created within the directory given by OUTPUT_DIRECTORY to contain the generated pages.
With that setting, doxygen will render all the markup you added to the source code as man pages (one page for each translation unit).
At this point, you might want to exclude the parts you want to ignore (I assume you are interested in showing only how to call the main) using the EXCLUDE* tags, such as EXCLUDE and EXCLUDE_PATTERNS.
I advise you to maintain two different Doxyfiles: one for internal usage (complete javadoc-like documentation), the other for producing the program's man page and the like.
Of course, you will probably not get the expected result on the first try, and you might need to play with the doxygen markup a bit.

What is the easiest way to compare two web pages using python?

Hello, I want to compare two web pages using a Python script.
How can I achieve this? Thanks in advance!
First, you want to retrieve both web pages. You can use wget, urlretrieve, etc. See:
wget vs. urlretrieve of Python
Second, you want to "compare" the pages. You can use a "diff" tool, as Chinmay noted. You can also do a keyword analysis of the two pages:
Parse all keywords from each page, e.g. How do I extract keywords used in text?
Optionally take the "stem" of the words with something like: http://pypi.python.org/pypi/stemming/1.0
Use some math to compare the two pages' keywords, e.g. term frequency–inverse document frequency (http://en.wikipedia.org/wiki/Tf%E2%80%93idf), with some of the Python tools out there, like these: http://wiki.python.org/moin/InformationRetrieval
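The keyword comparison can be sketched with the standard library alone. Note that this is a simplification: it uses plain term counts and cosine similarity rather than full tf-idf, skips stemming, and assumes the HTML tags have already been stripped from the text. The function names are illustrative.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic words in already tag-free text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(counts_a, counts_b):
    """Return a score from 0.0 (no shared words) to 1.0 (same word distribution)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one page has no words
    return dot / (norm_a * norm_b)


page_one = term_counts("Python makes comparing web pages easy")
page_two = term_counts("Comparing web pages with Python")
print(cosine_similarity(page_one, page_two))
```

A score close to 1.0 means the two pages use largely the same vocabulary. What counts as "similar enough" depends on the pages and has to be tuned by hand.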
What do you mean by compare? If you just want to find the differences between two files, try difflib, which is part of the standard Python library.
