What's the best tool to parse log files? [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I use grep to parse through my trading apps logs, but it's limited in the sense that I need to visually trawl through the output to see what happened etc.
I'm wondering if Perl is a better option? Any good resources to learn log and string parsing with Perl?
I'd also believe that Python would be good for this. Perl vs Python vs 'grep on linux'?

In the end, it really depends on how much semantics you want to identify, whether your logs fit common patterns, and what you want to do with the parsed data.
If you can use regular expressions to find what you need, you have tons of options. Perl is a popular language and has very convenient native RE facilities. I personally feel a lot more comfortable with Python and find that the little added hassle for doing REs is not significant.
If you want to do something smarter than RE matching, or want to have a lot of logic, you may be more comfortable with Python or even with Java/C++/etc. For instance, it is easy to read line-by-line in Python and then apply various predicate functions and reactions to matches, which is great if you have a ruleset you would like to apply.

All scripting languages are good candidates: Perl, Python, Ruby, PHP, and AWK are all fine for this. Using any one of these languages are better than peering at the logs starting from a (small) size.
Wearing Ruby Slippers to Work is an example of doing this in Ruby, written in Why's inimitable style. Here's a basic example in Perl. I suggest you choose one of these languages and start cracking.

A big advantage Perl has over Python is that when parsing text is the ability to use regular expressions directly as part of the language syntax. For example:
if ($line =~ m/^Regex/) {
... code goes here
}
Perl also assigns capture groups directly to $1, $2, etc, making it very simple to work with. Depending on the format and structure of the logfiles you're trying to parse, this could prove to be quite useful (or, if it can be parsed as a fixed width file or using simpler techniques, not very useful at all).
It's all just syntactic sugar, really, and other languages also allow you use regular expressions and capture groups (indeed, the linked article shows how to do it in Python). You just have to write a bit more code and pass around objects to do it.

There's a Perl program called Log_Analysis that does a lot of analysis and preprocessing for you.

Learning a programming language will let you take you log analysis abilities to another level.
Any dynamic or "scripting" language like Perl, Ruby or Python will do the job. What you should use really depends on external factors. Among the things you should consider:
does work already use a suitable
langauge?
do you know anyone who can
mentor you in a suitable language?
try each language a little and see which language fits you better.
Personally, for the above task I would use Perl. YMMV.
Several reasons to like Perl:
Powerful one-liners - if you need to do a real quick, one-off job, Perl offers some really great short-cuts. See perlrun -n for one example
Multi-paradigm language - Perl has support for imperative, functional and object-oriented programming methodologies.
Sigils - those leading punctuation characters on variables like $foo or #bar. They are a bit like hungarian notation without being so annoying.
Moose - an incredible new OOP system that provides powerful new OO techniques for code composition and reuse.
Strictures - the use strict pragma catches many errors that other dynamic languages gloss over at compile time. I miss it terribly when I use Python or PHP.
Self-discipline - Perl gives you the freedom to write and do what you want, when you want. This means that you have to learn to write clean code or you will hurt. Fortunately, there are tools to help a beginner. Perl::Critic does lint-like analysis of code for best practices.

I find this list invaluable when dealing with any job that requires one to parse with python.
I wouldn't use perl for parsing large/complex logs - just for the readability (the speed on perl lacks for me (big jobs) - but that's probably my perl code (I must improve)).
However if grep suits your needs perfectly for now - there really is no reason to get bogged down in writing a full blown parser. Simplest solution is usually the best, and grep is a fine tool.

Another possible interpretation of your question is "Are there any tools that make log monitoring easier?", and to answer that I would suggest you have a look at Splunk or maybe Log4view.

on linux, you can use just the shell(bash,ksh etc) to parse log files if they are not too big in size. The other tools to go for are usually grep and awk. However, for more programming power, awk is usually used. If you have big files to parse, try awk.
Of course, Perl or Python or practically any other languages with file reading and string manipulation capabilities can be used as well.

try Nagios Log Monitoring
The reason this tool is the best for your purpose is this:
It requires no installation of foreign packages. Which means, there's no need to install any perl dependencies or any silly packages that may make you nervous.
There is little to no learning curve. You don't need to learn any programming languages to use it. All you need to do is know exactly what you want to do with the logs you have in mind, and read the pdf that comes with the tool.
If the log you want to parse is in a syslog format, you can use a command like this:
./NagiosLogMonitor 10.20.40.50:5444 logrobot autofig /opt/jboss/server.log 60m 'INFO' '.' 1 2 -show
Even if your log is not in a recognized format, it can still be monitored efficiently with the following command:
./NagiosLogMonitor 10.20.40.50:5444 logrobot autonda /opt/jboss/server.log 60m 'INFO' '.' 1 2 jbosslogs -ndshow
To parse a log for specific strings, replace the 'INFO' string with the patterns you want to watch for in the log. If you want to search for multiple patterns, specify them like this 'INFO|ERROR|fatal'.
If efficiency and simplicity (and safe installs) are important to you, this Nagios tool is the way to go.

Related

What should I know about Python to identify comments in different source files?

I have a need to identify comments in different kinds of source files in a given directory. ( For example java,XML, JavaScript, bash). I have decided to do this using Python (as an attempt to learn Python). The questions I have are
1) What should I know about python to get this done? ( I have an idea that Regular Expressions will be useful but are there alternatives/other modules that will be useful? Libraries that I can use to get this done?)
2) Is Python a good choice for such a task? Will some other language make this easier to accomplish?
Your problem seems to be more related to programming language parsing. I believe with regular expressions you will be able to find comments in most of the languages. The good thing is that you have regular expressions almost everywhere: Perl, Python, Ruby, AWK, Sed, etc.
But, as the other answer said, you'd better use some parsing machinery. And, if not a full blown parser, a lexer. For Python, check out the Pygments library, which has lexers for many languages already implemented.
1) What you need to know about is parsing, not regex. Additionally you will need the os module and some knowledge about pythons file handling. DiveIntoPython (http://www.diveintopython.net/) is a good start here. I'd recommend chapter 6. (And maybe 1-5 as well :) )
2) Python is a good start. Another language is not going to make it easier, but different. Python allready is pretty simple to start with.
I would recommend not to use regex for your task, as it is as simple as searching for comment signs and linefeeds.
The pyparsing module directly supports several styles of comments. E.g.,
from pyparsing import javaStyleComment
for match in javaStyleComment.scanString(text):
<do stuff>
So if your goal is just getting the job done, look into this since the comment parsers are likely to be more robust than anything you'd throw together. If you're more interested in learning to do it yourself, this might be too much processed food for your taste.

Are there programmatic tools for Perl to Python conversion? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
In my new job more people are using Python than Perl, and I have a very useful API that I wrote myself and I'd like to make available to my co-workers in Python.
I thought that a compiler that compiled Perl code into Python code would be really useful for such a task. Before trying to write something that parsed Perl (or at least, the subset of Perl that I've used in defining my API), I came across bridgekeeper from a consultancy.
It's almost certainly not worth the money for me to engage a consultancy to translate this API, but that's a really interesting tool.
Does anyone know of a compiler that will parse (or try to parse!) Perl5 code and compile it into Python? If there isn't such a thing, how should I start writing a simple compiler that parses my object-oriented Perl code and turns it into Python? Is there an ANTLR or YACC grammar that I can use as a starting point?
Edit: I found perl.y, which might be a starting point if I were to roll my own compiler.
James,
I recommend you to just rewrite the module in Python, for several reasons:
Parsing Perl is DARN HARD. Unless this is an important and desirable exercise for you, you'll find yourself spending much more time on the translation than on useful work.
By rewriting it, you'll have a great chance to practice Python. Learning is best done by doing, and having a task you really need done is a great boon.
Finally, Python and Perl have quite different philosophies. To get a more Pythonic API, it's best to just rewrite it in Python.
I think you should rewrite your code. The quality of the results of a parsing effort depends on your Perl coding style.
I think the quote below sums up the theoretical side very well.
From Wikipedia:Perl in Wikipedia
Perl has a Turing-complete grammar because parsing can be affected by run-time code executed during the compile phase.[25] Therefore, Perl cannot be parsed by a straight Lex/Yacc lexer/parser combination. Instead, the interpreter implements its own lexer, which coordinates with a modified GNU bison parser to resolve ambiguities in the language.
It is often said that "Only perl can parse Perl," meaning that only the Perl interpreter (perl) can parse the Perl language (Perl), but even this is not, in general, true. Because the Perl interpreter can simulate a Turing machine during its compile phase, it would need to decide the Halting Problem in order to complete parsing in every case. It's a long-standing result that the Halting Problem is undecidable, and therefore not even Perl can always parse Perl. Perl makes the unusual choice of giving the user access to its full programming power in its own compile phase. The cost in terms of theoretical purity is high, but practical inconvenience seems to be rare.
Other programs that undertake to parse Perl, such as source-code analyzers and auto-indenters, have to contend not only with ambiguous syntactic constructs but also with the undecidability of Perl parsing in the general case. Adam Kennedy's PPI project focused on parsing Perl code as a document (retaining its integrity as a document), instead of parsing Perl as executable code (which not even Perl itself can always do). It was Kennedy who first conjectured that, "parsing Perl suffers from the 'Halting Problem'."[26], and this was later proved.[27]
Starting in 5.10, you can compile perl with the experimental Misc Attribute Decoration enabled and set the PERL_XMLDUMP environment variable to a filename to get an XML dump of the parse tree (including comments - very helpful for language translators). Though as the doc says, this is a work in progress.
I never tried it and it seems unmaintained, but maybe PyPerl is an option?
How big is this API? If it really this useful then why don't you rewrite it in python. Writing an automatic converter will probably take longer then rewriting the API.
And even if you manage to automatically rewrite it, the resulting code probably won't be very pythonic anyway.
Be sure to check out the answers by weismat and eliben
As much as it might be fun to convert it to or rewrite it in python, I wouldn't make either of those my first choice. Then you'd be stuck with a forked code base. Any modifications you make will have to be duplicated.
Write some sort of wrapper for your API that you can access from outside of Perl. One possibility is a RESTful interface. Another, if you don't want to deal with networking issues, is to create a set of command line tools that access the API (possibly passing information as JSON). Then you can write an easy python library which accesses the wrapper API using httplib2 or subprocess (depending on how you've implemented the wrapper).
You'll still have to update the Python API whenever the interface changes, but now it's only for interface changes.
You could try writing a parser with PPI, dump it to some intermediary form and write Python mecanically from there. Hard, but doable. Useful? Er....
Or you could port your code to Perl 6, wait to Pynie to be ready enough to allow direct call from Python to Perl6 within the same runtime! It's not that far away after all. Too bad Ponie's dead though.
https://perthon.sourceforge.net could probably work? While it is still in alpha, I see a lot of potential.

Next step after PHP: Perl or Python? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
Improve this question
It might seem it has been asked numerous times, but in fact it hasn't. I did my research, and now I'm eager to hear others' opinions.
I have experience with PHP 5, both with functional and object oriented programming methods. I created a few feature-minimalistic websites.
Professionals may agree about PHP not being a programming language that encourages good development habits. (I believe it's not the task of the tool, but this doesn't matter.) Furthermore, its performance is also controversial and often said to be poor compared to competitors.
In the 42nd podcast at Stack Overflow blog a developer from Poland asked what language he should learn in order to improve his skills. Jeff and Joel suggested that every one of them would help, altough there are specific ones that are better in some ways.
Despite they made some great points, it didn't help me that much.
From a beginner point of view, there are not one may not see (correction suggested by S. Lott) many differences between Perl & Python. I would like You to emphasize their strenghts and weaknesses and name a few unique services.
Of course, this wouldn't be fair as I could also check both of them. So here's my wishlist and requirements to help You help me.
First of all, I'd like to follow OOP structures and use it fundamentally. I partly planned a multiuser CMS using MySQL and XML, so the greater the implementations are, the better. Due to its foreseen nature, string manipulation will be used intensively.
If there aren't great differences, comparisons should probably mention syntax and other tiny details that don't matter in the first place.
So, here's my question: which one should I try first -- Perl || Python?
Conclusion
Both Perl and Python have their own fans, which is great. I'd like to say I'm grateful for all participation -- there is no trace of any flame war.
I accepted the most valued answer, although there are many great mini-articles below. As suggested more often, I will go with Python first. Then I'll try Perl later on. Let me see which one fits my mind better.
During the development of my special CMS, I'm going to ask more regarding programming doubts -- because developers now can count on each other! Thank you.
Edit: There were some people suggesting to choose Ruby or Java instead. Java has actually disappointed me. Maybe it has great features, maybe it hasn't. I wouldn't enjoy using it.
In addition, I was told to use Ruby. So far, most of the developers I communicate with have quite bad opinion about Ruby. I'll see it myself, but that's the last element on my priority list.
Perl is a very nice language and CPAN has a ton of mature modules that will save you a lot of time. Furthermore, Perl is really moving forwards nowadays with a lot of interesting projects (unlike what uninformed fanboys like to spread around). Even a Perl 6 implementation is by now releasing working Perl 6.
I you want to do OO, I would recommend Moose.
Honestly, the "majority" of my programming has been in Perl and PHP and I recently decided to do my latest project in Python, and I must admit it is very nice to program with. I was hesitant of the whole no curly braces thing as that's what I've always done, but it is really very clean. At the end of the day, though, you can make good web applications with all 3, but if you are dead-set on dropping PHP to try something new I would recommend Python and the Django framework.
I'd go with Perl. Not everyone will agree with me here, but it's a great language well suited to system administration work, and it'll expose you to some more functional programming constructs. It's a great language for learning how to use the smallest amount of code for a given task, as well.
For the usage scenario you mentioned though, I think PHP may be your best bet still. Python does have some great web frameworks, however, so if you just want to try out a new language for developing web applications, Python might be your bet.
I have no experience with Python. I vouch strongly to learn Perl, not out of attrition, but because there is a TON to learn in the platform. The key concepts of Perl are: Do What I Mean (DWIM) and There's More Than One Way To Do It (TMTOWTDI). This means, hypothetically there's often no wrong way to approach a problem if the problem is adequately solved.
Start with learning the base language of Perl, then extend yourself to learning the key Perl modules, like IO::File, DBI, HTML::Template, XML::LibXML, etc. etc. search.cpan.org will be your resource. perlmonks.org will be your guide. Just about everything useful to do will likely have a module published.
Keep in mind that Perl is a dynamic and loosely structured language. Perl is not the platform to enforce draconian OOP standards, but for good reason. You'll find the language extremely flexible.
Where is Perl used? System Admins use it heavily, as already mentioned. You can still do excellent web apps either by simple CGI or MVC framework.
I haven't worked with Python much, but I can tell why I didn't like about Perl when I used it.
OO support feels tacked on. OO in perl is very different from OO support in the other languages I've used (which include things like PHP, Java, and C#)
TMTOWTDI (There's More Than One Way To Do It). Good idea in theory, terrible idea in practice as it reduces code readability.
Perl uses a lot of magic symbols.
Perl doesn't support named function arguments, meaning that you need to dig into the #_ array to get the arguments passed to a function (or rather, a sub as perl doesn't have the function keyword). This means you'll see a lot of things like the example below (moved 'cause SO doesn't like code in numbered lists)
Having said all that, I'd look into Python. Unless you want to go with something heavier-weight like C++ or C#/Java.
Oh, before I forgot: I wanted to put an example for 4 above, but SO doesn't like putting code in numbered lists:
sub mySub {
#extremely common to see in Perl, as built-ins operators operate on the $_ scalar or #_ array implicitly
my $arg1 = shift;
my $arg2 = shift;
}
I recently made the step from Perl over to Python, after a couple of Perl-only years. Soon thereafter I discovered I had started to read through all kinds of Python-code just as it were any other easy to read text — something I've never done with Perl. Having to delve into third-party Perl code has always been kind of a nightmare for me, so this came as a very very nice surprise!
For me, this means Python-code is much easier to maintain, which on the long run makes Python much more attractive than Perl.
Python is clean and elegant, and the fact that LOTS of C APIs have been wrapped, gives you powerful hooks to much. I also like the "Zen of Python".
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to
do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
As a Perl programmer, I would normally say Perl. But coming from PHP, I think Perl is too similar and you won't actually get that much out of it. (Not because there isn't a lot to learn, but you are likely to program in Perl using the same style as you program in PHP.)
I'd suggest something completely different: Haskell (suggested by Joel), Lisp, Lua, JavaScript or C. Any one of these would make you a better programmer by opening up new ways of looking at the world.
But there's no reason to stop learning PHP in the meantime.
For a good look at the dark side of these languages, I heartily recommend: What are five things you hate about your favorite language?
I suggest going through a beginner tutorial of each and decide for yourself which fits you better. You'll find you can do what you need to do in either:
Python Tutorial (Python Classes)
Perl Tutorial (Perl Classes)
(Couldn't find a single 'official' perl tutorial, feel free to suggest one)
In my experience python provides a cleaner, more straight-forward experience.
My issues with perl:
'use strict;', Taint, Warnings? - Ideally these shouldn't be needed.
Passing variables: #; vs. $, vs shift
Scoping my, local, ours? (The local defintion seems to particularly point out some confusion with perl, "You really probably want to be using my instead, because local isn't what most people think of as "local".".)
In general with my perl skills I still find my self referencing documentation for built-in features. Where as in python I find this less so. (I've worked in both roughly the same amount of time, but my general programming expereince has grown with time. In other words, I'd probably be a better perl programmer now)
If your a unix command line guru though, perl may come more naturally to you. Or, if your using it mainly as a replacement or extension to command line admin tasks, it may suit your needs fine. In my opinion perl is "faster on the draw" at the command line than python is.
Why isn't there Ruby on your list? Maybe you should give it a try.
"I'd like to follow OOP structure..." advocates for Python or, even more so if you're open, Ruby. On the other hand, in terms of existing libraries, the order is probably Perl > Python >> Ruby. In terms of your career, Perl on your resume is unlikely to make you stand out, while Python and Ruby may catch the eye of a hiring manager.
As a PHP programmer, you are probably going to see all 3 as somewhat "burdensome" to get a Web page up. All have good solutions for Web frameworks, but none is quite as focussed on rendering a Web page as is PHP.
I think that Python is quite likely to be a better choice for you than Perl. It has many good resources, a large community (although not as large as Perl, probably), "stands out" a little on a resume, and has a good reputation.
If those 2 are your only choices, I would choose Python.
Otherwise you should learn javascript.
No I mean really learn it...
If you won't be doing web development with this language, either of them would do. If you are, you may find that doing web development in perl is a bit more complicated, since all of the frameworks require more knowledge of the language.
You can do nice things in both, but my opinion is that perl allows more rapid development. Also, perl's regexes rock!
Every dynamic language is from same family. It does not matter Which is the tool you work with it matter how you do..
PHP VS PYTHON OT PERL OR RUBY? Stop it
As many comments mentioned python is cleaner well sometime whose curly brackets are use full to. You just have to practice.

Is there a library that will detect the source code language of a block of code? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
Writing a python script and it needs to find out what language a block of code is written in. I could easily write this myself, but I'd like to know if a solution already exists.
Pygments is insufficient and unreliable.
Pygments can guess too. Here is an example from the documentation:
>>> from pygments.lexers import guess_lexer, guess_lexer_for_filename
>>> guess_lexer('#!/usr/bin/python\nprint "Hello World!"')
<pygments.lexers.PythonLexer>
>>> guess_lexer_for_filename('test.py', 'print "Hello World!"')
<pygments.lexers.PythonLexer>
I guess you should try what this very site uses: google-code-prettify (from this question)
[EDIT]J.F. Sebastian pointed me to Pygments (see this answer)
This can be a little difficult to do reliably. For example, what language is the following:
print("blah");
The most reliable way (aside from having the user select the correct language, of course) is to check if the first line is starts with #! ("hashbang") - whatever is after this is the intepreter for the scripting language.
That will work reliably for a lot of scripting languages (including python, shell scripting, perl, ruby etc etc..), but not for compiled languages..
You could look for unique syntax stylings, or specific keywords and weight each one towards a specific language. For example $#somevar is probably Perl. somevar.each do |another| ..... end is probably ruby.. but this would end up being a lot of work, and will not always work (especially with short code blocks)
The other obvious way is to use the file-extension. If it's *.pl it's probably Perl code..
What are you trying to achieve? If you want to syntax highlight, look at what google-code-prettify does - basically a reasonably intelligent, generic syntax highlighter..
In the above above ambiguous example, print is probably a statement or function name, "blah" is probably a string. If you highlight those two differently, you've successfully highlighted a lot of different languages, without having to detect what one it actually is.. but that may not always work, depending on the task..
Ohcount has been developed for this exactly:
http://labs.ohloh.net/ohcount
They are using it at www.ohloh.net to count the contribution of people in languages.
The bad news is that it is coded in ruby, but I am sure that you can integrate it one way or the other in python.
Since you asked this question, GitHub have released the code they use to detect programming languages, Linguist. In my experience, GitHub is very accurate.
Language detection
Linguist defines the list of all languages known to GitHub in a yaml file. In order for a file to be highlighted, a language and lexer must be defined there.
Most languages are detected by their file extension. This is the fastest and most common situation.
For disambiguating between files with common extensions, we use a bayesian classifier. For an example, this helps us tell the difference between .h files which could be either C, C++, or Obj-C.
Ruby gem: http://rubygems.org/gems/github-linguist
If you can't use Ruby for whatever reason, the logic is simple enough to port https://github.com/github/linguist/blob/master/lib/linguist/language.rb
Vim uses a bunch of interesting tests and regular expressions to look for certain file formats. You can look at the vim instruction file at vim/vim71/filetype.vim, or here online.
what language a block of code is written in
What are your alternatives, among what languages? There is no way to determine this universally. But if you narrow your focus there is probably a tool somewhere
You can check highlight.js which automatically highlights the code block, they say they are using some kind of heuristic methods to accomplish this http://softwaremaniacs.org/soft/highlight/en/
As other have said Pygments will be your best bet.

Which of these scripting languages is more appropriate for pen-testing? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
First of all, I want to avoid a flame-war on languages. The languages to choose from are Perl, Python and Ruby . I want to mention that I'm comfortable with all of them, but the problem is that I can't focus just on one.
If, for example, I see a cool Perl module, I have to try it out. If I see a nice Python app, I have to know how it's made. If I see a Ruby DSL or some Ruby voodoo, I'm hooked on Ruby for a while.
Right now I'm working as a Java developer, but plan on taking CEH in the near future. My question is: for tool writing and exploit development, which language do you find to be the most appropriate?
Again, I don't want to cause a flame-war or any trouble, I just want honest opinions from scripters that know what they're doing.
One more thing: maybe some of you will ask "Why settle on one language?". To answer this: I would like to choose only one language, in order to try to master it.
You probably want Ruby, because it's the native language for Metasploit, which is the de facto standard open source penetration testing framework. Ruby's going to give you:
Metasploit's framework, opcode and shellcode databases
Metasploit's Ruby lorcon bindings for raw 802.11 work.
Metasploit's KARMA bindings for 802.11 clientside redirection.
Libcurl and net/http for web tool writing.
EventMachine for web proxy and fuzzing work (or RFuzz, which extends the well-known Mongrel webserver).
Metasm for shellcode generation.
Distorm for x86 disassembly.
BinData for binary file format fuzzing.
Second place here goes to Python. There are more pentesting libraries available in Python than in Ruby (but not enough to offset Metasploit). Commercial tools tend to support Python as well --- if you're an Immunity CANVAS or CORE Impact customer, you want Python. Python gives you:
Twisted for network access.
PaiMei for program tracing and programmable debugging.
CANVAS and Impact support.
Dornseif's firewire libraries for remote debugging.
Ready integration with WinDbg for remote Windows kernel debugging (there's still no good answer in Ruby for kernel debugging, which is why I still occasionally use Python).
Peach Fuzzer and Sully for fuzzing.
SpikeProxy for web penetration testing (also, OWASP Pantera).
Unsurprisingly, a lot of web work uses Java tools. The de facto standard web pentest tool is Burp Suite, which is a Java swing app. Both Ruby and Python have Java variants you can use to get access to tools like that. Also, both Ruby and Python offer:
Direct integration with libpcap for raw packet work.
OpenSSL bindings for crypto.
IDA Pro extensions.
Mature (or at least reasonable) C foreign function interfaces for API access.
WxWindows for UI work, and decent web stacks for web UIs.
You're not going to go wrong with either language, though for mainstream pentest work, Metasploit probably edges out all the Python benefits, and at present, for x86 reversing work, Python's superior debugging interfaces edge out all the Ruby benefits.
Also: it's 2008. They're not "scripting languages". They're programming languages. ;)
[Disclaimer: I am primarily a Perl programmer, which may be colouring my judgement. However, I am not a particularly tribal one, and I think on this particular question my argument is reasonably objective.]
Perl was designed to blend seamlessly into the Unix landscape, and that is why it feels so alien to people with a mainly-OO background (particularly the Java school of OOP). For that reason, though, it’s incredibly widely installed on machines with any kind of Unixoid OS, and many vendor system utilities are written in it. Also for the same reason, servers that have neither Python nor Ruby installed are still likely to have Perl on them, again making it important to have some familiarity with. So if your CEH activity includes extensive activity on Unix, you will have to have some amount of familiarity with Perl anyway, and you might as well focus on it.
That said, it is largely a matter of preference. There is not much to differentiate the languages; their expressive power is virtually identical. Some things are a little easier in one of the languages, some a little easier in another.
In terms of libraries I do not know how Ruby and Python compare against each other – I do know that Perl has them beat by a margin. Then again, sometimes (particularly when you’re looking for libraries for common needs) the only effect of that is that you get deluged with choices. And if you are only looking to do things in some particular area which is well covered by libraries for Python or Ruby, the mass of other stuff on CPAN isn’t necessarily an advantage. In niche areas, however, it matters, and you never know what unforeseen need you will eventually have (err, by definition).
For one-liner use on the command line, Python is kind of a non-starter.
In terms of interactive interpreter environment, Perl… uhm… well, you can use the debugger, which is not that great, or you can install one from CPAN, but Perl doesn’t ship a good one itself.
So I think Perl does have a very slight edge for your needs in particular, but only just. If you pick Ruby you’ll probably not be much worse off at all. Python might inconvenience you a little more noticeably, but it too is hardly a bad choice.
I could make an argument for all three :-)
Perl has all of CPAN - giving you a huge advantage in pulling together functionality quickly. It also has a nice flexible testing infrastructure that means you can plug lots of different automated testing styles (including tests in other languages) in the same framework.
Ruby is a lovely language to learn - and lacks some of the cruft in Perl 5. If you're doing web based testing it also has the watir library - which is trez useful (see http://wtr.rubyforge.org/)
Python - nice language and (while it's not to my personal preference) some folk find the way its structured easier to get to grips with.
Any of them (and many others) would be a great language to learn.
Instead of looking at the language - I'd look at your working environment. It's always easier to learn stuff if you have other folk around who are doing similar stuff. If you current dev/testing folk are already focussed on one of the above - I'd go for that. If not, pick the one that would be most applicable/useful to your current working environment. Chat to the rest of your team and see what they think.
That depends on the implementation, if it will be distributed I would go with Java, seeing as you know that, because of its portability. If it is just for internal use, or will be used in semi-controlled environments, then go with whatever you are the most comfortable maintaining, and whichever has the best long-term outlook.
Now to just answer the question, I would go with Perl, but I'm a linux guy so I may be a bit biased in this.
If you plan on using Metasploit for pen-testing and exploit development I would recommend ruby as mentioned previously Metasploit is written in ruby and any exploit/module development you may wish to do will require ruby.
If you will be using Immunity CANVAS for pen testing then for the same reasons I would recommend Python as CANVAS is written in python. Also allot of fuzzing frameworks like Peach and Sulley are written in Python.
I would not recommend Perl as you will find very little tools/scripts/frameworks related to pen testing/fuzzing/exploits/... in Perl.
As your question is "tool writing and exploit development" I would recommend Ruby if you choose Metasploit or python if you choose CANVAS.
hope that helps :)
Speaking as a CEH, learn the CEH material first. This will expose you to a variety of tools and platforms used to mount various kinds of attacks. Once you understand your target well, look into the capabilities of the tools and platforms already available (the previously mentioned metasploit framework is very thorough and robust). How can they be extended to meet your needs? Once you know that, you can compare the capabilities of the languages.
I would also recommend taking a look at the tools available on the BackTrack distro.
All of them should be sufficient for that. Unless you need some library that is only available in one language, I'd let personal preference guide me.
If you're looking for a scripting language that will play well with Java, you might want to look at Groovy. It has the flexibility and power of Perl (closures, built in regexes, associative arrays on every corner) but you can access Java code from it thus you have access to a huge number of libraries, and in particular the rest of the system you're developing.
metasploit is a great framework for penetration testing. It's mainly written in Ruby, so if you know that language well, maybe you can hook in there. However, to use metasploit, you don't need to know any language at all.
If you are interested in CEH, I'd take a look at Grey Hat Python. It shows some stuff that is pretty interesting and related.
That being said, any language should be fine.
Well, what kind of exploits are you thinking about? If you want to write something that needs low level stuff (ptrace, raw sockets, etc.) then you'll need to learn C. But both Perl and Python can be used. The real question is which one suits your style more?
As for toolmaking, Perl has good string-processing abilities, is closer to the system, has good support, but IMHO it's very confusing. I prefer Python: it's a clean, easy to use, easy to learn language with good support (complete language/lib reference, 3rd party libs, etc.). And it's (strictly IMHO) cool.
I'm with tqbf. I've worked with Python and Ruby. Currently I'm working with JRuby. It has all the power of Ruby with access to the Java libraries so if there is something you absolutely need a low-level language to solve you can do so with a high-level language. So far I haven't needed to really use much Java as Ruby has had the ability to do everything I've needed as an API tester.

Categories