I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior myself in my code; however the current protocol I've been working on must not ever be inspected or understandable, for the security of various people.
Now this is a new subject to me, and the internet is not really resourceful for prevention against reverse engineering but rather depicts tons of information on how to reverse engineer.
Some of the things I've thought of so far are:
Code injection (calling dummy functions before and after actual function calls)
Code obfuscation (mangles the disassembly of the binary)
Write my own startup routines (harder for debuggers to bind to)
#include <stdlib.h>   /* exit */

void startup();

int _start()
{
    startup();
    exit(0);   /* the entry point has no caller to return to */
}

void startup()
{
    /* code here */
}
Runtime check for debuggers (and force exit if detected)
Function trampolines
void trampoline(void (*fnptr)(), bool ping = false)
{
    if (ping)
        fnptr();
    else
        trampoline(fnptr, true);
}
Pointless allocations and deallocations (stack changes a lot)
Pointless dummy calls and trampolines (tons of jumping in disassembly output)
Tons of casting (for obfuscated disassembly)
I mean these are some of the things I've thought of, but they can all be worked around and/or figured out by code analysts given the right time frame. Are there any other alternatives I have?
but they can all be worked around and/or figured out by code analysts given the right time frame.
If you give people a program that they are able to run, then they will also be able to reverse-engineer it given enough time. That is the nature of programs. As soon as the binary is available to someone who wants to decipher it, you cannot prevent eventual reverse-engineering. After all, the computer has to be able to decipher it in order to run it, and a human is simply a slower computer.
What Amber said is exactly right. You can make reverse engineering harder, but you can never prevent it. You should never trust "security" that relies on the prevention of reverse engineering.
That said, the best anti-reverse-engineering techniques that I've seen focused not on obfuscating the code, but instead on breaking the tools that people usually use to understand how code works. Finding creative ways to break disassemblers, debuggers, etc. is both likely to be more effective and also more intellectually satisfying than just generating reams of horrible spaghetti code. This does nothing to block a determined attacker, but it does increase the likelihood that J Random Cracker will wander off and work on something easier instead.
SafeNet Sentinel (formerly Aladdin). Caveats though: their API sucks, their documentation sucks, and both of those are great in comparison to their SDK tools.
I've used their hardware protection method (Sentinel HASP HL) for many years. It requires a proprietary USB key fob which acts as the 'license' for the software. Their SDK encrypts and obfuscates your executable & libraries, and allows you to tie different features in your application to features burned into the key. Without a USB key provided and activated by the licensor, the software cannot decrypt and hence will not run. The key even uses a customized USB communication protocol (outside my realm of knowledge, I'm not a device driver guy) to make it difficult to build a virtual key or tamper with the communication between the runtime wrapper and the key. Their SDK is not very developer friendly, and integrating the protection step into an automated build process is quite painful (but possible).
Before we implemented the HASP HL protection, there were 7 known pirates who had stripped the Dotfuscator 'protections' from the product. We added the HASP protection at the same time as a major update to the software, which performs some heavy calculation on video in real time. As best I can tell from profiling and benchmarking, the HASP HL protection only slowed the intensive calculations by about 3%. Since that software was released about 5 years ago, not one new pirate of the product has been found. The software which it protects is in high demand in its market segment, and the client is aware of several competitors actively trying to reverse engineer it (without success so far). We know they have tried to solicit help from a few groups in Russia which advertise a service to break software protection, as numerous posts on various newsgroups and forums have included the newer versions of the protected product.
Recently we tried their software license solution (HASP SL) on a smaller project, which was straightforward enough to get working if you're already familiar with the HL product. It appears to work; there have been no reported piracy incidents, but this product is in much lower demand.
Of course, no protection can be perfect. If someone is sufficiently motivated and has serious cash to burn, I'm sure the protections afforded by HASP could be circumvented.
Making code difficult to reverse-engineer is called code obfuscation.
Most of the techniques you mention are fairly easy to work around. They center on adding some useless code. But useless code is easy to detect and remove, leaving you with a clean program.
For effective obfuscation, you need to make the behavior of your program dependent on the useless bits being executed. For example, rather than doing this:
a = useless_computation();
a = 42;
do this:
a = complicated_computation_that_uses_many_inputs_but_always_returns_42();
Or instead of doing this:
if (running_under_a_debugger()) abort();
a = 42;
Do this (where running_under_a_debugger should not be easily identifiable as a function that tests whether the code is running under a debugger — it should mix useful computations with debugger detection):
a = 42 - running_under_a_debugger();
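To make the last line concrete, here is a minimal sketch, assuming Linux, where the kernel reports the PID of any attached tracer in the TracerPid field of /proc/self/status. The helper name parse_tracer_pid is made up for this illustration, and the check is deliberately easy to spot, so it only shows how the detection result gets folded into a value; a real version would have to be mixed into useful computation as described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pull the TracerPid value out of the text of
 * /proc/self/status (Linux-specific). Returns 0 if the field is absent. */
long parse_tracer_pid(const char *status_text)
{
    assert(status_text != NULL);
    const char *field = strstr(status_text, "TracerPid:");
    if (field == NULL)
        return 0;
    /* strtol skips the tab that separates the field name from the number. */
    return strtol(field + strlen("TracerPid:"), NULL, 10);
}

/* Returns 1 if some tracer (debugger, strace, ...) is attached, else 0.
 * On systems without /proc/self/status it quietly reports 0. */
int running_under_a_debugger(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_tracer_pid(buf) != 0;
}

/* The check changes the result instead of guarding an abort(). */
int compute_a(void)
{
    return 42 - running_under_a_debugger();
}
```

Calling compute_a() gives 42 in a normal run and 41 when a tracer is attached, so anything computed from it goes subtly wrong under a debugger rather than stopping at an obvious abort().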
Effective obfuscation isn't something you can do purely at the compilation stage. Whatever the compiler can do, a decompiler can do. Sure, you can increase the burden on the decompilers, but it's not going to go far. Effective obfuscation techniques, inasmuch as they exist, involve writing obfuscated source from day 1. Make your code self-modifying. Litter your code with computed jumps, derived from a large number of inputs. For example, instead of a simple call
some_function();
do this, where you happen to know the exact expected layout of the bits in some_data_structure:
goto (md5sum(&some_data_structure, 42) & 0xffffffff) + MAGIC_CONSTANT;
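The goto line above is pseudocode (standard C has no computed goto). A rough portable approximation of the same idea is a call through a table of function pointers whose index is derived from a hash of data you expect to be in a known state. This is only a sketch under assumptions: FNV-1a stands in for md5sum, and the names fnv1a, slot_for, build_table, dispatch, real_work and decoy are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8u

/* FNV-1a hash, used here only as a small stand-in for md5sum(). */
uint32_t fnv1a(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

int real_work(void) { return 1; }   /* the call we actually want */
int decoy(void)     { return 0; }   /* fills every other slot */

static int (*table[TABLE_SIZE])(void);

/* The call target depends on the contents of the data, not on a literal. */
unsigned slot_for(const unsigned char *state, size_t len)
{
    return (unsigned)(fnv1a(state, len) % TABLE_SIZE);
}

/* In a real build the slot would be precomputed offline; it is filled in
 * at run time here only to keep the sketch self-contained. */
void build_table(const unsigned char *expected, size_t len)
{
    for (unsigned i = 0; i < TABLE_SIZE; i++)
        table[i] = decoy;
    table[slot_for(expected, len)] = real_work;
}

int dispatch(const unsigned char *state, size_t len)
{
    int (*target)(void) = table[slot_for(state, len)];
    assert(target != NULL);   /* build_table() must have run first */
    return target();
}
```

With the data in its expected state, dispatch() reaches real_work(); if the data has been altered, the call will most likely land on a decoy. A static disassembly only shows an indirect call whose target is not visible without evaluating the hash.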
If you're serious about obfuscation, add several months to your planning; obfuscation doesn't come cheap. And do consider that by far the best way to avoid people reverse-engineering your code is to make it useless so that they don't bother. It's a simple economic consideration: they will reverse-engineer if the value to them is greater than the cost; but raising their cost also raises your cost a lot, so try lowering the value to them.
Now that I've told you that obfuscation is hard and expensive, I'm going to tell you it's not for you anyway. You write
current protocol I've been working on must not ever be inspected or understandable, for the security of various people
That raises a red flag. It's security by obscurity, which has a very poor record. If the security of the protocol depends on people not knowing the protocol, you've lost already.
Recommended reading:
The security bible: Security Engineering by Ross Anderson
The obfuscation bible: Surreptitious software by Christian Collberg and Jasvir Nagra
Take, for example, the AES algorithm. It's a very, very public algorithm, and it is VERY secure. Why? Two reasons: It's been reviewed by lots of smart people, and the "secret" part is not the algorithm itself - the secret part is the key which is one of the inputs to the algorithm. It's a much better approach to design your protocol with a generated "secret" that is outside your code, rather than to make the code itself secret. The code can always be interpreted no matter what you do, and (ideally) the generated secret can only be jeopardized by a massive brute force approach or through theft.
I think an interesting question is "Why do you want to obfuscate your code?" You want to make it hard for attackers to crack your algorithms? To make it harder for them to find exploitable bugs in your code? You wouldn't need to obfuscate code if the code were uncrackable in the first place. The root of the problem is crackable software. Fix the root of your problem, don't just obfuscate it.
Also, the more confusing you make your code, the harder it will be for YOU to find security bugs. Yes, it will be hard for hackers, but you need to find bugs too. Code should be easy to maintain years from now, and even well-written clear code can be difficult to maintain. Don't make it worse.
The best anti-disassembler tricks, in particular on variable word length instruction sets, are in assembler/machine code, not C. For example:
CLC
BCC over
.byte 0x09
over:
The disassembler has to resolve the problem that a branch destination is the second byte in a multi-byte instruction. An instruction set simulator will have no problem though. Branching to computed addresses, which you can cause from C, also makes the disassembly difficult to impossible, although again an instruction set simulator will have no problem with it. Using a simulator to sort out branch destinations for you can aid the disassembly process. Compiled code is relatively clean and easy for a disassembler. So I think some assembly is required.
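The snippet above looks like 6502 assembly. As a rough x86 analogue, assuming GCC or Clang inline assembly, a junk byte can be planted right after an unconditional jump. The byte 0xE8 is the first byte of a 5-byte CALL, so a simple linear-sweep disassembler may swallow the following instruction bytes as the CALL's operand. The function name add_one is made up for the sketch, and on other compilers or architectures the block compiles to a plain function.

```c
#include <assert.h>
#include <limits.h>

/* Returns x + 1. The inline assembly does no work at run time: the jump
 * always skips the junk byte, which exists only to desynchronize a
 * disassembler that decodes straight through it. */
int add_one(int x)
{
    assert(x < INT_MAX);   /* avoid signed overflow in the sketch */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile (
        "jmp 1f\n\t"       /* always taken, like CLC / BCC above */
        ".byte 0xE8\n"     /* first byte of CALL rel32; never executed */
        "1:\n"
    );
#endif
    return x + 1;
}
```

As the text says, this does not slow down a simulator or a recursive-descent disassembler that follows the jump; it only trips the simplest tools.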
I think it was near the beginning of Michael Abrash's Zen of Assembly Language where he showed a simple anti-disassembler and anti-debugger trick. The 8088/8086 had a prefetch queue; what you did was have an instruction that modified the next instruction, or one a couple ahead. If you were single-stepping, you executed the modified instruction; if your instruction set simulator did not simulate the hardware completely, you also executed the modified instruction. On real hardware running normally, the real instruction would already be in the queue, and the modified memory location wouldn't cause any damage so long as you didn't execute that string of instructions again. You could probably still use a trick like this today, as pipelined processors fetch the next instruction. Or, if you know that the hardware has separate instruction and data caches, you can modify a number of bytes ahead: if you align this code in the cache line properly, the modified byte will be written through the data cache but not the instruction cache, and an instruction set simulator that did not have proper cache simulation would fail to execute properly. I think software-only solutions are not going to get you very far.
The above are old and well known; I don't know enough about the current tools to know if they already work around such things. The self-modifying code can/will trip up the debugger, but the human can/will narrow in on the problem and then see the self-modifying code and work around it.
It used to be that the hackers would take about 18 months to work something out, DVDs for example. Now they are averaging around 2 days to 2 weeks (if motivated) (Blu-ray, iPhones, etc.). That means to me that if I spend more than a few days on security, I am likely wasting my time. The only real security you will get is through hardware (for example, your instructions are encrypted and only the processor core well inside the chip decrypts them just before execution, in a way that it cannot expose the decrypted instructions). That might buy you months instead of days.
Also, read Kevin Mitnick's book The Art of Deception. A person like that could pick up a phone and have you or a coworker hand out the secrets to the system thinking it is a manager or another coworker or hardware engineer in another part of the company. And your security is blown. Security is not all about managing the technology, gotta manage the humans too.
Many times, the fear of your product getting reverse engineered is misplaced. Yes, it can get reverse engineered; but will it become so famous over a short period of time that hackers will find it worth reverse engineering? (This job is not a small-time activity for a substantial number of lines of code.)
If it really becomes a money earner, then you should have gathered enough money to protect it using legal means such as patents and/or copyrights.
IMHO, take the basic precautions you are going to take and release it. If it becomes a target for reverse engineering, that means you have done a really good job, and you yourself will find better ways to deal with it. Good luck.
Take a read of http://en.wikipedia.org/wiki/Security_by_obscurity#Arguments_against. I'm sure others could probably also give better sources on why security by obscurity is a bad thing.
It should be entirely possible, using modern cryptographic techniques, to have your system be open (I'm not saying it should be open, just that it could be), and still have total security, so long as the cryptographic algorithm doesn't have a hole in it (not likely if you choose a good one), your private keys/passwords remain private, and you don't have security holes in your code (this is what you should be worrying about).
Since July 2013, there has been renewed interest in cryptographically robust obfuscation (in the form of Indistinguishability Obfuscation), which seems to have been spurred by original research from Amit Sahai.
Sahai, Garg, Gentry, Halevi, Raykova, Waters, Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits (July 21, 2013).
Sahai, Waters, How to Use Indistinguishability Obfuscation: Deniable Encryption, and More.
Sahai, Barak, Garg, Kalai, Paneth, Protecting Obfuscation Against Algebraic Attacks (February 4, 2014).
You can find some distilled information in this Quanta Magazine article and in that IEEE Spectrum article.
Currently the amount of resources required to make use of this technique makes it impractical, but AFAICT the consensus is rather optimistic about the future.
I say this very casually, but to everyone who's used to instinctively dismiss obfuscation technology -- this is different. If it's proven to be truly working and made practical, this is major indeed, and not just for obfuscation.
To inform yourself, read the academic literature on code obfuscation. Christian Collberg of the University of Arizona is a reputable scholar in this field; Salil Vadhan of Harvard University has also done some good work.
I'm behind on this literature, but the essential idea I'm aware of is that you can't prevent an attacker from seeing the code that you will execute, but you can surround it with code that is not executed, and it costs an attacker exponential time (using best known techniques) to discover which fragments of your code are executed and which are not.
If someone wants to spend the time to reverse your binary then there is absolutely nothing you can do to stop them. You can make it moderately more difficult but that's about it. If you really want to learn about this then get a copy of http://www.hex-rays.com/idapro/ and disassemble a few binaries.
The fact that the CPU needs to execute the code is your undoing. The CPU only executes machine code... and programmers can read machine code.
That being said... you likely have a different issue which can be solved another way. What are you trying to protect? Depending on your issue you can likely use encryption to protect your product.
To be able to select the right option, you should think about the following aspects:
1. Is it likely that "new users" do not want to pay but still want to use your software?
2. Is it likely that existing customers need more licences than they have?
3. How much are potential users willing to pay?
4. Do you want to give licences per user / concurrent users / workstation / company?
5. Does your software need training / customization to be useful?
If the answer to question 5 is "yes", then do not worry about illegal copies. They wouldn't be useful anyway.
If the answer to question 1 is "yes", then first think about pricing (see question 3).
If you answer question 2 with "yes", then a "pay per use" model might be appropriate for you.
From my experience, pay per use + customization and training is the best protection for your software, because:
New users are attracted by the pricing model (little use -> little pay)
There are almost no "anonymous users", because they need training and customization.
There are no software restrictions to scare potential customers away.
There is a continuous stream of money from existing customers.
You get valuable feedback for development from your customers, because of a long-term business relationship.
Before you think of introducing DRM or obfuscation, you might think about these points and whether they are applicable to your software.
If you are really serious about protecting your application, there is a recent paper called "Program obfuscation and one-time programs". In general, the paper gets around the theoretical impossibility results by using simple and universal hardware.
If you can't afford to require extra hardware, then there is also another paper, "On best-possible obfuscation", that gives the theoretically best-possible obfuscation amongst all programs with the same functionality and the same size. However, the paper shows that information-theoretic best-possible obfuscation implies a collapse of the polynomial hierarchy.
Those papers should at least give you sufficient bibliographical leads into the related literature if these results do not work for your needs.
Update: A new notion of obfuscation, called indistinguishability obfuscation, can mitigate the impossibility result (paper)
Code protected in a virtual machine (e.g. by the Themida packer) seemed impossible to reverse engineer at first.
But it's not that secure anymore. And no matter how you pack your code, you can always do a memory dump of any loaded executable and disassemble it with any disassembler like IDA Pro.
IDA Pro also comes with a nifty assembly-to-C source code transformer, although the generated code will look more like a pointer/address mathematical mess. If you compare it with the original, you can fix all errors and rip anything out.
No dice, you cannot protect your code from disassembly. What you can do is set up a server for the business logic and use a web service to provide it to your app. Of course, this scenario is not always possible.
To avoid reverse engineering, you must not give the code to users. That said, I recommend using an online application; however (since you gave no context), that could be pointless in your case.
Possibly your best alternative is still virtualization, which introduces another level of indirection/obfuscation that needs to be bypassed, but as SSpoke said in his answer, this technique is also not 100% secure.
The point is you won't get ultimate protection, because there is no such thing, and if there ever is, it won't last long, which means it wasn't ultimate protection in the first place.
Whatever man assembles can be disassembled.
It's usually true that (proper) disassembly is a harder task (a bit or more), so your opponent must be more skilled, but you can assume that there is always someone of that quality, and it's a safe bet.
If you want to protect something against REs, you must know at least common techniques used by REs.
Thus the words
internet is not really resourceful for prevention against reverse engineering but rather depicts tons of information on how to reverse engineer
show a bad attitude on your part. I'm not saying that to use or embed protection you must know how to break it, but to use it wisely you should know its weaknesses and pitfalls. You should understand it.
(There are examples of software using protection in the wrong way, making such protection practically nonexistent. To avoid speaking vaguely, I'll give you an example briefly described on the internet: Oxford English Dictionary Second Edition on CD-ROM v4. You can read about its failed use of SecuROM on the following page: Oxford English Dictionary (OED) on CD-ROM in a 16-, 32-, or 64-bit Windows environment: Hard-disk installation, bugs, word processing macros, networking, fonts, and so forth)
Everything takes time.
If you're new to the subject and don't have months, or rather years, to get properly into RE stuff, then go with available solutions made by others. The problem here is obvious: they are already out there, so you already know they're not 100% secure. But making your own new protection would give you only a false sense of being protected, unless you know the state of the art in reverse engineering and protection really well (but you don't, at least at this moment).
The point of software protection is to scare newbies, stall common REs, and put a smile on the face of seasoned RE after her/his (hopefully interesting) journey to the center of your application.
In business talk you may say it's all about delaying competition, as much as it is possible.
(Have a look at the nice presentation Silver Needle in the Skype by Philippe Biondi and Fabrice Desclaux, shown at Black Hat 2006).
You're aware that there is a lot of stuff about RE out there, so start reading it. :)
I mentioned virtualization, so I'll give you a link to one exemplary thread from EXETOOLS FORUM: Best software protector: Themida or Enigma Protector?. It may help you a bit in further searches.
Contrary to what most people say, based on their intuition and personal experience, I don't think cryptographically-safe program obfuscation is proven to be impossible in general.
This is one example of a perfectly obfuscated program statement to demonstrate my point:
printf("1677741794\n");
One can never guess that what it really does is
printf("%d\n", 0xBAADF00D ^ 0xDEADBEEF);
There is an interesting paper on this subject, which proves some impossibility results. It is called "On the (Im)possibility of Obfuscating Programs".
Although the paper does prove that the obfuscation making the program non-distinguishable from the function it implements is impossible, obfuscation defined in some weaker way may still be possible!
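For what it's worth, the arithmetic in the printf example above checks out, and it can be verified at compile time. A minimal sketch, assuming a C11 compiler; the names FOLDED and folded_constant are made up for the illustration:

```c
#include <assert.h>
#include <stdint.h>

/* The XOR from the example above, written with exact-width constants. */
#define FOLDED (UINT32_C(0xBAADF00D) ^ UINT32_C(0xDEADBEEF))

/* 0xBAADF00D ^ 0xDEADBEEF == 0x64004EE2 == 1677741794, checked by the
 * compiler itself, so the build fails if the claim is wrong. */
static_assert(FOLDED == UINT32_C(1677741794), "XOR of the two constants");

uint32_t folded_constant(void)
{
    return FOLDED;
}
```

Because the expression is a constant expression, a compiler will normally emit only the result, which is why the two operands cannot be recovered from the output: many different pairs XOR to the same value.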
I do not think that any code is unhackable but the rewards need to be great for someone to want to attempt it.
Having said that there are things you should do such as:
Use the highest optimization level possible (reverse engineering is not only about getting the assembly sequence, it is also about understanding the code and porting it into a higher-level language such as C). Highly optimized code can be a b---h to follow.
Make structures dense by not having larger data types than necessary. Rearrange structure members between official code releases. Rearranged bit fields in structures are also something you can use.
You can check for the presence of certain values which shouldn't be changed (a copyright message is an example). If a byte vector contains "vwxyz" you can have another byte vector containing "abcde" and compare the differences. The function doing it should not be passed pointers to the vectors but use external pointers defined in other modules as (pseudo-C code) "char *p1=&string1[539];" and "char *p2=&string2[-11731];". That way there won't be any pointers pointing exactly at the two strings. In the comparison code you then compare for "*(p1-539+i)-*(p2+11731+i)==some value". The cracker will think it is safe to change string1 because no one appears to reference it. Bury the test in some unexpected place.
Try to hack the assembly code yourself to see what is easy and what is difficult to do. Ideas should pop up that you can experiment with to make the code more difficult to reverse engineer and to make debugging it more difficult.
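The biased-pointer check from the list above could be sketched roughly as follows. Forming pointers such as &string2[-11731] is undefined behaviour in standard C, so this version keeps the biased addresses as integers (uintptr_t), which works on mainstream platforms but is still implementation-defined. It also assumes ASCII, where each byte of "vwxyz" is exactly 21 above the matching byte of "abcde". The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

char string1[] = "vwxyz";   /* the bytes being protected */
char string2[] = "abcde";   /* the reference bytes */

/* These would live in other modules, so that nothing in the checking
 * code holds an address pointing exactly at either string. */
uintptr_t p1_biased(void) { return (uintptr_t)string1 + 539u; }
uintptr_t p2_biased(void) { return (uintptr_t)string2 - 11731u; }

/* Returns 1 if every byte pair still differs by the expected amount. */
int strings_intact(void)
{
    assert(sizeof string1 == sizeof string2);
    uintptr_t p1 = p1_biased();
    uintptr_t p2 = p2_biased();
    for (size_t i = 0; i + 1 < sizeof string1; i++) {
        const char *a = (const char *)(p1 - 539u + i);
        const char *b = (const char *)(p2 + 11731u + i);
        if (*a - *b != 21)   /* 'v' - 'a' == 21 in ASCII */
            return 0;
    }
    return 1;
}
```

Patching any byte of string1 makes strings_intact() return 0. As the list says, the check is only useful if it is buried somewhere unexpected and its result feeds into later behaviour instead of an obvious exit.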
As many have already said: on a regular CPU you can't stop them from doing it, you can only delay them. As my old crypto teacher told me: you don't need perfect encryption; breaking the code just has to be more expensive than the gain. The same holds for your obfuscation.
But 3 additional notes:
It is possible to make reverse engineering impossible, BUT (and this is a very, very big but) you can't do it on a conventional CPU. I have also done a lot of hardware development, where FPGAs are often used. E.g. the Virtex-5 FX devices have a PowerPC CPU on them, and you can use the APU to implement your own CPU opcodes in hardware. You could use this facility to decrypt instructions for the PowerPC in a way that is not accessible from the outside or by other software, or even execute the command in the hardware. As the FPGA has built-in AES encryption for its configuration bitstream, nobody could reverse engineer it (unless someone manages to break AES, but then I guess we have other problems...). This is also how vendors of hardware IP protect their work.
You speak of a protocol. You don't say what kind of protocol it is, but if it is a network protocol you should at least protect it against network sniffing. This you can indeed do with encryption. But if you want to protect the en/decryption from an owner of the software, you are back to obfuscation.
Do make your program undebuggable/unrunnable. Try to use some kind of debugger detection and apply it, e.g., in some formula, or by adding the content of a debug register to a magic constant. It is much harder to crack if your program, when run in debug mode, looks as if it were running normally but performs a completely wrong computation or operation. E.g. I know some eco games that had a really nasty copy protection (I know you don't want copy protection, but it is similar): the stolen version altered the mined resources after 30 minutes of game play, and suddenly you got just a single resource. The pirate just cracked it (i.e. reverse engineered it), checked that it ran, and voilà, released it. Such slight changes in behaviour are very hard to detect, especially if they do not show up immediately but only after a delay.
So finally I would suggest:
Estimate the gain for the people reverse engineering your software, translate this into some amount of time (e.g. by using the cheapest Indian salary), and make the reverse engineering so time-consuming that its cost exceeds the gain.
Traditional reverse engineering techniques depend on the ability of a smart agent using a disassembler to answer questions about the code. If you want strong safety, you have to do things that provably prevent the agent from getting such answers.
You can do that by relying on the Halting Problem ("does program X halt?"), which in general cannot be solved. Adding programs that are difficult to reason about to your program makes your program difficult to reason about. It is easier to construct such programs than to tear them apart. You can also add code to the program that has varying degrees of difficulty for reasoning; a great candidate is the problem of reasoning about aliases ("pointers").
Collberg et al have a paper ("Manufacturing Cheap, Resilient and Stealthy Opaque Constructs") that discusses these topics and defines a variety of "opaque" predicates that can make it very difficult to reason about code:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.1946&rep=rep1&type=pdf
I have not seen Collberg's specific methods applied to production code, especially not C or C++ source code.
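To give a feel for what an opaque predicate looks like in C, here is a minimal sketch. It uses a simple arithmetic identity, not the alias-based constructs mentioned above, so treat it as an illustration only: an optimizing compiler or a symbolic analyzer can see through this particular one. All function names are made up.

```c
#include <assert.h>

/* Always returns 1: x * (x + 1) is a product of two consecutive
 * integers, hence even, and unsigned wraparound preserves parity. */
int opaquely_true(unsigned x)
{
    return ((x * (x + 1u)) & 1u) == 0u;
}

int real_computation(int v)  { return v * 2 + 1; }
int bogus_computation(int v) { return v * 3 - 7; }

/* The bogus branch is dead code, but proving that requires reasoning
 * about the predicate instead of just reading the control flow. */
int guarded(int v, unsigned noise)
{
    if (opaquely_true(noise))
        return real_computation(v);
    return bogus_computation(v);
}

/* Development-time sanity check that the predicate holds on a range. */
void check_predicate(void)
{
    for (unsigned x = 0; x < 1000u; x++)
        assert(opaquely_true(x));
}
```

Whatever value is passed as noise, guarded(v, noise) returns real_computation(v). The stronger constructs replace the arithmetic test with a question about whether two pointers can alias, which is much harder for an analysis tool to decide.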
The DashO Java obfuscator seems to use similar ideas.
http://www.cs.arizona.edu/~collberg/Teaching/620/2008/Assignments/tools/DashO/
Security through obscurity doesn't work as has been demonstrated by people much cleverer than the both of us. If you must protect the communication protocol of your customers then you are morally obliged to use the best code that is in the open and fully scrutinized by experts.
This is for the situation where people can inspect the code. If your application is to run on an embedded microprocessor, you can choose one that has a sealing facility, which makes it impossible to inspect the code or observe more than trivial parameters like current usage while it runs. (Impossible, that is, except by hardware-invasive techniques, where you carefully dismantle the chip and use advanced equipment to inspect currents on individual transistors.)
I'm the author of a reverse engineering assembler for the x86. If you're ready for a cold surprise, send me the result of your best efforts. (Contact me through my websites.) Few of the techniques I have seen in the answers would present a substantial hurdle to me. If you want to see how sophisticated reverse engineering code works, you should really study websites with reverse engineering challenges.
Your question could use some clarification. How do you expect to keep a protocol secret if the computer code is amenable to reverse engineering? If my protocol were to send an RSA encrypted message (even public key), what do you gain by keeping the protocol secret? For all practical purposes an inspector would be confronted with a sequence of random bits.
Groetjes Albert
FIRST THING TO REMEMBER ABOUT HIDING YOUR CODE: Not all of your code needs to be hidden.
THE END GOAL: My end goal for most software programs is the ability to sell different licenses that will turn on and off specific features within my programs.
BEST TECHNIQUE: I find that building in a system of hooks and filters like WordPress offers, is the absolute best method when trying to confuse your opponents. This allows you to encrypt certain trigger associations without actually encrypting the code.
The reason that you do this, is because you'll want to encrypt the least amount of code possible.
KNOW YOUR CRACKERS: Know this: The main reason for cracking code is not malicious distribution of licenses; it's actually because they NEED to change your code, and they don't really NEED to distribute free copies.
GETTING STARTED: Set aside the small amount of code that you're going to encrypt; the rest of the code should be crammed into ONE file to increase complexity and make it harder to understand.
PREPARING TO ENCRYPT: You're going to be encrypting in layers with my system, it's also going to be a very complex procedure so build another program that will be responsible for the encryption process.
STEP ONE: Obfuscate using base64 names for everything. Once done, base64 the obfuscated code and save it into a temporary file that will later be used to decrypt and run this code. Make sense?
I'll repeat since you'll be doing this again and again. You're going to create a base64 string and save it into another file as a variable that will be decrypted and rendered.
STEP TWO: You're going to read in this temporary file as a string and obfuscate it, then base64 it and save it into a second temp file that will be used to decrypt and render it for the end user.
STEP THREE: Repeat step two as many times as you would like. Once you have this working properly without decrypt errors, then you're going to want to start building in land mines for your opponents.
LAND MINE ONE: You're going to want to keep the fact that you're being notified an absolute secret. So build in a cracker attempt security warning mail system for layer 2. This will be fired letting you know the specifics about your opponent if anything is to go wrong.
LAND MINE TWO: Dependencies. You don't want your opponent to be able to run layer one, without layer 3 or 4 or 5, or even the actual program it was designed for. So make sure that within layer one you include some sort of kill script that will activate if the program isn't present, or the other layers.
I'm sure you can come up with your own land mines; have fun with it.
THING TO REMEMBER: You can actually encrypt your code instead of base64'ing it. That way a simple base64 decode won't decrypt the program.
REWARD: Keep in mind that this can actually be a symbiotic relationship between you and your opponent. I always place a comment inside layer one; the comment congratulates the cracker and gives them a promo code to use in order to receive a cash reward from you.
Make the cash reward significant with no prejudice involved. I normally say something like $500. If your guy is the first to crack the code, then pay him his money and become his friend. If he's a friend of yours he's not going to distribute your software. Ask him how he did it and how you can improve!
GOOD LUCK!
Has anyone tried CodeMorph: http://www.sourceformat.com/code-obfuscator.htm ?
Or Themida: http://www.oreans.com/themida_features.php ?
The latter looks more promising.
One thing that has not been mentioned so far:
You could run parts of the code on your side (server side, e.g. called through a REST API). This way, that code is completely inaccessible to the reverse engineer.
Of course, this only applies if
latency
traffic volume
compute and I/O power
privacy issues
are not preventing server-side execution of (parts of) your code.
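As a rough sketch of the server-side idea, using only the Python standard library: the routine secret_score, the /score path and the JSON shape are all invented for illustration. The point is that the client ships only remote_score, which knows how to ask for a result, while the body of secret_score never leaves the server.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def secret_score(values):
    """Hypothetical proprietary routine; it exists only on the server."""
    return sum(v * v for v in values) % 97


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": secret_score(payload["values"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Ignore any proxy settings so the demo talks to localhost directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def remote_score(base_url, values):
    """Client side: sends the inputs, receives only the result."""
    request = urllib.request.Request(
        base_url + "/score",
        data=json.dumps({"values": values}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request, timeout=5) as response:
        return json.loads(response.read())["score"]


if __name__ == "__main__":
    server = start_server()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(remote_score(url, [1, 2, 3]))
    server.shutdown()
    server.server_close()
```

Every call now costs a network round trip, which is exactly the latency and traffic trade-off listed above.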
I'm studying Smalltalk right now. It looks very similar to Python (actually, it's the other way around: Python is very similar to Smalltalk), so I was wondering, as a Python enthusiast, whether it's really worth my while to study it.
Apart from message passing, what other notable conceptual differences between Smalltalk and Python could open up new programming horizons for me?
In Python, the "basic" constructs such as if/else, short-circuiting boolean operators, and loops are part of the language itself. In Smalltalk, they are all just messages. In that sense, while both Python and Smalltalk agree that "everything is an object", Smalltalk goes further in that it also asserts that "everything is a message".
[EDIT] Some examples.
Conditional statement in Smalltalk:
((x > y) and: [x > z])
ifTrue: [ ... ]
ifFalse: [ ... ]
Note how and: is just a message on Boolean (itself produced as a result of passing message > to x), and the second argument of and: is not a plain expression, but a block, enabling lazy (i.e. short-circuiting) evaluation. This produces another Boolean object, which also supports the message ifTrue:ifFalse:, taking two more blocks (i.e. lambdas) as arguments, and running one or the other depending on the value of the Boolean.
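To make that concrete for a Python reader, here is a small sketch that imitates the same design in Python. The class and method names (TrueObject, FalseObject, and_, if_true_if_false, greater) are invented for this illustration, and lambdas stand in for Smalltalk blocks. Control flow is chosen by which object receives the message, not by a built-in statement:

```python
class TrueObject:
    def and_(self, block):
        # true and: [b]  ->  the answer is whatever the block evaluates to
        return block()

    def if_true_if_false(self, true_block, false_block):
        return true_block()


class FalseObject:
    def and_(self, block):
        # false and: [b]  ->  short-circuit, the block is never evaluated
        return self

    def if_true_if_false(self, true_block, false_block):
        return false_block()


TRUE, FALSE = TrueObject(), FalseObject()


def greater(a, b):
    """Stand-in for sending > to a number; bridges Python bools to our objects."""
    return TRUE if a > b else FALSE


def describe(x, y, z):
    # Mirrors: ((x > y) and: [x > z]) ifTrue: [ ... ] ifFalse: [ ... ]
    return greater(x, y).and_(lambda: greater(x, z)).if_true_if_false(
        lambda: "x is the largest",
        lambda: "x is not the largest",
    )
```

For example, describe(3, 1, 2) takes the first block and describe(1, 3, 2) takes the second, without evaluating the x > z block at all.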
As someone new to Smalltalk, the two things that really strike me are the image-based system, and that reflection is everywhere. These two simple facts appear to give rise to everything else cool in the system:
The image means that you do everything by manipulating objects, including writing and compiling code
Reflection allows you to inspect the state of any object. Since classes are objects and their sources are objects, you can inspect and manipulate code
You have access to the current execution context, so you can have a look at the stack, and from there, compiled code and the source of that code and so on
The stack is an object, so you can save it away and then resume later. Bingo, continuations!
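For comparison, Python exposes part of this through the standard inspect module. The sketch below (function names are made up for illustration) walks the live call stack and reads a local variable out of the caller's frame. It assumes CPython, since inspect.currentframe() may return None on other implementations, and it only inspects frames; it does not save and resume them the way the continuation trick above does.

```python
import inspect


def stack_names():
    """Return the names of the functions on the call stack, innermost first."""
    names = []
    frame = inspect.currentframe()
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def peek_caller_local(name):
    """Read a local variable from whoever called this function."""
    caller = inspect.currentframe().f_back
    return caller.f_locals[name]


def inner():
    return stack_names()


def outer():
    return inner()


def demo():
    counter = 7
    return peek_caller_local("counter")
```

Calling outer() yields a list starting with stack_names, inner, outer, and demo() returns the value of its own local counter as seen from the callee.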
All of the above starts to come together in cool ways:
The browser lets you explore the source of literally everything, including the VM in Squeak
You can make changes that affect your live program, so there's no need to restart and navigate your way through to whatever you're working on
Even better, when your program throws an exception you can debug the live code. You fix the bug, update the state if it's become inconsistent and then have your program continue.
The browser will tell you if it thinks you've made a typo
It's absurdly easy to browse up and down the class hierarchy, or find out what messages an object responds to, or which code sends a given message, or which objects can receive a given message
You can inspect and manipulate the state of any object in the system
You can make any two objects literally switch places with become:, which lets you do crazy stuff like stub out any object and then lazily pull it in from elsewhere if it's sent a message.
The image system and reflection have made all of these perfectly natural and normal things for a Smalltalker for about thirty years.
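Python has no built-in become:, but the lazy-stub trick can be roughly approximated for plain classes by reassigning __class__ and copying the instance dictionary. This is only a sketch under those assumptions (LazyStub and Account are invented names, and it will not work for classes using __slots__ or for built-in types). Unlike become:, the stub takes over a copy of the loaded object's state rather than swapping identities with it.

```python
class LazyStub:
    """Placeholder that turns itself into the real object on first use."""

    def __init__(self, loader):
        self.__dict__["_loader"] = loader

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. while still a stub.
        real = self.__dict__["_loader"]()
        self.__class__ = type(real)       # take over the real object's class
        self.__dict__.clear()
        self.__dict__.update(vars(real))  # and a copy of its state
        return getattr(self, name)


class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def describe(self):
        return f"{self.owner}: {self.balance}"


if __name__ == "__main__":
    stub = LazyStub(lambda: Account("alice", 100))
    print(type(stub).__name__)  # still a stub, nothing loaded yet
    print(stub.describe())      # first message triggers the load
    print(type(stub).__name__)  # every reference now sees an Account
```

Because the stub object itself changes class, anything that was already holding a reference to it sees the loaded object afterwards.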
Smalltalk historically has had an amazing IDE built in. I have missed this IDE on many languages.
Smalltalk also has the lovely property that you typically work in a living system. You start up clean and start modifying things. This is basically a persistent object storage system. That is both good and bad. What you run is part of your system and part of what you ship. The system can be set up quite nicely before being distributed. The downside is that the system has everything you run as part of what you ship. You need to be very careful packaging for redistribution.
Now, that being said, it has been a while since I have worked with Smalltalk (about 20 years). Yes, I know, fun times for those who do the math. Smalltalk is a nice language, fun to program in, fun to learn, but I have found it a little hard to ship things in.
Enjoy playing with it if you do. I have been playing with Python and loving it.
Jacob
The Smalltalk language itself is very important. It comprises a small set of powerful, orthogonal features that makes the language highly extensible. As Alan Lovejoy says:
"Smalltalk is also fun because defining and using domain specific languages isn’t an afterthought, it’s the only way Smalltalk works at all." The language notation is critically important because: "Differences in the expressive power of the programming notation used do matter." For more, read the full article here.
The language aspect often isn't that important, and many languages are quite samey,
From what I see, Python and Smalltalk share OOP ideals ... but are very different in their implementation and the power in the presented language interface.
the real value comes in what the subtle differences in the syntax allows in terms of implementation. Take a look at Self and other meta-heavy languages.
Look past the syntax and immediate semantics to what the subtle differences allow the implementation to do.
For example:
Everything in Smalltalk-80 is available for modification from within a running program
What differences between Python and Smalltalk allow deeper manipulation, if any? How does the language enable the implementation of the compiler/runtime?
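One concrete difference that can be checked from the Python side: user-defined classes can be modified while the program runs, but CPython refuses the same change on built-in types such as int. A small sketch, with names invented for illustration:

```python
class Greeter:
    def greet(self):
        return "hello"


def patch_greeter():
    """Replace a method on a live class; existing instances see the change."""
    Greeter.greet = lambda self: "hello, patched"


def try_patch_builtin():
    """Attempt the same on a built-in type and report what happens."""
    try:
        int.double = lambda self: self * 2
    except TypeError as error:
        return type(error).__name__
    return "patched"


if __name__ == "__main__":
    g = Greeter()
    print(g.greet())
    patch_greeter()
    print(g.greet())            # same instance, new behaviour
    print(try_patch_builtin())  # CPython raises TypeError here
```

In a Smalltalk-80 style image the equivalent change to a core class is an ordinary edit, which is the kind of difference the question above is pointing at.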