Concerning the usage of C++ and Python in hacking.

by Terry Lambert

Apple Core OS Kernel Team

Technical lead

via Quora

Here is a 50,000-foot view of why the answer is “uh, mostly no, and you should know better”.

Note: I don’t like this type of thing being called “hacking”, but I suppose Hollywood always gets its way, and you can call it “hacking” if you want. So I will use the words “hack” and “hacking”, even though they are not appropriate.

Hacking is generally about finding the cracks and crevices where pieces of a system don’t quite fit together perfectly, and then exploiting them to get them to perform functions that they were never intended to perform.

When you are talking about hacking, then, you are universally talking about giving a system bad, malformed, or just unexpected data, and having the system act on it.

If that system is on the other end of a protocol loop (for example, a web server), and you feed it an intentionally malformed or altered data stream over a raw socket, as if you were a web browser, then yeah, you could use something like Python. Otherwise, unless Python is part of a payload, it’s a pretty useless language for hacking.
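To make “intentionally malformed” a bit more concrete, here is a minimal Python sketch that only builds the bytes of a request and never opens a socket. The function name `build_request` and the host `example.test` are hypothetical, and the particular malformation (a Content-Length header that disagrees with the actual body) is just one assumed example of unexpected data.

```python
from typing import Optional


def build_request(host: str, body: bytes, claimed_length: Optional[int] = None) -> bytes:
    """Build raw HTTP/1.1 POST bytes; claimed_length lets the header lie about the body size."""
    length = len(body) if claimed_length is None else claimed_length
    head = (
        "POST / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {length}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


# Hypothetical host name, used for illustration only.
well_formed = build_request("example.test", b"a=1")
malformed = build_request("example.test", b"a=1", claimed_length=9999)
```

The point of the sketch is only that the “client” is free to send whatever bytes it likes; whether the far end copes with the mismatch is the question being asked.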

So if you are doing a remote exploit: Python is OK, but not my first choice. C++ is almost never my first choice, unless I’m attacking something running .NET, using specialized DLLs, and doing the attack from a Windows platform that normally runs the client for the remote service, where that client uses those DLLs.

I suppose if someone is stupid enough to design their overall system to store credit card numbers on the same server that hosts their store front, it could be useful to hack that way.

Local exploits are another matter. The only way a local exploit works is if the system being exploited runs the code on behalf of the attacker.

Unless there’s a Python interpreter installed on the system being exploited: Python is totally useless for anything but on-the-fly payload generation of non-Python code, as part of a combined remote exploit + local exploit payload attack.

Likewise, it’s highly unlikely that they’re going to install developer tools for you on the system you are attempting to penetrate, so the language your attack is written in is going to be largely unimportant. Not entirely, mind you, but largely.

So there are several ways to do an exploit.

The primary mechanism is component replacement, which requires that you are somehow able to replace a component on the target system.

If you do, that replacement component will need to have the same ABI as the component it’s replacing. In that case, you’re either going to use hand-coded assembly (because you know the ABI intimately), or you are going to use whatever language the component was written in originally, in order to get a plug-compatible replacement.

For WordPress sites, that’s PHP, or (rarely) SQL database triggers, or (even more rarely) compiled helper executables.

For everything else, these will mostly be C, C++, or Objective-C. Very rarely assembly.

The next mechanism is a stack smash.

For a stack smash, you have to overwrite the function return address in the stack frame, and have it return to an unexpected location, rather than the call site for the current frame. For this reason, most modern systems have instituted stack probes (more commonly known as stack canaries or stack protectors), which are meant to catch this type of overwrite before the function returns.
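One common form of that check is a guard value (a “canary”) placed between the local buffers and the saved return address. The following is a toy simulation in Python, not real machine behaviour: the frame layout, the constants `CANARY` and `RETURN_ADDR`, and all function names are hypothetical, picked only to show why an overrun that reaches the return address has to trample the guard first.

```python
import struct

BUF_SIZE = 8
CANARY = 0xDEADC0DE        # hypothetical guard value; real ones are randomized
RETURN_ADDR = 0x00401000   # hypothetical legitimate return address


def make_frame() -> bytearray:
    """Toy frame layout: 8-byte buffer, 4-byte canary, 4-byte return address."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<II", CANARY, RETURN_ADDR))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the frame with no bounds check."""
    frame[0:len(data)] = data


def checked_return(frame: bytearray) -> int:
    """Verify the canary before trusting the saved return address."""
    canary, ret = struct.unpack_from("<II", frame, BUF_SIZE)
    if canary != CANARY:
        raise RuntimeError("stack smashing detected")
    return ret


frame = make_frame()
unchecked_copy(frame, b"A" * BUF_SIZE)   # fits exactly; canary untouched
returned_to = checked_return(frame)
```

Copying 16 bytes instead of 8 into the same frame overwrites the canary on the way to the return address, so `checked_return` raises instead of returning.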

To get around stack probes, you can place the executable code on the stack itself, and then jump to that, and have it call your code. For this reason, most modern systems have what are commonly called “NX stacks”, which means that the page attribute on the stack page itself means that you can’t run executable code from it (“NX” is the name of the “No eXecute” bit in the Intel MMU PTE, or Page Table Entry).

To get around NX and stack probes, you write enough data to push an entire fake stack frame — as if the exploited location in the code had made a function call in that place to the exploit, and then returned directly to continue the instruction after the [non-existent] call site, as if it were a legal call. While this works, it almost inevitably crashes, if your exploit ever returns, since it will pop the fake register spill “back” into the registers, giving them invalid data (since you had no idea of what fake data to put in there to be valid data).

The binary data you put on the stack for this is called the “payload”, and you are better off doing it in assembler than any other language. Writing a payload in C or C++ is usually not useful, and generating it from C, C++, Python, or some other language is a lot harder than just doing “as -o payload payload.a”.

The next method is a buffer overflow attack.

To do this, you rely on a lack of bounds checking to usefully corrupt one or more variables adjacent in memory to the variable you are overflowing. The “in memory” part is important, since the compiler is permitted to reorder variable layout on the stack for auto (stack) variables. This means you need to have a pretty intimate knowledge of the disassembly dump of the code segment you are attacking.
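As a rough picture of “usefully corrupting an adjacent variable”, here is a small Python simulation using a byte array as stand-in memory. The layout (an 8-byte name buffer directly followed by a 4-byte flag) and every name in it are assumptions for illustration; as noted above, a real compiler is free to arrange things differently, which is exactly why you need the disassembly.

```python
import struct

NAME_SIZE = 8  # hypothetical fixed-size buffer


def make_record() -> bytearray:
    """Toy layout: 8-byte name buffer immediately followed by a 4-byte is_admin flag."""
    return bytearray(NAME_SIZE) + bytearray(struct.pack("<I", 0))


def set_name_unchecked(record: bytearray, name: bytes) -> None:
    """No bounds check: bytes past NAME_SIZE land in the adjacent flag."""
    record[0:len(name)] = name


def set_name_checked(record: bytearray, name: bytes) -> None:
    """Bounds-checked variant: reject input that would not fit in the buffer."""
    if len(name) > NAME_SIZE:
        raise ValueError("name too long")
    record[0:len(name)] = name


def is_admin(record: bytearray) -> bool:
    """Read the flag that sits right after the name buffer."""
    return struct.unpack_from("<I", record, NAME_SIZE)[0] != 0


record = make_record()
set_name_unchecked(record, b"A" * NAME_SIZE + b"\x01")  # ninth byte spills into the flag
```

After the nine-byte write, `is_admin(record)` is true even though nothing ever set the flag on purpose; the checked variant refuses the same input.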

The next method is a data injection attack.

It involves simply relying on the other programmer performing poor input validation on the data you hand them. This is the source of the XKCD joke about little Bobby Tables.

Effectively, you find a place where the code on the other end fails to sanitize its input, and then passes that data along to another component that then acts on it.

This is one of the “cracks and crevices” I talked about up top.
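A minimal sketch of that failure, using Python’s built-in `sqlite3` module and an in-memory database. The table, the helper names, and the hostile string are all made up for illustration; the contrast between pasting input into the SQL text and passing it as a bound parameter is the only point.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (name TEXT)")
    db.executemany("INSERT INTO students VALUES (?)", [("Alice",), ("Bob",)])
    return db


def find_unsafe(db: sqlite3.Connection, name: str) -> list:
    # Input is pasted into the SQL text, so quotes in `name` change the query itself.
    return db.execute(f"SELECT name FROM students WHERE name = '{name}'").fetchall()


def find_safe(db: sqlite3.Connection, name: str) -> list:
    # Input is passed as a bound parameter and is only ever treated as data.
    return db.execute("SELECT name FROM students WHERE name = ?", (name,)).fetchall()


db = make_db()
hostile = "x' OR '1'='1"   # hypothetical unsanitized input
leaked = find_unsafe(db, hostile)
blocked = find_safe(db, hostile)
```

With the unsafe version, the hostile string turns the WHERE clause into something that matches every row; with the parameterized version, it is just an odd name that matches nothing.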

There are also “fuzzing-style” attacks.

These are fun, because what you do is generate a set of pseudo-random events, and throw them at an API boundary, until you get an unexpected behaviour. You want them to be pseudo-random, because when you find the unexpected behaviour, you’re going to want to find it again to be able to craft an exploit. Knowing that there is a problem is not useful; knowing that there is a problem, and where it lies, is useful.
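Here is a small Python sketch of why the “pseudo” matters: a seeded generator produces the same inputs on every run, so a failure can be replayed. The target `parse_record` is a made-up toy with a deliberately planted bug, and `fuzz` is a hypothetical helper, not any real fuzzing tool.

```python
import random


def parse_record(data: bytes) -> int:
    """Toy target with a planted bug: a declared length of zero is never checked."""
    if len(data) < 1:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated input")
    return payload[-1]  # raises IndexError when length == 0


def fuzz(seed: int, rounds: int = 100_000):
    """Throw seeded pseudo-random inputs at the target; return the first unexpected failure."""
    rng = random.Random(seed)
    for index in range(rounds):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(8)))
        try:
            parse_record(data)
        except ValueError:
            continue  # documented rejection, not interesting
        except Exception as exc:
            return index, data, type(exc).__name__
    return None


first = fuzz(seed=1234)
second = fuzz(seed=1234)  # same seed, same inputs, same finding
```

Because both runs use the same seed, they report the same failing input at the same position, which is what lets you go back and look at where the problem actually lies.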

Most naive first-timers’ preferred target for this type of attack is the system call layer, but it’s actually useful to attack library APIs, mostly because other people don’t (the fuzzers are harder to write).

This is akin to ice fishing, and there are some intentional “exploits” that OS vendors have coded in that aren’t actually exploits; for example, the System V tty subsystem “partial open hack”, that lets you open a modem control port tty without DCD being raised by the modem (it’s the only way to use a modem for both dial-in and dial-out on a System V UNIX system). It’s not an exploit, per se, because even though “you just have to know”, rather than it being well documented, it’s “works as designed”.

Look, you’re not going to get a full list of these things. Most “security” companies consider these lists to be proprietary, and don’t share them, because they think that their employees are smarter than everyone else’s.

Kind of like Apple with the iPhone jailbreaks, thinking no one but Apple employees could read ARM assembly code when it was written in hex.

So the answer to your question is probably “mostly, no, they use C and assembly”.

So who uses Python and C++ for hacking? Mostly script kiddies. If you aren’t an actual hacker: download Metasploit, pick your poison, add a payload, and call it a day.

NB: Before anyone gets on me for not including phishing and social engineering and yada yada yada: that’s just grifting and/or being a good con artist. Or a bad con artist, if you are calling while pretending to be Microsoft Support or the IRS.