Automated Exploit Generation & Penetration Testing

Automatic Exploit Generation

Control flow exploits allow adversaries to run hostile code on target computers. As we write these lines, control flow exploits are mostly crafted using human intelligence: a bug is found and hackers imagine how to use it to achieve adversarial goals. Automated bug-finding techniques exist but often fail to match bugs with adversarial goals and thus to translate bugs into exploits. Interestingly, in many cases machines generate trivial exploits while in others, when algorithms give up and declare a bug non-exploitable, humans manage to finish the work and exploit the bug after all. We try to understand the root causes of this observation and to develop automatic exploit generation tools that will be tested on real programs.

We proceed in steps: we first assume that the target source code and a formal model are given. We then automatically analyze the code, find bugs and turn them into control flow hijacks. We later try to avoid using the model and, ultimately, to work only with compiled machine code.
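A first building block is triaging crashes by exploitability. The sketch below is illustrative only: "./target" is a hypothetical program under test, and real exploit generators complement such triage with symbolic execution and taint tracking to check whether a crash actually gives control over the instruction pointer.

    import signal
    import subprocess

    def classify_input(target_path: str, data: bytes) -> str:
        """Run the target on one input and classify the outcome.

        This only separates 'interesting' signals from clean exits; a
        real tool would go further and test control-flow hijackability.
        """
        proc = subprocess.run([target_path], input=data,
                              capture_output=True, timeout=5)
        code = proc.returncode
        if code == 0:
            return "clean exit"
        if code > 0:
            return f"error exit status {code}"
        sig = -code  # on POSIX, negative return codes encode the signal
        if sig in (signal.SIGSEGV, signal.SIGBUS):
            return "memory corruption: candidate for exploit generation"
        if sig == signal.SIGABRT:
            return "assertion/abort: probable bug, rarely hijackable"
        return f"killed by signal {sig}"

    if __name__ == "__main__":
        # './target' is a placeholder for the program under test.
        print(classify_input("./target", b"A" * 10_000))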

As a natural complement of this plan we pursue two “sanity check” approaches:

1. Using the Apiary public repository of more than 23 million malware samples, we attempt to automatically analyze the samples and cluster them into exploit categories.

2. There exist a variety of security testing tools based on fuzzing[1] (e.g. Radamsa, Peach, MiniFuzz, Codenomicon, Spirent, beSTORM or Fuzzit). We try these tools on the same sample targets and test their efficiency (a minimal fuzzer in this spirit is sketched below).
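For concreteness, here is a minimal mutation fuzzer written with only the Python standard library; the target path and seed input are placeholders, and it complements the crash triage sketched above.

    import random
    import subprocess

    def mutate(seed: bytes, n_flips: int = 8) -> bytes:
        """Randomly overwrite a few bytes of a valid seed input."""
        data = bytearray(seed)
        for _ in range(n_flips):
            data[random.randrange(len(data))] = random.randrange(256)
        return bytes(data)

    def fuzz(target: str, seed: bytes, iterations: int = 1000):
        """Feed mutated inputs to the target and collect abnormal exits."""
        crashes = []
        for i in range(iterations):
            candidate = mutate(seed)
            try:
                proc = subprocess.run([target], input=candidate,
                                      capture_output=True, timeout=5)
            except subprocess.TimeoutExpired:
                continue  # hangs are interesting too, but skipped here
            if proc.returncode < 0:  # terminated by a signal
                crashes.append((i, proc.returncode, candidate))
        return crashes

    if __name__ == "__main__":
        # './target' and the seed are placeholders for a real campaign.
        for i, code, data in fuzz("./target", b"GET / HTTP/1.0\r\n\r\n"):
            print(f"iteration {i}: signal {-code}, input {data[:40]!r}")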

A satisfactory treatment of the problem usually reveals an intersection between the results of these three complementary approaches.

Preventive Detection

After automatically generating exploits, we study what it takes to automatically immunize programs against them, not only by fixing the bugs themselves (which is, of course, an absolute necessity) but also by injecting countermeasures that harden the code in advance. In particular, we would like to investigate probabilistic bug detection in the coming years. Anomaly detection consists in automatically characterizing the catastrophic symptoms of particularly undesirable states and adding corresponding alerts to the program. Alerts are triggered if symptoms occur despite the explicit software quality assurance measures taken by the designer. Evidently, no tool can “list in advance” all undesirable states but, at least, the symptoms foreseen in advance will be caught if bugs nonetheless occur[2]. Fault detection is, of course, not new: computers have been equipped with fault detection mechanisms since the 1950s. However, the effect of programming errors is very different from that of random bit-flips. Characterizing and statistically learning the symptoms of security-significant coding errors is a very interesting research area. In particular, we wish to go beyond simple pattern matching and infer on-the-fly the probable presence of bugs given automatically observed symptoms. Bayesian networks embedded in the protected program could be useful to that end.
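As a toy illustration of the Bayesian idea, the following sketch performs a naive-Bayes update of the probability that a bug is present given a few observed runtime symptoms; the symptom names and all probabilities are invented for illustration.

    def posterior_bug_probability(prior: float,
                                  observations: dict[str, bool],
                                  likelihoods: dict[str, tuple[float, float]]) -> float:
        """Naive-Bayes update of P(bug) from observed runtime symptoms.

        likelihoods maps each symptom name to
        (P(symptom | bug), P(symptom | no bug)).
        """
        odds = prior / (1.0 - prior)
        for name, seen in observations.items():
            p_bug, p_ok = likelihoods[name]
            if seen:
                odds *= p_bug / p_ok
            else:
                odds *= (1.0 - p_bug) / (1.0 - p_ok)
        return odds / (1.0 + odds)

    if __name__ == "__main__":
        likelihoods = {
            "latency_spike":   (0.70, 0.05),
            "heap_growth":     (0.60, 0.10),
            "assertion_retry": (0.90, 0.01),
        }
        seen = {"latency_spike": True, "heap_growth": True,
                "assertion_retry": False}
        p = posterior_bug_probability(0.01, seen, likelihoods)
        print(f"P(bug | symptoms) = {p:.3f}")  # alert above a threshold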

Automated Least Privilege Inference

The principle of least privilege requires that every component of a system can access only the information and resources necessary for its legitimate purpose. This definition requires, in turn, a precise understanding of the term “legitimate purpose” (which is sometimes not even clear to… the system designer himself).

Inferring legitimate purposes automatically is hence of great practical value. From a research perspective, the challenge is also connected to user profiling. We illustrate this with a simple example that we implemented: Mifare cards are general-purpose smart cards. When one applies these cards to a specific task (e.g. public transport or passports), s/he faces two problems:

  1. Which commands should be used to implement the application?
  2. How to define access rights within the card?

The answer to the first question starts with a detailed analysis of the tens of iso commands offered by mifare, followed by the selection of a subset of commands that will accomplish the task. The second question requires examining each data object in the card[3] and each command, to foresee the possible effects of commands on objects[4].

Despite the device’s apparent simplicity, the design of a secure mifare application is a challenging task. One needs to know the os very well and then “learn” the application. As we write these lines, a dozen specialized consultants translate client needs into command flows and access right profiles. The stakes are also very high, as the forgery of mifare-based passports can have serious national security consequences. At times, important details are overlooked and real attacks emerge. In other cases the system does not necessarily collapse, although violating the principle of least privilege is hardly “desirable”.

We currently explore automatic least privilege inference as follows: embed in a mifare card a software component that we call, by analogy, a “Flight Data Recorder” (fdr). This component acts exactly like an airplane’s fdr and records all the operations done with the card: the exact order of incoming iso commands and the exact usage made of each data object.

When one develops an application, s/he now proceeds as follows: configure the card as totally open (all files and objects openly readable, executable and modifiable, and all commands allowed), turn the fdr “on” and pass the card through the different test scenarios simulating all the device’s normal usage in the field. Once the test campaign is over, read the fdr’s recording and do the following:

  1. Block access to all unused data objects and all unused commands.
  2. All data objects used in a given way will be blocked for all other uses.
  3. All command sequences that were not witnessed by the fdr will be forbidden. In other words, the card will accept command sequences only if these commands match the prefix of a command sequence witnessed by the fdr. As soon as the command flow diverges, the card will reject the command.

A tool synthesizes these rules into a security configuration profile added to the card. The consultant’s work is now much easier: all s/he has to do is look into the combinations that remain permitted and decide whether any of them may still cause a security breach. Moreover, we may now formally check the agreement between the inferred profile and a formal model of the application, to prove or refute conformance. Because the fdr approach is expected to drastically reduce freedom, the security configuration profile is expected to be minimalistic (although not necessarily minimal) and hence proof-friendly.
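Rule 3 above amounts to prefix matching against the recorded sequences. A minimal sketch of such a filter, using a prefix tree and invented command names, could look as follows.

    class CommandFilter:
        """Accept a command only if the running sequence is a prefix of
        some sequence witnessed by the FDR during the test campaign."""

        def __init__(self, witnessed: list[list[str]]):
            # Build a prefix tree (trie) of all recorded sequences.
            self.root: dict = {}
            for seq in witnessed:
                node = self.root
                for cmd in seq:
                    node = node.setdefault(cmd, {})
            self.cursor = self.root

        def accept(self, cmd: str) -> bool:
            """Advance along the trie; reject as soon as flow diverges."""
            if cmd not in self.cursor:
                return False          # the card would reject here
            self.cursor = self.cursor[cmd]
            return True

    # The sequences below are invented for illustration.
    log = [["SELECT", "AUTHENTICATE", "READ"],
           ["SELECT", "AUTHENTICATE", "WRITE", "READ"]]
    f = CommandFilter(log)
    print([f.accept(c) for c in ["SELECT", "AUTHENTICATE", "WRITE"]])
    # -> [True, True, True]
    g = CommandFilter(log)
    print([g.accept(c) for c in ["SELECT", "READ"]])  # -> [True, False]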

After implementing a simple fdr on the basic mifare automaton, we now turn our attention to more complex systems (namely Android and ios) and to more powerful learning mechanisms.

Ontological Security

A brief look at any “good” cryptographic paper reveals that cryptographers rarely consider the meaning or even the structure of protected data. When a message is signed, hashed or encrypted, data is treated as raw bits fed into functions. Interestingly, our neighboring department (cryptography) regards this low-level treatment as a virtue rather than a limitation, because cryptographic algorithms do not assume anything about the structure of the data they process.

Information security specialists work at another abstraction level and devise methods to protect structured information. For instance, sql injections target database entries, Java bytecode verifiers check type semantics and antivirus software analyzes executable programs.

We believe that protecting data and information will become insufficient as we move into an era of ontologies and knowledge. As we write these lines, ontologies already allow autonomous cars to make driving decisions. Ontologies also entrust computers with the authority to make important military and financial decisions. Hence, it appears necessary to start formalizing the foundations of ontological security. Here the adversary does not necessarily want to access data or corrupt information but to maliciously modify inferred knowledge[5]. Little seems to exist in this area today.
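A toy example makes the threat concrete: in the forward-chaining knowledge base below (the facts, rules and driving scenario are all invented), the attacker leaves every raw fact intact and corrupts a single inference rule, thereby changing the inferred decision.

    def infer(facts: set[str], rules: list[tuple[set[str], str]]) -> set[str]:
        """Forward chaining: apply rules until no new fact is derived."""
        known = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= known and conclusion not in known:
                    known.add(conclusion)
                    changed = True
        return known

    facts = {"vehicle_ahead", "distance_short"}
    rules = [({"vehicle_ahead", "distance_short"}, "brake")]
    print("brake" in infer(facts, rules))      # True: correct decision

    # The attacker touches no data, only one inference rule.
    tampered = [({"vehicle_ahead", "distance_short", "night"}, "brake")]
    print("brake" in infer(facts, tampered))   # False: the car no longer brakes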

Network Security, Attack Scheduling & Botnets

Network Security Models

As of today, network security is an engineering discipline rather than a hard science. Standards change quickly, are complex and usually lack theoretical formalism. We endeavor to propose mathematical, semantic and algorithmic models that capture some of the complexity of real-life networks. We try to theoretically model threats such as drive-by downloads, botnets, identity theft, spam, information leakage or watering-hole attacks, and to draw meaningful conclusions applicable to the design of secure network enclaves, virtualization, multi-level security and adaptive quality-of-service management. Our goal is to either prove network security properties or foresee network problems to some extent.

In the same vein, despite Snowden’s recent revelations, there seems to be no short-term alternative to using Gmail, Facebook or Yahoo. Hence research needs to provide provable ways in which knowingly insecure infrastructures can be used securely. This is a nontrivial exercise involving a variety of technical fields (in particular data mining and hardware and software architecture). The problem can be addressed under specific hardware assumptions (i.e. devising systems with some behavioral invariants) or assuming that machines are, in essence, open. From a societal and industrial perspective, the stakes are as high as the scientific challenges.

Attack Scheduling & Botnets

Botnets are networks of infected end-hosts (bots) that are under the control of a human operator commonly known as the botmaster. While botnets recruit vulnerable machines using methods also utilized by other classes of malware (e.g. remotely exploiting software vulnerabilities or social engineering), their defining characteristic is the use of command and control (c&c) channels to connect bots to their botmasters. Bot owners are usually unaware that their computers have been hijacked to forward transmissions (e.g. spam or viruses) to other potential victims on the Internet. Computers hijacked into a botnet are often those whose owners fail to protect them adequately. A bot is often created through an Internet port that has been left open and through which a malicious program can sneak in for future activation. At some point, the botmaster can unleash the botnet’s effects by sending a single command, possibly from an Internet Relay Chat (irc) site. We study optimal botnet control and operation strategies.
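As a toy model of recruitment, the following sketch (all parameters invented) simulates a random-scanning worm: each bot probes random addresses every round and recruits the vulnerable hosts it happens to hit, producing the characteristic S-shaped growth curve.

    import random

    def simulate_spread(n_hosts: int = 10_000, p_vulnerable: float = 0.1,
                        scans_per_round: int = 5, rounds: int = 20,
                        seed_bots: int = 1) -> list[int]:
        """Toy random-scanning worm: return the botnet size per round."""
        rng = random.Random(0)
        vulnerable = set(rng.sample(range(n_hosts),
                                    int(p_vulnerable * n_hosts)))
        bots = set(list(vulnerable)[:seed_bots])
        history = [len(bots)]
        for _ in range(rounds):
            recruits = set()
            for _bot in bots:
                for _ in range(scans_per_round):
                    target = rng.randrange(n_hosts)  # blind random scan
                    if target in vulnerable and target not in bots:
                        recruits.add(target)
            bots |= recruits
            history.append(len(bots))
        return history

    print(simulate_spread())  # botnet size per round: an S-curve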

Biologically-Inspired Security, Self-Awareness & Self Modifications

Biologically-Inspired Information Security

This research stream leverages analogies between computer attacks and the propagation of biological pathogens to model and understand security risks. We apply and adapt mathematical models developed to capture the dynamics of biological systems, such as immune systems and epidemics. An increasingly popular field of research is Moving Target Defense, where researchers design systems that guarantee a given level of service even in the presence of corrupted environments. As the complexity of information systems increases and approaches the complexity of living organisms, it becomes natural to try and apply similar computational methods to both contexts.
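As an example of such reuse, here is a minimal Euler integration of the classic SIR epidemic model applied to malware, with susceptible, infected and patched machine populations; the parameters are illustrative, not calibrated on real malware data.

    def sir_step(s: float, i: float, r: float,
                 beta: float, gamma: float,
                 dt: float) -> tuple[float, float, float]:
        """One Euler step of the SIR model: S = susceptible machines,
        I = infected machines, R = patched/immune machines."""
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        return (s - new_infections,
                i + new_infections - recoveries,
                r + recoveries)

    s, i, r = 0.99, 0.01, 0.0
    for day in range(60):
        s, i, r = sir_step(s, i, r, beta=0.4, gamma=0.1, dt=1.0)
        if day % 10 == 0:
            print(f"day {day:2d}: infected fraction = {i:.3f}")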

Computer Virology

Computer virology also falls within this research stream. Since the advent of polymorphic viruses, endowed with self-rewriting modules (sometimes able to rewrite the rewriting module itself and to update the rewriting rules on the fly), identifying new viruses is mostly done using control flow analysis. However, self-rewriting viruses are usually detected locally by simple hashing, which immediately reveals any local code modification. Hashing is, however, of little help in the context of worm propagation, where the next victim does not possess any prior knowledge of the malware. An interesting way to analyze the problem, devised with two former ens students (Pablo Rauzy and Antoine Amarilli), consists in modeling viruses in λ-calculus: if T is a λ-term, we call T_V the λ-term “T infected by the λ-virus V”:

  • If T is a λ-abstraction (which we can see as a function or a program), then for any λ-term E, (T_V E) is (T E)_V.

  • If T isn't a λ-abstraction (we can see it as some data, e.g. the final result of a computation), then T_V is the value of (V* T), where V* is an arbitrary λ-abstraction defined by the creator of V.

The first point, for λ-abstractions, defines how the virus propagates while the second one defines how it acts. We are currently trying to apply this model to various categories of self-modifying viruses and to design viruses capable of rewriting their own control flow. We have encouraging results, but these still suffer from steady code expansion and from the lack of proper descriptive models.
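The two infection rules can be made executable. In the Python sketch below, λ-abstractions are modeled as callables and V* as an arbitrary payload function; infection propagates through applications and acts on data, exactly as defined above.

    def infect(term, v_star):
        """Model of T_V, the term T infected by the λ-virus V.

        If T is a 'λ-abstraction' (a callable), infection propagates:
        (T_V E) reduces to (T E)_V for every argument E.
        If T is data, the virus acts: T_V is the value of (V* T).
        """
        if callable(term):
            return lambda e: infect(term(e), v_star)
        return v_star(term)

    # V*: the payload, an arbitrary function chosen by the virus author.
    payload = lambda data: f"corrupted<{data}>"

    double = lambda x: x * 2           # an innocent program
    print(infect(double, payload)(21)) # -> corrupted<42>: result tainted

    add = lambda x: lambda y: x + y    # curried two-argument program
    print(infect(add, payload)(1)(2))  # -> corrupted<3>: infection propagates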

Self-Control: Can a Code Constrain Its Own Flow-Control?

Consider a Sudoku game with several levels of increasing difficulty. The game editor does not want the game to be reverse-engineered so as to allow passing directly to level n without having properly completed level n-1. As of today, this is prevented using obfuscation.

We explore the following idea: assume that the executable code of level n is encrypted with the solution of level n-1. Then the only way to access level n is to actually solve all the Sudokus of level n-1. Can this be generalized to other games? Consider now the game Angry Birds. In Angry Birds, the player controls two parameters: the shooting angle and the shot’s thrust. Hence, there are sequences of angles and thrusts that win the game. If the key to level n is broadcast-encrypted under a receiver set consisting of all the solutions of level n-1, then the same idea can be put to work. This, however, assumes game determinism. We have hence reverse-engineered a simple videogame found on the Internet, called “Zombie Shooter”, in which only the shooting angle matters. Our software successfully constructed the 321,923 solutions to screen #1, which proves that the idea is implementable.
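A minimal sketch of the underlying mechanism follows; the solution string, the level code and the XOR keystream “cipher” are all stand-ins chosen for illustration (a real deployment would use an authenticated cipher, and broadcast encryption when a level has many solutions).

    import hashlib

    def level_key(solution: bytes) -> bytes:
        """Derive the key for level n from the full solution of level
        n-1; without solving, the key cannot be reconstructed."""
        return hashlib.sha256(b"level-key|" + solution).digest()

    def xor_stream(data: bytes, key: bytes) -> bytes:
        """Toy XOR 'cipher' for illustration only."""
        stream = bytearray()
        counter = 0
        while len(stream) < len(data):
            stream += hashlib.sha256(key +
                                     counter.to_bytes(8, "big")).digest()
            counter += 1
        return bytes(b ^ s for b, s in zip(data, stream))

    # The publisher encrypts the level-2 code under the level-1 solution...
    blob = xor_stream(b"spawn_zombies(level=2)",
                      level_key(b"solution-of-level-1"))
    # ...and the game recovers it only by actually producing that solution.
    print(xor_stream(blob, level_key(b"solution-of-level-1")))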

It is hence legitimate to ask whether this can be extended to general programs and general games. We think that a study of solutions as a collection of “good” subgraphs of the directed graph corresponding to the potential executions of the code of level n-1 is doable. In any case, progress on this problem could have considerable practical applications for the game industry and for the point-of-sale terminal industry (ascertaining that only properly initiated transactions can trigger a given further action).

Cyber Physical & Physical Object Security

Cyber Physical and scada Security

Industrial Control Systems (ics) are considered prime targets of highly capable threat agents, such as terrorists and nation-states. ics are complex systems and so is their protection. The increased popularity of smart grids and networked process control systems (scada) raises vital security concerns. These electro-physical systems differ a lot from traditional (a.k.a. “soft”) information systems. Cyber physical systems are engineered to last much longer than average soft systems and it is not uncommon to encounter 20-year-old scada devices still in operation. They are engineered as physical command relays and are usually characterized by strong legacy constraints. Defining and formalizing cyber physical security remains to be done. We would like to explore the feasibility of mechanical system security proofs. Consider for instance a Swiss watch, a nuclear power plant or an engine with an extremely complex mechanism. Is it possible to model each and every cogwheel and bolt (formalizing size, mechanical resistance, mass, friction coefficients and other physical dimensions) and reason about their safe operation? If so, can the (delicately combined) algorithmic and mechanical operation of a nuclear power plant be proved correct? Here we do not want to abstract physics away but to capture it in a hybrid model that takes into account both mechanics and information processing.
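As a very first step toward such hybrid models, the sketch below couples a toy thermal plant with a deliberately faulty control law and an independent safety monitor, in the spirit of footnote [2]; all constants are invented.

    def plant_step(temp: float, power: float, dt: float = 1.0) -> float:
        """Toy physical model: heating proportional to power,
        Newtonian cooling toward a 20 °C ambient."""
        return temp + dt * (2.0 * power - 0.1 * (temp - 20.0))

    def buggy_controller(temp: float) -> float:
        """Deliberately faulty control law: it merely reduces power
        above the setpoint instead of cutting it."""
        return 8.0 if temp < 95.0 else 6.0

    MELTDOWN = 120.0  # safety envelope fixed independently at design time

    temp = 20.0
    for t in range(100):
        temp = plant_step(temp, buggy_controller(temp))
        if temp > MELTDOWN:  # monitor separate from the controller
            print(f"t={t}: ALERT, {temp:.1f} °C exceeds the envelope")
            break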

We exchange and cooperate with the State of Qatar on this topic.

Security in the Internet of Things

The Internet of Things (IoT) is defined as the interconnection of uniquely identifiable embedded computing devices (smart objects) via the Internet. The IoT adds autonomy and intelligence to computing objects and enables a variety of new services. According to Gartner, by 2020 there will be nearly 26 billion IoT objects (13 billion as we write these lines). Typical IoT objects are remote controls, bio-electronic implants, car parts, electricity meters, smart lamps, next-generation watches or smart thermostats, to name only a few.

The IoT is currently spreading rapidly and suffers from the absence of a systematic analysis of the security challenges that stem from the massive deployment of autonomous computing devices. For instance, breaking into one node of a vpn (even if this node is a quasi-passive kitchen appliance) may not only serve to launch attacks on other network components but also create physical damage by starting fires or provoking domestic accidents.

IoT devices are usually computationally limited and enjoy poor tamper-resistance protections. They operate in hostile environments, communicate over insecure channels and generate a lot of data.

We develop models for securing and assessing the security of smart objects, look for theoretical foundations allowing one to reason about their security, and develop software toolkits allowing their sound large-scale deployment.


[1] Wikipedia defines fuzzing as “a software testing technique, often automated or semi-automated, that involves providing invalid, unexpected, or random data to the inputs of a computer program. The program is then monitored for exceptions such as crashes, or failing built-in code assertions or for finding potential memory leaks. Fuzzing is commonly used to test for security problems in software or computer systems.”

[2] For instance, although software analysis may guarantee that the engine’s meltdown temperature will never be reached, independent tests can ascertain that even if speed exceeds safety limits because of some unknown bug, the bug’s effect will be detected and an alert launched. This was apparently not done in the (nonetheless highly critical) nuclear facility software attacked by the Stuxnet worm.

[3] e.g. keys, files, registers, etc.

[4] e.g. some keys will be used for mac computation only, others for encryption, others for authentication, etc. Leaving an encryption key open for mac use is a violation of the principle of least privilege.

[5] Note that the attacker does not necessarily need to know the raw data that defined the corrupted knowledge.