Compiler Design - Overview

Computer programs are formulated in a programming language and specify classes of computing processes.

Computers, however, interpret sequences of particular instructions, but not program texts. Therefore, the program text must be translated into a suitable instruction sequence before it can be processed by a computer. This translation can be automated, which implies that it can be formulated as a program itself. The translation program is called a compiler, and the text to be translated is called source code.

A compiler translates a program written in a source language into an equivalent program in a target language through a number of stages. These stages, from the recognition of tokens through target code generation, provide the basis for communication between a user and a processor.

The term compilation denotes the conversion of an algorithm expressed in a human-oriented source language to an equivalent algorithm expressed in a hardware-oriented target language. We shall be concerned with the engineering of compilers, covering organization, algorithms, data structures and user interfaces. Programming languages are tools used to construct formal descriptions of finite computations. Each computation consists of operations that transform a given initial state into some final state.

A compiler translates a program written in a high-level programming language that is suitable for human programmers into the low-level machine language that is required by computers. During this process, the compiler will also attempt to spot and report obvious programmer mistakes. Using a high-level language for programming has a large impact on how fast programs can be developed. One advantage of using a high-level language is that the same program can be compiled to many different machine languages and, hence, be brought to run on many different machines.

LANGUAGE PROCESSING SYSTEM

In a language processing system, the source program is first processed by preprocessors. The modified source program is processed by the compiler to form target assembly code, which is then translated by an assembler into relocatable object code, which is in turn processed by the linker and loader to produce the target program. We write programs in a high-level language, which is easier for us to understand and remember. These programs are then fed into a series of tools and OS components to get the desired code that can be used by the machine. This is known as a language processing system.

For a C compiler, it can be described as follows:

The user writes a program in the C language (a high-level language).

The C compiler compiles the program and translates it into an assembly program (a low-level language).

An assembler then translates the assembly program into machine code (an object file).

A linker is used to link all the parts of the program together for execution (executable machine code).

A loader loads all of them into memory, and then the program is executed.

Before diving straight into the concepts of compilers, we should understand a few other tools that work closely with compilers.

A compiler is one component in a toolchain of programs used to create executables from source code. Typically, when you invoke a single command to compile a program, a whole sequence of programs is invoked in the background.

Figure: Compiling C source code to assembly code.

Preprocessor

Preprocessors produce input to compilers. They may perform the following functions.

1. Macro processing: A preprocessor may allow a user to define macros that are short hands for longer constructs.

2. File inclusion: A preprocessor may include header files into the program text.

3. Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.

4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.

The preprocessor prepares the source code for the compiler proper. In the C and C++ languages, this means consuming all directives that start with the # symbol. For example, an #include directive causes the preprocessor to open the named file and insert its contents into the source code. A #define directive causes the preprocessor to substitute a value wherever a macro name is encountered. (Not all languages rely on a preprocessor.)
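The two directives described above can be illustrated with a toy preprocessor. This is only a sketch under simplifying assumptions: it handles object-like #define macros and quoted #include names, it expands macros even inside string literals (a real C preprocessor does not), and the files dictionary is a hypothetical stand-in for the file system.

```python
import re


def preprocess(source, files):
    """Toy preprocessor: handles '#include "name"' and '#define NAME value'.

    `files` is a hypothetical in-memory mapping from file name to contents.
    """
    macros = {}
    output = []

    def expand(line):
        # Replace whole-word occurrences of each macro name with its value.
        for name, value in macros.items():
            pattern = r"\b" + re.escape(name) + r"\b"
            line = re.sub(pattern, lambda _m, v=value: v, line)
        return line

    def process(text):
        for line in text.splitlines():
            stripped = line.strip()
            if stripped.startswith("#include"):
                # Insert the contents of the named file in place of the directive.
                name = stripped.split('"')[1]
                process(files[name])
            elif stripped.startswith("#define"):
                # Record the macro; the directive itself is consumed.
                parts = stripped.split(None, 2)
                macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            else:
                output.append(expand(line))

    process(source)
    return "\n".join(output)


files = {"limits.h": "#define MAX 100"}
program = '#include "limits.h"\nint buffer[MAX];'
result = preprocess(program, files)
print(result)  # prints: int buffer[100];
```

The compiler proper would only ever see the output line; both directives have been consumed by the time it runs.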

Interpreter

An interpreter is a program that appears to execute a source program as if it were machine language.

Languages such as BASIC, SNOBOL, and LISP can be translated using interpreters. Java also uses an interpreter, which executes the bytecode produced by its compiler.

The process of interpretation can be carried out in the following phases.

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Direct Execution
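The four phases listed above can be sketched for a tiny arithmetic language. This is an illustration only, not how any particular interpreter is built; the function names lex, parse, check, execute, and interpret, and the tuple-based syntax tree, are assumptions made for this example.

```python
import operator
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def lex(text):
    """Phase 1, lexical analysis: split the text into (kind, value) tokens."""
    tokens = []
    for number, name, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif name:
            tokens.append(("name", name))
        elif op.strip():
            tokens.append(("op", op))
    return tokens


def parse(tokens):
    """Phase 2, syntax analysis: build a tree with the usual precedence."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("end", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, value = advance()
        if kind in ("num", "name"):
            return (kind, value)
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek() in (("op", "*"), ("op", "/")):
            node = (advance()[1], node, factor())
        return node

    def expr():
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            node = (advance()[1], node, term())
        return node

    tree = expr()
    if peek()[0] != "end":
        raise SyntaxError("unexpected trailing input")
    return tree


def check(tree, env):
    """Phase 3, semantic analysis: every variable used must be defined."""
    if tree[0] == "name" and tree[1] not in env:
        raise NameError(f"undefined variable {tree[1]!r}")
    if len(tree) == 3:
        check(tree[1], env)
        check(tree[2], env)


def execute(tree, env):
    """Phase 4, direct execution: walk the tree and compute its value."""
    if tree[0] == "num":
        return tree[1]
    if tree[0] == "name":
        return env[tree[1]]
    op, left, right = tree
    return OPS[op](execute(left, env), execute(right, env))


def interpret(text, env):
    tree = parse(lex(text))
    check(tree, env)
    return execute(tree, env)


print(interpret("2 * (x + 3)", {"x": 4}))  # prints: 14
```

No target code is ever produced: the tree is evaluated directly, which is what distinguishes this sketch from a compiler.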

Advantages:

1. Modifications to the user program can be easily made and take effect as execution proceeds.

2. The type of object that a variable denotes may change dynamically.

3. Debugging a program and finding errors is simpler for an interpreted program.

4. The interpreter for the language makes programs machine independent.

Disadvantages:

1. The execution of the program is slower.

2. Memory consumption is higher.

Assembler

Programmers found it difficult to write or read programs in machine language. They began to use a mnemonic (symbol) for each machine instruction, which they would subsequently translate into machine language. Such a mnemonic machine language is now called an assembly language.

Programs known as assemblers were written to automate the translation of assembly language into machine language. The input to an assembler is called the source program; the output is a machine language translation (the object program).
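The mnemonic-to-machine-code translation can be sketched in a few lines. The instruction set below (LOAD, ADD, STORE, HALT and their opcode values) is hypothetical and belongs to no real machine; a real assembler would also handle labels, symbols, and relocation information.

```python
# Hypothetical one-byte opcodes for an imaginary machine.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source):
    """Translate assembly text into a list of machine-code bytes."""
    code = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        mnemonic, *operands = line.split()
        code.append(OPCODES[mnemonic.upper()])      # mnemonic -> opcode
        code.extend(int(op, 0) for op in operands)  # numeric operands as-is
    return code


program = """
    LOAD 10     ; load the value at address 10
    ADD 11      ; add the value at address 11
    STORE 12    ; store the result at address 12
    HALT
"""
object_program = assemble(program)
print(object_program)  # prints: [1, 10, 2, 11, 3, 12, 255]
```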

Loader

The loader is a special program that takes object code from the linker as input, loads it into main memory, and prepares it for execution by the computer.

Once the assembler produces an object program, that program must be placed into memory and executed. The assembler could place the object program directly in memory and transfer control to it, thereby causing the machine language program to be executed. This would waste core (main memory) by leaving the assembler in memory while the user’s program was being executed. Also, the programmer would have to retranslate the program with each execution, thus wasting translation time. To overcome these problems of wasted translation time and memory, system programmers developed another component called the loader.

“A loader is a program that places programs into memory and prepares them for execution.” It would be more efficient if subroutines could be translated into an object form that the loader could relocate directly behind the user’s program. The task of adjusting programs so that they may be placed in arbitrary core locations is called relocation. Relocating loaders perform four functions: allocation, linking, relocation, and loading.
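Relocation can be illustrated with a small sketch. It assumes a made-up object format in which the program is a list of words assembled as if it started at address 0, together with a set of offsets marking the words that hold addresses; the function name load and this format are assumptions for the example, not a description of any real loader.

```python
def load(memory, object_code, relocations, load_address):
    """Copy object_code into memory at load_address and relocate it.

    `relocations` holds the offsets within object_code whose words are
    addresses computed as if the program started at address 0.
    """
    for offset, word in enumerate(object_code):
        if offset in relocations:
            word += load_address  # adjust the address to the new location
        memory[load_address + offset] = word
    return load_address  # entry point: the first word of the program


memory = [0] * 32
# Hypothetical program: the word at offset 1 is the address of the data at offset 3.
object_code = [1, 3, 255, 42]
entry = load(memory, object_code, relocations={1}, load_address=16)
print(memory[16:20])  # prints: [1, 19, 255, 42]
```

After loading at address 16, the address word points at 19, which is where the data word 42 now lives.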

Loader schemes or types of loader:

Based on the above four functions, loaders are divided into different types:

i. Compile and go loader or Assemble and go loader

ii. General loader scheme

iii. Absolute loader

iv. Direct linking loader

v. Relocating loader

vi. Dynamic linking loader

Linker

A linker is a system program that links the object modules of a program into a single object file. Linkers are also called link editors. Linking is the process of collecting and combining pieces of code and data into a single file. The linker also links modules with system libraries. It takes object modules from the assembler as input and produces an executable file as output for the loader.

Linking is performed as the last step in compiling a program.


Source code -> Compiler -> Assembler -> Object code -> Linker -> Executable file -> Loader

Linking is of two types:

1. Static Linking –

Static linking is performed when the source program is built, before execution.

A static linker performs two major tasks:

1. Symbol resolution – It associates each symbol reference with exactly one symbol definition. Every symbol has a predefined task.

2. Relocation – It relocates code and data sections and modifies symbol references to point to the relocated memory locations.

2. Dynamic Linking –

Dynamic linking is performed at run time. It is accomplished by placing the name of a shareable library in the executable image. The chance of errors and failures is higher, because the library must still be found and linked when the program runs.
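The two static-linker tasks described above, symbol resolution and relocation, can be sketched as follows. The module format (a dict with code, defs, and refs) is a hypothetical simplification invented for this example; real object files carry the same kinds of information in a far richer layout.

```python
def link(modules):
    """Combine object modules into one image.

    Each module is a dict with:
      code: list of words
      defs: symbol name -> offset within this module's code
      refs: offset within this module's code -> symbol name used there
    """
    image, symbols, bases = [], {}, []

    # Lay the modules out one after another and record where each
    # defined symbol ends up in the combined image.
    for module in modules:
        base = len(image)
        bases.append(base)
        for name, offset in module["defs"].items():
            if name in symbols:
                raise ValueError(f"duplicate definition of {name}")
            symbols[name] = base + offset
        image.extend(module["code"])

    # Resolve each reference to its single definition and patch the
    # referring word with the symbol's final address.
    for module, base in zip(modules, bases):
        for offset, name in module["refs"].items():
            if name not in symbols:
                raise ValueError(f"undefined symbol {name}")
            image[base + offset] = symbols[name]

    return image, symbols


main = {"code": [1, 0, 255], "defs": {"main": 0}, "refs": {1: "counter"}}
data = {"code": [42], "defs": {"counter": 0}, "refs": {}}
image, symbols = link([main, data])
print(image)    # prints: [1, 3, 255, 42]
print(symbols)  # prints: {'main': 0, 'counter': 3}
```

An undefined or doubly defined symbol is reported at link time here, which is the static-linking counterpart of the run-time failures mentioned for dynamic linking.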

Cross-Compiler

A compiler that runs on one platform (A) and is capable of generating executable code for a different platform (B) is called a cross-compiler.

Source-to-source Compiler

A compiler that takes the source code of one programming language and translates it into the source code of another programming language is called a source-to-source compiler.

Compiler Writing Tool

A number of tools have been developed to help construct compilers. These tools range from scanner and parser generators to complex systems called compiler-compilers, compiler-generators, or translator-writing systems.

The input specification for these systems may contain:

1. A description of the lexical and syntactic structure of the source languages.

2. A description of what output is to be generated for each source language construct.

3. A description of the target machine.

The principal aids provided by compiler-compilers are:

1. Scanner generators, which take regular expressions as input.

2. Parser generators, which take context-free grammars as input.
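The idea behind a scanner generator, a scanner driven entirely by a table of regular expressions, can be sketched with Python's re module. The token names and patterns in TOKEN_SPEC are assumptions chosen for illustration; a real scanner generator would compile such a specification into a dedicated automaton rather than rely on a regex library at run time.

```python
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]

# Combine the specification into one pattern with a named group per token.
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def scan(text):
    """Return the list of (token name, lexeme) pairs found in text."""
    tokens = []
    for match in SCANNER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


print(scan("total = total + 42"))
# prints: [('IDENT', 'total'), ('OP', '='), ('IDENT', 'total'), ('OP', '+'), ('NUMBER', '42')]
```

Changing the language's lexical structure only requires editing TOKEN_SPEC; the scanning loop stays the same, which is the main benefit such tools offer.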