Computer programs are formulated in a programming language and specify classes of computing processes.
Computers, however, interpret sequences of particular instructions, but not program texts. Therefore, the program text must be translated into a suitable instruction sequence before it can be processed by a computer. This translation can be automated, which implies that it can be formulated as a program itself. The translation program is called a compiler, and the text to be translated is called source code.
A compiler translates a program written in a source language into an equivalent program in a target language through a number of stages. These stages, which start with the recognition of tokens and end with target code generation, together form the communication interface between a user and a processor.
The term compilation denotes the conversion of an algorithm expressed in a human-oriented source language to an equivalent algorithm expressed in a hardware-oriented target language. We shall be concerned with the engineering of compilers covering organization, algorithms, data structures and user interfaces. Programming languages are tools used to construct formal descriptions of finite computation. Each computation consists of operations that transform a given initial state into some final state.
A compiler translates a program written in a high-level programming language, which is suitable for human programmers, into the low-level machine language that is required by computers. During this process, the compiler will also attempt to spot and report obvious programmer mistakes. Using a high-level language for programming has a large impact on how fast programs can be developed. One advantage of using a high-level language is that the same program can be compiled to many different machine languages and, hence, be made to run on many different machines.
For a C compiler, the process can be described as follows:
1. The user writes a program in the C language (a high-level language).
2. The C compiler translates the program into an assembly program (a low-level language).
3. An assembler then translates the assembly program into machine code (object code).
4. A linker links all the parts of the program together for execution (executable machine code).
5. A loader loads all of them into memory, and then the program is executed.
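The compiler step above (high-level text in, low-level instructions out) can be illustrated with a minimal sketch. It is written in Python rather than C, it borrows Python's own `ast` parser instead of implementing one, and the mnemonics `PUSH`, `ADD`, `SUB` and `MUL` belong to a hypothetical stack machine invented for this example.

```python
import ast

# Hypothetical mnemonics for a toy stack machine (not a real instruction set).
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def compile_expr(source: str) -> list[str]:
    """Translate an integer arithmetic expression into toy assembly lines."""
    tree = ast.parse(source, mode="eval")  # shortcut: reuse Python's parser
    lines: list[str] = []

    def emit(node: ast.AST) -> None:
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            lines.append(f"PUSH {node.value}")
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)                          # operands first ...
            emit(node.right)
            lines.append(BINARY_OPS[type(node.op)])  # ... then the operator
        else:
            raise SyntaxError(f"unsupported construct: {ast.dump(node)}")

    emit(tree.body)
    return lines

print("\n".join(compile_expr("2 + 3 * 4")))
# PUSH 2
# PUSH 3
# PUSH 4
# MUL
# ADD
```

Note that the sketch also reports a mistake (an unsupported construct) instead of silently producing wrong code, which mirrors the error-reporting role of a compiler mentioned earlier.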
Before diving straight into the concepts of compilers, we should understand a few other tools that work closely with compilers.
A compiler is one component in a toolchain of programs used to create executables from source code. Typically, when you invoke a single command to compile a program, a whole sequence of programs is invoked in the background.
A preprocessor produces input for a compiler. It may perform the following functions.
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessors: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.
The preprocessor prepares the source code for the compiler proper. In the C and C++ languages, this means consuming all directives that start with the # symbol. For example, an #include directive causes the preprocessor to open the named file and insert its contents into the source code. A #define directive causes the preprocessor to substitute a value wherever a macro name is encountered. (Not all languages rely on a preprocessor.)
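The two directives just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `files` dictionary is a hypothetical stand-in for the file system, only object-like macros are handled, substituted text is not rescanned, and string literals are not protected from substitution as they would be in a real C preprocessor.

```python
import re

def preprocess(text, files, macros=None):
    """Expand #include and object-like #define directives, C-style."""
    macros = {} if macros is None else macros
    output = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if parts and parts[0] == "#include":
            # File inclusion: splice in the preprocessed contents of the file.
            name = parts[1].strip('"<>')
            output.append(preprocess(files[name], files, macros))
        elif parts and parts[0] == "#define":
            # Macro definition: remember the replacement text for the name.
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
        else:
            # Macro expansion: replace whole identifiers that name a macro.
            output.append(re.sub(r"[A-Za-z_]\w*",
                                 lambda m: macros.get(m.group(), m.group()),
                                 line))
    return "\n".join(output)

files = {"config.h": "#define SIZE 64\nint version;"}
source = '#include "config.h"\nint buffer[SIZE];'
print(preprocess(source, files))
# int version;
# int buffer[64];
```

The macro table is shared with included files, so a macro defined in a header stays visible in the including file, as in C.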
An interpreter is a program that appears to execute a source program as if it were machine language.
Languages such as BASIC, SNOBOL and LISP are commonly implemented using interpreters. Java also uses an interpreter: its compiler produces bytecode, which the Java virtual machine then interprets.
The process of interpretation can be carried out in the following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
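The four phases can be sketched as four small Python functions. This is a minimal illustration, assuming a toy language of integer expressions with `+`, `*`, parentheses and variables; the function names and the nested-tuple tree format are choices made for this example only.

```python
import re

def tokenize(text):
    """Lexical analysis: split the text into number, name and operator tokens."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("illegal character in input")
    return tokens

def parse(tokens):
    """Syntax analysis: build a nested-tuple tree by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def check(tree, env):
    """Semantic analysis: every variable must be defined before execution."""
    if tree[0] == "var" and tree[1] not in env:
        raise NameError(f"undefined variable {tree[1]!r}")
    if tree[0] in ("+", "*"):
        check(tree[1], env)
        check(tree[2], env)

def execute(tree, env):
    """Direct execution: evaluate the tree without generating target code."""
    kind = tree[0]
    if kind == "num":
        return tree[1]
    if kind == "var":
        return env[tree[1]]
    left, right = execute(tree[1], env), execute(tree[2], env)
    return left + right if kind == "+" else left * right

def interpret(text, env):
    tree = parse(tokenize(text))
    check(tree, env)
    return execute(tree, env)

print(interpret("2 + x * (3 + 1)", {"x": 5}))  # 22
```

The first three phases are the same as in a compiler; only the last one differs, because the tree is executed directly instead of being translated.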
Advantages of interpretation:
1. Modifications to the user program can be made easily and take effect as execution proceeds.
2. The type of object that a variable denotes may change dynamically.
3. Debugging and finding errors is simpler for an interpreted program.
4. An interpreter for the language makes programs machine independent.
Disadvantages of interpretation:
1. Execution of the program is slower.
2. Memory consumption is higher.
Programmers found it difficult to write or read programs in machine language. They began to use a mnemonic (symbol) for each machine instruction, which they would subsequently translate into machine language. Such a mnemonic machine language is now called an assembly language.
Programs known as assemblers were written to automate the translation of assembly language into machine language. The input to an assembler is called the source program; the output is its machine language translation (the object program).
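The translation from mnemonics to machine language can be sketched as a small two-pass assembler. The opcode values, the two-word instruction format and the `;` comment syntax below are assumptions made for a hypothetical toy machine, not those of any real processor.

```python
# Hypothetical opcode table for a toy machine; real instruction sets differ.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0xFF}

def assemble(source):
    """Translate toy assembly into a list of words: [opcode, operand, ...]."""
    # Pass 1: strip comments and record the address of every label.
    labels, instructions = {}, []
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = 2 * len(instructions)  # two words each
        else:
            instructions.append(line.split())
    # Pass 2: translate mnemonics and resolve label operands.
    code = []
    for mnemonic, *operands in instructions:
        operand = operands[0] if operands else "0"
        value = labels[operand] if operand in labels else int(operand, 0)
        code += [OPCODES[mnemonic], value]
    return code

program = """
start:
    LOAD 10      ; fetch the word at address 10
    ADD 11       ; add the word at address 11
    STORE 12     ; store the sum at address 12
    JMP start
    HALT
"""
print(" ".join(f"{word:02X}" for word in assemble(program)))
# 01 0A 02 0B 03 0C 04 00 FF 00
```

Two passes are used so that a jump may refer to a label that is defined later in the source program.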
The loader is a special program that takes object code from the linker as input, loads it into main memory, and prepares it for execution by the computer.
Once the assembler produces an object program, that program must be placed into memory and executed. The assembler could place the object program directly in memory and transfer control to it, thereby causing the machine language program to be executed. This would waste core (main memory) by leaving the assembler in memory while the user's program was being executed. The programmer would also have to retranslate the program with each execution, thus wasting translation time. To overcome these problems of wasted translation time and memory, system programmers developed another component, called the loader.
“A loader is a program that places programs into memory and prepares them for execution.” It would be more efficient if subroutines could be translated into an object form that the loader could place directly behind the user's program. The task of adjusting programs so that they may be placed in arbitrary core locations is called relocation. Relocating loaders perform four functions: allocation, linking, relocation, and loading.
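Relocation can be made concrete with a short sketch. It assumes a hypothetical object format in which the program is assembled as if it started at address 0 and is accompanied by a list of the positions that hold addresses; the loader adds the actual load address to exactly those positions.

```python
def relocate(code, relocation_offsets, load_address):
    """Return a copy of `code` adjusted to run at `load_address`."""
    relocated = list(code)
    for offset in relocation_offsets:
        relocated[offset] += load_address  # address-bearing words only
    return relocated

def load(memory, code, relocation_offsets, load_address):
    """Place the relocated program into memory and return its entry point."""
    end = load_address + len(code)
    if end > len(memory):
        raise MemoryError("program does not fit in memory")
    memory[load_address:end] = relocate(code, relocation_offsets, load_address)
    return load_address

# Toy program assembled at origin 0: JMP 2 (opcode 0x04), then HALT (0xFF).
object_code = [0x04, 2, 0xFF, 0]
memory = [0] * 16
entry = load(memory, object_code, relocation_offsets=[1], load_address=8)
print(entry, memory[8:12])  # 8 [4, 10, 255, 0]
```

Only the jump target changes (from 2 to 10); opcodes and constants are left alone, which is why the object program must say which words are addresses.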
Loader schemes (types of loader):
Based on how they carry out the above four functions, loaders are divided into the following types:
i. Compile and go loader or Assemble and go loader
ii. General loader scheme
iii. Absolute loader
iv. Direct linking loader
v. Relocating loader
vi. Dynamic linking loader
A linker is a system program that combines the object modules of a program into a single object file. Linkers are also called link editors. Linking is the process of collecting and combining pieces of code and data into a single file. A linker can also link a module against system libraries. It takes object modules from the assembler as input and produces an executable file as output for the loader.
Linking is the last step in building a program.
Source code -> compiler -> Assembler -> Object code -> Linker -> Executable file -> Loader
Linking is of two types: static and dynamic.
Static linking is performed at build time, after the source program has been compiled and before it is executed.
A static linker performs two major tasks:
1. Symbol resolution: It associates each symbol reference with exactly one symbol definition.
2. Relocation: It relocates the code and data sections and modifies symbol references to point to the relocated memory locations.
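Both tasks can be sketched with a toy static linker. The module format below (a `code` list, a `defs` table mapping symbols to offsets, and a `refs` table mapping code positions to the symbols they refer to) is a hypothetical simplification, as are the opcode values; real object file formats are considerably richer.

```python
def link(modules):
    """Combine toy object modules into one executable image."""
    # Relocation, part 1: assign each module a base address in the image.
    bases, size = [], 0
    for module in modules:
        bases.append(size)
        size += len(module["code"])

    # Symbol resolution: each symbol must have exactly one definition.
    symbols = {}
    for base, module in zip(bases, modules):
        for name, offset in module["defs"].items():
            if name in symbols:
                raise ValueError(f"duplicate definition of {name!r}")
            symbols[name] = base + offset

    # Relocation, part 2: patch every reference with the final address.
    image = []
    for module in modules:
        code = list(module["code"])
        for offset, name in module["refs"].items():
            if name not in symbols:
                raise ValueError(f"undefined reference to {name!r}")
            code[offset] = symbols[name]
        image += code
    return image

# Hypothetical opcodes: 0x05 = CALL, 0xFF = HALT, 0x06 = MUL, 0x07 = RET.
main_module = {"code": [0x05, 0, 0xFF, 0], "defs": {"main": 0}, "refs": {1: "square"}}
lib_module = {"code": [0x06, 0, 0x07, 0], "defs": {"square": 0}, "refs": {}}
print(link([main_module, lib_module]))  # [5, 4, 255, 0, 6, 0, 7, 0]
```

The placeholder operand of the call in `main_module` is replaced by 4, the final address of `square` once the library module has been placed behind the main module.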
Dynamic linking is performed at run time. It is accomplished by placing the name of a shareable library in the executable image. Because the library is only resolved when the program runs, there is a greater chance of errors and failures, for example when the library is missing or incompatible.
A compiler that runs on one platform and is capable of generating executable code for a different platform is called a cross-compiler.
A compiler that takes the source code of one programming language and translates it into the source code of another programming language is called a source-to-source compiler.
A number of tools have been developed to help construct compilers. These tools range from scanner and parser generators to complex systems called compiler-compilers, compiler-generators or translator-writing systems.
The input specification for these systems may contain:
1. A description of the lexical and syntactic structure of the source languages.
2. A description of what output is to be generated for each source language construct.
3. A description of the target machine.
The principal aids provided by compiler-compilers are:
1. Scanner generators, which take regular expressions as input.
2. Parser generators, which take context-free grammars as input.
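The idea behind a scanner generator can be sketched with Python's `re` module: a table of regular expressions is turned into a scanner function. The token names and patterns in `TOKEN_SPEC` are assumptions chosen for this example, and a real generator would typically build a finite automaton rather than rely on a regex library.

```python
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

def make_scanner(spec):
    """Build a scanner function from a list of (name, regex) pairs."""
    pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in spec))

    def scan(text):
        pos, tokens = 0, []
        while pos < len(text):
            match = pattern.match(text, pos)
            if match is None:
                raise SyntaxError(f"illegal character {text[pos]!r} at {pos}")
            if match.lastgroup != "SKIP":  # whitespace produces no token
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return scan

scan = make_scanner(TOKEN_SPEC)
print(scan("x = 42 + y1"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y1')]
```

Changing the language's lexical structure then only requires editing the specification table, not the scanning loop, which is the main benefit such generators offer.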
A compiler is characterized by three languages:
1. The source language
2. The object language
3. The language in which it is written.