

13. treelang internals

13.1 treelang files

To create a compiler that integrates into GCC, you need to create many files. Some of the files are integrated into the main GCC makefile, to build the various parts of the compiler and to run the test suite. Others are incorporated into various GCC programs such as gcc.c. Finally you must provide the actual programs comprising your compiler.

The files are:

  1. COPYING. This is the copyright file, assuming you are going to use the GNU General Public Licence. You probably need to use the GPL because if you use the GCC back end your program and the back end are one program, and the back end is GPLed. This need not be present if the language is incorporated into the main GCC tree, as the main gcc directory has this file.
  2. COPYING.LIB. This is the copyright file for those parts of your program that are not to be covered by the GPL, but are instead to be covered by the LGPL (Library or Lesser GPL). This licence may be appropriate for the library routines associated with your compiler. These are the routines that are linked with the output of the compiler. Using the LGPL for these programs allows programs written using your compiler to be closed source. For example LIBC is under the LGPL. This need not be present if the language is incorporated into the main GCC tree, as the main gcc directory has this file.
  3. ChangeLog. Record all the changes to your compiler. Use the same format as used in treelang as it is supported by an emacs editing mode and is part of the FSF coding standard. Normally each directory has its own changelog. The FSF standard allows but does not require a meaningful comment on why the changes were made, above and beyond what changes were made. In the author's opinion it is useful to provide this information.
  4. root.texi. Macros for use in the main manual, eg email addresses.
  5. treelang.texi. The manual, written in texinfo. Your manual would have a different file name. You need not write it in texinfo if you don't want to, but a lot of GNU software does use texinfo.
  6. Make-lang.in. This file is the part of the make file which is incorporated with the GCC make file skeleton (Makefile.in in the gcc directory) to make Makefile, as part of the configuration process. Makefile in turn is the main instruction to actually build everything. The build instructions are held in the main gcc manual and web site so they are not repeated here. There are some comments at the top which will help you understand what you need to do. There are make commands to build things, remove generated files with various degrees of thoroughness, count the lines of code (so you know how much progress you are making), build info and html files from the texinfo source, run the tests etc.
  7. README. Just a brief informative text file saying what is in this directory.
  8. config-lang.in. This file is read by the configuration process and must be present. You specify the name of your language, the name(s) of the compiler(s), including preprocessors, that you are going to build, and any files, usually generated ones, that should be excluded from diffs (ie when making diff files to send in patches). Whether the equate 'stagestuff' is used is unknown (???).
  9. lang-options. This file is included into gcc.c, the main gcc driver, and tells it what options your language supports. This is only used to display help (is this true ???).
  10. lang-specs. This file is also included in gcc.c. It tells gcc.c when to call your programs and what options to send them. The mini-language 'specs' is documented in the source of gcc.c. Do not attempt to write a specs file from scratch - use an existing one as the base and enhance it.
  11. Your texi files. Texinfo can be used to build documentation in HTML, info, dvi and postscript formats. It is a tagged language, is documented in its own manual, and has its own emacs mode.
  12. Your programs. The relationships between all the programs are explained in the next section. You need to write or use the following programs:

13.2 treelang compiler interfaces

13.2.1 treelang driver

The GCC compiler consists of a driver, which then executes the various compiler phases based on the instructions in the specs files.

Typically a program's language will be identified from its suffix (eg .tree for treelang programs).
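
The suffix lookup can be pictured with a small table search. The following is only a sketch under stated assumptions: the table contents, the struct and the function name language_for_file are invented for illustration, and the real driver builds its table from the lang-specs files rather than from anything like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix table.  In the real driver the entries come
   from each language's lang-specs file, not from a table like this.  */
struct suffix_entry
{
  const char *suffix;
  const char *language;
};

static const struct suffix_entry suffix_table[] = {
  { ".c", "c" },
  { ".tree", "treelang" },
};

/* Return the language implied by FILENAME's suffix, or NULL if the
   file has no suffix or the suffix is not in the table.  */
const char *
language_for_file (const char *filename)
{
  const char *dot;
  size_t i;

  assert (filename != NULL);
  dot = strrchr (filename, '.');
  if (dot == NULL)
    return NULL;
  for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
    if (strcmp (dot, suffix_table[i].suffix) == 0)
      return suffix_table[i].language;
  return NULL;
}
```

With this table, language_for_file ("hello.tree") yields "treelang", while a name without a known suffix yields NULL, which a driver would treat as "not mine".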

The driver (gcc.c) will then run (exec) in turn a preprocessor, the main compiler, the assembler and the link editor. gcc options allow you to override all of this. Treelang has no preprocessor, and these days the C preprocessor is mostly run within the main C compiler, apparently for reasons of speed.

You will be using the standard assembler and linkage editor so these are ignored from now on.

You have to write your own preprocessor if you want one. This is usually totally language specific. The main point is to find some way to pass file name and line number information through to the main compiler, so that it can pass this information on to the back end and the debugger can find the right source line for each piece of code. Beyond that, the preprocessor will probably not be the slowest part of the compiler and will probably not use the most memory, so don't waste too much time tuning it until you know you need to do so.
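
One way to carry file name and line number information is the line marker form that the C preprocessor writes in its output, a line of the shape # LINE "FILE". The sketch below is a toy under stated assumptions: the function name toy_preprocess and the "//" comment syntax are invented for illustration, and your main compiler's lexer would still have to be taught to read the markers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy IN to OUT, dropping every line that begins with "//" (a
   hypothetical comment syntax).  Emit a `# LINE "FILE"' marker at the
   start and after each dropped run, so the reader of OUT can recover
   the original line numbers.  FILENAME is written as is; a real tool
   would escape quotes and backslashes in it.  */
void
toy_preprocess (FILE *in, FILE *out, const char *filename)
{
  char buf[1024];
  unsigned long line = 0;
  int at_line_start = 1;  /* does BUF begin a new source line?  */
  int dropping = 0;       /* is the current source line being dropped?  */
  int need_marker = 1;    /* always announce the file at the start  */

  while (fgets (buf, sizeof buf, in) != NULL)
    {
      size_t len = strlen (buf);

      if (at_line_start)
        {
          line++;
          dropping = strncmp (buf, "//", 2) == 0;
          if (dropping)
            need_marker = 1;
          else if (need_marker)
            {
              fprintf (out, "# %lu \"%s\"\n", line, filename);
              need_marker = 0;
            }
        }
      if (!dropping)
        fputs (buf, out);
      /* Lines longer than BUF arrive in pieces; only a piece ending in
         a newline finishes the source line.  */
      at_line_start = len > 0 && buf[len - 1] == '\n';
    }
}
```

The marker is only written when the output would otherwise drift out of step with the source, which keeps the preprocessed file small.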

13.2.2 treelang main compiler

The main compiler for treelang consists of toplev.c from the main GCC compiler, the parser, lexer and back end interface routines, and the back end routines themselves, of which there are many.

toplev.c does a lot of work for you and you should seriously consider whether you want to reinvent it. It is quite possible to reuse it, as in the case of treelang.

Writing this code is the hard part of creating a compiler using GCC. The back end interface documentation is incomplete and the interface is complex.

There are three main aspects to interfacing to the other gcc code.

13.2.2.1 Interfacing to toplev.c

In treelang this is handled mainly in tree1.c and partly in treetree.c. Peruse toplev.c for details of what you need to do.

13.2.2.2 Interfacing to the garbage collection

In treelang this is mainly in tree1.c.

Memory allocation in the compiler should be done using the ggc_alloc and kindred routines in ggc*.*. At the end of every function, toplev.c calls the garbage collection several times. The garbage collection calls mark routines which go through the memory which is still used, telling the garbage collection not to free it. Then all the memory not used is freed.

What this means is that you need a way to hook into this marking process. This is done by calling ggc_add_root. This provides the address of a callback routine which will be called during garbage collection and which can call ggc_mark to save the storage. If storage is only used within the parsing of a function, you do not need to provide a way to mark it.

Note that you can also call ggc_mark_tree to mark any of the back end internal 'tree' nodes. This routine will follow the branches of the trees and mark all the subordinate structures. This is useful for example when you have created a variable declaration that will be used across multiple functions, or for a function declaration (from a prototype) that may be used later on. See the next item for more on the tree nodes.
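
The idea of roots, marking and freeing can be shown with a toy collector. This is a sketch of the general mark and sweep scheme only, not the GCC implementation: every name beginning with toy_, along with global_decl and mark_front_end_roots, is invented for illustration, and the real ggc routines have their own signatures which you should take from the ggc headers. Unlike ggc_mark_tree, the toy does not follow pointers inside marked objects.

```c
#include <assert.h>
#include <stdlib.h>

/* Every toy allocation carries a header linking it into one list.
   The payload follows the header, so it has the header's alignment,
   which is enough for the ints and pointers used here.  */
struct toy_header
{
  struct toy_header *next;
  int marked;
};

typedef void (*toy_root_fn) (void);

#define TOY_MAX_ROOTS 16

static struct toy_header *toy_objects;
static toy_root_fn toy_roots[TOY_MAX_ROOTS];
static int toy_num_roots;

/* Allocate SIZE bytes under the collector's control.  */
void *
toy_alloc (size_t size)
{
  struct toy_header *h = malloc (sizeof *h + size);

  if (h == NULL)
    abort ();
  h->marked = 0;
  h->next = toy_objects;
  toy_objects = h;
  return h + 1;
}

/* Register FN to be called at the start of every collection.  */
void
toy_add_root (toy_root_fn fn)
{
  assert (toy_num_roots < TOY_MAX_ROOTS);
  toy_roots[toy_num_roots++] = fn;
}

/* Called from a root callback: keep P alive through this collection.  */
void
toy_mark (void *p)
{
  if (p != NULL)
    ((struct toy_header *) p - 1)->marked = 1;
}

/* Run the root callbacks, then free whatever they did not mark.
   Return the number of objects freed.  */
int
toy_collect (void)
{
  struct toy_header **link = &toy_objects;
  int freed = 0;
  int i;

  for (i = 0; i < toy_num_roots; i++)
    toy_roots[i] ();

  while (*link != NULL)
    {
      struct toy_header *h = *link;

      if (h->marked)
        {
          h->marked = 0;  /* reset for the next collection  */
          link = &h->next;
        }
      else
        {
          *link = h->next;
          free (h);
          freed++;
        }
    }
  return freed;
}

/* Example front end state: one pointer that outlives a function.  */
int *global_decl;

/* Root callback: tell the collector what the front end still uses.  */
void
mark_front_end_roots (void)
{
  toy_mark (global_decl);
}
```

A front end would register mark_front_end_roots once at start-up with toy_add_root. Anything reachable only from local variables of the parser is then freed by the next toy_collect, which is why storage used only within the parsing of one function needs no marking.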

13.2.2.3 Interfacing to the code generation code

In treelang this is done in treetree.c. A typedef called 'tree', which is defined in tree.h and tree.def in the gcc directory and largely implemented in tree.c and stmt.c, forms the basic interface to the compiler back end.

In general you call various tree routines to generate code, either directly or through toplev.c. You build up data structures and expressions in similar ways.

Some documentation on this can be found via the gcc main web page. In particular, the documentation produced by Joachim Nadler and translated by Tim Josling can be quite useful. The C compiler also has documentation in the main GCC manual (particularly the current CVS version) which is useful on a lot of the details.

In time it is hoped to enhance this document to provide a more comprehensive overview of this topic. The main gap is in explaining how it all works together.

13.3 Hints and tips

