Emacs Lisp

Emacs Lisp is a dialect of the Lisp programming language used by the GNU Emacs and XEmacs text editors (collectively referred to as Emacs in this article). It is used to implement most of the editing functionality built into Emacs, the remainder being written in C. Users of Emacs commonly write Emacs Lisp code to customize and extend Emacs.

Emacs Lisp is sometimes called Elisp, at the risk of confusion with an unrelated Lisp dialect of the same name. In terms of features, it is closely related to the Maclisp and Common Lisp dialects. It supports imperative and functional programming methods. Lisp was chosen as the extension language for Emacs because of its powerful features, including the ability to treat functions as data.

Writing Emacs Lisp is not the only method of customizing GNU Emacs. Since version 20, GNU Emacs has included a "Customize" facility which allows users to set common customization variables through a graphical interface. "Customize" works by writing Emacs Lisp code for the user, and is limited to simple customizations. Not every user needs the full degree of extensibility offered by Emacs; those who do must write their own Emacs Lisp code.

Example

Here is a simple example of an Emacs extension written in Emacs Lisp. In Emacs, the editing area can be split into separate areas called windows, each displaying a different buffer. A buffer is, roughly speaking, a region of text loaded into Emacs's memory (possibly from a file), which can be saved into a text document.

The user command for opening a new window is "C-x 2" (press the "x" key while holding down the Control key, then press the "2" key; the space is shown only for readability and is not typed). This runs the Emacs Lisp function split-window-vertically. Normally, when the new window appears, it displays the same buffer as the previous one. Suppose we wish to make it display the next available buffer. To do this, the user writes the following Emacs Lisp code, in either an existing Emacs Lisp source file or an empty Emacs buffer:

    (defun my-split-window-function ()
      (interactive)
      (split-window-vertically)
      (set-window-buffer (next-window) (other-buffer)))

    (global-set-key "\C-x2" 'my-split-window-function)

The first statement, (defun ...), defines a new function, my-split-window-function, which calls split-window-vertically (the old window-splitting function), then tells the new window to display another buffer. The second statement, (global-set-key ...), re-binds the key sequence "C-x 2" to the new function.

However, there is an easier way to write this. Emacs Lisp has a powerful feature called advice, which allows the user to create wrappers around existing functions instead of defining their own. Using advice, the above code can be reimplemented as follows:

    (defadvice split-window-vertically
      (after my-window-splitting-advice first () activate)
      (set-window-buffer (next-window) (other-buffer)))

This instructs split-window-vertically to execute the user-supplied code whenever it is called, after executing the rest of the function.

These changes take effect when the code is evaluated, using (for instance) the command "M-x eval-buffer". It is not necessary to recompile or even restart Emacs, which makes customizing Emacs very convenient. If the code is saved into the Emacs "init file" (usually a file named ".emacs" in the user's home directory), then Emacs will load the extension the next time it starts. Otherwise, the changes will be lost when the user exits Emacs.

Source code

Emacs Lisp code is stored as plain text, with the filename suffix ".el" (an exception being the user's init file, which is named ".emacs"). When the files are loaded, an interpreter component of the Emacs program reads and parses the functions and variables, storing them in memory. They are then available to other editing functions, and to user commands. Functions and variables can be freely modified and re-loaded.

In order to save memory space, much of the functionality of Emacs is not loaded until it is needed. Each set of optional features is implemented by a collection of Emacs Lisp code called a "library". For example, there is a library for highlighting keywords in program source code, and a library for playing the game of Tetris. Each library is implemented using one or more Emacs Lisp source files.

Certain functions are written in C. These are "primitives", also known as "built-in functions" or "subrs". Although primitives can be called from Lisp code, they can only be modified by editing the C source files and recompiling the editor. Primitives are not available as libraries; they are part of the Emacs executable. Functions are written as primitives because C code is faster than Emacs Lisp code. However, only those few functions that need to run quickly and efficiently are written as primitives, because primitives are not as flexible as Emacs Lisp functions.

Byte code

The performance of Emacs Lisp code can be further increased by "byte-compilation". Emacs contains a compiler which can translate Emacs Lisp source files into a special representation known as bytecode. Emacs Lisp bytecode files have the filename suffix ".elc". Compared to source files, bytecode files load faster, occupy less space on the disk, use less memory when loaded, and run faster. Bytecode is still slower than primitives, but functions loaded as bytecode can be easily modified and re-loaded. In addition, bytecode files are platform-independent. The standard Emacs Lisp code distributed with Emacs is loaded as bytecode, although the matching source files are usually provided for the user's reference as well. User-supplied extensions are typically not byte-compiled, as they are neither as large nor as computationally intensive.

Language features

Emacs Lisp uses dynamic scoping instead of lexical scoping. If a variable is declared within the scope of a function, it is available to subroutines called from within that function. Originally, this was meant to provide greater flexibility for user customizations. However, dynamic scoping has several disadvantages. Firstly, it can easily lead to bugs in large programs, due to unintended interactions between variables in different functions. Secondly, accessing variables under dynamic scoping is generally slower than under lexical scoping. As a result, plans have been made to convert Emacs Lisp to lexical scoping, though this has not yet been done.

Unlike many other Lisp implementations, Emacs Lisp does not optimize tail recursion.

External links


- The Emacs page at the GNU Project http://www.gnu.org/software/emacs/emacs.html
- R. Chassell, "Programming in Emacs Lisp, an Introduction" http://www.gnu.org/software/emacs/emacs-lisp-intro/emacs-lisp-intro.html
- B. Lewis, D. LaLiberte, R. Stallman, "GNU Emacs Lisp Reference Manual" http://www.gnu.org/software/emacs/elisp-manual/elisp.html
- The Emacs Wiki http://www.emacswiki.org/cgi-bin/wiki

Lisp programming language

Lisp is a multi-paradigm, reflective programming language with a long history. Originally envisioned as a practical (in contrast to Turing's) computation model, based on ideas from lambda calculus, Lisp immediately became the favored Artificial Intelligence programming language. Lisp pioneered the use of symbol processing, list programming and tree structures (see also IPL-V), automatic storage management, interpreters, functional programming, object-oriented programming, constraint programming, macro programming, and domain-oriented programming (e.g. little languages and extensions and metaprogramming).

The name Lisp derives from "List Processing". Linked lists are one of the major data structures of Lisp languages, and identical basic list operations work in all Lisp dialects. Other common features in Lisp dialects include strong dynamic typing, functional programming support, and the ability to manipulate source code as data.

Although it remains something of a niche player, Lisp is used in many fields, from Artificial Intelligence to web development [http://alu.cliki.net/Industry%20Application]. It is often taught (perhaps misleadingly) in computer science courses in a minimalist fashion focusing on only a few of its original features. Because of its malleability, Lisp was one of the most important research vehicles in computer science, but it (and the ambitious applications written in it) demanded such intense memory and processor resources that it was historically impractical to deploy widely outside of research labs, except on specialized Lisp Machine hardware. However, since the beginning of the 21st century, any common conventional hardware is suitable for Lisp.

Lisp is often considered to be a functional programming language because of its uniform function calling syntax, first-class function objects, closures, higher-order functions, and the ability of any expression to return a meaningful value.
Still, neither the original implementation nor any subsequent major dialect of Lisp is a purely functional language. Indeed, Lisp programming very frequently involves the use of side effects. This is a good example of Lisp's paradigm-agnostic worldview. Lisp languages have an instantly-recognizable appearance. Program code is written using the same syntax as lists – the parenthesized S-expression syntax. Every sub-expression in a program (or data structure) is set off with parentheses. This makes Lisp languages easy to parse, and also makes it simple to do metaprogramming – creating programs which write other programs. This is a major reason for its great popularity in the 70s and 80s, because artificial intelligence programmers believed that Lisp would lend itself naturally to self-propagating programs. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older. Like Fortran, Lisp has changed a great deal since its early days, and a number of dialects have existed over its history. Today, the most widely-known general-purpose Lisp dialects for programming are Common Lisp and Scheme.

History

Information Processing Language was the first AI language, from 1955 or 1956, and already included many of the concepts, such as list-processing and recursion, which came to be used in Lisp.

Lisp was invented by John McCarthy in 1958 while he was at MIT. McCarthy published its design in a paper in Communications of the ACM in 1960, entitled "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I". (Part II was never published.) He showed that with a few simple operators and a notation for functions, one can build a whole programming language.

McCarthy's original notation used bracketed "M-expressions" externally. These were quickly abandoned in favor of the S-expressions which he originally proposed as an internal representation. As an example, the M-expression car[cons[A,B]] is equivalent to the S-expression (car (cons A B)).

Lisp was originally implemented by Steve Russell on an IBM 704 computer, and two assembly language macros for that machine became the primitive operations for decomposing lists: car (Contents of Address Register) and cdr (Contents of Decrement Register). Lisp dialects still use car and cdr for the operations that return the first item in a list and the rest of the list respectively.

The first complete Lisp compiler, written in Lisp, was implemented in 1962 by Tim Hart and Mike Levin (AI Memo 39, ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-039.pdf, 767 kB PDF). This compiler introduced the Lisp model of incremental compilation, in which compiled and interpreted functions can intermix freely. The language used in Hart and Levin's memo is much closer to modern Lisp style than McCarthy's earlier code.

Since its inception, Lisp was closely connected with the artificial intelligence research community, especially on PDP-10 systems. Lisp was used as the implementation of the programming language Micro Planner that was the foundation for the famous AI system SHRDLU.
In the 1970s, as AI research spawned commercial offshoots, the performance of existing Lisp systems became a growing issue. Partly because of garbage collection and partly because of its representation of internal structures, Lisp became difficult to run on the memory-limited stock hardware of the day. This led to the creation of LISP machines: dedicated hardware for running Lisp environments and programs. Along with modern compiler construction techniques, today's gigantic computer capacities (by the standards of the 1970s) have made this specialization unnecessary and quite efficient Lisp environments now exist. During the 1980s and 1990s, a great effort was made to unify the numerous Lisp dialects (most notably, InterLisp, Maclisp, ZetaLisp, and Franz Lisp) into a single language. The new language, Common Lisp, was essentially a superset of the dialects it replaced. In 1994, ANSI published the Common Lisp standard, "ANSI X3.226-1994 Information Technology Programming Language Common Lisp." By this time the world market for Lisp was much smaller than in its heyday. Having declined somewhat in the 1990s, Lisp has experienced a regrowth of interest since 2000, partly due to the writings of Paul Graham. Most new activity is focused around open source implementations of Common Lisp, and includes the development of new portable libraries and applications. The language is amongst the oldest programming languages still in use as of the time of writing in 2005. Algol, Fortran and COBOL are of a similar vintage, and Fortran and COBOL are also still being used. The now-ubiquitous if-then-else structure, now taken for granted as an essential element of any programming language, was invented by McCarthy for use in Lisp, where it saw its first appearance in a more general form (the cond structure). It was inherited by Algol, which popularized it. 
Lisp heavily influenced the inventor of Smalltalk, and in turn Lisp was influenced by Smalltalk, adopting object-oriented (class/instance) programming features in the late 1970s.

A major benefit of Lisp is rapid prototyping of applications. For example, the first graphical IMAP client was written in InterLisp. A comparable program, written in Objective-C, took much longer to write.

Largely because of its resource requirements with respect to early computing hardware (including early microprocessors), Lisp did not become as popular outside of the AI community as Fortran and the Algol-descended C language. Newer languages such as Java and Python have incorporated limited versions of some of the features of Lisp, but do not bring the coherence and synergy of the full concepts found in Lisp. Because of its suitability to ill-defined, complex, and dynamic applications, Lisp is presently enjoying some resurgence of popular interest.

See also The Evolution of Lisp (http://citeseer.ist.psu.edu/steele93evolution.html), a paper written by Guy L. Steele, Jr. and Richard P. Gabriel.

Syntax and Semantics

Note: This article's examples are written in Common Lisp (though most are also valid Scheme).

Lisp is an expression-oriented language. Unlike most other languages, no distinction is made between "expressions" and "statements"; all code and data are written as expressions. When an expression is evaluated, it produces a value (or list of values), which then can be embedded into other expressions.

McCarthy's 1960 paper introduced two types of syntax: S-expressions (Symbolic Expressions), which are also called sexps, and M-expressions (Meta Expressions), which express functions of S-expressions. M-expressions never found favour, and almost all Lisps today use S-expressions to manipulate both code and data.

The heavy use of parentheses in S-expressions has been criticized (one joke acronym for Lisp is "Lots of Irritating Superfluous Parentheses" [http://www.catb.org/~esr/jargon/html/L/LISP.html]), but the S-expression syntax is also responsible for much of Lisp's power: the syntax is extremely regular, which facilitates manipulation by computer.

The reliance on expressions gives the language great flexibility. Because Lisp functions are themselves written as lists, they can be processed exactly like data, which makes it easy to write programs that manipulate other programs (metaprogramming). Many Lisp dialects exploit this feature using macro systems, which enable extension of the language almost without limit.

A Lisp list is written with its elements separated by whitespace and surrounded by parentheses. For example,

    (1 2 foo)

is a list whose elements are three atoms: 1, 2, and foo. These values are implicitly typed: they are respectively two integers and a symbol, and do not have to be declared as such. When such a list appears in program code as data, it is quoted, as in '(1 2 foo); the quoting prevents it from being evaluated.

The empty list () is also represented as the special atom nil. This is the only entity in Lisp which is both an atom and a list.
Expressions are written as lists, using prefix notation. The first element in the list is the name of a form, i.e., a function, operator, macro, or "special operator" (see below). The remaining elements are the arguments. For example, the function list returns its arguments as a list, so the expression

    (list 1 2 'foo)

evaluates to the list (1 2 foo). If any of the arguments are expressions, they are recursively evaluated before the enclosing expression is evaluated. For example,

    (list 1 2 (list 3 4))

evaluates to the list (1 2 (3 4)). Note that the third argument is a list; lists can be nested.

Arithmetic operators are treated similarly. The expression

    (+ 1 2 3 4)

evaluates to 10. The equivalent under infix notation would be "1 + 2 + 3 + 4".

"Special operators" (sometimes called "special forms" by older users) provide Lisp's control structure. For example, the special operator if takes three arguments. If the first argument is non-nil, it evaluates to the second argument; otherwise, it evaluates to the third argument. Thus, the expression

    (if nil
        (list 1 2 "foo")
        (list 3 4 "bar"))

evaluates to (3 4 "bar"). (Of course, this would be more useful if a non-trivial expression had been substituted in place of nil!)

Lambda expressions

Another special operator, lambda, is used to bind variables to values which are evaluated within an expression. This form is also used to create functions. The arguments to lambda are a list of arguments, and the expression or expressions that the function evaluates to (the return value is the value of the last expression to be evaluated). The expression

    (lambda (arg) (+ arg 1))

is an expression which, when applied, takes one argument, binds it to arg, and returns the number one greater than that argument. Lambda expressions are treated no differently from named functions; they are invoked the same way. Therefore, the expression

    ((lambda (arg) (+ arg 1)) 5)

evaluates to 6.
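As a small illustration of the points above, the following Common Lisp sketch applies a lambda expression directly, stores one in a variable, and shows that the value of the last body expression is returned. The names *add-one* and *log-and-multiply* are hypothetical, chosen only for this example.

```lisp
;; A lambda expression in the operator position is applied directly.
((lambda (arg) (+ arg 1)) 5)                 ; => 6

;; A function object can also be stored in a variable (hypothetical
;; name) and applied later with FUNCALL.
(defparameter *add-one* (lambda (arg) (+ arg 1)))
(funcall *add-one* 5)                        ; => 6

;; With several body expressions, the value of the last one is returned.
;; The first expression here only prints; the product is the result.
(defparameter *log-and-multiply*
  (lambda (a b)
    (print a)
    (* a b)))
(funcall *log-and-multiply* 3 4)             ; prints 3, => 12
```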

Conses and lists

A Lisp list is a singly-linked list. Each cell of this list is called a cons (or sometimes a pair, particularly in Scheme, because it contains two pointers), and is composed of two pointers, called the car and cdr respectively. These are equivalent to the data and next fields discussed in the article linked list. Of the many data structures that can be built out of singly-linked lists, one of the most basic is called a proper list. A proper list is either the special nil (empty list) symbol, or a cons in which the car points to a datum (which may be another cons structure, such as a list), and the cdr points to another proper list. If a given cons is taken to be the head of a linked list, then its car points to the first element of the list, and its cdr points to the rest of the list. For this reason, the car and cdr functions are also called first and rest when referring to conses which are part of a linked list (rather than, say, a tree). Thus, a Lisp list is not an atomic object, as an instance of a container class in C++ or Java would be. A list is nothing more than an aggregate of linked conses. A variable which refers to a given list is simply a pointer to the first cons in the list. Traversal of a list can be done by "cdring down" the list; that is, taking successive cdrs to visit each cons of the list; or by using any of a number of higher-order functions to map a function over a list. Parenthesized S-expressions represent linked list structure. There are several ways to represent the same list as an S-expression. A cons can be written in dotted-pair notation as (a . b), where a is the car and b the cdr. A longer proper list might be written (a . (b . (c . (d . nil)))) in dotted-pair notation. This is conventionally abbreviated as (a b c d) in list notation. An improper list may be written in a combination of the two – as (a b c . d) for the list of three conses whose last cdr is d (i.e., the list (a . (b . (c . d))) in fully-specified form). 
Because conses and lists are so universal in Lisp systems, it is a common misconception that they are Lisp's only data structure. In fact, all but the most simplistic Lisps have other data structures – such as vectors (arrays), hash tables, structures, and so forth.
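The relationship between conses, dotted pairs, and list notation can be checked directly in Common Lisp. This is a minimal sketch; the variable names and the helper count-conses are hypothetical and not part of any standard library.

```lisp
;; CONS builds one cell; CAR and CDR read its two pointers.
(defparameter *cell* (cons 'a 'b))            ; the dotted pair (a . b)
(car *cell*)                                  ; => A
(cdr *cell*)                                  ; => B

;; A proper list is a chain of conses whose last cdr is NIL.
(defparameter *chain* (cons 'a (cons 'b (cons 'c (cons 'd nil)))))
(equal *chain* '(a b c d))                    ; => T
(equal *chain* '(a . (b . (c . (d . nil)))))  ; => T, same structure

;; "Cdring down" a list: take successive cdrs until an atom is reached.
;; COUNT-CONSES is a hypothetical helper for this sketch.
(defun count-conses (list)
  (do ((cell list (cdr cell))
       (n 0 (1+ n)))
      ((atom cell) n)))

(count-conses '(a b c d))                     ; => 4
(count-conses '(a b c . d))                   ; => 3, improper list
```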

Shared structure

Lisp lists, being simple linked lists, can share structure with one another. That is to say, two lists can have the same tail, or final sequence of conses. For instance, after the execution of the following Common Lisp code:

    (setq foo (list 'a 'b 'c))
    (setq bar (cons 'x (cdr foo)))

the lists foo and bar are (a b c) and (x b c) respectively. However, the tail (b c) is the same structure in both lists.

In many languages, the usual way to place the same data in two different structures is to copy it. Sharing structure rather than copying can give a dramatic performance improvement. However, this technique can interact in undesired ways with functions that alter lists passed to them as arguments. Altering one list, such as by replacing the c with a goose, will affect the other:

    (setf (third foo) 'goose)

This changes foo to (a b goose), but also changes bar to (x b goose), a possibly unexpected result. This can be a source of bugs, and functions which alter their arguments are documented as destructive for this very reason.

Aficionados of functional programming avoid destructive functions. In the Scheme dialect, which favors the functional style, the names of destructive functions are marked with a cautionary exclamation point, or "bang", such as set-car! (read "set car bang"), which replaces the car of a cons. In the Common Lisp dialect, destructive functions are commonplace; the equivalent of set-car! is named rplaca for "replace car". This function is rarely seen, however, as Common Lisp includes a special facility, setf, to make it easier to define and use destructive functions. A frequent style in Common Lisp is to write code functionally (without destructive calls) when prototyping, then to add destructive calls as an optimization where it is safe to do so.
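The sharing described above can be observed with eq, which tests whether two values are the very same object. The sketch below repeats the example with global variables (written *foo* and *bar* to follow the usual naming convention) and adds a hypothetical third list, *baz*, built with copy-list so that it does not share its tail.

```lisp
(defparameter *foo* (list 'a 'b 'c))
(defparameter *bar* (cons 'x (cdr *foo*)))              ; shares the tail (b c)
(defparameter *baz* (cons 'x (copy-list (cdr *foo*))))  ; owns a fresh copy

(eq (cdr *foo*) (cdr *bar*))     ; => T,   the same conses
(eq (cdr *foo*) (cdr *baz*))     ; => NIL, equal contents but distinct conses

;; A destructive update through one list is visible through the other
;; list that shares the modified cons, but not through the copy.
(setf (third *foo*) 'goose)

*foo*                            ; => (A B GOOSE)
*bar*                            ; => (X B GOOSE)
*baz*                            ; => (X B C)
```

Copying costs time and memory in proportion to the length of the tail, which is the trade-off the text refers to.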

Self-evaluating forms and quoting

Lisp evaluates expressions which are entered by the user. Symbols and lists evaluate to some other (usually, simpler) expression – for instance, a variable evaluates to its value; (+ 2 3) evaluates to 5. However, most other forms evaluate to themselves. They are parsed by the read function, but are left alone by eval. Numbers and strings are this way: if you enter 5 into Lisp, you just get back the same 5. Any expression can also be marked to prevent it from being evaluated (as is necessary for symbols and lists). This is the role of the quote special operator, or its abbreviation ' (a single quotation mark). For instance, usually if you enter the symbol foo you will get back the value of that variable (or an error, if there is no such variable). If you wish to refer to the symbol itself, you enter (quote foo) or, usually, 'foo. More complex forms of quoting are used with macros. For instance, both Common Lisp and Scheme support the backquote or quasiquote operator, entered with the ` character. This is almost the same as the plain quote, except it allows variables to be interpolated into a quoted list with the comma and comma-at operators. If the variable snue has the value (bar baz) then `(foo ,snue) evaluates to (foo (bar baz)), while `(foo ,@snue) evaluates to (foo bar baz). The backquote or quasiquote is most frequently used in defining macro expansions. Self-evaluating forms and quoted forms are Lisp's equivalent of literals. However, they are not necessarily constants. In some Lisp dialects it is possible to modify the values of literals in program code. For instance, if a quoted form is used in the body of a function, and is changed as a side-effect, that function's behavior may differ on subsequent iterations. This is usually a bug, and is undefined behavior in some dialects. When behavior like this is intentional, using a closure is the explicit way to do it. 
Lisp's formalization of quotation has been noted by Douglas Hofstadter and others as an example of the philosophical idea of self-reference.
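The quoting rules above can be tried at a Common Lisp prompt. In this sketch the variable from the text is written *snue*, following the convention for global variables; that spelling is an assumption of the example.

```lisp
(defparameter *snue* '(bar baz))

;; Self-evaluating forms come back unchanged.
5                          ; => 5
"hello"                    ; => "hello"

;; QUOTE and its abbreviation ' both return the form unevaluated.
(quote foo)                ; => FOO
'foo                       ; => FOO
'(+ 2 3)                   ; => (+ 2 3), a list, not the number 5

;; Backquote behaves like quote, except where a comma appears.
`(foo ,*snue*)             ; => (FOO (BAR BAZ)), value inserted as one element
`(foo ,@*snue*)            ; => (FOO BAR BAZ),   value spliced into the list
`(sum ,(+ 2 3))            ; => (SUM 5), any expression may follow the comma
```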

Scoping and closures

A major split in the modern Lisp family is between dynamic scoping and lexical scoping. The latter makes use of closures whilst the former is simpler and does not. Today, Scheme and Common Lisp both make use of lexical scoping by default; while the more primitive Lisp systems used as embedded languages in Emacs and AutoCAD use dynamic scoping.
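Common Lisp offers both kinds of binding, so the difference can be sketched in one dialect: ordinary variables are lexical and can be captured by closures, while variables introduced with defvar are "special" and are bound dynamically. All function and variable names below are hypothetical.

```lisp
;; Lexical scoping: the lambda closes over COUNT, which stays alive
;; after MAKE-COUNTER returns and is invisible to all other code.
(defun make-counter ()
  (let ((count 0))
    (lambda () (incf count))))

(defparameter *counter* (make-counter))
(funcall *counter*)                 ; => 1
(funcall *counter*)                 ; => 2

;; Dynamic scoping: DEFVAR declares *DEPTH* special, so a LET binding
;; of it is seen by every function called while the LET is active.
(defvar *depth* 0)

(defun report-depth ()
  *depth*)

(defun call-with-deeper-binding ()
  (let ((*depth* 1))                ; temporary dynamic rebinding
    (report-depth)))

(call-with-deeper-binding)          ; => 1, the callee sees the rebinding
(report-depth)                      ; => 0, the old value is restored
```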

List structure of program code

A fundamental distinction between Lisp and other languages is that in Lisp, program code is not simply text. Parenthesized S-expressions, as depicted above, are the printed representation of Lisp code, but as soon as these are entered into a Lisp system they are translated by the parser (called the read function) into linked list and tree structures in memory. Lisp macros operate on these structures. Because Lisp code has the same structure as lists, macros can be built with any of the list-processing functions in the language. In short, anything that Lisp can do to a data structure, Lisp macros can do to code. In contrast, in most other languages the parser's output is purely internal to the language implementation and cannot be manipulated by the programmer. Macros in C, for instance, operate on the level of the preprocessor, before the parser is invoked, and cannot re-structure the program code in the way Lisp macros can. In simplistic Lisp implementations, this list structure is directly interpreted to run the program; a function is literally a piece of list structure which is traversed by the interpreter in executing it. However, most actual Lisp systems (including all conforming Common Lisp systems) also include a compiler. The compiler translates list structure into machine code or (rarely) bytecode for execution.
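To make the idea concrete, here is a minimal Common Lisp sketch of a macro that builds its expansion with ordinary list functions. The macro my-unless is hypothetical (Common Lisp already provides unless); it is defined here only to show code being assembled as list structure.

```lisp
;; The macro receives its arguments as unevaluated list structure and
;; returns new list structure, built here with LIST and CONS.
(defmacro my-unless (test &body body)
  (list 'if test nil (cons 'progn body)))

;; MACROEXPAND-1 shows the code the macro produces, as data.
(macroexpand-1 '(my-unless (> x 0) (print "non-positive") x))
;; => (IF (> X 0) NIL (PROGN (PRINT "non-positive") X))

;; Used as code, the expansion is evaluated in place of the call.
(my-unless (> -5 0) 'non-positive)   ; => NON-POSITIVE
(my-unless (> 5 0) 'non-positive)    ; => NIL
```

In practice most macros are written with backquote rather than explicit list and cons calls, but the result is the same list structure.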

Evaluation and the Read-Eval-Print Loop

Lisp languages are frequently used with an interactive command line, which may be combined with an integrated development environment. The user types in expressions at the command line, or directs the IDE to transmit them to the Lisp system. Lisp reads the entered expressions, evaluates them, and prints the result. For this reason, the Lisp command line is called a "read-eval-print loop", or REPL. The basic operation of the REPL is as follows. This is a simplistic description which omits many elements of a real Lisp, such as quoting and macros. The read function accepts textual S-expressions as input, and parses them into list structure. For instance, if you type the string (+ 1 2) at the prompt, read translates this into a linked list with three elements – the symbol +, the number 1, and the number 2. It so happens that this list is also a valid piece of Lisp code; that is, it can be evaluated. This is because the car of the list names a function – the addition operation. The eval function evaluates list structure, returning some other piece of structure as a result. Evaluation does not have to mean interpretation; some Lisp systems compile every expression to native machine code. It is simple, however, to describe evaluation as interpretation: To evaluate a list whose car names a function, eval first evaluates each of the arguments given in its cdr, then applies the function to the arguments. In this case, the function is addition, and applying it to the argument list (1 2) yields the answer 3. This is the result of the evaluation. Evaluation is performed in applicative order. It is the job of the print function to represent output to the user. For a simple result such as 3 this is trivial. An expression which evaluated to a piece of list structure would require that print traverse the list and print it out as an S-expression. To implement a Lisp REPL, it is necessary only to implement these three functions and an infinite-loop function. 
(Naturally, the implementation of eval will be complicated, since it must also implement all the primitive functions like car and + and special operators like if.) This done, a basic REPL itself is but a single line of code: (loop (print (eval (read)))).
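The description above can be sketched as a toy evaluator written in Common Lisp itself. This is an illustration under stated assumptions, not how any real Lisp is implemented: tiny-eval and tiny-repl are hypothetical names, only numbers, nil, quote, if, and five functions are supported, and the host Lisp's read and print are reused.

```lisp
(defun tiny-eval (form)
  "Evaluate FORM in a toy Lisp: numbers, NIL, QUOTE, IF, and a few functions."
  (cond ((numberp form) form)                        ; self-evaluating
        ((null form) nil)                            ; the empty list
        ((atom form) (error "Unsupported atom: ~S" form))
        ((eq (car form) 'quote) (second form))       ; return unevaluated
        ((eq (car form) 'if)                         ; evaluate one branch only
         (if (tiny-eval (second form))
             (tiny-eval (third form))
             (tiny-eval (fourth form))))
        (t
         ;; Function call: evaluate every argument first (applicative
         ;; order), then apply the function named by the car.
         (let ((fn (ecase (car form)
                     (+ #'+)
                     (* #'*)
                     (car #'car)
                     (cdr #'cdr)
                     (cons #'cons)))
               (arguments (mapcar #'tiny-eval (cdr form))))
           (apply fn arguments)))))

;; The loop itself is one line, as the text says. It is defined here
;; but not called, because it would wait for input indefinitely.
(defun tiny-repl ()
  (loop (print (tiny-eval (read)))))

(tiny-eval (read-from-string "(+ 1 2)"))             ; => 3
(tiny-eval '(if (quote yes) (* 2 5) 20))             ; => 10
(tiny-eval '(cons 1 (quote (2 3))))                  ; => (1 2 3)
```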

Control structures

Lisp originally had very few control structures, but many more were added during the language's evolution. (Lisp's original conditional operator, cond, is the precursor to later if-then-else structures.) Programmers in the Scheme dialect often express loops using tail recursion. Scheme's commonality in academic computer science has led some students to believe that tail recursion is the only, or the most common, way to write iterations in Lisp. Nothing could be further from the case. All frequently-seen Lisp dialects have imperative-style iteration constructs, from Scheme's straightforward do loop to Common Lisp's complex loop expressions. Most Lisp control structures are special operators, equivalent to other languages' syntactic keywords. Expressions using these operators have the same surface appearance as function calls, but differ in that the arguments are not necessarily evaluated -- or, in the case of an iteration expression, may be evaluated more than once. Both Common Lisp and Scheme have operators for non-local control flow. The differences in these operators are some of the deepest differences between the two dialects. Scheme supports re-entrant continuations using the call/cc procedure, which allows a program to save (and later restore) a particular place in execution. Common Lisp does not support re-entrant continuations, but does support several ways of handling escape continuations. Frequently, the same algorithm can be expressed in Lisp in either an imperative or a functional style. As noted above, Scheme tends to favor the functional style, using tail recursion and continuations to express control flow. However, imperative style is still quite possible. The style preferred by many Common Lisp programmers may seem more familiar to programmers used to structured languages such as C, while that preferred by Schemers more closely resembles pure-functional languages such as Haskell. 
Because of Lisp's early heritage in list processing, it has a wide array of higher-order functions relating to iteration over sequences. In many cases where an explicit loop would be needed in other languages (like a for loop in C), in Lisp the same task can be accomplished with a higher-order function. (The same is true of many functional programming languages.)

A good example is a function which in Scheme is called map and in Common Lisp is called mapcar. Given a function and one or more lists, mapcar applies the function successively to the lists' elements in order, collecting the results in a new list:

    (mapcar #'+ '(1 2 3 4 5) '(10 20 30 40 50))

This applies the + function to each corresponding pair of list elements, yielding the result (11 22 33 44 55).
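For comparison, the sketch below sets the mapcar call beside an explicit loop that computes the same result, and shows two other standard higher-order functions. The function add-lists is a hypothetical name used only for this example.

```lisp
;; Higher-order style: MAPCAR walks both lists in step.
(mapcar #'+ '(1 2 3 4 5) '(10 20 30 40 50))    ; => (11 22 33 44 55)

;; The same computation as an explicit loop (hypothetical helper).
(defun add-lists (xs ys)
  (loop for x in xs
        for y in ys
        collect (+ x y)))

(add-lists '(1 2 3 4 5) '(10 20 30 40 50))     ; => (11 22 33 44 55)

;; Other standard higher-order functions follow the same pattern.
(mapcar (lambda (x) (* x x)) '(1 2 3))         ; => (1 4 9)
(reduce #'+ '(1 2 3 4))                        ; => 10
(remove-if-not #'evenp '(1 2 3 4 5 6))         ; => (2 4 6)
```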

Examples

Here are examples of Common Lisp code. While unlike Lisp programs used in industry, they are similar to Lisp as taught in computer science courses. As the reader may have noticed from the above discussion, Lisp syntax lends itself naturally to recursion. Mathematical problems such as the enumeration of recursively-defined sets are simple to express in this notation.

Evaluate a number's factorial:

:(defun factorial (n)
:::(if (<= n 1)
::::::1
:::::(* n (factorial (- n 1)))))

In classic Lisp, this would be expressed as

:(defun factorial (n)
:::(cond ((zerop n) 1) ; if n=0, return 1
:::::(t (times n (factorial (sub1 n)))))) ; else return n * factorial(n-1)

This example would, however, result in an infinite loop when presented with a negative number.

An alternative implementation, faster than the previous version if the Lisp system has tail recursion optimization:

:(defun factorial (n &optional (acc 1))
:::(if (<= n 1)
:::::acc
:::::(factorial (- n 1) (* acc n))))

Contrast with an iterative version which uses Common Lisp's loop macro:

:(defun factorial (n)
:::(loop for i from 1 to n
:::::::for fac = 1 then (* fac i)
:::::::finally return fac))

The following function reverses a list. (Lisp's built-in reverse function does the same thing.)

:(defun -reverse (l &optional acc)
:::(if (atom l)
::::::acc
::::::(-reverse (cdr l) (cons (car l) acc))))

Object systems

Various object systems and models have been built on top of, alongside, or into Lisp, including:
- Flavors, built at MIT
- The Common Lisp Object System, CLOS (descended from Flavors)
- KR (short for Knowledge Representation), a constraints-based object system developed to aid the writing of Garnet, a GUI library for Common Lisp
- SageCLOS, an Object Oriented Interface to AutoLISP invented by Ralph Gimenez.

CLOS features multiple inheritance, multiple dispatch ("multimethods"), and a powerful system of "method combinations". In fact, Common Lisp, which includes CLOS, was the first object-oriented language to be officially standardized.

Genealogy and variants

Over its almost fifty-year history, Lisp has spawned many variations on the core theme of an S-expression language. Moreover, each given dialect may have several implementations – for instance, there are more than a dozen implementations of Common Lisp. Differences between dialects may be quite significant – for instance, Common Lisp and Scheme do not even use the same keyword to define functions! Within a dialect that is standardized, however, conforming implementations support the same core language, but with different extensions and libraries.

Major modern dialects

- Common Lisp – descended mainly from ZetaLisp and Franz Lisp, with some InterLisp input. Prevailing standard for industrial use today.
- Scheme – a minimalist "academic" Lisp; an early user of lexical variable scoping and continuations.
- Emacs Lisp – scripting language for the Emacs editor.

Historically significant dialects


- LISP 1.5 – First widely distributed version, developed by McCarthy and others at MIT. So named because it contained several improvements on the original "LISP 1" interpreter, but was not a major restructuring as the planned LISP 2 would be. (LISP 2 used an M-expression-based syntax and would not be widely used.)
- Stanford LISP 1.5 – This was a successor to LISP 1.5 developed at the Stanford AI Lab, and widely distributed to PDP-10 systems running the TOPS-10 operating system. It was quickly obsoleted by Maclisp and InterLisp.
- MACLISP – developed for MIT's Project MAC (no relation to Apple's Macintosh, nor to McCarthy), direct descendant of LISP 1.5. It ran on the PDP-10 and Multics systems.
- InterLisp – developed at BBN for PDP-10 systems running the Tenex operating system, later adopted as a "West coast" Lisp for the Xerox Lisp machines. A small version called "InterLISP 65" was published for Atari's 6502-based computer line. For quite some time MacLisp and InterLisp were strong competitors.
- Franz Lisp – originally a Berkeley project; later developed by Franz, Inc. The name is a humorous deformation of 'Franz Liszt'.
- ZetaLisp – used on the Lisp machines, direct descendant of MacLisp.
- ANSI Common Lisp – mostly, a cleaned up subset of ZetaLisp incorporating CLOS.

Minor Dialects


- Arc – A Lisp currently being designed by Paul Graham
- EuLisp
- GOO – A Lisp which takes ideas from Dylan, Scheme and Cecil
- [http://web.media.mit.edu/~stefan/isis/ Isis] – A Lisp dialect developed at the MIT Media Lab for multimedia applications.
- ISLISP
- LUSH (Lisp Universal SHell) – A Lisp dialect developed for numerical and multimedia applications.
- Lispkit Lisp – A purely functional ("pure Lisp") dialect implemented on a virtual machine (the SECD machine) and used as a testbed for experimentation with functional language concepts.
- MultiLisp – dialect of Scheme, extended with constructs for parallel execution
- newLisp – A stripped down version of Lisp with a focus on scripting and integrated networking functions.
- rep – Read-Eval-Print, used in the Sawfish window manager.

Miscellaneous implementations

:See also the implementations listed at Scheme and Common Lisp.
- Gold Hill Common Lisp – an early PC implementation of Common Lisp.
- [http://www.digitool.com/ Macintosh Common Lisp (MCL)] – an implementation of Lisp for the Macintosh.
- Cambridge Lisp – originally implemented on IBM mainframes; published by Metacomco for the Amiga.
- Knowledge Representation System
- Symmetric Lisp – A parallel Lisp in which environments are first-class objects. It is implemented in Common Lisp.
- STING – A parallel dialect of Scheme intended to serve as a high-level operating system for symbolic programming languages. Features include first-class threads and processors and customisable scheduling policies.
- AutoLISP/Visual LISP – customization language for the AutoCAD product.
- *Lisp (StarLisp) – A data-parallel extension of Common Lisp for the Connection Machine, uses "pvars".
- [http://jatha.sourceforge.net/ Jatha] is a Java library that implements a fairly large subset of Common Lisp
- GOAL is a proprietary LISP derivative developed by Andrew Gavin at Naughty Dog for the express purpose of developing console video games.

Related languages


- Logo – A descendant of Lisp which generally avoids the use of parentheses. Best known for turtle graphics.
- Dylan – A Scheme descendant developed by Apple Computer, which originally used S-expressions but later adopted a non-Lisp syntax.

Quotations

: Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot.
:: – Eric S. Raymond, "How to Become a Hacker" [http://www.catb.org/~esr/faqs/hacker-howto.html#skills1]

: We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp.
:: – Guy Steele, co-author of the Java language specification

: Lisp has jokingly been called "the most intelligent way to misuse a computer". I think that description is a great compliment because it transmits the full flavor of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.
:: – Edsger Dijkstra, CACM, 15:10

: Lisp is a programmable programming language.
:: – John Foderaro

: Experienced Lisp programmers divide up their programs differently. As well as top-down design, they follow a principle which could be called bottom-up design -- changing the language to suit the problem. In Lisp, you don't just write your program down toward the language, you also build the language up toward your program. As you're writing a program you may think "I wish Lisp had such-and-such an operator." So you go and write it. Afterward you realize that using the new operator would simplify the design of another part of the program, and so on. Language and program evolve together. Like the border between two warring states, the boundary between language and program is drawn and redrawn, until eventually it comes to rest along the mountains and rivers, the natural frontiers of your problem. In the end your program will look as if the language had been designed for it. And when language and program fit one another well, you end up with code which is clear, small, and efficient.
:: – Paul Graham, "On Lisp" [http://www.paulgraham.com/onlisp.html]

: Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp.
:: – Philip Greenspun, often called Greenspun's Tenth Rule of Programming [http://philip.greenspun.com/research/]

See also


- Planner (a programming language implemented in Lisp that was the basis of the famous SHRDLU AI system)
- Bill Schelter (Lisp programmer; maintainer of Maxima and GCL)
- Kent Pitman (Lisp programmer; Common Lisp standard editor)
- Paul Graham (Lisp programmer; inventor of Arc dialect of Lisp; essayist)

External links


- [http://lisp.org Association of Lisp Users]
- [http://wiki.alu.org/ Association of Lisp Users Wiki], a general discussion of things Lispish
- [http://www.cliki.net/ CLiki], a wiki about free software in Common Lisp.
- [http://www.cons.org/ A collection of Lisp-related sites]
- [http://community.computerhistory.org/scc/projects/LISP/ History of LISP at the Computer History Museum]
- [http://www.gigamonkeys.com/book/ "Practical Common Lisp"] (book by Peter Seibel)
- [http://www.norvig.com/paip.html "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp"] (book by Peter Norvig)
- [http://www.paulgraham.com/acl.html "ANSI Common Lisp"] (book by Paul Graham)
- [http://www.paulgraham.com/onlisp.html "On Lisp"] (book by Paul Graham)
- [http://mitpress.mit.edu/sicp/sicp.html "Structure and Interpretation of Computer Programs"] (book by Abelson and Sussman)
- [http://www.apl.jhu.edu/~hall/lisp.htm An Introduction and Tutorial for Common Lisp]
- [http://www.solace.mh.se/~janne/lecture-notes/university/pdf/common-lisp-97.pdf Mid-Sweden University Sundsvall Common Lisp B-level course], notes from the lectures, spring of 1997 (PDF document format)
- [http://www.psychologie.uni-trier.de/projects/ELM/elmart.html An interactive LISP course]
- [http://www.geocities.com/fhzeya20042000/lisp.htm Lisp tutorial by Faiz ul haque Zeya]
- [http://www.lisp.org/table/systems.htm A list of Common Lisp implementations]
- [http://www.cons.org/cmucl CMUCL: a high-performance, free Common Lisp implementation]
- [http://www.clisp.org/ GNU CLISP]: A portable (Unix, Microsoft Windows, Mac OS X) ANSI Common Lisp implementation
- [http://www.gnu.org/software/gcl GCL]: A GNU cross-platform Common Lisp implementation
- [http://www.rtfm.org.ar/slisp/ Simple Lisp Interpreter], designed for learning and teaching the basics of Lisp.
- [http://www.newlisp.org newLISP], a Lisp-like scripting language for quick learning.
- [http://hedgehog.oliotalo.fi/ Hedgehog Lisp], a functional Lisp dialect for small embedded targets (20kb bytecode interpreter and runtime)
- [http://software-lab.de/down.html Pico Lisp], another very small implementation including an application framework and clever browser-based GUI scheme. Source tarball (.tgz) is 352kb.
- [http://www.lisperati.com/ Lisp Comic]


GNU Emacs

GNU Emacs is one of the two most popular versions of Emacs (see also XEmacs). The GNU Emacs manual describes it as "the extensible, customizable, self-documenting, real-time display editor." Since so much of the user interface of GNU Emacs and XEmacs is the same, a combined introduction is available in Emacs.

Distribution

GNU Emacs is free software, distributed under the terms of the GNU GPL. The source code and binaries are available via FTP from the GNU project website (see below). They are also widely available from other sites on the Internet. Vendors of Unix systems, both free and proprietary, frequently provide Emacs bundled with the operating system. GNU Emacs runs on a large number of platforms, including GNU/Linux, FreeBSD, most other variants of Unix, Mac OS 8/OS 9, Mac OS X, and Microsoft Windows.

Development

GNU Emacs is part of the GNU project, and is under active development. Several, but not all, of the developers are affiliated with the Free Software Foundation (FSF). As of 2004, the latest release of GNU Emacs is version 21.4. Development takes place in a single CVS trunk, which is at version 22.0.50. The current maintainer is Richard Stallman. Until 1999, GNU Emacs development was relatively closed, to the point where it was used as an example of the "Cathedral" development style in The Cathedral and the Bazaar. The project has since adopted a public development mailing list and anonymous CVS access. As with all GNU projects, it remains policy to accept significant code contributions only if the copyright holder assigns the code's copyright to the FSF, although one exception was made to this policy for the MULE (MULtilingual Extension) code [http://mail.gnu.org/archive/html/bug-gnu-emacs/2000-09/msg00065.html], since the copyright holder is the Japanese government and copyright assignment was not possible. The policy does not apply to extremely minor code contributions or bug fixes. There is no strict definition of minor, but as a guideline fewer than 10 lines of code is considered minor. This policy is intended to facilitate copyleft enforcement, so that the FSF can defend the software in a court case if one arises. The copyright-assignment requirement is assumed to affect contributions. Some people claim that it even affects performance; for example, the inability of GNU Emacs to handle large files efficiently has been blamed on the requirement repelling serious developers. However, according to Stallman, it is more important for the program to be "free" than good in any other aspect. Enforcement provides legal confidence in the GNU Emacs free software license (the GNU General Public License) and in the free software itself, an intellectual work with many copyrights and contributors.

References


- Stallman, Richard M. (2002). GNU Emacs Manual. 15th ed. Boston, Massachusetts: Free Software Foundation. ISBN 1-882114-85-X.
- Rosenblatt, Bill; Raymond, Eric S.; Cameron, Debra. (1996). Learning GNU Emacs. 2nd ed. O'Reilly & Associates. ISBN 1565921526.
- Cameron, Debra; Elliott, James; Loy, Marc. (December 2004). Learning GNU Emacs, 3rd ed. O'Reilly & Associates. ISBN 0596006489.
- Glickstein, Bob. (April 1997). Writing GNU Emacs Extensions. O'Reilly & Associates. ISBN 1-56592-261-1.

External links


- [http://www.gnu.org/software/emacs/emacs.html The GNU Emacs homepage] including
  - [http://www.gnu.org/software/emacs/manual/ GNU Emacs Manual. 15th ed. (Emacs 21.3). GNU Press, 2002] – Online version (HTML from texinfo), published under the GFDL
  - [http://www.gnu.org/software/emacs/emacs-lisp-intro/ An Introduction to Programming in Emacs Lisp]. 2nd ed. By R. Chassell
  - [http://www.gnu.org/software/emacs/emacs-faq.text GNU Emacs FAQ]
  - [http://www.gnu.org/software/emacs/windows/ntemacs.html GNU Emacs FAQ For Windows 95/98/ME/NT/XP and 2000]

XEmacs

XEmacs is a text editor which is based on, and forked from, the GNU Emacs text editor. It capitalises on good GUI support. XEmacs runs on almost any Unix-like operating system (inside X or in a text terminal), as well as on Microsoft Windows. It also runs on Mac OS X with an X server (a native Carbon version is in alpha testing). Like GNU Emacs, XEmacs is free software which is available under the GNU General Public License. When speaking about an unspecified version of GNU Emacs or XEmacs, the generic lowercase term emacs (plural emacsen) is used. XEmacs was created in 1991 as Lucid Emacs by Richard P. Gabriel's Lucid Inc. to support their proprietary Energize C++ IDE. Lucid forked the code, developing and maintaining their own version of Emacs, because they were dissatisfied with the maintenance of the original Emacs and with delays in the release of the next GNU Emacs. Their version of Emacs was very popular, so when Lucid went out of business in 1994, the code was picked up by another development team, and began to be maintained under its current name, "XEmacs" (the "X" coming from the X Window System).

Differences between GNU Emacs and XEmacs

GNU Emacs and XEmacs have different development philosophies. XEmacs is more open to experimentation, and is often the first to offer new features, such as inline images, variable fonts and terminal coloring. In the past, some detractors have complained that because of its more aggressive, features-driven approach, XEmacs internals are less consistent and less extensively documented than GNU Emacs. Actually, the opposite is true: XEmacs comes with a 140-page internals manual (Wing and Buchholz, 1997), making it one of the most well-documented software projects, and has been more open to change than GNU Emacs, with the result that its internals have been extensively rewritten to improve consistency and follow modern programming conventions stressing data abstraction. One of the sticking points in the various GNU Emacs/XEmacs merge talks, in fact, has been the XEmacs preference for abstract data interfaces as compared to Richard Stallman's preference for interfaces that use simple Lisp data types (cons, vector) and expose the internals. It is a popular myth that XEmacs does not, or did not, support text terminals or anything other than X11. This is partly due to the name, which was due to a choice (sometimes seen as unfortunate) by Lucid, Inc. and Sun Microsystems when they were the primary developers of XEmacs. In fact, XEmacs has had proper support for text terminals (or emulators such as xterm) since version 19.12 (early 1995) and has supported Microsoft Windows natively since the late 1990s. For a period of time it even had some terminal features, such as coloring, that GNU Emacs lacked. In keeping with its bazaar-model development environment, XEmacs has a separate packaging system for independently maintained Lisp packages, which is more extensive and more up-to-date than GNU Emacs, which includes a much smaller subset of more carefully integrated packages in its core. 
Historically, XEmacs had a more open development environment, including anonymous CVS access and publicly accessible development mailing lists. However, with the release of GNU Emacs 21 in 2001, the GNU project has begun providing both of these facilities as well. The development models of the projects are now very similar, with the exception that GNU Emacs requires copyright-assignment papers for any significant contribution (that is, the author must sign a legal document transferring the copyright of the code to the Free Software Foundation, or FSF). The issue of copyright assignment is one of the main issues dividing the two camps. XEmacs still has somewhat better X toolkit support, and experimental GTK+ support. However, as of 2005, the released version depends on the unmaintained package called Mule-UCS to support Unicode, while GNU Emacs has had robust integrated Unicode support since before 2003. The development branch of XEmacs has had robust native support for external Unicode encodings since May 2002, but the internal Mule character sets are incomplete, and development seems stalled as of September 2005. Programmers who wish their Emacs Lisp packages to work with both programs have to be careful to avoid features specific to either. For example, XEmacs introduced the concept of extents, regions of text that can be assigned attributes such as color and font. A similar but not identical feature, overlays, was later added to GNU Emacs. XEmacs' project policy is to maintain compatibility with the GNU Emacs API. For example, it provides a compatibility layer implementing overlays via the native extent functionality. The schism between GNU Emacs and XEmacs is one of the most well-known examples of a code fork.
Both programs are licensed under the GNU GPL (in fact, the copyright of some of the XEmacs code is held by the Free Software Foundation, due to prior copyright assignment during merge attempts and to borrowing from GNU Emacs), so code could in principle be freely exchanged between the two projects. However, the GNU Emacs project has a policy of including only contributions whose copyright has been assigned to the FSF. The FSF asserts that copyright assignment is necessary to allow it to defend the code against GPL violations ([http://www.fsf.org/licensing/licenses/why-assign.html "Why the FSF gets copyright assignments from contributors"]). The XEmacs project, which does not share the FSF's interpretation, does not and has never required copyright assignment, and has in the past received much assistance from various software corporations who also agree. The XEmacs project has also freely accepted patches from various contributors over the years. The result is that much of the code in XEmacs has an unknown or corporate copyright, and copyright assignment of all the code is not possible in practice. Because of RMS's strict insistence on copyright assignment, much or all of the XEmacs code could not be used in a potential merge with GNU Emacs, which (in addition to numerous other issues) has tended to scuttle the various merge attempts to date. There is significant rivalry between the two camps, which is why new features in either editor usually show up in the other sooner or later. However, many developers contribute to both projects; in particular, many major Lisp subsystems, such as Gnus and Dired, are developed to work with both.

Project status

XEmacs development is split between three branches: stable, gamma, and beta, with beta being the first to get new features, but being the least tested, stable and secure. Version 20.0 was released on 9 February 1997, and 21.0 on 12 July 1998. As of November 2003, the current versions in these branches were 21.4.14 and 21.5.16, without any gamma release. Future version numbers will follow a scheme similar to Linux, with an odd second number signalling a development version, and an even second number for stable releases. In October 2005, the stable branch was 21.4.17 and the Beta branch was 21.5.23.

Further reading


- Emacs − a more in-depth article about GNU Emacs and XEmacs.
- [http://www.xemacs.org/ The XEmacs Project's website].
- [http://www.jwz.org/doc/lemacs.html Lucid Emacs history] from the view of its original maintainer, Jamie Zawinski.
- [http://stallman.org/articles/xemacs.origin The Origin of XEmacs, according to Richard Stallman]

Text editor

A text editor is a piece of computer software for editing plain text. It is distinguished from a word processor in that it does not manage document formatting or other features commonly used in desktop publishing. Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code. Some text editors are small and simple, while others offer a broad and complex range of functionality. For example, Unix and Unix-like operating systems have the vi editor (or a variant), but many also include the Emacs editor to edit text as well. Microsoft Windows systems come with the very simple Notepad, though many people (especially programmers) use a more complete program. Under Apple Macintosh's classic Mac OS, there was the native SimpleText, which was replaced or supplemented by WorldText. Some editors, such as TextEdit and WordStar, have dual operating modes, allowing them to be either a text editor or a word processor.

History

Before text editors existed, computer text was punched into Hollerith cards with keypunch machines. The text was carried as a physical box of these thin cardboard cards, and read into a card-reader. The first text editors were line editors oriented toward typewriter-style terminals, and they did not provide a window or screen-oriented display. They usually had very short commands (typewriter terminals were not very reliable), including ones that reprinted the current line. There were also commands to print a selected section or sections of the file on the typewriter (or printer) when necessary. An "edit cursor", an imaginary insertion point, could be moved by special commands that operated on line numbers or on specific text strings (context). Later the context strings were extended to regular expressions. To see the changes, the file had to be printed on the printer. These "line-based text editors" were considered revolutionary improvements over keypunch machines. Where typewriter-based terminals were not available, the editors were adapted to keypunch equipment. In this case the user needed to punch the commands into a separate deck of cards and feed them into the computer in order to edit the file. When computer terminals with video screens became inexpensive, screen-based text editors became common. One of the earliest "full screen" editors is vi, which is still a standard editor for Unix and Linux operating systems. The productivity of editing using these editors (compared to the line-based editors) motivated many of the early purchases of video terminals.

Types of text editors

Text editors geared for professional computer users place no limit on the size of the file being opened. In particular, they start quickly even when editing large files, and can edit files that are too large to fit the computer's main memory. Simpler text editors often just read files into an array in RAM. On larger files, this is slow, and very large files often do not fit. The ability to read and write very large files is needed by many professional computer users. For example, system administrators may need to read long log files. Programmers may need to change large source code, or examine naturally large texts, such as an entire dictionary placed in a single file. Some text editors include specialized computer languages to customize the editor (programmable editors). For example, EMACS can be customized by programming in Lisp. These usually permit the editor to simulate the keystroke combinations and features of other editors, so that users don't have to learn the native command combinations. Another important class of programmable editors uses REXX as its scripting language. They permit entering both commands and REXX statements directly in a command line at the bottom of the screen (which can be hidden and activated by a keystroke). This class of editors is usually called "orthodox editors", and most representatives of this class are derivatives of Xedit, IBM's editor for VM/CMS. Among them are THE (a high quality open source editor), Kedit, Slickedit, X2, Uni-edit and Sedit. Some vi derivatives such as VIM also support folding and a macro language, and have a command line at the bottom for entering commands. They can be considered another branch of the orthodox editor family. Many text editors for software developers include source code syntax highlighting and automatic completion to make programs easier to read and write. Programming editors often permit one to select the name of a subprogram or variable, and then jump to its definition and back.
Often an auxiliary utility, such as ctags, is used to locate the definitions.

Some editors include special features and extra functions, for instance:
- Source code editors
- Folding editors. This subclass includes so-called "orthodox editors" that are derivatives of Xedit. The specialized version of folding is usually called outlining (see below).
- IDEs
- HTML editors
- Outliners. Folding can generally be considered a generalized form of outlining.

These are packages with text editors included, usually with extra functionality.

See also


- Editor war
- List of text editors
- Comparison of text editors
- Collaborative real-time editor

External links


- [http://TextEditors.org The text editor wiki]
- [http://sourceforge.net/softwaremap/trove_list.php?form_cat=63 Text Editors at sourceforge.net]
- [http://www.softpanorama.org/Editors/index.shtml Orthodox Editors as a Special Class of Advanced Editors] Discusses Xedit and its clones with an emphasis of folding capabilities and programmability.



C programming language

The C programming language is a standardized imperative computer programming language developed in the early 1970s by Dennis Ritchie for use on the Unix operating system. It has since spread to many other operating systems, and is one of the most widely used programming languages. C is prized for its efficiency, and is the most popular programming language for writing system software, though it is also used for writing applications. It is also commonly used in computer science education, despite not being designed for novices.

Features

Overview

C is a relatively minimalist programming language that operates close to the hardware, and is more similar to assembly language than to most high-level languages. Indeed, C is sometimes referred to as "portable assembly", reflecting its important difference from low-level languages such as assembly languages: C code can be compiled to run on almost any computer, more than any other language in existence, while any given assembly language runs on at most a few very specific models of computers. For these reasons C has been called a medium-level language. C was created with one important goal in mind: to make it easier to write large programs with fewer errors in the procedural programming paradigm, but without encumbering the writer of the C compiler by complex language features. To this end, C has the following important features:
- A simple core language, with important functionality such as math functions or file handling provided by sets of library routines instead
- Focus on the procedural programming paradigm, with facilities for programming in a structured style
- A simple type system which prevents many operations that are not meaningful
- Use of a preprocessor language, the C preprocessor, for tasks such as defining macros and including multiple source code files
- Low-level unchecked access to computer memory via the use of pointers
- A minimalistic set of keywords
- Parameters that are passed either by value or by reference via an explicitly managed pointer.
- Function pointers, which allow for a rudimentary form of closures and polymorphism
- Lexical variable scoping
- Records, or user-defined aggregate datatypes (structs) which allow related data to be combined and manipulated as a whole

Some features that C lacks that are found in other languages include:
- Type safety
- Automatic garbage collection
- Classes or objects with behavior (see object-oriented programming)
- An advanced type system
- Closures
- Nested functions
- Generic programming
- Overloading and operator overloading
- Native support for multithreading and networking

Although the list of useful features C lacks is long, this has in a way been important to its acceptance, because it allows new compilers to be written quickly for it on new platforms, and because it keeps the programmer in close control of what the program is doing. This is what often allows C code to run more efficiently than many other languages. Typically only hand-tuned assembly language code runs faster, since it has full control of the machine, but advances in C compilers, and new complexity in modern processors, have gradually narrowed this gap. One consequence of C's wide acceptance and efficiency is that the compilers, libraries, and interpreters of other higher-level languages are often implemented in C.
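
Several of the features listed earlier (structs, pass-by-reference through an explicit pointer, and function pointers) can be illustrated in a few lines. The following is a minimal sketch rather than canonical usage; every name in it (point, point_translate, int_op, square, negate, apply_all) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A record (struct) that groups two related values. */
struct point {
    int x;
    int y;
};

/* Pass "by reference": the caller explicitly hands over a pointer,
   so the function can modify the caller's struct. */
void point_translate(struct point *p, int dx, int dy)
{
    assert(p != NULL);
    p->x += dx;
    p->y += dy;
}

/* A function-pointer type: any function taking an int and returning an int. */
typedef int (*int_op)(int);

int square(int n) { return n * n; }
int negate(int n) { return -n; }

/* Applies op to every element in place. Which operation runs is decided
   by the caller at run time, a rudimentary form of polymorphism. */
void apply_all(int *values, size_t count, int_op op)
{
    assert(values != NULL && op != NULL);
    for (size_t i = 0; i < count; i++) {
        values[i] = op(values[i]);
    }
}
```

A caller would write point_translate(&p, 3, 4) to move a point, and apply_all(v, 3, square) to square each element of a three-element array v. The memory access through p and values is unchecked: nothing in the language stops a caller from passing a count larger than the array.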

"hello, world" example

The following simple application appeared in the first edition of K&R, and has become a standard introductory program in most programming textbooks, regardless of language. The program prints out "hello, world" to standard output, which is usually a terminal or screen display. However, it might be a file or some other hardware device, including the bit bucket, depending on how standard output is mapped at the time the program is executed.

main()
{
    printf("hello, world\n");
}
The above program will compile correctly on most modern compilers that are not in compliance mode. However, it produces several warning messages when compiled with a compiler that conforms to the ANSI C standard. Additionally, the code will not compile if the compiler strictly conforms to the C99 standard, as a return value of type int will no longer be assumed if the source code has not specified otherwise. These messages can be eliminated with a few minor modifications to the original program:
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}
What follows is a line-by-line analysis of the above program:
#include <stdio.h>
This first line of the program is a preprocessing directive, #include. This causes the preprocessor — the first tool to examine source code when it is compiled — to substitute for that line the entire text of the file or other entity to which it refers. In this case, the header stdio.h — which contains the definitions of standard input and output functions — will replace that line. The angle brackets surrounding stdio.h indicate that stdio.h can be found using an implementation-defined search strategy. Double quotes may also be used for headers, thus allowing the implementation to supply (up to) two strategies. Typically, angle brackets are used for headers supplied by the implementation, and double quotes for "in-house" headers.
int main(void)
This next line indicates that a function named main is being defined. The main function serves a special purpose in C programs. When they are executed, main() is the first function called. The portion of the code that reads int indicates that the return value — the value to which the main function will evaluate — is an integer. The portion that reads (void) indicates that the main function takes no arguments. See also void.

}
This closing curly brace indicates the end of the code for the main function.

If the above code were compiled, it would do the following:


- Print the string "hello, world" onto the standard output device (typically but by no means always a terminal),
- Move the current position indicator to the beginning of the next line,
- Then return the integer zero to the application's executor.

Types

C has a type system similar to that of other ALGOL descendants such as Pascal, although different in a number of ways. There are primitive types for integers of various sizes, both signed and unsigned, floating-point numbers, characters, and enumerated types (enum). There are also derived types including arrays, pointers, records (struct), and untagged unions (union). C makes extensive use of pointers, a very simple type of reference that records, in effect, the address or location of an object in memory. Pointers can be dereferenced to access the data stored at the underlying address. Pointers can be freely manipulated, using normal assignments and also pointer arithmetic. The run-time representation of a pointer value is typically a raw memory address, but at compile time, a pointer variable's type includes the type of the data pointed to, which allows expressions including pointers to be type-checked. Pointers are used for many different purposes in C. Text strings are commonly manipulated using pointers into arrays of characters. Dynamic memory allocation, which is described below, is performed using pointers. It is also possible to use pointers to functions. A null pointer is a reserved pointer value that points to no valid location. (Dereferencing a null pointer is therefore meaningless.) Null pointers are useful for indicating special cases such as the next pointer in the final node of a linked list, or as an error return from functions that return pointers. Pointers to type void also exist, and point to objects of unknown type. A void pointer is therefore used as a "generic pointer" (see also generic programming). Since the size and type of the pointed-to object is not known, void pointers cannot be dereferenced, nor is pointer arithmetic on them possible, but they can be easily (and in fact implicitly) converted to and from any other object pointer type. 
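The roles of null pointers and void pointers described above can be sketched briefly. This is a minimal illustration, not a canonical idiom: struct node, list_sum and read_int are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Hypothetical singly linked list node; a null next pointer marks the end. */
struct node {
    int value;
    struct node *next;
};

/* Walks the list until the null pointer in the final node is reached. */
int list_sum(const struct node *head)
{
    int total = 0;
    const struct node *p;
    for (p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}

/* A void pointer carries no type information, so it must be converted
   back to a typed pointer before it can be dereferenced. */
int read_int(const void *generic)
{
    const int *typed = generic;   /* implicit conversion from void * */
    return *typed;
}
```

Passing NULL to list_sum yields 0 because the loop body never runs. read_int is only correct when the caller really passes the address of an int, which the compiler cannot verify.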
Traditionally, array types in C were always one-dimensional and of a fixed, static size specified at compile time. (The latest "C99" standard does allow some forms of variable-length arrays.) However, it is also perfectly straightforward to allocate a block of memory (of arbitrary size) at run-time using the standard library and treat it as an array. C's unification of arrays and pointers (see below) means that true arrays and these dynamically-allocated, simulated arrays are virtually interchangeable. However, since arrays are always accessed (in effect) via pointers, array accesses are typically not checked against the underlying array size. Array bounds violations are therefore possible and rather common (see also the "Criticism" section below), and can lead to the usual sorts of repercussions: illegal memory accesses, corruption of data, run-time exceptions, etc. C has no built-in support for multidimensional arrays, but since the type system is recursive it is straightforward to declare an array of arrays, which accomplishes approximately the same thing. The index values of the resulting "multidimensional array" can be thought of as flowing in row-major order. C is often used in low-level systems programming, where "escapes" from the type system may be necessary. The compiler attempts to ensure type correctness of most expressions, but the programmer can override the checks in various ways, either by using a typecast to explicitly convert a value from one type to another, or by using pointers or unions to reinterpret the underlying bits of a value in some other way. (The use of typecasts obviously sacrifices some of the safety normally provided by the type system.)
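As a rough illustration of the points above, the sketch below measures the row-major layout of an array of arrays and builds a run-time-sized "simulated array" with malloc(). The names ROWS, COLS, flat_offset and make_sequence are hypothetical.

```c
#include <stdlib.h>

#define ROWS 2
#define COLS 3

/* Returns the distance, in elements, of grid[r][c] from the start of the
   array of arrays. In row-major order this equals r * COLS + c. */
long flat_offset(int r, int c)
{
    static int grid[ROWS][COLS];
    const char *base = (const char *)grid;
    const char *elem = (const char *)&grid[r][c];
    return (long)((elem - base) / (long)sizeof(int));
}

/* Allocates count ints at run time and fills them with 0, 1, 2, ...;
   the result is indexed exactly like a true array.
   Returns NULL if the allocation fails. */
int *make_sequence(size_t count)
{
    size_t i;
    int *a = malloc(count * sizeof *a);
    if (a == NULL)
        return NULL;
    for (i = 0; i < count; i++)
        a[i] = (int)i;
    return a;
}
```

The caller owns the block returned by make_sequence and is expected to release it with free(). Nothing in either function checks that r, c or a later index stays in range.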

Unification of Arrays and Pointers

A unique feature of C is its treatment of arrays and pointers. The array-subscript notation x[i] can also be used when x is a pointer; the interpretation (using pointer arithmetic) is to access the ith of several adjacent data objects pointed to by x. (Formally, x[i] is equivalent to *(x + i).) Also, when the name of an array appears in an expression, a pointer to the array's first element is automatically derived and used thereafter: this means that arrays are never copied, or passed as arguments to functions, as a whole; rather, only a pointer is copied or passed. (A consequence is that although C's function calls use pass-by-value semantics, arrays seem to be passed by reference.)
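The equivalence between subscripting and pointer arithmetic, and the way arrays reach functions as pointers, can be sketched as follows. The function names sum, clear_first and same_element are hypothetical.

```c
#include <stddef.h>

/* An array argument arrives as a pointer to its first element, so the
   element count has to be passed separately. */
int sum(const int *x, size_t count)
{
    int total = 0;
    size_t i;
    for (i = 0; i < count; i++)
        total += *(x + i);          /* same as x[i] */
    return total;
}

/* Writes through the pointer; the caller's array is modified, which is
   why arrays appear to be passed by reference. */
void clear_first(int *x)
{
    x[0] = 0;
}

/* Returns 1 if subscripting and pointer arithmetic name the same element. */
int same_element(const int *x, size_t i)
{
    return &x[i] == x + i;
}
```

Because only a pointer is passed, sum(a + 1, 3) is also valid and adds up the last three elements of a four-element array a.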

Data storage

One of the most important functions of a programming language is to provide facilities for managing memory and the objects that are stored in memory. C provides three distinct ways to allocate memory for objects:
- Static memory allocation: space for the object is provided in the binary at compile-time; these objects have an extent (or lifetime) as long as the binary which contains them exists
- Automatic memory allocation: temporary objects can be stored on the stack, and this space is automatically freed and reusable after the block they are declared in is left
- Dynamic memory allocation: blocks of memory of any desired size can be requested at run-time using the library function malloc() from a region of memory called the heap; these blocks are reused after the library function free() is called on them

These three approaches are appropriate in different situations and have various tradeoffs. For example, static memory allocation has no allocation overhead, automatic allocation has a small amount of overhead during initialization, and dynamic memory allocation can potentially have a great deal of overhead for both allocation and deallocation. On the other hand, stack space is typically much more limited than either static memory or heap space, and only dynamic memory allocation allows allocation of objects whose size is only known at run-time. Most C programs make extensive use of all three. Where possible, automatic or static allocation is usually preferred because the storage is managed by the compiler, freeing the programmer of the error-prone hassle of manually allocating and releasing storage. Unfortunately, many data structures can grow in size at runtime; since automatic and static allocations must have a fixed size at compile-time, there are many situations in which dynamic allocation must be used. Variable-sized arrays are a common example of this (see "malloc" for an example of dynamically allocated arrays).
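The three kinds of allocation can be seen side by side in a short sketch. The names call_count, count_calls and copy_string are hypothetical and serve only to illustrate the distinction.

```c
#include <stdlib.h>
#include <string.h>

/* Static allocation: one counter that exists for the whole program run. */
static int call_count = 0;

int count_calls(void)
{
    int local = 1;          /* automatic: created anew on every call */
    call_count += local;
    return call_count;
}

/* Dynamic allocation: the copy outlives this function and must be
   released by the caller with free(). Returns NULL on failure. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;       /* size known only at run time */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

The value of call_count persists between calls, local disappears when count_calls returns, and the block returned by copy_string stays allocated until the caller frees it.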

Syntax

Main article: C syntax

Unlike languages like Fortran 77, C is free-form, allowing programmers to use arbitrary whitespace (rather than rigid lines) in laying out their code. Comments can be included either between the delimiters /* and */, or (in C99) following // until the end of the line. Each source file contains declarations and function definitions. Function definitions, in turn, contain declarations and statements. Declarations either define new types using keywords such as struct, union, and enum, or assign types to and reserve storage for new variables, usually by writing the type followed by the variable name. Keywords such as char and int, as well as the pointer-to symbol *, specify built-in types. Sections of code are enclosed in braces ({ and }) to indicate the extent to which declarations and control structures apply. As an imperative language, C depends on statements to do most of the work. Most statements are expression statements which simply cause an expression to be evaluated and, in the process, cause variables to receive new values or values to be printed. Control-flow statements are also available for conditional or iterative execution, constructed with reserved keywords such as if, else, switch, do, while, and for. Arbitrary jumps are possible with goto. A variety of built-in operators perform primitive arithmetic, logical, comparative, bitwise, and array indexing operations and assignment. Expressions can also call functions, including a large number of standard library functions, for performing many common tasks.

Criticism

A popular saying, repeated by such notable language designers as Bjarne Stroustrup, is that "C makes it easy to shoot yourself in the foot" [http://www.research.att.com/~bs/bs_faq.html#really-say-that]. In other words, C permits many operations that are generally not desirable, and thus many simple errors made by a programmer are not detected by the compiler or even when they occur at runtime. This leads to programs with unpredictable behavior and security holes. Part of the reason for this permissiveness is to avoid compile- and run-time checks that were too expensive when C was originally designed. Another reason is the desire to keep C as efficient and flexible as possible; the more powerful a language, the more difficult it is to prove things about programs written in it. Some checks were also relegated to external tools, such as those discussed in Compiler-external static-checking tools below. The safe C dialect Cyclone addresses some of these problems.

Memory allocation

One problem with C is that automatically and dynamically allocated objects are not initialized; they initially have whatever value is present in the memory space they are assigned. This value is highly unpredictable, and can vary between two machines, two program runs, or even two calls to the same function. If the program attempts to use such an uninitialized value, the results are usually unpredictable. Many modern compilers try to detect and warn about this problem, but both false positives and false negatives occur. Another common problem is that heap memory cannot be reused until it is explicitly released by the programmer with free(). The result is that if the programmer accidentally forgets to free memory, but continues to allocate it, more and more memory will be consumed over time. This is called a memory leak. Conversely, it is possible to release memory too soon, and then continue to use it. Because the allocation system can reuse the memory at any time for unrelated reasons, this results in insidiously unpredictable behavior. These issues in particular are ameliorated in languages with automatic garbage collection.
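Some of these hazards can be reduced by convention. The sketch below shows one possible discipline, not a standard requirement: allocate with calloc() so the memory starts out zeroed, and reset a pointer to NULL after freeing it. The names new_zeroed and release are hypothetical.

```c
#include <stdlib.h>

/* Returns a zero-initialized array of count ints, or NULL on failure.
   Unlike malloc(), calloc() clears the memory it returns. */
int *new_zeroed(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Frees *p and resets it to NULL so that later accidental use is easier
   to detect; free(NULL) is defined to do nothing, so a second call
   through the same pointer is harmless. */
void release(int **p)
{
    free(*p);
    *p = NULL;
}
```

This only protects the one pointer variable passed to release. Any other copies of the same address still dangle, so the convention reduces the problem without removing it.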

Pointers

Pointers are one primary source of danger; because they are unchecked, a pointer can be made to point to any object of any type, including code, and then written to, causing unpredictable effects. Although most pointers point to safe places, they can be moved to unsafe places using pointer arithmetic, the memory they point to may be deallocated and reused (dangling pointers), they may be uninitialized (wild pointers), or they may be directly assigned any value using a cast or through another corrupt pointer. Another problem with pointers is that C freely allows conversion between any two pointer types. Other languages attempt to address these problems by using more restrictive reference types.

Arrays

Although C has native support for static arrays, it does not verify that array indexes are valid (bounds checking). For example, one can write to the sixth element of an array with five elements, yielding generally undesirable results. This is called a buffer overflow. This has been notorious as the source of a number of security problems in C-based programs. On the other hand, since bounds checking elimination technology was largely nonexistent when C was defined, bounds checking came with a severe performance penalty, particularly in numerical computation. It was also believed to be inconsistent with C's minimalist approach. Multidimensional arrays are necessary in numerical algorithms (mainly from applied linear algebra) to store matrices. The structure of the C array is very well adapted and fit for this particular task, provided one is prepared to count one's indices from 0 instead of 1. This issue is discussed in the book Numerical Recipes in C, Chap. 1.2, page 20 ff ([http://www.library.cornell.edu/nr/bookcpdf/c1-2.pdf read online]). In that book there is also a solution based on negative addressing which introduces other dangers.
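Since the language performs no bounds checking itself, a program that wants it has to do the check explicitly. A minimal sketch, using the hypothetical names LEN and checked_store:

```c
#include <stddef.h>

#define LEN 5

/* Stores value at index i only if i is within bounds.
   Returns 1 on success and 0 if the index was rejected. */
int checked_store(int *array, size_t len, size_t i, int value)
{
    if (i >= len)
        return 0;       /* writing here would overflow the buffer */
    array[i] = value;
    return 1;
}
```

With a five-element array, index 4 is accepted and index 5 (the "sixth element") is refused. The extra comparison on every access is the kind of cost the paragraph above describes.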

Variadic functions

Yet another common source of problems is variadic functions, which take a variable number of arguments. Unlike other prototyped C functions, checking the arguments of variadic functions at compile-time is not mandated by the standard, and is impossible in general without additional information. If the wrong type of data is passed, the effect is unpredictable, and often fatal. Variadic functions also handle null pointer constants in an unexpected way. For example, the printf family of functions supplied by the standard library, used to generate formatted text output, is notorious for its error-prone variadic interface, which relies on a format string to specify the number and type of trailing arguments. Type-checking of variadic functions from the standard library is a quality of implementation issue, however, and many modern compilers do in particular type-check printf calls, producing warnings if the argument list is inconsistent with the format string. However, not all printf calls can be checked statically, since the format string can be built at runtime, and other variadic functions typically remain unchecked.
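A user-written variadic function shows why such calls are hard to check. The following sketch uses the standard <stdarg.h> macros; the function name sum_ints is hypothetical.

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up count int arguments.
   Nothing verifies that the caller really passed count values of
   type int; the function simply trusts its first argument. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);
    for (i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) works as intended. A call that passes fewer arguments than count claims, or a double where an int is expected, still compiles but has undefined behavior.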

Syntax

Although mimicked by many languages because of its widespread familiarity, C's syntax has been often targeted as one of its weakest points. For example, Kernighan and Ritchie say in the second edition of The C Programming Language, "C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better." Bjarne Stroustrup has also derided C++'s syntax, which is very similar to that of C: "Within C++, there is a much smaller and cleaner language struggling to get out. [...] the C++ semantics is much cleaner than its syntax." [http://www.research.att.com/~bs/bs_faq.html] Some specific problems worth noting are:
- A function prototype which does not specify any parameters actually implicitly allows any set of parameters, a syntax problem introduced for backward compatibility with K&R C, which lacked prototypes.
- Some questionable choices of operator precedence, as mentioned by Kernighan and Ritchie above, such as == binding more tightly than & and | in expressions like x & 1 == 0.
- The use of the = operator, used in mathematics for equality, to indicate assignment, leading to unintended assignments in comparisons and a false impression that assignment is transitive. Having = denote assignment and == denote equality was a deliberate decision by Ritchie, who noted that assignment occurs much more often than comparisons.
- A lack of infix operators for complex objects, particularly for string operations, making programs which rely heavily on these operations difficult to read.
- Heavy reliance on punctuation-based symbols even where this is arguably less clear, such as "&&" and "||" instead of "and" and "or".
- The un-intuitive declaration syntax, particularly for function pointers. In the words of language researcher Damian Conway speaking about the very similar C++ declaration syntax: "Specifying a type in C++ is made difficult by the fact that some of the components of a declaration (such as the pointer specifier) are prefix operators while others (such as the array specifier) are postfix. These declaration operators are also of varying precedence, necessitating careful bracketing to achieve the desired declaration. Furthermore, if the type ID is to apply to an identifier, this identifier ends up at somewhere between these operators, and is therefore obscured in even moderately complicated examples (see Appendix A for instance). The result is that the clarity of such declarations is greatly diminished." (Ben Werther & Damian Conway. [http://www.csse.monash.edu.au/~damian/papers/HTML/ModestProposal.html#section3.1.1 A Modest Proposal: C++ Resyntaxed]. Section 3.1.1. 1996.)
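Two of the items above, operator precedence and declaration syntax, can be made concrete with a short sketch. All names here (as_grouped, as_intended, negate, identity, raw_table, handler, table) are hypothetical. The precedence pitfall is written with explicit parentheses that mirror how a compiler groups x & 1 == 0, since many compilers warn about the unparenthesized form.

```c
/* How "x & 1 == 0" is actually grouped: the comparison happens first,
   so the result is x & 0, which is always 0. */
int as_grouped(int x)
{
    return x & (1 == 0);
}

/* What the programmer usually means: test whether the low bit is clear. */
int as_intended(int x)
{
    return (x & 1) == 0;
}

int negate(int x)   { return -x; }
int identity(int x) { return x; }

/* Declared directly: an array of 2 pointers to functions taking an int
   and returning an int. The identifier sits in the middle of the type. */
int (*raw_table[2])(int) = { negate, identity };

/* The same type built up with a typedef, which many find easier to read. */
typedef int (*handler)(int);
handler table[2] = { negate, identity };
```

For an even argument such as 4, as_intended returns 1 while as_grouped returns 0, which is exactly the kind of silent discrepancy the precedence rule produces.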

Maintenance problems

There are other problems in C that don't directly result in bugs or errors, but make it harder for inexperienced programmers to build a robust, maintainable, large-scale system. Examples of these include:
- A fragile system for importing definitions (#include) that relies on literal text inclusion and redundantly keeping prototypes and function definitions in sync, and drastically increases build times.
- A cumbersome compilation model that forces manual dependency tracking and inhibits compiler optimizations between modules (except by link-time optimization).
- A weak type system that lets many clearly erroneous programs compile without errors.
- The difficulty of creating opaque structures, which results in programs that tend to violate information hiding.
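Opaque structures are possible in C, though they take some care. One common approach, sketched here with a hypothetical counter type, is to expose only an incomplete struct declaration and a set of functions. In a real project the first half would typically live in a header and the second half in a separate source file; both are shown together so that the example is self-contained.

```c
#include <stdlib.h>

/* What a header would expose: an incomplete type plus functions.
   Callers can hold a pointer but cannot see or touch the fields. */
struct counter;
struct counter *counter_new(void);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* What the implementation file would contain: the actual layout. */
struct counter {
    int value;
};

/* Returns a new counter set to zero, or NULL if allocation fails. */
struct counter *counter_new(void)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_increment(struct counter *c) { c->value++; }
int counter_value(const struct counter *c) { return c->value; }
void counter_free(struct counter *c) { free(c); }
```

The price of this arrangement is that every object must be allocated dynamically and accessed through function calls, which is part of why the technique is considered cumbersome.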

Compiler-external static-checking tools

Tools have been created to help C programmers avoid these errors in many cases. Automated source code checking and auditing is fruitful in any language, and for C many such tools exist such as Lint. A common practice is to use Lint to detect questionable code when a program is first written. Once a program passes Lint, it is then compiled using the C compiler. There are also compilers, libraries and operating system level mechanisms for performing array bounds checking, buffer overflow detection and automatic garbage collection, that are not a standard part of C. Cproto is a program that will read a C source file and output prototypes of all the functions within the source file. This program can be used in conjunction with the "make" command to create new files containing prototypes each time the source file has been changed. These prototype files can be included by the original source file (e.g., as "filename.p"), which reduces the problems of keeping function definitions and source files in agreement. It should be recognized that these tools are not a panacea. Because of C's flexibility, some types of errors involving misuse of variadic functions, out-of-bound array indexing, and incorrect memory management cannot be detected on some architectures without incurring a significant performance penalty. However, some common cases can be recognized and accounted for.

History

Early developments

The initial development of C occurred at AT&T Bell Labs between 1969 and 1973; according to Ritchie, the most creative period occurred in 1972. It was named "C" because many of its features were derived from an earlier language called "B". Accounts differ regarding the origins of the name "B": Ken Thompson credits the BCPL programming language, but he had also created a language called Bon in honor of his wife Bonnie. There are many legends as to the origin of C and its related operating system, Unix, including:
- The development of C was the result of the programmers' desire to play [http://cm.bell-labs.com/cm/cs/who/dmr/spacetravel.html Space Travel]. They had been playing it on their company's mainframe, but being underpowered and having to support about 100 users, Thompson and Ritchie found they didn't have sufficient control over the spaceship to avoid collisions with the wandering space rocks. Thus, they decided to port the game to an idle PDP-7 in the office. But it didn't have an operating system (OS), so they set about writing one. Eventually they decided to port the operating system to the office's PDP-11, but this was onerous since all the code was in assembly language. They decided to use a higher-level portable language so the OS could be ported easily from one computer to another. They looked at using B, but it lacked functionality to take advantage of some of the PDP-11's advanced features. So they set about creating the new language, C.
- The justification for obtaining the original computer that was used to develop Unix was to create a system to automate the filing of patents. The original version of Unix was developed in assembly language. Later, the C language was developed in order to rewrite the operating system.

By 1973, the C language had become powerful enough that most of the Unix kernel, originally written in PDP-11/20 assembly language, was rewritten in C. This was one of the first operating system kernels implemented in a language other than assembly, earlier instances being the Multics system (written in PL/I), and MCP (Master Control Program) for Burroughs B5000 written in ALGOL in 1961.

K&R C

In 1978, Ritchie and Brian Kernighan published the first edition of The C Programming Language. This book, known to C programmers as "K&R", served for many years as an informal specification of the language. The version of C that it describes is commonly referred to as "K&R C." (The second edition of the book covers the later ANSI C standard, described below.) K&R introduced the following features to the language:
- struct data types
- long int data type
- unsigned int data type
- The =+ operator was changed to +=, and so forth (=+ was confusing the C compiler's lexical analyzer; for example, i =+ 10 compared with i = +10).

K&R C is often considered the most basic part of the language that is necessary for a C compiler to support. For many years, even after the introduction of ANSI C, it was considered the "lowest common denominator" that C programmers stuck to when maximum portability was desired, since not all compilers were updated to fully support ANSI C, and reasonably well-written K&R C code is also legal ANSI C. In these early versions of C, only functions that returned a non-integer value needed to be declared before use. A function used without any previous declaration was assumed to return an integer.

Example call requiring previous declaration:
long int SomeFunction();

int CallingFunction()
{
    long int ret;
    ret = SomeFunction();
}
Example call not requiring previous declaration:
int SomeOtherFunction()
{
    return 0;
}

int CallingFunction()
{
    int ret;
    ret = SomeOtherFunction();
}

Since K&R function declarations did not include any information about function arguments, function parameter type checks were not performed, although some compilers would issue a warning message if a function was called with the wrong number of arguments.

In the years following the publication of K&R C, several "unofficial" features were added to the language, supported by compilers from AT&T and some other vendors. These included:
- void functions and the void * data type
- functions returning struct or union types
- struct field names in a separate name space for each struct type
- assignment for struct data types
- const qualifier to make an object read-only
- a standard library incorporating most of the functionality implemented by various vendors
- enumerations
- the single-precision float type
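Several of the additions listed above are now taken for granted. The sketch below uses a few of them together (an enumeration, a function returning a struct, a void function, the const qualifier and whole-struct assignment); the names color, next_color, point, make_point and copy_point are hypothetical.

```c
/* An enumeration with three named constants: RED = 0, GREEN = 1, BLUE = 2. */
enum color { RED, GREEN, BLUE };

/* Cycles RED -> GREEN -> BLUE -> RED. */
enum color next_color(enum color c)
{
    return (enum color)((c + 1) % 3);
}

struct point {
    int x;
    int y;
};

/* A function may return a struct by value. */
struct point make_point(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* A void function returns nothing; const marks the source as read-only. */
void copy_point(struct point *dst, const struct point *src)
{
    *dst = *src;    /* whole-struct assignment */
}
```

Before these features were available, code like copy_point would typically have copied each field individually.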

ANSI C and ISO C

During the late 1970s, C began to replace BASIC as the leading microcomputer programming language. During the 1980s, it was adopted for use with the IBM PC, and its popularity began to increase significantly. At the same time, Bjarne Stroustrup and others at Bell Labs began work on adding object-oriented programming language constructs to C. The language they produced, called C++, is now the most common application programming language on the Microsoft Windows operating system; C remains more popular in the Unix world. Another language developed around that time is Objective-C, which also adds object-oriented programming to C. Although it is now less popular than C++, it is used to develop Mac OS X's Cocoa applications. In 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. After a long and arduous process, the standard was completed in 1989 and ratified as ANSI X3.159-1989 "Programming Language C". This version of the language is often referred to as ANSI C, or sometimes C89 (to distinguish it from C99). In 1990, the ANSI C standard (with a few minor modifications) was adopted by the