Erlang: Functions (Part 1).

Functions are very powerful, or so you might have heard. As in many other languages, a function in Erlang takes arguments and returns a result, and that's a great way to abstract things. Let me show you what functions look like.

get_random_number() -> 4.

While it's a famous joke, this line of code illustrates a simple function that, despite its misleading name, would of course never return a random number. Instead, the result of this function is always 4.

To define a function you give it a name which is always an atom. In the example above it's get_random_number. But it could be literally any valid atom, for instance, '+' is also a valid function name.

The name has to be followed by zero or more patterns separated by commas; you wrap the patterns in parentheses. The number of patterns is called the arity, and it helps us distinguish functions that share the same name. If the arity is 0 (a nullary function) you have no patterns, but you still have to write the parentheses. This is how it's done in the example above.
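As a small sketch of how arity tells functions apart, the module below defines two functions that share the name answer but take a different number of patterns. The module name arity_demo and both functions are made up for illustration; the -module and -export lines are only there so the file compiles, and they are covered later in this article.

```erlang
-module(arity_demo).
-export([answer/0, answer/1]).

%% answer/0: nullary, no patterns, but the parentheses are still required.
answer() -> 42.

%% answer/1: same name, different arity, so it is a different function.
answer(X) -> X.
```

Calling arity_demo:answer() picks the first function and arity_demo:answer(7) picks the second; the number of arguments you pass is what decides.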

The combination of a function's name and its patterns is called a head (known as a signature in other languages). We separate the head from the body with that arrow thing (->) and, before you ask, the body is a sequence of expressions separated by commas. The value of the very last expression is always the return value; the values of all the other expressions are discarded. In the example above the body consists of a single expression, the constant number 4.
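To see a body with more than one expression, here is a minimal sketch. The module body_demo and the function sum_and_report are hypothetical names chosen for this example, and the -module and -export lines (explained later in this article) are included only so the file compiles.

```erlang
-module(body_demo).
-export([sum_and_report/2]).

sum_and_report(X, Y) ->
    Sum = X + Y,                        % bind an intermediate result
    io:format("Sum is ~p~n", [Sum]),    % prints; its own result (ok) is discarded
    Sum.                                % last expression: this is the return value
```

The call to io:format/2 is evaluated for its side effect only. Whatever it returns never reaches the caller, because it is not the last expression in the body.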

In many other languages you only have a single head and a single body per function. Not in Erlang. In fact, in Erlang a function consists of one or more clauses (each clause being a head plus a body), and those clauses are separated by semicolons (;).

The word "patterns" might be confusing at first. We call them patterns to make it clear they're more than just regular parameters. Keep in mind that patterns follow the same rules and tricks we learned from the match operator (=) and the case expression. For now, I don't really want to dig deeper, so these details will be discussed in later examples. Just notice that patterns live in the function's definition, while the real values you provide when calling a function are arguments. To complete the evaluation, BEAM needs to match the arguments against the patterns (this is how BEAM finds the right clause) and then execute the body with the variables bound to the actual values.
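Here is a rough sketch of clause selection by pattern matching. The module pattern_demo, the function describe and the {point, X, Y} tuple shape are all assumptions invented for this illustration; the -module and -export lines are explained later in this article.

```erlang
-module(pattern_demo).
-export([describe/1]).

%% Clauses are tried from top to bottom; the first head that matches wins.
describe(0) -> zero;                    % matches only the integer 0
describe({point, X, Y}) -> {X, Y};      % matches a 3-tuple tagged with point
describe(_Other) -> unknown.            % matches any other argument
```

Calling describe({point, 1, 2}) skips the first clause, matches the second one, binds X to 1 and Y to 2, and returns {1, 2}.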

In most natural languages you organize words into sentences and usually end them with a period (full stop). In Erlang, you always end a function with a period. While this might look pretty verbose to some of you, it has its advantages from the parsing point of view and affects how your code can be formatted. We will discuss it later in a dedicated article on parsers. For now, just think of the period as a good way to visually distinguish one function from another.

Knowing the above, we can try another example: a function that increments its only argument by 1.

succ(X) -> X + 1.

This function is called the successor function. Note that we're using an unbound variable as the pattern here, so the argument you provide when calling this function gets bound to the variable X. Please also note that we do not modify the original argument (which is X); instead, we return a new value equal to X incremented by 1.
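A quick way to convince yourself that the argument is left alone is the sketch below. The module succ_demo and the helper check/0 are hypothetical and exist only for this illustration; the -module and -export lines are explained later in this article.

```erlang
-module(succ_demo).
-export([succ/1, check/0]).

succ(X) -> X + 1.

%% Returns both the original value and the incremented one.
check() ->
    N = 5,
    M = succ(N),    % M is a new value; N is not touched
    {N, M}.
```

Calling succ_demo:check() returns {5,6}: N is still 5 after succ(N) has been evaluated.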

So far we have discussed what functions are, but it's still unclear why programmers really need them. Surprisingly, there is no quick answer, so let's go through it step by step. As time goes on, you will learn more properties of functions and find them useful.

One such property is that you can influence the computation by feeding the arguments straight into the function's body, effectively creating a mapping between the arguments and the result. This way of thinking (and organizing your code) is a good way to reach high-level abstractions. It's obviously much simpler to ask the machine to repeat some steps (configured by arguments) than to spell out those steps yourself each time. Another good property is that you can define your functions in terms of previously defined functions. Let's illustrate it.

multiply(X, Y) -> X * Y.

double(X) -> multiply(X, 2).

quadruple(X) -> double(double(X)).

This is of course a synthetic example, but it gives you a taste of the idea. We define a function called multiply, which is a simple binary (arity 2) wrapper over the multiplication operator. Then we define a unary (arity 1) function called double whose body is defined in terms of the previously defined multiply function. What we do here is call the multiply function with some real arguments; the syntax for calling a function is similar to declaring a clause's head: you use parentheses. The X argument of the multiply function depends on the real argument of the double function. And you can see that we "lock" the Y argument of the multiply function by passing the constant integer 2. The quadruple function is based on the idea of calling the double function twice; the argument of the outer double is the result of the inner double.

When we define new functions in terms of previously defined functions, we create a layer of abstraction. In this way you hide the exact implementation and provide a sort of contract for whoever calls your function. That is, you can now replace the body of the double function with an equivalent one and leave the quadruple function untouched.

double(X) -> X + X.

quadruple(X) -> double(double(X)).

This approach works for any program, from the synthetic ones you write for self-education to production-ready, real-world enterprise clusters. So mastering it at a very early stage is essential for any successful programmer.

You will soon notice that patterns don't always have to be unbound variables. Sometimes you just need a clause whose pattern confirms some fact about the argument. To illustrate this, let's define our own version of the not operator. You can see how the operator works in the following example.

1> not true.
false
2> not false.
true

This unary operator just inverts its boolean operand, so to define the equivalent function we would write this:

my_not(true) -> false;
my_not(false) -> true.

This function consists of 2 clauses. The first clause checks whether the only argument is equal to the atom true, and if it is, we return false. The same goes for the second clause, except that we check for the atom false and return true in that case. In languages where you can only have a single clause per function you would do something like this:

my_not_v2(X) ->
    case X of
        true -> false;
        false -> true
    end.

This function is equivalent to the previous one, except that we have to introduce a variable called X.

So far we have been discussing how to define functions; now let's discuss how to organise them in modules, the basic units where we store our functions (and other stuff too). Since this article is about functions, we will only cover the essential basics of modules. For those of you who strive to learn as much as possible before moving on to other topics, this link might be useful.

From the technical point of view, modules are just text files. You could have a module that's full of utility functions which operate on lists, or a module that consists of functions for numeric computations, and so on. It's up to you how to organize your functions into modules, but it's always recommended to group functions logically and store related ones together.

To define a module you give it a name (which is always the hardest thing to do) and add the .erl extension to the file. The name has to be a valid atom, and while it's technically possible to name your module '123', I wouldn't recommend it since it would be incredibly hard to figure out the module's purpose. Instead, try some convenient names like utils, numbers, tools and so on.

Now let's organize our previously defined double and quadruple functions into a module. Create a file lesson1.erl, open it in your editor of choice and type this:

-module(lesson1).
-export([double/1, quadruple/1]).

multiply(X, Y) -> X * Y.

double(X) -> multiply(X, 2).

quadruple(X) -> double(double(X)).

Each module starts with a sequence of attributes (a new thing to learn!). Each attribute starts with the minus sign (-) followed by a valid atom and parentheses (where you put some value); as with functions, you terminate attributes with a period. You separate attributes and functions with whitespace, and while it's technically possible to place several attributes and functions on the same line, I wouldn't recommend it. Instead, place each attribute and each function on its own line. I also encourage you to put a blank line between functions.

The very first attribute of any module should be the module attribute. Its value is always an atom, and that atom is the module name without the .erl extension.

The second attribute is the export attribute; its value is a list of functions to export, written in the notation name/arity and separated by commas. Notice that we only export the double and quadruple functions and leave out the multiply function. In Erlang each function is local to the module where it's defined, hence you can only call it from within that module; to make a function public you need to explicitly list it in the export attribute.

And of course we define our functions! As mentioned above, the multiply function is local (private), while the double and quadruple functions are public (hence available from outside the module).

Now start a new erl session (from the directory where you have your lesson1.erl module) and try the c command.

1> c(lesson1).
{ok,lesson1}
2> lesson1:double(5).
10

The c command compiles your lesson1 module and is only available inside an erl session. If everything is OK with the compilation, you will see {ok,lesson1} as the result, and the lesson1.beam file will appear in the same directory as your lesson1.erl file.

To call a function defined in a module you would need to specify the module name followed by a colon and the function name and provide some arguments. That's what lesson1:double(5) does.
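The difference between local and exported functions can be sketched with a separate, hypothetical module. The names visibility_demo and triple are made up for this illustration and are not part of the lesson1 module.

```erlang
-module(visibility_demo).
-export([triple/1]).

%% Local: not listed in -export, so only code inside this module can call it.
multiply(X, Y) -> X * Y.

%% Public: listed in -export, so it is callable as visibility_demo:triple(N).
triple(X) -> multiply(X, 3).
```

After compiling it with c(visibility_demo), the call visibility_demo:triple(3) returns 9, while visibility_demo:multiply(2, 3) fails with an undefined function (undef) error, even though multiply clearly exists inside the module.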

Of course, the main idea is that you define functions organized in modules and then call them from other modules, building higher-level abstractions. Try it yourself and then move on to the next article.

P.S.: Don't forget to inspect and download the sources.