Chapter 16. Static typing

Description

Obix is a statically typed language.

Rules

  1. The type of object that can be assigned to an object reference (attribute, input argument, output argument, script constant or script variable) must always be explicitly specified in the source code.

    The following example shows how the type of object references is explicitly declared:

    service static_typing_examples
    
       attribute foo type:string default:"foo" end // type of attribute is 'string'
    
       command example_1
          in foo type:character end        // type of input argument is 'character'
          out bar type:positive32 end      // type of output argument is 'positive32'
    
          script
             const zero_positive32 tar = 0 // type of constant is 'zero_positive32'
             var yes_no zar = yes          // type of variable is 'yes_no'
             // ...
          end
       end
    
    end service

  2. If an object reference R is of type T then only objects of type T or child types of type T (i.e. types that are compatible with type T) can be assigned to R.

    This rule repeats the type compatibility rule discussed in Chapter 15, Type inheritance. Please refer to that chapter for additional explanations and examples.

  3. If the type of an object cannot be known at compile-time, then the case type of instruction can be used to check the type at run-time and execute appropriate instructions.

    Please refer to the section called “case type of instruction” for additional explanations and examples.
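
    As a rough analogy for readers who know Java, selecting instructions according to the run-time type of an object can be sketched there with instanceof. This is Java, not Obix, and it makes no claim about the syntax of the case type of instruction; the class CaseTypeSketch and the method describe are hypothetical names.

    ```java
    // Java analogy only (not Obix): choosing instructions by run-time type.
    // The class and method names are hypothetical.
    class CaseTypeSketch {

        // The static type of 'value' is Object, so its concrete type is
        // only known at run time and has to be checked explicitly.
        static String describe(Object value) {
            if (value instanceof Integer) {
                return "integer " + value;
            } else if (value instanceof String) {
                return "string \"" + value + "\"";
            } else {
                return "object of another type";
            }
        }

        public static void main(String[] args) {
            System.out.println(describe(5));
            System.out.println(describe("5"));
            System.out.println(describe(2.5));
        }
    }
    ```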

Rationale

Static typing contributes substantially to better code, fewer bugs, and increased maintainability, as explained below.

  • Static typing increases source code understandability.

    The obligation to declare the type of an object reference often makes source code easier to understand, especially source code written by somebody else.

    Suppose, for example, that a script declares a variable to store the quality of products.

    If Obix were a dynamically typed language, the variable declaration could look like this:

    var quality_level

    We can deduce from the above instruction that the level of quality is stored in variable quality_level, but we can't see how the information is stored.

    On the other hand the statically typed instruction:

    var zero_positive32 quality_level

    tells us immediately that the level of quality is stored as an integer value that is either zero or positive.

  • Static typing supports the "fail fast!" principle, and therefore increases reliability and maintainability.

    Static typing enables the compiler to reliably detect errors that would otherwise only produce runtime errors or, even worse, produce wrong results that are sometimes detected late, after the application is already in production.

    It is best to illustrate this with an example.

    Let's continue with our previous example of software that handles the quality of products. Suppose we want to express the quality as an integer value ranging from 0 (very bad) to 9 (very good) and store the qualities of several products in a list.

    As we have seen already, if Obix were a dynamically typed language, a variable declaration holding the quality of a product would look like this:

    var quality_level

    Let's see what could happen next, still supposing Obix were a dynamically typed language.

    If a product has a quality level of 5, we could write:

    quality_level = 5

    But we could also write:

    quality_level = "5"

    The compiler wouldn't generate an error, although our intention was to store qualities as integers, not as strings. Even worse, no runtime error would be generated either.

    Nonsense instructions like the following ones would also pass without producing a compile- or run-time error:

    quality_level = "ok"
    quality_level = "I don't know"
    quality_level = yes
    quality_level = fa_customer.create ( identifier = "123"; name = "Bob" )

    Note

    One might argue that nobody would ever write silly instructions like the ones above. However, we are deliberately looking at the simplest instructions that convey the idea. In a real application, the value of quality_level would more likely be produced by another routine (possibly in another library) that returns a value. Obviously, this makes it more difficult to spot the error with a quick look at the source code. Experience shows that silly programming errors of the above kind are much more frequent than one would imagine, especially in big applications written by many programmers.

    Let's now store some values in a list. The code to create a list and store a value would be:

    var list = fa_list.co_create
    list.append ( quality_level )

    Once again we have a problem. Because the type of objects in the list is not specified, any value can be stored in the list, including all the silly values shown previously. No compile-time or run-time error is generated!

    The consequences can be serious. Suppose, for instance, we want to count how many products have quality level 5. The code is:

    var count = 0
    repeat for each quality in list
       if quality =v 5 then
          count = count + 1
       end if
    end repeat

    How will the application behave when the instruction if quality =v 5 is executed and the string value "5" is retrieved from the list? It depends on the compiler! The application could generate a runtime error, because "5" cannot be compared to 5. But in a dynamically typed language the application would more likely continue execution and decide that "5" is not equal to 5. Or it could silently convert 5 to "5", in which case the values would be considered equal.

    The awful fact that "it depends on the compiler", and that the boolean expression quality =v 5 can produce a runtime error, true, or false, is of course unacceptable. In a well-designed language, nothing should ever depend on the internals of the compiler, because that makes the programmer's difficult life even more difficult. Moreover, an application's behavior could change if it is compiled with another compiler that applies different internal rules!
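
    The second of the three outcomes can be reproduced in Java, used here only as a stand-in because it is compiled and widely known. The sketch below assumes a list whose element type is Object, which gives up the compile-time check on element types in much the same way as the hypothetical dynamically typed list above. The class UntypedListSketch and the method countFives are hypothetical names chosen for illustration; this is not Obix code.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Java stand-in for the dynamically typed scenario (not Obix).
    // Class and method names are hypothetical.
    class UntypedListSketch {

        // Counts the elements that are equal to the integer 5.
        static int countFives(List<Object> qualities) {
            int count = 0;
            for (Object quality : qualities) {
                // Integer.equals returns false for any non-Integer argument,
                // so the string "5" is silently skipped.
                if (Integer.valueOf(5).equals(quality)) {
                    count++;
                }
            }
            return count;
        }

        public static void main(String[] args) {
            List<Object> qualities = new ArrayList<>();
            qualities.add(5);
            qualities.add("5"); // accepted, because the element type is Object
            qualities.add(5);
            System.out.println(countFives(qualities)); // prints 2, not 3
        }
    }
    ```

    No error is reported at compile time or at run time; the count is simply one too low.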

    It is not difficult to find other examples of wrong results. Suppose we want to display the list on screen, but instead of displaying values from 0 to 9, we want to display values ranging from 1 to 10:

    repeat for each quality in list
       console.message ( quality + 1 )
    end repeat

    What happens this time if a string value of "5" is in the list? Once again, it depends on the compiler! Most probably, the compiler would silently convert the integer 1 to the string "1", and the value displayed would be "51", which is the result of concatenating the strings "5" and "1"!

    Note

    Although Java is not a dynamically typed language, the above behavior is indeed built into the language. To prove this, it is sufficient to execute the following statements:

    String quality_level = "5";
    System.out.println ( quality_level + 1 );

    The compiler (Sun's compiler of Java version 6) does not complain, and the result displayed is 51!

    The good news is that none of these problems can occur in Obix: because types are checked statically and values of different types are never converted implicitly, the compiler detects all of these errors immediately.

    The above code rewritten in real Obix is shown below, and the comments explain how all previously encountered errors are immediately caught at compile-time:

    service static_typing_examples
    
       command example_2
          script
             var zero_positive32 quality_level
             quality_level = 5      // ok
             // quality_level = "5" // refused by compiler, because 'quality_level' can only hold 'zero_positive32' objects
    
             // declare mutable indexed list that contains 'zero_positive32' objects
             var !mutable_indexed_list<zero_positive32> list = !mutable_indexed_list_factory<zero_positive32>.co_create
    
             list.append ( quality_level ) // ok
             // list.append ( "5" )        // refused by compiler, because list can only contain 'zero_positive32' objects
    
             var zero_positive32 count = 0
             repeat for each zero_positive32 quality in list
                if quality =v 5 then // ok
                // if "5" =v 5 then  // refused by compiler, because 2 different types cannot be compared without explicit conversion
                  count = count + 1
                end if
    
                console.message ( (quality + 1).to_string ) // ok, because the result of the addition is explicitly converted
                                                            // into a 'string'. there is no ambiguity.
             // console.message ( "5" + 1 ) // refused by compiler, because a 'zero_positive32' object cannot be 
                                            // appended to a 'string' object without explicit conversion
    
             end repeat
          end
       end
    
    end service

    One last remark remains to be made. We initially specified that the quality level ranges from 0 to 9. However, in the above code we use type zero_positive32, which means that values above 9 would also be accepted. The instruction quality_level = 100 wouldn't generate a compile-time or run-time error.

    To solve this problem, the best solution is to define a new type product_quality_level as follows:

    type product_quality_level 
    
       inherit zero_positive32
          attribute value and_check: i_value <= 9.value end
       end
    
    end

    Note

    The above code uses feature redefinition, which is explained in Chapter 18, Feature redefinition. Please refer to that chapter for further explanations.
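
    For comparison, a range-checked type of this kind can be approximated in Java as sketched below. This is not a translation of the Obix mechanism: the sketch wraps an int in a separate class instead of inheriting from an integer type, and it checks the range when the object is created. The class name ProductQualityLevel and its value method are hypothetical.

    ```java
    // Java approximation of a range-checked quality type (not Obix).
    // The class name and its API are hypothetical.
    final class ProductQualityLevel {

        private final int value;

        ProductQualityLevel(int value) {
            // The range is checked at run time, when the object is created.
            if (value < 0 || value > 9) {
                throw new IllegalArgumentException(
                    "quality level must be between 0 and 9, got " + value);
            }
            this.value = value;
        }

        int value() {
            return value;
        }

        public static void main(String[] args) {
            ProductQualityLevel level = new ProductQualityLevel(5);
            System.out.println(level.value());

            int loopIndex = 5;
            // ProductQualityLevel wrong = loopIndex; // refused by the Java compiler: incompatible types

            try {
                new ProductQualityLevel(100);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
    ```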

    Besides the advantage of range verification (0 to 9), the code now becomes even more robust, because the quality level is now semantically different from a zero_positive32 value. Hence, errors like assigning a loop index to a variable holding the quality of a product are now also detected at compile-time, as shown below:

    service static_typing_examples
    
       command example_3
          script
             var zero_positive32 loop_index = 5
             var product_quality_level quality_level
             // quality_level = loop_index  // refused by compiler, because types are not compatible
          end
       end
    
    end service

  • Static typing increases the quality of code in software projects.

    Although static typing is of course no guarantee of better code quality, it often leads to, or enforces, better quality. Because more errors are detected at compile-time (as seen before), the programmer is forced to correct the real source of those errors immediately. This often discourages sloppy programming and leads to a better design and a better type system.

    Consider, for example, the following situation.

    The above-mentioned error quality_level = "5" (instead of quality_level = 5) stays undetected and the application goes into production. One day, a customer reports that the result of counting the number of products with quality level 5 is wrong: "The computer displays 570 items instead of 571!".

    The programmer responsible for repairing the bug finds the following code we wrote earlier:

    var count = 0
    repeat for each quality in list
       if quality =v 5 then
          count = count + 1
       end if
    end repeat

    After some testing he or she discovers that one item in the customer's list is a string value of "5", instead of an integer value of 5, and that the expression quality =v 5 evaluates to false, because the compiler considers "5" and 5 to be unequal.

    The important question is now: What will the programmer do?

    Obviously, the programmer should find out why there is a string in the list, and then correct the source of the problem.

    However, in practice there is often a difference between what the programmer should do and what he or she actually does.

    In our simplified example, the problem could of course easily be fixed by replacing quality_level = "5" with quality_level = 5. But in a real-world application, it can be difficult and time-consuming, or even impossible, to find and fix the source of the problem, because the bug might have been created by another programmer, it might exist at several locations spread throughout the source code, or it might exist in a third-party library which is delivered without source code.

    Therefore, the programmer looks for a quick and easy solution. In the end, he or she replaces

    if quality =v 5 then

    with

    if quality =v "5" or quality =v 5 then

    and the problem is solved!

    That's what might actually happen in practice!

    The consequences are dreadful, because the source of the problem has not been removed:

    • The quality of code has decreased because the instruction if quality =v "5" or quality =v 5 then is a "hack" and reduces source code understandability, especially for other programmers who are not aware of the problem reported by the customer.

    • Although the customer's particular problem has been solved, the risk remains that other wrong objects will appear in the list later on and cause similar or different problems.

    • The string in the list might cause other, not yet detected problems in other parts of the application.

    It is easy to see that many other similar problems can appear with dynamic typing, up to the point of jeopardizing a software project, especially in case of a big application that is written by many programmers, used by many users and extended continuously.

    Note

    Some people claim that dynamically typed languages make them more productive. But this is only a perceived advantage. The programmer feels he or she writes more code in less time, because the compiler is less severe and accepts code that wouldn't be accepted in a statically typed language. But in the end this is deceptive, because the errors will only appear later, maybe during the test phase, or after the application is in production. And experience shows that the later a bug or bad design is detected, the more time-consuming, expensive and stressful it will be to correct the error.

    Note

    Some people claim that static typing is not necessary when a good framework for testing is available, because all errors will then be detected through unit tests. While it is true that all errors could be detected with adequate tests, it is also true that it is difficult, and in most cases even impossible, to write perfect unit tests that discover all errors, especially in the case of dynamic typing, because dynamic typing opens the door for more errors for which test cases have to be written. Moreover, writing tests also requires time and experience. Practice shows that a number of programmers don't write unit tests, even if they have good support for it. And some of them will probably never write unit tests, because they think that other people are responsible for testing their code, or because it requires time, discipline, experience, and sometimes also a good portion of humility.

    Without doubt, unit tests are very useful to detect bugs. However, they are an excellent complement to, not a replacement for, static typing and other features such as genericity, contract programming and feature redefinition. For more information about writing tests in Obix, please refer to Chapter 21, Testing.

    The excellent book Code Complete, Second Edition (ISBN 0-7356-1967-0), written by Steve McConnell, contains interesting conclusions about bugs, which are the results of studies done by IBM, NASA, etc. Besides telling us that [fixing defects in source code often costs 10 times what it took to develop the whole system] (see bottom of page 517), he also tells us that [the modal rate of defects found by unit testing is only 30 to 35 percent] (page 470) and that [the best way to find a maximum of defects at the earliest stage is a combination of different methods].

  • Statically typed languages typically have better run-time performance than dynamically typed languages.

    The reason is that dynamically typed languages have to evaluate and check types at runtime, while statically typed languages do this at compile-time.
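
    The difference in work done at run time can be illustrated with a small Java sketch. It is not a benchmark and makes no claim about actual speed; it only shows where the type check has to happen. In the first method the element type is only known at run time, so each element is checked inside the loop; in the second the element type is fixed at compile-time and the loop contains no check. The class TypeCheckCostSketch and its method names are hypothetical, and the code is Java rather than Obix.

    ```java
    import java.util.Arrays;
    import java.util.List;

    // Java illustration only (not Obix, not a benchmark).
    // Class and method names are hypothetical.
    class TypeCheckCostSketch {

        // Element type known only at run time: every element has to be
        // type-checked and unboxed before it can be added.
        static long sumUntyped(List<Object> values) {
            long sum = 0;
            for (Object value : values) {
                if (!(value instanceof Integer)) {
                    throw new IllegalArgumentException("not an integer: " + value);
                }
                sum += (Integer) value;
            }
            return sum;
        }

        // Element type fixed at compile time: the loop needs no type check.
        static long sumTyped(int[] values) {
            long sum = 0;
            for (int value : values) {
                sum += value;
            }
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(sumUntyped(Arrays.<Object>asList(1, 2, 3)));
            System.out.println(sumTyped(new int[] {1, 2, 3}));
        }
    }
    ```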

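To conclude, we can say that static typing requires the programmer to think a bit more before writing a bit more code, but the final rewards are manifold: source code that is easier to understand, errors that are detected earlier, better code quality, and typically better run-time performance.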

See also