Clean Code

Introduction

These notes are derived from the book "Clean Code", link in references.

This notes contains a general guidelines for writing clean code. Writing clean code is very important has the code is only written onces and read multiple times. So this notes summaries best principles to follow to write clean and better code

What is clean code

Lets see what great programmers have to say about what clean code is.

I like my code to be elegant and efficient. The logic should be straightforward to make it hard for bugs to hide, the dependencies minimal to ease maintenance, error handling complete according to an articulated strategy, and per- formance close to optimal so as not to tempt people to make the code messy with unprinci- pled optimizations. Clean code does one thing well.

Bjarne Stroustrup - C++ inventor

Also

Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control.

Grady Booch - author Object Oriented Analysis and Design with Applications

Below are two of my most favourite.

I could list all of the qualities that I notice in clean code, but there is one overarching quality that leads to all of them. Clean code always looks like it was written by someone who cares. There is nothing obvious that you can do to make it better. All of those things were thought about by the code’s author, and if you try to imagine improvements, you’re led back to where you are, sitting in appreciation of the code someone left for you—code left by someone who cares deeply about the craft.

Michael Feathers- Author Working Effectively with Legacy Code

And

You know you are working on clean code when each routine you read turns out to be pretty much what you expected. You can call it beautiful code when the code also makes it look like the language was made for the problem.

Ward Cunningham - inventor of Wiki

Naming

Names are everywhere in software. We name our variables, our functions, our arguments, classes, and packages. We name our source files and the directories that contain them. We name and name and name. Because we do so much of it, we’d better do it well. What follows are some simple rules for creating good names.

Use Intention-Revealing Names

The name of a variable, function, or class, should answer all the big questions. It should tell you why it exists, what it does, and how it is used. If a name requires a comment, then the name does not reveal its intent.

int d; // elapsed days in time (Bad)

int elapsedTimeInDays; // (Good, without coments)

Make Meaningful Distinctions

Programmers create problems for themselves when they write code solely to satisfy a compiler or interpreter. For example, because you cant use the same name to refer to two different things in the same scope, you might be tempted to change one name in an arbitrary way. It is not sufficient to add number series or noise words, even though the compiler is satisfied. If names must be different, then they should also mean something different.

Number-series naming (a1, a2, .. aN) is the opposite of intentional naming. Such names are not disinformative they are non informative; they provide no clue to the authors intention. Consider:

// Bad variable name
public static void copyChars(char a1[], char a2[]) { 
    
    for (int i = 0; i < a1.length; i++) {
        a2[i] = a1[i]; 
    }
}

// Bad variable name
public static void copyChars2(char source[], char destination[]) { 
    
    for (int i = 0; i < source.length; i++) {
        destination[i] = source[i]; 
    }
}

This function reads much better when source and destination are used for the argument names.

Use Pronounceable Names

Humans are good at words. A significant part of our brains is dedicated to the concept of words. So make your names pronounceable.

Use Searchable Names

Single-letter names and numeric constants have a particular problem in that they are not easy to locate across a body of text.

One might easily grep for MAX_CLASSES_PER_STUDENT, but the number 7 could be more troublesome. Searches may turn up the digit as part of file names, other constant definitions, and in various expressions where the value is used with different intent. It is even worse when a constant is a long number and someone might have transposed digits, thereby creating a bug while simultaneously evading the programmers search.

Likewise, the name e is a poor choice for any variable for which a programmer might need to search. It is the most common letter in the English language and likely to show up in every passage of text in every program. In this regard, longer names trump shorter names, and any searchable name trumps a constant in code.

Class Names

Classes and objects should have noun or noun phrase names like Customer, WikiPage, Account, and AddressParser. Avoid words like Manager, Processor, Data, or Info in the name of a class. A class name should not be a verb.

Method Names

Methods should have verb or verb phrase names like postPayment, deletePage, or save. Accessors, mutators, and predicates should be named for their value and prefixed with get, set, and is.

Pick One Word per Concept

Pick one word for one abstract concept and stick with it. For instance, its confusing to have fetch, retrieve, and get as equivalent methods of different classes.

Functions

Functions are the first line of organization in any program. So let see some guidelines for writing code functions

Function show be small tiny

The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.

Every function must be just two, or three, or four lines long. Each one must be transparently obvious. Each functions told a story. And each led you to the next in a compelling order. That's how short your functions should be!

Blocks and Indenting

This implies that the blocks within if statements, else statements, while statements, and so on should be one line long. Probably that line should be a function call. Not only does this keep the enclosing function small, but it also adds documentary value because the function called within the block can have a nicely descriptive name.

This also implies that functions should not be large enough to hold nested structures. Therefore, the indent level of a function should not be greater than one or two. This, of course, makes the functions easier to read and understand.

Do one F**KING thing

Every function is should only do one thing. If it is doing multiple things, then break it down further in smaller functions.

Reading Code from Top to Bottom: The Stepdown Rule

We want the code to read like a top-down narrative. We want every function to be followed by those at the next level of abstraction so that we can read the program, descending one level of abstraction at a time as we read down the list of functions.

Switch Statements

It's hard to make a small switch statement. Even a switch statement with only two cases is larger than a single block or function to be. It's also hard to make a switch statement that does one thing. By their nature, switch statements always do N things. Unfortunately we can't always avoid switch statements, but we can make sure that each switch statement is buried in a low level class and is never repeated. We do this, of course, with polymorphism.

Consider the following code

public Money calculatePay(Employee e) throws InvalidEmployeeType {
    
    switch (e.type) { 
        
        case COMMISSIONED:
            return calculateCommissionedPay(e);
        
        case HOURLY:
            return calculateHourlyPay(e); 
        
        case SALARIED:
            return calculateSalariedPay(e); 
        
        default:
            throw new InvalidEmployeeType(e.type); 
    }
}

There are several problems with this function. First, it's large, and when new employee types are added, it will grow. Second, it very clearly does more than one thing. Third, it violates the Single Responsibility Principle, because there is more than one reason for it to change. Fourth, it violates the Open Closed Principle, because it must change whenever new types are added. But possibly the worst problem with this function is that there are an unlimited number of other functions that will have the same structure.

The solution to this problem is to bury the switch statement in the basement of an ABSTRACT FACTORY, and never let anyone see it. The factory will use the switch statement to create appropriate instances of the derivatives of Employee, and the various functions, such as calculatePay, isPayday, and deliverPay, will be dispatched polymorphically through the Employee interface.

public abstract class Employee {
    public abstract boolean isPayday();
    public abstract Money calculatePay();
    public abstract void deliverPay(Money pay);
}

public interface EmployeeFactory {
    public Employee makeEmployee(EmployeeRecord r) throws InvalidEmployeeType; 
}


public class EmployeeFactoryImpl implements EmployeeFactory {
    
    public Employee makeEmployee(EmployeeRecord r) throws InvalidEmployeeType { 
        
        switch (r.type) {
            case COMMISSIONED:
                return new CommissionedEmployee(r) ;

            case HOURLY:
                return new HourlyEmployee(r);

            case SALARIED:
                return new SalariedEmploye(r);
            
            default:
                throw new InvalidEmployeeType(r.type);
        } 
    }
}

Use Descriptive Names

Remember Ward's principle: "You know you are working on clean code when each routine turns out to be pretty much what you expected." Half the battle to achieving that principle is choosing good names for small functions that do one thing. The smaller and more focused a function is, the easier it is to choose a descriptive name.

Don't be afraid to make a name long. A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment. Use a naming convention that allows multiple words to be easily read in the function names, and then make use of those multiple words to give the function a name that says what it does.

Function Arguments

The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification and then shouldn't be used anyway.

Have No Side Effects

Side effects are lies. Your function promises to do one thing, but it also does other hidden things. Sometimes it will make unexpected changes to the variables of its own class. Sometimes it will make them to the parameters passed into the function or to system globals. In either case they are devious and damaging mistruths that often result in strange temporal couplings and order dependencies.

So avoid them!

Command Query Separation

Functions should either do something or answer something, but not both. Either your function should change the state of an object, or it should return some information about that object. Doing both often leads to confusion. Consider, for example, the following function:

public boolean set(String attribute, String value);

This function sets the value of a named attribute and returns true if it is successful and false if no such attribute exists. Which is bad.

The real solution is to separate the command from the query.

public boolean attributeExists(String attribute);

public void setAttribute(String attribute, String value);

Prefer Exceptions to Returning Error Codes

Returning error codes from command functions is a subtle violation of command query separation. It promotes commands being used as expressions in the predicates of if statements in the calling function. So throw exception from function instead of returning error code.

Comments

Nothing can be quite so helpful as a well-placed comment. Nothing can clutter up a module more than frivolous dogmatic comments. Nothing can be quite so damaging as an old crufty comment that propagates lies and misinformation.

The proper use of comments is to compensate for our failure to express ourself in code. Comments are always failures. We must have them because we cannot always figure out how to express ourselves without them, but their use is not a cause for celebration.

So when you find yourself in a position where you need to write a comment, think it through and see whether there isn’t some way to turn the tables and express yourself in code. Every time you express yourself in code, you should pat yourself on the back. Every time you write a comment, you should grimace and feel the failure of your ability of expression.

But comments can be good in some situations as follows.

Sometimes our corporate coding standards force us to write certain comments for legal reasons. For example, copyright and authorship statements are necessary and reasonable things to put into a comment at the start of each source file.

Informative Comments

It is sometimes useful to provide basic information with a comment. For example, consider this comment:

// format matched kk:mm:ss EEE, MMM dd, yyyy 
Pattern timeMatcher = Pattern.compile("\\d*:\\d*:\\d* \\w*, \\w* \\d*, \\d*");

Explanation of Intent

Sometimes a comment goes beyond just useful information about the implementation and provides the intent behind a decision.

//This is our best attempt to get a race condition 
//by creating large number of threads.
for (int i = 0; i < 25000; i++) {
    WidgetBuilderThread widgetBuilderThread =new WidgetBuilderThread(widgetBuilder, text, parent, failFlag);
    Thread thread = new Thread(widgetBuilderThread);
    thread.start(); 
}

Objects and Data Structures

There is a reason that we keep our variables private. We don't want anyone else to depend on them. We want to keep the freedom to change their type or implementation on a whim or an impulse. Why, then, do so many programmers automatically add getters and setters to their objects, exposing their private variables as if they were public?

Data abstraction

Consider example below

public class Point { 
    public double x; 
    public double y;
}

and

public interface Point {
    double getX();
    double getY();
    void setCartesian(double x, double y); 
    double getR();
    double getTheta();
    void setPolar(double r, double theta); 
}

Both represent the data of a point on the Cartesian plane. And yet one exposes its implementation and the other completely hides it.

The beautiful thing about the interface example is that there is no way you can tell whether the implementation is in rectangular or polar coordinates. It might be neither! And yet the interface still unmistakably represents a data structure.

But it represents more than just a data structure. The methods enforce an access policy. You can read the individual coordinates independently, but you must set the coordinates together as an atomic operation.

Data/Object Anti-Symmetry

Consider the example

public class Square { 
    public Point topLeft; 
    public double side;
}

public class Rectangle { 
    public Point topLeft; 
    public double height; 
    public double width;
}

public class Circle { 
    public Point center; 
    public double radius;
}

public class Geometry {
    public final double PI = 3.141592653589793;
    
    public double area(Object shape) throws NoSuchShapeException {

        if (shape instanceof Square) { 
            Square s = (Square)shape; 
            return s.side * s.side;
        } else if (shape instanceof Rectangle) { 
            Rectangle r = (Rectangle)shape; return r.height * r.width;
        } else if (shape instanceof Circle) {
            Circle c = (Circle)shape;
            return PI * c.radius * c.radius; 
        }
        
        throw new NoSuchShapeException(); 
    }
}

And

public class Square implements Shape { 
    private Point topLeft;
    private double side;
    public double area() { 
        return side*side;
    } 
}

public class Rectangle implements Shape { 
    private Point topLeft;
    private double height;
    private double width;
    
    public double area() { 
        return height * width;
    } 
}

public class Circle implements Shape { 
    private Point center;
    private double radius;
    public final double PI = 3.141592653589793;
    
    public double area() {
       return PI * radius * radius;
    } 
}

These two examples show the difference between objects and data structures. Objects hide their data behind abstractions and expose functions that operate on that data. Data structure expose their data and have no meaningful functions.

As seen from the example, this is the fundamental difference between object and data structure.

Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.

The complement is also true:

Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change.

The law of Demeter

There is a well-known heuristic called the Law of Demeter2 that says a module should not know about the innards of the objects it manipulates. In other words, an object should not expose its internal structure through accessors because to do so is to expose, rather than to hide, its internal structure.

More precisely, the Law of Demeter says that a method f of a class C should only call the methods of these:

  • C
  • An object created by f
  • An object passed as argument to f
  • An object held in an instance variable C

For example:

final String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();

The following code3 appears to violate the Law of Demeter (among other things) because it calls the getScratchDir() function on the return value of getOptions() and then calls getAbsolutePath() on the return value of getScratchDir().

Data transfer object

The quintessential form of a data structure is a class with public variables and no functions. This is sometimes called a data transfer object, or DTO. DTOs are very useful structures, especially when communicating with databases or parsing messages from sockets, and so on. They often become the first in a series of translation stages that convert raw data in a database into objects in the application code.

Example of Address DTO.

public class Address { 
    private String street; 
    private String streetExtra; 
    private String city; 
    private String state; 
    private String zip;

    public Address(String street, String streetExtra, String city, String state, String zip) {
        this.street = street; 
        this.streetExtra = streetExtra; 
        this.city = city;
        this.state = state;
        this.zip = zip;
    }
    
    public String getStreet() { 
        return street;
    }

    public String getStreetExtra() { 
        return streetExtra;
    }

    public String getCity() { 
        return city;
    }

    public String getState() { 
        return state;
    }

    public String getZip() { 
        return zip;
    } 
}

Active Records

Active Records are special forms of DTOs. They are data structures with public (or bean- accessed) variables; but they typically have navigational methods like save and find. Typically these Active Records are direct translations from database tables, or other data sources.

Classes

Class organization

A class should begin with a list of variables. Public static constants, if any, should come first. Then private static variables, followed by private instance variables. There is seldom a good reason to have a public variable.

Public functions should follow the list of variables. Private utilities called by a public function right after the public function itself. This follows the stepdown rule and helps the program read like a newspaper article.

Classes Should Be Small!

The first rule of classes is that they should be small. The second rule of classes is that they should be smaller than that.

With functions we measured size by counting physical lines. With classes we use a different measure. We count responsibilities.

The Single Responsibility Principle

The Single Responsibility Principle (SRP) states that a class or module should have one, and only one, reason to change. This principle gives us both a definition of responsibility, and a guidelines for class size. Classes should have one responsibility and one reason to change.

Cohesion

Classes should have a small number of instance variables. Each of the methods of a class should manipulate one or more of those variables. In general the more variables a method manipulates the more cohesive that method is to its class. A class in which each variable is used by each method is maximally cohesive.

In general it is neither advisable nor possible to create such maximally cohesive classes; on the other hand, we would like cohesion to be high. When cohesion is high, it means that the methods and variables of the class are co-dependent and hang together as a logical whole. Maintaining cohesion results in small classes.

Organizing for Change

For most systems, change is continual. Every change subjects us to the risk that the remainder of the system no longer works as intended. In a clean system we organize our classes so as to reduce the risk of change.

Classes should be open for extension but closed for modification.

References

Clean Code - Robert C. Martin (link)