Command line tools

Note: if you downloaded the executable Phar, replace "php mondrian.php" in all the commands below with:

$ ./mondrian.phar

The console

This tool uses the Symfony Console component, so built-in help is displayed when you type:

$ php mondrian.php

You can get help on a specific command:

$ php mondrian.php help digraph

And a list of all commands:

$ php mondrian.php list

All commands that generate a digraph take a mandatory first argument: the directory to scan recursively for PHP files to parse.

The second argument is the basename of the generated file.
The default value is "report".

You can optionally ignore some directories with the --ignore option.
The default values are "Tests" and "vendor", so that test and vendor code is not scanned.

The second option is the format of the generated report. The default value is "dot" (GraphViz format).

PHP has a soft touch

As you know, PHP is (very) soft-typed. I don't want to start a debate, but this kind of static analysis tool has a disadvantage with PHP: it is hard to "track" the type of a given variable. I tried some simplistic approaches but in the end, when you have a line like:

call_user_func(array($this->$$objName, $methodName));

you have a good chance of getting the type wrong. That's why I gave up trying to analyse such calls and came up with a fine-tuning configuration, stored in a file, to remove bad links (and add some others).

You have to add a file named ".mondrian.yml" (don't forget the leading dot) to the root directory of the package. Here is a sample. It means that the method "Trismegiste\Mondrian\Builder\Linking::run" will not be linked to "Trismegiste\Mondrian\Parser\PackageParser::parse" because of the name collision on "parse".

A command is provided to instantly generate a default configuration containing ALL links (good or bad):

$ php mondrian.php typehint:config ~/MyMessyProject

You then have to review this file and keep only the "false positive" links you want to ignore: please don't add ALL the links!

Make the graph

Generates a digraph in the GraphViz format (only .dot; other formats are currently untested):

$ php mondrian.php digraph ~/MyMessyProject

This digraph is a general picture of the source. If it is too heavy or too messy, you can try to generate a smaller one from a deeper directory. Just as you don't refactor a whole app in one pass, you start with smaller components.

After the generation, this command prints some metrics. They help you quickly evaluate what kind of project you have to refactor, but they do not tell you where to go. Of course a project with a 50/50 ratio of interfaces to classes can be a good thing, but if classes are used as method parameters instead of interfaces, the interfaces are not really useful. It's easy to fake good metrics.

This analyser also counts where methods are first declared in the inheritance tree. A low count of methods first declared in a class is a good sign: it can mean that your concrete classes can be decoupled (remember the LSP).

From my experience, it's better to have dirty code in loosely coupled classes than beautiful code in highly coupled classes, because your beautiful code does not stand a chance against the entropy of change.

Dirty code can be refactored, even in parallel, if you have loose coupling.

In short: bad coding practices produce bad metrics, but good metrics do not mean good coding practices. That's why I didn't push these statistics too far.

Centrality

This tool finds the "center of the source code". More precisely, it calculates the centrality of the digraph using the eigenvectors of the adjacency matrix. With this you can determine which components are critical and which could be refactored later.
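To give an idea of what "centrality with eigenvectors" means, here is a minimal sketch using power iteration. It is not Mondrian's actual code: the function name eigenvectorCentrality, the array-based adjacency matrix and the implicit self-loop (added so the iteration does not oscillate on cycles) are all assumptions made for the illustration.

```php
<?php
// Sketch only, not Mondrian's implementation.
// $adjacency[$from][$to] = 1 means there is an edge from $from to $to.
function eigenvectorCentrality(array $adjacency, int $iterations = 100): array
{
    $n = count($adjacency);
    $score = array_fill(0, $n, 1.0 / $n);

    for ($step = 0; $step < $iterations; $step++) {
        // Start from the previous score (implicit self-loop, an assumption
        // of this sketch) so the iteration does not oscillate on cycles.
        $next = $score;
        foreach ($adjacency as $from => $row) {
            foreach ($row as $to => $linked) {
                if ($linked) {
                    // A vertex inherits the score of the vertices using it
                    $next[$to] += $score[$from];
                }
            }
        }
        // Normalize to keep the vector at unit length
        $norm = sqrt(array_sum(array_map(fn ($x) => $x * $x, $next)));
        $score = array_map(fn ($x) => $x / $norm, $next);
    }

    return $score;
}

// Vertices 0 and 1 both use vertex 2, and vertex 2 uses vertex 0.
$score = eigenvectorCentrality([
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]);
```

In this small graph, vertex 1 is used by nobody and ends with a score close to zero, while vertices 0 and 2 share the highest score.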

This tool helps you to find two effects on some components:

The ripple effect

$ php mondrian.php ripple ~/MyMessyProject

A component (class, parameter, method...) can be heavily used across the source code. Each time this component changes, chances are you need to change many other components that depend directly on it, and so on. That's the ripple. With the "used" algorithm you can see which components are time-consuming to change and can lead to many merge conflicts in Git.

The bottleneck effect

$ php mondrian.php bottleneck ~/MyMessyProject

Do you remember that project where, every time you made a change somewhere, THAT class needed to be modified too? The "depend" algorithm finds this kind of problem. It searches for dependencies: not only direct dependencies but also the combination of dependencies across all the vertices of the digraph. My recommendation: abstract this component first. Make multiple interfaces, explode it with the strategy pattern, decorator, CoR etc. All bugs are "drawn" to this component like a black hole.

Spaghetti coupling

SpaghettiCoupling is an analyser which finds coupling between classes through their concrete components.

$ php mondrian.php spaghetti ~/MyProject

Example :
In the implementation of the method A::doThing(), there is a call to the method B::getThing().

If B::getThing() is declared in B, the two methods are coupled. One can find a directed path between these implementation vertices.

If B::getThing() is an implementation of C::getThing(), declared in the interface C from which B inherits, there is no coupling, because A::doThing() is linked to C::getThing() and there is therefore no directed path. The Liskov principle is safe.

The first case is what I call "modern spaghetti code": yes, you haZ objects and classes, but you are not S.O.L.I.D. You rely on concrete classes, not on abstractions, not on a "contract" (interface). Your classes are just a collection of functions with an attached data structure, not an abstract idea.

Therefore, each time you make a modification in B::getThing(), you can break its contract and break something in A::doThing(). Worse, A has a link to B, therefore A can call anything in B. Classes get fat and unstable, and you are afraid each time you move a semicolon.
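The two cases can be sketched in PHP as follows. The classes B and C follow the example above; CoupledA, DecoupledA and the extra method getOther() are hypothetical names added only to show both variants side by side in one file.

```php
<?php
// Illustration only: hypothetical classes, not taken from Mondrian.
interface C
{
    public function getThing(): string;
}

class B implements C
{
    // Implementation of the contract declared in C
    public function getThing(): string
    {
        return 'thing';
    }

    // First declared in B: calling it means depending on the concrete class
    public function getOther(): string
    {
        return 'other';
    }
}

// First case: the implementation reaches a method declared in B.
class CoupledA
{
    public function doThing(B $b): string
    {
        return $b->getOther();
    }
}

// Second case: the implementation is only linked to C::getThing().
class DecoupledA
{
    public function doThing(C $c): string
    {
        return $c->getThing();
    }
}
```

DecoupledA accepts any implementation of C, so B can change or be replaced without touching it. CoupledA only works with B and can call anything B exposes.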

Hidden coupling

This is an analyser which checks and finds hidden coupling between types.

$ php mondrian.php hidden ~/MyProject

This analyser searches for method calls. Every time a method is called on an object ( $obj->getThing() ), there is an edge from the implementation vertex containing the call to a method signature vertex.

Since "$obj" does not come from nowhere, its type (class or interface) must be known by the class owning the implementation vertex. In other words: if there is an edge from an implementation to a method, there must be at least one other directed path between these two vertices (through the class vertex, a parameter vertex, a superclass etc.). If you can't figure out why, I recommend reading the digraph language I have defined for this purpose.

If there is none, *maybe* it means a hidden coupling. I add the "maybe" because it's hard to find the type of "$obj" in a soft-typed language like PHP. That's why there can be false positives. But it's easier to check false positives than to search through all the PHP files to find this kind of weakness in the code.
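As an illustration, here is the kind of code where such a call has no other visible path to the called method. The classes Thing, Registry and Client are hypothetical, and whether the analyser reports this exact case is an assumption of this sketch.

```php
<?php
// Illustration only: hypothetical classes, not taken from Mondrian.
class Thing
{
    public function getThing(): string
    {
        return 'thing';
    }
}

class Registry
{
    private $items = [];

    public function set(string $key, $item): void
    {
        $this->items[$key] = $item;
    }

    // No return type: nothing in the signature says what comes out
    public function get(string $key)
    {
        return $this->items[$key];
    }
}

class Client
{
    public function run(Registry $registry): string
    {
        // Client calls getThing() but never mentions the type Thing:
        // no parameter, no property, no superclass links the two.
        return $registry->get('thing')->getThing();
    }
}
```

Client only declares a dependency on Registry, yet it breaks as soon as Thing::getThing() is renamed. Passing the Thing (or better, an interface) as a parameter would make the dependency explicit.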

Cyclic coupling

This section could be renamed "How to avoid cyclic dependencies?". When you have a cycle in the source code, you are really screwed, because it is very difficult to know where to break it. This tool finds cycles between vertices in the digraph.

$ php mondrian.php cycle ~/MyProject

This command uses Tarjan's algorithm for finding strongly connected components. When a cycle is found between components, they are embedded in a cluster. With this graph, you can easily spot two things:

  • which components are entangled ?
  • where to attack your monolith at the weak point
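For readers who do not know Tarjan's algorithm, here is a compact sketch of it. It is not Mondrian's code: the function name stronglyConnectedComponents and the adjacency-list format (vertex name mapped to the list of vertices it points to) are assumptions made for the example.

```php
<?php
// Sketch of Tarjan's algorithm, not Mondrian's implementation.
// $graph maps each vertex name to the list of vertices it points to.
function stronglyConnectedComponents(array $graph): array
{
    $index = 0;        // next discovery index
    $indices = [];     // discovery index of each visited vertex
    $lowlink = [];     // smallest index reachable from each vertex
    $stack = [];       // vertices of the components being built
    $onStack = [];
    $components = [];

    $visit = function (string $v) use (
        &$visit, &$index, &$indices, &$lowlink,
        &$stack, &$onStack, &$components, $graph
    ): void {
        $indices[$v] = $lowlink[$v] = $index++;
        $stack[] = $v;
        $onStack[$v] = true;

        foreach ($graph[$v] ?? [] as $w) {
            if (!isset($indices[$w])) {
                // Not visited yet: recurse, then propagate the lowlink
                $visit($w);
                $lowlink[$v] = min($lowlink[$v], $lowlink[$w]);
            } elseif (!empty($onStack[$w])) {
                // Back edge to a vertex of the current component
                $lowlink[$v] = min($lowlink[$v], $indices[$w]);
            }
        }

        // $v is the root of a component: pop it from the stack
        if ($lowlink[$v] === $indices[$v]) {
            $component = [];
            do {
                $w = array_pop($stack);
                $onStack[$w] = false;
                $component[] = $w;
            } while ($w !== $v);
            $components[] = $component;
        }
    };

    foreach (array_keys($graph) as $v) {
        if (!isset($indices[$v])) {
            $visit($v);
        }
    }

    return $components;
}

// A, B and C form a cycle; D only points into it.
$components = stronglyConnectedComponents([
    'A' => ['B'],
    'B' => ['C'],
    'C' => ['A'],
    'D' => ['A'],
]);
// Components with more than one vertex are the entangled ones
$cycles = array_filter($components, fn ($c) => count($c) > 1);
```

Each component with more than one vertex corresponds to a group that would be drawn as a cluster: every vertex in it can reach every other one.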

Concreteness

This tool generates a graph reduced to the calls to concrete methods, to show the "lack of abstraction" in method parameters: if you have a class, the client could call anything. If you have an interface, you achieve the highest level of the Liskov substitution principle, since the client only needs to know the contract of the object, not its real type.

LSP is the first requirement to achieve ISP in a second pass of refactoring.

$ php mondrian.php liskov ~/MyProject

After the graph is reduced, a centrality algorithm is applied to colour it. The most used items are in red and the least used are in green. With this, you can visually find where refactoring is the most cost-effective for abstraction and decoupling.

Beware : edges in this digraph are not oriented like the grammar I have defined. The edges Class ⇨ Method are reversed and Implementation ⇨ Method is replaced by Method (of that Implementation) ⇨ Class (of that Method).

Refactoring : the Good

This is a refactoring tool which edits and generates source code in your project.
The purpose :

  • it creates a new interface for each class carrying an annotation like "@mondrian contractor NewInterfaceName"
  • it replaces all these classes with their new interface in method parameters (public or not, this is important)
  • it adds the inheritance from NewInterfaceName to the class

$ php mondrian.php refactor:abstract ~/MyProject

Each interface is stored in the same namespace as its class, next to it in the same directory. NewInterfaceName is a short name, not a FQCN. There is no option to store the generated content in another directory, since everybody uses Git or at least SVN. You can therefore launch the test suite immediately.

It is a dumb refactoring, but it does the dull job of creating new interfaces, gathering the public methods of each class, in a single pass. There is no name collision check whatsoever.
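To make the three steps concrete, here is what the result could look like for a hypothetical class Invoice used by a hypothetical class Mailer. The exact layout of the generated code is an assumption; only the three transformations listed above come from the tool's description.

```php
<?php
// Sketch of a possible result, the real generated code may differ.

// 1. New interface gathering the public methods of Invoice
interface InvoiceInterface
{
    public function total();
}

/**
 * @mondrian contractor InvoiceInterface
 */
class Invoice implements \InvoiceInterface // 3. inheritance added
{
    public function total()
    {
        return 42;
    }
}

class Mailer
{
    // 2. was: public function send(Invoice $invoice)
    public function send(\InvoiceInterface $invoice)
    {
        return 'Total: ' . $invoice->total();
    }
}
```

After such a pass, Mailer no longer depends on the concrete Invoice, so a test can hand it any other implementation of InvoiceInterface.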

The boring stage of endless ctrl-C/ctrl-V/ctrl-X sequences is over, now it is time to use your brain and think about domain, model, business and object contract :)

Thereafter, you need to create a tree of contracts from these 'not-really-abstract' interfaces. You need to put common contracts in parent interfaces, find common methods, remove unused methods, rename, move interfaces to other namespaces etc. This is the perfect time to work with the digraph on a second screen.

Note: all class names are transformed into FQCNs. It is not beautiful, but it is actually more useful than I thought: since these interfaces will be split, renamed or moved, you don't have to think about "use" statements, and massive "search & replace" operations are made easier.

Refactoring : the Bad

This tool searches for interfaces with class type hints on their parameters. This is bad because each time you inherit from one of these interfaces, you create a new coupling between concrete classes and god kills a kitten.

That's why these interfaces are literally "coupling generators": they are a seed for spaghetti coupling.
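A short sketch of such a "coupling generator" next to its repaired version. All names here (FileLogger, LoggerInterface, BadHandler, GoodHandler, Handler) are hypothetical and only serve the illustration.

```php
<?php
// Illustration only: hypothetical names, not taken from Mondrian.
interface LoggerInterface
{
    public function log(string $message): string;
}

class FileLogger implements LoggerInterface
{
    public function log(string $message): string
    {
        return 'file: ' . $message;
    }
}

// Bad contract: every implementation of this interface is forced
// to depend on the concrete class FileLogger.
interface BadHandler
{
    public function handle(FileLogger $logger): string;
}

// Better contract: implementations only know another contract.
interface GoodHandler
{
    public function handle(LoggerInterface $logger): string;
}

class Handler implements GoodHandler
{
    public function handle(LoggerInterface $logger): string
    {
        return $logger->log('handled');
    }
}
```

Every class implementing BadHandler inherits the link to FileLogger, which is how one badly typed interface spreads coupling across many classes.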

$ php mondrian.php badcontract ~/MyProject

This digraph is reduced to the relevant vertices: interfaces linked to methods linked to classes. Parameter and implementation vertices are removed, as well as useless edges.

This command is not really useful for existing code. It is more relevant in conjunction with the command refactor:abstract. Because code generation could become messy and prone to error, you can check the quality of the new generated interfaces with this tool.

Factory of factories

This command generates a protected method in a class each time the parser finds a "new" statement. Why? Because this is the first step towards replacing a new instance with a mock object in unit tests.

$ php mondrian.php refactor:factory ~/MyProject/MyClass.php

When you quickly write unit tests before refactoring, this lets you easily test your classes in depth and start the decoupling process. Once the new instances are confined to closed areas, you can start thinking about the factory method pattern, builders, abstract factories, injection and so on.

Like the refactor:abstract command, it is not magic, it's only a copy-cut-paste robot. The names of the generated factory methods are not pretty, but they avoid name collisions. You have to change them.
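The idea can be sketched as follows, with hypothetical classes Connection and Repository. The method name createConnection is invented for readability; the names actually generated by the command are different.

```php
<?php
// Illustration only: hypothetical classes and method names.
class Connection
{
    public function query(): string
    {
        return 'real';
    }
}

class Repository
{
    public function find(): string
    {
        // was: $connection = new Connection();
        $connection = $this->createConnection();

        return $connection->query();
    }

    // Protected factory method wrapping the former "new" statement
    protected function createConnection(): Connection
    {
        return new Connection();
    }
}

// In a unit test, a subclass overrides the factory method to inject a stub.
class StubConnection extends Connection
{
    public function query(): string
    {
        return 'stub';
    }
}

class TestableRepository extends Repository
{
    protected function createConnection(): Connection
    {
        return new StubConnection();
    }
}
```

Here the stub is injected by hand with a subclass; a mocking library can do the same by overriding the protected method.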