jbrooksuk edited this page Oct 28, 2014 · 3 revisions

A short description of how the tool works internally, for those who would like to contribute to the project.

Analysis

Tokenization

  • The run.php file launches the scan on all the selected files and directories.
  • The Tokenizer.php file takes a file as input and returns an array of TokenInfo items.
  • The PHPCheckStyle.php file is the main part of the project. It launches the tokenization of the files and then analyses the stream of tokens.
    • The processToken method is a large switch / case that calls a processXXX method depending on the token.
    • The processXXX methods detect the different cases and launch the corresponding "check" rules.
    • The checkXXX methods run the checks for the rules that are activated.
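
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration written in Python for brevity, not the project's PHP code: the TokenInfo fields, the T_TAB token id, the "no_tabs" rule name and all method names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenInfo:
    # Hypothetical fields; the real TokenInfo class may carry different data.
    id: str
    text: str
    line: int


class Checker:
    def __init__(self, active_rules):
        self.active_rules = set(active_rules)
        self.messages = []

    def process_token(self, token):
        # Stand-in for the switch / case: choose a handler from the token id.
        handler = getattr(self, "process_" + token.id.lower(), None)
        if handler is not None:
            handler(token)

    def process_t_tab(self, token):
        # A processXXX method: recognise the case, then launch the check.
        self.check_no_tabs(token)

    def check_no_tabs(self, token):
        # A checkXXX method: report only when the rule is activated.
        if "no_tabs" in self.active_rules:
            self.messages.append(f"line {token.line}: tab found")


def analyse(tokens, active_rules):
    # Walk the token stream once, dispatching every token.
    checker = Checker(active_rules)
    for token in tokens:
        checker.process_token(token)
    return checker.messages
```

Tokens without a matching handler are simply skipped, which mirrors a switch with an empty default branch.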

We use the default PHP tokenizer, which we extend to identify tabs and line returns and to add a few tokens. This allows the project to work on any computer that has PHP installed, without any modification.
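
One way to picture this extension is a post-processing pass that splits generic whitespace tokens into finer-grained ones. The sketch below is in Python and is only an assumption about the general idea: the (id, text) pair format and the T_TAB and T_NEW_LINE names are hypothetical and may not match what Tokenizer.php actually does.

```python
import re

# Matches one line return, one tab, or a run of other whitespace characters.
_PIECES = re.compile(r"\r\n|\n|\r|\t|[^\r\n\t]+")


def split_whitespace(tokens):
    """Split each T_WHITESPACE token into tab, newline and plain-space tokens."""
    result = []
    for token_id, text in tokens:
        if token_id != "T_WHITESPACE":
            # Non-whitespace tokens pass through unchanged.
            result.append((token_id, text))
            continue
        for piece in _PIECES.findall(text):
            if piece == "\t":
                result.append(("T_TAB", piece))
            elif piece in ("\n", "\r", "\r\n"):
                result.append(("T_NEW_LINE", piece))
            else:
                result.append(("T_WHITESPACE", piece))
    return result
```

With tabs and line returns exposed as separate tokens, indentation and line-length rules can be checked without re-parsing the whitespace text.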

A cleaner / more complete solution would be to use a proper AST and parse the files with complete information about each token and its context.

The difficulty is that PHP doesn't have a real official grammar, so it is not easy to build such an analyser. Some projects could help with that, such as PHP-Parser, or we could use Facebook's work on HipHopVM.

Statements Stack

To compensate for the lack of a real AST, we build a stack of the currently open statements during the analysis.

StatementItem objects are stored in the StatementStack. This allows us to have some limited contextual information.

This can be easily visualised by launching the tool with the --debug flag. It will display something like this:

CLASS(PHPCheckstyle) -> FUNCTION(_processControlStatement) -> IF -> IF
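
As a rough illustration of the idea, here is a minimal Python sketch of such a stack. The class names follow the text, but the fields and methods (kind, name, push, pop, current, inside) are assumptions for the example and not the project's actual PHP API.

```python
class StatementItem:
    def __init__(self, kind, name=None):
        self.kind = kind  # e.g. "CLASS", "FUNCTION", "IF"
        self.name = name  # only set for named statements

    def __str__(self):
        return f"{self.kind}({self.name})" if self.name else self.kind


class StatementStack:
    def __init__(self):
        self._items = []

    def push(self, item):
        # Called when a statement is opened.
        self._items.append(item)

    def pop(self):
        # Called when the innermost statement is closed.
        return self._items.pop()

    def current(self):
        # Innermost open statement, or None at the top level.
        return self._items[-1] if self._items else None

    def inside(self, kind):
        # Limited context: is any enclosing statement of this kind?
        return any(item.kind == kind for item in self._items)

    def __str__(self):
        return " -> ".join(str(item) for item in self._items)


stack = StatementStack()
stack.push(StatementItem("CLASS", "PHPCheckstyle"))
stack.push(StatementItem("FUNCTION", "_processControlStatement"))
stack.push(StatementItem("IF"))
stack.push(StatementItem("IF"))
print(stack)
```

A check can then ask questions such as "am I inside a function?" without a full AST, at the cost of knowing only the enclosing statements and nothing about siblings.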

Reporting

When a check rule is not satisfied, an error message is sent to one or more reporters. All the reporters (Console, HTML, XML, ...) extend the Reporter class.
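
The reporter arrangement can be sketched as below, again in Python and only as an illustration. The write_error method, its parameters, the message format and the in-memory MemoryReporter are hypothetical; only the Reporter base class and the Console reporter are named in the text.

```python
class Reporter:
    """Base class: every concrete reporter overrides write_error."""

    def write_error(self, file, line, rule, message):
        raise NotImplementedError


class ConsoleReporter(Reporter):
    def write_error(self, file, line, rule, message):
        print(f"{file}:{line} [{rule}] {message}")


class MemoryReporter(Reporter):
    # Hypothetical reporter that keeps errors in a list, handy for tests.
    def __init__(self):
        self.errors = []

    def write_error(self, file, line, rule, message):
        self.errors.append((file, line, rule, message))


def report(reporters, file, line, rule, message):
    # Fan the same error out to every configured reporter.
    for reporter in reporters:
        reporter.write_error(file, line, rule, message)
```

Because the checks only talk to the base class, adding a new output format means adding one subclass and leaves the analysis code untouched.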
