Friday, September 18, 2009

Smart Perl

[Spanish original source]
In the previous article we saw an example of modern Perl. Today we'll delve a bit deeper into Perl 5.10's smart matching, and see how, combined with the dynamic nature of the language, it leads us to a ridiculously small program that is also easier to understand and maintain.
I once read (I think it was Paul Graham) that when sections of code look very similar, it usually means a new level of abstraction is needed. Of course, he is a Lisp programmer and has defmacro. Perl has its own means, though, and in this case our first solution can be based on a hash that lists the functions allowed in our calculator:
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: type 'help' for help";
 8 
 9 my %FUNCS = (
10     sum                => 0,
11     mean               => 0,
12     count              => 0,
13     variance           => 0,
14     standard_deviation => 0,
15     min                => 0,
16     mindex             => 0,
17     max                => 0,
18     maxdex             => 0,
19     sample_range       => 0,
20     median             => 0,
21     harmonic_mean      => 0,
22     geometric_mean     => 0,
23     mode               => 0,
24     trimmed_mean       => 0,
25 );
26 
27 my $s = Statistics::Descriptive::Full->new();
28 while (1) {
29     print "Ready> ";
30     my $command = readline(STDIN) // last;
31     $command =~ s/^\s+//; $command =~ s/\s+$//;
32     given ($command) {
33         when ( looks_like_number($_) ) { $s->add_data($command) }
34         when (/^(exit|quit)$/)         {last}
35         default {
36             if   ( exists $FUNCS{$command} ) { ... }
37             else                             { say SYNTAX_ERROR}
38         }
39     }
40 }

This is a good step forward: we simplify the most complicated part of the program by replacing it with a simple hash declaration, where adding a new function is as simple as adding a line.
Of course, any astute reader has already noticed that I cheated: the program is incomplete, and line 36 still needs an action. Our problem is how to invoke the right method for each operation, and as usual there is more than one way to do it. The worst would be to fill the hash with references to the methods, like this:

10     sum  => \&Statistics::Descriptive::sum,

which lets us invoke the methods like this:

36     if ( exists $FUNCS{$command} ) { say "$command = " . $FUNCS{$command}($s) }

That is the worst way because you have to know a lot of Perl to understand how it works. Perl's ability to dispatch methods symbolically (calling a method whose name is stored in a variable) lets us state our intention clearly, with code that is easier to understand:
36     if ( exists $FUNCS{$command} ) { say "$command = " . $s->$command }

The runtime cost of the latter is higher than that of the former, because the method is resolved by name at run time, but it is a price I gladly pay: it makes the program much easier to understand, and these days people are much more expensive than machines.
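To make the two styles concrete, here is a small self-contained sketch that runs without any CPAN modules. `Calc` is a hypothetical stand-in for Statistics::Descriptive::Full with just a couple of methods, and `%BY_REF` and `%ALLOWED` are illustrative names, not part of the calculator. Both styles give the same answer here; what differs is how the method is found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Statistics::Descriptive::Full, so the
# sketch runs without any CPAN modules.
package Calc;

sub new      { my ($class) = @_; return bless { data => [] }, $class }
sub add_data { my ( $self, @x ) = @_; push @{ $self->{data} }, @x; return }
sub count    { my ($self) = @_; return scalar @{ $self->{data} } }

sub sum {
    my ($self) = @_;
    my $total = 0;
    $total += $_ for @{ $self->{data} };
    return $total;
}

package main;

my $calc = Calc->new;
$calc->add_data( 1, 2, 3 );

# Style 1: a dispatch table of code references. The sub is called as a
# plain function, so the object must be passed explicitly.
my %BY_REF = ( sum => \&Calc::sum, count => \&Calc::count );

# Style 2: a whitelist of names; the method is resolved by name at run time.
my %ALLOWED = map { $_ => 1 } qw( sum count );

for my $command (qw( sum count )) {
    next unless exists $ALLOWED{$command};
    my $via_ref  = $BY_REF{$command}->($calc);
    my $via_name = $calc->$command;    # method name held in a variable
    print "$command = $via_ref (code ref) / $via_name (by name)\n";
}
```

One design difference worth knowing: `\&Calc::sum` points at one specific subroutine, so a subclass that overrides `sum` would be bypassed, while `$calc->$command` goes through normal method lookup and respects inheritance.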
Finally, if laziness is one of your principles, you can rewrite the definition of the hash like this:
 9 my %FUNCS = map { $_ => 0 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean );
which I like because it saves some punctuation (which seems to overwhelm many people) and makes a syntax error less likely.
Basically, I'm building a list of words (the method names) with qw; from that list, map builds another one in which each element of the original list ($_) is followed by 0. Assigning that list to the hash takes its elements in pairs, so each name becomes a key with 0 as its value.
If you found that explanation complicated or incomprehensible, take a look at the Perl documentation for map. It's well worth it, because you will also learn something about functional programming, which is sure to be very helpful.
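Here is the same construction taken apart step by step, as a minimal sketch with a shorter word list; the intermediate variables `@names` and `@pairs` exist only to show what each stage produces.

```perl
use strict;
use warnings;

# Step 1: qw() builds a list of words (here, a few method names).
my @names = qw( sum mean count );

# Step 2: map turns each name into a (name, 1) pair, producing the flat
# list ('sum', 1, 'mean', 1, 'count', 1).
my @pairs = map { $_ => 1 } @names;

# Step 3: assigning an even-length list to a hash takes its elements two
# at a time as key/value, so every name becomes a key.
my %FUNCS = @pairs;

print "flat list has ", scalar(@pairs), " elements\n";    # flat list has 6 elements
print "keys: ", join( ", ", sort keys %FUNCS ), "\n";     # keys: count, mean, sum
```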
Now I'm going to get rid of the "if". I prefer multi-way conditionals because they are linear, look better and are easier to follow; that's why I think given/when is the best thing to happen to Perl in a long time. I also replaced the regular expression on line 34 with something that makes more sense to a Perl outsider:
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: type 'help' for help";
 8 
 9 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean );
12 
13 my $s = Statistics::Descriptive::Full->new();
14 while (1) {
15     print "Ready> ";
16     my $command = readline(STDIN) // last;
17     $command =~ s/^\s+//; $command =~ s/\s+$//;
18     given ($command) {
19         when ( looks_like_number($_) ) { $s->add_data($command) }
20         when ( ["exit", "quit"] )      {last}
21         when (%FUNCS)                  { say "$command = " . $s->$command }
22         default                        { say SYNTAX_ERROR }
23     }
24 }
Now it looks a lot better (it even resembles Erlang).
I am using some smart matching features, which I will explain below.
On line 20 there is an array match:
$command ~~ ["exit", "quit"]
For a list of plain strings like this one, matching a scalar (on the left) against an array (on the right) is equivalent to:
sub match_scalar_arrayref {
    my ($scalar, $arrayref) = @_;
    for my $item ( @$arrayref ) {
        return 1 if $scalar eq $item;
    }
    return undef;
}
I can't remember how many times I've written code like that, or like this:
if ( grep { $scalar eq $_ } @$arrayref ) ...
which I can now write more clearly and with less work:
if ( $scalar ~~ $arrayref ) ...
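A quick way to convince yourself that these forms agree is to run them side by side. This sketch avoids `~~` so it behaves the same on any Perl 5, returns 0 instead of undef from the loop so the results print cleanly, and adds `any` from the core List::Util module (it needs version 1.33 or later) as a third option; `any` is my addition, not something the calculator uses.

```perl
use strict;
use warnings;
use List::Util qw( any );    # any() needs List::Util 1.33 or later

# The article's loop: true if $scalar is string-equal to some element
# of the referenced array (returns 0 rather than undef when not found).
sub match_scalar_arrayref {
    my ( $scalar, $arrayref ) = @_;
    for my $item (@$arrayref) {
        return 1 if $scalar eq $item;
    }
    return 0;
}

my $commands = [ "exit", "quit" ];

for my $scalar (qw( quit help )) {
    my $by_loop = match_scalar_arrayref( $scalar, $commands );
    # grep in boolean context yields the number of matches (0 is false).
    my $by_grep = ( grep { $scalar eq $_ } @$commands ) ? 1 : 0;
    # any() stops at the first match, just like the loop.
    my $by_any = ( any { $scalar eq $_ } @$commands ) ? 1 : 0;
    print "$scalar: loop=$by_loop grep=$by_grep any=$by_any\n";
}
```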
You have probably already guessed that the match on line 21 is equivalent to:
if ( exists $hash{$scalar} ) ...
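A minimal sketch of that hash case in plain Perl; `match_scalar_hashref` is a hypothetical helper named by analogy with the array version above. Only the presence of the key counts, not the value stored under it, which is why the earlier %FUNCS with 0 as every value works just as well as the later one with 1.

```perl
use strict;
use warnings;

# Plain-Perl equivalent of matching a scalar against a hash: true when
# the scalar is one of the keys, whatever value is stored under it.
sub match_scalar_hashref {
    my ( $scalar, $hashref ) = @_;
    return exists $hashref->{$scalar} ? 1 : 0;
}

my %FUNCS = map { $_ => 1 } qw( sum mean count );
my %ZEROS = map { $_ => 0 } qw( sum mean count );    # like the first version

print "mean: ", match_scalar_hashref( "mean", \%FUNCS ), "\n";    # mean: 1
print "exit: ", match_scalar_hashref( "exit", \%FUNCS ), "\n";    # exit: 0
print "mean: ", match_scalar_hashref( "mean", \%ZEROS ), "\n";    # mean: 1
```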

A little temptation

A reader suggested a change that would make our program shorter and easier to maintain. The idea was to change the line:

21    when (%FUNCS) { say "$command = " . $s->$command }

to:
21    when ($s->can($command)) { say "$command = " . $s->$command }

The "can" method is provided by the UNIVERSAL class, from which all objects ultimately derive in Perl. Its purpose is to determine whether an object or class has a particular method (it returns a reference to the method if so, and false otherwise).
By using it we could delete %FUNCS completely, and our interpreter would pick up new commands automatically as Statistics::Descriptive evolves. That sounds very good from a maintainability standpoint; however, for me it has a fatal flaw: it is not safe.
The problem is that I lose control over what Perl runs: any method the object responds to, including inherited ones, becomes a command. That may not be critical in this case, but it could be extremely dangerous. So I prefer safety, and keep the hash as the mechanism for dispatch (and for authorizing use).
The moral is to be careful with dynamic execution control mechanisms, especially when data from unreliable external sources is fed into them.
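To see the flaw concretely, this sketch asks both questions, "does the object have this method?" and "is this command in the whitelist?", for a few words a user might type. `Calc` and `%ALLOWED` are hypothetical. The point is that `can` also says yes to the constructor, to `add_data` (which should only be reached by typing a number), and to methods every object inherits from UNIVERSAL such as `can` and `isa`, none of which were meant to be calculator commands.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the statistics object.
package Calc;
sub new      { my ($class) = @_; return bless { data => [] }, $class }
sub add_data { my ( $self, @x ) = @_; push @{ $self->{data} }, @x; return }
sub count    { my ($self) = @_; return scalar @{ $self->{data} } }

package main;

my $calc    = Calc->new;
my %ALLOWED = map { $_ => 1 } qw( count );

# Words a user might type at the prompt. can() says yes to all of them:
# count, new and add_data are defined in Calc, while can and isa are
# inherited from UNIVERSAL. The whitelist only accepts "count".
for my $command (qw( count new add_data can isa )) {
    my $by_can  = $calc->can($command)      ? "yes" : "no";
    my $by_hash = exists $ALLOWED{$command} ? "yes" : "no";
    printf "%-8s can: %-3s whitelist: %s\n", $command, $by_can, $by_hash;
}
```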

Finishing the program



Line 22 prints an error when a command is unknown. The message tells the user to type "help" for help, but the "help" command is not implemented yet. A quick way to implement it is:
23         when ("help") {
24             say "Valid commands are: "
25                 . join( ", ", qw(exit quit help), keys %FUNCS )
26         }
Wow, that was easy. Best of all, it is consistent, because it uses the same data structure to list, select and authorize the commands.
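One small refinement you may want: `keys` returns hash keys in no particular order, and in recent Perls that order can change from one run to the next, so the help listing can come out shuffled. This sketch (with a hypothetical `help_text` helper and a shortened word list) sorts the function names so the listing is stable, while still reading them from the same hash.

```perl
use strict;
use warnings;

my %FUNCS = map { $_ => 1 } qw( sum mean count variance clear );

# Build the help line from the same hash that authorizes commands.
# keys() returns the keys in no particular order, so sort them to get
# the same listing on every run.
sub help_text {
    my ($funcs) = @_;
    return "Valid commands are: "
        . join( ", ", qw( exit quit help ), sort keys %$funcs );
}

my $help = help_text( \%FUNCS );
print "$help\n";
# Valid commands are: exit, quit, help, clear, count, mean, sum, variance
```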
One command I forgot to include in the calculator in the previous article was "clear". Adding it now is as simple as putting a new word in the definition of %FUNCS:
 9 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean clear );
Easy, right? Best of all, the new command shows up in the help automatically because the program is consistent.
Let's recap today's accomplishments. We have a program that is:
  • Very compact
  • Easy to understand
  • Easy to maintain
  • Consistent
  • Safe

Perl is as good as any other language on many fronts, including good design and quality, but few languages offer mechanisms like the ones used here to build such a program with so little work.
Next time I will improve the example, with very little effort, by adding a manual for the statistical functions.
I will say goodbye with the final version of the program.
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: type 'help' for help";
 8 
 9 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean clear );
12 
13 my $s = Statistics::Descriptive::Full->new();
14 while (1) {
15     print "Ready> ";
16     my $command = readline(STDIN) // last;
17     $command =~ s/^\s+//;
18     $command =~ s/\s+$//;
19     given ($command) {
20         when ( looks_like_number($_) ) { $s->add_data($command) }
21         when (%FUNCS)                  { say "$command = " . $s->$command }
22         when ( [ "exit", "quit" ] )    {last}
23         when ("help") {
24             say "Valid commands are: "
25                 . join( ", ", qw(exit quit help), keys %FUNCS )
26         }
27         default { say SYNTAX_ERROR }
28     }
29 }
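For readers who want to try the dispatch idea without installing Modern::Perl or Statistics::Descriptive, here is a self-contained sketch of the same design: one whitelist hash that both authorizes and lists the commands, plus `$s->$command` for the call. It uses plain `if`/`exists` instead of given/when, which Perl 5.18 and later flag as experimental (with warnings). `Calc`, `respond`, `run`, `%QUIT` and the in-memory command script are all hypothetical, introduced only so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical stand-in for Statistics::Descriptive::Full.
package Calc;
sub new      { my ($class) = @_; return bless { data => [] }, $class }
sub add_data { my ( $self, @x ) = @_; push @{ $self->{data} }, @x; return }
sub count    { my ($self) = @_; return scalar @{ $self->{data} } }

sub sum {
    my ($self) = @_;
    my $total = 0;
    $total += $_ for @{ $self->{data} };
    return $total;
}

package main;

use constant SYNTAX_ERROR => "Error: type 'help' for help";

# One hash both authorizes the commands and lists them in the help.
my %FUNCS = map { $_ => 1 } qw( count sum );
my %QUIT  = map { $_ => 1 } qw( exit quit );

# Handle one input line. Returns the text to print ("" for nothing),
# or undef when the user asks to quit.
sub respond {
    my ( $s, $command ) = @_;
    $command =~ s/^\s+//;
    $command =~ s/\s+$//;
    if ( looks_like_number($command) ) { $s->add_data($command); return "" }
    if ( exists $QUIT{$command} )  { return undef }
    if ( exists $FUNCS{$command} ) { return "$command = " . $s->$command }
    if ( $command eq "help" ) {
        return "Valid commands are: "
            . join( ", ", qw( exit quit help ), sort keys %FUNCS );
    }
    return SYNTAX_ERROR;
}

# Read commands from a filehandle until EOF or exit/quit.
sub run {
    my ( $s, $in ) = @_;
    while (1) {
        print "Ready> ";
        my $line  = readline($in)        // last;
        my $reply = respond( $s, $line ) // last;
        print "$reply\n" if length $reply;
    }
    print "\n";
}

# Demo with an in-memory list of commands; pass \*STDIN instead for an
# interactive session.
my $commands = "1\n2\n3\nsum\nhelp\nquit\n";
open my $script, '<', \$commands or die "open: $!";
run( Calc->new, $script );
```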

2 comments:

  1. I tend in such cases to use a method name prefix - say '_do_op_' - so then you write a

    method _do_op_sum (...) { ... }

    and your code would become

    when ($self->can("_do_op_${op}")) { say "${op} = ".$self->${\"_do_op_${op}"}(@args); }

    or so.

  2. Please find it in your heart to blog more.
