Sunday, September 27, 2009

Statistic Calculator: User friendly console

[Original Spanish source]
One of the nice features of console applications is the ability to edit the command line and reuse previous commands; when these two features are available, the application is much friendlier. So let's add them to the calculator from the last article.
This is another job where CPAN shows why it is Perl's best feature. I'll use Term::ReadLine, a unified interface for console reading. This library can use several backends that implement its functionality; for our example to work I installed Term::ReadLine::Perl, but I suppose that Term::ReadLine::Gnu would work just as well. Term::ReadLine::Gnu is an interface to the GNU readline(3) library used by many applications, including bash, while Term::ReadLine::Perl is a pure-Perl implementation of the same kind of line editing.
To add a bash-like interface to the calculator, we just need to replace lines 27 and 28 with this:
28     my $command = $term->readline("Listo> ") // last;
You must also declare the use of the module and initialize the terminal, creating the $term object that lets us invoke the readline method:
 7 use Term::ReadLine;
 8 
 9 my $term = new Term::ReadLine 'Statistic Calculator';
I used Term::ReadLine and not any of the specific variants because it is responsible for selecting and loading a variant automatically, while still letting the user take control if necessary. The library has functions to decorate the prompt, autocomplete and some other goodies; CPAN saves the day again, and all it takes is reading a module's documentation.
I will also fix some other annoying things (read: bugs). First, the most annoying but the easiest to solve: an empty command gives an error message because an empty string is not matched by any when clause. To solve it, I just added:
when ("")  {  } # if the command is empty, do nothing
and it's ready.
A harder problem is commands that return undef, such as "clear", which produce a warning thanks to the implicit use warnings from Modern::Perl:
Listo> clear
Use of uninitialized value in concatenation (.) or string at calc1.pl line 32.
clear =

To solve this problem I'm going to write a subroutine that applies functions and prints results, so the dispatch loop remains as clear as possible. So this:
32   when (%FUNCS)  { say "$command = " . $s->$command }
will become this:
32   when (%FUNCS)  { apply $command }
Then we add the "apply" subroutine, which receives a command, executes it and gets the result, but returns without saying anything unless the result is defined and not an empty string (notice the remarkable similarity between that last sentence and the Perl one):
26     return unless defined $result and $result ne "";
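The rule on that line can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named format_result (it is not part of the calculator) that returns the text to print instead of printing it, so the three cases are easy to compare:

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): returns the line that
# would be printed, or undef when there is nothing worth showing.
sub format_result {
    my ( $command, $result ) = @_;
    return unless defined $result and $result ne "";
    return "$command = $result";
}

# A command with a value is reported; undef and "" are silently skipped.
# Note that 0 is a legitimate result and is still printed.
for my $case ( [ mean => 29.4 ], [ clear => undef ], [ mode => "" ], [ count => 0 ] ) {
    my $line = format_result(@$case);
    print defined $line ? "$line\n" : "(nothing printed for $case->[0])\n";
}
```

Testing with defined first matters: comparing undef with ne would itself trigger the "uninitialized value" warning we are trying to avoid.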
The calculator is now more user friendly, but it still has some problems. If you try to execute "trimmed_mean", you'll notice lots of warnings. The manual ("man") describes the cause: the "trimmed_mean" function receives parameters, but our program doesn't know how to handle them. In the next article I will fix this, and also make the calculator display complex return values, such as arrays and hashes.
Now our full program looks like this:
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 use Pod::Perldoc;
 7 use Term::ReadLine;
 8 
 9 my $term = new Term::ReadLine 'Statistic Calculator';
10 
11 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
12     min mindex max maxdex sample_range median harmonic_mean geometric_mean
13     mode trimmed_mean clear );
14 
15 my @COMMANDS = qw( exit quit help man );
16 
17 sub help { say "Comandos: " . join( ", ", sort @COMMANDS, keys %FUNCS ) }
18 
19 sub man { Pod::Perldoc->new(args => \@_)->process }
20 
21 my $s = Statistics::Descriptive::Full->new();
22 
23 sub apply {
24     my $command = shift;
25     my $result = $s->$command;
26     return unless defined $result and $result ne "";
27     say "$command = $result";
28 }
29 
30 while (defined(my $command = $term->readline("Listo> "))) {
31     $command =~ s/^\s+//; $command =~ s/\s+$//;
32     given ($command) {
33         when ( looks_like_number($_) ) { $s->add_data($command) }
34         when (%FUNCS)                  { apply $command }
35         when ("man")                   { man "Statistics::Descriptive" }
36         when ( [ "exit", "quit" ] )    { last }
37         when ("help")                  { help }
38         when ("")                      { }
39         default                        { say "Error: tipee 'help' para ayuda" }
40     }
41 }

Wednesday, September 23, 2009

Statistic Calculator: Using the system

[Original Spanish source]
In the last article I told you we would add a manual to our calculator; let's see some ways to interact with the Perl documentation system to achieve this.
The command to look at the calculator's documentation will be "man", and since I assume most people have used perldoc, I shall start by using this command to display a manual.
Perl has long had the ability to execute system commands in several ways, one of which is the "`" operator. If you write something like:
1  my @out = `ls -l`;
the array @out will end up holding each of the output lines of the command. You can also execute commands and use their output as input to a program using open:
1 open my $fd, "-|", "ls -ls" or die "Error: $!";
2 while ( readline $fd ) {
3     # $_ holds one line of the command's output
4 }
which is surprising, though not highly recommended these days, although I use it over and over at work, where I solve lots of things just by writing little programs in Perl.
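As a hedged, self-contained variation on the open example, the sketch below uses the list form of the pipe open, where the program and its arguments are given as separate elements. It assumes a platform where list-form pipe opens are supported, and it uses $^X (the path of the running perl) as a stand-in for an external command, so nothing outside Perl is required:

```perl
use strict;
use warnings;

# List-form pipe open: the program and its arguments are passed separately,
# so no shell is involved. $^X is the path of the running perl, used here
# only so the example does not depend on any external command.
open my $fd, "-|", $^X, "-e", 'print "first\nsecond\n"'
    or die "Error: $!";

my @lines;
while ( defined( my $line = readline $fd ) ) {
    chomp $line;          # drop the trailing newline
    push @lines, $line;
}
close $fd or die "Command failed: status $?";

print scalar(@lines), " lines read\n";
```

Checking the return value of close is worthwhile with pipes, because that is where a failing child process is reported.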
But today I am interested in the "system" function, which I will use as the first way to add a manual to our calculator. Since I don't want to complicate the issue by showing how to write a manual, I will use the manual of Statistics::Descriptive. The solution is to add a line to the last article's program:
21  when ("man")  { system("perldoc Statistics::Descriptive") }
and that's it. Such is Perl: there is nothing easier. Some may say it is dirty, but it was definitely easy. When "system" receives a single string, perl may pass the command to the shell (it does so whenever the string contains shell metacharacters), so it is better to use it this way:
21  when ("man")  { system("/usr/bin/perldoc", "Statistics::Descriptive") }
When system receives a list, perl executes the first element as the program and passes the remaining ones as its arguments without involving the shell, which avoids some security glitches that could occur. The program is still looked up through the PATH when no directory is given, so passing the full path to the executable is a precaution rather than a requirement.
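To see why the list form is the safer one, here is a small sketch. It assumes nothing beyond core Perl: $^X (the running perl) plays the role of the external command, and the child simply checks that it received the tricky string as a single, untouched argument:

```perl
use strict;
use warnings;

# List form: perl runs the program directly and passes each element as one
# argument, so shell metacharacters in the data are never interpreted.
# $^X (the running perl) stands in for an external command here.
my $tricky = "a; echo injected";
my $status = system( $^X, "-e",
    'exit( $ARGV[0] eq "a; echo injected" ? 0 : 1 )', $tricky );

# system returns the wait status; the exit code is in the high byte.
my $exit_code = $status >> 8;
print "child exit code: $exit_code\n";
```

With the single-string form the same data would be handed to the shell, where the semicolon would start a second command.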
It is not a big surprise that the perldoc command is also written in Perl, so we probably can reuse the code for this program in our calculator. Looking into the program you will realize that perldoc is a very simple program, in fact, the two important lines are:
1 use Pod::Perldoc;
2 Pod::Perldoc->run();
So all of perldoc's functionality is packed inside a Perl class! This is a major pattern of Perl culture, allowing any application to be easily reused by another, and that is exactly what we want to do. Unfortunately someone forgot to document Pod::Perldoc, so I had to look at the source code for hints on how to integrate it into the calculator. I came up with the following:
21  when ("man") { Pod::Perldoc->new(args => ["Statistics::Descriptive"])->process }
and of course we must declare the use of the class, which I did at the beginning of the program:
 6 use Pod::Perldoc;
The real job was to learn how Pod::Perldoc works, and it took me less than 2 minutes to get there using the excellent perl debugger.
Finally I took some time to refactor the code a little bit and improve the "help" command, which left the full program like this:
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 use Pod::Perldoc;
 7 
 8 use constant SYNTAX_ERROR => "Error: tipee 'help' para ayuda";
 9 
10 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
11     min mindex max maxdex sample_range median harmonic_mean geometric_mean
12     mode trimmed_mean clear );
13 
14 my @COMMANDS = qw( exit quit help man );
15 
16 sub help {
17     say "Comandos: " . join( ", ", @COMMANDS );
18     say "Funciones: " . join( ", ", keys %FUNCS );
19 }
20 
21 sub man {
22     Pod::Perldoc->new(args => \@_)->process
23 }
24 
25 my $s = Statistics::Descriptive::Full->new();
26 while (1) {
27     print "Listo> ";
28     my $command = readline(STDIN) // last;
29     $command =~ s/^\s+//; $command =~ s/\s+$//;
30     given ($command) {
31         when ( looks_like_number($_) ) { $s->add_data($command) }
32         when (%FUNCS)                  { say "$command = " . $s->$command }
33         when ("man")                   { man "Statistics::Descriptive" }
34         when ( [ "exit", "quit" ] )    { last }
35         when ("help")                  { help }
36         default                        { say SYNTAX_ERROR }
37     }
38 }
In the next article I will add more features to make the calculator more user friendly.

Friday, September 18, 2009

Smart Perl

[Spanish original source]
In the previous article we saw an example of modern Perl. Today we'll delve a bit more into Perl 5.10 smart matching, and how this, combined with the dynamic nature of the language, leads us to a ridiculously small program that is also easier to understand and maintain.
I once read (I think from Paul Graham) that when sections of code look very similar, it usually means that a level of abstraction is required. Of course he is a Lisp programmer, and has defmacro. However, Perl also has its own means, and in this case our first solution could be based on a hash that includes the functions allowed in our calculator:
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: tipee 'help' para ayuda";
 8 
 9 my %FUNCS = (
10     sum                => 0,
11     mean               => 0,
12     count              => 0,
13     variance           => 0,
14     standard_deviation => 0,
15     min                => 0,
16     mindex             => 0,
17     max                => 0,
18     maxdex             => 0,
19     sample_range       => 0,
20     median             => 0,
21     harmonic_mean      => 0,
22     geometric_mean     => 0,
23     mode               => 0,
24     trimmed_mean       => 0,
25 );
26 
27 my $s = Statistics::Descriptive::Full->new();
28 while (1) {
29     print "Listo> ";
30     my $command = readline(STDIN) // last;
31     $command =~ s/^\s+//; $command =~ s/\s+$//;
32     given ($command) {
33         when ( looks_like_number($_) ) { $s->add_data($command) }
34         when (/^(exit|quit)$/)         {last}
35         default {
36             if   ( exists $FUNCS{$command} ) { ... }
37             else                             { say SYNTAX_ERROR}
38         }
39     }
40 }

This is a good step forward, because we are simplifying code in the complicated part of our program, and replacing it with a simple declaration of a hash, where including new functions is as simple as adding a line.
Of course, any astute reader has already noticed that I cheated: the program is incomplete and line 36 requires an action. Our problem is how to invoke the right method for the operation, and as usual there is more than one way to do it. The worst could be to fill the hash with references to the methods, like this:

10     sum  => \&Statistics::Descriptive::sum,

which lets us invoke the methods like this:

36     if ( exists $FUNCS{$command} ) { say "$command = " . $FUNCS{$command}($s) }

That is the worst way because you have to know a lot of Perl to understand how it works. Perl's ability to dispatch methods symbolically lets us make our intention perfectly clear, with code that is easier to understand:
36     if ( exists $FUNCS{$command} ) { say "$command = " . $s->$command }

The runtime cost of the latter is higher than that of the former, but it is a price that I am gladly willing to pay, because it makes the program much easier to understand, and lately people are much more expensive than machines.
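The two dispatch styles can be compared side by side without installing anything. This is only a sketch: the Tally class is a hypothetical stand-in for the statistics object, with just two methods, and the table names %by_ref and %by_name are made up for the illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the statistics object, so the sketch has no
# CPAN dependencies.
{
    package Tally;
    sub new   { my $class = shift; return bless { data => [@_] }, $class }
    sub count { return scalar @{ $_[0]{data} } }
    sub sum   { my $t = 0; $t += $_ for @{ $_[0]{data} }; return $t }
}

my $s = Tally->new( 19, 45, 24 );

# Variant 1: table of code references, called with the object as argument.
my %by_ref = ( count => \&Tally::count, sum => \&Tally::sum );

# Variant 2: table of allowed names, dispatched with $object->$name.
my %by_name = map { $_ => 1 } qw( count sum );

for my $command (qw( count sum )) {
    my $from_ref  = $by_ref{$command}->($s);
    my $from_name = exists $by_name{$command} ? $s->$command : undef;
    print "$command = $from_name (code ref gives $from_ref)\n";
}
```

One practical difference: the code-reference table pins each name to one specific subroutine, while $s->$command goes through normal method lookup, so it also respects inheritance.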
Finally, if laziness is one of your principles, you can rewrite the definition of the hash like this:
 9 my %FUNCS = map { $_ => 0 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean );
which I like, because it saves me some punctuation (which seems to overwhelm many people) and leaves me less room for syntax errors.
Basically I'm building a list of words (the names of the methods) with "qw"; from this list I build another (using map) that contains each element of the original list ($_) followed by 0, and perl automatically converts this list into a hash where each name is associated with 0 as its value.
If you found the above explanation complicated or incomprehensible, see the Perl documentation for map, which will be great because you can also learn something about functional programming, and that will surely be very helpful.
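The same construction can be taken apart step by step. A small sketch, with a shortened list of names and intermediate variables (@names, @pairs, %funcs) introduced only for illustration:

```perl
use strict;
use warnings;

# qw() builds the list of names; map turns each name into a (name, 1) pair;
# assigning the flat list to a hash pairs them up as key => value.
my @names = qw( sum mean count );
my @pairs = map { $_ => 1 } @names;    # ( "sum", 1, "mean", 1, "count", 1 )
my %funcs = @pairs;

print "pairs: @pairs\n";
print "mean is allowed\n" if exists $funcs{mean};
print "median is not\n" unless exists $funcs{median};
```

Since the program only ever asks whether a key exists, the value stored (0 or 1) makes no difference to the lookup.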
Now I'm going to get rid of the "if". I prefer multi-way conditionals because they are linear, look better and are easier to follow, which is why I think given/when is the best thing that has happened to Perl in a long time. I also replaced the regular expression in line 34 with something that makes more sense to a Perl outsider:
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: tipee 'help' para ayuda";
 8 
 9 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean );
12 
13 my $s = Statistics::Descriptive::Full->new();
14 while (1) {
15     print "Listo> ";
16     my $command = readline(STDIN) // last;
17     $command =~ s/^\s+//; $command =~ s/\s+$//;
18     given ($command) {
19         when ( looks_like_number($_) ) { $s->add_data($command) }
20         when ( ["exit", "quit"] )      {last}
21         when (%FUNCS)                  { say "$command = " . $s->$command }
22         default                        { say SYNTAX_ERROR }
23     }
24 }
Now it looks a lot better (it even resembles Erlang).
I am using some features of smart matching that I will explain below.
In line 20 there is an array matching:
$command ~~ ["exit", "quit"]
Matching a string scalar (on the left) against an array (on the right) is roughly equivalent to:
sub match_scalar_arrayref {
    my ($scalar, $arrayref) = @_;
    for my $item ( @$arrayref ) {
        return 1 if $scalar eq $item;
    }
    return undef;
}
I can't remember how many times I've written code like that, or this:
if ( grep { $scalar eq $_ } @$arrayref ) ...
which I can now write more clearly and with less work:
if ( $scalar ~~ $arrayref ) ...
You've probably already guessed that the match at line 21 is equivalent to:
if ( exists $hash{$scalar} ) ...
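For readers on newer perls, where smart matching has been flagged as experimental since Perl 5.18, the same two questions can be asked without ~~. This is a sketch only; the helper names is_exit_word and is_function are made up, and it assumes a List::Util recent enough to provide any (version 1.33 or later):

```perl
use strict;
use warnings;
use List::Util qw( any );

my @exit_words = qw( exit quit );
my %funcs      = map { $_ => 1 } qw( sum mean count );

# Same questions the two smart matches ask, written without ~~ .
sub is_exit_word { my $cmd = shift; return any { $cmd eq $_ } @exit_words }
sub is_function  { my $cmd = shift; return exists $funcs{$cmd} }

for my $cmd (qw( quit mean salir )) {
    printf "%-5s exit word: %s, function: %s\n", $cmd,
        ( is_exit_word($cmd) ? "yes" : "no" ),
        ( is_function($cmd)  ? "yes" : "no" );
}
```

Unlike grep, any stops at the first element that matches, which is closer to what the explicit loop above does.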

A little temptation

I received a reader's suggestion that makes our program shorter and easier to maintain. The idea was to replace the line:

21    when (%FUNCS) { say "$command = " . $s->$command }

with:
21    when ($s->can($command)) { say "$command = " . $s->$command }

The "can" method is provided by the UNIVERSAL class, from which all objects ultimately derive in Perl, and its purpose is to determine whether an object or class has a particular method.
By using it we could delete %FUNCS completely, and our interpreter would be automatically updated with new commands as Statistics::Descriptive evolves, which sounds very good from the standpoint of maintainability. However, this has a fatal flaw for me: it is not safe.
The problem is that I lose control over what Perl runs automatically, which may not be critical in this case, but could be extremely dangerous in others. So I prefer security and keep the hash as a mechanism of dispatch (and of authorization of use).
The moral is to be careful when using dynamic execution control mechanisms, especially when data from unreliable external sources is injected into them.
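The difference between the two checks is easy to demonstrate. In this sketch the Ledger class is entirely hypothetical, invented to have one harmless method and one that user input should never reach; the helper names are made up as well:

```perl
use strict;
use warnings;

# Hypothetical class: one harmless method and one that should never be
# reachable from user input.
{
    package Ledger;
    sub new     { return bless { wiped => 0 }, shift }
    sub balance { return 42 }
    sub wipe    { $_[0]{wiped} = 1; return "all data erased" }
}

my $ledger  = Ledger->new;
my %allowed = map { $_ => 1 } qw( balance );

# can() accepts anything the object responds to, including inherited methods.
sub accepted_by_can { my $cmd = shift; return $ledger->can($cmd) ? 1 : 0 }

# The hash accepts only what was explicitly authorized.
sub accepted_by_whitelist { my $cmd = shift; return exists $allowed{$cmd} ? 1 : 0 }

for my $cmd (qw( balance wipe new isa )) {
    printf "%-8s can: %d  whitelist: %d\n", $cmd,
        accepted_by_can($cmd), accepted_by_whitelist($cmd);
}
```

Note that can also says yes to the constructor and to methods inherited from UNIVERSAL such as isa, none of which make sense as calculator commands.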

Finishing the program



Line 22 gives an error when a command is unknown. The message says to use "help" for help, but the "help" command is not implemented yet. A quick way to implement it is:
23         when ("help") {
24             say "Los comandos válidos son: "
25                 . join( ", ", qw(exit quit help), keys %FUNCS )
26         }
Wow, that was easy, and the best part is that it is also consistent, because it uses the same data structure to report, select and authorize the commands.
One command that I forgot to include in the calculator in the previous article was "clear". Adding this function now is as simple as putting a new word in the definition of %FUNCS:
 9 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean clear );
It was easy, right? The best thing is that the new command appears automatically in the help, because the program is consistent.
Let's recap today's accomplishments. We have a program that is:
  • Very compact
  • Easy to understand
  • Easy to maintain
  • Consistent
  • Safe

Perl is as good as any other language on many fronts, including good design and quality, but few languages offer mechanisms like the ones used here to develop this program with so little work.
Next time I will improve the example by adding a manual of the statistical functions with very little effort.
I will say goodbye with the final version of the program.
 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: tipee 'help' para ayuda";
 8 
 9 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
10     min mindex max maxdex sample_range median harmonic_mean geometric_mean
11     mode trimmed_mean clear );
12 
13 my $s = Statistics::Descriptive::Full->new();
14 while (1) {
15     print "Listo> ";
16     my $command = readline(STDIN) // last;
17     $command =~ s/^\s+//;
18     $command =~ s/\s+$//;
19     given ($command) {
20         when ( looks_like_number($_) ) { $s->add_data($command) }
21         when (%FUNCS)                  { say "$command = " . $s->$command }
22         when ( [ "exit", "quit" ] )    {last}
23         when ("help") {
24             say "Los comandos válidos son: "
25                 . join( ", ", qw(exit quit help), keys %FUNCS )
26         }
27         default { say SYNTAX_ERROR };
28     }
29 }

Tuesday, September 15, 2009

Using Modern Perl

[Source article in Spanish]
I will try to write a series of articles about Perl, showing how easy and quick it is to build solutions on this platform.
For this I chose a simple design that lets me illustrate a number of techniques and best practices, with an algorithm accessible to any developer, even a rookie.
The example program will be a statistical calculator which at first will be written in a traditional style, but will gradually become more flexible and easier to maintain, while applying some unique mechanisms of the language and some libraries from CPAN.
The grand finale is to turn the calculator into a web application using a surprising mechanism available for Perl. Having said that, I will start using modern Perl now.
Honoring the title of the article, the first thing our program does is use the module Modern::Perl, which is a shortcut for:

use feature ':5.10';
use strict;
use warnings;
use mro 'c3';
That is, it turns on all the features introduced in Perl 5.10, activates strict and warnings, and finally sets the method resolution order to the C3 algorithm. As expected, the examples throughout this series of articles require Perl 5.10 or later, because I'm trying to promote as many new features as possible, so: install Perl 5.10 now.
Modern Perl advocates strongly recommend the use of strict because it catches many common errors, including accidental use of symbolic references and typographical errors in variable names, at the cost of declaring variables with our (globals) or my (lexicals) before use.
Perl warnings inform us about possible errors in coding. In Perl 5.10 strict is stricter and warnings gives many new warnings, so they catch more problems than before, which usually improves the overall quality of code and saves debugging time.
In my case, I wanted to read a command or finish the loop at end of file, so I wrote:

my $comando = readline(STDIN) or last;
Perl immediately warned me that in some cases undef (which signals EOF) could be confused with "0" (zero) coming from the file, because perl treats both "0" and undef as false values. One way to correct the statement would be:

defined (my $comando = readline(STDIN)) or last;
But I'd rather use the new operator // (defined-or), which simplifies the statement:

my $comando = readline(STDIN) // last;
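The difference between the two operators shows up exactly with the value "0". A minimal sketch, with variable names chosen only for the illustration:

```perl
use strict;
use warnings;

# "0" is a defined value that happens to be false; undef marks "no value".
my $zero    = "0";
my $missing = undef;

my $with_or         = $zero || "fallback";       # "fallback": "0" is false
my $with_defined_or = $zero // "fallback";       # "0": it is defined
my $for_undef       = $missing // "fallback";    # "fallback": undef

print "||: $with_or  //: $with_defined_or  undef //: $for_undef\n";
```

In the calculator this means a user can enter 0 as a data point without the loop mistaking it for end of input.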
The C3 method resolution order solves some problems with Perl's original resolution order, and it is advisable to always use it in new code. It is not entirely new: a CPAN module (Class::C3) has made this resolution order available for some 4 years now, but now C3 has native support in the language.
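What changes with C3 is easiest to see on a diamond-shaped hierarchy. The sketch below uses four hypothetical classes (Top, Left, Right, Bottom) and the core mro module to print the order in which classes are searched under each algorithm:

```perl
use strict;
use warnings;
use mro;

# Hypothetical diamond hierarchy: Bottom inherits from Left and Right,
# which both inherit from Top.
{ package Top;    sub hello { "Top" } }
{ package Left;   our @ISA = ("Top"); }
{ package Right;  our @ISA = ("Top"); sub hello { "Right" } }
{ package Bottom; our @ISA = ( "Left", "Right" ); }

# Linearizations under the classic depth-first order and under C3.
my $dfs = join " ", @{ mro::get_linear_isa( "Bottom", "dfs" ) };
my $c3  = join " ", @{ mro::get_linear_isa( "Bottom", "c3" ) };

print "dfs: $dfs\n";    # Bottom Left Top Right
print "c3:  $c3\n";     # Bottom Left Right Top
```

Under the depth-first order Top is searched before Right, so Right's more specific hello is never found from Bottom; C3 places the shared ancestor after both of its children.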
So the first tip is to use Modern::Perl everywhere, because it activates a number of useful and recommended features of Perl in one shot.
Returning to the program: after using Modern::Perl, it imports the subroutine looks_like_number() from Scalar::Util, which saved me the trouble of writing regular expressions to recognize numbers, and also spares a lot of panic for readers who freeze just by looking at those regular expressions.
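A quick way to get a feel for looks_like_number() is to feed it a few inputs and see which ones the calculator would treat as data. The inputs and the %verdict hash below are chosen only for illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Classify a few candidate inputs the way the calculator's main loop would.
my %verdict;
for my $input ( "42", "-3.5", "1e5", "abc", "12abc", "" ) {
    $verdict{$input} = looks_like_number($input) ? "number" : "command";
    printf "%-7s => %s\n", "'$input'", $verdict{$input};
}
```

Scientific notation such as 1e5 is accepted, while a string that merely starts with digits is not, which is more than a quick hand-written regular expression usually gets right.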
The last module in use is the main ingredient of the calculator. It never crossed my mind to write statistical algorithms; that's the purpose of CPAN, which has almost everything in it. I chose Statistics::Descriptive, which serves my purpose perfectly.
Line 7 declares a constant with an error message and line 9 defines a variable with an object of class Statistics::Descriptive::Full which will be the state of the calculator during the main loop.
The main loop is simple: read a command, or terminate (last) if end of file is reached [line 12]; then remove the spaces from the left and right of the command [line 13]; if the command is a number, add it to the dataset [line 15]; and if not, select and execute a command.
The selection is done with given/when, the new control structure of Perl 5.10 [lines 18-36], which performs smart matching between the given value and the when clauses. Because the matching is "smart", its behavior depends on the operands. It generally works as expected, but there are some oddities and it never hurts to read the manual.
Finally, the new say function is just a print that puts a newline at the end of the string, avoiding a lot of concatenations with "\n" and therefore contributing to code clarity.

 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: tipee 'help' para ayuda";
 8 
 9 my $s = Statistics::Descriptive::Full->new();
10 while (1) {
11     print "Listo> ";
12     my $command = readline(STDIN) // last;
13     $command =~ s/^\s+//; $command =~ s/\s+$//;
14     if ( looks_like_number($command) ) {
15         $s->add_data($command);
16     }
17     else {
18         given ($command) {
19             when ("sum")                { say "$command = " . $s->sum() }
20             when ("mean")               { say "$command = " . $s->mean() }
21             when ("count")              { say "$command = " . $s->count() }
22             when ("variance")           { say "$command = " . $s->variance() }
23             when ("standard_deviation") { say "$command = " . $s->standard_deviation() }
24             when ("min")                { say "$command = " . $s->min() }
25             when ("mindex")             { say "$command = " . $s->mindex() }
26             when ("max")                { say "$command = " . $s->max() }
27             when ("maxdex")             { say "$command = " . $s->maxdex() }
28             when ("sample_range")       { say "$command = " . $s->sample_range() }
29             when ("median")             { say "$command = " . $s->median() }
30             when ("harmonic_mean")      { say "$command = " . $s->harmonic_mean() }
31             when ("geometric_mean")     { say "$command = " . $s->geometric_mean() }
32             when ("mode")               { say "$command = " . $s->mode() }
33             when ("trimmed_mean")       { say "$command = " . $s->trimmed_mean() }
34             when (/^(exit|quit)$/)      {last}
35             default                     { say SYNTAX_ERROR }
36         }
37     }
38 }
To use the calculator simply execute the file, below is a test run:

opr@toshi$ perl stat.pl
Listo> 19
Listo> 45
Listo> 24
Listo> 15
Listo> 39
Listo> 48
Listo> 36
Listo> count
count = 7
Listo> 10
Listo> 28
Listo> 30
Listo> count
count = 10
Listo> mean
mean = 29.4
Listo> standard_deviation
standard_deviation = 12.685950233756
Listo> salir
Error: tipee 'help' para ayuda
Listo> help
Error: tipee 'help' para ayuda
Listo> exit
opr@toshi$

A simple improvement

A better way to write the program would be to delete the if statement at line 14 and make a new "when" clause. This also lets me show that given topicalizes $_ to the given value, and that when clauses not only compare strings (using eq) and regular expressions (using =~) but also allow, among other things, boolean expressions that use $_ as an alias for the value being matched.

 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 
 7 use constant SYNTAX_ERROR => "Error: tipee 'help' para ayuda";
 8 
 9 my $s = Statistics::Descriptive::Full->new();
10 while (1) {
11     print "Listo> ";
12     my $command = readline(STDIN) // last;
13     $command =~ s/^\s+//; $command =~ s/\s+$//;
14     given ($command) {
15         when ( looks_like_number($_) ) { $s->add_data($command) }
16         when ("sum")                   { say "$command = " . $s->sum() }
17         when ("mean")                  { say "$command = " . $s->mean() }
18         when ("count")                 { say "$command = " . $s->count() }
19         when ("variance")              { say "$command = " . $s->variance() }
20         when ("standard_deviation")    { say "$command = " . $s->standard_deviation() }
21         when ("min")                   { say "$command = " . $s->min() }
22         when ("mindex")                { say "$command = " . $s->mindex() }
23         when ("max")                   { say "$command = " . $s->max() }
24         when ("maxdex")                { say "$command = " . $s->maxdex() }
25         when ("sample_range")          { say "$command = " . $s->sample_range() }
26         when ("median")                { say "$command = " . $s->median() }
27         when ("harmonic_mean")         { say "$command = " . $s->harmonic_mean() }
28         when ("geometric_mean")        { say "$command = " . $s->geometric_mean() }
29         when ("mode")                  { say "$command = " . $s->mode() }
30         when ("trimmed_mean")          { say "$command = " . $s->trimmed_mean() }
31         when (/^(exit|quit)$/)         {last}
32         default                        { say SYNTAX_ERROR }
33     }
34 }
I think that almost any programmer used to dynamic languages like Python or Ruby can readily understand code in Modern Perl and even be comfortable working with it.
Programmers of languages like C, C++, C# or Java, after getting used to some basic principles, should feel a kind of liberating experience, because writing a program such as this one in those languages is certainly more difficult.
In the next article we'll see some dynamic features of Perl that make the program shorter, more flexible and easier to maintain.