Saturday, October 24, 2009

Language evolution

[Spanish source]
Perhaps you have noticed that I have not written much lately. The reason is that I had to devote myself full time to an urgent migration.

The system in question belongs to the client who hired me to develop a web application back in 1998, which incidentally was my first application of this kind.

At that time I was taking my first steps with Perl, but somehow people convinced me that the application should be developed in PHP. I slowly realized that PHP was not flexible enough, and I ended up writing several Perl programs that implement most of the functionality, yet the entire application interface remained in PHP.

During this project I learned that Perl was much more versatile, powerful and fun than PHP (3). If I had not been persuaded otherwise, I would probably have found this out sooner, the entire application would be in Perl, and I would not have a story to tell.

At the start of the migration, it occurred to me that it would be easier to migrate the interface first. I thought that, being the simplest part of the application, it could easily be migrated to PHP5. This was far from reality: nothing worked as it should.

I understand that everything evolves, but the programs didn't even compile, and some APIs had changed enough to warrant a complete review of all the source code. And of course the application was a mess; what else could it be? At the time there was no template library available, database access was done with the horrible PHP3 library functions, and the entire application is a monument to ASP-style programming, where view, model and controller are completely mixed together, as if they had gone through a blender.

The truth is that it was easier to compile PHP3 for the new operating system than to migrate the code to PHP5. On that matter I particularly wish to thank Debian, because it may be one of the few places on the planet that still keeps the PHP3 source code; the language's community has a policy of withdrawing old sources from the Internet.

Having overcome the problem (i.e. PHP), I had nightmares about the pile of Perl 5 code that would not work at all in the new Perl 5.10, so I resigned myself and started to copy and run all the files.
The first thing that broke was DBI. In fact, the database connection string was using a format that I did not even remember had ever existed:

  DBI->connect("dbi:Pg:dbname=", "user", "pass");
which I changed to the current syntax:

  DBI->connect("dbi:Pg:dbname=mydb;host=", "user", "pass");
I tried the program again and it worked. I proceeded to try another program and it worked too, and the trend continued: program after program, I couldn't find any faults.
After a decade of evolution of the language and the CPAN, almost everything worked at once. I was amazed (and of course pleased).

Contrasting the evolution of Perl and PHP, I can see that the former has acquired a wealth of features: new language constructs; several concurrency implementations (event-driven, threads and coroutines); new abstractions that facilitate and improve OOP and let the programmer choose between several object-oriented systems; significant advances in metaprogramming; and others. The latter has basically gained an OOP system, and still it is more incompatible than Perl.

The question is: why this radical difference? Especially when it is said that there are many more ready-made solutions in PHP than in Perl, so there should be more development effort in the former than in the latter.

You may have the impression that I am bashing PHP, but I am not. PHP was just the trigger that made me think about the problem; if we look at other languages, we will probably reach similar conclusions. For example: how much has the Python language evolved in the last decade? Would a 10-year-old program still work in the current environment? What about Java? My bet is that Perl beats them both on these fronts.

Although Perl is a New Jersey solution, the language itself is heavily influenced by Lisp, an MIT solution. One can argue about the complexity of Perl's syntax, but the fact is that the semantics of its operations are consistent enough and extremely versatile, which allows modules to extend the language, so you can experiment with new features without introducing them into the interpreter.

Other programming environments have very good tools that greatly facilitate the implementation of compilers, but the complexity of implementing a language extension by preprocessing the source limits the development of such extensions.

Moreover, the lack of integration with the compiler forces the user to add complexity to the build: running pre-compilers, handling temporary files, and so on. This becomes a nightmare, particularly in dynamic environments. Imagine that to load a Moose module you had to run a Moose preprocessor and then load the module resulting from that compilation; it would be really annoying. Perl has mechanisms that let an extension run automatically at compile time, and some of them can even access the source code and hand modified code back to the Perl compiler for the final stage of compilation, so all you need to do is write use Moose; at the start of the program.

The problems described above limit the adoption of external extensions to a language, which restricts evolution through extensions, the easiest way to evolve a language. In this sense Perl is a lot like Lisp: most of the evolution of the language is achieved by adding external modules from the CPAN. Today there are extensions that implement constructs such as try/catch, the ability to declare parameters in the declaration of subroutines, and many other goodies that are not part of the language.

With respect to compatibility, there are two factors to take into account: the compatibility of the libraries and that of the language itself.
Language compatibility is maintained using pragmas, which in Perl are almost the same as any CPAN module (or so it seems from a programmer's perspective), so when the language changes, new pragmas are added. For example, the new features of Perl 5.10 are activated using a pragma.
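
As a minimal sketch of how a pragma protects old code, the following uses the core feature pragma, which is lexically scoped: say only becomes a keyword inside the block that asks for it. The sub names without_feature and with_feature, and the user-defined sub say, are made up for this illustration and do not come from the migrated application.

```perl
use strict;
use warnings;

# Old code may have defined its own sub called "say" long before 5.10.
# Because no feature is enabled here, this is just an ordinary sub.
sub say { return "legacy:" . join( ",", @_ ) }

sub without_feature {
    return say("hello");    # calls the user-defined sub above
}

sub with_feature {
    use feature 'say';      # enabled only until the end of this block
    my $out = '';
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    say {$fh} "hello";      # the 5.10 built-in: prints "hello\n" to $fh
    close $fh;
    return $out;
}

print without_feature(), "\n";
print with_feature();
```

The point of the design is that new keywords are opt-in per lexical scope, so a file that never asks for them is compiled the way it always was.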

The greatest Perl incompatibility of the last decade was the elimination of pseudo-hashes. To minimize the problems, mechanisms were created to ease the migration of existing code (use fields), and the deprecated pseudo-hash feature was kept around for about 5 years. This illustrates how committed the community is to maintaining compatibility with existing code.
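
To give an idea of what code written against use fields looks like, here is a small sketch. The Point class and its x and y fields are hypothetical, chosen only for illustration. The same source was meant to work both before and after pseudo-hashes were removed; on Perl 5.10 and later, fields::new returns a hash whose keys are restricted to the declared fields.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only.
package Point;
use fields qw(x y);    # declares the only keys a Point may have

sub new {
    my ( $class, %args ) = @_;
    # The typed lexical lets perl check field names at compile time.
    my Point $self = fields::new($class);
    $self->{x} = exists $args{x} ? $args{x} : 0;
    $self->{y} = exists $args{y} ? $args{y} : 0;
    return $self;
}

package main;

my $p = Point->new( x => 3, y => 4 );

# Storing into an undeclared field dies at run time, so wrap it in eval.
my $ok = eval { $p->{z} = 5; 1 };

print "x=$p->{x} y=$p->{y}\n";
print "assigning z ", ( $ok ? "succeeded" : "was rejected" ), "\n";
```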

The community has recently adopted the term DarkPAN for all the code written in Perl that is not publicly available, but lies hidden in thousands of systems that people do not even know are written in Perl (the one in this story appears to be written in PHP). Although many people would prefer to modernize the Perl platform while ignoring the DarkPAN, the community generally thinks that this is not the best policy.

In the universe, dark matter accounts for most of the mass. We do not know whether the DarkPAN outweighs the CPAN, but we can see some side effects: for example, although there seem to be more systems written in PHP, there are many more job offers for Perl, and the DarkPAN may be the cause.

The truth is that all that code, which drives many businesses, is there because it works, and because it can be maintained over time with less effort than similar code developed on other platforms.

The most impressive fact is that today we have a very modern platform that is compatible with decade-old code, with libraries as easy to use, modern and powerful as Moose, Devel::Declare and Catalyst, which are the envy of developers of other languages.

So if you want to program on a simple, modern, powerful and enduring platform, the choice is clear: Perl.

Wednesday, October 14, 2009

Statistic Calculator: Final Console Application

[Spanish original]
In the last article we left pending making our interpreter recognize parameters and print any value returned by the Statistics::Descriptive functions.
First we should define how to delimit commands and parameters. The easiest way is to do what the Unix shell does and use whitespace to delimit both, so once you get a command line:
34     $command =~ s/^\s+//; $command =~ s/\s+$//;
35     my @args = split /\s+/, $command;
36     my $oper = looks_like_number( $args[0] ) ? "add_data" : shift @args;
leading and trailing spaces are removed (34), the line is split into elements separated by one or more whitespace characters (35), and the operation to perform is obtained (36). This is usually the first element, except when a number is read, in which case the implicit operation is "add_data". After these steps the operation to be performed is in $oper and its arguments are in @args, so it only remains to apply the operation:

38         when (%FUNCS)               { apply( $oper, @args ) }
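
The parsing steps of lines 34 to 36 can also be tried in isolation. This is a minimal sketch that wraps them in a helper called parse_command, a name invented here and not part of the calculator itself:

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical helper wrapping steps 34-36. Returns the operation name
# followed by its arguments, or an empty list for a blank line.
sub parse_command {
    my ($command) = @_;
    $command =~ s/^\s+//;                      # trim leading whitespace
    $command =~ s/\s+$//;                      # trim trailing whitespace
    my @args = split /\s+/, $command;
    return unless @args;                       # nothing typed
    my $oper = looks_like_number( $args[0] ) ? "add_data" : shift @args;
    return ( $oper, @args );
}

my @a = parse_command("  percentile 25  ");    # ("percentile", "25")
my @b = parse_command("1 2.5 3");              # ("add_data", "1", "2.5", "3")
my @c = parse_command("   ");                  # ()

print join( " | ", @$_ ), "\n" for \@a, \@b, \@c;
```

Note that when the line starts with a number nothing is shifted off, so the number itself stays in the argument list and ends up being passed to add_data.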
The routine in charge of applying the operation must get the arguments (26) and evaluate the operation in list context (27). Why list context? To allow operations like "percentile" to return multiple values.
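
For readers less familiar with Perl contexts, here is a small sketch of why the context of the call matters. The sub value_and_index is a made-up stand-in, not a Statistics::Descriptive method: it returns a value plus its position when called in list context and only the value in scalar context, which is, as far as I know, how percentile behaves.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method such as percentile: it returns
# extra information only when the caller asks for a list.
sub value_and_index {
    my @sorted = sort { $a <=> $b } @_;
    my $index  = int( $#sorted / 2 );          # middle position
    return wantarray ? ( $sorted[$index], $index ) : $sorted[$index];
}

my @list   = value_and_index( 7, 1, 5 );       # list context:   ( 5, 1 )
my $scalar = value_and_index( 7, 1, 5 );       # scalar context: 5

print "list: @list\n";
print "scalar: $scalar\n";
```

Assigning the call to an array, as line 27 does with @result, is what guarantees list context, so nothing the method returns is lost.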

Once the value is calculated, we have to check whether it is worth printing, so we simply return if @result is empty (28), or if it has a single element that is undefined or an empty string (29).
To print complex values I will use the YAML format, because it is very readable and Perl has a very good YAML module available from CPAN.
So, depending on the number of elements in @result, we print either the first element or the entire array converted to YAML text (30).
The console calculator then looks like this:
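
The two checks of lines 28 and 29 can be read as a single yes/no question. A minimal sketch, using a helper named worth_printing that is invented here for illustration:

```perl
use strict;
use warnings;

# Hypothetical predicate combining the checks of lines 28 and 29.
sub worth_printing {
    my @result = @_;
    return 0 unless @result;                   # nothing was returned
    return 1 if @result > 1;                   # several values: always print
    return ( defined $result[0] and $result[0] ne "" ) ? 1 : 0;
}

print worth_printing()       ? "print" : "skip", "\n";   # skip
print worth_printing(undef)  ? "print" : "skip", "\n";   # skip
print worth_printing("")     ? "print" : "skip", "\n";   # skip
print worth_printing(0)      ? "print" : "skip", "\n";   # print
print worth_printing( 1, 2 ) ? "print" : "skip", "\n";   # print
```

The comparison against the empty string, rather than a plain truth test, is what keeps a legitimate result of 0 from being swallowed.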
 1 #!/usr/bin/perl
 3 use Modern::Perl;
 4 use Scalar::Util qw( looks_like_number );
 5 use Statistics::Descriptive;
 6 use Pod::Perldoc;
 7 use Term::ReadLine;
 8 use YAML;
10 my $term = new Term::ReadLine 'Statistic Calculator';
12 my %FUNCS = map { $_ => 1 } qw( sum mean count variance standard_deviation
13     min mindex max maxdex sample_range median harmonic_mean geometric_mean
14     mode trimmed_mean clear add_data percentile quantile least_squares_fit
15     frequency_distribution_ref frequency_distribution);
17 my @COMMANDS = qw( exit quit help man );
19 sub help { say "Commands: " . join( ", ", sort @COMMANDS, keys %FUNCS ) }
21 sub man { Pod::Perldoc->new( args => \@_ )->process }
23 my $s = Statistics::Descriptive::Full->new();
25 sub apply {
26     my ( $oper, @args ) = @_;
27     my @result = $s->$oper(@args);
28     return unless @result;
29     return unless @result > 1 or (defined $result[0] and $result[0] ne "");
30     say YAML::Dump( @result == 1 ? $result[0] : \@result );
31 }
33 while ( defined( my $command = $term->readline("Ready> ") ) ) {
34     $command =~ s/^\s+//; $command =~ s/\s+$//;
35     my @args = split( /\s+/, $command ) or next;
36     my $oper = looks_like_number( $args[0] ) ? "add_data" : shift @args;
37     given ($oper) {
38         when (%FUNCS)               { apply( $oper, @args ) }
39         when ("man")                { man "Statistics::Descriptive" }
40         when ( [ "exit", "quit" ] ) {last}
41         when ("help")               {help}
42         default                     { say "Error: type 'help' for help" };
43     }
44 }
After this series of articles, I hope you can appreciate how fast you can work in Perl, using the mechanisms offered by the language and the vast number of tools available on the CPAN.
In future articles I will use this example to illustrate other techniques such as web programming, object-oriented Perl and of course more modules from CPAN.