Monday, November 30, 2009

PSGI and Plack: the future of web applications

[Original Spanish source]
A few weeks ago I showed my friend Joel a Perl one-liner that implements a web server. Perhaps he had too much work to do, because he did not seem surprised by this fantastic line of Perl using the IO::All module:

perl -MIO::All -e 'io(":8080")->fork->accept->(sub { $_[0] < io(-x $1 ? "./$1 |" : $1) if /^GET \/(.*) / })'

Surprisingly (especially for a Perl fan), his response was: "You know that the Python people have software to deploy powerful web servers easily". I could see that he had not understood my point, so I let it go.

I was sure he was talking about WSGI (also known as PEP 333), a specification for a web application API that allows a separation of concerns between the interface (policy) and the implementation (mechanisms).

In Perl this was the job of HTTP::Engine used among others by Catalyst.

However, I was curious, so I looked on CPAN: would there be something new out there?

I found modules like Mojo, which internally uses an interface similar to WSGI. However, the most interesting things I found were PSGI and Plack.

Apparently HTTP::Engine is far from an ideal solution. I read that it is monolithic, difficult to adapt and not very efficient (for embedded environments, I guess), so Miyagawa decided to separate HTTP::Engine into three parts:
  1. A specification: PSGI
  2. A reference implementation: Plack::Server
  3. Tools: Plack::*
The most interesting thing about Plack and PSGI is the pace at which they were implemented. Only weeks ago it was an idea, and for some time now there have been reference implementations available that let Plack run applications standalone, in a single thread or prefork. There are also interfaces for FastCGI, CGI and, of course, mod_perl. As if this were not enough, PSGI can work with non-blocking I/O, so there are servers for POE, AnyEvent and Coro. There is even a PSGI module for Apache (mod_psgi).

On the other hand, PSGI adapters have been developed for frameworks like Catalyst (Catalyst::Engine::PSGI), Squatting (Squatting::On::PSGI), CGI::Application (CGI::Application::PSGI), Dancer and even WebGUI (PlebGUI). There are also tools to help in the migration from other technologies. For example, if you have an application written for HTTP::Engine, you can use it virtually unchanged in PSGI with HTTP::Engine::Interface::PSGI. If you have a CGI application, you can migrate it with little modification using CGI::PSGI, and if even this is too much work you can use CGI::Emulate::PSGI, which lets you run a CGI program as a server from the command line!

In the last post I made a toy POD document server and chose to implement it as a CGI. I guess some people may have had problems making it work, because it needed a running web server with the right CGI configuration in place.

Using CGI::Emulate::PSGI you only write a program to start the server (perldocweb_starter):

use CGI::Emulate::PSGI;
# On Perl 5.26 and later "." is no longer in @INC, so write do "./perldocweb" there.
my $app = CGI::Emulate::PSGI->handler(sub { do "perldocweb" });

and then run the command plackup:

$ plackup perldocweb_starter
Plack::Server::Standalone: Accepting connections at http://0:5000/

Now we have our documentation server running on port 5000, so browsing to:

http://localhost:5000/perldocweb?PSGI

the PSGI specification should appear in the browser. Easy, right?

If you are willing to modify your code, the emulator is not necessary: the application can be executed directly by plackup, and it will be much more efficient.

The first modification is to change line 4 to use CGI::PSGI. I also no longer use CGI::Carp, because Plack has a much more elegant way to display errors, using Devel::StackTrace::AsHTML.

When using CGI::PSGI, the program must create (and return) a closure that will be our application, so the main code between lines 20 and 50 of the old code must go inside a closure. Line 20 must also now initialize a CGI::PSGI object, so I replaced it with lines 20 to 22 of the new application:

 1 #!/usr/bin/perl
 2 
 3 use Modern::Perl;
 4 use CGI::PSGI;
 5 use IO::File;
 6 use Pod::Simple::Search;
 7 use Pod::Simple::HTML;
 8 
 9 my %content_types = (
10     RTF   => "application/rtf",
11     LaTeX => "application/x-latex",
12     PDF   => "application/pdf",
13 );
14 my @wikis   = qw(Usemod Twiki Template Kwiki Confluence Moinmoin Tiddlywiki Mediawiki Textile);
15 my %formats = (
16     ( map { $_ => "Pod::Simple::$_" } keys %content_types ),
17     ( map { $_ => "Pod::Simple::Wiki::$_" } @wikis )
18 );
19 
20 my $app = sub {
21     my $env      = shift;
22     my $q        = CGI::PSGI->new($env);
23     my $filename = Pod::Simple::Search->new->inc(1)->find( $q->param("pod") );
24     my $format   = $q->param("format") || "HTML";
25     given ($format) {
26         when ("source") {
27             return [ $q->psgi_header("text/plain"), IO::File->new($filename) ];
28         }
29         when ('HTML') {
30             my $parser = Pod::Simple::HTML->new;
31             $parser->perldoc_url_prefix( $q->url( -path_info => 1 ) . "?pod=" );
32             my $footer = "<hr>"
33                 . join( " ", map { make_link( $_, $q ) } "source", keys %content_types )
34                 . " | Wiki formats: "
35                 . join( " ", map { make_link( $_, $q ) } @wikis );
36             $parser->html_footer(qq[\n<!-- end doc -->\n\n$footer</body></html>\n]);
37             $parser->output_string( my $output );
38             $parser->parse_file($filename);
39             return [ $q->psgi_header("text/html"), [$output] ];
40         }
41         when (%formats) {
42             my $class = $formats{$format};
43             eval "require $class";
44             my $parser = $class->new;
45             $parser->output_string( my $output );
46             $parser->parse_file($filename);
47             return [ $q->psgi_header( $content_types{$format} || "text/plain" ), [$output] ];
48         }
49         default {
50             die("Unknown format '$format'");
51         }
52     }
53 };
54 
55 sub make_link {
56     my $fmt = shift;
57     my $q   = shift;
58     $q->a( { href => $q->url( -path_info => 1, -query => 1 ) . "\&format=$fmt" }, $fmt );
59 }

The closure receives a PSGI environment as its parameter (line 21) and uses it to create the object $q, which we can use like a regular CGI object.
The closure must return a reference to an array of three elements; psgi_header supplies the first two:
  1. The status code: for example 200
  2. The headers: an array of alternating header names and values
  3. The body: an array of lines or an IO::Handle object
A major difference between CGI::PSGI and CGI is that in the latter anything written to STDOUT is sent to the browser, whereas in the former the body is returned.
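
To make the shape of that return value concrete, here is a minimal sketch of a PSGI application written without CGI::PSGI. A PSGI response carries a status code in front of the headers and the body (psgi_header provides the status and the headers together in the listing above). The greeting text and the hand-built environment are only assumptions for illustration; a real server passes a much larger $env.

```perl
use strict;
use warnings;

# A PSGI application is just a code reference that takes the
# environment hash and returns [ status, headers, body ].
my $app = sub {
    my $env  = shift;
    my $path = $env->{PATH_INFO} // '/';
    my $body = "You asked for $path\n";
    return [
        200,                                    # status code
        [ 'Content-Type'   => 'text/plain',     # alternating names and values
          'Content-Length' => length $body ],
        [$body],                                # body as an array of chunks
    ];
};

# Call the application by hand with a made-up, minimal environment.
my $res = $app->( { REQUEST_METHOD => 'GET', PATH_INFO => '/hello' } );
print "$res->[0] $res->[2][0]";
```

Nothing is printed by the application itself; the caller (normally the server) decides what to do with the returned structure.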

So the way the application generates content must change. The source code case (line 26) is simpler than in the CGI version, because I just need to return the headers together with an IO::Handle object (created with IO::File). The PSGI server is responsible for reading the object's data and sending it to the browser. If the handle is a real file (as in this case) and the operating system has sendfile(2) (as Linux does), the data can be sent entirely by the kernel, so there is no difference in efficiency between this program and one optimized in C (like Apache).
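
As a rough sketch of what the receiving side has to do with those two kinds of body, the helper below collects either form into a string. The name body_to_string is hypothetical and this is not how any particular Plack server is implemented; it only relies on the PSGI rule that a handle-like body offers getline and close.

```perl
use strict;
use warnings;
use IO::Handle;    # makes ->getline and ->close available on plain filehandles

# Hypothetical helper: turn a PSGI body (array ref or handle) into one string.
sub body_to_string {
    my $body = shift;

    # Array body: the chunks are simply concatenated.
    return join '', @$body if ref $body eq 'ARRAY';

    # Handle body: read it line by line until it is exhausted, then close it.
    my $out = '';
    while ( defined( my $chunk = $body->getline ) ) {
        $out .= $chunk;
    }
    $body->close;
    return $out;
}

# An in-memory filehandle stands in for the IO::File object of line 27.
my $data = "line one\nline two\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my $from_array  = body_to_string( [ "line one\n", "line two\n" ] );
my $from_handle = body_to_string($fh);
print $from_array eq $from_handle ? "same content\n" : "different content\n";
```

A real server would write each chunk to the socket instead of accumulating it, which is what makes the handle form attractive for large files.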

In the HTML case (line 29) I replaced output_fh with output_string, which stores the content generated by Pod::Simple in $output; that is then returned in line 39.

As I can no longer use STDOUT to send the content to the browser, I cannot use the Pod::Simple shortcut $class->filter, so I replaced it with its equivalent in lines 44 to 46 of the new application.
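
The same capture pattern can be tried on its own. This is a small sketch that uses Pod::Simple::Text and an inline POD string instead of the formatter classes and files of the real application; both substitutions are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;
use Pod::Simple::Text;

# A tiny POD document kept in a string (the application parses files instead).
my $pod = "=head1 NAME\n\nDemo - a tiny POD sample\n\n=cut\n";

# Instead of letting the formatter print to STDOUT, collect its output
# in $output and hand that back as the PSGI body.
my $parser = Pod::Simple::Text->new;
$parser->output_string( \my $output );
$parser->parse_string_document($pod);

my $response = [ 200, [ 'Content-Type' => 'text/plain' ], [$output] ];
print $response->[2][0];
```

The three parser calls play the same role as lines 44 to 46 of the application, with parse_string_document in place of parse_file.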

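Although perhaps not obvious, the file returns the closure (line 20) because it is the last value computed in the file: the declaration of make_link that follows it is handled at compile time and does not produce a value when the file runs.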
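
This behaviour can be checked with plain Perl. The sketch below writes a small application file to a temporary location and loads it with do, which returns the last value evaluated in the file. Loading with do is only an approximation of what plackup does internally, and the file contents and the greeting helper are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small application file: a closure followed by a named sub,
# the same layout as the listing above.
my ( $fh, $path ) = tempfile( SUFFIX => '.psgi', UNLINK => 1 );
print {$fh} <<'PSGI';
my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ greeting() ] ];
};

sub greeting { "hello\n" }
PSGI
close $fh or die "close failed: $!";

# do FILE returns the last value evaluated in the file. The named sub is
# declared at compile time, so that last value is the closure.
my $app = do $path;
die "could not load $path: " . ( $@ || $! ) unless ref $app eq 'CODE';

my $res = $app->( {} );
print "status: $res->[0]\n";
```

If the file ended with a statement that does produce a value, such as a print, that value would be returned instead and the file would no longer yield the application.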

If we call our new program "server_pod", you can start it with plackup as follows:

$ plackup server_pod
Plack::Server::Standalone: Accepting connections at http://0:5000/

After that, the POD content can be browsed as shown before. Here plackup is using its default server (Plack::Server::Standalone), a single-threaded process that is ideal for development or personal use. If you need a production-quality server you should look at other options; for migrated CGI code I recommend Plack::Server::Standalone::Prefork, which can be started like this:

$ plackup -s Standalone::Prefork server_pod
Plack::Server::Standalone: Accepting connections at http://0:5000/

That was easy: default values are used for everything. If you need to tune the server you can pass options to plackup; each server has specific options documented in its implementation class.

Finally, this code is much more efficient than the emulator in the first example, because it does not need temporary files to capture standard output, and you can still run it in CGI, FastCGI or even mod_perl mode under Apache.

Next time I will improve the application to use Plack and its middleware.
