Friday, January 19, 2007

Perl advent post Dec. 24th

Over the Christmas holiday I wrote a few posts for the Perl advent calendar. I'd like to share them here, if only for posterity. I believe you should be able to view the archived original.

Perl Advent Calendar 2006-12-24
Santa's Dilemma
by Ben Prew

Kris Kringle had just left the North Pole on his brand-new state-of-the-art sleigh, complete with wireless access to the North Pole presents database. The elves had been hard at work bringing present distribution into the 21st century, and their finishing touch was getting rid of Santa's old, outdated hand-written list. The who-is-naughty, who-is-nice, and who-wants-what lists were now stored in a fancy new Oracle(!) database. (Yes, Oracle got off the naughty list with that one)

Everything was going swimmingly and Kris was picking, packing and delivering more presents than ever (which is a good thing, considering our "conversion" program currently going on in Iraq). Unfortunately, just as he was sweeping into Russia, a black-bellied plover slammed into the side of the sleigh! Kris looked over the side and, to his horror, a smoking hole was all that remained of his wireless link to the North Pole. A crackling on the radio brought Kris back to his senses. It was Rudolph, his main reindeer.

"Captain, what was that?"

"Rudy, we just got groused and lost our NP link."

"KGB?"

"Uhh… I don't think so."

Rudolph was a huge conspiracy theory freak, and of course since they were over Russia, it could only be the KGB trying to sabotage Kris. They were trying to enact a "regime change", at least according to the reindeer.

Kris began rooting through his system, looking for a second chance… anything….

Any sort of land line connection… no
Cell-phone card… not even Verizon has that much coverage….
Tin-foil to make an improvised antenna from the reindeer's antlers…. no

Wait, wait… maybe this was something….

Kris realized that Norrish, his head elf, had been using his laptop to munge the wish list before importing it into the new database, and the raw text files were still sitting on his hard drive! Unfortunately, Kris didn't have time to install Oracle, much less the horsepower to run the database and import the data. He slumped back in his seat, slowly coming to terms with what this meant. Just as he was about to turn the sleigh around, Rudolph's nose lit up…

"What is it boy, what are you thinking?"

Rudolph started wagging his tail excitedly (yes, reindeer have tails), and began jostling up and down, braying and making other reindeer noises.

"Come on Rudy, you know I can't understand you when you're this excited. Slow down, take a deep breath and try again"

Rudolph inhaled deeply, exhaled, and then quietly, calmly uttered four meager syllables,

"Any data?"

Kris stared at Rudolph blankly, not quite sure what to make of what was just said. What in the blazes did that mean? Out of ideas, but with little else to do over the vast Siberian wasteland, Kris focused all his Kringle energies and attempted to decode his companion's cryptic quip.

He scanned his memory, trying to dig up something, anything at all. Something nagged at the back of his mind, he could feel it, but he just couldn't mold it into a coherent thought. Suddenly, his eyes lit up, and quick as a wink Kris knew what Rudolph meant. The code began pouring from his fingers onto the screen as Kris entered the zone. His eyes narrowed as all other thoughts were put aside, as he intently focused on the task at hand.

"Where were those files… in Norrish's home directory of course"

"Now, the sql queries"

With furious typing, a little elf magic and much unit testing (You know he rolls with XP), Kris had solved his problems and saved Christmas. Before you scroll down to the code, can you guess what characters his keyboard had expressed? What masterpiece he was able to piece together in a few short minutes, with only raw text files, and but a word from his most trusted reindeer? Behold, the code of Kris Kringle!
mod24.pl - Kringle's Code

#!/usr/bin/perl

use strict;
use warnings;
use DBI;

# My files are laid out like this:
#
# presents.txt
# person_no|present
# 1|bicycle
# 1|action figure
# 2|doll
# 3|doll
# 4|bicycle
#
# people.txt
# person_no|name|personality|address
# 1|bob smith|nice|123 anywhere st, St. Petersburg
# 2|alice andrews|nice|465 somewhere st, Moscow
# 3|frank martonick|naughty|1138 lenin ave, St. Petersburg
# 4|billy cutter|nice|31337 peoples lane, Moscow

my $dbh = DBI->connect('dbi:AnyData(RaiseError=>1):');

# Treat each pipe-delimited text file as a table.
$dbh->func( 'people',   'Pipe', '/home/norrish/lists/people.txt',   'ad_import');
$dbh->func( 'presents', 'Pipe', '/home/norrish/lists/presents.txt', 'ad_import');

my $sth = $dbh->prepare(q/
    SELECT person_no, name, address
    FROM people
    WHERE personality = ?/);

$sth->execute('nice');

# One query per nice person, standing in for a join.
while ( my $person = $sth->fetchrow_arrayref ) {
    my $presents = $dbh->selectall_arrayref(q/
        SELECT present
        FROM presents
        WHERE person_no = ?/, {}, $person->[0]);

    print "Presents for ". $person->[1] . " (living at: ". $person->[2] . ")\n\t",
        join("\n\t", map { join " ", @$_ } @$presents), "\n\n";
}

$dbh->disconnect();

DBD::AnyData is a Perl module that allows one to, among other things, use a variety of (mostly text) file formats as tables in a database. It seems to have limited support for joins, so I had to do the queries separately. Even so, AnyData is really cool!

Perl advent post Dec. 17th

Over the Christmas holiday I wrote a few posts for the Perl advent calendar. I'd like to share them here, if only for posterity. I believe you should be able to view the archived original.


Perl Advent Calendar 2006-12-17
Yule Log-Rolling
by Ben Prew

When running automated processes, I find it incredibly useful to have some sort of logging setup, so that I can see how long certain parts of processing take. Or, even more importantly, if the process dies, I can better determine what it was doing shortly before it bit the dust.

At work, we have many automated and semi-automated processes that run at a scheduled time. These processes all log to the same directory, which makes it easier to find them. Also, I would like to automatically rotate new files when they show up in this directory, and not have to deal with any sort of configuration file.

I could have done this with logrotate, or some other process, but I like doing things in Perl, and I didn't want to interfere with existing archiving processes on the box. With Logfile::Rotate I can eat my bûche de Noël and have it too.

If I wanted to write a separate script to rotate all the log files for me, it might look something like mod17e.pl (external):

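#!/usr/bin/perl

use strict;
use warnings;
use Logfile::Rotate;

# Assumption: the log files to rotate are named on the command line;
# this listing had lost the list that map iterates over.
my @logs = map {
    my $file = $_;
    Logfile::Rotate->new(
        File => $file,
        Gzip => 'lib',
        Dir  => '/var/logs/dev.old',
        Post => sub { unlink $file },   # remove the empty file left behind
    );
} @ARGV;

$_->rotate() for @logs;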

The default behavior is to leave an empty log file in the directory, but all my processes will create their own files, if needed, so I would rather just remove the file. This was easy to add with the Post argument.

Another benefit of Logfile::Rotate is that unlike an external binary, I can embed it in my existing code. All of our current logging is done through a mix-in, so I've got a single point of contact for each process that runs, regardless of where it logs to.

This method is called log(), and it handles all the logging for each file, as well as knowing which file to log to. This also gives me more flexibility in how each log is rotated. A rotation could be triggered by the process catching a signal, the number of logged messages exceeding some threshold, the time elapsed since last rotation, or the log growing too large (in an effort to avoid filling the partition), etc.

So, if I wanted to rotate each log file at 100 MB, regardless of when it was last rotated, the code might look something like mod17i.pl (internal):

use Logfile::Rotate;

sub log
{
    my ($self, $message) = @_;

    # logging stuff here.

    # Rotate once the file on disk has grown past 100 MB.
    if ( ( -s $self->filename ) > 100_000_000 ) {
        Logfile::Rotate->new(
            File => $self->filename,
            Gzip => 'lib',
            Dir  => '/var/logs/dev.old',
        )->rotate;
    }
}

Having the log files rotate themselves, how great is that! Now we don't have any other external scripts or configurations to maintain. Of course, the downside to this approach is the implicit stat() on each call to log(), but it shouldn't add too much overhead. This can even be alleviated if there is only one process that writes to the log file, since you could then have a counter that is initialized to the current size of the file and then adds the size of the message to the counter. Then, once the counter reached 100_000_000, you could rotate the log file.
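The counter idea can be sketched roughly as follows. This is only an illustration under assumptions: the CountingLog package, its log_message method and the on_rotate callback are made-up names, not part of Logfile::Rotate, and it assumes a single writer and plain ASCII messages (so characters equal bytes). In real use, the callback is where Logfile::Rotate->new(...)->rotate would go.

```perl
use strict;
use warnings;

package CountingLog;    # hypothetical helper, not part of Logfile::Rotate

sub new {
    my ($class, %args) = @_;
    return bless {
        file      => $args{file},
        limit     => $args{limit} // 100_000_000,
        on_rotate => $args{on_rotate},           # code ref that does the real rotation
        bytes     => ( -s $args{file} ) || 0,    # the only stat() until a rotation
    }, $class;
}

sub log_message {
    my ($self, $message) = @_;
    my $line = "$message\n";

    open my $fh, '>>', $self->{file}
        or die "Cannot append to $self->{file}: $!";
    print {$fh} $line;
    close $fh or die "Cannot close $self->{file}: $!";

    # Track the size ourselves instead of calling stat() on every message.
    $self->{bytes} += length $line;

    if ( $self->{bytes} >= $self->{limit} ) {
        $self->{on_rotate}->( $self->{file} );
        # Re-sync with whatever the rotation left on disk.
        $self->{bytes} = ( -s $self->{file} ) || 0;
    }
    return;
}

sub bytes { return $_[0]{bytes} }

package main;

use File::Temp qw(tempfile);

my ($tmp_fh, $path) = tempfile( UNLINK => 1 );
close $tmp_fh;

my $rotations = 0;
my $log = CountingLog->new(
    file      => $path,
    limit     => 20,    # tiny limit so the example rotates quickly
    on_rotate => sub { my ($file) = @_; unlink $file; $rotations++ },
);

$log->log_message('first message');     # 14 bytes so far, under the limit
$log->log_message('second message');    # 29 bytes, so the callback fires
print "rotations: $rotations\n";
```

If more than one process writes to the same file, the counter drifts away from the real size, which is why the per-call stat() remains the safer default.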
SEE ALSO
Log::Dispatch::FileRotate, logrotate(8)

Thursday, April 27, 2006

Decreasing the standard deviation of lisp

There was an (apparently cyclic) thread on comp.lang.lisp the other day about how lisp sucks and why new developers aren't flocking to bask in its glory.

And, while it started out with a few broad points, the thread quickly moved into several small points, and the various merits of those points.

One of them, in an example used by Ron Garret, discussed the merits of deprecating nth in favor of elt, since elt is a superset of nth.

However, when I think about the existence of nth and elt more, I don't think that it really matters to newbies whether or not both nth and elt exist, since either:
  1. They don't know about the existence of both
  2. They know about the existence of both, but they don't really care
And, as I thought about it more, I came to the conclusion that it is actually better to have both nth and elt in common lisp. This follows my thinking that the biggest barriers to entry when learning a new language come down to a few main things:
  1. New syntax
  2. New libraries and their various available functions/methods
  3. Various new concepts presented within the language (C: pointers, Ruby: blocks, lisp: macros, functional style, Java: mostly-OO)
Note: not meant to be a complete list of new concepts from various languages, just a few examples.

There are many reasons why some languages do well and others don't, but I think one of the main reasons a language does well is that it has similarities to the current collection of popular languages.

New concepts can be difficult to learn, but look at Ruby, Perl and Python. All three are slower than lisp and have lisp-like concepts not commonly used in C/C++/Java (blocks, functional style), yet they tend to do better, IMHO, because they have strong syntax and library similarities to C/C++/Java.

I think that the difference in the 3 main points up above constitutes what I call a language's "standard deviation" from the current most popular languages. The corollary is: a language that offers a low standard deviation will have a higher appeal than a similar language with a higher standard deviation. (Note: I'm doing some pseudo-statistics at work; well, the statistics are real, but I'm not a real statistician, hence the "pseudo" part.)

For example, why did Matz write Ruby? It has a lot of concepts that lisp employs, but it's significantly slower. Rather than inventing Ruby, why didn't Matz just use lisp, or smalltalk?

I think one of the big reasons (perhaps unconsciously) is that Matz recognized the "standard deviation" of both lisp and smalltalk, and set about designing a language that was closer to most programmers' current expectations of a language.

I think that lisp would gain broader appeal as well by reducing its "standard deviation". Since I like the current syntax of lisp, and I suspect a lot of other people do, that only leaves standard functions/libraries and concepts.

Also, since I happen to think that lisp's concepts (macros) are some of the best ever invented, I'll nix that idea as well.

This leaves:
  1. standard functions / libraries / objects
I think lisp would do well to add extra library functions that are very similar to the existing favorite languages. Not only would this not break any existing code, but it would help ease the learning curve that new programmers face when learning a new language, especially one with a much different syntax, such as lisp.

I'll readily accept that whatever is currently popular may not be the best way to do something, but by not giving programmers a sense of familiarity, you force them to basically start from scratch. Once people give lisp a chance, they'll come to understand the power that it conveys, but most people already don't have enough time to spend learning new concepts they can use in their existing language, much less spend time learning new concepts, a radically different syntax, and a whole new set of libraries/functions!

I think that we (the lisp community) should imitate some functions of the more popular languages to increase membership. As an example, I was thinking it would be fairly easy to add things like "while", "for", "foreach", "var", etc.

They could even be interned into a new package (maybe 'new-lisp-user'), so someone could just import that package, and be greeted with functions that more closely mirrored their expectations.

Here's some sample code:

(defmacro var (&rest all)
`(let ,@all))

See, that's all I'm talking about. Just creating new macros and functions that look an awful lot like existing lisp ones, but have names and behavior that are more familiar to new programmers. Now, as a new programmer, I can focus on learning the different syntax, those funky macro things, etc, all while having the familiarity of my favorite language ;)

Note, I think my arguments are implicitly supported by SteveY's blog post about why lisp is an unacceptable lisp. It's not that the points brought up were technically sound, but rather that they represent the kind of problem I'm talking about: things behaved differently than he had come to expect from most "mainstream" languages.

It's all about minimizing standard deviation.

Saturday, November 26, 2005

Solving the RIGHT problem

I was recently reading "Good to Great", which is a little outside my typical book selection, and as I was reading how a company makes the jump from a "good" company to a "great" company I started thinking about how people, specifically me, could make the jump from good or mediocre to great.

One of the points the book brought up, which I'm sure most of us have heard before, is that good, not bad, is the enemy of great. Once we get to good, we tend to stop striving, we get comfortable; lazy.

That got me thinking about the coding I do on the side, and how I choose the coding problems that I work on. I think you will agree that not all coding problems are created equal, and I realized that while I was working on several things, I was often taking on pop-code projects (yes, I made up that word): projects that were simple or quickly rewarding, similar to pop music or soda pop, but that didn't necessarily stretch me as a developer or give me much of a chance to explore the theoretical side of computer science.

One can bang out code all day long and never have to even touch a hashtable, or write their own binary tree. In my opinion, banging out code all day is something I could teach my dog to do, or at the very least I expect these kinds of things to be solved in the near future (~20 years) by higher level languages. Just as we write in C++/Perl/etc now, and we can't imagine writing all our code in assembly, I'm sure the next generation will look down on our very verbose languages.

I think in order to stretch oneself as a developer, one should take on problems that can't be solved easily, or that require several iterations to solve. Not only that, but solving a problem you don't know how to solve often requires continuous learning.

I'm not sure if I can really explain how to stretch yourself; I'm not good enough to generalize a solution, I can only scramble and reach for ways to do it myself.

To that end, I'm going to try taking on more difficult problems. Problems that /require/ that I understand the problem domain and learn about the algorithms needed for that area.

My current plan is to make a bicycle route for Google maps, that would only use bicycle routes, so it wouldn't suggest trying to ride on interstates or other bicycle un-friendly roads. Not only that, but it would take elevation changes into account.

So far, I've found that Seattle has GIS data available, although it's in "shapefile" format, and may not have latitude and longitude information. If I can retrieve latitude and longitude information, the USGS has a web service that can give you an elevation based on lat and long.

I'll let you know how it turns out...

Friday, August 26, 2005

Can you improve on emptiness?

A while ago Google published what they called the "Google Aptitude Test", and one of the questions they posed (#9 I believe) involved improving on empty space. After reading several answers to this question, I thought that the correct answer should have been "nothing", for several reasons.

In terms of how it relates to Google, it appears to be one of their desired design goals. All right, simplicity might be the actual design goal, but you cannot achieve simplicity when you have a cluttered interface. Either way, as you look at the applications they have published (search, gmail, gtalk, etc), they all have a very simple, very small interface. Contrast their search homepage with Yahoo's. Google is very much interested in becoming a "portal", as gmail and talk both attest. Yet, their homepage remains ever so sparse. So, I think it's the right answer because it follows what Google is already attempting to achieve.

However, if we diverge into the more philosophical, I feel there is another reason that the answer of "nothing" is correct. Back when I was playing guitar more regularly, and I was reading a lot of guitar theory, one of the books I was reading (either the Advancing Guitarist or the Heavy Guitar Bible) mentioned that when soloing, it is often the rests in between notes that make the solo interesting.

So it is with other things as well. The space between words and thoughts becomes the emphasis. If you have a blank page, you can write anything on it. It is blank, it can become anything, it is completely malleable. However, once you begin writing on it, it loses that potential. The page then is forced to represent an idea, solve an equation, or otherwise complete what has been started. It can no longer be anything you imagine.

Just so I can tie this to software somehow, it is similar to an undefined variable; it can be anything, at least until you assign something to it.

Rather than clutter up empty space with some complex math proof, or some essay, why not sit back and think about all the things that could go into that space? Therefore, the answer of "nothing" gives way to all that could be written or displayed on that empty space, without actually committing to anything. It gives us a chance to do what we do so rarely in life:

pause,

and

think.

Of course, by typing this all out, it kind of goes against what I just discussed, so I'll end this here, but I'll leave something more important to think about:




















...

Thursday, August 04, 2005

Why testing doesn't occur in strongly-typed languages

This isn't meant to be an answer, or even truthful; it's more just something to think about. Again, it's just me ruminating on various floaties in my head. Maybe it's true, maybe it's not. I don't have a wide enough view of the landscape to know for certain, so it should not be taken as fact. Think of it more as dinner conversation, in a one-sided sorta way.

Anyway, I was thinking about testing in loosely-typed languages, and how at the last company I worked for, where I only wrote in loosely-typed languages (Perl), having tests written for our software was not only a Good Thing, but a necessity. There were many, many problems we caught via tests that we would not have caught until they bit us in a production environment.

However, most of the C++ and Java code I've seen recently doesn't have any tests, nor, given the complexity of the system, could you test everything. Even without testing, the software runs, perhaps better than I expected. High-quality software has been written without tests, so having tests is not a requirement for high-quality software, but that's not the point...

The point is more a comparison between strongly-typed and loosely-typed languages. With strong typing, the compiler does a lot of the checking work for you (even though in languages like C++ and Java you have to be explicit about the type, which I find revolting, personally), so many of the tests you might write are already covered by the compiler.

In comparison, with Perl, you won't find out if you can call a method until run-time. This forces you to create tests that confirm that your code can, indeed, call the methods you want to call on the objects you want to call them on.
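As a rough illustration of that kind of test, here is a minimal sketch using Test::More, which ships with Perl. The Sleigh class and its methods are invented for the example; the point is only that the test asks at run-time the question a static compiler would have answered up front.

```perl
use strict;
use warnings;

package Sleigh;    # hypothetical class, made up for this example

sub new     { my ($class) = @_; return bless {}, $class }
sub deliver { my ($self, $present) = @_; return "delivered $present" }

package main;

use Test::More tests => 3;

# Does the class really provide the methods the calling code relies on?
can_ok( 'Sleigh', qw(new deliver) );

my $sleigh = Sleigh->new;
isa_ok( $sleigh, 'Sleigh' );

# A misspelled method name is not a compile error in Perl;
# it would only blow up when that line actually runs.
ok( !$sleigh->can('delivr'), 'misspelled method is not available' );
```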

So, in effect, part of your test suite becomes a simple compiler, checking for certain things that strongly-typed languages have already checked. This poses a strange question: are you coders writing tests, or fixing a "broken" compiler? Is the compiler that broken to begin with?

Apparently, I'm not the first person to have thoughts like this (http://www.artima.com/intv/strongweak.html), but I'm no Guido van Rossum, even on my better days.

Can software "Get Big Fast"

I recently joined a certain company as a software engineer, and during one of the discussions I was having, someone mentioned that it was a certain mentality (Get Big Fast), that allowed this company to thrive. I was pondering, as I often do, about how one could apply that to developing a software product, and if that is a reasonable software design model.

Moving to a software development standpoint, this means that the focus of the software product should be meeting the customer's needs, and that all other concerns are secondary.

This means that things like adding new features, and cleaning up the interface, are more important than the condition of the underlying code. This does not mean that the underlying code is not important; it is just less important.

As a real-world example (the project on which I am attempting to use this strategy), say you have a system that parses FPS log files and generates static .html pages based on the data. This system isn't very large (about 6k lines of Perl), and it's already partially broken up into several objects (~15).

Say that you now wanted to track a particular type of event that only occurs in one or two types of FPS logs, but not in all of them. Not only that, but on the reports, you do not want to display information that is possibly not tracked by that logfile.

Our example is a headshot, where a player shoots another player in the head. This is only tracked in games that log what location a player was hit in; some do, while others do not.

The "correct" way would probably be to have a set of attributes that each logfile has, and then have the reports only publish those attributes. But there is nothing in the code that allows for this. Rather than create a larger scheme that will allow for future capabilities, you attempt to add that change in the fastest way possible. This allows you to get the new feature out to the customer faster, even though it may not be the best solution design-wise.
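For comparison, the attribute-based approach might look something like the sketch below. The game names, attribute names and the report_columns helper are all hypothetical; nothing like this exists in the project's code, which is exactly why the quick hack is tempting.

```perl
use strict;
use warnings;

# Hypothetical table: which attributes each game's log format tracks.
my %tracked = (
    game_a => { kills => 1, deaths => 1, headshots => 1 },
    game_b => { kills => 1, deaths => 1 },    # no hit locations in this log
);

# Return only the report columns that the given log format can fill in.
sub report_columns {
    my ($game, @wanted) = @_;
    my $attrs = $tracked{$game} or die "Unknown log format: $game";
    return grep { $attrs->{$_} } @wanted;
}

my @columns = qw(kills deaths headshots);
for my $game ( sort keys %tracked ) {
    print "$game: ", join( ', ', report_columns( $game, @columns ) ), "\n";
}
# prints:
# game_a: kills, deaths, headshots
# game_b: kills, deaths
```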

I believe that I can apply this thinking to an open source software project I am working on now (http://sf.net/projects/gamestats). I believe this software project is especially suited to this sort of philosophy. It's small enough that 1-2 people can wrap their heads around the entire project, and it's easy enough to test the entire system that it's possible to put out a release every day, given the time.

This sort of project can use the "Get Big Fast" design philosophy because you can keep track of all the little ugly 'hacks' you have put in. After you've gotten "Big", you can evaluate the code and refactor the appropriate methods.

It's a little along the lines of XP (http://extremeprogramming.org), but it advocates even less forethought, as code is secondary to customers. I do not know if this would work in a project of any real importance, but I believe MPlayer has adopted a similar strategy, where they do not concern themselves with the quality or the beauty of the code, and instead focus on adopting new formats as fast as possible.