Gosu CR IntelliJ Plugin (and Other Goodies)

Today, we’ve finally released our IntelliJ IDEA plugin for use with the Gosu Community Releases, along with a new release of Gosu itself.  A few open source projects using Gosu will also be releasing updated versions to work with the new Gosu and IntelliJ plugin releases.  The official Gosu CR announcement is over on the gosu-lang group:

http://groups.google.com/group/gosu-lang/browse_thread/thread/669d57dcb5885c8e?pli=1

While the official IntelliJ IDEA plugin announcement is on the new gosu-idea group:

http://groups.google.com/group/gosu-idea/browse_thread/thread/73c2ed8f9f8d6101

(Note that these releases are not part of, and won’t work with, any current Guidewire products; these are simply the standalone “community release” versions of the Gosu language and the plugin.)  I’ve been playing with the plugin for several weeks as part of getting Tosa ready for an associated release, and I can definitely say that it makes programming in Gosu a lot faster and a lot more fun.  So if you’ve been turned off by Gosu in the past due to the weak tooling support, especially if you’re used to Java IDEs, now’s the time to start checking it out!


An Apology for Agile: When to (Not) Use It and How to Make It Work

After reading this post about scrum yesterday, I went over to comment on the article on HackerNews, and was genuinely saddened by the overall negative tone of the comments there around scrum/agile. It seems like a fairly large percentage of people have only had negative experiences with “agile” processes and are either actively hostile to them or, at best, don’t see the point.

My overly-long comment response got eaten by what appeared to be a server timeout, but I think a blog post is a better explanation anyway. So here’s what you might consider an apology for agile process, wherein I’ll discuss what problems it solves, when not to use it, and how to avoid screwing it up if you do decide to use it.

In this discussion, I’ll refer to “agile” as a general thing, even though there are many variants on it. For purposes of this discussion, “agile” is a development process that involves timeboxed iterations of fixed length, work broken down into small-scale stories, and a product owner who generates (most of) the stories and decides on their priorities. That’s a simplification, and there are many variants on that theme, but for purposes of this post that’s what I mean.

We’ve All Got Problems, Right?

Anyone writing code is, kind of by definition, following some sort of process, even if it’s not an explicit process: your “process” might consist of writing code, testing a few things, deploying your website, and then repeating the cycle. So there’s always some current or default process that you’re following, and the only reason you should ever consider switching processes to agile (or anything else) is to solve a problem. If everything’s working great for you, then stop reading right now: keep doing what you’re doing!

The analogy that comes to mind here is, oddly, barefoot running. I struggled for years with shin splints and knee problems to the point where I could run, at most, once a week, on a trail, or else I’d get horribly injured. Eventually I tried barefoot-ish running (in those weird toe shoes) as a way to try to avoid those injuries and, for me, it’s worked wonders. There are people out there who evangelize barefoot running as if everyone should do it no matter what, because it’s better for you or “more natural” and so forth, but you know what? If you’re a runner and you don’t have recurring knee or lower leg injuries and you’re happy running, then keep doing whatever it is you’re doing! It’s working! The last thing you want to do is fix something that’s not broken. If you get shin splints and knee pain, by all means, try the barefoot thing; it might help. But if it ain’t broke, don’t fix it.

A development process is like that: if what you’re doing is working, keep doing it. If someone tries to tell you there’s One True Way to develop software and that if you’re not doing full-time pairing with test-driven development using story cards and iterations and daily standups . . . well, just ignore them, and be content in the knowledge that if that guy is your competitor, your business is going to do just fine.

What Agile Does

So what problems does agile solve? Primarily, agile solves problems that arise due to the interaction between developers and product owners. If you’ve got a division of labor where one person or set of people is responsible for defining what the product should do and prioritizing the features, while other people are responsible for the actual implementation, then you’re likely to run into these problems; if there’s no such division, then you’re much less likely to run into these problems, and agile is likely to be far less helpful. I can think of seven specific problems that agile helps address. Don’t have these problems? Then agile isn’t going to help you.

The first problem is killing developer productivity by constantly shifting priorities and directions. The classic problem here is that a developer starts working on feature A, but gets interrupted because the product owner decides suddenly that feature B is more important. The next day, after talking with a prospective customer, the product owner decides that feature C is really the most important thing. As a result, the developer is left with a bunch of half-finished work, and everyone loses. The primary mechanism for addressing this in agile is the iteration/sprint, which is supposed to be a “time box” where priorities are adjusted only at the start, but not within the time box.

The second problem is an inability for product owners to make well-informed tradeoffs around priorities. For example, in order to decide which out of features A, B, C, and D to work on, it’s important that a product owner know how much work those features are relative to one another. D might be the most important individual feature, but if A, B, and C combined are as much work as D, then that combination of features might be more compelling. Without reasonably-accurate estimates, a product owner can’t make those tradeoffs. Agile attempts to address that with relatively-estimated, small-scale stories (which are ideally also fairly independent, but that’s often easier said than done).

The third problem is having too many unnecessary meetings and status checks. I realize this sounds odd given the number of meetings and the amount of ceremony that often accompany agile (standups, estimation meetings, acceptance meetings, retrospectives, demos . . .), but the theory behind the daily standup meeting is to give everyone who cares about the project’s status a well-known place to listen in, so they don’t bug people for status updates at random one-off times.

The fourth problem is an inability to accurately predict when a project will be done, primarily with the aim of either cutting scope or moving the deadline (or adding developers, which is always dicey), and of knowing as far ahead of time as possible that one of those will be necessary. If you’re doing continuous deployment, this probably matters a whole lot less. If you’re releasing packaged software with a hard ship date, it matters a lot more. Agile attempts to address this by empirically measuring the team’s “velocity” and then comparing that to the number of “points” of work left, which (in my experience) tends to work much, much better than just constantly estimating and re-estimating when things will be done. (More on this later, because it’s probably the most important bit.)

The fifth problem is frustrated developers due to poorly-defined features. I’ve been in situations where developers attempted to start work on a particular feature only to find that the product owner hadn’t really thought it through, and that tends to just lead to a bunch of frustration and wheel-spinning: at best you waste time while you wait for a hastily-conceived answer, at worst the developer just makes their own decisions about how things should work and manages to get it completely wrong. Agile attempts to address this problem via story generation and estimation; if you can’t estimate a story, or it seems way too big, it’s a pretty good sign it’s not well defined yet.

The sixth problem is the temptation to adjust the quality knob in order to meet a date target. This one is pretty self-explanatory, I’d imagine, to anyone who’s ever actually developed any software. Agile attempts to address this by getting a shared definition of “done-ness” up front, and then providing accurate information around progress such that other levers can be pulled instead.

Lastly, this isn’t so much a problem per se, but agile builds in time for reflection on the product and the process. The rhythm of iterations gives you natural points for retrospectives where you analyze what’s working and try to change what’s not.

Again, don’t have those problems? Then agile probably isn’t going to buy you much. Have those problems? Maybe it’ll help.

Let me also take this chance to say that development processes work best when they’re voluntarily adopted by the team in question in response to real problems that they want to address. When the developers themselves see the process as something they want, because it helps them do their work well, you’re much more likely to succeed than when the process is imposed on the developers by some outside agent to solve their problems. As a general rule, trying to get developers to do anything which they don’t perceive as helping them do their work is going to be a failure. Developers are happy and think they’re doing awesome work but product owners feel like they can’t prioritize and have no visibility? You’re in for some rough conversations if you’re a product owner or manager trying to impose a new process. Developers are frustrated by constantly changing priorities, vaguely-defined features, and constant nagging about when things will be done? They’ll likely be much more receptive to trying agile.

One problem that agile most definitely doesn’t solve is a dysfunctional organization. Agile evangelists sometimes spin it as a way to make a dysfunctional organization functional, which is precisely the wrong way to think about it: agile can help competent, well-intentioned people be more productive by improving the communication and information flow between different parties in the development process. If people are incompetent, if management and development are at odds on fundamental issues (management wants stuff done as fast as possible, developers don’t want to cut corners), or if the developers don’t trust the product owners to make prioritization decisions around features, agile isn’t going to solve any of those problems. Agile can perhaps help build back trust in an organization by allowing a team to be successful, but if the organizational prerequisites aren’t there, it’s not going to work. And, of course, agile is definitely not a guarantee of success; it’s just one thing that can, in the right situations, help make success more likely.

When Everything Looks Like A Flat-Head Screw

Allow me to make another poor analogy. A development process such as agile is a tool, so think of agile as a flat-head screwdriver. The general problem domain for flat-head screwdrivers is effectively “attaching things to other things,” but a flat-head screwdriver is only useful if you’re going to attach those things with flat-head screws. If you have a Phillips-head screw, you can maybe wedge a flat-head screwdriver in at an angle and try to use it, but if someone tells you to do it that way, you’re going to think they’re an idiot. If they ask you to pound a nail in with it, you’re really going to be annoyed. Sometimes you need a different type of screwdriver, sometimes you need a hammer, and sometimes you just need superglue. Just because you’re attaching two things together doesn’t imply that you need a flat-head screwdriver or that one will even be helpful. And if your only exposure to flat-head screwdrivers is in situations where you really need a hammer, you’re going to think that flat-head screwdrivers pretty much suck, and you’ll wonder why any idiot would ever want one.

Unfortunately, there are agile disciples who seem to omit all those little subtleties. “Flat-head screwdrivers are awesome for attaching things together!” they say. “If it doesn’t work for you, you must be doing it wrong.” Or maybe, just maybe, you really just need some glue, and a screwdriver isn’t going to help at all . . . The one-process-fits-all evangelists have really done a lot to harm the popular perception of agile, I’m afraid.

So when should you *not* use agile? Agile works best when you have a team of people working with a product owner in a known technology and problem domain on definable features that can be relatively estimated with reasonable reliability and built and delivered incrementally. It doesn’t work so hot for, well, anything else. If you’re prototyping something and it’s a small-scale effort (i.e. a few days), you can fit it into agile by doing a timeboxed spike, but for large-scale prototypes it’s inappropriate. If it’s a fundamentally difficult or new problem domain, where you can’t reliably predict even the relative difficulty of any task, agile isn’t going to work. If there are major non-linearities to the work such that things are highly unpredictable, agile isn’t going to work. For example, performance tuning doesn’t fit the model at all: you have no idea how long it’ll take to make something run 50% faster, or if it’s even possible, so writing a story card that says “Make process X run 50% faster” is pretty pointless since there’s no possible way to estimate it. If you’re doing a large-scale refactoring or re-architecting project that’s really an all-or-nothing thing, agile doesn’t work so well; estimates are likely to be unreliable, and you don’t have the option to cut scope by shipping some of those stories but not others. Hopefully you get the idea. The core of agile is really relative estimation of small-scale stories, and if that’s not possible then it’s not going to work, and things are going to break down pretty severely. In my experience agile also doesn’t deal well with large-scale architectural decisions; those have to be made outside the process, before it starts, or at some point during development you have to temporarily jettison agile as you re-architect things, then re-start your sprints.

Agile is neither useless nor a suicide pact. Use it when it works, do something else when it doesn’t, and use the built-in feedback mechanisms to adjust the process based on the problem domain. If you’ve got a bunch of screws to turn, use a screwdriver. If you’ve got a bunch of nails to bang in, by all means put the screwdriver down and go get a hammer instead. Don’t assume everyone else has flat-head screws to turn, but don’t assume that everyone else has nails either: there’s room in this world for all kinds of tools, and people using ones you don’t find useful are quite probably just solving a different set of problems than you are.

How To Not Screw Up Agile

Here at Guidewire we’ve done our fair share of experimenting, and I think we have a pretty good idea of what doesn’t work, as well as what can work in the right circumstances. And in my perspective on the world, the two non-obvious-and-easiest-to-screw-up aspects of agile that are critical to its success are relative estimation and agreement on “doneness.” We certainly didn’t do those at first, and I hear of lots of other teams that make the same mistakes, so I can only assume it’s an oversight that a lot of people make. The two things work together, and without them everything else in agile kind of falls apart.

Agreeing on doneness means deciding ahead of time what it means for a story to be finished. Sometimes people call this “done done,” since it’s not uncommon for a developer to say something like, “Oh yeah, the FooBar widget is done, I just need to write the tests for it,” with the implication that merely being done doesn’t actually mean work has halted, so it’s not really finished until it’s really truly done done.

Anyway, it’s absolutely critical that doneness be defined up front and that everyone, product owner and developers and QA and anyone else involved agree on it. Does it mean that unit tests are written? That QA has signed off and all identified bugs fixed? That code documentation was added? That it was code reviewed? That it was merged into the main code line? That customer-facing documentation was added? It doesn’t matter what the answers are, it just matters that there are answers that everyone agrees to. (Obviously, though, you want your answers to correlate with the condition the code needs to be in for you to ship it.)

Relative estimation means that stories are estimated in a relative fashion, relative to other stories, rather than in some absolute measure of days. That’s generally done in terms of “points,” with all 1-point stories being about the same size, all 2-point stories being roughly twice as much work as the 1-point stories, and so on. A game like planning poker is often used to help the team converge on estimates to improve their accuracy, and accuracy usually improves over time as the team becomes more familiar with the problem domain. Those relative estimates are then mapped back to actual days by empirically tracking how many points’ worth of stories the team actually gets “done done” in a given period of time, known as the team’s “velocity.” Note that velocity is a team metric, not an individual metric: if you change the team composition, the velocity will change, and the team as a whole can be either greater than or less than the sum of its parts, depending on how well people work together. Also note that velocity is likely to bounce around a bit, especially early in a project, so in practice you’ll often use something like the running average of the team’s velocity for planning purposes. Not perfect, because this is software development that we’re talking about and it’s inherently unpredictable, but it’s far better than anything else I’ve ever seen anyone try.

Relative estimation is much easier to do reliably than absolute estimation; not 100% reliably, of course, but more reliably. Absolute estimation requires a developer to take too many things into account: development time, test-writing time, documentation time, bug-fixing time, the probabilistic chance that something will blow up and go horribly wrong, even general overhead from meetings or other interruptions. It even requires you to take into account individual differences, since Alice might do a story in 1 day that takes Bob 2 days. Taking all of that into account is really hard to do, as it turns out. Rather than saying something will take “2 days,” which might mean “2 days of uninterrupted work if everything goes smoothly and nothing blows up and I only write a few tests,” it’s much easier to say “this is 2 points because I think it’s about twice as much work as that thing I said was 1 point.” You don’t have to estimate your overhead, or how much time you have to spend on development versus testing, you just measure it. You don’t even have to take who does the work into account, so long as both Alice and Bob take twice as long to finish 2-point stories as they do to finish 1-point stories; Alice could work twice as fast as Bob, and the math still all works out, because you’re measuring overall team velocity, not individual velocity. Maybe your 4 developers get 40 points of work done in 10 days, and maybe they get 8 points of work done; it doesn’t really matter, so long as the estimates are about right relative to each other. That’s how fast you’re working, so now you can start to get an idea of how long the project will take, and make decisions accordingly.
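
To make that arithmetic concrete, here’s a minimal Java sketch of the velocity bookkeeping described above. The class name and every number in it are made up purely for illustration; the point is just that the mapping from points to calendar time is measured, not estimated.

public class VelocityProjection {
  public static void main(String[] args) {
    // Points that actually got "done done" in each of the last few iterations (made-up numbers).
    int[] completedPoints = {14, 18, 16, 20};

    int total = 0;
    for (int points : completedPoints) {
      total += points;
    }
    // Running average velocity: 68 points / 4 iterations = 17 points per iteration.
    double averageVelocity = (double) total / completedPoints.length;

    // Sum of the relative estimates for the stories still in the backlog.
    int remainingPoints = 170;

    // Projected iterations remaining at the current rate: 170 / 17 = 10.
    double iterationsLeft = remainingPoints / averageVelocity;

    System.out.println("Average velocity: " + averageVelocity);
    System.out.println("Projected iterations remaining: " + iterationsLeft);
  }
}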

The other crucial advantage is that relative estimation doesn’t pressure people into rushing things to meet the time estimate they gave. As a developer, if you say something is 2 days of work, and you’ve been working on it for 4 days, it’s very, very tempting to just call it done and move on; the psychological pull is pretty strong there. If you said it’s 2 points of work, and the points to days mapping is computed and somewhat variable anyway, it’s much easier to just keep working until it’s done. Maybe our velocity will be lower this sprint as a result, or maybe it was a poor estimate and it was more work than I thought, or maybe that’s just how long 2-point stories take. It’s much easier to just keep working until it’s “done done” if you don’t give a fixed date estimate up front.

I really can’t over-emphasize how important these two things are. If you don’t do these, you’re likely not going to have a good experience with agile. Here’s kind of how things tend to break down.

Say you don’t agree on what “done done” means up front. Suddenly, everyone is tempted to adjust the quality knob when the going gets rough, which is exactly what no one really wants. (Note: if that is in fact what management wants, you have deeper organizational problems which agile isn’t going to fix.) Or perhaps you count stories as “done” which aren’t really “done done,” which gives you an inflated velocity for the initial stages of the project, so maybe you think you’re getting 20 points of work done an iteration when in reality you can only get 14 done but you’re fudging the numbers. Now you have two problems: you’ve got 6 points of unscheduled off-the-grid work lurking in the future *and* you’ve got everyone making their plans based on the team doing 20 points of work per iteration, which completely throws off everyone’s ability to plan and prioritize.

Or say you try to do absolute estimation instead of relative estimation. Now there’s pressure to cut corners to meet the estimates, and the estimates themselves are wildly inaccurate, but no one is really sure how inaccurate they are because you’re not really rigorous about tracking it or making sure that things are “done”, so once again your ability to measure your rate of progress is gone.

Once you lose that visibility into the real rate of progress, not only does it throw off your ability to plan your development schedule and cut scope/shift dates/take other action as necessary, but it starts to make people nervous. Product owners know that they don’t really know how things are going, so they start bothering developers outside of standup meetings and changing priorities mid-iteration to try to exert some control over the process, or they make hasty decisions to try to course-correct, or they just freak out and kill the project because they have no idea if it’ll be done on time and on budget or if it’ll come in 6 months late and 200% over budget, and if it comes in 200% over budget they’ll get fired.

So if you take away just one thing from this ramble of a blog post, let it be this: relative estimation and agreement on doneness are absolutely critical to the success of the agile process.

Summing It Up

So that’s what I’ve got for you this time around. In summary: agile is useful in certain circumstances, it solves specific problems that you may or may not have, it’s only worth trying if you do in fact have those problems and if it seems like a good match to your problem domain, and relative estimation and “done doneness” are essential to the success of the process.


Thoughts on Performance Tuning, Part 1: Making It Faster Versus Not Doing It At All

Performance tuning, as I’ll define it, is the art of making a program run faster. (Note that pure performance tuning is different from scalability as a general practice, which could also involve things like parallelization to take advantage of more hardware). It’s an area that I’ve worked in a fair amount over the years, and it’s a large enough topic that it can’t really be covered in a single blog post, so I’ve decided to do a semi-regular series of posts about various aspects of it. I should perhaps add the caveat that I can’t claim to be a serious expert on all aspects of performance tuning, so by no means will this series be an exhaustive catalog of all the ways to go about it; there’s simply a huge amount of depth to the subject. Hopefully, however, it will serve as a useful introduction and give some insight into how at least one person thinks about the subject.

At the end of the day, performance tuning comes down to one thing: accomplishing the same end goal (as far as the client is concerned) while using fewer resources: fewer CPU instructions, less memory, fewer disk accesses. But it’s often helpful to break that problem down further into two related sides of one coin. For any given part of the program that looks problematic, you have two high-level options: you can make that part faster, or you can avoid doing it at all. For example, if your profiler tells you that a lot of your application’s time is spent in a function called writePageHeader(), you fundamentally have two options: you can try to make writePageHeader() faster, by optimizing the method or its downstream components, or you can try to avoid calling it, perhaps by caching the output when possible. Even though technically they amount to the same overall thing, i.e. doing less stuff, it’s helpful to think about the problem from those two different directions. What you decide to target will determine how you look at the problem: making writePageHeader() faster might mean eliminating calls to some downstream writePageElement() function, so depending on where you’re looking you might either be “making something faster” (the writePageHeader() function) or “not doing something” (the writePageElement() function).
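
As a rough illustration of the “avoid calling it” option, here’s a hedged Java sketch. The writePageHeader() function is the hypothetical hot spot from the example above, and the assumption that its output varies only by locale is invented purely to show the caching idea.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PageRenderer {
  // Cache keyed by whatever the header output actually varies by (assumed here to be locale).
  private final Map<String, String> headerCache = new ConcurrentHashMap<String, String>();

  public String renderPage(String locale, String body) {
    String header = headerCache.get(locale);
    if (header == null) {
      // Only pay for the expensive call the first time we see this locale.
      header = writePageHeader(locale);
      headerCache.put(locale, header);
    }
    return header + body;
  }

  private String writePageHeader(String locale) {
    // Stand-in for the expensive markup generation the profiler flagged.
    return "<header lang=\"" + locale + "\">...</header>";
  }
}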

There are a lot of well-known techniques to use in both camps. When it comes to making something faster, you tend to look at familiar things like algorithmic complexity, for example using sorting or hashing to turn an O(N^2) algorithm into an O(N log N) or O(N) one. There are also other optimizations that make a big real-world difference but don’t show up in big-O analysis; reducing constant factors is often a critical part of optimization, for example by storing the results of a function in a local variable rather than calling it three different times inline, or by conditionally skipping some parts of the computation. And there are other, even stranger optimizations you can make, which I’ll likely talk about in a later post, which often depend on the interaction between your program, the compiler, and the hardware you’re running on: for example, re-arranging functions so the compiler or JIT can inline them, or so that memory accesses run along cache lines instead of across them. There’s a nearly infinite number of techniques to learn, and many of them are platform-dependent.
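
Here’s a tiny Java sketch of the constant-factor point about hoisting a repeated call into a local variable; lookupPrice() and shippingFor() are invented stand-ins for any call that’s cheap enough to repeat casually but expensive enough to matter.

public class OrderTotals {
  private static final double TAX_RATE = 0.08;

  // Before: the same potentially expensive call is made three times inline.
  public double totalSlow(String item) {
    return lookupPrice(item) + lookupPrice(item) * TAX_RATE + shippingFor(lookupPrice(item));
  }

  // After: call it once and keep the result in a local; same big-O, smaller constant factor.
  public double totalFast(String item) {
    double price = lookupPrice(item);
    return price + price * TAX_RATE + shippingFor(price);
  }

  private double lookupPrice(String item) {
    return 42.0; // pretend this hits a database or does real work
  }

  private double shippingFor(double price) {
    return price > 50 ? 0.0 : 5.0;
  }
}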

There’s a similarly large array of techniques when it comes to attempting to avoid doing something, though often times the answer comes down to caching something, either trivially (such as in a local variable as mentioned above) or in a more complex manner (for example by caching page fragments so you don’t have to regenerate them). Lazy computation can also help sometimes, if certain uses of the application or code paths will ultimately avoid triggering the computation at all, or will delay it usefully (perhaps amortizing the cost and reducing latency). Restructuring upstream code in certain ways can also be used to eliminate calls (that two sides of the same coin thing again). Occasionally, you might have to take more drastic measures by simply eliminating some kinds of functionality; for example, if your application colors text red when a certain condition is met, and evaluating that condition is taking up half your page rendering time, you might decide that the functionality simply isn’t worth the cost, or you might show the information in a tooltip when the user hovers over the area, delaying computation until that point and making the initial page render faster.
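
A minimal sketch of the lazy-computation idea, assuming a hypothetical Report object whose summary is expensive to build and not needed on every code path:

public class Report {
  private final String rawData;
  private String summary; // built on first access, then reused

  public Report(String rawData) {
    this.rawData = rawData;
  }

  public String getSummary() {
    if (summary == null) {
      // Only runs if some code path actually asks for the summary, and only the first time.
      summary = expensiveSummarize(rawData);
    }
    return summary;
  }

  private String expensiveSummarize(String data) {
    // Stand-in for the costly computation we're trying to avoid or defer.
    return "summary of " + data.length() + " characters";
  }
}

Code paths that never touch the summary never pay for it, and the ones that do pay at the point of use rather than up front.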

This isn’t intended to be anything close to an exhaustive list of techniques; the more of them that you know, though, the more options you’ll have when faced with a problem.

Ultimately, being successful when doing performance tuning requires the ability to be flexible in how you approach the problem, and to look at it from many different perspectives. Perhaps your profiler shows that writePageHeader() is a hotspot, but you can’t seem to figure out how to change that method and make it any faster. So instead, you start trying to figure out ways to just not call the function at all. If you’re unsuccessful there, you can try moving down the stack, and find the core methods writePageHeader() calls and squeeze some extra performance out of them instead. If that’s not good enough, you move way up the stack, back to your overall renderResponse() method, to see if there’s anything at the higher level, six calls up the chain from writePageHeader(), that can be changed to do some really high level page fragment caching. If that doesn’t work out, perhaps you decide that writePageHeader() isn’t going to get any faster right now, so you move on to a different hot spot to see if you can make progress there. Performance tuning really requires understanding multiple different perspectives of the same thing, even though you can only view the world through one lens at a time. In some ways, your job is akin to what the cubist painters were trying to do: the mental juxtaposition of a large number of normally-exclusive views on things to help reveal some larger truth about the relations between pieces. The key, then, is to have as many different perspectives as you can, to have tools at your disposal to help you get those perspectives, to know what techniques are useful for each perspective, and then to be able to shift perspectives freely whenever you get blocked.


OO Programming, instanceof, and Separation of Concerns

I know it’s been a while since I (or anyone else) posted here, but that’s not due to a lack of activity internally. We’re working hard on getting the latest versions of our products ready, and we’re still pushing out relatively regular point releases of Gosu. Hopefully in 2011 I’ll find the time to write a little more regularly.

There’s a fairly common aversion in the Object Oriented Programming community to code that uses the Java “instanceof” operator (or its equivalent in another language), and the aversion is reasonably well-founded: the instanceof operator often indicates a failure of encapsulation. For example, if you ran across code like this:

public void draw(Shape s) {
  if (s instanceof Circle) {
    drawCircle((Circle) s);  
  } else if (s instanceof Rectangle) {
    drawRectangle((Rectangle) s);
  } else {
    throw new IllegalArgumentException();
  }
}

it might be reasonable to say that you should instead move the draw() method onto the Shape class and have it implemented differently for different shapes. That has a number of benefits. First of all, it localizes the logic along with the class that’s going to be drawn, and presumably the author of the Shape in question, having implemented the class, knows how they intended for it to be drawn. Secondly, without that kind of encapsulation the author of the draw(Shape) method will need to update the method every time a new Shape is added, which means that two different parts of the system are now tightly coupled, and they’re coupled in a way that’s not easily discoverable. Third, it makes it clear to a new Shape author what’s required of them: if there’s an abstract method on the superclass (or if Shape is an interface), it’ll be clear that a draw() method needs to be implemented.
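
For comparison, here’s roughly what that encapsulated version might look like; the choice of an abstract class over an interface, and the exact signatures, are just one plausible arrangement rather than the only one.

public abstract class Shape {
  public abstract void draw();
}

class Circle extends Shape {
  public void draw() {
    // circle-specific drawing logic lives with the Circle class
  }
}

class Rectangle extends Shape {
  public void draw() {
    // rectangle-specific drawing logic lives with the Rectangle class
  }
}

Adding a new Shape now means implementing draw() on it, and the central if/else chain (along with its IllegalArgumentException fallback) goes away.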

That’s a fundamental enough principle of good OO design that some people tend to take it a little too far and start suggesting that all such if/else branches using instanceof are a bad idea, or even that all if statements themselves are somehow a failure of encapsulation. Unfortunately, that tendency can sometimes result in a cure that’s even worse than the disease: it can lead to a massive violation of the separation of concerns.

Separation of concerns refers to keeping distinct pieces of functionality within a given system as separated from one another as possible. (See http://en.wikipedia.org/wiki/Separation_of_concerns). In order to control the complexity of a larger system, it’s essential that the system be divided into pieces that interact and overlap as little as possible so that they can be worked on and understood independently. Taking OO programming too far, then, can violate separation of concerns by taking code that properly belongs to one concern and jamming it into an area that’s part of another concern.

For example, suppose that instead of dealing with drawing shapes, the Shape objects that we’re getting were originally intended to represent elements being drawn in a client-side application, and now we want to add an alternative rendering engine that outputs Javascript. The OO-zealot way to do that might be to add in a method to each shape like:

drawUsingJS()

In this case, however, the Javascript-drawing layer is a separate concern from the Shape layer that is focused around constructing and manipulating shapes. Attempting to combine the two by pushing Javascript-specific logic up into the Shape classes would likely lead to an engineering disaster: the parts of the system would become tightly coupled, and if the Javascript team decided that they wanted to move to a different base Javascript library, everyone on the Shape team would have to rewrite their code and a huge amount of cross-team communication and coordination would be required. Instead, the Javascript team should logically be considered to be downstream of the Shape team and a client of their APIs (i.e. what methods are publicly exposed on the Shape objects), leaving them free to use those APIs in whatever way they need to in order to implement their functionality.

Of course, that could well leave them writing code that looks like:

public void drawUsingJS(Shape s) {
  if (s instanceof Circle) {
    drawCircleUsingJS((Circle) s);  
  } else if (s instanceof Rectangle) {
    drawRectangleUsingJS((Rectangle) s);
  } else {
    throw new IllegalArgumentException();
  }
}

which is exactly the sort of code that OO zealots (or the anti-if-statement crowd) will tell you is horrible. That’s not the only implementation option, of course. You could choose to create JS wrapper objects around each Shape class that themselves have a drawUsingJS() method on them . . . but then the construction of those wrapper classes might end up as a bunch of if/else/instanceof calls (unless you want to do it reflectively, which is a reasonable alternative but can lead to its own sorts of problems). You could also use double dispatch here, with a drawUsingJS() method on Shape that takes a JSDrawingContext object which has methods on it like drawCircle, drawRectangle, etc. . . . but then you end up in a situation where the two components are coupled to each other, and the Shape library has to be aware of the JS-drawing library, rather than just having a one-way dependency.
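
To make the double-dispatch option concrete, here’s one plausible sketch (the interface and method names are taken from the description above, but the details are invented). Notice how Shape now has to know that a JSDrawingContext exists, which is exactly the two-way coupling I’m talking about:

public interface JSDrawingContext {
  void drawCircle(Circle c);
  void drawRectangle(Rectangle r);
}

abstract class Shape {
  public abstract void drawUsingJS(JSDrawingContext ctx);
}

class Circle extends Shape {
  public void drawUsingJS(JSDrawingContext ctx) {
    ctx.drawCircle(this); // second dispatch: the context supplies the JS-specific rendering
  }
}

class Rectangle extends Shape {
  public void drawUsingJS(JSDrawingContext ctx) {
    ctx.drawRectangle(this);
  }
}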

Plenty of other patterns and options exist for handling that kind of implementation problem, but my point is not to argue for any of them. Rather, my point is that there’s no easy answer, and you’re going to have to make a tradeoff: on the one hand, you can use traditional OO encapsulation techniques that tightly couple the libraries together and violate the separation of concerns; on the other, you can give up the benefits of OO encapsulation while keeping concerns separate and ensuring that the dependencies in the graph flow in only one direction. In some circumstances, coupling things will be the right choice, and at other times keeping them loosely coupled will be the right one.

This is just yet another instance of the general rule that no particular programming approach or rule is itself infallible: you often have to trade off one sort of a problem for another and decide which sort of non-ideal state is less harmful. The task of managing a large software system is often reducible to the task of managing (and containing) a huge amount of complexity, and managing complexity is itself an exercise in making tradeoffs given imperfect information. And the fallacy with being too much of an OO zealot is not so much in making bad tradeoffs as it is in failing to acknowledge that there’s a tradeoff to be made, and that sometimes the balance is going to tip away from what might otherwise be seen as a “best practice” in order to avoid doing something even more damaging.


One Language to Rule Them All: The Importance of Scaling Up and Scaling Down

As anyone that’s programmed in Java can attest, Java does a horrible job of scaling down to small projects. Part of that is simply due to the Java syntax and standard libraries: there’s no way to write a little code to solve a little problem. If you want to write a program that does something simple like looping through every line in a file and counting the number of times a particular word appears, good luck doing it in only a few lines of code. But even if those problems get fixed, Java has an additional problem, which is simply that it’s a compiled, statically-typed language, which necessitates a toolchain that’s simply unnecessary for dynamic languages that can execute as scripts. You can’t just write some .java files and then say “java MyScript.java” and watch it execute: you have to compile them down to Java classes, which means making decisions about where to store the .class files and how to make sure they get loaded in to your classpath at runtime. And of course, to do anything interesting in Java you’ll probably need to download some third-party libraries and figure out where to put them and how to get them into your classpath, and the Java syntactic overhead is really only manageable with an IDE . . . so basically, doing just about anything in Java requires installing an IDE, deciding on your project layout, creating a project for your IDE with the right path settings, maybe writing some build scripts to do the compilation, and probably writing some other scripts to actually run your program with the right classpath arguments; the overhead of doing that is only worth it if you’re writing at least a few thousand lines of code. Once you have the environment and build scripts set up the environmental issues are less of a pain, but they represent a fairly large barrier to entry for small-scale projects where the development time is measured in hours rather than months or years. The same set of problems holds, to a greater or lesser extent, for pretty much any other compiled language like C or C++: the extra overhead of creating build scripts, figuring out where to store compilation artifacts, and figuring out how to link in additional libraries is a serious pain (though at least on Linux you have things like a default path for libraries that make compiling a “hello world” program in C without a makefile a little less painful).
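
To make that concrete, here’s roughly what the word-counting example looks like in the Java of this era (a sketch, obviously: the file name and the word to count are assumed to come from the command line). Even the short version needs a class, a main method, checked-exception plumbing, and stream cleanup, and you still have to compile it before you can run it.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class WordCount {
  public static void main(String[] args) throws IOException {
    String file = args[0];
    String word = args[1];
    int count = 0;
    BufferedReader reader = new BufferedReader(new FileReader(file));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        // Count occurrences of the word on each line.
        for (String token : line.split("\\s+")) {
          if (token.equals(word)) {
            count++;
          }
        }
      }
    } finally {
      reader.close();
    }
    System.out.println(count);
  }
}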

Dynamic languages, on the other hand, scale down very well to those small-scale projects: just write your hello-world.rb file and run “ruby hello-world.rb” and you’re done. How long did that take? Ten seconds? And since most mature dynamic languages have some sort of built-in packaging scheme like Ruby gems or Python eggs, for simple scripts you can easily pull in any libraries you’ve already got on your system, and you can easily pull in new ones without worrying about build scripts and classpath shenanigans.

The ability to scale down is one reason why we think it’s important that Gosu has first-class programs, which can have an embedded classpath directive in them, so that you can use Gosu for simple one-off scripts and not just for larger projects. It’s also why we ship a usable out-of-the-box editor that can do things like code completion, error highlighting, and other basic stuff, so that you can write simple scripts and explore the language without needing additional tools (we’re working on making it better). One thing we don’t yet have an answer for is the default packaging, installation, and usage of Gosu libraries; that’s definitely something to handle better in the future.

That’s the scaling down story. But what does it mean to scale up? Scaling up means handling projects with more developers, more lines of code, more features, more versions, and more customers. This is where things get a little more contentious: fans of static typing, like myself, tend to argue that statically typed languages scale up to larger projects better than dynamically typed languages. Why? First of all, static typing helps when coordinating across large numbers of people (both concurrently and through time) by providing more up-front error checking and discoverability: if you want to add another argument to the foo() method, static typing makes it easier for tools to identify all usages of that method, and if you change it in an incompatible way, there’s a good chance the program won’t compile. It’s not foolproof, and it’s not nearly as useful to someone who understands the entire code base, but it’s a very valuable tool when the code base is large enough that people often have to make changes to code that will in turn affect code they’re not even aware of. It also makes it easier to discover how someone else’s code works, thanks to things like auto-completion, reference-traversals, usage finding, and other such tools that are enabled by static types. Secondly, static typing makes it easier to refactor and clean up large code bases, since it makes it possible for tools like IDEs to make those changes safely and automatically. If I want to rename the MyClass.getName() method to getShortName(), my IDE can do that while only changing references to MyClass.getName() and not references to OtherClass.getName(); if there are only 5k lines of code and there’s only one method named getName() that doesn’t matter so much, but if there are 500k lines of code it’s more likely that there are several methods with that same name, and static typing makes it possible to differentiate between their uses statically. Lastly, static typing makes it easier to create hard APIs: using a construct like a Java interface makes it clear what the contract is that a particular class will adhere to, which makes it easier to understand what sorts of changes will end up breaking the API and thus affect caller code and which changes can be safely made without changes to callers. Managing large, long-lived projects is highly dependent on your ability to properly modularize code into components that have well-defined APIs to other parts of the system. Defining those APIs tightly, and telling when they’re changing, is made much easier by static typing.

Now, none of those statements I made before are contention-free. The most common argument I hear is that dynamic languages are so much more efficient that you simply don’t have large projects with large numbers of people. Your 500k LOC Java project might turn into a 50k LOC Ruby project, and your team of 50 might instead become a team of 10, so many of the arguments about dealing with large code bases and large teams don’t apply. There’s some truth to that argument, and it’s definitely worth pointing out, but acting like it’s a discussion-ending trump card tends to display a certain amount of ignorance/arrogance around A) the scale of other people’s systems and B) the efficiency of code in a dynamic language versus what the best engineers can do even in a syntactically-crippled language like Java. So while it’s true that syntactically-powerful languages can help minimize the problem, and thus keep more projects below the threshold at which the project becomes so big that it can’t be done by a small team, that doesn’t mean that there simply aren’t any projects that are fundamentally big and complicated and require a lot of work and a lot of people. I also don’t personally lend much credence to arguments that refactoring can be done well in a dynamic language (I’ll believe it when I see it, and SmallTalk isn’t a counter-example: show me refactoring tools for Python or Ruby or Javascript working anywhere near as well as they work for Java and I’ll believe it).

That isn’t to say that there aren’t tradeoffs, or that static typing is always a win: you could reasonably argue that you think the benefits of static typing in terms of tools and API clarity aren’t worth the other associated costs relative to your preferred dynamically-typed language. My point isn’t to convince anyone that static typing is better than dynamic typing, but rather just to argue that static typing has certain benefits that are valuable when working on large projects.

The ideal language to me, then, is one that is able to both scale up and scale down. I want something that I can use to write one-off scripts to do simple things, that’s still fine for a 50k LOC program, and that still excels when I have 5 million lines of code. I have a pretty good memory, but my mental capacity is still limited, so I don’t want to have to learn a bunch of different languages that are each well-suited to a different task: I don’t want to constantly be trying to juggle a bunch of different syntax rules/execution ordering rules/libraries/toolsets in my head, I just want one set of tools I can set up and use for whatever I need to work on. To me, at least, that means something that has first-class scripts, reasonable default libraries, a concise syntax, static typing, and excellent tooling with support for automated refactoring. It would help if the performance is good enough to never be an issue, and if the language itself is relatively easy to learn (which is yet another highly-contentious metric). Right now, though, I don’t see such a language out there; it’s certainly the niche we’re hoping to aim Gosu at, though, so perhaps a few years from now we’ll be able to credibly say that we think it’s a good candidate for a language that can scale both up and down.


Why Gosu?

The most common question we’ve gotten following the release of Gosu as a programming language is pretty simple: Why? Why did you create your own language? Why not use an existing language like Scala, Groovy, Javascript, Clojure, Ruby, Python, C#, or basically anything else at all? Why does the world need yet-another programming language, especially one that doesn’t seem to have any ground-breaking features? Why should anyone care?

There are kind of two sides to the answer. The first is the Guidewire-specific, historical part of the story: we needed a language with particular characteristics to use for configuring our applications (statically typed, dynamically compilable, some metaprogramming capabilities, fairly simple syntax and easy learning-curve for people familiar with Java or other similar imperative languages), and at the time that we started working on Gosu (back in 2002) there wasn’t really anything close to what we needed. We ended up creating it almost accidentally out of necessity.

But of course, that reasoning applies only to us, at Guidewire, and why we need something that we couldn’t find off-the-shelf. Why should you, if you’re someone outside Guidewire, care about Gosu?

As we worked on Gosu, we started to realize that we could actually turn it into a language that we, the language authors, liked, and that there’s currently a vacuum in the programming language landscape for the sort of thing that we were creating: a fairly “simple” (I realize that’s a massively loaded term, but bear with me) language that’s statically typed, dynamically compilable, with some metaprogramming capabilities, syntactic sugar in the most needed places, and with language features like closures, type inference, and enhancements that address some of the most glaring deficiencies and pain points in Java. We want to build something that, at least eventually, is unequivocally better than Java: a language that retains all the strengths of Java but has strictly fewer weaknesses.

Why target Java as the baseline? Because Java is, these days, essentially the lowest common denominator language for a lot of people, especially within the business community. It’s also the first language a lot of people learn in school these days. I have plenty of issues with Java myself, but it does a lot of things right as well, in my opinion: it’s statically typed, and that static typing enables a huge array of tools for working with the Java language (i.e. excellent IDEs with refactoring support and error highlighting as you go), it can perform basically as fast as C or C++ for long-enough-running processes, it has garbage collection and a reasonable object-passing model (as compared to, say, worrying about pass-by-reference versus pass-by-value semantics) and a reasonable scoping model, the syntax is familiar enough to most other imperative languages that it’s not too much of a shock to transition from C or C++ or similar, and the lack of “powerful” language features also means that most people’s Java code looks pretty similar, so once you learn to read Java and understand the idioms you’re rarely thrown for a loop (with the glaring exception of generics).

Basically, Java manages to be a lowest common denominator language at this point that, while its lack of language features is really annoying, largely manages to avoid any absolute deal-breakers like poor performance, a lack of tools, or an inability to find (or train) people to program in it.

Now, that’s all speculation/opinion on my part as to why Java is where it is currently. You, the reader, are free to disagree with it. If you do agree in large part with that, though, it becomes clear what the imperative is for the next lowest-common-denominator language: it can’t screw up any of the things that Java got right, but it needs to improve on all the places where Java is weak.

So what are the deal-breakers to be avoided? The first is dynamic typing, in my opinion. Saying that is essentially flame-bait, I know, so I’m not trying to say that static typing is strictly better than dynamic typing, but merely that static typing enables certain things that people actually like about Java. Static typing enables all kinds of static verification, of course, which a lot of people find useful, especially on larger projects. When moving people between projects and forcing them to get up to speed on a new code base, for example, static typing can help make it obvious if you’re using a library even remotely correctly, or if your changes are going to clearly break someone else’s code. More important at this point, I think, is that static analysis enables amazing tooling: automatic error-detection as you type, auto-completion that actually lets you usefully understand and explore the code you’re working with (instead of, say, suggesting all functions anywhere with that name), automated refactoring, the ability to quickly navigate through the code base (for example by quickly going to the right definition of the “getName()” function you’re looking at, rather than one of the other 50 functions with that name somewhere in your code base).

Static typing also tends to be an important factor in execution speed, though dynamic languages are catching up there; you might argue that execution speed doesn’t matter anymore, but I’d argue that there’s always the possibility that it might matter in the future even if it doesn’t matter now, so at least in the business world people doing core systems work are often scared away by anything that they think has a performance disadvantage that might, at some point in the future, require them to purchase a bunch more hardware. You might disagree and attribute it to a failure of imagination on my part, but I find it hard to imagine the next LCD language being dynamically typed.

The second deal-breaker is what I’ll call “unfamiliarity.” There are lots and lots of people out there who know Java, or C, or C++, or C#, or even VisualBasic, and they’re so used to the standard imperative language syntax of those languages that something too foreign to that simply isn’t going to fly. It doesn’t matter how good the language is, or how expressive it is, something that doesn’t fit roughly into that mold simply won’t become the next LCD language, at least not any time soon.

The last deal-breaker is what I’ll call “complexity,” another obvious flame-bait term. Everyone’s got a different definition of the term, but here I’m equating it to roughly two related things: first of all, how hard it is to fully learn and understand all features of a language, and secondly, for any two programmers A and B, how different their code is likely to look and how easy it is for them to write code that the other one doesn’t understand. Again, I’m not trying to start a flame war, and opinions seem to vary greatly on the relative complexity of languages, so hopefully we can all at least agree that if a language includes monads, a concept which historically many people have struggled to understand, that ups the complexity bar a fair bit, while a language like Ruby that doesn’t have them is probably a bit “simpler.” Likewise, languages like C and C++ with explicit pointers and memory management are more complex than languages that abstract out those details. Languages like Python where there’s one standard way to do things are also less “complex” by this metric than languages like Perl, where there are multiple ways to do everything, not so much because the individual features of Perl are complex but simply because it’s generally easier for Python Programmer A to read Python Programmer B’s code than it is for Perl Programmer A to read Perl Programmer B’s code. (Again, that doesn’t mean you can’t write readable, awesome code in Perl, I’m just talking about the presumed statistical average difference between two people’s coding styles.)

So to sum that all up: my theory (and I think I speak for the other Gosu designers as well) is that the next LCD language will be statically typed, imperative with a familiar syntax, and will avoid, shall we say, “more advanced” language features.

So what can we add to Java to make it better? Well to start with we can add simple type inference with simple rules to make the language less verbose and make static typing impose less of a tax on your code. We can add in simple closures to make data structure manipulation or basic FP coding possible. We can add in enhancement methods so that people can improve APIs that they have to use without resorting to un-discoverable, ugly static util classes. We can add in first-class properties to avoid all the ugly get/set calls. We can add in syntactic sugar around things like creating lists and maps. We can add in dynamic compilation and first-class programs so that the language can scale down to be suitable for scripting. We can simplify the Java generics model and undo the travesty of wildcards. We can kill off the more-or-less failed concept of checked exceptions. And we can add in some metaprogramming capabilities, ideally in a way that’s relatively transparent to the clients of said capabilities so it doesn’t bump up the complexity factor too much.

If we do that, what we’re left with is a language that’s pretty good for most things, without too many (in my opinion, of course) glaring weaknesses: something fast, that allows for good tools, that scales down to small projects or up to big ones, that has enough expressiveness that you only have to write a little code to solve a little problem, that’s easy for programmers familiar with the existing LCD language to pick up, and that has enough metaprogramming capabilities to let you build good frameworks (because good frameworks, in my opinion, require metaprogramming . . . but that’s a different argument).

So that brings us full-circle back to our original question. Why Gosu instead of some existing language? Well, Java has all the flaws everyone knows and is annoyed about. C# is still controlled by Microsoft and isn’t truly cross-platform. C++ is too complex and easy to screw up, which is why so many people moved to Java. Python, Ruby, and Javascript are all dynamically typed and don’t really have good tools around them. (Frameworks and libraries? Definitely. IDEs that let you navigate and refactor? Not so much). Clojure is dynamically typed, lacks good tools, and any kind of Lisp is a bridge way too far for the LCD crowd. At this point any new language that’s not on the JVM or the CLR will have a massive uphill battle in terms of trying to get library adoption. Scala is statically typed and solves a lot of the problems with Java, but it’s also in my opinion a pretty “complex” language that many Java programmers would have a hard time fully understanding or leveraging. Groovy is perhaps the closest thing to what Gosu is, but it’s dynamically typed (with optional type annotations).

So merely in that sense, we think that Gosu fills a vacuum for a statically-typed JVM language that has a familiar syntax, doesn’t add too much complexity to Java, and which improves on Java in the most critical ways.

The last topic, which I haven’t really touched on at all, is around the metaprogramming allowed by Gosu’s type system. That’s worth another blog post in itself; the short version is that, since the type system in Gosu is pluggable, types can be generated or modified “on the fly” (i.e. really up front at parse time) to let you do things like take a WSDL and turn it into usable types that can be invoked just like any normal Gosu or Java class, but without having to do code generation. It’s the sort of thing that dynamic languages are incredibly good at but which, in a statically-typed language, has historically required reams of ugly code generation. There are other neat framework tricks you can do given that more runtime type information is available in Gosu than in Java or most other statically-typed languages. That’s what we really think will emerge, over time, as the killer feature of Gosu. For now, though, that’s less apparent, because it will only become a killer feature if people leverage it to create frameworks that other people want to use; it’s not the sort of thing you yourself will want to use every day, it’s something that you want other people to use when building the libraries and frameworks that you use every day.
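
For a flavor of what that looks like from the caller’s side, here’s a purely hypothetical sketch; the package, type, and member names below are invented, standing in for whatever a WSDL type loader would synthesize at parse time:

    // Hypothetical: example.webservices.StockQuoteService exists neither as
    // source nor as generated code -- a pluggable type loader would synthesize
    // it from the WSDL when this code is parsed, so the call below is still
    // statically checked and navigable in the IDE.
    var service = new example.webservices.StockQuoteService()
    var quote = service.GetQuote("ACME")
    print(quote.LastPrice)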

So there you have it. That’s my (overly-verbose, as usual) explanation for why we think Gosu has a place in a world that already has so many existing language options; you most certainly don’t have to agree with my reasoning or my arguments, but hopefully it’s at least now clear what they are and what we’re trying to do. We’re not trying to push the language envelope in new directions, or to come out with language features no one’s ever thought of before; we’re trying to pick from the best ideas already out there, wrap them up in a pragmatic language that we’ve tried to keep simple and easy to pick up, and create something that will be appealing and useful to the very large number of programmers in this world who just want something relatively familiar that makes their programs easier to write and maintain without having to give up too many of the things that they’re already used to and have come to rely on.


Gosu 0.7.0 Is Now Available

The title pretty much sums it up; the first publicly available version of the Gosu programming language is available for download at the main language site, http://gosu-lang.org/. We’re all pretty excited about this, so we’d love for everyone to try it out and let us know what you think. If you have questions you can contact us via the Gosu-lang Google group and you can report any bugs you find via our Google code group. Please, go give it a whirl!


My Ignite Silicon Valley 2 Talk on How to be Wrong

Last night I gave a talk at the Ignite Silicon Valley 2 event down at Hacker Dojo in Mountain View. Ignite is an interesting idea: everyone presenting gets exactly 5 minutes to deliver exactly 20 slides, and the slides are set to auto-advance every 15 seconds. It definitely keeps the talks moving along, but it certainly makes ad-libbing a lot harder, so I ended up writing out my entire talk and then more or less memorizing it. The talks were recorded, but unfortunately the first half of the talks (which included mine) had no audio. So instead I’ve posted the text of the actual talk that I wrote, which matches what I delivered pretty closely.

Here goes:

If there’s one thing I want everyone listening tonight to come away with, it’s this idea: correctness is a collaborative effort, not an individual one. Being seen to be right by others is not the same as actually getting the right answer, and it’s almost always the right answer that’s the important thing.

I’m an engineer by profession, and this talk is most applicable to endeavors like science and engineering, but many of the ideas apply equally well to other areas, like relationships, where the same truth holds: the “right” answer is most appropriately found through a collective effort.

Most of us, naturally, spend our lives trying to be right, and we hold our beliefs and opinions because we think they’re correct, not because we’ve chosen them randomly or capriciously. But everyone does this, even people we vehemently disagree with. They’re just as convinced that they’re right as we are.

When we’re trying to solve a problem or answer a question, then, we tend to have two subtly different options: we can try to convince everyone else that we’re right, or we can present our arguments and opinions in such a way that we try to move the group as a whole towards finding the correct answer, even if it’s not the one we ourselves proposed.

The first option, and what most of us do instinctively, is to try to convince other people that we’re right. I call that approach “being seen to be right,” because it has nothing to do with actual correctness, and everything to do with other people’s opinions. Trying to be seen to be right often involves tactics such as rhetorical tricks to make things sound better than they are.

Those include strategies like appeals to emotion, false analogies, misrepresentations of alternative positions, logical fallacies such as appeals to authority or ad hominem attacks on people holding other viewpoints, or simply trying to win an argument by sheer force of personality.

Truly committing to trying to find the right answer, in contrast, requires putting aside our egos, recognizing our own fallibility and biases, making our arguments as clearly as possible, and opening ourselves up to the possibility of being wrong.

We can often learn as much from an incorrect answer as we can from a correct one; science is advanced just as much by proving hypotheses incorrect as it is by experiments that confirm what we already think to be true. A few tips, then, on how to work towards the right answer through a collaborative effort.

Tip #1 is to avoid rhetoric. Avoid appeals to emotion, false analogies, and clever soundbites that oversimplify complex problems. Avoid logical fallacies like appeals to authority or ad hominem attacks.

Tip #2: Make it easy for someone else to pinpoint exactly where they disagree with you. Doing that involves clearly identifying your assumptions and facts as well as your reasoning and how you logically proceed from those assumptions and facts through to your conclusion.

Philosophy papers will often go to the length of giving numbers to individual statements that are supposed to follow logically from one another, so that if you disagree you can clearly state that you think assumption 2 is incorrect, or that point 3 doesn’t follow from 1 and 2. If someone disagrees with you, you want them to be able to pinpoint the exact points of disagreement, rather than just saying “I think you’re wrong.”

Tip #3: Be honest if you’re unsure about something. If you think an argument is weak, or if you’re not sure about a fact, say so. Doing so helps highlight the issues that are most in need of further discussion or enlightenment.

Tip #4: Anticipate and reason through criticisms of your argument and potential counter-proposals. Do it as objectively and as fairly as you can, and don’t gloss over them. Try to break your own argument to find where it’s weak, and legitimately try to adopt contrary viewpoints.

Tip #5: This is perhaps one of the hardest ones to do, but be willing to change your mind. You may find that someone else makes a convincing counter-argument, or you may find that if you honestly do your best to consider alternative viewpoints you like one of them better than your original opinion.

On a personal note, one of the hardest things for me to learn to do as a philosophy undergraduate was to throw out a paper that was 80% written when I found a counter-argument that I simply couldn’t refute. In that case I’d simply have to start over, building an argument that was completely counter to my original position.

Tip #6: Don’t bully people. Especially if you’re someone who’s used to being right, or someone in a position of authority or respect, it can be easy to steamroll people unintentionally. If people are overly-deferential to your opinions, you should do everything you can to make sure they feel like they can disagree with you.

Tip #7: Don’t take it personally. Everyone is wrong at times, and the smartest scientists, philosophers, or engineers you can think of have all been incredibly mistaken about very fundamental things. It happens. Being wrong doesn’t mean that there’s something wrong with you.

Tip #8: Avoid a culture of blame. If the overall culture of a group or organization is one where people are punished or blamed for any small kind of incorrectness, it will encourage people to pursue being seen to be right at the expense of actual correctness.

Tip #9: You can’t win if you don’t play. Even if you’re not 100% sure you’re right, if you’ve got an opinion about something, be willing to put it out there. You can still make a valuable contribution to a discussion without having the ultimate right answer.

Tip #10: Keep your eyes on the prize. The prize is almost certainly not simply having people think you’re smart. It’s more likely something like the overall advancement of human knowledge, the proper functioning of some system, or merely a happy relationship. Whatever it is, it’s a goal that’s best achieved collectively.


Bad Testing Idea #357: Automatic Suite Partitioning

Here at Guidewire we’ve been attempting to do automated developer testing of one sort or another for probably about 7 1/2 of the 8 years I’ve been here, and in that time we’ve come up with a lot of bad ideas. As it turns out, writing tests that can then be run and maintained for years across multiple major versions of a product is really, really hard. There are a bunch of different ways to fail: writing tests so narrowly that they don’t accurately test the application (i.e. the tests still pass even though the application is broken), writing tests that are incredibly fragile and require huge amounts of maintenance, writing tests that are non-deterministic, writing tests that run differently in your automated harness versus on a developer’s machine, writing tests that run too slowly . . . the list is endless, and we’re in a fairly continuous cycle of adjusting how we write tests as we learn more about what works and what doesn’t.

Today, though, I’d like to call out one particular bad idea we’ve had, in the hopes that I can discourage anyone else from ever trying it: automatically partitioning large, long-running test suites so they can easily run in parallel.

Those unit-test evangelists among you might scoff at the premise of the idea: “Why would you ever have suites that take that long to run?” you might ask. Well, given that this is the real world and all . . . stuff happens. When you step out of the world of true “unit” tests and move into integration testing, things start to slow down a bit . . . and when you start actually testing your UI, there’s really no hope. A suite of 5000 UI-level tests simply isn’t going to run in any reasonable amount of time, no matter what sort of technology you’re talking about. (If there is some technology that can test a web client, or a desktop client for that matter, with an average test running time of < 0.1s, someone please correct me . . . but anyone who’s ever used Selenium or SWTBot will probably be lucky if their tests execute in 1s per test on average).

As with all such ideas, this one started innocently enough. At first we didn’t have enough tests to need to split them up: we’d have a test suite for all the domain logic for an application, and a test suite for the UI, and they’d each take a few minutes to run. But as we added more tests, and more logic, the suites started to take longer and longer, so the logical thing was to split the tests up into suites so they could be run in parallel. So how did we go about doing that? Well, like the engineers that we are, we came up with an engineering solution: parallelization is something that should be done automatically by the framework, not something you should have to think about, right? So at first, we just created suites named things like CCServerTestSuite1, CCServerTestSuite2, through CCServerTestSuiteN, and the suite had some simple logic: find all the tests that could be in the suite, divide the number of classes by the number of suites, segment the test classes into N buckets, and then pick the appropriate bucket. So if there were 10 tests and 5 suites, Suite1 would run tests in classes 1 and 2, Suite2 would run tests in classes 3 and 4, and so on.
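
That bucketing scheme amounted to something like the following minimal sketch (this is not our actual harness code, and the names are made up):

    // SuitePartitioner.gs -- order the candidate test classes by name, chop
    // them into N contiguous buckets, and have suite K run bucket K.
    uses java.util.ArrayList
    uses java.util.List

    class SuitePartitioner {

      // suiteIndex is 1-based, matching CCServerTestSuite1..CCServerTestSuiteN
      static function testsForSuite(allTests : List<String>, suiteCount : int,
                                    suiteIndex : int) : List<String> {
        var ordered = allTests.sort()   // deterministic ordering by class name
        if (ordered.Count == 0) {
          return ordered
        }
        var bucketSize = java.lang.Math.ceil((ordered.Count as double) / suiteCount) as int
        var result = new ArrayList<String>()
        for (name in ordered index i) {
          if (i / bucketSize == suiteIndex - 1) {   // contiguous bucket for this suite
            result.add(name)
          }
        }
        return result
      }
    }

With 10 test classes and 5 suites, suite 1 gets classes 1 and 2, suite 2 gets classes 3 and 4, and so on, exactly as described above.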

Of course, that was annoying because we had to manually monitor the number of tests, the running times of the suites, and add suites as they got too slow. So we turned the crank once more and changed our test harness to itself automatically partition a suite into N pieces, where N could be dynamically determined based on the actual running time of the suites, with the number adjusted up or down to try to stick near a target suite running time.

With the benefit of hindsight, I can now confidently say that all of that was just a terrible, terrible idea. Now in an ideal world, where tests within a test suite had no chance of interacting, this wouldn’t be nearly as bad. After all, that’s how you’re supposed to write tests, right? Well, sure . . . but as always, reality intervenes, and it turns out that for certain classes of tests (those darn integration tests again), it’s hard to ensure there are no interactions. If you’re testing the search functionality of the UI, you’d better be sure you know what’s in the database prior to the test executing, and that no prior test in the suite has mucked things up. You’d also better be sure you don’t have any static variables or other shared state that gets modified by any of your tests. Again, having tests not interact with each other is Testing 101 sort of stuff, but in practice it can often be difficult to 100% ensure it doesn’t happen (aside from running each test in isolation), and when you do inevitably screw it up you won’t notice until you have the right combination of tests running in the right order in your test suite. As a result, you can have latent test interactions that only show up as tests shift and get re-ordered. To make matters even worse, our test suites have historically been fairly heterogeneous, meaning that the tests themselves require the system to be in different states prior to their execution, and the test framework is responsible for making the necessary system changes prior to the execution of the test . . . but that code is also not infallible. As a result, certain issues will show up depending on which tests execute first in a suite and thus perform the initial system setup.

So what happens when you add in automatic suite partitioning? Suppose again that we’ve got 10 tests, Test1 through Test10, and initially we split the suite into two partitions. Partition 1 includes Test1 through Test5, while partition 2 includes Test6 through Test10. Now suppose that you add two more tests; suddenly, partition 1 includes Test1 through Test6, while partition 2 contains Test7 through Test12. All of a sudden, Test6 now runs after five other tests, rather than as the first test in a suite. If Test6 interacts with any of the first five tests, that issue will only show up after it moves over to partition 1, and you’ll end up with a test break showing up in your continuous test harness that coincides with merely checking in additional tests. Given that people are adding, removing, and modifying tests all the time, that sort of shift of tests from one partition to another happens basically constantly in our test harness. In the absolute worst case, the new combination of tests results in some sort of memory leak, deadlock, or other problem that ends up killing the entire test suite partition rather than just resulting in a failed test.

There’s one other, less catastrophic problem with automatic test partitioning, which is that the partitions end up being lumpy in terms of their running time, rather than consistently even. Tests, especially integration and UI tests, can vary widely in terms of their running time, so merely ordering tests by name and then chopping them up into evenly-sized (in terms of number of classes) partitions doesn’t ensure that the partitions themselves will be even in terms of running time. It’s fairly common for our partitions, for example, to vary in running time between 5 and 35 minutes, simply because the automatic partition splits end up lumping together a bunch of slow-running tests. The testing turnaround time for a code branch is, naturally, bottlenecked by the slowest-running suite partition, so having lumpy suite execution time merely means that we spend more time waiting for tests to finish running prior to pushing or pulling a branch; not catastrophic, but certainly not ideal either.

So what’s the solution? Well, the first obvious solution is to make the tests run fast enough that you don’t need to partition them. That’s a whole lot easier to do if that’s an explicit design goal of your testing efforts from the start, and a whole lot harder to do if you’ve ignored test speed problems over the years because you assume you can just run the tests in parallel anyway. Barring that, I much prefer to explicitly group tests together in suites based on whatever sort of categorization makes sense (ideally functional area), explicitly controlling which tests run with which other tests so that the interactions between tests within a suite are at least stable (i.e. Test6 always runs before Test7, and never runs after Test5 because it’s always in a different suite) and so that the suites can be chopped up in a way that gives them a more consistent running time.
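
For contrast, a minimal sketch of the explicit approach (with made-up area and test class names) is just a hand-maintained mapping from functional area to test classes, so a test only changes suites, and therefore only changes which tests it runs alongside, when someone deliberately moves it:

    uses java.util.Map
    uses java.util.List

    class ExplicitSuites {

      // Hand-maintained suite membership, keyed by functional area. Lumpy
      // running times get fixed by deliberately moving slow tests around,
      // not by a rebalancing algorithm silently re-ordering everything.
      static var MEMBERSHIP : Map<String, List<String>> = {
        "ClaimSearch" -> {"ClaimSearchTest", "ClaimSearchFilterTest"},
        "PolicyAdmin" -> {"PolicyCreateTest", "PolicyRenewalTest"},
        "UserProfile" -> {"UserPreferencesTest", "UserPermissionsTest"}
      }
    }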

There’s one other possible approach worth mentioning, which is to completely isolate the tests somehow. If each test runs entirely on its own against a freshly-started server, a fresh VM, a fresh database, a fresh browser session, etc., there’s no chance of interaction between tests, and you can parallelize at the level of the individual test class. Unfortunately, doing that for integration tests that require a significant amount of one-time setup (i.e. starting up a server, initializing the database, etc.) in a way that’s performant in a world without infinite processing resources is . . . difficult. One of our developers has done some experiments internally around trying to make that happen using virtual machines, but we haven’t yet managed to develop the technique to the point where we can realistically do it for our entire test harness.

So for now, at least, we’re moving back to explicitly organizing suites and away from automatic test suite partitioning. It was a noble experiment, but one that I ultimately consider to have failed.


In Defense of the Open Plan Office Layout

Every so often some meme seems to fly around the internet (or at least around the parts that I tend to frequent) that explores some variant of “engineers need offices (or cubes) so they don’t get interrupted” or “physical proximity doesn’t matter.” Now, I’m a firm believer in the theory that if it works for you, keep doing it, and that different things work for different people, but it seems like plenty of people are not so charitable and assume that there’s one right way to do software development, and that what works for their team must be the right thing for everyone else as well.

Here at Guidewire, our development teams use an open plan layout, and we do it because it works for us. We’ve got clusters of desks all around our engineering floor, with developers, product managers, and QA all mixed together in small groups that we call “pods” which collectively form our application and platform teams. The company started with an open plan setup, and we’ve been anti-cube on the development side from the very beginning, even going so far as to pay to disassemble and store the cubes that were already set up in our current offices when we moved in. At first the open plan takes a bit of getting used to, but over the years I’ve really come to like it, and I have a hard time imagining working any other way.

In my opinion, communication is actually one of the hardest problems to solve in software development, at least once you have a team of any decent size where knowledge is spread across a number of individuals. That problem is magnified when you’re in an industry-specific business like we are, where the developers aren’t experts in the field the software is targeting. As a result, we have to rely on product managers and subject matter experts to make decisions about how features should work and what should be prioritized; while I might be able to make reasonable decisions for myself about how a feature of an e-mail client should work, since I use one all the time, I can’t on my own make reasonable decisions about how a policy administration system should function, and it’s important that I talk to someone who does know what to do. Furthermore, that communication channel needs to be high-bandwidth and constant: you can’t just decide on requirements up front and then go work for a month, because there are a million small decisions that need to be made every day and many of them should really be made by an expert and not arbitrarily by someone who isn’t an expert user of the system.

The communication between the other members of the development team is also critical, though. The QA team needs to know as much about the product as the product managers or the developers, and they need to be able to understand the customer perspective in order to properly exercise the features and to understand the design of the feature so they can know whether what’s happening is correct and intentional. They need to be able to ask questions of the product managers and developers as they go, and they need to be looped in as even small-scale decisions are being made. The developers also need to talk to each other on any reasonably-sized system in order to share knowledge about areas of the code, transmit best practices, and help get each other unstuck.

A lot of failures of software projects are failures of communication. With a team of highly competent engineers, the chance that any one person will do anything catastrophically stupid in isolation is pretty much 0. If I go off to implement a hash table, I can pretty well make sure that the thing works before I hand it off to other people. The far, far more likely causes of failure lie on the boundaries between people: things don’t get done because of a communication breakdown where two people each thought the other person was doing it, or the wrong features get built because the developer didn’t understand the use cases and product direction, or the team can’t expand enough because there’s not enough knowledge sharing, or the features themselves are right but the work wasn’t prioritized correctly and too much time was spent on relatively inconsequential things. Mitigating most of those risks requires optimizing for communication between the right parties, and part of ensuring that communication happens is setting up the right environment.

An open office plan is one way to optimize for that kind of communication. It’s not the only way to do so, of course, but if you have the luxury of physical proximity it can work well. Physical proximity has another benefit as well: you develop better relationships with the people around you. Cubes and offices can be very isolating, depending on the environment, and simply being around the other people on your team tends to lead to better working relationships, which both improves the work environment and makes communication even better. If you talk with people a little bit all day, asking questions and brainstorming and making jokes, then (assuming you like those people) it can lead to a really good work environment.

The most common concern I’ve heard about the open plan office is that it’s too easy to get distracted; that concern is usually then followed up with some reference to how it takes 15 minutes to achieve “flow” again after being interrupted, so if you’re interrupted just once per hour that kills 2 hours out of your daily productivity. Of course, with an open office with conversations going on all the time and the ease of asking questions of other people, we all must be getting interrupted all the time, right? How can we possibly get anything done? We must be killing ourselves! If only we all worked in splendid isolation in offices, with very rare, pre-arranged meetings to hash out details, we’d certainly be way more productive!

But the reality is, it doesn’t actually work out that way, at least not for me. It’s true that interrupting someone can remove them from “flow,” but not all interruptions are created equal, and not everyone is in a state of “flow” all the time anyway. But suppose that a five-minute conversation does result in a 15-minute loss of productivity for the person being interrupted: even then, it’s worth it if that conversation saves someone a couple of hours of fumbling around on their own (“don’t reinvent the wheel, I think Bob did something similar last month, so just go talk to him and see what he did”) or, even more dramatically, saves someone from several days of going off in the wrong direction on something. It’s very easy to feel productive when you’re just writing code, but it’s a poor measure of productivity if you’re doing stuff that doesn’t need to be done or doing it sub-optimally. Most people also eventually develop some strategy for dealing with the distractions: they wear headphones or earplugs much of the time, or they work remotely sometimes so they have fewer distractions. Techniques like pair programming also help reduce the impact of distractions; one member of the pair can answer questions while the other stays focused, and when the interrupted developer returns to the task at hand their partner can help them context switch back much more quickly.

I’m sure it’s also the case that some companies set up gigantic factory-like warehouse floors full of anonymous coder units that are conveniently herded together so they can be more easily lambasted en masse by some dictatorial manager . . . but that’s not exactly how we roll over here, and many companies like us choose open plans intentionally and thoughtfully because we really feel like it’s the best way for us to develop. Again, that’s not to say that this is the right thing for every team or every product, or that it’s the only way to do things. But it’s often a good way to do things, and it’s a well thought out way that’s intentionally set up to optimize for communication (and collegiality).

