Gosu’s Inconceivable non-ClassLoader (Take 1)

Two feature requests stand out on Gosu’s short list for our next milestone release. Primarily they stand out because the demand for them overshadows all other requests combined. And it happens that both features are blocked on what appears to be a virtual impossibility. To give you a hint as to their nature, if you’ve considered using Gosu in any of your current Java projects and chose not to, the absence of at least one of these features is probably why you passed over Gosu. If you’re thinking, “Hard to use Gosu from Java” or “No class files”, you’ve nailed it — as if you’re not thinking that.

Yep, Gosu currently does not produce Java class files on disk. There’s no command-line compiler; instead, the runtime dynamically compiles source to bytecode and defines Java classes directly from memory. While there are benefits to doing it this way, there are significant drawbacks to consider. Performance is the obvious one. The extra time it takes to compile Gosu may impact startup time and introduce lag the first time an application’s features are used. Web applications can mitigate the issue by “warming up” the type system, preloading critical parts before users log in. Unfortunately, this strategy won’t work for other kinds of applications like scripts and command-line tools, where startup time can be crucial.

Closely related to the problem of not having class files is basic Java interoperability. Gosu interoperates seamlessly with Java… if Java is being used inside Gosu. Turn that around, however, and we have a lot of room for improvement; specifically Gosu classes aren’t directly usable inside Java. I’ll go into detail about this in a bit, but basically the core problem deals with the fact that Java’s class loader has no knowledge of Gosu’s class loader. This is the nature of class loaders in general where a Java application loader is oblivious of custom loaders created after the application initializes. Consequently we have no way of instructing a Java app loader to delegate to our loader. As a result calling or referencing Gosu from Java involves low-road solutions such as reflection or extracting intermediary Java interfaces from Gosu classes. Not quite the seamless experience we’re looking for, hence the demand for better Java-to-Gosu interoperability. And it all stems from having to define a special class loader.

To illustrate the problem consider the following two classes:

package com.abc; 
public class JavaClass {
  private GosuClass obj;
}

package com.abc 
class GosuClass {
  var obj: JavaClass
}

    +-------------+
    | System      |       Loads system classes
    |             +------>
    | ClassLoader |       eg. java/lang/String.class
    +-------------+
           ^
Ask parent |
           |
    +------+------+
    | Application |       Loads application classes
    |             +------>
    | ClassLoader |       eg. com/abc/JavaClass.class
    +-------------+
           ^
Ask parent |
           |
    +------+------+
    | Gosu        |       Loads Gosu classes
    |             +------>
    | ClassLoader |       eg. com/abc/GosuClass.gs
    +-------------+

(For the purposes of this example let’s take it on faith we can compile JavaClass; we’ll touch on that later. Also let’s assume Gosu is initialized, its class loader locked and loaded.)

Ok. Regardless of what kind of application we run (command line, web, etc.) the host environment loads our compiled Java classes for us from the class path. In our generalized example the Application ClassLoader loads JavaClass, but JavaClass fails to verify because com.abc.GosuClass can’t be found. This is because the application loader doesn’t delegate to our Gosu ClassLoader, it delegates upward to its parent – it has no idea the Gosu loader even exists. Conversely, if we explicitly load com.abc.GosuClass using the Gosu ClassLoader, the JVM has no trouble verifying the reference to JavaClass because the Gosu loader involves its parent, the Application loader.
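
The one-way nature of that delegation can be seen in a few lines of plain Java. What follows is only a sketch: DelegationDemo and RecordingLoader are hypothetical names, and RecordingLoader merely stands in for the Gosu loader by recording the names it is asked to find instead of compiling anything.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    /** Hypothetical stand-in for the Gosu loader: records what it is asked to find. */
    static class RecordingLoader extends ClassLoader {
        final List<String> asked = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            asked.add(name);                        // reached only after the parent chain fails
            throw new ClassNotFoundException(name); // a real loader would compile and define here
        }
    }

    /** True if the given loader (or one of its ancestors) can resolve the name. */
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = DelegationDemo.class.getClassLoader();
        RecordingLoader child = new RecordingLoader(app);

        // The child asks its parent first, so system classes never reach findClass().
        System.out.println(canLoad(child, "java.lang.String") + " " + child.asked);

        // The parent has no link to the child: this lookup never consults it.
        System.out.println(canLoad(app, "com.abc.GosuClass") + " " + child.asked);

        // Only a request made through the child reaches the child's findClass().
        System.out.println(canLoad(child, "com.abc.GosuClass") + " " + child.asked);
    }
}
```

The second lookup is the JavaClass situation: the request starts at the application loader, and nothing in that loader ever points down at the child.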

Now you might think that if we go ahead and write class files to disk, there’d be no need for a special loader because the normal Java loader could load our class files — two birds, one stone. And you’d be right if all of Gosu’s classes could be statically compiled to disk. Ah, but there’s the rub. First, and I’ll spare you the particulars, there are a few lingering implementation details with Gosu’s runtime that on occasion rely on compiling bits of code dynamically; these will go away eventually, but we’re stuck with them for now.

More critically, Gosu’s parser and compiler are necessarily incorporated into the runtime. A few corner cases in the language require these services e.g., eval(), but mostly the open type system drives this design. Types in Gosu aren’t required to resolve as Java classes; instead Gosu provides abstractions for type loaders and type information. At runtime Gosu classes compile to conventional Java classes, yet other custom types that are not directly source code oriented e.g., Guidewire’s web framework, aren’t represented this way. Instead such types provide runtime type information for method, constructor, and property invokers. For instance, a web page method typically wraps a user-defined chunk of Gosu. Instead of requiring the web page framework to manage a bytecode oriented type system for all the snippets of Gosu associated with web pages, Gosu allows the snippets to execute loosely at runtime as programs or fragments. Thus a method invoker can simply instruct Gosu’s runtime to execute its anonymous chunk of Gosu, directly from source. But that chunk of Gosu will ultimately compile down to an anonymous Java class which is not discoverable from the Java application’s primary class loader.

The problem isn’t quite as knotty as it may seem. Because these particular chunks of Gosu are anonymous — they’re just nameless scripts — there’s no opportunity for Java classes to reference them by name. Basically this means the Java application loader should never be required to load any of these classes. Therefore we can get away with defining our own special class loader for these, which is what Gosu has done all along. There’s a deeper problem, however. Our special class loader parses and compiles these classes, which involves locking in our type system. Java also has locks for its type system; have a look at ClassLoader, note the synchronized methods especially on loadClass():

protected synchronized Class<?> loadClass(String name, boolean resolve)

Suffice it to say we flirt with deadlock every time our special loader does its job. We lessen the likelihood of this deadlock by ensuring we acquire both our lock and our class loader’s monitor atomically, if not we release both, wait a bit, and try again:

while( !TypeSystem.tryLock() ) {
  wait(100); // release this class loader's monitor obtained from the enclosing synchronized method
}
try {
  compile();
}
finally {
  TypeSystem.unlock();
}

Mission accomplished! Well… we’re not quite out of the woods here. What if our loader compiles a class that references another class from our parent loader, the application loader? Its monitor must be acquired too. We’re back to square one. It turns out deadlock arising from this particular scenario is quite rare, but it’s there, waiting. The only way to avoid it is to take our special loader out of the picture and somehow have the application loader compile and load our classes. Inconceivable.

But wait, there’s more. The JVM considers two classes, abc.A and abc.B, to be in different packages if they’re loaded by separate loaders. Thus even though A and B are declared in the same logical package, if A is a Java class and B is a Gosu class, because our special loader loads B, B will not have access to A’s package-private (“internal” in Gosu speak) members. Likewise, A will not have access to B’s internal members. We currently overcome this limitation by generating bytecode for internal method calls and so forth as reflective calls. It works, but it’s an ugly hack with performance ramifications we’d rather avoid. Again the only way to get this right is for the Java class loader to do our bidding. Inconceivable!

Lastly, and most importantly, although we understand and appreciate the benefits of deploying class files, part of the appeal of Gosu is that it doesn’t require them. You can simply write code, say in your favorite text editor, and run it, no IDE or build step necessary, an aspect of dynamic languages our static language currently enjoys — some might say the best of both worlds. What’s more, legacy Gosu libraries don’t have class files. We’d like to support them as they are. So we’re back to square one yet again. Are we sure Java’s application loader can’t be made to somehow load our Gosu classes? “As I told you, it would be absolutely, totally, and in all other ways inconceivable.” (Vizzini to Inigo Montoya)

It would seem so. A class loader can only load the classes it’s designed to load, right? Some of you might be thinking of an unforgivable hack involving AspectJ or some such where we rewrite a portion of the class loader’s bytecode, tailoring it to load our classes. I like the energy, but let’s not; too many different class loaders out there to screw up and they’re all moving targets, it just won’t hold. We couldn’t take that approach anyway because Oracle’s Binary Code License Agreement prohibits the modification of system classes such as sun.misc.Launcher$AppClassLoader which we desperately need to control. Here’s another idea. A lot of application class loaders derive from URLClassLoader, even Java’s standard AppClassLoader does this. We could reflectively call addURL() and modify the class path dynamically. But what URL would we add? The whole point here is to get the class loader to resolve Gosu class names from source, not class files. The URLs we normally add are file system directories and jar files. What we need is a URL for… what?

A better question to begin with is, what does URLClassLoader do with the URLs as it attempts to resolve a class name? For instance, let’s say my class path consists of a single directory:

-classpath /my/application/classes

The application loader will have the corresponding URL:

file:/my/application/classes/

URLClassLoader finds classes in a given URL using the protocol handler associated with the URL’s protocol. Java provides stock protocol handlers for standard protocols such as file, http, ftp, etc. A protocol handler’s primary responsibility is to establish a connection to a given URL in the form of a java.net.URLConnection. Most protocol handlers extend java.net.URLStreamHandler and implement openConnection( URL ). So, essentially, URLClassLoader delegates the responsibility of finding a class or resource on its class path to the openConnection() methods of the protocol handlers for the URLs on its path. Given our single directory example, let’s say URLClassLoader handles a call to findClass() with argument “com.abc.Foo”, and let’s assume our directory contains this class. First, the loader transforms the class name to a relative file name: “com/abc/Foo.class”. Then it creates a new URL for the class by appending the relative name of the class file to the directory URL:

file:/my/application/classes/com/abc/Foo.class

Then it calls URL.openConnection() to establish a connection to the class file on disk. Next, the URL finds the protocol handler associated with the “file” protocol and delegates to it. From there the resulting URLConnection provides a stream to the contents of the file and the rest is history. I’ve skipped over a lot of implementation details but that’s the gist.
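
Here is a rough sketch of those steps using only public java.net calls. It is not URLClassLoader’s actual code; UrlLookupDemo, toResourceName() and fetch() are names made up for illustration, a temporary directory stands in for /my/application/classes, and the “class file” is just four placeholder bytes. It assumes Java 9 or later for readAllBytes().

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

class UrlLookupDemo {
    /** "com.abc.Foo" -> "com/abc/Foo.class", the relative name appended to a class path URL. */
    static String toResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Resolves the class against one class path entry and reads whatever the connection yields. */
    static byte[] fetch(URL classPathEntry, String className) throws Exception {
        URL classUrl = classPathEntry.toURI().resolve(toResourceName(className)).toURL();
        // openConnection() hands off to the handler registered for the URL's protocol.
        URLConnection connection = classUrl.openConnection();
        try (InputStream in = connection.getInputStream()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws Exception {
        // A throwaway directory standing in for /my/application/classes.
        Path dir = Files.createTempDirectory("classes");
        Path file = dir.resolve("com/abc/Foo.class");
        Files.createDirectories(file.getParent());
        Files.write(file, new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});

        URL entry = dir.toUri().toURL(); // a file: URL for the directory
        byte[] content = fetch(entry, "com.abc.Foo");
        System.out.println(entry.getProtocol() + " handler returned " + content.length + " bytes");
    }
}
```

Nothing in fetch() knows it is reading from disk. Which handler answers is decided entirely by the protocol of the class path entry.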

Interesting. The nature of protocol handlers is to find content wherever it exists. In the case of the file protocol it looks for files on disk. Likewise, the http handler finds resources at an address on the web. In theory we could define a handler for finding Gosu classes in a Java application. How would that work? Let’s say we define our protocol as “gosuclass”. Fine. Now we need to define our subclass of java.net.URLStreamHandler. But wait. Looking over the rules for adding a protocol handler, it’s not so simple. Check out the curious process for adding a new handler:

http://docs.oracle.com/javase/1.4.2/docs/api/java/net/URL.html#URL(java.lang.String, java.lang.String, int, java.lang.String)

There are three options. The first option suggests we define our own URLStreamHandlerFactory for creating protocol handlers. The intention, I think, is to centralize control of protocol handler creation so we have a consistent experience with… protocol handling. Fair enough if you’re after controlling all of protocol handler creation, which we aren’t. Adding insult to injury, the way we go about adding our factory pretty much rules out its use. From the documentation for URL.setURLStreamHandlerFactory():

“This method can be called at most once in a given Java Virtual Machine.”

Well, I’m pretty sure most web servers call this method way before we’ll get a crack at it. Which is what we want because we need to get a handle to the server’s factory so our poor factory can delegate to it for finding the web server’s other handlers. Since it can be called only once, we’re out of luck. Strike one!
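
The once-only rule is easy to see for yourself. In this small sketch FactoryOnceDemo and tryInstall() are hypothetical names, and the factory is a do-nothing one that returns null for every protocol so the JDK falls back to its normal lookup.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

class FactoryOnceDemo {
    /** Tries to install a factory and reports whether the JVM accepted it. */
    static boolean tryInstall(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error alreadyDefined) {
            // The JDK signals a repeated installation attempt with java.lang.Error.
            return false;
        }
    }

    public static void main(String[] args) {
        // Returning null tells URL to fall back to its built-in lookup for that protocol.
        URLStreamHandlerFactory noOp = protocol -> null;

        boolean first = tryInstall(noOp);  // may already fail if the host set a factory
        boolean second = tryInstall(noOp); // whoever got there first, this one is rejected
        System.out.println("first=" + first + ", second=" + second);
    }
}
```

There is also no public getter for the installed factory, so a latecomer can neither replace it nor wrap it.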

Next on the list describes a convoluted convention where the system property “java.protocol.handler.pkgs” provides a list of packages containing handlers. The handlers must be named as follows:

<package>.<protocol>.Handler

Where <package> is one of the packages in the list, <protocol> is the name of the protocol and, of course, Handler is the name of your URLStreamHandler subclass. We do all of this and, depending on the execution environment (web server, rich client, command line), we get mixed results. Worst of all, on the web server it flat out doesn’t work; ClassNotFoundExceptions abound. What’s the deal? It turns out that the URL class tries to load protocol handlers using Class.forName(), which uses the caller’s class loader. Well, this means URL’s class loader since the call originates from that class. On a typical web server, Java system classes such as URL are loaded in a different loader than our application classes, therefore it never finds our Handler class. Strike two!
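
To spell the naming convention out, this small sketch expands a property value into the class names that would be tried. HandlerNameDemo, candidates() and the package names are invented for illustration; the only facts relied on are the <package>.<protocol>.Handler pattern and the ‘|’ separator between packages.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerNameDemo {
    /** Expands a java.protocol.handler.pkgs value into candidate handler class names, in order. */
    static List<String> candidates(String pkgsProperty, String protocol) {
        List<String> names = new ArrayList<>();
        for (String pkg : pkgsProperty.split("\\|")) { // packages are separated by '|'
            pkg = pkg.trim();
            if (!pkg.isEmpty()) {
                names.add(pkg + "." + protocol + ".Handler");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical package names, for illustration only.
        System.out.println(candidates("com.abc.protocols|org.example.net", "gosuclass"));
    }
}
```

Each candidate is just a class name, so the lookup succeeds only if the loader doing the lookup can see that class.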

Our third and final option is a dud. Basically the same as the second one except the “default system package” is consulted for the Handler. Strike three! Yeerrr Out…

…wait a sec…

We’re not asking for much, we’d just like to add a bleeping protocol handler.  Where and how are these handlers managed inside Java? I mean how complicated can this really be?  Following the debugger it’s all too easy. The URL class maintains a static Hashtable (how retro) mapping protocol name to handler. That’s it. Couldn’t they just have… I’ll spare you my emotional diatribe regarding pragmatism in software design, or the disregard of it in the Java libraries. Anyway, all we need to do here is resort to a bit of reflection hackery, feel the requisite amount of shame, and move on. For your viewing pleasure:

private static void addOurHandler( Handler handler ) throws Exception {
  Field field = URL.class.getDeclaredField( "handlers" );
  field.setAccessible( true );
  Method put = Hashtable.class.getMethod( "put", Object.class, Object.class );
  put.invoke( field.get( null ), handler.getProtocol(), handler );
}

Now that we’ve force fed our protocol handler to Java we can confidently add our gosuclass URL to the class loader’s path. This part is easy albeit not entirely guilt-free as it involves another reflective call to protected method, addURL():

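private static void addOurUrl() throws Exception {
  // url (our gosuclass URL) and findUrlLoader() are defined elsewhere
  URLClassLoader urlLoader = findUrlLoader();
  if( !(Arrays.asList( urlLoader.getURLs() ).contains( url )) ) {
    // addURL() is protected, so we have to call it reflectively
    Method addURL = URLClassLoader.class.getDeclaredMethod( "addURL", URL.class );
    addURL.setAccessible( true );
    addURL.invoke( urlLoader, url );
  }
}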

The protocol handler itself is pretty simple. Its only task is to produce a GosuClassesUrlConnection, which does all the work:

protected URLConnection openConnection( URL url ) throws IOException {
  GosuClassesUrlConnection connection = new GosuClassesUrlConnection( url );
  return connection.isValid() ? connection : null;
}

GosuClassesUrlConnection “connects” with the URL for a Gosu class. Connecting doesn’t involve anything more than resolving the name of the class, which it accomplishes by a call to Gosu’s TypeSystem.getByFullName( className ). If the type exists and is an instance of a Gosu class, the connection is successful. Actual compilation can wait until the class loader asks for the content of the class, which it obtains via a call to our URL connection’s getInputStream() method. This is where we dynamically compile the class and return an input stream for the resulting bytecode and in the process complete the “inconceivable” task of making the application loader do our work for us. No more special loader. Gosu classes, both as class files and source files, are now for all intents and purposes regular Java classes to the JVM; they are usable directly from Java code as Java code. Hallelujah!
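
For readers who want to try the idea in isolation, here is a self-contained sketch of the same shape. It is not Gosu’s code, and several things in it are assumptions made for illustration: GosuClassUrlSketch and its members are hypothetical names, compile() is a stand-in that writes out the bytes of an empty class instead of compiling source, and the handler is handed straight to the URL constructor (deprecated in recent JDKs, but still available) rather than injected reflectively. The loader gets a null parent so that it has to go through the gosuclass URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Collections;
import java.util.Set;

class GosuClassUrlSketch {
    /** Stand-in for a compiler: emits an empty public class extending Object (class file v52). */
    static byte[] compile(String className) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(0xCAFEBABE);                                    // magic
        out.writeShort(0);                                           // minor version
        out.writeShort(52);                                          // major version (Java 8)
        out.writeShort(5);                                           // constant pool count = entries + 1
        out.writeByte(7); out.writeShort(2);                         // #1 Class -> #2
        out.writeByte(1); out.writeUTF(className.replace('.', '/')); // #2 name of this class
        out.writeByte(7); out.writeShort(4);                         // #3 Class -> #4
        out.writeByte(1); out.writeUTF("java/lang/Object");          // #4 name of the superclass
        out.writeShort(0x0021);                                      // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                           // this_class
        out.writeShort(3);                                           // super_class
        out.writeShort(0);                                           // interfaces
        out.writeShort(0);                                           // fields
        out.writeShort(0);                                           // methods
        out.writeShort(0);                                           // attributes
        return bytes.toByteArray();
    }

    /** Handler whose connections produce bytecode only when the loader asks for the stream. */
    static class Handler extends URLStreamHandler {
        final Set<String> knownTypes; // hypothetical substitute for a type system lookup

        Handler(Set<String> knownTypes) {
            this.knownTypes = knownTypes;
        }

        @Override
        protected URLConnection openConnection(URL url) {
            return new URLConnection(url) {
                @Override public void connect() { }

                @Override public InputStream getInputStream() throws IOException {
                    String path = getURL().getPath(); // e.g. "/com/abc/GosuClass.class"
                    String name = path.replaceAll("^/|\\.class$", "").replace('/', '.');
                    if (!path.endsWith(".class") || !knownTypes.contains(name)) {
                        throw new FileNotFoundException(path);
                    }
                    return new ByteArrayInputStream(compile(name)); // "compile" on demand
                }
            };
        }
    }

    /** Loads a class through a stock URLClassLoader whose only path entry is gosuclass:/ */
    static Class<?> load(Handler handler, String className) throws Exception {
        URL root = new URL(null, "gosuclass:/", handler);
        try (URLClassLoader loader = new URLClassLoader(new URL[] {root}, null)) {
            return loader.loadClass(className);
        }
    }

    public static void main(String[] args) throws Exception {
        Handler handler = new Handler(Collections.singleton("com.abc.GosuClass"));
        Class<?> type = load(handler, "com.abc.GosuClass");
        System.out.println(type.getName() + " defined by " + type.getClassLoader().getClass().getName());
    }
}
```

The class is defined by the plain URLClassLoader itself. The handler and its connection only supply bytes, which is the division of labor described above.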

As an added bonus our compiler generates Gosu classes with a static block to initialize Gosu. Initialization includes the critical steps to inject our protocol handler and add the gosuclass URL to the app class loader’s class path. This may seem chicken-and-egg-like because static block execution is part of class loading. Right, but recall the other more dynamic bits of Gosu that are not written to disk. Also recall we support both modes of execution, from source and from class files; the source file-based classes need to load too. The idea is that Java application developers incorporating Gosu, essentially meaning applications executed with Java.exe and not Gosu.exe, will likely choose to compile the Gosu classes they create in their projects to class files. As such, with Gosu app code precompiled, developers aren’t bothered with wedging Gosu initialization into their apps to support dependencies on Gosu code not compiled to class files. In other words, the first Gosu class file that loads will automatically set up Gosu so that source-based classes and such will be loaded via our protocol handler. The only requirement for using Gosu in a Java app is that the core Gosu jars be included on the class path. Other than that there should be no difference whether your Java app’s classes are Java or Gosu.

That about wraps it up. Essentially what we’ve done is implement a virtual class loader in the form of a URLConnection. It does the work of finding and producing bytecode for a given Gosu source file; everything a real class loader does aside from actual class definition. Which brings us back to what we were after all along. Unbeknownst to the Java application class loader, it is now loading and defining our Gosu classes simply by deferring to our clandestine gosuclass protocol handler. Gosu no longer has a “special” class loader and now sits at the cool table with the other Java classes. “HE DIDN’T FALL? INCONCEIVABLE.” (Vizzini to Inigo Montoya)

There were some things I deliberately glossed over or just didn’t cover here. Mostly I avoided compiler stuff dealing with how we compile Java with Gosu class references in it. Well, most of this is still a work in progress. I have a side project where I’m tricking javac into compiling Java with Gosu type references in it. I’ll write about that next, but in summary javac can be made to compile with Java and Gosu intertwined. In fact with the technique I’m using Gosu’s entire type system can be used from within Java with no modifications made to javac. I also skipped over IDE related specifics. Each IDE has its own way of supporting (or not supporting) multi-language integration. The IntelliJ IDE plugin we’re currently focused on has all the support we need — I have another side project with Gosu-in-Java integration limping along in IJ. In fact the guys at JetBrains graciously fixed a blocking bug for me in their most recent release — a two week turnaround!  Those guys are awesome.


13 Comments on “Gosu’s Inconceivable non-ClassLoader (Take 1)”

  1. sayewich says:

    HOW WONDERFUL!!!!

  2. sayewich says:

    Great Post!!!!!

  3. jpcamara says:

    Great to get some insight into how you guys are accomplishing this and that it’s in the works. Since you guys are reflectively handling this, will there be portability issues across jvms?

    Any roadmap for when it’ll end up in open source gosu or perhaps an edge release? :) sounds like its early days so not expecting anything soon, just interested.

    • Scott McKinney says:

      Hey JP. There shouldn’t be any portability issues. The reflective calls we’re making should work across all Sun VMs as well as IBM’s. However, we currently don’t handle the case when the app loader isn’t a URLClassLoader. For instance, in an OSGi environment. We’ll eventually handle that case; we just haven’t gotten to that point yet.

      We’ll try hard to finish this up and deliver it asap, but we’re currently spending most of our time doing the IntelliJ plugin stuff, which isn’t going to be ready for another drop for some time.

      Thanks for the interest! :)

  4. Jeff says:

    Can you be sure that all JVMs implement their protocol handlers in the same manner? “The URL class maintains a static Hashtable (how retro) mapping protocol name to handler”

    This seems a pretty fragile injection point. The good news I guess is that as long as Oracle maintains its grip on production JVMs things won’t change.

    • Scott McKinney says:

      In theory, yes, quite fragile. There’s no guarantee Oracle won’t change java.net.URL, but it’s pretty well dug in. That Hashtable is older than I am :| If I were a betting man, I’d bet URL keeps that field for as long as Java remains viable. Of course, I could be wrong.

      • Alan Keefer says:

        Jeff’s comment was that while the IBM JDK has to obey the same API for the URL class, the actual implementation could be significantly different from the Oracle JDK’s implementation. You’d actually need to run Gosu against the IBM JDK to see if it works, and potentially devise a different hack in that case. Same goes for JRockit, I think.

      • Scott McKinney says:

        Right. Most VMs I’m aware of, including IBM’s and JRockit’s, don’t change the stock java.* API classes all that much; URL’s static Hashtable is indeed in both… although my copy of IBM’s jdk is quite out of date…

        Ok, I just verified IBM’s JDK v6.0 does have the static Hashtable. Bases covered.

  5. Jon Jagger says:

    You’ve made a serious error in this post ;-)
    It’s Inigo Montoya not Anigo Montoya.

  6. It has been almost a year since this blog entry was written. What’s the status of this work? And will Gosu classes ever compile to class files?

    • Scott McKinney says:

      Yes, the Gosu IntelliJ plugin currently compiles Gosu classes to disk. This summer we’ll release an updated version of the plugin that is compatible with IJ 12.

