Jim Blackler

Posted by jimblackler on Oct 12, 2008

Consider a function that collects and returns a list of results. It might look like this:

public ArrayList<String> retrieveAll(long requesterId);

Or even better, this, so it could use anything that implements Iterable<>:

public Iterable<String> retrieveAll(long requesterId);

However, storing the results in a container which is then returned may be inefficient for the application. It may be better to enable the calling code to act immediately on each collected result rather than first waiting for all results to be collected and stored. If that were possible..

No memory would be used to store a list.
Results could be presented to the user straight away.
Calling code could abort the collecting process part way through based on its own logic – for instance if it already has more results than it can handle.

Custom iterators

One way to avoid an intermediate list is for the collecting function to construct a custom Iterable<> object, which constructs Iterator<> objects that contain the collecting logic. Each execution of next() calculates the next value and returns it straight away for the calling code.

The difficulty for the programmer is that this may require a significantly different structure of the collecting code. This code has to store any state from result to result as private variables inside the iterator. Call stack is reset with each result, so algorithms that use recursion are not possible.

More convenient for the programmer would be a technique that allows the collecting code to keep control of the machine state during the collecting process.

Yield return

C# has ‘yield return’ which allows a function to be structured exactly as if it is building a list. It may keep its own state in local variables and the call stack, but still present each value to the calling code immediately as it is found.

Java doesn’t have yield return, but another pattern that could be used is this:

public void retrieveAll(long requesterId, ResultProcessor<String> collector);

Where ResultHandler<> is a simple interface to an object invoked by the collecting code to return individual results as they are collected.

public interface ResultHandler<T> {
void handleResult(T value) throws CollectionAbortedException;
}

It is left up to the calling code what ResultHandler<> implementing object to supply, and how to handle the data. Anonymous classes would be quite neat here. For instance, a caller could output the results to a console like so:

retreiveAll(myId, new ResultHandler<String>(){
public void handleResult(String value) {
System.out.println(value);
}
});

The collect() function may abort the collecting operation part way through by throwing a CollectionAbortedException.

The disadvantage to this approach is that it is a little less convenient from the perspective of the calling code. The function has a novel pattern, taking ResultHandler<> as a parameter rather than the more familiar Iterable<> as a return value. Logic has to be embedded inside an anonymous or specially constructed ResultHandler<> class.

Unlike the case of returned Iterables, the calling code can not use ‘for each’ loops on the result. Nor can it store or pass around the Iterable<> to other systems as a parameter.

Solution : The yield adapter

A best-of-both-worlds solution would allow a collecting method based on ResultHandler<> to be automatically adapted at run time to an implementation based on Iterable<>. A data-collecting function implemented in this form (convenient for the collecting code) …

public void retrieveAll(long requesterId, ResultHandler<String> collector);

.. would be adapted into this form (convenient for the calling code)..

public Iterable<String> retrieveAll(long requesterId);

This would provide the same benefit C# programs get from yield return. Namely, the ability for both collecting code and calling code to have their own machine state and callstack control throughout the entire collecting and processing operation. It is not necessary for either side to separate logic into methods inside anonymous classes. Collecting code can use complex recursion algorithms. Calling code can store and defer use of the Iterable<> used to collect results, or pass them as parameters to other functions.

Approach	Collector controls flow	Caller controls flow	Requires list	Caller can abort collect	Uses new thread
Collector builds and returns list in form of Iterable<>	yes	yes	yes	no	no
Collector returns Iterable<> containing logic	no	yes	no	yes	no
Caller passes in ResultHandler<>	yes	no	no	yes	no
ResultHandler<> wrapped to Iterable<> with yield adapter.	yes	yes	no	yes	yes

How it works

I am not the first engineer to attempt to bring yield to Java. Aviad Ben Dov’s article in 2007 http://chaoticjava.com/posts/java-yield-return-code-published/ describes a way to do this using bytecode manipulation and classloader modification. My view is that a solution in Java alone would be more practical as a portable library.

My starting premise was that if calling code and collecting code are both to have their own call stack, this could only be achieved with the use of multiple threads.

In addition, I was aware of a Java collection SynchronousQueue which is designed to allow two threads to pass values between each other, each in turn yielding control to the other.

The Yield Adapter simply makes a new custom Iterable<> which is returned to the calling code. The iterable creates iterators on demand, which also starts a new thread for the collection, and makes a SynchronousQueue for communication between the two. Results are wrapped in message object and, .put() into the queue on the collecting side. The iterator side uses .take(). There is also some extra logic to ensure that incomplete reads do not result in leaked resources.

Please note that although the adapter uses threads, the code does not normally need to be ‘thread safe’ in the classic sense. That is because there is never a time that both threads are executing simultaneously. One thread always ‘yields’ to the other. The exception to this is if multiple iterators were ever in effect at once.

Source and demos

The adapter source can be found here with the interfaces employed all here.

A demo here shows how the adapter can be used to allow a recursive scan of files and directories, returning the result through an iterator.

A more complex demo here returns all possible ways the letters in a word can be rearranged, without repeats.

The library source can be downloaded with Subversion or your browser here and the tests and demos are here.

Feedback

I hope that this small library will allow people to develop complex algorithms that present results to calling code as iterators.

If you have any comments on the code, please add them as comments on this article.

Update Feb 2009

Many thanks to Dominic Lachowicz who brought to my attention the fact that the SynchronousQueue does not in fact stop both threads running at the same time. A revision checked in today uses a Semaphore object to achieve the desired behaviour.

8 Comments »

Aviad Ben Dov:

This is great. My first idea was to use a threading system like yours, and I eventually reasoned to create the yielder the way I did for two reasons:
1. It was more efficient, resource-wise. When dealing with tree-searching model, for example (see my collections example), you can find yourself in a canonical-yielder situation. With a threading model, you could quickly use the computer’s resources. You end up with a lot of locks (due to the SynchronousQueue) and a lot of events running around your application.
2. It opens the path to adding a mechanism to a pre-process compiler (like APT), or even to OpenJDK, to be done in the C# way: in compile-time. The threading method works but its a workaround for a missing feature.

Don’t get me wrong: effectively, the threading solution works (as mine is broken) and it probably takes less time to develop and understand, so kudos! :)

October 12th, 2008 | 7:12 pm

jimblackler:

Thanks a lot for your comments Aviad. And also for your original article which I found very interesting. I confess I haven’t profiled my implementation for performance, and it does seem likely this may make this technique unsuitable for some applications.

October 12th, 2008 | 7:56 pm

Howard Lovatt:

I think that Phasers (http://www.cs.rice.edu/~vs3/PDF/SPSS08-phasers.pdf) are going to be part of Java 7 and I think these can achieve the same aim of generating a list lazily, but via a slightly different paradigm.

October 15th, 2008 | 10:23 am

Neal Gafter:

See http://gafter.blogspot.com/2007/07/internal-versus-external-iterators.html for another take at this problem.

November 5th, 2008 | 6:30 am

Chris:

I don’t know if you’re interested, but I made a quick change to allow runtime exceptions thrown in a collect method to be re-thrown by the threaded yield adapter.

diff -r 0107610716ee src/net/jimblackler/Utils/ThreadedYieldAdapter.java
— a/src/net/jimblackler/Utils/ThreadedYieldAdapter.java Tue Apr 12 13:42:43 2011 -0400
+++ b/src/net/jimblackler/Utils/ThreadedYieldAdapter.java Tue Apr 12 13:51:27 2011 -0400
@@ -27,6 +27,16 @@
private class AbortedMessage extends StopMessage {

}
+
+ private class ExceptionMessage extends StopMessage {
+ RuntimeException cause;
+ ExceptionMessage(RuntimeException cause) {
+ this.cause = cause;
+ }
+ RuntimeException getCause() {
+ return cause;
+ }
+ }

/**
* The vehicle to pass the actual values.
@@ -103,6 +113,8 @@
// to receive it, and the thread will block.
synchronousQueue.put(new AbortedMessage());
}
+ } catch (RuntimeException e) {
+ synchronousQueue.put(new ExceptionMessage(e));
}

} catch (InterruptedException e) {
@@ -140,6 +152,9 @@
try {
returnQueue.put(new Object()); // allow other thread to gather result
messageWaiting = synchronousQueue.take();
+ if (ExceptionMessage.class.isAssignableFrom(messageWaiting.getClass())) {
+ throw ((ExceptionMessage)messageWaiting).getCause();
+ }

} catch (InterruptedException e) {
messageWaiting = new EndMessage();

April 12th, 2011 | 6:56 pm

Olivier Mélois:

This is absolutely interesting.

Have your code been published under an open-source license ? I’m working on a project to allow the creation of easily customizable model iterators. The project has not really started yet, but it will in a few days. I’m thinking about re-using some of your code, would you allow it.

Great article :D

June 1st, 2012 | 10:51 am

jimblackler:

Thanks Olivier, consider it public domain.

June 1st, 2012 | 1:10 pm

Viliam Simko:

I think you have a typo on line:
public void retrieveAll(long requesterId, ResultProcessor collector);

It should be:
public void retrieveAll(long requesterId, ResultHandler collector);

June 15th, 2015 | 11:31 am

Archives

Categories

Bringing ‘yield return’ from C# to Java

Posted by jimblackler on Oct 12, 2008

Custom iterators

Yield return

Solution : The yield adapter

How it works

Source and demos

Feedback

Update Feb 2009

8 Comments »

Pages

Blogroll

Recently

Meta