archived posts

Parallel processing and Google collections

In java on April 27, 2010 at 10:17 pm

If you have been using google collections lately, chances are you will have been using one of the transform static methods available to you.

  List<String> source = Lists.newArrayList("a", "b", "c"); 	
  ...
  Iterables.transform(source, Functions.toInt());

By default the above transform will execute your list transformations on the current thread. While this approach works well and is in fact preferable for the majority of cases you may find yourself in a situation where you need to leverage the power of concurrency.

Some examples that might benefit from concurrency:

  1. Transformations that interact with the network.
  2. Transformations that read from a database.
  3. Transformations that require significant computation.

The first two examples provide excellent opportunities for exploiting concurrency. These are both likely to trigger a low level network operation that will cause the current thread to block until a remote process has completed. This is valuable time that could be used to spin up another thread and do something useful while the first one is blocked.

That sounds like , but how do we go about introducing some concurrency to our collection processing? Well, we will need to implement our own version of the transform method that processes elements using a number of threads.

First things first, what does transform actually do? It’s pretty simple and can be boiled down to something like this:

public <F, T> List<T> transform(List<F> src, Function<F, T> fn) {
  List<T> result = Lists.newArrayList(); 
  for (F element : src) {
    result.add(fn.apply(element));
  }
  return result; 
}

Nothing too complicated, grab each element in the list and pair it with the given function. The google version is a lot more elegant in reality but the main point is to show how the transformation is applied.

  ...
  fn.apply(element)
  ...

In order to make our element transformation concurrent we first need to wrap it up in something that can can be offloaded to another thread. For this the Callable interface introduced in Java 5 is ideal.

public <F, T> List<T> transform(List<F> src, Function<F, T> fn) {
  ...
  for (F element : src) { 	
    Callable<T> transformTask = new Callable<T>() {
      public T call() throws Exception { 
        return fn.apply(element); 
      }
    };
  } 
  ...
}

Once transformations are represented as callable objects we need to ship them off to something that can coordinate a bunch of threads to get them all executed. The ExecutorService is a nice, easy way to get this done.

public <F, T> List<T> transform(List<F> src, Function<F, T> fn) {
  ExecutorService service = Executors.newFixedThreadPool(3);
  List<Future<T>> resultFuture = Lists.newArrayList();
  for (F element : src) { 	
    Callable<T> transformTask = new Callable<T>() {
      public T call() throws Exception {
        return fn.apply(element); 				
      }
    };
    resultFuture.add(service.submit(transformTask));
  }
  ...
}

The ExecutorService returns a Future which allows us to track the progress of our callables. After we have scheduled our transformations we want the current thread to wait until everything has been completed. Future.get provides exactly this behaviour.

public <F, T> List<T> transform(List<F> src, Function<F, T> fn) {
  ExecutorService service = Executors.newFixedThreadPool(3);
  List<Future<T>> resultFuture = Lists.newArrayList();
  for (F element : src) {
    Callable<T> transformTask = new Callable<T>() {
      public T call() throws Exception {
        return fn.apply(element);
      }
    };
    resultFuture.add(service.submit(transformTask));
  } 
  List<T> result = Lists.newArrayList();
  for (Future<T> future : resultFuture) {
    try {
      result.add(future.get());
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
  return result;
}

And that’s pretty much it, our transformations will be queued up into the executor service and executed in parallel. The current thread will block until the last transformation is complete and then the list of transformed results will be returned as expected.

Now all you need to do is replace the relevant imports of the google transform method with your own concurrent version and your done. If you are using compacted-collections however, things are even easier. You can switch to parallel processing mode with a single fluent method call. It’s as easy as selecting the number of threads you want:

// Standard transformations
FluentList.listOf("a", "b", "c").map(Functions.toInt());

// Parallel transformations
FluentList.listOf("a", "b", "c").parallel(3).map(Functions.toInt());

This will apply the parallel processing strategy to all subsequent transform and filter operations. Everything works exactly as it would with the default single threaded strategy. This allows you to start out with lightweight single thread approach and easily move up to the big guns if and when performance starts to become a problem.

Check it out, I’m no concurrency expert so any feedback on areas for improvement is welcome.

Google collections and JavaBeans

In java on April 21, 2010 at 8:47 pm

In this article I want to show you some useful techniques to simplify the process of using google collections with your JavaBeans.

Examples will be based around the manipulation of the following customer data.

  public class Customer {
    private String name;
    public Customer(String name) {
      this.name = name;
    }
    public String getName() {
      return name;
    }
  }
  private List<Customer> customers = newArrayList(
    new Customer("shanon"), 
    new Customer("fred")
  );

Extracting a list of property values

If you want to extract a list of names from your customers you might start with an implementation like this:

  import static com.google.common.collect.Lists.newArrayList;
  import static com.google.common.collect.Lists.transform;
  ...
  @Test
  public void listCustomerNames() {
    Function<Customer, String> toName = new Function<Customer, String>() {
      public String apply(Customer customer) {
        return customer.getName();
      }
    };
    List<String> expected = newArrayList("shanon", "fred");
    assertEquals(expected, transform(customers, toName));
  }

It’s very likely that you will be writing this sort of code a lot. You might need to extract a list of customer emails for a marketing campaign. You might need to extract a list of customer ages so you can calculate the average age of your customers… You get the idea.

A lot of the time your simply going to be extracting a property value from a JavaBean. If your not scared of using a little reflection you can eliminate a lot of this tedious code and still maintain a reasonable level of type safety.

compacted-collections provides a simple reflection based solution that you can use, however it’s not difficult to write your own.

  import static com.google.common.collect.Lists.newArrayList;
  import static com.google.common.collect.Lists.transform;
  import static com.compactcode.Functions.toPropertyValue;
  ...
  @Test
  public void listCustomerNamesUsingReflection() {
    Function<Customer, String> toName = toPropertyValue("name");
    List<String> expected = newArrayList("shanon", "fred");
    assertEquals(expected, transform(customers, toName));
  }

Filtering a list on a property value

If you want to filter a list of customers by name you might start with an implementation like this:

  import static com.google.common.collect.Iterables.find;
  ...
  @Test
  public void findCustomerNameEqualToFred() {
    assertEquals("fred", find(customers, nameEqualTo("fred")).getName());
  }
 
  private Predicate<Customer> nameEqualTo(final String value) {
    return new Predicate<Customer>() {
      public boolean apply(Customer customer) {
        return customer.getName().equals(value);
      }
    };
  }

Inside this predicate we are converting our customer to a name and then performing an equals on that name. But again this is the sort of code your going to be writing a lot so we need something that’s a bit more reusable.

Google collections provides a handy way to for us to combine existing functions and predicates in interesting ways that help promote code re-use.

Using composition we can combine our existing function for mapping customers to names with a predicate for matching strings like this:

  import static com.google.common.base.Predicates.compose;
  import static com.google.common.base.Predicates.equalTo;
  ...
  private Predicate<Customer> nameEqualTo(String value) {
    return compose(equalTo(value), toName);
  }

If you are using compacted-collections you can achieve a similar result with the convenience methods that do the composition for you:

  import static com.compactcode.FluentList.fluent;
  import static com.google.common.base.Predicates.equalTo;
  ...
  @Test
  public void findCustomerNameEqualToFred() {
    assertEquals("fred", customers.find(toName, equalTo("fred")).getName());
  }

Or, if you wanted to avoid using composition altogether you could just use a Hamcrest matcher that achieves the same result:

  import static com.compactcode.FluentList.fluent;
  import static org.hamcrest.beans.HasPropertyWithValue.hasProperty;
  import static org.hamcrest.core.IsEqual.equalTo;
  ...
  @Test
  public void findCustomerNameEqualToFred() {
    Matcher<Customer> isFred = hasProperty("name", equalTo("fred"));
    assertEquals("fred", customers.find(isFred).getName());
  }

Well, that’s all I’ve got for now. I don’t really feel like I’ve done these topics justice, especially composition but perhaps it is enough of a spark to start a fire 🙂

Google collections made easier

In java on April 19, 2010 at 10:20 pm

Due to legal/ethical objections around trademark violation the project presented here has been renamed to compacted-collections.

I spent a few hours over the last two weeks working on a little project that aims to make google collections even easier to use. I set out to make a thin wrapper that mimics the way the ruby programming language does collections.

The result is compacted-collections and this is what you can do with it…

Sorting:

import static com.compactcode.FluentList.fluent;

fluent(2, 1, 3, 4).sort();
fluent(1, 2, 3, 4).reverse();

Filtering:

import static com.compactcode.FluentList.fluent;
import static org.hamcrest.number.OrderingComparisons.greaterThan;
import static org.hamcrest.text.StringStartsWith.startsWith;
import static com.google.common.base.Predicates.equalTo;

fluent("foo", "bar", "baz").filter(startsWith("ba"));
fluent(1, 2, 3, 4).filter(greaterThan(2));
fluent(1, 2, 3, 4).find(equalTo(3));
fluent(1, 2, 3, 4).first();
fluent(1, 2, 3, 4).last();

In order to provide a bunch of out of the box predicates the project has been tightly integrated with the Hamcrest project. You can use hamcrest matchers interchangeably with predicates.

Transformation:

import static com.compactcode.FluentList.fluent;
import static com.compactcode.Functions.toInt;
import static com.compactcode.Functions.sum;

fluent("1", "2", "3", "4").map(toInt());
fluent(1, 2, 3, 4).reduce(sum());

Like google collections everything statically typed, so you will get instant feedback about your mistakes. The really nice thing though is that the interface is fluent so you can chain together operations in a nice readable way.

Transformation + Filtering + Sorting:

import static com.compactcode.FluentList.fluent;
import static com.compactcode.Functions.toInt;
import static org.hamcrest.number.OrderingComparisons.greaterThan;

fluent( "4", "1", "2").map(toInt()).filter(greaterThan(1)).sort();

Ignoring the imports it doesn’t really read much different from a pure ruby implementation…

["4", "1", "2"].map{|it| it.to_i}.select{|it| it &gt; 1}.sort()

If you’re interested in having a look the code it’s all available on github. You can clone it, download it or fork it to your hearts content. The project is built using maven so it should be easy to get it working in your favourite IDE.

I’ll be posting more about the features of the library over the next weeks so if you like what you see and you’d like something explained or improved, let me know.

Enjoy!