Learnings from Using a Reactive Platform – Akka/Squbs

By

Introduction

Simple and reliable APIs are important for winning customer trust in all industries. This is especially true when the API is a customer facing endpoint that accepts payment instructions. Add to that the ability to submit 15,000 payments in a single request and reliability becomes critical.

Let’s talk a bit about Payouts – a PayPal product through which users can submit mass payment requests. It’s an asynchronous interaction – a request with multiple payment instructions returns a token. The caller can then use the token to poll for the payment outcomes or listen for webhooks.

As you can imagine, customers like to be sure we got each request right and want to be kept up to date on the status of the money movements. The actual request processing involves orchestrating calls to a slew of services ranging from ones that provide information on recipients to services that perform the actual money movement.

We recently finished work on improving the features offered by the product and on further improving its reliability. One of the changes we made was to use Akka to build the payment orchestration engine as a data flow pipeline. Discussed below are some learnings from that exercise.

Concurrency – Harder Than It Needs to Be?

Object-oriented design does a great job of capturing the static aspects of a real world system with entities, their attributes and relationships. These aspects can be translated directly to Java artifacts – classes with variables using inheritance and composition.

Real world entities also have a dynamic nature to them – the way in which they come to life over the lifetime of the application and how they interact with each other. Java unfortunately does not have a sufficiently nuanced abstraction for representing such changes in states of entities over time or their degree of concurrency. Instead, we have a CPU’s (hardware) model of the world – threads and processors, that has no equivalent in the real world. This results in an impedance mismatch that the programmer is left to reconcile.

Some improvements in the building blocks for representing concurrent aspects came with Java 8. While we didn’t get a better model, we did get some elegant constructs that reduced the pain of using Java for such tasks.

Lambdas made multithreaded Java code actually readable. Java 8 introduced the ability to compose futures.

And, we got streams. Streams trace their roots back to a programming paradigm from the 1960s called data flow programming. The model lets us compose a series of transformations on a dataset as a pipeline. While we concentrate on the logical composition of a data processing job, the framework is tasked with figuring out the physical orchestration of parallel processing for the job.

Java’s implementation of streams was designed with the expectation that it would not have to deal with high latency (typically IO). This makes it unsuitable for building services. But, reactive frameworks such as Akka provide implementations that account for high latency tasks such as service invocations within a stream.

Data flow programming using streams and the actor model provide great conceptual models for reasoning about the dynamic aspects of a real world system. These models represent a paradigm shift once adopted. We can use them to represent concurrency using abstractions that are intuitive. The same model can be represented in Java as well and brought to life by the underlying platform, without developers having to manually code for concurrency using lower level primitives.

Learnings From Using Reactive Streams and the Actor Model

To set the context, the primary design goal of the system is to reliably accept and process payments. That means we maintain an accurate record of progress and prioritize features such as check pointing & auto-recovery over response time and throughput. That does not mean the solution can be “slow” though. The difference in emphasis only means we need to get more done in the same amount of time.

Below are some of the learnings we had over the course of the project as it relates to reactive streams and using the actor model. The ideas come from a variety of sources including other teams who had used Akka before us, the team which maintains Squbs (Akka with PayPal operationalization) as well as our own experience.

Foundational Patterns for Reuse

Over time, there are a bunch of stream patterns that recur throughout the application. It helps to create a pattern library as we encounter them.

  • Select between two flows based on a predicate
  • Select one of many flows based on a decider function
  • Bypass running a flow and short circuit if an error was encountered in a previous stage
  • Retry an operation based on a decider (see one from Squbs here)
  • Akka provides some patterns as well – akka-stream-contrib

An example of a flow selector that selects one of many flows based on a decider function would be:

public static <M, N> Flow<M, N, NotUsed> selectByIndex(ActorSystem system, Function<M, Integer> numericPartitioner, Flow<M, N, NotUsed>... innerFlows) {
 return Flow.fromGraph(GraphDSL.create(builder -> {
   // given a data element in the stream, return the index of the flow to be used to process it
   UniformFanOutShape<M, M> top = builder.add(Partition.<M>create(innerFlows.length, msg -> {
     int partition = numericPartitioner.apply(msg);
     return partition;
   }));
   // merge the output of the individual flows back to the primary flow
   // this is made possible by the use of a homogeneous envelope type across the flow
   akka.stream.scaladsl.MergePreferred.MergePreferredShape<N> bottom = builder.add(MergePreferred.<N>create(innerFlows.length - 1));
   int index = 0;
   for (Flow<M, N, NotUsed> flow : innerFlows) {
   FlowShape<M, N> tgt = builder.add(flow);
   builder.from(top.out(index)).toInlet(tgt.in());
   //last flow is the preferred flow
   builder.from(tgt.out()).toInlet((index + 1 == innerFlows.length) ? bottom.preferred() : bottom.in(index));
   index++;
 }
 return FlowShape.of(top.in(), bottom.out());
 }));
}

While individually most of these helpers are not complex to create, they save a lot of effort for the team over time. This is especially true for ones such as a predicate based selector that show up everywhere. These otherwise would have to be written as graph stages. With helpers, we can now use flow composition to put them to use and focus on writing and certifying the business logic alone.

Failure is the Message

It helps a great deal to use a homogenous data type to transfer information on success and failure down the stream without aborting processing. To achieve that, we model stream elements as envelopes with business information and metadata about the status of previous transformations.

For example, say we could not reach a service as part of a stream stage. We can embed a Try or an optional exception in the current message when that happens. A subsequent transformation may choose to not run and short-circuit on seeing the failure while a stage that updates the database could choose to participate to record the outcome. A stream stage’s behavior of not participating in processing if an exception was recorded should be provisioned via flow composition rather than having each stage run the same check.

Use Actors when Warranted

Akka streams should serve as the default model for most use cases. Code modelled as a stream is not only performant but also simple to create and maintain. But, sometimes actors provide advantages that are difficult to achieve otherwise.

The primary use case for actors is, of course, to maintain state and to perform tasks that benefit from supervision. Actors can work with a single stream or across multiple streams for use cases such as aggregation, performance or business statistics and even centralized logging via an event bus. They are also a natural choice when you want to separate processing of non-critical tasks away from a critical pipeline. They are also useful for implementing blocking tasks (on a separate thread pool) especially if we don’t need feedback on the outcome.

Stream Decomposition Using Actors

There are advantages in decomposing a large stream into smaller streams that are anchored in chained actors. We used such a model to reduce the solution complexity while servicing the requirement to checkpoint and auto-recover. Each of those actors act as a checkpoint from where processing can resume in case of a failure.

We can model check pointing within the stream as well. But, using chained actors with smaller and more focused streams made the code simpler to write and easier to maintain. This is because the branching introduced by features such as check pointing are orthogonal to business logic. They compound the complexity already being expressed in the stream and tend to not be amenable to local optimizations like flow composition.

If you are designing for high velocity or performance, you are going to want to use a single stream. But, for most other applications, leveraging actors and streams together can help simplify the design of complex streams.

Platform Lock-in and Portability

If we can provide a reasonable level of isolation between the actual business logic and the code that strings them together and orchestrates the flow, we can retain a good amount of flexibility in moving to a different implementation of reactive streams or the actor model.

One area that needs more attention though is Akka HTTP. One of Akka’s key strengths as a reactive platform is the way in which Akka HTTP is tied in with Akka Streams. Together they deliver a lot of value for developers as a simple cohesive package. But, in the context of portability, it makes sense for us to build a layer of abstraction on top of the Akka HTTP client. This serves two purposes –

  • Airgap developers from Akka HTTP to enable changing the HTTP client provider at a later point in time
  • A layer of indirection enables provisioning a richer set of features that’s comparable to JAX-RS – marshalling/unmarshalling, custom retry. The flip side is we lose advanced features such as client side HTTP response streaming.

Conclusion

We’re in the midst of a significant transformation in the way we write Java applications. Functional programming in Java 8 is fairly well understood by now.  Reactive programming, given the massive adoption Java streams had, has established itself as indispensable. Reactive streams have been added to the Java 9 SDK.

Broadly, these are 3 class of applications that benefit from adopting a reactive framework.

  • Backend services – those with multiple downstream dependencies (micro-services) or with a need to orchestrate complex tasks
  • Engineering teams who want to leverage modern frameworks that abstract complexity in traditional programing – multithreading, non-blocking IO
  • Asynchronous or high velocity event streams – twitter, mobile application GUI

Akka is built for performance and ease of use. That’s a hard balance to achieve. It’s even harder to achieve on your own. Whether it’s concurrency or non-blocking IO, the Akka toolkit provides intuitive and performant models for developers to build upon.

In our case, we saw an 80% reduction in processing time with the new stack and a near 0% failure rate. Surely, as we turn on more features, there’s more to learn and improve. But, it’s clear that adopting reactive streams and the actor model has enabled us to service our customers better by providing a more reliable and performant API.

Balaji Srinivasaraghavan

Member of Technical Staff 2, PayPal