Hello Newman: A REST Client for Scala

By Aaron Schlesinger

Hi everyone, I’m Aaron. I recently came to PayPal from StackMob, where I co-created the Newman project. Our original goal was to unify high-quality JVM HTTP clients behind a single, idiomatic Scala interface. I believe we’ve accomplished that, and we’ve moved on to higher-level features that make Newman a great choice for talking to RESTful services.

I recently gave a Newman talk at the Scala Bay Meetup in Mountain View, CA. A big thanks to everyone who came. I really appreciated all the great questions and feedback!

For those who missed my talk, I’ll give a recap here, describe Newman in more detail, and talk about some future plans. You can also find all the code samples and slides from the talk at https://github.com/arschles/newman-example.

Background & Motivation

At StackMob, we ran a service-oriented architecture to power our products. To build out that architecture, we ran a distributed system composed of many services inside our firewall.

Every service in the system communicated over the wire using HTTP for transport and JSON for serialization. The challenge we faced was this: how do we easily and flexibly send and receive JSON data over HTTP in Scala? We faced the same challenge when building both servers and clients.

When we began surveying existing HTTP clients, we turned to the massive JVM community for high-quality clients that could handle our needs. We found a lot of them! I’ll highlight two with which we gained significant experience.

Apache HttpClient

When we looked at the Apache Software Foundation’s projects, we found HttpClient. As expected, HttpClient is very high quality. We used this library for much of our initial work, but we hit a usability problem: it took too much code to make a simple request. The code below shows the setup and execution logic for a GET request:

import java.net.URL
import org.apache.http.client.methods.HttpGet
import org.apache.http.conn.ClientConnectionManager
import org.apache.http.impl.client.{AbstractHttpClient, DefaultHttpClient}
import org.apache.http.impl.conn.PoolingClientConnectionManager
import org.apache.http.params.HttpConnectionParams

/**
 * set up a connection manager and client.
 * you'd normally only do this once in your module or project.
 * maxConnectionsPerRoute, maxTotalConnections, connectionTimeout and
 * socketTimeout are Int settings defined elsewhere.
 */
val connManager: ClientConnectionManager = {
  val cm = new PoolingClientConnectionManager()
  cm.setDefaultMaxPerRoute(maxConnectionsPerRoute)
  cm.setMaxTotal(maxTotalConnections)
  cm
}
val httpClient: AbstractHttpClient = {
  val client = new DefaultHttpClient(connManager)
  val httpParams = client.getParams
  HttpConnectionParams.setConnectionTimeout(httpParams, connectionTimeout)
  HttpConnectionParams.setSoTimeout(httpParams, socketTimeout)
  client
}

/**
 * now make the actual GET request
 */
val req = new HttpGet
val url = new URL("http://paypal.com")
req.setURI(url.toURI)
val headers: List[(String, String)] = ???
//copy every header except Content-Type onto the request
headers.foreach { case (name, value) =>
  if (!name.equalsIgnoreCase("Content-Type")) req.addHeader(name, value)
}
//HttpGet has no setEntity method, so you can't attach a request body here;
//HttpPost, HttpPut and the other entity-enclosing request classes accept one
val resp = httpClient.execute(req)

Twitter Finagle

Finagle is Twitter’s core library for building distributed systems. The company has built almost all of its distributed systems infrastructure on top of this library. Furthermore, Finagle is built around a major abstraction that one of its creators calls a service. See this paper for more.

Finagle is built atop the Netty project, so we expected Finagle to handle high-concurrency workloads, which mattered in many of our use cases. We had also used Netty directly to build some of our servers and found it stable, with a good community. Finagle proved similar. For more on Finagle and Netty at Twitter, check out the recent Twitter blog posts.

Building HTTP clients with Finagle required less overall code than with the Apache library, but it was still somewhat involved. The following shows the setup and execution code for the same GET request as above:

import java.net.URL
import com.twitter.finagle.builder.ClientBuilder
import com.twitter.finagle.http.{Http, RequestBuilder}
import com.twitter.util.Future
import org.jboss.netty.buffer.ChannelBuffer
import org.jboss.netty.handler.codec.http.{HttpMethod, HttpResponse}

//Set up the client. It's bound to one host, given as host:port.
val host = "paypal.com:80"
val url = new URL("http://paypal.com/")
val client = ClientBuilder()
  .codec(Http())
  .hosts(host)
  .hostConnectionLimit(1) //required before build(); there are more params you can set here
  .build()

//Execute the request.
//Make sure the request is going to the same host
//as the client is bound to
val headers: Map[String, String] = ???
val method: HttpMethod = HttpMethod.GET
//this is an org.jboss.netty.buffer.ChannelBuffer
val channelBuf: ChannelBuffer = ???
val req = RequestBuilder()
  .url(url)
  .addHeaders(headers)
  //oops, sending a request body with a GET request doesn't make sense
  .build(method, Some(channelBuf))
val respFuture: Future[HttpResponse] = client.apply(req)

respFuture.ensure {
  client.close() //don't forget!
}

In Summary

In our search, we looked at other libraries as well, and found common patterns across all of them:

  1. HTTP libraries on the JVM tend to be very stable and well tested, or built atop very stable and well tested core libraries.
  2. You usually have to write setup and cleanup code.
  3. It usually takes at least 5 lines of code to execute a request.
  4. The plain Java libraries (obviously) require you to write non-idiomatic Scala.

Overall, the libraries we found required us to remember a lot of code, common patterns and sometimes implementation details. With so much to remember, we saw two options: commit to a single library, or write a wrapper around each one we wanted to use.

In Comes Newman

Newman started as an informal wrapper around Apache HttpClient. As our codebase grew and evolved, we needed to use other clients, and we knew we had to formalize our original wrapper into a stable interface that hides the messy details of each implementation.

We began with the core interface and two implementations: ApacheHttpClient and FinagleHttpClient. After we deployed code using our first Newman clients, we found more benefits to the core abstraction:

  1. Safety – We iterated on the interface and used Scala’s powerful type system to enforce various rules of HTTP and REST. We’re now at a point where our users can’t compile code that attempts to execute various types of invalid HTTP requests.
  2. Performance – Behind the interface, we added various levels of caching and experimented with connection pooling mechanisms, timeouts, and more to extract the best performance from Newman based on our workloads. We didn’t have to change any code on the other side of the abstraction.
  3. Concurrency – Regardless of the underlying implementation, executing a request returns standard Scala Futures that contain the response. This pattern helps ensure that code doesn’t block on downstream services. It also ensures we can interoperate with other Scala frameworks like Akka or Spray. The Scala community has a lot of great literature on Futures, so I’ll defer to those resources instead of repeating things. The Reactive Manifesto begins to explain some reasoning behind Futures (and more!) and the standard Scala documentation on Futures shows some usage patterns.
  4. Extensibility – Our environments and workloads change, so our clients must too. To adapt, we only need to switch clients, which takes one line of code. We also made the core client interface in Newman very easy to extend, so we can implement a new client quickly and spend more time getting its performance right.
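The safety and concurrency points above can be illustrated with a small sketch. It does not use Newman: `Request`, `Get`, `Post`, `Put`, `Client` and `EchoClient` are hypothetical names, and `EchoClient` builds its response locally instead of touching the network. Assuming requests are modeled as separate types, a GET simply has no body parameter, so a GET with a body cannot be written, and every client returns a standard Scala `Future` that callers transform with `map` rather than blocking on.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical types for illustration; this is not Newman's actual API.
final case class Response(code: Int, body: String)

sealed trait Request {
  def url: String
  def headers: List[(String, String)]
}

// GET has no body parameter, so "a GET with a body" cannot even be constructed.
final case class Get(url: String, headers: List[(String, String)] = Nil) extends Request

// Requests that carry data must be given a body.
final case class Post(url: String, body: Array[Byte], headers: List[(String, String)] = Nil) extends Request
final case class Put(url: String, body: Array[Byte], headers: List[(String, String)] = Nil) extends Request

// Every implementation hands back a standard Scala Future.
trait Client {
  def execute(req: Request): Future[Response]
}

// Stand-in implementation that builds a response locally instead of using the network.
final class EchoClient extends Client {
  def execute(req: Request): Future[Response] = Future {
    req match {
      case Get(url, _)        => Response(200, s"GET $url")
      case Post(url, body, _) => Response(201, s"POST $url ${body.length} bytes")
      case Put(url, body, _)  => Response(200, s"PUT $url ${body.length} bytes")
    }
  }
}

val client: Client = new EchoClient
// Get("http://example.com", Nil, "abc".getBytes("UTF-8")) would not compile.
// Transform the response without blocking the calling thread.
val codeF: Future[Int] = client.execute(Get("http://example.com")).map(_.code)

// Blocking here only so this demo can print the result.
println(Await.result(codeF, 1.second))
```

Because every implementation shares the `Client` trait, code written against it does not care which concrete client is behind it.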

Higher Level Features

Once we had our basic architecture figured out and tested, it looked like this:

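[Figure: slide 23 from the talk, showing the Newman architecture: an HttpClient creates HttpRequests, and each HttpRequest produces a Future[HttpResponse]]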

A few notes about this architecture:

  • HttpClient is heavy – it handles various caching tasks, complex concurrency tasks (running event loops and maintaining thread pools, for example), and talking to the network.
  • HttpClient creates HttpRequests – each HttpRequest is very small and light. It contains a pointer back to the client that created it, so it’s common to have many requests for one client.
  • HttpRequest creates Future[HttpResponse] – the Future[HttpResponse] is tied to the HttpClient that is executing the request. That Future will be completed when the response comes back into the client.
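The client/request/future relationship described in these notes can be sketched as follows. This is not Newman’s implementation: `HttpClient`, `HttpRequest`, `HttpResponse`, `get` and `run` are hypothetical names, and the network call is simulated. The client owns the expensive resource (here, just a thread pool), each request is a small value holding a pointer back to its client, and `apply()` asks that client to execute it, returning a `Future` that the client completes.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical sketch, not Newman's code.
final case class HttpResponse(code: Int, body: String)

// Heavy: owns the thread pool (a real client would also own connection
// pools, caches, event loops, ...). Create one and share it.
final class HttpClient(threads: Int) {
  private val pool = Executors.newFixedThreadPool(threads)
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  val executed = new AtomicInteger(0)

  // Cheap: a request only captures its data plus a pointer back to this client.
  def get(url: String, headers: Map[String, String] = Map.empty): HttpRequest =
    HttpRequest(this, "GET", url, headers)

  // The client runs the request on its own pool; the returned Future is
  // completed when the (simulated) response comes back.
  def run(req: HttpRequest): Future[HttpResponse] = Future {
    executed.incrementAndGet()
    HttpResponse(200, s"${req.method} ${req.url}")
  }

  def shutdown(): Unit = pool.shutdown()
}

// Light: many requests can point at the same client.
final case class HttpRequest(client: HttpClient, method: String, url: String, headers: Map[String, String]) {
  def apply(): Future[HttpResponse] = client.run(this)
}

val client = new HttpClient(threads = 4)
val requests = (1 to 3).map(i => client.get(s"http://example.com/item/$i"))
// Blocking only to collect results for this demo.
val responses = requests.map(r => Await.result(r.apply(), 1.second))
client.shutdown()
```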

With this architecture, we proved to ourselves in production that we had a consistent, safe and performant HTTP client library. Our ongoing task is to build features that make building and running systems easier for everyone who uses Newman. Here are a few higher-level features that Newman has now:

  • Caching – Newman has an extensible caching mechanism that plugs into its clients. You define your caching strategy (when to cache) and backend (how and where to store cached data) by implementing interfaces, then plug them into a caching HttpClient as necessary. This extensible design also makes it possible to build cache hierarchies. So far we’ve built an ETag caching strategy, a simple read-through caching strategy, and an in-memory caching backend, all of which ship with Newman.
  • JSON – As I mentioned at the beginning of this post, we use JSON extensively as our data serialization format over the wire, so we built it into Newman as a first-class feature. Newman supports full JSON serialization and deserialization to and from arbitrary types. Since JSON operations are built into the request and response interfaces, all client implementations get JSON functionality “for free.”
  • DSL – We built a domain-specific language into Newman that makes even complex requests possible to create and execute in one line of code. The same goes for reading, deserializing, and decoding responses. The DSL is standard Scala and adds more type safety on top of core Newman. DSL code has become the canonical way to use Newman.
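The split between a caching strategy (when to cache) and a caching backend (how and where to store data) might look something like the following. This is a minimal sketch with hypothetical names (`CachingStrategy`, `CacheBackend`, `SuccessfulGets`, `InMemoryBackend`, `CachingClient`, `CountingClient`), not Newman’s actual interfaces. It implements only a read-through strategy over an in-memory map; ETag handling is omitted.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch of pluggable caching; names are illustrative, not Newman's API.
final case class Request(method: String, url: String)
final case class Response(code: Int, body: String)

trait Client { def execute(req: Request): Future[Response] }

// Strategy: *when* to cache.
trait CachingStrategy {
  def isCacheable(req: Request, resp: Response): Boolean
}

// Backend: *how and where* cached data is stored.
trait CacheBackend {
  def get(key: Request): Option[Response]
  def put(key: Request, value: Response): Unit
}

// Cache only successful GET responses.
object SuccessfulGets extends CachingStrategy {
  def isCacheable(req: Request, resp: Response): Boolean =
    req.method == "GET" && resp.code == 200
}

final class InMemoryBackend extends CacheBackend {
  private val store = TrieMap.empty[Request, Response]
  def get(key: Request): Option[Response] = store.get(key)
  def put(key: Request, value: Response): Unit = store.put(key, value)
}

// Read-through: serve from the backend on a hit; otherwise call the wrapped
// client and store the response if the strategy allows it.
final class CachingClient(underlying: Client, strategy: CachingStrategy, backend: CacheBackend) extends Client {
  def execute(req: Request): Future[Response] =
    backend.get(req) match {
      case Some(cached) => Future.successful(cached)
      case None =>
        underlying.execute(req).map { resp =>
          if (strategy.isCacheable(req, resp)) backend.put(req, resp)
          resp
        }
    }
}

// A fake network client that counts how often it is called.
final class CountingClient extends Client {
  @volatile var calls = 0
  def execute(req: Request): Future[Response] = Future.successful {
    calls += 1
    Response(200, s"${req.url} #$calls")
  }
}

val network = new CountingClient
val cached = new CachingClient(network, SuccessfulGets, new InMemoryBackend)
val first = Await.result(cached.execute(Request("GET", "/users/1")), 1.second)
val second = Await.result(cached.execute(Request("GET", "/users/1")), 1.second) // served from the cache
```

Because `CachingClient` is itself a `Client`, one caching client can wrap another, which is one way a cache hierarchy could be assembled under these assumptions.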

The Result

Newman abstracts away the basics of RPC. For example, we were able to replace 10+ lines of code with the following (excluding imports and comments in both cases):

implicit val client = new ApacheHttpClient() //or swap out for another
GET(url(http, "paypal.com")).addHeaders("hello" -> "readers").apply

In most cases, this code is safer than what it replaced, and the setup and teardown logic is written once and encapsulated inside the client. We have been pleased with Newman so far, and we expect the next steps to make it more powerful and useful for everyone.
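The one-line swap in the Newman example above works because the client is passed implicitly. The sketch below shows that pattern in isolation; it is not Newman’s DSL, and `GET`, `GetBuilder`, `FakeApacheClient` and `FakeFinagleClient` are hypothetical stand-ins that answer locally. The request-building line is identical in both blocks; only the `implicit val` changes.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical sketch of an implicit-client DSL; not Newman's actual API.
final case class Response(code: Int, servedBy: String, headers: Seq[(String, String)])

trait HttpClient {
  def get(url: String, headers: Seq[(String, String)]): Future[Response]
}

// Stand-ins for real implementations; they answer without touching the network.
final class FakeApacheClient extends HttpClient {
  def get(url: String, headers: Seq[(String, String)]): Future[Response] =
    Future.successful(Response(200, "apache", headers))
}
final class FakeFinagleClient extends HttpClient {
  def get(url: String, headers: Seq[(String, String)]): Future[Response] =
    Future.successful(Response(200, "finagle", headers))
}

// GET captures whichever client is in implicit scope at the call site.
final case class GetBuilder(url: String, headers: Seq[(String, String)], client: HttpClient) {
  def addHeaders(hs: (String, String)*): GetBuilder = copy(headers = headers ++ hs)
  def apply(): Future[Response] = client.get(url, headers)
}
def GET(url: String)(implicit client: HttpClient): GetBuilder = GetBuilder(url, Nil, client)

val viaApache = {
  implicit val client: HttpClient = new FakeApacheClient // or swap out for another
  Await.result(GET("http://paypal.com").addHeaders("hello" -> "readers").apply(), 1.second)
}
val viaFinagle = {
  implicit val client: HttpClient = new FakeFinagleClient
  Await.result(GET("http://paypal.com").addHeaders("hello" -> "readers").apply(), 1.second)
}
```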

The Future

We have a long list of plans for Newman. Our roadmap is open on GitHub at https://github.com/stackmob/newman/issues?state=open. The code is also open source, licensed under Apache 2.0. Read the code, file issues, request features, and submit pull requests at http://github.com/stackmob/newman.

Finally, if this kind of distributed systems work excites you: we build very large-scale, highly available distributed systems, and we’re hiring. If you’re interested, send me a GitHub, Twitter or LinkedIn message. Regardless, happy coding.

– Aaron Schlesinger – https://github.com/arschles | https://twitter.com/arschles | http://www.linkedin.com/profile/view?id=15144078


Aaron Schlesinger

I am a Sr. Member of the Technical Staff on the platform team here at PayPal. I like studying concurrent systems and concepts, and I pay special attention to how that world applies to distributed systems.

In my free time, I enjoy playing soccer and running.