Tag Archives: Open Source

PayPal bttn for Commerce

By

In my first rotation of PayPal's Technology Leadership Program (TLP), I was fortunate enough to work on our Western Europe region out of our Paris office. The team there wanted to tap into the Internet of Things (IoT) market, and with PayPal's strategic movement from just a button on a website to existing across all contexts, including the offline world, it was clear that a physical button that integrates with our Braintree APIs was something worth investigating.

After some investigation, we found bt.tn, a start-up based out of Helsinki, Finland, with an innovative approach to buttons. Over the course of a few months, we had a working prototype, and today we have open-sourced this integration to allow anyone to integrate a bttn with PayPal. This technology allows a business to associate a customer in its Braintree vault with a physical button, and it can be used over cell data, wifi, or the Sigfox network.

This product realizes the full potential of bttn and a Braintree-enabled merchant account in a way that most people might not think of. To avoid running a database, the code leverages Braintree and bttn features that let us store the integration's state in those systems directly, and it does so with a very small amount of code.

This first comes into play in recording the status of a bttn. There are two custom Braintree fields that are important to discuss. bttn_status takes one of three values:

  • ACTIVE: able to be used by a consumer
  • ONBOARDED: the consumer has shown interest in receiving a bttn, or the bttn is being sent to the consumer
  • UNREGISTERED: the consumer no longer has a button linked to their account in the merchant's Braintree vault

bttn_code is the unique code that is generated by bttn after the bttn has been registered. An example of the function that updates bttn_status is supplied below.


function updateButtonStatusOnBraintree($braintreeCustomerId, $status)
{
  // Only the three statuses described above are valid.
  $statuses = array("ACTIVE", "ONBOARDED", "UNREGISTERED");
  if( !in_array($status, $statuses) )
  {
    die("Status: $status not in list of valid statuses.");
  }

  // Persist the status as a custom field on the Braintree customer record,
  // so no separate database is needed.
  $updateResult = Braintree_Customer::update(
    $braintreeCustomerId,
    [
      'customFields' => array(
        "bttn_status" => $status
      )
    ]
  );
  return $updateResult->success;
}

See web/lib/braintree.php for more details.

Once the bttn is associated with a consumer, a second call to the bttn API must be made to associate metadata with the bttn. This is where the Braintree Customer ID, bt_id, is stored along with charge_type and url. The url parameter is requested whenever the bttn is pressed; in our case, this happens to be process_button_push.php. The charge_type parameter indicates whether the payment should be fixed price (a single amount), reorder (charge the customer's previous order amount), or selection (send the customer a selection of things to purchase via email, SMS, or push notification).
We have worked with bttn to make getting started easy.

  1. Purchase a bttn and fill out this form to enable the bttn-for-commerce integration. You will receive an email from bttn with your BTTN_API_KEY and BTTN_API_MERCHANT_NAME.
  2. When you receive your button, register it.
  3. Apply the bttn-for-commerce action to your bttn.

More specific steps and screenshots are available in the paypal-bttn GitHub project, which is meant to be a proof of concept.

Python by the C side

By

C shells by the C shore

Mahmoud's note: This will be my last post on the PayPal Engineering blog. If you've enjoyed this sort of content, subscribe to my blog, pythondoeswhat.com, or follow me on Twitter. It's been fun!

All the world is legacy code, and there is always another, lower layer to peel away. These realities cause developers around the world to go on regular pilgrimage, from the terra firma of Python to the coasts of C. From zlib to SQLite to OpenSSL, whether pursuing speed, efficiency, or features, the waters are powerful, and often choppy. The good news is, when you’re writing Python, C interactions can be a day at the beach.

 

A brief history

As the name suggests, CPython, the primary implementation of Python used by millions, is written in C. Python core developers embraced and exposed Python’s strong C roots, taking a traditional tack on portability, contrasting with the “write once, debug everywhere” approach popularized elsewhere. The community followed suit with the core developers, developing several methods for linking to C. Years of these interactions have made Python a wonderful environment for interfacing with operating systems, data processing libraries, and everything the C world has to offer.

This has given us a lot of choices, and we’ve tried all of the standouts:

| Approach | Vintage | Representative User | Notable Pros | Notable Cons |
| --- | --- | --- | --- | --- |
| C extension modules | 1991 | Standard library | Extensive documentation and tutorials. Total control. | Compilation, portability, reference management. High C knowledge. |
| SWIG | 1996 | crfsuite | Generate bindings for many languages at once | Excessive overhead if Python is the only target. |
| ctypes | 2003 | oscrypto | No compilation, wide availability | Accessing and mutating C structures cumbersome and error prone. |
| Cython | 2007 | gevent, kivy | Python-like. Highly mature. High performance. | Compilation, new syntax and toolchain. |
| cffi | 2013 | cryptography, pypy | Ease of integration, PyPy compatibility | New/High-velocity. |

There’s a lot of history and detail that doesn’t fit into a table, but every option falls into one of three categories:

  1. Writing C
  2. Writing code that translates to C
  3. Writing code that calls into libraries that present a C interface

Each has its merits, so we’ll explore each category, then finish with a real, live, worked example.

Writing C

Python’s core developers did it and so can you. Writing C extensions to Python gives an interface that fits like a glove, but also requires knowing, writing, building, and debugging C. The bugs are much more severe, too, as a segmentation fault that kills the whole process is much worse than a Python exception, especially in an asynchronous environment with hundreds of requests being handled within the same process. Not to mention that the glove is also tailored to CPython, and won’t fit quite right, or at all, in other execution environments.

At PayPal, we’ve used C extensions to speed up our service serialization. And while we’ve solved the build and portability issue, we’ve lost track of our share of references and have moved on from writing straight C extensions for new code.

Translating to C

After years of writing C, certain developers decide that they can do better. Some of them are certainly onto something.

Going Cythonic

Cython is a superset of the Python programming language that has been turning type-annotated Python into C extensions for nearly a decade, longer if you count its predecessor, Pyrex. Apart from its maturity, the points that matter to us are:

  • Every Python file is a valid Cython file, enabling incremental, iterative optimization
  • The generated C is highly portable, building on Windows, Mac, and Linux
  • It’s common practice to check in the generated C, meaning that builders don’t need to have Cython installed.

Not to mention that the generated C often makes use of performance tricks that are too tedious or arcane to write by hand, partially motivated by scientific computing’s constant push. And through all that, Cython code maintains a high level of integration with Python itself, right down to the stack trace and line numbers.
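
Since every Python file is already valid Cython, the on-ramp can be as small as a build script. Here is a minimal sketch of our own (the module and its contents are hypothetical, not from our codebase) of compiling an ordinary Python module with Cython:

# fib.py -- ordinary Python, which is also valid Cython
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# setup.py -- turns fib.py into a C extension module
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("fib.py"))

Running python setup.py build_ext --inplace produces a compiled extension; cdef type annotations can then be layered in incrementally wherever profiling says they matter.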

PayPal has certainly benefitted from their efforts through high-performance Cython users like gevent, lxml, and NumPy. While our first go with Cython didn't stick in 2011, since 2015 all native extensions have been written and rewritten to use Cython. It wasn't always this way, however.

A sip, not a SWIG

An early contributor to Python at PayPal got us started using SWIG, the Simplified Wrapper and Interface Generator, to wrap PayPal C++ infrastructure. It served its purpose for a while, but every modification was a slog compared to more Pythonic techniques. It wasn’t long before we decided it wasn’t our cup of tea.

Long ago, SWIG may have rivaled extension modules as Python programmers' method of choice. These days it seems to suit the needs of C library developers looking for a fast and easy way to wrap their C bindings for multiple languages. It also says something that searching for SWIG usage in Python nets as many SWIG-replacement libraries as SWIG usage itself.

Calling into C

So far all our examples have involved extra build steps, portability concerns, and quite a bit of writing languages other than Python. Now we’ll dig into some approaches that more closely match Python’s own dynamic nature: ctypes and cffi.

Both ctypes and cffi leverage C’s Foreign Function Interface (FFI), a sort of low-level API that declares callable entrypoints to compiled artifacts like shared objects (.so files) on Linux/FreeBSD/etc. and dynamic-link libraries (.dll files) on Windows. Shared objects take a bit more work to call, so ctypes and cffi both use libffi, a C library that enables dynamic calls into other C libraries.

Shared libraries in C have some gaps that libffi helps fill. A Linux .so, Windows .dll, or OS X .dylib is only going to provide symbols: a mapping from names to memory locations, usually function pointers. Dynamic linkers do not provide any information about how to use these memory locations. When dynamically linking shared libraries to C code, header files provide the function signatures; as long as the shared library and application are ABI compatible, everything works fine. The ABI is defined by the C compiler, and is usually carefully managed so as not to change too often.

However, Python is not a C compiler, so it has no way to properly call into C even with a known memory location and function signature. This is where libffi comes in. If symbols define where to call the API, and header files define what API to call, libffi translates these two pieces of information into how to call the API. Even so, we still need a layer above libffi that translates native Python types to C and vice versa, among other tasks.

ctypes

ctypes is an early and Pythonic approach to FFI interactions, most notable for its inclusion in the Python standard library.

ctypes works, it works well, and it works across CPython, PyPy, Jython, IronPython, and most any Python runtime worth its salt. Using ctypes, you can access C APIs from pure Python with no external dependencies. This makes it great for scratching that quick C itch, like a Windows API that hasn’t been exposed in the os module. If you have an otherwise small module that just needs to access one or two C functions, ctypes allows you to do so without adding a heavyweight dependency.

For a while, PayPal Python code used ctypes after moving off of SWIG. We found it easier to call into vanilla shared objects built from C++ with an extern "C" interface rather than deal with the SWIG toolchain. ctypes is still used incidentally throughout the code for exactly this: unobtrusively calling into certain shared objects that are widely deployed. A great open-source example of this use case is oscrypto, which does exactly this for secure networking. That said, ctypes is not ideal for huge libraries or libraries that change often. Porting signatures from headers to Python code is tedious and error-prone.
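
To make the trade-off concrete, here is a minimal ctypes sketch, assuming a Linux machine with the standard math library available; the signature must be transcribed by hand, which is exactly the tedium that bites on large interfaces:

import ctypes

# Load the math shared library (assumes Linux; the name differs on other OSes)
libm = ctypes.CDLL("libm.so.6")

# Port the C signature by hand: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # prints 1.0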

cffi

cffi, our most modern approach to C integration, comes out of the PyPy project. They were seeking an approach that would lend itself to the optimization potential of PyPy, and they ended up creating a library that fixes many of the pains of ctypes. Rather than handcrafting Python representations of the function signatures, you simply load or paste them in from C header files.

For all its convenience, cffi’s approach has its limits. C is really almost two languages, taking into account preprocessor macros. A macro performs string replacement, which opens a Fun World of Possibilities, as straightforward or as complicated as you can imagine. cffi’s approach is limited around these macros, so applicability will depend on the library with which you are integrating.

On the plus side, cffi does achieve its stated goal of outperforming ctypes under PyPy, while remaining comparable to ctypes under CPython. The project is still quite young, and we are excited to see where it goes next.
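
For comparison, the same call through cffi's ABI mode, again assuming a Linux system; the declaration is pasted straight from the header rather than rebuilt out of Python objects:

import cffi

ffi = cffi.FFI()
# Paste the declaration straight from the C header
ffi.cdef("double cos(double x);")
libm = ffi.dlopen("libm.so.6")

print(libm.cos(0.0))  # prints 1.0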

A Tale of 3 Integrations: PKCS11

We promised an example, and we almost made it three.

PKCS11 is a cryptography standard for interacting with many hardware and software security systems. The 200-plus-page core specification includes many things, including the official client interface: a large set of C header-style information. There are a variety of pre-existing bindings, but each device has its own vendor-specific quirks, so what are we waiting for?

Metaprogramming

As stated earlier, ctypes is not great for sprawling interfaces. The drudgery of converting function signatures invites transcription bugs. We somewhat automated it, but the approach was far from perfect.

Our second approach, using cffi, worked well for our first version's supported feature subset, but unfortunately PKCS11 uses its own CK_DECLARE_FUNCTION macro instead of regular C syntax for defining functions. Therefore, cffi's approach of skipping #define macros results in syntactically invalid C code that cannot be parsed. On the other hand, there are other macro symbols which are compiler or operating system intrinsics (e.g. __cplusplus, _WIN32, __linux__). So even if cffi attempted to evaluate every macro, we would immediately run into problems.

So in short, we’re faced with a hard problem. The PKCS11 standard is a gnarly piece of C. In particular:

  1. Many hundreds of important constant values are created with #define
  2. Macros are defined, then re-defined to something different later on in the same file
  3. pkcs11f.h is included multiple times, even once as the body of a struct

In the end, the solution that worked best was to write a rigorous parser for the particular conventions used by the slow-moving standard and generate Cython, which generates C, which finally gives us access to the complete client, with an added performance bonus in certain cases. Biting this bullet took all of a day and a half, and we've been very satisfied with the result; it's all thanks to a special trick up our sleeves.

Parsing Expression Grammars

Parsing expression grammars (PEGs) combine the power of a true parser generating an abstract syntax tree, not unlike the one used for Python itself, with the convenience of regular expressions. One might think of PEGs as recursive regular expressions. There are several good libraries for Python, including parsimonious and parsley. We went with the former for its simplicity.
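
To give a flavor of the workflow, here is a toy parsimonious example of our own devising, far simpler than the real PKCS11 grammars below: define a grammar, parse the text into an AST, and walk it with a NodeVisitor to pull out the pieces you care about.

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

# A toy grammar: pick out C-style "#define NAME <int>" lines, skip the rest
grammar = Grammar(r"""
    file    = (define / junk)*
    define  = "#define " name " " value nl
    junk    = ~"[^\n]*\n"
    name    = ~"[A-Z_][A-Z0-9_]*"i
    value   = ~"[0-9]+"
    nl      = ~"\n"
""")

class DefineCollector(NodeVisitor):
    def __init__(self):
        self.defines = {}

    def visit_define(self, node, visited_children):
        # children of a define: literal, name, literal, value, newline
        name, value = node.children[1], node.children[3]
        self.defines[name.text] = int(value.text)

    def generic_visit(self, node, visited_children):
        return node

tree = grammar.parse("#define CKR_OK 0\n/* comment */\n#define CKR_CANCEL 1\n")
collector = DefineCollector()
collector.visit(tree)
print(collector.defines)   # {'CKR_OK': 0, 'CKR_CANCEL': 1}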

For this application, we defined two grammars, one for pkcs11f.h and one for pkcs11t.h:

PKCS11F GRAMMAR

    file = ( comment / func / " " )*
    func = func_hdr func_args
    func_hdr = "CK_PKCS11_FUNCTION_INFO(" name ")"
    func_args = arg_hdr " (" arg* " ); #endif"
    arg_hdr = " #ifdef CK_NEED_ARG_LIST" (" " comment)?
    arg = " " type " " name ","? " " comment
    name = identifier
    type = identifier
    identifier = ~"[A-Z_][A-Z0-9_]*"i
    comment = ~"(/\*.*?\*/)"ms

PKCS11T GRAMMAR

    file = ( comment / define / typedef / struct_typedef / func_typedef / struct_alias_typedef / ignore )*
    typedef = " typedef" type identifier ";"
    struct_typedef = " typedef struct" identifier " "? "{" (comment / member)* " }" identifier ";"
    struct_alias_typedef = " typedef struct" identifier " CK_PTR"? identifier ";"
    func_typedef = " typedef CK_CALLBACK_FUNCTION(CK_RV," identifier ")(" (identifier identifier ","? comment?)* " );"
    member = identifier identifier array_size? ";" comment?
    array_size = "[" ~"[0-9]"+ "]"
    define = "#define" identifier (hexval / decval / " (~0UL)" / identifier / ~" \([A-Z_]*\|0x[0-9]{8}\)" )
    hexval = ~" 0x[A-F0-9]{8}"i
    decval = ~" [0-9]+"
    type = " unsigned char" / " unsigned long int" / " long int" / (identifier " CK_PTR") / identifier
    identifier = " "? ~"[A-Z_][A-Z0-9_]*"i
    comment = " "? ~"(/\*.*?\*/)"ms
    ignore = ( " #ifndef" identifier ) / " #endif" / " "

Short, but dense, in true grammatical style. Looking at the whole program, it’s a straightforward process:

  1. Apply the grammars to the header files to get our abstract syntax tree.
  2. Walk the AST and sift out the semantically important pieces, function signatures in our case.
  3. Generate code from the function signature data structures.

Bringing such a massive standard to bear in only 200 lines of code, with the portability and performance of Cython, through the power of PEGs, ranks as one of the high points of Python in practice at PayPal.

Wrapping up

It’s been a long journey, but we stayed afloat and we’re happy to have made it. To recap:

  • Python and C are made for one another, hand in glove.
  • Different C integration techniques have their applications; our stances are:
    • ctypes for dynamic calls to small, stable interfaces
    • cffi for dynamic calls to larger interfaces, especially when targeting PyPy
    • Old-fashioned C extensions if you’re already good at them
    • Cython-based C extensions for the rest
    • SWIG pretty much never
  • Parsing Expression Grammars are great!

All of this encapsulates perfectly why we love Python so much. Python is a great starter language, but it also has serious chops as a systems language and ecosystem. That bottom-to-top, rags-to-riches, books-to-bits story is what makes it the ineffable, incomparable language that it is.

C you around!

Kurt and Mahmoud

Python Packaging at PayPal

By

Year after year, Pythonists all over are churning out more code than ever. People are learning, the ecosystem is flourishing, and everything is running smoothly, right up until packaging. Packaging Python is fundamentally un-Pythonic. It can be a tough lesson to learn, but across all environments and applications, there is no one obvious, right way to deploy. Frankly, it’s hard to think of an area where Python’s Zen applies less.

At PayPal, we write and deploy our fair share of Python, and we wanted to devote a couple minutes to our story and give credit where credit is due. For conclusion seekers, without doubt or further ado: Continuum Analytics’ Anaconda Python distribution has made our lives so much easier. For small- and medium-sized teams, no matter the deployment scale, Anaconda has big implications. But let’s talk about how we got here.

Beginnings

Right now, PayPal Python Infrastructure provides equitable support for Windows, OS X, Linux, and Solaris, supporting various combinations of 32-bit and 64-bit Python 2.6, Python 2.7, and PyPy 5.

Glossing over the primordial days, when Kurt and I started building the Python platform at PayPal, we didn't know we would be building the first cross-platform stack the company had ever seen. It was December 2012, and we just wanted to see every developer unwrap a brand new laptop running PayPal Python services locally.

What ensued was the most intense engineering sprint I had ever experienced. We ported critical functionality previously only available in shared objects we had been calling into with ctypes. Several key parts were available in binary form only and had to be disassembled. But with the New Year, 2013, we were feeling like a whole new stack. All the PayPal-specific parts of our framework were pure-Python and portable. Just needed to install a few open-source libraries, like gevent, greenlet, maybe lxml. Just pip install, right?

Up the hill

In an environment where Python is still a new technology to most, pip is often not available, let alone understood. This learning curve can represent a major hurdle to many. We wanted more people to be able to write Python, and even more to be able to run it, as many places as possible, regardless of whether they were career Pythonists. So with a judicious shake of Python simplicity, we adopted a policy of “vendoring in” all of our core dependencies, including compiled extensions, like gevent.

This model yields somewhat larger repositories, but the benefits outweighed a few extra seconds of clone time. Of all the local development stories, there is still no option more empowering than the fully self-contained repository. Clone and run. A process so seamless, it's like a miniature demo that goes perfectly every time. In a world of multi-hour C++ and Java builds, it might as well be magic.

“So what’s the problem?”

Static builds. Every few months (or every CVE) the Python team would have to sit down to refresh, regression test, and certify a new set of libraries. New libraries were added sparingly, which is great for auditability, but not so great for flexibility. All of this is fine for a tight set of networking, cryptography, and serialization libraries, but no way could we support the dozens of dependencies necessary for machine learning and other advanced Python use cases.

And then came Anaconda. With the Anaconda Python distribution, Continuum is doing effectively what our team had been doing, but for free, for everyone, for hundreds of libraries. Finally, there was a standard option that made Python even simpler for our developers.

Adopting and adapting

As soon as we had the opportunity, we made Anaconda a supported platform for development. From then on, regardless of platform, Python beginners got one of two introductions: Install Anaconda, or visit our shared Jupyter Notebook, also backed by Anaconda.

The good kind of escalation

Today, Anaconda has gone beyond development environments to enable production PayPal machine learning applications for the better part of a year. And it’s doing so with more optimizations than we can shake a stick at, including running all the intensive numerical operations on Intel’s MKL. From now on, Python applications exist on a moving walkway to production perfection.

This was realized through two Anaconda packaging models that work for us. The first preinstalls a complete Anaconda on top of one of PayPal’s base Docker images. This works, and is buzzword-compliant, but for reasons outside the scope of this post, also entails maintaining a single large Docker image with the dependencies of all our downstream users.

As with all packaging, there’s always another way. One alternative approach that has worked well for us involves a little Continuum project known as Miniconda. This minimalist distribution has just enough to make Python and conda work. At build time, our applications package Miniconda, the bzip2 conda archives of the dependencies, and a Python installer, wrapped up with a CalVer filename. At deploy time, we install Miniconda, then conda install the dependencies. No downloads, no compilation, no outside dependencies. The code is only a little longer than the description of the process. Conda envs are more powerful than virtualenvs, and have a better cross-platform, cross-dev/prod story, as well. Developers enjoy the increased control, smaller packages, and applicability across both standard and containerized environments.
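
The deploy step really is that short. Here is a rough sketch of the idea, with placeholder paths and package filenames; the Miniconda installer's -b/-p flags and conda's --offline flag are standard, everything else here is hypothetical:

import subprocess

PREFIX = "/opt/myapp/env"                         # hypothetical install location
INSTALLER = "Miniconda3-latest-Linux-x86_64.sh"   # bundled at build time

# 1. Silent, batch-mode Miniconda install into an app-specific prefix
subprocess.check_call(["bash", INSTALLER, "-b", "-p", PREFIX])

# 2. Install the app's dependencies from conda archives bundled alongside,
#    without touching the network (archive names are placeholders)
subprocess.check_call([
    PREFIX + "/bin/conda", "install", "--offline", "--yes",
    "./pkgs/gevent-1.1.0-py27_0.tar.bz2",
    "./pkgs/greenlet-0.4.9-py27_0.tar.bz2",
])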

Packages to come

As stated in Enterprise Software with Python, packaging and deployment is not the last step. The key to deployment success is uniform, well-specified environments, with minimal variation between development and production. Or use Anaconda and call it good enough! We sincerely thank the Anaconda contributors for their open-source contributions, and hope that their reach spreads to ever more environments and runtimes.

Powering Transactions Search with Elastic – Learnings from the Field

By

Introduction

We see a lot of transactions at PayPal. Millions every day.

These transactions originate externally (a customer using PayPal to pay for a purchase on a website) as well as internally, as money moves through our system. Regular reports of these transactions are delivered to merchants in the form of a CSV or a PDF file. Merchants use these reports to reconcile their books.

Recently, we set out to build a REST API that could return transaction data back to merchants. We also wanted to offer the capability to filter on different criteria such as name, email or transaction amount. Support for light aggregation/insight use cases was a stretch goal. This API would be used by our partners, merchants and external developers.

We chose Elastic as the data store. Elastic has proven, over the past 6 years, to be an actively developed product that constantly evolves to adapt to user needs. With a great set of core improvements introduced in version 2.x (memory-mapped doc values, auto-regulated merge throttling), we didn't need to look any further.

Discussed below is our journey and key learnings we had along the way, in setting up and using Elastic for this project.

Will It Tip Over

Once live, the system would have tens of terabytes of data spanning 40+ billion documents. Each document would have over a hundred attributes. There would be tens of millions of documents added every day. Each one of the planned Elastic blades has 20 TB of SSD storage, 256 GB of RAM, and 48 cores (hyper-threaded).

While we knew Elastic had all the great features we were looking for, we were not too sure if it would be able to scale to work with this volume (and velocity) of data. There are a host of non-functional requirements that arise naturally in financial systems which have to be met. Let’s limit our discussion in this post to performance – response time to be specific.

Importance of Schema

Getting the fundamentals right is the key.

When we initially set up Elastic, we turned on strict validation of fields in the documents. While this gave us a feeling of security akin to what we're used to with relational systems (strict field and type checks), it hurt performance.

We inspected the content of the Lucene index Elastic created using Luke. With our initial index setup, Elastic was creating sub-optimal indexes. For example, in places where we had defined nested arrays (marked index="no"), Elastic was still creating hidden child documents in Lucene, one per element in the array. This document explains why, but it was still using up space when we couldn't even query the property via the index. Due to this, we switched the "dynamic" index setting from strict to false. Avro helped ensure that each document conformed to a valid schema when we prepared the documents for ingestion. A sketch of such a mapping is below.
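
For illustration, here is roughly what a non-dynamic mapping looks like when created through the elasticsearch-py client; the endpoint, index name, mapping type, and fields are all hypothetical, and the mapping-type level reflects the Elastic 2.x era discussed here:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])    # hypothetical endpoint

# Disable dynamic mapping so unexpected fields are ignored rather than
# indexed; the document shape itself is enforced upstream (Avro).
es.indices.create(
    index="transactions-2016-01",                # hypothetical index name
    body={
        "mappings": {
            "transaction": {
                "dynamic": False,
                "properties": {
                    "transaction_id": {"type": "string", "index": "not_analyzed"},
                    "amount": {"type": "double"},
                },
            }
        }
    },
)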

A shard should have no more than 2 billion parent plus nested child documents, if you plan to run force merge on it eventually (Lucene doc_id is an integer). This can seem high but is surprisingly easy to exceed, especially when de-normalizing high cardinality data into the source. An incorrectly configured index could have a large number of hidden Lucene documents being created under the covers.

Too Many Things to Measure

With the index schema in place, we needed a test harness to measure the performance of the cluster. We wanted to measure Elastic performance under different load conditions, configurations and query patterns. Taken together, the dimensions total more than 150 test scenarios. Sampling each by hand would be near impossible. jMeter and Beanshell scripting really helped here to auto-generate scenarios from code and have jMeter sample each hundreds of times. The results are then fed into Tableau to help make sense of the benchmark runs.

  • Indexing Strategy
    • 1 month data per shard, 1 week data per shard, 1 day data per shard
  • # of shards to use
  • Benchmark different forms of the query (constant score, filter with match all etc.)
  • User’s Transaction Volume
    • Test with accounts having 10 / 100 / 1000 / 10000 / 1000000 transactions per day
  • Query time range
    • 1 / 7 / 15 / 30 days
  • Store source documents in Elastic? Or store source in a different database and fetch only the matching IDs from Elastic?

Establishing a Baseline

The next step was to establish a baseline. We chose to start with a single node with one shard. We loaded a month’s worth of data (2 TB).

Tests showed we could search and get back 500 records from across 15 days in about 10 seconds when using just one node. This was good news since it could only get better from here. It also proved that an Elastic shard (backed by Lucene segments) can handle 2 billion documents indexed into it, more than we'll end up using.

One takeaway was that a high segment count increases response time significantly. This might not be so obvious when querying across multiple shards but is very obvious when there's only one. Use force merge if you have the option (offline index builds). Using a high enough value for refresh_interval and translog.flush_threshold_size enables Elastic to collect enough data into a segment before a commit. The flip side was that it increased the latency for the data to become available in search results (we used 4 GB and 180 seconds for this use case). We could clock over 70,000 writes per second from just one client node.
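
Those two settings can be applied per index. A sketch using the elasticsearch-py client, with the endpoint and index name as placeholders and the values quoted above:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])    # hypothetical endpoint

# Let segments grow larger before refresh/flush, trading data-visibility
# latency for fewer, bigger segments (the 180s / 4gb values from the text)
es.indices.put_settings(
    index="transactions-2016-01",                # hypothetical index name
    body={
        "index": {
            "refresh_interval": "180s",
            "translog.flush_threshold_size": "4gb",
        }
    },
)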

Nevertheless, data from the recent past is usually hot and we want all the nodes to chip in when servicing those requests. So next, we shard.

Sharding the Data

The same one month's data (2 TB) was loaded onto 5 nodes with no replicas. Each Elastic node had one shard on it. We chose 5 nodes so as to keep a few unused; they would come in handy in case the cluster started to falter and needed additional capacity, and to test recovery scenarios. Meanwhile, the free nodes were used to load data into Elastic and acted as jMeter slave nodes.

With 5 nodes in play,

  • Response time dropped to 6 seconds (40% gain) for a query that scanned 15 days
  • Filtered queries were the most consistent performers
  • As a query scanned more records due to an increase in the date range, the response time also grew with it linearly
  • A force merge to 20 segments resulted in a response time of 2.5 seconds. This showed a good amount of time was being spent in processing results from individual segments, which numbered over 300 in this case. While tuning the segment merge process is largely automatic starting with Elastic 2.0, we can influence the segment count. This is done using the translog settings discussed before. Also, remember we can’t run a force merge on a live index taking reads or writes, since it can saturate available disk IO
  • Be sure to set the “throttle.max_bytes_per_sec” param to 100 MB or more if you’re using SSDs, the default is too low
  • Having the source documents stored in Elastic did not affect the response time by much, maybe 20 ms. It's surely more performant than having them off-cluster in, say, Couchbase or Oracle. This is due to Lucene storing the source in a separate data structure that's optimized for Elastic's scatter-gather query format and is memory mapped (see the fdx and fdt files section in Lucene's documentation). Having SSDs helped, of course

 


Final Setup

The final configuration we used had 5-9 shards per index depending on the age of the data. Each index held a week’s worth of data. This got us a reasonable shard count across the cluster but is something we will continue to monitor and tweak, as the system grows.

We saw response times around the 200 ms mark to retrieve 500 records after scanning 15 days’ worth of data with this setup. The cluster had 6 months of data loaded into it at this point.

Shard counts impact not just read times; they impact JVM heap usage due to Lucene segment metadata as well as recovery time, in case of node failure or a network partition. We’ve found it helps Elastic during rebalance if there are a number of smaller shards rather than a few large ones.

We also plan to spin up multiple instances per node to better utilize the hardware. Don’t forget to look at your kernel IO scheduler (see hardware recommendations) and the impact of NUMA and zone reclaim mode on Linux.


Conclusion

Elastic is a feature rich platform to build search and data intensive solutions. It removes the need to design for specific use cases, the way some NoSQL databases require. That’s a big win for us as it enables teams to iterate on solutions faster than would’ve been possible otherwise.

As long as we exercise due care when setting up the system and validate our assumptions on index design, Elastic proves to be a reliable workhorse to build data platforms on.

Lessons Learned from the Java Deserialization Bug

By

(with input from security researcher Mark Litchfield)

Introduction

At PayPal, the Secure Product LifeCycle (SPLC) is the assurance process to reduce and eliminate security vulnerabilities in our products over time by building repeatable, sustainable, proactive security practices and embedding them within our product development process.

A key tenet of the SPLC is incorporating the lessons learned from remediating security vulnerabilities back into our processes, tools, and training to keep us on a continuous improvement cycle.

The story behind the Java deserialization vulnerability

The security community has known about deserialization vulnerabilities for a few years but they were considered to be theoretical and hard to exploit. Chris Frohoff (@frohoff) and Gabriel Lawrence (@gebl) shattered that notion back in January of 2015 with their talk at AppSecCali – they also released their payload generator (ysoserial) at the same time.

Unfortunately, their talk did not get enough attention among mainstream technical media. Many called it “the most underrated, under-hyped vulnerability of 2015”. It didn’t stay that way after November 2015, when security researchers at FoxGlove Security published their exploits for many products including WebSphere, JBOSS, WebLogic, etc.

This caught our attention, and we began forking off a few work streams to assess the impact to our application infrastructure.

Overview of the Java Deserialization bug

A custom deserialization method in Apache commons-collections contains reflection logic that can be exploited to execute arbitrary code. Any Java application that accepts untrusted data to deserialize (a.k.a. unmarshalling or unpickling) and has commons-collections in its classpath can be exploited to run arbitrary code.

The obvious quick fix was to patch the commons-collections jar so that it does not contain the exploitable code. A quick search on our internal code repository showed us how complex this process could be – many different libraries and applications use many different versions of commons-collections, and the transitive nature of how it gets invoked made it even more painful.

Also, we quickly realized that this is not just about commons-collections or even Java, but an entire class of vulnerability by itself and could occur to any of our other apps that perform un-pickling of arbitrary object types. We wanted to be surgical about this as patching hundreds of apps and services was truly a gargantuan effort.

Additionally, most financial institutions have a holiday moratorium that essentially prevents any change or release to the production environment, and PayPal is no exception. We conducted our initial assessment on our core Java frameworks and found that we were not utilizing the vulnerable libraries and therefore had no immediate risk from exploit tools in the wild. We still had to review other applications that were not on our core Java frameworks, in addition to our adjacencies.

The Bug Bounty Submission

Mark Litchfield, one of the top security researchers of our Bug Bounty program submitted a Remote Code Execution (RCE) using the above mentioned exploit generator on December 11, 2015. Mark has been an active member of PayPal Bug Bounty community since 2013. He has submitted 17 valid bugs and been featured on our Wall of Fame.

As it turned out, this bug was found in one of our apps that was not on our core Java frameworks. Once we validated the bug, the product development and security teams quickly moved to patch the application in a couple of days despite the holiday moratorium.

Below are Mark’s notes about this submission in his own words:

  • It was a Java deserialization vulnerability
  • Deserialization bugs were (as of a month ago) receiving a large amount of attention as it had been overlooked as being as bad as it was
  • There were nine different attack vectors within the app

While there were nine different instances of the root cause, the underlying vulnerability was just in one place and fixing that took care of fixing all instances.

Remediation – At Scale

The next few days were a whirlwind of activity across many PayPal security teams (AppSec, Incident Response, Forensics, Vulnerability Management, Bug Bounty and other teams) and our product development and infrastructure teams.

Below is a summary of the key learnings that we think would be beneficial to other security practitioners.

When you have multiple application stacks, core and non-core applications, COTS products, subsidiaries, etc. that you think are potentially vulnerable, where do you start?

Let real-world risk drive your priorities. You need to make sure that the critical risk assets get the first attention. Here are the different “phases” of activities that would help bring some structure when driving the remediation of such a large-scale and complex vulnerability.

Inventory

  • If your organization has a good app inventory, it makes your life easy.
  • If you don’t, then start looking for products that have known exploit code like WebSphere, WebLogic, Jenkins, etc.
  • Validate the manual inventory with automated scans and make sure you have a solid list of servers to be patched.
  • For custom code, use static and dynamic analysis tools to determine the exposure.
  • If you have a centralized code repo, it definitely makes things easy.
  • Make sure you don’t miss one-off apps that are not on your core Java frameworks.
  • Additionally, reach out to the security and product contacts within all of your subsidiaries (if they are not fully integrated) and ask them to repeat the above steps.

Toolkit

Here are a few tools that are good to have in your arsenal:

  • SerializeKiller
  • Burp Extension from DirectDefense
  • Commercial tools that you may already have in-house such as static/dynamic analysis tools and scanners. If your vendors don’t have a rule-pack for scanning, push them to implement one quickly.

Additionally, don’t hesitate to get your hands dirty and write your own custom tools that specifically help you navigate your unique infrastructure.

Monitoring & Forensics

  • Implement monitoring until a full-fledged patch can be applied.
  • Signatures for Java deserialization can be created by analyzing the indicators of the vulnerability.
  • In this case, a rudimentary indicator is to parse network traffic for the hexadecimal string "AC ED 00 05" or, in base64-encoded form, "rO0" (see the sketch after this list).
  • Whether you use open source or commercial IDS, you should be able to implement a rule to catch this. Refer to the free Snort rules released by Emerging Threats.
  • Given that this vulnerability has been in the wild for a while, do a thorough forensics analysis on all the systems that were potentially exposed.
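
As a rough illustration of that indicator (not a substitute for a real IDS rule), a payload can be checked for both encodings of the Java serialization header:

import base64
import binascii

JAVA_SERIALIZED_MAGIC = binascii.unhexlify("aced0005")   # "AC ED 00 05"
JAVA_SERIALIZED_MAGIC_B64 = b"rO0"                        # same bytes, base64-encoded

def looks_like_java_serialization(payload):
    # Rudimentary indicator check; false positives and negatives are possible
    return (JAVA_SERIALIZED_MAGIC in payload
            or JAVA_SERIALIZED_MAGIC_B64 in payload)

print(looks_like_java_serialization(base64.b64encode(JAVA_SERIALIZED_MAGIC)))  # True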

Short-term Remediation

  • First up, drive the patching of your high-risk Internet facing systems – specifically COTS products that have published exploits.
  • For custom apps that use Apache commons-collections, you can either remove the vulnerable classes if they are not used or upgrade to the latest version.
  • If the owners of those systems are different, a lot can be accomplished in parallel.

Long-term Remediation

  • Again, this vulnerability is not limited to commons-collections or Java per se. This is an entire class of vulnerability like XSS. Your approach should be holistic accordingly.
  • Your best bet is to completely turn off object serialization everywhere.
  • If that is not feasible, this post from Terse Systems has good guidance on dealing with custom code.

Summary

In closing, we understand that today’s application infrastructure is complex and you don’t own/control all the code that runs in your environment. This specific deserialization vulnerability is much larger than any of us initially anticipated – spanning across open source components, third-party commercial tools and our own custom code. Other organizations should treat this seriously as it can result in remote code execution and implement security controls for long-term remediation, and not stop just at patching the commons-collections library. To summarize:

  • We turned around to quickly patch known vulnerabilities to protect our customers
  • We are investing in tools and technologies that help inventory our applications, libraries and their dependencies in a much faster and smarter way
  • We are streamlining our application upgrade process to be even more agile in responding to such large-scale threats
  • We changed our policy on fixing security bugs during moratorium – we now mandate fixing not just P0 and P1 bugs, but also P2 bugs (e.g., internal apps that are not exposed to the Internet).

 

Enterprise Overhaul: Resolving DNS

By

Everyone assumes all software engineers are great with numbers. If only they knew the truth. How many people’s phone numbers can you recite? No peeking and emergency numbers don’t count! Don’t worry if you couldn’t name that many. Here’s the real embarrassing test of the day: How many sites’ IP addresses can you name? No pinging and local subnets don’t count!

Most telephones still looked like this when DNS was invented. Not pictured: the phonebook.

Back in the mid-1980s, the first Domain Name System (DNS) implementations started putting our IP addresses into server-based contact lists and the Internet has never looked the same since. These days, we may associate DNS with large-scale networks, but it’s important to remember that DNS really came from a very human distaste for numbers. Thirty years later, we engineers use it so much in normal Internet usage that it’s easy to take for granted.

DNS may be mature, but the fact of networks is that it always takes at least two to tango. As new technologies and deployments emerge, the implications of integrating with DNS must still be revisited. Your datacenter is not the Internet, even if it's in the cloud. Continuing the enterprise themes of our previous posts, this post looks at how to resolve a few of the DNS pitfalls preying on precious reliability and performance.

A protocol precaution

The client side of DNS, resolution, is virtually all UDP. This is interesting because UDP is designed as a lightweight, unreliable transport. However, in many of the most common use cases, DNS calls precede TCP-backed HTTP and other protocols based on reliable transports. This fundamental difference changes many things. Looking upstream, UDP does not load-balance like TCP. Because UDP is not connection-oriented or congestion-controlled, DNS traffic will act very differently at scale.

So our first lesson is to stay true to the stateless nature of UDP and avoid putting stateful load balancers in front of DNS infrastructure. Instead, configure clients and servers to conform to the built-in load-handling architecture of DNS. The Internet’s DNS “deployment” is load balanced via its inherent hierarchy and IP Anycast.

Client integration

Back on the client side, you can do a lot to optimize and robustify your application’s DNS integration. The first step is to take a hard look at your stack. Whether you’re running Python, Java, JavaScript, or C++, the defaults may not be for you, especially when working with traffic within the datacenter.

For example, while not supported here at PayPal, it’s safe to say Tornado is a popular Python web framework, with many asynchronous networking features. But, silently and subtly, DNS is not one of them. Tornado’s default DNS resolution behavior will block the entire IO event loop, leading to big issues at scale.

And that’s just one example of library DNS defaults jeopardizing application reliability. Third-party packages and sometimes even builtins in Java, Node.js, Python, and other stacks are full of hidden DNS faux pas.

For instance, the average off-the-shelf HTTP client seems like a neutral-enough component. Where would we be without reliable standbys like wget? And that is how the trouble starts. The DNS defaults in most tools are designed to make for good Internet citizens, not reliable and performant enterprise foundations.

The hops Internet-connected applications make for you. It’s no wonder the default timeout is 5000 milliseconds.

The first difference is name resolution timeouts. By default, resolv.conf, netty, and c-ares (gevent, node.js, curl) are all configured to a whopping 5 seconds. But this is your enterprise, your service, and your datacenter. Look at the SLA of your service and the reliability of your DNS. If your service can't take an extra 500 milliseconds some percentage of the time, then you should lower that timeout. I've usually recommended 200 milliseconds or less. If your infrastructure can't resolve DNS faster than that, do one or more of the following:

  1. Put the authoritative DNS servers topologically closer.
  2. Add caching DNS servers, maybe even on the same machine.
  3. Build application-level DNS caching.

Option #1 is purely a network issue, and a matter for network operations to discuss. For brevity’s sake, option #2 is outside the scope of this article. But option #3 is the one we recommend most, because it is bureaucracy-free and relatively easy to implement, even with enterprise considerations.

Application-level DNS caching

When designing an enterprise application-level DNS cache, we must recognize that we are not discussing standard-issue web components like scrapers and browsers. Most enterprise services talk to a fixed set of relatively few machines. Even the most powerful and complex production PayPal services communicate with fewer than 200 addresses, partly due to the prevalence of load balancing LTMs in our architecture.

For our gevent-based Python stack, we use an asynchronous DNS cache that refreshes those addresses every five minutes. Plus, the stack warms up our application’s DNS cache by kicking off preresolution of many known DNS-addressed hosts at startup, ensuring that the first requests are as fast as later ones.

Some may be asking: why use a custom, application-level DNS cache when virtually every operating system caches DNS automatically? In short, when the OS cache expires, the next resolution blocks, and a stack without its own asynchronous cache stalls right there. Our DNS cache allows us to use mildly stale addresses while the cache is refreshing, making us robust to many DNS issues. For our use cases, both the chances and consequences of connecting to the wrong server are so minute that it's not worth inflating outlier response times by inlining DNS. This arrangement also makes services much more robust to network glitches and DNS outages, and it allows for more logging and instrumentation around the explicit DNS resolution, so you can see when DNS is performing badly. A minimal sketch of the idea follows.
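
This sketch is illustrative only, not PayPal's implementation: it pre-resolves a fixed set of hosts, refreshes them in the background, and serves mildly stale answers instead of blocking. It uses plain threads for portability; in a gevent stack the refresh loop would run in a greenlet instead.

import socket
import threading
import time

class DnsCache(object):
    """Tiny illustrative cache: pre-resolves hosts at startup and refreshes
    them in the background, serving stale answers instead of blocking."""

    def __init__(self, hostnames, refresh_secs=300):
        self._addresses = {}
        self._hostnames = list(hostnames)
        self._refresh_secs = refresh_secs
        self.refresh()                      # warm the cache at startup
        t = threading.Thread(target=self._refresh_loop)
        t.daemon = True
        t.start()

    def refresh(self):
        for host in self._hostnames:
            try:
                self._addresses[host] = socket.gethostbyname(host)
            except socket.error:
                pass                        # keep the previous (stale) answer

    def _refresh_loop(self):
        while True:
            time.sleep(self._refresh_secs)
            self.refresh()

    def resolve(self, host):
        return self._addresses.get(host)    # never blocks on the network

cache = DnsCache(["api.example.com"])       # hypothetical service host
print(cache.resolve("api.example.com"))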

Denecessitizing DNS?

The overhaul wouldn’t be complete without exploring one final scenario. What’s it like to not use DNS at all? It may sound odd, given the number of technologies built on DNS in the last 30 years. But even today, PayPal production services still communicate to each other using a statically generated IP-address-based system, like a souped-up hosts file. This design decision long predates my tenure here, and for a long time I considered it technical debt. But after collaborating with architects here and at other enterprise datacenters, I’ve come to appreciate the advantages of skipping DNS. DNS was designed for multi-authority, federated, eventually-consistent networks, like the Internet. Even the biggest datacenters are not the Internet. A datacenter is topologically smaller, has only one operational authority, and must meet much tighter reliability requirements.

A little peek at PayPal’s midtier-to-midtier traffic. Each shrunken line of text is a service endpoint. It looks like a lot, but each endpoint only talks to a few others.

Whether or not your system uses DNS, when you own the entire network it’s still best practice to maintain a central, version-controlled, “single source of truth” repository for networking configurations. After all, even DNS server configurations have to come from somewhere. If it were possible to efficiently and reliably push that same information to every client, would you?

Explicit preresolution of all service names reduces the window of inconsistency while saving the datacenter billions of network requests. If you already have a scalable deployment system, could it also fill the network topology gap, saving you the trouble of overhauling, scaling, and maintaining an Internet system for enterprise use? There’s a lot packed in a question like that, but it’s something to consider when designing your service ecosystem.

In short

So, to sum it all up, here are the key takeaways:

  • Beware the pitfalls of stateful load-balancing for DNS and UDP.
  • Tighten up your timeouts according to your SLAs.
  • Consider an in-application DNS cache with explicit resolution.
  • The fastest and most reliable request is the request you don’t have to make.
  • A datacenter is not the Internet.

It may be obvious now, but it bears repeating. If you’re not careful, out-of-box solutions will fill your inbox with avoidable problems. Quality enterprise engineering means taking a microscope to libraries, with deliberate overhauling for your organization’s needs.

If you found this interesting, have some experience, and know a thing or two about security, have we got the job for you! See this description and contact mahmoud@paypal.com and kurose@paypal.com with your résumé/portfolio.

 

Swagger is Now a Part of PayPal’s Future

By and

On November 5th, the Linux Foundation announced a new collaborative project, the Open API Initiative. PayPal is proud to be one of the founding corporate members. This expands our relationship with the Linux Foundation and the open source world, as we are already members of the Node Foundation. This collaborative project establishes an open governance structure for moving the Swagger specification into the future, with corporate resources supporting the specification.

swagger1

If you've followed Swagger's story in recent years, you'll know that in 2014, the project's brand was bought by SmartBear (an API testing tool company, known for SoapUI). As it turned out, this complicated things somewhat, from a trademark standpoint, for some adopters of Swagger's open source standard.

The folks at SmartBear wanted to do the right thing for the Swagger open source community, so they’ve contributed the specification format to the Linux Foundation. The specification format will be referred to as the Open API Definition Format (OADF), which is essentially a brand-free synonym for Swagger.

PayPal was contacted by SmartBear and the Linux Foundation at the inception of the Open API Initiative. We had previously established a relationship with Swagger maintainers, as we had numerous internal initiatives in 2015 utilizing Swagger for API definitions. We were excited to join other leaders in the API space as part of the founding group of this collaborative project.

In discussions so far, it’s clear that all members involved are interested in supporting open source collaboration, and moving the Swagger specification forward without hindrance. The most exciting part is hearing that many member companies are planning to dedicate development resources toward contributing to Swagger open source projects.

PayPal, Braintree and Venmo are shifting more internal and external API initiatives and development resources toward utilizing Swagger, and we look forward to continuing to contribute to existing projects, and hopefully releasing some of our own.

PayPal has had a long-standing commitment to open source through our many iterations of APIs, including our latest REST APIs/SDKs and Braintree’s SDK. We understand that our SDKs don’t cover every language available, and that there are some great open source codegen products available for API clients, when a Swagger definition is provided.

We're committed to delivering Swagger definitions for our APIs to our developer community in 2016. Stay tuned for more information.

Author: Jason Harmon

About the author: Jason is the former head of the API Design team at PayPal, helping development teams design high quality, usable APIs across the platform. He blogs at apiux.com and has a YouTube channel, API Workshop (https://www.youtube.com/channel/UCKK2ir0jqCvfB-kzBGka_Lg).

Isomorphic React Apps with React-Engine

By

Earlier this year, we started using react in our various apps at PayPal. For existing apps, the plan was to bring react in incrementally for new features and start migrating portions of the existing functionality into a pure react solution. Regardless, most implementations were purely client-side driven. But most recently, in one of the apps that we had to start from scratch, we decided to take a step forward and use react end to end. Given that express-based kraken-js is PayPal's application stack, we wanted react views in JS or JSX to be the default template solution, along with routing and isomorphic support.

In summary, our app's requirements were:

  1. Use react’s JSX (or react’s JS) as the default template engine in our kraken-js stack
  2. Support server side rendering and preserve render state in the browser for fast page initialization
  3. Map app’s visual state in the browser to URLs using react-router to support bookmarking
  4. Finally, support both react-router based UI workflows and simple stand alone react view UI workflows in kraken-js

As we started working towards the above requirements, we realized that there was a lot of boilerplate involved in using react-router alongside simple react views in an express app. react-router requires its own route declaration to be run before a component can be rendered, because it needs to figure out the component based on the URL dynamically. But plain react views can be rendered without any of those intermediate steps.

We wanted to take this boilerplate that happens at express render time and hide it behind a clean API. Naturally, all things pointed to express's `res.render`. The question quickly became how `res.render` could be used as-is, without any API facade changes, while supporting both react-router rendering and regular react view rendering.

Thus react-engine was born to abstract all of the complexities into the res.render!

So in simple terms, react-engine is a JavaScript library for express-based Node.js web apps that renders composite react views. The phrase composite react views reflects react-engine's ability to handle rendering of both react-router based UI workflows and stand-alone react views.

Render Example
// to run and render a react router based component
res.render('/account', {});
// or more generically
res.render(req.url, {});

// to render a react view
res.render('index', {});

Notice how the first render method is called with a `/` prefix in the view name. That is the key to react-engine's magic. Behind the scenes, react-engine uses a custom express View to intercept view names: if a name starts with a `/`, it first runs react-router and renders the component that react-router resolves; otherwise, it simply renders the view file.

Setup Example
var express = require('express');
var engine = require('react-engine');

var app = express();

// react-engine options
var engineOptions = {
  // optional, if not using react-router
  reactRoutes: 'PATH_TO_REACT_ROUTER_ROUTE_DECLARATION' 
};

// set `react-engine` as the view engine
app.engine('.jsx', engine.server.create(engineOptions));

// set the view directory
app.set('views', __dirname + '/public/views');

// set js as the view engine
app.set('view engine', 'jsx');

// finally, set the custom react-engine view for express
app.set('view', engine.expressView);

In a nutshell, the code sets react-engine as the render engine with its custom express View for express to render jsx views.

react-engine supports isomorphic react apps by bootstrapping server rendered components onto the client or browser in an easy fashion. It exposes a client api function that can be called whenever the DOM is ready to bootstrap the app.

// `client` refers to react-engine's client-side module and `options` holds
// the boot configuration (see the options spec referenced below)
document.addEventListener('DOMContentLoaded', function onLoad() {
    client.boot(options, function onBoot(renderData) {
    });
});
The complete options spec can be found in the project's documentation. More detailed examples of using react-engine can be found at https://github.com/paypal/react-engine/tree/master/examples.

We love react at PayPal, and react-engine helps us abstract away the boilerplate of setting up react with react-router in isomorphic express apps so we can focus on writing the business logic.

Introducing SuPPort

By

In our last post, Ten Myths of Enterprise Python, we promised a deeper dive into how our Python Infrastructure works here at PayPal and eBay. There is a problem, though. There are only so many details we can cover, and at the end of the day, it’s just so much better to show than to tell.

So without further ado, we're pleased to introduce SuPPort, an in-development distillation of our PayPal Python Infrastructure.

Started in 2010, Python Infrastructure initially powered PayPal’s internal price-setting interfaces, then grew to support payment orchestration interfaces, and now in 2015 it supports dozens of projects at PayPal and eBay, having handled billions of production-critical requests for a wide range of teams and tiers. So what does it mean to distill this functionality into SuPPort?

SuPPort is an event-driven server framework designed for building scalable and maintainable services and clients. It's built on top of several open-source technologies, most notably gevent and its greenlet-based concurrency model, so before we dig into the workings of SuPPort, it's worth being acquainted with those foundations.

Some or all of these may be new to many developers, but all in all they comprise a powerful set of functionality. With power comes complexity, and while Python as a language strives for technical convergence, there are many ways to approach the problem of developing, scaling, and maintaining components. SuPPort is one way gevent and the libraries above have been used to build functional services and products handling anywhere from 100 requests per day to 100 requests per second and beyond.

Enterprise Ideals, Flexible Features

Many motivations have gone into building up a Python stack at PayPal, but as in any enterprise environment, we continuously aim to achieve the following, which also give this post its structure:

  • Interoperability
  • Introspectability
  • Infallibility

Of course, organizations of all sizes want these features as well; the key difference is that large organizations like PayPal usually end up building more, all while demanding a higher degree of redundancy and risk mitigation from their processes. This often results in great cost in terms of both hardware and developer productivity. Fortunately for us, Python can be very efficient in both respects.

So, let’s take a stroll through a selection of SuPPort’s feature set in the context of these criteria! Note that if you’re unfamiliar with evented programming, nonblocking sockets, and gevent in particular, some of this may seem quite foreign. The gevent tutorial is a good entry point for the intermediate Python programmer, which can be supplemented with this well-illustrated introduction to server architectures.

Interoperability

Python usage here at PayPal has spread to virtually every imaginable use case: administrative interfaces, midtier services, operations automation, developer tools, batch jobs; you name it, Python has filled a gap in that area. This legacy has resulted in a few rather interesting abstractions exposed in SuPPort.

BufferedSocket

PayPal has hundreds of services across several tiers. Interoperating between these means having to implement over half a dozen network protocols. The BufferedSocket type eliminated our inevitable code duplication, handling a lot of the nitty-gritty of making a socket into a parser-friendly data source, while retaining timeouts for keeping communications responsive. A must-have primitive for any gevent protocol implementer.
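To make the idea concrete, here is a minimal, hypothetical sketch of the kind of parser-friendly interface such a primitive provides, built directly on gevent's socket module. The class and method names below are illustrative assumptions, not SuPPort's actual BufferedSocket API.

import gevent
from gevent import socket

class SimpleBufferedSocket(object):
    """Toy buffered-socket wrapper: accumulate bytes, split on a delimiter."""

    def __init__(self, sock, timeout=10.0):
        self.sock = sock
        self.timeout = timeout
        self.buffer = b''

    def read_until(self, delimiter):
        # enforce an overall timeout so a slow peer cannot stall the caller
        with gevent.Timeout(self.timeout):
            while delimiter not in self.buffer:
                chunk = self.sock.recv(4096)
                if not chunk:
                    raise EOFError('connection closed before delimiter')
                self.buffer += chunk
        line, _, self.buffer = self.buffer.partition(delimiter)
        return line

# Usage: read the status line of an HTTP response from an upstream host.
conn = SimpleBufferedSocket(socket.create_connection(('example.com', 80)))
conn.sock.sendall(b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
print(conn.read_until(b'\r\n'))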

ConnectionManager

Errors happen in live environments. DNS requests fail. Packets are lost. Latency spikes. TCP handshakes are slow. SSL handshakes are slower. Clients rarely handle these problems gracefully. This is why SuPPort includes the ConnectionManager, which provides robust error handling code for all of these cases with consistent logging and monitoring. It also provides a central point of configuration for timeouts and host fallbacks.
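The snippet below is a heavily simplified, assumption-laden sketch of the idea (centralized connect logic with host fallbacks and consistent logging); it is not SuPPort's actual ConnectionManager API.

import logging
from gevent import socket

log = logging.getLogger('connections')

def connect_with_fallback(hosts, connect_timeout=0.5):
    """Try each (host, port) in order, logging failures consistently."""
    last_error = None
    for host, port in hosts:
        try:
            return socket.create_connection((host, port), timeout=connect_timeout)
        except (socket.error, socket.timeout) as exc:
            log.warning('connect to %s:%s failed: %r', host, port, exc)
            last_error = exc
    raise last_error

# Usage: prefer the local instance, fall back to a remote replica.
# conn = connect_with_fallback([('127.0.0.1', 8080), ('10.0.0.2', 8080)])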

Introspectability

As part of a large organization, we can afford to add more machines, and are even required to keep a certain level of redundancy and idle hardware. And while DevOps is catching on in many larger-scale environments, there are many cases in enterprise environments where developers are not allowed to attend to their production code.

SuPPort currently comes with all the same general-purpose introspection capabilities that PayPal Python developers enjoy, giving you as much structured information about your application as possible without actually requiring login privileges. Of course, almost every aspect of this is configurable, to suit a wide variety of environments from development to production.

Context management

Python famously has no global scope: all values are namespaced in module scope. But there are still plenty of aspects of the runtime that are global. Some are out of our control, like the OS-assigned process ID or the VM-managed garbage collection counters. Other aspects are in our control, and best practice in concurrent programming is to keep these as well-managed as possible.

SuPPort uses a system of Contexts to explicitly manage nonlocal state, eliminating difficult-to-track implicit global state for many core functions. This has the added benefit of creating opportunities to centrally manage and monitor debugging data and statistics (some charts of which are shown below), all made available through the MetaApplication, detailed further down.
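As a toy illustration of the pattern (not SuPPort's actual Context type, whose API is much richer), nonlocal state lives on one explicit object that is created, passed around, and inspected deliberately, instead of hiding in module-level globals:

import os
import time

class Context(object):
    """Toy explicit-context object: one home for per-process state and stats."""

    def __init__(self, name):
        self.name = name
        self.pid = os.getpid()
        self.started = time.time()
        self.counters = {}

    def incr(self, key, amount=1):
        self.counters[key] = self.counters.get(key, 0) + amount

ctx = Context('example-service')
ctx.incr('requests')
print(ctx.name, ctx.pid, ctx.counters)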

Accept SSL Charts

Charting quantiles and recent timings for incoming SSL connections from a remote service.

SuPPort Stats table

Values are also available in table-based and JSON formats, for easy human and machine readability.

MetaApplication

While not exclusively a web server framework, SuPPort leverages its strong roots in the web to provide both a web-based user interface and an API full of useful runtime information.

As you can see below, there is a lot of information exposed through this default interface. This is partly because restricted environments often do not allow local login on machines, and partly because of the relative convenience of a browser for most developers. Not pictured: the same information is available in JSON format for easy programmatic consumption. Because this application is such a rich source of information, we recommend using SuPPort to run it on a separate port which can be firewalled accordingly, as seen in this example.

MetaApplication Screenshot #1

A screenshot of the MetaApplication, showing load averages and other basic information, as well as subroutes to further info.

MetaApplication Screenshot 2

Another shot of the MetaApplication, showing process and runtime info

 

Infallibility

At the end of the day, reliability over long periods of time is what earns a stack approval and adoption. At this point, the SuPPort architecture has a billion production requests under its belt here at PayPal, but on the way we put it through the proverbial paces. At various points, we have tested and confirmed several edge behaviors. Here are just a few key characteristics of a well-behaved application:

  • Gracefully sheds traffic under load (no unbounded queues here)
  • Can and has run at 90%+ CPU load for days at a time
  • Is free from framework memory leaks
  • Is robust to memory leakage in user code

To illustrate, a live service handling millions of requests per day had a version of OpenSSL installed which was leaking memory on every handshake. Thanks to preemptive worker cycling on excessive process memory usage, no intervention was required and no customers were impacted. The worker cycling was noted in the logs, the leak was traced to OpenSSL, and operations was notified. The problem was fixed with the next regularly scheduled release rather than being handled as a crisis.

No monkeypatching

One of the first and sometimes only ways that people experience gevent is through monkeypatching. At the top of your main module you issue a call to gevent (gevent.monkey.patch_all()) that automatically swaps out virtually all system libraries with their cooperatively concurrent counterparts. This sort of magic is relatively rare in Python programming, and rightfully so: implicit activities like this can have unexpected consequences. SuPPort takes a no-monkeypatching approach to gevent. If you want to implement your own network-level code, it is best to use gevent.socket directly. If you want gevent-incompatible libraries to work with gevent, it is best to use SuPPort's gevent-based threading capabilities.
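For contrast, here is the difference in a few lines. The commented-out call is gevent's standard monkeypatching entry point, gevent.monkey.patch_all(); the explicit style that SuPPort favors simply imports gevent's cooperative socket module directly.

# Implicit, monkeypatched style (what SuPPort avoids):
#   from gevent import monkey; monkey.patch_all()
#   import socket  # the stdlib socket module is now silently gevent-ified

# Explicit style: reach for gevent's cooperative primitives directly.
from gevent import socket

conn = socket.create_connection(('example.com', 80), timeout=2.0)
conn.sendall(b'HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n')
print(conn.recv(1024))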

Using threads with gevent

“Threads? In my gevent? I thought the whole point of greenlets and gevent was to eliminate evil, evil threads!” –Countless strawmen

Originating in Stackless and ported over in 2004 by Armin Rigo (of PyPy fame), greenlets are mature and powerful concurrency primitives. We wanted to add that power to the process- and thread-based world of POSIX. There’s no point running from standard OS capabilities; threads have their place. Many architectures adopt a thread-per-request or process-per-request model, but the last thing we want is the number of threads going up as load increases. Threads are expensive; each thread adds a bit of contention to the mix, and in many environments the memory overhead alone, typically 4-8MB per thread, presents a problem. At just a few kilobytes apiece, greenlet’s microthreads are three orders of magnitude less costly.

Furthermore, thread usage in our architecture is hardly about parallelism; we use worker processes for that. In the SuPPort world, threads are about preemption. Cooperative greenlets are much more efficient overall, but sometimes you really do need guarantees about responsiveness.

One excellent example of how threads provide this responsiveness is the ThreadQueueServer detailed below. But first, there are two built-in Threadpools with decorators worth highlighting, io_bound and cpu_bound:

io_bound

This decorator is primarily used to wrap opaque clients built without affordances for cooperative concurrent IO. We use it to wrap cx_Oracle and other C-based clients that are built for thread-based parallelization. Other major use cases for io_bound are reading from standard input (stdin) and files.
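Here is an illustrative sketch of what an io_bound-style decorator does: it dispatches a blocking call to a thread pool so the gevent event loop stays responsive. The decorator body below is an assumption for illustration, not SuPPort's actual implementation.

import functools
import gevent.threadpool

_io_pool = gevent.threadpool.ThreadPool(maxsize=10)

def io_bound(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # apply() blocks only the calling greenlet, not the whole event loop
        return _io_pool.apply(func, args, kwargs)
    return wrapper

@io_bound
def read_config(path):
    # ordinary blocking file IO, now safe to call from greenlets
    with open(path) as f:
        return f.read()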

A rough sketch of what threads inside a worker look like. The outer box is a process, inner boxes are threads/threadpools, and each text label refers to a coroutine/greenlet.

cpu_bound

The cpu_bound decorator is used to wrap expensive operations that would halt the event loop for too long. We use it to wrap long-running cryptography and serialization tasks, such as decrypting private SSL certificates or loading huge blobs of XML and JSON. Because the majority of use cases’ implementations do not release the Global Interpreter Lock, the cpu_bound ThreadPool is actually just a pool of one thread, to minimize CPU contention from multiple unparallelizable CPU-intensive tasks.

It’s worth noting that some deserialization tasks are not worth the overhead of dispatching to a separate thread, for instance when the data to be deserialized is very short or a result is already cached. For these cases, we have the cpu_bound_if decorator, which conditionally dispatches to the thread, yielding slightly higher responsiveness for low-complexity requests.
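A sketch of the conditional-dispatch idea, with the names and size threshold chosen for illustration rather than taken from SuPPort:

import functools
import json
import gevent.threadpool

# a single thread, since GIL-bound work does not parallelize anyway
_cpu_pool = gevent.threadpool.ThreadPool(maxsize=1)

def cpu_bound_if(predicate):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(arg):
            if predicate(arg):
                return _cpu_pool.apply(func, (arg,))
            return func(arg)  # small input: cheaper to run inline
        return wrapper
    return decorator

@cpu_bound_if(lambda raw: len(raw) > 64 * 1024)
def parse_payload(raw):
    return json.loads(raw)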

Also note that both of these decorators are reentrant, making dispatch idempotent. If you decorate a function that itself eventually calls a decorated function, performance won’t pay the thread dispatch tax twice.

ThreadQueueServer

The ThreadQueueServer exists as an enhanced approach to pulling new connections off of a server’s listening socket. It’s SuPPort’s way of incorporating an industry-standard practice, commonly associated with nginx and Apache, into the gevent WSGI server world.

If you’ve read this far into the post, you’re probably familiar with the standard multi-worker preforking server architecture; a parent process opens a listening socket, forks one or more children that inherit the socket, and the kernel manages which worker gets which incoming client connection.

Preforking architecture

Basic preforking architecture. The OS balances traffic between workers, monitored by an arbiter.

The problem with this approach is that it generally results in inefficient distribution of connections, and can lead to some workers being overloaded while others have cycles to spare. Plus, all worker processes are woken up by the kernel in a race to accept a single inbound connection, in what’s commonly referred to as the thundering herd.

The solution implemented here uses a thread that sleeps on accept, removing connections from the kernel's listen queue as soon as possible, then explicitly pushing accepted connections to the main event loop. The ability to inspect this user-space connection queue enables not only even distribution but also intelligent behavior under high load, such as closing incoming connections when the backlog gets too long. This fail-fast approach prevents the kernel from holding open fully-established connections that cannot be served in a reasonable amount of time. This backpressure takes the wait out of client failure scenarios, leading to a more responsive external-facing system as well.

What’s next for SuPPort

The sections above highlight just a small selection of the features already in SuPPort, and there are many more to cover in future posts. In addition to those, we will also be distilling more code from our internal codebase out into the open. Among these we are particularly excited about:

  • Enhanced human-readable structured logging
  • Advanced network security functionality based on OpenSSL
  • Distributed online statistics collection
  • Additional generalizations for TCP client infrastructure

And of course, more tests! As soon as we get a couple more features distilled out, we'll start porting over more than the skeleton tests we have now. Suffice it to say, we're really looking forward to expanding our set of codified concurrent-software learnings and incorporating as much community feedback as possible, so don't forget to subscribe to the blog and watch the SuPPort repo on GitHub.

Mahmoud Hashemi, Kurt Rose, Mark Williams, and Chris Lane

Template specialization with Krakenjs Apps

By

Template specialization is a mechanism to dynamically switch partials in your webpage at render time, based on the context information. Common scenarios where you would need this are when you want different flavors of the same page for:

  • Country specific customization
  • A/B testing different designs
  • Adapting to various devices

PayPal runs into the above cases quite often. Most web applications at PayPal use our open-sourced framework krakenjs on top of express/node.js, so the mechanism for template specialization was very much inspired by the config-driven approach in kraken apps. If you are not familiar with krakenjs, I'd recommend you take a quick peek at the krakenjs github repo and/or generate a kraken app to understand what I mean by 'config-driven approach'. My example will use dust templates to demonstrate the feature.

The main problems to solve were:

  1. A way to specify template maps for a set of context rules. I ended up writing a simple rule-parsing module, Karka, which, given a JSON-based rule spec and the context information at run time, resolves a partial to another partial when the rules match.
  2. A way to integrate the above mentioned rules into the page render workflow in the app, so that the view engine will give a chance to switch the partial in the page if a set of rules match.

After some experiments, I arrived at what I call the 3-step recipe for including specialization in the render workflow. The recipe should be applicable to any view engine (with or without kraken) in express applications. (Check out my talk at JSConf 2014 on how to approach specialization without kraken.)

  1. Intercept the render workflow by adding a wrapper for the view engine.
  2. Using karka (or any rule parser of your choice) generate the template map at request time using the context information, then stash it back into the context.
  3. Using the hook in your templating engine (which lets you know whenever it encounters a partial and gives you an opportunity to customize it before rendering), switch the template if a mapping is found.

If you are using kraken + dustjs, the above recipe has already been implemented and is available ready-made for use. With some simple wiring you can see it working.

Let's see how to wire it up into an app using kraken@1.0 with the following super simple example.

  1. Generate a kraken 1.0 app (The example below uses generator-kraken@1.1.1)
  2. Add a simple karka rule spec into the file ‘config/specialization.json’ in the generated app.
{
     "whoami": [
         {
             "is": "jekyll",
             "when": {
                 "guy.is": "good",
                 "guy.looks": "respectable",
                 "guy.known.for": "philanthropy"
             }
         },
         {
             "is": "hyde",
             "when": {
                 "guy.is": "evil",
                 "guy.looks": "pre-human",
                 "guy.known.for": "violence"
             }
         }
     ]
}

Interpreting the rule spec: ‘whoami’ will be replaced by ‘jekyll’ or ‘hyde’ when the corresponding ‘when’ clause is satisfied.

3. Wire the specialization rule spec to be read by the view engine by adding the following line to the config/config.json file.

"specialization": "import:./specialization"

4. Change your ‘public/templates/index.dust’ to look like the following:

{>"layouts/master" /}

{<body}
 <h1>{@pre type="content" key="greeting"/}</h1>
 {>"whoami" /}
{/body}

Add ‘public/templates/whoami.dust’, ‘public/templates/jekyll.dust’, and ‘public/templates/hyde.dust’:

{! This is whoami.dust !}
<div>
Who AM I ?????
</div>
{! This is jekyll.dust !}
<div>
 I am the good one
</div>
{! This is hyde.dust !}
<div>
 I am the evil one
</div>

5. For the specialization of ‘whoami’ to ‘jekyll’ to kick in, your request context needs to contain the following:

{
    .....
    .....
    guy : {
        is: 'good',
        looks: 'respectable',
        known: {
            for: 'philanthropy'
        }
    }
    ..... 
    .....  
}

So let's try setting this context information in our sample app in controllers/index.js.

 router.get('/', function (req, res) {
     model.guy = {
         is: 'good',
         looks: 'respectable',
         known: {
             for: 'philanthropy'
         }
     };
     res.render('index', model);
 });

What I am doing above is leveraging the model that I pass to express's ‘res.render’ to set the context information needed to see specialization for jekyll.dust. Internally, express merges res.locals and the model data before passing them to the view engine, so you can set the values in ‘res.locals’ instead. You can also try setting the rules for ‘hyde’ in model.guy above to see specialization to hyde.dust.

Now you are ready to see it working. Open a terminal window and start the app.

$ cd path/to/app
$ node .

Hit ‘http://localhost:8000’ in your browser and you will see the specialized partial, per the context information. Here is the sample I created, following the exact same steps above.

The above example does not demonstrate more practical things like styling partials differently, or specializing while doing a client-side render. A more comprehensive example covering those cases is available as well.

PayPal products span multiple devices and hundreds of locales, so specialization could be a clean way to solve customization requirements in the views.