Tag Archives: Enterprise

Python by the C side

By

C shells by the C shoreMahmoud’s note: This will be my last post on the PayPal Engineering blog. If you’ve enjoyed this sort of content subscribe to my blog/pythondoeswhat.com or follow me on Twitter. It’s been fun!

All the world is legacy code, and there is always another, lower layer to peel away. These realities cause developers around the world to go on regular pilgrimage, from the terra firma of Python to the coasts of C. From zlib to SQLite to OpenSSL, whether pursuing speed, efficiency, or features, the waters are powerful, and often choppy. The good news is, when you’re writing Python, C interactions can be a day at the beach.

 

A brief history

As the name suggests, CPython, the primary implementation of Python used by millions, is written in C. Python core developers embraced and exposed Python’s strong C roots, taking a traditional tack on portability, contrasting with the “write once, debug everywhere” approach popularized elsewhere. The community followed suit with the core developers, developing several methods for linking to C. Years of these interactions have made Python a wonderful environment for interfacing with operating systems, data processing libraries, and everything the C world has to offer.

This has given us a lot of choices, and we’ve tried all of the standouts:

Approach Vintage Representative User Notable Pros Notable Cons
C extension modules 1991 Standard library Extensive documentation and tutorials. Total control. Compilation, portability, reference management. High C knowledge.
SWIG 1996 crfsuite Generate bindings for many languages at once Excessive overhead if Python is the only target.
ctypes 2003 oscrypto No compilation, wide availability Accessing and mutating C structures cumbersome and error prone.
Cython 2007 gevent, kivy Python-like. Highly mature. High performance. Compilation, new syntax and toolchain.
cffi 2013 cryptography, pypy Ease of integration, PyPy compatibility New/High-velocity.

There’s a lot of history and detail that doesn’t fit into a table, but every option falls into one of three categories:

  1. Writing C
  2. Writing code that translates to C
  3. Writing code that calls into libraries that present a C interface

Each has its merits, so we’ll explore each category, then finish with a real, live, worked example.

Writing C

Python’s core developers did it and so can you. Writing C extensions to Python gives an interface that fits like a glove, but also requires knowing, writing, building, and debugging C. The bugs are much more severe, too, as a segmentation fault that kills the whole process is much worse than a Python exception, especially in an asynchronous environment with hundreds of requests being handled within the same process. Not to mention that the glove is also tailored to CPython, and won’t fit quite right, or at all, in other execution environments.

At PayPal, we’ve used C extensions to speed up our service serialization. And while we’ve solved the build and portability issue, we’ve lost track of our share of references and have moved on from writing straight C extensions for new code.

Translating to C

After years of writing C, certain developers decide that they can do better. Some of them are certainly onto something.

Going Cythonic

Cython is a superset of the Python programming language that has been turning type-annotated Python into C extensions for nearly a decade, longer if you count its predecessor, Pyrex. Apart from its maturity, the points that matters to us are:

  • Every Python file is a valid Cython file, enabling incremental, iterative optimization
  • The generated C is highly portable, building on Windows, Mac, and Linux
  • It’s common practice to check in the generated C, meaning that builders don’t need to have Cython installed.

Not to mention that the generated C often makes use of performance tricks that are too tedious or arcane to write by hand, partially motivated by scientific computing’s constant push. And through all that, Cython code maintains a high level of integration with Python itself, right down to the stack trace and line numbers.

PayPal has certainly benefitted from their efforts through high-performance Cython users like gevent, lxml, and NumPy. While our first go with Cython didn’t stick in 2011, since 2015, all native extensions have been written and rewritten to use Cython. It wasn’t always this way however.

A sip, not a SWIG

An early contributor to Python at PayPal got us started using SWIG, the Simplified Wrapper and Interface Generator, to wrap PayPal C++ infrastructure. It served its purpose for a while, but every modification was a slog compared to more Pythonic techniques. It wasn’t long before we decided it wasn’t our cup of tea.

Long ago SWIG may have rivaled extension modules as Python programmers’ method of choice. These days it seems to suit the needs of C library developers looking for a fast and easy way to wrap their C bindings for multiple languages. It also says something that searching for SWIG usage in Python nets as much SWIG replacement libraries as SWIG usage itself.

Calling into C

So far all our examples have involved extra build steps, portability concerns, and quite a bit of writing languages other than Python. Now we’ll dig into some approaches that more closely match Python’s own dynamic nature: ctypes and cffi.

Both ctypes and cffi leverage C’s Foreign Function Interface (FFI), a sort of low-level API that declares callable entrypoints to compiled artifacts like shared objects (.so files) on Linux/FreeBSD/etc. and dynamic-link libraries (.dll files) on Windows. Shared objects take a bit more work to call, so ctypes and cffi both use libffi, a C library that enables dynamic calls into other C libraries.

Shared libraries in C have some gaps that libffi helps fill. A Linux .so, Windows .dll, or OS X .dylib is only going to provide symbols: a mapping from names to memory locations, usually function pointers. Dynamic linkers do not provide any information about how to use these memory locations. When dynamically linking shared libraries to C code, header files provide the function signatures; as long as the shared library and application are ABI compatible, everything works fine. The ABI is defined by the C compiler, and is usually carefully managed so as not to change too often.

However, Python is not a C compiler, so it has no way to properly call into C even with a known memory location and function signature. This is where libffi comes in. If symbols define where to call the API, and header files define what API to call, libffi translates these two pieces of information into how to call the API. Even so, we still need a layer above libffi that translates native Python types to C and vice versa, among other tasks.

ctypes

ctypes is an early and Pythonic approach to FFI interactions, most notable for its inclusion in the Python standard library.

ctypes works, it works well, and it works across CPython, PyPy, Jython, IronPython, and most any Python runtime worth its salt. Using ctypes, you can access C APIs from pure Python with no external dependencies. This makes it great for scratching that quick C itch, like a Windows API that hasn’t been exposed in the os module. If you have an otherwise small module that just needs to access one or two C functions, ctypes allows you to do so without adding a heavyweight dependency.

For a while, PayPal Python code used ctypes after moving off of SWIG. We found it easier to call into vanilla shared objects built from C++ with an extern C rather than deal with the SWIG toolchain. ctypes is still used incidentally throughout the code for exactly this: unobtrusively calling into certain shared objects that are widely deployed. A great open-source example of this use case is oscrypto, which does exactly this for secure networking. That said, ctypes is not ideal for huge libraries or libraries that change often. Porting signatures from headers to Python code is tedious and error-prone.

cffi

cffi, our most modern approach to C integration, comes out of the PyPy project. They were seeking an approach that would lend itself to the optimization potential of PyPy, and they ended up creating a library that fixes many of the pains of ctypes. Rather than handcrafting Python representations of the function signatures, you simply load or paste them in from C header files.

For all its convenience, cffi’s approach has its limits. C is really almost two languages, taking into account preprocessor macros. A macro performs string replacement, which opens a Fun World of Possibilities, as straightforward or as complicated as you can imagine. cffi’s approach is limited around these macros, so applicability will depend on the library with which you are integrating.

On the plus side, cffi does achieve its stated goal of outperforming ctypes under PyPy, while remaining comparable to ctypes under CPython. The project is still quite young, and we are excited to see where it goes next.

A Tale of 3 Integrations: PKCS11

We promised an example, and we almost made it three.

PKCS11 is a cryptography standard for interacting with many hardware and software security systems. The 200-plus-page core specification includes many things, including the official client interface: A large set of C header-style information. There are a variety of pre-existing bindings, but each device has its own vendor-specific quirks, so what are we waiting for?

Metaprogramming

As stated earlier, ctypes is not great for sprawling interfaces. The drudgery of converting function signatures invites transcription bugs. We somewhat automated it, but the approach was far from perfect.

Our second approach, using cffi, worked well for our first version’s supported feature subset, but unfortunately PKCS11 uses its own CK_DECLARE_FUNCTION macro instead of regular C syntax for defining functions. Therefore, cffi’s approach of skipping #define macros will result in syntactically invalid C code that cannot be parsed. On the other hand, there are other macro symbols which are compiler or operating system intrinsics (e.g. __cplusplus, _WIN32, __linux__). So even if cffi attempted to evaluate every macro, we would immediately runs into problems.

So in short, we’re faced with a hard problem. The PKCS11 standard is a gnarly piece of C. In particular:

  1. Many hundreds of important constant values are created with #define
  2. Macros are defined, then re-defined to something different later on in the same file
  3. pkcs11f.h is included multiple times, even once as the body of a struct

In the end, the solution that worked best was to write up a rigorous parser for the particular conventions used by the slow-moving standard, generate Cython, which generates C, which finally gives us access to the complete client, with the added performance bonus in certain cases. Biting this bullet took all of a day and a half, we’ve been very satisfied with the result, and it’s all thanks to a special trick up our sleeves.

Parsing Expression Grammars

Parsing expression grammars (PEGs) combine the power of a true parser generating an abstract syntax tree, not unlike the one used for Python itself, with the convenience of regular expressions. One might think of PEGs as recursive regular expressions. There are several good libraries for Python, including parsimonious and parsley. We went with the former for its simplicity.

For this application, we defined a two grammars, one for pkcs11f.h and one for pkcs11t.h:

PKCS11F GRAMMAR

    file = ( comment / func / " " )*
    func = func_hdr func_args
    func_hdr = "CK_PKCS11_FUNCTION_INFO(" name ")"
    func_args = arg_hdr " (" arg* " ); #endif"
    arg_hdr = " #ifdef CK_NEED_ARG_LIST" (" " comment)?
    arg = " " type " " name ","? " " comment
    name = identifier
    type = identifier
    identifier = ~"[A-Z_][A-Z0-9_]*"i
    comment = ~"(/\*.*?\*/)"ms

PKCS11T GRAMMAR

    file = ( comment / define / typedef / struct_typedef / func_typedef / struct_alias_typedef / ignore )*
    typedef = " typedef" type identifier ";"
    struct_typedef = " typedef struct" identifier " "? "{" (comment / member)* " }" identifier ";"
    struct_alias_typedef = " typedef struct" identifier " CK_PTR"? identifier ";"
    func_typedef = " typedef CK_CALLBACK_FUNCTION(CK_RV," identifier ")(" (identifier identifier ","? comment?)* " );"    member = identifier identifier array_size? ";" comment?
    array_size = "[" ~"[0-9]"+ "]"
    define = "#define" identifier (hexval / decval / " (~0UL)" / identifier / ~" \([A-Z_]*\|0x[0-9]{8}\)" )
    hexval = ~" 0x[A-F0-9]{8}"i
    decval = ~" [0-9]+"
    type = " unsigned char" / " unsigned long int" / " long int" / (identifier " CK_PTR") / identifier
    identifier = " "? ~"[A-Z_][A-Z0-9_]*"i
    comment = " "? ~"(/\*.*?\*/)"ms
    ignore = ( " #ifndef" identifier ) / " #endif" / " "

Short, but dense, in true grammatical style. Looking at the whole program, it’s a straightforward process:

  1. Apply the grammars to the header files to get our abstract syntax tree.
  2. Walk the AST and sift out the semantically important pieces, function signatures in our case.
  3. Generate code from the function signature data structures.

Using only 200 lines of code to bring such a massive standard to bear, along with the portability and performance of Cython, through the power of PEGs ranks as one of the high points of Python in practice at PayPal.

Wrapping up

It’s been a long journey, but we stayed afloat and we’re happy to have made it. To recap:

  • Python and C are hand-in-glove made for one another.
  • Different C integration techniques have their applications, our stances are:
    • ctypes for dynamic calls to small, stable interfaces
    • cffi for dynamic calls to larger interfaces, especially when targeting PyPy
    • Old-fashioned C extensions if you’re already good at them
    • Cython-based C extensions for the rest
    • SWIG pretty much never
  • Parsing Expression Grammars are great!

All of this encapsulates perfectly why we love Python so much. Python is a great starter language, but it also has serious chops as a systems language and ecosystem. That bottom-to-top, rags-to-riches, books-to-bits story is what makes it the ineffable, incomparable language that it is.

C you around!

Kurt and Mahmoud

Python Packaging at PayPal

By

Year after year, Pythonists all over are churning out more code than ever. People are learning, the ecosystem is flourishing, and everything is running smoothly, right up until packaging. Packaging Python is fundamentally un-Pythonic. It can be a tough lesson to learn, but across all environments and applications, there is no one obvious, right way to deploy. Frankly, it’s hard to think of an area where Python’s Zen applies less.

At PayPal, we write and deploy our fair share of Python, and we wanted to devote a couple minutes to our story and give credit where credit is due. For conclusion seekers, without doubt or further ado: Continuum Analytics’ Anaconda Python distribution has made our lives so much easier. For small- and medium-sized teams, no matter the deployment scale, Anaconda has big implications. But let’s talk about how we got here.

Beginnings

Right now, PayPal Python Infrastructure provides equitable support for Windows, OS X, Linux, and Solaris, supporting various combinations of 32-bit and 64-bit Python 2.6, Python 2.7, and PyPy 5.

Glossing over the primordial days, when Kurt and I started building the Python platform at PayPal, we didn’t know we would be building the first cross-platform stack the company had ever seen. It was December 2012, we just wanted to see every developer unwrap a brand new laptop running PayPal Python services locally.

What ensued was the most intense engineering sprint I had ever experienced. We ported critical functionality previously only available in shared objects we had been calling into with ctypes. Several key parts were available in binary form only and had to be disassembled. But with the New Year, 2013, we were feeling like a whole new stack. All the PayPal-specific parts of our framework were pure-Python and portable. Just needed to install a few open-source libraries, like gevent, greenlet, maybe lxml. Just pip install, right?

Up the hill

In an environment where Python is still a new technology to most, pip is often not available, let alone understood. This learning curve can represent a major hurdle to many. We wanted more people to be able to write Python, and even more to be able to run it, as many places as possible, regardless of whether they were career Pythonists. So with a judicious shake of Python simplicity, we adopted a policy of “vendoring in” all of our core dependencies, including compiled extensions, like gevent.

This model yields somewhat larger repositories, but the benefits outweighed a few extra seconds of clone time. Of all the local development stories, there is still no option more empowering than the fully self-contained repository. Clone and run. A process so seamless, it’s like a miniature demo that goes perfect every time. In a world of multi-hour C++ and Java builds, it might as well be magic.

“So what’s the problem?”

Static builds. Every few months (or every CVE) the Python team would have to sit down to refresh, regression test, and certify a new set of libraries. New libraries were added sparingly, which is great for auditability, but not so great for flexibility. All of this is fine for a tight set of networking, cryptography, and serialization libraries, but no way could we support the dozens of dependencies necessary for machine learning and other advanced Python use cases.

And then came Anaconda. With the Anaconda Python distribution, Continuum is doing effectively what our team had been doing, but for free, for everyone, for hundreds of libraries. Finally, there was a standard option that made Python even simpler for our developers.

Adopting and adapting

As soon as we had the opportunity, we made Anaconda a supported platform for development. From then on, regardless of platform, Python beginners got one of two introductions: Install Anaconda, or visit our shared Jupyter Notebook, also backed by Anaconda.

The good kind of escalation

Today, Anaconda has gone beyond development environments to enable production PayPal machine learning applications for the better part of a year. And it’s doing so with more optimizations than we can shake a stick at, including running all the intensive numerical operations on Intel’s MKL. From now on, Python applications exist on a moving walkway to production perfection.

This was realized through two Anaconda packaging models that work for us. The first preinstalls a complete Anaconda on top of one of PayPal’s base Docker images. This works, and is buzzword-compliant, but for reasons outside the scope of this post, also entails maintaining a single large Docker image with the dependencies of all our downstream users.

As with all packaging, there’s always another way. One alternative approach that has worked well for us involves a little Continuum project known as Miniconda. This minimalist distribution has just enough to make Python and conda work. At build time, our applications package Miniconda, the bzip2 conda archives of the dependencies, and a Python installer, wrapped up with a CalVer filename. At deploy time, we install Miniconda, then conda install the dependencies. No downloads, no compilation, no outside dependencies. The code is only a little longer than the description of the process. Conda envs are more powerful than virtualenvs, and have a better cross-platform, cross-dev/prod story, as well. Developers enjoy the increased control, smaller packages, and applicability across both standard and containerized environments.

Packages to come

As stated in Enterprise Software with Python, packaging and deployment is not the last step. The key to deployment success is uniform, well-specified environments, with minimal variation between development and production. Or use Anaconda and call it good enough! We sincerely thank the Anaconda contributors for their open-source contributions, and hope that their reach spreads to ever more environments and runtimes.

Statistics for Software

By

Software is like a watch: you want a measurement you can read and rely on.Software development begins as a quest for capability, doing what could not be done before. Once that what is achieved, the engineer is left with the how. In enterprise software, the most frequently asked questions are, “How fast?” and more importantly, “How reliable?”

Questions about software performance cannot be answered, or even appropriately articulated, without statistics.

Yet most developers can’t tell you much about statistics. Much like math, statistics simply don’t come up for typical projects. Between coding the new and maintaining the old, who has the time?

Engineers must make the time. I understand fifteen minutes can seem like a big commitment these days, so maybe bookmark it. Insistent TLDR seekers can head for our instrumentation section or straight to the summary.

For the dedicated few, class is in session. It’s time to broach the topic, learn what works, and take the guesswork out of software. A few core practices go a long way in generating meaningful systems analysis. And a few common mistakes set projects way back. This guide aims to lighten software maintenance and speed up future development through answers made possible by the right kinds of applied statistics.

Collection and convention

To begin, let’s consider a target component we want to measure and improve. At PayPal this is often one of our hundreds of HTTP server applications. If you were to look at our internal frameworks, you would find hundreds of code paths instrumented to generate streams of measurements: function execution times, service latencies, request lengths, and response codes, to name a few.

We discuss more about instrumentation below, but for now we assume these measurement points exist and focus on numerical measurements like execution times.

The correct starting point minimizes bias. We assume the least, treating our measurements as values of random variables. So opens the door to the wide world of descriptive statistics, a whole area devoted to describing the behavior of randomness. While this may sound obvious, remember that much of statistics is inferential, dedicated to modeling future outcomes. They may go hand-in-hand, but knowing the difference and proper names drastically speeds up future research. Lesson #1 is that it pays to learn the right statistics terminology.

Measurement is all about balance. Too little data and you might as well have not bothered. Or worse, you make an uninformed decision that takes your service down with it. Then again, too much data can take down a service. And even if you keep an eye on memory consumption, you need to consider the resources required to manage excess data. “Big data” looms large, but it is not a necessary step toward understanding big systems. We want dense, balanced data.

So what tools does the engineer have for balanced measurement? When it comes to metadata, there is never a shortage of volume, dimensions, or techniques. To avoid getting lost, we focus on descriptive summary statistics that can be collected with minimal overhead. With that direction, let’s start out by covering some familiar territory.

Moments of truth

Statistical moments may not sound familiar, but virtually everyone uses them, including you. By age 10, most students can compute an average, otherwise known as the arithmetic mean. Our everyday mean is known to statisticians as the first moment. The mean is nice to calculate and can be quite enlightening, but there’s a reason this post doesn’t end here.

The mean is just the most famous of four commonly used moments. These moments are used to create a progression that adds up to a standard data description:

  1. Mean – The central tendency of the data
  2. Variance – The looseness of the grouping around the mean
  3. Skewness – The symmetry or bias toward one end of the scale
  4. Kurtosis – The amount of data in “the wings” of the set
Four charts showing the mean, variance, skewness, and kurtosis.

The mean, variance, skewness, and kurtosis. In all images blue is less than yellow is less than pink.

These four standardized moments represent the most widely-applied metrics for describing the shape of a distribution. Each successive measure adds less practical information than the last, so skewness and kurtosis are often omitted. And while many are just hearing about these measures for the first time, omission may not be a bad thing.

The mean, variance, skewness, and kurtosis are almost never the right tools for a performance-minded engineer. Moment-based measures are not trustworthy messengers of the critical engineering metadata we seek.

Moment-based measures are not robust. Non-robust statistics simultaneously:

  • Bend to skew by outliers
  • Dilute the meaning of those outliers

An outlier is any data point distant from the rest of the distribution. “Outlier” may sound remote and improbable, but they are everywhere, making moment-based statistics uninformative and even dangerous. Outliers often represent the most critical data for a troubleshooting engineer.

So why do so many still use moments for software? The short answer, if you’ll pardon the pun: momentum. The mean and variance have two advantages: easy implementation and broad usage. In reality, that familiarity leads to damaging assumptions that ignore specifics like outliers. For software performance, the mean and variance are useful only in the context of robust statistical measures.

So, enough buildup. Lesson #2 is avoid relying solely on non-robust statistics like the mean to describe your data. Now, what robust techniques can we actually rely on?

Quantile mechanics

If you’ve ever looked at census data or gotten standardized test results, you’re already familiar with quantiles. Quantiles are points that subdivide a dataset’s range into equal parts. Most often, we speak of quartiles (4 parts) and percentiles (100 parts). Measures of central tendency tend to be the most popular, and the same goes for the 50th percentile, the median.

While experienced engineers are happier to see the median than the mean, the data doesn’t really get interesting until we get into the extremes. For software performance and reliability, that means the 95th, 98th, 99th, and 99.9th percentiles. We call these the extremes, but as in high-traffic systems, they are far from uncommon. We also look for the range formed by the minimum and maximum values, sometimes called the 0th and 100th percentiles.

The challenge with quantiles and ranges is efficient computation. For instance, the traditional way to calculate the median is to choose the middle value, or the average of the two middle values, from a sorted set of all the data. Considering all the data at once is the only way to calculate exact quantiles.

Keeping all our data in memory would be prohibitively expensive for our use cases. Instead, using the principle of “good enough”, an engineer has ways of estimating quantiles much more efficiently. Statistics is founded on utilitarian compromise.

Dipping into the stream

As a field, statistics formed to tackle the logistical issue of approximating an unreachable population. Even today, it’s still not possible to poll every person, taste every apple, or drive every car. So statistics continues to provide.

In the realm of software performance, data collection is automatable, to the point of making measurement too automatic. The problem becomes collation, indexing, and storage. Hard problems, replete with hard-working people.

Here we’re trying to make things easier. We want to avoid those hard problems. The easiest way to avoid too much data is to throw data away. Discarded data may not need storage, but if you’re not careful, it will live on, haunting data in the form of biases. We want pristine, meaningful data, indistinguishable from the population, just a whole lot smaller. So statistics provides us random sampling.

Reservoir sampling only holds on to a little bit of data, letting most of it pass through uncollected.

Reservoir sampling only holds on to a little bit of data, letting most of it pass through uncollected.

The twist is that we want to sample from an unknown population, considering only one data point at a time. This use case calls for a special corner of computer science: online algorithms, a subclass of streaming algorithms. “Online” implies only individual points are considered in a single pass. “Streaming” implies the program can only consider a subset of the data at a time, but can work in batches or run multiple passes. Fortunately, Donald Knuth helped popularize an elegant approach that enables random sampling over a stream: Reservoir sampling. Let’s jump right in.

First we designate a counter, which will be incremented for every data point seen. The reservoir is generally a list or array of predefined size. Now we can begin adding data. Until we encounter size elements, elements are added directly to reservoir. Once reservoir is full, incoming data points have a size / counter chance to replace an existing sample point. We never look at the value of a data point and the random chance is guaranteed by definition. This way reservoir is always representative of the dataset as a whole, and is just as likely to have a data point from the beginning as it is from the end. All this, with bounded memory requirements, and very little computation. See the instrumentation section below for links to Python implementations.

In simpler terms, the reservoir progressively renders a scaled-down version of the data, like a fixed-size thumbnail. Reservoir sampling’s ability to handle populations of unknown size fits perfectly with tracking response latency and other metrics of a long-lived server process.

Interpretations

Once you have a reservoir, what are the natural next steps? At PayPal, our standard procedure looks like:

  1. Look at the min, max, and other quantiles of interest (generally median, 95th, 99th, 99.9th percentiles).
  2. Visualize the CDF and histogram to get a sense for the shape of the data, usually by loading the data in a Jupyter notebook and using Pandas, matplotlib, and occasionally bokeh.
Histograms are most of what you need to get a silhouette view of your data.

Reservoirs are perfect for getting that histogram silhouette of your performance data. CDF overlays and QQ plots can help with comparison.

And that’s it really. Beyond this we are usually adding more dimensions, like comparisons over time or between datacenters. Having the range, quantiles, and sampled view of the data really eliminates so much guesswork that we end up saving time. Tighten a timeout, implement a retry, test, and deploy. Maybe add higher-level acceptance criteria like an Apdex score. Regardless of the performance problem, we know when we’ve fixed it and we have the right numbers and charts to back us up.

Reservations

Accuracy is a matter of having the right resolution.

Accuracy is a matter of having the right resolution.

Reservoir sampling does have its shortcomings. In particular, like an image thumbnail, accuracy is only as good as the resolution configured. In some cases, the data near the edges gets a bit blocky. Good implementations of reservoir sampling will already track the maximum and minimum values, but for engineers interested in the edges, we recommend keeping an increased set of the exact outliers. For example, for critical paths, we sometimes explicitly track the n highest response times observed in the last hour.

Depending on your runtime environment, resources may come at a premium. Reservoir sampling requires very little processing power, provided you have an efficient PRNG. Even your Arduino has one of those. But memory costs can pile up. Generally speaking, accuracy scales with the square root of size. Twice as much accuracy will cost you four times as much memory, so there are diminishing returns.

At PayPal, the typical reservoir is allocated 16,384 floating point slots, for a total of 64 kilobytes. At this rate, humans run out of memory before the servers do. Tracking 500 variables only takes 8 megabytes. As a developer, remembering what they all are is a different story.

Transitions

Usually, reservoirs get us what we want and we can get on with non-statistical development. But sometimes, the situation calls for a more tailored approach.

At various points at PayPal, we’ve worked with q-digests, biased quantile estimators, and plenty of other advanced algorithms for handling performance data. After a lot of experimentation, two approaches remain our go-tos, both of which are much simpler than one might presume.

The first approach, histogram counters, establishes ranges of interest, called bins or buckets, based on statistics gathered from a particular reservoir. While reservoir counting is data agnostic, looking only at a random value to decide where to put the data, bucketed counting looks at the value, finds the bucket whose range includes that value, and increments the bucket’s associated counter. The value itself is not stored. The code is simple, and the memory consumption is even lower, but the key advantage is the execution speed. Bucketed counting is so low overhead that it allows statistics collection to permeate much deeper into our code than other algorithms would allow.

The second approach, Piecewise Parabolic Quantile Estimation (P2 for short), is an engineering classic. A product of the 1980s electronics industry, P2 is a pragmatic online algorithm originally designed for simple devices. When we look at a reservoir’s distribution and decide we need more resolution for certain quantiles, P2 lets us specify the quantiles ahead of time, and maintains those values on every single observation. The memory consumption is very low, but due to the math involved, P2 uses more CPU than reservoir sampling and bucketed counting. Furthermore, we’ve never seen anyone attempt combination of P2 estimators, but we assume it’s nontrivial. The good news is that for most distributions we see, our P2 estimators are an order of magnitude more accurate than reservoir sampling.

These approaches both take something learned from the reservoir sample and apply it toward doing less. Histograms provide answers in terms of a preconfigured value ranges, P2 provides answers at preconfigured quantile points of interest. Lesson #3 is to choose your statistics to match the questions you need to answer. If you don’t know what those questions are, stick to general tools like reservoir sampling.

Next steps

There are a lot of ways to combine statistics and engineering, so it’s only fair that we offer some running starts.

Instrumentation

We focused a lot on statistical fundamentals, but how do we generate relevant datasets in the first place? Our answer is through structured instrumentation of our components. With the right hooks in place, the data will be there when we need it, whether we’re staying late to debug an issue or when we have a spare cycle to improve performance.

Much of PayPal’s Python services’ robustness can be credited to a reliable remote logging infrastructure, similar to, but more powerful than, rsyslog. Still, before we can send data upstream, the process must collect internal metrics. We leverage two open-source projects, fast approaching major release:

  • faststat – Optimized statistical accumulators
  • lithoxyl – Next-generation logging and application instrumentation

Lithoxyl is a high-level library designed for application introspection. It’s intended to be the most Pythonic logging framework possible. This includes structured logging and various accumulators, including reservoir sampling, P2, and others. But more importantly, Lithoxyl creates a separate instrumentation aspect for applications, allowing output levels and formats to be managed separately from the instrumentation points themselves.

Faststat operates on a lower level. True to its name, Faststat is a compiled Python extension that implements accumulators for the measures described here and many more. This includes everything from geometric/harmonic means to Markov-like transition tracking to a metametric that tracks the time between stat updates. At just over half a microsecond per point, Faststat’s low-overhead allows it to permeate into some of the deepest depths of our framework code. Faststat lacks output mechanisms of its own, so our internal framework includes a simple web API and UI for browsing statistics, as well as a greenlet that constantly uploads faststat data to a remote accumulation service for alerting and archiving.

One of the many advantages to investing in instrumentation early is that you get a sense for the performance overhead of data collection. Reliability and features far outweigh performance in the enterprise space. Many critical services I’ve worked on could be multiple times faster without instrumentation, but removing this aspect would render them unmaintainable, which brings me to my next point.

Good work takes cycles. All the methods described here are performance-minded, but you have to spend cycles to regain cycles. An airplane could carry more passengers without all those heavy dials and displays up front. It’s not hard to imagine why logging and metrics is, for most services, second only to the features themselves. Always remember and communicate that having to choose between features and instrumentation does not bode well for the reliability record of the application or organization.

For those who really need to move fast and would prefer to reuse or subscribe, there are several promising choices out there, including New Relic and Prometheus. Obviously we have our own systems, but those offerings do have percentiles and histograms.

Expansion

Without detracting from the main course above, this section remains as dessert. Have your fill, but keep in mind that the concepts above fulfill 95% of the needs of an enterprise SOA environment like PayPal.

We focused here on descriptive, non-parametric statistics for numerical applications. Statistics is much more than that. Having the vocabulary means the difference between the right answer and no answer. Here are some areas to expand into, and how they have applied to our work:

Non-parametric statistics gave us quantiles, but offers so much more. Generally, non-parametric describes any statistical construct that does not make assumptions about probability distribution, e.g. normal or binomial. This means it has the most broadly-applicable tools in the descriptive statistics toolbox. This includes everything from the familiar histogram to the sleeker kernel density estimation (KDE). There’s also a wide variety of nonparametric tests aimed at quantitatively discovering your data’s distribution and expanding into the wide world of parametric methods.

Parametric statistics contrast with non-parametric statistics in that the data is presumed to follow a given probability distribution. If you’ve established or assumed that your data can be modeled as one of the many published distributions, you’ve given yourself a powerful set of abstractions with which to reason about your system. We could do a whole article on the probability distributions we expect from different parts of our Python backend services (hint: expect a lot of fish and phones). Teasing apart the curves inherent in your system is quite a feat, but we never drift too far from the real observations. As with any extensive modeling exercise, heed the cautionary song of the black swan.

Inferential statistics contrast with descriptive statistics in that the goal is to develop models and predict future performance. Applying predictive modeling, like regression and distribution fitting, can help you assess whether you are collecting sufficient data, or if you’re missing some metrics. If you can establish a reliable model for your service and hook it into monitoring and alerting, you’ll have reached SRE nirvana. In the meantime, many teams make do with simply overlaying charts with the last week. This is often quite effective, diminishing the need for mathematical inference, but does require constant manual interpretation, doesn’t compose well for longer-term trend analysis, and really doesn’t work when the previous week isn’t representative (i.e., had an outage or a spike).

Categorical statistics contrast with numerical statistics in that the data is not mathematically measurable. Categorical data can be big, such as IPs and phone numbers, or small, like user languages. Our key non-numerical metrics are around counts, or cardinality, of categorical data. Some components have used HyperLogLog and Count-Min sketches for distributable streaming cardinality estimates. While reservoir sampling is much simpler, and can be used for categorical data as well, HLL and CMS offer increased space efficiency, and more importantly: proven error bounds. After grasping reservoir sampling, but before delving into advanced cardinaltiy structures, you may want to have a look at boltons ThresholdCounter, the heavy hitters counter used extensively in PayPal’s Python services. Regardless, be sure to take a look at this ontology of basic statistical data types.

Multivariate statistics allow you to analyze multiple output variables at a time. It’s easy to go overboard with multiple dimensions, as there’s always an extra dimension if you look for it. Nevertheless, a simple, practical exploration of correlations can give you a better sense of your system, as well as inform you as to redundant data collection.

Measurements are like a silhouette, a blanket over real behavior.

Measurements are like a blanket over real behavior.

Multimodal statistics abound in real world data: multiple peaks or multiple distributions packed into a single dataset. Consider response times from an HTTP service:

  • Successful requests (200s) have a “normal” latency.
  • Client failures (400s) complete quickly, as little work can be done with invalid requests.
  • Server errors (500s) can either be very quick (backend down) or very slow (timeouts).

Here we can assume that we have several curves overlaid, with 3 obvious peaks. This exaggerated graph makes it clear that maintaining a single set of summary statistics can do the data great injustice. Two peaks really narrows down the field of effective statistical techniques, and three or more will present a real challenge. There are times when you will want to discover and track datasets separately for more meaningful analysis. Other times it makes more sense to bite the bullet and leave the data mixed.

Time-series statistics transforms measurements by contextualizing them into a single, near-universal dimension: time intervals. At PayPal, time series are used all over, from per-minute transaction and error rates sent to OpenTSDB, to the Python team’s homegrown $PYPL Pandas stock price analysis. Not all data makes sense as a time series. It may be easy to implement certain algorithms over time series streams, but be careful about overapplication. Time-bucketing contorts the data, leading to fewer ways to safely combine samples and more shadows of misleading correlations.

Moving metrics, sometimes called rolling or windowed metrics, are another powerful class of calculation that can combine measurement and time. For instance, the exponentially-weighted moving average (EWMA), famously used by UNIX for its load averages:

$ uptime
10:53PM  up 378 days,  1:01, 3 users, load average: 1.37, 0.22, 0.14

This output packs a lot of information into a small space, and is very cheap to track, but it takes some knowledge and understanding to interpret correctly. EWMA is simultaneously familiar and nuanced. It’s fun to consider whether you want time series-style disrcete buckets or the continuous window of a moving statistic. For instance, do you want the counts for yesterday, or the past 24 hours? Do you want the previous hour or the last 60 minutes? Based on the questions people ask about our applications, PayPal Python services keep few moving metrics, and generally use a lot more time series.

Some survival modeling will make you feel clean.

Even a little survival modeling will make your code cleaner.

Survival analysis is used to analyze the lifetimes of system components, and must make an appearance in any engineering article about reliability. Invaluable for simulations and post-mortem investigations, even a basic understanding of the bathtub curve can provide insight into lifespans of running processes. Failures are rooted in causes at the beginning, middle, and end of expected lifetime, which when overlaid, create a bathtub aggregate curve. When the software industry gets to a point where it leverages this analysis as much as the hardware industry, the technology world will undoubtedly have become a cleaner place.

Summary

It’s been a journey, but I hope you learned something new. If nothing else, you know how we do it. Remember the three lessons:

  1. Statistics is full of very specific terminology and techniques that you may not know that you need. It pays to take time to learn them.
  2. Averages and other moment-based measures don’t cut it for software performance and reliability. Never underestimate the outlier.
  3. Use quantiles, counting, and other robust metrics to generate meaningful data suitable for exploring the specifics of your software.

If you would like to learn more about statistics, especially branching out from descriptive statistics, I recommend Allen Downey‘s video course Data Exploration in Python, as well as his free books Think Stats and Think Bayes. If you would like to learn more about enterprise software development, or have developers in need of training, take a look at my new video course, Enterprise Software with Python. It includes even more software fundamentals informed by PayPal Python development, and is out now on oreilly.com and Safari.

I hope you found this a useful enough guide that you’ll save yourself some time, either by applying the techniques described or subscribing and sharing the link with other developers. Everyone needs a reintroduction to statistics after a few years of real work. Even if you studied statistics in school, real work, real data, and real questions have a way of making one wonder who passed those statistics exams. As interconnectivity increases, a rising tide of robust measurement floats all software engineering boats.

Introducing SuPPort

By

In our last post, Ten Myths of Enterprise Python, we promised a deeper dive into how our Python Infrastructure works here at PayPal and eBay. There is a problem, though. There are only so many details we can cover, and at the end of the day, it’s just so much better to show than to tell.

support_logoSo without further ado, we’re pleased to introduce SuPPort, an in-development distillation of our PayPal Python Infrastructure.

Started in 2010, Python Infrastructure initially powered PayPal’s internal price-setting interfaces, then grew to support payment orchestration interfaces, and now in 2015 it supports dozens of projects at PayPal and eBay, having handled billions of production-critical requests for a wide range of teams and tiers. So what does it mean to distill this functionality into SuPPort?

SuPPort is an event-driven server framework designed for building scalable and maintainable services and clients. It’s built on top of several open-source technologies, so before we dig into the workings of SuPPort, we ought to showcase its foundations:

Some or all of these may be new to many developers, but all-in-all they comprise a powerful set of functionality. With power comes complexity, and while Python as a language strives for technical convergence, there are many ways to approach the problem of developing, scaling, and maintaining components. SuPPort is one way gevent and the libraries above have been used to build functional services and products with anywhere from 100 requests per day to 100 requests per second and beyond.

Enterprise Ideals, Flexible Features

Many motivations have gone into building up a Python stack at PayPal, but as in any enterprise environment, we continuously aim to achieve the following:

Of course organizations of all sizes want these features as well, but the key difference is that large organizations like PayPal usually end up building more. All while demanding a higher degree of redundancy and risk mitigation from their processes. This often results in great cost in terms of both hardware and developer productivity. Fortunately for us, Python can be very efficient in both respects.

So, let’s take a stroll through a selection of SuPPort’s feature set in the context of these criteria! Note that if you’re unfamiliar with evented programming, nonblocking sockets, and gevent in particular, some of this may seem quite foreign. The gevent tutorial is a good entry point for the intermediate Python programmer, which can be supplemented with this well-illustrated introduction to server architectures.

Interoperability

Python usage here at PayPal has spread to virtually every imaginable use case: administrative interfaces, midtier services, operations automation, developer tools, batch jobs; you name it, Python has filled a gap in that area. This legacy has resulted in a few rather interesting abstractions exposed in SuPPort.

BufferedSocket

PayPal has hundreds of services across several tiers. Interoperating between these means having to implement over half a dozen network protocols. The BufferedSocket type eliminated our inevitable code duplication, handling a lot of the nitty-gritty of making a socket into a parser-friendly data source, while retaining timeouts for keeping communications responsive. A must-have primitive for any gevent protocol implementer.

ConnectionManager

Errors happen in live environments. DNS requests fail. Packets are lost. Latency spikes. TCP handshakes are slow. SSL handshakes are slower. Clients rarely handle these problems gracefully. This is why SuPPort includes the ConnectionManager, which provides robust error handling code for all of these cases with consistent logging and monitoring. It also provides a central point of configuration for timeouts and host fallbacks.

Introspectability

As part of a large organization, we can afford to add more machines, and are even required to keep a certain level of redundancy and idle hardware. And while DevOps is catching on in many larger-scale environments, there are many cases in enterprise environments where developers are not allowed to attend to their production code.

SuPPort currently comes with all the same general-purpose introspection capabilities that PayPal Python developers enjoy, meaning that we get you as much structured information about your application as possible without actually requiring login privileges. Of course almost every aspect of this is configurable, to suit a wide variety of environments from development to production.

Context management

Python famously has no global scope: all values are namespaced in module scope. But there are still plenty of aspects of the runtime that are global. Some are out of our control, like the OS-assigned process ID, or the VM-managed garbage collection counters. Others aspects are in our control, and best practice in concurrent programming is to keep these as well-managed as possible.

SuPPort uses a system of Contexts to explicitly manage nonlocal state, eliminating difficult-to-track implicit global state for many core functions. This has the added benefit of creating opportunities to centrally manage and monitor debugging data and statistics (some charts of which are shown below), all made available through the MetaApplication, detailed further down.

Accept SSL Charts

Charting quantiles and recent timings for incoming SSL connections from a remote service.

SuPPort Stats table

Values are also available in table-based and JSON formats, for easy human and machine readability.

MetaApplication

While not exclusively a web server framework, SuPPort leverages its strong roots in the web to provide both a web-based user interface and API full of useful runtime information.

As you can see below, there is a lot of information exposed through this default interface. This is partly because of restricted environments not allowing local login on machines, and another part is the relative convenience of a browser for most developers. Not pictured is the feature that the same information is available in JSON format for easy programmatic consumption. Because this application is such a rich source of information, we recommend using SuPPort to run it on a separate port which can be firewalled accordingly, as seen in this example.

MetaApplication Screenshot #1

A screenshot of the MetaApplication, showing load averages and other basic information, as well as subroutes to further info.

MetaApplication Screenshot 2

Another shot of the MetaApplication, showing process and runtime info

 

Infallibility

At the end of the day, reliability over long periods of time is what earns a stack approval and adoption. At this point, the SuPPort architecture has a billion production requests under its belt here at PayPal, but on the way we put it through the proverbial paces. At various points, we have tested and confirmed several edge behaviors. Here are just a few key characteristics of a well-behaved application:

  • Gracefully sheds traffic under load (no unbounded queues here)
  • Can and has run at 90%+ CPU load for days at a time
  • Is free from framework memory leaks
  • Is robust to memory leakage in user code

To illustrate, a live service handling millions of requests per day had a version of OpenSSL installed which was leaking memory on every handshake. Thanks to preemptive worker cycling on excessive process memory usage, no intervention was required and no customers were impacted. The worker cycling was noted in the logs, the leak was traced to OpenSSL, and operations was notified. The problem was fixed with the next regularly scheduled release rather than being handled as a crisis.

No monkeypatching

One of the first and sometimes only ways that people experience gevent is through monkeypatching. At the top of your main module you issue a call to gevent that automatically swaps out virtually all system libraries with their cooperatively concurrent ones. This sort of magic is relatively rare in Python programming, and rightfully so. Implicit activities like this can have unexpected consequences. SuPPort is a no-monkeypatching approach to gevent. If you want to implement your own network-level code, it is best to use gevent.socket directly. If you want gevent-incompatible libraries to work with gevent, best to use SuPPort’s gevent-based threading capabilities.

Using threads with gevent

“Threads? In my gevent? I thought the whole point of greenlets and gevent was to eliminate evil, evil threads!” –Countless strawmen

Originating in Stackless and ported over in 2004 by Armin Rigo (of PyPy fame), greenlets are mature and powerful concurrency primitives. We wanted to add that power to the process- and thread-based world of POSIX. There’s no point running from standard OS capabilities; threads have their place. Many architectures adopt a thread-per-request or process-per-request model, but the last thing we want is the number of threads going up as load increases. Threads are expensive; each thread adds a bit of contention to the mix, and in many environments the memory overhead alone, typically 4-8MB per thread, presents a problem. At just a few kilobytes apiece, greenlet’s microthreads are three orders of magnitude less costly.

Furthermore, thread usage in our architecture is hardly about parallelism; we use worker processes for that. In the SuPPort world, threads are about preemption. Cooperative greenlets are much more efficient overall, but sometimes you really do need guarantees about responsiveness.

One excellent example of how threads provide this responsiveness is the ThreadQueueServer detailed below. But first, there are two built-in Threadpools with decorators worth highlighting, io_bound and cpu_bound:

io_bound

This decorator is primarily used to wrap opaque clients built without affordances for cooperative concurrent IO. We use this to wrap cx_Oracle and other C-based clients that are built for thread-based parallelization. Other major use cases for io_bound is when getting input from standard input (stdin) and files.

A rough sketch of what threads inside a worker look like. The outer box is a process, inner boxes are threads/threadpools, and each text label refers to a coroutine/greenlet.

A rough sketch of what threads inside a worker look like. The outer box is a process, inner boxes are threads/threadpools, and each text label refers to a coroutine/greenlet.

cpu_bound

The cpu_bound decorator is used to wrap expensive operations that would halt the event loop for too long. We use it to wrap long-running cryptography and serialization tasks, such as decrypting private SSL certificates or loading huge blobs of XML and JSON. Because the majority of use cases’ implementations do not release the Global Interpreter Lock, the cpu_bound ThreadPool is actually just a pool of one thread, to minimize CPU contention from multiple unparallelizable CPU-intensive tasks.

It’s worth noting that some deserialization tasks are not worth the overhead of dispatching to a separate thread. If the data to be deserialized is very short or a result is already cached. For these cases, we have the cpu_bound_if decorator, which conditionally dispatches to the thread, yielding slightly higher responsiveness for low-complexity requests.

Also note that both of these decorators are reentrant, making dispatch idempotent. If you decorate a function that itself eventually calls a decorated function, performance won’t pay the thread dispatch tax twice.

ThreadQueueServer

The ThreadQueueServer exists as an enhanced approach to pulling new connections off of a server’s listening socket. It’s SuPPort’s way of incorporating an industry-standard practice, commonly associated with nginx and Apache, into the gevent WSGI server world.

If you’ve read this far into the post, you’re probably familiar with the standard multi-worker preforking server architecture; a parent process opens a listening socket, forks one or more children that inherit the socket, and the kernel manages which worker gets which incoming client connection.

Preforking architecture

Basic preforking architecture. The OS balances traffic between workers, monitored by an arbiter.

The problem with this approach is that it generally results in inefficient distribution of connections, and can lead to some workers being overloaded while others have cycles to spare. Plus, all worker processes are woken up by the kernel in a race to accept a single inbound connection, in what’s commonly referred to as the thundering herd.

The solution implemented here uses a thread that sleeps on accept, removing connections from the kernel’s listen queue as soon as possible, then explicitly pushing accepted connections to the main event loop. The ability to inspect this user-space connection queue enables not only even distribution but also intelligent behavior under high load, such as closing incoming connections when the backlog gets too long. This fail-fast approach prevents the kernel from holding open fully-established connections that cannot be reached in a reasonable amount of time. This backpressure takes the wait out of client failure scenarios leading to a more responsive extrinsic system, as well.

What’s next for SuPPort

The sections above highlight just a small selection of the features already in SuPPort, and there are many more to cover in future posts. In addition to those, we will also be distilling more code from our internal codebase out into the open. Among these we are particularly excited about:

  • Enhanced human-readable structured logging
  • Advanced network security functionality based on OpenSSL
  • Distributed online statistics collection
  • Additional generalizations for TCP client infrastructure

And of course, more tests! As soon as we get a couple more features distilled out, we’ll start porting out more than the skeleton tests we have now. Suffice to say, we’re really looking forward to expanding our set of codified concurrent software learnings, and incorporating as much community feedback as possible, so don’t forget to subscribe to the blog and the SuPPort repo on GitHub.

Mahmoud Hashemi, Kurt Rose, Mark Williams, and Chris Lane

10 Myths of Enterprise Python

By

2016 Update: Whether you enjoy myth busting, Python, or just all enterprise software, you will also likely enjoy Enterprise Software with Python, presented by the author of the article below, and published by O’Reilly.

PayPal enjoys a remarkable amount of linguistic pluralism in its programming culture. In addition to the long-standing popularity of C++ and Java, an increasing number of teams are choosing JavaScript and Scala, and Braintree‘s acquisition has introduced a sophisticated Ruby community.

One language in particular has both a long history at eBay and PayPal and a growing mindshare among developers: Python.

Python has enjoyed many years of grassroots usage and support from developers across eBay. Even before official support from management, technologists of all walks went the extra mile to reap the rewards of developing in Python. I joined PayPal a few years ago, and chose Python to work on internal applications, but I’ve personally found production PayPal Python code from nearly 15 years ago.

Today, Python powers over 50 projects, including:

  • Features and products, like RedLaser
  • Operations and infrastructure, both OpenStack and proprietary
  • Mid-tier services and applications, like the one used to set PayPal’s prices and check customer feature eligibility
  • Monitoring agents and interfaces, used for several deployment and security use cases
  • Batch jobs for data import, price adjustment, and more
  • And far too many developer tools to count

In the coming series of posts I’ll detail the initiatives and technologies that led the eBay/PayPal Python community to grow from just under 25 engineers in 2011 to over 260 in 2014. For this introductory post, I’ll be focusing on the 10 myths I’ve had to debunk the most in eBay and PayPal’s enterprise environments.

Myth #1: Python is a new language

What with all the start-ups using it and kids learning it these days, it’s easy to see how this myth still persists. Python is actually over 23 years old, originally released in 1991, 5-years before HTTP 1.0 and 4-years before Java. A now-famous early usage of Python was in 1996: Google’s first successful web crawler.

If you’re curious about the long history of Python, Guido van Rossum, Python’s creator, has taken the care to tell the whole story.

Myth #2: Python is not compiled

While not requiring a separate compiler toolchain like C++, Python is in fact compiled to bytecode, much like Java and many other compiled languages. Further compilation steps, if any, are at the discretion of the runtime, be it CPython, PyPy, Jython/JVM, IronPython/CLR, or some other process virtual machine. See Myth #6 for more info.

The general principle at PayPal and elsewhere is that the compilation status of code should not be relied on for security. It is much more important to secure the runtime environment, as virtually every language has a decompiler, or can be intercepted to dump protected state. See the next myth for even more Python security implications.

Myth #3: Python is not secure

Python’s affinity for the lightweight may not make it seem formidable, but the intuition here can be misleading. One central tenet of security is to present as small a target as possible. Big systems are anti-secure, as they tend to overly centralize behaviors, as well as undercut developer comprehension. Python keeps these demons at bay by encouraging simplicity. Furthermore, CPython addresses these issues by being a simple, stable, and easily-auditable virtual machine. In fact, a recent analysis by Coverity Software resulted in CPython receiving their highest quality rating.

Python also features an extensive array of open-source, industry-standard security libraries. At PayPal, where we take security and trust very seriously, we find that a combination of hashlib, PyCrypto, and OpenSSL, via PyOpenSSL and our own custom bindings, cover all of PayPal’s diverse security and performance needs.

For these reasons and more, Python has seen some of its fastest adoption at PayPal (and eBay) within the application security group. Here are just a few security-based applications utilizing Python for PayPal’s security-first environment:

  • Creating security agents for facilitating key rotation and consolidating cryptographic implementations
  • Integrating with industry-leading HSM technologies
  • Constructing TLS-secured wrapper proxies for less-compliant stacks
  • Generating keys and certificates for our internal mutual-authentication schemes
  • Developing active vulnerability scanners

Plus, myriad Python-built operations-oriented systems with security implications, such as firewall and connection management. In the future we’ll definitely try to put together a deep dive on PayPal Python security particulars.

Myth #4: Python is a scripting language

Python can indeed be used for scripting, and is one of the forerunners of the domain due to its simple syntax, cross-platform support, and ubiquity among Linux, Macs, and other Unix machines.

In fact, Python may be one of the most flexible technologies among general-use programming languages. To list just a few:

  1. Telephony infrastructure (Twilio)
  2. Payments systems (PayPal, Balanced Payments)
  3. Neuroscience and psychology (many, many, examples)
  4. Numerical analysis and engineering (numpy, numba, and many more)
  5. Animation (LucasArts, Disney, Dreamworks)
  6. Gaming backends (Eve Online, Second Life, Battlefield, and so many others)
  7. Email infrastructure (Mailman, Mailgun)
  8. Media storage and processing (YouTube, Instagram, Dropbox)
  9. Operations and systems management (Rackspace, OpenStack)
  10. Natural language processing (NLTK)
  11. Machine learning and computer vision (scikit-learn, Orange, SimpleCV)
  12. Security and penetration testing (so many and eBay/PayPal
  13. Big Data (Disco, Hadoop support)
  14. Calendaring (Calendar Server, which powers Apple iCal)
  15. Search systems (ITA, Ultraseek, and Google)
  16. Internet infrastructure (DNS) (BIND 10)

Not to mention web sites and web services aplenty. In fact, PayPal engineers seem to have a penchant for going on to start Python-based web properties, YouTube and Yelp, for instance. For an even bigger list of Python success stories, check out the official list.

Myth #5: Python is weakly-typed

Python’s type system is characterized by strong, dynamic typing. Wikipedia can explain more.

Not that it is a competition, but as a fun fact, Python is more strongly-typed than Java. Java has a split type system for primitives and objects, with null lying in a sort of gray area. On the other hand, modern Python has a unified strong type system, where the type of None is well-specified. Furthermore, the JVM itself is also dynamically-typed, as it traces its roots back to an implementation of a Smalltalk VM acquired by Sun.

Python’s type system is very nice, but for enterprise use there are much bigger concerns at hand.

Myth #6: Python is slow

First, a critical distinction: Python is a programming language, not a runtime. There are several Python implementations:

  1. CPython is the reference implementation, and also the most widely distributed and used.
  2. Jython is a mature implementation of Python for usage with the JVM.
  3. IronPython is Microsoft’s Python for the Common Language Runtime, aka .NET.
  4. PyPy is an up-and-coming implementation of Python, with advanced features such as JIT compilation, incremental garbage collection, and more.

Each runtime has its own performance characteristics, and none of them are slow per se. The more important point here is that it is a mistake to assign performance assessments to a programming languages. Always assess an application runtime, most preferably against a particular use case.

Having cleared that up, here is a small selection of cases where Python has offered significant performance advantages:

  1. Using NumPy as an interface to Intel’s MKL SIMD
  2. PyPy‘s JIT compilation achieves faster-than-C performance
  3. Disqus scales from 250 to 500 million users on the same 100 boxes

Admittedly these are not the newest examples, just my favorites. It would be easy to get side-tracked into the wide world of high-performance Python and the unique offerings of runtimes. Instead of addressing individual special cases, attention should be drawn to the generalizable impact of developer productivity on end-product performance, especially in an enterprise setting.

A comparison of C++ to Python,. The Python is approximately a tenth the size.

C++ vs Python,. Two languages, one output, as apples to apples as it gets.

Given enough time, a disciplined developer can execute the only proven approach to achieving accurate and performant software:

  1. Engineer for correct behavior, including the development of respective tests
  2. Profile and measure performance, identifying bottlenecks
  3. Optimize, paying proper respect to the test suite and Amdahl’s Law, and taking advantage of Python’s strong roots in C.

It might sound simple, but even for seasoned engineers, this can be a very time-consuming process. Python was designed from the ground up with developer timelines in mind. In our experience, it’s not uncommon for Python projects to undergo three or more iterations in the time it C++ and Java to do just one. Today, PayPal and eBay have seen multiple success stories wherein Python projects outperformed their C++ and Java counterparts, with less code (see right), all thanks to fast development times enabling careful tailoring and optimization. You know, the fun stuff.

Myth #7: Python does not scale

Scale has many definitions, but by any definition, YouTube is a web site at scale. More than 1 billion unique visitors per month, over 100 hours of uploaded video per minute, and going on 20 pecent of peak Internet bandwidth, all with Python as a core technology. Dropbox, Disqus, Eventbrite, Reddit, Twilio, Instagram, Yelp, EVE Online, Second Life, and, yes, eBay and PayPal all have Python scaling stories that prove scale is more than just possible: it’s a pattern.

The key to success is simplicity and consistency. CPython, the primary Python virtual machine, maximizes these characteristics, which in turn makes for a very predictable runtime. One would be hard pressed to find Python programmers concerned about garbage collection pauses or application startup time. With strong platform and networking support, Python naturally lends itself to smart horizontal scalability, as manifested in systems like BitTorrent.

Additionally, scaling is all about measurement and iteration. Python is built with profiling and optimization in mind. See Myth #6 for more details on how to vertically scale Python.

Myth #8: Python lacks good concurrency support

Occasionally debunking performance and scaling myths, and someone tries to get technical, “Python lacks concurrency,” or, “What about the GIL?” If dozens of counterexamples are insufficient to bolster one’s confidence in Python’s ability to scale vertically and horizontally, then an extended explanation of a CPython implementation detail probably won’t help, so I’ll keep it brief.

Python has great concurrency primitives, including generators, greenlets, Deferreds, and futures. Python has great concurrency frameworks, including eventlet, gevent, and Twisted. Python has had some amazing work put into customizing runtimes for concurrency, including Stackless and PyPy. All of these and more show that there is no shortage of engineers effectively and unapologetically using Python for concurrent programming. Also, all of these are officially support and/or used in enterprise-level production environments. For examples, refer to Myth #7.

The Global Interpreter Lock, or GIL, is a performance optimization for most use cases of Python, and a development ease optimization for virtually all CPython code. The GIL makes it much easier to use OS threads or green threads (greenlets usually), and does not affect using multiple processes. For more information, see this great Q&A on the topic and this overview from the Python docs.

Here at PayPal, a typical service deployment entails multiple machines, with multiple processes, multiple threads, and a very large number of greenlets, amounting to a very robust and scalable concurrent environment (see figure below). In most enterprise environments, parties tends to prefer a fairly high degree of overprovisioning, for general prudence and disaster recovery. Nevertheless, in some cases Python services still see millions of requests per machine per day, handled with ease.

Sketch of a PayPal Python server worker

A rough sketch of a single worker within our coroutine-based asynchronous architecture. The outermost box is the process, the next level is threads, and within those threads are green threads. The OS handles preemption between threads, whereas I/O coroutines are cooperative.

Myth #9: Python programmers are scarce

There is some truth to this myth. There are not as many Python web developers as PHP or Java web developers. This is probably mostly due to a combined interaction of industry demand and education, though trends in education suggest that this may change.

That said, Python developers are far from scarce. There are millions worldwide, as evidenced by the dozens of Python conferences, tens of thousands of StackOverflow questions, and companies like YouTube, Bank of America, and LucasArts/Dreamworks employing Python developers by the hundreds and thousands. At eBay and PayPal we have hundreds of developers who use Python on a regular basis, so what’s the trick?

Well, why scavenge when one can create? Python is exceptionally easy to learn, and is a first programming language for children, university students, and professionals alike. At eBay, it only takes one week to show real results for a new Python programmer, and they often really start to shine as quickly as 2-3 months, all made possible by the Internet’s rich cache of interactive tutorials, books, documentation, and open-source codebases.

Another important factor to consider is that projects using Python simply do not require as many developers as other projects. As mentioned in Myth #6 and Myth #9, lean, effective teams like Instagram are a common trope in Python projects, and this has certainly been our experience at eBay and PayPal.

Myth #10: Python is not for big projects

Myth #7 discussed running Python projects at scale, but what about developing Python projects at scale? As mentioned in Myth #9, most Python projects tend not to be people-hungry. while Instagram reached hundreds of millions of hits a day at the time of their billion dollar acquisition, the whole company was still only a group of a dozen or so people. Dropbox in 2011 only had 70 engineers, and other teams were similarly lean. So, can Python scale to large teams?

Bank of America actually has over 5,000 Python developers, with over 10 million lines of Python in one project alone. JP Morgan underwent a similar transformation. YouTube also has engineers in the thousands and lines of code in the millions. Big products and big teams use Python every day, and while it has excellent modularity and packaging characteristics, beyond a certain point much of the general development scaling advice stays the same. Tooling, strong conventions, and code review are what make big projects a manageable reality.

Luckily, Python starts with a good baseline on those fronts as well. We use PyFlakes and other tools to perform static analysis of Python code before it gets checked in, as well as adhering to PEP8, Python’s language-wide base style guide.

Finally, it should be noted that, in addition to the scheduling speedups mentioned in Myth #6 and #7, projects using Python generally require fewer developers, as well. Our most common success story starts with a Java or C++ project slated to take a team of 3-5 developers somewhere between 2-6 months, and ends with a single motivated developer completing the project in 2-6 weeks (or hours, for that matter).

A miracle for some, but a fact of modern development, and often a necessity for a competitive business.

A clean slate

Mythology can be a fun pastime. Discussions around these myths remain some of the most active and educational, both internally and externally, because implied in every myth is a recognition of Python’s strengths. Also, remember that the appearance of these seemingly tedious and troublesome concerns is a sign of steadily growing interest, and with steady influx of interested parties comes the constant job of education. Here’s hoping that this post manages to extinguish a flame war and enable a project or two to talk about the real work that can be achieved with Python.

Keep an eye out for future posts where I’ll dive deeper into the details touched on in this overview. If you absolutely must have details before then, or have corrections or comments, shoot me an email at mahmoud@paypal.com. Until then, happy coding!