Autoscaling applications @ PayPal


PayPal’s infrastructure has grown tremendously over the past few years, hosting a multitude of applications that serve billions of transactions every year. Our strong belief in decoupling the infrastructure from the app development layer enables each to evolve independently and more quickly. With more than 2,500 internal applications and more than 100k instances across our entire infrastructure, we are committed to ensuring high availability of our applications and optimal resource utilization. In our previous post, we discussed Nyx, an automated remediation system for our infrastructure, in detail. The underlying modular architecture of Nyx is designed to listen to a spectrum of signals from… Read more

Democratizing Experimentation Data for Product Innovations


This post is organized as follows: Introduction to A/B Testing; Next Generation PayPal Experimentation Platform; Next Generation PayPal Experimentation Analytics Platform; Hypothesis Testing; Null Hypothesis; Z-Test; Z-Score; Two-Sample Z-Test; P-Value; P-Value of a Two-Tailed Z-Test; Druid; Druid Post Aggregators; zscore2sample; pvalue2tailedztest; Implementation of Druid Post Aggregator for Two-Sample Z-test; Druid Query Evaluation of Post Aggregators; A Druid Query Example using Zscore and Pvalue Post Aggregators; Next Step in furthering Druid Performance of Post Aggregators; Concluding Remarks; Acknowledgements. Introduction to A/B Testing: A/B testing (also known as experimentation or bucket testing) enables product teams to gain more insight into and understanding of PayPal users,… Read more
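To make the two statistics named in that outline concrete, here is a minimal Python sketch of a two-sample z-score for conversion rates and the p-value of a two-tailed z-test. It is an illustration only, not the Druid post aggregator code from the post: the function names, the choice of an unpooled standard error, and the sample counts are all assumptions.

```python
import math


def two_sample_z_score(successes_a, n_a, successes_b, n_b):
    """Z-score for the difference between two observed conversion rates.

    Assumption: an unpooled standard error,
    sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b).
    Raises ZeroDivisionError if both rates are exactly 0 or 1.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    standard_error = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    return (rate_a - rate_b) / standard_error


def two_tailed_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z.

    Uses the identity 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical experiment: 250 of 1000 treatment users convert,
    # versus 200 of 1000 control users.
    z = two_sample_z_score(250, 1000, 200, 1000)
    print(f"z = {z:.3f}, two-tailed p = {two_tailed_p_value(z):.4f}")
```

A small p-value means a difference at least this large would be unlikely if the null hypothesis (no difference between the two groups) were true.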

Releasing squbs 0.9.0


squbs (pronounced “sqewbs” and rhymes with “cubes”) has already made headlines for its great performance and scalability. squbs 0.9 is the biggest update to squbs yet. It paves the road towards squbs 1.0. The updates in squbs 0.9 cover the following areas: full migration from Spray to Akka HTTP; support for HTTP end-to-end streaming and back-pressure with the new low-level FlowDefinition API for service definitions; ultimate resiliency; lowest-possible latency; a brand-new streaming HTTP client with integrated client configuration and circuit breaker; Java APIs as first-class citizens (besides the solid, powerful Scala API), enabling end-to-end Java use cases; rich set… Read more

Beam Me Up – Profiling a Beam-over-Spark Application


As we move forward with adopting Apache Beam for some of our streaming needs, our Beam applications need to be tested for stability. Such tests aim to ensure that performance does not degrade over time and that applications can maintain desired performance characteristics (e.g., latency) as they run over long periods. When we ran a Beam-over-Spark application (Beam 0.7.0-SNAPSHOT; Spark 1.6.2) for several hours, the batch processing time increased unexpectedly (i.e., regardless of traffic seasonality). In this post we share the steps and methods we used to diagnose the performance degradation we witnessed in our application’s (batch) processing time, a diagnosis which ultimately led… Read more

PayPal bttn for Commerce


In my first rotation of PayPal’s Technology Leadership Program (TLP), I was fortunate enough to work with our Western Europe region out of our Paris office. The team there wanted to tap into the Internet of Things (IoT) market, and with PayPal’s strategic move from just a button on a website to existing across all contexts (including the offline world), it was clear that a physical button that integrates with our Braintree APIs was something worth investigating. After some investigation, we found bttn, a start-up that has an innovative approach to buttons and is based out of Helsinki, Finland.… Read more

DMARC-Related Recommendations Included in NIST Guidance on Trustworthy Email


Another important milestone was recently achieved for Domain-based Message Authentication, Reporting and Conformance (DMARC), one of the PayPal Ecosystem Security team’s major undertakings in making the internet a safer, more secure place. After several years of collaboration with the email security community, the U.S. National Institute of Standards and Technology (NIST) included recommendations for supporting DMARC in NIST’s SP 800-177, Trustworthy Email. SP 800-177 was released in September and is intended to give recommendations and guidelines for enhancing trust in email. While the audience for NIST publications is typically US federal agencies, its guidance does tend to influence other global… Read more

From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World – Part 2


Part 2: Lambda Architecture Meets Reality. Part 1 can be found here. Fast Data: Fast forward to December 2015. We have cross-data-center Kafka clusters, and we have Spark adoption through the roof. All of this, however, was to fix our traditional batch platform. I’m not going to pretend we never thought about real-time stuff. We’d been gearing up toward the Lambda architecture all along, but truly we were not working specifically for the sake of near-real-time analytics. The beauty of our current stack and skill set is that streaming just comes with it. All we needed to do… Read more

Carrier Payments Big Data Pipeline using Apache Storm


Carrier payments is a frictionless payment method enabling users to place charges for digital goods directly on their monthly mobile phone bill. There is no account needed, just the phone number. Payment authorization happens by verification of a four-digit PIN sent via SMS to the user’s mobile phone. After a successful payment transaction, the charges appear on the user’s monthly mobile phone bill. Historically, fraud has been handled on the mobile carrier side through various types of spending caps (daily, weekly, monthly, etc.). While these spending caps were able to keep fraud at bay in the early years, as this… Read more
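As a rough illustration of the spending caps mentioned in that excerpt, the Python sketch below checks a new charge against daily, weekly, and monthly limits over rolling time windows. Everything in it is hypothetical: the cap amounts, the rolling-window interpretation, and the `within_caps` helper are assumptions made for the example, not how any carrier or PayPal implements caps.

```python
from datetime import datetime, timedelta

# Hypothetical caps: rolling window -> maximum total spend in that window.
CAPS = {
    timedelta(days=1): 30.0,    # daily
    timedelta(days=7): 100.0,   # weekly
    timedelta(days=30): 250.0,  # monthly
}


def within_caps(history, amount, now, caps=CAPS):
    """Return True if charging `amount` at `now` keeps every cap satisfied.

    `history` is an iterable of (timestamp, amount) pairs for charges
    already placed on the same phone number.
    """
    for window, limit in caps.items():
        # Sum only the charges that fall inside this rolling window.
        spent = sum(a for ts, a in history if now - window < ts <= now)
        if spent + amount > limit:
            return False
    return True


if __name__ == "__main__":
    now = datetime(2017, 1, 31, 12, 0)
    history = [(now - timedelta(hours=2), 25.0)]
    print(within_caps(history, 5.0, now))   # fits exactly under the daily cap
    print(within_caps(history, 10.0, now))  # would exceed the daily cap
```

Static limits like these are simple to enforce but treat every user the same way, which is the limitation the excerpt alludes to.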

From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World – Part 1


Part 1: Reactive Manifesto’s Invisible Hand. Let me first set up the context for my story. I’ve been with PayPal for five years. I’m an architect. I’m part of the team responsible for the PayPal Tracking domain. Tracking is commonly and historically understood as the measurement of customer visits to web pages. With the customer’s permission, our platform collects all kinds of signals from PayPal web pages, mobile apps and services, for a variety of reasons. Most prominent among them are measuring new product adoption, A/B testing, and fraud analysis. We collect several terabytes of data on our Hadoop systems every day. This is… Read more

Python by the C side


Mahmoud’s note: This will be my last post on the PayPal Engineering blog. If you’ve enjoyed this sort of content, subscribe to my blog or follow me on Twitter. It’s been fun! All the world is legacy code, and there is always another, lower layer to peel away. These realities cause developers around the world to go on regular pilgrimage, from the terra firma of Python to the coasts of C. From zlib to SQLite to OpenSSL, whether pursuing speed, efficiency, or features, the waters are powerful, and often choppy. The good news is, when you’re writing Python, C interactions… Read more