Releasing squbs 0.9.0

squbs (pronounced “sqewbs” and rhymes with “cubes”) has made headlines in the past (details) for its great performance and scalability. squbs 0.9 is the biggest update to squbs yet, and it paves the road towards squbs 1.0. The updates in squbs 0.9 cover the following areas: full migration from Spray to Akka HTTP; support for HTTP end-to-end streaming and back-pressure with the new low-level FlowDefinition API for service definitions (ultimate resiliency, lowest-possible latency); a brand-new streaming HTTP client with integrated client configuration and circuit breaker; the Java API as a first-class citizen (besides the solid, powerful Scala API), enabling end-to-end Java use cases; rich set… Read more

Beam Me Up – Profiling a Beam-over-Spark Application

As we move forward with adopting Apache Beam for some of our streaming needs, our Beam applications need to be tested for stability. Such tests are aimed at ensuring that performance does not degrade over time and that applications can maintain desired performance characteristics (e.g., latency) as they run over long periods of time. When we ran a Beam-over-Spark application (Beam 0.7.0-SNAPSHOT; Spark 1.6.2) for a period of several hours, the batch processing time kept increasing unexpectedly (i.e., regardless of traffic seasonality). In this post we share the steps and methods we used to diagnose the performance degradation we witnessed in our application’s (batch) processing time, a diagnosis which ultimately led… Read more

PayPal bttn for Commerce

In my first rotation of PayPal’s Technology Leadership Program (TLP), I was fortunate enough to work on our Western Europe region out of our Paris office. The team there wanted to tap into the Internet of Things (IoT) market, and with PayPal’s strategic movement from just a button on a website to existing across all contexts, including the offline world, it was clear that a physical button that integrates with our Braintree APIs was something worth investigating. After some investigation, we found bt.tn, a start-up based out of Helsinki, Finland, that has an innovative approach to buttons.… Read more

DMARC-Related Recommendations Included in NIST Guidance on Trustworthy Email

Another important milestone was recently achieved for Domain-based Message Authentication, Reporting and Conformance (DMARC), one of the PayPal Ecosystem Security team’s major undertakings in making the internet a safer, more secure place. After several years of collaboration with the email security community, the U.S. National Institute of Standards and Technology (NIST) included recommendations for supporting DMARC in NIST’s SP 800-177, Trustworthy Email. SP 800-177 was released in September and is intended to give recommendations and guidelines for enhancing trust in email. While the audience for NIST publications is typically US federal agencies, its guidance does tend to influence other global… Read more

From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World – Part 2

Part 2: Lambda Architecture meets reality. Part 1 can be found here. Fast Data: Fast forward to December 2015. We have cross-data-center Kafka clusters, and Spark adoption is through the roof. All of this, however, was to fix our traditional batch platform. I’m not going to pretend we never thought about real-time stuff. We’d been gearing up toward the Lambda architecture all along, but truly we were not working specifically for the sake of near-real-time analytics. The beauty of our current stack and skill set is that streaming just comes with it. All we needed to do… Read more

Carrier Payments Big Data Pipeline using Apache Storm

Carrier payments is a frictionless payment method enabling users to place charges for digital goods directly on their monthly mobile phone bill. There is no account needed, just the phone number. Payment is authorized by verifying a four-digit PIN sent via SMS to the user’s mobile phone. After a successful payment transaction, the charges appear on the user’s monthly mobile phone bill. Historically, fraud has been handled on the mobile carrier side through various types of spending caps (daily, weekly, monthly, etc.). While these spending caps were able to keep fraud at bay in the early years, as this… Read more

From Big Data to Fast Data in Four Weeks or How Reactive Programming is Changing the World – Part 1

Part 1: Reactive Manifesto’s Invisible Hand. Let me first set up the context for my story. I’ve been with PayPal for 5 years. I’m an architect. I’m part of the team responsible for the PayPal Tracking domain. Tracking is commonly and historically understood as the measurement of customer visits to web pages. With the customer’s permission, our platform collects all kinds of signals from PayPal web pages, mobile apps and services, for a variety of reasons. Most prominent among them are measuring new product adoption, A/B testing, and fraud analysis. We collect several terabytes of data on our Hadoop systems every day. This is… Read more

Python by the C side

Mahmoud’s note: This will be my last post on the PayPal Engineering blog. If you’ve enjoyed this sort of content, subscribe to my blog/pythondoeswhat.com or follow me on Twitter. It’s been fun! All the world is legacy code, and there is always another, lower layer to peel away. These realities cause developers around the world to go on regular pilgrimage, from the terra firma of Python to the coasts of C. From zlib to SQLite to OpenSSL, whether pursuing speed, efficiency, or features, the waters are powerful, and often choppy. The good news is, when you’re writing Python, C interactions… Read more

Spark in Flames – Profiling Spark Applications Using Flame Graphs

When your organization runs multiple jobs on a Spark cluster, resource utilization becomes a priority. Ideally, computations receive sufficient resources to complete in an acceptable time and release resources for other work. In order to make sure applications do not waste any resources, we want to profile their threads to try and spot any problematic code. Common profiling methods are difficult to apply to a distributed application running on a cluster. This post suggests an approach to profiling Spark applications. The form of thread profiling used is sampling – capturing stack traces and aggregating these stack traces into meaningful data, in this case displayed… Read more
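The aggregation step described in this excerpt can be illustrated with a minimal sketch. It is not the code from the post: the function names `fold_stacks` and `to_folded_lines` and the sample stack traces are made up for illustration. It collapses sampled stacks into the "folded" text form (semicolon-separated frames followed by a sample count) that flame graph tools commonly take as input.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical stack traces.

    Each sample is a list of frame names ordered root first, leaf last.
    Returns a Counter mapping 'root;...;leaf' to the number of samples.
    """
    counts = Counter()
    for frames in samples:
        counts[";".join(frames)] += 1
    return counts


def to_folded_lines(counts):
    """Render the counts as 'stack count' lines, sorted by stack."""
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, standing in for stack traces captured from executors.
samples = [
    ["main", "runJob", "compute"],
    ["main", "runJob", "compute"],
    ["main", "runJob", "serialize"],
    ["main", "idle"],
]

folded = fold_stacks(samples)
for line in to_folded_lines(folded):
    print(line)
```

Stacks that show up in many samples get high counts, which is what makes hot code paths stand out once the data is drawn as a flame graph.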

Python Packaging at PayPal

Year after year, Pythonists all over are churning out more code than ever. People are learning, the ecosystem is flourishing, and everything is running smoothly, right up until packaging. Packaging Python is fundamentally un-Pythonic. It can be a tough lesson to learn, but across all environments and applications, there is no one obvious, right way to deploy. Frankly, it’s hard to think of an area where Python’s Zen applies less. At PayPal, we write and deploy our fair share of Python, and we wanted to devote a couple minutes to our story and give credit where credit is due. For… Read more