Introducing Aardvark and Repokid

AWS Least Privilege for Distributed, High-Velocity Development

by Jason Chan, Patrick Kelley, and Travis McPeak


Today we are pleased to announce two new open-source cloud security tools from Netflix: Aardvark and Repokid. Used together, these tools are the next logical step in our goal to run a secure, large scale Amazon Web Services (AWS) deployment that accommodates rapid innovation and distributed, high-velocity development. When used together, Aardvark and Repokid help us get closer to the principle of least privilege without sacrificing speed or introducing heavy process. In this blog post we’ll describe the basic problem and why we need tools to solve it, introduce new tools that we’ve developed to tackle the problem, and discuss future improvements to blend the tools seamlessly into our continual operations.

IAM Permissions — Inside the Cockpit

AWS Identity and Access Management (IAM) is a powerful service that allows you to securely configure access to AWS cloud resources. With over 2,500 permissions and counting, IAM gives users fine-grained control over which actions can be performed on a given resource in AWS. However, this level of control introduces complexity, which can make it more difficult for developers. Rather than focusing on getting their application to run correctly they have to switch context to work on knowing the exact AWS permissions the system needs. If they don’t grant necessary permissions, the application will fail. Overly permissive deployments reduce the chances of an application mysteriously breaking, but create unnecessary risk and provide attackers with a large foothold from which they may further penetrate a cloud environment.

Continuous Analysis and Adjustment

Rightsizing Permissions — Autopilot for IAM

In an ideal world every application would be deployed with the exact permissions required. In practice, however, the effort required to determine the precise permissions required for each application in a complicated production environment is prohibitively expensive and doesn’t scale. At Netflix we’ve adopted an approach that we believe balances developer freedom and velocity and security best-practices: access profiling and automated and ongoing right-sizing. We allow developers to deploy their applications with a basic set of permissions and then use profiling data to remove permissions that are demonstrably not used. By continually re-examining our environment and removing unused permissions, our environment converges to least privilege over time.

Introducing Aardvark

AWS provides a service named Access Advisor that shows all of the various AWS services that the policies of an IAM Role permit access to and when (if at all) they were last accessed. Today Access Advisor data is only available in the console, so we created Aardvark to make it easy to retrieve at scale. Aardvark uses PhantomJS to log into the AWS console and retrieve Access Advisor data for all of the IAM Roles in an account. Aardvark stores the latest Access Advisor data in a database and exposes a RESTful API. Aardvark supports threading to retrieve data for multiple accounts simultaneously, and in practice refreshes data for our environment daily in less than 20 minutes.

Introducing Repokid

Repokid uses the data about services used (or not) by a role to remove permissions that a role doesn’t need. It does so by keeping a DynamoDB table with data about each role that it has seen including: policies, count of permissions (total and unused), whether a role is eligible for repo or if it is filtered, and when it was last repoed (“repo” is shortened from repossess — our verb for the act of taking back unused permissions). Filters can be used to exclude a role from repoing if, for example, if it is too young to have been accurately profiled or it is on a user-defined blacklist.

Once a role has been sufficiently profiled, Repokid’s repo feature revises inline policies attached to a role to exclude unused permissions. Repokid also maintains a cache of previous policy versions in case a role needs to be restored to a previous state. The repo feature can be applied to a single role, but is more commonly used to target every eligible role in an account.

Future Work

Currently Repokid uses Access Advisor data (via Aardvark) to make decisions about which services can be removed. Access Advisor data only applies to a service as a whole, so we can’t see which specific service permissions are used. We are planning to extend Repokid profiling by augmenting Access Advisor with CloudTrail. By using CloudTrail data, we can remove individual unused permissions within services that are otherwise required.

We’re also working on using Repokid data to discover permissions which are frequently removed so that we can deploy more restrictive default roles.

Finally, In its current state Repokid keeps basic stats about the total permissions each role has over time, but we will continue to refine metrics and record keeping capabilities.

Extending our Security Automation Toolkit

At Netflix, a core philosophy of the Cloud Security team is the belief that our tools should enable developers to build and operate secure systems as easily as possible. In the past we’ve released tools such as Lemur to make it easy to request and manage SSL certificates, Security Monkey to raise awareness and visibility of common AWS security misconfigurations, Scumblr to discover and manage software security issues, and Stethoscope to assess security across all of a user’s devices. By using these tools, developers are more productive because they can worry less about security details, and our environment becomes more secure because the tools prevent common misconfigurations. With Repokid and Aardvark we are now extending this philosophy and approach to cover IAM roles and permissions.

Stay in touch!

At Netflix we are currently using both of these tools internally to keep role permissions tightened in our environment. We’d love to see how other organizations use these tools and look forward to collaborating on further development.

Introducing Aardvark and Repokid was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Read More

Protecting Netflix Viewing Privacy at Scale

On the Open Connect team at Netflix, we are always working to enhance the hardware and software in the purpose-built Open Connect Appliances (OCAs) that store and serve Netflix video content. As we mentioned in a recent company blog post, since the beginning of the Open Connect program we have significantly increased the efficiency of our OCAs – from delivering 8 Gbps of throughput from a single server in 2012 to over 90 Gbps from a single server in 2016. We contribute to this effort on the software side by optimizing every aspect of the software for our unique use case – in particular, focusing on the open source FreeBSD operating system and the NGINX web server that run on the OCAs.

Members of the team will be presenting a technical session on this topic at the Intel Developer Forum (IDF16) in San Francisco this month. This blog introduces some of the work we’ve done.

Adding TLS to Video Streams

In the modern internet world, we have to focus not only on efficiency, but also security. There are many state-of-the-art security mechanisms in place at Netflix, including Transport Level Security (TLS) encryption of customer information, search queries, and other confidential data. We have always relied on pre-encoded Digital Rights Management (DRM) to secure our video streams. Over the past year, we’ve begun to use Secure HTTP (HTTP over TLS or HTTPS) to encrypt the transport of the video content as well. This helps protect member privacy, particularly when the network is insecure – ensuring that our members are safe from eavesdropping by anyone who might want to record their viewing habits.

Netflix Open Connect serves over 125 million hours of content per day, all around the world. Given our scale, adding the overhead of TLS encryption calculations to our video stream transport had the potential to greatly reduce the efficiency of our global infrastructure. We take this efficiency seriously, so we had to find creative ways to enhance the software on our OCAs to accomplish this objective.

We will describe our work in these three main areas:
  • Determining the ideal cipher for bulk encryption
  • Finding the best implementation of the chosen cipher
  • Exploring ways to improve the data path to and from the cipher implementation

Cipher Evaluation

We evaluated available and applicable ciphers and decided to primarily use the Advanced Encryption Standard (AES) cipher in Galois/Counter Mode (GCM), available starting in TLS 1.2. We chose AES-GCM over the Cipher Block Chaining (CBC) method, which comes at a higher computational cost. The AES-GCM cipher algorithm encrypts and authenticates the message simultaneously – as opposed to AES-CBC, which requires an additional pass over the data to generate keyed-hash message authentication code (HMAC). CBC can still be used as a fallback for clients that cannot support the preferred method.

All revisions of Open Connect Appliances also have Intel CPUs that support AES-NI, the extension to the x86 instruction set designed to improve encryption and decryption performance.
We needed to determine the best implementation of AES-GCM with the AES-NI instruction set, so we investigated alternatives to OpenSSL, including BoringSSL and the Intel Intelligent Storage Acceleration Library (ISA-L).

Additional Optimizations

Netflix and NGINX had previously worked together to improve our HTTP client request and response time via the use of sendfile calls to perform a zero-copy data flow from storage (HDD or SSD) to network socket, keeping the data in the kernel memory address space and relieving some of the CPU burden. The Netflix team specifically added the ability to make the sendfile calls asynchronous – further reducing the data path and enabling more simultaneous connections.

However, TLS functionality, which requires the data to be passed to the application layer, was incompatible with the sendfile approach.

To retain the benefits of the sendfile model while adding TLS functionality, we designed a hybrid TLS scheme whereby session management stays in the application space, but the bulk encryption is inserted into the sendfile data pipeline in the kernel. This extends sendfile to support encrypting data for TLS/SSL connections.

We also made some important fixes to our earlier data path implementation, including eliminating the need to repeatedly traverse mbuf linked lists to gain addresses for encryption.

Testing and Results

We tested the BoringSSL and ISA-L AES-GCM implementations with our sendfile improvements against a baseline of OpenSSL (with no sendfile changes), under typical Netflix traffic conditions on three different OCA hardware types. Our changes in both the BoringSSL and ISA-L test situations significantly increased both CPU utilization and bandwidth over baseline – increasing performance by up to 30%, depending on the OCA hardware version. We chose the ISA-L cipher implementation, which had slightly better results. With these improvements in place, we can continue the process of adding TLS to our video streams for clients that support it, without suffering prohibitive performance hits.

Read more details in this paper and the follow up paper. We continue to investigate new and novel approaches to making both security and performance a reality. If this kind of ground-breaking work is up your alley, check out our latest job openings!

By Randall Stewart, Scott Long, Drew Gallatin, Alex Gutarin, and Ellen Livengood

Read More

ComixWall 4.6 installation demo

ComixWall是一个基于OpenBSD的ISG(互联网安全网关),也可以说是UTM(同意威胁管理)系统, 它集成了数据包过滤,HTTP/SMTP/POP3代理,IDS和IPS以及VPN等等多项网络安全相关应用。本视频演示ComixWall的安装过程。同时,ComixWall项目于12月8号发表声明,表示项目将于今年年终终止,不在发布新的发行版。所以这次我们演示的4.6也将成为最后一个。但我仍然要说这个系统是值得大家尝试和使用的。

Read More


ComixWall是一个基于OpenBSD的ISG(互联网安全网关),也可以说是UTM(同意威胁管理)系统, 它集成了数据包过滤,HTTP/SMTP/POP3代理,IDS和IPS以及VPN等等多项网络安全相关应用。本视频演示ComixWall的安装过程。同时,ComixWall项目于12月8号发表声明,表示项目将于今年年终终止,不在发布新的发行版。所以这次我们演示的4.6也将成为最后一个。但我仍然要说这个系统是值得大家尝试和使用的。

Read More