
Netflix at RecSys 2016 – Recap

A key aspect of Netflix is providing our members with a personalized experience so they can easily find great stories to enjoy. A collection of recommender systems drives the main aspects of this personalized experience, and we continuously research and test new ways to make them better. As such, we were delighted to sponsor and participate in this year’s ACM Conference on Recommender Systems in Boston, which marked the 10th anniversary of the conference. For those who couldn’t attend or want more information, here is a recap of our talks and papers at the conference.
Justin and Yves gave a talk titled “Recommending for the World” on how we prepared our algorithms to work world-wide ahead of our global launch earlier this year. You can also read more about it in our previous blog posts.


Justin also teamed up with Xavier Amatriain, formerly at Netflix and now at Quora, in the special Past, Present, and Future track to offer an industry perspective on the future of recommender systems.

Chao-Yuan Wu presented a paper he authored last year while at Netflix, on how to use navigation information to adapt recommendations within a session as you learn more about user intent.

Yves also shared some pitfalls of distributed learning at the Large Scale Recommender Systems workshop.

Hossein Taghavi gave a presentation at the RecSysTV workshop on trying to balance discovery and continuation in recommendations, which is also the subject of a recent blog post.

Dawen Liang presented some research he conducted prior to joining Netflix on combining matrix factorization and item embedding.

If you are interested in pushing the frontier forward in the recommender systems space, take a look at some of our relevant open positions!


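ZFS on Ubuntu

Starting with 16.04, Ubuntu officially supports ZFS as a data (non-root) filesystem on the AMD64 architecture, but it has to be installed separately. The process is very simple:

    sudo apt install zfs zfsutils-linux

Verify that ZFS is installed and loaded:

    $ lsmod | grep zfs
    zfs       2801664  11
    zunicode   331776   1 zfs
    zcommon     57344   1 zfs
    znvpair     90112   2 zfs,zcommon
    spl        102400   3 zfs,zcommon,znvpair
    zavl        16384   1 zfs

ZFS is a seriously impressive filesystem. Its main features include: snapshots, copy-on-write…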


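IT Training, 实验楼 (Shiyanlou), and More

可达书院 was founded to do (online) IT training. Ever since it was set up, I have been thinking about how online training can be made efficient, as efficient as face-to-face, one-on-one training.

My first idea (around 2008) was to record training videos and let interested people learn by watching them online. The reasoning was that for IT skills, especially hands-on ones, a video explanation is more effective and easier to follow than written text. So, over more than a year, I recorded a set of video tutorials: an absolute beginner's introduction to OpenBSD.

A few years went by before I knew it, and I began to feel that this was not enough. It would work better if you could practice each point while watching the video. So I wanted to change the interface from plain video playback to one with the video on the left and a command-line terminal on the right, so that you could watch and practice at the same time. But I come from a system administration background and programming is not my strong suit, so the idea went unrealized for a long time.

Later I recorded an OpenBSD quick-start video series aimed at people with some Linux/Unix experience.

Then, by chance, I came across the site 实验楼 (Shiyanlou), which felt close to what I had in mind, although its tutorials are mostly text. I registered an account there and took a few courses. Gradually I felt that this approach still fell short, even though it is a clear improvement over just watching videos. For me, the shortcomings are: first, I cannot learn at the pace I want (sometimes I want to skim quickly; sometimes I want to think slowly, or I am not in good form and cannot keep up a normal pace); second, when I am confused there is nobody to point the way.

So I kept thinking: what way of learning (or training) maximizes efficiency?

For me personally, the biggest obstacle to progress is that when you are confused, nobody is there to clear things up. You have to stop, search, think, and go back over earlier material to find out whether you missed something. A long time may pass and you still have not resolved the question. Your learning stalls, or you have to carry on with the question unanswered, and whatever you study afterwards is much less effective.

So how can this be solved?

The solution I can think of so far is to have a one-on-one tutor, or a guide, while you learn. This person roughly knows your experience and how well you understand the target technology, and has practical experience in it. When you have doubts you can ask right away and have them resolved (though this may not work 100% of the time). Face-to-face Q&A is best; real-time video over the network comes next.

Meanwhile, with the arrival of virtualization, cloud computing, and container technology over these years, many cloud- and container-oriented tools and platforms have appeared, such as the somewhat older OpenStack, the newer Kubernetes, Mesos, and fleet, and systems like SmartOS/Triton and CoreOS. These are bound to change how system administrators (or DevOps, or CloudOps) work.

I have been trying to keep up with this trend myself. To learn these new platforms and tools I have watched many videos, read a lot of documentation and e-books, and spent a lot of time tinkering. Along the way I have come to feel that watching videos is not necessarily the best way. Sometimes you need a book, ideally a paper one, so that you can skim ten lines at a glance or go word by word and think slowly; sometimes you need a video, to see what the actual operation looks like (here text is comparatively weak).

One more point: if it is a video, it is best left unedited. When something unexpected happens while I am recording, I can only pause the recording, stop to troubleshoot, and resume once it is fixed. I think this troubleshooting process is often very valuable learning material in itself, because it lets learners understand more deeply how the technology works. It is just that the process can take a long time, which makes it a poor fit for video recording.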


Netflix Chaos Monkey Upgraded

We are pleased to announce a significant upgrade to one of our more popular OSS projects. Chaos Monkey 2.0 is now on GitHub!

Years ago, we decided to improve the resiliency of our microservice architecture.  At our scale it is guaranteed that servers on our cloud platform will sometimes suddenly fail or disappear without warning.  If we don’t have proper redundancy and automation, these disappearing servers could cause service problems.

The Freedom and Responsibility culture at Netflix doesn’t have a mechanism to force engineers to architect their code in any specific way.  Instead, we found that we could build strong alignment around resiliency by taking the pain of disappearing servers and bringing that pain forward.  We created Chaos Monkey to randomly choose servers in our production environment and turn them off during business hours.  Some people thought this was crazy, but we couldn’t depend on the infrequent occurrence to impact behavior.  Knowing that this would happen on a frequent basis created strong alignment among our engineers to build in the redundancy and automation to survive this type of incident without any impact to the millions of Netflix members around the world.

We value Chaos Monkey as a highly effective tool for improving the quality of our service.  Now Chaos Monkey has evolved.  We rewrote the service for improved maintainability and added some great new features.  The evolution of Chaos Monkey is part of our commitment to keep our open source software up to date with our current environment and needs.

Integration with Spinnaker

Chaos Monkey 2.0 is fully integrated with Spinnaker, our continuous delivery platform.
Service owners set their Chaos Monkey configs through the Spinnaker apps, Chaos Monkey gets information about how services are deployed from Spinnaker, and Chaos Monkey terminates instances through Spinnaker.

Since Spinnaker works with multiple cloud backends, Chaos Monkey does as well. In the Netflix environment, Chaos Monkey terminates virtual machine instances running on AWS and Docker containers running on Titus, our container cloud.

Integration with Spinnaker gave us the opportunity to improve the UX as well.  We interviewed our internal customers and came up with a more intuitive method of scheduling terminations.  Service owners can now express a schedule in terms of the mean time between terminations, rather than a probability over an arbitrary period of time.  We also added grouping by app, stack, or cluster, so that applications that have different redundancy architectures can schedule Chaos Monkey appropriate to their configuration. Chaos Monkey now also supports specifying exceptions so users can opt out specific clusters.  Some engineers at Netflix use this feature to opt out small clusters that are used for testing.
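To make the difference between the two scheduling styles concrete, here is a minimal sketch. It assumes that a mean time between terminations of N work days is realized by terminating on any given work day with probability 1/N. That mapping, and the function names, are illustrative assumptions and are not taken from the Chaos Monkey source.

```python
import random


def daily_termination_probability(mean_work_days_between_kills: float) -> float:
    """Assumed mapping: one termination every N work days on average
    corresponds to a per-day termination probability of 1/N."""
    if mean_work_days_between_kills < 1:
        raise ValueError("mean time between terminations must be at least 1 work day")
    return 1.0 / mean_work_days_between_kills


def simulate_mean_gap(mean_work_days_between_kills: float,
                      n_days: int = 200_000,
                      seed: int = 0) -> float:
    """Simulate independent daily coin flips and return the observed
    average number of work days per termination."""
    rng = random.Random(seed)
    p = daily_termination_probability(mean_work_days_between_kills)
    kills = sum(1 for _ in range(n_days) if rng.random() < p)
    return n_days / kills


if __name__ == "__main__":
    # A service owner asks for roughly one termination per work week.
    print(daily_termination_probability(5))
    print(simulate_mean_gap(5))
```

Under this assumption the owner only has to reason about "about once a week", and the per-day probability falls out of the configuration instead of being chosen by hand.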

Chaos Monkey Spinnaker UI

Tracking Terminations

Chaos Monkey can now be configured with trackers. These external services receive a notification when Chaos Monkey terminates an instance. Internally, we use this feature to report metrics into Atlas, our telemetry platform, and Chronos, our event tracking system. The graph below, taken from the Atlas UI, shows the number of Chaos Monkey terminations for a segment of our service. We can see chaos in action. Chaos Monkey even periodically terminates itself.

Chaos Monkey termination metrics in Atlas

Termination Only

Netflix only uses Chaos Monkey to terminate instances.  Previous versions of Chaos Monkey allowed the service to ssh into a box and perform other actions like burning up CPU, taking disks offline, etc.  If you currently use one of the prior versions of Chaos Monkey to run an experiment that involves anything other than turning off an instance, you may not want to upgrade since you would lose that functionality.

Finale

We also used this opportunity to introduce many small features such as automatic opt-out for canaries, cross-account terminations, and automatic disabling during an outage. Find the code on the Netflix GitHub account and embrace the chaos!

-Chaos Engineering Team at Netflix
Lorin Hochstein, Casey Rosenthal


vmm enabled on OpenBSD

As per undeadly.org: With a small commit, OpenBSD now has a hypervisor and virtualization in-tree. This has been a lot of hard work by Mike Larkin, Reyk Flöter, and many others. VMM requires certain hardware features (Intel Nehalem or later,…


To Be Continued: Helping you find shows to continue watching on Netflix

Introduction

Our objective in improving the Netflix recommendation system is to create a personalized experience that makes it easier for our members to find great content to enjoy. The ultimate goal of our recommendation system is to know the exact perfect show for the member and just start playing it when they open Netflix. While we still have a long way to go to achieve that goal, there are areas where we can reduce the gap significantly.

When a member opens the Netflix website or app, she may be looking to discover a new movie or TV show that she never watched before, or, alternatively, she may want to continue watching a partially-watched movie or a TV show she has been binging on. If we can reasonably predict when a member is more likely to be in the continuation mode and which shows she is more likely to resume, it makes sense to place those shows in prominent places on the home page.
While most recommendation work focuses on discovery, in this post we focus on the continuation mode and explain how we used machine learning to improve the member experience for both modes. In particular, we focus on a row called “Continue Watching” (CW) that appears on the Netflix member homepage on most platforms. This row serves as an easy way to find shows that the member has recently (partially) watched and may want to resume. As you can imagine, a significant proportion of member streaming hours are spent on content played from this row.

Continue Watching

Previously, the Netflix app on some platforms displayed a row of recently watched shows (here we use the term show broadly to include all forms of video content on Netflix, including movies and TV series) sorted by how recently each show was last played. Where the row was placed on the page was determined by rules that depended on the device type. For example, the website displayed only a single continuation show in the top-left corner of the page. While these are reasonable baselines, we set out to unify the member experience of the CW row across platforms and improve it along two dimensions:

  • Improve the placement of the row on the page by placing it higher when a member is more likely to resume a show (continuation mode), and lower when a member is more likely to look for a new show to watch (discovery mode)
  • Improve the ordering of recently-watched shows in the row using their likelihood to be resumed in the current session

Intuitively, there are a number of activity patterns that might indicate a member’s likelihood to be in the continuation mode. For example, a member is perhaps likely to resume a show if she:

  • is in the middle of a binge; i.e., has been recently spending a significant amount of time watching a TV show, but hasn’t yet reached its end
  • has partially watched a movie recently
  • has often watched the show around the current time of the day or on the current device

On the other hand, a discovery session is more likely if a member:

  • has just finished watching a movie or all episodes of a TV show
  • hasn’t watched anything recently
  • is new to the service
These hypotheses, along with the high fraction of streaming hours spent by members in continuation mode, motivated us to build machine learning models that can identify and harness these patterns to produce a more effective CW row.

Building a Recommendation Model for Continue Watching

To build a recommendation model for the CW row, we first need to compute a collection of features that extract patterns of the behavior that could help the model predict when someone will resume a show. These may include features about the member, the shows in the CW row, the member’s past interactions with those shows, and some contextual information. We then use these features as inputs to build machine learning models. Through an iterative process of variable selection, model training, and cross validation, we can refine and select the most relevant set of features.

While brainstorming for features, we considered many ideas for building the CW models, including:

  1. Member-level features:
    • Data about member’s subscription, such as the length of subscription, country of signup, and language preferences
    • How active the member has been recently
    • Member’s past ratings and genre preferences
  2. Features encoding information about a show and interactions of the member with it:
    • How recently the show was added to the catalog, or watched by the member
    • How much of the movie/show the member watched
    • Metadata about the show, such as type, genre, and number of episodes; for example kids shows may be re-watched more
    • The rest of the catalog available to the member
    • Popularity and relevance of the show to the member
    • How often members resume this show
  3. Contextual features:
    • Current time of the day and day of the week
    • Location, at various resolutions
    • Devices used by the member


Two applications, two models


As mentioned above, we have two tasks related to organizing a member’s continue watching shows: ranking the shows within the CW row and placing the CW row appropriately on the member’s homepage.

Show ranking


To rank the shows within the row, we trained a model that optimizes a ranking loss function. To train it, we used continuation sessions, i.e., sessions where the member resumed a previously-watched show, from a random set of members. Within each session, the model learns to differentiate amongst candidate shows for continuation and ranks them in order of predicted likelihood of play. When building the model, we placed special importance on having it put the show that was actually played in the first position.

We performed an offline evaluation to understand how well the model ranks the shows in the CW row. Our baseline for comparison was the previous system, where the shows were simply sorted by how recently each was last played. This recency rank is a strong baseline (much better than random) and is also used as a feature in our new model. Comparing the model vs. recency ranking, we observed significant lift in various offline metrics. The figure below displays Precision@1 of the two schemes over time. One can see that the lift in performance is much greater than the daily variation.
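As a reminder of what the metric measures, here is a small sketch of Precision@1: the fraction of sessions in which the top-ranked show is the one the member actually played. The session data and show names below are made up for illustration and do not reflect Netflix data or results.

```python
from typing import List, Tuple

# Each session pairs a ranked list of candidate shows with the show that
# was actually resumed. All values are hypothetical.
Session = Tuple[List[str], str]


def precision_at_1(sessions: List[Session]) -> float:
    """Fraction of sessions whose top-ranked show is the one that was played."""
    if not sessions:
        raise ValueError("need at least one session")
    hits = sum(1 for ranking, played in sessions
               if ranking and ranking[0] == played)
    return hits / len(sessions)


# The same four sessions, ranked once by recency and once by a model.
recency_ranked: List[Session] = [
    (["A", "B", "C"], "A"),
    (["B", "A"], "A"),
    (["C", "B"], "B"),
    (["D", "A"], "D"),
]
model_ranked: List[Session] = [
    (["A", "B", "C"], "A"),
    (["A", "B"], "A"),
    (["C", "B"], "B"),
    (["D", "A"], "D"),
]

baseline = precision_at_1(recency_ranked)
model = precision_at_1(model_ranked)
lift = model - baseline
```

Computing the metric per day for both rankings gives the kind of time series the comparison above refers to.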


This model performed significantly better than recency-based ranking in an A/B test and better matched our expectations for member behavior. As an example, we learned that the members whose rows were ranked using the new model had fewer plays originating from the search page. This meant that many members had been resorting to searching for a recently-watched show because they could not easily locate it on the home page; a suboptimal experience that the model helped ameliorate.


Row placement


To place the CW row appropriately on a member’s homepage, we would like to estimate the likelihood of the member being in a continuation mode vs. a discovery mode. With that likelihood we could take different approaches. A simple approach would be to turn row placement into a binary decision problem where we consider only two candidate positions for the CW row: one position high on the page and another one lower down. By applying a threshold on the estimated likelihood of continuation, we can decide in which of these two positions to place the CW row. That threshold could be tuned to optimize some accuracy metrics. Another approach is to take the likelihood and then map it onto different positions, possibly based on the content at that location on the page. In any case, getting a good estimate of the continuation likelihood is critical for determining the row placement. In the following, we discuss two potential approaches for estimating the likelihood of the member operating in a continuation mode.
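The two placement strategies described above can be sketched as follows. The threshold, the row positions, and the linear mapping are placeholder choices for illustration; the post does not specify the values or the mapping actually used.

```python
def place_by_threshold(p_continuation: float,
                       threshold: float = 0.5,
                       high_position: int = 1,
                       low_position: int = 10) -> int:
    """Binary decision: put the CW row high on the page when the estimated
    likelihood of continuation clears the threshold, lower down otherwise."""
    return high_position if p_continuation >= threshold else low_position


def place_by_mapping(p_continuation: float, n_rows: int = 20) -> int:
    """Alternative: map the likelihood onto a row index, where index 0 is
    the top of the page. Higher likelihood gives a smaller index."""
    if not 0.0 <= p_continuation <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return min(n_rows - 1, int((1.0 - p_continuation) * n_rows))
```

The binary version has a single parameter to tune, while the mapping version lets the position vary smoothly with the estimate.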

Reusing the show-ranking model


A simple approach to estimating the likelihood of continuation vs. discovery is to reuse the scores predicted by the show-ranking model. More specifically, we could calibrate the scores of individual shows in order to estimate the probability P(play(s)=1) that each show s will be resumed in the given session. We can use these individual probabilities over all the shows in the CW row to obtain an overall probability of continuation; i.e., the probability that at least one show from the CW row will be resumed. For example, under a simple assumption of independence of different plays, we can write the probability that at least one show from the CW row will be played as:

$$P(\text{continuation}) = 1 - \prod_{s \in \text{CW}} \bigl(1 - P(\text{play}(s)=1)\bigr)$$
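Under the independence assumption, the probability that at least one show is resumed is one minus the product of the per-show probabilities of not being resumed. A minimal sketch, with a hypothetical function name and made-up calibrated probabilities:

```python
import math
from typing import Iterable


def probability_of_continuation(show_probs: Iterable[float]) -> float:
    """P(at least one show in the CW row is resumed), assuming the plays
    of different shows are independent."""
    probs = list(show_probs)
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("each probability must be in [0, 1]")
    # Probability that none of the shows is resumed.
    p_none = math.prod(1.0 - p for p in probs)
    return 1.0 - p_none


# Hypothetical calibrated per-show probabilities for a three-show row.
row_probability = probability_of_continuation([0.5, 0.2, 0.1])
```

An empty row yields 0, and adding any show with a non-zero probability can only increase the overall estimate.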

Dedicated row model


In this approach, we train a binary classifier to differentiate between continuation sessions as positive labels and sessions where the user played a show for the first time (discovery sessions) as negative labels. Potential features for this model could include member-level and contextual features, as well as the interactions of the member with the most recent shows in the viewing history.
Comparing the two approaches, the first approach is simpler because it only requires having a single model as long as the probabilities are well calibrated. However, the second one is likely to provide a more accurate estimate of continuation because we can train a classifier specifically for it.

Tuning the placement


In our experiments, we evaluated our estimates of continuation likelihood using classification metrics and achieved good offline metrics. However, a challenge that still remains is to find an optimal mapping for that estimated likelihood, i.e., to balance continuation and discovery. In this case, varying the placement creates a trade-off between two types of errors in our prediction: false positives (where we incorrectly predict that the member wants to resume a show from the CW row) and false negatives (where we incorrectly predict that the member wants to discover new content). These two types of errors have different impacts on the member. In particular, a false negative makes it harder for members to continue bingeing on a show. While experienced members can find the show by scrolling down the page or by using the search functionality, the additional friction can make it more difficult for people new to the service. On the other hand, a false positive leads to wasted screen real estate, which could have been used to display more relevant recommendation shows for discovery. Since the impacts of the two types of errors on the member experience are difficult to measure accurately offline, we A/B tested different placement mappings and were able to learn the appropriate value from online experiments leading to the highest member engagement.
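The trade-off between the two error types can be seen by counting them at different thresholds on a set of scored sessions. The sketch below uses made-up scores and labels, and treats "predicted continuation" as the estimated likelihood meeting the threshold; it only illustrates the offline bookkeeping, not how the online A/B tests were run.

```python
from typing import List, Tuple

# (estimated likelihood of continuation, whether the member actually resumed)
ScoredSession = Tuple[float, bool]


def error_counts(sessions: List[ScoredSession],
                 threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) at the given threshold.

    False positive: we predicted continuation but the member did not resume.
    False negative: we predicted discovery but the member did resume."""
    false_positives = sum(1 for p, resumed in sessions
                          if p >= threshold and not resumed)
    false_negatives = sum(1 for p, resumed in sessions
                          if p < threshold and resumed)
    return false_positives, false_negatives


# Hypothetical scored sessions.
scored = [(0.9, True), (0.7, False), (0.6, True), (0.4, True), (0.2, False)]

low = error_counts(scored, 0.1)
mid = error_counts(scored, 0.5)
high = error_counts(scored, 0.8)
```

Lowering the threshold trades false negatives for false positives, which is why the final choice depends on how costly each error is for members.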

Context Awareness


One of our hypotheses was that continuation behavior depends on context: time, location, device, etc. If that is the case, given proper features, the trained models should be able to detect those patterns and adapt the predicted probability of resuming shows based on the current context of a member. For example, members may have habits of watching a certain show around the same time of the day (for example, watching comedies at around 10 PM on weekdays). As an example of context awareness, the following screenshots demonstrate how the model uses contextual features to distinguish between the behavior of a member on different devices. In this example, the profile has just watched a few minutes of the show “Sid the Science Kid” on an iPhone and the show “Narcos” on the Netflix website. In response, the CW model immediately ranks “Sid the Science Kid” at the top position of the CW row on the iPhone, and puts “Narcos” at the first position on the website.

Serving the Row

Members expect the CW row to be responsive and to change dynamically after they watch a show. Moreover, some of the features in the model are time and device dependent and cannot be precomputed, which is the approach we use for some of our other recommendation systems. Therefore, we need to compute the CW row in real time to make sure it is fresh when we get a request for a homepage at the start of a session. To keep it fresh, we also need to update it within a session after certain user interactions and immediately push that update to the client to update the homepage. Computing the row on the fly at our scale is challenging and requires careful engineering. For example, some features are more expensive to compute for users with a longer viewing history, but we need reasonable response times for all members because continuation is a very common scenario. We collaborated with several engineering teams to create a dynamic and scalable way of serving the row that addresses these challenges.

Conclusion

Having a better Continue Watching row clearly makes it easier for our members to jump right back into the content they are enjoying while also getting out of the way when they want to discover something new. While we’ve taken a few steps towards improving this experience, there are still many areas for improvement. One challenge is that we seek to unify how we place this row with respect to the rest of the rows on the homepage, which are predominantly focused on discovery. This is challenging because different algorithms are designed to optimize for different actions, so we need a way to balance them. We also want to be careful not to push CW too much; we want people to “Binge Responsibly” and also explore new content. We also have details to dig into, such as how to determine whether a user has actually finished a show so that we can remove it from the row. This can be complicated by scenarios such as someone turning off their TV but not the playing device, or falling asleep while watching. We also keep an eye out for new ways to use the CW model in other aspects of the product.
Can’t wait to see how the Netflix Recommendation saga continues? Join us in tackling these kinds of algorithmic challenges and help write the next episode.


Submitting to Systems We Love

Are you tantalized by Systems We Love but you don’t know what proposal to submit? For those looking for proposal guidance, my advice is simple: find the love. Just as every presentation title at !!Con must assert its enthusiasm by ending with two bangs, you can think of every talk at Systems We Love as beginning with an implicit “Why I love…” So instead of a lecture on, say, the innards of ZFS (and well you may love ZFS!), pick an angle on ZFS that you particularly love. Why do you love it or what do you love about it? Keep it personal: this isn’t about asserting the dominance of one system—this is about you and a system (or an aspect of a system) that you love.
