
Some Random Thoughts on Quantitative Investing, IT, and Education

If you trade stocks, you may have noticed that quantitative investing is getting more and more popular. The free quantitative investing platforms currently available in China, such as RiceQuant (米筐), Uqer (优矿), and JoinQuant (聚宽), all use Python as their programming language. The point I want to make is this: even if you have no intention of working in IT, you should still learn a programming language and become proficient with an operating system. Computers and programming languages are tools that already have a large influence on every industry, and that influence will only grow over time.

I would also like to take this a step further and say a little about the subjects we teach young people. My biggest complaint about today's schools is their goal. As I see it, their goal is to train students, our next generation, into qualified workers; if you can find a job after graduation, they consider their mission accomplished. The education I envision would instead cultivate students into complete human beings: capable, with convictions, aspirations, compassion, and a sense of purpose. I see no sign of this in today's schools.

To survive and develop in society, a person of course needs skills to make a living, so there is nothing wrong with teaching students those skills. The mistake is that schools treat this as education's only task, and the vast majority of schools do exactly that. Nor do I think the top domestic universities are an exception; they merely set a much higher bar for what counts as a "job".

In my view, besides skill training targeted at each student's strengths and interests, everyone should study the following while growing up:
  • How to understand oneself, and the attributes of "human beings" as a species. This involves physiology and the basics of health care (that is, how to maintain one's own health); human psychological traits; how personality develops and forms; how a person interacts with others and with the surrounding environment; the history of human development; and so on.
  • The physical world, or natural environment, in which we live. This involves botany, zoology, astronomy and geography, mathematics, chemistry, physics, electronics, and so on.
  • The society in which we live. Relevant subjects include psychology (social psychology, criminal psychology, etc.), history and culture, politics, economics, and religion. …


Protecting Netflix Viewing Privacy at Scale

On the Open Connect team at Netflix, we are always working to enhance the hardware and software in the purpose-built Open Connect Appliances (OCAs) that store and serve Netflix video content. As we mentioned in a recent company blog post, since the beginning of the Open Connect program we have significantly increased the efficiency of our OCAs – from delivering 8 Gbps of throughput from a single server in 2012 to over 90 Gbps from a single server in 2016. We contribute to this effort on the software side by optimizing every aspect of the software for our unique use case – in particular, focusing on the open source FreeBSD operating system and the NGINX web server that run on the OCAs.


Members of the team will be presenting a technical session on this topic at the Intel Developer Forum (IDF16) in San Francisco this month. This blog introduces some of the work we’ve done.

Adding TLS to Video Streams


In the modern internet world, we have to focus not only on efficiency, but also security. There are many state-of-the-art security mechanisms in place at Netflix, including Transport Layer Security (TLS) encryption of customer information, search queries, and other confidential data. We have always relied on pre-encoded Digital Rights Management (DRM) to secure our video streams. Over the past year, we’ve begun to use Secure HTTP (HTTP over TLS or HTTPS) to encrypt the transport of the video content as well. This helps protect member privacy, particularly when the network is insecure – ensuring that our members are safe from eavesdropping by anyone who might want to record their viewing habits.


Netflix Open Connect serves over 125 million hours of content per day, all around the world. Given our scale, adding the overhead of TLS encryption calculations to our video stream transport had the potential to greatly reduce the efficiency of our global infrastructure. We take this efficiency seriously, so we had to find creative ways to enhance the software on our OCAs to accomplish this objective.


We will describe our work in these three main areas:
  • Determining the ideal cipher for bulk encryption
  • Finding the best implementation of the chosen cipher
  • Exploring ways to improve the data path to and from the cipher implementation


Cipher Evaluation

We evaluated available and applicable ciphers and decided to primarily use the Advanced Encryption Standard (AES) cipher in Galois/Counter Mode (GCM), available starting in TLS 1.2. We chose AES-GCM over the Cipher Block Chaining (CBC) method, which comes at a higher computational cost. The AES-GCM cipher algorithm encrypts and authenticates the message simultaneously – as opposed to AES-CBC, which requires an additional pass over the data to generate keyed-hash message authentication code (HMAC). CBC can still be used as a fallback for clients that cannot support the preferred method.
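As an illustration of the idea, a "prefer GCM, fall back to CBC" policy can be expressed as an OpenSSL-style cipher string. This minimal sketch uses Python's standard-library ssl module purely to show the concept; it is not the OCA software stack:

```python
import ssl

# List AES-GCM suites first; the plain AES suites (CBC mode) remain
# available as a fallback for clients that cannot negotiate GCM.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.set_ciphers("ECDHE+AESGCM:ECDHE+AES")  # OpenSSL cipher-string syntax

# get_ciphers() returns the enabled suites in preference order,
# so the GCM suites appear at the front of the list.
names = [c["name"] for c in ctx.get_ciphers()]
```

The cipher string above is standard OpenSSL syntax, so the same preference ordering can be configured in NGINX via its ssl_ciphers directive.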


All revisions of Open Connect Appliances also have Intel CPUs that support AES-NI, the extension to the x86 instruction set designed to improve encryption and decryption performance.
We needed to determine the best implementation of AES-GCM with the AES-NI instruction set, so we investigated alternatives to OpenSSL, including BoringSSL and the Intel Intelligent Storage Acceleration Library (ISA-L).

Additional Optimizations


Netflix and NGINX had previously worked together to improve our HTTP client request and response time via the use of sendfile calls to perform a zero-copy data flow from storage (HDD or SSD) to network socket, keeping the data in the kernel memory address space and relieving some of the CPU burden. The Netflix team specifically added the ability to make the sendfile calls asynchronous – further reducing the data path and enabling more simultaneous connections.



However, TLS functionality, which requires the data to be passed to the application layer, was incompatible with the sendfile approach.



To retain the benefits of the sendfile model while adding TLS functionality, we designed a hybrid TLS scheme whereby session management stays in the application space, but the bulk encryption is inserted into the sendfile data pipeline in the kernel. This extends sendfile to support encrypting data for TLS/SSL connections.



We also made some important fixes to our earlier data path implementation, including eliminating the need to repeatedly traverse mbuf linked lists to gain addresses for encryption.

Testing and Results


We tested the BoringSSL and ISA-L AES-GCM implementations with our sendfile improvements against a baseline of OpenSSL (with no sendfile changes), under typical Netflix traffic conditions on three different OCA hardware types. Our changes in both the BoringSSL and ISA-L test situations significantly increased both CPU utilization and bandwidth over baseline – increasing performance by up to 30%, depending on the OCA hardware version. We chose the ISA-L cipher implementation, which had slightly better results. With these improvements in place, we can continue the process of adding TLS to our video streams for clients that support it, without suffering prohibitive performance hits.


Read more details in this paper and the follow-up paper. We continue to investigate new and novel approaches to making both security and performance a reality. If this kind of ground-breaking work is up your alley, check out our latest job openings!

By Randall Stewart, Scott Long, Drew Gallatin, Alex Gutarin, and Ellen Livengood


Cache Update Patterns

I've seen quite a few people write cache-update code that deletes the cache entry first and then updates the database, relying on subsequent operations to load the data back into the cache. However, this […]


subprocess: Running Shell Commands from a Python 2.7 Script

Every time I need this I end up re-reading the documentation, so this time I am writing down a summary for future reference. The key tool is the subprocess module.

The simplest case is running a command without caring about its output:

import subprocess
return_code = subprocess.call("ls -ltra", shell=True)

Setting shell=True tells Python to spawn a shell process in the background and run the command inside that shell. This lets you take advantage of shell features, but it is a security risk; only do it when the command string is fully under your control. The command's return code is stored in return_code, while its output goes directly to the screen (stdout).

If you need to capture the command's output, use subprocess.check_output instead of call:

output = subprocess.check_output(['ls', '-l'])
print 'Have %d bytes in output' % len(output)
print output
## if the command fails, python raises a CalledProcessError

With this approach, the command's error output (if any) still appears on the screen. …


Meson: Workflow Orchestration for Netflix Recommendations

At Netflix, our goal is to predict what you want to watch before you watch it. To do this, we run a large number of machine learning (ML) workflows every day. In order to support the creation of these workflows and make efficient use of resources, we created Meson.


Meson is a general purpose workflow orchestration and scheduling framework that we built to manage ML pipelines that execute workloads across heterogeneous systems. It manages the lifecycle of several ML pipelines that build, train and validate personalization algorithms that drive video recommendations.


One of the primary goals of Meson is to increase the velocity, reliability and repeatability of algorithmic experiments while allowing engineers to use the technology of their choice for each of the steps themselves.

Powering Machine Learning Pipelines

Spark, MLlib, Python, R and Docker play an important role in several current generation machine learning pipelines within Netflix.


Let’s take a look at a typical machine learning pipeline that drives video recommendations and how it is represented and handled in Meson.




The workflow involves:
  • Selecting a set of users – This is done via a Hive query to select the cohort for analysis
  • Cleansing / preparing the data – A Python script that creates 2 sets of users for ensuring parallel paths
  • In the parallel paths, one uses Spark to build and analyze a global model with HDFS as temporary storage.
    The other uses R to build region (country) specific models. The number of regions is dynamic, based on the cohort selected for analysis. The Build Regional Model and Validate Regional Model steps in the diagram are repeated for each region (country), expanded at runtime, and executed with a different set of parameters, as shown below
  • Validation – Scala code that tests for the stability of the models when the two paths converge. In this step we also go back and repeat the whole process if the model is not stable.
  • Publish the new model – Fire off a Docker container to publish the new model to be picked up by other production systems




The above picture shows a run in progress for the workflow described above:
  • The user set selection, and cleansing of the data has been completed as indicated by the steps in green.
  • The parallel paths are in progress
    • The Spark branch has completed the model generation and the validation
    • The for-each branch has kicked off 4 different regional models and all of them are in progress (Yellow)
  • The Scala step for model selection is activated (Blue). This indicates that one or more of the incoming branches have completed, but the step is not yet scheduled for execution because some incoming branches either (a) have not started or (b) are still in progress
  • Runtime context and parameters are passed along the workflow for business decisions

Under the Hood

Let’s dive behind the scenes to understand how Meson orchestrates across disparate systems and look at the interplay within different components of the ecosystem. Workflows have a varying set of resource requirements and expectations on total run time. We rely on resource managers like Apache Mesos to satisfy these requirements. Mesos provides task isolation and excellent abstraction of CPU, memory, storage, and other compute resources. Meson leverages these features to achieve scale and fault tolerance for its tasks.


Meson Scheduler
Meson scheduler, which is registered as a Mesos framework, manages the launch, flow control and runtime of the various workflows. Meson delegates the actual resource scheduling to Mesos. Various requirements including memory and CPU are passed along to Mesos. While we do rely on Mesos for resource scheduling, the scheduler is designed to be pluggable, should one choose to use another framework for resource scheduling.


Once a step is ready to be scheduled, the Meson scheduler chooses the right resource offer from Mesos and ships off the task to the Mesos master.

Meson Executor

The Meson executor is a custom Mesos executor. Writing a custom executor allows us to maintain a communication channel with Meson. This is especially useful for long-running tasks, where framework messages can be sent to the Meson scheduler. It also enables us to pass custom data that's richer than just exit codes or status messages.


Once Mesos schedules a Meson task, it launches a Meson executor on a slave after downloading all task dependencies. While the core task is being executed, the executor handles housekeeping chores such as sending heartbeats, percent-complete updates, and status messages.
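The heartbeat pattern described above can be sketched as follows. This is a hypothetical illustration, not the actual Mesos executor API: a side thread reports progress at a fixed interval while the core task runs:

```python
import threading
import time

def run_with_heartbeat(task, report, interval=1.0):
    """Run task.run() while a side thread sends periodic status reports."""
    done = threading.Event()

    def beat():
        # Report until the main task signals completion.
        while not done.wait(interval):
            report({"status": "running", "percent": task.progress()})

    t = threading.Thread(target=beat)
    t.start()
    try:
        return task.run()
    finally:
        done.set()  # stop the heartbeat thread
        t.join()
```

In the real system the report callback would send a framework message back to the scheduler instead of, say, appending to a list.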

DSL

Meson offers a Scala based DSL that allows for easy authoring of workflows. This makes it very easy for developers to use and create customized workflows. Here is how the aforementioned workflow may be defined using the DSL.


val getUsers = Step("Get Users", …)
val wrangleData = Step("Wrangle Data", …)
val regionSplit = Step("For Each Region", …)
val regionJoin = Step("End For Each", …)
val regions = Seq("US", "Canada", "UK_Ireland", "LatAm", …)
val wf = start -> getUsers -> wrangleData ==> (
  trainGlobalModel -> validateGlobalModel,
  regionSplit **(reg = regions) --< (trainRegModel, validateRegModel) >-- regionJoin
) >== selectModel -> validateModel -> end


// If verbs are preferred over operators
val wf = sequence(start, getUsers, wrangleData) parallel {
  sequence(trainGlobalModel, validateGlobalModel)
  sequence(regionSplit,
           forEach(reg = regions) sequence(trainRegModel, validateRegModel) forEach,
           regionJoin)
} parallel sequence(selectModel, validateModel, end)

Extension architecture

Meson was built from the ground up to be extensible, making it easy to add custom steps and extensions. The Spark Submit Step, the Hive Query Step, and Netflix-specific extensions that let us reach out to microservices or other systems like Cassandra are some examples.


In the above workflow, we built a Netflix-specific extension to call out to our Docker execution framework, which enables developers to specify the bare minimum parameters for their Docker images. The extension handles all communication: fetching the status URLs and log messages, and monitoring the state of the Docker process.

Artifacts

Outputs of steps can be treated as first-class citizens within Meson and are stored as artifacts. Retries of a workflow step can be skipped based on the presence or absence of an artifact ID. We can also have custom visualization of artifacts within the Meson UI. For example, if we store feature importance as an artifact as part of a pipeline, we can plug in custom visualizations that allow us to compare the past n days of the feature importance.
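The skip-on-retry behavior can be sketched in a few lines. This is a hypothetical illustration, not Meson's API: a retried step is skipped when its output artifact is already present in the store:

```python
# Hypothetical sketch: skip re-execution of a step on retry when its
# output artifact already exists in the artifact store.
def run_step(artifact_id, compute, store):
    if artifact_id in store:        # artifact present -> skip re-execution
        return store[artifact_id]
    store[artifact_id] = compute()  # run the step and record its artifact
    return store[artifact_id]
```

This is the same memoization idea that makes restarting a long pipeline cheap: completed steps are no-ops on the second pass.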

Mesos Master / Slave

Mesos is used for resource scheduling, with Meson registered as the core framework. Meson's custom Mesos executors are deployed across the slaves. These are responsible for downloading all the jars and custom artifacts, and for sending messages, context, and heartbeats back to the Meson scheduler. Spark jobs submitted from Meson share the same Mesos slaves to run the tasks launched by the Spark job.

Native Spark Support

Supporting Spark natively within Meson was a key requirement and goal. The Spark Submit step within Meson allows monitoring of Spark job progress from within Meson, can retry failed Spark steps, and can kill Spark jobs that may have gone astray. Meson also supports targeting specific Spark versions, thus supporting innovation for users who want to be on the latest version of Spark.


Supporting Spark in a multi-tenant environment via Meson came with an interesting set of challenges. Workflows have a varying set of resource requirements and expectations on total run time. Meson efficiently utilizes the available resources by matching the resource requirements and SLA expectation to a set of Mesos slaves that have the potential to meet the criteria. This is achieved by setting up labels for groups of Mesos slaves and using the Mesos resource attributes feature to target a job to a set of slaves.
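The label-based targeting described above amounts to filtering slaves by their attributes. A hypothetical sketch (not Mesos' actual offer-matching code):

```python
# Hypothetical sketch: pick the slaves whose attribute labels satisfy a
# job's requirements, mirroring the Mesos resource-attributes idea.
def eligible_slaves(required_attrs, slaves):
    return [s["host"] for s in slaves
            if all(s["attrs"].get(k) == v
                   for k, v in required_attrs.items())]
```

In the real system the scheduler would then accept only resource offers coming from the matching group of slaves.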

ML Constructs

As adoption of Meson increased, a class of large-scale parallelization problems emerged, such as parameter sweeps, complex bootstraps, and cross validation.
Meson offers a simple 'for-loop' construct that lets data scientists and researchers express parameter sweeps, allowing them to run tens of thousands of Docker containers across the parameter values. Users of this construct can monitor progress across the thousands of tasks in real time, find failed tasks via the UI, and have logs streamed back to a single place within Meson, making it simple to manage such parallel tasks.
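The core of a parameter sweep is enumerating every combination of values, with each combination becoming one independent task (for example, one Docker container). A hypothetical illustration, not Meson's actual construct:

```python
import itertools

# Hypothetical parameter grid; each combination maps to one task.
learning_rates = [0.01, 0.1]
regularizers = [0.0, 0.5, 1.0]

tasks = [{"lr": lr, "reg": reg}
         for lr, reg in itertools.product(learning_rates, regularizers)]
# 2 x 3 = 6 tasks, one per parameter combination
```

A scheduler like Meson would then fan these tasks out across the cluster and fold the results back into a single place for comparison.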

Conclusion

Meson has been powering hundreds of concurrent jobs across multiple ML pipelines for the past year. It has been a catalyst in enabling innovation for our algorithmic teams thus improving overall recommendations to our members.


We plan to open source Meson in the coming months and build a community around it. If you want to help accelerate the pace of innovation and the open source efforts, join us.


Here are some screenshots of the Meson UI:



Antony Arokiasamy, Kedar Sadekar, Raju Uppalapati, Sathish Sridharan, Prasanna Padmanabhan, Prashanth Raghavan, Faisal Zakaria Siddiqi, Elliot Chow and “a man has no linkedin” (aka Davis Shepherd) for the Meson Team



Some Blog Posts on Linking and Loading

  • COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY (http://www.tenouk.com/ModuleW.html)
  • Linkers, part I, the first of 20 parts (http://www.airs.com/blog/archives/38)
  • Linking, chapter 7 of Computer Systems: A Programmer's Perspective (http://csapp.cs.cmu.edu/2e/ch7-preview.pdf)


Updating a Virtual Machine's Configuration on SmartOS

Here we take updating a SmartOS zone's IP address and gateway as an example.

The command to use is vmadm update UUID < json.file. The relevant configuration has to be supplied in JSON format, which I find a bit unfamiliar. There may be other ways to do this, but I am not aware of them.
