Using Microservices to Encode and Publish Videos at The New York Times

Video publishing at The Times is growing
For the past 10 years, the video publishing lifecycle at The New York Times has relied on vendors and in-house hardware solutions. With our growing investment in video journalism over the past couple of years, we’ve found ourselves producing more video content every month, along with supporting new initiatives such as 360-degree video and Virtual Reality. This growth has created the need to migrate to a video publishing platform that can adapt to, and keep up with, the fast pace our newsroom demands and the continued evolution of our production process. At the same time, we needed a system that can continuously scale in both capacity and features without compromising on quality or reliability.

A solution
At the beginning of this year, we created a group inside our video engineering team to implement a new solution for ingesting, encoding, publishing, and syndicating our growing library of video content. The team’s main goal was to build a job processing pipeline that is vendor agnostic, cloud-based, highly efficient, elastic, and, of course, reliable. Another goal was to make the system as easy to use as possible, removing any hurdles that might get in the way of our video producers publishing their work and distributing it to our platforms and third-party partners. To do that, we decided to leverage the power of a microservices architecture combined with the benefits of the Go programming language. We named this team Media Factory.

The setup
The first version of our Media Factory encoding pipeline is being used in production by a select group of beta users at The New York Times, and we are actively working with other teams to fully integrate it into our media publishing system. The minimum viable product consists of three parts:

Acquisition: After clipping and editing the videos, our video producers, editors, and partners export a final, high-resolution asset, usually in ProRes 422 format. Our producers then upload the asset to an AWS S3 bucket to get it ready for the transcoding process. We implemented two different upload approaches (a sketch of the server-side flow follows this list):

  1. An internal API that supports multipart uploads, called video-acquisition-api, used by server-side clients such as small scripts or jobs.
  2. A JavaScript wrapper that uses EvaporateJS to upload files directly from the browser, which is integrated with our internal Content Management System (CMS), Scoop.
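
For server-side clients, the upload boils down to streaming the high-resolution master to the acquisition service and letting it stage the asset in S3. Below is a minimal sketch of that flow in Go; the endpoint path, form field name, and URLs are illustrative assumptions, not the actual video-acquisition-api schema.

package main

import (
    "fmt"
    "io"
    "log"
    "mime/multipart"
    "net/http"
    "os"
)

// uploadAsset streams a local high-resolution master to a hypothetical
// acquisition endpoint as a multipart upload. The endpoint path and
// form field name are illustrative, not the real video-acquisition-api.
func uploadAsset(apiURL, path string) error {
    file, err := os.Open(path)
    if err != nil {
        return err
    }
    defer file.Close()

    // Stream the multipart body through a pipe so the whole master
    // never has to fit in memory.
    pr, pw := io.Pipe()
    form := multipart.NewWriter(pw)
    go func() {
        part, err := form.CreateFormFile("asset", path)
        if err != nil {
            pw.CloseWithError(err)
            return
        }
        if _, err := io.Copy(part, file); err != nil {
            pw.CloseWithError(err)
            return
        }
        pw.CloseWithError(form.Close())
    }()

    resp, err := http.Post(apiURL+"/uploads", form.FormDataContentType(), pr)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("upload failed: %s", resp.Status)
    }
    return nil
}

func main() {
    if err := uploadAsset("https://acquisition.example.com", "master.mov"); err != nil {
        log.Fatal(err)
    }
}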

Transcoding: After the acquisition step is complete, we use another microservice called video-transcoding-api to create multiple outputs based on the source file. Currently, we create a single HLS output with six resolutions and bitrates to support adaptive streaming, four different H.264/MP4 outputs, and one VP8/WebM output for the benefit of the 1 percent of our users on the Mozilla Firefox browser running on Microsoft Windows XP.

The transcoding service is by far the most crucial part of our workflow. In order to integrate with cloud-based transcoding providers, we decided to design a tiny wrapper containing provider-specific logic. This design gives us great flexibility. We can schedule and trigger jobs based on a set of parameters such as speed, reliability, current availability, or even the price of the encoding operation for a specific provider. For instance, we can transcode news clips (which are time-sensitive) on the fastest, most expensive encoding service, while simultaneously transcoding live action videos, documentaries, and animations (which are not time-sensitive) using lower-cost providers.
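
Conceptually, each provider integration implements a small common interface, and the scheduler picks an implementation per job. The snippet below is a simplified sketch of that idea in Go; the interface, type names, and selection logic are illustrative, not the actual types exported by the video-transcoding-api.

package provider

// Job describes a transcoding request in provider-agnostic terms.
type Job struct {
    SourceURL string
    Presets   []string // preset names, e.g. "1080p_hls"
}

// Status reports where a previously submitted job stands.
type Status struct {
    ProviderJobID string
    State         string // e.g. "queued", "started", "finished", "failed"
}

// TranscodingProvider is the small surface each provider wrapper
// implements; all provider-specific logic stays behind it.
type TranscodingProvider interface {
    Transcode(job Job) (Status, error)
    JobStatus(providerJobID string) (Status, error)
    CancelJob(providerJobID string) error
}

// Pick chooses a provider for a job. A real scheduler can weigh speed,
// reliability, current availability, or price; here, time-sensitive
// news clips simply go to the fastest registered provider.
func Pick(registry map[string]TranscodingProvider, timeSensitive bool) TranscodingProvider {
    if timeSensitive {
        return registry["fastest"]
    }
    return registry["cheapest"]
}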

Distribution: The transcoding step transfers the final renditions into another AWS S3 bucket. Since we use a content delivery network (CDN) to deliver the video to our end users, we need a final step to transfer the files from S3 to the CDN (leveraging Aspera’s FASP protocol to do so). Once the files are on the CDN, our video journalists are able to publish their content on The New York Times.
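
The distribution step is essentially a fan-out copy: take each rendition the transcoding job produced in the output bucket and hand it to the mechanism that pushes files to the CDN. The Go sketch below shows that shape with a hypothetical transfer service standing in for the Aspera-backed hop; the endpoint, bucket, and keys are illustrative.

package main

import (
    "fmt"
    "log"
    "net/http"
    "net/url"
    "strings"
)

// requestTransfer asks a hypothetical transfer service to move one
// rendition from the output S3 bucket to the CDN origin. In the real
// pipeline this hop is backed by Aspera's FASP protocol.
func requestTransfer(transferAPI, bucket, key string) error {
    form := url.Values{"bucket": {bucket}, "key": {key}}
    resp, err := http.Post(transferAPI+"/transfers",
        "application/x-www-form-urlencoded", strings.NewReader(form.Encode()))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("transfer of %s rejected: %s", key, resp.Status)
    }
    return nil
}

func main() {
    // Renditions produced by the transcoding step for a single video.
    renditions := []string{
        "videos/1234/1080p.m3u8",
        "videos/1234/1080p_00001.ts",
        "videos/1234/720p.mp4",
    }
    for _, key := range renditions {
        if err := requestTransfer("https://transfer.example.com", "video-output", key); err != nil {
            log.Fatal(err)
        }
    }
}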

Giving back to the community
Today, we are open sourcing the video-transcoding-api and the video encoding presets that we use to generate all of our outputs. We are also open sourcing the encoding-wrapper, which contains a set of Go clients for the services we support; these clients are used by the video-transcoding-api.

We believe the format we’ve created will be of particular interest to the open source community. By leveraging the abstractions found in the video-transcoding-api, any developer can write the code necessary to send jobs to any transcoding provider we support without having to rewrite the base preset or the job specification. Sending a job to a different provider is as simple as changing a parameter.

We currently support three popular transcoding providers and plan to add support for more. See a sample preset below, in JSON format:

{
  "providers": ["encodingcom", "elementalconductor", "elastictranscoder"],
  "preset": {
    "name": "1080p_hls",
    "description": "1080p HLS",
    "container": "mp4",
    "profile": "Main",
    "profileLevel": "3.1",
    "rateControl": "VBR",
    "video": {
      "height": "1080",
      "width": "",
      "codec": "h264",
      "bitrate": "3700000",
      "gopSize": "90",
      "gopMode": "fixed",
      "interlaceMode": "progressive"
    },
    "audio": {
      "codec": "aac",
      "bitrate": "64000"
    }
  },
  "outputOptions": {
    "extension": "m3u8",
    "label": "1080p"
  }
}

Our philosophy for presets: “Write once, run anywhere”
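
To make “changing a parameter” concrete, here is a rough Go sketch of submitting the same job to two different providers through an API shaped like the video-transcoding-api; the request path and field names are illustrative rather than the service’s exact schema.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// jobRequest is an illustrative job payload: the source to transcode,
// the presets to apply, and which provider should run the job.
type jobRequest struct {
    Provider string   `json:"provider"`
    Source   string   `json:"source"`
    Presets  []string `json:"presets"`
}

func submit(apiURL string, job jobRequest) error {
    body, err := json.Marshal(job)
    if err != nil {
        return err
    }
    resp, err := http.Post(apiURL+"/jobs", "application/json", bytes.NewReader(body))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("job rejected: %s", resp.Status)
    }
    return nil
}

func main() {
    job := jobRequest{
        Provider: "elastictranscoder",
        Source:   "s3://video-source/master.mov",
        Presets:  []string{"1080p_hls", "720p_mp4"},
    }
    if err := submit("https://transcoding.example.com", job); err != nil {
        log.Fatal(err)
    }

    // Routing the same job to a different provider is just a field change.
    job.Provider = "encodingcom"
    if err := submit("https://transcoding.example.com", job); err != nil {
        log.Fatal(err)
    }
}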

Our future plans
In order to fulfill our vision of having a fully open sourced video encoding and distribution pipeline, we thought it best to also tackle the issue of actually encoding the video. We’re officially taking on the development and maintenance of the open source project Snickers to serve this purpose. We’ll not only gain the freedom of deploying our own encoding service anywhere, but we’ll also be able to experiment with and implement new features that may not be available from existing service providers, and to respond to specific requests from our newsroom. A few examples on the horizon are the automatic generation of thumbnails and accurate audio transcripts.

We’ve also turned our sights to fragmented MP4 (fMP4), and we’ll be investing some time in fully moving to an HLS-first approach for our on-demand videos. In case you missed it, at WWDC 2016 last June, Apple added fMP4 support to the HLS protocol, and now almost all devices and browsers support fMP4 playback natively. This means we can eliminate the overhead of transmuxing MPEG-TS segments into fMP4 on the fly when playing videos in our video player (we use hls.js to do this) and instead just concatenate and play fMP4 fragments from our local buffer.

Lastly, content-driven encoding is a trendy topic within the online video community, especially after the release of VMAF. We are planning to adopt this approach by splitting the content-driven encoding project into two phases:

  1. Classify our content into four different categories, each with its own preset. For example, animation videos, like the ones we have for our Modern Love show, require fewer bits than our high-motion videos, like some of our Times Documentaries, to achieve the same content fidelity.
  2. Create and integrate an additional microservice within the Media Factory pipeline that checks the quality of our outputs using VMAF and triggers new re-encoding jobs with optimized presets (a rough sketch follows this list).
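
Here is a rough Go sketch of that second phase. The scoring and re-encoding calls are hypothetical stand-ins (in practice the score would come from a VMAF tool such as ffmpeg’s libvmaf filter, and the re-encode would be a new video-transcoding-api job); the threshold and preset names are illustrative.

package main

import (
    "fmt"
    "log"
)

// vmafScore is a hypothetical stand-in for a real VMAF measurement,
// e.g. comparing a rendition against its source with ffmpeg's libvmaf
// filter and parsing the resulting score (0-100).
func vmafScore(sourcePath, renditionPath string) (float64, error) {
    return 91.3, nil // placeholder value
}

// reencode is a hypothetical stand-in for submitting a new job to the
// video-transcoding-api with an adjusted preset.
func reencode(sourcePath, preset string) error {
    log.Printf("re-encoding %s with preset %s", sourcePath, preset)
    return nil
}

// checkRendition triggers a re-encode with an optimized preset when a
// rendition falls below the target quality.
func checkRendition(source, rendition, preset string, target float64) error {
    score, err := vmafScore(source, rendition)
    if err != nil {
        return err
    }
    if score >= target {
        fmt.Printf("%s: VMAF %.1f meets target %.1f\n", rendition, score, target)
        return nil
    }
    fmt.Printf("%s: VMAF %.1f below target %.1f, re-encoding\n", rendition, score, target)
    return reencode(source, preset+"_tuned")
}

func main() {
    if err := checkRendition("master.mov", "1080p.mp4", "1080p_hls", 93); err != nil {
        log.Fatal(err)
    }
}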

Come along with us!
Our Media Factory team (Maxwell Dayvson da Silva, Said Ketchman, Thompson Marzagão, Flavio Ribeiro, Francisco Souza and Veronica Yurovsky) believes that these projects will help address the encoding challenges faced by many of you in the online video industry. We hope to receive your crucial feedback and generate contributions from the open source community at large.

Check them out:

https://github.com/NYTimes/video-transcoding-api
https://github.com/NYTimes/video-presets
https://github.com/NYTimes/encoding-wrapper
https://github.com/snickers/snickers

And feel free to ask questions via GitHub Issues in each of the projects!