A startup's guide to video: part 1 (Getting the video)

As video usage on the internet has exploded over the past decade, we have come across an increasing number of startups looking to leverage this medium in interesting ways. Despite these companies having extremely talented teams, we have noticed a few stumbling blocks that often trip a team up. The purpose of this series and code base is to provide a solid foundation and architecture for companies dealing with video. This is by no means a catch-all, but we do believe there are some common themes that will be helpful regardless of how your product plans to leverage video.

The first thing to realize is that video files are often big, really BIG. This may seem like an obvious fact, but when you start thinking about your architecture this fact will drive many of the best practices listed below.

The first thing we need to do is get the video file from the client to our servers. It is tempting to simply have the user upload the video file to your web server, let the web server do some post-processing, and then put the file somewhere for long-term storage. Don't do this. We discourage this approach for two reasons:

  1. Having large video files flow through your web servers clogs up resources. In the best case this requires you to spin up additional instances to handle the load, and in the worst case it results in out-of-memory errors.

  2. Having to move a file around twice is clearly slower. It may not necessarily be twice as slow if you are streaming the files as they come into your web servers, but no matter what, it is still slower. This just makes your customers sad, and we want to avoid sad customers.

So what is the right approach? The best approach is to upload the files directly from the client to Amazon S3 or some other long-term storage location. There are a couple of ways to get these files to S3, but we suggest using Amazon's fairly new multipart upload support. We prefer this approach over generating a temporary (pre-signed) URL for two specific reasons:

  1. The client can send up multiple parts of the file in parallel resulting in significantly faster uploads.

  2. It is more robust. If one part fails, that part can be retried, rather than retrying the entire file.
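
To make the two points above concrete, here is a minimal sketch of the idea behind multipart uploads: split the file into byte ranges, and retry only the range that failed. The helper names `splitIntoParts` and `retry` are our own for illustration and are not part of any AWS API; the managed uploader used later in this post takes care of this for you.

```javascript
// Hypothetical helper: split a file of `fileSize` bytes into byte ranges of
// at most `partSize` bytes. `end` is exclusive, matching Blob.slice().
function splitIntoParts(fileSize, partSize) {
  var parts = []
  var start

  for (start = 0; start < fileSize; start += partSize) {
    parts.push({
      partNumber: parts.length + 1, // part numbers start at 1
      start: start,
      end: Math.min(start + partSize, fileSize)
    })
  }

  return parts
}

// Hypothetical helper: run `task(callback)` up to `maxAttempts` times and
// stop at the first success. Only the failing task is repeated.
function retry(task, maxAttempts, callback) {
  var attempt = 0

  function run() {
    attempt += 1
    task(function(err, result) {
      if (err && attempt < maxAttempts) {
        return run()
      }
      callback(err, result)
    })
  }

  run()
}
```

Each range can be uploaded independently, which is what allows several parts to be in flight at once and a single failed part to be retried on its own.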

Now that we know what we want to do, let's dive into how we will do it. We will assume you know how to show a file input field in a browser, so let's jump to what happens when the user hits the submit button.

Step 1: Generate temporary tokens

Using Amazon's Security Token Service (STS), we generate temporary credentials for the client. Do not use your root user for this; instead, create a new IAM user with restricted access.

In addition to the credentials, we also generate a unique ID for our key. We do this to avoid any collisions in our S3 bucket.

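  // Runs on the server inside the request handler. `sts` is an STS client
  // created with the restricted IAM user's credentials.
  sts.getSessionToken({}, function(err, data) {
    var payload
    if (err) {
      console.warn('ERROR generating session token', err)
      return res.status(500).send({})
    }

    payload = {
      credentials: data.Credentials,
      key: uuid() // unique object key, avoids collisions in the bucket
    }

    // The client also needs to know where to upload.
    payload.credentials.region = config.aws.region
    payload.credentials.bucket = config.aws.bucket

    res.status(200).json(payload)
  })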
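
As a rough illustration of what "restricted access" could mean for that IAM user, the sketch below builds a policy document that only allows writing objects into a single bucket. The helper name `buildUploadOnlyPolicy` and the bucket name in the usage note are made up for this example, and the exact set of actions is an assumption you should check against your own needs. Credentials returned by `getSessionToken` carry the permissions of the IAM user that requested them, so whatever this user can do, the browser can do too.

```javascript
// Hypothetical helper: build an IAM policy document that only allows
// uploading objects into one bucket (no reads, listing, or deletes).
function buildUploadOnlyPolicy(bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // Assumed minimal set of actions: write objects (this also covers
        // multipart uploads) and clean up aborted multipart uploads.
        Action: ['s3:PutObject', 's3:AbortMultipartUpload'],
        Resource: 'arn:aws:s3:::' + bucketName + '/*'
      }
    ]
  }
}
```

Calling `JSON.stringify(buildUploadOnlyPolicy('my-video-uploads'), null, 2)` produces JSON that can be attached to the IAM user as an inline policy.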

Step 2: Use Amazon's managed uploader

Once we have our credentials returned from the server, the client needs to upload the files to S3.

Another nice feature of the AWS managed uploader is that we get progress events, which allow us to provide useful feedback to the user.

  function _uploadFileToS3(file, creds, key, callback) {
    var $progressBar = $('.progress-bar')
    var bucket
    var uploadParams

    // Use the temporary credentials issued by our server in step 1
    AWS.config.update({
      accessKeyId: creds.AccessKeyId,
      secretAccessKey: creds.SecretAccessKey,
      sessionToken: creds.SessionToken,
      region: creds.region
    })

    bucket = new AWS.S3({ params: { Bucket: creds.bucket }})
    uploadParams = { Key: key, ContentType: file.type, Body: file }

    bucket.upload(uploadParams, function(err, data) {
      if (err) {
        alert('Error uploading file')
        // Do not show a full progress bar for a failed upload
        return callback(err)
      }

      $progressBar.width('100%')
      callback(null, data)
    }).on('httpUploadProgress', function(fileInfo) {
      // Update the bar as bytes are sent
      $progressBar.width(Math.floor(fileInfo.loaded / fileInfo.total * 100) + '%')
    })
  }
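
The managed uploader also accepts an options object that controls how the file is split and how many parts are sent in parallel. The sketch below picks those options from the file size. The helper name `chooseUploadOptions` is our own, and the concurrency value is only an assumed starting point; the two limits it encodes (a 5 MiB minimum part size and at most 10,000 parts per upload) are S3's documented limits.

```javascript
var MIN_PART_SIZE = 5 * 1024 * 1024 // S3 minimum part size (5 MiB)
var MAX_PARTS = 10000               // S3 maximum number of parts per upload

// Hypothetical helper: choose options for the managed uploader.
function chooseUploadOptions(fileSize) {
  // Smallest part size that keeps the upload within MAX_PARTS
  var neededPartSize = Math.ceil(fileSize / MAX_PARTS)

  return {
    partSize: Math.max(MIN_PART_SIZE, neededPartSize),
    queueSize: 4 // parts uploaded in parallel; an assumed starting point
  }
}
```

To use it, pass the result as the second argument: `bucket.upload(uploadParams, chooseUploadOptions(file.size), callback)`. A larger `queueSize` can speed things up on fast connections at the cost of more memory and open requests in the browser.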

Step 3: Inform the server of the new asset

Once we have written the file to S3, we want to tell the server that we did so, in preparation for post-processing of the video.

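  function _informServerOfFile(data, callback) {
    $.ajax({
      type: 'POST',
      url: '/api/v1/video',
      contentType: 'application/json', // the body is a JSON string
      data: JSON.stringify(data)
    })
    .done(function() {
      callback(null)
    })
    .fail(function(jqXHR, textStatus) {
      // Report the failure instead of treating it as success
      callback(new Error('Failed to register video: ' + textStatus))
    })
  }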
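
Putting the three steps together might look something like the sketch below. The function `uploadVideo`, the `steps` object, and `requestCredentials` (a client-side call to the endpoint from step 1) are names we made up for illustration; passing the steps in as arguments is just a convenience so the flow can be exercised without a real bucket.

```javascript
// Hypothetical glue code: run steps 1 to 3 in order and stop at the first error.
// `steps` is expected to provide:
//   requestCredentials(callback)              -> { credentials, key }
//   uploadFileToS3(file, creds, key, callback)
//   informServerOfFile(data, callback)
function uploadVideo(file, steps, callback) {
  steps.requestCredentials(function(err, payload) {
    if (err) {
      return callback(err)
    }

    steps.uploadFileToS3(file, payload.credentials, payload.key, function(err, data) {
      if (err) {
        return callback(err)
      }

      // Tell the server which object was written so it can process it later
      steps.informServerOfFile({ key: payload.key, location: data.Location }, callback)
    })
  })
}
```

In the browser, `steps.uploadFileToS3` and `steps.informServerOfFile` would be the `_uploadFileToS3` and `_informServerOfFile` functions shown above.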

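Note that you must set the CORS policy on your S3 bucket correctly or you will get cross-domain errors. This includes exposing the ETag header, which the uploader reads from each part's response in order to complete a multipart upload. Below is an example of our policy: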

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>HEAD</AllowedMethod>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
    <ExposeHeader>ETag</ExposeHeader>
    <ExposeHeader>x-amz-meta-custom-header</ExposeHeader>
  </CORSRule>
</CORSConfiguration>

Check out the code for a fully working version here.

In the next part of this series we will discuss encoding the video for playback.