Sunday, February 26, 2012

A Primer on SCTE-130 and IAB VAST


In the past year, I have been involved in building next-generation advertising solutions based on the SCTE-130 and IAB VAST standards. I have been asked countless times what these standards are for and how they have been adopted. This blog post is my attempt to give a brief introduction to SCTE-130 and IAB VAST and to discuss how each standard is adopted in today’s advertising industry. In a second post, I will discuss how these standards differ and explore the possible convergence of the two.

SCTE-130

What is SCTE-130

SCTE-130 is a standard from the Society of Cable Telecommunications Engineers (SCTE) that aims to build a unified platform for addressable advertising. The SCTE-130 standard consists of a set of XML-based protocols and works in traditional cable deployments, i.e., set-top boxes, for both linear and VOD programming.

Core Elements in SCTE-130

SCTE-130 consists of the following core elements, each of which plays a crucial part in selecting, playing, and tracking an ad.

Ad Management Service (ADM): initiates ad insertion requests to the ADS and reports activity events (whether the ad was successfully placed, whether it was skipped, etc.)
Ad Decision Service (ADS): decides which ad to play
Content Information System (CIS): stores the availability of ad content
Placement Opportunity Information System (POIS): stores the opportunities to place an ad
Subscriber Information System (SIS): stores subscriber-related information, such as demographic and geographic data

Who serves Ads

Once a decision has been made on which ads to insert, the insertion device inserts the ads in the right place and reports the activity events (whether the ad was successfully placed, how much of it was viewed, whether it was paused or skipped, etc.) back to the ADM.
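To make the message exchange more concrete, a placement request from the ADM to the ADS might look roughly like the hypothetical XML below. This is only a simplified sketch in the spirit of the SCTE 130-3 ADM/ADS interface; the element names, attributes, and identifiers are approximations and placeholders of my own, not copied from the specification.

<!-- Hypothetical, simplified ADM-to-ADS placement request; element names are approximate, not schema-exact -->
<PlacementRequest identity="adm.example.com" messageId="req-1001">
  <Client>
    <!-- Terminal/subscriber identity lets the ADS consult the SIS for targeting -->
    <TerminalAddress type="DEVICEID">00:11:22:33:44:55</TerminalAddress>
  </Client>
  <!-- The opportunity (known to the POIS) this decision is being requested for -->
  <PlacementOpportunity opportunityId="break-001" serviceId="ExampleChannel"/>
  <!-- The content (known to the CIS) the ad will play inside of -->
  <Content providerId="vod.example.com" assetId="MOVIE0000000001"/>
</PlacementRequest>

The ADS would answer with a corresponding placement response naming the chosen ad, and the ADM would later report the activity events described above.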

Basic Flow for Ad Placement Decision and Tracking

In the following sections, we show the basic flow for two different types of ad insertion: VOD and schedule-based linear ad insertion.

VOD Ad Insertion


Schedule-based Linear Ad Insertion

Linear ad insertion is typically driven by a schedule that specifies which ad to play within a specific time window on a particular channel. For example: when an ad placement opportunity arrives between 7:00 and 7:30 PM, play the soda ad.
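Purely as an illustration, such a schedule entry could be represented along these lines. This is a hypothetical format of my own, not an SCTE-130 structure; the channel name, time window, and asset ID are made up for the example.

<!-- Hypothetical schedule entry: play the soda ad for opportunities on this channel between 7:00 and 7:30 PM -->
<ScheduleEntry channel="ExampleChannel" windowStart="2012-02-26T19:00:00" windowEnd="2012-02-26T19:30:00">
  <Ad assetId="SODA_AD_30S"/>
</ScheduleEntry>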

IAB VAST

The Interactive Advertising Bureau (IAB) comprises more than 500 leading media and technology companies that are responsible for selling 86% of online advertising in the United States.

What is IAB VAST

IAB’s Digital Video Ad Serving Template (VAST) is designed to standardize the communication protocol between online video players and ad services.

What is in IAB VAST

  1. Defines a standard XML ad response for in-stream video (a minimal example is sketched after this list)
  2. Includes guidance for most on-demand video players (e.g., Adobe’s Flash, Microsoft’s Silverlight, and RealPlayer)
  3. Includes accommodations for linear video and interactive ads (e.g., “pre-roll”) as well as non-linear ads such as clickable banners and overlays
  4. Defines how to track ad playback
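
To make the response format concrete, here is a minimal, hand-written VAST 2.0-style response for a single linear ad. It is a sketch for illustration only: the ad ID, media file URL, and tracking URLs are placeholders, and a real response typically carries more attributes and optional elements.

<!-- Minimal, illustrative VAST 2.0-style response; all IDs and URLs are placeholders -->
<VAST version="2.0">
  <Ad id="soda-ad-123">
    <InLine>
      <AdSystem>ExampleAdServer</AdSystem>
      <AdTitle>Soda 30s Spot</AdTitle>
      <!-- Fired by the player when the ad impression occurs -->
      <Impression>http://ads.example.com/impression?id=soda-ad-123</Impression>
      <Creatives>
        <Creative>
          <Linear>
            <Duration>00:00:30</Duration>
            <!-- Playback tracking: the player pings these URLs as the events occur -->
            <TrackingEvents>
              <Tracking event="start">http://ads.example.com/track?e=start</Tracking>
              <Tracking event="midpoint">http://ads.example.com/track?e=midpoint</Tracking>
              <Tracking event="complete">http://ads.example.com/track?e=complete</Tracking>
            </TrackingEvents>
            <VideoClicks>
              <ClickThrough>http://www.example.com/soda</ClickThrough>
            </VideoClicks>
            <!-- The actual ad asset the player should render -->
            <MediaFiles>
              <MediaFile delivery="progressive" type="video/mp4" width="640" height="360">
                http://cdn.example.com/ads/soda-30s.mp4
              </MediaFile>
            </MediaFiles>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>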

Who serves Ads

The online video player (e.g., Flash or Silverlight based) is the component that carries out the actual ad insertion.

Basic Flow for Ad Placement Decision and Tracking


What is not in IAB VAST

In contrast to the SCTE-130 standard, IAB VAST defines what ads to play, but not when to play them; it is up to the player to decide when to insert the ads. The Media Abstract Sequencing Template (MAST) format specification, proposed by Akamai, is an attempt to address the sequencing of in-stream advertising content.
In short, the VAST standard defines what ads to play, and the MAST standard defines when to play them.
MAST is widely supported, including by the Flash-based Strobe Media Framework as well as the Microsoft Silverlight Media Framework.
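
For a sense of how MAST expresses the “when”, a trigger that requests a VAST pre-roll at the start of the main content might look roughly like the sketch below. The element names follow my recollection of the MAST format and the URL is a placeholder, so treat this as an approximation rather than an authoritative example.

<!-- Rough MAST-style sketch: fetch a VAST response when the main content starts; names are approximate -->
<MAST>
  <triggers>
    <trigger id="preroll">
      <startConditions>
        <!-- Condition: the player begins playing the main item -->
        <condition type="event" name="OnItemStart"/>
      </startConditions>
      <sources>
        <!-- Where to fetch the ad payload, and in which format (VAST) -->
        <source uri="http://ads.example.com/vast?zone=preroll" format="vast"/>
      </sources>
    </trigger>
  </triggers>
</MAST>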

Friday, February 24, 2012

Hadoop 1.0 Quick Start Update

Purpose
There are quite a few changes in Hadoop 1.0 that are not reflected in Hadoop's official setup guide. This document attempts to supplement the official setup guide with updates for version 1.0.


Please follow the official Hadoop setup guide first, and then check the specific sections below for 1.0 updates.


Standalone Operations
In Hadoop 1.0, all configuration files have been moved to the etc/hadoop directory.


The following example copies the XML files from the unpacked etc/hadoop directory to use as input, and then finds and displays every match of the given regular expression. Output is written to the given output directory.


$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

$ cat output/*

Execution
Start the hadoop daemons:
$ sbin/start-all.sh


Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put etc/hadoop/ input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
You should see output similar to the following:

12/02/23 16:48:04 INFO mapred.FileInputFormat: Total input paths to process : 16
12/02/23 16:48:05 INFO mapred.JobClient: Running job: job_201202231031_0001
12/02/23 16:48:06 INFO mapred.JobClient:  map 0% reduce 0%
12/02/23 16:48:19 INFO mapred.JobClient:  map 12% reduce 0%
12/02/23 16:48:28 INFO mapred.JobClient:  map 25% reduce 0%
12/02/23 16:48:34 INFO mapred.JobClient:  map 25% reduce 4%
12/02/23 16:48:37 INFO mapred.JobClient:  map 37% reduce 4%
12/02/23 16:48:40 INFO mapred.JobClient:  map 37% reduce 8%
12/02/23 16:48:43 INFO mapred.JobClient:  map 50% reduce 8%
12/02/23 16:48:50 INFO mapred.JobClient:  map 56% reduce 12%
12/02/23 16:48:53 INFO mapred.JobClient:  map 62% reduce 12%
12/02/23 16:48:56 INFO mapred.JobClient:  map 68% reduce 18%
12/02/23 16:48:58 INFO mapred.JobClient:  map 75% reduce 22%
12/02/23 16:49:01 INFO mapred.JobClient:  map 81% reduce 22%
12/02/23 16:49:04 INFO mapred.JobClient:  map 87% reduce 22%
12/02/23 16:49:07 INFO mapred.JobClient:  map 93% reduce 27%
12/02/23 16:49:10 INFO mapred.JobClient:  map 100% reduce 27%
12/02/23 16:49:13 INFO mapred.JobClient:  map 100% reduce 29%
12/02/23 16:49:20 INFO mapred.JobClient:  map 100% reduce 100%
12/02/23 16:49:25 INFO mapred.JobClient: Job complete: job_201202231031_0001
12/02/23 16:49:25 INFO mapred.JobClient: Counters: 30
12/02/23 16:49:25 INFO mapred.JobClient:   Job Counters 
12/02/23 16:49:25 INFO mapred.JobClient:     Launched reduce tasks=1
12/02/23 16:49:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=99019
12/02/23 16:49:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/23 16:49:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/02/23 16:49:25 INFO mapred.JobClient:     Launched map tasks=16
12/02/23 16:49:25 INFO mapred.JobClient:     Data-local map tasks=16
12/02/23 16:49:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=60471
12/02/23 16:49:25 INFO mapred.JobClient:   File Input Format Counters 
12/02/23 16:49:25 INFO mapred.JobClient:     Bytes Read=26852
12/02/23 16:49:25 INFO mapred.JobClient:   File Output Format Counters 
12/02/23 16:49:25 INFO mapred.JobClient:     Bytes Written=180
12/02/23 16:49:25 INFO mapred.JobClient:   FileSystemCounters
12/02/23 16:49:25 INFO mapred.JobClient:     FILE_BYTES_READ=82
12/02/23 16:49:25 INFO mapred.JobClient:     HDFS_BYTES_READ=28574
12/02/23 16:49:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=367327
12/02/23 16:49:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=180
12/02/23 16:49:25 INFO mapred.JobClient:   Map-Reduce Framework
12/02/23 16:49:25 INFO mapred.JobClient:     Map output materialized bytes=172
12/02/23 16:49:25 INFO mapred.JobClient:     Map input records=758
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce shuffle bytes=166
12/02/23 16:49:25 INFO mapred.JobClient:     Spilled Records=6
12/02/23 16:49:25 INFO mapred.JobClient:     Map output bytes=70
12/02/23 16:49:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=2596864000
12/02/23 16:49:25 INFO mapred.JobClient:     CPU time spent (ms)=12500
12/02/23 16:49:25 INFO mapred.JobClient:     Map input bytes=26852
12/02/23 16:49:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1722
12/02/23 16:49:25 INFO mapred.JobClient:     Combine input records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce input records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce input groups=3
12/02/23 16:49:25 INFO mapred.JobClient:     Combine output records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=2788790272
12/02/23 16:49:25 INFO mapred.JobClient:     Reduce output records=3
12/02/23 16:49:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=9619705856
12/02/23 16:49:25 INFO mapred.JobClient:     Map output records=3
12/02/23 16:49:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/02/23 16:49:25 INFO mapred.FileInputFormat: Total input paths to process : 1
12/02/23 16:49:25 INFO mapred.JobClient: Running job: job_201202231031_0002
12/02/23 16:49:26 INFO mapred.JobClient:  map 0% reduce 0%
12/02/23 16:49:41 INFO mapred.JobClient:  map 100% reduce 0%
12/02/23 16:49:53 INFO mapred.JobClient:  map 100% reduce 100%
12/02/23 16:49:58 INFO mapred.JobClient: Job complete: job_201202231031_0002
12/02/23 16:49:58 INFO mapred.JobClient: Counters: 30
12/02/23 16:49:58 INFO mapred.JobClient:   Job Counters 
12/02/23 16:49:58 INFO mapred.JobClient:     Launched reduce tasks=1
12/02/23 16:49:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13874
12/02/23 16:49:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/23 16:49:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/02/23 16:49:58 INFO mapred.JobClient:     Launched map tasks=1
12/02/23 16:49:58 INFO mapred.JobClient:     Data-local map tasks=1
12/02/23 16:49:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10575
12/02/23 16:49:58 INFO mapred.JobClient:   File Input Format Counters 
12/02/23 16:49:58 INFO mapred.JobClient:     Bytes Read=180
12/02/23 16:49:58 INFO mapred.JobClient:   File Output Format Counters 
12/02/23 16:49:58 INFO mapred.JobClient:     Bytes Written=52
12/02/23 16:49:58 INFO mapred.JobClient:   FileSystemCounters
12/02/23 16:49:58 INFO mapred.JobClient:     FILE_BYTES_READ=82
12/02/23 16:49:58 INFO mapred.JobClient:     HDFS_BYTES_READ=296
12/02/23 16:49:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42387
12/02/23 16:49:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=52
12/02/23 16:49:58 INFO mapred.JobClient:   Map-Reduce Framework
12/02/23 16:49:58 INFO mapred.JobClient:     Map output materialized bytes=82
12/02/23 16:49:58 INFO mapred.JobClient:     Map input records=3
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce shuffle bytes=82
12/02/23 16:49:58 INFO mapred.JobClient:     Spilled Records=6
12/02/23 16:49:58 INFO mapred.JobClient:     Map output bytes=70
12/02/23 16:49:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=220725248
12/02/23 16:49:58 INFO mapred.JobClient:     CPU time spent (ms)=1990
12/02/23 16:49:58 INFO mapred.JobClient:     Map input bytes=94
12/02/23 16:49:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=116
12/02/23 16:49:58 INFO mapred.JobClient:     Combine input records=0
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce input records=3
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce input groups=1
12/02/23 16:49:58 INFO mapred.JobClient:     Combine output records=0
12/02/23 16:49:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=235311104
12/02/23 16:49:58 INFO mapred.JobClient:     Reduce output records=3
12/02/23 16:49:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1171832832
12/02/23 16:49:58 INFO mapred.JobClient:     Map output records=3



Examine output.

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ sbin/stop-all.sh