Vapnick Chervonenkis Dimension

Vapnick Chervonenkis Dimension

It is the measure of the capacity (complexity, flexibility,richness and expressive power) of space of function that can be learned by a classification algorithm.

Let H be the hypothesis space for some machine learning problem. The VC dimension of H , denoted by VC(H) is a measure of complexity of the space H.

To define VC dimension we require the notion of the shattering of a set of instances.

Shattering of sets: Let D be a data set containing N examples fora binary classification problem with class label 0 and 1.

Let H be the hypothesis space for the problem.

Each hypothesis h in H partitions D in to two subsets as follows:

{ x ∈ D | h(x)=0 } and {x ∈ D| h(x)=1}

Such a partition is called as dichotomies and there are 2^N possible dichotomies in D.

To each dichotomy of D there is a unique assignment of the labels "1" and "0" to the elements of D.

Conversely if s is any subset of D then S defines a unique hypothesis h as follows

h(x)= { 1 if x ∈ S

0 otherwise}

A data set is containing n points.

There endpoints can be labelled in 2^n ways as positive and negative.

A hypothesis h ∈H that separates the positive example from negative example then we say H shatter N points.

N examples can be learned with no error by a hypothesis drawn from H.

The maximum number of points that can be shattered by it is called VC dimension of H and measures the capacity of H.

Comments