Vapnick Chervonenkis Dimension



  • It is the measure of the capacity (complexity, flexibility,richness and expressive power) of space of function that can be learned by a classification algorithm.
  • Let H be the hypothesis space for some machine learning problem. The VC dimension of H , denoted by VC(H)  is a measure of complexity of the space H.
  • To define VC dimension we require the notion of the shattering of a set of instances.
  • Shattering of sets: Let D be a data set containing N examples fora binary classification problem with class label 0 and 1.
  • Let H be the hypothesis space for the problem.
  • Each hypothesis h in H partitions D in to two subsets as follows:
                                             { x ∈ D | h(x)=0 } and {x ∈ D| h(x)=1}

  • Such a partition is called as dichotomies and there are 2^N possible dichotomies in D.
  • To each dichotomy of D there is a unique assignment of the labels "1" and "0" to the elements of D.
  • Conversely if s is any subset of D then S defines a unique hypothesis h as follows
                                             h(x)= { 1 if x ∈ S
                                                          0 otherwise}
  • A data set is containing n points.
  • There endpoints can be labelled in 2^n  ways as positive and negative.
  • A hypothesis h ∈H that separates the positive example from negative example then we say H shatter N points.
  • N examples can be learned with no error by a hypothesis drawn from H.
  • The maximum number of points that can be shattered by it is called VC dimension of H and measures the capacity of H.

Comments