- It is a measure of the capacity (complexity, flexibility, richness and expressive power) of the space of functions that can be learned by a classification algorithm.
- Let H be the hypothesis space for some machine learning problem. The VC dimension of H, denoted VC(H), is a measure of the complexity of the space H.
- To define the VC dimension we need the notion of shattering a set of instances.
- Shattering of sets: Let D be a data set containing N examples for a binary classification problem with class labels 0 and 1.
- Let H be the hypothesis space for the problem.
- Each hypothesis h in H partitions D into two subsets as follows:
{ x ∈ D | h(x)=0 } and {x ∈ D| h(x)=1}
- Such a partition is called a dichotomy, and there are 2^N possible dichotomies of D.
- Each dichotomy of D corresponds to a unique assignment of the labels "1" and "0" to the elements of D.
- Conversely, if S is any subset of D, then S defines a unique hypothesis h as follows:
h(x) = { 1 if x ∈ S
         0 otherwise }
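The correspondence above can be checked by brute force on a tiny example. The sketch below (the data set and function names are illustrative, not from the source) builds the hypothesis h defined by each subset S of a 3-point data set D and confirms that the subsets produce all 2^3 = 8 distinct dichotomies:

```python
from itertools import chain, combinations

# Hypothetical toy data set D with N = 3 points.
D = ["x1", "x2", "x3"]

def hypothesis_from_subset(S):
    """Return the hypothesis h defined by subset S: h(x) = 1 iff x ∈ S."""
    S = set(S)
    return lambda x: 1 if x in S else 0

# Enumerate every subset of D; each subset induces one dichotomy of D.
subsets = chain.from_iterable(combinations(D, r) for r in range(len(D) + 1))
dichotomies = set()
for S in subsets:
    h = hypothesis_from_subset(S)
    dichotomies.add(tuple(h(x) for x in D))

print(len(dichotomies))  # 8, i.e. 2^3 distinct dichotomies
```

Since each of the 8 subsets yields a different label tuple, the map between subsets of D and dichotomies of D is one-to-one, as the bullets above describe.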
- Consider a data set containing N points.
- These N points can be labelled in 2^N ways as positive and negative.
- If for each of these labellings there exists a hypothesis h ∈ H that separates the positive examples from the negative examples, we say that H shatters the N points.
- In other words, any labelling of the N examples can be learned with zero error by some hypothesis drawn from H.
- The VC dimension of H is the maximum number of points that can be shattered by H; it measures the capacity of H.
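As a concrete illustration, a shattering check can be brute-forced for a simple hypothesis class. The sketch below (class and point values are my own example, not from the source) uses threshold classifiers on the real line, h_t(x) = 1 if x ≥ t, else 0. One point can be shattered, but for any two points a < b the labelling (1, 0) can never be realised, so the VC dimension of this class is 1:

```python
def shatters(points, hypotheses):
    """True iff the hypotheses realise all 2^N labelings of the points."""
    realised = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realised) == 2 ** len(points)

def make_threshold(t):
    # Threshold classifier: predicts 1 for x >= t, else 0.
    return lambda x, t=t: 1 if x >= t else 0

# Thresholds below, between, and above the sample points cover every
# distinct behaviour of this class on these points.
hyps = [make_threshold(t) for t in (0.0, 1.0, 2.0)]

print(shatters([0.5], hyps))        # True: one point is shattered
print(shatters([0.5, 1.5], hyps))   # False: (1, 0) is never realised
# Hence the VC dimension of threshold classifiers is 1.
```

The brute force is sound here because a threshold classifier's predictions on a fixed sample only change when t crosses one of the sample points, so a few representative thresholds suffice.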