Density Estimation

The goal of density estimation is to estimate the probability density at every point of the vector space from a finite set of data points.

There are two approaches:

  • parametric (model based)
    • Gaussian Densities
  • nonparametric (data driven)
    • Kernel Density Estimate

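Kernel Density Estimation (illustrated with the Gliding Histogram)

A rectangle (in several dimensions: a hypercube) is centered on the point of interest, and the density estimate is derived from the number of data points that fall inside it.

  • $$h$$: width of the rectangle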

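Histogram Kernel

$$H(\underline u) = \begin{cases} 1 & \text{if } |u_j| < \tfrac{1}{2} \text{ for all components } j \\ 0 & \text{otherwise} \end{cases} \qquad \text{with} \qquad \underline u = \frac{\underline x - \underline x^{(\alpha)}}{h}$$

  • $$\underline x$$ are the coordinates at which we want to measure the density
  • $$\underline u$$ is the distance between $$\underline x$$ and a data point $$\underline x^{(\alpha)}$$, scaled by the width $$h$$. The threshold is 1/2 because the rectangle is centered on $$\underline x$$ and extends $$h/2$$ to either side.

The kernel answers one question: does the vector $$\underline u$$ end outside our rectangle of width $$h$$? If so, the data point does not count.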

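The estimation of density

$$\hat{P}(\underline x) = \frac{1}{p\,h^N} \sum_{\alpha=1}^{p} H\left(\frac{\underline x - \underline x^{(\alpha)}}{h}\right)$$

  • $$h$$: width of the rectangle
  • $$N$$: number of dimensions
  • $$p$$: number of data points $$\underline x^{(\alpha)}$$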
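The estimate described above can be sketched in Python. This is a minimal illustration, assuming the usual gliding-histogram rule: count the data points inside the rectangle of width h centered on the query point, then divide by the number of points times the rectangle volume. The function names `histogram_kernel` and `gliding_histogram` and the toy data are hypothetical, chosen only for this example.

```python
import numpy as np

def histogram_kernel(u):
    """Return 1.0 for each row of u whose components all lie in (-1/2, 1/2), else 0.0."""
    return np.all(np.abs(u) < 0.5, axis=-1).astype(float)

def gliding_histogram(x, data, h):
    """Estimate the density at query point x.

    data must have shape (n_points, n_dims); h is the rectangle width.
    """
    data = np.asarray(data, dtype=float)
    n_points, n_dims = data.shape
    u = (np.asarray(x, dtype=float) - data) / h   # scaled distances, shape (n_points, n_dims)
    return histogram_kernel(u).sum() / (n_points * h ** n_dims)

# Toy 1-D data set (illustrative values only).
data = np.array([[0.0], [0.1], [0.2], [1.0]])

# 3 of the 4 points lie inside the rectangle around 0.1 -> 3 / (4 * 0.5) = 1.5
print(gliding_histogram([0.1], data, h=0.5))
```

Moving the query point slightly leaves the result unchanged until a data point crosses the edge of the rectangle, at which point the estimate jumps.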

Drawbacks of Gliding Histograms

  • “Bumpy”: the estimate jumps whenever a data point enters or leaves the rectangle (especially with few data points or high dimensionality)
  • The rectangle is not really a good choice of kernel: its hard edges cause the jumps
  • The optimal size of h is non-trivial and needs model selection. A smaller h leads to overfitting, a larger h to oversmoothing
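One possible way to do the model selection for h is to score each candidate width by the average log density it assigns to held-out points. The sketch below assumes that criterion; the notes do not say which one is meant. The `floor` value (needed because a held-out point with an empty rectangle has density zero), the candidate widths, and the synthetic Gaussian data are all assumptions made for this example.

```python
import numpy as np

def gliding_histogram(x, data, h):
    """Gliding-histogram density at x for data of shape (n_points, n_dims)."""
    n_points, n_dims = data.shape
    inside = np.all(np.abs((x - data) / h) < 0.5, axis=1)
    return inside.sum() / (n_points * h ** n_dims)

def heldout_log_likelihood(train, valid, h, floor=1e-12):
    """Mean log density of the validation points under the estimate built from train."""
    dens = np.array([gliding_histogram(x, train, h) for x in valid])
    # floor avoids log(0) when no training point falls into the rectangle
    return np.mean(np.log(np.maximum(dens, floor)))

rng = np.random.default_rng(0)
samples = rng.normal(size=(400, 1))          # synthetic 1-D data
train, valid = samples[:300], samples[300:]

candidates = [0.01, 0.05, 0.2, 0.5, 1.0, 2.0, 5.0]
scores = {h: heldout_log_likelihood(train, valid, h) for h in candidates}
best_h = max(scores, key=scores.get)

for h in candidates:
    print(f"h={h:<5} held-out mean log density: {scores[h]:.3f}")
print("selected h:", best_h)
```

A very small h fits the training points tightly but leaves many held-out points with an empty rectangle, which is the overfitting mentioned in the list above.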

Alternative: Gaussian Kernel

A Gaussian kernel can be used instead of the rectangle. It has no hard edges, so the estimate varies smoothly, which reduces most of these drawbacks.
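A minimal sketch of the Gaussian variant, assuming an isotropic standard normal kernel scaled by the same width h. The function name `gaussian_kernel_density` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel_density(x, data, h):
    """Kernel density estimate at x with an isotropic Gaussian kernel of width h.

    data must have shape (n_points, n_dims).
    """
    data = np.asarray(data, dtype=float)
    n_points, n_dims = data.shape
    u = (np.asarray(x, dtype=float) - data) / h             # scaled distances
    kernel = np.exp(-0.5 * np.sum(u ** 2, axis=1))           # unnormalized Gaussian
    kernel /= (2.0 * np.pi) ** (n_dims / 2.0)                # normalize to integrate to 1
    return kernel.sum() / (n_points * h ** n_dims)

# Toy 1-D data set (illustrative values only).
data = np.array([[0.0], [1.0]])

# Nearby query points give nearby estimates: there are no jumps.
for query in (0.24, 0.25, 0.26):
    print(query, gaussian_kernel_density([query], data, h=0.5))
```

Every data point contributes to every query point, with a weight that decays smoothly with distance, so the width h still has to be chosen by model selection.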

Parametric Density Estimation

TODO: Figure out what and mean (they compose )

Parametric density estimation finds a good value for the parameter vector $$\underline w$$.

Family of parametric density functions: $$\hat{P}(\underline x;\underline w)$$
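As one concrete member of such a family, the sketch below evaluates a multivariate Gaussian density, the parametric example named at the top of these notes. Treating the parameters as a mean vector and a covariance matrix is an assumption made for this illustration, and the function name `gaussian_density` is hypothetical.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Density of a multivariate Gaussian with the given mean and covariance at x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n_dims = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance, computed without forming the inverse explicitly
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** n_dims * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

# Standard normal in 2-D, evaluated at its mean: 1 / (2 * pi)
print(gaussian_density([0.0, 0.0], mean=[0.0, 0.0], cov=np.eye(2)))
```

Different parameter values select different members of the family; density estimation then amounts to choosing the parameters that suit the data.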

Cost function for model selection

Problem: Minimizing the training costs leads to overfitting

==> We need the generalization costs, but they rely on knowledge of the true density $$P(\underline x)$$, which is unknown ==> Use a proxy function

Alternative approach (maximum likelihood): Select the model that gives the highest probability for the already known data points.

Optimization: probably simple gradient descent on the negative log-likelihood with respect to the parameters
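Selecting the model with the highest probability for the known data points by gradient descent can be sketched as follows. The example assumes a univariate Gaussian family with the parameters (mean, log standard deviation); the log parametrization, learning rate, step count, and synthetic data are choices made for this illustration, not taken from the notes.

```python
import numpy as np

def neg_log_likelihood(w, x):
    """Mean negative log-likelihood of x under a Gaussian with w = (mu, log_sigma)."""
    mu, log_sigma = w
    sigma = np.exp(log_sigma)
    return np.mean(0.5 * np.log(2.0 * np.pi) + log_sigma + 0.5 * ((x - mu) / sigma) ** 2)

def gradient(w, x):
    """Gradient of the mean negative log-likelihood with respect to (mu, log_sigma)."""
    mu, log_sigma = w
    sigma2 = np.exp(2.0 * log_sigma)
    d_mu = -np.mean(x - mu) / sigma2
    d_log_sigma = 1.0 - np.mean((x - mu) ** 2) / sigma2
    return np.array([d_mu, d_log_sigma])

def fit_gaussian(x, lr=0.1, steps=2000):
    """Plain gradient descent starting from mu = 0, sigma = 1."""
    w = np.array([0.0, 0.0])
    for _ in range(steps):
        w = w - lr * gradient(w, x)
    return w[0], np.exp(w[1])

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic data

mu_hat, sigma_hat = fit_gaussian(x)
print("gradient descent:", mu_hat, sigma_hat)
print("closed form     :", x.mean(), x.std())
```

For a single Gaussian the maximum-likelihood solution is available in closed form (sample mean and sample standard deviation), so gradient descent is only needed for families without such a solution; here the closed form serves as a check.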

Conditions for multivariate cases

Mixture Models - EM