## Density Estimation

The goal of density estimation is to estimate the probability density at every point of the vector space from a finite set of data points.

There are two approaches:

• parametric (model based)
  • Gaussian Densities
• nonparametric (data driven)
  • Kernel Density Estimate

### Kernel Density Estimation (Example: Gliding Histogram)

Parameter

• $h$: width of the rectangle

Histogram Kernel $H$

• $\underline x$: the coordinates at which we want to evaluate the density
• $\underline u$: the distance between $\underline x$ and a data point, divided by the width $h$, so the rectangle's boundary ends up at $\pm 1/2$ (hence the “normalized to 1/2”)

The kernel checks whether the vector given by $\underline u$ ends inside the rectangle of width $h$, i.e. whether the data point falls into the gliding window around $\underline x$.
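A plausible explicit form of $H$, written out from the definitions above (my reconstruction; the lecture's notation may differ, $n$ is the number of dimensions and $\underline x^{(\alpha)}$ denotes a data point):

$$
H(\underline u) = \begin{cases} 1 & \text{if } |u_j| \le \tfrac{1}{2} \text{ for all } j = 1,\dots,n \\ 0 & \text{otherwise} \end{cases}
\qquad \text{with} \quad \underline u = \frac{\underline x - \underline x^{(\alpha)}}{h}
$$

So $H$ is 1 exactly when the data point lies inside the hypercube of side length $h$ centered at $\underline x$.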

The density estimate

• $h$: width of the rectangle
• $n$: number of dimensions
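
Putting kernel and normalization together, the gliding-histogram estimate presumably has the usual Parzen-window form (the symbol $p$ for the number of data points is my assumption):

$$
\hat P(\underline x) = \frac{1}{p\,h^{n}} \sum_{\alpha=1}^{p} H\!\left(\frac{\underline x - \underline x^{(\alpha)}}{h}\right)
$$

Each data point inside the rectangle contributes $1/(p\,h^{n})$, so the estimate integrates to one over $\underline x$.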

Drawbacks of Gliding Histograms

• “Bumpy”: the estimate jumps whenever a data point enters or leaves the rectangle (especially with few data points or high dimensionality)
• The hard-edged rectangle is not a good choice of kernel
• The optimal size of $h$ is non-trivial and needs model selection; a lower $h$ leads to overfitting

**Alternative: Gaussian kernel**

A Gaussian kernel can be used instead of the rectangle, which removes most of these side effects.
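
A minimal numpy sketch of both variants, just to make the formulas concrete (function names and the toy data are mine, not from the lecture):

```python
import numpy as np

def rect_kernel(u):
    """Gliding-histogram kernel: 1 if u lies inside the unit hypercube."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def gauss_kernel(u):
    """Gaussian kernel: a smooth alternative to the rectangle."""
    n = u.shape[-1]
    return (2 * np.pi) ** (-n / 2) * np.exp(-0.5 * np.sum(u**2, axis=-1))

def kde(x, X, h, kernel=rect_kernel):
    """Kernel density estimate at point x from data X with shape (p, n)."""
    p, n = X.shape
    u = (x - X) / h                      # normalized distances to all data points
    return kernel(u).sum() / (p * h**n)

# Usage: density of 200 standard-normal samples, evaluated at the origin
X = np.random.randn(200, 2)
print(kde(np.zeros(2), X, h=0.5))                        # bumpy rectangular estimate
print(kde(np.zeros(2), X, h=0.5, kernel=gauss_kernel))   # smooth Gaussian estimate
```

The Gaussian variant is just the standard normal density in $\underline u$, so the overall estimate still integrates to one.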

### Parametric Density Estimation

$\underline\mu^*$ is the (optimal) mean vector and $\underline\Sigma$ the covariance matrix of the Gaussian; together they make up the parameter vector $\underline w$.

Parametric density estimation finds good values for the model parameters $\underline w$ (and, via model selection, for hyperparameters such as $h$).

Family of parametric density functions: $\hat{P}(\underline x;\underline w)$
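
As a concrete example (and presumably what $\underline\mu^*$ and $\underline\Sigma$ above refer to), a single multivariate Gaussian as the model family, with $\underline w = (\underline\mu, \underline\Sigma)$:

$$
\hat P(\underline x; \underline w) = \frac{1}{(2\pi)^{n/2}\,|\underline\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (\underline x - \underline\mu)^{\mathsf T} \underline\Sigma^{-1} (\underline x - \underline\mu)\right)
$$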

Cost function for model selection

Problem: Minimizing the training costs leads to overfitting

==> We need $E^G$, the generalization cost, but it requires knowledge of the true density $P$ ==> use a proxy function

Alternative approach: Select the model that gives the highest probability for the already known data points.
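
This is the maximum-likelihood principle; written as a cost to be minimized it becomes the negative log-likelihood of the training data (a standard form; the lecture's exact normalization may differ):

$$
E^{T}(\underline w) = -\frac{1}{p} \sum_{\alpha=1}^{p} \ln \hat P\!\left(\underline x^{(\alpha)}; \underline w\right)
$$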

Optimization: probably by simple gradient descent.
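
A toy sketch of such a gradient descent for a 1D Gaussian, minimizing the negative log-likelihood with respect to $\mu$ and $\ln\sigma$ (entirely illustrative; the lecture does not specify the optimizer):

```python
import numpy as np

X = np.random.randn(500) * 2.0 + 1.0    # toy data: mean 1, std 2
mu, log_sigma = 0.0, 0.0                # parameters w, with sigma = exp(log_sigma)
lr = 0.1

for _ in range(200):
    sigma = np.exp(log_sigma)
    # gradients of E^T = mean( ln sigma + (x - mu)^2 / (2 sigma^2) ) + const
    d_mu = np.mean(-(X - mu) / sigma**2)
    d_log_sigma = np.mean(1.0 - (X - mu) ** 2 / sigma**2)
    mu -= lr * d_mu
    log_sigma -= lr * d_log_sigma

print(mu, np.exp(log_sigma))            # approaches the sample mean and std
```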

Conditions for the multivariate case
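
Assuming this refers to the multivariate Gaussian, setting the gradient of $E^T$ with respect to $\underline\mu$ and $\underline\Sigma$ to zero gives the familiar closed-form conditions (my reconstruction):

$$
\underline\mu^{*} = \frac{1}{p} \sum_{\alpha=1}^{p} \underline x^{(\alpha)}, \qquad
\underline\Sigma^{*} = \frac{1}{p} \sum_{\alpha=1}^{p} \left(\underline x^{(\alpha)} - \underline\mu^{*}\right)\left(\underline x^{(\alpha)} - \underline\mu^{*}\right)^{\mathsf T}
$$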