Local Outlier Factor (LOF)
· Local Outlier Factor (LOF) is an unsupervised machine learning anomaly detection method that identifies data points with significantly lower local density compared to their neighbours.
· It compares the local density of a point to the densities of its k-nearest neighbours.
· A LOF score well above 1.0 indicates a potential outlier: the point lies in a sparser region than its neighbours.
· LOF is effective in detecting local anomalies, which might not stand out globally but deviate from nearby data.
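To illustrate the local-versus-global point, here is a minimal sketch that assumes scikit-learn's LocalOutlierFactor and a synthetic dataset invented for this example (two grids of different density plus one candidate point). The candidate's nearest-neighbour distance is smaller than that of every point in the sparse cluster, so a single global distance threshold would not single it out, yet its LOF score is high because it is far away relative to the dense cluster beside it.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (an assumption for illustration only).
# Dense cluster: 5x5 grid with spacing 0.1 near the origin.
dense = np.array([[0.1 * i, 0.1 * j] for i in range(5) for j in range(5)])
# Sparse cluster: 5x5 grid with spacing 2.0, far from the dense one.
sparse = np.array([[10 + 2.0 * i, 10 + 2.0 * j] for i in range(5) for j in range(5)])
# Candidate: close to the dense cluster in absolute terms,
# but far away compared with that cluster's own spacing.
candidate = np.array([[1.0, 1.0]])

X = np.vstack([dense, sparse, candidate])

lof = LocalOutlierFactor(n_neighbors=5)
lof.fit(X)
# scikit-learn stores the negated score; flip the sign so larger = more anomalous.
scores = -lof.negative_outlier_factor_

# Global measure for comparison: distance to the single nearest neighbour.
pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(pairwise, np.inf)
nn_dist = pairwise.min(axis=1)

candidate_score = scores[-1]
sparse_scores = scores[25:50]

print("candidate LOF score:", round(float(candidate_score), 2))
print("max LOF score in sparse cluster:", round(float(sparse_scores.max()), 2))
print("candidate nearest-neighbour distance:", round(float(nn_dist[-1]), 2))
print("sparse cluster nearest-neighbour distance:", round(float(nn_dist[25:50].min()), 2))
```

The choice of n_neighbors=5 is only suited to this small toy dataset; real data usually calls for a larger neighbourhood.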
How LOF Works (Detection Phase)
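· For each data point, LOF computes:
- The average (reachability) distance to its k nearest neighbours; the inverse of this average is the point's local density.
- A score representing how isolated the point is relative to its neighbourhood: the ratio of its neighbours' average local density to its own.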
· The algorithm assigns a LOF score:
- ~1.0 → normal
- >1.5 (configurable) → likely outlier
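The detection steps can be sketched in plain Python as follows. This is an illustrative implementation of the standard LOF definition, not the code used by any particular tool; the function name `lof_scores`, the sample points, and the choice of k=3 are assumptions made for the example. It assumes Euclidean distance and no large groups of duplicate points (which would give a zero average distance).

```python
import math

def lof_scores(points, k):
    """Return one LOF score per point (plain-Python sketch, O(n^2) distances)."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_distance(j), dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical sample: five clustered points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(points, k=3)

THRESHOLD = 1.5  # the configurable cut-off mentioned above
flags = [score > THRESHOLD for score in scores]

for point, score, flag in zip(points, scores, flags):
    print(point, round(score, 2), "outlier" if flag else "normal")
```

The clustered points score close to 1.0 because their density matches that of their neighbours, while the isolated point scores far above the threshold.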
Outlier Correction Approach
· Local Outlier Factor (LOF) is designed for detecting outliers but does not modify the data itself.
· To correct the detected outliers:
- First, compute the median or mean of the values classified as normal (anomaly == 1) for each feature.
- Then, replace the values identified as outliers (anomaly == -1) with the corresponding computed median or mean.
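The correction steps might look like the sketch below. It assumes the anomaly labels are already available as an array of 1 (normal) and -1 (outlier), for example from scikit-learn's `LocalOutlierFactor.fit_predict`, and that a flagged row has all of its feature values replaced. The function name `correct_outliers` and the sample data are hypothetical.

```python
import numpy as np

def correct_outliers(X, anomaly, strategy="median"):
    """Replace rows labelled -1 with per-feature statistics of rows labelled 1."""
    if strategy not in ("median", "mean"):
        raise ValueError("strategy must be 'median' or 'mean'")
    X = np.asarray(X, dtype=float)
    anomaly = np.asarray(anomaly)
    normal = anomaly == 1
    outlier = anomaly == -1
    if not normal.any():
        raise ValueError("no normal rows to compute a replacement from")

    # Step 1: per-feature median or mean of the normal rows only.
    reducer = np.median if strategy == "median" else np.mean
    replacement = reducer(X[normal], axis=0)

    # Step 2: overwrite the outlier rows in a copy, leaving the input untouched.
    corrected = X.copy()
    corrected[outlier] = replacement
    return corrected

# Hypothetical data: three normal rows and one flagged row.
X = np.array([[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [50.0, 90.0]])
anomaly = np.array([1, 1, 1, -1])

corrected = correct_outliers(X, anomaly, strategy="median")
print(corrected)
```

Computing the statistic from normal rows only keeps the outliers from skewing the replacement value, which matters most when the mean is used.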
Local Outlier Factor (LOF) Parameters
| Param Name | Description | Default Value | Possible Values |
|---|---|---|---|
| N-NEIGHBORS | Number of neighbors to use for LOF calculation | 20 | Integer ≥ 1 |
| CONTAMINATION | Proportion of outliers in the data | auto | Float in (0, 0.5] or "auto" |
| METRIC | Distance metric used | euclidean | 'euclidean', 'manhattan', etc. |
| N-JOBS | Number of parallel jobs to run | -1 | Integer or None (uses 1 or all cores) |
| INTERPOLATION-METHOD | Method for interpolation when imputing missing values | linear | 'linear', 'nearest', 'spline', etc. |
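
The first four parameters resemble the constructor arguments of scikit-learn's LocalOutlierFactor. Assuming they are passed through unchanged (an assumption, not something this page confirms), a settings dictionary could be translated as in the sketch below. The dictionary layout and the helper `build_lof` are hypothetical. INTERPOLATION-METHOD is left out of the mapping because it concerns missing-value imputation rather than the detector itself.

```python
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical settings dictionary using the parameter names and defaults from the table.
params = {
    "N-NEIGHBORS": 20,
    "CONTAMINATION": "auto",
    "METRIC": "euclidean",
    "N-JOBS": -1,
    "INTERPOLATION-METHOD": "linear",  # not a LocalOutlierFactor argument
}

def build_lof(params):
    """Map the table's parameter names onto LocalOutlierFactor keyword arguments."""
    return LocalOutlierFactor(
        n_neighbors=params["N-NEIGHBORS"],
        contamination=params["CONTAMINATION"],
        metric=params["METRIC"],
        n_jobs=params["N-JOBS"],
    )

model = build_lof(params)
print(model)
```

With scikit-learn, n_jobs=-1 uses all available cores and n_jobs=None uses a single core unless a joblib parallel context says otherwise.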