Local Outlier Factor (LOF)
· Local Outlier Factor (LOF) is an unsupervised anomaly detection method that flags data points whose local density is significantly lower than that of their neighbours.
· It compares the local density of a point to the densities of its k-nearest neighbours.
· A LOF score well above 1.0 indicates a potential outlier: the point lies in a sparser region than its neighbours.
· LOF is effective in detecting local anomalies, which might not stand out globally but deviate from nearby data.
How LOF Works (Detection Phase)
· For each data point, LOF computes:
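- The average reachability distance to its k nearest neighbours; the inverse of this average is the point's local density.
- A score representing how isolated the point is relative to its neighbourhood: the ratio of its neighbours' average local density to its own.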
· The algorithm assigns a LOF score:
- ~1.0 → normal
- >1.5 (configurable) → likely outlier
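The detection steps above can be sketched in plain Python. This is a minimal brute-force illustration, not the implementation behind this feature: the function name `lof_scores`, the toy data, and the choice of `k = 4` are assumptions made for the example. It uses Euclidean distance, takes exactly k neighbours (ties are broken by index), and assumes no duplicate points.

```python
import math

def lof_scores(points, k):
    """Return the LOF score of every point (brute force, Euclidean distance)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (excluding itself) and its k-distance
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Toy data (assumed): a regular 5x5 grid plus one isolated point
points = [(x, y) for x in range(5) for y in range(5)] + [(10, 10)]
scores = lof_scores(points, k=4)

threshold = 1.5  # the configurable cut-off mentioned above
flagged = [i for i, s in enumerate(scores) if s > threshold]
```

Points inside the grid receive scores close to 1.0, while the isolated point at (10, 10) scores far above the threshold because its neighbours sit in a much denser region than it does.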
Outlier Correction Approach
· Local Outlier Factor (LOF) is designed for detecting outliers but does not modify the data itself.
· To correct the detected outliers:
- First, compute the median or mean of the values classified as normal (anomaly == 1) for each feature.
- Then, replace the values identified as outliers (anomaly == -1) with the corresponding computed median or mean.
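The two correction steps can be sketched for a single feature as follows. The helper name `replace_outliers` and the sample values are hypothetical; the labels follow the convention above (1 for normal, -1 for outlier).

```python
from statistics import median

def replace_outliers(values, labels, stat=median):
    """Replace entries labelled -1 with a statistic of the entries labelled 1."""
    normal = [v for v, lab in zip(values, labels) if lab == 1]
    if not normal:
        raise ValueError("no normal values to compute a replacement from")
    fill = stat(normal)  # step 1: median (or mean) of the normal values
    # step 2: substitute the outliers, leave everything else untouched
    return [fill if lab == -1 else v for v, lab in zip(values, labels)]

# Hypothetical feature column with one flagged reading
temperature = [20.1, 19.8, 20.4, 85.0, 20.0]
anomaly = [1, 1, 1, -1, 1]
corrected = replace_outliers(temperature, anomaly)
```

For a dataset with several features, the helper would be called once per feature column, so each column gets its own replacement value. Passing `stat=statistics.mean` switches from the median to the mean; the median is less affected by any extreme values that were not flagged.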
Local Outlier Factor (LOF) Parameters
Param Name | Description | Default Value | Possible Values |
---|---|---|---|
N-NEIGHBORS | Number of neighbors to use for LOF calculation | 20 | Integer ≥ 1 |
CONTAMINATION | Proportion of outliers in the data | auto | Float in (0, 0.5] or "auto" |
METRIC | Distance metric used | euclidean | 'euclidean', 'manhattan', etc. |
N-JOBS | Number of parallel jobs to run | -1 | Integer or None (None uses 1 core; -1 uses all cores) |
INTERPOLATION-METHOD | Method for interpolation when imputing missing values | linear | 'linear', 'nearest', 'spline', etc. |
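The first four parameters in the table above closely resemble the arguments of scikit-learn's `LocalOutlierFactor`. The sketch below assumes that correspondence, which this page does not confirm; the `params` dictionary and the sample data are hypothetical. INTERPOLATION-METHOD is left out because the page does not describe how it is applied.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical settings, using the default values from the table
params = {
    "N-NEIGHBORS": 20,
    "CONTAMINATION": "auto",
    "METRIC": "euclidean",
    "N-JOBS": -1,
}

# Assumed mapping of the table's parameter names to scikit-learn arguments
lof = LocalOutlierFactor(
    n_neighbors=params["N-NEIGHBORS"],
    contamination=params["CONTAMINATION"],
    metric=params["METRIC"],
    n_jobs=params["N-JOBS"],
)

# Sample data (assumed): one cluster of 100 points plus one distant point
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[10.0, 10.0]]])

anomaly = lof.fit_predict(X)                  # 1 = normal, -1 = outlier
lof_score = -lof.negative_outlier_factor_     # scikit-learn stores the negated score
```

In scikit-learn, `contamination="auto"` labels a point as an outlier when its LOF score exceeds 1.5, while a float value instead sets the threshold so that the given proportion of points is flagged.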