Local Outlier Factor (LOF)

· Local Outlier Factor (LOF) is an unsupervised machine learning anomaly detection method that identifies data points with significantly lower local density compared to their neighbours.

· It compares the local density of a point to the densities of its k-nearest neighbours.

· A LOF score well above 1.0 indicates a potential outlier: the point lies in a sparser region than its neighbours.

· LOF is effective in detecting local anomalies, which might not stand out globally but deviate from nearby data.
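The point about local anomalies can be illustrated with scikit-learn's `LocalOutlierFactor`. The document does not name a library, so using scikit-learn here is an assumption, and the two clusters and the extra point are made-up data. The sketch places one point just outside a tight cluster. Its distance to its nearest neighbours is no larger than what is typical inside a second, looser cluster, so it only stands out relative to its own neighbourhood:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# A tight cluster around (0, 0) and a loose cluster around (10, 10).
dense = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
sparse = rng.normal(loc=10.0, scale=2.0, size=(100, 2))

# A point about 1.4 units from the dense cluster. Its distance to its 20 nearest
# neighbours is comparable to that of ordinary points in the loose cluster, so a
# single global distance cut-off would not single it out.
local_outlier = np.array([[1.0, 1.0]])

X = np.vstack([dense, sparse, local_outlier])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)              # 1 = inlier, -1 = outlier
scores = -lof.negative_outlier_factor_   # LOF scores (scikit-learn stores them negated)

print("LOF of the local outlier:", round(float(scores[-1]), 2))
print("Median LOF in the dense cluster:", round(float(np.median(scores[:100])), 2))
print("Label of the local outlier:", labels[-1])
```

The extra point is much sparser than the neighbours it is compared with, so its score lands well above 1 and it is labelled -1. Most points inside either cluster score close to 1.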

How LOF Works (Detection Phase)

· For each data point, LOF computes:

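  • Its local reachability density (lrd): the inverse of the average reachability distance to its k nearest neighbours, where the reachability distance to a neighbour is the larger of the actual distance and that neighbour's own k-distance (the distance to its k-th nearest neighbour).

  • Its LOF score: the average ratio of its neighbours' lrd to its own lrd, which measures how isolated the point is relative to its neighbourhood.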
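These quantities follow the standard definitions from the original LOF paper (Breunig et al., 2000). Here N_k(p) is the set of k nearest neighbours of p, d(p, o) is the distance between p and o, and k-distance(o) is the distance from o to its own k-th nearest neighbour:

```latex
\begin{aligned}
\text{reach-dist}_k(p, o) &= \max\{\, k\text{-distance}(o),\; d(p, o) \,\} \\
\operatorname{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\operatorname{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\operatorname{lrd}_k(o)}{\operatorname{lrd}_k(p)}
\end{aligned}
```

Using the neighbour's k-distance as a floor smooths out the distances between points that sit very close together. When p is about as dense as its neighbours, every ratio in the last line is close to 1, which is why a score near 1.0 means "normal".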

· The LOF score is interpreted as follows:

  • ≈ 1.0 → normal (the point is about as dense as its neighbours)

  • > 1.5 (configurable) → likely outlier
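The scoring steps and the threshold rule above can be sketched from scratch in NumPy. This is a minimal sketch, not a production implementation. It uses brute-force Euclidean distances (O(n²) memory) and exactly k neighbours per point with no tie handling, and it assumes there are fewer than k duplicate points. The function name `lof_scores` and the 1.5 cut-off constant are illustrative choices:

```python
import numpy as np


def lof_scores(X: np.ndarray, k: int = 20) -> np.ndarray:
    """Return the Local Outlier Factor of every row of X (brute-force sketch)."""
    # Pairwise Euclidean distances; a point is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # Indices of each point's k nearest neighbours and the distances to them.
    knn_idx = np.argsort(dist, axis=1)[:, :k]
    knn_dist = np.take_along_axis(dist, knn_idx, axis=1)
    k_distance = knn_dist[:, -1]  # distance to the k-th nearest neighbour

    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbour o of p.
    reach_dist = np.maximum(k_distance[knn_idx], knn_dist)

    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach_dist.mean(axis=1)

    # LOF: mean ratio of the neighbours' lrd to the point's own lrd.
    return lrd[knn_idx].mean(axis=1) / lrd


# Usage: 200 Gaussian points plus one injected point far from them.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[6.0, 6.0]]])

scores = lof_scores(X, k=20)
THRESHOLD = 1.5  # illustrative cut-off matching the "> 1.5" rule above
is_outlier = scores > THRESHOLD

print("LOF of the injected point:", round(float(scores[-1]), 2))
print("Indices flagged as outliers:", np.flatnonzero(is_outlier))
```

Points inside the Gaussian cloud score close to 1. The injected point's neighbours are far denser than the point itself, so its score clears the threshold. A few points on the fringe of the cloud may also exceed 1.5, which is why the cut-off is left configurable.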

Outlier Correction Approach

· Local Outlier Factor (LOF) is designed for detecting outliers but does not modify the data itself.

· To correct the detected outliers:

  • First, for each feature, compute the median or mean over the rows labelled as normal (anomaly == 1).

  • Then, replace that feature's values in the rows labelled as outliers (anomaly == -1) with the computed median or mean.
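The two correction steps can be sketched with pandas. This assumes the data is a DataFrame whose `anomaly` column holds the LOF labels (1 = normal, -1 = outlier). Because LOF flags whole rows, the sketch replaces every listed feature in a flagged row, which is also an assumption. The function name `correct_outliers` and the sensor-style values are hypothetical:

```python
import pandas as pd


def correct_outliers(
    df: pd.DataFrame,
    features: list[str],
    label_col: str = "anomaly",
    how: str = "median",
) -> pd.DataFrame:
    """Replace feature values in outlier rows (label == -1) with the median or
    mean of that feature over normal rows (label == 1). Returns a new frame."""
    if how not in ("median", "mean"):
        raise ValueError("how must be 'median' or 'mean'")

    out = df.copy()
    # Cast to float so a fractional median/mean can be written into int columns.
    out[features] = out[features].astype(float)

    normal = out[label_col] == 1
    outlier = out[label_col] == -1

    # Step 1: per-feature statistic computed only from the normal rows.
    fill_values = out.loc[normal, features].agg(how)

    # Step 2: overwrite each feature in the outlier rows with that statistic.
    for col in features:
        out.loc[outlier, col] = fill_values[col]
    return out


# Usage with made-up readings; row 3 carries the outlier label.
df = pd.DataFrame({
    "temperature": [21.0, 22.0, 21.5, 95.0, 22.5],
    "humidity": [40.0, 42.0, 41.0, 5.0, 43.0],
    "anomaly": [1, 1, 1, -1, 1],
})
corrected = correct_outliers(df, ["temperature", "humidity"])
print(corrected)
```

The median is usually the safer default, because any extreme values that LOF did not flag pull the mean but barely move the median.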

Local Outlier Factor (LOF) Parameters

| Parameter | Description | Default Value | Possible Values |
|---|---|---|---|
| N-NEIGHBORS | Number of neighbours used for the LOF calculation | 20 | Integer ≥ 1 |
| CONTAMINATION | Expected proportion of outliers in the data (sets the score threshold) | auto | Float in (0, 0.5] or "auto" |
| METRIC | Distance metric | euclidean | 'euclidean', 'manhattan', etc. |
| N-JOBS | Number of parallel jobs to run | -1 | Integer (-1 = all cores) or None (1 core) |
| INTERPOLATION-METHOD | Interpolation method used when imputing missing values | linear | 'linear', 'nearest', 'spline', etc. |
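The sketch below shows how the defaults in the table might be wired up. It rests on two assumptions the document does not state. First, that the first four parameters are passed straight to scikit-learn's `LocalOutlierFactor`; the names and the CONTAMINATION range match its constructor. Second, that INTERPOLATION-METHOD is passed to pandas' `DataFrame.interpolate` to fill missing values before detection, since LOF cannot handle NaN. Methods such as 'spline' also need SciPy and an `order` argument. The `params` dict and the `detect` function are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical config dict; keys mirror the table, values are its defaults.
params = {
    "N-NEIGHBORS": 20,
    "CONTAMINATION": "auto",
    "METRIC": "euclidean",
    "N-JOBS": -1,
    "INTERPOLATION-METHOD": "linear",
}


def detect(df: pd.DataFrame, params: dict) -> pd.DataFrame:
    """Fill gaps, run LOF, and append 'anomaly' (1 / -1) and 'lof_score' columns."""
    # Assumed role of INTERPOLATION-METHOD: impute NaNs before fitting.
    filled = df.interpolate(
        method=params["INTERPOLATION-METHOD"], limit_direction="both"
    )
    lof = LocalOutlierFactor(
        n_neighbors=params["N-NEIGHBORS"],
        contamination=params["CONTAMINATION"],
        metric=params["METRIC"],
        n_jobs=params["N-JOBS"],
    )
    out = filled.copy()
    out["anomaly"] = lof.fit_predict(filled.to_numpy())  # 1 = normal, -1 = outlier
    out["lof_score"] = -lof.negative_outlier_factor_     # positive LOF scores
    return out


# Usage on made-up data: one missing value and one injected outlier.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x", "y"])
df.loc[10, "x"] = np.nan      # gap to be interpolated
df.loc[99] = [8.0, 8.0]       # point far from the rest
result = detect(df, params)
print(result[result["anomaly"] == -1])
```

With CONTAMINATION = "auto", scikit-learn flags points whose LOF score exceeds 1.5. With a float, it instead picks the threshold so that roughly that fraction of points is flagged.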