Isolation Forest
Isolation Forest (iForest) is a popular machine learning algorithm specifically designed for outlier/anomaly detection. It's efficient, scales well to high-dimensional data, and works on the principle that anomalies are easier to isolate than normal points.
How Isolation Forest Works (Outlier Detection)
· It constructs binary trees (isolation trees) by randomly selecting a feature and then randomly selecting a split value between that feature's minimum and maximum.
· The idea: anomalies are isolated more quickly (in fewer splits), so they have shorter path lengths in the trees.
· Aggregating the path lengths across many trees gives an anomaly score between 0 and 1 (see the sketch below):
  - Closer to 1 → Likely anomaly.
  - Closer to 0.5 → Normal data.
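As a rough illustration of this scoring, the sketch below fits scikit-learn's IsolationForest on a small synthetic dataset; the toy data, the contamination value, and all variable names are illustrative assumptions rather than anything from the source. Negating `score_samples` recovers a score on the paper's 0-to-1 scale.

```python
# Minimal sketch: scoring points with scikit-learn's IsolationForest.
# The toy data and parameter values below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))    # dense cluster of "normal" points
X_outliers = rng.uniform(low=-6.0, high=6.0, size=(5, 2))   # a few scattered anomalies
X = np.vstack([X_normal, X_outliers])

iforest = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
labels = iforest.fit_predict(X)       # 1 = normal, -1 = anomaly
scores = -iforest.score_samples(X)    # roughly the paper's score: near 1 = anomaly, near 0.5 = normal

print("Points flagged as anomalies:", int(np.sum(labels == -1)))
print("Highest anomaly score:", scores.max().round(3))
```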
Outlier Correction Approach
· Isolation Forest (iForest) is designed for detecting outliers, but it does not modify the data itself.
· To correct the detected outliers (see the sketch after this list):
  - First, compute the median or mean of the values classified as normal (anomaly == 1) for each feature.
  - Then, replace the values identified as outliers (anomaly == -1) with the corresponding computed median or mean.
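Here is a minimal sketch of this median-replacement step, assuming the data lives in a pandas DataFrame with numeric columns; the column names, toy data, and contamination value are hypothetical, not from the source.

```python
# Sketch: detect outliers with IsolationForest, then replace them per feature
# with the median of the rows the model labelled as normal.
# Column names and data are hypothetical, not from the source.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
df = pd.DataFrame({
    "temperature": np.r_[rng.normal(20.0, 2.0, 100), [95.0, -40.0]],   # two injected outliers
    "pressure":    np.r_[rng.normal(1.0, 0.1, 100), [9.5, 0.001]],
})
features = ["temperature", "pressure"]

iforest = IsolationForest(contamination=0.05, random_state=42)
df["anomaly"] = iforest.fit_predict(df[features])   # 1 = normal, -1 = outlier

# Per-feature median computed only over the rows labelled normal
normal_medians = df.loc[df["anomaly"] == 1, features].median()

# Replace every outlier value with its feature's "normal" median
for col in features:
    df.loc[df["anomaly"] == -1, col] = normal_medians[col]

df = df.drop(columns="anomaly")   # the helper label is no longer needed
```

Computing the median over the normal rows only (rather than over all rows) keeps the replacement value from being skewed by the very outliers being corrected.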
Isolation Forest Key Parameters
Param Name | Description | Default Value | Possible Values |
---|---|---|---|
CONTAMINATION | Estimated proportion of outliers in the dataset | 0.05 | Float between 0.0 and 0.5, or 'auto' |
N_ESTIMATORS | Number of trees (base estimators) in the forest | 100 | Any positive integer |
MAX_SAMPLES | Number of samples to draw for training each tree | 'auto' | Integer (1 to n_samples) or float in (0.0, 1.0] |
MAX_FEATURES | Number of features to draw for each tree | 1.0 | Float in (0.0, 1.0] or integer (1 to n_features) |
N_JOBS | Number of parallel jobs for computation | -1 | -1 (all CPUs), 1, 2, ..., or None |
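For reference, the sketch below shows how these table entries map onto the lower-case, underscore-named keyword arguments of scikit-learn's IsolationForest; the values simply echo the defaults listed above, and random_state is an extra assumption added for reproducibility.

```python
# Sketch: instantiating IsolationForest with the parameters from the table.
# Values mirror the table's defaults; random_state is an added assumption.
from sklearn.ensemble import IsolationForest

iforest = IsolationForest(
    contamination=0.05,    # CONTAMINATION: expected fraction of outliers (float or 'auto')
    n_estimators=100,      # N_ESTIMATORS: number of isolation trees in the forest
    max_samples="auto",    # MAX_SAMPLES: samples drawn to train each tree
    max_features=1.0,      # MAX_FEATURES: fraction (or count) of features per tree
    n_jobs=-1,             # N_JOBS: -1 uses all available CPU cores
    random_state=42,       # not in the table; fixes the randomness for reproducible results
)
```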