Interpolation Search is an improved search technique over the traditional Binary Search. In Binary Search, we always calculate the middle of the array and compare it with the key. If the key is equal to the middle element then we return; otherwise, we reduce our array to the sub array based on our comparison. But, with Interpolation Search, instead of calculating the middle element, we apply the interpolation formula on our uniformly distributed array/input to calculate the nearest position to the key.
To understand it better let’s consider a scenario where I need to search for the name of my friend “Yatharth” in a dictionary. Obviously, I would start flipping the pages of the dictionary from the end because I know that this name would be present somewhere closer to the end. That is what our Interpolation search actually does. On the contrary, if I were to use Binary search here, I would flip to the middle of the dictionary and then lexically compare the middle word with my search word and it would take comparatively more time.
Let us dive into it and derive the Interpolation Formula by just using school level maths.
Derivation:
Suppose we have a linear straight line function y = f(x).
Let us consider two points (x1, y1) and (x2, y2) where y1 = f(x1) and y2 = f(x2) ∀ x1 < x2. Let’s place them somewhere randomly on the graph.
Now, consider a point (x,y) on the function f(x) such that x1<=x<=x2.
From the figure, for triangle ABC we have: tan𝛳 = (y2-y1)/(x2-x1)
Similarly for triangle ADE we have: tan𝛳 = (y-y1)/(x-x1)
Equating both the equations, we get our Interpolation Formula:
X = x1 + ((x2 – x1) * (y – y1)) / (y2 – y1)
If we consider our input array as a function f(x) then,
x1 => low and y1 => array[low]
x2 => high and y2 => array[high]
x => pos and y => array[pos] i.e. key
Substituting these values we get,
pos = low + ((high – low) * (key – arr[low])) / (arr[high] – arr[low])
(You might be wondering this is the basic coordinate geometry formula which we have used thousands of times in solving coordinate questions. Yes, that’s it! Complex concepts are based on basic logic.)
Algorithm:
- Step 1: In a loop, calculate the value of pos using the above formula.
- Step 2: If it is a match, return the index of the item, and exit.
- Step 3: If the item is less than the element at position pos, calculate the target position of the left sub-array. Otherwise, calculate the same in the right sub-array.
- Step 4: Repeat until a match is found or the search space reduces to zero.
Implementation:
int interpolationSearch(int* array, int n, int key){
int high = n-1;
int low = 0;
int mid;
while ( (array[high] != array[low]) && (key >= array[low]) && (key <= array[high])) {
mid = low + (int)( ((float)(high-low) / (array[high] – array[low])) * (key – array[low]) );
if(array[mid] == key)
return mid;
else if (array[mid] < key)
low = mid + 1;
else
high = mid – 1;
}
return -1;
}
Complexity:
- Worst-case time complexity: O(N)
- Average case time complexity: O(log log N)
- Best case time complexity: O(1)
- Space complexity: O(1)
- On assuming a uniform distribution of the data on the linear scale used for interpolation, the performance can be shown to be O(log log n)
This algorithm can be really useful in performing search operations in a large set of data as well as in distributed systems. For further reading, you can refer to the original research paper of Interpolation Search at http://www.cs.technion.ac.il/~itai/publications/Algorithms/p550-perl.pdf
Happy Coding
-An Article by Harsh Agarwal, 3rd year Information and Technology