Improve Performance Greatly By Eliminating Branches
Problem
The next greater element of some element x in an array is the first greater element that is to the right of x in the same array.
You are given two distinct 0-indexed integer arrays nums1 and nums2, where nums1 is a subset of nums2.
For each 0 <= i < nums1.length, find the index j such that nums1[i] == nums2[j] and determine the next greater element of nums2[j] in nums2. If there is no next greater element, then the answer for this query is -1.
Return an array ans of length nums1.length such that ans[i] is the next greater element as described above.
Example 1:
Input:
nums1 = [4,1,2],nums2 = [1,3,4,2]Output:[-1,3,-1]Explanation: The next greater element for each value of nums1 is as follows:
- 4 is underlined in
nums2 = [1,3,4,2]. There is no next greater element, so the answer is -1.- 1 is underlined in
nums2 = [1,3,4,2]. The next greater element is 3.- 2 is underlined in
nums2 = [1,3,4,2]. There is no next greater element, so the answer is -1.
Example 2:
Input:
nums1 = [2,4],nums2 = [1,2,3,4]Output:[3,-1]Explanation: The next greater element for each value of nums1 is as follows:
- 2 is underlined in
nums2 = [1,2,3,4]. The next greater element is 3.- 4 is underlined in
nums2 = [1,2,3,4]. There is no next greater element, so the answer is -1.
Constraints:
1 <= nums1.length <= nums2.length <= 10000 <= nums1[i],nums2[i] <= 104- All integers in
nums1andnums2are unique. - All the integers of
nums1also appear innums2.
Solution
With the monotonic stack technique, we could achieve O(n) time complexity.
func nextGreaterElement(nums1 []int, nums2 []int) []int {
greater := map[int]int{}
decreasing := []int{}
for _, n := range nums2 {
for len(decreasing) > 0 && n > decreasing[len(decreasing)-1] {
greater[decreasing[len(decreasing)-1]] = n
decreasing = decreasing[:len(decreasing)-1]
}
decreasing = append(decreasing, n)
}
var res []int
// !!!!!!!!!!!
for _, n := range nums1 {
if v, ok := greater[n]; ok {
res = append(res, v)
} else {
res = append(res, -1)
}
}
return res
}
But the data shows it only beats 17% solutions in terms of runtime.
What happened?
Notice the !!!!!!!!!!! comment line in the solution, the code under it is very suspicious, as we all (maybe) know,
if statement inside a loop causes performance penalties, because of the branch prediction failure,
okay, let’s eliminate the branch: instead of determin whether the number exists in the map, we loop over the stack,
and assign -1 to the corresponding element in the map in advance.
func nextGreaterElement(nums1 []int, nums2 []int) []int {
greater := map[int]int{}
decreasing := []int{}
for _, n := range nums2 {
for len(decreasing) > 0 && n > decreasing[len(decreasing)-1] {
greater[decreasing[len(decreasing)-1]] = n
decreasing = decreasing[:len(decreasing)-1]
}
decreasing = append(decreasing, n)
}
// added here
for _, n := range decreasing {
greater[n] = -1
}
var res []int
for _, n := range nums1 {
res = append(res, greater[n])
}
return res
}
Submit it again, and boom! 100% beats!
Algorithms aren’t the only core of performance, but also subtle places like this, which directly related to the structure of modern CPU.