Friday, March 28, 2014

LeetCode: Median of Two Sorted Arrays

There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).


The naive idea for this problem is merge these two arrays first and then find the median, which will take O(m+n) time O(m+n) space. Another idea is utilizing binary search, each time we can shrink the array by comparison.

C++ Code:

/*
 * func: find_kth_sorted_arrays
 * goal: find the kth element in two sorted array
 * @param A[]: sorted array A
 * @param m: size of sorted array A
 * @param B[]: sorted array B
 * @param n: size of sorted array B
 * @param k: the kth index
 * return: the kth element of two sorted arrays
 */
int find_kth_sorted_arrays(int A[], int m, int B[], int n, int k){
    //check for k, m and n
    assert(k>0), assert(m>=0), assert(n>=0);
    assert(k <= m+n);
    
    //we try to put the smaller-sized array ahead to make it simple
    if(m > n){
        return find_kth_sorted_arrays(B, n, A, m, k);
    }
    
    //Corner Case1: if one array is empty
    if(m == 0){
        return B[k-1];
    }
    
    //Corner Case2: if k is 1
    if(k == 1){
        return A[0] > B[0] ? B[0] : A[0];
    }
    
    //Specify index in A and B
    int index_a = min(k/2, m);
    int index_b = k - index_a;
    
    //If A[index_a-1] <= B[index_b-1], elements from A[0] to A[index_a-1] can be
    //filtered since they are too small
    if(A[index_a-1] <= B[index_b-1]){
        return find_kth_sorted_arrays(A+index_a, m-index_a, B, n, k-index_a);
    }
    //The same situation for array B
    else{
        return find_kth_sorted_arrays(A, m, B+index_b, n-index_b, k-index_b);
    }
    
}

/*
 * func: find_median_sorted_arrays
 * goal: find the median of two sorted array
 * @param A[]: sorted array A
 * @param m: the size of sorted array A
 * @param B[]: sorted array B
 * @param n: the size of sorted array B
 * return: the median of two sorted arrays
 */
/*
 *Basic Idea: Use binary search to speed up searching
 */
double find_median_sorted_arrays(int A[], int m, int B[], int n){
    //Compute the total size first
    int total = m + n;
    //If the total size is odd, then the median is the middle element
    //Note >> has lower precedence than '+'; '+' will be evaluated first
    if(total & 1){
        return find_kth_sorted_arrays(A, m, B, n, (total>>1) + 1);
    }
    //Else, the median is the average of the two middle elements
    else{
        return double(find_kth_sorted_arrays(A, m, B, n, total>>1)
                      + find_kth_sorted_arrays(A, m, B, n, (total >> 1) + 1))/2;
        
    }
}

Python Code:

# func: find the kth element of two sorted arrays
# @param A: sorted array A
# @param B: sorted array B
# @param k: the kth index
# @return: the kth element
def find_kth_sorted_arrays(A, B, k):
    assert 0 < k <= len(A) + len(B), "Invalid input k"

    if len(A) > len(B):
        return find_kth_sorted_arrays(B, A, k)

    #Corner Case1: if one array is empty
    if not A:
        return B[k-1]

    #Corner Case2: if k is 1
    if k == 1:
        return min(A[0], B[0])

    #Decide searching index in A and B
    index_a = min(k/2, len(A))
    index_b = k - index_a

    #If A[index_a-1] <= B[index_b-1], then elements from A[0] to
    #A[index_a-1] could be too small
    if A[index_a-1] <= B[index_b-1]:
        return find_kth_sorted_arrays(A[index_a:], B, k-index_a)
    else:
        return find_kth_sorted_arrays(A, B[index_b:], k-index_b)


#func: find the median of two sorted arrays
#@param A: sorted array A
#@param B: sorted array B
#@return: the median value
def find_median_sorted_arrays(A, B):
    #the total length for two arrays
    total_length = len(A) + len(B)

    #If the length is odd, then return the middle element
    if total_length % 2 != 0:
        return find_kth_sorted_arrays(A, B, total_length/2+1)
    #else, return the average of the two middle elements
    else:
        return float(find_kth_sorted_arrays(A, B, total_length/2)+
        find_kth_sorted_arrays(A, B, total_length/2+1)) / 2

Extension: What if the question is changed to find the median of trillion numbers stored in several machines?

The idea is that we can use the thinking of quick sort, each time choose a pivot and keep a count of the elements bigger than the pivot and smaller than the pivot.


Reference Link:

1.Find the kth element: http://yucoding.blogspot.com/2013/01/leetcode-question-50-median-of-two.html

2.Utilization of binary search: http://hp.dewen.org/?p=1466

3.The median of a trillion numbers: http://matpalm.com/median/question.html

No comments:

Post a Comment