CSSE 230
SortingRaces Programming Assignment

Overview

For the first part of this assignment, you will need your Binary Heap from a previous assignment. See here: heaps.html

In this assignment, you will compare the speeds of various O(n log n) sorting algorithms.

  1. You will study the runtimes of five sorting algorithms: three built into Java, and two that you write (or have recently written!).
    Built-in:
    1. mergesort. Java's mergesort is supposed to be optimized when array is partially sorted.
    2. quicksort. Java's quicksort is an optimized "dual-pivot" version that is supposedly faster than the single pivot version we learned.
    3. bstSort. Here you just insert the items into a binary search tree and then iterate through the tree, copying the data back into the array. (Can you explain why this is O(n log n)? It would be great if we could use your solution to EditorTrees, but EditorTrees don't sort their data like AVL trees do. Please use java's TreeSet instead. Since TreeSets don't hold duplicates, you'll need to account for those somehow (see FAQ below).
    Ones you provide:
    1. heapsort. done in part 1. Drag and drop BinaryHeap and BinaryHeapTest into your project. If Eclipse gives you an error in BinaryHeapTest, hover your mouse over the first error at the top of that file, and it will give suggestions as to how to fix it (you might need to click the error to see the suggestions). Choose the last option, "Fix Project Setup" and then add JUnit 4 to the build path in the next window. Then run the BinaryHeapMoreTests.java tests to see how your heap does with a more complete set of tests. Debug it if needed.
    2. quicksort. Implement a simple version like we did in class so we can see the effects of Java's optimizations. For simplicity, you can hardcode it for ints. Try to recall it from the concepts you remember from class, but feel free to refer to our Weiss textbook as needed. You may have to modify the given code to handle cases where the pivot repeats. We suggest making a class called Quicksort with a static sort(array) method.
  2. Test your code using at least four kinds of arrays, so you can investigate sort performance under various conditions.

    1. Random order: We have given you tests for this.
    2. Random order, but guaranteed to have no duplicates. To create such an array, you can create an array such that a[i] = i, and shuffle it. The Collections class has a static method called shuffle that will efficiently and throughly shuffle the elements.
    3. Almost sorted. Start with a sorted array and perform 0.01*n swaps of pairs of elements in random positions (to "slightly" shuffle it). This will introduce some inversions.
    4. Almost sorted, but backwards. Just reverse the array from the previous step.

    Your array size should be large enough so that all of your sorts take approximately 25 milliseconds or more. If you go much smaller than that, it will be hard to tell them apart. You can go much larger for most of the sorts.

  3. Frequently Asked Questions

    1. Java's TreeSet doesn't allow for duplicates? So what if my array has them? Answer: Find any reasonable way to deal with this. For example, you could use a TreeMap instead of a TreeSet, if you are clever. Or you can tell when it tries to insert a duplicate and can't. Store those values and add them to the array in the right place once you are copying the data back into the array. Compared to the sort runtime, this shouldn't take long. There are typically very few duplicates to account for (for example, I once had 2 duplicates when testing with n=50000 elements). You should find a way to add them all in one O(n) loop through the array.
    2. The quicksort code you gave us in class doesn't handle the i and j walking out of bounds. What should I do? Answer: Modify the code to handle that case. 
    3. The quicksort code you gave us in class is giving me a stack overflow error in my recursion. Answer: That code also doesn't consider the case where the pivot element is duplicated - you'll need to handle that too! (When you are sorting arrays with unique elements, this won't be an issue.)
    4. I want to display array ar1 while debugging, but there is no ar1.toString(). Do I need to write my own method to print the array? Answer: No. Instead, call Arrays.toString(ar1)

    Deliverables

    Commit and push your Git repo, including:

    1. All changed .java files
    2. All newly created .java files
    3. Your completed Analysis.docx file

    Grading outline

    1. (30 pts) Binary Heaps and heapsort, as discussed in that specification. Copy BinaryHeap class into this project. 
    2. (20 pts) Code for your other sorts, and your timing code. Each sort you provide should be a method of the appropriate class (so BinaryHeap has a sort() method). For Quicksort, sort() can be static since you don't need an instance of the class.
    3. (20 pts) A writeup giving a 2D table of sort times (sort vs. data), committed to the repo. Discuss the performance of each algorithm. Did all of them run fine? Were there any memory issues? Which was fastest for each type of array? Was anything surprising or did you expect this? From what you know, try to explain what you see. Easiest for me and the grader: I put a template document called Analysis.docx in your Eclipse project. Just edit that in Word to add your analysis, and then commit it using Eclipse.