CSSE 230
SortingRaces Programming Assignment
Overview
For the first part of this assignment, you will implement and
test a Binary Heap. See here: heaps.html
In this assignment, you will compare the speeds of various O(n
log n) sorting algorithms.
- This is a pair assignment. Pairs and repository names are
on the schedule page.
- You will study the runtimes of six sorting
algorithms: three built into Java, and three that you write (or have
recently written!). Built-in:
- mergesort. Java's mergesort is supposed to be optimized
when array is partially sorted.
- quicksort. Java's quicksort is an optimized
"dual-pivot" version that is supposedly faster than the single pivot
version we learned.
- bstSort. Here you just insert the items into a binary
search tree and then iterate through the tree, copying the data back
into the array. (Can you explain why this is O(n log n)? It would be
great if we could use your solution to EditorTrees, but EditorTrees
don't sort their data like AVL trees do. Please use java's TreeSet
instead. It won't be a perfect comparison if the array to be sorted has
duplicates (since they sets throw out duplicates), but it will suffice.
Be sure to resize the array to account for any duplicates that were
thrown out (see FAQ below).
Ones you provide:
- heapsort. done in part 1.
Drag and drop BinaryHeap and BinaryHeapTest into your
project. If Eclipse gives you an error in BinaryHeapTest, hover your
mouse over the first error at the top of that file, and it will give
suggestions as to how to fix it (you might need to click the error to
see the suggestions). Choose the last option, "Fix Project Setup" and
then add JUnit 4 to the build path in the next window.
- skiplist sort. Copy your
SkipList class into this package, so you can do some minor
modifications.
- First, implement the
public void
sort(T[] array)
method that inserts all the array elements
into your SkipList, and iterates through the skip list, updating the
array to be sorted.
- Second, make the max height to be O(log(n)), where
n is the maximum number of elements you want to sort using this
algorithm. (Your elements will probably require
a height of 3 * log n with probability 1/n2, so
that's a pretty safe choice.) You can figure out this value of the
array size you are using (say 10 000 000) and hard code it into your
program. Better (although admittedly requiring more changes), let your
program calculate the size from the number of elements you are sorting.
- Third, recognize that SkipLists have two
constructors - one takes a
seed
parameter for
the random number generator, so is completely predictable (for
debugging) - use any fixed seed you like. If you omit the seed, it will
be random. (Oracle's documentation says, "This constructor
sets the
seed of the random number generator to a value very likely to be
distinct from any other invocation of this constructor." Pretty
cryptic, but they could use the system time as a seed, for example.)
- quicksort. Implement a
simple version like we did in class so we can see the effects of Java's
optimizations. For simplicity, you can hardcode it for ints. Try to
recall it from the concepts you remember from class, but feel free to
refer to Weiss or other sources as needed. You may have to modify the
given code to handle cases where the pivot repeats.
-
Test your code using at least four kinds of arrays, so you
can investigate sort performance under various conditions.
- Random order: We have given you tests for this.
- Random order, but guaranteed to have no duplicates. To
create such an array, you can create an array such that a[i] = i, and
shuffle it. The Collections class has a static method called
shuffle
that will efficiently and throughly shuffle the elements. - Almost sorted. Start with a sorted array and perform 0.01*n swaps of pairs of elements in random
positions (to "slightly" shuffle it). This will introduce some inversions.
- Almost sorted, but backwards. Just reverse the array
from the previous step.
Your array size should be large enough so that all of your
sorts take at least 25 milliseconds. If you go much smaller than that,
it will be hard to tell them apart. You can go much larger for most of
the sorts.
Frequently Asked Questions
- Do I need to use both of AllBinaryHeapTest and
AllBinaryHeapGenericContructorTest? Answer:
No. Use the
first if your
heap implementation used ArrayLists and the second if it uses arrays.
Comment out the other one.
- Java's TreeSet doesn't allow for duplicates? So
what if I
get them? Answer: reallocate
the size of the array to be the
size of
the tree (which has no duplicates)so the duplicates are ignored. This
doesn't remove emough elements to affact timing significantly (for
example, I once had 2 duplicates when testing with n=50000 elements).
- The quicksort code you gave us in class doesn't
handle the
i and j walking out of bounds. What should I do? Answer:
Modify the
code to handle that case. Note that it doesn't handle the case where
the pivot element is duplicated - you'll need to handle that too!
- I want to display array ar while debugging, but
there is no
ar.toString()
. Do I need to write my own
method to print the array?
Answer: No. Instead, call Arrays.toString(ar)
Deliverables and grading outline
- (30 pts) Binary Heaps and heapsort, as discussed in that
specification. Copy BinaryHeap into this project (you may submit one
implementation per pair) and commit it to the repo.
- (20 pts) Code for your other sorts, and your timing code.
Each sort you provide (heapsort, skiplistsort, quicksort) should be a method the appropriate class (so BinaryHeap has a
sort()
method), and committed to the repo. For quicksort, sort()
can be static since you don't need an instance of the
class.
- (20 pts) A writeup giving a 2D table of sort times (sort
vs. data). Discuss the performance of each algorithm. Did all of them
run fine? Were there any memory issues? Which was fastest for each type
of array? Was anything surprising or did you expect this? From what you
know, try to explain what you see. Use the document here Analysis.docx as a template.