Kids in a Schoolyard

You find yourself in the middle of a schoolyard, sorting students by height, wondering if that education degree was worth the size of your student loans.

Why? Well, the famous photographer Rita Multishot is testing out several compositions for the school photo, and she’s asking you to arrange and rearrange the students.

The last shot had the boys are girls are in two separate lines of ascending height, like the illustration. How did you end up in the photo?

Now she wants you to give her one line of kids with strictly increasing height, and she wants to put the remainder of the kids in a second line with increasing height for the next shot.

You remember from algebra that strictly increasing means that every height will be larger than the last.

You conclude that you need one representative line with heights all different, and one with all the duplicates, both ascending.

Hurry! Time’s a wasting, and the sun is just perfect for taking pictures.

Are there any ambiguities? I think we can safely assume that the number of kids is finite -it’s in a schoolyard!

What about the data type? Let’s assume that our input is height in inches.

Let’s also suppose we are passed sorted arrays of heights for each of the boys and girls, and we return two sorted arrays, one strictly increasing, and one with the duplicates. Let’s call these arrays uniques and duplicates.

From here, we can assume:

Inputs :

boys []int, girls []int

Outputs:

uniques []int, duplicates []int

Rita is waiting, so we can safely assume that performance is a concern.

What about constraints, which might help us understand the data? Well, it seems to me that the kids are just being rearranged, but they’re not being added or deleted, so it’s safe to observe that:

len(boys)+len(girls)=len(uniques)+len(duplicates)

We’re thinking about creating a line where the heights are unique, with a second line with the leftovers. Seems like a natural place to apply a Set?

Well this is a provocative bad idea. Why?

The boys and girls are already ordered, and adding them to a Set discards that information. In addition, we’ve only solved half the problem.

A better idea would be to add both boys and girls to a map that keeps track of the number of kids of each height. We could then sort this, range through it, and easily create the two arrays. We’d use a data structure like this.

var count map[int]int

We have not overcome the issue of losing our sort while moving heights into the map, so we’ll need to sort afterwards, so our overall runtime will be O(n\log n).

Can we do better?

After all, this is only a fancy merge of two arrays. Perhaps we should put away the heavy tools and do this manually.

Let’s just pick the smallest child from the front of each line to be the next child in the unique line, and route the duplicates to the duplicates line.

This has complexity O(n), because we move each kid once (where n is the number of kids).

Let’s implement it.

Right off the bat, there are some questions. What happens if there are fewer boys or fewer girls?

How do we figure out if there’s a duplicate?

The next way this tends to go wrong, is that people have a natural tendency to code like this:

for g<len(girls) && b<len(boys) {
}
for g<len(girls) {
}
for b<len(boys) {
}

This seems rather natural, and it would work really well if we were just doing a merge. Since we’re also keeping track of the duplicates, it’s a disaster. We end up having the same check for duplicates three times, one for each loop.

We need a way to switch to this kind of loop without making spaghetti of our ifs.

for g<len(girls) || b<len(boys) {
}

p.s. There is a different trick that you can use in situations like this, but not in this case. You just ensure that the arrays never end, and exit the loop with a different condition. Like this:

boys = append(boys, math.MaxInt64)
girls = append(girls, math.MaxInt64)

If you do this, you will guarantee that you don’t hit the end of the arrays until all of the valid values have been siphoned off.

The other way this goes wrong is how we figure out whether we have a duplicate!

The natural

if boys[b] == girls[g] {
}

does not work. Can you see why?

What if there are two boys with the same height?

package schoolyard
func mergeDup(boys []int, girls []int) (uniques []int, duplicates []int) {
	var b, g, last int
	for b+g < len(girls)+len(boys) {
		var shorter int
		if g >= len(girls) || b < len(boys) && boys[b] < girls[g] {
			shorter = boys[b]
			b++
		} else {
			shorter = girls[g]
			g++
		}
		if last == shorter {
			duplicates = append(duplicates, shorter)
		} else {
			last = shorter
			uniques = append(uniques, shorter)
		}
	}
	return
}

In order to handle both the case where a boy and girl have the same height as well as the case where two boys in a row have the same height, we just keep track of the last height we added. We seed this with an invalid value (0), which is zero height according to the problem definition.

In order to make the OR work without Spaghetti IFs, we end up with the beautiful condition of line 8. We’re in a loop trying to figure out whether to choose a boy or a girl? When do we choose a boy?

  • When there are no more girls
  • When there are both a boy and girl, and the boy is shorter.

Else it’s a girl! Simple as that. One line has it all.

Overall performance : O(n).

Author: Dean

2 thoughts on “Kids in a Schoolyard

  1. Looks like the old sorting question – this is a merge sort – so…. pick the line with the shortest first person, they go to the front of the line, then compare the next two for line a and b, shorter to front, compare the next two – when one line is exhausted, just append the second line… or did I miss the question? Is there a deeper meaning? How do we handle identical heights? Can two really be identical?

    🙂

  2. SO, yes, it’s a straight merge sort, with the proviso of creating the second line elegantly on the fly. There are a couple of gotchas along the way. Click on the different tabs to see what they are!

    Heisenberg puts some limits on how *identical* heights can be, so let’s just presume…

Leave a Reply

Your email address will not be published. Required fields are marked *