How did I get here?
Fix an awkward relationship. That was the name of the story that I took on as the NoRedInk Writing team embarked on creating our new Peer Review writing product. As we perfected how a teacher can assign and move their students through a cycle of writing steps, it became increasingly obvious that the functionality which we needed to add was not elegantly handled by our current data model. At the end of the day, we needed a new table.
This is a common problem for people working on large, evolving codebases. What do we do when we need to add a new table but want to use many of the same relationships and components of another table? How do we elegantly handle jumping back and forth between tables that have similar relationships to the same table? The following is a breakdown of my discovery and implementation of polymorphic relationships. They’re not always the right decision, but when they are, they can make working within your tables simpler, prettier, and more efficient.
The Problem
Our existing table for writing steps, WritingCycles, needed to stick around while we made the gradual transition to our new table, PeerReviews. A single row in the WritingCycles table held information about an assigned series of writing step assignments (writing, practicing topics, rating peers, and revising). It did not point to the corresponding assignment rows in their respective tables. However, the assignments did point to their writing cycle. In other words, the assignments knew about their umbrella writing cycle, but a writing cycle could not reach its assignments. This could have been fixed with a ‘belongs_to’, but it would then have to be duplicated for the new PeerReviews table. This would mean that each assignment table would have both a peer_review_id column and a writing_cycle_id column. These ids should never both be set on the same row, yet nothing would prevent it. If you are thinking “wow, this is really awkward”, that is exactly what we were thinking too.
Our initial solution was to change the direction of the relationship. A writing cycle could point to its assignments, but an assignment could not point to its writing cycle. It wasn’t ideal, but this would allow us to create rows in the PeerReviews table with corresponding assignments, without any hiccups in the WritingCycle → Assignments relationship.
A First Stab
This initial solution would, of course, solve our problem, but it would involve changing every place in the codebase that referenced assignment.writing_cycle. We would have to change the logic in more than just a few locations. It felt dangerous, but in the end I accepted my fate and moved forward with the solution.
About a half hour into writing migrations, I approached my manager, Josh Leven, my go-to Ruby expert, with a random syntax question. For a few minutes he stared at my code, his expression turning more and more confused. Finally, he asked, “Why aren’t you using a polymorphic relationship?”
“Huh?” I responded. He proceeded to explain.
Polymorphic Relationships!
A polymorphic relationship lets one table have the same relationship with two or more different tables, so it involves at least three tables in total. Take, for example, a Meal table. A meal can be eaten by a customer, in which case the meal row will have a field for the customer_id. But what if a meal can also be eaten by an employee? We cannot just put the employee’s id in the customer_id field, because a customer and an employee can share the same id and nothing would tell us which table the value refers to. In this case, we would want one column, consumer_id, which can hold either the customer’s id or the employee’s id, as well as another column, consumer_type, which holds a string indicating which type of consumer it is, “Employee” or “Customer”. Now, when we want to see who ate the last piece of pie, the lookup checks the consumer_type, goes to the respective table, and grabs the row where id equals consumer_id. The great part about this solution is that it works both ways: a query can also return all of the meals any given customer or employee has eaten.
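To make the two-column lookup concrete, here is a minimal plain-Ruby sketch that uses hashes as stand-ins for the three tables. It is not how Rails implements the feature; the rows, the names, and the helper methods consumer_of and meals_for are all made up for illustration.

```ruby
# In-memory stand-ins for three tables; every row and name is illustrative.
CUSTOMERS = { 1 => { name: "Casey" } }
EMPLOYEES = { 1 => { name: "Erin" } } # same id as the customer, different table
MEALS = [
  { dish: "pie",  consumer_type: "Customer", consumer_id: 1 },
  { dish: "soup", consumer_type: "Employee", consumer_id: 1 },
  { dish: "cake", consumer_type: "Employee", consumer_id: 1 },
]

# Maps the string stored in consumer_type to the table it names.
TABLES = { "Customer" => CUSTOMERS, "Employee" => EMPLOYEES }

# Meal -> consumer: consumer_type picks the table, consumer_id picks the row.
def consumer_of(meal)
  TABLES.fetch(meal[:consumer_type]).fetch(meal[:consumer_id])
end

# Consumer -> meals: match on both columns, since ids repeat across tables.
def meals_for(type, id)
  MEALS.select { |m| m[:consumer_type] == type && m[:consumer_id] == id }
end
```

Calling consumer_of on the pie row resolves to the customer, while meals_for("Employee", 1) returns only the soup and the cake, even though the customer shares the same id.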
Let’s think about another way to do this before we dive into how to implement a polymorphic relationship. Many have tried to solve this dilemma by creating two id columns, one for each parent table. So, in our example above, we would have a customer_id and an employee_id column. Each would be a foreign key to its respective table. If you wanted to check who ate the pie, you would check which of the two was not NULL, and query for the row with that id. Technically, this would work. But where this fails is that there is no guarantee that exactly one of the two columns is NULL. There could be a situation where there is an id in both columns. Unless a customer and an employee shared a piece of pie in some scandalous breach of professionalism, this situation should not occur in our database. With the consumer_type and consumer_id pattern, both columns are always non-null, and together they always point to just one row.
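The two-column alternative only stays consistent if something enforces that exactly one of the keys is set. A small plain-Ruby sketch of that check follows; the hash-shaped row and the method name exactly_one_consumer? are hypothetical.

```ruby
# Two-column design: a row is well-formed only when exactly one key is set.
# XOR on the two nil checks is true when one column is nil and the other is not.
def exactly_one_consumer?(row)
  row[:customer_id].nil? ^ row[:employee_id].nil?
end

shared_pie = { customer_id: 4, employee_id: 9 } # both set: the "scandalous" row
```

Here exactly_one_consumer?(shared_pie) is false, and so is a row with both columns empty. The polymorphic pair needs no such check, because a single type and a single id cannot name two rows.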
I imagine this is what it felt like when the early pioneers found a shortcut through the Rocky Mountains…or something like it. A polymorphic relationship would allow me to create parallel relationships between WritingCycles and Assignments and between PeerReviews and Assignments, and it would spare me from making risky logic changes throughout the codebase. I was ecstatic. I went to work.
How To
Luckily, Rails makes creating polymorphic relationships pretty straightforward.
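First, I ran a migration, AddPolymorphicParentAssignmentOnRatingAssignments, to create the parent_assignment_id and parent_assignment_type columns on our assignment tables. Then a rake task backfilled the new columns for the existing rows:
# The task name here is a placeholder.
task :backfill_parent_assignments => :environment do |t, args|
  # Every existing rating assignment belongs to a writing cycle.
  RatingAssignment.update_all(parent_assignment_type: "WritingCycle")
  # Copy the old foreign key into the new polymorphic id column.
  RatingAssignment.update_all("parent_assignment_id = writing_cycle_id")
end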
We could have also renamed writing_cycle_id, but this would mean that the migration and updating the code with the new names and relationships would all have to happen in one pull request. I opted to have both the writing_cycle_id and the parent_assignment_id exist at the same time as I made the switch to make it a wee bit safer.
Great! I now had two new columns in my assignment tables, parent_assignment_id and parent_assignment_type, which were filled in with the correct data from the corresponding parent WritingCycle! When we were ready to create assignments for PeerReviews, it would be as easy as setting the PeerReview id as the parent_assignment_id and “PeerReview” as the parent_assignment_type!
Getting everything to work
The next part got a little chewy. In order to delete the redundant writing_cycle_id column from the assignment tables, I had to swap any reference to it with parent_assignment_id as well as specify the parent_assignment_type where applicable. My team lead, Tessa Kelly, and I took our time with this one. Jumping into code you haven’t written, changing a tiny thing, and moving on is deceptively easy. In many files, though tests were passing, I found I had made the code ambiguous and ripe for false positives. Three of us reviewed this pull request before we felt okay merging it into master.
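One way a swap like this can go quietly wrong is a lookup that keeps filtering on the id alone. The plain-Ruby sketch below uses made-up rows and hypothetical helper names to show the difference. In ActiveRecord the equivalent fix would be to pass both parent_assignment_type and parent_assignment_id to where.

```ruby
# Made-up rows: a writing cycle and a peer review that both have id 7.
ASSIGNMENTS = [
  { id: 10, parent_assignment_type: "WritingCycle", parent_assignment_id: 7 },
  { id: 11, parent_assignment_type: "PeerReview",   parent_assignment_id: 7 },
]

# Ambiguous: matches assignments of any parent whose id happens to be 7.
def by_id_only(parent_id)
  ASSIGNMENTS.select { |a| a[:parent_assignment_id] == parent_id }
end

# Unambiguous: the type narrows the match to one parent table.
def by_type_and_id(type, parent_id)
  ASSIGNMENTS.select do |a|
    a[:parent_assignment_type] == type && a[:parent_assignment_id] == parent_id
  end
end
```

A test that only ever creates writing cycles would pass against by_id_only, which is exactly the kind of false positive that is easy to miss.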
The final bolts we had to screw in were our test alterations. We use a form of FactoryBot, which creates rows in our test database to test against. We altered the assignment factories to create a writing cycle and assign it as the parent assignment:
parent_assignment { create :writing_cycle }
Again, we had to change any reference to the assignments’ writing cycle and replace it with a parent assignment. We took our time with this one too. Subtleties of tests can sometimes take a moment to catch on to and we tried our best to preserve the intentions. If I were to do this over again, I would update the factories before changing any tests. Know what you’re testing against!
It works!
Finally, after all the dust had settled, we deleted the writing_cycle_id column. I was so happy when it was gone, I had to stand up and do a little jig. Gleefully, I created a peer review in the console and jumped between it and its assignments and back again. I had dedicated countless hours to creating such a simple and elegant relationship, and I had no regrets.
If you’re struggling in the future with tying a bunch of tables together with tenuous associations, do yourself a favor and look into polymorphic relationships. Hopefully, this blog post will shed some light on a pretty cool tool to help elevate your database.
Ally McKnight
@allykmcknight
Engineer at NoRedInk