Abstraction levels

I’ve been casually discussing ’tabs vs spaces’ with a friend when this example was brought up

def get_relationship(self, other):
    if other == self:
        return 'the same'
    result = 'none'
    if other in self.known_list.all():
        result = 'in list'
    elif SomeDB.filter(id=self.id, other_id=other.id).exists():
        result = 'self first'
    else:
        db_status = SomeDB.filter(id=other.id, other_id=self.id).first()
        if db_status:
            result = 'other first'

    return result

Basically, this is an example of an approach I see almost every day and which I am definitely not a huge fan of. It has nothing to do with ‘tabs vs spaces’ holy war for sure, but it has several features I’d like to highlight. It took me more than a glance to find out that this code analyses the relationship between two objects and that there can only be 5 distinct cases. There are two reasons for that:

The flow is inconsistent. The author used two different approaches for handling similar cases. The first instance has elif and the second one uses else. Moreover, the second case throws in a basically useless variable. In addition to that, else branch doesn’t really handle the default case, but the additional code does this instead.
This method shows a heavy mix of different level logic. It has some kind of business logic that analyses the relationship, and also database interaction code which actually retrieves data. One can’t read this code without decent understanding of the database design. That also gives another responsibility to the method: not only it analyses the data, but also retrieves it. The first issue is quite easy to understand. Every source states: your code should be consistent. Similar things should be written in a similar manner to make the code easier to read. It sometimes even simplifies writing the code because an experienced engineer would replicate the pattern they encounter regularly. Consistency is a very important goal and there are a lot of instruments for maintaining it, such as style guides, naming conventions, design patterns and so on.

Now, the second issue is a bit trickier to explain in general, but in this piece of code it might be easy to spot. As I mentioned earlier, it mixes different abstraction layers, in this particular case business logic and database logic. Business logic cares about the relationship between objects, not about making database queries and trying to fetch data. In the ideal world a database object might be presented via an injected interface to make it possible to mock it and introduce unit tests to this code.

Moreover, in this particular case database code makes it difficult to notice that the queries are the same, they just have arguments reversed. This will ultimately hurt future refactoring.

Taking these considerations into account, here is how I’d do it. In this case I’d also like to use early returns.

def is_linked(first, second):
  return SomeDB.filter(id=first.id, 
                       other_id=second.id).exists()

  
def get_relationship(self, other):
    if other == self:
        return 'the same'
    if other in self.known_list.all():
        return 'in list'
    if is_linked(self, other):
        return 'self first'
    if is_linked(other, self):
        return 'other first'

    return 'none'

(Yep, the database global object is still present just because that’s not our focus here.)

The main two arguments against my approach would be:

You wrote more code! (Well, it might not be true in this particular case, but such refactoring often results in more lines of code). More code is worse, right? Well, my version may have more lines, but when one states ‘less is better’, they usually imply meaningful code, the code they will have to support in the future. From the standpoint of code maintenance my version probably has less code because now there’s only one query instead of two.
But the code is less obvious now: you can no longer see that it uses the database! That’s the point! Well, not to make the code less obvious, but to separate logic from details. Does the logic care where the data comes from? No, it just needs the data. And now the reader sees the logic clearly with no extra details standing in the way. In addition to that, now the method has a single responsibility and delegates data retrieval to another function.

Hiding the details is one of the main concepts of programming languages. When one uses print function or operator, they usually don’t want to know what API calls are hidden behind that, what drivers are involved and what really happens with the hardware. The same is true for all other levels. Each piece of code should not care about its surroundings or mix different abstraction levels.

Separating abstraction levels is important for keeping the code clean and readable. Unfortunately, in my practice this idea is being neglected the most, and it results in clumsy code which is difficult to maintain.