Django Model Fields With Attributes

I wanted to make a model field where the underlying data is a string, but the field on model instances exposes more attributes. Specifically, a MarkdownField:

class Document(models.Model):
    text = MarkdownField()

that exposes a way to get at its content as both HTML and source Markdown:

>>> doc = Document(text="hello, *world*")
>>> doc.text
"hello, *world*"
>>> doc.text.html
"hello, <strong>world</strong>"

This is not too uncommon in Django-land – for example, Django’s built-in FileFields work this way. Surprisingly, the pattern for accomplishing this doesn’t seem to be documented anywhere (that I can find), so here we go.

There are three steps: a Django model field class; a Python descriptor; and the final value object that’ll be exposed as model_instance.field.

Part 1: Field Class

class MarkdownField(django.db.models.TextField):
    def contribute_to_class(self, cls, name, **kargs):
        super().contribute_to_class(cls, name, **kargs)
        setattr(cls, self.name, MarkdownDescriptor(self))

Django calls contribute_to_class for each field defined on the model, at class definition time (it’s called Model’s metaclass). Explaining exactly what’s happening would require getting into how metaclasses work, which is way outside the scope I want to cover here. So short version: contribute_to_class is Django’s mechanism for allowing fields to modify the classes on which they’re included.

In this case, we set a descriptor on the new class, under the field’s name. So in the above example, this will set Document.text to be a MarkdownDescriptor instance.

Part 2: Descriptor

Descriptors are neat: they’re a way of overriding what happens when you access a class attribute (e.g. doc.text.). Descriptors are a class with __get__ and __set__ methods. When an attribute is accessed on an instance, if the class has a descriptor under that same attribute, Descriptor.__get__ will be called instead. So, by setting Document.text to a MarkdownDescriptor, we will call descriptor.__get__ any time doc.text gets accessed (and same goes for setting values and __set__).

Here’s that descriptor:

class MarkdownDescriptor:
    def __init__(self, field):
        self.field = field

    def __set__(self, instance, value):
        instance.__dict__[self.field.name] = value

    def __get__(self, instance, cls=None):
        if instance is None:
            return self
        return MarkdownString(instance.__dict__[self.field.name])

__init__ recieves the field object (from contribute_to_class) and hangs on to it for later (we need to know field.name).

__set__ saves the underlying data (the raw Markdown) into ___dict__. We have to use __dict__ because if we tried to set instance.text any other way, it’d just call the descriptor!

Then, when __get__ is called returns a MarkdownString instance that wraps the fields actual value, again using __dict__. Descriptors also get called when fields are accessed on the class (Document.text), and if that happens instance will be None. So the if instance is None check makes sure we’re being called as some_document_instance.text.

Part 3: final object

So now, when some_document.text is accessed, we can return whatever object we want instead of the underlying string. This part’s the simplest:

class MarkdownString(str):
    @property
    def html(self):
        return MarkdownIt().render(self)

Just a str subclass with that html property, here using markdown-it-py to render the markdown.

Isn’t there a way to do this without metaclasses and descriptors?

Yes, there is: you could accomplish the same thing using the field methods that convert between Python and database-level representations (I think you’d need to_python, get_db_prep_value, get_prep_value, and maybe deconstruct).

For something like this markdown field, either method works fine. I like this one slightly more, which is why I tend to use it when I want sub-attributes on fields, like this. I think once you get into wanting things to happen on assignment (in __set__), like FieldField and friends, this descriptor method starts to be a lot clearer.