Django Model Fields With Attributes
I wanted to make a model field where the underlying data is a string, but the field on model instances exposes more attributes. Specifically, a MarkdownField:
class Document(models.Model):
    text = MarkdownField()
that exposes a way to get at its content as both HTML and source Markdown:
>>> doc = Document(text="hello, *world*")
>>> doc.text
"hello, *world*"
>>> doc.text.html
"<p>hello, <em>world</em></p>\n"
This is not too uncommon in Django-land – for example, Django’s built-in FileFields work this way. Surprisingly, the pattern for accomplishing this doesn’t seem to be documented anywhere (that I can find), so here we go.
There are three steps: a Django model field class; a Python descriptor; and the final value object that’ll be exposed as model_instance.field.
Part 1: Field Class
from django.db import models

class MarkdownField(models.TextField):
    def contribute_to_class(self, cls, name, **kwargs):
        super().contribute_to_class(cls, name, **kwargs)
        # Replace the attribute Django installed with our own descriptor.
        setattr(cls, self.name, MarkdownDescriptor(self))
Django calls contribute_to_class for each field defined on the model, at class definition time (it’s called by Model’s metaclass). Explaining exactly what’s happening would require getting into how metaclasses work, which is way outside the scope I want to cover here. So, short version: contribute_to_class is Django’s mechanism for allowing fields to modify the classes on which they’re included.
In this case, we set a descriptor on the new class, under the field’s name. So in the above example, this will set Document.text to be a MarkdownDescriptor instance.
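If the metaclass part feels too hand-wavy, here’s a toy version of the idea in plain Python. This is not Django’s actual implementation, only a rough sketch of the hook: ToyModelBase, ToyField, ToyDescriptor, and ToyDocument are all names I made up for illustration.

```python
class ToyModelBase(type):
    """Toy metaclass: hands each field-like attribute to its contribute_to_class hook."""

    def __new__(mcs, name, bases, attrs):
        # Anything with a contribute_to_class method is treated as a "field".
        fields = {k: v for k, v in attrs.items() if hasattr(v, "contribute_to_class")}
        plain = {k: v for k, v in attrs.items() if k not in fields}
        cls = super().__new__(mcs, name, bases, plain)
        # Let each field decide what ends up on the new class.
        for attr_name, field in fields.items():
            field.contribute_to_class(cls, attr_name)
        return cls


class ToyDescriptor:
    """Hypothetical stand-in for MarkdownDescriptor; it only remembers its field."""

    def __init__(self, field):
        self.field = field


class ToyField:
    def contribute_to_class(self, cls, name):
        self.name = name
        # The field installs a different object under its own name.
        setattr(cls, name, ToyDescriptor(self))


class ToyDocument(metaclass=ToyModelBase):
    text = ToyField()


installed = ToyDocument.__dict__["text"]
print(type(installed).__name__)  # ToyDescriptor
print(installed.field.name)      # text
```

The class body says text = ToyField(), but what actually lives on the class afterwards is whatever the field chose to install.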
Part 2: Descriptor
Descriptors are neat: they’re a way of overriding what happens when you access an attribute on an instance (e.g. doc.text). A descriptor is an object whose class defines __get__ and __set__ methods. When an attribute is accessed on an instance, if the class has a descriptor under that same attribute name, the descriptor’s __get__ will be called instead. So, by setting Document.text to a MarkdownDescriptor, we will call descriptor.__get__ any time doc.text gets accessed (and the same goes for setting values and __set__).
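To see that interception on its own, here’s a small Django-free sketch. LoggingDescriptor, Note, and the calls list are hypothetical names used only for this demo; the descriptor records each access so you can see when Python routes through it.

```python
calls = []  # records every time the descriptor is invoked


class LoggingDescriptor:
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself, not on an instance.
            return self
        calls.append(("get", self.name))
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        calls.append(("set", self.name))
        # Store in __dict__ directly so we don't re-trigger __set__.
        instance.__dict__[self.name] = value


class Note:
    body = LoggingDescriptor("body")


note = Note()
note.body = "hi"   # goes through LoggingDescriptor.__set__
value = note.body  # goes through LoggingDescriptor.__get__
print(calls)       # [('set', 'body'), ('get', 'body')]
```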
Here’s that descriptor:
class MarkdownDescriptor:
    def __init__(self, field):
        self.field = field

    def __set__(self, instance, value):
        # Store the raw Markdown directly, bypassing the descriptor.
        instance.__dict__[self.field.name] = value

    def __get__(self, instance, cls=None):
        if instance is None:
            # Accessed on the class (Document.text), not an instance.
            return self
        return MarkdownString(instance.__dict__[self.field.name])
__init__ receives the field object (from contribute_to_class) and hangs on to it for later (we need to know field.name).
__set__ saves the underlying data (the raw Markdown) into the instance’s __dict__. We have to use __dict__ because if we tried to set instance.text any other way, it’d just call the descriptor again!
Then, when __get__ is called, it returns a MarkdownString instance that wraps the field’s actual value, again using __dict__. Descriptors also get called when fields are accessed on the class (Document.text), and if that happens instance will be None. So the if instance is None check makes sure we only wrap the value when we’re being called as some_document_instance.text; on class access we return the descriptor itself.
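You can poke at this behaviour without a database or even a Django model. The sketch below is an approximation under a few assumptions: FakeDocument is a plain class, SimpleNamespace(name="text") stands in for the real field object, and MarkdownString is an empty placeholder for the value object.

```python
from types import SimpleNamespace


class MarkdownString(str):
    """Placeholder value object for this demo (no rendering here)."""


class MarkdownDescriptor:
    def __init__(self, field):
        self.field = field

    def __set__(self, instance, value):
        instance.__dict__[self.field.name] = value

    def __get__(self, instance, cls=None):
        if instance is None:
            return self
        return MarkdownString(instance.__dict__[self.field.name])


class FakeDocument:
    # Assumption: a bare object with a .name attribute is enough to fake the field.
    text = MarkdownDescriptor(SimpleNamespace(name="text"))


doc = FakeDocument()
doc.text = "hello, *world*"

raw = doc.__dict__["text"]   # what __set__ actually stored
first = doc.text             # what __get__ hands back
second = doc.text

print(type(raw).__name__)                  # str
print(type(first).__name__)                # MarkdownString
print(first is second)                     # False: a new wrapper on every access
print(type(FakeDocument.text).__name__)    # MarkdownDescriptor
```

One thing this makes visible: the wrapper is rebuilt on each access, so anything you stash on a MarkdownString instance won’t survive to the next doc.text.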
Part 3: Final Object
So now, when some_document.text is accessed, we can return whatever object we want instead of the underlying string. This part’s the simplest:
from markdown_it import MarkdownIt

class MarkdownString(str):
    @property
    def html(self):
        return MarkdownIt().render(self)
Just a str subclass with that html property, here using markdown-it-py to render the Markdown.
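A quick sketch of how a str subclass like this behaves in practice. To keep it dependency-free I’ve swapped the Markdown rendering for html.escape from the standard library, so EscapedString is a hypothetical stand-in, not the real MarkdownString.

```python
from html import escape


class EscapedString(str):
    """Hypothetical stand-in: escapes HTML instead of rendering Markdown."""

    @property
    def html(self):
        return "<p>" + escape(self) + "</p>"


s = EscapedString("a < b")

print(s == "a < b")             # True: compares like the plain string
print(isinstance(s, str))       # True: usable anywhere a str is expected
print(s.html)                   # <p>a &lt; b</p>
print(type(s.upper()).__name__) # str: string methods return plain str
```

That last line is worth keeping in mind: the extra attribute only exists on the object the descriptor hands back, not on strings derived from it.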
Isn’t there a way to do this without metaclasses and descriptors?
Yes, there is: you could accomplish the same thing using the field methods that convert between Python and database-level representations (I think you’d need from_db_value, to_python, get_prep_value, and maybe get_db_prep_value and deconstruct).
For something like this Markdown field, either method works fine. I like this one slightly more, which is why I tend to use it when I want sub-attributes on fields like this. I think once you get into wanting things to happen on assignment (in __set__), like FileField and friends, this descriptor method starts to be a lot clearer.