Big Bunny Software

SQL Alchemy and MySQL JSON

At my last job we stored certain data as serialized JSON in MySQL TEXT columns. While it was annoying to json.loads and json.dumps on every read and write, it was a convenient way of storing unstructured data, or data with a frequently changing model. Using JSON (sparingly) in a relational database has its place. As it turns out, MySQL 5.7+ has a JSON datatype which makes it easier to work with JSON natively.

SQLAlchemy makes this available with its own JSON datatype. Here's how to use it:

from sqlalchemy import Column
from sqlalchemy.types import JSON
...

class Pick(db.Model):
    ...
    # Define a JSON column
    args = Column(JSON)
    ...

# Read a value from a loaded instance (plain dict access)
model.args['task']
# Cast a value to a type in a query expression
Pick.args['n'].as_integer()

# Filter by a value
db.session.query(Pick).filter(Pick.args['task'] == 'Test Task')
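Put together, a minimal end-to-end sketch might look like the following. This uses an in-memory SQLite database (which SQLAlchemy also supports for JSON, as noted below) and a standalone Pick model rather than my application's actual models; note the `.as_string()` cast on the indexed value, which makes the comparison portable across backends.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import JSON

Base = declarative_base()

class Pick(Base):
    __tablename__ = "picks"
    id = Column(Integer, primary_key=True)
    args = Column(JSON)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Pick(args={"task": "Test Task", "n": 3}))
    session.commit()

    # Read a value from a loaded instance - just dict access
    pick = session.query(Pick).first()
    print(pick.args["task"])  # Test Task

    # Filter on a JSON value, casting the extracted element to a string
    match = (
        session.query(Pick)
        .filter(Pick.args["task"].as_string() == "Test Task")
        .first()
    )
    print(match is not None)  # True
```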

JSON here is actually a facade for the database-specific implementation. At the time of writing, SQLAlchemy supports JSON for PostgreSQL, MySQL 5.7+, and SQLite 3.9+. You can also define columns with a vendor-specific type yourself if you want to.
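For example, SQLAlchemy's `with_variant` lets you keep the generic JSON type as the default while opting into a vendor-specific type on one backend. A sketch (the Pick model here is illustrative, not my application's):

```python
from sqlalchemy import Column, Integer
from sqlalchemy.dialects import mysql, postgresql
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateTable
from sqlalchemy.types import JSON

Base = declarative_base()

class Pick(Base):
    __tablename__ = "picks"
    id = Column(Integer, primary_key=True)
    # Generic JSON everywhere, but PostgreSQL's JSONB on that backend
    args = Column(JSON().with_variant(JSONB(), "postgresql"))

# Compare the DDL each dialect would emit
print(CreateTable(Pick.__table__).compile(dialect=postgresql.dialect()))
print(CreateTable(Pick.__table__).compile(dialect=mysql.dialect()))
```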

I was curious what query SQLA is building underneath (for MySQL) so I examined it in an IPython session:

In [29]: from core.db import db

In [30]: from core.models import Pick

In [31]: q = db.session.query(Pick.id, Pick.args).filter(Pick.args['task'])

In [32]: from sqlalchemy.dialects import mysql

In [33]: print(q.statement.compile(compile_kwargs={"literal_binds": True}, dialect=mysql.dialect()))
SELECT picks.id, picks.args
FROM picks
WHERE JSON_EXTRACT(picks.args, '$."task"')

SQLA uses MySQL's JSON_EXTRACT function to get values, so there's no string or regular expression matching required, as there would be for filtering on JSON stored in a TEXT column!

There is a caveat, though: SQLA doesn't persist changes to the values in a JSON column as you might expect.

model = db.session.query(Pick).first()
assert model.args['task'] == 'Old Value'  # passes
model.args['task'] = 'New Value'
db.session.commit()
model = db.session.query(Pick).first()
assert model.args['task'] == 'New Value'  # fails - the in-place change was never written

SQLA's documentation comments on this.

The JSON type, when used with the SQLAlchemy ORM, does not detect in-place mutations to the structure. In order to detect these, the sqlalchemy.ext.mutable extension must be used. This extension will allow "in-place" changes to the datastructure to produce events which will be detected by the unit of work. See the example at HSTORE for a simple example involving a dictionary.

So you need to use the sqlalchemy.ext.mutable extension for in-place changes to mutable types (like dict) to be detected. In my application I redefined my column to wrap the JSON datatype with MutableDict.

from sqlalchemy.ext.mutable import MutableDict
...
args = Column(MutableDict.as_mutable(JSON))
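A quick sanity check shows the difference. This sketch uses an in-memory SQLite database and a standalone Pick model (not my application's), but the mechanism is the same: with MutableDict wrapping the column, the in-place mutation is detected and flushed.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.mutable import MutableDict
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import JSON

Base = declarative_base()

class Pick(Base):
    __tablename__ = "picks"
    id = Column(Integer, primary_key=True)
    args = Column(MutableDict.as_mutable(JSON))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Pick(args={"task": "Old Value"}))
    session.commit()

    model = session.query(Pick).first()
    model.args["task"] = "New Value"  # in-place mutation, now tracked
    session.commit()

# Re-read in a fresh session to prove the change was persisted
with Session(engine) as session:
    reloaded = session.query(Pick).first()
    print(reloaded.args["task"])  # New Value
```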

This works for scalar values, but changes to lists and nested dicts still won't be detected. SQLA's documentation comments on this too.

Note that MutableDict does not apply mutable tracking to the values themselves inside the dictionary. Therefore it is not a sufficient solution for the use case of tracking deep changes to a recursive dictionary structure, such as a JSON structure.

Basically that means you need to write your own mutation tracking implementation that recursively tracks mutations in values, too. Fortunately somebody has done that already with sqlalchemy-json. You could also get SQLA to detect changes on nested dicts here by copying the nested dict and modifying the copy, then reassigning it to the key.
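The copy-and-reassign trick can be sketched like this (again with an in-memory SQLite database and an illustrative Pick model): mutating the nested dict directly would go unnoticed, but reassigning the top-level key is a change MutableDict does see.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.mutable import MutableDict
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import JSON

Base = declarative_base()

class Pick(Base):
    __tablename__ = "picks"
    id = Column(Integer, primary_key=True)
    args = Column(MutableDict.as_mutable(JSON))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Pick(args={"meta": {"owner": "alice"}}))
    session.commit()

    model = session.query(Pick).first()
    # Copy the nested dict, modify the copy, then reassign the key -
    # the top-level __setitem__ is what MutableDict tracks
    meta = dict(model.args["meta"])
    meta["owner"] = "bob"
    model.args["meta"] = meta
    session.commit()

with Session(engine) as session:
    reloaded = session.query(Pick).first()
    print(reloaded.args["meta"]["owner"])  # bob
```

Another escape hatch is `sqlalchemy.orm.attributes.flag_modified(model, 'args')`, which marks the attribute dirty manually after a deep mutation.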

How I'm using JSON datatypes in my application

I've opted to use JSON columns for some data in Pickle because...

  • I have a group of related attributes, and I know I'll be adding more. I don't want to modify my table each time I do.
  • It makes certain tables more readable because grouped attributes (in my case, Pick args) are grouped in one place.
  • I don't have to set up a new relationship and association proxy for something like storing a list of included users.

There are also key disadvantages to keep in mind, though.

  • It makes it more difficult to strictly define a data model since I can't force a certain JSON structure out of the box. I may be able to enforce structure if necessary with marshmallow though.
  • Because of the extra logic required to extract values from JSON columns, JOINs on JSON attributes (like a list of users) will likely be less efficient than a traditional relationship.
Development #database #sqlalchemy