Rushing Labs

Learning Python CRUD - Data Relations

Intro§

In thinking about my strategy for pulling data from database to client, I came across these two articles. They highlighted a couple key issues for me.

Problems:

  • Joins in SQLAlchemy -> I needed to query data across tables regularly
  • N+1 Problem -> Querying for some common pages could produce dozens of DB calls, or complex client logic
  • Dynamic Data Transformations -> Joined row-based data from SQL requires transformation into nested JSON-like models

Apparent Query Differences§

Apparently, we can replace a whole model declaration with field references, and it produces more direct SQL in the ORM-produced query.

res = db.query(Budget).join(Category).filter(
    Budget.id == Category.budget_id) #.all()
    return res

This only puts all of the Budget fields in the select clause!

res = db.query(Budget.name, Category.name).join(Category).filter(
        Budget.id == Category.budget_id) #.all()
print(res)
return res

SELECT budget.name AS budget_name, category.name AS category_name   
FROM budget JOIN category ON budget.id = category.budget_id
WHERE budget.id = category.budget_id

SQLAlchemy Relationships Declarative vs. Explicit Schemas§

These two articles explain setting up SQLAlchemy table relationships

SQLAlchemy docs for declarative, explicit schemas

Ref: https://docs.sqlalchemy.org/en/14/orm/declarative_tables.html

A Problem -- How to Handle Joins, Mapping Related Data to Nested Models

This is going from the relational SQL data to the nested, document-structure of JSON. For, the resulting data models being sent to the client.

[insert (relational data pic) -> (nested data pic)]

Essentially, this boils down to "How do we move data from SQLAlchemy models to Pydantic models?" And it appears Pydantic's ORM Mode solves the issue, however I couldn't get this working with nested Pydantic models. Also, this transformation needs to work with potentially sparse models. We don't want to be forced to always return full data models to the client, when much less data is needed for the desired page/component.

Ref: https://pydantic-docs.helpmanual.io/usage/models/#orm-mode-aka-arbitrary-class-instances

Many others have had this problem

I also tossed around the idea of moving to a completely different modeling library: SQLModel

It looks like a clever layer on top of Pydantic and SQLAlchemy, but I was already learning enough with tutorials & docs made specific to Pydantic and SQLAlchemy. So, I'm steering clear of potentially getting stuck juggling libraries.

Ref: https://sqlmodel.tiangolo.com/features/

Problem Queries - Joins§

For any query only touching a single table, the SQLAlchemy + Pydantic + FastAPI combination is pretty slick. Needing to nearly duplicate models for Pydantic and SQLAlchemy is questionable, but the rest works pretty well.

However, joins are different. At least, they were in my experience. First, let's look at an example query.

Example Query

 
db.query(
    Budget, Category
    ).join(Category).filter(Budget.id == Category.budget_id).first()

The first problem I encountered was not being sure how to write a join via the ORM. If we have .join() it seems there would be an .on(), but no...we use .filter() instead. But remember how we mentioned we could specify fields in our query instead of the whole model, and some of those ORM and Pydantic models were troublesome? Through a series of bugs I found I needed the query to be re-written something like the following:

db.query(
    Budget.id,
    Budget.name,
    Category.id,
    Category.budget_id
    ).join(Category).filter(Budget.id == Category.budget_id).first()

Now, this is wonderful. I'm able to specify the columns I need in SQL to match the fields I need to fulfill in my models. Also (not discussed here) but in the client view this data will eventually be displayed in, we need to show a realtionship between the Budget and Category model data. So, that Category.budget_id is important.

But the column names aren't returned directly in our resulting dataset. Presumably, only the values are.

Adding the label() to columns§

Ref: https://stackoverflow.com/questions/9187530/using-alias-for-select-as-in-sqlalchemy

Accessing Column names, and Handling RMKeyView - single row§

Ref: Google Search - "python rmkeyview"

Found column names are accessible in `res.keys()`. This is RMKeyView

When querying a single row from SQLAlchemy (notice the .first()), we receive, well...a single row. However, that row does not come with columns defined. This is something I struggled with, but eventually found a messy solution via several StackOverflow pages.

Note: There is likely a cleaner way to do this. I couldn't find it...So, gave up and resolved to ripping data out of the results manually. Please tell me there is a better way to do this.

Example Query

res = db.query(
    Budget.id.label('budget_id'),
    Budget.date_created.label('budget_date_created'),
    Budget.name.label('budget_name'),
    Category.id.label('category_id'),
    Category.date_created.label('category_date_created'),
    Category.name.label('category_name'),
    Category.budget_id.label('category_budget_id')
).join(Category).filter(Budget.id == Category.budget_id).first()

This query returns a sqlalchemy.engine.row.Row type, and it's indexible. So, accessing our data is pretty straightforward with an index, res[0]


res[0] -> UUID('some-data-here')

But to access column names, we have to use res.keys(). Here's the problem. res.keys() returns an RMKeyView type, containing our column names, and it isn't indexable, but we can iterate over it.


for key in res.keys():
    print(key)

Also, we're using .label('column name') on the column selections in our query, to provide a custom name for each column. Without this, the key is the same as the column name in the database. Nomrally this is fine, however it would be difficult to keep track of the differing Budget.id and Category.id columns, so providing column aliases is necessary.

So, an ugly bit of code, but this is how I was able to manually align the query result data with the column values, and then transform to our models.


res = db.query(
    # query removed for brevity
).first()
keys = [] # setup a list to hold column names
    for key in res.keys():
        keys.append(key) # append each key (i.e. column name) to our new list
mydata = {}                 # setup a dictionary for our column:data, key-value store
i = 0                       # an index counter
# This loop assumes there is an equal number of keys, and number of data values returned
while i < len(keys):            # iterate over our keys list (the new one)
    mydata[keys[i]] = res[i]    # each index of keys[] will be the key of mydata, and the value is pulled from res[i]
    i += 1                      # increment counter

Accessing Column names, and Handling RMKeyView - multiple rows§

Ref: https://www.tutorialspoint.com/sqlalchemy/sqlalchemy_orm_working_with_joins.htm But I actually needed to join several tables together, not just Budget and Category, and using this same join() + filter() broke when repeating it for several tables, but I didn't really understand why.

Instead, we can use an explicit condition for the join. Notice the .filter() use goes away.

res = db.query(
        Budget.id.label('budget_id'),
        # -- more field definitions go here --
        Envelope.id.label('envelope_id'),
        Transaction.id.label('transaction_id')
    ).join(Category, Budget.id == Category.budget_id
    ).join(Envelope, Category.id == Envelope.category_id
    ).join(Transaction, Envelope.id == Transaction.envelope_id).all()
for row in res:
    print(row) # See the RMKeyView, row here
    print(row.keys()) # See the keys (column names) for the row here

However, an .all() query is a series of rows. So, we need to alter our data transformation to handle multiple rows.

Example Code§

result = db.query(
    # -- the same field definitions; removed for brevity
).join(Category).filter(Budget.id == Category.budget_id).all()
# NOTE: this time result is a set of rows
keys = [] # setup a list to hold column names
for key in result[0].keys(): # Get keys from the first row in our result set
    keys.append(key)
mydata = [] # This time we have a list to hold multiple objects
for row in result:
    i = 0
    myobj = {}
    while i < len(keys): # this loop translates each row
        myobj[keys[i]] = row[i]
        i += 1
    mydata.append(myobj) # put our translated object in our new collection
# Then iterate over our new collection to create our objects
for data in mydata:
    b = Budget(
        id = data['budget_id']
        date_created = data['date_created']
        # so on...
        category = Category(
            id = data['category_id']
            date_created = data['category_date_created']
        )
    )
RushingLabs - Analytics