Update content & design (#75)
All checks were successful
Deploy to Pages / build (push) Successful in 2m58s

Reviewed-on: https://home.schmelczer.dev/git/git/andras/schmelczer-dev/pulls/75
Andras Schmelczer 2026-05-28 16:20:12 +01:00
parent 0be50b6c24
commit b554e92e9f
83 changed files with 2995 additions and 723 deletions

@@ -1,6 +1,6 @@
 ---
-title: Designing an ML Deployment API Around Best Practices
-description: How GreatAI tried to make stronger ML deployment habits accessible through a small Python API.
+title: A Python Framework Where Doing the Right Thing Is the Default
+description: My MSc thesis. 33 catalogued ML deployment habits, a decorator-shaped Python API, and a survey of working engineers on which actually got adopted.
 date: 2026-05-09
 projectPeriod: '2022'
 thumbnail:
@@ -9,9 +9,9 @@ thumbnail:
 tags: ['ai', 'systems', 'tools']
 featuredOrder: 1
 role: Researcher and framework author
-stack: ['Python', 'ML deployment', 'API design']
-scale: 33 deployment best practices, six proposed additions, evaluated with professional data scientists and software engineers
-outcome: A Python framework, thesis, and research-backed API design for production-oriented AI deployments
+stack: ['Python', 'decorators', 'FastAPI', 'survey design']
+scale: 33 deployment habits surveyed, 6 proposed additions, framework evaluated by working data scientists and engineers
+outcome: A pip-installable framework, an MSc thesis, and one strong opinion about API surface area
 audience: recruiter-relevant
 links:
 - label: PyPI
@@ -25,47 +25,29 @@ media:
 - type: image
 src: ./_assets/great-ai.png
 alt: Example Python code using GreatAI decorators and prediction helpers.
-caption: GreatAI's public surface was designed to keep deployment best practices close to the application code.
+caption: A working GreatAI service is about ten lines on top of a plain prediction function.
 ---
-GreatAI started from a practical frustration: applying machine learning was becoming easier, but deploying it well was still easy to get wrong. Many failures were not about model architecture. They were about missing metadata, weak versioning, poor reproducibility, untracked inputs, or interfaces that made the right behaviour too cumbersome to use.
+By the end of 2021 I had stopped believing the people skipping ML deployment best practices were the problem. They knew the list. They agreed with the list. They had a deadline, and every item on the list cost five lines of glue. My MSc thesis turned that into the actual research question: not "what should engineers do" but "what API shape makes doing the right thing cheaper than not." The framework that fell out, `great-ai`, is a decorator on a plain Python function. The thesis behind it is the part worth reading.
-My thesis work looked at that gap from two sides. First, I collected and organised AI/ML deployment best practices, including 33 practices and six additions proposed through the research. Then I designed a Python framework that tried to make those practices feel like the natural path rather than an enterprise checklist.
+## The thing nobody wants to admit
-The result was GreatAI: a deployment-oriented framework with a deliberately small API. The design goal was not to wrap every part of an ML stack. It was to make common deployment concerns visible, automatic where possible, and hard to forget.
+The literature has a long list of habits you should adopt when shipping an ML service: track inputs, version models, expose health, log decisions, keep predictions reproducible. Everyone agrees with the list. Almost nobody implements all of it.
-## The Problem
+I spent the bulk of the thesis cataloguing 33 such habits, proposing 6 more, and surveying engineers on which actually got applied in their day jobs. The data was pretty clear about the failure mode: it wasn't ignorance, it wasn't laziness, it wasn't budget. It was that the cost of doing the right thing, five lines of glue per habit multiplied across a stack, was higher than the visible cost of skipping it. So skipping it became the default.
-Deployment quality is often treated as something that happens after model development. That separation creates a bad default. A model can be useful in a notebook, but a deployed AI service also needs traceability, stable interfaces, input/output logging, model metadata, and operational behaviour that can be inspected later.
+So the real research question wasn't "what should engineers do." It was "what API shape makes doing the right thing cheaper than not."
-The hard part is not listing those needs. The hard part is getting busy engineers and data scientists to adopt them without making their work feel slower.
+## The framework's bet
-So the core question became: can a framework implement meaningful deployment practices while keeping the API small enough that people would actually use it?
+- **A decorator on a plain function.** `@GreatAI.create` turns a regular Python function into a deployed service with metadata, request tracing, and a versioned interface. No inheritance, no project layout, no enforced directory structure. The mental cost is one import.
+- **Implicit behaviour only for cross-cutting concerns.** Logging, versioning, metadata are implicit. Anything touching business logic stays explicit. The rule: if it would surprise me when I'm debugging, it shouldn't be implicit.
+- **Own the contract, leave the storage alone.** Where you persist logs, models, or metrics is your choice; GreatAI defines the shape and provides defaults. The model registry stays somebody else's library.
-## Constraints
+The survey backed up the central premise: ease of use and functionality both matter for adoption, and they're independent axes. A framework that ticks every box and is awkward will lose to a smaller one that doesn't.
-GreatAI had to satisfy two constraints that usually pull in opposite directions.
+## What I'd change
-It needed to encode deployment practices such as metadata handling, model loading, request tracing, and reproducible prediction interfaces. It also had to be approachable enough that the basic use case still looked like ordinary Python.
-That shaped the API. The framework could not demand a new mental model for every project. The deployment behaviour had to sit close to the prediction function, because that is where the developer already has context.
-## Design
-The design leaned on decorators and lightweight conventions. The application author should be able to declare the prediction boundary, attach the relevant model and metadata behaviour, and let the framework handle repeated operational concerns.
-That is a careful tradeoff. Too much implicit behaviour makes systems difficult to debug. Too much explicit setup makes best practices optional in practice, because the path of least resistance is to skip them. GreatAI tried to keep the implicit parts focused on cross-cutting deployment concerns rather than business logic.
-Feedback from professional data scientists and software engineers supported the main premise: ease of use and functionality both matter when people decide whether to adopt deployment tooling. A framework that is technically complete but awkward to use will still fail.
-## What Worked
-The strongest part of the project was treating API design as part of deployment quality. Best practices are not only documentation. They need interface support, defaults, and feedback loops.
-The research also forced the framework to be specific. "Production-ready" is too broad to be useful. A concrete list of deployment practices made it possible to ask which practices can be automated, which ones need explicit developer decisions, and which ones belong outside the framework.
-## What I Would Change
-If I returned to the project now, I would focus more on integration boundaries: how GreatAI should fit into modern observability, model registry, and evaluation workflows without trying to own them. Deployment frameworks age quickly when they become too broad.
-The part I would keep is the central idea: make the right deployment behaviour easy enough that it becomes the default.
+- I'd narrow further. Anything GreatAI did that overlapped with MLflow, BentoML, or modern observability stacks would go. The durable bit was always the decorator and the catalogue behind it.
+- I'd publish the survey instrument separately. The 33-habit catalogue and the adoption-vs-impact methodology outlive the framework. People still ask about that part.
+- I'd stop calling them "best practices." I used that phrase in the thesis and it aged into corporate-speak. The honest name is "things that hurt later if you skip them."
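The post's "framework's bet" section makes three claims about the API. First, `@GreatAI.create` is a decorator on a plain function. Second, cross-cutting concerns such as tracing and versioning are implicit. Third, the framework defines the shape of what gets recorded but leaves storage to the caller.

Below is a minimal sketch of that pattern under stated assumptions. It is not GreatAI's implementation or API. `deployed`, `Trace`, `TraceSink`, and `InMemorySink` are hypothetical names chosen for illustration, and the toy `classify` function stands in for a real model.

```python
import functools
import inspect
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Protocol


@dataclass
class Trace:
    """Hypothetical shape of one recorded prediction (the 'contract')."""
    trace_id: str
    function: str
    version: str
    inputs: Dict[str, Any]
    output: Any
    duration_ms: float


class TraceSink(Protocol):
    """Anything with a save() method can store traces; storage is the caller's choice."""
    def save(self, trace: Trace) -> None: ...


@dataclass
class InMemorySink:
    """Default sink: keeps traces in a list, standing in for a real store."""
    traces: List[Trace] = field(default_factory=list)

    def save(self, trace: Trace) -> None:
        self.traces.append(trace)


def deployed(
    version: str, sink: Optional[TraceSink] = None
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: wraps a plain function and records a Trace per successful call."""
    target = sink if sink is not None else InMemorySink()

    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Bind arguments to parameter names (defaults included) so each trace is self-describing.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            start = time.perf_counter()
            result = fn(*args, **kwargs)  # business logic runs unchanged and explicit
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Cross-cutting concern handled implicitly; calls that raise are not recorded here.
            target.save(Trace(
                trace_id=str(uuid.uuid4()),
                function=fn.__name__,
                version=version,
                inputs=dict(bound.arguments),
                output=result,
                duration_ms=elapsed_ms,
            ))
            return result

        # Expose the sink so callers can inspect recorded traces.
        wrapper.sink = target  # type: ignore[attr-defined]
        return wrapper

    return decorate


@deployed(version="0.1.0")
def classify(text: str, threshold: float = 0.5) -> str:
    """Toy stand-in for a model: longer texts score higher."""
    score = min(len(text) / 100, 1.0)
    return "long" if score >= threshold else "short"


print(classify("hello"))               # → short
print(classify.sink.traces[0].inputs)  # → {'text': 'hello', 'threshold': 0.5}
```

The sketch maps onto the three claims as follows:

- The body of `classify` is ordinary Python with no framework imports.
- Tracing is the only behaviour the decorator adds.
- Persistence is swappable: passing any object with a `save(trace)` method as `sink` replaces the in-memory default without touching the decorated function.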