4 Comments

First of all - thanks for the substack!

I think it's not just OK to start at MLOps level 0, it's mandatory to start there! Even in a well-established company that has the resources to invest in a proper ML platform, I would start from this - the faster you can deliver value, the better it is for the ML function as a whole.

I am not entirely sure why you think this system is particularly likely to suffer from Training/Serving skew. The connection between online and offline data is the data lake - online data becomes offline data. I agree that at this level the skew is not monitored, but the data scientist will have an idea about it from their EDA - if there are big changes in the data distribution in the training data, there will be changes when serving the model. What am I missing here? :)

I would also like to know what you think about the deployment options - you write that software engineers pick up the artefact to be exposed in the Model Service. Several things interest me here:

1. What do you have in mind with "model artefact"? Is this just a trained ML algorithm (e.g. XGBoost) that has its own serialisation / deserialisation format and can be loaded in different languages? Or is it the entire Model class, with other business logic (which can be substantial), serialised?

2. If it's just the trained algorithm, would you build the other logic into the service / app itself? Should engineers do that?

3. How do we think about the environment (especially in Python, as others might have "full jar" equivalents) in this case? Or, in your opinion, is this too complicated for level 0?

This topic - experiment to production - fascinates me, so it would be interesting to hear your thoughts. What have you seen work and not work in the past?

For example, all the "managed endpoints" offered by the cloud providers to expose models are great when you are just starting, because they take care of most things, but they are also very restrictive in other ways, and in my experience companies outgrow them quite fast and have to invest in setting up something more complicated to keep productivity up.

author

Hi Vaidas,

First of all - thank you for reading and commenting so thoughtfully; hopefully this becomes more common in the future of the Newsletter :)

So, my thoughts:

Training/Serving Skew here means that the transformation logic for the Features used to train the model might diverge from the logic running in the real-time prediction service (endpoint, if you may) - it might even be coded up in a different programming language (e.g. PHP). If you are using the Data Lake to load the Features into a low-latency storage like Redis - that is already a Feature Store, which is specifically not present at Level 0.
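To make the skew concrete, here is a rough Python sketch (the column names and transformations are made up for illustration, they are not from the article): one shared function owns the feature logic and is imported by both the training pipeline and the prediction service, which is about the only guard you have at Level 0 without a Feature Store.

```python
import numpy as np
import pandas as pd

def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Single source of truth for the feature logic; column names are made up."""
    out = pd.DataFrame(index=raw.index)
    out["amount_log"] = np.log1p(raw["amount"].clip(lower=0))
    out["is_weekend"] = pd.to_datetime(raw["event_ts"]).dt.dayofweek >= 5
    return out

# Offline: training data read from the Data Lake
#   X_train = build_features(offline_df)
# Online: the prediction service builds features from the request payload
#   X_live = build_features(pd.DataFrame([request_payload]))
#
# The skew appears the moment the online path re-implements this logic
# separately, e.g. in PHP inside the product backend.
```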

1. The Model Artifact in this case will be a serialized model that can be loaded with different programming languages (rough sketch below, after point 3).

2. Additional logic would be part of a Service App.

3. If you are serving your model Endpoint with Python - I would have it prepackaged into a Docker Container, but that is not for Level 0.
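To illustrate point 1, a minimal sketch of what such an artifact could look like, using XGBoost's native JSON format as one example (the file name and the data are placeholders, not a prescription). The same file can be loaded by the XGBoost bindings in other languages, which is what makes it a pure model artifact rather than pickled Python.

```python
import numpy as np
import xgboost as xgb

# Toy training data, just to produce an artifact
X = np.random.rand(200, 4)
y = (X[:, 0] > 0.5).astype(int)

model = xgb.XGBClassifier(n_estimators=10)
model.fit(X, y)
model.save_model("model.json")  # the artifact handed over to engineers

# Serving side (any XGBoost binding - Python, Java, C++, ...):
booster = xgb.Booster()
booster.load_model("model.json")
preds = booster.predict(xgb.DMatrix(X[:5]))
```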

What I saw working is Dockerising the Models and deploying them as REST or gRPC Services via K8s. I would still not add any additional logic to these containers - the only thing they do is expose a prediction service for an entity. Then you would have additional services that combine the results from a combination of Model endpoints and apply additional logic to them - that is how you decouple business logic from pure maths.
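As a rough illustration of such a "thin" prediction container (FastAPI, the endpoint name and the payload shape are just my example here, not a prescription):

```python
import numpy as np
import xgboost as xgb
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# The artifact is baked into the Docker image at build time
booster = xgb.Booster()
booster.load_model("model.json")

class Features(BaseModel):
    values: list[float]  # an already-transformed feature vector

@app.post("/predict")
def predict(features: Features) -> dict:
    # No business rules here - the container only scores
    score = float(booster.predict(xgb.DMatrix(np.asarray([features.values])))[0])
    return {"score": score}

# A separate service would call one or more such endpoints, combine the
# scores and apply the business logic (thresholds, overrides, routing, ...).
```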

What are your thoughts? Let me know if I answered your questions.


Thanks for your thoughts!

I usually take a different route - I couple the business logic (pre- and post-processing) with the algorithm. There are a couple of reasons for this:

1. I want to minimise the possibility of a training / serving feature mismatch (what you call skew, although that term is usually used for a different issue) - having a model class that expects data in the most "raw" format possible and then does the prep (in training and when serving) is my way of minimising this risk (rough sketch after point 2);

2. Especially for MVPs, feature engineering is very much dependent on the algorithm that is used. If we are still iterating on the model, it's quite a safe assumption that we will be iterating on the features as well. Therefore, by keeping everything in one place, we can ensure clear ownership of the model output.
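To make point 1 concrete, a rough sketch of the kind of model class I have in mind (the feature names, threshold and algorithm are just illustrative): raw data goes in, the class owns the prep, the algorithm and the post-processing, so training and serving cannot drift apart.

```python
import numpy as np
import pandas as pd
import xgboost as xgb

class ChurnModel:
    """Couples pre-processing, algorithm and post-processing in one place."""

    def __init__(self) -> None:
        self.algo = xgb.XGBClassifier(n_estimators=50)

    def _prepare(self, raw: pd.DataFrame) -> pd.DataFrame:
        # Pre-processing lives next to the algorithm
        feats = pd.DataFrame(index=raw.index)
        feats["amount_log"] = np.log1p(raw["amount"].clip(lower=0))
        feats["tenure_months"] = raw["tenure_days"] / 30.0
        return feats

    def fit(self, raw: pd.DataFrame, target: pd.Series) -> "ChurnModel":
        self.algo.fit(self._prepare(raw), target)
        return self

    def predict(self, raw: pd.DataFrame) -> np.ndarray:
        scores = self.algo.predict_proba(self._prepare(raw))[:, 1]
        return (scores > 0.5).astype(int)  # post-processing: thresholding
```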

For an established model, when there is little iteration and mostly re-training on newer data, I see the benefit of having the algorithm deployed separately. Then again, I do not see a downside to having business logic coupled with the algorithm. But maybe that's because when I think about a model, I think of it as "data + algorithm + pre/post-processing" - all three parts form the model. And then a decision needs to be made to couple them together either in code or through services :)


Good callout on skew - and yeah, I actually think the term “mismatch” is clearer.

“Explicit is better than implicit” right?
