First of all, thanks for the Substack!

I think it's not just OK to start at MLOps level 0, it's mandatory to start there! Even in a well-established company that has the resources to invest in a proper ML platform, I would start from this - the faster you can deliver value, the better it is for the ML function as a whole.

I am not entirely sure why you think this system is particularly likely to suffer from training/serving skew. The connection between online and offline data is the data lake: online data becomes offline data. I agree that at this level the skew is not monitored, but the data scientist will get an idea of it from his EDA - if there are big changes in the data distribution within his training data, there will be changes when serving the model. What am I missing here? :)
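To make "big changes in the data distribution" a bit more concrete, here is a minimal sketch of the kind of check I have in mind, using the Population Stability Index on a single numeric feature. The function name `psi`, the bin count and the sample data are my own assumptions for illustration, not something from the post.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a training sample and a serving sample.

    Bin edges are equal-width and derived from the training sample only
    (an assumption of this sketch; quantile bins are another common choice).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range are clamped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: the serving sample is the training sample shifted upwards.
train = [i / 100 for i in range(100)]
serving = [x + 0.5 for x in train]

same_score = psi(train, train)
shifted_score = psi(train, serving)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the threshold is a judgment call. At level 0 this could simply be run by hand during EDA, for example comparing older and newer slices of the data lake.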

I would also like to know what you think about the deployment options - you write that software engineers pick up the artefact to be exposed in the Model Service. Several things interest me here:

1. What do you have in mind with "model artefact"? Is this just a trained ML algorithm (e.g. XGBoost) that has its own serialisation / deserialisation format and can be loaded in different languages? Or is it the entire Model class, serialised together with other business logic (which can be substantial)?

2. If it's just the trained algorithm, would you build the other logic in the service / app itself? Should engineers do it?

3. How do we think about the environment in this case (especially in Python, as other languages might have "full jar" equivalents)? Or is this, in your opinion, too complicated for level 0?
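To illustrate the split I am asking about in questions 1 and 2, here is a rough sketch in which the artefact holds only the trained parameters in a language-neutral format, and the business logic lives in the service. Everything here is hypothetical: the tiny logistic model stands in for something like XGBoost, and `export_artifact`, `ModelService`, the feature names and the threshold are made up for the example. Recording the Python version in the artefact is only a nod to question 3, not a real answer to it.

```python
import json
import math
import platform


def export_artifact(weights: dict, bias: float) -> str:
    """Data scientist side: dump only the trained parameters, plus a note
    about the environment they were produced in."""
    return json.dumps(
        {
            "weights": weights,
            "bias": bias,
            "python_version": platform.python_version(),
        }
    )


class ModelService:
    """Engineer side: loads the artefact and owns the business logic."""

    def __init__(self, artifact_json: str, threshold: float = 0.5) -> None:
        artifact = json.loads(artifact_json)
        self.weights = artifact["weights"]
        self.bias = artifact["bias"]
        self.trained_on = artifact["python_version"]
        self.threshold = threshold

    def score(self, features: dict) -> float:
        # Business rule (hypothetical): a missing feature counts as 0.0.
        z = self.bias + sum(
            w * features.get(name, 0.0) for name, w in self.weights.items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def decide(self, features: dict) -> str:
        # Business rule (hypothetical): low scores go to manual review.
        return "approve" if self.score(features) >= self.threshold else "review"


artifact = export_artifact({"income": 2.0, "debt": -3.0}, bias=0.0)
service = ModelService(artifact)
decision_a = service.decide({"income": 1.0})
decision_b = service.decide({"debt": 1.0})
```

The alternative from question 1 would be to serialise the whole class, rules included, which keeps the logic with the data scientist but ties the artefact to one language and one environment.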

This topic - experiment to production - fascinates me, so it would be interesting to hear your thoughts. What have you seen working, and not working, in the past?

For example, all the "managed endpoints" offered by the cloud providers to expose models are great when you are just starting, because they take care of most things for you. But they are also very restrictive in other ways, and in my experience companies outgrow them quite fast and have to invest in setting up something more complicated to keep productivity up.
