Thank you for the post!

The graphs are very clear as always and you have definitely inspired me to use them more in my work!

I have a couple of discussion points regarding the business logic and ML services:

1. I am a proponent of bundling ML code with business logic, as the two are usually coupled anyway: either the rules depend on the model output, or the rules are a very light wrapper, in which case it does not matter that much where they live (maybe). However, I recently thought of a situation where separation might be worth it: when ML deployment is done through some kind of managed service (e.g. MLflow on Databricks cloud). This way, we can keep the business logic services pointing to specific versions of the model. I think this brings some nice things to the table; for example, it is easier (in my view) to run several versions of the ML model simultaneously. What is your take?

2. Another point is about feature extraction. You have the features accessed from the ML service (in the separated case), but my intuition would be to keep that service as close to a "pure function" as possible. That is, all the input should come with the "request". Do you see any issues with this approach?

3. The last point concerns your example of the business logic stitching together results from multiple ML models. I am wondering more about the team structures (team topologies) that enable this parallel work. This whole piece, made up of three parts, needs to produce consistent and good results, so those components are coupled. How do you handle this? Have you seen good examples? In my experience, the same team owned all those services and would usually deploy the whole pipeline even if just one part changed. Basically, the whole pipeline was treated as a single unit, but it was distributed because some parts could use much smaller machines than others.
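
To make points 1 and 2 concrete, here is a minimal sketch of what I have in mind. It assumes a hypothetical `predict` function standing in for the managed model endpoint, and made-up model names, features, and thresholds; none of this comes from the post itself.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical stand-ins for models served by a managed registry.
# Each "model" is a pure function of the features passed in the request.
MODELS: dict[str, Callable[[Mapping[str, float]], float]] = {
    "churn:1": lambda f: 0.1 * f["tenure_months"],
    "churn:2": lambda f: 0.05 * f["tenure_months"] + 0.5 * f["support_tickets"],
}


def predict(model_version: str, features: Mapping[str, float]) -> float:
    """Pure-function style ML service: no feature lookups, no hidden state."""
    return MODELS[model_version](features)


@dataclass(frozen=True)
class BusinessLogic:
    """Business logic service pinned to one specific model version."""

    model_version: str
    threshold: float

    def decide(self, features: Mapping[str, float]) -> str:
        # All model input travels with the request (point 2).
        score = predict(self.model_version, features)
        return "offer_discount" if score >= self.threshold else "no_action"


# Two business logic deployments, each pointing at its own model version (point 1).
request = {"tenure_months": 4.0, "support_tickets": 2.0}
v1 = BusinessLogic(model_version="churn:1", threshold=0.5)
v2 = BusinessLogic(model_version="churn:2", threshold=0.5)
```

With this layout, running several model versions side by side just means running several `BusinessLogic` instances, and the ML service stays trivially testable because its output depends only on its arguments.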

Once again, thank you for writing. There is such an array of interesting topics and discussions!

Hi Vaidas!

Thank you for bringing up the discussion :)

1. Yes, this is one of the biggest advantages of separation in my mind. You can easily manage different combinations of Model Versions (and business logic) in a single proxy layer, which allows for a more convenient A/B testing setup.

2. The issue I see is that, given a large enough feature vector, it will have to traverse the network, possibly several times, increasing the latency. Additionally, in my mind the Feature Input is completely tied to the Model that is deployed, so given 5 model versions deployed simultaneously with different Feature Contracts, managing them outside of the ML Service will become troublesome. Also, in some cases the service calling the ML Service might not even be owned by ML Engineers.

3. In Team Topologies terms, I would see systems like RecSys managed by a single Complicated Subsystem Team, while different components would be developed by different people. While the pipeline is treated as a single unit (meaning you need to chain certain functions to produce the result), the functions can be developed separately. It really depends on the size of the company and what ML System we are talking about. I have even seen the proxy application managed by a separate Software Engineering team that has nothing to do with ML (not saying this is a correct way to do it :) ).
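
As a rough illustration of points 1 and 2, the sketch below shows a proxy layer that owns the A/B split across (model version, business rule) pairs, while the ML Service resolves features itself according to each version's Feature Contract. Everything here is an assumption made for the example: the in-memory `FEATURE_STORE`, the contract and variant tables, the traffic shares, and the placeholder scoring.

```python
import hashlib

# Hypothetical in-memory feature store keyed by entity id.
FEATURE_STORE = {
    "user-1": {"tenure_months": 4.0, "support_tickets": 2.0, "plan": 1.0},
}

# Each model version declares its own Feature Contract.
FEATURE_CONTRACTS = {
    "v1": ["tenure_months"],
    "v2": ["tenure_months", "support_tickets"],
}

# (model version, business rule threshold, traffic share), managed in the proxy.
VARIANTS = [("v1", 0.5, 0.8), ("v2", 0.7, 0.2)]


def ml_service(model_version: str, entity_id: str) -> float:
    """Resolve features by the version's contract, so callers only send an id."""
    row = FEATURE_STORE[entity_id]
    features = [row[name] for name in FEATURE_CONTRACTS[model_version]]
    return sum(features) / 10.0  # placeholder for a real model call


def pick_variant(entity_id: str) -> tuple[str, float]:
    """Deterministic A/B assignment: the same id always gets the same variant."""
    digest = hashlib.sha256(entity_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    cumulative = 0.0
    for version, threshold, share in VARIANTS:
        cumulative += share
        if bucket < cumulative:
            return version, threshold
    return VARIANTS[-1][:2]  # guard against float rounding in the shares


def proxy(entity_id: str) -> dict:
    """Proxy layer: pick a variant, call the ML Service, apply the business rule."""
    version, threshold = pick_variant(entity_id)
    score = ml_service(version, entity_id)
    return {"model_version": version, "score": score, "approved": score >= threshold}
```

The caller only passes an entity id, so the large feature vector never crosses the network between the proxy and the ML Service, and adding a model version with a new contract means touching the contract table rather than every upstream service.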
