SAI #03: Machine Learning Deployment Types, Spark - Architecture and more...

Machine Learning Deployment Types, ML Experiment/Model Tracking, Spark - Architecture, Kafka - Reading Data (Basics)

Oct 29, 2022

👋 This is Aurimas. I write the weekly SAI Newsletter where my goal is to present complicated Data related concepts in a simple and easy to digest way. The goal is to help You UpSkill in Data Engineering, MLOps, Machine Learning and Data Science areas.

In this episode we cover:

Machine Learning Deployment Types.
Experiment/Model Tracking.
Spark - Architecture.
Kafka - Reading Data (Basics).

MLOps Fundamentals or What Every Machine Learning Engineer Should Know

𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗧𝘆𝗽𝗲𝘀.

There are many ways you could deploy a 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗠𝗼𝗱𝗲𝗹 to serve production use cases. Even if you will not be working with them day to day, the following are the four ways you should know and understand as a 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿.

➡️ 𝗕𝗮𝘁𝗰𝗵:

👉 You apply your trained models as a part of an 𝗘𝗧𝗟/𝗘𝗟𝗧 𝗣𝗿𝗼𝗰𝗲𝘀𝘀 on a given schedule.
👉 You load the required Features from a batch storage, apply inference and save inference results to a batch storage.
👉 It is sometimes falsely thought that you can’t use this method for 𝗥𝗲𝗮𝗹 𝗧𝗶𝗺𝗲 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻𝘀.
👉 Inference results 𝗰𝗮𝗻 𝗯𝗲 𝗹𝗼𝗮𝗱𝗲𝗱 𝗶𝗻𝘁𝗼 𝗮 𝗿𝗲𝗮𝗹 𝘁𝗶𝗺𝗲 𝘀𝘁𝗼𝗿𝗮𝗴𝗲 and used for real time applications.
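
To make the Batch flow more concrete, here is a minimal sketch of a single scheduled run. Everything in it is an assumption made for illustration: `ThresholdModel`, the `avg_basket_value` feature and the in-memory CSV text that stands in for batch storage are hypothetical, and a real pipeline would read from and write to your actual storage.

```python
import csv
import io


class ThresholdModel:
    """Hypothetical stand-in for a trained model (anything with predict() works)."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, features: dict) -> int:
        # Toy rule: flag the row when the feature exceeds the threshold.
        return int(float(features["avg_basket_value"]) > self.threshold)


def run_batch_inference(model, features_csv: str) -> str:
    """Load a batch of features, score every row, return the results as CSV text."""
    reader = csv.DictReader(io.StringIO(features_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["user_id", "prediction"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({"user_id": row["user_id"], "prediction": model.predict(row)})
    return out.getvalue()


# In-memory text stands in for the batch storage (assumption for this sketch).
FEATURES = "user_id,avg_basket_value\nu1,12.5\nu2,80.0\n"
RESULTS = run_batch_inference(ThresholdModel(threshold=50.0), FEATURES)
print(RESULTS)
```

The returned results could then be written back to batch storage, or loaded into a low latency key-value store if an application needs to look them up in real time.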

➡️ 𝗘𝗺𝗯𝗲𝗱𝗱𝗲𝗱 𝗶𝗻 𝗮 𝗦𝘁𝗿𝗲𝗮𝗺 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻:

👉 You apply your trained models as a part of a 𝗦𝘁𝗿𝗲𝗮𝗺 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲.
👉 While Data is continuously piped through your 𝗦𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴 𝗗𝗮𝘁𝗮 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲𝘀, an application with a loaded model continuously applies inference on the data and returns it to the system - most likely another Streaming Storage.
👉 This deployment type will most likely involve a real time 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗦𝘁𝗼𝗿𝗲 𝗔𝗣𝗜 to retrieve additional 𝗦𝘁𝗮𝘁𝗶𝗰 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀 for inference purposes.
👉 Predictions can be consumed by multiple applications subscribing to the 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗥𝗲𝘀𝘂𝗹𝘁𝘀 𝗦𝘁𝗿𝗲𝗮𝗺.
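
A rough sketch of the same idea for the streaming case, using a plain Python generator in place of a real stream processor. The `STATIC_FEATURES` dictionary is a hypothetical stand-in for a real time Feature Store API, and `predict` is a toy rule rather than a loaded model.

```python
from typing import Iterable, Iterator

# Hypothetical static features, standing in for a real time Feature Store API.
STATIC_FEATURES = {"u1": {"account_age_days": 400}, "u2": {"account_age_days": 3}}


def get_static_features(user_id: str) -> dict:
    return STATIC_FEATURES.get(user_id, {"account_age_days": 0})


def predict(features: dict) -> float:
    # Toy scoring rule in place of a loaded model.
    risky = features["amount"] > 100 and features["account_age_days"] < 30
    return 0.9 if risky else 0.1


def score_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Consume events one by one and yield an inference result for each of them."""
    for event in events:
        # Enrich the streamed event with static features before inference.
        features = {**event, **get_static_features(event["user_id"])}
        yield {"user_id": event["user_id"], "score": predict(features)}


incoming = [{"user_id": "u1", "amount": 250}, {"user_id": "u2", "amount": 250}]
results = list(score_stream(incoming))
print(results)
```

In a real system the yielded results would be written to an output stream, so any number of downstream applications could subscribe to them.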

➡️ 𝗥𝗲𝗾𝘂𝗲𝘀𝘁 - 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗲:

👉 You expose your model as a backend Service.
👉 It will most likely be a 𝗥𝗘𝗦𝗧 𝗼𝗿 𝗴𝗥𝗣𝗖 𝗦𝗲𝗿𝘃𝗶𝗰𝗲.
👉 The API service retrieves Features needed for inference from a 𝗥𝗲𝗮𝗹 𝗧𝗶𝗺𝗲 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗦𝘁𝗼𝗿𝗲 𝗔𝗣𝗜.
👉 Inference can be requested by any application in real time as long as it is able to form a correct request that conforms to the 𝗔𝗣𝗜 𝗖𝗼𝗻𝘁𝗿𝗮𝗰𝘁.
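
Below is a minimal sketch of what the service does with one request, leaving out the REST or gRPC transport itself. The contract fields, the `FEATURE_STORE` dictionary and the scoring rule are all hypothetical and only illustrate the validate, fetch features, predict sequence.

```python
# Hypothetical API Contract: fields a request must carry, with their types.
REQUEST_CONTRACT = {"user_id": str, "amount": float}

# Stand-in for a Real Time Feature Store API (assumption for this sketch).
FEATURE_STORE = {"u1": {"orders_last_30d": 12}}


def predict(features: dict) -> float:
    # Toy model: more recent orders give a higher score, capped at 1.0.
    return min(1.0, features["orders_last_30d"] / 10)


def handle_request(payload: dict) -> dict:
    """Validate the request against the contract, fetch features, return a prediction."""
    for field, expected_type in REQUEST_CONTRACT.items():
        if not isinstance(payload.get(field), expected_type):
            return {"status": 400, "error": f"invalid or missing field: {field}"}
    features = FEATURE_STORE.get(payload["user_id"], {"orders_last_30d": 0})
    return {"status": 200, "prediction": predict(features)}


print(handle_request({"user_id": "u1", "amount": 20.0}))
print(handle_request({"user_id": "u1"}))
```

A request that does not conform to the contract is rejected before any Features are fetched, which keeps bad input away from the model.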

➡️ 𝗘𝗱𝗴𝗲:

👉 You embed your trained model directly into the application that runs on a user device.
👉 This method provides the lowest latency and improves privacy.
👉 Data most likely has to be generated and live inside of the device.

𝗧𝘂𝗻𝗲 𝗶𝗻 𝗳𝗼𝗿 𝗺𝗼𝗿𝗲 𝗮𝗯𝗼𝘂𝘁 𝗲𝗮𝗰𝗵 𝘁𝘆𝗽𝗲 𝗼𝗳 𝗱𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗶𝗻 𝗳𝘂𝘁𝘂𝗿𝗲 𝗲𝗽𝗶𝘀𝗼𝗱𝗲𝘀!

𝗘𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁/𝗠𝗼𝗱𝗲𝗹 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴.

A good 𝗠𝗼𝗱𝗲𝗹 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴 𝗦𝘆𝘀𝘁𝗲𝗺 should be composed of two integrated parts: an 𝗘𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴 𝗦𝘆𝘀𝘁𝗲𝗺 and a 𝗠𝗼𝗱𝗲𝗹 𝗥𝗲𝗴𝗶𝘀𝘁𝗿𝘆.

Where you track 𝗠𝗟 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 metadata from will depend on the 𝗠𝗟𝗢𝗽𝘀 maturity of your company.

If you are at the beginning of the ML journey you might be:

1️⃣ Training and Serving your Models from an experimentation environment - you run 𝗠𝗟 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲𝘀 inside of your 𝗡𝗼𝘁𝗲𝗯𝗼𝗼𝗸 and do that manually at each retraining.

If you are beyond Notebooks you will be running 𝗠𝗟 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲𝘀 from 𝗖𝗜/𝗖𝗗 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲𝘀 and on 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗼𝗿 𝗧𝗿𝗶𝗴𝗴𝗲𝗿𝘀.

In any case, the 𝗠𝗟 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 will not be too different and a well designed System should track at least:

2️⃣ 𝗗𝗮𝘁𝗮𝘀𝗲𝘁𝘀 used for 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗠𝗼𝗱𝗲𝗹𝘀 in 𝗘𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 𝗼𝗿 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗠𝗟 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲𝘀. Here you should also track your 𝗧𝗿𝗮𝗶𝗻/𝗧𝗲𝘀𝘁 𝗦𝗽𝗹𝗶𝘁𝘀. At this stage you should also save all important metrics that relate to 𝗗𝗮𝘁𝗮𝘀𝗲𝘁𝘀 - 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗗𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 etc.
3️⃣ 𝗠𝗼𝗱𝗲𝗹 𝗣𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿𝘀 (e.g. model type, hyperparameters) together with 𝗠𝗼𝗱𝗲𝗹 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗠𝗲𝘁𝗿𝗶𝗰𝘀.
4️⃣ 𝗠𝗼𝗱𝗲𝗹 𝗔𝗿𝘁𝗶𝗳𝗮𝗰𝘁 𝗟𝗼𝗰𝗮𝘁𝗶𝗼𝗻.
5️⃣ 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 is an 𝗔𝗿𝘁𝗶𝗳𝗮𝗰𝘁 itself - track information about who triggered it and when, the Pipeline ID etc.
✅ 𝗖𝗼𝗱𝗲: Everything is code - you should version and track it.
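
One way to picture the tracked items is as a single record per pipeline run. The sketch below is only an illustration: `RunRecord`, its field names and the example values (including the `s3://` locations) are hypothetical and do not come from any specific tracking tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """Hypothetical metadata tracked for one ML Pipeline run."""

    pipeline_id: str
    triggered_by: str
    git_commit: str          # version of the code that produced the model
    dataset_uri: str         # dataset used for training
    train_test_split: dict   # e.g. {"train": 0.8, "test": 0.2}
    model_params: dict       # model type and hyperparameters
    metrics: dict            # model performance metrics
    artifact_uri: str        # location of the trained model artifact
    triggered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        """Stable hash of everything except the timestamp, useful for comparing runs."""
        payload = asdict(self)
        payload.pop("triggered_at")
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


run = RunRecord(
    pipeline_id="train-42",
    triggered_by="alice",
    git_commit="abc123",
    dataset_uri="s3://example-bucket/train.parquet",
    train_test_split={"train": 0.8, "test": 0.2},
    model_params={"type": "gbm", "max_depth": 6},
    metrics={"auc": 0.91},
    artifact_uri="s3://example-bucket/models/train-42.pkl",
)
print(run.fingerprint())
```

Two runs with the same code, data and parameters produce the same fingerprint, which makes it easy to spot what actually changed between experiments.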

When a 𝗧𝗿𝗮𝗶𝗻𝗲𝗱 𝗠𝗼𝗱𝗲𝗹 𝗔𝗿𝘁𝗶𝗳𝗮𝗰𝘁 is saved to a 𝗠𝗼𝗱𝗲𝗹 𝗥𝗲𝗴𝗶𝘀𝘁𝗿𝘆 there should always be a 1 to 1 mapping of previously saved 𝗠𝗼𝗱𝗲𝗹 𝗠𝗲𝘁𝗮𝗱𝗮𝘁𝗮 𝘁𝗼 𝗧𝗵𝗲 𝗔𝗿𝘁𝗶𝗳𝗮𝗰𝘁 that was written to 𝗧𝗵𝗲 𝗠𝗼𝗱𝗲𝗹 𝗥𝗲𝗴𝗶𝘀𝘁𝗿𝘆:

➡️ 𝗠𝗼𝗱𝗲𝗹 𝗥𝗲𝗴𝗶𝘀𝘁𝗿𝘆 should have a convenient user interface in which you can compare metrics of different 𝗘𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁 versions.
➡️ 𝗠𝗼𝗱𝗲𝗹 𝗥𝗲𝗴𝗶𝘀𝘁𝗿𝘆 should have a capability that allows change of 𝗠𝗼𝗱𝗲𝗹 𝗦𝘁𝗮𝘁𝗲 with a single click of a button. Usually it would be a change of state between 𝗦𝘁𝗮𝗴𝗶𝗻𝗴 𝗮𝗻𝗱 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻.

𝗙𝗶𝗻𝗮𝗹𝗹𝘆:

6️⃣ 𝗠𝗼𝗱𝗲𝗹 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴 𝗦𝘆𝘀𝘁𝗲𝗺 should be integrated with the 𝗠𝗼𝗱𝗲𝗹 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗦𝘆𝘀𝘁𝗲𝗺. Once a model state is changed to 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻, 𝗧𝗵𝗲 𝗦𝘆𝘀𝘁𝗲𝗺 𝘁𝗿𝗶𝗴𝗴𝗲𝗿𝘀 𝗮 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 that will deploy a new model version and perform a decommission of the old one. This will vary depending on the type of deployment. 𝗠𝗼𝗿𝗲 𝗮𝗯𝗼𝘂𝘁 𝘁𝗵𝗶𝘀 𝗶𝗻 𝘁𝗵𝗲 𝗳𝘂𝘁𝘂𝗿𝗲 𝗲𝗽𝗶𝘀𝗼𝗱𝗲𝘀!
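
The integration between state changes and deployment can be sketched as a registry that calls a deployment hook whenever a version is promoted. `ModelRegistry`, the `deploy` callback and the "Archived" state are assumptions made for this example and are not the API of any particular registry.

```python
from typing import Callable, Dict, Optional


class ModelRegistry:
    """Toy registry: tracks the state of each model version and triggers deployment."""

    def __init__(self, deploy: Callable[[int, Optional[int]], None]):
        self._states: Dict[int, str] = {}
        self._deploy = deploy  # hypothetical hook into the Deployment Pipeline

    def register(self, version: int) -> None:
        self._states[version] = "Staging"

    def production_version(self) -> Optional[int]:
        for version, state in self._states.items():
            if state == "Production":
                return version
        return None

    def promote(self, version: int) -> None:
        """Move a version to Production and archive the one it replaces."""
        if version not in self._states:
            raise KeyError(f"unknown model version: {version}")
        old = self.production_version()
        if old is not None:
            self._states[old] = "Archived"
        self._states[version] = "Production"
        # Deploy the new version and decommission the old one.
        self._deploy(version, old)


deployments = []
registry = ModelRegistry(deploy=lambda new, old: deployments.append((new, old)))
registry.register(1)
registry.register(2)
registry.promote(1)
registry.promote(2)
print(deployments)
```

Promoting an older version again goes through exactly the same path, so a rollback is just another state change.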

A 𝗠𝗼𝗱𝗲𝗹 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴 𝗦𝘆𝘀𝘁𝗲𝗺 with these properties helps in the following ways:

➡️ You will be able to understand how a Model was built and repeat the experiment.
➡️ You will be able to share experiments with other experts involved.
➡️ You will be able to perform rapid and controlled experiments.
➡️ The system will allow safe rollbacks to any Model Version.
➡️ Such a Self-Service System would remove friction between ML and Operations experts.

Data Engineering Fundamentals + or What Every Data Engineer Should Know

𝗦𝗽𝗮𝗿𝗸 - 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲.

𝗔𝗽𝗮𝗰𝗵𝗲 𝗦𝗽𝗮𝗿𝗸 is an extremely popular distributed processing framework utilizing in-memory processing to speed up task execution. Its basic functionality, such as task scheduling, memory management and the RDD API, is contained in the Spark Core layer.

As a warm up exercise for later deeper dives and tips, today we focus on some architecture basics.

𝗦𝗽𝗮𝗿𝗸 𝗵𝗮𝘀 𝘀𝗲𝘃𝗲𝗿𝗮𝗹 𝗵𝗶𝗴𝗵 𝗹𝗲𝘃𝗲𝗹 𝗔𝗣𝗜𝘀 𝗯𝘂𝗶𝗹𝘁 𝗼𝗻 𝘁𝗼𝗽 𝗼𝗳 𝗦𝗽𝗮𝗿𝗸 𝗖𝗼𝗿𝗲 𝘁𝗼 𝘀𝘂𝗽𝗽𝗼𝗿𝘁 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁 𝘂𝘀𝗲 𝗰𝗮𝘀𝗲𝘀:

➡️ 𝗦𝗽𝗮𝗿𝗸𝗦𝗤𝗟 - Batch Processing.
➡️ 𝗦𝗽𝗮𝗿𝗸 𝗦𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴 - Near Real-Time Processing.
➡️ 𝗦𝗽𝗮𝗿𝗸 𝗠𝗟𝗹𝗶𝗯 - Machine Learning.
➡️ 𝗚𝗿𝗮𝗽𝗵𝗫 - Graph Structures and Algorithms.

𝗦𝘂𝗽𝗽𝗼𝗿𝘁𝗲𝗱 𝗽𝗿𝗼𝗴𝗿𝗮𝗺𝗺𝗶𝗻𝗴 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲𝘀:

➡️ Scala
➡️ Java
➡️ Python
➡️ R

𝗚𝗲𝗻𝗲𝗿𝗮𝗹 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲:

1️⃣ Once you submit a 𝗦𝗽𝗮𝗿𝗸 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 - 𝗦𝗽𝗮𝗿𝗸𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗢𝗯𝗷𝗲𝗰𝘁 is created in the 𝗗𝗿𝗶𝘃𝗲𝗿 𝗣𝗿𝗼𝗴𝗿𝗮𝗺. This Object is responsible for communicating with the 𝗖𝗹𝘂𝘀𝘁𝗲𝗿 𝗠𝗮𝗻𝗮𝗴𝗲𝗿.
2️⃣ 𝗦𝗽𝗮𝗿𝗸𝗖𝗼𝗻𝘁𝗲𝘅𝘁 negotiates with the 𝗖𝗹𝘂𝘀𝘁𝗲𝗿 𝗠𝗮𝗻𝗮𝗴𝗲𝗿 for the resources required to run the 𝗦𝗽𝗮𝗿𝗸 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻. The 𝗖𝗹𝘂𝘀𝘁𝗲𝗿 𝗠𝗮𝗻𝗮𝗴𝗲𝗿 allocates the resources inside of the respective Cluster and creates the requested number of 𝗦𝗽𝗮𝗿𝗸 𝗘𝘅𝗲𝗰𝘂𝘁𝗼𝗿𝘀.
3️⃣ After starting - 𝗦𝗽𝗮𝗿𝗸 𝗘𝘅𝗲𝗰𝘂𝘁𝗼𝗿𝘀 will connect with 𝗦𝗽𝗮𝗿𝗸𝗖𝗼𝗻𝘁𝗲𝘅𝘁 to notify about joining the Cluster. 𝗘𝘅𝗲𝗰𝘂𝘁𝗼𝗿𝘀 will be sending heartbeats regularly to notify the 𝗗𝗿𝗶𝘃𝗲𝗿 𝗣𝗿𝗼𝗴𝗿𝗮𝗺 that they are healthy and don’t need rescheduling.
4️⃣ 𝗦𝗽𝗮𝗿𝗸 𝗘𝘅𝗲𝗰𝘂𝘁𝗼𝗿𝘀 are responsible for executing tasks of the 𝗖𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻 𝗗𝗔𝗚 (𝗗𝗶𝗿𝗲𝗰𝘁𝗲𝗱 𝗔𝗰𝘆𝗰𝗹𝗶𝗰 𝗚𝗿𝗮𝗽𝗵). This could include reading, writing data or performing a certain operation on a partition of RDDs.

𝗦𝘂𝗽𝗽𝗼𝗿𝘁𝗲𝗱 𝗖𝗹𝘂𝘀𝘁𝗲𝗿 𝗠𝗮𝗻𝗮𝗴𝗲𝗿𝘀:

➡️ 𝗦𝘁𝗮𝗻𝗱𝗮𝗹𝗼𝗻𝗲 - simple cluster manager shipped together with Spark.
➡️ 𝗛𝗮𝗱𝗼𝗼𝗽 𝗬𝗔𝗥𝗡 - resource manager of Hadoop ecosystem.
➡️ 𝗔𝗽𝗮𝗰𝗵𝗲 𝗠𝗲𝘀𝗼𝘀 - general cluster manager (❗️ deprecated).
➡️ 𝗞𝘂𝗯𝗲𝗿𝗻𝗲𝘁𝗲𝘀 - popular open-source container orchestrator.

𝗦𝗽𝗮𝗿𝗸 𝗝𝗼𝗯 𝗜𝗻𝘁𝗲𝗿𝗻𝗮𝗹𝘀:

👉 𝗦𝗽𝗮𝗿𝗸 𝗗𝗿𝗶𝘃𝗲𝗿 is responsible for constructing an optimized physical execution plan for a given application submitted for execution.
👉 This plan materializes into one or more Jobs (one per action), each of which is a 𝗗𝗔𝗚 𝗼𝗳 𝗦𝘁𝗮𝗴𝗲𝘀.
👉 Some of the 𝗦𝘁𝗮𝗴𝗲𝘀 can be executed in parallel if they have no sequential dependencies.
👉 Each 𝗦𝘁𝗮𝗴𝗲 is composed of 𝗧𝗮𝘀𝗸𝘀.
👉 All 𝗧𝗮𝘀𝗸𝘀 of a single 𝗦𝘁𝗮𝗴𝗲 perform the same type of work, each on a different partition of the data. A 𝗧𝗮𝘀𝗸 is the smallest piece of work that can be executed in parallel and it is performed by 𝗦𝗽𝗮𝗿𝗸 𝗘𝘅𝗲𝗰𝘂𝘁𝗼𝗿𝘀.
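
The relationship between a Job, its Stages and their Tasks can be mimicked with a small toy model in plain Python. This is not how Spark's scheduler is implemented: the stage names, the partition count and the thread pool that plays the role of the Executors are all assumptions for illustration, and the sketch needs Python 3.9+ for `graphlib`.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical job: each stage maps to the set of stages it depends on.
STAGES = {
    "read_orders": set(),
    "read_users": set(),
    "join": {"read_orders", "read_users"},
    "aggregate": {"join"},
}
NUM_PARTITIONS = 4


def run_task(stage: str, partition: int) -> str:
    # Every task of a stage does the same work, just on a different partition.
    return f"{stage}[{partition}]"


def run_job(stages: dict, num_partitions: int) -> list:
    """Run stages in dependency order and return the waves of stages that were ready together."""
    sorter = TopologicalSorter(stages)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=num_partitions) as executor_pool:
        while sorter.is_active():
            # Stages in one wave have no dependencies on each other.
            ready = sorted(sorter.get_ready())
            for stage in ready:
                # One task per partition, executed in parallel by the pool.
                list(executor_pool.map(lambda p: run_task(stage, p), range(num_partitions)))
                sorter.done(stage)
            waves.append(ready)
    return waves


waves = run_job(STAGES, NUM_PARTITIONS)
print(waves)
```

Stages that end up in the same wave are the ones that could run in parallel. For simplicity this sketch runs them one after another and only parallelises the tasks inside each stage.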

𝗞𝗮𝗳𝗸𝗮 - 𝗥𝗲𝗮𝗱𝗶𝗻𝗴 𝗗𝗮𝘁𝗮 (𝗕𝗮𝘀𝗶𝗰𝘀).

Kafka is an extremely important 𝗗𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗲𝗱 𝗠𝗲𝘀𝘀𝗮𝗴𝗶𝗻𝗴 𝗦𝘆𝘀𝘁𝗲𝗺 to understand. Last time we covered Writing Data.

𝗦𝗼𝗺𝗲 𝗿𝗲𝗳𝗿𝗲𝘀𝗵𝗲𝗿𝘀:

➡️ Clients writing to Kafka are called 𝗣𝗿𝗼𝗱𝘂𝗰𝗲𝗿𝘀.
➡️ Clients reading the Data are called 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿𝘀.
➡️ Data is written into 𝗧𝗼𝗽𝗶𝗰𝘀 that can be compared to tables in Databases.
➡️ Messages sent to 𝗧𝗼𝗽𝗶𝗰𝘀 are called 𝗥𝗲𝗰𝗼𝗿𝗱𝘀.
➡️ 𝗧𝗼𝗽𝗶𝗰𝘀 are composed of 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀.
➡️ Each 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻 behaves as a write ahead log.
➡️ Data is written to the end of the 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻.
➡️ Each 𝗥𝗲𝗰𝗼𝗿𝗱 has an 𝗢𝗳𝗳𝘀𝗲𝘁 assigned to it which denotes its order in the 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻.
➡️ 𝗢𝗳𝗳𝘀𝗲𝘁𝘀 start at 0 and increment by 1 sequentially.
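
The append-only behaviour and the offsets can be shown with a tiny in-memory model of a single Partition. The `Partition` class below is a hypothetical toy, not part of any Kafka client library.

```python
class Partition:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record: bytes) -> int:
        """Write a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1  # offsets start at 0 and grow by 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Read records sequentially, starting at the given offset."""
        return self._records[offset:offset + max_records]


partition = Partition()
offsets = [partition.append(r) for r in (b"a", b"b", b"c")]
print(offsets)
print(partition.read(1))
```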

𝗥𝗲𝗮𝗱𝗶𝗻𝗴 𝗗𝗮𝘁𝗮:

➡️ Data is read sequentially per partition.
➡️ 𝗜𝗻𝗶𝘁𝗶𝗮𝗹 𝗥𝗲𝗮𝗱 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻 can be set either to earliest or latest.
➡️ Earliest position initiates the consumer at offset 0 or the earliest available due to retention rules of the 𝗧𝗼𝗽𝗶𝗰 (more about this in later episodes).
➡️ Latest position initiates the consumer at the end of a 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻 - no 𝗥𝗲𝗰𝗼𝗿𝗱𝘀 will be read initially and the 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 will wait for new data to be written.
➡️ You could run your consumers independently, but almost always the preferred way is to use 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽𝘀.
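
In Kafka consumers the initial position is controlled by the `auto.offset.reset` setting. The function below is only a sketch of the decision it describes, and the parameter names `log_start_offset` and `log_end_offset` are chosen for this example.

```python
def initial_position(reset: str, log_start_offset: int, log_end_offset: int) -> int:
    """Offset that a consumer without a stored position starts reading from.

    log_start_offset: earliest offset still retained in the partition.
    log_end_offset: offset that the next written record will receive.
    """
    if reset == "earliest":
        return log_start_offset
    if reset == "latest":
        return log_end_offset
    raise ValueError(f"unsupported reset policy: {reset}")


# Partition with 100 records written, of which the first 40 were removed by retention.
print(initial_position("earliest", log_start_offset=40, log_end_offset=100))
print(initial_position("latest", log_start_offset=40, log_end_offset=100))
```

With "latest" the consumer starts at the end of the log, so it reads nothing until a new record is written.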

𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽𝘀:

➡️ 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽 is a logical collection of clients that read a 𝗞𝗮𝗳𝗸𝗮 𝗧𝗼𝗽𝗶𝗰 and share the state.
➡️ Groups of consumers are identified by the 𝗴𝗿𝗼𝘂𝗽_𝗶𝗱 parameter.
➡️ 𝗦𝘁𝗮𝘁𝗲 is defined by the offsets that every 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻 𝗶𝗻 𝘁𝗵𝗲 𝗧𝗼𝗽𝗶𝗰 is being consumed at.
➡️ 𝗦𝘁𝗮𝘁𝗲 of 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽𝘀 is written by the 𝗕𝗿𝗼𝗸𝗲𝗿 (more about this in later episodes) to an internal 𝗞𝗮𝗳𝗸𝗮 𝗧𝗼𝗽𝗶𝗰 named __𝗰𝗼𝗻𝘀𝘂𝗺𝗲𝗿_𝗼𝗳𝗳𝘀𝗲𝘁𝘀.
➡️ There can be multiple 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽𝘀 reading the same 𝗞𝗮𝗳𝗸𝗮 𝗧𝗼𝗽𝗶𝗰 having their own independent 𝗦𝘁𝗮𝘁𝗲𝘀.
➡️ Only one 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 per 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽 can be reading a 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻 at a single point in time.
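
To see why separate Consumer Groups do not interfere with each other, here is a toy model of group State. The `TOPIC` contents and the group names are made up, and a plain dictionary plays the role that the internal offsets topic plays in Kafka.

```python
from collections import defaultdict

# Toy topic with two partitions, each holding a list of records.
TOPIC = {0: ["a", "b", "c"], 1: ["x", "y"]}

# Committed offsets per group and per partition: the State of each Consumer Group.
# A dict stands in for Kafka's internal offsets topic (assumption for this sketch).
committed = defaultdict(lambda: defaultdict(int))


def poll(group_id: str, partition: int, max_records: int = 1) -> list:
    """Read the next records for a group from one partition and commit the new offset."""
    start = committed[group_id][partition]
    records = TOPIC[partition][start:start + max_records]
    committed[group_id][partition] = start + len(records)
    return records


first = poll("billing", 0, max_records=2)
second = poll("billing", 0, max_records=2)
other = poll("analytics", 0, max_records=2)
print(first, second, other)
```

The "billing" group continues where it left off, while "analytics" starts from its own position on the same Partition.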

𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽 𝗧𝗶𝗽𝘀:

❗️ If you have a prime number of 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀 in the 𝗧𝗼𝗽𝗶𝗰 - you will always have at least one 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 per 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽 consuming fewer 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀 than others, unless the 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽 has a single 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 or the number of 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿𝘀 equals the number of 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀.

✅ If you want an odd number of 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀 - set it to a 𝗺𝘂𝗹𝘁𝗶𝗽𝗹𝗲 𝗼𝗳 𝗮 𝗣𝗿𝗶𝗺𝗲 𝗡𝘂𝗺𝗯𝗲𝗿 (for example 9 or 15) rather than to a prime number itself.

❗️ If you have more 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿𝘀 in the 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽 than there are 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀 in the 𝗧𝗼𝗽𝗶𝗰 - some of the 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿𝘀 will be 𝗜𝗱𝗹𝗲.

✅ Give your 𝗧𝗼𝗽𝗶𝗰𝘀 enough 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀 or have fewer 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿𝘀 per 𝗖𝗼𝗻𝘀𝘂𝗺𝗲𝗿 𝗚𝗿𝗼𝘂𝗽.
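
The arithmetic behind these tips is easy to check. The helper below assumes that Partitions are spread over the Consumers of a group as evenly as possible. It is a hypothetical function for counting only, not the code of a Kafka partition assignor.

```python
def partitions_per_consumer(num_partitions: int, num_consumers: int) -> list:
    """Spread partitions over consumers as evenly as possible; 0 means an idle consumer."""
    base, extra = divmod(num_partitions, num_consumers)
    return [base + 1 if i < extra else base for i in range(num_consumers)]


print(partitions_per_consumer(7, 3))  # prime number of partitions
print(partitions_per_consumer(9, 3))  # multiple of a prime number
print(partitions_per_consumer(4, 6))  # more consumers than partitions
```

With 7 Partitions and 3 Consumers the split is 3, 2 and 2, with 9 Partitions it is an even 3, 3 and 3, and with 4 Partitions and 6 Consumers two of the Consumers get nothing.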
