Introduction

There is too much poorly written code in ML research and ML production.

1. Proof-of-concept-style code

Issue: POC code is hard to extend. Completing preprocessing, feature engineering, training, deployment, and monitoring can mean rewriting 5 to 10 operations.

Rationale: Common in startups. It can be fast in the short term but is detrimental in the long term.

Solution: Use libraries or custom packages for argument parsing (e.g., argparse) and other argument management. Examples: typer, FastAPI.
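A minimal sketch of the idea follows. It uses only the standard-library `argparse` module so it runs without extra dependencies; typer offers a similar interface driven by type hints. The subcommand names, flags, and defaults are hypothetical. Each pipeline stage gets its own subcommand, so adding a stage means adding a parser entry rather than editing hard-coded values in a script.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI with one subcommand per (hypothetical) pipeline stage."""
    parser = argparse.ArgumentParser(prog="pipeline")
    subparsers = parser.add_subparsers(dest="stage", required=True)

    # Preprocessing stage: input and output locations are arguments, not constants.
    preprocess = subparsers.add_parser("preprocess", help="clean raw data")
    preprocess.add_argument("--input", required=True)
    preprocess.add_argument("--output", default="data/clean.csv")

    # Training stage: hyperparameters are parsed into typed values by argparse.
    train = subparsers.add_parser("train", help="fit a model")
    train.add_argument("--epochs", type=int, default=10)
    train.add_argument("--lr", type=float, default=1e-3)
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real CLI would dispatch to the stage implementation here;
    # this sketch only returns the parsed arguments.
    return args


if __name__ == "__main__":
    print(main())
```

Running `python pipeline.py train --epochs 3` would parse `epochs` as an integer and fill in the default learning rate.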

2. No high-level separation of concerns

Issue: The number of cyclic dependencies between what should be low-level packages and high-level ones keeps increasing.

Rationale: The ML package (named as an ML library) also contains administrative code and other unrelated concerns.

Solution: Use Docker and a microservices architecture. Maintain good distributed-system hygiene with middleware such as RabbitMQ, Kafka, and Redis.
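The decoupling a broker provides can be sketched in-process. In this hypothetical example, Python's `queue.Queue` stands in for a broker such as RabbitMQ or Kafka. The training service publishes an event and never imports the admin code, so the dependency no longer runs in both directions. Topic and field names are illustrative only.

```python
import json
import queue
from typing import Dict, List


class InProcessBroker:
    """Stand-in for a message broker: named topics backed by FIFO queues."""

    def __init__(self) -> None:
        self._topics: Dict[str, queue.Queue] = {}

    def publish(self, topic: str, message: dict) -> None:
        # Messages are serialized to JSON, as they would be on a real broker.
        self._topics.setdefault(topic, queue.Queue()).put(json.dumps(message))

    def drain(self, topic: str) -> List[dict]:
        # Consume every message currently waiting on the topic.
        q = self._topics.setdefault(topic, queue.Queue())
        messages = []
        while not q.empty():
            messages.append(json.loads(q.get()))
        return messages


def training_service(broker: InProcessBroker, model_id: str, accuracy: float) -> None:
    # Depends only on the broker and the event schema, not on the admin service.
    broker.publish("model.trained", {"model_id": model_id, "accuracy": accuracy})


def admin_service(broker: InProcessBroker, audit_log: List[str]) -> None:
    # In a real deployment this would run in its own container and consume remotely.
    for event in broker.drain("model.trained"):
        audit_log.append(f"{event['model_id']}: accuracy={event['accuracy']}")
```

Swapping `InProcessBroker` for a real client changes the transport, but the two services still share only the event schema.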

3. No low-level separation of concerns

Issue: Very poor code structure, with neither object-oriented (OOP) nor functional (FP) design.

Solution: Check out my code architecture in “/Users/criss_w/Desktop/Research_and_ML/Self_Study/sample_full_stack_ml/model”.
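The referenced local architecture is not reproduced here. As a hypothetical sketch of low-level separation of concerns, the example below keeps data transformations as pure functions (FP) and model state behind a small class (OOP), so each piece can be tested and replaced independently. The mean-predictor model is a placeholder, not a real modeling choice.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Transform = Callable[[List[float]], List[float]]


def drop_missing(values: List[float]) -> List[float]:
    """Pure function: remove NaN entries without mutating the input."""
    return [v for v in values if v == v]  # NaN is the only float not equal to itself


def clip(low: float, high: float) -> Transform:
    """Return a pure clipping transform with its bounds fixed."""
    return lambda values: [min(max(v, low), high) for v in values]


def compose(transforms: Sequence[Transform]) -> Transform:
    """Chain transforms left to right into a single preprocessing step."""
    def run(values: List[float]) -> List[float]:
        for transform in transforms:
            values = transform(values)
        return values
    return run


@dataclass
class MeanModel:
    """Placeholder model: owns its state and exposes only fit/predict."""
    mean_: float = field(default=0.0, init=False)

    def fit(self, targets: List[float]) -> "MeanModel":
        if not targets:
            raise ValueError("cannot fit on an empty target list")
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, n: int) -> List[float]:
        return [self.mean_] * n
```

Because preprocessing is a composed function and the model is a separate object, either can change without touching the other.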

4. No configuration data model

Issue: Debugging is a nightmare, because missing or malformed settings only surface deep inside the pipeline instead of at startup.

Solution: Pydantic
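Pydantic's `BaseModel` validates and coerces fields declared with type hints. To keep this sketch dependency-free, it swaps in a standard-library `dataclass` with explicit checks in `__post_init__`. That captures the core idea: configuration is parsed into one typed object and rejected at startup if invalid. The field names and bounds are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class TrainingConfig:
    """Typed, immutable configuration validated on construction."""
    learning_rate: float
    batch_size: int
    optimizer: str = "adam"

    def __post_init__(self) -> None:
        # Fail fast with a clear message instead of deep inside training.
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError(f"learning_rate must be in (0, 1), got {self.learning_rate}")
        if self.batch_size <= 0:
            raise ValueError(f"batch_size must be positive, got {self.batch_size}")
        if self.optimizer not in {"adam", "sgd"}:
            raise ValueError(f"unsupported optimizer: {self.optimizer!r}")

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "TrainingConfig":
        # Reject unknown keys, which catches typos such as "lerning_rate".
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**raw)
```

With Pydantic, the same checks would be declared on the fields themselves, and type coercion (for example, `"32"` to `32`) would come for free.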

5. Handling legacy models

Issue: When trying to maintain backward compatibility, a poor code structure causes a lot of pain.

Solution: Tools such as cron, plotly, and tmux. Understand basic deployment strategies.
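One basic deployment strategy that eases backward compatibility is keeping every model version addressable and shifting traffic gradually (a canary release). The hypothetical sketch below registers versions by name so legacy models stay loadable. It then routes a deterministic fraction of requests to the new version by hashing the request ID. It illustrates the strategy and is not any specific serving framework's API.

```python
import hashlib
from typing import Callable, Dict

Predictor = Callable[[float], float]


class ModelRegistry:
    """Keep every model version addressable so legacy clients keep working."""

    def __init__(self) -> None:
        self._models: Dict[str, Predictor] = {}

    def register(self, version: str, model: Predictor) -> None:
        if version in self._models:
            raise ValueError(f"version {version!r} already registered")
        self._models[version] = model

    def get(self, version: str) -> Predictor:
        try:
            return self._models[version]
        except KeyError:
            raise KeyError(f"unknown model version: {version!r}") from None


def route(request_id: str, stable: str, canary: str, canary_fraction: float) -> str:
    """Send a stable, deterministic share of requests to the canary version."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") / 65536  # roughly uniform in [0, 1)
    return canary if bucket < canary_fraction else stable
```

Hashing the request ID (rather than drawing a random number) means a given client keeps hitting the same version while `canary_fraction` is raised step by step.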

6. Code quality: type hinting, documentation, complexity, dead code

Solution: autopep8, flake8, mypy, pylint, unittest, pydeps, sourcery
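Here is a small example of what those tools check for. It includes type hints that mypy can verify, a docstring that documents behavior and edge cases, and a stdlib `unittest` test case. The function itself is a hypothetical min-max scaler.

```python
import unittest
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values linearly to [0, 1].

    Returns an empty list for empty input and all zeros when every
    value is identical, to avoid dividing by zero.
    """
    if not values:
        return []
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


class TestMinMaxScale(unittest.TestCase):
    def test_range(self) -> None:
        self.assertEqual(min_max_scale([2.0, 4.0, 6.0]), [0.0, 0.5, 1.0])

    def test_constant_input(self) -> None:
        self.assertEqual(min_max_scale([3.0, 3.0]), [0.0, 0.0])

    def test_empty_input(self) -> None:
        self.assertEqual(min_max_scale([]), [])


if __name__ == "__main__":
    unittest.main()
```

Running the file directly executes the tests through `unittest.main()`, and running `mypy` on the same file checks the annotations.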

Author

Zhenlin Wang

Posted on

2024-04-19

Updated on

2024-04-19

Licensed under