[v.1.0] (04/28/2023): Post published!
I recently finished an excellent graduate course, Software Engineering (CS5704), and learned about different aspects of software projects and how different-size companies handle their technical/business changes to deliver successful products to their customer. Some important topics are Process Models (Waterfall, V-Model, Spiral, Agile), Requirements Definition, and Architecture Design Patterns. Especially, S.O.L.I.D principles have struck me as must-known concepts for writing better and cleaner code.
Why do S.O.L.I.D principles matter? According to Uncle Bob, bad code slows down the development team as it is confusing and fragile. Confusing code does not explain what it is doing, while fragile code breaks in many places when you change one or a few lines of code.
What we want is the code that is clear, rigid, and reusable.
Fun Fact: In his talk in S.O.L.I.D Principles, he mentioned that he was not the first to realize or coin the acronym “SOLID.”
Single Responsibility Principle
SRP: “A class should have one, and only one reason to change” — Robert C. Martin. In Object Oriented Programming (OOP), a class should have only one primary function. If there is more than one utility for that class, we should split it into multiple courses. This helps distribute the functional responsibilities across numerous classes or objects (or developers).
Web Development Example
Let’s say you are a developer in the Google Shopping team who is in charge of designing a class to process post-item-purchase.
Code Example
class PostPurchaseProcess:
def web_notification(self, message):
"""
Code to send confirmation bill to Google Chrome
"""
pass
def email_notification(self, message):
"""
Code to send confirmation bill to Gmail
"""
pass
def phone_notification(self, message):
"""
Code to send confirmation bill to Google Pixel
"""
pass
In the example above, the class PostPurchaseProcess
violates the SRP as it contains too many responsibilities: sending notifications to web app, email, and mobile. What if there are errors in sending notifications to Google Pixel and Gmail? It may take time to pinpoint precisely which function(s) is responsible for the mistake!
Fix: We can refactor the code into multiple classes, each with single responsibility.
Code Example
class WebPurchaseProcess:
def web_notification(self, message):
"""
Code to send confirmation bill to Google Chrome
"""
pass
class EmailPurchaseProcess:
def email_notification(self, message):
"""
Code to send confirmation bill to Gmail
"""
pass
class PhonePurchaseProcess:
def phone_notification(self, message):
"""
Code to send confirmation bill to Google Pixel
"""
pass
Data Science Example
Suppose you are a Data Scientist at C3.ai who is processing a tabular dataset for a supervised classification application. There are multiple steps to investigate the structure, quality, and content of the dataset, such as: checking datatypes, removing duplicates, data imputation, removing outliers, class-balancing, feature analysis, or feature engineering. Similar to the Google Shopping example above, to follow the SRP, we will need to put these steps into their separate class.
Code Example
class DataImputation:
def median_fill(self, data):
"""
Code to fill missing data by median of feature
"""
pass
class OutlierRemoval:
def remove_by_euclidean_dist(self, data):
"""
Code to remove outliers using Euclidean distance
"""
pass
class FeatureAnalysis:
def pearson_corr_cal(self, data):
"""
Code to calculate Pearson correlation scores between
input feature(s) and output label(s)
"""
pass
However, I don’t find this principle helpful in a Kaggle or data science project, as each Jupyter Notebook cell can be run and tested individually.
Open-Closed Principle
OCP: “A module (or component) should be open for extension but closed for modification” — Bertrand Meyer. When making changes, the principle prevents the already functional design from bugs or breaks. This principle promotes a modular and flexible design that allows for the easy integration of a new idea while making your codebase more maintainable and scalable. The OCP can be hard to understand, so let’s walk through some examples.
Web Development Example
An easy-to-see symptom of OCP violation to look for is the use of if/elif/else
or switch-case
statements. Let’s say that you are a (Flask or Django) backend developer at Meta who is writing a REST API that uses HTTP request protocols (POST, GET, PUT, DELETE) to allow users to interact with the database via CRUD (Create, Read, Update, Delete).
Code Example
class RequestHandler:
def handle_request(self, request):
"""
Code to handle request based on type
"""
if request.method == "GET":
self.handle_get(request)
elif request.method == "POST":
self.handle_post(request)
elif request.method == "PUT":
self.handle_put(request)
elif request.method == "DELETE":
self.handle_delete(request)
def handle_get(self, request):
"""
Code to handle GET request
"""
pass
def handle_post(self, request):
"""
Code to handle POST request
"""
pass
def handle_put(self, request):
"""
Code to handle PUT request
"""
pass
def handle_delete(self, request):
"""
Code to handle DELETE request
"""
pass
The example above violates OCP because every time a new request type is added (e.g., PATCH), our RequestHandler
class needs to be modified. This can introduce new bugs into existing code. As the OCP stated, we should design our code to be fixed but extended.
Fix: A solution to OCP violation is to separate each request type into individual classes, thus abstracting the RequestHandler
class. Therefore, if we want to add a PATCH request, we can extend our API by adding a new PatchRequestHander
class.
Code Example
class RequestHandler:
def handle_request(self, request):
"""
Code to handle request based on type
"""
request.handle()
class GetRequestHandler:
def handle(self):
"""
Code to handle GET request
"""
pass
class PostRequestHandler:
def handle(self):
"""
Code to handle POST request
"""
pass
class PutRequestHandler:
def handle(self):
"""
Code to handle PUT request
"""
pass
class DeleteRequestHandler:
def handle(self):
"""
Code to handle DELETE request
"""
pass
class PatchRequestHandler:
def handle(self):
"""
Code to handle PATCH request (extended)
"""
pass
Data Science Example
Let’s say you are a Machine Learning Engineer at NVIDIA who is writing a Python script for a baseline model with data preprocessing, model training, and model evaluation. Like the REST API example above, you want your class ModelPipeline
to remain closed for modification but open for extension by strictly following the OCP.
Code Example
class DataPreprocessor:
def preprocess(self, dataset):
"""
Code to preprocess data
"""
pass
class ModelTrainer:
def train(self, dataset):
"""
Code to train model
"""
pass
class ModelEvaluator:
def evaluate(self, model, dataset):
"""
Code to evaluate model
"""
pass
class ModelPipeline:
def __init__(self, preprocessor, trainer, evaluator):
"""
Code to build a ModelPipeline constructor (object)
"""
self.preprocessor = preprocessor
self.trainer = trainer
self.evaluator = evaluator
def run_pipeline(self, dataset):
"""
Code to run the machine learning project end-to-end
"""
preprocessed_data = self.preprocessor.preprocess(dataset)
model = self.trainer.train(preprocessed_data)
evaluation_result = self.evaluator.evaluate(model, preprocessed_data)
return evaluation_result
Your teammate develops a new way of processing the dataset. Luckily, due to your guideline of OCP, your teammate can easily extend the existing baseline model by inheriting the DataPreprocessor
class without the risk of breaking your functional baseline design.
Code Example
class NewDataPreprocessor(DataPreprocessor):
def preprocess(self, dataset):
"""
Code to preprocess data with new technique
"""
pass
Liskov Substitution Principle
LSP: “Subclasses should be substitutable for their base classes (without affecting the correctness of the program)” — Barbara Liskov. In OOP, what we want is for any method or code that works for a base class should continue to work correctly when used with the derived types. This principle ensures the inheritance hierarchies are consistent, extensible, and correct.
Web Development & Data Science Example
Let’s say that you are to design a codebase at Netflix to write processed data into a MySQL database and a CSV file (You can think of adding new columns into a database by scraping or engineering new features).
Code Example
class DataHandler():
def write_db(self, data):
"""
Code to write processed data to MySQL database
"""
print("Handling MySQL database.")
def write_csv(self, data):
"""
Code to write processed data to CSV file
"""
print("Handling CSV file.")
class WriteDB(DataHandler):
def write_db(self, data):
"""
Code to write processed data to MySQL database
"""
print("Handling MySQL database.")
def write_csv(self, data):
"""
Code to write processed data to CSV file
"""
raise Exception("Error: Can't write to CSV file.")
class WriteCSV(DataHandler):
def write_db(self, data):
"""
Code to write processed data to MySQL database
"""
raise Exception("Error: Can't write to MySQL database.")
def write_csv(self, data):
"""
Code to write processed data to CSV file
"""
print("Handling CSV file.")
Here, our base class DataHandler
defined two methods, write_db()
and write_csv()
. The derived class WriteDB
inherits from DataHander
and overrides the write_csv method. However, instead of providing the expected behavior, it prints an error message indicating it can’t write to a CSV file. Similarly, the derived class WriteCSV
prints out the error message indicating it can’t write to MySQL file. This design violates the LSP as the derived classes do not behave as expected based on the contract defined by the DataHandler
base class. The base class is designed to be too specific, thus causing its children to handle edge case(s) based on the characteristics of the child classes.
Fixed: Let’s write a more generic base class!
Code Example
class DataHandler():
def write(self, data):
"""
Code to handle processed data
"""
print("Handling data.")
class WriteDB(DataHandler):
def write(self, data):
"""
Code to write processed data to MySQL database
"""
print("Handling MySQL database.")
class WriteCSV(DataHandler):
def write(self, data):
"""
Code to write processed data to CSV file
"""
print("Handling CSV file.")
If you have an object of type WriteDB
or WriteCSV
, you can safely use it wherever an object of type DataHandler
is expected because they adhere to the same contract.
Interface Segregation Principle
ISP: “Many client-specific interfaces are better than one general purpose interface” — Robert C. Martin. If we have an extensive interface with many functions, the class implementing the interface might only use some defined functions! It is better to break up the interface into multiple smaller interfaces; then, we can inherit our class’s needed interface(s). This principle promotes code modularity, reduces unnecessary dependencies, and makes it easier to maintain/extend the existing codebase.
Web Development Example
Let’s say you are a Full-stack Web Developer at Odoo who writes a simple web app with a home page and admin page.
Code Example
class WebPage:
def render(self):
"""
Code to render HTML of a generic web page
"""
pass
def save(self):
"""
Code to save data of a generic web page into a database
"""
pass
def delete(self):
"""
Code to delete data of a generic web page from a database
"""
pass
class HomePage(WebPage):
def render(self):
"""
Code to render HTML of home page
"""
pass
def save(self):
"""
Code to save data of home page into a database
"""
pass
def delete(self):
"""
Code to delete data of home page from a database
"""
raise Exception("Error: Only Admin can delete data from database.")
class AdminPage(WebPage)
def render(self):
"""
Code to render HTML of admin page
"""
pass
def save(self):
"""
Code to save data of admin page into a database
"""
pass
def delete(self):
"""
Code to delete data of admin page from a database
"""
pass
The example above violates the ISP because the children should not be forced to depend on the parent’s method(s) they do not use. While the AdminPage
class inherits just fine, the HomePage
class can’t use the delete()
function. This creates an unnecessary dependency!
Fixed: It will be better to segregate the parent (WebPage
) class into smaller interface(s) that meet the needs of each child. We can separate render()
, save()
, and delete()
functions into multiple classes.
Code Example
class Renderable:
def render(self):
"""
Code to render HTML of a generic web page
"""
pass
class Savable:
def save(self):
"""
Code to save data of a generic web page into a database
"""
pass
class Deletable:
def delete(self):
"""
Code to delete data of a generic web page from a database
"""
pass
class HomePage(Renderable, Savable):
def render(self):
"""
Code to render HTML of home page
"""
pass
def save(self):
"""
Code to save data of home page into a database
"""
pass
class AdminPage(Renderable, Savable, Deletable):
def render(self):
"""
Code to render HTML of admin page
"""
pass
def save(self):
"""
Code to save data of admin page into a database
"""
pass
def delete(self):
"""
Code to delete data of admin page from a database
"""
pass
Data Science Example
Let’s say that you are a Data Scientist at LinkedIn who is working on a multimodal application. You are tasked to process image and text data. Like the web development example above, you want separate interfaces for each subtask and inherit related interfaces to process images and text accordingly.
Code Example
class DataLoader:
def load(self, data):
"""
Code to load data to notebook memory (entire dataset or as a generator)
"""
pass
class DuplicateRemoval:
def remove_dup(self, data):
"""
Code to remove duplicate images in the dataset
"""
pass
class DataAugmentation:
def img_augment(self, data):
"""
Code to augment images in the dataset
"""
pass
class DataSmoothing:
def ts_smooth(self, data):
"""
Code to smooth time-series dataset
"""
pass
class ProcessTimeSeriesDataset(DataLoader, DataSmoothing):
def load(self, data):
"""
Code to load data to notebook memory (entire dataset or as a generator)
"""
pass
def ts_smooth(self, data):
"""
Code to smooth time-series dataset
"""
pass
class ProcessImageDataset(DataLoader, DuplicateRemoval, DataAugmentation):
def load(self, data):
"""
Code to load data to notebook memory (entire dataset or as a generator)
"""
pass
def remove_dup(self, data):
"""
Code to remove duplicate images in the dataset
"""
pass
def img_augment(self, data):
"""
Code to augment images in the dataset
"""
pass
Here, we have the defined separate interfaces for data load, removing duplicates, image augmentation, and time-series smoothing. The ProcessTimeSeriesDataset
and ProcessImageDataset
classes only implement (or inherit) the interfaces (or parent classes) that are relevant to them.
Dependency Inversion Principle
DIP: “Depend on abstractions. Do not depend on concretions” — Robert C. Martin. The main idea is to decouple high-level modules from low-level modules by introducing abstractions as mediators. When integrating external dependencies, it is better to create wrapper(s) around them so that your code depends on the wrapper you make and not the details of the dependencies. This allows for better flexibility, as different implementations can be easily substituted without affecting the high-level modules. This principle promotes modularity and maintainability in codebase design.
Web Development Example
Let’s say you are a Mobile Developer at Apple who work on the payment integration aspect of Apple Music. Your task is to integrate Stripe Payment API into your backend codebase.
Code Example
class StripeProcessor:
def process_payment(self, credit_cart_num):
"""
Code to process payment via Stripe Payment API
"""
pass
class AppleMusic:
def notify_payment(self, credit_cart_num):
"""
Code to process payment in iOS backend
"""
processor = StripeProcessor()
processor.process_payment(credit_cart_num)
The Apple Music example above violates DIP as the AppleMusic
class directly depends on StripeProcessor
class, a specific low-level implementation. Imagine that Stripe provides Stripe Payment API version 2.0, which has a massive change in multiple methods. Our AppleMusic
(and any other class that uses the StripeProcessor
object) will be broken. We will have to fix every single line that uses StripeProcessor
’s methods.
Fixed: AppleMusic
and StripeProcessor
classes should depend on abstractions (or a wrapper for StripeProcessor) to avoid such catastrophe. In addition, we can easily swap the external API (e.g., Venmo Payment API) within the wrapper class by having a wrapper.
Code Example
class PaymentProcessor:
def process_payment(self, credit_cart_num):
"""
Code to process payment via external API
"""
pass
class StripeProcessor(PaymentProcessor):
def process_payment(self, credit_cart_num):
"""
Code to process payment via Stripe Payment API
"""
pass
class VenmoProcessor(PaymentProcessor):
def process_payment(self, credit_cart_num):
"""
Code to process payment via Venmo Payment API
"""
pass
class AppleMusic:
def __init__(self, processor: PaymentProcessor):
"""
Code to build AppleMusic constructor (object)
"""
self.processor = processor
def notify_payment(self, credit_cart_num):
"""
Code to process payment in iOS backend
"""
self.processor.process_payment(credit_cart_num)
Data Science Example
Let’s say you are a Data Analyst at Deloitte who is in charge of plotting the dataset to show insight to stakeholders. Similarly to the Apple Music example above, creating a wrapper for the data visualization task would be best.
Code Example
class Plotter:
def show_plots(self, dataset):
"""
Code to plot the dataset
"""
pass
class SeabornPlotter(Plotter):
def show_plots(self, dataset):
"""
Code to plot the dataset bia Seaborn API
"""
pass
class PlotlyPlotter(Plotter):
def show_plots(self, dataset):
"""
Code to plot the dataset bia Plotly API
"""
pass
class BusinessInsight():
def __init__(self, plotter: Plotter):
"""
Code to build BusinessInsight constructor (object)
"""
self.plotter = plotter
def notify_payment(self, dataset):
"""
Code to show the visualization trends or patterns of the dataset
"""
self.plotter.show_plots(dataset)
Here, the BusinessInsight
class depends on the Plotter
abstraction through its constructor, allowing different plotting implementations to be injected without modifying the BusinessInsight
class. The Plotter
class serves as the abstraction, while SeabornPlotter
and PlotlyPlotter
are concrete implementations of the Plotter
class. Depending on the abstraction (Plotter
), the BusinessInsight
class is decoupled from specific plotting implementations. This promotes flexibility and modularity, as different plotting libraries or variations can be used interchangeably by appropriately implementing the Plotter
abstraction to the BusinessInsight
class.
Citation
Cited as:
Nguyen, Minh. (April 2023). S.O.L.I.D Principles Explained https://mnguyen0226.github.io/posts/solid_principles/post/
Or
@article{nguyen2023solid,
title = "S.O.L.I.D Principles Explained",
author = "Nguyen, Minh",
journal = "mnguyen0226.github.io",
year = "2023",
month = "April",
url = "https://mnguyen0226.github.io/posts/solid_principles/post/"
}
References
[1] R. S. Pressman and B. R. Maxim, Software Engineering: A Practitioner’s Approach. New York, NY: McGraw-Hill Education, 2020.
[2] R. C. Martin, M. C. Feathers, T. R. Ottinger, and J. J. Langr, Clean Code A Handbook of Agile Software Craftsmanship. Boston, MA: Pearson Education, Inc, 2016.
[3] R. C. Martin, “Clean Code - Lecture Series,” YouTube, https://www.youtube.com/watch?v=7EmboKQH8lM&list=PLwAjnlpkQEft41G-GvHAKnh_CkaEKFawh&ab_channel=UnityCoin (accessed May 12, 2023).
[4] “SOLID Design Principle - Web Dev Simplified,” YouTube, https://www.youtube.com/watch?v=UQqY3_6Epbg&list=PLZlA0Gpn_vH9kocFX7R7BAe_CvvOCO_p9&ab_channel=WebDevSimplified (accessed May 12, 2023).