Does 'full-stack' also apply to machine learning?

Engineering Jan 05, 2021

The term 'full-stack' is well known in tech; loosely, it also means managing the complete development lifecycle of an app. That lifecycle begins with writing and managing code, but the deployment pipeline also involves testing the code and pushing it to production servers. A full-stack ML engineer is therefore expected not only to code in the relevant frameworks but also to build and manage the development lifecycle. DevOps experience is critical here: it helps streamline pipelines and workflows, which ultimately shortens build times for quality, reliable models.

Data scientists and ML engineers working in the lean teams of small startups don’t have the support of a large team of developers. To excel in that environment, they are expected to take on more responsibility and perform tasks that aren’t traditionally considered part of their role.

Here is the gist of what it takes to build and manage ML apps. You must:

  1. Know how to build models by writing code to create pipelines.
  2. Manage your infrastructure for efficient training of your models.
  3. Understand all available relevant models for better comparison.
  4. Push your models to production, then monitor them for quality & performance.
  5. Be aware of good MLOps tools and practices.
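Item 4 is the one most easily postponed. The sketch below assumes you can log each prediction together with its eventual ground-truth label and its serving latency. The `ModelMonitor` class and its thresholds are hypothetical and only illustrate tracking quality (accuracy) and performance (95th-percentile latency) over a sliding window of recent requests.

```python
from collections import deque
from statistics import quantiles


class ModelMonitor:
    """Hypothetical monitor: accuracy and p95 latency over the last `window` predictions."""

    def __init__(self, window=100, min_accuracy=0.9, max_p95_latency_ms=200.0):
        # deques with maxlen silently drop the oldest entries once full
        self.correct = deque(maxlen=window)
        self.latencies = deque(maxlen=window)
        self.min_accuracy = min_accuracy
        self.max_p95_latency_ms = max_p95_latency_ms

    def record(self, prediction, label, latency_ms):
        self.correct.append(prediction == label)
        self.latencies.append(latency_ms)

    def accuracy(self):
        return sum(self.correct) / len(self.correct)

    def p95_latency(self):
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        # "inclusive" keeps the estimate within the observed range (needs >= 2 samples).
        return quantiles(self.latencies, n=20, method="inclusive")[-1]

    def alerts(self):
        """Return the names of the checks that are currently failing."""
        issues = []
        if self.accuracy() < self.min_accuracy:
            issues.append("accuracy")
        if self.p95_latency() > self.max_p95_latency_ms:
            issues.append("latency")
        return issues


monitor = ModelMonitor(window=4)
for pred, label, ms in [(1, 1, 10), (0, 0, 20), (1, 1, 30), (1, 0, 400)]:
    monitor.record(pred, label, ms)
print(monitor.alerts())  # ['accuracy', 'latency']
```

In production these numbers would typically feed a dashboard or alerting system rather than a `print`; the window size and thresholds are placeholders to tune per model.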

With those requirements in mind, building and managing quality ML apps involves three main stages:

  1. Training - Building & shaping the models and pipelines.
  2. Integration - Testing & validating the models and pipelines for quality metrics.
  3. Deployment - Pushing the final pipeline to production, and monitoring for critical business metrics.
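To make the hand-offs between the three stages concrete, here is a toy sketch. A one-feature least-squares regression stands in for a real model, a mean-absolute-error threshold stands in for the quality metrics checked during integration, and a plain dict stands in for a production model registry. The function names `train`, `integrate` and `deploy` and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    slope: float
    intercept: float

    def predict(self, x):
        return self.slope * x + self.intercept


def train(xs, ys):
    """Training: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return LinearModel(slope, mean_y - slope * mean_x)


def integrate(model, xs, ys, max_mae):
    """Integration: validate on held-out data against a quality threshold."""
    mae = sum(abs(model.predict(x) - y) for x, y in zip(xs, ys)) / len(xs)
    return mae <= max_mae, mae


def deploy(model, registry, passed):
    """Deployment: promote the model to 'production' only if validation passed."""
    if passed:
        registry["production"] = model
    return passed


registry = {}
model = train([0, 1, 2, 3], [1, 3, 5, 7])  # data follows y = 2x + 1
passed, mae = integrate(model, [4, 5], [9, 11], max_mae=0.5)
deploy(model, registry, passed)
print(passed, mae, registry["production"])  # True 0.0 LinearModel(slope=2.0, intercept=1.0)
```

The key design choice is that deployment is gated on the integration result, so a model that fails validation never replaces the one already serving traffic.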

The tools and practices that help manage, automate, and monitor each of these stages are referred to as Continuous Training (CT), and Continuous Integration and Continuous Deployment (CI/CD).
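One common way to automate Continuous Training is to retrain when production inputs drift away from the training data. The sketch below assumes a deliberately simple policy: retrain when a feature's live mean moves more than `threshold` training standard deviations from its training mean. Both the policy and the function names are hypothetical; real pipelines often use more robust checks, such as a two-sample Kolmogorov-Smirnov test per feature.

```python
from statistics import mean, stdev


def drift_score(train_values, live_values):
    """Shift of the live mean from the training mean, in training standard deviations."""
    return abs(mean(live_values) - mean(train_values)) / stdev(train_values)


def should_retrain(train_values, live_values, threshold=1.0):
    # Assumed CT policy: trigger a new training run once a feature's mean
    # has drifted more than `threshold` standard deviations from training.
    return drift_score(train_values, live_values) > threshold


training_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
print(should_retrain(training_feature, [2.5, 3.0, 3.5]))  # False
print(should_retrain(training_feature, [6.0, 7.0, 8.0]))  # True
```

A scheduler would typically run a check like this on each new batch of production data and kick off the training stage when it returns `True`.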

Besides coding in your preferred frameworks and libraries, you also need a range of other tools to manage app development. These range from basic version control of your code using GitHub to Jupyter notebooks for exploring the available models and pipelines. Use Docker to isolate your app from the underlying environment, and MLflow to track your training runs and maintain a model registry. When working with large datasets, use Dask to parallelize your Python code natively and scale it up to clusters.
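To show what experiment tracking buys you, here is a hypothetical, in-memory `RunTracker` written with only the standard library. It is not MLflow's API; it simply records each run's parameters and metrics so the best run can be picked later, which is the core idea behind a tracking server and model registry. In MLflow itself, the corresponding calls are `mlflow.start_run()`, `mlflow.log_param()` and `mlflow.log_metric()`.

```python
import json
import time
import uuid


class RunTracker:
    """Hypothetical in-memory experiment tracker; a stand-in for MLflow, not its API."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        """Record one training run and return its generated ID."""
        run = {
            "run_id": uuid.uuid4().hex,
            "start_time": time.time(),
            "params": dict(params),
            "metrics": dict(metrics),
        }
        self.runs.append(run)
        return run["run_id"]

    def best_run(self, metric, higher_is_better=True):
        """Return the run with the best value of `metric`."""
        pick = max if higher_is_better else min
        return pick(self.runs, key=lambda run: run["metrics"][metric])

    def export(self):
        """Serialize all runs to JSON, e.g. for sharing or persisting."""
        return json.dumps(self.runs, indent=2)


tracker = RunTracker()
tracker.log_run({"lr": 0.1}, {"val_acc": 0.81})
tracker.log_run({"lr": 0.01}, {"val_acc": 0.88})
print(tracker.best_run("val_acc")["params"])  # {'lr': 0.01}
```

A real tracker adds persistence, artifact storage and a UI; this sketch only captures the bookkeeping.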

The tools above are among the most commonly used across projects. However, the exact set you need may differ depending on your project and its scale. Since a good CT pipeline is critical to producing a quality, reliable model faster, you should dedicate most of your time to training and building your models.

Applied deep learning stands to benefit greatly from its community of full-stack ML engineers, provided they can build and manage ML apps with small, manageable teams.

If you are a full-stack ML engineer, Segmind is the all-in-one MLOps platform you've been waiting for. Segmind gets you started with the tools mentioned above within minutes, offering one-click Jupyter notebooks, an MLflow-based tracking platform, managed Dask clusters, and more to supercharge your ML app development.

Learn more on our website, or contact us.

Rohit Ramesh

Co-Founder & CEO of Segmind