
YT Comment Sentiment - Backend Side

Technology    Description
scikit-learn  A Python library for building and training machine learning models.
DagsHub       A collaboration platform for machine learning, hosting data and MLflow models.
MLflow        A platform to manage the ML lifecycle, including model tracking and deployment.
FastAPI       A modern web framework for building APIs with Python, known for its speed.
Pydantic      A Python library for data validation. Used to validate API data.
Pytest        A testing framework for Python, used to test the FastAPI application.
YouTube       An API to access and manage YouTube video data, including comments.
Render        A cloud platform for hosting APIs, websites, and applications.

What did I follow to learn this?

Important

  1. I have been learning Python, Data Science, and Machine Learning for more than 3 years, so I didn't have to look around and learn new things to build this. This part was fairly easy for me.
  2. But as I said earlier, the documentation and ChatGPT are the most important resources you can rely on. 😉
  • You need to get an API key from the Google Developer Console to interact with the YouTube Data API.
  • You need to create an account on DagsHub to store and track MLflow experiments and models.
  • I created a DVC pipeline to run the MLflow experiments seamlessly with the dvc repro command.
  • After creating the FastAPI app, I used pytest to test it and also set up a pre-commit hook for it.
  • Deployment is on render.com.
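
To make the YouTube Data API step more concrete, here is a minimal sketch of how one page of comments could be requested and read. It is not this project's actual code: the function names `build_comment_threads_url` and `extract_comment_texts` are hypothetical, and the sketch only builds the request URL and parses an already decoded JSON response, so it makes no network call. The `commentThreads` endpoint, its query parameters, and the response fields are the ones from the public API reference.

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 for top-level comment threads.
COMMENT_THREADS_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_comment_threads_url(video_id, api_key, page_token=None, max_results=100):
    """Build the request URL for one page of comment threads (no network call)."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,  # the API accepts 1 to 100 per page
        "textFormat": "plainText",
        "key": api_key,  # the key from the Google Developer Console
    }
    if page_token:
        # Returned as "nextPageToken" in the previous page's response.
        params["pageToken"] = page_token
    return f"{COMMENT_THREADS_URL}?{urlencode(params)}"


def extract_comment_texts(payload):
    """Pull the top-level comment text out of a decoded commentThreads response."""
    texts = []
    for item in payload.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        texts.append(snippet["textDisplay"])
    return texts
```

The URL can be fetched with any HTTP client, and the decoded JSON body passed to `extract_comment_texts` to get the list of strings to feed into the sentiment model.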

What kinds of problems did I face?

Render.com

  • I used uv to manage my project, but render.com doesn't support uv out of the box, so I used pip to install uv and then let uv install the dependencies.

    pip install uv && uv sync --extra=backend --compile-bytecode
    
  • Also, render.com only serves apps on the port given by the $PORT env variable (which is 10000 most of the time), so make sure to pass it explicitly when running the app through the uvicorn or fastapi-cli CLI.

    # For uvicorn
    uvicorn backend.app:app --host 0.0.0.0 --port $PORT
    
    # For fastapi-cli
    fastapi run --host 0.0.0.0 --port $PORT backend/app.py
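
As an alternative to passing the port on the command line, the same idea can be sketched in Python. This is an assumption about how one might wire it up, not what this project does: `resolve_port` and `main` are hypothetical names, and the 10000 fallback simply mirrors the value mentioned above.

```python
import os

DEFAULT_PORT = 10000  # the value $PORT usually has on Render, per the note above


def resolve_port(environ=None):
    """Return the port from $PORT, falling back to DEFAULT_PORT when unset or empty."""
    env = os.environ if environ is None else environ
    raw = env.get("PORT", "")
    return int(raw) if raw.strip() else DEFAULT_PORT


def main():
    # Imported lazily so resolve_port() can be used without uvicorn installed.
    import uvicorn

    # "backend.app:app" is the same import string used in the CLI commands.
    uvicorn.run("backend.app:app", host="0.0.0.0", port=resolve_port())
```

Calling `main()` from a small entry-point module would then start the server on whatever port the platform provides, and on port 10000 when `$PORT` is not set.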
    

Docker

  • I use wordcloud to create a plot in a FastAPI route. While building the Docker image from the python:3.11-slim base image, I got an error because the wordcloud package needs gcc to build its wheels. So you need to install gcc explicitly before installing wordcloud as a Python package.

    # Install gcc for wordcloud
    RUN apt-get update && apt-get install -y gcc && rm -rf /var/lib/apt/lists/*
    
    # Now install project dependencies including wordcloud
    # ...
    
  • Also, use multi-stage builds in the Dockerfile to reduce the image size. See the uv docs.