YT Comment Sentiment - ML Side

Data Handling Steps

Data Gathering

Predicting a comment's sentiment requires a dataset that pairs each comment with its corresponding sentiment label. For this, we used the dataset provided in the course.

Data Preprocessing

  • Lowercased the words.
  • Cleaned the text by removing stopwords and punctuation.
  • Applied lemmatization using WordNetLemmatizer.
  • Then applied stemming using PorterStemmer.
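
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_comment`, the tiny `STOPWORDS` set, and the pluggable `lemmatize`/`stem` arguments are all assumptions made so the sketch runs with the standard library alone.

```python
import string

# Tiny illustrative stopword list (an assumption); a real run would
# presumably use a full list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "this", "that",
             "to", "of", "and", "i"}


def identity(token):
    """Default no-op used when no lemmatizer or stemmer is supplied."""
    return token


def clean_comment(text, lemmatize=identity, stem=identity):
    """Lowercase, strip punctuation and stopwords, then lemmatize and stem."""
    text = text.lower()
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Drop stopwords after whitespace tokenization.
    tokens = [t for t in text.split() if t not in STOPWORDS]
    # Lemmatize first, then stem, matching the order listed above.
    tokens = [stem(lemmatize(t)) for t in tokens]
    return " ".join(tokens)


print(clean_comment("This VIDEO is great, I loved the editing!!!"))
```

With NLTK installed and its WordNet data downloaded, `WordNetLemmatizer().lemmatize` and `PorterStemmer().stem` could be passed as the `lemmatize` and `stem` arguments to reproduce the last two steps.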

EDA

  • Checked the distribution of the target column.
  • Performed in-depth EDA by creating many additional features from each comment's characters, words and sentences.
  • Generated word clouds to see the most frequent words for each sentiment.
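
As a rough illustration of the first two EDA steps, the sketch below counts labels and derives character, word and sentence counts per comment. The sample rows, the feature names (`num_chars`, `num_words`, `num_sentences`) and the naive sentence splitter are assumptions for the example, not taken from the project's notebook.

```python
import re
from collections import Counter


def text_features(comment):
    """Return simple length-based features for one comment."""
    words = comment.split()
    # Naive sentence split on runs of '.', '!' or '?'; enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", comment) if s.strip()]
    return {
        "num_chars": len(comment),
        "num_words": len(words),
        "num_sentences": len(sentences),
    }


# Made-up (comment, sentiment) rows standing in for the real dataset.
rows = [
    ("Great video. Learned a lot!", "positive"),
    ("Too long and boring", "negative"),
    ("Thanks for sharing", "positive"),
]

# Target distribution: how many comments carry each sentiment label.
label_counts = Counter(label for _, label in rows)

# One feature dictionary per comment.
features = [text_features(comment) for comment, _ in rows]

print(label_counts)
print(features[0])
```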

EDA Notebook

Model Building Steps

  • Comment Vectorization [text-to-vec]

    • Before vectorizing, applied basic preprocessing to the comments (lowercasing, lemmatization and stemming) to make the vectors more consistent.
    • Evaluated multiple vectorization methods, such as BOW and TF-IDF.
    • Tuned the vectorizers' hyperparameters, such as n_gram and max_features.
    • Chose the TF-IDF vectorizer to transform comment text into vectors, which are then passed to the ML model.
  • Feature Engineering

    • Created multiple new features from the comments' text, such as word count, which help the model learn the comments' sentiment better.
  • Hyperparameter Tuning

    • Used Bayesian optimization to tune the models' hyperparameters.
    • Tuned each model on its most important parameters.
    • Logged the best parameters of each model with MLflow for further evaluation.
  • Evaluation

    • Used the MLflow UI to check which model performs well on the dataset.
    • Evaluated on:
      1. Overall accuracy_score
      2. Each sentiment's recall_score, precision and f1_score values.
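
To make the evaluation criteria concrete, here is a hand-rolled sketch of overall accuracy and per-sentiment precision, recall and F1. Names such as `accuracy_score` and `f1_score` in the list match functions in scikit-learn's `sklearn.metrics`, which the project presumably used; the helper names below, the inclusion of recall, and the toy labels are assumptions for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of comments whose predicted sentiment matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Compute precision, recall and F1 separately for each sentiment label."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero when a label is never predicted
        # or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy labels (made up) for three sentiment classes.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "neg", "pos"]

print(accuracy(y_true, y_pred))
for label, scores in per_class_metrics(y_true, y_pred).items():
    print(label, scores)
```

Looking at per-sentiment scores alongside overall accuracy matters when the target classes are imbalanced, since a model can score well on accuracy while performing poorly on a minority sentiment.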