Week 23 Journal
- :partying_face: Found the solution to the last point in Week 22, which occurs with the global Taskfile.
    - Just use the `USER_WORKING_DIR` special variable of Taskfile:

      ```yaml
      pc-all:
        desc: Run pre-commit on all files
        dir: "{{.USER_WORKING_DIR}}" # Refer to the absolute path of the directory `task` was called from.
        cmd: pre-commit run --all-files
      ```
    - Related Docs:
- Created a HuggingFace account because now I want to dive into LLMs and create some basic applications using them; in the flow of creating something, I can learn DL and LLM concepts too.
- :ballot_box: Created an India Election 2024 dashboard using Streamlit. Used `httpx.AsyncClient`, `polars.LazyFrame` and only `async` functions.
    - Created the dashboard the same day and posted about it on LinkedIn :star_struck:.
- Added a reference link in the sidebar to the @arv-anshul/notebooks and @arv-anshul/dotfiles repos on the `/project` page.
- Asked in mkdocs-material/discussions for a feature through which I can replace the bullets of lists with an SVG icon.
    - A guy suggested a way to achieve that, but it is complicated and not very flexible.
- @arv-anshul/yt-watch-history-v2
    - :sparkles: Add Channel Recommender System.
    - :recycle: Refactor the CTT ML Model.
- A nice discussion with @MayankVanik around Gen AI, LLMs, AI Agents, LangChain, LangGraph and HuggingFace.
- Just realized the best coding font for me, which is the Recursive font. BTW, I'd used it earlier but rejected it; now that I've realized its casual nature, I fell for it. Now, I am using its `Rec Mono Casual` (for markdown & docs) and `Rec Mono Duotone` (for coding) font variants.
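The only-`async` pattern used for the election dashboard above can be sketched roughly like this. Everything below is invented for illustration (the state codes, the fake fetcher); the real dashboard would use `httpx.AsyncClient` against the actual result pages instead of the stub:

```python
import asyncio

# Hypothetical state codes; the real dashboard fetched live result data.
STATES = ["UP", "MH", "WB"]

async def fetch_state(state: str) -> dict:
    """Stand-in for an httpx.AsyncClient GET returning one state's results."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"state": state, "status": "ok"}

async def fetch_all(states: list[str]) -> list[dict]:
    # Fire all requests concurrently, one coroutine per state.
    return list(await asyncio.gather(*(fetch_state(s) for s in states)))

rows = asyncio.run(fetch_all(STATES))
print(rows)
```

The point of `asyncio.gather` here is that all states are fetched concurrently rather than one after another, which is why the all-`async` approach pays off for a dashboard hitting many endpoints.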
Week 24 Journal
- I saw a guy in the "We write code" server by Hitesh Chaudhary who has an amazing portfolio; he has done amazing work around Full Stack Development and has also contributed to multiple (big) open-source repos :AMAZING:. He made a full-fledged WORKING clone of the PW website and also made a desktop app of PW using Tauri (Rust). He has an amazing portfolio website. There are so many things I can learn from him. @arnvgh
- Completed the refactoring of CTT Model in @arv-anshul/yt-watch-history-v2 project. See release v0.3.0.
- Created a community server on Discord and added many interactive features which help others to connect and learn together as a community.
    - After creating the Discord server, I deleted it because I don't think this is for me.
- Wrote a clear project explanation paragraph for the @arv-anshul/yt-watch-history-v2 project. See it on the website.
- An employee from AABEE (a real estate startup) reached me through my OG 99acres-scrape project. He conducted a meeting on G-Meet and gave me a task to scrape the older prices of properties to track the property price trend, but the data is not there on the website. That's why he said "we'll reach you later for any other work" :disappointed:.
- The 3 co-founders of the LeapX.ai startup reached me through my OG 99acres-scrape project. They conducted a meeting on MS Teams which went for 48 minutes, during which they asked about my education and data science knowledge. In the end, they asked me to explain (not solve; just explain) a problem, and on the basis of that explanation they offered me an internship :star_struck:.
- I first interacted with Aditya; when he came to know that I am in class 12th, he started asking me about my father and his profession, my place, how I study there, and which school.
- One of them is from Makhdumpur :star_struck:.
- In the LeapX.ai interview I was explaining my @arv-anshul/yt-watch-history-v2 project, and while explaining I mentioned the User Sentiment Analysis model which is able to predict the user's sentiment for programming, politics, entertainment, movie genre, etc. But the problem is that I haven't implemented it (in short, I haven't even checked it in an ipynb). So, I decided to create a notebook around it and try to explain how to do it.
    - :HARD: An ML model which predicts the user's sentiment around programming, politics, entertainment.
    - :EASY: A clustering model which clusters similar videos.
    - :EASY: A wordcloud which shows the most common words around programming, politics, entertainment from the user's watch history; by analysing that wordcloud, anyone is able to determine their sentiment/habit.
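A rough sketch of the :EASY: wordcloud idea above: before drawing anything, you just need word frequencies from the watch-history titles. The titles and stopword list below are made up for illustration; a real notebook would feed these counts into a wordcloud library:

```python
import re
from collections import Counter

# Invented watch-history titles standing in for the real Takeout data.
TITLES = [
    "Python tutorial for beginners",
    "Election results explained: politics today",
    "Python vs Rust: which to learn",
]

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"for", "to", "the", "which", "vs", "today"}

def top_words(titles: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Tokenize titles, drop stopwords, return the n most common words."""
    words = (
        w
        for title in titles
        for w in re.findall(r"[a-z]+", title.lower())
        if w not in STOPWORDS
    )
    return Counter(words).most_common(n)

print(top_words(TITLES))
```

The top words themselves already hint at the habit (here "python" dominates), which is exactly what the wordcloud would make visual.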
Week 25 Journal
- Project og-py to create custom GitHub repo social preview images using https://og-playground.vercel.app/playground.
- :star: Working at LeapX.ai as an Intern.
- :llama: Tried using `ollama` in the Zed editor. It is great, but the response time is very bad; still, I'll use it.
- Read/Watch something about Playwright (a better alternative to Selenium).
- :handshake: Connected with @ujjwal-basnet and @iamrajharshit from Discord.
Internship at LeapX.ai
2024-06-19
After a one-day off, we (me and Aditya) connected and discussed the affluent-properties clustering problem. I had created some dashboards around it, but after the meeting the conclusion was that the data is not good because many properties were removed from 99acres's website. So we decided to scrape fresh data from the website, and I have to write code for that.
2024-06-20
While scraping data from 99acres with Selenium I got many errors, so I decided to scrape data from other websites like Housing.com or MagicBricks.com.
I am planning to use selenium-wire. And I want to scrape more than one website and create a pipeline through which the different websites' data combine to form a wholesome dataset to work on.
2024-06-21
Got errors while using the selenium-wire package, so I tried to learn and use Playwright. Then I again tried to scrape using `httpx` with headers and cookies, and this time I got success: I scraped almost 3K rows this way. For now I'm sticking with this method.
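The headers-and-cookies trick looks roughly like this. Shown with stdlib `urllib` so the sketch runs anywhere (the real code used `httpx`), and the URL, header and cookie values are placeholders; the real ones were copied from the browser's network tab:

```python
import urllib.request

# Placeholder values; the real ones come from the browser's network tab.
URL = "https://example.com/api/search?city=gurgaon"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "Cookie": "session=placeholder",
}

req = urllib.request.Request(URL, headers=HEADERS)
# The request now carries the browser-like headers; sending it is just
# urllib.request.urlopen(req) (not done here, to keep the sketch offline).
print(req.get_header("User-agent"))
```

With `httpx` the same idea is passing `headers=` and `cookies=` to the client; the site then sees a request indistinguishable from the browser session the values were copied from.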
Week 26 Journal
- Thinking of creating a GitHub repo where all the scraping code for different websites lives. The repo would also maintain a `README.md` file for the different scraping projects, tools and resources.
- Used `st.Page` (the new multipage UI update from Streamlit) while building dashboards for LeapX.
- :star: Got an offer from LeapX.ai for a 3-month internship.
- I had been writing many scraping scripts for websites like Housing.com, MagicBricks.com, Naukri.com, etc. That's why I'm thinking of publishing them as blogs: they are not full-fledged projects, just Python scripts using which we can scrape websites, and there is no guarantee the scripts will work forever.
    - :scroll: I'll just explain each script and paste it there in the blog.
- Added LeapX.ai Internship on LinkedIn.
- Started learning LangChain from YouTube after Aditya told me to do so.
- 🇮🇳 WON T20I
Internship at LeapX.ai
2024-06-25
Meeting started after :clock8: 20:00 and ended at :clock1030: 22:30. We worked on a dataset scraped from Housing.com containing ~12K rows.
We took a different approach to clustering properties: used the `polars.Expr.qcut` function to categorize properties on the basis of `PRICE_PER_UNIT_AREA`. After this, created a dashboard around it (with Streamlit).
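What `qcut` does can be illustrated with stdlib `statistics.quantiles`. The price values below are invented, and the three-bin split with these labels is my own assumption for the sketch; the real cut ran over the ~12K-row `PRICE_PER_UNIT_AREA` column in polars:

```python
import statistics

# Invented PRICE_PER_UNIT_AREA values standing in for the scraped column.
prices = [4000, 5200, 6100, 7000, 8500, 9900, 12000, 15000, 21000]

# Two tertile cut points, analogous to a three-bin polars.Expr.qcut.
low, high = statistics.quantiles(prices, n=3)

def label(price: float) -> str:
    """Map a price into one of three hypothetical dashboard categories."""
    if price <= low:
        return "Affordable"
    if price <= high:
        return "Value"
    return "Luxury"

labels = [label(p) for p in prices]
print(labels)
```

Quantile-based bins like this guarantee roughly equal-sized categories regardless of how skewed the price distribution is, which is why `qcut` beats fixed price thresholds here.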
Now the main problem is to find a dense region and draw a circle around it to mark that region with a label, e.g. Luxury, Value, Affordable, etc. For this I have to write a function which does this finding and creates the regions.
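One way such a function could be sketched (this is my own rough idea, not the code we settled on; the coordinates and the `coverage` heuristic are invented): take the centroid of one label's points and a radius that covers most of them:

```python
import math

# Invented (lat, lon) points for one label, e.g. "Luxury" properties.
points = [(28.45, 77.02), (28.46, 77.03), (28.44, 77.01), (28.47, 77.05)]

def region_circle(pts: list[tuple[float, float]], coverage: float = 0.8):
    """Return (centroid, radius) where the radius covers `coverage` of pts."""
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    # Distance from the centroid to every point, nearest first.
    dists = sorted(math.dist((cx, cy), p) for p in pts)
    k = max(1, math.ceil(coverage * len(pts)))  # k-th nearest point
    return (cx, cy), dists[k - 1]

center, radius = region_circle(points)
print(center, radius)
```

Using the k-th nearest distance instead of the maximum keeps one far-away outlier from blowing up the circle, which matters when marking a "dense" region.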
2024-06-26
No meeting conducted.
Used the `httpx` library to scrape all 21K properties in Gurgaon from the Housing.com website; faced a lot of website downtime but eventually got things working.
Received the offer letter from LeapX.
2024-06-29
Completed the two-page Streamlit dashboard.
They told me to learn about AI stuff like LangChain and LlamaIndex. Already found an awesome video on it.