July Journal

01 Jul 2024 Anshul Raj Verma

Weekly Journal by ARV of July 2024

Week 27 Journal

  1. Created a new GitHub account with the arv@leapx.ai email to work with LeapX. (But I don’t get why.)
    • It makes no sense to create a new account just to commit to the company’s GitHub organization.
  2. Learning many things while working with LeapX.ai.
    • Finished the first interface of a two-page Streamlit dashboard.
    • Started working on a new project, a kind of AI dashboard, which uses openai and llama-index.
    • Aditya said that “Team members are liking my work”.
    • Wrote a Scrapy scraper from scratch to scrape Housing.com and pushed it to the LeapX.ai GitHub organization.
    • Wrote multiple parsers (using selectolax) to parse the HTML pages of a government website where RERA-registered agents are listed.
  3. Watched the movie “The Whale”.

Week 28 Journal

  1. Finally found Astral’s official uv docs via the deployment section of their GitHub repo. The docs are hosted at astral-sh.github.io/uv.
  2. Watched the movie Love Life (2022).
  3. Binam bought a MacBook Air M2 (8GB - 256GB).
  4. Fixed many bugs in the arv-anshul/dotfiles repository while setting up Binam’s MacBook.

Internship at LeapX.ai

2024-07-08

  • Started refactoring the big, messy code (where all dfs were stored in st.session_state) and pushed the changes to a new repository, campaign-dashboard.
  • Tried scraping data for different major cities like Gurgaon, Delhi, Hyderabad, Mumbai, etc., but failed due to blocking. So Aditya asked Pritam (a USA-based guy with better knowledge), who advised: “use slow and randomized scraping using Docker”.
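
A minimal sketch of what the “slow and randomized” part of that advice could look like in Scrapy. The setting names (DOWNLOAD_DELAY, RANDOMIZE_DOWNLOAD_DELAY, CONCURRENT_REQUESTS_PER_DOMAIN, AUTOTHROTTLE_*) are real Scrapy settings, but the values are assumptions for illustration, and `randomized_delay` is a hypothetical helper that only mirrors how Scrapy randomizes the delay. It is not the configuration the actual scraper used.

```python
import random
from typing import Optional

# Assumed values for illustration; they would need tuning per website.
SLOW_SCRAPING_SETTINGS = {
    "DOWNLOAD_DELAY": 5.0,                # base delay (seconds) between requests
    "RANDOMIZE_DOWNLOAD_DELAY": True,     # wait 0.5x to 1.5x of DOWNLOAD_DELAY
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # one request at a time per domain
    "AUTOTHROTTLE_ENABLED": True,         # back off when the server slows down
    "AUTOTHROTTLE_START_DELAY": 5.0,
    "AUTOTHROTTLE_MAX_DELAY": 60.0,
}


def randomized_delay(base_delay: float, rng: Optional[random.Random] = None) -> float:
    """Hypothetical helper: pick a delay between 0.5x and 1.5x of base_delay,
    the same range Scrapy uses when RANDOMIZE_DOWNLOAD_DELAY is enabled."""
    rng = rng if rng is not None else random.Random()
    return rng.uniform(0.5 * base_delay, 1.5 * base_delay)
```

In a spider, a dict like this could be assigned to the `custom_settings` class attribute so it only affects that spider.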

2024-07-09

  • 🤒 Took a day off due to fever and loose motions.

2024-07-10

  • More refactoring in campaign-dashboard (almost completed).
  • Installed MongoDB Compass on my laptop to connect to the LeapX cluster where the data scraped from Housing.com gets stored.

2024-07-11

  • Created HuggingFace account with arv@leapx.ai ID.
  • Learned how to use HF Spaces and created a GH Action in the campaign-dashboard repo to push code from GH to LeapX’s HF Space, where the Streamlit app gets deployed. (Did the whole thing in 1 hour :scream:)

2024-07-12

  • Learned about Scrapy pipelines and added one to move scraped data into the MongoDB cluster; also added data transformation code (using Polars) for the scraped data.
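
A rough sketch of how a Scrapy item pipeline that writes to MongoDB can look, following the general pattern from the Scrapy docs. The setting names `MONGO_URI`, `MONGO_DATABASE` and `MONGO_COLLECTION`, along with their default values, are hypothetical; this is not the actual LeapX pipeline, and the Polars transformation step is left out.

```python
class MongoPipeline:
    """Sketch of a Scrapy item pipeline that stores every item in MongoDB."""

    def __init__(self, mongo_uri, db_name, collection_name):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Hypothetical setting names, read from the project's settings.py.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            db_name=crawler.settings.get("MONGO_DATABASE", "scraped"),
            collection_name=crawler.settings.get("MONGO_COLLECTION", "properties"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be loaded without pymongo installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Store a plain-dict copy and hand the item on to the next pipeline.
        self.collection.insert_one(dict(item))
        return item
```

The pipeline would then be enabled through the `ITEM_PIPELINES` setting of the Scrapy project.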

Week 29 Journal

  1. Maintaining explanations and descriptions of the projects I have done at LeapX as a Data Science Intern in the thoughts folder.

Internship at LeapX.ai

2024-07-15

  • Modified housing-dashboard so that it can ingest data from MongoDB.
  • Scraped 10k+ records for Delhi and Gurgaon using properties-scraper.
  • housing-dashboard is running on HuggingFace Spaces.

2024-07-16

  • Discussed automating properties-scraper with @Anurag.
  • Modified the properties-scraper codebase for the same.
  • [WIP] Also wrote code to scrape Maharashtra RERA agents.

2024-07-17

  • Meeting with @Pratham about turning campaign-bot into an API.
  • Meeting with @Vishal & @Aditya about housing-dashboard problems.
  • Discussed the workflow and API architecture of the campaign-bot API and created a flowchart diagram for the same.

Currently assigned Projects to @Anshul

Hi Aditya, I am Anshul, currently assigned to almost 4 projects, and it's hard to manage and discuss all of them with
different team members. It would be very helpful if you could prioritize these projects so that I can work on them
efficiently without taking too much stress.
 
**Projects Description**
 
1. `properties-scraper`: Wrap the project with Docker so that it can scrape property data for multiple cities using a
   `cron` job. This requires many modifications to the project, like dynamically loading the `URL` and `city` name.
   I've already discussed it with @Anurag.
 
2. `housing-dashboard`: Yesterday (2024-07-17), in a meeting with @Aditya and @Vishal, we discussed some problems
   with this project and decided to make some major changes to it.
 
3. `campaign-bot`: Yesterday (2024-07-17), in a meeting with @Pratham, we discussed creating an API around this
   to seamlessly integrate the bot into the frontend.
 
4. `campaign-dashboard`: Recently refactored this project (code given by @Aditya) and deployed it on HuggingFace
   Spaces, but the project has very high latency due to bad code management. So this project also needs improvements
   to reduce latency, and a better architecture.
 
Best Regards, Anshul Raj Verma Data Science Intern

2024-07-18

  • Meeting with @Pratham and @Aditya for campaign-bot API.
  • There is a little progress in the campaign-bot code, but I am getting unexpected results after passing user_prompt to query_pipeline.
  • The pipeline is not working as expected, and there are more concerns, like how to retrieve data for visualization and pass it in the API response.

2024-07-21

Building campaign-bot-api with @Aditya and @Pratham; it is almost complete. Built with FastAPI and JWT auth (learned on the go).
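
For my own reference, a minimal sketch of what JWT auth boils down to: signing and verifying an HS256 token. It uses only the Python standard library so it stays self-contained; the real campaign-bot-api would rely on FastAPI and a JWT library instead, and the function names `create_token` and `verify_token` are made up for this sketch.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without the trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(payload: dict, secret: str) -> str:
    """Build a token of the form header.payload.signature (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: str) -> dict:
    """Return the payload if the signature and expiry are valid, else raise ValueError."""
    pieces = token.split(".")
    if len(pieces) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = pieces
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = _sign(header_b64 + "." + payload_b64, secret)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if "exp" in payload and payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```

In an API, the token would be issued at login and sent back by the client in an `Authorization: Bearer <token>` header, which the server checks on every protected route.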

Week 30 Journal

  1. Dropping the idea of journaling my LeapX.ai work.
  2. Learning LangChain and RAG for internship projects.
  3. Discussed user search intent classification with @Gagandeep & @Vishal.
  4. Completed my first month at LeapX.ai as a Data Science Intern.

Week 31 Journal

  1. Team dinner at LeapX.ai. They ordered a meal for me too :partying_face:
  2. Worked on the same Search Intent Classifier project the whole week. Trained the distilbert-base-uncased model 2 times.
  3. :medal: First time following and loving Olympics matches and athletes :saluting_face:
    • :star_struck: Loving the performances of Lakshya Sen and the Hockey Team.
  4. Using and kind of loving a new alternative to Jupyter Notebook: Marimo.