Week 27 Journal
- Created a new GitHub account with the `arv@leapx.ai` email to work with LeapX. (But I don't get this: there's no sense in creating a new account just to commit to the company's GitHub organization.)
- Learning many things while working with LeapX.ai.
- Finished the first interface of the two-page Streamlit dashboard.
- Started working on a new project which uses `openai` and `llama-index`; it is a kind of AI dashboard.
- Aditya said that "team members are liking my work".
- Wrote a `scrapy` scraper from scratch to scrape Housing.com and pushed it to the LeapX.ai GitHub organization.
- Wrote multiple parsers (using `selectolax`) to parse HTML pages of government websites where RERA-registered agents were listed.
- Watched movie “The Whale”.
Week 28 Journal
- Finally found Astral's official `uv` docs via their GitHub repo's deployment section. The docs live at astral-sh.github.io/uv.
- Watched the movie Love Life (2022).
- Binam bought a MacBook Air M2 (8 GB / 256 GB).
- Fixed many bugs in the arv-anshul/dotfiles repository while setting up Binam's MacBook.
Internship at LeapX.ai
2024-07-08
- Started refactoring the big messy code (where all `df`s were stored in `st.session_state`) and pushed the changes to a new repository, `campaign-dashboard`.
- Tried scraping data for major cities like Gurgaon, Delhi, Hyderabad, and Mumbai, but failed due to blocking. So Aditya asked Pritam (a USA-based guy with more scraping experience), who advised: "use slow and randomized scraping using Docker".
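The "slow and randomized" advice maps to a few standard Scrapy settings. Below is a minimal sketch; the setting names are real Scrapy options, but the values (and the user agent) are illustrative guesses, not the actual project config:

```python
import random

# Sketch of "slow and randomized" Scrapy settings (values are illustrative).
SETTINGS = {
    "DOWNLOAD_DELAY": 5.0,               # base wait between requests, in seconds
    "RANDOMIZE_DOWNLOAD_DELAY": True,    # Scrapy then waits 0.5x..1.5x the base delay
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1, # never hammer one site in parallel
    "AUTOTHROTTLE_ENABLED": True,        # back off automatically when responses slow down
}

def next_delay(base: float = SETTINGS["DOWNLOAD_DELAY"]) -> float:
    """Mimic Scrapy's randomized delay: uniform in [0.5 * base, 1.5 * base]."""
    return random.uniform(0.5 * base, 1.5 * base)
```

Randomizing the delay avoids the fixed request cadence that anti-bot systems detect easily; running the spider in Docker (as Pritam suggested) then just packages this setup for scheduled runs.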
2024-07-09
- 🤒 Took a day-off due to fever and loose motion.
2024-07-10
- More refactoring in `campaign-dashboard` (almost complete).
- Downloaded MongoDB Compass on my laptop to connect to the LeapX cluster where scraped data from Housing.com gets stored.
2024-07-11
- Created a HuggingFace account with the `arv@leapx.ai` ID.
- Learned how to use HF Spaces and created a GH Action in the `campaign-dashboard` repo to push code from GitHub to LeapX's HF Space, where the Streamlit app gets deployed. (Did the whole thing in one hour :scream:)
2024-07-12
- Learned about Scrapy pipelines and added one to move scraped data into the MongoDB cluster; also added data transformation code (using Polars) for the scraped data.
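A Scrapy-to-MongoDB pipeline like the one above usually follows Scrapy's standard item-pipeline hooks (`open_spider` / `process_item` / `close_spider`). This is a hedged sketch, not the project's actual code; the URI, database, and collection names are placeholders:

```python
# Minimal sketch of a Scrapy item pipeline that writes items to MongoDB.
# The connection URI and db/collection names below are hypothetical.

class MongoPipeline:
    def __init__(self, uri: str = "mongodb://localhost:27017",
                 db: str = "leapx", collection: str = "properties"):
        self.uri, self.db_name, self.coll_name = uri, db, collection
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        # pymongo is imported lazily so this sketch stays importable without it.
        from pymongo import MongoClient
        self.client = MongoClient(self.uri)
        self.collection = self.client[self.db_name][self.coll_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; insert it and pass it on.
        self.collection.insert_one(dict(item))
        return item
```

It would be enabled in `settings.py` via the usual `ITEM_PIPELINES = {"<project>.pipelines.MongoPipeline": 300}` mapping (module path hypothetical); any Polars transformation can then read the collection downstream.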
Week 29 Journal
- Maintaining explanations/descriptions of the projects done with LeapX as a Data Science Intern in the thoughts folder.
Internship at LeapX.ai
2024-07-15
- Modified `housing-dashboard` so that it can ingest data from MongoDB.
- Scraped 10k+ listings for Delhi and Gurgaon using `properties-scraper`.
- `housing-dashboard` is running on HuggingFace Spaces.
2024-07-16
- Discussed automating `properties-scraper` with @Anurag.
- Modified the `properties-scraper` codebase for the same.
- [WIP] Also wrote code to scrape Maharashtra RERA agents.
2024-07-17
- Meeting with @Pratham about transforming `campaign-bot` into an API.
- Meeting with @Vishal & @Aditya about `housing-dashboard` problems.
- Discussed the workflow and API architecture of the `campaign-bot` API and created a flowchart diagram for the same.
Projects currently assigned to @Anshul
Hi Aditya, I am Anshul, currently assigned almost 4 projects, and it's hard to manage and discuss all of them with
different team members. It would be very helpful if you prioritized these projects so that I can work on them
efficiently without taking too much stress.
**Projects Description**
1. `properties-scraper`: Wrap the project with Docker so that it can scrape multiple cities' property data via a `cron`
job. This requires several modifications, like dynamically loading the `URL` and `city` name. I've already discussed
it with @Anurag.
2. `housing-dashboard`: Yesterday (2024-07-17), in a meeting with @Aditya and @Vishal, we discussed some problems
regarding this project and decided to make some major changes to it.
3. `campaign-bot`: Yesterday (2024-07-17), in a meeting with @Pratham, we discussed creating an API system around this
bot to seamlessly integrate it into the frontend.
4. `campaign-dashboard`: Recently refactored this project (code given by @Aditya) and deployed it on HuggingFace Spaces,
but the project has very high latency due to poor code management. So this project also needs latency improvements
and a better architecture.
Best Regards, Anshul Raj Verma, Data Science Intern

2024-07-18
- Meeting with @Pratham and @Aditya about the `campaign-bot` API.
- There is a little progress in the `campaign-bot` code, but I'm getting unexpected results after passing `user_prompt` into `query_pipeline`.
- The pipeline is not working as expected, and there are more concerns, like how to retrieve data for visualization and pass it as an API response.
2024-07-21
Building `campaign-bot-api` with @Aditya and @Pratham; almost completed it. Made with FastAPI with JWT auth (learned
on the go).
Week 30 Journal
- Dropping the idea of journaling the LeapX.ai work.
- Learning LangChain and RAG for internship projects.
- Discussed user search-intent classification with @Gagandeep & @Vishal.
- Completed my first month at LeapX.ai as Data Science Intern.
Week 31 Journal
- Team dinner at LeapX.ai. They ordered a meal for me too :partying_face:
- Worked on the same Search Intent Classifier project the whole week. Trained the `distilbert-base-uncased` model twice.
- :medal: First time following and loving the Olympics matches and athletes :saluting_face:
- :star_struck: Loving Lakshya Sen's and the Hockey Team's performances.
- Using and kind of loving Marimo, the new alternative to Jupyter Notebook.