Most teens use group chats to swap memes and homework screenshots. A recent HigherEds bootcamp team asked a different question: what if the group chat itself became a private tutor? By the end of a single month they had shipped a Telegram bot that quizzes users on class notes, tracks weak topics and recommends custom study blocks. The project proves that with a clear sprint plan, open-source tools and on-call 1:1 mentorship, ambitious students can move from idea to deployment in record time.
Week-by-Week Build
| Week | Goals | Key Tools | Bootcamp Touchpoints |
| --- | --- | --- | --- |
| 1 | Define features and gather content (biology, calculus notes) | Obsidian export, Python parser | Mentor session on prompt engineering and token limits |
| 2 | Create a vector database for instant retrieval | OpenAI embeddings, Pinecone | Tutor deep-dive on cosine similarity and indexing |
| 3 | Build the dialogue manager | LangChain, Telegram API, Flask | Code clinic on rate limits and webhook security |
| 4 | Add analytics and launch to classmates | Google Colab dashboard, SQLite | Demo day rehearsal with feedback on UX copy |
The sprint board lived on GitHub Projects; each card linked to commits so mentors could review code asynchronously. That workflow alone saved more than ten hours of back-and-forth and mirrors the agile patterns used in professional AI teams.
Deep-Dive on Key Build Elements
1. Content Pipeline
The Markdown parser strips YAML front-matter, cleans LaTeX, and splits notes into 300-token chunks to maximise retrieval precision.
Each chunk is stored in Pinecone with two metadata keys, subject and difficulty, so the bot can filter biology vs. calculus on demand.
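A minimal sketch of that pipeline, assuming the tiktoken tokenizer for counting and placeholder names (split_into_chunks, to_records) rather than the team's actual functions; raw note text stays local, matching the privacy design described later:

```python
import re
import tiktoken  # used only to count tokens when packing 300-token chunks

enc = tiktoken.get_encoding("cl100k_base")

def strip_front_matter(text: str) -> str:
    # Drop a leading YAML block delimited by '---' lines
    return re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)

def split_into_chunks(text: str, max_tokens: int = 300) -> list[str]:
    # Greedily pack paragraphs into chunks of at most max_tokens tokens
    chunks, current = [], []
    for para in text.split("\n\n"):
        candidate = "\n\n".join(current + [para])
        if current and len(enc.encode(candidate)) > max_tokens:
            chunks.append("\n\n".join(current))
            current = [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def to_records(note_text: str, subject: str, difficulty: str):
    # Pinecone records carry only an anonymised chunk id plus filterable metadata;
    # the chunk text stays in a local store keyed by that id. Embedding vectors
    # are added in the next step, before upserting.
    clean = strip_front_matter(note_text)
    records, local_text = [], {}
    for i, chunk in enumerate(split_into_chunks(clean)):
        chunk_id = f"{subject}-{i:04d}"
        records.append({"id": chunk_id, "metadata": {"subject": subject, "difficulty": difficulty}})
        local_text[chunk_id] = chunk
    return records, local_text
```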
2. Embedding & Retrieval
Selected text-embedding-3-small for its low cost (about $0.02 per 1M tokens); 300-token chunks sit comfortably within its input limit.
Index dimension: 1536, similarity metric: cosine, top-k: 5.
Retrieval latency averaged 42 ms per query on Pinecone’s Starter tier.
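In rough code, the retrieval step might look like this, assuming the current OpenAI and Pinecone Python clients and an index name (study-notes) chosen purely for illustration:

```python
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()                            # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")   # placeholder key
index = pc.Index("study-notes")              # assumed index name: 1536-dim, cosine metric

def retrieve_chunk_ids(question: str, subject: str, top_k: int = 5) -> list[str]:
    # Embed the question with text-embedding-3-small (1536-dimensional output)
    emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    # Cosine-similarity search, filtered by subject so biology and calculus stay separate
    result = index.query(vector=emb, top_k=top_k, filter={"subject": subject})

    # Only anonymised chunk ids come back; the bot maps them to note text locally
    return [match.id for match in result.matches]
```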
3. Dialogue Manager
LangChain Retrieval-QA chain with re-ranking: aggregate the top-k chunks first, then prepend them to the system prompt.
The prompt template reserved 1,000 tokens for context and 300 for the user question, leaving 200 tokens of answer headroom to stay within GPT-3.5-Turbo limits.
Added a Confidence Survey (“Was this helpful? 1–5”). Score ≤ 3 triggers follow-up questions tagged in SQLite.
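A simplified illustration of that token budget, shown here as a direct OpenAI chat call rather than the team's LangChain chain; the system prompt wording and character guard are placeholders:

```python
from openai import OpenAI
import tiktoken

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")

SYSTEM_TEMPLATE = (
    "You are a study tutor. Answer only from the notes below.\n\nNOTES:\n{context}"
)

def answer(question: str, ranked_chunks: list[str]) -> str:
    # Pack re-ranked chunks into the prompt until the 1,000-token context budget is spent
    context, used = [], 0
    for chunk in ranked_chunks:
        n = len(enc.encode(chunk))
        if used + n > 1000:
            break
        context.append(chunk)
        used += n

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        max_tokens=200,  # the 200-token answer headroom from the budget above
        messages=[
            {"role": "system", "content": SYSTEM_TEMPLATE.format(context="\n---\n".join(context))},
            {"role": "user", "content": question[:1200]},  # rough guard near the 300-token question budget
        ],
    )
    return response.choices[0].message.content
```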
4. Telegram Integration
A single Flask route, /webhook, receives updates.
The Telegram secret token is stored in GCP Secret Manager and loaded via an environment variable at runtime.
Implemented exponential back-off to handle 429 rate-limit errors from Telegram.
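A stripped-down version of that handler could look like the following, assuming the requests library, hypothetical environment-variable names, and Telegram's X-Telegram-Bot-Api-Secret-Token header check; the real bot does more per update:

```python
import os
import time
import requests
from flask import Flask, request, abort

app = Flask(__name__)
BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]            # injected at runtime (e.g. from Secret Manager)
WEBHOOK_SECRET = os.environ["TELEGRAM_WEBHOOK_SECRET"]

@app.route("/webhook", methods=["POST"])
def webhook():
    # Telegram echoes the secret token set on setWebhook in this header on every update
    if request.headers.get("X-Telegram-Bot-Api-Secret-Token") != WEBHOOK_SECRET:
        abort(403)
    update = request.get_json(force=True)
    chat_id = update["message"]["chat"]["id"]  # simplified: assumes a plain text message
    send_message(chat_id, "Got it! Working on your question...")
    return "ok"

def send_message(chat_id: int, text: str, retries: int = 5) -> None:
    # Exponential back-off on Telegram's 429 rate-limit responses
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    delay = 1.0
    for _ in range(retries):
        resp = requests.post(url, json={"chat_id": chat_id, "text": text}, timeout=10)
        if resp.status_code != 429:
            return
        time.sleep(delay)
        delay *= 2
```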
5. Adaptive Spaced Repetition
Each Q&A round logs user_id, chunk_id, confidence, and timestamp.
Every midnight, a cron job runs an Ebbinghaus decay function to schedule the next quiz date.
Low-confidence chunks surface sooner; mastered chunks push out to 7-day, 14-day, 30-day intervals.
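A simplified version of that nightly job, using a plain interval ladder in place of the full Ebbinghaus curve; the review_state table and streak column are illustrative, not the team's actual schema:

```python
from datetime import datetime, timedelta
import sqlite3

# Review ladder for mastered chunks; low confidence resets the ladder
INTERVALS_DAYS = [1, 3, 7, 14, 30]

def next_review(confidence: int, streak: int) -> tuple[timedelta, int]:
    # Confidence <= 3 surfaces the chunk again tomorrow and resets the streak;
    # higher confidence climbs toward the 7-, 14-, and 30-day intervals
    if confidence <= 3:
        return timedelta(days=1), 0
    step = min(streak, len(INTERVALS_DAYS) - 1)
    return timedelta(days=INTERVALS_DAYS[step]), streak + 1

def reschedule(db_path: str = "study_bot.db") -> None:
    # Nightly pass: read the latest confidence per chunk and write the next quiz date
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT user_id, chunk_id, confidence, COALESCE(streak, 0) FROM review_state"
    ).fetchall()
    for user_id, chunk_id, confidence, streak in rows:
        delta, new_streak = next_review(confidence, streak)
        conn.execute(
            "UPDATE review_state SET next_quiz = ?, streak = ? WHERE user_id = ? AND chunk_id = ?",
            ((datetime.utcnow() + delta).isoformat(), new_streak, user_id, chunk_id),
        )
    conn.commit()
    conn.close()
```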
What Made the Bot Smart
Context windows that fit teen brains
Instead of dumping full textbook chapters, the parser chunked notes into 300-token passages keyed by headings. Retrieval then surfaced the smallest chunk that answered a query, keeping responses under thirty seconds of reading.
Adaptive questioning
After every answer the bot tagged user confidence on a five-point scale. Scores below three fed directly into the next quiz cycle, imitating spaced repetition. A pilot test with nine students showed a 28 percent jump in recall on a unit quiz compared with peers using static flashcards.
Data privacy
Embedding records carried only anonymised note IDs, never raw sentences. Parents appreciated that design choice, and it satisfied school policy on sharing class material.
Why the Timeline Worked
Focused scope: No grades, no gamification, just Q&A plus tracking.
Mentor guardrails: When the retrieval pipeline looped, a tutor spotted the missing await call in minutes.
Early users: Recruited ten classmates on day ten, catching confusing prompts long before launch.
HigherEds bootcamps bake these principles into every project. Weekly checkpoints stop scope creep, and one-to-one tutoring fills theory gaps the moment they appear.
Lessons for Future Builders
Start with high-quality notes; garbage in still means garbage out.
Retrieval-augmented generation beats giant context windows for speed and cost.
A simple confidence survey unlocks powerful personalised practice.
Privacy earns trust; strip PII before you embed anything.
Demo early and often; classmates are ruthless but invaluable testers.
Where to Go Next
The team is now exploring a progress dashboard that recommends session plans to parents and tutors. They are also testing a “voice chat” mode using Whisper for students who prefer speaking to typing.
If your teen spends evenings scrolling for study help, imagine them improving their own chatbot instead. Consider enrolling them in the next HigherEds AI Bootcamp or booking a block of tutoring hours to turn daily questions into project fuel.