yellowsubmarine372

Building an AI-Powered Text Simplifier for Slow Learners — A Tech for Impact Retrospective

Can LLMs bridge the information gap for slow learners? — This was the guiding question behind 피치서가 AI, a service that uses AI to generate text tailored to each reader’s comprehension level across five linguistic dimensions.

Context: Tech for Impact & B-Peach LAB


Tech for Impact is a technology initiative by Kakao Impact Foundation (the corporate foundation of Kakao) that connects social innovators with IT professionals to build technology that drives real social change. It runs through a format called LABs — 6-month collaborative projects where working professionals volunteer weekly alongside nonprofits.


I participated in B-Peach LAB Season 2, partnered with Peach Market (피치마켓) — a Korean social enterprise that produces accessible content for people with intellectual and developmental disabilities. Peach Market’s mission is straightforward but urgent: information is the most fundamental resource for navigating life, yet millions of slow learners are cut off from it because the written world simply isn’t designed for them.

I was on the AI engineering team, and as the project evolved I became the sole developer responsible for the entire AI backend — architecture design, backend integration, prompt engineering, observability, latency experiments, and cross-team communication.

This post is a technical retrospective of what I built, what I learned, and what I’d do differently.


The Problem

Information is the foundation of how we understand the world, form values, and make decisions. But for millions of people — those with intellectual disabilities, developmental disabilities, limited education, or simply no familiarity with specialized domains — the written information around them is inaccessible. Government notices, news articles, textbooks, literary works, even basic service instructions become barriers rather than bridges.

Peach Market had been addressing this through manual “easy language” translation, but human editors can only process so much. The Season 1 LLM tool helped, but it offered only a single one-size-fits-all difficulty level. A text simplified for a 7-year-old reader is useless for a teenager with mild learning difficulties — it strips too much nuance. Conversely, a mildly simplified version still walls out readers who need more fundamental restructuring.

We needed a system that could systematically control how text gets simplified across multiple dimensions, grounded in reading comprehension theory — not just “make it easier” but “make it easier in these specific ways, to this specific degree.”

Architecture: From Monolith to Modular Domain Separation

Identifying the Problem Early

When I joined, the codebase was a single FastAPI application with all business logic crammed into services/service.py. I raised the first refactoring proposal early on.

The Redesign

I designed and proposed a new architecture that cleanly separated concerns:

ai-api-v2/
├── ai/                        # AI domain layer (pure business logic)
│   ├── core/                  # Interfaces, base classes, shared contracts
│   │   ├── base_usecase.py
│   │   └── infra/
│   │       └── model_router.py
│   ├── single_turn/           # v1 simplification module
│   │   ├── usecase.py
│   │   └── service.py
│   ├── multi_strategy/        # v2 multi-strand simplification
│   │   ├── domain/
│   │   │   ├── entity/
│   │   │   └── constants.py
│   │   ├── components/
│   │   └── usecase.py
│   └── e2e/                   # Domain-level entry point + DI container
│       ├── di/
│       │   ├── config.py
│       │   └── container.py
│       └── main.py
├── server/                    # Server layer (FastAPI knows nothing about AI internals)
│   ├── api/
│   │   ├── main.py
│   │   └── routers.py
│   └── middleware/
│       └── logging.py
└── tests/

Key design decisions:

  1. Fixed dependency direction: server → ai.e2e.main → ai/*. The server layer never imports AI internals directly. The AI domain doesn’t know FastAPI exists.
  2. Domain-owned assembly: Which implementation to use (single-turn, multi-strategy, future RAG) is decided by ai/e2e, not by the server. Swapping or composing use cases requires zero server changes.
  3. DI container: A Container class centralizes object creation — LLM clients, use cases, components — making it trivial to mock for testing.
  4. Testability by design: Each module can be tested in isolation. No more spinning up the entire server just to verify prompt logic.

Before implementing, I walked the team through the redesign rationale, collected feedback, and iterated on the proposal. After the refactor, I wrote development guidelines so teammates could add new modules following a consistent pattern.


The “팔만대장경” Prompt System

The core innovation of this project was the prompt architecture, which we internally called 팔만대장경 — a reference to the Tripitaka Koreana, a massive collection of Buddhist scriptures. The name reflected the ambition: a comprehensive, systematic framework for text simplification.

Theoretical Foundation: The Reading Rope

We grounded our approach in Scarborough’s Reading Rope model, which decomposes reading comprehension into five intertwined “strands”:

| Strand | What It Controls |
| --- | --- |
| Background Knowledge | Implicit world knowledge needed to understand context |
| Vocabulary | Word-level complexity — technical terms, Sino-Korean words, abstractions |
| Language Structure | Sentence complexity — subordinate clauses, passive voice, pronoun resolution |
| Verbal Reasoning | Inference, metaphor, implied causality |
| Literacy Knowledge | Genre awareness, text organization, signal words |

Each strand has three levels (A, B, C) representing different reader profiles. The combination of strand levels forms a strategy profile that determines which “tools” the LLM receives: the prompt is assembled dynamically, containing only the predefined tools relevant to the reader’s profile.
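The dynamic assembly step can be sketched as below. The strand names follow the Reading Rope strands above, but the tool texts and the exact selection logic are made up for illustration:

```python
# Illustrative sketch of assembling a prompt from a strategy profile.
# The tool definitions here are invented, not the project's real tools.
TOOLS = {
    ("vocabulary", "A"): "Replace rare Sino-Korean words with common native words.",
    ("vocabulary", "B"): "Gloss technical terms in parentheses.",
    ("language_structure", "A"): "Split sentences with subordinate clauses in two.",
    ("language_structure", "B"): "Prefer active voice over passive voice.",
}


def build_prompt(profile: dict[str, str]) -> str:
    """Select only the tools matching the reader's strand levels."""
    selected = [
        TOOLS[(strand, level)]
        for strand, level in profile.items()
        if (strand, level) in TOOLS
    ]
    return "Simplify the text using these rules:\n" + "\n".join(
        f"- {tool}" for tool in selected
    )


prompt = build_prompt({"vocabulary": "A", "language_structure": "B"})
```

A reader profiled as vocabulary-A / structure-B gets only those two rules, so the LLM never sees instructions irrelevant to that reader.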

I participated in the research sessions where the team mapped the Reading Rope strands to Korea’s national Korean language education standards, classifying comprehension benchmarks by strand and level. This theoretical mapping became the backbone of our tool definitions.


Cross-Team Integration

Backend Communication

I took ownership of understanding and documenting how the AI server connected to the backend (Spring Boot). There was limited documentation, so I traced the end-to-end request flow by reading both codebases:

Backend (Spring Boot) → Port 8080 → Nginx Proxy (80→8000) → FastAPI Container
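For the 80 → 8000 hop in that flow, the proxy configuration would look roughly like the fragment below. This is a plausible sketch, not the production config; ports come from the flow above, everything else is assumed:

```nginx
# Hypothetical nginx fragment for the 80 -> 8000 hop; illustrative only.
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8000;   # FastAPI container
        proxy_set_header Host $host;
        proxy_set_header X-Request-ID $request_id;  # helps cross-service tracing
    }
}
```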

I then proposed and implemented DTO changes to clean up the contract between teams.

I documented every change, communicated it to the backend team, and coordinated the integration testing timeline.

Infrastructure Team

After the refactor moved the codebase to a new repository (ai-api → ai-api-v2), I coordinated with the infrastructure team on CI/CD setup. I set up GitHub Actions workflows for linting and container testing and worked with infra to get the new images deployed.

Frontend Team: The Adaptation Prevention Tag Problem

The frontend team needed certain HTML spans to pass through the LLM untouched (e.g. proper nouns marked with <prevent-adaptation> tags). The naive approach — regex string matching — was fragile.

My solution:

  1. Parse the input as HTML (not regex) to extract prevention tags.
  2. Replace UUID-based tag IDs with sequential integers before sending to the LLM. UUIDs consume excessive tokens and are prone to LLM hallucination; integer IDs are compact and stable.
  3. Restore original UUIDs in the output by reverse-mapping.

This approach was informed by known best practices for UUID handling in LLM contexts and solved the problem without requiring changes from other teams.
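Steps 2–3 (the ID swap and restore) can be sketched as below. This assumes an HTML parser has already extracted the prevention-tag UUIDs (step 1); the function names and the hard-coded example are illustrative, not the real pipeline:

```python
# Sketch of the UUID -> integer swap (step 2) and restore (step 3).
# Assumes the prevention-tag ids were already extracted by an HTML parser.
def compact_ids(text: str, uuids: list[str]) -> tuple[str, dict[str, str]]:
    """Swap verbose UUID ids for short sequential ids before the LLM call."""
    mapping = {str(i): u for i, u in enumerate(uuids)}
    for short, original in mapping.items():
        text = text.replace(original, short)
    return text, mapping


def restore_ids(text: str, mapping: dict[str, str]) -> str:
    """Reverse-map the integer ids back to the original UUIDs."""
    for short, original in mapping.items():
        text = text.replace(f'id="{short}"', f'id="{original}"')
    return text


doc = '<prevent-adaptation id="3fa85f64-5717-4562-b3fc-2c963f66afa6">서울</prevent-adaptation>'
compacted, mapping = compact_ids(doc, ["3fa85f64-5717-4562-b3fc-2c963f66afa6"])
restored = restore_ids(compacted, mapping)
```

The LLM only ever sees `id="0"`, which is cheap in tokens and far less likely to be mangled than a 36-character UUID.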


Observability and Latency

Logging Architecture

I led the effort to define a structured logging standard for the AI team. After discussions with the backend team about unified tracing, I designed a phased approach.

I wrote the detailed implementation proposals for each phase including code samples for both the Spring Boot and FastAPI sides and presented trade-offs to the team. While the full stack wasn’t implemented (the team was constrained by time — this was a side project for working professionals) the structured logging in Phase 1 shipped and became the foundation for debugging in production.
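The Phase 1 idea — one JSON object per log line, carrying a request ID for cross-service correlation — can be sketched like this. Field names here are illustrative, not the team’s actual schema:

```python
# Minimal sketch of structured JSON logging with a request id.
# The field set is an assumption, not the project's real log schema.
import json
import logging


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "message": record.getMessage(),
                "request_id": getattr(record, "request_id", None),
            },
            ensure_ascii=False,
        )


logger = logging.getLogger("ai-api")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

# `extra` attaches the request id to the record for the formatter to pick up.
logger.warning("slow LLM call", extra={"request_id": "req-42"})
```

Because every line is machine-parseable JSON, a backend trace ID can later be threaded through the same field without changing the format.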

Latency Experiments

The LLM call was the dominant bottleneck — median response times around 12 seconds, with P99 hitting 30+ seconds. I ran systematic experiments to find actionable improvements:

Experiment 1: Anthropic Direct vs. OpenRouter

Using Locust with 10 concurrent users across three text lengths (short/medium/long), I found:

Experiment 2: OpenRouter Routing Strategy (sort=latency)

A teammate suggested testing OpenRouter’s sort: latency parameter. I built the test harness, ran the experiment, and dug into the results:

Practical outcomes: I documented that the realistic levers for latency reduction were (1) prompt optimization to reduce token count, (2) upgrading the OpenRouter tier, and (3) adding request options from the provider docs that suited our situation.

Contributing a Bug Fix to LiteLLM

While investigating timeout behavior I discovered that our user-configured timeout values were being silently ignored for OpenRouter requests. Setting timeout=20 should have cut off slow responses at 20 seconds but requests were hanging for up to 300 seconds — aiohttp’s default.

I traced the issue into the LiteLLM source code. In aiohttp_transport.py, the ClientTimeout was being constructed with sock_connect, sock_read, and connect fields — but the total field was never set, so it fell back to aiohttp’s 300-second default. This meant every other provider (Anthropic, OpenAI, Bedrock) respected the configured timeout correctly, but OpenRouter — which routes through aiohttp — did not.

I filed issue #16394 with a detailed root cause analysis and submitted PR #16395 with the fix: setting ClientTimeout.total to match the user-configured timeout value, using the read component for consistency with how LiteLLM maps its single timeout parameter across all httpx timeout components. The PR included three test cases covering the timeout trigger behavior and was merged into main within a few days.
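In spirit, the fix amounts to the following. This is a simplified stand-in using a plain dataclass that mirrors the shape of aiohttp’s ClientTimeout — it is not aiohttp’s real class or LiteLLM’s actual code:

```python
# Simplified stand-in illustrating the bug and fix; NOT real aiohttp/LiteLLM code.
from dataclasses import dataclass
from typing import Optional

AIOHTTP_DEFAULT_TOTAL = 300.0  # aiohttp's default overall timeout in seconds


@dataclass
class FakeClientTimeout:
    total: Optional[float] = None
    connect: Optional[float] = None
    sock_connect: Optional[float] = None
    sock_read: Optional[float] = None


def build_timeout(read: float, connect: float) -> FakeClientTimeout:
    """The buggy version set everything except `total`; the fix adds it."""
    return FakeClientTimeout(
        total=read,  # the fix: cap the whole request at the read timeout
        connect=connect,
        sock_connect=connect,
        sock_read=read,
    )


def effective_total(t: FakeClientTimeout) -> float:
    """When `total` is unset, the library default applies."""
    return t.total if t.total is not None else AIOHTTP_DEFAULT_TOTAL
```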


Shipping Under Constraints

Cost Policy

Early on, I noticed the team had no API key management or cost controls. Having been burned by unexpected bills before, I raised this with the team lead and pushed for a formal policy. The result: a shared API key management sheet and usage guidelines — simple but essential for a nonprofit with limited budget.

Structured Output Challenges

OpenRouter didn’t support structured outputs for Claude Sonnet or Gemini models. I tried multiple approaches:

  1. Pydantic model_json_schema() — generated schemas with unresolved $ref references, requiring complex post-processing.
  2. OpenAI function-call format — needed anyOf-to-type conversions and null field additions.
  3. Each fix introduced new edge cases (e.g. the nested ModifierRelation schema failing to generate properties).

I eventually abandoned the Pydantic-based schema generation entirely and switched to explicit JSON schema files — manually defined, version-controlled, and reliable. Sometimes the pragmatic solution beats the elegant one.
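An explicit schema in this style looks like the snippet below — flat, with no $ref indirection for the provider to choke on. The schema content is invented for illustration, not the project’s real schema:

```python
# Illustrative hand-written JSON schema: flat, no $ref, version-controlled.
# The field names are made up, not the project's actual output schema.
import json

SIMPLIFICATION_SCHEMA = """
{
  "type": "object",
  "properties": {
    "simplified_text": {"type": "string"},
    "applied_tools": {
      "type": "array",
      "items": {"type": "string"}
    }
  },
  "required": ["simplified_text"]
}
"""

schema = json.loads(SIMPLIFICATION_SCHEMA)
```

Checking the file into git also meant schema changes showed up in code review, which the auto-generated schemas never did.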

Discord Error Bot

During QA, testers reported issues via Discord — but I couldn’t always respond immediately (side project, day job). I built a Discord monitoring bot that automatically posted detailed error reports, complete with stack traces, request bodies, and Langfuse trace links. This wasn’t just a convenience — it was a commitment to continued accountability even after the formal project period ended.
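The core of such a bot is small: build a report from the exception and POST it to a Discord webhook (webhooks accept a JSON body with a "content" field). The payload fields and function names below are illustrative, not the bot’s real code:

```python
# Sketch of the error-report payload; field names are illustrative.
import json
import traceback
import urllib.request


def build_error_report(exc: Exception, request_body: str, trace_url: str) -> dict:
    """Assemble a Discord message with the error, request, and trace link."""
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return {
        "content": (
            f"**AI server error**: {exc}\n"
            f"Request body: `{request_body}`\n"
            f"Langfuse trace: {trace_url}\n"
            f"```\n{stack[-1500:]}\n```"  # Discord messages cap at 2000 chars
        )
    }


def post_to_discord(webhook_url: str, payload: dict) -> None:
    """POST the report to a Discord webhook (network call, not run here)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


report = build_error_report(ValueError("boom"), '{"text": "hi"}', "https://langfuse.example/trace/1")
```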


Feature Development

Beyond architecture, I continuously shipped features driven by evolving requirements.


Reflections

What Went Well

What I’d Do Differently

Working Under Real Constraints

This project taught me what engineering looks like when you can’t throw money or headcount at problems. The nonprofit had limited budget. The team was volunteers with day jobs. Every architectural decision had to account for: Can a future volunteer understand this? Does this add work for the web-backend team that’s already stretched thin?

The answer to “what’s the best solution?” was never entirely technical — it was always “what’s the best solution given these constraints?”


Tech Stack

| Layer | Technology |
| --- | --- |
| API Server | FastAPI, Gunicorn |
| LLM Orchestration | LiteLLM, Langfuse (prompt mgmt + tracing) |
| Infra | Docker, EC2, GitHub Actions CI |
| Testing | Locust (load testing), pytest |
| Monitoring | Langfuse |

Tech for Impact is a technology initiative by Kakao Impact Foundation that matches social innovators with IT professionals for 6-month collaborative projects. Since its inception, 618 tech volunteers have participated across 71 projects. If you’re a working engineer in Korea looking for a meaningful way to apply your skills — building real products for real users who need them — I’d highly recommend it.
