Crawl4AI Web Scraper

This template pairs Crawl4AI, an open-source web scraping engine deployed via Docker, with a no-code workflow that automates website crawling, content extraction, and data ingestion for retrieval-augmented generation (RAG). The workflow uses HTTP requests to trigger asynchronous scraping tasks, parses sitemap.xml for URLs, processes the Markdown and HTML responses, and stores vector embeddings (generated with OpenAI) in a Supabase vector store. It also covers AI agent components such as Pydantic AI and TEN Agent. Creator Cole Medin demonstrates deployment and integration step by step in this free workflow template.
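The sitemap-parsing step can be illustrated outside the no-code tool. The following is a minimal sketch, not the template's actual implementation: the function name `extract_sitemap_urls` and the sample sitemap are hypothetical, and it assumes the sitemap uses the standard sitemaps.org namespace. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be the one in use).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        # Skip empty <loc> elements and trim stray whitespace around the URL.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sample sitemap, used only for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/docs </loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE):
        print(url)
```

Each URL returned this way would then be handed to the scraping engine as a separate crawl task. Because the function collects every `<loc>` element, it also picks up the child sitemap URLs listed in a sitemap index file, which would need a second fetch-and-parse pass.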
