Document loaders provide a standard interface for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s Document format. This ensures that data can be handled consistently regardless of the source. All document loaders implement the BaseLoader interface.

Interface

Each document loader may define its own parameters, but they share a common API:
  • .load(): Loads all documents at once.
  • .loadAndSplit(): Loads all documents at once and splits them into smaller documents.

import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

const loader = new CSVLoader(
  ...  // <-- Integration specific parameters here
);
const data = await loader.load();
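
The shape of this shared API can be illustrated without any LangChain dependency. The following is a minimal sketch, not LangChain's implementation: `Doc`, `SketchLoader`, and `InMemoryLinesLoader` are hypothetical stand-ins, and the fixed-size chunking in `loadAndSplit()` is an assumed simplification of what a real text splitter would do.

```typescript
// Hypothetical stand-in for a loaded document: text plus metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical stand-in for the common loader API described above.
interface SketchLoader {
  load(): Promise<Doc[]>;
  loadAndSplit(chunkSize: number): Promise<Doc[]>;
}

// Illustrative loader: each non-empty line of an in-memory string becomes one document.
class InMemoryLinesLoader implements SketchLoader {
  private readonly text: string;
  private readonly source: string;

  constructor(text: string, source: string) {
    this.text = text;
    this.source = source;
  }

  // Loads all documents at once.
  async load(): Promise<Doc[]> {
    return this.text
      .split("\n")
      .filter((line) => line.trim().length > 0)
      .map((line, index) => ({
        pageContent: line,
        // `index` counts non-empty lines only, starting at 0.
        metadata: { source: this.source, index },
      }));
  }

  // Loads all documents, then cuts each one into pieces of at most `chunkSize` characters.
  async loadAndSplit(chunkSize: number): Promise<Doc[]> {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
      throw new Error("chunkSize must be a positive integer");
    }
    const docs = await this.load();
    const chunks: Doc[] = [];
    for (const doc of docs) {
      for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
        chunks.push({
          pageContent: doc.pageContent.slice(start, start + chunkSize),
          metadata: { ...doc.metadata, chunkStart: start },
        });
      }
    }
    return chunks;
  }
}
```

Callers only depend on `load()` and `loadAndSplit()`, so swapping one loader for another should not change downstream code, which is the point of a common interface.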

By category

LangChain.js groups document loaders into two categories:
  • File loaders, which load data into LangChain formats from your local filesystem.
  • Web loaders, which load data from remote sources.

File loaders

If you’d like to contribute an integration, see Contributing integrations.

PDFs

| Document Loader | Description | Package/API |
| --- | --- | --- |
| PDFLoader | Load and parse PDF files using pdf-parse | Package |

Common File Types

| Document Loader | Description | Package/API |
| --- | --- | --- |
| CSV | Load data from CSV files with configurable column extraction | Package |
| JSON | Load JSON files using JSON pointer to target specific keys | Package |
| JSONLines | Load data from JSONLines/JSONL files | Package |
| Text | Load plain text files | Package |
| DOCX | Load Microsoft Word documents (.docx and .doc formats) | Package |
| EPUB | Load EPUB files with optional chapter splitting | Package |
| PPTX | Load PowerPoint presentations | Package |
| Subtitles | Load subtitle files (.srt format) | Package |
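
The JSON entry above mentions JSON pointers. As a rough illustration of how a pointer such as `/texts/1` addresses a value inside a parsed JSON document (RFC 6901 syntax), here is a minimal sketch. `resolvePointer` is a hypothetical helper written for this example; it is not part of LangChain, and it does not show how the JSON loader itself applies pointers.

```typescript
// Hypothetical helper: resolve an RFC 6901 JSON pointer against a parsed JSON value.
// Returns undefined when the pointer does not match anything.
function resolvePointer(root: unknown, pointer: string): unknown {
  // The empty pointer refers to the whole document.
  if (pointer === "") return root;
  if (!pointer.startsWith("/")) {
    throw new Error(`Invalid JSON pointer: ${pointer}`);
  }

  let current: unknown = root;
  for (const rawToken of pointer.slice(1).split("/")) {
    // Unescape per RFC 6901: "~1" means "/", "~0" means "~" (in that order).
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");

    if (Array.isArray(current)) {
      // Array indices must be plain non-negative integers without leading zeros.
      if (!/^(0|[1-9]\d*)$/.test(token)) return undefined;
      current = current[Number(token)];
    } else if (current !== null && typeof current === "object") {
      const obj = current as Record<string, unknown>;
      if (!Object.prototype.hasOwnProperty.call(obj, token)) return undefined;
      current = obj[token];
    } else {
      // Cannot descend into a primitive value.
      return undefined;
    }
  }
  return current;
}

// Example data (made up for illustration).
const sample = { texts: ["first", "second"], meta: { "a/b": 1 } };
const secondText = resolvePointer(sample, "/texts/1"); // "second"
const escapedKey = resolvePointer(sample, "/meta/a~1b"); // 1
```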

Specialized File Loaders

| Document Loader | Description | Package/API |
| --- | --- | --- |
| DirectoryLoader | Load all files from a directory with custom loader mappings | Package |
| UnstructuredLoader | Load multiple file types using Unstructured API | API |
| MultiFileLoader | Load data from multiple individual file paths | Package |
| ChatGPT | Load ChatGPT conversation exports | Package |
| Notion Markdown | Load Notion pages exported as Markdown | Package |
| OpenAI Whisper Audio | Transcribe audio files using OpenAI Whisper API | API |
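
The idea of "custom loader mappings" for a directory can be sketched as a lookup from file extension to a loading function. The example below is an assumption-laden illustration that works on in-memory files rather than a real filesystem: `SketchDoc`, `LoaderFn`, `extensionOf`, and `loadAll` are hypothetical names, and skipping unmapped extensions is a choice made for this sketch, not a statement about how DirectoryLoader behaves.

```typescript
// Hypothetical stand-ins; none of these names come from LangChain.
interface SketchDoc {
  pageContent: string;
  metadata: { source: string };
}
interface SketchFile {
  path: string;
  contents: string;
}
type LoaderFn = (file: SketchFile) => SketchDoc[];

// Returns the lowercase extension of the file name (including the dot), or "" if none.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot).toLowerCase();
}

// Applies the loader mapped to each file's extension; unmapped files are skipped.
function loadAll(files: SketchFile[], mappings: Record<string, LoaderFn>): SketchDoc[] {
  const docs: SketchDoc[] = [];
  for (const file of files) {
    const loader = mappings[extensionOf(file.path)];
    if (loader === undefined) continue;
    docs.push(...loader(file));
  }
  return docs;
}

// Example mappings: whole-file text, and one document per non-empty CSV line
// (no header handling, purely for illustration).
const mappings: Record<string, LoaderFn> = {
  ".txt": (file) => [{ pageContent: file.contents, metadata: { source: file.path } }],
  ".csv": (file) =>
    file.contents
      .split("\n")
      .filter((line) => line.trim().length > 0)
      .map((line) => ({ pageContent: line, metadata: { source: file.path } })),
};

const loaded = loadAll(
  [
    { path: "docs/a.txt", contents: "hello" },
    { path: "docs/b.csv", contents: "x\ny" },
    { path: "docs/c.bin", contents: "ignored" },
  ],
  mappings,
);
```

Keeping the mapping as plain data makes it straightforward to support another file type by adding one entry.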

Web loaders

Webpages

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Cheerio | Load webpages using Cheerio (lightweight, no JavaScript execution) | | Package |
| Playwright | Load dynamic webpages using Playwright (supports JavaScript rendering) | | Package |
| Puppeteer | Load dynamic webpages using Puppeteer (headless Chrome) | | Package |
| FireCrawl | Crawl and convert websites into LLM-ready markdown | | API |
| Spider | Fast crawler that converts websites into HTML, markdown, or text | | API |
| RecursiveUrlLoader | Recursively load webpages following links | | Package |
| Sitemap | Load all pages from a sitemap.xml | | Package |
| Browserbase | Load webpages using managed headless browsers with stealth mode | | API |
| WebPDFLoader | Load PDF files in web environments | | Package |

Cloud Providers

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| S3 | Load files from AWS S3 buckets | | Package |
| Azure Blob Storage Container | Load all files from Azure Blob Storage container | | Package |
| Azure Blob Storage File | Load individual files from Azure Blob Storage | | Package |
| Google Cloud Storage | Load files from Google Cloud Storage buckets | | Package |
| Google Cloud SQL for PostgreSQL | Load documents from Cloud SQL PostgreSQL databases | | Package |

Productivity Tools

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Notion API | Load Notion pages and databases via API | | API |
| Figma | Load Figma file data | | API |
| Confluence | Load pages from Confluence spaces | | API |
| GitHub | Load files from GitHub repositories | | API |
| GitBook | Load GitBook documentation pages | | Package |
| Jira | Load issues from Jira projects | | API |
| Airtable | Load records from Airtable bases | | API |
| Taskade | Load Taskade project data | | API |

Search & Data APIs

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| SearchAPI | Load web search results from SearchAPI (Google, YouTube, etc.) | | API |
| SerpAPI | Load web search results from SerpAPI | | API |
| Apify Dataset | Load scraped data from Apify platform | | API |

Audio & Video

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| YouTube | Load YouTube video transcripts | | Package |
| AssemblyAI | Transcribe audio and video files using AssemblyAI API | | API |
| Sonix | Transcribe audio files using Sonix API | | API |

Other

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Couchbase | Load documents from Couchbase database using SQL++ queries | | Package |
| LangSmith | Load datasets and traces from LangSmith | | API |
| Hacker News | Load Hacker News threads and comments | | Package |
| IMSDB | Load movie scripts from Internet Movie Script Database | | Package |
| College Confidential | Load college information from College Confidential | | Package |
| Blockchain Data | Load blockchain data (NFTs, transactions) via Sort.xyz API | | API |

All document loaders

