# LangSmithLoader
This notebook provides a quick overview for getting started with the LangSmithLoader. For detailed documentation of all LangSmithLoader features and configurations, head to the API reference.
## Overview
### Integration details
| Class | Package | Local | Serializable | PY support |
| :--- | :--- | :---: | :---: | :---: |
| LangSmithLoader | @langchain/core | ✅ | beta | ✅ |
### Loader features
| Source | Web Loader | Node Envs Only |
| :--- | :---: | :---: |
| LangSmithLoader | ✅ | ❌ |
This guide shows how to load examples from a LangSmith dataset as LangChain Documents using the LangSmithLoader. Each example in the dataset becomes a Document whose pageContent comes from the example's inputs, with the full example record (inputs, outputs, and any example metadata) preserved in the Document's metadata.
## Setup
To access the LangSmith document loader you'll need to install @langchain/core, create a LangSmith account, and get an API key.
### Credentials
Sign up at https://langsmith.com and generate an API key. Once you've done this, set the LANGSMITH_API_KEY environment variable:
```bash
export LANGSMITH_API_KEY="your-api-key"
```
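If you're running the examples from a Node.js script or notebook rather than a shell, you can also set the key on `process.env` before constructing the loader. A minimal sketch (replace the placeholder with your real key):

```typescript
// Assumes a Node.js environment; set the key before any LangSmith client is created.
process.env.LANGSMITH_API_KEY = "your-api-key";
```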
### Installation
The LangSmithLoader integration lives in the @langchain/core package:
```bash
# npm
npm i @langchain/core

# yarn
yarn add @langchain/core

# pnpm
pnpm add @langchain/core
```
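The dataset-creation snippet below also imports the langsmith client and @faker-js/faker to generate placeholder data. If you want to run that part end to end, install them as well (they are only needed for the example, not for the loader itself):

```bash
npm i langsmith @faker-js/faker
```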
## Create example dataset

For this example, we'll create a new dataset which we'll use in our document loader.
```typescript
import { Client as LangSmithClient } from "langsmith";
import { faker } from "@faker-js/faker";

const lsClient = new LangSmithClient();

const datasetName = "LangSmith Few Shot Datasets Notebook";

// Generate 10 placeholder examples: lorem-ipsum inputs/outputs plus some example metadata.
const exampleInputs = Array.from({ length: 10 }, (_, i) => ({
  input: faker.lorem.paragraph(),
}));
const exampleOutputs = Array.from({ length: 10 }, (_, i) => ({
  output: faker.lorem.sentence(),
}));
const exampleMetadata = Array.from({ length: 10 }, (_, i) => ({
  companyCatchPhrase: faker.company.catchPhrase(),
}));

// Delete any existing dataset with this name so the example starts from a clean slate.
await lsClient.deleteDataset({
  datasetName,
});

// Recreate the dataset and upload the generated examples.
const dataset = await lsClient.createDataset(datasetName);

const examples = await lsClient.createExamples({
  inputs: exampleInputs,
  outputs: exampleOutputs,
  metadata: exampleMetadata,
  datasetId: dataset.id,
});
```
```typescript
import { LangSmithLoader } from "@langchain/core/document_loaders/langsmith";

const loader = new LangSmithLoader({
  datasetName: "LangSmith Few Shot Datasets Notebook",
  // Instead of a datasetName, you can alternatively provide a datasetId
  // datasetId: dataset.id,
  contentKey: "input",
  limit: 5,
  // formatContent: (content) => content,
  // ... other options
});
```
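The contentKey option selects which field of the example's inputs becomes the Document's pageContent, and the commented-out formatContent option lets you post-process that value first. A minimal sketch, assuming formatContent receives the value selected by contentKey and returns the string to use as pageContent:

```typescript
// Hypothetical variant of the loader above: prefix each document's content with a label.
const formattingLoader = new LangSmithLoader({
  datasetName: "LangSmith Few Shot Datasets Notebook",
  contentKey: "input",
  limit: 5,
  // Assumption: formatContent maps the raw content value to the final pageContent string.
  formatContent: (content) => `Example input:\n${content}`,
});
```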
## Load
```typescript
const docs = await loader.load();

docs[0];
```

```text
{
  pageContent: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.',
  metadata: {
    id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',
    created_at: '2024-08-20T17:01:38.984045+00:00',
    modified_at: '2024-08-20T17:01:38.984045+00:00',
    name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',
    dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',
    source_run_id: null,
    metadata: {
      dataset_split: [Array],
      companyCatchPhrase: 'Integrated solution-oriented secured line'
    },
    inputs: {
      input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
    },
    outputs: {
      output: 'Excepturi adeptio spectaculum bis volaticus accusamus.'
    }
  }
}
```
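Each item returned by load() is a standard LangChain Document, so you can work with the results like any other loader output, for example:

```typescript
// Print a short preview of each loaded document's pageContent.
for (const doc of docs) {
  console.log(doc.pageContent.slice(0, 80));
}
```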
```typescript
console.log(docs[0].metadata);
```

```text
{
  id: 'f1a04800-6f7a-4232-9743-fb5d9029bf1f',
  created_at: '2024-08-20T17:01:38.984045+00:00',
  modified_at: '2024-08-20T17:01:38.984045+00:00',
  name: '#f1a0 @ LangSmith Few Shot Datasets Notebook',
  dataset_id: '9ccd66e6-e506-478c-9095-3d9e27575a89',
  source_run_id: null,
  metadata: {
    dataset_split: [ 'base' ],
    companyCatchPhrase: 'Integrated solution-oriented secured line'
  },
  inputs: {
    input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
  },
  outputs: { output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }
}
```
```typescript
console.log(docs[0].metadata.inputs);
```

```text
{
  input: 'Conventus supellex aegrotatio termes. Vapulus abscido ubi vita coadunatio modi crapula comparo caecus. Acervus voluptate tergeo pariatur conor argumentum inventore vomito stella.'
}
```
```typescript
console.log(docs[0].metadata.outputs);
```

```text
{ output: 'Excepturi adeptio spectaculum bis volaticus accusamus.' }
```
```typescript
console.log(Object.keys(docs[0].metadata));
```

```text
[
  'id',
  'created_at',
  'modified_at',
  'name',
  'dataset_id',
  'source_run_id',
  'metadata',
  'inputs',
  'outputs'
]
```
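Because both the inputs and outputs of each example are preserved in metadata, you can, for instance, pair them up for few-shot prompting. A small sketch based on the dataset structure created above (the input/output field names are the ones we defined when creating the examples):

```typescript
// Build (input, output) pairs from the loaded examples, e.g. for few-shot prompts.
const fewShotPairs = docs.map((doc) => ({
  input: doc.metadata.inputs.input,
  output: doc.metadata.outputs.output,
}));
console.log(fewShotPairs[0]);
```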
## API reference

For detailed documentation of all LangSmithLoader features and configurations, head to the API reference.