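Vectorstores are one of the most important components for building indexes.
For an introduction to vector stores and their generic functionality, see:
- Getting Started
We also provide documentation for all supported vector store types. See the list below.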
- AnalyticDB
- Annoy
- Atlas
- Chroma
- Deep Lake
- DocArrayHnswSearch
- DocArrayInMemorySearch
- ElasticSearch
- FAISS
- LanceDB
- Milvus
- MyScale
- OpenSearch
- PGVector
- Pinecone
- Qdrant
- Redis
- Supabase (Postgres)
- Tair
- Typesense
- Vectara
- Weaviate
- Persistence
- Retriever options
- Zilliz
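Getting Started
This notebook shows the basic functionality related to VectorStores. A key part of working with vector stores is creating the vectors to put in them, which is usually done with embeddings. We therefore recommend getting familiar with the embeddings notebook before diving in.
This covers generic high-level functionality related to all vector stores.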
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
with open('../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_texts(texts, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
print(docs[0].page_content)
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.
We cannot let this happen.
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
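A rough mental model of what similarity_search does is sketched below. It embeds the query, scores every stored text against it, and returns the top k. This is not Chroma's implementation. The "embedding" is a hypothetical bag-of-words counter standing in for OpenAIEmbeddings, and the ranking uses cosine similarity, whereas Chroma's default distance metric may differ.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(texts: list[str], query: str, k: int = 4) -> list[str]:
    # Embed the query once, then rank every stored text by similarity to it.
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(embed(t), query_vec), reverse=True)
    return ranked[:k]


texts = [
    "The president nominated Ketanji Brown Jackson",
    "Ankush went to Princeton",
    "Pass the Freedom to Vote Act",
]
print(similarity_search(texts, "who did the president nominate", k=1))
# ['The president nominated Ketanji Brown Jackson']
```

Swapping embed for a real embedding model changes only the quality of the scores. The "embed, score, take top k" flow stays the same.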
Add texts
You can easily add text to a vectorstore with the add_texts method. It returns a list of document IDs, in case you need to use them downstream.
docsearch.add_texts(["Ankush went to Princeton"])
['a05e3d0c-ab40-11ed-a853-e65801318981']
query = "Where did Ankush go to college?"
docs = docsearch.similarity_search(query)
docs[0]
Document(page_content='Ankush went to Princeton', lookup_str='', metadata={}, lookup_index=0)
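The sketch below shows why add_texts hands back IDs. It uses a hypothetical in-memory ToyVectorStore class in which each inserted text gets a generated identifier that can later be used to look it up. Generating random UUIDs is an assumption made for illustration. It does not reproduce how Chroma creates its IDs, and the class stores no vectors at all.

```python
import uuid


class ToyVectorStore:
    """Hypothetical in-memory store used only to illustrate returned IDs."""

    def __init__(self) -> None:
        self._texts: dict[str, str] = {}  # maps generated ID -> stored text

    def add_texts(self, texts: list[str]) -> list[str]:
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())  # random UUID (illustrative choice)
            self._texts[doc_id] = text
            ids.append(doc_id)
        return ids

    def get(self, doc_id: str) -> str:
        return self._texts[doc_id]


store = ToyVectorStore()
ids = store.add_texts(["Ankush went to Princeton"])
print(store.get(ids[0]))  # Ankush went to Princeton
```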
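From documents
We can also initialize a vectorstore directly from documents. This is useful when we use the text splitter's create_documents method to get documents directly, which is handy when the original documents have associated metadata.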
documents = text_splitter.create_documents([state_of_the_union], metadatas=[{"source": "State of the Union"}])
docsearch = Chroma.from_documents(documents, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
print(docs[0].page_content)
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.
We cannot let this happen.
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
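The following minimal sketch of the create_documents pattern shows how metadata travels with each chunk. Document, split_paragraphs and create_documents here are hypothetical stand-ins, not LangChain's classes. The only point is that each chunk receives its own copy of the metadata of the text it came from.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical stand-in for LangChain's Document class.
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_paragraphs(text: str) -> list[str]:
    # Hypothetical splitter: one chunk per blank-line-separated paragraph.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def create_documents(texts, metadatas, split=split_paragraphs):
    # Each chunk gets its own copy of the metadata of its source text.
    docs = []
    for text, meta in zip(texts, metadatas):
        for chunk in split(text):
            docs.append(Document(page_content=chunk, metadata=dict(meta)))
    return docs


speech = "We cannot let this happen.\n\nTonight, I call on the Senate."
docs = create_documents([speech], [{"source": "State of the Union"}])
for d in docs:
    print(d.metadata["source"], "|", d.page_content)
```

Copying the dict per chunk means that later edits to one chunk's metadata do not leak into its siblings.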
AnalyticDB
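AnalyticDB for PostgreSQL is a massively parallel processing (MPP) data warehousing service designed to analyze large volumes of data online.
AnalyticDB for PostgreSQL is developed from the open-source Greenplum Database project and enhanced by Alibaba Cloud. It is compatible with ANSI SQL 2003 syntax and with the PostgreSQL and Oracle database ecosystems, and it supports both row-oriented and column-oriented storage. AnalyticDB for PostgreSQL processes petabytes of data offline with high performance and supports highly concurrent online queries.
This notebook shows how to use functionality related to the AnalyticDB vector database. To run it, you should have an AnalyticDB instance up and running:
- Use the AnalyticDB cloud vector database. Click here to deploy it quickly.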
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import AnalyticDB
Split the documents and get embeddings by calling the OpenAI API
from langchain.document_loaders import TextLoader
loader = TextLoader('../../../state_of_the_union.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
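CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) aims for chunks of at most about 1000 characters with no text shared between neighbouring chunks. The sketch below uses a simpler fixed-window scheme to show what the two parameters control. LangChain's splitter instead splits on a separator (blank lines by default) and then merges the pieces, so its chunk boundaries will differ. split_fixed is a hypothetical name.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Fixed-window splitter; a simplification, not CharacterTextSplitter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        # Step back by the overlap so neighbouring chunks share context.
        start = end - chunk_overlap
    return chunks


print(split_fixed("abcdefghij", chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

A non-zero overlap trades extra storage for context. It keeps a sentence that straddles a boundary retrievable from either chunk.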
Connect to AnalyticDB by setting the relevant environment variables.
export PG_HOST={your_analyticdb_hostname}
export PG_PORT={your_analyticdb_port} # Optional, default is 5432
export PG_DATABASE={your_database} # Optional, default is postgres
export PG_USER={database_username}
export PG_PASSWORD={database_password}
Then store your embeddings and documents in AnalyticDB.
import os
connection_string = AnalyticDB.connection_string_from_db_params(
    driver=os.environ.get("PG_DRIVER", "psycopg2cffi"),
    host=os.environ.get("PG_HOST", "localhost"),
    port=int(os.environ.get("PG_PORT", "5432")),
    database=os.environ.get("PG_DATABASE", "postgres"),
    user=os.environ.get("PG_USER", "postgres"),
    password=os.environ.get("PG_PASSWORD", "postgres"),
)
vector_db = AnalyticDB.from_documents(
    docs,
    embeddings,
    connection_string=connection_string,
)
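connection_string_from_db_params assembles a database URL from the values read out of the environment. A plausible equivalent is sketched below. It assumes the SQLAlchemy-style postgresql+driver://user:password@host:port/database format. connection_string_from_env is a hypothetical name. Escaping the password with quote_plus is an extra precaution, not something this page says LangChain does.

```python
import os
from urllib.parse import quote_plus


def connection_string_from_env(env=None) -> str:
    """Hypothetical helper: build a SQLAlchemy-style PostgreSQL URL."""
    env = os.environ if env is None else env
    driver = env.get("PG_DRIVER", "psycopg2cffi")
    host = env.get("PG_HOST", "localhost")
    port = int(env.get("PG_PORT", "5432"))
    database = env.get("PG_DATABASE", "postgres")
    user = env.get("PG_USER", "postgres")
    # Escape the password so characters such as '@' or ':' cannot break the URL.
    password = quote_plus(env.get("PG_PASSWORD", "postgres"))
    return f"postgresql+{driver}://{user}:{password}@{host}:{port}/{database}"


print(connection_string_from_env({"PG_HOST": "example-host", "PG_PASSWORD": "p@ss"}))
# postgresql+psycopg2cffi://postgres:p%40ss@example-host:5432/postgres
```

Passing a plain dict instead of os.environ makes the helper easy to check without touching real credentials.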
Query and retrieve data
query = "What did the president say about Ketanji Brown Jackson"
docs = vector_db.similarity_search(query)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
Chroma
Chroma is a database for building AI applications with embeddings.
This notebook shows how to use functionality related to the Chroma vector database.
!pip install chromadb
# get a token: https://platform.openai.com/account/api-keys
from getpass import getpass
OPENAI_API_KEY = getpass()
import os
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
loader = TextLoader('../../../state_of_the_union.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(docs, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
Using embedded DuckDB without persistence: data will be transient
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
Similarity search with score
docs = db.similarity_search_with_score(query)
docs[0]
(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}),
0.3949805498123169)
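The second element of the returned tuple is a score. With Chroma it is a distance, so lower values mean closer matches. The sketch below assumes plain Euclidean (L2) distance over hypothetical two-dimensional vectors. Chroma's exact metric (for example, whether the distance is squared) depends on its configuration and is not stated here. search_with_score is a hypothetical name.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_with_score(vectors: dict[str, list[float]], query_vec: list[float], k: int = 4):
    """Return (doc_id, distance) pairs, closest first: lower means more similar."""
    scored = [(doc_id, l2_distance(vec, query_vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 2-D vectors; real embeddings have far more dimensions.
vectors = {"doc-a": [0.0, 1.0], "doc-b": [1.0, 0.0], "doc-c": [0.6, 0.8]}
for doc_id, distance in search_with_score(vectors, [0.0, 1.0], k=2):
    print(doc_id, round(distance, 3))
```

Because the score is a distance, a filter such as "keep only close matches" should use distance < threshold, not distance > threshold.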
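Persistence
The following steps cover how to persist a ChromaDB instance.
Initialize PersistedChromaDB
Create embeddings for each chunk and insert them into the Chroma vector database. The persist_directory argument tells ChromaDB where to store the database when it is persisted.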
# Embed and store the texts
# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'db'
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=docs, embedding=embedding, persist_directory=persist_directory)
Running Chroma using direct local API.
No existing DB found in db, skipping load
No existing DB found in db, skipping load
Persist the database
We should call persist() to ensure the embeddings are written to disk.
vectordb.persist()
vectordb = None
Persisting DB to disk, putting it in the save folder db
PersistentDuckDB del, about to run persist
Persisting DB to disk, putting it in the save folder db
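Load the database from disk, and create the chain
Be sure to pass the same persist_directory and embedding_function that you used when instantiating the database. Initialize the chain we will use for question answering.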
# Now we can load the persisted database from disk, and use it as normal.
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
Running Chroma using direct local API.
loaded in 4 embeddings
loaded in 1 collections
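The same persist_directory and embedding function must be reused because stored vectors are only comparable with query vectors produced by the same model. The sketch below shows a persist-and-reload round trip. It records which embedding was used and refuses to load under a different one. The JSON layout, the store.json file name and the model label are all hypothetical, and this is not ChromaDB's on-disk format.

```python
import json
import tempfile
from pathlib import Path


def persist(records: dict, persist_directory: str, embedding_name: str) -> None:
    # Write the records plus the name of the embedding model that produced them.
    path = Path(persist_directory)
    path.mkdir(parents=True, exist_ok=True)
    payload = {"embedding": embedding_name, "records": records}
    (path / "store.json").write_text(json.dumps(payload))


def load(persist_directory: str, embedding_name: str) -> dict:
    # Refuse to reuse vectors built by a different embedding model.
    payload = json.loads((Path(persist_directory) / "store.json").read_text())
    if payload["embedding"] != embedding_name:
        raise ValueError("stored vectors were built with a different embedding model")
    return payload["records"]


with tempfile.TemporaryDirectory() as persist_directory:
    records = {"id-1": {"text": "Ankush went to Princeton", "vector": [0.1, 0.2]}}
    persist(records, persist_directory, "hypothetical-embedding-v1")
    print(load(persist_directory, "hypothetical-embedding-v1") == records)  # True
```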
Pinecone
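Pinecone is a vector database with broad functionality.
This notebook shows how to use functionality related to the Pinecone vector database.
To use Pinecone, you must have an API key. Here are the installation instructions.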
!pip install pinecone-client
import os
import getpass
PINECONE_API_KEY = getpass.getpass('Pinecone API Key:')
PINECONE_ENV = getpass.getpass('Pinecone Environment:')
We want to use OpenAIEmbeddings, so we have to get an OpenAI API key.
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.document_loaders import TextLoader
loader = TextLoader('../../../state_of_the_union.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
import pinecone
# initialize pinecone
pinecone.init(
api_key=PINECONE_API_KEY, # find at app.pinecone.io
environment=PINECONE_ENV # next to api key in console
)
index_name = "langchain-demo"
docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
# if you already have an index, you can load it like this
# docsearch = Pinecone.from_existing_index(index_name, embeddings)
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)
For other supported vector databases, please refer to the original documentation:
https://python.langchain.com/en/latest/modules/indexes/vectorstores.html