In this demo, you’ll learn how to use Chroma with OpenAI and LangChain. Thanks to LangChain, the interface for working with different vector databases is remarkably consistent. In this section, you’ll focus on Chroma, but remember that you can readily substitute it with another supported database if you prefer.
Getting Started with Chroma
Chroma is an open-source vector database designed with developer productivity in mind. To install the necessary LangChain integration, return to your terminal and execute:
pip install langchain-chroma
Now, create a database and set up Chroma:
from langchain_chroma import Chroma
db = Chroma(
    embedding_function=embeddings_model,
)
You’ve initialized Chroma by providing an embedding model. Note that you can leave out the api_key argument when creating an OpenAI embeddings model; it’ll automatically fetch it from your environment, looking for it in an OPENAI_API_KEY variable by default.
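To see how that environment lookup works, here’s a simplified, hypothetical sketch of what the client does internally when no api_key is passed. The placeholder value below is not a real credential, and this isn’t the library’s actual implementation:

```python
import os

# Hypothetical placeholder -- in practice you'd export this in your shell
# or load it from a .env file; never hard-code a real key.
os.environ["OPENAI_API_KEY"] = "sk-placeholder"

# Simplified version of the lookup the OpenAI client performs when you
# don't pass api_key explicitly:
api_key = os.environ.get("OPENAI_API_KEY")
print(api_key is not None)  # → True once the variable is set
```

If the variable isn’t set, the client raises an authentication error at request time, so exporting it before launching your notebook is the easiest setup.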
By default, Chroma stores data in memory. However, that means your data will be lost when the app restarts. You’ll configure Chroma to store your data on disk instead.
Also, you need to organize your data effectively. Just as you’d use tables in SQL databases or collections in NoSQL databases, you should specify a collection name in Chroma to group related data. Update your Chroma initialization code to include these enhancements:
from langchain_openai import OpenAIEmbeddings

db = Chroma(
    collection_name="speech_collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
With these changes, your data will be saved to disk and organized within the “speech_collection.”
Populating Chroma With Data
Next, insert data into your Chroma database. LangChain abstracts away the low-level details, so you’ll work with LangChain document objects to represent your data.
In a new cell, add the following code:
from uuid import uuid4
from langchain_core.documents import Document
document_1 = Document(
    page_content="20 tons of cocoa have been deposited at Warehouse AX749",
    metadata={"source": "messaging_api"},
    id=1,
)
document_2 = Document(
    page_content="The National Geographic Society has discovered a new species "
    "of aquatic animal, off the coast of Miami. They have been exploring at "
    "8000 miles deep in the Pacific Ocean. They believe there's a lot "
    "more to learn from the oceans.",
    metadata={"source": "news"},
    id=2,
)

document_3 = Document(
    page_content="Martin Luther King's speech, I Have a Dream, remains "
    "one of the world's greatest ever. Here's everything he said "
    "in 5 minutes.",
    metadata={"source": "website"},
    id=3,
)

document_4 = Document(
    page_content="For the first time in 1200 years, the Kalahari "
    "desert receives 200ml of rain.",
    metadata={"source": "tweet"},
    id=4,
)

document_5 = Document(
    page_content="New multi-modal learning content about AI is ready "
    "from Kodeco.",
    metadata={"source": "kodeco_rss_feed"},
    id=5,
)
documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
db.add_documents(ids=uuids, documents=documents)
So far, so good. Now, here comes some of the beauty of working with vector data stores: the search capability. Traditional SQL or NoSQL databases demand you adhere to specific query syntax, but with vector databases, you interact using natural language — just like talking to a person!
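Under the hood, that natural-language search compares embedding vectors, commonly with cosine similarity. Here’s a minimal, self-contained sketch using tiny made-up 3-dimensional “embeddings” (real models produce vectors with hundreds or thousands of dimensions); the documents and numbers are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" invented for illustration.
docs = {
    "cocoa delivered to warehouse": [0.9, 0.1, 0.0],
    "new species found in the ocean": [0.1, 0.8, 0.2],
    "rain in the Kalahari desert": [0.0, 0.3, 0.9],
}
# Pretend embedding of "What's the latest on the warehouse?"
query = [0.8, 0.2, 0.1]

# Rank documents by how closely their vectors align with the query vector.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # → cocoa delivered to warehouse
```

A vector store does essentially this, but with approximate nearest-neighbor indexes so it stays fast over millions of documents.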
Remember, vector stores arrange data based on semantic meaning. This means search results come with a score indicating how closely they match your query.
Watch it in action. Execute this query in a new cell:
results = db.similarity_search(
    "What's the latest on the warehouse?",
)

for res in results:
    print(f"* {res.page_content}")
You used the similarity_search function to query your database. It returned:
* 20 tons of cocoa have been deposited at Warehouse AX749
* New multi-modal learning content about AI is ready from Kodeco.
* The National Geographic Society has discovered a new species of
aquatic animal, off the coast of Miami. They have been exploring
at 8000 miles deep in the Pacific Ocean. They believe there's
a lot more to learn from the oceans.
* For the first time in 1200 years, the Kalahari desert receives 200ml of rain.
You have stored five documents. When you ran a query, it returned four. However, only the first document directly relates to your query. Do you need that many documents? Additionally, you might notice that the best-matching results appear first, with the relevance decreasing for subsequent documents. To address this, you should limit the results to a maximum of two in the next query and use the metadata to exclude documents and enhance the search results.
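Conceptually, k caps how many results come back, and filter restricts the candidates by metadata before ranking. This toy sketch mimics that behavior with hard-coded scores; the scores are invented for illustration and this is not Chroma’s internal implementation:

```python
# Invented (document, metadata, score) triples for illustration only.
scored_results = [
    ("20 tons of cocoa have been deposited at Warehouse AX749",
     {"source": "messaging_api"}, 0.92),
    ("New multi-modal learning content about AI is ready from Kodeco.",
     {"source": "kodeco_rss_feed"}, 0.41),
    ("For the first time in 1200 years, the Kalahari desert receives 200ml of rain.",
     {"source": "tweet"}, 0.28),
]

def search(results, k=4, filter=None):
    """Keep only results whose metadata matches the filter, then return the top k."""
    if filter:
        results = [r for r in results
                   if all(r[1].get(key) == value for key, value in filter.items())]
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    return ranked[:k]

hits = search(scored_results, k=2, filter={"source": "messaging_api"})
print([doc for doc, _, _ in hits])  # only the warehouse document survives the filter
```

Because the filter runs before the cap, you can ask for k=2 and still get back a single document when only one matches the metadata.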
results = db.similarity_search(
    "What's the latest on the warehouse?",
    k=2,
    filter={"source": "messaging_api"},
)

for res in results:
    print(f"* {res.page_content}")
This time, it returned only one document, which turned out to be the most relevant to the query:
* 20 tons of cocoa have been deposited at Warehouse AX749
Ranking Results With Similarity Scores
Chroma also offers the similarity_search_with_score() function, which not only returns relevant documents but also a similarity score for each. This score quantifies how closely a document’s embedding aligns with your query’s. You can use these scores to filter out less-relevant results or even incorporate them into your application’s logic.
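For example, you could use the score to drop weak matches before handing results to a language model. One caveat: depending on the configured metric, the raw score may be a distance rather than a similarity; in Chroma’s default setup, lower scores mean closer matches. A small sketch with invented scores and a hypothetical cutoff:

```python
# Invented (document, score) pairs; with a distance metric, LOWER means closer.
scored = [
    ("New multi-modal learning content about AI is ready from Kodeco.", 0.21),
    ("For the first time in 1200 years, the Kalahari desert receives 200ml of rain.", 0.87),
]

THRESHOLD = 0.5  # hypothetical cutoff; tune it for your data and metric
relevant = [doc for doc, distance in scored if distance <= THRESHOLD]
print(relevant)  # only the close match remains
```

Try the real function and inspect a few scores for your own data before settling on a threshold: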
results = db.similarity_search_with_score(
    "Where can I find tutorials on AI?",
    k=1,
    filter={"source": "kodeco_rss_feed"},
)

for res, score in results:
    print(f'''
    similarity_score: {score:.3f}
    content: {res.page_content}
    source: {res.metadata['source']}
    ''')