In this demo, you’ll learn how to use Chroma with OpenAI and LangChain. Thanks to LangChain, the interface for working with different vector databases is remarkably consistent. In this section, you’ll focus on Chroma, but remember that you can readily substitute it with another supported database if you prefer.
Getting Started with Chroma
Chroma is an open-source vector database designed with developer productivity in mind. To install the necessary LangChain integration, return to your terminal and execute:
pip install langchain-chroma
Now, create a database and set up Chroma:
from langchain_chroma import Chroma
db = Chroma(
    embedding_function=embeddings_model,
)
You've initialized Chroma by providing an embedding model. Note that you can leave out the api_key attribute when creating an OpenAI embeddings model; it'll automatically fetch it from your environment, reading it from an OPENAI_API_KEY variable by default.
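If you haven't set that variable yet, a minimal sketch of doing so from Python looks like this. The key value below is a placeholder; in practice, export the variable in your shell or a .env file rather than hard-coding it in a notebook:

```python
import os

# Placeholder key for illustration only -- never commit a real key to code.
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"

# Libraries like langchain-openai read this variable when no api_key is passed.
print("OPENAI_API_KEY" in os.environ)
```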
By default, Chroma stores data in memory. However, that means your data will be lost when the app restarts. You'll configure Chroma to store your data on disk instead.
Also, you need to organize your data efficiently. Just as you'd use tables in SQL databases or collections in NoSQL databases, you specify a collection name in Chroma to group related data. Update your Chroma initialization code to include these improvements:
db = Chroma(
    collection_name="speech_collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
With these changes, your data will be saved to disk and organized within the “speech_collection.”
Populating Chroma With Data
Next, insert data into your Chroma database. LangChain abstracts away the low-level details, so you’ll work with LangChain document objects to represent your data.
In a new cell, add the following code:
from uuid import uuid4
from langchain_core.documents import Document
document_1 = Document(
    page_content="20 tons of cocoa have been deposited at Warehouse AX749",
    metadata={"source": "messaging_api"},
    id=1,
)
document_2 = Document(
    page_content="The National Geographic Society has discovered a new species "
    "of aquatic animal, off the coast of Miami. They have been exploring at "
    "8000 miles deep in the Pacific Ocean. They believe there's a lot "
    "more to learn from the oceans.",
    metadata={"source": "news"},
    id=2,
)
document_3 = Document(
    page_content="Martin Luther King's speech, I Have a Dream, remains "
    "one of the world's greatest ever. Here's everything he said "
    "in 5 minutes.",
    metadata={"source": "website"},
    id=3,
)
document_4 = Document(
    page_content="For the first time in 1200 years, the Kalahari "
    "desert receives 200ml of rain.",
    metadata={"source": "tweet"},
    id=4,
)
document_5 = Document(
    page_content="New multi-modal learning content about AI is ready "
    "from Kodeco.",
    metadata={"source": "kodeco_rss_feed"},
    id=5,
)
documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
db.add_documents(ids=uuids, documents=documents)
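As a quick aside, uuid4() produces random identifiers that are, for practical purposes, collision-free, so every document gets a unique ID. A standalone sanity check (no Chroma required):

```python
from uuid import uuid4

# Generate IDs the same way as above and confirm they're unique,
# canonically formatted strings (8-4-4-4-12 hex digits, 36 chars).
ids = [str(uuid4()) for _ in range(5)]
print(len(ids), len(set(ids)))         # count vs. unique count
print(all(len(i) == 36 for i in ids))
```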
So far, so good. Now, here comes some of the beauty of working with vector data stores: the search capability. Traditional SQL or NoSQL databases demand you adhere to specific query syntax, but with vector databases, you interact using natural language — just like talking to a person!
Watch it in action. Execute this query in a new cell:
results = db.similarity_search(
    "What's the latest on the warehouse?",
)
for res in results:
    print(f"* {res.page_content}")
You used the similarity_search function to query your database. It returned:
* 20 tons of cocoa have been deposited at Warehouse AX749
* New multi-modal learning content about AI is ready from Kodeco.
* The National Geographic Society has discovered a new species of
aquatic animal, off the coast of Miami. They have been exploring
at 8000 miles deep in the Pacific Ocean. They believe there's
a lot more to learn from the oceans.
* For the first time in 1200 years, the Kalahari desert receives 200ml of rain.
You have stored five documents. When you ran a query, it returned four. However, only the first document directly related to your query. Do you need that many documents? Additionally, you might notice that the best matching result appears first, with the relevance decreasing for subsequent documents. To address this, you should limit the results to a handful at the top of the list and use the metadata to exclude documents and improve the search results.
results = db.similarity_search(
    "What's the latest on the warehouse?",
    k=2,
    filter={"source": "messaging_api"},
)
for res in results:
    print(f"* {res.page_content}")
This time, it returned only one document, which turned out to be the most relevant to the query:
* 20 tons of cocoa have been deposited at Warehouse AX749
Ranking Results With Similarity Scores
Chroma also offers the similarity_search_with_score() function, which not only returns relevant documents but also a similarity score for each. This score quantifies how closely a document’s embedding aligns with your query’s. You can use these scores to filter out less-relevant results or even incorporate them into your application’s logic.
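To build intuition for what such a score measures, here's a minimal, library-free sketch of cosine similarity between two embedding vectors. The tiny three-dimensional vectors are made up for illustration; real OpenAI embeddings have far more dimensions, and depending on Chroma's configured distance metric, the returned score may instead be a distance, where lower means closer:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.2, 0.8, 0.1]    # made-up "query" embedding
doc_vec = [0.25, 0.75, 0.15]   # made-up "document" embedding

# Vectors pointing in nearly the same direction score close to 1.0.
print(round(cosine_similarity(query_vec, doc_vec), 3))
```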
results = db.similarity_search_with_score(
    "Where can I find tutorials on AI?",
    k=1,
    filter={"source": "kodeco_rss_feed"},
)
for res, score in results:
    print(f'''
    similarity_score: {score:.3f}
    content: {res.page_content}
    source: {res.metadata['source']}
    ''')
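One way to feed those scores into application logic, as mentioned above, is a simple cutoff filter. The (content, score) pairs and the threshold below are made up for illustration, and this sketch assumes distance-style scores where lower means more similar; flip the comparison if your scores are similarities:

```python
# Hypothetical (content, distance) pairs like those returned by
# similarity_search_with_score; lower distance = closer match (assumption).
results = [
    ("New multi-modal learning content about AI is ready from Kodeco.", 0.21),
    ("For the first time in 1200 years, the Kalahari desert "
     "receives 200ml of rain.", 0.93),
]

CUTOFF = 0.5  # made-up threshold; tune it against your own data
relevant = [content for content, distance in results if distance < CUTOFF]
print(relevant)
```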