Query analysis is a query-optimization technique that refines a user's query before retrieval so that the search returns more relevant results. Install the following packages to begin:
pip install -qU langchain langchain-community langchain-openai langchain-chroma wikipedia
Open the starter project. The first cell sets things up for SportsBuddy. In the second cell, you load a list of sports articles from Wikipedia. Then, there are two helper functions: the first rounds a number to the nearest thousand, and the other prints the title and approximate number of words in a document. Under # TODO: Add 'words' metadata, add the following to print those articles' titles and their words metadata:
for doc in docs:
    doc.metadata["words"] = round_to_nearest_thousand(
        len(doc.page_content.split(" "))
    )
    print_summary(doc)
    print()
The words metadata approximates the number of words in the article to the nearest 1000. Execute the first two cells, and you'll see the following output for the second cell:
Title: 2022 Ballon d'Or - Wikipedia
Approximate Word Count: 4000
Title: 2023 Ballon d'Or - Wikipedia
Approximate Word Count: 2000
Title: 2022–23 NBA season - Wikipedia
Approximate Word Count: 10000
Title: 2021–22 NBA season - Wikipedia
Approximate Word Count: 14000
Title: 2022–23 Premier League - Wikipedia
Approximate Word Count: 8000
Title: 2021–22 Premier League - Wikipedia
Approximate Word Count: 7000
Title: 2021–22 UEFA Champions League - Wikipedia
Approximate Word Count: 6000
Title: 2022–23 UEFA Champions League - Wikipedia
Approximate Word Count: 4000
Title: 2023 Cricket World Cup - Wikipedia
Approximate Word Count: 6000
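The two helpers called in the loop above, round_to_nearest_thousand and print_summary, aren't shown in this section. The following is a minimal sketch of what they might look like, not the starter project's actual code. It assumes the page title lives under metadata["title"], and FakeDocument is a hypothetical stand-in for a LangChain Document so the snippet runs on its own:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (assumption)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def round_to_nearest_thousand(count: int) -> int:
    # A negative ndigits rounds to the left of the decimal point.
    # Exact ties (e.g. 2500) go to the even thousand, fine for an estimate.
    return int(round(count, -3))


def print_summary(doc) -> None:
    # Assumes the loader stored the page title under metadata["title"].
    print(f"Title: {doc.metadata.get('title', 'Unknown')}")
    print(f"Approximate Word Count: {doc.metadata.get('words', 'Unknown')}")


# Made-up document, used only to exercise the helpers.
doc = FakeDocument(page_content="word " * 3870, metadata={"title": "Example article"})
doc.metadata["words"] = round_to_nearest_thousand(len(doc.page_content.split(" ")))
print_summary(doc)
```

Because the count is stored as a rounded integer, a later metadata filter can match it with a simple equality check.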
In the third cell, you partition the loaded documents into chunks and embed them in a Chroma database. Create a new cell to try out a query on the vector store:
search_results = database.similarity_search("Who won the 2022 ballon d'or?")
print_summary(search_results[0])
From previous lessons, you learned that similarity search by default returns a list of documents, with the most relevant at the top. So in this code, you printed only the first document from the results.
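To make the "most relevant at the top" ordering concrete, here's a toy sketch of ranking by cosine similarity. The three-dimensional vectors are invented purely for illustration, and cosine similarity is only one common choice; the metric your vector store actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, documents):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hand-made "embeddings" (assumption: real ones have hundreds of dimensions).
toy_documents = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(toy_query, toy_documents)
print(ranking[0][0])  # the closest document comes first
```

Taking index 0 of the ranked list is the same move as search_results[0] in the cell above.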
Execute the cell and observe the results:
Title: 2022 Ballon d'Or - Wikipedia
Approximate Word Count: 4000
From this result, you can tell it has the right document as the first item. Now, test a query that references information from the metadata, like the word count.
Add a new cell and execute the following:
search_results = database.similarity_search(
    "Suggest a sports article with approximately 14000 words"
)
print_summary(search_results[0])
You get:
Title: 2023 Cricket World Cup - Wikipedia
Approximate Word Count: 6000
You can tell that it ignored the “14000 words” part of your query. Otherwise, it should have returned the basketball article titled “2021–22 NBA season - Wikipedia” because it has the number of words requested.
You can use query analysis to fix this by generating a query that includes the words metadata as a filter.
With LangChain, you can achieve this by creating a structured output based on your initial prompt. Extending the BaseModel, you can add new fields or filters to your search query.
Add the following code in a new cell to define the structure for the generated query:
from typing import Optional
from pydantic import BaseModel, Field

class SportsSearch(BaseModel):
    """Search over a database of sports articles."""
    query: str = Field(
        ...,
        description="Similarity search query applied to sports articles.",
    )
    words: Optional[int] = Field(None, description="Number of words in article")
SportsSearch will contain your original search as its query property and the article's word count as an optional words property.
Next, you'll use the OpenAI LLM to generate the new prompt. Add the following to a new cell to create the prompt chain with LangChain:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
system = """You are an expert at converting user questions into database queries. \
You have access to a database of sports articles. \
Given a question, return a list of database queries optimized to retrieve the most relevant results.
If there are acronyms or words you are not familiar with, do not try to rephrase them."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(SportsSearch)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm
The important things to note here are the system prompt and the LLM's temperature. The system prompt is carefully worded to return an improved query suitable for a database search. You also asked it not to rephrase acronyms or words it isn't familiar with.
By setting the temperature to 0, you've told your LLM not to attempt to be creative with its responses. It will stick strictly to the content of the given query.
Next, create the filter function for the new search query:
from typing import List
from langchain_core.documents import Document

def retrieve_by_metadata(search: SportsSearch) -> List[Document]:
    # Only filter on word count when the analyzer extracted one.
    if search.words is not None:
        _filter = {"words": {"$eq": search.words}}
    else:
        _filter = None
    return database.similarity_search(search.query, filter=_filter)
This function adds a filter to the database during the search. If the generated query includes the words property, it's included in the filter. Otherwise, it's not. The way this filter is constructed depends on the fact that you're using Chroma. For a different database, you'd have to comply with its API.
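The exact $eq match in retrieve_by_metadata only works when the requested count is already a round thousand. As a hedged sketch, the hypothetical helper below (build_words_filter is not part of the lesson) rounds the request first and can optionally widen the match to a range. It assumes your Chroma version accepts the $gte, $lte and $and operators in its metadata filter syntax:

```python
from typing import Optional


def build_words_filter(words: Optional[int], tolerance: int = 0) -> Optional[dict]:
    """Build a Chroma-style metadata filter for the 'words' field."""
    if words is None:
        # No word count requested: search without a filter.
        return None
    # The stored metadata is rounded to the nearest thousand, so round the request too.
    nearest = int(round(words, -3))
    if tolerance == 0:
        return {"words": {"$eq": nearest}}
    # Assumption: $and / $gte / $lte are supported by the vector store.
    return {
        "$and": [
            {"words": {"$gte": nearest - tolerance}},
            {"words": {"$lte": nearest + tolerance}},
        ]
    }


print(build_words_filter(13600))
print(build_words_filter(6000, tolerance=1000))
```

You could pass the returned dictionary as the filter argument of database.similarity_search, just as _filter is passed above.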
If the query analyzer doesn't find anything in your query that fits the “words” filter, words is None. Is this new query as good as the first? Try it directly on the database to find out:
retrieval_chain = query_analyzer | retrieve_by_metadata
search_results = retrieval_chain.invoke(
    "Suggest a sports article with approximately 14000 words"
)
print_summary(search_results[0])
Check the results:
Title: 2021–22 NBA season - Wikipedia
Approximate Word Count: 14000
Excellent. To show that it takes the full query into account and not just the word count filter, update your query to search for a football article with about 6,000 words. From the second cell's output, you can tell two articles have the same approximate length. One is about football and the other is about cricket:
search_results = retrieval_chain.invoke(
    "Suggest a football article with approximately 6000 words"
)
print_summary(search_results[0])
It shows the 2021–22 UEFA Champions League document, which is a football article and has approximately 6,000 words.
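To see why both parts of the query matter, here's a toy sketch of the filtering step alone, using the titles and word counts printed earlier. The regex-based extract_word_count is only a stand-in for the LLM query analyzer, and both function names are hypothetical:

```python
import re
from typing import List, Optional

# Titles and approximate word counts taken from the output shown earlier.
ARTICLES = {
    "2022 Ballon d'Or - Wikipedia": 4000,
    "2023 Ballon d'Or - Wikipedia": 2000,
    "2022–23 NBA season - Wikipedia": 10000,
    "2021–22 NBA season - Wikipedia": 14000,
    "2022–23 Premier League - Wikipedia": 8000,
    "2021–22 Premier League - Wikipedia": 7000,
    "2021–22 UEFA Champions League - Wikipedia": 6000,
    "2022–23 UEFA Champions League - Wikipedia": 4000,
    "2023 Cricket World Cup - Wikipedia": 6000,
}


def extract_word_count(question: str) -> Optional[int]:
    # Stand-in for the LLM: grab a number that directly precedes "words".
    match = re.search(r"(\d[\d,]*)\s+words", question)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def candidates_after_filter(question: str) -> List[str]:
    # Keep only the articles whose stored word count equals the requested one.
    words = extract_word_count(question)
    if words is None:
        return list(ARTICLES)
    return [title for title, count in ARTICLES.items() if count == words]


print(candidates_after_filter("Suggest a football article with approximately 6000 words"))
```

The 6000-word filter leaves two candidates, so the similarity search on the remaining query text is what picks the football article over the cricket one.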
As you can see, query analysis can be a great way to boost the performance of your RAG. That's all for this demo. Continue on to learn more about RAG optimizations.
This content was released on Nov 12, 2024. The official support period is 6 months from this date.
Demonstrate how to improve your RAG with query analysis.