Inverted Indexes Up Close: how a search engine actually finds matches
Match the datastore to the access pattern.
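Stemming can over-collapse terms: an aggressive stemmer may reduce unrelated words to the same stem (the Porter stemmer maps both "university" and "universe" to "univers"), so a query for one matches documents about the other.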
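An inverted index tokenizes text, normalizes each token (lowercasing, stemming), and stores 'term → list of (doc_id, freq, positions)'; each such list is a posting list. AND queries intersect the posting lists of their terms, and the matching documents are ranked with TF-IDF or BM25.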
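The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the names tokenize, build_index, intersect and search_and are made up for this example, stemming is left out, and freq is not stored separately because it equals the length of the positions list.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumerics (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build term -> {doc_id: [positions]}; freq is len(positions)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def intersect(a, b):
    """Merge-intersect two sorted doc_id lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, query):
    """Return the sorted doc_ids that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    lists = [sorted(index.get(t, {})) for t in terms]
    lists.sort(key=len)  # start with the shortest list to shrink work early
    result = lists[0]
    for lst in lists[1:]:
        result = intersect(result, lst)
    return result


docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "Quick thinking saves the day",
}
index = build_index(docs)
print(search_and(index, "quick the"))  # [1, 3]
print(index["brown"])                  # {1: [2], 2: [2]}
```

Intersecting the shortest list first is a common heuristic: the running result can never grow, so rare terms prune the candidates early.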
TF-IDF: term frequency × inverse document frequency. A term that occurs often in a document scores higher; a term that occurs in many documents has low IDF and gets less weight.
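A small worked example of the weighting, assuming one common variant: raw term count for TF and log(N / df) for IDF, with no smoothing. Real engines use other variants, and BM25 adds saturation and length normalization. The function name tf_idf_scores and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query_terms):
    """Score each doc as the sum over query terms of tf * log(N / df)."""
    n_docs = len(docs)
    counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}

    # df[term] = number of documents that contain the term at least once
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    scores = {}
    for doc_id, c in counts.items():
        score = 0.0
        for term in query_terms:
            if c[term]:  # a term absent from the doc contributes nothing
                score += c[term] * math.log(n_docs / df[term])
        scores[doc_id] = score
    return scores


docs = {
    1: ["the", "cat", "sat"],
    2: ["the", "dog", "sat"],
    3: ["the", "cat", "cat"],
}
scores = tf_idf_scores(docs, ["the", "cat"])
for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 3))
```

Here "the" appears in all three documents, so its IDF is log(3/3) = 0 and it contributes nothing; the ranking is decided entirely by "cat", and document 3 (two occurrences) outranks document 1 (one occurrence).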
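Phrase queries use the stored positions to check that the terms occur at consecutive positions in a document; proximity queries relax this to "within a window of k positions".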
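Both checks can be sketched over positional postings of the form term → {doc_id: sorted positions}. The helper names phrase_match and within_window and the sample postings are hypothetical, and the proximity check is a simple quadratic scan rather than an optimized merge.

```python
def phrase_match(postings, phrase_terms):
    """Return sorted doc_ids where phrase_terms occur at consecutive positions."""
    if not phrase_terms or any(t not in postings for t in phrase_terms):
        return []
    # Only documents containing every term can contain the phrase.
    candidates = set(postings[phrase_terms[0]])
    for t in phrase_terms[1:]:
        candidates &= set(postings[t])

    hits = []
    for doc_id in sorted(candidates):
        later = [set(postings[t][doc_id]) for t in phrase_terms[1:]]
        for start in postings[phrase_terms[0]][doc_id]:
            # The k-th following term must sit exactly k + 1 slots after start.
            if all(start + k + 1 in pos for k, pos in enumerate(later)):
                hits.append(doc_id)
                break
    return hits


def within_window(postings, a, b, window):
    """Return sorted doc_ids where a and b occur at most `window` positions apart."""
    if a not in postings or b not in postings:
        return []
    hits = []
    for doc_id in sorted(set(postings[a]) & set(postings[b])):
        if any(abs(pa - pb) <= window
               for pa in postings[a][doc_id]
               for pb in postings[b][doc_id]):
            hits.append(doc_id)
    return hits


# Doc 1 reads "new york city"; doc 2 has "york" at position 0 and "new" at 2.
postings = {
    "new": {1: [0], 2: [2]},
    "york": {1: [1], 2: [0]},
    "city": {1: [2]},
}
print(phrase_match(postings, ["new", "york"]))    # [1]
print(within_window(postings, "new", "york", 2))  # [1, 2]
```

Document 2 contains both words but in the wrong order and two positions apart, so it fails the phrase query and passes the proximity query with a window of 2.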
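Index size grows with the total number of postings (vocabulary size × average posting-list length); compress posting lists by storing the gaps between sorted doc_ids and encoding them with VByte (variable-byte) or FOR (frame of reference).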
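A sketch of gap encoding plus VByte, to show why sorted posting lists compress well. VByte has several byte-layout conventions; this one is an assumption for illustration: 7 payload bits per byte, low-order bits first, with the high bit set on every byte except the last of each number. The function names are made up, and FOR is not shown.

```python
def to_gaps(doc_ids):
    """Turn sorted doc_ids into gaps: [3, 7, 150] -> [3, 4, 143]."""
    gaps = []
    prev = 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: a running sum restores the doc_ids."""
    ids = []
    total = 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def vbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte, high bit = more bytes follow."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("VByte encodes non-negative integers only")
        while n >= 128:
            out.append((n & 0x7F) | 0x80)  # low 7 bits plus continuation flag
            n >>= 7
        out.append(n)  # final byte has the high bit clear
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers = []
    n = shift = 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to this number
        else:
            numbers.append(n)
            n = shift = 0
    return numbers


doc_ids = [3, 7, 150, 100000]
encoded = vbyte_encode(to_gaps(doc_ids))
print(len(encoded))                      # 7
print(from_gaps(vbyte_decode(encoded)))  # [3, 7, 150, 100000]
```

The four doc_ids take 7 bytes here, against 16 bytes as fixed 32-bit integers. Small gaps fit in one byte each, which is why dense posting lists for frequent terms benefit most.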
Implement a tiny inverted index.