Faceted search is a parametric search in which the user can see upfront how the results are distributed across different categories (facets). A further improvement is for the system to suggest the most relevant facets depending on the type of search. For example, if the user searches:
- for a screen, resolution and diagonal are relevant parameters
- for a fridge, volume and energy efficiency are relevant parameters
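To make the idea of "distribution of results by facet" concrete, here is a minimal sketch that counts how many items fall under each value of a facet. The `Product` shape, the sample data and the `facetCounts` helper are assumptions made up for illustration; they are not taken from any library mentioned in this post.

```typescript
// Hypothetical product shape, used only for illustration.
type Product = { name: string; category: string; brand: string };

// Count how many items fall under each value of the given facet field.
function facetCounts<T, K extends keyof T>(items: T[], field: K): Map<T[K], number> {
  const counts = new Map<T[K], number>();
  for (const item of items) {
    const value = item[field];
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const products: Product[] = [
  { name: "Monitor A", category: "screen", brand: "Acme" },
  { name: "Monitor B", category: "screen", brand: "Globex" },
  { name: "Fridge C", category: "fridge", brand: "Acme" },
];

// These counts are what a UI would show next to each checkbox.
const byCategory = facetCounts(products, "category");
const byBrand = facetCounts(products, "brand");
```

A real search engine computes these counts over the current result set rather than over the whole catalogue, but the output has the same shape: a value and the number of matching items.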
E-commerce example #
The classic example of faceted search is search on an e-commerce website.
Typically you would see the following filters:
Additionally, there would be:
- results displayed as a grid (first screenshot) or as a list
- sorting, for example by price, by popularity or relevance
Rental website #
The idea is the same, but additionally:
- results can be displayed on a map
- filters may include a date range
Similar ideas can be seen in data tables and in computational notebooks, such as Jupyter, Kaggle, Observable, etc.:
Or in some plotting libraries:
|Range slider||Scatterplot Matrix (SPLOM)|
A typical solution is to use a search engine with support for faceted search, for example:
- There is an attempt to compile it to WASM
- Not a typical choice, but it may also work: DuckDB, because it has Full Text Search and GROUPING SETS
Besides the backend, you need some kind of UI. There are a lot of candidates:
- instantsearch: Plain JS, React, Vue, Angular
- reactivesearch: React, Vue
- AddSearch/search-ui: Plain JS
- coveo/search-ui: Plain JS
- sajari/search-ui: React
- Flowbite: Tailwind CSS Faceted Search Drawers
In a similar way, we can use faceted search on the client side. I found 3 libraries:
I decided to try them out. I started with Tanstack table and shadcn/ui (React, Radix, TailwindCSS). Then I replaced the faceting capabilities with Orama, but preserved the UI. Then I replaced the faceting capabilities with ItemsJS.
I found a couple of datasets for the demo:
The demo is not ideal, but it is enough to compare the approaches:
- For the filter with checkboxes I use the Command component, which is probably wrong. Instead, the component should be able to load more options and use some kind of fuzzy search
- The filter with a slider lacks number marks. See #1188
- Filters should be collapsible, like the Accordion component
- I need to store the filter state in the URL
- The UI sometimes “jumps”: the scroll position changes unexpectedly
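For the "state of filter in URL" item, one possible approach is to serialize the selected options into a query string and parse it back on load. The sketch below is an assumption about how this could be done, not what the demo does. The `FilterState` type and both helper names are hypothetical, and it only covers checkbox-style filters.

```typescript
// Hypothetical filter state: facet name -> selected option values.
type FilterState = Record<string, string[]>;

// Serialize the state as "facet=value1,value2&other=value3".
// encodeURIComponent escapes "," "&" and "=", so the separators stay unambiguous.
function encodeFilters(state: FilterState): string {
  return Object.keys(state)
    .sort() // stable key order, so equal states produce equal URLs
    .filter((key) => (state[key] ?? []).length > 0)
    .map((key) => {
      const values = (state[key] ?? []).map((v) => encodeURIComponent(v)).join(",");
      return `${encodeURIComponent(key)}=${values}`;
    })
    .join("&");
}

// Parse the query string produced by encodeFilters back into a state object.
function decodeFilters(query: string): FilterState {
  const state: FilterState = {};
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip empty or malformed pairs
    const key = decodeURIComponent(pair.slice(0, eq));
    const values = pair.slice(eq + 1).split(",").filter((v) => v !== "");
    state[key] = values.map((v) => decodeURIComponent(v));
  }
  return state;
}

const query = encodeFilters({ brand: ["Acme", "Globex"], color: ["dark red"], size: [] });
const restored = decodeFilters(query);
```

Facets with no selection are dropped from the URL, so the default state maps to an empty query string.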
Tanstack table native faceting #
I’m impressed by Tanstack table: it packs so many features and has an elegant API layer.
- Filter with checkboxes
  - Options should be sorted by frequency
  - Options should be limited to the first 10-20, with the ability to fetch more on request
- Search and sorting are done in the main thread, so there is slight latency on keyboard input
- There is no full-text search (only substring match), but this is irrelevant, because I’m mainly interested in faceting
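The two wishes for the checkbox filter (sort by frequency, show only the first N) can be layered on top of a map of value counts. As far as I know, Tanstack table's `column.getFacetedUniqueValues()` returns such a `Map` of value to count. The `topOptions` helper below is my own hypothetical function, and the sample counts are made up.

```typescript
// Sort facet options by descending count and keep the first `limit`.
// Ties are broken alphabetically so the order is stable between renders.
function topOptions(counts: Map<string, number>, limit: number): Array<[string, number]> {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Made-up counts standing in for the faceted unique values of one column.
const colorOptionCounts = new Map<string, number>([
  ["red", 3],
  ["blue", 7],
  ["green", 3],
  ["black", 1],
]);

// Only the three most frequent options are rendered; the rest can be loaded on request.
const visibleOptions = topOptions(colorOptionCounts, 3);
```

"Fetch more on request" then amounts to calling the helper again with a larger `limit`.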
Tanstack table + Orama #
I wanted to preserve the same UI, so I integrated Orama into Tanstack table.
The initial load of the data (10000 records) was so slow that I had to move it into a Web Worker. Later, I limited the demo to 1000 records.
Orama has decent full-text search, but faceting is sad:
- Filter with checkboxes
  - Options for string facets are sorted by frequency, but for
  - When an option is selected, it removes values from the same facet, but instead it should only change other facets
  - There is no way to limit the number of options returned for a facet
- Filter with slider
  - There are no min and max values for facets, so this filter in the demo is broken
There are also other small bugs.
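The behaviour I expect, where selecting an option changes only the other facets, is often called disjunctive faceting. A minimal sketch: when counting options for a facet, apply every filter except that facet's own. This is not Orama's API; the `Item` and `Selection` types and both functions are hypothetical.

```typescript
type Item = Record<string, string>;
// Facet name -> selected values (an empty array means "no filter").
type Selection = Record<string, string[]>;

// An item matches when its value is among the selected ones for every filtered facet.
// `skip` names one facet whose own selection is ignored.
function matches(item: Item, selection: Selection, skip?: string): boolean {
  return Object.keys(selection).every((facet) => {
    const selected = selection[facet] ?? [];
    if (facet === skip || selected.length === 0) return true;
    const value = item[facet];
    return value !== undefined && selected.indexOf(value) !== -1;
  });
}

// Counts for one facet are computed with that facet's own selection ignored,
// so unselected options in the same facet stay visible.
function disjunctiveCounts(items: Item[], selection: Selection, facet: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    const value = item[facet];
    if (value === undefined || !matches(item, selection, facet)) continue;
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}

const shirts: Item[] = [
  { color: "red", size: "S" },
  { color: "red", size: "M" },
  { color: "blue", size: "M" },
];
const selected: Selection = { color: ["red"] };

const colorCounts = disjunctiveCounts(shirts, selected, "color"); // "blue" is still listed
const sizeCounts = disjunctiveCounts(shirts, selected, "size"); // narrowed to red items
```

With "red" selected, the color facet still offers "blue", while the size facet only counts red items.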
Tanstack table + ItemsJS #
ItemsJS focuses on faceting, and full-text search is outsourced - by default, it uses Lunr. But you can switch to another solution, for example, minisearch.
The secret sauce is FastBitSet.js.
- moving selected options to top
- limiting number of options per facet
- min, max values for numerical facets
- preserving unselected options in facets
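To illustrate why bitsets help with faceting, here is a rough sketch of the general technique: keep one bitset of record ids per facet value, and get counts by intersecting bitsets. This is neither ItemsJS code nor the FastBitSet.js API. The `BitSet` class, `buildIndex` and the sample records are written from scratch for this example.

```typescript
// A tiny fixed-size bitset; each bit position is a record id.
class BitSet {
  private words: Uint32Array;

  constructor(size: number) {
    this.words = new Uint32Array(Math.ceil(size / 32));
  }

  add(id: number): void {
    this.words[id >>> 5] |= 1 << (id & 31);
  }

  // Number of ids present in both sets, without materializing the intersection.
  intersectionCount(other: BitSet): number {
    let total = 0;
    for (let i = 0; i < this.words.length; i++) {
      let w = this.words[i] & other.words[i];
      // Kernighan's popcount: clear the lowest set bit until none remain.
      while (w !== 0) {
        w &= w - 1;
        total++;
      }
    }
    return total;
  }
}

// Build one bitset per distinct value; the array position is the record id.
function buildIndex(values: string[]): Map<string, BitSet> {
  const index = new Map<string, BitSet>();
  values.forEach((value, id) => {
    let set = index.get(value);
    if (!set) {
      set = new BitSet(values.length);
      index.set(value, set);
    }
    set.add(id);
  });
  return index;
}

// Made-up records for illustration.
const records = [
  { brand: "Acme", color: "red" },
  { brand: "Acme", color: "blue" },
  { brand: "Globex", color: "red" },
  { brand: "Acme", color: "red" },
];
const brandIndex = buildIndex(records.map((r) => r.brand));
const colorIndex = buildIndex(records.map((r) => r.color));

// With brand = Acme selected, color counts are intersections with the Acme bitset.
const acme = brandIndex.get("Acme")!;
const redWithinAcme = colorIndex.get("red")!.intersectionCount(acme);
const blueWithinAcme = colorIndex.get("blue")!.intersectionCount(acme);
```

Each count costs a handful of bitwise operations per 32 records, which is why this approach stays fast as the number of facet values grows.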
I found almost no downsides, except:
- TypeScript signatures can be better (
- For a one-letter search, it returns an empty result, but I think this is due to the full-text search engine
Other things to try:
- integrate a different full-text or fuzzy search engine
- move it to a Web Worker
- integrate with Instantsearch
- implement a slider component with a mini-plot
- implement a date-range component
- implement a hierarchical categories component, like a file tree
Other ideas and links #
Prebuild index for static websites #
A typical solution for search on static websites, such as those built with Hugo, is to load the data as JSON into memory and then index it. Is there a way to build the index upfront and fetch it from the server with HTTP range requests? It could use a read-optimized format, like Arrow.
- stork (deprecated) has a CLI for building the index and a JS library to consume it.
- orama/plugin-data-persistence can store index data as JSON or as dpack, but I'm not sure whether it stores the raw data or the index.
- Pagefind has a CLI for building the index and a JS library to consume it. It stores the index as CBOR.
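As a thought experiment for the range-request idea: if the prebuilt index were a single file plus a small manifest saying where each chunk lives, the client would only need to turn a manifest entry into a `Range` header. None of the tools listed above is claimed to work this way. The `ChunkLocation` type, the sample offsets and the `rangeHeader` helper are hypothetical.

```typescript
// Hypothetical manifest entry: where one index chunk lives inside a single index file.
type ChunkLocation = { offset: number; length: number };

// Build the value of an HTTP Range header for one chunk.
// Byte ranges are inclusive on both ends, hence the "- 1".
function rangeHeader(chunk: ChunkLocation): string {
  const last = chunk.offset + chunk.length - 1;
  return `bytes=${chunk.offset}-${last}`;
}

// Made-up location of one chunk: 512 bytes starting at byte 1024.
const chunkB: ChunkLocation = { offset: 1024, length: 512 };
const header = rangeHeader(chunkB);
```

The resulting string would be sent as the `Range` request header, for example via `fetch(url, { headers: { Range: header } })`. The server has to support range requests, in which case it answers with status 206 and only the requested bytes.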