Full-Text Search
ParticleDB provides text search through SQL pattern matching operators and string functions. For semantic search, combine these with Vector Search using hybrid RAG pipelines.
Case-sensitive pattern matching with two wildcards:
%matches any sequence of zero or more characters_matches exactly one character
-- Find emails from a specific domainSELECT name, emailFROM usersWHERE email LIKE '%@company.com';
-- Find 3-letter product codes starting with "A"SELECT * FROM productsWHERE sku LIKE 'A__';
-- Find names starting with "Jo"SELECT * FROM usersWHERE name LIKE 'Jo%';
-- Combine wildcards: names containing "son" anywhereSELECT * FROM usersWHERE name LIKE '%son%';NOT LIKE
Section titled “NOT LIKE”Negate the pattern:
-- Users NOT from gmailSELECT name, emailFROM usersWHERE email NOT LIKE '%@gmail.com';Case-insensitive version of LIKE. Same wildcard rules, but ignores case when comparing:
-- Find "smith", "Smith", "SMITH", etc.SELECT * FROM usersWHERE last_name ILIKE '%smith%';
-- Case-insensitive domain searchSELECT * FROM usersWHERE email ILIKE '%@Company.COM';ILIKE is a PostgreSQL extension supported by ParticleDB. Use it whenever case-insensitive matching is needed without manually calling LOWER() on both sides.
Escape Characters
Section titled “Escape Characters”To match a literal % or _ in a LIKE pattern, use a backslash escape:
-- Find discounts stored as "10%", "20%", etc.SELECT * FROM promotionsWHERE label LIKE '%\%%';
-- Find column names containing underscoresSELECT * FROM metadataWHERE column_name LIKE '%\_%';String Functions for Text Search
Section titled “String Functions for Text Search”STARTS_WITH / ENDS_WITH
Section titled “STARTS_WITH / ENDS_WITH”Test whether a string begins or ends with a given prefix or suffix. These return a boolean and are often more readable than equivalent LIKE patterns.
-- Find URLs starting with "https"SELECT * FROM linksWHERE STARTS_WITH(url, 'https');
-- Find files ending with ".csv"SELECT * FROM uploadsWHERE ENDS_WITH(filename, '.csv');
-- Equivalent LIKE patterns (STARTS_WITH = 'prefix%', ENDS_WITH = '%suffix')SELECT * FROM links WHERE url LIKE 'https%';SELECT * FROM uploads WHERE filename LIKE '%.csv';POSITION / STRPOS
Section titled “POSITION / STRPOS”Find the position of a substring within a string. Returns a 1-based index, or 0 if not found.
-- Find the position of "@" in email addressesSELECT email, POSITION(email, '@') AS at_posFROM users;
-- STRPOS is an aliasSELECT email, STRPOS(email, '@') AS at_posFROM users;
-- Filter: only rows where a substring existsSELECT * FROM logsWHERE POSITION(message, 'ERROR') > 0;SUBSTRING / SUBSTR
Section titled “SUBSTRING / SUBSTR”Extract a portion of a string. Uses 1-based indexing.
-- Extract domain from email (after the @)SELECT email, SUBSTRING(email, POSITION(email, '@') + 1) AS domainFROM users;
-- First 10 charactersSELECT SUBSTRING(title, 1, 10) AS title_previewFROM articles;REPLACE
Section titled “REPLACE”Replace all occurrences of a substring:
-- Normalize whitespaceSELECT REPLACE(title, ' ', ' ') AS clean_titleFROM articles;
-- Redact sensitive dataSELECT REPLACE(phone, SUBSTRING(phone, 4, 4), '****') AS masked_phoneFROM users;REGEXP_REPLACE
Section titled “REGEXP_REPLACE”Pattern-based replacement using regular expressions:
-- Extract domain from URLSELECT url, REGEXP_REPLACE(url, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS domainFROM pages;
-- Remove all non-alphanumeric charactersSELECT REGEXP_REPLACE(title, '[^a-zA-Z0-9 ]', '') AS clean_titleFROM articles;ParticleDB automatically detects common regex patterns (like URL domain extraction) and uses optimized string-slicing fast paths instead of the regex engine.
Combining String Functions
Section titled “Combining String Functions”Build complex text processing pipelines by composing functions:
-- Normalize and searchSELECT *FROM productsWHERE LOWER(TRIM(name)) LIKE '%wireless%mouse%';
-- Extract and aggregateSELECT LOWER(SUBSTRING(email, POSITION(email, '@') + 1)) AS domain, COUNT(*) AS user_countFROM usersGROUP BY domainORDER BY user_count DESCLIMIT 10;Text Search with Aggregation
Section titled “Text Search with Aggregation”STRING_AGG
Section titled “STRING_AGG”Concatenate matching text values across rows:
-- Collect all tags for each article into a comma-separated stringSELECT article_id, STRING_AGG(tag, ', ') AS all_tagsFROM article_tagsGROUP BY article_id;Search within Aggregated Text
Section titled “Search within Aggregated Text”-- Find articles whose tags include "machine-learning"SELECT article_id, STRING_AGG(tag, ', ') AS tagsFROM article_tagsGROUP BY article_idHAVING STRING_AGG(tag, ', ') LIKE '%machine-learning%';Hash Functions for Text Fingerprinting
Section titled “Hash Functions for Text Fingerprinting”Use hash functions to create text fingerprints for deduplication or integrity checks:
-- MD5 fingerprint for deduplicationSELECT MD5(content) AS content_hash, COUNT(*) AS duplicatesFROM articlesGROUP BY MD5(content)HAVING COUNT(*) > 1;
-- SHA-256 for content integritySELECT id, title, SHA256(content) AS content_shaFROM documents;Available hash functions: MD5, SHA1, SHA256, SHA512.
Hybrid Search with RAG Pipelines
Section titled “Hybrid Search with RAG Pipelines”For production search applications, combine keyword matching with vector similarity using a hybrid RAG pipeline:
-- Create a hybrid pipeline that combines BM25 keyword scoring-- with vector similarityCREATE RAG PIPELINE article_search TABLE articles VECTOR_COLUMN embedding TEXT_COLUMN content TOP_K 10 MODE hybrid VECTOR_WEIGHT 0.7 MODEL 'text-embedding-3-small';In hybrid mode:
- Vector component (weight 0.7): Finds semantically similar content even when exact keywords don’t match
- BM25 component (weight 0.3): Boosts results that contain the exact query terms
This gives better results than either approach alone — vector search handles paraphrases and synonyms, while BM25 ensures exact keyword matches rank highly.
See Vector Search for full details on RAG pipelines.
Performance Tips
Section titled “Performance Tips”- LIKE patterns starting with
%(e.g.,LIKE '%keyword%') require a full table scan. When possible, anchor the pattern at the start (e.g.,LIKE 'prefix%'). - ILIKE internally lower-cases both sides. For frequently queried columns, consider storing a pre-lowered version.
- POSITION/STRPOS is vectorized — it operates on entire Arrow string columns at once, not row by row.
- For large-scale text search, hybrid RAG pipelines (vector + BM25) provide both precision and semantic understanding.
- REGEXP_REPLACE with common URL extraction patterns uses an optimized fast path that avoids the regex engine entirely.