Site Search

Overview

On every Open Berkeley page, there is a search box that allows visitors to search your site's content.

The search uses an index, which parses page content into keywords and weights each keyword according to where it appears, for example in the page title or in a subheading.

Search results pages include facets that allow filtering of results by content type and by sitewide topic.

If you search for multiple keywords, the search treats them as an AND query: only pages that contain all of the keywords are included in the results.
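To illustrate the AND behavior, here is a minimal Python sketch of an inverted index queried with AND semantics. The page data and the `build_index` and `search_and` functions are hypothetical, and the plain whitespace split is a simplifying assumption that does not reproduce Open Berkeley's actual indexing rules.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of page IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search_and(index: dict[str, set[str]], query: str) -> set[str]:
    """Return only the pages that contain every keyword in the query (AND)."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    # Start from the first keyword's pages and intersect with each of the others.
    result = set(index.get(keywords[0], set()))
    for word in keywords[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical example pages.
pages = {
    "admissions": "graduate admissions deadlines",
    "deadlines": "grant deadlines and budget forms",
    "forms": "admissions forms",
}
index = build_index(pages)
print(sorted(search_and(index, "admissions deadlines")))  # ['admissions']
```

Each additional keyword can only narrow the result set, which is why adding words to a search returns fewer pages, never more.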

You can also specify pages as Top Results for specific search terms.

What is indexed

Every published, editable item on your site is indexed, regardless of content type. (Content types include the core Content Page, Landing Page, News Item, and FAQ types, plus any content types provided by beta features such as Service Catalog and Portfolio.) Within each item, all editable content is indexed: the body, all other fields, and any widgets you have added. Banner, menu, and footer text is not indexed.

Additionally, the alt text of images is indexed. If your pages include images that convey information, adding appropriate alt text will improve the searchability of your site.

Site search does not index content that is not hosted on your site. You cannot add other sites to your site search. The content of embedded widgets such as Campus Calendar, Twitter, or Google Docs is not indexed.
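The rules above amount to a filter over each item's stored content. The Python sketch below makes them concrete using a hypothetical `Page` model; the field names and the `indexable_text` function are assumptions for illustration, not Open Berkeley's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    """Hypothetical page model; field names are illustrative only."""
    title: str
    published: bool
    body: str
    fields: dict[str, str] = field(default_factory=dict)  # other editable fields
    widgets: list[str] = field(default_factory=list)      # text of widgets added to the page
    image_alts: list[str] = field(default_factory=list)   # alt text of images
    embeds: list[str] = field(default_factory=list)       # external embeds (calendar, docs, ...)
    banner: str = ""
    menu: str = ""
    footer: str = ""


def indexable_text(page: Page) -> Optional[str]:
    """Collect the text that would be indexed, or None for unpublished items."""
    if not page.published:
        return None
    parts = [page.title, page.body, *page.fields.values(), *page.widgets, *page.image_alts]
    # Banner, menu, footer, and embedded third-party content are deliberately left out.
    return " ".join(p for p in parts if p)


page = Page(
    title="Advising",
    published=True,
    body="Drop-in hours",
    image_alts=["Map of the advising office"],
    embeds=["<iframe src='calendar'>"],
    footer="Contact us",
)
print(indexable_text(page))  # Advising Drop-in hours Map of the advising office
```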

When content is indexed

Content is re-indexed as soon as you save the page. In addition, the search indexer runs hourly and checks for changes that may not have been indexed.
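One way to picture the two indexing paths is a save hook plus a periodic sweep that compares timestamps. This Python sketch is a hypothetical model: the `Item` record, `on_save`, `hourly_sweep`, and the timestamp comparison are assumptions, not a description of Open Berkeley's indexer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Item:
    """Hypothetical record: when the item was last saved and last indexed."""
    item_id: str
    modified: datetime
    indexed: Optional[datetime] = None


def on_save(item: Item, now: datetime) -> None:
    """Save path: the item is re-indexed immediately."""
    item.modified = now
    item.indexed = now


def hourly_sweep(items: list[Item], now: datetime) -> list[str]:
    """Re-index any item whose latest change was never indexed; return their IDs."""
    reindexed = []
    for item in items:
        if item.indexed is None or item.modified > item.indexed:
            item.indexed = now
            reindexed.append(item.item_id)
    return reindexed


t0 = datetime(2024, 1, 1, 9, 0)
about = Item("about", modified=t0)  # a change that was never indexed
news = Item("news", modified=t0)
on_save(news, t0 + timedelta(minutes=5))
print(hourly_sweep([about, news], t0 + timedelta(hours=1)))  # ['about']
```

The sweep acts as a safety net: anything the save path missed is picked up within an hour.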

How words and characters are indexed

Punctuation is not indexed; only letters and numbers are indexed.

"Letters" includes characters in non-Latin alphabets such as Cyrillic, characters with diacritics such as accents and umlauts, and characters from non-alphabetic writing systems such as Chinese.

Characters with diacritics are treated as distinct from their base characters. For example, "résumé" and "resume" are indexed as different words, and searching for one will not return instances of the other.

The minimum word length is 3 characters; shorter words are ignored and not indexed.

All searches are case-insensitive. Searching for "IST" or "ist" returns the same results, including pages with "IST," "ist," "IsT," or any variant capitalization.

Only whole words are indexed. Searching for "precip" will not return results that include the word "precipitation."
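The matching rules above can be summarized in a few lines of Python. This is a minimal sketch, not Open Berkeley's indexer: the function names are hypothetical, and it assumes the page text has already been split into words (splitting is covered under Punctuation).

```python
from typing import Iterable, Optional

MIN_WORD_LENGTH = 3  # shorter words are not indexed


def normalize(word: str) -> Optional[str]:
    """Lowercase a word; return None if it is too short to be indexed.

    Diacritics are left untouched, so "résumé" and "resume" stay distinct.
    """
    word = word.lower()
    return word if len(word) >= MIN_WORD_LENGTH else None


def build_word_set(words: Iterable[str]) -> set[str]:
    """The set of indexed words for one page (input is already split into words)."""
    return {w for w in map(normalize, words) if w is not None}


def page_matches(indexed: set[str], query_word: str) -> bool:
    """Whole-word match only: no prefix matching and no accent folding."""
    normalized = normalize(query_word)
    return normalized is not None and normalized in indexed


indexed = build_word_set(["The", "IsT", "résumé", "precipitation", "of"])
print(page_matches(indexed, "IST"))     # True: case-insensitive
print(page_matches(indexed, "resume"))  # False: "résumé" is a different word
print(page_matches(indexed, "precip"))  # False: only whole words match
print(page_matches(indexed, "of"))      # False: too short to be indexed
```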

Punctuation

Hyphens are removed and the hyphenated parts are joined into a single word. For example, "long-standing" is indexed as "longstanding."

All other punctuation is treated the same as whitespace, which means that it is considered a word boundary. For example, "Rinse&Repeat" is indexed as two separate words, "Rinse" and "Repeat."
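The two punctuation rules can be sketched as a small splitting function. This Python example is illustrative only: `split_words` is a hypothetical name, and it assumes "hyphen" means the ASCII hyphen-minus; other dash characters would act as word boundaries here. Lowercasing and the minimum word length are applied afterward, as described earlier.

```python
import re

# Any run of characters that is not a letter or digit acts as a word boundary.
# [\W_] matches non-word characters plus underscore, so only letters and digits remain.
BOUNDARY_RE = re.compile(r"[\W_]+")


def split_words(text: str) -> list[str]:
    """Split text into words: hyphens are deleted, other punctuation separates words."""
    joined = text.replace("-", "")  # "long-standing" -> "longstanding"
    return [w for w in BOUNDARY_RE.split(joined) if w]


print(split_words("Rinse&Repeat"))             # ['Rinse', 'Repeat']
print(split_words("A long-standing policy."))  # ['A', 'longstanding', 'policy']
```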

Ranking the results

The search index assigns weights to indexed words depending on where they appear in the content and how they are tagged. For example, words in a page title are weighted 8 times as heavily as ordinary text.

Writing titles and headings that accurately describe the content beneath them improves the searchability of your site.

Weights

  • Page titles: 8x
  • H1 headings: 5x
  • H2 headings: 3x
  • H3 headings: 2x
  • Bold text: 2x
  • Italic text: 1.5x

All other tags use the standard weight of 1x.
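The weights combine additively: each occurrence of a keyword contributes the weight of the place it appears. The Python sketch below applies the weights listed above to a page modeled as hypothetical (label, text) segments. The segment labels, the `weighted_count` function, and the one-label-per-segment simplification are assumptions; how nested formatting (such as bold text inside a heading) is weighted is not documented here.

```python
# Weights from the list above; any other label counts as ordinary text (1x).
TAG_WEIGHTS = {
    "title": 8.0,
    "h1": 5.0,
    "h2": 3.0,
    "h3": 2.0,
    "bold": 2.0,
    "italic": 1.5,
}


def weighted_count(segments: list[tuple[str, str]], keyword: str) -> float:
    """Sum the weights of every occurrence of keyword across the page's segments.

    Words are compared case-insensitively after a plain whitespace split
    (punctuation handling is omitted for brevity).
    """
    keyword = keyword.lower()
    total = 0.0
    for label, text in segments:
        weight = TAG_WEIGHTS.get(label, 1.0)
        total += weight * sum(1 for w in text.lower().split() if w == keyword)
    return total


page = [
    ("title", "Advising Services"),
    ("h2", "Peer advising"),
    ("p", "Book advising online"),
    ("bold", "advising"),
]
print(weighted_count(page, "Advising"))  # 14.0 = 8 + 3 + 1 + 2
```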

Search result order

The order of search results depends on the density and weight of the search keywords on each page. Generally speaking, the more often a keyword appears on a page, the higher that page appears in the results for that keyword.
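The exact ranking formula is not documented here, so the following Python sketch only shows the general idea under stated assumptions: a page's score is its summed keyword weight divided by its word count (one possible reading of "density"), and pages missing any keyword are excluded, matching the AND behavior. `PageStats` and `rank` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page statistics an index might keep."""
    page_id: str
    word_count: int
    weighted_hits: dict[str, float]  # keyword -> summed weight of its occurrences


def rank(pages: list[PageStats], keywords: list[str]) -> list[str]:
    """Order matching pages by weighted keyword density, highest first.

    Assumed score: sum of weighted hits for the query keywords / word count.
    Pages missing any keyword are skipped (AND semantics).
    """
    if not keywords:
        return []
    scored = []
    for page in pages:
        if page.word_count == 0:
            continue
        if not all(page.weighted_hits.get(k, 0) > 0 for k in keywords):
            continue
        hits = sum(page.weighted_hits[k] for k in keywords)
        scored.append((hits / page.word_count, page.page_id))
    # Highest score first; ties fall back to page ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for _, page_id in scored]


pages = [
    PageStats("parking", word_count=200, weighted_hits={"parking": 14.0}),
    PageStats("visitors", word_count=250, weighted_hits={"parking": 3.0}),
    PageStats("contact", word_count=80, weighted_hits={"email": 3.0}),
]
print(rank(pages, ["parking"]))  # ['parking', 'visitors']
```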

Optimizing content for site search

Optimizing for findability by internal search is similar to optimizing for findability by external search engines such as Google.

  • Do not render text as images.
  • Use appropriate alt text for images.
  • Create page titles that describe the page content.
  • Use headings that describe and organize your page content.
  • Do not overuse headings to boost specific words.
  • Do not attempt to stuff your pages with keywords.

Differences between site search and Google

Unlike Google, the internal site search is not a spider: it does not follow links from page to page. Instead, it indexes the text content stored for each page, so pages are indexed and returned in results even if no other page links to them.
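The difference can be pictured as two ways of discovering pages. In this Python sketch, which uses a hypothetical set of pages and links, a breadth-first crawl finds only the pages reachable from the home page, while an index built from stored content also includes the unlinked "orphan" page. Both function names are illustrative assumptions.

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str) -> set[str]:
    """Spider-style discovery: only pages reachable by following links are found."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def stored_index(stored_pages: dict[str, str]) -> set[str]:
    """Site-search-style discovery: every stored page is indexed, linked or not."""
    return set(stored_pages)


stored = {"home": "Welcome", "about": "About us", "orphan": "Unlinked page"}
links = {"home": ["about"], "about": ["home"]}
print(sorted(crawl(links, "home")))    # ['about', 'home']
print(sorted(stored_index(stored)))    # ['about', 'home', 'orphan']
```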

Site search does not compute page rankings based on incoming links, volume of clicks, or other measures of page popularity.

Site search does not adjust results based on things like the searcher's location, previous searches, or whether they are logged in.

Site search does not index non-HTML content such as PDF documents or video captions.