From raw web pages to structured entities

MassIndex crawls the open web, extracts entities and relationships using ML, and assembles them into a searchable knowledge graph. Here's what that gives you.

Four entity types, fully profiled

Every entity extracted from the web gets a structured profile with relevant metadata, linked to related entities in the graph.

Companies

Industry & sub-industry
Products & services
Technologies & certifications
Key people (executives, founders)
Location (city, region, country)
Contact info & social links
Company size & type

People

Connected organizations
Role & title mentions
Co-occurrence with products
Location associations
Cross-referenced across sources

Products

Linked to maker/company
Category classification
Mentioned across articles
Related technologies
Market presence signals

Locations

Companies operating there
People associated with the area
Regional industry clusters
Geographic relationship mapping

What you can do with it

Full-Text Entity Search

Search across three collections: companies, articles, and web pages. Filter results by industry, category, and domain facets. Highlighted snippets show why each result matched, and autocomplete suggests queries as you type.

Interactive Knowledge Graph

Explore relationships visually in a force-directed graph. Filter by entity type, search by name, and traverse up to two degrees of connection. Click any node to see its full properties and connections.
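As a rough illustration of what "up to two degrees of connection" means, here is a minimal sketch of a hop-limited breadth-first traversal. The adjacency-dict representation, the function name `neighbors_within`, and the sample entities are all assumptions made for this example; they do not describe MassIndex's actual storage or API.

```python
from collections import deque

def neighbors_within(graph, start, max_hops=2):
    """Return {node: hop_count} for every node within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]  # the start node is not its own neighbor
    return seen

# Hypothetical toy graph: entity name -> directly connected entities.
graph = {
    "Acme Corp": ["Jane Doe", "Widget X"],
    "Jane Doe": ["Acme Corp", "Berlin"],
    "Widget X": ["Acme Corp"],
    "Berlin": ["Jane Doe", "Other GmbH"],
    "Other GmbH": ["Berlin"],
}

result = neighbors_within(graph, "Acme Corp")
```

In this toy graph, "Other GmbH" sits three hops from "Acme Corp", so it is left out of the result.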

Topic Monitoring & Alerts

Create custom topics with keywords, industry filters, and URL patterns. Assign priority boosts to focus crawling on what matters to you. Subscribe via webhook or email to get notified when new matching entities appear.
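To make the idea concrete, a topic could be modeled roughly as follows. This is a sketch under stated assumptions: the `Topic` fields, the glob-style URL patterns, the rule that every configured filter must match, and the additive boost are hypothetical choices for illustration, not MassIndex's documented behavior.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

@dataclass
class Topic:
    # All field names are hypothetical, chosen for this example.
    name: str
    keywords: list[str]
    industries: set[str] = field(default_factory=set)
    url_patterns: list[str] = field(default_factory=list)
    priority_boost: float = 0.0

def matches(topic, text, industry, url):
    """Assumed rule: any keyword must appear, and each configured filter must pass."""
    text_l = text.lower()
    if not any(k.lower() in text_l for k in topic.keywords):
        return False
    if topic.industries and industry not in topic.industries:
        return False
    if topic.url_patterns and not any(fnmatch(url, p) for p in topic.url_patterns):
        return False
    return True

def crawl_priority(base, topics, text, industry, url):
    """Assumed rule: add the boost of every matching topic to the base priority."""
    return base + sum(t.priority_boost for t in topics if matches(t, text, industry, url))

topic = Topic(
    name="batteries",
    keywords=["solid-state battery"],
    industries={"Energy"},
    url_patterns=["*example.com/blog/*"],
    priority_boost=2.0,
)
page_url = "https://example.com/blog/new-cells"
page_text = "A new Solid-State Battery design was announced."
boosted = crawl_priority(1.0, [topic], page_text, "Energy", page_url)
```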

Public Entity Pages

Every company and domain gets a server-rendered profile page that search engines can crawl and index. Each page includes structured data as JSON-LD markup, making entities eligible to appear in Google with rich metadata.
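For readers unfamiliar with JSON-LD, the sketch below shows what such markup could look like for a company profile, using the schema.org `Organization` and `PostalAddress` types. The function name, the choice of fields, and the sample values are assumptions for illustration; the markup MassIndex actually emits may differ.

```python
import json

def organization_jsonld(name, url, city, country, social_links):
    """Build a JSON-LD script tag for a company (hypothetical field mapping)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": city,
            "addressCountry": country,
        },
        "sameAs": list(social_links),
    }
    # Escape "</" so a value can never close the surrounding script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

tag = organization_jsonld(
    "Acme Corp",
    "https://acme.example",
    "Berlin",
    "DE",
    ["https://social.example/acme"],
)
```

Because the page is server-rendered, a tag like this would be present in the initial HTML response rather than injected by client-side scripts.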

ML-Powered Extraction

DistilBERT classifiers categorize content. spaCy NER models extract people, organizations, products, and locations from every page. Models are continuously retrained as new data flows in.

Continuous Crawling

A distributed crawler indexes the web around the clock. Configurable politeness, priority queues, and topic-based crawling strategies ensure coverage where it matters. Monitor crawl progress in real time.
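One common way to combine priority queues with politeness is a frontier that always hands out the highest-priority URL whose host is not still in its wait period. The sketch below is a single-process illustration of that idea; the `Frontier` class, its method names, and the fixed per-host delay are assumptions, not a description of MassIndex's distributed crawler.

```python
import heapq
from urllib.parse import urlsplit

class Frontier:
    """Hypothetical crawl frontier: priority order plus a per-host delay."""

    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds
        self._heap = []     # entries are (-priority, sequence, url)
        self._seq = 0       # tie-breaker that keeps insertion order
        self._next_ok = {}  # host -> earliest time the next fetch is allowed

    def add(self, url, priority=0.0):
        heapq.heappush(self._heap, (-priority, self._seq, url))
        self._seq += 1

    def pop_ready(self, now):
        """Return the best URL whose host may be fetched at `now`, else None."""
        deferred = []
        result = None
        while self._heap:
            item = heapq.heappop(self._heap)
            host = urlsplit(item[2]).netloc
            if self._next_ok.get(host, 0.0) <= now:
                self._next_ok[host] = now + self.delay
                result = item[2]
                break
            deferred.append(item)  # host is still cooling down
        for item in deferred:
            heapq.heappush(self._heap, item)
        return result

frontier = Frontier(delay_seconds=10)
frontier.add("https://a.example/1", priority=5)
frontier.add("https://a.example/2", priority=4)
frontier.add("https://b.example/1", priority=1)

first = frontier.pop_ready(now=0)
second = frontier.pop_ready(now=1)
third = frontier.pop_ready(now=2)
fourth = frontier.pop_ready(now=10)
```

Here the second URL on `a.example` outranks the one on `b.example`, but it is held back until the 10-second delay for its host has elapsed.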

Three products, one knowledge graph

Search the graph, subscribe to changes, or point the crawler at what matters to you.

See it for yourself

Search the index, explore the graph, or browse entity profiles.

Start Searching