From raw web pages to structured entities
MassIndex crawls the open web, extracts entities and relationships using ML, and assembles them into a searchable knowledge graph. Here's what that gives you.
Four entity types, fully profiled
Every entity extracted from the web gets a structured profile with relevant metadata, linked to related entities in the graph.
Companies
- Industry & sub-industry
- Products & services
- Technologies & certifications
- Key people (executives, founders)
- Location (city, region, country)
- Contact info & social links
- Company size & type
People
- Connected organizations
- Role & title mentions
- Co-occurrence with products
- Location associations
- Cross-referenced across sources
Products
- Linked to maker/company
- Category classification
- Mentioned across articles
- Related technologies
- Market presence signals
Locations
- Companies operating there
- People associated with the area
- Regional industry clusters
- Geographic relationship mapping
What you can do with it
Full-Text Entity Search
Search across three collections: companies, articles, and web pages. Filter results with facets for industry, category, and domain. Highlighted snippets show why each result matched, and autocomplete suggests queries as you type.
Interactive Knowledge Graph
Explore relationships visually in a force-directed graph. Filter by entity type, search by name, and traverse up to two degrees of connection. Click any node to see its full properties and connections.
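To make "two degrees of connection" concrete, here is a minimal sketch of a depth-limited breadth-first traversal. It is an illustration only, not MassIndex's implementation: the adjacency-dict graph, the entity names, and the `neighbors_within` helper are all hypothetical.

```python
from collections import deque

# Hypothetical toy graph: each entity maps to the entities it is linked to.
GRAPH = {
    "Acme Corp": ["Jane Doe", "Widget X"],
    "Jane Doe": ["Acme Corp", "Berlin"],
    "Widget X": ["Acme Corp"],
    "Berlin": ["Jane Doe", "Other GmbH"],
    "Other GmbH": ["Berlin"],
}

def neighbors_within(graph, start, max_depth=2):
    """Return {entity: hop_count} for entities within max_depth hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    del seen[start]  # the start node is not its own neighbor
    return seen

result = neighbors_within(GRAPH, "Acme Corp")
```

Starting from "Acme Corp", the person and product are one hop away and the city is two hops away. "Other GmbH" sits three hops out, so it stays outside the view.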
Topic Monitoring & Alerts
Create custom topics with keywords, industry filters, and URL patterns. Assign priority boosts to focus crawling on what matters to you. Subscribe via webhook or email to get notified when new matching entities appear.
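As a rough sketch of how a topic definition might be matched against incoming content, the example below combines keywords, an industry filter, and URL patterns. The `Topic` fields, the glob-style pattern syntax, and the `matches` function are assumptions made for illustration. They do not describe the actual MassIndex topic schema or API.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

@dataclass
class Topic:
    # Hypothetical topic shape; field names are illustrative only.
    name: str
    keywords: list
    industries: list = field(default_factory=list)    # empty = any industry
    url_patterns: list = field(default_factory=list)  # empty = any URL
    priority_boost: float = 1.0

def matches(topic, text, industry, url):
    """True if the content satisfies every filter the topic defines."""
    text_lower = text.lower()
    if not any(k.lower() in text_lower for k in topic.keywords):
        return False
    if topic.industries and industry not in topic.industries:
        return False
    if topic.url_patterns and not any(fnmatchcase(url, p) for p in topic.url_patterns):
        return False
    return True

solar = Topic(
    name="solar",
    keywords=["solar", "photovoltaic"],
    industries=["Energy"],
    url_patterns=["https://example.com/news/*"],
    priority_boost=2.0,
)
```

In this sketch a page must contain at least one keyword and pass every non-empty filter. A real system might score partial matches instead of requiring all filters to pass.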
Public Entity Pages
Every company and domain gets a server-rendered profile page, discoverable via search engines. Structured data in JSON-LD markup makes entities eligible to appear in Google with rich metadata.
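For readers unfamiliar with JSON-LD, the sketch below builds a schema.org `Organization` object and wraps it in the script tag a profile page would embed. The helper names and sample values are hypothetical, and the properties MassIndex actually emits may differ.

```python
import json

def company_jsonld(name, url, city, country, same_as=()):
    """Serialize a company as schema.org Organization JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": city,
            "addressCountry": country,
        },
    }
    if same_as:
        data["sameAs"] = list(same_as)  # social or other profile links
    return json.dumps(data, indent=2)

def script_tag(jsonld):
    """Wrap JSON-LD for embedding in HTML, escaping any '</' sequences."""
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'

# Hypothetical example company.
doc = company_jsonld(
    "Acme Corp", "https://acme.example", "Berlin", "DE",
    same_as=["https://social.example/acme"],
)
html = script_tag(doc)
```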
ML-Powered Extraction
DistilBERT classifiers categorize content. spaCy NER models extract people, organizations, products, and locations from every page. Models are continuously retrained as new data flows in.
Continuous Crawling
A distributed crawler indexes the web around the clock. Configurable politeness, priority queues, and topic-based crawling strategies ensure coverage where it matters. Monitor crawl progress in real time.
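The interplay of priority queues and politeness can be sketched with a small in-memory crawl frontier. This is a simplified, single-process illustration under assumed names (`Frontier`, `delay`), not a description of the distributed crawler itself. Higher-priority URLs come out first, but a host that was fetched recently is skipped until its delay has elapsed.

```python
import heapq
from urllib.parse import urlparse

class Frontier:
    """Hypothetical crawl frontier: priority ordering plus a per-host delay."""

    def __init__(self, delay=1.0):
        self.delay = delay      # minimum seconds between hits to one host
        self._heap = []         # entries are (-priority, insertion_order, url)
        self._next_ok = {}      # host -> earliest time it may be fetched again
        self._counter = 0

    def add(self, url, priority=1.0):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self, now):
        """Return the best URL whose host is ready at time `now`, else None."""
        deferred, result = [], None
        while self._heap:
            item = heapq.heappop(self._heap)
            host = urlparse(item[2]).netloc
            if self._next_ok.get(host, 0.0) <= now:
                self._next_ok[host] = now + self.delay
                result = item[2]
                break
            deferred.append(item)  # host still cooling down; try the next URL
        for item in deferred:
            heapq.heappush(self._heap, item)
        return result

frontier = Frontier(delay=5.0)
frontier.add("https://a.example/1", priority=1.0)
frontier.add("https://a.example/2", priority=3.0)  # e.g. boosted by a topic
frontier.add("https://b.example/1", priority=2.0)

order = [frontier.pop(0.0), frontier.pop(1.0), frontier.pop(2.0), frontier.pop(5.0)]
```

In this run the boosted URL is fetched first and the other host second. The third call returns `None` because `a.example` is still inside its delay window, and the remaining URL is released once the window has passed.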
Three products, one knowledge graph
Search the graph, subscribe to changes, or point the crawler at what matters to you.
Knowledge Graph Search
Full-text search across companies, people, products, and locations that returns structured entity profiles linked in the knowledge graph.
Live Data Feed
Define topics with keywords and entity filters. Get notified the moment new matching content surfaces from the open web.
Targeted Crawling
Crawl specific domains or URL patterns on demand. Submit seed URLs and watch as entities are extracted into the knowledge graph.
See it for yourself
Search the index, explore the graph, or browse entity profiles.