Splitting on whitespace with a regex is slow. Write benchmarks for the current implementation, then optimize. Possible optimizations:
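- Build the set from `[]byte` tokens instead of allocating a string per token. Note that `map[[]byte]struct{}` is not valid Go, because slices cannot be map keys; one option is to keep `map[string]struct{}` and look up with `set[string(b)]`, which the compiler can do without allocating.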
- Build set concurrently with multiple indices scanning text.
- (For SPDX data) Create the index and persist it to source at `go generate` time.
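
As a starting point for the benchmarks, here is a minimal sketch that compares a regex-based split against `strings.Fields` when building the word set. The names `whitespace`, `wordSetRegexp`, and `wordSetFields` are hypothetical, and the `\s+` pattern is an assumption about what the current implementation uses, not a copy of it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// whitespace is an assumed stand-in for the regex used by the current implementation.
var whitespace = regexp.MustCompile(`\s+`)

// wordSetRegexp builds the set by splitting the text on the whitespace regex.
func wordSetRegexp(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range whitespace.Split(text, -1) {
		if w != "" { // Split yields empty strings for leading/trailing whitespace
			set[w] = struct{}{}
		}
	}
	return set
}

// wordSetFields builds the same set with strings.Fields, avoiding the regex engine.
func wordSetFields(text string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, w := range strings.Fields(text) {
		set[w] = struct{}{}
	}
	return set
}

func main() {
	// Synthetic input; real benchmarks should use actual license texts.
	text := strings.Repeat("Permission is hereby granted,\tfree of charge,\n to any person ", 1000)

	cases := []struct {
		name string
		fn   func(string) map[string]struct{}
	}{
		{"regexp", wordSetRegexp},
		{"fields", wordSetFields},
	}
	for _, c := range cases {
		// testing.Benchmark runs the function synchronously and picks b.N itself.
		r := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				c.fn(text)
			}
		})
		fmt.Printf("%-8s %s %s\n", c.name, r.String(), r.MemString())
	}
}
```

In the repository itself these would more naturally live in a `_test.go` file as `BenchmarkXxx` functions run with `go test -bench`. Also note that `strings.Fields` splits on Unicode whitespace while `\s` in Go's `regexp` is ASCII-only, so the two can differ on some inputs; that should be checked before swapping one for the other.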