This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Commit 478f31d

xfalcox, romanrizzi, and keegangeorge authored
FEATURE: add inferred concepts system (#1330)
* FEATURE: add inferred concepts system

This commit adds a new inferred concepts system that:
- Creates a model for storing concept labels that can be applied to topics
- Provides AI personas for finding new concepts and matching existing ones
- Adds jobs for generating concepts from popular topics
- Includes a scheduled job that automatically processes engaging topics

* FEATURE: Extend inferred concepts to include posts

* Adds support for concepts to be inferred from and applied to posts
* Replaces daily task with one that handles both topics and posts
* Adds database migration for posts_inferred_concepts join table
* Updates PersonaContext to include inferred concepts

Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
Co-authored-by: Keegan George <kgeorge13@gmail.com>
1 parent 4ce8973 commit 478f31d

44 files changed: +2713 -20 lines changed
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+# frozen_string_literal: true
+
+module Jobs
+  class GenerateInferredConcepts < ::Jobs::Base
+    sidekiq_options queue: "low"
+
+    # Process items to generate new concepts
+    #
+    # @param args [Hash] Contains job arguments
+    # @option args [String] :item_type Required - Type of items to process ('topics' or 'posts')
+    # @option args [Array<Integer>] :item_ids Required - List of item IDs to process
+    # @option args [Integer] :batch_size (100) Number of items to process in each batch
+    # @option args [Boolean] :match_only (false) Only match against existing concepts without generating new ones
+    def execute(args = {})
+      return if args[:item_ids].blank? || args[:item_type].blank?
+
+      if %w[topics posts].exclude?(args[:item_type])
+        Rails.logger.error("Invalid item_type for GenerateInferredConcepts: #{args[:item_type]}")
+        return
+      end
+
+      # Process items in smaller batches to avoid memory issues
+      batch_size = args[:batch_size] || 100
+
+      # Get the list of item IDs
+      item_ids = args[:item_ids]
+      match_only = args[:match_only] || false
+
+      # Process items in batches
+      item_ids.each_slice(batch_size) do |batch_item_ids|
+        process_batch(batch_item_ids, args[:item_type], match_only)
+      end
+    end
+
+    private
+
+    def process_batch(item_ids, item_type, match_only)
+      klass = item_type.singularize.classify.constantize
+      items = klass.where(id: item_ids)
+      manager = DiscourseAi::InferredConcepts::Manager.new
+
+      items.each do |item|
+        begin
+          process_item(item, item_type, match_only, manager)
+        rescue => e
+          Rails.logger.error(
+            "Error generating concepts from #{item_type.singularize} #{item.id}: #{e.message}\n#{e.backtrace.join("\n")}",
+          )
+        end
+      end
+    end
+
+    def process_item(item, item_type, match_only, manager)
+      # Use the Manager method that handles both identifying and creating concepts
+      if match_only
+        if item_type == "topics"
+          manager.match_topic_to_concepts(item)
+        else # posts
+          manager.match_post_to_concepts(item)
+        end
+      else
+        if item_type == "topics"
+          manager.generate_concepts_from_topic(item)
+        else # posts
+          manager.generate_concepts_from_post(item)
+        end
+      end
+    end
+  end
+end
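The control flow of the job above (validate the item type, slice the IDs into batches, and isolate failures per item) can be tried outside Rails. The following is a minimal plain-Ruby sketch, not the plugin's code: `FakeManager`, `run_in_batches`, and the simulated failure on ID 3 are hypothetical names and behavior introduced only for illustration.

```ruby
# Plain-Ruby sketch (no Rails): batching plus per-item error isolation.
VALID_ITEM_TYPES = %w[topics posts].freeze

# Hypothetical stand-in for the concepts manager; it fails on ID 3
# so that the error-isolation path is exercised.
class FakeManager
  attr_reader :processed

  def initialize
    @processed = []
  end

  def generate(id)
    raise ArgumentError, "boom" if id == 3
    @processed << id
  end
end

def run_in_batches(item_type:, item_ids:, batch_size: 100, manager: FakeManager.new)
  empty = { batches: 0, errors: [], processed: [] }
  return empty if item_ids.nil? || item_ids.empty?
  return empty.merge(errors: ["invalid item_type"]) unless VALID_ITEM_TYPES.include?(item_type)

  batches = 0
  errors = []
  item_ids.each_slice(batch_size) do |batch|
    batches += 1
    batch.each do |id|
      begin
        manager.generate(id)
      rescue StandardError => e
        # One failing item is recorded and skipped; the batch continues.
        errors << "#{item_type} #{id}: #{e.message}"
      end
    end
  end

  { batches: batches, errors: errors, processed: manager.processed }
end

result = run_in_batches(item_type: "topics", item_ids: [1, 2, 3, 4, 5], batch_size: 2)
```

Rescuing inside the inner loop, as the job does, means one bad topic or post does not abort the rest of its batch.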
Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+# frozen_string_literal: true
+
+module Jobs
+  class GenerateConceptsFromPopularItems < ::Jobs::Scheduled
+    every 1.day
+
+    # This job runs daily and generates new concepts from popular topics and posts
+    # It selects items based on engagement metrics and generates concepts from their content
+    def execute(_args)
+      return unless SiteSetting.inferred_concepts_enabled
+
+      process_popular_topics
+      process_popular_posts
+    end
+
+    private
+
+    def process_popular_topics
+      # Find candidate topics that are popular and don't have concepts yet
+      manager = DiscourseAi::InferredConcepts::Manager.new
+      candidates =
+        manager.find_candidate_topics(
+          limit: SiteSetting.inferred_concepts_daily_topics_limit || 20,
+          min_posts: SiteSetting.inferred_concepts_min_posts || 5,
+          min_likes: SiteSetting.inferred_concepts_min_likes || 10,
+          min_views: SiteSetting.inferred_concepts_min_views || 100,
+          created_after: SiteSetting.inferred_concepts_lookback_days.days.ago,
+        )
+
+      return if candidates.blank?
+
+      # Process candidate topics - first generate concepts, then match
+      Jobs.enqueue(
+        :generate_inferred_concepts,
+        item_type: "topics",
+        item_ids: candidates.map(&:id),
+        batch_size: 10,
+      )
+
+      if SiteSetting.inferred_concepts_background_match
+        # Schedule a follow-up job to match existing concepts
+        Jobs.enqueue_in(
+          1.hour,
+          :generate_inferred_concepts,
+          item_type: "topics",
+          item_ids: candidates.map(&:id),
+          batch_size: 10,
+          match_only: true,
+        )
+      end
+    end
+
+    def process_popular_posts
+      # Find candidate posts that are popular and don't have concepts yet
+      manager = DiscourseAi::InferredConcepts::Manager.new
+      candidates =
+        manager.find_candidate_posts(
+          limit: SiteSetting.inferred_concepts_daily_posts_limit || 30,
+          min_likes: SiteSetting.inferred_concepts_post_min_likes || 5,
+          exclude_first_posts: true,
+          created_after: SiteSetting.inferred_concepts_lookback_days.days.ago,
+        )
+
+      return if candidates.blank?
+
+      # Process candidate posts - first generate concepts, then match
+      Jobs.enqueue(
+        :generate_inferred_concepts,
+        item_type: "posts",
+        item_ids: candidates.map(&:id),
+        batch_size: 10,
+      )
+
+      if SiteSetting.inferred_concepts_background_match
+        # Schedule a follow-up job to match against existing concepts
+        Jobs.enqueue_in(
+          1.hour,
+          :generate_inferred_concepts,
+          item_type: "posts",
+          item_ids: candidates.map(&:id),
+          batch_size: 10,
+          match_only: true,
+        )
+      end
+    end
+  end
+end
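The scheduled job above enqueues the same payload twice: once immediately to generate concepts, and, when background matching is enabled, once more an hour later with `match_only: true`. A rough plain-Ruby sketch of that two-phase scheduling follows; `FakeQueue` and `schedule_concept_jobs` are hypothetical stand-ins and do not reproduce the real `Jobs` API.

```ruby
# Hypothetical in-memory queue that records what would be enqueued.
FakeQueue = Struct.new(:jobs) do
  def enqueue(name, **args)
    jobs << { name: name, delay: 0, args: args }
  end

  def enqueue_in(delay_seconds, name, **args)
    jobs << { name: name, delay: delay_seconds, args: args }
  end
end

def schedule_concept_jobs(queue, item_type:, candidate_ids:, background_match:)
  return if candidate_ids.empty?

  payload = { item_type: item_type, item_ids: candidate_ids, batch_size: 10 }

  # Phase 1: generate new concepts right away.
  queue.enqueue(:generate_inferred_concepts, **payload)
  return unless background_match

  # Phase 2: one hour later, match the same items against existing concepts.
  queue.enqueue_in(3600, :generate_inferred_concepts, **payload, match_only: true)
end

queue = FakeQueue.new([])
schedule_concept_jobs(queue, item_type: "topics", candidate_ids: [10, 11], background_match: true)
```

Building the payload once, as in this sketch, would also avoid mapping the candidate IDs twice; that is a possible tidy-up rather than something the commit does.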

app/models/inferred_concept.rb

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+class InferredConcept < ActiveRecord::Base
+  has_many :inferred_concept_topics
+  has_many :topics, through: :inferred_concept_topics
+
+  has_many :inferred_concept_posts
+  has_many :posts, through: :inferred_concept_posts
+
+  validates :name, presence: true, uniqueness: true
+end
+
+# == Schema Information
+#
+# Table name: inferred_concepts
+#
+#  id         :bigint           not null, primary key
+#  name       :string           not null
+#  created_at :datetime         not null
+#  updated_at :datetime         not null
+#
+# Indexes
+#
+#  index_inferred_concepts_on_name  (name) UNIQUE
+#

app/models/inferred_concept_post.rb

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+class InferredConceptPost < ActiveRecord::Base
+  belongs_to :inferred_concept
+  belongs_to :post
+
+  validates :inferred_concept_id, presence: true
+  validates :post_id, presence: true
+  validates :inferred_concept_id, uniqueness: { scope: :post_id }
+end
+
+# == Schema Information
+#
+# Table name: inferred_concept_posts
+#
+#  inferred_concept_id :bigint
+#  post_id             :bigint
+#  created_at          :datetime         not null
+#  updated_at          :datetime         not null
+#
+# Indexes
+#
+#  index_inferred_concept_posts_on_inferred_concept_id  (inferred_concept_id)
+#  index_inferred_concept_posts_uniqueness              (post_id,inferred_concept_id) UNIQUE
+#

app/models/inferred_concept_topic.rb

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# frozen_string_literal: true
+
+class InferredConceptTopic < ActiveRecord::Base
+  belongs_to :inferred_concept
+  belongs_to :topic
+
+  validates :inferred_concept_id, presence: true
+  validates :topic_id, presence: true
+  validates :inferred_concept_id, uniqueness: { scope: :topic_id }
+end
+
+# == Schema Information
+#
+# Table name: inferred_concept_topics
+#
+#  inferred_concept_id :bigint
+#  topic_id            :bigint
+#  created_at          :datetime         not null
+#  updated_at          :datetime         not null
+#
+# Indexes
+#
+#  index_inferred_concept_topics_on_inferred_concept_id  (inferred_concept_id)
+#  index_inferred_concept_topics_uniqueness              (topic_id,inferred_concept_id) UNIQUE
+#
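Both join models pair a scoped uniqueness validation with a UNIQUE composite index, so a given concept can be linked to a given topic (or post) at most once. The invariant can be illustrated without a database; in the sketch below, `ConceptTopicLinks` is a hypothetical in-memory stand-in for the `inferred_concept_topics` table, not part of the commit.

```ruby
require "set"

# Hypothetical in-memory stand-in for the inferred_concept_topics join table.
class ConceptTopicLinks
  def initialize
    @pairs = Set.new
  end

  # Returns true when the link is new, false when the pair already exists.
  # Mirrors the presence checks plus the (topic_id, inferred_concept_id) uniqueness.
  def link(topic_id:, concept_id:)
    raise ArgumentError, "both IDs are required" if topic_id.nil? || concept_id.nil?

    !@pairs.add?([topic_id, concept_id]).nil?
  end

  def concept_ids_for(topic_id)
    @pairs.select { |t, _| t == topic_id }.map { |_, c| c }.sort
  end
end

links = ConceptTopicLinks.new
first = links.link(topic_id: 1, concept_id: 7)       # new pair
duplicate = links.link(topic_id: 1, concept_id: 7)   # same pair again
other_topic = links.link(topic_id: 2, concept_id: 7) # same concept, different topic
```

In the real models the validation gives a friendly error in the common case, while the unique index is what actually guards against concurrent inserts.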
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+
+class AiInferredConceptPostSerializer < ApplicationSerializer
+  attributes :id,
+             :post_number,
+             :topic_id,
+             :topic_title,
+             :username,
+             :avatar_template,
+             :created_at,
+             :updated_at,
+             :excerpt,
+             :truncated,
+             :inferred_concepts
+
+  def avatar_template
+    User.avatar_template(object.username, object.uploaded_avatar_id)
+  end
+
+  def excerpt
+    Post.excerpt(object.cooked)
+  end
+
+  def truncated
+    object.cooked.length > SiteSetting.post_excerpt_maxlength
+  end
+
+  def inferred_concepts
+    ActiveModel::ArraySerializer.new(
+      object.inferred_concepts,
+      each_serializer: InferredConceptSerializer,
+    )
+  end
+end
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# frozen_string_literal: true
+
+class InferredConceptSerializer < ApplicationSerializer
+  attributes :id, :name, :created_at, :updated_at
+end

assets/javascripts/discourse/components/modal/ai-persona-response-format-editor.gjs

Lines changed: 15 additions & 1 deletion
@@ -22,10 +22,20 @@ export default class AiPersonaResponseFormatEditor extends Component {
             type: "string",
           },
           type: {
+            type: "string",
+            enum: ["string", "integer", "boolean", "array"],
+          },
+          array_type: {
             type: "string",
             enum: ["string", "integer", "boolean"],
+            options: {
+              dependencies: {
+                type: "array",
+              },
+            },
           },
         },
+        required: ["key", "type"],
       },
     };
 
@@ -41,7 +51,11 @@ export default class AiPersonaResponseFormatEditor extends Component {
     const toDisplay = {};
 
     this.args.data.response_format.forEach((keyDesc) => {
-      toDisplay[keyDesc.key] = keyDesc.type;
+      if (keyDesc.type === "array") {
+        toDisplay[keyDesc.key] = `[${keyDesc.array_type}]`;
+      } else {
+        toDisplay[keyDesc.key] = keyDesc.type;
+      }
     });
 
     return prettyJSON(toDisplay);
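The second hunk changes how the response format is previewed: array-typed keys are shown as `[element_type]` and everything else is shown as its plain type. The same mapping is transliterated into Ruby below purely for illustration; `response_format_preview` is a hypothetical helper, and `JSON.pretty_generate` merely stands in for the component's `prettyJSON`.

```ruby
require "json"

# Builds the display hash the way the component's forEach loop does:
# array keys become "[array_type]", other keys keep their type name.
def response_format_preview(response_format)
  display =
    response_format.each_with_object({}) do |key_desc, acc|
      acc[key_desc[:key]] =
        if key_desc[:type] == "array"
          "[#{key_desc[:array_type]}]"
        else
          key_desc[:type]
        end
    end

  JSON.pretty_generate(display)
end

preview =
  response_format_preview(
    [
      { key: "concepts", type: "array", array_type: "string" },
      { key: "count", type: "integer" },
    ],
  )
```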

config/locales/server.en.yml

Lines changed: 12 additions & 0 deletions
@@ -330,6 +330,15 @@ en:
           short_summarizer:
             name: "Summarizer (short form)"
             description: "Default persona used to power AI short summaries for topic lists' items"
+          concept_finder:
+            name: "Concept Finder"
+            description: "AI Bot specialized in identifying concepts and themes in content"
+          concept_matcher:
+            name: "Concept Matcher"
+            description: "AI Bot specialized in matching content against existing concepts"
+          concept_deduplicator:
+            name: "Concept Deduplicator"
+            description: "AI Bot specialized in deduplicating concepts"
         topic_not_found: "Summary unavailable, topic not found!"
         summarizing: "Summarizing topic"
         searching: "Searching for: '%{query}'"
@@ -549,6 +558,9 @@ en:
         discord_search:
           name: "Discord Search"
           description: "Adds the ability to search Discord channels"
+        inferred_concepts:
+          name: "Inferred Concepts"
+          description: "Classifies topics and posts into areas of interest / labels."
 
       errors:
         quota_exceeded: "You have exceeded the quota for this model. Please try again in %{relative_time}."

config/settings.yml

Lines changed: 52 additions & 0 deletions
@@ -417,3 +417,55 @@ discourse_ai:
     default: false
     client: false
     hidden: true
+
+  inferred_concepts_enabled:
+    default: false
+    client: true
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_background_match:
+    default: false
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_daily_topics_limit:
+    default: 20
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_min_posts:
+    default: 5
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_min_likes:
+    default: 10
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_min_views:
+    default: 100
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_lookback_days:
+    default: 30
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_daily_posts_limit:
+    default: 30
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_post_min_likes:
+    default: 5
+    client: false
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_generate_persona:
+    default: "-15"
+    type: enum
+    enum: "DiscourseAi::Configuration::PersonaEnumerator"
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_match_persona:
+    default: "-16"
+    type: enum
+    enum: "DiscourseAi::Configuration::PersonaEnumerator"
+    area: "ai-features/inferred_concepts"
+  inferred_concepts_deduplicate_persona:
+    default: "-17"
+    type: enum
+    enum: "DiscourseAi::Configuration::PersonaEnumerator"
+    area: "ai-features/inferred_concepts"
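The topic-related defaults above (limit 20, at least 5 posts, 10 likes, 100 views, 30-day lookback) are the values the daily job passes to `find_candidate_topics`. That method is not part of this excerpt, so the sketch below is only an assumption about how such thresholds might combine: the `Topic` struct, the `candidate_topics` helper, and the inclusive `>=` comparisons are all hypothetical.

```ruby
# Hypothetical topic record carrying only the fields the thresholds use.
Topic = Struct.new(:id, :posts_count, :like_count, :views, :created_at, keyword_init: true)

# Defaults copied from the settings added in this commit.
DEFAULTS = { limit: 20, min_posts: 5, min_likes: 10, min_views: 100, lookback_days: 30 }.freeze

SECONDS_PER_DAY = 86_400

# Assumed selection rule: every threshold must be met (inclusive),
# the topic must be newer than the lookback cutoff, and at most
# `limit` topics are returned.
def candidate_topics(topics, now:, **overrides)
  opts = DEFAULTS.merge(overrides)
  cutoff = now - opts[:lookback_days] * SECONDS_PER_DAY

  topics
    .select do |t|
      t.posts_count >= opts[:min_posts] && t.like_count >= opts[:min_likes] &&
        t.views >= opts[:min_views] && t.created_at >= cutoff
    end
    .first(opts[:limit])
end

now = Time.at(100 * SECONDS_PER_DAY)
topics = [
  Topic.new(id: 1, posts_count: 6, like_count: 12, views: 150, created_at: Time.at(90 * SECONDS_PER_DAY)),
  Topic.new(id: 2, posts_count: 2, like_count: 50, views: 900, created_at: Time.at(95 * SECONDS_PER_DAY)),
  Topic.new(id: 3, posts_count: 9, like_count: 40, views: 500, created_at: Time.at(10 * SECONDS_PER_DAY)),
]
selected_ids = candidate_topics(topics, now: now).map(&:id)
```

Here topic 2 is dropped for having too few posts and topic 3 for falling outside the lookback window.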
