Commit f1f5a36

feat(nep): Data Instrumentation (aws#7109)
## Problem

The science & service team on the Next Edit Prediction (NEP) project needs initial data for model training and tuning, so that the model can provide more contextually relevant suggestions. This requires us to track user edits in the IDE and send the changes to the currently active file, in unified diff format, to the CodeWhisperer API.

**No impact on user experience.**

**This won't live forever. It will be migrated to flare by the end of May, and is needed now for science data collection.**

## Solution

### Key Components

`PredictionKeyStrokeHandler`: Listens for document changes, maintains shadow copies of visible documents, and processes edits.

`PredictionTracker`: Manages file snapshots, implementing a policy for storing, retrieving, and pruning snapshots based on age and memory constraints.

`DiffGenerator`: Creates unified diffs between file snapshots and produces the `supplementalContext` sent to the API.

### How it Works

- The system tracks shadow copies of the content of files visible in the editor.
- Once an edit is made to a tracked file, it takes a snapshot of the file content as it was before the edit.
- When the Inline API fires, the snapshots of the file currently being edited are used to generate the diff context.

### Memory management

- `maxStorageSizeKb` (default: 5000): Caps the total size of snapshot storage at ~5 MB, purging the oldest snapshots when exceeded.
- `debounceIntervalMs` (default: 2000): Prevents excessive snapshots by requiring 2 seconds between captures for the same file.
- `maxAgeMs` (default: 30000): Auto-deletes snapshots after 30 seconds to keep only recent history.
- `maxSupplementalContext` (default: 15): Limits the `supplementalContext` sent to the API to at most 15 entries.

### Changes

- Added a new NextEditPrediction module in the CodeWhisperer package
- Updated activation code to initialize the NEP system
- Updated CodeWhisperer inline API requests to fit the new format

---

- Treat all work as PUBLIC. Private `feature/x` branches will not be squash-merged at release time.
- Your code changes must meet the guidelines in [CONTRIBUTING.md](https://github.com/aws/aws-toolkit-vscode/blob/master/CONTRIBUTING.md#guidelines).
- License: I confirm that my contribution is made under the terms of the Apache 2.0 license.
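
The memory-management policy described above can be illustrated with a small sketch. `PredictionTracker` itself is not shown in this part of the commit, so everything here is an assumption for illustration: the `SnapshotStore` class, its method names, and the choice to measure size in characters rather than bytes are all hypothetical.

```typescript
// Hypothetical sketch of the snapshot policy; NOT the actual PredictionTracker.
interface Snapshot {
    filePath: string
    content: string
    timestamp: number // milliseconds
}

interface StoreConfig {
    maxStorageSizeKb: number
    debounceIntervalMs: number
    maxAgeMs: number
}

class SnapshotStore {
    private snapshots: Snapshot[] = [] // kept oldest first
    private readonly config: StoreConfig

    constructor(config: StoreConfig) {
        this.config = config
    }

    /** Stores a snapshot unless one was taken for the same file within the debounce window. */
    add(filePath: string, content: string, now: number): boolean {
        this.pruneExpired(now)
        const forFile = this.snapshots.filter((s) => s.filePath === filePath)
        const latest = forFile[forFile.length - 1]
        if (latest !== undefined && now - latest.timestamp < this.config.debounceIntervalMs) {
            return false
        }
        this.snapshots.push({ filePath, content, timestamp: now })
        this.enforceSizeCap()
        return true
    }

    /** Returns the unexpired snapshots of a file, oldest first. */
    get(filePath: string, now: number): Snapshot[] {
        this.pruneExpired(now)
        return this.snapshots.filter((s) => s.filePath === filePath)
    }

    /** Drops snapshots older than maxAgeMs. */
    private pruneExpired(now: number): void {
        this.snapshots = this.snapshots.filter((s) => now - s.timestamp <= this.config.maxAgeMs)
    }

    /** Evicts the oldest snapshots until the total size fits under the cap. */
    private enforceSizeCap(): void {
        // Simplification: one character is counted as one byte.
        const cap = this.config.maxStorageSizeKb * 1024
        let total = this.snapshots.reduce((sum, s) => sum + s.content.length, 0)
        while (total > cap && this.snapshots.length > 0) {
            const removed = this.snapshots.shift()
            total -= removed === undefined ? 0 : removed.content.length
        }
    }
}
```

In this sketch expiry is checked lazily on each call; a real implementation could just as well use timers. The point is only the order of the three rules: expire by age, debounce per file, then evict oldest-first when over the size cap.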
1 parent f30f770 commit f1f5a36

10 files changed, +933 -3 lines changed

packages/core/src/codewhisperer/activation.ts

Lines changed: 3 additions & 0 deletions

@@ -90,6 +90,7 @@ import { SecurityIssueTreeViewProvider } from './service/securityIssueTreeViewPr
 import { setContext } from '../shared/vscode/setContext'
 import { syncSecurityIssueWebview } from './views/securityIssue/securityIssueWebview'
 import { detectCommentAboveLine } from '../shared/utilities/commentUtils'
+import { activateEditTracking } from './nextEditPrediction/activation'
 import { notifySelectDeveloperProfile } from './region/utils'
 
 let localize: nls.LocalizeFunc
@@ -505,6 +506,8 @@ export async function activate(context: ExtContext): Promise<void> {
             })
         )
     }
+
+    activateEditTracking(context)
 }
 
 export async function shutdown() {

packages/core/src/codewhisperer/models/constants.ts

Lines changed: 7 additions & 0 deletions

@@ -945,3 +945,10 @@ export const testGenExcludePatterns = [
     '**/*.deb',
     '**/*.model',
 ]
+
+export const predictionTrackerDefaultConfig = {
+    maxStorageSizeKb: 5000,
+    debounceIntervalMs: 2000,
+    maxAgeMs: 30000,
+    maxSupplementalContext: 15,
+}
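
For readers who want to experiment with different limits, a defaults object like the one above is often combined with partial overrides. The sketch below is an assumption, not part of the commit: `PredictionTrackerConfig` and `resolveTrackerConfig` are hypothetical names, and the commit itself only defines the defaults.

```typescript
// Hypothetical helper; the commit only defines the defaults object.
interface PredictionTrackerConfig {
    maxStorageSizeKb: number
    debounceIntervalMs: number
    maxAgeMs: number
    maxSupplementalContext: number
}

// Same values as the constants added in this diff.
const predictionTrackerDefaultConfig: PredictionTrackerConfig = {
    maxStorageSizeKb: 5000,
    debounceIntervalMs: 2000,
    maxAgeMs: 30000,
    maxSupplementalContext: 15,
}

/** Returns the defaults with any supplied fields replaced. */
function resolveTrackerConfig(overrides: Partial<PredictionTrackerConfig> = {}): PredictionTrackerConfig {
    return { ...predictionTrackerDefaultConfig, ...overrides }
}
```

For example, `resolveTrackerConfig({ maxAgeMs: 1000 })` would shorten the snapshot lifetime for a test while leaving the other limits at their defaults.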
Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
+/*!
+ * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import * as vscode from 'vscode'
+import { PredictionTracker } from './predictionTracker'
+import { PredictionKeyStrokeHandler } from './predictionKeyStrokeHandler'
+import { getLogger } from '../../shared/logger/logger'
+import { ExtContext } from '../../shared/extensions'
+
+export let predictionTracker: PredictionTracker | undefined
+let keyStrokeHandler: PredictionKeyStrokeHandler | undefined
+
+export function activateEditTracking(context: ExtContext): void {
+    try {
+        predictionTracker = new PredictionTracker(context.extensionContext)
+
+        keyStrokeHandler = new PredictionKeyStrokeHandler(predictionTracker)
+        context.extensionContext.subscriptions.push(
+            vscode.Disposable.from({
+                dispose: () => {
+                    keyStrokeHandler?.dispose()
+                },
+            })
+        )
+
+        getLogger('nextEditPrediction').debug('Next Edit Prediction activated')
+    } catch (error) {
+        getLogger('nextEditPrediction').error(`Error in activateEditTracking: ${error}`)
+    }
+}
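
The activation function above follows a common pattern: construct the feature, register a disposer with the host, and catch any failure so that a problem in an optional feature does not abort the rest of activation. A minimal sketch of that pattern without the `vscode` dependency follows; the `Disposable` interface and `activateFeature` function here are stand-ins invented for illustration, not APIs from the commit.

```typescript
// Stand-in for vscode.Disposable, so the sketch runs outside the editor.
interface Disposable {
    dispose(): void
}

/**
 * Creates a handler, registers its disposer with the host's subscription
 * list, and reports failure through the log instead of throwing.
 */
function activateFeature(
    subscriptions: Disposable[],
    createHandler: () => Disposable,
    log: (message: string) => void
): boolean {
    try {
        const handler = createHandler()
        subscriptions.push({ dispose: () => handler.dispose() })
        log('feature activated')
        return true
    } catch (error) {
        // Swallow the error so the caller's activation can continue.
        log(`Error in activateFeature: ${error}`)
        return false
    }
}
```

Because the handler is only pushed after it has been constructed, a constructor failure leaves the subscription list unchanged.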
Lines changed: 154 additions & 0 deletions

@@ -0,0 +1,154 @@
+/*!
+ * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import * as diff from 'diff'
+import { getLogger } from '../../shared/logger/logger'
+import * as codewhispererClient from '../client/codewhisperer'
+import { supplementalContextMaxTotalLength, charactersLimit } from '../models/constants'
+
+const logger = getLogger('nextEditPrediction')
+
+/**
+ * Generates a unified diff format between old and new file contents
+ */
+function generateUnifiedDiffWithTimestamps(
+    oldFilePath: string,
+    newFilePath: string,
+    oldContent: string,
+    newContent: string,
+    oldTimestamp: number,
+    newTimestamp: number,
+    contextSize: number = 3
+): string {
+    const patchResult = diff.createTwoFilesPatch(
+        oldFilePath,
+        newFilePath,
+        oldContent,
+        newContent,
+        String(oldTimestamp),
+        String(newTimestamp),
+        { context: contextSize }
+    )
+
+    // Remove unused headers
+    const lines = patchResult.split('\n')
+    if (lines.length >= 2 && lines[0].startsWith('Index:')) {
+        lines.splice(0, 2)
+        return lines.join('\n')
+    }
+
+    return patchResult
+}
+
+export interface SnapshotContent {
+    filePath: string
+    content: string
+    timestamp: number
+}
+
+/**
+ * Generates supplemental contexts from snapshot contents and current content
+ *
+ * @param filePath - Path to the file
+ * @param currentContent - Current content of the file
+ * @param snapshotContents - List of snapshot contents sorted by timestamp (oldest first)
+ * @param maxContexts - Maximum number of supplemental contexts to return
+ * @returns Array of SupplementalContext objects, T_0 being the snapshot of current file content:
+ *          U0: udiff of T_0 and T_1
+ *          U1: udiff of T_0 and T_2
+ *          U2: udiff of T_0 and T_3
+ */
+export function generateDiffContexts(
+    filePath: string,
+    currentContent: string,
+    snapshotContents: SnapshotContent[],
+    maxContexts: number
+): codewhispererClient.SupplementalContext[] {
+    if (snapshotContents.length === 0) {
+        return []
+    }
+
+    const supplementalContexts: codewhispererClient.SupplementalContext[] = []
+    const currentTimestamp = Date.now()
+
+    for (let i = snapshotContents.length - 1; i >= 0; i--) {
+        const snapshot = snapshotContents[i]
+        try {
+            const unifiedDiff = generateUnifiedDiffWithTimestamps(
+                snapshot.filePath,
+                filePath,
+                snapshot.content,
+                currentContent,
+                snapshot.timestamp,
+                currentTimestamp
+            )
+
+            supplementalContexts.push({
+                filePath: snapshot.filePath,
+                content: unifiedDiff,
+                type: 'PreviousEditorState',
+                metadata: {
+                    previousEditorStateMetadata: {
+                        timeOffset: currentTimestamp - snapshot.timestamp,
+                    },
+                },
+            })
+        } catch (err) {
+            logger.error(`Failed to generate diff: ${err}`)
+        }
+    }
+
+    const trimmedContext = trimSupplementalContexts(supplementalContexts, maxContexts)
+    logger.debug(
+        `supplemental contexts: ${trimmedContext.length} contexts, total size: ${trimmedContext.reduce((sum, ctx) => sum + ctx.content.length, 0)} characters`
+    )
+    return trimmedContext
+}
+
+/**
+ * Trims the supplementalContexts array to ensure it doesn't exceed the max number
+ * of contexts or total character length limit
+ *
+ * @param supplementalContexts - Array of SupplementalContext objects (already sorted with newest first)
+ * @param maxContexts - Maximum number of supplemental contexts allowed
+ * @returns Trimmed array of SupplementalContext objects
+ */
+export function trimSupplementalContexts(
+    supplementalContexts: codewhispererClient.SupplementalContext[],
+    maxContexts: number
+): codewhispererClient.SupplementalContext[] {
+    if (supplementalContexts.length === 0) {
+        return supplementalContexts
+    }
+
+    // First filter out any individual context that exceeds the character limit
+    let result = supplementalContexts.filter((context) => {
+        return context.content.length <= charactersLimit
+    })
+
+    // Then limit by max number of contexts
+    if (result.length > maxContexts) {
+        result = result.slice(0, maxContexts)
+    }
+
+    // Lastly enforce total character limit
+    let totalLength = 0
+    let i = 0
+
+    while (i < result.length) {
+        totalLength += result[i].content.length
+        if (totalLength > supplementalContextMaxTotalLength) {
+            break
+        }
+        i++
+    }
+
+    if (i === result.length) {
+        return result
+    }
+
+    const trimmedContexts = result.slice(0, i)
+    return trimmedContexts
+}
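
The three trimming stages in `trimSupplementalContexts` are easier to follow with small numbers. The sketch below re-states the same logic as a standalone function. The values of `charactersLimit` and `supplementalContextMaxTotalLength` are not shown in this diff, so they appear here as the hypothetical parameters `perItemLimit` and `totalLimit`; `Ctx` and `trimContexts` are likewise illustrative names.

```typescript
// Simplified stand-in for SupplementalContext; only the fields needed here.
interface Ctx {
    filePath: string
    content: string
}

/**
 * Standalone restatement of the three trimming stages.
 * `contexts` is expected newest first, so trimming keeps the most recent diffs.
 */
function trimContexts(contexts: Ctx[], maxContexts: number, perItemLimit: number, totalLimit: number): Ctx[] {
    // Stage 1: drop any single context that is too large on its own.
    const withinItemLimit = contexts.filter((c) => c.content.length <= perItemLimit)

    // Stage 2: keep at most maxContexts entries from the front (newest).
    const capped = withinItemLimit.slice(0, maxContexts)

    // Stage 3: keep the longest prefix whose cumulative length fits totalLimit.
    const kept: Ctx[] = []
    let total = 0
    for (const c of capped) {
        total += c.content.length
        if (total > totalLimit) {
            break
        }
        kept.push(c)
    }
    return kept
}

// Worked example with content lengths 4, 12, 5, 3, 2 (newest first).
const sample: Ctx[] = [4, 12, 5, 3, 2].map((n, i) => ({ filePath: `f${i}.ts`, content: 'x'.repeat(n) }))
const trimmed = trimContexts(sample, 3, 10, 10)
// Stage 1 drops the 12; stage 2 keeps lengths 4, 5, 3; stage 3 stops before the 3 (4 + 5 + 3 > 10).
console.log(trimmed.map((c) => c.content.length)) // [ 4, 5 ]
```

Note that stage 3 stops at the first context that overflows the total rather than skipping it and trying smaller ones, which keeps the result a contiguous run of the newest diffs.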
Lines changed: 117 additions & 0 deletions

@@ -0,0 +1,117 @@
+/*!
+ * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import * as vscode from 'vscode'
+import { PredictionTracker } from './predictionTracker'
+
+/**
+ * Monitors document changes in the editor and track them for prediction.
+ */
+export class PredictionKeyStrokeHandler {
+    private disposables: vscode.Disposable[] = []
+    private tracker: PredictionTracker
+    private shadowCopies: Map<string, string> = new Map()
+
+    /**
+     * Creates a new PredictionKeyStrokeHandler
+     * @param context The extension context
+     * @param tracker The prediction tracker instance
+     * @param config Configuration options
+     */
+    constructor(tracker: PredictionTracker) {
+        this.tracker = tracker
+
+        // Initialize shadow copies for currently visible editors when extension starts
+        this.initializeVisibleDocuments()
+
+        // Register event handlers
+        this.registerVisibleDocumentListener()
+        this.registerTextDocumentChangeListener()
+    }
+
+    /**
+     * Initializes shadow copies for all currently visible text editors
+     */
+    private initializeVisibleDocuments(): void {
+        const editors = vscode.window.visibleTextEditors
+
+        for (const editor of editors) {
+            if (editor.document.uri.scheme === 'file') {
+                this.updateShadowCopy(editor.document)
+            }
+        }
+    }
+
+    /**
+     * Registers listeners for visibility events to maintain shadow copies of document content
+     * Only store and update shadow copies for currently visible editors
+     * And remove shadow copies for files that are no longer visible
+     * And edits are processed only if a shadow copy exists
+     * This avoids the memory problem if hidden files are bulk edited, i.e. with global find/replace
+     */
+    private registerVisibleDocumentListener(): void {
+        // Track when documents become visible (switched to)
+        const visibleDisposable = vscode.window.onDidChangeVisibleTextEditors((editors) => {
+            const currentVisibleFiles = new Set<string>()
+
+            for (const editor of editors) {
+                if (editor.document.uri.scheme === 'file') {
+                    const filePath = editor.document.uri.fsPath
+                    currentVisibleFiles.add(filePath)
+                    this.updateShadowCopy(editor.document)
+                }
+            }
+
+            for (const filePath of this.shadowCopies.keys()) {
+                if (!currentVisibleFiles.has(filePath)) {
+                    this.shadowCopies.delete(filePath)
+                }
+            }
+        })
+
+        this.disposables.push(visibleDisposable)
+    }
+
+    private updateShadowCopy(document: vscode.TextDocument): void {
+        if (document.uri.scheme === 'file') {
+            this.shadowCopies.set(document.uri.fsPath, document.getText())
+        }
+    }
+
+    /**
+     * Registers listener for text document changes to send to tracker
+     */
+    private registerTextDocumentChangeListener(): void {
+        // Listen for document changes
+        const changeDisposable = vscode.workspace.onDidChangeTextDocument(async (event) => {
+            const filePath = event.document.uri.fsPath
+            const prevContent = this.shadowCopies.get(filePath)
+
+            // Skip if there are no content changes or if the file is not visible
+            if (
+                event.contentChanges.length === 0 ||
+                event.document.uri.scheme !== 'file' ||
+                prevContent === undefined
+            ) {
+                return
+            }
+
+            await this.tracker.processEdit(event.document, prevContent)
+            this.updateShadowCopy(event.document)
+        })
+
+        this.disposables.push(changeDisposable)
+    }
+
+    /**
+     * Disposes of all resources used by this handler
+     */
+    public dispose(): void {
+        for (const disposable of this.disposables) {
+            disposable.dispose()
+        }
+        this.disposables = []
+    }
+}
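
The shadow-copy bookkeeping in `PredictionKeyStrokeHandler` can be exercised without the editor by modelling the two listeners as plain methods. This is a sketch under assumptions: the `ShadowCopies` class and its method names are hypothetical, file contents are passed in directly instead of being read from `vscode.TextDocument`, and the call to the tracker is replaced by returning the pre-edit content.

```typescript
// Hypothetical, editor-free model of the handler's shadow-copy map.
class ShadowCopies {
    private copies = new Map<string, string>()

    /** Mirrors the visibility listener: refresh visible files, forget the rest. */
    setVisible(visible: Map<string, string>): void {
        visible.forEach((content, path) => {
            this.copies.set(path, content)
        })
        for (const path of Array.from(this.copies.keys())) {
            if (!visible.has(path)) {
                this.copies.delete(path)
            }
        }
    }

    /**
     * Mirrors the change listener: returns the pre-edit content that would be
     * snapshotted, or undefined when the file has no shadow copy (not visible).
     */
    onChange(path: string, newContent: string): string | undefined {
        const previous = this.copies.get(path)
        if (previous === undefined) {
            return undefined
        }
        this.copies.set(path, newContent)
        return previous
    }
}

const shadow = new ShadowCopies()
shadow.setVisible(new Map([['a.ts', 'v1']]))
console.log(shadow.onChange('a.ts', 'v2')) // v1
console.log(shadow.onChange('b.ts', 'x')) // undefined, because b.ts is not visible
```

The second call shows the property the doc comment relies on: an edit to a file that is not visible, such as one touched by a global find/replace, produces no snapshot because no shadow copy exists for it.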
