This is LLMigrate, a library for transferring tests across Android apps using LLMs, inspired by CraftDroid. It has been tested with the environment below:
- MacOS, Python 3.12.0
- node v20.12.1, appium v2.5.4, uiautomator2@2.34.0
- Android Studio Iguana | 2023.2.1 Patch 1
- Pixel 5 Emulator | API_29 (arm64-v8a)
- Nexus 5X Emulator | API_23 (arm64-v8a)
- We have evaluated this tool on the Subject APKs
- All conversations with the LLM alongside screenshots and XML dumps at each step are available at Experiment Output
- Individual analyses of time and tokens, alongside TP, FP, and FN counts, are available at LLMigrate Individual Migration Metrics
- Details of all transfers are available at LLMigrate Evaluation Google Sheet
- We strongly suggest using a Python virtual environment for running the project; for an easy setup, head over to the great guide from freeCodeCamp.
- python 3.12.0
- Run `sh scripts/set-up.sh`; this installs all dependencies and creates the `experiment/output` and `experiment/cache` folders.
- Appium v2.5.4 with the UiAutomator2 driver installed.
  Make sure to install Java and correctly set `JAVA_HOME`, `ANDROID_HOME`, and other environment variables; you can diagnose problems with `appium-doctor`.
- Android Studio Iguana | 2023.2.1 Patch 1
- Download the Subject APKs and `git clone` this project.
- Install the appium-uiautomator2-server apps matching the UIAutomator version you are using on your AVD. (Also available in the Subject APKs.)
- Install the subject apps on the emulator; we suggest starting with the apps under `a2` or `c5` to avoid some apps' network issues.
- Start the emulator and start the Appium server in your terminal.
- Fill in the `config.json` file like the `config.example.json`; remove `organization` if you don't want to use it.
- Run `main.py` with arguments:
```shell
python src/main.py --appium-port 4723 \
  --category c2 \
  --test-id t1 \
  --source-test a1 \
  --target-app a3 \
  --llm gpt-4o
```
You should see the following output:
```
Running in transfer mode, for category c2 and on test t1
Transferring from app a1 to app a3 with LLM gpt-4o
Max wrong tries at the same step: 3
Majority total run: 3, Majority threshold: 2
```
The above script transfers `t1` from the `a1` application into the `a3` application and saves the results into `test-repo/c2/t1/generated/a1-a3.json`.
There is also a sample script provided in `evaluation.py`.
The `web_element_monkey_patch` folder enables augmenting a test script by monkey-patching common user interaction functions, such as `click`, `send_keys`, and `swipe`, to capture and store detailed information about the actions performed on UI elements during automated testing with Appium. The captured data is saved in JSON format.
By using monkey patching, the tool hooks into existing methods like `click` and `send_keys` without modifying the original behavior of the Appium and Selenium WebDriver methods. This approach provides a non-intrusive way to extend the functionality of the test script.
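As an illustration of the technique only (not LLMigrate's actual implementation — the element class and recorded fields here are stand-ins), wrapping a `click` method so it records element attributes before delegating can be sketched as:

```python
import functools

# Global list of captured steps, mirroring the tool's idea of recording
# each interaction as it happens (names here are stand-ins).
captured_steps = []

class FakeElement:
    """Stand-in for an Appium WebElement, used so the sketch runs offline."""
    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        return self._attrs.get(name)

    def click(self):
        pass  # a real element would tap the UI here

def capture_action(action_name, original):
    """Wrap a method so every call records the element's attributes first."""
    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        captured_steps.append({
            "action": action_name,
            "resource-id": self.get_attribute("resource-id"),
            "text": self.get_attribute("text"),
        })
        # Delegate to the original method, so behavior is unchanged.
        return original(self, *args, **kwargs)
    return wrapper

# Monkey-patch: swap the class's click for the recording wrapper.
FakeElement.click = capture_action("click", FakeElement.click)

button = FakeElement({"resource-id": "com.example:id/ok", "text": "OK"})
button.click()  # works as before, but the step is now recorded
```

Because the wrapper calls the original method after recording, the patched test behaves exactly as it did before.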
- Action Capture: Captures key interactions like clicks, text inputs, and long presses, and stores relevant element attributes for each action.
- Element State Tracking: Tracks the presence and invisibility of elements as the script waits for certain UI elements to load or disappear.
- Custom Actions: Supports gestures like swipe right and pull-to-refresh, which are essential for mobile testing.
- System Events: Handles system actions such as pressing the Android back button.
- Oracle Events: Collects information about the state of UI elements, such as waiting for their presence or invisibility, to enhance oracle-based assertions.
To use the tool, follow these steps:
Examples are available at `test-repo/test-scripts/c2/t2/base`.
- Monkey Patching the Test Driver: Before running your test, invoke the `monkey_patch(driver)` function. This function patches various WebDriver and Appium methods to collect UI interaction data automatically as the test runs. The patched methods do the following:
  - Capture element attributes.
  - Store augmented steps with additional information about the action performed (`click`, `send_keys`, `swipe`, etc.).
  Example:

  ```python
  import web_element_monkey_patch

  # Initialize the Appium driver
  driver = webdriver.Remote('address', desired_capabilities)

  # Or use our UI Automator
  ui_automator = UIAutomator(
      appium_port='4723',
      app_package='com.rubenroy.minimaltodo',
      app_activity='com.rubenroy.minimaltodo.MainActivity',
      reset=True,
  )
  driver = ui_automator.driver

  # Apply the monkey patching to the driver
  web_element_monkey_patch.monkey_patch(driver)
  ```
- Perform Actions in Your Test: Execute your test as you normally would. The patched methods will automatically capture each interaction and store the data in a global list.
- End the Test and Save Data: After the test has finished, call the `end_test(save_address)` function to save the augmented steps as a JSON file. This function stores all collected actions at the specified file path. Example:

  ```python
  web_element_monkey_patch.end_test('test-repo/test-scripts/c2/t2/generated/a1.json')
  ```
- Review the JSON Output: The output JSON will contain a series of steps with the following information:
  - Attributes of the UI elements (such as resource-id, class, text).
  - Actions performed (click, send_keys, etc.).
  - Additional information, like the event type (GUI interaction or system event) or oracle-related data.
This JSON file can be used for post-test analysis and test migration.
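As a rough illustration only (the field names below are assumptions, not the tool's exact schema), a captured step in the output might look like:

```json
[
  {
    "action": "click",
    "event_type": "gui",
    "attributes": {
      "resource-id": "com.example:id/add_button",
      "class": "android.widget.ImageButton",
      "text": ""
    }
  }
]
```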
The following methods are monkey-patched:
- `click()`: Captures click actions on elements.
- `send_keys()`: Captures text input actions.
- `press_keycode()`: Captures system key events (e.g., the Android back button).
- `presence_of_element_located()`: Captures events where the script waits for an element to be present.
- `invisibility_of_element_located()`: Captures events where the script waits for an element to disappear.
- `long_click()`: Captures long-press actions.
- `swipe_right()`: Captures swipe-right gestures.
- `pull_to_refresh()`: Simulates pull-to-refresh actions in mobile apps.
Each of these methods collects and stores the relevant attributes of the UI element involved in the action.
Note that there are many ways to perform actions like `long_click`, `swipe_right`, and `pull_to_refresh`, but if you want those actions to be captured by our augmentation tool, you should call them directly on the patched `driver` and `WebElement`. Examples are available at `test-repo/test-scripts/c2/t2/base`.
Output files of each run are stored in this folder. Each run has its own subfolder, such as `c1/t1/a1-a2`, meaning that for category `c1`, test `t1` was transferred from `a1` to `a2`.
- `state_[cycle_step].xml`: current state (XML dump) on the current cycle
- `goal_prompt.txt`: first prompt for generating the goal from the source test
- `goal_prompt_response.txt`: response to the goal prompt
- `event_prompt_[cycle_step].txt`: prompt that we give on the current cycle (could be initial, repair, or next_step)
- `event_prompt_response_[cycle_step].txt`: response to the prompt that we give on the current cycle
- `image_description_prompt_[cycle_step].txt`: prompt for current state screenshot analysis
- `image_description_prompt_response_[cycle_step].txt`: response to the prompt that we give for screenshot analysis
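Putting the names above together, a single run's folder (here `c1/t1/a1-a2`, assuming the `experiment/output` root created by the set-up script; the exact `cycle_step` suffixes may differ) might contain files such as:

```
experiment/output/c1/t1/a1-a2/
├── goal_prompt.txt
├── goal_prompt_response.txt
├── state_1.xml
├── event_prompt_1.txt
├── event_prompt_response_1.txt
├── image_description_prompt_1.txt
└── image_description_prompt_response_1.txt
```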
Pure responses from the LLM are saved into this folder. If you run the tool with `--clear-cache false`, it reads the cache files in order for each prompt and returns them. This is helpful for reducing cost when you want to debug parts of the tool unrelated to the prompts.
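The replay behavior described above can be sketched as a read-through cache. This is a minimal sketch, not LLMigrate's actual implementation; the file naming and JSON layout here are assumptions.

```python
import json
import tempfile
from pathlib import Path

class PromptCache:
    """Read-through cache sketch: replays saved LLM responses in call order,
    so repeated debug runs do not re-issue (and re-pay for) prompts."""

    def __init__(self, cache_dir, clear_cache=False):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.index = 0  # responses are replayed in the order prompts are made
        if clear_cache:
            for cached in self.cache_dir.glob("response_*.json"):
                cached.unlink()

    def get_response(self, prompt, call_llm):
        path = self.cache_dir / f"response_{self.index}.json"
        self.index += 1
        if path.exists():
            # Cache hit: return the stored response instead of calling the LLM.
            return json.loads(path.read_text())["response"]
        response = call_llm(prompt)  # cache miss: query the LLM and save
        path.write_text(json.dumps({"response": response}))
        return response

# Demo with a fake LLM so the sketch is runnable offline.
with tempfile.TemporaryDirectory() as cache_dir:
    calls = []
    def fake_llm(prompt):
        calls.append(prompt)
        return f"answer to {prompt}"

    first = PromptCache(cache_dir).get_response("p1", fake_llm)   # miss: LLM called
    second = PromptCache(cache_dir).get_response("p1", fake_llm)  # hit: replayed
```

The second run reads the saved file instead of calling the LLM again, which is what makes prompt-unrelated debugging cheap.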
The `data-output` folder contains the results from either dynamic data analysis of the tool or static data from other sources to compare the results against.
- `result-analysis`: for each category and test, such as `c1-t1`, it contains the results for the transfers in that category and test.
We use this file to access applications' metadata.
- `app_id`: corresponding application id
- `package`: corresponding application package name
- `activity`: the main activity of the application
- `reset`: if True, the app data should be reset each time
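Assuming this metadata file follows the app_id/package/activity/reset column layout of a per-category `config.csv` (the values below are illustrative, reusing the package name from the driver example earlier in this README), an entry might look like:

```csv
app_id,package,activity,reset
a1,com.rubenroy.minimaltodo,com.rubenroy.minimaltodo.MainActivity,True
```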
Some tests may run in dynamic environments. For example, when transferring a test that sends an email, in one application you may want to send it to yourself, while in another you are logged in with a different email address and want to send it to another address. In these scenarios, you can specify the values using `values.json`. You can append the values using the `--values [values json file name]` argument.
Using the `values.json` file is not required; you can always try transferring the tests with static values. LLMigrate is designed to generate random values if it cannot find proper values from the source test in the target application.
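A hypothetical `values.json` for the email scenario above (the key names here are entirely illustrative assumptions; use whatever names your tests reference):

```json
{
  "sender_email": "me@example.com",
  "recipient_email": "someone-else@example.com"
}
```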
Some tests need preparation before a test can be transferred onto them; the preparation events are handy in these scenarios.
Some categories like `c10` have complicated preparation steps, so we didn't add them as JSON files: you should open the apps without internet, select them as the default messenger app, and create a chat with the number +1234567890; then you can transfer the test.
- Create your own category: Add a folder like `c6` in `/test-repo` and create a `config.csv` file in it.
  - Your CSV file should have the columns app_id, package, activity, reset.
  - app_id is the id that you assign to your app; the tool uses it to find out which test belongs to which app. For example, `c1/t1/base/a1.json` is a base test for app_id 1.
  - package is the package name of the application.
  - activity is the first activity that the tool should launch (you can find this activity by analyzing your app with an Android app analyzer; our suggested analyzer is available in the Subject APKs).
  - reset is whether or not the app should wipe its data on each transfer or during the issue-fixing phase.
- Add your base test in your folder, e.g., `c1/t1/base/a1.json`. (You can use Appium Inspector for writing your tests. Each test should be a JSON object; take a look at the c1-c5 base test examples for help.) You can also use the test augmentation tool to generate augmented tests from Appium test scripts.
- If your app needs preparation before running the exact test, add the steps in a file like `c1/t1/base/a1-prepare.json`.
- Install the apps on your emulator and run an Appium server.
- Run the tool for your category, test, and target application; the results will be saved in a file like `c1/t1/generate/a2.json` based on your category, test id, and target app id.
- On `c6`, you need to run the transfers without an internet connection on the emulator/device. a1 needs to be connected to the internet to open it and then close the ad on some steps.
- On `c7`, you should log into the apps and remove your chats before a transfer.
- On `c9`, remove the earlier notes before a transfer and turn off the internet connection on the emulator/device. On a1 you need an internet connection to make it work.
- On `c10`, you need to make the target app your default messenger app and turn off the internet connection on the emulator/device.
I've installed an application from Google Play to my emulator, how can I export the APK and add to the test-repo?
Using the APK Analyzer application (the APK of this app exists in the Subject APKs folder), first export the APK and save it to the Downloads folder inside your emulator.
Open a shell on the device:

```shell
adb shell
```

Go to the `sdcard/Download` folder and find your APK:

```shell
cd sdcard/Download
ls
```

Exit the shell and run this command:

```shell
adb pull /sdcard/Download/your-apk.apk ./desired/path/
```

This will save a copy of the APK file to the second path. (Sometimes this copy is not installable!)
How to fix "socket hang up" / "Cannot invoke method android.app.UiAutomation androidx.test.uiautomator.UiDevice.getUiAutomation()"?
Refer to Issue 20394 on the Appium GitHub. Most of the time, wiping the data from your emulator fixes the issue, based on our experience.
On the artifacts and generated tests, I can see that the website URL in transfers on a3 is different from the others. Why is that?
That's because, in our last experiment with the applications, the a3 app suddenly stopped working with the `uci.edu` website and was returning 403, so we modified the tests to use `em.uci.edu` instead. There is absolutely no difference between the tests; the only changes we made are the URL and the next oracle's content-desc. We changed these tests in the source applications of the others and transferred them to the target application.
This is the same case for the `a2-a1` transfer. There is a modified folder inside `test-repo/c1/t2` which you can check out to see how we changed the tests.
In some cases, capturing screenshots is not allowed in apps due to security reasons, which are mostly observed in authentication-related screens that include user-sensitive data such as passwords; for these rare cases, we have developed a fallback approach that only uses XML layout hierarchy as the current app state.
To make the artifacts anonymous, we removed user-specific data like email, name, etc. from them. For matching purposes, we replaced the special data with strings like `___x1___` in every artifact. We blurred the image artifacts to make sure that the special data is not visible to human readers, but if you want to match data between prompts and generated events, you can use strings like `___xi___`.
This repository contains the implementation of the LLMigrate tool, as presented in our paper:
Automated Test Transfer Across Android Apps Using Large Language Models
Benyamin Beyzaei, Saghar Talebipour, Ghazal Rafiei, Nenad Medvidovic, Sam Malek
If you use LLMigrate in your research, please cite our paper:
@misc{beyzaei2024automatedtesttransferandroid,
title={Automated Test Transfer Across Android Apps Using Large Language Models},
author={Benyamin Beyzaei and Saghar Talebipour and Ghazal Rafiei and Nenad Medvidovic and Sam Malek},
year={2024},
eprint={2411.17933},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2411.17933},
}