This is LLMigrate, a library for transferring tests across Android apps using LLMs, inspired by CraftDroid. It has been tested with the environment below:
- MacOS, Python 3.12.0
- node v20.12.1, appium v2.5.4, uiautomator2@2.34.0
- Android Studio Iguana | 2023.2.1 Patch 1
- Pixel 5 Emulator | API_29 (arm64-v8a)
- Nexus 5X Emulator | API_23 (arm64-v8a)
- We have evaluated this tool on the Subject APKs
- All conversations with the LLM alongside screenshots and XML dumps at each step are available at Experiment Output
- Individual analyses of time and tokens, alongside TP, FP, and FN counts, are available at LLMigrate Individual Migration Metrics
- Details of all transfers are available at LLMigrate Evaluation Google Sheet
- We strongly suggest using a Python virtual environment for running the project; for an easy setup, head over to the great guide from freeCodeCamp.
- python 3.12.0
- Run `sh scripts/set-up.sh`; this installs all dependencies and creates the `experiment/output` and `experiment/cache` folders.
- Appium v2.5.4 with the UiAutomator2 driver installed.
  Make sure to install Java and correctly set `JAVA_HOME`, `ANDROID_HOME`, and other environment variables; you can diagnose problems with `appium-doctor`.
- Android Studio Iguana | 2023.2.1 Patch 1
- Download the Subject APKs and `git clone` this project.
- Install the appium-uiautomator2-server apps matching the UIAutomator version you are using on your AVD. (Also available in the Subject APKs.)
- Install the subject apps on the emulator; we suggest starting with the apps under `a2` or `c5` to avoid some apps' network issues.
- Start the emulator and start the Appium server in your terminal.
- Fill in the `config.json` file like the `config.example.json`; remove `organization` if you don't want to use it.
- Run `main.py` with arguments:
```shell
python src/main.py --appium-port 4723 \
  --category c2 \
  --test-id t1 \
  --source-test a1 \
  --target-app a3 \
  --llm gpt-4o
```
You should see the following output:
```
Running in transfer mode, for category c2 and on test t1
Transferring from app a1 to app a3 with LLM gpt-4o
Max wrong tries at the same step: 3
Majority total run: 3, Majority threshold: 2
```
The above script transfers `t1` from the `a1` application into the `a3` application and saves the results into `test-repo/c2/t1/generated/a1-a3.json`.
There is also a sample script provided in `evaluation.py`.
The `web_element_monkey_patch` folder enables augmenting a test script by monkey-patching common user interaction functions, such as `click`, `send_keys`, and `swipe`, to capture and store detailed information about the actions performed on UI elements during automated testing with Appium. The captured data is saved in JSON format.
By using monkey patching, the tool hooks into existing methods like `click` and `send_keys` without modifying the original behavior of the Appium and Selenium WebDriver methods. This approach provides a non-intrusive way to extend the functionality of the test script.
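As an illustration of the technique only (not LLMigrate's actual implementation — the element class and recorded fields here are stand-ins), wrapping a `click` method so it records element attributes before delegating can be sketched as:

```python
import functools

# Global list of captured steps, mirroring the tool's idea of recording
# each interaction as it happens (names here are stand-ins).
captured_steps = []

class FakeElement:
    """Stand-in for an Appium WebElement, used so the sketch runs offline."""
    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        return self._attrs.get(name)

    def click(self):
        pass  # a real element would tap the UI here

def capture_action(action_name, original):
    """Wrap a method so every call records the element's attributes first."""
    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        captured_steps.append({
            "action": action_name,
            "resource-id": self.get_attribute("resource-id"),
            "text": self.get_attribute("text"),
        })
        # Delegate to the original method, so behavior is unchanged.
        return original(self, *args, **kwargs)
    return wrapper

# Monkey-patch: swap the class's click for the recording wrapper.
FakeElement.click = capture_action("click", FakeElement.click)

button = FakeElement({"resource-id": "com.example:id/ok", "text": "OK"})
button.click()  # works as before, but the step is now recorded
```

Because the wrapper calls the original method after recording, the patched test behaves exactly as it did before.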
- Action Capture: Captures key interactions like clicks, text inputs, and long presses, and stores relevant element attributes for each action.
- Element State Tracking: Tracks the presence and invisibility of elements as the script waits for certain UI elements to load or disappear.
- Custom Actions: Supports gestures like swipe right and pull-to-refresh, which are essential for mobile testing.
- System Events: Handles system actions such as pressing the Android back button.
- Oracle Events: Collects information about the state of UI elements, such as waiting for their presence or invisibility, to enhance oracle-based assertions.
To use the tool, follow these steps:
Examples are available at `test-repo/test-scripts/c2/t2/base`.
- Monkey Patching the Test Driver: Before running your test, invoke the `monkey_patch(driver)` function. This function patches various WebDriver and Appium methods to collect UI interaction data automatically as the test runs. The patched methods do the following:
  - Capture element attributes.
  - Store augmented steps with additional information about the action performed (`click`, `send_keys`, `swipe`, etc.).
  Example:

  ```python
  import web_element_monkey_patch

  # Initialize the Appium driver
  driver = webdriver.Remote('address', desired_capabilities)

  # Or use our UI Automator
  ui_automator = UIAutomator(
      appium_port='4723',
      app_package='com.rubenroy.minimaltodo',
      app_activity='com.rubenroy.minimaltodo.MainActivity',
      reset=True,
  )
  driver = ui_automator.driver

  # Apply the monkey patching to the driver
  web_element_monkey_patch.monkey_patch(driver)
  ```
- Perform Actions in Your Test: Execute your test as you normally would. The patched methods will automatically capture each interaction and store the data in a global list.
- End the Test and Save Data: After the test has finished, call the `end_test(save_address)` function to save the augmented steps as a JSON file. This function stores all collected actions at the specified file path. Example:

  ```python
  web_element_monkey_patch.end_test('test-repo/test-scripts/c2/t2/generated/a1.json')
  ```
- Review the JSON Output: The output JSON will contain a series of steps with the following information:
  - Attributes of the UI elements (such as resource-id, class, text).
  - Actions performed (click, send_keys, etc.).
  - Additional information, like the event type (GUI interaction or system event) or oracle-related data.
This JSON file can be used for post-test analysis and test migration.
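As a rough illustration only (the field names below are assumptions, not the tool's exact schema), a captured step in the output might look like:

```json
[
  {
    "action": "click",
    "event_type": "gui",
    "attributes": {
      "resource-id": "com.example:id/add_button",
      "class": "android.widget.ImageButton",
      "text": ""
    }
  }
]
```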
The following methods are monkey-patched:
- `click()`: Captures click actions on elements.
- `send_keys()`: Captures text input actions.
- `press_keycode()`: Captures system key events (e.g., the Android back button).
- `presence_of_element_located()`: Captures events where the script waits for an element to be present.
- `invisibility_of_element_located()`: Captures events where the script waits for an element to disappear.
- `long_click()`: Captures long-press actions.
- `swipe_right()`: Captures swipe-right gestures.
- `pull_to_refresh()`: Simulates pull-to-refresh actions in mobile apps.
Each of these methods collects and stores the relevant attributes of the UI element involved in the action.
Note that there are many ways to perform actions like `long_click`, `swipe_right`, and `pull_to_refresh`, but if you want those actions to be captured by our augmentation tool, you should call them directly on the patched `driver` and `WebElement`. Examples are available at `test-repo/test-scripts/c2/t2/base`.
Output files of each run are stored in this folder. Each run has its own subfolder, such as `c1/t1/a1-a2`, meaning that for category `c1`, test `t1` was transferred from `a1` to `a2`.
- `state_[cycle_step].xml`: current state (XML dump) on the current cycle
- `goal_prompt.txt`: first prompt for generating the goal from the source test
- `goal_prompt_response.txt`: response to the goal prompt
- `event_prompt_[cycle_step].txt`: prompt that we give on the current cycle (could be initial, repair, or next_step)
- `event_prompt_response_[cycle_step].txt`: response to the prompt that we give on the current cycle
- `image_description_prompt_[cycle_step].txt`: prompt for current state screenshot analysis
- `image_description_prompt_response_[cycle_step].txt`: response to the prompt that we give for screenshot analysis
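Putting the names above together, a single run's folder (here `c1/t1/a1-a2`, assuming the `experiment/output` root created by the set-up script; the exact `cycle_step` suffixes may differ) might contain files such as:

```
experiment/output/c1/t1/a1-a2/
├── goal_prompt.txt
├── goal_prompt_response.txt
├── state_1.xml
├── event_prompt_1.txt
├── event_prompt_response_1.txt
├── image_description_prompt_1.txt
└── image_description_prompt_response_1.txt
```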
Pure responses from the LLM are saved into this folder. If you run the tool with `--clear-cache false`, it reads the cache files in order for each prompt and returns them. This is helpful for reducing cost when you want to debug parts of the tool unrelated to the prompts.
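The replay behavior described above can be sketched as a read-through cache. This is a minimal sketch, not LLMigrate's actual implementation; the file naming and JSON layout here are assumptions.

```python
import json
import tempfile
from pathlib import Path

class PromptCache:
    """Read-through cache sketch: replays saved LLM responses in call order,
    so repeated debug runs do not re-issue (and re-pay for) prompts."""

    def __init__(self, cache_dir, clear_cache=False):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.index = 0  # responses are replayed in the order prompts are made
        if clear_cache:
            for cached in self.cache_dir.glob("response_*.json"):
                cached.unlink()

    def get_response(self, prompt, call_llm):
        path = self.cache_dir / f"response_{self.index}.json"
        self.index += 1
        if path.exists():
            # Cache hit: return the stored response instead of calling the LLM.
            return json.loads(path.read_text())["response"]
        response = call_llm(prompt)  # cache miss: query the LLM and save
        path.write_text(json.dumps({"response": response}))
        return response

# Demo with a fake LLM so the sketch is runnable offline.
with tempfile.TemporaryDirectory() as cache_dir:
    calls = []
    def fake_llm(prompt):
        calls.append(prompt)
        return f"answer to {prompt}"

    first = PromptCache(cache_dir).get_response("p1", fake_llm)   # miss: LLM called
    second = PromptCache(cache_dir).get_response("p1", fake_llm)  # hit: replayed
```

The second run reads the saved file instead of calling the LLM again, which is what makes prompt-unrelated debugging cheap.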
The `data-output` folder contains the results from either dynamic data analysis of the tool or static data from other sources to compare the results against.
- `result-analysis`: for each category and test, such as `c1-t1`, it contains the results for the transfers in that category and test.
We use this file to access applications' metadata.
- `app_id`: corresponding application id
- `package`: corresponding application package name
- `activity`: the main activity of the application
- `reset`: if True, the app data should be reset each time
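Assuming this metadata file follows the app_id/package/activity/reset column layout of a per-category `config.csv` (the values below are illustrative, reusing the package name from the driver example earlier in this README), an entry might look like:

```csv
app_id,package,activity,reset
a1,com.rubenroy.minimaltodo,com.rubenroy.minimaltodo.MainActivity,True
```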
Some tests may run in dynamic environments. For example, when transferring a test that sends an email, in one application you may want to send it to yourself, while in another you are logged in with a different email address and want to send it to another address. In these scenarios, you can specify the values using `values.json`. You can append the values using the `--values [values json file name]` argument.
Using the `values.json` file is not required; you can always try transferring the tests with static values. LLMigrate is designed to generate random values if it cannot find proper values from the source test in the target application.
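A hypothetical `values.json` for the email scenario above (the key names here are entirely illustrative assumptions; use whatever names your tests reference):

```json
{
  "sender_email": "me@example.com",
  "recipient_email": "someone-else@example.com"
}
```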
Some tests need preparation before a test can be transferred onto them; the preparation events are handy in these scenarios.
Some categories like `c10` have complicated preparation steps, so we didn't add them as JSON files: you should open the apps without internet, select them as the default messenger app, and create a chat with the number +1234567890; then you can transfer the test.
- Create your own category: Add a folder like `c6` in `/test-repo` and create a `config.csv` file in it.
  - Your CSV file should have the columns app_id, package, activity, reset.
  - app_id is the id that you assign to your app; the tool uses it to find out which test belongs to which app. For example, `c1/t1/base/a1.json` is a base test for app_id 1.
  - package is the package name of the application.
  - activity is the first activity that the tool should launch (you can find this activity by analyzing your app with an Android app analyzer; our suggested analyzer is available in the Subject APKs).
  - reset is whether or not the app should wipe its data on each transfer or during the issue-fixing phase.
- Add your base test in your folder, e.g., `c1/t1/base/a1.json`. (You can use Appium Inspector for writing your tests. Each test should be a JSON object; take a look at the c1-c5 base test examples for help.) You can also use the test augmentation tool to generate augmented tests from Appium test scripts.
- If your app needs preparation before running the exact test, add the steps in a file like `c1/t1/base/a1-prepare.json`.
- Install the apps on your emulator and run an Appium server.
- Run the tool for your category, test, and target application; the results will be saved in a file like `c1/t1/generate/a2.json` based on your category, test id, and target app id.
- On `c6`, you need to run the transfers without an internet connection on the emulator/device. a1 needs to be connected to the internet to open it and then close the ad on some steps.
- On `c7`, you should log into the apps and remove your chats before a transfer.
- On `c9`, remove the earlier notes before a transfer and turn off the internet connection on the emulator/device. On a1 you need an internet connection to make it work.
- On `c10`, you need to make the target app your default messenger app and turn off the internet connection on the emulator/device.
I've installed an application from Google Play to my emulator, how can I export the APK and add to the test-repo?
Using the APK Analyzer application (the APK of this app exists in the Subject APKs folder), first export the APK and save it to the Downloads folder inside your emulator.
Open a shell on the device:

```shell
adb shell
```

Go to the `sdcard/Download` folder and find your APK:

```shell
cd sdcard/Download
ls
```

Exit the shell and run this command:

```shell
adb pull /sdcard/Download/your-apk.apk ./desired/path/
```

This will save a copy of the APK file to the second path. (Sometimes this copy is not installable!)
How to fix "socket hang up" / "Cannot invoke method android.app.UiAutomation androidx.test.uiautomator.UiDevice.getUiAutomation()"?
Refer to Issue 20394 on the Appium GitHub. Most of the time, wiping the data from your emulator fixes the issue, based on our experience.
On the artifacts and generated tests, I can see that the website URL in transfers on a3 is different from the others. Why is that?
That's because, in our last experiment with the applications, the a3 app suddenly stopped working with the `uci.edu` website and was returning 403, so we modified the tests to use `em.uci.edu` instead. There is absolutely no difference between the tests; the only changes we made are the URL and the next oracle's content-desc. We changed these tests in the source applications of the others and transferred them to the target application.
This is the same case for the `a2-a1` transfer. There is a modified folder inside `test-repo/c1/t2` which you can check out to see how we changed the tests.
In some cases, capturing screenshots is not allowed in apps due to security reasons, which are mostly observed in authentication-related screens that include user-sensitive data such as passwords; for these rare cases, we have developed a fallback approach that only uses XML layout hierarchy as the current app state.
To make the artifacts anonymous, we removed user-specific data like email, name, etc. from them. For matching purposes, we replaced the special data with strings like `___x1___` in every artifact. We blurred the image artifacts to make sure that the special data is not visible to human readers, but if you want to match data between prompts and generated events, you can use strings like `___xi___`.
This repository contains the implementation of the LLMigrate tool, as presented in our paper:
Automated Test Transfer Across Android Apps Using Large Language Models
Benyamin Beyzaei, Saghar Talebipour, Ghazal Rafiei, Nenad Medvidovic, Sam Malek
If you use LLMigrate in your research, please cite our paper:
@misc{beyzaei2024automatedtesttransferandroid,
title={Automated Test Transfer Across Android Apps Using Large Language Models},
author={Benyamin Beyzaei and Saghar Talebipour and Ghazal Rafiei and Nenad Medvidovic and Sam Malek},
year={2024},
eprint={2411.17933},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2411.17933},
}