Table of Contents
Below, we present four supplementary materials:
1. Tool
The tool, along with installation and usage instructions, can be found here.
2. Dataset
Table 1
The table below expands on Table 1 of the paper.
Please scroll right to view the entire table.
Each number inside the table links to a JSON list containing the corresponding data. Each element of the list is a piece of code, represented as a string.
Here is a sample JSON list:
```json
[
  "count = 0\nfor i in int_list:\n count += i",
  "import numpy as np\ncount = np.sum(int_list)",
  "count = sum(int_list)",
  ...
]
```
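For example, such a list can be loaded with the standard `json` module and inspected. This is a minimal sketch; the file name `variants.json` is a placeholder for whichever file a table entry links to:

```python
import json

# Placeholder path; each count in the table below links to a JSON file of this shape.
with open("variants.json") as f:
    variants = json.load(f)  # a list of code snippets, each stored as a single string

print(len(variants))  # number of variants in the file
print(variants[0])    # first snippet, with "\n" rendered as real line breaks
```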
CPAT Number | CPAT Name | LHS | RHS | Variants (Total) | Variants (Correct) | Variants (Useful) | Variants (Applicable) |
---|---|---|---|---|---|---|---|
1 | numpy-sum | <code>count = 0<br>for i in int_list:<br>&nbsp;&nbsp;&nbsp;&nbsp;count = count + i</code> | <code>import numpy as np<br>count = np.sum(int_list)</code> | 1185 | 291 | 83 | 50 |
2 | dict-update | <code>for k, v in add_dict.items():<br>&nbsp;&nbsp;&nbsp;&nbsp;d[k] = v</code> | <code>d.update(add_dict)</code> | 1201 | 478 | 119 | 110 |
3 | set-intersection | <code>common = []<br>for i in l1:<br>&nbsp;&nbsp;&nbsp;&nbsp;if i in l2 and i not in common:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;common.append(i)</code> | <code>common = list(set(l1).intersection(l2))</code> | 782 | 287 | 107 | 66 |
4 | string-join | <code>string = "["<br>for idx, item in enumerate(values):<br>&nbsp;&nbsp;&nbsp;&nbsp;if idx != 0:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;string += ", "<br>&nbsp;&nbsp;&nbsp;&nbsp;string += item<br>string += "]"</code> | <code>string = "[" + ", ".join(values) + "]"</code> | 285 | 101 | 20 | 10 |
5 | dict-setdefault | <code>d = {}<br>for i in array:<br>&nbsp;&nbsp;&nbsp;&nbsp;if i in d:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d[i].append(f(i))<br>&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d[i] = [f(i)]</code> | <code>d = {}<br>for i in array:<br>&nbsp;&nbsp;&nbsp;&nbsp;d.setdefault(i, []).append(f(i))</code> | 1265 | 416 | 150 | 75 |
6 | collections-counter | <code>counts = {}<br>for i in iterable:<br>&nbsp;&nbsp;&nbsp;&nbsp;if i not in counts:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;counts[i] = 0<br>&nbsp;&nbsp;&nbsp;&nbsp;counts[i] += 1</code> | <code>from collections import Counter<br>counts = Counter(iterable)</code> | 927 | 425 | 202 | 85 |
7 | numpy-cumsum | <code>cum_arr = []<br>for i in range(len(array)):<br>&nbsp;&nbsp;&nbsp;&nbsp;cum_arr.append(sum(array[:i+1]))</code> | <code>import numpy as np<br>cum_arr = np.cumsum(array)</code> | 1223 | 290 | 95 | 80 |
8 | numpy-dot | <code>dot_prod = 0<br>for i in range(len(arr1)):<br>&nbsp;&nbsp;&nbsp;&nbsp;dot_prod += arr1[i] * arr2[i]</code> | <code>import numpy as np<br>dot_prod = np.dot(arr1, arr2)</code> | 177 | 28 | 26 | 24 |
9 | numpy-add | <code>result = []<br>for i in range(len(array1)):<br>&nbsp;&nbsp;&nbsp;&nbsp;result.append(array1[i] + array2[i])</code> | <code>import numpy as np<br>result = np.add(array1, array2)</code> | 64 | 11 | 11 | 9 |
10 | list-comprehension | <code>t = []<br>for i in range(len(elem)):<br>&nbsp;&nbsp;&nbsp;&nbsp;if cond(elem[i]):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t.append(elem[i])</code> | <code>t = [elem[i] for i in range(len(elem)) if cond(elem[i])]</code> | 955 | 453 | 226 | 71 |
11 | context-manager-tempfile | <code>import tempfile<br>temp_dir = tempfile.TemporaryDirectory()<br>file = temp_dir.name + "/features.json"<br>f = open(file, 'w')<br>f.write(content)<br>temp_dir.cleanup()</code> | <code>with tempfile.TemporaryDirectory() as temp_dir:<br>&nbsp;&nbsp;&nbsp;&nbsp;file = temp_dir + "/features.json"<br>&nbsp;&nbsp;&nbsp;&nbsp;f = open(file, 'w')<br>&nbsp;&nbsp;&nbsp;&nbsp;f.write(content)</code> | 524 | 150 | - | - |
12 | np-mean | <code>mean = sum(arr1)/len(arr1)</code> | <code>import numpy as np<br>mean = np.mean(arr1)</code> | 482 | 97 | - | - |
13 | np-multidot | <code>import numpy as np<br>result = np.dot(np.dot(arr1, arr2), arr3)</code> | <code>import numpy as np<br>result = np.linalg.multi_dot([arr1, arr2, arr3])</code> | 439 | 0 | - | - |
14 | assign-multiple-targets | <code>a = x<br>b = y<br>c = z</code> | <code>a, b, c = x, y, z</code> | 437 | 11 | - | - |
15 | swapping-variables | <code>temp = a<br>a = b<br>b = temp</code> | <code>a, b = b, a</code> | 385 | 77 | - | - |
16 | non-zero-compare | <code>val = val1<br>if (number_value != 0):<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | <code>val = val1<br>if (bool(number_value)):<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | 406 | 69 | - | - |
17 | getattr | <code>try:<br>&nbsp;&nbsp;&nbsp;&nbsp;n = obj.name<br>except:<br>&nbsp;&nbsp;&nbsp;&nbsp;n = "unknown"</code> | <code>n = getattr(obj, 'name', 'unknown')</code> | 451 | 109 | - | - |
18 | is-instance | <code>val = val1<br>if type(int_instance) is int:<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | <code>val = val1<br>if isinstance(int_instance, int):<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | 465 | 39 | - | - |
19 | file-context-manager | <code>file = open(file_path, 'r')<br>contents = file.read()<br>file.close()</code> | <code>with open(file_path, 'r') as f:<br>&nbsp;&nbsp;&nbsp;&nbsp;contents = f.read()</code> | 565 | 20 | - | - |
20 | any-func | <code>if (c1 or c2 or c3 or c4):<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val1<br>else:<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val2</code> | <code>if any((c1, c2, c3, c4)):<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val1<br>else:<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val2</code> | 434 | 136 | - | - |
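As a quick illustration of the LHS/RHS relationship in the table above, the two sides of CPAT 1 (numpy-sum) can be checked for behavioural equivalence on a sample input. This is only a sanity-check sketch; the input list is an arbitrary choice, not part of the dataset:

```python
import numpy as np

int_list = [4, 8, 15, 16, 23, 42]  # arbitrary sample input

# LHS: explicit accumulation loop
count_lhs = 0
for i in int_list:
    count_lhs = count_lhs + i

# RHS: the vectorised replacement proposed by the CPAT
count_rhs = np.sum(int_list)

assert count_lhs == count_rhs == 108
```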
RQ1
In RQ1, we quantitatively assess the ability of LLMs to generate variants.
RQ2
In RQ2, we quantitatively assess the ability of LLMs to generate test cases.
RQ3
In RQ3, we identify the best-performing parameters for generating variants with GPT-3.5. Below, we provide the oracle used to make these decisions.
Each CSV file linked below contains these four columns:
- ‘variant’: The variant generated by the LLM.
- ‘temperature-iterations’: The temperature and iteration settings used to generate the variant.
- ‘useful’: True/False value indicating whether a real developer would write such a variant.
- ‘applicable’: True/False value indicating whether the variant aligns with the intent of the CPAT and is also ‘useful’.
Data:
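As an illustration, one way to aggregate these CSVs is to count useful and applicable variants per temperature/iterations setting. This is a sketch under two assumptions: the file name `oracle.csv` is a placeholder for one of the linked files, and the ‘useful’/‘applicable’ columns hold the strings "True"/"False":

```python
import csv
from collections import defaultdict

stats = defaultdict(lambda: {"total": 0, "useful": 0, "applicable": 0})

# Placeholder path; substitute any of the CSV files linked above.
with open("oracle.csv", newline="") as f:
    for row in csv.DictReader(f):
        setting = row["temperature-iterations"]
        stats[setting]["total"] += 1
        stats[setting]["useful"] += row["useful"].strip().lower() == "true"
        stats[setting]["applicable"] += row["applicable"].strip().lower() == "true"

for setting, s in sorted(stats.items()):
    print(f"{setting}: {s['useful']}/{s['total']} useful, {s['applicable']}/{s['total']} applicable")
```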
RQ4
In RQ4, we identify the best-performing parameters for generating test cases with GPT-3.5.
The links below contain data in the form of a JSON list. Each element of the list is a JSON object representing a test case. Here is an example:
```json
[
  {
    "init": "int_list=[]",
    "assertion": "assert count == 0"
  },
  ...
]
```
The key “init” contains a piece of code to initialise the input variables. The key “assertion” contains assertion statements to validate the correctness of the variant.
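For illustration, such a test case can be run against a candidate variant by executing the "init" code, then the variant, then the "assertion" in a shared namespace. The helper below is a minimal sketch, not the artifact's implementation; the example reuses the sample test case above with a simple variant:

```python
def run_test_case(variant: str, test_case: dict) -> bool:
    """Execute the init code, the candidate variant, and the assertion in one namespace."""
    namespace = {}
    try:
        exec(test_case["init"], namespace)       # set up the input variables
        exec(variant, namespace)                 # run the candidate variant
        exec(test_case["assertion"], namespace)  # raises AssertionError if the check fails
        return True
    except AssertionError:
        return False  # other exceptions (e.g. a crashing variant) are not handled here

# Example: the sample test case above against a simple variant.
print(run_test_case("count = sum(int_list)",
                    {"init": "int_list=[]", "assertion": "assert count == 0"}))  # True
```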
Handwritten test cases can be found here. These tests were used to benchmark the performance of LLMs.
Test cases generated by GPT-3.5
Test cases generated by GPT-4
3. Patch Submission
We submitted refactoring patches to open-source projects. Below is the status of the pull requests we submitted.
Repository | Pull Request Link | Status |
---|---|---|
lucidrains/audiolm-pytorch | pull/228 | Approved |
undertheseanlp/underthesea | pull/713 | Approved |
mlcommons/GaNDLF | pull/719 | Approved |
matciotola/Z-PNN | pull/4 | Approved |
alexandra-chron/hierarchical-domain-adaptation | pull/7 | Approved |
chrhenning/hypnettorch | pull/7 | Approved |
Beyond-ML-Labs/BeyondML | pull/43 | Approved |
StanleyLsx/entity_extractor_by_pointer | pull/8 | Approved |
ChrisWu1997/DeepCAD | pull/16 | Approved |
githubharald/SimpleHTR | pull/164 | Approved |
NeuroTorch/NeuroTorch | pull/140 | Approved |
nod-ai/SHARK | pull/1817 | Approved |
Spico197/DocEE | pull/67 | Approved |
alteryx/featuretools | pull/2607 | Approved |
akkana/scripts | pull/27 | Approved |
IDEA-Research/detrex | pull/305 | Approved |
microsoft/DeepSpeed | pull/4262 | Approved |
EdisonLeeeee/GreatX | pull/13 | Approved |
shibing624/similarities | pull/15 | Approved |
artitw/text2text | pull/42 | Approved |
microsoft/archai | pull/245 | Approved |
autonomousvision/unimatch | pull/33 | Approved |
AlexsLemonade/refinebio | pull/3369 | Approved |
airaria/TextPruner | pull/18 | Approved |
IBM/inFairness | pull/68 | Approved |
opendr-eu/opendr | pull/455 | Approved |
mit-han-lab/proxylessnas | pull/10 | Approved |
BYU-PRISM/GEKKO | pull/168 | Approved |
TheAlgorithms/Python | pull/8987 | Approved |
SPFlow/SPFlow | pull/133 | Approved |
nltk/nltk | pull/3183 | Open |
pytorch/audio | pull/3576 | Open |
pytorch/tutorials | pull/2547 | Open |
pytorch/torchrec | pull/1373 | Open |
microsoft/MMdnn | pull/945 | Open |
NVIDIA-Merlin/NVTabular | pull/1861 | Open |
dmlc/dgl | pull/6285 | Open |
xlang-ai/UnifiedSKG | pull/40 | Closed |
netsharecmu/NetShare | pull/32 | Closed |
keras-team/keras | pull/18360 | Closed |
AIRI-Institute/Probing_framework | pull/131 | Approved |
TorchSSL/TorchSSL | pull/72 | Open |
DerrickXuNu/OpenCOOD | pull/107 | Open |
adalca/neurite | pull/74 | Open |
4. Supplemental Plots
Supplemental plots can be found here.
We provide additional plots for Figure 5 and Figure 7 in the paper.