Table of Contents
Below, we present four supplementary materials:
1. Tool
The tool, along with installation and usage instructions, can be found here.
2. Dataset
Table 1
The table below expands on Table 1 of the paper.
Please scroll right to view the entire table.
Each number inside the table links to a JSON list containing the corresponding data. Each element of the list is a piece of code, represented as a string.
Here is a sample JSON list:
```json
[
  "count = 0\nfor i in int_list:\n count += i",
  "import numpy as np\ncount = np.sum(int_list)",
  "count = sum(int_list)",
  ...
]
```
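For example, such a list can be loaded with the standard `json` module and inspected. This is a minimal sketch; the file name `variants.json` is a placeholder for whichever file a table entry links to:

```python
import json

# Placeholder path; each count in the table below links to a JSON file of this shape.
with open("variants.json") as f:
    variants = json.load(f)  # a list of code snippets, each stored as a single string

print(len(variants))  # number of variants in the file
print(variants[0])    # first snippet, with "\n" rendered as real line breaks
```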
CPAT Number | CPAT Name | LHS | RHS | Variants (Total) | Variants (Correct) | Variants (Useful) | Variants (Applicable) |
---|---|---|---|---|---|---|---|
1 | numpy-sum | <code>count = 0<br>for i in int_list:<br>&nbsp;&nbsp;&nbsp;&nbsp;count = count + i</code> | <code>import numpy as np<br>count = np.sum(int_list)</code> | 1185 | 291 | 83 | 50 |
2 | dict-update | <code>for k, v in add_dict.items():<br>&nbsp;&nbsp;&nbsp;&nbsp;d[k] = v</code> | <code>d.update(add_dict)</code> | 1201 | 478 | 119 | 110 |
3 | set-intersection | <code>common = []<br>for i in l1:<br>&nbsp;&nbsp;&nbsp;&nbsp;if i in l2 and i not in common:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;common.append(i)</code> | <code>common = list(set(l1).intersection(l2))</code> | 782 | 287 | 107 | 66 |
4 | string-join | <code>string = "["<br>for idx, item in enumerate(values):<br>&nbsp;&nbsp;&nbsp;&nbsp;if idx != 0:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;string += ", "<br>&nbsp;&nbsp;&nbsp;&nbsp;string += item<br>string += "]"</code> | <code>string = "[" + ", ".join(values) + "]"</code> | 285 | 101 | 20 | 10 |
5 | dict-setdefault | <code>d = {}<br>for i in array:<br>&nbsp;&nbsp;&nbsp;&nbsp;if i in d:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d[i].append(f(i))<br>&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d[i] = [f(i)]</code> | <code>d = {}<br>for i in array:<br>&nbsp;&nbsp;&nbsp;&nbsp;d.setdefault(i, []).append(f(i))</code> | 1265 | 416 | 150 | 75 |
6 | collections-counter | <code>counts = {}<br>for i in iterable:<br>&nbsp;&nbsp;&nbsp;&nbsp;if i not in counts:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;counts[i] = 0<br>&nbsp;&nbsp;&nbsp;&nbsp;counts[i] += 1</code> | <code>from collections import Counter<br>counts = Counter(iterable)</code> | 927 | 425 | 202 | 85 |
7 | numpy-cumsum | <code>cum_arr = []<br>for i in range(len(array)):<br>&nbsp;&nbsp;&nbsp;&nbsp;cum_arr.append(sum(array[:i+1]))</code> | <code>import numpy as np<br>cum_arr = np.cumsum(array)</code> | 1223 | 290 | 95 | 80 |
8 | numpy-dot | <code>dot_prod = 0<br>for i in range(len(arr1)):<br>&nbsp;&nbsp;&nbsp;&nbsp;dot_prod += arr1[i] * arr2[i]</code> | <code>import numpy as np<br>dot_prod = np.dot(arr1, arr2)</code> | 177 | 28 | 26 | 24 |
9 | numpy-add | <code>result = []<br>for i in range(len(array1)):<br>&nbsp;&nbsp;&nbsp;&nbsp;result.append(array1[i] + array2[i])</code> | <code>import numpy as np<br>result = np.add(array1, array2)</code> | 64 | 11 | 11 | 9 |
10 | list-comprehension | <code>t = []<br>for i in range(len(elem)):<br>&nbsp;&nbsp;&nbsp;&nbsp;if cond(elem[i]):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t.append(elem[i])</code> | <code>t = [elem[i] for i in range(len(elem)) if cond(elem[i])]</code> | 955 | 453 | 226 | 71 |
11 | context-manager-tempfile | <code>import tempfile<br>temp_dir = tempfile.TemporaryDirectory()<br>file = temp_dir.name + "/features.json"<br>f = open(file, 'w')<br>f.write(content)<br>temp_dir.cleanup()</code> | <code>with tempfile.TemporaryDirectory() as temp_dir:<br>&nbsp;&nbsp;&nbsp;&nbsp;file = temp_dir + "/features.json"<br>&nbsp;&nbsp;&nbsp;&nbsp;f = open(file, 'w')<br>&nbsp;&nbsp;&nbsp;&nbsp;f.write(content)</code> | 524 | 150 | - | - |
12 | np-mean | <code>mean = sum(arr1)/len(arr1)</code> | <code>import numpy as np<br>mean = np.mean(arr1)</code> | 482 | 97 | - | - |
13 | np-multidot | <code>import numpy as np<br>result = np.dot(np.dot(arr1, arr2), arr3)</code> | <code>import numpy as np<br>result = np.linalg.multi_dot([arr1, arr2, arr3])</code> | 439 | 0 | - | - |
14 | assign-multiple-targets | <code>a = x<br>b = y<br>c = z</code> | <code>a, b, c = x, y, z</code> | 437 | 11 | - | - |
15 | swapping-variables | <code>temp = a<br>a = b<br>b = temp</code> | <code>a, b = b, a</code> | 385 | 77 | - | - |
16 | non-zero-compare | <code>val = val1<br>if (number_value != 0):<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | <code>val = val1<br>if (bool(number_value)):<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | 406 | 69 | - | - |
17 | getattr | <code>try:<br>&nbsp;&nbsp;&nbsp;&nbsp;n = obj.name<br>except:<br>&nbsp;&nbsp;&nbsp;&nbsp;n = "unknown"</code> | <code>n = getattr(obj, 'name', 'unknown')</code> | 451 | 109 | - | - |
18 | is-instance | <code>val = val1<br>if type(int_instance) is int:<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | <code>val = val1<br>if isinstance(int_instance, int):<br>&nbsp;&nbsp;&nbsp;&nbsp;val = val2</code> | 465 | 39 | - | - |
19 | file-context-manager | <code>file = open(file_path, 'r')<br>contents = file.read()<br>file.close()</code> | <code>with open(file_path, 'r') as f:<br>&nbsp;&nbsp;&nbsp;&nbsp;contents = f.read()</code> | 565 | 20 | - | - |
20 | any-func | <code>if (c1 or c2 or c3 or c4):<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val1<br>else:<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val2</code> | <code>if any((c1, c2, c3, c4)):<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val1<br>else:<br>&nbsp;&nbsp;&nbsp;&nbsp;value = val2</code> | 434 | 136 | - | - |
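As a quick illustration of the LHS/RHS relationship in the table above, the two sides of CPAT 1 (numpy-sum) can be checked for behavioural equivalence on a sample input. This is only a sanity-check sketch; the input list is an arbitrary choice, not part of the dataset:

```python
import numpy as np

int_list = [4, 8, 15, 16, 23, 42]  # arbitrary sample input

# LHS: explicit accumulation loop
count_lhs = 0
for i in int_list:
    count_lhs = count_lhs + i

# RHS: the vectorised replacement proposed by the CPAT
count_rhs = np.sum(int_list)

assert count_lhs == count_rhs == 108
```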
RQ1
In RQ1, we quantitatively assess the ability of LLMs to generate variants.
RQ2
In RQ2, we quantitatively assess the ability of LLMs to generate test cases.
RQ3
In RQ3, we identify the best-performing parameters for generating variants with GPT-3.5. Below, we provide the oracle used to make these decisions.
Each CSV file linked below contains these four columns:
- ‘variant’: The variant generated by the LLM.
- ‘temperature-iterations’: The temperature and iteration settings used to generate the variant.
- ‘useful’: True/False value indicating whether a real developer would write such a variant.
- ‘applicable’: True/False value indicating whether the variant aligns with the intent of the CPAT and is also ‘useful’.
Data:
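As an illustration, one way to aggregate these CSVs is to count useful and applicable variants per temperature/iterations setting. This is a sketch under two assumptions: the file name `oracle.csv` is a placeholder for one of the linked files, and the ‘useful’/‘applicable’ columns hold the strings "True"/"False":

```python
import csv
from collections import defaultdict

stats = defaultdict(lambda: {"total": 0, "useful": 0, "applicable": 0})

# Placeholder path; substitute any of the CSV files linked above.
with open("oracle.csv", newline="") as f:
    for row in csv.DictReader(f):
        setting = row["temperature-iterations"]
        stats[setting]["total"] += 1
        stats[setting]["useful"] += row["useful"].strip().lower() == "true"
        stats[setting]["applicable"] += row["applicable"].strip().lower() == "true"

for setting, s in sorted(stats.items()):
    print(f"{setting}: {s['useful']}/{s['total']} useful, {s['applicable']}/{s['total']} applicable")
```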
RQ4
In RQ4, we identify the best-performing parameters for generating test cases with GPT-3.5.
The links below contain data in the form of a JSON list. Each element of the list is a JSON object representing a test case. Here is an example:
```json
[
  {
    "init": "int_list=[]",
    "assertion": "assert count == 0"
  },
  ...
]
```
The key “init” contains a piece of code to initialise the input variables. The key “assertion” contains assertion statements to validate the correctness of the variant.
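For illustration, such a test case can be run against a candidate variant by executing the "init" code, then the variant, then the "assertion" in a shared namespace. The helper below is a minimal sketch, not the artifact's implementation; the example reuses the sample test case above with a simple variant:

```python
def run_test_case(variant: str, test_case: dict) -> bool:
    """Execute the init code, the candidate variant, and the assertion in one namespace."""
    namespace = {}
    try:
        exec(test_case["init"], namespace)       # set up the input variables
        exec(variant, namespace)                 # run the candidate variant
        exec(test_case["assertion"], namespace)  # raises AssertionError if the check fails
        return True
    except AssertionError:
        return False  # other exceptions (e.g. a crashing variant) are not handled here

# Example: the sample test case above against a simple variant.
print(run_test_case("count = sum(int_list)",
                    {"init": "int_list=[]", "assertion": "assert count == 0"}))  # True
```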
Handwritten test cases can be found here. These tests were used to benchmark the performance of LLMs.
Test cases generated by GPT-3.5
Test cases generated by GPT-4
3. Patch Submission
We submitted refactoring patches to open-source projects. Below is the status of the pull requests we submitted.
Repository | Pull Request Link | Status |
---|---|---|
lucidrains/audiolm-pytorch | pull/228 | Approved |
undertheseanlp/underthesea | pull/713 | Approved |
mlcommons/GaNDLF | pull/719 | Approved |
matciotola/Z-PNN | pull/4 | Approved |
alexandra-chron/hierarchical-domain-adaptation | pull/7 | Approved |
chrhenning/hypnettorch | pull/7 | Approved |
Beyond-ML-Labs/BeyondML | pull/43 | Approved |
StanleyLsx/entity_extractor_by_pointer | pull/8 | Approved |
ChrisWu1997/DeepCAD | pull/16 | Approved |
githubharald/SimpleHTR | pull/164 | Approved |
NeuroTorch/NeuroTorch | pull/140 | Approved |
nod-ai/SHARK | pull/1817 | Approved |
Spico197/DocEE | pull/67 | Approved |
alteryx/featuretools | pull/2607 | Approved |
akkana/scripts | pull/27 | Approved |
IDEA-Research/detrex | pull/305 | Approved |
microsoft/DeepSpeed | pull/4262 | Approved |
EdisonLeeeee/GreatX | pull/13 | Approved |
shibing624/similarities | pull/15 | Approved |
artitw/text2text | pull/42 | Approved |
microsoft/archai | pull/245 | Approved |
autonomousvision/unimatch | pull/33 | Approved |
AlexsLemonade/refinebio | pull/3369 | Approved |
airaria/TextPruner | pull/18 | Approved |
IBM/inFairness | pull/68 | Approved |
opendr-eu/opendr | pull/455 | Approved |
mit-han-lab/proxylessnas | pull/10 | Approved |
BYU-PRISM/GEKKO | pull/168 | Approved |
TheAlgorithms/Python | pull/8987 | Approved |
SPFlow/SPFlow | pull/133 | Approved |
nltk/nltk | pull/3183 | Open |
pytorch/audio | pull/3576 | Open |
pytorch/tutorials | pull/2547 | Open |
pytorch/torchrec | pull/1373 | Open |
microsoft/MMdnn | pull/945 | Open |
NVIDIA-Merlin/NVTabular | pull/1861 | Open |
dmlc/dgl | pull/6285 | Open |
xlang-ai/UnifiedSKG | pull/40 | Closed |
netsharecmu/NetShare | pull/32 | Closed |
keras-team/keras | pull/18360 | Closed |
AIRI-Institute/Probing_framework | pull/131 | Approved |
TorchSSL/TorchSSL | pull/72 | Open |
DerrickXuNu/OpenCOOD | pull/107 | Open |
adalca/neurite | pull/74 | Open |
4. Supplemental Plots
Supplemental plots can be found here.
We provide additional plots for Figure 5 and Figure 7 in the paper.