PyCraft Replication Package

This site contains information about the tool, the dataset, and the plots for PyCraft, a tool submitted to FSE 2024.

Table of Contents

Below, we present four supplementary materials:

  1. Tool
  2. Dataset
  3. Patch Submission
  4. Supplemental Plots

1. Tool

The tool, along with installation and usage instructions, can be found here.

2. Dataset

Table 1

The table below adds detail to Table 1 in the paper. Each entry lists the CPAT number and name, the LHS (code before the transformation), the RHS (code after the transformation), and the variant counts (total/correct/useful/applicable).

Each number inside the table links to a JSON list containing the corresponding data. Each element in the JSON list is a piece of code, in the form of a string.

Here is a sample JSON list:

[
 "count = 0\nfor i in int_list:\n    count += i",
 "import numpy as np\ncount = np.sum(int_list)",
 "count = sum(int_list)",
 .
 .
 .
]
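
For reference, a downloaded variant list could be inspected with a few lines of Python. This is only a sketch; the file name variants.json is a placeholder for whichever list you fetched from the table.

import json

# Placeholder path: substitute any JSON list downloaded from the table below.
with open("variants.json") as fp:
    variants = json.load(fp)

# Each element is one LLM-generated variant as a Python source string.
print(len(variants), "variants")
print(variants[0])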
CPAT 1: numpy-sum
LHS:
    count = 0
    for i in int_list:
        count = count + i
RHS:
    import numpy as np
    count = np.sum(int_list)
Variants (total/correct/useful/applicable): 1185/291/83/50

CPAT 2: dict-update
LHS:
    for k, v in add_dict.items():
        d[k] = v
RHS:
    d.update(add_dict)
Variants (total/correct/useful/applicable): 1201/478/119/110

CPAT 3: set-intersection
LHS:
    common = []
    for i in l1:
        if i in l2 and i not in common:
            common.append(i)
RHS:
    common = list(set(l1).intersection(l2))
Variants (total/correct/useful/applicable): 782/287/107/66

CPAT 4: string-join
LHS:
    string = "["
    for idx, item in enumerate(values):
        if idx != 0:
            string += ", "
        string += item
    string += "]"
RHS:
    string = "[" + ", ".join(values) + "]"
Variants (total/correct/useful/applicable): 285/101/20/10

CPAT 5: dict-setdefault
LHS:
    d = {}
    for i in array:
        if i in d:
            d[i].append(f(i))
        else:
            d[i] = [f(i)]
RHS:
    d = {}
    for i in array:
        d.setdefault(i, []).append(f(i))
Variants (total/correct/useful/applicable): 1265/416/150/75

CPAT 6: collections-counter
LHS:
    counts = {}
    for i in iterable:
        if i not in counts:
            counts[i] = 0
        counts[i] += 1
RHS:
    from collections import Counter
    counts = Counter(iterable)
Variants (total/correct/useful/applicable): 927/425/202/85

CPAT 7: numpy-cumsum
LHS:
    cum_arr = []
    for i in range(len(array)):
        cum_arr.append(sum(array[:i+1]))
RHS:
    import numpy as np
    cum_arr = np.cumsum(array)
Variants (total/correct/useful/applicable): 1223/290/95/80

CPAT 8: numpy-dot
LHS:
    dot_prod = 0
    for i in range(len(arr1)):
        dot_prod += arr1[i] * arr2[i]
RHS:
    import numpy as np
    dot_prod = np.dot(arr1, arr2)
Variants (total/correct/useful/applicable): 177/28/26/24

CPAT 9: numpy-add
LHS:
    result = []
    for i in range(len(array1)):
        result.append(array1[i] + array2[i])
RHS:
    import numpy as np
    result = np.add(array1, array2)
Variants (total/correct/useful/applicable): 64/11/11/9

CPAT 10: list-comprehension
LHS:
    t = []
    for i in range(len(elem)):
        if cond(elem[i]):
            t.append(elem[i])
RHS:
    t = [elem[i] for i in range(len(elem)) if cond(elem[i])]
Variants (total/correct/useful/applicable): 955/453/226/71

CPAT 11: context-manager-tempfile
LHS:
    import tempfile
    temp_dir = tempfile.TemporaryDirectory()
    file = temp_dir.name + "/features.json"
    f = open(file, 'w')
    f.write(content)
    temp_dir.cleanup()
RHS:
    with tempfile.TemporaryDirectory() as temp_dir:
        file = temp_dir + "/features.json"
        f = open(file, 'w')
        f.write(content)
Variants (total/correct/useful/applicable): 524/150/-/-

CPAT 12: np-mean
LHS:
    mean = sum(arr1)/len(arr1)
RHS:
    import numpy as np
    mean = np.mean(arr1)
Variants (total/correct/useful/applicable): 482/97/-/-

CPAT 13: np-multidot
LHS:
    import numpy as np
    result = np.dot(np.dot(arr1, arr2), arr3)
RHS:
    import numpy as np
    result = np.linalg.multi_dot([arr1, arr2, arr3])
Variants (total/correct/useful/applicable): 439/0/-/-

CPAT 14: assign-multiple-targets
LHS:
    a = x
    b = y
    c = z
RHS:
    a, b, c = x, y, z
Variants (total/correct/useful/applicable): 437/11/-/-

CPAT 15: swapping-variables
LHS:
    temp = a
    a = b
    b = temp
RHS:
    a, b = b, a
Variants (total/correct/useful/applicable): 385/77/-/-

CPAT 16: non-zero-compare
LHS:
    val = val1
    if (number_value != 0):
        val = val2
RHS:
    val = val1
    if (bool(number_value)):
        val = val2
Variants (total/correct/useful/applicable): 406/69/-/-

CPAT 17: getattr
LHS:
    try:
        n = obj.name
    except:
        n = "unknown"
RHS:
    n = getattr(obj, 'name', 'unknown')
Variants (total/correct/useful/applicable): 451/109/-/-

CPAT 18: is-instance
LHS:
    val = val1
    if type(int_instance) is int:
        val = val2
RHS:
    val = val1
    if isinstance(int_instance, int):
        val = val2
Variants (total/correct/useful/applicable): 465/39/-/-

CPAT 19: file-context-manager
LHS:
    file = open(file_path, 'r')
    contents = file.read()
    file.close()
RHS:
    with open(file_path, 'r') as f:
        contents = f.read()
Variants (total/correct/useful/applicable): 565/20/-/-

CPAT 20: any-func
LHS:
    if (c1 or c2 or c3 or c4):
        value = val1
    else:
        value = val2
RHS:
    if any((c1, c2, c3, c4)):
        value = val1
    else:
        value = val2
Variants (total/correct/useful/applicable): 434/136/-/-

RQ1

In RQ1, we quantitatively assess the ability of LLMs to generate variants.

RQ2

In RQ2, we quantitatively assess the ability of LLMs to generate test cases.

RQ3

In RQ3, we identify the best-performing parameters for generating variants with GPT-3.5. Below, we provide the oracle used to make these decisions.

Each CSV file linked below contains these four columns:

  1. ‘variant’: The variant generated by the LLM.
  2. ‘temperature-iterations’: The temperature and iterations settings used to generate the variant.
  3. ‘useful’: True/False value indicating whether a real developer would write such a variant.
  4. ‘applicable’: True/False value indicating whether the variant aligns with the intent of the CPAT and is also ‘useful’.
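
As a rough sketch (not part of the replication package itself), a CSV file could be summarised as follows; the file name rq3_oracle.csv and the assumption that the boolean columns store the strings "True"/"False" are placeholders.

import csv
from collections import Counter

# Placeholder path: substitute any CSV file linked under "Data" below.
with open("rq3_oracle.csv", newline="") as fp:
    rows = list(csv.DictReader(fp))

# Count useful variants per temperature-iterations setting
# (assumes the 'useful' column is stored as the string "True"/"False").
useful_per_setting = Counter(
    row["temperature-iterations"] for row in rows if row["useful"] == "True"
)
print(useful_per_setting.most_common())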

Data:

RQ4

In RQ4, we identify the best-performing parameters for generating test cases with GPT-3.5.

The links below contain data in the form of a JSON list. Each element of the list is a JSON object that represents a test case. Here is an example:

[
  {
    "init": "int_list=[]",
    "assertion": "assert count == 0"
  },
  .
  .
  .
]  

The key “init” contains a piece of code to initialise the input variables. The key “assertion” contains assertion statements to validate the correctness of the variant.
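
To illustrate how such a test case exercises a variant, the sketch below runs the “init” code, a candidate variant, and the “assertion” in one shared namespace. This is only an illustration using the numpy-sum example from above, not the harness used in the paper.

test_case = {"init": "int_list=[]", "assertion": "assert count == 0"}
variant = "count = sum(int_list)"  # one of the numpy-sum variants shown earlier

# Run init, variant, and assertion in a single namespace; any exception
# (e.g. AssertionError) means the variant fails this test case.
namespace = {}
exec(test_case["init"], namespace)
exec(variant, namespace)
exec(test_case["assertion"], namespace)
print("variant passed this test case")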

Handwritten test cases can be found here. These tests were used to benchmark the performance of LLMs.

Test cases generated by GPT-3.5

Test cases generated by GPT-4

3. Patch Submission

We submitted refactoring patches to open-source projects. Below is the status of the pull requests we submitted.

Repository Pull Request Link Status
lucidrains/audiolm-pytorch pull/228 Approved
undertheseanlp/underthesea pull/713 Approved
mlcommons/GaNDLF pull/719 Approved
matciotola/Z-PNN pull/4 Approved
alexandra-chron/hierarchical-domain-adaptation pull/7 Approved
chrhenning/hypnettorch pull/7 Approved
Beyond-ML-Labs/BeyondML pull/43 Approved
StanleyLsx/entity_extractor_by_pointer pull/8 Approved
ChrisWu1997/DeepCAD pull/16 Approved
githubharald/SimpleHTR pull/164 Approved
NeuroTorch/NeuroTorch pull/140 Approved
nod-ai/SHARK pull/1817 Approved
Spico197/DocEE pull/67 Approved
alteryx/featuretools pull/2607 Approved
akkana/scripts pull/27 Approved
IDEA-Research/detrex pull/305 Approved
microsoft/DeepSpeed pull/4262 Approved
EdisonLeeeee/GreatX pull/13 Approved
shibing624/similarities pull/15 Approved
artitw/text2text pull/42 Approved
microsoft/archai pull/245 Approved
autonomousvision/unimatch pull/33 Approved
AlexsLemonade/refinebio pull/3369 Approved
airaria/TextPruner pull/18 Approved
IBM/inFairness pull/68 Approved
opendr-eu/opendr pull/455 Approved
mit-han-lab/proxylessnas pull/10 Approved
BYU-PRISM/GEKKO pull/168 Approved
TheAlgorithms/Python pull/8987 Approved
SPFlow/SPFlow pull/133 Approved
nltk/nltk pull/3183 Open
pytorch/audio pull/3576 Open
pytorch/tutorials pull/2547 Open
pytorch/torchrec pull/1373 Open
microsoft/MMdnn pull/945 Open
NVIDIA-Merlin/NVTabular pull/1861 Open
dmlc/dgl pull/6285 Open
xlang-ai/UnifiedSKG pull/40 Closed
netsharecmu/NetShare pull/32 Closed
keras-team/keras pull/18360 Closed
AIRI-Institute/Probing_framework pull/131 Approved
TorchSSL/TorchSSL pull/72 Open
DerrickXuNu/OpenCOOD pull/107 Open
adalca/neurite pull/74 Open

4. Supplemental Plots

Supplemental plots can be found here.

We provide additional plots for Figure 5 and Figure 7 in the paper.