ChatGPT: fluff or not? SpecBDD 19/04/2023
Now that a couple of months have passed since its overhyped launch, surely ChatGPT has found some use cases where it can be of actual use. Or is it all fluff? Let's find out.
In my quest to find a use for ChatGPT in my day-to-day developer activity, I've finally found a helpful scenario!
Let me walk you through my findings about ChatGPT and its "Spec Gen" capabilities, so we can learn a thing or two.
Spec Gen
When tasked with writing a feature, I first write a unit test that specifies the behaviour, for example:
<?php

namespace spec\Vendor\Project;

use Vendor\Project\Service\Filesystem;
use Vendor\Project\File;
use PhpSpec\ObjectBehavior;

class TextEditorSpec extends ObjectBehavior
{
    private const FILENAME = '/tmp/file.txt';
    private const FORCE_FILE_CREATION = true;

    function let(Filesystem $filesystem): void
    {
        $this->beConstructedWith($filesystem);
    }

    function it_creates_new_files(File $file, Filesystem $filesystem): void
    {
        $filesystem->exists(self::FILENAME)->willReturn(false);
        $filesystem->create(self::FILENAME)->willReturn($file);
        $this->open(self::FILENAME, self::FORCE_FILE_CREATION)->shouldBe($file);
    }
}
When using phpspec, especially with the spec-gen extension, running the test will bootstrap the corresponding class for me:
<?php

namespace Vendor\Project;

use Vendor\Project\Service\Filesystem;

class TextEditor
{
    private $filesystem;

    public function __construct(Filesystem $filesystem)
    {
        $this->filesystem = $filesystem;
    }

    public function open(string $argument1, bool $argument2)
    {
    }
}
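In case you want to try this at home, here's a sketch of the required setup (double-check it against the phpspec and memio/spec-gen documentation): install both packages with composer require --dev phpspec/phpspec memio/spec-gen, then register the extension in phpspec.yml:

# phpspec.yml
extensions:
    Memio\SpecGen\MemioSpecGenExtension: ~

Running vendor/bin/phpspec run will then offer to generate the missing TextEditor class and the open() method stub shown above.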
That's pretty good, but it could be better: what if it were able to generate the complete code?
GPT Gen
Let's forget about phpspec and use PHPUnit instead, still with the Prophecy mocking framework.
Let's ask ChatGPT. Here's the priming prompt:
USER: Act as an expert PHP developer, who's been writing high quality API code for 20 years. Acknowledge with "ok".
Because ChatGPT's Large Language Model (LLM) can generate a wide range of text (conversations, essays, movie scripts, poetry, etc.), assigning it a role helps it narrow down its context, improving its accuracy on the task we want it to complete: writing PHP code at a professional level (the persona/role prompt engineering technique).
We also need to consider ChatGPT's Goldfish Memory issue: it only takes the last ~4000 tokens into account when generating its response, so asking it to give us short replies helps us save some space.
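To get a feel for that budget, here's a back-of-the-envelope estimator, assuming the common "1 token ≈ 4 characters of English text" rule of thumb (an approximation, not a real tokenizer):

<?php

// Rough token estimate: ~4 characters of English text per token.
// Ballpark only; a real tokenizer (e.g. tiktoken) would be exact.
function estimateTokens(string $text): int
{
    return (int) ceil(mb_strlen($text) / 4);
}

$prompt = 'Act as an expert PHP developer, who\'s been writing high quality API code for 20 years. Acknowledge with "ok".';
printf("~%d tokens out of a ~4000 token budget\n", estimateTokens($prompt));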
ChatGPT: ok
LLMs perform better when given some examples, so before asking ChatGPT to generate code we might want to give it some of our code samples (the X-Shot prompt engineering technique).
Since coding also requires some level of logic, it is helpful to include reasoning steps (the Chain of Thought prompt engineering technique).
Similarly, asking the LLM to first generate some topic-related knowledge, and then reusing it in further prompts, will also increase our success rate (the Generated Knowledge prompt engineering technique).
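As an aside, nothing here depends on the ChatGPT UI: the same conversation can be scripted against OpenAI's chat completions API. Here's a minimal sketch (message contents are abbreviated, and it assumes an OPENAI_API_KEY environment variable):

<?php

// Replay the priming + example prompts programmatically.
$messages = [
    ['role' => 'user', 'content' => 'Act as an expert PHP developer... Acknowledge with "ok".'],
    ['role' => 'assistant', 'content' => 'ok'],
    ['role' => 'user', 'content' => "Here's an example of a test class from the project you're working on:\n..."],
];

$ch = curl_init('https://api.openai.com/v1/chat/completions');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'Content-Type: application/json',
        'Authorization: Bearer ' . getenv('OPENAI_API_KEY'),
    ],
    CURLOPT_POSTFIELDS => json_encode([
        'model' => 'gpt-3.5-turbo',
        'messages' => $messages,
    ]),
]);
$response = json_decode(curl_exec($ch), true);

echo $response['choices'][0]['message']['content'];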
With this in mind, here's the second prompt that provides a test class example:
USER: Here's an example of a test class from the project you're working on:
<?php

declare(strict_types=1);

namespace tests\Ssc\Btlr\Cht\Message\Templates\Prompts;

use Ssc\Btlr\App\Filesystem\ReadFile;
use Ssc\Btlr\App\Template\Replace;
use Ssc\Btlr\Cht\Message\Logs\Type;
use Ssc\Btlr\Cht\Message\Logs\WriteLog;
use Ssc\Btlr\Cht\Message\Templates\Prompts\Template;
use tests\Ssc\Btlr\AppTest\BtlrServiceTestCase;

class TemplateTest extends BtlrServiceTestCase
{
    /**
     * @test
     */
    public function it_creates_prompt_from_template(): void
    {
        // Fixtures
        $thoseParameters = [
            'last_messages' => 'USER (1968-04-02T18:40:23+00:00): Write code for me, please',
        ];
        $forType = Type::AUGMENTED_PROMPT;
        $withConfig = [
            'chunk_memory_size' => 15,
            'llm_engine' => 'chatgpt-gpt-3.5-turbo',
            'logs_filename' => './var/cht/logs',
            'prompt_templates_filename' => './templates/cht/prompts',
        ];
        $template = "LAST MESSAGES:\n%last_messages%\nBTLR:\n";
        $prompt = "LAST MESSAGES:\n{$thoseParameters['last_messages']}\nBTLR:\n";
        $data = [
            'entry' => $prompt,
        ];

        // Dummies
        $readFile = $this->prophesize(ReadFile::class);
        $replace = $this->prophesize(Replace::class);
        $writeLog = $this->prophesize(WriteLog::class);

        // Stubs & Mocks
        $readFile->in("{$withConfig['prompt_templates_filename']}/{$forType['name']}.txt")
            ->willReturn($template);
        $replace->in($template, $thoseParameters)
            ->willReturn($prompt);
        $writeLog->for($data, $forType, $withConfig)
            ->shouldBeCalled();

        // Assertion
        $template = new Template(
            $readFile->reveal(),
            $replace->reveal(),
            $writeLog->reveal(),
        );
        self::assertSame($prompt, $template->replace(
            $thoseParameters,
            $forType,
            $withConfig,
        ));
    }
}
USER: Acknowledge by describing how the use of fixtures, dummies, stubs, mocks and assertions allows us to make sure the tested code is behaving as expected. And note the use of the special BtlrServiceTestCase parent class for TemplateTest.
By the way, the code is from my Super Secret Code project Btlr.
Here's ChatGPT's response:
ChatGPT: In this test class, fixtures are used to provide the necessary data for testing the Template class. The parameters and configuration are defined in $thoseParameters and $withConfig respectively, and the template and prompt strings are defined in $template and $prompt.

Dummies are used to provide the interface of the ReadFile, Replace, and WriteLog classes, without providing any actual implementation, since these classes are not the focus of the test.

Stubs are used to replace the implementation of ReadFile and Replace classes, by returning a fixed value when their respective methods are called. This ensures that the expected values are used for testing the Template class.

Mocks are used to verify that the WriteLog class is called with the expected parameters. The shouldBeCalled() method ensures that the for() method of WriteLog class is actually called during the test.

Finally, the assertion compares the result returned by the replace() method of the Template class with the expected prompt string.

The BtlrServiceTestCase class is a special parent class that sets up a test environment for the Btlr service, which can include initializing dependencies, connecting to databases, and setting up configurations. This helps to ensure that tests are run in a consistent and isolated environment, and makes it easier to write tests for the Btlr service.
ChatGPT doesn't have access to the project and its code; it only knows what we've provided in the conversation so far. For example, it took a wild guess at what BtlrServiceTestCase does and got it wrong, but for our purpose this is good enough.
The third prompt is an example of the corresponding class:
USER: When tasked with generating the corresponding Template class, from the above TemplateTest test class, here's the code you wrote:
<?php

declare(strict_types=1);

namespace Ssc\Btlr\Cht\Message\Templates\Prompts;

use Ssc\Btlr\App\Filesystem\ReadFile;
use Ssc\Btlr\App\Template\Replace;
use Ssc\Btlr\Cht\Message\Logs\WriteLog;

class Template
{
    public function __construct(
        private ReadFile $readFile,
        private Replace $replace,
        private WriteLog $writeLog,
    ) {
    }

    public function replace(
        array $thoseParameters,
        array $forType,
        array $withConfig,
    ): string {
        $template = $this->readFile->in(
            "{$withConfig['prompt_templates_filename']}/{$forType['name']}.txt",
        );
        $prompt = $this->replace->in($template, $thoseParameters);
        $this->writeLog->for([
            'entry' => $prompt,
        ], $forType, $withConfig);

        return $prompt;
    }
}
USER: Acknowledge by describing how the test class got translated into its corresponding class.
To which ChatGPT replies:
ChatGPT: The TemplateTest test class served as a guide for implementing the Template class.

First, the constructor of Template class takes three dependencies, which are also injected in the TemplateTest class as dummies or mocks.

Next, the replace() method of Template class has the same signature as the test method it_creates_prompt_from_template(), with the addition of a return type declaration for the prompt string.

The implementation of replace() method in Template class uses the injected ReadFile instance to read the template string from the filesystem, using the $forType['name'] parameter to construct the file path. It then uses the injected Replace instance to replace placeholders in the template with values provided in the $thoseParameters array, resulting in a $prompt string.

Finally, the method calls the injected WriteLog instance with the $prompt, $forType, and $withConfig parameters, which logs the prompt for the given type and configuration.

Overall, the implementation of the Template class closely follows the structure of the TemplateTest class, and the use of dependency injection ensures that the class can be easily tested in isolation from its dependencies.
With ChatGPT noting that the corresponding class closely follows the structure of its test, it should now be ready.
Let's give it a new test class, and ask it to generate the corresponding code:
USER: Brilliant. Now, in the same coding style as TemplateTest and Template, you are tasked to write the corresponding code for the following ReplyTest test class:
<?php

declare(strict_types=1);

namespace tests\Ssc\Btlr\Cht\Message;

use Ssc\Btlr\Cht\Message\Logs\Type;
use Ssc\Btlr\Cht\Message\Logs\WriteLog;
use Ssc\Btlr\Cht\Message\Memory\Consolidate;
use Ssc\Btlr\Cht\Message\Reply;
use Ssc\Btlr\Cht\Message\Reply\Augment;
use Ssc\Btlr\Cht\Message\Reply\UsingLlm;
use tests\Ssc\Btlr\AppTest\BtlrServiceTestCase;

class ReplyTest extends BtlrServiceTestCase
{
    /**
     * @test
     */
    public function it_replies_to_user_prompt_using_llm(): void
    {
        // Fixtures
        $userPrompt = 'Write code for me, please';
        $withConfig = [
            'chunk_memory_size' => 15,
            'llm_engine' => 'chatgpt-gpt-3.5-turbo',
            'logs_filename' => './var/cht/logs',
            'prompt_templates_filename' => './templates/cht/prompts',
        ];
        $userPromptData = [
            'entry' => $userPrompt,
        ];
        $augmentedPrompt = "USER: {$userPrompt}\nBTLR:";
        $modelCompletion = "I'm sorry, dev. I'm afraid I can't do that.";
        $modelCompletionData = [
            'entry' => $modelCompletion,
            'llm_engine' => $withConfig['llm_engine'],
        ];

        // Dummies
        $augment = $this->prophesize(Augment::class);
        $consolidate = $this->prophesize(Consolidate::class);
        $usingLlm = $this->prophesize(UsingLlm::class);
        $writeLog = $this->prophesize(WriteLog::class);

        // Stubs & Mocks
        $writeLog->for($userPromptData, Type::USER_PROMPT, $withConfig)
            ->shouldBeCalled();
        $augment->the($userPrompt, $withConfig)
            ->willReturn($augmentedPrompt);
        $usingLlm->complete($augmentedPrompt)
            ->willReturn($modelCompletion);
        $writeLog->for($modelCompletionData, Type::MODEL_COMPLETION, $withConfig)
            ->shouldBeCalled();
        $consolidate->memories($withConfig)
            ->shouldBeCalled();

        // Assertion
        $reply = new Reply(
            $augment->reveal(),
            $consolidate->reveal(),
            $usingLlm->reveal(),
            $writeLog->reveal(),
        );
        self::assertSame($modelCompletion, $reply->to(
            $userPrompt,
            $withConfig,
        ));
    }
}
USER: Acknowledge by generating the code for the Reply class.
And here's what ChatGPT came up with:
ChatGPT: Sure thing! Here's the code for the Reply class based on the ReplyTest test class:
<?php

declare(strict_types=1);

namespace Ssc\Btlr\Cht\Message;

use Ssc\Btlr\Cht\Message\Logs\Type;
use Ssc\Btlr\Cht\Message\Logs\WriteLog;
use Ssc\Btlr\Cht\Message\Memory\Consolidate;
use Ssc\Btlr\Cht\Message\Reply\Augment;
use Ssc\Btlr\Cht\Message\Reply\UsingLlm;

class Reply
{
    public function __construct(
        private Augment $augment,
        private Consolidate $consolidate,
        private UsingLlm $usingLlm,
        private WriteLog $writeLog,
    ) {
    }

    public function to(string $userPrompt, array $withConfig): string
    {
        $userPromptData = [
            'entry' => $userPrompt,
        ];

        // Augment the user prompt
        $augmentedPrompt = $this->augment->the($userPrompt, $withConfig);

        // Use LLM to generate a model completion
        $modelCompletion = $this->usingLlm->complete($augmentedPrompt);

        // Log the user prompt and model completion
        $this->writeLog->for($userPromptData, Type::USER_PROMPT, $withConfig);
        $modelCompletionData = [
            'entry' => $modelCompletion,
            'llm_engine' => $withConfig['llm_engine'],
        ];
        $this->writeLog->for($modelCompletionData, Type::MODEL_COMPLETION, $withConfig);

        // Consolidate the memories
        $this->consolidate->memories($withConfig);

        return $modelCompletion;
    }
}
ChatGPT: This Reply class has a to method that takes a user prompt and a configuration array as input, and returns the model completion generated by LLM. The Reply class constructor takes in four dependencies: Augment, Consolidate, UsingLlm, and WriteLog, and stores them as private properties. The to method then uses these dependencies to augment the user prompt, generate the model completion, log both the user prompt and the model completion, and consolidate the memories.
Which is pretty much spot on.
I'd personally have written something slightly different: no comments, the writeLog call for the user prompt at the beginning of the method, and possibly no intermediate variables.
But I can easily apply those changes!
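For the record, here's a sketch of that variant: the to() method of the Reply class above, reshuffled to my taste (same behaviour, and it still satisfies ReplyTest):

public function to(string $userPrompt, array $withConfig): string
{
    $this->writeLog->for([
        'entry' => $userPrompt,
    ], Type::USER_PROMPT, $withConfig);
    $modelCompletion = $this->usingLlm->complete(
        $this->augment->the($userPrompt, $withConfig),
    );
    $this->writeLog->for([
        'entry' => $modelCompletion,
        'llm_engine' => $withConfig['llm_engine'],
    ], Type::MODEL_COMPLETION, $withConfig);
    $this->consolidate->memories($withConfig);

    return $modelCompletion;
}

Note that logging the user prompt first matches the order in which the test sets up its mocks, even though Prophecy doesn't actually enforce call order.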
Takeaways
Thanks to Prompt Engineering (with techniques such as the Role/Persona priming prompt, X-Shot, Chain of Thought and Generated Knowledge), it is possible to get ChatGPT to generate code.
I'm personally using the following prompts to generate the corresponding code for a given Test Class.
Priming Prompt:
Act as an expert PHP developer, who's been writing high quality API code for 20 years. Acknowledge with "ok".
Test Class example prompt:
Here's an example of a test class from the project you're working on:
%example_test_class_code%
Acknowledge by describing how the use of fixtures, dummies, stubs, mocks and assertions allows us to make sure the tested code is behaving as expected. And note the use of the special BtlrServiceTestCase parent class for %example_test_class_name%.
Corresponding class example prompt:
When tasked with generating the corresponding %example_class_name% class, from the above %example_test_class_name% test class, here's the code you wrote:
%example_class_code%
Acknowledge by describing how the test class got translated into its corresponding class.
Code generation request from Test Class prompt:
Brilliant. Now, in the same coding style as %example_test_class_name% and %example_class_name%, you are tasked to write the corresponding code for the following %test_class_name% test class:
%test_class_code%
Acknowledge by generating the code for the %class_name% class.
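Filling those %placeholders% in can itself be scripted. Here's a minimal sketch using PHP's strtr() (the buildPrompt() helper is mine, purely illustrative, not part of Btlr):

<?php

// Fill the %placeholder% slots of a prompt template.
function buildPrompt(string $template, array $parameters): string
{
    $replacements = [];
    foreach ($parameters as $name => $value) {
        $replacements["%{$name}%"] = $value;
    }

    return strtr($template, $replacements);
}

echo buildPrompt(
    'Acknowledge by generating the code for the %class_name% class.',
    ['class_name' => 'Reply'],
);
// Acknowledge by generating the code for the Reply class.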
Now, what if you're not practicing Test Driven Development?
I suppose you can tweak it to get ChatGPT to generate tests for you:
- Use the same priming prompt
- Then use the class example prompt
- Next use the "corresponding" test example prompt
- Finally provide a class and ask ChatGPT to generate its corresponding test, as in the adapted prompt below
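Untested, but simply mirroring the templates above, that final prompt could look like:
Brilliant. Now, in the same coding style as %example_class_name% and %example_test_class_name%, you are tasked to write the corresponding test for the following %class_name% class:
%class_code%
Acknowledge by generating the code for the %test_class_name% test class.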
All in all, the exploration of ChatGPT's text generation potential turned out to be quite worthwhile, as I ended up finding a practical use case for it in my day-to-day developer activity.
Yet I suspect there are plenty more scenarios where ChatGPT can help, so I guess the quest continues.