Towards a Fully Autonomous Pentester


Title:	Towards a Fully Autonomous Pentester
DNr:	Berzelius-2024-125
Project Type:	LiU Berzelius
Principal Investigator:	Christian Gehrmann <christian.gehrmann@eit.lth.se>
Affiliation:	Lunds universitet
Duration:	2024-03-28 – 2024-10-01
Classification:	20206
Keywords:

Abstract

The purpose of this proposal is to request the GPU resources needed to develop and evaluate three new Large Language Models (LLMs) for automated penetration testing: WizardRed, DeepseekRed, and OpenCodeRed. These models are based on state-of-the-art open-source LLMs (WizardCoder, DeepseekCoder, and OpenCodeInterpreter), and fine-tuning and benchmarking them requires significant computational resources.

Justification:
- Fine-tuning LLMs: Fine-tuning is essential to adapt the models to specific tasks such as penetration testing. It requires additional training on task-specific datasets, which is computationally intensive and needs GPUs to accelerate the training process. Fine-tuning ensures that the models can effectively identify vulnerabilities, generate exploit code, and provide actionable insights for remediation.
- Benchmarking and evaluation: To assess the performance of WizardRed, DeepseekRed, and OpenCodeRed, we need to conduct comprehensive benchmarking and evaluation. This involves comparing the models against industry-standard penetration testing tools and evaluating their accuracy, efficiency, and effectiveness in identifying and exploiting vulnerabilities across a wide range of target systems and applications. These assessments require running the models on many test cases and datasets, which demands substantial GPU resources to complete in a timely manner.
- Model size and complexity: The foundation models (WizardCoder, DeepseekCoder, and OpenCodeInterpreter) are large and complex, with billions of parameters and extensive training on diverse datasets. Fine-tuning and adapting them for penetration testing will likely produce models of similar or greater size and complexity, so powerful GPU resources are needed to train and run them efficiently.

National Supercomputer Centre at Linköping University