Introduction

The Interpretability API provides access to language model inference with layer-level activation caching and intervention capabilities.

Authentication

All requests require an API key passed via the x-api-key header.

POST /completions

Generate text completions with optional layer caching and interventions.

Request Body

Parameter	Type	Default	Description
model	string	tiiuae/Falcon3-7B-Base	HuggingFace model identifier
input	string	The CN Tower is located in	Input text prompt
cache_layers	array[int]	[1, 2, 3]	Layer indices to cache activations from
interventions	object	{}	Layer-wise interventions to apply
max_tokens	integer	1	Maximum new tokens to generate
temperature	float	1.0	Sampling temperature
return_logits	boolean	true	Whether to return logits
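
As a rough sketch of how the parameters above fit together, the following Python assembles a request body from the documented defaults. The helper name build_completion_body is hypothetical and not part of the API; only the field names and default values are taken from the table.

```python
import copy
import json

# Defaults copied from the Request Body table above.
DEFAULTS = {
    "model": "tiiuae/Falcon3-7B-Base",
    "input": "The CN Tower is located in",
    "cache_layers": [1, 2, 3],
    "interventions": {},
    "max_tokens": 1,
    "temperature": 1.0,
    "return_logits": True,
}


def build_completion_body(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject misspelled parameter names early instead of sending them.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    body.update(overrides)
    return body


body = build_completion_body(input="The Eiffel Tower is located in", max_tokens=3)
print(json.dumps(body, indent=2))
```

Rejecting unknown keys locally is a design choice for this sketch; how the server treats unrecognized parameters is not specified here.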

Response

Field	Type	Description
output	string	Decoded output text
sampled_tokens	array	Token IDs sampled during generation
layer_activations	object	Cached activations from specified layers
logits	array	Log-softmax probabilities (returned when return_logits is true)
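
Because the logits field holds log-softmax values rather than raw scores, exponentiating them should recover ordinary probabilities. The sketch below illustrates this on a locally computed stand-in vector; it assumes one flat list of log-probabilities per position, which is an assumption about the array layout and not something the table confirms. The function names are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax, used here only to make stand-in data."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def to_probabilities(log_probs):
    """Convert log-softmax values back to probabilities."""
    return [math.exp(lp) for lp in log_probs]


# Stand-in for one row of the "logits" field (assumed layout, synthetic values).
log_probs = log_softmax([1.0, 2.0, 3.0])
probs = to_probabilities(log_probs)

# Index of the most likely entry in this row.
top_index = max(range(len(probs)), key=probs.__getitem__)
print(top_index, sum(probs))
```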

Interventions

Intervention Structure

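Interventions modify layer activations during inference. The interventions object maps each target layer index, written as a string key, to an operation and a value to apply at that layer. In the example below, layer 1 is multiplied by the scalar 2 and layer 2 has an array added to it; [[1]*3072]*6 is Python shorthand for 6 rows of 3072 ones (seq_length x d_model) and must be expanded into a literal nested array in real JSON.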

{
  "interventions": {
    "1": {
      "operation": "multiply",
      "value": 2
    },
    "2": {
      "operation": "add",
      "value": [[1]*3072]*6
    }
  }
}
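
Since [[1]*3072]*6 is Python shorthand and not valid JSON, one way to produce a sendable interventions object is to build the array in Python and serialize it. This is a minimal sketch: the constant_matrix helper is hypothetical, and the 3072 and 6 are simply the sizes from the example above, assumed to match the model width and the number of input tokens.

```python
import json

D_MODEL = 3072  # row width from the example above (assumed model width)
SEQ_LEN = 6     # row count from the example above (assumed token count)


def constant_matrix(fill, seq_len=SEQ_LEN, d_model=D_MODEL):
    """Hypothetical helper: a seq_len x d_model nested list filled with one value.

    A comprehension is used so every row is an independent list;
    [[fill] * d_model] * seq_len would repeat the same row object.
    """
    return [[fill] * d_model for _ in range(seq_len)]


interventions = {
    "1": {"operation": "multiply", "value": 2},
    "2": {"operation": "add", "value": constant_matrix(1)},
}

# json.dumps expands the nested lists into a literal JSON array.
payload_fragment = json.dumps({"interventions": interventions})
print(len(payload_fragment))
```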

Example Request

curl -X POST https://api.alignarena.com/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: ALIGNARENA_API_KEY" \
  -d '{
    "model": "tiiuae/Falcon3-7B-Base",
    "input": "The CN Tower is located in",
    "cache_layers": [1, 2, 3],
    "interventions": {
      "1": {
        "operation": "multiply",
        "value": 2
      },
      "2": {
        "operation": "add",
        "value": <array of shape seq_length x d_model, e.g. [[1]*3072]*6>
      }
    },
    "max_tokens": 1,
    "return_logits": true
  }'
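
The curl command above could also be assembled in Python with the standard library. The sketch below only constructs the request object and leaves the network call commented out. The make_request helper is hypothetical, and reading the key from an environment variable named ALIGNARENA_API_KEY is an assumption made for illustration.

```python
import json
import os
import urllib.request

API_URL = "https://api.alignarena.com/completions"


def make_request(body, api_key):
    """Hypothetical helper: build a POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


body = {
    "model": "tiiuae/Falcon3-7B-Base",
    "input": "The CN Tower is located in",
    "cache_layers": [1, 2, 3],
    "interventions": {"1": {"operation": "multiply", "value": 2}},
    "max_tokens": 1,
    "return_logits": True,
}

# Assumption: the key is stored in an environment variable of this name.
api_key = os.environ.get("ALIGNARENA_API_KEY", "YOUR_API_KEY")
request = make_request(body, api_key)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```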