# Ollama Integration
Mockstack provides integration with Ollama, allowing you to use real LLM responses in your mock templates. This is particularly useful for development, debugging, and integration testing scenarios where you want to capture the non-deterministic nature of LLM responses.
## Prerequisites
To use the Ollama integration, you'll need:
- Mockstack installed with the optional `llm` dependencies (see the install command below)
- Ollama installed locally with at least one model (e.g., `llama3.2`)
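A minimal install sketch, assuming the optional dependency group is published as an `llm` extra (check your Mockstack release for the exact extra name):

```bash
# Assumes the extra is named "llm"; adjust if your Mockstack release differs.
pip install "mockstack[llm]"
```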
## Basic Usage
The Ollama integration works by routing requests to a template file that uses the special `ollama` template function.
- Configure your LLM client to hit an endpoint that maps to a template file calling the `ollama` function, as in the sketch after this list.
- Make requests as you normally would.
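A minimal client-side sketch, assuming Mockstack is serving locally and that the route `/v1/chat/completions` maps to a template like the one shown in the next section (the port, route, and API key are illustrative assumptions, not fixed Mockstack defaults):

```python
from openai import OpenAI

# Point the client at the local Mockstack server instead of the real API.
# base_url and api_key are assumptions for illustration; Mockstack does
# not validate the key.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
)

# Make requests as you normally would; Mockstack renders the matching
# template, which in turn calls the local Ollama model.
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```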
## Integration with Templates
You can use Ollama responses within your Jinja templates via the provided `ollama` template function. This allows you to:
- Mix static and dynamic content
- Apply transformations to the LLM responses
- Create conditional logic based on the responses
Example template structure:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4.1",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "{{ ollama(request_json.messages, 'llama3.2') | json_escape }}"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
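The same `content` line can combine all three capabilities. A sketch of a fragment that could replace it, assuming the `request_json` context variable and `json_escape` filter shown above; the `llama3.1` model name and `[mocked]` prefix are illustrative assumptions, and `trim` is a standard Jinja filter:

```jinja
{# Conditional model choice based on the incoming request; use whatever
   models your local Ollama instance actually has installed. #}
{% set model = "llama3.2" if request_json.messages | length <= 2 else "llama3.1" %}
"content": "{{ ("[mocked] " ~ ollama(request_json.messages, model)) | trim | json_escape }}"
```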
## Best Practices
- **Caching**: Consider caching frequently used responses to improve performance (see the sketch after this list)
- **Model Selection**: Choose the appropriate model based on your testing needs
- **Error Handling**: Implement proper error handling in your templates so a failed or slow Ollama call does not break your tests
- **Performance**: Be mindful of response times when using real LLM responses
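The caching advice can be realized at the test-harness level. A minimal sketch, assuming you call Mockstack over HTTP with the `requests` library; the URL, route, and payload shape are assumptions for illustration:

```python
from functools import lru_cache

import requests

MOCKSTACK_URL = "http://localhost:8000/v1/chat/completions"  # assumed route

@lru_cache(maxsize=128)
def cached_completion(prompt: str, model: str = "gpt-4.1") -> str:
    """Memoize repeated prompts so each unique prompt reaches Ollama only once."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    resp = requests.post(MOCKSTACK_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```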
## Limitations
- The integration requires a local Ollama instance
- Response times will be longer than for static templates
- Model availability depends on your local setup