Introduction
Sentiment analysis is widely used in support and incident management, often relying on cloud-based AI services from OpenAI, AWS, or Google. However, self-hosting LLMs is becoming an attractive option for platform teams looking to reduce costs, keep data on-premises, and gain more control over model behaviour.
At CECG, we ran a short proof-of-concept using Ollama to deploy a local sentiment analysis model for Slack messages. The goal was to assess whether an LLM could be effectively integrated into a platform stack, and to evaluate the impact on developer experience, observability, and infrastructure management.
We built the solution using:
- LangChain4j – to integrate LLMs into a Java application.
- Spring Boot – to expose an API server for processing sentiment requests.
- SQLite – to cache model results and reduce redundant calls to Ollama (a sketch of how these pieces fit together follows the list).
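To make the shape of the solution concrete, here is a minimal sketch of the core service, assuming the langchain4j-ollama and sqlite-jdbc dependencies: a Spring component that checks a SQLite cache before asking Ollama to classify a message. The table name, prompt wording, and `llama3.2` model tag are illustrative rather than the exact code from our PoC, and the LangChain4j API surface (`OllamaChatModel`, `generate`) varies between library versions.

```java
import dev.langchain4j.model.ollama.OllamaChatModel;
import org.springframework.stereotype.Service;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

@Service
public class SentimentService {

    // Local Ollama endpoint; the model tag is illustrative.
    private final OllamaChatModel model = OllamaChatModel.builder()
            .baseUrl("http://localhost:11434")
            .modelName("llama3.2")
            .build();

    // Embedded SQLite file used as a simple result cache
    // (requires the org.xerial:sqlite-jdbc driver on the classpath).
    private final Connection db;

    public SentimentService() throws SQLException {
        db = DriverManager.getConnection("jdbc:sqlite:sentiment.db");
        try (Statement s = db.createStatement()) {
            s.execute("CREATE TABLE IF NOT EXISTS sentiment_cache ("
                    + "message TEXT PRIMARY KEY, sentiment TEXT)");
        }
    }

    public String classify(String message) throws SQLException {
        // 1. Return the cached result if this exact message was seen before.
        try (PreparedStatement q = db.prepareStatement(
                "SELECT sentiment FROM sentiment_cache WHERE message = ?")) {
            q.setString(1, message);
            ResultSet rs = q.executeQuery();
            if (rs.next()) {
                return rs.getString("sentiment");
            }
        }

        // 2. Otherwise ask the local model, then cache the answer.
        String sentiment = model.generate(
                "Classify the sentiment of this Slack message as POSITIVE, "
                + "NEGATIVE, or NEUTRAL. Reply with one word only.\n\n" + message)
                .trim();

        try (PreparedStatement ins = db.prepareStatement(
                "INSERT INTO sentiment_cache (message, sentiment) VALUES (?, ?)")) {
            ins.setString(1, message);
            ins.setString(2, sentiment);
            ins.executeUpdate();
        }
        return sentiment;
    }
}
```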
This blog highlights the challenges, trade-offs, and key lessons for any platform team looking to integrate local AI workloads.
Why Run LLMs Locally?
Self-hosting LLMs offers several advantages:
- Cost Savings – No per-request API fees.
- Data Locality – Keeps internal conversations within the organisation.
- Performance Control – Reduces latency by running models closer to data sources.
- Customisation – Allows model fine-tuning for specific use cases.
However, these benefits come with operational trade-offs, including infrastructure overhead, observability gaps, and accuracy concerns.
Challenges of Running Local LLMs
1. Machine Resource Constraints
LLMs are resource-intensive. During our testing, we used:
- 10 CPU cores
- 24GB RAM
While this was sufficient for processing Slack messages, scaling beyond this setup would require dedicated infrastructure. Running large models locally without optimising inference times can create bottlenecks, particularly when several requests arrive at once.
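One simple way to protect the model host from bursts is to bound concurrency in front of inference. The guard below is a hypothetical sketch rather than code from our PoC; `InferenceGate` is our own name, not part of any library.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical guard that bounds concurrent calls to a local model.
 * The permit count should roughly match what the host's CPU cores can
 * serve before inference times start to degrade.
 */
public class InferenceGate {

    private final Semaphore permits;

    public InferenceGate(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    public <T> T call(Supplier<T> inference) throws InterruptedException {
        permits.acquire(); // queue the caller instead of piling work onto the model
        try {
            return inference.get();
        } finally {
            permits.release();
        }
    }
}
```

A caller would wrap each model invocation, e.g. `gate.call(() -> model.generate(prompt))`, so excess requests wait for a permit instead of slowing every in-flight inference down.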
Key Takeaway
- If running locally, ensure compute resources scale with workload demands.
- Consider offloading heavy workloads to cloud-based inference in a hybrid approach.
2. Model Accuracy & Bias
We tested multiple models to assess how well they classified sentiment in developer Slack messages.
Observations
- Technical discussions were mostly classified as neutral.
- Troubleshooting messages were frequently flagged as negative (particularly by Phi3).
- Acknowledgements and teamwork were consistently positive across models.
- Llama3.2 and Gemma2:2b showed a slight bias towards positivity.
Most LLMs aren’t trained on developer conversations, leading to misclassifications. For example, a message like *“This keeps failing. Why isn’t this fixed yet?”* might be flagged as negative when, in reality, it is part of a routine debugging discussion.
Why This Matters
Because models misread troubleshooting language, our tests produced false positives: messages about bugs and failures were classified as negative even when the conversation itself was perfectly healthy.
Key Takeaway
- Prompt engineering is essential to minimise misclassifications (see the sketch after this list).
- Consider fine-tuning models on developer-specific datasets.
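As an illustration of the prompt tuning this calls for, the system prompt below explicitly tells the model that troubleshooting language is routine in developer channels. The wording is a hypothetical example, not the exact prompt from our PoC.

```java
/**
 * Illustrative prompt for classifying developer Slack messages.
 * The key line is the one telling the model that routine
 * troubleshooting language should not be treated as negative.
 */
public final class SentimentPrompts {

    public static final String SYSTEM_PROMPT = """
            You classify the sentiment of Slack messages written by software engineers.
            Respond with exactly one word: POSITIVE, NEGATIVE, or NEUTRAL.

            Rules:
            - Discussing bugs, failures, or errors is routine troubleshooting;
              classify it as NEUTRAL unless the author expresses frustration
              directed at people rather than at the problem.
            - Thanks, praise, and acknowledgements are POSITIVE.
            - Plain factual or technical statements are NEUTRAL.
            """;

    private SentimentPrompts() {}
}
```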
3. Observability & Multi-Tenancy Challenges
For sentiment analysis to be useful in a platform engineering context, it must support:
- Team-based sentiment tracking (e.g., logs per service).
- Model performance monitoring (e.g., flagging misclassifications).
- Integration with logging and monitoring tools (e.g., OpenTelemetry, Loki).
Initially, we lacked structured metadata linking sentiment results to specific teams or services. This made debugging difficult and raised concerns about how sentiment data should be stored and queried.
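A hypothetical shape for that metadata is sketched below; the field names are our suggestion rather than any standard, but emitting something like this as structured JSON makes sentiment results queryable per team or service in whatever log backend you already run.

```java
import java.time.Instant;

/**
 * Hypothetical structured record for a single sentiment result.
 * Serialised to JSON and shipped through the logging pipeline, it lets
 * tools like Loki or Grafana slice sentiment per team, service, or model.
 */
public record SentimentEvent(
        String team,          // owning team, e.g. "platform"
        String service,       // service the Slack channel relates to
        String channel,       // channel the message came from
        String model,         // which LLM produced the classification
        String sentiment,     // POSITIVE / NEGATIVE / NEUTRAL
        Instant classifiedAt  // when the classification happened
) {}
```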
Key Takeaway
- Ensure sentiment logs are structured and enriched with metadata.
- Integrate results into existing monitoring platforms (e.g., OpenTelemetry, Grafana, Loki).
4. Balancing Cost, Performance & Ease of Use
One of the biggest trade-offs in Platform Engineering is the Build vs. Buy decision: while OpenAI and other vendors offer pre-trained sentiment APIs, platform teams must decide whether the cost savings of self-hosting outweigh the operational complexity.
Key Takeaway
- Compare infrastructure costs vs. vendor API pricing before committing.
- Ensure self-hosted models integrate with platform workflows.
- If fine-tuning is required, evaluate whether the effort justifies the gain.
Final Recommendations for Platform Teams
If your platform team is considering deploying local LLMs for sentiment analysis, here’s what to watch out for:
1. Adoption Strategy
- Ensure multi-tenancy support (team-based sentiment tracking).
- Integrate results with existing logging pipelines (e.g., OpenTelemetry, Loki).
- Provide dashboards & reporting tools for visibility.
2. Technical Considerations
- Benchmark models for accuracy and bias before adoption.
- Ensure compute resources match workload demands.
- Tune prompts to avoid misclassifications of technical conversations.
3. Build vs. Buy Decision
- Local models are cost-effective but require infrastructure investment.
- Cloud-hosted APIs offer ease of integration but come with ongoing costs.
- Consider a hybrid approach where sensitive data is processed locally and non-sensitive workloads are offloaded to the cloud (a routing sketch follows below).
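To make the hybrid idea concrete, the router below is a hypothetical sketch: `isSensitive` is a placeholder policy, and LangChain4j's `ChatLanguageModel` interface is shown only as one way to abstract over a local and a cloud-hosted model (the API varies between library versions).

```java
import dev.langchain4j.model.chat.ChatLanguageModel;

/**
 * Hypothetical router for a hybrid setup: sensitive messages stay on the
 * local model, everything else may go to a cloud-hosted one.
 */
public class HybridSentimentRouter {

    private final ChatLanguageModel localModel;  // e.g. an OllamaChatModel
    private final ChatLanguageModel cloudModel;  // e.g. a vendor-hosted model

    public HybridSentimentRouter(ChatLanguageModel localModel,
                                 ChatLanguageModel cloudModel) {
        this.localModel = localModel;
        this.cloudModel = cloudModel;
    }

    public String classify(String prompt) {
        ChatLanguageModel model = isSensitive(prompt) ? localModel : cloudModel;
        return model.generate(prompt);
    }

    // Placeholder policy; a real implementation would encode your
    // organisation's data-classification rules.
    private boolean isSensitive(String prompt) {
        return prompt.contains("incident") || prompt.contains("internal");
    }
}
```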
Conclusion
Running local LLMs for sentiment analysis is viable for platform teams, but it comes with infrastructure, accuracy, and observability challenges.
For teams considering self-hosted models, the most important factors to evaluate are:
- Model performance – Benchmark multiple LLMs before selecting one.
- Observability – Ensure sentiment results can be monitored and segmented per team.
- Multi-tenancy – Structured logging and metadata tagging are essential.
- Cost vs. complexity – Weigh infrastructure investment against the ease of vendor APIs.
For Platform Engineering teams, integrating AI-powered observability into their stack is a worthwhile experiment, but trade-offs must be carefully considered before committing to local inference at scale.
Next Steps
This proof-of-concept was completed in just a few days, showing that AI-powered observability can be integrated into Platform Engineering workflows quickly.
For teams exploring local AI inference, understanding the trade-offs early is key to making an informed decision.