FinOps brings financial accountability to cloud spending by combining systems, best practices, and culture. This guide covers practical strategies for optimizing cloud costs while maintaining performance and reliability.
FinOps Framework Phases
- Inform: Visibility into cloud spending and allocation
- Optimize: Identify and implement cost reduction opportunities
- Operate: Continuous governance and improvement
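The Inform phase only works if spend can be attributed to an owner. The sketch below groups cost line items by a tag and reports how much spend is untagged. The `LINE_ITEMS` records and their field names are hypothetical stand-ins for whatever a billing export provides, not a real AWS data format.

```python
from collections import defaultdict

# Hypothetical line items; in practice these would come from a billing export.
LINE_ITEMS = [
    {"cost": 120.0, "tags": {"Team": "platform", "Environment": "production"}},
    {"cost": 80.0, "tags": {"Team": "data"}},
    {"cost": 50.0, "tags": {}},  # untagged spend
]


def allocate_by_tag(line_items, tag_key):
    """Sum cost per value of tag_key; items missing the tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item["tags"].get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


def untagged_share(totals):
    """Fraction of total spend that could not be attributed to an owner."""
    total = sum(totals.values())
    return totals.get("untagged", 0.0) / total if total else 0.0


totals = allocate_by_tag(LINE_ITEMS, "Team")
print(totals)
print(f"Untagged share: {untagged_share(totals):.0%}")
```

Tracking the untagged share over time is one simple way to see whether a tagging policy is actually being followed.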
Cost Allocation with Tags
# Terraform - Mandatory tagging
variable "required_tags" {
  type = map(string)
  default = {
    Environment = "production"
    Team        = "platform"
    CostCenter  = "engineering"
    Project     = "api-gateway"
  }
}
resource "aws_instance" "app" {
  ami           = "ami-0123456789"
  instance_type = "t3.medium"
  tags = merge(var.required_tags, {
    Name = "app-server"
  })
}
Right-Sizing Recommendations
# AWS CLI - Get rightsizing recommendations
aws ce get-rightsizing-recommendation \
  --service AmazonEC2 \
  --configuration '{"RecommendationTarget": "SAME_INSTANCE_FAMILY", "BenefitsConsidered": true}'
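The JSON returned by the command above can be long, so it helps to reduce it to a ranked list. The following is a minimal sketch that assumes the response shape of the Cost Explorer API as commonly documented (`RightsizingRecommendations`, `RightsizingType`, and the `EstimatedMonthlySavings` fields); verify the field names against your own output. The sample data and instance IDs are hypothetical.

```python
# Hypothetical, trimmed-down sample of the command's parsed JSON output.
SAMPLE_OUTPUT = {
    "RightsizingRecommendations": [
        {
            "CurrentInstance": {"ResourceId": "i-aaa"},
            "RightsizingType": "MODIFY",
            "ModifyRecommendationDetail": {
                "TargetInstances": [
                    {"EstimatedMonthlySavings": "12.50"},
                    {"EstimatedMonthlySavings": "9.00"},
                ]
            },
        },
        {
            "CurrentInstance": {"ResourceId": "i-bbb"},
            "RightsizingType": "TERMINATE",
            "TerminateRecommendationDetail": {"EstimatedMonthlySavings": "30.00"},
        },
    ]
}


def summarize(response):
    """Return (resource_id, action, monthly_savings) tuples, largest savings first."""
    rows = []
    for rec in response.get("RightsizingRecommendations", []):
        resource_id = rec.get("CurrentInstance", {}).get("ResourceId", "unknown")
        action = rec.get("RightsizingType", "UNKNOWN")
        if action == "TERMINATE":
            detail = rec.get("TerminateRecommendationDetail", {})
            savings = float(detail.get("EstimatedMonthlySavings", 0))
        else:
            # A MODIFY recommendation may list several targets; take the best one.
            targets = rec.get("ModifyRecommendationDetail", {}).get("TargetInstances", [])
            savings = max(
                (float(t.get("EstimatedMonthlySavings", 0)) for t in targets),
                default=0.0,
            )
        rows.append((resource_id, action, savings))
    return sorted(rows, key=lambda row: row[2], reverse=True)


for resource_id, action, savings in summarize(SAMPLE_OUTPUT):
    print(f"{resource_id}: {action} (est. savings {savings:.2f}/month)")
```

Sorting by estimated savings puts the highest-impact changes at the top of the review list.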
# Python script for automated analysis
from datetime import datetime, timedelta, timezone

import boto3

def analyze_underutilized_instances():
    cloudwatch = boto3.client('cloudwatch')
    ec2 = boto3.client('ec2')
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=14)
    # Paginate so that accounts with many instances are fully covered
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate():
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                # Get hourly average CPU utilization over the last 14 days
                response = cloudwatch.get_metric_statistics(
                    Namespace='AWS/EC2',
                    MetricName='CPUUtilization',
                    Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                    StartTime=start_time,
                    EndTime=end_time,
                    Period=3600,
                    Statistics=['Average']
                )
                datapoints = response['Datapoints']
                if not datapoints:
                    # Stopped or newly launched instances may have no data
                    continue
                avg_cpu = sum(d['Average'] for d in datapoints) / len(datapoints)
                if avg_cpu < 10:
                    print(f"Underutilized: {instance_id} - Avg CPU: {avg_cpu:.2f}%")
Savings Plans and Reserved Instances
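Before committing, it is worth working out what an hourly commitment means over the full term and at what usage level it stops paying off. The sketch below uses the 100 USD per hour commitment from this section's Terraform example. The 25% discount rate is a hypothetical figure for illustration, and the model is deliberately simplified: usage beyond what the commitment covers is billed at on-demand rates.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def total_commitment(hourly_commitment, years=1):
    """Total spend committed over the term, owed whether or not it is used."""
    return hourly_commitment * HOURS_PER_YEAR * years


def hourly_cost_with_plan(on_demand_usage, hourly_commitment, discount):
    """Cost for one hour. on_demand_usage is the hour's usage priced at on-demand rates."""
    # On-demand value of usage that the commitment covers at the discounted rate.
    covered = hourly_commitment / (1 - discount)
    overflow = max(0.0, on_demand_usage - covered)  # billed at on-demand rates
    return hourly_commitment + overflow


def hourly_savings(on_demand_usage, hourly_commitment, discount):
    """Positive when the plan is cheaper than on-demand, negative when it is not."""
    return on_demand_usage - hourly_cost_with_plan(
        on_demand_usage, hourly_commitment, discount
    )


print(f"Committed over 1 year: {total_commitment(100.0):,.0f} USD")
for usage in (80.0, 100.0, 150.0):
    print(f"usage {usage:.0f} USD/h -> savings {hourly_savings(usage, 100.0, 0.25):.2f} USD/h")
```

Under this model the plan breaks even when on-demand-equivalent usage equals the hourly commitment. Below that level the unused commitment is paid for anyway, which is why commitments are usually sized against a stable baseline rather than peak usage.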
# Terraform - Purchase Savings Plan
resource "aws_savingsplans_plan" "compute" {
  savings_plan_type = "Compute"
  payment_option    = "No Upfront"
  term              = "1 Year"
  commitment        = 100.00 # USD per hour
}
Spot Instances for Non-Critical Workloads
# Kubernetes - Spot instance node group
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["m5.large", "m5.xlarge", "m5a.large"]
  limits:
    resources:
      cpu: 1000
  ttlSecondsAfterEmpty: 30
Automated Cost Alerts
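A forecasted alert fires when projected month-end spend crosses a percentage of the budget, so it can trigger well before actual spend reaches the limit. The sketch below illustrates the threshold arithmetic for a 10,000 USD budget with an 80% threshold, as in this section's budget. The linear extrapolation is an assumption made for illustration; AWS Budgets uses its own forecasting, which this does not reproduce.

```python
import calendar
from datetime import date


def linear_forecast(spend_to_date, today):
    """Naive month-end projection: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def should_alert(forecast, limit=10000.0, threshold_pct=80.0):
    """True when the forecast exceeds threshold_pct percent of the budget limit."""
    return forecast > limit * threshold_pct / 100.0


# Hypothetical spend figures observed on 10 June 2024 (a 30-day month).
for spend in (2000.0, 3000.0):
    forecast = linear_forecast(spend, date(2024, 6, 10))
    print(f"spend {spend:.0f} -> forecast {forecast:.0f}, alert: {should_alert(forecast)}")
```

With 3,000 USD spent by day 10, the projection is 9,000 USD, which exceeds the 8,000 USD threshold even though actual spend is only 30% of the budget.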
# Terraform - Budget alert
resource "aws_budgets_budget" "monthly" {
  name         = "monthly-budget"
  budget_type  = "COST"
  limit_amount = "10000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["finops@company.com"]
  }
}
Quick Wins
- Delete unattached EBS volumes and unused Elastic IPs
- Implement S3 lifecycle policies for data tiering
- Use auto-scaling to match capacity with demand
- Schedule non-production resources to stop after hours
- Delete idle load balancers and consolidate underutilized ones
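As an illustration of the first item, the sketch below picks out unattached EBS volumes from records shaped like the `Volumes` list returned by EC2's `describe_volumes` call and estimates what they cost per month. The sample volumes and the price per GiB-month are hypothetical; actual pricing depends on volume type and region.

```python
# Hypothetical records shaped like the "Volumes" entries of describe_volumes.
# With boto3, a similar list could be fetched with:
#   ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]
SAMPLE_VOLUMES = [
    {"VolumeId": "vol-1", "State": "in-use", "Size": 100,
     "Attachments": [{"InstanceId": "i-aaa"}]},
    {"VolumeId": "vol-2", "State": "available", "Size": 50, "Attachments": []},
    {"VolumeId": "vol-3", "State": "available", "Size": 200, "Attachments": []},
]


def find_unattached_volumes(volumes):
    """Volumes in the 'available' state with no attachments are not in use."""
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def estimated_monthly_cost(volumes, price_per_gib_month):
    """Rough storage cost; price_per_gib_month is an assumed flat rate."""
    return sum(v["Size"] for v in volumes) * price_per_gib_month


unattached = find_unattached_volumes(SAMPLE_VOLUMES)
print([v["VolumeId"] for v in unattached])
print(f"Estimated waste: {estimated_monthly_cost(unattached, 0.08):.2f} USD/month")
```

Reviewing the list (and snapshotting anything that might still be needed) before deleting is safer than deleting automatically.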
Conclusion
Effective FinOps requires collaboration between engineering, finance, and operations teams. With consistent tagging, right-sizing, and commitment-based discounts, organizations can typically reduce cloud costs by 20-40% without sacrificing performance.
