๐ Monitoring Guide
Comprehensive monitoring and analytics for your gem-ci automation workflows
๐ฏ Overview
gem-ci provides built-in monitoring capabilities to track workflow performance, repository health, and automation effectiveness. This guide covers all monitoring features and how to leverage them.
๐ Built-in Monitoring
Workflow Metrics (08-monitoring.yml
)
Automated Tracking:
- โฑ๏ธ Workflow execution times
- โ Success/failure rates
- ๐ฐ GitHub Actions usage costs
- ๐ Performance trends over time
Schedule: Weekly monitoring reports (Mondays at 4 AM UTC)
Metrics Collected:
1
2
3
4
5
6
7
8
9
10
11
12
13
# Example metrics output
workflow_metrics:
ci_workflow:
avg_duration: "3m 45s"
success_rate: "98.5%"
total_runs: 127
cost_estimate: "$2.35"
security_workflow:
avg_duration: "2m 15s"
success_rate: "100%"
total_runs: 52
cost_estimate: "$1.20"
Repository Health Monitoring
Health Indicators:
- ๐ฅ Overall repository health score
- ๐ Open issues and PR status
- ๐ Workflow reliability metrics
- ๐ฅ Community engagement levels
Health Score Calculation:
1
2
3
4
5
6
# Health factors (weighted)
code_quality: 25% # RuboCop, test coverage
security_posture: 25% # Vulnerability scans, updates
automation_health: 20% # Workflow success rates
community_activity: 15% # PR/issue activity
documentation: 15% # Doc coverage, freshness
๐ Performance Analytics
Cost Optimization Tracking
Monitor your GitHub Actions cost savings:
1
2
3
4
5
6
7
8
9
# Cost comparison (before/after gem-ci optimizations)
previous_monthly_cost: "$45.60"
current_monthly_cost: "$9.20"
savings_percentage: "79.8%"
optimizations_applied:
- "Single Ruby version (3.3)"
- "Ubuntu runners only"
- "Reduced scheduled frequency"
- "Custom focused linting"
Workflow Performance Trends
Track performance improvements over time:
1
2
3
4
5
6
7
8
9
10
11
# 30-day performance trend
ci_workflow_trends:
avg_duration:
30_days_ago: "5m 23s"
current: "3m 45s"
improvement: "30.3%"
success_rate:
30_days_ago: "94.2%"
current: "98.5%"
improvement: "4.6%"
๐ Real-time Monitoring
PR Status Dashboard
Real-time CI status in PR comments:
1
2
3
4
5
6
7
8
9
## ๐ CI Status Dashboard
| Workflow | Status | Duration | Details |
|----------|--------|----------|---------|
| CI | โ
Passed | 3m 45s | [View run](link) |
| Security | โ
Passed | 2m 15s | [View run](link) |
| Quality | โณ Running | 1m 30s | [View run](link) |
**Last updated**: 2 minutes ago
Slack Notifications
Real-time workflow notifications to Slack:
Success Notification:
1
2
3
4
5
6
7
8
9
10
11
12
{
"text": "โ
CI Workflow Completed",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*gem-ci* CI pipeline completed successfully\n*Duration:* 3m 45s\n*Branch:* main"
}
}
]
}
Failure Notification:
1
2
3
4
5
6
7
8
9
10
11
12
{
"text": "โ Workflow Failed",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "๐จ *gem-ci* workflow failed\n*Step:* security-scan\n*Error:* Vulnerability detected"
}
}
]
}
๐ Custom Metrics
Test Coverage Tracking
Monitor test coverage trends:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# In your test suite
SimpleCov.start do
add_filter '/spec/'
add_filter '/test/'
# Track coverage over time
track_files 'lib/**/*.rb'
minimum_coverage 90
# Custom coverage groups
add_group 'Controllers', 'app/controllers'
add_group 'Models', 'app/models'
add_group 'Libraries', 'lib'
end
Coverage Metrics:
1
2
3
4
5
6
7
8
coverage_metrics:
current: "94.2%"
target: "90%"
trend: "+2.1% (30 days)"
by_category:
controllers: "96.8%"
models: "98.1%"
libraries: "91.4%"
Security Metrics
Track security posture over time:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
security_metrics:
vulnerabilities:
critical: 0
high: 0
medium: 1
low: 3
dependency_updates:
pending: 5
auto_merged: 23
manual_review: 2
security_score: "A+"
trend: "Stable"
๐๏ธ Monitoring Dashboard
GitHub Actions Insights
Built-in GitHub insights for workflows:
Access Path: Repository โ Insights โ Actions
Available Data:
- ๐ Usage analytics - Action minutes used
- โฑ๏ธ Workflow run times - Duration trends
- ๐ฐ Cost analysis - Spending by workflow
- ๐ Success rates - Reliability metrics
Custom Dashboard Setup
Create a monitoring dashboard using GitHub API:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Example monitoring script
const fetchWorkflowMetrics = async () => {
const response = await fetch(
'https://api.github.com/repos/owner/repo/actions/runs',
{
headers: {
'Authorization': `token ${github_token}`,
'Accept': 'application/vnd.github.v3+json'
}
}
);
const runs = await response.json();
// Calculate metrics
const metrics = {
totalRuns: runs.total_count,
successRate: calculateSuccessRate(runs.workflow_runs),
avgDuration: calculateAvgDuration(runs.workflow_runs),
costEstimate: estimateCost(runs.workflow_runs)
};
return metrics;
};
๐ฑ Alerting System
Failure Alerts
Automated alerts for workflow failures:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# In workflow failure step
- name: Send failure alert
if: failure()
uses: slackapi/slack-github-action@v1.27.0
with:
channel-id: $
payload: |
{
"text": "๐จ WORKFLOW FAILURE ALERT",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Repository:* $\n*Workflow:* $\n*Branch:* $\n*Failed Step:* $"
}
},
{
"type": "actions",
"elements": [
{
"type": "button",
"text": {"type": "plain_text", "text": "View Logs"},
"url": "$/$/actions/runs/$"
}
]
}
]
}
Performance Degradation Alerts
Alert when performance degrades:
1
2
3
4
5
6
7
8
9
10
11
- name: Check performance regression
run: |
CURRENT_DURATION=$
BASELINE_DURATION=$
if [ $CURRENT_DURATION -gt $((BASELINE_DURATION * 120 / 100)) ]; then
echo "Performance regression detected"
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"โ ๏ธ Performance regression in $"}' \
$
fi
๐ Automated Reporting
Weekly Performance Reports
Automated weekly reports via email/Slack:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# .github/workflows/weekly-report.yml
name: Weekly Performance Report
on:
schedule:
- cron: '0 9 * * MON' # Monday 9 AM UTC
jobs:
generate-report:
runs-on: ubuntu-latest
steps:
- name: Generate performance report
run: |
# Collect metrics from past week
# Generate HTML report
# Send via email/Slack
Report Contents:
- ๐ Workflow performance summary
- ๐ฐ Cost analysis and savings
- ๐ Security posture updates
- ๐ฅ Community activity metrics
- ๐ Trends and recommendations
Monthly Health Check
Comprehensive monthly repository health assessment:
1
2
3
4
5
6
7
8
9
10
11
# Monthly health metrics
health_report:
overall_score: "A"
improvements_needed:
- "Increase test coverage in utilities module"
- "Update 3 outdated dependencies"
highlights:
- "Zero security vulnerabilities"
- "98.5% CI success rate"
- "30% performance improvement"
๐ ๏ธ Monitoring Tools Integration
Third-party Monitoring
Integrate with external monitoring services:
1
2
3
4
5
6
7
8
9
10
11
12
13
# Datadog integration example
- name: Send metrics to Datadog
run: |
curl -X POST "https://api.datadoghq.com/api/v1/series" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: $" \
-d '{
"series": [{
"metric": "github.workflow.duration",
"points": [['$', $]],
"tags": ["workflow:$", "repo:$"]
}]
}'
Custom Monitoring Scripts
Create custom monitoring scripts:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#!/bin/bash
# scripts/monitor-workflows.sh
# Fetch workflow data
WORKFLOW_DATA=$(gh api repos/$REPO/actions/runs --jq '.workflow_runs[:10]')
# Calculate metrics
SUCCESS_COUNT=$(echo $WORKFLOW_DATA | jq '[.[] | select(.conclusion=="success")] | length')
TOTAL_COUNT=$(echo $WORKFLOW_DATA | jq 'length')
SUCCESS_RATE=$((SUCCESS_COUNT * 100 / TOTAL_COUNT))
# Report metrics
echo "Success Rate: ${SUCCESS_RATE}%"
echo "Total Runs: ${TOTAL_COUNT}"
# Alert if success rate < 95%
if [ $SUCCESS_RATE -lt 95 ]; then
echo "๐จ Success rate below threshold!"
# Send alert
fi
๐ Monitoring Best Practices
Metric Selection
Focus on actionable metrics:
โ Good Metrics:
- Workflow success rates
- Performance trends
- Cost optimization impact
- Security vulnerability counts
โ Avoid These:
- Vanity metrics without action items
- Too many metrics causing noise
- Metrics not tied to business value
Alert Management
Effective alerting strategy:
- ๐จ Critical: Immediate action required (failures, security)
- โ ๏ธ Warning: Attention needed (performance degradation)
- ๐ Info: Regular updates (weekly reports)
Data Retention
Manage monitoring data effectively:
1
2
3
4
5
6
7
8
9
data_retention:
metrics: "90 days"
logs: "30 days"
reports: "1 year"
cleanup_schedule:
daily: "Remove logs older than 30 days"
weekly: "Archive old metrics"
monthly: "Generate historical reports"
๐ Troubleshooting Monitoring
Common Issues
Metrics not updating:
1
2
3
4
5
# Check workflow execution
gh run list --limit 10
# Verify monitoring workflow
gh workflow view monitoring
Missing Slack notifications:
1
2
3
4
# Test Slack integration
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"Test notification"}' \
$SLACK_WEBHOOK_URL
Performance regression alerts:
1
2
3
4
5
# Review baseline metrics
cat .github/config/performance-baselines.yml
# Check current performance
./scripts/measure-performance.sh
๐ Advanced Monitoring
Predictive Analytics
Use historical data for predictions:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Example: Predict workflow failure probability
import pandas as pd
from sklearn.linear_model import LogisticRegression
# Load historical data
data = pd.read_csv('workflow-history.csv')
# Train model to predict failures
model = LogisticRegression()
model.fit(data[['duration', 'complexity', 'changes']], data['failed'])
# Predict current run
probability = model.predict_proba([[current_duration, complexity, changes]])
print(f"Failure probability: {probability[0][1]:.2%}")
Anomaly Detection
Detect unusual patterns:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Anomaly detection workflow
- name: Detect anomalies
run: |
# Compare current metrics to historical baseline
CURRENT_DURATION=$
BASELINE_MEAN=$
BASELINE_STDDEV=$
# Calculate z-score
Z_SCORE=$(echo "($CURRENT_DURATION - $BASELINE_MEAN) / $BASELINE_STDDEV" | bc)
# Alert if anomaly detected (z-score > 2)
if [ $(echo "$Z_SCORE > 2" | bc) -eq 1 ]; then
echo "๐จ Anomaly detected: duration significantly higher than normal"
fi
Ready to monitor your workflows? Start with the built-in monitoring workflow and customize based on your specific needs!