Project author: dwp

Project description:
Repo for the DataWorks Cloudwatch Exporter Docker image
Primary language: Shell
Project URL: git://github.com/dwp/cloudwatch-exporter.git
Created: 2020-07-16T12:22:28Z
Project community: https://github.com/dwp/cloudwatch-exporter

License: ISC License


DO NOT USE THIS REPO - MIGRATED TO GITLAB

cloudwatch-exporter

Repo for the DataWorks Cloudwatch Exporter Docker image

YACE - yet another cloudwatch exporter

Security

Supported Versions

Only the latest version receives security updates. Older versions are not supported.

Reporting a Vulnerability

If you find a vulnerability, please contact us directly by email at security@nerdswords.de.

Do not disclose any specifics in GitHub issues. Thank you.

We will contact you as soon as possible.

Features

  • Stop worrying about your AWS IDs - Auto discovery of resources via tags
  • Structured JSON logging
  • Filter monitored resources via regex
  • Automatic adding of tag labels to metrics
  • Automatic adding of dimension labels to metrics
  • Can export 0 even if CloudWatch returns nil
  • Can export metrics with CloudWatch timestamps (disabled by default)
  • Static metrics support for all CloudWatch metrics without auto discovery
  • Pull data from multiple AWS accounts using cross-account roles
  • Supported services with auto discovery through tags:

    • acm (AWS/CertificateManager) - Certificate Manager
    • alb (AWS/ApplicationELB) - Application Load Balancer
    • apigateway (AWS/ApiGateway) - API Gateway
    • appsync (AWS/AppSync) - AppSync
    • athena (AWS/Athena) - Athena
    • beanstalk (AWS/ElasticBeanstalk) - Elastic Beanstalk
    • billing (AWS/Billing) - Billing
    • cassandra (AWS/Cassandra) - Cassandra
    • cloudfront (AWS/CloudFront) - CloudFront
    • cognito-idp (AWS/Cognito) - Cognito
    • dms (AWS/DMS) - Database Migration Service
    • docdb (AWS/DocDB) - DocumentDB (with MongoDB compatibility)
    • dynamodb (AWS/DynamoDB) - NoSQL Key-Value Database
    • ebs (AWS/EBS) - Elastic Block Storage
    • ec (AWS/ElastiCache) - ElastiCache
    • ec2 (AWS/EC2) - Elastic Compute Cloud
    • ec2Spot (AWS/EC2Spot) - Elastic Compute Cloud for Spot Instances
    • ecs-svc (AWS/ECS) - Elastic Container Service (Service Metrics)
    • ecs-containerinsights (ECS/ContainerInsights) - ECS/ContainerInsights (Fargate metrics)
    • efs (AWS/EFS) - Elastic File System
    • elb (AWS/ELB) - Elastic Load Balancer
    • emr (AWS/ElasticMapReduce) - Elastic MapReduce
    • es (AWS/ES) - ElasticSearch
    • fsx (AWS/FSx) - FSx File System
    • gamelift (AWS/GameLift) - GameLift
    • glue (Glue) - AWS Glue Jobs
    • iot (AWS/IoT) - IoT
    • kinesis (AWS/Kinesis) - Kinesis Data Stream
    • nfw (AWS/NetworkFirewall) - Network Firewall
    • ngw (AWS/NATGateway) - NAT Gateway
    • lambda (AWS/Lambda) - Lambda Functions
    • mq (AWS/AmazonMQ) - Managed Message Broker Service
    • neptune (AWS/Neptune) - Neptune
    • nlb (AWS/NetworkELB) - Network Load Balancer
    • redshift (AWS/Redshift) - Redshift Database
    • rds (AWS/RDS) - Relational Database Service
    • r53r (AWS/Route53Resolver) - Route53 Resolver
    • s3 (AWS/S3) - Object Storage
    • ses (AWS/SES) - Simple Email Service
    • shield (AWS/DDoSProtection) - Distributed Denial of Service (DDoS) protection service
    • sqs (AWS/SQS) - Simple Queue Service
    • tgw (AWS/TransitGateway) - Transit Gateway
    • vpn (AWS/VPN) - VPN connection
    • asg (AWS/AutoScaling) - Auto Scaling Group
    • kafka (AWS/Kafka) - Managed Apache Kafka
    • firehose (AWS/Firehose) - Managed Streaming Service
    • sns (AWS/SNS) - Simple Notification Service
    • sfn (AWS/States) - Step Functions
    • wafv2 (AWS/WAFV2) - Web Application Firewall v2
    • workspaces (AWS/WorkSpaces) - Workspaces

Image

  • ghcr.io/nerdswords/yet-another-cloudwatch-exporter:x.x.x e.g. 0.5.0
  • See Releases for binaries

Configuration

Command Line Options

Option                  Description
labels-snake-case       Causes labels on metrics to be output in snake case instead of camel case
floating-time-window    Use a floating start/end time window instead of rounding times to 5 min intervals
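
To illustrate what labels-snake-case changes, here is a minimal Python sketch of a camel-case to snake-case conversion. The function name to_snake_case and the splitting rule (insert an underscore wherever a lowercase letter or digit is followed by an uppercase letter) are assumptions made for illustration; the exporter's own conversion may handle some edge cases differently.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(label: str) -> str:
    """Convert a camel-case label to snake case (illustrative rule only)."""
    return _BOUNDARY.sub("_", label).lower()


if __name__ == "__main__":
    for label in ("dimension_InstanceId", "tag_KubernetesCluster"):
        print(label, "->", to_snake_case(label))
```

Runs of capitals are deliberately left unsplit under this rule, which keeps acronyms together.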

Top level configuration

Key           Description
apiVersion    Configuration file version
discovery     Auto-discovery configuration
static        List of static configurations

Auto-discovery configuration

Key                      Description
exportedTagsOnMetrics    List of tags per service to export to all metrics
jobs                     List of auto-discovery jobs

exportedTagsOnMetrics example:

    exportedTagsOnMetrics:
      ec2:
        - Name
        - type

Note: Only tagged resources are discovered.

Auto-discovery job

Key                       Description
regions                   List of AWS regions
type                      CloudWatch service alias (“alb”, “ec2”, etc.) or namespace name (“AWS/EC2”, “AWS/S3”, etc.)
length                    How far back to request data for, in seconds (default 120)
delay                     If set, metrics are requested up until current_time - delay
roles                     List of IAM roles to assume (optional)
searchTags                List of Key/Value pairs to use for tag filtering (all must match); Value can be a regex
period                    Statistic period in seconds (general setting for all metrics in this job)
addCloudwatchTimestamp    Export the metric with the original CloudWatch timestamp (general setting for all metrics in this job)
customTags                Custom tags to be added as a list of Key/Value pairs
metrics                   List of metric definitions
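
The way length and delay shape the requested time range can be sketched in a few lines of Python. The function name query_window and the optional round_to parameter are hypothetical; the rounding step is only one plausible reading of "rounding times to 5 min intervals" and is not taken from the exporter's source.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, length=120, delay=0, round_to=None):
    """Return (start, end) for a CloudWatch request.

    end   = now - delay
    start = end - length
    `now` must be timezone-aware. If round_to is given (seconds), end is
    first rounded down to a multiple of it; this rule is an assumption.
    """
    end = now - timedelta(seconds=delay)
    if round_to:
        epoch = int(end.timestamp())
        end = datetime.fromtimestamp(epoch - epoch % round_to, tz=timezone.utc)
    start = end - timedelta(seconds=length)
    return start, end


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(query_window(now, length=900, delay=120))
    print(query_window(now, length=900, delay=120, round_to=300))
```

With length 900 and delay 120, the window covers the 15 minutes that end two minutes before the current time.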

searchTags example:

    searchTags:
      - key: env
        value: production
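
The "all must match" rule for searchTags can be sketched as follows. This is an illustrative Python model, not the exporter's code: the function name matches_search_tags is hypothetical, and the use of re.search (an unanchored match) is an assumption about how the regex is applied.

```python
import re


def matches_search_tags(resource_tags, search_tags):
    """Return True if every search tag matches the resource's tags.

    resource_tags: dict of tag key -> tag value
    search_tags:   list of {"key": ..., "value": <regex>} entries
    """
    for wanted in search_tags:
        actual = resource_tags.get(wanted["key"])
        # A missing tag can never match; re.search semantics are an assumption.
        if actual is None or re.search(wanted["value"], actual) is None:
            return False
    return True


if __name__ == "__main__":
    tags = {"env": "production", "type": "k8s"}
    print(matches_search_tags(tags, [{"key": "env", "value": "production"}]))
```

Anchoring the pattern with ^ and $, as the configuration example does with ^(easteregg|k8s)$, avoids accidental partial matches under this reading.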

Metric definition

Key                       Description
name                      CloudWatch metric name
statistics                List of statistic types, e.g. “Minimum”, “Maximum”, etc.
period                    Statistic period in seconds (overrides job level setting)
length                    How far back to request data for, in seconds (for static jobs)
delay                     If set, metrics are requested up until current_time - delay (for static jobs)
nilToZero                 Return 0 if CloudWatch returns no metrics at all. By default NaN is reported
addCloudwatchTimestamp    Export the metric with the original CloudWatch timestamp (overrides job level setting)
  • Available statistics: Maximum, Minimum, Sum, SampleCount, Average, pXX.
  • Be careful when using addCloudwatchTimestamp for sparse metrics, e.g. from S3, since Prometheus won’t scrape metrics containing timestamps older than 2-3 hours.
  • Setting inheritance: some settings at the job level are overridden by settings at the metric level, so a specific setting can override a
    general one. The currently inherited settings are period and addCloudwatchTimestamp.
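
The inheritance rule can be modelled in a few lines of Python. This is a sketch under the assumption that a metric-level value always wins when present; the names INHERITED_KEYS and effective_metric are hypothetical and do not come from the exporter.

```python
# Settings that a metric inherits from its job unless it sets them itself.
INHERITED_KEYS = ("period", "addCloudwatchTimestamp")


def effective_metric(job, metric):
    """Return a copy of metric with inherited job-level settings filled in."""
    resolved = dict(metric)
    for key in INHERITED_KEYS:
        if key not in resolved and key in job:
            resolved[key] = job[key]
    return resolved


if __name__ == "__main__":
    job = {"period": 600, "addCloudwatchTimestamp": True}
    print(effective_metric(job, {"name": "BurstBalance"}))
    print(effective_metric(job, {"name": "BurstBalance", "period": 60}))
```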

Static configuration

Key           Description
regions       List of AWS regions
roles         List of IAM roles to assume
namespace     CloudWatch namespace
name          Must be set with multiple block definitions per namespace
customTags    Custom tags to be added as a list of Key/Value pairs
dimensions    CloudWatch metric dimensions as a list of Name/Value pairs
metrics       List of metric definitions

Example config file

    apiVersion: v1alpha1
    discovery:
      exportedTagsOnMetrics:
        ec2:
          - Name
        ebs:
          - VolumeId
      jobs:
        - type: es
          regions:
            - eu-west-1
          searchTags:
            - key: type
              value: ^(easteregg|k8s)$
          metrics:
            - name: FreeStorageSpace
              statistics:
                - Sum
              period: 600
              length: 60
            - name: ClusterStatus.green
              statistics:
                - Minimum
              period: 600
              length: 60
            - name: ClusterStatus.yellow
              statistics:
                - Maximum
              period: 600
              length: 60
            - name: ClusterStatus.red
              statistics:
                - Maximum
              period: 600
              length: 60
        - type: elb
          regions:
            - eu-west-1
          length: 900
          delay: 120
          searchTags:
            - key: KubernetesCluster
              value: production-19
          metrics:
            - name: HealthyHostCount
              statistics:
                - Minimum
              period: 600
              length: 600 #(this will be ignored)
            - name: HTTPCode_Backend_4XX
              statistics:
                - Sum
              period: 60
              length: 900 #(this will be ignored)
              delay: 300 #(this will be ignored)
              nilToZero: true
        - type: alb
          regions:
            - eu-west-1
          searchTags:
            - key: kubernetes.io/service-name
              value: .*
          metrics:
            - name: UnHealthyHostCount
              statistics: [Maximum]
              period: 60
              length: 600
        - type: vpn
          regions:
            - eu-west-1
          searchTags:
            - key: kubernetes.io/service-name
              value: .*
          metrics:
            - name: TunnelState
              statistics:
                - p90
              period: 60
              length: 300
        - type: kinesis
          regions:
            - eu-west-1
          metrics:
            - name: PutRecords.Success
              statistics:
                - Sum
              period: 60
              length: 300
        - type: s3
          regions:
            - eu-west-1
          searchTags:
            - key: type
              value: public
          metrics:
            - name: NumberOfObjects
              statistics:
                - Average
              period: 86400
              length: 172800
            - name: BucketSizeBytes
              statistics:
                - Average
              period: 86400
              length: 172800
        - type: ebs
          regions:
            - eu-west-1
          searchTags:
            - key: type
              value: public
          metrics:
            - name: BurstBalance
              statistics:
                - Minimum
              period: 600
              length: 600
              addCloudwatchTimestamp: true
        - type: kafka
          regions:
            - eu-west-1
          searchTags:
            - key: env
              value: dev
          metrics:
            - name: BytesOutPerSec
              statistics:
                - Average
              period: 600
              length: 600
    static:
      - namespace: AWS/AutoScaling
        name: must_be_set
        regions:
          - eu-west-1
        dimensions:
          - name: AutoScalingGroupName
            value: Test
        customTags:
          - key: CustomTag
            value: CustomValue
        metrics:
          - name: GroupInServiceInstances
            statistics:
              - Minimum
            period: 60
            length: 300

[Source: config_test.yml]

Metrics Examples

    ### Metrics with exportedTagsOnMetrics
    aws_ec2_cpuutilization_maximum{dimension_InstanceId="i-someid", name="arn:aws:ec2:eu-west-1:472724724:instance/i-someid", tag_Name="jenkins"} 57.2916666666667
    ### Info helper with tags
    aws_elb_info{name="arn:aws:elasticloadbalancing:eu-west-1:472724724:loadbalancer/a815b16g3417211e7738a02fcc13bbf9",tag_KubernetesCluster="production-19",tag_Name="",tag_kubernetes_io_cluster_production_19="owned",tag_kubernetes_io_service_name="nginx-ingress/private-ext",region="eu-west-1"} 0
    aws_ec2_info{name="arn:aws:ec2:eu-west-1:472724724:instance/i-someid",tag_Name="jenkins"} 0
    ### Track cloudwatch requests to calculate costs
    yace_cloudwatch_requests_total 168
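
The label names in the sample output suggest that tag keys are prefixed with tag_ and that characters which are not valid in a Prometheus label name are replaced by underscores (for example, the tag key kubernetes.io/service-name from the configuration example shows up as tag_kubernetes_io_service_name). The Python sketch below reproduces that mapping; the function name tag_label_name and the exact replacement rule are assumptions inferred from the output, not taken from the exporter's source.

```python
import re


def tag_label_name(tag_key: str) -> str:
    """Build a Prometheus label name from an AWS tag key (illustrative)."""
    # Replace every character outside [a-zA-Z0-9_] with an underscore.
    return "tag_" + re.sub(r"[^a-zA-Z0-9_]", "_", tag_key)


if __name__ == "__main__":
    for key in ("Name", "kubernetes.io/service-name"):
        print(key, "->", tag_label_name(key))
```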

Query Examples without exportedTagsOnMetrics

    # CPUUtilization + Name tag of the instance id - No more instance id needed for monitoring
    aws_ec2_cpuutilization_average + on (name) group_left(tag_Name) aws_ec2_info
    # Free Storage in Megabytes + tag Type of the elasticsearch cluster
    (aws_es_free_storage_space_sum + on (name) group_left(tag_Type) aws_es_info) / 1024
    # Add kubernetes / kops tags on 4xx elb metrics
    (aws_elb_httpcode_backend_4_xx_sum + on (name) group_left(tag_KubernetesCluster,tag_kubernetes_io_service_name) aws_elb_info)
    # Availability Metric for ELBs (Successful requests / Total Requests) + k8s service name
    # Use nilToZero on all metrics else it won't work
    ((aws_elb_request_count_sum - on (name) group_left() aws_elb_httpcode_backend_4_xx_sum) - on (name) group_left() aws_elb_httpcode_backend_5_xx_sum) + on (name) group_left(tag_kubernetes_io_service_name) aws_elb_info
    # Forecast your elasticsearch disk size in 7 days and report metrics with tags type and version
    predict_linear(aws_es_free_storage_space_minimum[2d], 86400 * 7) + on (name) group_left(tag_type, tag_version) aws_es_info
    # Forecast your cloudwatch costs for next 32 days based on last 10 minutes
    # 1.000.000 Requests free
    # 0.01 Dollar for 1.000 GetMetricStatistics Api Requests (https://aws.amazon.com/cloudwatch/pricing/)
    ((increase(yace_cloudwatch_requests_total[10m]) * 6 * 24 * 32) - 1000000) / 1000 * 0.01
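
The arithmetic behind the cost forecast can be checked with a short Python sketch. The figures (1,000,000 free requests, 0.01 dollars per 1,000 requests, a 32-day horizon) are the ones stated in the query comments and may not match current AWS pricing; the function name forecast_cost is hypothetical, and clamping the result at zero is an addition that the query itself does not make.

```python
def forecast_cost(requests_last_10m, days=32, free_requests=1_000_000,
                  price_per_1000=0.01):
    """Estimate CloudWatch API cost in dollars for the next `days` days."""
    # 6 ten-minute windows per hour, 24 hours per day.
    projected = requests_last_10m * 6 * 24 * days
    # Only requests beyond the free allowance are billed (clamped at zero).
    billable = max(projected - free_requests, 0)
    return billable / 1000 * price_per_1000


if __name__ == "__main__":
    for observed in (168, 1000):
        print(observed, "requests per 10 min ->", forecast_cost(observed), "dollars")
```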

IAM

The following IAM permissions are required for YACE to work.

    "tag:GetResources",
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics"

The following IAM permissions are required for the transit gateway attachment (tgwa) metrics to work.

    "ec2:DescribeTags",
    "ec2:DescribeInstances",
    "ec2:DescribeRegions",
    "ec2:DescribeTransitGateway*"

The following IAM permission is required to discover tagged API Gateway REST APIs:

    "apigateway:GET"
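
As a convenience, the permissions listed in this section can be assembled into an IAM policy document. The Python sketch below uses the standard policy JSON layout; the function name build_policy and the choice of "Resource": "*" are assumptions for illustration, so narrow the resource scope where your setup allows it.

```python
import json

CORE_ACTIONS = [
    "tag:GetResources",
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics",
]
TRANSIT_GATEWAY_ACTIONS = [
    "ec2:DescribeTags",
    "ec2:DescribeInstances",
    "ec2:DescribeRegions",
    "ec2:DescribeTransitGateway*",
]
API_GATEWAY_ACTIONS = ["apigateway:GET"]


def build_policy(include_tgw=False, include_apigateway=False):
    """Return an IAM policy document (as a dict) with the listed actions."""
    actions = list(CORE_ACTIONS)
    if include_tgw:
        actions += TRANSIT_GATEWAY_ACTIONS
    if include_apigateway:
        actions += API_GATEWAY_ACTIONS
    return {
        "Version": "2012-10-17",
        # Resource "*" is an assumption made for this sketch.
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(include_tgw=True), indent=2))
```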

Running locally

    docker run -d --rm -v $PWD/credentials:/exporter/.aws/credentials -v $PWD/config.yml:/tmp/config.yml \
      -p 5000:5000 --name yace ghcr.io/nerdswords/yet-another-cloudwatch-exporter:vx.xx.x # release version as tag - Do not forget the version 'v'

Kubernetes Installation

    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: yace
    data:
      config.yml: |-
        ---
        # Start of config file
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: yace
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: yace
      template:
        metadata:
          labels:
            name: yace
        spec:
          containers:
            - name: yace
              image: ghcr.io/nerdswords/yet-another-cloudwatch-exporter:vx.x.x # release version as tag - Do not forget the version 'v'
              imagePullPolicy: IfNotPresent
              args:
                - "--config.file=/tmp/config.yml"
              ports:
                - name: app
                  containerPort: 5000
              volumeMounts:
                - name: config-volume
                  mountPath: /tmp
          volumes:
            - name: config-volume
              configMap:
                name: yace

Options

RoleArns

Multiple roleArns are useful when you monitor a multi-account setup in which all accounts use the same AWS services. For example, you run YACE in a monitoring account and have a number of accounts (for example newspapers, radio and television) running ECS clusters. Each account gives YACE permission to assume a local IAM role that has all the necessary permissions for CloudWatch metrics. In this kind of setup, you could simply list:

    jobs:
      - type: ecs-svc
        regions:
          - eu-north-1
        roles:
          - roleArn: "arn:aws:iam::111111111111:role/prometheus" # newspaper
          - roleArn: "arn:aws:iam::222222222222:role/prometheus" # radio
          - roleArn: "arn:aws:iam::333333333333:role/prometheus" # television
        metrics:
          - name: MemoryReservation
            statistics:
              - Average
              - Minimum
              - Maximum
            period: 600
            length: 600

Additionally, if the IAM role you want to assume requires an External ID you can specify it this way:

    roles:
      - roleArn: "arn:aws:iam::111111111111:role/prometheus"
        externalId: "shared-external-identifier"

Requests concurrency

The flags ‘cloudwatch-concurrency’ and ‘tag-concurrency’ set the number of concurrent requests for CloudWatch metrics and for tags. Both default to 5.

A higher value speeds up scraping but can lead to throttling and blocking by the API.
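
The idea of a concurrency limit can be illustrated with a small Python sketch that caps the number of in-flight requests with a thread pool. It is a generic model, not the exporter's implementation: fetch_all and its fetch argument are hypothetical names, and fetch stands in for whatever function issues a single API request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(requests, fetch, concurrency=5):
    """Run fetch(request) for every request, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() returns results in the same order as the input requests.
        return list(pool.map(fetch, requests))


if __name__ == "__main__":
    # Stand-in for an API call: doubles its input.
    print(fetch_all(range(10), lambda n: n * 2, concurrency=5))
```

Raising concurrency shortens the total wall-clock time only until the remote API starts throttling, which is the trade-off described above.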

Decoupled scraping

The exporter scrapes CloudWatch metrics in the background at a fixed interval.
This protects against excessive API requests that can cause extra billing in the AWS account.

The flag ‘scraping-interval’ sets the number of seconds between scrapes.
The default value is 300.
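
The decoupling described here can be sketched in Python as a background loop that refreshes a cache while requests only ever read the cached copy. This is a conceptual model under that assumption; the class DecoupledScraper and its method names are hypothetical and are not taken from the exporter.

```python
import threading


class DecoupledScraper:
    """Refresh metrics in the background; serve the cached copy on request."""

    def __init__(self, fetch, interval=300):
        self._fetch = fetch          # callable that queries the upstream API
        self._interval = interval    # seconds between background scrapes
        self._lock = threading.Lock()
        self._cache = None
        self._stop = threading.Event()

    def scrape_once(self):
        """Query upstream once and replace the cached result."""
        result = self._fetch()
        with self._lock:
            self._cache = result

    def run(self):
        """Scrape until stop() is called, waiting `interval` between scrapes."""
        while not self._stop.is_set():
            self.scrape_once()
            self._stop.wait(self._interval)

    def stop(self):
        self._stop.set()

    def serve(self):
        """Return the cached metrics without calling the upstream API."""
        with self._lock:
            return self._cache
```

A caller would typically start the loop with threading.Thread(target=scraper.run, daemon=True).start(). However often serve() is called, the upstream API is hit only once per interval, which is also why new values can take up to one interval to appear.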

Troubleshooting / Debugging

Help, my metrics are intermittent

  • Try a larger length, e.g. for elb try a length of 600 and a period of 600. Then test how low you can
    go without losing data. ELB metrics on AWS are written every 5 minutes (300 seconds) by default.

My metrics only show new values after 5 minutes

  • Try setting a lower value for the ‘scraping-interval’ flag, or set ‘decoupled-scraping’ to false.