Prometheus Metrics
F5 NGINX Service Mesh integrates with Prometheus for metrics and Grafana for visualizations.
To configure NGINX Service Mesh to use Prometheus when deploying, refer to the Monitoring and Tracing guide for instructions.
The mesh supports the SMI spec, including traffic metrics. The NGINX Service Mesh creates an extension API Server and shim that query Prometheus and return the results in a traffic metrics format. See SMI Traffic Metrics for more information.
Occasionally metrics are reset when the nginx-mesh-sidecar reloads NGINX Plus. If traffic is flowing and you fail to see metrics, retry after 30 seconds.
If you are deploying NGINX Plus Ingress Controller with the NGINX Service Mesh, make sure to configure the NGINX Plus Ingress Controller to export metrics. Refer to the Metrics section of the NGINX Plus Ingress Controller Deployment tutorial for instructions.
The NGINX Service Mesh sidecar exposes the following metrics in Prometheus format via the /metrics path on port 8887:
- NGINX Plus metrics.
- upstream_server_response_latency_ms: a histogram of upstream server response latencies in milliseconds. The response time is the time from when NGINX establishes a connection to an upstream server to when the last byte of the response body is received by NGINX.
All metrics have the namespace nginxplus, for example nginxplus_http_requests_total and nginxplus_upstream_server_response_latency_ms_count.
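To inspect the sidecar output by hand, the Prometheus text format can be parsed with a few lines of Python. The following is a minimal sketch, not part of NGINX Service Mesh: the `parse_samples` helper and the `EXAMPLE` text are hypothetical, and real sidecar output contains many more series and labels. It ignores optional timestamps and does not unescape label values.

```python
import re

# Matches one sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 123.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples for every sample line in text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output, for illustration only.
EXAMPLE = '''\
# HELP nginxplus_http_requests_total Total http requests
# TYPE nginxplus_http_requests_total counter
nginxplus_http_requests_total 1042
nginxplus_upstream_server_response_latency_ms_count{code="200",upstream="details"} 17
'''

for name, labels, value in parse_samples(EXAMPLE):
    print(name, labels, value)
```

In practice the input text would come from the /metrics path on port 8887 of a sidecar, for example through a port-forward to the Pod.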
This section includes a set of example metrics and queries that you can plug into your existing Prometheus-based tooling to gain insight into the traffic flowing through your applications.
- View the rate of requests currently flowing: irate(nginxplus_http_requests_total[30s])
- View unsuccessful response codes of your applications: nginxplus_upstream_server_responses{code=~"3xx|4xx|5xx"}
  - This can be used to form more complex queries such as current success rate: sum(irate(nginxplus_upstream_server_responses{code=~"1xx|2xx"}[30s])) by (app, version) / sum(irate(nginxplus_upstream_server_responses[30s])) by (app, version)
- View the current throughput of clients sending to upstreams: irate(nginxplus_stream_upstream_server_sent[30s])
- You can also see the total number of connections made: nginxplus_stream_upstream_server_connections
- (TCP only): NGINX Service Mesh exposes a whole host of latency information for TCP connections:
  - nginxplus_stream_upstream_server_connect_time
  - nginxplus_stream_upstream_server_first_byte_time
  - nginxplus_stream_upstream_server_response_time
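Queries like the ones above can also be sent to Prometheus programmatically through its HTTP API (`/api/v1/query`). The sketch below assumes a Prometheus server reachable at `http://localhost:9090`, which is an assumption and not something NGINX Service Mesh sets up for you; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def extract_values(response_body):
    """Return (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed: " + str(payload.get("error")))
    return [(series["metric"], float(series["value"][1]))
            for series in payload["data"]["result"]]


def run_query(base_url, promql):
    """Send the query over HTTP; requires a reachable Prometheus server."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return extract_values(resp.read().decode("utf-8"))


REQUEST_RATE = "irate(nginxplus_http_requests_total[30s])"

# Only builds and prints the URL; call run_query() to contact the server.
print(build_query_url("http://localhost:9090", REQUEST_RATE))
```

`run_query("http://localhost:9090", REQUEST_RATE)` would return one `(labels, value)` pair per series, using the labels described in the tables that follow.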
All metrics have the following labels:
| Label | Description |
|---|---|
| job | Prometheus job name. All metrics scraped from an nginx-mesh-sidecar have a job name of nginx-mesh-sidecars, and all metrics scraped from an NGINX Plus Ingress Controller have a job name of nginx-plus-ingress. |
| pod | Name of the Pod. | 
| namespace | Namespace where the Pod resides. | 
| instance | Address of the Pod. | 
| pod_template_hash | Value of the pod-template-hash Kubernetes label. | 
| deployment, statefulset, or daemonset | Name of the Deployment, StatefulSet, or DaemonSet that the Pod belongs to. | 
Metrics for upstream servers, such as nginxplus_upstream_server_requests, have these additional labels:
| Label | Description |
|---|---|
| code | Response code of the upstream server. For NGINX Plus metrics, the code is one of the following: 1xx, 2xx, 3xx, 4xx, or 5xx. For the upstream_server_response_latency_ms metrics, the code is the specific response code, such as 201. |
| upstream | Name of the upstream server group. | 
| server | Address of the upstream server selected by NGINX. | 
Metrics for outgoing requests have the following destination labels:
| Label | Description |
|---|---|
| dst_pod | Name of the Pod that the request was sent to. | 
| dst_service | Name of the Service that the request was sent to. | 
| dst_deployment, dst_statefulset, or dst_daemonset | Name of the Deployment, StatefulSet, or DaemonSet that the request was sent to. | 
| dst_namespace | Namespace that the request was sent to. | 
Metrics exported by NGINX Plus Ingress Controller have these additional labels:
| Label | Description |
|---|---|
| ingress | Set to true if ingress traffic is enabled. | 
| egress | Set to true if egress traffic is enabled. | 
| class | Ingress class of the NGINX Plus Ingress Controller. | 
| resource_type | Type of resource: VirtualServer, VirtualServerRoute, or Ingress. | 
| resource_name | Name of the VirtualServer, VirtualServerRoute, or Ingress resource. | 
| resource_namespace | Namespace of the resource. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_namespace for queries and filters. |
| service | Service the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_service for queries and filters. |
| pod_name | Name of the Pod that the request was sent to. This value is kept for backwards compatibility; for consistency with NGINX Service Mesh metrics you can use dst_pod for queries and filters. |
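Each label in the tables above can be used as a PromQL label matcher. As a rough illustration, the hypothetical `selector` helper below assembles a metric selector from keyword arguments. It only produces exact-match (`=`) matchers; regex (`=~`) and negative (`!=`) matchers are not covered.

```python
def selector(metric, **labels):
    """Build a PromQL selector with exact-match (=) label matchers."""
    def escape(value):
        # Escape backslashes and double quotes inside label values.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(f'{key}="{escape(value)}"' for key, value in labels.items())
    return f"{metric}{{{matchers}}}" if matchers else metric


# Example values (deployment, namespace, service) are placeholders.
print(selector(
    "nginxplus_upstream_server_responses",
    deployment="productpage-v1",
    namespace="prod",
    dst_service="details",
))
```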
Here are some examples of how you can use the labels above to filter your Prometheus metrics:
- Find all upstream server responses with server-side errors for deployment productpage-v1 in namespace prod: nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code="5xx"}
- Find all upstream server responses with successful response codes for deployment productpage-v1 in namespace prod: nginxplus_upstream_server_responses{deployment="productpage-v1",namespace="prod",code=~"1xx|2xx"}
- Find the p99 latency of all requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds: histogram_quantile(0.99, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details"}[30s])) by (le))
- Find the p90 latency of all requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds, excluding 301 response codes: histogram_quantile(0.90, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code!="301"}[30s])) by (le))
- Find the p50 latency of all successful (response code 200 or 201) requests sent from deployment productpage-v1 in namespace prod to service details in namespace prod over the last 30 seconds: histogram_quantile(0.50, sum(irate(nginxplus_upstream_server_response_latency_ms_bucket{namespace="prod",deployment="productpage-v1",dst_service="details",code=~"200|201"}[30s])) by (le))
- Find all active connections for the NGINX Plus Ingress Controller: nginxplus_connections_active{job="nginx-plus-ingress"}
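The latency queries above rely on `histogram_quantile`, which estimates a percentile from cumulative `le` buckets by interpolating linearly inside the bucket that contains the target rank. The sketch below is a simplified model of that calculation, not the Prometheus implementation; it assumes non-negative bucket bounds, and the bucket values in `LATENCY_BUCKETS` are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must include the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == math.inf:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    lower, below = buckets[index - 1] if index > 0 else (0.0, 0.0)
    if count == below:
        return upper
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative buckets: (upper bound in ms, observations <= bound).
LATENCY_BUCKETS = [(5, 10), (10, 60), (25, 90), (50, 99), (100, 100), (math.inf, 100)]

for q in (0.50, 0.90, 0.99):
    print(f"p{int(q * 100)}:", histogram_quantile(q, LATENCY_BUCKETS))
```

With these sample buckets, the p50 rank of 50 falls in the bucket between 5 ms and 10 ms, which covers cumulative counts 10 through 60, so the estimate is 5 + 5 × (50 − 10) / (60 − 10) = 9 ms. The accuracy of any such estimate is bounded by the bucket widths.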
The custom NGINX Service Mesh Grafana dashboard NGINX Mesh Top can be imported into your Grafana instance.
For instructions and a list of features, see the Grafana example in the nginx-service-mesh GitHub repo.
To view Grafana, port-forward your Grafana Service:
kubectl port-forward -n <grafana-namespace> svc/grafana 3000