Multilevel Prometheus setup using Remote Read

Sitaram Shelke
Jan 4, 2021

People who have been working in the Ops and SRE domains need no introduction to Prometheus. It’s a popular open-source monitoring solution that is also a part of the CNCF. I worked on an interesting Prometheus setup, and this article is about a lesser-documented feature of Prometheus called Remote Read.

Remote Read

Prometheus 2.0 added support for the Remote Read feature. The implementation is two-fold: Prometheus can read from other data sources, and it can also expose its own data to be remote read. The documentation mentions the following use cases.

  • Support third-party data sources:

Prometheus has defined a request and response format using protobuf over HTTP, which it can use to talk to a third-party system. This makes it easy for Prometheus to read data from any source that can exchange data in this format.

  • Prometheus as a data source for third-party systems:

The same request-response format is exposed over HTTP, so essentially any third-party system which wants to query data from Prometheus can do so by making a request and parsing the response.

  • Seamless upgrades between Prometheus versions and data formats:

Now if you think about the previous two points, Prometheus has made itself remote readable, and it can remote read others. The next obvious use case is a Prometheus instance remote reading another Prometheus instance. This really makes sense, as it allows us to use an older version of Prometheus as a data source for a newer version, effectively avoiding downtime when there are incompatible changes between Prometheus versions or the underlying data formats.

Using Remote Read

Using the remote read feature is easy; all we need to ensure is that the system to be remote read is reachable. Once that is solved, we can configure remote read in the Prometheus configuration:

remote_read:
  - url: http://remote-prometheus/api/v1/read # required
    name: remote-prometheus # optional
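
Beyond url and name, the remote_read block accepts a few more optional settings. Here is a slightly fuller sketch for reference; the credentials and matcher values are placeholders I made up, so adjust them to your environment:

remote_read:
  - url: http://remote-prometheus/api/v1/read # required
    name: remote-prometheus # optional identifier for this endpoint
    read_recent: true # also ask the remote for recent data instead of only older blocks
    remote_timeout: 1m # how long to wait for a remote read response
    required_matchers: # only send queries that carry this exact label matcher
      job: node
    basic_auth: # only needed if the remote endpoint requires authentication
      username: reader
      password: secret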

Multilevel Prometheus Setup

In my case, the problem at hand was that there were a bunch of Kubernetes clusters, each collecting the necessary metrics at the cluster level. The owners of those clusters were responsible for administering the collected time series, their retention, and the rules set up on top of them. This was fine so far, but there was a need for a read-only view of all the clusters from a single place.

The constraints were:

  1. We could not move the data outside the cluster.
  2. Replicating data from the clusters was out of the question.

Prometheus Remote Read sounded like the perfect fit for this, and this is how we ended up solving our problem.

Multilevel Prometheus setup using Remote Read

So we have a Prometheus instance running in each cluster, responsible for monitoring the cluster’s health and alerting on it. I added an external label to each of these instances representing the cluster’s identity. Prometheus attaches these external labels to its metrics whenever they are read by an external system. Then I set up a global Prometheus instance configured to remote read from all the other Prometheus instances. This setup has an added advantage once we set up Grafana: we can create a Grafana variable that queries all the existing values of the cluster_name label, and we can then monitor all the clusters from a single dashboard.
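
To make this concrete, here is roughly what the two halves of the configuration could look like. The cluster names and URLs below are placeholders for illustration, not the actual setup:

# Per-cluster Prometheus (for example cluster-a): label everything it serves to the outside
global:
  external_labels:
    cluster_name: cluster-a

# Global Prometheus: remote read every cluster-level instance
remote_read:
  - url: http://prometheus.cluster-a.example.com/api/v1/read
    name: cluster-a
  - url: http://prometheus.cluster-b.example.com/api/v1/read
    name: cluster-b

On the Grafana side, a dashboard variable backed by a query such as label_values(up, cluster_name) against the global Prometheus then lists every cluster and drives a single dashboard.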

However simple this design looks, the key challenge I faced was with the external labels. As with any monitoring system, a common question is how we monitor the monitoring system itself. With a Prometheus instance running in the global cluster, we could very well add an external label to it, use it to monitor itself, and have it be queryable from Grafana. The only problem with this approach is that it doesn’t work. The remote read configuration has a conflict with external labels, and this is not documented very well: if we add external labels to a Prometheus that has a conflicting label name-value pair with any of the instances it remote reads, the remote read fails. After spending a lot of time trying to understand this behavior, I came across this GitHub issue.

At that point, I changed my approach: a separate Prometheus instance monitors the global cluster with its own label, and the global Prometheus uses it as just another remote read target. So the global cluster now includes two Prometheus servers, one which scrapes the cluster’s data and another which is responsible for remote reading all the other Prometheus servers. Now I just have to point Grafana to this global Prometheus, and it can show data from its own cluster as well as all the other clusters.
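
In configuration terms, the workaround looks roughly like the sketch below. The service names and addresses are placeholders; the point is that the aggregating Prometheus carries no external label that could conflict with any of its remote read targets:

# prometheus-local: runs in the global cluster, scrapes it, and owns its cluster label
global:
  external_labels:
    cluster_name: global-cluster
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

# prometheus-global: read-only aggregator with no cluster_name external label
remote_read:
  - url: http://prometheus-local.monitoring.svc:9090/api/v1/read
    name: global-cluster
  - url: http://prometheus.cluster-a.example.com/api/v1/read
    name: cluster-a
  - url: http://prometheus.cluster-b.example.com/api/v1/read
    name: cluster-b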

I originally published this in my newsletter.
