Prometheus Monitoring and System Optimization

Prepared by: Anwer Sadath Abdul Muttaliff

Project Overview

This project demonstrates how to set up Prometheus, Grafana, and Node Exporter for system monitoring, perform stress testing using stress-ng, and optimize system performance through kernel tuning and swap management.

Step 1: Understanding the Tools

1.1 What is Prometheus?

Prometheus is an open-source monitoring and alerting toolkit. It collects metrics (e.g., CPU usage, memory usage) from your system and stores them in a time-series database. It uses a pull-based model, meaning it periodically scrapes metrics from targets (e.g., Node Exporter).

1.2 What is Grafana?

Grafana is an open-source visualization tool. It connects to data sources (e.g., Prometheus) and renders their metrics as dashboards that refresh in near real time. Dashboards are highly customizable, and Grafana can also evaluate alert rules.

1.3 What is Node Exporter?

Node Exporter is a Prometheus exporter: a standalone agent, not a plugin, that collects system-level metrics (e.g., CPU, memory, disk, network) from Linux machines. It exposes these metrics over HTTP in a text format that Prometheus can scrape.

1.4 What is stress-ng?

stress-ng is a tool to stress-test your system by simulating high CPU, memory, disk, or I/O load. It helps you understand how your system behaves under pressure.

Step 2: Setting Up the Environment

2.1 Launch Linux Instances

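Launch two Linux instances. The Prometheus configuration in Step 3 uses the following IP addresses:

192.168.1.88  first instance (Prometheus, Grafana, Node Exporter)
192.168.1.87  second instance (Node Exporter only)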

Configure the firewall on the first instance, then reload firewalld so the permanent rules take effect:

sudo firewall-cmd --permanent --add-port=9090/tcp  # Prometheus
sudo firewall-cmd --permanent --add-port=3000/tcp  # Grafana
sudo firewall-cmd --permanent --add-port=9100/tcp  # Node Exporter
sudo firewall-cmd --reload
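
To confirm that the ports are actually open, the output of firewall-cmd --list-ports can be compared against the ports this setup needs. The sketch below is one way to do that; missing_ports is a hypothetical helper name, not part of firewalld.

```shell
# missing_ports LISTED WANTED...
# LISTED is the space-separated output of `firewall-cmd --list-ports`.
# Prints every WANTED port (e.g. 9100/tcp) that is absent from LISTED.
# (missing_ports is a hypothetical helper, not part of firewalld.)
missing_ports() {
    listed=" $1 "
    shift
    for port in "$@"; do
        case "$listed" in
            *" $port "*) ;;                 # already open
            *) printf '%s\n' "$port" ;;     # not open yet
        esac
    done
}

# Usage on the first instance (needs firewalld and root privileges):
#   missing_ports "$(sudo firewall-cmd --list-ports)" 9090/tcp 3000/tcp 9100/tcp
```

Empty output means every required port is listed. On the second instance only 9100/tcp needs to be checked.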

On the second instance, install Node Exporter and configure the firewall:

sudo firewall-cmd --permanent --add-port=9100/tcp
sudo firewall-cmd --reload

2.2 Install Required Tools

Update the system and install stress-ng and wget:

sudo yum update -y
sudo yum install -y stress-ng wget

Step 3: Install and Configure Prometheus

3.1 Download and Install Prometheus

Download Prometheus:

wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz

Extract and move Prometheus to /opt/prometheus:

tar -xvzf prometheus-2.47.0.linux-amd64.tar.gz
sudo mv prometheus-2.47.0.linux-amd64 /opt/prometheus

3.2 Configure Prometheus

Create a prometheus.yml configuration file in /opt/prometheus, replacing the sample file shipped in the archive:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node-localhost'
    static_configs:
      - targets: ['192.168.1.88:9100']

  - job_name: 'node-remote'
    static_configs:
      - targets: ['192.168.1.87:9100']
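
Before starting Prometheus, it can be useful to check that every configured target is reachable. The sketch below extracts the host:port pairs from the configuration; list_targets is a hypothetical helper, and the /opt/prometheus path assumes the file was created in the installation directory.

```shell
# list_targets: read a prometheus.yml on stdin and print each host:port
# found on a "targets:" line (hypothetical helper; handles only simple
# static_configs entries like the ones above).
list_targets() {
    grep 'targets:' | grep -oE '[A-Za-z0-9._-]+:[0-9]+'
}

# Usage: probe every configured exporter from the Prometheus host.
#   list_targets < /opt/prometheus/prometheus.yml | while read -r target; do
#       if curl -sf "http://$target/metrics" > /dev/null; then
#           echo "$target reachable"
#       else
#           echo "$target NOT reachable"
#       fi
#   done
#
# The release archive also includes promtool for syntax checking:
#   /opt/prometheus/promtool check config /opt/prometheus/prometheus.yml
```

The targets will only respond once Node Exporter is running on both instances (Step 4).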

Start Prometheus from its installation directory:

cd /opt/prometheus
./prometheus --config.file=prometheus.yml &

Verify Prometheus is running by accessing http://<IP>:9090.
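
The same check can be scripted against the Prometheus HTTP API, which reports the health of each scrape target. This is a minimal sketch; count_health is a hypothetical helper that matches text in the response rather than parsing the JSON properly.

```shell
# count_health JSON STATE: count targets in the /api/v1/targets response
# whose health field equals STATE (up, down, or unknown).
# Hypothetical helper; it matches text rather than parsing JSON.
count_health() {
    printf '%s' "$1" | grep -o "\"health\":\"$2\"" | wc -l | tr -d ' '
}

# Usage on the Prometheus host:
#   curl -s http://localhost:9090/-/healthy        # liveness check
#   count_health "$(curl -s http://localhost:9090/api/v1/targets)" up
```

With the two targets configured above, the second command should print 2 once Node Exporter is running on both instances and has been scraped.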

[Screenshot: Prometheus web interface]

Step 4: Install and Configure Node Exporter

4.1 Download and Install Node Exporter

Download Node Exporter:

wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz

Extract and move Node Exporter to /usr/local/bin:

tar -xvzf node_exporter-1.6.1.linux-amd64.tar.gz
sudo mv node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/

4.2 Start Node Exporter

Run Node Exporter on each instance. It does not need root privileges, so start it as a regular user:

/usr/local/bin/node_exporter &

Verify Node Exporter is running by accessing http://<IP>:9100/metrics.
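
The /metrics page is plain text with one sample per line, so a single value can be pulled out on the command line. The sketch below assumes the samples carry no trailing timestamp, which is how Node Exporter normally emits them; metric_value is a hypothetical helper name.

```shell
# metric_value NAME: read exposition-format text on stdin and print the
# value of the first sample whose metric name is exactly NAME.
# Assumes the value is the last field on the line (no timestamp).
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                       # skip HELP and TYPE comments
        {
            metric = $1
            sub(/[{].*/, "", metric)        # drop the {label="..."} part
            if (metric == name) { print $NF; exit }
        }'
}

# Usage (replace <IP> with the address of the instance):
#   curl -s http://<IP>:9100/metrics | metric_value node_load1
```

node_load1 is the one-minute load average exported by Node Exporter; any other metric name from the page can be substituted.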

[Screenshot: Node Exporter metrics]

Step 5: Install and Configure Grafana

5.1 Download and Install Grafana

Download Grafana:

wget https://dl.grafana.com/oss/release/grafana-10.1.5.linux-amd64.tar.gz

Extract and move Grafana to /opt/grafana:

tar -xvzf grafana-10.1.5.linux-amd64.tar.gz
sudo mv grafana-10.1.5 /opt/grafana

5.2 Start Grafana

Run Grafana:

cd /opt/grafana
./bin/grafana-server &

Access Grafana at http://<IP>:3000 and log in with the default credentials (admin/admin). Grafana prompts you to set a new password on first login.

5.3 Add Prometheus as a Data Source

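Open the data source list (Connections > Data sources in Grafana 10; Configuration > Data Sources in earlier versions), add Prometheus, and set the URL to http://localhost:9090. localhost works here because Grafana and Prometheus run on the same instance.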
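
The same data source can also be created through Grafana's HTTP API instead of the UI. The sketch below builds the request body; datasource_json is a hypothetical helper, and the admin:admin credentials in the usage lines assume the default password is still in place.

```shell
# datasource_json NAME URL: print the JSON body that describes a
# Prometheus data source for Grafana's POST /api/datasources endpoint.
# (Hypothetical helper; NAME and URL must not contain double quotes.)
datasource_json() {
    printf '{"name":"%s","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1" "$2"
}

# Usage on the Grafana host:
#   datasource_json Prometheus http://localhost:9090 |
#     curl -s -u admin:admin -H 'Content-Type: application/json' \
#          -X POST --data @- http://localhost:3000/api/datasources
```

If the admin password was changed at first login, substitute the new one in the curl command.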

[Screenshot: Add Prometheus data source]

5.4 Import a Dashboard

Import the Node Exporter Full dashboard (ID: 1860) and select the Prometheus data source.

[Screenshot: Grafana dashboard]

Step 6: Stress Testing with stress-ng

6.1 Simulate High CPU Usage

Run stress-ng with four CPU workers for five minutes (set --cpu to the machine's core count to load every core):

stress-ng --cpu 4 --timeout 300s

Observe the impact on CPU usage in Grafana.
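
CPU usage figures like these are derived from the node_cpu_seconds_total counter, which accumulates the seconds each core spends in each mode. A commonly used PromQL expression is 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100; whether a given dashboard panel uses exactly this form is an assumption. The sketch below works through the same arithmetic by hand, with cpu_busy_pct as a hypothetical helper.

```shell
# cpu_busy_pct IDLE_START IDLE_END SECONDS CORES
# IDLE_START and IDLE_END are idle-mode counter readings summed over all
# cores, taken SECONDS apart. Prints the busy percentage for the interval.
cpu_busy_pct() {
    awk -v a="$1" -v b="$2" -v s="$3" -v n="$4" \
        'BEGIN { printf "%.1f\n", 100 * (1 - (b - a) / (s * n)) }'
}

# Example: 4 cores, 15 s between samples, idle counters grew by 6 s in total.
# Idle fraction = 6 / (15 * 4) = 0.1, so the machine was 90% busy.
cpu_busy_pct 1000 1006 15 4    # prints 90.0
```

While stress-ng runs with one worker per core, the idle counters barely advance and the result approaches 100.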

[Screenshots: stress test results before and after the sysctl adjustment]

Step 7: Optimize System Performance

7.1 Tune Kernel Parameters

Edit /etc/sysctl.conf:

sudo nano /etc/sysctl.conf

Add the following lines:

# Increase the number of open files
fs.file-max = 100000

# Optimize network performance
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_max_syn_backlog = 1024
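
To compare these settings with what the kernel is currently using, the file can be parsed into key/value pairs and each key looked up with sysctl -n. This is a minimal sketch; conf_pairs is a hypothetical helper that handles only simple key = value lines.

```shell
# conf_pairs: read sysctl.conf-style text on stdin and print one
# "key value" pair per setting, skipping comments and blank lines.
conf_pairs() {
    awk -F= '
        /^[ \t]*([#;]|$)/ { next }          # comment or blank line
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)          # keys contain no whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print key, val
        }'
}

# Usage: show the configured value next to the running value.
#   conf_pairs < /etc/sysctl.conf | while read -r key want; do
#       printf '%s: want %s, running %s\n' "$key" "$want" "$(sysctl -n "$key")"
#   done
```

Running the loop before and after applying the file shows which values actually changed.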

Apply the changes:

sudo sysctl -p

7.2 Increase Swap Space

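Add a 1000 MiB swap file so the kernel can page out memory instead of killing processes when RAM runs short. Swap is much slower than RAM, so it adds headroom rather than speed:

sudo dd if=/dev/zero of=/root/swapblk bs=1M count=1000
sudo chmod 0600 /root/swapblk   # restrict access before enabling the file
sudo mkswap /root/swapblk
sudo swapon /root/swapblk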

Verify the new swap space:

free -m
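
The swap total can also be read directly from /proc/meminfo, which is where free gets its numbers. The sketch below converts the kernel's kB figure to MiB; swap_total_mib is a hypothetical helper, and the fstab line in the comments is one common way to keep the swap file active across reboots, not something the steps above configure.

```shell
# swap_total_mib: read /proc/meminfo-style text on stdin and print the
# total swap size in whole MiB.
swap_total_mib() {
    awk '/^SwapTotal:/ { printf "%d\n", $2 / 1024 }'
}

# Usage:
#   swap_total_mib < /proc/meminfo
#   swapon --show                      # lists each active swap area
#
# swapon only lasts until the next reboot. To make the file permanent,
# an entry like this can be appended to /etc/fstab:
#   /root/swapblk none swap sw 0 0
```

For the 1000 MiB file created above, the reported total is slightly below 1000 because mkswap reserves the first page of the file for its header.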