Performance, Monitoring and Alerting

Optimizing System Configuration

There is a virtually unlimited number of unique setups, and this document cannot cover them all. Even very similar builds and configurations can behave very differently due to external factors, so your results may vary.

Here are some general guidelines to use as a starting point. Be cautious: make incremental changes, then test and observe before moving forward. Always focus on one specific area at a time, and avoid changing memory, storage, and CPU configuration all at once; otherwise, diagnosing problems becomes nearly impossible.

Memory Management

The following settings in /etc/sysctl.conf can optimize memory usage and disk I/O patterns:

# Minimize swapping
vm.swappiness = 1

# Control disk write behavior
vm.dirty_background_ratio = 3
vm.dirty_ratio = 10
vm.dirty_expire_centisecs = 300
vm.dirty_writeback_centisecs = 100

Apply changes: sudo sysctl -p
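After applying changes, it can help to confirm that the running kernel actually reports the intended values. The following is a minimal sketch, assuming a file of `key = value` lines like the one above; `check_sysctl` and its optional second argument (an alternate root directory, used only for testing) are hypothetical names, not part of any Sei tooling.

```shell
#!/usr/bin/env bash
# Sketch: report whether each "key = value" line in a sysctl-style file
# matches the value the running kernel exposes under /proc/sys.
# check_sysctl and its optional second argument are hypothetical helpers.
check_sysctl() {
  local conf="$1" proc_root="${2:-/proc/sys}"
  local key want have path status=0
  local -a fields
  while IFS='=' read -r key want || [[ -n "$key" ]]; do
    key="${key//[[:space:]]/}"                        # strip whitespace from the key
    [[ -z "$key" || "$key" == \#* ]] && continue      # skip blank lines and comments
    read -r -a fields <<<"$want"; want="${fields[*]}" # normalize value spacing
    path="$proc_root/${key//.//}"                     # vm.swappiness -> vm/swappiness
    if [[ ! -r "$path" ]]; then
      echo "MISSING  $key"; status=1; continue
    fi
    read -r -a fields <"$path"; have="${fields[*]}"
    if [[ "$have" == "$want" ]]; then
      echo "OK       $key = $have"
    else
      echo "MISMATCH $key: want '$want', have '$have'"; status=1
    fi
  done <"$conf"
  return "$status"
}
```

Running `check_sysctl /etc/sysctl.conf` prints one line per setting and returns non-zero if any value differs from the file.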

Network Stack

The following settings in /etc/sysctl.conf may improve network performance:

# Increase connection handling capacity
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 32768
net.ipv4.tcp_max_syn_backlog = 16384

# Optimize buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
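One common way to sanity-check a maximum buffer size is the bandwidth-delay product: the number of bytes that can be in flight on a link at once. The sketch below is illustrative arithmetic only; `bdp_bytes` is a hypothetical helper, and the link speed and round-trip time are example inputs, not measurements of any Sei network.

```shell
#!/usr/bin/env bash
# bdp_bytes is a hypothetical helper: bandwidth-delay product in bytes.
# bytes = (Mbit/s * 1,000,000 / 8) * (RTT ms / 1000) = Mbit/s * RTT ms * 125
bdp_bytes() {
  local mbps="$1" rtt_ms="$2"
  echo $(( mbps * rtt_ms * 125 ))
}

# Example: a 1 Gbit/s link with a 100 ms round-trip time
bdp_bytes 1000 100   # prints 12500000
```

In this example the result (12,500,000 bytes) fits under the 16777216-byte maximum configured above; a faster or higher-latency path may call for a different ceiling.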

Storage Configuration

For NVMe drives, optimize I/O scheduling:

Storage Optimization Commands

# Set IO scheduler (writing to /sys requires root, so pipe through sudo tee)
echo "none" | sudo tee /sys/block/nvme0n1/queue/scheduler

# Set read-ahead buffer
sudo blockdev --setra 4096 /dev/nvme0n1

# Set IO priority in systemd service
sudo tee -a /etc/systemd/system/seid.service << EOF
[Service]
IOSchedulingClass=realtime
IOSchedulingPriority=2
EOF

# Configure disk mount options
sudo tee -a /etc/fstab << EOF
/dev/nvme0n1p1 /data ext4 defaults,noatime,nosuid,nodev,noexec,commit=60 0 0
EOF

Infrastructure Monitoring

Monitoring is one of the most critical components of network infrastructure. This section covers monitoring, performance tuning, and alerting configuration for Cosmos-SDK/Tendermint nodes.

Prometheus Setup

First, install Prometheus:

wget https://github.com/prometheus/prometheus/releases/download/v2.42.0/prometheus-2.42.0.linux-amd64.tar.gz
tar xvf prometheus-2.42.0.linux-amd64.tar.gz

Example Prometheus configuration:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'sei_node'
    static_configs:
      - targets: ['node1_ip:port']
    metrics_path: /metrics
  - job_name: 'node'
    static_configs:
      - targets: ['node2_ip:port']

Grafana Integration

Install and configure Grafana:

sudo apt install -y apt-transport-https software-properties-common
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt update && sudo apt-get install grafana

Sample Grafana Dashboard JSON

{ "annotations": { "list": [ { "builtIn": 1, "datasource": "-- Grafana --", "enable": true, "hide": true, "iconColor": "rgba(0, 211, 255, 1)", "name": "Annotations & Alerts", "type": "dashboard" } ] }, "editable": true, "gnetId": null, "graphTooltip": 0, "id": 1, "links": [], "panels": [ { "alerting": {}, "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": null, "fieldConfig": { "defaults": { "custom": {} }, "overrides": [] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }, "hiddenSeries": false, "id": 2, "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.0", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "tendermint_consensus_height", "interval": "", "legendFormat": "", "refId": "A" } ], "thresholds": [], "timeRegions": [], "title": "Block Height", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } } ], "schemaVersion": 26, "style": "dark", "tags": [], "templating": { "list": [] }, "time": { "from": "now-6h", "to": "now" }, "timepicker": {}, "timezone": "", "title": "Sei Node Metrics", "uid": "sei_metrics", "version": 1 }

Alert Management

Install Alertmanager:

wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar xvf alertmanager-0.25.0.linux-amd64.tar.gz

Create Alert Rules Configuration

groups:
  - name: validator_alerts
    rules:
      - alert: NodeDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Node {{ $labels.instance }} down'

      - alert: BlockProductionSlow
        expr: rate(tendermint_consensus_height[5m]) < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'Block production is slow on {{ $labels.instance }}'

      - alert: ValidatorMissedBlocks
        expr: increase(tendermint_consensus_validator_missed_blocks[1h]) > 0
        labels:
          severity: critical
        annotations:
          summary: 'Validator missing blocks'

      - alert: ValidatorJailed
        expr: tendermint_consensus_validator_status == 0
        labels:
          severity: critical
        annotations:
          summary: 'Validator has been jailed'

      - alert: ConsensusStalled
        expr: tendermint_consensus_height_status == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'Consensus has stalled'
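To get a feel for the `BlockProductionSlow` threshold, it may help to translate it into block counts. The sketch below approximates what `rate(tendermint_consensus_height[5m])` measures, namely height gained per second, using two height samples. `blocks_per_second` is a hypothetical helper and the heights are made-up inputs; Prometheus also extrapolates at window edges, so real values will differ slightly.

```shell
#!/usr/bin/env bash
# blocks_per_second is a hypothetical helper: height gained per second
# between two samples taken a known number of seconds apart.
blocks_per_second() {
  local start_height="$1" end_height="$2" seconds="$3"
  awk -v a="$start_height" -v b="$end_height" -v s="$seconds" \
    'BEGIN { printf "%.3f\n", (b - a) / s }'
}

# 30 blocks in 300 seconds sits exactly on the 0.1 threshold
blocks_per_second 1000 1030 300    # prints 0.100
# 750 blocks in 300 seconds is well above it
blocks_per_second 1000 1750 300    # prints 2.500
```

Put differently, the rule fires only when a node gains fewer than about 30 blocks over five minutes, and stays in that state for a further five minutes.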

Log Management

Loki Setup

Using Loki for log aggregation:

wget https://github.com/grafana/loki/releases/download/v2.8.0/loki-linux-amd64.zip
unzip loki-linux-amd64.zip

Promtail Configuration

server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: sei_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: seid_logs
          __path__: /var/log/seid/*.log

Log Rotation

Configure logrotate to manage log files:

sudo tee /etc/logrotate.d/sei << EOF
/var/log/sei/*.log {
    daily
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 sei sei
    sharedscripts
    postrotate
        systemctl reload seid
    endscript
}
EOF

Security Configuration

Network Security

UFW firewall configuration:

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 26656/tcp comment 'Sei P2P'
sudo ufw allow 26657/tcp comment 'Sei RPC'
sudo ufw allow 9090/tcp comment 'Sei gRPC'
sudo ufw enable

Rate Limiting

Example Nginx Configuration with Rate Limiting

http {
    limit_req_zone $binary_remote_addr zone=sei_rpc:10m rate=10r/s;

    server {
        listen 26657;

        location / {
            limit_req zone=sei_rpc burst=20 nodelay;
            proxy_pass http://localhost:26657;
        }
    }
}

Validator-Specific Monitoring

Status Query

Query validator status through SDK:

seid query staking validator $(seid keys show --bech val -a <validator_keyfile_name>)

Query through REST API:

curl -s "http://localhost:1317/cosmos/staking/v1beta1/validators/<valoper_address>"

Validator “Status” Query Script

#!/bin/bash
MONIKER="$1"
API_URL="http://localhost:1317/cosmos/staking/v1beta1/validators?pagination.limit=500"

echo "Querying validators from $API_URL..."

# Select the validator whose moniker matches the first argument
VALIDATOR_DATA=$(curl -s "$API_URL" | jq -c --arg MONIKER "$MONIKER" '.validators[] | select(.description.moniker == $MONIKER)')

if [[ -z "$VALIDATOR_DATA" ]]; then
  echo "❌ No validator found with moniker: $MONIKER"
  exit 1
fi

echo "Validator details:"
echo "$VALIDATOR_DATA" | jq '.'

Critical Metrics

Monitor these validator-specific metrics:

# Check signing status
seid query slashing signing-info $(seid tendermint show-validator)

# Check current delegations (delegations-to expects the validator operator address)
seid query staking delegations-to $(seid keys show --bech val -a $VALIDATOR_KEY)

Oracle Price Feeder Monitoring

The price feeder exposes metrics at <listen_addr>/api/v1/metrics when telemetry is enabled in config.toml. Health status is available at <listen_addr>/api/v1/healthz.
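The health endpoint lends itself to a simple scripted check. The sketch below is one possible approach, assuming that an HTTP 200 response means the feeder is healthy (the response body is not inspected); `classify_status` and `check_feeder` are hypothetical helper names, and the base URL must be replaced with your own `<listen_addr>`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for polling the price feeder health endpoint.

# Map an HTTP status code to a label.
# Treating only 200 as healthy is an assumption of this sketch.
classify_status() {
  if [[ "$1" == "200" ]]; then echo "healthy"; else echo "unhealthy"; fi
}

# Request <base_url>/api/v1/healthz and log a timestamped result.
check_feeder() {
  local base_url="$1" code
  # -o /dev/null discards the body; -w prints only the status code.
  # curl reports 000 when the connection itself fails.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base_url/api/v1/healthz")"
  echo "$(date -u +%FT%TZ) price-feeder $(classify_status "$code") (HTTP $code)"
  [[ "$code" == "200" ]]
}
```

A call such as `check_feeder "http://<listen_addr>" || echo "alert"` could be run from cron or a systemd timer; the non-zero return makes it easy to chain a notification command.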

Backup Management

Complete Automated Backup Script

#!/bin/bash
BACKUP_DIR="/backup/sei"
DATE=$(date +%Y%m%d)
NODE_HOME="/root/.sei"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Stop service
systemctl stop seid

# Backup configuration
tar czf "$BACKUP_DIR/sei-config-$DATE.tar.gz" "$NODE_HOME/config"

# Backup data directory
tar czf "$BACKUP_DIR/sei-data-$DATE.tar.gz" "$NODE_HOME/data"

# Backup key files
tar czf "$BACKUP_DIR/sei-keys-$DATE.tar.gz" "$NODE_HOME/keyring-file"

# Start service
systemctl start seid

# Remove backups older than 7 days
find "$BACKUP_DIR" -type f -mtime +7 -name '*.tar.gz' -delete

# Log backup completion
echo "Backup completed successfully on $(date)" >> "$BACKUP_DIR/backup.log"
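A backup is only useful if it can be read back. As a minimal sketch, the helpers below list each archive without extracting it, which fails when the gzip stream or tar structure is damaged. `verify_backup` and `verify_all` are hypothetical names, and a clean listing does not prove the contents are complete, only that the archive is readable.

```shell
#!/usr/bin/env bash
# verify_backup is a hypothetical helper: it lists the archive without
# extracting it, which fails if the gzip stream or tar structure is damaged.
verify_backup() {
  local archive="$1"
  if tar -tzf "$archive" >/dev/null 2>&1; then
    echo "OK      $archive"
  else
    echo "CORRUPT $archive"
    return 1
  fi
}

# Verify every .tar.gz in a directory; returns non-zero if any archive fails.
verify_all() {
  local dir="$1" status=0 f
  for f in "$dir"/*.tar.gz; do
    [[ -e "$f" ]] || continue      # directory contains no archives
    verify_backup "$f" || status=1
  done
  return "$status"
}
```

For example, `verify_all /backup/sei` could run at the end of the backup script, before old archives are pruned.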

Host System Monitoring

Resource Usage Tracking

Install and configure node_exporter:

wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xvf node_exporter-1.5.0.linux-amd64.tar.gz

Add to Prometheus configuration:

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
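Before building dashboards, a quick way to confirm that node_exporter is serving useful data is to read its metrics endpoint directly. The sketch below computes the percentage of free space on one mount point from the `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` series; `fs_avail_pct` is a hypothetical helper, and it assumes the metrics text arrives on standard input.

```shell
#!/usr/bin/env bash
# fs_avail_pct is a hypothetical helper: reads node_exporter metrics text on
# stdin and prints the percentage of space available on the given mount point.
# Exits non-zero if no matching filesystem size is found.
fs_avail_pct() {
  local mount="$1"
  awk -v mp="mountpoint=\"${mount}\"" '
    # Sample lines start with the metric name; HELP/TYPE lines start with "#"
    index($0, "node_filesystem_avail_bytes{") == 1 && index($0, mp) { avail = $NF + 0 }
    index($0, "node_filesystem_size_bytes{")  == 1 && index($0, mp) { size  = $NF + 0 }
    END {
      if (size > 0) printf "%.1f\n", 100 * avail / size
      else exit 1
    }'
}
```

With node_exporter listening on the port used above, `curl -s localhost:9100/metrics | fs_avail_pct /data` prints a single number such as `25.0`.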

Performance Testing

Example Benchmark Script using eth_getLogs

import { ethers } from 'ethers';

// Configuration
const EVM_RPC_URL = 'http://localhost:8545'; // EVM RPC endpoint to test
const CONTRACT_ADDRESS = '0x0000000000000000000000000000000000001002'; // replace with very active contract for best results
const INITIAL_BLOCK_RANGE = 50; // range of blocks to query using 'eth_getLogs'
const RANGE_INCREMENT = 10; // additional blocks to query each consecutive round
const MAX_TESTS = 50; // total number of rounds for testing

// Store metrics for final analysis
const metrics = [];

function getResponseSize(logs) {
  return Buffer.byteLength(JSON.stringify(logs), 'utf8');
}

function formatBytes(bytes) {
  if (bytes === 0) return '0 B';
  const k = 1024;
  const sizes = ['B', 'KB', 'MB', 'GB'];
  const i = Math.floor(Math.log(bytes) / Math.log(k));
  return `${parseFloat((bytes / Math.pow(k, i)).toFixed(2))} ${sizes[i]}`;
}

function padString(str, length) {
  return String(str).padEnd(length);
}

function analyzeResults(metrics) {
  console.log('\nPerformance Analysis');
  console.log('='.repeat(50));

  // Filter out queries with no logs for meaningful statistics
  const queriesWithLogs = metrics.filter((m) => m.logsCount > 0);
  const totalQueries = metrics.length;

  console.log(`\nGeneral Statistics:`);
  console.log(`Total Queries Run: ${totalQueries}`);
  console.log(`Queries with Logs: ${queriesWithLogs.length}`);
  console.log(`Empty Responses: ${totalQueries - queriesWithLogs.length}`);

  if (queriesWithLogs.length > 0) {
    const avgResponseTime = queriesWithLogs.reduce((acc, m) => acc + m.responseTime, 0) / queriesWithLogs.length;
    const avgLogsPerQuery = queriesWithLogs.reduce((acc, m) => acc + m.logsCount, 0) / queriesWithLogs.length;
    const maxLogs = Math.max(...queriesWithLogs.map((m) => m.logsCount));
    const maxLogsQuery = queriesWithLogs.find((m) => m.logsCount === maxLogs);

    console.log(`\nPerformance Metrics:`);
    console.log(`Average Response Time (with logs): ${avgResponseTime.toFixed(2)}ms`);
    console.log(`Average Logs per Query: ${avgLogsPerQuery.toFixed(2)}`);
    console.log(`Maximum Logs in Single Query: ${maxLogs}`);
    if (maxLogsQuery) {
      console.log(`- At Range Size: ${maxLogsQuery.rangeSize} blocks`);
      console.log(`- Response Time: ${maxLogsQuery.responseTime}ms`);
      console.log(`- Efficiency: ${maxLogsQuery.logsPerMs.toFixed(3)} logs/ms`);
    }

    // Identify optimal range size based on logs/ms
    const bestEfficiency = queriesWithLogs.reduce((best, m) => (m.logsPerMs > best.logsPerMs ? m : best));
    console.log(`\nOptimal Performance:`);
    console.log(`Best Efficiency: ${bestEfficiency.logsPerMs.toFixed(3)} logs/ms`);
    console.log(`- At Range Size: ${bestEfficiency.rangeSize} blocks`);
    console.log(`- Retrieved ${bestEfficiency.logsCount} logs in ${bestEfficiency.responseTime}ms`);
  }
}

async function testEthGetLogs() {
  const provider = new ethers.JsonRpcProvider(EVM_RPC_URL);

  try {
    const latestBlock = await provider.getBlockNumber();
    console.log(`Latest block: ${latestBlock} (0x${latestBlock.toString(16)})`);

    let currentToBlock = latestBlock;
    let currentRange = INITIAL_BLOCK_RANGE;
    let testCount = 0;

    // Column headers, padded to the same fixed widths as the data rows below
    console.log(
      '\n' +
        padString('Block Range', 17) +
        padString('Time', 6) +
        padString('Logs', 8) +
        padString('Size', 9) +
        padString('B/ms', 8) +
        padString('Logs/ms', 9) +
        padString('KB/Log', 8) +
        'Range'
    );
    console.log('='.repeat(80));

    while (testCount < MAX_TESTS && currentToBlock > 0) {
      const fromBlock = Math.max(0, currentToBlock - currentRange);

      try {
        const startTime = Date.now();
        const filter = {
          fromBlock: fromBlock,
          toBlock: currentToBlock,
          address: CONTRACT_ADDRESS
        };
        const logs = await provider.getLogs(filter);
        const endTime = Date.now();

        const responseTime = endTime - startTime;
        const logsCount = logs.length;
        const responseSize = getResponseSize(logs);

        // Calculate metrics
        const bytesPerMs = (responseSize / responseTime).toFixed(1);
        const logsPerMs = (logsCount / responseTime).toFixed(3);
        const kbPerLog = logsCount > 0 ? (responseSize / 1024 / logsCount).toFixed(2) : 'N/A';

        // Store metrics for analysis
        metrics.push({
          rangeSize: currentRange,
          responseTime,
          logsCount,
          responseSize,
          bytesPerMs: parseFloat(bytesPerMs),
          logsPerMs: parseFloat(logsPerMs),
          kbPerLog: kbPerLog !== 'N/A' ? parseFloat(kbPerLog) : 0
        });

        // Format block range
        const rangeDisplay = `${fromBlock.toString(16)}-${currentToBlock.toString(16)}`;

        // Log with fixed column widths
        console.log(
          padString(rangeDisplay, 17) +
            padString(responseTime, 6) +
            padString(logsCount, 8) +
            padString(formatBytes(responseSize), 9) +
            padString(bytesPerMs, 8) +
            padString(logsPerMs, 9) +
            padString(kbPerLog, 8) +
            currentRange
        );

        if (logsCount === 10000) {
          console.log(`\nWarning: Hit 10000 log limit at range ${currentRange}`);
        }

        currentToBlock = fromBlock - 1;
        currentRange += RANGE_INCREMENT;
        testCount++;
      } catch (error) {
        console.log(`Error at range ${currentRange}: ${error.message}`);
        currentRange = Math.max(INITIAL_BLOCK_RANGE, currentRange - RANGE_INCREMENT);
        currentToBlock = fromBlock - 1;
        testCount++;
      }

      // Pause between rounds to avoid hammering the endpoint
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }

    // Perform final analysis
    analyzeResults(metrics);
  } catch (error) {
    console.error('Failed to initialize or get latest block:', error);
    process.exit(1);
  }
}

// Run the test
testEthGetLogs();

For specific customizations or additional metrics, consult the Sei technical communities in Telegram or Discord.