Node Error Codes and Troubleshooting

Understanding common errors and their solutions helps you keep your node healthy.

Common Error Codes

Here are the most frequent errors you might encounter and their solutions:

Consensus Errors

When you encounter consensus errors, quick and appropriate action is essential:


Error: "Consensus failure - height halted"
Solution: Check for network upgrades or chain halts
Command: seid status | jq '.SyncInfo // .sync_info'


Error: "Private validator file not found"
Solution: Restore validator key or check file permissions
Location: $HOME/.sei/config/priv_validator_key.json


Error: "Duplicate signature"
Solution: IMMEDIATELY STOP NODE - potential double signing risk
Action: Check whether the same validator key is also running on another machine
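
For the "Private validator file not found" case, the existence and permission check can be scripted. The following is a minimal sketch, not an official tool: `check_validator_key` is a hypothetical helper name, it assumes a Linux host with GNU `stat`, and treating any group or other access as too loose is an assumption rather than a documented Sei requirement.

```shell
# Hypothetical helper: report whether the validator key file exists,
# is readable, and is closed to group/other users.
check_validator_key() {
  local key_file="${1:-$HOME/.sei/config/priv_validator_key.json}"
  if [ ! -e "$key_file" ]; then
    echo "missing"
    return 1
  fi
  if [ ! -r "$key_file" ]; then
    echo "unreadable"
    return 1
  fi
  # GNU stat prints the octal mode, e.g. 600 (assumes a Linux host).
  local mode
  mode=$(stat -c '%a' "$key_file")
  case "$mode" in
    *00) echo "ok" ;;                                # no group/other access
    *)   echo "loose-permissions:$mode"; return 2 ;;
  esac
}
```

Calling `check_validator_key` with no argument checks the default location listed above; pass a path to check a key stored elsewhere.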

Network Errors

Network errors can prevent your node from participating in consensus:


Error: "Dial tcp connection refused"
Solution: Check network connectivity and firewall rules
Commands:
  - netstat -tulpn | grep seid
  - ufw status


Error: "No peers available"
Solution: Verify peer connections and network config
Commands:
  - curl localhost:26657/net_info

Database Errors

Database corruption can require immediate attention:


Error: "Database is corrupted"
Solution: Reset database or restore from backup
Commands:
  - seid tendermint unsafe-reset-all
  - cp -r backup/data $HOME/.sei/
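
When restoring from a backup, it can be safer to move the corrupted data directory aside than to overwrite it in place. The sketch below assumes the backup is a plain copy of the `data` directory and that the node runs as a systemd unit named `seid`; `restore_data_dir` and the `data.corrupt.<timestamp>` name are hypothetical.

```shell
# Hypothetical helper: replace the node's data directory with a backup copy,
# keeping the old directory instead of deleting it.
# Args: backup data directory, node home (defaults to $HOME/.sei).
restore_data_dir() {
  local backup_data="$1" node_home="${2:-$HOME/.sei}"
  if [ ! -d "$backup_data" ]; then
    echo "backup directory not found: $backup_data" >&2
    return 1
  fi
  # Move the current (possibly corrupted) data aside so it can be inspected later.
  if [ -d "$node_home/data" ]; then
    mv "$node_home/data" "$node_home/data.corrupt.$(date +%s)" || return 1
  fi
  cp -r "$backup_data" "$node_home/data"
}

# Typical sequence (the node must be stopped before touching the database):
#   sudo systemctl stop seid
#   restore_data_dir backup/data "$HOME/.sei"
#   sudo systemctl start seid
```

Keeping the old directory costs disk space, so remove it once the node is confirmed healthy.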

Diagnostic Commands

These commands help you investigate issues and monitor your node:


# Check node synchronization (the key is SyncInfo or sync_info, depending on the seid version)
seid status | jq '.SyncInfo // .sync_info'
 
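# Check validator status (the query takes the operator address, seivaloper...;
# <key-name> is a placeholder for your validator key in the keyring)
seid query staking validator $(seid keys show <key-name> --bech val -a)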
 
# Monitor real-time logs
journalctl -fu seid -o cat
 
# View system resource usage
top -p "$(pgrep -d, -x seid)"

AppHash Mismatch Errors

If you encounter an AppHash mismatch, you’ll need to capture the state for comparison with a known good version:


# For SeiDB (most non-archive nodes):
git clone https://github.com/sei-protocol/sei-db.git
cd sei-db/tools
make install
systemctl stop seid
seidb dump-iavl -d $HOME/.sei/data/committer.db -o /home/ubuntu/iavl-dump
systemctl start seid
 
# For Legacy IAVL DB:
seid debug dump-iavl <latest height>

Always include the app hash, commit hash, and block height from your logs when reporting issues.

Identifying AppHash Errors

AppHash errors typically appear in logs as:


ERR wrong Block.Header.AppHash. Expected [EXPECTED_HASH], got [ACTUAL_HASH]
block_id={"hash":"...","parts":{"hash":"...","total":1}} height=[HEIGHT]
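
To pull the hashes and height out of a log for a report, a small filter can help. This is a sketch that assumes the message layout shown above, with hex hashes and an optional `height=` field on the same line; the exact layout may differ between versions, and `extract_apphash_mismatches` is a hypothetical name.

```shell
# Hypothetical helper: read log lines on stdin and print one summary line
# per AppHash mismatch: "expected=<hash> got=<hash> height=<n|unknown>".
extract_apphash_mismatches() {
  awk '
    /wrong Block\.Header\.AppHash/ {
      want = ""; got = ""; h = "unknown"
      # "Expected " is 9 characters, "got " is 4, "height=" is 7.
      if (match($0, /Expected [0-9A-Fa-f]+/)) want = substr($0, RSTART + 9, RLENGTH - 9)
      if (match($0, /got [0-9A-Fa-f]+/))      got  = substr($0, RSTART + 4, RLENGTH - 4)
      if (match($0, /height=[0-9]+/))         h    = substr($0, RSTART + 7, RLENGTH - 7)
      print "expected=" want " got=" got " height=" h
    }'
}
```

For a node managed by systemd, one way to feed it is `journalctl -u seid --no-pager -o cat | extract_apphash_mismatches`.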

Common Causes:

Using incorrect node version during sync (ensure you’re on the latest version)
Corrupted or incorrectly applied snapshots
Database inconsistencies from improper shutdowns
Syncing with outdated or incompatible peers

Resolution Steps:

Stop the node immediately.
Try a node rollback first (see the Node Rollback section below).
If rollback fails, restore from a fresh snapshot:
- Download a recent snapshot from trusted providers (Polkachu, PublicNode)
- Ensure you’re using the correct node version
- Verify peer configurations are up to date
Restart the node and monitor logs for continued errors

Peer Connection Issues as AppHash Red Herrings

Important Note: Peer connection failures are often symptoms of underlying AppHash errors, not the root cause.

When you see extensive peer connection errors like:


ERR failed to handshake with peer
ERR failed to send request for peers
ERR peer handshake failed endpoint={} err=EOF

Don’t focus solely on fixing peer connections first. Instead:

Scan your logs carefully for AppHash errors that may appear intermittently

Look for the actual error pattern:


ERR wrong Block.Header.AppHash. Expected [HASH], got [HASH]

Check if your node is stuck at a specific height despite peer connection attempts

Why This Happens:

AppHash mismatches prevent proper block validation
Node cannot advance to new blocks due to state inconsistency
Peers may reject connections from nodes with corrupted state
Network appears to be the problem when it’s actually a local state issue

Debugging Approach:

First, check for AppHash errors in your logs (search for “wrong Block.Header.AppHash”)
If AppHash errors are found, treat this as the primary issue
Only focus on peer connection fixes if no AppHash errors exist

This approach can save hours of debugging time by addressing the root cause rather than symptoms.
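
The triage order above can be sketched as a small log classifier. It assumes the log has been saved to a file and that the error strings match the patterns quoted in this section; `triage_log` is a hypothetical helper, not part of seid.

```shell
# Hypothetical helper: classify a saved log file using the order described above.
# Prints "apphash", "peers", or "none".
triage_log() {
  local log_file="$1"
  if grep -q 'wrong Block\.Header\.AppHash' "$log_file"; then
    echo "apphash"   # primary issue: local state mismatch
  elif grep -qE 'failed to handshake with peer|peer handshake failed|failed to send request for peers' "$log_file"; then
    echo "peers"     # no AppHash errors, so peer configuration is worth checking
  else
    echo "none"
  fi
}
```

A log that contains both kinds of error is reported as `apphash`, which matches the advice to treat the state mismatch as the primary issue.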

Peer Connection and Handshake Issues

Identifying Peer Issues:

Look for these error patterns in your logs:


ERR failed to handshake with peer err="expected to connect with peer \"[EXPECTED_ID]\", got \"[ACTUAL_ID]\""
ERR failed to send request for peers err="no available peers to send a PEX request to (retrying)"
ERR peer handshake failed endpoint={} err=EOF module=p2p

Common Causes:

Outdated peer configurations with mismatched node IDs
Network infrastructure changes on peer side
Firewall blocking connections on port 26656
DNS resolution issues

Resolution Steps:

Update peer configurations with current node IDs:


# Example updated persistent peers for Sei mainnet
persistent-peers = "3be6b24cf86a5938cce7d48f44fb6598465a9924@p2p.state-sync-0.pacific-1.seinetwork.io:26656,b21279d7092fde2e41770832a1cacc7d0051e9dc@p2p.state-sync-1.pacific-1.seinetwork.io:26656,616c05e9ba24acc89c0de630b5e3adbedaebb478@p2p.state-sync-2.pacific-1.seinetwork.io:26656"

Verify network connectivity:


# Test connection to peer endpoints
nc -zv p2p.state-sync-0.pacific-1.seinetwork.io 26656
 
# Check if port 26656 is open for inbound connections
netstat -tulpn | grep :26656

Check current peer status:


curl http://localhost:26657/net_info | jq '.result.peers | length'
curl http://localhost:26657/lag_status | jq .
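
The single-endpoint `nc` test above can be repeated for every configured peer. The sketch below assumes the `id@host:port` comma-separated format shown in the persistent-peers example and an `nc` build that supports `-z` and `-w`; `list_peer_endpoints` and `probe_peers` are hypothetical helpers.

```shell
# Hypothetical helper: turn a persistent-peers value
# ("id@host:port,id@host:port") into "host port" lines.
list_peer_endpoints() {
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed -n 's/^[^@]*@\(.*\):\([0-9][0-9]*\)$/\1 \2/p'
}

# Probe each endpoint with a 5-second timeout (requires nc).
probe_peers() {
  list_peer_endpoints "$1" | while read -r host port; do
    if nc -z -w 5 "$host" "$port" 2>/dev/null; then
      echo "reachable   $host:$port"
    else
      echo "unreachable $host:$port"
    fi
  done
}
```

One way to supply the value is to read it from the config file, for example `probe_peers "$(grep '^persistent-peers' "$HOME/.sei/config/config.toml" | cut -d'"' -f2)"`.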

Sync Performance Issues

Identifying Sync Problems:

Monitor these indicators:


# Check sync status and lag
curl http://localhost:26657/lag_status | jq .
 
# Monitor if height is progressing
curl http://localhost:26657/status | jq '.result.sync_info'

Common Solutions:

Increase packet payload size for large block processing:


# In config.toml [p2p] section
max-packet-msg-payload-size = 1024000  # Increase from default 102400

Optimize mempool settings in config.toml:


# In [mempool] section
keep-invalid-txs-in-cache = true
ttl-duration = "5s"
ttl-num-blocks = 5

If the node gets stuck at a specific height:
- Try restarting the node
- If a restart doesn’t help, perform a rollback
- If the rollback doesn’t help either, consider restoring from a fresh snapshot

Warning Signs to Watch For:

Current height not increasing over time
Increasing lag between current height and max peer height
Repeated timeout errors in logs
Mempool size consistently reaching limits
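
The first two warning signs can be checked mechanically from two height samples. This is a rough sketch: `classify_sync`, `latest_height`, and the `LAG_THRESHOLD` default of 100 blocks are hypothetical, and the maximum peer height has to be supplied by the operator (for example from the `lag_status` output, whose field names are not shown here).

```shell
# Hypothetical helper: fetch the node's latest height from the local RPC.
latest_height() {
  curl -s http://localhost:26657/status | jq -r '.result.sync_info.latest_block_height'
}

# Hypothetical helper: classify progress between two height samples.
# Args: previous height, current height, max peer height.
# LAG_THRESHOLD (blocks) is an assumed tolerance, defaulting to 100.
classify_sync() {
  local prev="$1" cur="$2" peer_max="$3"
  if [ "$cur" -le "$prev" ]; then
    echo "stalled"    # height not increasing over time
  elif [ $((peer_max - cur)) -gt "${LAG_THRESHOLD:-100}" ]; then
    echo "lagging"    # advancing, but far behind peers
  else
    echo "ok"
  fi
}
```

A typical use is to call `latest_height`, wait a minute, call it again, and pass both values to `classify_sync` along with the peer height.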

Crash and Panic Debugging

For crashes, panics, or nil pointer exceptions:

Capture at least 1,000 lines of logs leading up to the crash
Or collect 15 minutes of log data, whichever provides more context
Include the full stack trace if available
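
The "whichever provides more context" rule above can be sketched with journald. This assumes the node runs as a systemd unit named `seid` and that the capture happens shortly after the crash, since `--since "15 minutes ago"` is relative to the current time; `larger_file` and `capture_crash_logs` are hypothetical helpers.

```shell
# Hypothetical helper: print whichever of two files has more lines
# (the first one wins a tie).
larger_file() {
  local a="$1" b="$2"
  if [ "$(wc -l < "$a")" -ge "$(wc -l < "$b")" ]; then
    echo "$a"
  else
    echo "$b"
  fi
}

# Collect both candidates from journald and report the larger one.
# Assumes a systemd unit named "seid".
capture_crash_logs() {
  local out_dir="${1:-.}"
  journalctl -u seid --no-pager -n 1000 > "$out_dir/seid-last-1000.log"
  journalctl -u seid --no-pager --since "15 minutes ago" > "$out_dir/seid-last-15m.log"
  larger_file "$out_dir/seid-last-1000.log" "$out_dir/seid-last-15m.log"
}
```

`capture_crash_logs /tmp` writes both files to `/tmp` and prints the path of the one to attach to a report.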

Logging Configuration

Proper logging configuration is essential for debugging and monitoring:


# In config.toml (keys use hyphens, like the other config.toml settings shown in this guide)
# Set appropriate log level
log-level = "debug"  # Use "trace" for maximum detail
 
# Choose log format
log-format = "json"  # Use "plain" for human-readable logs

Configure log rotation to manage storage effectively:


# Example logrotate configuration
sudo tee /etc/logrotate.d/seid << EOF
/var/log/seid/*.log {
    daily
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 sei sei
    sharedscripts
    postrotate
        systemctl reload seid
    endscript
}
EOF

Enable core dumps for crash analysis:


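# Set unlimited core dump size (applies to the current shell only;
# for a systemd-managed node, set LimitCORE=infinity in the unit file)
ulimit -c unlimited
 
# Configure core dump location (writing to /proc/sys requires root)
echo "/tmp/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern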

Other Common Issues and Fixes

Sync Problems
- Check available disk space (df -h)
- Ensure proper peer connections (curl http://localhost:26657/net_info)
- Verify firewall settings (port 26656 open)

Performance Issues
- Monitor system resources (htop or iotop)
- Check disk I/O performance (iostat)
- Analyze network traffic (iftop)

Database Issues
- Run database integrity checks using:
```
seid debug dump-db | grep -i error
```
  If errors are detected, consider restoring from a recent backup.
- Consider pruning excessive historical data by adjusting ss-keep-recent in app.toml. As a last resort, you can wipe the local chain data entirely; this deletes all blocks and state, so the node must resync afterwards:
```
seid unsafe-reset-all --home=$HOME/.sei --keep-addr-book
```
  Alternatively, manually remove old state snapshots to free up space:
```
rm -rf $HOME/.sei/data/snapshots/*
```

Node Rollback

To roll back a node from an AppHashed state, first stop the node in whichever way you prefer.

Next, do a soft rollback with:


seid rollback

And then a hard rollback with:


seid rollback --hard

Then, restart the node.

If you see the following error while trying to roll back:


failed to initialize database: resource temporarily unavailable

This means the node was not shut down properly and a seid process is probably still running. Shut down or kill the seid process directly. If that doesn’t help, restart your machine.

Then try the rollback steps again.
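
The "resource temporarily unavailable" error can be avoided by confirming that the process has really exited before rolling back. The sketch below assumes a systemd unit named `seid` and a 60-second grace period; `wait_for_exit` and `rollback_node` are hypothetical helpers, and extra arguments such as `--hard` are passed straight through to `seid rollback`.

```shell
# Hypothetical helper: wait until a PID has exited, up to a timeout in seconds.
# Returns 0 if the process is gone, 1 if it is still alive after the timeout.
wait_for_exit() {
  local pid="$1" timeout="${2:-30}" waited=0
  while kill -0 "$pid" 2>/dev/null; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Sketch of the sequence described above: stop, confirm exit, roll back.
# Usage: rollback_node          (soft rollback)
#        rollback_node --hard   (hard rollback)
rollback_node() {
  local pid
  pid=$(pgrep -x seid | head -n 1)
  sudo systemctl stop seid
  if [ -n "$pid" ] && ! wait_for_exit "$pid" 60; then
    echo "seid (pid $pid) is still running; not rolling back" >&2
    return 1
  fi
  seid rollback "$@"
}
```

After the rollback completes, start the node again and watch the logs for further AppHash errors.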