# Monitoring and Maintenance Guide

This guide provides comprehensive instructions for monitoring and maintaining the Multi-Tenant SaaS Platform in production environments.

## Overview

Effective monitoring and maintenance are crucial for ensuring the reliability, performance, and security of your Multi-Tenant SaaS Platform. This guide covers monitoring tools, maintenance procedures, and best practices for Malaysian SME deployments.

## Monitoring Architecture

### Components to Monitor

1. **Application Layer**: Django backend, React frontend
2. **Database Layer**: PostgreSQL with multi-tenant schemas
3. **Cache Layer**: Redis for caching and sessions
4. **Infrastructure Layer**: Server resources, network, storage
5. **Business Layer**: User activity, transactions, performance metrics

### Monitoring Stack

- **Prometheus**: Metrics collection and storage
- **Grafana**: Visualization and dashboards
- **Alertmanager**: Alerting and notifications
- **Elasticsearch**: Log aggregation and search
- **Kibana**: Log visualization and analysis

## Quick Setup

### 1. Install Monitoring Stack

```bash
# Create monitoring directory
mkdir -p /opt/monitoring
cd /opt/monitoring

# Create docker-compose.yml for monitoring
cat > docker-compose.yml << 'EOF'
version: '3.8'

services:
  # Prometheus
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  # Grafana
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/var/lib/grafana/dashboards
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=your-secure-password
    networks:
      - monitoring

  # Alertmanager
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager
    networks:
      - monitoring

  # Node Exporter
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      # Recent node-exporter releases renamed this flag; on older versions
      # use --collector.filesystem.ignored-mount-points instead.
      # ($$ escapes $ for docker-compose variable interpolation.)
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

  # PostgreSQL Exporter
  postgres-exporter:
    image: prometheuscommunity/postgres-exporter:latest
    ports:
      - "9187:9187"
    environment:
      # Note: inside the container, "localhost" is the container itself.
      # Replace it with an address reachable from the container (the host's
      # IP, or host.docker.internal where supported).
      - DATA_SOURCE_NAME=postgresql://multi_tenant_prod_user:your-password@localhost:5432/multi_tenant_saas_prod?sslmode=disable
    networks:
      - monitoring

  # Redis Exporter
  redis-exporter:
    image: oliver006/redis_exporter:latest
    ports:
      - "9121:9121"
    environment:
      # Same caveat as above: use an address reachable from the container.
      - REDIS_ADDR=redis://localhost:6379
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

networks:
  monitoring:
    driver: bridge
EOF
```
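Once the stack is running (see step 5 below), it is worth confirming that each exporter actually serves metrics before investigating dashboards. A minimal smoke test, assuming the default ports from the compose file above:

```bash
# Each endpoint should return plain-text Prometheus metrics
for port in 9100 9187 9121; do
  echo "--- port $port ---"
  curl -sf "http://localhost:$port/metrics" | head -n 5 || echo "no response on port $port"
done
```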
### 2. Configure Prometheus

```bash
# Create Prometheus configuration
cat > prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # The exporters run in the same compose network, so Prometheus reaches
  # them by service name, not localhost.
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'postgres-exporter'
    static_configs:
      - targets: ['postgres-exporter:9187']

  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['redis-exporter:9121']

  # The Django app and Nginx run on the host; replace host-ip with an
  # address reachable from the Prometheus container.
  - job_name: 'django-app'
    static_configs:
      - targets: ['host-ip:8000']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Note: nginx's stub_status page is not in Prometheus format; in practice
  # you would scrape it through nginx-prometheus-exporter.
  - job_name: 'nginx'
    static_configs:
      - targets: ['host-ip:80']
    metrics_path: '/nginx_status'
    scrape_interval: 30s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
EOF
```

### 3. Configure Alertmanager

```bash
# Create Alertmanager configuration
cat > alertmanager.yml << 'EOF'
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@your-domain.com'
  smtp_auth_username: 'your-email@domain.com'
  smtp_auth_password: 'your-email-password'

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@your-domain.com'
        # email_configs has no subject/body fields; the subject goes in
        # headers and the body in text (or html).
        headers:
          Subject: '[ALERT] {{ .GroupLabels.alertname }} - {{ .Status }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Labels: {{ .Labels }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
EOF
```
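Both files are easy to get subtly wrong, so validate them before starting the stack. `promtool` ships inside the Prometheus image and `amtool` inside the Alertmanager image; run the checks once `alert_rules.yml` from the next step exists, since `promtool` also validates referenced rule files:

```bash
# Validate the Prometheus configuration and its rule files
docker run --rm -v "$PWD:/cfg" --entrypoint promtool prom/prometheus:latest \
  check config /cfg/prometheus.yml

# Validate the Alertmanager configuration
docker run --rm -v "$PWD:/cfg" --entrypoint amtool prom/alertmanager:latest \
  check-config /cfg/alertmanager.yml
```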
### 4. Create Alert Rules

```bash
# Create alert rules
cat > alert_rules.yml << 'EOF'
groups:
  - name: system
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 80% for more than 5 minutes"

      - alert: LowDiskSpace
        expr: (node_filesystem_size_bytes{fstype!="tmpfs"} - node_filesystem_free_bytes{fstype!="tmpfs"}) / node_filesystem_size_bytes{fstype!="tmpfs"} * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space detected"
          description: "Disk usage is above 85% for more than 5 minutes"

  - name: database
    rules:
      - alert: PostgreSQLDown
        expr: up{job="postgres-exporter"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL is down"
          description: "PostgreSQL database is not responding"

      # Exporter metric names vary by version; adjust the expression to a
      # long-running-query metric your postgres-exporter actually exposes.
      - alert: PostgreSQLSlowQueries
        expr: pg_stat_activity_max_tx_duration{state="active"} > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Long-running PostgreSQL queries detected"
          description: "PostgreSQL has transactions running for more than 60 seconds"

      - alert: PostgreSQLConnectionsHigh
        expr: sum(pg_stat_database_numbackends) / sum(pg_settings_max_connections) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High PostgreSQL connection usage"
          description: "PostgreSQL connection usage is above 80%"

  - name: application
    rules:
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is above 1 second"

      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) * 100 > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "HTTP 5xx error rate is above 5%"

      - alert: ServiceDown
        expr: up{job="django-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application service is down"
          description: "The Django application is not responding"
EOF
```

### 5. Start Monitoring Stack

```bash
# Start monitoring services
docker-compose up -d

# Verify services are running
docker-compose ps

# Access monitoring dashboards
# Prometheus:   http://localhost:9090
# Grafana:      http://localhost:3000 (admin/your-secure-password)
# Alertmanager: http://localhost:9093
```

## Application Monitoring

### 1. Django Application Metrics

```python
# Add to settings.py
INSTALLED_APPS = [
    # ... other apps
    'django_prometheus',
]

MIDDLEWARE = [
    'django_prometheus.middleware.PrometheusBeforeMiddleware',
    # ... other middleware
    'django_prometheus.middleware.PrometheusAfterMiddleware',
]
```

### 2. Custom Metrics

```python
# Create metrics.py
from prometheus_client import Counter, Histogram, Gauge

# Business metrics
active_tenants = Gauge('multi_tenant_active_tenants', 'Number of active tenants')
total_users = Gauge('multi_tenant_total_users', 'Total number of users')
total_transactions = Counter('multi_tenant_total_transactions', 'Total transactions')

# Performance metrics
api_response_time = Histogram('multi_tenant_api_response_time', 'API response time')
db_query_time = Histogram('multi_tenant_db_query_time', 'Database query time')

# Error metrics
api_errors = Counter('multi_tenant_api_errors', 'API errors', ['method', 'endpoint'])
db_errors = Counter('multi_tenant_db_errors', 'Database errors', ['operation'])

# Malaysian-specific metrics
malaysian_users = Gauge('multi_tenant_malaysian_users', 'Number of Malaysian users')
sst_transactions = Counter('multi_tenant_sst_transactions', 'SST transactions', ['rate'])
```

### 3. Database Monitoring

```sql
-- Enable PostgreSQL extensions
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The monitoring views below live in their own schema
CREATE SCHEMA IF NOT EXISTS monitoring;

-- Create monitoring views
CREATE OR REPLACE VIEW monitoring.tenant_stats AS
SELECT
    t.schema_name,
    COUNT(u.id) AS user_count,
    COUNT(s.id) AS subscription_count,
    SUM(s.amount) AS total_revenue
FROM core_tenant t
LEFT JOIN core_user u ON t.id = u.tenant_id
LEFT JOIN core_subscription s ON t.id = s.tenant_id
GROUP BY t.schema_name;

-- Performance monitoring
-- Note: on PostgreSQL 13+ these columns are named mean_exec_time and
-- total_exec_time.
CREATE OR REPLACE VIEW monitoring.query_performance AS
SELECT
    query,
    mean_time,
    calls,
    total_time,
    rows,
    100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 100;
```
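The `django-app` scrape job above assumes the application serves metrics at `/metrics`. With `django_prometheus` installed as shown, that endpoint is typically exposed by including the package's URLs:

```python
# urls.py — expose /metrics for Prometheus scraping
from django.urls import include, path

urlpatterns = [
    # ... your existing routes ...
    path('', include('django_prometheus.urls')),  # serves GET /metrics
]
```

The custom metrics in `metrics.py` register with `prometheus_client`'s default registry on import, so they normally appear on the same endpoint once that module is imported somewhere in the application.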
## Log Management

### 1. Centralized Logging with ELK Stack

```bash
# Create a compose file for the ELK stack
cat > docker-compose.elk.yml << 'EOF'
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:7.17.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5044:5044"

  kibana:
    image: docker.elastic.co/kibana/kibana:7.17.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200

  filebeat:
    image: docker.elastic.co/beats/filebeat:7.17.0
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - /var/log:/var/log:ro
    depends_on:
      - elasticsearch

volumes:
  elasticsearch_data:
EOF
```

### 2. Logstash Configuration

```ruby
# logstash/pipeline/logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Filebeat places custom fields under [fields] by default
  if [fields][type] == "django" {
    grok {
      # Parse into log_message so the original message field is not clobbered
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:logger} - %{GREEDYDATA:log_message}" }
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
  }

  if [fields][type] == "nginx" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }

  # Record the Malaysian timezone offset for downstream consumers
  ruby {
    code => "event.set('[@metadata][tz_offset]', '+08:00')"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```

### 3. Filebeat Configuration

```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/multi-tenant-saas/*.log
    fields:
      type: django

  - type: log
    enabled: true
    paths:
      - /var/log/nginx/*.log
    fields:
      type: nginx

output.logstash:
  hosts: ["logstash:5044"]

processors:
  - add_docker_metadata:
      host: "unix:///var/run/docker.sock"
```

## Business Metrics Monitoring

### 1. Key Performance Indicators (KPIs)

```python
# KPI monitoring
from datetime import datetime

from django.db.models import Q, Sum
from prometheus_client import Counter, Gauge

from core.models import PaymentTransaction, Tenant  # adjust to your app layout


class BusinessMetrics:
    def __init__(self):
        self.active_tenants = Gauge('business_active_tenants', 'Active tenant count')
        self.monthly_revenue = Gauge('business_monthly_revenue', 'Monthly revenue')
        self.user_growth = Gauge('business_user_growth', 'User growth rate')
        self.churn_rate = Gauge('business_churn_rate', 'Customer churn rate')

        # Malaysian-specific metrics
        self.malaysian_tenant_percentage = Gauge(
            'business_malaysian_tenant_percentage', 'Percentage of Malaysian tenants')
        self.sst_collected = Counter('business_sst_collected', 'SST amount collected')
        self.local_payment_methods = Counter('business_local_payments', 'Local payment method usage')

    def update_metrics(self):
        # Update active tenants
        active_count = Tenant.objects.filter(is_active=True).count()
        self.active_tenants.set(active_count)

        # Update monthly revenue (filter on year as well, otherwise the same
        # month from every previous year is included)
        now = datetime.now()
        monthly_rev = PaymentTransaction.objects.filter(
            created_at__year=now.year,
            created_at__month=now.month,
            status='completed'
        ).aggregate(total=Sum('amount'))['total'] or 0
        self.monthly_revenue.set(monthly_rev)

        # Update Malaysian metrics
        total_tenants = Tenant.objects.count()
        malaysian_tenants = Tenant.objects.filter(
            Q(business_address__country='Malaysia') |
            Q(contact_phone__startswith='+60')
        ).count()
        self.malaysian_tenant_percentage.set(
            (malaysian_tenants / total_tenants * 100) if total_tenants > 0 else 0
        )
```
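These gauges only reflect reality if `update_metrics()` runs on a schedule. One way to do that, assuming the project already uses Celery (the module path and schedule name below are illustrative):

```python
# tasks.py — refresh business metrics once a minute (sketch; assumes Celery)
from celery import shared_task

from .business_metrics import BusinessMetrics  # hypothetical module name

business_metrics = BusinessMetrics()  # module level so gauges register once


@shared_task
def refresh_business_metrics():
    business_metrics.update_metrics()


# In celery.py (or settings) — an illustrative beat schedule entry:
# app.conf.beat_schedule = {
#     'refresh-business-metrics': {
#         'task': 'core.tasks.refresh_business_metrics',
#         'schedule': 60.0,
#     },
# }
```

Note that with multiple Gunicorn or Celery worker processes, each process holds its own metrics registry; `prometheus_client`'s multiprocess mode (or pushing to a gateway) may be needed for these values to be scraped consistently.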
### 2. Real-time Dashboards

Create Grafana dashboards for:

- System health overview
- Application performance
- Database performance
- Business metrics
- User activity
- Malaysian market metrics

## Malaysian-Specific Monitoring

### 1. SST Compliance Monitoring

```python
# SST monitoring
from datetime import datetime

from prometheus_client import Gauge

from core.models import PaymentTransaction  # adjust to your app layout


class SSTMonitor:
    def __init__(self):
        self.sst_rate_compliance = Gauge('sst_rate_compliance', 'SST rate compliance')
        self.sst_filing_deadline = Gauge('sst_filing_days_remaining', 'Days until SST filing deadline')
        self.sst_collected_vs_reported = Gauge('sst_collected_vs_reported', 'SST collected vs reported')

    def check_sst_compliance(self):
        # Check that the expected SST rate is applied; update this value if
        # the statutory rate changes
        expected_rate = 0.06
        now = datetime.now()
        actual_rates = PaymentTransaction.objects.filter(
            created_at__year=now.year,
            created_at__month=now.month
        ).values_list('tax_rate', flat=True).distinct()

        # Cast to float in case tax_rate is stored as Decimal
        compliance = all(abs(float(rate) - expected_rate) < 0.001 for rate in actual_rates)
        self.sst_rate_compliance.set(1 if compliance else 0)

        # Check SST filing deadline (get_sst_filing_deadline and
        # trigger_sst_deadline_alert are implemented elsewhere)
        today = now.date()
        filing_deadline = self.get_sst_filing_deadline(today)
        days_remaining = (filing_deadline - today).days
        self.sst_filing_deadline.set(days_remaining)

        # Alert if the deadline is approaching
        if days_remaining <= 7:
            self.trigger_sst_deadline_alert(days_remaining)
```

### 2. Malaysian Business Hours Monitoring

```python
# Malaysian business hours monitoring
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

from prometheus_client import Gauge


class BusinessHoursMonitor:
    def __init__(self):
        self.business_hour_activity = Gauge('business_hour_activity', 'Activity during business hours')
        self.off_hour_activity = Gauge('off_hour_activity', 'Activity outside business hours')

    def monitor_activity(self):
        # Malaysian business hours: 9 AM - 6 PM, Monday - Friday.
        # Evaluate in Malaysian local time, not server time.
        now = datetime.now(ZoneInfo('Asia/Kuala_Lumpur'))
        is_business_hour = (
            now.weekday() < 5 and  # Monday - Friday
            9 <= now.hour < 18     # 9 AM - 6 PM
        )

        if is_business_hour:
            self.business_hour_activity.inc()
        else:
            self.off_hour_activity.inc()
```

### 3. Malaysian Payment Gateway Monitoring

```python
# Payment gateway monitoring
from prometheus_client import Counter, Gauge, Histogram


class PaymentGatewayMonitor:
    def __init__(self):
        # These metrics are read per gateway, so they need a 'gateway' label
        self.payment_success_rate = Gauge('payment_success_rate', 'Payment success rate', ['gateway'])
        self.gateway_response_time = Histogram('gateway_response_time', 'Payment gateway response time', ['gateway'])
        self.gateway_downtime = Counter('gateway_downtime', 'Payment gateway downtime', ['gateway'])

    def monitor_gateways(self):
        # calculate_success_rate, measure_response_time and
        # is_gateway_available are implemented elsewhere
        gateways = ['touch_n_go', 'grabpay', 'online_banking']

        for gateway in gateways:
            success_rate = self.calculate_success_rate(gateway)
            self.payment_success_rate.labels(gateway=gateway).set(success_rate)

            # Monitor response times
            response_time = self.measure_response_time(gateway)
            self.gateway_response_time.labels(gateway=gateway).observe(response_time)

            # Check for downtime
            if not self.is_gateway_available(gateway):
                self.gateway_downtime.labels(gateway=gateway).inc()
```
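A minimal sketch of what `is_gateway_available` might look like: probe a status endpoint with a short timeout. The URLs here are placeholders; a real integration would use each provider's documented health or API endpoint:

```python
# Hypothetical availability probe (sketch; endpoint URLs are placeholders)
import requests

GATEWAY_STATUS_URLS = {
    'touch_n_go': 'https://gateway.example.com/tng/status',      # placeholder
    'grabpay': 'https://gateway.example.com/grabpay/status',     # placeholder
    'online_banking': 'https://gateway.example.com/fpx/status',  # placeholder
}


def is_gateway_available(gateway: str, timeout: float = 3.0) -> bool:
    """Return True if the gateway's status endpoint answers with HTTP 2xx."""
    url = GATEWAY_STATUS_URLS.get(gateway)
    if url is None:
        return False
    try:
        response = requests.get(url, timeout=timeout)
        return response.ok
    except requests.RequestException:
        return False
```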
## Maintenance Procedures

### 1. Daily Maintenance

```bash
#!/bin/bash
# daily_maintenance.sh

# Log maintenance
echo "$(date): Starting daily maintenance" >> /var/log/maintenance.log

# Rotate logs
logrotate -f /etc/logrotate.d/multi-tenant-saas

# Clear old logs
find /var/log/multi-tenant-saas -name "*.log.*" -mtime +30 -delete

# Monitor disk space
df -h | awk '$5+0 > 85 {print $6 " is " $5 " full"}' >> /var/log/maintenance.log

# Check service health
systemctl is-active --quiet gunicorn || echo "Gunicorn service is down" >> /var/log/maintenance.log
systemctl is-active --quiet nginx || echo "Nginx service is down" >> /var/log/maintenance.log

# Check database connections (assumes credentials in ~/.pgpass or peer auth)
psql -U multi_tenant_prod_user -d multi_tenant_saas_prod -c "SELECT count(*) FROM pg_stat_activity;" >> /var/log/maintenance.log

# Clear cache
# WARNING: FLUSHDB empties the current Redis database, including cached data
# and sessions. Skip this step if losing the warm cache matters to you.
redis-cli FLUSHDB >> /var/log/maintenance.log

echo "$(date): Daily maintenance completed" >> /var/log/maintenance.log
```

### 2. Weekly Maintenance

```bash
#!/bin/bash
# weekly_maintenance.sh

# Database maintenance
echo "$(date): Starting weekly database maintenance" >> /var/log/maintenance.log

# Vacuum and analyze (VACUUM ANALYZE also refreshes planner statistics, so a
# separate ANALYZE pass is not needed)
psql -U multi_tenant_prod_user -d multi_tenant_saas_prod -c "VACUUM ANALYZE;" >> /var/log/maintenance.log

# Check table sizes
psql -U multi_tenant_prod_user -d multi_tenant_saas_prod -c "
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
" >> /var/log/maintenance.log

# Index maintenance
# NOTE: REINDEX DATABASE takes heavy locks; run it in a low-traffic window,
# or reindex individual indexes with REINDEX ... CONCURRENTLY (PostgreSQL 12+).
psql -U multi_tenant_prod_user -d multi_tenant_saas_prod -c "REINDEX DATABASE multi_tenant_saas_prod;" >> /var/log/maintenance.log

echo "$(date): Weekly database maintenance completed" >> /var/log/maintenance.log
```

### 3. Monthly Maintenance

```bash
#!/bin/bash
# monthly_maintenance.sh

# Security updates
echo "$(date): Starting monthly security updates" >> /var/log/maintenance.log

# Update system packages
apt-get update && apt-get upgrade -y >> /var/log/maintenance.log

# Update Python packages
source /opt/multi-tenant-saas/venv/bin/activate
pip list --outdated >> /var/log/maintenance.log
pip install --upgrade -r /opt/multi-tenant-saas/requirements.txt >> /var/log/maintenance.log

# Update Node packages
cd /opt/multi-tenant-saas/frontend
npm update >> /var/log/maintenance.log

# Full database backup
/opt/multi-tenant-saas/scripts/backup-database.sh >> /var/log/maintenance.log

# SSL certificate check
openssl x509 -in /etc/letsencrypt/live/your-domain.com/fullchain.pem -text -noout | grep "Not After" >> /var/log/maintenance.log

# Performance review: check slow queries
# (on PostgreSQL 13+ the column is named mean_exec_time)
psql -U multi_tenant_prod_user -d multi_tenant_saas_prod -c "
SELECT query, mean_time, calls
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
" >> /var/log/maintenance.log

echo "$(date): Monthly maintenance completed" >> /var/log/maintenance.log
```
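The daily script forces a rotation via `/etc/logrotate.d/multi-tenant-saas`, but that policy file is not shown anywhere above. A plausible starting point, assuming the application logs to `/var/log/multi-tenant-saas/` (tune rotation counts and settings to your deployment):

```bash
# Create the logrotate policy the maintenance scripts rely on (a sketch)
cat > /etc/logrotate.d/multi-tenant-saas << 'EOF'
/var/log/multi-tenant-saas/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
```

`copytruncate` avoids having to signal Gunicorn to reopen its log files, at the cost of possibly losing a few lines written during the copy.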
## Automated Scheduling

### 1. Cron Jobs

```bash
# Add to crontab

# Daily maintenance at 2 AM
0 2 * * * /opt/multi-tenant-saas/scripts/daily_maintenance.sh

# Weekly maintenance on Sunday at 3 AM
0 3 * * 0 /opt/multi-tenant-saas/scripts/weekly_maintenance.sh

# Monthly maintenance on the 1st of the month at 4 AM
0 4 1 * * /opt/multi-tenant-saas/scripts/monthly_maintenance.sh

# Database backup daily at 1 AM
0 1 * * * /opt/multi-tenant-saas/scripts/backup-database.sh

# Log rotation daily at midnight
0 0 * * * /usr/sbin/logrotate -f /etc/logrotate.d/multi-tenant-saas

# SSL certificate renewal check weekly
0 0 * * 0 /opt/multi-tenant-saas/scripts/check-ssl.sh
```

### 2. Systemd Timers

```bash
# Create systemd timer for daily maintenance
cat > /etc/systemd/system/daily-maintenance.timer << 'EOF'
[Unit]
Description=Daily maintenance tasks
Requires=daily-maintenance.service

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF

# Create systemd service
cat > /etc/systemd/system/daily-maintenance.service << 'EOF'
[Unit]
Description=Daily maintenance tasks

[Service]
Type=oneshot
ExecStart=/opt/multi-tenant-saas/scripts/daily_maintenance.sh
User=root
Group=root
EOF

# Reload unit files, then enable the timer
systemctl daemon-reload
systemctl enable daily-maintenance.timer
systemctl start daily-maintenance.timer
```

## Disaster Recovery

### 1. Backup Verification

```bash
#!/bin/bash
# verify_backups.sh

BACKUP_DIR="/opt/multi-tenant-saas/backups"
LOG_FILE="/var/log/backup-verification.log"

echo "$(date): Starting backup verification" >> "$LOG_FILE"

# Check if backups exist
if [ ! -d "$BACKUP_DIR" ]; then
    echo "Backup directory does not exist" >> "$LOG_FILE"
    exit 1
fi

# Check latest backup
LATEST_BACKUP=$(ls -t "$BACKUP_DIR"/database_backup_*.sql.gz 2>/dev/null | head -1)
if [ -z "$LATEST_BACKUP" ]; then
    echo "No database backup found" >> "$LOG_FILE"
    exit 1
fi

# Verify backup integrity
if gzip -t "$LATEST_BACKUP"; then
    echo "Backup integrity verified: $LATEST_BACKUP" >> "$LOG_FILE"
else
    echo "Backup integrity check failed: $LATEST_BACKUP" >> "$LOG_FILE"
    exit 1
fi

# Check backup size
BACKUP_SIZE=$(du -h "$LATEST_BACKUP" | cut -f1)
echo "Backup size: $BACKUP_SIZE" >> "$LOG_FILE"

# Test restore (create test database)
TEST_DB="backup_test_$(date +%Y%m%d)"
createdb -U multi_tenant_prod_user "$TEST_DB"
gunzip -c "$LATEST_BACKUP" | psql -U multi_tenant_prod_user "$TEST_DB"

# Verify data
TABLE_COUNT=$(psql -U multi_tenant_prod_user -d "$TEST_DB" -t -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';")
echo "Table count in backup: $TABLE_COUNT" >> "$LOG_FILE"

# Clean up test database
dropdb -U multi_tenant_prod_user "$TEST_DB"

echo "$(date): Backup verification completed successfully" >> "$LOG_FILE"
```

### 2. Failover Procedures

```bash
#!/bin/bash
# failover_procedures.sh

PRIMARY_SERVER="primary.your-domain.com"
STANDBY_SERVER="standby.your-domain.com"

# Check primary server health
if ! curl -f "http://$PRIMARY_SERVER/health/" > /dev/null 2>&1; then
    echo "$(date): Primary server is down, initiating failover" >> /var/log/failover.log

    # Promote the standby. pg_promote() is available on PostgreSQL 12+;
    # older versions use pg_ctl promote or a trigger file.
    ssh "$STANDBY_SERVER" "sudo -u postgres psql -c 'SELECT pg_promote();'"

    # Update DNS
    # This would integrate with your DNS provider's API
    curl -X POST "https://api.dns-provider.com/update" \
        -H "Authorization: Bearer $DNS_API_KEY" \
        -d '{"record":"your-domain.com","value":"'$STANDBY_SERVER'"}'

    # Notify administrators
    echo "Failover completed. Standby server is now primary." | mail -s "Failover Completed" admin@your-domain.com

    echo "$(date): Failover completed" >> /var/log/failover.log
fi
```
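A failover is only safe if the standby is nearly caught up. A quick lag check to run on the standby before (or as part of) promotion, assuming streaming replication:

```sql
-- On the standby: how far behind the primary is replay?
-- replay_lag is NULL when the server is not in recovery (i.e. not a standby).
SELECT
    pg_is_in_recovery() AS is_standby,
    now() - pg_last_xact_replay_timestamp() AS replay_lag;
```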
## Performance Optimization

### 1. Database Optimization

```sql
-- Create performance monitoring views.
-- pg_stat_user_tables already tracks scan and tuple counters per table,
-- which avoids calling the low-level pg_stat_get_* functions directly.
CREATE OR REPLACE VIEW monitoring.performance_metrics AS
SELECT
    schemaname,
    relname AS tablename,
    pg_size_pretty(pg_total_relation_size(relid)) AS size,
    seq_scan + COALESCE(idx_scan, 0) AS scans,
    seq_tup_read + COALESCE(idx_tup_fetch, 0) AS tuples_read,
    n_live_tup,
    n_dead_tup
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(relid) DESC;
```

### 2. Application Optimization

```python
# Add to Django settings

# Cache configuration (uses the django-redis backend, which is what the
# CLIENT_CLASS option applies to)
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://localhost:6379/1',
        'TIMEOUT': 300,
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
        }
    }
}

# Database connection pooling
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'multi_tenant_saas_prod',
        'USER': 'multi_tenant_prod_user',
        'PASSWORD': 'your-password',
        'HOST': 'localhost',
        'PORT': '5432',
        'CONN_MAX_AGE': 60,  # keep connections open for reuse (seconds)
        'OPTIONS': {
            'connect_timeout': 10,
            'options': '-c statement_timeout=30000',
        }
    }
}
```

## Security Monitoring

### 1. Intrusion Detection

```bash
# Install fail2ban
apt-get install fail2ban

# Configure fail2ban for SSH and Nginx
cat > /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600
findtime = 600

[nginx-http-auth]
enabled = true
port = http,https
filter = nginx-http-auth
logpath = /var/log/nginx/error.log
maxretry = 5
bantime = 3600
findtime = 600
EOF

# Restart fail2ban
systemctl restart fail2ban
```

### 2. File Integrity Monitoring

```bash
# Install AIDE
apt-get install aide

# Initialize AIDE
aideinit

# Configure daily checks
cat > /etc/cron.daily/aide << 'EOF'
#!/bin/sh
/usr/bin/aide --check
EOF

chmod +x /etc/cron.daily/aide
```

## Malaysian Compliance Monitoring

### 1. PDPA Compliance Monitoring

```python
# PDPA compliance monitor
from datetime import datetime, timedelta

from django.contrib.auth import get_user_model
from prometheus_client import Counter, Gauge

User = get_user_model()


class PDPAComplianceMonitor:
    def __init__(self):
        self.data_retention_compliance = Gauge('pdpa_data_retention_compliance', 'PDPA data retention compliance')
        self.consent_management = Gauge('pdpa_consent_management', 'PDPA consent management compliance')
        self.data_breach_incidents = Counter('pdpa_data_breach_incidents', 'PDPA data breach incidents')

    def check_compliance(self):
        # check_consent_management, detect_data_breaches and
        # trigger_breach_alert are implemented elsewhere

        # Check data retention policies
        retention_compliance = self.check_data_retention()
        self.data_retention_compliance.set(1 if retention_compliance else 0)

        # Check consent management
        consent_compliance = self.check_consent_management()
        self.consent_management.set(1 if consent_compliance else 0)

        # Monitor for data breaches
        breach_detected = self.detect_data_breaches()
        if breach_detected:
            self.data_breach_incidents.inc()
            self.trigger_breach_alert()

    def check_data_retention(self):
        # Flag personal data retained beyond the required period
        cutoff_date = datetime.now() - timedelta(days=7 * 365)  # 7 years

        # Count inactive accounts older than the retention period
        old_records = User.objects.filter(
            date_joined__lt=cutoff_date,
            is_active=False
        ).count()

        return old_records == 0
```
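When `check_data_retention` reports non-compliance, something still has to act on the stale records. A remediation sketch (the anonymized field values are illustrative, and PDPA obligations should be confirmed before automating deletion or anonymization):

```python
# Hypothetical remediation job: anonymize accounts past the retention period
from datetime import datetime, timedelta

from django.contrib.auth import get_user_model

User = get_user_model()


def anonymize_expired_accounts():
    """Strip personal data from inactive accounts older than 7 years."""
    cutoff_date = datetime.now() - timedelta(days=7 * 365)
    expired = User.objects.filter(date_joined__lt=cutoff_date, is_active=False)
    count = expired.count()

    for user in expired.iterator():
        user.first_name = ''
        user.last_name = ''
        user.email = f'anonymized-{user.pk}@invalid.local'  # placeholder domain
        user.set_unusable_password()
        user.save(update_fields=['first_name', 'last_name', 'email', 'password'])

    return count
```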
## Conclusion

This comprehensive monitoring and maintenance guide ensures your Multi-Tenant SaaS Platform remains reliable, performant, and compliant with Malaysian regulations. Regular monitoring, proactive maintenance, and automated alerts will help you maintain high service quality and quickly address any issues that arise.

Remember to:

- Monitor all system components regularly
- Set up appropriate alerts for critical issues
- Perform regular maintenance tasks
- Keep systems updated and secure
- Maintain compliance with Malaysian regulations
- Document all procedures and incidents

For additional support, refer to the main documentation or contact the support team.