Best-Practices Guide to Large-Scale Machine-To-Machine Connections
Introduction
This document provides an overview of best practices for optimizing machine-to-machine (M2M) connections. It covers configuration changes on both the PrivX side and the database side, with the goal of improving overall application performance and user experience.
PrivX Configuration
Audit Events
PrivX logs a wide range of events for each connection to both the database and syslog. This extensive logging uses extra storage and affects performance when there is a large number of M2M connections. If the extensive logging is not critical for you, you can reduce it by updating the Audit events exclusion list for the relevant microservices. Excluded events are no longer written to the database, but all audit events are still recorded in syslog regardless of this setting.
The table below describes the audit events that are generated by machine-to-machine connections. It includes details such as the microservice, event name, code, and action, among others. The action column is an example that you must adjust to your own needs.
Microservice | Event Name | Code | Number of Events | Action |
---|---|---|---|---|
SSH-MITM | Client-authenticated | 305 | 1 | Keep |
SSH-MITM | Connection-requested | 300 | 1 | Keep |
AUTHORIZER | Authorization-certificate-granted | 401 | 2 | Exclude |
AUTHORIZER | Authorization-role-key-granted | 402 | 1 | Exclude |
SSH-MITM | Host-key-matched | 324 | 1 | Exclude |
SSH-MITM | Connection-audit-started | 334 | 1 | Exclude |
SSH-MITM | Connection-authenticated | 301 | 1 | Keep |
SSH-MITM | Session-added | 310 | 1 | Exclude |
SSH-MITM | Session-removed | 311 | 1 | Exclude |
SSH-MITM | Connection-closed | 303 | 1 | Keep |
Use the Exclusion List setting to define a comma-separated list of audit-event codes and code ranges, such as 1, 10, 20-30, for any microservice. The setting is at Administration→Settings→SSH Proxy under the Audit events section.
For information about PrivX audit events, see the PrivX Audit Events Reference.
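As a minimal sketch of how the example table maps onto the Exclusion List format, the following Python snippet groups the codes marked Exclude by microservice and collapses consecutive codes into ranges. It assumes that the Number of Events column counts events per connection; the function names are hypothetical and are not part of PrivX.

```python
from collections import defaultdict

# (microservice, event name, code, number of events, action), copied from the table above.
AUDIT_EVENTS = [
    ("SSH-MITM", "Client-authenticated", 305, 1, "Keep"),
    ("SSH-MITM", "Connection-requested", 300, 1, "Keep"),
    ("AUTHORIZER", "Authorization-certificate-granted", 401, 2, "Exclude"),
    ("AUTHORIZER", "Authorization-role-key-granted", 402, 1, "Exclude"),
    ("SSH-MITM", "Host-key-matched", 324, 1, "Exclude"),
    ("SSH-MITM", "Connection-audit-started", 334, 1, "Exclude"),
    ("SSH-MITM", "Connection-authenticated", 301, 1, "Keep"),
    ("SSH-MITM", "Session-added", 310, 1, "Exclude"),
    ("SSH-MITM", "Session-removed", 311, 1, "Exclude"),
    ("SSH-MITM", "Connection-closed", 303, 1, "Keep"),
]


def format_code_ranges(codes):
    """Collapse event codes into a comma-separated list of codes and ranges."""
    sorted_codes = sorted(set(codes))
    parts = []
    i = 0
    while i < len(sorted_codes):
        start = end = sorted_codes[i]
        # Extend the current range while the next code is consecutive.
        while i + 1 < len(sorted_codes) and sorted_codes[i + 1] == end + 1:
            i += 1
            end = sorted_codes[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(parts)


def exclusion_lists(events):
    """Return {microservice: exclusion-list string} for rows marked Exclude."""
    by_service = defaultdict(list)
    for service, _name, code, _count, action in events:
        if action == "Exclude":
            by_service[service].append(code)
    return {service: format_code_ranges(codes) for service, codes in by_service.items()}


def database_events_per_connection(events):
    """Return (events written without exclusions, events written with exclusions)."""
    total = sum(event[3] for event in events)
    kept = sum(event[3] for event in events if event[4] == "Keep")
    return total, kept


if __name__ == "__main__":
    for service, codes in exclusion_lists(AUDIT_EVENTS).items():
        print(f"{service}: {codes}")
    print(database_events_per_connection(AUDIT_EVENTS))
```

With the example actions from the table, this yields 401-402 for AUTHORIZER and 310-311, 324, 334 for SSH-MITM, and the number of events written to the database drops from 11 to 4 per connection. Adjust the Action values to your own needs before using the result.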
Nginx Proxy Timeouts
When the database is under heavy load or has grown large, response times may exceed 2 minutes. This is particularly noticeable when browsing or searching the connections and events pages under Monitoring in the PrivX GUI, and it may result in Nginx timeout errors.
In such situations, increasing the Nginx timeout value may help. Choose the value carefully, however: a timeout that is too long may increase the number of attempted/aborted queries and add even more load on the server. The aim is a value long enough for legitimate queries to complete without causing unnecessary server strain.
On all PrivX Servers, modify the /etc/nginx/nginx.conf file: add the following lines within the http block but outside the server block. You may start with a value such as 5 minutes:

    http {
        ...
        proxy_read_timeout 5m;
        proxy_connect_timeout 5m;
        ...
    }

Here's a breakdown of the configuration options:
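- proxy_connect_timeout: Specifies the maximum duration allowed for establishing a connection to the proxied server. The Nginx documentation notes that this timeout usually cannot exceed 75 seconds.
- proxy_read_timeout: Sets the maximum time to wait for a response from the proxied server, measured between two successive read operations rather than over the whole response.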
By including these settings in the specified location, you can adjust the timeout values for proxy connections within your Nginx configuration.
Note
Achieving an optimal database status is essential for maintaining a well-functioning system. This includes setting a reasonable retention period for data and ensuring that no unnecessary or obsolete data accumulates in the database.
Data Retention
Audit Events
PrivX supports integration with SIEM solutions, including Splunk: logs are sent to syslog, where they are subsequently picked up by your SIEM. If you have enabled SIEM integration, we recommend minimizing the retention period for audit events in the PrivX configuration. Since the audit events remain available in the SIEM solution for an extended period, a shorter retention period in PrivX reduces database-storage usage, improves database performance, and ensures that only the necessary audit data is retained within PrivX.
Connection Metadata
Metadata of completed connections is stored in the PrivX database. Connection metadata is not regularly removed by default. You can restrict the retention period for connection metadata from the PrivX GUI Administration→Settings→Connection Manager, with the Connection Metadata Retention setting.
Trails
Trails (session recordings of connections) are stored on PrivX servers by default. When session recording is enabled, the following connection-specific features are supported in PrivX:
- Video playback.
- Transferred files.
- Clipboard (RDP only).
- Channel logs (SSH only).
To optimize trail-storage space, we recommend setting trail expiration and trail-transferred-files expiration in PrivX. Trail-storage size should be decided based on factors such as the number and type of connections, activity during sessions, and the volume of transferred files.
When selecting a storage class and performance mode, consider the daily trail size. The chosen storage class should align with the specific needs of your system, such as cost-effectiveness and durability. Likewise, the performance mode should be chosen to ensure efficient access to the trail data based on your requirements, whether it's frequent access or cost optimization.
By carefully considering the trail retention period, storage class, and performance mode, you can strike a balance between managing disk space, optimizing costs, and ensuring effective access to the trail data as needed.
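As a rough sizing aid, the sketch below estimates steady-state trail storage from the factors listed above. It assumes that storage is approximately the daily volume multiplied by the retention period. The function name, parameters, and all input figures are hypothetical placeholders; measure real averages from your own environment.

```python
def estimate_trail_storage_gib(
    connections_per_day,
    avg_trail_mib,
    trail_retention_days,
    avg_transferred_files_mib=0.0,
    files_retention_days=0,
):
    """Estimate steady-state trail storage in GiB.

    Assumes each connection produces one trail of avg_trail_mib and,
    optionally, avg_transferred_files_mib of transferred files, and that
    data is deleted once its retention period expires.
    """
    trail_mib = connections_per_day * avg_trail_mib * trail_retention_days
    files_mib = connections_per_day * avg_transferred_files_mib * files_retention_days
    return (trail_mib + files_mib) / 1024


if __name__ == "__main__":
    # Hypothetical inputs: 2048 connections per day, 1 MiB per trail kept for
    # 10 days, and 0.5 MiB of transferred files per connection kept for 4 days.
    print(estimate_trail_storage_gib(2048, 1.0, 10, 0.5, 4))  # 24.0
```

Keeping transferred files for a shorter period than the trails themselves, as in the example call, is one way to limit growth when file transfers dominate the daily volume.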
PrivX Database
Database Connection Limit
Adjust the connection limit of your PostgreSQL database to accommodate high-availability (HA) production setups with multiple PrivX instances and an external database. The default limit is 100 connections, but you will need to raise it based on the maximum number of concurrent sessions and the number of PrivX Servers. This ensures that the database can handle the connections required by the PrivX instances without connection-related errors, and provides the scalability needed for your HA environment.
With default microservice configurations, you need about 250 connections per PrivX Server. For example, in an HA setup with 4 nodes, the recommended connection limit is 1000 or more.
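The sizing rule above can be sketched as a small calculation. The function name is hypothetical, and the figure of 250 connections per server applies only to default microservice configurations. In PostgreSQL, the connection limit is controlled by the max_connections parameter; that parameter name comes from PostgreSQL itself rather than from this guide.

```python
def recommended_connection_limit(privx_servers, connections_per_server=250):
    """Return the minimum recommended database connection limit.

    connections_per_server defaults to 250, the approximate figure for
    default PrivX microservice configurations.
    """
    if privx_servers < 1:
        raise ValueError("privx_servers must be at least 1")
    return privx_servers * connections_per_server


if __name__ == "__main__":
    limit = recommended_connection_limit(4)
    # Print a line in postgresql.conf syntax for a 4-node HA setup.
    print(f"max_connections = {limit}")  # max_connections = 1000
```

Treat the result as a lower bound: other clients of the same database, such as monitoring or backup tools, need connections of their own.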
CPU
Database-query performance relies heavily on the CPU. Tasks like aggregations, joins, hashing, grouping, sorting, and other complex operations require significant CPU time to execute efficiently. For this reason you should have a capable CPU that can handle these tasks effectively.
Memory
Ensure the database machine has enough memory to handle queries.
With increased memory, you’ll also see increased disk cache and reduced I/O operations on the disk. This improves PostgreSQL query performance significantly, as I/O operations are a lot more expensive than operations in memory. It’s good to have a little more memory than what’s absolutely necessary.
Insufficient memory may lead to terminated queries, or other running processes may be killed abruptly to make room for queries. You should monitor the performance of the database, and be prepared to increase database resources.
A database expert can assess the current setup, identify potential bottlenecks, and recommend optimal configurations for improved performance and reliability.
High Performance SSD Disk
Fast disk read and write times greatly improve PostgreSQL query performance, because data can be loaded into memory and written back from memory quickly.
Improve Performance with Indexing
Indexing improves the performance of database queries. For more information about indexing for improved performance, see Improve Performance with Indexing.
Network
Network delays often emerge as a primary performance bottleneck in large environments. To address this, ensure that the database and PrivX nodes are located close to each other, for example within the same datacenter, or in availability zones within the same region.
Identify and Review M2M Use Cases
Gather information about existing M2M connections using the following format.
Source Server | Source Account | Target Server | Target Account | Connection Frequency | Use Case |
---|---|---|---|---|---|
Considering the use case and frequency of connections, carefully evaluate the potential benefits of migrating connections to PrivX. Assess whether auditing and session recording provide any value for these automated M2M connections before deciding.
When incorporating use cases into PrivX, it is advisable to adopt a phased approach, especially when dealing with large-scale deployments. By taking a phased approach, you can break down the implementation into manageable stages, allowing for careful testing and validation at each step.
For example, in the case of cluster-monitoring connections between two servers, where the daily connection count exceeds 20,000, it may be worth excluding these connections because of the additional overhead they would impose on PrivX.
Alternatively, implementing SSH-key-command restrictions could be a more suitable option for managing these types of connections.
In addition, if providing zero-trust authentication is key to your use cases while session auditing is not critical, you can also implement M2M use cases by fetching an ephemeral certificate for login.
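The review described in this section can be sketched as a simple triage of the inventory table. The sketch below is illustrative only: the record fields mirror the table columns, the threshold of 20,000 connections per day is the example figure from the text rather than a fixed limit, and the server and account names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class M2MConnection:
    """One row of the M2M inventory table."""
    source_server: str
    source_account: str
    target_server: str
    target_account: str
    connections_per_day: int
    use_case: str


# Example threshold taken from the text; adjust it to your environment.
DAILY_CONNECTION_THRESHOLD = 20_000


def triage(inventory, threshold=DAILY_CONNECTION_THRESHOLD):
    """Split the inventory into (migration candidates, connections to review).

    Connections above the threshold are flagged for review, for example to
    consider SSH-key-command restrictions or ephemeral certificates instead.
    """
    candidates, review = [], []
    for connection in inventory:
        if connection.connections_per_day > threshold:
            review.append(connection)
        else:
            candidates.append(connection)
    return candidates, review


# Hypothetical inventory rows for illustration.
INVENTORY = [
    M2MConnection("app01", "deploy", "db01", "backup", 24, "Nightly backup"),
    M2MConnection("mon01", "monitor", "node01", "monitor", 25_000, "Cluster monitoring"),
]

if __name__ == "__main__":
    candidates, review = triage(INVENTORY)
    print([c.use_case for c in candidates])
    print([c.use_case for c in review])
```

Processing the migration candidates in small batches fits the phased approach recommended above, since each batch can be tested and validated before the next one is added.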