Logging and monitoring

Data is gathered for Docker workloads, Docker Compose workloads and system parameters regarding resource usage. Metrics are then sent to OpenSearch in the Management System and visualized in OpenSearch Dashboard. Logging and monitoring options can be configured for each node in the node details view in the Management System:

  1. Log in to the Management System.
  2. Select Nodes in the navigation on the left.
  3. Select the node tree tab.
  4. Select a node in the node tree.
  5. Select the Logs tab.

    Select logs tab

From here, tick the corresponding checkbox of a dashboard and select Save to enable logging or monitoring on that node. Note that system logs do not need to be activated, as they are collected by default. Each dashboard can be accessed by selecting the dashboard link. A new browser tab opens, displaying the selected dashboard. Note that all dashboards are also available if the node is offline.

Logs tab content

Note

Some OpenSearch knowledge could be beneficial when working with the dashboards. Refer to the official OpenSearch documentation for more information on OpenSearch.

Node system logs

Selecting VIEW SYSTEM LOGS opens a new window that shows the system logs of the node. These internal node logs are intended for Nerve service technicians in case of errors and failures. Data is stored and visualized with OpenSearch. The volume of logs collected can be adjusted by Nerve service technicians through the system log settings. Contact customer support for more information.

System monitoring

Resource utilization of the system as a whole is tracked when system monitoring is enabled. Refer to the screenshot and the table below for more information on the data that is gathered and displayed. Tick the checkbox next to VIEW SYSTEM METRICS to enable the tracking of system data.

System monitoring dashboard

Item Description
CPU Usage Gauge chart
This chart displays the currently used total percentage of the CPU.

Line graph
The line graph displays the CPU usage in percent over time. Data is displayed scaled over time.

In general, the data displayed here is according to how CPU usage is understood in Linux. For an explanation on how CPU usage is handled in Linux, refer to this link.
Memory Usage Gauge chart
This chart displays the currently used total percentage of memory, as well as the total memory available in bytes. Memory used for virtualization is not included.

Line graph
The line graph displays the total amount of memory used over time. Data is displayed scaled over time. Memory used for virtualization is not included.
Used Disk Space This graph displays the currently used total percentage of disk space on the host, as well as the total disk space available in bytes.
Inbound Traffic This is the current amount of incoming data, as well as the total amount of data transferred since the last reboot.
Outbound Traffic This is the current amount of outgoing data, as well as the total amount of data transferred since the last reboot.
Packetloss Here the number of lost incoming packets and lost outgoing packets is displayed.
Disk IO This graph displays the amount of reads and writes on the disk. Reads show how much data per second has been read while writes show the amount of data that has been saved or deleted.
Inbound Traffic by Interface This is the amount of incoming data over time. The list to the right of the graph shows the average amount of traffic per interface. Note that this list also includes internal interfaces in this version.
Outbound Traffic by Interface This is the amount of outgoing data over time. The list to the right of the graph shows the average amount of traffic per interface. Note that this list also includes internal interfaces in this version.

Docker workload logging

The logs of the Docker workloads on a node are collected in the centralized logging system, allowing the analysis of logs from multiple workloads and nodes. Logs are collected from the standard Linux streams stdout (for debug messages) and stderr (for error messages). For user-created workloads, this means that logs must be written to these streams to be collected. Note that the logs are best suited to developers with expert knowledge and that logging should also be configured by developers. Tick the checkbox next to DOCKER WORKLOAD LOGGING to enable the tracking of Docker workload logs.
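As an illustration of writing to these streams from a user-created workload, the following is a minimal Python sketch. The logger name `my-workload`, the helper names and the choice to send everything below ERROR to stdout are assumptions made for this example and are not prescribed by Nerve.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Let through only records below a given level."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level


def build_logger(name: str = "my-workload") -> logging.Logger:
    """Create a logger that writes only to stdout and stderr (no log files)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False      # avoid duplicate output via the root logger
    logger.handlers.clear()       # make repeated calls idempotent

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Debug, info and warning messages go to stdout.
    out = logging.StreamHandler(sys.stdout)
    out.setLevel(logging.DEBUG)
    out.addFilter(MaxLevelFilter(logging.ERROR))
    out.setFormatter(formatter)

    # Error and critical messages go to stderr.
    err = logging.StreamHandler(sys.stderr)
    err.setLevel(logging.ERROR)
    err.setFormatter(formatter)

    logger.addHandler(out)
    logger.addHandler(err)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("workload started")
    log.error("ERROR: sensor read failed")
```

Because nothing is written to a file inside the container, all messages end up on the two streams that are collected.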

To display logs of a certain workload, collected logs can be filtered in the Docker workload logging dashboard in OpenSearch.

Docker workload logging dashboard

Docker workload monitoring

Metadata of the overall state of Docker workloads is gathered in the Management System. A list of installed containers and their resource utilization is displayed in this dashboard. Tick the checkbox next to DOCKER WORKLOAD MONITORING to enable the tracking of Docker workload data.

Docker workload monitoring dashboard

Item Description
Running Containers This is a list of user-installed Docker containers with details.

Name
This is the name of the Docker container as defined with the Container name setting when provisioning the workload.

Serial Number
This is the serial number of the current node.

CPU usage [%]
This is an average value of how much of the total CPU a Docker workload has used.

DiskIO
This is the sum of reads and writes over the defined timespan.

Mem [%]
This is an average percentage of how much of the total memory a Docker workload has used.

Mem RSS[B]
This is an average value of how much resident set size (RSS) memory a Docker workload has used. Refer to this link for a general explanation on RSS.
CPU Usage This is a graph of CPU usage in percentage over time. Note that the percentages here are in relation to the total amount of available CPU. Also, the display behaves according to standard OpenSearch Dashboard behavior, meaning that CPU usage might be displayed as being at zero even though the CPU is busy. This is due to the graph showing only new data coming in and disregarding values that stay constant over a certain amount of time.
Containers Network IO This is a graph showing the incoming and outgoing data for each container over time. Inbound and outbound traffic are marked separately per container.
Memory Usage This is a graph of the total amount of memory used over time.

Audit logs for the Management System

Audit logs are a systematic, chronological record of events, activities, or transactions within a system. They capture a wide range of information, including who performed an action, what the action was, when it occurred, and what its outcome was. As such, they are a distinct security capability that provides forensic traceability of past actions, and they are crucial for purposes such as security, compliance, troubleshooting, and performance monitoring.

The audit logs for the Management System are presented in an OpenSearch dashboard.

Note

  • Only users with the Admin role or the UI_SERVER_AUDIT:VIEW permission assigned to their role can access the audit logs.
  • Audit logs are retained for 6 months.

Access the dashboard within the SYSTEM INFO section in the Management System:

  1. Access the Management System.
  2. Select SYSTEM INFO at the bottom-left.

    Select system info

  3. Select the Logs tab.

  4. Select VIEW AUDIT LOGS.

    Select view audit logs

An OpenSearch Dashboard window will open in a new browser tab, showing relevant log entries.

Audit logs dashboard

Inspect a log entry to show the full range of fields. The fields contain the following information:

Field Description
@timestamp Timestamps are shown in the format MMM DD, YYYY @ hh:mm:ss.sss. The time is taken from the browser's time zone settings. 
Additional info This category is optional. Depending on the event logged, additional information will be displayed here. The information contained in this field can be used to ease searching for additional logs. Examples of information that could be shown:
  • Node serial numbers
  • Workload ID, workload version ID, workload name and workload version name for log entries relating to workload operations.
  • Remote connection name and remote connection type for log entries relating to remote connections.
Category This states the area the log applies to. Examples of categories are:
  • Access control
  • Workload
  • Deploy
  • DNA
Event ID An internal code that represents the most important details of a log entry. It consists of:
  • Source of the audit logs
  • Object to which the event is related
  • Category of the event
  • Unique ID of the specific action
Refer to Event ID code below for a breakdown of the possible code variations.
Host This shows where the log message originates from. It can be a Management System URL or a node serial number.
Message This is the main information field. It describes the actual event in written text and can contain detailed information like error messages, IDs, image paths and more. Refer to the additional info field if the content of the message field is insufficient.
Result This contains the event result and can be either Success or Fail.
Security level This contains an estimation of the security risk of the event and can be either Low, Medium or High.
Source This shows the user, process or component that triggered the event. It will show:
  • User names if the event was triggered by a user in the Management System
  • Nerve Management System if the event is a response from the Management System.
  • nerve-ovdm if the event is a response from a node.
  • An empty field if none of the above apply.
Type of action This field describes the type of event that has occurred. Examples are:
  • Login
  • Workload creation
  • Workload deploy
  • Modified target configuration
Refer to the table below for a full list of events that are being logged.
_id This is a field generated by OpenSearch. It contains a unique identifier that is attached to the log.
_index This is a field generated by OpenSearch. It contains the name of the index that all Management System logs are stored under.
_score This is a field generated by OpenSearch. It is not used for audit logs and will not contain any information.
_type This is a field generated by OpenSearch. It is not used for audit logs and will not contain any information.
label This field shows the label of the log that states the type of log. In the case of audit logs, it will contain audit.
level This field shows the nature of the log. It differentiates between logs that contain information and logs that contain error messages. It can contain info or error.
timestamp This is another timestamp generated internally by OpenSearch.
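When a value from the @timestamp field is copied out of the dashboard for further processing, it can be parsed with the Python standard library. The sketch below assumes that hh is a 24-hour value, that month abbreviations are in English (the default C locale) and that the fractional part has three digits. The function name and the sample value are illustrative only.

```python
from datetime import datetime

# strptime pattern matching "MMM DD, YYYY @ hh:mm:ss.sss" as shown in the dashboard.
DASHBOARD_TS_FORMAT = "%b %d, %Y @ %H:%M:%S.%f"


def parse_dashboard_timestamp(text: str) -> datetime:
    """Parse a timestamp copied from the dashboard.

    The result is a naive datetime: the dashboard shows the time in the
    browser's time zone, which is not part of the displayed string.
    """
    return datetime.strptime(text.strip(), DASHBOARD_TS_FORMAT)


if __name__ == "__main__":
    # Hypothetical sample value in the documented format.
    print(parse_dashboard_timestamp("Mar 07, 2024 @ 14:05:09.123").isoformat())
```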

Event ID code

The Event ID code is an eight-digit number in which each pair of digits signifies an aspect of the event. Its meaning is built up from left to right: the first digit pair determines the possible values of the second digit pair, and so on. The digit pairs stand for the following:

Event ID code pattern: NNXXYYZZ
  • NN
    The first and second digits identify the source of the audit logs.
  • XX
    The third and fourth digits identify the object to which the event is related.
  • YY
    The fifth and sixth digits identify the category.
  • ZZ
    The seventh and eighth digits are used for the unique ID of the specific action that occurred.
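The NNXXYYZZ pattern can be split mechanically. The following is a minimal Python sketch; the names `EventId` and `split_event_id` as well as the sample code are made up for illustration. The pairs are kept as strings so that leading zeros are preserved.

```python
from typing import NamedTuple


class EventId(NamedTuple):
    source: str    # NN: source of the audit logs
    obj: str       # XX: object to which the event is related
    category: str  # YY: category of the event
    action: str    # ZZ: unique ID of the specific action


def split_event_id(code: str) -> EventId:
    """Split an eight-digit Event ID code into its four digit pairs."""
    code = code.strip()
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an eight-digit code, got {code!r}")
    return EventId(code[0:2], code[2:4], code[4:6], code[6:8])


if __name__ == "__main__":
    # Hypothetical code, not taken from a real log entry.
    print(split_event_id("01020304"))
```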

Take a look at the flowchart below for possible digits. Keep in mind that the ID is constructed the following way: NN XX YY ZZ

Event ID code flowchart

Refer to the table below for the actions that the last two digits can signify. Note that the Message and Additional info fields in the audit logs dashboard give more context to each event:

Object Category ID Action
System Access control 01 Login
02 Logout
03 User registration
04 Management System language set
05 User profile updated
06 Personal user profile updated
07 User profile deleted
Workloads Deploy 01 Workload deployment initiated
02 Workload deployment validated and started
03 Workload deployment successful, failed or canceled
04 Workload deployment restarted (single)
05 Workload resources allocation initiated
06 Workload configuration files applied
07 Workload resources allocation updated
08 Workload deployment restarted (multiple)
CRUD 01 Workload creation initiated
02 Workload creation successful
03 Docker image download successful
04 Workload version creation initiated
06 Workload settings update initiated
07 Workload settings update successful
08 Workload version settings update initiated
09 Workload version settings update successful
10 Workload deleted
11 Workload version deleted
Workload control 01 Workload undeployed
02 Workload suspended
03 Workload resumed
04 Workload stopped
05 Workload restarted
06 Workload started
DNA 01 Target configuration downloaded
02 Current configuration downloaded
03 Reconfiguration initiated by re-applying the target configuration
04 Reconfiguration cancelled
05 Reconfiguration initiated by applying a new target configuration
06 Reconfiguration finished, target configuration successfully applied
Virtual machine backup 01 Backup creation initiated or failed
02 Backup creation successful or failed
03 Backup creation retry initiated
Virtual machine snapshot 01 Snapshot creation initiated
02 Snapshot creation successful or failed
03 Snapshot deletion initiated
04 Snapshot deletion successful or failed
05 Snapshot reversion initiated
06 Snapshot reversion successful or failed
07 Snapshot schedule configured
08 Scheduled snapshot successful or failed
09 Snapshot schedule deletion initiated
10 Snapshot schedule deletion successful or failed
Remote connections 01 Remote connection to workload version or node established or failed
02 Remote connection pending to connect
03 Remote connection connected
05 Disconnection of a remote connection from a workload version or node successful or failed
06 Approval of remote connection to workload version or node successful or failed
07 Remote connection terminated

Management System logs

Management System logs can be accessed in the SYSTEM INFO section by users with the Admin role or the UI_SERVER_AUDIT:VIEW permission assigned to their role. These internal logs are aimed at Nerve service technicians in case of error and failure. Data is stored with OpenSearch and visualized with OpenSearch Dashboard.

Access the Management System logs the following way:

  1. Access the Management System.
  2. Select SYSTEM INFO at the bottom-left.

    Select system info

  3. Select the Logs tab.

  4. Select VIEW MANAGEMENT SYSTEM LOGS.

    Select MS logs

A new tab opens, displaying the Management System logs in an OpenSearch dashboard.

Management System logs dashboard

Setting up alerts

Users can monitor specific events and receive alerts when critical events are detected. The instructions below explain how to create alerts and choose a notification method for specific events detected in a Docker workload.

Note

This section mostly concerns how to use and configure OpenSearch in the context of the Nerve system. Therefore, refer to the OpenSearch documentation for detailed information that is not covered in the instructions below.

The instructions are split up the following way to make them easier to follow:

  1. Setting up a notification channel
    A notification channel is set up to use in an action. Inside the channel, the sender and recipients are defined. This section is split up further into creating an SMTP sender, a recipient group and a notification channel.

  2. Setting up a monitor
    A monitor is a job that runs on a defined schedule and queries OpenSearch indexes. The results of these queries are then used as input for one or more triggers.

  3. Setting trigger and action
    Triggers are conditions that generate alerts when they are met. Each trigger has an action that sends out information once the trigger fires. Actions have a destination, a message subject and a message body.

Since the instructions below cover steps performed in OpenSearch, OpenSearch needs to be accessed in the Nerve system to proceed:

  1. Log in to the Management System.
  2. Select a node from the node tree.
  3. Select the Logs tab.

    Select logs tab

  4. Select any of the dashboard links to access OpenSearch.

    Select any dashboard link

Setting up a notification channel

To send automated notifications from OpenSearch, a notification channel has to be defined. This example shows how to set up an e-mail notification channel. Before setting up a notification channel, a sender and recipient group have to be defined, which is covered in separate steps below.

Creating an SMTP sender

It is recommended to use SSL and to store the username and password of the sender in the OpenSearch keystore.

  1. Select the burger menu in the upper-left.
  2. Select Notifications under OpenSearch Plugins.

    Select notifications

  3. Select Email senders.

  4. Select Create SMTP sender on the right.

    Create SMTP sender

  5. Enter the following information:

    Setting            Value
    Sender name        nerve-alerts
    Email address      alerts@nerve.cloud
    Host               mail.nerve.cloud
    Port               587
    Encryption method  TLS
  6. Select Create.

    Select create

The SMTP sender is now created. However, the SMTP sender must be authenticated by Nerve service technicians. Contact customer support through the TTTech Industrial support portal to have the SMTP sender authenticated.

Creating a recipient group

Individuals or a group of people that should be notified about specific events are added to recipient groups.

  1. Select the burger menu in the upper-left.
  2. Select Notifications under OpenSearch Plugins.

    Select notifications

  3. Select Email recipient groups.

  4. Select Create recipient group on the right.

    Create recipient group

  5. Enter the following information:

    Setting Value
    Name Enter a name for the recipient group. This example uses documentation.
    Description This field is optional. Enter a description to give more information about the recipient group.
    Emails Add the email addresses of the recipients that should receive alerts.
  6. Select Create.

    Select create

Creating a notification channel

Notification channels can be set up in multiple ways. This example sets up an e-mail notification channel using SMTP. Refer to OpenSearch documentation for more information on other types of notification channels.

  1. Select the burger menu in the upper-left.
  2. Select Notifications under OpenSearch Plugins.

    Select notifications

  3. Select Channels.

  4. Select Create channel on the right.

    Create channel

  5. Enter the following information:

    Category Settings and values
    Name and description Name
    Enter a name for the channel. This example uses docs-channel.

    Description
    This field is optional. Enter a description to give more information about the channel.
    Configurations Channel type
    Select Email from the drop-down menu.

    Sender type
    Select SMTP sender.

    SMTP sender
    Select the sender that was created in Creating an SMTP sender above from the drop-down menu. This example uses nerve-alerts.

    Default recipients
    Select the recipient group that was created in Creating a recipient group above from the drop-down menu. This example uses documentation.
  6. Select Create.

    Select create

Once created, the channel is automatically activated. It is suggested to temporarily mute a channel while a known error is being fixed. While a system is being repaired, its logs will likely produce further error messages that would be detected. Muting the channel helps avoid unnecessary alerts in the meantime. To mute a channel, tick the checkbox next to the channel and select Mute from the Actions drop-down menu.

Muting a channel

Setting up a monitor

Note

This example uses a query monitor to filter log messages by their contents. There are other monitor types that can be used as well. Refer to OpenSearch documentation for more information on other types of monitors.

When setting up a monitor, a query must be defined that detects words and phrases in log messages. This query serves as the basis of the alert. It is written in query domain-specific language (Query DSL), a flexible search language with a JSON interface that is provided by OpenSearch. Refer to OpenSearch documentation for more information on how to write queries in Query DSL.

For this example, the query is set up to find specific keywords inside custom logs from a container. The query used in the instructions below looks for the phrase ERROR in all log messages. The phrase is set by the query parameter inside match_phrase. Replace its value with other words or phrases to search for different terms.

  1. Select the burger menu in the upper-left.
  2. Select Alerting under OpenSearch Plugins.

    Select Alerting

  3. Select the Monitors tab.

  4. Select Create monitor on the right.

    Create monitor

  5. Enter a Monitor name. This example uses docs-monitor.

  6. Select Per query monitor under Monitor type.
  7. Select Extraction query editor under Monitor defining method.

    Monitor details settings

  8. Enter filebeat* under Data source to collect data stored in all filebeat indexes.

  9. Copy the following script into Define extraction query.

    {
      "size": 1000,
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "@timestamp": {
                  "from": "{{period_end}}||-1m",
                  "to": "{{period_end}}",
                  "include_lower": true,
                  "include_upper": true,
                  "format": "epoch_millis",
                  "boost": 1
                }
              }
            },
            {
              "match_phrase": {
                "message": {
                  "query": "ERROR",
                  "slop": 0,
                  "zero_terms_query": "NONE",
                  "boost": 1
                }
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1
        }
      },
      "aggregations": {}
    }

    Copy query

As mentioned above, the query detects log messages that contain the phrase ERROR in the last minute. Since it is executed every minute, it will detect each error only once. Continue with the next section to configure a trigger with a corresponding action.
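If the search phrase changes often, the extraction query above can be generated instead of edited by hand. The following Python sketch rebuilds the same JSON document with the phrase and the look-back window as parameters. The function name `build_phrase_query` and its parameters are illustrative and not part of OpenSearch or Nerve. The `{{period_end}}` placeholders are passed through unchanged, as they are meant to be filled in by OpenSearch when the monitor runs.

```python
import json


def build_phrase_query(phrase: str, window: str = "1m", size: int = 1000) -> dict:
    """Return the extraction query from the example with a custom phrase and window."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {
                        # Only consider log messages from the last `window`.
                        "range": {
                            "@timestamp": {
                                "from": "{{period_end}}||-" + window,
                                "to": "{{period_end}}",
                                "include_lower": True,
                                "include_upper": True,
                                "format": "epoch_millis",
                                "boost": 1,
                            }
                        }
                    },
                    {
                        # Match the exact phrase in the message field.
                        "match_phrase": {
                            "message": {
                                "query": phrase,
                                "slop": 0,
                                "zero_terms_query": "NONE",
                                "boost": 1,
                            }
                        }
                    },
                ],
                "adjust_pure_negative": True,
                "boost": 1,
            }
        },
        "aggregations": {},
    }


if __name__ == "__main__":
    # Print a query that searches for a different phrase; paste the output
    # into Define extraction query.
    print(json.dumps(build_phrase_query("connection refused"), indent=2))
```

Note that the look-back window should stay in line with the monitor schedule. Otherwise messages may be counted more than once or missed.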

Setting trigger and action

The trigger condition controls when notifications are sent through the notification channel. The condition uses the results of the query that was set up in Setting up a monitor above.

When the trigger condition is met, an action is performed as a result. In this example, an e-mail is sent through the notification channel that was created in Setting up a notification channel above.

  1. Select Add trigger.

    Add trigger

  2. Enter a Trigger name. This example uses docs-trigger.

  3. Select the desired Severity level from the drop-down menu.
  4. Define a trigger condition. For this example, the condition is set to trigger if there are more than five errors detected in the last minute:

    ctx.results[0].hits.total.value > 5

    Trigger settings

  5. Scroll down to reach the Actions (1) section.

  6. Enter the following information:

    Setting Value and description
    Action name Enter a name for the action. This example uses alerting-action.
    Channels Select the channel that was created in Setting up a notification channel above. This example uses docs-channel.
    Message subject Enter a message subject that will be the subject of the e-mail notification.
    Message Enter a message in the Message field that will be the body of the e-mail notification. Some information using variables is already pre-filled. Refer to OpenSearch documentation for more information on how to use variables in the message.

    Action settings

  7. Select Create at the bottom of the page.

With this, a monitor based on a query is created that is executed every minute and detects all log messages containing a keyword or phrase. An alert is triggered if more than five messages per minute are detected, and an e-mail with error details is sent to a recipient group.
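The trigger condition itself is evaluated inside OpenSearch. To make its logic easier to follow, the sketch below mirrors the expression ctx.results[0].hits.total.value > 5 in Python against a hand-written sample result. The sample data and the function name are hypothetical and are trimmed to the fields the condition reads.

```python
def more_than_n_hits(results: list, threshold: int = 5) -> bool:
    """Python equivalent of: ctx.results[0].hits.total.value > threshold"""
    return results[0]["hits"]["total"]["value"] > threshold


# Hypothetical query result: seven matching log messages in the last minute.
sample_results = [{"hits": {"total": {"value": 7, "relation": "eq"}, "hits": []}}]

if __name__ == "__main__":
    print(more_than_n_hits(sample_results))
```

Because the comparison is strictly greater than, exactly five matching messages in one minute do not trigger an alert.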

Accessing workload logs in the Local UI

Logs can be accessed locally through the Local UI. There, logs are accessed separately for each Docker workload or Docker Compose service.

  1. Access the Local UI.
  2. Select Workload management in the navigation on the left.

    Select workload

  3. Select a Docker or Docker Compose workload.

  4. Select the Logs tab.

    Select logs tab

Log messages are displayed in a large message window. Take note of the following functions:

Item Description
Search bar Enter a string here to search the log messages for the entered string.
Download icon Select the Download icon to download the full logs of this container. This will download the logs as a single LOG file if the logs have not exceeded the limit for one file. Once the limit has been exceeded, the download will be a ZIP file containing multiple LOG files.
Copy icon Select the Copy icon to copy the container's logs into the clipboard. Note that this copies the last 500 lines of the log.
Pause icon Select the Pause icon to stop new logs from coming in. Select the icon again, now a Play icon, to resume the incoming logs.

Logs window

Note that for Docker Compose workloads, logs can be accessed for each service. Select a service from the list to display its logs.

Select service
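Since the Download icon yields either a single LOG file or a ZIP file containing multiple LOG files, a script that processes downloaded logs has to handle both cases. The following is a minimal Python sketch. The function names and file names are made up for illustration, and processing the files inside the ZIP in alphabetical order is an assumption about how the archive is organized.

```python
import zipfile
from pathlib import Path
from typing import Iterator, List


def iter_log_lines(download: Path) -> Iterator[str]:
    """Yield log lines from a downloaded LOG file or from a ZIP of LOG files."""
    if zipfile.is_zipfile(download):
        with zipfile.ZipFile(download) as archive:
            for name in sorted(archive.namelist()):
                if name.endswith("/"):
                    continue  # skip directory entries
                with archive.open(name) as member:
                    for raw in member:
                        yield raw.decode("utf-8", errors="replace").rstrip("\r\n")
    else:
        with download.open("r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\r\n")


def find_lines(download: Path, needle: str) -> List[str]:
    """Return all log lines that contain the given string."""
    return [line for line in iter_log_lines(download) if needle in line]
```

For example, find_lines(Path("container.zip"), "ERROR") returns all lines containing ERROR, regardless of how many LOG files the archive holds.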