MSolution Elasticsearch integration with Quantum SNFS

In the media and entertainment field, companies have special needs, especially for their data storage.

As many of them use the StorNext file system for its special characteristics, this article will explain how to monitor this particular file system and how to display information about its performance in the MSol product.

The StorNext file system is built to provide users with high availability and low latency when accessing stored data, which is why we need metrics on the file system's behavior over time to help prevent incidents.

The StorNext File System plug-in

How it works

The purpose of the plug-in is to make data readable in order to send it to an Elasticsearch database.

To do this, the plug-in uses the tools cvadmin and qustat included in the StorNext file system package.

The plug-in uses the cvadmin command to gather all the file system names. Once this is done, it uses the qustat command to gather all the information about these file systems.
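The two-step flow above can be sketched in Ruby, the language the plug-in is written in. Note that the sample listing and the parsing regex below are assumptions for illustration; the exact cvadmin output may differ between StorNext versions:

```ruby
# Sketch of the plug-in's discovery step: extract file system names
# from a cvadmin listing, then query each one with qustat.
# The sample output and the regex are hypothetical, not taken from the plug-in.

SAMPLE_LISTING = <<~OUT
  List FSS
  File System Services (* indicates service is in control of FS):
   1>*test1[0] located on mdc:46230 (pid 4242)
   2>*test2[0] located on mdc:46231 (pid 4243)
OUT

# Pick up the name between the "N>*" prefix and the "[...]" suffix.
def filesystem_names(listing)
  listing.scan(/^\s*\d+>\*?([\w.-]+)\[/).flatten
end

names = filesystem_names(SAMPLE_LISTING)
puts names.inspect  # => ["test1", "test2"]

# In the real plug-in, each name would then be passed to `qustat -f <name>`:
# names.each { |fs| stats = `qustat -f #{fs}` ... }
```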

The qustat command

Since the plug-in is included in our monitoring solution, I will explain what the qustat command does.

This command can be run on both the MDC and a client. For each file system passed on the command line, qustat outputs all the available statistics, for the MDC and for all the clients connected to that file system.

Among them you can access statistics such as the CPU time and page faults of the MDC process, the latency of file system accesses (whether read or write), the amount of data read from or written to a disk, the number of errors when accessing a file system or a physical drive, and many other values. Each of these statistics carries several fields:

  • a type (level, time, sum, counter)
  • a counter (the number of times the event happened)
  • a minimum value (for level and time data types)
  • a maximum value (for level and time data types)
  • a total or sum (for sum and time data types)
  • an average value (the total divided by the counter)
  • a timestamp (the same for all metrics of a single qustat call)
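A minimal Ruby sketch of turning one qustat table row into these fields (the fixed-width column layout is assumed from the sample transcript below; it is not documented behavior):

```ruby
# Parse one qustat statistics row into the fields listed above.
# The column layout (NAME, TYP, COUNT, MIN, MAX, TOT/LVL, AVG) is
# assumed from sample output and may vary between qustat versions.

TYPES = { "TIM" => :time, "SUM" => :sum, "LVL" => :level, "CNT" => :counter }

def parse_stat(line)
  m = line.match(/^(.+?)\s+(TIM|SUM|LVL|CNT)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$/)
  return nil unless m
  { name:  m[1],           # statistic name, may contain spaces
    type:  TYPES[m[2]],
    count: m[3].to_i, min: m[4].to_i, max: m[5].to_i,
    total: m[6].to_i, avg: m[7].to_i }
end

stat = parse_stat("IO Tm Wrt       TIM     6816029           81      1457121   4657352952          683")
puts stat.inspect
```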

Some metrics make no sense unless they are linked to a machine. For example, each client will show a different delay when performing operations such as creating or removing a file. That is why all of these metrics are grouped by field and by host machine in different tables. For example:

root@MDC ~ # qustat -f test1

# QuStat Rev 4.7.2
# Host MDC
# Module FSM
# Group test1
# Recorded time_t=1446488917 2015-11-02 10:28:37 PST

# Table 1: VOPS
# Last Reset: Secs=1714690 time_t=1444774227 2015-10-13 15:10:27 PDT
# NAME          TYP       COUNT          MIN          MAX      TOT/LVL          AVG
VOP Expand Ino  TIM           1         3075         3075         3075         3075

# Table 2: FSM Core Operations
# Last Reset: Secs=1714690 time_t=1444774227 2015-10-13 15:10:27 PDT
# NAME          TYP       COUNT          MIN          MAX      TOT/LVL          AVG
IO Tm Wrt       TIM     6816029           81      1457121   4657352952          683
IO Tm Rd        TIM     6816039           82      3056844   3720453448          546
Ino Tm Create   TIM           3            5         2341         2659          886
Ino Tm Wrt      TIM           2          144          376          520          260
Ino Tm GetFree  TIM           5            1            7           11            2
Ino Wrt Gang Sz SUM           2            1            3            4            2
Ino Free in mem LVL       32782            1        32768        32768            0
Ino Free in FS  LVL           2          508         1018         1018            0
Ino Total in FS LVL           2          512         1024         1024            0
ICa Hits        CNT           2            1            1            2            1
ICa Misses      CNT           5            1            1            5            1
BufCa Tm Get    TIM           7            1            1            7            1
BufCa Tm Wrt    TIM           9          121        15152        16410         1823
BufCa n Ents    LVL    34241188            1         2048         2048            0
BufCa Hits      CNT    17119558            1            1     17119558            1
BufCa Misses    CNT           7            1            1            7            1
RESV Sp Blocks  LVL           1            0            0            0            0
RESV Sp Floor   LVL           1            0            0            0            0
CON Total       LVL      269515            0            1            1            0

...

Data and data format output of the plug-in

The StorNext File System plug-in retrieves how long high-priority reads and writes take on the different StorNext file systems managed by the current MDC.

The data format is ${HOSTNAME}.${CHECK_NAME}.${FILE_SYSTEM_NAME}.${METRIC_NAME} ${METRIC_VALUE} ${TIMESTAMP}. Here is a sample of the plug-in output:

root@MDC ~ # /etc/sensu/plugins/snfs/snfs-latency.rb
MDC.snfs-latency.test1.write_min 78 1446489288
MDC.snfs-latency.test1.write_max 1457117 1446489288
MDC.snfs-latency.test1.write_avg 681 1446489288
MDC.snfs-latency.test1.read_min 79 1446489288
MDC.snfs-latency.test1.read_max 3056840 1446489288
MDC.snfs-latency.test1.read_avg 542 1446489288
MDC.snfs-latency.test1.records 2 1446489288
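Producing these lines is mostly string formatting. Here is a minimal sketch, assuming metrics have already been extracted from qustat; the `emit` helper name is hypothetical, but the output format mirrors the sample above:

```ruby
# Emit metrics in the plug-in's Graphite-style plaintext format:
#   ${HOSTNAME}.${CHECK_NAME}.${FILE_SYSTEM_NAME}.${METRIC_NAME} ${METRIC_VALUE} ${TIMESTAMP}
# `emit` is an illustrative helper, not the actual plug-in code.

def emit(host, check, fs, metrics, timestamp)
  metrics.map { |name, value| "#{host}.#{check}.#{fs}.#{name} #{value} #{timestamp}" }
end

lines = emit("MDC", "snfs-latency", "test1",
             { "write_min" => 78, "write_max" => 1457117, "write_avg" => 681 },
             1446489288)
puts lines
# MDC.snfs-latency.test1.write_min 78 1446489288
# MDC.snfs-latency.test1.write_max 1457117 1446489288
# MDC.snfs-latency.test1.write_avg 681 1446489288
```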

Once this is done, the data is sent to the database where it can be used to create graphs.

How to install the plug-in

You can find the plug-in in the Bitbucket repository.

The StorNext plug-in requires Ruby 1.9.0 or later (so you should not have any problems if the MSol client is already running).

This guide assumes you have an MSol server running with the Sensu and ELK stack and an MSol client is already installed and configured to communicate with the MSol server on the MDC host machine.

On the MDC host machine

We will assume the default installation path of the StorNext tools, /usr/cvfs; if your installation is located in a different folder, just adapt the following steps to your configuration.

The plug-in is executed by the sensu user, who does not have root privileges, while the cvadmin command is by default only runnable by the super-user.

Set the right permissions for the cvadmin command

We need to let the sensu user execute the cvadmin command inside the plug-in without any user interaction. You can add the setuid bit to the cvadmin binary; it will then execute with the privileges of its owner, root in a default installation:

sudo chmod u+s /usr/cvfs/bin/cvadmin

Get the plug-ins files

Go to the plug-in installation directory and clone the git repository (you need internet access).

cd /etc/sensu/plugins
sudo git clone https://bitbucket.org/msolutionio/msol-sensu-plugins.git /etc/sensu/plugins

Establish the communication

Now we will tell the MSol server that this client machine needs to run the StorNext File System plug-in, which means activating the StorNext-related checks.

You just have to edit the client.json file and add "snfs" to the subscriptions list:

sudo vi /etc/sensu/conf.d/client.json
{
  "client": {
    "name": "MDC_name",
    "address": "your_mdc_address",
    "subscriptions": [ "linux", "snfs" ]
  }
}

If you already have other subscriptions, keep them and simply append "snfs" to the list (JSON does not allow comments, so the note goes here rather than in the file).

Restart the sensu-client service with:

sudo service sensu-client restart

Now the new checks should appear for this machine in the MSol monitoring interface.

Display the data in Kibana

Kibana is an analytics and visualization platform. It displays a real-time graphical summary of streaming data and lets users analyse their data in several ways to best fit their interests.

Configure the index

We will tell Kibana which data will be used.

Go to the logs tab of the MSol web interface.

In Kibana, go to the Settings tab. If it is not configured yet, configure an index pattern with the pattern logstash-*. Choose the @timestamp field in the Time-field name list.

You should end up with a configuration like this:

[Screenshot: index pattern configuration]

Create the new index.

If you already had a previous index, just reload the configuration with the orange refresh button to update the field list.

Check your request and your data

Now that we have configured which data we want to manipulate in Kibana, we want to make them human readable and easy to understand.

Go to the Discover tab of Kibana.

To gather and exploit specific logs and metrics, we will have to create a request to ask Kibana to filter the data.

Write a query to filter the logs, for example check_name:"snfs-latency". If you make a syntax mistake in your query, Kibana will tell you.

You should have a view like this:

[Screenshot: Kibana Discover view]

Now that we are sure we have data matching our request, we can make a graph.

Create a graph

Select the data you want

We will graph the maximum write latency for the different StorNext file systems we have.

Go to the Visualize Kibana tab.

Select the type of graph you want (here we choose a line chart).

Here our query will be check_name:"snfs-latency" AND metric_name:"write_max".

Select the value which will be used to make your graph

Add a Y-axis: in the Aggregation field select Average, and in the Field list select the value you want to display on the graph (here the metric_value field).

In the buckets section, add an X-axis: select Date Histogram in the sub-aggregation list, then select the @timestamp field in the Field list.

If you click on the run button you should get a graph, but each point is an average across all your StorNext file systems. So we need to split the data according to our file system names.

Separate your file system data

Click on add sub-buckets and select Split Lines. In the Aggregation field choose Terms. In the Field list, choose the field that will differentiate your data (fields are listed by type, then in alphabetical order); here we choose filesystem_name.

Order your values

You can order your values with the Order, Size and Order By fields.

Click on the run button and the graph should appear:

[Screenshot: resulting line chart]

You can save your visualization with the toolbox on the top right of the Kibana tab in order to insert it into a dashboard.

Create a dashboard

Go to the Dashboard Kibana tab.

You can create a new dashboard, or add your previously saved visualization to an existing one. Just use the toolbox on the top right of the Kibana tab.

Once your dashboard is ready, you can save it and export it from the Settings tab, Objects sub-tab. Don't forget to export your visualizations along with your dashboard, or it will not work when you import it on another machine.

Here is an example of a dashboard with the different StorNext file system latency metrics:

[Screenshot: example dashboard]
