An overview of using the STATISTICA Neural Networks package

Algorithms for building a neural network model in STATISTICA

  • Pre- and post-processing, including data selection, nominal-value coding, scaling, normalization and handling of missing data, with result interpretation for classification, regression and time series problems;
  • Exceptional ease of use combined with great analytical power; for example, the unique Solution Wizard guides the user through all stages of creating various neural networks and selects the best one (a task that otherwise is solved through a long process of trial and error and requires a thorough knowledge of the theory);
  • Powerful exploratory and analytical techniques, including Principal Component Analysis and dimensionality reduction for choosing input variables in exploratory (neural network) data analysis (selecting the right input variables for a neural network often takes a long time; STATISTICA Neural Networks can do this work for the user);
  • The most advanced, optimized and powerful network training algorithms (including conjugate gradients and Levenberg-Marquardt); full control over all parameters affecting network quality, such as activation and error functions and network complexity;
  • Support for ensembles of neural networks and for neural network architectures of almost unlimited size, organized in Network Sets; selective training of network segments; merging and storing sets of networks in separate files;
  • Full integration with the STATISTICA system; all results, graphs, reports, etc. can be further processed with the powerful graphical and analytical tools of STATISTICA (for example, to analyze predicted residuals or create a detailed report);
  • Seamless integration with the powerful automation tools of STATISTICA: recording full-fledged macros for any analysis; creating your own neural network analyses and applications with STATISTICA Visual Basic; calling STATISTICA Neural Networks from any application that supports COM technology (for example, automating neural network analyses in an MS Excel spreadsheet, or combining several custom applications written in C++, C#, Java, etc.).


STATISTICA Neural Networks in neural network computing:

  • The use of neural networks involves much more than just data processing using neural network methods.
  • STATISTICA Neural Networks provides a variety of functionality for working with very complex tasks, including not only the latest neural network architectures and learning algorithms, but also new approaches to input variable selection and network building. In addition, software developers and users who experiment with application settings will appreciate that, after the experiments have been carried out in the simple and intuitive interface of STATISTICA Neural Networks, the neural network analyses can be combined into a custom application. This is achieved either with the STATISTICA COM function library, which fully mirrors all of the program's functionality, or with code in C (C++, C#) or Visual Basic generated by the program for running a fully trained neural network or network ensemble.

Initial data

The STATISTICA Neural Networks module is fully integrated with the STATISTICA system, so a huge selection of tools for editing (preparing) data for analysis is available (transformations, case selection conditions, data verification tools, etc.). Like all analyses in STATISTICA, the program can be "attached" to a remote database using in-place processing tools, or linked to live data so that models are trained or run (e.g., to calculate predicted values or to classify) automatically whenever the data change.

Input selection and dimensionality reduction

After the data are prepared, you have to decide which variables to use in the neural network. The greater the number of variables, the more complex the neural network will be, and therefore the more memory and training time it will require, as well as a larger number of training examples (observations). When data are scarce and/or the variables are correlated, choosing meaningful input variables and compressing the information into a smaller number of variables become issues of utmost importance in many neural network applications.


Dimensionality reduction algorithms:

  • STATISTICA Neural Networks implements backward and forward stepwise selection algorithms. In addition, the neuro-genetic input selection algorithm combines genetic algorithms with PNN/GRNN networks (PNN - probabilistic neural networks, GRNN - generalized regression neural networks) to search automatically for optimal combinations of input variables, even when they are correlated or nonlinearly related. The almost instantaneous training of the PNN/GRNN algorithm not only makes the neuro-genetic input selection algorithm practical, but also lets you conduct your own real-time data sensitivity experiments, using the data editor of STATISTICA Neural Networks as a convenient means of suppressing insignificant variables. STATISTICA Neural Networks also contains built-in Principal Component Analysis (PCA, plus associative networks for "nonlinear PCA"), which allows you to reduce the dimensionality of the source data (a minimal sketch of linear PCA is shown below). Note that a huge variety of statistical dimensionality reduction methods is available in the base STATISTICA system.
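To illustrate the idea behind PCA-based dimensionality reduction, here is a minimal sketch using NumPy. It only demonstrates the general technique, not the STATISTICA implementation; all names and data are illustrative.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the first n_components principal components."""
    X_centered = X - X.mean(axis=0)                 # center each variable
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                  # leading principal directions
    explained = (S ** 2) / np.sum(S ** 2)           # explained variance ratios
    return X_centered @ components.T, explained[:n_components]

# Example: compress 10 (possibly correlated) inputs down to 3 components
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Z, ratios = pca_reduce(X, 3)
print(Z.shape, ratios)
```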


Data scaling and nominal value conversion:

  • Before data are fed into the network, they must be prepared in a certain way, and it is equally important that the output data can be interpreted correctly. STATISTICA Neural Networks can automatically scale input and output data (including minimum/maximum and mean/standard deviation scaling); variables with nominal values (for example, Gender = {Male, Female}) can also be recoded automatically, including with the 1-of-N coding method. STATISTICA Neural Networks also contains tools for working with missing data. Normalization functions such as "unit sum", "winner takes all" and "unit-length vector" are implemented, and there are data preparation and interpretation tools designed specifically for time series analysis. A wide variety of similar tools is also implemented in the base STATISTICA system.
  • In classification problems, confidence limits can be set, which STATISTICA Neural Networks then uses to assign observations to one class or another. In combination with the softmax activation function and the cross-entropy error function implemented in STATISTICA Neural Networks, this provides a principled, probability-theoretic approach to classification (a minimal sketch of these transformations is shown below).
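A minimal sketch of these preprocessing and output transformations (minimum/maximum scaling, 1-of-N encoding, softmax with cross-entropy), assuming NumPy and made-up data; this only illustrates the general techniques, not the package's internal code.

```python
import numpy as np

def minmax_scale(x, lo=0.0, hi=1.0):
    """Rescale a numeric variable to the [lo, hi] interval."""
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

def one_of_n(values, categories):
    """1-of-N encoding of a nominal variable, e.g. Gender = {Male, Female}."""
    return np.array([[1.0 if v == c else 0.0 for c in categories] for v in values])

def softmax(z):
    """Softmax output activation: turns raw outputs into class probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, target_index):
    """Cross-entropy error for one observation with the given true class."""
    return -np.log(p[target_index])

print(minmax_scale(np.array([3.0, 7.0, 11.0])))
print(one_of_n(["Male", "Female", "Female"], ["Male", "Female"]))
probs = softmax(np.array([2.0, 0.5, -1.0]))
print(probs, cross_entropy(probs, 0))
```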

Selecting a neural network model, Network ensembles

The variety of neural network models and the many parameters that have to be set (network sizes, learning algorithm parameters, etc.) can confuse users (which is why the Solution Wizard exists: it can automatically search for a suitable network architecture of any complexity).


The STATISTICA Neural Networks system implements all the main types of neural networks used in solving practical problems, including:

  • multilayer perceptrons (feedforward networks);
  • networks based on radial basis functions;
  • self-organizing Kohonen maps;
  • probabilistic (Bayesian) neural networks;
  • generalized regression neural networks;
  • principal component networks;
  • networks for clustering;
  • linear networks.
The STATISTICA Neural Networks system also implements Network Ensembles, formed from arbitrary (but meaningful) combinations of the networks listed above. Another handy feature is that networks can be linked together so that they run sequentially; this is useful, for example, for preprocessing and for finding minimum-cost solutions.

Numerous tools are available in the STATISTICA Neural Networks package to help the user select an appropriate network architecture. The system's statistical and graphical tools include histograms, matrices and error plots for the whole data set and for individual observations, and summaries of correct/incorrect classification; all important statistics (for example, the explained proportion of variance) are calculated automatically.

For data visualization, the STATISTICA Neural Networks package implements scatterplots and 3D response surfaces that help the user understand the "behavior" of the network.
Of course, any information obtained from these tools can be used for further analysis with other STATISTICA facilities, as well as included in reports or used for customization.

STATISTICA Neural Networks automatically remembers the best network among those obtained while experimenting with a task, and you can return to it at any time. The usefulness of the network and its predictive ability are automatically tested on a held-out test set of observations, as well as by estimating the size of the network, its efficiency and the cost of misclassification. The automatic cross-validation and Weigend weight-decay regularization procedures implemented in STATISTICA Neural Networks let you quickly find out whether your network is too simple or, on the contrary, too complex for the given task.

Numerous network configuration options are provided in the STATISTICA Neural Networks package to improve performance. For example, you can specify a linear output layer for regression problems, or a softmax activation function for probability estimation and classification problems. If your data contain many outliers, then when training the network you can replace the standard error function with the less outlier-sensitive "city block" function (see the sketch below). The system also implements cross-entropy error functions based on information-theoretic models and a number of special activation functions, including step, sawtooth and sine.
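To see why the "city block" error is less sensitive to outliers than the standard sum-of-squares error, here is a small illustrative comparison (a sketch of the general idea, with made-up numbers):

```python
import numpy as np

def sum_squared_error(y_true, y_pred):
    """Standard error function: large residuals (outliers) dominate."""
    return np.sum((y_true - y_pred) ** 2)

def city_block_error(y_true, y_pred):
    """"City block" error (sum of absolute deviations): less sensitive to outliers."""
    return np.sum(np.abs(y_true - y_pred))

y_true = np.array([1.0, 2.0, 3.0, 50.0])   # the last value is an outlier
y_pred = np.array([1.1, 1.9, 3.2, 4.0])
print(sum_squared_error(y_true, y_pred))   # dominated by the single outlier
print(city_block_error(y_true, y_pred))    # grows only linearly with it
```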


Solution Wizard (automatically evaluates the problem and selects several networks of different architectures):

  • Part of the STATISTICA Neural Networks package is the Solution Wizard (Intelligent Problem Solver), which evaluates many neural networks of various architectures and complexity and selects the networks with the best architecture for the given task.
  • The Wizard can build networks for data with independent observations (standard regression networks, classification networks, or mixed ones), as well as networks designed to predict future values of a variable from existing values of the same variable (time series networks).
  • When creating a neural network, considerable time is spent on selecting appropriate variables and optimizing the network architecture using heuristic search. STATISTICA Neural Networks takes over this work and automatically performs the heuristic search for you. This procedure takes into account the input dimension, network type, network dimensions, and required output encoding functions.
  • During the search you can set how much feedback is reported during the training process. In the most detailed mode the Solution Wizard displays the architecture and quality measures of every network it tests.
  • The Solution Wizard is an extremely effective tool for applying these complex techniques, since it finds the best network architecture automatically. Instead of spending many hours in front of the computer, let STATISTICA Neural Networks do this work for you.
  • The automatic network constructor can also be used during model development, when the STATISTICA Neural Networks module, together with other modules of the base STATISTICA system, is used to identify the most significant variables (for example, the best predictors to be subsequently included and tested in a Nonlinear Estimation model).


Neural network training:

  • The success of your experiments in finding the best network type and architecture depends significantly on the quality and speed of the network learning algorithms. The best training algorithms available to date are implemented in STATISTICA Neural Networks.
  • For training multilayer perceptrons, STATISTICA Neural Networks first of all implements the backpropagation method, with a time-varying learning rate and momentum coefficient, shuffling of the observations before each pass of the algorithm, and additive noise for robust generalization (a minimal sketch of this update rule is shown after this list). In addition, two fast second-order algorithms are implemented: conjugate gradients and Levenberg-Marquardt. The latter is an extremely powerful modern nonlinear optimization algorithm and is strongly recommended by experts; however, its applicability is limited to relatively small networks with a single output neuron, and for more cumbersome tasks the package provides the conjugate gradient method. Both algorithms typically converge faster than backpropagation and usually give a better solution.
  • The iterative network training process in STATISTICA Neural Networks is accompanied by an automatic display of the current training error and the error computed independently on the verification set; a plot of the overall error is also shown. Training can be interrupted at any time by pressing a button. In addition, stopping conditions can be set under which training is interrupted: for example, reaching a given error level, or a steady growth of the verification error over a given number of passes ("epochs"), which indicates overfitting of the network. If overfitting occurs, the user need not worry: STATISTICA Neural Networks automatically remembers the best network obtained during training, and this network can always be recalled with the corresponding button. After training is completed, the quality of the network can be checked on a separate test set.
  • A number of learning algorithms are also implemented in the STATISTICA Neural Networks package for networks of other architectures. The parameters of the radial units and the smoothing coefficients of radial basis function networks and generalized regression networks can be selected with algorithms such as Kohonen training, subsampling, the K-means method, and isotropic and nearest-neighbour deviation assignment. The neurons of the linear output layer of radial basis function networks, like those of linear networks, are fully optimized with the singular value decomposition (SVD) method.
  • Creation of hybrid network structures. STATISTICA Neural Networks makes it possible to create networks of mixed structure. For example, in a modified radial basis function network, the first layer of neurons can be trained with the Kohonen algorithm, and the second, nonlinear layer with the Levenberg-Marquardt method.
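The following sketch illustrates the kind of update rule described above for multilayer perceptrons: backpropagation with a momentum term, a decaying learning rate and shuffling of the observations on each pass. It is a minimal illustration of the technique, not the package's implementation; the network size and parameter values are arbitrary.

```python
import numpy as np

def train_mlp(X, y, n_hidden=6, epochs=100, lr0=0.1, momentum=0.3, seed=0):
    """Minimal 1-hidden-layer MLP trained by backpropagation with momentum,
    a decaying learning rate and per-pass shuffling (sum-of-squares error)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    dW1_prev = np.zeros_like(W1)
    dW2_prev = np.zeros_like(W2)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for epoch in range(epochs):
        lr = lr0 / (1.0 + epoch / 50.0)          # time-varying learning rate
        order = rng.permutation(len(X))          # shuffle observations each pass
        for i in order:
            x = X[i:i + 1]
            h = sigmoid(x @ W1)                  # forward pass: hidden layer
            out = h @ W2                         # linear output
            err = out - y[i:i + 1]               # residual
            # backward pass: gradients of the sum-of-squares error
            gW2 = h.T @ err
            gW1 = x.T @ ((err @ W2.T) * h * (1.0 - h))
            # momentum update: blend the current gradient with the previous step
            dW2 = -lr * gW2 + momentum * dW2_prev
            dW1 = -lr * gW1 + momentum * dW1_prev
            W2 += dW2
            W1 += dW1
            dW1_prev, dW2_prev = dW1, dW2
    return W1, W2

# Toy usage with a small made-up data set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])
W1, W2 = train_mlp(X, y, n_hidden=4, epochs=500)
```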


Neural network testing:

  • After the network is trained, you need to check the quality of its work and determine its characteristics. For this purpose the STATISTICA Neural Networks package offers a set of on-screen statistics and graphical tools.
  • If several models (networks and ensembles) are specified, STATISTICA Neural Networks will, where possible, display comparative results (for example, plot the response curves of several models on one graph, or present the predictions of several models in one table). This property is very useful for comparing different models trained on the same data set.
  • All statistics are calculated separately for the training, verification and test sets. All weights and activation parameters are available as a convenient text file, which can be converted into a STATISTICA results spreadsheet with one click. Results for individual observations or for the entire data set can also be viewed as a STATISTICA spreadsheet and used in further analyses or graphs.
  • The following summary statistics are calculated automatically: the root mean square error of the network, the confusion matrix for classification problems (summarizing all cases of correct and incorrect classification; a minimal sketch of how such a matrix is computed is shown below), and the explained proportion of variance for regression problems. Kohonen networks have a Topological Map window in which you can visually observe the activation of network elements and change the labels of observations and nodes during data analysis. There is also a Win Frequencies window that allows clusters in the topological map to be localized instantly. Cluster analysis can be performed by combining a standard-architecture network with the special cluster diagram of STATISTICA Neural Networks; for example, you can train a principal components network and plot the data projected onto the first two components.
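As an illustration of what the confusion matrix summarizes (rows: true classes, columns: predicted classes), here is a minimal sketch with made-up labels; it is not the package's own routine.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count cases: rows = true class, columns = predicted class."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(confusion_matrix(y_true, y_pred, 3))
# diagonal entries = correctly classified cases, off-diagonal entries = errors
```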

Editing, modification and serial connection of neural networks

STATISTICA Neural Networks includes intelligent tools that allow you to cut pieces out of existing networks and join several networks together. You can remove or add individual neurons, remove an entire layer from a network, and networks whose numbers of inputs/outputs match can be connected in sequence. Thanks to these features, the package supports techniques such as dimensionality reduction during preprocessing (using associative networks) and loss matrices (for minimum-loss decision making). A loss matrix is used automatically when working with probabilistic neural networks.

Ready-made solutions (custom applications using STATISTICA Neural Networks):

  • The simple and convenient interface of STATISTICA Neural Networks allows you to quickly create neural network applications for your own problems.
  • There may be situations where such solutions need to be integrated into an existing system, for example made part of a wider computing environment (these may be procedures developed separately and built into a corporate computing system).
  • Trained neural networks can be applied to new data (for prediction) in several ways. You can save the trained network or ensemble of networks (for example, to average predictions over multiple architectures) and then apply it to a new data set (for prediction, classification or forecasting). You can use the code generator to create program code in C (C++, C#) or Visual Basic automatically and then use it to score new data in any Visual Basic or C++ (C#) environment, i.e. embed a fully trained neural network in your own application (a sketch of what such generated code boils down to is shown below). Finally, all STATISTICA functionality, including STATISTICA Neural Networks, can be used as COM (Component Object Model) objects in other applications (for example, Java, MS Excel, etc.); for instance, you can embed automated analyses created in STATISTICA Neural Networks into MS Excel spreadsheets.
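The actual generated code is in C/C++/C#/Visual Basic; the sketch below only illustrates, in Python and with placeholder weights, the kind of computation such generated code performs: a hard-coded forward pass through the stored weights of a trained network.

```python
import math

# Placeholder weights and biases for a 2-4-1 multilayer perceptron;
# a real code generator would write out the trained values instead.
W1 = [[0.5, -0.3, 0.8, 0.1], [0.2, 0.7, -0.6, 0.4]]
B1 = [0.1, -0.2, 0.0, 0.3]
W2 = [0.9, -0.5, 0.3, 0.7]
B2 = 0.05

def run_network(x1, x2):
    """Forward pass of the stored network: sigmoid hidden layer, linear output."""
    hidden = []
    for j in range(4):
        s = x1 * W1[0][j] + x2 * W1[1][j] + B1[j]
        hidden.append(1.0 / (1.0 + math.exp(-s)))
    return sum(h * w for h, w in zip(hidden, W2)) + B2

print(run_network(0.4, 1.2))
```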


List of learning algorithms:

  • Backpropagation;
  • Levenberg-Marquardt;
  • Conjugate gradients;
  • Quasi-Newton;
  • Quick propagation;
  • Delta-bar-delta;
  • Pseudo-inverse;
  • Kohonen Training;
  • Nearest-class labeling;
  • Learning vector quantization (LVQ);
  • Radial (sub)sampling;
  • K-means method;
  • K-Nearest Neighbors (KNN) method;
  • Isotropic deviation assignment;
  • Explicit deviation assignment;
  • Probabilistic neural network (PNN) training;
  • Generalized regression neural network (GRNN) training;
  • Genetic algorithm for selecting input data;
  • Stepwise forward or backward input selection.

System requirements

The STATISTICA Neural Networks system can run even on relatively weak or old computers. However, since many of the package's procedures are computationally intensive, a Pentium processor with at least 32 MB of RAM is strongly recommended.


Network size restrictions:

  • A neural network can be of almost any size (that is, its dimensions can be taken many times larger than is actually necessary or reasonable); up to 128 layers are allowed, with no restriction on the number of neurons. In practice, for any realistic task the program is limited only by the hardware capabilities of the computer.


E-Manual:

  • The STATISTICA Neural Networks system includes a well-illustrated textbook that provides a complete and clear introduction to neural networks, together with examples. A detailed, context-sensitive help system is available from any dialog box.


Source code generator:

  • The source code generator is an additional product that allows users to easily create their own applications based on the STATISTICA Neural Networks system. This add-on product creates the source code of the neural network model (as a C, C++ or C# file) that can be compiled separately and integrated into your own program for free distribution. It is designed specifically for developers of enterprise systems, as well as for users who need to embed highly optimized procedures created in STATISTICA Neural Networks into external applications to solve complex analytical problems.

Neural network methods are becoming increasingly widespread in a variety of fields.

Industry:

  • Process management (in particular, monitoring of production processes with continuous regulation of control parameters).
  • Classification of fuel samples (segmentation of fuel grades based on analysis of their spectra).
  • Technical diagnostics (using vibration and noise to identify mechanism faults at an early stage and schedule preventive repairs).
  • Engine control systems (assessing and controlling fuel consumption using sensor data).
  • Real-time detector switching systems in physics (neural networks are noise-resistant and can exploit robust patterns in physical data with large statistical noise).


Marketing:

  • Gold price forecasting;
  • Forecasting prices for raw materials;
  • Direct mail marketing.


Finance:

  • Assessment of creditworthiness (the classic task is to determine from personal data whether a given borrower is reliable).
  • Forecasting financial time series.


Geological exploration:

  • Increasing the efficiency of the mining process (identifying significant factors affecting mining efficiency indicators).


Other industries:

  • Optical character recognition, including signature recognition;
  • Image processing;
  • Forecasting chaotic time series;
  • Medical diagnostics;
  • Speech synthesis;
  • Linguistic analysis.

Statistica Neural Networks (SNN) package

  1. Open the Series_g data file from the data supplied with the package. The file contains a single variable recording traffic volume over several years, with monthly observations. (When you open this file, a number of tables related to the Intelligent Problem Solver option appear; at this stage they should be closed, leaving only the source data table.)
  2. Set the variable type to input/output as follows: select the variable by clicking on the table header, right-click and choose Input / Output from the menu. The variable name will be highlighted in green.
  3. Create a new network using the Create Network dialog box. To do this, choose File – New – Network. The dialog box shown in Fig. 1 appears on the screen.

Fig. 1. Network creation dialog box

In a time series prediction problem, the network must know how many copies of one variable to take as inputs and how far ahead to predict its value. In this task, set the Steps (time window) parameter to 12, because the data are monthly observations, and the Lookahead parameter to 1.
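The Steps/Lookahead settings effectively turn the single series into a table of lagged inputs and a target value. A minimal sketch of this construction (illustrative code, not the SNN implementation) follows:

```python
import numpy as np

def make_time_windows(series, steps=12, lookahead=1):
    """Turn a 1-D series into (inputs, target) pairs: each row contains
    `steps` consecutive values, the target is the value `lookahead` ahead."""
    X, y = [], []
    for i in range(len(series) - steps - lookahead + 1):
        X.append(series[i:i + steps])
        y.append(series[i + steps + lookahead - 1])
    return np.array(X), np.array(y)

series = np.arange(1, 21, dtype=float)     # stand-in for the Series_g data
X, y = make_time_windows(series, steps=12, lookahead=1)
print(X.shape, y.shape)                    # (8, 12) input blocks, 8 targets
```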

  1. Select Multilayer Perceptron as the network type and accept the number of network layers as 3. After that, click the Advice button, as a result of which the program will automatically set the number of neurons in all three layers of the network: 12 – 6 – 1 (Fig. 2).

Fig. 2. Dialog box after setting network parameters

Then press the Create button.

  1. When the network is created, SNN automatically assigns the first 12 observations in the data file the type Ignore. During further training and use of the network on a time series task, each data block fed to its input contains data from several observations; the entire block is assigned to the observation that contains the value of the output variable. Consequently, the first 12 observations are not actually ignored: they serve as inputs to the first block of time series data, which corresponds to observation 13. In effect, the program builds a transformed data set in which the number of observations is 12 smaller, but the data in each observation are taken from 13 consecutive rows of the source file.

The created network is shown in Fig. 3.

Fig. 3. Three-layer perceptron

  1. In the Data Set Editor window, set 66 training (Training) and 66 verification (Verification) observations (Fig. 4), then shuffle the rows through the menu: Edit – Cases – Shuffle – All.
  2. Train the network with the Levenberg-Marquardt method by choosing Train – Multilayer Perceptron – Levenberg-Marquardt. The training procedure takes a few seconds (depending on the processor). The Levenberg-Marquardt method is a reliable and fast learning algorithm, but its use is subject to certain limitations:

Fig. 4. Source data window with separated observations

  • the method can only be used for networks with one output element;
  • the method requires memory proportional to the square of the number of weights in the network, so it is not suitable for large networks (on the order of 1000 weights);
  • the method is applicable only with the root mean square error function.

The Levenberg-Marquardt algorithm is designed to minimize the root-mean-square error function. Near the minimum the algorithm's underlying linear approximation holds with high accuracy, so it moves very quickly; far from the minimum this assumption may fail, so the method finds a compromise between the linear (Gauss-Newton) model and gradient descent. A step is accepted only if it reduces the error, and where necessary gradient descent with a sufficiently small step is used to guarantee progress (see the sketch below).
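A minimal sketch of the Levenberg-Marquardt compromise for a generic least-squares problem: the damping parameter shifts each step between Gauss-Newton (small damping) and gradient descent (large damping). This illustrates the algorithm in general, not the SNN implementation.

```python
import numpy as np

def levenberg_marquardt(residuals, jacobian, w, n_iter=50, lam=1e-3):
    """Damped Gauss-Newton iteration for minimizing sum(residuals(w)**2)."""
    for _ in range(n_iter):
        r = residuals(w)
        J = jacobian(w)
        A = J.T @ J + lam * np.eye(len(w))       # damped normal equations
        step = np.linalg.solve(A, -J.T @ r)
        if np.sum(residuals(w + step) ** 2) < np.sum(r ** 2):
            w = w + step                          # accept: error decreased
            lam *= 0.5                            # behave more like Gauss-Newton
        else:
            lam *= 2.0                            # reject: behave more like gradient descent
    return w

# Example: fit y = a*exp(b*x) to noisy data
x = np.linspace(0, 1, 30)
y = 2.0 * np.exp(1.5 * x) + np.random.default_rng(0).normal(0, 0.05, 30)
res = lambda w: w[0] * np.exp(w[1] * x) - y
jac = lambda w: np.column_stack([np.exp(w[1] * x), w[0] * x * np.exp(w[1] * x)])
print(levenberg_marquardt(res, jac, np.array([1.0, 1.0])))
```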

The Levenberg-Marquardt method dialog box is shown in Fig. 5.

Fig. 5. Levenberg-Marquardt method dialog box

Main elements of the window:

  • Epochs (Number of Epochs) – sets the number of epochs for which the algorithm will run. In each epoch, the entire training set is passed through the network and the weights are then adjusted.
  • Cross-Verification – when this box is checked, the quality of the network is evaluated on the verification set (if one is defined) at every epoch. When it is unchecked, verification observations are ignored even if they are present in the data file.
  • Train – each time the button is pressed, the algorithm runs for the specified number of epochs.
  • Reinitialize – press this button before training a new network from scratch; it resets the network weights to random values.
  • Jog Weights – if the algorithm appears to be stuck in a local minimum, this option adds a small random amount to each weight to help it escape.
  1. Construct a time series projection: choose Run – Time Series Projection to open the corresponding window (Fig. 6).

Fig. 6. Time series projection window

Description of the dialog box

  • Start – specifies whether the time series projection starts from some observation number (Case No) in the data file or from a standalone observation.
  • Case No – when projecting from the data file, the number of the observation whose output value the projection starts from.
  • Length – the number of steps over which the series will be projected.
  • Variable – the variable to be projected.
  1. Using a trained network, you can perform a time series projection. Initially, the network will work on the first 12 input values, resulting in a prediction of the next value. Then the predicted value, together with the previous 11 input values, is again fed to the input of the network, and the latter produces a forecast of the next value.

The only control parameter that needs to be selected is the projection length. In this example there are 144 observations in total, 12 of which are consumed during preprocessing, so the results can be compared over at most 132 steps. The series can also be projected beyond the boundary of the available data, but then there is nothing to compare the result with (a minimal sketch of this recursive projection is shown below).
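A sketch of this recursive projection loop, in which each forecast is fed back as an input for the next step (illustrative code with a stand-in for the trained network):

```python
import numpy as np

def project_series(predict, history, length, steps=12):
    """Recursive time series projection: each predicted value is appended to the
    input window and fed back to produce the next prediction."""
    window = list(history[-steps:])
    projection = []
    for _ in range(length):
        next_value = predict(np.array(window))   # network forecast for the next step
        projection.append(next_value)
        window = window[1:] + [next_value]       # slide the window forward
    return projection

# Stand-in for the trained network: a simple average of the last 12 values
toy_predict = lambda w: float(w.mean())
print(project_series(toy_predict, list(range(1, 25)), length=5))
```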

By pressing the Run button for different projection lengths, you can observe how the target and output values of the series change.

Fig. 6 shows that the network was not trained very well: there are significant deviations between the original and predicted series starting from approximately observation 70 (the predicted curve is shown in blue on the screen).

  1. Carry out series forecasting using the Intelligent Problem Solver (third button from the left in the top row). You will need to answer a number of questions in dialog mode:
  • Select the main version (Fig. 7) and click Next.

Fig. 7. Selecting the main version

  • Choose the type of task (standard or time series); here select time series (Fig. 8).

Fig. 8. Selecting a task type

  • Set the observation period to 12 months (Fig. 9).

Fig. 9. Setting the observation period

  • Select dependent and independent variables, which are the same variable Series.
  • Set the duration of the calculation procedure to 2 minutes (Fig. 10).

Fig. 10. Setting the duration of the calculation procedure

  • Indicate the number of networks to be saved and the actions to take when saving them (Fig. 11).

Fig. 11. Actions for selecting networks

  • Select forms for presenting results (Fig. 12) and click Finish.

Fig. 12. Selecting a form for presenting results

With the Intelligent Problem Solver, the forecast is much more accurate, since the trained network follows the original series much more closely (Fig. 13).

Fig. 13. Forecast using the Intelligent Problem Solver

Exercise

Build a simulated time series in the Statistica package as follows:

  • Create a new file consisting of 20 lines and 2 columns.
  • Via the menu Data – Variable Specs enter the expression =vnormal(rnd(1);1;3) into the formula window.
  • This simulates 20 values of a normally distributed random variable with mean (mathematical expectation) 1 and standard deviation 3. These 20 values define the variable Var 1. Convert them to an integer data type by setting Type to Integer in the variable specification window.
  • Fill in the variable Var 2 as the cumulative sum of Var 1: the first value of Var 2 equals the first value of Var 1; the second value of Var 2 equals the sum of the first two values of Var 1; the third value equals the sum of the first three values of Var 1, and so on (a sketch of this construction is shown after this list).
  • Copy the Var 2 variable, switch to the SNN package and place the copied data in a newly created file.
  • Carry out forecasting of the resulting series using a neural network.
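For reference, the same simulated series can be sketched outside Statistica, for example with NumPy. This mirrors the steps above (normal values with mean 1 and standard deviation 3, rounded to integers, then cumulatively summed) and only stands in for the =vnormal(rnd(1);1;3) formula:

```python
import numpy as np

rng = np.random.default_rng(42)
# 20 values of a normal random variable with mean 1 and standard deviation 3,
# rounded to integers (the counterpart of Type = Integer in the variable specs)
var1 = np.rint(rng.normal(loc=1.0, scale=3.0, size=20)).astype(int)
# Var 2 is the running (cumulative) sum of Var 1
var2 = np.cumsum(var1)
print(var1)
print(var2)
```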


INTRODUCTION TO MODERN NEURAL NETWORKS

Laboratory work No. 1

SOFTWARE PRODUCT STATISTICA NEURAL NETWORKS (SNN) VERSION “SNN 7.0”

Goal of the work: get acquainted with the Statistica Neural Networks (SNN) software product and build a neural network using the Solution Wizard.

1. Open the data file Fan.stw (Table A.1) using the File – Open command. This file contains data on two classes, 1 and 2, corresponding to the presence and absence of overheating.

2. Select the Neural Networks command in the Analysis menu to open the STATISTICA Neural Networks launch pad.

Fig. 4. Tool selection

3. On the Quick tab of the Neural Networks launch pad, select a task type from the list (in this case, Classification) and a solution method (in this case, the Solution Wizard), and press OK (Fig. 4). The standard variable selection dialog will then be displayed.

4. Select the dependent (output) variable (in this case, the CLASS variable) (Fig. 5).

Fig. 5. Input data

5. To display the Solution Wizard, press OK on the launch pad.

On the Quick tab (Fig. 6), deselect the option Select a subset of independent variables: only two independent variables are defined here, so both will be used as inputs for all the neural networks being tested. The Duration of analysis group contains options that determine how long the Solution Wizard will spend looking for an effective neural network; the longer it works, the better the solution found is likely to be. For this example, set 25 networks.

Based on the results of the analysis, neural networks of various types with different performance and complexity indicators can be saved, so that ultimately you can choose the best network yourself.

6. Enter the number 10 in the networks-to-save field so that the Solution Wizard keeps only the 10 best networks.

The Quick tab of the Solution Wizard will then look as shown in Fig. 6.

Fig. 6. Settings for the analysis

Press OK so that the Solution Wizard starts building neural networks. After this, the Training in progress (Solution Wizard) dialog is displayed. Each time an improved neural network is found, a new row is added to the information table; in addition, the elapsed time and the percentage of the task completed are shown at the bottom of the window. If no improvement occurs over a long period, press the Finish button in the Training in progress dialog to end the network search. After the search is completed, a Results dialog is displayed containing information about the networks found, for further analysis (Fig. 7).



Fig. 7. Learning outcomes

7. Press the Descriptive statistics button on the Quick tab of the Results dialog to display two summary tables: Classification and Error Matrix.

The classification table (Fig. 8) gives full information about the solution of the problem. The table contains several columns for each output class predicted by each model; for example, the column labeled CLASS.1.11 corresponds to the predictions of Model 1 for the overheating class of the variable CLASS. The first row gives the number of observations of each type of overheating in the data file; the second (third) row shows, for each class, the number of correctly (incorrectly) classified observations; the fourth row lists the "unknown" observations. The error matrix is usually used in problems with several output classes.

8. To display the final statistics, reopen the analysis (the Results button in the Analysis line, or the Continue command in the Analysis menu). In the group Samples for displaying results, select the option All (separately), then press the Descriptive Statistics button. The final classification table is divided into four parts; the column headings carry the prefixes O, K, T and I, which correspond to the training, verification (control), test and ignored samples, respectively. By default, observations are divided into three subsets in a 2:1:1 ratio, giving 50 training, 25 verification and 25 test observations (see the sketch below). The results of the neural network on these sets are almost the same, so the quality of the neural network can be considered acceptable.
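A sketch of the default 2:1:1 split into training, verification and test subsets (illustrative code, not the package's own routine):

```python
import numpy as np

def split_2_1_1(n_cases, seed=0):
    """Randomly assign case indices to training/verification/test in a 2:1:1 ratio."""
    idx = np.random.default_rng(seed).permutation(n_cases)
    n_train = n_cases // 2
    n_verif = n_cases // 4
    return idx[:n_train], idx[n_train:n_train + n_verif], idx[n_train + n_verif:]

train, verif, test = split_2_1_1(100)
print(len(train), len(verif), len(test))   # 50 25 25
```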

Fig. 8. Classification table

9. To finish the analysis, press OK in the Results dialog. Note that pressing Cancel on the launch pad deletes all constructed neural networks. Networks should therefore be saved: first find the network with the best performance, then save the constructed networks for further use. To save a neural network, select the Networks/Ensembles tab and press the button Save network file as... (the file has the extension .snn).

Tasks

1. Build and train a neural network using the Solution Wizard to automate vehicle diagnostics, determining the need for an engine overhaul from the following parameters: engine compression, oil pressure and gasoline consumption.

2. Enter the initial data in accordance with Table 1; obtain the specific values of the variables from the instructor.

3. Build a neural network in accordance with the settings:

Problem type: classification;

Tool: Solution Wizard;

Number of networks: 25;

5. Analyze the construction of a neural network and reflect it in the report.

6. Prepare a report on the work performed.

Below we walk through a typical analysis scenario and show how the dialogue with the system user is organized.

Pay attention to the user-friendly interface and the available tools, including the Multiple subsampling method, which allow users to design their own networks and choose the best ones.

So, first of all, let's launch neural networks.

Step 1. You start with the launch pad (see Fig. 1).

In this panel you can select the type of analysis you need to perform: regression, classification, time series forecasting (with a continuous or categorical dependent variable), or cluster analysis.

Fig. 1. STATISTICA Automated Neural Networks (SANN) launch pad

Select, for example, Time series (regression) if you want to make a forecast, or Classification if a classification problem is being solved.

After pressing OK, we move on to the data selection dialog box.

Fig. 2. Neural Networks - Data selection dialog box - Quick tab

Step 2. On the Quick tab, select the variables needed for the analysis. Variables can be continuous or categorical, dependent or independent; in addition, observations may belong to different samples.


Fig. 3. Variable selection window

For novice users it is recommended to choose the Automated Neural Network (ANN) strategy; an advanced user can easily use any of the available strategies: Automated Neural Network (ANN), Custom Neural Network (CNN), or the Multiple subsampling method. Here we will choose Automated Neural Network (ANN).

Fig. 4. Neural Networks - Data selection dialog box - Quick tab

On the Subsamples (ANN and CNN) tab, set the desired division of the data into subsamples: training, control and test. The partition can be set randomly, or it can be fixed using an additional code variable.

In this case, we will use random partitioning.

Fig. 5. Neural Networks - Data selection dialog box - Subsamples (ANN and CNN) tab

The Subsamples (ANN and CNN) tab is intended for the first two strategies, Automated Neural Network (ANN) and Custom Neural Network (CNN); the Create subsamples tab is used for the last strategy, the Multiple subsampling method.

Click OK and move on to the step of specifying architecture parameters.

Step 3. On the Quick tab of the Automated Neural Networks dialog box, specify the network type, the number of hidden neurons, the number of networks to train and save, and the error functions to use.

The program offers the following types of networks: multilayer perceptrons and radial basis function networks.

Fig. 6. Automated Neural Networks dialog box - Quick tab

Fig. 7. Automated Neural Networks dialog box - Activation functions for MLP tab

On the Attenuation (weight decay) tab, you can enable the weight regularization option, which controls the complexity of the trained networks. This is useful when the task has a large number of input variables and a large number of hidden-layer neurons is specified.

But in our case we will not use this.

Fig. 8. Automated Neural Networks dialog box - Attenuation tab

Now we can move on to the neural network training step.

Step 4. Start the neural network training procedure by clicking the button OK.

The dialog box shown in Fig. 9 displays information about the neural network currently being trained. We can examine the network architecture, monitor the progress of the algorithm's iterations and track the model errors. For regression the mean square error is used; for classification, the percentage of correctly classified observations (as in our case).

Fig. 9. Training a neural network dialog box

The program automatically moves to the next step.

Step 5. Analysis of results. In the results window you can analyze the solutions obtained. The program selects the best networks and shows the quality of the solutions.

Fig. 10. Neural Networks - Results dialog box - Predicted tab

You can select a specific network, the one we consider best, using the Select/Uncheck networks button.

Fig. 11. Model Activation dialog box

For example, one way to test a model is to compare observed values with predicted results: the observed and predicted values for a selected network can be compared on, say, the training and test sets.

Fig. 12. Table of observed and predicted values

Or look at the classification error matrix on the test sample:

Fig. 13. Classification matrix

Step 6. Save the best networks for future use, for example, for automatic forecasting.

For further launch, networks are saved in PMML format.

Fig. 14. Neural Networks - Results dialog box - Saving networks

Fig. 15. Standard network file saving window

Step 7. Run the saved models on new data. We load the new data, making sure that its variables match the variables used in the models.

To run the model on new data, select the option Load models from previous analyses on the launch pad (Fig. 1) and press the Load networks button.

Fig. 16. Standard network file selection window

We get:

Fig. 17. STATISTICA Automated Neural Networks (SANN) launch pad

After selecting the required file, all settings are determined automatically, so you can go straight to the results window (by pressing OK twice) and analyze the results obtained.

This is the typical analysis scenario in the STATISTICA Automated Neural Networks package.

In STATISTICA, the continuous forecasting problem is represented as a regression problem. In the context of this problem, a neural network is considered as a nonlinear function, the complexity of which is controlled “semi-parametrically” - the number of elements in the network affects the complexity of the solution, but, of course, the analyst cannot see the explicit form of the regression function.

It is required to build a neural network that estimates the emission of lead into the atmosphere as a function of the number and type of passing vehicles. The data are stored in the file Svinets.xls ("svinets" is Russian for lead).

Open the Svinets.xls file in the Statistica package; the Open File window appears.

Fig. 4.33. Import window.

You must select the “Import selected sheet” option and select the name of the data sheet:

Fig. 4.34. Selecting an Excel sheet for import into the Statistica package.

In the next window, you need to specify the real data parameters, which, as a rule, are determined and displayed automatically (except for the last three checkboxes).

Fig. 4.35. Setting the import area.

After this, the imported data will be displayed in the window.

Fig. 4.36. Import results.

Run the analysis package using neural networks. To do this, select “Neural Networks” from the “Analysis” menu.

Fig. 4.37. Selecting a data processing method - "neural networks".

after which the STATISTICA Neural Networks package window will appear:

Fig. 4.38. Start window for the "neural networks" analysis.

Go to the “Quick” tab, where you need to set the task type - Regression, and the tool - Network Designer.

Fig. 4.39. Launching the neural network designer.

Next, by pressing the "OK" button, you switch to the mode for selecting the output (dependent) and input (independent) variables. We select "Lead" as the former and the numbers of cars of each category as the latter; the "No" and "Street" columns remain unused.

Fig. 4.40. Selecting input and output data for a neural network.

By clicking "Ok" you return to the "Quick" tab. Then, clicking "Ok" again takes you to the neural network formation window. On the "Quick" tab, select the network type - multilayer perceptron,

Fig. 4.41. Selecting the type of neural network.

and on the “Elements” tab you can specify the required number of layers, the number of neurons in each, as well as the type of activation function:

Fig. 4.42. Setting the number of layers and types of neurons.

Fig. 4.43. Choosing a method for training the neural network.

Here, by clicking on the “Samples” button, you can set the number of training, control and test examples. If you set the number of test and control examples to zero, then the network will be trained using all examples:

Fig. 4.44. Determining data for training and testing.

Returning to the main training window, you can click the "User" button, go to the "Interactive" tab and request that the training process be displayed as a graph:

Fig. 4.45. Specifying the type of graph to demonstrate the learning process.

Finally, by clicking on the “Ok” button, you will start the learning process, the result of which will be displayed on the graph:

Fig. 4.46. Training a neural network.

By clicking on the “Ok” button, you will be taken to the results window, where you can study the various characteristics of the created network by moving through the window tabs:

Fig. 4.47. Results of neural network modeling.

So, for example, on the “Advanced” tab there is a “Network architecture” button, by clicking on which you can see the topology of the constructed network:

Fig. 4.48. View of the constructed neural network.

as well as the "User Observations" button, where you can supply new input data to the network and obtain the response of the already trained network.


