Classification and assessment of antioxidant activity and phenolic content of different varieties of date palm (Phoenix dactylifera) fruits from Iran
Abstract Edible parts of date palm fruits (DPF) from Iran have been evaluated for their antioxidant activities (AA) and total phenolic content (TPC) using the Trolox equivalent antioxidant capacity (TEAC) and Folin–Ciocalteu methods, respectively. Seven kinds of soft dates (SD), namely Berehi, Mordasang, Mazafati, Kabkab, Khanizi, Shahabi and Medjool dates; four types of semidry dates (SDD), namely Piarom, Zahedi, Halavi and Karoot dates; and one kind of dry date (DD), Rabbi date, were used. The AA of the DPF were in the range of 21.31–29.94, 33.36–35.85 and 33.71 µmol Trolox equivalents/100 g fresh weight, while TPC contents were in the range of 250.75–328.57, 353.17–398.23 and 361.56 mg gallic acid equivalents (GAE)/100 g fresh weight for SD, SDD and DD, respectively. The results show a linear relationship between AA and TPC. Finally, transmittance FT-IR spectra of the methanolic date extracts were used to develop classification models based on their TPC content.
Antioxidants are compounds that can delay or inhibit the oxidation of lipids or other molecules by inhibiting the initiation or propagation of oxidative chain reactions [1, 2]. The antioxidative effect is mainly due to phenolic components, such as flavonoids, phenolic acids and phenolic diterpenes. The antioxidant activity of phenolic compounds results from their redox properties, which allow them to absorb and neutralize free radicals, quench singlet and triplet oxygen, or decompose peroxides. Many of these phytochemicals possess significant antioxidant capacities that may be associated with a lower incidence of, and lower mortality rates from, cancer.
The fruit of the date palm (Phoenix dactylifera) is an important commercial crop in Middle Eastern countries. Dates are a good source of energy, vitamins, and elements such as phosphorus, iron and potassium, as well as a significant amount of calcium. As noted in several references, date palm fruits (DPF) are a rich source of carbohydrates but contain a fairly low percentage of protein. They are also an excellent source of simple sugars, minerals, vitamins and fat, and have a high percentage of dietary fiber. The flesh of dates contains 0.2–0.5 % oil, while the seed contains 7.7–9.7 % oil. Dates also contain fluorine, which helps protect teeth against decay.
The consumption of fruit and vegetables is now regarded as an important aspect of functional food and health. Indeed, recent epidemiological studies have indicated that a high consumption of fruit and vegetables is associated with a reduced risk of a number of chronic diseases. The recent explosion of interest in the bioactivity of the flavonoids of higher plants is due, at least in part, to the potential health benefits of these polyphenolic compounds as important dietary constituents. It is therefore important to have a clear idea of the major phenolic families present in fruit and vegetables and of their levels. Several studies have determined the antioxidant activity (AA) and total phenolic content (TPC) of dates from different geographical origins and varieties [11–14]. Date palm fruit possesses antioxidant and antimutagenic properties in vitro; therefore, the AA and TPC of a wide range of Iranian date varieties were analyzed here.
Among spectrometric methods, infrared (IR) spectrometry has found widespread application in characterizing food products for quality control. FT-IR spectroscopy has strong potential in the analysis and quality control of foods because of its sensitivity, versatility and speed. Depending on the sampling method used (e.g., transmission, diffuse reflectance, attenuated total reflectance), FT-IR spectroscopy minimizes or even eliminates sample preparation. An IR spectrum, especially an FT-IR spectrum, usually consists of hundreds or even thousands of variables, which contain both relevant and irrelevant information for a calibration (or classification). In some instances, the contribution of the properties of interest to the total signal is much lower than that of other sources. Powerful methods are therefore needed to extract the fingerprint of these properties from the total signal.
In the present study, we evaluated the antioxidant capacities and TPC of methanolic extracts from twelve DPF varieties from the south of Iran, categorized as soft, semidry and dry DPF. Furthermore, transmittance FT-IR spectra of the TPC extracts were used to develop discriminant models for classifying the date samples by type (soft, semidry and dry) and by variety (12 varieties). The statistical parameters show that transmittance FT-IR data are helpful for discriminating DPF based on TPC. To the best of our knowledge, there is no previous report on the discrimination of date samples by transmittance FT-IR spectroscopy, which is technically simple.
Materials and methods
Date cultivars are classified on the basis of the texture of the ripe fruit into three generally accepted categories: soft date (SD), semidry date (SDD) and dry date (DD). These categories are generally, but not exclusively, associated with particular moisture and sucrose contents. Soft dates usually have a moisture content above 30 % and no sucrose. Semidry types have a moisture content between 20 and 30 % and a higher sucrose level, whilst dry varieties contain less than 20 % moisture and approximately equal quantities of sucrose and reducing sugars.

Twelve varieties of DPF from Iran were used in this study: seven soft dates (SD), namely Berehi, Mordasang, Shahabi, Mazafati, Kabkab, Khanizi and Medjool; four semidry dates (SDD), namely Karoot, Piarom, Zahedi and Halavi; and one dry date (DD), Rabbi. These varieties are well known and widely consumed. Date samples were provided by the “Bushehr Date Research Center”. They came from several regions of Bushehr province (Dashtestan, Samal, Posht Kooh, Tangestan and Ab Pakhsh) and were obtained at the beginning of the 2010 harvest season. The samples were taken from the same lot, and to ensure good validation, three replicate measurements were made. For each replicate, all data collection steps, including sample preparation and spectral measurements, were repeated. Undamaged fruits were selected and transported in paper bags under refrigeration. Each date weighed about 3–6 g, and approximately 10 g (3 dates) was taken for each extraction. With 12 date varieties and 10 measurements per variety, 120 samples were obtained in each replicate, giving a total of 360 samples over the three replicates.
Chemicals and reagents
Pure powders of 2,4,6-tripyridyl-S-triazine (TPTZ), FeCl3-3H2O, potassium persulphate, sodium acetate and sodium carbonate were supplied by Sigma-Aldrich (St. Louis, MO, USA). The Folin–Ciocalteu reagent and Trolox (6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid) were purchased from Merck. All chemicals and reagents used were of analytical grade.
Extraction of phenolic compounds
The procedure reported by Biglari et al. was adopted for extraction. Approximately 10.0 g of the edible part of the dates of each variety was cut into small pieces and dry-blended for 5 min in a blender (Bosch, USA). The blend was then extracted with 150 mL of methanol–water (4:1, v/v) at room temperature (25 °C) for 5 h on a shaker. The extracts were centrifuged (Sigma bench-top centrifuge, Germany) at 4000g for 10 min, and the supernatants were decanted into vials. The storage conditions (time and temperature) were the same for all types of fruit.
Determination of TPC (Folin–Ciocalteu assay)
Total phenolic content (TPC) was determined according to the Folin–Ciocalteu procedure. This method is rapid, simple and inexpensive. A 200 µL portion of the methanolic date extract, or of gallic acid (5 × 10⁻⁴ to 1.8 × 10⁻³ mol L⁻¹) as standard, was mixed with 2.5 mL of Folin–Ciocalteu reagent (diluted tenfold with deionized water) and allowed to stand at room temperature for 5 min; 2 mL of sodium carbonate (7.5 % w/w) was then added to the mixture. After shaking (60 rpm) for 60 min at room temperature, the absorbance of the resulting solution was measured at 760 nm using a Perkin-Elmer Lambda 25 spectrophotometer. The total phenol concentration was calculated from the calibration curve using gallic acid as a standard, and the results were expressed as mg gallic acid equivalents (GAE)/100 g fresh weight.
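The conversion from an absorbance reading to mg GAE/100 g fresh weight can be sketched as follows. This is a minimal sketch, not the authors' calculation: it assumes the standards and extracts were treated identically (so the calibration concentration equals the gallic acid equivalent concentration of the extract), that the whole extract volume is the 150 mL of solvent from the extraction step, and that 10 g of fruit was extracted. The molar mass of gallic acid (170.12 g/mol) is a known constant; the calibration readings below are idealized, hypothetical values.

```python
import numpy as np

GALLIC_ACID_MW = 170.12  # g/mol, molar mass of gallic acid

def fit_calibration(conc_mol_per_l, absorbance):
    """Least-squares line A = slope * c + intercept for the gallic acid standards."""
    slope, intercept = np.polyfit(conc_mol_per_l, absorbance, 1)
    return slope, intercept

def tpc_mg_gae_per_100g(a760, slope, intercept,
                        extract_volume_ml=150.0, sample_mass_g=10.0):
    """Convert an extract absorbance at 760 nm into mg GAE per 100 g fresh weight.

    Assumption: extract and standards were processed identically.
    """
    conc_mol_per_l = (a760 - intercept) / slope     # GAE concentration of the extract
    mg_per_ml = conc_mol_per_l * GALLIC_ACID_MW     # mol/L * g/mol = g/L = mg/mL
    total_mg = mg_per_ml * extract_volume_ml        # mg GAE in the whole extract
    return total_mg / sample_mass_g * 100.0         # scale to 100 g of fruit

# Hypothetical standards spanning the 5e-4 to 1.8e-3 mol/L range given in the text
standards = np.array([5e-4, 1e-3, 1.5e-3, 1.8e-3])
absorbances = 500.0 * standards + 0.02              # idealized, noise-free readings
slope, intercept = fit_calibration(standards, absorbances)
print(round(tpc_mg_gae_per_100g(0.52, slope, intercept), 2))  # → 255.18
```

With these assumptions, an extract concentration of about 1.2 × 10⁻³ mol L⁻¹ corresponds to roughly 300 mg GAE/100 g, which is the order of magnitude reported in Table 1.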
Evaluation of antioxidant activity using DPPH assay
The antioxidant activity (AA) of the methanol extracts was evaluated as scavenging of the free DPPH (2,2-diphenyl-1-picrylhydrazyl) radical, as described by Tuberoso et al. An aliquot of 50 µL of date extract was added to 3.9 mL of DPPH solution (0.18 mM) and mixed vigorously. The reaction mixture was incubated for 30 min at 25 °C in the dark, and the absorbance was then recorded at 517 nm, using methanol as the reference (blank). The Trolox equivalent antioxidant capacity (TEAC) was calculated from the linear regression equation obtained by plotting the response of Trolox solutions of known concentration.
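A TEAC calculation of this kind can be sketched as below. The text does not state which response was regressed against Trolox, so this sketch assumes the usual percentage inhibition, (A_control − A_sample)/A_control × 100, and assumes that Trolox standards were measured as 50 µL aliquots exactly like the extracts. The extract volume (150 mL) and fruit mass (10 g) come from the extraction step; the standard concentrations and readings are hypothetical.

```python
import numpy as np

def percent_inhibition(a_control, a_sample):
    """DPPH scavenging (%) from absorbances at 517 nm."""
    return (a_control - a_sample) / a_control * 100.0

def teac_umol_per_100g(inhibition, trolox_umol_per_l, trolox_inhibition,
                       extract_volume_ml=150.0, sample_mass_g=10.0):
    """Trolox equivalents (µmol TE/100 g fresh weight) from a linear Trolox calibration."""
    slope, intercept = np.polyfit(trolox_umol_per_l, trolox_inhibition, 1)
    te_umol_per_l = (inhibition - intercept) / slope       # TE concentration of the extract
    te_umol = te_umol_per_l * extract_volume_ml / 1000.0   # µmol TE in the whole extract
    return te_umol / sample_mass_g * 100.0                 # scale to 100 g of fruit

# Hypothetical Trolox standards (µmol/L) and their inhibition values (%)
trolox = np.array([0.0, 10.0, 20.0, 40.0])
trolox_inh = np.array([0.0, 20.0, 40.0, 80.0])
inh = percent_inhibition(1.0, 0.6)                          # extract reading
print(round(teac_umol_per_100g(inh, trolox, trolox_inh), 2))  # → 30.0
```

Working in inhibition rather than raw absorbance makes the calibration independent of small day-to-day differences in the DPPH blank.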
FT‑IR spectroscopic measurements
Transmittance FT-IR spectra were measured using a Perkin-Elmer Precisely Spectrum One FT-IR spectrophotometer (made in the UK) equipped with CaF2 cell windows for liquid-film preparations. One drop (about 1.0 µL) of the methanolic date extract was placed on a CaF2 window and spread over its surface by covering it with a second CaF2 window. Transmittance spectra were recorded over the range 4000–450 cm⁻¹.
Statistical techniques, including two-way analysis of variance (ANOVA), Pearson's correlation and regression analysis, were used to analyze the data obtained from the different types of dates and to study the relationship between AA and TPC. Data are reported as mean ± standard deviation (SD). Differences at P < 0.05 were considered statistically significant.
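The idea behind the ANOVA comparisons can be illustrated with the simpler one-way case. Note that this is a swapped-in simplification: the study used two-way ANOVA, whereas this sketch only compares one factor (for example, texture type) using replicate values. The function computes the F statistic from between-group and within-group sums of squares; if SciPy is available, `scipy.stats.f.sf(F, df_between, df_within)` gives the corresponding P value. The replicate values in the usage lines are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of replicate values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # variation of the replicates around their own group mean
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical triplicate TEAC values for one soft, one semidry and one dry date
groups = [[27.1, 28.0, 28.4], [33.2, 34.0, 33.9], [33.5, 33.9, 34.1]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```

A large F relative to the critical value of the F distribution (with these degrees of freedom at P = 0.05) indicates that at least one group mean differs.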
Transmittance FT-IR spectra of all studied date samples were collected in a data matrix X of dimension (m × n), where m is the number of samples whose FT-IR spectra were recorded and n is the number of spectral data points (wavenumbers) per spectrum. Thus, each row xᵢ of X is the transmittance FT-IR spectrum of one date sample. The spectra were preprocessed by extended multiplicative signal correction (EMSC), using the mean spectrum of X as the reference spectrum.
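EMSC models each spectrum as a scaled copy of the reference spectrum plus a smooth baseline, and then removes the baseline and the scaling. The text does not say which terms the EMSC model contained, so this sketch assumes a basic EMSC with a constant, linear and quadratic baseline in a rescaled wavenumber axis and no chemical interferent terms. As in the text, the mean spectrum of X is used as the reference.

```python
import numpy as np

def emsc(X, reference=None, poly_order=2):
    """Basic EMSC: fit each spectrum as b*reference + polynomial baseline, then correct.

    Returns the corrected spectra and the fitted coefficients
    (row 0: multiplicative term b; remaining rows: baseline polynomial terms).
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = X.mean(axis=0)                 # mean spectrum as reference
    nu = np.linspace(-1.0, 1.0, X.shape[1])        # rescaled wavenumber axis (assumption)
    baseline = np.column_stack([nu ** k for k in range(poly_order + 1)])
    design = np.column_stack([reference, baseline])
    # one column of least-squares coefficients per spectrum
    coefs, *_ = np.linalg.lstsq(design, X.T, rcond=None)
    b = coefs[0]                                   # multiplicative (path-length) effect
    corrected = (X.T - baseline @ coefs[1:]) / b   # remove baseline, then rescale
    return corrected.T, coefs
```

If every spectrum differs from the others only by an offset, a scale factor and a low-order baseline, all corrected spectra collapse onto the reference. In real data, the chemical differences between samples remain after correction.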
Data analysis was performed by principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), extended canonical variates analysis (ECVA) and CLoVA-based ECVA. Because most of these are well-known chemometric methods, they are only briefly described here.
Principal component analysis (PCA)
Principal component analysis (PCA) is probably the most widely applied linear projection method and is widely used for data reduction and visualization. The basic idea behind PCA is to project the data from a high-dimensional space onto a lower-dimensional one without losing much information. The projection is done by transforming a set of correlated variables into a small set of orthogonal variables called principal components (PCs). The PCs are constructed so that the first explains the largest share of the data variance, the second is orthogonal to the first and describes most of the variance not explained by the first PC, and so on. PCA decomposes the original data matrix X (m × n) into score, loading and residual matrices:
$$X = TP' + E \qquad (1)$$
where T (m × k) and P (n × k) are the score and loading matrices, respectively, with k significant PCs, and E (m × n) is the residual matrix, which corresponds to the variance not described by the PCA model; m and n are the numbers of objects and variables, respectively. The superscript (′) denotes the matrix transpose.
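Eq. (1) can be computed directly from a singular value decomposition, which is also how PCA was calculated in this work. This sketch assumes the data are mean-centred before decomposition (the text does not say so explicitly). It returns the scores T, loadings P, residuals E and the fraction of variance explained by each PC, whose cumulative sum gives the CPV values reported for the date data.

```python
import numpy as np

def pca_svd(X, k):
    """PCA of X (m objects x n variables) via SVD, keeping k components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # mean-centre each variable (assumption)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :k] * s[:k]                       # scores,   m x k
    P = Vt[:k].T                               # loadings, n x k
    E = Xc - T @ P.T                           # residuals, so that Xc = T P' + E
    explained = s ** 2 / np.sum(s ** 2)        # fraction of variance per PC
    return T, P, E, explained[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))                  # hypothetical data matrix
T, P, E, ev = pca_svd(X, 3)
print(np.round(100 * np.cumsum(ev), 2))        # cumulative percent variance (CPV)
```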
Partial least squares discriminant analysis (PLS‑DA)
Partial least squares discriminant analysis (PLS-DA) is a partial least squares regression aimed at predicting one (or several) binary response(s) y from a set of variables in X. PLS-DA therefore requires the class membership of each object, and it extracts scores that not only capture large variance in the original variables but are also correlated with the class variable. Although PLS-DA can be run with either the PLS1 or the PLS2 algorithm, it is usually based on PLS2, which searches for latent variables with maximum covariance (discriminant power) with the Y variables (sample classes).
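A PLS2-based PLS-DA can be sketched with the NIPALS algorithm: the classes are coded as dummy (0/1) columns of Y, a PLS2 model is fitted, and each sample is assigned to the class with the largest predicted dummy value. This is an illustrative sketch, not the PLS_Toolbox implementation used in the study; the argmax assignment rule is one common choice and is an assumption here.

```python
import numpy as np

def pls2_nipals(X, Y, n_lv, tol=1e-12, max_iter=500):
    """PLS2 (NIPALS) on centred X and Y; returns coefficients B with Y_hat = X B."""
    X, Y = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _ in range(n_lv):
        u = Y[:, [np.argmax(Y.var(axis=0))]]        # start from the most variable Y column
        t_old = None
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)                   # X weights
            t = X @ w                                # X scores
            q = Y.T @ t / (t.T @ t)                  # Y loadings
            u = Y @ q / (q.T @ q)                    # Y scores
            if t_old is not None and np.linalg.norm(t - t_old) < tol * np.linalg.norm(t):
                break
            t_old = t
        p = X.T @ t / (t.T @ t)                      # X loadings
        X -= t @ p.T                                 # deflate X and Y
        Y -= t @ q.T
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    return W @ np.linalg.inv(P.T @ W) @ Q.T

def plsda_fit(X, labels, n_lv):
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)   # one dummy column per class
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    B = pls2_nipals(X - x_mean, Y - y_mean, n_lv)
    return classes, x_mean, y_mean, B

def plsda_predict(model, X):
    classes, x_mean, y_mean, B = model
    Y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return classes[np.argmax(Y_hat, axis=1)]                  # largest dummy value wins
```

In practice the number of latent variables (`n_lv`) is chosen by cross-validation, as was done for the date spectra.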
Table 1 Antioxidant activity and total phenolic content of different date varieties from Iran (based on fresh weight)
| Texture | Variety | Antioxidant activity, TEAC² (µmol Trolox equivalents/100 g fresh weight) | Total phenolic content¹ (mg GAE/100 g fresh weight) |
|---|---|---|---|
| SD (soft date) | Berehi | 27.84 ± 1.04ᵃ | 304.08 ± 3.39ᵃ |
| | Mordasang | 26.17 ± 1.37ᵃ | 292.57 ± 3.35ᵇ |
| | Shahabi | 26.57 ± 1.15ᵃ | 300.70 ± 4.32ᵃ |
| | Mazafati | 29.94 ± 1.28ᵃ | 328.57 ± 4.29ᶜ |
| | Kabkab | 28.52 ± 1.19ᵃ | 320.50 ± 4.33ᵃᵇ |
| | Khanizi | 21.31 ± 1.59ᵃ | 250.75 ± 2.26ᵃᵇᶜ |
| | Medjool | 28.80 ± 1.87ᵃ | 319.46 ± 3.54ᵃᵇᶜᵈ |
| SDD (semidry date) | Piarom | 33.61 ± 2.26ᵇ | 353.17 ± 3.35ᵈ |
| | Halavi | 34.30 ± 2.47ᵇ | 356.55 ± 5.31ᵈ |
| | Zahedi | 33.36 ± 2.57ᵇ | 360.48 ± 5.26ᵈ |
| | Karoot | 35.85 ± 3.54ᵇ | 398.23 ± 6.11ᵉ |
| DD (dry date) | Rabbi | 33.71 ± 2.98ᵇ | 361.56 ± 3.40ᵈ |
¹ Data are expressed as milligrams of gallic acid equivalents (GAE) per 100 g ± SD (n = 3) on a fresh weight basis. Values followed by the same letter within a column are not significantly different (P > 0.05)
² Data are expressed as micromoles of Trolox equivalents (TE) per 100 g ± SD (n = 3) on a fresh weight basis
Extended canonical variate analysis (ECVA)
ECVA is a recent pattern recognition tool that groups samples using standard canonical variates analysis (CVA) but with an underlying PLS engine. It can cope with several classes and yields powerful separations, although careful validation is required to avoid overfitting. CLoVA-ECVA, proposed by our research group, is an extension of ECVA: it first clusters the variables with a clustering algorithm and then runs ECVA on each group of variables separately. In the present application, a Kohonen self-organizing map (SOM) neural network was used as the clustering method.
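ECVA builds on classical canonical variates analysis, which finds directions w that maximize the ratio of between-class to within-class scatter, i.e. it solves S_b w = λ S_w w. The sketch below is plain CVA, not ECVA: it inverts the within-class scatter matrix, which is only possible when there are more samples than variables. That restriction is exactly why ECVA replaces this step with a PLS regression for spectra with hundreds of variables. The generalized eigenproblem is solved here through a Cholesky whitening of S_w so that a symmetric eigensolver can be used.

```python
import numpy as np

def cva(X, labels):
    """Classical CVA: canonical scores and eigenvalues (at most g - 1 canonical variates)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    grand = X.mean(axis=0)
    p = X.shape[1]
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for c in classes:
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        D = Xc - mc
        Sw += D.T @ D                                   # within-class scatter
        d = (mc - grand)[:, None]
        Sb += len(Xc) * d @ d.T                         # between-class scatter
    L = np.linalg.cholesky(Sw)                          # Sw = L L'
    Linv = np.linalg.inv(L)
    evals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)       # symmetric problem, same eigenvalues
    order = np.argsort(evals)[::-1][:len(classes) - 1]
    W = Linv.T @ V[:, order]                            # canonical weight vectors
    return (X - grand) @ W, evals[order]
```

Plotting the first two columns of the returned scores against each other gives the kind of canonical variate plot used later for the date samples.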
PCA was computed by singular value decomposition (SVD). ECVA was performed in MATLAB using ECVA Toolbox version 2.02 (available at http://www.models.life.ku.dk). PLS-DA was calculated using PLS_Toolbox version 4. All calculations were performed in the MATLAB (version 7.1, The MathWorks, Inc.) environment. The Kohonen toolbox provided by the Milano chemometrics research group is available at http://michem.disat.unimib.it/chm/download/kohoneninfo.html.
Results and discussion
Antioxidant activity and total phenolic content of the DPF
The DPPH assay is based on the ability of a compound to react with the DPPH radical in the assay system. The average antioxidant activity (AA) values of the different DPF based on the DPPH assay are given in Table 1. Karoot (semidry) date showed the highest AA (35.85 µmol Trolox equivalents/100 g fresh weight), whilst Khanizi (soft) date exhibited the lowest (21.31 µmol Trolox equivalents/100 g fresh weight). The order of AA of the DPF based on the DPPH assay was: Khanizi < Mordasang < Shahabi < Berehi < Kabkab < Medjool < Mazafati < Zahedi < Piarom < Rabbi < Halavi < Karoot. Analysis of variance (ANOVA) showed significant differences among the date types (P < 0.05); in particular, semidry and dry dates differ in AA from soft dates (P < 0.05). ANOVA led to the same conclusion for TPC.
The TPC of the DPF varied from 250.75 to 398.23 mg gallic acid equivalents (GAE)/100 g fresh weight. The highest TPC was again found in Karoot date and the lowest in Khanizi date. The order of TPC was: Khanizi < Mordasang < Shahabi < Berehi < Medjool < Kabkab < Mazafati < Piarom < Halavi < Zahedi < Rabbi << Karoot (Table 1). These results show that the SDD of Iran have a level of phenolic content similar to that of Tunisian date palm fruits. All the other varieties have lower TPC than Karoot. In contrast, Al-Farsi et al. reported TPC values between 172 and 246 mg gallic acid equivalents/100 g fresh weight for Omani dates, which are closest to the values of the soft date samples studied here (Khanizi and Mordasang).
The antioxidant capacity of DPF can be attributed to their phenolic content [27, 28]. Therefore, AA (DPPH assay) was regressed on TPC, giving y = 0.1064x − 4.9812, where y is AA (µmol TE/100 g) and x is TPC (mg GAE/100 g). The coefficient of determination (R² = 0.97) shows that most of the variation in AA is explained by TPC. This supports the view that phenolic compounds, the dominant phytochemicals in Iranian DPF, are responsible for their antioxidant activity, in agreement with the results reported in the literature [18, 26–28].

Next, the effect of the growing region on the AA and TPC of DPF was investigated. To do so, Kabkab dates from three regions of Bushehr province and Zahedi dates from two regions were analyzed; the results are shown in Table 2. The region of origin affected the AA of Kabkab dates, with dates from the Samal region showing the highest AA, while Zahedi dates from the Pish-Kooh region showed higher AA than those from the Posht-Kooh region. ANOVA indicated differences in the AA and TPC of the date samples between regions. Regression of AA on TPC for the data in Table 2 (y = 0.0847x + 0.9582; R² = 0.98) again confirmed the dependence of AA on TPC.

Table 2 Antioxidant activity and total phenolic content of Kabkab and Zahedi date varieties from different regions of Bushehr, Iran (based on fresh weight)

| Variety | Region | Antioxidant activity, TEAC² (µmol Trolox equivalents/100 g fresh weight) | Total phenolic content¹ (mg GAE/100 g fresh weight) |
|---|---|---|---|
| Kabkab (SD) | Samal | 28.43 ± 1.99ᵃ | 335.97 ± 4.86ᵃ |
| | Dashtestan | 18.21 ± 1.94ᵇ | 198.87 ± 2.44ᵇ |
| | Tangestan | 21.34 ± 1.65ᵇ | 242.43 ± 3.39ᶜ |
| Zahedi (SDD) | Posht-Kooh | 26.89 ± 2.92ᵃ | 310.44 ± 3.15ᵉ |
| | Pish-Kooh | 33.04 ± 2.85ᶜ | 365.74 ± 3.18ᵈ |

Values followed by the same letter within a column are not significantly different (P > 0.05)
¹ Data are expressed as milligrams of gallic acid equivalents (GAE) per 100 g ± SD (n = 3) on a fresh weight basis
² Data are expressed as micromoles of Trolox equivalents (TE) per 100 g ± SD (n = 3) on a fresh weight basis
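The AA versus TPC regression can be re-checked from the rounded mean values in Table 1. In this sketch, AA is the response (y) and TPC the predictor (x); with these twelve means the least-squares fit comes out close to the reported y = 0.1064x − 4.9812 and R² = 0.97. The Pearson correlation is computed with `np.corrcoef`, and R² is its square for a simple linear regression.

```python
import numpy as np

# Mean values from Table 1, in the same variety order in both arrays:
# Berehi, Mordasang, Shahabi, Mazafati, Kabkab, Khanizi, Medjool,
# Piarom, Halavi, Zahedi, Karoot, Rabbi
tpc = np.array([304.08, 292.57, 300.70, 328.57, 320.50, 250.75, 319.46,
                353.17, 356.55, 360.48, 398.23, 361.56])   # mg GAE/100 g FW
aa = np.array([27.84, 26.17, 26.57, 29.94, 28.52, 21.31, 28.80,
               33.61, 34.30, 33.36, 35.85, 33.71])         # µmol TE/100 g FW

slope, intercept = np.polyfit(tpc, aa, 1)   # AA = slope * TPC + intercept
r = np.corrcoef(tpc, aa)[0, 1]              # Pearson correlation coefficient
print(f"y = {slope:.4f}x {intercept:+.4f}, r = {r:.3f}, R^2 = {r ** 2:.2f}")
```

Because the inputs are rounded means rather than individual replicates, the last digit of the coefficients may differ slightly from the published values.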
Classification of DPF using FT‑IR spectrometry
The ability of transmittance FT-IR spectroscopy combined with pattern recognition data analysis to discriminate between different date samples (soft, semidry and dry) was investigated. Transmittance FT-IR spectra were recorded for a collection of 360 date samples from different sources. The date samples were divided into calibration and prediction sets by the DUPLEX algorithm. In summary, DUPLEX works as follows: first, the two points that are furthest from each other are selected for the calibration set; from the remaining points, the two objects furthest from each other are included in the prediction set; then the remaining point furthest from the points already selected for the calibration set is added to the calibration set, and likewise the remaining point furthest from the points already in the test set is added to the test set. This procedure is repeated, so that the points of both the training and test sets are distributed uniformly over the whole space spanned by the entire dataset. The DUPLEX strategy thus helps ensure that both the training set and the test set are representative, while avoiding imbalance between the two datasets.
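The DUPLEX split described above can be sketched as follows. Two details are assumptions, since the text does not specify them: distances are Euclidean, and once the test set reaches its target size, all remaining points are sent to the training set. "Furthest from the points already selected" is implemented as the max-min rule, i.e. the remaining point whose nearest already-selected point is furthest away.

```python
import numpy as np

def duplex(X, n_test):
    """Split row indices of X into (train, test) lists with the DUPLEX scheme (n_test >= 2)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    remaining = list(range(len(X)))

    def take_farthest_pair():
        sub = D[np.ix_(remaining, remaining)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [remaining[i], remaining[j]]
        for k in pair:
            remaining.remove(k)
        return pair

    def take_maximin(selected):
        # remaining point whose nearest already-selected point is furthest away
        nearest = D[np.ix_(remaining, selected)].min(axis=1)
        k = remaining[int(np.argmax(nearest))]
        remaining.remove(k)
        return k

    train = take_farthest_pair()
    test = take_farthest_pair()
    while remaining:
        train.append(take_maximin(train))
        if remaining and len(test) < n_test:
            test.append(take_maximin(test))
    return train, test
```

For the date data, a call such as `duplex(spectra, n_test=110)` on the 360 preprocessed spectra would give a 250/110 split like the one used in this study.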
In this case, 250 samples were included in the training set and the remaining 110 samples were used as the test set. The preprocessed transmittance FT-IR spectra of the date samples are shown in Fig. 1. All types of dates exhibit similar spectra, so classifying these samples by visual inspection of the spectra is impossible. In the following sections, we show how the pattern recognition methods employed with transmittance FT-IR spectra can discriminate the date samples based on their TPC.
The main purposes of the following analysis can be divided into two parts. In the first part, the extent to which each method classifies the date samples by type (soft, semidry and dry) was evaluated; in the second part, we mainly focused on the ability of each method to discriminate similar date varieties within each class.
Data visualization by PCA
PCA was performed to obtain an overview of the relationships among the 250 training date samples (three types) described in Table 1. The results of applying PCA to the transmittance FT-IR spectral data matrix of all samples (soft, semidry and dry classes) are given in Table S1, which reports the percentage of variance explained by each PC and the cumulative percentage of variance (CPV). The first three principal components explain 97.87 % of the variance in the dataset. Figure S1 shows a three-dimensional plot of the first three principal components for the preprocessed dataset, revealing the relative positions of the studied date samples based on the similarity of their transmittance FT-IR spectra. Because the spectra of the different date types are highly similar (especially for soft and semidry dates), these two classes overlap. Interestingly, the dry date samples (Rabbi) are completely separated from the other two classes.

Fig. 1 Transmittance FT-IR spectra of the date samples used in this study: a Berehi, b Halavi, c Kabkab, d Khanizi, e Karoot, f Medjool, g Mordasang, h Mazafati, i Piarom, j Rabbi, k Shahabi, l Zahedi
In addition, the transmittance FT-IR spectra of Kabkab and Zahedi samples from different geographical origins were subjected to PCA to examine the relative positions of the samples. Figure S2a shows a two-dimensional plot of the first two principal components for the Kabkab dates, while Fig. S2b shows a three-dimensional principal component plot of the Zahedi dates; both plots show the relative positions of the samples based on their transmittance FT-IR spectra. The Kabkab dates from the three geographical regions are clearly separated along the second principal component (for both training and prediction samples). Similar results were obtained for Zahedi dates from the two regions (Pish-Kooh and Posht-Kooh).
Partial least squares discriminant analysis (PLS‑DA)
In contrast to PCA, PLS-DA extracts its latent variables using the information in the Y matrix. The discriminant PLS model was developed using the 250 selected training samples. The number of significant latent variables was determined using leave-5-out cross-validation (CV): the samples were randomly grouped into 50 subsets of 5 samples, and in each CV cycle one group was left out. The prediction error as a function of the number of latent variables indicated that the best performance (lowest classification error) was obtained with five PLS latent variables (Fig. S3 of the supplementary section). Figure S4 shows plots of the first two PLS factors for both training and prediction samples. The dry date samples are clearly separated from the soft and semidry samples, although there is some overlap between the soft and semidry samples. The classification results for the calibration and prediction samples are summarized in Table S2. All of the semidry samples (Halavi, Piarom, Zahedi and Karoot) were assigned to their original group, giving a 100 % correct classification rate. The dry date samples (Rabbi) were also completely discriminated from the other classes, as in the PCA score plot. The same results were obtained in the prediction step.
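The leave-5-out cross-validation used to pick the number of latent variables can be sketched as below: the samples are randomly split into folds of five (50 folds for 250 samples), each fold is predicted by a model trained on the rest, and the misclassification rate is recorded for each candidate model size. To keep the block self-contained, a hypothetical stand-in classifier (nearest class centroid in a k-component PCA space) is used instead of PLS-DA; any `fit_predict` function with the same signature, such as a PLS-DA wrapper, could be plugged in.

```python
import numpy as np

def make_folds(n_samples, fold_size, seed=0):
    """Randomly partition sample indices into folds of (at most) fold_size."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    return [perm[i:i + fold_size] for i in range(0, n_samples, fold_size)]

def cv_error(X, y, fit_predict, n_components, fold_size=5, seed=0):
    """Leave-fold-out misclassification rate for one model size."""
    errors = 0
    for test_idx in make_folds(len(X), fold_size, seed):
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[test_idx] = False
        y_pred = fit_predict(X[train_mask], y[train_mask], X[test_idx], n_components)
        errors += np.sum(y_pred != y[test_idx])
    return errors / len(X)

def pca_nearest_centroid(X_train, y_train, X_test, n_components):
    """Hypothetical stand-in classifier: nearest class mean in a PCA score space."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    V = Vt[:n_components].T
    T_train, T_test = (X_train - mean) @ V, (X_test - mean) @ V
    classes = np.unique(y_train)
    centroids = np.array([T_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(T_test[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]

# Usage: errors = {k: cv_error(X, y, pca_nearest_centroid, k) for k in range(1, 11)}
#        best_k = min(errors, key=errors.get)
```

Fixing the random seed keeps the fold assignment identical across candidate model sizes, so the error curve compares models on the same splits.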
The discriminating power of PLS-DA was also tested for classifying the 12 date varieties (Fig. S5 of the supplementary section). Table S3 lists the misclassification errors. As this table shows, PLS-DA can mostly discriminate the date samples by type (soft, semidry and dry), but it cannot accurately discriminate the date samples within each class.

Fig. 2 Distribution pattern of the date samples (soft, semidry and dry) in the two-dimensional canonical variate space of their transmittance FT-IR spectra (whole region) for data preprocessed by EMSC
Extended canonical variate analysis (ECVA)
Among the different supervised pattern recognition methods, ECVA is a relatively new and efficient classification method that has found widespread application in recent years. The composition of the calibration and prediction sets was the same as that used in PLS-DA and PCA. The ECVA procedure used leave-5-out cross-validation to select the optimum number of LVs for the PLS model used in the inner part of ECVA. In the first part, ECVA was applied to discriminate the date samples by texture type (DD, SDD and SD). The plot of misclassification error versus the number of PLS latent variables (Fig. S6 of the supporting information) shows that 10 latent variables are optimal for calibration and prediction.
Figure 2 shows the two-dimensional plot of canonical variates for the ECVA method. ECVA clearly discriminates the soft, semidry and dry dates from one another along the first and second canonical variates, for both the calibration and prediction sets. Next, ECVA was applied to classify the similar date samples within each class (12 varieties). Figure S7 of the supporting information shows the two-dimensional canonical variate plot for the 12 date varieties. This figure reveals that ECVA cannot discriminate the Halavi, Piarom, Khanizi and Karoot dates from one another, and that Berehi and Medjool also overlap to some extent. The classification results (Tables S2 and S3) show that ECVA can discriminate the date samples by type, but that some misclassification errors occur when discriminating the date varieties.
Fig. 3 Distribution pattern of the date samples (12 varieties) in the two-dimensional canonical variate space of their transmittance FT-IR spectra based on cluster S4,4 of network size n = 4 in CLoVA-ECVA
Finally, CLoVA-based ECVA was used as an improvement of ECVA for variable selection, to select the best spectral regions for classification and consequently obtain lower classification errors. In CLoVA-based ECVA, the whole spectral region is divided into clusters by a clustering algorithm (Kohonen neural network), and ECVA is applied to each cluster separately. The network size, which is defined by the user, must be optimized. Network sizes from 2 to 9 were tested, and a network size of n = 4 (16 clusters) gave the lowest classification error. The performance of each of the 16 sub-datasets was checked by applying ECVA to each sub-region separately. Note that the dataset was EMSC-transformed and then autoscaled before the SOM analysis. Figure S8 of the supporting information shows the distribution of variables in the Kohonen SOM of network size n = 4 for the transmittance FT-IR data. The clusters in the SOM map are arranged in a two-dimensional (matrix-like) pattern. Each cluster is denoted Si,j, where i and j are the row and column numbers of the cluster in the Kohonen map, respectively. Among these clusters, S4,4 gave the best performance: no misclassification was observed for the calibration set and only one for the prediction set. This suggests that this spectral region carries more useful information. Besides its high accuracy, CLoVA-ECVA makes it possible to select the spectral ranges that are most informative and relevant for specific classes.
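The CLoVA workflow (cluster the variables on a Kohonen map, then evaluate a classifier on each cluster's variables separately) can be sketched as follows. This is a minimal sketch under several assumptions: it uses a basic online SOM with a Gaussian neighbourhood and linearly decaying learning rate and radius (the training schedule of the Milano Kohonen toolbox is not reproduced), and it scores each cluster with a simple nearest-class-mean classifier as a hypothetical stand-in for ECVA. Cluster names follow the Si,j convention of the text.

```python
import numpy as np

def autoscale(X):
    """Centre each column and scale it to unit variance (constant columns left unscaled)."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd

def som_cluster_variables(X, grid=4, epochs=50, seed=0):
    """Cluster the columns (spectral variables) of X on a grid x grid Kohonen map."""
    data = autoscale(np.asarray(X, dtype=float)).T        # one row per variable
    rng = np.random.default_rng(seed)
    n_nodes = grid * grid
    coords = np.array([(i, j) for i in range(grid) for j in range(grid)], dtype=float)
    init = rng.choice(len(data), n_nodes, replace=len(data) < n_nodes)
    weights = data[init].copy()
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for v in rng.permutation(len(data)):
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac) + 0.01                  # decaying learning rate
            sigma = max(0.5 * grid * (1.0 - frac), 0.5)     # shrinking neighbourhood
            bmu = np.argmin(np.sum((weights - data[v]) ** 2, axis=1))
            h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (data[v] - weights)
            step += 1
    # each variable is assigned to its best-matching node
    return np.array([np.argmin(np.sum((weights - x) ** 2, axis=1)) for x in data])

def nearest_centroid_error(X_train, y_train, X_test, y_test):
    """Stand-in classifier (not ECVA): misclassification rate of a nearest-class-mean rule."""
    classes = np.unique(y_train)
    cents = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - cents[None, :, :], axis=-1)
    return float(np.mean(classes[np.argmin(d, axis=1)] != y_test))

def evaluate_clusters(X_train, y_train, X_test, y_test, labels, grid):
    """Score each variable cluster separately; keys follow the Si,j naming."""
    results = {}
    for k in np.unique(labels):
        cols = labels == k
        name = f"S{k // grid + 1},{k % grid + 1}"
        results[name] = nearest_centroid_error(X_train[:, cols], y_train,
                                               X_test[:, cols], y_test)
    return results

# Usage: labels = som_cluster_variables(X_train, grid=4)
#        scores = evaluate_clusters(X_train, y_train, X_test, y_test, labels, grid=4)
#        best_cluster = min(scores, key=scores.get)
```

Because strongly correlated spectral variables tend to land on the same or neighbouring nodes, informative wavenumber bands end up grouped together, which is what allows a single cluster such as S4,4 to outperform the full spectrum.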
The classification results of CLoVA-based ECVA for the selected cluster are visualized by plotting the two canonical variates obtained from the variables of cluster S4,4 against each other (Fig. 3). Compared with the 2D plots of the PCA scores, PLS-DA scores and ECVA canonical variates, the date sample classes are better resolved in the 2D space of these canonical variates. Interestingly, all samples are discriminated along the first and second canonical variate axes. The first three canonical weight vectors for the selected cluster (S4,4), shown in Fig. S9, indicate which wavenumbers play the most significant role in the discrimination of the date samples. The weight vectors have their largest values in three main spectral regions: 890–950, 1350–1380 and 3300–3350 cm⁻¹. The selected cluster (S4,4) also has good potential for discriminating the date samples by type (Fig. S10 of the supplementary section): along the second canonical variate, soft date samples have positive scores, whereas dry and semidry samples have negative scores. The model not only performed excellently on the training samples but also correctly predicted the class membership of all samples in the test set. The non-error rates (NER, % of correctly assigned samples) of the three pattern recognition methods are reported in Table 3. As this table shows, PLS-DA with five latent variables gave NERs of 73.98 % and 80.48 % for the calibration and test samples, respectively, whereas CLoVA-ECVA gave the best classification results, outperforming both PLS-DA and ECVA.

Table 3 Non-error rate (NER), i.e., percentage of correctly assigned samples, achieved with different classification methods for date varieties
ᵃ Number of PLS latent variables
ᵇ NER for calibration set
ᶜ NER for prediction set
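The NER values in Table 3 follow the definition given in the text: the percentage of correctly assigned samples. A minimal sketch, with hypothetical labels in the usage line:

```python
import numpy as np

def non_error_rate(y_true, y_pred):
    """NER (%): percentage of samples assigned to their true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)

print(non_error_rate(["SD", "SD", "SDD", "DD"], ["SD", "SDD", "SDD", "DD"]))  # → 75.0
```

Computed this way, the NER is dominated by the largest classes; with unequal class sizes, such as seven soft varieties against one dry variety, it can hide poor performance on a small class, which is why per-class misclassification tables such as Table S3 are also informative.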
The antioxidant activities (AA) of twelve selected date fruits from Iran were determined in this study. The total phenolic content (TPC) and AA of the DPF were measured using the Folin–Ciocalteu and DPPH methods, respectively. The results show that the dry and semidry date varieties had higher AA and TPC than the soft dates. A strong correlation existed between the AA and TPC of the date samples. The classification of transmittance FT-IR spectra of different date samples (soft, semidry and dry) by means of pattern recognition methods was also investigated. PLS-DA and ECVA gave only partial discrimination of the date classes. However, high discrimination ability was obtained with CLoVA-ECVA, which correctly assigned all 250 calibration samples and all 110 prediction samples used in this study to their own class group (NER = 100 %).