Use of Near Infrared Spectroscopy (NIRS) as a tool to discriminate species of the genus Dimorphandra Schott (Leguminosae: Caesalpinioideae)

This work aimed to use near-infrared spectroscopy (NIRS) as a tool to discriminate species of the genus Dimorphandra Schott (Leguminosae, Caesalpinioideae). Spectra were collected from 315 individuals (six readings per individual) distributed in 20 species of Dimorphandra using a Thermo Nicollet spectrophotometer, FT-NIR Antaris II Method Development System (MDS), in the INPA (National Institute of Amazonian Research) Herbarium. Absorbance values span the wavenumbers from 4,000 to 10,000 cm⁻¹, corresponding to the near-infrared region, recorded for 16 scans at a resolution of 8 cm⁻¹. Principal Component Analysis (PCA) was used to visualize the spectral distribution. Discriminant functions were generated to evaluate the potential of the data to correctly distinguish the species, and the 70-30 cross-validation technique was used to validate the generated models, with selections randomized 1, 10, 50 and 100 times. Excellent results were obtained in the PCA, and prediction values of 92-95% were obtained in the 70-30 validation test of the linear discriminant analyses (LDA), indicating high predictive power in the discrimination of species of the genus Dimorphandra. It is therefore inferred that NIRS contributes to the discrimination of species of the genus and to the elucidation of taxonomic problems.


INTRODUCTION
The family Leguminosae comprises over 22,000 species in ca. 800 genera, which makes it the third largest angiosperm family in terms of species diversity [1][2][3]. Ranging in size from small annual herbs to long-lived giant trees, the Leguminosae are often ecologically dominant in tropical and temperate biomes [1,3]. In 2017, a new subfamily classification was established for Leguminosae that dealt with the long-standing problem of paraphyly of the subfamily Caesalpinioideae DC. and formally divided the family into six subfamilies: Cercidoideae LPWG, Detarioideae Burmeist., Duparquetioideae LPWG, Dialioideae LPWG, Caesalpinioideae DC., and Papilionoideae DC. [3]. Since then, the idea that Leguminosae comprises six major lineages has been largely confirmed by phylogenomic analyses of large datasets of nuclear gene and plastome sequences [4][5][6], providing robust support for the six subfamilies.
Caesalpinioideae sensu is the second largest subfamily of Leguminosae, with ca. 4,680 species and 163 genera. Within this subfamily, ca. 3,400 species and 90 genera are placed within the mimosoid clade, corresponding to the former subfamily Mimosoideae [3]. Caesalpinioideae has a pantropical distribution, and many of its lineages form ecologically abundant or dominant elements in each of the main lowland tropical biomes (seasonally dry tropical forests, savannas, and humid tropical forests), thus occurring along the whole rainfall range of lowland tropical regions, from arid to hyperhumid, with only a small fraction of species found in the warm temperate zone and a subset of these being ice-tolerant. The main current problems to be solved in Caesalpinioideae are related to the delimitation of non-monophyletic genera [3]. The authors of [7] indicate that, although they included 420 taxa in their phylogenomic analyses, additional sampling is still clearly necessary to resolve all the possible cases of non-monophyly within Caesalpinioideae. For example, comprehensive sampling is still needed to address the long-standing lack of monophyly in Dimorphandra Schott.
Dimorphandra currently comprises 26 species worldwide, with 23 species and 6 subspecies in Brazil [8][9][10]. The genus was described in 1827 [11], with Dimorphandra exaltata Schott as the type species [8]. It is restricted to Brazil and adjacent countries (Colombia, Venezuela, Peru and Bolivia) in South America. Some species are endemic to particular phytogeographic domains, such as the Amazon (D. campinarum Ducke and D. urubuensis Ducke) and the Cerrado (D. wilsonii Rizzini), while others, such as D. gardneriana Tul. and D. mollis Benth., are widely distributed [8,9,12]. The first evidence that Dimorphandra is not a monophyletic group was presented by Manzanilla and Bruneau (2012) [13], who proposed a phylogeny for Dimorphandra through a Bayesian analysis based on the sequencing of four cpDNA regions obtained from 45 individuals and 16 species.
Over time, the sampling for the Dimorphandra phylogeny was expanded (57 specimens and 17 species), with sequenced plastid (matK) and nuclear (ITS) markers, again showing that the Dimorphandra species were grouped into three subclades, each corresponding to one of the three subgenera, and corroborating the lack of monophyly of the genus [15]. Thus, studies that help to solve this problem are fundamental, especially those addressing the discrimination of species [16].
In Dimorphandra, closely related taxa overlap considerably in traits such as the indument, number of pinnae, leaflet shape, floral pedicels, and the shape of flowers, stamens and staminodes. In addition, some reproductive structures, such as flowers and fruits, lack descriptions, which makes it difficult to delimit the species. The main examples are the erroneous determinations of D. gardneriana Tul. and D. mollis Benth., especially when young branches are collected, because the diagnostic traits of the species, such as the number and arrangement of pinnae, the indument, the margin and shape of the base of the leaflets, and the pubescence of the flower buds, are not yet evident [8,17].
The delimitation of the subspecies of D. macrostachya Benth. and D. cuprea Sprague & Sandwith is also problematic, as they show considerable morphological overlap of traits (leaflet shape and indument, number of pinnae, calyx shape, and pedicel size), in addition to the lack of description of some traits such as the staminodes. Another factor is the uncertain geographical distribution of the species, due to the lack of geographical data and to incorrect determinations that lead to mistaken occurrence reports, further confusing the delimitation of the subspecies [8,12].
New tools that contribute to taxonomy bring new perspectives and better support to clarify the delimitation of species. One of these tools is Near Infrared Spectroscopy (NIRS), which emerged in this century as a clean, low-cost and rapid method that complements and improves on conventional analyses [18][19][20]. When a sample of organic material is irradiated, the chemical bonds vibrate continuously, causing the stretching and bending of the molecules and activating a type of wave movement that is characteristic of the functional group that composes it. This information, combined with multivariate analyses, serves as a basis to discriminate species [21].
Infrared spectroscopy of leaves has surprising potential to aid in the taxonomy and systematics of plants [22][23][24][25][26][27][28][29][30]. This is because the spectral behavior of a leaf is a function of its chemical composition, morphology and internal structure, data that help in the recognition of species [31][32][33]. Recent studies have shown the effectiveness of Fourier transform near-infrared (FT-NIR) spectroscopy in the identification of botanical species [20,21,30,34,35].
The works already published therefore show that near-infrared spectroscopy (NIRS) is an important tool to discriminate species and to assist in the taxonomic resolution of plant groups. Accordingly, this study aimed to use NIRS as a tool to discriminate species of the genus Dimorphandra Schott (Leguminosae, Caesalpinioideae).

MATERIAL AND METHODS
A Thermo Nicollet spectrophotometer, FT-NIR Antaris II Method Development System (MDS), in the INPA (National Institute of Amazonian Research) Herbarium, was used to collect leaf spectral data. The equipment was calibrated (white) every four hours of use, at each reading. For control, a black body was used on top of the point where the spectrum was collected, to avoid the dispersion of light.
Spectra were collected from whole, dried leaves of exsiccates deposited in the INPA Herbarium. Two spectral readings were obtained per leaf, one from the adaxial face (the side of the leaf that faces the plant axis) and another from the abaxial face (the side of the leaf that faces away from the plant axis), using three leaves per individual, totaling six readings per individual. Undamaged leaves were prioritized, but leaves with fungi or signs of herbivory were not discarded. For large leaflets (larger than the integrating sphere, ca. 1 cm in diameter), the spectra were collected directly, placing the leaves on the sphere and covering them with the black body. For leaflets smaller than the integrating sphere, a fisheye lens (an accessory that calibrates the passage of light through the integrating sphere to focus only on small organic material) was used (Figure 1). Spectra were collected from 315 individuals distributed in 20 Dimorphandra species (Figure 2). Absorbance values span the wavenumbers from 4,000 to 10,000 cm⁻¹, recorded for 16 scans at a resolution of 8 cm⁻¹. The spectra were not pre-treated; only the Fourier transformation already integrated into the spectrophotometer was used. The collected data were organized in a table for visualization (Table 1).
Principal Component Analysis (PCA) was used to determine the similarity between the spectra of the different species. The ordinated attributes were all the absorbance values read at each wavenumber for the readings of each individual. The ordination was conducted using the average spectrum per individual across all wavenumbers, representing 1557 attributes. One PCA was performed with all species, and two further PCAs were performed to compare taxa whose determination is problematic: the species D. gardneriana and D. mollis, and the subspecies of D. cuprea (D. cuprea ssp. cuprea, D. cuprea ssp. ferruginea and D. cuprea ssp. velutina).
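The data preparation described above (six readings per individual reduced to one average spectrum over 1557 attributes) can be illustrated with a minimal Python sketch. It is not the authors' code, which was written in R. The function names `wavenumber_axis` and `mean_spectrum` are hypothetical, the toy readings are invented, and the even spacing of the 1557 points between 4,000 and 10,000 cm⁻¹ is an assumption; only the range and the number of attributes come from the text.

```python
from statistics import fmean


def wavenumber_axis(start=4000.0, stop=10000.0, n_points=1557):
    """Evenly spaced wavenumbers (cm^-1) from start to stop, both included.

    Even spacing is an assumption; only the range and the number of
    attributes (1557) are taken from the text.
    """
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def mean_spectrum(readings):
    """Average several readings of one individual, point by point."""
    n_points = len(readings[0])
    if any(len(r) != n_points for r in readings):
        raise ValueError("all readings must have the same number of points")
    return [fmean(r[i] for r in readings) for i in range(n_points)]


# Hypothetical individual: 3 leaves x 2 faces = 6 readings.
# Only 3 points per reading are used here to keep the example short.
toy_readings = [
    [0.50, 0.40, 0.30],  # leaf 1, adaxial
    [0.52, 0.42, 0.32],  # leaf 1, abaxial
    [0.48, 0.38, 0.28],  # leaf 2, adaxial
    [0.50, 0.40, 0.30],  # leaf 2, abaxial
    [0.51, 0.41, 0.31],  # leaf 3, adaxial
    [0.49, 0.39, 0.29],  # leaf 3, abaxial
]

axis = wavenumber_axis()
individual_mean = mean_spectrum(toy_readings)
print(len(axis), round(axis[1] - axis[0], 3))
print([round(v, 3) for v in individual_mean])
```

Under the even-spacing assumption, 1557 points over a 6,000 cm⁻¹ range correspond to a data spacing of roughly 3.86 cm⁻¹, which is finer than the nominal 8 cm⁻¹ resolution.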
Discriminant functions were generated according to Lang et al. (2015) [20] in order to evaluate the potential of the data to correctly distinguish the species. In the discriminant functions generated, the grouping (dependent) variable was the species (20 analyzed) and the independent variables were the mean absorbance values at each wavenumber of the readings per individual.
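As an illustration of what a linear discriminant function does, the following Python sketch fits one linear score per species using class means and a pooled within-class covariance, and assigns a sample to the species with the highest score. It is a reduced toy case under stated assumptions, not the authors' R analysis: it uses only two predictors (for example, absorbance at two wavenumbers) instead of 1557, the species labels `sp_A` and `sp_B` and all values are invented, and the function names `fit_lda` and `predict_lda` are hypothetical.

```python
import math
from collections import defaultdict


def fit_lda(X, y):
    """Fit linear discriminant scores for 2-feature data.

    Returns {label: (weights, intercept)} built from the class means,
    the pooled within-class covariance, and the class proportions.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n, k = len(X), len(groups)
    means = {g: [sum(r[j] for r in rows) / len(rows) for j in range(2)]
             for g, rows in groups.items()}

    # Pooled within-class covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g, rows in groups.items():
        for r in rows:
            d = [r[0] - means[g][0], r[1] - means[g][1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b] / (n - k)

    # Closed-form inverse of the 2 x 2 covariance matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]

    model = {}
    for g, m in means.items():
        w = [inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1]]
        b = -0.5 * (w[0] * m[0] + w[1] * m[1]) + math.log(len(groups[g]) / n)
        model[g] = (w, b)
    return model


def predict_lda(model, x):
    """Return the label whose linear discriminant score is highest for x."""
    def score(g):
        w, b = model[g]
        return w[0] * x[0] + w[1] * x[1] + b
    return max(model, key=score)


# Hypothetical mean absorbances at two wavenumbers for two invented species.
X = [[0.30, 0.40], [0.32, 0.43], [0.28, 0.39], [0.31, 0.41],
     [0.45, 0.35], [0.47, 0.38], [0.44, 0.34], [0.46, 0.36]]
y = ["sp_A"] * 4 + ["sp_B"] * 4

model = fit_lda(X, y)
print(predict_lda(model, [0.29, 0.40]), predict_lda(model, [0.46, 0.36]))
```

With full spectra there are far more predictors than individuals, so the pooled covariance cannot be inverted this directly; how that is handled depends on the software and is not described in the text.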
The cross-validation technique recommended by Durgante et al. (2013) [19] was used to test the efficiency of the generated model. In this technique, 70% of the individuals are used to generate the model and 30% are used to validate it; the selection was randomized 1, 10, 50 and 100 times. The procedure aims to detect the set of independent variables that best predicts the species analyzed. For each of the tests described, the percentage of correct identifications was obtained. The analyses were performed in the R software 2.10.0 [36].
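The repeated 70-30 procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' R implementation: a nearest-centroid rule is swapped in for the discriminant functions to keep the example short, the two-species toy data are invented and deliberately well separated, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def split_70_30(n, rng):
    """Return (train, test) index lists: 70% for fitting, 30% for validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = (7 * n) // 10
    return idx[:cut], idx[cut:]


def fit_centroids(X, y):
    """Mean spectrum per species (a stand-in classifier, not LDA)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {g: [sum(col) / len(rows) for col in zip(*rows)]
            for g, rows in groups.items()}


def predict(centroids, x):
    """Assign x to the species with the closest centroid."""
    return min(centroids,
               key=lambda g: sum((a - b) ** 2 for a, b in zip(x, centroids[g])))


def repeated_holdout(X, y, repeats, seed=0):
    """Mean proportion of correct identifications over random 70-30 splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = split_70_30(len(X), rng)
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(centroids, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Hypothetical toy data: 10 individuals for each of 2 invented species.
data_rng = random.Random(42)
X = [[0.3 + data_rng.uniform(-0.05, 0.05), 0.4 + data_rng.uniform(-0.05, 0.05)]
     for _ in range(10)]
X += [[0.8 + data_rng.uniform(-0.05, 0.05), 0.9 + data_rng.uniform(-0.05, 0.05)]
      for _ in range(10)]
y = ["sp_A"] * 10 + ["sp_B"] * 10

for repeats in (1, 10, 50, 100):
    print(repeats, repeated_holdout(X, y, repeats))
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition, which is the reason for repeating the selection 1, 10, 50 and 100 times.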

RESULTS AND DISCUSSION
In the visual inspection of spectral behaviors, two patterns were observed, as shown in Figure 3. One of them was already expected for leaf spectra, with absorbances ranging from 0.20 to 0.55 (Behavior A), as described by [19], and the other was entirely different from those recorded in previous studies, with high absorbance values ranging from 0.7 to 1.0 (Behavior B). Of the 315 individuals for which spectra were collected, 244 showed Behavior A and 71 showed Behavior B. The difference in spectral behavior was related to the collection method: spectra collected directly, without the fisheye lens, resulted in Behavior A, and those collected with the fisheye lens resulted in Behavior B. Thus, the use of the fisheye lens modifies the observed spectral behavior, producing absorbance levels different from those of collections performed without this tool, as reported in previous works [19,20,29,33,34,35]. However, the use of this spectrophotometer tool is recommended by the manufacturer for materials smaller than the integrating sphere [36].
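The two behaviors can be told apart from the absorbance range alone. The Python sketch below is a hypothetical helper, not part of the original analysis: it labels a spectrum as A or B when all of its absorbance values fall inside the corresponding range reported above, and leaves it unassigned otherwise. The toy spectra are invented.

```python
from collections import Counter


def spectral_behavior(spectrum, a_range=(0.20, 0.55), b_range=(0.70, 1.00)):
    """Label a spectrum by the absorbance range it falls in.

    The default ranges are the ones reported in the text for
    Behavior A (direct reading) and Behavior B (fisheye lens).
    """
    low, high = min(spectrum), max(spectrum)
    if a_range[0] <= low and high <= a_range[1]:
        return "A"
    if b_range[0] <= low and high <= b_range[1]:
        return "B"
    return "unassigned"


# Hypothetical spectra, shortened to a few points each.
toy_spectra = [
    [0.25, 0.40, 0.50],  # within the Behavior A range
    [0.75, 0.90, 0.98],  # within the Behavior B range
    [0.30, 0.45, 0.52],  # within the Behavior A range
    [0.50, 0.80, 0.95],  # spans both ranges
]

counts = Counter(spectral_behavior(s) for s in toy_spectra)
print(dict(counts))
```

A tally of this kind makes it easy to check how many individuals were read with each collection method before the spectra are pooled in a single ordination.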
The patterns shown by the spectra of each species reflect the chemical and physical properties of the organic material that the technique is able to capture, together with the form of spectrum collection, mainly the higher or lower absorbances that the tool can capture from the organic material. Differences in spectral behavior among genetically distinct groups are expected [31].
The ordination of the species in two dimensions captured 99% of the variation (PC1: 89% and PC2: 10%) using average values of the raw spectra across the entire wavenumber range (4,000 to 10,000 cm⁻¹) (Figure 4). Most species were clustered, such as D. vernicosa, D. campinarum and D. gigantea, among others. However, a minority of groups were dispersed, as in the case of D. gardneriana and D. exaltata. Yet, even when dispersed, some individuals formed distinct groups. This is the first time that Fourier transform near-infrared spectroscopy (FT-NIR) has been tested to discriminate species of legumes from the subfamily Caesalpinioideae with bipinnate leaves and a wide geographic distribution. Our results show that FT-NIR is a powerful tool that can be easily applied to species identification using leaf spectral data [37,38]. Considering that the PCA axes summarize the similarity between the spectra of different species, the higher the proportion of variation captured by the axes (99% in the present study), the better this variation is explained and the easier it is to recognize spectra-species groupings.
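The percentages attributed to PC1 and PC2 are the shares of the total variance carried by the first two principal components. A minimal Python sketch of that calculation is given here, assuming a covariance-based PCA computed by power iteration with deflation; this is one of several ways to obtain the components and is not claimed to be the routine used by the authors in R. The toy matrix and the function names are hypothetical.

```python
def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def top_eigen(m, iters=500):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    p = len(m)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iters):
        w = [sum(m[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(m[a][b] * v[b] for b in range(p)) for a in range(p))
    return lam, v


def explained_variance_ratio(rows, n_components=2):
    """Share of total variance captured by each of the first components."""
    cov = covariance(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    ratios = []
    for _ in range(n_components):
        lam, v = top_eigen(cov)
        ratios.append(lam / total)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    return ratios


# Hypothetical mean spectra (4 individuals x 3 wavenumbers), built so that
# the first column varies twice as much as the second and the third is flat.
toy = [[0.6, 0.5, 0.3],
       [0.2, 0.5, 0.3],
       [0.6, 0.3, 0.3],
       [0.2, 0.3, 0.3]]

ratios = explained_variance_ratio(toy)
print([round(r, 3) for r in ratios])
```

In this toy case the two components together account for all of the variance, in proportions of about 0.8 and 0.2; in the study, the corresponding shares were 89% and 10%.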
Compared with other studies, the results are similar to those of [25,29,32], which captured more than 95% of the spectral variation in the PCA. Of the studies with a methodology similar to the present one, only [23,27] captured less than 70% of the spectral variation in the PCA, indicating that Near Infrared Spectroscopy (NIRS) is a highly efficient tool to discriminate species.
These exploratory data demonstrate the importance of the tool in the discrimination of species, since identification errors and the lack of curation of deposited samples lead to problems in the use of data. This problem is, in fact, common in Dimorphandra because of its wide morphological variation [39]. Spectral readings using dried plants from herbarium collections provide an alternative to solve identification problems using specimens already available, without the need for new collections [34]. In the PCA, in general, the spectra formed two groups along PC1, which explained most of the variation: one group that goes from 0 to 60 (left) and another that goes from 0 to 60 (right). The use of PCA to group samples according to differences and similarities in the spectral data generated by infrared analysis, reducing the dimensionality of the data set while preserving the maximum information, proved to be effective [40].
The distinct spectral behaviors observed in the PCA were explained by the collection method: individuals of species with large leaves, in which the collection was performed directly without the fisheye lens, showed spectra from 0 to 60 in PC1, while those in which collections were performed with the aid of the fisheye lens showed a different behavior. This demonstrates that the collection tool can influence the spectral behavior.
The use of an accessory introduces new variables in the spectrum, which may compromise the recognition of tissues in the sample. The materials of accessories have chemical components that add extra information unrelated to the spectral signatures of the species [40]. New tests and standardization in the use of accessories are of utmost importance in the definition of an effective protocol for spectral collections [41].
In the PCAs of D. gardneriana and D. mollis (Figure 5A) and of the subspecies of D. cuprea (Figure 5B), there was a clear separation of the taxa. The ordination in two dimensions captured, in general, 97% to 99% of the variation. Problems in the delimitation of D. gardneriana and D. mollis are mainly associated with the collection of young or immature branches, in which the main distinguishing characters are not yet established. The determination of the two species based solely on vegetative structures of young branches is very difficult. Thus, the contribution of spectroscopy is fundamental to discriminate these species [8,12]. The delimitation of the taxa included in D. cuprea, especially D. cuprea ssp. cuprea and D. cuprea ssp. ferruginea, which have some overlapping characteristics, is not clear. Reference [8] uses a series of characteristics to separate the two subspecies, including the shape of the petals and the size of the trees. These characteristics are essential, but they show some overlap. Thus, spectral data are important because they are able to separate these taxa. Exploratory data are satisfactory for closely related taxa such as those studied here, but for the efficient use of FT-NIR spectra in species identification, reference spectral data are necessary to calibrate the models that will be used in the future to classify new samples [42]. The spectral data gathered here are a pioneering basis for future calibrations.
The results for the 70-30 cross-validation test, with the respective randomizations, are shown in Figure 6. An average prediction of approximately 93% was found among the randomizations performed, demonstrating a high predictive power for the discriminant functions generated. Thus, the model generated is efficient to discriminate Dimorphandra species. In a study with Ephedra species using FT-NIRS, Fan et al. (2010) [29] found results from 84.2 to 91.9% in the discriminant analysis. As for the categories with prediction error, a recurrent pattern was found for the species, as shown in Table 2. A prediction error is a case in which the species indicated by the generated model does not match the species of the specimen. It is noteworthy that, in these cases, the model indicated very close species that are difficult to determine morphologically and that require a more accurate taxonomy. Each taxonomic approach accesses a different type of information about diverging lineages, and multiple lines of evidence are needed to separate the species, especially in closely related groups [43,44].
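A table of categories with prediction error, such as the one referred to above, amounts to a tally of (true species, predicted species) pairs that do not match. The Python sketch below shows one way to build it; the labels are invented placeholders, not results from this study, and the function names are hypothetical.

```python
from collections import Counter


def prediction_errors(true_labels, predicted_labels):
    """Count each (true, predicted) pair where the model missed."""
    return Counter((t, p) for t, p in zip(true_labels, predicted_labels) if t != p)


def accuracy(true_labels, predicted_labels):
    """Proportion of validation individuals identified correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical validation set of 10 individuals from 3 invented species.
true_labels = ["sp_A", "sp_A", "sp_A", "sp_A", "sp_B",
               "sp_B", "sp_B", "sp_C", "sp_C", "sp_C"]
predicted = ["sp_A", "sp_A", "sp_B", "sp_B", "sp_B",
             "sp_B", "sp_B", "sp_C", "sp_C", "sp_A"]

errors = prediction_errors(true_labels, predicted)
for (true_sp, predicted_sp), count in errors.most_common():
    print(f"{true_sp} predicted as {predicted_sp}: {count}")
print(f"accuracy: {accuracy(true_labels, predicted):.0%}")
```

Sorting the pairs by frequency highlights recurrent confusions, which is where closely related taxa with overlapping morphology would be expected to appear.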

CONCLUSION
The present results show that NIRS is a useful tool for the discrimination of species of the genus Dimorphandra. Excellent results, with prediction values of 92-95% in the 70-30 validation test of the linear discriminant analyses (LDA), were obtained. This is the first time that Fourier transform near-infrared spectroscopy (FT-NIR) has been tested to discriminate species of legumes from the subfamily Caesalpinioideae with bipinnate leaves and a wide geographic distribution.
After the validation test, aligned with the change in spectral behavior using fisheye lens, it was found that despite the modification, the predictive values were high.However, the use of fisheye lens for spectral collection and the study of materials smaller than the integrating sphere need to be better investigated so as to determine the extent to which these changes influence the analyses performed.
PCA is an important tool for exploring and understanding the spectral behavior of species. This analysis provided important information on spectral data of Dimorphandra species, helping to discriminate species from a group with highly variable leaf morphology. Thus, it is concluded that NIRS is an auxiliary tool for the discrimination of species of the genus Dimorphandra, contributing to the identification and elucidation of taxonomic problems. Considering that spectroscopy provides a single fingerprint for each sample, the technique offers valuable data for taxonomic classification. The integration between spectroscopy and taxonomy is promising and should be consolidated in the near future for botanical studies.

Figure 1. Demonstration of spectra collection with the use of fisheye lens. A) Integrating sphere without fisheye lens; B) Fisheye lens coupled to the integrating sphere; C) Leaflets on fisheye lens for spectra collection.

Figure 3. Spectral behavior of the genus Dimorphandra based on measurements in the abaxial and adaxial face of leaves. A) Behavior A; B) Behavior B.

Figure 4. Principal component analysis (PCA) using average spectra of individuals of the 20 Dimorphandra species studied.

Figure 5. Principal component analysis (PCA) of the genus Dimorphandra using the average spectra of individuals of subspecies of D. cuprea, and of the species D. gardneriana and D. mollis. A) PCA of the species D. gardneriana and D. mollis; B) PCA of subspecies of D. cuprea.

Figure 6. Percentage of accuracy by permutation obtained in the 70-30 validation test to validate discriminant functions. A) Randomized selection only 1 time; B) Randomized selection 10 times; C) Randomized selection 50 times; D) Randomized selection 100 times.

Table 1. Number of specimens per species of Dimorphandra Schott used for spectral collection.

Table 2. Categories with prediction error for species of the genus Dimorphandra in the 70-30 validation test of the discriminant functions.