Principal Component Analysis (PCA) — SciPy Filters for QGIS¶
Principal Component Analysis (PCA)¶
Changed in version 1.5: Added the parameters Plot and Standard Scaler.
- class scipy_filters.algs.scipy_pca_algorithm.SciPyPCAAlgorithm
Principal Component Analysis (PCA)
The PCA is calculated by Singular Value Decomposition (SVD), using svd from scipy.linalg.
With default parameters, all components are kept. Optionally, either the number of components to keep or the percentage of variance explained by the kept components can be set.
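The SVD-based calculation can be sketched as follows. This is a minimal illustration, not the plugin's code: the function name `pca_svd`, the array layout (bands, rows, columns) and the random test raster are assumptions, and `numpy.linalg.svd` is used here in place of `svd` from `scipy.linalg`.

```python
import numpy as np

def pca_svd(bands):
    """PCA of a raster given as an array of shape (n_bands, rows, cols)."""
    n_bands = bands.shape[0]
    # Flatten to (n_pixels, n_bands): every pixel is one observation
    X = bands.reshape(n_bands, -1).T.astype(np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean  # center each band
    # Thin SVD: Xc = U @ diag(s) @ Vt
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s**2 / (X.shape[0] - 1)  # variance explained per component
    eigenvectors = Vt.T                    # columns are the eigenvectors
    scores = Xc @ eigenvectors             # data projected onto the components
    return scores, s, eigenvalues, eigenvectors, mean

# Hypothetical 3-band raster of 20 x 30 pixels
rng = np.random.default_rng(0)
raster = rng.normal(size=(3, 20, 30))
scores, s, eigenvalues, V, mean = pca_svd(raster)
```

With all components kept, the scores are uncorrelated and their variances equal the eigenvalues.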
Standard Scaler: Optionally scales each band to unit variance (standard deviation of 1) before performing PCA; the standard deviation of each band is reported in the output. Without scaling, the absolute values of the scores are in a similar range to the original data. (Added in version 1.5)
Number of components to keep. Use 0 to keep all components; a negative value gives the number of components to remove. Ignored if the percentage of variance is set.
Percentage of variance to keep is only used if it is greater than 0 (typical values would be in the range between 90 and 100).
Plot: If Plotly is available, a plot of the variance explained by the principal components is created and saved as an HTML file.
Output: The output raster contains the data projected onto the principal components (i.e. the PCA scores).
Output data type: Float32 or Float64.
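The interplay of the two parameters for the number of components and the percentage of variance can be sketched as below. The helper `components_to_keep` is hypothetical. In particular, the rule "smallest number of components whose cumulated variance reaches the threshold" and the lower bound of one component are assumptions, not taken from the plugin source.

```python
import numpy as np

def components_to_keep(explained_ratio, n_components=0, percent_variance=0.0):
    """Return how many leading components are kept (hypothetical helper)."""
    n_total = len(explained_ratio)
    if percent_variance > 0:
        # Percentage of variance takes precedence over n_components
        cumulated = np.cumsum(explained_ratio)
        k = int(np.searchsorted(cumulated, percent_variance / 100.0)) + 1
        return min(k, n_total)
    if n_components == 0:
        return n_total                         # keep everything
    if n_components < 0:
        return max(n_total + n_components, 1)  # remove |n| components
    return min(n_components, n_total)

ratio = np.array([0.70, 0.20, 0.07, 0.03])
keep_all = components_to_keep(ratio)
keep_two = components_to_keep(ratio, n_components=2)
drop_one = components_to_keep(ratio, n_components=-1)
by_variance = components_to_keep(ratio, n_components=2, percent_variance=95)
```

In the last call the value of `n_components` is ignored because a percentage is given.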
The following values and vectors are available a) in the log tab of the processing window, b) in JSON format in the “Abstract” field of the metadata of the output raster layer, where they can be used by subsequent transformations, and c) in the output dict if the tool has been called from the Python console or a script: Singular values (of SVD), Variance explained (Eigenvalues), Ratio of variance explained, Cumulated sum of variance explained, Eigenvectors (V of SVD), Loadings (eigenvectors scaled by sqrt(eigenvalues)), Band Mean.
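How the reported quantities relate to each other can be illustrated with a small sketch. The variable names and the random input data are assumptions; the division by n - 1 follows the convention of `explained_variance_` in scikit-learn, which the plugin is stated to match.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 200 pixels, 4 correlated bands
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

band_mean = X.mean(axis=0)
_, singular_values, Vt = np.linalg.svd(X - band_mean, full_matrices=False)

eigenvectors = Vt.T                                   # V of the SVD, one eigenvector per column
variance_explained = singular_values**2 / (X.shape[0] - 1)      # eigenvalues
variance_ratio = variance_explained / variance_explained.sum()  # ratio of variance explained
variance_cumsum = np.cumsum(variance_ratio)                     # cumulated sum
loadings = eigenvectors * np.sqrt(variance_explained)           # scaled eigenvectors
```

A useful check is that `loadings @ loadings.T` reproduces the covariance matrix of the bands.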
The plugin should give the same results as sklearn.decomposition.PCA from scikit-learn: ‘singular values’ corresponds to pca.singular_values_, ‘eigenvectors’ to pca.components_, and ‘variance explained’ to pca.explained_variance_.
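The effect of the Standard Scaler option described above can be illustrated with bands of very different value ranges. This is only a sketch with made-up data; the use of the sample standard deviation (`ddof=1`) is an assumption, and the plugin may use a different convention.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three hypothetical, uncorrelated bands with very different value ranges
X = rng.normal(size=(500, 3)) * np.array([1.0, 50.0, 2000.0])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)  # assumption: sample standard deviation

def explained_ratio(data):
    """Ratio of variance explained, from the singular values of centered data."""
    s = np.linalg.svd(data, compute_uv=False)
    return s**2 / np.sum(s**2)

ratio_unscaled = explained_ratio(X - mean)          # dominated by the third band
ratio_scaled = explained_ratio((X - mean) / std)    # bands contribute equally
```

Without scaling, the first component mostly reflects the band with the largest variance; after scaling, the variance is spread far more evenly across the components.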
Keep only n components¶
Transform from principal components¶
Changed in version 1.5: Added the parameter Std of original bands.
- class scipy_filters.algs.scipy_pca_helper_algorithms.SciPyTransformFromPCAlgorithm
Transform from principal components
Transform data from principal components (i.e. the PCA scores) back into the original feature space using a matrix of eigenvectors, by taking the dot product of the scores with the transpose of the matrix of eigenvectors and adding the original means to the result.
The eigenvectors can also be read from the metadata of the input layer, as long as they exist and are complete.
Eigenvectors: Matrix of eigenvectors (as string). Optional if the next parameter is set. The matrix can be taken from the output of the PCA algorithm of this plugin.
Mean of original bands: As the first step of PCA, the data of each band is centered by subtracting the means. These must be added back after rotating into the original feature space. Optional if the metadata of the input layer is complete. (Use the false means if they were used for the forward transformation.)
Std of original bands: Empty string (no scaling was used) or list of the standard deviations of each band of the original data that was used for PCA.
Output data type: Float32 or Float64.
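The back transformation described above can be sketched as follows. The function `transform_from_pc`, the pixel-by-band array layout and the test data are assumptions for illustration. Multiplying by the standard deviations before adding the means is assumed to be the inverse of the optional scaling.

```python
import numpy as np

def transform_from_pc(scores, eigenvectors, band_mean, band_std=None):
    """Rotate scores (n_pixels, k) back using eigenvectors (n_bands, k)."""
    X = scores @ eigenvectors.T   # rotate back into the original feature space
    if band_std is not None:
        X = X * band_std          # undo the scaling, if it was used
    return X + band_mean          # add the original band means

# Hypothetical forward transformation with scaling
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3)) * np.array([1.0, 10.0, 100.0]) + np.array([5.0, 50.0, 500.0])
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)
Z = (X - mean) / std
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
V = Vt.T
scores = Z @ V

restored = transform_from_pc(scores, V, mean, std)              # all components
approx = transform_from_pc(scores[:, :2], V[:, :2], mean, std)  # first two only
```

With all components the original data is recovered; with fewer components the result is an approximation of the same shape.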
Transform to principal components¶
Changed in version 1.5: Added the parameter Std of original bands.
- class scipy_filters.algs.scipy_pca_helper_algorithms.SciPyTransformToPCAlgorithm
Transform to principal components
Transform data into given principal components by taking the dot product of the data with a matrix of eigenvectors (the weights), after centering and, optionally, scaling the data.
The eigenvectors can also be read from the metadata of an existing PCA layer.
Eigenvectors: Matrix of eigenvectors (as string). Optional if the next parameter is set.
Read eigenvectors from PCA layer metadata: Reads the weights for the transformation from the metadata of a layer that was generated using the PCA algorithm of this plugin. Ignored if the parameter Eigenvectors is used.
Number of components is only used if the value is greater than 0 and smaller than the count of original bands.
False mean for each band: As the first step of PCA, the data of each band is centered by subtracting the means. If false means are provided, these are subtracted instead of the real means of the input layer. This allows another raster image to be transformed into the same space as the principal components of a different layer. The result is useful for comparing several rasters, but should not be considered to be proper principal components. Only used if “Use false mean” is checked.
Use false mean: See also False mean for each band. The false mean to be used can also be read from the metadata of a PCA layer.
Std of original bands: Empty string (no scaling) or list of the standard deviations of each band of the original data that was used for PCA. If provided, the data is scaled by dividing by the provided standard deviations.
Output data type: Float32 or Float64.
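The forward transformation, including the false mean option, can be sketched as below. The function `transform_to_pc` and the two test rasters are hypothetical; the handling of the number of components follows the rule stated above (only used if greater than 0 and smaller than the band count).

```python
import numpy as np

def transform_to_pc(X, eigenvectors, mean=None, band_std=None, n_components=0):
    """Project pixels X (n_pixels, n_bands) onto the given eigenvectors."""
    if mean is None:
        mean = X.mean(axis=0)      # real means of the input data
    Xc = X - mean                  # a false mean is subtracted if provided
    if band_std is not None:
        Xc = Xc / band_std         # optional scaling
    if 0 < n_components < eigenvectors.shape[1]:
        eigenvectors = eigenvectors[:, :n_components]
    return Xc @ eigenvectors

rng = np.random.default_rng(3)
A = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 3))  # reference raster
shift = np.array([1.0, -2.0, 0.5])
B = A + shift                                            # second raster, offset per band

mean_a = A.mean(axis=0)
_, _, Vt = np.linalg.svd(A - mean_a, full_matrices=False)
V = Vt.T

scores_a = transform_to_pc(A, V)
scores_b_false = transform_to_pc(B, V, mean=mean_a)  # false mean: means of A
scores_b_true = transform_to_pc(B, V)                # real means of B
```

With the false mean, the offset between the two rasters remains visible in the scores; with the real means of the second raster it is removed by the centering.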
Biplot¶
New in version 1.5.
- class scipy_filters.algs.scipy_pca_biplot.SciPyPCABiplot
Plot an interactive biplot of the principal components
The input raster must be the result of the PCA algorithm of this plugin, with at least the first two components (2 bands). The loadings are read from the metadata.
The scores (PC1 and PC2 of each pixel) are plotted as contours (there would be too many points for a scatter plot).
The loadings are plotted as vectors, with the length and direction indicating the contribution of the bands to the first and second principal component.
Use the zoom tool to explore.
Note
Requires Plotly.
Input PCA layer
Plot: The plot is saved as an HTML file and can be opened in a browser.
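A biplot of this kind could be assembled with Plotly roughly as follows. This is a sketch under assumptions, not the plugin's implementation: the function `make_biplot`, the choice of a `Histogram2dContour` trace for the score density, and the unscaled loading vectors are all illustrative. Plotly is imported optionally, so the function returns None when it is not installed.

```python
import numpy as np

try:
    import plotly.graph_objects as go
except ImportError:  # Plotly is optional
    go = None

def make_biplot(scores, loadings, band_names):
    """Return a Plotly figure, or None if Plotly is not installed."""
    if go is None:
        return None
    fig = go.Figure()
    # Density contours of PC1 vs. PC2 instead of one marker per pixel
    fig.add_trace(go.Histogram2dContour(
        x=scores[:, 0], y=scores[:, 1], colorscale="Blues", showscale=False))
    # One vector per band, from the origin to (loading on PC1, loading on PC2)
    for name, (lx, ly) in zip(band_names, loadings[:, :2]):
        fig.add_trace(go.Scatter(
            x=[0, lx], y=[0, ly], mode="lines+text",
            text=["", name], textposition="top center", name=name))
    fig.update_layout(xaxis_title="PC1", yaxis_title="PC2")
    return fig

# Hypothetical scores and loadings from an SVD-based PCA
rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T
loadings = Vt.T * (s / np.sqrt(len(X) - 1))  # eigenvectors scaled by sqrt(eigenvalues)

fig = make_biplot(scores, loadings, ["B1", "B2", "B3"])
```

If a figure is returned, `fig.write_html("biplot.html")` would save it as an HTML file that can be opened in a browser; the file name is only an example.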