Principal Component Analysis USJudgeRatings

Note: This article shows an example of how to conduct a Principal Component Analysis on the USJudgeRatings dataset. For a more in-depth explanation of Principal Component Analysis, please refer to the following article: Principal Component Analysis.

The Dataset

The dataset ships with base R (in the datasets package). It comprises lawyers' ratings of 43 state judges in the US Superior Court on 12 numeric variables.

CONT: Number of contacts of a lawyer with the judge
INTG: Judicial integrity, traits such as honesty, fairness, and incorruptibility
DMNR: Demeanor, the judge’s behavior and attitude in court (e.g. respectful, calm, professional)
DILG: Diligence, how hard-working and thorough the judge is in handling cases
CFMG: Case flow managing, ability to organize and manage cases efficiently to avoid delays
DECI: Prompt decisions, how quickly and decisively the judge makes rulings
PREP: Preparation for trial, how well the judge prepares for court proceedings
FAMI: Familiarity with law, the judge’s knowledge and understanding of legal principles
ORAL: Sound oral rulings, quality and clarity of the judge’s spoken decisions in court
WRIT: Sound written rulings, quality and clarity of the judge’s written decisions
PHYS: Physical ability, the judge’s stamina and capacity to handle the workload
RTEN: Worthy of retention, overall evaluation of whether the judge should remain in office

Examining the Dataset

Before conducting the analysis, we install and load the packages needed for visualisation (factoextra) and for the correlation plot (psych), and get an idea of what the dataset looks like.

install.packages(c("psych", "factoextra")) # only needed once
library(psych)      # for corPlot()
library(factoextra) # for visualisation

ratings <- USJudgeRatings
head(ratings)

Output

The first six rows of the dataset are the following:

> head(ratings)
               CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
AARONSON,L.H.   5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1  7.1  7.0  8.3  7.8
ALEXANDER,J.M.  6.8  8.9  8.8  8.5  7.8  8.1  8.0  8.0  7.8  7.9  8.5  8.7
ARMENTANO,A.J.  7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5  7.3  7.4  7.9  7.8
BERDON,R.I.     6.8  8.8  8.5  8.8  8.3  8.5  8.7  8.7  8.4  8.5  8.8  8.7
BRACKEN,J.J.    7.3  6.4  4.3  6.5  6.0  6.2  5.7  5.7  5.1  5.3  5.5  4.8
BURNS,E.B.      6.2  8.8  8.7  8.5  7.9  8.0  8.1  8.0  8.0  8.0  8.6  8.6

Each row contains the ratings for one judge. The variables are all numeric, which is necessary for conducting a PCA. If you are unsure, you can also check the structure of the dataset with str(ratings).
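The checks mentioned above can also be done programmatically. The following is a minimal sketch using only base R; the helper names all_numeric and n_missing are chosen here for illustration and are not part of the original analysis.

```r
ratings <- USJudgeRatings

# Dimensions: rows are judges, columns are rating variables
dim(ratings)

# TRUE if every column is numeric, as PCA requires
all_numeric <- all(sapply(ratings, is.numeric))
all_numeric

# Number of missing values; prcomp() cannot handle NAs in its default form
n_missing <- sum(is.na(ratings))
n_missing
```

If all_numeric is TRUE and n_missing is 0, the data frame can be passed to prcomp without further preparation.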

Conducting the Principal Component Analysis

The function used to conduct a Principal Component Analysis in R is prcomp. It centers the variables by default (center = TRUE), and it is important to also set the parameter scale. = TRUE so that the variables are scaled. Together, this standardises all variables to have a mean of 0 and a standard deviation of 1, ensuring that variables measured on different scales contribute equally to the analysis.

#perform pca
ratings_PCA<-prcomp(ratings, scale. = TRUE)
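To illustrate what scale. = TRUE does, the sketch below compares it with standardising the data by hand using scale() and then running prcomp without scaling. The object names pca_scaled, ratings_std and pca_manual are illustrative only.

```r
ratings <- USJudgeRatings

# PCA with centering (the default) and scaling to unit variance
pca_scaled <- prcomp(ratings, scale. = TRUE)

# Standardise by hand, then run PCA without further scaling
ratings_std <- scale(ratings)   # mean 0, sd 1 per column
pca_manual  <- prcomp(ratings_std)

# Both approaches should give the same component standard deviations
all.equal(pca_scaled$sdev, pca_manual$sdev)

# Column means (close to 0) and standard deviations (1) after standardising
round(colMeans(ratings_std), 10)
apply(ratings_std, 2, sd)
```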

Inspection of Analysis Results

The next step is the visualisation and inspection of the analysis results. To assess how much of the total variance in the dataset is captured by each principal component (PC), a scree plot is created using fviz_eig.

fviz_eig(ratings_PCA, addlabels = TRUE, barfill = "steelblue") +
  labs(title = "PCA Scree Plot: Variance Explained by Principal Components")

Output

Scree Plot showing the percentage of explained variance

The scree plot reveals that the first two principal components together explain over 90% of the total variance, suggesting that the dataset can be reasonably represented in two dimensions.
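The percentages shown in the scree plot can also be computed directly from the prcomp result. This is a small sketch in base R; eigenvalues and var_explained are names introduced here for illustration.

```r
ratings <- USJudgeRatings
ratings_PCA <- prcomp(ratings, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- ratings_PCA$sdev^2

# Share of the total variance per component, in percent
var_explained <- 100 * eigenvalues / sum(eigenvalues)
round(var_explained, 1)

# Cumulative share, to see how many components are needed
round(cumsum(var_explained), 1)
```

The same information is available in tabular form via summary(ratings_PCA).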

Contribution of each created Principal Component

To understand which original variables drive each PC, their individual contributions are plotted.

fviz_contrib(ratings_PCA, choice = "var", axes = 1)
fviz_contrib(ratings_PCA, choice = "var", axes = 2)

Output

Contribution of variables to Dimensions

As you can see in the plot on the left, for PC1, ORAL, WRIT, RTEN, PREP, DILG, CFMG and DECI contribute at comparable levels. For PC2, CONT is the dominant contributing variable, as you can see in the right plot.
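For readers who prefer numbers over bars, the contributions can be approximated from the rotation matrix. The sketch below assumes that a variable's contribution to a component is its squared loading expressed in percent, which should correspond to what fviz_contrib plots; the name contrib is illustrative.

```r
ratings <- USJudgeRatings
ratings_PCA <- prcomp(ratings, scale. = TRUE)

# Each column of the rotation matrix has unit length, so the squared
# loadings of one component sum to 1 and can be read as shares
contrib <- 100 * ratings_PCA$rotation^2

# Contributions (in percent) to PC1 and PC2, largest first
round(sort(contrib[, "PC1"], decreasing = TRUE), 1)
round(sort(contrib[, "PC2"], decreasing = TRUE), 1)

# Value every variable would have if all contributed equally
100 / ncol(ratings)
```

Variables above the equal-contribution value can be regarded as important for the respective component.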

Visualisation of PCA

Now we can create the Biplot to see how the different variables correlate with each other.

fviz_pca_biplot(
  ratings_PCA,
  repel = TRUE,
  label = "var", # this removes the judge name labels for better readability
  title = "PCA of Judge Rating Data"
)

Output

Biplot of judge ratings Principal Component Analysis

The points show where each judge's ratings fall in this new coordinate system. They appear without name labels because we removed them in the code.

The main takeaways from this plot are:

The arrows show how the original variables correlate with the two PCs, and in turn, with each other.
All arrows point in the same direction, except for CONT. This means that the professional qualities rated by lawyers are correlated with each other. As CONT stands for the number of times the lawyer and judge got into contact, it is an indicator of the lawyer's familiarity with the judge’s work.
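The link between the arrows and correlations can be checked numerically. In this sketch, the variable coordinates are taken to be the loadings multiplied by the component standard deviations, which for standardised data equal the correlations between the original variables and the component scores. The names var_coord and var_cor are illustrative, and the arrows drawn in the biplot may be rescaled for display, so only their directions and relative lengths should be compared.

```r
ratings <- USJudgeRatings
ratings_PCA <- prcomp(ratings, scale. = TRUE)

# Variable coordinates: loadings scaled by the component standard deviations
var_coord <- sweep(ratings_PCA$rotation, 2, ratings_PCA$sdev, FUN = "*")

# Correlations between the original variables and the component scores
var_cor <- cor(ratings, ratings_PCA$x)

# The two matrices should agree
all.equal(var_coord, var_cor)

# Coordinates on the first two components, as used for the arrows
round(var_coord[, 1:2], 2)
```

Note that the sign of a principal component is arbitrary, so the arrows may point left or right depending on the platform; what matters is which variables point the same way.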

Correlation Matrix

To examine the different correlation coefficients more closely, a correlation matrix can be created.

corPlot(ratings)

Output

Correlation Matrix of Judge Ratings Variables

Overall, the correlation coefficients between CONT and the other variables are very low. CONT is slightly negatively correlated with DMNR (the judge’s behaviour and attitude in court) and with INTG (judicial integrity), the two lowest arrows in the biplot. In other words, judges with more lawyer contacts tend to receive slightly lower ratings on these two variables. CONT is slightly positively correlated with CFMG (case flow managing) and DECI (prompt decisions): more frequent encounters between lawyer and judge may lead lawyers to perceive the judge as more efficient. Because the coefficients are low and there are only 43 observations, these interpretations and explanations have to be treated with caution.
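To read the CONT coefficients without the plot, and to get a feeling for the uncertainty that comes with 43 observations, something like the following could be used. It is a sketch in base R; cont_cor and ct are illustrative names, and the choice of DMNR for the test is just an example.

```r
ratings <- USJudgeRatings

# Correlations of CONT with every other variable, sorted from low to high
cont_cor <- cor(ratings)["CONT", -1]   # -1 drops CONT's correlation with itself
round(sort(cont_cor), 2)

# 95% confidence interval for the correlation between CONT and DMNR
ct <- cor.test(ratings$CONT, ratings$DMNR)
ct$conf.int
```

A confidence interval that includes 0 indicates that the data do not allow a firm conclusion about the direction of the relationship.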

The author of this entry is Antonia Ucher
