Description
I am taking a class named Sustainable Finance and Impact Investing, which covers ESG (environmental, social, and governance) risks and impacts. I need to write a 2,000-2,500 word essay. My proposed topic is ESG rating divergence and stock prices, and I would like to focus on the question: can ESG performance be properly reflected in corporate stock and bond prices? I will write from an investor's perspective; it does not need to be original research. Attached are my professor's requirements and some references that may be useful. Please ask any questions you have and tell me anything you need.
Unformatted Attachment Preview
MIT Sloan School of Management
MIT Sloan School Working Paper 5822-19
Aggregate Confusion:
The Divergence of ESG Ratings
Florian Berg, Julian F. Koelbel, and Roberto Rigobon
This work is licensed under a Creative Commons Attribution-NonCommercial License (US/v4.0)
http://creativecommons.org/licenses/by-nc/4.0/
August 15, 2019
Electronic copy available at: https://ssrn.com/abstract=3438533
Aggregate Confusion: The Divergence of ESG Ratings∗

Florian Berg (MIT Sloan), Julian F. Koelbel (University of Zurich and MIT Sloan), Roberto Rigobon (MIT Sloan)

August 15, 2019
Abstract

This paper investigates the divergence of environmental, social, and governance (ESG) ratings. First, it documents the disagreement between the ESG ratings of five prominent rating agencies. It then traces the disagreement to the most granular level of ESG categories available and decomposes the overall divergence into three sources: scope divergence, related to the selection of different sets of categories; measurement divergence, related to different assessments of the same ESG categories; and weight divergence, related to the relative importance of categories in the computation of the aggregate ESG score. We find that measurement divergence explains more than 50 percent of the overall divergence; scope and weight divergence together are slightly less important. In addition, we detect a rater effect: the rating agencies' assessments of individual categories seem to be influenced by their view of the analyzed company as a whole. The results allow investors, companies, and researchers to understand why ESG ratings differ.
∗We thank Jason Jay, Kathryn Kaminski, Eric Orts, Robert Eccles, Yannick Le Pen, Andrew King, and Timo Busch for detailed comments on earlier versions of this paper. We also thank the seminar participants at JOIM 2018 for their comments. Armaan Gori, Elizabeth Harkavy, Andrew Lu, and Francesca Macchiavello provided excellent research assistance. All remaining errors are ours. Correspondence to: Roberto Rigobon, MIT Sloan School of Management, MIT, 50 Memorial Drive, E62-520, Cambridge, MA 02142-1347, [email protected], tel: (617) 258 8374.
1 Introduction
Environmental, social, and governance rating providers[1] have become very influential institutions that inform a wide range of decisions in business and finance. In business, 80 percent of CEOs believe that demonstrating a commitment to society is important[2] and look to sustainability ratings for guidance and benchmarking. An estimated USD 30 trillion of assets are invested relying in some way on ESG ratings[3]. A large number of academic studies also rely on ESG ratings for their empirical analysis, arguing, for example, that good ESG ratings helped to prop up stock returns during the 2008 financial crisis (Lins et al., 2017).
However, ratings from different providers disagree dramatically (Chatterji et al., 2016). In our
data set of five different ESG raters, the correlations between their ratings are on average 0.61,
and range from 0.42 to 0.73. For comparison, credit ratings from Moody’s and Standard & Poor’s
are correlated at 0.99[4]. This means that the information that decision-makers receive from rating
agencies is relatively noisy. Three major consequences follow: First, ESG performance is unlikely to
be properly reflected in corporate stock and bond prices, as investors face a challenge when trying to
identify out-performers and laggards. Fama and French (2007) show that investor tastes can influence
asset prices, but only when a large enough fraction of the market holds and implements a uniform
non-financial preference. Therefore, even if a large fraction of investors have a preference for ESG
performance, the divergence of the ratings disperses the effect of these preferences on asset prices.
Second, the divergence frustrates the ambition of companies to improve their ESG performance,
because they receive mixed signals from rating agencies about which actions are expected and will
be valued by the market. Third, the divergence of ratings poses a challenge for empirical research
as using one rater versus another may alter a study’s results and conclusions. Taken together, the
ambiguity around ESG ratings is an impediment to prudent decision-making that would contribute
to an environmentally sustainable and socially just economy.
This paper investigates why sustainability ratings diverge. In the absence of a reliable measure of
“true ESG performance,” the next best thing is to understand what drives the differences of existing
ESG ratings. In principle, there are two reasons why ratings diverge. They might diverge because
rating agencies adopt different definitions of ESG performance, or they can differ because these
agencies adopt different approaches to measuring ESG performance. Currently, it is unclear how
much each of those two explain the observed dispersion in ratings. Our goal is to disentangle these
sources of divergence by comparing ratings at the disaggregate level. To do so, we specify the ratings
as consisting of three basic elements: (1) a scope of attributes, which denotes all the elements that
together constitute the overall concept of ESG performance; (2) indicators that represent numerical
measures of the attributes; and (3) an aggregation rule that combines the set of indicators into
a single rating. Divergence between ratings can arise from each of these three elements, whereas
differences regarding scope and aggregation rule represent different views about the definition of
ESG performance, and differences regarding indicators represent disagreement about appropriate
ways of measuring.
[1] ESG ratings are also called sustainability ratings or corporate social responsibility ratings. We use the terms ESG ratings and sustainability ratings interchangeably.
[2] https://www.accenture.com/hk-en/insight-un-global-compact-ceo-study
[3] GSIA 2018
[4] Since credit ratings are expressed on an ordinal scale, researchers usually do not report correlations. However, for the sake of illustration we used the data from Jewell and Livingston (1998) and calculated a Pearson correlation by replacing the categories with integers.
We identify three distinct sources of divergence. Scope divergence refers to the situation where
different sets of attributes are used as a basis to form different ratings. For instance, attributes such
as greenhouse gas emissions, employee turnover, human rights, and lobbying, etc., may be included
in the scope of a rating. One rating agency may include lobbying, while another might not, leading
to differences in the final aggregate rating. Weight divergence refers to the situation where rating
agencies take different views on the relative importance of attributes and whether performance in
one attribute compensates for another. For example, the human rights indicator may enter the final
rating with greater weight than the lobbying indicator. Indeed, the scope and weight divergence
could also be subsumed under Aggregation divergence, since excluding an attribute from a rating’s
scope is equivalent to including it with a weight of zero. Finally, Measurement divergence refers to the
situation where rating agencies measure the same attribute using different indicators. For example,
a firm’s labor practices could be evaluated on the basis of workforce turnover, or by the number of
labor cases against the firm. Both capture aspects of the attribute labor practices, but they are likely
to lead to different assessments. Indicators can focus on processes, such as the existence of a code of
conduct, or outcomes, such as the frequency of incidents. The data can come from various sources
such as company reports, public data sources, surveys, or media reports, for example. We assume
that the rating agencies are trying to measure the same attributes, but use different indicators. The
final aggregate rating contains all three sources of divergence intertwined into one number. Our goal
is to estimate the extent to which each of the three sources drives the overall divergence.
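As a toy illustration of the three sources, a rating can be modeled as a weighted average of category scores; two raters can then diverge through scope, measurement, or weight alone. All category names and numbers below are hypothetical, not drawn from the paper's data:

```python
# Toy illustration (hypothetical numbers): two raters score one firm.
# A rating is a weighted average of category scores, so divergence can
# come from scope (which categories), measurement (what score each
# category receives), or weight (how much each category counts).

def rating(scores, weights):
    """Weighted average of category scores."""
    total = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total

# Rater A covers three categories; Rater B omits lobbying (scope
# divergence), scores emissions differently (measurement divergence),
# and weights human rights more heavily (weight divergence).
scores_a  = {"emissions": 60, "human_rights": 80, "lobbying": 40}
scores_b  = {"emissions": 70, "human_rights": 80}
weights_a = {"emissions": 1.0, "human_rights": 1.0, "lobbying": 1.0}
weights_b = {"emissions": 1.0, "human_rights": 2.0}

print(round(rating(scores_a, weights_a), 1))  # Rater A's aggregate: 60.0
print(round(rating(scores_b, weights_b), 1))  # Rater B's aggregate: 76.7
```

Note that excluding lobbying is equivalent to giving it a weight of zero, which is why the paper can subsume scope under aggregation.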
Methodologically, we approach the problem in three steps. First, we categorize all indicators
provided by different data providers into a common taxonomy of 64 categories. This categorization
is a critical step in our methodology, as it allows us to observe the scope of categories covered by
each rating as well as to contrast measurements by different raters within the same category. The
taxonomy is an approximation, because most raters do not share their raw data, making a matching
between identical indicators impossible. However, restricting the analysis to identical indicators would
imply that the entire divergence is due to scope, i.e., that there is zero common ground between ESG
raters, which does not reflect the real situation. Thus, we use a taxonomy that matches indicators by
attribute. We created the taxonomy starting from the population of 641 indicators and establishing a
category whenever at least two indicators from different rating agencies pertain to the same attribute.
Indicators that do not pertain to a shared attribute remain uncategorized. As such, the taxonomy
approximates the population of common attributes at as granular a level as possible across all raters. We
calculate category scores for each rating by taking simple averages of the indicators that belong to
the same category. Second, we estimate the original ratings to obtain comparable aggregation rules.
Using the category scores established by the taxonomy, we estimate weights of each category in a
simple non-negative linear regression[5]. The results are modeled versions of the real ratings, which are
comparable in terms of scope, measurement, and weight in the aggregation rule. Third, we calculate
the contribution of divergence in scope, measurement, and weight to the overall ratings divergence
using two different decomposition methods.
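The second step, fitting non-negative category weights, can be sketched as follows. This is a minimal stand-in on synthetic data, not the authors' code; it implements non-negative least squares directly via coordinate descent rather than calling a library routine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 200 firms, 5 category scores,
# and an aggregate rating generated from non-negative "true" weights.
X = rng.normal(size=(200, 5))
true_w = np.array([0.5, 0.0, 1.2, 0.3, 0.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

def nnls(X, y, iters=500):
    """Non-negative least squares via cyclic coordinate descent:
    each weight is set to its unconstrained optimum, clipped at zero."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            resid_j = y - X @ w + X[:, j] * w[j]  # residual with feature j removed
            w[j] = max(0.0, X[:, j] @ resid_j / col_sq[j])
    return w

w_hat = nnls(X, y)
# In-sample fit, analogous to the accuracy the paper reports.
r2 = 1 - ((y - X @ w_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

With low noise the recovered weights sit close to the true ones and the R² is near 1, mirroring the paper's finding that a simple linear weighted average approximates the real aggregation rules well.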
Our study yields three results. First, we show that it is possible to estimate the implied aggregation rule used by the rating agencies with an accuracy above 90 percent on the basis of a
common taxonomy. This demonstrates that although rating agencies take very different approaches,
it is possible to approximate their aggregation rule with a simple linear weighted average. We also
estimated the ratings using different methodologies, e.g., neural networks and random forests. The results are virtually identical; out of sample, the non-negative linear regression performed
the best. Second, we find that 53 percent of the difference of the ratings stems from measurement
[5] Non-negative least squares constrains the coefficients to take either zero or positive values.
divergence, while scope divergence explains 44 percent, and weight divergence another 3 percent. In
other words, 53 percent of the discrepancy comes from the fact that the rating agencies are measuring
the same categories differently, and 47 percent of the discrepancy stems from aggregating common
data using different rules. This means that for users of this data – financial institutions for instance
– a sizable proportion of the discrepancy could be resolved by sharing the data on the indicator level
and having a common procedure for aggregation. On the other hand, these results also suggest that
different sustainability ratings cannot be made congruent simply by taking into account scope and
weight differences. Therefore, standardization of the measurement procedures is required. Third, we find that a significant portion of the measurement divergence is rater-specific and not category-specific, suggesting the presence of a rater effect[6]. In other words, a firm that performs well in one category for one rater is more likely to perform well in all the other categories for that same rater. Conversely, if the same firm is evaluated poorly in one category by another rater, it is more likely to be evaluated poorly in all the other categories as well.
Our methodology relies on two main assumptions, and we evaluate the robustness of each. First, the individual indicators are assigned to categories using our individual judgment; we needed to make several judgment calls to determine to which categories each indicator belongs. To evaluate robustness, we sorted the indicators according to the Sustainability Accounting Standards Board taxonomy; the results are virtually identical. Second, the linear rule is not contingent on the industry or sector where the firm operates. Many rating agencies openly state that their aggregation rules differ across industries, i.e., that each industry has its own set of key issues. However, we impose the exact same aggregation procedure on all firms and all sectors. We need these two approximations to be able to compare procedures from different rating agencies. The assumptions, however, seem to be relatively innocuous in our empirical strategy: based on our taxonomy and simple linear rules, we obtain surprisingly good approximations of the final ratings[7].
Our paper extends a stream of research that has documented the divergence of ESG ratings (Chatterji et al., 2016, 2009; Semenova and Hassel, 2015; Dorfleitner et al., 2015; Delmas and Blass, 2010). Its key contribution is to explore the disaggregate data behind ESG ratings and to explain in detail the sources of divergence. Our study is related to research on credit rating agencies, in particular to work on the question of why credit ratings differ (Bongaerts et al., 2012; Güntay and Hackbarth, 2010; Jewell and Livingston, 1998; Cantor and Packer, 1997). Similar to Griffin and Tang (2011), we estimate the underlying rating methodologies to understand the differences in ratings. Additionally, our study is related to the literature concerned with changing investor expectations, namely the integration of ESG performance in investment portfolios. Several studies show that there is a real and growing expectation from investors that companies perform well in terms of ESG
[6] The rater effect or rater bias has been extensively studied in sociology, management, and psychology, especially in performance evaluation. Shrout and Fleiss (1979) evaluate different correlation measures to assess rater effects; theirs is one of the most cited papers in psychology in this area. In performance evaluation, see Mount et al. (1997), who study how peers, subordinates, and bosses of different ethnicities and positions within the organization rate each other, and how the ratings are affected by these categories. In finance and economics, many papers study biases in credit rating agencies: see Griffin and Tang (2011) and Griffin et al. (2013) on rater bias, and Fong et al. (2014), who study how changes in the competition among analysts affect the biases of credit rating agencies and find that less competition tends to produce an optimistic bias. In sum, both in psychology and in finance there is a long history of ratings exhibiting biases, many of them rating-agency wide. Finally, Didier et al. (2012) discuss the rater effect within the mutual fund industry with a focus on international diversification.
[7] These errors are very small relative to the discrepancy observed. We explain more than 90 percent of the observed variation, while the discrepancy is an order of magnitude larger.
performance (Amel-Zadeh and Serafeim, 2018; Gibson and Krueger, 2018), especially with regard to
risks associated with climate change (Krueger et al., 2018). ESG ratings are the operationalization of
investor expectations regarding ESG, thus understanding ESG ratings improves the understanding
of these changing investor expectations.
The paper is organized as follows: Section 2 describes the data sources; Section 3 documents the divergence in the sustainability ratings from different rating agencies; Section 4 explains the way in which we structure the data and describes the data at the disaggregate level; Section 5 decomposes the overall divergence into the contributions of Scope, Measurement, and Weight, and also documents the rater effect. Finally, we conclude in Section 6.
2 Data
ESG ratings first emerged in the 1980s as a service for investors to screen companies not purely on
financial characteristics, but also on characteristics relating to social and environmental performance.
The earliest ESG rating agency, Vigeo-Eiris, was established in 1983 in France, and five years later Kinder, Lydenberg & Domini (KLD) was established in the US (Eccles and Stroehle, 2018). While
initially catering to a highly-specialized investor clientele, such as faith-based organizations, the
market for ESG ratings has widened dramatically, especially in the past decade. Estimates are that
30 trillion USD are invested in ways that rely on some form of ESG information (GSIA, 2018), a
figure that has grown by 34 percent since 2016. As interest in sustainable investing grew, many early
providers were acquired by established financial data providers, e.g. MSCI bought KLD in 2010,
Morningstar bought Sustainalytics in 2010, ISS bought Oekom in 2018 (Eccles and Stroehle, 2018),
and Moody’s bought Vigeo-Eiris in 2019.
ESG rating agencies offer investors a way to screen companies for ESG performance, much as credit ratings allow investors to screen companies for creditworthiness. Yet there are two
important differences. First, while creditworthiness is relatively clearly defined as the probability of
default, ESG performance is a concept that is still evolving. Thus, an important part of the service
that ESG rating agencies offer is an interpretation of what ESG performance means. Second, while
financial reporting standards have matured and converged over the past century, ESG reporting
is in its infancy. While most major companies provide some form of ESG reporting, there are
competing reporting standards and almost none of the reporting is mandatory. Thus, ESG ratings
provide a service to investors by collecting and aggregating information across a spectrum of sources
and reporting standards. As a result, ESG ratings agencies have considerable discretion in how to
produce ESG ratings and may give different ratings to the same company.
We use the data of five different ESG rating providers: KLD[8], Sustainalytics, Vigeo-Eiris, Asset4, and RobecoSAM[9]. We approached each provider and requested access not only to the ratings, but also
the underlying indicators, as well as documentation about the aggregation rules and measurement
[8] KLD, formerly known as Kinder, Lydenberg, Domini & Co., was acquired by RiskMetrics in 2009. MSCI bought RiskMetrics in 2010, and the dataset was subsequently renamed MSCI Stats as a legacy database. We keep the original name of the dataset to distinguish it from the MSCI dataset.
[9] Other data providers have been approached, and our goal is to continue evaluating the sources of discrepancy among the most prominent rating agencies. RepRisk and MSCI provided us with data, which we are still processing. We also requested the data from Oekom/ISS and TrueValueLabs; however, at the moment of writing this paper, we have not been granted access to their data.
protocols of the indicators. Together, these providers represent most of the major players in the ESG
rating space as reviewed in Eccles and Stroehle (2018). We requested that the data set be as granular
as possible.
The KLD dataset was the only one that did not contain an aggregate rating, even though it
is frequently used in academic studies in aggregate form. The KLD data set provided only binary
indicators for either “strengths” or “weaknesses” in seven dimensions. We created an aggregate rating for KLD by following the procedure chosen in most academic studies, namely summing all strengths and subtracting all weaknesses[10].
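The aggregation just described can be sketched in a few lines; the indicator names below are hypothetical stand-ins, not KLD's actual seven dimensions:

```python
# Hypothetical binary KLD-style indicators for one firm: the aggregate
# score is simply (number of strengths) minus (number of weaknesses).
firm = {
    "strengths":  {"clean_energy": 1, "community_giving": 1, "board_diversity": 0},
    "weaknesses": {"emissions_controversy": 1, "labor_dispute": 0},
}

def kld_aggregate(firm):
    """Sum all strength flags and subtract all weakness flags."""
    return sum(firm["strengths"].values()) - sum(firm["weaknesses"].values())

print(kld_aggregate(firm))  # 2 strengths - 1 weakness = 1
```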
Table 1 provides basic descriptive statistics of the data sets obtained from the different rating providers. The number of firms covered in 2014[11], the baseline year for our analysis, ranges from 1671 to 4566. The balanced sample shown in Table 1 contains 823 firms. The mean ESG scores are higher in the balanced sample for all providers, indicating that the balanced sample tends to drop lower-performing companies.
Table 1. Descriptive Statistics

Descriptive statistics of the full sample in 2014:

                    Sustainalytics  RobecoSAM  Vigeo-Eiris    KLD  Asset4
Observations                  4551       1668         2319   4295    4025
Mean                         56.38      47.17        32.19   1.11   50.87
Standard Deviation            9.44      21.05        11.78   1.72   30.95
Minimum                         29         13            5     -6    2.78
Median                          55         40           31      1   53.13
Maximum                         89         94           67      9   97.11

Descriptive statistics of the common sample in 2014:

                    Sustainalytics  RobecoSAM  Vigeo-Eiris    KLD  Asset4
Observations                   823        823          823    823     823
Mean                         61.36      49.61        33.91   2.44   72.12
Standard Deviation            9.52      20.91        11.46   2.28   24.12
Minimum                         36         13            6     -4    3.26
Median                          61         46           33      2   80.47
Maximum                         89         94           67      9   97.11

The descriptive statistics of the aggregate rating (ESG) in 2014 using the unbalanced and common sample for the five rating agencies KLD, Sustainalytics, Vigeo-Eiris, RobecoSAM, and Asset4.
Throughout the paper, we refer to three versions of this data set. The first two are the full and
the common sample as shown in Table 1. The third version is the normalized common sample, where
all variables are normalized to have zero mean and unit variance.
3 Measurement of Divergence
To motivate our analysis, we illustrate the extent of divergence between the different rating agencies. The first step is to compute the correlations of the ratings between different rating agencies at different levels of aggregation, in particular at the ESG level as well as for the environmental, social, and governance dimensions. Second, we evaluate heterogeneity at the firm level. Simple correlations, although easy to understand, can mask important heterogeneity in the data: it is possible that low correlations are due to large disagreements in a small subset of the firms. To explore this possibility, we compute the average absolute distance to the median rating for each firm. Third, we explore the rankings of the firms. We determine the proportion of firms belonging to the top quantile and the proportion belonging to the bottom quantile, and then proceed with a more thorough analysis for different quantiles, using a simple statistic we call the Quantile Ranking Count. The conclusion of these approaches is the same: there is a high level of disagreement across rating agencies, and the disagreement is quite heterogeneous.

[10] See e.g. Lins et al. (2017).
[11] Although we have data for other years, most of our analysis is cross-sectional, and we therefore concentrate on the year with the greatest common sample.
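The Quantile Ranking Count statistic is not defined in this excerpt, so the sketch below shows one plausible construction of quantile-based agreement: for a cutoff q, the share of firms that all raters jointly place in the top group, out of those that any rater places there. All scores are hypothetical:

```python
def top_set(ratings, q):
    """Indices of the top q-fraction of firms under one rater's scores."""
    n_top = max(1, int(len(ratings) * q))
    order = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
    return set(order[:n_top])

def quantile_agreement(raters, q):
    """Intersection-over-union of the raters' top-quantile sets.
    1.0 means all raters pick exactly the same top firms."""
    tops = [top_set(r, q) for r in raters]
    union = set.union(*tops)
    inter = set.intersection(*tops)
    return len(inter) / len(union)

# Hypothetical scores for 8 firms from two raters:
rater1 = [90, 85, 80, 70, 60, 50, 40, 30]
rater2 = [88, 60, 82, 75, 65, 55, 45, 35]
print(quantile_agreement([rater1, rater2], q=0.25))
```

Here the two raters each pick two "top" firms but agree on only one of them, so the statistic is 1/3; sweeping q from small to large traces out how agreement varies across quantiles.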
3.1 Correlations of Aggregate Ratings
In this section we describe the correlations between the ESG ratings from different rating agencies.
Table 2 shows the Pearson correlations between the aggregate ESG ratings, as well as the ratings
in the separate environmental, social, and governance dimensions. Correlations of the ESG ratings
are on average 0.61, and range from 0.42 to 0.73. The correlations of the environmental ratings
are slightly higher than the overall correlations with an average of 0.65. The social and governance
ratings have the lowest correlations with an average of 0.49 and 0.38, respectively. These results are
consistent with Semenova and Hassel (2015), Chatterji et al. (2016), Dorfleitner et al. (2015), and
Bouten et al. (2017).
KLD clearly exhibits the lowest correlations with all other raters, both for the ESG rating and for
the individual dimensions. RobecoSAM and Vigeo-Eiris have the highest level of agreement between
each other, with a correlation of 0.73.
Table 2. Correlation at aggregate ESG level and at E, S, and G level.

           ESG     E     S      G  Econ
SA – VI   0.73  0.70  0.61   0.55     –
SA – KL   0.53  0.61  0.28   0.08     –
SA – RS   0.68  0.66  0.55   0.53     –
SA – A4   0.67  0.65  0.58   0.51     –
VI – KL   0.48  0.55  0.33   0.04     –
VI – RS   0.71  0.74  0.70   0.78     –
VI – A4   0.71  0.66  0.68   0.77     –
KL – RS   0.49  0.58  0.24   0.24     –
KL – A4   0.42  0.55  0.24  -0.01     –
RS – A4   0.64  0.70  0.66   0.81  0.43

Correlations between the ratings on the aggregate level (E, S, G, and ESG) from the five different rating agencies are calculated using the common sample. The results are similar using pairwise common samples based on the full sample. SA, RS, VI, A4, and KL are short for Sustainalytics, RobecoSAM, Vigeo-Eiris, Asset4, and KLD, respectively.
The disagreement between ESG ratings is far larger than between credit ratings. Credit rating agencies use different data sources and procedures to evaluate the ability to pay, as well as the willingness to pay, of firms, governments, and individuals, and these procedures and data sources are not free of judgment. Nevertheless, we find a correlation of 98.6 percent between credit ratings from Moody's and Standard & Poor's. Since credit ratings are expressed on an ordinal scale, researchers usually do not report correlations; for the sake of illustration, we used the data from Jewell and Livingston (1998) and calculated a Pearson correlation by replacing the categories with integers. The degree of disagreement between ESG ratings from different providers is thus far more pronounced: while credit rating agencies occasionally differ in their assessments by one category upwards or downwards, ESG ratings disagree significantly more.
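The category-to-integer comparison can be reproduced mechanically. The ratings below are made-up examples, not the Jewell and Livingston (1998) data, and the correlation they yield is only illustrative:

```python
# Map ordinal rating categories to integers, then compute a Pearson
# correlation on the integer codes (illustrative ratings only).
SCALE = {"AAA": 1, "AA": 2, "A": 3, "BBB": 4, "BB": 5, "B": 6, "CCC": 7}

moodys = ["AAA", "AA", "A", "BBB", "A",   "BB", "B"]
sandp  = ["AAA", "AA", "A", "A",   "BBB", "BB", "B"]

x = [SCALE[r] for r in moodys]
y = [SCALE[r] for r in sandp]

def pearson(x, y):
    """Pearson correlation from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

print(round(pearson(x, y), 3))  # → 0.944: two scales that differ by at most one notch
```

Even with two firms rated one notch apart, the correlation stays above 0.9, which is what makes the much lower ESG correlations so striking.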
3.2 Heterogeneity in the Disagreement
The problem with correlations is that they are comparisons at the rating-agency level and tend to obscure firm-level differences. For example, two rating agencies can be weakly correlated because there is disagreement for every firm in the sample, or because there is agreement for a large set of firms and extremely large disagreement for a small set of firms. To evaluate this possibility, we use the normalized common sample and compute the average absolute distance to the median rating for each firm. The normalized data indicate where the firm is located in the distribution of a particular rating agency: even if the nominal ratings differ, the placements in the distribution might be similar. This provides a firm-specific measure of disagreement[12]. To present the data, we concentrate on the extremes of the distribution of the median average distance: the 100 firms with the highest agreement, and the 100 firms with the highest disagreement.
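The firm-level disagreement measure described above can be sketched as follows; random numbers stand in for the actual ratings, and the 6x5 shape is only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ratings: 6 firms (rows) x 5 raters (columns), each rater
# on its own arbitrary scale.
ratings = rng.normal(loc=50, scale=10, size=(6, 5))

# Normalize each rater's column to zero mean and unit variance, so a
# firm's score reflects its position in that rater's distribution.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)

# Firm-specific disagreement: average absolute distance to the firm's
# median normalized rating across the five raters.
med = np.median(z, axis=1, keepdims=True)
disagreement = np.abs(z - med).mean(axis=1)
```

Sorting firms by `disagreement` and taking the two tails reproduces the "highest agreement" and "highest disagreement" subsets the figures are built on.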
[Figure 1 plot: each of the 100 firm names (from Dexus Property Group at the top to Genuine Parts Company at the bottom) is plotted against its normalized ratings from the five agencies (Sustainalytics, RobecoSAM, Asset4, MSCI/KLD, Vigeo), on an x-axis running from -2 to 2.]
Figure 1. Comparison of firms’ normalized scores for different rating agencies.
100 firms with the lowest median average distance within the normalized common sample (n=823). Firms are sorted by their median rating. Each rating
agency is plotted in a different color. The vertical strings of blue dots are due to the fact that the KLD rating has only 14 unique values.
In Figure 1 we present a subset containing the 100 firms with the lowest average distance to the median, i.e., where the agreement between raters is greatest. To simplify the visualization, we rank the firms by their median rating, placing the best-rated firm at the top and the worst-rated firm at the bottom. The y-axis displays the firm's name, and the x-axis the normalized rating, reflecting how positively or negatively firms are rated among all five rating agencies. Each rating agency is depicted in a different colour[13].
The figure shows that among these 100 firms agreement is not perfect, but generally all five rating agencies share a common view. Companies such as Cisco, Nokia, and Colgate-Palmolive have high

[12] The average distance to the median across the 823 firms is 0.41, with the first quartile at 0.30 and the third quartile at 0.51.
[13] The aggregate KLD rating has 14 unique values. These are the blue dots that appear to be aligned on top of each other.
median ratings, and all five rating agencies tend to agree. Firms such as Roper Industries, Intuitive
Surgical, and China Resources Land, Ltd. have low median ratings, and all rating agencies agree
with such an assessment. The average pairwise correlation of the ratings for these 100 firms is 0.90.
[Second firm-name list, presumably the companion plot of the 100 firms with the highest disagreement, in the same format as Figure 1 (from L'Oreal SA through Dr. Reddy's Laboratories Ltd.). The attachment preview is truncated at this point.]