Promoting Energy-Efficient Behavior by Depicting Social Norms in a Recommender Interface

How can recommender interfaces help users to adopt new behaviors? In the behavioral change literature, social norms and other nudges are studied to understand how people can be convinced to take action (e.g., towel re-use is boosted when stating that “75% of hotel guests” do so), but most of these nudges are not personalized. In contrast, recommender systems know what to recommend in a personalized way, but not much human-computer interaction (HCI) research has considered how personalized advice should be presented to help users to change their current habits. We examine the value of depicting normative messages (e.g., “75% of users do X”), based on actual user data, in a personalized energy recommender interface called “Saving Aid.” In a study among 207 smart thermostat owners, we compared three different normative explanations (“Global,” “Similar,” and “Experienced” norm rates) to a non-social baseline (“kWh savings”). Although none of the norms increased the total number of chosen measures directly, we show that depicting high peer adoption rates alongside energy-saving measures increased the likelihood that they would be chosen from a list of recommendations. In addition, we show that depicting social norms positively affects a user’s evaluation of a recommender interface.


INTRODUCTION
Recommender interfaces seek to present content that fits user preferences [29]. In doing so, they can explain why certain items are presented [10,58], for example, by highlighting that other users have also bought a certain product. While recommenders in leisure domains (e.g., movies) are optimized to promote any item, some recommenders wish to promote specific items that support behavioral change [20,54], for example, in domains such as healthy eating and energy conservation [23,56,59]. Recommending something specific is less likely to be successful, and social explanations of recommendations are therefore often used to "nudge" users (cf. [49,57]), triggering social comparison mechanisms that might help to convince users [21,44]. For example, highlighting that 65% of other users have bought a healthy product in an online supermarket might persuade a user to do so as well.
Studies in psychology have analyzed how social norms can effectively promote specific, one-size-fits-all environmental behaviors (e.g., [4,13,44]). A good example is the work of Goldstein et al. [25], who persuaded hotel guests to re-use their towels by highlighting that "75% of other guests have done so," instead of emphasizing the environmental benefits of doing so. Such descriptive social norms have yet to be tested for a larger set of energy-saving measures. In fact, digital nudges are rarely used in personalized interactive systems [30], or in recommender systems that support behavioral change [20,53,56]. Trying to convince users of energy-saving measures through social comparisons in energy recommender systems is challenging though, because energy-saving measures that yield high kWh savings are quite "unpopular" [56]. For example, solar PV has been installed in only 13% of Dutch households [19], and "13% of users have solar PV installed" is not very convincing when presented as a normative message. For such messages to work, one needs at least a majority percentage to convince others. Our aim is to analyze whether we can use social comparisons to create a majority norm that can promote "unpopular but useful" energy-saving measures [25,44].
A nudging message that uses a majority norm can be created even for unpopular energy-saving measures, by highlighting the behavior of a specific group of peer users. For example, the adoption rate of solar PV among more experienced users is much higher than the average rate of 13% [19,56], and possibly exceeds 50% among users with a strong energy-saving attitude [56]. This would allow for a convincing, yet truthful majority norm message: "55% of experienced users (like you) have solar PV installed." Adoption rates for different kinds of users can be obtained by using the psychometric Rasch model [33], which has been used in work on energy recommender systems [54,56]. Rasch differentiates between users in terms of their attitudinal strength and between energy-saving measures in terms of their frequency of use, so that both "users like you" and "energy-saving measures similar to this one" have actual meaning. That is, we use the Rasch model to deliver personalized recommendations that use majority norm nudges to convince users to take more energy-saving measures. In addition, depicting high norm scores might persuade users to select specific measures, including relatively unpopular ones (i.e., low frequency of use), which tend to be energy-efficient (i.e., high kWh savings), as well as those that are perceived as effortful (cf. [46,53]).

Objectives
This is the point of departure for this article. We blend social norms and recommender systems to help users attain their energy-saving goals, designing social explanations to signal a majority norm in a personalized advice context. We present an energy recommender interface named "Saving Aid," which generates a list of household energy-saving measures that is tailored toward a user's energy-saving attitude through the psychometric Rasch model. In a between-subject web study, we then use the Rasch model to craft and depict specific normative messages alongside energy-saving measures that highlight either the adoption rate of all users (Global norms: "60% of users do this"), or that of peer users with specific attitudinal strengths (Similar norms: "60% of users similar to you do this"; Experienced norms: "60% of users who perform more measures than you do this").
We posit the following research questions. We examine changes in choice behavior due to the depiction of social norms, as well as explore whether other commonly used energy-saving attributes play a role (e.g., kWh savings, perceived effort). We differentiate between "overall" changes in choice behavior (i.e., total number of chosen energy-saving measures, kWh savings, and the difficulty of chosen measures), changes in which measure is chosen from a recommendation list due to presented content (i.e., whether users choose different energy-saving measures due to presented norm scores, while controlling for other measure attributes, such as perceived effort), and changes in how users evaluate a recommender interface (e.g., changes in user satisfaction):
• RQ1: Do social norms increase the number of chosen energy-saving measures or kWhs saved, and does this differ across different norms and different energy-saving attitudes?
• RQ2: Do social norms and other measure attributes affect which energy-saving measures are chosen from within a recommendation list?
• RQ3: To what extent do social norms affect a user's evaluation of an energy recommender interface?

LITERATURE REVIEW
This review focuses on work in environmental psychology and nudging that involves descriptive social norms. We discuss the mechanisms of descriptive norms in psychological literature, contextualize them in the human-computer interaction (HCI) domain, and formulate expectations for our web study. In doing so, we explain how the psychometric Rasch model is used to personalize energy-saving advice, as well as to craft effective social norms for our user study.

Nudges in a Personalized Context
Changes in a decision environment (i.e., "choice architecture") that lead to predictable behavior are "nudges" [57]. Notable examples include highlighting a default choice or using normative messages (e.g., "most users do X") [25,31]. The use of nudges and persuasive messages is rather uncommon in personalized interactive systems. For example, while recommender systems typically provide decision support by optimizing what to recommend [29,40], nudges focus on how such content should be presented. This way, nudges can shift user preferences, which is also illustrated by studies on explanations in recommender interfaces [12,58]. For example, if a recommender explains that a user's peers have chosen specific items, this might steer a user's preferences toward these items, even if they have a worse fit according to the recommender system [12].

Descriptive Norms in Energy Conservation
To date, recommender systems and most HCI studies have examined conservation decisions [36,54,59], but only in a social vacuum [1,44]. While a few studies have applied social eco-feedback [24,45], in which users are compared to their peers (e.g., your neighbors consume 3,000 kWh annually [1,4]), its effects on a user's behavior are often limited [6]. The majority of HCI studies have yet to adopt the theoretical and empirical evidence from environmental psychology that explaining behaviors in terms of relevant peer groups and descriptive norms can affect one's energy-saving behavior and decision-making [25,27,44,51].
A convincing message that affects preferences is one that highlights a majority norm [14]. Showing that a rather large proportion of peers performs a certain behavior [13,27,38] can trigger or promote socially desirable behavior [12,63]. Two mechanisms underlie this effect: compliance and conformity [14]. Compliance refers to responding to a direct request to act consistently with presented norms, while conformity describes how a behavior is adapted to match that of an apparent majority [14]. Both compliance and conformity can fulfill one's need for accuracy or appropriateness in behavior or decision-making, for they can alleviate uncertainty surrounding a certain behavior [13]. For instance, individuals may want to gain the approval of others when it comes to pro-social behaviors, such as engaging in recycling if many others do so too [47].
For the design of the current study, we highlight work from Goldstein et al. [25] on the use of social norms to promote environmental behavior. They show that hotel guests are more inclined to re-use their towels when asked to do so using descriptive norms ("join your fellow guests in helping to save the environment"), compared to a general environmental message ("help to save the environment"). Such normative messages highlight a community aspect ("75% of guests participated"), and are more convincing if they include context-rich or "local" aspects [13,25]. For instance, they show that referring to "75% of hotel guests," rather than "75% of citizens," is more effective, for it highlights an uncommon characteristic shared with the decision-maker [22,25,28].
Instead of only boosting a specific behavior, descriptive norms can also be used to promote a wider range of sustainable behaviors [41]. For example, customers of web shops purchase more healthy and energy-saving products if the products are explained using social norms instead of their environmental impact [3,15]. We expect that this also applies to personalized advice in a recommender interface, when depicting normative messages alongside energy-saving measures.

Rasch Model
There is arguably a large range of norm percentages (probably anything below 50%) that will not trigger conformity [12-14]. Although it is hard to promote "unpopular" measures such as "Install Solar PV" [54], they typically yield relatively high kWh savings [53]. It might therefore pay off to promote such measures by making them stand out in the larger set of personalized user recommendations.
The dimensionality of energy conservation illustrates the large variety in adoption rates across measures [11,33]. Energy-saving measures can be mapped on a one-dimensional scale using the psychometric Rasch model, based on how often these measures are performed [60]. In the context of the attitude theory "Campbell's Paradigm" [33], this frequency of use or adoption rate is operationalized as behavioral costs, which is defined to represent the execution difficulty of a measure, comprising different types of costs, such as money, time, and cognition [61]. This approach postulates that measures with smaller adoption rates face higher behavioral costs. For example, a study that fitted a Rasch scale of 79 energy-saving measures shows that 92% of respondents lower the thermostat when leaving the house for a longer period [56], which has a relatively low behavioral cost level, while only 7% of respondents use an energy-efficient heat pump, which has high behavioral costs. Moreover, while it is easy to make verbal statements about the importance of saving energy (i.e., low behavioral costs), engaging in actual behavior is much harder and arguably more representative for mapping a user's preferences in a recommender user model [53].
The characteristics of the Rasch model can be used to craft convincing social norms. An HCI study on energy recommender systems by Starke et al. [54] shows how to form a latent factor model by asking a group of persons whether or not they perform a set of energy-saving measures [33,60]. Besides ordering measures on their adoption rate this way, users are also ordered with respect to how many measures they perform, which is operationalized as a person's energy-saving attitude [33]. Hence, users with stronger attitudes are assumed to perform more measures.
The adoption rates of measures are what we label as "Global" norms. These are statements about the general population that can be presented alongside energy-saving recommendations, analogous to the norms used by Goldstein et al. [25]. For example, "55% of other users have installed weather strips on doors" [53]. As discussed in the introduction, we expect normative messages such as "75% of participants use X" to signal that the majority of a population has already adopted a certain energy-saving measure, and therefore to be more persuasive than minority norms, such as "30% of users do X."

Crafting Personalized Social Norms
Using the Rasch model, we can craft personalized norms that go beyond "Global" percentages. Instead of highlighting the frequency of use among all users, the behavior of specific groups can be highlighted. This is achieved through the Rasch model, for the probability that a measure is performed by a specific user is person-dependent. This is shown in Equation (1): the probability p that a measure i is performed depends on a measure's behavioral costs δ, as well as the attitudinal strength θ of an individual n, where δ and θ are expressed in logistic scale units (logits) [33,60]:

$$p(x_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)} \tag{1}$$

For any energy-saving measure, Rasch predicts the same adoption probability for all users with a specific attitudinal strength [34], along with increasing probabilities for users with stronger attitudes. Among the larger population, we consider this probability to be an adoption rate that can be communicated to a user, such as "60% of users with attitude X do this." Hence, we can craft personalized normative messages based on peer users with either similar or stronger attitudes. Not only could higher norm scores across an entire recommendation list persuade users to choose more energy-saving measures, they could also help to make "unpopular" measures, which have a relatively low "Global" adoption rate and high behavioral costs, more appealing. This could, in turn, persuade users to choose measures that have relatively high kWh savings (e.g., Solar PV, which has a low adoption rate), or to choose measures that are subject to other unattractive attributes, such as perceived effort [46,52].
How can peer users with "similar or stronger attitudes" be translated to a convincing normative message? Literature on advice-taking highlights relevant "advice sources" that can be used for this purpose. Mentioning a specific peer group is shown to affect choice and advice acceptance [8], suggesting two important advice source characteristics for our work. First, similarity in relevant attitudes can increase the extent to which advice is considered or liked [8,50]. The Rasch scale allows the design of "Similar" norms alongside recommendations, which can show higher adoption rates than global norms, especially for users with stronger attitudes. For example, users with a strong attitude might be presented with the "Global" norm "20% of users have installed radiator reflectors" [54], while the "Similar" norm would be "60% of users like you have installed radiator reflectors." A second characteristic is a peer user's perceived expertise [8,32]. Expert advice is less likely to be ignored than suggestions from novices [7,8,32]. In this study, we assume peers to possess such higher expertise if they perform more measures ("Experienced" norms), thus having stronger attitudes and higher adoption rates. For example, where "Similar" norms would report "55% of users like you do X," "Experienced" norms at an attitude θ that is +1 logit stronger than the user report an adoption rate of 78%. Combining different advice sources and adoption rates, we craft three different normative messages:
• Global norms: "X% of users perform this measure."
• Similar norms: "Y% of users who perform similar measures as you, perform this measure."
• Experienced norms: "Z% of users who perform more measures than you, perform this measure."

Table 1. We imagine that there are two users: User 1 has a relatively weak energy-saving attitude, and User 2 has a relatively strong attitude. If each user is presented a measure that is tailored toward their attitude (δ = θ), then these are the presented norm percentages for each norm condition.
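The way person-dependent norm percentages follow from the Rasch probability can be illustrated with a minimal Python sketch. It assumes the standard dichotomous Rasch formula; the function names and the example values of θ and δ are hypothetical and chosen only for illustration, and the +1 logit offset for "Experienced" peers is taken from the text.

```python
import math


def adoption_probability(theta: float, delta: float) -> float:
    """Rasch probability that a user with attitude `theta` performs a
    measure with behavioral costs `delta` (both expressed in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def similar_norm(theta: float, delta: float) -> float:
    """Adoption rate among peers with the same attitude as the user."""
    return adoption_probability(theta, delta)


def experienced_norm(theta: float, delta: float, offset: float = 1.0) -> float:
    """Adoption rate among peers whose attitude is `offset` logits stronger."""
    return adoption_probability(theta + offset, delta)


# Hypothetical user and measure: the measure is somewhat "too hard" for the user.
theta, delta = 0.5, 1.0
print(f"Similar norm:     {similar_norm(theta, delta):.1%}")
print(f"Experienced norm: {experienced_norm(theta, delta):.1%}")
```

Because the probability depends only on the difference θ − δ, a stronger peer group always yields a higher percentage for the same measure, which is what makes a majority norm attainable for measures with high behavioral costs.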

Global vs. Person-Dependent Norm Scores
The percentages for our normative messages are determined using the Rasch model. To show how they depend on a user's energy-saving attitude, we present a recommendation scenario in Table 1. Suppose there are two users and that User 1 has an attitude θ1 = −1, which is weaker than that of User 2 (θ2 = +1), and that they are each presented a measure with behavioral costs δ equal to their attitude θ (in line with [56]). As a result, User 1 is shown a measure with lower behavioral costs than User 2. Table 1 shows that "Global" norm percentages depend on the user's attitude. User 1 has a relatively weak attitude and is therefore presented a "popular" measure with a high "Global" adoption rate (i.e., 70%). User 2 has a stronger attitude and, therefore, her attitude-tailored measure has a lower "Global" adoption rate of 30%, which is not very convincing. In contrast, the adoption rates of the personalized "Similar" or "Experienced" norms do not depend on the "Global" adoption rate, but on the difference between the user's attitudinal strength and the measure's behavioral costs. Therefore, they are identical for both users (50% and 75%, respectively), and thus lead to a more convincing norm for User 2, compared to "Global" norms.
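The two-user scenario can be reproduced with a short Python sketch, again assuming the standard dichotomous Rasch formula. The variable names are hypothetical. "Global" rates are not computed here, since they are population adoption rates taken from the fitted scale rather than a function of a single user's attitude.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of performing a measure, given attitude and behavioral costs."""
    return 1.0 / (1.0 + math.exp(delta - theta))


EXPERIENCED_OFFSET = 1.0  # peers one logit stronger than the user
users = {"User 1": -1.0, "User 2": 1.0}  # attitudes from the scenario

rows = {}
for name, theta in users.items():
    delta = theta  # each user sees a measure tailored to their attitude
    similar = rasch_probability(theta, delta)
    experienced = rasch_probability(theta + EXPERIENCED_OFFSET, delta)
    rows[name] = (similar, experienced)
    print(f"{name}: Similar = {similar:.0%}, Experienced = {experienced:.0%}")
```

Both users obtain the same pair of percentages, because only the attitude-cost difference enters the formula. Under this plain logistic form, a +1 logit difference gives roughly 73%, close to the 75% reported for the scenario.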
Table 1 indicates which normative messages are likely to be most effective for which types of users. Based on the adoption rates, we expect users with stronger attitudes (i.e., User 2) to choose more measures when facing "Similar" norms, while users with weaker attitudes (i.e., User 1) do so for "Global" norms. It is possible that the higher degree of similarity signaled by a "Similar" norm message could overcome differences in norm percentages with "Global" norms [25]. Nonetheless, another study by Yaniv et al. [62] argues that inexperienced users (i.e., with a weak attitude) are more likely to rely on majority advice (i.e., a "Global norm %") than similar advice, while experienced individuals (i.e., with a strong attitude) rely on similar peers.
Table 1 also suggests the additional benefit of higher adoption rates for "Experienced" norms, compared to "Similar." Although the persuasiveness of expertise (i.e., "others who perform more measures than you") may be mitigated because of the reduced similarity, we expect that the higher adoption rates for "Experienced" norms (75%) across an entire recommendation list will be more persuasive than "Similar" norms (50%). This could particularly apply to the adoption of measures that face high levels of behavioral costs or perceived effort [62].

Perception of Descriptive Norms
Besides evaluating behavior, it is also useful to understand how users perceive such a descriptive norm. Studies in environmental psychology teach us that how an individual evaluates environmental aspects can determine behavioral outcomes [2,18]. For instance, social proof of others performing a particular behavior might lower the thresholds toward performing it [21].
Previous recommender studies have similarly highlighted the importance of perceptions in explaining the user experience [37], for they allow us to understand why a change in a particular system aspect improves the user experience. For example, Starke et al. [54] show that tailored recommendation lists with low levels of behavioral costs (δ) are more likely to be perceived as feasible and, in turn, show stronger perceived support, higher levels of user satisfaction, and more energy-efficient choices. Likewise, we expect descriptive norms to lower the behavioral thresholds to choose and, eventually, adopt energy-saving measures, which is assessed through perceived feasibility, perceived support, and subsequent user satisfaction.

Research Expectations
Based on the discussed literature, we summarize the expectations for our user study per research question. First, we examine whether social norms affect the total number of chosen measures and kWhs saved, across users with different energy-saving attitudes (RQ1). In line with [25], we test three different interfaces that depict normative messages (i.e., "Global," "Similar," and "Experienced" norms) alongside energy-saving measures in a recommender system, and compare their effectiveness to a non-social baseline (i.e., kWh Saving Score). We formulate the following expectations:
• Social norms increase the number of energy-saving measures chosen by users, across all attitudinal strengths.
  - Users with a weak energy-saving attitude choose more measures if a "Global" norm is depicted instead of a "Similar" norm, and vice versa for users with a strong energy-saving attitude.
  - Users choose more measures if they are explained with "Experienced" norms rather than "Similar" norms.
• Social norms increase the amount of kWh savings per chosen measure, as well as the average behavioral cost level, particularly for users with strong energy-saving attitudes.
Second, we investigate whether social norms and other measure attributes affect which energy-saving measures are chosen from within a recommendation list (RQ2). Based on the reviewed literature and the recommendation scenario in Table 1, we expect the following outcomes:
• The presented norm percentage increases the likelihood that a measure is chosen from a recommendation list.
• Measures with high levels of perceived effort are more likely to be chosen when accompanied by high norm percentages, such as majority norms.
Finally, we examine whether social norms affect a user's evaluation of our recommender interface (RQ3). Based on previous energy recommender research, we expect users to perceive and evaluate recommender interfaces that depict social norms more favorably, compared to those that emphasize the environmental impact.

METHOD
We investigated to what extent descriptive norms boosted the adoption of a heterogeneous set of tailored energy-saving measures. We first collected data in a pre-study to validate our one-dimensional construct, used to personalize both advice and norms. Thereafter, we designed our energy recommender interface called "Saving Aid" and performed an online user study on our normative intervention.

Pre-Study: Setting up a Rasch Scale for Personalized Norms
To generate recommendations based on the Rasch model, we designed a survey that was part of a different study [53]. Participants were asked to disclose their current energy-saving behavior, indicating for 13 to 25 randomly sampled energy-saving measures (out of a database of 134) whether they performed them or not ("yes" or "no").
We used dichotomous responses from 555 participants (50.6% male) with a mean age of 43.4 years (SD = 19.7) to fit a one-dimensional measurement scale of 134 energy-saving measures. A tabulation of the scale is reported in Appendix A, in Table 5. Each measure was assigned a distinct behavioral cost level, which formalized how likely a user would be to perform a particular measure [33]. In terms of adoption rates, the scale ranged from 94% to 1%.
Furthermore, Table 5 also shows how the estimated kWh savings of each measure are distributed across the scale. Although higher kWh savings seemed to be more prevalent "higher up the scale" (i.e., for higher behavioral cost levels), it was possible to perform measures with moderately high kWh savings across the entire scale. This is also depicted in Figure 1, which shows a small increase in the average kWh savings for higher behavioral cost levels. Table 5 also describes the scale's infit statistics (for mathematical details, see [9]). Overall, the scale's item parameters were determined reliably (α = 0.95, M = 0.05, SD = 1.57), as all measures fitted the construct by meeting the prescribed "infit" criteria [9]. Due to an item separation of 4.51, we could reliably discern four to five strata of behavioral costs.

Perceived Effort.
The same pre-study also collected data on how effortful participants perceived measures to be [53]. Although a measure's perceived effort decreased the likelihood that a measure was chosen in previous studies [46], we expected that depicting high norm scores might help to increase that likelihood. A sub-sample of the participants (N = 304) was presented with a 4-point scale alongside each measure, on which they could indicate whether executing a measure would require either "very little effort," "little effort," "fairly some effort," or "a lot of effort." The mean response per measure (304 users, rating 25 measures each) is listed in the Appendix, Table 5. We observed a moderate to strong correlation between a measure's perceived effort and its behavioral costs: r(134) = 0.59, p < 0.001.

"Saving Aid" Recommender Study
Following our pre-study, we set up an online user study in collaboration with a Dutch energy supplier (i.e., Eneco). We compared four different recommender interfaces, of which three depicted social norms alongside energy-saving advice and one depicted kWh savings values.

Participants.
Members of a consumer panel at Eneco were invited to use our "Saving Aid" recommender system to find and select appropriate energy-saving measures to take in their households. Panel members, who were all smart thermostat owners, were sent a formal email invitation, of which an English translation is depicted in Figure 2. This panel was considered a good target group for our study: as they were predominantly homeowners, they were able to improve the energy efficiency of their households beyond simple behavioral curtailment.
In total, 217 participants used our "Saving Aid" and filled out the evaluation questionnaire. However, we excluded 10 participants for either completing the study in less than 3 minutes, indicating that they did not trust the website, or showing no variation in their responses to the evaluation questionnaire. Eventually, we considered a sample of 207 participants (M = 53.5 years, SD = 14.0) that comprised predominantly males (87%). Among our participants, only 26.6% owned the home they lived in, while the majority lived in a town house (58.5%).

Procedure.
To estimate each user's attitude, we randomly sampled 13 energy-saving measures from across the behavioral costs scale. These were presented sequentially to users, who indicated whether they performed them or not ("yes" or "no"). Subsequently, we inquired on the user's housing situation to filter irrelevant measures from the recommendation list.

Fig. 3. Our "Saving Aid" energy recommender interface (NL: "Besparingshulp.nl"), translated to English. Depicted are the name and a short description of the top-2 recommendations in our interface (e.g., at the top: "Install low-flow showerheads"), out of a total of nine recommendations. Users could select any number of measures they would like to perform, by clicking "I will do this." Measures are sorted from high to low kWh savings. On the left, users could hover over a measure's image to inspect additional attributes: kWh savings (scaled from 1 to 5 light bulbs), the annual savings (in €), investment costs (in €), payback period (from "less than a month" to "never"), effort, and behavioral frequency. Depending on the condition, the numbers on the right either show a score or a norm percentage. Depicted here is a "Similar" norm.
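The attitude estimation step in the procedure can be sketched in Python. The text does not state which estimator was used, so this is an assumption: a maximum-likelihood estimate of θ via Newton-Raphson, given known behavioral costs δ for the sampled measures. The function name, the step clamp, the error raised for all-"yes" or all-"no" patterns, and the example values are hypothetical choices of this sketch.

```python
import math


def estimate_attitude(responses, deltas, max_iter=50, tol=1e-6):
    """Maximum-likelihood estimate of a user's attitude (theta, in logits).

    responses: 0/1 answers ("no"/"yes") to the sampled measures.
    deltas:    known behavioral costs of those measures (Rasch item parameters).
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        # The likelihood has no finite maximum for extreme response patterns.
        raise ValueError("ML estimate undefined for all-no or all-yes responses")

    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(d - theta)) for d in deltas]
        gradient = raw_score - sum(probs)               # first derivative of log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # negative second derivative
        step = max(-1.0, min(1.0, gradient / information))  # clamp for stability
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical example: 13 measures spread over the scale, the 8 easiest performed.
deltas = [-3.0 + 0.5 * i for i in range(13)]
responses = [1] * 8 + [0] * 5
print(f"Estimated attitude: {estimate_attitude(responses, deltas):+.2f} logits")
```

With a fixed number of dichotomous items, the estimate is driven by the number of "yes" answers, so only a limited set of distinct attitude values can occur for a given set of measures.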
Afterwards, we presented each user with a list of nine energy-saving recommendations, whose behavioral costs were tailored toward the user's estimated energy-saving attitude (θ ≈ δ). In addition, the measures were ordered in terms of their estimated kWh savings. Figure 3 depicts the top-2 measures of such a list, presenting a measure's name, a short description, a score or percentage, and a (norm) explanation. Recommendations were sampled from measures with adoption probabilities between 18% and 75%, which corresponds to an attitude-behavioral costs difference (θ − δ) between −1.5 and +1 logit. We asked users to select any number of recommended measures that they wished to perform. Users could hover for "more info" to see other commonly used attributes of an energy-saving measure, such as its frequency and kWh savings. Figure 3 portrays this on the left-hand side of the top measure.
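The list-generation step could be sketched as follows in Python. The window of −1.5 to +1 logit, the list length of nine, and the ordering by kWh savings come from the text; how measures were sampled within the window is not specified, so uniform random sampling is an assumption here. The `Measure` class, the `recommend` function, and the catalogue values are hypothetical, and the housing-situation filter is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    delta: float        # behavioral costs in logits (Rasch item parameter)
    kwh_savings: float  # estimated kWh savings


def recommend(measures, theta, n=9, lower=-1.5, upper=1.0, rng=None):
    """Sample up to `n` measures whose attitude-cost difference (theta - delta)
    lies within [lower, upper], ordered from high to low kWh savings."""
    rng = rng or random.Random()
    eligible = [m for m in measures if lower <= theta - m.delta <= upper]
    sampled = rng.sample(eligible, min(n, len(eligible)))
    return sorted(sampled, key=lambda m: m.kwh_savings, reverse=True)


# Hypothetical catalogue of 25 measures spread evenly over the cost scale.
catalogue = [
    Measure(f"measure_{i}", -3.0 + 0.25 * i, 50.0 * (i % 7 + 1))
    for i in range(25)
]
recommendations = recommend(catalogue, theta=0.5, rng=random.Random(42))
for m in recommendations:
    print(f"{m.name}: delta = {m.delta:+.2f}, kWh = {m.kwh_savings:.0f}")
```

Passing a seeded `random.Random` instance keeps the sampled list reproducible, which is convenient when checking that every recommended measure stays inside the intended probability window.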
After interacting with our "Saving Aid" interface, we inquired on the user's subjective evaluation of the system. To this end, users were presented statements on 7-point Likert scales. Finally, users could share demographic details and disclose their email address to receive information on chosen measures.

Research Design.
The score and explanation depicted alongside each measure in the list differed across four between-subject conditions. In line with Goldstein et al. [25], we compared three norm explanations to an environmental baseline:
(1) "Savings" score (baseline): We presented a "Saving Score" of 0 to 100, where 100 represented the highest kWh savings in the list.
(2) "Global" norm: The adoption rate of measures on the scale, which ranged from 2% to 98%, explained as "XX% of other customers do this."
(3) "Similar" norm: The user's adoption probability (between 18% and 75%), explained as "XX% of other customers who perform the same measures as you, do this."
(4) "Experienced" norm: The adoption probability for an attitude 1 logit above the current user, which fell between 37.8% and 88%. It was explained as "XX% of customers who perform more measures than you, perform this measure."
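How the displayed number could be derived per condition is sketched below in Python. The norm explanations are quoted from the conditions above, and the Rasch formula is assumed to be the standard dichotomous form. The linear scaling of the "Saving Score" against the highest kWh value in the list, the baseline label text, the dictionary keys, and the example measure are assumptions of this sketch.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    return 1.0 / (1.0 + math.exp(delta - theta))


EXPLANATIONS = {
    "savings": "Saving Score: {score}",  # label wording is an assumption
    "global": "{score}% of other customers do this.",
    "similar": "{score}% of other customers who perform the same measures as you, do this.",
    "experienced": "{score}% of customers who perform more measures than you, perform this measure.",
}


def displayed_score(condition, measure, theta, max_kwh_in_list):
    """Return the integer score or percentage shown next to a measure."""
    if condition == "savings":
        # Assumed linear scaling: the best measure in the list scores 100.
        return round(100 * measure["kwh"] / max_kwh_in_list)
    if condition == "global":
        return round(100 * measure["global_rate"])
    if condition == "similar":
        return round(100 * rasch_probability(theta, measure["delta"]))
    if condition == "experienced":
        return round(100 * rasch_probability(theta + 1.0, measure["delta"]))
    raise ValueError(f"Unknown condition: {condition}")


# Hypothetical measure and user, for illustration only.
measure = {"kwh": 400.0, "global_rate": 0.30, "delta": 0.5}
theta, max_kwh = 0.5, 800.0
for condition, template in EXPLANATIONS.items():
    score = displayed_score(condition, measure, theta, max_kwh)
    print(template.format(score=score))
```

Keeping the four conditions behind one function makes it explicit that only the score and its explanation change between conditions, while the recommended measures themselves stay the same.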

Choice Variables.
We examined a user's choice behavior through two analyses. First, we considered the total number of chosen measures by a user per condition (RQ1), as well as the amount of chosen kWh savings per measure and the average behavioral cost level of chosen measures (i.e., attitude-cost difference). Second, we predicted the likelihood that a specific measure was chosen in each condition, based on the presented norm scores (RQ2). In the same model, we considered a measure's perceived effort, as well as explored possible interaction effects.

Attributes and Characteristics.
To address our research questions, we dichotomized each user's energy-saving attitude to discern between users with weak and strong attitudes. Figure 4 depicts the distribution of attitudinal strengths in our sample, which were estimated at discrete levels. In previous studies that used a Rasch scale, a cut-off would be placed at θ = 0 [9], but this would lead to very uneven groups in the current study (N_weak = 48 vs. N_strong = 159). A median or mean split (Median = 0.42, M = 0.52, SD = 0.79) would neither lead to balanced groups, nor would it properly differentiate between weak and strong attitudes. To balance both considerations, we instead classified users with θ ≤ 0.25 as having a weak attitude (see Figure 4), creating a group of 82 users with a weak attitude and a group of 125 users with a strong attitude.
Other attributes are presented in our "Saving Aid" interface (cf. Figure 3). We used the presented score as a continuous measure ("Savings Score" or norm percentages) to assess its impact on the probability that a measure was chosen (RQ2). Other attributes could be inspected in the interface by hovering over an energy-saving measure. From these, we included a measure's perceived effort in our analyses, as we examined whether social norms could boost the adoption of effortful measures.

User Evaluation Aspects.
To examine whether users evaluated recommender interfaces that depicted social norms more positively than non-social ones (RQ3), we inquired about different user evaluation aspects. After interacting with the "Saving Aid," users were presented with questionnaire items on a 7-point Likert scale about the Perceived Feasibility of the presented recommendations, their Perceived Support from the system, and the user's satisfaction with the chosen measures (i.e., Choice Satisfaction). All items were based on earlier research of Knijnenburg and Willemsen [37], and were submitted to a confirmatory factor analysis, as part of a Structural Equation Model. The results are described in Table 4, and discussed in the results section. We also included two user characteristics in our user evaluation analysis. Besides discerning between users with weak and strong energy-saving attitudes, we also inquired about a user's environmental concern, scoring all 15 items of the revised NEP scale [18] on a 7-point Likert scale. We found that the scale had an acceptable internal consistency (α = 0.78). However, the path model was built using only six items to optimize the fit of the Structural Equation Model.

RESULTS
We investigated to what extent social norms affected user choices and evaluation of an attitude-tailored list of energy-saving measures, compared with our kWh savings baseline. After presenting manipulation checks, we first compared the total number of choices, chosen kWh savings, and average chosen behavioral cost level in our normative conditions to the baseline (RQ1). Second, we predicted whether the likelihood that a measure was chosen from a recommendation list was affected by the presented norm score, compared to the effects in the baseline (RQ2). Third, we investigated whether the different normative messages affected a user's perception of the system (RQ3).

Presented Norm Scores.
We examined whether the presented scores and percentages were in line with our intended manipulations. As outlined in Table 1, we intended "Global" norms to yield higher norm percentages for users with weak attitudes than in the "Similar" condition, and vice versa for users with strong attitudes. Moreover, we intended "Similar" norms to have a median score around 50%, while the median score for "Experienced" norms was designed to fall around 75%, across all attitudinal strengths.
Figure 5 depicts the distribution of presented scores per condition, across weak (in blue) and strong (in red) attitudes. It shows that the Savings Score roughly captured all possible scores with a median score of 60, while the normative conditions had narrower distributions, which was as expected. However, it shows only a minor difference in presented median scores between "Global" norms (53%) and "Similar" norms (49%) for users with weak attitudes, which was much smaller than our intended manipulation (72% vs. 50%). This made it less likely for any effect to surface between "Global" and "Similar" norms for users with a weak attitude. In contrast, the difference in median scores for strong attitudes between "Global" (39%) and "Similar" (58%) was consistent with our intended manipulation (30% vs. 50%).
Fig. 5. Box plot of the presented scores or norm percentages in our recommender interface, across conditions and attitudinal strength, presented as a manipulation check. Scores presented to users with weak attitudes are depicted in blue; scores presented to users with strong attitudes in red. As intended, the Savings condition included all scores, while the distributions were narrower and more selective in other conditions.

Validation of the Used Rasch Construct.
To validate the representativeness of the Rasch scale used for this study (cf. Table 5), we inferred a new Rasch construct based on data collected in this study. Without reporting detailed fit statistics, we found the Rasch scale to have an adequate model fit (item reliability α = 0.85; person reliability α = 0.69). We compared the behavioral cost levels of the new construct to the one used in the current study, performing a pairwise correlation analysis. Although the two research populations were rather different, it revealed a strong correlation: r(134) = 0.82, p < 0.001, which showed that the two constructs were comparable and ordered the measures similarly to a large extent. We found that only a few measures would shift significantly in terms of their behavioral cost level. For example, because all users in the current sample were smart thermostat owners, the measure "install a centralized temperature system with zone controls & thermostats" changed from +3.33 to −2.95 logit.

Total Number of Chosen Measures.
We investigated whether the depiction of descriptive norms increased the number of energy-efficient choices, compared to our non-social baseline (RQ1). We first addressed this question through a multilevel logistic regression analysis to estimate the likelihood that a measure was chosen. Table 3, Model 1 (reported in Section 4.3) examined whether a measure was more likely to be chosen in each normative condition, compared to the baseline. We found no differences between the normative conditions and the Saving Score baseline (all p-values > 0.05): not for "Global" norms (OR = 0.83, S.E. = 0.33), nor for "Similar" norms (OR = 1.17, S.E. = 0.46), nor for "Experienced" norms (OR = 1.02, S.E. = 0.39).
This result is illustrated in Figure 6. It depicts small, but non-significant differences in the number of chosen measures between the Savings baseline and each norm condition, indicating that social norms did not lead to changes between conditions. Based on the norm scores, we expected that users with a weak attitude would choose more measures when facing "Global" norms compared to "Similar" norms, and vice versa for users with a strong attitude. Although Figure 6 depicts a small difference between the two norm types, Kruskal-Wallis tests of ranks revealed that it was not significant: not between "Global" and "Similar" norms for users with a weak attitude: H(1, 36) = 0.267, p = 0.61; nor for users with a strong attitude: H(1, 65) = 0.45, p = 0.50. In addition, based on the norm scores, we expected that depicting "Experienced" norms would lead to more choices than presenting "Similar" norms, but we found no significant difference between the two: H(1, 105) = 0.028, p = 0.87.
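For readers who wish to relate the reported H statistics to their p-values, the sketch below implements the textbook Kruskal-Wallis statistic (average ranks with tie correction) and the chi-square tail probability for one degree of freedom, which applies to the pairwise comparisons above. It is a hedged illustration using only the Python standard library; the function names and the toy data are hypothetical and do not reproduce the study data.

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with average ranks and the usual tie correction.
    Undefined (division by zero) if all pooled values are identical."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    # Assign each distinct value the mean of the 1-based ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


def p_value_df1(h):
    """Upper-tail probability of a chi-square variable with 1 degree of
    freedom, which equals erfc(sqrt(h / 2))."""
    return math.erfc(math.sqrt(h / 2.0))


if __name__ == "__main__":
    # Toy data (hypothetical): number of chosen measures in two conditions.
    print(kruskal_wallis_h([1, 2, 3], [4, 5, 6]))
    # Reported statistics from the text, converted to p-values.
    for h in (0.267, 0.45, 0.028):
        print(h, round(p_value_df1(h), 2))
```

Applied to the reported statistics, this conversion yields p-values that round to the values stated in the text (0.61, 0.50, and 0.87).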

Chosen kWh Savings per Measure.
We further expected that social norms could boost the overall kWh savings chosen. Figure 7 presents our results, comparing differences in chosen kWh savings per measure across conditions and attitudinal strengths. Kruskal-Wallis tests provided no evidence for differences in chosen kWh savings between the baseline (M = 401, SD = 843) and the normative conditions (M = 193, SD = 393): not for "Global" norms: H(1, 100) = 2.21, p = 0.14; nor for "Similar" norms: H(1, 97) = 0.65, p = 0.42; nor for "Experienced" norms: H(1, 103) = 0.79, p = 0.37. As we did not find differences across different attitudinal strengths either, this suggested that our normative conditions did not increase overall kWhs saved, as indicated by a user's choice behavior.

Behavioral Costs.
Finally, we examined whether the behavioral cost levels of chosen measures differed across conditions and attitudes (i.e., weak vs. strong). We used the difference between a user's attitude and the behavioral costs of a chosen measure: the "attitude-cost difference." A positive difference would indicate that users had chosen relatively challenging measures for their attitude level, while a negative difference would suggest that users had selected relatively easy ones. We expected that social norms might be more effective to persuade users to select challenging measures, particularly for users with a strong energy-saving attitude, leading to a positive attitude-cost difference.
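The attitude-cost difference can be sketched as follows. The sign convention (behavioral cost minus attitude) is inferred from the statement that a positive difference indicates relatively challenging measures; the function names and example values are hypothetical, and the weak/strong split follows the θ ≤ 0.25 cut-off described in the method section.

```python
from statistics import mean


def attitude_group(theta, cutoff=0.25):
    """Dichotomize a Rasch attitude estimate: 'weak' if theta <= cutoff."""
    return "weak" if theta <= cutoff else "strong"


def attitude_cost_difference(theta, delta):
    """Difference in logits between a chosen measure's behavioral cost
    (delta) and the user's attitude (theta). ASSUMPTION: computed as
    delta - theta, so that positive values mean 'challenging' choices."""
    return delta - theta


def mean_attitude_cost_difference(theta, chosen_costs):
    """Average attitude-cost difference over one user's chosen measures."""
    return mean(attitude_cost_difference(theta, d) for d in chosen_costs)


if __name__ == "__main__":
    theta = 0.52                # hypothetical user attitude
    chosen = [0.9, 0.4, 1.1]    # hypothetical costs of chosen measures
    print(attitude_group(theta))
    print(round(mean_attitude_cost_difference(theta, chosen), 2))
```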
Table 2 presents two multilevel linear regression models, clustered at the user level. Model 1 examined whether the attitude-cost difference of chosen measures differed between the norm conditions and the Saving Score baseline. We found that both the "Global" and "Similar" norm conditions positively affected the "attitude-cost difference" of chosen measures, compared to choices in the baseline: β = 0.29, p < 0.01, for the Global norm condition; β = 0.22, p < 0.05, for Similar norms. This suggested that users in those conditions were more likely to choose measures with higher behavioral costs, which were located "further up" the Rasch scale. In contrast, such an effect was not observed for the "Experienced" condition.
Table 2, Model 2 examines whether the effects of Model 1 were affected by a user's energy-saving attitude. We found that users with strong energy-saving attitudes were more likely to choose measures with relatively higher behavioral costs, across all conditions: β = 0.25, p < 0.05. The observed effects of the presented norms in Model 1 were reduced in Model 2, as only Global norms still positively affected the chosen behavioral cost level (p < 0.05). Moreover, we did not find interaction effects between attitude and the presented norm interface.
Figure 8 helps to interpret the results reported in Table 2. It depicts how users with stronger energy-saving attitudes had also chosen measures with behavioral costs above their own attitude, yielding a positive "attitude-cost difference." In contrast, users with a weak attitude had chosen "below" their own attitude in all conditions. Regarding specific conditions, the average behavioral cost level of chosen measures in the "Global" (M = 0.24) and "Similar" (M = 0.13) conditions was higher than in the baseline (M = −0.075). In addition to the models reported in Table 2, we also examined whether demographic factors (i.e., age, income, etc.) and housing characteristics affected the current results, but we found no significant effects.

Conclusion.
Overall, we found that the use of descriptive norms did not persuade users to choose more energy-saving measures in total, compared to our kWh savings baseline. Nor did we observe a change in the overall kWh savings selected by users. It seemed that adding normative explanations to a personalized list of energy-saving recommendations did not boost the overall choice behavior in terms of sustainability. Instead of main effects, it seemed that more "within list" effects had occurred (cf. Section 4.3).
We did find overall changes in terms of the relative difficulty of chosen measures for some normative conditions, in particular "Global" norms. This suggested that normative explanations, or other social explanations for that matter, are more successful in persuading users to select "challenging" measures, compared to a factual explanation (i.e., kWh savings). Moreover, we found a main effect of energy-saving attitude on the behavioral cost level of chosen measures, suggesting that more experienced users could be presented measures with relatively high behavioral costs compared to their attitudinal strength, while those with a weaker attitude should be presented relatively easy measures.

Choice Behavior for Individual Measures (RQ2)
We further investigated whether social norms affected choices for individual measures, compared to a non-social baseline (RQ2). The results of this analysis are reported in Table 3. Model 2 addresses whether measures with higher norm scores were more likely to be selected in our interface, which we expected. It examined whether the presented norm score affected the likelihood that a measure was chosen, compared per norm condition (e.g., Score X Global) to the effect in the baseline. Second, in addition to the norm score, Table 3, Model 3 considers a measure's perceived effort ("Effort") and the interaction between the two ("Score X Effort") on the likelihood that a measure is chosen.
Note to Table 3: The "Main Effects" examine whether any measure was more likely to be chosen in a norm condition, compared to the baseline, effectively comparing the total number of measures chosen (Model 1). The "Within List" effects examined for each measure whether the likelihood it was chosen was affected by the presented norm score (Models 2 and 3) and its perceived effort (Model 3), also comparing the effect in each norm condition to the baseline. Reported are odds ratios (OR < 1 implies a negative effect; OR > 1 a positive effect) and standard errors (S.E.). ***, p < 0.001; **, p < 0.01; *, p < 0.05.

Norm Percentages.
Table 3, Model 2 shows, for the baseline, that depicting higher Savings Scores did not affect the probability that a measure was chosen: OR = 0.78, p = 0.533. This pointed out that higher kWh savings did not persuade users to choose a measure. Figure 9 illustrates this as well, as the proportion of chosen measures (depicted in green) did not differ across kWh Savings scores.
In line with our expectations, Table 3, Model 2 provides evidence that showing higher norm scores increased the likelihood that a measure was chosen, compared to the effect of presenting high kWh savings. The effect of Score in the "Global" (OR = 3.80, p = 0.022) and "Experienced" conditions (OR = 2.99, p = 0.049) increased the likelihood that a measure was chosen, while no such effect was found for Score in the "Similar" condition (OR = 2.26, p = 0.15), all compared to the effect in the baseline. This suggested that high "Global" and "Experienced" norm percentages were more likely to persuade a user to choose a measure, compared to presenting a measure's kWh savings. While we could not make such assertions for "Similar" norms, the effect pointed in the same direction.
We explored which norm percentage cohorts were more likely to persuade a user to choose a measure. Figure 10 shows two levels at which the proportion of chosen measures seemed to increase: around 20% (an increase from 0.07 to 0.23) and at 60% (an increase from 0.24 to 0.29). This suggested that norm percentages below 20% discouraged users from choosing the corresponding measures, while measures that depicted norm percentages over 60% seemed the most likely to be chosen, which was most common among "Experienced" norms.

Perceived Effort.
To expand the results found in Table 3, Model 2, we also analyzed a model that included the presented norm score, a measure's perceived effort, and the interaction between the two. We expected that presenting high norm scores in our interface would overcome a measure's perceived effort level, increasing the likelihood that a measure was chosen.
Table 3, Model 3 reports the results. We observed no significant within-list effects for score, nor were significant effects observed for perceived effort between the norm conditions and the baseline (p > 0.05 for all effects). Model 3 did reveal interesting interaction effects. We found that an interaction between score and effort in the baseline negatively affected the likelihood that a measure was chosen: OR = 0.15, p = 0.041. This suggested that measures with high kWh savings were more likely to be chosen if they had low levels of effort, while this likelihood decreased if a measure had high levels of perceived effort. The latter, high effort and high savings, was however far more common among the set of energy-saving measures used in this study.
We examined further interaction effects between score and perceived effort for the normative conditions. Table 3, Model 3 reveals a non-significant difference in choice likelihood between the baseline and both the "Global" and "Similar" conditions (OR ≈ 6.3, p > 0.05). Even though this showed that high norm scores did not significantly persuade users to choose more effortful measures, the OR was comparable in magnitude to the inverse of the baseline OR (1/0.15 ≈ 6.7), suggesting that the negative baseline effect was reduced (i.e., users only choosing measures with high kWh savings if effort was low). Furthermore, we did find a significant increase in the choice likelihood for the "Experienced" condition: OR = 12.52, p = 0.034, suggesting that high norm percentages, explained in terms of experienced peers, increased the probability that an effortful measure was chosen, rather than those with low effort. The odds ratio of this positive effect was roughly two times larger in magnitude than the negative effect in the baseline (12.52 vs. 1/0.15 ≈ 6.7), showing that higher "Experienced" norm scores could persuade users to choose effortful measures.
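The comparison of interaction effects can be illustrated with a short calculation. The sketch assumes, in line with the note to Table 3, that the condition-specific odds ratios are expressed relative to the baseline effect, so that the implied Score X Effort effect within a norm condition is the product of the baseline OR and the condition's OR. This is standard for logistic regression with a reference category, but the implied values below are our own arithmetic, not reported results, and they carry no significance test.

```python
# Reported odds ratios for the Score X Effort interaction (Table 3, Model 3).
BASELINE_OR = 0.15            # effect within the Savings baseline
RELATIVE_OR = {
    "Global/Similar": 6.3,    # approximate value reported for both conditions
    "Experienced": 12.52,
}


def implied_condition_or(baseline_or, relative_or):
    """Implied interaction OR within a norm condition, ASSUMING the reported
    condition OR multiplies the baseline OR (log-odds add up)."""
    return baseline_or * relative_or


if __name__ == "__main__":
    for condition, relative in RELATIVE_OR.items():
        implied = implied_condition_or(BASELINE_OR, relative)
        direction = "positive" if implied > 1 else "negative or neutral"
        print(f"{condition}: implied OR = {implied:.2f} ({direction})")
```

Under this assumption, the implied effect is close to 1 for the "Global" and "Similar" conditions (the negative baseline effect is roughly cancelled) and above 1 for the "Experienced" condition (high scores favor effortful measures).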

Conclusion.
We examined whether depicting social norms in an energy recommender interface also affected what measures were chosen from a list of recommendations (RQ2). We found that recommendations were more likely to be chosen if they were presented alongside high norm scores or percentages, suggesting that they stand out from a list of tailored measures. In particular, it seemed that presenting "Experienced" norms alongside effortful measures could increase the likelihood that they were chosen, while explaining measures in terms of their kWh savings led users to choose relatively low-effort measures. Although the previous section did not report any changes in the overall choice behavior, the current section showed that social norms in a personalized context were capable of promoting specific measures in a recommendation list.

User Evaluation of the Saving Aid (RQ3)
Finally, we examined whether perceptions of the recommender system differed between conditions, and whether this affected, in turn, choice behavior and satisfaction (RQ3). We organized the objective constructs, subjective constructs, and relevant interactions into a path model using Structural Equation Modeling in MPlus [37,43]. To do so, we first performed a confirmatory factor analysis, after which we tested a fully saturated model and performed stepwise removal of non-significant relations.

Confirmatory Factor Analysis.
We submitted all items in our evaluation questionnaire, described in Table 4, to a confirmatory factor analysis. We had to drop perceived support for our subsequent structural equation model (SEM) analysis, as it could not be reliably discerned from the choice satisfaction aspect, violating divergent validity [37]. Both the feasibility (α = 0.72) and choice satisfaction aspects (α = 0.87) had an acceptable internal consistency and met the standards for convergent validity (AVE > 0.5), as prescribed by [37].
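As a brief illustration of the validity criteria mentioned above, the sketch below computes the average variance extracted (AVE) from standardized factor loadings and applies one common check for whether two factors can be told apart (comparing the square root of the AVE with the inter-factor correlation, often attributed to Fornell and Larcker). The loadings and the correlation are hypothetical placeholder values, not the estimates from Table 4, and the function names are our own.

```python
import math


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized factor loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def convergent_validity_ok(loadings, threshold=0.5):
    """Convergent validity criterion used in the text: AVE > 0.5."""
    return average_variance_extracted(loadings) > threshold


def factors_distinguishable(loadings_a, loadings_b, correlation):
    """One common criterion: the square root of each factor's AVE should
    exceed the absolute correlation between the two factors."""
    sqrt_ave_a = math.sqrt(average_variance_extracted(loadings_a))
    sqrt_ave_b = math.sqrt(average_variance_extracted(loadings_b))
    return min(sqrt_ave_a, sqrt_ave_b) > abs(correlation)


if __name__ == "__main__":
    # Hypothetical standardized loadings for two questionnaire factors.
    satisfaction = [0.80, 0.70, 0.75]
    support = [0.72, 0.78, 0.74]
    print(round(average_variance_extracted(satisfaction), 3))
    print(convergent_validity_ok(satisfaction))
    # A hypothetical, very high inter-factor correlation fails the check,
    # which is the kind of situation in which a factor is dropped.
    print(factors_distinguishable(satisfaction, support, correlation=0.90))
```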

Perceived Feasibility and Choice Behavior.
We expected that the different social norm interfaces would be perceived as more feasible than the kWh savings baseline. Figure 11 partially confirms this, as both the "Global" norm (β = 1.04) and "Similar" norm conditions (β = 0.693) positively affected a recommendation list's perceived feasibility compared to the baseline. Moreover, the interaction between the user's energy-saving attitude and a "Global" norm on a user's perceived feasibility (β = 0.937) matched our expectations that the user's attitudinal strength would determine how the "Global" norms would be evaluated.
In contrast, no such effect on feasibility was observed for "Experienced" norms (β = −0.286, p = 0.310; not depicted in Figure 11). Although the results in Table 3 suggested that "Experienced" norms convinced users to choose effortful measures, they did not increase the perceived feasibility of the presented measures altogether.
In turn, Figure 11 shows that perceived feasibility positively affected the number of measures chosen by a user. A bootstrapped test of indirect effects from the "Global" condition toward the number of chosen measures was significant: β = 0.240, 95%-CI: [0.020, 0.460], p = 0.033, while the effect from the "Similar" condition to the number of chosen measures was not significant: β = 0.160, 95%-CI: [−0.009, 0.329], p = 0.064.

Choice Satisfaction.
The SEM model in Figure 11 shows two positive effects on choice satisfaction: perceived feasibility (β = 0.330) and the number of chosen measures (β = 0.305). This suggested that choosing feasible measures, as well as choosing more measures, positively affected how they were evaluated. In terms of indirect effects, the path from "Global" norms to choice satisfaction (β = 0.417, 95%-CI: [0.041, 0.793], p = 0.030), as well as the path from the "Similar" norm condition (β = 0.278, 95%-CI: [0.034, 0.521], p = 0.026) were significant, showing that the positive effects of these norms on satisfaction were mediated by feasibility.
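The reported indirect effects can be related to the path coefficients through the product-of-coefficients rule. The sketch below is a hedged consistency check rather than a reproduction of the bootstrapped MPlus analysis: the coefficient from feasibility to the number of chosen measures is not stated in the text, so it is backed out from the reported "Global" indirect effect, and we assume that feasibility reaches choice satisfaction only through the direct path and the path via the number of chosen measures.

```python
# Path coefficients and indirect effects as reported in the text.
GLOBAL_TO_FEASIBILITY = 1.04
SIMILAR_TO_FEASIBILITY = 0.693
FEASIBILITY_TO_SATISFACTION = 0.330
NUM_CHOSEN_TO_SATISFACTION = 0.305
GLOBAL_INDIRECT_ON_NUM_CHOSEN = 0.240

# ASSUMPTION: this coefficient is inferred, not reported in the text.
FEASIBILITY_TO_NUM_CHOSEN = GLOBAL_INDIRECT_ON_NUM_CHOSEN / GLOBAL_TO_FEASIBILITY


def indirect_on_num_chosen(condition_to_feasibility):
    """Condition -> feasibility -> number of chosen measures."""
    return condition_to_feasibility * FEASIBILITY_TO_NUM_CHOSEN


def indirect_on_satisfaction(condition_to_feasibility):
    """Condition -> feasibility -> satisfaction, summing the direct path and
    the path via the number of chosen measures."""
    feasibility_total = (
        FEASIBILITY_TO_SATISFACTION
        + FEASIBILITY_TO_NUM_CHOSEN * NUM_CHOSEN_TO_SATISFACTION
    )
    return condition_to_feasibility * feasibility_total


if __name__ == "__main__":
    print(round(indirect_on_num_chosen(SIMILAR_TO_FEASIBILITY), 3))
    print(round(indirect_on_satisfaction(GLOBAL_TO_FEASIBILITY), 3))
    print(round(indirect_on_satisfaction(SIMILAR_TO_FEASIBILITY), 3))
```

Under these assumptions, the computed values fall within rounding distance of the reported indirect effects (0.160 for "Similar" on the number of chosen measures; 0.417 and 0.278 for "Global" and "Similar" on choice satisfaction).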
Besides the interface effects, we found that the mean presented recommendation list score positively affected the list's feasibility perception (β = 0.034). This confirmed that higher scores, regardless of the source (i.e., kWh savings or norm), were related to higher levels of feasibility. Figure 11 also depicts that a user's environmental concern positively affected perceived feasibility (β = 0.225), showing that users who attributed greater concern toward their role in protecting the environment indicated that the recommended measures were more feasible to perform.

Conclusion.
We examined whether user perceptions of recommendations were affected by the depiction of social norms (RQ3). The path model showed that "Global" and "Similar" norms increased the perceived feasibility of recommendations, relative to the Savings baseline, while no such effect was found for "Experienced." This suggested that the effectiveness of descriptive norms did not simply boil down to high percentages, but that the advice source played a role, possibly through similarity rather than expertise.
Furthermore, our path model shows that higher levels of feasibility increased both the number of chosen measures and choice satisfaction. Our tests of indirect effects confirmed that most of the paths from the "Global" and "Similar" norms to both the "number of chosen measures" and choice satisfaction were mediated by feasibility. This indicated that explaining energy-saving recommendations in terms of normative messages affected how users perceived them (i.e., making them seem more feasible to perform) and, in turn, increased the number of measures that users had selected (i.e., a proxy for behavioral intention), as well as their choice satisfaction levels.

DISCUSSION
We have investigated to what extent different social norm nudges affect choice behavior in an energy recommender system. We have translated the findings of a well-known social psychology study by Goldstein et al. [25], which used descriptive norms to promote towel re-use, to an HCI context. Specifically, we have investigated whether the merits of descriptive norms still apply in a choice environment where a more diverse, yet personalized set of energy-saving measures is presented.
Our results show that normative explanations affect user decision-making within lists of tailored recommendations, in the context of energy conservation. Although normative messages have not led to more energy-efficient choices in total, as investigated in RQ1, we find that presenting normative explanations, as well as comparatively high norm scores, can boost the adoption of specific energy-saving measures (RQ2). Moreover, the depiction of social norms also positively affects a user's evaluation of a recommender system (RQ3), in terms of the perceived feasibility and choice satisfaction. These findings underline that social proof [21] and implied norms can act as persuasive nudges, both through majority preferences (in this study: the presented norm percentages), as well as specific peers (in this study: the different norm sources), even in the context of personalized advice. Such norms are found to be more effective than presenting additional information about key attributes of the recommended items (in this study: kWh savings). Moreover, social norms also affect what types of measures are chosen, in terms of relevant energy-saving attributes: a measure's behavioral costs (i.e., execution difficulty) and its perceived effort.
More generally, we show that person-dependent nudges (in our case: social norms) are beneficial to contexts where the advice itself is also personalized. For nudging researchers (i.e., behavioral economists [57]), this implies that it could be fruitful to move beyond one-size-fits-all persuasion, for its effectiveness might be lost when the content of an intervention is aligned with a user's preferences [35,54]. For recommender system scholars, the merit of our study lies in the effectiveness of personalized nudges in recommender interfaces to shift user preferences. This is particularly important to recommender domains where self-actualization and behavioral goals play a role [20], as users in those domains often consider what their peers are doing [13,39]. Earlier approaches in behavioral recommender domains (i.e., energy and health) show that it is hard for users to seek behavioral change, as most algorithms reinforce their current preferences [48,53]. The use of nudges, such as social norms, could alleviate those issues.

Influence of Study Design
We further discuss the study results in more detail, by first examining the influence of our study design. Contrasting with the findings in Goldstein et al. [25], the use of descriptive norms did not lead to an overall increase in the total number of chosen energy-saving measures, compared to a baseline that emphasized kWh savings (RQ1). We discuss a number of possible causes for this different outcome through our study design.
For one, the decision contexts are different, in terms of the number of presented norms and measures. The observed behavior in Goldstein et al. [25] is rather straightforward, as it only promotes towel re-use by means of a door hanger in a hotel room. In contrast, our study comprises a set of attitude-tailored energy-saving measures in a web shop study, presenting multiple descriptive norms simultaneously. While the change from a hotel room to a web-based interface might not have impacted the results, the fact that our recommender has simultaneously presented multiple measures with different norm percentages could have led users to make more comparative judgments. For example, users could have been influenced by the presented interface score rather than the norm source. This is supported by our findings that, in the "Global" and "Similar" norm conditions, measures were more likely to be chosen from recommendation lists if they had a comparatively high score, while revealing no differences in the main effect between norm conditions.
The results in our study clearly show that, contrary to what has been suggested in a meta-review [1], simply generalizing the effectiveness of descriptive norms for a simple, one-time behavior (e.g., towel re-use) to all types of measures is not justified. Hence, potential adopters of a norm-based approach should seriously consider the nature of the behavior that is being promoted, whether it is energy conservation or another behavior in the recommender domain that involves effort, such as healthy eating [48,56]. While the effects of descriptive norms have been rather consistent in both movie and social recommender systems [12,26,49], these are domains distinct from energy and health, as their behaviors (e.g., clicks) face few behavioral thresholds (e.g., no financial costs).
Finally, we wish to emphasize that our study has used a strict baseline. By framing energy-saving measures in terms of their energy savings, we have attempted to reflect the study design of Goldstein et al. [25], rather than adhering to the more traditional social psychology study design that employs a no-treatment control group [1]. Hence, we have evaluated the effectiveness of our normative manipulations critically, which is consistent with recommender system research, where novel algorithms and interfaces are benchmarked against commercial applications or state-of-the-art technology [37].

Differences in Norms
Besides differences in choice behavior as a result of the presented norm percentages, we also discuss to what extent the type of descriptive norms played a role. Goldstein et al. [25] propose that "provincial" norms (i.e., "local" norms such as "Similar" and "Experienced") are more effective in changing user behavior than the more "Global" norms, for they share very specific and context-rich characteristics with the recipient of an environmental appeal. While each normative condition in this study leverages some similarity with the user, the "Global" and "Similar" norms are arguably the most "provincial," as they specifically emphasize the similarities with the user. In contrast, the "Experienced" norm also emphasizes a difference by pointing out that other customers "performed more measures than [the user]." Nonetheless, our path model shows effects consistent with a context-rich, provincial norm explanation. The "Global" and "Similar" norm conditions produce higher levels of perceived feasibility compared to the Savings baseline, while no such effect is found for the "Experienced" condition. That analysis suggests that the increase in feasibility can be attributed to the norm source rather than the score, as the "Experienced" norm presented the highest percentages. This would suggest that users are influenced by the principle of "similar others are doing it, therefore I can do this too," a more general heuristic for choice.
The relevance of this finding on feasibility lies in the indirect effects of "Global" and "Similar" norms on choice satisfaction, which are mediated by feasibility. The use of such normative explanations has not only increased the perceived feasibility of the recommended measures, but has also led to higher levels of choice satisfaction, compared to users in the kWh Savings baseline. This increase in choice satisfaction might be important to ultimately spur behavioral change, as it can, in turn, persuade users to re-use a recommender system at a later stage [37]. Moreover, previous studies have shown that higher levels of choice satisfaction lead to a higher likelihood that users actually implement chosen measures [54].
Although our "Experienced" norms have not increased feasibility compared to the kWh savings baseline, our analysis on choice in recommendation lists (cf. Table 3) reveals that it can boost preferences for high-effort measures. Unlike in the baseline condition, where only low-effort measures are more likely to be chosen for high (kWh) Saving Scores, higher "Experienced" norm percentages seem to boost the selection of high-effort measures. This finding shows that personalized nudges rather than one-size-fits-all persuasion can improve the effectiveness of a recommender system, as the persuasiveness of the "Experienced" norm is specific to effortful measures.

Limitations
There might be some concerns about the use of self-reported behavior and choice as our behavioral indicators.Although we are aware that self-report can be an inaccurate measurement method, it should have not had a large impact on the study's results, for we have only examined differences in choice behavior between randomly assigned conditions.Furthermore, it is possible that some users have also chosen measures that they already performed.Particularly for the so-called "curtailment" measures (i.e., highly frequent behaviors [56] with no or little investment costs), users may have chosen a certain measure to indicate that they want to "keep doing" something, such as turning off the lights after leaving a room.However, since we have mostly made comparisons across conditions, we expect the impact of this issue to be small.If any, the amount of chosen kWh savings per measure could even be larger, for curtailment measures tend to yield relatively low energy savings (cf.Table 5 and [17,56]).
Finally, the sample used might not be representative of the broader population. The sample comprises energy supplier customers with a smart thermostat, a group that happens to be composed of mostly males with relatively strong energy-saving attitudes. Although this might limit the extent to which our results could be generalized to the broader population, our randomly assigned between-subject research design reduces the impact of using such a specific sample population. Hence, we have examined the effectiveness of different personalized normative interfaces. Nonetheless, it would be useful to replicate this research among a more representative population, to check whether our findings on specific normative explanations still apply. Since the study has been conducted in the Netherlands, it is possible that some of the energy-saving determinants (e.g., a measure's perceived effort) show small variations across different countries [56,60].

Mitigating Climate Change.
Our findings show how current approaches to household energy conservation promotion could be improved. Although some personalized campaigns are in place [56], relatively many are still one-size-fits-all. For example, whereas many government information pages describe best practices, they could also implement a tool similar to the "Saving Aid" (cf. Figure 3), for which personalization entails relatively little cost in terms of time. Furthermore, other interventions only focus on the use of social norms for a single metric or behavior [25,51,55], for example by comparing the overall energy use of one household with its neighbors [4].
We expect that a combination of personalization and social norm interventions is likely to improve the overall effectiveness of energy-saving promotion, in terms of actual behavioral change. However, beyond energy-efficient choices, further research is required to understand to what extent it actually lowers the threshold of "getting started." For example, some people refrain from making energy-saving choices, for they believe that they are the only people engaging in it, the so-called "sucker effect" [27]. Follow-up research should make clear to what extent observing the behavior of others alongside personalized advice, also in a recommender interface such as ours, could help to lower the thresholds to actual energy-saving behavior. Previous research has provided evidence that users who evaluate a recommender interface positively are also much more likely to implement chosen measures [54].

Applications of Social Norms in Other Recommender Domains.
The current article has employed social norms to support behavioral change for a user's "better self" [20]. While such persuasion techniques may be somewhat paternalistic, it could be argued for the energy domain that the user ultimately benefits, due to a lower energy bill and a positive environmental impact. However, the implications of this study also reach beyond the energy domain, for it has been among the first to apply nudges in a personalized advice interface. For example, in food, most recommender systems only focus on a user's current eating habits [23,42], while a user might have certain eating goals that can be attained more easily by the use of social nudges [3,52].
There are also domains in which norms can easily backfire or lead to arguably unethical situations. For instance, in the context of news recommender systems [5], normative explanations of news articles could reinforce partisan readership, if a user observes fellow democrats or conservatives consuming certain articles. Designers of recommender interfaces should always consider whether it could be harmful to a user if she "follows the herd," and to what extent reinforcing such behavior through persuasive messaging exacerbates this.
Nonetheless, we wish to repeat that our study has focused on nudging within a personalized list of recommendations. Since the presented items already fit a user, this might mitigate possible "herd behavior." Moreover, the recommendation algorithm used in this article (i.e., based on the Rasch model) is less biased toward popular items [52], for it focuses on the relation between the user and an item, based on its execution difficulty (i.e., behavioral costs) and novelty (i.e., execution probability) [56]. We think that this study can serve as a starting point for various recommendation interfaces, in which social explanations are presented alongside personalized items.

Note to Table 5: Described are names, behavioral cost levels (θ_i), the infit statistics (MNSQ denotes Mean Square; ZSTD the standardized mean), kWh savings per year (i.e., "kWh"), and perceived effort (i.e., "EF").

Fig. 2. Excerpt from the email template sent to customers of Dutch energy supplier Eneco who owned a smart thermostat.

Fig. 4. Histogram depicting the distribution of energy-saving attitudes in our sample. It also depicts the cut-off between weak and strong energy-saving attitudes, placed at θ = 0.25.

Fig. 6. Total number of chosen measures, per condition and attitude strength. Error bars are 1 S.E.

Fig. 7. Box plot of the log-transformed chosen kWh savings per measure, divided across conditions, as well as between weak (in blue) and strong (in red) attitudinal strength.

Fig. 9. Depicted in green are the proportions of chosen measures in the baseline condition (among those presented), per Savings Score category.

Table 1. Recommendation Scenario to Illustrate What Norm Percentages Are Presented for Each Norm, and How This Depends on the User's Attitudinal Strength

Table 2. Multilevel Linear Regression Models Predicting the Relative Behavioral Costs Level (Difference between Attitude and Behavioral Costs) of Chosen Measures, Clustered at the User Level
Energy-saving attitude distinguishes between weak and strong; norm condition dummies are compared to the Saving Score baseline. β represents the regression coefficient.

Table 3. Three Multilevel Logistic Regression Analyses Predicting the Choice Probability Per Measure, Clustered at the User Level

Table 4. Results of the Confirmatory Factor Analysis on User Experience [37]
Items without loading were removed from the final model. Perceived Support was omitted due to high cross-loadings with Choice Satisfaction [37]. AVE denotes the average variance explained by an aspect, and α represents Cronbach's Alpha.

Table 5. Tabulation of the Rasch Scale of Energy-Saving Measures