Subjective evaluation and bureaucratic performance: Experimental evidence from China

Uncertainty in the identity of an evaluator discourages evaluator-specific influence activities and significantly improves state employees’ performance

Evaluating the work performance of public employees is important to incentivise effort. Since measuring individual achievements is difficult and tasks are typically multiple and vaguely defined, evaluation is often subjective and delegated to supervisors (Finan et al. 2015). This, however, opens the door to evaluator-specific influence activities, whereby employees try to improve the evaluator’s assessment of their performance at the cost of reducing productive work that benefits the organisation (Milgrom 1988). Influence can be exerted both through productive activities, such as putting extra effort into tasks that are more visible or more important to the evaluator, and through non-productive activities, such as doing personal favours for the evaluator and flattering their ego.

In our study (de Janvry, He, Sadoulet, Wang, and Zhang 2021), we investigate whether specific designs of the evaluation procedure can reduce influence activities. If such designs are effective, they also provide evidence that influence activities exist and allow us to measure their consequences for the organisation, something that has not been done rigorously before.

Introducing uncertainty in the evaluator’s identity

We address this question by conducting a large-scale field experiment in two Chinese provinces using the 3+1 Supports programme. This programme hires more than 30,000 college graduates every year to work as rural township government employees on two-year contracts. The junior government employees are referred to as college graduate civil servants (CGCSs). They engage in tasks to assist the local leadership in governing the township, taking advantage of their college education in skills such as information technology, agronomy, and health. Their performance is evaluated by the township leaders. Good evaluations can lead to permanent civil service contracts at the end of the programme. These positions are highly sought after, which leads many CGCSs to exert substantial effort to please the evaluating supervisor through both superior work achievements and influence activities, with the latter potentially crowding out effort on productive tasks.

A distinct institutional feature of the Chinese governance system is dual leadership. Currently, every public institution has both a political leader (representing the Communist Party) and an administrative leader (representing the constituency). At the level of rural townships, both leaders can assign tasks to a CGCS and both supervise the performance of the CGCS, with one of them responsible for evaluating each particular CGCS every year. We use this dual supervisor system to design a randomised controlled experiment whereby some CGCSs know ex-ante which leader will evaluate them (‘revealed scheme’, corresponding to the status quo) while others do not (‘masked scheme’, with uncertain evaluator identity introduced as treatment). We hypothesise that the revealed scheme might induce more evaluator-specific productive and non-productive influence activities, while the masked scheme induces greater official job achievements.

The field experiment consisted of randomising the two subjective evaluation schemes across 3,700 CGCSs. In the revealed scheme, we informed each CGCS, at the beginning of the evaluation year, which of the two supervisors had been randomly selected to be their evaluator, so the CGCS knew where to direct their influence activities in the hope of improving the evaluation outcome. In the masked scheme, the randomly selected evaluator’s identity was hidden until the end of the evaluation year, making it impossible for the CGCS to precisely target evaluator-specific influence activities. Under both schemes, the two township leaders were not informed whether they would be the evaluator of a particular CGCS until evaluation time arrived.
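The two layers of randomisation described above (scheme assignment, then evaluator selection) can be sketched as follows. This is only an illustration, not the study’s actual procedure: the function name `assign_schemes`, the labels `party_leader` and `admin_leader`, the equal split between schemes, and the fixed seed are all assumptions introduced here.

```python
import random


def assign_schemes(cgcs_ids, seed=0):
    """Randomly assign each CGCS to a scheme and draw an evaluator.

    Hypothetical sketch: the equal split between schemes and the leader
    labels are assumptions, not details taken from the study.
    """
    rng = random.Random(seed)
    ids = list(cgcs_ids)
    rng.shuffle(ids)
    half = len(ids) // 2

    assignments = {}
    for position, cgcs in enumerate(ids):
        scheme = "revealed" if position < half else "masked"
        # The evaluator is drawn at random under both schemes.
        evaluator = rng.choice(["party_leader", "admin_leader"])
        assignments[cgcs] = {
            "scheme": scheme,
            "evaluator": evaluator,
            # Only CGCSs in the revealed scheme learn the evaluator's
            # identity at the start of the evaluation year.
            "told_at_start": evaluator if scheme == "revealed" else None,
        }
    return assignments


assignments = assign_schemes(range(10))
```

The point of the design is that the evaluator is randomly drawn in both arms, so the only difference between them is whether the CGCS knows the outcome of that draw in advance.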

Evaluator uncertainty reduces influence activities

Result 1: The existence of influence activities is revealed by the large gap in assessment scores between the two officials (evaluator and non-evaluator) under the revealed scheme, while the gap disappears under the masked scheme.

With random assignment of evaluators, there should be no gap in assessments under the revealed scheme if there were no influence activities. As can be seen in Figure 1, this is not the case. The ‘evaluator edge’, i.e. the difference in scores between evaluator and non-evaluator, is large (0.22 points) under the revealed scheme while non-existent (0.02) under the masked scheme. This suggests that CGCSs did indeed engage in influence activities to improve their evaluation scores when they knew whom to target with such activities.

Figure 1 Evaluator edge by evaluation scheme

Note: The ‘evaluator edge’ is the difference between the evaluator’s and the non-evaluator’s assessment scores, both on a scale of 1 to 7. If both leaders were equally positive about the CGCS, the ‘evaluator edge’ would be equal to 0.
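As a worked illustration of this definition, the average evaluator edge can be computed per scheme as below. The scores are made-up numbers chosen for easy arithmetic, not data from the study, and the function and field names are hypothetical.

```python
from statistics import mean


def evaluator_edge(records):
    """Average of (evaluator score - non-evaluator score) across CGCSs."""
    return mean(r["evaluator_score"] - r["non_evaluator_score"] for r in records)


# Made-up scores on the 1-to-7 scale, for illustration only.
revealed = [
    {"evaluator_score": 6, "non_evaluator_score": 5},
    {"evaluator_score": 5, "non_evaluator_score": 5},
]
masked = [
    {"evaluator_score": 5, "non_evaluator_score": 5},
    {"evaluator_score": 6, "non_evaluator_score": 6},
]

edge_revealed = evaluator_edge(revealed)  # (1 + 0) / 2 = 0.5
edge_masked = evaluator_edge(masked)      # (0 + 0) / 2 = 0
```

Because evaluators are randomly selected, an edge that differs from zero in one scheme but not in the other points to behaviour targeted at the known evaluator rather than to systematic differences between the two leaders.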

Result 2: The existence of influence activities is revealed by the greater ability of CGCS colleagues, who directly observe the CGCS’s activities, to predict which supervisor will be more positive about the CGCS under the revealed scheme than under the masked scheme.

While colleagues of the CGCS are not informed about the roles of the two supervisors, when asked to guess which of the two supervisors will give a more positive assessment about the CGCS, colleagues in the revealed scheme are significantly more likely to point to the evaluator (0.57). In contrast, colleagues do not expect a difference between the two supervisors under the masked scheme and are significantly less able to predict who the evaluator will be. This suggests that influence activities did occur when the evaluator was known to the CGCS and were at least partially observed by their colleagues.

Evaluator uncertainty enhances work performance

Result 3: Better work performance is revealed by CGCS colleagues’ scoring of behaviour when the identity of the evaluator is not revealed.

When evaluator identity is masked, the CGCS can reallocate effort from evaluator-specific influence activities to the work activities expected of a CGCS as part of their regular duties. These activities are valued by both leaders and observable to CGCS colleagues. As can be seen in Figure 2, when colleagues are asked to assign a performance score to the CGCSs, the average score they assign is significantly higher under the masked scheme (5.34 versus 5.12). The average supervisor score is also higher under masking. This suggests that keeping secret which leader will be the evaluator until the end of the evaluation period benefits the governance unit to which the CGCS has been assigned.

Figure 2 Colleagues’ assessment of CGCS performance

Note: The outcome is the average assessment score of CGCS performance given by colleagues, on a scale of 1 to 7.

Result 4: Work performance in activities where behaviour is rewarded with extra pay shows that CGCSs achieve higher pay under the masked scheme.

Some behavioural patterns in work activities such as overtime and participation in night-time shifts are rewarded with extra pay. On average, the CGCSs in the masked scheme earn an additional 2.4% per month as compared to their counterparts in the revealed scheme, suggesting improved work performance when the employee is not distracted by influence activities.

Conclusion

Our study provides the first rigorous evidence on the existence of influence activities associated with subjective supervisor evaluation of employee performance, and on the attendant implications for work effort. Masking the identity of the evaluating supervisor is a simple institutional design that can have very large benefits. It can potentially improve the performance of some 50 million state employees in China. It can also be adopted in the many situations across the world where multiple supervisors are involved in overseeing public employees (power duality) (Li 2019) or private employees (management pairs) (Williams and Scott 2012).

References

de Janvry, A, G He, E Sadoulet, S Wang and Q Zhang (2021), “Influence Activities and Bureaucratic Performance: Evidence from a Large-Scale Field Experiment in China”, Working Paper.

Finan, F, B Olken and R Pande (2015), “The personnel economics of the state”, NBER Working Paper No. w21825.

Li, W (2019), “Meritocracy and Dual Leadership: Historical Evidence and an Interpretation”, Department of Economics, Monash University.

Milgrom, P (1988), “Employment Contracts, Influence Activities, and Efficient Organization Design”, Journal of Political Economy 96 (1): 42–60.

Williams, D and M Scott (2012), “Leadership Teams: Why Two Are Better Than One”, Harvard Business Review.

Alain De Janvry

Guojun He

Elisabeth Sadoulet

Shaoda Wang

Qiong Zhang