Strategic Evaluation Program

The Opportunity

Research consistently finds that the effectiveness of the classroom teacher is the most important in-school factor in student success.

How are great teachers identified?
How do we help all teachers improve their craft?
How can we retain an effective and diverse teacher workforce?
How can districts think strategically about staffing so students with the greatest needs have access to the most effective teachers?

Answering these questions requires, in part, nuanced and accurate data about each teacher’s performance, generated by a well-implemented strategic evaluation system.

The state-adopted evaluation system, the Texas Teacher Evaluation and Support System (T-TESS), uses three core components and was designed to capture the holistic nature of teaching and offer a more nuanced evaluation of teacher effectiveness. However, although T-TESS applies multiple measures and its tools and rubrics are well developed, the way districts implement T-TESS often does not meaningfully differentiate teacher performance. For example, in 2017-2018 over 73% of teachers evaluated under T-TESS were rated Proficient or better, making it difficult for districts to target feedback precisely, improve professional development, provide increased compensation and advancement based on performance, and provide a rationale for teacher dismissals.

Fortunately, there are districts in Texas and elsewhere designing and implementing strategic evaluation systems that are differentiating ratings and using the data to achieve strong results. These strategic systems have proven to:

Increase retention of higher performing teachers
Improve student outcomes
Close opportunity gaps

While there is no one perfect system, there are components from these state and national examples that provide a helpful roadmap for districts looking to strengthen T-TESS or develop a locally approved strategic evaluation system.

Definition of a Strategic Evaluation System

A strategic evaluation system fairly and accurately evaluates teachers based on multiple measures, including student growth and student voice, and leads to meaningful differentiation of evaluation ratings (e.g., not all teachers are rated as effective). The evaluation data are used to inform critical human capital decisions, such as strategic staffing, professional development, compensation, career pathways and hiring, so that the district can support and retain the best possible teachers.

An ideal strategic teacher evaluation system contains, at a minimum, the following components:

Components	Teacher Evaluation
Student Achievement	The use of multiple assessments to measure both (i) absolute student achievement and (ii) student achievement growth.
Observation	A combination of informal coaching and formal observations conducted by a principal, assistant principal or instructional leader at the school, and based on a rubric of educator and teacher behaviors.
Student Voice via Survey	A research-based perception survey that captures students’ feedback about their classroom experience.

For more detail on these components, please see Exploring: What Does a Strategic Evaluation System Include?

Toolkit Purpose and Orientation

The purpose of this Toolkit is to help school district leaders develop a strategic evaluation system for teachers in their district. This Toolkit includes an overview of how to leverage evaluation data, the components of a strategic evaluation system, how the system has been implemented in districts to date, and key takeaways from district leaders and researchers at multiple districts.

How to use this Toolkit:

Introduction - describes the rationale and opportunity for strengthening a multi-measure evaluation system.
Exploring - describes the benefits of using evaluation data, details about the components of a multiple measure evaluation system, and pathways for strengthening districts’ existing systems.
Implementing - describes how to implement the components of a strategic evaluation system including change management considerations.
Continuous Improvement - describes how district leaders can engage teachers and leaders in ongoing feedback cycles to continually improve the design and implementation of the evaluation system.

Lesson Learned

Takeaways for strengthening implementation and adjusting when things don't go according to plan.

If you are considering whether or not to invest the time in strengthening your teacher evaluation system, please review the Introduction and Exploring sections. If you have further questions or would like more detailed information, contact info@bestinclass.org.

If you are committed to improving your teacher evaluation system, please review the Implementing and Continuous Improvement sections.

Exploring

What Does a Strategic Evaluation System Include?

For districts to be able to make decisions based on evaluation data, they need a strategic evaluation system that fosters differentiation among evaluation outcomes. T-TESS offers a strong foundation that districts can build from to extend the impact of their evaluation systems. T-TESS comprises three components:

(1) Goal-setting and professional development plan;
(2) The evaluation cycle (including pre-conference, observation and post-conference); and
(3) A student growth measure that is not required to be an objective student assessment.

Components (1) and (2) make up 80% of the evaluation, while the student growth measure represents the remaining 20%. Embedded within components (1) and (2) is a detailed and comprehensive T-TESS rubric with four domains: (1) Planning, (2) Instruction, (3) Learning Environment, and (4) Professional Practices and Responsibilities.

With the addition of the key components outlined below, school districts will be better equipped to make informed strategic human capital decisions.

Student Achievement

What this is:

It is the use of multiple assessments to measure both (i) absolute student achievement and (ii) student achievement growth, within a school year, year over year, or both. Assessments could include state standardized assessments, Measures of Academic Progress (MAP), Istation, ITBS, or any other standardized assessment used district-wide. The assessments must go through a district process to ensure the validity and reliability of the testing instrument.

How incorporating this strengthens T-TESS:

It requires a student growth measure that quantifies the extent to which a teacher contributes to his/her students’ learning over the course of the year.

Administrator Observation

What this is:

It is a combination of informal coaching and formal observations conducted by a principal, assistant principal or instructional leader at the school, based on a rubric of educator and teacher behaviors. Observers are trained or normed to accurately and consistently rate teacher practice according to the rubric.

How incorporating this strengthens T-TESS:

It includes an inter-rater reliability requirement to ensure evaluators rate teachers fairly and accurately, so that different observers who watch the same teacher give him or her the same ratings on the observation rubric.
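As a rough illustration of what an inter-rater reliability check measures, the sketch below compares two observers who rated the same lesson. It assumes a hypothetical 1-5 rubric scale and reports exact agreement (same rating) and adjacent agreement (within one level); the function name and sample ratings are illustrative, not part of T-TESS or any district's tooling.

```python
def agreement_rates(rater_a, rater_b):
    """Return (exact, adjacent) agreement between two observers.

    rater_a and rater_b hold the ratings each observer gave on the same
    rubric dimensions for the same lesson (hypothetical 1-5 scale).
    """
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and the same length")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


# Illustrative ratings on five rubric dimensions for one lesson.
principal = [3, 4, 3, 2, 4]
outside_observer = [3, 3, 3, 2, 5]
exact, adjacent = agreement_rates(principal, outside_observer)
print(f"exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
# prints "exact agreement: 60%, adjacent agreement: 100%"
```

A district could set a minimum exact-agreement target for certifying observers; the threshold itself would be a local decision.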

Student Perception Survey

What this is:

This is a way to capture student voice through a perception survey administered to students in grades 3-12. This should be a research-based survey that captures students’ feedback about their classroom experience.

How incorporating this strengthens T-TESS:

T-TESS does not include student survey data.

*Evaluation component weights should be adjusted for teacher type (e.g., a second-grade teacher will not have a student perception survey, so the weights of the other components will be adjusted accordingly).
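One way to adjust weights for teachers who lack a component is to rescale the remaining weights proportionally so they still sum to 100%. The sketch below assumes hypothetical base weights (35% achievement, 50% observation, 15% survey) and 0-100 component scores; a district may instead set explicit weights for each teacher category.

```python
BASE_WEIGHTS = {"achievement": 0.35, "observation": 0.50, "survey": 0.15}  # hypothetical


def effective_weights(base_weights, available):
    """Rescale weights proportionally over the components a teacher actually has."""
    kept = {name: w for name, w in base_weights.items() if name in available}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("teacher has no weighted components")
    return {name: w / total for name, w in kept.items()}


def composite_score(component_scores, base_weights=BASE_WEIGHTS):
    """Weighted overall score from 0-100 component scores."""
    weights = effective_weights(base_weights, component_scores.keys())
    return sum(weights[name] * component_scores[name] for name in weights)


# A second-grade teacher has no student perception survey.
grade_2_teacher = {"achievement": 80, "observation": 70}
print(effective_weights(BASE_WEIGHTS, grade_2_teacher))
print(round(composite_score(grade_2_teacher), 2))  # 74.12
```

Proportional rescaling keeps the relative importance of the remaining components unchanged; a district that wants, say, achievement to absorb the missing survey weight would set category-specific weights instead.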

For additional details on how other districts and states weight the components see the following resource.

For more details and the rationale for each of these components, please read below. For more details on implementing these components, see the Implementation Considerations section.

Student Achievement

Districts should include a student growth measure in addition to a measure of absolute student performance, because together they provide a richer, more comprehensive picture of student learning. A measure of absolute student performance indicates a student's performance at one point in time, while a student growth measure indicates a student's progress over time. Different types of student growth measures provide different types of information. Below are details on three of the most common. The first two are objective measures based on common assessments, while the last is a subjective measure that depends on teacher and principal judgment:

Student growth percentiles (SGPs): this is a measure that compares a student’s current performance with that of peers who had similar past performance.
Value-added model (VAM): this is a measure that determines the impact of an educator or school on student learning and controls for factors outside of a teacher’s control that influence student achievement.
Student learning objectives (SLOs): this is a measure of student progress based on student growth goals set by teachers.

For additional details on how other districts and states weight the student achievement component see the following resource.

For details on additional student growth measures, such as growth tables, see Growth Data: It Matters, and It's Complicated by the Data Quality Campaign.

Student Growth Percentiles

What it is: Student growth percentiles (SGPs) show how a student’s achievement at the end of the year compares to other students who started at the same level at the beginning of the year.

What it does: SGPs indicate progress in terms that are familiar to teachers and parents. Typically, a teacher will be evaluated based on the median growth percentile (MGP), which is useful because it is not drastically altered by one or two students performing exceptionally well or poorly.
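The sketch below illustrates both ideas under simplifying assumptions. The hypothetical `simple_sgp` ranks a student's current score only against peers with exactly the same prior score, a stand-in for operational SGP models, which estimate growth percentiles with quantile regression. The median comparison then shows why an MGP is less sensitive than an average to a couple of exceptional students. All names and numbers are illustrative.

```python
from statistics import fmean, median


def simple_sgp(student, peers):
    """Percent of academic peers (same prior score) with a lower current score.

    Simplified illustration only: production SGP models use quantile
    regression rather than exact-match peer groups.
    """
    group = [p for p in peers if p["prior"] == student["prior"] and p is not student]
    if not group:
        return None
    below = sum(p["current"] < student["current"] for p in group)
    return round(100 * below / len(group))


peers = [{"prior": 300, "current": c} for c in (310, 320, 330, 340, 350)]
print(simple_sgp(peers[3], peers))  # 75: beat 3 of 4 academic peers

# Two classes identical except that two students grew exceptionally in class_b.
class_a = [45, 50, 52, 55, 58, 58, 60]
class_b = [45, 50, 52, 55, 60, 99, 99]
print(median(class_a), median(class_b))                    # 55 55
print(round(fmean(class_a), 1), round(fmean(class_b), 1))  # 54.0 65.7
```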

What it doesn't do: SGPs do not account for factors outside of test scores that may contribute to student learning. Additionally, the measure does not provide any information on student achievement relative to grade-level standards or account for variations in students or classes.

For additional information, please see the RAND report on student growth percentiles.

Implementing

Implementation Considerations

Effective implementation of a well-designed strategic evaluation system drives improvement in teacher practice and student learning. This section highlights important details for implementing:

Core Components Selection, Calculation and Weight
Calibration of Observers
Differentiated Compensation

Core Components Selection, Calculation and Weight

When implementing the core components of a strategic evaluation system, there are a variety of considerations related to each component. These considerations require a district to reflect on the context of their schools, educators and community along with the district structure and existing evaluation system.

When determining the weight of each component that will be used to evaluate educators, districts should consider the following questions:

What will be the overall size of the student achievement component?
What will be the overall size of the student survey component? (if applicable)
What will be the overall size of the observation component?
Will a school-wide measure be included for every teacher?
How will the inclusion and weights of the components vary for different teachers (e.g., tested vs. non-tested subjects)?
Is there a minimum number of days for a teacher to receive an evaluation?

Lesson Learned

Research indicates that student achievement should make up at least 30% of an educator's total evaluation score, with a target of 35%.

Districts that include student surveys or student voice in the overall evaluation typically weight this component at 5 to 15% of the overall evaluation. Research suggests that evaluation systems with student surveys are stronger and more reliable than those that do not incorporate student voice (The Widget Effect; Making a Difference: Six Places Where Teacher Evaluation Systems are Getting Results). For the student achievement component, the following breakdown is recommended:

State Assessment: 10-15%
District Assessment: 15%
Other (Portfolio of student work, Student Learning Objectives (SLO), etc.): 5%
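As a worked example of the recommended breakdown, the sketch below converts hypothetical 0-100 sub-scores into the points the student achievement component contributes to a 100-point evaluation, using the upper end of each range (15% + 15% + 5% = 35%). It also checks that the chosen weights stay within the 30-35% range recommended earlier. Sub-score names and values are illustrative.

```python
# Upper end of each recommended range; "other" could be an SLO or portfolio score.
ACHIEVEMENT_WEIGHTS = {"state_assessment": 0.15, "district_assessment": 0.15, "other": 0.05}


def achievement_points(subscores, weights=ACHIEVEMENT_WEIGHTS, low=0.30, high=0.35):
    """Points (out of 100) contributed by the student achievement component."""
    total_weight = sum(weights.values())
    if not (low - 1e-9 <= total_weight <= high + 1e-9):  # tolerate float rounding
        raise ValueError(f"achievement weight {total_weight:.0%} is outside {low:.0%}-{high:.0%}")
    return sum(weights[name] * subscores[name] for name in weights)


teacher = {"state_assessment": 80, "district_assessment": 70, "other": 90}
print(round(achievement_points(teacher), 2))  # 12 + 10.5 + 4.5 = 27.0
```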

When determining what student growth measure will be used to evaluate student achievement, districts should consider the following questions:

What tests/data will be included in the student achievement component and will this require a change to protocol to include different assessments?
How can the student growth measure align with the district’s philosophy on assessment?
What student achievement and/or growth data does the district already collect?
What is the capacity of the district to collect, analyze and track student growth measures?
Is there a minimum start date or minimum attendance rate required for a student to be included in the calculations for a teacher?

Lesson Learned

The specific measure used and weight of each measure typically varies by the category a teacher falls within.

For example, a teacher in a non-tested subject area or grade may have a certain measure weighted more or less heavily. Additionally, a teacher who teaches a specific subset of students may also have different weights and/or measures.

When determining the student perception survey and implementation process, districts should consider the following questions:

Will all grades receive the survey or will it be limited to grades 3 and higher?
Will students complete a survey for all teachers or will they submit feedback for a limited number?
Will every class be surveyed, for example, would a student complete a survey for their advisory period or a study hall?
Will surveys be administered to students in special education units?
What languages will you offer the survey in?
Will the survey be conducted online or on paper? How many times a year will you conduct the survey? Will you average the scores?
Will there be a minimum length of time the student will need to have been assigned to the teacher’s class?
Will results be compared by school type (elementary, middle, high school) and/or by subject (core subject, elective)?
Will questions on the survey be available in advance?

Lesson Learned

Districts should put systems in place so a student does not complete a survey for every teacher.

For example, a district could have students only complete two student perception surveys a year and use a randomly generated list to select the students that will complete a survey for a teacher.
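A minimal sketch of that kind of random assignment, assuming the district has class rosters for the survey window. It caps each student at two surveys (matching the example above) and aims for a hypothetical target number of respondents per teacher; the function and parameter names are illustrative. A fixed seed makes the draw reproducible for auditing.

```python
import random
from collections import defaultdict


def assign_surveys(rosters, per_student_cap=2, per_teacher_target=20, seed=0):
    """Randomly choose which students survey which teachers.

    rosters: {teacher_id: [student_id, ...]} with no duplicate students per roster.
    Each student is asked to complete at most `per_student_cap` surveys.
    """
    rng = random.Random(seed)        # seeded so the draw can be reproduced
    load = defaultdict(int)          # surveys already assigned to each student
    teachers = list(rosters)
    rng.shuffle(teachers)            # vary which teachers draw from students first
    assignments = {}
    for teacher in teachers:
        students = list(rosters[teacher])
        rng.shuffle(students)
        eligible = [s for s in students if load[s] < per_student_cap]
        chosen = eligible[:per_teacher_target]
        for s in chosen:
            load[s] += 1
        assignments[teacher] = chosen
    return assignments


rosters = {"T1": ["s1", "s2", "s3"], "T2": ["s1", "s2", "s3"], "T3": ["s1", "s2", "s3"]}
print(assign_surveys(rosters, per_teacher_target=3))
```

In this toy example only six surveys fit under the two-per-student cap, so one teacher receives none; in practice the cap and per-teacher target would be tuned so every teacher gets enough responses.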

When determining the calculation approach for teachers' ratings, districts should consider the following questions:

Will the overall scorecard calculation use a target distribution or fixed cut points?
Will the calculation of individual components use a target distribution or fixed cut points?

Target Distribution vs. Fixed Cut Points

Target Distribution: A target distribution sets a fixed percentage of teachers for each rating or point level and assigns cut scores for assessments based on this distribution. Using a target distribution can help promote equity across grades and content areas because the cut score can be set independently for each assessment, mitigating any differences in the rigor of assessments. Additionally, once set, the distribution does not have to be adjusted if the test changes. While a target distribution makes the evaluation process fair and sustainable, it limits how many teachers can be rated at the top, which can feel unfair to teachers and outside stakeholders.
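To make the mechanics concrete, the sketch below derives cut scores from a target distribution for one assessment and then assigns levels. It assumes a hypothetical five-level scale with a 10/20/40/20/10 percent distribution and one growth score per teacher; ties at a cut score move up a level, so actual shares can differ slightly from the targets. Shifting every score down, as a harder test might, leaves the level counts unchanged.

```python
from bisect import bisect_right

TARGET_SHARES = [0.10, 0.20, 0.40, 0.20, 0.10]  # hypothetical, lowest level first


def target_cut_scores(scores, shares=TARGET_SHARES):
    """Minimum score for levels 2..n so about `shares` of teachers fall in each level."""
    ordered = sorted(scores)
    n = len(ordered)
    cuts, cumulative = [], 0.0
    for share in shares[:-1]:
        cumulative += share
        cuts.append(ordered[min(n - 1, round(cumulative * n))])
    return cuts


def level_counts(scores, cuts):
    """How many teachers land in each level, lowest first."""
    counts = [0] * (len(cuts) + 1)
    for score in scores:
        counts[bisect_right(cuts, score)] += 1
    return counts


scores = list(range(41, 51))           # ten teachers' growth scores
harder_test = [s - 6 for s in scores]  # same teaching, tougher exam
print(level_counts(scores, target_cut_scores(scores)))            # [1, 2, 4, 2, 1]
print(level_counts(harder_test, target_cut_scores(harder_test)))  # [1, 2, 4, 2, 1]
```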

A target distribution can also promote budget sustainability if strategic compensation is included in the system design, because it allows budgeting models to predict the number of teachers who could receive salary increases based on potential increases in performance ratings.

Fixed Cut Points: Fixed cut points use pre-determined cut scores for each assessment and each overall performance level. While this method is easier for teachers to understand, providing a goal or target for their individual performance, it does not allow a district to adjust those targets if the outcomes of the student achievement component are different than expected. For example, if standards change for a STAAR exam, the exam may become harder and fewer teachers than expected may reach the higher performance levels.
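For comparison, the sketch below applies hypothetical fixed cut points (on an illustrative 0-100 scale) to the same teachers in two years. When a harder exam lowers every score by a few points, the cut points stay put and the number of teachers reaching the top level drops, which is the STAAR scenario above in miniature.

```python
from bisect import bisect_right

FIXED_CUTS = [40, 55, 70, 85]  # hypothetical minimum scores for levels 2-5


def level_for(score, cuts=FIXED_CUTS):
    """Performance level 1-5 for a score, using pre-determined cut points."""
    return bisect_right(cuts, score) + 1


def level_counts(scores, cuts=FIXED_CUTS):
    """How many teachers land in each level, lowest first."""
    counts = [0] * (len(cuts) + 1)
    for score in scores:
        counts[level_for(score, cuts) - 1] += 1
    return counts


last_year = [45, 60, 72, 86, 90, 58, 74, 88]
harder_exam = [s - 6 for s in last_year]  # same teaching, tougher test
print(level_counts(last_year))    # [0, 1, 2, 2, 3]
print(level_counts(harder_exam))  # [1, 2, 2, 3, 0]
```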

Fixed cut points at the performance level can present a challenge for budget forecasting if strategic compensation is included in the system design. With fixed cut points, it is mathematically possible for all teachers to achieve the highest performance level and therefore the highest compensation level. If fixed cut points are used, other implementation parameters can help ensure fiscal sustainability, such as a cap on salary increases or a limit on the number of performance levels a teacher may advance each year.
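The budget difference can be sketched with simple arithmetic. Assuming hypothetical raise amounts per level and 1,000 teachers, a 10/20/40/20/10 target distribution fixes the cost in advance, while fixed cut points are bounded only by the case where every teacher reaches the top level. A one-level-per-year limit narrows that bound; the sketch assumes last year's levels matched the target distribution.

```python
RAISE_BY_LEVEL = {1: 0, 2: 500, 3: 1_500, 4: 3_000, 5: 5_000}  # hypothetical dollars


def raise_budget(level_counts):
    """Total cost of raises for {level: number_of_teachers}."""
    return sum(RAISE_BY_LEVEL[level] * count for level, count in level_counts.items())


def worst_case_one_level_limit(current_counts, top_level=5):
    """Costliest outcome if every teacher advances at most one level."""
    moved = {}
    for level, count in current_counts.items():
        new_level = min(level + 1, top_level)
        moved[new_level] = moved.get(new_level, 0) + count
    return raise_budget(moved)


target = {1: 100, 2: 200, 3: 400, 4: 200, 5: 100}  # 10/20/40/20/10 of 1,000 teachers
print(raise_budget(target))                # 1800000: known in advance
print(raise_budget({5: 1_000}))            # 5000000: fixed cuts, everyone at the top
print(worst_case_one_level_limit(target))  # 3050000: fixed cuts with a one-level limit
```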

Spotlight: Dallas ISD and IDEA Public Schools


Dallas ISD uses a target distribution in its evaluation system to ensure that the system gives all educators an equal chance for success, no matter the grade or subject area. The district starts with a target distribution for student achievement, which is then applied to each assessment to determine the cut points. In contrast, IDEA Public Schools uses fixed cut points in its evaluation system, establishing cutoffs for each assessment based on historical data. To budget for the bonuses associated with achieving certain rating levels, IDEA again uses historical data and ensures enough budget cushion to allow for any variances.

Calibration of Observers

Calibration is one of the most important and challenging elements of implementation.

When determining the administrator calibration methods, districts should consider the following questions:

What are the district’s calibration/observation best practice recommendations?
How aligned are the outcomes of the observation component to the District’s definition of Excellence?
How aligned are the outcomes of the observation component to the student growth components?
How can inter-rater reliability be included as a part of the evaluator’s own observation?

Lesson Learned

Due to the large part observations play in many evaluation systems, and the tendency for scores to be inflated over time, a district needs to provide initial training to ensure calibration as well as take ongoing action to maintain consistent scoring.

One example of ongoing training is calibration exercises (e.g., watching and rating videos together, or observing teachers and debriefing) with principals multiple times during the year. In addition to principals, principal supervisors should receive calibration training at the beginning of and throughout the year. Finally, campus calibration walks throughout the year with people from outside the campus can help ensure principal calibration; research shows that outside observers are least likely to bring bias to observation ratings. Districts should monitor and audit observation data regularly to identify observers who may not be normed and to investigate possible bias or inflation.

Spotlight: IDEA Public Schools


At IDEA Public Schools, Regional Directors analyze observation ratings data on a weekly basis via reports produced by their data platform. They look at how observers are rating and whether anyone is rating consistently high or low. If there are any flags in the data, Directors follow up with school leaders to understand what is happening at the school and explore whether additional training or support is needed for the observers.
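A simple version of that kind of check might compare each observer's average rating with the average across all observers and flag large gaps. The sketch assumes ratings arrive as (observer, rating) pairs on a 1-5 scale; the 0.5-point threshold and five-observation minimum are hypothetical. A flag is a prompt for a follow-up conversation with school leaders, not evidence of bias on its own.

```python
from statistics import fmean


def flag_observers(ratings, threshold=0.5, min_obs=5):
    """Flag observers whose average rating sits far above or below the overall average.

    ratings: list of (observer_id, rating) pairs from the observation data.
    """
    by_observer = {}
    for observer, rating in ratings:
        by_observer.setdefault(observer, []).append(rating)
    overall = fmean(rating for _, rating in ratings)
    flags = {}
    for observer, observed in by_observer.items():
        if len(observed) < min_obs:
            continue  # too few observations to judge a pattern
        gap = fmean(observed) - overall
        if gap > threshold:
            flags[observer] = "consistently high"
        elif gap < -threshold:
            flags[observer] = "consistently low"
    return flags


ratings = (
    [("A", r) for r in (4, 4, 5, 4, 5)]
    + [("B", r) for r in (3, 3, 3, 3, 3)]
    + [("C", r) for r in (2, 2, 2, 3, 2)]
    + [("D", r) for r in (5, 5)]
)
print(flag_observers(ratings))  # {'A': 'consistently high', 'C': 'consistently low'}
```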

Differentiated Compensation

Developing a differentiated compensation system based on the evaluation results requires careful consideration and preparation so that employees understand and embrace the transition to a new pay structure.

Increasing Base Salary or Providing a Stipend: A district may approach differentiated compensation by changing its base salary structure or by paying a stipend in addition to a traditional pay structure. A change to base salary signals a long-term commitment by the district, but may require implementation rules, such as a target distribution, to ensure the district can sustain the changes financially. A stipend allows a district to adjust the amounts each year based on its available budget. However, this approach may reinforce a perception among teachers that the program is temporary, like other "pay for performance" stipends in the past.

Setting a Salary Floor: A district can set a salary floor so that an employee who was with the district prior to the change in compensation will not see their salary decrease. This decision will affect how teachers whose prior salaries were based on years of experience and/or degrees are hired and paid as they transition into a new district’s pay structure. A district may also consider providing a salary floor based on a teacher's starting salary with the district, or may choose not to offer a salary floor, meaning the teacher’s salary will adjust based on their first year of performance in the district.

Setting a Salary Increase Cap: As the district transitions from a traditional pay scale, a cap on increase in salary each year can help ensure financial stability and predictability. Some districts have used a $5,000 cap on an increase each year for the first two years of a new compensation system.

Compensation Level Movement: Implementing rules that limit fluctuation in effectiveness levels will give teachers a stable annual salary. With the addition of more complex student achievement measures, the final evaluation scorecards may not be available until late summer or early September, so teachers will need to be able to anticipate their salary for the coming year based on implementation parameters. Here are some examples of parameters:

Limits on compensation levels for early-career teachers: Early-career teachers must begin at the first level and may advance one level per year.
Limits on the number of compensation levels a teacher can advance in one year: Teachers may advance one level per year. Teachers who are hired with three or more years of teaching experience will not have a level assigned until they have completed their first year in the district.
Limits on how many years of lower performance lead to a lower compensation level: Teachers with multiple years of lower performance will be lowered one level.
Averaging two years of performance to produce a compensation level: When multiple years of evaluation data are available, the average of the last two years will be used to determine the compensation level.
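The upward-movement parameters above, together with the salary increase cap, can be expressed as a few small rules. The sketch below is one possible reading, assuming a 1-5 level scale, ratings listed oldest first, and that the two-year average is rounded down; the salary figures are hypothetical, and the $5,000 cap mirrors the example given earlier. Downward movement is a separate rule and is not modeled here.

```python
import math

MAX_ANNUAL_INCREASE = 5_000  # example cap on yearly salary increases


def next_level(current_level, ratings, top_level=5):
    """Level for next year: average the last two ratings, advance at most one level.

    ratings: evaluation levels by year, oldest first. If only one year exists,
    that single rating is used. Decreases are handled by a separate rule.
    """
    if not ratings:
        return current_level  # no evaluation data yet
    recent = ratings[-2:]
    earned = math.floor(sum(recent) / len(recent))  # assumption: round the average down
    if earned > current_level:
        return min(current_level + 1, top_level)
    return current_level


def capped_salary(current_salary, scheduled_salary, cap=MAX_ANNUAL_INCREASE):
    """Pay the scheduled salary, but never raise it more than `cap` in one year."""
    return min(scheduled_salary, current_salary + cap)


print(next_level(2, [4, 5]))          # 3: earned level 4, but only one step per year
print(next_level(3, [3, 4]))          # 3: average 3.5 rounds down to 3
print(capped_salary(52_000, 60_000))  # 57000
print(capped_salary(52_000, 54_000))  # 54000
```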

Spotlight: Dallas ISD


The compensation program in Dallas ISD bases teacher pay on evaluation level. One parameter set to control fluctuation is that a teacher’s pay will only decrease after three consecutive years of a lower evaluation rating, and then only by one level in the fourth year. Finally, the district has established a salary floor below which a teacher's salary can never drop.
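A sketch of how parameters like these might be encoded. It assumes that a "lower evaluation rating" means a rating below the teacher's current compensation level, and the salary schedule and floor amounts are hypothetical; this is an illustration, not Dallas ISD's actual calculation.

```python
def level_after_lower_ratings(current_level, ratings, consecutive_years=3, bottom_level=1):
    """Drop one level only after `consecutive_years` straight years rated below the current level.

    ratings: evaluation levels by year, oldest first; the result applies to the following year.
    """
    recent = ratings[-consecutive_years:]
    if len(recent) == consecutive_years and all(r < current_level for r in recent):
        return max(current_level - 1, bottom_level)  # never more than one level at a time
    return current_level


def salary_with_floor(level, schedule, floor):
    """Scheduled salary for the level, but never below the teacher's salary floor."""
    return max(schedule[level], floor)


SCHEDULE = {1: 52_000, 2: 55_000, 3: 58_000, 4: 62_000, 5: 68_000}  # hypothetical

print(level_after_lower_ratings(4, [3, 3, 3]))       # 3: three lower years, drop one level
print(level_after_lower_ratings(4, [3, 4, 3]))       # 4: the streak was broken
print(salary_with_floor(3, SCHEDULE, floor=60_000))  # 60000: floor applies
```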

Financial Information

There are a number of ways to "find" existing dollars in a district’s general operating budget to fund a strategic evaluation system. Districts that have previously given a flat increase to all employees may consider using those dollars to fund the new strategic compensation approach. Some additional examples used by districts include:

Prioritizing the funding of strategic initiatives through cuts across the board by department (i.e., have each department prioritize their spending, and then as a district leadership team, determine what spending is not necessary or not a priority).
Analyzing all existing professional development contracts, particularly those under the umbrella "approved vendor" list, to determine which vendors have programs that can show positive data for their performance and which cannot. This exercise has uncovered many expenditures that had little to no known impact on students or campuses.

Lesson Learned

Prior to embarking on strengthening or creating a new evaluation system a district should assess its current capacity to collect, analyze and track the data necessary to determine accurate and understandable evaluations.

Many districts use partners to support them. For example, Lubbock ISD partners with SAS EVAAS and Battelle for Kids to aid in its value-added model and stipend awards.

Compensation/Budget Director: This person creates budget models for compensation scenarios. This is an existing role within districts.
Communications: This team assists with stakeholder engagement and creates resources and forums to share information; this is a critical component of successful implementation. Communications is typically an existing department within a district, and it is recommended to include this department to assist with the efforts.
School Leadership: These individuals serve as champions for the evaluation system. They help communicate the "WHY" to principal and teacher stakeholders and oversee implementation of the observation rubric, specifically the calibration of observers. Additionally, they participate in the continuous improvement process.

Lesson Learned

Engaging with school leadership early on in the process is important to ensure their input is included in the design and that they understand and support the evaluation system.

Many districts gather principals early in the process. The principals are then able to advocate for change.

Legal/Employee Relations: These individuals participate in modifying the existing system or designing a new one to ensure compliance with local and state regulations. They provide guidance on implementation and individual components and update district policies related to staffing decisions (if applicable). This is a decision point for the district: how will performance evaluations be used in hiring decisions? It can also be an opportunity to change policies that use seniority as a determining factor in compensation, and to include performance on the evaluation system when considering staff reductions and leveling decisions (when actual enrollment is not aligned with expected enrollment).
Employee Relations: Any change to a performance evaluation system should include close coordination with both the employee relations department and the district’s legal department to ensure compliance with district policy and the state education code. Changes could also result in an initial increase in grievances or lawsuits. Including employee relations and legal in the design and implementation can reduce the district's exposure.

Lesson Learned

A district should not rely on philanthropy or grants to fund increases to teacher salaries.

It is important to make a concrete commitment to educators that their salaries are funded out of the general fund and can continue to be funded by the district.

TEA - Teacher Incentive Allotment

The Texas Legislature passed a sweeping school finance reform bill, House Bill 3, in the spring of 2019, which includes a Teacher Incentive Allotment (TIA) that unlocks funds for districts. For additional information on TIA and other components of House Bill 3, please visit the TEA webpage.

Organizational Capacity

As with any district initiative, the capacity of district departments to execute this work is critical for success. The performance evaluation team is frequently housed within Human Resources, supporting the current evaluation system(s). As changes to the system are considered, the supporting team may need to grow and will need to expand beyond the Human Resources department to include the following roles:

Evaluation System Director: This person will serve as the champion for a strong multi-measure evaluation system, ensuring close coordination of the district departments involved in the work. They will oversee execution of the evaluation cycle and lead the continuous improvement cycle for the program.
Evaluation & Assessment Data Director: This person oversees the calculation of student achievement and growth measures, the student survey administration and the production of overall evaluation scores. Depending on its capacity, a district may choose to outsource this role and its responsibilities to organizations such as ERG or SASS.


Implementation Leadership Team

In addition to broad engagement with most departments in the district, it is critical to identify a subset of leaders, the implementation leadership team, and charge them with ensuring a coherent and smooth rollout of changes. The implementation leadership team should include the following members:

Project Lead – a person with the authority (either by existing role or Superintendent designee) to make decisions to ensure the process moves thoughtfully and at the appropriate speed.
Superintendent of Schools – It is imperative that the Superintendent is the public champion of the initiative to change the evaluation system, repeatedly reinforcing why multi-measure evaluation and strategic compensation are a priority for the district.
Superintendent Cabinet (Chiefs of every department) – The evaluation system has implications for every aspect of teaching and learning and therefore, each department chief, particularly those in charge of human capital, school leadership, assessment, communications and operations/finance will play a critical role in rolling it out and in reiterating the reasons for implementing changes. Aligned efforts and a united front will trickle down to all staff/teachers and pave the way for smoother implementation.
Campus Experts/Representatives – Because evaluation systems are implemented by principals and teachers, it is valuable to have a representative or champion at each campus who understands and believes in the evaluation system and can support his or her colleagues to implement it with fidelity and use the system to improve feedback and professional development. These representatives can also provide valuable feedback to district leaders about how the system is or is not working well and what tweaks might improve the experience at the campus level. This can be a multi-year, stipend position.

Lesson Learned

Bringing in data and research team members early on is critical for making sure the data generated by the evaluation system can be translated into accessible information and acted upon at the school and district level.

The evaluation data should be analyzed every year to surface systemic weaknesses or biases in the system. In Dallas, the evaluation team includes the Office of Institutional Research, which provides the scorecards that help teachers and leaders understand the data.

Continuous Improvement

A hallmark of a successful strategic evaluation system is the ability to iterate and improve the system over time. After the initial implementation, a district must remain committed to assessing and improving the system based on evaluation data and feedback from educators and leaders. This requires a district to create a process for continuous improvement that is clear to all stakeholders. Through this process, feedback and data should continuously inform improvements and adjustments.

A district can obtain feedback and identify areas for improvement through methods that include focus groups, surveys, collaborative committees/panels and stakeholder meetings. In addition to feedback from educators, it is important for the district to understand the impact and efficacy of the system based on data. Districts should regularly review data to identify schools where student achievement scores and observation ratings are not aligned. This information should be used to make necessary system-level adjustments as well as inform where additional training and support may be needed. The ongoing analysis of evaluation data can also help mitigate potential bias in observations. If district resources are limited, an outside partner, such as a university or a consulting organization, can provide additional capacity to solicit and incorporate feedback.
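As one illustration of that review, the sketch below computes, for each school, the average gap between teachers' observation levels and their student achievement levels, and lists schools where the gap exceeds a hypothetical threshold of one level. It assumes both components are reported on the same 1-5 scale; a large positive gap may point to inflated observation ratings, but it only tells the district where to look more closely.

```python
from statistics import fmean


def misaligned_schools(records, max_gap=1.0):
    """Schools whose average (observation - achievement) gap exceeds max_gap levels.

    records: (school, observation_level, achievement_level) tuples, one per teacher.
    """
    gaps_by_school = {}
    for school, observation, achievement in records:
        gaps_by_school.setdefault(school, []).append(observation - achievement)
    flagged = {}
    for school, gaps in gaps_by_school.items():
        average_gap = fmean(gaps)
        if abs(average_gap) > max_gap:
            flagged[school] = round(average_gap, 2)
    return flagged


records = [
    ("North", 4, 4), ("North", 3, 3), ("North", 5, 4),
    ("South", 5, 3), ("South", 4, 2), ("South", 5, 4),
]
print(misaligned_schools(records))  # {'South': 1.67}
```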

Spotlight: Denver Public Schools


Denver Public Schools uses feedback from teachers, principals and other stakeholders to continuously improve its evaluation system. Through a variety of channels, such as surveys and a collaborative council that includes individuals from the district and the union, Denver Public Schools obtains feedback and adjusts the system accordingly. For example, in 2015 the district implemented incentives for educators in the highest-priority schools based on recommendations from teachers. The district also adjusted the scoring model for student perception data in teacher evaluations based on teacher feedback; teachers are now compared to similar types of teachers when their student perception data are scored.