Executive Abilities: Measures and Instruments for Neurobehavioral Evaluation and Research (EXAMINER)

User Manual 3.6

Contents


Acknowledgements

The EXAMINER project represents the combined efforts of many individuals to whom we are very grateful.

Our external advisers were generous with their time and wisdom throughout the project. Kimberly Espy, Josette Harris, Steven Hinshaw, Robert Knight, David Knopman, Paul Malloy, Jennifer Manly, Dan Mungas, Ron Ruff, Tim Salthouse, Donatella Scabini, and Phil Zelazo all made immensely valuable contributions.

The team at NIH, Emmeline Edwards, Helene Braun, and Laurie Leonard, provided able stewardship, guidance, and assistance, helping us navigate through a great many scientific and administrative challenges.

We are extremely grateful to the PIs and coordinators at all of the data collection sites, including Ramon Diaz-Arrestia, Kimberly Espy, Robert Knight, Dan Mungas, Celiane Rey-Casserly, Jeffrey Schatz, Glenn Smith, Gerry Taylor, and Daniel Tranel. We were fortunate to collaborate with so many gifted, generous, and capable colleagues.

The UCSF team members each contributed their own unique brand of creativity, hard work, and brilliance to make this project happen. Special thanks go to Blaire Benavides, Caroline Becerra, Ashley Berhel, Lauren Gritzer, Caroline Prioleau, Lena Sinha, Susan Verde, and Mary Widmeyer. Katherine Possin was instrumental in all aspects of our working memory tasks, and Katherine Rankin took the lead on the social cognition measures. Adam Boxer played a key role in our eye movement task. Howard Rosen assisted with the insight measures. Katrin Schenk inspired novel analytic approaches to random number generation. Bruce Miller, Ari Green, and Chad Christine facilitated subject recruitment and offered ongoing advice. Kathleen Drew was the backbone of our administrative team, and John Neuhaus and Alan Bostrom provided valuable biostatistical input at every step. Joe Hesse and Charlie Toohey built the best data management system imaginable and programmed the computer tasks and scoring systems.

Special thanks go to Dan Mungas, who, in addition to being an adviser and a site PI, led our efforts to use item response theory for scale construction.

Finally, we are indebted to the patients and control subjects who willingly underwent hours of testing, and to their families, who also assisted in many ways. Measurement tools are built to help us understand the effects of disease and aging, and our patients remain our ultimate inspiration.



Chapter 1. Mission

In May 2005, the NINDS issued a call for proposals to develop domain-specific methods for defining and measuring executive functioning. Our team at the University of California, San Francisco, led by Joel Kramer, PsyD, was fortunate to be awarded the contract. The project, entitled Executive Abilities: Measures and Instruments for Neurobehavioral Evaluation and Research (EXAMINER), commenced on December 30, 2005.

Executive function refers to a constellation of cognitive abilities that include planning, organizing, self-monitoring, and managing multiple tasks simultaneously. Despite its importance for clinical and neuroscience research, the paucity of valid and reliable tasks that specifically tap executive functioning remains a significant obstacle to research in this area.

An executive function battery should enable clinical investigators to assess executive functions reliably and validly across a variety of ages and disorders in cross-sectional and longitudinal studies. Such a battery should reflect both cognitive and non-cognitive behaviors and bear some relationship to day-to-day executive functioning. In addition to these underlying themes, we believe that a successful executive function battery should have the attributes described below.

Modular. Executive functioning encompasses multiple domains of behavior, such as generation, set-shifting, working memory, inhibition, concept formation, and social behavior. Separate modules that allow quantification of each relevant domain should be available to researchers for individual or collective use, as needed.

Modifiable. The specific needs of individual clinical investigations can vary quite considerably in terms of the types of executive tasks, the age and level of impairment of the study sample, and the overall study design. Any standard battery of executive tasks must be flexible enough to be adapted to a range of experimental and clinical situations.

Efficient. In most research settings, executive functioning will be only one of several data points required. To be maximally useful, an executive function battery will need to be very efficient, providing reliable and valid test data in as brief a period as possible.

Applicable to a broad range of subjects in terms of age and ethnicity. Very few standardized neuropsychological instruments are available that can be used with both pediatric and adult populations. In addition, executive tasks, like most cognitive measures, tend to be culture- and language-specific, and their utility for administration in other languages has not been established. As a result, measurement bias is a central concern in the cross-cultural application of neuropsychological tests. Executive measures should detect cognitive change, and an unbiased measure should be equally sensitive to change in individuals from different age and ethnic groups.

Psychometrically robust. Extant executive measures are often criticized for low or unmeasured reliability, questionable relationships with frontal lobe injury or real-world behavior (Heflin et al., 2011; Stuss et al., 1995), or skewed distributions. Good psychometric properties are a key element for measures designed for clinical trials, as is the availability of equivalent alternate forms.



Chapter 2. Project Structure

EXAMINER was developed in two general phases. Phase I lasted 2 years and focused on battery development. In the first year of Phase I, the UCSF team was built, and a website was created to facilitate communication with NIH and the public (examiner.ucsf.edu). The UCSF team extensively reviewed the literature on executive functioning and posted the review on the website. A team of external advisers was formed, with the first meeting held in San Francisco on April 24, 2006. Finally, the advisers and experts in the field were surveyed on what they felt were the highest priorities for battery development. These results are also summarized on the website.

Priorities identified by the NINDS, the external advisers, and the survey of experts were to: 1) have a brief (30–40 minute) battery designed for clinical trials with alternate forms and a single composite score; 2) create a menu of tasks from which investigators can select to meet specific research goals; 3) use non-copyrighted tasks that NIH could distribute freely; and 4) validate the battery by demonstrating a relationship with real-world markers.

During Phase I’s second year, attention shifted toward defining the conceptual framework for the EXAMINER battery, selecting extant executive paradigms from the research and clinical literature, and developing novel tasks. Extensive piloting was conducted at UCSF and UC Davis (Dan Mungas, PI), and tasks were continually revised. Record forms, test stimuli, software for computerized tasks, and training materials were created. Translation of test materials was carried out by a professional translation service, with back-translation. Traditional neuropsychological measures such as Trail-Making, Stroop Interference, WAIS-III Digit Symbol, D-KEFS Design Fluency, and the Wide Range Achievement Test-III Reading subtest were added to the battery as control measures. The Frontal Systems Behavior Scale™ (FrSBe) and the Behavior Rating Inventory of Executive Function® (BRIEF), copyrighted informant-based questionnaires, were added as measures of day-to-day executive functioning and behavior. Concurrently, the information technology team under the direction of Joe Hesse began work on a web-based data management system for use during the data collection phase. Finally, subcontract sites for data collection were identified, and the contracts and grants process was initiated to enable sites to begin data collection in January 2008. A second advisory meeting took place in San Francisco on November 1, 2007. Additional advisory meetings took place on June 24, 2009, and February 9, 2011.

Phase II was initially designed for 2 years of data collection; a third year was later added, and EXAMINER exceeded its original recruitment goals. Data collection is described in more detail in Chapter 6. Several approaches to data reduction were piloted, and item response theory was ultimately selected as the best method for generating a smaller set of meaningful scores with ready application in research and clinical trial settings.



Chapter 3. Background

Executive abilities, widely accepted as a central component of human cognition, reflect the capacity to engage in goal-directed behavior. To evaluate executive abilities, clinicians have tended to emphasize constructs like fluency, working memory, concept formation, set shifting, and inhibition, using a range of cognitive measures. Test batteries specifically designed to measure executive functioning have been developed, including the Frontal Assessment Battery (FAB) (Dubois, Slachevsky, Litvan, & Pillon, 2000), Executive Interview (EXIT25) (Royall, Mahurin, & Gray, 1992), Behavioral Assessment of Dysexecutive Syndrome (Wilson, 1996), Cambridge Neuropsychological Test Automated Battery (CANTAB) (Robbins et al., 1994), and the Delis-Kaplan Executive Function System (Delis, Kaplan, & Kramer, 2001). Espy (Espy & Cwik, 2004; Espy, Kaufmann, & Glisky, 2001; Espy, Kaufmann, McDiarmid, & Glisky, 1999; Espy et al., 2002) and others (D. C. Delis et al., 2001; Delis, Kramer, Kaplan, & Holdnack, 2004; Korkman, Kemp, & Kirk, 2001) have thoughtfully extended assessment of executive function to younger children. Methods for assessing frontally mediated neuropsychiatric symptoms and dysexecutive syndromes have also been developed (Cummings et al., 1994; Malloy & Grace, 2005).

Despite the proliferation of executive measures, there is little agreement about what the primary executive abilities are, how they are organized, what the underlying neuroanatomy is, or how they should best be measured. There are also no widely accepted unified models of executive functions. Nonetheless, several key concepts have been proposed, with significant influence on our understanding of executive control. For example, Shallice has proposed that the frontal lobes organize a Supervisory Attention System that distinguishes between routine tasks for which contention scheduling is sufficient and novel problems that require more top-down control (Shallice & Burgess, 1996). Stuss et al. (1995) have refined and extended this model to incorporate a range of anterior attentional functions. Miyake and colleagues (2000) reported three different executive functions (shifting, inhibition, and updating) that were modestly correlated but separable. Updating, as Miyake views it, overlaps with the broader concept of working memory.

Models of working memory have included a “central executive”, mediated largely by dorsolateral prefrontal cortex (Baddeley, 2002; Baddeley & Della Sala, 1996), and sub-processes that include storage/maintenance, rehearsal, interference control, inhibition, and scanning functions (D'Esposito et al., 1995; D'Esposito et al., 1999).

Decision-making, reward processing, self-regulation, and inhibition are additional key components of executive functioning that have been studied experimentally, and their relationships with medial, ventral, and dorsolateral prefrontal structures are being defined (Bechara, Damasio, Tranel, & Anderson, 1998; Levine et al., 2000; McDonald, Ko, & Hong, 2002; Shallice & Burgess, 1991).

It has also become clear that simple paper-and-pencil tasks will not always capture real-life social and executive deficits. Tasks that capture deficits in executive control in social cognition are needed.

The psychometric properties of executive tasks pose yet another challenge for clinical investigators seeking to measure executive functioning. Construct validity refers to how well an instrument measures what it purports to measure. Importantly, clinical neuropsychological instruments have been criticized for being multifactorial, drawing on several non-executive component skills. In fact, listed among the top 20 “executive tasks” in a survey of neuropsychologists by Rabin et al. (2005) were a memory task (CVLT) and visuospatial tasks such as clock drawing, Rey-Osterrieth Complex Figure, and block design. While no one would argue that executive skills were irrelevant to performing these tests, it is not possible to untangle the various component skills. Not surprisingly, the ability of these tasks to differentiate between frontal and non-frontal patients is fraught with error (Anderson et al., 1991; Berman et al., 1995; Dunbar & Sussman, 1995; Manchester, Priestley, & Jackson, 2004).

Another psychometric issue is test-retest reliability. This has particular importance for clinical trials where researchers must be able to attribute change in cognition to the intervention, and not poor reliability or practice effects (Beglinger et al., 2005; Bowden, Benedikt, & Ritter, 1992).

Most current measures of executive function also have limited cross-cultural application. Often the stimuli require reasonable mastery of English (e.g., Similarities, Stroop, D-KEFS Card Sorting), or they are culturally based (e.g., proverb interpretation). Executive tasks are also highly correlated with education, and educational levels vary across ethnic groups. Even when tasks appear to be readily translatable, assessment of differential item functioning has revealed item bias (Marshall et al., 1997). While progress is being made (Chan et al., 2002; Chan & Manly, 2002; Chan et al., 2003; Mungas et al., 2004; Rodriguez del Alamo et al., 2003), the shortage of validated clinical measures that are applicable across ethnic and language groups poses a major obstacle to clinical research.

In sum, despite the wealth of available instruments, there are continued concerns about psychometric properties, validity, applicability to settings and populations other than those for which the tests were developed, suitability for all ages and non-English-speaking subjects, and adaptability for clinical trials. There is also no consensus on the primary components of executive functioning or on how they should be operationalized. There remains a compelling need for a battery of tests that can be routinely integrated into neurobehavioral research and reliably and validly measure constructs that clinical investigators agree are important.



Chapter 4. Framework for Current Battery

This proposal to develop psychometrically robust measures of executive function was guided by three basic premises that influenced the initial stages of task selection. First, the term “executive function” is broad and requires breaking down into smaller conceptual units. Second, because executive abilities are measured using tasks that require multiple abilities, methods are needed to parse the executive component from other skills. Finally, executive function encompasses both cognitive and non-cognitive behaviors. A multimodal approach using cognitive and observational methods is necessary to capture the broad range of deficits seen in patients with executive dysfunction.

Sub-Components of Executive Function

The first underlying premise is that the term “executive functioning” is an overarching rubric that encompasses multiple domains that are mediated by different neural structures and networks. Various executive abilities include maintaining and manipulating information, temporal organization, set shifting, self-monitoring, concept formation, fluency, inhibition, motivation, organization, and planning. There is no single “executive function” that investigators can turn to when conducting clinical studies. In many instances, patients with deficits in executive functioning will perform well on certain domains but poorly on others.

The Role of Non-Executive Skills

The second underlying premise is that these executive abilities are typically measured using heterogeneous tasks that require multiple non-executive skills. Most tasks designed to assess some aspect of executive functioning also involve varying degrees of information processing speed, working memory, motor speed, language processing, spatial processing, and fundamental perceptual and motor skills.

Broad-based and Novel Strategies

The third underlying premise of this proposal was that a comprehensive approach to executive functioning requires broad-based and novel assessment strategies to capture cognitive and non-cognitive behaviors.

Current Framework

Our ultimate approach to developing the EXAMINER battery was to integrate the cognitive literature on executive functioning with the clinical literature on the sequelae of frontal injury. We selected Miyake’s model as the core conceptual structure for battery design, and targeted tasks that measured mental set shifting, information updating and monitoring, and inhibition of pre-potent responses. To this core set of constructs we added fluency, planning, insight, and social cognition and behavior.



Chapter 5. Description of Tasks

This chapter provides a descriptive overview of each EXAMINER measure by domain, along with a brief discussion of the administration time required. We include all the tasks for which we collected data during the field testing phase, even if they were not included in the final battery. In addition, after data were collected on the first 800 subjects, several tasks were carefully analyzed to see if they could be shortened without losing any information. Any modifications to the tasks are also described.

Domain: Working Memory

Dot counting

The dot counting task measures verbal working memory. The examinee is asked to look at a screen with a mixed array of green circles, blue circles, and blue squares. The examinee is asked to count all of the blue circles on the screen one at a time, out loud, and remember the final total. Once the examinee finishes counting the blue circles on one screen, the examiner switches the display to a different mixed array of green circles, blue circles, and blue squares. The examinee is instructed to count the blue circles in the new display. The number of different displays presented to the examinee in each trial increases from two to seven over six experimental trials. After counting the blue circles on all of the displays presented within a trial, the examinee is asked to recall the total number of blue circles that were counted in each of the different displays in the order in which they were presented. Partial credit is given based on how many totals the examinee can recall correctly from each trial.

In an effort to reduce patient burden and administration time, the number of trials administered was reduced by half. Initially this task consisted of two trials at each series length, two through seven. The mean percent correct for all 12 trials (out of 54 total points) was highly correlated with the mean percent correct for the first trial at each series length (out of 27 total points), so we eliminated the second of the two trials for each series length. This reduced the task from 12 trials to six, with possible scores ranging from 0 to 27.
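
The half-length check described above can be illustrated with a short simulation. This is a hypothetical Python sketch, not the project's analysis code: the simulated examinee scores, latent-accuracy model, and sample size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: per-trial points for 200 simulated examinees.
# Series lengths run 2 through 7 with two trials each, so the maximum
# points per trial are 2, 2, 3, 3, ..., 7, 7 (54 points across 12 trials).
max_points = np.repeat(np.arange(2, 8), 2)        # shape (12,)
ability = rng.uniform(0.3, 1.0, size=(200, 1))    # latent accuracy per examinee
scores = rng.binomial(max_points, ability)        # shape (200, 12)

# Percent correct for the full 12-trial task (54 points) versus the
# first trial at each series length only (27 points).
full_pct = scores.sum(axis=1) / 54.0
half_pct = scores[:, ::2].sum(axis=1) / 27.0      # one trial per length

r = np.corrcoef(full_pct, half_pct)[0, 1]
print(f"full vs. half-length correlation: r = {r:.3f}")
```

Because the half-length score is built from six of the full task's 12 trials and both scores track the same underlying ability, the two correlate very highly, which is the pattern that justified dropping the second trial at each series length.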

N-back

The n-back paradigm is a widely used measure of working memory that requires flexible updating capabilities. EXAMINER includes spatial 1-back and 2-back tasks to assess spatial working memory. The 1-back requires maintaining and updating one location at a time, whereas the more difficult 2-back requires maintaining and updating two locations.

During both the 1-back and the 2-back, the examinee is shown a series of 2.4 cm white squares that appear in 15 different locations on a black screen. Each square is presented for 1000 msec. All of the locations are equidistant from the center of the screen. During the 1-back, the examinee is instructed to press the left arrow key whenever the square is presented in the same location as the previous one and the right arrow key if the square is presented in a different location from the previous one. Responses should be given as quickly as possible while trying to maintain accuracy throughout the trials. The next square appears on the screen after each response is given. A number (varying from 1–9, selected randomly) appears in the center of the screen 500 msec after each response and remains on the screen for 1000 msec. The examinee should say this number out loud immediately when it appears on the screen, before responding to the next square. The 1-back consists of one block of 30 trials, ten of which match the location of the previous square and 20 of which are in a different location. The stimuli are presented in a fixed order for all participants.

During the 2-back, the examinee is instructed to press the left arrow key whenever the square is presented in the same location as the square two squares before and the right arrow key if the square is presented in a different location from the square two before. The 2-back consists of one block of 90 trials, 30 of which match the location of the square two before and 60 of which are in a different location. The squares are presented in a fixed order for all participants.

Domain: Inhibition

Flanker

The examinee is instructed to focus on a small cross in the center of the screen. After a short variable duration (1000 msec–3000 msec), a row of five arrows is presented in the center of the screen either above or below the fixation point. The duration of the stimulus presentation for each trial is 1000 msec.

The examinee is then required to indicate whether the centrally presented arrow is pointing to the left or the right by pressing the left or right arrow key. The examinee is presented with two different conditions during the task, congruent and incongruent. In the congruent trials, the non-target arrows point in the same direction as the target arrow, and in the incongruent trials they point in the opposite direction. Examinees should respond as quickly and accurately as possible. The stimuli are presented in a random order, with each condition presented 24 times, resulting in 48 total trials.

Initially the Flanker task consisted of 64 experimental trials, 32 congruent and 32 incongruent. To determine how many trials could be eliminated without compromising precision, we ran correlations between the median RTs for k trials (where k is 1 through 32) and the median RTs for all 32 trials. This was done separately for the congruent and incongruent conditions. At k = 24, the median RTs for both conditions correlated greater than .95 with the median RTs for the full 32 trials. In addition, the difference in median RT between a 24-trial task and a 32-trial task was negligible. This led us to eliminate 8 trials from each condition, reducing the total number of trials from 64 to 48.
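
The trial-reduction analysis can be sketched as follows. This is an illustrative Python simulation with invented reaction-time parameters and sample size, not the actual EXAMINER data or code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 150 simulated examinees, 32 trials of one condition each.
# Each examinee has a latent mean RT plus trial-to-trial noise (msec).
subject_mean = rng.normal(550, 80, size=(150, 1))
rts = subject_mean + rng.normal(0, 90, size=(150, 32))

full_median = np.median(rts, axis=1)   # median RT over all 32 trials

# Correlate the median RT of the first k trials with the full-task median.
for k in (8, 16, 24):
    r = np.corrcoef(np.median(rts[:, :k], axis=1), full_median)[0, 1]
    print(f"k = {k:2d}: r = {r:.3f}")
```

The correlation climbs toward 1.0 as k approaches 32; in the EXAMINER analysis it exceeded .95 at k = 24 for both conditions, supporting the cut to 24 trials per condition.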

Continuous Performance Test (CPT)

The continuous performance task is a classic response inhibition task, which requires subjects to respond to a certain type of stimulus and withhold a response to another.

The examinee is presented with different images in the center of the screen and is instructed to press the left arrow key only for the target image (e.g., a white five-pointed star in Form A), responding as quickly and accurately as possible. The task consists of 100 experimental trials, 80% of which are the target image. The five non-target images are similar in shape and comparable in size to the target for each form.

Anti-saccades

This is an eye movement task. There are three blocks of trials in which subjects look at a fixation point in the center of a computer screen and move their eyes upon presentation of a laterally presented stimulus. In the first block (pro-saccade), subjects are instructed to move their eyes in the direction of the presented stimulus. In the second and third blocks (anti-saccade), subjects are instructed to move their eyes in the opposite direction of the presented stimulus.

Dysexecutive Errors

An underlying assumption in developing the EXAMINER battery is that executive-related deficits can manifest as impulsive errors, failure to shift set, perseverative behavior, and stimulus-boundedness, even when achievement scores on tests are unremarkable. Accordingly, we generated a composite error score using several EXAMINER tasks. This composite includes false alarm responses on the CPT, rule violations on the verbal fluency tasks, the tendency to make errors on Flanker incongruent trials relative to congruent trials, the tendency to make errors on the Set Shifting shift trials relative to the non-shift trials, and the total score on the Behavior Rating Scale.

Domain: Set Shifting

Set Shifting

Participants are required to match a stimulus on the top of the screen to one of two stimuli in the lower corners of the screen. In task-homogeneous blocks, participants perform either Task A (e.g., classifying shapes) or Task B (e.g., classifying colors). In task-heterogeneous blocks, participants alternate between the two tasks pseudo-randomly. The combination of task-homogeneous and task-heterogeneous blocks allows measurement of general switch costs (latency differences between heterogeneous and homogeneous blocks) and specific switch costs (differences between switch and non-switch trials within the heterogeneous block).
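
The two switch-cost scores can be computed as in the sketch below. The per-trial RT values and block sizes are hypothetical, invented only to illustrate the arithmetic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-trial RTs (msec) for one examinee.
homogeneous = rng.normal(600, 60, size=40)        # task-homogeneous blocks
het_switch = rng.normal(900, 80, size=20)         # switch trials (heterogeneous)
het_nonswitch = rng.normal(700, 70, size=20)      # non-switch trials (heterogeneous)
heterogeneous = np.concatenate([het_switch, het_nonswitch])

# General switch cost: latency difference between the heterogeneous and
# homogeneous blocks.
general_cost = heterogeneous.mean() - homogeneous.mean()

# Specific switch cost: difference between switch and non-switch trials
# within the heterogeneous block.
specific_cost = het_switch.mean() - het_nonswitch.mean()

print(f"general switch cost:  {general_cost:6.1f} msec")
print(f"specific switch cost: {specific_cost:6.1f} msec")
```

The general cost captures the overall burden of holding two task sets in mind, while the specific cost isolates the moment-to-moment price of actually switching between them.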

Domain: Fluency

Phonemic Fluency

For the phonemic fluency task, examinees are instructed to name as many words as they can that begin with a particular letter of the alphabet as quickly as they can. Sixty seconds are allowed for each letter. The examinee is instructed that names of people, places and numbers are not acceptable responses. Grammatical variants of previous responses (plurals, altered tenses, and comparatives) are also not acceptable responses. All responses should be recorded by the examiner. The number of correct responses, repetitions and rule violations are then totaled for each letter.

Category Fluency

For the category fluency task, examinees are asked to generate as many items as they can think of that belong to a particular category, as quickly as possible. Sixty seconds are allowed for each category. All responses are recorded by the examiner. The number of correct responses, repetitions and rule violations are totaled for each category.

Domain: Planning

Unstructured Task

This task was modeled after the 6-elements test (Shallice & Burgess, 1991). Subjects are presented with three booklets, each containing five pages of simple puzzles (4 per page). The puzzles were designed to be cognitively simple (e.g., connect the dots; trace the design) but average completion times range from 4 to 60 seconds. Each puzzle has a designated point value, and subjects are given 6 minutes to earn as many points as possible. Irrespective of actual point value, puzzles can have a high cost-benefit ratio (i.e., the time required to complete the puzzle makes it less desirable) or a low cost-benefit ratio (i.e., the time required to complete the puzzle makes it more desirable). In addition, the proportion of low cost-benefit items decreases as subjects proceed through a booklet. Subjects need to plan ahead, avoid items that are strategically poor choices, and be cognizant of when a particular booklet offers diminishing returns.
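
The cost-benefit idea can be made concrete with a small sketch; the puzzle names, point values, and completion times below are entirely hypothetical.

```python
# Hypothetical puzzles: (name, point value, expected completion time in seconds).
puzzles = [
    ("connect the dots", 2, 5),
    ("trace design A", 5, 55),
    ("simple maze", 3, 10),
    ("trace design B", 4, 45),
]

# A strategic examinee favors puzzles with a low cost-benefit ratio,
# i.e., the most points earned per second of the 6-minute limit.
ranked = sorted(puzzles, key=lambda p: p[1] / p[2], reverse=True)
for name, points, secs in ranked:
    print(f"{name:18s} {points / secs:.3f} points/sec")
```

Ranking by points per second shows why a high-value but slow puzzle can still be a strategically poor choice, which is exactly the judgment the task asks subjects to make under time pressure.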

Domain: Insight

Insight

Examinees are asked to rate themselves on their performance immediately after completing the well-normed verbal fluency tasks. Before the fluency tasks begin, the examinee is informed that after performing the task they will be asked to assess their performance. They are instructed to assess their own performance relative to a hypothetical sample of 100 people of a similar age and level of education. After the fluency task is complete, they are shown a picture of a bell curve with corresponding percentile rankings at the bottom of the page. They are then reminded that, on a typical task, the majority of healthy age-matched peers would perform at the 50th percentile, with smaller numbers performing above or below average.

Domain: Social Cognition and Behavior

The Social Norms Questionnaire

This task measures subjects’ crystallized knowledge of social norms in a linguistically and cognitively simple manner. This yes-no questionnaire is designed to determine the extent to which subjects understand and can accurately identify implicit yet widely accepted social boundaries in the dominant U.S. culture. The Social Norms Questionnaire includes both socially inappropriate behaviors (e.g., “Cut in line if you are in a hurry,” “Pick your nose in public,” and “Wear the same shirt every day”) and generally acceptable behaviors (e.g., “Tell a coworker your age,” “Blow your nose in public,” and “Eat ribs with your fingers”). The subject must decide whether each behavior would be socially appropriate if it were enacted with an acquaintance or coworker.

Two subscales are derived, representing a) errors in the direction of breaking a social norm (the “Break” score) and b) errors in the direction of interpreting a social norm too rigidly (the “Overadhere” score). There is a 22-item questionnaire for adults and a 30-item questionnaire for children. This measure also has an alternate form for test-retest purposes.

Behavior Rating Scale

This rating scale is completed by the examiner after completion of the testing. Examiners restrict their ratings to behaviors that they have observed directly, but include all observed behaviors, regardless of the context. Thus, although behaviors during the actual assessment will likely provide the bulk of data, examiners should also note behaviors exhibited in all other situations, such as the waiting room and walking to and from the exam room. There are nine behavioral domains to rate, including agitation, stimulus-boundedness, perseverations, decreased initiation, motor stereotypies, distractibility, degree of social/emotional engagement, impulsivity, and social appropriateness.



Chapter 6. Data Collection

A central goal of the EXAMINER project was to develop a battery that could reliably and validly assess executive functions across a wide range of ages and disorders. Data collection targets were established with this goal in mind by the EXAMINER advisory panel, NINDS focus groups, NINDS Project Officer, and the UCSF team. Piloting and ongoing data collection were conducted utilizing the large research infrastructure at UCSF and in collaboration with nine remote sites to represent a full range of geographic regions, ethnic groups, age groups, and diagnostic disorders.

Diagnostic Groups

The diagnostic categories below represent neurological and neuropsychiatric syndromes associated with executive deficits. Data were also collected on healthy subjects across the age span.

Adults and children with the following neurological conditions or neurodegenerative disorders are represented in the EXAMINER battery dataset:

  • Attention deficit hyperactivity disorder (ADHD)
  • Alzheimer’s disease (AD)
  • Focal lesions
  • Behavioral variant frontotemporal dementia (bvFTD)
  • Huntington’s disease (HD)
  • Mild cognitive impairment (MCI)
  • Multiple sclerosis (MS)
  • Parkinson’s disease (PD)
  • Progressive supranuclear palsy (PSP)
  • Sickle-cell anemia
  • Traumatic brain injury (TBI)
  • Very low birth weight (VLBW)

General Inclusion Criteria

  • Participants had to be between 3 and 90 years old
  • Fluent English and/or Spanish speakers
  • Subjects unable to consent required an informant

General Exclusion Criteria

  • Current alcohol abuse or dependence
  • Current drug abuse
  • Psychiatric disorder (apart from specified groups)
  • B12 deficiency or metabolic syndrome
  • Hypothyroidism (TSH >150% of normal)
  • Known HIV
  • Renal failure
  • Respiratory failure
  • Significant systemic medical illnesses
  • Medications likely to affect CNS functions

Subject Diagnostic Criteria

Inclusion and exclusion criteria for the diagnostic categories followed classifications as reflected in the following guidelines. Principal investigators at each site reviewed and confirmed subject diagnoses.

Diagnosis | Criteria
Alzheimer’s Disease (AD) | McKhann et al., NINCDS-ADRDA criteria (1984)
Attention-Deficit/Hyperactivity Disorder (ADHD) | Diagnostic and Statistical Manual (DSM-IV; 1994); age 7–16 yrs and IQ > 90. Additional exclusion criteria: comorbid mental retardation (MR), neurological diagnosis, psychiatric condition, or major learning disability.
Behavioral Variant Frontotemporal Dementia (bvFTD) | Neary et al. criteria (Neary et al., 1998)
Focal Lesion | Lateral frontal, ventromedial frontal, non-frontal, and basal ganglia lesions secondary to ischemic stroke, tumor, or focal injury
Huntington’s Disease (HD) | HD mutation, gene positive
Mild Cognitive Impairment (MCI) | Petersen et al. (2001); subtypes: MCI-mem and MCI-exec
Multiple Sclerosis | McDonald revised criteria (Polman et al., 2005)
Parkinson’s Disease | Albanese criteria (2003)
Progressive Supranuclear Palsy (PSP) | Litvan et al., NINDS-SPSP criteria (1996)
Sickle Cell Anemia | Confirmed diagnosis; age 8–17 yrs
Traumatic Brain Injury (TBI) | Moderate to severe (GCS < 12); age 18–50 yrs; injury ≥6 months prior
Very Low Birth Weight (VLBW) | <1000 grams and/or <28 weeks gestational age; age 10–12 yrs

Data Collection Sites

The EXAMINER battery was administered at nine collaborating sites across the country. The final dataset includes adults and children, Spanish and English speaking, across a wide range of diagnostic cohorts.

Adults

The Memory and Aging Center at the University of California-San Francisco (PI: Joel Kramer, PsyD) is the largest source of EXAMINER battery data. UCSF administered the battery to 249 people, including longitudinal follow-up in some cases.

The University of California-Berkeley’s Helen Wills Neuroscience Institute (PI: Robert Knight, MD) administered the battery to 45 people, including lesion subjects and controls.

The Mayo Clinic Alzheimer’s Disease Research Center (PI: Glenn Smith, PhD) administered the battery to 79 people, including AD, bvFTD, MCI, and Parkinson’s disease subjects.

The University of Iowa (PI: Daniel Tranel, Ph.D.) administered the battery to 87 people, including lesion subjects and controls.

The University of Texas Southwestern Medical Center (PI: Ramon Diaz-Arrastia, MD, PhD) administered the battery to 33 subjects, including TBI subjects and controls.

Spanish Language

The University of California-Davis (PI: Dan Mungas, PhD) administered the battery to 180 older, predominantly healthy, Spanish-speaking subjects.

Children

The Developmental Cognitive Neuroscience Laboratory at University of Nebraska-Lincoln (PI: Kimberly Espy, PhD) administered the battery to 207 children.

Boston Children’s Hospital (PI: Celiane Rey-Casserly, PhD) administered the battery to 41 children, including ADHD subjects.

Case Western Reserve University (PI: H. Gerry Taylor, PhD) administered the battery to 72 children with very low birth weight.

The University of South Carolina (PI: Jeffrey Schatz, PhD) administered the battery to 117 individuals, including those with sickle cell anemia.

Back to top


Chapter 7. Software Installation and Administration

The EXAMINER battery includes software for administering computer-based tasks and for generating executive composite and factor scores. The EXAMINER battery software is designed to work on multiple operating systems and to use open-source, readily available software.

Requirements

The EXAMINER software requires the following minimum hardware:

  • 2GB RAM (Windows XP/Linux: 1GB RAM)
  • 2 GHz Intel Core 2 Duo processor (Windows XP/Linux: 1.6 GHz Pentium M processor)
  • At least 14” diagonal display
  • Keyboard input device
  • 500 MB available hard drive space
  • 0.25 MB available hard drive space per administration

The EXAMINER software requires the following minimum software:

  • Windows XP (Service Pack 3), Windows 7, Apple OS X 10.6, or Ubuntu 10.04
  • PsychoPy Version 1.73.05
  • R (Statistical Software) Version 2.14

Acquiring and Installing Software Dependencies

The EXAMINER battery is distributed with installation files for PsychoPy and R. These files are located in the SOFTWARE directory of the EXAMINER battery distribution and organized by operating system.

The EXAMINER battery has been tested with software versions available as of April 15, 2011. Use of updated versions may require additional testing and verification.

You will complete software installation in five steps:

  1. Install PsychoPy and update to version 1.73.05
  2. Install EXAMINER computer tasks
  3. Configure EXAMINER computer tasks
  4. Run EXAMINER computer tasks
  5. Install R

Install PsychoPy

PsychoPy is an open-source application for presenting stimuli and collecting data in a wide range of neuroscience, psychology, and psychophysics experiments. The EXAMINER battery computer tasks are designed to run within the PsychoPy application, and PsychoPy must be installed on every computer that will run the EXAMINER battery computer tasks.

As of April 15, 2011, the current web site for acquiring the PsychoPy software is www.psychopy.org, and the current released version of PsychoPy is 1.73.05. For Windows and OS X operating systems, the easiest way to install PsychoPy is to download the current “Standalone” version (1.73.02) and then upgrade it to 1.73.05 using PsychoPy’s built-in updater. For Debian-based Linux operating systems (e.g., Ubuntu), you can use the packages located at neuro.debian.net.

In summary, you will need to install three things: the PsychoPy software, the EXAMINER tasks that run on PsychoPy, and R. All three of these items are included on the EXAMINER distribution disk.

Installing PsychoPy on Windows

  1. Download the copy included in the EXAMINER battery distribution or from the PsychoPy website.
  2. Ensure that you have administrative privileges.
  3. Run the installer “StandalonePsychoPy-1.73.02-win32.exe” and accept default settings.
  4. Installation will take approximately five minutes.
  5. Additional instructions are available at www.psychopy.org/installation.html.
  6. Ensure that the installation works by running the PsychoPy2 application.
  7. If errors occur, additional Microsoft components may need to be installed.
  8. Perform a manual upgrade to version 1.73.05 using the PsychoPy updater.

Installing PsychoPy on Mac OS X

  1. Download from the EXAMINER distribution.
  2. Open the disk image and drag PsychoPy2.app to the Applications folder.
  3. Follow installation instructions at www.psychopy.org if needed.
  4. Ensure installation by running PsychoPy2.
  5. Perform a manual upgrade to version 1.73.05 using the updater.

Installing PsychoPy on Linux

Using a Linux operating system requires knowledge of the package system. Installation packages for PsychoPy 1.73.05 are included in the EXAMINER distribution. Additional instructions are available at www.psychopy.org/installation.html.

Install the EXAMINER Computer Tasks

  1. Install PsychoPy following instructions above.
  2. Locate the installation file for your operating system:
    • Windows: Examiner.zip
    • OS X: Examiner.dmg
    • Linux: Examiner.tar.gz
  3. Extract the contents to your desktop to create an Examiner3_6 folder.

Configure the EXAMINER Computer Tasks

  1. Run PsychoPy.
  2. Open the Builder view and switch to Coder view.
  3. Open the lavatask.cfg file in the Examiner3_6 folder.
  4. Enter a unique Site ID.
  5. Enter a unique Machine ID.
  6. Measure the horizontal screen width and enter the value in centimeters.
  7. Save the configuration file.
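As a sketch, the entries edited in steps 4–6 might look like the following. The section header and key names here are illustrative assumptions, not the actual field names; edit the fields that already exist in your lavatask.cfg rather than adding new ones.

```ini
# Hypothetical sketch of lavatask.cfg entries; your distributed file
# may use different key names and sections.
[administration]
site_id = UCSF01        ; unique Site ID assigned to your site
machine_id = LAPTOP-03  ; unique Machine ID for this test computer
screen_width_cm = 33.0  ; measured horizontal screen width, in centimeters
```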

Run the EXAMINER Computer Tasks

  1. Open PsychoPy.
  2. Open examiner_adult_english.py from the scripts folder.
  3. Run the script.
  4. Enter task administration details.
  5. Select tasks to run or skip.

EXAMINER Computer Task Output Files

The software creates data files stored in the Examiner3_6/data folder. Each subject has a folder containing summary and detail files. Combined summary files are also generated for each task type.

File naming format:

  • [task_name]_Summary_[subject_id]_[session_num]_[date/time].csv
  • [task_name]_Summary_Combined_[site_id]_[machine_name].csv
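The per-subject summary naming convention above can be assembled programmatically. This Python sketch follows the pattern as listed; the YYYYMMDDHHMMSS timestamp format is an assumption and may differ from what the EXAMINER software actually writes, so check the files your own installation produces.

```python
from datetime import datetime

def summary_filename(task_name, subject_id, session_num, when=None):
    """Build a per-subject summary filename following the convention
    [task_name]_Summary_[subject_id]_[session_num]_[date/time].csv.
    The timestamp format here is an assumption for illustration."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{task_name}_Summary_{subject_id}_{session_num}_{stamp}.csv"
```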

Install R Software

R is a free software environment for statistical computing and graphics. The EXAMINER scoring program is written in R and relies on the ltm package.

Installing R on Windows

  1. Download from www.r-project.org or use the included installer.
  2. Run installer and accept defaults.
  3. Install the “ltm” package via the Packages menu.

Installing R on Mac OS X

  1. Download installer from www.r-project.org.
  2. Run installer and accept defaults.
  3. Install the “ltm” package using Package Installer.

Installing R on Linux

  1. Download from www.r-project.org or use included installer.
  2. Run installation and configure as needed.

Back to top


Chapter 8. Psychometric Properties

Administration Issues

The EXAMINER battery was designed to be administered in its entirety to all subjects, with a few notable exceptions. Phonemic verbal fluency, for example, could not be routinely administered to children under seven or eight because the task requires a minimum degree of literacy. In addition, the spatial 2-back was a particularly challenging task that was not administered to subjects who either had considerable difficulty with the 1-back or were unable to complete the 2-back sample and practice items. The children’s version of the Social Norms Questionnaire was not ready until after data collection had started. Also, several subjects were administered short forms of the battery. Finally, there were instances when a task could not be administered due to situation- or subject-specific issues (e.g., computer problems, sensory-motor deficits, lack of subject cooperation). Once a test was administered, data quality issues were relatively infrequent.

Completion Rates

Task | 3–7 yrs | 8–17 yrs | 18 yrs and older | Total
Dot counting | 84.4 | 97.5 | 93.4 | 94.0
1-back | 83.3 | 95.9 | 92.0 | 92.5
2-back | 1.0 | 58.7 | 79.3 | 66.3
Flanker | 96.9 | 98.6 | 95.6 | 96.6
Set shifting | 96.9 | 98.3 | 96.1 | 96.9
CPT | 96.9 | 97.5 | 93.8 | 95.3
Anti-saccade | 2.1 | 76.9 | 66.7 | 64.5
Verbal fluency | 5.2 | 98.6 | 99.4 | 91.4
Category fluency | 96.9 | 100.0 | 94.6 | 96.5
Unstructured task | 95.8 | 98.9 | 97.4 | 97.8
Social norms questionnaire | 46.9 | 34.7 | 74.8 | 59.9

Distributions

Individual EXAMINER variables can be combined into composite and factor scores using item response theory. The distributions of the Executive Composite score and the Working Memory, Cognitive Control, and Fluency Factor scores were all normal when the sample was viewed in its entirety and when children and adults were viewed separately. The Executive Composite and Factor scores are raw scores generated directly from the EXAMINER scoring program and are not age-referenced.

Reliability: Individual Tests

Reliability of individual tests was estimated in several ways depending on the nature of the task. Internal consistency measures were appropriate for most tasks, while test-retest reliability was reported for the Unstructured Task and the Social Norms Questionnaire. Inter-rater reliability estimates were reported for the Anti-Saccade task, Verbal Fluency, and the Behavior Rating Scale.

Internal Consistency

Dot Counting

Age group | Dot Counting accuracy
<18 | 0.69
18+ | 0.65
Total | 0.69

Flanker

Age group | Congruent RT | Congruent accuracy | Incongruent RT | Incongruent accuracy
<18 | 0.97 | 0.80 | 0.97 | 0.93
18+ | 0.97 | 0.88 | 0.98 | 0.93
Total | 0.97 | 0.86 | 0.97 | 0.93

Continuous Performance Test

The CPT accuracy has an internal consistency reliability coefficient of .64 in children and .78 in adults.

Anti-saccade

Internal consistency reliability is .92.

Set shifting

Age group | Color RT | Color accuracy | Shape RT | Shape accuracy | Shift RT | Shift accuracy
<18 | 0.95 | 0.86 | 0.94 | 0.93 | 0.97 | 0.88
18+ | 0.95 | 0.92 | 0.94 | 0.91 | 0.98 | 0.91
Total | 0.95 | 0.89 | 0.94 | 0.92 | 0.97 | 0.91

Verbal fluency

The reliability of phonemic fluency is .88, and the reliability of category fluency is .78. Alternate forms showed reliabilities of .91 and .97.

Inter-rater Reliability

Verbal fluency

Intraclass coefficient values indicated high reliability for correct words and acceptable reliability for repetitions and rule violations.

Anti-saccade

The ICC was .98.

Behavior rating scale

The ICC was 0.61 for a single rater and 0.76 for two raters.

Test-Retest Reliability

Measure | n | Reliability
Unstructured Task | 85 | .71
Social Norms Questionnaire (Children) | 15 | .93
Social Norms Questionnaire (Adults) | 52 | .69

Reliability: Executive Composite and Factor Scores

Age group | Executive Composite | Cognitive Control | Working Memory | Fluency
<18 | 0.87 | 0.83 | 0.77 | 0.76
18+ | 0.78 | 0.73 | 0.39 | 0.86
Total | 0.94 | 0.91 | 0.75 | 0.91

Back to top


Chapter 10. Validity

Validity reflects how well a particular task measures what it purports to measure. In this section, we report first on the validity of the executive composite score, followed by validation studies of the individual factor scores. Several different approaches to assessing validity have been incorporated.

Executive Composite Score

Correlations with external measures of executive functioning

Baseline FrSBe scores were obtained on 219 adults who had informants available at the time of the EXAMINER visit. The correlation between the Composite Score and FrSBe total raw score, after partialling out the effect of age, was -0.57 (p<.001), reflecting a robust association between executive functioning on EXAMINER and estimates of real-world executive function. This relationship remained highly significant (-0.48, p<.001) even after controlling for estimates of baseline verbal ability and processing speed.

Baseline BRIEF scores were obtained on 404 children. The correlation between the Composite Score and BRIEF total raw score, partialling out the effect of age, was -0.21 (p<.001).

Differences between patients and normal controls

Separate analyses were carried out for children and adults controlling for age.

Group | Patients Mean (SE) | Normals Mean (SE)
<18 | -0.81 (.04) | -0.50 (.03)
18+ | .19 (.05) | .57 (.04)

Correlations with age

The correlation with age in older adults was -0.30 (p<.001). In children, the correlation was 0.83 (p<.001).

Longitudinal Change

Executive functioning improves significantly with age in normally developing children. The increase in the Executive Composite score over 12 months was significant (t=-10.3, p<.001).

Baseline Mean (SD) | 12-month Follow-up Mean (SD)
-0.83 (.55) | -0.49 (.59)

Selected group comparisons

Lateral Frontal vs Medial Frontal vs Posterior lesions

Lateral frontal lesion patients performed worse than posterior lesion patients, while medial frontal patients performed more similarly to posterior lesion patients.

Alzheimer’s vs Mild Cognitive Impairment vs Controls

Controls | MCI | AD
.70 (.05) | .29 (.11) | -.20 (.10)

Subcortical syndromes

Group | Mean (SE)
Subcortical | .15 (.10)
Normals | .86 (.05)

Convergent and Divergent Validity

The correlation between the Executive Composite score and Stroop Interference was .58 (p<.001), whereas the correlation with delayed recall was negligible.

Working Memory Factor Score

Correlations with external measures

The correlation between the Working Memory Factor Score and FrSBe total score was -0.54 (p<.001), and remained significant after controlling for baseline verbal ability and processing speed.

The Working Memory Factor correlated with the BRIEF total score at -0.21 (p<.001).

Differences between patients and controls

Group | Patients Mean (SE) | Normals Mean (SE)
<18 | -0.76 (.03) | -0.23 (.03)
18+ | .14 (.05) | .61 (.04)

Correlations with age

The correlation with age was -0.32 in adults and 0.53 in children.

Longitudinal Change

Baseline Mean (SD) | Follow-up Mean (SD)
-0.51 (.79) | -0.29 (.77)

Selected analyses

Prodromal Huntington’s Disease

Working Memory Factor scores were impaired in prodromal HD patients and correlated with disease burden.

Attention Deficit Disorder

Children with ADHD performed less well on the Working Memory Composite and performance correlated with learning problems.

Group | Mean (SE)
Controls | -0.17 (.05)
ADHD | -0.48 (.11)

Parkinson’s Disease vs Controls

Controls | PD
.59 (.07) | .06 (.16)

Convergent and Divergent Validity

The Working Memory Factor correlated more strongly with Digit Span Backward (r=.38, p<.001) than with delayed recall, supporting its validity as a working memory measure.

Cognitive Control Factor Score

Correlations with external measures of executive functioning

The correlation between the Cognitive Control Factor Score and FrSBe total raw score, after partialling out the effect of age, was -0.56 (p<.001), reflecting a robust association between Cognitive Control on EXAMINER and estimates of real-world executive function.

The Cognitive Control Factor correlated with the BRIEF total score at -0.22 (p<.001), as well as with the BRIEF Shifting Scale (-0.20, p<.001) and Inhibition Scale (-0.22, p<.001), controlling for age.

Differences between patients and normal controls

Group | Patients Mean (SE) | Normals Mean (SE)
<18 | -0.76 (.03) | -0.23 (.03)
18+ | .14 (.05) | .61 (.04)

Correlations with age

The correlation between Cognitive Control and age was -0.41 (p=.001) in adults and 0.78 (p<.001) in children.

Longitudinal Change

Baseline Mean (SD) | Follow-up Mean (SD)
-0.72 (.62) | -0.32 (.66)

Convergent and Divergent Validity

The Cognitive Control Factor Score correlated with Stroop Interference (r=.55, p<.001), but not with delayed recall or naming measures.

Fluency Factor Score

Correlations with external measures of executive functioning

The correlation between the Fluency Factor Score and FrSBe total raw score for adults was -0.43 (p<.001), remaining significant after controlling for verbal ability and processing speed (-0.28, p<.001).

The correlation between the Fluency Factor Score and BRIEF total raw score in children was -0.09 (p=.06).

Differences between patients and normal controls

Group | Patients Mean (SE) | Normals Mean (SE)
<18 | -0.76 (.05) | -0.58 (.03)
18+ | .21 (.05) | .51 (.04)

Correlations with age

The correlation with age in adults was -0.13 (p=.07). In children, the correlation was 0.70 (p<.001).

Correlations with brain MRI

Fluency Factor scores were associated with left frontal lobe volumes. In regression analyses, frontal volumes uniquely explained additional variance in fluency performance.

Longitudinal Change

Baseline Mean (SD) | Follow-up Mean (SD)
-0.77 (.53) | -0.54 (.57)

Selected group comparisons

Lateral Frontal vs Medial Frontal vs Posterior lesions

Lateral frontal lesion patients performed worse than posterior lesion patients; medial frontal patients performed similarly to posterior lesion patients.

bvFTD vs Alzheimer’s vs Controls

bvFTD | AD | Controls
-.39 (.18) | .07 (.13) | .61 (.05)

Convergent and Divergent Validity

Measure | Correlation | p value
Design fluency | .36 | <.001
Digits backward | .29 | <.001
Stroop | .30 | <.001
Delayed recall | .11 | .048
Benson copy | .04 | .496

Measures not included in factor scores

Unstructured Task

The weighted composite score correlates with FrSBe (r=-.29), separates patients from controls, and correlates with age in children (r=.70) and older adults (r=-.35).

Insight

Self-appraisal accuracy was associated with parent-rated executive behavior (BRIEF Shifting score), indicating ecological validity.

Social Norms Questionnaire

Content Validity

The initial validation phase assessed agreement among healthy controls regarding social norms. Two items were removed due to low agreement, resulting in a 22-item version.

Item | Behavior | Subscale | Initial % | Final %
1 | Tell a stranger you don’t like their hairstyle? | Break | 97.0 | 95.6
2 | Spit on the floor? | Break | 100.0 | 97.5
3 | Blow your nose in public? | Over | 74.0 | 65.5
4 | Ask a coworker their age? | Break | 80.0 | 76.8
5 | Cry during a movie? | Over | 97.0 | 95.6
6 | Cut in line if in a hurry? | Break | 100.0 | 97.5
7 | Laugh when you trip? | Over | 89.0 | 90.5
8 | Eat pasta with fingers? | Break | 89.0 | 93.5

Differences between patients and normal controls

Patients performed significantly worse than controls (F=11.5, p<.001), particularly those with disorders affecting social behavior.

Convergent and Divergent Validity

The Social Norms Questionnaire correlated with measures of dysexecutive behavior, empathy, and social sensitivity, indicating overlap with both executive and socio-emotional functioning.

Back to top


Chapter 11. Variable and Scale Construction

The tasks contained in the EXAMINER battery are designed to be used either individually or combined into scales. In this chapter, we review the dependent measures that can be derived from individual tests, and describe the methods underlying scale construction.

Individual Tests

Dot Counting

Each of the six trials is scored according to procedures outlined in Chapter 12, and the primary dependent variable is the total score summed across the six trials.

Spatial 1-back and 2-back

Total correct and d-prime are the primary indices for characterizing accuracy. The 1-back contains 30 trials and the 2-back contains 90 trials. The EXAMINER scoring program generates measures of discriminability (d-prime) using signal detection parameters.
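In signal detection terms, d-prime is the difference between the z-transformed hit rate and false-alarm rate. The sketch below is a generic stdlib implementation for illustration, not the EXAMINER scoring program's code; in particular, the 1/(2N) correction for rates of exactly 0 or 1 is a common convention assumed here.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Discriminability index: d' = z(hit rate) - z(false-alarm rate).
    Rates of exactly 0 or 1 are nudged by 1/(2N) to avoid infinite
    z-scores (an assumed convention, not necessarily EXAMINER's)."""
    z = NormalDist().inv_cdf
    n_targets = hits + misses
    n_foils = false_alarms + correct_rejections
    hit_rate = min(max(hits / n_targets, 1 / (2 * n_targets)),
                   1 - 1 / (2 * n_targets))
    fa_rate = min(max(false_alarms / n_foils, 1 / (2 * n_foils)),
                  1 - 1 / (2 * n_foils))
    return z(hit_rate) - z(fa_rate)
```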

Flanker

The scoring program generates accuracy and median reaction times for congruent and incongruent trials. A composite score combining accuracy and reaction time is also generated, ranging from 0 to 10.

Continuous Performance Test

The primary dependent measure is the total number of false alarm errors. Additional measures include reaction time and omission data.

Anti-Saccade Test

The primary dependent measure is the total number of correct responses across anti-saccade trials.

Set Shifting

Accuracy and reaction time are recorded for each trial. Composite scores are generated combining accuracy and reaction time, ranging from 0 to 10.

Verbal Fluency

Each task generates total correct responses, repetitions, and rule violations. Total correct scores contribute to composite and factor scores.

Unstructured Task

The task generates total points earned and percentage of high-value puzzles completed. A weighted composite is calculated using:

Weighted composite = UTpct * log10(UTTotal + 1)

This measure is normally distributed and correlates with executive behavior measures.
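The weighted composite formula above translates directly into code; this small Python function mirrors it, with variable names following the manual's UTpct (percentage of high-value puzzles completed) and UTTotal (total points earned).

```python
import math

def unstructured_task_composite(ut_pct, ut_total):
    """Weighted composite = UTpct * log10(UTTotal + 1), as given in the
    manual. ut_pct is the proportion of high-value puzzles completed;
    ut_total is total points earned."""
    return ut_pct * math.log10(ut_total + 1)
```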

Social Norms Questionnaire

The total score reflects the number of items endorsed in the direction of mainstream cultural norms.

Factor Scores

Confirmatory factor analysis methods were used to identify dimensions underlying executive function measures. Eleven core variables were identified:

  • Dot Counting total
  • 1-back d-prime
  • 2-back d-prime
  • Flanker score
  • Set Shifting score
  • Anti-saccade total
  • Verbal fluency scores (4 measures)
  • Dysexecutive errors

Two models were tested:

  • One-factor model (global executive function)
  • Three-factor model (Cognitive Control, Working Memory, Fluency)

The three-factor model showed superior fit, but results also supported a global composite score.

Item Response Theory (IRT)

IRT methods were used to generate scores for:

  • Global executive function
  • Cognitive control
  • Working memory
  • Fluency

Continuous variables were recoded into ordinal categories and analyzed using a graded response model. Scores were generated using Empirical Bayes methods.

Analyses evaluated differential item functioning across age and language groups. Results indicated that language adjustments were necessary for the global executive function score.

Binning Values (Example Tables)

Ordinal Value | Dot Counting | 1-back d-prime | 2-back d-prime | Flanker Score
1 | 3.5–5.5 | -0.175–0.067 | -0.181–0.096 | 3.992–4.462
2 | 5.5–6.5 | 0.067–0.334 | 0.096–0.386 | 4.462–4.926
3 | 6.5–7.5 | 0.334–0.577 | 0.386–0.679 | 4.926–5.355

Ordinal Value | Verbal Fluency 1 | Category Fluency 1 | Shift Score
1 | 0–3.5 | 0–4.5 | 1.429–2.879
2 | 3.5–5.5 | 4.5–6.5 | 2.879–3.618
3 | 5.5–7.5 | 6.5–8.5 | 3.618–4.069
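Recoding a continuous score into an ordinal category amounts to locating it among the bin boundaries. This sketch uses the Dot Counting cutpoints from the example table above for illustration; the scoring program's full tables extend beyond the three ordinal values shown here.

```python
from bisect import bisect_right

def to_ordinal(value, upper_bounds):
    """Recode a continuous score into a 1-based ordinal category given
    the bins' upper bounds (values above the last listed bound fall in
    the next category)."""
    return bisect_right(upper_bounds, value) + 1

# Upper bounds for the first three Dot Counting bins in the example table.
dot_counting_bounds = [5.5, 6.5, 7.5]
```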

Correlation Between Factor Scores

 | Executive | Cognitive Control | Working Memory | Fluency
Executive | – | .79 | .71 | .87
Cognitive Control | .87 | – | .58 | .48
Working Memory | .74 | .60 | – | .45
Fluency | .87 | .60 | .46 | –

The Composite and Factor scores are not age-adjusted and should be interpreted as raw scores. Standard errors are provided, and scores with standard error greater than 0.75 should be interpreted cautiously.
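The standard-error caution above is straightforward to automate; this hypothetical helper simply flags any score whose reported standard error exceeds the manual's 0.75 threshold.

```python
def flag_unreliable(scores, threshold=0.75):
    """Return the names of scores whose standard error exceeds the
    threshold for cautious interpretation. `scores` maps a score name
    to a (value, standard_error) pair; this layout is illustrative."""
    return [name for name, (value, se) in scores.items() if se > threshold]
```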

Back to top


Chapter 12. Administration and Scoring Guidelines

Global Considerations

Before beginning a testing session, organize your materials so you can easily move from one task to another. When using alternate forms, ensure that you have the appropriate version.

The optimal seating arrangement is to sit across the testing table from the examinee. Do your best to minimize factors that can affect the examinee’s performance (e.g., extraneous noises, poor lighting, anxiety, low motivation). Take time to establish rapport with the participant at the beginning of the test session. Give the examinee a brief and very general overview of what the testing will entail. Family members should not be present in the room during an evaluation. Please administer tests in the order in which they are provided on the record form, unless doing so would compromise rapport with the participant.

Instructions should be read verbatim. Do not paraphrase unless necessary. Should the examinee have difficulties understanding the instructions, it is permissible to explain the task to them in another fashion (while maintaining the basic concept of the instructions); however, you should always read the verbatim instructions first. Always try to elicit the examinee’s best performance. Encouraging remarks are important, but do not provide specific feedback regarding whether individual responses are correct or incorrect, except during practice trials. If the examinee seems discouraged or anxious about his or her performance, encouraging remarks like “I can tell you are trying your best,” “No one gets all of these correct,” or “That was a tough one for you, but you did it” are often helpful. If the examinee’s motivation wanes, it is okay to explain in general terms what the task is trying to measure (e.g., “This is a measure of a particular kind of attention”).

Provide breaks when needed. Examiners should pay attention to the examinee’s motivation, emotional state, physical discomfort, and other factors that potentially compromise test validity. Many of the tasks have discontinuation rules. For those that do not, please attempt to complete all tasks whenever possible.

Examinees can use pen or pencil for the Unstructured Task and the Social Norms Questionnaire. However, if a pencil is used, please provide one without an eraser. Examinees should be instructed to cross out incorrect responses rather than attempt to erase them.

Fluency

Materials & Setup: Timer (1 minute), record form to record responses.

The fluency task has two conditions: phonemic-based word generation and category-based word generation. The examiner should recite the instructions for each fluency task verbatim. Should the participant have difficulties understanding the instructions, it is permissible to explain the task further or answer any questions they may have. Start the stopwatch once it is clear that the examinee understands the instructions. Write the actual responses as legibly as possible. Record all responses, including repeated words and rule violations. When a rule violation occurs on three consecutive responses, remind the participant of the correct rule. Each rule can be repeated only once per trial. Stop the procedure at 1 minute. If the examinee gives a response that is unclear during the task, make a note of it and query about it after the 1 minute has elapsed.

Fluency – Phonemic

I’m going to say a letter of the alphabet. When I ask you to start, tell me as many words as you can that begin with that letter. You will have one minute before I tell you to stop. None of the words can be numbers or names of people, or places.

For example, if I gave you the letter B, you could say brown, bottle or bake, but you wouldn’t say Barbara, Boston or billion. Also, don’t give me the same word with different endings, so if you said bake, you wouldn’t also say baked or bakes, and if you said big, you wouldn’t also say bigger and biggest.

Let’s begin. Tell me all the words you can, as quickly as you can, that begin with the letter “F.” Ready? Begin.

If the participant pauses for 15 seconds, prompt them to continue by saying, “Keep going” or “What other words beginning with ‘F’ can you think of?” If the participant gives three consecutive responses that do not start with the designated letter, prompt them by saying, “Remember, we are using the letter ‘F’.”

Fluency – Category

Now I am going to give you a category, and I want you to name, as fast as you can, all of the things that belong to that category. For example, if I say “articles of clothing,” you would say shirt, tie or hat. It doesn’t matter what letter the word starts with.

Now I want you to name things that belong to the category: Animals. You will have one minute. I want you to tell me all the animals that you can think of in one minute. Ready? Begin.

If the participant pauses for 15 seconds, say, “Keep going. What other animals can you think of?” If the participant gives three consecutive words that do not fit the category, say, “The category we are now using is animals.” If the participant only names animals that begin with the letter L or F, remind the participant, “It doesn’t matter what letter the words start with.” Any instruction can be repeated during a trial at the participant’s request.

Phonemic Fluency Scoring Guidelines

Record all responses, including repeated words and rule violations. When a rule violation occurs on three consecutive responses, remind the participant of the correct rule. Each rule can be repeated only once per trial.

Correct Responses: Any word that begins with the specified letter, can be found in a dictionary, is not a proper noun, number, or certain grammatical variant, and is not a repetition within that trial, should be scored as a correct response.

Repetitions: Any response that is repeated verbatim within the 60-second trial should be scored as a repetition.

Rule Violations:

  • Words beginning with letters other than the designated letter
  • Non-words
  • Proper nouns (people or places)
  • Numbers
  • Grammatical variants of previous responses

Category Fluency Scoring Guidelines

Correct responses: Include breeds, male/female forms, infant names, birds, fish, reptiles, and insects.

Repetitions: Repeated animals or synonyms count as repetitions.

Rule Violations: Mythical animals or incorrect categories.

Vegetable category rules include acceptance of grains, gourds, legumes, and certain borderline items (e.g., tomato, avocado).

Unstructured Task

Materials & Setup: Computer timer (6 minutes), practice page, three stimulus booklets.

The examinee will be provided with three test booklets full of puzzles of varying point values and varying average times of completion. They will be asked to choose and complete the puzzles that will allow them to earn as many points as possible in six minutes.

On this practice page there are six puzzles for you to try. Each puzzle has a different instruction, and some puzzles are easier than others. Go ahead and complete this page so you can see the kinds of puzzles you will do.

Answer any questions the examinee may have regarding the puzzles on the practice form. Once the practice form has been completed, review it and correct any errors.

Here are three booklets. Each of the booklets has different puzzles you can do. In these booklets, there are four puzzles on each page. Each puzzle has a number of points that you will earn when you complete the puzzle. Some puzzles have higher points than others. Your goal is to earn as many points as possible.

You do not have to complete all of the pages in a book, and you do not have to complete all the puzzles on each page. You can go in any order you want through the puzzles. Each book is worth the same amount of points. Be sure to read the instructions for each puzzle you do, and complete the puzzles accurately to receive full credit.

You will only have 6 minutes to earn as many points as possible, so choose your puzzles carefully. A timer will be displayed to help you manage your time.

Start the timer once instructions are understood. Stop at 6 minutes. Do not allow completion of items after time ends.

Scoring

Performance is based on number of puzzles completed, value of puzzles, and total points earned.

  1. Count number of high value items completed
  2. Count number of low value items completed
  3. Count number of high value items attempted
  4. Count number of low value items attempted
  5. Sum total points earned
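Assuming each puzzle is logged with its point value and attempted/completed flags, the five tallies can be computed in one pass. The record layout and the high/low cutoff below are assumptions for illustration, not values specified by the battery:

```python
# Hypothetical tally for the Unstructured Task score sheet. Each item is a
# dict with its point value and whether it was attempted and completed; the
# cutoff separating "high value" from "low value" items is an assumption.
def score_unstructured(items, high_cutoff=3):
    high = [i for i in items if i["points"] >= high_cutoff]
    low = [i for i in items if i["points"] < high_cutoff]
    return {
        "high_completed": sum(1 for i in high if i["completed"]),
        "low_completed": sum(1 for i in low if i["completed"]),
        "high_attempted": sum(1 for i in high if i["attempted"]),
        "low_attempted": sum(1 for i in low if i["attempted"]),
        "total_points": sum(i["points"] for i in items if i["completed"]),
    }
```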

Flanker

Materials & Setup: Use left and right arrow keys. Participants should be 30–40 inches from the screen.

You will be shown a row of five arrows on the screen, pointing to the left or right.

Press the LEFT button if the CENTRAL arrow is pointing to the left. Press the RIGHT button if the CENTRAL arrow is pointing to the right.

Try to respond as quickly and accurately as you can. Try to keep your attention focused on the cross (“+”) at the center of the screen.

First we’ll do a practice trial. Press the SPACEBAR to begin.

If accuracy is below 75% during practice, additional trials are administered. If the examinee cannot meet criteria after three practice trials, the task is discontinued.

Set Shifting

Materials & Setup: Use left and right arrow keys. Participants should be 30–40 inches from the screen.

This is a matching game. You will see a picture in the middle of the screen, and a word at the bottom of the screen. The word will tell you how to match the picture.

You can match the picture by SHAPE or COLOR.

When matching by COLOR: LEFT button for RED, RIGHT button for BLUE.

When matching by SHAPE: LEFT button for TRIANGLE, RIGHT button for RECTANGLE.

Try to respond quickly and accurately. Practice trials are given before the test. If performance is below 75%, additional practice trials are administered.

Dot Counting

Materials & Setup: Record form. Participants should be 30–40 inches from the screen.

You will be shown a series of images containing blue circles, green circles, and blue squares.

You will count and remember the number of BLUE CIRCLES you see on each screen.

Count the BLUE circles aloud, one at a time, and then repeat the final total aloud immediately.

After a number of displays, question marks will appear. This will be your cue to repeat the final numbers you counted.

Scoring is based on recall accuracy, not counting accuracy.

Continuous Performance Test

Materials & Setup: Use left arrow key only. Participants should be 30–40 inches from the screen.

You will be presented with different shapes on the screen. If a 5-pointed star is presented, press the LEFT ARROW key. If any other shape is presented, do not press any button.

Respond as quickly as you can without making mistakes. If you make a mistake, just keep going.

Practice trials are administered first. If performance is below 80%, additional practice trials are given.

1-Back

Materials & Setup: Use left and right arrow keys.

Remember the location of this square, so you can compare it to the location of the next square.

If the location is the same as the previous one, press LEFT. If not, press RIGHT.

Say the number aloud when it appears on the screen.

Respond as quickly as possible without making mistakes.

2-Back

Materials & Setup: Use left and right arrow keys. The 2-back should always be administered immediately after the 1-back. Participants should be 30–40 inches from the screen.

Remember the location of this square, so you can compare it to the location of the square after the next one.

Also remember the location of this square so you can compare it to the location of the square after the next one.

Does this location match the one TWO before? If YES, press the LEFT key. If NO, press the RIGHT key.

Let’s try some more squares. Remember each square and compare it to the one 2 before. Start responding with the 3rd square.

Please respond as quickly as possible without making mistakes.

If the participant does not perform adequately, additional practice trials are administered. If performance does not improve after three practice trials, the test is discontinued.

After the test is complete:

The test is complete. This was a challenging test and we want to make sure you understood the instructions. Please explain the instructions to the examiner.

Anti-Saccades

Materials & Setup: Record form. Examiner must be able to observe the participant’s eyes.

Set the computer approximately 31 inches from the participant and at eye level.

Pro-Saccade Trial:

You will see a dot in the center of the screen. I would like you to move your eyes in the direction the dot moves, either to the left or to the right. Then follow the dot back to the center. Do not move your head, just your eyes.

There are 10 trials for Pro-Saccades.

Anti-Saccade Trial:

Now we are going to do a second eye movement task. You will see a dot in the center of the screen. The dot will move either to the left or to the right. This time, I would like you to use only your eyes to look in the direction opposite to where the dot moves. Do not move your head, just your eyes.

After looking at the opposite side, return your eyes to the center.

There are two blocks of 20 trials each.

Scoring: Record the direction of the initial eye movement. Count correct responses and sum across trials.

Social Norms Questionnaire

Materials & Setup: Questionnaire form.

The following is a list of behaviors that a person might do. Please decide whether or not it would be socially acceptable and appropriate to do these things in the mainstream culture of the United States, and answer yes or no to each.

Think about these questions as they would apply to interactions with a stranger or acquaintance, not with a close friend or family member.

Collect the form when the examinee has finished. Provide clarification if needed without revealing correct responses.

Behavioral Rating Scale

This rating scale is completed by the examiner after completion of the testing. Examiners should restrict their ratings to behaviors they have directly observed.

There are nine behavioral domains:

  • Agitation
  • Stimulus-boundedness
  • Perseveration
  • Decreased initiation
  • Motor stereotypies
  • Distractibility
  • Lack of social/emotional engagement
  • Impulsivity
  • Socially inappropriate behavior

Each domain is rated as:

  • None
  • Mild
  • Moderate
  • Severe

General Guidelines

Ratings should reflect only observed behaviors. Include behaviors observed during testing and in other contexts such as waiting room interactions.

Determining Severity

Mild: Infrequent or minor impact on testing.

Moderate: Interferes with testing or interaction.

Severe: Frequent, disruptive, or compromises test validity.

Behavior Descriptions

Agitation: Includes verbal or physical disruptive behaviors.

Stimulus-boundedness: Inappropriate responses to environmental stimuli.

Perseveration: Repetition of behaviors or responses.

Decreased initiation: Delayed or reduced response initiation.

Motor stereotypies: Repetitive, purposeless movements.

Distractibility: Attention easily diverted.

Lack of social/emotional engagement: Reduced social interaction or emotional responsiveness.

Impulsivity: Acting without forethought.

Socially inappropriate behavior: Actions not suitable for professional or social settings.



Chapter 13. Procedures for Calculating Composite and Factor Scores

Installing the Scoring Program

The EXAMINER battery includes software for generating the executive composite and factor scores. The software is written in the R language and relies on the ltm package (latent trait models under the item response theory approach).

  1. Install the R software (see Chapter 7).
  2. Locate and copy the ExaminerScoring installation file for your operating system:
    • Windows: ExaminerScoring.zip
    • OS X: ExaminerScoring.dmg
    • Linux: ExaminerScoring.tar.gz
  3. Extract the contents to a chosen location.

Running the Scoring Program

  1. Start the R program.
  2. Set the working directory to the scoring folder.
  3. Type source("examiner_scoring.R") and press return.
  4. Run the scoring function: score_file("./test/ExampleScoringInput.csv","./test/TestOutput1.csv")
  5. The program will generate an output file containing composite and factor scores.

Preparing an Input File

The scoring program requires a CSV input file containing a language indicator and the following 11 input variables.

  Variable        Source         Description
  language        Record Form    1 = English, 2 = Spanish
  dot_total       Record Form    Dot Counting total
  nb1_score       N-back         1-back d-prime
  nb2_score       N-back         2-back d-prime
  flanker_score   Flanker        Flanker composite score
  error_score     Calculated     Composite error score
  antisacc        Record Form    Anti-saccade total
  shift_score     Set Shifting   Shift score
  vf1_corr        Record Form    Phonemic fluency trial 1
  vf2_corr        Record Form    Phonemic fluency trial 2
  cf1_corr        Record Form    Category fluency trial 1
  cf2_corr        Record Form    Category fluency trial 2
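One way to build a one-row input file is with Python's csv module, using the column names from the table above. The values below are placeholders; check the bundled ExampleScoringInput.csv for the exact column order and any additional columns (such as a subject identifier) the program may expect:

```python
import csv

# Column names from the input-variable table; the values are placeholders,
# not real data.
COLUMNS = ["language", "dot_total", "nb1_score", "nb2_score", "flanker_score",
           "error_score", "antisacc", "shift_score",
           "vf1_corr", "vf2_corr", "cf1_corr", "cf2_corr"]

row = {"language": 1, "dot_total": 24, "nb1_score": 2.1, "nb2_score": 1.4,
       "flanker_score": 8.3, "error_score": 5, "antisacc": 36,
       "shift_score": 7.9, "vf1_corr": 12, "vf2_corr": 14,
       "cf1_corr": 18, "cf2_corr": 16}

with open("ScoringInput.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerow(row)
```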

Calculating the error_score Variable

The error_score is calculated by summing the following variables:

  • Non-target errors (CPT)
  • Flanker error difference
  • Shift error difference
  • Verbal fluency repetitions and rule violations
  • Category fluency repetitions and rule violations
  • Behavior Rating Scale total score

Negative values for flanker_error_diff and shift_error_diff should be converted to zero before calculation.
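The sum, with the floor at zero applied to the two difference scores, can be sketched as follows (the parameter names are illustrative; they correspond to the six components listed above):

```python
# Illustrative error_score composite: sum the six error components,
# flooring the two difference scores at zero first.
def error_score(cpt_nontarget_errors, flanker_error_diff, shift_error_diff,
                vf_errors, cf_errors, brs_total):
    return (cpt_nontarget_errors
            + max(flanker_error_diff, 0)   # negative values become 0
            + max(shift_error_diff, 0)     # negative values become 0
            + vf_errors
            + cf_errors
            + brs_total)
```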

Composite and Factor Score Output Variables

  Variable                Description
  executive_composite     Executive composite score
  executive_se            Standard error of executive composite
  fluency_factor          Fluency factor score
  fluency_se              Standard error of fluency
  cog_control_factor      Cognitive control factor score
  cog_control_se          Standard error of cognitive control
  working_memory_factor   Working memory factor score
  working_memory_se       Standard error of working memory



Chapter 14. Known Issues

  • If administering tasks on a Linux machine, the centering and alignment of on-screen text instructions may be incorrect.


