Hypertension specific patient-reported outcome measure. Part II: validation survey and item selection process

Aim. Improvement of health-related quality of life (HRQoL) is one of the basic principles of value-based medicine. HRQoL can be assessed with patient-reported outcome measures (PROMs), including in arterial hypertension (HTN). However, only generic PROMs are still used for HTN patients. Previously, a group of experts created the primary version of an HTN-specific PROM. The purpose of the second part was to conduct a validation survey and to select the items in a statistically based manner. Material and methods. The validation survey was conducted in a large multidisciplinary center among patients with stage 1-3 HTN and healthy volunteers. Inclusion criteria were age >18 years, ability to understand and complete the scale themselves, and absence of significant illness requiring hospitalization. The items were selected according to the principles of classical test theory (CTT) and item response theory (IRT). The criteria for CTT were sensitivity (standard deviation and coefficient of variation with corresponding confidence intervals), representativeness (item-total Pearson's correlation coefficient) and internal consistency (Cronbach's α coefficient). In the IRT analysis, two criteria were adopted: the values of four degrees of difficulty and the discrimination estimate. Each question was evaluated according to 8 criteria. An item was considered for selection when it met ≥4 criteria. The expert panel considered the practical significance of each item. Results. A total of 430 questionnaires were distributed, and 407 (94.7%) of them were returned completed (359 from hypertensive patients, mean age 62.3±11.7 years; 48 from healthy volunteers, mean age 38.8±10.5 years). The average time for completing the PROM was 24±4.2 minutes. Of 163 questions, 27 met all 8 criteria and 3 did not meet any. Of the 36 HTN-specific questions, 11 met ≥5 criteria; in the generic part, there were 87 such questions (33 in the PHY domain, 35 in PSY, 8 in SOC, 11 in THER). A symmetric distribution of criteria was seen in 25 questions, of which 11 were retained after expert evaluation. For 40 questions, <4 criteria were met, of which 9 were retained after expert review. The PROM draft contained 80 questions (19 in the physiology domain, 22 in psychology, 6 in social, 13 in therapy, and 20 HTN-specific items). Conclusion. The CTT and IRT methods made it possible to reduce the PROM volume without losing semantic richness and without the need to reorganize the conceptual structure. The next step is the validation of the scale.

Hypertension (HTN), as the leading cause of premature mortality and disability [1], is predicted to maintain a leading position until 2040 [2]. Undoubtedly, the main aim of health care is to save and prolong life. However, improving quality of life (reducing the severity of symptoms, lightening the psychological and social burden of disease) is equally important. Like most chronic pathologies, HTN affects health-related quality of life (QOL) to a degree ranging from insignificant to quite significant [3]. This fact is of particular interest since there are more than a billion HTN patients around the world, and treatment is insufficiently effective in most of them [4].
Recent scientific evidence shows a decrease in QOL in patients with uncontrolled HTN; however, even with the undoubted effectiveness of some antihypertensive drugs and rational treatment regimens, treatment itself can also negatively affect QOL [5]. It is possible to "measure" the symptoms and the influence of the disease on the psychological and social fields of the patient's life with patient-reported outcome measures (PROMs), a highly effective tool for translating subjective perception into an objective assessment.
Since the use of PROMs in routine clinical practice makes it possible to change the paradigm of decision making, making it more personalized, medical care is expected to improve at the population level as well. International and Russian guidelines for the management of HTN patients are general in nature and are based on the "crude" stratification of patients according to the main objective parameters [6]. In turn, PROMs provide a more holistic and comprehensive assessment of QOL and of the treatment effect from the patient's point of view. PROMs analysis can provide the doctor with valuable information for implementing precision medicine. It is also worth noting that the results of PROMs analysis are sometimes the primary endpoints of clinical trials, replacing or complementing the "conventional" objective and laboratory goals.
Detailed algorithms for developing QOL assessment tools are given in the international guidelines. However, many researchers have noted difficulties in choosing a suitable PROM, since the choice often depends on the specific pathology, the clinical trial and the goals of the authors/experts. As with many chronic pathologies, patients with HTN are on a continuum of conditions determined by the degree of severity (which can be significantly alleviated in a short time) and the stage (almost unmanageable). Therefore, it is important to have reliable and concise questionnaires for patients with a certain pathology. In addition, adequate data obtained with disease-specific PROMs make it possible to carry out cost-utility analysis [7]. It is one of the most complex and sophisticated methods of economic analysis, and it is especially important in value-based healthcare (clinical, economic and patient-oriented benefits).
At the first stage, the process of creating a multidimensional and multivariate disease-specific PROM for HTN patients was described [8]. In the first part of the study, interviewing and pilot questioning of patients were carried out, followed by an assessment of the questionnaire structure, which greatly reduced it. The current stage is aimed at using special statistical methods for the analysis of psychological tests, which complement the qualitative examination.

Material and methods
The study was conducted in accordance with the Good Clinical Practice standards and the principles of the Declaration of Helsinki. The study protocol was approved by the local Ethics Committee. Prior to inclusion in the study, all participants gave written informed consent. The guidelines and documents of the Food and Drug Administration (FDA) [9], the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) [10], and the International Society for Quality of Life Research (ISOQOL) [11] were used for PROM creation and validation. The study was supported by a grant from the Russian Science Foundation (project № 17-15-01177).
Validation study. The survey with the primary PROM version [8] was conducted in an outpatient department of a large multidisciplinary medical center.
The main group consisted of patients with stage 1-3 HTN who were first seen by a hypertensiologist (one of the authors); antihypertensive treatment status was not taken into account. The second group of participants consisted of conditionally healthy volunteers.
The general inclusion criteria were age of at least 18 years and the ability to understand the purpose and the instructions for filling out the form and to independently read and answer the questions in the printed PROM form.
The main exclusion criteria were a cognitive deficit assessed subjectively by a physician or a diagnosis of grade 2 or higher encephalopathy, and a serious somatic pathology (cardiovascular or non-cardiac) requiring hospital (including surgical) treatment in the near future. All participants were asked to fill out the questionnaire after talking with a doctor and completing an informed consent form.
Statistical analysis. The selection of questions was based on the classical test theory (CTT) and item response theory (IRT).
When selecting questions by CTT, the following 6 criteria were used:
1. The sensitivity of the item was determined by the standard deviation (SD). A question was considered inappropriate if the SD was <1.0.
1.1. A 95% confidence interval (CI) of the SD above 1.0 was the criterion of acceptability; preferably, the lower confidence limit should meet this requirement.
2. The sensitivity of a question was also evaluated using the coefficient of variation (CV) of responses; a CV >20 units was required for a question to remain in the intermediate version of the questionnaire.
2.1. As for the CI of the SD, the same requirement was applied to the 95% CI of the CV: the lower confidence limit was supposed to be >20 units.
3. Question representativeness was evaluated based on the item-total correlation. A question was considered appropriate if the Pearson's correlation coefficient between the mean value of the responses to the question and that of its area exceeded the critical value of 0.5.
When selecting questions by IRT, 2 criteria were used, the values of which were estimated by the maximum likelihood method:
1. Discrimination estimate (coefficient a). The key principle was as follows: the higher the coefficient a, the higher the informative value of the item.
2. Value of the difficulty degree, determined by four coefficients b1, b2, b3 and b4 satisfying the inequality b1<b2<b3<b4. The values of b1 and b4 had to be in the range from -3 to +3. Questions whose values fell outside this range were removed from the questionnaire, since the distribution of answers to them would be shifted to one of the extremes (responses with points 1 or 5).
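The difficulty criterion can be made concrete with a short sketch. The text does not name the IRT model, so the graded response model used below is an assumption (it is a common choice for five-point items with four ordered thresholds). The function names and the inclusive treatment of the -3 to +3 bounds are likewise illustrative and are not the authors' implementation.

```python
import math


def category_probabilities(theta, a, thresholds):
    """Probability of each response category under a graded response model.

    theta      -- latent trait level of the respondent
    a          -- discrimination coefficient of the item
    thresholds -- ordered difficulty coefficients (b1..b4 for a 5-point item)
    """
    # Cumulative probability of answering above each threshold.
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with 1 and 0 so adjacent differences give the category probabilities.
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


def passes_difficulty_criterion(thresholds, low=-3.0, high=3.0):
    """Check b1 < b2 < ... and that the first and last thresholds lie in [low, high].

    Assumption: the bounds are treated as inclusive.
    """
    ordered = all(x < y for x, y in zip(thresholds, thresholds[1:]))
    in_range = low <= thresholds[0] and thresholds[-1] <= high
    return ordered and in_range


# Hypothetical item parameters, for illustration only.
example_thresholds = [-2.0, -1.0, 1.0, 2.0]
example_probs = category_probabilities(theta=0.0, a=1.0, thresholds=example_thresholds)
```

When b1 or b4 lies far outside the range, almost all respondents in the usual trait range fall into the first or the last category, which is the shift toward points 1 or 5 that the criterion is meant to exclude.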
Each question was evaluated based on the eight criteria described above. If an item met four or more of them, it could be retained in the questionnaire. In addition, the significance and semantic richness of each question were reassessed by the expert group. Thus, using both statistical analysis and expert review, the logic and informative value of the second intermediate PROM version were formed (Fig. 1).
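The selection rule can be sketched as follows. This is a minimal illustration under stated assumptions: the response data are hypothetical, the CV is expressed in percent, the sample SD is used, and the criterion names in the dictionary (including the values for the criteria that are not computed here) are placeholders and not the authors' code.

```python
import math
import statistics


def coefficient_of_variation(responses):
    """Sample SD divided by the mean, in percent (the unit is an assumption)."""
    return 100.0 * statistics.stdev(responses) / statistics.mean(responses)


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def criteria_met(flags):
    """Number of criteria an item satisfies."""
    return sum(1 for passed in flags.values() if passed)


def statistically_eligible(flags, minimum=4):
    """An item is eligible when it meets at least `minimum` of the criteria."""
    return criteria_met(flags) >= minimum


# Hypothetical responses of eight respondents to one 5-point item,
# and their hypothetical scores on the area the item belongs to.
item = [1, 2, 3, 4, 5, 3, 2, 4]
area_score = [10, 14, 15, 20, 24, 16, 12, 19]

flags = {
    "sd": statistics.stdev(item) >= 1.0,
    "cv": coefficient_of_variation(item) > 20.0,
    "item_total_r": pearson_r(item, area_score) > 0.5,
    # The remaining criteria are not computed here; the values are placeholders.
    "sd_ci_lower": False,
    "cv_ci_lower": True,
    "internal_consistency": False,
    "discrimination": False,
    "difficulty": False,
}
eligible = statistically_eligible(flags)
```

The rule only marks an item as statistically eligible; as described above, the expert group could still remove an eligible item or retain an ineligible one.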
Statistical processing of the results was performed using the non-profit open source software package R Statistics (ver. 3.1.0, The R Foundation for Statistical Computing, Vienna, Austria) and the SPSS software package (ver. 23.0, IBM, Chicago, IL, USA). The level of statistical significance for differences was set at p<0.05. The following specialized programs were also used: IRTShiny Version 1.1 (http://kylehamilton.net/shiny/IRT-Shiny/) and Classical Test Theory (Item Analysis) (http://kylehamilton.net/shiny/CTTShiny). The jamovi software (https://www.jamovi.org/) was used for the reliability analysis.
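For orientation, reliability of a set of items is commonly summarized by Cronbach's α, which the abstract names as the internal-consistency criterion. The sketch below shows the standard formula in plain Python; it is not the jamovi procedure used in the study, the function name is illustrative, and the data are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items -- list of per-item response lists; each inner list holds one
             response per respondent, in the same respondent order.
    """
    k = len(items)
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(statistics.variance(column) for column in items)
    # Sample variance of each respondent's total score.
    totals = [sum(row) for row in zip(*items)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


# Hypothetical responses of five respondents to two items.
alpha = cronbach_alpha([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
```

The formula is undefined when all respondents have the same total score, so a real analysis would need to guard against a zero total variance.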

Results
Analysis of the frequency distribution showed that 11.4% of the questions were unanswered.
The missing data were analyzed with Little's Missing Completely at Random (MCAR) test: χ²=347, p=0.39. The result is consistent with the omissions being random. Missing data were recovered by the multiple imputation method.
The initial version of the questionnaire, formed according to the conceptual framework, consisted of a general ("non-specific") part, which included the areas of "physiology" (PHY) with 43 questions (5 sub-areas: physical symptoms, general well-being and vitality, self-assessment, the limiting effect of physical health, dynamics of physical health) and "psychology" (PSY) with 42 questions (5 sub-areas: emotional and behavioral symptoms, cognitive symptoms, psychological well-being, the limiting effect of mental health, dynamics of mental health). The general part also included the "social" area (SOC), which contained 15 questions (4 sub-areas: social frustration, social resources, the effect of physical and mental health on social activity), and the "therapy" domain (THER) with 27 items (6 sub-areas: therapy satisfaction, therapy-related physical changes, therapy-related psychological changes, the effect of the treatment regimen on daily life, adherence to treatment). The HTN-specific part included 13 questions in the PHY and THER subdomains similar to the general part, 4 questions in the PSY subdomain and 6 in the SOC subdomain.
Thus, a total of 163 questions were assessed (36 questions concerned only HTN). For each question, the CTT and IRT criteria values are presented in Table 1. Twenty-seven questions met all 8 criteria (2 HTN-specific questions, 20 in the PHY area, 4 in the PSY area, 1 in the SOC area). Despite satisfactory values, questions PHY_4_3, PHY_4_5, and PHY_4_8 were removed due to low practical significance, and questions PHY_4_10 and PHY_4_12 due to duplication. Questions PHY_4_14 and PHY_4_17 were not included in the intermediate version because similar questions were in the HTN-specific part (HTN_SOC_5 and HTN_SOC_7, respectively). The following three questions did not meet any of the criteria: THER_6_7 "How often do you take medicine on friend recommendations or on your own without prescription?", THER_7_1 "How often do you miss scheduled appointments with a doctor?", and HTN_THER_14 "How often do you eat fast food?". These questions were excluded from the intermediate version of the questionnaire.
Eleven questions of the HTN-specific part met ≥5 criteria; there were 87 such items in the general part (33 in the PHY domain, 35 in the PSY area, 8 in the SOC area, 11 in the THER domain). After consistent expert review, 3 questions were removed from the HTN-specific part (HTN_SOC_4, HTN_THER_10, HTN_THER_11) due to low practical significance. In addition to the items already mentioned, the following 28 were excluded from the general part: 7 PHY questions (practical insignificance, duplication of the HTN-specific part, discrepancy with the HTN concept), 15 PSY questions (low reliability, practical insignificance), 3 SOC questions (practical insignificance, duplication of the HTN-specific part), and 3 THER questions (practical insignificance, duplication of the HTN-specific part).
A symmetric distribution of criteria was observed in 25 questions (9 HTN-specific). These questions were reassessed by the authors, and as a result 11 of them (5 HTN, 1 PSY and 5 THER questions) were retained in the intermediate version due to their semantic richness and practical significance.
Forty questions met fewer than 4 criteria (not counting the three questions described above); 9 of them were retained after expert review due to practical significance (7 HTN-specific questions, and questions PSY_5_5 and SOC_1_1 in the relevant areas).
Forty-six questions (21 HTN-specific) failed both basic statistical criteria of CTT and IRT (reliability and difficulty, respectively). Only 16 of them were retained in the intermediate version of the questionnaire (11 HTN-specific, 2 each in PSY and THER, 1 in SOC) due to their practical significance and semantic richness.
Due to the significant reduction of the question pool and to facilitate validation, the HTN-specific questions were integrated into the relevant areas of the general part. Thus, questions HTN_PHY_1-12 remained in the sub-area "physical symptoms"; items HTN_PSY_1-4 constituted an additional sub-area of "hypo- and hypernosognosia" in the PSY domain; HTN_SOC_3 was added to the sub-area "social resources in the HTN treatment", and questions HTN_SOC_5 and HTN_SOC_7 to the sub-area "the effect of physical health on social activity"; questions HTN_THER_1 and HTN_THER_2 were included in the subdomain "therapy satisfaction", and HTN_THER_3-5 in the subdomain "adherence to treatment". The sub-areas "dynamics of physical health" and "the effect of mental health on social activity" were removed from the intermediate version of the PROM. As a result, the intermediate version included 80 questions (19 PHY, 22 PSY, 6 SOC, 13 THER, and 20 HTN-specific items) (Annex 1).
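As a quick arithmetic check using only the counts reported in the text, the domain counts sum to the stated total, and the intermediate version keeps roughly half of the initial 163-question pool:

```latex
\[
19 + 22 + 6 + 13 + 20 = 80, \qquad \frac{80}{163} \approx 0.49
\]
```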

Discussion
Expert selection, creating a conceptual framework and developing a questionnaire are some of the most important and difficult steps. However, once these steps are taken, it becomes necessary to select the meaningful, practically significant and most reliable questions. An important requirement in the second part of the study was to obtain a sufficient amount of data for statistical analysis.
The most common method of social and psychological research is a mass survey. This is particularly important when developing a new PROM or adapting well-known foreign-language ones, since the contribution of patients and the subsequent interviewing is one of the ways to confirm content validity [9]. Therefore, HTN patients, especially ambulatory ones, were broadly covered in conditions close to real clinical practice. Nevertheless, a mass survey inevitably leads to incomplete data, which is often associated with the inaccessibility, fatigue or inattention of respondents (when completing large questionnaires), and with the cultural, ethnic and social characteristics of individuals [12]. Given that Little's MCAR test indicated random omissions, the loss of about a tenth of the required data was most likely due to the fatigue or inattention of the respondents.
The resulting data pool of responses became the basis for evaluating each unit of the HTN-specific PROM according to both CTT and IRT.
CTT methods, also called true score theory, are clear and readily available for use. SD and reliability were considered the main CTT criteria in this work. It should be noted that the authors did not evaluate factor loadings because of the large number of questions, areas and sub-areas, which made the probability of an unreliable distribution by factors rather high. For the Hyper-PRO questionnaire, an exploratory factor analysis was carried out within the initial selection of questions [13]. That was reasonable given the small initial pool.
It should be noted that, despite its wide recognition, CTT does not take into account the latent traits and abilities of the respondents, and therefore the reliability assessment may be inadequate. In addition to CTT, an IRT method was used, based on item characteristic curves and the difficulty of questions. IRT has three advantages over CTT: the assessment of the respondent's ability does not depend on a specific question; the assessment does not depend on the study population; and the accuracy of the ability assessment can also be determined. When using IRT, it is possible to determine the nonlinear relationship between the respondent's response and the respondent's latent trait, or to describe the relationship between the response and the factor underlying the question. However, IRT principles are rather difficult to understand, and therefore its application is limited mainly to educational tests [14].
Reliability, size and content are important characteristics affecting the CTT application for PROM development. There is a growing understanding among specialists that combinations of high-quality questions can contribute to the development of the most valid and concise PROMs, which would reduce the "respondent burden". Therefore, attempts are being made not to evaluate the PROM as a holistic concept, but to determine the reliability and importance of questions based on the characteristics of the patients' responses. The active introduction of IRT and computer programs for adaptive testing made it possible to create the PROMIS system (Patient-Reported Outcomes Measurement Information System). From a large database of questions, a small number of the most informative items are selected based on the patient's characteristics [15].
Most of the questions were removed on the basis of the combination of CTT and IRT. However, items that were significant for the overall structure were retained based on the expert review, even though some of them did not meet the criteria. For example, most of the inappropriate questions of the HTN-specific part were retained, while the questions of the last THER subdomain (in the general and HTN-specific parts) were excluded, and the sub-area "adherence to treatment" was significantly reduced. Items of this area were developed based on the clinical judgment of the authors and the theoretical problems of treating HTN patients. Cultural, sociological and age-related characteristics, along with sample bias, could probably be associated with the insufficient compliance with the selection criteria.

Conclusion
The development of a disease-specific questionnaire based on the outcomes reported by HTN patients passed the second stage using the CTT and IRT methods and expert review. The results obtained led to its twofold reduction through the exclusion of inappropriate (duplicate, unreliable, difficult to understand) items. In addition, the prior structure and conceptual framework have been retained in the intermediate PROM version. The next step is to analyze validity, reliability and sensitivity.
Funding. The study was supported by a grant of the Russian Science Foundation (project № 17-15-01177).
Conflicts of Interest: nothing to declare.

Health-related Quality-of-Life Questionnaire for Patients with Hypertension
Please answer questions regarding your general state, mood and treatment. Your answers will help your doctor work to improve the quality of care. Answer each question by marking the answer you have chosen as stated. If you are not sure how to answer the question, please choose the answer that most accurately reflects your view.