Assessing "spin" in urology randomized controlled trials with statistically non-significant primary outcomes

The Journal of Urology (2023)

Journal of Urology, Review Articles, 1 Mar 2023
This article is commented on by the following: Editorial Comment; Editorial Comment.

Authors: Jeremy Wu, Wilson Ho, Laurence Klotz, Morgan Yuan, Jason Y. Lee, and Yonah Krakowsky

Jeremy Wu (https://orcid.org/0000-0002-5304-3552): Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
Wilson Ho: Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
Laurence Klotz: Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada; Division of Urology, Department of Surgery, Sunnybrook Hospital, Toronto, Ontario, Canada
Morgan Yuan: Division of Plastic, Reconstructive & Aesthetic Surgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
Jason Y. Lee: Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada; Division of Urology, Department of Surgery, University Health Network, Toronto, Ontario, Canada
Yonah Krakowsky: Division of Urology, Department of Surgery, University of Toronto, Toronto, Ontario, Canada; Division of Urology, Department of Surgery, Women’s College Hospital and Mount Sinai Hospital, Toronto, Ontario, Canada

Correspondence: Yonah Krakowsky, Division of Urology, Department of Surgery, Women's College Hospital and Mount Sinai Hospital, 60 Murray St, Toronto, ON, M5T 3L9, Canada; telephone: 416.586.4800; e-mail: [email protected]

https://doi.org/10.1097/JU.0000000000003105

Abstract
Purpose: “Spin” refers to a form of language manipulation that positively reflects negative findings or downplays potential harms. Spin has been reported in randomized controlled trials of other surgical specialties, which can lead to the recommendation of subpar or ineffective treatments. The goal of this study was to characterize spin strategies and severity in statistically nonsignificant urology randomized controlled trials.
Materials and Methods: A comprehensive search of MEDLINE and Embase for the top 5 urology journals, major urology subspecialty journals, and high-impact nonurology journals from 2019 to 2021 was conducted. Statistically nonsignificant randomized controlled trials with a defined primary outcome were included. Screening, data extraction, and spin assessment were performed in duplicate by 2 independent reviewers.
Results: From the database search of 4,339 studies, 46 trials were included for analysis.
Spin was identified in 35 studies (76%), with the majority of abstracts (n = 26, 57%) and main texts (n = 35, 76%) containing some level of spin. “Obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results” was the most frequently used strategy in abstracts, while “other” strategies not previously defined were the most commonly used strategies in main texts. Moderate or high spin severity was identified in 21 (46%) abstract and 22 (48%) main text conclusions.
Conclusions: Overall, our results suggest that 76% of statistically nonsignificant urology randomized controlled trials contained some level of spin. Readers and writers should be aware of common spin strategies when interpreting nonsignificant results and critically appraise the significance of results when making decisions for clinical practice.

Randomized controlled trials (RCTs) are considered the gold standard for comparing the effectiveness of therapeutic interventions when designed, interpreted, and reported appropriately.1 Through randomization, participant characteristics are balanced between groups, allowing attribution of any differences in outcome to the study intervention.1 As such, they have high internal validity and allow clinicians to confidently draw conclusions about the treatments and outcomes assessed. An RCT that is performed without methodological rigor, however, can fall victim to biases that impact the validity of its conclusions.
Spin is a construct that refers to the manipulation of language in reporting that may influence the interpretation of results, such as presenting a study more favorably than the actual results reflect or downplaying potential harms.2,3 Spin may involve emphasizing the beneficial effect of a treatment despite statistically nonsignificant results or distracting the reader from statistically nonsignificant findings, which can lead to false statements being made about interventions and the subsequent recommendation of subpar or ineffective treatments to patients.2 As such, it is crucial for clinicians and researchers to be aware of spin when interpreting evidence and reporting their own research. This is a particularly important issue in surgical trials, which are often underpowered and, as such, may fail to achieve statistical significance for their primary end point.4 To help readers recognize and evaluate spin in RCTs, Boutron et al developed criteria to classify common spin strategies and categorize spin severity.5 These criteria have been used to assess spin in RCTs of various surgical specialties.6-9 In vascular surgery literature, 72%-77% of RCTs with nonsignificant primary outcomes contained spin in their main text, while 70% of otolaryngology RCTs contained spin in their abstract.6,8 To our knowledge, there have not been any studies looking at spin in published urology RCTs, but evidence demonstrates urology RCTs have relatively poor compliance to the CONSORT statement—established guidelines that facilitate complete and transparent reporting of RCT results—suggesting the possibility of significant spin.10,11 The primary objective of this study was to identify the strategies and severity of spin in the abstract and main text of statistically nonsignificant urology RCTs. The secondary objective was to identify whether specific study characteristics are associated with the severity of spin in the conclusions section of both abstracts and main texts. 
Materials and Methods

This systematic review was conducted with adherence to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines (supplementary Appendix A, https://www.jurology.com) and was prospectively published on Open Science Framework registries (https://osf.io/k8z6a/).

Search Strategy
A comprehensive search of MEDLINE and Embase from January 1, 2019, to December 31, 2021, was conducted. The search included relevant RCTs from the top 5 urology journals based on 2021 Web of Science impact factors (European Urology, European Urology Oncology, European Urology Focus, The Journal of Urology®, BJU International), major urology subspecialty journals (Journal of Sexual Medicine, Andrology, Journal of Endourology, Journal of Pediatric Urology, International Neurourology Journal), and urology RCTs published in high-impact nonurology journals (New England Journal of Medicine, Lancet, Canadian Medical Association Journal, and Journal of the American Medical Association). The search strategy was designed to yield a representative sample of the urological literature, so a timeframe of 3 years and a large breadth of journals were chosen to identify published urological RCTs. The detailed search strategy for both databases can be found in supplementary Appendix B (https://www.jurology.com).

Eligibility Criteria
Articles were included if they (1) were RCTs relevant to urology; (2) were published between January 1, 2019, and December 31, 2021; and (3) had a clearly defined primary outcome(s) that was statistically nonsignificant (ie, P > .05).
Studies were excluded if they lacked a specific primary outcome, had a statistically significant primary outcome (P < .05), were not published in English, were not human-based, or had an inappropriate study design (phase 1 and 2 trials, factorial and split-body designs, cluster trials, equivalence or non-inferiority trials, crossover trials, multigroup trials, observational studies, case reports, systematic reviews [and meta-analysis], diagnostic tests, research letters, pilot/feasibility studies, post hoc/subgroup analyses of published clinical trials, early/interim/preliminary reports of ongoing clinical trials).

Data Collection and Extraction
Two independent reviewers screened the studies identified within the search parameters (title and abstract, then full text) and extracted data from them. A pilot screening assessment was performed on 10 studies to ensure agreement between reviewers. Results were collected through the Covidence software for systematic reviews (Veritas Health Innovation Ltd) and Microsoft Excel (Microsoft, Redmond, Washington). In the case of discrepancies between reviewers in screening or data collection, a joint discussion between the 2 reviewers and the senior author took place to resolve conflicts. If any of the inclusion criteria were unclear based on the abstract alone, the study was included for full-text review. All studies excluded at the full-text stage of screening have an outlined reason for exclusion. General characteristics data (eg, journal impact factor, number of PubMed citations) were extracted from the included RCTs independently by 2 authors in duplicate. The journals’ impact factors were obtained through the 2021 Incites Journal Citation Report. When multiple funding sources were listed, “industry funding” and “public funding” were only recorded if all funding sources fell into either category. Otherwise, “industry and public funding” was recorded.
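The funding-source rule above can be sketched as a small function. This is an illustrative reading of the rule, not the authors' extraction tooling: the function name `classify_funding`, the input labels `"industry"` and `"public"`, and the handling of empty or missing input (mapped to the "None" and "None reported" categories that appear in Table 1) are all assumptions.

```python
from typing import Iterable, Optional


def classify_funding(sources: Optional[Iterable[str]]) -> str:
    """Map a trial's listed funding sources to one category.

    Hypothetical helper: each source is assumed to be pre-labelled as
    "industry" or "public". Passing None stands for a trial that did not
    report funding; an empty list stands for a trial declaring no funding.
    """
    if sources is None:
        return "none reported"  # assumption: funding statement absent

    kinds = set(sources)
    if not kinds:
        return "none"  # assumption: trial explicitly declared no funding

    unknown = kinds - {"industry", "public"}
    if unknown:
        raise ValueError(f"unrecognised funding labels: {sorted(unknown)}")

    # A single-category label is recorded only when every source agrees.
    if kinds == {"industry"}:
        return "industry funding"
    if kinds == {"public"}:
        return "public funding"
    # Any mix of the two falls into the combined category.
    return "industry and public funding"


if __name__ == "__main__":
    print(classify_funding(["public", "public"]))
    print(classify_funding(["industry", "public", "public"]))
```

Using a set makes the "all sources fall into one category" condition explicit: the set collapses to a single element only when every listed source carries the same label.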
A full list of study characteristics extracted from each study can be found in supplementary Appendix C (https://www.jurology.com).

Spin Assessment
Spin was assessed in the results and conclusions sections of the abstract, and in the results, discussion, and conclusions sections of the main texts of the selected studies. Two reviewers independently assessed the content of each RCT using a standardized data abstraction form, with a preliminary review of 10 RCTs to ensure agreement before evaluating all RCTs. All discrepancies were discussed to obtain consensus. The assessment criteria for strategies and severity of spin were the same as those outlined by Boutron et al and are found below.5

Strategies of Spin
The type of spin implemented in abstracts and main texts was classified under one of the 3 strategies listed below. Strategies of spin that could not be classified under one of these 3 schemes were documented as “other.”
1. Obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results such as within-group comparison, secondary outcomes, subgroup analyses, and modified population of analyses
2. Interpreting statistically nonsignificant results for the primary outcomes as showing treatment equivalence or comparable effectiveness
3. Claiming or emphasizing the beneficial effect of the treatment despite statistically nonsignificant results

Severity of Spin
The severity of spin within the conclusion sections of study abstracts and main texts was evaluated as one of high, moderate, low, or no spin. High spin is defined as no uncertainty in the framing, no recommendations for further trials, and no acknowledgment of the statistically nonsignificant results for the primary outcomes. In addition, when the conclusions section reported recommendations to use the treatment in clinical practice, we classified this section as having a high level of spin.
Moderate spin is defined as some uncertainty in the framing or recommendations for further trials but no acknowledgment of the statistically nonsignificant results for the primary outcomes. Low spin is defined as uncertainty in the framing and recommendations for further trials or acknowledgment of the statistically nonsignificant results for the primary outcomes. No spin is defined as no uncertainty in the framing of results. Statistical Analysis All statistical analyses were calculated using GraphPad Prism software (GraphPad Software Inc, San Diego, California). Statistical results were considered significant at an alpha threshold of .05 using 2-tailed tests. The Spearman ρ correlation test was performed to assess for any associations between the severity of spin (abstract or main text) and number of citations per year and journal impact factor. The χ2 linear-by-linear association test with Bonferroni correction was conducted to assess the association between severity of spin (abstract or main text) and authors’ conflict of interest. The Kruskal-Wallis test with Dunn post hoc method was used to identify any associations between severity of spin (abstract or main text) and funding source. Results Search Results The initial database search performed on May 30, 2022, returned 4,339 studies; 993 duplicates were removed for 3,346 studies to be reviewed at title and abstract screening. Subsequently, 2,823 studies were excluded for not being a urology study or not meeting the pre-specified RCT design outlined in the eligibility criteria above, and 523 studies proceeded to full-text screening; ultimately, 46 studies were included for analysis (see Figure). The list of included studies can be found in supplementary Appendix D (https://www.jurology.com). The characteristics of our included studies are summarized in Table 1 and detailed breakdown of characteristics for individual studies is available in supplementary Appendix E (https://www.jurology.com). Figure. 
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) diagram. RCT indicates randomized controlled trial.

Table 1. Summary of Study Characteristics

| Characteristic | Value |
|---|---|
| Year, No. (%) | |
| 2019 | 16 (34) |
| 2020 | 15 (33) |
| 2021 | 15 (33) |
| Journal (2021 Web of Science impact factor), No. (%) | |
| European Urology (24.267) | 12 (26) |
| European Urology Focus (5.952) | 2 (4) |
| European Urology Oncology (8.208) | 2 (4) |
| The Journal of Urology® (7.600) | 8 (18) |
| BJU International (5.969) | 6 (13) |
| Journal of Sexual Medicine (3.937) | 1 (1) |
| Journal of Endourology (2.619) | 6 (13) |
| Journal of Pediatric Urology (1.921) | 3 (7) |
| Lancet (202.731) | 3 (7) |
| Journal of the American Medical Association (157.335) | 3 (7) |
| Journal impact factor, median (range) | 7.6 (1.921-202.731) |
| No. PubMed citations, median (range) | 4 (0-103) |
| Urological subspecialty, No. (%) | |
| Pediatric urology | 3 (7) |
| Uro-oncology | 24 (52) |
| Endourology | 5 (21) |
| Male infertility | 3 (7) |
| Renal transplant | 1 (2) |
| Neurourology | 5 (21) |
| Female urology | 4 (9) |
| Reconstructive urology | 1 (2) |
| Sample size, median (range) | 175 (30-76,683) |
| Presence of author conflict of interest, No. (%) | |
| Yes | 23 (50) |
| No | 23 (50) |
| Funding source, No. (%) | |
| None | 12 (26) |
| Industry funding | 5 (11) |
| Public funding | 17 (37) |
| Industry and public funding | 10 (22) |
| None reported | 2 (4) |

General Spin Analysis
Among the 46 included studies, strategies of spin were identified in 35 (76%); the majority of abstracts (n=26, 57%) and main texts (n=35, 76%) contained some level of spin. The prevalence of specific spin strategies within our included studies is detailed in Table 2. Table 3 provides an example, rationale, and implication of each spin strategy characterized by Boutron et al,5 as well as “other” strategies commonly used in the included studies. Examples for each severity of spin level and rationale for that rating can be found in Table 4. Cohen’s kappa of 0.79 was obtained for the spin analysis, indicating substantial agreement between raters and high inter-rater reliability.

Table 2. Spin Strategies by Section

| Section | Spin strategy | Abstract, No. (%) | Main text, No. (%) |
|---|---|---|---|
| Title | Misleading title | NA | 3 (7) |
| Results | Total | 10 (22) | 21 (46) |
| Results | Obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results such as within-group comparison, secondary outcomes, subgroup analyses, and modified population of analyses | 5 (11) | 3 (7) |
| Results | Interpreting statistically nonsignificant results for the primary outcomes as showing treatment equivalence or comparable effectiveness | 1 (1) | 1 (1) |
| Results | Claiming or emphasizing the beneficial effect of the treatment despite statistically nonsignificant results | 0 (0) | 2 (4) |
| Results | Others | 5 (11) | 17 (37) |
| Discussion | Total | NA | 26 (57) |
| Discussion | Obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results such as within-group comparison, secondary outcomes, subgroup analyses, and modified population of analyses | NA | 6 (13) |
| Discussion | Interpreting statistically nonsignificant results for the primary outcomes as showing treatment equivalence or comparable effectiveness | NA | 8 (17) |
| Discussion | Claiming or emphasizing the beneficial effect of the treatment despite statistically nonsignificant results | NA | 6 (13) |
| Discussion | Others | NA | 18 (39) |
| Conclusion | Total | 25 (54) | 29 (63) |
| Conclusion | Obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results such as within-group comparison, secondary outcomes, subgroup analyses, and modified population of analyses | 11 (24) | 11 (24) |
| Conclusion | Interpreting statistically nonsignificant results for the primary outcomes as showing treatment equivalence or comparable effectiveness | 10 (22) | 10 (20) |
| Conclusion | Claiming or emphasizing the beneficial effect of the treatment despite statistically nonsignificant results | 6 (13) | 7 (15) |
| Conclusion | Others | 5 (11) | 13 (28) |

Abbreviation: NA, not applicable.

Table 3.
Example, Rationale, and Implication of Spin Strategies From Reviewed Articles

Study citation(a): 36
Strategy: Obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results
Relevant information:
• Standard care with 3 mo supervised aerobic and resistance exercise training vs standard care alone
• Primary outcome: difference in fat mass at 3 mo
• Results: no significant difference in body composition (P = .18)
Excerpt: “A short-term programme of supervised exercise in patients with prostate cancer beginning ADT results in sustained improvements in QoL and cardiovascular events risk profile.”
Rationale and implication:
• The authors do not mention the nonsignificant primary outcome of fat mass at 3 mo
• Only discuss the significant secondary results of QoL and cardiovascular complications as benefits of the intervention in the abstract conclusion
• They therefore contradict the primary outcome that was defined a priori and obscure the implications of their nonsignificant finding

Study citation(a): 37
Strategy: Interpreting statistically nonsignificant results for the primary outcomes as showing treatment equivalence or comparable effectiveness
Relevant information:
• LPN vs RAPN for treating renal masses
• Primary outcome: renal function preservation, assessed by RS, at 6 mo after surgery
• Results: no statistical difference in RS values between the 2 groups (P = .6)
Excerpt: “In terms of preserving renal function, LPN in total ischemia and RAPN in selective ischemia are comparable.”
Rationale and implication:
• The authors claim both treatments are equally effective at preserving renal function
• Cannot claim equivalence as the study was designed as a superiority trial rather than a non-inferiority trial
• Conclusions may bias readers to conclude interventions are comparable/equal although the study was not designed to do so

Study citation(a): 44
Strategy: Claiming or emphasizing the beneficial effect of the treatment despite statistically nonsignificant results
Relevant information:
• HMPO2 vs HMP
• Primary outcome: renal function 12 mo following transplantation, measured by eGFR
• Results: eGFR did not significantly differ between groups at 12 mo (P = .12)
Excerpt: “At 12 mo post-transplant, the eGFR was higher in the HMPO2 group than in the HMP group (mean difference 3.7 mL/min/1.73 m2, 95% CI −1.0 to 8.4; P = .12) for those pairs of which both donor kidneys were still functioning.”
Rationale and implication:
• The authors claim that eGFR 6 mo after transplantation with HMPO2 is higher than with HMP, although the difference was statistically nonsignificant
• For results that demonstrate statistical nonsignificance, claiming/emphasizing a beneficial effect for that outcome may mislead readers to believe an intervention is better than the control/comparator

Study citation(a): 30
Strategy: Other: emphasis of trend despite nonsignificant results
Relevant information:
• Docetaxel and androgen deprivation therapy vs androgen deprivation therapy alone
• Primary outcome: PSA progression, defined by PSA >2.0 mg above nadir
• Results: the risk of progression over time in the 2 arms did not significantly differ (P = .6)
Excerpt: “The interaction between GS class (GS >8/GS ≤8) and treatment group was close to significant (P = .059), and there was a tendency toward a treatment benefit in the high-risk (Gleason 9–10) subgroup (n = 80) with HR 0.67 (95% CI 0.34–1.30, P = 0.2) for PSA progression in arm A (docetaxel) vs arm B (surveillance; Fig. 3).”
Rationale and implication:
• The authors emphasized a tendency toward treatment benefit despite nonsignificant changes
• Studies with nonsignificant results should not suggest a trend or tendency, as their data do not support this claim and comprise a clear strategy to misguide readers

Abbreviations: ADT, androgen deprivation therapy; CI, confidence interval; eGFR, estimated glomerular filtration rate; GS, Gleason score; HMP, hypothermic machine perfusion without oxygenation; HMPO2, oxygenated hypothermic machine perfusion of the donor kidney; HR, hazard ratio; LPN, laparoscopic partial nephrectomy; PSA, prostate-specific antigen; QoL, quality of life; RAPN, robot-assisted partial nephrectomy; RS, renal scintigraphy.
(a) Citations included can be found in supplementary Appendix D (https://www.jurology.com).
Bolded text refers to the negative implications of using the corresponding spin strategy.

Table 4. Example and Rationale for Severity of Spin From Reviewed Articles

Study citation(a): 6
Severity: High
Relevant information:
• Super-mini PCNL vs standard PCNL
• Primary outcome: stone-clearance rates
• Results: no significant differences in stone-clearance rates between groups (P = .56)
Excerpt: “SMP is a newer miniaturised PCNL technique with good efficacy, reduced morbidity and hospital stay, which benefits the patient and national health expenditure. The smaller the PCNL tract size the lesser the morbidity, with an unaffected stone-clearance rate.”
Rationale for rating:
• There was no acknowledgment of the statistically nonsignificant difference between groups in the primary outcome
• Emphasizes the benefits of SMP with no uncertainty in the framing or recommendations for further trials
• May mislead readers to believe the intervention is superior to standard PCNL, despite the lack of evidence to suggest this

Study citation(a): 10
Severity: Moderate
Relevant information:
• Oral nutrition supplement vs multivitamin multimineral supplement during 8-wk perioperative period for radical cystectomy
• Primary outcome: 30-day hospital-free days
• Results: no significant difference in 30-day hospital-free days (P = .77)
Excerpt: “Patients who undergo radical cystectomy after consuming an oral nutrition supplement perioperatively have a reduced prevalence of sarcopenia and may also experience fewer and less severe complications and readmissions. A larger blinded, randomized, controlled trial is necessary to determine whether oral nutrition supplement interventions can improve outcomes following radical cystectomy.”
Rationale for rating:
• The authors do not acknowledge the statistically nonsignificant results of their primary outcome
• There is some uncertainty in the framing when discussing their statistically nonsignificant secondary outcomes (ie, complications and readmissions)
• There is a call for the need for more robust research to further characterize the impact of the intervention on radical cystectomy outcomes

Study citation(a): 21
Severity: Low
Relevant information:
• Extended vs limited PLND in bladder cancer
• Primary outcome: RFS
• Results: no significant differences in RFS between the 2 groups (P = .36)
Excerpt: “This trial assessing the therapeutic benefit of extended vs limited LND at the time of RC for urothelial BCa failed to show a significant improvement in the primary endpoint RFS and the secondary endpoints CSS and OS. There were survival differences between groups, but these did not reach conventional levels of statistical significance. A larger trial would be required to determine whether extended compared with limited LND leads to a small, but clinically relevant, survival difference.”
Rationale for rating:
• Acknowledges that the difference in their primary outcome of RFS was not significantly different between groups
• Highlights the need for future trials
• However, they claim that their nonsignificant outcomes of survival (ie, RFS, CSS, OS) differed between groups, potentially misleading readers to believe that one intervention is better than the other

Study citation(a): 3
Severity: None
Relevant information:
• Music during flexible cystoscopy vs no music
• Primary outcome: VAS pain scale and change in STAI-S for anxiety
• Results: no differences in either VAS or STAI-S scores between groups (P = .86 and P = .33)
Excerpt: “Music does not appear to decrease perceived pain or anxiety when used during flexible cystoscopy. These findings may differ from the literature due to several factors, most significantly blinding of participants, but also potentially due to the ethnic composition of the study population or lack of choice of music.”
Rationale for rating:
• No spin identified in the conclusion
• Acknowledged the nonsignificant differences between groups in both primary outcomes
• Highlighted reasons why their findings differ from literature

Abbreviations: BCa, bladder cancer; CSS, cancer-specific survival; LND, lymph node dissection; OS, overall survival; PCNL, percutaneous nephrolithotomy; PLND, pelvic lymph node dissection; RC, radical cystectomy; RFS, recurrence-free survival; SMP, super-mini PCNL; STAI-S, State-Trait Anxiety Inventory; VAS, visual analog scale.
(a) Citations included can be found in supplementary Appendix D (https://www.jurology.com).

Spin Analysis: Abstract
In the abstracts, 10 results sections (22%) and 25 conclusion sections (54%) contained spin.
The most commonly employed spin strategies were “obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results” and “other” in the results sections (n=5, 11% each) and “obscuring the statistical nonsignificance of the primary outcome and focusing on statistically significant secondary results” in the conclusion sections (n=11, 24%). High spin was identified in 9 studies (20%), moderate spin in 12 (26%), low spin in 4 (8%), and no spin in 21 (46%).

Spin Analysis: Main Text
In the main texts, 21 results sections (46%), 26 discussion sections (57%), and 29 conclusion sections (63%) contained spin. The most commonly employed spin strategy was “other” in the results (n=17, 37%), discussion (n=18, 39%), and conclusion sections (n=13, 28%), specifically “emphasis of trend despite nonsignificant results.” High spin was identified in 13 studies (28%), moderate spin in 9 (20%), low spin in 7 (15%), and no spin in 17 (37%).
Keywords: urology, spin, controlled trials, outcomes