DOI: 10.1145/3395363.3397364
research-article
Open access

Dependent-test-aware regression testing techniques

Published: 18 July 2020
    Abstract

    Developers typically rely on regression testing techniques to ensure that their changes do not break existing functionality. Unfortunately, these techniques suffer from flaky tests, which can both pass and fail when run multiple times on the same version of code and tests. One prominent type of flaky tests is order-dependent (OD) tests, which are tests that pass when run in one order but fail when run in another order. Although OD tests may cause flaky-test failures, they can help developers run their tests faster by allowing tests to share resources. We propose to make regression testing techniques dependent-test-aware to reduce flaky-test failures.
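    To make the notion of an OD test concrete, the following is a minimal, hypothetical JUnit 4 sketch (illustrative only, not taken from the paper's subjects): the second test passes only if the first test has already populated the shared cache, so the pair passes in one order and fails in the other.

```java
// Hypothetical example of an order-dependent (OD) test; the class and method
// names are illustrative, not from the evaluated modules.
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

public class CacheTest {
    // Shared mutable state that leaks from one test into another.
    static Map<String, String> cache = new HashMap<>();

    @Test
    public void testPopulateCache() {
        cache.put("key", "value");
        assertEquals(1, cache.size());
    }

    @Test
    public void testReadCache() {
        // Passes only if testPopulateCache already ran; fails when a
        // prioritization, selection, or parallelization order runs it first.
        assertEquals("value", cache.get("key"));
    }
}
```

    Run in declaration order the pair passes, but any technique that reorders, selects, or parallelizes the tests can surface the failure even though the code under test is unchanged.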
    To understand the necessity of dependent-test-aware regression testing techniques, we conduct the first study on the impact of OD tests on three regression testing techniques: test prioritization, test selection, and test parallelization. In particular, we implement 4 test prioritization, 6 test selection, and 2 test parallelization algorithms, and we evaluate them on 11 Java modules with OD tests. When we run the orders produced by the traditional, dependent-test-unaware regression testing algorithms, 82% of human-written test suites and 100% of automatically-generated test suites with OD tests have at least one flaky-test failure.
    We develop a general approach for enhancing regression testing algorithms to make them dependent-test-aware, and apply our approach to 12 algorithms. Compared to traditional, unenhanced regression testing algorithms, the enhanced algorithms use provided test dependencies to produce orders with different permutations or extra tests. Our evaluation shows that, in comparison to the orders produced by unenhanced algorithms, the orders produced by enhanced algorithms (1) have overall 80% fewer flaky-test failures due to OD tests, and (2) may add extra tests but run only 1% slower on average. Our results suggest that enhancing regression testing algorithms to be dependent-test-aware can substantially reduce flaky-test failures with only a minor slowdown to run the tests.
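    As an illustration of the enhancement's core idea, the sketch below post-processes an order produced by an unenhanced algorithm, assuming test dependencies are provided as a map from each dependent test to the tests that must run before it. The class and method names are hypothetical and this is not the authors' implementation; it also assumes the provided dependencies are acyclic.

```java
// Illustrative sketch of dependent-test-aware post-processing: given an order
// produced by an unenhanced algorithm and provided test dependencies, place
// every prerequisite before its dependent test, adding missing tests if needed.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DependentTestAwareOrder {

    // mustRunBefore maps a test to the tests that must execute earlier.
    public static List<String> enforce(List<String> order,
                                       Map<String, List<String>> mustRunBefore) {
        Set<String> result = new LinkedHashSet<>();
        for (String test : order) {
            addWithPrerequisites(test, mustRunBefore, result);
        }
        return new ArrayList<>(result);
    }

    private static void addWithPrerequisites(String test,
                                             Map<String, List<String>> mustRunBefore,
                                             Set<String> result) {
        if (result.contains(test)) {
            return;
        }
        // Assumes the provided dependencies contain no cycles.
        for (String prerequisite : mustRunBefore.getOrDefault(test, List.of())) {
            addWithPrerequisites(prerequisite, mustRunBefore, result);
        }
        result.add(test);
    }

    public static void main(String[] args) {
        // An unenhanced prioritization placed the dependent test first.
        List<String> order = List.of("testReadCache", "testPopulateCache");
        Map<String, List<String>> deps =
                Map.of("testReadCache", List.of("testPopulateCache"));
        // Prints [testPopulateCache, testReadCache].
        System.out.println(enforce(order, deps));
    }
}
```

    For test selection and parallelization, the same enforcement can pull in prerequisite tests that the unenhanced algorithm omitted, which is why the enhanced orders may contain extra tests yet, as reported above, run only about 1% slower on average.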

    Published In

    ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
    July 2020
    591 pages
    ISBN:9781450380089
    DOI:10.1145/3395363

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. flaky test
    2. order-dependent test
    3. regression testing

    Conference

    ISSTA '20

    Acceptance Rates

    Overall acceptance rate: 58 of 213 submissions, 27%

