Research in Empirical Software Eng. Reduced-Parameter Modeling (RPM) for Cost Estimation Models Zhihao Chen zhihaoch@cse.usc.edu.

  • Slide 1
  • Research in Empirical Software Eng. Reduced-Parameter Modeling (RPM) for Cost Estimation Models. Zhihao Chen, zhihaoch@cse.usc.edu
  • Slide 2
  • Reduced-Parameter Modeling (RPM): What Is RPM? How Does It Work? Why Is It Useful? When Should You Not Use It?
  • Slide 3
  • What is RPM? A machine learning technique for determining a minimum-essential set of cost model parameters, using an organization’s particular project data points and assuming that those data points will be representative of its future projects.
  • Slide 4
  • Why is it useful? It simplifies cost model usage and data collection. It often improves estimation accuracy by eliminating highly correlated, weak-dispersion, or noisy-data parameters. It identifies the organization’s most important cost drivers for productivity improvement.
  • Slide 5
  • Organizations have different data distributions. [Figures: correlation analysis of COCOMO81 (63 projects); correlation analysis of NASA Project02 (22 projects).]
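The slide does not say how the correlation analysis was computed. As a rough sketch, assuming Pearson correlation between numerically encoded cost-driver ratings, strongly correlated pairs could be flagged as below. The driver names, the ratings, and the 0.8 threshold are illustrative assumptions, not values from the COCOMO81 or NASA data.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has no spread; report zero correlation for it.
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical ratings for eight projects (assumption, not real project data).
ratings = {
    "acap": [3, 4, 2, 5, 3, 4, 2, 5],
    "pcap": [3, 4, 2, 5, 3, 5, 2, 4],
    "time": [3, 3, 4, 3, 5, 3, 4, 3],
}
flagged = correlated_pairs(ratings)
print(flagged)
```

Running the same check on each organization's own data is one way to see that the set of flagged pairs differs between data sets.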
  • Slide 6
  • Under-sampling: a case study for CPLX in NASA60. Is software complexity a useful cost driver in this domain? In the NASA60 data set, CPLX is usually high, so the parameter carries little information; consider dropping it. Alternatively, if the even-higher-complexity projects are the most important ones to NASA, redefine the complexity scale for highly complex NASA systems.
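One way to make "little information in this parameter" concrete is to measure the Shannon entropy of the rating distribution. This is an assumption about how to quantify it; the slide names no specific measure. The ratings below are made up for illustration and are not the NASA60 values.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy (in bits) of the observed rating distribution."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical CPLX ratings for 60 projects, dominated by "high" (assumption).
skewed_cplx = ["high"] * 54 + ["very_high"] * 4 + ["nominal"] * 2

# For comparison: 60 projects spread evenly over four rating levels.
even_cplx = ["low", "nominal", "high", "very_high"] * 15

print(round(entropy_bits(skewed_cplx), 3), round(entropy_bits(even_cplx), 3))
```

A parameter whose entropy is close to zero barely distinguishes one project from another, which is the situation the slide describes.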
  • Slide 7
  • How does it work, technically? (1) The organization collects a critical mass of similar project data. (2) The RPM tool starts with Size and tests which additional parameter produces the most accurate estimates, by calibrating many times on random data subsets and testing on holdout data points. (3) The RPM tool continues to add the next-best parameter until accuracy starts to decrease. This produces the best RPM for the data set.
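The steps on this slide can be sketched as a forward-selection loop. This is a minimal illustration, not the RPM tool itself. It assumes a log-linear effort model calibrated by least squares and scores accuracy by mean magnitude of relative error (MMRE) on holdout points. The function names, the numeric rating encoding, and the synthetic projects are all hypothetical.

```python
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients via normal equations and Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]; a tiny diagonal term keeps pivots nonzero.
    A = [[sum(r[i] * r[j] for r in X) + (1e-9 if i == j else 0.0) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for k in range(n):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def design(rows, params):
    """Intercept, log(size), then the currently selected parameters."""
    return [[1.0, math.log(r["size"])] + [r[p] for p in params] for r in rows]


def holdout_error(rows, params, trials=30, holdout=0.3, seed=0):
    """MMRE averaged over repeated random calibrate/holdout splits."""
    rng = random.Random(seed)  # same splits for every candidate set
    errors = []
    for _ in range(trials):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        k = max(1, int(len(rows) * holdout))
        test, train = shuffled[:k], shuffled[k:]
        coef = fit_ols(design(train, params), [math.log(r["effort"]) for r in train])
        for r, x in zip(test, design(test, params)):
            pred = math.exp(sum(c * v for c, v in zip(coef, x)))
            errors.append(abs(pred - r["effort"]) / r["effort"])
    return sum(errors) / len(errors)


def forward_select(rows, candidates):
    """Start with size only; add the best parameter until error stops improving."""
    selected, remaining = [], list(candidates)
    best = holdout_error(rows, selected)
    while remaining:
        err, p = min((holdout_error(rows, selected + [q]), q) for q in remaining)
        if err >= best:
            break
        selected.append(p)
        remaining.remove(p)
        best = err
    return selected, best


# Synthetic projects (assumption): effort depends on size and "cplx" only.
rng = random.Random(1)
projects = []
for _ in range(40):
    size, cplx, noise = rng.uniform(5, 200), rng.randint(1, 5), rng.randint(1, 5)
    effort = 3.0 * size ** 1.1 * math.exp(0.25 * cplx + rng.gauss(0, 0.1))
    projects.append({"size": size, "effort": effort, "cplx": cplx, "noise": noise})

baseline = holdout_error(projects, [])
selected, best = forward_select(projects, ["cplx", "noise"])
print(selected, round(baseline, 3), round(best, 3))
```

Reusing the same random splits for every candidate set keeps the comparison paired, so a difference in error reflects the parameter rather than the luck of the split.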
  • Slide 8
  • Real and large industry data. The research is supported by CSE and NASA/JPL. Two datasets are public and available from the PROMISE Software Engineering Repository (http://promise.site.uottawa.ca/): 63 projects in Cocomo81/Software cost estimation and 60 projects in NASA/Software cost estimation. Two datasets come from the COCOMO II database: 161 projects in COCOMO II 2000 and 119 projects in COCOMO II 2004. More data are coming: 30 more projects from JPL. The techniques can be applied, and the basic results generalized, to any model.
  • Slide 9
  • Example Result
  • Slide 10
  • When should you not use it? Do not subtract parameters that are important: in many domains, expert business users hold in their heads more knowledge than might be available in historical databases. Do not subtract parameters you might still need: users may need some of the subtracted parameters to make a business decision.
  • Slide 11
  • Published Results. Some results have recently been published on the use of data mining and machine learning techniques to analyze cost estimation models and data. All papers are available from http://www.ssei.org/chen/papers/papers.html
      • Chen, Menzies, Port, and Boehm. "Finding the Right Data for Software Cost Modeling", IEEE Software, 11/2005.
      • Menzies, Port, Chen, and Hihn. "Specialization and Extrapolation of Software Cost Models", ASE 2005, Long Beach, California, 11/2005.
      • Menzies, Port, Chen, Hihn, and Stukes. "Validation Methods for Calibrating Software Effort Models", ICSE 2005, St. Louis, Missouri, 05/2005.
      • Yang, Chen, Valerdi, and Boehm. "Effect of Schedule Compression on Project Effort", ISPA 2005, Denver, Colorado, 06/2005.
      • Chen, Menzies, Port, and Boehm. "Feature Subset Selection Can Improve Software Cost Estimation Accuracy", PROMISE 2005, St. Louis, Missouri, 05/2005.
      • Menzies, Chen, Port, and Hihn. "Simple Software Cost Analysis: Safe or Unsafe?", PROMISE 2005, St. Louis, Missouri, 05/2005.
  • Slide 12
  • Question and Answer