SpecGuru Project

Inferring Software Specifications from Open Source Repositories by Leveraging Data and Collective Community Expertise

Welcome to the Specification Inference Project

The SpecGuru project is a collaboration among Iowa State University (ISU), Bowling Green State University (BGSU), the University of Central Florida (UCF), Penn State University (PSU), and the University of Texas, Dallas (UT Dallas). The overall project PI is Dr. Hridesh Rajan of the Department of Computer Science at Iowa State University. The BGSU site is led by Dr. Robert Dyer. The UCF site is led by Dr. Gary T. Leavens. The PSU site is led by Dr. Vasant G. Honavar. The UT Dallas site is led by Dr. Tien N. Nguyen.

Problems and Goals

Despite their proven benefits, useful, comprehensible, and efficiently checkable specifications are not widely available. This is primarily because writing useful, non-trivial specifications from scratch is hard and time-consuming, and requires expertise that is not broadly available. Furthermore, the lack of specifications for widely used libraries and frameworks, caused by the high cost of writing specifications, tends to have a snowball effect: core libraries lack specifications, which makes specifying the applications that use them expensive.

To contain the skyrocketing development and maintenance costs of high-assurance systems, this self-perpetuating cycle must be broken. The labor cost of specifying programs can be significantly decreased via advances in specification inference and synthesis; this has been attempted several times, but with limited success. We believe that practical specification inference and synthesis is an idea whose time has come. Fundamental breakthroughs in this area can be achieved by leveraging the collective intelligence available in software artifacts from millions of open source projects. Fine-grained access to such data sets was unavailable until recently, but is now easy to obtain. In this project, we are pursuing the advances in specification inference that such data sets make possible.

Publications and Other Products

This project has led to the following publications and other products.

  1. Hridesh Rajan, Tien N. Nguyen, Gary T. Leavens, and Robert Dyer, “Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence,” ICSE’15: The 37th International Conference on Software Engineering, Florence, Italy, May 2015.
  2. Hoan Anh Nguyen, Robert Dyer, Tien N. Nguyen, and Hridesh Rajan, “Mining preconditions of APIs in large-scale code corpus,” FSE’14: The 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 166-177. DOI: http://dx.doi.org/10.1145/2635868.2635924
  3. Samantha Syeda Khairunnesa, Hoan Anh Nguyen, Tien N. Nguyen, and Hridesh Rajan, “Exploiting Implicit Beliefs to Resolve Sparse Usage Problem in Usage-based Specification Mining,” OOPSLA’17: The ACM SIGPLAN conference on Object-Oriented Programming, Systems, Languages, and Applications, October, 2017. DOI: https://doi.org/10.1145/3133907
  4. Hung Phan, Hoan Anh Nguyen, Tien N. Nguyen, and Hridesh Rajan, “Statistical Learning for Inference Between Implementations and Documentation,” ICSE’17: The 39th International Conference on Software Engineering: New Ideas and Emerging Results Track, May, 2017. DOI: https://doi.org/10.1109/ICSE-NIER.2017.9


This project has been supported in part by the US National Science Foundation under the grant “SHF: Large: Collaborative Research: Inferring Software Specifications from Open Source Repositories by Leveraging Data and Collective Community Expertise.” PI: Hridesh Rajan; Co-Is: Robert Dyer, Tien Nguyen, Gary T. Leavens, and Vasant Honavar (2015-2018). Links: ISU, BGSU, UCF, and PSU.
