In this study, we assess how program-committee members of journals and conferences in the software-engineering domain think about empirical research.
**The Future of Empirical Research in Software Engineering**

In software engineering, empirical studies have become increasingly important, especially over the past few years. However, empirical studies come with several obstacles. In this questionnaire, we are interested in the validity of empirical studies. Typically, two kinds of validity are of primary concern: internal and external validity.
Internal validity is the degree to which the influence of confounding factors on the results is excluded, allowing experimenters to observe the results without bias. For example, when recruiting only novice programmers, the results are not biased by different levels of programming experience.
In contrast, external validity is the degree to which the results of an experiment can be generalized. For example, when recruiting programmers with different levels of programming experience, the experimental results apply to all of these experience levels.
There is a trade-off between internal and external validity; only one of them can be maximized at a time. There are different ways to address this trade-off in empirical research, and we would like your thoughts on this.
Since you are an expert in software engineering, we highly value your opinion. To help us better understand your answers, we would like to know for which conferences and journals you have served as a reviewer (of technical and research papers). We will anonymize your data.
| Conference | 2014 | 2013 | 2012 | 2011 | 2010 | Full name |
| --- | --- | --- | --- | --- | --- | --- |
| ASE | [] | [] | [] | [] | [] | International Conference on Automated Software Engineering |
| EASE | [] | [] | [] | [] | [] | International Conference on Evaluation and Assessment in Software Engineering |
| ECOOP | [] | [] | [] | [] | [] | European Conference on Object-Oriented Programming |
| EMSE | [] | [] | [] | [] | [] | Empirical Software Engineering Journal |
| ESEC/FSE | [] | [] | [] | [] | [] | European Software Engineering Conference/Symposium on the Foundations of Software Engineering |
| ESEM | [] | [] | [] | [] | [] | International Symposium on Empirical Software Engineering and Measurement |
| GPCE | [] | [] | [] | [] | [] | International Conference on Generative Programming: Concepts & Experiences |
| ICPC | [] | [] | [] | [] | [] | International Conference on Program Comprehension |
| ICSE | [] | [] | [] | [] | [] | International Conference on Software Engineering |
| ICSM | [] | [] | [] | [] | [] | International Conference on Software Maintenance |
| OOPSLA | [] | [] | [] | [] | [] | International Conference on Object-Oriented Programming, Systems, Languages, and Applications |
| TOSEM | [] | [] | [] | [] | [] | ACM Transactions on Software Engineering and Methodology |
| TSE | [] | [] | [] | [] | [] | IEEE Transactions on Software Engineering |
Did we forget an important venue? Let us know which venue and in which years (starting from 2010) you reviewed papers:
{rows:7}
Consider the following scenario: Suppose you are a reviewer of a submission in which the authors want to determine, based on an empirical study with human participants, whether functional or object-oriented programming (FP vs. OOP) is more comprehensible for programmers. So far, this question has never been addressed. Now, there are two options to design the study:
Option 1: maximize internal validity
The authors develop an artificial language that has a very similar design in the functional and the object-oriented version. They leave out special features of existing languages (e.g., generics, classes, ...), recruit students as participants, use a stripped-down IDE, use artificial tasks to measure program comprehension, etc. In short, the authors exclude the influence of all possible confounding factors.
Option 2: maximize external validity
Instead of creating an artificial set-up, the authors use existing languages, IDEs, and tasks to conduct the experiment. They recruit professional programmers as participants, use real projects from SourceForge, use Eclipse as the IDE, and let participants fix real bugs. In other words, the authors create a practical, everyday setting.
Assuming that both options would be realized in the best possible way, which option would you prefer?
- Option 1: maximize internal validity
- Option 2: maximize external validity
- No preference
Please, elaborate:
{rows:7}
favored option?
- Yes
- No
Please, elaborate:
{rows:7}
questions like the one outlined above (FP vs. OOP)?
{rows:7}
Consider a different research question: not one in which human participants are observed, but, say, the evaluation of a new approach that promises faster response times for database systems.
Again, there are two options for the evaluation: maximize internal validity (e.g., examine one database system in detail) or maximize external validity (e.g., examine as many systems as possible, neglecting system-specific details).
Assuming that both options would be realized in the best possible way, which option would you prefer for an evaluation like the one outlined above?
- Maximize internal validity
- Maximize external validity
Please, elaborate why you did or did not select a different option than for Question 1:
{rows:7}
For the following questions, please look back on your activity as a reviewer.
Have you ever rejected a paper for one of the following reasons?
- Internal validity too low
- External validity too low
Please, elaborate:
{rows:7}
Do you prefer more practically relevant research or more theoretical (basic) research?
- Applied research (focus on practicability)
- Basic research (focus on sound scientific foundations)
- No preference
Please, elaborate:
{rows:7}
Over the years, has your view changed on how you assess a paper regarding internal and external validity?
- Yes, I now appreciate papers with high internal validity more.
- Yes, I now appreciate papers with high external validity more.
- No, I have not changed my view.
Please, elaborate:
{rows:7}
The following questions are related to the representation of empirical research in the software-engineering literature.
Empirical evaluations with human participants...
| | Considerably less often | Less often | Fine as is | More | Considerably more | I do not know |
| --- | --- | --- | --- | --- | --- | --- |
| ...are needed more or less often? | | | | | | |

| | Considerably too often rejected | Too often rejected | Fine as is | Too often accepted | Considerably too often accepted | I do not know |
| --- | --- | --- | --- | --- | --- | --- |
| ...are accepted/rejected too often? | | | | | | |

| | Considerably higher internal validity | Higher internal validity | Fine as is | Higher external validity | Considerably higher external validity | I do not know |
| --- | --- | --- | --- | --- | --- | --- |
| ...need higher internal/external validity? | | | | | | |
Please, elaborate:
{rows:7}
Empirical evaluations without human participants (e.g., performance evaluation, code measurement)...
| | Considerably less often | Less often | Fine as is | More | Considerably more | I do not know |
| --- | --- | --- | --- | --- | --- | --- |
| ...are needed more or less often? | | | | | | |

| | Considerably too often rejected | Too often rejected | Fine as is | Too often accepted | Considerably too often accepted | I do not know |
| --- | --- | --- | --- | --- | --- | --- |
| ...are accepted/rejected too often? | | | | | | |

| | Considerably higher internal validity | Higher internal validity | Fine as is | Higher external validity | Considerably higher external validity | I do not know |
| --- | --- | --- | --- | --- | --- | --- |
| ...need higher internal/external validity? | | | | | | |
Please, elaborate:
{rows:7}
To increase the validity of empirical studies, researchers replicate experiments. That is, the same or other researchers conduct an experiment again, either exactly as it originally took place or with some modifications.
How often have you reviewed a replicated study?
- Never
- Sometimes
- Regularly
Please, elaborate:
{rows:7}
Over the past years, have you observed a change in the number of replicated studies?
- Yes, it increased.
- Yes, it decreased.
- No
Please, elaborate:
{rows:7}
computer science?
- Yes
- No
Please, elaborate:
{rows:7}
Would you accept a paper that, as the main contribution,... (assuming the authors realized it in the best possible way)
| | Yes | No | I do not know |
| --- | --- | --- | --- |
| **...exactly replicates a previously published experiment of *the same* group?** | | | |
| **...exactly replicates a previously published experiment of another group?** | | | |
| **...replicates a previously published experiment of the same group, but increases external validity (e.g., by recruiting expert programmers or using another programming language)?** | | | |
| **...replicates a previously published experiment of another group, but increases external validity (e.g., by recruiting expert programmers or using another programming language)?** | | | |
| **...replicates a previously published experiment of the same group, but increases internal validity (e.g., by recruiting expert programmers or using another programming language)?** | | | |
| **...replicates a previously published experiment of another group, but increases internal validity (e.g., by recruiting expert programmers or using another programming language)?** | | | |
{rows:7}
A huge problem for authors is that empirical evaluations require high effort in terms of time and cost (e.g., recruiting participants, designing tasks, selecting/designing languages). Since the outcome of an experiment is not known in advance and might be biased (e.g., due to deviations), the invested effort is at high risk.
but with publication guarantees?
That is, the paper is guaranteed to be published (independent of the results), if the authors conduct a further, sound empirical evaluation that improves either internal or external validity?
Answer:
{rows:7}
Do you have any suggestions on how empirical researchers can solve the dilemma of internal vs. external validity of empirical work in computer science?
{rows:7}
external validity (e.g., maturity of research area, availability of experimental data, effort of recruiting and preparing participants/subject systems)
Answer:
{rows:7}
Do you have any further comments regarding this questionnaire or empirical research in general?
Comments:
{rows:7}
Thank you very much for your time. If you have any questions regarding
this survey, please contact:
Janet Siegmund: [email protected]
Norbert Siegmund: [email protected]
Sven Apel: [email protected]