The goals of the Visual Labs project are to create application-specific visual languages for use in computer science and engineering labs and to evaluate their effectiveness in comparison to textual languages. The premise of the project is that students will benefit from the conceptual modeling power of graphical representations of programs that they can build and test, and that they will be able to get more work done in a time-limited lab setting. In this paper, we summarize the results of our research to date, describe for the first time our visual labs for database systems, and detail our plan for evaluating the relative effectiveness of visual labs (in which students create and test graphical models) and traditional labs (in which students write textual code).
The goals of the Visual Labs project are to create application-specific visual languages for use in computer science and engineering labs and to evaluate their effectiveness in comparison to textual languages. The project is based on the premise that users will benefit from the conceptual modeling power of graphical representations of programs, and that they can learn more from building and testing graphical models than from writing textual code.
Both the ACM Task Force on the Core of Computer Science and the NSF Disciplinary Workshops on Undergraduate Education recommend scheduled, structured laboratory sessions for undergraduates studying computer science. In our experience, students can make little progress during a lab session when they are working in a traditional programming language.
The visual lab software consists of application-specific visual languages. Students build models by positioning and connecting components and setting the components' parameters. The models are then tested and revised. We have created visual labs for computer architecture, formal languages, robotics, database systems, modeling of parallel systems, and operating systems.
The visual labs allow computer science and engineering undergraduates to explore a variety of important topics from the ACM curriculum. Because students use visual programming software, they can build and test significant models within a standard lab period. And because all of the visual labs are created in the same visual programming environment, students need to learn only one user interface in order to perform an entire semester's worth of labs.
Most research on instructional applications of visual programming has focused on the acquisition of programming skills. Our focus is on the acquisition of subject-matter concepts. Work such as Kieras' studies of the usefulness of diagrammatic displays for understanding systems [Kieras, 1992] and Niedderer's study of the value of computer modeling for the teaching of physics [Niedderer, et al., 1991] suggests that visual programming can also support the acquisition of subject-matter concepts.
Our software development environment, EYES, was developed at the UMass Lowell Center for Productivity Enhancement as a developer's toolkit for creating visual programming applications [Canning, et al., 1991]. We elected to use EYES because it offers simple user interaction and because we can distribute it with our lab software.
In this paper, we summarize the results of our research to date, describe for the first time our visual labs for database systems, and detail our plan for evaluating the relative effectiveness of visual labs (in which students create and test graphical models) and traditional labs (in which students write textual code).
The visual labs for database systems introduce concepts that are fundamental to database design and use. The labs have three components. The first is data modeling, which is done by constructing a graphical representation of an entity-relationship (ER) type model [Elmasri, et al., 1994]. The second is data entry, based on the ER-type model. Entity instances are represented by icons, and relationship instances are represented by graphical connections between entity instance icons. The third is data query, which is done by constructing graphical representations of SQL-like queries to retrieve information from the database.
Using the three components, the user can design a database, enter data, and query the database. For example, a simple university course registration system can be implemented by (1) designing the data model with course, student, and instructor entity types and with relationship types for student-takes-course and instructor-teaches-course; (2) entering data about entity instances (specific courses, students, and instructors, with relationship instances that indicate which courses each student takes and which courses each instructor teaches); and (3) constructing queries (e.g., "Retrieve the student ID numbers of all students who take one or more courses taught by Prof. Smith") to retrieve information from the database.
To perform the data modeling task, the user constructs a graphical representation of an entity-relationship diagram. There are three kinds of icons available: entity type, attribute, and relationship type. To create an entity type, the user creates and positions an entity type icon, enters the name of the entity type into the icon's dialog box, creates and positions an attribute icon for each attribute of the entity type, enters the name of each attribute into its dialog box, and connects each attribute icon to the entity type icon. To create a relationship type, the user creates and positions a relationship type icon, enters the name of the relationship type into the icon's dialog box, and connects the relationship type icon to the participating entity type icons.
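As a rough textual point of reference (the sketch below is ordinary Python, not the syntax of our own textual lab language, and the attribute names are illustrative), the ER-type model for the course registration example might be written as follows.

    # Entity types with their attributes, and relationship types naming
    # the entity types they connect (attribute names are illustrative).
    entity_types = {
        "STUDENT":    ["student_id", "name"],
        "COURSE":     ["course_id", "title"],
        "INSTRUCTOR": ["instructor_id", "name"],
    }

    relationship_types = {
        "takes":   ("STUDENT", "COURSE"),
        "teaches": ("INSTRUCTOR", "COURSE"),
    }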
To perform the data entry task, the user constructs a graphical representation of the data. There are icons for a pre-defined set of entity types, according to the ER-type model that was created in the data modeling task. For instance, if student, course, and instructor entity types have been defined, then student, course, and instructor icons will be available for data entry. To create an entity instance, the user creates and positions an entity icon and enters the values of its attributes into the icon's dialog box. (Note that the attributes modeled in the data modeling task appear as text-entry fields in the dialog box.) To create a relationship instance between two entities, the user simply connects them. The relationship instance is implicit in the connection. There is no relationship icon.
To perform the data query task, the user constructs a graphical representation of an SQL-type query. There are icons for the keywords QUERY, SELECT, FROM, and WHERE. There are also icons for attributes, entities, and relationships, as defined in the data model. In addition, there are icons for Boolean operations (EQUAL, NOT EQUAL, etc.) and for values. Creating a query such as "Retrieve the student ID numbers of all students who take one or more courses taught by Prof. Smith" requires creating and positioning ten icons (two attribute icons, two relationship icons, and one each of the QUERY, SELECT, FROM, WHERE, EQUAL, and value icons).
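For comparison, the example query can be written in conventional SQL, and the same retrieval can be evaluated over a handful of invented instances in ordinary Python; neither rendering is our lab language's actual textual syntax, which may differ in detail, and all data values are illustrative.

    # Invented entity instances (the icons) and relationship instances
    # (the connections between icons); all values are illustrative.
    students    = {"S1": {"name": "Pat Jones"}, "S2": {"name": "Lee Chen"}}
    courses     = {"C1": {"title": "Databases"}, "C2": {"title": "Compilers"}}
    instructors = {"I1": {"name": "Smith"},      "I2": {"name": "Park"}}
    takes       = [("S1", "C1"), ("S2", "C2")]   # student-takes-course
    teaches     = [("I1", "C1"), ("I2", "C2")]   # instructor-teaches-course

    # The example query, written in conventional SQL as a stand-in for
    # the lab's SQL-like textual form:
    #   SELECT s.student_id
    #   FROM   student s, takes t, course c, teaches x, instructor i
    #   WHERE  t.student_id = s.student_id AND t.course_id = c.course_id
    #     AND  x.course_id = c.course_id AND x.instructor_id = i.instructor_id
    #     AND  i.name = 'Smith'

    # The same retrieval over the toy data:
    smith_courses  = {c for i, c in teaches if instructors[i]["name"] == "Smith"}
    smith_students = {s for s, c in takes if c in smith_courses}
    print(smith_students)   # -> {'S1'}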
The lab manual for the database visual labs begins with a tutorial about using the visual programming environment. It then presents brief reviews of relevant database concepts, examples, and exercises for the user to complete.
It is our expectation that students in time-limited laboratory settings will be able to get more work done and, we hope, to learn more by using the visual labs than by using an off-the-shelf database package or by using an equivalent textual language. Our experimental design for evaluating the labs is described in the next section.
We are performing a series of empirical studies with the ultimate goal of comparing learning outcomes using visual and textual languages during computer science labs. We started with studies that showed that subjects who used the visual language based labs had increased learning outcomes when compared to a control group who did not use them. Then, as a first step toward comparing visual and textual labs, we performed a study that showed no systematic bias between visual and textual lab languages with the same semantics, as far as program entry is concerned. We are now performing a study to compare learning outcomes with visual and textual lab languages with the same semantics.
Learning Outcomes
In this section, we summarize the work we have done on measuring learning outcomes for the visual labs [Williams, et al., 1993a, 1993b].
The visual labs for architecture were the first labs that we evaluated. The lab exercises take the user from simple circuits to portions of a central processing unit. The visual language used in the labs is a special-purpose digital logic simulator. The experimental design called for an experimental group to perform the labs and a control group not to. Both groups received the same lectures on the architecture material. Learning outcomes were measured by pre- and post-tests. There was a span of three weeks between administration of the pre-test and post-test, during which time the experimental group performed the labs and the control group had no formal instruction in computer architecture. The experimental group also completed pre- and post-test questionnaires.
Mean test scores for the experimental group rose 10% between the pre-test and post-test. The increase was not statistically significant. The mean test scores for the control group fell 25%. The difference between the groups was statistically significant. Of the 16 subjects in the experimental group who took both tests, 9 showed improvement, 4 showed decline, and 3 stayed the same. Of the 16 subjects in the control group who took both tests, only 1 showed improvement, while 10 showed decline, and 5 stayed the same. The statistically significant difference in performance of the experimental and control groups appears to be primarily a result of the expected decay over time in the control group's performance. We can conclude that visual labs at least overcome that decay.
Subjects were asked on both the pre-questionnaire and the post-questionnaire to answer the question "In general, are labs a valuable part of the learning experience in a course?" on an 11-point scale (0-10, where 0 means "not at all valuable" and 10 means "extremely valuable"). The difference between the pre-questionnaire and post-questionnaire responses was statistically significant. There was a significant positive correlation between this difference and the difference between the pre- and post-test scores. Subjects whose ratings of the usefulness of labs increased also tended to show improvement in their test scores.
On the pre-questionnaire, we also surveyed the subjects about their previous experiences with labs; many complained about labs in other sciences, especially physics. Against that background, the significantly higher post-questionnaire ratings of the value of labs suggest a favorable attitude toward the visual labs.
We next evaluated the finite state machine labs. This study was designed to minimize the control group's performance degradation by keeping the time gap between pre-test and post-test small (only four class periods). Mean test scores for the experimental group rose 14.6% between the pre-test and post-test. The increase was statistically significant. The mean test scores for the control group did not change significantly. The difference between experimental and control groups was statistically significant. Of the 10 subjects in the experimental group who took both tests, 9 showed improvement and 1 stayed the same. None showed a decline.
Subjects were asked on both the pre-questionnaire and the post-questionnaire to answer the question "How well do you understand the concepts covered in the lectures on finite state machines?" They responded on an 11-point scale (0-10, where 0 means "do not understand" and 10 means "understand very well"). The difference between the pre-questionnaire and post-questionnaire responses was statistically significant.
In this case, the control group's performance did not degrade, while the experimental group's performance improved significantly. This improvement, along with the experimental group's questionnaire responses indicating that they came to understand finite state machines better during the labs, leads us to conclude that the labs contributed to subjects' learning.
The robotics labs were evaluated in a less formal manner than the architecture and finite state machine labs, because of the small number of subjects (9). Subjects were asked on a questionnaire to give their overall evaluation of the lab software on an 11-point scale (0-10, where 0 means "very difficult to use" and 10 means "very easy to use"). The mean response was 8.6. Subjects were asked to rate the lab manual on the same scale. The mean response was 7.3. Subjects were asked to rate the level of difficulty of the lab exercises on the same 11-point scale (where 0 means "not at all difficult" and 10 means "extremely difficult"). The mean response was 2.1.
The subjects' questionnaire responses indicate that they found the software and manual to be effective, and found the lab exercises to be relatively easy. The instructional staff observed during the labs that the students especially enjoyed exercises that permitted them to introduce their robots into a world shared by other subjects' robots.
From the evaluations of the visual labs for architecture, finite state machines, and robotics, we conclude that visual labs may increase learning outcomes, as compared to no lab experiences. In the "Future Directions" section, we describe our experimental design for comparing visual language based labs to textual language based labs.
Visual vs. Textual Lab Languages
In this section, we summarize the work we have done on comparing visual and textual lab languages [Williams, et al., 1997a, 1997b, 1997c].
In order to study the efficacy of the graphical languages that we are developing, we need to perform comparisons with textual languages. We are hesitant to use off-the-shelf textual languages for this purpose. Some studies that have done so, such as Green, et al.'s study of program comprehensibility [Green, et al., 1991], run the risk of comparing languages that are not equivalent in some sense, as discussed in [Moher, et al., 1993; Williams, et al., 1996]. We are also hesitant to use an off-the-shelf database environment for the comparison, because it would be unrealistic to expect users in a closed lab setting to learn their way around a new environment, on top of learning how to use its language.
To avoid these potential difficulties, we have designed textual languages of our own. Each has the same functionality as the corresponding graphical language, but has textual syntax. Since our software is used in time-limited training settings, we need to know to what extent the mechanics of program entry affect the amount of work that gets done. In other words, if we find that subjects who use the graphical languages accomplish more in a given amount of time, then we need to know how much of the effect we are seeing is due to the mechanics of program entry, rather than to, say, a difference in conceptual clarity. We began with the suspicion that the amount of typing necessary for entering a textual program might bias the results of a comparison, especially if the textual language has verbose syntax.
By program entry, we mean the time it takes an experienced programmer to enter a program that he or she has already designed. Although the conceptual work of creating graphical representations is the subject of our overall research project, our focus in this study is the physical act of entering a program.
To test our suspicion that there might be a systematic bias against the textual language because of the amount of work required to enter a program, we performed an analysis using the Keystroke Level Model (KLM), as described by [Card, et al., 1983, Kieras, 1994]. The KLM provides a technique for creating a model of the actions required to perform a task using keyboard and mouse, and then for using the model to predict execution time for the task. A KLM model is constructed in five steps, as follows: (1) specifying the task of interest; (2) writing a method that details the sequence of actions needed to perform the task; (3) encoding each action using KLM operators; (4) computing the predicted time for each encoded action; and (5) summing the predicted times over the entire method. The predicted times for each KLM operator are averages based on empirical studies [Card, et al., 1983].
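Concretely, steps (4) and (5) amount to a table lookup and a sum. The minimal sketch below uses the commonly cited operator-time averages from [Card, et al., 1983]; the encoded method shown is illustrative only, not one of the methods from our study.

    # Steps (4) and (5) of the KLM: sum the average operator times over
    # an encoded method. Operator times are commonly cited averages from
    # Card, et al. (1983); the method encoded below is an invented example.
    OPERATOR_TIMES = {
        "K": 0.20,   # press a key or button (skilled typist)
        "P": 1.10,   # point to a target with the mouse
        "H": 0.40,   # home the hands between keyboard and mouse
        "M": 1.35,   # mentally prepare for the next action
    }

    def predicted_time(encoded_method):
        return sum(OPERATOR_TIMES[op] for op in encoded_method)

    # e.g., prepare, point to a palette icon, point to its destination,
    # home to the keyboard, and type a four-character name:
    method = ["M", "P", "P", "H"] + ["K"] * 4
    print(f"{predicted_time(method):.2f} s")   # -> 4.75 s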
For purposes of the KLM analysis of the database-specific languages, we identified three simple but typical tasks that a visual or textual programmer would perform (a data modeling task, a data entry task, and a data query task). We then wrote methods that detail the sequences of actions for accomplishing each task in the visual condition and in the textual condition. We encoded the actions using KLM operators. Finally, we derived predicted execution times.
We then performed an empirical study to test the validity of the predictions. Since we were interested in program entry time for an expert, not novice, user, our test subjects were computer science graduate students with expertise in programming, though without previous exposure to the languages used in the study.
The predictions from the models of the graphical condition always overestimated the observed execution time. The predictions from the models of the textual condition underestimated it, except for the data entry task, where the predicted and observed times were only a fraction of a second apart.
There was a significant difference (p < 0.05) between the execution times for the graphical and textual conditions for each task, as shown by a paired comparison t test. In every case, the significant difference was in the direction predicted by the models.
There was a high positive correlation between observed execution times and predicted times (r = .927). This correlation was significant as shown by Fisher's r to Z method (r = .927, n = 6, Z = 2.839, p = .0045).
We then fine-tuned the times associated with KLM operators to yield more accurate time predictions. Because distances between targets on the screen were small (on the order of several centimeters) for connecting icons, we decided to observe the actual time it takes to point to an object with the mouse, rather than using the estimate given by [Card, et al., 1983]. As a result, we lowered the pointing time estimate from 1.10 sec. to 0.65 sec. We then recomputed the predicted execution times for the three tasks mentioned above.
The correlation between observed execution times and revised predicted times (r = .982) shows an improvement over the earlier estimates (r = .927). This correlation was significant as shown by Fisher's r to Z method (r = .982, n = 6, Z = 4.086, p = .0001). We have not yet repeated the empirical study to verify this post hoc adjustment of our predictions.
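The reported Z statistics follow directly from r and n via the standard Fisher transformation, Z = arctanh(r) * sqrt(n - 3), as the short check below illustrates; the small discrepancies from the reported figures presumably reflect rounding of r.

    # Reproducing the Fisher r-to-Z statistics reported above.
    import math

    def fisher_z(r, n):
        return math.atanh(r) * math.sqrt(n - 3)

    print(round(fisher_z(0.927, 6), 3))   # original predictions: ~2.835
    print(round(fisher_z(0.982, 6), 3))   # revised predictions:  ~4.072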
For the purposes of our study of application-specific graphical languages in time-limited settings, we surmise that, overall, for the languages we studied, any advantage of the graphical languages over the textual ones (in terms of learning outcomes) will not be a result of greater ease of program entry, but of other factors -- including, we suspect, the ability of the graphical language to more clearly model the technical concepts being studied.
Future Directions

Our next study will compare learning outcomes for the use of the visual and textual database lab languages in time-limited settings. We have approval from the University's Institutional Review Board for an experimental design using undergraduate students as test subjects.
In this between-subjects study, one group of subjects will receive a tutorial about the visual lab language and will then perform a series of lab exercises using it. A second group will receive a tutorial about the textual lab language and will then perform the same lab exercises using it. Both groups will have lab manuals that will be customized for the type of language but equivalent otherwise. All subjects will be asked to complete a pre-test questionnaire to gather demographic data. All subjects will also be given pre- and post-tests to measure learning outcomes. And all subjects will be asked to complete a post-test questionnaire to provide feedback about the lab experience.
Our previous studies indicate that visual language based labs for computer science may have a positive effect on learning outcomes. Our upcoming study comparing learning outcomes with visual labs and textual labs -- where the languages are semantically equivalent and the lab exercises are identical -- should begin to tell us whether visual language based labs have advantages over textual language based labs. Since we found no systematic bias in program entry in favor of the visual language, we can interpret any advantage of the visual language to be due to differences in the conceptual modeling power of the two languages.
The Visual Labs project is supported in part by NSF grant number DUE-9354708.
Canning, James T., David Pelland, and Sharon Sliger. 1991. "Visual Programming in a Workstation Environment," Proceedings of the 1991 ACM Symposium on Personal and Small Computers.
Card, Stuart K., Thomas P. Moran, and Allen Newell. 1983. The Psychology of Human-Computer Interaction (Hillsdale, NJ: Lawrence Erlbaum Associates).
Elmasri, Ramez and Shamkant B. Navathe. 1994. Fundamentals of Database Systems, Reading, MA: Addison-Wesley.
Green, T.R.G., M. Petre, and R.K.E. Bellamy. 1991. "Comprehensibility of Visual and Textual Programs: A Test of Superlativism Against the 'Match-Mismatch' Conjecture," Empirical Studies of Programmers: Fourth Workshop, pp. 121-146.
Kieras, David. 1992. "Diagrammatic Displays for Engineered Systems: Effects on Human Performance in Interacting with Malfunctioning Systems," International Journal of Man-Machine Studies, vol. 36, pp. 861-895.
Kieras, David. 1994. "A Guide to GOMS Task Analysis," in GOMS Modeling of User Interfaces Using NGOMSL, tutorial notes from the CHI '94 Conference on Human Factors in Computing Systems, Boston, MA, April 24-28, 1994.
Moher, Thomas G., David C. Mak, Brad Blumenthal, and Laura M. Leventhal. 1993. "Comparing the Comprehensibility of Textual and Graphical Programs: The Case of Petri Nets," Empirical Studies of Programmers: Fifth Workshop, pp. 137-161.
Niedderer, Hans, H. Schecker, and T. Bethge. 1991. "The Role of Computer-aided Modeling in Learning Physics," Journal of Computer Assisted Learning, vol. 7, pp. 84-95.
Peterson, James L. 1981. Petri Net Theory and the Modeling of Systems, Englewood Cliffs, NJ: Prentice-Hall.
Williams, Marian G. and J. Nicholas Buehler. 1997a. "Prediction of Program Entry Times for Visual and Textual Languages with Equivalent Semantics," submitted to the 1997 IEEE Visual Languages Symposium.
Williams, Marian G. and J. Nicholas Buehler. 1997b. "Predictions of Program Entry Time for Semantically Equivalent Graphical and Textual Languages," University of Massachusetts Lowell Computer Science Department Technical Report No. 97-101.
Williams, Marian G. and J. Nicholas Buehler. 1997c. "A Study of Program Entry Time Predictions for Application-specific Visual and Textual Languages," submitted to the Seventh Empirical Studies of Programmers Workshop.
Williams, Marian G., William A. Ledder, J. Nicholas Buehler, and James T. Canning. 1993a. "Visual Programming Labs for Introducing Computer Science Concepts," Proceedings of the IEEE/ASEE Frontiers in Education Conference, November 6-9, 1993, Washington, DC, pp. 797-801.
Williams, Marian G., William A. Ledder, J. Nicholas Buehler, and James T. Canning. 1993b. "An Empirical Study of Visual Labs," Proceedings of the 1993 IEEE/CS Symposium on Visual Languages, August 24-27, Bergen, Norway, pp. 371-373.
Williams, Marian G., Hyxia Villegas, and J. Nicholas Buehler. 1996. "Appropriateness of Graphical Program Representations for Training Applications," Conference Companion of the CHI 96 Conference on Human Factors in Computing Systems, Vancouver, BC, April 13-18, 1996, pp. 91-92.