HCI - Human Computer Interaction II
Evaluating the designed system
The elements that concur to the system evaluation are:
- stage of design: early, middle, late
- novelty of project: well defined vs. exploratory
- number of expected users
- criticality of the interface: life-critical medical system vs. museum-exhibit support
- costs of product and finances allocated for testing
- time available
- experience of the design and evaluation team
Ranges
the evaluation plan might be a 2-year test with multiple phases for an air-traffic-control system or
a 3-day test with 6 users for a small internal accounting system
testing costs may vary from 1% to 10% of the project budget
it is no longer feasible to bypass usability testing
customers might file lawsuits against software vendors for errors
Limitations
impossible to test the system in every possible situation
testing must include continuing methods to assess and repair problems during the lifecycle of the interface
after testing, a decision must be made as to delivery
most testing methods account for normal usage but stressful situations and partial equipment failures should also be considered
more than 4000 members of the Usability Professionals' Association exchange ideas about these problems
Expert reviews
experts may belong to staff or be external consultants
reviews may be conducted rapidly
reviews may be performed early or late in the design phase: they provide recommendations, a formal report or both
suggestions should be made cautiously (take care of the designer’s ego)
it is better to pinpoint problems than to provide solutions
solutions should be left to the designers
Expert review: methods
heuristic evaluation
i.e. evaluate using the Eight Golden Rules; expertise in the rules is very important
guidelines review
the interface is checked against organizational and guidelines documents
consistency inspection
across a family of interfaces: color, layout, terminology, input/output formats, training materials, help
cognitive walkthrough
evaluators simulate users carrying out their tasks (Wharton et al., 1994); frequent tasks are the starting point, followed by critical tasks and error recovery; public walkthroughs are also performed (Yourdon, 1989)
formal usability inspection
courtroom-style meetings with a moderator who presents the interface and discusses its merits and weaknesses; design-team members may rebut. These meetings may be good experiences for managers, yet they are time consuming.
Expert review vs usability studies
difficult to compare, different contributions to improve the interface
some studies prove the benefits of expert reviews (Jeffries et al., 1991; Karat et al., 1992)
different experts find different problems, it may be a good thing to use 3-5 experts and collect all evidence
expert reviewers should work under the same conditions as the potential users (workplace, noise, stress)
bird’s eye view of an interface via printed screens pinned to a board may be very useful to detect inconsistencies
some experts may lack knowledge of the task domain; conflicting advice may be counterproductive
experts should have a long term relationship with the organization - they may be accountable
difficult to predict how first time users will behave
Usability testing and laboratories
started in 1980
traditional managers resisted (nice idea but time and money pressures prevented them from adopting usability evaluation)
competition created the need for such evaluation; moreover, deadlines could be met if a usability test was scheduled
the results of the test provided:
- supportive confirmation of progress
- specific recommendations for changes
designers had evaluative feedback to guide their work
managers saw fewer disasters as delivery dates approached
usability testing speeded up many projects
it also produced dramatic cost savings
usability-laboratory tests were influenced by marketing and advertising: few users, quick & dirty
controlled experiments test hypotheses, support theories and methods, and produce statistically significant results
Usability labs
have emerged in different companies
they provide a positive image of the company
some are very large (16 labs at IBM's Boca Raton site)
usability consultancy firms have started and may be hired
each lab may serve 10 to 15 projects a year; lab staff meet with the user-interface architect or manager at the kick-off to make a test plan with scheduled dates and budget allocations
Pre-test
usability staff participate in early task analysis, provide information on software tools and references, and help develop the set of tasks for the usability test
2 to 6 weeks before the test a detailed test plan is defined, including a list of tasks, subjective-satisfaction items and debriefing questions
the number of participants is set and their source identified: customer sites, personnel agencies
a pilot test of procedures, tasks and questionnaires is made 1 week ahead of time
Test
final procedures are now defined
- participants are chosen to represent the user communities
- attention to background in computing
- experience with the task, motivation, education, ability with the interface language
- control for eyesight, left- versus right-handedness, age, gender
- other experimental conditions: time of day, day of week
- physical surroundings, noise, room temperature
Etiquette
participants should always be treated with respect
informed that THEY are not tested, the system is
they will be told what they will be doing & for how long
participation should always be voluntary (informed consent)
a typical statement could be the following:
Statement of consent
I have freely volunteered to participate in this experiment
I have been informed in advance what my task(s) will be and what procedures will be followed
I have been given the opportunity to ask questions and have had my questions answered to my satisfaction
I am aware that I have the right to withdraw consent and to discontinue participation at any time, without prejudice to my future treatment
My signature below may be taken as affirmation of all the above statements; it was given prior to my participation in this study
Cues for testing
participants may be encouraged to think aloud
the tester should be supportive of the participants, taking notes without interfering
typically tasks are completed within 2-3 hours
participants are invited to make general comments/suggestions
sometimes 2 users cooperate in the task and exchange ideas
videotaping is often performed for later review (tedious job)
logging all user actions is generally performed, perhaps with special programs (e.g., The Observer, Noldus, The Netherlands)
logging means recording mouse actions, typing, reading of manuals and screens, etc.
designers are impressed when they see (on the tape) the users failing or not achieving what they want
sometimes users consistently pick the wrong menu: the position of that menu was awkward
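The logging just described can be sketched as a tiny event logger; a minimal sketch, assuming hypothetical event and menu names (real tools such as The Observer are far richer):

```python
import time
from collections import Counter

class ActionLogger:
    """Minimal sketch of a user-action log for one usability-test session."""

    def __init__(self):
        self.events = []  # list of (timestamp, action, target) tuples

    def log(self, action, target):
        self.events.append((time.time(), action, target))

    def menu_error_counts(self, intended):
        """Count how often each *other* menu was opened when `intended`
        was the goal - a consistently mis-picked menu suggests its
        position is awkward."""
        opened = [t for _, a, t in self.events if a == "open_menu"]
        return Counter(m for m in opened if m != intended)

# Hypothetical session: the user twice opens "Format" while aiming for "Edit".
log = ActionLogger()
log.log("open_menu", "Format")
log.log("open_menu", "Format")
log.log("open_menu", "Edit")
print(log.menu_error_counts("Edit"))  # Counter({'Format': 2})
```

Replaying such counts (or the videotape) to designers makes the awkward-menu problem concrete rather than anecdotal.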
Testing & correcting
at each design stage the interface can be refined iteratively
the improved version can be tested again
it is important to fix quickly even small flaws (spelling errors, inconsistent layout,...)
forms of usability testing have been suggested
- discount usability engineering (Nielsen, 1992), a quick & dirty approach (task analysis, prototype development, testing)
- field tests place the interface in realistic user environments, using portable usability labs with videotaping and logging; a variant is to provide users with test versions of new software (Microsoft's Windows 95 was screened by 400,000 users!)
Other testing strategies
early usability testing may be performed with mockups of screen displays to assess user reactions to wording, layout and sequencing
a test administrator plays the role of the computer by flipping the pages while asking a participant user to carry out typical tasks
game designers pioneered the can-you-break-this approach providing teenagers with the challenge to beat new games
this is a destructive approach that tries to detect fatal flaws and proves very productive for critical systems
Testing conclusions
a further approach compares different versions of the same interface, or compares it with other similar interfaces intended for the same job
its name: competitive usability testing
it is important to construct parallel sets of tasks and counterbalance the order of presentation of interfaces
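Counterbalancing the order of presentation can be done with a Latin square; the sketch below builds a simple cyclic one (interface names are hypothetical; a fully balanced Williams design would additionally balance immediate carry-over effects):

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears exactly once in every
    row (participant group) and every column (presentation position),
    so no interface is always tested first or last."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]

# Three competing interfaces, labels invented for illustration.
orders = latin_square(["A", "B", "C"])
# Participant group i runs the interfaces in orders[i]:
# [['A', 'B', 'C'], ['B', 'C', 'A'], ['C', 'A', 'B']]
```

Each group then works through its parallel task set in the assigned order, so practice and fatigue effects are spread evenly across the interfaces.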
usability testing has at least 2 serious limitations:
- emphasis on first usage (2 to 4 hours, remaining period has unknown problems)
- limited coverage of interface features (few aspects may be touched on during a test)
Surveys
due to the above limitations, usability tests must be complemented with other measurements, such as surveys
surveys are a familiar, inexpensive and generally acceptable companion for usability tests and expert reviews
clear goals in advance, plus focused items that help attain those goals
care in administration and data analysis
the survey should be prepared, reviewed among colleagues and tested with a small sample of users
statistical analyses and presentations should be developed before the final survey is distributed
survey goals may be tied to the components of the OAI model of interface design; subjective impressions about the representation of:
- task domain objects and actions
- interface domain metaphors and action handles
- syntax of inputs and design of displays
ascertain the user’s
- background - age, gender, origins, education, income
- experience with computers - specific applications, length of time, depth of knowledge
- job responsibilities - decision making influence, managerial roles, motivation
- personality style - introvert vs extravert, risk taking vs risk averse, early vs late adopter, systematic vs opportunistic
ascertain the user’s
- reasons for not using an interface - inadequate services, too complex, too slow
- familiarity with features - printing, macros, shortcuts, tutorials
feelings after using an interface:
confused vs clear
frustrated vs in control
bored vs excited
Online surveys
avoid the cost and effort of printing, distributing and collecting paper forms
many people prefer to answer a short survey displayed on a screen (rather than filling in and returning a printed form)
in a survey, a short scale with 5 values was provided
- strongly agree
- agree
- neutral
- disagree
- strongly disagree
Survey with the 5-value scale
I find the system commands easy to use
I feel competent with and knowledgeable about the system commands
When writing a set of system commands for a new application, I am confident that they will be correct on the first run
When I get an error message, I find that it is helpful in identifying the problem
I think that there are too many options and special cases
I believe that the commands could be substantially simplified
I have trouble remembering the commands and options and must consult the manual frequently
When a problem arises, I ask for assistance from someone who really knows the system
Results from this survey
it helps designers to identify problems users are having
it demonstrates improvement to the interface as changes are made in
- training
- online assistance
- command structures
progress is demonstrated as subsequent surveys show higher scores
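The comparison of successive surveys can be sketched by mapping the 5-value scale to numbers and computing a mean score per item (the response data below are invented for illustration):

```python
# Map the 5-value scale to numbers (strongly agree = 5 ... strongly disagree = 1).
SCALE = {
    "strongly agree": 5,
    "agree": 4,
    "neutral": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

def mean_score(responses):
    """Mean numeric score for one survey item across respondents."""
    scores = [SCALE[r] for r in responses]
    return sum(scores) / len(scores)

# Hypothetical responses to "I find the system commands easy to use",
# before and after a redesign of the command structure.
before = ["disagree", "neutral", "disagree", "agree"]
after = ["agree", "agree", "neutral", "strongly agree"]
print(mean_score(before), mean_score(after))  # 2.75 4.0
```

A rising mean on subsequent surveys is the "higher scores" evidence of progress mentioned above; negatively worded items (e.g., "too many options") should be reverse-scored before averaging.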
On a text editor usage
users had to rate the messages from a text editor on a 7-value scale
Hostile 1 2 3 4 5 6 7 Friendly
Vague 1 2 3 4 5 6 7 Specific
Misleading 1 2 3 4 5 6 7 Beneficial
Discouraging 1 2 3 4 5 6 7 Encouraging
when precise questions are asked, precise answers will be given
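Ratings on such bipolar scales are typically summarized as a profile of per-dimension means; a minimal sketch with invented ratings:

```python
# Hypothetical ratings of a text editor's messages by three users,
# each on a 7-point bipolar scale (1 = left pole, 7 = right pole).
ratings = {
    "Hostile-Friendly":         [5, 6, 4],
    "Vague-Specific":           [3, 2, 4],
    "Misleading-Beneficial":    [4, 5, 5],
    "Discouraging-Encouraging": [2, 3, 3],
}

# Mean rating per dimension: the message-quality profile.
profile = {dim: sum(vals) / len(vals) for dim, vals in ratings.items()}
for dim, mean in profile.items():
    print(f"{dim}: {mean:.2f}")
```

A low mean on one dimension (here, Discouraging-Encouraging) points precisely at the aspect of the messages that needs rework, illustrating that precise questions yield precise answers.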
Other questionnaires
Coleman and Williges (1985) developed a set of opposing features as reactions users could have to an interface
- pleasing vs irritating
- simple vs complicated
- concise vs redundant
and then asked users to evaluate a text editor on these dimensions
another approach is to ask specific questions like:
- readability of characters
- meaningfulness of command names
- helpfulness of error messages