Evaluating the designed system

The elements that contribute to evaluating the designed system are:

    Ranges

  • the evaluation plan might be a two-year test with multiple phases for an air-traffic-control system, or
  • a three-day test with six users for a small internal accounting system
  • costs may vary from 10% of the project budget down to 1%
  • it is no longer possible to bypass usability testing
  • customers might file lawsuits against s/w vendors for errors

    Limitations

  • impossible to test the system in every possible situation
  • testing must include continuing methods to assess and repair problems during the lifecycle of the interface
  • after testing, a decision must still be made about whether the system is ready for delivery
  • most testing methods account for normal usage, but stressful situations and partial equipment failures should also be considered
  • more than 4000 members of the Usability Professionals Association exchange ideas about these problems

    Expert reviews

  • experts may belong to staff or be external consultants
  • reviews may be conducted rapidly
  • reviews may be performed early or late in the design phase: they provide recommendations, a formal report or both
  • suggestions should be made cautiously (be mindful of the designer’s ego)
  • it is better to pinpoint problems than to provide solutions
  • solutions should be left to the designers

    Expert review: methods

  • heuristic evaluation
      e.g., evaluate the interface against the Eight Golden Rules; the reviewers’ expertise in the rules is very important
  • guidelines review
      the interface is checked for conformance with the organization’s guidelines documents
  • consistency inspection
      across a family of interfaces: color, layout, terminology, input & output formats, training materials, help (parts of such a check can be automated; see the sketch after this list)
  • cognitive walkthrough
      the experts simulate users carrying out their tasks (Wharton, 1994); frequent tasks are the starting point, followed by critical tasks and error recovery; public walkthroughs may also be performed (Yourdon, 1989)
  • formal usability inspection
      courtroom-style meetings with a moderator who presents the interface so that its merits and weaknesses can be discussed; design team members may rebut; these meetings can be good experiences for managers, yet they are time consuming
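
  Parts of a consistency inspection can be automated. The sketch below is only a minimal illustration (not a real inspection tool): it assumes hypothetical screen definitions and flags any attribute whose value differs across a family of screens.

    # Minimal consistency-inspection sketch (Python). The screen definitions
    # are hypothetical; a real check would read the interface's resource files.
    screens = {
        "login":   {"quit_label": "Exit", "bg_color": "#FFFFFF"},
        "search":  {"quit_label": "Quit", "bg_color": "#FFFFFF"},
        "results": {"quit_label": "Exit", "bg_color": "#F0F0F0"},
    }

    def report_inconsistencies(screens):
        """Print every attribute whose value is not uniform across screens."""
        attributes = {a for props in screens.values() for a in props}
        for attr in sorted(attributes):
            values = {name: props[attr]
                      for name, props in screens.items() if attr in props}
            if len(set(values.values())) > 1:   # more than one distinct value
                print(f"inconsistent {attr}: {values}")

    report_inconsistencies(screens)
    # -> inconsistent bg_color:   {'login': '#FFFFFF', 'search': '#FFFFFF', 'results': '#F0F0F0'}
    # -> inconsistent quit_label: {'login': 'Exit', 'search': 'Quit', 'results': 'Exit'}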

    Expert review vs usability studies

  • difficult to compare: each makes a different contribution to improving the interface
  • some studies document the benefits of expert reviews (Jeffries et al., 1991; Karat et al., 1992)
  • different experts find different problems, so it may be wise to use 3-5 experts and pool all the evidence
  • expert reviewers should work under the same conditions as the potential users (workplace, noise, stress)
  • a bird’s-eye view of an interface, via printed screens pinned to a board, may be very useful for detecting inconsistencies
  • some experts may lack knowledge of the task domain, and conflicting advice may be counterproductive
  • experts should have a long-term relationship with the organization, so that they can be held accountable
  • it is difficult to predict how first-time users will behave

    Usability testing and laboratories

  • usability testing started around 1980
  • traditional managers resisted (a nice idea, but time and money pressures prevented them from adopting usability evaluation)
  • competition created the need for such evaluation; moreover, deadlines were more likely to be met when a usability test was scheduled
  • the results of the test provided:
    • supportive confirmation of progress
    • specific recommendations for changes
  • designers had evaluative feedback to guide their work
  • managers saw fewer disasters as delivery dates approached
  • usability testing sped up many projects
  • it also produced dramatic cost savings
  • usability laboratory tests were influenced by marketing and advertising practice: few users, quick & dirty
  • controlled experiments, by contrast, test hypotheses, support theories and methods, and produce statistically significant results

    Usability labs

  • have emerged in different companies
  • they provide a positive image of the company
  • some are very large (IBM had 16 labs at Boca Raton)
  • usability consultancy firms have emerged and can be hired
  • each lab may serve 10 to 15 projects a year; at kick-off the lab staff meet with the user-interface architect or manager to draw up a test plan with scheduled dates and budget allocations

    Pre-test

  • usability staff participate in early task analysis, provide information on s/w tools and references, and help develop the set of tasks for the usability test
  • 2 to 6 weeks before the test, a detailed test plan is defined, including the list of tasks plus subjective-satisfaction and debriefing questions
  • the number of participants and their source (customer sites, personnel agencies) are decided
  • a pilot test of the procedures, tasks and questionnaires is run 1 week ahead of time

    Test

  • final procedures are now defined
    • participants are chosen to represent the user communities
    • attention to background in computing
    • experience with the task, motivation, education, ability with the interface language
    • controls for eyesight, left- versus right-handedness, age, gender
    • other experimental conditions: time of day, day of week
    • physical surroundings, noise, room temperature

    Etiquette

  • participants should always be treated with respect
  • informed that THEY are not being tested; the system is
  • they will be told what they will be doing & for how long
  • participation should always be voluntary (informed consent)
  • a typical statement could be the following:

    Statement of consent

    I have freely volunteered to participate in this experiment
    I have been informed in advance what my task(s) will be and what procedures will be followed
    I have been given the opportunity to ask questions and have had my questions answered to my satisfaction
    I am aware that I have the right to withdraw consent and to discontinue participation at any time, without prejudice to my future treatment
    My signature below may be taken as affirmation of all the above statements; it was given prior to my participation in this study

    Cues for testing

  • participants may be encouraged to think aloud
  • the tester should be supportive of the participants, taking notes without interfering
  • typically the tasks are completed within 2-3 hours
  • participants are invited to make general comments/suggestions
  • sometimes 2 users cooperate in the task and exchange ideas
  • videotaping is often performed for later review (a tedious job)
  • logging of all user actions is generally performed, perhaps with special programs (e.g., The Observer by Noldus, The Netherlands); a minimal logging sketch follows this list
  • logging means tracing mousing, typing, reading of manuals and screens, etc.
  • designers are impressed when they see (on tape) users failing to achieve what they want
  • sometimes users consistently pick the wrong menu: the position of that menu was awkward
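
  As a minimal illustration of such logging (the format is invented, not that of The Observer or any real tool), the sketch below timestamps each user action so that a session can be replayed and analyzed later.

    import csv
    import time

    class SessionLogger:
        """Record timestamped user actions during a usability-test session."""

        def __init__(self, participant_id):
            self.participant_id = participant_id
            self.events = []

        def log(self, event, detail=""):
            # one row per action: mouse click, keystroke, menu choice, ...
            self.events.append((time.time(), self.participant_id, event, detail))

        def save(self, path):
            with open(path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["timestamp", "participant", "event", "detail"])
                writer.writerows(self.events)

    log = SessionLogger("P07")                 # hypothetical participant ID
    log.log("menu_open", "Edit")               # a consistently wrong pick here
    log.log("menu_select", "Edit > Undo")      # may reveal an awkward menu position
    log.save("session_P07.csv")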

    Testing & correcting

  • at each design stage the interface can be refined iteratively
  • the improved version can be tested again
  • it is important to fix even small flaws quickly (spelling errors, inconsistent layout, ...)
  • several forms of usability testing have been suggested:
    • discount usability engineering (Nielsen, 1992), a quick & dirty approach (task analysis, prototype development, testing)
    • field tests, which place the interface in realistic user environments, using portable usability labs with videotaping & logging; a variant is to provide users with test versions of new s/w (Microsoft’s Windows 95 was screened by 400,000 users!)

    Other testing strategies

  • early usability testing may be performed with mockups of screen displays to assess user reactions to wording, layout and sequencing
  • a test administrator plays the role of the computer, flipping the pages while asking a participant to carry out typical tasks
  • game designers pioneered the can-you-break-this approach, challenging teenagers to beat new games
  • this last approach is destructive: it tries to detect fatal flaws and proves very productive for critical systems



    Testing conclusions

  • a last approach compares different versions of the same interface, or compares it with similar interfaces intended for the same job
  • its name: competitive usability testing
  • it is important to construct parallel sets of tasks and to counterbalance the order of presentation of the interfaces (a minimal counterbalancing sketch follows this list)
  • usability testing has at least 2 serious limitations:
    • emphasis on first usage (the first 2 to 4 hours; problems of the remaining usage period stay unknown)
    • limited coverage of interface features (only a few aspects may be touched on during a test)
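
  A minimal counterbalancing sketch, with invented interface names: each participant sees every interface, but the starting interface is rotated so that no version is always tested first, balancing order effects (such as learning) across participants.

    # Rotate the presentation order: a simple Latin-square style rotation.
    interfaces = ["Version A", "Version B", "Competitor X"]

    def rotated_orders(items):
        """One presentation order per rotation of the interface list."""
        n = len(items)
        return [[items[(start + i) % n] for i in range(n)] for start in range(n)]

    # Assign six participants, cycling through the three rotated orders.
    for p, order in enumerate(rotated_orders(interfaces) * 2, start=1):
        print(f"participant {p}: {' -> '.join(order)}")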

    Surveys

  • given these limitations, usability tests should be complemented by other measurements, e.g. surveys
  • surveys are a familiar, inexpensive and generally acceptable companion to usability tests and expert reviews
  • clear goals should be set in advance, with focused items that help attain those goals
  • care is needed in administration and data analysis
  • the survey should be prepared, reviewed by colleagues and tested with a small sample of users
  • statistical analyses and presentations should be developed before the final survey is distributed
  • survey goals may be tied to the components of the OAI model of interface design: subjective impressions about the representation of
    • task domain objects and actions
    • interface domain metaphors and action handles
    • syntax of inputs and design of displays
  • ascertain the user’s
    • background - age, gender, origins, education, income
    • experience with computers - specific applications, length of time, depth of knowledge
    • job responsibilities - decision making influence, managerial roles, motivation
    • personality style - introvert vs extravert, risk taking vs risk averse, early vs late adopter, systematic vs opportunistic
  • ascertain the user’s
    • reasons for not using an interface - inadequate services, too complex, too slow
    • familiarity with features - printing, macros, shortcuts, tutorials
    • feelings after using an interface - confused vs clear, frustrated vs in control, bored vs excited

    Online surveys

  • avoid the cost and effort of printing, distributing and collecting paper forms
  • many people prefer to answer a short survey displayed on a screen (rather than filling in and returning a printed form)
  • in one survey, a short scale with 5 values was provided:
    • strongly agree
    • agree
    • neutral
    • disagree
    • strongly disagree

    Survey with the 5-value scale

  • I find the system commands easy to use
  • I feel competent with and knowledgeable about the system commands
  • When writing a set of system commands for a new application, I am confident that they will be correct on the first run
  • When I get an error message, I find that it is helpful in identifying the problem
  • I think that there are too many options and special cases
  • I believe that the commands could be substantially simplified
  • I have trouble remembering the commands and options and must consult the manual frequently
  • When a problem arises, I ask for assistance from someone who really knows the system

    Results from this survey

  • it helps designers identify the problems users are having
  • it demonstrates improvement to the interface as changes are made in
    • training
    • online assistance
    • command structures
  • progress is demonstrated when subsequent surveys show higher scores (a minimal scoring sketch follows)
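
  A minimal scoring sketch for the eight-item survey above, using invented response data: responses are coded 1 (strongly disagree) to 5 (strongly agree), and the four negatively worded items (too many options, could be simplified, trouble remembering, need assistance) are reverse-scored so that a higher mean always indicates a better result.

    from statistics import mean

    NEGATIVE_ITEMS = {4, 5, 6, 7}   # 0-based indices of the negatively worded items

    def score(responses):
        """Mean of one participant's eight responses, after reverse-scoring."""
        return mean(6 - r if i in NEGATIVE_ITEMS else r
                    for i, r in enumerate(responses))

    # Hypothetical data: one survey wave before and one after interface changes.
    before = [[4, 3, 2, 3, 2, 2, 2, 3], [3, 3, 3, 2, 1, 2, 2, 2]]
    after  = [[5, 4, 3, 4, 4, 3, 4, 4], [4, 4, 4, 4, 3, 4, 3, 4]]

    for label, wave in (("before", before), ("after", after)):
        print(label, round(mean(score(r) for r in wave), 2))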

    On text-editor usage

  • users had to rate the messages from a text editor on a 7-value scale
  • Hostile 1 2 3 4 5 6 7 Friendly
  • Vague 1 2 3 4 5 6 7 Specific
  • Misleading 1 2 3 4 5 6 7 Beneficial
  • Discouraging 1 2 3 4 5 6 7 Encouraging
  • when precise questions are asked, precise answers will be given

    Other questionnaires

  • Coleman and Williges (1985) developed a set of opposing adjective pairs describing reactions users could have to an interface:
    • pleasing vs irritating
    • simple vs complicated
    • concise vs redundant
  • users were then asked to evaluate a text editor on these grounds
  • another approach is to ask specific questions like:
    • readability of characters
    • meaningfulness of command names
    • helpfulness of error messages

