Evaluating the designed system

The elements that contribute to evaluating the designed system are:

    Ranges

  • the evaluation plan might be a two-year test with multiple phases for an air-traffic-control system, or
  • a three-day test with six users for a small internal accounting system
  • costs may vary from 10% of the project budget down to 1%
  • it is no longer possible to bypass usability testing
  • customers might file lawsuits against s/w vendors for errors

    Limitations

  • impossible to test the system in every possible situation
  • testing must include continuing methods to assess and repair problems during the lifecycle of the interface
  • after testing, a decision must still be made about whether the system is ready for delivery
  • most testing methods account for normal usage, but stressful situations and partial equipment failures should also be considered
  • more than 4000 members of the Usability Professionals Association exchange ideas about these problems

    Expert reviews

  • experts may belong to staff or be external consultants
  • reviews may be conducted rapidly
  • reviews may be performed early or late in the design phase: they provide recommendations, a formal report or both
  • suggestions should be made cautiously (be mindful of the designer’s ego)
  • it is better to pinpoint problems than to provide solutions
  • solutions should be left to the designers

    Expert review: methods

  • heuristic evaluation
      e.g., evaluate the interface against the Eight Golden Rules; the reviewers’ expertise in the rules is very important
  • guidelines review
      the interface is checked for conformance with the organization’s guidelines documents
  • consistency inspection
      across a family of interfaces: color, layout, terminology, input & output formats, training materials, help (parts of such a check can be automated; see the sketch after this list)
  • cognitive walkthrough
      the experts simulate users carrying out their tasks (Wharton, 1994); frequent tasks are the starting point, followed by critical tasks and error recovery; public walkthroughs may also be performed (Yourdon, 1989)
  • formal usability inspection
      courtroom-style meetings with a moderator who presents the interface so that its merits and weaknesses can be discussed; design team members may rebut; these meetings can be good experiences for managers, yet they are time consuming
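
  Parts of a consistency inspection can be automated. The sketch below is only a minimal illustration (not a real inspection tool): it assumes hypothetical screen definitions and flags any attribute whose value differs across a family of screens.

    # Minimal consistency-inspection sketch (Python). The screen definitions
    # are hypothetical; a real check would read the interface's resource files.
    screens = {
        "login":   {"quit_label": "Exit", "bg_color": "#FFFFFF"},
        "search":  {"quit_label": "Quit", "bg_color": "#FFFFFF"},
        "results": {"quit_label": "Exit", "bg_color": "#F0F0F0"},
    }

    def report_inconsistencies(screens):
        """Print every attribute whose value is not uniform across screens."""
        attributes = {a for props in screens.values() for a in props}
        for attr in sorted(attributes):
            values = {name: props[attr]
                      for name, props in screens.items() if attr in props}
            if len(set(values.values())) > 1:   # more than one distinct value
                print(f"inconsistent {attr}: {values}")

    report_inconsistencies(screens)
    # -> inconsistent bg_color:   {'login': '#FFFFFF', 'search': '#FFFFFF', 'results': '#F0F0F0'}
    # -> inconsistent quit_label: {'login': 'Exit', 'search': 'Quit', 'results': 'Exit'}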

    Expert review vs usability studies

  • difficult to compare: each makes a different contribution to improving the interface
  • some studies document the benefits of expert reviews (Jeffries et al., 1991; Karat et al., 1992)
  • different experts find different problems, so it may be wise to use 3-5 experts and pool all the evidence
  • expert reviewers should work under the same conditions as the potential users (workplace, noise, stress)
  • a bird’s-eye view of an interface, via printed screens pinned to a board, may be very useful for detecting inconsistencies
  • some experts may lack knowledge of the task domain, and conflicting advice may be counterproductive
  • experts should have a long-term relationship with the organization, so that they can be held accountable
  • it is difficult to predict how first-time users will behave

    Usability testing and laboratories

  • usability testing started around 1980
  • traditional managers resisted (a nice idea, but time and money pressures prevented them from adopting usability evaluation)
  • competition created the need for such evaluation; moreover, deadlines were more likely to be met when a usability test was scheduled
  • the results of the test provided:
    • supportive confirmation of progress
    • specific recommendations for changes
  • designers had evaluative feedback to guide their work
  • managers saw fewer disasters as delivery dates approached
  • usability testing sped up many projects
  • it also produced dramatic cost savings
  • usability laboratory tests were influenced by marketing and advertising practice: few users, quick & dirty
  • controlled experiments, by contrast, test hypotheses, support theories and methods, and produce statistically significant results

    Usability labs

  • have emerged in different companies
  • they provide a positive image of the company
  • some are very large (IBM had 16 labs at Boca Raton)
  • usability consultancy firms have emerged and can be hired
  • each lab may serve 10 to 15 projects a year; at kick-off the lab staff meet with the user-interface architect or manager to draw up a test plan with scheduled dates and budget allocations

    Pre-test

  • usability staff participate in early task analysis, provide information on s/w tools and references, and help develop the set of tasks for the usability test
  • 2 to 6 weeks before the test, a detailed test plan is defined, including the list of tasks plus subjective-satisfaction and debriefing questions
  • the number of participants and their source (customer sites, personnel agencies) are decided
  • a pilot test of the procedures, tasks and questionnaires is run 1 week ahead of time

    Test

  • final procedures are now defined
    • participants are chosen to represent the user communities
    • attention to background in computing
    • experience with the task, motivation, education, ability with the interface language
    • controls for eyesight, left- versus right-handedness, age, gender
    • other experimental conditions: time of day, day of week
    • physical surroundings, noise, room temperature

    Etiquette

  • participants should always be treated with respect
  • informed that THEY are not being tested; the system is
  • they will be told what they will be doing & for how long
  • participation should always be voluntary (informed consent)
  • a typical statement could be the following:

    Statement of consent

    I have freely volunteered to participate in this experiment
    I have been informed in advance what my task(s) will be and what procedures will be followed
    I have been given the opportunity to ask questions and have had my questions answered to my satisfaction
    I am aware that I have the right to withdraw consent and to discontinue participation at any time, without prejudice to my future treatment
    My signature below may be taken as affirmation of all the above statements; it was given prior to my participation in this study

    Cues for testing

  • participants may be encouraged to think aloud
  • the tester should be supportive of the participants, taking notes without interfering
  • typically the tasks are completed within 2-3 hours
  • participants are invited to make general comments/suggestions
  • sometimes 2 users cooperate in the task and exchange ideas
  • videotaping is often performed for later review (a tedious job)
  • logging of all user actions is generally performed, perhaps with special programs (e.g., The Observer by Noldus, The Netherlands); a minimal logging sketch follows this list
  • logging means tracing mousing, typing, reading of manuals and screens, etc.
  • designers are impressed when they see (on tape) users failing to achieve what they want
  • sometimes users consistently pick the wrong menu: the position of that menu was awkward
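
  As a minimal illustration of such logging (the format is invented, not that of The Observer or any real tool), the sketch below timestamps each user action so that a session can be replayed and analyzed later.

    import csv
    import time

    class SessionLogger:
        """Record timestamped user actions during a usability-test session."""

        def __init__(self, participant_id):
            self.participant_id = participant_id
            self.events = []

        def log(self, event, detail=""):
            # one row per action: mouse click, keystroke, menu choice, ...
            self.events.append((time.time(), self.participant_id, event, detail))

        def save(self, path):
            with open(path, "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["timestamp", "participant", "event", "detail"])
                writer.writerows(self.events)

    log = SessionLogger("P07")                 # hypothetical participant ID
    log.log("menu_open", "Edit")               # a consistently wrong pick here
    log.log("menu_select", "Edit > Undo")      # may reveal an awkward menu position
    log.save("session_P07.csv")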

    Testing & correcting

  • at each design stage the interface can be refined iteratively
  • the improved version can be tested again
  • it is important to fix even small flaws quickly (spelling errors, inconsistent layout, ...)
  • several forms of usability testing have been suggested:
    • discount usability engineering (Nielsen, 1992), a quick & dirty approach (task analysis, prototype development, testing)
    • field tests, which place the interface in realistic user environments, using portable usability labs with videotaping & logging; a variant is to provide users with test versions of new s/w (Microsoft’s Windows 95 was screened by 400,000 users!)

    Other testing strategies

  • early usability testing may be performed with mockups of screen displays to assess user reactions to wording, layout and sequencing
  • a test administrator plays the role of the computer, flipping the pages while asking a participant to carry out typical tasks
  • game designers pioneered the can-you-break-this approach, challenging teenagers to beat new games
  • this last approach is destructive: it tries to detect fatal flaws and proves very productive for critical systems



    Testing conclusions

  • a last approach compares different versions of the same interface, or compares it with similar interfaces intended for the same job
  • its name: competitive usability testing
  • it is important to construct parallel sets of tasks and to counterbalance the order of presentation of the interfaces (a minimal counterbalancing sketch follows this list)
  • usability testing has at least 2 serious limitations:
    • emphasis on first usage (the first 2 to 4 hours; problems of the remaining usage period stay unknown)
    • limited coverage of interface features (only a few aspects may be touched on during a test)
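
  A minimal counterbalancing sketch, with invented interface names: each participant sees every interface, but the starting interface is rotated so that no version is always tested first, balancing order effects (such as learning) across participants.

    # Rotate the presentation order: a simple Latin-square style rotation.
    interfaces = ["Version A", "Version B", "Competitor X"]

    def rotated_orders(items):
        """One presentation order per rotation of the interface list."""
        n = len(items)
        return [[items[(start + i) % n] for i in range(n)] for start in range(n)]

    # Assign six participants, cycling through the three rotated orders.
    for p, order in enumerate(rotated_orders(interfaces) * 2, start=1):
        print(f"participant {p}: {' -> '.join(order)}")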

    Surveys

  • given these limitations, usability tests should be complemented by other measurements, e.g. surveys
  • surveys are a familiar, inexpensive and generally acceptable companion to usability tests and expert reviews
  • clear goals should be set in advance, with focused items that help attain those goals
  • care is needed in administration and data analysis
  • the survey should be prepared, reviewed by colleagues and tested with a small sample of users
  • statistical analyses and presentations should be developed before the final survey is distributed
  • survey goals may be tied to the components of the OAI model of interface design: subjective impressions about the representation of
    • task domain objects and actions
    • interface domain metaphors and action handles
    • syntax of inputs and design of displays
  • ascertain the user’s
    • background - age, gender, origins, education, income
    • experience with computers - specific applications, length of time, depth of knowledge
    • job responsibilities - decision making influence, managerial roles, motivation
    • personality style - introvert vs extravert, risk taking vs risk averse, early vs late adopter, systematic vs opportunistic
  • ascertain the user’s
    • reasons for not using an interface - inadequate services, too complex, too slow
    • familiarity with features - printing, macros, shortcuts, tutorials
    • feelings after using an interface - confused vs clear, frustrated vs in control, bored vs excited

    Online surveys

  • avoid the cost and effort of printing, distributing and collecting paper forms
  • many people prefer to answer a short survey displayed on a screen (rather than filling in and returning a printed form)
  • in one survey, a short scale with 5 values was provided:
    • strongly agree
    • agree
    • neutral
    • disagree
    • strongly disagree

    Survey with the 5-value scale

  • I find the system commands easy to use
  • I feel competent with and knowledgeable about the system commands
  • When writing a set of system commands for a new application, I am confident that they will be correct on the first run
  • When I get an error message, I find that it is helpful in identifying the problem
  • I think that there are too many options and special cases
  • I believe that the commands could be substantially simplified
  • I have trouble remembering the commands and options and must consult the manual frequently
  • When a problem arises, I ask for assistance from someone who really knows the system

    Results from this survey

  • it helps designers identify the problems users are having
  • it demonstrates improvement to the interface as changes are made in
    • training
    • online assistance
    • command structures
  • progress is demonstrated when subsequent surveys show higher scores (a minimal scoring sketch follows)
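
  A minimal scoring sketch for the eight-item survey above, using invented response data: responses are coded 1 (strongly disagree) to 5 (strongly agree), and the four negatively worded items (too many options, could be simplified, trouble remembering, need assistance) are reverse-scored so that a higher mean always indicates a better result.

    from statistics import mean

    NEGATIVE_ITEMS = {4, 5, 6, 7}   # 0-based indices of the negatively worded items

    def score(responses):
        """Mean of one participant's eight responses, after reverse-scoring."""
        return mean(6 - r if i in NEGATIVE_ITEMS else r
                    for i, r in enumerate(responses))

    # Hypothetical data: one survey wave before and one after interface changes.
    before = [[4, 3, 2, 3, 2, 2, 2, 3], [3, 3, 3, 2, 1, 2, 2, 2]]
    after  = [[5, 4, 3, 4, 4, 3, 4, 4], [4, 4, 4, 4, 3, 4, 3, 4]]

    for label, wave in (("before", before), ("after", after)):
        print(label, round(mean(score(r) for r in wave), 2))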

    On text-editor usage

  • users had to rate the messages from a text editor on a 7-value scale
  • Hostile 1 2 3 4 5 6 7 Friendly
  • Vague 1 2 3 4 5 6 7 Specific
  • Misleading 1 2 3 4 5 6 7 Beneficial
  • Discouraging 1 2 3 4 5 6 7 Encouraging
  • when precise questions are asked, precise answers will be given

    Other questionnaires

  • Coleman and Williges (1985) developed a set of opposing adjective pairs describing reactions users could have to an interface:
    • pleasing vs irritating
    • simple vs complicated
    • concise vs redundant
  • users were then asked to evaluate a text editor on these grounds
  • another approach is to ask specific questions like:
    • readability of characters
    • meaningfulness of command names
    • helpfulness of error messages

