New data out of Stanford is lending credence to common wisdom that fitness trackers suck at counting calories. Heart-rate monitoring, on the other hand, fared surprisingly well in the school’s Medical Center studies.

The group of 60 volunteers was hand-selected to offer a broad range of subjects, factoring in things like gender, body mass index and skin color (optical heart-rate monitoring has a history of being a bit unintentionally racist). The 31 women and 29 men tested various combinations of the Apple Watch (first generation), Basis Peak, Fitbit Surge, Microsoft Band, Mio Alpha 2, PulseOn and Samsung Gear S2 while walking on treadmills and riding stationary bikes.

Those results were compared to data from medical-grade equipment measuring heart rate and carbon dioxide in the breath, a commonly accepted gauge of energy expenditure. It turns out all of the devices other than the Samsung Gear S2 did a pretty solid job on the heart-rate front, with error rates under five percent, which qualifies them as decent candidates for medical uses.

They’re not going to replace chest straps in the lab any time soon, but the kind of day-to-day data they collect could be useful for doctors looking for a more complete picture of their patients’ health. At the very least, they could provide some interesting supplemental information.

“Five percent is approaching something useful in a clinical setting, which suggests that doctors may be looking toward this data when evaluating their patients,” graduate student Anna Shcherbina told TechCrunch. “If a doctor has average data on the patient as they’re going about their day, this could provide a more complete picture.”

Caloric information is a different story altogether. And it’s not surprising, really. Heart rate is a pretty direct reading. Calories burned, on the other hand, differ hugely from person to person based on a variety of factors. And the wrist isn’t the logical spot to look for caloric information, so the system needs to do a fair amount of guesswork based on movements detected by the wearables’ built-in accelerometers.

The range of readings is pretty astounding. The best device was off by an average of 27 percent and the worst by an appalling 93 percent. To be of any real use in a medical setting, you’re going to want a number under 10 percent. So there’s a lot of work to be done here. Shcherbina says the Apple Watch and Fitbit Surge were the most accurate of the bunch; it was Finnish company PulseOn that really missed the mark.
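For context, an "off by an average of 27 percent" figure corresponds to something like a mean absolute percentage error against the gold-standard equipment. The study's exact metric and these numbers are not spelled out here, so the sketch below uses made-up calorie readings purely to illustrate how such an error rate is computed:

```python
def mean_abs_pct_error(measured, reference):
    """Average of |measured - reference| / reference, expressed as a percentage."""
    errors = [abs(m - r) / r * 100 for m, r in zip(measured, reference)]
    return sum(errors) / len(errors)

# Hypothetical per-session calorie estimates: wearable vs. indirect calorimetry (kcal)
device_kcal = [310, 250, 420]
gold_kcal = [240, 200, 330]
print(round(mean_abs_pct_error(device_kcal, gold_kcal), 1))  # → 27.1
```

By this measure, even a device that sometimes over- and sometimes under-counts still racks up a large error, since the deviations don’t cancel out.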

The team unsurprisingly got some pushback from PulseOn for not entering more precise information, like a volunteer’s VO2 max. In order to get a more accurate picture of the average user, the researchers deliberately left out information that most of us don’t have access to. I know I haven’t been on a treadmill with a VO2 measuring mask on my face any time recently. Perhaps I’m overdue.

The study began in 2015, and newer versions of a number of the devices have been released since. The team admits that, in the case of the Apple Watch, there may be some discrepancies between “active” and “total” calories burned, an issue that may be reconciled as it continues testing on the new version.

The team purchased all devices independently for the integrity of the study — but lead researcher professor Euan Ashley says they’re open to communicating with the companies involved to make sure there’s not some sort of proprietary secret sauce somehow being overlooked.

Both PulseOn and Apple have reached out with feedback, after the paper’s publication, and the team is open to future collaboration. “No concrete plans as of now,” Shcherbina explains, “but we are definitely open to this idea if the companies are interested in working with us. Obviously maintaining our independence/impartiality is key, but closer communication might yield valuable insights about most effective ways to improve fitness algorithms for wearables.”

Apple and Samsung declined to offer a statement, while Fitbit is clearly on the defensive here, sending along a lengthy one. The wearable maker touches upon its “extensive, ongoing research and development,” but adds that the caloric estimates that trackers give are just that — estimates:

Fitbit trackers show an estimated total number of calories burned based on users’ BMR (basal metabolic rate) and activity energy expenditure (AEE). Fitbit uses a scientifically validated estimate of BMR based on height, weight, age, and gender information that users provide when setting up their Fitbit account.
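Fitbit doesn’t publish its exact formula, but scientifically validated BMR estimates from height, weight, age and sex do exist. One widely used example is the Mifflin-St Jeor equation, sketched below as an illustration of the kind of baseline these trackers start from (this is not Fitbit’s disclosed method):

```python
def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Estimated basal metabolic rate in kcal/day, per the Mifflin-St Jeor equation."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    return base + (5 if sex == "male" else -161)

# e.g. a 70 kg, 175 cm, 30-year-old male
print(bmr_mifflin_st_jeor(70, 175, 30, "male"))  # → 1648.75
```

Activity energy expenditure then gets layered on top of that baseline, which is where the accelerometer guesswork, and most of the error, comes in.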

The company also argues that there’s something to be said for the base-level motivation that simply wearing a tracker brings. It’s a fair point. If the sort of gamification that these devices bring does get more people off their butts, that’s probably a net positive and a big part of what’s driving the category’s success. I know plenty of people (myself included) who find the simple act of wearing one of these devices on their wrist cause to get off their respective butts.

Shcherbina agrees with the point, but adds that there’s a big downside for those who rely on such readings: if users make “very precise decisions about exercise and food based on that, then it might be off by quite a bit and they may not get the results they want and become frustrated.”

As more health providers look to trackers for data, the category is likely to come under increased scrutiny from regulators like the FDA. As it currently stands, however, that body doesn’t oversee any sort of claims made for these “non-medical” products.

“Our position is that ‘sunlight is better than regulation,’ ” says Ashley. “We would encourage the companies to make available their validation studies in the public domain so the public can see for themselves which devices are most accurate.”

At the very least, the study prompts some interesting questions about how companies collect and process the data that informs their health claims. The data collected may not be a perfect reflection of manufacturers’ intentions, but the study is a place to start the conversation.

Featured Image: Bryce Durbin