Canibais e Reis


March 21, 2009

Integrated nutrition, lifestyle and health database: epidemiological information for an improved understanding of diseases of civilization

Author: O Primitivo. Category: Health

Figure: "Nutrition, Lifestyle and Health Database",
Canibais e Reis (2009).

Nutrition, lifestyle and health data sources

A lot of nutrition and health statistics are freely available on-line. Unfortunately, most of this data is dispersed and in different formats, which makes it relatively difficult to compare data from different sources. WHO publishes health data but not food statistics. FAO publishes food statistics but doesn’t provide any health data. Because epidemiological data that crosses these two types of data is an essential tool for studying and formulating associations between modern lifestyle and the diseases of civilization, and because I couldn’t find any freely available on-line database of this kind, I decided to build my own. I started by collecting data from some of the sources listed at the end of this article. To be more precise, the main sources I used were the following:

  1. FAO Statistical Yearbook 2005-2006 - consumption, for world nutritional data (energy intake, macronutrient distribution, etc. for more than 180 countries);
  2. FAOSTAT consumption - crops and FAOSTAT consumption - livestock and fish, for detailed world nutrition data (animal and vegetable products, cereals, fruits, sugars, olive oil, butter, fish, meat, and much more);
  3. Health statistics (British Heart Foundation), for European and world disease statistics (heart related, obesity, diabetes, blood pressure, total cholesterol);
  4. WHO Global Health Atlas and WHO Statistical Information System, for general world health statistics (mortality, socio-economic, drinking water, sanitation, tobacco use, etc.).

Because of the importance of Hormone D, I also included two parameters related to sun exposure: latitude (the average latitude of each country) and the corresponding annual insolation. (In the end, I didn’t find any important correlation of these parameters with health and disease, contrary to what I had expected.) Also, to my disappointment, I couldn’t find any recent world data on saturated fat, only data from 1998, so this database doesn’t include data on saturated fat. If you know where I can find it, please let me know by commenting on this article.

After joining all the raw data into a single Excel workbook with several worksheets, the major problem arose: merging everything into a homogeneous format. Because most data was available for 2003, I chose this year for most parameters. Some of them refer to nearby years; for example, total cholesterol data refers to 2005. The fully integrated database obtained this way includes data from 167 countries and a total of 106 parameters (nutritional, health, lifestyle and disease related).
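
A minimal sketch of this kind of merge, written in Python rather than Excel. It is only an illustration under stated assumptions: the table names, the parameter names and the life expectancy value are invented, and it assumes that every source identifies countries by the same name once spacing and letter case are normalised.

```python
# Hypothetical sketch: merge per-source tables into one table keyed by country.
# Table names, parameter names and the life expectancy value are invented.

def merge_sources(sources):
    """Combine {country: {parameter: value}} tables into a single table.

    Countries missing from a source simply lack that source's parameters.
    """
    merged = {}
    for table in sources:
        for country, params in table.items():
            key = country.strip().lower()   # normalise the country name
            merged.setdefault(key, {}).update(params)
    return merged


fao = {"Portugal": {"energy_kcal": 3750}, "USA ": {"energy_kcal": 3770}}
who = {"portugal": {"life_expectancy": 77.0}}

database = merge_sources([fao, who])
print(database["portugal"])
```

Countries that appear in only one source end up with gaps, which is exactly the missing-value problem that shows up once everything is in one table.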

167 countries reduced to 86 countries

Unfortunately, this "full" database of 167 countries (see worksheet ‘FULL-2003’) has many missing values for quite a few countries, especially in the world obesity and blood pressure data from the British Heart Foundation and in the world socio-economic data from the World Health Organization. So, in order to get a really complete database, without any missing values, 81 countries (rows) had to be removed, as well as 6 parameters (columns). I don’t consider this a major problem, as the final database (see worksheet ‘FINAL-2003’) still includes 86 countries and 100 parameters, which is more than enough to draw some interesting conclusions. At last, a truly complete database!
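
The reduction from a table with gaps to a complete one can be sketched as follows. This is not the procedure used for the workbook, which is not described in detail; the rule of dropping a parameter when it is missing in more than half of the countries is an assumption made for the example, and the rows are invented.

```python
# Hypothetical sketch: reduce a table with gaps to a complete one.
# None marks a missing value; the rows and the 50% threshold are invented.

def complete_subset(table, max_missing_share=0.5):
    """Drop sparse parameters first, then any country that still has a gap."""
    countries = list(table)
    params = sorted({p for row in table.values() for p in row})

    # Keep a parameter only if it is missing in at most the given share of countries.
    kept_params = [
        p for p in params
        if sum(table[c].get(p) is None for c in countries) / len(countries)
        <= max_missing_share
    ]
    # Keep a country only if every remaining parameter has a value.
    return {
        c: {p: table[c][p] for p in kept_params}
        for c in countries
        if all(table[c].get(p) is not None for p in kept_params)
    }


full = {
    "A": {"energy": 3000, "obesity": 20.0, "rare": None},
    "B": {"energy": 2500, "obesity": None, "rare": None},
    "C": {"energy": 2800, "obesity": 15.0, "rare": 1.0},
}
final = complete_subset(full)
print(sorted(final))
```

Removing a few very sparse columns before removing rows keeps more countries than dropping every row with any gap at all.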

Perhaps the first question to ask is: how reliable is this database? Well, the data sources must be considered reliable, as they are provided by well-known international and world health authorities. You may check them by visiting the links provided at the end of this article. As for the work I’ve done here, it must be noted that processing such relatively large amounts of data, available from different sources and in variable formats, without anyone helping me or validating the work, is certainly prone to some degree of error. Because of this, I ask anyone using this data to always double-check their results against the original sources, especially if you find some unexpected correlations. Also, please report to me any errors that you may find.

Correlation does not imply causation

After the above warnings, one could ask what can be done with this data, a total of 100 parameters from 86 countries. Well, if you’re into statistical analysis, and this includes canonical analysis, redundancy analysis, principal component analysis, etc., then quite a lot of hypotheses can be formulated and tested. But first, the simplest analysis to begin with is trying to understand how each parameter relates to the other parameters. In statistics this is called correlation analysis, and most certainly everyone who studied some math at university understands what it means and how it is done (see worksheet ‘CORREL-2003’).

In the correlation analysis I’ve done, I first divided the existing 100 parameters into 70 ‘nutrition & lifestyle’ parameters and 30 ‘health and disease’ parameters. These can be combined into 2100 (70 × 30) pairs of parameters, and their correlations can be ranked. That’s what I did in worksheets ‘BEST’ and ‘WORST’. Notice that in epidemiological observational data, ‘correlation’ does not necessarily imply ‘causality’. This is a common mistake that some investigators make because of a priori conjectures that don’t hold in reality. Regarding this issue, please read the Wikipedia article "correlation does not imply causation" and also the "correlation does not equal causation" article by Stephan, the author of the Whole Health Source blog.
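
The ranking of parameter pairs can be sketched in Python as follows. This is an illustration rather than the formulas used in the ‘BEST’ and ‘WORST’ worksheets: it assumes Pearson’s r as the correlation measure, the function names are hypothetical, and the numbers describe five imaginary countries.

```python
# Hypothetical sketch: rank every (lifestyle, health) pair by Pearson's r.
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rank_pairs(lifestyle, health):
    """Return (r, lifestyle_name, health_name) sorted by |r|, strongest first."""
    pairs = [
        (pearson(x, y), lname, hname)
        for (lname, x), (hname, y) in product(lifestyle.items(), health.items())
    ]
    return sorted(pairs, key=lambda t: abs(t[0]), reverse=True)


# Invented numbers for five imaginary countries.
lifestyle = {"energy": [2000, 2500, 3000, 3500, 3700],
             "tobacco": [30, 10, 25, 15, 20]}
health = {"life_expectancy": [55, 62, 70, 76, 78]}

ranking = rank_pairs(lifestyle, health)
print(len(ranking), ranking[0][1])
```

With 70 parameters in the first group and 30 in the second, the same function would return the 2100 pairs mentioned above. Sorting by the absolute value of r puts strong negative correlations next to strong positive ones.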

Confounding factors

According to Wikipedia, "a confounding variable, or confounding factor, lurking variable or confounder, is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to control for these factors to avoid a type 1 error; an erroneous ‘false positive’ conclusion that the dependent variables are in a causal relationship with the independent variable. Such a relation between two observed variables is termed a spurious relationship".

Perhaps the strongest confounding factor in epidemiological studies involving several countries across different continents, as Stephan has already pointed out to me, is wealth. Richer countries have more access to food and thus higher energy intakes. There are a few exceptions to this rule, of course, and I would cite Portugal here, which in 2003 ranked no. 2 in the world with 3750 kcal/day (against only 2780 kcal in 1980), right after no. 1, the USA, with 3770 kcal (I personally don’t believe this, but that’s the FAO data).

Because energy intake is a key parameter in health, probably much more so than macronutrient distribution, physical activity or even food quality combined (I know many people don’t agree with this, but the high correlation of r=0.82 between energy intake and years of life lost to non-communicable diseases, or diseases of civilization, strongly suggests it, without proving it), it must also be the strongest confounder of all. One possible way to overcome this difficulty would be to limit the comparison to countries within the same class of energy intake, and this is what I’ve done in this database by providing a filtering option by "energy ranges". Or, even better, if sufficient data were available for several consecutive years, different countries could be compared using data from different years, as long as energy intake for all of them is matched to a certain value or a narrow range around it, for example 2800 kcal.
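
The idea of filtering by energy range can be sketched as below. This is a Python illustration, not the workbook’s own filter: the function name and the field name `energy_kcal` are hypothetical, the band limits are arbitrary, and countries "X" and "Y" are invented.

```python
# Hypothetical sketch: keep only countries inside an energy-intake band.

def filter_by_energy(rows, low_kcal, high_kcal, key="energy_kcal"):
    """Return the rows whose energy intake lies in [low_kcal, high_kcal]."""
    return [row for row in rows if low_kcal <= row[key] <= high_kcal]


rows = [
    {"country": "Portugal", "energy_kcal": 3750},   # figure quoted in the article
    {"country": "USA", "energy_kcal": 3770},        # figure quoted in the article
    {"country": "X", "energy_kcal": 2800},          # invented
    {"country": "Y", "energy_kcal": 2150},          # invented
]

band = filter_by_energy(rows, 2700, 2900)   # a narrow band around 2800 kcal
print([row["country"] for row in band])
```

Correlations computed only within such a band compare countries with similar energy intake, so differences in energy intake can explain less of whatever relationship remains.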

Visualizing data

Another way of analyzing data, and possibly the best one, is to draw some charts. This would be easy if we only had a few countries and a couple of parameters, but our database now has 100 parameters. For this purpose, I created a special Excel worksheet (see ‘CHARTS’) that allows the automated production of graphs based on the combination of any 2 parameters listed in the database. This allows for a total of 10,000 different graphs, although of course most of these pairs of parameters aren’t correlated at all. I chose not to limit the combinations to the 70 × 30 pairs above, because you might want to correlate, for example, ‘energy intake’ with ‘sugars and sweeteners’. This might be interesting if you want to explore the hypothesis that a variation in a certain parameter does not produce any effect on some other parameter. See below for further discussion of this subject.
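
What such a chart needs for any two parameters is a pair of data series and, optionally, a trend line. The sketch below shows this in Python; it is not the VBA code of the ‘CHARTS’ worksheet, the function name is hypothetical, the rows are invented, and an ordinary least-squares line is assumed for the trend.

```python
# Hypothetical sketch: build the series for a scatter chart of any two
# parameters and fit a least-squares trend line through them.

def chart_series(table, x_name, y_name):
    """Return (xs, ys, slope, intercept) for the two chosen parameters."""
    xs = [row[x_name] for row in table]
    ys = [row[y_name] for row in table]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return xs, ys, slope, intercept


# Invented rows; a real table would have one row per country.
table = [
    {"energy": 2000, "sugar": 40},
    {"energy": 3000, "sugar": 80},
    {"energy": 4000, "sugar": 120},
]
xs, ys, slope, intercept = chart_series(table, "energy", "sugar")
print(slope, intercept)
```

The returned series can be handed to any plotting tool, and the slope and intercept give the two end points of the trend line.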

Please notice that these automatic charts require Excel macros to be enabled, because Visual Basic is needed for the automated generation of the charts. (I couldn’t get this working in OpenOffice Calc or even in Go-oo, because their VBA support is still too limited to run this specific Excel/VBA example. I hope the open source community solves this problem in the next couple of years, as VBA is an incredible development environment that would be extremely useful in OpenOffice once fully implemented and compatible with Microsoft’s VB language.)

Multivariate statistical analysis

Things now get a bit more complicated, because we have a lot of parameters, exactly 70 ‘nutrition & lifestyle’ parameters and 30 ‘health and disease’ parameters, and we would like to understand how some of those 70 parameters influence some of the other 30. Am I making myself clear here? For example, if you want to understand the influence of food and lifestyle on mortality and longevity, you might choose ‘energy intake’, ‘fat intake’ and ‘tobacco consumption’ and try to understand how these 3 parameters influence ‘adult mortality rate’, ‘mortality rate for cardiovascular disease’ and ‘healthy life expectancy’. Get the idea?

But, just to make things worse, as we already know, many of these parameters cannot be considered statistically independent. Fortunately, mathematicians came up with a method called canonical correlation analysis (CCA), which looks for linear combinations of the parameters in each of the two sets such that the two combinations are as strongly correlated with each other as possible; the strength of each successive pair of combinations is obtained from the eigenvalues of a matrix built from the covariances. I suppose most multivariate analysis books explain this better than I can, so I will not develop the subject here. I must say that, despite being no expert in this area, I did some CCA on the parameters I found most relevant for a set of conditions, using the shareware version of XLSTAT, which can be downloaded here.
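
For readers who prefer a formula, the usual textbook statement of CCA is given below. The symbols are notation introduced here for illustration and are not taken from XLSTAT: X is the vector of ‘nutrition & lifestyle’ parameters, Y the vector of ‘health and disease’ parameters, and the Σ matrices are their covariance blocks. The first pair of weight vectors a and b is the one that maximises the correlation between the two combinations:

```latex
\rho_1 = \max_{a,\,b}\ \operatorname{corr}\!\left(a^{\top}X,\; b^{\top}Y\right)
       = \max_{a,\,b}\ \frac{a^{\top}\Sigma_{XY}\,b}
                            {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}
```

The squared canonical correlations \rho_i^2 are the eigenvalues of \Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}, and each later pair of combinations is uncorrelated with the earlier ones.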

If you are experienced in multivariate analysis, I would love to hear your comments and suggestions concerning the analysis I did here. I really need help in this area from an expert in statistics.

Some inconvenient truths?

If you have read this far and want to try the database for yourself, please download it from the link below. As I already mentioned, this is an Excel workbook that includes several worksheets and requires VBA/macro support in order to produce the automated Excel charts described above. I consider this database just a preliminary version that may contain errors, so I urge you to be very careful and double-check any data you extract from it against the original FAOSTAT, WHOSIS and BHF data, which can be accessed from the list of links at the end of this article. This is an ongoing effort, so please report any errors you find so that I can correct them in future versions of this database.

If everything is correct and no major errors are found, we can start looking at the best correlated parameters and their corresponding graphs, and perhaps start finding for ourselves some, as I would call them, ‘inconvenient truths’. For example, as many of us already know and as some health blogs recently reported, namely the Whole Health Source blog and the Hyperlipid blog (which, by the way, are two excellent blogs that I visit daily), total cholesterol doesn’t seem to be related to heart disease at all, contrary to what we are constantly told by health authorities. For example, have a look at the graph cholesterol-cardiovasc-men.gif and you’ll understand that it takes a lot of imagination to draw a trend line between ‘total cholesterol, men, 2005’ and ‘cardiovascular disease mortality, both sexes, 2002’ anywhere in this graph.

Also, if you plot ‘total cholesterol, female, 2005’ against ‘life expectancy at birth, female, 2003’ you will get the graph cholesterol-longevity-female.gif, with a truly high correlation of r=0.73, which taken at face value would mean that a higher total cholesterol level provides greater longevity for females. I don’t believe the contrary, that lower total cholesterol levels provide longer life, as this would go against the evidence. As we have seen before, and it is never too much to repeat this epidemiological mantra, ‘correlation does not imply causation’. On the other hand, I suppose a strong inverse correlation between ‘total cholesterol, men, 2005’ and ‘adult mortality rate, both sexes, 2003’ necessarily means that higher cholesterol levels by themselves cannot imply higher mortality. In other words, if two variables are highly correlated, then even though one does not necessarily cause the other, we can assume that the exactly inverse relation is not causal. Is this clear, or not?

Final comments

If you try this type of parametric analysis, or even the canonical analysis described above, you may find yourself doing science and perhaps making history. I’m not joking! I say this because, until now, I haven’t seen any analysis of this type with more than a few countries. The classic example of this kind of epidemiological observational analysis, and of what can go wrong if we use only a very small sample of the whole data, is what Dr. Ancel Keys did in the early 1950s when he ‘established’ his never proven dietary fat-heart disease hypothesis. This has been the subject of some previous posts of mine, written in Portuguese, so I will not repeat them here.

To finish this article, it must be noted that this Excel database, despite being the result of some really hard and careful (unpaid) work, is provided as is, without any warranty whatsoever, either express or implied, of fitness for any purpose. That necessary disclaimer aside, I sincerely believe it works fine. You can test it just to make sure it really does, and perhaps improve it and report any errors you find, so that future versions of this free database are more reliable.

In the next few weeks I’ll start my own data mining on this data, using some techniques that are familiar to me, like multiple non-linear regression and a powerful genetic algorithm called gene expression programming, which was created by a Portuguese scientist.

Note to Portuguese readers: This article was exceptionally published in English in order to facilitate the exchange of ideas with other bloggers who are interested in this type of statistical information but do not speak Portuguese. Fortunately, most Portuguese and Brazilians can read and/or understand English, but the reverse is not so often the case.


Nutrition, lifestyle & health database: (1.95 Mb)
Last updated: 29.03.2009; Only minor changes.

On-line freely available databases:

WHO Global Health Atlas
WHO Statistical Information System
WHO burden of disease: 2004 update
WHO comparative health risks
WHO disease and injury estimates 2004
WHO disease and injury country estimates
WHO data sources and methods
WBank burden of disease & risk factors
FAOSTAT consumption - crops
FAOSTAT consumption - livestock and fish
FAO Statistical Yearbook 2004 - consumption
FAO Statistical Yearbook 2004 - countries
Health statistics (British Heart Foundation)
European Cardiovascular Statistics
Obesity, physical activity and diet in England


Related websites:

Epidemiology (Wikipedia)
XLSTAT, Statistical software for MS Excel
BioEstat 5.0 (Portuguese software)
Correlation (Wikipedia)
Correlation does not imply causation (Wikipedia)
Confounding variable (Wikipedia)
Canonical analysis (Wikipedia)
Understanding canonical correlation analysis
Canonical Analysis (StatSoft)
Why There Is No Statistical Test for Confounding, Why Many Think There Is, and Why They Are Almost Right


In Portuguese:

Uma Breve Introdução à Epidemiologia (Rev. Vigilância em Saúde Pública)
Texto Introdutório de Epidemiologia (Univ. Federal do Rio de Janeiro)
Os Caminhos da Estatística e suas Incursões pela Epidemiologia
Epidemiologia, demografia, história da medicina (Paulo Lotufo)

8 comments on this article.

1 | Christer Sundqvist

March 21, 2009, 3:46


Quite recently I stumbled upon your excellent database on nutrition and health. Keep up with your good work! Greetings from the far-away country which carried out the inconsistent findings of late Dr. Ancel Keys to its fullest: Finland.

Sincerely yours,

Christer Sundqvist, PhD (biology)

2 | angela

March 21, 2009, 14:21


Congratulations and thank you for this, Ricardo - extraordinary work, really! Angela

3 | Janine

March 21, 2009, 22:42



I found the reference to your database on Whole Health Blog. I am an epidemiologist - this is the most fun database I have ever seen. The instant gratification for an epidemiologist is unrivaled. Of course, there are sizable grounds for speculating on alternative explanations for positive or negative correlations - but that’s what epidemiologists live for. I can do interactive lab exercises in class and have the students generate and debate hypotheses.

And the epidemiologists: can’t you just hear them trying to generate alternative hypotheses to explain why increased consumption of animal fat and increased cholesterol level is negatively correlated with cardiovascular mortality (while wheat and cereal consumption is positively correlated)?


Thanks, Ricardo - I will make sure they know who did this wondrous thing.


4 | Erik-Alexander Richter

March 21, 2009, 18:25


Great info! very useful for some presentations!!!

5 | Food and health data set « Follow the Data

March 21, 2009, 12:20


[...] dataset about food and health, available online here (Google spreadsheet) and described at the Canibais e Reis blog. I found it through the Cluster analysis of what the world eats blog post, which is cool, but [...]

7 | FJ | Fitmarker

March 21, 2009, 10:24


Nice, this is a meaty article. You should think about converting a few of those Portuguese posts into English, would be worth a read.

By the way, might want to check out and submit some of your work for extra exposure.

8 | Pukhraj

March 21, 2009, 7:06


Outstanding info………….
