Up-to-date information on which data are used can be found on the changelog page.

Data pre-processing

30-day time window

Events for a given individual and a given phenocode are merged if they are at most 30 days apart, each event being compared with the previous one. For example, if an individual has K11_APPENDACUT events on the following dates: 2000-01-01, 2000-01-20, 2000-02-10 and 2000-02-28, then all these events become a single event dated 2000-01-01.

This aims to remove events that are follow-ups rather than initial diagnoses.
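A minimal Python sketch of this merging rule, assuming the events of one individual for one phenocode are available as a list of dates. Since 2000-02-28 is more than 30 days after 2000-01-01 yet still merged, the sketch compares each event with the previous one rather than with the first. The function name `merge_events` is hypothetical and not taken from the FinnGen pipeline.

```python
from datetime import date

# Maximum gap (in days) between two consecutive events for them to be merged.
MERGE_WINDOW_DAYS = 30


def merge_events(dates: list[date], window: int = MERGE_WINDOW_DAYS) -> list[date]:
    """Collapse events that occur at most `window` days after the previous event.

    Each kept date is the first date of a chain of events in which every
    consecutive gap is <= window days.
    """
    merged: list[date] = []
    previous = None
    for d in sorted(dates):
        if previous is None or (d - previous).days > window:
            merged.append(d)  # gap too large: this is a new event
        previous = d  # chain: the next event is compared with this one
    return merged


events = [date(2000, 1, 1), date(2000, 1, 20), date(2000, 2, 10), date(2000, 2, 28)]
print(merge_events(events))  # [datetime.date(2000, 1, 1)]
```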


Unadjusted prevalence

Number of individuals having at least one event for a given phenocode, divided by the total number of individuals in the FinnGen study. No adjustment is made for the difference between the age distribution of the FinnGen cohort and that of the Finnish population.
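The ratio can be sketched as follows, assuming a hypothetical mapping from individual ID to that individual's event dates for one phenocode (an empty list meaning no event), with the total cohort size passed separately. The IDs and dates are made up for illustration.

```python
def unadjusted_prevalence(events_by_person: dict[str, list[str]], cohort_size: int) -> float:
    """Share of the cohort with at least one event for a phenocode.

    No age adjustment: the result reflects the age structure of the
    study cohort, not that of the general population.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    # An individual counts once, however many events they have.
    n_cases = sum(1 for dates in events_by_person.values() if dates)
    return n_cases / cohort_size


# Hypothetical data: 4 individuals listed, 12 in the whole cohort.
events = {
    "FG001": ["2001-05-04"],
    "FG002": [],
    "FG003": ["1999-12-01", "2004-02-02"],
    "FG004": ["2010-07-07"],
}
print(unadjusted_prevalence(events, cohort_size=12))  # 0.25
```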

Recurrence within 6 months

Number of individuals having at least two events for the given phenocode less than 6 months apart, divided by the number of individuals having at least one event for the given phenocode.
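A sketch of this ratio, assuming the input dates are the events left after pre-processing and approximating "6 months" as 183 days (the exact cut-off is an assumption here). Only consecutive events need to be compared, since the closest pair of dates in a sorted list is always a consecutive pair. Function names and data are hypothetical.

```python
from datetime import date

# Assumption: "6 months" approximated as 183 days.
SIX_MONTHS_DAYS = 183


def has_recurrence(dates: list[date], max_gap: int = SIX_MONTHS_DAYS) -> bool:
    """True if at least two events are strictly less than `max_gap` days apart."""
    ordered = sorted(dates)
    # The closest pair in a sorted list is always a consecutive pair.
    return any((later - earlier).days < max_gap for earlier, later in zip(ordered, ordered[1:]))


def recurrence_within_6_months(events_by_person: dict[str, list[date]]) -> float:
    """Individuals with a recurrence, divided by individuals with at least one event."""
    cases = [dates for dates in events_by_person.values() if dates]
    if not cases:
        raise ValueError("no individual has an event for this phenocode")
    return sum(has_recurrence(dates) for dates in cases) / len(cases)


events = {
    "FG001": [date(2001, 1, 10), date(2001, 4, 2)],  # 82 days apart: recurrence
    "FG002": [date(2003, 6, 1)],                      # single event
    "FG003": [date(2005, 1, 1), date(2006, 1, 1)],    # 365 days apart
    "FG004": [],                                      # no event: not in denominator
}
print(round(recurrence_within_6_months(events), 3))  # 0.333
```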

Case fatality at 5 years

Number of individuals who died less than 5 years after their first event for the given phenocode, divided by the number of individuals having at least one event for the given phenocode.
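A sketch of the case-fatality ratio, assuming each individual is described by a list of event dates and an optional death date, and approximating 5 years as 5 * 365.25 days. Both the record layout and the day count are assumptions, not the pipeline's actual definitions.

```python
from datetime import date
from typing import Optional

# Assumption: 5 years approximated as 5 * 365.25 days.
FIVE_YEARS_DAYS = 5 * 365.25


def case_fatality_5y(records: dict[str, tuple[list[date], Optional[date]]]) -> float:
    """Cases who died < 5 years after their first event, divided by all cases."""
    n_cases = 0
    n_deaths = 0
    for event_dates, death_date in records.values():
        if not event_dates:
            continue  # no event for this phenocode: not a case
        n_cases += 1
        first_event = min(event_dates)
        if death_date is not None and (death_date - first_event).days < FIVE_YEARS_DAYS:
            n_deaths += 1
    if n_cases == 0:
        raise ValueError("no individual has an event for this phenocode")
    return n_deaths / n_cases


records = {
    "FG001": ([date(2000, 3, 1)], date(2002, 8, 15)),       # died within 5 years
    "FG002": ([date(2001, 1, 1), date(2009, 5, 5)], None),  # alive
    "FG003": ([date(1999, 6, 1)], date(2010, 1, 1)),        # died after 5 years
    "FG004": ([], date(2005, 1, 1)),                        # not a case
}
print(round(case_fatality_5y(records), 3))  # 0.333
```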

Phenocode associations

The methodology largely follows that of the NB-COMO study.

Data pre-processing

  • Start of study: 1998-01-01
  • End of study: 2017-12-31
  • Prevalent cases removed from the study.
  • Ignore time before the start of the study for individuals who have the prior phenocode before the study starts.
  • Split time into unexposed and exposed periods.
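The pre-processing steps above can be sketched for a single individual as below. This assumes that time is measured in days since the study start (the real pipeline may use another time scale, such as age), that follow-up ends at the study end, at the outcome, or at a given exit date, and that a prevalent case is an individual whose outcome occurs before follow-up begins. `Interval` and `split_follow_up` are hypothetical names. The output is a start/stop long format of the kind used for time-varying Cox models, for example by lifelines' `CoxTimeVaryingFitter`.

```python
from datetime import date
from typing import NamedTuple, Optional

STUDY_START = date(1998, 1, 1)
STUDY_END = date(2017, 12, 31)


class Interval(NamedTuple):
    start: int    # days since STUDY_START (assumed time scale)
    stop: int
    prior: int    # 1 if exposed to the prior phenocode during the interval
    outcome: int  # 1 if the outcome occurs at `stop`


def split_follow_up(
    prior_date: Optional[date],
    outcome_date: Optional[date],
    entry_date: date = STUDY_START,
    exit_date: date = STUDY_END,
) -> list[Interval]:
    """Split one individual's follow-up into unexposed and exposed intervals."""
    start = max(entry_date, STUDY_START)
    end = min(exit_date, STUDY_END)
    # Prevalent case: the outcome occurred before follow-up began.
    if outcome_date is not None and outcome_date < start:
        return []
    has_outcome = outcome_date is not None and outcome_date <= end
    if has_outcome:
        end = outcome_date  # follow-up stops at the outcome
    if end <= start:
        return []

    def days(d: date) -> int:
        return (d - STUDY_START).days

    outcome = int(has_outcome)
    if prior_date is None or prior_date >= end:
        # Never exposed during follow-up.
        return [Interval(days(start), days(end), 0, outcome)]
    if prior_date <= start:
        # Prior phenocode before follow-up: earlier time is ignored and
        # the individual counts as exposed from the start.
        return [Interval(days(start), days(end), 1, outcome)]
    # Exposure starts during follow-up: one unexposed, then one exposed period.
    return [
        Interval(days(start), days(prior_date), 0, 0),
        Interval(days(prior_date), days(end), 1, outcome),
    ]


for row in split_follow_up(prior_date=date(2005, 1, 1), outcome_date=date(2010, 1, 1)):
    print(row)
# Interval(start=0, stop=2557, prior=0, outcome=0)
# Interval(start=2557, stop=4383, prior=1, outcome=1)
```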

Cox regression

The model used is: y ~ prior + birth_year + sex
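Read as a Cox proportional hazards model, this formula corresponds to the hazard below. This assumes that y is the outcome phenocode, that prior is a 0/1 indicator that switches from 0 to 1 when the individual enters an exposed period, and that h_0(t) is an unspecified baseline hazard; the coefficient names are illustrative.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\big(\beta_{\mathrm{prior}}\,\mathrm{prior}(t) + \beta_{\mathrm{by}}\,\mathrm{birth\_year} + \beta_{\mathrm{sex}}\,\mathrm{sex}\big),
\qquad
\mathrm{HR}_{\mathrm{prior}} = \exp(\beta_{\mathrm{prior}})
```

Under these assumptions, exp(β_prior) is the hazard ratio comparing exposed with unexposed time, adjusted for birth year and sex.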

The regressions are fitted using the lifelines Python library.


Due to the sensitive nature of the data, the ages at entry into and exit from the study are given with an accuracy of 1 year.

Source code

The source code of both the data processing pipeline and the website is available on GitHub.