Taking Control of my Personal Health Data

9 min read

Over the past few years, I've invested time and effort into extricating important data and content from external services and bringing it into systems that I own and control. I've moved on from Facebook and Instagram, established tracking for my movie, TV, and podcast activity, set up automatic location tracking in multiple ways, and much more. But, for years now, one type of data has eluded me: my personal health data.

As of today, that has changed! I'd like to share with you what I've built.

Overview of Enhancements

My website now features my personal health metrics in several places. First, there is now a health section which shows both daily health metrics and historical metrics. You can go backward and forward in time and compare my daily metrics to historical min, max, and average values.

For the daily metrics, I use the familiar Apple Activity Rings format, and include supporting metrics across a variety of categories, including activity, heart health, and sleep analysis.

Daily Health Metrics Screenshot

For the historical metrics, I am particularly proud of the visualization. Each metric has a bar representing the minimum, maximum, and average values, and the gradient that is used to fill the bar adjusts to reflect the position of the average value.
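One plausible way to place the average within a min-to-max bar is to normalize it against the range. The snippet below is only a sketch of that idea, not the code behind the site: the function name `average_position` and the choice to express the result as a percentage (for example, for a gradient stop) are assumptions.

```python
def average_position(minimum: float, maximum: float, average: float) -> float:
    """Return where the average sits between min and max, as a percentage (0-100)."""
    if maximum == minimum:
        # A flat range has no meaningful position, so centre the gradient.
        return 50.0
    fraction = (average - minimum) / (maximum - minimum)
    # Clamp in case the inputs are slightly inconsistent.
    return max(0.0, min(1.0, fraction)) * 100.0


print(average_position(200.0, 800.0, 350.0))  # 25.0
```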

Historical Health Metrics Screenshot

In addition, I have augmented my monthly summaries.

Monthly Health Summary Screenshot

Each day is represented by an Activity Ring and can be clicked on to view detailed, in-context metrics for that day.

Overall, I am quite pleased with how this project has turned out. Navigating through health metrics is snappy, the visualizations are attractive and useful, and it fits in neatly with the rest of my site.

Now that we've walked through what these features look like in practice, let's discuss how I gather the data and make it useful.

Unlocking HealthKit

I've owned an Apple Watch since the Series 2 watch was released, and have worn it fairly consistently ever since. As a result, I've got quite a lot of data amassed on my iPhone in Apple Health. That data is accessible through the Health app, and also via the HealthKit APIs. While I am a pretty strong developer, my skillset doesn't include much in the way of iOS development. I've made a few attempts at building an iOS app that will allow me to extract my HealthKit data automatically, but never made it far before I ran out of steam.

A few weeks ago, I discovered an app called Health Auto Export (which I will refer to as HAE for the rest of this post), which neatly solves the problem. HAE has many great features, but the key feature is "API Export," which allows you to automatically have your HealthKit data sent to an HTTP endpoint in JSON or CSV format, with control over time period and aggregation granularity. With this app in hand, I set about creating an API to store, index, and make that data searchable.

Introducing Health Lake

HAE uses a simple, but nested, JSON data structure to represent health metrics. Because the data is structured, in plain text, and will mostly sit at rest, a data lake is a natural target to store the data. Data lakes on Amazon Web Services (AWS) are generally implemented with Amazon S3 for storage, as it is well-suited to the use case and is deeply integrated with AWS' data, analytics, and machine learning (DAML) services.

In order to keep most of the complexity out of my website, I decided to build a microservice which is entirely focused on getting data into the data lake and making it useful. I call this service Health Lake, and the source is available on GitHub.

Sync and Store

Let's take a look at the first endpoint of Health Lake, which accepts data from HAE, transforms it to align with the requirements for AWS's DAML services, and stores it in S3 - HTTP POST /sync.

HAE structures its data in a nested format:

    {
        "data": {
            "metrics": [
                {
                    "units": "kcal",
                    "name": "active_energy",
                    "data": [
                        { "date": "2021-01-20 00:00:00 -0800", "qty": 370.75 }
                    ]
                }
            ]
        }
    }

As you can see, the data is nested fairly deeply. In order to simplify my ability to query the data, Health Lake transforms the data to a flatter structure, with each data point being formatted in JSON on a single line. On each sync, I create a single object that contains many data points, one per line, in a format like this:

{"name": "active_energy", "date": "2021-01-20 00:00:00 -0800", "units": "kcal", "qty": 370.75 }
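
The flattening step described above can be sketched in a few lines of Python. This is an illustration rather than the Health Lake source: the function names `flatten_export` and `to_json_lines` are hypothetical, and the sketch assumes every data point should simply carry its metric's `name` and `units` alongside its own fields.

```python
import json
from typing import Any, Dict, Iterator


def flatten_export(payload: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per data point in a nested HAE-style export."""
    for metric in payload.get("data", {}).get("metrics", []):
        for point in metric.get("data", []):
            # Copy every field of the data point, then attach the metric's
            # name and units so each record is self-describing.
            record = dict(point)
            record["name"] = metric.get("name")
            record["units"] = metric.get("units")
            yield record


def to_json_lines(payload: Dict[str, Any]) -> str:
    """Serialize the flattened records as newline-delimited JSON."""
    return "\n".join(json.dumps(record) for record in flatten_export(payload))


sample = {
    "data": {
        "metrics": [
            {
                "units": "kcal",
                "name": "active_energy",
                "data": [{"date": "2021-01-20 00:00:00 -0800", "qty": 370.75}],
            }
        ]
    }
}
print(to_json_lines(sample))
```

One JSON document per line is a layout that Glue crawlers and Athena can read directly, which is why the nesting is removed before the data is stored.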

Each sync object is stored in my target S3 bucket with the key format:

syncs/<ISO-format date and time of sync>.json

The prefix on the object name is critical, as it enables the indexing and querying of sync data independent from other data in the bucket.
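As a rough sketch of how a sync object might be named and written, the snippet below builds a key in the `syncs/<ISO-format date and time>.json` shape and hands it to an S3 client. The function names, the bucket name, and the use of a naive timestamp are assumptions for illustration; `s3_client` is expected to behave like `boto3.client("s3")`, whose `put_object` call accepts `Bucket`, `Key`, and `Body`.

```python
from datetime import datetime


def sync_key(when: datetime) -> str:
    """Build the object key syncs/<ISO-format date and time>.json."""
    return f"syncs/{when.isoformat()}.json"


def store_sync(s3_client, bucket: str, body: str, when: datetime) -> str:
    """Write one sync object and return the key it was stored under."""
    key = sync_key(when)
    # s3_client is assumed to expose put_object like a boto3 S3 client.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key


print(sync_key(datetime(2021, 1, 20, 8, 30, 0)))
# syncs/2021-01-20T08:30:00.json
```

Because every sync lands under the same `syncs/` prefix, a crawler can be pointed at that prefix alone and will ignore anything else stored in the bucket.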

Querying the Data Lake

Now that we have data being sent to our data lake and stored in an efficient, standardized format, we can focus on making that data searchable. Very often, I use relational databases like MySQL or PostgreSQL to store data and make it searchable with SQL. AWS provides a few great services which allow you to treat your data lake as a series of database tables that can be queried using SQL.

The first service we'll leverage is AWS Glue, which provides powerful data integration capabilities:

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all of the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.

Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes.

Using AWS Glue, I created a database called "health," and then created a "crawler," which connects to my data store in S3, walks through all of the data, and attempts to infer the schema based upon hints and classifiers. The crawler can be run manually on-demand, or can be scheduled to run on a regular basis to continuously update the schema as new fields are discovered. Here is what the configuration of my crawler looks like in the AWS Glue console:

AWS Glue Crawler Configuration Screenshot

Upon the first run of the crawler, a new table was created in my health database called syncs, which inferred the following schema:

AWS Glue Table Schema Screenshot

I wasn't able to get the crawler to match the date format properly, so I ended up creating a "view" that adds a proper timestamp column, using the following expression in its SQL statement:

    date_parse(substr(date, 1, 19), '%Y-%m-%d %H:%i:%s') as datetime,
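
To make the effect of that expression concrete, here is an equivalent in Python. It is only an illustration of what the SQL does (the helper name `parse_hae_date` is hypothetical): the first 19 characters are kept and parsed, and the trailing UTC offset is dropped.

```python
from datetime import datetime


def parse_hae_date(raw: str) -> datetime:
    """Mirror date_parse(substr(date, 1, 19), '%Y-%m-%d %H:%i:%s')."""
    # substr(date, 1, 19) is 1-indexed in SQL; in Python that is raw[:19].
    # The trailing offset (for example " -0800") is discarded, so the
    # result is a naive timestamp in the exporting device's local time.
    # The SQL format uses %i for minutes; Python's strptime uses %M.
    return datetime.strptime(raw[:19], "%Y-%m-%d %H:%M:%S")


print(parse_hae_date("2021-01-20 00:00:00 -0800"))  # 2021-01-20 00:00:00
```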

Now that our data lake has been crawled, and a database, table, and view have been defined in our AWS Glue Data Catalog, we can use Amazon Athena to query our data using standard SQL. Athena is entirely serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
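For readers who have not used Athena from code, the sketch below shows the general shape of running a query through an Athena client. It is not taken from Health Lake: the function name `run_athena_query`, the polling interval, and the output location are assumptions, and `athena` is expected to behave like `boto3.client("athena")`.

```python
import time
from typing import List, Optional


def run_athena_query(athena, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> List[List[Optional[str]]]:
    """Run a query and return its first page of rows (header row included)."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena is asynchronous: poll until the query leaves QUEUED/RUNNING.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page is read here; large result sets would need
    # NextToken pagination.
    page = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [column.get("VarCharValue") for column in row["Data"]]
        for row in page["ResultSet"]["Rows"]
    ]
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM history LIMIT 10", "health", "s3://example-bucket/athena-results/")`, where the bucket path is a placeholder.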

Daily Metrics

For our daily metric view, we need a summary of all metrics gathered on a specific day. To accomplish this, I added an endpoint to our microservice:


In response to this request, the client will receive a JSON data structure collecting all data points for that day. Under the hood, the microservice is running the following SQL query:

SELECT * FROM history
    WHERE datetime >= TIMESTAMP 'YYYY-MM-DD 00:00:00'
    AND datetime <= TIMESTAMP 'YYYY-MM-DD 23:59:59'
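
A minimal sketch of how the date in the request could be turned into that query is shown below. The helper name `daily_query` is hypothetical, and validating the input by parsing it as a date first is a design choice of this sketch, not something confirmed by the post.

```python
from datetime import date


def daily_query(day: str) -> str:
    """Return the SQL for one day's data points, given a 'YYYY-MM-DD' string."""
    # date.fromisoformat raises ValueError for anything that is not a real
    # calendar date, which also keeps arbitrary text out of the SQL string.
    parsed = date.fromisoformat(day)
    return (
        "SELECT * FROM history "
        f"WHERE datetime >= TIMESTAMP '{parsed} 00:00:00' "
        f"AND datetime <= TIMESTAMP '{parsed} 23:59:59'"
    )


print(daily_query("2021-01-20"))
```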

Because I pay for every query that I run on Athena, and to achieve great performance, I store the query results in the proper format for the client in S3 after I run the query. I then implemented some intelligence to decide if, for any given request, I should pull from the cache, or regenerate fresh data. Take a look at the source code for more detail.
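The post does not spell out the caching rules, so the following is purely a hypothetical policy to show the kind of decision involved: a result cached after the day ended is treated as final, while a result cached during the day is reused only while it is recent. The function name, the one-hour default, and the rule itself are all assumptions; the real logic lives in the Health Lake source.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_use_cache(day_end: datetime, cached_at: Optional[datetime],
                     now: datetime,
                     max_age: timedelta = timedelta(hours=1)) -> bool:
    """Decide whether a cached daily result can be served as-is.

    All datetimes are assumed to be in the same timezone.
    """
    if cached_at is None:
        return False  # nothing cached yet, so the query must run
    if cached_at > day_end:
        # Cached after the day was over: assume the data is complete.
        return True
    # The day was still in progress when cached; reuse only if recent.
    return now - cached_at <= max_age


day_end = datetime(2021, 1, 20, 23, 59, 59)
print(should_use_cache(day_end, datetime(2021, 1, 21, 6, 0), datetime(2021, 2, 1, 12, 0)))  # True
print(should_use_cache(day_end, datetime(2021, 1, 20, 9, 0), datetime(2021, 1, 20, 12, 0)))  # False
```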

Monthly Metrics

To show our monthly summaries, we need to get data for each day of the month. Rather than sending a request and query for every single day of the month, I decided to implement another endpoint to our microservice:

HTTP GET /summary/<YYYY-MM>

In response to this request, the client will receive a JSON data structure collecting all data points for the month, sorted by date. To accomplish this, I run the following SQL query:

SELECT * FROM history
    WHERE datetime >= TIMESTAMP 'YYYY-MM-01 00:00:00'
    AND datetime <= TIMESTAMP 'YYYY-MM-31 00:00:00'

The start and end range are actually calculated to ensure I have the proper end date, as not every month has the same number of days. Again, to save costs and improve performance, results are intelligently cached in our S3 bucket.
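Calculating the range for a month can be sketched with the standard library's `calendar.monthrange`, which reports how many days a given month has. The helper name `month_range` is hypothetical, and ending the range at 23:59:59 on the last day is a choice made for this sketch (the query shown above ends at 00:00:00 on the last day).

```python
import calendar
from datetime import datetime
from typing import Tuple


def month_range(month: str) -> Tuple[datetime, datetime]:
    """Return the first and last second of a 'YYYY-MM' month."""
    year, mon = (int(part) for part in month.split("-"))
    # monthrange returns (weekday of the first day, number of days in month).
    _, last_day = calendar.monthrange(year, mon)
    start = datetime(year, mon, 1, 0, 0, 0)
    end = datetime(year, mon, last_day, 23, 59, 59)
    return start, end


start, end = month_range("2021-02")
print(start, end)  # 2021-02-01 00:00:00 2021-02-28 23:59:59
```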

Global Metrics

Generating a global summary of all data points in the data lake was a bit more challenging. To make things more efficient, I created another view in my database with this query. Results are, again, intelligently cached.
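The view's query is not reproduced in this post, so the sketch below only illustrates the kind of aggregation a global summary implies, assuming it means the minimum, maximum, and average `qty` per metric (the values shown in the historical view). It works on flattened records in Python rather than in SQL, and the function name `summarize` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize(records: Iterable[dict]) -> Dict[str, dict]:
    """Compute min, max, and average qty per metric name."""
    values = defaultdict(list)
    for record in records:
        values[record["name"]].append(record["qty"])
    return {
        name: {"min": min(qtys), "max": max(qtys), "avg": sum(qtys) / len(qtys)}
        for name, qtys in values.items()
    }


records = [
    {"name": "active_energy", "qty": 300.0},
    {"name": "active_energy", "qty": 500.0},
    {"name": "step_count", "qty": 8000.0},
]
print(summarize(records)["active_energy"])
# {'min': 300.0, 'max': 500.0, 'avg': 400.0}
```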

Website Integration

With all of this great data available to me, it was time to integrate it with my website, which uses the Known CMS. I have created a Known plugin that provides enhancements that are specific to my website. Using this plugin, I simply send requests to the Health Lake microservice, parse the JSON, and create my visualizations.


Overall, I am quite pleased that I have been able to integrate this data into my website, and more importantly, to free the data from its walled garden and place it under my control and ownership.


Normal is a Privilege

4 min read

I just wish things would get back to normal.

It’s a refrain we’ve all heard since the emergence of COVID-19. The world has been thrown into chaos, our way of life has been threatened, and many people have lost their jobs. The best minds in medicine and science are encouraging life-altering precautions like social distancing, wearing masks, and staying home. People are having to adapt to this “new normal” quickly, working from home, perhaps balancing the pressures of parenting children, or taking care of family members who are at higher risk. While it’s obviously the right thing to do to be cautious, and not re-open the country too quickly, protestors have taken to the streets to demand that things go back to “normal,” even if it puts others at risk.

Life in a global pandemic is not comfortable, convenient, or fun. We can all agree on that. The “new normal” sort of sucks.

But, what of the old normal? Well, consider George Floyd.

Last week, our chaotic world erupted into further chaos with the brutal murder of George Floyd by a police officer in Minneapolis. Make no mistake, I absolutely believe that this was a murder, and one that was the direct result of a fundamentally flawed system that demands reform. There must be justice for George Floyd, and it cannot just come in the form of punishment for the killers; it also must come in the form of radical, systematic change.

For many people, going back to “normal” is, on the surface, quite appealing. Returning to our privileged lives, where we feel safe to go out to eat, walk with our friends and family at shopping malls, gather in our places of worship, and to do it all with a strong sense of security – after all, the police, and all other systems of power, are there to protect us.

But, for a huge portion of our country, “normal” means avoiding the police because they cannot be trusted to protect you. It means less opportunity at work. It means overcoming an unequal system to fight for the same benefits that others readily receive.

You know what? The old normal sucks, too.

It is no surprise to me that many people who are calling to “re-open our country” in the midst of a global pandemic are also telling protestors to calm down, or to “tone down” their methods of protest. These demands often come from a position of privilege; of preserving a system that fundamentally benefits them at the expense of others. They like things just the way they were.

But, this time, we can’t let the cries for a “return to normalcy” win. As an ally, I cannot sit idly by, or demand that the oppressed respond to their generations-long oppression with calm, non-violent protest. It’s been nearly 30 years since the police brutalization of Rodney King, and it’s clear that nothing has fundamentally changed in that time. Peaceful protest isn’t enough. Voting isn’t enough. Patient conversation isn’t enough. Incremental change isn’t enough. Now, I am not advocating for violence, but I am advocating for persistent, enduring commitment to driving change.

We must listen to what Martin Luther King Jr. said in his Letter from a Birmingham Jail in August of 1963:

I have almost reached the regrettable conclusion that the Negro’s great stumbling block in the stride toward freedom is not the White Citizens Councillor or the Ku Klux Klanner but the white moderate who is more devoted to order than to justice; who prefers a negative peace which is the absence of tension to a positive peace which is the presence of justice; who constantly says, “I agree with you in the goal you seek, but I can’t agree with your methods of direct action”; who paternalistically feels that he can set the timetable for another man’s freedom; who lives by the myth of time; and who constantly advises the Negro to wait until a “more convenient season.”

So, fellow white people, it is time to step up. Acknowledge your privilege, speak up, and demand radical, fundamental change. Amplify the voices of people of color. Fearlessly support Black Lives Matter. Embrace the fact that creating a more just, equitable, and fair society likely means that you will need to sacrifice your own privilege for the benefit of others.

Let’s not go back to normal. Let’s create a better normal.