
Collecting data is easy.

Analyzing data is fun.

Preparing collected data for analysis is difficult.

Recently I started moving my websites off Google Analytics. Instead of using Google, I've set up my own web traffic analytics. Collecting traffic data and links was easy; it took about an hour to set up.

Analyzing the data is easy as well. There are a couple of amazing projects that handle large amounts of data. I'm using Kibana from elastic.co, which has a fun interface for creating all sorts of great-looking graphs.

The difficult part is transforming the log files from my web servers and loading them into Elasticsearch, the backend for Kibana.

The data is stored in individually compressed files, created once an hour on each of my web servers. The log files are not consistent: every entry follows roughly the same format, but fields might be appended or omitted depending on the information available at the time of the log event.
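To make that inconsistency concrete, here is a minimal Python sketch of a tolerant line parser. It assumes the servers write something close to the Apache/nginx "combined" log format; the pattern, the field names, and the `parse_line` helper are hypothetical, not my actual setup. The fields every entry shares are always extracted, while the referrer, user agent, and any trailing extra data are only added to the result when they are present.

```python
import re
from typing import Optional

# Hypothetical pattern, assuming a "combined"-style log format.
# The referrer/agent pair is optional, and anything after it is kept as "extra".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
    r'(?P<extra>.*)$'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one log entry, returning None if it doesn't match the base format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Fields that every entry is assumed to have.
    entry = {
        "ip": fields["ip"],
        "user": None if fields["user"] == "-" else fields["user"],
        "time": fields["time"],
        "request": fields["request"],
        "status": int(fields["status"]),
        "bytes": None if fields["size"] == "-" else int(fields["size"]),
    }
    # Optional fields are only added when this particular entry contains them.
    if fields["referrer"] is not None:
        entry["referrer"] = fields["referrer"]
    if fields["agent"] is not None:
        entry["agent"] = fields["agent"]
    extra = fields["extra"].strip()
    if extra:
        entry["extra"] = extra
    return entry
```

Leaving absent fields out of the dictionary, rather than filling them with placeholders, lets Elasticsearch simply treat them as missing for that document.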

Getting these files into shape takes a fairly complex piece of software: it has to uncompress new log files, look for duplicate entries, combine the files from each server into one file, and then load that file into Elasticsearch. This is typical of data work. Most of a data scientist's time isn't spent on analysis; it's spent transforming data from one format to another.
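A rough Python sketch of those steps follows. It assumes gzip-compressed hourly files and treats two byte-for-byte identical lines as duplicates; the function names, the `weblogs` index name, and the `parse` callable are hypothetical placeholders. The result is a newline-delimited JSON body in the shape Elasticsearch's `_bulk` API expects (an action line followed by a document line for each entry), which could then be POSTed to the `_bulk` endpoint with `Content-Type: application/x-ndjson`.

```python
import gzip
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def read_log_lines(path: str) -> Iterator[str]:
    """Uncompress one hourly log file and yield its non-empty lines."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


def combine_unique(sources: Iterable[Iterable[str]]) -> List[str]:
    """Combine lines from every file, keeping only the first copy of exact duplicates."""
    seen = set()
    combined = []
    for lines in sources:
        for line in lines:
            if line not in seen:
                seen.add(line)
                combined.append(line)
    return combined


def to_bulk_body(docs: Iterable[dict], index: str) -> str:
    """Format documents as NDJSON: an index action line, then the document itself."""
    parts = []
    for doc in docs:
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(part + "\n" for part in parts)


def build_bulk_body(
    paths_by_server: Dict[str, List[str]],
    index: str,
    parse: Callable[[str], Optional[dict]],
) -> str:
    """Uncompress, deduplicate, combine, parse, and format every file for loading."""
    sources = (
        read_log_lines(path)
        for paths in paths_by_server.values()
        for path in paths
    )
    docs = (doc for doc in map(parse, combine_unique(sources)) if doc is not None)
    return to_bulk_body(docs, index)
```

Deduplicating on the raw line keeps the logic simple, but it would also drop two genuinely separate requests that happen to produce identical lines; a stricter key (for example, the line plus the server name) trades that risk for missing duplicates copied between servers.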

Chris Larson

I fell in love with running after I committed to running a 5K every day during the month of September. Now I'm training for my first marathon on October 1st, 2017.
