To Build or Not To Build

Getting to the meat of the problem with Stakeholder Abbie

Abbie came to our data science team with two major problems. She needed some heavy-duty data cleansing, and she wanted help focusing the efforts of her senior engineering team. The data in need of cleaning had been collected six to seven years earlier in an unorganized fashion, leaving a large block of text that, in its current form, was not useful for many tasks in her organization. Our first job was to parse and reorganize the bridge-site records that contained this data so they could be used in future predictive analysis or visualizations. Our second task was to use our “Data Science mojo,” aka machine learning algorithms, to help her team reduce the number of times an engineer spends an entire week assessing a possible new location, only to discover that it will not support a bridge.

In order to tackle a problem, you need to understand all the players

From the 10,000 ft view down to staring it right in the eyes: that’s how I like to break down the tasks we’re given. As we break things down, we keep referring back to the overall picture and point of view. We had our assignments from Abbie, but before we could divide and conquer, we needed to understand our objectives better. “Measure twice, cut once,” my dad would always say. The web team’s success depended directly on the data cleaning, which meant we needed a cleaned dataset, ripe for querying, loaded into a database and served through our API with all the proper routes set up, ASAP. Because of the size of the task, we decided to work in two teams. Team 1: parsing the 2013/2014 data. Team 2: general data cleaning on the remaining features.

With Great Power Comes Great Responsibility

My next task was to back up the machine learning work. Our second team moved through its portion of the data cleansing faster than Team 1, so we decided that Team 2 would start building a model to help predict the outcome of a bridge site assessment.

The Numbers

Before we even started cleaning the data, there was a lot of discussion about how to organize the features and target for the machine learning task. Through many conversations with Abbie, and a little help from our data science manager, we were able to collapse multiple categories into a binary target: after review by the senior engineers, did the site turn out to be a “Good Site” or a “Not Good Site,” aka 1 or 0?
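Collapsing several review outcomes into a 1/0 target can be sketched as a simple label mapping. This is a minimal sketch that assumes the senior-review outcomes arrive as text labels; the label names in `TARGET_MAP` and the helper `to_binary_target` are hypothetical, since the post doesn't list the actual categories.

```python
# Hypothetical labels: the post does not list the real assessment categories.
TARGET_MAP = {
    "Approved": 1,                  # "Good Site"
    "Approved with conditions": 1,  # "Good Site"
    "Rejected": 0,                  # "Not Good Site"
    "Not feasible": 0,              # "Not Good Site"
}


def to_binary_target(outcomes):
    """Collapse senior-review outcome labels into 1 ("Good Site") or 0 ("Not Good Site")."""
    unknown = sorted(set(outcomes) - set(TARGET_MAP))
    if unknown:
        # Fail loudly rather than silently assigning a class to an unmapped label.
        raise ValueError(f"Unmapped outcome labels: {unknown}")
    return [TARGET_MAP[outcome] for outcome in outcomes]


print(to_binary_target(["Approved", "Rejected", "Approved with conditions"]))  # [1, 0, 1]
```

Raising on unmapped labels is a deliberate choice: with messy historical data, a new or misspelled category is better caught early than quietly counted as a “Not Good Site.”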

Synthetic Minority Over-sampling Technique

Using SMOTE was the best chance we had at increasing our model’s efficacy while reducing the risk of overfitting that comes with simply duplicating minority-class examples. SMOTE works by looking at where the minority-class samples sit in feature space. For each minority sample, it picks one of that sample’s nearest minority-class neighbors, draws a line between the two, and adds a new synthetic data point somewhere along that line. In a simple world where everything is 2D, it’s relatively easy to imagine. When data points have many dimensions, it can be harder to visualize, but the same interpolation applies.
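The interpolation step described above can be sketched in a few lines of NumPy. This is a minimal, illustrative sketch, not the implementation the team used: the function name `smote`, its parameters, and the choice of Euclidean distance for finding neighbors are assumptions made for the example.

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Hypothetical minimal SMOTE: interpolate between minority samples and their nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    k = min(k, n - 1)  # cannot have more neighbors than other minority points

    # Pairwise Euclidean distances between all minority samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest minority neighbors

    synthetic = np.empty((n_synthetic, X.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                    # pick a random minority sample
        neighbor = neighbors[base, rng.integers(k)]  # pick one of its k nearest neighbors
        gap = rng.random()                        # random position along the line, in [0, 1)
        synthetic[i] = X[base] + gap * (X[neighbor] - X[base])
    return synthetic


# Four minority points at the corners of a unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote(X_min, n_synthetic=6, k=2)
print(new_points.shape)  # (6, 2)
```

Because each synthetic point is a weighted average of two real minority samples, it always lands on the segment between them, which is exactly the “draw a line, add a point along it” idea. In practice, a maintained implementation such as imbalanced-learn’s `SMOTE` class (via its `fit_resample` method) is the usual choice; the post doesn’t say which one the team used. Either way, oversampling should be applied only to the training split so synthetic points never leak into evaluation.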

Reflections

Going into the experience, I was a little nervous; call it stage fright. It’s very exciting to work on a real problem. I have always been obsessed with impact: can my work make a difference in whatever I’m doing? I was extremely excited to make a positive mark on this already incredible organization and mission.

Thank you

I want to give a huge thank you to Abbie at Bridges to Prosperity for entrusting my team and me with her projects, listening to our results, and answering our questions. I am very thankful for the experience and for the hard work of all my teammates, both Web and Data Science. It has been a fun and eventful four weeks!
