Rethinking the Data Science Portfolio

Why an easily executed and user-friendly project will outclass a high-complexity showpiece

Photo by Helena Lopes on Unsplash

There’s a longstanding myth across tech fields: believing that the more complex a project is, the more competence it conveys.

I spent countless hours during my early career adding layers upon layers of complexity to my projects, hoping to impress potential employers with my technical skills and ability to juggle intricate algorithms. Over time, I realized that this approach was not only counterproductive but also a misrepresentation of how I’d deliver work on the job.

As both a hiring manager and a firm believer in the value of a good portfolio, I’m making the case for emphasizing ease of running and user-friendliness over complexity, to showcase a more accurate understanding of real-world data science tasks.

Don’t Make The Goal Complicated

When I was around a year into my career in data science, I had to build a model to calculate baseline web traffic for our attribution model. I immediately saw my chance to show off. I went through iterations of ARIMA, Prophet, etc, looking to showcase my machine learning skills. I settled on a random forest implementation.

When it came time to go over the methodology with non-technical high-ups, I had to tell them the inner workings were a bit of a black box. I was met with blank faces. My manager pulled me aside afterward and suggested I try something that can be fully explained in one PowerPoint slide (post-it note that advice on your monitor). Almost joking with myself, I tried a rolling median. It was right in the middle of the reasonably tight error score pack.

We went back to the high-ups with “tuning updates” and all I had to say was rolling average and nobody so much as questioned it. Five years and multiple attribution model iterations later, that baseline calculation remains in production without any need for retraining or model drift monitoring. It’s handled hundreds of millions of dollars of ad spend and is one of the few elements of our work that hasn’t faced quibbles from a probing client.

This might seem like an extreme example in the context of portfolios, but it highlights a critical point. Complex projects can easily become unwieldy, difficult to understand, and as such, harder to execute. Such projects provide diminishing value to both you and those reviewing your work.

An elaborate portfolio project may demonstrate technical prowess, but it doesn’t necessarily reflect the capacity to create pragmatic solutions.

Competence in data science isn’t solely about your ability to handle complexity. It’s about your understanding of the field’s principles, your capacity to create practical, user-friendly solutions that can be easily implemented and run, and most importantly, your ability to tell a compelling story with data.

Keep Your Work Accessible

It’s easy to lose sight of these fundamentals in the pursuit of complexity. An elaborate portfolio project might show technical prowess, but it doesn’t reflect your capacity to create practical solutions that align with business objectives and constraints. Such pursuit of complexity might not just misrepresent the demands of real-world data science, but also overlook the very essence of the discipline: leveraging data to create impactful, easily understandable, and actionable insights.

When your projects are easily understood and simple to run, they become more accessible and engaging.

The last time my team was hiring, I was legitimately stoked to see so many portfolio links on the resumes. I was quickly disappointed to see more than half of them were a couple of repos containing a single Jupyter notebook, no instructions, and in-notebook markdown that didn’t go beyond one-liner descriptions of code blocks. I gave almost everyone who had instructions and requirements files an interview — the bar really can be that low.

Accessible and engaging projects will make your portfolio stand out to potential employers. Yes, GitHub displays your notebook contents, but if a hiring manager looking for your work can clone the repo, follow your README file, and run all your code without changing anything, you’re going to find yourself near the top of the candidate list.

Show that You Code to Collaborate

All of the above also applies to the quality of code vs quantity and complexity. Being able to produce code that is both maintainable and understandable by others demonstrates your ability to work within a team, and explicitly displays a professional level of software engineering skills.

It shows your awareness that your code is not an island, but part of a larger ecosystem where others may need to interact with it. Focusing on user-friendliness will show a level of maturity and understanding of real-world data science that goes beyond pure technical expertise.

Reflect on Your Learned Habits

It’s also important to step back from the code and reflect on how you’re approaching your work. I struggled immensely in school, always knowing the class content yet never being able to test well. My school life became not about how to get A-grades, but how to simply pass tests where the stakes were my entire future.

As you might imagine, that didn’t set me up well to jump into situations where giving the CEO as little information as possible would win me massive bonus points.

Most of us spent the better part of 15–20 years working to succeed in education systems that gauge our success by our ability to regurgitate as much knowledge as possible.

Particularly when looking for an entry-level position, it takes courage to shift away from the perceived norm, pivoting to processes and workflows that go against your instincts. After all, I’m not alone in my school experience; most of us spent the better part of 15–20 years working to succeed in education systems that gauge our success by our ability to regurgitate knowledge as possible.

Focus on the End Goal

Remember, your portfolio is your introduction to potential employers, and they’re looking for so much more than just technical prowess. They want to see your problem-solving abilities, your communication skills, and your understanding of real-world problems.

When hiring, I place the most value on how seamless it’s going to be to work with someone. A team is only as good as their ability to effectively solve problems together.

Rethinking the Data Science Portfolio was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

Logo

Oh hi there 👋
It’s nice to meet you.

Sign up to receive awesome content in your inbox, every month.

We don’t spam!

Leave a Comment

Scroll to Top