

Besides being hard to understand, the shorter variable names can be mistyped. These values represent true_positives, true_negatives, false_positives and false_negatives, so make it explicit. tp, tn, fp, fnĪvoid machine learning-specific abbreviations.
#A MESS O TROUBLE CODE#
Then, in code review, make sure to enforce these written standards.

Agree with the rest of your team on common abbreviations and write them down. If you’re using abbreviations like these, make sure you establish them ahead of time. House_price_in_aud = house_price_in_usd * usd_to_aud_conversion_rate usd, aud, mph, kwh, sqft House_price_in_usd = get_house_price_in_usd(house_sqft, Temp = get_house_price_in_usd(house_sqft, house_room_count)įinal_value = temp * usd_to_aud_conversion_rate # Do this instead Perhaps it is a value where you need to convert the units, so in that case, make it explicit: # Don't do this TempĮven if you are only using a variable as a temporary value store, still give it a meaningful name. A name such as value tells you nothing about the purpose of the variable and just creates confusion. What does the value represent? It could stand for velocity_mph, c ustomers_served, efficiency or revenue_total. Instead, use names that describe what these variables represent such as house_features and h ouse_prices. If you’ve seen these several hundred times, you know they commonly refer to features and targets in a data science context, but that may not be obvious to other developers reading your code. What does this look like in practice? Let’s go through some improvements to variable names. Prioritize how easy your code is to read over than how quick it is to write.Īdopt standard conventions for naming so you can make one global decision in a codebase instead of multiple local decisions. Your code will be read more times than it is written. A variable name should tell you concisely in words what the variable stands for. The variable name must describe the information represented by the variable. There are three basic ideas to keep in mind when naming variables: More From Will Koerhsen The Poisson Process and Poisson Distribution, Explained Named constants are in ALL_CAPITAL_LETTERS Variable/function names are lower_case and separated_with_underscores Some Python-specific naming rules (see here for more details) include: Note: I’m focusing on Python since it’s by far the most widely used language in industry data science.
#A MESS O TROUBLE SOFTWARE#
Fortunately, there are best practices from software engineering we data scientists can adopt, including the ones we’ll cover in this article. Yes, you can get away with them in a Jupyter Notebook that runs once, but when you have mission-critical machine learning pipelines running hundreds of times per day with no errors, you have to write readable and understandable code. Unhelpful, confusing or vague variable namesīoth these problems contribute to the disconnect between data science research (or Kaggle projects) and production machine learning systems. There are significant differences between deployable machine learning code and how data scientists learn to program, but we’ll start here by focusing on two common and easily fixable problems: Use consistent standards throughout a project to minimize the cognitive burden of small decisions.Īs I’ve grown from writing research-oriented data science code for one-off analyses to production-level code (at Cortex Building Intelligence), I’ve had to improve my programming by unlearning practices from data science books, courses and the lab.When writing your code, prioritize ease of reading over speed of writing.A variable name should describe the entity the variable represents.
