Datastrophes: Il Buono, il Brutto, il Cattivo

Data projects: stacking up assumptions

CRISP-DM
  • Business understanding: students generally don’t subscribe easily to health insurance.
  • Data understanding: the data received from the surveying company represents students adequately.
  • Data preparation: the age of students is always greater than 18.
  • Data modeling: the distribution of the students’ income is bimodal.
  • Evaluation: testing the conversion rate is enough.
  • Deployment: everything is gonna be fine (sorry for the troll).
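One way to keep such stacked assumptions visible is to write them down as executable checks. The following is a minimal sketch, not a definitive implementation: the record fields (`age`, `income`), the `Assumption` class, and the income rule are hypothetical and chosen only for illustration. Only the "age greater than 18" rule comes from the list above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Assumption:
    """A named assumption and a predicate that tells whether a record respects it."""
    name: str
    holds: Callable[[Dict], bool]


# Hypothetical assumptions: the first mirrors the data preparation step above,
# the second is an invented example of a rule nobody bothered to write down.
ASSUMPTIONS = [
    Assumption("age is greater than 18", lambda r: r["age"] > 18),
    Assumption(
        "income is present and non-negative",
        lambda r: r.get("income") is not None and r["income"] >= 0,
    ),
]


def violations(records: List[Dict], assumptions=ASSUMPTIONS) -> List[Tuple[int, str]]:
    """Return (record index, assumption name) for every broken assumption."""
    found = []
    for index, record in enumerate(records):
        for assumption in assumptions:
            if not assumption.holds(record):
                found.append((index, assumption.name))
    return found


# Made-up sample records, for illustration only.
students = [
    {"age": 21, "income": 800.0},
    {"age": 17, "income": 0.0},
    {"age": 25, "income": None},
]

for index, name in violations(students):
    print(f"record {index} breaks the assumption: {name}")
```

Listing the violations, rather than failing on the first one, keeps the report readable when several assumptions break at once.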

The case of SQL

  • Logging the schema: to find the cause rapidly, or using a monitoring tool to alert you.
  • Testing the schema stability: to stop the application with a warning message, resulting in a comprehensible ticket.
  • Reviewing the schema of the table periodically (or pinging your DBA): to clear up any doubts.
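The first two options can be sketched with Python's standard `sqlite3` and `logging` modules. This is an illustrative sketch under stated assumptions: the `students` table, its columns, `EXPECTED_SCHEMA`, and `SchemaDriftError` are all hypothetical names, and a real database would expose its schema through its own catalog rather than SQLite's `PRAGMA table_info`.

```python
import logging
import sqlite3

logger = logging.getLogger("schema_check")

# Hypothetical expected schema: column name -> declared type.
EXPECTED_SCHEMA = {"id": "INTEGER", "age": "INTEGER", "income": "REAL"}


class SchemaDriftError(RuntimeError):
    """Raised when a table no longer matches the schema the application assumes."""


def read_schema(conn: sqlite3.Connection, table: str) -> dict:
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass only trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2].upper() for row in rows}


def assert_schema(conn: sqlite3.Connection, table: str, expected: dict) -> None:
    actual = read_schema(conn, table)
    # Logging the schema: leaves a trace that helps find the cause rapidly.
    logger.info("schema of %s: %s", table, actual)
    if actual != expected:
        missing = sorted(set(expected) - set(actual))
        unexpected = sorted(set(actual) - set(expected))
        retyped = sorted(
            c for c in set(expected) & set(actual) if expected[c] != actual[c]
        )
        # Testing the schema stability: stop with a message that fits in a ticket.
        raise SchemaDriftError(
            f"Schema of '{table}' changed: missing={missing}, "
            f"unexpected={unexpected}, retyped={retyped}"
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER, age INTEGER, income REAL)")
    assert_schema(conn, "students", EXPECTED_SCHEMA)  # passes silently

    conn.execute("ALTER TABLE students ADD COLUMN country TEXT")
    try:
        assert_schema(conn, "students", EXPECTED_SCHEMA)
    except SchemaDriftError as error:
        print(error)
```

Running the check at application start-up (or before each batch) turns a silent schema change into an explicit, early failure instead of a wrong result discovered later.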

The case of “AI”

Datastrophe, a definition

  • Bad for you: you’ll undoubtedly be singled out as guilty or responsible, even though you may not feel that way,
  • Ugly for the group: data intelligence results are used to decide strategic or tactical moves, resulting in disastrous events,
  • Good for both: please focus on this one, because there is always a light you can seek:
  1. You can both learn from it and anticipate similar ones in the future,
  2. You can discover the opportunities opened up by learning that those assumptions aren’t (always) right.
Check out this fantastic Tableau on the movie, created by Filippo Mastroianni (@FilMastroianni).
  • Data issues: data monitoring.
  • Data downtime: data observability.
  • Dark data: a term coined by Gartner, meaning unused data.
  • Dark data (my preferred): a term coined by David J. Hand, who lists different DD-types, including:
  1. Data We Know Are Missing.
  2. Data We Don’t Know Are Missing.
  3. Choosing Just Some Cases.
  4. Self-Selection (a variant of DD-Type 3).
  5. Missing What Matters.
  6. Data Which Might Have Been.
  7. Changes with Time.
  8. Definitions of Data.
  9. Summaries of Data.
  10. Measurement Error and Uncertainty.
  11. Feedback and Gaming.
  12. Information Asymmetry.
  13. Intentionally Darkened Data.
  14. Fabricated and Synthetic Data.

What’s next?

I get energy from using my expertise in mathematics and data technologies to build innovative solutions that help organisations - former Spark Notebook creator

Andy Petrella
