Blaise Byron Faint
1 Introduction
Checklists are great.
As someone who is new to studying artificial intelligence (AI), machine learning (ML), and deep learning (DL), I am overwhelmed by the complexity of the topic.
For example, here is a Machine Learning Algorithm MindMap by Vineet Verma.
Right now, when I consider a set of data, I have no clue which of these many algorithms is (are) appropriate for that particular use case.
In addition, if I’m considering tokenizing the text of War and Peace, shouldn’t I investigate whether that text has already been tokenized, to avoid duplication of effort?
(Along those lines, I suggest the establishment of a sister website to the various iterations of Project Gutenberg, in which all of the available texts are tokenized, with boilerplate text that is replicated in many texts commented out.)
If I do happen to have a guess as to which algorithm to use, I currently have to manually search the web (GitHub, specifically) for sample code to fit my imagined use case.
In addition, as I understand it, the next big goal of AI is artificial general intelligence (AGI), in which a computer is able to mimic all of the functions of the human brain.
Especially for someone in the process of learning AI, and given that it’s difficult to ascertain which algorithm is most appropriate for a particular use case, why not use all of them and find out?
I suggest the establishment of a generalized repository of mature artificial intelligence algorithms, similar to Chocolatey or some other package manager, optimized for simplifying the task of processing data, designed for 3 main purposes:
B. Pure Research
C. Practical Applications
2 Example proto-checklist
The checklist repository would be downloadable or accessible via the cloud. It could offer three modes: a teaching mode, in which the master program walks the user through each step, offering sample code and explaining why it recommends a particular algorithm for a specific set of data; a practical use case mode, in which the end user already knows which algorithms and hyperparameters they wish to use; and a God mode, in which the master program manipulates the data in every conceivable way, subject to the limitations imposed by the available processing power.
For example, I open the master program and choose “Teaching,” “Practical Use Case,” or “God Mode.” Based on my initial selection, the computer leads me through an opening dialog using checkboxes, drop-down menus, or natural language processing, to determine what I am trying to accomplish, or whether I even know what I’m trying to accomplish. For example, in “Teaching” mode, I select a file or folder on my local machine.
The master program examines the folder (or file) and, by file extension(s), determines whether the data is homogeneous or a mixture of csv, txt, jpg, or other files.
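As a minimal sketch of that first step, assuming the master program is written in Python, the extension survey might look like the following. The function names `survey_extensions` and `is_homogeneous` are my own placeholders, not part of any existing tool, and a real program would probably also inspect file contents rather than trust extensions alone.

```python
from collections import Counter
from pathlib import Path


def survey_extensions(target: str) -> Counter:
    """Count files by lowercase extension under a file or folder path."""
    root = Path(target)
    # A single file is treated as a one-item collection.
    if root.is_file():
        files = [root]
    else:
        files = [p for p in root.rglob("*") if p.is_file()]
    # Files without an extension are grouped under the label "(none)".
    return Counter(p.suffix.lower() or "(none)" for p in files)


def is_homogeneous(counts: Counter) -> bool:
    """True when every surveyed file shares a single extension."""
    return len(counts) == 1
```

Calling `survey_extensions` on the selected folder returns a tally per extension, and `is_homogeneous` on that tally tells the master program whether it is dealing with one kind of file or a mixture.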
The master program opens each file and determines whether the data has been pre-processed, or whether pre-processing is required. It could then either attempt to pre-process the data or inform me of what pre-processing is required.
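One way to picture the "is pre-processing required?" check for a csv file is sketched below. It is only an illustration under my own assumptions: the function name `preprocessing_report` is hypothetical, and the two checks shown (rows that do not match the header width, and empty cells) are just examples of what such a report could contain.

```python
import csv
import io


def preprocessing_report(csv_text: str) -> list[str]:
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    # Rows with too many or too few fields usually signal a broken export.
    ragged = sum(1 for row in body if len(row) != len(header))
    if ragged:
        problems.append(f"{ragged} row(s) do not match the header width")
    # Blank cells must be filled or dropped before most algorithms can run.
    blanks = sum(1 for row in body for cell in row if not cell.strip())
    if blanks:
        problems.append(f"{blanks} empty cell(s) need filling or dropping")
    return problems
```

An empty list would mean the file passed these particular checks; anything else is what the master program would either try to repair or report back to me.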
If the file is a CSV of historical stock prices, the master program might suggest a time-series algorithm. If the data is known to be pre-processed, the master program could find this information via the checklist repository.
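A rough sketch of how such a suggestion could be triggered follows. The heuristic is my own assumption, not an established rule: if the first column holds ISO-formatted dates in increasing order and the second column is numeric, the data is treated as a time series. The names `looks_like_time_series` and `suggest` are hypothetical.

```python
import csv
import io
from datetime import date


def looks_like_time_series(csv_text: str) -> bool:
    """Heuristic: the first column holds strictly increasing ISO dates
    and the second column is numeric in every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header
    if len(rows) < 2:
        return False
    try:
        stamps = [date.fromisoformat(row[0]) for row in rows]
        for row in rows:
            float(row[1])  # raises ValueError when not numeric
    except (ValueError, IndexError):
        return False
    return all(a < b for a, b in zip(stamps, stamps[1:]))


def suggest(csv_text: str) -> str:
    """Map the heuristic to a recommendation the user can read."""
    if looks_like_time_series(csv_text):
        return "time-series model"
    return "no suggestion"
```

A real master program would need many such heuristics, one per family of algorithms, which is exactly the kind of knowledge the checklist repository is meant to collect.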
If the data is images, the checklist repository could convert the images to text. If the data is text, the master program could convert the text to graphs and images.
Ultimately, by working through the checklist of mature algorithms, the master program could find answers to questions the end user didn’t even think to ask.
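The idea of working through a checklist instead of guessing can be sketched in a few lines. The three "algorithms" below are deliberately trivial forecasters that I picked only for illustration; a real checklist would hold mature algorithms, and the names `CHECKLIST` and `rank_candidates` are my own placeholders.

```python
from statistics import mean


def predict_mean(history):
    """Forecast the next value as the average of everything seen so far."""
    return mean(history)


def predict_last(history):
    """Forecast the next value as a repeat of the latest observation."""
    return history[-1]


def predict_drift(history):
    """Extend the average step between the first and last observation."""
    return history[-1] + (history[-1] - history[0]) / (len(history) - 1)


# The hypothetical checklist: every entry is tried, none is guessed at.
CHECKLIST = {
    "mean": predict_mean,
    "last value": predict_last,
    "drift": predict_drift,
}


def rank_candidates(series, warmup=3):
    """Score each candidate by mean absolute one-step-ahead error, best first."""
    scores = {}
    for name, predict in CHECKLIST.items():
        errors = [
            abs(predict(series[:i]) - series[i])
            for i in range(warmup, len(series))
        ]
        scores[name] = mean(errors)
    return sorted(scores.items(), key=lambda item: item[1])
```

Given a series, `rank_candidates` runs every entry and returns them ordered from best to worst, so the user sees which one fit rather than having to know in advance.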
End users could then share their conclusions in a specialized format intended to extend the functionality of the checklist repository, just as Chocolatey relies on files optimized for that particular use case. In other words, instead of the end user Googling “MNIST” and finding The MNIST Database, downloading the files manually, and processing them for the undecillionth time, the checklist repository would access this type of information automatically and walk the end user through the sample code, results, and so on.
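What that specialized format would contain is an open question. As one possible sketch, assuming a JSON manifest, a shared entry might be modeled as below. Every field name here is hypothetical, and the sample values in the usage note are illustrative rather than real results.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChecklistEntry:
    """One shared record; every field name here is hypothetical."""
    dataset: str
    data_kind: str
    preprocessed: bool
    algorithms_tried: list = field(default_factory=list)
    notes: str = ""


def to_manifest(entry: ChecklistEntry) -> str:
    """Serialize an entry to JSON text that can be shared with others."""
    return json.dumps(asdict(entry), indent=2, sort_keys=True)


def from_manifest(text: str) -> ChecklistEntry:
    """Rebuild an entry from JSON text produced by to_manifest."""
    return ChecklistEntry(**json.loads(text))
```

For instance, `to_manifest(ChecklistEntry("MNIST", "images", True))` produces a small text file that another user's master program could read back with `from_manifest` instead of repeating the same groundwork.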
3 Advantages and disadvantages
The major disadvantage of this approach is that the processing power required could grow exponentially, quickly exceeding what an end user with limited computing resources can manage. The advantage is analogous to Frank Zappa’s explanation of the decline of the music business:
The executives of the day were “cigar-chomping old guys who looked at the product and said, ‘I don’t know. Who knows what it is? Record it, stick it out. If it sells, alright!’”
Given that I don’t know which algorithm is most appropriate, I believe that allowing the checklist repository to examine the data thoroughly, rather than guessing, will yield benefits that will ultimately expedite the arrival of AGI and artificial superintelligence (ASI).
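To put a rough number on the processing-power concern raised earlier in this section, here is a minimal sketch that counts how many runs "try everything" would take. The grid layout (one dictionary of hyperparameter values per algorithm) and the names `configurations` and `total_runs` are my own assumptions.

```python
from math import prod


def configurations(grid: dict) -> int:
    """Number of runs needed to try every combination in one algorithm's grid."""
    return prod(len(values) for values in grid.values())


def total_runs(grids: dict) -> int:
    """Total runs across all algorithms on the checklist."""
    return sum(configurations(grid) for grid in grids.values())
```

With 5 candidate values for each of 10 hyperparameters, a single algorithm already needs 5^10 = 9,765,625 runs, and each additional hyperparameter multiplies that figure by 5. This is why God mode has to stay subject to the available processing power.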
4 Conclusions and future work
Don’t be the “hip, young executives” who turned out to be far more conservative than the “old guys.”
Acknowledgments
I acknowledge that water is wet.
References
Gawande, Atul (2009). The Checklist Manifesto: How to Get Things Right. Metropolitan Books.
Verma, Vineet (2015). “Machine Learning Algorithm MindMap.” https://techutils.in/blog/2015/12/26/machine-learningalgorithm-
Woodford, Chris (2018). “Neural networks.” https://www.explainthatstuff.com/introduction-to-neural-networks.html
LeCun, Yann, Cortes, Corinna, and Burges, Christopher J.C. (2013). The MNIST Database. http://yann.lecun.com/exdb/
Zappa, Frank (1987). “Frank Zappa Explains the Decline of the Music Business.” http://www.openculture.com/2016/09/
This paper was originally published June 10, 2018.