This is a follow-up post titled “Skill sets required to become a Machine learning expert”.
Get your Math skills up. You may have been told at school that you can never be good at Math. Listen up: that is a lie. If you sit down and dedicate some time to learning, you will surely achieve greater things in Math than you ever imagined.
Math is the queen of all sciences, and if you capture the queen, you shall have the kingdom.
That being said, it is imperative to have sound knowledge of Mathematics to gain competence in Machine learning. Admittedly, the field of Mathematics is profound and vast, yet you don’t need to be a genius or to master the entire subject, which is not feasible anyway.
However, there are certain specific areas on which you can focus and specialize that will give you the tools required to move up the ladder.
The three areas you need to concentrate on are Probability and Statistics, Linear Algebra, and Calculus.
Probability and Statistics: The field by itself is large, but we shall focus on the following concepts.
- Sampling Theory
- Probability Distribution
- Bayes’ Theorem
- Hypothesis Testing
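To make Bayes’ theorem concrete, here is a minimal Python sketch of P(A|B) = P(B|A)·P(A) / P(B), where P(B) is expanded using the law of total probability. The function name and the screening-test numbers are hypothetical, chosen only for illustration.

```python
def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(A | B) using Bayes' theorem.

    prior               -- P(A), e.g. the base rate of a condition
    likelihood          -- P(B | A), e.g. the test's true-positive rate
    false_positive_rate -- P(B | not A)
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical screening test: 1% base rate, 99% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")  # 0.167
```

Even with an accurate test, a positive result here means only about a 17% chance of having the condition, because the condition itself is rare. This is exactly the kind of reasoning Bayes’ theorem makes explicit.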
Linear Algebra: Here our focus would be on Matrices and Vectors.
In Matrices, our focus will be on the basic operations of matrix algebra, such as matrix addition, subtraction, inverse, and transpose.
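A minimal pure-Python sketch of these operations, using nested lists as matrices so that it runs without extra libraries (in practice a library such as NumPy would normally do this work). The helper names are illustrative, and the inverse is shown only for the 2×2 case, using the standard formula: swap the diagonal, negate the off-diagonal entries, and divide by the determinant.

```python
Matrix = list[list[float]]


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise sum; a and b must have the same shape.
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise difference; a and b must have the same shape.
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def transpose(a: Matrix) -> Matrix:
    # Rows become columns.
    return [list(col) for col in zip(*a)]


def inverse_2x2(a: Matrix) -> Matrix:
    # For [[p, q], [r, s]] the inverse is (1 / det) * [[s, -q], [-r, p]], det = ps - qr.
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[s / det, -q / det], [-r / det, p / det]]


A = [[4.0, 7.0], [2.0, 6.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
print(add(A, B))        # [[5.0, 7.0], [2.0, 7.0]]
print(subtract(A, B))   # [[3.0, 7.0], [2.0, 5.0]]
print(transpose(A))     # [[4.0, 2.0], [7.0, 6.0]]
print(inverse_2x2(A))   # [[0.6, -0.7], [-0.2, 0.4]]
```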
Likewise, in Vectors our focus will be on vector algebra, involving the same kinds of operations we discussed for matrices, except that the quantities we work with have both magnitude and direction.
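A small Python sketch of the idea that a vector carries both magnitude and direction: the magnitude is its Euclidean length, and its direction can be expressed as a unit vector or, in two dimensions, as an angle. The helper names are illustrative, not from any particular library.

```python
import math


def vec_add(u, v):
    # Component-wise sum of two vectors of equal length.
    return [a + b for a, b in zip(u, v)]


def magnitude(v):
    # Euclidean length: sqrt(v1^2 + v2^2 + ...).
    return math.sqrt(sum(c * c for c in v))


def direction(v):
    # Unit vector pointing the same way as v.
    m = magnitude(v)
    if m == 0:
        raise ValueError("the zero vector has no direction")
    return [c / m for c in v]


def angle_degrees(v):
    # Direction of a 2-D vector as an angle from the positive x-axis.
    return math.degrees(math.atan2(v[1], v[0]))


v = [3.0, 4.0]
print(magnitude(v))              # 5.0
print(direction(v))              # [0.6, 0.8]
print(vec_add(v, [1.0, 1.0]))    # [4.0, 5.0]
```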
Calculus: The word Calculus comes from a Latin word meaning “small pebble”. As with the name, so too is the subject: strong and firm, with applications in almost all walks of life.
There are two branches of Calculus: Differential Calculus and Integral Calculus. In Differential Calculus we study rates of change, and in Integral Calculus we study the area under a curve between given limits, where the curve can represent a practical quantity.
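Both branches can be sketched numerically in Python: a central-difference approximation of the rate of change, and the trapezoidal rule for the area under a curve between two limits. These are standard numerical approximations rather than exact symbolic calculus, and the step sizes chosen here are arbitrary.

```python
def derivative(f, x, h=1e-5):
    # Central-difference approximation of the rate of change f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)


def integrate(f, a, b, n=1000):
    # Trapezoidal rule: approximate the area under f between the limits a and b.
    width = (b - a) / n
    total = (f(a) + f(b)) / 2
    for i in range(1, n):
        total += f(a + i * width)
    return total * width


def square(x):
    return x * x


print(derivative(square, 3.0))        # close to 6, since d/dx x^2 = 2x
print(integrate(square, 0.0, 3.0))    # close to 9, since the integral of x^2 from 0 to 3 is 27/3
```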
If there is one skill that pays huge dividends, it is the skill (and thrill) of knowing how to program. From C to Java, there is a wide variety of programming languages in which you can gain expertise.
No language is outdated, for it is you who decides which language you wish to work with, and the platform is always available. In short, no programming experience is wasted.
Having said this, you don’t need to be a top nerd in every language. Don’t even attempt it; it is a mere waste of energy and resources. Just study the two that we shall delve into here:
Python and R.
Python: this language supports the OOP concept, the same concept that saw C++ rise to unprecedented levels. OOP stands for Object-Oriented Programming; here everything that you want to operate on is treated as an object, and in Python even numbers and functions are objects. The language requires only basic math skills to achieve mastery, and integrating it with other software is simple.
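A minimal Python sketch of the object-oriented idea: data and the operations that act on it are bundled together into an object. The `Dataset` class is hypothetical and exists only for illustration; the last line shows that even built-in values and functions are objects.

```python
class Dataset:
    """A hypothetical object bundling some values with the operations on them."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        # Average of the stored values.
        return sum(self.values) / len(self.values)

    def __len__(self):
        # Lets the built-in len() work on Dataset objects.
        return len(self.values)


ds = Dataset([2, 4, 6])
print(len(ds), ds.mean())  # 3 4.0

# Even plain numbers and built-in functions are objects in Python.
print(isinstance(42, object), isinstance(len, object))  # True True
```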
R: This language has a strongly functional style. In R, nearly everything that happens, even arithmetic or indexing, is a function call, so your work is largely expressed as applying functions to data. The language requires certain algebraic skills to achieve mastery. It was designed with statistics in mind, so many statistical routines are built in, and add-on packages only need to be installed once and then loaded with library(). Whether it runs faster than Python depends largely on the task and on how the code is written.
After all, everything that you are going to work with and work on is data. Just as in any industry, raw materials must be processed before you can work on them, and so it is with data.
Before you can feed data into a Machine learning algorithm, all of the data you receive needs to undergo what is known as pre-processing.
The following Pre-Processing steps ought to be performed.
Data Cleaning: The data that you receive will come from a variety of sources, often in unstructured form. Data cleaning is the process of modifying or removing data that is incomplete, inaccurate, incorrect, irrelevant, duplicated, or improperly formatted.
Why do we need to perform this process? A valid question, and the reason is simple: you don’t want the outcome to have biases or inaccuracies latent in it. This process increases data accuracy, thereby ensuring better output.
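A minimal Python sketch of data cleaning, assuming each record is a dict with hypothetical `name` and `age` fields. It normalizes formatting, then drops incomplete, improperly formatted, implausible, and duplicated records. The plausible age range is an arbitrary assumption; real pipelines would usually do this with a library such as pandas.

```python
def clean(records):
    """Return records with formatting normalized and bad or duplicate rows removed."""
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip().title()  # fix inconsistent formatting
        age = rec.get("age")
        if not name or age is None:                     # incomplete record
            continue
        try:
            age = int(age)
        except (TypeError, ValueError):                 # improperly formatted value
            continue
        if not 0 <= age <= 120:                         # implausible (inaccurate) value
            continue
        key = (name, age)
        if key in seen:                                 # duplicate record
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


raw = [
    {"name": " alice ", "age": "34"},
    {"name": "Alice", "age": 34},      # duplicate once normalized
    {"name": "bob", "age": None},      # incomplete
    {"name": "carol", "age": "abc"},   # improperly formatted
    {"name": "dave", "age": 240},      # inaccurate
    {"name": "erin", "age": 29},
]
print(clean(raw))  # [{'name': 'Alice', 'age': 34}, {'name': 'Erin', 'age': 29}]
```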
ETL stands for Data Extraction, Data Transformation and Data Loading.
Data Extraction: This is a vital step and one of the main ways to procure data. The sources from which data is procured usually include websites and servers that are either maintained locally or connected to a central data store.
Transforming the Data: Data comes in all forms and structures, and not every form can be fed into a Machine learning algorithm. The best way to understand this principle is through the example of a video converter.
Suppose you have a raw MPEG video file that you want to upload to your website. If you uploaded the file as it is, it would occupy a lot of space on your server and drastically increase the load time, making your website heavy, which in turn means a slower website.
According to research, a website that takes more than 3 seconds to load will lose 40% of its customers. So what is the remedy? Convert the video to a compressed, web-friendly format such as .flv. To get this format, you need a machine with an algorithm that can work with MPEG files.
The solution is a video converter. The data is sent in, the .flv output is obtained, and this output is then sent to your server, which is yet another machine. Hopefully, you are getting the idea. This is called Data Transformation.
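In the same spirit as the video converter, a transformation step takes data in one form and outputs a form the next machine accepts. A minimal Python sketch, assuming hypothetical raw CSV text with weights in mixed units and yes/no answers, which is converted into purely numeric rows an algorithm can consume:

```python
import csv
import io

# Hypothetical raw input: mixed units, stray spaces, and text flags.
RAW = """date,weight,subscribed
2024-01-05, 12.5 kg ,yes
2024-01-06,800 g,no
"""


def to_grams(text):
    # Convert a weight string such as "12.5 kg" or "800 g" to a float in grams.
    value, unit = text.split()
    factor = {"kg": 1000.0, "g": 1.0}[unit.lower()]
    return float(value) * factor


def transform(raw_csv):
    """Turn raw CSV text into numeric rows: [weight_in_grams, subscribed_flag]."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        rows.append([
            to_grams(record["weight"]),
            1.0 if record["subscribed"].strip().lower() == "yes" else 0.0,
        ])
    return rows


print(transform(RAW))  # [[12500.0, 1.0], [800.0, 0.0]]
```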
Data Loading into your Algorithm
This is a skill of its own. The best way to understand this loading technique is to think about eating. Our machine, the human body, needs data to function, to process, to produce, to reproduce, to multiply, to be fruitful.
The data that we feed the human body (our machine) is food. Say you want to eat steak. We all know steak comes from cows, but you simply cannot go and gobble up a whole cow, can you?
So what do we do? We go to a butcher, select the cut of meat we need, bring it home, season it, place it on the pan, cook it medium rare, and then put it on a platter with some potatoes and scotch bonnet peppers. Then we use a knife and fork to slice it up, place a small slice in our mouth, and enjoy!
The slice that you place in your mouth is what is known as Data Loading. To prevent choking, you need to put small, manageable cuts of meat into your mouth, so that your teeth, a machine that can grind, will not be overwhelmed.
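In machine learning, these “small slices” are typically mini-batches: instead of feeding the whole dataset to the algorithm at once, it is fed in fixed-size chunks. A minimal Python generator sketch; the batch size of 4 is arbitrary.

```python
from collections.abc import Iterator, Sequence


def batches(data: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield successive slices of data containing at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


samples = list(range(10))
for batch in batches(samples, batch_size=4):
    print(batch)  # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
```

Because it is a generator, each batch is produced only when the consuming algorithm asks for it, which keeps memory use small.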
In fact, all of the data pre-processing we have talked about can be summed up with this one example.
Database Management Software
Here you will have to communicate with the server that holds the data. Imagine you are walking into a tuna market in Japan where none of the tuna sellers speak English. You know for sure that tuna is an expensive fish, so to get a better deal you ought to speak the seller’s language, which in this case is Japanese.
Similarly, data is held on large servers. To obtain data from them, we need to communicate with these servers in their language, a database query language. The most widely used one is SQL (Structured Query Language), which is spoken by popular database management systems such as MySQL.
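A minimal sketch of “speaking the server’s language” with SQL. So that it runs without a database server, it uses Python’s built-in `sqlite3` module with an in-memory SQLite database instead of MySQL; the `fish` table and its prices are hypothetical. A similar SELECT statement would work against MySQL through a MySQL driver.

```python
import sqlite3

# An in-memory SQLite database stands in for a real database server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fish (name TEXT, price_per_kg REAL)")
conn.executemany(
    "INSERT INTO fish (name, price_per_kg) VALUES (?, ?)",
    [("tuna", 40.0), ("mackerel", 8.5), ("salmon", 22.0)],
)

# SQL is the language we use to ask for exactly the data we want.
rows = conn.execute(
    "SELECT name, price_per_kg FROM fish WHERE price_per_kg > ? ORDER BY price_per_kg DESC",
    (10,),
).fetchall()
print(rows)  # [('tuna', 40.0), ('salmon', 22.0)]
conn.close()
```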
Machine Learning Algorithms: Application
These come in two main types: Supervised Machine Learning Algorithms and Unsupervised Machine Learning Algorithms.
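A minimal Python sketch contrasting the two types. A supervised learner is given inputs together with the correct answers (labels) and learns the mapping; here it is an ordinary least-squares line. An unsupervised learner gets only unlabelled data and finds structure on its own; here it is a tiny one-dimensional k-means with two clusters. Both are toy implementations on made-up data, not production algorithms.

```python
def fit_line(xs, ys):
    """Supervised: learn y = slope * x + intercept from labelled pairs (least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def two_means(values, iterations=10):
    """Unsupervised: split unlabelled 1-D values into two clusters (k-means, k = 2)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to its nearest center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0

centers, groups = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
print(centers, groups)
```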
Once you have mastered the algorithms, these are the vital processes you ought to know:
- Selecting the best algorithm for the business solution. This can be illustrated as follows. You wish to travel from Point A to Point C. If you catch a train, the commute takes 90 minutes. But if instead you take a ferry from Point A to Point B and then catch a shuttle bus from Point B to Point C, it takes 65 minutes. The machines you used are different in each case. The key word is choice: your choices will dictate the outcome and the profit you make in business.
- Creating a good model with one or more algorithms
- Optimization for better accuracy
The travel example above is also a good way to understand these terms.
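Choosing between algorithms is usually done empirically: train each candidate on one part of the data and compare their errors on held-out data, much like timing the train against the ferry-and-bus route. A minimal Python sketch with two hypothetical candidates (a mean baseline and a least-squares line) and made-up data; real projects would typically use cross-validation and a library such as scikit-learn.

```python
def mse(predict, xs, ys):
    # Mean squared error of a prediction function on (xs, ys).
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mean_model(train_x, train_y):
    # Baseline candidate: always predict the average training label.
    avg = sum(train_y) / len(train_y)
    return lambda x: avg


def line_model(train_x, train_y):
    # Second candidate: a least-squares straight line.
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    var = sum((x - mx) ** 2 for x in train_x)
    slope = cov / var
    return lambda x: my + slope * (x - mx)


# Hypothetical data, split into a training set and a held-out validation set.
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.1, 9.8]
val_x, val_y = [6, 7], [12.1, 13.9]

candidates = {"mean": mean_model, "line": line_model}
scores = {name: mse(build(train_x, train_y), val_x, val_y) for name, build in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

The same loop can serve for optimization too: treat different settings (hyperparameters) of a single algorithm as the candidates and keep the one with the lowest validation error.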