What is one-hot encoding and how do you do it?

Ritul
Nov 20, 2018

Whenever we study an ML algorithm, the term one-hot encoding comes up a lot. So what is it, and how do we do it?

First of all, one-hot encoding is a technique for converting categorical data into numerical data. Each category gets its own binary (0/1) representation.
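To make the idea concrete before bringing in any library, here is a minimal plain-Python sketch. The function name one_hot and the colour values are made up for illustration and are not from the original example.

```python
def one_hot(values):
    """Return the sorted categories and one binary row per input value."""
    # Each distinct value becomes one position in the output row.
    categories = sorted(set(values))
    # A row has a 1 in the position of its own category and 0 elsewhere.
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows


# Hypothetical categorical data.
categories, rows = one_hot(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(rows)        # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Every row contains exactly one 1, which is where the name "one-hot" comes from.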

For example, suppose we have student data in which each student has a rank.

After one-hot encoding, the single rank column is replaced by one binary column per rank value.

How to do that in Python…

To do this in Python, we first have to load our data into a DataFrame. For that we install pandas and use its read_csv() function.
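A minimal sketch of the loading step might look like the following. The original file is not available here, so the CSV text, the student column and the rank values are all hypothetical. An in-memory buffer stands in for a file path so the snippet runs on its own. With a real file you would pass its path to read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the student data file.
csv_text = """student,rank
A,1
B,3
C,2
D,4
E,1
"""

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```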

Once the DataFrame is loaded, we apply the get_dummies() function of pandas to create a binary column for each rank.
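Here is a small sketch of that call, again on made-up student data with four rank values. The prefix and dtype arguments are choices made for this example: prefix gives the new columns readable names, and dtype=int forces 0/1 values, since recent pandas versions return True/False by default.

```python
import pandas as pd

# Hypothetical student data; the rank values 1 to 4 are an assumption.
df = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "rank": [1, 3, 2, 4, 1],
})

# One binary column per distinct rank: rank_1, rank_2, rank_3, rank_4.
dummies = pd.get_dummies(df["rank"], prefix="rank", dtype=int)
print(dummies)
```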

Now that we have the dummies, we concatenate these four newly generated columns to our existing DataFrame and drop the ‘rank’ column. Here axis=1 refers to columns. The result then matches the one-hot encoded data described above.
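The concatenate-and-drop step could be sketched like this, using the same hypothetical student data. The variable name encoded is just for illustration.

```python
import pandas as pd

# Hypothetical student data; the rank values 1 to 4 are an assumption.
df = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "rank": [1, 3, 2, 4, 1],
})
dummies = pd.get_dummies(df["rank"], prefix="rank", dtype=int)

# axis=1 means "work along columns": append the dummy columns side by side,
# then drop the original 'rank' column, which is now redundant.
encoded = pd.concat([df, dummies], axis=1).drop("rank", axis=1)
print(encoded)
```

Both concat() and drop() return new DataFrames here, so the original df is left unchanged.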

So this is how we perform one-hot encoding: in each row only one of the new columns is hot (active, 1) and the rest are inactive (0). Encoding categories this way helps avoid introducing an artificial ordering or dependency among the category values.

Ritul

Data Science Enthusiast | Advanced Analytics Intern at EY