I bet you have all heard that more than half of Kaggle competitions were won using just one algorithm [source].
You probably even gave it a try. It’s so easy to get excited about the dream of reaching the top of the leaderboard. Imagine all that fame and fortune, ahh.
Let’s get real. You made it work. The submitted results are good, but definitely not the best. Then your motivation drops.
Don’t give up! You know that it is possible to get more out of it. You start digging deeper.
Time (and web pages) pass by. You are overwhelmed and even more confused than at the beginning. There are a lot of detailed, laser-focused guides, but it’s hard to find a more general, easy-to-follow one.
It starts to resemble Dilbert's story:
On top of that, more and more interrelated questions start to pile up:
- "I’m dealing with an imbalanced dataset. Which parameters should I tune, and how?"
- "My data contains missing values - does XGBoost handle them?"
- "Gradient Boosted Tree? What the hell is it?"
- "How can I evaluate my results to be more confident that I’m not overfitting?"
- "The XGBoost version in the repo hasn’t been updated for a year. Maybe I should install the latest version from source?"
I have been there.
I saw a lot of these questions asked by other people in various groups and forums (phew, I was not alone). These are common issues. My “Read later” browser bookmark list was getting longer and longer.
And guess what.
I read it all.
Some of it was super boring, some very inspiring. I was determined, and the time spent could not go to waste.
A guy named Seneca (a Roman philosopher) once said: "While we teach, we learn".
So, after spending 100+ hours exploring all the possible pitfalls, I present to you...
Practical XGBoost in Python
A 100% free online course that will show you how to use one of the hottest algorithms of 2016. You will learn things like:
- how the algorithm works, explained in layman's terms,
- using it with both the native and the scikit-learn interfaces,
- figuring out which features in your data are most important,
- dealing with the bias/variance tradeoff (the overfitting problem),
- evaluating algorithm performance,
- dealing with missing data,
- handling imbalanced datasets
Each topic is described from A to Z in a fully reproducible way. It starts with loading a data set and takes you through every step. By the end, you will have a clear picture and be able to apply the technique to your own cases.
Go through the video materials and learn how to harness the algorithm to make it work for your data.
XGBoost has proven its power in many competitions. It might be tempting to jump in right away, but please take the time to read the recommended prerequisites before doing so.
This course is for you if:
- you want to understand the mechanics of the methods used,
- you don’t want to get buried in math equations,
- you respect your time (stop wasting it on side tasks like compiling from source),
- you are focused on getting the job done,
- you want to know the proper approach when dealing with common machine learning issues specific to XGBoost
You shouldn’t take it if you:
- don’t have elementary computer skills (you can install Git and Docker on your machine, can’t you?),
- expect immediate results (remember that all great skills come with practice),
- have never seen the Python language before
"Very clear, well-structured and informative, even with a brief read through I can already pick up some helpful knowledge - i.e. how to handle imbalanced dataset"
Yifan Xie
Project Manager at Airbus
Frequently Asked Questions
“Practical XGBoost in Python” is part of Parrot Prediction’s ESCO Courses, a collection of online data-science courses delivered in an innovative way.
The main point is to gain experience through empirical, hands-on work. From there, we can build the right intuition, which can be reused everywhere.
Remember that knowledge without action is useless.