{"id":5599,"date":"2022-05-02T14:02:04","date_gmt":"2022-05-02T14:02:04","guid":{"rendered":"https:\/\/www.aiproblog.com\/index.php\/2022\/05\/02\/using-kaggle-in-machine-learning-projects\/"},"modified":"2022-05-02T14:02:04","modified_gmt":"2022-05-02T14:02:04","slug":"using-kaggle-in-machine-learning-projects","status":"publish","type":"post","link":"https:\/\/www.aiproblog.com\/index.php\/2022\/05\/02\/using-kaggle-in-machine-learning-projects\/","title":{"rendered":"Using Kaggle in Machine Learning Projects"},"content":{"rendered":"<p>Author: Zhe Ming Chng<\/p>\n<div>\n<p>You\u2019ve probably heard of Kaggle data science competitions, but did you know that Kaggle has many other features that can help you with your next machine learning project? If you are looking for datasets, Kaggle lets you access public datasets shared by others and share your own. If you want to build and train your own machine learning models, Kaggle also offers an in-browser notebook environment and some free GPU hours. 
You can also browse other people\u2019s public notebooks!<\/p>\n<p>Besides the website, Kaggle also provides a command-line interface (CLI) that you can use to search for and download datasets.<\/p>\n<p>Let\u2019s dive right in and explore what Kaggle has to offer!<\/p>\n<p>After completing this tutorial, you will know:<\/p>\n<ul>\n<li>What Kaggle is<\/li>\n<li>How you can use Kaggle as part of your machine learning pipeline<\/li>\n<li>How to use the Kaggle API\u2019s command-line interface (CLI)<\/li>\n<\/ul>\n<p>Let\u2019s get started!<\/p>\n<div id=\"attachment_13561\" style=\"width: 810px\" class=\"wp-caption aligncenter\">\n<img decoding=\"async\" aria-describedby=\"caption-attachment-13561\" class=\"wp-image-13561 size-full\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-scaled.jpg\" alt=\"\" width=\"800\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-scaled.jpg 2560w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-300x200.jpg 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-1024x683.jpg 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-768x512.jpg 768w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-1536x1024.jpg 1536w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-2048x1365.jpg 2048w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/stefan-widua-kOuaZs7jDZE-unsplash-600x400.jpg 600w\" sizes=\"(max-width: 2560px) 100vw, 2560px\"><\/p>\n<p id=\"caption-attachment-13561\" class=\"wp-caption-text\">Using Kaggle in Machine Learning Projects<br \/>Photo by <a 
href=\"https:\/\/unsplash.com\/photos\/kOuaZs7jDZE\">Stefan Widua<\/a>. Some rights reserved.<\/p>\n<\/div>\n<h2>Overview<\/h2>\n<p>This tutorial is split into five parts; they are:<\/p>\n<ul>\n<li>What is Kaggle?<\/li>\n<li>Setting up Kaggle Notebooks<\/li>\n<li>Using Kaggle Notebooks with GPUs\/TPUs<\/li>\n<li>Using Kaggle Datasets with Kaggle Notebooks<\/li>\n<li>Using Kaggle Datasets with Kaggle CLI tool<\/li>\n<\/ul>\n<h2>What Is Kaggle?<\/h2>\n<p>Kaggle is probably best known for the data science competitions it hosts, some of which offer five-figure prize pools and draw hundreds of participating teams. Besides these competitions, Kaggle also allows users to publish and search for datasets to use in their machine learning projects. You can work with these datasets directly in Kaggle Notebooks in your browser, or download them with Kaggle\u2019s public API and use them in your own projects.<\/p>\n<div id=\"attachment_13551\" style=\"width: 716px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_featured_competitions.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13551\" loading=\"lazy\" class=\"wp-image-13551 \" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_featured_competitions.png\" alt=\"\" width=\"706\" height=\"384\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_featured_competitions.png 1627w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_featured_competitions-300x163.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_featured_competitions-1024x558.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_featured_competitions-768x418.png 768w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_featured_competitions-1536x836.png 1536w\" sizes=\"(max-width: 706px) 100vw, 706px\"><\/a><\/p>\n<p id=\"caption-attachment-13551\" class=\"wp-caption-text\">Kaggle Competitions<\/p>\n<\/div>\n<p>Kaggle also offers courses and a discussions page where you can learn more about machine learning and talk with other practitioners!<\/p>\n<p>For the rest of this article, we\u2019ll focus on how to use Kaggle\u2019s datasets and notebooks when working on our own machine learning projects or looking for new projects to work on.<\/p>\n<h2>Setting up Kaggle Notebooks<\/h2>\n<p>To get started with Kaggle Notebooks, you\u2019ll need a Kaggle account, which you can create with an existing Google account or with your email address.<\/p>\n<p>Then, go to the \u201cCode\u201d page.<\/p>\n<div id=\"attachment_13554\" style=\"width: 197px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_sidebar_notebook.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13554\" loading=\"lazy\" class=\"wp-image-13554\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_sidebar_notebook.png\" alt=\"\" width=\"187\" height=\"362\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_sidebar_notebook.png 293w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_sidebar_notebook-155x300.png 155w\" sizes=\"(max-width: 187px) 100vw, 187px\"><\/a><\/p>\n<p id=\"caption-attachment-13554\" class=\"wp-caption-text\">Left Sidebar of Kaggle Home Page, Code Tab<\/p>\n<\/div>\n<p>You will then see your own notebooks as well as public notebooks shared by others. 
To create your own notebook, click New Notebook.<\/p>\n<div id=\"attachment_13546\" style=\"width: 809px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_code_page.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13546\" loading=\"lazy\" class=\"wp-image-13546\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_code_page.png\" alt=\"\" width=\"799\" height=\"445\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_code_page.png 1209w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_code_page-300x167.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_code_page-1024x570.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_code_page-768x428.png 768w\" sizes=\"(max-width: 799px) 100vw, 799px\"><\/a><\/p>\n<p id=\"caption-attachment-13546\" class=\"wp-caption-text\">Kaggle Code Page<\/p>\n<\/div>\n<p>This creates a new notebook that looks like a Jupyter notebook and supports many of the same commands and shortcuts.<\/p>\n<div id=\"attachment_13548\" style=\"width: 844px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_notebook.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13548\" loading=\"lazy\" class=\"wp-image-13548\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_notebook.png\" alt=\"\" width=\"834\" height=\"382\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_notebook.png 1373w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_notebook-300x137.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_notebook-1024x469.png 1024w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_notebook-768x352.png 768w\" sizes=\"(max-width: 834px) 100vw, 834px\"><\/a><\/p>\n<p id=\"caption-attachment-13548\" class=\"wp-caption-text\">Kaggle Notebook<\/p>\n<\/div>\n<p>You can also toggle between a notebook editor and script editor by going to File -&gt; Editor Type.<\/p>\n<div id=\"attachment_13555\" style=\"width: 312px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_toggle_script.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13555\" loading=\"lazy\" class=\"wp-image-13555\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_toggle_script.png\" alt=\"\" width=\"302\" height=\"362\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_toggle_script.png 587w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_toggle_script-250x300.png 250w\" sizes=\"(max-width: 302px) 100vw, 302px\"><\/a><\/p>\n<p id=\"caption-attachment-13555\" class=\"wp-caption-text\">Changing Editor Type in Kaggle Notebook<\/p>\n<\/div>\n<p>Changing the editor type to script shows this instead:<\/p>\n<div id=\"attachment_13549\" style=\"width: 860px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_script.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13549\" loading=\"lazy\" class=\"wp-image-13549\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_script.png\" alt=\"\" width=\"850\" height=\"412\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_script.png 1260w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_script-300x145.png 300w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_script-1024x496.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_empty_script-768x372.png 768w\" sizes=\"(max-width: 850px) 100vw, 850px\"><\/a><\/p>\n<p id=\"caption-attachment-13549\" class=\"wp-caption-text\">Kaggle Notebook Script Editor Type<\/p>\n<\/div>\n<h2>Using Kaggle Notebooks with GPUs\/TPUs<\/h2>\n<p>Who doesn\u2019t love free GPU time for machine learning projects? GPUs can massively speed up the training and inference of machine learning models, especially deep learning models.<\/p>\n<p>Kaggle provides a free weekly allocation of GPU and TPU time that you can use for your projects. At the time of this writing, once you verify your account with a phone number, you get 30 hours of GPU time and 20 hours of TPU time per week.<\/p>\n<p>To attach an accelerator to your notebook, go to Settings \u25b7 Environment \u25b7 Preferences.<\/p>\n<div id=\"attachment_13550\" style=\"width: 902px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_environment_preferences.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13550\" loading=\"lazy\" class=\"wp-image-13550\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_environment_preferences.png\" alt=\"\" width=\"892\" height=\"500\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_environment_preferences.png 1268w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_environment_preferences-300x168.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_environment_preferences-1024x574.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_environment_preferences-768x431.png 768w\" sizes=\"(max-width: 892px) 100vw, 892px\"><\/a><\/p>\n<p 
id=\"caption-attachment-13550\" class=\"wp-caption-text\">Changing Kaggle Notebook Environment preferences<\/p>\n<\/div>\n<p>You\u2019ll be asked to verify your account with a phone number.<\/p>\n<div id=\"attachment_13558\" style=\"width: 540px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/verify_phone.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13558\" loading=\"lazy\" class=\"wp-image-13558\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/verify_phone.png\" alt=\"\" width=\"530\" height=\"262\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/verify_phone.png 1290w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/verify_phone-300x148.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/verify_phone-1024x506.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/verify_phone-768x380.png 768w\" sizes=\"(max-width: 530px) 100vw, 530px\"><\/a><\/p>\n<p id=\"caption-attachment-13558\" class=\"wp-caption-text\">Verify phone number<\/p>\n<\/div>\n<p>You\u2019ll then see a page that lists how much of your quota you have left and notes that turning on GPUs reduces the number of CPUs available, so it\u2019s probably only worth doing when training or running inference with neural networks.<\/p>\n<div id=\"attachment_13557\" style=\"width: 703px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/use_accelerator.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13557\" loading=\"lazy\" class=\"wp-image-13557\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/use_accelerator.png\" alt=\"\" width=\"693\" height=\"241\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/use_accelerator.png 1266w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/use_accelerator-300x105.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/use_accelerator-1024x357.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/use_accelerator-768x268.png 768w\" sizes=\"(max-width: 693px) 100vw, 693px\"><\/a><\/p>\n<p id=\"caption-attachment-13557\" class=\"wp-caption-text\">Adding GPU Accelerator to Kaggle Notebook<\/p>\n<\/div>\n<h2>Using Kaggle Datasets with Kaggle Notebooks<\/h2>\n<p>Machine learning projects are data-hungry monsters, and finding datasets for a current project, or for starting a new one, is always a chore. Luckily, Kaggle has a rich collection of datasets contributed by users and released for competitions. These datasets can be a treasure trove, whether you need data for your current machine learning project or are looking for new project ideas.<\/p>\n<p>Let\u2019s explore how we can add these datasets to our Kaggle notebook.<\/p>\n<p>First, click Add data in the right sidebar.<\/p>\n<div id=\"attachment_13541\" style=\"width: 265px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/add_data.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13541\" loading=\"lazy\" class=\"wp-image-13541\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/add_data.png\" alt=\"\" width=\"255\" height=\"502\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/add_data.png 366w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/add_data-153x300.png 153w\" sizes=\"(max-width: 255px) 100vw, 255px\"><\/a><\/p>\n<p id=\"caption-attachment-13541\" class=\"wp-caption-text\">Adding Datasets to Kaggle Notebook Environment<\/p>\n<\/div>\n<p>A window should appear that shows you some of the publicly available datasets and gives you the option to 
upload your own dataset for use with your Kaggle notebook.<\/p>\n<div id=\"attachment_13547\" style=\"width: 749px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_datasets.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13547\" loading=\"lazy\" class=\"wp-image-13547\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_datasets.png\" alt=\"\" width=\"739\" height=\"508\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_datasets.png 1083w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_datasets-300x206.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_datasets-1024x703.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_datasets-768x528.png 768w\" sizes=\"(max-width: 739px) 100vw, 739px\"><\/a><\/p>\n<p id=\"caption-attachment-13547\" class=\"wp-caption-text\">Searching Through Kaggle datasets<\/p>\n<\/div>\n<p>I\u2019ll use the classic Titanic dataset as the example for this tutorial. You can find it by typing \u201ctitanic\u201d into the search bar at the top right of the window.<\/p>\n<div id=\"attachment_13556\" style=\"width: 795px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/search_titanic_dataset.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13556\" loading=\"lazy\" class=\"wp-image-13556\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/search_titanic_dataset.png\" alt=\"\" width=\"785\" height=\"463\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/search_titanic_dataset.png 1195w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/search_titanic_dataset-300x177.png 300w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/search_titanic_dataset-1024x604.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/search_titanic_dataset-768x453.png 768w\" sizes=\"(max-width: 785px) 100vw, 785px\"><\/a><\/p>\n<p id=\"caption-attachment-13556\" class=\"wp-caption-text\">Kaggle Datasets Filtered with \u201cTitanic\u201d Keyword<\/p>\n<\/div>\n<p>After that, the dataset is available to the notebook. To access a file, prepend <code>..\/input\/<\/code> to the file\u2019s path, giving <code>..\/input\/{path}<\/code>. For example, the file path for the Titanic dataset is:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">..\/input\/titanic\/train_and_test2.csv<\/pre>\n<p>In the notebook, we can read the data with pandas:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">import pandas as pd\r\n\r\n# Datasets added to a Kaggle notebook are available under ..\/input\/\r\ndf = pd.read_csv(\"..\/input\/titanic\/train_and_test2.csv\")\r\ndf<\/pre>\n<p>This loads the file into a pandas DataFrame:<\/p>\n<div id=\"attachment_13553\" style=\"width: 952px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_notebook_read_dataset.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13553\" loading=\"lazy\" class=\"wp-image-13553\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_notebook_read_dataset.png\" alt=\"\" width=\"942\" height=\"373\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_notebook_read_dataset.png 1673w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_notebook_read_dataset-300x119.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_notebook_read_dataset-1024x405.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_notebook_read_dataset-768x303.png 768w, 
https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_notebook_read_dataset-1536x607.png 1536w\" sizes=\"(max-width: 942px) 100vw, 942px\"><\/a><\/p>\n<p id=\"caption-attachment-13553\" class=\"wp-caption-text\">Using Titanic Dataset in Kaggle Notebook<\/p>\n<\/div>\n<h2>Using Kaggle Datasets with Kaggle CLI Tool<\/h2>\n<p>Kaggle also has a public API with a CLI tool that we can use to download datasets, interact with competitions, and much more. We\u2019ll look at how to set up the CLI tool and use it to download Kaggle datasets.<\/p>\n<p>To get started, install the CLI tool using:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">pip install kaggle<\/pre>\n<p>On Mac\/Linux, if you don\u2019t have permission to install packages system-wide, you might need to install it for your user only:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">pip install --user kaggle<\/pre>\n<p>Then, you\u2019ll need to create an API token for authentication. Go to Kaggle\u2019s webpage, click your profile icon in the top right corner, and go to Account.<\/p>\n<div id=\"attachment_13544\" style=\"width: 287px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_account.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13544\" loading=\"lazy\" class=\"wp-image-13544\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_account.png\" alt=\"\" width=\"277\" height=\"373\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_account.png 521w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/kaggle_account-223x300.png 223w\" sizes=\"(max-width: 277px) 100vw, 277px\"><\/a><\/p>\n<p id=\"caption-attachment-13544\" class=\"wp-caption-text\">Going to Kaggle Account Settings<\/p>\n<\/div>\n<p>From there, scroll down and click Create New API Token:<\/p>\n<div id=\"attachment_13542\" style=\"width: 698px\" class=\"wp-caption aligncenter\">\n<a 
href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/create_api_token.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13542\" loading=\"lazy\" class=\"wp-image-13542\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/create_api_token.png\" alt=\"\" width=\"688\" height=\"444\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/create_api_token.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/create_api_token-300x193.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/create_api_token-768x495.png 768w\" sizes=\"(max-width: 688px) 100vw, 688px\"><\/a><\/p>\n<p id=\"caption-attachment-13542\" class=\"wp-caption-text\">Generating New API Token for Kaggle Public API<\/p>\n<\/div>\n<p>This downloads a <code>kaggle.json<\/code> file that the Kaggle CLI tool uses to authenticate you. The CLI only finds it if it is in the correct location: on Linux, Mac, and other Unix-based systems, place it at <code>~\/.kaggle\/kaggle.json<\/code>; on Windows, place it at <code>C:\\Users\\&lt;Windows-username&gt;\\.kaggle\\kaggle.json<\/code>. If the file is in the wrong location, running <code>kaggle<\/code> on the command line gives an error:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">OSError: Could not find kaggle.json. 
Make sure it\u2019s location in \u2026 Or use the environment method<\/pre>\n<p>Now, let\u2019s get started on downloading those datasets!<\/p>\n<p>To search for datasets using a search term, e.g., titanic, we can use:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">kaggle datasets list -s titanic<\/pre>\n<p>Searching for titanic, we get:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">$ kaggle datasets list -s titanic\r\nref                                                          title                                           size  lastUpdated          downloadCount  voteCount  usabilityRating\r\n-----------------------------------------------------------  ---------------------------------------------  -----  -------------------  -------------  ---------  ---------------\r\ndatasets\/heptapod\/titanic                                    Titanic                                         11KB  2017-05-16 08:14:22          37681        739  0.7058824\r\ndatasets\/azeembootwala\/titanic                               Titanic                                         12KB  2017-06-05 12:14:37          13104        145  0.8235294\r\ndatasets\/brendan45774\/test-file                              Titanic dataset                                 11KB  2021-12-02 16:11:42          19348        251  1.0\r\ndatasets\/rahulsah06\/titanic                                  Titanic                                         34KB  2019-09-16 14:43:23           3619         43  0.6764706\r\ndatasets\/prkukunoor\/TitanicDataset                           Titanic                                        135KB  2017-01-03 22:01:13           4719         24  0.5882353\r\ndatasets\/hesh97\/titanicdataset-traincsv                      Titanic-Dataset (train.csv)                     22KB  2018-02-02 04:51:06          54111        377  0.4117647\r\ndatasets\/fossouodonald\/titaniccsv                            Titanic csv                                      1KB  2016-11-07 
09:44:58           8615         50  0.5882353\r\ndatasets\/broaniki\/titanic                                    titanic                                        717KB  2018-01-30 04:08:45           8004        128  0.1764706\r\ndatasets\/pavlofesenko\/titanic-extended                       Titanic extended dataset (Kaggle + Wikipedia)  134KB  2019-03-06 09:53:24           8779        130  0.9411765\r\ndatasets\/jamesleslie\/titanic-cleaned-data                    Titanic: cleaned data                           36KB  2018-11-21 11:50:18           4846         53  0.7647059\r\ndatasets\/kittisaks\/testtitanic                               test titanic                                    22KB  2017-03-13 15:13:12           1658         32  0.64705884\r\ndatasets\/yasserh\/titanic-dataset                             Titanic Dataset                                 22KB  2021-12-24 14:53:06           1011         25  1.0\r\ndatasets\/abhinavralhan\/titanic                               titanic                                         22KB  2017-07-30 11:07:55            628         11  0.8235294\r\ndatasets\/cities\/titanic123                                   Titanic Dataset Analysis                        22KB  2017-02-07 23:15:54           1585         29  0.5294118\r\ndatasets\/brendan45774\/gender-submisson                       Titanic: all ones csv file                      942B  2021-02-12 19:18:32            459         34  0.9411765\r\ndatasets\/harunshimanto\/titanic-solution-for-beginners-guide  Titanic Solution for Beginner's Guide           34KB  2018-03-12 17:47:06           1444         21  0.7058824\r\ndatasets\/ibrahimelsayed182\/titanic-dataset                   Titanic dataset                                  6KB  2022-01-27 07:41:54            334          8  1.0\r\ndatasets\/sureshbhusare\/titanic-dataset-from-kaggle           Titanic DataSet from Kaggle                     33KB  2017-10-12 04:49:39           2688         27  
0.4117647\r\ndatasets\/shuofxz\/titanic-machine-learning-from-disaster      Titanic: Machine Learning from Disaster         33KB  2017-10-15 10:05:34           3867         55  0.29411766\r\ndatasets\/vinicius150987\/titanic3                             The Complete Titanic Dataset                   277KB  2020-01-04 18:24:11           1459         23  0.64705884<\/pre>\n<p>To download the first dataset in that list, we can use:<\/p>\n<pre class=\"urvanov-syntax-highlighter-plain-tag\">kaggle datasets download -d heptapod\/titanic --unzip<\/pre>\n<p>Using a Jupyter notebook to read the file, similar to the Kaggle notebook example, gives us:<\/p>\n<div id=\"attachment_13543\" style=\"width: 838px\" class=\"wp-caption aligncenter\">\n<a href=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/jupyter_titanic.png\"><img decoding=\"async\" aria-describedby=\"caption-attachment-13543\" loading=\"lazy\" class=\"wp-image-13543\" src=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/jupyter_titanic.png\" alt=\"\" width=\"828\" height=\"421\" srcset=\"https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/jupyter_titanic.png 1402w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/jupyter_titanic-300x152.png 300w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/jupyter_titanic-1024x520.png 1024w, https:\/\/machinelearningmastery.com\/wp-content\/uploads\/2022\/04\/jupyter_titanic-768x390.png 768w\" sizes=\"(max-width: 828px) 100vw, 828px\"><\/a><\/p>\n<p id=\"caption-attachment-13543\" class=\"wp-caption-text\">Using Titanic Dataset in Jupyter Notebook<\/p>\n<\/div>\n<p>Of course, some datasets are so large in size that you may not want to keep them on your own disk. 
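<\/p>
<p>If you do download a dataset locally, it helps to check how much disk space the extracted files actually take up before loading them. The snippet below is a minimal sketch using only the Python standard library. It assumes you ran the download with the <code>-p<\/code> option of the CLI so that the files were extracted into a local folder (here a hypothetical folder named <code>data<\/code>, e.g. via <code>kaggle datasets download -d heptapod\/titanic -p data --unzip<\/code>). The helpers <code>summarize_download<\/code> and <code>human_size<\/code> are illustrative names, not part of the Kaggle API. The script lists every extracted file with its size, largest first.<\/p>

```python
from pathlib import Path


def summarize_download(folder):
    # Hypothetical helper: walks a local folder that the Kaggle CLI extracted
    # into and returns (relative_path, size_in_bytes) pairs, largest first.
    # It does not call the Kaggle API.
    root = Path(folder)
    files = [
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob('*')
        if path.is_file()
    ]
    # Sort by size in descending order so the biggest files come first.
    return sorted(files, key=lambda item: item[1], reverse=True)


def human_size(num_bytes):
    # Hypothetical helper: formats a byte count with 1024-based units,
    # e.g. 2048 -> '2.0 KB'. Anything of 1024 MB or more is shown in GB.
    size = float(num_bytes)
    for unit in ('B', 'KB', 'MB', 'GB'):
        if size < 1024 or unit == 'GB':
            return f'{size:.1f} {unit}'
        size /= 1024


if __name__ == '__main__':
    # Assumes an earlier run of:
    #   kaggle datasets download -d heptapod/titanic -p data --unzip
    download_dir = Path('data')
    if download_dir.is_dir():
        for name, size in summarize_download(download_dir):
            print(f'{human_size(size):>10}  {name}')
    else:
        print('No data folder found; run the kaggle download command first.')
```

<p>Because the helper only walks a local folder, it makes no Kaggle API calls and works the same for any dataset the CLI has extracted.<\/p>
<p>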
Nonetheless, this is one of the free resources provided by Kaggle for your machine learning projects!<\/p>\n<h2>Further Reading<\/h2>\n<p>This section provides more resources if you\u2019re interested in going deeper into the topic.<\/p>\n<ul>\n<li>Kaggle: <a href=\"https:\/\/www.kaggle.com\/\">https:\/\/www.kaggle.com<\/a>\n<\/li>\n<li>Kaggle API documentation: <a href=\"https:\/\/www.kaggle.com\/docs\/api\">https:\/\/www.kaggle.com\/docs\/api<\/a>\n<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this tutorial, you learned what Kaggle is, how to use it to find datasets, and how to get free GPU\/TPU time in Kaggle Notebooks. You\u2019ve also seen how to use the Kaggle API\u2019s CLI tool to download datasets for use in your local environment.<\/p>\n<p>Specifically, you learned:<\/p>\n<ul>\n<li>What Kaggle is<\/li>\n<li>How to use Kaggle Notebooks along with their GPU\/TPU accelerators<\/li>\n<li>How to use Kaggle datasets in Kaggle Notebooks or download them using Kaggle\u2019s CLI tool<\/li>\n<\/ul>\n<p>The post <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/using-kaggle-in-machine-learning-projects\/\">Using Kaggle in Machine Learning Projects<\/a> appeared first on <a rel=\"nofollow\" href=\"https:\/\/machinelearningmastery.com\/\">Machine Learning Mastery<\/a>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Author: Zhe Ming Chng You\u2019ve probably heard of Kaggle data science competitions, but did you know that Kaggle has many other features that can help [&hellip;] <span class=\"read-more-link\"><a class=\"read-more\" href=\"https:\/\/www.aiproblog.com\/index.php\/2022\/05\/02\/using-kaggle-in-machine-learning-projects\/\">Read 
More<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":5600,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[24],"tags":[],"_links":{"self":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5599"}],"collection":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/comments?post=5599"}],"version-history":[{"count":0,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/posts\/5599\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media\/5600"}],"wp:attachment":[{"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/media?parent=5599"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/categories?post=5599"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aiproblog.com\/index.php\/wp-json\/wp\/v2\/tags?post=5599"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}