Lesson 7 - Linear regression

In this tutorial, you'll learn how to run a basic multiple linear regression on tables in your KgBase project. This feature lets you specify one or more input variables (predictor variables) and a single output variable (the outcome variable).


Step 1

Import an example dataset for linear regression. If you don't have suitable data, you can try this feature using our car prices example table, which contains 1,000 listings of cars for sale.

Step 2

Start by uploading this dataset into KgBase. Create a new project named "Car sales", click the cogs icon in the lower-right corner, and pick "Import CSV".

Step 3

To use linear regression, the columns involved need to be converted to numbers. To change a column's format to a number, select the "Edit" option in the column header.

Step 4

In the edit dialog that opens, select the "Number" option for the "Type" field.

Repeat this process for the other numeric columns: price, yearOfRegistration, powerPS, and kilometer.
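To make the type conversion concrete, here is a minimal Python sketch of what turning text columns into numbers amounts to. It is not how KgBase performs the conversion internally. The sample rows, the `to_number` helper, and the `convert_numeric_columns` helper are hypothetical and exist only for illustration. Only the column names come from this tutorial.

```python
import csv
import io

# Columns this tutorial converts to the "Number" type.
NUMERIC_COLUMNS = ["price", "yearOfRegistration", "powerPS", "kilometer"]

# Made-up rows for illustration only; not taken from the example table.
SAMPLE_CSV = """name,price,yearOfRegistration,powerPS,kilometer
Car A,4500,2008,105,125000
Car B,12900,2014,150,60000
Car C,,2003,75,150000
"""


def to_number(text):
    """Return text parsed as a float, or None if it is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def convert_numeric_columns(rows, columns):
    """Return copies of the rows with the given columns parsed as numbers."""
    return [
        {key: to_number(value) if key in columns else value
         for key, value in row.items()}
        for row in rows
    ]


# A CSV import yields every cell as text; convert the numeric columns.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
converted = convert_numeric_columns(rows, NUMERIC_COLUMNS)
```

In this sketch a blank cell becomes `None` rather than 0, so a missing price is not mistaken for a free car. How KgBase treats blank cells is not covered by this tutorial.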

Step 5

In the sidebar on the left, expand the last item, "Regression". Here you can select multiple input variables. We'll select yearOfRegistration, kilometer, and powerPS. These are the variables that are likely to affect the price of a vehicle.

For the output variable, select price. This is the value we're trying to predict.
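With these selections, a standard multiple linear regression models the predicted price as a weighted sum of the three inputs. The coefficient symbols below are notation introduced here for illustration, and the intercept term is an assumption, since this tutorial does not state whether KgBase fits one:

```latex
\widehat{\text{price}} = \beta_0
  + \beta_1 \cdot \text{yearOfRegistration}
  + \beta_2 \cdot \text{kilometer}
  + \beta_3 \cdot \text{powerPS}
```

Fitting the regression means choosing the coefficients that minimize the squared difference between the actual and predicted prices across all rows.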

Step 6

Finally, click the "Run regression" button. A new column will appear next to the price column: a number labelled price (predicted).

This is the output of a linear regression model trained on all the data in the dataset and evaluated on the input variables of the current row.
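The idea of fitting on every row and then predicting each row from its own inputs can be sketched outside KgBase with NumPy. This is an ordinary least-squares sketch, not KgBase's actual implementation. The listings are made up, the function names `fit_linear_regression` and `predict` are hypothetical, and including an intercept is an assumption.

```python
import numpy as np

# Made-up listings for illustration only; not rows from the example table.
# Columns: yearOfRegistration, kilometer, powerPS
inputs = np.array([
    [2008, 125000, 105],
    [2014, 60000, 150],
    [2003, 150000, 75],
    [2011, 90000, 120],
    [2016, 30000, 190],
    [2006, 140000, 90],
], dtype=float)
price = np.array([4500, 12900, 1800, 7900, 21500, 3200], dtype=float)


def fit_linear_regression(X, y):
    """Return least-squares coefficients, intercept first."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients


def predict(coefficients, X):
    """Apply fitted coefficients to each row of X."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coefficients


# Train on all rows, then compute one prediction per row,
# analogous to the "price (predicted)" column.
coefficients = fit_linear_regression(inputs, price)
price_predicted = predict(coefficients, inputs)
```

Because the model is trained on the same rows it predicts, the predicted values describe how well the line fits the existing data rather than how it would perform on unseen listings.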