
In this short tutorial, I illustrate a complete data analysis process using the scikit-learn Python library. The process includes
The code for this tutorial can be downloaded from my GitHub repository.
Firstly, I load the dataset through the Python pandas library. I use the heart.csv dataset, available on Kaggle.
import pandas as pd

df = pd.read_csv('source/heart.csv')
df.head()

R is a very popular software environment for statistical computing and graphics. It provides many packages that can also be used for data science, especially for data analysis.
This article belongs to the series R for Beginners, which aims to help beginners get started with R. In my previous article, I dealt with vectors. In this article, I deal with matrices and, in particular, I focus on the following aspects:
A matrix is a multidimensional…

Many JavaScript libraries exist to build and animate maps, such as Leaflet.js and Highcharts. In this article I use the very famous Data-Driven Documents (D3) library (version 5), which is more than a simple graph library.
D3 is a JavaScript library for manipulating documents based on data.
In this tutorial I will build a choropleth map which shows the population of each country of the world. I have modified the original code, adapting it to D3 v5 and enriching it with interactivity and annotations.
The full code can be downloaded from my Github Repository.
Firstly, I…
Recently, the Observablehq team released a new feature which makes it possible to import SQLite databases into a notebook. This feature is very powerful, since it lets you query the dataset dynamically through classical SQL syntax. The original tutorial provided by Mike Bostock is available at this link.
In this tutorial, I use the new sqlite3 feature to build a simple bar chart which updates dynamically according to the user's selection.
As an example dataset, I use the Generic Food Database, provided by data.world and available at this link. The following table shows a snapshot of the Generic Food Database:

In the early 2000s, one of the most popular topics was the Semantic Web. The Semantic Web, also known as the Web of Data or Web 3.0, tried to give structure to the content of Web pages so that it was understandable not only by humans but also by machines.
The Semantic Web is the Web of Data, in contrast to the previous versions of the Web, which were the Web of documents.
The main technologies associated with the Semantic Web are:

Almost all Data Scientists working in Python know the pandas library, and almost all of them know the read_csv() function. However, only a few of them know the read_html() function.
The read_html() function makes it possible to extract tables contained in HTML pages very quickly. The basic version of this function extracts all the tables contained in the HTML page, while some specific parameters allow the extraction of one particular table.
In this tutorial, I focus on the following HTML page, containing the groups of the Euro 2020 football competition:
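As a minimal sketch of how read_html() behaves, the example below parses an inline HTML snippet rather than the actual Euro 2020 page (the table contents here are illustrative, not the real groups; note that read_html() also needs an HTML parser such as lxml installed):

```python
from io import StringIO
import pandas as pd

# a tiny HTML page containing one table (illustrative data)
html = """
<table>
  <tr><th>Group</th><th>Team</th></tr>
  <tr><td>A</td><td>Italy</td></tr>
  <tr><td>A</td><td>Switzerland</td></tr>
</table>
"""

# read_html() returns a list of DataFrames, one per <table> found in the page
tables = pd.read_html(StringIO(html))
df = tables[0]
```

When a page contains several tables, the `match` parameter can be used to keep only the tables whose text matches a given string or regex.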

Some months ago, I wrote an article describing the full process of building a SARIMA model for time series forecasting. In that article, I explained how to tune the p, d, and q orders of a SARIMA model, and I evaluated the performance of the trained model in terms of NRMSE.
One comment on that article pointed out that the proposed model was basically an ARIMA model, since it did not consider the seasonal order. I thanked the comment's author and investigated this aspect.
And now I am here to explain an interesting aspect that I discovered and…

In this tutorial, I illustrate how to implement a classification model using scikit-learn's KNeighborsClassifier. The full code is implemented as a Jupyter Notebook and can be downloaded from my GitHub repository.
As an example dataset, I use the Titanic dataset provided in the Kaggle challenge Titanic — Machine Learning from Disaster. The objective of this challenge is to build a model that predicts whether a passenger survived the Titanic disaster, given some of the passenger's features.
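The overall shape of such a model can be sketched as below. Since the Titanic files must be downloaded from Kaggle, this sketch uses the iris dataset as a stand-in; the number of neighbours is an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# stand-in dataset: iris instead of the Titanic files
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# fit a K-nearest-neighbours classifier and score it on held-out data
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
```

The same fit/score pattern applies to the Titanic data once its categorical features have been encoded numerically.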
The dataset is composed of three files:

It may happen that you need only some rows of your Python dataframe. You can achieve this result through different sampling techniques.
In this tutorial, I illustrate the following techniques to perform row sampling with Python pandas:
The full code can be downloaded from my GitHub repository.
In this tutorial, I use the iris dataset, provided by the scikit-learn library, and I convert it to a pandas dataframe:
from sklearn.datasets import load_iris
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
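For instance, a few common row-sampling techniques look like this (a sketch; the row counts and random seed are illustrative, and the dataframe setup is repeated so the snippet stands alone):

```python
import pandas as pd
from sklearn.datasets import load_iris

# rebuild the iris dataframe (150 rows)
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# a fixed number of random rows
sample_n = df.sample(n=10, random_state=1)

# a random fraction of rows (10% of 150 rows -> 15 rows)
sample_frac = df.sample(frac=0.1, random_state=1)

# systematic sampling: every 15th row
every_15th = df.iloc[::15]
```

Passing random_state makes the random samples reproducible across runs.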

Observablehq is a very popular notebook environment for writing code with the D3.js library. Thanks to the many examples and tutorials available on the Web, you can fork already-built notebooks and customise them for your needs.
However, once you have built a graph, it is not immediately easy to embed it into another Web site.
In this tutorial, I propose two strategies to embed a graph into a Web site:
In both cases, firstly, you must publish your notebook by clicking the publish button.
In addition, in both cases, you should follow the following…
I’m a computer scientist with experience in the fields of Web applications, Data Science, Data Journalism, Blockchain, and the Semantic Web.