Monthly Summary

A Summary of My June Articles…in Case you Have Missed Them

A quick recap of articles that I published in June: from Data Analysis, to Data Visualisation, up to Environment Setup.

Jul 3, 2021

The articles I published in June fall into the following groups:

Data and Text Analysis
Data Visualisation
Environment Setup
Data Science Basics

In this article I give a quick recap of my June articles. I also include links to the original articles, in case you want to explore a topic in more depth.

1 Data and Text Analysis

Regarding Data Analysis, I focussed on Overfitting and Automated Machine Learning (AutoML), and I proposed a complete Data Analysis workflow with the Python library PyCaret.

1.1 Overfitting

Overfitted models perform very well on the training set but poorly on the test set, so they produce poor predictions on unseen data.

In my article I proposed a strategy to check whether a model is overfitted, and described how to mitigate the problem when it is.

How to Check if a Classification Model is Overfitted using scikit-learn

towardsdatascience.com
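
As a minimal sketch of the train-versus-test comparison described above, the following pure-Python example builds a 1-nearest-neighbour classifier that memorises a tiny, hand-made dataset. It is not the scikit-learn strategy from the linked article: the data, the function names (predict_1nn, accuracy, looks_overfitted) and the 0.1 gap threshold are all assumptions chosen for illustration.

```python
# Toy 1-D dataset (hypothetical values). Two training labels are deliberately
# noisy (x=3 and x=8), so a model that memorises them generalises badly.
train_X = [1, 2, 3, 4, 6, 7, 8, 9]
train_y = [0, 0, 1, 0, 1, 1, 0, 1]
test_X = [1.4, 2.6, 3.4, 4.4, 6.4, 7.4, 8.4, 9.4]
test_y = [0, 0, 0, 0, 1, 1, 1, 1]


def predict_1nn(x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[nearest]


def accuracy(X, y):
    """Fraction of points whose predicted label matches the true label."""
    hits = sum(predict_1nn(x) == label for x, label in zip(X, y))
    return hits / len(y)


def looks_overfitted(train_acc, test_acc, max_gap=0.1):
    """Flag a model whose training accuracy exceeds test accuracy by more than max_gap.

    The 0.1 default is an arbitrary, illustrative threshold.
    """
    return (train_acc - test_acc) > max_gap


train_acc = accuracy(train_X, train_y)
test_acc = accuracy(test_X, test_y)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print("overfitted?", looks_overfitted(train_acc, test_acc))
```

On this toy data the model scores perfectly on the training set and noticeably worse on the test set, which is the signature of overfitting.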

1.2 AutoML

AutoML makes it possible to find the best model for a given problem automatically. In practice, AutoML selects the best model, with tuned parameters, from a set of predefined models.

Many libraries exist to perform AutoML in Python. In my article, I compared two libraries: Hyperopt Sklearn and TPOT.

AutoML in Python: A comparison between Hyperopt Sklearn and TPOT

Pros and Cons of the two popular AutoML libraries for Python

towardsdatascience.com

1.3 Complete Data Analysis Workflow

Following the great success of my previous article on a complete Data Analysis workflow with scikit-learn, I wrote a similar article that implements the same workflow with PyCaret, a Python library for Machine Learning.

A Complete Data Analysis Workflow in Python PyCaret

A ready-to-run tutorial exploiting the best library for Machine Learning that I have ever used.

towardsdatascience.com

1.4 Text Analysis

You often need to extract information from unstructured text. The SpaCy Python library for Text Analysis can help with this.

I wrote a tutorial on how to use the SpaCy library to build a dataset from a text.

How to Extract Structured Information from a Text through Python SpaCy

A ready-to-run tutorial on how to build a structured dataset from a text.

towardsdatascience.com

2 Data Visualisation

A good Data Visualisation is one of the hard parts of the Data Science workflow. Anyone can build a data visualisation, but only a small number of data scientists can build a good one.

For this reason, in June I started a series on Data Visualisation principles. So far I have covered the following aspects:

white space
text
colour
layout
emphasis

You can read Part 1 and Part 2 of the series, as well as a practical example in Python Altair:

Data Visualisation Principles Part 1: White Space, Text and Colour

Getting started with basic Graphic Design principles.

towardsdatascience.com

Data Visualisation Principles Part 1 — A Practical Example in Altair

A practical tutorial on how to build, customise and add annotations to a simple bar chart in Python Altair

towardsdatascience.com

Data Visualisation Principles Part 2: Layout and Emphasis

Getting started with basic Graphic Design principles.

towardsdatascience.com

In addition, I have collected some articles from the Web on Data Visualisation. I recommend reading them, because they are enriching :)

Some Interesting Articles and Resources on Data Visualisation that I Discovered in June

A summary of some up-to-date interesting articles and resources collected from the Web regarding Data Visualisation.

towardsdatascience.com

3 Environment Setup

Every data scientist should work in a comfortable environment, both physical and virtual. For this reason, I wrote some articles on how to set up your virtual work environment in the best way.

3.1 Jupyter

I mainly work with Jupyter notebooks, so I showed how to install Jupyter on an Android device, as well as how to run R scripts in Jupyter.

How to Install Python and Jupyter Notebook onto an Android Device

towardsdatascience.com

How to Run R scripts in Jupyter

A short tutorial on how to install the R Kernel and run it in Jupyter

towardsdatascience.com

3.2 Python

I investigated some strategies to improve Python coding. I found virtualenv fantastic to use, as well as the improvement in computation speed that PySpark provides.

Have you ever thought about using Python virtualenv?

A practical guide to install and use Python virtualenv both in terminal and in Jupyter notebooks.

towardsdatascience.com

How to Speed Up Your Python Code through PySpark

A tutorial on how to install and run Apache Spark and PySpark to improve the performance of your code.

towardsdatascience.com

4 Data Science Basics

Finally, I wrote some articles for beginners, including an introduction to Descriptive Analytics, some basic data structures in R and the top 25 Python libraries that every data scientist should know.

A Gentle Introduction to Descriptive Analytics

Some basic concepts regarding central tendency, frequency and dispersion indexes.

medium.com
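
To make the three families of indexes mentioned above concrete, here is a small sketch that uses only the Python standard library. The sample values are hypothetical and the choice of indexes (mean, median, mode, frequency counts, range, population variance and standard deviation) is an assumption for illustration, not necessarily the set covered in the linked article.

```python
from collections import Counter
import statistics

values = [2, 4, 4, 5, 7, 8]  # hypothetical sample

# Central tendency: where the data is centred
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Frequency: how often each value occurs
frequency = Counter(values)

# Dispersion: how spread out the data is
value_range = max(values) - min(values)
variance = statistics.pvariance(values)  # population variance
std_dev = statistics.pstdev(values)      # population standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("frequency:", dict(frequency))
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
```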

R for Beginners — Part 1: Data Structures

An interactive article to learn the R language in a very simple way.

towardsdatascience.com

The Top 25 Python libraries for Data Science

A list of the Python libraries that you should try at least once in your life.

medium.com

Last but not least, I explained how to get started with D3.js maps.

Getting Started with D3.js Maps

alod83.medium.com

Summary

In this article, I have given a quick summary of the articles I published in June. If you want to stay up to date, you can follow me and read my new publications.

Stay tuned :)

If you want to stay updated on my research and other activities, you can follow me on Twitter, YouTube and GitHub.

Written by Angelica Lo Duca