Monthly Summary

A Summary of My June Articles… in Case You Have Missed Them

A quick recap of the articles I published in June, from Data Analysis to Data Visualisation and Environment Setup.

Angelica Lo Duca
4 min read · Jul 3, 2021


Image by Gerd Altmann from Pixabay

The articles I published in June can be classified into the following groups:

  • Data and Text Analysis
  • Data Visualisation
  • Environment Setup
  • Data Science Basics

In this article, I give a quick recap of my June articles. I also include links to the original articles, in case you want to explore a topic in more depth.

1 Data and Text Analysis

Regarding Data Analysis, I focussed on Overfitting and Automated Machine Learning (AutoML), and I also proposed a complete Data Analysis workflow with PyCaret in Python.

1.1 Overfitting

Overfitted models are those that perform very well on the training set but very poorly on the test set, so they produce poor predictions on new data.

In my article, I proposed a strategy to check whether a model is overfitted and showed how to mitigate the problem when it is.
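As a rough illustration of the check described above, the sketch below compares the accuracy on the training set with the accuracy on the test set. It is only a toy example built on the Python standard library, not the code from the original article: the `OneNearestNeighbour` class, the `is_overfitted` helper, the `max_gap` threshold of 0.1 and the synthetic noisy data are all assumptions made for illustration.

```python
import random


class OneNearestNeighbour:
    """Toy classifier that memorises the training set (prone to overfitting)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        # Return the label of the closest memorised training point.
        return [
            self.y[min(range(len(self.X)), key=lambda i: abs(self.X[i] - x))]
            for x in X
        ]


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def is_overfitted(train_score, test_score, max_gap=0.1):
    # Hypothetical rule of thumb: flag the model when the training score
    # exceeds the test score by more than max_gap.
    return (train_score - test_score) > max_gap


def make_noisy_data(n, rng, noise=0.3):
    # The label is 1 when x > 0.5, then flipped with probability `noise`.
    X = [rng.random() for _ in range(n)]
    y = [int(x > 0.5) ^ int(rng.random() < noise) for x in X]
    return X, y


rng = random.Random(0)
X_train, y_train = make_noisy_data(200, rng)
X_test, y_test = make_noisy_data(200, rng)

model = OneNearestNeighbour().fit(X_train, y_train)
train_acc = accuracy(y_train, model.predict(X_train))
test_acc = accuracy(y_test, model.predict(X_test))

print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
print("overfitted?", is_overfitted(train_acc, test_acc))
```

Because the toy model memorises every training point, it scores perfectly on the training set, and the gap to the test score is what signals overfitting. With a real library, the same comparison can be made on the scores returned for the two sets.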

1.2 AutoML

AutoML makes it possible to find the best model for a given problem automatically. In practice, AutoML selects the best model, with tuned parameters, from a set of predefined models.

Many libraries exist to perform AutoML in Python. In my article, I compared two libraries: Hyperopt Sklearn and TPOT.
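To give an idea of what "selecting the best model with tuned parameters" means, here is a minimal sketch of the underlying idea in plain Python. It does not use Hyperopt Sklearn or TPOT and does not reflect their APIs: the two toy models, the `select_best` function and the tiny dataset are hypothetical and serve only as an illustration.

```python
from itertools import product


class MeanRegressor:
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class KNNRegressor:
    """Predicts the average target of the k closest training points."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            order = sorted(range(len(self.X)), key=lambda i: abs(self.X[i] - x))
            nearest = order[: self.k]
            preds.append(sum(self.y[i] for i in nearest) / len(nearest))
        return preds


def mse(y_true, y_pred):
    # Mean squared error: lower is better.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best(search_space, X_train, y_train, X_val, y_val):
    # Try every model with every parameter combination and keep the one
    # with the lowest validation error.
    best = None
    for model_cls, grid in search_space:
        names = list(grid)
        for values in product(*(grid[n] for n in names)):
            params = dict(zip(names, values))
            model = model_cls(**params).fit(X_train, y_train)
            error = mse(y_val, model.predict(X_val))
            if best is None or error < best["error"]:
                best = {"model": model_cls.__name__, "params": params, "error": error}
    return best


# Tiny synthetic problem: y = 2 * x.
X_train = [float(x) for x in range(10)]
y_train = [2 * x for x in X_train]
X_val = [1.2, 4.6, 8.3]
y_val = [2 * x for x in X_val]

search_space = [
    (MeanRegressor, {}),              # no parameters to tune
    (KNNRegressor, {"k": [1, 2, 3]}), # one tunable parameter
]

best = select_best(search_space, X_train, y_train, X_val, y_val)
print(best)
```

Real AutoML libraries search much larger spaces and use smarter strategies than this exhaustive loop, but the principle of comparing candidate models and parameter settings on held-out data is the same.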

1.3 Complete Data Analysis Workflow

Following the great success of my previous article on a complete Data Analysis workflow with scikit-learn, I wrote a similar article that implements the same workflow with PyCaret, a Python library for Machine Learning.

1.4 Text Analysis

You often need to extract information from unstructured text. The spaCy Python library for Text Analysis can help with this.

I proposed a tutorial on how to use the spaCy library to build a dataset from a text.

2 Data Visualisation

A good data visualisation is one of the hardest things to achieve in the Data Science workflow. Everyone is able to build a data visualisation, but only a small number of data scientists can build a good one.

For this reason, in June I started a series on Data Visualisation principles. So far, I have covered the following aspects:

  • white spaces
  • text
  • colour
  • layout
  • emphasis

You can read Part 1 and Part 2 of the series, as well as a practical example in Python with Altair:

In addition, I have collected some articles from the Web that deal with Data Visualisation. I recommend reading them, because they will enrich your knowledge :)

3 Environment Setup

Every data scientist should work in a comfortable environment, both physically and virtually. For this reason, I proposed some articles on how to set up your virtual work environment in the best way.

3.1 Jupyter

I mainly work with Jupyter notebooks, so I illustrated how to install Jupyter on an Android device, as well as how to run R scripts in Jupyter.

3.2 Python

I investigated some strategies to improve Python coding. I found virtualenv fantastic to use, and I also obtained an improvement in computation speed through PySpark.

4 Data Science Basics

Finally, I wrote some articles for beginners, including an introduction to Descriptive Analytics, some basic data structures in R and the top 25 libraries that every data scientist should know.

Last but not least, I explained how to get started with D3.js maps.

Summary

In this article, I have given a quick summary of the articles I published in June. If you want to stay up-to-date, you can follow me and read my new publications.

Stay tuned :)

If you want to be updated on my research and other activities, you can follow me on Twitter, YouTube and GitHub.


Angelica Lo Duca

Researcher | +50k monthly views | I write on Data Science, Python, Tutorials, and, occasionally, Web Applications | Book Author of Comet for Data Science