Survey

Engaging you to understand what you like.

Photo by Clay Banks on Unsplash

I have always enjoyed learning new things and then teaching them to others.

Therefore, I would love to hear what you would like to read in my articles that I have not covered yet.

If possible, please comment on this article and tell me what you would like to read and learn. As far as I can, I will try to cover it.

The general topic is Data Science. Possible topics include, but are not limited to:

  • streamlit
  • statistics
  • seaborn
  • data cleaning
  • R
  • linear regression
  • flask


Monthly Summary

Here is a quick recap of the articles I wrote in July

Photo by JESHOOTS.COM on Unsplash

The articles I published in July can be classified into the following groups:

  • Discussions
  • Data Structures
  • Data Collection
  • Data Analysis
  • Data Visualisation
  • Other Topics

In this article I give a quick recap of my July articles. I also include links to the original articles, in case you want to explore a topic further.

1 Discussions

1.1 Adversarial Machine Learning

Adversarial Machine Learning is a technique that tries to modify an existing Machine Learning model in order to introduce errors in its predictions.

An adversary can attack an ML model at two levels:

  • Training: the attacker tries to perturb the model or the dataset at…


Data Analysis

Ready-to-run code covering preprocessing, parameter tuning, and model training and evaluation.

Image by Buffik from Pixabay

In this short tutorial I illustrate a complete data analysis process that uses the scikit-learn Python library. The process includes:

  • preprocessing, which includes feature selection, normalization and balancing
  • model selection with parameter tuning
  • model evaluation

The code for this tutorial can be downloaded from my GitHub repository.

Load Dataset

First, I load the dataset with the Python pandas library. I use the heart.csv dataset, available on Kaggle.

import pandas as pd
df = pd.read_csv('source/heart.csv')
df.head()
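
The three steps listed above (preprocessing, model selection with parameter tuning, evaluation) can be sketched end to end with scikit-learn. This is a minimal sketch and not the tutorial's own code: it assumes a synthetic dataset from make_classification in place of heart.csv, and the choices of SelectKBest, StandardScaler, LogisticRegression with class_weight="balanced" (as a simple stand-in for balancing) and the parameter grid are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced stand-in for heart.csv (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing and model chained in one pipeline, so that every
# cross-validation fold fits the preprocessing on its own training part.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),   # feature selection
    ("scale", StandardScaler()),                     # normalization
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Parameter tuning: hypothetical grid over the number of kept features
# and the regularization strength.
param_grid = {"select__k": [3, 5, 10], "model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluation on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the preprocessing inside the pipeline is a design choice: it prevents information from the validation folds leaking into feature selection and scaling during the grid search.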


Data Manipulation

Ready-to-run code with some tricks for manipulating a Python pandas DataFrame using SQL queries.

Photo by Michael Dziedzic on Unsplash

In this tutorial, I illustrate some tricks for manipulating a Python pandas DataFrame using SQL queries. In detail, I cover the following topics:

  • Missing Values (removal and replacement)
  • Dataframe Ordering
  • Dropping Duplicates
  • Merging two DataFrames (Union and Intersection)

To query a pandas DataFrame with SQL, I use the sqldf Python library, which can be installed with the following command: pip install sqldf.

Load Dataset

I import the pandas library and read a simple dataset that contains, for each country, its capital and a generic field called Value.

import pandas as pd
df = pd.read_csv('../../Datasets/capitals1.csv')
df.head()
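
To give a flavour of the topics listed above, here is a minimal sketch of running SQL against a DataFrame. It does not use sqldf, whose calls are not shown in this excerpt: as a swapped-in technique, it loads the DataFrame into an in-memory SQLite database through the standard-library sqlite3 module. The sample rows, the table name capitals and the query are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for capitals1.csv: one duplicated row, one missing Value.
df = pd.DataFrame({
    "Country": ["Italy", "France", "France", "Spain"],
    "Capital": ["Rome", "Paris", "Paris", "Madrid"],
    "Value": [10.0, 20.0, 20.0, None],
})

# Copy the DataFrame into an in-memory SQLite table named "capitals".
conn = sqlite3.connect(":memory:")
df.to_sql("capitals", conn, index=False)

# One query covering three of the topics:
#   DISTINCT -> dropping duplicates
#   COALESCE -> replacing missing values (here with 0)
#   ORDER BY -> ordering
query = """
    SELECT DISTINCT Country, Capital, COALESCE(Value, 0) AS Value
    FROM capitals
    ORDER BY Country
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```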


Information Theory

An overview of one of the emerging research fields in Machine Learning and Artificial Intelligence.

Image by Author

Research on Machine Learning (ML) models has evolved in recent years, leading to the definition of very precise models. In fact, the primary goal of ML researchers has always been to develop ever more accurate models.

As a result, research and development have not focused on the security of these models, leaving open many serious vulnerabilities that could, in theory, cause significant damage to deployed models.

Adversarial Machine Learning is a technique that tries to modify an existing Machine Learning model in order to introduce errors in its predictions.

In this article, I will give an overview of Adversarial ML attacks…


Data Visualisation

A ready-to-run tutorial, which describes how to build an animated line chart using Altair and Streamlit.

Image by Author

Altair is a very popular Python library for data visualisation. With Altair, you can build very complex charts in a few lines of code, since the library follows the guidelines of the Vega-Lite grammar.

Unfortunately, Altair does not natively support animations, because of the complexity of rendering them through Vega-Lite.

In this tutorial, I illustrate a mechanism that combines the power of Streamlit with Altair in order to render an animated line chart.

Streamlit is a very powerful Python library that lets you build web apps in Python with a few lines of code.

Setup

Firstly, I install the required libraries:


Data Analysis

Ready-to-run code that demonstrates how the n_jobs parameter can reduce training time

Image by Author

In this tutorial I illustrate the importance of the n_jobs parameter provided by some classes of the scikit-learn library. The official scikit-learn documentation describes the n_jobs parameter as follows:

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

This means that the n_jobs parameter can be used to distribute the work across all the CPUs available on the local computer.
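
A minimal sketch of such a comparison follows. It is not the tutorial's benchmark: it assumes a synthetic dataset and uses RandomForestClassifier, one of the scikit-learn classes whose fit step is parallelised by n_jobs. The dataset size and number of trees are arbitrary, and the measured times depend entirely on the machine, so no speed-up is guaranteed.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic dataset (assumption): 2000 samples, 20 features.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

timings = {}
for n_jobs in (1, -1):  # 1 = single job, -1 = all available processors
    model = RandomForestClassifier(
        n_estimators=100, n_jobs=n_jobs, random_state=0
    )
    start = time.perf_counter()
    model.fit(X, y)
    timings[n_jobs] = time.perf_counter() - start

for n_jobs, seconds in timings.items():
    print(f"n_jobs={n_jobs}: {seconds:.3f} s")
```

On small datasets the overhead of starting parallel workers can outweigh the gain, so it is worth timing both settings on your own data.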

In this tutorial, I evaluate the time elapsed to fit all the default classification datasets provided by the scikit-learn library, by varying the n_jobs


Python Tricks

Fewer than 10 lines of code to preserve the layout of a text document after a manipulation, such as text anonymisation.

Image by Author

You may need to manipulate a text, for example to anonymise it by removing sensitive data such as names, places and dates.

When the manipulation process is complete, you may need to save the anonymised document while maintaining the layout of the original document.

This process could be useful, for example, if you want to publish the anonymised document on a website or in an archive.

Although many tutorials exist on how to manipulate text, I have not found any complete tutorial on how to export the manipulated text to a document with…


Data Science Teaching

A quick tutorial for beginners to get started with the very popular software for statistics and data analysis.

Image by Author

R is a very popular software environment for statistical computing and graphics. It provides many packages that can also be used for data science, especially for data analysis.

This article belongs to the series R for Beginners, which aims to help beginners get started with R. In my previous article, I dealt with vectors. In this article, I deal with matrices and, in particular, I focus on the following aspects:

  • create a matrix
  • assign names to rows and columns
  • select items
  • expand the matrix with new rows or columns
  • basic statistics.

1 Create a Matrix

A matrix is a multidimensional…


Data Visualisation

A quick tutorial to build an interactive Choropleth map with the popular Javascript library

Image by Author

Many JavaScript libraries exist to build and animate maps, such as Leaflet.js and Highcharts. In this article I use the well-known Data-Driven Documents (D3) library (version 5), which is more than a simple charting library.

D3 is a JavaScript library that lets you manipulate documents based on data.

In this tutorial I will build a choropleth map which shows the population of each country of the world. I have modified the original code, by adapting it to D3 v5 and enriching it with interactivity and annotations.

The full code can be downloaded from my GitHub repository.

Setup

Firstly, I…

Angelica Lo Duca

I’m a computer scientist with experience in the field of Web applications, Data Science, Data Journalism, Blockchain and Semantic Web.
