A Summary of My July Articles: from Python/R Tutorials to Open Discussions
The articles I published in July can be classified in the following groups:
- Data Structures
- Data Collection
- Data Analysis
- Data Visualisation
- Other Topics
In this article, I give a quick recap of my July articles. I also include a link to each original article, in case you want to explore the topic further.
1.1 Adversarial Machine Learning
Adversarial Machine Learning is a technique that tries to tamper with an existing Machine Learning model in order to introduce errors into its predictions.
An adversary can attack an ML model at two levels:
- Training: the attacker tries to perturb the model or the dataset at training time, for example by injecting fake data or modifying data in the dataset;
- Testing (or Inference): this kind of attack is performed when the model has already been trained.
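As a rough illustration of an inference-time attack, the sketch below perturbs one input of an already trained linear classifier so that its prediction flips. It is not taken from the original article: the weights `w`, the bias `b`, the sample `x` and the budget `epsilon` are hand-picked, hypothetical values, and the sign-based perturbation is only one simple way to build such an input.

```python
import numpy as np

# Hypothetical, already "trained" linear classifier: predicts 1 when w.x + b > 0.
w = np.array([1.0, -2.0])
b = 0.5


def predict(x):
    """Return the class (0 or 1) assigned by the linear model."""
    return int(np.dot(w, x) + b > 0)


# A legitimate input that the model assigns to class 1.
x = np.array([1.0, 0.5])

# The attacker moves every feature by at most epsilon in the direction
# that lowers the score (opposite to the sign of each weight).
epsilon = 0.3
x_adv = x - epsilon * np.sign(w)

original_label = predict(x)
adversarial_label = predict(x_adv)

print("original input:", x, "->", original_label)
print("perturbed input:", x_adv, "->", adversarial_label)
```

The model itself is never modified here; only the input changes, which is what distinguishes this kind of attack from a training-time one.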
In order to defend an ML system from Adversarial ML attacks, the following steps should be followed:
- identify the potential vulnerabilities of the ML system
- design and implement the corresponding attacks and evaluate their impact on the system
- propose some countermeasures to protect the ML system against the identified attacks.
Adversarial Machine Learning: Attacks and Possible Defense Strategies
An overview of one of the emerging research fields in Machine Learning and Artificial Intelligence.
1.2 Some considerations on Semantic Web
Before moving into data science, I used to work on the Semantic Web. In this article, I ask whether the Semantic Web is really dead, or whether there is still room for this old research field.
My conclusion is that the Semantic Web, especially the Linked Data initiative, is still alive in the Cultural Heritage sector.
Is the Semantic Web really Dead?
It seems that the Semantic Web has almost gone. Is this true? In this article we try to retrace the history of Semantic…
2 Data Structures
2.1 Learning R: Matrices
I deal with matrices and, in particular, I focus on the following aspects:
- create a matrix
- assign names to rows and columns
- select items
- expand the matrix with new rows or columns
- basic statistics.
R for Beginners — Part 2: Working with Matrices
A quick tutorial for beginners to get started with the very popular software for statistics and data analysis.
2.2 Sampling a Dataframe in Python Pandas
In this tutorial, I illustrate the following techniques for sampling rows with Python Pandas:
- random sampling: given a dataframe with N rows, random sampling extracts X random rows from the dataframe, with X ≤ N.
- sampling with condition: extracts only the rows which satisfy a given condition.
- sampling at a constant rate: extracts rows so that there is a constant distance between two adjacent samples.
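The three techniques above can be sketched as follows. This is a minimal example on a toy dataframe; the column names `id` and `value`, the threshold and the rate are hypothetical choices made only for illustration.

```python
import pandas as pd

# Toy dataframe with N = 10 rows (hypothetical data).
df = pd.DataFrame({"id": range(10), "value": [v * 1.5 for v in range(10)]})

# Random sampling: extract X = 3 random rows (random_state makes it repeatable).
random_rows = df.sample(n=3, random_state=42)

# Sampling with condition: keep only the rows that satisfy a condition.
conditional_rows = df[df["value"] > 6]

# Sampling at a constant rate: keep one row every `rate` rows.
rate = 3
constant_rate_rows = df.iloc[::rate]

print(random_rows)
print(conditional_rows)
print(constant_rate_rows)
```

`df.sample()` also accepts a `frac` argument when you prefer to sample a fraction of the rows instead of a fixed number.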
How to Sample a Dataframe in Python Pandas
A ready-to-run code with different techniques to sample a dataset in Python Pandas
3 Data Collection
3.1 HTML Scraping with Python Pandas
In this tutorial, I describe a simple mechanism to extract tables from HTML pages with Python Pandas. This can be achieved through the read_html() function, which is very simple and fast. In most cases, the scraped tables need some cleaning.
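A minimal sketch of this workflow is shown below. To keep it self-contained, it parses a small inline HTML snippet instead of a real Web page; the table content and the footnote-removal step are invented for illustration. Note that `read_html()` needs an HTML parser such as `lxml` (or `bs4` with `html5lib`) to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML fragment standing in for a downloaded page.
html = """
<table>
  <tr><th>Item</th><th>Quantity</th></tr>
  <tr><td>apple[1]</td><td>3</td></tr>
  <tr><td>pear</td><td>5</td></tr>
</table>
"""

# read_html() returns a list with one dataframe per table found.
tables = pd.read_html(StringIO(html))
df = tables[0]

# Typical cleaning step: strip footnote markers such as "[1]".
df["Item"] = df["Item"].str.replace(r"\[\d+\]", "", regex=True)

print(df)
```

With a real page, you can pass the URL directly to `read_html()` and then pick the table you need from the returned list.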
How to Scrape HTML Tables with Python Pandas
A ready-to-run code which exploits the read_html() function of the Python Pandas library
4 Data Analysis
4.1 How to speed up a scikit-learn classification task
In this tutorial, I measure the time needed to fit a classifier on all the default classification datasets provided by the scikit-learn library, varying the n_jobs parameter from 1 to the maximum number of CPUs. As an example, I use a K-Neighbors Classifier with Grid Search with Cross Validation.
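The experiment can be sketched roughly as follows. This is a simplified version under a few assumptions: it uses only the iris dataset rather than all the default classification datasets, the parameter grid is a hypothetical one, and the number of jobs is capped at 4 to keep the run short.

```python
import os
import time

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical grid, chosen only to give the search some work to do.
param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}


def time_grid_search(n_jobs):
    """Fit a grid search with the given n_jobs and return the elapsed seconds."""
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, n_jobs=n_jobs)
    start = time.perf_counter()
    search.fit(X, y)
    return time.perf_counter() - start


# Vary n_jobs from 1 up to the number of CPUs (capped at 4 in this sketch).
max_jobs = min(os.cpu_count() or 1, 4)
timings = {n: time_grid_search(n) for n in range(1, max_jobs + 1)}

for n, seconds in timings.items():
    print(f"n_jobs={n}: {seconds:.3f} s")
```

On a dataset as small as iris, the cost of starting the worker processes may outweigh the gain, so the benefit of a higher n_jobs is more likely to show up on larger datasets or grids.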
Understanding the n_jobs Parameter to Speedup scikit-learn Classification
A ready-to-run code which demonstrates how the use of the n_jobs parameter can reduce the training time
4.2 Time Series forecasting with SARIMA model
A SARIMA model can be tuned with two kinds of orders:
- (p,d,q) order, which refers to the order of the time series. This order is also used in the ARIMA model (which does not consider seasonality);
- (P,D,Q,M) seasonal order, which refers to the order of the seasonal component of the time series.
In this article, I focus on the importance of the seasonal order.
Understanding the Seasonal Order of the SARIMA Model
A quick overview and a ready-to-run code to understand the (P, D, Q, M) seasonal order of the SARIMA model of the Python…
4.3 K-Neighbours Classification
In this tutorial for beginners, I illustrate how to set up, train and finalise a K-Neighbours Classifier using the scikit-learn library. The following steps should be followed:
- data preprocessing
- model training
- model testing
- model finalisation
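The four steps above can be sketched as follows. This is a minimal example under some assumptions: it uses the iris dataset, a hypothetical value of 5 neighbours, and it interprets "finalisation" as refitting the model on the whole dataset once testing is done.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# 1. Data preprocessing: split the data, then scale the features using
#    statistics computed on the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 2. Model training.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_scaled, y_train)

# 3. Model testing.
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"test accuracy: {accuracy:.3f}")

# 4. Model finalisation (assumed here to mean refitting on all the data).
final_scaler = StandardScaler().fit(X)
final_model = KNeighborsClassifier(n_neighbors=5).fit(final_scaler.transform(X), y)
```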
Machine Learning: Getting Started with the K-Neighbours Classifier
A Python ready-to-run code which implements the K-Neighbours Classifier in scikit-learn, from data preprocessing to…
4.4 Three tricks to speed up and optimise your Python
In this article I illustrate three tricks to optimise your Python code:
- if you need to run scientific computations, you can exploit the
- if you need to deal with large datasets, you can exploit the pyspark package or, whenever possible, downgrade the columns datatype.
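The idea of downgrading column datatypes can be sketched with Pandas as follows. This is one possible way to do it, through the `downcast` option of `pd.to_numeric()`; the dataframe and its column names `count` and `ratio` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe stored with 64-bit columns.
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "ratio": np.linspace(0.0, 1.0, 1000),
})
bytes_before = df.memory_usage(index=False).sum()

# Downcast each column to the smallest numeric type that can hold its values.
df["count"] = pd.to_numeric(df["count"], downcast="integer")  # 0..999 fits in int16
df["ratio"] = pd.to_numeric(df["ratio"], downcast="float")    # float64 -> float32
bytes_after = df.memory_usage(index=False).sum()

print(df.dtypes)
print(f"memory: {bytes_before} -> {bytes_after} bytes")
```

Keep in mind that moving from float64 to float32 reduces precision, so this step is only appropriate when the analysis can tolerate it.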
Three Tricks to Speed Up and Optimise Your Python
A review regarding three Python tricks that I have discovered in my June readings.
5 Data Visualisation
5.1 Using sqlite3 in Observablehq
In this tutorial, I exploit the new sqlite3 feature to build a simple bar chart, which updates dynamically according to the user's selection.
As an example dataset, I use the Generic Food Database, provided by data.world and available at this link. The dynamic bar chart shows the number of items in each sub group of a given main group. The main group is chosen through a dropdown selection.
How to build a Dynamic Bar Chart in Observablehq through sqlite3
A ready-to-run notebook which exploits the very recent sqlite3 features provided by Observablehq
5.2 D3.js for Beginners: Maps
In this tutorial I will build a choropleth map which shows the population of each country of the world.
Getting Started with D3.js Maps
5.3 How to insert a Graph drawn in Observablehq into an HTML page
In this tutorial, I propose two strategies to embed a graph into a Web site:
- through iframe
In both cases, you must first publish your notebook by clicking the publish button.
Then, in both cases, follow these steps:
- download the embedding code from Observable
- insert the code into your HTML page.
How to Insert an Observablehq Graph into a HTML Page
A quick tutorial to make wonderful HTML pages with your Observable .
5.4 How to Run Animations in Altair
In this tutorial, I illustrate a mechanism which combines the power of Streamlit with Altair, in order to render an animated line chart.
The resulting animation should look like the following one:
How to Run Animations in Altair and Streamlit
A ready-to-run tutorial, which describes how to build an animated line chart using Altair and Streamlit.
6 Other Topics
6.1 Preserving the layout of a manipulated document
Many tutorials exist on how to manipulate a text, but I have not found any complete tutorial on how to export the manipulated text to a document with the same layout as the original one.
In this short tutorial, I describe how to achieve this objective with fewer than 10 lines of Python code!
How to Restore The Original Layout of a Text Document after a Manipulation in Python
Less than 10 lines of code to preserve the layout of a text document after a manipulation, such as a text…
6.2 Build a Readme file
Here, I present a simple online tool, called readme.so, which is specifically designed to build a Readme file very quickly.
It is completely free and it takes just a few minutes to understand how it works. Readme.so supports many languages, including Italian, French, Spanish and many others.
How to Quickly Build a Readme file in Github
A quick overview of the readme.so online tool to build a Github Readme very quickly.
6.3 How to spend your time when you are waiting for a Data Analysis Output
In this article, I suggest two possible ways to fill the waiting time:
- Focus on your project — you can try to improve your project.
- Open your mind — You may try to improve your skills and knowledge in different ways, such as attending webinars, online courses and much more.
How to spend your time when you are waiting for a Data Analysis Output
Some suggestions to not waste your time when your computer is running your preferred algorithms and you are waiting for…
In this article, I have given a quick summary of the articles I published in July. If you want to stay up-to-date, you can follow me and read my new publications.
Stay tuned :)