[Feature Engineering] how to convert a date string to datetime

Let’s say that we have a dataset like rainfall in Australia (where I can find this? in kaggle)

In this dataset, we have a date column that looks like this:

>>> df['Date']
0         2008-12-01
1         2008-12-02
2         2008-12-03
3         2008-12-04
4         2008-12-05
             ...    
145455    2017-06-21
145456    2017-06-22
145457    2017-06-23
145458    2017-06-24
145459    2017-06-25
Name: Date, Length: 145460, dtype: object

This is a string that we prefer to convert to datatime; years, months and days and help the model to understand better the time relation of the event. Pandas is our tool. and with the following command, it turns a string date to datatime. Let’s do it:

df['Date'] = pd.to_datetime(df['Date'])

df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

df.drop('Date', axis=1, inplace=True)

In this snippet, we convert to datetime and store it back to ‘Date’ column. After that to increase the resolution, we split into three new columns called ‘Year’, ‘Month’, ‘Day’. Finally, we drop the ‘Date’ column to avoid replication of the information.

Now, the dates look like this:

df[['Year','Month', 'Day']]
        Year  Month  Day
0       2008     12    1
1       2008     12    2
2       2008     12    3
3       2008     12    4
4       2008     12    5
...      ...    ...  ...
145455  2017      6   21
145456  2017      6   22
145457  2017      6   23
145458  2017      6   24
145459  2017      6   25
[145460 rows x 3 columns]

References: Tutorials by Prashant Banerjee in kaggle