Let’s say that we have a dataset like rainfall in Australia (where I can find this? in kaggle)
In this dataset, we have a date column that looks like this:
>>> df['Date'] 0 2008-12-01 1 2008-12-02 2 2008-12-03 3 2008-12-04 4 2008-12-05 ... 145455 2017-06-21 145456 2017-06-22 145457 2017-06-23 145458 2017-06-24 145459 2017-06-25 Name: Date, Length: 145460, dtype: object
This is a string that we prefer to convert to datatime; years, months and days and help the model to understand better the time relation of the event. Pandas is our tool. and with the following command, it turns a string date to datatime. Let’s do it:
df['Date'] = pd.to_datetime(df['Date']) df['Year'] = df['Date'].dt.year df['Month'] = df['Date'].dt.month df['Day'] = df['Date'].dt.day df.drop('Date', axis=1, inplace=True)
In this snippet, we convert to datetime and store it back to ‘Date’ column. After that to increase the resolution, we split into three new columns called ‘Year’, ‘Month’, ‘Day’. Finally, we drop the ‘Date’ column to avoid replication of the information.
Now, the dates look like this:
df[['Year','Month', 'Day']] Year Month Day 0 2008 12 1 1 2008 12 2 2 2008 12 3 3 2008 12 4 4 2008 12 5 ... ... ... ... 145455 2017 6 21 145456 2017 6 22 145457 2017 6 23 145458 2017 6 24 145459 2017 6 25 [145460 rows x 3 columns]
References: Tutorials by Prashant Banerjee in kaggle