Practice Data Sprint #3: Abalone Challenge | DPhi


It looks like my model is behaving very badly. The score on my submitted predictions is around 200, but the RMSE on the validation data seems decent (Root Mean Squared Error is: 0.28047780853740273). I encoded the string values of the “Sex” feature and also tried adding duplicated rows to the data set. That improved my validation RMSE, but it did not help on the actual prediction. Any suggestions?
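One thing that could explain a gap like this, assuming the duplicated rows were added before the train/validation split: copies of the same row then land on both sides of the split, so the validation RMSE is measured partly on rows the model has already seen. The sketch below uses made-up toy rows and hypothetical helper names (`split_rows`, `leaked_fraction`) to compare the two orderings. It is only an illustration of the effect, not a diagnosis of this particular notebook.

```python
import random

def split_rows(rows, valid_fraction=0.25, seed=0):
    """Shuffle rows and split them into (train, valid) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

def leaked_fraction(train, valid):
    """Fraction of validation rows that also appear verbatim in train."""
    if not valid:
        return 0.0
    train_set = set(train)
    return sum(row in train_set for row in valid) / len(valid)

# Hypothetical toy rows (feature_1, feature_2, target); all rows are distinct.
original = [(i, 2 * i, 5 + i % 10) for i in range(40)]

# Risky order: duplicate first, then split. Copies end up on both sides.
train_a, valid_a = split_rows(original * 3)
leak_dup_first = leaked_fraction(train_a, valid_a)

# Safer order: split first, then duplicate only the training part.
train_b, valid_b = split_rows(original)
train_b = train_b * 3
leak_split_first = leaked_fraction(train_b, valid_b)

print("leak when duplicating before the split:", leak_dup_first)
print("leak when duplicating after the split: ", leak_split_first)
```

If the first number is well above zero for your own data, the validation RMSE is likely too optimistic and will not carry over to the test set.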

I created three features, “male”, “female”, and “infant”, each with data type int64, and then dropped the “Sex” feature. That is all I did for data preparation.
Using this data, I ran xgboost.XGBRegressor.
I hope this works for you, too.
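A minimal sketch of that preparation step with pandas, for anyone who wants to compare. The toy rows, the other column names, and the “M”/“F”/“I” codes in the “Sex” column are assumptions for illustration; check them against the actual competition files.

```python
import pandas as pd

# Toy rows standing in for the Abalone data. The "Length" and "Rings"
# columns and the "M"/"F"/"I" codes are assumptions, not from the thread.
df = pd.DataFrame({
    "Sex": ["M", "F", "I", "M"],
    "Length": [0.455, 0.530, 0.330, 0.440],
    "Rings": [15, 9, 7, 10],
})

# One indicator column per category, stored as int64.
dummies = pd.get_dummies(df["Sex"]).astype("int64")
dummies = dummies.rename(columns={"M": "male", "F": "female", "I": "infant"})

# Drop the original string column and attach the indicator columns.
prepared = pd.concat([df.drop(columns=["Sex"]), dummies], axis=1)

print(prepared.dtypes)
```

The same transformation has to be applied to the test file, with the columns in the same order, before calling `predict`. Otherwise the model receives features it was not trained on.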

This is what I have done too. I used dummy encoding for the ‘Sex’ column, yet I still get poor results on unseen data. Could you share a link to your work after the hackathon is completed?
I would like to see your work and learn from it (if you don't mind).

Hi @sunny
I have checked the predictions from your random forest model, and they give an RMSE of 2.15 on the unseen data (i.e. the test data). Please check whether something went wrong while downloading the predictions or making the submission.
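A quick sanity check before uploading can catch this kind of problem. The sketch below is only an illustration: `check_predictions` is a hypothetical helper, and the rule that predictions should not be negative assumes the target is a count-like value such as the number of rings.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def check_predictions(predictions, n_test_rows):
    """Return a list of problems found; an empty list means nothing obvious."""
    problems = []
    if len(predictions) != n_test_rows:
        problems.append(f"expected {n_test_rows} rows, got {len(predictions)}")
    if any(math.isnan(p) for p in predictions):
        problems.append("predictions contain NaN")
    # Assumption: the target is count-like, so negative values are suspicious.
    if any(p < 0 for p in predictions):
        problems.append("predictions contain negative values")
    return problems

# Hypothetical usage with made-up numbers.
print(check_predictions([9.5, 10.2, 7.8], n_test_rows=3))
print(rmse([10, 9, 8], [9.5, 10.2, 7.8]))
```

Running the same `rmse` function on a held-out split and comparing it with the leaderboard score is a simple way to tell a modelling problem from a submission problem.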