Solve Data Sprint #17 Challenge | DPhi


Information will be put up on 11th December 2020 at 21:00 IST | 17:00 CET

This is a companion discussion topic for the original entry at

Seems like I dominated this hackathon too early with a score of 0 :stuck_out_tongue:
Here is a tip for other participants.

Try to make your forest algorithms act like a linear algorithm by changing their hyperparameters. Also, there are only a few unique price values, so the test values must be one of them.
Linear algorithms will give terrible results. The trick is to tune tree-based algorithms so that they behave like linear models.
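The second half of the tip (test prices must be one of the few unique values seen in training) can be applied as a post-processing step. Below is a minimal sketch, assuming predictions and training prices are plain lists of numbers: each prediction is replaced by the nearest known price level. The function name `snap_to_known_prices` is hypothetical, not from the original post, and the sketch does not cover the hyperparameter tuning the tip also mentions.

```python
from bisect import bisect_left


def snap_to_known_prices(predictions, known_prices):
    """Replace each prediction with the closest value from known_prices.

    Hypothetical helper: assumes the target only takes values already
    seen in the training data, as the tip suggests.
    """
    levels = sorted(set(known_prices))
    if not levels:
        raise ValueError("known_prices must not be empty")
    snapped = []
    for p in predictions:
        # Index of the first level >= p; the nearest level is either
        # the one just below (i - 1) or the one at/above (i).
        i = bisect_left(levels, p)
        candidates = levels[max(i - 1, 0): i + 1]
        # On an exact tie, min() keeps the lower level.
        snapped.append(min(candidates, key=lambda level: abs(level - p)))
    return snapped


print(snap_to_known_prices([14, 16, 5, 60], [10, 20, 20, 50]))
```

Here the call returns `[10, 20, 10, 50]`: values between two levels go to the closer one, and values outside the observed range are clamped to the smallest or largest known price.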

Best of luck :+1:

After the deadline of this challenge, I'm curious to see your code. Please don't forget to upload it.
Thanks in advance!

We identified that there is something unusual about the test data owing to the nature of the problem (auction). While this could happen in a real-world problem, we wouldn't ideally expect to see it in a competition, as it may not be in the best spirit of the competition. We will be writing to all the top contestants to request their notebooks. Scores on the leaderboard will be retained only upon validation.

Happy problem-solving :slight_smile:

Hi, Moderators,

The issue will still be there, because the score of 0 is attained via auctionid. If we keep that feature and fit a simple model on the data, we can still get 0 values. I got a score of 0 using this:

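import pandas as pd
# Highest price recorded for each auctionid in the training data (auction_data)
gdf = auction_data.groupby(['auctionid']).agg({'price': 'max'}).reset_index()
# Left merge keeps every test row, in order, and attaches its auction's price
test_df = test_data.merge(gdf, on=['auctionid'], how='left')
submission_df = pd.DataFrame(columns=['prediction'])
submission_df['prediction'] = test_df['price']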

We can also merge gdf into the training data, fit the model on the training data, and use it on the test data. It will still give a score of 0.
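To see why a model fitted on top of this merged feature still reaches 0, here is a pure-Python sketch on hypothetical toy rows; the values and the helper names `auction_max_price` and `fit_line` are made up for illustration. It assumes, as the toy data does, that every row of an auction carries the same price, so the per-auction maximum equals the target and a one-feature least-squares line learns slope 1 and intercept 0. It then reproduces the price exactly for any test row whose auctionid appeared in training.

```python
def auction_max_price(rows):
    """Map each auctionid to the highest price seen for it (a groupby-max)."""
    best = {}
    for row in rows:
        aid, price = row["auctionid"], row["price"]
        best[aid] = max(price, best.get(aid, price))
    return best


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical toy data: every row of an auction shares the same price.
train = [
    {"auctionid": 1, "price": 100.0},
    {"auctionid": 1, "price": 100.0},
    {"auctionid": 2, "price": 250.0},
    {"auctionid": 3, "price": 40.0},
]
test = [{"auctionid": 2}, {"auctionid": 3}, {"auctionid": 1}]

lookup = auction_max_price(train)
# The "merged" feature: each training row's per-auction maximum price.
xs = [lookup[row["auctionid"]] for row in train]
ys = [row["price"] for row in train]
a, b = fit_line(xs, ys)
predictions = [a * lookup[row["auctionid"]] + b for row in test]
print(a, b, predictions)
```

The fitted line simply passes the looked-up price through, so the "model" adds nothing beyond the ID lookup itself.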

The issue will be resolved only if a rule requiring auctionid to be removed as a feature is given. We have seen in many hackathons that people use this ID column to create features that boost their scores. However, it's too late now.
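If such a rule were given, applying it would mostly mean dropping the column before training, and a dataset's exposure can be checked by measuring how many test rows share an auctionid with the training data. A minimal sketch, assuming rows are plain dictionaries; the helper names `drop_feature` and `seen_id_share` and the `bid` column are hypothetical, chosen only for illustration.

```python
def drop_feature(rows, key="auctionid"):
    """Return copies of the rows without the given column (by default the auction ID)."""
    return [{k: v for k, v in row.items() if k != key} for row in rows]


def seen_id_share(train_rows, test_rows, key="auctionid"):
    """Fraction of test rows whose ID also occurs in the training rows."""
    if not test_rows:
        return 0.0
    train_ids = {row[key] for row in train_rows}
    hits = sum(1 for row in test_rows if row[key] in train_ids)
    return hits / len(test_rows)


# Hypothetical rows; "bid" stands in for any non-ID feature.
train = [{"auctionid": 1, "bid": 90.0}, {"auctionid": 2, "bid": 200.0}]
test = [{"auctionid": 1, "bid": 95.0}, {"auctionid": 3, "bid": 30.0}]

print(seen_id_share(train, test))  # 0.5: one of two test rows reuses a training auction
print(drop_feature(test))
```

A share close to 1.0 means a per-ID lookup can answer most of the test set on its own.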
