This is a companion discussion topic for the original entry at https://dphi.tech/practice/challenge/55
I am getting an "invalid output" error (expected 1 or 0) when uploading my submission file. This is a multiclass classification problem with 4 output classes, so I want to know why we get this error when uploading the CSV file.
@sagarnarula could you please submit now? We changed the evaluation metric, and it should work fine now.
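The platform's scoring code isn't shown in this thread, so the following is only an illustration under an assumption: a metric configured for binary targets rejects 4-class labels. scikit-learn's `f1_score` behaves this way with its default `average='binary'`, while `accuracy_score` or an averaged F1 accepts multiclass labels. The labels below are made up.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy 4-class labels (classes 0-3), invented for illustration
y_true = [0, 1, 2, 3, 1, 2]
y_pred = [0, 2, 2, 3, 1, 1]

binary_error = None
try:
    # f1_score defaults to average='binary', which only accepts two classes
    f1_score(y_true, y_pred)
except ValueError as err:
    binary_error = err
    print("binary metric failed:", err)

# Multiclass-aware alternatives: plain accuracy, or F1 averaged over the classes
acc = accuracy_score(y_true, y_pred)
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
print(acc, weighted_f1)
```

Which multiclass metric the platform switched to isn't stated here; `average="macro"` or `"micro"` are the other common choices.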
```python
import pandas as pd

# y_pred: the array of predicted class labels from your model
target = pd.DataFrame()
target['prediction'] = y_pred
```
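A slightly fuller sketch of writing the submission file, assuming a single `prediction` column is what the grader expects (as in the snippet above). The `y_pred` values are made up and `submission.csv` is a placeholder file name.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-class predictions; in practice these come from model.predict(...)
y_pred = np.array([0, 3, 1, 2, 1])

# One column named 'prediction'
target = pd.DataFrame({"prediction": y_pred})

# index=False stops pandas from writing its row index as an extra unnamed column
csv_text = target.to_csv(index=False)
print(csv_text)

target.to_csv("submission.csv", index=False)
```

Leaving out `index=False` is a common way to end up with an extra column in the uploaded file.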
Yes, done! Thanks
How can I improve accuracy? Do you guys change parameters and tune the models randomly, or is there a better approach? Can you guys help me out?
I would suggest a random search or grid search over the model's hyperparameters :). If that doesn't work, you could try another model or do some feature selection/engineering.
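A minimal sketch of a random search using scikit-learn's `RandomizedSearchCV`. The synthetic 4-class data stands in for the competition data, and the parameter ranges are illustrative guesses rather than tuned values.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic 4-class stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

# Distributions to sample hyperparameters from; bounds are illustrative
param_distributions = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(5, 30),
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,          # number of random parameter combinations to try
    cv=5,               # 5-fold cross-validation for each combination
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

For an exhaustive grid search, `GridSearchCV` takes a `param_grid` of explicit value lists instead; it tries every combination, so it gets slow quickly as the grid grows.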
Thanks @dphi for giving a nice competition to work on at the beginning of the year.
This time I wanted to do something unique (get the best solution in as few lines of code as possible). So here is my 6-line code that got second rank:
```python
import pandas as pd; import numpy as np; from sklearn.ensemble import ExtraTreesClassifier
train_df = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/sukhna_dhanas/train_set_label.csv")
test_df = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/sukhna_dhanas/test_set_label.csv')
train_y = train_df['microorganism'].values
preds = ExtraTreesClassifier(n_estimators=200, random_state=2020, max_depth=21).fit(train_df.drop(['microorganism'], axis=1).values, train_y).predict(test_df.values)
pd.DataFrame(preds, columns=['prediction']).to_csv('extra_trees.csv', index=False)
```
I didn't get this result on the first try, though. I started with LightGBM, XGBoost and CatBoost for my initial submissions. Then I used the GML library (developed by @muhammad4hmed and Naman) to get an idea of which algorithms perform well under 10-fold cross-validation (demo: https://github.com/Muhammad4hmed/GML/blob/master/DEMO/AutoMachineLearning.ipynb), and it showed Extra Trees performing better than the others. Once I had the best-performing algorithm, I tuned its parameters and got these results.
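This is not GML's API. It's a plain scikit-learn sketch of the same idea: score several candidate models with 10-fold cross-validation and keep the best one. The synthetic data and the particular candidate list are assumptions; `GradientBoostingClassifier` stands in for the LightGBM/XGBoost/CatBoost models mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic 4-class stand-in for the real dataset (assumption)
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

candidates = {
    "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "grad_boost": GradientBoostingClassifier(random_state=0),
}

# 10 stratified folds so each fold keeps the class proportions
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Pick the model with the highest mean CV accuracy, then tune that one
best = max(results, key=results.get)
print("best:", best)
```

Which model wins on this synthetic data says nothing about the competition data; the point is the comparison procedure.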
Congratulations! BTW, the person in first place also used GML xD, so it was GML vs GML. Impressive short solution, by the way!