Project author: takuto0831

Project description:
Home Credit Default Risk [Kaggle Format] / NN, LightGBM, XGBoost
Primary language: Jupyter Notebook
Repository: git://github.com/takuto0831/Home_Credit_Kaggle.git
Created: 2018-07-19T10:39:39Z
Project page: https://github.com/takuto0831/Home_Credit_Kaggle



Reference

Techniques

  • The important thing is a good set of smart features and a diverse set of base algorithms.
  • Many features are built by division and subtraction of columns from application_train.csv.
    • The most notable division was by EXT_SOURCE_3.
  • The most important features that I engineered, in descending order of importance (measured by gain in the LGBM model)
  • Find the data structure, understand the column descriptions, and manage the features.
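
The division and subtraction features mentioned above can be sketched as follows. This is a minimal, dependency-free illustration, not the author's actual feature set: the input columns exist in application_train.csv, but the particular combinations, the output feature names, and the helpers `safe_div`, `safe_sub` and `add_ratio_features` are assumptions made for the example.

```python
from typing import Optional


def safe_div(num: Optional[float], den: Optional[float]) -> Optional[float]:
    """Divide, returning None when a value is missing or the denominator is zero."""
    if num is None or den is None or den == 0:
        return None
    return num / den


def safe_sub(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Subtract, returning None when either value is missing."""
    if a is None or b is None:
        return None
    return a - b


def add_ratio_features(row: dict) -> dict:
    """Return a copy of `row` with a few illustrative ratio/difference features."""
    out = dict(row)
    # Hypothetical feature names; the chosen combinations are assumptions.
    out["CREDIT_TO_ANNUITY"] = safe_div(row.get("AMT_CREDIT"), row.get("AMT_ANNUITY"))
    out["CREDIT_MINUS_INCOME"] = safe_sub(row.get("AMT_CREDIT"), row.get("AMT_INCOME_TOTAL"))
    out["CREDIT_DIV_EXT_SOURCE_3"] = safe_div(row.get("AMT_CREDIT"), row.get("EXT_SOURCE_3"))
    return out


applicant = {"AMT_CREDIT": 400000.0, "AMT_ANNUITY": 20000.0,
             "AMT_INCOME_TOTAL": 150000.0, "EXT_SOURCE_3": None}
features = add_ratio_features(applicant)
```

Missing inputs propagate as None instead of raising, which leaves the decision about imputation to a later step. The same arithmetic maps directly onto column operations in a data frame library.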

Outlook

  • Use feather files
  • Do feature engineering in script files
  • How to do feature selection: use LGBM importance?
  • Use the predicted values of a regression model as features
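
The last idea, feeding a regression's predictions back in as a feature, is commonly done with out-of-fold predictions so that no row is scored by a model that saw its own target. The sketch below assumes a one-variable least-squares fit as a stand-in for whatever regression would actually be used; `fit_line` and `out_of_fold_predictions` are hypothetical helpers, and the fold assignment by index is a simplification.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # A constant predictor carries no slope information.
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cov / var_x
    return mean_y - b * mean_x, b


def out_of_fold_predictions(xs, ys, n_folds=3):
    """Predict each row with a model fitted on the other folds only."""
    preds = [None] * len(xs)
    for fold in range(n_folds):
        # Rows whose index is congruent to `fold` form the held-out fold.
        held_out = [i for i in range(len(xs)) if i % n_folds == fold]
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    return preds


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # toy data lying exactly on y = 2x
oof = out_of_fold_predictions(xs, ys)
```

The resulting `oof` list has one value per training row and can be appended as a new column; for the test set, the usual choice is to predict with a model fitted on all training rows.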

Flow Chart

Home_Credit_Kaggle

Rmd.file

  • 0_EDA.Rmd: Quick check of the data and search for problems
  • 1_Preprocess_app.Rmd: Preprocessing for application_{train|test}.csv
  • 1_Preprocess_bureau.Rmd: Preprocessing for bureau.csv and bureau_balance.csv (not changed)
  • 1_Preprocess_pre_app.Rmd: Preprocessing for previous_application.csv (not changed)
  • 1_Preprocess_ins_pay.Rmd: Preprocessing for installments_payments.csv (not changed)
  • 1_Preprocess_pos_cash.Rmd: Preprocessing for POS_CASH_balance.csv (not changed)
  • 1_Preprocess_credit.Rmd: Preprocessing for credit_card_balance.csv (not changed)
  • 2_Combine.Rmd: Combining all data and checking the result (not changed)
  • 3_XGBoost.Rmd: Build the XGBoost model, predict, make a submission file, search for the best features, tune parameters (not changed)
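
A combine step of this kind generally reduces each one-to-many table to one row per applicant and then left-joins it onto the application table. The sketch below shows that pattern in plain Python rather than R. SK_ID_CURR and AMT_CREDIT_SUM are columns of the competition data, while the chosen aggregates, the output column names, and the functions `aggregate_bureau` and `left_join` are assumptions, not a transcription of the Rmd code.

```python
from collections import defaultdict


def aggregate_bureau(bureau_rows):
    """Collapse bureau rows to one summary dict per SK_ID_CURR."""
    grouped = defaultdict(list)
    for row in bureau_rows:
        grouped[row["SK_ID_CURR"]].append(row["AMT_CREDIT_SUM"])
    return {
        key: {"BUREAU_COUNT": len(vals),
              "BUREAU_CREDIT_SUM_MEAN": sum(vals) / len(vals)}
        for key, vals in grouped.items()
    }


def left_join(app_rows, summaries):
    """Attach the summary to each application row; unmatched rows get None."""
    empty = {"BUREAU_COUNT": None, "BUREAU_CREDIT_SUM_MEAN": None}
    return [{**row, **summaries.get(row["SK_ID_CURR"], empty)} for row in app_rows]


applications = [{"SK_ID_CURR": 1, "AMT_CREDIT": 1000.0},
                {"SK_ID_CURR": 2, "AMT_CREDIT": 500.0}]
bureau = [{"SK_ID_CURR": 1, "AMT_CREDIT_SUM": 200.0},
          {"SK_ID_CURR": 1, "AMT_CREDIT_SUM": 400.0}]
combined = left_join(applications, aggregate_bureau(bureau))
```

Keeping the join a left join preserves every applicant, including those with no bureau history, so the row count of the combined table matches the application table.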

jn.file

  • LightGBM.ipynb: lightgbm, cross validation, predict
  • NeuralNetwork.ipynb: neural network, predict

py.file

script.file

  • function.R: Describes the functions in detail
  • makedummies.R: Converts factor values into dummy variables
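
Turning factor values into dummy variables means replacing one categorical column with one 0/1 indicator column per level. The following is a rough Python sketch of that idea, not a port of makedummies.R; the function name `make_dummies` and the `column_level` naming scheme are assumptions.

```python
def make_dummies(rows, column):
    """Replace `column` with one 0/1 indicator column per observed level."""
    levels = sorted({row[column] for row in rows if row[column] is not None})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            # A missing value matches no level, so all indicators stay 0.
            new_row[f"{column}_{level}"] = int(row[column] == level)
        encoded.append(new_row)
    return encoded


rows = [{"SK_ID_CURR": 1, "NAME_CONTRACT_TYPE": "Cash loans"},
        {"SK_ID_CURR": 2, "NAME_CONTRACT_TYPE": "Revolving loans"},
        {"SK_ID_CURR": 3, "NAME_CONTRACT_TYPE": None}]
dummies = make_dummies(rows, "NAME_CONTRACT_TYPE")
```

When train and test are encoded separately, the level list should be taken from the training data so both tables end up with the same columns.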

submit.file

  • file_name + submit_date.csv
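
A submission file following this naming pattern could be produced as sketched below. The underscore separator and the YYYYMMDD date format are assumptions, since the pattern above does not specify them; `submission_name` and `write_submission` are hypothetical helpers. The SK_ID_CURR and TARGET header follows sample_submission.csv.

```python
import csv
import io
from datetime import date


def submission_name(file_name: str, submit_date: date) -> str:
    """Build '<file_name>_<YYYYMMDD>.csv'; separator and date format are assumptions."""
    return f"{file_name}_{submit_date:%Y%m%d}.csv"


def write_submission(handle, predictions):
    """Write (SK_ID_CURR, TARGET) pairs in the sample_submission.csv layout."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["SK_ID_CURR", "TARGET"])
    for sk_id, prob in predictions:
        writer.writerow([sk_id, f"{prob:.6f}"])


name = submission_name("lightgbm", date(2018, 8, 1))
buffer = io.StringIO()  # stands in for open(name, "w", newline="")
write_submission(buffer, [(100001, 0.0731), (100005, 0.1249)])
```

Writing to a file handle rather than a path keeps the function easy to check in memory before anything is saved under submit.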

input

csv.file

  • raw data

csv_imp0.file

  • {…}.csv: Basic preprocessing applied
  • all_data_{train|test}.csv: All tables combined

csv_imp1.file

  • {…}_imp.csv: Missing values imputed, features extracted
  • all_data_{train|test}.csv: All tables combined
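
The imputation method behind the _imp files is not stated here. As one common possibility, the sketch below fills missing numeric values with the column median computed on the training data; the choice of the median and the helpers `fit_medians` and `impute` are assumptions for illustration only.

```python
from statistics import median


def fit_medians(rows, columns):
    """Compute each column's median over its non-missing training values."""
    return {
        col: median(row[col] for row in rows if row[col] is not None)
        for col in columns
    }


def impute(rows, medians):
    """Return copies of `rows` with None replaced by the fitted median."""
    return [
        {k: (medians[k] if v is None and k in medians else v) for k, v in row.items()}
        for row in rows
    ]


train = [{"AMT_ANNUITY": 10.0}, {"AMT_ANNUITY": None}, {"AMT_ANNUITY": 30.0}]
test = [{"AMT_ANNUITY": None}]
medians = fit_medians(train, ["AMT_ANNUITY"])
train_imp = impute(train, medians)
test_imp = impute(test, medians)  # test rows reuse the training medians
```

Fitting the fill values on the training rows only and reusing them for the test rows keeps test information out of the preprocessing. A column with no observed values at all would make `median` raise, so such columns need separate handling.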

data.file

  • best_para.tsv: Records the best features
  • score_sheet.tsv: Train AUC, test AUC, LB score
  • FlowChart.eddx, FlowChart.png: Illustrate the process flow
  • about_column.numbers: Explains all table columns
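
The AUC values recorded in score_sheet.tsv measure how often a defaulting applicant receives a higher predicted score than a non-defaulting one. The sketch below computes that quantity directly from its pairwise definition and formats one tab-separated row; the column order, the number formatting, and the helpers `roc_auc` and `score_row` are assumptions, not the layout of the actual sheet.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def score_row(model, train_auc, test_auc, lb_score=None):
    """Format one tab-separated line; the column order is an assumption."""
    lb = "" if lb_score is None else f"{lb_score:.4f}"
    return "\t".join([model, f"{train_auc:.4f}", f"{test_auc:.4f}", lb])


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
line = score_row("lightgbm", 0.8, auc)
```

The pairwise loop is quadratic in the number of rows, so it suits small validation samples and sanity checks; a rank-based implementation from a library is the practical choice on the full data.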

Layered Directory

  ├── Home_Credit_Kaggle.Rproj
  ├── README.md
  ├── Rmd
  │   ├── 0_EDA.Rmd
  │   ├── 1_Preprocess_app.Rmd
  │   ├── 1_Preprocess_app.html
  │   ├── 1_Preprocess_bureau.Rmd
  │   ├── 1_Preprocess_credit.Rmd
  │   ├── 1_Preprocess_ins_pay.Rmd
  │   ├── 1_Preprocess_pos_cash.Rmd
  │   ├── 1_Preprocess_pre_app.Rmd
  │   ├── 2_Combine.Rmd
  │   └── 3_XGBoost.Rmd
  ├── input
  │   ├── csv
  │   │   ├── HomeCredit_columns_description.csv
  │   │   ├── POS_CASH_balance.csv
  │   │   ├── application_test.csv
  │   │   ├── application_train.csv
  │   │   ├── bureau.csv
  │   │   ├── bureau_balance.csv
  │   │   ├── credit_card_balance.csv
  │   │   ├── installments_payments.csv
  │   │   ├── previous_application.csv
  │   │   └── sample_submission.csv
  │   ├── csv_imp0
  │   │   ├── all_data_test.csv
  │   │   ├── all_data_train.csv
  │   │   ├── POS_CASH_balance.csv
  │   │   ├── application_test.csv
  │   │   ├── application_train.csv
  │   │   ├── bureau.csv
  │   │   ├── bureau_balance.csv
  │   │   ├── credit_card_balance.csv
  │   │   ├── installments_payments.csv
  │   │   └── previous_application.csv
  │   └── csv_imp1
  │       ├── all_data_test.csv
  │       ├── all_data_train.csv
  │       ├── application_test_imp.csv
  │       ├── application_train_imp.csv
  │       └── credit_card_balance_imp.csv
  ├── data
  │   ├── best_para.tsv
  │   ├── best_para_old_100.tsv
  │   ├── about_column.numbers
  │   ├── FlowChart.eddx
  │   ├── FLowChart.png
  │   └── score_sheet.tsv
  ├── jn
  │   ├── LightGBM.ipynb
  │   └── NeuralNetwork.ipynb
  ├── py
  │   ├──
  │   └──
  ├── submit
  └── script
      ├── function.R
      └── makedummies.R