项目作者: takuto0831
项目描述 :
Home Credit Default Risk [Kaggle Format] / NN,LifhtGBM, Xgboost
高级语言: Jupyter Notebook
项目地址: git://github.com/takuto0831/Home_Credit_Kaggle.git
Table of Contents generated with DocToc
Reference
Technics
- Important thing is good set of smart features and diverse set of base algorithms.
- A lot of features based on division and substraction from the application_train.csv
- The most notable division was by EXT_SOURCE_3
- The most important features that I engineered, in descending order of importance (measured by gain in the LGBM model)
- Find data structure, understand column description, mannagement of the feature
Outlook
- Use feather.file
- Feature engineering by using script file
- How to feature selection -> using LGBM importance ??
- Using the predictive value of such regression as features
Flow Chart

Home_Credit_Kaggle
Rmd.file
- 0_EDA.Rmd: Checking data simply and searching problem
- 1Preprocess_app.Rmd: Preprocessing for application{train|test}.csv
- 1_Preprocess_bureau.Rmd: Preprocessing for bureau.csv and bureau_balance.csv (not changed)
- 1_Preprocess_pre_app.Rmd: Preprocessing for previous_applications.csv (not changed)
- 1_Preprocess_ins_pay.Rmd: Preprocessing for installments_payment.csv (not changed)
- 1_Preprocess_pos_cash.Rmd: Preprocessing for POS_CASH_balance.csv (not changed)
- 1_Preprocess_credit.Rmd: Preprocessing for credit_card_balance.csv (not changed)
- 2_Combine.Rmd: Combining all data and Checking for data (not changed)
- 3_XGBoost.Rmd: construct xgboost model, predict, make a submit file, search best features, parameter tune (not changed)
jn.file
- LightGBM.ipynb: lightgbm, cross validation, predict
- NeuralNetwork.ipynb: neural network, predict
py.file
script.file
- function.R: Descrive detail of functions
- makedummies.R: Make factor values dummy variables
submit.file
- file_name + submit_date.csv
csv.file
csv_imp0.file
- {…}.csv: Apply basic preprocess
- all_{train|test}.csv: Combine all tables
csv_imp1.file
- {…}_imp.csv: Complement missing values, Extract features
- all_{train|test}.csv: Combine all tables
data.file
- best_para.tsv: recorded best features
- score_sheet.tsv: train auc, test auc, LB score
- Flowchart.eddx, FlowChart.png: Illustrate the process chart
- about_column.numbers: Explain all table columns
Layered Directory
├── Home_Credit_Kaggle.Rproj
├── README.md
├── Rmd
│ ├── 0_EDA.Rmd
│ ├── 1_Preprocess_app.Rmd
│ ├── 1_Preprocess_app.html
│ ├── 1_Preprocess_bureau.Rmd
│ ├── 1_Preprocess_credit.Rmd
│ ├── 1_Preprocess_ins_pay.Rmd
│ ├── 1_Preprocess_pos_cash.Rmd
│ ├── 1_Preprocess_pre_app.Rmd
│ ├── 2_Combine.Rmd
│ └── 3_XGBoost.Rmd
├── input
│ ├── csv
│ │ ├── HomeCredit_columns_description.csv
│ │ ├── POS_CASH_balance.csv
│ │ ├── application_test.csv
│ │ ├── application_train.csv
│ │ ├── bureau.csv
│ │ ├── bureau_balance.csv
│ │ ├── credit_card_balance.csv
│ │ ├── installments_payments.csv
│ │ ├── previous_application.csv
│ │ └── sample_submission.csv
│ ├── csv_imp0
│ │ ├── all_data_test.csv
│ │ ├── all_data_train.csv
│ │ ├── POS_CASH_balance.csv
│ │ ├── application_test.csv
│ │ ├── application_train.csv
│ │ ├── bureau.csv
│ │ ├── bureau_balance.csv
│ │ ├── credit_card_balance.csv
│ │ ├── installments_payments.csv
│ │ └── previous_application.csv
│ └── csv_imp1
│ ├── all_data_test.csv
│ ├── all_data_train.csv
│ ├── application_test_imp.csv
│ ├── application_train_imp.csv
│ └── credit_card_balance_imp.csv
├── data
│ ├── best_para.tsv
│ ├── best_para_old_100.tsv
│ ├── about_column.numbers
│ ├── FlowChart.eddx
│ ├── FLowChart.png
│ └── score_sheet.tsv
├── jn
│ ├── LightGBM.ipynb
│ └── NeuralNetwork.ipynb
├── py
│ ├──
│ └──
├── submit
└── script
├── function.R
└── makedummies.R