Folder Structure - Where to Find Which Files (Input and Output)

Unless the paths are adapted/overwritten in config.ini, Comprior builds the following folder structure during processing:

data
 ├── input
 │   ├── dataset
 │   └── example
 ├── intermediate
 │   ├── dataset
 │   ├── crossvalidation
 │   └── externalKnowledge
 └── results
     └── XXX
         ├── timeLogs
         ├── preanalysis
         ├── geneRankings
         └── evaluation
             ├── rankings
             │   ├── annotation
             │   └── metrics
             ├── reducedData
             └── classification
                 ├── metrics
                 └── crossEvaluation
                     ├── reducedData
                     └── classification
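
The layout above can also be written down as paths, for example to check which folders of a run already exist. The following is a minimal sketch, not part of Comprior: the names RUN_SUBDIRS and run_dirs, as well as the data_root and run_name arguments, are hypothetical, and the sketch assumes the default folder names shown in the tree (i.e. no overrides in config.ini).

```python
from pathlib import Path

# Sub-folders of one run folder, relative to data/results/<run_name>/,
# transcribed from the tree above (default names, no config.ini overrides).
RUN_SUBDIRS = [
    "timeLogs",
    "preanalysis",
    "geneRankings",
    "evaluation/rankings/annotation",
    "evaluation/rankings/metrics",
    "evaluation/reducedData",
    "evaluation/classification/metrics",
    "evaluation/classification/crossEvaluation/reducedData",
    "evaluation/classification/crossEvaluation/classification",
]


def run_dirs(data_root, run_name):
    """Map each relative sub-folder name to its full path for one run.

    data_root and run_name are hypothetical arguments for this sketch;
    Comprior itself takes its paths from config.ini.
    """
    run_root = Path(data_root) / "results" / run_name
    return {sub: run_root / sub for sub in RUN_SUBDIRS}


if __name__ == "__main__":
    # Report which of the expected folders exist for a run named "XXX".
    for sub, path in run_dirs("data", "XXX").items():
        print(f"{sub:60} exists={path.is_dir()}")
```
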

input/

  • dataset/: put your input dataset here
  • example/: folder with example files for trying out Comprior

intermediate/

  • dataset/: preprocessed input data (currently, the input data with its metadata merged into one file)
  • crossvalidation/: contains the preprocessed dataset for cross-validation (e.g. mapped to the right identifier or to pathway features)
  • externalKnowledge/: one sub-folder per queried knowledge base, containing the query results

results/

  • XXX/: output folder for the current run; its name is set by the outputDir_name parameter in config.ini (if a folder with that name already exists, Comprior adds a number to the name)

    • timeLogs/: one file for every selected approach, logging the duration of the individual selection activities, e.g. external knowledge retrieval or statistical feature selection

    • preanalysis/: contains plots of dataset characteristics and knowledge base coverage, if these were selected via the preanalysis_plots and evaluateKBcoverage parameters in config.ini

    • geneRankings/: contains the actual feature rankings, one CSV file for every selected approach

    • evaluation/: contains all evaluation results

      • rankings/: contains evaluation results from analyzing the feature rankings
        • annotation/: contains annotation/enrichment files for every ranking
        • metrics/: contains the actual metrics results to compare the rankings
      • reducedData/: one sub-folder per selection approach containing input data for the top k features; these files are used for the actual classification/prediction
      • classification/:
        • metrics/: contains the actual classification metrics results, one CSV file for every selected metric, as well as PDFs with visualizations
        • crossEvaluation/: contains evaluation data from the second dataset for cross-validation
          • reducedData/: one sub-folder per selection approach containing input data (second dataset) for the top k features; these files are used for the actual classification/prediction
          • classification/: contains the actual classification metrics results, one CSV file for every selected metric, as well as PDFs with visualizations
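
To pick up the outputs of a run programmatically, the folders described above can be scanned for their CSV files. The following is a minimal sketch, not part of Comprior: find_run_dir and list_result_files are hypothetical helpers. Since the exact way Comprior numbers an already existing folder name is not specified here, the sketch only assumes that the resulting name still starts with the value of outputDir_name, and it takes the most recently modified match.

```python
from pathlib import Path


def find_run_dir(results_root, output_dir_name):
    """Return the most recently modified run folder for a given name.

    Assumption: when Comprior adds a number to an existing folder name,
    the new name still starts with output_dir_name. Only the prefix is
    matched, so unrelated folders sharing that prefix would match too.
    """
    candidates = [
        p for p in Path(results_root).iterdir()
        if p.is_dir() and p.name.startswith(output_dir_name)
    ]
    if not candidates:
        raise FileNotFoundError(
            f"no run folder starting with {output_dir_name!r} in {results_root}"
        )
    return max(candidates, key=lambda p: p.stat().st_mtime)


def list_result_files(run_dir):
    """Collect the CSV outputs of one run, grouped by kind.

    Folder names follow the default layout; missing folders simply
    yield empty lists.
    """
    run_dir = Path(run_dir)
    return {
        # one CSV file per selected approach
        "rankings": sorted((run_dir / "geneRankings").glob("*.csv")),
        # one CSV file per selected classification metric
        "classification_metrics": sorted(
            (run_dir / "evaluation" / "classification" / "metrics").glob("*.csv")
        ),
    }
```

A call such as list_result_files(find_run_dir("data/results", "XXX")) would then return the ranking and metric files of the latest run named XXX. The content and column layout of the CSV files are deliberately not parsed here, as they are not described in this section.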