knime2py — Implemented Nodes

This page lists the KNIME nodes currently supported for code generation. For unsupported nodes, the generator produces a best-effort stub and TODOs to guide manual implementation.

Each entry below gives the node number, the KNIME node name, the generating module, and implementation notes.

1. Color Manager (color_manager.py)
KNIME color annotations are UI metadata and have no native representation in pandas;
we therefore forward the input table unchanged to all outputs.

2. Column Appender (column_appender.py)
- Settings read: selected_rowid_mode, selected_rowid_table, selected_rowid_table_number
(base suffix defaults to "_r"; final suffix becomes f"{base}{k}" per right table).
- Alignment: IDENTICAL → index join; other modes → positional concat with reset index.
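
A minimal sketch of the alignment behavior described above; the helper name and the collision-only suffixing are illustrative assumptions, not the exact generated code.

```python
import pandas as pd

def append_columns(left, rights, identical=True, base_suffix="_r"):
    out = left.copy()
    for k, right in enumerate(rights, start=1):
        suffix = f"{base_suffix}{k}"                     # per-right-table suffix
        if identical:
            # IDENTICAL → index join; overlapping names receive the suffix
            out = out.join(right, rsuffix=suffix)
        else:
            # other modes → positional concat with reset index
            r = right.rename(columns={c: c + suffix for c in right.columns if c in out.columns})
            out = pd.concat([out.reset_index(drop=True), r.reset_index(drop=True)], axis=1)
    return out
```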

3. Column Filter (exclude-only) (column_filter.py)
- Excludes are parsed heuristically from settings.xml by scanning <config> blocks whose keys
contain "exclude", collecting list entries (<entry key='0' value='Col'/> or <entry key='name'/>).
- Dropping uses errors='ignore' so missing columns won't fail the cell.
- If no excludes are found, the node is a passthrough.

4. Column Renamer (column_renamer.py)
• Supports only explicit (old → new) mappings from settings.xml.
• No pattern/regex templating, no type-based renames, no column reordering.

5. Concatenate (concatenate.py)
- No suffixing or renaming of columns.
- No column intersection logic; pandas default union alignment is used.
- Row index is reset (0..N-1) via ignore_index=True.

6. CSV Reader (csv_reader.py)
pandas>=1.5 recommended (nullable dtypes supported in dtype mapping).
Quote/escape are passed to pandas. If escapechar equals quotechar, we omit escapechar and rely
on double-quote parsing (avoids C-engine "EOF inside string" errors).
Dtype mapping is derived from table_spec_config_Internals; unknown types are left to inference.
Path resolution supports LOCAL and RELATIVE knime.workflow only; other FS types are not yet handled.
Robust NA/dtype handling:
- Treat '' and ' ' as missing on read (na_values=['', ' '], keep_default_na=True, skipinitialspace=True)
- Read WITHOUT dtype=..., then coerce per-column:
* numeric targets ('Int64', 'Float64') via pd.to_numeric(..., errors='coerce').astype(target)
* other types via .astype(target)
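
A sketch of the read-then-coerce pattern described above, assuming an illustrative dtype mapping (the real mapping is derived from table_spec_config_Internals):

```python
import pandas as pd

# Permissive read: '' and ' ' become missing, no dtype forcing at parse time.
df = pd.read_csv("input.csv", na_values=["", " "], keep_default_na=True, skipinitialspace=True)

dtype_map = {"age": "Int64", "score": "Float64", "name": "string"}  # illustrative only
for col, target in dtype_map.items():
    if col not in df.columns:
        continue
    if target in ("Int64", "Float64"):
        df[col] = pd.to_numeric(df[col], errors="coerce").astype(target)
    else:
        df[col] = df[col].astype(target)
```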

7. CSV Writer (csv_writer.py)
pandas>=1.5 recommended for consistent NA/nullable dtype handling.
Path resolution supports LOCAL absolute paths and RELATIVE knime.workflow; other FS types are not yet handled.
Directory creation is not automatic; ensure out_path.parent exists before writing.
Line terminator / quoting mode / doublequote / escapechar are not explicitly mapped unless present; pandas defaults apply.
File is overwritten by default; KNIME “append/overwrite” style flags are not implemented here.

8. Decision Tree Learner (decision_tree_learner.py)
Pruning options (e.g., pruningMethod/Reduced Error Pruning) are not available in sklearn DT;
consider ccp_alpha for cost-complexity pruning if needed.
First-split constraints and binary nominal split settings are not supported by sklearn.
Feature importances are impurity-based (Gini/entropy); consider permutation importances if
you need model-agnostic measures.
Library expectations: pandas>=1.5, numpy>=1.23, scikit-learn>=1.2 recommended.

9. Decision Tree Predictor (decision_tree_predictor.py)
The estimator itself must be scikit-learn-like.
Scope: classification predictor only; multi-output and regression variants are not handled.

10. Equal Size Sampling (equal_size_sampling.py)
Exact mode only: “Approximate” sampling is not implemented in this generator.
Requires pandas; scikit-learn is used only for resample() (no synthetic example generation).
Seed is used when provided; default fallback is 1 for deterministic output.
Order of rows after concatenation is re-sorted back to the original index.
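
A rough sketch of the exact-mode downsampling under these notes; the function name and the class-column argument are illustrative.

```python
import pandas as pd
from sklearn.utils import resample

def equal_size_sample(df, class_column, seed=1):
    # Downsample every class to the size of the smallest class.
    n_min = df[class_column].value_counts().min()
    parts = [resample(group, replace=False, n_samples=n_min, random_state=seed)
             for _, group in df.groupby(class_column)]
    return pd.concat(parts).sort_index()   # restore the original row order
```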

11. Excel Reader (excel_reader.py)
Covered mappings (KNIME → pandas):
• Path: LOCAL & RELATIVE (knime.workflow) via resolve_reader_path()
• Sheet selection: sheet_selection ∈ {FIRST, NAME, INDEX} → sheet (0 | 'name' | index)
• Header: table_contains_column_names + column_names_row_number → header (0-based) or None
• Column range: read_from_column/read_to_column → usecols="A:D" (Excel A1-style column span)
• Row range: read_from_row/read_to_row → skiprows / nrows (best-effort)
• Dtypes: table_spec_config_Internals → dtype mapping (nullable pandas dtypes when possible)
• Replace empty strings with missings: advanced_settings.replace_empty_strings_with_missings
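
An illustrative read_excel call showing how the settings above map onto pandas arguments; the file name, sheet, and ranges are placeholders.

```python
import pandas as pd

df = pd.read_excel(
    "data.xlsx",             # resolved LOCAL / RELATIVE (knime.workflow) path
    sheet_name=0,            # FIRST → 0, NAME → 'Sheet1', INDEX → integer
    header=0,                # column_names_row_number (0-based), or None if no header row
    usecols="A:D",           # read_from_column / read_to_column as an Excel-style span
    skiprows=1, nrows=100,   # read_from_row / read_to_row (best effort)
)
df = df.replace("", pd.NA)   # replace_empty_strings_with_missings
```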

12. Excel Writer (excel_writer.py)
- Only XLSX is supported (engine='openpyxl'). Legacy XLS (xls) is not implemented.
- KNIME-style row-wise append into an existing sheet is not fully replicated. Pandas
does not support true “append to bottom” without custom openpyxl manipulation.
- We honor if_sheet_exists and header flags but do not append rows.
- Auto-size columns, print layout, formula evaluation, and “open file after exec”
are not supported.

13. Gradient Boosted Trees (Classification) Learner (gbt_learner.py)
- Feature selection: use included_names if present; otherwise all numeric/boolean columns except
the target. Excluded_names are removed afterward. If no target is configured, the node is a
passthrough: bundle=None and empty outputs with an error note in the summary.
- Hyperparameters mapped: nrModels→n_estimators, learningRate→learning_rate, maxLevels
(-1/absent → default 3)→max_depth, minNodeSize→min_samples_split (≥2), minChildSize→min_samples_leaf (≥1),
dataFraction (0<≤1)→subsample (stochastic GB), columnSamplingMode→max_features (None/'sqrt'/'log2'/fraction/int),
seed→random_state. Seed defaults to 1 for deterministic output.
- Unsupported/orthogonal flags: splitCriterion (trees in sklearn GBT have fixed criterion),
missingValueHandling (impute beforehand), useAverageSplitPoints, useBinaryNominalSplits,
isUseDifferentAttributesAtEachNode (no direct sklearn analog). These are noted and ignored.
- Outputs: port 1=model bundle (estimator, metadata), port 2=feature_importances_, port 3=summary.
- Dependencies: lxml for XML parsing; pandas/numpy for data handling; scikit-learn for modeling.
- KNIME seeds can be > 2**32-1. We now coerce to a valid sklearn seed:
seed32 = None if seed is None else int(abs(int(seed)) % (2**32))
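
A hypothetical sketch of the hyperparameter mapping and seed coercion listed above; the settings values are placeholders, not from a real workflow.

```python
from sklearn.ensemble import GradientBoostingClassifier

settings = {"nrModels": 200, "learningRate": 0.1, "maxLevels": -1,
            "minNodeSize": 2, "minChildSize": 1, "dataFraction": 0.8,
            "seed": 1234567890123}

seed = settings["seed"]
seed32 = None if seed is None else int(abs(int(seed)) % (2**32))   # clamp to sklearn's valid range

model = GradientBoostingClassifier(
    n_estimators=settings["nrModels"],
    learning_rate=settings["learningRate"],
    max_depth=3 if settings["maxLevels"] in (-1, None) else settings["maxLevels"],
    min_samples_split=max(2, settings["minNodeSize"]),
    min_samples_leaf=max(1, settings["minChildSize"]),
    subsample=settings["dataFraction"],     # stochastic GB when < 1.0
    random_state=seed32,
)
```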

14. Gradient Boosted Trees (Classification) Predictor (gbt_predictor.py)
- Bundle keys (if present): {'estimator','features','target','classes',...}; falls back gracefully
to bare estimator and infers features if needed (raises KeyError if required columns are missing).
- Prediction column name: custom if configured, else "Prediction (<target>)".
- Probabilities: adds per-class "P (<target>=<class>)<suffix>" when predict_proba is available; may
also append "<prediction> (confidence)" as max probability.
- Optional: append number of boosted estimators as "<prediction> (models)".
- Ignored flag: 'useSoftVoting' (not applicable to sklearn GBT).
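
A condensed sketch of how a predictor node might apply such a bundle; the helper name and fallbacks are illustrative, not the generated API.

```python
import pandas as pd

def apply_predictor(bundle, df, pred_name=None, proba_suffix=""):
    # Unpack the bundle described above, falling back to a bare estimator.
    est = bundle.get("estimator", bundle) if isinstance(bundle, dict) else bundle
    feats = (bundle.get("features") if isinstance(bundle, dict) else None) \
            or list(getattr(est, "feature_names_in_", []))
    target = bundle.get("target", "target") if isinstance(bundle, dict) else "target"
    X = df[feats]                                   # KeyError if required columns are missing
    out = df.copy()
    pred_col = pred_name or f"Prediction ({target})"
    out[pred_col] = est.predict(X)
    if hasattr(est, "predict_proba"):
        proba = est.predict_proba(X)
        for i, cls in enumerate(est.classes_):
            out[f"P ({target}={cls}){proba_suffix}"] = proba[:, i]
        out[f"{pred_col} (confidence)"] = proba.max(axis=1)
    return out
```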

15. K Nearest Neighbor (single-node trainer + scorer) (knn.py)
- Inputs: one table with a target column (classColumn) plus feature columns.
- Feature selection: all numeric/boolean columns except the target. Values are coerced to
numeric (invalid → NaN) and filled with 0.0 to satisfy KNN distance computations.
- Hyperparameters: k (neighbors), weightByDistance → weights ('uniform'|'distance').

16. Linear Correlation (linear_corellation.py)
Settings honored (from settings.xml):
- include-list: included_names / excluded_names + enforce_option (EnforceInclusion/EnforceExclusion)
- pvalAlternative: TWO_SIDED | GREATER | LESS (re-scales p from two-sided if SciPy is available)
- columnPairsFilter: COMPATIBLE_PAIRS | ALL_PAIRS (we compute numeric↔numeric only)
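
A small sketch of re-scaling a two-sided Pearson p-value to the one-sided alternatives, assuming SciPy is installed; the data values are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
r, p_two_sided = stats.pearsonr(x, y)

# GREATER / LESS are derived from the two-sided p using the sign of r.
p_greater = p_two_sided / 2 if r > 0 else 1 - p_two_sided / 2
p_less = p_two_sided / 2 if r < 0 else 1 - p_two_sided / 2
```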

17. Logistic Regression Learner (logreg_learner.py)
- Feature selection: use included_names if set; otherwise all numeric/boolean columns minus the
target; then remove excluded_names.
- Hyperparameter mapping: solver(KNIME→sklearn), maxEpoch→max_iter, epsilon→tol, seed→random_state.
Target reference category is recorded as metadata only (no sklearn equivalent).

18. Logistic Regression Predictor (logreg_predictor.py)
- Bundle keys (if present): {'estimator','features','target','classes',...}; falls back to a bare
estimator and infers features if absent (raises KeyError if required columns are missing).
- Prediction column name: custom if configured; otherwise "Prediction (<target>)".
- Probabilities: when predict_proba exists, adds "P (<target>=<class>)<suffix>" columns.
- XML quirks: reads KNIME’s misspelled keys verbatim
(has_custom_predicition_name, include_probabilites, propability_columns_suffix).

19. Math Formula (JEP) (math_formula.py)
- Only a small set of functions is mapped (ln, log, log10, sqrt, exp, round, ceil, floor).
- Advanced JEP functions/operators not listed above are not translated.
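
A toy illustration of the function mapping; the dictionary and the example expression are assumptions about how an emitted cell might look.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup from JEP function names to numpy equivalents.
JEP_FUNCS = {"ln": np.log, "log": np.log, "log10": np.log10, "sqrt": np.sqrt,
             "exp": np.exp, "round": np.round, "ceil": np.ceil, "floor": np.floor}

df = pd.DataFrame({"x": [1.0, 4.0, 9.0]})
# A KNIME expression like  sqrt($x$) + 1  would translate roughly to:
df["result"] = JEP_FUNCS["sqrt"](df["x"]) + 1
```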

20. Missing Value Handler (missing_value.py)
- Integers: mean/median/mode fills are rounded and per-column recast to nullable Int64.
- Skip branches always contain an executable statement (pass) to avoid IndentationError.
- We never emit .fillna(None).
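
A minimal sketch of the integer mean-fill behaviour described above; the column name and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"age": pd.array([25, None, 40], dtype="Int64")})
fill = round(df["age"].mean())                        # mean fill is rounded for integer columns
df["age"] = df["age"].fillna(fill).astype("Int64")    # recast to nullable Int64
```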

21. MLP Predictor (mlp_predictor.py)
- Settings: "change prediction" (bool), "prediction column name" (string),
"append probabilities" (bool), "class probability suffix" (e.g., "_AN").

22. Naive Bayes Learner (GaussianNB with optional one-hot on categoricals) (naive_bayes_learner.py)
Settings parsed (best effort):
- classifyColumn (target)
- threshold → var_smoothing for GaussianNB
- minSdValue, minSdThreshold (not directly supported in sklearn; documented in meta)
- maxNoOfNomVals (categorical columns with > N unique values are ignored)
- skipMissingVals (True → drop rows with any missing among selected features;
False → impute numeric with mean and keep dummy_na for categoricals)

23. Naive Bayes Predictor (naive_bayes_predictor.py)
- Inputs:
Port 1 → model bundle (dict with {'estimator','features','target','classes','meta': {...}})
or a bare sklearn estimator as fallback
Port 2 → data table to score
- Prediction column:
• If settings["change prediction"] is True and a custom name is provided, uses it.
• Otherwise defaults to "Prediction (<target>)".
- Probabilities:
• If enabled, adds "P (<target>=<class>)<suffix>" per class (suffix from settings, default "_NB").
- Feature matrix reconstruction:
• Prefer bundle['features'] (order preserved).
• Else use getattr(estimator, 'feature_names_in_', None).
• Else build from data: numeric columns + one-hot for all non-numeric; then align:
- add missing expected columns with 0,
- drop extra columns not in features,
using bundle['meta'] flags when available (e.g., skip_missing) for dummy_na policy.
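
A compact sketch of the feature-matrix reconstruction and alignment step; the helper name is illustrative.

```python
import pandas as pd

def build_feature_matrix(df, features, dummy_na=False):
    # Numeric columns pass through; non-numeric columns are one-hot encoded,
    # then everything is aligned to the expected feature list.
    num = df.select_dtypes("number")
    cat = df.drop(columns=num.columns)
    X = pd.concat([num, pd.get_dummies(cat, dummy_na=dummy_na)], axis=1)
    # Add missing expected columns as 0, drop extras not in `features`.
    return X.reindex(columns=features, fill_value=0)
```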

24. Normalizer (normalizer.py)
- Column selection: use included_names if set; else all numeric dtypes (Int*/int*/Float*/float*);
drop excluded_names afterward.
- Modes: MINMAX uses new-min/new-max (constant/empty columns map to new_min);
ZSCORE uses (x-mean)/std (zero std → 0.0).
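
A sketch of the two modes on a single column, with the edge-case handling noted above; the function name is illustrative.

```python
import pandas as pd

def normalize(s: pd.Series, mode="MINMAX", new_min=0.0, new_max=1.0):
    if mode == "MINMAX":
        rng = s.max() - s.min()
        if pd.isna(rng) or rng == 0:                 # constant/empty column → new_min
            return pd.Series(new_min, index=s.index)
        return (s - s.min()) / rng * (new_max - new_min) + new_min
    if mode == "ZSCORE":
        std = s.std()
        if pd.isna(std) or std == 0:                 # zero std → 0.0
            return pd.Series(0.0, index=s.index)
        return (s - s.mean()) / std
    return s
```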

25. One to Many (One-Hot Encoding) (one_to_many.py)
- Column selection: parsed from model/columns2Btransformed with EnforceInclusion/EnforceExclusion
semantics, restricted to string-like dtypes (string/object/category).
- Naming: new columns are prefixed with the source column and '=' separator (e.g., "Region=West")
to avoid collisions when different columns share the same category label.
- Missing values: not encoded (rows with NA get all zeros for that column’s dummies).
- removeSources: if true, drops the original columns after expansion.
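
A small example of the naming and NA behaviour using pandas get_dummies; the column and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Region": ["West", "East", None]})
dummies = pd.get_dummies(df["Region"], prefix="Region", prefix_sep="=", dtype=int)
out = pd.concat([df, dummies], axis=1)
# Columns: Region, Region=East, Region=West; the NA row gets all zeros.
```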

26. Partitioning (partitioning.py)
- Implementation: sklearn.model_selection.train_test_split; seed honored when provided.
- STRATIFIED: uses class_column; NaN treated as a separate class; falls back to non-stratified if
stratification is infeasible (e.g., tiny classes).
- RELATIVE: fraction is clamped to [0,1]. ABSOLUTE: train_size is an integer bounded by len(df).
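
A rough sketch of the RELATIVE + STRATIFIED case with the fallback described above; the wrapper name and NaN placeholder are illustrative.

```python
from sklearn.model_selection import train_test_split

def partition(df, fraction=0.7, class_column=None, seed=1):
    fraction = min(max(fraction, 0.0), 1.0)                              # clamp to [0, 1]
    strat = None
    if class_column is not None:
        strat = df[class_column].astype("string").fillna("<missing>")    # NaN as its own class
    try:
        return train_test_split(df, train_size=fraction, random_state=seed, stratify=strat)
    except ValueError:
        # tiny classes make stratification infeasible → plain split
        return train_test_split(df, train_size=fraction, random_state=seed)
```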

27. Random Forest (Classification) Learner (random_forest_learner.py)
- Feature selection: use included_names if provided; otherwise all numeric/boolean columns except
the target; excluded_names are removed afterward.
- Hyperparameter mapping: nrModels→n_estimators; maxLevels>0→max_depth else None; minNodeSize→min_samples_split;
minChildSize→min_samples_leaf; isDataSelectionWithReplacement→bootstrap; dataFraction→max_samples
(only when bootstrap=True); columnSamplingMode/columnFractionPerTree/columnAbsolutePerTree plus
isUseDifferentAttributesAtEachNode→max_features ('sqrt'/'log2'/1.0/fraction/int); seed→random_state.
- Info-only flags (not applied in sklearn RF): splitCriterion, missingValueHandling,
useAverageSplitPoints, useBinaryNominalSplits; noted and ignored.

28. Random Forest (Classification) Predictor (random_forest_predictor.py)
- Ports: In1=model bundle, In2=data table, Out1=predicted table.
- Bundle keys (if present): {'estimator','features','target','classes',...}; falls back to a bare
estimator and infers features if absent (raises KeyError if required columns are missing).
- Prediction column name: custom if configured; otherwise "Prediction (<target>)".
- Probabilities: when available, adds "P (<target>=<class>)<suffix>"; may also append
"<prediction> (confidence)" as max probability. Optional "Model Count" from n_estimators.
- 'useSoftVoting' is informational; sklearn RandomForest averages probabilities by design.

29. Reference Row Splitter (reference_row_splitter.py)
• Join keys are coerced to pandas 'string' dtype; NaNs in the reference key set are ignored.
• If a configured column is missing, a clear KeyError is raised.

30. ROC Curve (roc_curve.py)
Supports both KNIME view configuration variants:
1) Newer:
- view/targetColumnV3 → truth column
- view/predictionColumnsV2/manualFilter/manuallySelected → probability columns
2) Older:
- view/targetColumn/selected → truth column
- view/predictionColumns/selected_Internals → probability columns
- (also checks view/predictionColumns/manualFilter/manuallySelected if present)

31. Row Aggregator (row_aggregator.py)
Keys used (model):
- categoryColumn (string | null)
- aggregationMethod (COUNT | SUM | AVERAGE | MINIMUM | MAXIMUM)
- frequencyColumns/selected_Internals + manualFilter/manuallySelected → aggregation column names
- weightColumn (string | null) — only SUM/AVERAGE use it
- grandTotals (boolean)

32. Row Filter (row_filter.py)
Supported operators (heuristic mapping):
- IS_MISSING → df[col].isna()
- IS_NOT_MISSING → df[col].notna()
- EQ, EQUAL(S), = → numeric compare when possible; otherwise string compare
- NE, NOT_EQUAL, <>, != → numeric compare when possible; otherwise string compare
- GT, GREATER, > → to_numeric(df[col]) > to_numeric(value)
- GE, GREATER_EQUAL, >= → to_numeric(df[col]) >= to_numeric(value)
- LT, LESS, < → to_numeric(df[col]) < to_numeric(value)
- LE, LESS_EQUAL, <= → to_numeric(df[col]) <= to_numeric(value)
- CONTAINS → df[col].astype('string').str.contains(value, case=True, na=False)
- STARTS_WITH / ENDS_WITH → df[col].astype('string').str.startswith/endswith(value, na=False)
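
A condensed sketch of the operator-to-mask mapping; only a few operators are shown and the helper name is illustrative.

```python
import pandas as pd

def row_filter_mask(df, col, op, value=None):
    s = df[col]
    if op == "IS_MISSING":
        return s.isna()
    if op == "IS_NOT_MISSING":
        return s.notna()
    if op in (">", ">=", "<", "<="):
        left = pd.to_numeric(s, errors="coerce")
        right = pd.to_numeric(value, errors="coerce")
        return {">": left > right, ">=": left >= right,
                "<": left < right, "<=": left <= right}[op]
    if op == "CONTAINS":
        return s.astype("string").str.contains(value, case=True, na=False)
    if op == "STARTS_WITH":
        return s.astype("string").str.startswith(value, na=False)
    raise ValueError(f"operator not shown in this sketch: {op}")
```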

33. RProp MLP Learner (mlp_learner.py)
- Mapping: classcol→target; hiddenlayer→#hidden layers; nrhiddenneurons→neurons per layer;
maxiter→max_iter; ignoremv→drop rows with NA in X/y; useRandomSeed/randomSeed→random_state.
- Topology: hidden_layer_sizes = [n_hidden_neurons] × n_hidden_layers.
- Implementation detail: scikit-learn has no RProp; uses MLPClassifier (solver='adam') as an
approximation.
- Features: all numeric/bool columns except target. If ignoremv=False, upstream imputation may be
required (sklearn MLP does not accept NaNs).

34. Rule Engine (rule_engine.py)
- Supported rules: TRUE => "out"; $col$ <op> value => "out" with <, <=, >, >=, =, ==, !=;
$col$ LIKE "pat" (uses * as wildcard; converted to a regex). A trailing TRUE acts as default.
- Column output: append to a new column if configured; otherwise replace the specified column;
falls back to "RuleResult" when no name is provided.
- Literals: numeric strings are emitted as numbers; everything else is a quoted Python literal.
- Limitations: no AND/OR chaining, no between/in lists, no regex beyond LIKE→wildcard, and no
type coercion beyond basic string/number handling.
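
A small sketch of the LIKE-to-regex conversion, assuming '*' is the only wildcard; the data is illustrative.

```python
import re
import pandas as pd

def like_to_regex(pattern: str) -> str:
    # '*' becomes '.*'; everything else is escaped literally.
    return "^" + ".*".join(re.escape(part) for part in pattern.split("*")) + "$"

df = pd.DataFrame({"city": ["New York", "Newark", "Boston"]})
mask = df["city"].astype("string").str.match(like_to_regex("New*"), na=False)
# mask: True, True, False
```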

35. Scorer (scorer.py)
- Columns: 'first' → truth column, 'second' → prediction column (default "Prediction (<truth>)").
ignore.missing.values=true drops NA before scoring; false keeps NA (sklearn metrics may fail).
- Confusion matrix labels: union of values from truth and prediction in order of appearance.

36. SMOTE (smote.py)
- Feature/target: uses all numeric/bool columns as features and the configured class/target.
- Methods:
• oversample_equal → sampling_strategy='auto' (minorities up to majority)
• otherwise uses rate: (0,1] → target_n ≈ rate * majority_n; >1 → target_n ≈ rate * minority_n
- kNN: k_neighbors is clamped to ≤ (minority_count - 1) to avoid imblearn errors.
- Fallbacks: if no target, no numeric features, single-class, or SMOTE raises, the original df
is returned unchanged.
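
A minimal sketch of the oversample-to-equal path with the k_neighbors clamp and fallback, assuming y is a pandas Series; the function name is illustrative.

```python
from imblearn.over_sampling import SMOTE

def oversample_equal(X, y, seed=1):
    minority = int(y.value_counts().min())
    k = max(1, min(5, minority - 1))          # clamp k_neighbors ≤ minority_count - 1
    try:
        sm = SMOTE(sampling_strategy="auto", k_neighbors=k, random_state=seed)
        return sm.fit_resample(X, y)
    except ValueError:
        return X, y                           # fall back to the original data
```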

37. Statistics (Extended) (statistics.py)
- compute_median: bool → include Median in numeric stats
- filter_nominal_columns/included_names: list → which columns to treat as nominal
- num_nominal-values_output: int → cap of categories per nominal column for Port 3 (occurrence table)

38. String Manipulation (Multi Column) (string_mamipulatioin_mc.py)
- Append vs Replace: APPEND_OR_REPLACE ∈ {"APPEND_COLUMNS","REPLACE_COLUMNS"}
* Append uses APPEND_COLUMN_SUFFIX (default "_transformed")
- Missing handling: values are processed with pandas 'string' dtype to preserve NA
- Abort flag ("Abort execution on evaluation errors"): when False, per-column exceptions are
swallowed; when True, exceptions raise and stop execution.

39. String to Number (string_to_number.py)
- Column selection: taken from model/include/included_names (present columns only).
- Separators: supports custom decimal separator and optional thousands separator.
- Target type: inferred from parse_type/cell_class (DoubleCell→Float64, Int/Long→Int64).
- Error handling: if fail_on_error==True → raise on any parse issue; otherwise coerce to NA.
- Missing values: preserved (pandas NA) via pd.to_numeric(..., errors='coerce') when not failing.
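
A sketch of the separator handling and coercion; the function name, separator values, and sample data are illustrative.

```python
import pandas as pd

def to_number(s: pd.Series, decimal_sep=",", thousands_sep=".", target="Float64", fail=False):
    txt = (s.astype("string")
            .str.replace(thousands_sep, "", regex=False)
            .str.replace(decimal_sep, ".", regex=False))
    out = pd.to_numeric(txt, errors="raise" if fail else "coerce")
    return out.astype(target)

prices = pd.Series(["1.234,50", "99,90", None])
print(to_number(prices))      # 1234.5, 99.9, <NA>
```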

40. SVM Learner (svm_learner.py)
- Feature coefficients only exist for linear/separable cases; for non-linear kernels we emit an
empty coefficient table.
- Scaling is not applied here; if KNIME’s node performs internal scaling, replicate upstream.
- Random seed: SVC uses it for probability calibration; default to 1 for reproducibility.

41. SVM Predictor (svm_predictor.py)
- Bundle keys (if present): {'estimator','features','target','classes',...}; falls back to a bare
estimator and infers features if absent (raises KeyError if required columns are missing).
- Prediction column name: custom if "change prediction" is true and a name is provided;
otherwise "Prediction (<target>)".
- Probabilities: when predict_proba exists, adds "P (<target>=<class>)<suffix>" columns.

42. Table View (table_view.py)
This view node intentionally writes NO outputs to the workflow context – it only prints.

43. Value Lookup (value_lookup.py)
Merge details:
- To avoid dtype mismatches we always cast join keys to pandas 'string' dtype.
- If caseSensitive is False we compare lowercased string keys.
- We avoid name collisions by suffixing new columns with "_lkp" when needed.
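
A rough sketch of the merge with string-cast keys, optional case folding, and the "_lkp" suffix; the helper name and temporary key column are illustrative.

```python
import pandas as pd

def value_lookup(data, dictionary, data_key, dict_key, case_sensitive=True):
    left = data[data_key].astype("string")
    right = dictionary[dict_key].astype("string")
    if not case_sensitive:
        left, right = left.str.lower(), right.str.lower()
    lkp = dictionary.assign(**{dict_key: right}).drop_duplicates(subset=[dict_key])
    merged = (data.assign(_join_key=left)
                  .merge(lkp, left_on="_join_key", right_on=dict_key,
                         how="left", suffixes=("", "_lkp")))
    return merged.drop(columns=["_join_key"])
```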

44. X-Aggregator (x_aggregator.py)
On intermediate folds: no outputs are published (is_complete=False).

45. X-Validation Partitioner (Loop Start) (x_partitioner.py)
Dependencies & helpers:
• Uses scikit-learn splitters: KFold, StratifiedKFold, LeaveOneOut.
• Relies on lxml for settings parsing and project helpers (first, first_el, normalize_in_ports,
collect_module_imports, split_out_imports, iter_entries).