| # | KNIME Node | Module | Notes |
|---|---|---|---|
1 | Color Manager | color_manager.py | KNIME color annotations are UI metadata and have no native representation in pandas; we therefore forward the input table unchanged to all outputs. |
2 | Column Appender | column_appender.py | - Settings read: selected_rowid_mode, selected_rowid_table, selected_rowid_table_number (base suffix defaults to "_r"; final suffix becomes f"{base}{k}" per right table). - Alignment: IDENTICAL → index join; other modes → positional concat with reset index. |
3 | Column Filter (exclude-only) | column_filter.py | - Excludes are parsed heuristically from settings.xml by scanning `<config>` blocks whose keys contain "exclude", collecting list entries (`<entry key='0' value='Col'/>` or `<entry key='name'/>`). - Dropping uses errors='ignore' so missing columns won't fail the cell. - If no excludes are found, the node is a passthrough. |
4 | Column Renamer | column_renamer.py | • Supports only explicit (old → new) mappings from settings.xml. • No pattern/regex templating, no type-based renames, no column reordering. |
5 | Concatenate | concatenate.py | - No suffixing or renaming of columns. - No column intersection logic; pandas default union alignment is used. - Row index is reset (0..N-1) via ignore_index=True. |
6 | CSV Reader | csv_reader.py | pandas>=1.5 recommended (nullable dtypes supported in the dtype mapping). Quote/escape characters are passed to pandas. If escapechar equals quotechar, we omit escapechar and rely on double-quote parsing (avoids C-engine "EOF inside string" errors). The dtype mapping is derived from table_spec_config_Internals; unknown types are left to inference. Path resolution supports LOCAL and RELATIVE knime.workflow only; other FS types are not yet handled. Robust NA/dtype handling (see the coercion sketch below the table): - Treat '' and ' ' as missing on read (na_values=['', ' '], keep_default_na=True, skipinitialspace=True) - Read WITHOUT dtype=..., then coerce per-column: * numeric targets ('Int64', 'Float64') via pd.to_numeric(..., errors='coerce').astype(target) * other types via .astype(target) |
7 | CSV Writer | csv_writer.py | pandas>=1.5 recommended for consistent NA/nullable dtype handling. Path resolution supports LOCAL absolute paths and RELATIVE knime.workflow; other FS types are not yet handled. Directory creation is not automatic; ensure out_path.parent exists before writing. Line terminator / quoting mode / doublequote / escapechar are not explicitly mapped unless present; pandas defaults apply. File is overwritten by default; KNIME “append/overwrite” style flags are not implemented here. |
8 | Decision Tree Learner | decision_tree_learner.py | Pruning options (e.g., pruningMethod/Reduced Error Pruning) are not available in sklearn DT; consider ccp_alpha for cost-complexity pruning if needed. First-split constraints and binary nominal split settings are not supported by sklearn. Feature importances are impurity-based (Gini/entropy); consider permutation importances if you need model-agnostic measures. Library expectations: pandas>=1.5, numpy>=1.23, scikit-learn>=1.2 recommended. |
9 | Decision Tree Predictor | decision_tree_predictor.py | The estimator itself must be scikit-learn-like. Scope: classification predictor only; multi-output and regression variants are not handled. |
10 | Equal Size Sampling | equal_size_sampling.py | Exact mode only: “Approximate” sampling is not implemented in this generator. Requires pandas; scikit-learn is used only for resample() (no synthetic example generation). Seed is used when provided; default fallback is 1 for deterministic output. Order of rows after concatenation is re-sorted back to the original index. |
11 | Excel Reader | excel_reader.py | Covered mappings (KNIME → pandas): • Path: LOCAL & RELATIVE (knime.workflow) via resolve_reader_path() • Sheet selection: sheet_selection ∈ {FIRST, NAME, INDEX} → sheet (0 | 'name' | index) • Header: table_contains_column_names + column_names_row_number → header (0-based) or None • Column range: read_from_column/read_to_column → usecols="A:D" (Excel A1-style column span) • Row range: read_from_row/read_to_row → skiprows / nrows (best-effort) • Dtypes: table_spec_config_Internals → dtype mapping (nullable pandas dtypes when possible) • Replace empty strings with missings: advanced_settings.replace_empty_strings_with_missings |
12 | Excel Writer | excel_writer.py | - Only XLSX is supported (engine='openpyxl'). Legacy XLS (xls) is not implemented. - KNIME-style row-wise append into an existing sheet is not fully replicated. Pandas does not support true “append to bottom” without custom openpyxl manipulation. - We honor if_sheet_exists and header flags but do not append rows. - Auto-size columns, print layout, formula evaluation, and “open file after exec” are not supported. |
13 | Gradient Boosted Trees (Classification) Learner | gbt_learner.py | - Feature selection: use included_names if present; otherwise all numeric/boolean columns except the target. Columns in excluded_names are removed afterward. If no target is configured, the node is a passthrough: bundle=None and empty outputs with an error note in the summary. - Hyperparameters mapped (see the mapping sketch below the table): nrModels→n_estimators, learningRate→learning_rate, maxLevels (-1/absent → sklearn default 3)→max_depth, minNodeSize→min_samples_split (≥2), minChildSize→min_samples_leaf (≥1), dataFraction (0 < fraction ≤ 1)→subsample (stochastic GB), columnSamplingMode→max_features (None/'sqrt'/'log2'/fraction/int), seed→random_state. Seed defaults to 1 for deterministic output. - Unsupported/orthogonal flags: splitCriterion (trees in sklearn GBT have a fixed criterion), missingValueHandling (impute beforehand), useAverageSplitPoints, useBinaryNominalSplits, isUseDifferentAttributesAtEachNode (no direct sklearn analog). These are noted and ignored. - Outputs: port 1=model bundle (estimator, metadata), port 2=feature_importances_, port 3=summary. - Dependencies: lxml for XML parsing; pandas/numpy for data handling; scikit-learn for modeling. - KNIME seeds can exceed 2**32-1, so we coerce to a valid sklearn seed: seed32 = None if seed is None else int(abs(int(seed)) % (2**32)) |
14 | Gradient Boosted Trees (Classification) Predictor | gbt_predictor.py | - Bundle keys (if present): {'estimator','features','target','classes',...}; falls back gracefully to a bare estimator and infers features if needed (raises KeyError if required columns are missing). - Prediction column name: custom if configured, else "Prediction (<target>)". - Probabilities: adds per-class "P (<target>=<class>)<suffix>" when predict_proba is available; may also append "<prediction> (confidence)" as the max probability (see the predictor sketch below the table). - Optional: append the number of boosted estimators as "<prediction> (models)". - Ignored flag: 'useSoftVoting' (not applicable to sklearn GBT). |
15 | K Nearest Neighbor (single-node trainer + scorer) | knn.py | - Inputs: one table with a target column (classColumn) plus feature columns. - Feature selection: all numeric/boolean columns except the target. Values are coerced to numeric (invalid → NaN) and filled with 0.0 to satisfy KNN distance computations. - Hyperparameters: k (neighbors), weightByDistance → weights ('uniform'|'distance'). |
16 | Linear Correlation | linear_corellation.py | Settings honored (from settings.xml): - include-list: included_names / excluded_names + enforce_option (EnforceInclusion/EnforceExclusion) - pvalAlternative: TWO_SIDED | GREATER | LESS (re-scales p from two-sided if SciPy is available) - columnPairsFilter: COMPATIBLE_PAIRS | ALL_PAIRS (we compute numeric↔numeric only) |
17 | Logistic Regression Learner | logreg_learner.py | - Feature selection: use included_names if set; otherwise all numeric/boolean columns minus the target; then remove excluded_names. - Hyperparameter mapping: solver (KNIME→sklearn), maxEpoch→max_iter, epsilon→tol, seed→random_state. Target reference category is recorded as metadata only (no sklearn equivalent). |
18 | Logistic Regression Predictor | logreg_predictor.py | - Bundle keys (if present): {'estimator','features','target','classes',...}; falls back to a bare estimator and infers features if absent (raises KeyError if required columns are missing). - Prediction column name: custom if configured; otherwise "Prediction (<target>)". - Probabilities: when predict_proba exists, adds "P (<target>=<class>)<suffix>" columns. - XML quirks: reads KNIME’s misspelled keys verbatim (has_custom_predicition_name, include_probabilites, propability_columns_suffix). |
19 | Math Formula (JEP) | math_formula.py | - Only a small set of functions is mapped (ln, log, log10, sqrt, exp, round, ceil, floor). - Advanced JEP functions/operators not listed above are not translated. |
20 | Missing Value Handler | missing_value.py | - Integers: mean/median/mode fills are rounded and per-column recast to nullable Int64. - Skip branches always contain an executable statement (pass) to avoid IndentationError. - We never emit .fillna(None). |
21 | MLP Predictor | mlp_predictor.py | - Settings: "change prediction" (bool), "prediction column name" (string), "append probabilities" (bool), "class probability suffix" (e.g., "_AN"). |
22 | Naive Bayes Learner (GaussianNB with optional one-hot on categoricals) | naive_bayes_learner.py | Settings parsed (best effort): - classifyColumn (target) - threshold → var_smoothing for GaussianNB - minSdValue, minSdThreshold (not directly supported in sklearn; documented in meta) - maxNoOfNomVals (categorical columns with > N unique values are ignored) - skipMissingVals (True → drop rows with any missing among selected features; False → impute numeric with mean and keep dummy_na for categoricals) |
23 | Naive Bayes Predictor | naive_bayes_predictor.py | - Inputs: Port 1 → model bundle (dict with {'estimator','features','target','classes','meta': {...}}) or a bare sklearn estimator as fallback Port 2 → data table to score - Prediction column: • If settings["change prediction"] is True and a custom name is provided, uses it. • Otherwise defaults to "Prediction (<target>)". - Probabilities: • If enabled, adds "P (<target>=<class>)<suffix>" per class (suffix from settings, default "_NB"). - Feature matrix reconstruction: • Prefer bundle['features'] (order preserved). • Else use getattr(estimator, 'feature_names_in_', None). • Else build from data: numeric columns + one-hot for all non-numeric; then align: - add missing expected columns with 0, - drop extra columns not in features, using bundle['meta'] flags when available (e.g., skip_missing) for dummy_na policy. |
24 | Normalizer | normalizer.py | - Column selection: use included_names if set; else all numeric dtypes (Int*/int*/Float*/float*); drop excluded_names afterward. - Modes (see the normalization sketch below the table): MINMAX rescales to [new-min, new-max] (constant/empty columns map to new_min); ZSCORE uses (x - mean) / std (zero std → 0.0). |
25 | One to Many (One-Hot Encoding) | one_to_many.py | - Column selection: parsed from model/columns2Btransformed with EnforceInclusion/EnforceExclusion semantics, restricted to string-like dtypes (string/object/category). - Naming: new columns are prefixed with the source column and an '=' separator (e.g., "Region=West") to avoid collisions when different columns share the same category label (see the get_dummies sketch below the table). - Missing values: not encoded (rows with NA get all zeros for that column’s dummies). - removeSources: if true, drops the original columns after expansion. |
26 | Partitioning | partitioning.py | - Implementation: sklearn.model_selection.train_test_split; seed honored when provided. - STRATIFIED: uses class_column; NaN is treated as a separate class; falls back to a non-stratified split if stratification is infeasible (e.g., tiny classes; see the split sketch below the table). - RELATIVE: fraction is clamped to [0,1]. ABSOLUTE: train_size is an integer bounded by len(df). |
27 | Random Forest (Classification) Learner | random_forest_learner.py | - Feature selection: use included_names if provided; otherwise all numeric/boolean columns except the target; excluded_names are removed afterward. - Hyperparameter mapping: nrModels→n_estimators; maxLevels>0→max_depth else None; minNodeSize→min_samples_split; minChildSize→min_samples_leaf; isDataSelectionWithReplacement→bootstrap; dataFraction→max_samples (only when bootstrap=True); columnSamplingMode/columnFractionPerTree/columnAbsolutePerTree plus isUseDifferentAttributesAtEachNode→max_features ('sqrt'/'log2'/1.0/fraction/int); seed→random_state. - Info-only flags (not applied in sklearn RF): splitCriterion, missingValueHandling, useAverageSplitPoints, useBinaryNominalSplits; noted and ignored. |
28 | Random Forest (Classification) Predictor | random_forest_predictor.py | - Ports: In1=model bundle, In2=data table, Out1=predicted table. - Bundle keys (if present): {'estimator','features','target','classes',...}; falls back to a bare estimator and infers features if absent (raises KeyError if required columns are missing). - Prediction column name: custom if configured; otherwise "Prediction (<target>)". - Probabilities: when available, adds "P (<target>=<class>)<suffix>"; may also append "<prediction> (confidence)" as max probability. Optional "Model Count" from n_estimators. - 'useSoftVoting' is informational; sklearn RandomForest averages probabilities by design. |
29 | Reference Row Splitter | reference_row_splitter.py | • Join keys are coerced to pandas 'string' dtype; NaNs in the reference key set are ignored. • If a configured column is missing, a clear KeyError is raised. |
30 | ROC Curve | roc_curve.py | Supports both KNIME view configuration variants: 1) Newer: - view/targetColumnV3 → truth column - view/predictionColumnsV2/manualFilter/manuallySelected → probability columns 2) Older: - view/targetColumn/selected → truth column - view/predictionColumns/selected_Internals → probability columns - (also checks view/predictionColumns/manualFilter/manuallySelected if present) |
31 | Row Aggregator | row_aggregator.py | Keys used (model): - categoryColumn (string | null) - aggregationMethod (COUNT | SUM | AVERAGE | MINIMUM | MAXIMUM) - frequencyColumns/selected_Internals + manualFilter/manuallySelected → aggregation column names - weightColumn (string | null; only SUM/AVERAGE use it) - grandTotals (boolean) |
32 | Row Filter | row_filter.py | Supported operators (heuristic mapping; see the operator sketch below the table): - IS_MISSING → df[col].isna() - IS_NOT_MISSING → df[col].notna() - EQ, EQUAL(S), = → numeric compare when possible; otherwise string compare - NE, NOT_EQUAL, <>, != → numeric compare when possible; otherwise string compare - GT, GREATER, > → to_numeric(df[col]) > to_numeric(value) - GE, GREATER_EQUAL, >= → to_numeric(df[col]) >= to_numeric(value) - LT, LESS, < → to_numeric(df[col]) < to_numeric(value) - LE, LESS_EQUAL, <= → to_numeric(df[col]) <= to_numeric(value) - CONTAINS → df[col].astype('string').str.contains(value, case=True, na=False) - STARTS_WITH / ENDS_WITH → df[col].astype('string').str.startswith/endswith(value, na=False) |
33 | RProp MLP Learner | mlp_learner.py | - Mapping: classcol→target; hiddenlayer→#hidden layers; nrhiddenneurons→neurons per layer; maxiter→max_iter; ignoremv→drop rows with NA in X/y; useRandomSeed/randomSeed→random_state. - Topology: hidden_layer_sizes = [n_hidden_neurons] × n_hidden_layers. - Implementation detail: scikit-learn has no RProp; uses MLPClassifier (solver='adam') as an approximation. - Features: all numeric/bool columns except target. If ignoremv=False, upstream imputation may be required (sklearn MLP does not accept NaNs). |
34 | Rule Engine | rule_engine.py | - Supported rules: TRUE => "out"; $col$ <op> value => "out" with <, <=, >, >=, =, ==, !=; $col$ LIKE "pat" (uses * as wildcard; converted to a regex). A trailing TRUE acts as default. - Column output: append to a new column if configured; otherwise replace the specified column; falls back to "RuleResult" when no name is provided. - Literals: numeric strings are emitted as numbers; everything else is a quoted Python literal. - Limitations: no AND/OR chaining, no between/in lists, no regex beyond LIKE→wildcard, and no type coercion beyond basic string/number handling. |
35 | Scorer | scorer.py | - Columns: 'first' → truth column, 'second' → prediction column (default "Prediction (<truth>)"). ignore.missing.values=true drops NA before scoring; false keeps NA (sklearn metrics may fail). - Confusion matrix labels: union of values from truth and prediction in order of appearance. |
36 | SMOTE | smote.py | - Feature/target: uses all numeric/bool columns as features and the configured class/target. - Methods (see the sampling sketch below the table): • oversample_equal → sampling_strategy='auto' (minorities up to majority) • otherwise uses rate: (0,1] → target_n ≈ rate * majority_n; >1 → target_n ≈ rate * minority_n - kNN: k_neighbors is clamped to ≤ (minority_count - 1) to avoid imblearn errors. - Fallbacks: if no target, no numeric features, single-class, or SMOTE raises, the original df is returned unchanged. |
37 | Statistics (Extended) | statistics.py | - compute_median: bool → include Median in numeric stats - filter_nominal_columns/included_names: list → which columns to treat as nominal - num_nominal-values_output: int → cap of categories per nominal column for Port 3 (occurrence table) |
38 | String Manipulation (Multi Column) | string_mamipulatioin_mc.py | - Append vs Replace: APPEND_OR_REPLACE ∈ {"APPEND_COLUMNS","REPLACE_COLUMNS"} * Append uses APPEND_COLUMN_SUFFIX (default "_transformed") - Missing handling: values are processed with pandas 'string' dtype to preserve NA - Abort flag ("Abort execution on evaluation errors"): when False, per-column exceptions are swallowed; when True, exceptions raise and stop execution. |
39 | String to Number | string_to_number.py | - Column selection: taken from model/include/included_names (present columns only). - Separators: supports a custom decimal separator and an optional thousands separator (see the conversion sketch below the table). - Target type: inferred from parse_type/cell_class (DoubleCell→Float64, Int/Long→Int64). - Error handling: if fail_on_error==True → raise on any parse issue; otherwise coerce to NA. - Missing values: preserved (pandas NA) via pd.to_numeric(..., errors='coerce') when not failing. |
40 | SVM Learner | svm_learner.py | - Feature coefficients (coef_) exist only for the linear kernel; for non-linear kernels we emit an empty coefficient table. - Scaling is not applied here; if KNIME’s node performs internal scaling, replicate it upstream. - Random seed: SVC uses it for probability calibration; defaults to 1 for reproducibility. |
41 | SVM Predictor | svm_predictor.py | - Bundle keys (if present): {'estimator','features','target','classes',...}; falls back to a bare estimator and infers features if absent (raises KeyError if required columns are missing). - Prediction column name: custom if "change prediction" is true and a name is provided; otherwise "Prediction (<target>)". - Probabilities: when predict_proba exists, adds "P (<target>=<class>)<suffix>" columns. |
42 | Table View | table_view.py | This view node intentionally writes NO outputs to the workflow context; it only prints. |
43 | Value Lookup | value_lookup.py | Merge details (see the merge sketch below the table): - To avoid dtype mismatches we always cast join keys to pandas 'string' dtype. - If caseSensitive is False we compare lowercased string keys. - We avoid name collisions by suffixing new columns with "_lkp" when needed. |
44 | X-Aggregator | x_aggregator.py | On intermediate folds: no outputs are published (is_complete=False). |
45 | X-Validation Partitioner (Loop Start) | x_partitioner.py | Dependencies & helpers • Uses scikit-learn splitters: KFold, StratifiedKFold, LeaveOneOut. • Relies on lxml for settings parsing and project helpers (first, first_el, normalize_in_ports, collect_module_imports, split_out_imports, iter_entries). |
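
The sketches below are minimal illustrations of selected rows; file names, column names, helper names, and settings values in them are assumptions, not values taken from the actual modules. This first one sketches the CSV Reader's read-then-coerce handling (row 6).

```python
import pandas as pd

# Hypothetical dtype mapping that would normally be derived from
# table_spec_config_Internals in settings.xml.
target_dtypes = {"age": "Int64", "score": "Float64", "name": "string"}

# Read WITHOUT dtype=... and treat '' / ' ' as missing.
df = pd.read_csv(
    "input.csv",                      # illustrative path
    na_values=["", " "],
    keep_default_na=True,
    skipinitialspace=True,
)

# Coerce per column: numeric targets via to_numeric(errors='coerce'),
# everything else via a plain astype.
for col, dtype in target_dtypes.items():
    if col not in df.columns:
        continue
    if dtype in ("Int64", "Float64"):
        # Int64 targets assume the coerced values are whole numbers.
        df[col] = pd.to_numeric(df[col], errors="coerce").astype(dtype)
    else:
        df[col] = df[col].astype(dtype)
```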
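
A sketch of the gbt_learner.py hyperparameter mapping and seed coercion (row 13), assuming the KNIME settings have already been parsed into a plain dict; the dict contents here are illustrative.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative values as they might be parsed from settings.xml.
knime = {
    "nrModels": 100,
    "learningRate": 0.1,
    "maxLevels": -1,                  # -1/absent → sklearn default of 3
    "minNodeSize": 2,
    "minChildSize": 1,
    "dataFraction": 1.0,
    "seed": 1634651287143108282,      # KNIME seeds can exceed 2**32 - 1
}

seed = knime.get("seed")
# Coerce to a value sklearn accepts (0 <= seed < 2**32).
seed32 = None if seed is None else int(abs(int(seed)) % (2**32))

max_levels = knime.get("maxLevels", -1)
est = GradientBoostingClassifier(
    n_estimators=knime["nrModels"],
    learning_rate=knime["learningRate"],
    max_depth=3 if max_levels in (-1, None) else max_levels,
    min_samples_split=max(2, knime["minNodeSize"]),
    min_samples_leaf=max(1, knime["minChildSize"]),
    subsample=knime["dataFraction"],  # <1.0 enables stochastic GB
    random_state=seed32,
)
```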
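
A sketch of the shared predictor pattern used by rows 14, 18, 28, and 41 (prediction column plus per-class probability columns); the helper name and bundle layout follow the notes above but are otherwise assumptions.

```python
import pandas as pd

def apply_predictor(bundle, data: pd.DataFrame, suffix: str = "") -> pd.DataFrame:
    """Score `data` with a bundle like {'estimator','features','target','classes',...}."""
    est = bundle["estimator"]
    features = bundle.get("features") or list(getattr(est, "feature_names_in_", []))
    target = bundle.get("target", "target")

    out = data.copy()
    X = out[features]                 # KeyError if required columns are missing
    pred_col = f"Prediction ({target})"
    out[pred_col] = est.predict(X)

    if hasattr(est, "predict_proba"):
        proba = est.predict_proba(X)
        for i, cls in enumerate(est.classes_):
            out[f"P ({target}={cls}){suffix}"] = proba[:, i]
        # Confidence = max class probability per row.
        out[f"{pred_col} (confidence)"] = proba.max(axis=1)
    return out
```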
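
A sketch of the Normalizer's two modes (row 24); the function name and defaults are assumptions.

```python
import pandas as pd

def normalize(df: pd.DataFrame, cols, mode="MINMAX", new_min=0.0, new_max=1.0):
    out = df.copy()
    for col in cols:
        s = pd.to_numeric(out[col], errors="coerce")
        if mode == "MINMAX":
            rng = s.max() - s.min()
            if pd.isna(rng) or rng == 0:
                # Constant/empty column maps to new_min.
                out[col] = new_min
            else:
                out[col] = (s - s.min()) / rng * (new_max - new_min) + new_min
        elif mode == "ZSCORE":
            std = s.std()
            # Zero/undefined std → 0.0, otherwise (x - mean) / std.
            out[col] = 0.0 if (pd.isna(std) or std == 0) else (s - s.mean()) / std
    return out
```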
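
A sketch of the One to Many expansion (row 25) using pandas get_dummies with '=' as the prefix separator; the example frame is illustrative.

```python
import pandas as pd

def one_to_many(df: pd.DataFrame, cols, remove_sources: bool = False) -> pd.DataFrame:
    # "Region" → "Region=West", "Region=East", ...; rows with NA get all-zero dummies.
    dummies = pd.get_dummies(df[cols], prefix=cols, prefix_sep="=", dtype=int)
    out = pd.concat([df, dummies], axis=1)
    if remove_sources:
        out = out.drop(columns=cols)
    return out

df = pd.DataFrame({"Region": ["West", "East", None], "Sales": [1, 2, 3]})
print(one_to_many(df, ["Region"], remove_sources=True))
```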
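
A sketch of the Partitioning node's stratified split with a non-stratified fallback (row 26); the fraction, seed, and missing-class handling shown are illustrative.

```python
from sklearn.model_selection import train_test_split

def partition(df, fraction=0.7, class_column=None, seed=1):
    stratify = None
    if class_column is not None:
        # NaN is treated as its own class so those rows are not dropped.
        stratify = df[class_column].astype("string").fillna("<missing>")
    try:
        train, test = train_test_split(
            df, train_size=fraction, stratify=stratify, random_state=seed
        )
    except ValueError:
        # e.g. a class with a single member → fall back to a plain split.
        train, test = train_test_split(df, train_size=fraction, random_state=seed)
    return train, test
```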
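
A sketch of the Row Filter operator mapping (row 32), covering a subset of the operators listed in the table; the helper name is an assumption.

```python
import pandas as pd

def row_filter_mask(df: pd.DataFrame, col: str, operator: str, value=None) -> pd.Series:
    op = operator.upper()
    if op == "IS_MISSING":
        return df[col].isna()
    if op == "IS_NOT_MISSING":
        return df[col].notna()
    if op in ("GT", "GREATER", ">"):
        return pd.to_numeric(df[col], errors="coerce") > pd.to_numeric(value)
    if op in ("LE", "LESS_EQUAL", "<="):
        return pd.to_numeric(df[col], errors="coerce") <= pd.to_numeric(value)
    if op == "CONTAINS":
        return df[col].astype("string").str.contains(str(value), case=True, na=False)
    if op in ("EQ", "EQUALS", "="):
        try:
            # Numeric compare when possible, string compare otherwise.
            return pd.to_numeric(df[col], errors="coerce") == float(value)
        except (TypeError, ValueError):
            return df[col].astype("string") == str(value)
    raise ValueError(f"Unsupported operator: {operator}")
```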
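
A sketch of the SMOTE node's sampling-strategy and k_neighbors handling (row 36), assuming imbalanced-learn is installed; restricting the rate handling to a single minority class is a simplification.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

def run_smote(X, y, method="oversample_equal", rate=1.0, seed=1):
    counts = Counter(y)
    minority_n = min(counts.values())
    majority_n = max(counts.values())

    if method == "oversample_equal":
        strategy = "auto"             # all minorities up to the majority size
    else:
        # rate in (0,1] scales the majority count, rate > 1 scales the minority count.
        target_n = rate * (majority_n if rate <= 1.0 else minority_n)
        minority_cls = min(counts, key=counts.get)
        strategy = {minority_cls: max(int(target_n), counts[minority_cls])}

    # k_neighbors must stay below the minority class size.
    k = max(1, min(5, minority_n - 1))
    sm = SMOTE(sampling_strategy=strategy, k_neighbors=k, random_state=seed)
    return sm.fit_resample(X, y)
```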
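
A sketch of the String to Number conversion (row 39); the separators, column list, and example values are illustrative.

```python
import pandas as pd

def string_to_number(df, cols, decimal=",", thousands=".",
                     target="Float64", fail_on_error=False):
    out = df.copy()
    for col in cols:
        s = out[col].astype("string")
        if thousands:
            s = s.str.replace(thousands, "", regex=False)
        if decimal and decimal != ".":
            s = s.str.replace(decimal, ".", regex=False)
        errors = "raise" if fail_on_error else "coerce"
        out[col] = pd.to_numeric(s, errors=errors).astype(target)
    return out

df = pd.DataFrame({"price": ["1.234,50", "99,90", None]})
print(string_to_number(df, ["price"]))  # 1234.5, 99.9, <NA>
```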
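
A sketch of the Value Lookup merge (row 43); the key and column names are assumptions.

```python
import pandas as pd

def value_lookup(data, dictionary, data_key, dict_key, case_sensitive=True):
    left = data.copy()
    right = dictionary.copy()

    # Cast join keys to pandas 'string' dtype to avoid dtype mismatches.
    left["_key"] = left[data_key].astype("string")
    right["_key"] = right[dict_key].astype("string")
    if not case_sensitive:
        left["_key"] = left["_key"].str.lower()
        right["_key"] = right["_key"].str.lower()

    merged = left.merge(
        right.drop(columns=[dict_key]),
        on="_key",
        how="left",
        suffixes=("", "_lkp"),        # avoid collisions on appended columns
    )
    return merged.drop(columns=["_key"])
```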