
rdga_4k

Random Data Generator Algorithm for Clustering



The rdga_4k library generates synthetic labeled datasets for use with clustering algorithms. It provides two core functions for customizable dataset generation: catbird, which produces binary features, and canard, which produces categorical features.


🔥 Features


🛠 Installation

Install directly from GitHub with pip (git must be installed):

pip install git+https://github.com/aquinordg/rdga_4k.git


🚀 Usage

Import the library and use the catbird or canard functions to generate datasets:

from rdga_4k import catbird, canard

# Example using catbird
X, y = catbird(
    n_feat=10,
    feat_sig=[3, 2],
    rate=[50, 50],
    lmbd=0.7,
    eps=0.1,
    random_state=42
)

# Example using canard
X, y = canard(
    n_feat=10,
    n_cat=3,
    rate=[50, 50],
    lmbd=5,
    eps=0.2,
    random_state=42
)
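
Once a dataset has been generated, a quick sanity check is to look at its dimensions and at how many samples carry each label. The sketch below is illustrative only: `summarize` is a hypothetical helper, not part of rdga_4k, and it assumes that `X` is a two-dimensional array-like (samples by features) and `y` a one-dimensional sequence of labels, which this README does not state explicitly. Stand-in data is used so the snippet runs without the library.

```python
from collections import Counter


def summarize(X, y):
    """Return (n_samples, n_features, label_counts) for a labeled dataset.

    Hypothetical helper, not part of rdga_4k. Assumes X is indexable by row
    and y is a flat sequence of hashable labels.
    """
    n_samples = len(X)
    n_features = len(X[0]) if n_samples else 0
    if len(y) != n_samples:
        raise ValueError("X and y must have the same number of rows")
    label_counts = dict(Counter(y))
    return n_samples, n_features, label_counts


# Stand-in data; replace with the X, y returned by catbird or canard.
X_demo = [[0, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0]]
y_demo = [0, 0, 1, 1]

print(summarize(X_demo, y_demo))  # (4, 3, {0: 2, 1: 2})
```

If the returned objects behave differently (for example, a DataFrame rather than an array), the row indexing in the helper would need adjusting.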

📜 Functions Overview

catbird

Generates a labeled dataset with binary features based on feature clustering.

Parameters

Returns

Example

X, y = catbird(n_feat=10, feat_sig=[3, 2], rate=[50, 50], lmbd=0.7, eps=0.1, random_state=42)

canard

Generates a labeled dataset with categorical features, each of which can take one of several categories.

Parameters

Returns

Example

X, y = canard(n_feat=10, n_cat=3, rate=[50, 50], lmbd=5, eps=0.2, random_state=42)
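
Since catbird is described as producing binary features and canard categorical ones, it can be useful to confirm which values actually occur in a generated dataset. The following is a minimal sketch under assumptions: `distinct_values` is a hypothetical helper, not part of rdga_4k, and it assumes `X` can be iterated row by row with hashable, sortable cell values. The stand-in arrays only imitate what the two generators might return.

```python
def distinct_values(X):
    """Collect the distinct values found across all cells of X, sorted.

    Hypothetical helper, not part of rdga_4k.
    """
    return sorted({value for row in X for value in row})


# Stand-in data; replace with the X returned by catbird or canard.
X_binary = [[0, 1, 1], [1, 0, 0]]
X_categorical = [[0, 2, 1], [1, 1, 2]]

print(distinct_values(X_binary))       # [0, 1]
print(distinct_values(X_categorical))  # [0, 1, 2]
```

For a binary dataset one would expect at most two distinct values; how the categorical values are encoded by canard is not specified here, so the integer codes above are an assumption.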

📄 License

This project is licensed under the MIT License.


🤝 Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.
  2. Create a new branch.
  3. Commit your changes.
  4. Push to the branch.
  5. Open a pull request.

For questions or further information, contact aquinordga@gmail.com.


👨‍💻 Author

Developed by AQUINO, R. D. G. (profiles: Lattes, ORCID, Google Scholar)


💬 Feedback

Feel free to open an issue or contact me for feedback or feature requests. Your input is highly appreciated!