Brute Force Plotter

[Work in progress] Tool to visualize data quickly with no brain usage for plot creation

Installation

Using UV (Recommended)

UV is a fast Python package installer and resolver. First, install UV:

$ pip install uv

Then install the project using:

$ git clone https://github.com/eyadsibai/brute_force_plotter.git
$ cd brute_force_plotter
$ uv sync

This will create a virtual environment (.venv) and install all dependencies with locked versions for reproducibility.

Useful UV Commands:

  • uv sync - Install dependencies and sync the environment
  • uv add <package> - Add a new dependency
  • uv remove <package> - Remove a dependency
  • uv lock - Update the lockfile
  • uv run <command> - Run a command in the virtual environment

Usage

As a Python Library (NEW!)

You can now use brute-force-plotter directly in your Python scripts:

import pandas as pd
import brute_force_plotter as bfp

# Load your data
data = pd.read_csv('data.csv')

# Define data types (c=category, n=numeric, t=time series, g=geocoordinate, i=ignore)
# Option 1: Automatic type inference (NEW!)
output_path, dtypes = bfp.plot(data)
print(f"Inferred types: {dtypes}")

# Option 2: Manual type definition
dtypes = {
    'column1': 'n',  # numeric
    'column2': 'c',  # category
    'column3': 't',  # time series (datetime)
    'column4': 'i'   # ignore
}

# Create and save plots (always returns tuple)
output_path, dtypes_used = bfp.plot(data, dtypes, output_path='./plots')

# Or show plots interactively
bfp.plot(data, dtypes, show=True)

# Example with geocoordinates
geo_data = pd.read_csv('cities.csv')
geo_dtypes = {
    'latitude': 'g',   # geocoordinate
    'longitude': 'g',  # geocoordinate
    'city_type': 'c',  # category
    'population': 'n'  # numeric
}
bfp.plot(geo_data, geo_dtypes, output_path='./maps')

# Generate minimal set of plots (reduces redundant visualizations)
output_path, dtypes_used = bfp.plot(data, dtypes, output_path='./plots', minimal=True)

# Option 3: Manually infer types first, then edit if needed
dtypes = bfp.infer_dtypes(data)
# Edit dtypes as needed...
output_path, dtypes_used = bfp.plot(data, dtypes, output_path='./plots')

See example/library_usage_example.py for more examples.

As a Command-Line Tool

Example

Tested on Python 3 only (Python 3.10+ required).

Using UV:

$ git clone https://github.com/eyadsibai/brute_force_plotter.git
$ cd brute_force_plotter
$ uv sync

# With automatic type inference (NEW!)
$ uv run python -m src example/titanic.csv example/output --infer-dtypes --save-dtypes example/auto_dtypes.json

# With manual type definition
$ uv run python -m src example/titanic.csv example/titanic_dtypes.json example/output

# Or use the brute-force-plotter command:
$ uv run brute-force-plotter example/titanic.csv example/titanic_dtypes.json example/output

Command Line Options

  • --skip-existing: Skip generating plots that already exist (default: True)
  • --theme: Choose plot style theme (darkgrid, whitegrid, dark, white, ticks) (default: darkgrid)
  • --n-workers: Number of parallel workers for plot generation (default: 4)
  • --export-stats: Export statistical summary to CSV files
  • --minimal: Generate minimal set of plots (reduces redundant visualizations)
  • --infer-dtypes: Automatically infer data types from the data (NEW!)
  • --save-dtypes PATH: Save inferred or used dtypes to a JSON file (NEW!)
  • --max-rows: Maximum number of rows before sampling is applied (default: 100,000)
  • --sample-size: Number of rows to sample for large datasets (default: 50,000)
  • --no-sample: Disable sampling for large datasets (may cause memory issues)

Using UV:

$ uv run brute-force-plotter example/titanic.csv example/titanic_dtypes.json example/output --theme whitegrid --n-workers 8 --export-stats

# Generate minimal set of plots (fewer redundant visualizations)
$ uv run brute-force-plotter example/titanic.csv example/titanic_dtypes.json example/output --minimal

# Combine automatic type inference with other options
$ uv run brute-force-plotter example/titanic.csv example/output --infer-dtypes --save-dtypes example/auto_dtypes.json --theme whitegrid --n-workers 8 --export-stats
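
If you saved inferred dtypes with --save-dtypes, the resulting JSON can be reused from Python later. A minimal sketch, assuming the example paths used above:

import json

import pandas as pd
import brute_force_plotter as bfp

# Load the dtypes JSON produced by --save-dtypes
with open("example/auto_dtypes.json") as f:
    dtypes = json.load(f)

data = pd.read_csv("example/titanic.csv")
bfp.plot(data, dtypes, output_path="example/output")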

Large Dataset Handling

For datasets exceeding 100,000 rows, brute-force-plotter automatically samples the data to improve performance and reduce memory usage. This ensures plots are generated quickly even with millions of rows.

Default Behavior:

  • Datasets with ≤ 100,000 rows: No sampling, all data is used
  • Datasets with > 100,000 rows: Automatically samples 50,000 rows for visualization
  • Statistical exports (--export-stats) always use the full dataset for accuracy
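
The sampling behavior described above can be pictured as a few lines of pandas. This is only a sketch, not the library's actual code, and the helper name maybe_sample is illustrative:

import pandas as pd

def maybe_sample(df: pd.DataFrame, max_rows: int = 100_000,
                 sample_size: int = 50_000, no_sample: bool = False) -> pd.DataFrame:
    """Return the frame unchanged if it is small enough, otherwise a reproducible sample."""
    if no_sample or len(df) <= max_rows:
        return df
    # Fixed seed (42) so repeated runs produce the same plots
    return df.sample(n=sample_size, random_state=42)

# Plots are generated from the (possibly sampled) frame, while --export-stats
# statistics are still computed on the full dataset.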

Customization:

# Increase sampling threshold to 200,000 rows
$ python3 -m src data.csv dtypes.json output --max-rows 200000

# Use a larger sample size (75,000 rows)
$ python3 -m src data.csv dtypes.json output --sample-size 75000

# Disable sampling entirely (use with caution for very large datasets)
$ python3 -m src data.csv dtypes.json output --no-sample

Time Series Example

The tool now supports time series data! Here's how to visualize time series:

# Generate example time series data
$ python3 example/timeseries_example.py

# Plot the time series data
$ python3 -m src example/timeseries_data.csv example/timeseries_dtypes.json example/timeseries_output

The time series example generates plots for:

  • Single time series line plots
  • Numeric values over time (e.g., sales over time)
  • Multiple time series overlays
  • Grouped time series by category (e.g., sales by region over time)

Time Series dtypes example:

{
  "date": "t",
  "temperature": "n",
  "sales": "n",
  "region": "c",
  "id": "i"
}

Here date is the time series column, temperature and sales are numeric values plotted over time, region groups the series by category, and id is ignored.
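
To try this without running the example script, you can build a matching DataFrame in memory and pass the same dtypes; a small sketch (the column values are made up for illustration):

import numpy as np
import pandas as pd
import brute_force_plotter as bfp

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=90, freq="D")
data = pd.DataFrame({
    "date": np.tile(dates, 2),                    # time series column
    "temperature": rng.normal(20, 5, size=180),   # numeric, plotted over time
    "sales": rng.poisson(100, size=180),          # numeric, plotted over time
    "region": ["north"] * 90 + ["south"] * 90,    # category, groups the series
    "id": range(180),                             # ignored
})

dtypes = {"date": "t", "temperature": "n", "sales": "n", "region": "c", "id": "i"}
bfp.plot(data, dtypes, output_path="./timeseries_plots")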

Library Usage:

import pandas as pd
import brute_force_plotter as bfp

# Load a large dataset
data = pd.read_csv('large_data.csv')  # e.g., 500,000 rows

dtypes = {'col1': 'n', 'col2': 'c'}

# Automatic sampling (default: max_rows=100000, sample_size=50000)
bfp.plot(data, dtypes, output_path='./plots')

# Custom sampling parameters
bfp.plot(data, dtypes, output_path='./plots', max_rows=200000, sample_size=75000)

# Disable sampling
bfp.plot(data, dtypes, output_path='./plots', no_sample=True)

Note: Sampling uses a fixed random seed (42) for reproducibility, ensuring consistent results across multiple runs.

Arguments

  • The first argument is the input file (a CSV file with the data), e.g. example/titanic.csv

  • The second argument is a JSON file with the data type of each column:

    • c for category
    • n for numeric
    • t for time series (datetime)
    • g for geocoordinate (latitude/longitude) - NEW!
    • i for ignore

    Example: example/titanic_dtypes.json

  • To get a starting point for this file, you can dump your DataFrame's pandas dtypes and then map them to the codes above:

    json.dump({k: v.name for k, v in df.dtypes.to_dict().items()}, open('dtypes.json', 'w'))

For the Titanic example, the dtypes file looks like:

{
  "Survived": "c",
  "Pclass": "c",
  "Sex": "c",
  "Age": "n",
  "SibSp": "n",
  "Parch": "n",
  "Fare": "n",
  "Embarked": "c",
  "PassengerId": "i",
  "Ticket": "i",
  "Cabin": "i",
  "Name": "i"
}
  • The third argument is the output directory

Geocoordinate Example

For data with latitude and longitude columns:

{
  "city": "i",
  "latitude": "g",
  "longitude": "g",
  "population": "n",
  "category": "c"
}

See example/cities_geo.csv and example/cities_geo_dtypes.json for a complete example.


Minimal Mode

The --minimal flag reduces the number of plots generated by removing redundant visualizations while keeping the most informative ones:

What's reduced in minimal mode:

  • Correlation matrices: Only Spearman correlation (removes Pearson correlation)
    • Spearman is more robust to outliers and works for both linear and monotonic relationships
  • Category vs Category: Only heatmap (removes bar plot)
    • Heatmap shows the same information more compactly
  • Category vs Numeric: Only box plot and violin plot (removes bar plot and strip plot)
    • Box and violin plots are the most informative for showing distributions

What's kept in minimal mode:

  • All single-variable distributions (histograms, violin plots, bar plots)
  • All numeric vs numeric scatter plots
  • Missing values heatmap

Example reduction: For the Titanic dataset, minimal mode generates 38 plots instead of 45 (15.6% reduction).

Use --minimal when you want to:

  • Reduce clutter in your output directory
  • Focus on the most informative visualizations
  • Speed up plot generation for large datasets

Features

The tool automatically generates:

Distribution Plots:

  • Histogram with KDE for numeric variables
  • Violin plots for numeric variables
  • Bar plots for categorical variables
  • Correlation matrices (Pearson and Spearman, or just Spearman in minimal mode)
  • Line plots for time series variables
  • Missing values heatmap

2D Interaction Plots:

  • Scatter plots for numeric vs numeric
  • Heatmaps for categorical vs categorical (and bar plots in full mode)
  • Bar/Box/Violin/Strip plots for categorical vs numeric (Box/Violin only in minimal mode)
  • Line plots for time series vs numeric (values over time)
  • Multiple time series overlays for time series vs time series

3D Interaction Plots:

  • Grouped time series plots (time series + category + numeric)
    • Shows how numeric values change over time, grouped by categorical values

Map Visualizations (NEW!):

  • Interactive maps for geocoordinate data (latitude/longitude)
  • Color-coded markers based on categorical variables
  • Automatic detection of lat/lon column pairs
  • Support for common naming patterns (lat, lon, latitude, longitude, x_coord, y_coord); see the sketch after this list
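
A rough sketch of how this kind of name-based detection could work; the pattern list and helper are illustrative, not the library's actual implementation:

import re

import pandas as pd

LAT_PATTERN = re.compile(r"^(lat|latitude|y_coord)$", re.IGNORECASE)
LON_PATTERN = re.compile(r"^(lon|longitude|x_coord)$", re.IGNORECASE)

def find_latlon_pair(df: pd.DataFrame):
    """Return (lat_column, lon_column) if both can be found by name, else None."""
    lat = next((c for c in df.columns if LAT_PATTERN.match(str(c))), None)
    lon = next((c for c in df.columns if LON_PATTERN.match(str(c))), None)
    return (lat, lon) if lat and lon else None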

Statistical Summaries (with --export-stats; see the sketch after this list):

  • Numeric statistics (mean, std, min, max, quartiles)
  • Category value counts
  • Missing values analysis
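
The exports correspond to standard pandas summaries; a rough sketch of roughly equivalent calls (the output file names are illustrative):

import pandas as pd

df = pd.read_csv("example/titanic.csv")

# Numeric statistics: mean, std, min, max, quartiles
df.describe().to_csv("numeric_stats.csv")

# Value counts for a categorical column
df["Embarked"].value_counts().to_csv("embarked_value_counts.csv")

# Missing values per column
df.isna().sum().to_csv("missing_values.csv")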

Example Plots

Age Distribution (Histogram with Kernel Density Estimation, Violin Plot)

Heatmap for Sex and Pclass

Pclass vs Survived

Survived vs Age

Age vs Fare

Testing

The project includes a comprehensive test suite with 81+ tests covering unit tests, integration tests, and edge cases.

Running Tests

# Run all tests
$ pytest

# Run with coverage report
$ pytest --cov=src --cov-report=html

# Run specific test categories
$ pytest -m unit          # Unit tests only
$ pytest -m integration   # Integration tests only
$ pytest -m edge_case     # Edge case tests only

# Run tests in parallel (faster)
$ pytest -n auto

# Run with verbose output
$ pytest -v

Test Coverage

The test suite achieves ~96% code coverage and includes:

  • Unit tests: Core plotting functions, utilities, statistical exports, large dataset handling
  • Integration tests: CLI interface, library interface, end-to-end workflows
  • Edge case tests: Empty data, missing values, many categories, Unicode support

Writing Tests

When contributing, please:

  1. Add tests for new features in the appropriate test file
  2. Ensure tests pass locally before submitting PR
  3. Aim for >90% code coverage for new code
  4. Use the fixtures in conftest.py for test data (a sketch follows below)
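
A sketch of what a contributed test might look like; sample_df is a hypothetical fixture name, so check conftest.py for the fixtures that actually exist:

import pytest

import brute_force_plotter as bfp

@pytest.mark.unit
def test_plot_returns_path_and_dtypes(tmp_path, sample_df):
    # sample_df: hypothetical DataFrame fixture; tmp_path is pytest's built-in temp directory
    output_path, dtypes_used = bfp.plot(sample_df, output_path=str(tmp_path))
    assert output_path is not None
    assert isinstance(dtypes_used, dict)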

Development

Setting Up for Development

When developing for this project, it's important to set up code quality tools to ensure consistency:

1. Install Development Dependencies

Using UV:

$ uv sync  # Installs all dependencies including dev tools

2. Install Pre-commit Hooks (REQUIRED)

This project uses pre-commit hooks to automatically enforce code quality standards on every commit:

$ pre-commit install

After installation, the hooks will run automatically on git commit and check:

  • ✅ Ruff linting (with auto-fix)
  • ✅ Ruff formatting
  • ✅ Trailing whitespace removal
  • ✅ End-of-file fixes
  • ✅ YAML/JSON/TOML validation
  • ✅ Large file detection

3. Manual Code Quality Checks

You can also run these checks manually:

# Lint code (check for issues)
$ ruff check .

# Lint and auto-fix issues
$ ruff check --fix .

# Format code
$ ruff format .

# Run all pre-commit hooks on all files
$ pre-commit run --all-files

4. Running Tests

Always run tests before submitting changes:

$ pytest

Why Pre-commit Hooks?

Pre-commit hooks ensure that:

  • All code follows consistent style guidelines
  • Linting issues are caught before they reach CI
  • Code quality is maintained automatically
  • Review cycles are faster (no style nitpicks)

Note: If you try to commit code that doesn't pass the checks, the commit will be blocked. Fix the issues reported and commit again.

Recent Updates (2025)

✅ Updated all dependencies to latest stable versions
✅ Added correlation matrix plots (Pearson and Spearman)
✅ Added missing values visualization
✅ Added statistical summary export
✅ Added configurable plot themes
✅ Added parallel processing controls
✅ Added skip-existing-plots option
✅ Improved logging and progress indicators
✅ Code cleanup and better error handling
✅ Interactive map visualization for geocoordinate data (NEW!)
✅ Time series support with line plots, grouped plots, and multi-series overlays
✅ Automatic data type inference - no need to manually specify data types!
✅ Comprehensive test suite with ~96% coverage (81+ tests)
✅ Large dataset fallback with automatic sampling

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for detailed guidelines on:

  • Setting up your development environment
  • Using code quality tools (Ruff, pre-commit)
  • Submitting pull requests
  • Coding standards and best practices

Code Organization

The project follows a modular architecture for better maintainability and reduced merge conflicts:

src/
├── core/               # Core functionality
│   ├── config.py      # Global configuration
│   ├── data_types.py  # Type inference
│   └── utils.py       # Utilities
├── plotting/          # Visualization modules
│   ├── base.py        # Common plotting functions
│   ├── single_variable.py
│   ├── two_variable.py
│   ├── three_variable.py
│   ├── summary.py
│   ├── timeseries.py
│   └── maps.py
├── stats/             # Statistical exports
│   └── export.py
├── cli/               # Command-line interface
│   ├── commands.py
│   └── orchestration.py
├── library.py         # Python API
└── brute_force_plotter.py  # Compatibility layer

This structure enables parallel development and makes it easier to locate and modify specific functionality.

Contributors

Code Contributors

Special Thanks

The following haven't provided code directly, but have provided guidance and advice:

License

This project is licensed under the MIT License - see the LICENSE file for details.
