SoftwareEnergyCost

Google Summer of Code 2023




Contributor: Manas Pratim Biswas

Description


Project Details

Estimate the energy efficiency and performance of a scientific software, Baler, and attempt to identify where this efficiency can be improved.

Background: The Large Hadron Collider (LHC) experiments generate massive datasets composed of billions of proton-proton collisions. The analysis of this data requires high-throughput scientific computing that relies on efficient software algorithms. In today's world, where the energy crisis and environmental issues are becoming more pressing concerns, it is crucial that we start taking action to develop sustainable software solutions. As scientific software is being used more and more in high-throughput computing, there is a growing need to optimize its energy efficiency and reduce its carbon footprint.


Project Report

The Baler project is a collaboration among 12 research physicists, computer scientists, and machine learning experts at the universities of Lund, Manchester, and Uppsala. Baler is a tool that uses machine learning to derive a compression that is tailored to the user's input data, achieving large data reduction and high fidelity where it matters. Read more about Baler

Throughout the summer, I have spent most of my time exploring profilers and learning about profiling small code snippets and software in general.

Initially, I profiled Baler with multiple profilers using varied techniques. Some profilers, like codecarbon, had to be used as wrappers or invoked via their APIs, while others, like cProfile, pyinstruments, and powermetrics, had to be run as standalone commands with optional flags directly from the terminal.

💡 Visualizing cProfile logs

How to profile Baler using cProfile?

Baler can be profiled with cProfile by adding the flag --cProfile while training

Example:

poetry run baler --project CFD_workspace CFD_project_animation --mode train --cProfile

The profile logs will be stored at: workspaces/CFD_workspace/CFD_project_animation/output/profiling/

Note: A Keyboard Interrupt is necessary to stop and exit from the SnakeViz server
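Besides SnakeViz, the same `.prof` logs can be inspected programmatically with Python's standard `pstats` module. Below is a minimal, self-contained sketch; the toy workload and the file name `baler_train.prof` are illustrative, not Baler's actual output:

```python
# Sketch: inspecting a cProfile dump with the standard pstats module.
# The workload and the file name "baler_train.prof" are illustrative.
import cProfile
import pstats

def busy_work(n):
    """Toy workload standing in for a training step."""
    return sum(i * i for i in range(n))

pr = cProfile.Profile()
pr.enable()
busy_work(100_000)
pr.disable()
pr.dump_stats("baler_train.prof")  # write the stats to disk

stats = pstats.Stats("baler_train.prof")
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
```

Sorting by `cumulative` surfaces the functions that dominate the call tree, which is the same view SnakeViz renders graphically.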

cProfile profiles visualized using SnakeViz

call stack generated by SnakeViz from cProfile profiles

🕓 The majority of the time is spent by the optimizer performing gradient descent

Directed Graphs (DiGraphs):

Call Graphs:

Usage:

This is the call graph generated, rooted at the perform_training() function, when Baler is trained for 2000 epochs on the CFD dataset

🕓 The majority of the time is spent by the optimizer performing gradient descent
🕓 Backpropagation takes more time than forward propagation

Hence, the two sets of results are consistent with each other
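The caller/callee edges that such a call graph is drawn from come straight out of the cProfile data and can be extracted with `pstats`. A minimal sketch with illustrative toy functions (not Baler code):

```python
# Sketch: recovering caller -> callee edges (the call graph) from
# cProfile data via pstats. The toy functions below are illustrative.
import cProfile
import pstats

def inner():
    return sum(range(1000))

def outer():
    return inner() + inner()

pr = cProfile.Profile()
pr.enable()
outer()
pr.disable()

stats = pstats.Stats(pr)
# stats.stats maps each profiled function to (cc, nc, tt, ct, callers);
# each (caller, callee) pair is one directed edge of the call graph.
edges = []
for callee, (cc, nc, tt, ct, callers) in stats.stats.items():
    for caller in callers:
        edges.append((caller[2], callee[2]))  # keep function names only

print(sorted(edges))
```

Graph tools then only need to lay these edges out; the timing columns (`tt`, `ct`) are what get rendered as node sizes or percentages.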

💡 Powermetrics with InfluxDB

Powermetrics is a built-in macOS tool that monitors CPU usage and determines how much CPU time and CPU power is being allocated to the different quality-of-service (QoS) classes.

InfluxDB is a time-series database that can be used to visualize the real-time logs generated by Powermetrics.

  1. Install InfluxDB
brew install influxdb

In case you don't have Homebrew, get it from here

  2. Starting the InfluxDB service
brew services start influxdb

InfluxDB runs on port 8086 by default. Assuming that you have not made any changes to its default configuration, you can see if everything is properly installed and running by accessing it on localhost at port 8086

http://127.0.0.1:8086 or http://localhost:8086

If the InfluxDB welcome page loads, then everything is set up properly. Instructions to set up the credentials and bucket can be found in the official documentation
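Besides loading the welcome page in a browser, InfluxDB's `/health` endpoint can be checked programmatically. A stdlib-only sketch, assuming the default port:

```python
# Sketch: programmatic health check against InfluxDB's /health endpoint.
# Uses only the standard library; URL assumes the default configuration.
import json
import urllib.request

def influxdb_is_healthy(url="http://localhost:8086/health", timeout=2):
    """Return True if the /health endpoint reports status 'pass'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status") == "pass"
    except OSError:
        # connection refused, timeout, DNS failure, ...
        return False
```

This returns `False` rather than raising when the service is down, which makes it convenient to call from a setup script before starting a profiling run.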

  3. Scripting to save the powermetrics logs into InfluxDB

This script should be executed before running Baler. The script runs continuously in the background while Baler runs in one of its modes in a separate terminal.

from datetime import datetime
import subprocess
import sys

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

def powermetrics_profile():

    token = "<Your API Token>"
    org = "<Your Organization Name>"
    bucket = "<Your Bucket Name>"

    client = InfluxDBClient(url="http://localhost:8086", token=token, org=org)
    write_api = client.write_api(write_options=SYNCHRONOUS)

    # Sample CPU power every 300 ms and stream the output line by line
    process = subprocess.Popen(
        "/usr/bin/powermetrics -i 300 --samplers cpu_power -a --hide-cpu-duty-cycle",
        shell=True,
        stdout=subprocess.PIPE,
        bufsize=3,
    )
    while True:
        out = process.stdout.readline().decode()
        if out == "" and process.poll() is not None:
            break
        # Lines such as "CPU Power: 4520 mW" carry the measurements
        if " Power: " in out:
            metrics = out.split(" Power: ")
            point = (
                Point(metrics[0])
                .tag("host", "host1")
                .field("power", int(metrics[1].replace("mW", "")))
                .time(datetime.utcnow(), WritePrecision.NS)
            )
            write_api.write(bucket, org, point)
            sys.stdout.flush()

powermetrics_profile()
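The parsing step in the script above can be illustrated in isolation: each relevant powermetrics line is split on `" Power: "` into a measurement name and a milliwatt value. The helper name `parse_power_line` below is mine, not part of the script:

```python
# Sketch: how a powermetrics line such as "CPU Power: 4520 mW" is split
# into a measurement name and a milliwatt value, mirroring the script above.
def parse_power_line(line):
    """Return (measurement, milliwatts), or None if the line doesn't match."""
    if " Power: " not in line:
        return None
    name, value = line.split(" Power: ", 1)
    return name.strip(), int(value.replace("mW", "").strip())

print(parse_power_line("CPU Power: 4520 mW"))  # ('CPU', 4520)
```

The measurement name (`CPU`, `GPU`, ...) becomes the InfluxDB measurement, and the integer becomes the `power` field of the written point.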

sudo access is needed to run the script. Assuming you have saved the script as influxdb.py, run it as shown below, depending on your environment.

sudo python3 influxdb.py

or

sudo poetry run python influxdb.py
  4. InfluxDB dashboard setup
  5. Query to view the power consumption in real time

Note: Here the bucket is named baler

from(bucket: "baler")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "CPU")
  |> map(fn: (r) => ({ r with _value: float(v:r._value)/1000.00}))
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "last")
from(bucket: "baler")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "GPU")
  |> map(fn: (r) => ({ r with _value: float(v:r._value)/1000.00}))
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "last")

Save the queries in the respective gauges and reload the page. The gauges should update their values in real time. The auto-refresh option can be set to indefinite with a 10 s interval.

  6. Sample outputs from the InfluxDB dashboard. All the readings are in watts ($W$)

Baler training starts

Baler training ongoing

Baler training ends

Note: A Keyboard Interrupt is necessary to stop and exit from the influxdb.py script.

💡 Visualizing codecarbon logs

How to profile Baler using codecarbon?

Baler can be profiled with codecarbon by adding the flag --energyProfile while training

Example:

poetry run baler --project CFD_workspace CFD_project_animation --mode train --energyProfile

The profile logs will be stored at: workspaces/CFD_workspace/CFD_project_animation/output/profiling/

Conventions and Units used by codecarbon

Setup to estimate energy using codecarbon

Plots for Train

Plots for Compression

Plots for Decompression

Summarizing the results

| Mode | $CO_{2}$ Emission ($CO_{2}$ eqv. in kg) | Energy Consumed ($kWh$) |
| --- | --- | --- |
| Train | $6.25$ | $15.35$ |
| Compress | $0.024$ | $0.075$ |
| Decompress | $0.022$ | $0.063$ |

Note - The scaling factor was introduced simply because the generated numbers were small in magnitude and difficult to plot. Hence, each value was scaled up by a factor of $10^6$. So, apart from the time axis, if a particular value on any axis is read as $V_{plot}$ from the plot, it should be scaled down as \(V_{actual} = V_{plot} \times 10^{-6}\)
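As a quick sanity check, the scale-down can be written as a one-line helper (the name `unscale` is mine, introduced only for illustration):

```python
# Sketch: undoing the 10^6 scaling applied to the plotted values.
def unscale(v_plot):
    """Convert a value read off the plot back to its actual magnitude."""
    return v_plot * 1e-6

print(unscale(6.25))  # a plotted 6.25 corresponds to roughly 6.25e-06
```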

๐Ÿ” List of all Tools used

| No. | Profiler/Tool | Description |
| --- | --- | --- |
| 1 | cProfile | cProfile provides deterministic profiling of Python programs. A profile is a set of statistics that describes how often and for how long various parts of the program executed. It measures CPU time. |
| 2 | pyinstruments | pyinstruments provides statistical profiling of Python programs. It doesn't track every function call that the program makes; instead, it records the call stack every 1 ms and measures wall-clock time. |
| 3 | experiment-impact-tracker | experiment-impact-tracker tracks energy usage, carbon emissions, and compute utilization of the system, currently on Linux systems with Intel chips (that support the RAPL or PowerGadget interfaces) and NVIDIA GPUs. It records power draw from CPU and GPU, hardware information, Python package versions, and estimated carbon emissions. |
| 4 | scalene | Scalene is a high-performance CPU, GPU, and memory profiler for Python that incorporates AI-powered proposed optimizations. |
| 5 | memory-profiler | memory-profiler is a Python module for monitoring memory consumption of a process as well as line-by-line analysis of memory consumption of Python programs. |
| 6 | memray | Memray is a memory profiler for Python. It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to analyze the captured memory usage data. |
| 7 | codecarbon | codecarbon is a Python package that estimates the hardware electricity consumption (GPU + CPU + RAM) and applies to it the carbon intensity of the region where the computing is done. The methodology involves a scheduler that, by default, takes a measurement every 15 seconds and computes the emissions as $CO_{2}eq = C \times E$, where $C$ is the carbon intensity of the electricity consumed for computation (grams of CO₂ emitted per kilowatt-hour) and $E$ is the energy consumed by the computational infrastructure (kilowatt-hours). |
| 8 | Eco2AI | Eco2AI is a Python library for CO₂ emission tracking. It monitors the energy consumption of CPU and GPU devices and estimates the equivalent carbon emissions, taking into account the regional emission coefficient. |
| 9 | powermetrics with InfluxDB | powermetrics gathers and displays CPU usage statistics (divided into time spent in user mode and supervisor mode), timer and interrupt wake-up frequency (total and, for near-idle workloads, those that resulted in package idle exits), and, on supported platforms, interrupt frequencies (categorized by CPU number), package C-state statistics (an indication of the time the core complex and integrated graphics, if any, were in low-power idle states), as well as the average execution frequency for each CPU when not idle. It ships with macOS by default and can therefore be considered a standard tool there. InfluxDB is a time-series database that can be used to visualize the logs generated by powermetrics. |
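codecarbon's emission formula, $CO_{2}eq = C \times E$, is simple enough to sketch directly. In the example below, the intensity value of 475 gCO₂/kWh is an illustrative rough global average, not a figure measured in this project:

```python
# Sketch of codecarbon's emission formula, CO2eq = C x E.
# 475 gCO2/kWh is an illustrative carbon-intensity value, not a measurement.
def co2_emissions_kg(carbon_intensity_g_per_kwh, energy_kwh):
    """Carbon intensity (g CO2 / kWh) times energy (kWh), returned in kg."""
    return carbon_intensity_g_per_kwh * energy_kwh / 1000.0

print(co2_emissions_kg(475, 15.35))  # ~7.29 kg CO2eq for 15.35 kWh
```

The ratio between the two columns of the results table above is exactly this regional intensity factor, which is why the emission and energy numbers track each other mode by mode.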

Contributions

I have incorporated some of the profilers into the Baler codebase and made multiple commits spread across the following pull requests at baler-collaboration/baler, listed in reverse chronological order:






Apart from this, most of my work and experiments can be found, unorganized, across the various branches of the repositories sanam2405/baler and sanam2405/SoftwareEnergyCost, and inside the profiling folder of this repository

References

[1] Baler - Machine Learning Based Compression of Scientific Data (LINK 🔗)

[2] Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning (LINK 🔗)

[3] Green Software Foundation (LINK 🔗)

[4] Green Algorithms: Quantifying the Carbon Footprint of Computation (LINK 🔗)


License

Copyright 2023 Baler-Collaboration. Distributed under the Apache License 2.0. See LICENSE for more information.


Summary

Participating in Google Summer of Code (GSoC) for the very first time was an exhilarating experience for me. I'm immensely grateful to my mentor, Caterina Doglioni, for this opportunity. I am thankful for her invaluable guidance, feedback, and understanding during my tough times throughout the project.

Special Thanks to Leonid Didukh (@neogyk) for providing immense support and help throughout the program and Anirban Mukerjee (@anirbanm1728) & Krishnaneel Dey (@Krishnaneel) for their valuable feedback on the proposal.

Beyond GSoC, I'm committed to ongoing contributions to the organization. Feel free to connect on LinkedIn for any suggestions and feedback! 😄