📝 24 Nov 2024
Last article we spoke about the (Twice) Daily Builds for Apache NuttX RTOS…
Today we talk about Monitoring the Daily Builds (also the NuttX Build Farm) with our recent NuttX Dashboard…
-
We created our Dashboard with Grafana (open-source)
-
Pulling the Build Data from Prometheus (also open-source)
-
Which is populated by Pushgateway (staging database)
-
Integrated with our Build Farm and GitHub Actions
-
Why do all this? Because we can’t afford to run Complete CI Checks on Every Pull Request!
-
We expect some breakage, and NuttX Dashboard will help with the fixing
What will NuttX Dashboard tell us?
NuttX Dashboard shows a Snapshot of Failed Builds for the current moment. (Pic above)
We may Filter the Builds by Architecture, Board and Config…
The snapshot includes builds from the (community-hosted) NuttX Build Farm as well as GitHub Actions (twice-daily builds).
To see GitHub Actions Only: Click [+] and set User to NuttX…
To see the History of Builds: Click the link for “NuttX Build History”. Remember to pick the Board and Config. (Pic below)
Sounds Great! What’s the URL?
Sorry can’t print it here, our dashboard is under attack by WordPress Malware Bots (!). Please head over to NuttX Repo and search for NuttX-Dashboard. (Dog Tea? Organic!)
What’s this Build Score?
Our NuttX Dashboard needs to know the “Goodness” of Every NuttX Build (pic above). Whether it’s a…
-
Total Fail: “undefined reference to atomic_fetch_add_2”
-
Warning: “nuttx has a LOAD segment with RWX permissions”
-
Success: NuttX compiles and links OK
That’s why we assign a Build Score for every build…
| Score | Status | Example |
|---|---|---|
| 0.0 | Error | undefined reference to atomic_fetch_add_2 |
| 0.5 | Warning | nuttx has a LOAD segment with RWX permissions |
| 0.8 | Unknown | STM32_USE_LEGACY_PINMAP will be deprecated |
| 1.0 | Success | (No Errors and Warnings) |
Which makes it simpler to Colour-Code our Dashboard: Green (Success) / Yellow (Warning) / Red (Error).
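In shell pseudocode, the scoring idea looks roughly like this. (A simplified sketch only; the real logic is the Rust code shown later, and /tmp/build.log is a hypothetical log file)
## Simplified sketch: score one Build Log by grepping for Errors and Warnings
score_log() {
  local log="$1"
  if grep -qi "error" "$log"; then
    echo "0.0"  ## Error
  elif grep -qi "warning" "$log"; then
    echo "0.5"  ## Warning
  else
    echo "1.0"  ## Success
  fi
}
score_log /tmp/build.log  ## Hypothetical log file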
Sounds easy? But we’ll catch Multiple Kinds of Errors (in various formats)
-
Compile Errors: “return with no value”
-
Linker Errors: “undefined reference to atomic_fetch_add_2”
-
Config Errors: “modified: sim/configs/rtptools/defconfig”
-
Network Errors: “curl 92 HTTP/2 stream 0 was not closed cleanly”
-
CI Test Failures: “test_pipe FAILED”
Doesn’t the Build Score vary over time?
Yep the Build Score is actually a Time Series Metric! It will have the following dimensions…
-
Timestamp: When the NuttX Build was executed (2024-11-24T00:00:00)
-
User: Whose PC executed the NuttX Build (nuttxpr)
-
Target: NuttX Target that we’re building (milkv_duos:nsh)
Which will fold neatly into this URL, as we’ll soon see…
localhost:9091/metrics/job/nuttxpr/instance/milkv_duos:nsh
Where do we store the Build Scores?
Inside a special open-source Time Series Database called Prometheus.
We’ll come back to Prometheus, first we study the Dashboard…
What’s this Grafana?
Grafana is an open-source toolkit for creating Monitoring Dashboards.
Sadly there isn’t a “programming language” for coding Grafana. Thus we walk through the steps to create our NuttX Dashboard with Grafana…
## Install Grafana on Ubuntu
## See https://grafana.com/docs/grafana/latest/setup-grafana/installation/debian/
sudo apt install grafana
sudo systemctl start grafana-server
## Or macOS
brew install grafana
brew services start grafana
## Browse to http://localhost:3000
## Login as `admin` for username and password
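(Optional check, assuming the defaults above: Grafana exposes a health endpoint we can ping before logging in)
## Check that Grafana is up
curl http://localhost:3000/api/health
## Should return JSON with "database": "ok"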
-
Inside Grafana: We create a New Dashboard…
-
Add a Visualisation…
-
Select the Prometheus Data Source (we’ll explain why)
-
Change the Visualisation to “Table” (top right)
Choose Build Score as the Metric. Click “Run Queries”…
-
We see a list of Build Scores in the Data Table above.
But where’s the Timestamp, Board and Config?
That’s why we do Transformations > Add Transformation > Labels To Fields
-
And the data appears! Timestamp, Board, Config, …
-
Hmmm it’s the same Board and Config… Just different Timestamps.
We click Queries > Format: Table > Type: Instant > Refresh
-
Much better! We see the Build Score at the End of Each Row (to be colourised)
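To sanity-check the same data outside Grafana, we may query the Prometheus HTTP API directly. (A quick sketch, assuming the metric is named build_score as described below)
## Fetch the latest Build Scores straight from Prometheus
curl -s 'http://localhost:9090/api/v1/query?query=build_score' | jq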
-
Our NuttX Dashboard is nearly ready. To check our progress: Click Inspect > Panel JSON
-
And compare with our Completed Panel JSON…
-
How to get there? Watch the steps…
We saw the setup for Grafana Dashboard. What about the Prometheus Metrics?
Remember that our Build Scores are stored inside a special (open-source) Time Series Database called Prometheus.
This is how we install Prometheus…
## Install Prometheus on Ubuntu
sudo apt install prometheus
sudo systemctl start prometheus
## Or macOS
brew install prometheus
brew services start prometheus
## TODO: Update the Prometheus Config
## Edit /etc/prometheus/prometheus.yml (Ubuntu)
## Or /opt/homebrew/etc/prometheus.yml (macOS)
## Replace by contents of
## https://github.com/lupyuen/ingest-nuttx-builds/blob/main/prometheus.yml
## Restart Prometheus
sudo systemctl restart prometheus ## Ubuntu
brew services restart prometheus ## macOS
## Check that Prometheus is up
## http://localhost:9090
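(Optional: before restarting, we can validate the config and poke the health endpoint. This assumes the standard promtool that ships with Prometheus)
## Validate the Prometheus Config
promtool check config /etc/prometheus/prometheus.yml   ## Ubuntu
promtool check config /opt/homebrew/etc/prometheus.yml ## macOS
## Check that Prometheus is healthy
curl http://localhost:9090/-/healthy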
Prometheus looks like this…
Recall that we assign a Build Score for every build…
| Score | Status | Example |
|---|---|---|
| 0.0 | Error | undefined reference to atomic_fetch_add_2 |
| 0.5 | Warning | nuttx has a LOAD segment with RWX permissions |
| 0.8 | Unknown | STM32_USE_LEGACY_PINMAP will be deprecated |
| 1.0 | Success | (No Errors and Warnings) |
This is how we Load a Build Score into Prometheus…
## Install GoLang
sudo apt install golang-go ## For Ubuntu
brew install go ## For macOS
## Install Pushgateway
git clone https://github.com/prometheus/pushgateway
cd pushgateway
go run main.go
## Check that Pushgateway is up
## http://localhost:9091
## Load a Build Score into Pushgateway
## Build Score is 0 for User nuttxpr, Target milkv_duos:nsh
cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/nuttxpr/instance/milkv_duos:nsh
build_score{ timestamp="2024-11-24T00:00:00", url="http://gist.github.com/...", msg="test_pipe FAILED" } 0.0
EOF
Pushgateway looks like this…
What’s this Pushgateway?
Prometheus works by Scraping Metrics over HTTP.
That’s why we install Pushgateway as an HTTP Endpoint (Staging Area) that will serve the Build Score (Metrics) to Prometheus.
(Which means that we load the Build Scores into Pushgateway, like above)
How does it work?
We post the Build Score over HTTP to Pushgateway at…
localhost:9091/metrics/job/nuttxpr/instance/milkv_duos:nsh
The Body of the HTTP POST says…
build_score{ timestamp="2024-11-24T00:00:00", url="http://gist.github.com/...", msg="test_pipe FAILED" } 0.0
-
gist.github.com points to the Build Log for the NuttX Target (GitHub Gist)
-
“test_pipe FAILED” says why the NuttX Build failed (due to CI Test)
-
0.0 is the Build Score (0 means Error)
Remember that this Build Score (0.0) is specific to our Build PC (nuttxpr) and NuttX Target (milkv_duos:nsh).
(It will vary over time, hence it’s a Time Series)
What about the other fields?
Oh yes we have a long list of fields describing Every Build Score…
| Field | Value |
|---|---|
| version | Always 3 |
| user | Which Build PC (nuttxmacos) |
| arch | Architecture (risc-v) |
| group | Target Group (risc-v-01) |
| board | Board (ox64) |
| config | Config (nsh) |
| target | Board:Config (ox64:nsh) |
| subarch | Sub-Architecture (bl808) |
| url_display | Short URL of Build Log |
| nuttx_hash | Commit Hash of NuttX Repo (7f84a64109f94787d92c2f44465e43fde6f3d28f) |
| apps_hash | Commit Hash of NuttX Apps (d6edbd0cec72cb44ceb9d0f5b932cbd7a2b96288) |
Plus the earlier fields: timestamp, url, msg. Commit Hash is super useful for tracking a Breaking Commit!
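Putting it together, a single Pushgateway POST might look like this. (A sketch only, with values taken from the examples above; the exact labels are emitted by our ingest tool)
cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/nuttxmacos/instance/ox64:nsh
build_score{ version="3", user="nuttxmacos", arch="risc-v", subarch="bl808", group="risc-v-01", board="ox64", config="nsh", target="ox64:nsh", timestamp="2024-11-24T00:00:00", url="http://gist.github.com/...", url_display="...", msg="...", nuttx_hash="7f84a64109f94787d92c2f44465e43fde6f3d28f", apps_hash="d6edbd0cec72cb44ceb9d0f5b932cbd7a2b96288" } 1.0
EOF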
Anything else we should know about Prometheus?
We configured Prometheus to scrape the Build Scores from Pushgateway, every 15 seconds: prometheus.yml
## Prometheus Configuration
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  ## Prometheus will scrape the Metrics
  ## from Pushgateway every 15 seconds
  - job_name: "pushgateway"
    static_configs:
      - targets: ["localhost:9091"]
And it’s perfectly OK to post the Same Build Log twice to Pushgateway. (Because the Timestamp will differentiate the logs)
(Ask your Local Library for “Mastering Prometheus”)
Now we be like an Amoeba and ingest all kinds of Build Logs!
For NuttX Build Farm, we ingest the GitHub Gists that contain the Build Logs: run.sh
## Find all defconfig pathnames in NuttX Repo
git clone https://github.com/apache/nuttx
find nuttx \
  -name defconfig \
  >/tmp/defconfig.txt

## Ingest the Build Logs from GitHub Gists: `nuttxpr`
## Remove special characters so they don't mess up the terminal.
git clone https://github.com/lupyuen/ingest-nuttx-builds
cd ingest-nuttx-builds
cargo run -- \
  --user nuttxpr \
  --defconfig /tmp/defconfig.txt \
  | tr -d '\033\007'
Which will Identify Errors and Warnings in the logs: main.rs
// Skip the lines that aren't Errors or Warnings
if
  line.starts_with("-- ") || line.starts_with("----------") ||
  line.starts_with("Cleaning") ||
  line.starts_with("Configuring") ||
  line.starts_with("Select") ||
  line.starts_with("Disabling") ||
  line.starts_with("Enabling") ||
  line.starts_with("Building") ||
  line.starts_with("Normalize") ||
  line.starts_with("% Total") ||
  line.starts_with("Dload") ||
  line.starts_with("~/apps") ||
  line.starts_with("~/nuttx") ||
  line.starts_with("find: 'boards/") || line.starts_with("| ^~~~~~~") || line.contains("FPU test not built") ||
  line.starts_with("a nuttx-export-") || line.contains(" PASSED") || line.contains(" SKIPPED") || line.contains("On branch master") || line.contains("Your branch is up to date") || line.contains("Changes not staged for commit") || line.contains("git add ") || line.contains("git restore ") { continue; }

// Also skip lines that begin with two numbers (like curl progress output)
let re = Regex::new(r#"^[0-9]+\s+[0-9]+"#).unwrap();
let caps = re.captures(line);
if caps.is_some() { continue; }
Then compute the Build Score: main.rs
// Does the message contain "error"? (Ignoring harmless names like aio_error)
let msg_join = msg.join(" ");
let contains_error = msg_join
  .replace("aio_error", "aio_e_r_r_o_r")
  .replace("errors.lua", "e_r_r_o_r_s.lua")
  .replace("_error", "_e_r_r_o_r")
  .replace("error_", "e_r_r_o_r_")
  .to_lowercase()
  .contains("error");
let contains_error = contains_error ||
  msg_join.contains(" FAILED");

// Or did the build modify the Board's defconfig?
let target_split = target.split(":").collect::<Vec<_>>();
let board = target_split[0];
let config = target_split[1];
let board_config = format!("/{board}/configs/{config}/defconfig");
let contains_error = contains_error ||
  (
    msg_join.contains(&"modified:") &&
    msg_join.contains(&"boards/") &&
    msg_join.contains(&board_config.as_str())
  );

// Does the message contain "warning"?
let contains_warning = msg_join
  .to_lowercase()
  .contains("warning");

// Compute the Build Score
let build_score =
  if msg.is_empty() { 1.0 }
  else if contains_error { 0.0 }
  else if contains_warning { 0.5 }
  else { 0.8 };
And post the Build Scores to Pushgateway: main.rs
let body = format!(
  r##"
build_score ... version= ...
"##);
let client = reqwest::Client::new();
let pushgateway = format!("http://localhost:9091/metrics/job/{user}/instance/{target}");
let res = client
  .post(pushgateway)
  .body(body)
  .send()
  .await?;
Why do we need the defconfigs?
## Find all defconfig pathnames in NuttX Repo
git clone https://github.com/apache/nuttx
find nuttx \
  -name defconfig \
  >/tmp/defconfig.txt

## defconfig.txt contains:
## boards/risc-v/sg2000/milkv_duos/configs/nsh/defconfig
## boards/arm/rp2040/seeed-xiao-rp2040/configs/ws2812/defconfig
## boards/xtensa/esp32/esp32-devkitc/configs/knsh/defconfig
Suppose we’re ingesting a NuttX Target milkv_duos:nsh.
To determine the Target’s Sub-Architecture (sg2000), we search for milkv_duos/…/nsh in the defconfig pathnames: main.rs
async fn get_sub_arch(defconfig: &str, target: &str) -> Result<String, Box<dyn std::error::Error>> {
  // Search for "/{board}/configs/{config}/defconfig" in the defconfig pathnames
  let target_split = target.split(":").collect::<Vec<_>>();
  let board = target_split[0];
  let config = target_split[1];
  let search = format!("/{board}/configs/{config}/defconfig");
  let input = File::open(defconfig).unwrap();
  let buffered = BufReader::new(input);
  for line in buffered.lines() {
    let line = line.unwrap();
    if let Some(pos) = line.find(&search) {
      // Sub-Arch is the path segment just before the Board
      let s = &line[0..pos];
      let slash = s.rfind("/").unwrap();
      let subarch = s[slash + 1..].to_string();
      return Ok(subarch);
    }
  }
  Ok("obscure".into())
}
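(For a quick check by hand, the same lookup works as a one-liner on the defconfig.txt from earlier: counting four fields back from the end of the pathname gives the Sub-Architecture)
## Sub-Arch for milkv_duos:nsh, straight from defconfig.txt
## boards/risc-v/sg2000/milkv_duos/configs/nsh/defconfig => sg2000
grep '/milkv_duos/configs/nsh/defconfig' /tmp/defconfig.txt \
  | awk -F/ '{print $(NF - 4)}'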
Phew the Errors and Warnings are so complicated!
Yeah our Build Logs appear in all shapes and sizes. We might need to standardise the way we present the logs.
What about the Build Logs from GitHub Actions?
It gets a little more complicated, we need to download the Build Logs from GitHub Actions.
But before that, we need the GitHub Run ID to identify the Build Job: github.sh
## Fetch the Jobs for the Run ID. Get the Job ID for the Job Name.
local os=$1    ## "Linux" or "msys2"
local step=$2  ## "7" or "9"
local group=$3 ## "arm-01"
local job_name="$os ($group)"
local job_id=$(
  curl -L \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    https://api.github.com/repos/$user/$repo/actions/runs/$run_id/jobs?per_page=100 \
    | jq ".jobs | map(select(.name == \"$job_name\")) | .[].id"
)
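(How do we get the Run ID in the first place? One way is to ask the GitHub API for the latest Workflow Run, roughly like this. A sketch only; the actual script may pass in $run_id differently)
## Fetch the Latest Run ID for the repo
run_id=$(
  curl -sL \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    "https://api.github.com/repos/$user/$repo/actions/runs?per_page=1" \
    | jq '.workflow_runs[0].id'
)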
Now we can Download the Run Logs: github.sh
## Download the Run Logs from GitHub
## https://docs.github.com/en/rest/actions/workflow-runs?apiVersion=2022-11-28#download-workflow-run-logs
curl -L \
  --output /tmp/run-log.zip \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/$user/$repo/actions/runs/$run_id/logs
For Each Target Group: We ingest the Log File: github.sh
## For All Target Groups
## TODO: Handle macOS when the warnings have been cleaned up
for group in \
  arm-01 arm-02 arm-03 arm-04 \
  arm-05 arm-06 arm-07 arm-08 \
  arm-09 arm-10 arm-11 arm-12 \
  arm-13 arm-14 \
  risc-v-01 risc-v-02 risc-v-03 risc-v-04 \
  risc-v-05 risc-v-06 \
  sim-01 sim-02 sim-03 \
  xtensa-01 xtensa-02 \
  arm64-01 x86_64-01 other msys2
do
  ## Ingest the Log File
  if [[ "$group" == "msys2" ]]; then
    ingest_log "msys2" $msys2_step $group
  else
    ingest_log "Linux" $linux_step $group
  fi
done
Which will be ingested like this: github.sh
## Ingest the Log Files from GitHub Actions
cargo run -- \
  --user $user \
  --repo $repo \
  --defconfig $defconfig \
  --file $pathname \
  --nuttx-hash $nuttx_hash \
  --apps-hash $apps_hash \
  --group $group \
  --run-id $run_id \
  --job-id $job_id \
  --step $step

## user=NuttX
## repo=nuttx
## defconfig=/tmp/defconfig.txt (from earlier)
## pathname=/tmp/ingest-nuttx-builds/ci-arm-01.log
## nuttx_hash=7f84a64109f94787d92c2f44465e43fde6f3d28f
## apps_hash=d6edbd0cec72cb44ceb9d0f5b932cbd7a2b96288
## group=arm-01
## run_id=11603561928
## job_id=32310817851
## step=7
How to run all this?
We ingest the GitHub Logs right after the Twice-Daily Build of NuttX. (00:00 UTC and 12:00 UTC)
Thus it makes sense to bundle the Build and Ingest into One Single Script: build-github-and-ingest.sh
## Build NuttX Mirror Repo and Ingest NuttX Build Logs
## from GitHub Actions into Prometheus Pushgateway
## TODO: Twice Daily at 00:00 UTC and 12:00 UTC

## Go to NuttX Mirror Repo: github.com/NuttX/nuttx
## Click Sync Fork > Discard Commits

## Start the Linux, macOS and Windows Builds for NuttX
## https://github.com/lupyuen/nuttx-release/blob/main/enable-macos-windows.sh
~/nuttx-release/enable-macos-windows.sh

## Wait for the NuttX Build to start
sleep 300

## Wait for the NuttX Build to complete
## Then ingest the GitHub Logs
## https://github.com/lupyuen/ingest-nuttx-builds/blob/main/github.sh
./github.sh
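(The TODO above hints at scheduling. If we wanted cron to run it Twice Daily, a crontab entry might look like this; the script path is an example, and the PC's clock is assumed to be on UTC)
## crontab -e
## Run at 00:00 UTC and 12:00 UTC daily (example path)
0 0,12 * * * $HOME/build-github-and-ingest.sh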
And that’s how we created our Continuous Integration Dashboard for NuttX!
(Please join our Build Farm 🙏)
Why are we doing all this?
That’s because we can’t afford to run Complete CI Checks on Every Pull Request!
We expect some breakage, and NuttX Dashboard will help with the fixing.
What happens when NuttX Dashboard reports a Broken Build?
Right now we scramble to determine the Breaking Commit. And stop more Broken Commits from piling on.
Yes NuttX Dashboard will tell us the Commit Hashes for the Build History. But the Batched Commits aren’t Temporally Precise, and we race against time to verify and recompile each Past Commit.
Can we automate this?
Yeah someday our NuttX Build Farm shall “Rewind The Build” when something breaks…
Automatically Backtrack the Commits, Compile each Commit and uncover the Breaking Commit. (Like this)
Any more stories of NuttX CI?
Next Article: We chat about the updated NuttX Build Farm that runs on macOS for Apple Silicon. (Great news for NuttX Devs on macOS)
Then we study the insides of a Mystifying Bug that troubles PyTest, QEMU RISC-V and expect. (So it will disappear sooner from NuttX Dashboard)
Many Thanks to the awesome NuttX Admins and NuttX Devs! And my GitHub Sponsors, for sticking with me all these years.
Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…
Earlier we spoke about creating the NuttX Dashboard (pic above). And we created a Rudimentary Dashboard with Grafana…
We nearly finished the Panel JSON…
Let’s flesh out the remaining bits of our creation.
Before we start: Check that our Prometheus Data Source is configured to get the Build Scores from Prometheus and Pushgateway…
(Remember to set prometheus.yml)
Head back to our upcoming dashboard…
-
This is how we Filter by Arch, Sub-Arch, Board, Config, which we defined as Dashboard Variables (see below)
-
Why match the Funny Timestamps? Well mistakes were made. We delete these Timestamps so they won’t appear in the dashboard…
-
For Builds with Errors and Warnings: We pick Values (Build Scores) <= 0.5…
-
We Rename and Reorder the Fields…
-
Set the Timestamp to Lower Case, Config to Upper Case…
-
Set the Color Scheme to From Thresholds By Value
-
Set the Data Links: Title becomes “Show the Build Log”, URL becomes “${__data.fields.url}”
-
Colour the Values (Build Scores) with the Value Mappings below
-
And we’ll get this Completed Panel JSON…
What about the Successful Builds?
-
Copy the Panel for “Builds with Errors and Warnings”
Paste into a New Panel: “Successful Builds”
-
Select Values (Build Scores) > 0.5
-
And we’ll get this Completed Panel JSON
And the Highlights Panel at the top?
-
Copy the Panel for “Builds with Errors and Warnings”
Paste into a New Panel: “Highlights of Errors / Warnings”
-
Change the Visualisation from “Table” to “Stat” (top right)
-
Select Sort by Value (Build Score) and Limit to 8 Items…
-
And we’ll get this Completed Panel JSON
-
Also check out the Dashboard JSON and Links Panel (“See the NuttX Build History”)
Which will define the Dashboard Variables…
Up Next: The NuttX Dashboard for Build History…
In the previous section: We created the NuttX Dashboard for Errors, Warnings and Successful Builds.
Now we do the same for Build History Dashboard (pic above)…
-
Copy the Dashboard from the previous section.
Delete all Panels, except “Builds with Errors and Warnings”.
Edit the Panel.
-
Under Queries: Set Options > Type to Range
-
Under Transformations: Set Group By to First Severity, First Board, First Config, First Build Log, First Apps Hash, First NuttX Hash
In Organise Fields By Name: Rename and Reorder the fields as shown below
Set the Value Mappings below
-
Here are the Panel and Dashboard JSON…
Is Grafana really safe for web hosting?
Use this (safer) Grafana Configuration: grafana.ini
-
Modified Entries are tagged by “TODO”
-
For Ubuntu: Copy to /etc/grafana/grafana.ini
-
For macOS: Copy to /opt/homebrew/etc/grafana/grafana.ini
Watch out for the pesky WordPress Malware Bots! This might help: show-log.sh
## Show Logs from Grafana
log_file=/var/log/grafana/grafana.log ## For Ubuntu
log_file=/opt/homebrew/var/log/grafana/grafana.log ## For macOS

## Watch for any suspicious activity
for (( ; ; )); do
  clear
  tail -f $log_file \
    | grep --line-buffered 'logger=context ' \
    | grep --line-buffered -v ' path=/api/frontend-metrics ' \
    | grep --line-buffered -v ' path=/api/live/ws ' \
    | grep --line-buffered -v ' path=/api/plugins/grafana-lokiexplore-app/settings ' \
    | grep --line-buffered -v ' path=/api/user/auth-tokens/rotate ' \
    | grep --line-buffered -v ' path=/favicon.ico ' \
    | grep --line-buffered -v ' remote_addr=[::1] ' \
    | cut -d ' ' -f 9-15 \
    &

  ## Restart the log display every 12 hours, due to Log Rotation
  sleep $(( 12 * 60 * 60 ))
  kill %1
done