20.04 daily testing: Calamares "alongside" failed

I’m testing today’s 20200315 image. There is no QA testcase for “alongside,” but I tried it anyway. It erred with “command: sfdisk --force --append /dev/sda” and “Failed to add partition ‘New Partition’ to device ‘/dev/sda’.” (It repeats that failure message twice on the same line.)

However, when I started the install a 2nd time, the partition had been shrunk and a new partition existed. If I choose “replace partition” it works fine. (So it doesn’t seem like anything actually failed on the first attempt at “alongside.” I just have to do two steps to get through the installation.)

I’ve tried this twice. It failed (but worked with replace partition on the newly-created partition) both times.

This is happening on i5-4200M 2.5Ghz; 4th gen core processor 4600 gfx; Intel 7260 (Lenovo ThinkPad E440 20C5 laptop). It has a Hitachi 500gb Sata3 hdd, 7200rpm. (HTS725050A7E360). The machine is CSM/legacy bios and is installing MBR/bios (not UEFI/GPT).

I can try “install alongside” with a different machine to see if it’s specific to this one.

(I can’t remember if I ever tried “alongside” with Lubuntu. I’m learning that I need to keep a spreadsheet of the machines & desktops I test; what I do differently with each one; what problems I find; etc. I’m keeping notes, but I can’t pivot on that data to see things grouped differently.)
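A spreadsheet is one way to get that grouping; a plain CSV file and a few lines of awk would be another. The following is only a sketch: the file layout (a header row with columns such as machine, firmware, mode and result, where result is pass or fail) and the function name `pivot` are made up for illustration, not an existing tool or QA convention. It also assumes no field contains a comma.

```shell
# Count installs and failures per value of one CSV column.
# Usage: pivot FILE COLUMN_NAME
# Assumptions (hypothetical layout): first line is a header, there is a
# column named "result" holding "pass" or "fail", no quoted commas.
pivot() {
    awk -F, -v key="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            group = $(col[key])
            total[group]++
            if ($(col["result"]) == "fail") failed[group]++
        }
        END {
            for (g in total)
                printf "%s: %d of %d failed\n", g, failed[g], total[g]
        }
    ' "$1" | sort
}
```

Calling `pivot tests.csv firmware` and then `pivot tests.csv mode` on the same notes would show the failures grouped first by machine class and then by install type, without re-entering anything.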

FWIW: I got the same Calamares “alongside” error on a different laptop.[1] I can open a bug report if this is something not already known.

[1] i5-560M 2.67Ghz; NVIDIA GeForce GT420M; Intel Centrino 6200 (Dell XPS L501X laptop).

The replacement testcases not being available is mostly my fault, sorry. Since 18.10 we’ve used a testing checklist; in the comments I put the machine details on the first line, and the second line is usually the testcase, based on https://phab.lubuntu.me/w/release-team/testing-checklist/

testcase: full disk, no-encryption, BIOS, no-internet

where my comma-separated values are the various options taken from the prior checklist (ie. encryption/no-encryption, bios/uefi/secure, yes|no internet).

When I’m able, I’ll delete the last attempt to have testcases amended & start afresh (re-branch, re-add, commit etc). I’ve attempted to do that for some time, but haven’t got there yet (it didn’t look good when popey did his video and they weren’t there;…)

Your issue with “sfdisk --force --append /dev/sda” looks like https://bugs.launchpad.net/ubuntu/+source/calamares/+bug/1864787 (where you’ll also note I link it as related to 1864791), so I’d encourage you to have a look at it; if it applies, click “affects me too” & use that bug in the iso.qa.ubuntu.com bug tracking for iso testing. I’ve only had that issue on a single box myself, and could not re-create it on another dell 755 optiplex box (ie. same make/model/look of box, but a different motherboard, as model numbers can mean little).

You may also note in that bug report I was using BTRFS which I’d not use myself, but I was following a testcase found on the aforementioned checklist.


I think what I experienced yesterday was related to MBR partition table. I’m doing the same “alongside” install today using a UEFI-only Ryzen 3/Vega 3 laptop. It didn’t fail. I confirmed that the partition table is GPT.

I’ll go back to one of the bios-only machines to see if it errs again.
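For anyone comparing notes, here is a rough sketch of how the boot mode and the partition table type can be checked from a terminal in the live session. The function names are made up for this example and `/dev/sda` is an assumption; substitute the disk you are installing to. The optional argument to `boot_mode` exists only so the check can be tried against a fake directory.

```shell
# Report how the live session was booted: "uefi" if the kernel exposes
# an EFI directory under /sys/firmware, otherwise "bios" (legacy/CSM).
boot_mode() {
    sysroot="${1:-/sys}"
    if [ -d "$sysroot/firmware/efi" ]; then
        echo uefi
    else
        echo bios
    fi
}

# Print the partition table type of a whole disk as lsblk reports it:
# "gpt", "dos" (MBR/msdos), or nothing if the disk has no table.
table_type() {
    lsblk -dno PTTYPE "${1:-/dev/sda}"
}
```

Recording the output of `boot_mode` and `table_type /dev/sda` before each install attempt would make it easier to tell whether a failure lines up with a UEFI/GPT versus bios/MBR mismatch.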


I installed Lubuntu (20200316) to MBR (erase disk). Then did another install “alongside.” I didn’t get the error.

I did encounter an error on the first (erase disk) install. I manually used KDE Partition Manager to change the partition table to MBR (from GPT; this is a UEFI/Legacy-csm laptop.[1] I was running it UEFI before). The install failed until I used KDE PM to create a partition first.

I’m going to play with that a little more, and will report a bug if I can recreate it.

For now, the sfdisk disk error didn’t seem related to MBR/bios. I can go back to the laptop I had that error on. It is bios-only (not “legacy-csm”). This machine (today) was UEFI/Legacy-csm. Maybe there is a difference.

[1] Celeron N2830 2.4Ghz; Intel HD Atom Z36xxx/Z37xxx gfx; Atheros QCA9565 (Toshiba Satellite C55-B5299 laptop).


I’m able to recreate errors changing from GPT to MBR/msdos (and vice-versa). But, the error is different for each direction of that change.

  1. If I go from GPT to MBR (change my bios from UEFI to CSM; choose “erase disk” in Calamares):

It works – unless I go into KDE Part. Mgr and set the partition table to MBR myself. Then it fails until I go into KDE Part. Mgr and create a partition. (Then Calamares creates the MBR partition table, and creates a partition for its install.)

That’s the error I mentioned in the previous post, as an observation while I was trying to recreate the other direction (the next case).

  2. If I go from MBR to GPT (change my bios from CSM to UEFI; choose “erase disk” in Calamares):

It fails – unless I go into KDE Part Mgr and set the partition table to GPT myself. Then it works (without having to create a partition in KDE Part. Mgr).

That’s strange because the two problems are closely related (switching from one bios mode & partition table type to the other), but the behavior is almost opposite. In one direction (#2): I have to create the partition table myself (and it doesn’t need a partition to be present). In the other direction (#1): I shouldn’t create the partition table. If I do, then I have to create a partition too.

This was again using the Toshiba C55-B laptop[1] with yesterday’s daily image (20200316). I will test today’s image on a different UEFI/CSM laptop to see if the problem can be recreated.

I still haven’t been able to recreate the original problem (which I was trying to recreate): “install alongside” would fail, but it did resize the partition and all I had to do was choose “replace partition” to get the results of “alongside” (in two steps). That happened on an even older bios-only (no UEFI capability) laptop. A Dell XPS L501X. I want to return to that and see if I can recreate the “alongside” problem.

[1] Celeron N2830 2.4Ghz; Intel HD Atom Z36xxx/Z37xxx gfx; Atheros QCA9565 (Toshiba Satellite C55-B5299 laptop).

Testing today’s image (20200317). I cannot perfectly recreate what I outlined above. But, there is something not right about changing from GPT to MBR (and vice versa).

It seems like Calamares is not compatible with KDE Partition Manager (other distros creating partition tables, etc.). If the partition table was created by Calamares, it seems to work. But, if the partition table was created by KDE Part. Mgr, then Calamares doesn’t like it (unless you create a partition too). For example, when I had the “alongside” failure, I believe the partition being resized was created by a different *buntu flavor. The partition table would have been too (if that installer insisted on creating a partition table even when the right type already exists, like Calamares does).

I’m definitely seeing that when Calamares needs a GPT partition table (because the bios is UEFI), but an msdos/MBR was created with KDE Part. Mgr. Calamares fails trying to create the GPT partition table.

Right now, I can make that calamares “erase disk” fail over and over (on the attempt to create a GPT partition table).

But, I have seen it fail once, then succeed the next time. In that case, I didn’t do anything different.

And then, when I went to KDE Part. Mgr to set the partition table back to msdos, I couldn’t get Calamares to fail again (creating GPT).

I can’t pinpoint exactly what’s happening. It seems like Calamares is sensitive to something about partition tables. If a partition exists, then Calamares is less sensitive. (But, I’ve seen “Erase disk” work when there’s no partition present.)

IMO, Calamares should be run (by default) with maximum logging enabled so an error can be pinpointed and the maintainer can see exactly what happened. The problem doesn’t seem to be consistent enough to explain as steps to recreate. There should be a “send bug report” right from the Calamares error screen so whoever maintains it can see exactly what happened. We’re never going to be able to say “this is how to…”

Related topic: I think it would be a good idea if Lubuntu came with KSystemLog by default. It doesn’t seem like a big tool. If KDE Partition Manager is a good fit with LXQt, I think KSystemLog could be too. It might make it easier for people to see what’s happening. Puts them closer to the logs.

In this comment I’m talking about http://iso.qa.ubuntu.com/modules/qatracker/misc/bug.png (Lubuntu failed install “sfdisk --force --append /dev/sda”)

Given we’re getting close to release time (that and I’ve been almost unable to achieve/do anything for awhile) it might be worthwhile for us to create a mitigation plan, ie. document a strategy to get around the issue should it not be fixed (this can be included in release notes if deemed necessary).

I have 20.04 installed on my d755-5 system, which I achieved by booting the ‘live’ fresh and using KDE Partition Manager to

  • manually delete the partition that had the sfdisk --force --append /dev issue
  • create a new partition in the space where I’ll install into
  • ensure ‘swap’ isn’t mounted, then exit KDE Partition Manager

then running calamares (installer) normally, using ‘Manual Partitioning’ (not ‘alongside’) to install into my created partition. I didn’t have issues with this on my d755-5 (the box that had this issue).
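The swap step above can also be checked from a terminal instead of in KDE Partition Manager. This is a minimal sketch, not part of the tested work-around: the function names are invented here, and it assumes the kernel’s usual `/proc/swaps` layout (one header line, then one line per active swap area).

```shell
# List active swap areas by reading /proc/swaps; the first line is a header.
# The optional argument lets the function be tried against a sample file.
active_swap() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}

# Turn all swap off, but only if some is actually in use.
ensure_swap_off() {
    if [ -n "$(active_swap "$@")" ]; then
        sudo swapoff -a
    fi
}
```

If `active_swap` prints nothing, the live session is not holding a swap partition on the disk and there is nothing to turn off before starting the installer.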

It might be useful to confirm this method works for you, or you write how you can ‘work around’ the issue for your box, and I can then ‘confirm’ your work-around works for me too.

For completeness, I’ve written this based from http://iso.qa.ubuntu.com/qatracker/milestones/408/builds/208267/testcases/1701/results comment #7 in the bug report. The KDE partition manager steps fail if attempted after a calamares failed install (why I wrote ‘fresh’ at the start)

I have only 3 systems currently that I’ll fully use (full-disk or test any combination with), on others it’s usually replace a partition using Manual Partitioning only, or use for ‘live’ only.

FYI: the d755-5 name of my system won’t mean anything to you, it helps me remember which I’m talking about… the 5 being the ram size; as I have 4, 5, 6 & 8 gb in 755 boxes

I think something may have changed with today’s (0317) daily image. I was working with 0316, and could recreate the problem (which seemed to be related to switching between GPT & MBR/msdos). I was convinced I found a way anyone could recreate it. It was absolutely consistent.

Then I zsynced to 0317, and it didn’t work the same way. After it worked through a couple of failures, I couldn’t get it to fail at all.

So, either that’s huge coincidence (and some flaky condition went away), or something changed (improved) with the new daily.

[I’ve never thought about this before, but I’m kinda not liking Ubuntu’s fixed release schedule. If something’s not ready to be released, it shouldn’t be. I think it would work better if releases occurred when things were ready. Not a drop-dead date. That probably works better for the organization. But, not the end user who could wait another month without noticing it.

I’m just thinking out loud. I would have never thought about this if I hadn’t got involved in testing, and realize how it really could be this way. “Time’s up!” That’s gotta be rough compared to a curated release where it can wait until bugs are worked out.]

You probably know this, but released when ready is what Debian does.

For businesses and those that plan ahead, knowing predictably when a release occurs (ie. 2nd last Thursday in April) is very nice especially given it doesn’t require looking up in a calendar/schedule (except to know the date); if it gets pushed a week it’s the last Thursday.

As all (freeze etc) dates are known in advance I really think Ubuntu’s approach works, and is the more professional option. The schedule was last edited 2019-10-17 because FF was finally known as Focal Fossa, so the non-changing status I think is a plus (esp. for the enterprise users).

Being fixed is good :slight_smile: As you’re aware of the issue, just be on the watch for it should it re-occur. That detail can be really useful to a developer when looking at a bug report (narrowing down when changes occur etc).

How much desktop use is corporate? I would have thought individuals have 80% (at least). In that case, I would think a more flexible “when it’s ready” release schedule would be better. But… it doesn’t matter. I’m not going to change anything. It’s a completely pointless thought.

I will. But, if it continues to be a thing, I think Calamares should be integrated into the “Ubuntu encountered a problem, send a report?” workflow. It is absolutely inhumane to expect people to figure out what’s wrong when the problem is that random (that it can be repeatable for 2 hours, and then virtually disappear, only to reappear). Whoever’s behind that package should enable serious debug logging, and notify themselves of these problems.

I’ll watch for it. But, I’m not spending much more time on that. I’d suggest going back to Ubiquity if Calamares can’t figure out what it’s doing.

It’s not just desktop.

As I recall a very recent release was delayed a week because of issues with new Ubuntu Core images, or from the UWN 617

Łukasz Zemczak tells us the Ubuntu 18.04.4 release date has been pushed back a week due to problems with the Ubuntu Core 18 images. All testers are thanked with a plan to re-spin uc18-only, an apology is given over the delay. The release is now expected February 13, 2020.

For the company Canonical who do a lot of the heavy lifting for all flavors, and support Ubuntu desktop - it makes sense for them to have common release dates for all of server, desktop and specialist releases (like Ubuntu Core for IoT devices).

I recall Popey (Alan Pope) talking about corporate use of desktop a number of times (in podcasts), and it was not insubstantial even if somewhat vague (sorry, I don’t remember what was said, only my reaction being it felt more than I’d have guessed, even if from a company man careful with words). Little indications have appeared from time to time (eg. https://ubuntu.com/blog/a-first-look-at-desktop-metrics)

FYI: Calamares failed again with an “erase disk” install. (Same Toshiba laptop, UEFI mode). It failed during the “create new partition table (type: gpt)” when the existing table was gpt.

I believe that existing partition table was first created last night when I tested Xubuntu 0318. (I did an “alongside” with Mate this morning, and again with Budgie this morning. Now I’m doing “erase disk” with Lubuntu.)

My perception continues to be that Calamares doesn’t like partition tables created by other tools.

Right now, doing the same “erase disk” install over and over fails the same way (creating a new GPT partition table when one already exists).

I went into KDE Part Mgr and created a new GPT. Calamares failed again creating its GPT table (when one already exists). I went back into KDE PM and created an ext4 partition. Then Calamares got past the “creating GPT table.” It’s now installing.

Seriously, Calamares needs to gather more contextual information and display it on the error screen. During this exercise it has felt like the problem is reproducible. But, then, like yesterday, suddenly the problem goes away and nothing I do can recreate it. Calamares is not helping us help them when its error messages are so void of any details about what it doesn’t like. I’m not going to beat my brains out trying to chase a mostly random problem (to identify what they could capture and display on the screen. Or, better yet, send to themselves with an “Ubuntu detected an error, send report?” dialogue.)

If the solution is to say “use manual partitioning,” why not just go back to Ubiquity until Calamares is ready for primetime?

FWIW: I switched to testing my Ryzen 5/Vega 8 laptop. I installed today’s Kubuntu “entire disk”, then today’s Lubuntu “entire disk.” I thought that might trigger Calamares (if it’s sensitive to partition tables created by something else).

Then I realized I’m assuming Kubuntu creates a partition table when one of the same kind already exists (i.e., maybe it doesn’t behave the same way as Calamares in this regard. Calamares creates a partition table even though one already exists of the type it creates).

So, I installed Studio’s daily image “entire disk.” But, before installing, I used its gparted to create an MBR/msdos partition table. This laptop is UEFI-only. I assume that would cause Studio to create a gpt partition table. (I don’t think it would install “entire disk” to MBR if it’s on a UEFI machine.)

Then I installed today’s Lubuntu again, entire disk. Calamares did not fail creating a GPT table.

Maybe there’s something wrong with the drive in my Toshiba laptop. Maybe I should swap it out and see if I continue to have those errors. I’m pretty sure I’ve seen the error with other machines. But, I can’t say for sure. I haven’t been keeping good notes. (Next time I’ll use a spreadsheet so I can group things better.) I’ve got a feeling it might be specific to UEFI/CSM machines (not bios only, not uefi only).

But, I’ll say again: the error message Calamares gives is not very informative. They take up half the screen with very little information. “This failed.” They should be providing environmental context, things that would help identify what’s similar each time “this failed” occurs. There doesn’t seem to be any procedural similarity. No way to recreate it without shooting in the dark forever.

I think I’ll test a couple different csm/uefi machines tomorrow (see if it happens again). If not, then I’ll swap the Toshiba’s hard disk. (That hard disk has been used 12 hours a day every day for 3-4 years. Maybe it has a problem.)

Thinking about that more: I’m positive it’s not the hard drive. If it were, all the other distros I installed would have had at least one problem. They haven’t. The Calamares failures are very specific to Calamares.

They may be specific to the Toshiba csm/uefi laptop. But, I got the “alongside” error on the Dell XPS 501X. That’s a bios-only machine.

It seems like I never get the error on the uefi-only machines (two new Acers, Ryzen 3 & 5). I’ve gotten it the most with the Toshiba (but no other *buntu flavor has failed on these disk operations, on this laptop). I don’t believe I’ve had the “alongside” error with the Toshiba. Just the “erase disk” (when it creates a new partition table). I had the “alongside” error on the bios-only Dell. I don’t recall if I had the “erase disk” error with that.

So, it seems like there is a pattern that way. I feel like I’ve seen patterns in terms of changing between mbr & gpt, or who created the partition table. I’ve definitely seen a pattern where a partition has to exist before Calamares can successfully replace the partition table. (But, these things aren’t absolute. I’ve seen it replace the partition table when there is no partition too.)

The next testing I do will be with 3 other csm/uefi laptops. That will narrow it down to whether the problem is just one of them, or related to the class of machine. Then I can test the two bios-only machines (which I got the “alongside” error with).

I pulled out the Thinkpad E440, went into the bios to make it “uefi only” & “no csm” (that should be the same as the Toshiba, which only has a uefi or csm choice. This Thinkpad has uefi, legacy, both. And, it has “csm” as an additional choice.)

Calamares failed the very first attempt to “erase disk” while trying to create the gpt partition table.

The disk has an MBR/msdos partition table (the bios was “legacy first” & “csm=yes” before I changed it for this session). It has three Linux partitions from the last time I tested with it. (I must have done “alongside” twice.)

The first partition is Budgie. So, there is a good chance Budgie created the partition table. I’ve detected this pattern before. Calamares seems to fail when creating the partition table when it was created by someone else.

I rebooted. Calamares still fails. I think to make this work I have to go into KDE Part Mgr and create the GPT partition table for Calamares. Then it will replace that one with its own. But, I might have to create a partition. (Sometimes it can’t replace KDE PM’s partition table when there are no partitions present.)

This problem does seem to be in the csm/uefi class of machine. It’s not specific to the Toshiba. I have two others I can test.

Instead of going further with that Thinkpad E440 (and then frustrating myself that I can’t recreate what’s staring me in the face right now), I turned on the Dell Lat E5420.

It has the exact same problem. First try, just like the E440.

I went into the bios and set it to uefi (apparently I was running it legacy all this time. I wasn’t concerned about it. I have uefi machines. Some of the csm/uefi machines are set to uefi. Some legacy.)

The drive has an MBR/msdos partition table (like the E440) because it’s been running legacy previously. The drive has two Linux partitions. The first one is Lubuntu. (Apparently Calamares created the MBR table, unless I installed with “replace partition,” which I rarely do. But, it’s possible.)

I boot today’s (yyyy0318) Lubuntu daily image, start the installation, choose erase disk. It fails creating the GPT partition table.

This is an exact duplicate of the Thinkpad E440. First boot of each. The only difference is that Calamares probably created this MBR partition table that it’s now stumbling upon. (There is a slight chance I reused a partition table by putting Lubuntu in that first partition with a “replace partition” install. My theory that Calamares is sensitive to other partition tables might not be as solid as I thought.)

One additional point: I left out something above. It doesn’t affect anything I describe above, but the first boot was accidentally in legacy mode. Calamares detected that and tried to create a new msdos partition table (not gpt, as reported above). That failed too. So, it’s not just gpt. I accidentally ran into that this time because I chose the wrong boot menu-item, and it turned out to boot as a legacy bios.

I do like calamares myself. I’ll also provide the following

`sudo -E calamares -d`

which can be used when running/installing to get verbose debugging output from calamares (provided by Walter/wxl in comment #11 of https://bugs.launchpad.net/ubuntu/+source/calamares/+bug/1864787; the output is also found in ~/.cache/calamares/session.log)
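Once a failing run has been captured, something like the following could pull the interesting lines out of that log for pasting into a bug report. It is only a sketch: the function name is made up, and the search patterns are guesses at what a failing partition job might log, not a documented Calamares log format.

```shell
# Print the lines (with line numbers) of a Calamares session log that
# mention sfdisk, a failure, or an error.
# The default path is the one given above; the patterns are assumptions.
calamares_failures() {
    log="${1:-$HOME/.cache/calamares/session.log}"
    grep -n -i -E 'sfdisk|failed|error' "$log"
}
```

Running `calamares_failures` straight after the error screen appears, before rebooting the live session, keeps the log from being lost along with the session.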

If I’m suspicious of hardware issues, I usually open a terminal and quickly scan for errors in `dmesg` and/or `journalctl`. At least once I’ve filed a bug about a suspect issue that kept appearing that I suspected was more than just my hardware, but I believe I soon decided it was my hardware and changed the bug status to “invalid”. If the bug ID is used in the iso.qa.ubuntu.com report (in bugs fields) it’ll be picked up for other users to test for, so on those occasions I usually opt for commenting the bug only unless I’m sure.
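As a rough illustration of that kind of quick scan, a small filter can narrow kernel messages down to ones that look disk-related. The function name and the patterns are assumptions chosen for this example; real drive problems can show up with other wording, so an empty result does not prove the disk is healthy.

```shell
# Keep only lines that look like disk trouble: I/O errors, messages from
# an ATA port (e.g. "ata1.00"), or anything naming a "dev sdX" device.
# Patterns are illustrative guesses, not an exhaustive list.
disk_errors() {
    grep -i -E 'i/o error|ata[0-9]+\.[0-9]+|dev sd[a-z]'
}

# Typical use in the live session (commented out so that sourcing this
# file has no side effects):
#   sudo dmesg --level=err,warn | disk_errors
#   journalctl -k -b -p err --no-pager | disk_errors
```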

If you get the same error/issue on two different laptops though, you’ve pretty much ruled out hardware anyway.

I’m not capable of providing any specific advice on your actual bugs, or issues with calamares. The issues & observations you’ve described in your last few posts, do they appear in lp.bug.reports too?

You’ve been testing heaps I note, so thank you. (You’ll soon pass me in the focal daily I see on http://iso.qa.ubuntu.com/qatracker/reports/testers)

It’s been so hard to describe (not repeatable; patterns end up being invalidated), I never felt like I had enough information to report. I’ve just been talking out loud about a perplexing, seemingly random behavior.

For example: I have asserted (with a fair degree of confidence) that the behavior only happens with dual-bios type machines (uefi/legacy). Just moments ago, my Ryzen 3 uefi-only Acer failed while creating the new (erase disk) gpt table.

I’ll restart Calamares from the command line using the debug parms and open a bug report with that info.

(sigh). I couldn’t recreate it. Calamares worked the next time I ran it (from the command line, passing it debug parms).

It’s probably not a problem. As you said, people can use manual partitioning. It also sounds like I’m the only person experiencing it. Maybe it just stands out to me because I see it so much. In the real world someone would only see it once (as they typically only install once). So, it’s not that bad in terms of how much any single person would be affected.

[EDIT: The way having debugging enabled coincided with Calamares working, that makes me wonder if it’s a timing problem. Maybe Calamares isn’t waiting long enough for the result?

That would fit my experience with “alongside.” It failed, but when I went into partition manager, it appeared it had succeeded. All I had to do was start Calamares again and “replace partition” as a 2nd step of “alongside.”

I’ve also noticed Calamares will fail after creating a partition table in KDE Part. Mgr. But, if I create a partition too, then Calamares succeeds. That isn’t 100% predictable. But, I’ve noticed a trend in that direction. Maybe having a partition present causes Calamares to work a little harder, and consequently waits long enough for the partition table to be created.]
