20.04 daily testing: Calamares "alongside" failed

It’s not just desktop.

As I recall, a very recent release was delayed a week because of issues with the new Ubuntu Core images. From UWN 617:

Łukasz Zemczak tells us the Ubuntu 18.04.4 release date has been pushed back a week due to problems with the Ubuntu Core 18 images. All testers are thanked with a plan to re-spin uc18-only, an apology is given over the delay. The release is now expected February 13, 2020.

Canonical does a lot of the heavy lifting for all flavors and supports Ubuntu desktop, so it makes sense for them to have common release dates across server, desktop and specialist releases (like Ubuntu Core for IoT devices).

I recall Popey (Alan Pope) talking about corporate use of the desktop on podcasts a number of times, and it was not insubstantial, even if somewhat vague (sorry, I don’t remember what was said, only my reaction: it felt like more than I’d have guessed, even coming from a company man careful with words). Little indications have appeared from time to time (e.g., https://ubuntu.com/blog/a-first-look-at-desktop-metrics).

FYI: Calamares failed again with an “erase disk” install. (Same Toshiba laptop, UEFI mode). It failed during the “create new partition table (type: gpt)” when the existing table was gpt.

I believe that existing partition table was first created last night when I tested Xubuntu 0318. (I did an “alongside” with Mate this morning, and again with Budgie this morning. Now I’m doing “erase disk” with Lubuntu.).

My perception continues to be that Calamares doesn’t like partition tables created by other tools.

Right now, doing the same “erase disk” install over and over fails the same way (creating a new GPT partition table when one already exists).

I went into KDE Part Mgr and created a new GPT. Calamares failed again creating its GPT table (when one already exists). I went back into KDE PM and created an ext4 partition. Then Calamares got past the “creating GPT table.” It’s now installing.

Seriously, Calamares needs to gather more contextual information and display it on the error screen. During this exercise it has felt like the problem is reproducible. But, then, like yesterday, suddenly the problem goes away and nothing I do can recreate it. Calamares is not helping us help them when its error messages are so devoid of any details about what it doesn’t like. I’m not going to beat my brains out trying to chase a mostly random problem (to identify what they could capture and display on the screen. Or, better yet, send to themselves with an “Ubuntu detected an error, send report?” dialogue.).

If the solution is to say “use manual partitioning,” why not just go back to Ubiquity until Calamares is ready for primetime?

FWIW: I switched to testing my Ryzen 5/Vega 8 laptop. I installed today’s Kubuntu “entire disk”, then today’s Lubuntu “entire disk.” I thought that might trigger Calamares (if it’s sensitive to partition tables created by something else.).

Then I realized I’m assuming Kubuntu creates a partition table when one of the same kind already exists (i.e., maybe it doesn’t behave the same way as Calamares in this regard. Calamares creates a partition table even though one of the type it creates already exists.).

So, I installed Studio’s daily image “entire disk.” But, before installing, I used its gparted to create an MBR/msdos partition table. This laptop is UEFI-only. I assume that would cause Studio to create a gpt partition table. (I don’t think it would install “entire disk” to MBR if it’s on a UEFI machine.).

Then I installed today’s Lubuntu again, entire disk. Calamares did not fail creating a GPT table.

Maybe there’s something wrong with the drive in my Toshiba laptop. Maybe I should swap it out and see if I continue to have those errors. I’m pretty sure I’ve seen the error with other machines. But, I can’t say for sure. I haven’t been keeping good notes. (Next time I’ll use a spreadsheet so I can group things better.). I’ve got a feeling it might be specific to UEFI/CSM machines (not bios only, not uefi only).
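
On keeping notes: a spreadsheet works, but a small shell function that appends one CSV row per install attempt is enough to group failures later. This is only a sketch; the file name, the function name `log_attempt` and the column names are made up for illustration, and fields containing commas are not escaped.

```shell
#!/usr/bin/env bash
# Append one row per install attempt to a CSV so failures can be grouped later.
# The file location and column names are illustrative assumptions.

NOTES_FILE="${NOTES_FILE:-$HOME/calamares-tests.csv}"

log_attempt() {
    # usage: log_attempt MACHINE FIRMWARE BOOT_MODE OLD_TABLE INSTALL_MODE RESULT
    if [ "$#" -ne 6 ]; then
        echo "usage: log_attempt MACHINE FIRMWARE BOOT_MODE OLD_TABLE INSTALL_MODE RESULT" >&2
        return 1
    fi
    # Write the header once, when the file does not exist yet or is empty.
    if [ ! -s "$NOTES_FILE" ]; then
        echo "timestamp,machine,firmware,boot_mode,old_table,install_mode,result" > "$NOTES_FILE"
    fi
    # With IFS set to a comma, "$*" joins the six arguments with commas.
    local IFS=,
    printf '%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$NOTES_FILE"
}
```

A call after each attempt might look like `log_attempt "Toshiba" "csm/uefi" "uefi" "gpt" "erase disk" "failed creating gpt"`. Sorting or filtering the file by the firmware or old_table column then shows whether a pattern really holds.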

But, I’ll say again: the error message Calamares gives is not very informative. They take up half the screen with very little information. “This failed.” They should be providing environmental context, things that would help identify what’s similar each time “this failed” occurs. There doesn’t seem to be any procedural similarity. No way to recreate it without shooting in the dark forever.

I think I’ll test a couple different csm/uefi machines tomorrow (see if it happens again). If not, then I’ll swap the Toshiba’s hard disk. (That hard disk has been used 12 hours a day every day for 3-4 years. Maybe it has a problem.).

Thinking about that more: I’m positive it’s not the hard drive. If it were, all the other distros I installed would have had at least one problem. They haven’t. The Calamares failures are very specific to Calamares.

They may be specific to the Toshiba csm/uefi laptop. But, I got the “alongside” error on the Dell XPS 501X. That’s a bios-only machine.

It seems like I never get the error on the uefi-only machines (two new Acers, Ryzen 3 & 5). I’ve gotten it the most with the Toshiba (but no other *buntu flavor has failed on these disk operations, on this laptop). I don’t believe I’ve had the “alongside” error with the Toshiba. Just the “erase disk” (when it creates a new partition table). I had the “alongside” error on the bios-only Dell. I don’t recall if I had the “erase disk” error with that.

So, it seems like there is a pattern that way. I feel like I’ve seen patterns in terms of changing between mbr & gpt, or who created the partition table. I’ve definitely seen a pattern where a partition has to exist before Calamares can successfully replace the partition table. (But, these things aren’t absolute. I’ve seen it replace the partition table when there is no partition too.).

The next testing I do will be with 3 other csm/uefi laptops. That will narrow it down to whether the problem is just one of them, or related to the class of machine. Then I can test the two bios-only machines (which I got the “alongside” error with.).

I pulled out the Thinkpad E440, went into the bios to make it “uefi only” & “no csm” (that should be the same as the Toshiba. It only has a uefi or csm choice. This Thinkpad has uefi, legacy, both. And, it has “csm” as an additional choice.).

Calamares failed the very first attempt to “erase disk” while trying to create the gpt partition table.

The disk has an MBR/msdos partition table (the bios was “legacy first” & “csm=yes” before I changed it for this session.). It has three Linux partitions from the last time I tested with it. (I must have done “alongside” twice.).

The first partition is Budgie. So, there is a good chance Budgie created the partition table. I’ve detected this pattern before. Calamares seems to fail creating the partition table when the existing one was created by someone else.

I rebooted. Calamares still fails. I think to make this work I have to go into KDE Part Mgr and create the GPT partition table for Calamares. Then it will replace that one with its own. But, I might have to create a partition. (Sometimes it can’t replace KDE PM’s partition table when there’s no partitions present.).

This problem does seem to be in the csm/uefi class of machine. It’s not specific to the Toshiba. I have two others I can test.

Instead of going further with that Thinkpad E440 (and then frustrating myself that I can’t recreate what’s staring me in the face right now), I turned on the Dell Lat E5420.

It has the exact same problem. First try, just like the E440.

I went into the bios and set it to uefi (apparently I was running it legacy all this time. I wasn’t concerned about it. I have uefi machines. Some of the csm/uefi machines are set to uefi. Some legacy.).

The drive has an MBR/msdos partition table (like the E440) because it’s been running legacy previously. The drive has two Linux partitions. The first one is Lubuntu. (Apparently Calamares created the MBR table, unless I installed with “replace partition,” which I rarely do. But, it’s possible.).

I boot today’s (yyyy0318) Lubuntu daily image, start the installation, choose erase disk. It fails creating the GPT partition table.

This is an exact duplicate of the Thinkpad E440. First boot of each. The only difference is that Calamares probably created this MBR partition table that it’s now stumbling upon. (There is a slight chance I reused a partition table by putting Lubuntu in that first partition with a “replace partition” install. My theory that Calamares is sensitive to other partition tables might not be as solid as I thought.).

One additional point: I left out something above. It doesn’t affect anything I describe above, but the first boot was accidentally in legacy mode. Calamares detected that and tried to create a new msdos partition table (not gpt, as reported above). That failed too. So, it’s not just gpt. I accidentally ran into that this time because I chose the wrong boot menu-item, and it turned out to boot as a legacy bios.
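
Since the boot mode and the existing partition table type keep turning out to matter, it may help to record both before starting the installer. A minimal sketch, assuming a util-linux recent enough that `lsblk` has the `PTTYPE` column; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Report how the live session was booted and what partition table a disk has.

boot_mode() {
    # The kernel exposes /sys/firmware/efi only when booted through UEFI.
    # The optional argument overrides the path, which is handy for testing.
    local efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "uefi"
    else
        echo "legacy"
    fi
}

table_type() {
    # Prints gpt or dos (msdos/MBR); prints nothing if the disk has no table.
    # -d lists the disk itself without its partitions, -n drops the header.
    lsblk -dno PTTYPE "$1"
}
```

Running `echo "boot=$(boot_mode) table=$(table_type /dev/sda)"` before each attempt (with `/dev/sda` replaced by the actual target disk) would have caught the accidental legacy boot right away.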

I do like calamares myself. I’ll also provide the following:

`sudo -E calamares -d`

which can be used when running/installing to get verbose debugging output from calamares (provided by Walter/wxl in comment #11 of Bug #1864787, “Lubuntu failed install “sfdisk --force --append /d...”; the log is found in ~/.cache/calamares/session.log).
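
Because the failure tends to disappear on the next run, it may be worth copying the session log aside after every attempt rather than only when a failure looks reproducible. A rough sketch; the function name `save_session_log` and the destination directory are assumptions, and only the source path comes from the comment referenced above.

```shell
#!/usr/bin/env bash
# Keep a timestamped copy of the Calamares session log after each attempt,
# so a failure that cannot be reproduced later still leaves evidence behind.

save_session_log() {
    # $1: log to copy (defaults to the session log path mentioned above)
    # $2: directory to keep copies in (hypothetical default)
    local src="${1:-$HOME/.cache/calamares/session.log}"
    local dest_dir="${2:-$HOME/calamares-logs}"
    if [ ! -f "$src" ]; then
        echo "no log at $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    local dest="$dest_dir/session-$(date +%Y%m%d-%H%M%S).log"
    cp "$src" "$dest"
    # Print the path of the copy so it can be attached to a bug report.
    echo "$dest"
}
```

After `sudo -E calamares -d` exits, calling `save_session_log` with no arguments copies the log and prints where the copy went.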

If I’m suspicious of hardware issues, I usually open a terminal and quickly scan for errors in `dmesg` and/or `journalctl`. At least once I’ve filed a bug about an issue that kept appearing and that I suspected was more than just my hardware, but I believe I soon decided it was my hardware and changed the bug status to “invalid”. If the bug ID is used in the iso.qa.ubuntu.com report (in the bugs field) it’ll be picked up for other users to test for, so on those occasions I usually opt for only commenting on the bug unless I’m sure.
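
For the quick scan, something like the filter below narrows kernel messages down to lines that look storage-related. It is a sketch only: the function name is made up and the pattern is a guess at common kernel wording, not an exhaustive list.

```shell
#!/usr/bin/env bash
# Filter log text on stdin down to lines that look like disk trouble.

disk_errors() {
    # -i: case-insensitive, -E: extended regex.
    # "|| true" keeps the exit status zero when nothing matches.
    grep -iE 'i/o error|ata[0-9]+.*(error|failed)|blk_update_request|medium error|nvme.*(timeout|error)' || true
}
```

Typical use would be `sudo dmesg --level=err,warn | disk_errors` or `journalctl -k -b -p err | disk_errors` (kernel messages, current boot, priority err and above). Empty output on a machine where the installer still fails points away from the drive.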

If you get the same error/issue on two different laptops though, you’ve pretty much ruled out hardware anyway.

I’m not capable of providing any specific advice on your actual bugs, or issues with calamares. The issues & observations you’ve described in your last few posts, do they appear in lp.bug.reports too?

You’ve been testing heaps I note, so thank you. (You’ll soon pass me in the focal daily, I see on the “Top testers (current milestones)” page at Ubuntu QA.)

It’s been so hard to describe (not repeatable; patterns end up being invalidated), I never felt like I had enough information to report. I’ve just been talking out loud about a perplexing, seemingly random behavior.

For example: I have asserted (with a fair degree of confidence) that the behavior only happens with dual-bios type machines (uefi/legacy). Just moments ago, my Ryzen 3 uefi-only Acer failed while creating the new (erase disk) gpt table.

I’ll restart Calamares from the command line using the debug parms and open a bug report with that info.

(sigh). I couldn’t recreate it. Calamares worked the next time I ran it (from the command line, passing it debug parms).

It’s probably not a problem. As you said, people can use manual partitioning. It also sounds like I’m the only person experiencing it. Maybe it just stands out to me because I see it so much. In the real world someone would only see it once (as they typically only install once.). So, it’s not that bad in terms of how much any single person would be affected.

[EDIT: The way having debugging enabled coincided with Calamares working, that makes me wonder if it’s a timing problem. Maybe Calamares isn’t waiting long enough for the result?

That would fit my experience with “alongside.” It failed, but when I went into partition manager, it appeared it had succeeded. All I had to do was start Calamares again and “replace partition” as a 2nd step of “alongside.”

I’ve also noticed Calamares will fail after creating a partition table in KDE Part. Mgr. But, if I create a partition too, then Calamares succeeds. That isn’t 100% predictable. But, I’ve noticed a trend in that direction. Maybe having a partition present causes Calamares to work a little harder, and consequently waits long enough for the partition table to be created.]
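
One way to probe the timing idea from outside the installer: right after a failure, poll the disk and see whether the new partition table shows up a few seconds later. This is a sketch under assumptions. It says nothing about how Calamares itself waits, the function names are made up, and it is only informative when the old table was a different type (as with the MBR disks above), since a gpt-over-gpt replacement looks the same before and after.

```shell
#!/usr/bin/env bash
# Poll until a command succeeds, to check whether a result arrives late.

wait_for() {
    # usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
    # Re-runs COMMAND once a second until it succeeds or the timeout passes.
    # On success, prints the number of one-second waits that were needed.
    local timeout="$1"; shift
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
}

has_table() {
    # True if lsblk reports the given table type (gpt or dos) on the disk.
    [ "$(lsblk -dno PTTYPE "$1" 2>/dev/null)" = "$2" ]
}
```

For example, `wait_for 30 has_table /dev/sda gpt` run immediately after the error screen appears (with `/dev/sda` standing in for the target disk). A result greater than 0 would suggest the table was created after Calamares had already given up.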

I have tested “install alongside” on several machines (including 2 laptops) and have not had any problems with that. That being said, I have noticed that every once in a while “Erase disk” will fail if I click too fast on next. Of course I try again but cannot repeat it. :grinning:

I haven’t had an “alongside” problem except once. But, I just saw this bug in the “bugs to look for” today. It sounds like Calamares has been a problem for over four months. People have spent an incredible amount of time catering to its sensitivities. It sounds very similar to what I’ve been talking about the past week or two.

It’s amazing to me that extended reporting hasn’t been enabled for such an ephemeral problem. It’s like everyone’s just supposed to play along forever. You’d think by now Calamares would collect coincident information, log more stuff, point the affected person to the log (maybe hook into the automated crash reporting and keep the user out of it).

The error page is useless. It repeats the same info twice, and then says what failed without any hint about why. And this is FOUR MONTHS later? I regret that I spent as much time as I did on this. You guys have been chasing it for all this time. When the problem is Calamares not knowing what’s wrong with the environment at the time the error occurs. That’s nuts to expect people to knock themselves out this way (for this long).

I would like to point out that when the bug 1851188 that you referred to was first reported on 04.Nov.19 at that time we were using calamares version 3.2.16 if I remember correctly. Since then 6 updates to calamares have been released - so it is unfair to the developers to say nothing has been done in the past 4 months.

Everyone is doing their best to make the systems work in the desired way. That is why we are testing - to find the bugs before the official release. Software development is always about making the products better, more secure and last but not least usable for everyone.

Again I hope you will continue to test with us and help with these objectives. You have been doing just great and we need all the help we can get from people willing to commit their time and effort.

I don’t recall saying “nothing had been done.” What I’ve said is that it is remarkable how little information is given about the error; no context about the environment in which it’s occurring, what it looked like before/after the error, what was expected. I think that’s meaningful when Calamares spent a significant amount of effort developing the means to collect individuals’ hardware information. They expect you guys to spend countless hours stabbing in the dark (which you guys expected me to do, too). But, collecting user data is something they had no trouble implementing.

When I first brought this up, that Calamares should be expected to provide more error information, even notify itself through the “Ubuntu detected an error… send report?” workflow (they have their own data-collection work flow when there’s no error. But, when there’s an error, just keep stabbing in the dark. Don’t expect them to provide any information that would help identify what’s common between the random errors.). When I brought that up, I was told extended logging can be enabled with -d. Why hasn’t that been enabled as a default in the past four months? I could have had data the first time I hit the problem. It’s like we’re expected to work harder than necessary?

I don’t want to quench anyone’s passion about furthering Lubuntu, but… aren’t there more productive things to spend so much time on? If stabbing in the dark looks like a valid use of time (instead of expecting Calamares to put the same effort into error reporting that they put into collecting hardware information; or enabling debug logging as the default), doesn’t that say something about the extent to which Lubuntu can be furthered? (I’m saying this from the perspective of having used it for 4 years, until it became something else. Not just some random visitor opining.).

I was thinking about this last night. This is a perfect example of what Windows enthusiasts commonly refer to whenever Linux is mentioned as an alternative. They start talking about fragmentation, duplication of effort, everyone recreating the wheel. Here we have someone who decided to create a new installer (when Ubiquity seems fine). Now we have people spending countless hours stabbing in the dark because the duplicate installer is really good at collecting user information, but not providing actionable details when there’s an error. Nobody seems to think that’s a problem. Quite the opposite, they’re defending it.

I didn’t come here to upset anyone. I just wanted to help with testing. I appreciate all the very good information Chris and Walter provided which got me started with Ubuntu’s release testing. That was not a waste of time. I will definitely put that to good use. My previous 4-year use of Lubuntu probably makes me sensitive to how things are now. Personally, I think the common Windows-enthusiast complaint about fragmentation, duplication of effort applies to LXQt and what Lubuntu has turned into. I.e., I think it would have been more productive to just leave LXDE/Lubuntu where it was (end of life) and further other desktops that fill a similar lightweight space. I admire you guys for fighting the good fight. But, it’s stunning to me that anyone would spend 4+ months stabbing in the dark, not expecting the program (having the problem) to produce meaningful error information (especially when resources were spent to create data collection for other, convenience purposes.). Moreover, after almost 5 months, the debugging info hasn’t even been turned on by default. And, people seem to think there’s nothing wrong with this scenario. To me, it seems like self-inflicted work.

Good luck. I hope everything works out. I’m sorry for getting this involved. All I wanted to do was just help test Ubuntu generally, not get into community things. In some ways I feel very bad for having gotten into a discouraging discussion. But, in other ways… I’m a little outraged that I spent so much time working on something that’s been known for at least 4 months – and nobody’s expected anything better from it. I thought I stumbled onto something new, then something related to Chris’s bug. Now it’s turned into a well-known thing which is expected to be worked on using brute force. If I’d known that from the start, I would have merely said “good luck” without rocking anyone’s boat. If that’s how you guys want to work, that’s your choice. (If I had to do this, I’d just go back to Windows. I think the Windows enthusiasts have a valid point in this regard.).

Previously you said you have worked in development. I’m curious: where you worked, would this kind of stabbing-in-the-dark for 5 months be the way they’d conduct problem resolution? (I worked as a programmer and systems analyst for many years. People would be fired for this kind of stuff. Not so much the bug itself, but the “try again” response for 5 months – while devoting significant development resources to create a data collection process for other purposes.).

Sorry. I’ll shut up now.

I’m sorry if I offended you in any way - I certainly was not criticizing your work. As stated in my post you have been doing a great job.

Admittedly, because KDE Neon requested it and it’s disabled by default.

Ubiquity doesn’t do this, unless it crashes. And if Calamares crashes, it does it, too.

Most software overall does not have extremely verbose logging. This is way more than most users— even advanced ones— generally need.

Nope. For one, you’re conflating Calamares with Lubuntu. Maybe based on the trouble you’re seeing, Calamares should be dropped. I’ll give you several reasons why not:

  1. Ubiquity is really no better. Worse, in fact, I think, based on years of doing testing and development across both installers. The other day one of the most notable members of the Kubuntu team told me they file more bugs against Ubiquity than any other package.
  2. I can actually immediately connect with developers of Calamares and they usually implement very quick fixes. Past experience with Ubiquity, despite it being a product of Canonical and having ready access to every single developer of it, has been quite the opposite.
  3. Ubiquity’s codebase sucks. It’s terribly documented and poorly organized. It’s often really hard to figure out a problem to even begin to try to suggest an actual fix. I have no such problems with Calamares.
  4. Every piece of software has its problems. The determining factor in trying to figure out whether or not it’s worth spending time on is whether or not it works for most situations. And Calamares does (so does Ubiquity, actually).

Admittedly that’s because I haven’t been able to dig in deeper into the issue because, well, you know there’s a global pandemic going on? I haven’t even had sufficient information to really do so until the 2nd, so it’s not actually been that long. That said, if there were more people that could help out that would be nice…

Nope, it gets fixed like everything else: testing, investigation, questioning, and always working towards progress rather than being mired in bad feelings.

That said, I’ve appreciated your help. I certainly hope you continue.

Except Ubiquity doesn’t regularly randomly fail on creating GPT tables (and resizing alongside), and expect people to spend FIVE MONTHS stabbing in the dark, without even enabling debug logging by default.

I find these line-by-line dissections to be very effective at missing the general intent of a post. (Kind of like “I encourage everyone to ignore the rest of this thread, until it’s own topic” – while other such topics are split out when it’s deemed useful, such as the “woe to those…”).

I’m sorry that I have upset some comfort zones here. If people are happy with how things are, that’s great.

Good luck!

Nope, we’re not. But we’re limited in time and energy (or at least I am; yesterday I caught up on a huge backlog here) so I’m trying to get to the meat of the matter. The reality is imperfection, the goal is to always strive for perfection, and we don’t get there or any closer to it by complaining. Let’s work towards solutions.

The bolded bit is what I touched upon earlier. It was frustrating (to me, at least) to see so many people donating their time and expected to (or, not having a problem with) stabbing in the dark for nearly HALF A YEAR as a valid use of time. To me, it seems like there would be better approaches (expecting Calamares to provide more context about conditions before/after the error; reporting such things to its own developers; The affected distro enabling extended logging by default to help distro supporters provide more info.).

I’m sorry that my thoughts were viewed as “complaining.” I was worried about that, and tried more than once to say that it’s simply not my cup of tea (but, others have every right to spend their time however they wish. I didn’t want to discourage anyone from doing that, nor criticize that activity if it’s deemed valid.).

A few weeks ago I made a comment about how Ubuntu’s corporate culture is something I don’t think I could ever fit into. You seemed a bit triggered into a tedious analysis of how Canonical is separate, etc. But, I was talking about the overall largeness that leads to group-think (stabbing in the dark for FIVE MONTHS; being sensitive to anyone questioning it, etc. Others expected to fall in line because group membership/agreement is more important than fixing the bug – allowing for other bugs to be found and fixed with that effort.).

What I’ve said could be energizing, and refocusing. Or, it could be unwanted (as it has appeared to be). My actions, stepping back, mirror what I said. I get mixed signals from you (and others) about the preciousness of time (while promoting the squandering of it).

Good luck. I hope things will be good. I’m sorry for the flare up. I didn’t have any reason to continue this until you revived it. I’m out now. Feel free to say all you want.

What would help, in relation to the particular issue, would be providing all the details on the bug as to what exactly the problem is. This should be as concise and as organized as possible. Long paragraphs often don’t help. Tell me in numbered steps how to reproduce this and give me logs. If there are conditions that it fails in but not others, list those, briefly.
