System dies since version 6 was installed

I upgraded a working system from version 5 to version 6, and now it dies after 5 or 6 hours. It is still partly running, though, because I can SSH into it, and I found the following in kernel.log:

2005:09:21-21:16:21 (none) kernel: Losing too many ticks!
2005:09:21-21:16:21 (none) kernel: TSC cannot be used as a timesource.
2005:09:21-21:16:21 (none) kernel: Possible reasons for this are:
2005:09:21-21:16:21 (none) kernel: You're running with Speedstep,
2005:09:21-21:16:21 (none) kernel: You don't have DMA enabled for your hard disk (see hdparm),
2005:09:21-21:16:21 (none) kernel: Incorrect TSC synchronization on an SMP system (see dmesg).
2005:09:21-21:16:21 (none) kernel: Falling back to a sane timesource now.

Again, this system worked fine until I loaded version 6.
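
For anyone who wants to check their own logs for the same fallback, here is a minimal sketch in Python. It assumes the line layout shown in the excerpt above (timestamp, host field, `kernel:`, message); the function name `find_tsc_fallbacks` and the sample list are made up for illustration, and the location of the log file is left to the reader.

```python
import re
from datetime import datetime

# Line layout taken from the excerpt above, e.g.
# 2005:09:21-21:16:21 (none) kernel: Losing too many ticks!
LINE_RE = re.compile(r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ kernel: (.*)$")

# The two messages that announce the timesource fallback.
MARKERS = ("Losing too many ticks!", "TSC cannot be used as a timesource.")


def find_tsc_fallbacks(lines):
    """Return (datetime, message) pairs for lines that report the TSC fallback."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # wrapped continuation line or unrelated text
        stamp, message = match.groups()
        if message in MARKERS:
            when = datetime.strptime(stamp, "%Y:%m:%d-%H:%M:%S")
            hits.append((when, message))
    return hits


if __name__ == "__main__":
    # Hypothetical input: three lines copied from the excerpt above.
    sample = [
        "2005:09:21-21:16:21 (none) kernel: Losing too many ticks!",
        "2005:09:21-21:16:21 (none) kernel: TSC cannot be used as a timesource.",
        "2005:09:21-21:16:21 (none) kernel: Falling back to a sane timesource now.",
    ]
    for when, message in find_tsc_fallbacks(sample):
        print(when.isoformat(), message)
```

The timestamp of the first hit gives a rough idea of when the kernel gave up on the TSC, which can then be compared with the time the box appeared to die.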

BarryG over 20 years ago

Run dmesg to see the other info it suggests.

What hardware are you running on? (Motherboard chipset, CPU)

Barry

Michael_A over 20 years ago in reply to BarryG

It's an IBM A50, Intel P-IV 2.8, ATA drive and I've added an Intel NIC. Don't know the chipset but it's just under a year old and version 5 ran without any problems.

I have discovered that when the system dies the o/s clock has stopped. I can update the clock from the shell by running hwclock --hctosys, but all it does is set the current time. Any system script that relies on the system's ability to count (i.e. it has a sleep command) simply hangs.
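
A quick way to test for this state without relying on sleep is to burn CPU in a loop and see whether the clock reading changes. The following is only a sketch under that assumption: the function name `clock_is_advancing` and the loop sizes are arbitrary choices of mine, and `time.time()` is simply Python's view of the system clock.

```python
import time


def clock_is_advancing(now=time.time, max_spins=50_000_000, check_every=100_000):
    """Return True as soon as the clock reading changes during a busy loop.

    A busy loop is used on purpose: on a system in the state described
    above, anything that sleeps may never wake up again.
    """
    start = now()
    for spin in range(1, max_spins + 1):
        # Re-read the clock every `check_every` iterations.
        if spin % check_every == 0 and now() != start:
            return True
    # The reading never changed over the whole loop.
    return False


if __name__ == "__main__":
    if clock_is_advancing():
        print("clock is advancing")
    else:
        print("clock appears to be stopped")
```

On a healthy machine this returns almost immediately. On a machine whose clock has stopped it spins through the whole loop and then reports the stall, rather than hanging.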

dmesg returned a couple of things of interest.

I had quite a few of these:

ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
    ACPI-1138: *** Error: Method execution failed [\_SB_.PCI0.USB1._PRW] (Node df744340), AE_AML_NO_RETURN_VALUE

and this:

PCI: Using ACPI for IRQ routing
** PCI interrupts are no longer routed automatically.  If this
** causes a device to stop working, it is probably because the
** driver failed to call pci_enable_device().  As a temporary
** workaround, the "pci=routeirq" argument restores the old
** behavior.  If this argument makes the device work again,
** please email the output of "lspci" to bjorn.helgaas@hp.com
** so I can fix the driver.

I don't understand how to try the workaround.

BarryG over 20 years ago in reply to Michael_A

I suspect that workaround command may need to be passed to the kernel at boot time. This can be done from the lilo prompt, e.g.
default pci=routeirq
to boot the default ASL kernel with that option. If it works, you can add the parameter to /etc/lilo.conf; however, ASL will overwrite that file with new kernel up2dates.
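
After rebooting, it is worth confirming that the option actually reached the kernel. On Linux the command line the kernel was booted with can be read from /proc/cmdline. Here is a small sketch; the helper names `missing_boot_params` and `read_cmdline` are invented for this example.

```python
def missing_boot_params(cmdline, wanted):
    """Return the entries of `wanted` that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    try:
        line = read_cmdline()
    except OSError:
        line = ""  # not on Linux, or /proc is not mounted
    missing = missing_boot_params(line, ["pci=routeirq"])
    print("missing:", missing if missing else "none")
```

An empty result means the parameter typed at the lilo prompt was passed through. The same check can be repeated after a kernel update to see whether an edit to /etc/lilo.conf survived.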

If you can, you should probably also run lspci and email the output to the address listed.

Barry

Michael_A over 20 years ago in reply to BarryG

I'm going to try this boot in the morning, and if the system stays up for more than a day I may reload using the classic option. The classic option and the message both mention ACPI, so I figure it's worth a shot, and reloads are pretty painless.

Michael_A over 20 years ago in reply to Michael_A

Just to close this out.

The suggestion from the logs to use pci=routeirq did not help. Astaro gave me the following to try which is basically the same as a classic install.

default noapic acpi=off

This worked and so I reinstalled using the classic option and restored the configuration and all is running fine. Two things of note that I will pass along.

1) This hardware was only about 8 months old and had been running version 5 just fine. The problem occurred as soon as version 6 was loaded. I have no idea why this occurred, but it did.

2) When the system failed, the o/s clock quit running. You could still log in at the console or through SSH, but if you typed date you would see that the time did not advance.

BarryG over 20 years ago in reply to Michael_A

I'd still suggest running lspci and emailing that address.

Maybe the Linux kernel people can fix it so it'll work better in the future.

Barry