Date: Thu, 28 Oct 1999 15:14:24 -0500 (EST) From: Jim Blandy <jimb@cygnus.com> To: linux-kernel@vger.rutgers.edu Subject: Debugging support for Pentium Streaming SIMD Extensions Intel has contracted with Cygnus to add support for the Streaming SIMD Extensions in GCC, GDB, and so on. I've been working on the GDB side of things. I started with Doug Ledford's patches against 2.2.9, adapted them to 2.2.12, and extended ptrace to provide access to the SSE registers. They are available from: http://sourceware.cygnus.com/gdb/papers/linux The next GDB snapshot, available on Monday or Tuesday, will include some support for examining SSE registers, etc. GDB snapshots are available from a link on this page: http://sourceware.cygnus.com/gdb The patches here are preliminary, and I'm interested in hearing folks' comments on them. We're going to give our final version of these patches to Intel, and Intel is going to announce them in some way, and publicly contribute them to Linux. So we would very much like to get the patches in a form the kernel maintainers would actually approve of. Points of interest: * I've added two new requests to ptrace: PTRACE_{GET,SET}XFPREGS read and write a process's i387_hard_fxsave structure. These new requests are parallel to the existing PTRACE_{GET,SET}{,FP}REGS requests. I think these changes are backward-compatible in the right ways. If the processor doesn't support SSE, or the kernel was configured without SSE support, then ptrace returns EIO, just as an older kernel would if presented with an unrecognized ptrace request. GDB adapts to old and new kernels just fine, and old GDB's should continue to work unchanged. I've submitted corresponding patches to GLIBC to add the structure and ptrace request to <sys/ptrace.h> and <sys/user.h>. * The patch doesn't extend core dumps to include the SSE registers yet. I assume that the best way to do this is to add a new note, with some new type name like "NT_PRXFPREG", containing an i387_hard_fxsave structure. This would be in addition to an old-style NT_PRFPREG note, so all existing tools would continue to work. Naturally, this note would only be dumped if the kernel and processor supported it. * To save a process's context, these patches do both an fnstenv and an fxsave, even though fxsave saves all the context necessary. Thus, context switches are slower than they need to be. The rationale for the extra save, I think, is that several external interfaces --- signal contexts, core dumps, and ptrace --- use the old FP save format, but reconstructing an fstenv record from an fxsave record is non-trivial (although possible). By saving the FP/SSE state in both formats, it's easy to retain compatibility with those old interfaces. I think Linux should simply use fxsave and fxrstor for context switches, and convert between that and the fnstenv format when needed. The conversion is quick, but not as quick as a memcpy; see the description of the FXSAVE instruction for details. However, I don't think the cost is important. These conversions would occur in three places: - When dumping core. The cost of doing a single conversion is inconsequential next to the cost of writing the core file. - When a process makes a PTRACE_{GET,SET}FPREGS request. New code can use PTRACE_{GET,SET}XFPREGS instead, so this is also not a problem. - When the kernel delivers a signal, or restores a signal context. A conversion is necessary if the kernel wants to continue to use the existing struct _fpstate. However, I believe that struct _fpstate must be expanded to include the XMM registers as well. If we do this, we might as well change it to use the new format at the same time, so no conversion will be necessary. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/