Talk:Reproducing kernel Hilbert space

This article has been mentioned by a media organization:

John Timmer (December 29, 2015). "Editorial: Wikipedia fails as an encyclopedia, to science's detriment". Ars Technica.

Problem with section on connection between RKHS and ReLu and Neural Networks

The first part of this section is very disconnected from the last sentence which reads

"Using this formulation, we can apply the Representer theorem to the RKHS, letting one prove the optimality of using ReLU activations in neural network settings. "

The first part of the section just shows that the ReLu is a reproducing kernel when a Hilber space is considered. The Representer theorem shows that when optimizing a sum of two functionals, one just depending on a few points of a function and the second being monotonic on the norm of its input, that we only really need to optimize the objective on the function on the points that the first functional depends on.

When one rights "letting one prove the optimally of using Relu activation' what kind of optimallity is the text referring too? Is there some citation to go with this ? Typically, when using NN, one does not optimize the activaiton functions, but the weights. How is the Representer theorem then used in this context? — Preceding unsigned comment added by 136.167.52.51 (talk) 21:39, 11 July 2020 (UTC)[reply]

This sesquilinearity convention

What is this sesquilinearity convention of which you speak? Do you mean linear in the second variable and conjucate-linear in the first? If so, it seems a strange convention to me. Lupin 08:39, 6 Aug 2004 (UTC)

The definition requires two spaces, X and H!

A lot of the article (e.g. examples) is currently written as if an RKHS (as the name implies) is a special kind of Hilbert Space. This is quite misleading. Every Hilbert Space, H, is an RKHS for X := H.

Norm-continuous

The article states that H is a reproducing kernel Hilbert space iff the linear map f→f(x) is norm-continuous for any element x of X. -- what does norm-continuous mean in this context? linas 15:06, 25 Jun 2005 (UTC)

H is a Hilbert space, thus carries a norm. This f → f(x) has a well-defined topology on both source and target.--CSTAR 15:38, 25 Jun 2005 (UTC)

Examples, too

Some examples would be handy too. Reading this article naively, one has the Dirac delta function playing the role of K, so that $f(x)=\int \delta (x-y)f(y)dy$ and δ is usually naively understood to be the "identity operator" on the Hilbert space in question (that is, its the sum over all states, e.g. the sum over all orthogonal polynomials on an interval). Is this naive reading correct? Can I correctly say that an example is

K(x,y)=\sum _{n=0}^{\infty }T_{n}(x)T_{n}(y)

where T_n ar the Chebyshev polynomials (and the norm/volume elt/weight is taken appropriately)? It seems to fulfill the requirements described in this article, I think ... linas 15:21, 25 Jun 2005 (UTC)

That's just it, the delta function is not in the space.--CSTAR 15:38, 25 Jun 2005 (UTC)

Would you object if I added the above as an example to the article? linas 18:31, 25 Jun 2005 (UTC)

Counter-examples, too

Assuming the example given above is correct, then, naively, every hilbert space encountered by a naive physicist is a reproducing kernel hilbert space. So it now becomes hard to imagine a Hilbert space that doesn't have such a beast associated with it. (Since K seems to be defined as "the identity operator", and these are always easily constructed as a sum over states). linas 15:27, 25 Jun 2005 (UTC)

What's relevant here is the realization of the Hilbert space as a space of functions on a set X.--CSTAR 15:38, 25 Jun 2005 (UTC)

Well, then, for the record, are there any hilbert spaces that are not realized as a space of functions on a set? Alternately, are there any hilbert spaces that are realized as a space of functions on a set, but do not have the norm-continuous property? I'm one of these people who don't feel comfortable with a topic until I have understood both an example and a counter-example. I'll try to think of something, but if you have something up your sleeve, that's certainly better than whatever I might dream up. linas 18:31, 25 Jun 2005 (UTC)

Yes: any L²_μ for a non-atomic measure μ. These are realized as equivalence classes of functions.--CSTAR 18:41, 25 Jun 2005 (UTC)

late late comment: i think linas's original post is right. every Hilbert space H is isomorphic to a reproducing kernel Hilbert space, meaning there exist a set X, and a positive kernel K: X × X → C such that the resulting Hilbert space is isomorphic H, and, yes, K can be taken as, essentially, the identity operator.Mct mht 19:38, 30 March 2007 (UTC)[reply]

Late late late comment. Whilst it is true that every Hilbert space is isomorphic to a RKHS, that's not the same as saying it is a RKHS. For example, L²[0,1] is isomorphic to

l^{2}(\mathbb {Z} )

, which is a RKHS. However, the reproducting kernel here is

K(x,y)=\delta _{xy}

for

x,y\in \mathbb {Z}

. The key thing is, that this does not lead to a reproducting kernel for L²[0,1], as it takes x and y in

\mathbb {Z}

, not [0,1]. In fact, L²[0,1] is not a RKHS, as point evaluation is not a continuous function. The message here is that being a RKHS is not preserved by isomorphism. Also, I should point out that the delta function is not a function in the conventional sense. It's a common shorthand to regard it as one, but it isn't a function in the conventional sense, and it certainly isn't in L²[0,1]. James pic 13:37, 9 October 2007 (UTC)[reply]

P.s, the delta function I use in the example above as

\delta _{xy}

is the Kronecker delta, not the Dirac delta. The Kronecker delta is in

l^{2}(\mathbb {Z} )

, whereas the Dirac delta is not in L²[0,1].

Incidentally, in response to the request for examples, L²(X) is not a RKHS for X any open subset of

\mathbb {C} ^{n}

. On the other hand, Hardy spaces usually are RKHSs, using various variants of the Szego kernel.

For example

L^{2}(\mathbb {T} )

, the Hilbert space of L² functions on the circle is not a RKHS. On the other hand,

H^{2}(\mathbb {T} )

, the space of L² functions on the circle which are radial limits of holomorphic functions on the disc, is. James pic 14:18, 9 October 2007 (UTC)[reply]

Edit

User:Oleg Alexandrov states that the sentence I edited was wrong, e.g [1]. Hoever, several sentences were modified. Which sentence are you referring to here?--CSTAR 17:34, 25 Jun 2005 (UTC)

Well, I modified there only one sentence, everything else is just link insertion. I was referring to the sentence:

H is a reproducing kernel Hilbert space iff the linear map

f\mapsto f(x)

is norm-continuous for any element x of X.

I thought that the part "for any element x of X" is wrong, but then I realized I was wrong, so I put it back in the next edit. Does that make more sence now? Oleg Alexandrov 17:43, 25 Jun 2005 (UTC)

Seems to. I'll give this more thought later tonight. linas 18:35, 25 Jun 2005 (UTC)

So, basically x is fixed, and then you get a linear map with input f and output f(x). I understand that this is weird, because usually the function f is fixed and input is x while output is f(x). Oleg Alexandrov 19:29, 25 Jun 2005 (UTC)

An important class of examples are Hilbert spaces of analytic functions.--CSTAR 05:59, 26 Jun 2005 (UTC)

Article is not consistent

what is meant by "reproducing kernel Hilbert space" here? sections of the article disagree.

one can say that a Hilbert space of functions on X is a RKHS if it admits a positive definite kernel K on X × X. this is the approach in the first section. with this definition, as CSTAR said above, any L²_μ for a non-atomic measure μ would not be a RKHS. (BTW, the positivity of the kernel is really the central theme whether one adopts this formulation or the one below, but that's not pointed out at all in the intro.)
one can start with a positive kernel K on X × X. then there exists a Hilbert space H for which K is a reproducing kernel. this is alluded to in the Moore-Aronszajn theorem section. but then "is a reproducing kernel for" now means something slightly different. it is no longer insisted that H consists of functions on X. after quotiening out the degenrate subspace and taking the completion, we'd end up with equivalence classes in general. in this formulation, any Hilbert space is a RKHS, because the existence of orthonormal basis.

this should be resolved. if it's ok, i am going to attach clean-up tag to the article. Mct mht 01:40, 31 March 2007 (UTC)[reply]

I don't think your second point is correct. The phrase "is a reproducing kernel" implies that H consists of functions. This is because a reproducing kernel implies a continuous point-evaluation functional. Unless you are dealing with functions, point-evaluation makes no sense. Also, not every Hilbert space is an RKHS (which has been discussed above) although it may be isomorphic to one.150.203.33.77 02:27, 13 August 2007 (UTC)[reply]

i did not make a claim regarding what "is a reproducing kernel" means. i just pointed out that article seems to imply two inconsistent meanings. merely choosing one and label the other one incorrect, as you did, doesn't address the issue. in the intro, article states, as you say, "is a reproducing kernel" means H consists of functions and pointwise evaluations are bounded functionals. on the other hand, in the Moore-Aronszajn theorem section, it seems pretty clear to me that, if one carries out the construction as intended, the resulting Hilbert space would not in general be a spaces of functions, unless the ≥ is changed to >. Mct mht 03:04, 13 August 2007 (UTC)[reply]

Ah, I see. You are talking about the construction that you removed, right? I'll add in a correct construction.203.173.5.212 10:38, 13 August 2007 (UTC)[reply]

you've changed the section completely and i doubt it's right. do you have a reference? i think the point is that one considers here spaces of functions, not equivalence classes of functions. by starting with L², this is already violated (contradicting what you said yourself above). furthermore, your inner product does not converge in general. T_K is compact, so its eigenvalues λ_i goes to zero. if they do so, say, exponentially, then your inner product is not well-defined. assuming the inner product is valid, even then you may not have a Hilbert space. i don't think it is true in general that the range of T_K is closed.

For a reference, see (for example) "On the Mathematical Foundations of Learning" by Cucker&Smale (bulletin of the AMS). They define an RKHS on page 11. Also, "On the Performance of Kernel classes" by Mendelson (Journal of Machine Learning Research) has a slightly different way of defining it. I'll add these to the references list.

The fact that I started with L² is irrelevant. Two functions that belong to the same equivalence class will be mapped by the kernel operator to the same function so my RKHS is indeed a space of functions. I'll make a point of this.

The inner product I defined may not (in fact, will not) converge in L². Note, however, that I only defined it in the image of L_K. Then

L_{K}^{-1}g

is in L² and so the inner product is defined. The fact that the inner product doesn't converge in L² in important; it means that the RKHS is a proper subset of L². For a start, functions in the RKHS are continuous and have the same number of derivatives as the kernel does.

Are there any more issues with that section, or can I reinstate it? (I'm the same person that made the changes before -- I finally got around to making an account) Jneem 00:18, 19 August 2007 (UTC)[reply]

Sorry, I forgot to address the point of the space being complete. Let E be the closed (in L²) span of the eigenfunctions of L_K. This, with the standard L² inner product, is a Hilbert space. Also, L_K^1/2 is an isomorphism between E (with the L² inner product) and the RKHS (with the RKHS inner product). Therefore the RKHS is also complete. Jneem 00:24, 19 August 2007 (UTC)[reply]

i think the assumption of the previous version was probably correct and it just needed some elaboration. Mct mht 06:38, 17 August 2007 (UTC)[reply]

The previous version may have been correct, but it doesn't directly follow from any construction I am aware of. In addition, the construction I give says explicitly what the inner product is. It doesn't seem to be mentioned elsewhere on the page that the inner product on a RKHS will, except in finite-dimensional situations, be completely different from the normal L² inner product. Reading the talk page, there seems to be some confusion on this point (with people thinking of the point-evaluation functional as being a dirac function) so I think it's worth having.Jneem 00:18, 19 August 2007 (UTC)[reply]

hi, Jneem, good to have you contributing. is there a pdf file for the Cucker & Smale paper? some questions/replies:

Re "Two functions that belong to the same equivalence class will be mapped by the kernel operator to the same function": ok, that is true.
Re "The inner product I defined may not (in fact, will not) converge in L². Note, however, that I only defined it in the image of L_K. Then $L_{K}^{-1}g$ is in L² and so the inner product is defined.": ok, that is also true.
Re "Let E be the closed (in L²) span of the eigenfunctions of L_K. This, with the standard L² inner product, is a Hilbert space.": by adjoining elements of L² to get completeness, are you not ending up with a space of equivalence classes? in the line where you verified that this is a RKHS: $\langle K_{x},f\rangle _{H}=\langle K_{x},T_{K}^{-1}f\rangle _{2}=(T_{K}(T_{K}^{-1}f))(x)=f(x)$ , you only did it for f in the image of T_K. what if f is not in the image of T_K but in its completion?
Re the original (in the article right now) version: it is more general and fundamental than the construction you gave, which seems to me constructs a Hilbert space (of either functions or equivalence classes of them) using a positive, in your case unbounded, operator in a pretty standard way. it's clear that a positive definite kernel on a set X (with no extra structure, such as a measure) gives an inner product on functions from X to C with finite support. the issue is completion. if we take the usual Cauchy completion, then it will not be a space of functions. but i think it can be bypassed. Mct mht 01:28, 19 August 2007 (UTC)[reply]

OK, you make a good point about completion. What if I define the RKHS by the set of functions

\{f(x)=\sum _{i=1}^{\infty }a_{i}{\sqrt {\lambda _{i}}}\phi _{i}(x):\ \sum a_{i}^{2}<\infty \}

. Then f is defined pointwise because, by Mercer's theorem,

\infty >K(x,x)=\sum \lambda _{i}\phi _{i}(x)^{2}

and so f(x) < ∞ by Cauchy-Shwartz. Then this is obviously a family of functions and the fact that it is complete under the RKHS inner product follows from a diagonalisation argument. Basically, I'm forgetting about the kernel operator and following Mendelson's second definition more closely. By the way, there is a pdf of Cucker&Smale at the same address as the ps -- just change the ".ps" to ".pdf". Jneem 03:36, 19 August 2007 (UTC)[reply]

besides some details, i think that does it. that the positive operator has purely discrete spectrum seems to me essential here. this is really a kernel K on the natural numbers N given by K(i,j) = λ_i δ_{i j} in disguise (one can substitute the natural basis {e_i(j) = δ_{i j}} with any set of orthonormal functions and you did it here with the eigenvectors of T_K). i for one am not going to object if you wanna add it back. in any case let's keep the Moore-Aronszajn theorem section, as that is the most general formulation. Mct mht 17:56, 19 August 2007 (UTC)[reply]

Sorry, I'm still missing something. The Moore-Aroszajn theorem says that the RKHS is unique. Therefore, any construction we give must result in the same RKHS. So why is one thing more general than another? Jneem 22:57, 19 August 2007 (UTC)[reply]

well, the assumptions of the Moore-Aroszajn theorem are different than those for your construction so i don't think you can apply their uniqueness statement. if i understand correctly, the Moore-Aroszajn theorem says that for any set X and any positive definite kernel K: X × X → C, there is a unique RKHS consisting of functions on X. so no structure such as a measure is assumed on X and no passing through an integral operator. Mct mht 01:50, 20 August 2007 (UTC)[reply]

you're right, so I've done a proof of Moore-Aroszajn in the general case. I'll add the stuff about Mercer kernels when I have time.Jneem 08:17, 20 August 2007 (UTC)[reply]

also, the the Moore-Aronszajn theorem section contains some unusual notation and language ("first class citizens"?), and the very brief outline of the construction doesn't look quite right. Mct mht 02:04, 31 March 2007 (UTC)[reply]

I agree; the previous version by the anon contributor made a mishmash of Carleman operators, Hilbert Schmidt operators and Kernels. --CSTAR 15:21, 17 August 2007 (UTC)[reply]

It is a Hilbert space

The most recent edit claims that the inner product given in the proof of Moore-Aronszajn may be degenerate but doesn't say why. I think that it cannot be degenerate: suppose it is. Then there exists a point x such that $\langle K_{x},K_{x}\rangle =0$ and K_x is not the zero function. Then we can find y such that K_x(y) is not zero. Then the kernel matrix determined by x and y is

{\begin{bmatrix}0&a\\{\overline {a}}&b\\\end{bmatrix}}

for some non-zero a and some real number b. One can check that the eigenvalues are

{\frac {b\pm {\sqrt {b^{2}+4|a|^{2}}}}{2}}

and so one eigenvalue is negative, contradicting the fact that the kernel was positive definite. Unless someone can poke a hole in my argument, I'll reinstate the claim that the construction produces a genuine Hilbert space. Jneem 12:14, 27 September 2007 (UTC)[reply]

I see no flaw in your argument. I'll undo my edit. Memming 13:08, 27 September 2007 (UTC)[reply]

You were right to require symmetry of the kernel, though, so I'll put that back in. Jneem 00:17, 28 September 2007 (UTC)[reply]

i don't know if one is interested in real RKHS's but in the complex case, positivity implies Hermiticity, i.e. K(x,y) and K(y,x,) are mutual conjugates. Mct mht 05:06, 9 October 2007 (UTC)[reply]

Informal description

an anon contributor recently added the following passage:

"Informally, a reproducing kernel Hilbert space is a function space on which the concept of a Dirac delta function makes sense, despite possibly being technically absent from the Hilbert space."

i think that can be misleading, unless one introduces some concrete examples of RKHS (necessarily one that is not a space of analytic functions). does the anon have some such e.g.'s at hand?

even then, one might still argue that the statement is misleading. talking about δ-functions implicitly assumes that some kinda analytic structure is present, while the generic RKHS does not. take for instance the trivial kernel on R, K(x,y) = δ_xy. apply the Moore-Aronszajn construction given in the article. the resulting space of functions on R has no obvious analytic description. it would be a big stretch for fit δ-function into the picture, if that is possible. neither does it make sense to speak of δ-functions in the context of spaces of analytic functions. something that is closer to what anon describes is perhaps a rigged Hilbert space. although in that case, the δ-functions do reside in the space IIRC. Mct mht 05:50, 30 September 2007 (UTC)[reply]

Hi, I'm anon. My original statement was wrong. I changed it to

"Informally, a reproducing kernel Hilbert space is a function space which contains a Dirac delta-like function at every point of the underlying set."

Anyways, let me explain the rationale. There are only two important things about a dirac delta function: it has a mass (integral) of one concentrated entirely on a single point of the underlying set. The concept of analyticity and taking limits, while important in its construction and interpretation on R^n, is irrelevant to its purpose. It's more of a measure theoretic object (dirac measure) than an analytic object. Perhaps I should have said dirac measure instead of dirac delta function, but since the article was talking about function spaces I thought I'd use the latter to avoid confusion.

Since the rkhs abstracts those two essential properties from the Dirac delta into the definition of the reproducing kernel, I thought it appropriate to make that informal connection. In the article as it stood, it was not clear why anyone would care about rkhs. I couldn't figure out why until I read it carefully and made the connection above, so I thought it was worth sharing.

Your concerns are certainly legitimate (the dirac delta is technically an object with analytic structure, although I'm downplaying that to make the connection) and if you can think of a way to make it more accurate, please do. But let's try to make the article more readable, it's too formalistic right now.--141.211.61.254 18:50, 7 October 2007 (UTC)[reply]

I agree with Mct mht and I think the reference to a dirac function should be removed. It might make sense to think of a dirac "function" if the inner product on an RKHS were the usual L² inner product -- but it isn't and that is a very important point (I already removed a sentence that made the same mistake, so this probably needs to be emphasised somewhere). The inner product on an RKHS is given in the proof of Moore-Aronszajn and it is definitely not the usual one. In fact, as Mct mht pointed out on this talk page, an RKHS does not need a measure.

I think the term "point-evaluation functional" is less misleading than the term "Dirac delta function." I do agree, however that the article needs some motivation. However, I think that the motivation must not be misleading.Jneem 02:07, 9 October 2007 (UTC)[reply]

the connection between pointwise evaluations and delta-functions is rather forced in this setting. given a positive operator A on a Hilbert space H, it induces an inner product on its domain: <h, Ah>. in particular, a positive semifinite n × n matrix allows one to renorm Cⁿ. a RKHS is really a very natural generalization of this. if X is the integers, then one can think of positive kernel as an infinite positive matrix. the definition of positivity copies verbatim from the matrix case: we require that every principal minor (of finite size) is positive. so naturally we have an inner product on the space of functions f: X → C with finite support. to get a Hilbert space, two possible obstacles: degeneracy and completeness. if a positive semifinite n × n matrix A is not strictly positive, the induced inner product would be degenerate on Cⁿ. but it one restricts to the range of A, spanned by its columns, degeneracy is removed. this is essentially what is done also for the kernel case. for completeness, we include all functions of finite norm (the abstract Cauchy completion would not do here). Mct mht 05:02, 9 October 2007 (UTC)[reply]

Sesquilinearity

Should we be following the convention for inner product spaces of $\langle \cdot ,\ \cdot \rangle$ being linear in the first variable, or the convention for more general sesquilinear forms of being linear in the second variable? My feeling is that since we're dealing solely with Hilbert spaces here, so we're only dealing with inner products, we should use the inner product space convention, which also seems to be more common in the literature. What say you? James pic 11:35, 18 October 2007 (UTC)[reply]

matter of consensus. i personally don't have a preference. article just needs to be consistent. Mct mht 19:19, 18 October 2007 (UTC)[reply]

I don't mind terribly -- I used linearity in the second variable for consistency with the page on sesquilinear forms (the page on inner product spaces is inconsistent: first, it says that an inner product is a sesquilinear form. Then it says that an inner product is linear in the first variable. Then it gives an example of an inner product (the third example) that is linear in the second variable). Anyway, if you want to make it linear in the first variable, you should also change the proof of the Moore-Aronszajn theorem. Jneem 01:47, 20 October 2007 (UTC)[reply]

Like I said, my preference is to use linearity in the first variable, as this seems to be most consistent with the literature, at least for Hilbert spaces. However, even this could be contentious, as physics literature generally uses the opposite convention. As you said, it should be a matter of consensus, so I thought I'd call for a vote.

First Variable: I've seen it most often in the literature. James pic 16:20, 22 October 2007 (UTC)[reply]

looks like there won't be any objection if you wanna switch conventions. Mct mht 20:38, 24 October 2007 (UTC)[reply]

It all depends on whether you prefer Forward Polish Notation, fx to denote the application of f to x, or Reverse Polish Notation, xf. The usual convention is that the vector space V consists of column vectors v, which transform covariantly as Mv when multiplied on the left by a matrix M representing a linear transformation from V to some vector space. Its dual V* consists of row vectors, which transform contravariantly when multiplied on the right by the adjoint (as distinct from the adjugate) of M. Dot product has the row vector on the left representing a functional from V to the one-dimensional space. So if dot product is your basic example of an inner product then the second variable should be the linear one. Physicists got that one right, see e.g. Blank et al, Hilbert Space Operators in Quantum Physics where the bra is applied to the ket. The other way round corresponds to Reverse Polish Notation as popularized by HP calculators, no inconsistency there as long as you stick to RPN throughout. Switching everything in the article to RPN looks like more work than RPN warrants. --Vaughan Pratt (talk) 09:16, 1 February 2013 (UTC)[reply]

Is the proof correct?

I do not think that your proof on the fact that given a kernel K, there exists a unique RKHS associared with it is correct. Or, at leats, it is incomplete. After introducing G as another Hilbert space of functions for which K is a reproducing kernel, to prove uniqueness it is assumed that G is the completion of the span of \{K_x:\ x \in E \}. But where does this fact come from? This has to be proved. Notice that in Cucker and Smale's paper "On the mathematical foundations of learning" this is a condition which is listed in the statement of theorem 2 on pag. 35. —Preceding unsigned comment added by Spline (talk • contribs) 17:34, 10 December 2008 (UTC)[reply]

I understand that H (or G as well) is a space of functions on E: if an element f ∈ H is orthogonal to all the

\{K_{x}:\ x\in E\}

, then f is the 0 function on E, thus

f=0_{H}

, hence

\{K_{x}:\ x\in E\}

is total. Does it make sense? Bdmy (talk) 11:38, 11 December 2008 (UTC)[reply]

Yes, it makes sense. span{K_x} must contain an orthonormal basis of H, hence must be dense in G, if G has to satisfy the reproducing property. I think that this point should be mentioned in the proof, because it would make it more clear. —Preceding unsigned comment added by 87.4.244.249 (talk) 21:38, 11 December 2008 (UTC)[reply]

Yes, it makes sense. span{K_x} must contain an orthonormal basis of G, i.e. must be dense in G if G has to satisfy the reproducing property. I think that this point should be mentioned in the proof, it would make it more complete. The conclusion without mentioning this point is not so clear. —Preceding unsigned comment added by 87.4.244.249 (talk) 21:41, 11 December 2008 (UTC)[reply]

Symmetric or "Hermitian symmetric"

I find the wording "the kernel K is symmetric" ambiguous or misleading. Clearly what is meant but never said is the "Hermitian symmetry"

K(y,x)={\overline {K(x,y)}}

like the one for the Szegö kernel. I am right? Bdmy (talk) 11:43, 11 December 2008 (UTC)[reply]

Kernel as dot-product after non-linear map

I don't know a lot about this topic, but I ended up here by studying support vector machines. It seems the SVM world ended up using RKHS's by first mapping statistical data from a set X into a Hilbert Space H using a non-linear map f: X -> H, defining the kernel k: X x X -> IR: (x, y) -> <f(x), f(y)>. I can see to some extent how these definitions might be equivalent or at least related, but I think it might be enlightening for some people to mention this approach to RKHS's in the article as well.

Apologies for the clumsy notation, I couldn't find any way to get a tex-template here.

94.224.49.243 (talk) 15:04, 14 October 2012 (UTC)[reply]

Take a look at Sławomir Biały's edit here and see if you agree. He's also the main editor for Bergman kernel (Bergman 1922) which he wrote starting on the same day. Note that kernel trick (what you're talking about in connection with SVM's) has been in the See Also section of this article ever since this 2006 edit.

Incidentally you can tex-ify your formulas as

k:X\times X\to IR:(x,y)\to \langle f(x),f(y)\rangle

etc. --Vaughan Pratt (talk) 08:05, 1 February 2013 (UTC)[reply]

Contrast with L^2-spaces

Subsuming large parts of the discussion above, it seems that the article should be more precise and clearer right at the beginning about the fact that usual $L^{2}$ -spaces on manifolds are not RKHS's. These aren't because they are not defined as Hilbert spaces of functions, but only of equivalence classes of functions; leading to evaluation at a point of the manifold being undefined. This point is just too likely to be confusing for readers unfamiliar with the subject but used to the usual $L^{2}$ -spaces. Wurzel33 (talk) 19:36, 1 December 2013 (UTC)[reply]

I agree, made this point in the first paragraph, hope this reads well. Biker333 (talk) 11:27, 21 February 2016 (UTC)[reply]

Reproducing Kernels and Operators with a Cyclic Vector (V.N. Singh)

On the page Vashishtha Narayan Singh I have linked his PhD subject "Reproducing Kernels and Operators with a Cyclic Vector" to this page, but I am not sure if this is correct. Could someone please check this? Further, on Wiki Data he is described as "mathematician known for challenging Einstein Gauss theory". Is this correct too? If so, could this be added in the text as well, if possible with an explanation what this implies? I have no knowledge about this subject. Wiki-uk (talk) 11:53, 12 June 2015 (UTC)[reply]

Problems with the definition

-In (1), can M depend on f or not? If it can (as stated in the text), the "forall" notation in the formula is confusing.

- Last sentence in the definition: "From this definition it is easy to see that a function K: X \times X \to \mathbb{R} is a reproducing kernel if it is both symmetric and positive definite". Isn't this what Moore-Aronszajn Theorem proves below?

Expanding the examples

The examples are mostly a list of kernels, but this doesn't give a clue as to which functions are in the corresponding RKHS, and what the norm is. I added a sentence to partially complete this for the linear and Laplacian kernels, do others know more, e.g., is anything known about the double exponential case? Biker333 (talk) 00:26, 20 February 2016 (UTC)[reply]

This is a mess. For example, the first sentence of the definition,

Let {\displaystyle X}X be an arbitrary set and {\displaystyle H}H a Hilbert space of real-valued functions on {\displaystyle X}X, equipped with pointwise addition and pointwise scalar multiplication.

Hilbert space which is also a vector space??? Then further down the Hilbert space can be given an inner product. I'd suggest reviewing the definition of a Hilbert space before yammering on about RKHS.

A banach space with an inner product. Izmirlig (talk) 02:47, 5 July 2022 (UTC)[reply]

I think the point is that H is a Hilbert space of functions and not just some abstract vectors, because a function can be evaluated at a point, while a vector (i.e. an element of a linear space) does not have this property. One related remark is made in the article: L^2 is not a space of functions (but a space of classes of equivalence). Also, the discussion from 2005--2007 just above touches upon how the "evaluation at a point" property is not preserved under isomorphisms between Hilbert spaces, which (if my understanding is correct) is precisely because "evaluation at a point" is not in the structure of a Hilbert space. Thus, being specific that we consider Hilbert spaces of functions is not only correct but also necessary. AVM2019 (talk) 21:54, 9 September 2022 (UTC)[reply]

References and citations: more primary, less secondary

Parts of the article refer extensively to a single source, Lorenzo Rosasco's MIT course 9.520. This includes references to:

Durrett, Greg. 9.520 Course Notes, Massachusetts Institute of Technology, https://www.mit.edu/~9.520/scribe-notes/class03_gdurett.pdf, February 2010.
Rosasco, Lorenzo and Poggio, Thomas. "A Regularization Tour of Machine Learning – MIT 9.520 Lecture Notes" Manuscript, Dec. 2014.

The latter is not publicly available (neither free nor for a fee). The former are lecture notes for one of the classes apparently put together by one of Prof Rosasco's students (and has misattributed authorship).

I suggest to remove these references and replace them by more appropriate primary sources, such as endorsed books or peer-reviewed articles.

(Incidentally, on the course page, Prof. Rosasco encourages students to contribute to Wikipedia as part of assignments, which being a commendable thing, may explain some peculiarities of this article) AVM2019 (talk) 15:48, 10 September 2022 (UTC)[reply]

Kernel Examples and Kernel Type Query

Is the function f(x, y) = e^(x, y) a valid kernel function?

While there are some examples of kernels, would you like to expand upon the examples list? Inlin (talk) 11:26, 9 July 2023 (UTC)[reply]